
Geometry and analytic properties of the sliced Wasserstein space

Sangmin Park and Dejan Slepčev

S. Park, D. Slepčev: Department of Mathematical Sciences, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213
Abstract.

The sliced Wasserstein metric compares probability measures on $\mathbb{R}^{d}$ by taking averages of the Wasserstein distances between projections of the measures to lines. The distance has found a range of applications in statistics and machine learning, as it is easier to approximate and compute in high dimensions than the Wasserstein distance. While the geometry of the Wasserstein metric is quite well understood, and has led to important advances, very little is known about the geometry and metric properties of the sliced Wasserstein (SW) metric. Here we show that when the measures considered are “nice” (e.g. bounded above and below by positive multiples of the Lebesgue measure), the SW metric is comparable to the (homogeneous) negative Sobolev norm $\dot{H}^{-(d+1)/2}$. On the other hand, when the measures considered are close in the infinity transportation metric to a discrete measure, the SW metric between them is close to a multiple of the Wasserstein metric. We characterize the tangent space of the SW space, and show that the speed of curves in the space can be described by a quadratic form, but that the SW space is not a length space. We establish a number of properties of the metric given by the minimal length of curves between measures, namely the SW length. Finally, we highlight the consequences of these properties for gradient flows in the SW metric.

Keywords: Sliced Wasserstein Distance, Optimal Transport, Radon Transform, Gradient Flows in Spaces of Measures

MSC (2020): 49Q22, 46E27, 60B10, 44A12


Notation

  • $\mathbb{P}_{d}$ – the set we call the Radon domain, defined as $\mathbb{P}_{d}=(\mathbb{S}^{d-1}\times\mathbb{R})/\!\sim$, the quotient space for the equivalence relation $(-\theta,-r)\sim(\theta,r)$; see (1.5)

  • $Rf=\widehat{f}$ – the Radon transform of a function $f:\mathbb{R}^{d}\rightarrow\mathbb{R}$; see (1.4)

  • $R^{\ast}\mathfrak{g}=\widecheck{\mathfrak{g}}$ – the dual Radon transform of a function $\mathfrak{g}:\mathbb{P}_{d}\rightarrow\mathbb{R}$; see (2.1)

  • $\mathcal{S}(\Omega)$ with $\Omega=\mathbb{R}^{d}$ or $\mathbb{P}_{d}$ – the Schwartz class of functions; see (2.5)

  • $\mathcal{S}'(\Omega)$ with $\Omega=\mathbb{R}^{d}$ or $\mathbb{P}_{d}$ – the set of tempered distributions on $\Omega$

  • $\mathcal{M}(\Omega)$ with $\Omega=\mathbb{R}^{d}$ or $\mathbb{P}_{d}$ – the set of locally finite Borel measures on $\Omega$; see (2.4)

  • $\mathcal{M}_{b}(\Omega)$ with $\Omega=\mathbb{R}^{d}$ or $\mathbb{P}_{d}$ – the set of bounded Borel measures on $\Omega$

  • $\mathscr{P}_{p}(\mathbb{R}^{d})$, $\mathscr{P}_{p}(\mathbb{P}_{d})$ – sets of probability measures with finite $p$-th moments; see (2.15)

  • $W_{p}$ – the $p$-transportation distance; see (1.1). We write $W$ for the Wasserstein distance $W_{2}$.

  • $SW_{p}$ – the $p$-sliced Wasserstein distance; see (1.3). We write $SW$ for $SW_{2}$.

  • $\Gamma(\mu,\nu)$ – the set of transport plans between probability measures $\mu,\nu$; see (1.1)

  • $\widehat{\Gamma}(\widehat{\mu},\widehat{\nu})$ – the set of slice-wise transport plans between probability measures $\widehat{\mu},\widehat{\nu}\in\mathscr{P}(\mathbb{P}_{d})$; see (2.18)

  • $\Gamma_{o}(\mu,\nu)$ – the set of optimal transport plans between $\mu,\nu\in\mathscr{P}_{2}(\mathbb{R}^{d})$ for the quadratic cost; see (2.17)

  • $T_{\mu}^{\nu}$ – the optimal transport map for the quadratic cost from $\mu\in\mathscr{P}_{2}(\mathbb{R}^{d})$ to $\nu\in\mathscr{P}_{2}(\mathbb{R}^{d})$

  • $\widehat{\Gamma}_{o}(\widehat{\mu},\widehat{\nu})$ – the set of slice-wise optimal transport plans between $\widehat{\mu},\widehat{\nu}\in\mathscr{P}_{2}(\mathbb{P}_{d})$; see (2.19)

  • $\widehat{T}_{\widehat{\mu}}^{\widehat{\nu}}$ – the slice-wise optimal transport map from $\widehat{\mu}\in\mathscr{P}_{2}(\mathbb{P}_{d})$ to $\widehat{\nu}\in\mathscr{P}_{2}(\mathbb{P}_{d})$

  • $\mathcal{F}_{d}f(\xi)=(2\pi)^{-d/2}\int_{\mathbb{R}^{d}}e^{-ix\cdot\xi}f(x)\,dx$ – the $d$-dimensional Fourier transform of a function $f:\mathbb{R}^{d}\rightarrow\mathbb{R}$

  • $\mathcal{F}_{1}\mathfrak{g}(\theta,\omega)=(2\pi)^{-1/2}\int_{\mathbb{R}}e^{-ir\omega}\mathfrak{g}(\theta,r)\,dr$ – the slice-wise 1-dimensional Fourier transform of $\mathfrak{g}:\mathbb{P}_{d}\rightarrow\mathbb{R}$

  • $\Lambda_{d}=(-\partial_{r}^{2})^{\frac{d-1}{2}}$ – the slice-wise fractional derivative applied to functions on $\mathbb{P}_{d}$; see (2.11) and (A.9)

  • $H_{t}^{s}(\Omega)$ with $\Omega=\mathbb{R}^{d}$ or $\mathbb{P}_{d}$, $s\in\mathbb{R}$, and $t>-\frac{d}{2}$ – the Sobolev space with attenuated/amplified low frequencies; see (2.6) and (2.7). We write $H_{t}^{s-}(\Omega)=\bigcap_{\varepsilon>0}H_{t}^{s-\varepsilon}(\Omega)$; see (3.1)

  • $\mathscr{H}(\mathbb{R}^{d};\mathbb{R}^{d})$ – the space of admissible fluxes for the continuity equation; see (3.1)

  • $\mathcal{CE}$ – the set of suitable distributional solutions of the continuity equation; see Definition 3.6

  • $\langle f,g\rangle_{\Omega}$ with $\Omega=\mathbb{R}^{d}$ or $\mathbb{P}_{d}$ – the action of a distribution $f$ on $\Omega$ on a function $g$ on $\Omega$; see (2.14)

1. Introduction

The sliced Wasserstein distance, introduced by Rabin, Peyré, Delon, and Bernot [51], compares probability measures on $\mathbb{R}^{d}$ by taking averages of the Wasserstein distances between projections of the measures onto the 1-dimensional subspaces of $\mathbb{R}^{d}$. Thanks to its lower sample and computational complexity relative to the Wasserstein distance, the sliced Wasserstein distance and its variants [20, 28, 9, 45, 48, 4, 43, 42] have recently found an expanding range of applications in statistics [44, 34, 40, 37, 41] and machine learning [29, 10, 35, 18, 21, 20, 30, 31] as tools to compare measures and construct paths in spaces of measures. For $p\in[1,+\infty)$, we write $SW_{p}$ (or the $SW_{p}$ distance) to refer to the $p$-sliced Wasserstein distance, and refer to the corresponding space as the $SW_{p}$ space. When $p=2$, we drop the subscript and simply refer to them as the $SW$ distance and the $SW$ space.

Despite the multitude of uses of the sliced Wasserstein distance, only a few works deal with its metric and geometric properties: Bonnotte [12] established that $SW_{p}$ is indeed a metric, and that it is equivalent to $W_{p}$ for measures supported on a common compact set for all $p\geq 1$; more recently, Bayraktar and Guo [5] showed that $SW_{p}$ and $W_{p}$ induce the same topology on $\mathscr{P}_{p}(\mathbb{R}^{d})$ for $p\geq 1$. This is in stark contrast with the Wasserstein metric, whose geometry has been a subject of intense study and has led to important advances; see [1, 54, 65].

Here we take steps towards a better understanding of the sliced Wasserstein distance and its geometry. In particular, we show that for measures that are absolutely continuous with respect to the Lebesgue measure, have bounded density, and differ only within a set compactly contained in the interior of their support, the SW metric is comparable to the (homogeneous) negative Sobolev norm $\dot{H}^{-(d+1)/2}$; see Theorem 5.2. On the other hand, when the measures considered are close in the infinity-transportation metric to a discrete measure, the SW metric between them is close to a multiple of the Wasserstein metric (Theorem 5.5). We show that, unlike the Wasserstein space, the SW space is not a length space. Nevertheless, it still has a tangential structure that resembles that of the Wasserstein space. We also show that geodesics (considered as length-minimizing curves) in SW exist, and we study the intrinsic metric $\ell_{SW}$, defined as the length of the minimizing geodesics between measures. In particular, we show that $\ell_{SW}$ satisfies comparison and approximation properties similar to those of the SW metric. Finally, we discuss the consequences of these properties for gradient flows with respect to the SW metric.

1.1. Setting

For $\Omega=\mathbb{R}^{d}$ or $\Omega=\mathbb{S}^{d-1}\times\mathbb{R}$, we denote by $\mathscr{P}(\Omega)$ the space of all Borel probability measures on $\Omega$. Let $1\leq p<\infty$. For probability measures $\mu,\nu\in\mathscr{P}_{p}(\mathbb{R}^{d}):=\{\mu\in\mathscr{P}(\mathbb{R}^{d}):\int_{\mathbb{R}^{d}}|x|^{p}\,d\mu(x)<\infty\}$, the $p$-Wasserstein distance $W_{p}$ is defined as follows:

(1.1) $$W_{p}(\mu,\nu):=\inf_{\gamma\in\Gamma(\mu,\nu)}\left(\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}|x-y|^{p}\,d\gamma(x,y)\right)^{1/p},\quad\text{where }\Gamma(\mu,\nu)=\left\{\gamma\in\mathscr{P}(\mathbb{R}^{d}\times\mathbb{R}^{d}):\pi^{1}_{\#}\gamma=\mu,\ \pi^{2}_{\#}\gamma=\nu\right\}.$$

To define the sliced Wasserstein distance, we introduce the following notation: for each $\theta\in\mathbb{S}^{d-1}$, let $\mathbb{R}\theta:=\operatorname{Span}\{\theta\}\subset\mathbb{R}^{d}$, and define the projection $\pi^{\theta}:\mathbb{R}^{d}\rightarrow\mathbb{R}\theta$ by

(1.2) $$\pi^{\theta}(x)=(\theta\cdot x)\theta.$$

The $p$-sliced Wasserstein distance $SW_{p}$ is defined by

(1.3) $$SW_{p}(\mu,\sigma)=\left(\fint_{\mathbb{S}^{d-1}}W_{p}^{p}(\pi^{\theta}_{\#}\mu,\pi^{\theta}_{\#}\sigma)\,d\theta\right)^{\frac{1}{p}},$$

where $\fint_{\mathbb{S}^{d-1}}\cdots\,d\theta$ denotes the average over the unit sphere with respect to its surface measure.
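For uniform empirical measures with the same number of atoms, (1.3) suggests a simple Monte Carlo estimator: sample directions uniformly on $\mathbb{S}^{d-1}$, project the atoms, and compute each one-dimensional $W_{p}$ by matching sorted projections, which is optimal on the line for the cost $|x-y|^{p}$ with $p\geq 1$. The sketch below is illustrative only; the function names, the number of directions, and the restriction to equal-size point clouds are assumptions of the sketch, not part of the definition.

```python
import math
import random

# Illustrative sketch: helper names, n_dirs and the equal-size assumption are hypothetical.


def random_direction(d, rng):
    """Draw a direction uniformly on the unit sphere S^{d-1} (normalized Gaussian vector)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in g))
        if norm > 1e-12:
            return [c / norm for c in g]


def wasserstein_1d_pp(a, b, p):
    """W_p^p between uniform empirical measures on the line with len(a) == len(b).

    For the cost |x - y|^p with p >= 1, matching sorted points is optimal."""
    return sum(abs(x - y) ** p for x, y in zip(sorted(a), sorted(b))) / len(a)


def sliced_wasserstein(xs, ys, p=2, n_dirs=500, seed=0):
    """Monte Carlo estimate of SW_p(mu, sigma) in (1.3).

    mu and sigma are the uniform empirical measures on the point lists xs and ys,
    which this sketch assumes have the same length."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes the same number of atoms")
    rng = random.Random(seed)
    d = len(xs[0])
    total = 0.0
    for _ in range(n_dirs):
        theta = random_direction(d, rng)
        # Projected atoms x . theta: the 1D slice of each measure in direction theta.
        px = [sum(t * c for t, c in zip(theta, x)) for x in xs]
        py = [sum(t * c for t, c in zip(theta, y)) for y in ys]
        total += wasserstein_1d_pp(px, py, p)
    # The average over sampled directions approximates the spherical average in (1.3).
    return (total / n_dirs) ** (1.0 / p)


pts = [(0.0, 0.0), (1.0, 2.0), (-1.0, 0.5), (2.0, -1.0)]
shifted = [(x + 1.0, y) for x, y in pts]  # translate every atom by v = (1, 0)
print(sliced_wasserstein(pts, shifted))
```

With $n$ atoms and $K$ directions the cost is $O(Kn(d+\log n))$, since each slice only requires a projection and a sort.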

The Radon transform provides a natural language to describe objects relating to the sliced Wasserstein distance. Consider an integrable function $f:\mathbb{R}^{d}\rightarrow\mathbb{R}$, for $d\geq 2$. We use both $Rf$ and $\widehat{f}$ to denote its Radon transform: for $\theta\in\mathbb{S}^{d-1}$ and $r\in\mathbb{R}$,

(1.4) $$R_{\theta}f(r)=Rf(\theta,r)=\widehat{f}(\theta,r):=\int_{\theta^{\perp}}f(r\theta+y^{\theta})\,dy^{\theta},$$

where $\theta^{\perp}:=\{y\in\mathbb{R}^{d}:y\cdot\theta=0\}$ and $dy^{\theta}$ is the $(d-1)$-dimensional Lebesgue measure on $\theta^{\perp}$. By Fubini's theorem, $R_{\theta}f(r)$ exists for $\mathscr{L}^{1}$-a.e. $r\in\mathbb{R}$ for each $\theta\in\mathbb{S}^{d-1}$ when $f\in L^{1}(\mathbb{R}^{d})$.
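As a concrete instance of (1.4), the standard Gaussian density $f(x)=(2\pi)^{-d/2}e^{-|x|^{2}/2}$ has Radon transform $Rf(\theta,r)=(2\pi)^{-1/2}e^{-r^{2}/2}$ for every $\theta$, because $|r\theta+y^{\theta}|^{2}=r^{2}+|y^{\theta}|^{2}$ and the remaining Gaussian factor in $y^{\theta}$ integrates to one over $\theta^{\perp}$. The following sketch checks this numerically for $d=2$, where $\theta^{\perp}$ is a line; the midpoint quadrature, the truncation of $\theta^{\perp}$ to a segment, and the helper names are assumptions made for illustration.

```python
import math

# Illustrative sketch for d = 2; quadrature parameters and names are hypothetical choices.


def gaussian_density_2d(x1, x2):
    """Standard Gaussian density on R^2."""
    return math.exp(-0.5 * (x1 * x1 + x2 * x2)) / (2.0 * math.pi)


def radon_2d(f, theta_angle, r, half_width=10.0, n=4000):
    """Approximate Rf(theta, r), the integral of f(r * theta + y) over y in theta-perp,
    for theta = (cos(theta_angle), sin(theta_angle)).

    Uses the midpoint rule on the truncated segment {t * perp : |t| <= half_width}."""
    theta = (math.cos(theta_angle), math.sin(theta_angle))
    perp = (-theta[1], theta[0])  # unit vector spanning theta-perp
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n):
        t = -half_width + (k + 0.5) * h
        total += f(r * theta[0] + t * perp[0], r * theta[1] + t * perp[1])
    return total * h


r = 0.8
print(radon_2d(gaussian_density_2d, 0.3, r), math.exp(-0.5 * r * r) / math.sqrt(2.0 * math.pi))
```

Evaluating at $(\theta+\pi,-r)$ instead of $(\theta,r)$ visits the same points of the same line, which is the evenness $Rf(-\theta,-r)=Rf(\theta,r)$ at the discrete level.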

Note that $Rf$ is even on $\mathbb{S}^{d-1}\times\mathbb{R}$, meaning that $Rf(-\theta,-r)=Rf(\theta,r)$. This motivates defining the $d$-dimensional “Radon domain” $\mathbb{P}_{d}$ by

(1.5) $$\mathbb{P}_{d}:=(\mathbb{S}^{d-1}\times\mathbb{R})/\sim,\quad\text{where the equivalence relation is given by }(-\theta,-r)\sim(\theta,r).$$

The Radon transform can be extended to distributions (see [25, Chapter 1.5] and [57]); in particular, when $\mu$ is a bounded measure, the distributional extension is consistent with the definition of $R_{\theta}\mu$ as the pushforward of $\mu$ by the projection map $x\mapsto x\cdot\theta$ (see Remark A.1). Thus, for $\mu\in\mathscr{P}_{p}(\mathbb{R}^{d})$ we have $\widehat{\mu}^{\theta}=R_{\theta}\mu\in\mathscr{P}_{p}(\mathbb{R})$, and

$$SW_{p}(\mu,\sigma)=\left(\fint_{\mathbb{S}^{d-1}}W_{p}^{p}(\widehat{\mu}^{\theta},\widehat{\sigma}^{\theta})\,d\theta\right)^{\frac{1}{p}}.$$

Note that $\widehat{\mu}^{\theta}\in\mathscr{P}_{p}(\mathbb{R})$ whereas $\pi^{\theta}_{\#}\mu\in\mathscr{P}_{p}(\mathbb{R}\theta)$; we will sometimes use the latter when it is more convenient to consider measures on subspaces of $\mathbb{R}^{d}$ than on $\mathbb{R}$.

Henceforth we focus on the case $p=2$, and write $SW=SW_{2}$ and $W=W_{2}$.

1.2. Summary of results

We obtain a number of geometric and analytic properties of the $SW$ distance and of the associated length space, and investigate their implications for sliced Wasserstein gradient flows and for statistical estimation rates in these metrics. Throughout this paper, we use $X\lesssim_{\alpha_{1},\cdots,\alpha_{k}}Y$ as a shorthand for the inequality $X\leq C(\alpha_{1},\cdots,\alpha_{k})Y$, where $C(\alpha_{1},\cdots,\alpha_{k})>0$ is a finite constant depending on $(\alpha_{i})_{i=1}^{k}$. When summarizing results or making remarks, we sometimes omit the dependence on the parameters and write $\lesssim$ for the sake of simplicity; however, all rigorous statements contain clear characterizations of the constants.

Basic properties. In Section 2 we establish some basic properties of the $SW$ metric. In particular, in Proposition 2.4 we show that $(\mathscr{P}_{2}(\mathbb{R}^{d}),SW)$ is a complete metric space, which we refer to as the $SW$ space. We then turn to the intrinsic geometry of the $SW$ space. Example 2.5 shows that, unlike the Wasserstein space, the $SW$ space is not a geodesic space. That is, one cannot in general find a continuous curve connecting two measures in the SW space whose length equals the distance between the measures.

Tangential structure of sliced Wasserstein space. We show in Section 3 that the $SW$ space has a tangent structure which resembles the tangent structure of the Wasserstein space. Recall that in the Wasserstein space, each absolutely continuous curve $(\mu_{t})_{t\in I}$ defined on an interval $I\subset\mathbb{R}$ corresponds to a measure-valued distributional solution of the continuity equation

$$\partial_{t}\mu_{t}+\nabla\cdot(v_{t}\mu_{t})=0,\quad\text{with }\|v_{t}\|_{L^{2}(\mu_{t})}=|\mu'|_{W}(t)\text{ for a.e. }t\in I,$$

where $|\mu'|_{W}$ is the metric derivative with respect to $W$; see [1, Theorem 8.3.1]. Theorem 3.9 establishes an analogous result for the sliced Wasserstein space: for each absolutely continuous curve $(\mu_{t})_{t\in I}$ in the sliced Wasserstein space, there exists a vector-valued flux $(J_{t})_{t\in I}$ such that $J_{t}$ is in a suitable subspace of $\mathcal{S}'(\mathbb{R}^{d};\mathbb{R}^{d})$ and

$$\partial_{t}\mu_{t}+\nabla\cdot J_{t}=0,\quad\text{with }\left\|\frac{d\widehat{J}_{t}}{d\widehat{\mu}_{t}}\right\|_{L^{2}(\widehat{\mu}_{t})}=|\mu'|_{SW}(t)\text{ for a.e. }t\in I.$$

We note two key differences. First, the metric derivative is characterized by a weighted $L^{2}$ norm in the Radon domain, the finiteness of which does not imply $J_{t}\ll\mu_{t}$ in general (see Remark 3.10). Second, Sharafutdinov's results on the Radon transform on Sobolev spaces [57] imply that $\|Rf\|_{L^{2}(\mathbb{P}_{d})}=\|f\|_{\dot{H}^{-(d-1)/2}(\mathbb{R}^{d})}$; thus we can formally understand $|\mu'|_{SW}(t)$ as a weighted, high-order negative Sobolev norm of the flux $J_{t}$, in contrast to the weighted $L^{2}$ norm in the Wasserstein case. Hence, at least for absolutely continuous measures, the formal Riemannian metric measuring infinitesimal length in the sliced Wasserstein space corresponds to a weaker norm than the one for the Wasserstein metric. Furthermore, in Section 3.2 we characterize the tangent space, which has the following key property analogous to its Wasserstein counterpart: each absolutely continuous curve $(\mu_{t})_{t\in I}$ is associated to a unique (up to an $\mathscr{L}^{1}\llcorner_{I}$-null set) family of tangent vectors $(J_{t})_{t\in I}$, which moreover attain the metric derivative through the quadratic form $J_{t}\mapsto\|d\widehat{J}_{t}/d\widehat{\mu}_{t}\|_{L^{2}(\widehat{\mu}_{t})}$.

Intrinsic sliced Wasserstein length space. In Section 4, motivated by the general lack of geodesics in $(\mathscr{P}_{2}(\mathbb{R}^{d}),SW)$, we introduce the sliced Wasserstein length metric $\ell_{SW}$, defined as the infimum of the lengths of curves between measures in the SW space. We establish the basic properties of the metric space $(\mathscr{P}_{2}(\mathbb{R}^{d}),\ell_{SW})$; in particular, we prove in Proposition 4.5 that geodesics exist, which, combined with Example 2.5, implies that in general $\ell_{SW}\neq SW$.

Comparison of sliced Wasserstein metric with negative Sobolev norms and Wasserstein metric. In Section 5 we establish some of the key results of this paper, namely the comparison theorems of the $SW$ metric with negative Sobolev norms near absolutely continuous measures, and with the Wasserstein metric near discrete measures. In particular, consider an absolutely continuous measure $\mu$ whose density is bounded away from zero and infinity on some bounded open convex domain $\Omega$. Theorem 5.2 establishes that

$$\|\mu-\nu\|_{\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})}\lesssim SW(\mu,\nu)\leq\ell_{SW}(\mu,\nu)\lesssim SW(\mu,\nu)\lesssim\|\mu-\nu\|_{\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})},$$

for all measures $\nu$ which are bounded above and below by constant multiples of $\mu$ and coincide with $\mu$ near the boundary of $\Omega$. In other words, near $\mu$, $SW$ is equivalent to $\dot{H}^{-(d+1)/2}$.

On the other hand, Theorem 5.5 states

$$SW(\mu^{n},\nu)\leq\ell_{SW}(\mu^{n},\nu)\leq\frac{1}{\sqrt{d}}W(\mu^{n},\nu)\leq(1+o(1))SW(\mu^{n},\nu)$$

for $\nu$ near discrete measures of the form $\mu^{n}=\sum_{i=1}^{n}m_{i}\delta_{x_{i}}$.
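The simplest discrete example already shows why a dimensional factor appears when comparing $SW$ and $W$ near discrete measures. For two Dirac masses $\delta_{x}$ and $\delta_{x+v}$, every projection is a pair of Dirac masses on the line at distance $|\theta\cdot v|$, so $SW^{2}(\delta_{x},\delta_{x+v})=\fint_{\mathbb{S}^{d-1}}(\theta\cdot v)^{2}\,d\theta=|v|^{2}/d$, while $W(\delta_{x},\delta_{x+v})=|v|$; hence $SW=W/\sqrt{d}$ in this case. A Monte Carlo check follows; the sample size, the seed, and the helper name are arbitrary choices made for illustration.

```python
import math
import random

# Illustrative sketch: sw_two_diracs, n_dirs and seed are hypothetical choices.


def sw_two_diracs(v, n_dirs=20000, seed=0):
    """Monte Carlo estimate of SW(delta_x, delta_{x+v}).

    Each projection of the pair is a pair of Dirac masses on the real line at
    distance |theta . v|, whose 1D Wasserstein distance is exactly |theta . v|.
    The result does not depend on x, so x is omitted."""
    rng = random.Random(seed)
    d = len(v)
    acc = 0.0
    for _ in range(n_dirs):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in g))
        theta = [c / norm for c in g]  # uniform direction on S^{d-1}
        acc += sum(t * c for t, c in zip(theta, v)) ** 2
    return math.sqrt(acc / n_dirs)


for d in (2, 3, 5):
    v = [1.0] + [0.0] * (d - 1)  # W(delta_x, delta_{x+v}) = |v| = 1
    print(d, sw_two_diracs(v), 1.0 / math.sqrt(d))
```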

These two results provide interesting insights about the $SW$ metric. Near smooth measures it behaves like a highly negative Sobolev norm, in contrast to the Wasserstein metric, which for such measures behaves like the $\dot{H}^{-1}$ norm as noted by Peyré [49], while near discrete measures $SW$ behaves like a multiple of the Wasserstein distance.

Approximation by discrete measures in sliced Wasserstein length. Manole, Balakrishnan, and Wasserman [37, Proposition 4] have shown that a finite random sample (i.e. the empirical measure of a set of $n$ random points) of a probability measure on $\mathbb{R}^{d}$ estimates the measure in the sliced Wasserstein distance at the parametric rate $O(n^{-1/2})$ for a large class of measures; see also [41]. This is in stark contrast with the Wasserstein distance, where the approximation error is poor in high dimensions and scales like $n^{-1/d}$. We start by pointing out a connection, identified by our results in Section 5, between parametric finite-sample estimation in the sliced Wasserstein distance and results in the statistical literature. Namely, it is known that finite-sample estimation of measures with respect to the maximum mean discrepancy (MMD) also enjoys the parametric rate [61, Theorem 3.3]. The MMD distance is nothing but the norm in the dual of a reproducing kernel Hilbert space (RKHS). In particular, the results of [61] apply to the dual of the Sobolev space $H^{s}$ with $s>\frac{d}{2}$ (in which case the space embeds into a space of Hölder continuous functions and is an RKHS). Our Theorem 5.2 says that near absolutely continuous measures, SW behaves like the $\dot{H}^{-(d+1)/2}$ norm; as the associated norm $\|\cdot\|_{H^{-(d+1)/2}(\mathbb{R}^{d})}$ is an MMD, we can formally understand $SW$ to behave like an MMD. Thus the parametric MMD estimation rate can be seen as a tangential, or linearized, analogue of the finite-sample estimation rate in the SW distance.
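To make the MMD connection concrete, consider the kernel $k(x,y)=e^{-|x-y|}$ on $\mathbb{R}^{d}$ (a choice made for this sketch, not one taken from [61]). The Fourier transform of $e^{-|x|}$ is a constant multiple of $(1+|\xi|^{2})^{-(d+1)/2}$, so the associated RKHS is $H^{(d+1)/2}(\mathbb{R}^{d})$ with an equivalent norm, and the resulting MMD is equivalent to the $H^{-(d+1)/2}(\mathbb{R}^{d})$ norm of $\mu-\nu$. The sketch below evaluates the squared MMD between two uniform empirical measures through the kernel-mean-embedding formula; the function names are hypothetical.

```python
import math

# Illustrative sketch: the exponential kernel and all names are assumptions of this example.


def exp_kernel(x, y):
    """k(x, y) = exp(-|x - y|), a positive definite kernel on R^d whose RKHS is
    H^{(d+1)/2}(R^d) up to an equivalent norm."""
    return math.exp(-math.dist(x, y))


def mmd_squared(xs, ys, kernel=exp_kernel):
    """Squared MMD between the uniform empirical measures on xs and ys:
    mean k(x, x') + mean k(y, y') - 2 mean k(x, y).

    This is the squared RKHS norm of the difference of the kernel mean
    embeddings, so it is nonnegative up to rounding."""
    kxx = sum(kernel(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(kernel(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


xs = [(0.0, 0.0), (1.0, 0.5), (-0.3, 2.0)]
ys = [(0.2, -0.1), (1.5, 1.0)]
print(mmd_squared(xs, ys), mmd_squared(xs, xs))
```

Unlike $SW$, this quantity is a genuine Hilbertian (dual RKHS) norm, which is the linear structure the heuristic above refers to.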

Here we investigate the finite-sample estimation rates in the intrinsic SW length metric $\ell_{SW}$. The goal is to gain a better understanding of the extent to which $SW$ and $\ell_{SW}$ share properties. In Theorem 6.3, we establish that finite-sample approximation in $\ell_{SW}$ happens at the parametric rate up to a logarithmic correction, namely that

$$\ell_{SW}(\mu,\mu^{n})\lesssim\sqrt{\frac{\log n}{n}}\quad\text{with high probability, where }\mu^{n}=\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\text{ with }X_{i}\overset{\text{i.i.d.}}{\sim}\mu.$$

While this is consistent with the geometric view of $(\mathscr{P}_{2}(\mathbb{R}^{d}),\ell_{SW})$ as a curved or nonlinear dual of a reproducing kernel Hilbert space (see the beginning of Section 6 for a discussion), the statement and the proof require dealing with discrete measures, where such a heuristic view does not hold.

Implications for gradient flows. Section 7 applies the comparison results on $\ell_{SW}$ and $SW$ to obtain comparisons for the metric slopes. Given a metric space $(X,m)$, recall that the metric slope $|\partial\mathcal{E}|_{m}$ of a functional $\mathcal{E}:X\rightarrow\mathbb{R}$ is defined by

(1.6) $$|\partial\mathcal{E}|_{m}(u)=\limsup_{v\xrightarrow{m}u}\frac{[\mathcal{E}(u)-\mathcal{E}(v)]_{+}}{m(u,v)}.$$

Let $V:\mathbb{R}^{d}\rightarrow\mathbb{R}$ and consider the potential energy $\mathcal{V}(\mu):=\int_{\mathbb{R}^{d}}V(x)\,d\mu(x)$. Proposition 7.2 states that when $V$ is smooth and compactly supported, for suitable absolutely continuous $\mu\in\mathscr{P}_{2}(\mathbb{R}^{d})$ it holds that

$$|\partial\mathcal{V}|_{\dot{H}^{(d+1)/2}(\mathbb{R}^{d})}(\mu)\lesssim|\partial\mathcal{V}|_{\ell_{SW}}(\mu)\leq|\partial\mathcal{V}|_{SW}(\mu)\lesssim|\partial\mathcal{V}|_{\dot{H}^{(d+1)/2}(\mathbb{R}^{d})}(\mu)$$

whereas Proposition 7.5 shows that the slope behaves quite differently at discrete measures $\mu^{n}=\sum_{i=1}^{n}m_{i}\delta_{x_{i}}$, namely that

$$|\partial\mathcal{V}|_{SW}(\mu^{n})=|\partial\mathcal{V}|_{\ell_{SW}}(\mu^{n})=\sqrt{d}\,|\partial\mathcal{V}|_{W}(\mu^{n}).$$

By considering a sequence of discrete measures $\mu^{n}$ converging to an absolutely continuous measure $\mu$, we may deduce that $|\partial\mathcal{V}|_{SW}$ (resp. $|\partial\mathcal{V}|_{\ell_{SW}}$) is not lower semicontinuous in $SW$ (resp. $\ell_{SW}$) in general, even when $V\in C_{c}^{\infty}(\mathbb{R}^{d})$; see Corollary 7.7. This implies that the potential energy is not $\lambda$-geodesically convex in $(\mathscr{P}_{2}(\mathbb{R}^{d}),\ell_{SW})$. Furthermore, we observe in Remark 7.6 that, starting from discrete measures with a finite number of particles, the curves of maximal slope in the Wasserstein space, after a constant rescaling of time, are the curves of maximal slope in the SW space.

On the other hand, for smooth measures, the curves of maximal slope with respect to the Wasserstein metric are not curves of maximal slope in the SW space. We formally show that SW gradient flows of the potential energy satisfy a higher order equation given by a pseudodifferential operator of order dd, which is consistent with the rigorous results of Proposition 7.2. We conclude that the framework of gradient flows in metric spaces would not be the right tool to study such equations; PDE based approaches may provide an avenue for creating a well-posedness theory, which remains an open problem.

1.3. Related works

Since the introduction of the sliced Wasserstein distance [51], numerous variants have been considered. Deshpande et al. [20] proposed the max-sliced Wasserstein distance (max-SW distance), which takes the maximum of the 1D Wasserstein distances instead of the average as in the $SW$ case. Niles-Weed and Rigollet [45] and Paty and Cuturi [48] independently proposed the $k$-dimensional generalization (max-$k$-SW distance) for $1\leq k\leq d$. Generalizations to spherical [9] and other nonlinear projections [28] have also been considered to more effectively capture the geometric structure of data. Based on the ideas of partial optimal transportation [23], Bai, Schmitzer, Thorpe, and Kolouri [4] introduced sliced optimal partial transport to compare measures with different masses. Further projection-based transport metrics include the distributional sliced Wasserstein distance introduced by Nguyen, Ho, Pham, and Bui [43] and the convolution sliced Wasserstein distance proposed by Nguyen and Ho [42].

The sliced Wasserstein distances have found numerous applications in image processing. In fact, the utility of the sliced Wasserstein barycenter for tasks such as image synthesis, color transfer, and texture mixing served as a motivation for the introduction of the sliced Wasserstein distance [51]. Bonneel, Rabin, Peyré, and Pfister [11] further studied efficient numerical methods to compute sliced Wasserstein and related barycenters, and their applications. Kolouri, Park, and Rohde proposed the Radon cumulative distribution transform (Radon CDT) [29] for image classification; Radon CDT effectively computes the sliced Wasserstein ‘geodesic’ by taking the Radon inverse of the displacement interpolation between the Radon transforms of the measures. However, we note that such an inverse will in general fail to be a curve in the space of probability measures, as the Radon inverse of a nonnegative function need not be nonnegative.

Gradient flows related to the sliced Wasserstein distance have been applied to various machine learning and image processing tasks. Bonnotte noticed [12] that the continuous analogue of the isotropic Iterative Distribution Transfer (IDT) algorithm, introduced in [50] to transfer the color palette of a reference picture to a target picture, is the Wasserstein gradient flow of $\mu\mapsto\frac{1}{2}SW^{2}(\mu,\sigma)$. Liutkus et al. [35] use the Wasserstein gradient flows of the entropy-regularized version of the same energy functional for generative modelling. Gradient flows of the same energy in the sliced Wasserstein space have been considered by Bonet et al. [10], also for generative modelling; they also study the JKO scheme with respect to $SW$, and establish existence and uniqueness of minimizers of the scheme when the optimization is restricted to probability measures supported on a common compact set [10, Section 3.2]. Sliced Iterative Normalizing Flows (SINF) [18], useful for sampling and density evaluation, can be seen as a max-SW variant of the isotropic IDT algorithm.
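For an empirical measure with particles $x_{i}$, the Wasserstein gradient flow of $\mu\mapsto\frac{1}{2}SW^{2}(\mu,\sigma)$ moves each particle with velocity $-\fint_{\mathbb{S}^{d-1}}\big(x_{i}\cdot\theta-T_{\theta}(x_{i}\cdot\theta)\big)\theta\,d\theta$, where $T_{\theta}$ is the monotone one-dimensional transport map from the projection of $\mu$ to the projection of $\sigma$. Below is a minimal sketch of one explicit Euler step of this flow, in the spirit of the isotropic IDT algorithm; it is not code from the cited works, and the equal number of particles in $\mu$ and $\sigma$, the fixed Monte Carlo directions, the step size $h$, and all helper names are assumptions of the sketch.

```python
import math
import random

# Illustrative sketch: equal particle counts, fixed directions, h and all names are assumptions.


def sample_directions(d, k, rng):
    """Draw k directions uniformly on S^{d-1} (normalized Gaussian vectors)."""
    dirs = []
    while len(dirs) < k:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in g))
        if norm > 1e-12:
            dirs.append([c / norm for c in g])
    return dirs


def half_sw2(xs, ys, dirs):
    """Monte Carlo value of (1/2) SW^2 between uniform empirical measures of equal size."""
    total = 0.0
    for th in dirs:
        px = sorted(sum(t * c for t, c in zip(th, x)) for x in xs)
        py = sorted(sum(t * c for t, c in zip(th, y)) for y in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(xs)
    return 0.5 * total / len(dirs)


def sliced_step(xs, ys, dirs, h=0.5):
    """One explicit Euler step of the particle flow (hypothetical helper).

    For each direction, the particle with the k-th smallest projection is sent by the
    monotone 1D map T_theta to the k-th smallest projected target; the displacement
    (x_i . theta - T_theta(x_i . theta)) theta is averaged over the directions."""
    d, n = len(xs[0]), len(xs)
    disp = [[0.0] * d for _ in range(n)]
    for th in dirs:
        px = [sum(t * c for t, c in zip(th, x)) for x in xs]
        py = sorted(sum(t * c for t, c in zip(th, y)) for y in ys)
        order = sorted(range(n), key=lambda i: px[i])
        for rank, i in enumerate(order):
            residual = px[i] - py[rank]  # x_i . theta - T_theta(x_i . theta)
            for j in range(d):
                disp[i][j] += residual * th[j] / len(dirs)
    return [[x[j] - h * disp[i][j] for j in range(d)] for i, x in enumerate(xs)]


rng = random.Random(0)
xs = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(8)]
ys = [[rng.gauss(3.0, 1.0), rng.gauss(-2.0, 1.0)] for _ in range(8)]
dirs = sample_directions(2, 50, rng)
for _ in range(20):
    xs = sliced_step(xs, ys, dirs)
print(half_sw2(xs, ys, dirs))
```

For $0<h<2$ and a fixed set of directions, this step does not increase the Monte Carlo energy returned by `half_sw2`: with the slice-wise matchings frozen, the energy is a quadratic whose Hessian in each $x_{i}$ is bounded by $\frac{1}{n}I$, and re-sorting the projections can only lower it further.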

Other applications in machine learning include: sliced Wasserstein generative adversarial nets by Deshpande, Zhang, and Schwing [21]; max-SW generative adversarial nets [20] for generative modelling; the sliced Wasserstein autoencoder [30]; and the use of the $SW_{1}$ distance for unsupervised domain adaptation [31].

On the statistical side, Manole, Balakrishnan, and Wasserman [37] established, based on the 1-dimensional results of Bobkov and Ledoux [8], the parametric estimation rate $\mathbb{E}\,SW(\mu^{n},\mu)\lesssim n^{-1/2}$ for the empirical measure $\mu^{n}$ of $n$ i.i.d. samples of $\mu$, and further investigated statistical properties of trimmed sliced Wasserstein distances. Nietert, Goldfeld, Sadhu, and Kato [44] established empirical estimation rates in $SW$ and max-$SW$ for log-concave distributions with explicit constants depending on the intrinsic dimension, showed robustness to data contamination, and explored efficient computational methods. Lin, Zheng, Chen, Cuturi, and Jordan [34] investigated the max-$k$-sliced distances and their integral variants, the integral projection robust Wasserstein (IPRW) distances, also known as the $k$-sliced Wasserstein distances ($k$-SW distances), and established several statistical properties, including sample complexity $O(n^{-1/k})$. More recently, Olea, Rush, Velez, and Wiesel [46] explored the connection between certain linear predictor problems and distributionally robust optimization based on a modified max-SW. For applications in Approximate Bayesian Computation, we refer the reader to [40, Chapter 4] and the references therein.

Regarding analytic and topological properties, Bonnotte [12] showed that $SW_{p}$ is indeed a distance on $\mathscr{P}_{p}(\mathbb{R}^{d})$ for $1\leq p<\infty$, and established that, for measures supported on $\Omega\subset\subset\mathbb{R}^{d}$ (i.e. $\Omega$ compactly contained in $\mathbb{R}^{d}$), we have

$$SW_{p}\leq W_{p}\lesssim_{\Omega}SW_{p}^{\frac{1}{2p(d+1)}}.$$
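The first inequality follows slice by slice: the projection $x\mapsto x\cdot\theta$ is 1-Lipschitz, so pushing an optimal plan for $W_{p}(\mu,\nu)$ forward to each slice gives an admissible plan between $\widehat{\mu}^{\theta}$ and $\widehat{\nu}^{\theta}$ with cost at most $W_{p}^{p}(\mu,\nu)$; averaging over $\theta$ yields $SW_{p}\leq W_{p}$. Because the bound holds for every direction, it also holds for a Monte Carlo average over any finite set of directions, which the following sketch checks on small uniform empirical measures. The brute-force computation of $W_{p}$ over permutations, the sizes, and the helper names are hypothetical choices for illustration.

```python
import itertools
import math
import random

# Illustrative sketch: brute force over permutations is only sensible for very small n.


def wp_pp_uniform(xs, ys, p):
    """W_p^p between uniform empirical measures with the same number n of atoms.

    By Birkhoff's theorem an optimal plan can be taken to be a permutation,
    so we minimize over all n! matchings."""
    n = len(xs)
    return min(
        sum(math.dist(xs[i], ys[s[i]]) ** p for i in range(n)) / n
        for s in itertools.permutations(range(n))
    )


def swp_pp_uniform(xs, ys, p, n_dirs=200, seed=1):
    """Monte Carlo estimate of SW_p^p using sorted projections on each slice."""
    rng = random.Random(seed)
    d, n = len(xs[0]), len(xs)
    total = 0.0
    for _ in range(n_dirs):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in g))
        theta = [c / norm for c in g]
        px = sorted(sum(t * c for t, c in zip(theta, x)) for x in xs)
        py = sorted(sum(t * c for t, c in zip(theta, y)) for y in ys)
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / n
    return total / n_dirs


rng = random.Random(42)
xs = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(5)]
ys = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(5)]
for p in (1, 2):
    print(p, swp_pp_uniform(xs, ys, p), wp_pp_uniform(xs, ys, p))
```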

More recently, Bayraktar and Guo [5] showed that $SW_{p}$, $W_{p}$, and the $p$-max-sliced Wasserstein distance induce the same topology on $\mathscr{P}_{p}(\mathbb{R}^{d})$ for $p\geq 1$. We note that this does not directly imply completeness of $(\mathscr{P}_{2}(\mathbb{R}^{d}),SW)$, as not all Cauchy sequences in $SW$ need be Cauchy in $W$.

Bonnotte also showed the existence of the Wasserstein gradient flow of the energy functional $\mu\mapsto\frac{1}{2}SW^{2}(\mu,\sigma)$ for a target measure $\sigma$, despite the lack of geodesic convexity of the energy functional, and derived the corresponding PDE [12, Chapter 5], the continuous-time version of the previously mentioned isotropic IDT algorithm. Due to the lack of convexity of the energy functional, even the asymptotic convergence of the gradient flow remains open. Nevertheless, Li and Moosmüller [33] recently established almost sure convergence of the discrete isotropic IDT algorithm with step sizes satisfying certain summability conditions. More recently, Cozzi and Santambrogio [17] established the convergence rate $SW^{2}(\mu_{t},\sigma)=O(t^{-1})$ as $t\to\infty$ when the target measure $\sigma$ is any isotropic Gaussian.

As our work was nearing completion, we became aware of the independent work by Kitagawa and Takatsu [26] on the sliced Wasserstein spaces. In their work, Kitagawa and Takatsu establish the metric completeness of sliced optimal-transportation-based spaces, which generalizes our Proposition 2.4, and also demonstrate that the SW spaces are not geodesic spaces, generalizing our Example 2.5. Their work focuses on isometrically embedding the SW type spaces into larger spaces and on the barycenter problem. See also [27] for their more recent work on disintegrated optimal transport for metric fiber bundles.

2. Basic properties of the sliced Wasserstein space

In this section, we examine the basic properties of the sliced Wasserstein space $(\mathscr{P}_{2}(\mathbb{R}^{d}),SW)$. We start by reviewing a few properties of the Radon transform that we use (Section 2.1). In Section 2.2 we establish basic metric properties, including lower semicontinuity of $SW$ and precompactness of balls in $SW$ with respect to the narrow topology, from which completeness follows. We conclude the section by noting that, unlike the Wasserstein space, the sliced Wasserstein space is not a geodesic space.

2.1. Preliminaries on the Radon transform

Here we provide a brief overview of the key properties of the Radon transform. We refer the reader to Appendix A for precise statements and to the book by Helgason [25] for a more thorough introduction.

The dual Radon transform. Given an integrable function $\mathfrak{g}:\mathbb{P}_{d}\rightarrow\mathbb{R}$, we define its dual Radon transform, written $R^{\ast}\mathfrak{g}$ or $\widecheck{\mathfrak{g}}$, by

$$R^{\ast}\mathfrak{g}(x)=\widecheck{\mathfrak{g}}(x)=\fint_{\mathbb{S}^{d-1}}\mathfrak{g}(\theta,x\cdot\theta)\,d\theta. \tag{2.1}$$

By Fubini's theorem, for every $\rho>0$,

$$\int_{B(0,\rho)}R^{\ast}|\mathfrak{g}|(x)\,dx=\fint_{\mathbb{S}^{d-1}}\int_{B(0,\rho)}|\mathfrak{g}(\theta,x\cdot\theta)|\,dx\,d\theta\leq\omega_{d-1}\rho^{d-1}\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}|\mathfrak{g}(\theta,r)|\,dr\,d\theta,$$

where $\omega_{d-1}$ denotes the volume of the unit ball in $\mathbb{R}^{d-1}$. Hence $R^{\ast}\mathfrak{g}$ is well-defined (and locally integrable) for $\mathscr{L}^{d}$-a.e. $x\in\mathbb{R}^{d}$ whenever $\mathfrak{g}\in L^{1}(\mathbb{P}_{d})$. Furthermore, the dual transform $R^{\ast}$ satisfies

$$\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}Rf(\theta,r)\,\mathfrak{g}(\theta,r)\,dr\,d\theta=\int_{\mathbb{R}^{d}}f(x)\,R^{\ast}\mathfrak{g}(x)\,dx \tag{2.2}$$

whenever either $Rf\,\mathfrak{g}$ or $f\,R^{\ast}\mathfrak{g}$ is absolutely integrable; see [25, Lemma 5.1] for further details. In particular, the extension of the Radon transform to finite measures $\mu$, defined on each slice as the pushforward of $\mu$ under the map $x\mapsto x\cdot\theta$, is consistent with (2.2) (see Remark A.1). Consequently, we will often use the duality formula for bounded measures in the form

$$\int_{\mathbb{R}^{d}}R^{\ast}\mathfrak{g}\,d\mu=\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}\mathfrak{g}\,d\widehat{\mu}\quad\text{ for }\mu\in\mathcal{M}_{b}(\mathbb{R}^{d})\text{ and }\mathfrak{g}\in C_{0}(\mathbb{P}_{d}). \tag{2.3}$$
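As a sanity check of (2.3) for a discrete measure, here is a minimal NumPy sketch. It assumes $d=2$, a hypothetical three-atom measure $\mu$, an even test function chosen only for illustration, and a uniform grid of angles used as a quadrature for the normalized measure on $\mathbb{S}^{1}$; the helper names `projections`, `g`, and `dual_radon` are hypothetical. Each slice of $\widehat{\mu}$ is the pushforward of $\mu$ under $x\mapsto x\cdot\theta$, so both sides of (2.3) reduce to the same finite double sum, which is exactly the Fubini argument behind the formula.

```python
import numpy as np

def projections(points, angles):
    """Slices of the Radon transform of a discrete measure sum_i w_i delta_{x_i} in R^2:
    for theta = (cos a, sin a), atom i is pushed to x_i . theta and keeps its weight.
    Returns an array of shape (len(angles), len(points))."""
    dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # unit vectors theta_j
    return dirs @ points.T  # entry (j, i) = x_i . theta_j

def g(angle, r):
    """Hypothetical even test function on the Radon domain: replacing the angle by
    angle + pi (i.e. theta by -theta) and r by -r leaves it unchanged."""
    return np.exp(-(r - 0.3 * np.cos(angle)) ** 2)

def dual_radon(x, angles):
    """R* g(x): average of g(theta, x . theta) over the angle grid."""
    return np.mean(g(angles, projections(x[None, :], angles)[:, 0]))

angles = np.linspace(0.0, 2 * np.pi, 256, endpoint=False)  # quadrature grid on S^1
points = np.array([[0.0, 0.0], [1.0, 2.0], [-1.5, 0.5]])
weights = np.array([0.2, 0.5, 0.3])

# Left side of (2.3): integrate R* g against mu.
lhs = sum(w * dual_radon(x, angles) for x, w in zip(points, weights))
# Right side of (2.3): integrate g against each slice, then average over directions.
rhs = np.mean(g(angles[:, None], projections(points, angles)) @ weights)
```

With the same angle grid on both sides the two numbers agree up to rounding; refining the grid changes both in the same way.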

Spaces related to the Radon transform. For clarity, we denote functions defined on $\mathbb{P}_{d}$ by a different set of symbols, e.g. $\mathfrak{f},\mathfrak{g},\mathfrak{u},\mathfrak{v},\mathfrak{J}$. We denote by $\mathcal{M}(\Omega)$ the space of locally finite signed Borel measures on $\Omega$. We note that $\mathcal{M}(\mathbb{P}_{d})$ can be identified with

$$\{\mathfrak{J}\in\mathcal{M}(\mathbb{S}^{d-1}\times\mathbb{R}):\ d\mathfrak{J}(-\theta,-r)=d\mathfrak{J}(\theta,r)\}. \tag{2.4}$$

We write $\mathcal{M}_{b}(\Omega)$ for the space of bounded Borel measures on $\Omega$. Each of $\Omega=\mathbb{R}^{d},\mathbb{P}_{d},I\times\mathbb{R}^{d},I\times\mathbb{P}_{d}$ is a Polish space, hence $\mathcal{M}(\Omega)$ can be equivalently understood as a space of signed Radon measures. Finally, we denote by $\mathcal{M}(\Omega;\mathbb{R}^{d}):=\mathcal{M}(\Omega)^{d}$ the space of vector-valued Radon measures.

We will mostly treat $\theta\in\mathbb{S}^{d-1}$ as a parameter and $r\in\mathbb{R}$ as the variable, which is reflected in our notation. For instance, for a function $\mathfrak{g}:\mathbb{P}_{d}\rightarrow\mathbb{R}$ we write $\mathfrak{g}^{\theta}(r)=\mathfrak{g}(\theta,r)$. Denote by $\operatorname{Vol}_{\mathbb{S}^{d-1}}$ the normalized volume measure on $\mathbb{S}^{d-1}$, satisfying $\int_{\mathbb{S}^{d-1}}d\operatorname{Vol}_{\mathbb{S}^{d-1}}=1$. For each $\mathfrak{J}\in\mathcal{M}_{b}(\mathbb{P}_{d})$ we write $\mathfrak{J}^{\theta}\in\mathcal{M}_{b}(\mathbb{R})$ for its disintegration with respect to $\operatorname{Vol}_{\mathbb{S}^{d-1}}$, i.e. $\mathfrak{J}=\mathfrak{J}^{\theta}\,d\operatorname{Vol}_{\mathbb{S}^{d-1}}(\theta)$; for a precise statement of the disintegration theorem, see [19, III-70] or [1, Theorem 5.3.1]. We will always consider $\mathfrak{J}\in\mathcal{M}_{b}(\mathbb{P}_{d})$ with first marginal equal to $\operatorname{Vol}_{\mathbb{S}^{d-1}}$.

We denote by $\mathcal{S}(\Omega)$, with $\Omega=\mathbb{R}^{d},\mathbb{P}_{d}$, the Schwartz-Bruhat space of smooth rapidly decreasing functions [15, 47]. We note that $\mathcal{S}(\mathbb{R}^{d})$ is the usual Schwartz class, whereas $\mathcal{S}(\mathbb{P}_{d})$ can be identified with the subspace of even functions in $\mathcal{S}(\mathbb{S}^{d-1}\times\mathbb{R})$, namely the set

$$\{\mathfrak{g}\in\mathcal{S}(\mathbb{S}^{d-1}\times\mathbb{R}):\ \mathfrak{g}(-\theta,-r)=\mathfrak{g}(\theta,r)\}. \tag{2.5}$$

We write $\mathcal{S}^{\prime}(\Omega)$, with $\Omega=\mathbb{R}^{d},\mathbb{P}_{d}$, for the space of continuous linear functionals on $\mathcal{S}(\Omega)$, i.e. the space of tempered distributions on $\Omega$.

The $L^{2}$-Sobolev theory of Radon transforms will be crucial in understanding the differential structure of the sliced Wasserstein space. For this purpose, we use the Sobolev spaces $H_{t}^{s}$ with attenuated ($t>0$) or amplified ($t<0$) low frequencies, introduced by Sharafutdinov [57]. For each $1\leq k\leq d$ let us denote by $\mathcal{F}_{k}$ the $k$-dimensional Fourier transform

$$\mathcal{F}_{k}f(\xi)=(2\pi)^{-k/2}\int_{\mathbb{R}^{k}}f(x)e^{-ix\cdot\xi}\,dx.$$

For $s\in\mathbb{R}$ and $t>-\frac{d}{2}$, the Hilbert space $H_{t}^{s}(\mathbb{R}^{d})$ is defined as the completion of $\mathcal{S}(\mathbb{R}^{d})$ under the norm

$$\|f\|_{H_{t}^{s}(\mathbb{R}^{d})}^{2}=\int_{\mathbb{R}^{d}}|\xi|^{2t}(1+|\xi|^{2})^{s-t}|\mathcal{F}_{d}f(\xi)|^{2}\,d\xi. \tag{2.6}$$

Similarly, for $s\in\mathbb{R}$ and $t>-\frac{1}{2}$, we define the analogous space $H_{t}^{s}(\mathbb{P}_{d})$ on the Radon domain as the completion of the Schwartz functions $\mathcal{S}(\mathbb{P}_{d})$ under the norm

$$\|\mathfrak{g}\|_{H_{t}^{s}(\mathbb{P}_{d})}^{2}=\frac{1}{2(2\pi)^{d-1}}\int_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}|\zeta|^{2t}(1+\zeta^{2})^{s-t}|\mathcal{F}_{1}\mathfrak{g}(\theta,\zeta)|^{2}\,d\zeta\,d\theta. \tag{2.7}$$

Here and in the sequel, when applied to functions defined on $\mathbb{P}_{d}$, the one-dimensional Fourier transform $\mathcal{F}_{1}$ always acts on the scalar variable $r$. We only use the norms $H_{t}^{s}$ or $H_{-t}^{-s}$ for $0\leq t\leq s$. In the first case, we can view the $H_{t}^{s}$ norm as counting derivatives of order between $t$ and $s$, as

$$|\xi|^{2t}+|\xi|^{2s}\lesssim_{s-t}|\xi|^{2t}(1+|\xi|^{2})^{s-t}\lesssim_{s-t}|\xi|^{2t}+|\xi|^{2s}.$$
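For the reader's convenience, here is one way to verify the two-sided estimate above; this is a standard elementary argument supplied for illustration, not taken from the source. Put $p=s-t\geq 0$ and $x=|\xi|^{2}$:

```latex
% Convexity of x -> x^p (case p >= 1) or concavity (case 0 <= p <= 1) gives, for all x >= 0,
\min\{1,2^{p-1}\}\,(1+x^{p}) \;\le\; (1+x)^{p} \;\le\; \max\{1,2^{p-1}\}\,(1+x^{p}).
% Multiplying by |\xi|^{2t} and using |\xi|^{2t} x^{p} = |\xi|^{2s} yields
\min\{1,2^{s-t-1}\}\,\big(|\xi|^{2t}+|\xi|^{2s}\big)
  \;\le\; |\xi|^{2t}(1+|\xi|^{2})^{s-t}
  \;\le\; \max\{1,2^{s-t-1}\}\,\big(|\xi|^{2t}+|\xi|^{2s}\big).
```

The constants depend only on $s-t$, which is what the subscript on $\lesssim$ records.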

Thus when $0=t\leq s$, the space $H_{t}^{s}$ coincides with the standard Sobolev space of order $s$. On the other hand, the space $H_{-t}^{-s}$ can be understood as the dual of $H_{t}^{s}$. Indeed, for any $f,g\in\mathcal{S}(\mathbb{R}^{d})$

$$\left|\int_{\mathbb{R}^{d}}f(x)g(x)\,dx\right|\leq\|f\|_{H_{t}^{s}(\mathbb{R}^{d})}\|g\|_{H_{-t}^{-s}(\mathbb{R}^{d})}; \tag{2.8}$$

for details, see [57, Theorem 5.3] and its proof. We provide further information on the relationship between the Radon transform and Sobolev spaces in Appendix A.

Outside of Section 2 we will mostly be interested in the case $t=s$, where $\dot{H}^{s}(\Omega):=H_{s}^{s}(\Omega)$, with $\Omega=\mathbb{R}^{d}$ or $\mathbb{P}_{d}$, is equivalent to the more familiar homogeneous Sobolev space.

We only consider the spaces $H_{t}^{s}(\mathbb{R}^{d})$ with $-\frac{d}{2}<t<\frac{d}{2}$ and $H_{t}^{s}(\mathbb{P}_{d})$ with $-\frac{1}{2}<t<\frac{1}{2}$, which ensures that the identity map continuously embeds $H_{t}^{s}(\mathbb{R}^{d})$ into $\mathcal{S}^{\prime}(\mathbb{R}^{d})$, and likewise $H_{t}^{s}(\mathbb{P}_{d})$ into $\mathcal{S}^{\prime}(\mathbb{P}_{d})$; see [57, Theorem 5.3], which we have included in the appendix (Theorem A.11) for completeness. Thus, for $\Omega=\mathbb{R}^{d},\mathbb{P}_{d}$, the spaces $H_{t}^{s}(\Omega)$ can be seen as complete normed subspaces of $\mathcal{S}^{\prime}(\Omega)$. We stress that, while we compare $SW$ with the norm $\|\cdot\|_{\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})}$, generic elements of the spaces $H_{t}^{s}(\mathbb{R}^{d})$ with $|t|>\frac{d}{2}$ are not considered in this paper.

Furthermore, when $0<s<\frac{d}{2}$, $\dot{H}^{s}(\mathbb{R}^{d})$ continuously embeds into $L^{\frac{2d}{d-2s}}(\mathbb{R}^{d})$ by the Gagliardo-Nirenberg-Sobolev inequality for fractional Sobolev spaces; see for instance [32, Theorem 11.31]. Thus, in this case we can regard $\dot{H}^{s}(\mathbb{R}^{d})$ as a space of functions.

We note that in some works the definition of homogeneous Sobolev spaces for $s<d/2$ differs from the one we use. Namely, $\dot{H}^{s}(\Omega)$ is defined as the subset of $\mathcal{S}^{\prime}(\Omega)$ for which the seminorm (2.6) with $t=s$ is finite; in this case, elements of $\dot{H}^{s}$ are uniquely defined in $\mathcal{S}^{\prime}$ only modulo polynomials; see for instance [62, Remark 3, Section 5.1].

Sharafutdinov showed [57, Theorem 2.1] that the Radon transform extends to a bijective isometry between $H_{t}^{s}(\mathbb{R}^{d})$ and $H^{s+(d-1)/2}_{t+(d-1)/2}(\mathbb{P}_{d})$, i.e. when $t>-\frac{d}{2}$

$$\|f\|_{H_{t}^{s}(\mathbb{R}^{d})}=\|Rf\|_{H^{s+(d-1)/2}_{t+(d-1)/2}(\mathbb{P}_{d})}. \tag{2.9}$$

The special case $t=s=0$ was observed by Reshetnyak, recorded in [24, Section 1.1.5] and also in [25, Chapter 1, Theorem 4.1]. Whenever $t\in(-d/2,-d/2+1)$, the $H^{s+(d-1)/2}_{t+(d-1)/2}(\mathbb{P}_{d})$-norm is stronger than the topology of $\mathcal{S}^{\prime}(\mathbb{P}_{d})$, so the continuous extension of the Radon transform applied to any $f\in H_{t}^{s}(\mathbb{R}^{d})$ is unambiguously defined as an element of $\mathcal{S}^{\prime}(\mathbb{P}_{d})$, independently of $t\in(-d/2,-d/2+1)$ and $s\in\mathbb{R}$. Therefore, in the remainder of this paper we refer to this extension simply as the Radon transform.

In Sections 5 and 6 we will make use of the weighted homogeneous Sobolev norm of order $-1$ in dimension 1. Given $\sigma,\mu,\nu\in\mathscr{P}_{2}(\mathbb{R})$, the $\dot{H}^{-1}(\sigma)$-norm of $\mu-\nu$ is defined by

$$\|\mu-\nu\|_{\dot{H}^{-1}(\sigma)}:=\sup\left\{\int_{\mathbb{R}}\varphi\,d(\mu-\nu):\ \varphi\in\mathcal{S}(\mathbb{R})\text{ and }\|\varphi\|_{\dot{H}^{1}(\sigma)}\leq 1\right\}. \tag{2.10}$$

Operators related to the Radon transform. Calculus with the Radon transform often involves the operator $\Lambda_{d}$, defined by

$$\Lambda_{d}=(-\partial_{r}^{2})^{\frac{d-1}{2}}. \tag{2.11}$$

When $d$ is even, this fractional power of the one-dimensional Laplace operator is defined using the Hilbert transform; for precise definitions, see Definition A.7 in the appendix. Observe that $\Lambda_{d}$ is well-defined as an operator from $\mathcal{S}(\mathbb{P}_{d})$ to itself, and can be extended to a bounded operator from $H^{r}_{t}(\mathbb{P}_{d})$ to $H^{r-(d-1)}_{t-(d-1)}(\mathbb{P}_{d})$ for $t>d-3/2$; see Remark A.8.

The operator $\Lambda_{d}$ can be understood through its interaction with the Fourier transform, namely

$$(\mathcal{F}_{1}\Lambda_{d}\mathfrak{g})(\theta,\zeta)=|\zeta|^{d-1}\mathcal{F}_{1}\mathfrak{g}(\theta,\zeta). \tag{2.12}$$
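Formula (2.12) suggests a simple numerical realization of $\Lambda_{d}$ on a single slice: sample the function on a uniform grid, take a discrete Fourier transform, multiply by $|\zeta|^{d-1}$, and invert. The sketch below assumes the function decays fast enough to be treated as periodic on a large window; the name `lambda_d` is hypothetical. For odd $d$ the result can be compared with an exact derivative (here $d=3$, where $\Lambda_{3}=-\partial_{r}^{2}$); for even $d$ the same multiplier gives the nonlocal operator.

```python
import numpy as np

def lambda_d(samples, dr, d):
    """Approximate Lambda_d = (-d^2/dr^2)^((d-1)/2) on a uniform periodic grid by
    multiplying the discrete Fourier coefficients by |zeta|^(d-1), as in (2.12).
    The multiplier does not depend on the Fourier normalization convention."""
    zeta = 2 * np.pi * np.fft.fftfreq(samples.size, d=dr)  # angular frequencies
    return np.real(np.fft.ifft(np.abs(zeta) ** (d - 1) * np.fft.fft(samples)))

r = np.linspace(-20.0, 20.0, 1024, endpoint=False)
dr = r[1] - r[0]
f = np.exp(-r**2 / 2)

# For d = 3, Lambda_3 = -d^2/dr^2, so the exact answer is (1 - r^2) exp(-r^2/2).
approx = lambda_d(f, dr, 3)
exact = (1 - r**2) * np.exp(-r**2 / 2)
err = np.max(np.abs(approx - exact))
```

Because the Gaussian is smooth and negligible at the window edges, the spectral error is at the level of rounding.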

We also use fractional powers of the Laplace operator in $d$ dimensions, which can be defined via the Fourier transform by

$$(\mathcal{F}_{d}(-\Delta)^{s}f)(\xi)=|\xi|^{2s}\mathcal{F}_{d}f(\xi).$$

Again, $(-\Delta)^{s}$ is well-defined as an operator from $\mathcal{S}(\mathbb{R}^{d})$ to itself, and can be extended to a bounded operator from $H^{r}_{t}(\mathbb{R}^{d})$ to $H^{r-2s}_{t-2s}(\mathbb{R}^{d})$ when $t-2s>-\frac{d}{2}$. A rigorous definition of $(-\Delta)^{s}$ for fractional powers $s$ that does not rely on the Fourier transform can be found in [25, Chapter 7.6], which we have included in Proposition A.5 in the appendix for completeness. The inversion formulae are expressed using these operators: setting $c_{d}=(4\pi)^{(d-1)/2}\Gamma(d/2)/\Gamma(1/2)$, for each $f\in\mathcal{S}(\mathbb{R}^{d})$ and $\mathfrak{g}\in\mathcal{S}(\mathbb{P}_{d})$ we have

$$c_{d}f=R^{\ast}\Lambda_{d}Rf=(-\Delta)^{(d-1)/2}R^{\ast}Rf\quad\text{ and }\quad c_{d}\mathfrak{g}(\theta,r)=RR^{\ast}(\Lambda_{d}\mathfrak{g}).$$

See Proposition A.9 for further details.

Whenever $t\in(-\frac{d}{2},-\frac{d}{2}+1)$, we have $H_{t}^{s}(\mathbb{R}^{d})\subset\mathcal{S}^{\prime}(\mathbb{R}^{d})$ and $H_{t+(d-1)/2}^{s+(d-1)/2}(\mathbb{P}_{d})\subset\mathcal{S}^{\prime}(\mathbb{P}_{d})$. In this case, straightforward calculations using the inversion formula and the Fourier transform imply

$$J(c_{d}\varphi)=RJ(\Lambda_{d}R\varphi)\quad\text{ for }J\in H_{t}^{s}(\mathbb{R}^{d})\text{ and }\varphi\in\mathcal{S}(\mathbb{R}^{d}). \tag{2.13}$$

For each of the domains $\Omega=\mathbb{R}^{d},\mathbb{P}_{d},\mathbb{R}$, we will write

$$\langle J,\varphi\rangle_{\Omega}:=J(\varphi)\quad\text{ for }J\in\mathcal{S}^{\prime}(\Omega)\text{ and }\varphi\in\mathcal{S}(\Omega). \tag{2.14}$$

2.2. Basic properties of sliced Wasserstein metric

In this subsection we establish some basic properties of the $SW$ distance and the $SW$ space.

Let $\mathscr{P}_{2}(\mathbb{P}_{d})$ be the set of Borel probability measures on the Radon domain with finite second moment

$$\begin{aligned}\mathscr{P}_{2}(\mathbb{P}_{d}):=\Big\{\widehat{\mu}\in\mathscr{P}_{2}(\mathbb{S}^{d-1}\times\mathbb{R}):\ &\pi^{1}_{\#}\widehat{\mu}=\operatorname{Vol}_{\mathbb{S}^{d-1}},\ \widehat{\mu}^{\theta}\in\mathscr{P}_{2}(\mathbb{R}),\\ &d\widehat{\mu}^{-\theta}(-r)=d\widehat{\mu}^{\theta}(r),\ \fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}r^{2}\,d\widehat{\mu}^{\theta}(r)\,d\theta<\infty\Big\}\end{aligned} \tag{2.15}$$

where $\pi^{1}$ is the projection onto the first variable. In other words, $(\widehat{\mu}^{\theta})_{\theta\in\mathbb{S}^{d-1}}$ is a family of measures in $\mathscr{P}_{2}(\mathbb{R})$ parametrized by $\theta\in\mathbb{S}^{d-1}$ that additionally satisfies the evenness condition. To compare second moments in the two domains, observe that for each $\theta\in\mathbb{S}^{d-1}$ we can choose an orthonormal frame $\{\theta^{1},\dots,\theta^{d}\}$ with $\theta^{1}=\theta$ and $\theta^{i}\in\mathbb{S}^{d-1}$ for $i=1,\dots,d$. As $|x|^{2}=\sum_{i=1}^{d}|x\cdot\theta^{i}|^{2}$, the rotation invariance of $\operatorname{Vol}_{\mathbb{S}^{d-1}}$ gives

$$\int_{\mathbb{S}^{d-1}}|x|^{2}\,d\operatorname{Vol}_{\mathbb{S}^{d-1}}(\theta)=\sum_{i=1}^{d}\int_{\mathbb{S}^{d-1}}|x\cdot\theta^{i}|^{2}\,d\operatorname{Vol}_{\mathbb{S}^{d-1}}(\theta^{i})=d\int_{\mathbb{S}^{d-1}}|x\cdot\theta|^{2}\,d\operatorname{Vol}_{\mathbb{S}^{d-1}}(\theta).$$

Thus

$$\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}r^{2}\,d\widehat{\mu}^{\theta}(r)\,d\theta=\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}^{d}}|x\cdot\theta|^{2}\,d\mu(x)\,d\theta=\frac{1}{d}\int_{\mathbb{R}^{d}}|x|^{2}\,d\mu(x).$$

Equivalently,

$$SW^{2}(\mu,\delta_{0})=\frac{1}{d}W^{2}(\mu,\delta_{0})\quad\text{ for any }\mu\in\mathscr{P}_{2}(\mathbb{R}^{d}). \tag{2.16}$$

Thus the finite second moment conditions in the Euclidean and Radon domains coincide: $\mu\in\mathscr{P}_{2}(\mathbb{R}^{d})$ if and only if $\widehat{\mu}\in\mathscr{P}_{2}(\mathbb{P}_{d})$.
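Identity (2.16) is easy to check numerically for a discrete measure. The sketch below assumes $d=2$ and a hypothetical four-atom measure, and replaces the average over $\mathbb{S}^{1}$ by $m$ equally spaced angles; for $m\geq 3$ this grid averages $\cos^{2}$ exactly to $\frac{1}{2}$, so for quadratic integrands it reproduces the normalized spherical average without quadrature error. On each slice, transporting to $\delta_{0}$ costs the second moment of the slice.

```python
import numpy as np

# Hypothetical discrete measure mu = sum_i w_i delta_{x_i} in R^2.
points = np.array([[1.0, 0.0], [0.5, -2.0], [-1.0, 1.5], [2.0, 2.0]])
weights = np.array([0.1, 0.4, 0.3, 0.2])
d = 2

# Equally spaced directions on S^1 (exact for quadratic integrands when m >= 3).
m = 64
angles = np.linspace(0.0, 2 * np.pi, m, endpoint=False)
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# On each slice, W^2(mu_hat^theta, delta_0) = sum_i w_i (x_i . theta)^2.
sw2 = np.mean(((dirs @ points.T) ** 2) @ weights)
# W^2(mu, delta_0) is the second moment of mu.
w2 = weights @ np.sum(points**2, axis=1)
```

In higher dimensions one would typically replace the grid by random directions, at the price of a Monte Carlo error.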

Given $\mu,\nu\in\mathscr{P}_{2}(\mathbb{R}^{d})$, let $\Gamma_{o}(\mu,\nu)$ be the set of optimal transport plans for the quadratic cost:

$$\Gamma_{o}(\mu,\nu)=\left\{\gamma\in\Gamma(\mu,\nu):\ \int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}|x-y|^{2}\,d\gamma(x,y)=W^{2}(\mu,\nu)\right\}, \tag{2.17}$$

where $\Gamma(\mu,\nu)$ is as defined in (1.1). On the other hand, given $\widehat{\mu},\widehat{\nu}\in\mathscr{P}_{2}(\mathbb{P}_{d})$, write

$$\widehat{\Gamma}(\widehat{\mu},\widehat{\nu})=\left\{\widehat{\gamma}\in\mathscr{P}(\mathbb{S}^{d-1}\times\mathbb{R}\times\mathbb{R}):\ \pi^{1}_{\#}\widehat{\gamma}=\operatorname{Vol}_{\mathbb{S}^{d-1}},\ \widehat{\gamma}^{\theta}\in\Gamma(\widehat{\mu}^{\theta},\widehat{\nu}^{\theta})\text{ for a.e. }\theta\in\mathbb{S}^{d-1}\right\}, \tag{2.18}$$

where $(\widehat{\gamma}^{\theta})_{\theta\in\mathbb{S}^{d-1}}$ is the disintegration of $\widehat{\gamma}$ with respect to $\operatorname{Vol}_{\mathbb{S}^{d-1}}$. Then we define the set $\widehat{\Gamma}_{o}$ of slice-wise optimal transport plans by

$$\widehat{\Gamma}_{o}(\widehat{\mu},\widehat{\nu})=\left\{\widehat{\gamma}\in\widehat{\Gamma}(\widehat{\mu},\widehat{\nu}):\ \widehat{\gamma}^{\theta}\in\Gamma_{o}(\widehat{\mu}^{\theta},\widehat{\nu}^{\theta})\text{ for a.e. }\theta\in\mathbb{S}^{d-1}\right\}. \tag{2.19}$$

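As noticed by Bonnotte [12], given $\gamma\in\Gamma(\mu,\nu)$ we have $(\pi^{\theta}\times\pi^{\theta})_{\#}\gamma\in\Gamma(\pi^{\theta}_{\#}\mu,\pi^{\theta}_{\#}\nu)$ for each $\theta\in\mathbb{S}^{d-1}$; taking $\gamma\in\Gamma_{o}(\mu,\nu)$ and averaging $|(x-y)\cdot\theta|^{2}$ over $\theta$ as in the computation preceding (2.16), we obtain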

$$SW(\mu,\nu)\leq\frac{1}{\sqrt{d}}W(\mu,\nu)\quad\text{ for all }\mu,\nu\in\mathscr{P}_{2}(\mathbb{R}^{d}).$$

We denote the optimal transport map between $\mu,\nu\in\mathscr{P}_{2}(\mathbb{R}^{d})$, if it exists, by $T_{\mu}^{\nu}$, i.e. $(\operatorname{Id}\times T_{\mu}^{\nu})_{\#}\mu\in\Gamma_{o}(\mu,\nu)$, where $\operatorname{Id}(x)=x$. Similarly, we denote by $\widehat{T}_{\widehat{\mu}}^{\widehat{\nu}}$ the family of optimal transport maps in the Radon domain, i.e. $\widehat{T}_{\widehat{\mu}}^{\widehat{\nu}}:\mathbb{S}^{d-1}\times\mathbb{R}\rightarrow\mathbb{R}$ such that

$$\|\widehat{T}_{\widehat{\mu}}^{\widehat{\nu}}-\widehat{\operatorname{Id}}\|_{L^{2}(\widehat{\mu})}=SW(\mu,\nu),\quad\text{ where }\widehat{\operatorname{Id}}(\theta,r)=r.$$
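For uniform empirical measures with the same number of atoms, the slice-wise optimal maps $\widehat{T}_{\widehat{\mu}}^{\widehat{\nu}}$ are monotone rearrangements: the $k$-th smallest projection of one sample is sent to the $k$-th smallest projection of the other. The sketch below uses this to compute $SW$ in $d=2$ and compares it with the bound $SW\leq W/\sqrt{d}$. It assumes random hypothetical point clouds, a uniform angle grid in place of $\operatorname{Vol}_{\mathbb{S}^{1}}$, and a brute-force $W$ over permutations (exact for uniform equal-size samples, but only practical for a handful of atoms); the function names are hypothetical.

```python
import itertools
import numpy as np

def sliced_w2_squared(x, y, angles):
    """SW^2 between uniform empirical measures of equal size in R^2: on each slice
    the optimal map is monotone, so sorted projections are matched in order."""
    dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    px = np.sort(dirs @ x.T, axis=1)  # sorted projections, one row per direction
    py = np.sort(dirs @ y.T, axis=1)
    return np.mean((px - py) ** 2)  # average over atoms (uniform weights) and angles

def w2_squared(x, y):
    """W^2 between uniform empirical measures of equal size by brute force: an
    optimal plan can be taken to be a permutation (Birkhoff's theorem)."""
    n = len(x)
    return min(np.sum((x - y[list(p)]) ** 2) / n for p in itertools.permutations(range(n)))

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
y = rng.normal(size=(5, 2)) + np.array([1.0, 0.0])
angles = np.linspace(0.0, 2 * np.pi, 360, endpoint=False)

sw = np.sqrt(sliced_w2_squared(x, y, angles))
w = np.sqrt(w2_squared(x, y))
```

Because the angle grid averages quadratic forms exactly, the bound `sw <= w / sqrt(2)` holds for the discretized quantities as well, up to rounding.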

Recall that a sequence $(\mu_{n})_{n\in\mathbb{N}}$ in $\mathscr{P}(\mathbb{R}^{d})$ converges narrowly to $\mu\in\mathscr{P}(\mathbb{R}^{d})$ if for each bounded continuous function $\varphi\in C_{b}(\mathbb{R}^{d})$

$$\lim_{n\rightarrow\infty}\int_{\mathbb{R}^{d}}\varphi(x)\,d\mu_{n}(x)=\int_{\mathbb{R}^{d}}\varphi(x)\,d\mu(x). \tag{2.20}$$

We begin by establishing lower semicontinuity of $SW$ with respect to narrow convergence.

Proposition 2.1 ($SW$ is lower semicontinuous with respect to the narrow topology).

The map $(\mu,\nu)\mapsto SW(\mu,\nu)$ from $\mathscr{P}_{2}(\mathbb{R}^{d})\times\mathscr{P}_{2}(\mathbb{R}^{d})$ to $[0,+\infty)$ is lower semicontinuous with respect to the narrow topology.

Proof.

The analogous statement for $W_{p}$, $p\in[1,+\infty)$, is a classical result; see for instance [65, Remark 6.12] (where narrow convergence is referred to as weak convergence). Clearly $\sigma^{k}\rightharpoonup\sigma$ implies $\pi^{\theta}_{\#}\sigma^{k}\rightharpoonup\pi^{\theta}_{\#}\sigma$ for every $\theta$. Thus, if $(\mu^{k},\nu^{k})\rightharpoonup(\mu,\nu)$ narrowly, then by Fatou's lemma and the lower semicontinuity of $W$

$$\begin{aligned}\liminf_{k}SW^{2}(\mu^{k},\nu^{k})&=\liminf_{k}\fint_{\mathbb{S}^{d-1}}W^{2}(\pi^{\theta}_{\#}\mu^{k},\pi^{\theta}_{\#}\nu^{k})\,d\theta\\ &\geq\fint_{\mathbb{S}^{d-1}}\liminf_{k}W^{2}(\pi^{\theta}_{\#}\mu^{k},\pi^{\theta}_{\#}\nu^{k})\,d\theta\geq\fint_{\mathbb{S}^{d-1}}W^{2}(\pi^{\theta}_{\#}\mu,\pi^{\theta}_{\#}\nu)\,d\theta=SW^{2}(\mu,\nu).\end{aligned}$$

Hence $SW$ is lower semicontinuous with respect to narrow convergence. ∎

Remark 2.2 (Lack of compactness in $SW$).

We note that the closed unit ball $\overline{B}^{SW}(\nu,1)$ in $SW$ is not compact in the $SW$ topology. The argument is analogous to the one showing that $\overline{B}^{W}(\nu,1)$ is not compact in the topology of the Wasserstein metric. Consider

$$\nu_{k}=(1-\varepsilon_{k})\nu+\varepsilon_{k}\delta_{x_{k}}\in\overline{B}^{W}(\nu,1)\subset\overline{B}^{SW}(\nu,1),$$

where $|x_{k}|\nearrow\infty$ and $\varepsilon_{k}\searrow 0$ are chosen so that $\varepsilon_{k}|x_{k}-x_{0}|^{2}=1$ for a fixed $x_{0}\in\mathbb{R}^{d}$. A direct computation shows that while $\nu_{k}\rightharpoonup\nu$, the second moments do not converge, as

$$\liminf_{k\to\infty}\int_{\mathbb{R}^{d}}|x|^{2}\,d\nu_{k}=\int_{\mathbb{R}^{d}}|x|^{2}\,d\nu+1.$$

Thus $W(\nu_{k},\nu)\not\rightarrow 0$. As $W$ and $SW$ induce the same topology, we deduce that $\overline{B}^{SW}(\nu,1)$ is indeed not compact in the $SW$ topology. ∎
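The second-moment computation in Remark 2.2 can be made concrete. The sketch below assumes, purely for illustration, that $\nu$ is uniform on three points in $\mathbb{R}^{2}$, $x_{0}=0$, $x_{k}=(k,0)$, and $\varepsilon_{k}=1/k^{2}$, so that $\varepsilon_{k}|x_{k}-x_{0}|^{2}=1$. The escaping atom carries vanishing mass, yet it adds asymptotically one unit to the second moment.

```python
import numpy as np

# Hypothetical choices: nu uniform on three points, x0 = 0, x_k = (k, 0), eps_k = 1/k^2.
points = np.array([[0.0, 1.0], [1.0, 0.0], [-1.0, -1.0]])
weights = np.full(3, 1.0 / 3.0)
m2_nu = weights @ np.sum(points**2, axis=1)  # second moment of nu

def second_moment_nu_k(k):
    """Second moment of nu_k = (1 - eps_k) nu + eps_k delta_{x_k}."""
    eps = 1.0 / k**2
    x_k = np.array([float(k), 0.0])
    return (1 - eps) * m2_nu + eps * (x_k @ x_k)

# The gap equals 1 - m2_nu / k^2 and tends to 1 as k grows.
gaps = [second_moment_nu_k(k) - m2_nu for k in (10, 100, 1000)]
```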

On the other hand, balls in the sliced Wasserstein space are compact with respect to narrow convergence of measures:

Proposition 2.3 (Narrow compactness of the sliced Wasserstein unit ball).

Let $\nu\in\mathscr{P}_{2}(\mathbb{R}^{d})$ be fixed. Then the closed unit ball

$$\overline{B}^{SW}(\nu,1):=\left\{\sigma\in\mathscr{P}_{2}(\mathbb{R}^{d}):\ SW(\sigma,\nu)\leq 1\right\}$$

is compact with respect to the topology of narrow convergence.

Proof.

Recall from (2.16) that

$$\int_{\mathbb{R}^{d}}|x|^{2}\,d\sigma(x)=W^{2}(\delta_{0},\sigma)=d\,SW^{2}(\delta_{0},\sigma).$$

Thus, for all $\sigma\in\overline{B}:=\overline{B}^{SW}(\nu,1)$, we have

$$\int_{\mathbb{R}^{d}}|x|^{2}\,d\sigma(x)=d\,SW^{2}(\delta_{0},\sigma)\leq 2d\left(SW^{2}(\delta_{0},\nu)+SW^{2}(\nu,\sigma)\right)\leq 2d\left(SW^{2}(\delta_{0},\nu)+1\right).$$

Hence the second moments of the probability measures in $\overline{B}$ are uniformly bounded, so $\overline{B}$ is tight, as

$$\begin{aligned}\sigma(\mathbb{R}^{d}\setminus B(0,R))&=\sigma(\{x\in\mathbb{R}^{d}:|x|\geq R\})\leq\frac{1}{R^{2}}\int_{\mathbb{R}^{d}\setminus B(0,R)}|x|^{2}\,d\sigma(x)\\ &\leq\frac{1}{R^{2}}\int_{\mathbb{R}^{d}}|x|^{2}\,d\sigma(x)\leq\frac{2d}{R^{2}}\left(SW^{2}(\delta_{0},\nu)+1\right).\end{aligned}$$

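Thus, by Prokhorov's theorem, any sequence $(\sigma^{k})_{k}$ in $\overline{B}$ has a subsequence narrowly converging to some $\sigma^{0}\in\mathscr{P}(\mathbb{R}^{d})$. Fix such a subsequence without relabeling. The limit $\sigma^{0}$ has finite second moment, since $\sigma\mapsto\int_{\mathbb{R}^{d}}|x|^{2}\,d\sigma$ is lower semicontinuous under narrow convergence. Moreover $\sigma^{0}\in\overline{B}$, as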

$$SW(\nu,\sigma^{0})\leq\liminf_{k}SW(\nu,\sigma^{k})\leq 1$$

by the lower semicontinuity of $SW$ (Proposition 2.1). ∎

Completeness follows from narrow compactness, lower semicontinuity, and the topological equivalence of $W$ and $SW$. While we believe this is known, we record the proof here, as we could not locate a statement of completeness in the literature.

Proposition 2.4 (Completeness).

$(\mathscr{P}_{2}(\mathbb{R}^{d}),SW)$ is a complete metric space.

Proof.

Suppose $(\mu^{k})_{k}$ is a Cauchy sequence with respect to $SW$. Then there is a closed unit ball $\overline{B}$ in $SW$ containing $\mu^{k}$ for all sufficiently large $k$. By Proposition 2.3, $\overline{B}$ is narrowly compact, so $(\mu^{k})_{k}$ has a subsequence narrowly converging to some $\mu^{0}\in\overline{B}\subset\mathscr{P}_{2}(\mathbb{R}^{d})$. Fix such a subsequence without relabeling. Then, by the lower semicontinuity established in Proposition 2.1,

$$SW(\mu^{k},\mu^{0})\leq\liminf_{l}SW(\mu^{k},\mu^{l})\xrightarrow{k\rightarrow\infty}0.$$

Since a Cauchy sequence with a convergent subsequence converges, the triangle inequality shows that the original sequence converges to $\mu^{0}$ in $SW$. ∎

We conclude this section by noting that the sliced Wasserstein space is not a geodesic space. Indeed, let $\mu_{0},\mu_{1}\in\mathscr{P}_{2}(\mathbb{R}^{d})$, and suppose $(\mu_{t})_{t}:[0,1]\rightarrow\mathscr{P}_{2}(\mathbb{R}^{d})$ is a constant-speed $SW$-geodesic from $\mu_{0}$ to $\mu_{1}$, i.e. $SW(\mu_{s},\mu_{t})=(t-s)SW(\mu_{0},\mu_{1})$ for any $0\leq s<t\leq 1$. Then, by Minkowski's inequality, for any $N\in\mathbb{N}$ and $0=t_{0}<t_{1}<\cdots<t_{N}=1$,

$$\left\|\sum_{i=0}^{N-1}W(\widehat{\mu}_{t_{i}}^{\cdot},\widehat{\mu}_{t_{i+1}}^{\cdot})\right\|_{L^{2}(\operatorname{Vol}_{\mathbb{S}^{d-1}})}\leq\sum_{i=0}^{N-1}\|W(\widehat{\mu}_{t_{i}}^{\cdot},\widehat{\mu}_{t_{i+1}}^{\cdot})\|_{L^{2}(\operatorname{Vol}_{\mathbb{S}^{d-1}})}=\sum_{i=0}^{N-1}SW(\mu_{t_{i}},\mu_{t_{i+1}})=SW(\mu_{0},\mu_{1}).$$

Using this and the triangle inequality $W(\widehat{\mu}_{0}^{\theta},\widehat{\mu}_{1}^{\theta})\leq\sum_{i=0}^{N-1}W(\widehat{\mu}_{t_{i}}^{\theta},\widehat{\mu}_{t_{i+1}}^{\theta})$, we obtain

$$0\leq\fint_{\mathbb{S}^{d-1}}\left(\sum_{i=0}^{N-1}W(\widehat{\mu}_{t_{i}}^{\theta},\widehat{\mu}_{t_{i+1}}^{\theta})\right)^{2}-W^{2}(\widehat{\mu}_{0}^{\theta},\widehat{\mu}_{1}^{\theta})\,d\theta\leq SW^{2}(\mu_{0},\mu_{1})-SW^{2}(\mu_{0},\mu_{1})=0.$$

As the integrand above is nonnegative, it vanishes for a.e. $\theta\in\mathbb{S}^{d-1}$. By considering, for instance, all partitions $0=t_{0}<\cdots<t_{N}=1$ with rational nodes and arguing by density, we deduce that for a.e. $\theta\in\mathbb{S}^{d-1}$ the curve $(\widehat{\mu}^{\theta}_{t})_{t\in[0,1]}$ is a $2$-Wasserstein geodesic between $\widehat{\mu}_{0}^{\theta}$ and $\widehat{\mu}_{1}^{\theta}$, and hence is the displacement interpolation given by the optimal 1D transport plan.

Thus the problem of identifying the geodesic reduces to the invertibility, under the Radon transform, of the displacement interpolant $(\widehat{\mu}_{t})_{t\in[0,1]}$. In principle, sufficient regularity of $\widehat{\mu}_{t}$ guarantees the existence of a function $\mu_{t}$ such that $R\mu_{t}=\widehat{\mu}_{t}$ for $t\in[0,1]$. However, we additionally require $R^{-1}\widehat{\mu}_{t}\geq 0$.

While the Radon transform preserves nonnegativity, the inverse Radon transform does not. For example, consider

$$g_{\varepsilon}=a\mathds{1}_{B(0,1)}-b\mathds{1}_{B(0,\varepsilon)}\quad\text{ with }0<a<b\text{ and }\varepsilon\ll 1.$$

For sufficiently small $\varepsilon>0$ we have $\widehat{g}_{\varepsilon}\geq 0$, whereas $g_{\varepsilon}<0$ near the origin. Mollifying $g_{\varepsilon}$ at a scale $\tilde{\varepsilon}\ll\varepsilon$ shows that additional regularity does not resolve the issue. In general, it is difficult to determine when the Radon inversion is nonnegative, as the inversion formula involves high-order derivatives and $R^{\ast}$. Indeed, in many cases the geodesic $(\mu_{t})_{t}$ cannot exist, as the following example shows.
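In $d=2$ the claim about $g_{\varepsilon}$ can be checked in closed form. The sketch below assumes the unnormalized line-integral convention (arc length along each line), under which a line at distance $|r|$ from the origin meets a disc of radius $\rho$ in a chord of length $2\sqrt{\rho^{2}-r^{2}}$. Since $g_{\varepsilon}$ is radial, its Radon transform does not depend on $\theta$. The parameter values and the name `radon_g_eps` are hypothetical. Expanding $a^{2}(1-r^{2})-b^{2}(\varepsilon^{2}-r^{2})$ shows that, in this configuration, $\widehat{g}_{\varepsilon}\geq 0$ exactly when $a\geq b\varepsilon$.

```python
import numpy as np

def radon_g_eps(r, a, b, eps):
    """Line integrals in R^2 of g = a 1_{B(0,1)} - b 1_{B(0,eps)} along lines at
    signed distance r from the origin (g is radial, so the direction is irrelevant)."""
    def chord(rho):
        # Length of the intersection of the line with the disc of radius rho.
        return 2 * np.sqrt(np.clip(rho**2 - r**2, 0.0, None))
    return a * chord(1.0) - b * chord(eps)

a, b, eps = 1.0, 2.0, 0.1  # 0 < a < b, eps small, and a >= b * eps
r = np.linspace(-1.5, 1.5, 3001)
values = radon_g_eps(r, a, b, eps)
g_at_origin = a - b  # the value of g_eps on B(0, eps), which is negative
```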

[Figure 1: labels $\mu_{0}=(R_{\pi/4}\mu_{0})$, $\mu_{1}$, $R_{\pi/4}\mu_{1}$, $R_{\pi/4}\mu_{1/2}$]
Figure 1. Illustration of the counterexample in Example 2.5. The measures $\mu_{0}$ and $\mu_{1}$ each consist of a pair of delta masses at opposite vertices of the square $[-1,1]^{2}$, indicated by blue and orange discs respectively. The transparent disc in the center is $R_{\pi/4}\mu_{1}$, shown on the line at angle $\theta=\pi/4$. Green discs on the dashed anti-diagonal lines represent the displacement interpolation at $t=1/2$ between $R_{\pi/4}\mu_{0}$ and $R_{\pi/4}\mu_{1}$.
Example 2.5 ($(\mathscr{P}_{2}(\mathbb{R}^{d}),SW)$ is not a geodesic space).

Consider $\mu_{0}=\frac{1}{2}(\delta_{(-1,-1)}+\delta_{(1,1)})$ and $\mu_{1}=\frac{1}{2}(\delta_{(-1,1)}+\delta_{(1,-1)})$ in $\mathbb{R}^{2}$. Suppose $(\mu_{t})_{t\in[0,1]}$ is a constant-speed geodesic from $\mu_{0}$ to $\mu_{1}$. By the discussion above, and since the quadratic cost is strictly convex, for a.e. $\theta\in\mathbb{S}^{1}$ the slice $R_{\theta}\mu_{t}$ must be the displacement interpolation between $R_{\theta}\mu_{0}$ and $R_{\theta}\mu_{1}$. Considering $t=\frac{1}{2}$ in particular, $R_{\theta}\mu_{1/2}$ should satisfy

$$R_{\theta}\mu_{1/2}=\begin{cases}\frac{1}{2}(\delta_{-1}+\delta_{1})&\text{ for }\theta=0,\\ \frac{1}{2}\left(\delta_{-\frac{1}{\sqrt{2}}}+\delta_{\frac{1}{\sqrt{2}}}\right)&\text{ for }\theta=\frac{\pi}{4},\text{ and }\\ \frac{1}{2}(\delta_{-1}+\delta_{1})&\text{ for }\theta=\frac{\pi}{2}.\end{cases}$$

However, this is impossible: the first and third lines imply that $\mu_{1/2}$ must be a convex combination of $\delta_{(-1,-1)},\delta_{(-1,1)},\delta_{(1,-1)}$, and $\delta_{(1,1)}$. These four points project onto the line at angle $\pi/4$ to $-\sqrt{2}$, $0$, $0$, and $\sqrt{2}$, so no such combination can produce the atoms at $\pm 1/\sqrt{2}$ required by the second line. By continuity of $\theta\mapsto R_{\theta}\mu_{1/2}$, similar properties hold for $R_{\theta}\mu_{1/2}$ with $\theta$ close to $0,\pi/4,\pi/2$, which shows that the geodesic cannot exist.
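The projection computation in Example 2.5 can be reproduced numerically. The Python sketch below assumes NumPy, and the helper names are our own. It uses the fact that in one dimension the optimal plan between two empirical measures with equally many, equally weighted atoms matches the atoms in sorted order. The displacement interpolation at $t=1/2$ therefore averages the sorted projections.

```python
import numpy as np

# Atoms of mu_0 and mu_1 in Example 2.5 (each atom carries mass 1/2).
MU0 = np.array([[-1.0, -1.0], [1.0, 1.0]])
MU1 = np.array([[-1.0, 1.0], [1.0, -1.0]])
# The four vertices forced by the projections at angles 0 and pi/2.
SQUARE = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])


def projected_atoms(points, theta):
    """Positions r = x . theta of the atoms of the projected measure R_theta mu."""
    return points @ theta


def midpoint_projection(angle):
    """Atoms of the t = 1/2 displacement interpolation between R_theta mu_0 and
    R_theta mu_1 for theta = (cos angle, sin angle). With equally many, equally
    weighted atoms, 1D optimal transport matches atoms in sorted order."""
    theta = np.array([np.cos(angle), np.sin(angle)])
    r0 = np.sort(projected_atoms(MU0, theta))
    r1 = np.sort(projected_atoms(MU1, theta))
    return 0.5 * (r0 + r1)


for angle in (0.0, np.pi / 4, np.pi / 2):
    print(f"angle {angle:.4f}: atoms of R_theta mu_1/2 at {np.round(midpoint_projection(angle), 4)}")

diag = np.array([1.0, 1.0]) / np.sqrt(2.0)
print("vertices project to", np.round(np.sort(projected_atoms(SQUARE, diag)), 4))
```

Here is what the sketch shows:

- At angles $0$ and $\pi/2$, the midpoint has atoms $\pm 1$.
- At angle $\pi/4$, the midpoint has atoms $\pm 1/\sqrt{2}$.
- At angle $\pi/4$, the four vertices project only to $-\sqrt{2}, 0, 0, \sqrt{2}$.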

Note that regularity is not the only issue: by convolving $\mu_{0},\mu_{1}$ with a smooth kernel of sufficiently small radius, we can argue similarly that there cannot be a geodesic.

In fact, we will see in Corollary 4.8 that $SW$ is not even a length metric on $\mathscr{P}_{2}(\mathbb{R}^{d})$; that is, in general it cannot be approximated by the sliced Wasserstein length of absolutely continuous curves. This motivates us to investigate the length (and geodesic) metric induced by $SW$ on $\mathscr{P}_{2}(\mathbb{R}^{d})$.

3. Curves in the sliced Wasserstein space and the tangential structure

In this section we study absolutely continuous curves in the SW space. In Section 3.1 we investigate the sliced Wasserstein metric derivative and prove the main result of this section, Theorem 3.9. It characterizes absolutely continuous curves by corresponding distributional solutions of the continuity equation in the flux form

$$\partial_{t}\mu_{t}+\nabla\cdot J_{t}=0.$$

Section 3.2 characterizes the tangent space $\operatorname{Tan}_{\mu}(\mathscr{P}_{2}(\mathbb{R}^{d}),SW)$. The main result regarding the tangent space is Proposition 3.12. Let $(\mu_{t},J_{t})_{t\in I}$ be a solution of the continuity equation corresponding to an absolutely continuous curve. Then $\|d\widehat{J}_{t}/d\widehat{\mu}_{t}\|_{L^{2}(\widehat{\mu}_{t})}$ attains $|\mu^{\prime}|_{SW}(t)$ if and only if $J_{t}\in\operatorname{Tan}_{\mu_{t}}(\mathscr{P}_{2}(\mathbb{R}^{d}),SW)$, up to an $\mathscr{L}^{1}\llcorner_{I}$-null set of times.

While these results are reminiscent of the analogous statements for the Wasserstein space [1, Proposition 8.4.5], we note a few differences. First, from Theorem 3.9 we know that a generic absolutely continuous curve $(\mu_{t})_{t\in I}$ admits the representation $\partial_{t}\mu_{t}+\nabla\cdot J_{t}=0$ for some $J_{t}\in\mathscr{H}(\mathbb{R}^{d};\mathbb{R}^{d})$, where

$$\mathscr{H}(\mathbb{R}^{d};\mathbb{R}^{d}):=R^{-1}\mathcal{M}_{b}(\mathbb{P}_{d};\mathbb{R}^{d}).\tag{3.1}$$

We will see in Lemma 3.4 that the above space is well defined as a subspace of $\mathcal{S}^{\prime}(\mathbb{R}^{d};\mathbb{R}^{d})$, and that

$$\mathscr{H}(\mathbb{R}^{d};\mathbb{R}^{d})\subset H_{-(d-1)/2}^{-d/2-}(\mathbb{R}^{d};\mathbb{R}^{d}):=\bigcap_{\varepsilon>0}H_{-(d-1)/2}^{-d/2-\varepsilon}(\mathbb{R}^{d};\mathbb{R}^{d})\subset\mathcal{S}^{\prime}(\mathbb{R}^{d};\mathbb{R}^{d}).$$

In general $J_{t}$ need not be a measure. In that case it does not even make sense to consider the Radon-Nikodym derivative $\frac{dJ_{t}}{d\mu_{t}}$. Thus the distributional fluxes $J$ are the main objects on which the tangential structure is based. Moreover, not all fluxes in $\operatorname{Tan}_{\mu}(\mathscr{P}_{2}(\mathbb{R}^{d}),SW)$ preserve the nonnegativity of $\mu$. Therefore the tangent vectors attainable by curves in the $SW$ space form a (convex) cone, whereas the tangent space of the 2-Wasserstein space is a vector space. Thus, we can formally view the $SW$ space as a manifold with corners; see Remark 3.13 for further details.

3.1. Absolutely continuous curves in the sliced Wasserstein space

Let $(X,m)$ be a complete metric space. We say a curve $v:(a,b)\rightarrow X$ belongs to $AC((a,b);X,m)$ if there exists $g\in L^{1}(a,b)$ such that

$$m(v(s),v(t))\leq\int_{s}^{t}g(r)\,dr\quad\forall\, a<s\leq t<b.\tag{3.2}$$

Furthermore, for any $v\in AC((a,b);X,m)$, the metric derivative

$$|v^{\prime}|_{m}(t):=\lim_{s\rightarrow t}\frac{m(v(s),v(t))}{|s-t|}\tag{3.3}$$

exists for $\mathscr{L}^{1}$-a.e. $t\in(a,b)$, and $|v^{\prime}|_{m}\in L^{1}(a,b)$. We will often write $I$ to denote an interval. Unless otherwise stated, $I$ is assumed to be open but not necessarily bounded. When there is no room for confusion regarding the interval, we simply write $(v_{t})_{t\in I}\in AC(X,m)=AC(I;X,m)$.
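As an illustration of (3.3), the following Python sketch (assuming NumPy) approximates a metric derivative by a difference quotient. The curve $t\mapsto N(t,(1+t)^{2})$ in $(\mathscr{P}_{2}(\mathbb{R}),W)$ is a hypothetical example, not taken from the text. The sketch uses the standard closed form $W^{2}(N(m_{1},s_{1}^{2}),N(m_{2},s_{2}^{2}))=(m_{1}-m_{2})^{2}+(s_{1}-s_{2})^{2}$ on the real line. Since the mean and the standard deviation both move at unit speed, the exact metric derivative is $\sqrt{2}$.

```python
import numpy as np


def w2_gaussian_1d(m1, s1, m2, s2):
    """W_2 distance between N(m1, s1^2) and N(m2, s2^2) on the real line."""
    return np.hypot(m1 - m2, s1 - s2)


def gaussian_curve(t):
    """Hypothetical curve t -> N(t, (1 + t)^2), returned as (mean, std)."""
    return t, 1.0 + t


def metric_derivative(dist, curve, t, h=1e-6):
    """Difference-quotient approximation of |v'|(t) = lim_{s -> t} dist(v(s), v(t)) / |s - t|."""
    return dist(*curve(t + h), *curve(t)) / h


print(metric_derivative(w2_gaussian_1d, gaussian_curve, 0.5))  # close to sqrt(2)
```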

Prior to studying the length associated to $SW$, let us first investigate the metric derivative $|\mu^{\prime}|_{SW}$. As $SW(\mu,\nu)\leq W(\mu,\nu)$ for any $\mu,\nu\in\mathscr{P}_{2}(\mathbb{R}^{d})$, it immediately follows that

$$|\mu^{\prime}|_{SW}(t)\leq|\mu^{\prime}|_{W}(t)$$

for all $t$ at which both metric derivatives are well defined.

Recall [1, Theorem 8.3.1] that each absolutely continuous curve $(\mu_{t})_{t\in I}$ in the Wasserstein space has a corresponding distributional solution $(\mu_{t},v_{t}\mu_{t})_{t\in I}$ of the continuity equation such that

$$|\mu^{\prime}|_{W}(t)=\|v_{t}\|_{L^{2}(\mu_{t})}\quad\text{for a.e. }t\in I.$$

We want to establish an analogous result for the sliced Wasserstein space. Suppose $|\widehat{\mu}^{\prime}(\theta,\cdot)|_{W}(t)$ is well defined at all $t\in I$. Then

$$\frac{SW^{2}(\mu_{t},\mu_{t+h})}{h^{2}}=\fint_{\mathbb{S}^{d-1}}\frac{W^{2}(\widehat{\mu}_{t}^{\theta},\widehat{\mu}_{t+h}^{\theta})}{h^{2}}\,d\theta\xrightarrow{h\rightarrow 0}\fint_{\mathbb{S}^{d-1}}|\widehat{\mu}^{\prime}(\theta,\cdot)|_{W}^{2}\,d\theta.$$

Assume that $(\mu_{t},v_{t})_{t\in I}$ satisfies the continuity equation

$$\partial_{t}\mu_{t}+\nabla\cdot(v_{t}\mu_{t})=0.$$

From direct calculations one can readily verify (see Proposition A.6 for the proof) that

$$R(\partial_{i}f)(\theta,r)=\theta_{i}\,\partial_{r}Rf(\theta,r)\quad\text{for }f\in\mathcal{S}(\mathbb{R}^{d}).\tag{3.4}$$
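Identity (3.4) can be sanity-checked numerically in $d=2$. The sketch below is in Python with NumPy; the anisotropic Gaussian $f$ and all helper names are hypothetical choices, not from the text. It computes $Rf(\theta,r)$ as the integral of $f$ over the line $\{x\cdot\theta=r\}$. It then compares $R(\partial_{i}f)$ with $\theta_{i}\partial_{r}Rf$, using a central difference in $r$ for the latter. The identity holds because the derivative of $f$ along the line integrates to zero, leaving only the component of $e_{i}$ along $\theta$. In the code the index `i` is 0-based.

```python
import numpy as np


def f(x1, x2):
    """Hypothetical smooth, rapidly decaying test function on R^2."""
    return np.exp(-0.5 * x1**2 - x2**2 + 0.3 * x1 * x2)


def grad_f(x1, x2):
    """Exact partial derivatives (df/dx1, df/dx2)."""
    val = f(x1, x2)
    return (-x1 + 0.3 * x2) * val, (-2.0 * x2 + 0.3 * x1) * val


def radon(g, theta, r, s_max=12.0, n=6001):
    """R g(theta, r): integral of g over the line {x . theta = r}, parametrized as
    x = r * theta + s * theta_perp. The integrand decays fast, so a Riemann sum
    on [-s_max, s_max] is very accurate."""
    s = np.linspace(-s_max, s_max, n)
    x1 = r * theta[0] - s * theta[1]
    x2 = r * theta[1] + s * theta[0]
    return np.sum(g(x1, x2)) * (s[1] - s[0])


def both_sides(i, angle, r, h=1e-5):
    """Return (R(d_i f)(theta, r), theta_i * d/dr Rf(theta, r)) for i = 0 or 1."""
    theta = np.array([np.cos(angle), np.sin(angle)])
    lhs = radon(lambda a, b: grad_f(a, b)[i], theta, r)
    d_dr = (radon(f, theta, r + h) - radon(f, theta, r - h)) / (2.0 * h)
    return lhs, theta[i] * d_dr


for i in (0, 1):
    print(i, both_sides(i, angle=0.7, r=0.3))  # the two numbers should agree
```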

Formally applying the Radon transform to the continuity equation and using (3.4), we obtain

$$\partial_{t}\widehat{\mu}_{t}+\sum_{i=1}^{d}R[\partial_{i}(v_{t}^{i}\mu_{t})]=\partial_{t}\widehat{\mu}_{t}+\sum_{i=1}^{d}\theta_{i}\,\partial_{r}R(v_{t}^{i}\mu_{t})=0.$$

Rewriting this in the velocity formulation gives

$$\partial_{t}\widehat{\mu}_{t}+\partial_{r}\left(\left(\theta\cdot\frac{d\widehat{v_{t}\mu_{t}}^{\theta}}{d\widehat{\mu}_{t}^{\theta}}\right)\widehat{\mu}_{t}^{\theta}\right)=0.$$

Thus, by applying [1, Theorem 8.3.1] for each $\theta\in\mathbb{S}^{d-1}$, we formally deduce that

$$|\mu^{\prime}|_{SW}^{2}(t)=\fint_{\mathbb{S}^{d-1}}|\widehat{\mu}^{\prime}(\theta,\cdot)|_{W}^{2}(t)\,d\theta\leq\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}\left|\theta\cdot\frac{d\widehat{v_{t}\mu_{t}}^{\theta}}{d\widehat{\mu}_{t}^{\theta}}\right|^{2}d\widehat{\mu}_{t}^{\theta}\,d\theta=\left\|\theta\left(\theta\cdot\frac{d\widehat{v_{t}\mu_{t}}}{d\widehat{\mu}_{t}}\right)\right\|_{L^{2}(\widehat{\mu}_{t};\mathbb{R}^{d})}^{2}.\tag{3.5}$$

Observe that $\theta(\theta\cdot d\widehat{v_{t}\mu_{t}}/d\widehat{\mu}_{t})$ is even on $\mathbb{S}^{d-1}\times\mathbb{R}$, hence is a function on $\mathbb{P}_{d}$. For simplicity we will often write $\|\theta\cdot d\widehat{v_{t}\mu_{t}}/d\widehat{\mu}_{t}\|_{L^{2}(\widehat{\mu}_{t})}$ instead. Furthermore, note that the existence of a velocity that saturates the inequality is nontrivial: for a.e. $\theta\in\mathbb{S}^{d-1}$ the projection of the velocity must saturate the corresponding 1D inequality.

Before we investigate the metric derivative in detail, let us first consider examples that compare $|\mu^{\prime}|_{SW}$ and $|\mu^{\prime}|_{W}$. Example 3.1 demonstrates that they are proportional, with ratio $1/\sqrt{d}$, for paths of discrete measures. Example 3.2 shows that the ratio $|\mu^{\prime}|_{SW}/|\mu^{\prime}|_{W}$ is in general not bounded from below.

Example 3.1 (Discrete measures).

Let $\mu_{t}:=\frac{1}{n}\sum_{i=1}^{n}\delta_{x_{i}(t)}$, where $x_{i}:I\rightarrow\mathbb{R}^{d}$ is continuously differentiable for each $i=1,\dots,n$. Then

$$|\mu^{\prime}|_{SW}(t)=\frac{1}{\sqrt{d}}|\mu^{\prime}|_{W}(t)\quad\text{for all }t\in I.$$

Indeed,

$$|\mu^{\prime}|^{2}_{SW}(t)=\fint_{\mathbb{S}^{d-1}}|\widehat{\mu}^{\prime}(\theta,\cdot)|^{2}_{W}(t)\,d\theta=\frac{1}{n}\sum_{i=1}^{n}\fint_{\mathbb{S}^{d-1}}|\theta\cdot x_{i}^{\prime}(t)|^{2}\,d\theta=\frac{1}{dn}\sum_{i=1}^{n}|x_{i}^{\prime}(t)|^{2}=\frac{1}{d}|\mu^{\prime}|_{W}^{2}(t),$$

where the third equality uses $\fint_{\mathbb{S}^{d-1}}\theta_{i}\theta_{j}\,d\theta=\delta_{ij}/d$.
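The middle expression in Example 3.1 can be evaluated by Monte Carlo over directions, which makes the factor $1/d$ visible. Below is a minimal Python sketch, assuming NumPy; the function names are ours. Normalized standard Gaussian vectors are uniformly distributed on $\mathbb{S}^{d-1}$.

```python
import numpy as np


def sw_speed_sq(velocities, n_dirs=200_000, seed=0):
    """Monte Carlo estimate of (1/n) sum_i avg_{theta in S^{d-1}} (theta . x_i'(t))^2,
    the middle expression in Example 3.1; `velocities` has shape (n, d)."""
    v = np.asarray(velocities, dtype=float)
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal((n_dirs, v.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # uniform directions on S^{d-1}
    return float(np.mean((theta @ v.T) ** 2))  # average over directions and atoms


def w_speed_sq(velocities):
    """(1/n) sum_i |x_i'(t)|^2, i.e. |mu'|_W^2(t) in Example 3.1."""
    v = np.asarray(velocities, dtype=float)
    return float(np.mean(np.sum(v**2, axis=1)))


vel = np.array([[1.0, 2.0, 0.0], [0.0, -1.0, 3.0], [0.5, 0.5, 0.5], [2.0, 0.0, -1.0]])
print(sw_speed_sq(vel) / w_speed_sq(vel))  # approximately 1/d = 1/3
```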
Example 3.2 (Two sliding lines).

Consider two parallel line segments close to each other moving in opposite directions. A significant portion of the shearing velocity cancels out after projection. This causes a significant gap between the metric derivatives with respect to $W$ and $SW$.

More precisely, let $\delta>0$ and let $\nu\in\mathscr{P}(\mathbb{R}^{2})$ be the uniform distribution on the segment with endpoints $(\pm 1,\delta)$. Similarly, let $\sigma$ be the analogous measure slightly below, on the segment with endpoints $(\pm 1,-\delta)$. Then define $\nu_{t}=\nu(\cdot-(t,0))$ and $\sigma_{t}=\sigma(\cdot+(t,0))$; that is, $\nu$ is translated to the right and $\sigma$ to the left.

Defining $\mu_{t}=\frac{1}{2}(\nu_{t}+\sigma_{t})$, one can check by direct calculations that

$$|\mu^{\prime}|_{SW}(t)\lesssim\sqrt{t+\delta}\ll 1=|\mu^{\prime}|_{W}(t)\quad\text{for }0\leq t\ll 1\text{ and }\delta\ll 1.$$
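A rough numerical check of Example 3.2 at $t=0$ is sketched below in Python, assuming NumPy. The discretization is our own choice:

- each segment is replaced by equally spaced atoms;
- the average over $\mathbb{S}^{1}$ is approximated by a midpoint rule in the angle;
- each 1D distance is computed by sorting.

By our own calculation (not from the text), on the line at angle $\phi$ the projected mixture changes only on two end intervals of width about $2\delta|\sin\phi|$. There, mass moves with speed $|\cos\phi|$. This gives $|\mu^{\prime}|_{SW}^{2}(0)\approx\delta/\pi$ for small $\delta$. The sketch returns a value close to $\sqrt{\delta/\pi}$, far below $|\mu^{\prime}|_{W}(0)=1$.

```python
import numpy as np


def sliding_lines(t, delta, n):
    """Atoms of mu_t = (nu_t + sigma_t) / 2 from Example 3.2, with each segment
    replaced by n equally spaced atoms (midpoints of a uniform grid on [-1, 1])."""
    x = -1.0 + (2.0 * np.arange(n) + 1.0) / n
    upper = np.column_stack([x + t, np.full(n, delta)])   # nu_t, moving right
    lower = np.column_stack([x - t, np.full(n, -delta)])  # sigma_t, moving left
    return np.vstack([upper, lower])  # 2n atoms, each of mass 1 / (2n)


def sw2_plane(a, b, n_angles=90):
    """SW^2 between equal-size, equally weighted point clouds in R^2.
    theta and -theta give the same 1D distance, so a midpoint rule on [0, pi)
    suffices; 1D optimal transport matches sorted projections."""
    phi = (np.arange(n_angles) + 0.5) * np.pi / n_angles
    dirs = np.column_stack([np.cos(phi), np.sin(phi)])
    pa = np.sort(a @ dirs.T, axis=0)
    pb = np.sort(b @ dirs.T, axis=0)
    return float(np.mean((pa - pb) ** 2))


def sw_speed_at_zero(delta, n=10_000, steps=10):
    """Difference quotient SW(mu_0, mu_h) / h. Taking h as an integer multiple of
    the grid spacing 2 / n keeps the shifted atoms on the same lattice, so the
    discretization itself contributes no spurious motion in the interior."""
    h = steps * 2.0 / n
    return np.sqrt(sw2_plane(sliding_lines(0.0, delta, n), sliding_lines(h, delta, n))) / h


print(sw_speed_at_zero(0.02), np.sqrt(0.02 / np.pi))  # SW speed vs. the estimate sqrt(delta/pi)
```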

We begin the rigorous study of the metric derivative $|\mu^{\prime}|_{SW}$ by considering a Benamou-Brenier functional for the sliced Wasserstein distance. Consider the 1D Benamou-Brenier functional for $\mathbb{R}^{d}$-valued fluxes, namely $\mathscr{B}_{2}:\mathcal{M}(\mathbb{R})\times\mathcal{M}(\mathbb{R};\mathbb{R}^{d})\rightarrow[0,+\infty]$ defined by

$$\begin{aligned}\mathscr{B}_{2}(\mu,E):=&\sup\left\{\int_{\mathbb{R}}a(x)\,d\mu(x)+\int_{\mathbb{R}}b(x)\cdot dE(x):\ (a,b)\in C_{b}(\mathbb{R};K_{2})\right\},\\ &\text{where }K_{2}:=\left\{(a,b)\in\mathbb{R}\times\mathbb{R}^{d}:\ a+\tfrac{1}{2}|b|^{2}\leq 0\right\}.\end{aligned}\tag{3.6}$$
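For atomic measures the supremum in (3.6) can be evaluated in closed form, which may help in reading Proposition 3.3 (iii) and (iv). The Python sketch below assumes NumPy; `b2_atomic` is a hypothetical helper, not from the text. Bounded continuous pairs $(a,b)$ can take any prescribed values in $K_{2}$ at finitely many distinct points. Hence the supremum splits into pointwise maximizations of $a\,m_{i}+b\cdot e_{i}$ over $K_{2}$. The result $\sum_{i}|e_{i}|^{2}/(2m_{i})$ agrees with $\frac{1}{2}\int|dE/d\mu|^{2}\,d\mu$.

```python
import numpy as np


def b2_atomic(masses, fluxes):
    """B_2(mu, E) from (3.6) for mu = sum_i m_i delta_{x_i} and E = sum_i e_i delta_{x_i}
    with distinct points x_i; `fluxes` has shape (n, d)."""
    m = np.asarray(masses, dtype=float)
    e = np.asarray(fluxes, dtype=float).reshape(len(m), -1)
    sq = np.sum(e**2, axis=1)
    if np.any(m < 0) or np.any((m == 0) & (sq > 0)):
        # m_i < 0: let a -> -infinity.  m_i = 0, e_i != 0: take b = s * e_i with s -> infinity.
        return np.inf
    pos = m > 0
    # For m_i > 0 the optimum is b = e_i / m_i, a = -|b|^2 / 2, giving |e_i|^2 / (2 m_i).
    return float(np.sum(sq[pos] / (2.0 * m[pos])))


print(b2_atomic([0.5, 0.5], [[1.0, 0.0], [0.0, 2.0]]))  # 5.0
```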

$\mathscr{B}_{2}$ enjoys several desirable properties, such as joint convexity in its arguments and lower semicontinuity with respect to narrow convergence. See [54, Section 5.3.1] for a more complete list of properties (note that they refer to narrow convergence as weak convergence). By (3.5) we expect

$$|\mu^{\prime}|_{SW}^{2}(t)=\fint_{\mathbb{S}^{d-1}}|(\widehat{\mu}^{\theta})^{\prime}|_{W}^{2}(t)\,d\theta\leq\fint_{\mathbb{S}^{d-1}}\mathscr{B}_{2}(\widehat{\mu}_{t}^{\theta},\theta\cdot\widehat{v_{t}\mu_{t}}^{\theta})\,d\theta.$$

Thus it is natural to define the Benamou-Brenier functional $B_{SW}:\mathscr{P}_{2}(\mathbb{R}^{d})\times\mathscr{H}(\mathbb{R}^{d};\mathbb{R}^{d})\rightarrow(-\infty,+\infty]$ for the sliced Wasserstein distance by

$$B_{SW}(\mu,E)=\fint_{\mathbb{S}^{d-1}}\mathscr{B}_{2}(\widehat{\mu}^{\theta},\theta\cdot\widehat{E}^{\theta})\,d\theta=\left\|\theta\cdot\frac{d\widehat{E}}{d\widehat{\mu}}\right\|_{L^{2}(\widehat{\mu})}^{2}.\tag{3.7}$$

Note that $B_{SW}$ depends only on the Radon transforms $\widehat{\mu},\widehat{E}$ of its inputs. By definition (3.1) of $\mathscr{H}(\mathbb{R}^{d};\mathbb{R}^{d})$, we have $\widehat{E}\in\mathcal{M}_{b}(\mathbb{P}_{d};\mathbb{R}^{d})$. Hence its disintegration $\widehat{E}^{\theta}\in\mathcal{M}(\mathbb{R};\mathbb{R}^{d})$ with respect to $\operatorname{Vol}_{\mathbb{S}^{d-1}}$ is well defined for a.e. $\theta\in\mathbb{S}^{d-1}$, and so is the integral in (3.7). In general

$$E\in\mathscr{H}(\mathbb{R}^{d};\mathbb{R}^{d})\subset\mathcal{S}^{\prime}(\mathbb{R}^{d};\mathbb{R}^{d}),$$

and $E$ is not necessarily a (vector-valued) measure. However, the Radon transform is well defined for such $E$, unlike for general tempered distributions; see Remark A.1 for further details. We will see later in Lemma 3.4 that the fluxes of interest lie in $\mathscr{H}(\mathbb{R}^{d};\mathbb{R}^{d})$.

We record some basic properties that $B_{SW}$ inherits from the Benamou-Brenier functional $\mathscr{B}_{2}$.

Proposition 3.3 (Properties of $B_{SW}$).

The functional $B_{SW}$ is convex, and satisfies the following properties:

  • (i)

    Let $\mu,\mu_{n}\in\mathscr{P}_{2}(\mathbb{R}^{d})$ and $E,E_{n}\in\mathscr{H}(\mathbb{R}^{d};\mathbb{R}^{d})$ be such that $\widehat{E},\widehat{E}_{n}\in\mathcal{M}(\mathbb{P}_{d};\mathbb{R}^{d})$. Suppose further that $(\widehat{\mu}_{n},\widehat{E}_{n})$ converges narrowly to $(\widehat{\mu},\widehat{E})$ in $\mathcal{M}(\mathbb{P}_{d})\times\mathcal{M}(\mathbb{P}_{d};\mathbb{R}^{d})$. Then

    $$B_{SW}(\mu,E)\leq\liminf_{n\to\infty}B_{SW}(\mu_{n},E_{n}).\tag{3.8}$$
  • (ii)

    $B_{SW}(\mu,E)\geq 0$.

  • (iii)

    $B_{SW}(\mu,E)<\infty$ only if $R_{\theta}\mu\geq 0$ and $R_{\theta}E\ll R_{\theta}\mu$ for a.e. $\theta\in\mathbb{S}^{d-1}$.

  • (iv)

    Let $\mu\geq 0$ and suppose $R_{\theta}E\ll R_{\theta}\mu$ for a.e. $\theta\in\mathbb{S}^{d-1}$. Then we can write

    $$B_{SW}(\mu,E)=\frac{1}{2}\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}\left|\theta\cdot\frac{d\widehat{E}}{d\widehat{\mu}}(r,\theta)\right|^{2}d\widehat{\mu}^{\theta}(r)\,d\theta.$$
Proof.

Items (i)-(iv) follow directly from the analogous properties of $\mathscr{B}_{2}$ (see [54, Proposition 5.18], and also [1, Lemma 8.1.10] for item (iv)). ∎

The following lemma relates attenuated Sobolev norms and $B_{SW}$, and shows that the domain of interest is a subset of $\mathscr{H}(\mathbb{R}^{d};\mathbb{R}^{d})\subset\mathcal{S}^{\prime}(\mathbb{R}^{d};\mathbb{R}^{d})$. Given $v,\theta\in\mathbb{R}^{d}$, we henceforth write $v\parallel\theta$ to say that $v$ and $\theta$ are parallel.

Lemma 3.4 ($B_{SW}$ and $H^{-d/2-}_{-(d-1)/2}$).

Let $\mathfrak{J}\in\mathcal{M}(\mathbb{P}_{d};\mathbb{R}^{d})$ be such that $|\mathfrak{J}|$ is a finite measure. Then $\mathfrak{J}\in\bigcap_{\varepsilon>0}H^{-1/2-\varepsilon}(\mathbb{P}_{d})$. Moreover, there exists $J\in\mathscr{H}(\mathbb{R}^{d};\mathbb{R}^{d})\subset H^{-d/2-}_{-(d-1)/2}(\mathbb{R}^{d};\mathbb{R}^{d})\subset\mathcal{S}^{\prime}(\mathbb{R}^{d};\mathbb{R}^{d})$ such that $RJ=\mathfrak{J}$ and, for all $\varepsilon>0$,

$$\|J\|_{H^{-\frac{d}{2}-\varepsilon}_{-(d-1)/2}(\mathbb{R}^{d})}=\|\mathfrak{J}\|_{H^{-\frac{1}{2}-\varepsilon}(\mathbb{P}_{d})}.\tag{3.9}$$

If instead $\mathfrak{J}\in\mathcal{M}(\mathbb{P}_{d};\mathbb{R}^{d})$ and $\mu\in\mathscr{P}_{2}(\mathbb{R}^{d})$ satisfy

  • (i)

    ($\mathscr{B}_{2}^{\mathbb{R}}$-upper bound) $\fint_{\mathbb{S}^{d-1}}\mathscr{B}_{2}(\widehat{\mu}^{\theta},\theta\cdot\mathfrak{J}^{\theta})\,d\theta<\infty$,

  • (ii)

    (parallel to $\theta$) $\frac{d\mathfrak{J}}{d\widehat{\mu}}(\theta,r)\parallel\theta$ for $\widehat{\mu}$-a.e. $(\theta,r)$,

then $|\mathfrak{J}|$ is a finite measure. Thus there exists $J\in\mathscr{H}(\mathbb{R}^{d};\mathbb{R}^{d})\subset\mathcal{S}^{\prime}(\mathbb{R}^{d};\mathbb{R}^{d})$ such that $RJ=\mathfrak{J}$. Moreover, for each $\varepsilon>0$ there exists $C(\varepsilon)>0$ such that

$$\|J\|_{H^{-\frac{d}{2}-\varepsilon}_{-(d-1)/2}(\mathbb{R}^{d})}=\|\mathfrak{J}\|_{H^{-\frac{1}{2}-\varepsilon}(\mathbb{P}_{d})}=\|\theta\cdot\mathfrak{J}\|_{H^{-\frac{1}{2}-\varepsilon}(\mathbb{P}_{d})}\leq C(\varepsilon)B_{SW}^{1/2}(\mu,J).\tag{3.10}$$
Remark 3.5.

As we see in the proof, assumption (i) along with Proposition 3.3 (iii) implies that $\frac{d\mathfrak{J}}{d\widehat{\mu}}\in L^{2}(\widehat{\mu};\mathbb{R}^{d})$. In Theorem 3.9 we will see that the flux $J_{t}$ satisfying $B_{SW}(\mu_{t},J_{t})\leq|\mu^{\prime}|_{SW}^{2}(t)$ also satisfies $\frac{d\widehat{J}_{t}}{d\widehat{\mu}_{t}}(\theta,r)\parallel\theta$ for $\widehat{\mu}_{t}$-a.e. $(\theta,r)$. Thus condition (ii) is automatically satisfied. ∎

Proof.

Let 𝔍(d;d)\mathfrak{J}\in\mathcal{M}(\mathbb{P}_{d};\mathbb{R}^{d}) and first consider the case |𝔍||\mathfrak{J}| is a finite measure. Then for each test function φCc(d)\varphi\in C_{c}^{\infty}(\mathbb{P}_{d}) and for any ε>0\varepsilon>0,

dφ𝑑𝔍φLdd|𝔍|C(ε)φH12+ε(d)dd|𝔍|\displaystyle\int_{\mathbb{P}_{d}}\varphi\,d\mathfrak{J}\leq\|\varphi\|_{L^{\infty}}\int_{\mathbb{P}_{d}}\,d|\mathfrak{J}|\leq C(\varepsilon)\|\varphi\|_{H^{\frac{1}{2}+\varepsilon}(\mathbb{P}_{d})}\int_{\mathbb{P}_{d}}\,d|\mathfrak{J}|

where we have used the Sobolev embedding theorem in one dimension in the last line. Thus 𝔍H1/2(d)\mathfrak{J}\in H^{-1/2-}(\mathbb{P}_{d}) and (2.9) implies the existence of JεH(d1)/2d/2ε(d)𝒮(d)J^{\varepsilon}\in H^{-d/2-\varepsilon}_{-(d-1)/2}(\mathbb{R}^{d})\in\mathcal{S}^{\prime}(\mathbb{R}^{d}) such that

RJε=𝔍 and JεH(d1)/2d/2ε(d)=𝔍H1/2ε(d).\normalcolor RJ^{\varepsilon}=\mathfrak{J}\normalcolor\text{ and }\|J^{\varepsilon}\|_{H^{-d/2-\varepsilon}_{-(d-1)/2}(\mathbb{R}^{d})}\normalcolor=\|\mathfrak{J}\|_{H^{-1/2-\varepsilon}(\mathbb{P}_{d})}\normalcolor.

To conclude (3.9) it suffices to note that $J^{\varepsilon}\in\mathcal{S}^{\prime}(\mathbb{R}^{d};\mathbb{R}^{d})$ is independent of the choice of $\varepsilon>0$. Indeed, suppose $\varepsilon>\tilde{\varepsilon}>0$; then (2.13) implies that for all $f\in\mathcal{S}(\mathbb{R}^{d})$

$$\langle J^{\varepsilon}-J^{\tilde{\varepsilon}},f\rangle=c_{d}^{-1}\langle J^{\varepsilon}-J^{\tilde{\varepsilon}},R^{\ast}\Lambda_{d}Rf\rangle=c_{d}^{-1}\langle\widehat{J}^{\varepsilon}-\widehat{J}^{\tilde{\varepsilon}},\Lambda_{d}Rf\rangle=c_{d}^{-1}\langle\mathfrak{J}-\mathfrak{J},\Lambda_{d}\widehat{f}\rangle=0.$$

On the other hand, suppose (i) and (ii) hold. By (ii), $\mathscr{B}_{2}(\widehat{\mu}^{\theta},\theta\cdot\mathfrak{J})=\mathscr{B}_{2}(\widehat{\mu}^{\theta},|\mathfrak{J}|)$, which by (i) is finite for a.e. $\theta\in\mathbb{S}^{d-1}$. Thus, by the analogue of Proposition 3.3 (iii) for $\mathscr{B}_{2}$, we may write $\mathfrak{J}=\mathfrak{u}\,\widehat{\mu}$ for some $\mathfrak{u}:\mathbb{P}_{d}\rightarrow\mathbb{R}^{d}$ satisfying $\mathfrak{u}\parallel\theta$ and $\|\mathfrak{u}\|_{L^{2}(\widehat{\mu})}=\|\theta\cdot\mathfrak{u}\|_{L^{2}(\widehat{\mu})}<\infty$. By Jensen's inequality,

$$\int_{\mathbb{P}_{d}}d|\mathfrak{J}|=\int_{\mathbb{P}_{d}}|\mathfrak{u}|\,d\widehat{\mu}\leq\left(\int_{\mathbb{P}_{d}}|\mathfrak{u}|^{2}\,d\widehat{\mu}\right)^{1/2}\lesssim\left(\fint_{\mathbb{S}^{d-1}}\mathscr{B}_{2}(\widehat{\mu}^{\theta},\theta\cdot\mathfrak{J}^{\theta})\,d\theta\right)^{1/2}<\infty.$$

Thus, by the previous part, we can find $J\in H^{-d/2-\varepsilon}_{-(d-1)/2}(\mathbb{R}^{d})$ with $RJ=\mathfrak{J}$. Finally, (3.10) follows directly from property (ii) and (3.9). ∎

We characterize absolutely continuous curves in the sliced Wasserstein space by identifying them with solutions in $\mathcal{CE}_{I}$ of the continuity equation, defined as follows.

Definition 3.6.

Let $I\subset\mathbb{R}$ be an open interval. We denote by $\mathcal{CE}_{I}$ the set of all pairs $(\mu_{t},J_{t})_{t\in I}$ satisfying the following conditions:

  • (i)

    The curve $(\mu_{t})_{t\in I}$ is narrowly continuous in $\mathscr{P}_{2}(\mathbb{R}^{d})$ with respect to $t\in I$;

  • (ii)

    $J=R^{-1}\mathfrak{J}$ for some vector-valued Borel measure $\mathfrak{J}\in\mathcal{M}(I\times\mathbb{P}_{d};\mathbb{R}^{d})$, where the inverse Radon transform is applied in the $\mathbb{P}_{d}$-variable, and

    $$\int_{\tilde{I}}\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}d|\mathfrak{J}(r,\theta,t)|=\int_{\tilde{I}}\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}d|\mathfrak{J}_{t}(r,\theta)|\,dt<\infty\quad\text{for all }\tilde{I}\subset\subset I,\tag{3.11}$$

    where $\mathfrak{J}\llcorner_{\tilde{I}\times\mathbb{P}_{d}}$ admits the disintegration $\mathfrak{J}=\int_{\tilde{I}}\mathfrak{J}_{t}\,d(\mathscr{L}^{1}\llcorner_{\tilde{I}})(t)$;

  • (iii)

    $(\mu_{t},J_{t})_{t\in I}$ is a distributional solution of the continuity equation, i.e.

    $$\int_{I}\langle\mu_{t},\partial_{t}\varphi(t,\cdot)\rangle_{\mathbb{R}^{d}}+\langle J_{t},\nabla\varphi(t,\cdot)\rangle_{\mathbb{R}^{d}}\,dt=0\quad\text{for all }\varphi\in C_{c}^{\infty}(I\times\mathbb{R}^{d}).$$

Moreover, we write $\mathcal{CE}_{I}(\bar{\mu}_{0},\bar{\mu}_{1})$ for the set of pairs $(\mu,J)\in\mathcal{CE}_{I}$ satisfying $\mu_{a}=\bar{\mu}_{0}$ and $\mu_{b}=\bar{\mu}_{1}$, where $a=\inf I$ and $b=\sup I$.

Remark 3.7.

Let $\mathfrak{J}\in\mathcal{M}(I\times\mathbb{P}_{d};\mathbb{R}^{d})$ satisfy (3.11), and fix $\tilde{I}\subset\subset I$. The disintegration theorem with respect to $\mathscr{L}^{1}\llcorner_{\tilde{I}}$ implies that $\mathfrak{J}_{t}\in\mathcal{M}(\mathbb{P}_{d};\mathbb{R}^{d})$ is well defined and bounded for $\mathscr{L}^{1}$-a.e. $t\in\tilde{I}$. By considering an increasing sequence of intervals $\tilde{I}_{n}\subset\subset I$ such that $\bigcup_{n}\tilde{I}_{n}=I$, we may define $\mathfrak{J}_{t}\in\mathcal{M}(\mathbb{P}_{d};\mathbb{R}^{d})$ for $\mathscr{L}^{1}$-a.e. $t\in I$. Then Lemma 3.4 allows us to define the Radon inverse $J_{t}=R^{-1}\mathfrak{J}_{t}\in\mathscr{H}(\mathbb{R}^{d};\mathbb{R}^{d})$ for a.e. $t\in I$.

Furthermore, condition (ii) is not restrictive whenever $\mathfrak{J}_{t}\parallel\theta$, as

$$\begin{aligned}\int_{\tilde{I}}\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}|RJ_{t}|(d\theta,dr)\,dt&\leq\int_{\tilde{I}}\left\|\frac{d\widehat{J}_{t}}{d\widehat{\mu}_{t}}\right\|_{L^{1}(\widehat{\mu}_{t})}dt\lesssim\int_{\tilde{I}}\left\|\frac{d\widehat{J}_{t}}{d\widehat{\mu}_{t}}\right\|_{L^{2}(\widehat{\mu}_{t})}dt\\&\leq\int_{\tilde{I}}B_{SW}^{1/2}(\mu_{t},J_{t})\,dt\leq\sqrt{|\tilde{I}|}\left(\int_{I}B_{SW}(\mu_{t},J_{t})\,dt\right)^{1/2},\end{aligned}$$

and we are only interested in absolutely continuous curves, for which we will see that the right-hand side is finite. ∎

We state a useful technical lemma, the proof of which we delay to Appendix B.

Lemma 3.8.

Let $I\subset\mathbb{R}$ be an open interval and let $(\mu_{t},J_{t})_{t\in I}\in\mathcal{CE}_{I}$ be as in Definition 3.6. Then for a.e. $\theta\in\mathbb{S}^{d-1}$, $t\mapsto(\widehat{\mu}_{t}^{\theta},\widehat{J}_{t}^{\theta})$ is a distributional solution of the continuity equation on $I\times\mathbb{R}$, i.e.

$$\int_{I}\langle\widehat{\mu}_{t}^{\theta},\partial_{t}\psi(t,\cdot)\rangle_{\mathbb{R}}+\langle\theta\cdot\widehat{J}_{t}^{\theta},\partial_{r}\psi(t,\cdot)\rangle_{\mathbb{R}}\,dt=0\quad\text{for all }\psi\in C_{c}^{\infty}(I\times\mathbb{R}).\tag{3.12}$$

If $J=\int_{I}J_{t}\,dt\in\mathcal{M}_{b}(I\times\mathbb{R}^{d})$, then by approximation $t\mapsto(\mu_{t},J_{t})$ solves the continuity equation against test functions in $C_{b}^{1}(I\times\mathbb{R}^{d})$. Thus for each $\theta\in\mathbb{S}^{d-1}$, we may take test functions of the form $\varphi(t,x)=\psi(t,x\cdot\theta)$ for any $\psi\in C_{c}^{\infty}(I\times\mathbb{R})$ and deduce that (3.12) holds. However, a general $(\mu_{t},J_{t})_{t\in I}\in\mathcal{CE}_{I}$ enjoys less regularity. We therefore rely instead on an approximation argument involving fine properties of the Radon transform. As the proof relies on technical tools that are not used elsewhere in this paper, we defer it to the appendix.

We are now ready to establish the main theorem of this section.

Theorem 3.9 (AC curves in the sliced Wasserstein metric space).

Let $B_{SW}$ be as defined in (3.7), and let $I\subset\mathbb{R}$ be an open interval.

  • (i)

    Suppose $(\mu_{t},J_{t})_{t\in I}\in\mathcal{CE}_{I}$ satisfies $t\mapsto B_{SW}(\mu_{t},J_{t})\in L^{1}(I)$. Then, for $\mathscr{L}^{1}$-a.e. $t\in I$,

    $$|\mu^{\prime}|_{SW}^{2}(t)\leq B_{SW}(\mu_{t},J_{t}),\tag{3.13}$$

    and $(\mu_{t})_{t\in I}\in AC(I;\mathscr{P}_{2}(\mathbb{R}^{d}),SW)$.

  • (ii)

    Conversely, let $(\mu_{t})_{t\in I}\in AC(I;\mathscr{P}_{2}(\mathbb{R}^{d}),SW)$. Then there exists

    $$J_{t}\in\overline{\left\{\Lambda_{d}R^{\ast}(\widehat{\mu}_{t}\Lambda_{d}(\widehat{\nabla\varphi})):\ \varphi\in C_{c}^{\infty}(\mathbb{R}^{d})\right\}}^{B_{SW}(\mu,\cdot)}\subset\mathscr{H}(\mathbb{R}^{d};\mathbb{R}^{d})\quad\text{for a.e. }t\in I\tag{3.14}$$

    such that $(\mu_{t},J_{t})_{t\in I}\in\mathcal{CE}_{I}$ and

    $$B_{SW}(\mu_{t},J_{t})\leq|\mu^{\prime}|_{SW}^{2}(t)\quad\text{for a.e. }t\in I.\tag{3.15}$$
Proof.

We adapt the proof of the analogous result by Ambrosio, Gigli, and Savaré [1, Theorem 8.3.1]; we leave the structure parallel to their proof, so readers can compare the objects arising in the sliced Wasserstein setting to the Wasserstein counterparts.

Step 1°. By assumption $(\mu_{t},J_{t})_{t\in I}\in\mathcal{CE}_{I}$, and thus $RJ_{t}\in\mathcal{M}(\mathbb{P}_{d};\mathbb{R}^{d})$ is well defined for a.e. $t\in I$. By Proposition 3.3 (iii), $R_{\theta}J_{t}\ll\widehat{\mu}_{t}^{\theta}$ for a.e. $t\in I$ and a.e. $\theta\in\mathbb{S}^{d-1}$. Hence we may write $RJ_{t}=\mathfrak{v}_{t}\widehat{\mu}_{t}$ and note that

$$\int_{I}B_{SW}(\mu_{t},J_{t})\,dt=\frac{1}{2}\int_{I}\fint_{\mathbb{S}^{d-1}}\|\theta\cdot\mathfrak{v}_{t}^{\theta}\|_{L^{2}(\widehat{\mu}_{t}^{\theta})}^{2}\,d\theta\,dt<\infty.$$

By Lemma 3.8, we have, in the sense of distributions,

$$\partial_{t}\widehat{\mu}_{t}^{\theta}+\partial_{r}(\theta\cdot\mathfrak{v}_{t}^{\theta}\,\widehat{\mu}_{t}^{\theta})=0\quad\text{for a.e. }\theta\in\mathbb{S}^{d-1}.$$

Thus, by [1, Theorem 8.3.1], for a.e. $\theta\in\mathbb{S}^{d-1}$ the curve $(\widehat{\mu}_{t}^{\theta})_{t\in I}$ belongs to $AC(\mathscr{P}_{2}(\mathbb{R}),W)$, and further

$$|(\widehat{\mu}^{\theta})^{\prime}|_{W}^{2}(t)\leq\int_{\mathbb{R}}|\theta\cdot\mathfrak{v}_{t}(\theta,r)|^{2}\,d\widehat{\mu}_{t}^{\theta}(r)\quad\text{for a.e. }\theta\in\mathbb{S}^{d-1},\ t\in I.$$

Combining, we have

$$\int_{I}|\mu^{\prime}|_{SW}^{2}(t)\,dt=\int_{I}\fint_{\mathbb{S}^{d-1}}|(\widehat{\mu}^{\theta})^{\prime}|_{W}^{2}(t)\,d\theta\,dt\leq\int_{I}\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}|\theta\cdot\mathfrak{v}_{t}(\theta,r)|^{2}\,d\widehat{\mu}_{t}^{\theta}(r)\,d\theta\,dt=\int_{I}B_{SW}(\mu_{t},J_{t})\,dt.$$

Step 2°. It remains to prove the converse. Let $\varphi\in C_c^\infty(\mathbb{R}^d)$. Then by the Radon inversion formula (A.11),

$$\varphi=c_d^{-1}R^\ast\Lambda_dR\varphi,$$

and thus for each $\mu\in\mathscr{P}_2(\mathbb{R}^d)$ the duality formula (2.3) implies

$$\begin{aligned}\mu(\varphi)&:=\int_{\mathbb{R}^d}\varphi(x)\,d\mu(x)=c_d^{-1}\int_{\mathbb{R}^d}\fint_{\mathbb{S}^{d-1}}\Lambda_dR_\theta\varphi(x\cdot\theta)\,d\theta\,d\mu(x)\\&=c_d^{-1}\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}^d}\Lambda_dR_\theta\varphi(x\cdot\theta)\,d\mu(x)\,d\theta=c_d^{-1}\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}\Lambda_dR_\theta\varphi(r)\,d\widehat{\mu}^\theta(r)\,d\theta.\end{aligned}$$

Let $(\mu_t)_{t\in I}\in AC(\mathscr{P}_2(\mathbb{R}^d);SW)$, and for each $s,t\in I$ let $\widehat{\gamma}_{s,t}\in\widehat{\Gamma}_o(\widehat{\mu}_t,\widehat{\mu}_s)$ be a family of optimal plans between the projections. Then

$$\begin{aligned}|\mu_t(\varphi)-\mu_s(\varphi)|&=c_d^{-1}\left|\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}\times\mathbb{R}}\big(\Lambda_dR_\theta\varphi(q)-\Lambda_dR_\theta\varphi(r)\big)\,d\widehat{\gamma}_{s,t}^\theta(q,r)\,d\theta\right|\\&\le c_d^{-1}\sup_{\theta\in\mathbb{S}^{d-1}}\operatorname{Lip}(\Lambda_dR_\theta\varphi)\,SW(\mu_t,\mu_s).\end{aligned}$$

Furthermore, define $H_\theta=H_\theta(\varphi)$ by

$$H_\theta(r,q):=\begin{cases}|\Lambda_d\partial_rR_\theta\varphi(r)|&\text{if }r=q,\\[2pt]\dfrac{|\Lambda_dR_\theta\varphi(r)-\Lambda_dR_\theta\varphi(q)|}{|r-q|}&\text{if }r\neq q.\end{cases}$$

Then

$$\begin{aligned}\frac{\mu_{s+h}(\varphi)-\mu_s(\varphi)}{|h|}&\le\frac{1}{c_d|h|}\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}\times\mathbb{R}}|r-q|\,H_\theta(r,q)\,d\widehat{\gamma}_{s,s+h}^\theta(r,q)\,d\theta\\&\le\frac{SW(\mu_{s+h},\mu_s)}{c_d|h|}\left(\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}\times\mathbb{R}}H_\theta^2(r,q)\,d\widehat{\gamma}_{s,s+h}^\theta(r,q)\,d\theta\right)^{1/2}.\end{aligned}$$
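The second inequality is the Cauchy-Schwarz inequality on $\mathbb{S}^{d-1}\times\mathbb{R}\times\mathbb{R}$ combined with the optimality of the plans. Writing $\widehat{\gamma}^\theta$ for the optimal plan between $\widehat{\mu}_{s+h}^\theta$ and $\widehat{\mu}_s^\theta$, the step reads (a routine expansion in the notation above):

```latex
\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}\times\mathbb{R}}|r-q|\,H_\theta(r,q)\,d\widehat{\gamma}^\theta(r,q)\,d\theta
\le\left(\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}\times\mathbb{R}}|r-q|^2\,d\widehat{\gamma}^\theta\,d\theta\right)^{1/2}
\left(\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}\times\mathbb{R}}H_\theta^2\,d\widehat{\gamma}^\theta\,d\theta\right)^{1/2},
\qquad
\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}\times\mathbb{R}}|r-q|^2\,d\widehat{\gamma}^\theta\,d\theta
=\fint_{\mathbb{S}^{d-1}}W^2\big(\widehat{\mu}_{s+h}^\theta,\widehat{\mu}_s^\theta\big)\,d\theta
=SW^2(\mu_{s+h},\mu_s).
```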

Now let $\varphi\in C_c^\infty(I\times\mathbb{R}^d)$ and $(\mu_t)_{t\in I}\in AC(\mathscr{P}_2(\mathbb{R}^d);SW)$. Then

$$\begin{aligned}\left|\int_I\int_{\mathbb{R}^d}\partial_s\varphi(s,x)\,d\mu_s(x)\,ds\right|&\le c_d^{-1}\left(\int_{\tilde I}|\mu'|_{SW}^2(s)\,ds\right)^{1/2}\left(\int_{\tilde I}\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}|\Lambda_d\partial_rR_\theta\varphi_s(r)|^2\,d\widehat{\mu}_s^\theta(r)\,d\theta\,ds\right)^{1/2}\\&\le c_d^{-1}\left(\int_{\tilde I}|\mu'|_{SW}^2(s)\,ds\right)^{1/2}\left(\int_{\tilde I}\|\Lambda_dR(\nabla\varphi_s)\|_{L^2(\widehat{\mu}_s)}^2\,ds\right)^{1/2}\end{aligned}\tag{3.16}$$

where $\varphi_s=\varphi(s,\cdot)$, the Radon transform is applied in the spatial variable, and $\tilde I\subset I$ is any interval such that $\operatorname{supp}\varphi\subset\tilde I\times\mathbb{R}^d$. Let

$$V:=\left\{\Lambda_d\widehat{\nabla\varphi}:\ \varphi\in C_c^\infty(I\times\mathbb{R}^d)\right\}$$

and let $\overline{V}$ be the closure of $V$ under the norm

$$\|\cdot\|_{L^2((\widehat{\mu}_t)_{t\in I})}:=\left(\int_I\|\cdot\|_{L^2(\widehat{\mu}_t)}^2\,dt\right)^{1/2}.$$

Since $\Lambda_d\widehat{\nabla\varphi_t}(\theta,r)=\theta\,\Lambda_d\partial_r\widehat{\varphi}_t^\theta(r)$, every $\mathfrak{u}\in\overline{V}$ is, at each time, parallel to $\theta$, i.e. $\mathfrak{u}_s(\theta,r)\parallel\theta$ for a.e. $(s,\theta,r)\in I\times\mathbb{S}^{d-1}\times\mathbb{R}$. By the estimate (3.16), we can extend the functional $L:V\to\mathbb{R}$ defined by

$$L(\Lambda_d\widehat{\nabla\varphi}):=-\int_I\int_{\mathbb{R}^d}\partial_s\varphi_s(x)\,d\mu_s(x)\,ds$$

uniquely to a bounded linear functional on $\overline{V}$ such that

$$|L(\mathfrak{u})|\le c_d^{-1}\left(\int_{\tilde I}|\mu'|_{SW}^2(s)\,ds\right)^{1/2}\left(\int_{\tilde I}\|\mathfrak{u}_s\|_{L^2(\widehat{\mu}_s)}^2\,ds\right)^{1/2}\quad\text{for any }\mathfrak{u}\in\overline{V}\text{ with }\operatorname{supp}\mathfrak{u}\subset\tilde I\times\mathbb{P}_d.\tag{3.17}$$
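The parallelism $\mathfrak{u}_s\parallel\theta$ rests on the identity $R_\theta(\partial_{x_i}\varphi)=\theta_i\,\partial_rR_\theta\varphi$, to which $\Lambda_d$ (acting in $r$ only) is then applied. A minimal numerical check in $d=2$, under these illustrative assumptions: a shifted Gaussian test function (rapidly decaying rather than compactly supported, chosen because $R_\theta\varphi(r)=\sqrt{\pi}\,e^{-(r-\theta\cdot a)^2}$ is explicit), Riemann sums along the line $\{r\theta+s\theta^\perp\}$, and a central difference in $r$. All names are hypothetical.

```python
import numpy as np

a = np.array([0.3, -0.7])               # centre of the illustrative test function


def phi(x):
    """Shifted Gaussian phi(x) = exp(-|x - a|^2), evaluated row-wise."""
    return np.exp(-np.sum((x - a) ** 2, axis=-1))


def grad_phi(x):
    """Gradient of phi, evaluated row-wise."""
    return -2.0 * (x - a) * phi(x)[..., None]


s = np.linspace(-10.0, 10.0, 4001)      # parameter along theta^perp
ds = s[1] - s[0]


def radon(f, theta, r):
    """R_theta f(r): integral of f over the line {x . theta = r}, by a Riemann sum."""
    perp = np.array([-theta[1], theta[0]])
    pts = r * theta + s[:, None] * perp
    return np.sum(f(pts), axis=0) * ds


theta = np.array([np.cos(0.4), np.sin(0.4)])
r, eps = 0.25, 1e-4
lhs = radon(grad_phi, theta, r)         # R_theta(grad phi)(r), a vector in R^2
dr = (radon(phi, theta, r + eps) - radon(phi, theta, r - eps)) / (2 * eps)
rhs = theta * dr                        # theta * d/dr R_theta phi(r)
exact = np.sqrt(np.pi) * np.exp(-(r - theta @ a) ** 2)  # closed form of R_theta phi(r)
print(lhs, rhs)
```

The two vectors agree to the accuracy of the finite difference, and both point along $\theta$, which is the property used above.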

Consider the minimization problem

$$\min\left\{\frac{1}{2c_d}\int_I\|\mathfrak{u}_s\|_{L^2(\widehat{\mu}_s)}^2\,ds-L(\mathfrak{u}):\ \mathfrak{u}\in\overline{V}\right\}=\min\left\{\frac{1}{2c_d}\int_I\|\theta\cdot\mathfrak{u}_s\|_{L^2(\widehat{\mu}_s)}^2\,ds-L(\mathfrak{u}):\ \mathfrak{u}\in\overline{V}\right\}.$$

The functional being minimized is the sum of a positive quadratic term and a linear functional that is bounded with respect to $\|\cdot\|_{L^2((\widehat{\mu}_t)_{t\in I})}$; hence it is coercive in this norm and lower semicontinuous in the weak topology of $L^2((\widehat{\mu}_t)_{t\in I})$, and $\overline{V}$, being a closed subspace, is weakly closed. By the direct method of the calculus of variations, the minimization problem admits a solution $(\mathfrak{v}_t)_{t\in I}\in\overline{V}$. Since the functional is strictly convex, this minimizer is unique. Moreover, the minimizer $(\mathfrak{v}_t)_{t\in I}$ satisfies the Euler-Lagrange equation

$$0=c_d^{-1}\int_I\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}\Lambda_d\widehat{\nabla\varphi}\cdot\mathfrak{v}_s\,d\widehat{\mu}_s\,ds-L(\Lambda_d\widehat{\nabla\varphi})\quad\text{for all }\varphi\in C_c^\infty(I\times\mathbb{R}^d).$$
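For completeness, the Euler-Lagrange equation is the vanishing of the first variation of the functional along $\mathfrak{v}+\varepsilon\mathfrak{w}$, which stays in the linear space $\overline{V}$ for every $\varepsilon\in\mathbb{R}$. A routine computation in the notation above, using the linearity of $L$, gives

```latex
0=\frac{d}{d\varepsilon}\Big|_{\varepsilon=0}\left[\frac{1}{2c_d}\int_I\|\mathfrak{v}_s+\varepsilon\mathfrak{w}_s\|_{L^2(\widehat{\mu}_s)}^2\,ds-L(\mathfrak{v}+\varepsilon\mathfrak{w})\right]
=c_d^{-1}\int_I\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}\mathfrak{w}_s\cdot\mathfrak{v}_s\,d\widehat{\mu}_s\,ds-L(\mathfrak{w}),
\qquad \mathfrak{w}\in\overline{V},
```

and choosing $\mathfrak{w}=\Lambda_d\widehat{\nabla\varphi}\in V\subset\overline{V}$ yields the displayed equation.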

Since $\mathfrak{v}_s\in L^2(\widehat{\mu}_s)$ for $\mathscr{L}^1$-a.e. $s\in I$, by Lemma 3.4 we can find $J_s\in\mathscr{H}(\mathbb{R}^d;\mathbb{R}^d)\subset\mathcal{S}'(\mathbb{R}^d;\mathbb{R}^d)$ such that $RJ_s=\widehat{\mu}_s\mathfrak{v}_s$ for $\mathscr{L}^1$-a.e. $s\in I$. It follows that $(\mu_t,J_t)_{t\in I}$ satisfies the continuity equation in the sense of distributions: by (2.13), for each $\varphi\in C_c^\infty(I\times\mathbb{R}^d)$, writing $\varphi_s(x)=\varphi(s,x)$,

$$\begin{aligned}\int_IJ_s(\nabla\varphi_s)\,ds&=c_d^{-1}\int_IRJ_s(\Lambda_d\widehat{\nabla\varphi_s})\,ds=c_d^{-1}\int_I\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}\Lambda_d\widehat{\nabla\varphi_s}(\theta,r)\cdot dRJ_s(\theta,r)\,ds\\&=c_d^{-1}\int_I\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}\Lambda_d\widehat{\nabla\varphi}\cdot\mathfrak{v}_s\,d\widehat{\mu}_s\,ds=L(\Lambda_d\widehat{\nabla\varphi})=-\int_I\int_{\mathbb{R}^d}\partial_s\varphi_s(x)\,d\mu_s(x)\,ds.\end{aligned}$$

To establish the pointwise inequality (3.15), first recall that, since $\widehat{J}_s\parallel\theta$ for a.e. $s\in I$,

$$B_{SW}(\mu_s,J_s)=\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}\frac{d\widehat{J}_s}{d\widehat{\mu}_s}\cdot d\widehat{J}_s=\|\mathfrak{v}_s\|_{L^2(\widehat{\mu}_s)}^2\quad\text{for }\mathscr{L}^1\text{-a.e. }s\in I.$$

Choose an interval $\tilde I\subset I$, a function $\eta\in C_c^\infty(\tilde I)$ with $0\le\eta\le1$, and $\varphi^k\in C_c^\infty(I\times\mathbb{R}^d)$ such that $\Lambda_d\widehat{\nabla\varphi^k}$ converges to the minimizer $\mathfrak{v}\in\overline{V}$ in $\|\cdot\|_{L^2((\widehat{\mu}_t)_{t\in I})}$. Passing to a subsequence, we may further assume that, with $\varphi_t^k=\varphi^k(t,\cdot)\in C_c^\infty(\mathbb{R}^d)$, the functions $\Lambda_d\widehat{\nabla\varphi_t^k}$ converge to $\mathfrak{v}_t$ in $L^2(\widehat{\mu}_t;\mathbb{R}^d)$ for $\mathscr{L}^1$-a.e. $t\in I$. Thus, using the Euler-Lagrange equation (with test functions $\eta\varphi^k$) and the bound (3.17), we obtain

$$\begin{aligned}\int_I\eta\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}\frac{d\widehat{J}_s}{d\widehat{\mu}_s}\cdot d\widehat{J}_s\,ds&=\lim_{k\to\infty}\int_I\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}\eta\,\Lambda_d\widehat{\nabla\varphi_s^k}\cdot\mathfrak{v}_s\,d\widehat{\mu}_s\,ds=c_d\lim_{k\to\infty}L(\eta\Lambda_d\widehat{\nabla\varphi^k})\\&\le\left(\int_{\tilde I}|\mu'|_{SW}^2(s)\,ds\right)^{1/2}\lim_{k\to\infty}\left(\int_{\tilde I}\|\Lambda_d\widehat{\nabla\varphi_s^k}\|_{L^2(\widehat{\mu}_s)}^2\,ds\right)^{1/2}\le\left(\int_{\tilde I}|\mu'|_{SW}^2(s)\,ds\right)^{1/2}\left(\int_{\tilde I}\|\mathfrak{v}_s\|_{L^2(\widehat{\mu}_s)}^2\,ds\right)^{1/2},\end{aligned}$$

Letting $\eta\nearrow\mathds{1}_{\tilde I}$, we have

$$\int_{\tilde I}B_{SW}(\mu_s,J_s)\,ds=\int_{\tilde I}\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}\frac{d\widehat{J}_s}{d\widehat{\mu}_s}\cdot d\widehat{J}_s\,ds\le\left(\int_{\tilde I}|\mu'|_{SW}^2(s)\,ds\right)^{1/2}\left(\int_{\tilde I}\|\mathfrak{v}_s\|_{L^2(\widehat{\mu}_s)}^2\,ds\right)^{1/2}.$$

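Since $B_{SW}(\mu_s,J_s)=\|\mathfrak{v}_s\|_{L^2(\widehat{\mu}_s)}^2$, the left-hand side equals $\int_{\tilde I}\|\mathfrak{v}_s\|_{L^2(\widehat{\mu}_s)}^2\,ds$; dividing both sides by its square root (the case in which it vanishes is trivial) and squaring, we obtain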

$$\int_{\tilde I}B_{SW}(\mu_s,J_s)\,ds\le\int_{\tilde I}|\mu'|_{SW}^2(s)\,ds.$$

Since $\tilde I\subset I$ was arbitrary, we deduce

$$B_{SW}(\mu_t,J_t)\le|\mu'|_{SW}^2(t)\quad\text{for }\mathscr{L}^1\text{-a.e. }t\in I.$$

One can readily check that the pair $(\mu_t,J_t)_{t\in I}$ satisfies all the conditions in Definition 3.6, hence belongs to $\mathcal{CE}_I$; see Remark 3.7. ∎

Remark 3.10.

The characterization (3.14) of the space to which the identified flux $J_t$ belongs will be useful in characterizing the tangent space in Section 3.2. Note that this space consists of vectors satisfying $RJ_t\parallel\theta$; thus assumption (ii) in Lemma 3.4 is not restrictive, as noted in Remark 3.5.

Since we do not know in general whether the flux is a vector-valued measure, the question of whether $J_t\ll\mu_t$ may not even make sense. Furthermore, even if $J_t$ is sufficiently regular, it is obtained by Radon inversion, so it is difficult to determine whether the flux is absolutely continuous with respect to $\mu_t$: finiteness of $B_{SW}(\mu_t,J_t)$ implies $\widehat{J}_t\ll\widehat{\mu}_t$, but not necessarily $J_t\ll\mu_t$. For instance, let $\mu$ be the normalized surface measure on the unit sphere $\partial B(0,1)$ in $\mathbb{R}^d$, and let $\sigma\in\mathscr{P}_2(\mathbb{R}^d)$ be the normalization of $\mathds{1}_{B(0,1/2)}$. Letting $\mu_t=(1-t)\mu+t\sigma$ for $t\in[0,1]$, one can check that $(\mu_t)_{t\in[0,1)}$ is an absolutely continuous curve in the SW space, since for every $\theta$ the projections satisfy $\operatorname{supp}\widehat{\mu}_t^\theta=[-1,1]$ for all $t\in[0,1)$. On the other hand, for any $h>0$ the curve $(\mu_t)_{t\in[0,h)}$ cannot be absolutely continuous in the Wasserstein space, as mass is created outside the support of $\mu_0$. ∎

3.2. Tangential structure of the sliced Wasserstein space

Unlike absolutely continuous curves in the Wasserstein space, which admit solutions of the continuity equation of the form $(\mu_t,v_t\mu_t)_{t\in I}$, Theorem 3.9 only guarantees a solution of the continuity equation in flux form $(\mu_t,J_t)_{t\in I}$ (see Remark 3.10), where in general we only know $J_t\in\mathscr{H}(\mathbb{R}^d;\mathbb{R}^d)\subset\mathcal{S}'(\mathbb{R}^d;\mathbb{R}^d)$. Furthermore, the proof shows that (3.14) must hold in order to ensure $B_{SW}(\mu_t,J_t)=|\mu'|_{SW}^2(t)$ for a.e. $t\in I$. This motivates us to define the tangent space at $\mu\in\mathscr{P}_2(\mathbb{R}^d)$ after the space appearing in (3.14), namely

$$\begin{aligned}\operatorname{Tan}_\mu(\mathscr{P}_2(\mathbb{R}^d),SW)&:=\overline{\left\{\Lambda_dR^\ast\big(\widehat{\mu}\,\Lambda_d(\widehat{\nabla\varphi})\big):\ \varphi\in C_c^\infty(\mathbb{R}^d)\right\}}^{B_{SW}(\mu,\cdot)}\\&=R^{-1}\left[\widehat{\mu}\,\overline{\left\{\Lambda_d\widehat{\nabla\varphi}:\ \varphi\in C_c^\infty(\mathbb{R}^d)\right\}}^{L^2(\widehat{\mu})}\right]\subset\mathscr{H}(\mathbb{R}^d;\mathbb{R}^d).\end{aligned}\tag{3.18}$$

In this section we highlight some properties of the tangent space. To begin, recall that in the case of the Wasserstein space, the tangent space $\operatorname{Tan}_\mu(\mathscr{P}_2(\mathbb{R}^d),W)$ satisfies the optimality property [1, Lemma 8.4.2]

$$v\in\operatorname{Tan}_\mu(\mathscr{P}_2(\mathbb{R}^d),W)\quad\text{if and only if}\quad\|v+w\|_{L^2(\mu)}\ge\|v\|_{L^2(\mu)}\ \text{ for all }w\in L^2(\mu;\mathbb{R}^d)\text{ with }\nabla\cdot(w\mu)=0.$$

We now show that $\operatorname{Tan}_\mu(\mathscr{P}_2(\mathbb{R}^d),SW)$ satisfies the analogous property.

Proposition 3.11 (Tangent space and optimality).

Let $\mu\in\mathscr{P}_2(\mathbb{R}^d)$ and $J\in\mathscr{H}(\mathbb{R}^d;\mathbb{R}^d)$ be such that $B_{SW}(\mu,J)<\infty$. Then $J\in\operatorname{Tan}_\mu(\mathscr{P}_2(\mathbb{R}^d),SW)$ if and only if

$$\left\|\frac{d\widehat{J}}{d\widehat{\mu}}+\frac{d\widehat{E}}{d\widehat{\mu}}\right\|_{L^2(\widehat{\mu})}\ge\left\|\frac{d\widehat{J}}{d\widehat{\mu}}\right\|_{L^2(\widehat{\mu})}\tag{3.19}$$

for all $E\in\mathscr{H}(\mathbb{R}^d;\mathbb{R}^d)$ such that $\frac{d\widehat{E}}{d\widehat{\mu}}\in L^2(\widehat{\mu};\mathbb{R}^d)$ and $\nabla\cdot E=0$ in the sense of distributions. Moreover, for a given divergence $\nabla\cdot J$, such a minimizer $J\in\operatorname{Tan}_\mu(\mathscr{P}_2(\mathbb{R}^d),SW)$ is unique.

Proof.

After squaring both sides of (3.19), a simple scaling argument (replacing $E$ by $\lambda E$, $\lambda\in\mathbb{R}$) shows that (3.19) holds if and only if $\left\langle\widehat{E},\frac{d\widehat{J}}{d\widehat{\mu}}\right\rangle_{\mathbb{P}_d}=0$ for all such $E$. Since $E\in\mathscr{H}(\mathbb{R}^d;\mathbb{R}^d)$, we may apply the duality formula (2.13) to see that

$$\langle\widehat{E},\Lambda_d\widehat{\nabla\varphi}\rangle_{\mathbb{P}_d}=c_d\langle E,\nabla\varphi\rangle_{\mathbb{R}^d}\quad\text{for all }\varphi\in C_c^\infty(\mathbb{R}^d).$$

Since $\nabla\cdot E=0$ in the sense of distributions, the right-hand side vanishes for every $\varphi\in C_c^\infty(\mathbb{R}^d)$. Hence $\left\langle\widehat{E},\frac{d\widehat{J}}{d\widehat{\mu}}\right\rangle_{\mathbb{P}_d}=0$ for all such $E$ if and only if $\frac{d\widehat{J}}{d\widehat{\mu}}$ lies in the $L^2(\widehat{\mu})$-closure of $\{\Lambda_d\widehat{\nabla\varphi}:\varphi\in C_c^\infty(\mathbb{R}^d)\}$, which characterizes $\operatorname{Tan}_\mu(\mathscr{P}_2(\mathbb{R}^d),SW)$.

Finally, uniqueness follows from the strict convexity of the $L^2(\widehat{\mu})$-norm and the linearity of the continuity equation in the flux. ∎
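Proposition 3.11 is an infinite-dimensional counterpart of a familiar finite-dimensional fact: among all fluxes with prescribed divergence, the one of minimal norm is exactly the one orthogonal to every divergence-free flux, i.e. a gradient. The sketch below illustrates this on a small graph. The graph, the signed incidence matrix standing in for the divergence, and the Euclidean norm standing in for $L^2(\widehat{\mu})$ are all illustrative assumptions, not the construction used in the paper.

```python
import numpy as np

# A small directed graph on 4 nodes: a square 0-1-2-3 plus the diagonal 0-2.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n_nodes = 4

# Signed incidence matrix: (A u)[i] is the net outflow of the edge flux u at node i,
# a discrete stand-in for the divergence.
A = np.zeros((n_nodes, len(edges)))
for k, (i, j) in enumerate(edges):
    A[i, k] += 1.0
    A[j, k] -= 1.0

b = np.array([1.0, -2.0, 0.5, 0.5])  # prescribed divergence (sums to zero)

# Minimal-norm solution of A u = b; it lies in range(A^T), i.e. it is a discrete gradient.
u = np.linalg.pinv(A) @ b

# A divergence-free flux: circulation around the cycle 0 -> 1 -> 2 -> 0.
w = np.array([1.0, 1.0, 0.0, 0.0, -1.0])

# Adding a divergence-free flux never decreases the norm, the analogue of (3.19).
for t in (-2.0, -0.5, 0.5, 2.0):
    print(t, np.linalg.norm(u + t * w) >= np.linalg.norm(u))
print(u @ w)  # orthogonality, up to rounding
```

Since $u\perp w$, one has $\|u+tw\|^2=\|u\|^2+t^2\|w\|^2$, which is the scaling argument in the proof above in miniature.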

From this we deduce the following key property of the tangent space.

Proposition 3.12.

Let $(\mu_t)_{t\in I}\in AC(\mathscr{P}_2(\mathbb{R}^d);SW)$. For any $(\mu_t,J_t)_{t\in I}\in\mathcal{CE}_I$, we have

$$|\mu'|_{SW}^2(t)=B_{SW}(\mu_t,J_t)\ \text{ for a.e. }t\in I\quad\text{if and only if}\quad J_t\in\operatorname{Tan}_{\mu_t}(\mathscr{P}_2(\mathbb{R}^d),SW)\ \text{ for a.e. }t\in I.$$

In particular, given $(\mu_t)_{t\in I}\in AC(\mathscr{P}_2(\mathbb{R}^d);SW)$, such $(J_t)_{t\in I}$ is uniquely determined for a.e. $t\in I$.

Proof.

Let $(\mu_t,J_t)_{t\in I}\in\mathcal{CE}_I$; then $J_t\in\mathscr{H}(\mathbb{R}^d;\mathbb{R}^d)$ for a.e. $t\in I$, see Remark 3.7. By Theorem 3.9, in general $|\mu'|_{SW}^2(t)\le B_{SW}(\mu_t,J_t)$, and equality is attained by some flux $J_t$ for a.e. $t\in I$. Finally, Proposition 3.11 implies that the minimizer lies in the tangent space $\operatorname{Tan}_{\mu_t}(\mathscr{P}_2(\mathbb{R}^d),SW)$ and that it is unique. ∎

Remark 3.13 (Nonexistence of $\ell_{SW}$-geodesics in some directions).

We emphasize that not every flux $J\in\operatorname{Tan}_\mu(\mathscr{P}_2(\mathbb{R}^d),SW)$ can be attained as the velocity flux of an absolutely continuous curve. The following example shows that a flux $J$ may be admissible while $-J$ is not. Fix a small $\varepsilon>0$ and consider

$$\mu_t=c\big((2-c_\varepsilon t)\mathds{1}_{B(0,1)\setminus B(0,\varepsilon)}+t\mathds{1}_{B(0,\varepsilon)}\big)\,dx\in\mathscr{P}_2(\mathbb{R}^d),\quad\text{where }c,c_\varepsilon\text{ are normalizing constants}.$$

To ensure sufficient regularity of the objects constructed by Radon inversion, we consider $\mu_t^{\varepsilon^2}=\mu_t\ast\eta_{\varepsilon^2}$; the choice of the convolution radius $\varepsilon^2$ ensures that $|\mu_0^{\varepsilon^2}|(B(0,\varepsilon/2))=0$ whenever $\varepsilon<\frac{1}{2}$. On the other hand, $R_\theta\mu_0^{\varepsilon^2}>0$ in the interior of its support as long as $\varepsilon$ is chosen sufficiently small. Clearly $(\mu_t^{\varepsilon^2})_{t\in[0,1]}\in AC(\mathscr{P}_2(\mathbb{R}^d);SW)$, so by Theorem 3.9 we can find corresponding $(J_t)_{t\in[0,1]}$ in the tangent space, and since $\mu_t^{\varepsilon^2}$ is smooth, $J_t$ must also be a smooth function. As

$$\partial_t\mu_t^{\varepsilon^2}\big|_{t=0}=-\nabla\cdot J_0>0\quad\text{on }B(0,\varepsilon/2),$$

proceeding in the direction $-J_0$ from $\mu_0^{\varepsilon^2}$ would create negative mass in $B(0,\varepsilon/2)$. Hence $J_0$ is attainable as the tangent vector of a curve, but $-J_0$ is not.
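The claim $|\mu_0^{\varepsilon^2}|(B(0,\varepsilon/2))=0$ can be verified directly, assuming (as is standard, though not stated explicitly above) that the mollifier $\eta_{\varepsilon^2}$ is supported in $\overline{B}(0,\varepsilon^2)$:

```latex
\operatorname{supp}\mu_0\subset\overline{B}(0,1)\setminus B(0,\varepsilon)
\;\Longrightarrow\;
\operatorname{supp}\big(\mu_0\ast\eta_{\varepsilon^2}\big)\subset\{x:\ \varepsilon-\varepsilon^2\le|x|\le1+\varepsilon^2\},
\qquad
\varepsilon-\varepsilon^2\ge\tfrac{\varepsilon}{2}\iff\varepsilon\le\tfrac{1}{2},
```

so for $\varepsilon<\frac12$ the ball $B(0,\varepsilon/2)$ does not meet the support of $\mu_0^{\varepsilon^2}$.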

In general, fluxes in the tangent space need not vanish outside the support of $\mu$, and, as the example shows, some of them cannot be attained by absolutely continuous curves in the space of probability measures. Thus, although (3.18) defines $\operatorname{Tan}_\mu(\mathscr{P}_2(\mathbb{R}^d),SW)$ as a vector space, the tangent vectors attainable by curves form a convex cone rather than a linear space. This suggests that $(\mathscr{P}_2(\mathbb{R}^d),\ell_{SW})$ should be (formally) regarded as a manifold with corners. ∎

4. The sliced Wasserstein length space

Given an interval $I$ and a curve $(\mu_t)_{t\in I}\in AC(\mathscr{P}_2(\mathbb{R}^d);SW)$, define its sliced Wasserstein length $L_{SW}((\mu_t)_{t\in I})$ by

$$L_{SW}((\mu_t)_{t\in I})=\int_I|\mu'|_{SW}(t)\,dt.\tag{4.1}$$

We note that (4.1) is consistent with the usual notion of length in a metric space (see [16, Theorem 2.7.6]):

$$L_{SW}((\mu_t)_{t\in I})=\sup\left\{\sum_{i=0}^{n-1}SW(\mu_{t_i},\mu_{t_{i+1}}):\ t_0<t_1<\cdots<t_n,\ t_i\in I\text{ for }i=0,\dots,n\right\}.\tag{4.2}$$
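To illustrate (4.2), the sketch below evaluates the partition sums $\sum_iSW(\mu_{t_i},\mu_{t_{i+1}})$ on nested partitions for a hypothetical curve in $d=2$: two antipodal equal-weight atoms rotating by a quarter turn. As an assumption for the illustration, $SW$ is approximated using equally spaced directions in $[0,\pi)$. This discretized distance is still a (pseudo)metric, being a Euclidean norm of differences of sorted projections, so by the triangle inequality the sums can only increase under refinement, in line with the supremum in (4.2).

```python
import numpy as np


def sw_2d(x, y, n_dirs=90):
    """Discretized sliced W2 in d = 2 between equal-weight empirical measures with the same
    number of atoms: root mean square, over equally spaced directions in [0, pi) and over
    atoms, of the differences of sorted projections."""
    angles = np.pi * np.arange(n_dirs) / n_dirs
    thetas = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    px = np.sort(x @ thetas.T, axis=0)  # column j: sorted projections onto direction j
    py = np.sort(y @ thetas.T, axis=0)
    return np.sqrt(np.mean((px - py) ** 2))


def curve(t):
    """Illustrative curve: atoms at +p and -p, with p rotated by the angle t * pi / 2."""
    p = np.array([np.cos(t * np.pi / 2), np.sin(t * np.pi / 2)])
    return np.stack([p, -p])


def partition_sum(ts):
    """Sum of SW distances between consecutive points of the partition ts."""
    return sum(sw_2d(curve(s), curve(t)) for s, t in zip(ts[:-1], ts[1:]))


# Nested partitions of [0, 1] with 1, 2, 4, ..., 32 subintervals.
sums = [partition_sum(np.linspace(0.0, 1.0, n + 1)) for n in (1, 2, 4, 8, 16, 32)]
print(sums)
```

The first entry is the distance between the endpoints; the later entries approximate the length of this particular curve from below and stay above that distance, as they must.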

In this section we examine the length metric $\ell_{SW}$ induced by $SW$,

$$\ell_{SW}(\mu,\nu)=\inf\left\{L_{SW}((\mu_t)_{t\in[0,1]}):\ (\mu_t)_{t\in[0,1]}\in AC([0,1];\mathscr{P}_2(\mathbb{R}^d),SW),\ \mu_0=\mu,\ \mu_1=\nu\right\}\tag{4.3}$$

and the associated length space $(\mathscr{P}_2(\mathbb{R}^d),\ell_{SW})$. Since $SW\le\frac{1}{\sqrt{d}}W$ and $W$ coincides with its induced length metric, it immediately follows that $\ell_{SW}\le\frac{1}{\sqrt{d}}W$.
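The inequality $SW\le\frac{1}{\sqrt{d}}W$ invoked here can be probed numerically. A minimal sketch in $d=2$, assuming equal-weight empirical measures with few atoms (so that $W$ can be computed by brute force over permutations, which suffices by the Birkhoff-von Neumann theorem) and equally spaced slicing directions in $[0,\pi)$. For these directions the average of $(\theta\cdot z)^2$ is exactly $|z|^2/2$, so the discrete inequality holds for the same reason as the continuous one: the optimal matching for $W$ is admissible in every slice.

```python
import itertools

import numpy as np


def w2_bruteforce(x, y):
    """W2 between equal-weight empirical measures with n atoms each (small n only):
    by Birkhoff-von Neumann an optimal plan can be taken to be a permutation."""
    n = len(x)
    best = min(np.sum((x - y[list(p)]) ** 2) for p in itertools.permutations(range(n)))
    return np.sqrt(best / n)


def sw_2d(x, y, n_dirs=64):
    """Discretized sliced W2 in d = 2 (equally spaced directions in [0, pi))."""
    angles = np.pi * np.arange(n_dirs) / n_dirs
    thetas = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    px = np.sort(x @ thetas.T, axis=0)
    py = np.sort(y @ thetas.T, axis=0)
    return np.sqrt(np.mean((px - py) ** 2))


rng = np.random.default_rng(1)
d = 2
ratios = []
for _ in range(20):
    x = rng.normal(size=(6, d))
    y = rng.normal(size=(6, d)) + 1.0
    # sqrt(d) * SW / W should never exceed 1.
    ratios.append(np.sqrt(d) * sw_2d(x, y) / w2_bruteforce(x, y))
print(max(ratios))
```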

While the study of the intrinsic metric and its geometry is mathematically natural in general, it is particularly relevant for $SW$ for the following reasons. First, using the characterization of metric derivatives via the quadratic functional $B_{SW}$ in Theorem 3.9, we can consider a formal Riemannian structure on $(\mathscr{P}_2(\mathbb{R}^d),\ell_{SW})$ analogous to that on $(\mathscr{P}_2(\mathbb{R}^d),W)$. Second, in applications we are often interested in continuous deformations of probability measures, so the geodesic distance attained by absolutely continuous curves can be more relevant than the original distance.

After noting that $(\mathscr{P}_2(\mathbb{R}^d),\ell_{SW})$ is a complete metric space, in Lemma 4.3 we show the pointwise narrow precompactness of absolutely continuous curves of bounded length in $(\mathscr{P}_2(\mathbb{R}^d),SW)$ and the lower semicontinuity of $L_{SW}$. From this we deduce the lower semicontinuity of $\ell_{SW}$ with respect to narrow convergence in Corollary 4.4 and the existence of $\ell_{SW}$-geodesics in Proposition 4.5; in particular, the latter implies that in general $\ell_{SW}\neq SW$, since, as we have seen in Example 2.5, $(\mathscr{P}_2(\mathbb{R}^d),SW)$ is not a geodesic space.

We first note that completeness of $(\mathscr{P}_2(\mathbb{R}^d),\ell_{SW})$ follows from completeness of $(\mathscr{P}_2(\mathbb{R}^d),SW)$.

Corollary 4.1 (Completeness).

$(\mathscr{P}_2(\mathbb{R}^d),\ell_{SW})$ is a complete metric space.

Proof.

Any Cauchy sequence $(\mu_n)$ in $(\mathscr{P}_2(\mathbb{R}^d),\ell_{SW})$ is also Cauchy with respect to $SW$, since $SW\le\ell_{SW}$. By Proposition 2.4 we can find a limit $\mu_0\in\mathscr{P}_2(\mathbb{R}^d)$ such that $SW(\mu_n,\mu_0)\to0$ as $n\to\infty$. By the topological equivalence of $SW$ and $W$ [5, Theorem 2.3], also $W(\mu_n,\mu_0)\to0$, and hence $\ell_{SW}(\mu_n,\mu_0)\le d^{-1/2}W(\mu_n,\mu_0)\to0$ as $n\to\infty$. ∎

In locally compact metric spaces, compactness of paths, lower semicontinuity of length, and existence of geodesics follow by classical arguments; see Section 4 of [2]. However, balls in $(\mathscr{P}_2(\mathbb{R}^d),SW)$ are not precompact; see Remark 2.2. On the other hand, balls are precompact with respect to the narrow topology (Proposition 2.3), and the SW distance is lower semicontinuous with respect to narrow convergence. This allows us to use instead the following refined version of the Ascoli-Arzelà theorem [1, Proposition 3.3.1] to construct limiting curves and establish the existence of geodesics.

Proposition 4.2 (Proposition 3.3.1 of [1]).

Let $(X,m)$ be a complete metric space. Let $T>0$, let $K\subset X$ be a set that is sequentially compact with respect to a topology $\sigma$, and let $u_n:[0,T]\to X$ be curves such that

$$\begin{aligned}&u_n(t)\in K\quad\forall n\in\mathbb{N},\ t\in[0,T],\\&\limsup_{n\to\infty}m(u_n(s),u_n(t))\le\omega(s,t)\quad\forall s,t\in[0,T],\end{aligned}\tag{4.4}$$

for a symmetric function $\omega:[0,T]\times[0,T]\to[0,+\infty)$ such that

$$\lim_{(s,t)\to(r,r)}\omega(s,t)=0\quad\forall r\in[0,T]\setminus\mathscr{C},$$

where $\mathscr{C}$ is an (at most) countable subset of $[0,T]$. Then there exist an increasing subsequence $k\mapsto n(k)$ and a limit curve $u:[0,T]\to X$ such that

$$u_{n(k)}(t)\xrightharpoonup{\sigma}u(t)\quad\forall t\in[0,T],\qquad u\text{ is continuous with respect to }m\text{ in }[0,T]\setminus\mathscr{C}.$$

Setting $m=SW$ and taking $\sigma$ to be the topology generated by narrow convergence in Proposition 4.2, we can modify the standard arguments to show pointwise narrow compactness of curves.

Lemma 4.3 (Pointwise narrow compactness for curves and lower semicontinuity of length).


Let $I$ be a closed interval and suppose a sequence of curves $(\mu_t^k)_{t\in I}\in AC(\mathscr{P}_2(\mathbb{R}^d);SW)$ satisfies

$$\sup_{k\ge1}L_{SW}((\mu_t^k)_{t\in I})<\infty\quad\text{and}\quad\sup_{k,l\ge1}SW(\mu_0^k,\mu_0^l)<\infty.$$

Then, up to reparametrization, there exists a curve $(\mu_t)_{t\in I}$, continuous in $SW$, such that along a subsequence (which we do not relabel)

$$\mu_t^k\rightharpoonup\mu_t\ \text{ narrowly for all }t\in I.$$

Moreover,

$$L_{SW}((\mu_t)_{t\in I})\le\liminf_{k\to\infty}L_{SW}((\mu_t^k)_{t\in I}).\tag{4.5}$$

In particular, $(\mu_t)_{t\in I}\in AC(\mathscr{P}_2(\mathbb{R}^d);SW)$.

Proof.

Since each $(\mu_t^k)_{t\in I}$ is an absolutely continuous curve and $M_1:=\sup_kL_{SW}((\mu_t^k)_{t\in I})<\infty$, we may instead consider their Lipschitz reparametrizations [1, Lemma 1.1.4] on the interval $I:=[0,1]$, each with Lipschitz constant bounded above by the length of the curve. Thus the equicontinuity condition (4.4) holds at all $s,t\in I$ with $\omega(s,t)=M_1|s-t|$. Furthermore, the condition $\sup_{k,l\ge1}SW(\mu_0^k,\mu_0^l)<\infty$ allows us to choose $\nu\in\mathscr{P}_2(\mathbb{R}^d)$ (for instance $\nu=\mu_0^1$) such that $M_2:=\sup_kSW(\nu,\mu_0^k)$ is finite. Then

$$SW(\mu_t^k,\nu)\le SW(\mu_0^k,\mu_t^k)+SW(\mu_0^k,\nu)\le L_{SW}((\mu_t^k)_{t\in I})+\sup_kSW(\nu,\mu_0^k)\le M_1+M_2.$$

By Proposition 2.3, $\overline{B}_{SW}(\nu,M_1+M_2)$ is compact with respect to the narrow topology. Thus the refined Ascoli-Arzelà theorem (Proposition 4.2) yields a curve $(\mu_t)_{t\in I}$, continuous in $SW$, such that, along a subsequence, $\mu_t^k$ converges narrowly to $\mu_t$ for every $t\in I$ as $k\to\infty$.

By Proposition 2.1, $(\mu,\nu)\mapsto SW(\mu,\nu)$ is lower semicontinuous with respect to narrow convergence of measures. Hence, for all $s,t\in I$ we have $SW(\mu_s,\mu_t)\le\liminf_{k\to\infty}SW(\mu_s^k,\mu_t^k)\le M_1|s-t|$, and for any partition $t_0<t_1<\cdots<t_N$ of points in $I$,

$$\sum_{j=0}^{N-1}SW(\mu_{t_j},\mu_{t_{j+1}})\le\liminf_{k\to\infty}\sum_{j=0}^{N-1}SW(\mu_{t_j}^k,\mu_{t_{j+1}}^k)\le\liminf_{k\to\infty}L_{SW}((\mu_t^k)_{t\in I}).$$

Taking the supremum over all partitions and using (4.2) yields (4.5); the Lipschitz bound shows in particular that $(\mu_t)_{t\in I}$ is absolutely continuous. ∎

From Lemma 4.3 we deduce the lower semicontinuity of $\ell_{SW}$.

Corollary 4.4 (Lower semicontinuity of $\ell_{SW}$).

The map $(\mu,\nu)\mapsto\ell_{SW}(\mu,\nu)$ on $\mathscr{P}_2(\mathbb{R}^d)\times\mathscr{P}_2(\mathbb{R}^d)$ is lower semicontinuous with respect to narrow convergence of measures.

Proof.

Let $(\mu^k)$ and $(\nu^k)$ be narrowly convergent sequences in $\mathscr{P}_2(\mathbb{R}^d)$ with respective limits $\mu$ and $\nu$. Fix $\varepsilon>0$, and for each $k=1,2,\dots$ let $(\mu_t^k)_{t\in I_k}$, $I_k=[0,\ell_{SW}(\mu^k,\nu^k)+\varepsilon]$, be an arc-length parametrized curve [1, Lemma 1.1.4] from $\mu^k$ to $\nu^k$ such that

LSW((μtk)tIk)SW(μk,νk)+ε.L_{SW}((\mu_{t}^{k})_{t\in I_{k}})\leq\ell_{SW}(\mu^{k},\nu^{k})+\varepsilon.

By setting the μtk=νk\mu_{t}^{k}=\nu^{k} for tSW(μk,νk)+εt\geq\ell_{SW}(\mu^{k},\nu^{k})+\varepsilon, we can define all (μtk)t(\mu^{k}_{t})_{t} on a common bounded interval IIkI\supset I_{k}. As SWSW is lower semicontinuous, SW(μk,μ)k0SW(\mu^{k},\mu)\xrightarrow[]{k\rightarrow\infty}0. Thus by Lemma 4.3, there exists a limiting curve (μt)tIAC(𝒫2(d);SW)(\mu_{t})_{t\in I}\in AC(\mathscr{P}_{2}(\mathbb{R}^{d});SW) such that

$$\mu^{k}_{t}\rightharpoonup\mu_{t}\text{ narrowly for all }t\in I.$$

Moreover, $(\mu_{t})_{t}$ is a curve connecting $\mu$ and $\nu$, and by (4.5)

$$\ell_{SW}(\mu,\nu)\leq L_{SW}((\mu_{t})_{t\in I})\leq\liminf_{k\to\infty}L_{SW}((\mu^{k}_{t})_{t\in I})\leq\liminf_{k\to\infty}\ell_{SW}(\mu^{k},\nu^{k})+\varepsilon.$$

We conclude by letting $\varepsilon\searrow 0$. ∎

Existence of geodesics in $(\mathscr{P}_{2}(\mathbb{R}^{d}),\ell_{SW})$ also follows from Lemma 4.3.

Proposition 4.5 ($\ell_{SW}$ is a geodesic metric).

For each $\mu,\nu\in\mathscr{P}_{2}(\mathbb{R}^{d})$ there exists a length-minimizing curve $(\mu_{t})_{t\in[0,1]}$ such that $\ell_{SW}(\mu,\nu)=L_{SW}((\mu_{t})_{t\in[0,1]})$. In particular, $(\mathscr{P}_{2}(\mathbb{R}^{d}),\ell_{SW})$ is a geodesic space.

Proof.

Let $\mu,\nu\in\mathscr{P}_{2}(\mathbb{R}^{d})$. As $SW\leq\frac{1}{\sqrt{d}}W$, the $SW$-length of a Wasserstein geodesic from $\mu$ to $\nu$ is at most $\frac{1}{\sqrt{d}}W(\mu,\nu)$; hence a length-minimizing sequence of curves can be chosen with lengths $L_{SW}$ uniformly bounded by $\frac{1}{\sqrt{d}}W(\mu,\nu)$. Then Lemma 4.3 directly implies the existence of a length-minimizing curve, which we reparametrize to have constant speed on $[0,1]$. We also note that by Theorem 3.9 one can associate $(\mu_{t},J_{t})_{t\in[0,1]}\in\mathcal{CE}_{[0,1]}$ such that $\ell_{SW}^{2}(\mu,\nu)=\int_{0}^{1}B_{SW}(\mu_{t},J_{t})\,dt$. ∎

Remark 4.6.

If an $\ell_{SW}$-geodesic $(\mu_{t})_{t\in[0,1]}$ between $\mu,\nu\in\mathscr{P}_{2}(\mathbb{R}^{d})$ attains the sliced Wasserstein distance, i.e. $L_{SW}((\mu_{t})_{t\in[0,1]})=SW(\mu,\nu)$, then it can be characterized as the Radon inverse of the 1D displacement interpolants between $\widehat{\mu}^{\theta}$ and $\widehat{\nu}^{\theta}$. However, in general such a Radon inverse is not a probability measure, as noted in Example 2.5.

While the $\ell_{SW}$-geodesic $(\mu_{t})_{t\in[0,1]}$ remains in $\mathscr{P}_{2}(\mathbb{R}^{d})$, we cannot guarantee that the corresponding pair $(\mu_{t},J_{t})_{t\in[0,1]}\in\mathcal{CE}_{[0,1]}$ satisfies $J_{t}\ll\mu_{t}$, or even that $J_{t}$ is a measure for a.e. $t\in[0,1]$; see Remark 3.10. However, it can be approximated by solutions of the continuity equation with $J_{t}\ll\mu_{t}$ by concatenating $(\mu_{t}\ast\eta_{\varepsilon},J_{t}\ast\eta_{\varepsilon})_{t\in[0,1]}$ with the Wasserstein geodesics from $\mu$ to $\mu\ast\eta_{\varepsilon}$ and from $\nu\ast\eta_{\varepsilon}$ to $\nu$, where $\eta_{\varepsilon}$ is a suitable smooth convolution kernel with bandwidth $\varepsilon\ll 1$. ∎

Remark 4.7.

We note that the projection onto closed balls is not a sliced Wasserstein contraction. Namely, if $\nu$ is supported in a closed ball $\overline{B}$ and $\pi^{B}$ denotes the projection onto $\overline{B}$, then $\pi^{B}_{\#}\mu$ is at least as close to $\nu$ as $\mu$ is for every transportation distance whose cost is increasing in the Euclidean distance (since $\pi^{B}$ is 1-Lipschitz and fixes $\overline{B}$), but we show by an explicit example that this fails for the sliced Wasserstein distance: let

$$\mu^{\varepsilon}=\frac{1}{2}\delta_{(0,-1)}+\frac{1}{2}\delta_{(0,\sqrt{1+\varepsilon})},\quad\nu^{\varepsilon}=\frac{1}{2}\delta_{\left(-\sqrt{\frac{\varepsilon}{1+\varepsilon}},\sqrt{\frac{1}{1+\varepsilon}}\right)}+\frac{1}{2}\delta_{\left(\sqrt{\frac{\varepsilon}{1+\varepsilon}},\sqrt{\frac{1}{1+\varepsilon}}\right)}.$$

Then $\operatorname{supp}\nu^{\varepsilon}\subset\overline{B}(0,1)\subset\mathbb{R}^{2}$. Let $\pi^{B}:\mathbb{R}^{2}\rightarrow\overline{B}(0,1)$ be the projection onto $\overline{B}(0,1)$. Then via explicit computation one can verify that

$$SW(\pi^{B}_{\#}\mu^{\varepsilon},\nu^{\varepsilon})>SW(\mu^{\varepsilon},\nu^{\varepsilon})\quad\text{for sufficiently small }\varepsilon>0.$$

In light of this observation, the following question is nontrivial. Let $\mu,\nu\in\mathscr{P}_{2}(\mathbb{R}^{d})$ be supported in the closed unit ball $\overline{B}$ centered at $0$. Are the measures $\mu_{t}$ along an $\ell_{SW}$-geodesic from $\mu$ to $\nu$ supported in the same ball for all $t\in[0,1]$? This property sounds natural, since $\widehat{\mu}^{\theta},\widehat{\nu}^{\theta}$ are supported in $[-1,1]$ for all $\theta\in\mathbb{S}^{d-1}$, but it remains an open problem. ∎
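The strict inequality in Remark 4.7 can be checked numerically. The sketch below is our own illustration, not the computation referred to above: it assumes the normalization $SW^{2}(\mu,\nu)=\fint_{\mathbb{S}^{1}}W^{2}(\widehat{\mu}^{\theta},\widehat{\nu}^{\theta})\,d\theta$, replaces the average over $\mathbb{S}^{1}$ by an equispaced grid of angles, and uses that in one dimension the optimal coupling between two uniform discrete measures with equally many atoms matches the sorted atoms. The function names are hypothetical.

```python
import numpy as np

def sliced_w2_squared_2d(x, y, n_angles=200_000):
    """Approximate SW_2^2 between uniform discrete measures on R^2.

    x, y: arrays of shape (m, 2); every atom carries mass 1/m.
    The average over S^1 is replaced by an equispaced grid of angles, and
    each projected W_2^2 is computed by matching sorted projections.
    """
    phi = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    theta = np.stack([np.cos(phi), np.sin(phi)], axis=1)  # (n_angles, 2)
    px = np.sort(theta @ x.T, axis=1)                      # (n_angles, m)
    py = np.sort(theta @ y.T, axis=1)
    return np.mean((px - py) ** 2)

def remark_4_7_measures(eps):
    """Atoms of mu^eps, nu^eps and of the projection of mu^eps onto the unit ball."""
    r = np.sqrt(eps / (1.0 + eps))
    c = np.sqrt(1.0 / (1.0 + eps))
    mu = np.array([[0.0, -1.0], [0.0, np.sqrt(1.0 + eps)]])
    nu = np.array([[-r, c], [r, c]])
    # projection onto the closed unit ball rescales only points outside it
    proj_mu = mu / np.maximum(1.0, np.linalg.norm(mu, axis=1, keepdims=True))
    return mu, nu, proj_mu

for eps in (1e-2, 1e-3):
    mu, nu, proj_mu = remark_4_7_measures(eps)
    # positive difference means SW(pi_# mu, nu) > SW(mu, nu)
    print(eps, sliced_w2_squared_2d(proj_mu, nu) - sliced_w2_squared_2d(mu, nu))
```

As a sanity check of the normalization, for two Dirac masses the function returns approximately $|x-y|^{2}/2$, consistent with $SW(\delta_{x},\delta_{y})=|x-y|/\sqrt{d}$ in $d=2$.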

Recall that Example 2.5 established that, in general, $SW$-geodesics do not exist, whereas $\ell_{SW}$-geodesics between any $\mu,\nu\in\mathscr{P}_{2}(\mathbb{R}^{d})$ exist by Proposition 4.5. If $(\mathscr{P}_{2}(\mathbb{R}^{d}),SW)$ were a length space, we would have $SW=\ell_{SW}$, and every $\ell_{SW}$-geodesic would be an $SW$-geodesic. Thus we conclude this section with the following corollary.

Corollary 4.8.

$(\mathscr{P}_{2}(\mathbb{R}^{d}),SW)$ is not a length space.

5. Comparisons with negative Sobolev norms and the Wasserstein distance

We establish two comparison results: Theorem 5.2, near absolutely continuous measures, and Theorem 5.5, near discrete measures. The former states that for suitable absolutely continuous measures, $\ell_{SW}$ is equivalent to $SW$ and both are comparable to the $\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})$-norm, as a consequence of the averaging effect of the Radon transform. The latter states that, roughly speaking, $SW$ and $\ell_{SW}$ are very close to $\frac{1}{\sqrt{d}}W$ near discrete measures, as the smoothing effect of averaging does not take place there.

For absolutely continuous measures with densities bounded below by $a$ and above by $b$, Peyre [49] established the metric equivalence

$$\|\mu-\nu\|_{\dot{H}^{-1}(\mathbb{R}^{d})}\lesssim_{b}W(\mu,\nu)\lesssim_{a}\|\mu-\nu\|_{\dot{H}^{-1}(\mathbb{R}^{d})};$$

see [36, Proposition 2.8] for an earlier proof of the first inequality above. Our results can be seen as providing analogous comparisons between the SW distance and a norm in a Hilbert space. As $\ell_{SW}$ differs from $SW$, a question particular to our setup is whether the intrinsic distance $\ell_{SW}$ also admits such a comparison with the $\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})$ norm. We answer this affirmatively in Theorem 5.2.

Recall that a measure $\lambda\in\mathscr{P}(\mathbb{R}^{d})$ is log-concave if for any $t\in(0,1)$ and Borel measurable sets $A,B\subset\mathbb{R}^{d}$

$$\lambda((1-t)A+tB)\geq\lambda^{1-t}(A)\lambda^{t}(B).\tag{5.1}$$

We first prove a useful lemma, which relies on the facts that log-concavity is preserved under pushforward by the projection $x\mapsto x\cdot\theta$ [13], and that log-concave measures have log-concave densities [14].

Lemma 5.1.

Let $\lambda\in\mathscr{P}_{2}(\mathbb{R}^{d})$ be a log-concave measure. Let $\mu,\nu\in\mathscr{P}_{2}(\mathbb{R}^{d})$, and suppose there exists $b>0$ such that

$$\widehat{\mu}^{\theta},\widehat{\nu}^{\theta}\leq b\widehat{\lambda}^{\theta}\quad\text{for a.e. }\theta\in\mathbb{S}^{d-1}.\tag{5.2}$$

Then for a.e. $\theta\in\mathbb{S}^{d-1}$ the displacement interpolation $(\widehat{\mu}_{t}^{\theta})_{t\in[0,1]}$ from $\widehat{\mu}^{\theta}$ to $\widehat{\nu}^{\theta}$ satisfies the same upper bound, $\widehat{\mu}_{t}^{\theta}\leq b\widehat{\lambda}^{\theta}$, for all $t\in[0,1]$.

Proof.

As projection preserves log-concavity of a measure, each $\widehat{\lambda}^{\theta}$ is log-concave, and thus $\widehat{\lambda}^{\theta}$ either is a Dirac mass or has a log-concave density with respect to $\mathscr{L}^{1}$ [14, Theorem 3.2]. For $\theta\in\mathbb{S}^{d-1}$ such that $\widehat{\lambda}^{\theta}$ is a Dirac mass, the conclusion of this lemma is trivial, so we consider $\theta\in\mathbb{S}^{d-1}$ such that $\widehat{\lambda}^{\theta}\ll\mathscr{L}^{1}$. It follows that $\widehat{\mu}^{\theta},\widehat{\nu}^{\theta}\ll\mathscr{L}^{1}$, so we can fix the (nondecreasing) optimal transport map $T^{\theta}:\mathbb{R}\rightarrow\mathbb{R}$ pushing $\widehat{\mu}^{\theta}$ to $\widehat{\nu}^{\theta}$, and define

$$T^{\theta}_{t}(r)=(1-t)r+tT^{\theta}(r).$$

In the remainder of this proof, we identify the measures with their densities with respect to $\mathscr{L}^{1}$. The displacement interpolation is given by $\widehat{\mu}_{t}^{\theta}=(T_{t}^{\theta})_{\#}\widehat{\mu}^{\theta}$. For $t=1$ the claim is the assumption on $\widehat{\nu}^{\theta}$; for $t\in[0,1)$ the map $T_{t}^{\theta}$ is strictly increasing, so $\widehat{\mu}_{t}^{\theta}$ is concentrated on $T_{t}^{\theta}(\operatorname{supp}\widehat{\mu}^{\theta})$ and it suffices to show $\widehat{\mu}_{t}^{\theta}(T_{t}^{\theta}(r))\leq b\widehat{\lambda}^{\theta}(T_{t}^{\theta}(r))$ for $\widehat{\mu}^{\theta}$-a.e. $r\in\mathbb{R}$. Arguing as in the proof of [38, Proposition D.2],

$$\widehat{\mu}_{t}^{\theta}(T_{t}^{\theta}(r))=\frac{\widehat{\mu}^{\theta}(r)}{\partial_{r}T^{\theta}_{t}(r)}=\frac{\widehat{\mu}^{\theta}(r)}{1-t+t\frac{\widehat{\mu}^{\theta}(r)}{\widehat{\nu}^{\theta}(T^{\theta}(r))}}=\left(\frac{1-t}{\widehat{\mu}^{\theta}(r)}+\frac{t}{\widehat{\nu}^{\theta}(T^{\theta}(r))}\right)^{-1}\leq(\widehat{\mu}^{\theta}(r))^{1-t}(\widehat{\nu}^{\theta}(T^{\theta}(r)))^{t},$$

where we have used the weighted harmonic mean-geometric mean inequality. Combining this with (5.2) and the log-concavity of the density of $\widehat{\lambda}^{\theta}$, we obtain

$$\widehat{\mu}_{t}^{\theta}(T_{t}^{\theta}(r))\leq(\widehat{\mu}^{\theta}(r))^{1-t}(\widehat{\nu}^{\theta}(T^{\theta}(r)))^{t}\leq b(\widehat{\lambda}^{\theta}(r))^{1-t}(\widehat{\lambda}^{\theta}(T^{\theta}(r)))^{t}\leq b\widehat{\lambda}^{\theta}(T_{t}^{\theta}(r)).$$ ∎
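To see Lemma 5.1 at work in a case where everything is explicit, the sketch below uses our own hypothetical one-dimensional example: $\lambda=N(0,1)$, which is log-concave, $\widehat{\mu}=N(0,0.8^{2})$ and $\widehat{\nu}=N(1,0.6^{2})$. The monotone optimal map between two Gaussians is affine, so the Jacobian formula $\widehat{\mu}_{t}(T_{t}(r))=\widehat{\mu}(r)/\partial_{r}T_{t}(r)$ from the proof gives the interpolant in closed form. The code estimates $b$ on a grid and evaluates the ratio of the interpolant to $\lambda$ along the interpolation.

```python
import numpy as np

def gauss_pdf(x, m, s):
    """Density of the 1D Gaussian N(m, s^2)."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

# Hypothetical example: lambda = N(0, 1) (log-concave),
# mu = N(0, 0.8^2) and nu = N(1, 0.6^2), both bounded by b * lambda.
M_MU, S_MU, M_NU, S_NU = 0.0, 0.8, 1.0, 0.6

grid = np.linspace(-10.0, 10.0, 200_001)
lam = gauss_pdf(grid, 0.0, 1.0)
b = max(np.max(gauss_pdf(grid, M_MU, S_MU) / lam),
        np.max(gauss_pdf(grid, M_NU, S_NU) / lam))

def interpolant_ratio(t, r):
    """mu_t(T_t(r)) / lambda(T_t(r)), using mu_t(T_t(r)) = mu(r) / T_t'(r)."""
    slope = S_NU / S_MU
    T = M_NU + slope * (r - M_MU)   # monotone (optimal) map pushing mu to nu
    T_t = (1.0 - t) * r + t * T     # displacement interpolation map
    dT_t = (1.0 - t) + t * slope    # derivative of T_t (constant for Gaussians)
    return gauss_pdf(r, M_MU, S_MU) / dT_t / gauss_pdf(T_t, 0.0, 1.0)

r = np.linspace(-8.0, 8.0, 160_001)
worst = max(np.max(interpolant_ratio(t, r)) for t in np.linspace(0.0, 1.0, 21))
print(b, worst)  # Lemma 5.1 predicts worst <= b
```

In this example $b=\frac{1}{0.6}e^{1/1.28}\approx 3.64$ is attained by $\widehat{\nu}$, and the ratio along the interpolation stays below it, as the lemma predicts.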

Theorem 5.2 (Comparison between $\ell_{SW}$, $SW$ and the $\dot{H}^{-(d+1)/2}$-norm).

Let $\mu,\nu,\lambda\in\mathscr{P}_{2}(\mathbb{R}^{d})$, and let $0<a\leq b<\infty$ be such that

$$a\widehat{\lambda}^{\theta}\leq\widehat{\mu}^{\theta}\leq b\widehat{\lambda}^{\theta}\quad\text{and}\quad\widehat{\nu}^{\theta}\leq b\widehat{\lambda}^{\theta}\quad\text{for a.e. }\theta\in\mathbb{S}^{d-1}.\tag{5.3}$$

Then we have the following.

  • (i)

    If $\lambda$ is log-concave, then

    $$\ell_{SW}(\mu,\nu)\leq 2\sqrt{\frac{b}{a}}\,SW(\mu,\nu).\tag{5.4}$$
  • (ii)

    If $\lambda\ll\mathscr{L}^{d}$ with $C_{\lambda}:=\operatorname*{ess\,sup}_{r\in\mathbb{R},\,\theta\in\mathbb{S}^{d-1}}\frac{d\widehat{\lambda}^{\theta}}{d\mathscr{L}^{1}}(r)<\infty$, then

    $$\sqrt{\frac{1}{bC_{\lambda}}}\,\|\mu-\nu\|_{\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})}\leq SW(\mu,\nu).\tag{5.5}$$

    Suppose further that $\lambda=\lambda_{\Omega}:=\frac{1}{|\Omega|}\mathscr{L}^{d}\llcorner\Omega$ for an open, connected, bounded $\Omega\subset\mathbb{R}^{d}$, and that $\mu=\nu$ on $\Omega\setminus\tilde{\Omega}$ for some $\tilde{\Omega}\subset\subset\Omega$. Then there exists $C=C(d,\Omega,\operatorname{dist}(\tilde{\Omega},\partial\Omega))$ such that

    $$\sqrt{\frac{1}{bC_{\lambda}}}\,\|\mu-\nu\|_{\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})}\leq SW(\mu,\nu)\leq\ell_{SW}(\mu,\nu)\leq\frac{C}{\sqrt{a}}\,\|\mu-\nu\|_{\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})}.\tag{5.6}$$

In particular, if $\Omega$ in (ii) is also convex, then

$$\begin{aligned}\sqrt{\frac{1}{bC_{\lambda}}}\,\|\mu-\nu\|_{\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})}&\leq SW(\mu,\nu)\\&\leq\ell_{SW}(\mu,\nu)\leq\min\left\{2\sqrt{\frac{b}{a}}\,SW(\mu,\nu),\ \frac{C}{\sqrt{a}}\|\mu-\nu\|_{\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})}\right\}.\end{aligned}\tag{5.7}$$
Remark 5.3.

We make a few remarks on the conditions of Theorem 5.2. A simple and useful condition that implies (5.3) is the following:

$$a\lambda\leq\mu\leq b\lambda\quad\text{and}\quad\nu\leq b\lambda.\tag{5.8}$$

Indeed, projecting measures preserves their order. We note that (5.3) only requires the comparison to hold after integrating over hyperplanes.

The condition $C_{\lambda}<\infty$ is satisfied whenever $\lambda\ll\mathscr{L}^{d}$ is compactly supported and has bounded density. Indeed, denoting by $B_{k}(0,M)$ the ball of radius $M$ in $\mathbb{R}^{k}$ centered at $0$, if $\operatorname{supp}\lambda\subset B_{d}(0,M)$, then

$$\frac{d\widehat{\lambda}^{\theta}}{d\mathscr{L}^{1}}(r)=\int_{r\theta+\theta^{\perp}}\frac{d\lambda}{d\mathscr{L}^{d}}\,d\mathscr{L}^{d-1}\leq\left\|\frac{d\lambda}{d\mathscr{L}^{d}}\right\|_{\infty}\mathscr{L}^{d-1}(B_{d-1}(0,M)),$$

so $C_{\lambda}<\infty$ whenever $\|d\lambda/d\mathscr{L}^{d}\|_{\infty}<\infty$. However, $\lambda$ need not be compactly supported for $C_{\lambda}<\infty$; consider, for instance, a nondegenerate Gaussian measure on $\mathbb{R}^{d}$.

Observe also that the second part of (ii) requires connectedness but not convexity of $\Omega$, whereas the comparison of the $\dot{H}^{-1}$ norm with $W$ on $\mathbb{R}^{d}$ requires $\Omega$ to be convex [49]. This is because we only use displacement interpolation between 1D projections, and connected and convex sets coincide in $\mathbb{R}$. In fact, our proof only requires each projection $\Omega^{\theta}$, defined in (5.9), to be connected. However, to keep the statement simple we use the stronger assumption that $\Omega$ is connected. ∎

Proof.

Our proof is a careful adaptation of the argument by Peyre [49] to the sliced Wasserstein setting. The main difficulty comes from the fact that the densities of the projections $\widehat{\mu}^{\theta},\widehat{\nu}^{\theta}$ with respect to $\mathscr{L}^{1}$ are not bounded away from zero near the edges of their supports.

In this proof, we use $\Omega^{\theta}\subset\mathbb{R}$ to denote the projection of $\Omega\subset\mathbb{R}^{d}$ in the direction $\theta\in\mathbb{S}^{d-1}$, namely

$$\Omega^{\theta}=\{r\in\mathbb{R}:\ \exists x\in\Omega\text{ s.t. }r=x\cdot\theta\}.\tag{5.9}$$

Defining $\Omega$ to be the interior of $\operatorname{supp}\lambda$ in case (i), observe that in both cases (i) and (ii) $\Omega^{\theta}\subset\mathbb{R}$ is connected for each $\theta\in\mathbb{S}^{d-1}$, hence convex; in particular, the displacement interpolation between $\widehat{\mu}^{\theta}$ and $\widehat{\nu}^{\theta}$ remains in $\Omega^{\theta}$.

Noting that $\mathscr{L}^{d}\llcorner\Omega$ is log-concave whenever $\Omega$ is convex, (5.7) follows directly from (5.4) and (5.6). Thus it suffices to prove items (i) and (ii).

Step 1°. In this step we show that when $\lambda$ is log-concave, condition (5.3) implies the upper bound

$$\ell_{SW}(\mu,\nu)\leq 2\sqrt{\frac{b}{a}}\,SW(\mu,\nu).$$

We do this by comparing the $W$ distances of the projections along each $\theta$ with the corresponding weighted $\dot{H}^{-1}$ norms. Consider the linear interpolation $\tilde{\mu}_{t}=(1-t)\mu+t\nu$ and write $\tilde{\mu}_{t}^{\theta}=R_{\theta}\tilde{\mu}_{t}=(1-t)\widehat{\mu}^{\theta}+t\widehat{\nu}^{\theta}$. Then $\tilde{\mu}_{t}^{\theta}\geq(1-t)\widehat{\mu}^{\theta}$, and thus by duality $\|\cdot\|_{\dot{H}^{-1}(\tilde{\mu}_{t}^{\theta})}\leq(1-t)^{-1/2}\|\cdot\|_{\dot{H}^{-1}(\widehat{\mu}^{\theta})}$. Hence, using the Benamou-Brenier formula for each projection and $\int_{0}^{1}(1-t)^{-1/2}\,dt=2$, we have

$$\ell_{SW}(\mu,\nu)\leq\int_{0}^{1}\left(\fint_{\mathbb{S}^{d-1}}\|\widehat{\mu}^{\theta}-\widehat{\nu}^{\theta}\|_{\dot{H}^{-1}(\tilde{\mu}_{t}^{\theta})}^{2}\,d\theta\right)^{1/2}dt\leq 2\left(\fint_{\mathbb{S}^{d-1}}\|\widehat{\mu}^{\theta}-\widehat{\nu}^{\theta}\|_{\dot{H}^{-1}(\widehat{\mu}^{\theta})}^{2}\,d\theta\right)^{1/2}.\tag{5.10}$$

Note that (5.10) required no assumption on $\mu,\nu$. For each $\theta\in\mathbb{S}^{d-1}$ let $(\widehat{\mu}_{t}^{\theta})_{t\in[0,1]}$ be a constant speed $W$-geodesic from $\widehat{\mu}^{\theta}$ to $\widehat{\nu}^{\theta}$. By Lemma 5.1, $\widehat{\mu}_{t}^{\theta}\leq b\widehat{\lambda}^{\theta}$ for all $t\in[0,1]$. As $a\widehat{\lambda}^{\theta}\leq\widehat{\mu}^{\theta}$, we have

$$\|\cdot\|_{\dot{H}^{1}(\widehat{\mu}_{t}^{\theta})}\leq\sqrt{b}\,\|\cdot\|_{\dot{H}^{1}(\widehat{\lambda}^{\theta})}\leq\sqrt{\frac{b}{a}}\,\|\cdot\|_{\dot{H}^{1}(\widehat{\mu}^{\theta})},$$

and thus by duality

$$\|\cdot\|_{\dot{H}^{-1}(\widehat{\mu}^{\theta})}\leq\sqrt{\frac{b}{a}}\,\|\cdot\|_{\dot{H}^{-1}(\widehat{\mu}_{t}^{\theta})}.$$

As $(\widehat{\mu}_{t}^{\theta})_{t}$ is a constant speed geodesic, $\|\partial_{t}\widehat{\mu}_{t}^{\theta}\|_{\dot{H}^{-1}(\widehat{\mu}_{t}^{\theta})}=W(\widehat{\mu}^{\theta},\widehat{\nu}^{\theta})$, and thus

$$\|\widehat{\mu}^{\theta}-\widehat{\nu}^{\theta}\|_{\dot{H}^{-1}(\widehat{\mu}^{\theta})}\leq\int_{0}^{1}\|\partial_{t}\widehat{\mu}_{t}^{\theta}\|_{\dot{H}^{-1}(\widehat{\mu}^{\theta})}\,dt\leq\sqrt{\frac{b}{a}}\int_{0}^{1}\|\partial_{t}\widehat{\mu}_{t}^{\theta}\|_{\dot{H}^{-1}(\widehat{\mu}_{t}^{\theta})}\,dt=\sqrt{\frac{b}{a}}\,W(\widehat{\mu}^{\theta},\widehat{\nu}^{\theta}).$$

Hence

$$\ell_{SW}^{2}(\mu,\nu)\leq 4\fint_{\mathbb{S}^{d-1}}\|\widehat{\mu}^{\theta}-\widehat{\nu}^{\theta}\|_{\dot{H}^{-1}(\widehat{\mu}^{\theta})}^{2}\,d\theta\leq\frac{4b}{a}\fint_{\mathbb{S}^{d-1}}W^{2}(\widehat{\mu}^{\theta},\widehat{\nu}^{\theta})\,d\theta=\frac{4b}{a}SW^{2}(\mu,\nu).$$

Step 2°. In this step we establish the lower bound

$$\sqrt{\frac{1}{bC_{\lambda}}}\,\|\mu-\nu\|_{\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})}\leq SW(\mu,\nu)$$

under the assumption that $\lambda\ll\mathscr{L}^{d}$ and $C_{\lambda}<\infty$. By (5.3) and the definition of $C_{\lambda}$,

$$\widehat{\mu}^{\theta},\widehat{\nu}^{\theta}\leq b\widehat{\lambda}^{\theta}\leq bC_{\lambda}\mathscr{L}^{1}\quad\text{for a.e. }\theta\in\mathbb{S}^{d-1}.$$

As the (constant) density of $\mathscr{L}^{1}$ is log-concave, the proof of Lemma 5.1, which only uses the log-concavity of the density of the dominating measure, shows that the displacement interpolation $\widehat{\mu}_{t}^{\theta}$ between $\widehat{\mu}^{\theta}$ and $\widehat{\nu}^{\theta}$ satisfies $\widehat{\mu}^{\theta}_{t}\leq bC_{\lambda}\mathscr{L}^{1}$ for all $t\in[0,1]$ and a.e. $\theta\in\mathbb{S}^{d-1}$. Arguing as in [49, Theorem 5], we have

$$\|\widehat{\mu}^{\theta}-\widehat{\nu}^{\theta}\|_{\dot{H}^{-1}(\mathbb{R})}\leq\int_{0}^{1}\|\partial_{t}\widehat{\mu}_{t}^{\theta}\|_{\dot{H}^{-1}(\mathbb{R})}\,dt\leq\sqrt{bC_{\lambda}}\int_{0}^{1}\|\partial_{t}\widehat{\mu}_{t}^{\theta}\|_{\dot{H}^{-1}(\widehat{\mu}_{t}^{\theta})}\,dt=\sqrt{bC_{\lambda}}\,W(\widehat{\mu}^{\theta},\widehat{\nu}^{\theta}).$$
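In one dimension both sides of this estimate are explicit: $\|\sigma_{1}-\sigma_{2}\|_{\dot{H}^{-1}(\mathbb{R})}=\|F_{\sigma_{1}}-F_{\sigma_{2}}\|_{L^{2}(\mathbb{R})}$ for CDFs $F$ (the identity also used in Remark 5.4), and $W(\sigma_{1},\sigma_{2})=\|Q_{\sigma_{1}}-Q_{\sigma_{2}}\|_{L^{2}(0,1)}$ for quantile functions $Q$. The sketch below, a hypothetical example and not part of the proof, checks the resulting bound $\|\cdot\|_{\dot{H}^{-1}(\mathbb{R})}\leq\sqrt{M}\,W$ for two uniform densities bounded by $M=1$ (playing the role of $bC_{\lambda}$), using a midpoint rule.

```python
import numpy as np

def h_minus1_norm_1d(cdf_mu, cdf_nu, lo, hi, n=150_000):
    """||mu - nu||_{H^{-1}(R)} = ||F_mu - F_nu||_{L^2}, for measures supported in [lo, hi]."""
    x = lo + (hi - lo) * (np.arange(n) + 0.5) / n  # midpoints of n equal cells
    return np.sqrt((hi - lo) * np.mean((cdf_mu(x) - cdf_nu(x)) ** 2))

def w2_1d(quantile_mu, quantile_nu, n=150_000):
    """W_2 in 1D: W_2^2 = int_0^1 |Q_mu(u) - Q_nu(u)|^2 du (midpoint rule)."""
    u = (np.arange(n) + 0.5) / n
    return np.sqrt(np.mean((quantile_mu(u) - quantile_nu(u)) ** 2))

shift = 0.5  # mu = U[0, 1], nu = U[shift, 1 + shift]; both densities are <= M = 1
M = 1.0
h = h_minus1_norm_1d(lambda x: np.clip(x, 0.0, 1.0),
                     lambda x: np.clip(x - shift, 0.0, 1.0), 0.0, 1.0 + shift)
w = w2_1d(lambda u: u, lambda u: u + shift)
print(h, np.sqrt(M) * w)  # expected: h <= sqrt(M) * w
```

Here the exact values are $W=1/2$ and $\|\cdot\|_{\dot{H}^{-1}(\mathbb{R})}=(1/4-1/24)^{1/2}\approx 0.456$, so the bound holds with room to spare.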

By averaging over $\theta\in\mathbb{S}^{d-1}$ we obtain

$$\frac{1}{bC_{\lambda}}\|\widehat{\mu}-\widehat{\nu}\|_{\dot{H}^{-1}(\mathbb{P}_{d})}^{2}\leq SW^{2}(\mu,\nu).$$

Moreover, identifying the measures $\mu,\nu$ with their densities, we have $\mu-\nu\in L^{1}(\mathbb{R}^{d})$. Thus by the Fourier slicing theorem (Proposition A.2) and the change of variables $\xi=\theta\zeta$, we have

$$\begin{aligned}\|\widehat{\mu}-\widehat{\nu}\|_{\dot{H}^{-1}(\mathbb{P}_{d})}^{2}&=\frac{1}{2(2\pi)^{d-1}}\int_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}|\zeta|^{-2}|(\mathcal{F}_{1}R_{\theta}(\mu-\nu))(\zeta)|^{2}\,d\zeta\,d\theta\\&=\frac{1}{2}\int_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}|\zeta|^{-2}|\mathcal{F}_{d}(\mu-\nu)(\theta\zeta)|^{2}\,d\zeta\,d\theta\\&=\int_{\mathbb{R}^{d}}|\xi|^{-(d+1)}|\mathcal{F}_{d}(\mu-\nu)(\xi)|^{2}\,d\xi=\|\mu-\nu\|_{\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})}^{2}.\end{aligned}$$
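The passage from the second to the third line of this display is polar coordinates on the full line. Assuming, as in the display, that $d\theta$ denotes the (unnormalized) surface measure on $\mathbb{S}^{d-1}$, every $\xi\neq 0$ is reached twice, by $(\theta,\zeta)$ and by $(-\theta,-\zeta)$, so for any nonnegative measurable $g$ on $\mathbb{R}^{d}$

$$\int_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}|\zeta|^{d-1}g(\theta\zeta)\,d\zeta\,d\theta=2\int_{\mathbb{S}^{d-1}}\int_{0}^{\infty}\rho^{d-1}g(\rho\theta)\,d\rho\,d\theta=2\int_{\mathbb{R}^{d}}g(\xi)\,d\xi.$$

Applying this with $g(\xi)=|\xi|^{-(d+1)}|\mathcal{F}_{d}(\mu-\nu)(\xi)|^{2}$, for which $|\zeta|^{d-1}g(\theta\zeta)=|\zeta|^{-2}|\mathcal{F}_{d}(\mu-\nu)(\theta\zeta)|^{2}$, shows that the double integral over $\mathbb{S}^{d-1}\times\mathbb{R}$ equals twice the integral over $\mathbb{R}^{d}$.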

Step 3°. We show (5.6) under the additional assumption that $\lambda=\lambda_{\Omega}:=\frac{1}{|\Omega|}\mathscr{L}^{d}\llcorner\Omega$ for some open, bounded, connected $\Omega\subset\mathbb{R}^{d}$ and $\mu=\nu$ in $\Omega\setminus\tilde{\Omega}$. Let $\alpha:=\operatorname{dist}(\tilde{\Omega},\partial\Omega)$. Then

$$\widehat{\mu}^{\theta}=\widehat{\nu}^{\theta}\text{ on }\Big(\inf_{r\in\Omega^{\theta}}r,\ \inf_{r\in\Omega^{\theta}}r+\alpha\Big)\cup\Big(\sup_{r\in\Omega^{\theta}}r-\alpha,\ \sup_{r\in\Omega^{\theta}}r\Big)\quad\text{for each }\theta\in\mathbb{S}^{d-1}.$$

Since both projections have unit mass, it follows that $F_{\widehat{\mu}^{\theta}}-F_{\widehat{\nu}^{\theta}}$ vanishes outside $[\inf_{r\in\Omega^{\theta}}r+\alpha,\sup_{r\in\Omega^{\theta}}r-\alpha]$. Recall that $\frac{d\widehat{\mu}^{\theta}}{d\mathscr{L}^{1}}=\frac{d\widehat{\mu}^{\theta}}{d\widehat{\lambda}^{\theta}_{\Omega}}\frac{d\widehat{\lambda}^{\theta}_{\Omega}}{d\mathscr{L}^{1}}\geq a\frac{d\widehat{\lambda}^{\theta}_{\Omega}}{d\mathscr{L}^{1}}$. Furthermore, as $\Omega$ is open and connected, $\frac{d\widehat{\lambda}^{\theta}_{\Omega}}{d\mathscr{L}^{1}}>0$ on the interval $(\inf_{r\in\Omega^{\theta}}r,\sup_{r\in\Omega^{\theta}}r)$. Thus there exists a constant $C>0$ depending only on $\Omega,\alpha,d$ such that

$$\widehat{\mu}^{\theta}\geq a\widehat{\lambda}_{\Omega}^{\theta}\geq aC\mathscr{L}^{1}\quad\text{on }\Big[\inf_{r\in\Omega^{\theta}}r+\alpha,\ \sup_{r\in\Omega^{\theta}}r-\alpha\Big].$$

Using the one-dimensional identity $\|\widehat{\mu}^{\theta}-\widehat{\nu}^{\theta}\|_{\dot{H}^{-1}(\widehat{\mu}^{\theta})}=\|F_{\widehat{\mu}^{\theta}}-F_{\widehat{\nu}^{\theta}}\|_{L^{2}((\widehat{\mu}^{\theta})^{-1})}$, whose integrand vanishes outside the interval above, and combining this lower bound with (5.10), we can find some $C=C(\Omega,\alpha,d)>0$ such that

$$\ell_{SW}^{2}(\mu,\nu)\leq 4\fint_{\mathbb{S}^{d-1}}\|\widehat{\mu}^{\theta}-\widehat{\nu}^{\theta}\|_{\dot{H}^{-1}(\widehat{\mu}^{\theta})}^{2}\,d\theta\leq\frac{C^{2}}{a}\fint_{\mathbb{S}^{d-1}}\|\widehat{\mu}^{\theta}-\widehat{\nu}^{\theta}\|_{\dot{H}^{-1}(\mathbb{R})}^{2}\,d\theta.$$

By the Radon isometry argument as in Step 2, we deduce

$$\ell_{SW}^{2}(\mu,\nu)\leq\frac{C^{2}}{a}\|\widehat{\mu}-\widehat{\nu}\|_{\dot{H}^{-1}(\mathbb{P}_{d})}^{2}=\frac{C^{2}}{a}\|\mu-\nu\|_{\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})}^{2}.$$

Together with Step 2 and $SW\leq\ell_{SW}$, this proves (5.6). ∎

Remark 5.4.

We make a few further remarks. First, the lower bound

$$\|\mu-\nu\|_{\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})}\lesssim SW(\mu,\nu)$$

does not hold for general measures. Indeed, while the Sobolev embedding theorem implies $\delta_{0}\in H^{-(d+1)/2}(\mathbb{R}^{d})$, the same is not true for $\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})$. Recalling that $\mathcal{F}_{d}\delta_{0}$ is a constant, we see that for any $f\in\mathcal{S}(\mathbb{R}^{d})$

$$f(0)=\int_{\mathbb{R}^{d}}f(x)\,d\delta_{0}(x)=C\int_{\mathbb{R}^{d}}(\mathcal{F}_{d}f)(\xi)\,d\xi.$$

By (A.13), if $\delta_{0}\in\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})$ then the right-hand side must be controlled by $\|f\|_{\dot{H}^{(d+1)/2}(\mathbb{R}^{d})}$, which is not true: for the increasingly spread-out Gaussians $f_{R}(x)=e^{-|x|^{2}/R^{2}}$ we have $f_{R}(0)=1$, while scaling gives $\|f_{R}\|_{\dot{H}^{(d+1)/2}(\mathbb{R}^{d})}=R^{-1/2}\|f_{1}\|_{\dot{H}^{(d+1)/2}(\mathbb{R}^{d})}\to 0$ as $R\to\infty$. Similarly, for $x\neq y$ a scaling argument gives $\|\delta_{x}-\delta_{y}\|_{\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})}=c_{d}|x-y|^{1/2}$ for some $c_{d}\in(0,\infty)$, whereas $SW(\delta_{x},\delta_{y})=\frac{1}{\sqrt{d}}|x-y|$; hence the lower bound fails as $|x-y|\to 0$.

We further note that the upper bound in (5.6) requires the additional condition that $\mu=\nu$ near the boundary $\partial\Omega$. Indeed, letting $\lambda=c\mathscr{L}^{d}\llcorner\Omega\in\mathscr{P}_{2}(\mathbb{R}^{d})$ for some bounded domain $\Omega\subset\mathbb{R}^{d}$, the density of $\widehat{\lambda}^{\theta}$ is not bounded away from zero; this makes it difficult to control $\|\widehat{\mu}^{\theta}-\widehat{\nu}^{\theta}\|_{\dot{H}^{-1}(\widehat{\lambda}^{\theta})}$ by $\|\widehat{\mu}^{\theta}-\widehat{\nu}^{\theta}\|_{\dot{H}^{-1}(\mathbb{R})}$. Indeed, denoting by $F_{\sigma}$ the cumulative distribution function (CDF) of $\sigma\in\mathscr{P}(\mathbb{R})$, the condition $a\lambda\leq\mu,\nu\leq b\lambda$ does not in general guarantee

$$\|F_{\widehat{\mu}^{\theta}}-F_{\widehat{\nu}^{\theta}}\|_{L^{2}((\widehat{\lambda}^{\theta})^{-1})}=\|\widehat{\mu}^{\theta}-\widehat{\nu}^{\theta}\|_{\dot{H}^{-1}(\widehat{\lambda}^{\theta})}\lesssim_{\lambda,a,b}\|\widehat{\mu}^{\theta}-\widehat{\nu}^{\theta}\|_{\dot{H}^{-1}(\mathbb{R})}=\|F_{\widehat{\mu}^{\theta}}-F_{\widehat{\nu}^{\theta}}\|_{L^{2}(\mathbb{R})}.$$

To see this, let $d\geq 2$ and $\Omega=B(0,1)$. As $\lambda$ is radially symmetric, without loss of generality we restrict our attention to the projection onto the $e_{1}$ direction. Then $\widehat{\lambda}^{e_{1}}(r)=c(1-r^{2})^{(d-1)/2}$ for $-1\leq r\leq 1$ and a normalizing constant $c>0$; in particular, $\widehat{\lambda}^{e_{1}}(r)$ behaves like $(r+1)^{(d-1)/2}$ as $r\searrow-1$. Consider one-dimensional measures $\mu_{h},\nu_{h}\in\mathscr{P}_{2}(\mathbb{R})$ whose CDFs satisfy $F_{\mu_{h}}=F_{\nu_{h}}$ on $[-1+2h,1]$, while $\mu_{h}=b\widehat{\lambda}^{e_{1}}$ and $\nu_{h}=a\widehat{\lambda}^{e_{1}}$ on $[-1,-1+h]$; this is possible by prescribing suitable behavior on the interval $(-1+h,-1+2h)$. Then $F_{\mu_{h}}-F_{\nu_{h}}=(b-a)F_{\widehat{\lambda}^{e_{1}}}$ on $[-1,-1+h]$. From direct calculations one can check that

$$\frac{\|\mu_{h}-\nu_{h}\|_{\dot{H}^{-1}(\widehat{\lambda}^{e_{1}})}^{2}}{\|\mu_{h}-\nu_{h}\|_{\dot{H}^{-1}(\mathbb{R})}^{2}}\gtrsim_{d}h^{-\frac{d-1}{2}}\xrightarrow[h\searrow 0]{}\infty.$$

Using the radial symmetry of $\lambda$, one can construct measures in $\mathscr{P}_{2}(\mathbb{R}^{d})$ whose projections satisfy similar estimates. ∎

We now study the behavior of $SW$ around discrete measures. The $\infty$-Wasserstein distance $W_{\infty}$ is defined by

$$W_{\infty}(\mu,\nu):=\inf_{\gamma\in\Gamma(\mu,\nu)}\ \gamma\text{-}\operatorname*{ess\,sup}_{(x,y)}|x-y|.\tag{5.11}$$

We have seen that for any $x\in\mathbb{R}^{d}$, $SW(\nu,\delta_{x})=\frac{1}{\sqrt{d}}W(\nu,\delta_{x})$. Similarly, if $\mu$ is a discrete measure with support $\{y_{i}\}_{i=1,\dots,n}$ and $W_{\infty}(\mu,\nu)$ is sufficiently small, any $\infty$-optimal transport map must send all the mass of $\nu$ near $y_{i}\in\operatorname{supp}\mu$ to $y_{i}$. Moreover, for most directions $\theta$ the same is true at the level of projections. This allows us to show that within $W_{\infty}$-balls around a discrete measure, the $SW$ metric is well approximated by $\frac{1}{\sqrt{d}}W$.

Theorem 5.5.

Assume $\mu$ is a discrete probability measure, $\mu=\sum_{i=1}^{n}m_{i}\delta_{y_{i}}$, where all masses are positive and all points are distinct. Let $l_{\mu}=\min_{i\neq j}|y_{i}-y_{j}|$. Then there exists $C\geq 1$ depending only on $d$ such that if $W_{\infty}(\mu,\nu)<\frac{l_{\mu}}{4Cn}$, then

$$0\leq\frac{1}{d}W^{2}(\mu,\nu)-SW^{2}(\mu,\nu)\leq\frac{4Cn}{l_{\mu}}W_{\infty}(\mu,\nu)\,SW^{2}(\mu,\nu).\tag{5.12}$$

Thus, we have the comparison

$$SW^{2}(\mu,\nu)\leq\ell_{SW}^{2}(\mu,\nu)\leq\frac{1}{d}W^{2}(\mu,\nu)\leq\Big(1+\frac{4Cn}{l_{\mu}}W_{\infty}(\mu,\nu)\Big)SW^{2}(\mu,\nu).\tag{5.13}$$
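A small numerical illustration of (5.12), with measures of our own (hypothetical) choosing: in $d=2$, take $\mu=\frac{1}{2}(\delta_{(0,0)}+\delta_{(1,0)})$, so $n=2$ and $l_{\mu}=1$, and let $\nu$ move the two atoms vertically by $\pm t$ with $t<1/2$. Then $W_{\infty}(\mu,\nu)=t$, and since each atom of $\nu$ is sent to its nearest atom of $\mu$, $W^{2}(\mu,\nu)=t^{2}$. The sketch approximates $SW^{2}$ by sorted projections on an equispaced grid of angles. It only illustrates that the relative gap $(\frac{1}{d}W^{2}-SW^{2})/SW^{2}$ is nonnegative and, in this example, shrinks roughly in proportion to $t$; it does not compute the constant $C$.

```python
import numpy as np

def sw2_squared_2d(x, y, n_angles=200_000):
    """Approximate SW_2^2 between uniform discrete measures on R^2 with equally many atoms.

    Averages, over an equispaced grid of angles, the 1D W_2^2 between sorted
    projections (sorting gives the optimal 1D coupling).
    """
    phi = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    theta = np.stack([np.cos(phi), np.sin(phi)], axis=1)
    px = np.sort(theta @ x.T, axis=1)
    py = np.sort(theta @ y.T, axis=1)
    return np.mean((px - py) ** 2)

def relative_gap(t):
    """((1/d) W^2 - SW^2) / SW^2 for the hypothetical perturbation of size t < 1/2."""
    mu = np.array([[0.0, 0.0], [1.0, 0.0]])  # n = 2 atoms, l_mu = 1
    nu = np.array([[0.0, t], [1.0, -t]])     # W_infty(mu, nu) = t
    # each atom of nu goes to its nearest atom of mu, which is optimal for
    # the quadratic cost, so W^2(mu, nu) = t^2
    w2_sq = np.mean(np.sum((mu - nu) ** 2, axis=1))
    sw_sq = sw2_squared_2d(mu, nu)
    d = 2
    return (w2_sq / d - sw_sq) / sw_sq

for t in (0.1, 0.03, 0.01):
    print(t, relative_gap(t))
```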
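Proof.

Let $\delta<\frac{l_{\mu}}{2}$. We claim that we can find $C=C(d)\geq 1$ such that

$$\frac{1}{d}W^{2}(\mu,\nu)-SW^{2}(\mu,\nu)\leq\frac{2Cn\delta}{l_{\mu}}W^{2}(\mu,\nu)\quad\text{whenever }W_{\infty}(\mu,\nu)\leq\delta.\tag{5.14}$$

The desired estimate (5.12) follows from (5.14). Indeed, let $C_{0}=C_{0}(d)\geq 1$ denote the constant in (5.14), and set $C:=dC_{0}$ in (5.12). If $W_{\infty}(\mu,\nu)<\frac{l_{\mu}}{4Cn}$, we apply (5.14) with $\delta=W_{\infty}(\mu,\nu)<\frac{l_{\mu}}{2}$ and set $\varepsilon:=\frac{2Cn\delta}{l_{\mu}}<\frac{1}{2}$. Then (5.14) reads $\frac{1}{d}W^{2}-SW^{2}\leq\varepsilon\,\frac{1}{d}W^{2}$, that is, $\frac{1}{d}W^{2}\leq(1-\varepsilon)^{-1}SW^{2}\leq(1+2\varepsilon)SW^{2}$, where we used that $(1-\varepsilon)^{-1}\leq 1+2\varepsilon$ for $\varepsilon\leq\frac{1}{2}$. This is the upper bound in (5.12); the lower bound follows from $SW\leq\frac{1}{\sqrt{d}}W$.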

Thus it remains to prove (5.14). To this end, let $\gamma^{\infty}\in\Gamma^{\infty}_{o}(\nu,\mu)$ be an optimal plan for $W_{\infty}$. As $\delta<\frac{l_{\mu}}{2}$, each $x\in\operatorname{supp}\nu$ lies within distance $\delta$ of exactly one atom of $\mu$, so $\gamma^{\infty}=(\operatorname{Id}\times T^{\infty})_{\#}\nu$, where $T^{\infty}(x)$ is the atom of $\mu$ nearest to $x$. Since $T^{\infty}$ sends every point to its nearest atom, it is also the optimal transport map for the quadratic cost, and it satisfies

$$\|T^{\infty}-\operatorname{Id}\|_{L^{\infty}(\nu)}\leq\delta.$$

For each $\theta\in\mathbb{S}^{d-1}$, let $\gamma^{\theta}\in\Gamma^{2}_{o}(\widehat{\nu}^{\theta},\widehat{\mu}^{\theta})$. For each $x\in\operatorname{supp}\nu$, define the set of angles $A_{x}$ where $\gamma^{\theta}$ differs from the 1D coupling induced by $T^{\infty}$, i.e.

$$A_{x}:=\{\theta\in\mathbb{S}^{d-1}:\ \exists(x\cdot\theta,y^{\theta})\in\operatorname{supp}\gamma^{\theta}\text{ s.t. }|y^{\theta}-x\cdot\theta|<|\theta\cdot(T^{\infty}(x)-x)|\}.$$

To control $\frac{1}{d}W^{2}(\mu,\nu)-SW^{2}(\mu,\nu)$, it suffices to control the size of $A_{x}$, as

$$\begin{aligned}\frac{1}{d}W^{2}(\mu,\nu)-SW^{2}(\mu,\nu)&\leq\frac{1}{d}\|T^{\infty}-\operatorname{Id}\|_{L^{2}(\nu)}^{2}-SW^{2}(\mu,\nu)\\&=\int_{\mathbb{R}^{d}}\fint_{\mathbb{S}^{d-1}}|\theta\cdot(T^{\infty}(x)-x)|^{2}\,d\theta\,d\nu(x)-\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}\times\mathbb{R}}|y^{\theta}-x^{\theta}|^{2}\,d\gamma^{\theta}(x^{\theta},y^{\theta})\,d\theta\\&\leq\int_{\mathbb{R}^{d}}\frac{1}{|\mathbb{S}^{d-1}|}\int_{A_{x}}|\theta\cdot(T^{\infty}(x)-x)|^{2}\,d\theta\,d\nu(x).\end{aligned}$$

As TIdL(ν)δ\|T_{\infty}-\operatorname{Id}\|_{L^{\infty}(\nu)}\leq\delta,

$$\begin{aligned}
A_{x}&\subset\bigcup_{i=1}^{n}\{\theta\in\mathbb{S}^{d-1}:\ |y_{i}\cdot\theta-x\cdot\theta|<|\theta\cdot(T^{\infty}(x)-x)|\}\\
&\subset\bigcup_{\substack{y_{i}\in\operatorname{supp}\mu\\ y_{i}\neq T^{\infty}(x)}}\{\theta\in\mathbb{S}^{d-1}:\ |y_{i}\cdot\theta-x\cdot\theta|<\delta\}\subset\bigcup_{\substack{y_{i}\in\operatorname{supp}\mu\\ y_{i}\neq T^{\infty}(x)}}\left\{\theta\in\mathbb{S}^{d-1}:\ \frac{l_{\mu}}{2}\left|\theta\cdot\frac{y_{i}-x}{|y_{i}-x|}\right|<\delta\right\},
\end{aligned}$$

where we have used that, for $y_{i}\neq T^{\infty}(x)$,

$$|y_{i}-x|\geq|T^{\infty}(x)-y_{i}|-|T^{\infty}(x)-x|\geq l_{\mu}-\delta>\frac{l_{\mu}}{2}.$$

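Since, for every unit vector $e$, the normalized measure of the band $\{\theta\in\mathbb{S}^{d-1}:|\theta\cdot e|<\varepsilon\}$ is at most $C\varepsilon$ for some $C=C(d)$, a union bound over the at most $n$ points $y_{i}$ gives

$$\frac{|A_{x}|}{|\mathbb{S}^{d-1}|}\leq\frac{2Cn\delta}{l_{\mu}}.$$

As $T^{\infty}$ is also the optimal transport map for the quadratic cost, and $|\theta\cdot(T^{\infty}(x)-x)|\leq|T^{\infty}(x)-x|$,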

$$\begin{aligned}
\frac{1}{d}W^{2}(\mu,\nu)-SW^{2}(\mu,\nu)&\leq\int_{\mathbb{R}^{d}}\frac{1}{|\mathbb{S}^{d-1}|}\int_{A_{x}}|\theta\cdot(T^{\infty}(x)-x)|^{2}\,d\theta\,d\nu(x)\\
&\leq\int_{\mathbb{R}^{d}}\frac{|A_{x}|}{|\mathbb{S}^{d-1}|}\,|T^{\infty}(x)-x|^{2}\,d\nu(x)\leq\frac{2Cn\delta}{l_{\mu}}W^{2}(\mu,\nu).
\end{aligned}$$

As this is precisely our claim (5.14), we conclude the proof. ∎
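The two geometric facts behind this proof, the identity $\fint_{\mathbb{S}^{d-1}}|\theta\cdot v|^{2}\,d\theta=|v|^{2}/d$ (which produces the factor $\frac{1}{d}$) and the band estimate $|\{\theta:|\theta\cdot e|<\varepsilon\}|/|\mathbb{S}^{d-1}|\leq C(d)\varepsilon$, can be sanity-checked by Monte Carlo. The following Python/NumPy snippet is a minimal sketch under our own choices: the dimension $d=3$, the vector $v$, the sample size and the seed are arbitrary, and the helper names are hypothetical. In $d=3$ the projection $\theta\cdot e$ is uniform on $[-1,1]$, so the band fraction should be close to $\varepsilon$ itself.

```python
import numpy as np


def sample_sphere(num, d, rng):
    """Uniform samples on S^{d-1}: normalized standard Gaussian vectors."""
    g = rng.standard_normal((num, d))
    return g / np.linalg.norm(g, axis=1, keepdims=True)


def mean_sq_projection(v, thetas):
    """Monte Carlo estimate of the spherical average of |theta . v|^2."""
    return float(np.mean((thetas @ v) ** 2))


def band_fraction(e, eps, thetas):
    """Fraction of directions theta with |theta . e| < eps, for a unit vector e."""
    return float(np.mean(np.abs(thetas @ e) < eps))


rng = np.random.default_rng(0)
d = 3
thetas = sample_sphere(200_000, d, rng)

v = np.array([1.0, -2.0, 0.5])
avg = mean_sq_projection(v, thetas)
print(f"MC average {avg:.4f} vs |v|^2/d = {v @ v / d:.4f}")

# In d = 3 the projection theta . e is uniform on [-1, 1], so the band has mass eps.
e = np.array([0.0, 0.0, 1.0])
for eps in (0.01, 0.05, 0.1):
    print(f"eps={eps}: band fraction {band_fraction(e, eps, thetas):.4f}")
```

In higher dimensions the band fraction is still linear in $\varepsilon$ for small $\varepsilon$, with a constant depending on $d$.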

6. Statistical properties of the sliced Wasserstein length

In this section we investigate the approximation error, in the $\ell_{SW}$ distance, between an absolutely continuous measure $\mu$ and the empirical measure of its i.i.d. samples, $\mu^{n}=\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}$ with $X_{i}\overset{i.i.d.}{\sim}\mu$. The parametric rate of estimation for $SW$ has already been observed, for instance by Manole, Balakrishnan, and Wasserman [37, Proposition 4], in the form $\mathbb{E}\,SW(\mu^{n},\mu)\lesssim n^{-1/2}$.

The main result of this section is Theorem 6.3, which shows that the corresponding concentration result holds for the $\ell_{SW}$ distance, namely that

$$\ell_{SW}(\mu,\mu^{n})\lesssim\sqrt{\frac{\log n}{n}}\quad\text{with high probability.}$$

Note that this directly implies $SW(\mu,\mu^{n})\lesssim\sqrt{\log n/n}$ with high probability, which, to the best of our knowledge, is also new. While proving $SW(\mu,\mu^{n})\lesssim\sqrt{\log n/n}$ only requires showing that the estimation of one-dimensional Wasserstein distances holds in an integrated form over all projections to lines, estimating $\ell_{SW}(\mu,\mu^{n})$ requires constructing curves of length $O(\sqrt{\log n/n})$ connecting $\mu$ and $\mu^{n}$. We first provide a geometric intuition as to why this is to be expected. If $\partial_{t}\mu_{t}+\nabla\cdot(v_{t}\mu_{t})=0$, with $v_{t}$ in the tangent space of the Wasserstein space at $\mu_{t}$, then

$$\|\partial_{t}\mu_{t}\|_{\dot{H}^{-1}(\mu_{t})}=\sup_{\|\varphi\|_{\dot{H}^{1}(\mu_{t})}\leq 1}\int_{\mathbb{R}^{d}}\varphi\,\partial_{t}\mu_{t}\,dx=\sup_{\|\varphi\|_{\dot{H}^{1}(\mu_{t})}\leq 1}\int_{\mathbb{R}^{d}}v_{t}\cdot\nabla\varphi\,d\mu_{t}=\|v_{t}\|_{L^{2}(\mu_{t})}=|\mu^{\prime}|_{W}(t),$$

where the second equality follows by integrating by parts in $\int\varphi\,\partial_{t}\mu_{t}\,dx=-\int\varphi\,\nabla\cdot(v_{t}\mu_{t})\,dx$, and the third from the Cauchy–Schwarz inequality together with the choice of $\varphi$ such that $\nabla\varphi=v_{t}/\|v_{t}\|_{L^{2}(\mu_{t})}$ (possible, up to approximation, since tangent vectors are limits of gradients). Thus

$$|\mu^{\prime}|_{SW}^{2}(t)=\fint_{\mathbb{S}^{d-1}}|(\widehat{\mu}^{\theta})^{\prime}|_{W}^{2}(t)\,d\theta=\|\partial_{t}\widehat{\mu}_{t}\|_{\dot{H}^{-1}(\widehat{\mu}_{t})}^{2}.$$

From (2.9) we know that the Radon transform is an isometry from $\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})$ to $\dot{H}^{-1}(\mathbb{P}_{d})$. We note that the related Sobolev space $H^{(d+1)/2}(\mathbb{R}^{d})$ is a Reproducing Kernel Hilbert Space (RKHS), since $(d+1)/2>d/2$. Heuristically, we can view $(\mathscr{P}_{2}(\mathbb{R}^{d}),\ell_{SW})$ as having an RKHS as a dual at each point $\mu\in\mathscr{P}_{2}(\mathbb{R}^{d})$. It is important to note that dual metrics of RKHS norms, also known as Maximum Mean Discrepancies (MMD), can be approximated at the parametric rate [61]. Thus it is reasonable to expect that the same holds for the nonlinear analogue, $\ell_{SW}$.

We will see that, under suitable assumptions, considering the linear interpolation between $\mu$ and $\mu^{n}$ is sufficient to establish the parametric rate of estimation in $\ell_{SW}$. Recall that in (5.10) we established

$$\ell_{SW}(\mu,\nu)\leq 2\fint_{\mathbb{S}^{d-1}}\|\widehat{\mu}^{\theta}-\widehat{\nu}^{\theta}\|_{\dot{H}^{-1}(\widehat{\mu}^{\theta})}\,d\theta.$$

Take $\nu=\mu^{n}$ and suppose $\widehat{\mu}^{\theta}\ll\mathscr{L}^{1}$ for a.e. $\theta\in\mathbb{S}^{d-1}$. Write $d\widehat{\mu}^{\theta}=f^{\theta}\,d\mathscr{L}^{1}$ and let $F^{\theta}$ and $F_{n}^{\theta}$ denote the cumulative distribution functions (CDFs) of $R_{\theta}\mu$ and $R_{\theta}\mu^{n}$, respectively. Then, for each test function $\varphi\in\mathcal{S}(\mathbb{R})$, we have

$$-\int_{\mathbb{R}}\varphi\,d(R_{\theta}(\mu-\mu^{n}))=\int_{\mathbb{R}}(F^{\theta}(r)-F_{n}^{\theta}(r))\varphi^{\prime}(r)\,dr\leq\|\varphi\|_{\dot{H}^{1}(\widehat{\mu}^{\theta})}\left(\int_{\mathbb{R}}\frac{|F^{\theta}(r)-F_{n}^{\theta}(r)|^{2}}{f^{\theta}(r)}\,dr\right)^{1/2},$$

since the (weak) derivative of $F^{\theta}-F^{\theta}_{n}$ is $R_{\theta}\mu-R_{\theta}\mu^{n}$ and the boundary term from integration by parts vanishes, as $\lim_{|r|\to\infty}(F^{\theta}(r)-F^{\theta}_{n}(r))=0$; the inequality is the Cauchy–Schwarz inequality applied to $\varphi^{\prime}\sqrt{f^{\theta}}$ and $(F^{\theta}-F^{\theta}_{n})/\sqrt{f^{\theta}}$. Thus from (2.10) and Jensen's inequality we conclude

$$\ell_{SW}^{2}(\mu,\mu^{n})\leq 4\fint_{\mathbb{S}^{d-1}}\|R_{\theta}\mu-R_{\theta}\mu^{n}\|_{\dot{H}^{-1}(\widehat{\mu}^{\theta})}^{2}\,d\theta\leq 4\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}\frac{|F^{\theta}(r)-F_{n}^{\theta}(r)|^{2}}{f^{\theta}(r)}\,dr\,d\theta.$$
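The right-hand side of this bound can be evaluated numerically. The sketch below assumes, purely for illustration, that $\mu$ is the uniform measure on the unit disk in $\mathbb{R}^{2}$, whose projections all have density $f(r)=\frac{2}{\pi}\sqrt{1-r^{2}}$ on $(-1,1)$ and CDF $F(r)=\frac{1}{2}+\frac{1}{\pi}(r\sqrt{1-r^{2}}+\arcsin r)$. It approximates $4\fint_{\mathbb{S}^{1}}\int_{\mathbb{R}}|F^{\theta}-F^{\theta}_{n}|^{2}/f^{\theta}\,dr\,d\theta$ by midpoint quadrature in $r$ and equally spaced angles; the function names and grid sizes are hypothetical choices, and the output is only an upper bound for $\ell_{SW}^{2}(\mu,\mu^{n})$, not $\ell_{SW}^{2}$ itself. Since $\mathbb{E}|F^{\theta}(r)-F^{\theta}_{n}(r)|^{2}=F^{\theta}(r)(1-F^{\theta}(r))/n$, the printed values should decay roughly like $1/n$.

```python
import numpy as np


def disk_cdf(r):
    """CDF of x . theta when x is uniform on the unit disk (same for every unit theta)."""
    r = np.clip(r, -1.0, 1.0)
    return 0.5 + (r * np.sqrt(1.0 - r**2) + np.arcsin(r)) / np.pi


def disk_pdf(r):
    """Density (2/pi) sqrt(1 - r^2) of the same projection, zero outside (-1, 1)."""
    return (2.0 / np.pi) * np.sqrt(np.clip(1.0 - r**2, 0.0, None))


def sample_disk(n, rng):
    """n i.i.d. uniform points in the unit disk (radius sqrt(U) gives uniform area)."""
    rad = np.sqrt(rng.uniform(size=n))
    ang = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.column_stack([rad * np.cos(ang), rad * np.sin(ang)])


def lsw_sq_upper_bound(samples, num_dirs=64, grid_size=20_000):
    """Quadrature for 4 * avg_theta int |F - F_n|^2 / f dr.

    Uses a midpoint grid on (-1, 1), which avoids the endpoints where f = 0,
    and equally spaced angles in [0, pi) (theta and -theta give the same integral).
    """
    edges = np.linspace(-1.0, 1.0, grid_size + 1)
    r = 0.5 * (edges[:-1] + edges[1:])
    dr = edges[1] - edges[0]
    F, f = disk_cdf(r), disk_pdf(r)
    total = 0.0
    for a in np.linspace(0.0, np.pi, num_dirs, endpoint=False):
        theta = np.array([np.cos(a), np.sin(a)])
        proj = np.sort(samples @ theta)
        Fn = np.searchsorted(proj, r, side="right") / proj.size  # empirical CDF at r
        total += np.sum((F - Fn) ** 2 / f) * dr
    return 4.0 * total / num_dirs


rng = np.random.default_rng(1)
for n in (500, 2000, 8000):
    print(n, lsw_sq_upper_bound(sample_disk(n, rng)))
```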

Thus the key is to bound $|F^{\theta}-F_{n}^{\theta}|^{2}$ uniformly in a way that remains compatible with division by $f^{\theta}$, which can decay rapidly near the boundary of the support. This can be done using the relative VC inequality due to Vapnik and Chervonenkis [64, Theorem 1] (see also [63, Chapter 3]). We state below the version of the relative VC inequality that can be found in [3, Theorem 2.1] and [22, Exercise 3.3]. The theorem provides an upper bound in terms of the shattering number (also known as the growth function or the shattering coefficient) of a class of sets, which quantifies the richness or complexity of the class; we refer the reader to [63, Section 2.7] for a precise definition.

Theorem 6.1 (Vapnik and Chervonenkis, Theorem 1 of [64]).

Let $\mu\in\mathscr{P}(\mathbb{R}^{d})$ and let $\mu^{n}=\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}$ be the empirical measure of i.i.d. samples $X_{i}\overset{i.i.d.}{\sim}\mu$. Let $\mathcal{A}$ be a class of measurable subsets of $\mathbb{R}^{d}$, and let $S_{\mathcal{A}}(k)$ be its shattering number for $k$ points. Then

$$\mathbb{P}\left(\sup_{A\in\mathcal{A}}\frac{|\mu(A)-\mu^{n}(A)|}{\sqrt{\mu(A)}}\geq\varepsilon\right)\leq 4S_{\mathcal{A}}(2n)\exp\left(-\frac{n\varepsilon^{2}}{4}\right).\tag{6.1}$$

By taking $\mathcal{A}$ to be the collection of half-spaces, we can deduce the following uniform concentration result for the empirical CDFs $F_{n}^{\theta}$.

Corollary 6.2.

Let $\mu,\mu^{n}$ be as in Theorem 6.1, and for each $\theta\in\mathbb{S}^{d-1}$ let $F^{\theta},F_{n}^{\theta}$ be the respective cumulative distribution functions of $R_{\theta}\mu,\,R_{\theta}\mu^{n}\in\mathscr{P}(\mathbb{R})$. Then

$$\mathbb{P}\left(\sup_{r\in\mathbb{R},\,\theta\in\mathbb{S}^{d-1}}\frac{|F^{\theta}(r)-F_{n}^{\theta}(r)|}{\sqrt{F^{\theta}(r)(1-F^{\theta}(r))}}\geq\varepsilon\right)\leq 8(2n+1)^{d+1}\exp\left(-\frac{n\varepsilon^{2}}{16}\right).\tag{6.2}$$
Proof.

Take $\mathcal{A}$ in Theorem 6.1 to be the collection of all half-spaces in $\mathbb{R}^{d}$. The VC dimension of half-spaces in $\mathbb{R}^{d}$ is $d+1$. Thus, by the Sauer–Shelah lemma [58, 56] (see also [22, Corollary 1.4]), we have $S_{\mathcal{A}}(2n)\leq(2n+1)^{d+1}$. As

$$\mu(\{x\in\mathbb{R}^{d}:\ x\cdot\theta\leq r\})=F^{\theta}(r)\quad\text{and}\quad\mu(\{x\in\mathbb{R}^{d}:\ x\cdot\theta>r\})=1-F^{\theta}(r),$$

applying (6.1) to both families of half-spaces (whose empirical measures are $F_{n}^{\theta}(r)$ and $1-F_{n}^{\theta}(r)$, respectively), we obtain

$$\mathbb{P}\left(\sup_{r\in\mathbb{R},\,\theta\in\mathbb{S}^{d-1}}\frac{|F^{\theta}(r)-F_{n}^{\theta}(r)|}{\sqrt{F^{\theta}(r)}}\geq\varepsilon\right)\leq 4(2n+1)^{d+1}\exp\left(-\frac{n\varepsilon^{2}}{4}\right)$$

and

$$\mathbb{P}\left(\sup_{r\in\mathbb{R},\,\theta\in\mathbb{S}^{d-1}}\frac{|F^{\theta}(r)-F_{n}^{\theta}(r)|}{\sqrt{1-F^{\theta}(r)}}\geq\varepsilon\right)\leq 4(2n+1)^{d+1}\exp\left(-\frac{n\varepsilon^{2}}{4}\right).$$

We deduce (6.2) using that $s(1-s)\geq\frac{1}{2}\min\{s,1-s\}$ for $s\in[0,1]$, as noted in [37, Example 2]: if the ratio in (6.2) is at least $\varepsilon$, then one of the two ratios above is at least $\varepsilon/\sqrt{2}$, so a union bound gives the upper bound $8(2n+1)^{d+1}\exp(-n\varepsilon^{2}/8)\leq 8(2n+1)^{d+1}\exp(-n\varepsilon^{2}/16)$. ∎
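The two elementary ingredients of this proof can be checked directly. The Python sketch below (helper names hypothetical) verifies, for small $n$ and $d$, that the Sauer–Shelah sum $\sum_{i=0}^{d+1}\binom{2n}{i}$, which bounds the shattering number of half-spaces on $2n$ points, does not exceed $(2n+1)^{d+1}$, and checks $s(1-s)\geq\frac{1}{2}\min\{s,1-s\}$ on a grid of $[0,1]$.

```python
from math import comb


def sauer_shelah_sum(m, k):
    """Sauer-Shelah: a class of VC dimension k picks out at most sum_{i<=k} C(m, i) subsets of m points."""
    return sum(comb(m, i) for i in range(k + 1))


def check_shattering_bound(n, d):
    """Check sum_{i<=d+1} C(2n, i) <= (2n+1)^(d+1); d+1 is the VC dimension of half-spaces in R^d."""
    m, k = 2 * n, d + 1
    return sauer_shelah_sum(m, k) <= (m + 1) ** k


def check_variance_inequality(num=10_001):
    """Check s(1-s) >= min(s, 1-s)/2 on an equispaced grid of [0, 1] (tiny slack for rounding)."""
    grid = [i / (num - 1) for i in range(num)]
    return all(x * (1 - x) >= 0.5 * min(x, 1 - x) - 1e-15 for x in grid)


print(all(check_shattering_bound(n, d) for n in range(1, 51) for d in range(1, 6)))
print(check_variance_inequality())
```

The inequality $s(1-s)\geq\frac{1}{2}\min\{s,1-s\}$ holds because $s(1-s)=\min\{s,1-s\}\max\{s,1-s\}$ and the maximum is at least $\frac{1}{2}$.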

We establish the parametric rate in $\ell_{SW}$ for measures with finite values of $SJ_{2}:\mathscr{P}_{2}(\mathbb{R}^{d})\rightarrow[0,+\infty]$, defined by

$$SJ_{2}(\mu)=\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}\frac{F^{\theta}(r)(1-F^{\theta}(r))}{f^{\theta}(r)}\,dr\,d\theta,\tag{6.3}$$

where $f^{\theta}$ and $F^{\theta}$ are respectively the density and the CDF of $R_{\theta}\mu$, and we use the convention $0/0=0$. In general, $\widehat{\mu}^{\theta}$ need not be absolutely continuous with respect to $\mathscr{L}^{1}$, and $f^{\theta}$ is then defined as the $\mathscr{L}^{1}$-density of the absolutely continuous part of $\widehat{\mu}^{\theta}$. The functional $SJ_{2}$, introduced in [37], is a sliced analogue of the functional $J_{2}$ introduced by Bobkov and Ledoux [8] for one-dimensional measures. In the 1D case, finiteness of $J_{2}$ is necessary and sufficient for $\mathbb{E}W_{2}(\mu,\mu^{n})$ to decay at the rate $n^{-1/2}$ [8, Section 5].

Manole, Balakrishnan, and Wasserman established [37, Proposition 4]

$$\mathbb{E}\,SW_{2}(\mu,\mu^{n})\leq C\sqrt{SJ_{2}(\mu)}\,n^{-1/2}$$

for some constant $C>0$ independent of $\mu$. Theorem 6.3 provides an analogous concentration result for $\ell_{SW}$ under the additional assumption that $\widehat{\mu}^{\theta}\ll\mathscr{L}^{1}$ for a.e. $\theta\in\mathbb{S}^{d-1}$; note that this assumption holds whenever $\mu$ is absolutely continuous with respect to the Lebesgue measure on an affine subspace of dimension at least 1.

Theorem 6.3 (Parametric estimation rate of empirical measures in $\ell_{SW}$).

Let $\mu\in\mathscr{P}_{2}(\mathbb{R}^{d})$ be such that $\widehat{\mu}^{\theta}\ll\mathscr{L}^{1}$ for a.e. $\theta\in\mathbb{S}^{d-1}$. Let $\mu^{n}=\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}$, where $X_{i}$ are i.i.d. samples of $\mu$. Then for each $c>d+1$ we have

$$SW_{2}(\mu^{n},\mu)\leq\ell_{SW}(\mu^{n},\mu)\leq\sqrt{\frac{64c\log n}{n}}\sqrt{SJ_{2}(\mu)}\tag{6.4}$$

with probability at least $1-8(2n+1)^{d+1}n^{-c}$, where $SJ_{2}$ is defined in (6.3).

Proof.

As noted earlier, writing $d\widehat{\mu}^{\theta}=f^{\theta}\,d\mathscr{L}^{1}$ for a.e. $\theta\in\mathbb{S}^{d-1}$, we have

$$\ell_{SW}^{2}(\mu,\mu^{n})\leq 4\fint_{\mathbb{S}^{d-1}}\|R_{\theta}\mu-R_{\theta}\mu^{n}\|_{\dot{H}^{-1}(\widehat{\mu}^{\theta})}^{2}\,d\theta\leq 4\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}\frac{|F^{\theta}(r)-F_{n}^{\theta}(r)|^{2}}{f^{\theta}(r)}\,dr\,d\theta,$$

where $F^{\theta}$ and $F_{n}^{\theta}$ are the respective CDFs of $\widehat{\mu}^{\theta}$ and $R_{\theta}\mu^{n}$. By Corollary 6.2, for every $s>0$,

$$\mathbb{P}\left(|F^{\theta}(r)-F_{n}^{\theta}(r)|>s\sqrt{F^{\theta}(r)(1-F^{\theta}(r))}\ \text{ for some }r\in\mathbb{R},\ \theta\in\mathbb{S}^{d-1}\right)\leq 8(2n+1)^{d+1}e^{-ns^{2}/16}.$$

Choosing $s=4\sqrt{c\log n/n}$, so that $e^{-ns^{2}/16}=n^{-c}$, we have, with probability at least $1-8(2n+1)^{d+1}n^{-c}$,

$$\begin{aligned}
\ell_{SW}^{2}(\mu,\mu^{n})&\leq 4\fint_{\mathbb{S}^{d-1}}\|R_{\theta}\mu-R_{\theta}\mu^{n}\|_{\dot{H}^{-1}(\widehat{\mu}^{\theta})}^{2}\,d\theta\leq 4\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}\frac{|F^{\theta}(r)-F_{n}^{\theta}(r)|^{2}}{f^{\theta}(r)}\,dr\,d\theta\\
&\leq 64c\,\frac{\log n}{n}\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}\frac{F^{\theta}(r)(1-F^{\theta}(r))}{f^{\theta}(r)}\,dr\,d\theta=64c\,\frac{\log n}{n}\,SJ_{2}(\mu).
\end{aligned}$$

Taking square roots and using $SW_{2}\leq\ell_{SW}$ yields (6.4). ∎
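The constants in Theorem 6.3 are easy to evaluate. The following Python sketch (function names hypothetical) confirms that $s=4\sqrt{c\log n/n}$ turns $e^{-ns^{2}/16}$ into $n^{-c}$, and evaluates the failure probability $8(2n+1)^{d+1}n^{-c}$, which is informative only once it drops below 1 and decays in $n$ only when $c>d+1$. The value of $SJ_{2}$ passed below is a placeholder, not computed for any particular measure.

```python
import math


def failure_probability_bound(n, d, c):
    """The bound 8 (2n+1)^(d+1) n^(-c) on the failure probability in Theorem 6.3."""
    return 8.0 * (2 * n + 1) ** (d + 1) * n ** (-c)


def lsw_radius(n, c, sj2):
    """The radius sqrt(64 c log n / n) * sqrt(SJ_2) appearing in (6.4)."""
    return math.sqrt(64.0 * c * math.log(n) / n) * math.sqrt(sj2)


n, d, c = 1000, 2, 4.0
s = 4.0 * math.sqrt(c * math.log(n) / n)
# The choice of s makes exp(-n s^2 / 16) equal to n^(-c).
print(math.isclose(math.exp(-n * s**2 / 16.0), n ** (-c)))
print(failure_probability_bound(n, d, c))
# sj2 = 1.0 is a placeholder value, not derived from any particular measure.
print(lsw_radius(n, c, sj2=1.0))
```

For $n=1000$, $d=2$ and $c=4$, the failure probability bound is about $0.064$.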

We now turn to practical, intuitive, and geometric conditions for the finiteness of $SJ_{2}(\mu)$. We show that the ratio $F^{\theta}(1-F^{\theta})/f^{\theta}$ can be bounded uniformly using the Cheeger-type isoperimetric constant $h(\mu)$ of the probability measure $\mu$, defined by Bobkov [6] as follows.

Definition 6.4 (Cheeger-type isoperimetric constant).

Let $\mu\in\mathscr{P}(\mathbb{R}^{d})$. The isoperimetric constant $h(\mu)$ of $\mu$ is defined by

$$h(\mu)=\inf_{A\subset\mathbb{R}^{d}}\frac{\mu^{+}(A)}{\min\{\mu(A),1-\mu(A)\}},\tag{6.5}$$

where the infimum is taken over all Borel sets $A\subset\mathbb{R}^{d}$ and $\mu^{+}$ is defined by

$$\mu^{+}(A)=\liminf_{r\rightarrow 0^{+}}\frac{\mu(A^{r})-\mu(A)}{r},$$

where $A^{r}=\{x\in\mathbb{R}^{d}:\ |x-a|<r\text{ for some }a\in A\}$ is the open $r$-neighborhood of $A$.

Corollary 6.5.

Let $\mu\in\mathscr{P}_{2}(\mathbb{R}^{d})$ be a probability measure with $\operatorname{supp}\mu\subset B(0,M)$ such that $h(\mu)>0$ and $\widehat{\mu}^{\theta}\ll\mathscr{L}^{1}$ for a.e. $\theta\in\mathbb{S}^{d-1}$. Then

$$SJ_{2}(\mu)\leq\frac{2M}{h(\mu)}.\tag{6.6}$$

In particular, letting $\mu^{n}=\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}$, where $X_{i}$ are i.i.d. samples of $\mu$, we have, for each $c>d+1$,

$$SW_{2}(\mu^{n},\mu)\leq\ell_{SW}(\mu^{n},\mu)\leq\sqrt{\frac{128Mc}{h(\mu)}}\sqrt{\frac{\log n}{n}}\tag{6.7}$$

with probability at least $1-8(2n+1)^{d+1}n^{-c}$.

Proof.

As noted by Bobkov and Houdré [7, Theorem 1.3], in one dimension we have the characterization

$$h(\widehat{\mu}^{\theta})=\operatorname*{ess\,inf}_{r\in\operatorname{supp}\widehat{\mu}^{\theta}}\frac{f^{\theta}(r)}{\min\{F^{\theta}(r),1-F^{\theta}(r)\}},\quad\text{where }d\widehat{\mu}^{\theta}=f^{\theta}\,d\mathscr{L}^{1},\ F^{\theta}(r)=\int_{-\infty}^{r}f^{\theta}(s)\,ds.$$

Noting that $F^{\theta}(r)(1-F^{\theta}(r))\leq\min\{F^{\theta}(r),1-F^{\theta}(r)\}$ and that $\operatorname{supp}\widehat{\mu}^{\theta}\subset[-M,M]$, we deduce

$$\int_{-M}^{M}\frac{F^{\theta}(r)(1-F^{\theta}(r))}{f^{\theta}(r)}\,dr\leq\frac{2M}{h(\widehat{\mu}^{\theta})}.$$

Moreover, for any Borel set $A\subset\mathbb{R}$,

$$\frac{(\widehat{\mu}^{\theta})^{+}(A)}{\min\{\widehat{\mu}^{\theta}(A),1-\widehat{\mu}^{\theta}(A)\}}=\frac{\mu^{+}((\tilde{\pi}^{\theta})^{-1}(A))}{\min\{\mu((\tilde{\pi}^{\theta})^{-1}(A)),1-\mu((\tilde{\pi}^{\theta})^{-1}(A))\}},$$

where $\tilde{\pi}^{\theta}(x)=x\cdot\theta$; here we used that $((\tilde{\pi}^{\theta})^{-1}(A))^{r}=(\tilde{\pi}^{\theta})^{-1}(A^{r})$. In particular $h(\widehat{\mu}^{\theta})\geq h(\mu)$, and we obtain (6.6). Thus (6.7) follows directly from Theorem 6.3. ∎
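For a concrete measure one can compare the two sides of the one-dimensional estimate used in this proof. The sketch below assumes, for illustration only, that $\mu$ is uniform on the unit disk ($M=1$), so that every projection has density $\frac{2}{\pi}\sqrt{1-r^{2}}$; it evaluates $h(\widehat{\mu}^{\theta})$ through the Bobkov–Houdré formula on a midpoint grid and compares $\int_{-M}^{M}F^{\theta}(1-F^{\theta})/f^{\theta}\,dr$ with $2M/h(\widehat{\mu}^{\theta})$. Since this density is log-concave, its hazard rate $f/(1-F)$ is nondecreasing, so by symmetry the infimum is attained at $r=0$ and equals $4/\pi$. The grid size is an arbitrary choice.

```python
import numpy as np


def disk_cdf(r):
    """CDF of x . theta for x uniform on the unit disk."""
    r = np.clip(r, -1.0, 1.0)
    return 0.5 + (r * np.sqrt(1.0 - r**2) + np.arcsin(r)) / np.pi


def disk_pdf(r):
    """Density (2/pi) sqrt(1 - r^2) of the same projection."""
    return (2.0 / np.pi) * np.sqrt(np.clip(1.0 - r**2, 0.0, None))


M = 1.0  # the unit disk lies in B(0, 1)
edges = np.linspace(-M, M, 200_001)
r = 0.5 * (edges[:-1] + edges[1:])  # midpoints avoid r = +-1, where f vanishes
dr = edges[1] - edges[0]
F, f = disk_cdf(r), disk_pdf(r)

# Bobkov-Houdre: h = ess inf f / min(F, 1 - F) for a measure on the line.
h_1d = float(np.min(f / np.minimum(F, 1.0 - F)))
# One-dimensional SJ_2 integrand F(1-F)/f, integrated over [-M, M].
sj2_theta = float(np.sum(F * (1.0 - F) / f) * dr)

print(h_1d, 4.0 / np.pi)          # minimum at r = 0, where f = 2/pi and F = 1/2
print(sj2_theta, 2.0 * M / h_1d)  # left-hand side vs right-hand side
```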

Remark 6.6.

The isoperimetric constant $h(\mu)$ quantifies the narrowness of the “bottleneck” of $\mu$. Note that if the support of $\mu$ is bounded and disconnected, then $h(\mu)=0$. On the other hand, for log-concave $\mu\in\mathscr{P}_{2}(\mathbb{R}^{d})$, Bobkov established a positive lower bound on $h(\mu)$ [6, Theorem 1.2].

Furthermore, $h(\mu)$ is bounded from below by the $L^{1}$-Poincaré constant of $\mu$. Indeed, $h(\mu)$ can alternatively be characterized as the largest constant $h$ satisfying the inequality

$$h\int_{\mathbb{R}^{d}}|\varphi-\operatorname{med}_{\mu}(\varphi)|\,d\mu\leq\int_{\mathbb{R}^{d}}|\nabla\varphi|\,d\mu$$

for all integrable locally Lipschitz $\varphi:\mathbb{R}^{d}\rightarrow\mathbb{R}$, where $\operatorname{med}_{\mu}(\varphi)$ is a median of $\varphi$ with respect to $\mu$ (medians need not be unique, but the statement holds for every median); see for instance the proof of [7, Theorem 3.1]. As a median minimizes $c\mapsto\int_{\mathbb{R}^{d}}|\varphi-c|\,d\mu$, we have

$$\int_{\mathbb{R}^{d}}|\varphi-\operatorname{med}_{\mu}(\varphi)|\,d\mu\leq\int_{\mathbb{R}^{d}}\left|\varphi-\int_{\mathbb{R}^{d}}\varphi\,d\mu\right|\,d\mu,$$

so the constant $h(\mu)$ is at least as large as the $L^{1}$-Poincaré constant of $\mu$. Consequently, for any bounded open connected $\Omega$ with Lipschitz boundary, the measure $\lambda_{\Omega}:=|\Omega|^{-1}\mathscr{L}^{d}\llcorner\Omega$ satisfies $h(\lambda_{\Omega})>0$, since the $L^{1}$-Poincaré inequality holds on such domains.

Moreover, if $\mu,\lambda\in\mathscr{P}(\mathbb{R}^{d})$ satisfy $a\lambda\leq\mu\leq b\lambda$ for some $0<a\leq b<\infty$, then $h(\mu)\geq\frac{a}{b}h(\lambda)$. Indeed, for each Borel set $A\subset\mathbb{R}^{d}$,

$$\mu(A)\leq b\lambda(A),\qquad 1-\mu(A)=\mu(\mathbb{R}^{d}\setminus A)\leq b\lambda(\mathbb{R}^{d}\setminus A)=b(1-\lambda(A)),$$

whereas $\mu^{+}(A)\geq a\lambda^{+}(A)$, since $\mu(A^{r})-\mu(A)=\mu(A^{r}\setminus A)\geq a\lambda(A^{r}\setminus A)$. Thus

$$\frac{\mu^{+}(A)}{\min\{\mu(A),1-\mu(A)\}}\geq\frac{a\lambda^{+}(A)}{b\min\{\lambda(A),1-\lambda(A)\}}.$$

Thus any measure $\mu$ comparable to $\mathscr{L}^{d}\llcorner\Omega$ (that is, bounded above and below by positive multiples of it) satisfies $h(\mu)>0$, provided that $\Omega$ is bounded, open, and connected, and has a Lipschitz boundary. ∎
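A one-dimensional illustration of the comparison $h(\mu)\geq\frac{a}{b}h(\lambda)$ can be computed with the Bobkov–Houdré formula on a grid. The sketch below is our own example: it takes $\lambda$ uniform on $[0,1]$, for which $h(\lambda)=2$, and $\mu$ with density $1+\frac{1}{2}\sin(2\pi x)$ and CDF $x+(1-\cos(2\pi x))/(4\pi)$, so that $a=\frac{1}{2}$ and $b=\frac{3}{2}$. The grid evaluation of the essential infimum is only an approximation, and the helper name is hypothetical.

```python
import numpy as np


def cheeger_1d(f, F):
    """Bobkov-Houdre formula h = ess inf f / min(F, 1 - F), approximated by a minimum over grid values."""
    return float(np.min(f / np.minimum(F, 1.0 - F)))


edges = np.linspace(0.0, 1.0, 100_001)
x = 0.5 * (edges[:-1] + edges[1:])  # interior midpoints of [0, 1]

# lambda: uniform on [0, 1], with density 1 and CDF x
h_lambda = cheeger_1d(np.ones_like(x), x)

# mu: density 1 + 0.5 sin(2 pi x), so a = 0.5 <= dmu/dlambda <= b = 1.5
f_mu = 1.0 + 0.5 * np.sin(2.0 * np.pi * x)
F_mu = x + (1.0 - np.cos(2.0 * np.pi * x)) / (4.0 * np.pi)
h_mu = cheeger_1d(f_mu, F_mu)

a, b = 0.5, 1.5
print(h_lambda, h_mu, a / b * h_lambda)
```

In this example the bound $\frac{a}{b}h(\lambda)$ is far from sharp, which is typical of such worst-case comparisons.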

Remark 6.7.

Tudor Manole pointed out to us that the $O(\sqrt{\log n/n})$-rate concentration bound of Corollary 6.5 is likely not sharp, as the relative VC inequality (Theorem 6.1) may be suboptimal when applied to CDFs.

Indeed, the asymptotically sharp uniform bound on $|F(r)-F_{n}(r)|^{2}/(F(r)(1-F(r)))$ is of order $O(\log\log n/n)$, where $F$ is the CDF of $\mu$ and $F_{n}$ the corresponding empirical CDF; see the recent survey [55, Section 3.1] and the references therein. This would lead to an improvement to an $O(\sqrt{\log\log n/n})$ rate in Corollary 6.5. As the relative VC inequality yields a convenient uniform bound on the empirical CDFs over all $\theta\in\mathbb{S}^{d-1}$, we do not pursue this refinement in this paper. ∎

7. Metric slopes and gradient flows in the sliced Wasserstein space

We examine the consequences of the local geometry of the sliced Wasserstein space on metric slopes and gradient flows. As in the previous sections, we contrast the behaviors of the metric slopes and gradient flows at absolutely continuous and discrete measures.

Since we are dealing with both the SW metric and the induced intrinsic distance $\ell_{SW}$, we start by commenting on the relationship between gradient flows with respect to the ambient and the intrinsic metrics. In a general metric space $(X,m)$, the “gradient flows” of an energy $\mathcal{E}:X\to(-\infty,+\infty]$ are defined as curves of maximal slope, namely the absolutely continuous curves $u:I\to X$, $I=[0,T]$, that satisfy

$$\frac{d}{dt}\mathcal{E}(u_{t})\leq-\frac{1}{2}|u^{\prime}|_{m}^{2}(t)-\frac{1}{2}|\partial\mathcal{E}|_{m}^{2}(u_{t})\quad\text{for a.e. }t\in I,\tag{7.1}$$

where $|u^{\prime}|_{m}$ is the metric derivative (3.3) and $|\partial\mathcal{E}|_{m}$ is the metric slope defined in (1.6); see [1, Definition 1.3.2] for a precise and more general definition.

Suppose $\ell_{m}$ is the length metric induced by $m$; the definition above allows one to consider curves of maximal slope of $\mathcal{E}$ in $(X,\ell_{m})$ as well. We note that if $X$ is a Riemannian manifold isometrically embedded in $\mathbb{R}^{d}$ and $m$ is the Euclidean metric, then $\ell_{m}$ is the Riemannian distance of the manifold. It is straightforward to see that the gradient flows in the classical sense on the manifold coincide with the curves of maximal slope in both $(X,\ell_{m})$ and $(X,m)$.

For curves of maximal slope, this equivalence is less clear in full generality. As $m\leq\ell_{m}$, in general $|\partial\mathcal{E}|_{\ell_{m}}\leq|\partial\mathcal{E}|_{m}$ and $|u^{\prime}|_{m}(t)\leq|u^{\prime}|_{\ell_{m}}(t)$ for any absolutely continuous curve $u:I\rightarrow X$. Furthermore, since $\ell_{m}(u_{t},u_{t+h})$ is at most the $m$-length of $u$ restricted to $[t,t+h]$,

$$|u^{\prime}|_{\ell_{m}}(t)\leq\lim_{h\searrow 0}\frac{\ell_{m}(u_{t},u_{t+h})}{h}\leq\lim_{h\searrow 0}\frac{1}{h}\int_{t}^{t+h}|u^{\prime}|_{m}(s)\,ds=|u^{\prime}|_{m}(t)\quad\text{for }\mathscr{L}^{1}\text{-a.e. }t\in I,$$

where the last equality holds at every Lebesgue point of $|u^{\prime}|_{m}\in L^{1}(I)$. Hence $|u^{\prime}|_{\ell_{m}}=|u^{\prime}|_{m}$ a.e., and since $|\partial\mathcal{E}|_{\ell_{m}}\leq|\partial\mathcal{E}|_{m}$, any curve of maximal slope with respect to $m$ is a curve of maximal slope with respect to $\ell_{m}$.

However, it is in general unclear exactly when $|\partial\mathcal{E}|_{\ell_{m}}=|\partial\mathcal{E}|_{m}$ holds. Muratori and Savaré showed the equivalence for approximately $\lambda$-convex functionals $\mathcal{E}$ [39, Proposition 2.1.6]. On a different note, the weighted energy dissipation (WED) approach to constructing curves of maximal slope, studied by Rossi, Savaré, Segatti, and Stefanelli [53], relies on functionals that involve only metric derivatives and the energy, but not the metric slope, and thus does not distinguish between $m$ and $\ell_{m}$. The authors construct solutions of (7.1) with the metric slope $|\partial\mathcal{E}|$ replaced by its relaxation $|\partial^{-}\mathcal{E}|$, provided $|\partial^{-}\mathcal{E}|$ is a strong upper gradient; this is not the case in general, and in particular fails for potential energies in the SW space; see Corollary 7.7.

Let us now return to the discussion of metric slopes in the SW space. At an absolutely continuous measure $\mu\in\mathscr{P}_{2}(\mathbb{R}^{d})$, where we have the comparison (see Theorem 5.2)

$$\|\mu-\nu\|_{\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})}\lesssim SW(\mu,\nu)\leq\ell_{SW}(\mu,\nu)\lesssim\|\mu-\nu\|_{\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})}\quad\text{for suitable }\nu\in\mathscr{P}_{2}(\mathbb{R}^{d}),$$

we formally expect

$$|\partial\mathcal{E}|_{\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})}(\mu)\lesssim|\partial\mathcal{E}|_{\ell_{SW}}(\mu)\leq|\partial\mathcal{E}|_{SW}(\mu)\lesssim|\partial\mathcal{E}|_{\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})}(\mu).\tag{7.2}$$

On the other hand, at a discrete measure $\mu^{n}=\sum_{i=1}^{n}m_{i}\delta_{y_{i}}$, where we have the comparison

$$SW(\mu^{n},\nu)=\frac{1}{\sqrt{d}}W(\mu^{n},\nu)+o(SW(\mu^{n},\nu))\quad\text{for suitable }\nu\in\mathscr{P}_{2}(\mathbb{R}^{d}),$$

we expect

$$|\partial\mathcal{E}|_{SW}(\mu^{n})=\sqrt{d}\,|\partial\mathcal{E}|_{W}(\mu^{n}).\tag{7.3}$$
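The comparison $SW\approx W/\sqrt{d}$ near discrete measures can be observed numerically. The Python sketch below is our own example: the point configuration, the perturbation size, the number of directions and the seed are arbitrary choices, and the helper names are hypothetical. It perturbs a uniform five-point measure in $\mathbb{R}^{3}$ with minimal point separation $l_{\mu}=1$ by displacements of size at most $\delta=10^{-4}$, so that the identity coupling is optimal, and estimates $SW_{2}^{2}$ by Monte Carlo over directions using sorted projections. The ratio $SW_{2}^{2}/(W_{2}^{2}/d)$ should be close to 1, up to Monte Carlo error.

```python
import numpy as np


def sample_sphere(num, d, rng):
    """Uniform samples on S^{d-1}: normalized standard Gaussian vectors."""
    g = rng.standard_normal((num, d))
    return g / np.linalg.norm(g, axis=1, keepdims=True)


def sliced_w2_sq(X, Y, thetas):
    """Monte Carlo SW_2^2 between uniform empirical measures on the rows of X and Y.

    In 1D, the optimal coupling between two uniform n-point measures pairs sorted points.
    """
    px = np.sort(X @ thetas.T, axis=0)  # (n, num_dirs): sorted projections per direction
    py = np.sort(Y @ thetas.T, axis=0)
    return float(np.mean((px - py) ** 2))


rng = np.random.default_rng(0)
d = 3
Y = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)  # l_mu = 1
delta = 1e-4
U = rng.uniform(-1.0, 1.0, size=Y.shape) / np.sqrt(d)  # each |U_i| <= 1
X = Y + delta * U

# Since delta < l_mu / 2, the identity coupling X_i <-> Y_i is optimal for W_2.
w2_sq = float(np.mean(np.sum((X - Y) ** 2, axis=1)))
sw2_sq = sliced_w2_sq(X, Y, sample_sphere(20_000, d, rng))
print(sw2_sq / (w2_sq / d))  # close to 1
```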

Of course, the comparison theorems of Section 5 require restrictive conditions on $\mu,\nu$, and thus the comparisons of $|\partial\mathcal{E}|_{SW}$ above are formal; establishing them rigorously in full generality would be challenging. Hence, we provide rigorous proofs of (7.2) and (7.3) for the potential energy $\mathcal{V}(\mu):=\int_{\mathbb{R}^{d}}V(x)\,d\mu(x)$, for suitable $V:\mathbb{R}^{d}\rightarrow[0,+\infty)$, at absolutely continuous measures in Section 7.2 and at discrete measures in Section 7.3, respectively. Understanding the metric slope allows us to show instability of curves of maximal slope with respect to initial data; see Proposition 7.5 and Remark 7.6.

7.1. Formal sliced Wasserstein gradient flows at smooth densities

We begin by formally deriving partial differential equations corresponding to sliced Wasserstein gradient flows, emphasizing that they are of order $d-1$ higher than their Wasserstein counterparts. For this purpose, it is convenient to restrict our attention to the space $\mathscr{P}_{2}^{\infty}(\mathbb{R}^{d})$ of measures with smooth positive densities, defined by

$$\mathscr{P}_{2}^{\infty}(\mathbb{R}^{d})=\left\{\rho\,d\mathscr{L}^{d}\in\mathscr{P}_{2}(\mathbb{R}^{d}):\ \rho\in C^{\infty}(\mathbb{R}^{d}),\ \rho>0\right\}.\tag{7.4}$$

Consider $\partial_{t}\mu_{t}+\nabla\cdot J_{t}=0$ and set $(\mu,J):=(\mu_{0},J_{0})$. Writing $\operatorname{diff}\mathcal{E}_{\mu}(J)=\left.\frac{d}{dt}\right|_{t=0}\mathcal{E}(\mu_{t})$, note that the (standard) Wasserstein gradient $\nabla_{W}\mathcal{E}_{\mu}\in L^{2}(\mu;\mathbb{R}^{d})$ satisfies

$$\operatorname{diff}\mathcal{E}_{\mu}(J)=\left\langle\frac{dJ}{d\mu},\nabla_{W}\mathcal{E}_{\mu}\right\rangle_{L^{2}(\mu)}=\langle J,\nabla_{W}\mathcal{E}_{\mu}\rangle_{\mathbb{R}^{d}}.$$

Since the quadratic form $J\mapsto\|dRJ/dR\mu\|_{L^{2}(R\mu)}^{2}$ characterizes the local metric of the SW space at $\mu\in\mathscr{P}_{2}(\mathbb{R}^{d})$, formally the sliced Wasserstein gradient flux $\nabla_{SW}\mathcal{E}_{\mu}\in\operatorname{Tan}_{\mu}(\mathscr{P}_{2}(\mathbb{R}^{d}),SW)$ satisfies

$$\operatorname{diff}\mathcal{E}_{\mu}(J)=\left\langle\frac{dRJ}{dR\mu},\frac{dR(\nabla_{SW}\mathcal{E}_{\mu})}{dR\mu}\right\rangle_{L^{2}(R\mu)}=\left\langle RJ,\frac{dR(\nabla_{SW}\mathcal{E}_{\mu})}{dR\mu}\right\rangle_{\mathbb{P}_{d}}.\tag{7.5}$$

Suppose there exists some $\mathfrak{v}_{\mu}\in L^{2}(\widehat{\mu};\mathbb{R}^{d})$ such that for all $J\in\operatorname{Tan}_{\mu}(\mathscr{P}_{2}(\mathbb{R}^{d}),SW)$

$$\langle J,\nabla_{W}\mathcal{E}_{\mu}\rangle_{\mathbb{R}^{d}}=\langle RJ,\mathfrak{v}_{\mu}\rangle_{\mathbb{P}_{d}};\tag{7.6}$$

then $\nabla_{SW}\mathcal{E}_{\mu}=R^{-1}(\mathfrak{v}_{\mu}\widehat{\mu})$ satisfies (7.5), since then $dR(\nabla_{SW}\mathcal{E}_{\mu})/dR\mu=\mathfrak{v}_{\mu}$. For simplicity, suppose $\nabla_{W}\mathcal{E}_{\mu}\in\mathcal{S}(\mathbb{R}^{d};\mathbb{R}^{d})$, so that the inversion formula is valid. Then by (2.13)

$$\mathfrak{v}_{\mu}=c_{d}^{-1}\Lambda_{d}R\nabla_{W}\mathcal{E}_{\mu}$$

satisfies (7.6), and thus by the inversion formula,

$$\nabla_{SW}\mathcal{E}_{\mu}=R^{-1}(\mathfrak{v}_{\mu}\widehat{\mu})=c_{d}^{-2}R^{\ast}\Lambda_{d}(\widehat{\mu}\Lambda_{d}R\nabla_{W}\mathcal{E}_{\mu}).$$

By definition (3.18), if $\nabla_{W}\mathcal{E}_{\mu}=\nabla\varphi$ for some potential $\varphi$, then $\nabla_{SW}\mathcal{E}_{\mu}\in\operatorname{Tan}_{\mu}(\mathscr{P}_{2}(\mathbb{R}^{d}),SW)$. Thus, formally, the gradient flow of $\mathcal{E}$ in $(\mathscr{P}_{2}^{\infty}(\mathbb{R}^{d}),\ell_{SW})$ satisfies the equation

$$\partial_{t}\mu_{t}-c_{d}^{-2}\nabla\cdot\left(R^{\ast}\Lambda_{d}(\widehat{\mu}_{t}\Lambda_{d}R\nabla_{W}\mathcal{E}_{\mu_{t}})\right)=0.\tag{7.7}$$

Observe that the order of (7.7) is $d-1$ higher than that of the corresponding Wasserstein gradient flow equation. Namely, each $\Lambda_{d}$ is a (pseudo-)differential operator of order $d-1$, whereas $R^{\ast}$ and $R$ jointly regularize the function by $d-1$ derivatives. Note that the energy dissipation for (7.7) is formally

$$\frac{d}{dt}\mathcal{E}(\mu_{t})=-c_{d}^{-2}\langle\widehat{\mu}_{t}\Lambda_{d}R(\nabla_{W}\mathcal{E}_{\mu_{t}}),\Lambda_{d}R(\nabla_{W}\mathcal{E}_{\mu_{t}})\rangle_{\mathbb{P}_{d}}=-c_{d}^{-2}\|\Lambda_{d}R(\nabla_{W}\mathcal{E}_{\mu_{t}})\|_{L^{2}(\widehat{\mu}_{t})}^{2}.$$

7.2. Metric slopes of potential energies at absolutely continuous measures

In the formal computations above we have seen that a gradient flow $(\mu_{t})_{t\in I}$ of $\mathcal{E}$ satisfies

$$\frac{d}{dt}\mathcal{E}(\mu_{t})=-c_{d}^{-2}\|\Lambda_{d}R(\nabla_{W}\mathcal{E}_{\mu_{t}})\|_{L^{2}(\widehat{\mu}_{t})}^{2}\sim-\|\nabla_{W}\mathcal{E}_{\mu_{t}}\|_{\dot{H}^{(d-1)/2}(\mathbb{R}^{d})}^{2}.$$

Letting (μ)=𝒱(μ)=dV(x)𝑑μ\mathcal{E}(\mu)=\mathcal{V}(\mu)=\int_{\mathbb{R}^{d}}V(x)\,d\mu for smooth V:dV\mathrel{\mathop{\mathchar 58\relax}}\mathbb{R}^{d}\rightarrow\mathbb{R}, we know W𝒱μ=V\nabla_{W}\mathcal{V}_{\mu}=\nabla V. Thus along the SWSW-gradient flow (μt)t0(\mu_{t})_{t\geq 0}, we have

ddt𝒱(μt)VH˙(d1)/2(d)2=VH˙(d+1)/2(d)2.\frac{d}{dt}\mathcal{V}(\mu_{t})\sim-\|\nabla V\|_{\dot{H}^{(d-1)/2}(\mathbb{R}^{d})}^{2}=-\|V\|_{\dot{H}^{(d+1)/2}(\mathbb{R}^{d})}^{2}.

Remark 7.1 shows that the dissipation of H˙(d+1)/2\dot{H}^{-(d+1)/2}-gradient flow (μ~t)t0(\tilde{\mu}_{t})_{t\geq 0} of 𝒱\mathcal{V} is of the same order:

$$\frac{d}{dt}\mathcal{V}(\tilde{\mu}_{t})=-\|V\|_{\dot{H}^{(d+1)/2}(\mathbb{R}^{d})}^{2}.$$

Remark 7.1 (Gradient flows with respect to the $\dot{H}^{-s}$ norm).

Let $\mu\in\mathscr{P}_{2}(\mathbb{R}^{d})$ be a measure with $L^{2}(\mathbb{R}^{d})$ density, and let us identify $\mu$ with its density. Let $\mathcal{E}:L^{2}(\mathbb{R}^{d})\rightarrow\mathbb{R}$ be a functional that admits an $L^{2}$ gradient, i.e. at suitable $\mu\in L^{2}(\mathbb{R}^{d})$ there exists $\nabla_{L^{2}}\mathcal{E}_{\mu}\in L^{2}(\mathbb{R}^{d})$ such that for each $v\in L^{2}(\mathbb{R}^{d})$ with $\int v=0$

$$\left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0}\mathcal{E}(\mu+\varepsilon v)=\langle\nabla_{L^{2}}\mathcal{E}_{\mu},v\rangle_{L^{2}(\mathbb{R}^{d})}.$$

Assuming $\nabla_{L^{2}}\mathcal{E}_{\mu}$ is sufficiently smooth, the gradient $\nabla_{\dot{H}^{-s}}\mathcal{E}_{\mu}$ of $\mathcal{E}$ with respect to the $\dot{H}^{-s}$ norm is formally given by

$$\nabla_{\dot{H}^{-s}}\mathcal{E}_{\mu}=(-\Delta)^{s}\nabla_{L^{2}}\mathcal{E}_{\mu},$$

as $\langle(-\Delta)^{s}\nabla_{L^{2}}\mathcal{E}_{\mu},v\rangle_{\dot{H}^{-s}(\mathbb{R}^{d})}=\langle\nabla_{L^{2}}\mathcal{E}_{\mu},v\rangle_{L^{2}(\mathbb{R}^{d})}$. Thus the $\dot{H}^{-s}(\mathbb{R}^{d})$ gradient flow of $\mathcal{E}$ formally satisfies the PDE

$$\partial_{t}\mu_{t}+(-\Delta)^{s}\nabla_{L^{2}}\mathcal{E}_{\mu_{t}}=0,$$

and we see that this PDE is of order precisely $2s$ higher than the $L^{2}$ gradient flow equation. Note that the $\dot{H}^{-s}$-gradient flow equation has the structure $\partial_{t}\mu_{t}=-\nabla_{\dot{H}^{-s}}\mathcal{E}_{\mu_{t}}$, whereas the Wasserstein gradient flow satisfies an equation formulated in terms of the continuity equation. Furthermore, the dissipation along the gradient flow is

$$\frac{d}{dt}\mathcal{E}(\mu_{t})=-\|\nabla_{\dot{H}^{-s}}\mathcal{E}_{\mu_{t}}\|_{\dot{H}^{-s}(\mathbb{R}^{d})}^{2}=-\|(-\Delta)^{s}\nabla_{L^{2}}\mathcal{E}_{\mu_{t}}\|_{\dot{H}^{-s}(\mathbb{R}^{d})}^{2}=-\|\nabla_{L^{2}}\mathcal{E}_{\mu_{t}}\|_{\dot{H}^{s}(\mathbb{R}^{d})}^{2}.$$

For $\mathcal{E}=\mathcal{V}$ we have $\nabla_{L^{2}}\mathcal{V}_{\mu}=V$, and hence the $\dot{H}^{-s}$-gradient flow of $\mathcal{V}$ satisfies

$$\frac{d}{dt}\mathcal{V}(\mu_{t})=-\|V\|_{\dot{H}^{s}(\mathbb{R}^{d})}^{2}.$$
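As a sanity check of this dissipation identity, the following minimal sketch replaces $\mathbb{R}^{d}$ by the one-dimensional torus $[0,2\pi)$ (our assumption, so that $\dot{H}^{s}$ norms can be computed with a discrete Fourier transform) and uses the hypothetical potential $V(x)=\cos x+\tfrac12\cos 3x$. For the potential energy, $\partial_{t}\mu_{t}=-(-\Delta)^{s}V$ does not depend on $\mu_{t}$, so the rate $\frac{d}{dt}\mathcal{V}(\mu_{t})=\int V\,\partial_{t}\mu_{t}$ can be evaluated directly and compared with $-\|V\|_{\dot{H}^{s}}^{2}=-\pi(1+\tfrac14 9^{s})$.

```python
import cmath
import math


def dft(f):
    """Discrete Fourier coefficients c_k = (1/n) sum_j f_j exp(-2 pi i k j / n)."""
    n = len(f)
    return [sum(f[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)) / n
            for k in range(n)]


def signed_freq(k, n):
    """Signed integer frequency of DFT index k, taken in (-n/2, n/2]."""
    return k if k <= n // 2 else k - n


def frac_laplacian(f, s):
    """(-Delta)^s of a 2*pi-periodic grid function (s > 0): scale mode k by |k|^(2s)."""
    n = len(f)
    c = dft(f)
    return [sum(c[k] * abs(signed_freq(k, n)) ** (2 * s)
                * cmath.exp(2j * math.pi * k * j / n) for k in range(n)).real
            for j in range(n)]


def hdot_sq(f, s):
    """Squared homogeneous H^s seminorm on [0, 2 pi) via Parseval: 2 pi sum_k |k|^(2s) |c_k|^2."""
    n = len(f)
    c = dft(f)
    return 2 * math.pi * sum(abs(signed_freq(k, n)) ** (2 * s) * abs(c[k]) ** 2
                             for k in range(n))


n, s = 32, 1.5                       # s = 3/2 mirrors s = (d+1)/2 for d = 2, formally only
xs = [2 * math.pi * j / n for j in range(n)]
V = [math.cos(x) + 0.5 * math.cos(3 * x) for x in xs]   # hypothetical potential

dmu_dt = [-w for w in frac_laplacian(V, s)]             # H^{-s} flow: d/dt mu = -(-Delta)^s V
rate = (2 * math.pi / n) * sum(v * w for v, w in zip(V, dmu_dt))   # d/dt V(mu_t)
mass_rate = (2 * math.pi / n) * sum(dmu_dt)             # total mass should be conserved
print(rate, -hdot_sq(V, s), -math.pi * (1 + 0.25 * 9 ** s))
```

For a trigonometric polynomial sampled on enough grid points the discrete Parseval identity is exact, so the three printed numbers should agree up to rounding.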

Applying Theorem 5.2, we demonstrate that (7.2) holds for potential energy functionals with smooth, compactly supported $V$.

Proposition 7.2 (Slope of potential energies at absolutely continuous measures).

Let $V:\mathbb{R}^{d}\rightarrow[0,+\infty)$ be smooth and compactly supported. Let $\mathcal{V}:\mathscr{P}_{2}(\mathbb{R}^{d})\rightarrow[0,+\infty)$ be the potential energy functional

$$\mathcal{V}(\nu)=\int_{\mathbb{R}^{d}}V(x)\,d\nu(x).$$

Let $\Omega$ be an open, bounded, connected domain containing $\operatorname{supp}V$ with $\operatorname{dist}(\operatorname{supp}V,\partial\Omega)=:\alpha>0$, and let $\mu\in\mathscr{P}_{2}(\mathbb{R}^{d})$ be an absolutely continuous probability measure such that

$$\frac{a}{|\Omega|}\mathscr{L}^{d}\llcorner\Omega\leq\mu\leq\frac{b}{|\Omega|}\mathscr{L}^{d}\llcorner\Omega\quad\text{for some }0<a<b<\infty.$$

Then

$$\|V\|_{\dot{H}^{(d+1)/2}(\mathbb{R}^{d})}\lesssim_{a,\alpha,|\Omega|}|\partial\mathcal{V}|_{\ell_{SW}}(\mu)\leq|\partial\mathcal{V}|_{SW}(\mu)\leq c_{d}^{-1}\|\partial_{r}\Lambda_{d}RV\|_{L^{2}(\widehat{\mu})}\lesssim_{c_{d},b,\Omega}\|V\|_{\dot{H}^{(d+1)/2}(\mathbb{R}^{d})}. \tag{7.8}$$
Proof.

To obtain the upper bound, note that as $V\in C_{c}^{\infty}(\mathbb{R}^{d})$, the Radon inversion formula (A.11) and the duality formula for finite measures (2.3) imply that for any $\nu\in\mathscr{P}_{2}(\mathbb{R}^{d})$, writing $\widehat{\gamma}\in\widehat{\Gamma}_{o}(\mu,\nu)$, we have

$$\begin{aligned}
\mathcal{V}(\mu)-\mathcal{V}(\nu)&=\int_{\mathbb{R}^{d}}V\,d(\mu-\nu)=c_{d}^{-1}\int_{\mathbb{R}^{d}}R^{\ast}\Lambda_{d}RV\,d(\mu-\nu)=c_{d}^{-1}\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}\Lambda_{d}RV\,d(\widehat{\mu}^{\theta}-\widehat{\nu}^{\theta})\,d\theta\\
&=c_{d}^{-1}\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}^{2}}\frac{\Lambda_{d}RV(r,\theta)-\Lambda_{d}RV(q,\theta)}{r-q}(r-q)\,d\widehat{\gamma}^{\theta}(r,q)\,d\theta.
\end{aligned}$$

Furthermore, as $\Lambda_{d}RV$ is of class $C^{\infty}$,

$$|\Lambda_{d}RV(q,\theta)-\Lambda_{d}RV(r,\theta)|\leq|\partial_{r}\Lambda_{d}RV(r,\theta)|\,|r-q|+O(|r-q|^{2}),$$

and thus, using the Cauchy–Schwarz inequality in the last step,

$$\begin{aligned}
&\left|\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}^{2}}\frac{\Lambda_{d}RV(r,\theta)-\Lambda_{d}RV(q,\theta)}{r-q}(r-q)\,d\widehat{\gamma}^{\theta}(r,q)\,d\theta\right|\\
&\quad\leq\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}^{2}}|r-q|\left(|\partial_{r}\Lambda_{d}RV(r,\theta)|+O(|r-q|)\right)d\widehat{\gamma}^{\theta}(r,q)\,d\theta\\
&\quad\leq\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}^{2}}|r-q|\,|\partial_{r}\Lambda_{d}RV(r,\theta)|\,d\widehat{\gamma}^{\theta}(r,q)\,d\theta+\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}^{2}}O(|r-q|^{2})\,d\widehat{\gamma}^{\theta}(r,q)\,d\theta\\
&\quad\leq SW(\mu,\nu)\,\|\partial_{r}\Lambda_{d}RV\|_{L^{2}(\widehat{\mu})}+O(SW^{2}(\mu,\nu)).
\end{aligned}$$

Hence

$$|\mathcal{V}(\mu)-\mathcal{V}(\nu)|\leq c_{d}^{-1}SW(\mu,\nu)\left(\|\partial_{r}\Lambda_{d}RV\|_{L^{2}(\widehat{\mu})}+O(SW(\mu,\nu))\right),$$

and we deduce the upper bound

$$\begin{aligned}
|\partial\mathcal{V}|_{SW}(\mu)&=\limsup_{\nu\rightarrow\mu}\frac{[\mathcal{V}(\mu)-\mathcal{V}(\nu)]_{+}}{SW(\mu,\nu)}\\
&\leq\limsup_{\nu\rightarrow\mu}\frac{c_{d}^{-1}SW(\mu,\nu)\left(\|\partial_{r}\Lambda_{d}RV\|_{L^{2}(\widehat{\mu})}+O(SW(\mu,\nu))\right)}{SW(\mu,\nu)}=c_{d}^{-1}\|\partial_{r}\Lambda_{d}RV\|_{L^{2}(\widehat{\mu})}.
\end{aligned}$$

Note that $\widehat{\mu}^{\theta}\leq bC_{\Omega}\mathscr{L}^{1}$, where $C_{\Omega}=C_{\lambda_{\Omega}}$ is as in Theorem 5.2 with $\lambda_{\Omega}=|\Omega|^{-1}\mathscr{L}^{d}\llcorner\Omega$. Thus, by the Radon isometry (2.9),

$$\|\partial_{r}\Lambda_{d}RV\|_{L^{2}(\widehat{\mu})}\lesssim_{b/|\Omega|}\|\Lambda_{d}RV\|_{\dot{H}^{1}(\mathbb{P}_{d})}=\|V\|_{\dot{H}^{(d+1)/2}(\mathbb{R}^{d})}.$$

As $SW\leq\ell_{SW}$, we have $|\partial\mathcal{V}|_{SW}(\mu)\geq|\partial\mathcal{V}|_{\ell_{SW}}(\mu)$, and it only remains to prove the lower bound

$$\|V\|_{\dot{H}^{(d+1)/2}(\mathbb{R}^{d})}\lesssim_{a,\alpha,|\Omega|}|\partial\mathcal{V}|_{\ell_{SW}}(\mu)$$

for $\mu$ satisfying the stated conditions. To do so, define for each $\varepsilon>0$

$$\sigma_{\varepsilon}:=\mu-\varepsilon(-\Delta)^{\frac{d+1}{2}}V(x)\,dx.$$

As $V\in C_{c}^{\infty}$, we have $\|(-\Delta)^{\frac{d+1}{2}}V\|_{\infty}<\infty$, and as $\mu$ is bounded away from zero on $\Omega\supset\operatorname{supp}V$, $\sigma_{\varepsilon}\geq 0$ when $\varepsilon$ is sufficiently small. Furthermore, as $(-\Delta)^{\frac{d+1}{2}}V\in C_{c}^{\infty}(\mathbb{R}^{d})$, $\sigma_{\varepsilon}$ has finite second moment, and integrating by parts in a sufficiently large ball containing $\operatorname{supp}V$ we deduce $\int_{\mathbb{R}^{d}}(-\Delta)^{\frac{d+1}{2}}V=0$; hence $\sigma_{\varepsilon}\in\mathscr{P}_{2}(\mathbb{R}^{d})$. Moreover, the measures $\sigma_{\varepsilon}$ have uniformly bounded second moments and converge to $\mu$ narrowly, so $\sigma_{\varepsilon}\rightarrow\mu$ in $SW$ as $\varepsilon\searrow 0$. Therefore

$$\int_{\mathbb{R}^{d}}V(x)\,d(\mu-\sigma_{\varepsilon})=\varepsilon\int_{\mathbb{R}^{d}}V(x)(-\Delta)^{\frac{d+1}{2}}V\,dx=\varepsilon\|(-\Delta)^{\frac{d+1}{4}}V\|_{L^{2}(\mathbb{R}^{d})}^{2}=\varepsilon\|V\|_{\dot{H}^{(d+1)/2}(\mathbb{R}^{d})}^{2}.$$

Since, moreover, $\sigma_{\varepsilon}=\mu$ on $\Omega\setminus\operatorname{supp}V$, the comparison theorem at absolutely continuous measures (Theorem 5.2) yields

$$\begin{aligned}
|\partial\mathcal{V}|_{\ell_{SW}}(\mu)&=\limsup_{\nu\rightarrow\mu}\frac{[\mathcal{V}(\mu)-\mathcal{V}(\nu)]_{+}}{\ell_{SW}(\mu,\nu)}\geq\limsup_{\varepsilon\searrow 0}\frac{\mathcal{V}(\mu)-\mathcal{V}(\sigma_{\varepsilon})}{\ell_{SW}(\mu,\sigma_{\varepsilon})}\\
&\gtrsim_{a,\alpha,\Omega}\limsup_{\varepsilon\searrow 0}\frac{\mathcal{V}(\mu)-\mathcal{V}(\sigma_{\varepsilon})}{\|\mu-\sigma_{\varepsilon}\|_{\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})}}=\frac{\|V\|_{\dot{H}^{(d+1)/2}(\mathbb{R}^{d})}^{2}}{\|(-\Delta)^{\frac{d+1}{2}}V\|_{\dot{H}^{-(d+1)/2}(\mathbb{R}^{d})}}=\|V\|_{\dot{H}^{(d+1)/2}(\mathbb{R}^{d})}.
\end{aligned}$$
∎

7.3. Metric slopes of potential energies at discrete measures

In this subsection we establish the equivalence, up to the factor $\sqrt{d}$, of the Wasserstein and the sliced Wasserstein metric slopes of potential energies at discrete measures.

Given a functional $\mathcal{E}:\mathscr{P}_{2}(\mathbb{R}^{d})\rightarrow(-\infty,+\infty]$, a metric $m:\mathscr{P}_{2}(\mathbb{R}^{d})\times\mathscr{P}_{2}(\mathbb{R}^{d})\rightarrow[0,+\infty)$, a time step $\tau>0$, and a base point $\mu\in\mathscr{P}_{2}(\mathbb{R}^{d})$, let us write

$$\mathcal{E}^{m}(\nu;\tau,\mu):=\frac{m^{2}(\nu,\mu)}{2\tau}+\mathcal{E}(\nu)\quad\text{for each }\nu\in\mathscr{P}_{2}(\mathbb{R}^{d}). \tag{7.9}$$

We denote by $\mathcal{E}^{m}_{\tau}:\mathscr{P}_{2}(\mathbb{R}^{d})\rightarrow(-\infty,+\infty]$ the Moreau-Yosida approximation of $\mathcal{E}$ with respect to the metric $m$ and time step $\tau>0$:

$$\mathcal{E}^{m}_{\tau}(\mu)=\inf_{\nu\in\mathscr{P}_{2}(\mathbb{R}^{d})}\mathcal{E}^{m}(\nu;\tau,\mu). \tag{7.10}$$

Existence and uniqueness of the minimizer of (7.9) with $m=SW$ were discussed in certain cases in [10]. Existence in the setting considered here follows from the direct method of the calculus of variations, as we will see in the proof of Lemma 7.4.

We will impose two weak regularity assumptions on $\mathcal{E}$, namely lower semicontinuity with respect to $SW$ and coercivity; we say that $\mathcal{E}$ is coercive if there exist $\tau_{\ast}>0$ and $\mu_{\ast}\in\mathscr{P}_{2}(\mathbb{R}^{d})$ such that

$$\mathcal{E}^{m}_{\tau_{\ast}}(\mu_{\ast})>-\infty. \tag{7.11}$$

It is well known that the potential energy functional $\mathcal{V}(\mu)=\int_{\mathbb{R}^{d}}V(x)\,d\mu(x)$ is coercive, for instance, when the negative part of $V$ grows at most quadratically, i.e. $V(x)\geq-C_{1}-C_{2}|x|^{2}$ for some $C_{1},C_{2}>0$. Moreover, if $V$ is lower semicontinuous and bounded from below on $\mathbb{R}^{d}$, then $\mathcal{V}$ is lower semicontinuous with respect to the narrow topology, hence with respect to $SW$ and $W$.

We stress that while we utilize the variational problem (7.10) in this section to characterize the metric slope via the duality formula, we do not study the limiting curves of the minimizing movements scheme.

The duality formula for the local slope [1, Lemma 3.1.5] in terms of the minimizers of the functional (7.9), together with Theorem 5.5, allows us to establish the following sufficient condition for an energy functional $\mathcal{E}$ to satisfy $|\partial\mathcal{E}|_{SW}=\sqrt{d}\,|\partial\mathcal{E}|_{W}$ at discrete measures.
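To make (7.9), (7.10) and the duality formula concrete, here is a minimal sketch in a toy setting of our choosing: the base point is a Dirac mass $\delta_{y}$ on $\mathbb{R}$ and the metric is $m=W/c$. Since $W^{2}(\nu,\delta_{y})=\int|x-y|^{2}\,d\nu(x)$, the infimum in (7.10) for a potential energy is attained at a Dirac mass, so it suffices to minimize over a single point; we do this by golden-section search, assuming a convex potential so that the objective is unimodal. As $\tau\searrow 0$, the quotient $(\mathcal{V}(\delta_{y})-\mathcal{V}^{m}_{\tau}(\delta_{y}))/\tau$ should approach $\frac12|\partial\mathcal{V}|_{m}^{2}(\delta_{y})=\frac12c^{2}|V'(y)|^{2}$; taking $c=\sqrt{d}$ mimics the metric $W/\sqrt{d}$ of Lemma 7.3. The potential $V(x)=x^{4}/4+x$ and all function names are hypothetical.

```python
import math


def golden_min(f, lo, hi, tol=1e-12):
    """Golden-section search for the minimizer of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d          # minimizer lies in [a, d]
        else:
            a = c          # minimizer lies in [c, b]
    return 0.5 * (a + b)


def moreau_yosida_at_dirac(V, y, tau, c=1.0, radius=1.0):
    """Value of (7.10) at delta_y for the potential energy of V and the metric m = W / c.

    The infimum over measures is attained at a Dirac mass delta_x, so we minimize over x.
    """
    def objective(x):
        return (x - y) ** 2 / (2.0 * tau * c * c) + V(x)
    x_tau = golden_min(objective, y - radius, y + radius)
    return objective(x_tau), x_tau


def V(x):                      # hypothetical convex potential
    return 0.25 * x ** 4 + x


def dV(x):
    return x ** 3 + 1.0


y, tau, c = 0.5, 1e-3, math.sqrt(3.0)   # c = sqrt(d) mimics W / sqrt(d) with d = 3
value, x_tau = moreau_yosida_at_dirac(V, y, tau, c)
half_slope_sq = (V(y) - value) / tau                        # ~ (1/2) |dV|^2 w.r.t. W / c
slope_sq_from_step = (abs(x_tau - y) / c) ** 2 / tau ** 2   # ~ |dV|^2 w.r.t. W / c
expected = c * c * dV(y) ** 2                               # slope^2 at delta_y: c^2 |V'(y)|^2
print(half_slope_sq, 0.5 * expected, slope_sq_from_step, expected)
```

Both quotients carry an $O(\tau)$ relative error, so for $\tau=10^{-3}$ they agree with the limits to well under one percent.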

Lemma 7.3.

Let $\mathcal{E}:\mathscr{P}_{2}(\mathbb{R}^{d})\rightarrow[0,+\infty)$ be coercive and lower semicontinuous with respect to $SW$. Additionally, suppose that at each discrete measure $\mu^{n}=\sum_{i=1}^{n}m_{i}\delta_{x_{i}}$ and for sufficiently small $\tau>0$, the functional $\mathcal{E}^{SW}(\cdot;\tau,\mu^{n})$ defined in (7.9) admits minimizers $\mu_{\tau}$ such that $W_{\infty}(\mu_{\tau},\mu^{n})\rightarrow 0$ as $\tau\searrow 0$.

Then the slope of $\mathcal{E}$ with respect to $SW$ at each discrete probability measure $\mu^{n}$ coincides with its slope with respect to $W/\sqrt{d}$, i.e.

$$|\partial\mathcal{E}|_{SW}(\mu^{n})=\sqrt{d}\,|\partial\mathcal{E}|_{W}(\mu^{n}). \tag{7.12}$$
Proof.

Fix $\mu^{n}=\sum_{i=1}^{n}m_{i}\delta_{x_{i}}$. By hypothesis, for sufficiently small $\tau>0$ we can find minimizers $\mu_{\tau}^{SW}$ of the JKO functional $\mathcal{E}^{SW}(\cdot;\tau,\mu^{n})$ such that $W_{\infty}(\mu^{n},\mu_{\tau}^{SW})\rightarrow 0$ as $\tau\searrow 0$. By coercivity and lower semicontinuity with respect to $SW$, we can apply the duality formula for the local slope [1, Lemma 3.1.5] to choose a sequence $\tau_{k}\rightarrow 0$ such that

$$|\partial\mathcal{E}|_{SW}^{2}(\mu^{n})=\lim_{k\rightarrow\infty}\frac{SW^{2}(\mu^{n},\mu_{\tau_{k}}^{SW})}{\tau_{k}^{2}}=\lim_{k\rightarrow\infty}\frac{\mathcal{E}(\mu^{n})-\mathcal{E}(\mu_{\tau_{k}}^{SW})}{\tau_{k}}.$$

As $W_{\infty}(\mu^{n},\mu_{\tau_{k}}^{SW})\rightarrow 0$ as $k\rightarrow\infty$, Theorem 5.5 gives

$$\lim_{k\rightarrow\infty}\frac{SW^{2}(\mu^{n},\mu_{\tau_{k}}^{SW})}{2\tau_{k}^{2}}=\lim_{k\rightarrow\infty}\frac{W^{2}(\mu^{n},\mu_{\tau_{k}}^{SW})}{2d\tau_{k}^{2}}.$$

On the other hand, the Moreau-Yosida approximation $\mathcal{E}_{\tau}^{W/\sqrt{d}}$ satisfies $\mathcal{E}_{\tau}^{W/\sqrt{d}}(\mu^{n})\leq\mathcal{E}(\mu_{\tau}^{SW})+\frac{W^{2}(\mu^{n},\mu_{\tau}^{SW})}{2d\tau}$, and thus

$$\frac{\mathcal{E}(\mu^{n})-\mathcal{E}(\mu_{\tau}^{SW})}{\tau}-\frac{W^{2}(\mu^{n},\mu_{\tau}^{SW})}{2d\tau^{2}}\leq\frac{\mathcal{E}(\mu^{n})-\mathcal{E}_{\tau}^{W/\sqrt{d}}(\mu^{n})}{\tau}.$$

Combining the estimates and using the duality formula again, now for $|\partial\mathcal{E}|_{W/\sqrt{d}}$, we obtain

$$\begin{aligned}
\frac{1}{2}|\partial\mathcal{E}|_{SW}^{2}(\mu^{n})&=\lim_{k\rightarrow\infty}\left(\frac{\mathcal{E}(\mu^{n})-\mathcal{E}(\mu_{\tau_{k}}^{SW})}{\tau_{k}}-\frac{SW^{2}(\mu^{n},\mu_{\tau_{k}}^{SW})}{2\tau_{k}^{2}}\right)=\lim_{k\rightarrow\infty}\left(\frac{\mathcal{E}(\mu^{n})-\mathcal{E}(\mu_{\tau_{k}}^{SW})}{\tau_{k}}-\frac{W^{2}(\mu^{n},\mu_{\tau_{k}}^{SW})}{2d\tau_{k}^{2}}\right)\\
&\leq\limsup_{\tau\searrow 0}\frac{\mathcal{E}(\mu^{n})-\mathcal{E}_{\tau}^{W/\sqrt{d}}(\mu^{n})}{\tau}=\frac{1}{2}|\partial\mathcal{E}|_{W/\sqrt{d}}^{2}(\mu^{n})=\frac{d}{2}\,|\partial\mathcal{E}|_{W}^{2}(\mu^{n}).
\end{aligned}$$

The reverse inequality holds since $SW\leq W/\sqrt{d}$ (project an optimal plan for $W$ onto each direction $\theta$ and use $\fint_{\mathbb{S}^{d-1}}|v\cdot\theta|^{2}\,d\theta=|v|^{2}/d$), and a slope computed with respect to a smaller metric can only be larger: $|\partial\mathcal{E}|_{SW}(\mu^{n})\geq|\partial\mathcal{E}|_{W/\sqrt{d}}(\mu^{n})=\sqrt{d}\,|\partial\mathcal{E}|_{W}(\mu^{n})$. ∎

Our last step is to verify that the hypotheses of Lemma 7.3 are satisfied for a general class of potential energy functionals.

Lemma 7.4.

Let $V:\mathbb{R}^{d}\rightarrow[0,+\infty)$ be continuous and let $\mathcal{V}:\mu\mapsto\int_{\mathbb{R}^{d}}V(x)\,d\mu(x)$. Then for each discrete measure $\mu^{n}=\sum_{i=1}^{n}m_{i}\delta_{y_{i}}\in\mathscr{P}_{2}(\mathbb{R}^{d})$ there exists $\tau^{\ast}=\tau^{\ast}(\mu^{n},V)$ such that for all $0<\tau<\tau^{\ast}$, the functional $\mathcal{V}^{SW}(\cdot\,;\tau,\mu^{n})$ defined in (7.9) admits a minimizer $\mu_{\tau}$ such that

$$W_{\infty}(\mu_{\tau},\mu^{n})\leq\tau^{1/3}. \tag{7.13}$$
Proof.

As $\mathcal{V}$ is nonnegative, for any fixed $\tau>0$ and $c<\infty$ the sublevel set

$$\{\nu\in\mathscr{P}_{2}(\mathbb{R}^{d}):\mathcal{V}^{SW}(\nu;\tau,\mu^{n})\leq c\}$$

is contained in the ball $B_{SW}(\mu^{n},\sqrt{2\tau c})$, which is sequentially compact with respect to narrow convergence of measures by Proposition 2.3. On the other hand, $\mathcal{V}^{SW}(\cdot;\tau,\mu^{n})$ is lower semicontinuous with respect to narrow convergence: $\mathcal{V}$ is lower semicontinuous as $V$ is continuous and nonnegative, and we know from Lemma 2.1 that $\sigma\mapsto SW(\sigma,\mu^{n})$ is also narrowly lower semicontinuous. In conclusion, for each $\tau>0$ a minimizer $\mu_{\tau}$ of the JKO functional $\mathcal{V}^{SW}(\cdot;\tau,\mu^{n})$ exists.

In the remainder of the proof, we denote by $\kappa_{d}>0$ a constant depending only on the dimension such that

$$|\{\theta\in\mathbb{S}^{d-1}:|\theta_{1}|\leq s\}|\leq\kappa_{d}|\mathbb{S}^{d-1}|\,s. \tag{7.14}$$

Step 1. Let $\mu_{\tau}$ be a minimizer of $\mathcal{V}^{SW}(\cdot;\tau,\mu^{n})$, and write

$$\Omega_{h}^{n}=\{x\in\mathbb{R}^{d}:|x-y_{i}|\leq h\text{ for some }i=1,2,\dots,n\}$$

and decompose $\mu_{\tau}$ as

$$\mu_{\tau}=m_{\tau}^{far}\mu_{\tau}^{far}+\sum_{i=1}^{n}m_{\tau}^{i}\mu_{\tau}^{i}\quad\text{where }\operatorname{supp}\mu_{\tau}^{i}\subset\overline{B}(y_{i},\tau^{1/3}),\;\operatorname{supp}\mu_{\tau}^{far}\cap\Omega_{\tau^{1/3}}^{n}=\emptyset. \tag{7.15}$$

Here $\mu_{\tau}^{far}\in\mathscr{P}_{2}(\mathbb{R}^{d})$ is the (normalized) part of $\mu_{\tau}$ that is far from the support of $\mu^{n}$, whereas $\mu_{\tau}^{i}\in\mathscr{P}_{2}(\mathbb{R}^{d})$ is the (normalized) part of $\mu_{\tau}$ close to $y_{i}$, for each $i=1,\dots,n$. Let

$$m_{\tau}^{\Delta}=\sum_{i=1}^{n}(m_{\tau}^{i}-m_{i})_{+}=-m_{\tau}^{far}+\sum_{i=1}^{n}(m_{i}-m_{\tau}^{i})_{+}.$$

Intuitively, $m_{\tau}^{far}$ is the mass outside the balls of radius $\tau^{1/3}$ about the points $y_{i}$, and $m_{\tau}^{\Delta}$ is the ‘total misplaced mass’. We want to show that both $m_{\tau}^{far}$ and $m_{\tau}^{\Delta}$ vanish when $\tau$ is sufficiently small. To this end, note that $\mathcal{V}^{SW}(\mu_{\tau};\tau,\mu^{n})\leq\mathcal{V}^{SW}(\mu^{n};\tau,\mu^{n})$, and thus

$$\begin{aligned}
\frac{SW^{2}(\mu_{\tau},\mu^{n})}{2\tau}&\leq\mathcal{V}(\mu^{n})-\mathcal{V}(\mu_{\tau})=\mathcal{V}(\mu^{n})-\mathcal{V}\Big(\sum_{i=1}^{n}m_{\tau}^{i}\delta_{y_{i}}\Big)+\mathcal{V}\Big(\sum_{i=1}^{n}m_{\tau}^{i}\delta_{y_{i}}\Big)-\mathcal{V}(\mu_{\tau})\\
&\leq\sum_{i=1}^{n}(m_{i}-m_{\tau}^{i})V(y_{i})+\sum_{i=1}^{n}m_{\tau}^{i}\int_{\mathbb{R}^{d}}\big(V(y_{i})-V(x)\big)\,d\mu_{\tau}^{i}(x)\\
&\leq(m_{\tau}^{\Delta}+m_{\tau}^{far})\max_{i=1,\dots,n}V(y_{i})+\omega_{V}(\tau^{1/3}),
\end{aligned}$$

where $\omega_{V}$ is the uniform modulus of continuity of $V$ on, say, $\bigcup_{i=1}^{n}\overline{B}(y_{i},1)$; the second inequality uses $V\geq 0$ to discard the term $-m_{\tau}^{far}\int V\,d\mu_{\tau}^{far}$. On the other hand, we claim

$$\frac{\tau^{2/3}m_{\tau}^{far}}{16n^{2}\kappa_{d}^{2}\tau}+\left(\frac{l_{\mu^{n}}^{2}}{16n^{2}C_{d}^{2}}-\tau^{2/3}\right)\frac{m_{\tau}^{\Delta}}{2\tau}\leq\frac{SW^{2}(\mu_{\tau},\mu^{n})}{2\tau}, \tag{7.16}$$

where $l_{\mu^{n}}=\min_{i\neq j}|y_{i}-y_{j}|$. Note that this implies

$$\frac{\tau^{2/3}m_{\tau}^{far}}{16n^{2}\kappa_{d}^{2}\tau}+\left(\frac{l_{\mu^{n}}^{2}}{16n^{2}C_{d}^{2}}-\tau^{2/3}\right)\frac{m_{\tau}^{\Delta}}{2\tau}\leq(m_{\tau}^{\Delta}+m_{\tau}^{far})\max_{i}V(y_{i})+\omega_{V}(\tau^{1/3}). \tag{7.17}$$

The choice of radius $\tau^{1/3}$ in (7.15) ensures that, after squaring, the coefficient of $m_{\tau}^{far}$ on the left-hand side, which is of order $\tau^{-1/3}$, blows up as $\tau\searrow 0$; the radius could equally be replaced by $\tau^{1/2-\varepsilon}$ for any $\varepsilon>0$. From this we deduce that there exists $\tau^{\ast}=\tau^{\ast}(n,d,l_{\mu^{n}},\max_{i}V(y_{i}),\omega_{V})>0$ such that for all $\tau<\tau^{\ast}$,

$$m_{\tau}^{\Delta}=0\quad\text{and}\quad m_{\tau}^{far}=0.$$

By definition of $m_{\tau}^{\Delta}$ and $m_{\tau}^{far}$, this means that $\mu_{\tau}$ has mass exactly $m_{i}$ in each $\overline{B}(y_{i},\tau^{1/3})$; thus

$$W_{\infty}(\mu_{\tau},\mu^{n})\leq\tau^{1/3}\quad\text{for }\tau\in(0,\tau^{\ast}).$$

Thus it remains to prove (7.16). Without loss of generality we may assume $\mu_{\tau}\ll\mathscr{L}^{d}$. Otherwise, we approximate $\mu_{\tau}$ by its convolution $\mu_{\tau}^{\varepsilon}$ with a smooth, compactly supported kernel. Then all the quantities involved in (7.17), such as $m_{\tau}^{far}(\varepsilon)$, $m_{\tau}^{\Delta}(\varepsilon)$, $SW^{2}(\mu_{\tau}^{\varepsilon},\mu^{n})$ and $\mathcal{V}(\mu_{\tau}^{\varepsilon})$, converge as $\varepsilon\searrow 0$, hence we can deduce (7.17) in the limit $\varepsilon\searrow 0$.

Furthermore, as $y\mapsto y\cdot\theta$ is injective on $\{y_{1},\dots,y_{n}\}$ for a.e. $\theta\in\mathbb{S}^{d-1}$, for a.e. $\theta\in\mathbb{S}^{d-1}$ there exists a transport map $T^{\theta}:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ with $T^{\theta}_{\#}\mu_{\tau}=\mu^{n}$ such that

$$SW^{2}(\mu_{\tau},\mu^{n})=\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}^{d}}|(T^{\theta}(x)-x)\cdot\theta|^{2}\,d\mu_{\tau}(x)\,d\theta.$$

Step 2. To prove (7.16), we consider $\mu_{\tau}^{far}$ and $\sum_{i=1}^{n}m_{\tau}^{i}\mu_{\tau}^{i}$ separately, namely

$$\begin{aligned}
SW^{2}(\mu_{\tau},\mu^{n})&=\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}^{d}}|(T^{\theta}(x)-x)\cdot\theta|^{2}\,d\mu_{\tau}(x)\,d\theta\\
&=m_{\tau}^{far}\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}^{d}}|(T^{\theta}(x)-x)\cdot\theta|^{2}\,d\mu_{\tau}^{far}(x)\,d\theta\\
&\phantom{=}\;+\sum_{i=1}^{n}m_{\tau}^{i}\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}^{d}}|(T^{\theta}(x)-x)\cdot\theta|^{2}\,d\mu_{\tau}^{i}(x)\,d\theta\\
&=:I_{\tau}^{far}+I_{\tau}^{\Delta}.
\end{aligned}$$

We first deal with the term $I_{\tau}^{far}$:

$$I_{\tau}^{far}=m_{\tau}^{far}\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}^{d}}|(T^{\theta}(x)-x)\cdot\theta|^{2}\,d\mu_{\tau}^{far}(x)\,d\theta\geq m_{\tau}^{far}\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}^{d}}\min_{i}|(y_{i}-x)\cdot\theta|^{2}\,d\mu_{\tau}^{far}(x)\,d\theta.$$

Let

$$\Theta_{x}^{i}:=\left\{\theta\in\mathbb{S}^{d-1}:|(y_{i}-x)\cdot\theta|\leq\frac{\tau^{1/3}}{2n\kappa_{d}}\right\}.$$

As $|y_{i}-x|\geq\tau^{1/3}$ for $x\in\operatorname{supp}\mu_{\tau}^{far}$, (7.14) yields $|\Theta_{x}^{i}|/|\mathbb{S}^{d-1}|\leq\frac{1}{2n}$, and hence $|\bigcup_{i=1}^{n}\Theta_{x}^{i}|/|\mathbb{S}^{d-1}|\leq\frac{1}{2}$. On $\mathbb{S}^{d-1}\setminus\bigcup_{i=1}^{n}\Theta_{x}^{i}$ we have $\min_{i}|(y_{i}-x)\cdot\theta|\geq\frac{\tau^{1/3}}{2n\kappa_{d}}$, and thus

$$I_{\tau}^{far}\geq m_{\tau}^{far}\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}^{d}}\min_{i}|(y_{i}-x)\cdot\theta|^{2}\,d\mu_{\tau}^{far}(x)\,d\theta\geq\frac{m_{\tau}^{far}\tau^{2/3}}{8n^{2}\kappa_{d}^{2}}. \tag{7.18}$$

Step 3. To deal with the second term $I_{\tau}^{\Delta}$, define

$$B_{\theta}:=\{x\in\mathbb{R}^{d}:|T^{\theta}(x)-x|\geq l_{\mu^{n}}-\tau^{1/3}\}.$$

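The threshold $l_{\mu^{n}}-\tau^{1/3}$ is a lower bound on how far incorrectly assigned mass must travel: at most $m_{i}$ of the mass of $m_{\tau}^{i}\mu_{\tau}^{i}$ can be sent to $y_{i}$, and the remainder is sent to some $y_{j}$ with $j\neq i$. Thus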

$$m_{\tau}^{i}\mu_{\tau}^{i}(B_{\theta})\geq(m_{\tau}^{i}-m_{i})_{+}\quad\text{for each }i=1,\dots,n\text{ and }\theta\in\mathbb{S}^{d-1}.$$

Furthermore, on $B_{\theta}\cap\operatorname{supp}\mu_{\tau}^{i}$ we have

$$|(T^{\theta}(x)-x)\cdot\theta|\geq\min_{j\neq i}|(y_{i}-y_{j})\cdot\theta|-\tau^{1/3}.$$

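Using these two properties and the elementary inequality $t^{2}\geq\frac{1}{2}a^{2}-b^{2}$, valid whenever $|t|\geq a-b$ with $a,b\geq 0$, we deduce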

$$\begin{aligned}
I_{\tau}^{\Delta}&=\sum_{i=1}^{n}m_{\tau}^{i}\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}^{d}}|(T^{\theta}(x)-x)\cdot\theta|^{2}\,d\mu_{\tau}^{i}(x)\,d\theta\\
&\geq\frac{1}{2}\sum_{i=1}^{n}m_{\tau}^{i}\fint_{\mathbb{S}^{d-1}}\int_{B_{\theta}}\left(\min_{j\neq i}|(y_{i}-y_{j})\cdot\theta|^{2}-2\tau^{2/3}\right)d\mu_{\tau}^{i}(x)\,d\theta\\
&\geq\frac{1}{2}\sum_{i=1}^{n}(m_{\tau}^{i}-m_{i})_{+}\fint_{\mathbb{S}^{d-1}}\left(\min_{j\neq i}|(y_{i}-y_{j})\cdot\theta|^{2}-2\tau^{2/3}\right)d\theta\\
&=\frac{1}{2}\left(\sum_{i=1}^{n}(m_{\tau}^{i}-m_{i})_{+}\fint_{\mathbb{S}^{d-1}}\min_{j\neq i}|(y_{i}-y_{j})\cdot\theta|^{2}\,d\theta\right)-m_{\tau}^{\Delta}\tau^{2/3}.
\end{aligned}$$

Let

$$\Theta_{i}^{j}:=\left\{\theta\in\mathbb{S}^{d-1}:|(y_{i}-y_{j})\cdot\theta|\leq\frac{l_{\mu^{n}}}{2nC_{d}}\right\}.$$

As $l_{\mu^{n}}\left|\theta\cdot\frac{y_{i}-y_{j}}{|y_{i}-y_{j}|}\right|\leq|(y_{i}-y_{j})\cdot\theta|$, by Chebyshev's inequality we have $\frac{|\Theta_{i}^{j}|}{|\mathbb{S}^{d-1}|}\leq\frac{1}{2n}$ for each $j\neq i$, and thus

$$\frac{\left|\bigcup_{j\neq i}\Theta_{i}^{j}\right|}{|\mathbb{S}^{d-1}|}\leq\frac{1}{2},$$

whereas

$$\min_{j\neq i}|(y_{i}-y_{j})\cdot\theta|>\frac{l_{\mu^{n}}}{2nC_{d}}\quad\text{for }\theta\in\mathbb{S}^{d-1}\setminus\bigcup_{j\neq i}\Theta_{i}^{j}.$$

Thus

$$\begin{aligned}
&\sum_{i=1}^{n}(m_{\tau}^{i}-m_{i})_{+}\fint_{\mathbb{S}^{d-1}}\min_{j\neq i}|(y_{i}-y_{j})\cdot\theta|^{2}\,d\theta\\
&\quad\geq\sum_{i=1}^{n}(m_{\tau}^{i}-m_{i})_{+}\frac{1}{|\mathbb{S}^{d-1}|}\int_{\mathbb{S}^{d-1}\setminus\bigcup_{j\neq i}\Theta_{i}^{j}}\frac{l_{\mu^{n}}^{2}}{4n^{2}C_{d}^{2}}\,d\theta\geq\frac{l_{\mu^{n}}^{2}}{8n^{2}C_{d}^{2}}\sum_{i=1}^{n}(m_{\tau}^{i}-m_{i})_{+}=\frac{l_{\mu^{n}}^{2}m_{\tau}^{\Delta}}{8n^{2}C_{d}^{2}}.
\end{aligned}$$

Collecting the estimates, we have

$$I_{\tau}^{\Delta}\geq\frac{1}{2}\left(\sum_{i=1}^{n}(m_{\tau}^{i}-m_{i})_{+}\fint_{\mathbb{S}^{d-1}}\min_{j\neq i}|(y_{i}-y_{j})\cdot\theta|^{2}\,d\theta\right)-m_{\tau}^{\Delta}\tau^{2/3}\geq\left(\frac{l_{\mu^{n}}^{2}}{16n^{2}C_{d}^{2}}-\tau^{2/3}\right)m_{\tau}^{\Delta}.$$

Recalling that $SW^{2}(\mu_{\tau},\mu^{n})=I_{\tau}^{far}+I_{\tau}^{\Delta}$, we combine the above estimate with (7.18) and divide by $2\tau$ to obtain (7.16). ∎
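The proof only uses the existence of some constant $\kappa_{d}$ in (7.14). One admissible choice, derived here from the marginal density of $\theta_{1}$ for $\theta$ uniform on $\mathbb{S}^{d-1}$ (our computation, not taken from the text), is $\kappa_{2}=1$ and $\kappa_{d}=2\Gamma(d/2)/(\sqrt{\pi}\,\Gamma((d-1)/2))$ for $d\geq 3$. The following sketch evaluates this constant and compares $\kappa_{d}s$ with a Monte Carlo estimate of the normalized measure of the band $\{|\theta_{1}|\leq s\}$; the function names are hypothetical.

```python
import math
import random


def kappa_bound(d):
    """An admissible kappa_d for (7.14) (our derivation; assumes d >= 2).

    d >= 3: theta_1 has density Gamma(d/2) / (sqrt(pi) Gamma((d-1)/2)) * (1 - t^2)^((d-3)/2),
    which is maximal at t = 0, so the band has normalized measure <= kappa_d * s.
    d = 2: the band has normalized measure (2/pi) arcsin(min(s, 1)) <= s.
    """
    if d == 2:
        return 1.0
    return 2.0 * math.gamma(d / 2) / (math.sqrt(math.pi) * math.gamma((d - 1) / 2))


def band_fraction(d, s, n_samples=40000, seed=1):
    """Monte Carlo estimate of |{theta in S^{d-1} : |theta_1| <= s}| / |S^{d-1}|."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # theta = v / |v| is uniform on the sphere; test |theta_1| <= s without dividing
        if abs(v[0]) <= s * math.sqrt(sum(t * t for t in v)):
            hits += 1
    return hits / n_samples


s = 0.1
results = {d: (band_fraction(d, s), kappa_bound(d) * s) for d in (2, 3, 5, 10)}
for d, (frac, bound) in results.items():
    print(d, round(frac, 4), round(bound, 4))
```

For $d=3$ the bound is sharp (Archimedes' theorem gives band measure exactly $s$), while for large $d$ the constant grows like $\sqrt{2d/\pi}$.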

We are ready to state the main result of this section.

Proposition 7.5 (Slope of potential energy at discrete measures).

Let $V:\mathbb{R}^{d}\rightarrow[0,+\infty)$ be continuously differentiable. Let $\mathcal{V}$ be the potential energy functional

$$\mathcal{V}(\mu)=\int_{\mathbb{R}^{d}}V(x)\,d\mu(x).$$

Then the slope of $\mathcal{V}$ with respect to $SW$ at a discrete probability measure $\mu^{n}=\sum_{i=1}^{n}m_{i}\delta_{x_{i}}$ coincides with the slope with respect to $W/\sqrt{d}$, i.e.

$$|\partial\mathcal{V}|_{SW}(\mu^{n})=\sqrt{d}\,|\partial\mathcal{V}|_{W}(\mu^{n}). \tag{7.19}$$
Proof.

As $\mathcal{V}$ satisfies the assumptions of Lemma 7.3 and Lemma 7.4, (7.19) follows from the two lemmas. ∎

We conclude this section by discussing implications of Proposition 7.5. We first note in Remark 7.6 that the curves of maximal slope of the potential energy with respect to $SW$ are not stable with respect to the initial data.

Remark 7.6 (Lack of stability of sliced Wasserstein gradient flows).

We note that the curves of maximal slope $(\mu^{n}_{t})_{t}$ of $\mathcal{V}$ with respect to the Wasserstein metric starting at any discrete measure $\mu^{n}$ are, up to a multiplicative constant in the energy (equivalently, a rescaling of time), curves of maximal slope with respect to the $SW$ (or $\ell_{SW}$) metric. To see this, recall that the $W$-gradient flow of $\mathcal{V}$ starting at $\mu^{n}=\sum_{i=1}^{n}m_{i}\delta_{x_{i}}$ is given by $\mu^{n}_{t}=\sum_{i=1}^{n}m_{i}\delta_{x_{i}(t)}$, where $x_{i}(t)$ solves $x_{i}^{\prime}(t)=-\nabla V(x_{i}(t))$. Thus, by Proposition 7.5,

$$|\partial\mathcal{V}|_{SW}(\mu^{n}_{t})=\sqrt{d}\,|\partial\mathcal{V}|_{W}(\mu^{n}_{t}),$$

whereas $\sqrt{d}\,|(\mu^{n})^{\prime}|_{SW}(t)=|(\mu^{n})^{\prime}|_{W}(t)$ for a.e. $t\in I$ by Theorem 5.5. Consequently

$$\frac{d}{dt}\big(d^{-1}\mathcal{V}(\mu_{t}^{n})\big)=-\frac{1}{2d}|(\mu^{n})^{\prime}|_{W}^{2}(t)-\frac{1}{2d}|\partial\mathcal{V}|_{W}^{2}(\mu_{t}^{n})=-\frac{1}{2}|(\mu^{n})^{\prime}|_{SW}^{2}(t)-\frac{1}{2}|\partial(d^{-1}\mathcal{V})|_{SW}^{2}(\mu^{n}_{t}),$$

thus $(\mu^{n}_{t})_{t\geq 0}$ is a curve of maximal slope of $\nu\mapsto d^{-1}\mathcal{V}(\nu)$ with respect to $SW$ or, equivalently, $\tilde{\mu}^{n}_{t}:=\mu^{n}_{dt}$ is a curve of maximal slope of $\mathcal{V}$ with respect to $SW$. Note that we do not claim that $\tilde{\mu}^{n}_{t}$ is the only such $SW$ curve of maximal slope, as uniqueness is unknown.
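The equivalence between the two formulations can be checked directly. The short computation below is an illustrative sketch. Like the argument above, it assumes three facts: the Wasserstein energy identity $\frac{d}{ds}\mathcal{V}(\mu^{n}_{s})=-\frac{1}{2}|(\mu^{n})^{\prime}|_{W}^{2}(s)-\frac{1}{2}|\partial\mathcal{V}|_{W}^{2}(\mu^{n}_{s})$, the relation $\sqrt{d}\,|(\mu^{n})^{\prime}|_{SW}=|(\mu^{n})^{\prime}|_{W}$ from Theorem 5.5, and (7.19). It also uses the fact that the metric derivative of the reparametrized curve $t\mapsto\mu^{n}_{dt}$ is $d$ times that of $\mu^{n}$ at time $dt$.

```latex
\begin{aligned}
|(\tilde\mu^{n})'|_{SW}(t) &= d\,|(\mu^{n})'|_{SW}(dt) = \sqrt{d}\,|(\mu^{n})'|_{W}(dt),
\qquad |\partial\mathcal{V}|_{SW}(\tilde\mu^{n}_{t}) = \sqrt{d}\,|\partial\mathcal{V}|_{W}(\mu^{n}_{dt}),\\
\frac{d}{dt}\mathcal{V}(\tilde\mu^{n}_{t})
 &= d\,\frac{d}{ds}\mathcal{V}(\mu^{n}_{s})\Big|_{s=dt}
  = -\frac{d}{2}|(\mu^{n})'|_{W}^{2}(dt)-\frac{d}{2}|\partial\mathcal{V}|_{W}^{2}(\mu^{n}_{dt})\\
 &= -\frac{1}{2}|(\tilde\mu^{n})'|_{SW}^{2}(t)-\frac{1}{2}|\partial\mathcal{V}|_{SW}^{2}(\tilde\mu^{n}_{t}).
\end{aligned}
```

The last line is the energy identity that characterizes a curve of maximal slope of $\mathcal{V}$ with respect to $SW$.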

If $V$ is semiconvex, the Wasserstein gradient flow of $\mathcal{V}$ is stable with respect to the initial data, for instance by the Evolution Variational Inequality [1, Theorem 11.1.4]. Thus, if $W(\mu^{n},\mu)\rightarrow 0$, then $(\mu^{n}_{t})_{t}$ converges with respect to $W$ to the Wasserstein gradient flow $(\mu_{t})_{t\geq 0}$ starting at $\mu$, which satisfies

$$\frac{d}{dt}\mathcal{V}(\mu_{t})=-|\partial\mathcal{V}|_{W}^{2}(\mu_{t})=-\|\nabla V\|_{L^{2}(\mu_{t})}^{2}\quad\text{for a.e. }t\geq 0.$$

On the other hand, if in addition $V\in C_{c}^{\infty}(\mathbb{R}^{d})$, Proposition 7.2 asserts that

$$\|V\|_{\dot{H}^{(d+1)/2}(\mathbb{R}^{d})}\lesssim|\partial\mathcal{V}|_{\ell_{SW}}(\mu)\leq|\partial\mathcal{V}|_{SW}(\mu)\lesssim\|V\|_{\dot{H}^{(d+1)/2}(\mathbb{R}^{d})}$$

for suitable $\mu$ bounded away from zero on a bounded open convex set compactly containing $\operatorname{supp}V$. Thus, if an $SW$ gradient flow $(\mu_{t}^{SW})_{t\geq 0}$ starting at $\mu$ were to exist, it would have to satisfy

$$\frac{d}{dt}\mathcal{V}(\mu_{t}^{SW})=-|\partial\mathcal{V}|_{SW}^{2}(\mu_{t}^{SW})\lesssim-\|V\|_{\dot{H}^{(d+1)/2}(\mathbb{R}^{d})}^{2}\quad\text{for a.e. }t\geq 0.$$

Note that the Wasserstein gradient flow does not satisfy this. Hence the SW gradient flow (should one exist) is, in this case, distinct from the Wasserstein gradient flow. This implies that the potential energy is not $\lambda$-convex in the SW geometry. It further suggests that we cannot hope for stability of sliced Wasserstein gradient flows with respect to initial data in the set of measures, even for smooth potential energies. We believe that equation (7.7) may be more amenable to PDE-based approaches. ∎

Proposition 7.5 also implies that the local slope $\mu\mapsto|\partial\mathcal{V}|_{SW}(\mu)$ is lower semicontinuous neither with respect to $SW$ nor with respect to narrow convergence. This lack of regularity makes it difficult to rigorously study the limit of the minimizing movement scheme as the time step vanishes, which we do not pursue in this paper.

More precisely, solutions of the minimizing movement scheme for gradient flows converge to curves of maximal slope with respect to the relaxed slope [1, Chapter 2]. Recall that the relaxed slope $|\partial^{-}\mathcal{V}|_{m}(\mu)$ of $\mathcal{V}:X\rightarrow(-\infty,+\infty]$ in a metric space $(X,m)$ at $\mu\in X$ is defined by

$$|\partial^{-}\mathcal{V}|_{m}(\mu)=\inf\Big\{\liminf_{k\rightarrow\infty}|\partial\mathcal{V}|_{m}(\mu_{k}):\ \mu_{k}\rightharpoonup\mu\text{ narrowly and }\sup_{k}\{m(\mu,\mu_{k}),\mathcal{V}(\mu_{k})\}<\infty\Big\}. \tag{7.20}$$

Under regularity assumptions such as $\lambda$-geodesic convexity of $\mathcal{V}$ with respect to $m$, the two slopes are equal: $|\partial^{-}\mathcal{V}|_{m}=|\partial\mathcal{V}|_{m}$. However, Proposition 7.5 implies that this is not the case for the SW-slopes of potential energies. Namely, we show that even for smooth potentials the relaxed slope of the potential energy in the SW metric coincides with $\sqrt{d}\,|\partial\mathcal{V}|_{W}(\mu)$ rather than with $|\partial\mathcal{V}|_{SW}(\mu)$.

Corollary 7.7 (Relaxed slope of the potential energy).

Let $V:\mathbb{R}^{d}\rightarrow\mathbb{R}$ be continuously differentiable with uniformly bounded derivatives. Denote by $|\partial^{-}\mathcal{V}|_{SW}$ the lower semicontinuous envelope of $|\partial\mathcal{V}|_{SW}$ with respect to the narrow topology, as defined in (7.20). Then

$$|\partial^{-}\mathcal{V}|_{SW}(\mu)=\sqrt{d}\,|\partial\mathcal{V}|_{W}(\mu)\quad\text{for each }\mu\in\mathscr{P}_{2}(\mathbb{R}^{d}). \tag{7.21}$$
Remark 7.8.

As $\sqrt{d}\,|\partial\mathcal{V}|_{W}\leq|\partial\mathcal{V}|_{\ell_{SW}}\leq|\partial\mathcal{V}|_{SW}$ in general, (7.21) implies

$$\sqrt{d}\,|\partial\mathcal{V}|_{W}(\mu)=\sqrt{d}\,|\partial^{-}\mathcal{V}|_{W}(\mu)\leq|\partial^{-}\mathcal{V}|_{\ell_{SW}}(\mu)\leq|\partial^{-}\mathcal{V}|_{SW}(\mu)=\sqrt{d}\,|\partial\mathcal{V}|_{W}(\mu).$$

In particular, $|\partial^{-}\mathcal{V}|_{\ell_{SW}}(\mu)=\sqrt{d}\,|\partial\mathcal{V}|_{W}(\mu)$. ∎

Proof.

It is well known [1, Proposition 10.4.2] that

$$|\partial\mathcal{V}|_{W}(\mu)=\|\nabla V\|_{L^{2}(\mu)},$$

which is continuous with respect to narrow convergence when $\|\nabla V\|_{\infty}<\infty$. Fix any $\mu\in\mathscr{P}_{2}(\mathbb{R}^{d})$. Then, for any sequence $(\mu_{k})\subset\mathscr{P}_{2}(\mathbb{R}^{d})$ converging narrowly to $\mu$,

$$\liminf_{k\rightarrow\infty}|\partial\mathcal{V}|_{SW}(\mu_{k})\geq\liminf_{k\rightarrow\infty}\sqrt{d}\,|\partial\mathcal{V}|_{W}(\mu_{k})=\sqrt{d}\,|\partial\mathcal{V}|_{W}(\mu),$$

thus we conclude $|\partial^{-}\mathcal{V}|_{SW}(\mu)\geq\sqrt{d}\,|\partial\mathcal{V}|_{W}(\mu)$ by taking the infimum over such sequences.

On the other hand, approximating $\mu$ by discrete measures $\mu^{n}$ in $SW$, and passing to a suitable subsequence to ensure $\sup_{n}\mathcal{V}(\mu^{n})<\infty$, we have by Proposition 7.5

$$|\partial^{-}\mathcal{V}|_{SW}(\mu)\leq\liminf_{n\rightarrow\infty}|\partial\mathcal{V}|_{SW}(\mu^{n})=\liminf_{n\rightarrow\infty}\sqrt{d}\,|\partial\mathcal{V}|_{W}(\mu^{n})=\sqrt{d}\,|\partial\mathcal{V}|_{W}(\mu).$$

From the proof it is clear that (7.21) also holds if the lower semicontinuous envelope is defined with respect to the topology generated by $SW$.

Remark 7.9.

We note that in general $|\partial^{-}\mathcal{V}|_{SW}$ is not a strong upper gradient. In particular, consider $\mu$ with $2a\leq\mu\leq b/2$ on a bounded convex domain $\Omega$ for some $0<4a<b<\infty$, and let $V\in C_{c}^{\infty}(\mathbb{R}^{d})$ with $\operatorname{supp}V\subset\subset\Omega$. We claim that $|\partial^{-}\mathcal{V}|_{SW}=\sqrt{d}\,|\partial\mathcal{V}|_{W}$ fails to be an upper gradient. As all derivatives of $V$ are bounded, for small times $t>0$ the path $\mu_{t}=\mu-t(-\Delta)^{(d+1)/2}V$ satisfies $a\leq\mu_{t}\leq b$ on $\Omega$, and in particular remains in the space of probability measures. Furthermore, $\partial_{t}\mu_{t}+(-\Delta)^{(d+1)/2}V=0$, and thus

$$|\mu^{\prime}|_{SW}(t)\leq\sqrt{d}\,|\mu^{\prime}|_{W}(t)\leq\|\nabla(-\Delta)^{(d-1)/2}V\|_{L^{2}(1/\mu_{t})}\leq\sqrt{d/a}\,\|V\|_{\dot{H}^{d/2}(\mathbb{R}^{d})}$$

and $|\partial\mathcal{V}|_{W}(\mu_{t})=\|\nabla V\|_{L^{2}(\mu_{t})}\leq\sqrt{b}\,\|\nabla V\|_{L^{2}(\mathbb{R}^{d})}$. On the other hand,

$$\left|\frac{d}{dt}\mathcal{V}(\mu_{t})\right|=\left|\int_{\mathbb{R}^{d}}V(x)\,\partial_{t}\mu_{t}\,dx\right|=\left|\int_{\mathbb{R}^{d}}V(x)(-\Delta)^{\frac{d+1}{2}}V\,dx\right|=\|V\|_{\dot{H}^{(d+1)/2}(\mathbb{R}^{d})}^{2}.$$

Thus, choosing a sufficiently oscillatory $V$ such that

$$\|V\|_{\dot{H}^{(d+1)/2}(\mathbb{R}^{d})}^{2}>\sqrt{\frac{d^{2}b}{a}}\,\|\nabla V\|_{L^{2}(\mathbb{R}^{d})}\|V\|_{\dot{H}^{d/2}(\mathbb{R}^{d})},$$

we see that $\left|\frac{d}{dt}\mathcal{V}(\mu_{t})\right|>\sqrt{d}\,|\partial\mathcal{V}|_{W}(\mu_{t})\,|\mu^{\prime}|_{SW}(t)$, which verifies that $\sqrt{d}\,|\partial\mathcal{V}|_{W}$ is not an upper gradient. ∎

Acknowledgements. The authors are grateful to Jun Kitagawa for stimulating discussions, and also to Tudor Manole for pointing us to the literature that led to Remark 6.7. The authors acknowledge the support of the National Science Foundation via the grant DMS-2206069. They are also thankful to the Center for Nonlinear Analysis for its support. SP was also supported by the NSF grant DMS-2106534. The authors would also like to thank the anonymous referees for careful readings and numerous helpful suggestions, which greatly helped improve the exposition of this manuscript.

References

  • [1] L. Ambrosio, N. Gigli, and G. Savaré, Gradient flows in metric spaces and in the space of probability measures, Lectures in Mathematics ETH Zürich, Birkhäuser Verlag, Basel, second ed., 2008.
  • [2] L. Ambrosio and P. Tilli, Topics on analysis in metric spaces, vol. 25 of Oxford Lecture Series in Mathematics and its Applications, Oxford University Press, Oxford, 2004.
  • [3] M. Anthony and J. Shawe-Taylor, A result of Vapnik with applications, Discrete Applied Mathematics, 47 (1993), pp. 207–217.
  • [4] Y. Bai, B. Schmitzer, M. Thorpe, and S. Kolouri, Sliced optimal partial transport, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 13681–13690.
  • [5] E. Bayraktar and G. Guo, Strong equivalence between metrics of Wasserstein type, Electronic Communications in Probability, 26 (2021), pp. 1–13.
  • [6] S. G. Bobkov, Isoperimetric and analytic inequalities for log-concave probability measures, The Annals of Probability, 27 (1999), pp. 1903–1921.
  • [7] S. G. Bobkov and C. Houdré, Isoperimetric constants for product probability measures, The Annals of Probability, 25 (1997), pp. 184–205.
  • [8] S. G. Bobkov and M. Ledoux, One-dimensional empirical measures, order statistics, and Kantorovich Transport Distances, American Mathematical Society, 2019.
  • [9] C. Bonet, P. Berg, N. Courty, F. Septier, L. Drumetz, and M. T. Pham, Spherical sliced-Wasserstein, in The Eleventh International Conference on Learning Representations, 2023.
  • [10] C. Bonet, N. Courty, F. Septier, and L. Drumetz, Efficient gradient flows in sliced-Wasserstein space, Transactions on Machine Learning Research, (2022).
  • [11] N. Bonneel, J. Rabin, G. Peyré, and H. Pfister, Sliced and Radon Wasserstein barycenters of measures, Journal of Mathematical Imaging and Vision, 51 (2014), pp. 22–45.
  • [12] N. Bonnotte, Unidimensional and Evolution Methods for Optimal Transportation, PhD thesis, Université Paris-Sud, 2013.
  • [13] C. Borell, Convex measures on locally convex spaces, Arkiv för Matematik, 12 (1974), pp. 239–252.
  • [14] C. Borell, Convex set functions in d-space, Periodica Mathematica Hungarica, 6 (1975), pp. 111–136.
  • [15] F. Bruhat, Distributions sur un groupe localement compact et applications à l’étude des représentations des groupes $p$-adiques, Bulletin de la Société Mathématique de France, 89 (1961), pp. 43–75.
  • [16] D. Burago, Y. Burago, and S. Ivanov, A course in metric geometry, vol. 33 of Graduate Studies in Mathematics, American Mathematical Society, 2001.
  • [17] G. Cozzi and F. Santambrogio, Long-time asymptotics of the sliced-Wasserstein flow, arXiv preprint arXiv:2405.06313, (2024).
  • [18] B. Dai and U. Seljak, Sliced iterative normalizing flows, arXiv preprint arXiv:2007.00674, (2020).
  • [19] C. Dellacherie and P.-A. Meyer, Probabilities and potential, vol. 29 of North-Holland Mathematics Studies, North-Holland Publishing Co., Amsterdam-New York, 1978.
  • [20] I. Deshpande, Y.-T. Hu, R. Sun, A. Pyrros, N. Siddiqui, S. Koyejo, Z. Zhao, D. Forsyth, and A. G. Schwing, Max-sliced Wasserstein distance and its use for GANs, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 10648–10656.
  • [21] I. Deshpande, Z. Zhang, and A. G. Schwing, Generative modeling using the sliced Wasserstein distance, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [22] L. Devroye and G. Lugosi, Combinatorial methods in density estimation, Springer, 2001.
  • [23] A. Figalli, The optimal partial transport problem, Archive for Rational Mechanics and Analysis, 195 (2009), pp. 533–560.
  • [24] I. M. Gelfand, M. I. Graev, N. Y. Vilenkin, and E. J. Saletan, Generalized functions. Vol. 5: Integral Geometry and Representation Theory, Academic Press, 1966.
  • [25] S. Helgason, Integral geometry and Radon transforms, Springer, 1 ed., 2010.
  • [26] J. Kitagawa and A. Takatsu, Sliced optimal transport: is it a suitable replacement?, arXiv preprint arXiv:2311.15874, (2023).
  • [27] J. Kitagawa and A. Takatsu, Disintegrated optimal transport for metric fiber bundles, arXiv preprint arXiv:2407.01879, (2024).
  • [28] S. Kolouri, K. Nadjahi, U. Simsekli, R. Badeau, and G. Rohde, Generalized sliced Wasserstein distances, in Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, eds., vol. 32, Curran Associates, Inc., 2019.
  • [29] S. Kolouri, S. R. Park, and G. K. Rohde, The Radon cumulative distribution transform and its application to image classification, IEEE Transactions on Image Processing, 25 (2016), pp. 920–934.
  • [30] S. Kolouri, P. E. Pope, C. E. Martin, and G. K. Rohde, Sliced Wasserstein auto-encoders, in International Conference on Learning Representations, 2019.
  • [31] C.-Y. Lee, T. Batra, M. H. Baig, and D. Ulbricht, Sliced Wasserstein discrepancy for unsupervised domain adaptation, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
  • [32] G. Leoni, A first course in fractional Sobolev spaces, vol. 229 of Graduate Studies in Mathematics, American Mathematical Society, Providence, RI, 2023.
  • [33] S. Li and C. Moosmüller, Measure transfer via stochastic slicing and matching, arXiv preprint arXiv:2307.05705, (2023).
  • [34] T. Lin, Z. Zheng, E. Chen, M. Cuturi, and M. Jordan, On projection robust optimal transport: Sample complexity and model misspecification, in Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, A. Banerjee and K. Fukumizu, eds., vol. 130 of Proceedings of Machine Learning Research, PMLR, 13–15 Apr 2021, pp. 262–270.
  • [35] A. Liutkus, U. Simsekli, S. Majewski, A. Durmus, and F.-R. Stöter, Sliced-Wasserstein flows: Nonparametric generative modeling via optimal transport and diffusions, in Proceedings of the 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov, eds., vol. 97 of Proceedings of Machine Learning Research, PMLR, 09–15 Jun 2019, pp. 4104–4113.
  • [36] G. Loeper, Uniqueness of the solution to the Vlasov–Poisson system with bounded density, Journal de Mathématiques Pures et Appliquées, 86 (2006), pp. 68–79.
  • [37] T. Manole, S. Balakrishnan, and L. Wasserman, Minimax confidence intervals for the sliced Wasserstein distance, Electronic Journal of Statistics, 16 (2022), pp. 2252–2345.
  • [38] R. J. McCann, A convexity theory for interacting gases and equilibrium crystals, PhD thesis, Princeton University, 1994.
  • [39] M. Muratori and G. Savaré, Gradient flows and evolution variational inequalities in metric spaces. I: Structural properties, Journal of Functional Analysis, 278 (2020), p. 108347.
  • [40] K. Nadjahi, Sliced-Wasserstein distance for large-scale machine learning: theory, methodology and extensions, PhD thesis, Institut Polytechnique de Paris, Nov. 2021.
  • [41] K. Nadjahi, A. Durmus, L. Chizat, S. Kolouri, S. Shahrampour, and U. Simsekli, Statistical and topological properties of sliced probability divergences, in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, eds., vol. 33, Curran Associates, Inc., 2020, pp. 20802–20812.
  • [42] K. Nguyen and N. Ho, Revisiting sliced Wasserstein on images: From vectorization to convolution, in Advances in Neural Information Processing Systems, A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, eds., 2022.
  • [43] K. Nguyen, N. Ho, T. Pham, and H. Bui, Distributional sliced-Wasserstein and applications to generative modeling, in International Conference on Learning Representations, 2021.
  • [44] S. Nietert, Z. Goldfeld, R. Sadhu, and K. Kato, Statistical, robustness, and computational guarantees for sliced Wasserstein distances, in Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, eds., vol. 35, Curran Associates, Inc., 2022, pp. 28179–28193.
  • [45] J. Niles-Weed and P. Rigollet, Estimation of Wasserstein distances in the Spiked Transport Model, Bernoulli, 28 (2022), pp. 2663–2688.
  • [46] J. L. M. Olea, C. Rush, A. Velez, and J. Wiesel, The out-of-sample prediction error of the square-root-LASSO and related estimators, arXiv preprint arXiv:2211.07608, (2023).
  • [47] M. Osborne, On the Schwartz-Bruhat space and the Paley-Wiener theorem for locally compact abelian groups, Journal of Functional Analysis, 19 (1975), pp. 40–49.
  • [48] F.-P. Paty and M. Cuturi, Subspace robust Wasserstein distances, in Proceedings of the 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov, eds., vol. 97 of Proceedings of Machine Learning Research, PMLR, 09–15 Jun 2019, pp. 5072–5081.
  • [49] R. Peyre, Comparison between $W_{2}$ distance and $\dot{H}^{-1}$ norm, and localization of Wasserstein distance, ESAIM: COCV, 24 (2018), pp. 1489–1501.
  • [50] F. Pitié, A. C. Kokaram, and R. Dahyot, Automated colour grading using colour distribution transfer, Computer Vision and Image Understanding, 107 (2007), pp. 123–137. Special issue on color image processing.
  • [51] J. Rabin, G. Peyré, J. Delon, and M. Bernot, Wasserstein barycenter and its application to texture mixing, in Scale Space and Variational Methods in Computer Vision, A. M. Bruckstein, B. M. ter Haar Romeny, A. M. Bronstein, and M. M. Bronstein, eds., Berlin, Heidelberg, 2012, Springer Berlin Heidelberg, pp. 435–446.
  • [52] A. G. Ramm, Radon transform on distributions, Proceedings of the Japan Academy, Series A, Mathematical Sciences, 71 (1995), pp. 202–206.
  • [53] R. Rossi, G. Savaré, A. Segatti, and U. Stefanelli, Weighted energy-dissipation principle for gradient flows in metric spaces, Journal de Mathématiques Pures et Appliquées, 127 (2019), pp. 1–66.
  • [54] F. Santambrogio, Optimal transport for applied mathematicians: calculus of variations, PDEs, and modeling, Birkhäuser, 2015.
  • [55] S. Sarkar and A. K. Kuchibhotla, Post-selection inference for conformal prediction: Trading off coverage for precision, arXiv preprint arXiv:2304.06158, (2023).
  • [56] N. Sauer, On the density of families of sets, Journal of Combinatorial Theory, Series A, 13 (1972), pp. 145–147.
  • [57] V. Sharafutdinov, Radon transform on Sobolev spaces, Siberian Mathematical Journal, 50 (2021), pp. 560–580.
  • [58] S. Shelah, A combinatorial problem; stability and order for models and theories in infinitary languages, Pacific Journal of Mathematics, 41 (1972), pp. 247–261.
  • [59] K. T. Smith, D. C. Solmon, and S. L. Wagner, Practical and mathematical aspects of the problem of reconstructing objects from radiographs, Bull. Amer. Math. Soc., 83 (1977), pp. 1227–1270.
  • [60] D. C. Solmon, Asymptotic formulas for the dual Radon transform and applications, Mathematische Zeitschrift, 195 (1987), pp. 1432–1823.
  • [61] B. Sriperumbudur, On the optimal estimation of probability measures in weak and strong topologies, Bernoulli, 22 (2016), pp. 1839–1893.
  • [62] H. Triebel, Theory of function spaces, Modern Birkhäuser Classics, Birkhäuser/Springer Basel AG, Basel, 2010. Reprint of 1983 edition.
  • [63] V. N. Vapnik, The nature of statistical learning theory, Springer, 2 ed., 2013.
  • [64] V. N. Vapnik and A. Y. Chervonenkis, Ordered risk minimization. I, Automat. Remote Control, 35 (1974), pp. 1226–1235.
  • [65] C. Villani, Optimal transport, old and new, vol. 338 of Grundlehren der mathematischen Wissenschaften, Springer-Verlag, Berlin, 2009.

Appendix A Preliminaries on the Radon transform

In this appendix we record some basic properties of the Radon transform in further detail.

Remark A.1 (Radon transform of measures and distributions).

The duality formula (2.2) is used to extend the Radon transform to distributions. For general distributions there are ambiguities, as $R^{\ast}\mathfrak{g}$ does not necessarily decay rapidly at infinity, even for $\mathfrak{g}\in C_{c}^{\infty}(\mathbb{P}_{d})$; see [25, Chapter 1.5] and [52]. However, for bounded measures, the pushforward by the projection map $\tilde{\pi}^{\theta}(x)=x\cdot\theta$ is consistent with the duality formula. To see this, let $\mathfrak{g}\in C_{0}(\mathbb{P}_{d})$, i.e. $\mathfrak{g}$ is continuous in $(\theta,r)$ and vanishes as $|r|\rightarrow\infty$. Then it can be verified that $R^{\ast}\mathfrak{g}\in C_{0}(\mathbb{R}^{d})$. Thus, by Fubini's theorem,

$$\begin{aligned}\langle\mu,R^{\ast}\mathfrak{g}\rangle_{\mathbb{R}^{d}}&=\int_{\mathbb{R}^{d}}\fint_{\mathbb{S}^{d-1}}\mathfrak{g}(\theta,x\cdot\theta)\,d\theta\,d\mu(x)\\&=\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}^{d}}\mathfrak{g}(\theta,x\cdot\theta)\,d\mu(x)\,d\theta=\fint_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}\mathfrak{g}(\theta,r)\,d\tilde{\pi}^{\theta}_{\#}\mu(r)\,d\theta=\langle\tilde{\pi}^{\theta}_{\#}\mu,\mathfrak{g}\rangle_{\mathbb{P}_{d}}.\end{aligned}$$

In the second-to-last equality we used the change of variables formula and the fact that $x\cdot\theta=r$ for all $x\in(\tilde{\pi}^{\theta})^{-1}(r)=r\theta+\theta^{\perp}$. As $\mathcal{M}_{b}(\mathbb{R}^{d})$ equipped with the total variation norm is the dual of $C_{0}(\mathbb{R}^{d})$, the Radon transform can be unambiguously extended to $\mu\in\mathcal{M}_{b}(\mathbb{R}^{d})$.

We also note that for distributions in $H_{t}^{s}(\mathbb{R}^{d})$ the Radon transform (or its extension) can be defined unambiguously and the duality formula can be verified; see [57] and the discussion preceding (2.13). ∎

Another important property of the Radon transform is its relationship to the Fourier transform.

Proposition A.2 (The Fourier slicing property).

For $k=1,d$, let $\mathcal{F}_{k}$ denote the $k$-dimensional Fourier transform from $\mathcal{S}(\mathbb{R}^{k})$ to itself. Then for each $f\in\mathcal{S}(\mathbb{R}^{d})$

$$(2\pi)^{\frac{d-1}{2}}\mathcal{F}_{d}f(\theta\zeta)=\mathcal{F}_{1}R_{\theta}f(\zeta)\quad\text{for all }\theta\in\mathbb{S}^{d-1}\text{ and }\zeta\in\mathbb{R}. \tag{A.1}$$

Moreover, (A.1) holds a.e. for $f\in L^{1}(\mathbb{R}^{d})$.

Proof.

By definition,

$$\begin{aligned}(2\pi)^{\frac{d}{2}}\mathcal{F}_{d}f(\theta\zeta)=\int_{\mathbb{R}^{d}}e^{-i\zeta\theta\cdot x}f(x)\,dx&=\int_{\mathbb{R}}e^{-i\zeta r}\int_{x\cdot\theta=r}f(x)\,dx\,dr\\&=\int_{\mathbb{R}}e^{-i\zeta r}R_{\theta}f(r)\,dr=\sqrt{2\pi}\,\mathcal{F}_{1}R_{\theta}f(\zeta).\end{aligned}$$

Note that all the equalities above are justified for a.e. $\theta\in\mathbb{S}^{d-1}$, $\zeta\in\mathbb{R}$ when $f\in L^{1}(\mathbb{R}^{d})$. ∎
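As a quick numerical sanity check of (A.1), the following Python sketch compares both sides for $d=2$. It assumes NumPy and uses the unitary Fourier transform $\mathcal{F}_{k}f(\xi)=(2\pi)^{-k/2}\int e^{-i\xi\cdot x}f(x)\,dx$, which is the normalization consistent with the proof above. The test function, grid sizes and helper names (`radon_slice`, `slice_gap`, ...) are our own illustrative choices and do not come from the text.

```python
import numpy as np

# Hypothetical test function on R^2 (d = 2): an anisotropic, off-centre Gaussian,
# so the check is not trivially satisfied by radial symmetry.
def f(x, y):
    return np.exp(-(x - 0.3) ** 2 - 2.0 * (y + 0.2) ** 2)

HALF_WIDTH, N = 10.0, 801                 # truncate R to [-10, 10]; f is negligible outside
GRID = np.linspace(-HALF_WIDTH, HALF_WIDTH, N)
H = GRID[1] - GRID[0]                     # uniform step used in all Riemann sums

def radon_slice(theta):
    """R_theta f on GRID: integrate f along the lines r*theta + s*theta_perp, s in GRID."""
    c, sn = np.cos(theta), np.sin(theta)
    r, s = GRID[:, None], GRID[None, :]
    return f(r * c - s * sn, r * sn + s * c).sum(axis=1) * H

def fourier_1d(values, zeta):
    """Unitary 1-D transform (2 pi)^(-1/2) * int e^{-i zeta r} g(r) dr of samples on GRID."""
    return (np.exp(-1j * zeta * GRID) * values).sum() * H / np.sqrt(2 * np.pi)

def fourier_2d(xi):
    """Unitary 2-D transform (2 pi)^(-1) * int e^{-i xi . x} f(x) dx."""
    X, Y = np.meshgrid(GRID, GRID, indexing="ij")
    return (np.exp(-1j * (xi[0] * X + xi[1] * Y)) * f(X, Y)).sum() * H * H / (2 * np.pi)

def slice_gap(theta, zeta, d=2):
    """|(2 pi)^((d-1)/2) F_d f(theta zeta) - F_1 R_theta f(zeta)|, i.e. the defect in (A.1)."""
    direction = np.array([np.cos(theta), np.sin(theta)])
    lhs = (2 * np.pi) ** ((d - 1) / 2) * fourier_2d(zeta * direction)
    rhs = fourier_1d(radon_slice(theta), zeta)
    return abs(lhs - rhs)

for theta, zeta in [(0.0, 0.5), (0.7, 1.3), (2.1, -2.0)]:
    print(f"theta={theta}, zeta={zeta}: gap={slice_gap(theta, zeta):.2e}")
```

Plain Riemann sums suffice here because the Gaussian integrands are smooth and negligible at the edge of the truncated domain. A slowly decaying $f$ would need a larger domain or a finer quadrature.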

From (A.1) it follows that, for $f,g\in\mathcal{S}(\mathbb{R}^{d})$,

$$R(f\ast g)(\theta,r)=(R_{\theta}f\ast R_{\theta}g)(r), \tag{A.2}$$

where $\ast$ on the left-hand side denotes the $d$-dimensional convolution and $\ast$ on the right-hand side denotes the one-dimensional convolution. Indeed, as $\mathcal{F}_{d}(f\ast g)=(2\pi)^{d/2}\mathcal{F}_{d}f\,\mathcal{F}_{d}g$ and $\mathcal{F}_{1}(R_{\theta}f\ast R_{\theta}g)=(2\pi)^{1/2}\mathcal{F}_{1}R_{\theta}f\,\mathcal{F}_{1}R_{\theta}g$, we have

$$\begin{aligned}(\mathcal{F}_{1}R_{\theta}(f\ast g))(\zeta)&=(2\pi)^{\frac{d-1}{2}}(\mathcal{F}_{d}(f\ast g))(\theta\zeta)=(2\pi)^{d-\frac{1}{2}}\mathcal{F}_{d}f(\theta\zeta)\,\mathcal{F}_{d}g(\theta\zeta)\\&=(2\pi)^{\frac{1}{2}}\mathcal{F}_{1}R_{\theta}f(\zeta)\,\mathcal{F}_{1}R_{\theta}g(\zeta)=\mathcal{F}_{1}(R_{\theta}f\ast R_{\theta}g)(\zeta).\end{aligned}$$

Moreover, the same computation is justified when $f\in\mathcal{S}(\mathbb{R}^{d})$ and $g\in\mathcal{S}^{\prime}(\mathbb{R}^{d})$ is such that $Rg\in\mathcal{S}^{\prime}(\mathbb{P}_{d})$ is well defined. In particular, (A.2) holds for $f\in\mathcal{S}(\mathbb{R}^{d})$ and $g\in H_{t}^{s}(\mathbb{R}^{d})$ with $t\in(-\frac{d}{2},\frac{d}{2})$, which includes the case $g\in\mathcal{M}_{b}(\mathbb{R}^{d})$.
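Identity (A.2) can likewise be illustrated numerically. In the Python sketch below, which assumes NumPy, $f$ and $g$ are Gaussian densities on $\mathbb{R}^{2}$. Their convolution $f\ast g$ is then the Gaussian density whose mean and covariance are the sums of those of $f$ and $g$, a standard fact. The sketch compares $R(f\ast g)(\theta,\cdot)$ with the one-dimensional convolution $R_{\theta}f\ast R_{\theta}g$ on a grid. The means, covariances and helper names are hypothetical choices made for the illustration.

```python
import numpy as np

HALF_WIDTH, N = 12.0, 1201
GRID = np.linspace(-HALF_WIDTH, HALF_WIDTH, N)   # symmetric grid with odd N keeps mode="same" centred
H = GRID[1] - GRID[0]

def gaussian_2d(mean, cov):
    """Density of N(mean, cov) on R^2, returned as a vectorised function of (x, y)."""
    inv, det = np.linalg.inv(cov), np.linalg.det(cov)
    def density(x, y):
        dx, dy = x - mean[0], y - mean[1]
        quad = inv[0, 0] * dx**2 + 2 * inv[0, 1] * dx * dy + inv[1, 1] * dy**2
        return np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(det))
    return density

def radon_slice(func, theta):
    """R func(theta, r) for r in GRID, via Riemann sums along r*theta + s*theta_perp."""
    c, sn = np.cos(theta), np.sin(theta)
    r, s = GRID[:, None], GRID[None, :]
    return func(r * c - s * sn, r * sn + s * c).sum(axis=1) * H

# Hypothetical data: two Gaussians; their 2-D convolution is N(m1 + m2, S1 + S2).
m1, S1 = np.array([0.5, -0.3]), np.array([[1.0, 0.4], [0.4, 0.6]])
m2, S2 = np.array([-0.2, 0.8]), np.array([[0.3, -0.1], [-0.1, 0.9]])
f, g = gaussian_2d(m1, S1), gaussian_2d(m2, S2)
f_conv_g = gaussian_2d(m1 + m2, S1 + S2)

def convolution_gap(theta):
    """Max over the grid of |R(f*g)(theta, .) - (R_theta f * R_theta g)(.)|."""
    lhs = radon_slice(f_conv_g, theta)
    rhs = np.convolve(radon_slice(f, theta), radon_slice(g, theta), mode="same") * H
    return np.max(np.abs(lhs - rhs))

for theta in (0.0, 1.0, 2.5):
    print(f"theta={theta}: max gap={convolution_gap(theta):.2e}")
```

Probabilistically, this is the statement that if $X\sim f$ and $Y\sim g$ are independent, then $\theta\cdot(X+Y)=\theta\cdot X+\theta\cdot Y$, so the projection of the law of $X+Y$ is the convolution of the projected laws.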

Next we record the smoothing effect of $R^{\ast}R$.

Proposition A.3 (Regularizing property of the Radon transform).

Let us denote by $A_{k}$ the surface area of the $(k-1)$-dimensional sphere. Then

$$R^{\ast}Rf(x)=\frac{A_{d-1}}{A_{d}}\int_{\mathbb{R}^{d}}\frac{f(y)}{|y-x|}\,dy. \tag{A.3}$$
Proof.

By using polar coordinates and Fubini’s Theorem, we see

$$\begin{aligned}(R^{\ast}Rf)(x)&=\fint_{\mathbb{S}^{d-1}}\widehat{f}(\theta,x\cdot\theta)\,d\theta=\fint_{\mathbb{S}^{d-1}}\int_{y\in\theta^{\perp}}f(x+y)\,dy\,d\theta\\&=\fint_{\mathbb{S}^{d-1}}\int_{\{\omega\in\mathbb{S}^{d-1}:\,\omega\perp\theta\}}\int_{0}^{\infty}f(x+r\omega)\,r^{d-2}\,dr\,d\omega\,d\theta\\&=\frac{1}{A_{d}}\int_{0}^{\infty}\int_{\mathbb{S}^{d-1}}\int_{\{\theta\in\mathbb{S}^{d-1}:\,\omega\perp\theta\}}f(x+r\omega)\,r^{d-2}\,d\theta\,d\omega\,dr\\&=\frac{A_{d-1}}{A_{d}}\int_{0}^{\infty}\int_{\mathbb{S}^{d-1}}r^{d-2}f(x+r\omega)\,d\omega\,dr=\frac{A_{d-1}}{A_{d}}\int_{\mathbb{R}^{d}}\frac{f(y)}{|y-x|}\,dy.\end{aligned}$$
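For $d=2$ we have $A_{1}=|\mathbb{S}^{0}|=2$ and $A_{2}=|\mathbb{S}^{1}|=2\pi$, so (A.3) reads $R^{\ast}Rf(x)=\frac{1}{\pi}\int_{\mathbb{R}^{2}}\frac{f(y)}{|y-x|}\,dy$. The Python sketch below, which assumes NumPy and uses an illustrative test function and helper names, checks this numerically. On the left it averages line integrals over directions. On the right it uses polar coordinates centred at $x$, in which $1/|y-x|$ cancels the Jacobian, as in the last step of the proof.

```python
import numpy as np

# Hypothetical anisotropic Gaussian test function on R^2 (d = 2).
def f(x, y):
    return np.exp(-(x - 0.3) ** 2 - 2.0 * (y + 0.2) ** 2)

S = np.linspace(-12.0, 12.0, 2401)                          # arclength grid along each line
HS = S[1] - S[0]
ANGLES = np.linspace(0.0, 2 * np.pi, 720, endpoint=False)   # uniform, so a plain mean averages over S^1

def back_projected_radon(x):
    """Left side of (A.3): average over theta of R f(theta, x . theta)."""
    c, sn = np.cos(ANGLES)[:, None], np.sin(ANGLES)[:, None]
    p = x[0] * c + x[1] * sn                                # x . theta for every direction
    line_integrals = f(p * c - S * sn, p * sn + S * c).sum(axis=1) * HS
    return line_integrals.mean()

def riesz_side(x, rho_max=12.0, n_rho=2400):
    """Right side of (A.3) with A_1 / A_2 = 2 / (2 pi).

    In polar coordinates y = x + rho * omega the factor 1/|y - x| cancels the
    Jacobian rho, leaving int_0^inf int_{S^1} f(x + rho * omega) d omega d rho."""
    h = rho_max / n_rho
    rho = (np.arange(n_rho) + 0.5) * h                      # midpoint rule in rho
    c, sn = np.cos(ANGLES)[None, :], np.sin(ANGLES)[None, :]
    vals = f(x[0] + rho[:, None] * c, x[1] + rho[:, None] * sn)
    angular = vals.sum(axis=1) * (2 * np.pi / ANGLES.size)  # integral over S^1 for each rho
    return (2.0 / (2.0 * np.pi)) * angular.sum() * h

for x in [(0.0, 0.0), (0.5, -0.4), (1.5, 1.0)]:
    print(x, back_projected_radon(x), riesz_side(x))
```

The two columns agree to many digits, which is consistent with the constant $A_{d-1}/A_{d}$ multiplying a full (not averaged) integral over $\mathbb{S}^{d-1}$ in the last line of the proof.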

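This confirms the intuition that $R^{\ast}Rf$ should be more regular than $f$. The kernel $|y-x|^{-1}=|y-x|^{-d+(d-1)}$ in (A.3) is much less singular than $|y-x|^{-d+\varepsilon}$ for small $\varepsilon$, and the latter, suitably normalized, recovers $f$ itself: for $f\in\mathcal{S}(\mathbb{R}^{d})$,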

$$\lim_{\varepsilon\searrow 0}\,\varepsilon\int_{\mathbb{R}^{d}}|x-y|^{-d+\varepsilon}f(y)\,dy=C(d)f(x)$$

for some dimension-dependent constant $C(d)$. To examine the regularizing property in more detail, we first introduce the Riesz potentials; see [25, Chapter VII] for further details.

Definition A.4 (Riesz potential).

For $\gamma\in\mathbb{R}$ and $f\in\mathcal{S}(\mathbb{R}^{d})$ we define its Riesz potential $I_{d}^{\gamma}f\in\mathcal{S}(\mathbb{R}^{d})$ by

$$(I_{d}^{\gamma}f)(x)=\frac{1}{H_{d}(\gamma)}\int_{\mathbb{R}^{d}}\frac{f(y)}{|y-x|^{d-\gamma}}\,dy,\qquad\text{where }H_{d}(\gamma)=2^{\gamma}\pi^{d/2}\frac{\Gamma(\frac{\gamma}{2})}{\Gamma(\frac{d-\gamma}{2})}. \tag{A.4}$$
Proposition A.5 (Properties of the Riesz potential).

Let $f\in\mathcal{S}(\mathbb{R}^{d})$. Then

  • (i)

    (Lemma 6.4 of [25]) $\gamma\mapsto(I_{d}^{\gamma}f)(x)$ extends to a holomorphic function on the set $\mathbb{C}_{d}=\{\gamma\in\mathbb{C}:\,\gamma-d\notin 2\mathbb{Z}^{+}\}$. Also

    $$I_{d}^{0}f=\lim_{\gamma\rightarrow 0}I_{d}^{\gamma}f=f\quad\text{and}\quad I_{d}^{\gamma}\Delta f=\Delta I_{d}^{\gamma}f=-I_{d}^{\gamma-2}f. \tag{A.5}$$

    Thus, we will understand fractional powers of $-\Delta$ and $-\Box$ as

    $$(-\Delta)^{s}=I_{d}^{-2s},\qquad(-\Box)^{s}=I_{1}^{-2s}. \tag{A.6}$$
  • (ii)

    (Proposition 6.5 of [25]) We have the following identity.

    $$I_{d}^{\alpha}(I_{d}^{\beta}f)=I_{d}^{\alpha+\beta}f\quad\text{for }f\in\mathcal{S}(\mathbb{R}^{d})\text{ whenever }\operatorname{Re}\alpha,\operatorname{Re}\beta>0\text{ and }\operatorname{Re}(\alpha+\beta)<d. \tag{A.7}$$
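A worked example may help fix the normalization in (A.4). The computation below is our own illustration for $d=2$ and the Gaussian $f(y)=e^{-|y|^{2}/2}$, evaluated at $x=0$. It uses only polar coordinates and the identity $\int_{0}^{\infty}\rho^{a-1}e^{-\rho^{2}/2}\,d\rho=2^{a/2-1}\Gamma(a/2)$ for $a>0$, together with $H_{2}(\gamma)=2^{\gamma}\pi\,\Gamma(\frac{\gamma}{2})/\Gamma(1-\frac{\gamma}{2})$.

```latex
\begin{aligned}
\int_{\mathbb{R}^{2}}\frac{e^{-|y|^{2}/2}}{|y|^{2-\gamma}}\,dy
 &= 2\pi\int_{0}^{\infty}\rho^{\gamma-1}e^{-\rho^{2}/2}\,d\rho
  = 2\pi\,2^{\frac{\gamma}{2}-1}\,\Gamma\big(\tfrac{\gamma}{2}\big),
  \qquad 0<\gamma<2,\\
(I_{2}^{\gamma}f)(0)
 &= \frac{2\pi\,2^{\frac{\gamma}{2}-1}\,\Gamma(\frac{\gamma}{2})}
         {2^{\gamma}\pi\,\Gamma(\frac{\gamma}{2})/\Gamma(1-\frac{\gamma}{2})}
  = 2^{-\frac{\gamma}{2}}\,\Gamma\big(1-\tfrac{\gamma}{2}\big).
\end{aligned}
```

The right-hand side is holomorphic in $\gamma$ away from the poles of $\Gamma(1-\frac{\gamma}{2})$, so by uniqueness it gives the continuation in (i). Letting $\gamma\to 0$ yields $1=f(0)$, as in (A.5). Evaluating at $\gamma=-2$ yields $2=(-\Delta f)(0)$, since $-\Delta f(y)=(2-|y|^{2})e^{-|y|^{2}/2}$, which matches $(-\Delta)^{1}=I_{2}^{-2}$ in (A.6).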

We often suppress the dimension in the notation and simply write $I^{\gamma}$ when it is clear from context.

We record some properties of the Radon transform; see [25, Chapter 1] for proofs. The intertwining property between the Laplacian and the Radon transform follows from direct calculations.

Proposition A.6 (Intertwining property).

For $f\in\mathcal{S}(\mathbb{R}^{d})$ and $\mathfrak{g}\in\mathcal{S}(\mathbb{P}_{d})$, we have

$$R(-\Delta)f=(-\partial_{r}^{2})Rf,\qquad R^{\ast}(-\partial_{r}^{2})\mathfrak{g}=-\Delta R^{\ast}\mathfrak{g}.\tag{A.8}$$
Proof.

To show the first identity, write $\tau_{h}f(x)=f(x-h)$ and notice that

$$R(\tau_{-h}f)(\theta,r)=\int_{y\cdot\theta=r}\tau_{-h}f(y)\,dy=\int_{y\cdot\theta=r}f(y+h)\,dy=\int_{y\cdot\theta=r+h\cdot\theta}f(y)\,dy=Rf(\theta,r+h\cdot\theta).$$

Differentiating in $h_{i}$ at $h=0$ gives

$$R(\partial_{i}f)(\theta,r)=\theta_{i}\,\partial_{r}Rf(\theta,r).$$

Applying this identity twice and summing over $i$, together with $\sum_{i=1}^{d}\theta_{i}^{2}=1$, gives the first equality in (A.8).

The second identity follows directly from the definition: writing $\fint_{\mathbb{S}^{d-1}}$ for the average over the sphere,

$$\partial_{i}R^{\ast}\mathfrak{g}(x)=\partial_{i}\fint_{\mathbb{S}^{d-1}}\mathfrak{g}(\theta,x\cdot\theta)\,d\theta=\fint_{\mathbb{S}^{d-1}}\theta_{i}\,\partial_{r}\mathfrak{g}(\theta,x\cdot\theta)\,d\theta=R^{\ast}(\theta_{i}\partial_{r}\mathfrak{g})(x),$$

and applying this twice and summing over $i$ yields $\Delta R^{\ast}\mathfrak{g}=R^{\ast}\partial_{r}^{2}\mathfrak{g}$.
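The first identity in (A.8) can also be checked numerically. The minimal sketch below assumes $d=2$, parametrizes $\theta=(\cos\phi,\sin\phi)$, takes a hypothetical anisotropic Gaussian $f$ with an explicit Laplacian, computes $Rf(\theta,r)=\int f(r\theta+s\theta^{\perp})\,ds$ by the trapezoidal rule, and compares $R(\Delta f)$ with a centered finite-difference approximation of $\partial_r^2Rf$. The widths `A`, `B`, the directions and the step sizes are illustrative choices.

```python
import numpy as np

A, B = 1.0, 0.5  # hypothetical widths of the anisotropic Gaussian f


def trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def f(x1, x2):
    return np.exp(-(x1**2 / A**2 + x2**2 / B**2) / 2)


def lap_f(x1, x2):
    # Delta f by hand: d^2 f / dx1^2 = (x1^2 / A^4 - 1 / A^2) f, and similarly in x2 with B.
    return (x1**2 / A**4 - 1 / A**2 + x2**2 / B**4 - 1 / B**2) * f(x1, x2)


def radon_2d(g, phi, r, L=12.0, n=4001):
    """R g(theta, r): integral of g over the line {x . theta = r}, theta = (cos phi, sin phi)."""
    s = np.linspace(-L, L, n)
    c, sn = np.cos(phi), np.sin(phi)
    # points r * theta + s * theta_perp, with theta_perp = (-sin phi, cos phi)
    return trapezoid(g(r * c - s * sn, r * sn + s * c), s)


def intertwining_gap(phi, r, h=1e-3):
    """R(Delta f) minus a centered second difference of r -> R f(theta, r)."""
    lhs = radon_2d(lap_f, phi, r)
    rhs = (radon_2d(f, phi, r + h) - 2 * radon_2d(f, phi, r) + radon_2d(f, phi, r - h)) / h**2
    return lhs - rhs


if __name__ == "__main__":
    for phi, r in [(0.0, 0.3), (0.7, -0.5), (2.0, 1.1)]:
        print(phi, r, intertwining_gap(phi, r))
```

The remaining gap is dominated by the $O(h^2)$ finite-difference error.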

We now give a precise definition of the operator $\Lambda_{d}$ of order $d-1$ appearing in the inversion formulae for the Radon transform and its dual.

Definition A.7.

Let $\Lambda_{d}:\mathcal{S}(\mathbb{P}_{d})\to\mathcal{S}(\mathbb{P}_{d})$ be defined by

$$\Lambda_{d}=\begin{cases}(-i)^{d-1}\dfrac{\partial^{d-1}}{\partial r^{d-1}}&\text{when }d\text{ is odd},\\[4pt](-i)^{d-1}\mathcal{H}\dfrac{\partial^{d-1}}{\partial r^{d-1}}&\text{when }d\text{ is even},\end{cases}\tag{A.9}$$

where $\mathcal{H}:\mathcal{S}(\mathbb{S}^{d-1}\times\mathbb{R})\to\mathcal{S}(\mathbb{S}^{d-1}\times\mathbb{R})$ is the Hilbert transform in the scalar variable,

$$\mathcal{H}\mathfrak{g}(\theta,r)=\frac{i}{\pi}\,\mathrm{p.v.}\!\int_{\mathbb{R}}\frac{\mathfrak{g}(\theta,s)}{r-s}\,ds.\tag{A.10}$$
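With the normalization (A.10), and the Fourier kernel $e^{-ir\zeta}$, $\mathcal{H}$ acts as multiplication by $\operatorname{sgn}(\zeta)$. The minimal sketch below checks this for the Gaussian $\mathfrak{g}(r)=e^{-r^2/2}$, whose unitary Fourier transform is $e^{-\zeta^2/2}$. It compares the principal-value integral, written in the symmetric form $\int_0^\infty(\mathfrak{g}(r-u)-\mathfrak{g}(r+u))\,u^{-1}\,du$, with the prediction $\frac{1}{\sqrt{2\pi}}\int\operatorname{sgn}(\zeta)e^{-\zeta^2/2}e^{ir\zeta}\,d\zeta=i\sqrt{2/\pi}\int_0^\infty e^{-\zeta^2/2}\sin(r\zeta)\,d\zeta$. The function names and grid sizes are illustrative.

```python
import numpy as np


def gauss(r):
    return np.exp(-(r**2) / 2)


def hilbert_pv(g, r, U=30.0, n=30000):
    """(A.10) at a point r: (i/pi) p.v. int g(s) / (r - s) ds
    = (i/pi) int_0^inf (g(r - u) - g(r + u)) / u du (symmetric form of the p.v.)."""
    du = U / n
    u = (np.arange(n) + 0.5) * du  # midpoint nodes never hit u = 0
    return 1j / np.pi * np.sum((g(r - u) - g(r + u)) / u) * du


def hilbert_via_sign_multiplier(r, Z=12.0, n=24001):
    """Inverse unitary Fourier transform of sgn(zeta) * exp(-zeta^2 / 2) evaluated at r."""
    z = np.linspace(0.0, Z, n)
    vals = np.exp(-(z**2) / 2) * np.sin(r * z)
    integral = float(np.sum((vals[1:] + vals[:-1]) * np.diff(z)) / 2.0)
    return 1j * np.sqrt(2 / np.pi) * integral


if __name__ == "__main__":
    for r in (-2.0, 0.7, 1.5):
        print(r, hilbert_pv(gauss, r), hilbert_via_sign_multiplier(r))
```

Combined with $\mathcal{F}_1(\partial_r\mathfrak{g})=i\zeta\,\mathcal{F}_1\mathfrak{g}$, this sign multiplier gives $(-i)^{d-1}\operatorname{sgn}(\zeta)(i\zeta)^{d-1}=|\zeta|^{d-1}$ for even $d$.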
Remark A.8.

Since $\mathcal{F}_{1}(\partial_{r}\mathfrak{g}_{\theta})(\zeta)=i\zeta\,\mathcal{F}_{1}\mathfrak{g}_{\theta}(\zeta)$ and $\mathcal{H}$ acts as the Fourier multiplier $\operatorname{sgn}(\zeta)$, one verifies that for each $\mathfrak{g}\in\mathcal{S}(\mathbb{P}_{d})$

$$(\mathcal{F}_{1}\Lambda_{d}\mathfrak{g}_{\theta})(\zeta)=|\zeta|^{d-1}(\mathcal{F}_{1}\mathfrak{g}_{\theta})(\zeta).$$

Consequently, for $\mathfrak{g}\in\mathcal{S}(\mathbb{P}_{d})$ we have

$$\begin{aligned}\|\Lambda_{d}\mathfrak{g}\|_{H_{t-(d-1)}^{s-(d-1)}(\mathbb{P}_{d})}^{2}&=\frac{1}{2(2\pi)^{d-1}}\int_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}|\zeta|^{2t-(2d-2)}(1+\zeta^{2})^{s-t}|\mathcal{F}_{1}\Lambda_{d}\mathfrak{g}_{\theta}(\zeta)|^{2}\,d\zeta\,d\theta\\&=\frac{1}{2(2\pi)^{d-1}}\int_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}|\zeta|^{2t}(1+\zeta^{2})^{s-t}|\mathcal{F}_{1}\mathfrak{g}_{\theta}(\zeta)|^{2}\,d\zeta\,d\theta=\|\mathfrak{g}\|_{H_{t}^{s}(\mathbb{P}_{d})}^{2}.\end{aligned}$$

Thus we may extend $\Lambda_{d}$ to a bijective linear isometry from $H_{t}^{s}(\mathbb{P}_{d})$ onto $H_{t-(d-1)}^{s-(d-1)}(\mathbb{P}_{d})$ when $t>d-3/2$ (this ensures $t-(d-1)>-1/2$, so that the target norm is defined). ∎
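Remark A.8 says that, in the scalar variable, $\Lambda_d$ is the Fourier multiplier $|\zeta|^{d-1}$. The minimal FFT-based sketch below works for a single fixed $\theta$, assuming the profile $r\mapsto\mathfrak{g}(\theta,r)$ decays fast enough to be treated as periodic on a large window. For $d=3$ it should reproduce $-\partial_r^2\mathfrak{g}$, and for $d=2$ applying $\Lambda_2$ twice should give $\Lambda_3$, since $|\zeta|\cdot|\zeta|=|\zeta|^2$. The function name `apply_lambda` and the grid are hypothetical.

```python
import numpy as np


def apply_lambda(g_vals, dr, d):
    """Apply Lambda_d to samples of r -> g(theta, r) on a uniform periodic grid,
    using the multiplier |zeta|^{d-1} from Remark A.8."""
    zeta = 2 * np.pi * np.fft.fftfreq(g_vals.size, d=dr)  # angular frequencies
    return np.real(np.fft.ifft(np.abs(zeta) ** (d - 1) * np.fft.fft(g_vals)))


if __name__ == "__main__":
    r = np.linspace(-20.0, 20.0, 2048, endpoint=False)
    dr = r[1] - r[0]
    g = np.exp(-(r**2) / 2)
    minus_g_second = (1 - r**2) * g  # -g'' for the Gaussian
    print(np.max(np.abs(apply_lambda(g, dr, 3) - minus_g_second)))
    print(np.max(np.abs(apply_lambda(apply_lambda(g, dr, 2), dr, 2) - apply_lambda(g, dr, 3))))
```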

For Schwartz functions, we have the following inversion formulae [60, Theorem 8.1].

Proposition A.9 (Inversion formula for the Radon and the dual transform).

For all $f\in\mathcal{S}(\mathbb{R}^{d})$,

$$c_{d}f=R^{\ast}\Lambda_{d}Rf,\tag{A.11}$$

where $c_{d}=(4\pi)^{(d-1)/2}\Gamma(d/2)/\Gamma(1/2)$. Similarly, for all $\mathfrak{g}\in\mathcal{S}(\mathbb{P}_{d})$,

$$c_{d}\mathfrak{g}=RR^{\ast}(\Lambda_{d}\mathfrak{g}).\tag{A.12}$$

Here, $\mathcal{S}(\mathbb{P}_{d})$ is defined as in (2.5). Furthermore, $R^{\ast}\Lambda_{d}\mathfrak{g}\in C^{\infty}$ with $(R^{\ast}\Lambda_{d}\mathfrak{g})(x)=O(|x|^{-d})$ as $|x|\to\infty$; hence $R^{\ast}\Lambda_{d}\mathfrak{g}\in L^{p}(\mathbb{R}^{d})$ for $p\in(1,+\infty]$.
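For a concrete instance of (A.11), take $d=3$, where $\Lambda_3=-\partial_r^2$ and $c_3=2\pi$, and the Gaussian $f(x)=e^{-|x|^2/2}$, for which $Rf(\theta,r)=2\pi e^{-r^2/2}$ (an integral over a plane). Assuming $R^{\ast}$ is the spherical average used in the proof of Proposition A.6, for $|x|=\rho$ one has $R^{\ast}h(x)=\frac12\int_{-1}^{1}h(\rho u)\,du$ in $d=3$, because $u=\theta\cdot x/|x|$ is uniformly distributed on $[-1,1]$ for $\theta$ uniform on $\mathbb{S}^2$. The minimal sketch below evaluates $c_3^{-1}R^{\ast}\Lambda_3Rf$ by quadrature and compares it with $f$. The helper names are hypothetical.

```python
import math
import numpy as np


def c_d(d):
    """Constant in (A.11): (4 pi)^{(d-1)/2} Gamma(d/2) / Gamma(1/2)."""
    return (4 * math.pi) ** ((d - 1) / 2) * math.gamma(d / 2) / math.gamma(0.5)


def lambda3_Rf(r):
    """Lambda_3 R f for f(x) = exp(-|x|^2 / 2) in R^3: R f = 2 pi exp(-r^2 / 2)
    and Lambda_3 = -d^2/dr^2 give 2 pi (1 - r^2) exp(-r^2 / 2)."""
    return 2 * math.pi * (1 - r**2) * np.exp(-(r**2) / 2)


def dual_radon_radial(h, rho, n=20001):
    """R^* h at any x with |x| = rho in d = 3, R^* being the average over S^2:
    (1/2) * integral_{-1}^{1} h(rho * u) du."""
    u = np.linspace(-1.0, 1.0, n)
    vals = h(rho * u)
    return 0.5 * float(np.sum((vals[1:] + vals[:-1]) * np.diff(u)) / 2.0)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 1.3, 2.0):
        print(rho, dual_radon_radial(lambda3_Rf, rho) / c_d(3), math.exp(-rho**2 / 2))
```

Here the identity is in fact exact: $\frac{d}{du}\big(ue^{-\rho^2u^2/2}\big)=(1-\rho^2u^2)e^{-\rho^2u^2/2}$, so $R^{\ast}\Lambda_3Rf(x)=2\pi e^{-\rho^2/2}=c_3f(x)$.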

Remark A.10 (Formal derivation of the inversion formula).

Let $\mathfrak{g}\in\mathcal{S}(\mathbb{P}_{d})$. Denote by $\mathcal{F}_{k}$, $k\in\{1,d\}$, the Fourier transform in $k$ dimensions. Recalling the Fourier slice identity $\mathcal{F}_{1}(R_{\theta}f)(\zeta)=(2\pi)^{\frac{d-1}{2}}\mathcal{F}_{d}f(\zeta\theta)$, define $f$ in the Fourier domain by

$$\mathcal{F}_{d}f(\zeta\theta):=(2\pi)^{\frac{1-d}{2}}\mathcal{F}_{1}\mathfrak{g}_{\theta}(\zeta).$$

Since $\mathfrak{g}$ is even, this is well defined (the pairs $(\theta,\zeta)$ and $(-\theta,-\zeta)$ give the same point $\zeta\theta$ and the same value), and

$$\mathcal{F}_{1}(R_{\theta}f)(\zeta)=\mathcal{F}_{1}\mathfrak{g}_{\theta}(\zeta).$$

By injectivity of $\mathcal{F}_{1}$, we have $Rf=\mathfrak{g}$. Thus the key to proving the inversion formulae is to justify the Fourier and inverse Fourier transforms, which comes down to the regularity of the functions involved.

Formally, we can find an expression for $f$ following the argument of Theorem 12.6 in [59]. For any test function $\varphi\in C_{c}^{\infty}(\mathbb{R}^{d})$,

$$\begin{aligned}\langle R\varphi,\Lambda_{d}\mathfrak{g}\rangle_{L^{2}(\mathbb{P}_{d})}&=C\int_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}\mathcal{F}_{1}(R_{\theta}\varphi)(\zeta)\,|\zeta|^{d-1}\,\overline{\mathcal{F}_{1}\mathfrak{g}_{\theta}(\zeta)}\,d\zeta\,d\theta&&\text{by Plancherel for }\mathcal{F}_{1}\text{ and Remark A.8}\\&=C\int_{\mathbb{S}^{d-1}}\int_{\mathbb{R}}(\mathcal{F}_{d}\varphi)(\zeta\theta)\,|\zeta|^{d-1}\,\overline{(\mathcal{F}_{d}f)(\zeta\theta)}\,d\zeta\,d\theta&&\text{as }\mathcal{F}_{1}(R_{\theta}\varphi)(\zeta)=(2\pi)^{\frac{d-1}{2}}\mathcal{F}_{d}\varphi(\zeta\theta)\\&=C\langle\mathcal{F}_{d}\varphi,\mathcal{F}_{d}f\rangle_{L^{2}(\mathbb{R}^{d})}=C\langle\varphi,f\rangle_{L^{2}(\mathbb{R}^{d})},\end{aligned}$$

where we have used polar coordinates and the Plancherel formula for $\mathcal{F}_{d}$, and where the constant $C$ depends only on $d$ and may change from line to line. Thus

$$\langle\varphi,R^{\ast}\Lambda_{d}\mathfrak{g}\rangle_{L^{2}(\mathbb{R}^{d})}=\langle R\varphi,\Lambda_{d}\mathfrak{g}\rangle_{L^{2}(\mathbb{P}_{d})}=C\langle\varphi,f\rangle_{L^{2}(\mathbb{R}^{d})},$$

and we may conclude that, for some constant $C$ depending only on the dimension $d$,

$$f=CR^{\ast}\Lambda_{d}\mathfrak{g}.$$

As $Rf=\mathfrak{g}$, this gives (A.11), and applying $R$ to both sides yields (A.12). ∎
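The Fourier slice identity $\mathcal{F}_1(R_\theta f)(\zeta)=(2\pi)^{\frac{d-1}{2}}\mathcal{F}_d f(\zeta\theta)$ used above can be checked numerically. The minimal sketch below assumes $d=2$, unitary transforms $\mathcal{F}_kf(\xi)=(2\pi)^{-k/2}\int f(x)e^{-ix\cdot\xi}\,dx$, and a hypothetical anisotropic Gaussian $f(x)=e^{-(x_1^2/A^2+x_2^2/B^2)/2}$, for which $\mathcal{F}_2f(\xi)=AB\,e^{-(A^2\xi_1^2+B^2\xi_2^2)/2}$. The left side is computed as a double integral over $(r,s)$ with $x=r\theta+s\theta^{\perp}$. The widths and grid are illustrative.

```python
import numpy as np

A, B = 1.0, 0.5  # hypothetical widths


def f(x1, x2):
    return np.exp(-(x1**2 / A**2 + x2**2 / B**2) / 2)


def slice_lhs(phi, zeta, L=10.0, n=801):
    """F_1(R_theta f)(zeta) = (2 pi)^{-1/2} int int f(r theta + s theta_perp) e^{-i r zeta} ds dr."""
    grid = np.linspace(-L, L, n)
    h = grid[1] - grid[0]
    r, s = np.meshgrid(grid, grid, indexing="ij")
    c, sn = np.cos(phi), np.sin(phi)
    vals = f(r * c - s * sn, r * sn + s * c) * np.exp(-1j * r * zeta)
    # Riemann sum; f is negligible on the boundary, so this matches the trapezoidal rule.
    return np.sum(vals) * h * h / np.sqrt(2 * np.pi)


def slice_rhs(phi, zeta):
    """(2 pi)^{1/2} F_2 f(zeta theta), using the closed form of F_2 f."""
    xi1, xi2 = zeta * np.cos(phi), zeta * np.sin(phi)
    return np.sqrt(2 * np.pi) * A * B * np.exp(-(A**2 * xi1**2 + B**2 * xi2**2) / 2)


if __name__ == "__main__":
    for phi, zeta in [(0.0, 0.0), (0.7, 1.3), (2.0, -2.0)]:
        print(phi, zeta, slice_lhs(phi, zeta), slice_rhs(phi, zeta))
```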

We end this section with a few results on Sobolev spaces with attenuated or amplified low frequencies. We first note that $H_{t}^{s}(\mathbb{R}^{d})$ embeds continuously in $\mathcal{S}^{\prime}(\mathbb{R}^{d})$ [57, Theorem 5.3].

Theorem A.11 ($H_{t}^{s}(\mathbb{R}^{d})$ and $\mathcal{S}^{\prime}(\mathbb{R}^{d})$).

Let $t\in(-d/2,d/2)$ and $s\in\mathbb{R}$. Then the identity map of $\mathcal{S}(\mathbb{R}^{d})$ extends to a continuous embedding $H_{t}^{s}(\mathbb{R}^{d})\subset\mathcal{S}^{\prime}(\mathbb{R}^{d})$. In other words, $H_{t}^{s}(\mathbb{R}^{d})$ consists of tempered distributions.

Sketch of Proof.

In general, for all $s,t\in\mathbb{R}$ one can show that

$$\left|\int_{\mathbb{R}^{d}}fg\right|\leq\|f\|_{H_{t}^{s}(\mathbb{R}^{d})}\|g\|_{H_{-t}^{-s}(\mathbb{R}^{d})}\quad\text{for all }f,g\in\mathcal{S}(\mathbb{R}^{d});\tag{A.13}$$

see [57, Theorem 5.3] for further details. Whenever $t\in(-d/2,d/2)$, each $g\in\mathcal{S}(\mathbb{R}^{d})$ satisfies $\|g\|_{H_{-t}^{-s}(\mathbb{R}^{d})}<\infty$, since the weight $|\xi|^{-2t}$ is integrable near the origin. Thus, for any $f\in H_{t}^{s}(\mathbb{R}^{d})$, we can unambiguously define its action on each $g\in\mathcal{S}(\mathbb{R}^{d})$ as the limit of the actions of an approximating sequence, i.e.

$$f(g):=\lim_{k\to\infty}\int_{\mathbb{R}^{d}}f_{k}(x)g(x)\,dx\quad\text{for any }(f_{k})_{k\geq 1}\text{ in }\mathcal{S}(\mathbb{R}^{d})\text{ such that }f_{k}\xrightarrow{H_{t}^{s}}f.$$

Furthermore, for each fixed $g\in\mathcal{S}(\mathbb{R}^{d})$ the estimate (A.13) persists for $f\in H_{t}^{s}(\mathbb{R}^{d})$; hence $f\in\mathcal{S}^{\prime}(\mathbb{R}^{d})$. ∎

Sharafutdinov also established the following supercritical Sobolev-embedding type result [57, Theorem 5.4].

Theorem A.12 ($H_{t}^{s}(\mathbb{R}^{d})$ and continuous functions).

If $t\in(-\frac{d}{2},\frac{d}{2})$ and $s>t+\frac{d}{2}$, then $H_{t}^{s}(\mathbb{R}^{d})$ consists of bounded continuous functions.

Appendix B Continuity equation in each projection

In this section we prove Lemma 3.8, which was used to establish Theorem 3.9 (i). Our proof relies on the $H_{t}^{(q,s)}(\Omega)$-norms for $\Omega=\mathbb{R}^{d},\mathbb{P}_{d}$, introduced by Sharafutdinov [57], which generalize the $H_{t}^{s}(\Omega)$-norms defined in (2.6) and (2.7) to include regularity in the direction $\theta\in\mathbb{S}^{d-1}$. In the simple case where $t=0$ and $q,s$ are nonnegative integers, the $H_{t}^{(q,s)}(\mathbb{P}_{d})$-norm is equivalent to the sum of the $L^{2}$-norms of derivatives of order at most $q$ in the $\theta$-variable and of order at most $s$ in the scalar variable. We provide only the details necessary to prove Lemma 3.8 and refer the interested reader to [57, Section 3] and the references therein for further information.

We say that $Y_{l}\in C^{\infty}(\mathbb{S}^{d-1})$ is a spherical harmonic of degree $l$ if $Y_{l}=\tilde{Y}_{l}|_{\mathbb{S}^{d-1}}$ for a homogeneous polynomial $\tilde{Y}_{l}$ of degree $l$ on $\mathbb{R}^{d}$ satisfying $\Delta\tilde{Y}_{l}=0$. The space of spherical harmonics of degree $l$ on $\mathbb{S}^{d-1}$ has finite dimension $N(d,l)$, so we can choose an orthonormal basis $(Y_{lm})_{m=1}^{N(d,l)}$ of this space. The spherical harmonics of degree $l$ are eigenfunctions of the spherical Laplacian $\Delta_{\theta}:C^{\infty}(\mathbb{S}^{d-1})\to C^{\infty}(\mathbb{S}^{d-1})$, where the sign is chosen so that $\Delta_{\theta}$ is positive semidefinite:

$$\Delta_{\theta}Y_{lm}=\lambda(d,l)Y_{lm},\qquad\lambda(d,l)=l(l+d-2).$$

We can represent the Fourier transform of each $f\in\mathcal{S}(\mathbb{R}^{d})$ as

$$\mathcal{F}_{d}f(\xi)=\sum_{l=0}^{\infty}\sum_{m=1}^{N(d,l)}\bar{f}_{lm}(|\xi|)\,Y_{lm}(\xi/|\xi|),\tag{B.1}$$

where the coefficients $\bar{f}_{lm}\in C^{\infty}([0,+\infty))$ decay rapidly at infinity. Similarly, for each $\mathfrak{g}\in\mathcal{S}(\mathbb{P}_{d})$,

$$\mathcal{F}_{1}\mathfrak{g}(\theta,\zeta)=\sum_{l=0}^{\infty}\sum_{m=1}^{N(d,l)}\bar{\mathfrak{g}}_{lm}(\zeta)\,Y_{lm}(\theta),\tag{B.2}$$

with coefficients $\bar{\mathfrak{g}}_{lm}\in\mathcal{S}(\mathbb{R})$ satisfying $\bar{\mathfrak{g}}_{lm}(-\zeta)=(-1)^{l}\bar{\mathfrak{g}}_{lm}(\zeta)$, a consequence of the evenness of $\mathfrak{g}$ and of $Y_{lm}(-\theta)=(-1)^{l}Y_{lm}(\theta)$.
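The eigenvalues $\lambda(d,l)=l(l+d-2)$ and dimensions $N(d,l)$ that enter (B.3) and (B.4) are easy to tabulate. The minimal sketch below uses the classical formula $N(d,l)=\binom{l+d-1}{d-1}-\binom{l+d-3}{d-1}$ (homogeneous polynomials of degree $l$ modulo $|x|^2$ times those of degree $l-2$), which gives $N(3,l)=2l+1$ and $N(2,l)=2$ for $l\geq1$. The helper names `lam`, `N` and `weights` are hypothetical.

```python
from math import comb


def lam(d, l):
    """Eigenvalue of the (positive semidefinite) spherical Laplacian on degree-l harmonics."""
    return l * (l + d - 2)


def N(d, l):
    """Dimension of the space of degree-l spherical harmonics on S^{d-1}."""
    lower = comb(l + d - 3, d - 1) if l >= 2 else 0  # guard: l + d - 3 may be negative
    return comb(l + d - 1, d - 1) - lower


def weights(d, q, lmax):
    """Triples (l, N(d, l), (lambda(d, l) + 1)^q): angular weights appearing in (B.3)-(B.4)."""
    return [(l, N(d, l), (lam(d, l) + 1) ** q) for l in range(lmax + 1)]


if __name__ == "__main__":
    print(weights(3, 1, 4))
```

Summing $N(d,l)$ over $l\leq L$ recovers the dimension $\binom{L+d-1}{d-1}+\binom{L+d-2}{d-1}$ of polynomials of degree at most $L$ restricted to the sphere, which the test below uses as a consistency check.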

For any $q,s\in\mathbb{R}$ and $t>-d/2$, the $H_{t}^{(q,s)}(\mathbb{R}^{d})$-norm is defined by

$$\|f\|_{H_{t}^{(q,s)}(\mathbb{R}^{d})}^{2}=\sum_{l=0}^{\infty}(\lambda(d,l)+1)^{q}\sum_{m=1}^{N(d,l)}\int_{0}^{\infty}\zeta^{2t+d-1}(1+\zeta^{2})^{s-t}|\bar{f}_{lm}(\zeta)|^{2}\,d\zeta.\tag{B.3}$$

The norm is independent of the choice of orthonormal basis; see [57, Sections 3-4]. Similarly, for $q,s\in\mathbb{R}$ and $t>-1/2$, the $H_{t}^{(q,s)}(\mathbb{P}_{d})$-norm is defined by

$$\|\mathfrak{g}\|_{H_{t}^{(q,s)}(\mathbb{P}_{d})}^{2}=\frac{1}{2(2\pi)^{d-1}}\sum_{l=0}^{\infty}(\lambda(d,l)+1)^{q}\sum_{m=1}^{N(d,l)}\int_{\mathbb{R}}|\zeta|^{2t}(1+\zeta^{2})^{s-t}|\bar{\mathfrak{g}}_{lm}(\zeta)|^{2}\,d\zeta.\tag{B.4}$$

The spaces $H_{t}^{(q,s)}(\mathbb{R}^{d})$ and $H_{t}^{(q,s)}(\mathbb{P}_{d})$ are the closures of $\mathcal{S}(\mathbb{R}^{d})$ and $\mathcal{S}(\mathbb{P}_{d})$, respectively, under the corresponding norms.
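To illustrate (B.4), the minimal sketch below evaluates $\|\mathfrak{g}\|_{H_t^{(q,s)}(\mathbb{P}_2)}^2$ for a function sampled on a grid, assuming $d=2$, $\theta=(\cos\phi,\sin\phi)$, $t\geq0$, and the unitary Fourier transform in $r$. On $\mathbb{S}^1$ the degree-$l$ harmonics are spanned by $e^{\pm il\phi}$ with $\lambda(2,l)=l^2$. Since the norm does not depend on the orthonormal basis, a two-dimensional FFT in $(\phi,r)$ yields the coefficients $\bar{\mathfrak{g}}_{lm}(\zeta)$ up to a phase that does not affect $|\bar{\mathfrak{g}}_{lm}|^2$. For the even test function $\mathfrak{g}(\phi,r)=e^{-r^2/2}(1+\frac12\cos2\phi)$ a hand computation gives $\frac{9}{16}\sqrt{\pi}$, $\frac{13}{16}\sqrt{\pi}$ and $\frac{27}{32}\sqrt{\pi}$ for $(q,s,t)=(0,0,0)$, $(1,0,0)$ and $(0,1,0)$. The function name and discretization are hypothetical.

```python
import numpy as np


def sq_norm_P2(g_vals, dr, q, s, t):
    """Squared H_t^{(q,s)}(P_2) norm (B.4) of samples g_vals[j, k] = g(phi_j, r_k),
    with phi_j = 2 pi j / M and r_k a uniform grid of spacing dr (t >= 0 assumed)."""
    M, Nr = g_vals.shape
    G = np.fft.fft2(g_vals)  # DFT over (phi, r)
    l = np.fft.fftfreq(M, d=1.0 / M)  # integer angular frequencies
    zeta = 2 * np.pi * np.fft.fftfreq(Nr, d=dr)  # angular frequencies in r
    # |gbar_l(zeta)|^2 in the basis e^{i l phi} / sqrt(2 pi), with the unitary F_1 in r
    coeff_sq = (dr / M) ** 2 * np.abs(G) ** 2
    ang = (l**2 + 1.0) ** q  # (lambda(2, |l|) + 1)^q
    rad = np.abs(zeta) ** (2 * t) * (1 + zeta**2) ** (s - t)
    dzeta = 2 * np.pi / (Nr * dr)
    return float(np.sum(ang[:, None] * rad[None, :] * coeff_sq) * dzeta / (2 * (2 * np.pi)))


if __name__ == "__main__":
    phi = 2 * np.pi * np.arange(64) / 64
    r = np.linspace(-20.0, 20.0, 1024, endpoint=False)
    g = (1 + 0.5 * np.cos(2 * phi))[:, None] * np.exp(-(r**2) / 2)[None, :]
    print(sq_norm_P2(g, r[1] - r[0], 1, 0, 0) / np.sqrt(np.pi))
```

Raising $q$ only reweights the angular modes, so an angle-independent $\mathfrak{g}$ (only $l=0$) has the same norm for every $q$.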

In fact, for $\Omega=\mathbb{R}^{d},\mathbb{P}_{d}$, the $H_{t}^{(q,s)}(\Omega)$-norm with $q=0$ coincides with the $H_{t}^{s}(\Omega)$-norm; see the proof of [57, Theorem 5.1]. Moreover, when $q\geq 0$, $H_{t}^{(q,s)}(\Omega)$ is continuously embedded in $H_{t}^{s}(\Omega)$. Hence, for $t\in(-d/2,d/2)$ when $\Omega=\mathbb{R}^{d}$ and for $t\in(-1/2,1/2)$ when $\Omega=\mathbb{P}_{d}$, $H_{t}^{(q,s)}(\Omega)$ is continuously embedded in $\mathcal{S}^{\prime}(\Omega)$.

Sharafutdinov showed [57, Theorem 4.3] that the Radon transform extends to a bijective isometry between $H_{t}^{(q,s)}$-spaces. Namely, for all $q,s\in\mathbb{R}$ and $t>-d/2$,

$$\|f\|_{H_{t}^{(q,s)}(\mathbb{R}^{d})}=\|Rf\|_{H_{t+(d-1)/2}^{(q,s+(d-1)/2)}(\mathbb{P}_{d})}\quad\text{for all }f\in H_{t}^{(q,s)}(\mathbb{R}^{d}).\tag{B.5}$$

The crucial property we use in this section is the following supercritical Sobolev-embedding-type inequality for the $H_{t}^{(q,s)}(\mathbb{P}_{d})$ spaces, due to Sharafutdinov [57, Corollary 5.11]. We omit the proof, which follows readily from arguments analogous to those for $H^{q}(\mathbb{S}^{d-1})$ and $H_{t}^{s}(\mathbb{R})$ [57, Corollary 5.5].

Theorem B.1 (Supercritical Sobolev embedding for $H_{t}^{(q,s)}(\mathbb{P}_{d})$).

If $t\in(-1/2,1/2)$, $s>t+1/2+k$, and $q>(d-1)/2+k$, then $H_{t}^{(q,s)}(\mathbb{P}_{d})\subset C^{k}(\mathbb{P}_{d})$ is a continuous embedding.

We now present a proof of Lemma 3.8.

Proof of Lemma 3.8.

Note that by hypothesis, for a.e. $\theta\in\mathbb{S}^{d-1}$ and $t\in I$ we have $\widehat{\mu}_{t}^{\theta}\in\mathscr{P}_{2}(\mathbb{R})$ and $\widehat{J}_{t}^{\theta}\in\mathcal{M}_{b}(\mathbb{R};\mathbb{R}^{d})$. It suffices to show that for a.e. $\theta\in\mathbb{S}^{d-1}$

$$\int_{I}\alpha^{\prime}(t)\langle\widehat{\mu}_{t}^{\theta},\psi\rangle_{\mathbb{R}}+\alpha(t)\langle\theta\cdot\widehat{J}_{t}^{\theta},\partial_{r}\psi\rangle_{\mathbb{R}}\,dt=0\quad\text{for all }\alpha\in C_{c}^{\infty}(I)\text{ and }\psi\in C_{c}^{\infty}(\mathbb{R}).\tag{B.6}$$

Indeed, linear combinations of test functions of the form $(t,r)\mapsto\alpha(t)\psi(r)$ are dense in $C_{c}^{\infty}(I\times K)$ for every compact $K\subset\mathbb{R}$, hence (B.6) implies (3.12); see [54, Proposition 4.2 and Exercise 4.23], for instance.

Step 1°. We first show that

$$\int_{I}\alpha^{\prime}(t)\langle\widehat{\mu}_{t},\mathfrak{g}\rangle_{\mathbb{P}_{d}}+\alpha(t)\langle\theta\cdot\widehat{J}_{t},\partial_{r}\mathfrak{g}\rangle_{\mathbb{P}_{d}}\,dt=0\quad\text{for all }\alpha\in C_{c}^{\infty}(I)\text{ and }\mathfrak{g}\in C_{c}^{\infty}(\mathbb{P}_{d}).\tag{B.7}$$

To this end, we first note by Proposition A.3 that $R^{\ast}\mathfrak{g}=R^{\ast}RR^{-1}\mathfrak{g}=cI_{d}^{d-1}R^{-1}\mathfrak{g}$ for some dimension-dependent constant $c>0$, where $I^{d-1}_{d}=(-\Delta)^{-(d-1)/2}$ (see Definition A.4). Fix any $q>(d+1)/2$. Then, by the definition (B.3) of the norms and by (B.5), there is a constant $C=C(d)>0$ such that

$$\|R^{\ast}\mathfrak{g}\|_{H_{(d-1)/2}^{(q,2+(d-1)/2)}(\mathbb{R}^{d})}=\|cI_{d}^{d-1}R^{-1}\mathfrak{g}\|_{H_{(d-1)/2}^{(q,2+(d-1)/2)}(\mathbb{R}^{d})}=C\|R^{-1}\mathfrak{g}\|_{H_{-(d-1)/2}^{(q,2-(d-1)/2)}(\mathbb{R}^{d})}=C\|\mathfrak{g}\|_{H^{(q,2)}(\mathbb{P}_{d})}.$$

As $\mathfrak{g}\in C_{c}^{\infty}(\mathbb{P}_{d})\subset H^{(q,2)}(\mathbb{P}_{d})$, we have $R^{\ast}\mathfrak{g}\in H_{(d-1)/2}^{(q,2+(d-1)/2)}(\mathbb{R}^{d})$. Hence we can choose a sequence $\varphi_{n}\in\mathcal{S}(\mathbb{R}^{d})$ such that $\|\varphi_{n}-R^{\ast}\mathfrak{g}\|_{H_{(d-1)/2}^{(q,2+(d-1)/2)}(\mathbb{R}^{d})}\to 0$ as $n\to\infty$.

Observe that this implies

$$\|c_{d}^{-1}\Lambda_{d}R\varphi_{n}-\mathfrak{g}\|_{C^{1}(\mathbb{P}_{d})}\lesssim_{d,q}\|c_{d}^{-1}\Lambda_{d}R\varphi_{n}-\mathfrak{g}\|_{H^{(q,2)}(\mathbb{P}_{d})}\lesssim_{d}\|\varphi_{n}-R^{\ast}\mathfrak{g}\|_{H_{(d-1)/2}^{(q,2+(d-1)/2)}(\mathbb{R}^{d})}\xrightarrow{n\to\infty}0.\tag{B.8}$$

Indeed, the first inequality is a consequence of Theorem B.1 applied with $k=1$, $s=2$, $t=0$ and $q>(d+1)/2$. The second inequality is in fact an equality up to a constant: by the definition of the $H_{t}^{(q,s)}(\mathbb{R}^{d})$-norms, the isometry (2.9), and the intertwining property $R(-\Delta)^{(d-1)/2}=\tilde{c}\Lambda_{d}R$, we have

$$\begin{aligned}\|\varphi_{n}-R^{\ast}\mathfrak{g}\|_{H_{(d-1)/2}^{(q,2+(d-1)/2)}(\mathbb{R}^{d})}&=\|(-\Delta)^{(d-1)/2}(\varphi_{n}-R^{\ast}\mathfrak{g})\|_{H_{-(d-1)/2}^{(q,2-(d-1)/2)}(\mathbb{R}^{d})}\\&=\|R(-\Delta)^{(d-1)/2}(\varphi_{n}-R^{\ast}\mathfrak{g})\|_{H^{(q,2)}(\mathbb{P}_{d})}=\tilde{c}\|\Lambda_{d}R\varphi_{n}-\Lambda_{d}RR^{\ast}\mathfrak{g}\|_{H^{(q,2)}(\mathbb{P}_{d})}\\&=\tilde{c}c_{d}\|c_{d}^{-1}\Lambda_{d}R\varphi_{n}-c_{d}^{-1}\Lambda_{d}RR^{\ast}\mathfrak{g}\|_{H^{(q,2)}(\mathbb{P}_{d})}=\tilde{c}c_{d}\|c_{d}^{-1}\Lambda_{d}R\varphi_{n}-\mathfrak{g}\|_{H^{(q,2)}(\mathbb{P}_{d})}.\end{aligned}$$

In the last line we have used the inversion formula (A.12).

Without loss of generality, we can choose $\varphi_{n}\in C_{c}^{\infty}(\mathbb{R}^{d})$, since $C_{c}^{\infty}(\mathbb{R}^{d})$ is dense in $\mathcal{S}(\mathbb{R}^{d})$ in the Schwartz topology and $\mathcal{S}(\mathbb{R}^{d})\subset H_{(d-1)/2}^{(q,2+(d-1)/2)}(\mathbb{R}^{d})$ is a continuous embedding. Testing the continuity equation for $(\mu_{t},J_{t})$ with $(t,x)\mapsto\alpha(t)\varphi_{n}(x)$ then gives

$$\int_{I}\alpha^{\prime}(t)\langle\mu_{t},\varphi_{n}\rangle_{\mathbb{R}^{d}}+\alpha(t)\langle J_{t},\nabla\varphi_{n}\rangle_{\mathbb{R}^{d}}\,dt=0.$$

Recalling $R\nabla\varphi_{n}=\theta\,\partial_{r}R\varphi_{n}$ and applying the duality formulae for bounded measures (2.3) and distributions (2.13), we obtain

$$c_{d}^{-1}\int_{I}\alpha^{\prime}(t)\langle\widehat{\mu}_{t},\Lambda_{d}R\varphi_{n}\rangle_{\mathbb{P}_{d}}+\alpha(t)\langle\theta\cdot\widehat{J}_{t},\partial_{r}\Lambda_{d}R\varphi_{n}\rangle_{\mathbb{P}_{d}}\,dt=0.$$

Let $\tilde{I}$ be a compact interval containing the support of $\alpha$. Then, using the previous identity,

$$\begin{aligned}&\left|\int_{I}\alpha^{\prime}(t)\langle\widehat{\mu}_{t},\mathfrak{g}\rangle_{\mathbb{P}_{d}}+\alpha(t)\langle\theta\cdot\widehat{J}_{t},\partial_{r}\mathfrak{g}\rangle_{\mathbb{P}_{d}}\,dt\right|\\&\qquad=\left|\int_{\tilde{I}}\alpha^{\prime}(t)\langle\widehat{\mu}_{t},c_{d}^{-1}\Lambda_{d}R\varphi_{n}-\mathfrak{g}\rangle_{\mathbb{P}_{d}}+\alpha(t)\langle\theta\cdot\widehat{J}_{t},\partial_{r}(c_{d}^{-1}\Lambda_{d}R\varphi_{n}-\mathfrak{g})\rangle_{\mathbb{P}_{d}}\,dt\right|\\&\qquad\leq\|\alpha\|_{C^{1}(\tilde{I})}\big(|\widehat{\mu}|_{TV(\tilde{I}\times\mathbb{P}_{d})}+|\widehat{J}|_{TV(\tilde{I}\times\mathbb{P}_{d})}\big)\|c_{d}^{-1}\Lambda_{d}R\varphi_{n}-\mathfrak{g}\|_{C^{1}(\mathbb{P}_{d})}.\end{aligned}$$

By Definition 3.6 (ii), $|\widehat{\mu}|_{TV(\tilde{I}\times\mathbb{P}_{d})}+|\widehat{J}|_{TV(\tilde{I}\times\mathbb{P}_{d})}<+\infty$. Thus, letting $n\to\infty$ and using (B.8), we obtain (B.7).

Step 2°. Let $B_{\mathbb{S}^{d-1}}(\theta_{0},\varepsilon)=\{\omega\in\mathbb{S}^{d-1}:|\omega-\theta_{0}|\leq\varepsilon\}$, and let $B_{\mathbb{S}^{d-1}}^{e}(\theta_{0},\varepsilon)=B_{\mathbb{S}^{d-1}}(\theta_{0},\varepsilon)\cup B_{\mathbb{S}^{d-1}}(-\theta_{0},\varepsilon)$. Then $\mathds{1}_{B_{\mathbb{S}^{d-1}}^{e}(\theta_{0},\varepsilon)}\in L^{1}(\mathbb{S}^{d-1})$ is an even function. Applying the continuity equation in the Radon space (B.7) to test functions of the form $\chi(\theta)\mathfrak{g}(\theta,r)$, where $\chi$ is a smooth even cutoff in the $\theta$-variable approximating this indicator, a standard approximation argument shows that for any $\theta_{0}\in\mathbb{S}^{d-1}$, $\varepsilon>0$, $\alpha\in C_{c}^{\infty}(I)$, and $\mathfrak{g}\in C_{c}^{\infty}(\mathbb{P}_{d})$

$$\int_{B_{\mathbb{S}^{d-1}}^{e}(\theta_{0},\varepsilon)}\int_{I}\alpha^{\prime}(t)\langle\widehat{\mu}_{t}^{\theta},\mathfrak{g}^{\theta}\rangle_{\mathbb{R}}+\alpha(t)\langle\theta\cdot\widehat{J}_{t}^{\theta},\partial_{r}\mathfrak{g}^{\theta}\rangle_{\mathbb{R}}\,dt\,d\theta=0.\tag{B.9}$$

Indeed, as no derivatives in $\theta$ appear in the continuity equation, the passage to the limit is justified by the dominated convergence theorem.

Since the integrand in (B.9) is integrable with respect to $d\theta$, for each $\mathfrak{g}\in C_{c}^{\infty}(\mathbb{P}_{d})$ and $\alpha\in C_{c}^{\infty}(I)$ the Lebesgue differentiation theorem yields a null set $\mathcal{N}_{\mathfrak{g},\alpha}\subset\mathbb{S}^{d-1}$ such that

$$\int_{I}\alpha^{\prime}(t)\langle\widehat{\mu}_{t}^{\theta},\mathfrak{g}^{\theta}\rangle_{\mathbb{R}}+\alpha(t)\langle\theta\cdot\widehat{J}_{t}^{\theta},\partial_{r}\mathfrak{g}^{\theta}\rangle_{\mathbb{R}}\,dt=0\quad\text{for all }\theta\notin\mathcal{N}_{\mathfrak{g},\alpha}.$$

By separability of $C_{c}^{\infty}(\Omega)$ for $\Omega=I,\mathbb{P}_{d}$, we can find a null set $\mathcal{N}\subset\mathbb{S}^{d-1}$ such that the above holds for all $\mathfrak{g}\in C_{c}^{\infty}(\mathbb{P}_{d})$ and $\alpha\in C_{c}^{\infty}(I)$.

As for every $\psi\in C_{c}^{\infty}(\mathbb{R})$ one can find $\mathfrak{g}\in C_{c}^{\infty}(\mathbb{P}_{d})$ with $\mathfrak{g}^{\theta}=\psi$, we conclude that for all $\theta\notin\mathcal{N}$

$$\int_{I}\alpha^{\prime}(t)\langle\widehat{\mu}_{t}^{\theta},\psi\rangle_{\mathbb{R}}+\alpha(t)\langle\theta\cdot\widehat{J}_{t}^{\theta},\partial_{r}\psi\rangle_{\mathbb{R}}\,dt=0\quad\text{for all }\alpha\in C_{c}^{\infty}(I),\ \psi\in C_{c}^{\infty}(\mathbb{R}),$$

which is (B.6). ∎
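The conclusion (B.6) can be illustrated on an explicit solution. Assume $d=2$ and the translating Gaussian $\mu_t=\mathcal{N}(tv,\mathrm{Id})$ with flux $J_t=v\,\mu_t$, which solves the continuity equation. Then $\widehat{\mu}_t^\theta$ has density $(2\pi)^{-1/2}e^{-(r-t\,v\cdot\theta)^2/2}$ and $\theta\cdot\widehat{J}_t^\theta=(v\cdot\theta)\widehat{\mu}_t^\theta$. The minimal sketch below evaluates the left side of (B.6) by quadrature with the illustrative choices $I=(0,1)$, $\alpha(t)=\sin^2(\pi t)$ and $\psi(r)=e^{-(r-1.5)^2}$. Here $\alpha$ is only $C^1$ with $\alpha(0)=\alpha(1)=0$, and $\psi$ is rapidly decaying rather than compactly supported; both suffice for this check. The function names and the velocity are hypothetical.

```python
import numpy as np


def trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def projected_density(t, r, c):
    """Density of mu_t^theta for mu_t = N(t v, Id) in R^2, where c = v . theta."""
    return np.exp(-((r - t * c) ** 2) / 2) / np.sqrt(2 * np.pi)


def weak_residual(v, phi, flux_factor=1.0, n_t=2001, n_r=1201, L=10.0):
    """Left side of (B.6) with theta = (cos phi, sin phi), I = (0, 1),
    alpha(t) = sin(pi t)^2 and psi(r) = exp(-(r - 1.5)^2).
    flux_factor = 1 uses the true projected flux (v . theta) mu_t^theta."""
    c = float(np.dot(v, [np.cos(phi), np.sin(phi)]))
    t = np.linspace(0.0, 1.0, n_t)
    r = np.linspace(-L, L, n_r)
    psi = np.exp(-((r - 1.5) ** 2))
    dpsi = -2 * (r - 1.5) * psi
    alpha = np.sin(np.pi * t) ** 2
    dalpha = np.pi * np.sin(2 * np.pi * t)
    pairing_mu = np.array([trapezoid(projected_density(s, r, c) * psi, r) for s in t])
    pairing_flux = np.array(
        [trapezoid(flux_factor * c * projected_density(s, r, c) * dpsi, r) for s in t]
    )
    return trapezoid(dalpha * pairing_mu + alpha * pairing_flux, t)


if __name__ == "__main__":
    v = np.array([1.0, -0.5])
    print(weak_residual(v, 0.4), weak_residual(v, 0.4, flux_factor=0.0))
```

With the correct flux the residual vanishes up to quadrature error, while replacing the flux by a wrong multiple leaves a clearly nonzero residual.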