Geometry and analytic properties of the sliced Wasserstein space
Abstract.
The sliced Wasserstein metric compares probability measures on $\mathbb{R}^d$ by taking averages of the Wasserstein distances between projections of the measures to lines. The distance has found a range of applications in statistics and machine learning, as it is easier to approximate and compute in high dimensions than the Wasserstein distance. While the geometry of the Wasserstein metric is quite well understood, and has led to important advances, very little is known about the geometry and metric properties of the sliced Wasserstein (SW) metric. Here we show that when the measures considered are “nice” (e.g. bounded above and below by positive multiples of the Lebesgue measure) then the SW metric is comparable to the (homogeneous) negative Sobolev norm $\dot{H}^{-(d+1)/2}$. On the other hand when the measures considered are close in the infinity transportation metric to a discrete measure, then the SW metric between them is close to a multiple of the Wasserstein metric. We characterize the tangent space of the SW space, and show that the speed of curves in the space can be described by a quadratic form, but that the SW space is not a length space. We establish a number of properties of the metric given by the minimal length of curves between measures – the SW length. Finally we highlight the consequences of these properties on the gradient flows in the SW metric.
Keywords: Sliced Wasserstein Distance, Optimal Transport, Radon Transform, Gradient Flows in Spaces of Measures
MSC (2020): 49Q22, 46E27, 60B10, 44A12
Notation
- – the set we call the Radon domain is defined as , the quotient space for the equivalence relation ; see (1.5)
- – the Radon transform of function ; see (1.4)
- – the dual Radon transform of function ; see (2.1)
- with or – the Schwartz class of functions; see (2.5)
- with or – the set of tempered distributions on
- with or – the set of locally finite Borel measures on ; see (2.4)
- with or – the set of bounded Borel measures on
- , – sets of probability measures with bounded -th moments; see (2.15)
- – the -transportation distance; see (1.1). We write for the Wasserstein distance .
- – the -sliced Wasserstein distance; see (1.3). We write for .
- – set of transport plans between probability measures ; see (1.1)
- – set of slice-wise transport plans between probability measures ; see (2.18)
- – set of optimal transport plans between for the quadratic cost; see (2.17)
- – optimal transport map for quadratic cost between and
- – set of slice-wise optimal transport plans between ; see (2.19)
- – slice-wise optimal transport map from to
- – the -dimensional Fourier transform of a function
- – the slice-wise -dimensional Fourier transform of
- – space of admissible fluxes for the continuity equation; see (3.1)
- – set of suitable distributional solutions of the continuity equation; see Definition 3.6
- with or – action of a distribution on on a function on ; see (2.14)
1. Introduction
The sliced Wasserstein distance, introduced by Rabin, Peyré, Delon, and Bernot [51], compares probability measures on $\mathbb{R}^d$ by taking averages of the Wasserstein distances between projections of the measures onto each 1-dimensional subspace of $\mathbb{R}^d$. Thanks to their lower sample and computational complexity relative to the Wasserstein distance, the sliced Wasserstein distance and its variants [20, 28, 9, 45, 48, 4, 43, 42] have recently found a growing range of applications in statistics [44, 34, 40, 37, 41] and machine learning [29, 10, 35, 18, 21, 20, 30, 31] as tools to compare measures and construct paths in spaces of measures. For $p \geq 1$, we write $SW_p$ (or the $SW_p$ distance) to refer to the $p$-sliced Wasserstein distance, and refer to the corresponding space as the $SW_p$ space. When $p = 2$, we drop the subscript and simply refer to them as the $SW$ distance and the $SW$ space.
Despite the multitude of uses of the sliced Wasserstein distance there are only a few works dealing with its metric and geometric properties: Bonnotte [12] established that is indeed a metric, and is equivalent to for measures supported on a common compact set for all ; more recently, Bayraktar and Guo [5] showed that and induce the same topology on for . This is in stark contrast with the Wasserstein metric, whose geometry has been a subject of intense study and has led to important advances; see [1, 54, 65].
Here we take steps towards a better understanding of the sliced Wasserstein distance and its geometry. In particular we show that for measures that are absolutely continuous with respect to the Lebesgue measure, have bounded density, and differ only within a set compactly contained in the interior of their support, the SW metric is comparable to the (homogeneous) negative Sobolev norm $\dot{H}^{-(d+1)/2}$; see Theorem 5.2. On the other hand when the measures considered are close in the infinity-transportation metric to a discrete measure, the SW metric between them is close to a multiple of the Wasserstein metric (Theorem 5.5). We show that, unlike the Wasserstein space, the SW space is not a length space. Nevertheless, it still has a tangential structure that resembles that of the Wasserstein space. We also show that geodesics (considered as length minimizing curves) in SW exist and study the intrinsic metric (the SW length), defined as the length of minimizing geodesics between measures. In particular we show that the SW length satisfies comparison and approximation properties similar to those of the SW metric. Finally we discuss the consequences of these properties for gradient flows with respect to the SW metric.
1.1. Setting
For we denote by the space of all Borel probability measures on . Let . For probability measures , the -Wasserstein distance, , is defined as follows:
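(1.1) $W_p(\mu,\nu) := \left( \inf_{\gamma \in \Gamma(\mu,\nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} |x-y|^p \, d\gamma(x,y) \right)^{1/p}$, where $\Gamma(\mu,\nu)$ is the set of transport plans between $\mu$ and $\nu$, that is, probability measures on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $\mu$ and $\nu$.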
To define the sliced Wasserstein distance, we introduce the following notation: for each , let , and define the projection by
(1.2) |
The -sliced Wasserstein distance is defined by
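(1.3) $SW_p(\mu,\nu) := \left( \fint_{\mathbb{S}^{d-1}} W_p^p\left(\pi^\theta_\# \mu, \pi^\theta_\# \nu\right) d\theta \right)^{1/p}$, where $\pi^\theta_\# \mu$ denotes the pushforward of $\mu$ by the projection defined in (1.2) and $\fint$ denotes the average over the unit sphere.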
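As a rough illustration of (1.3), the following Python sketch approximates $SW_p$ between two empirical measures on $\mathbb{R}^2$ with the same number of equally weighted atoms. It is not taken from the paper: the function names, the restriction to $d = 2$, and the replacement of the average over directions by an equispaced quadrature on the circle are assumptions made for illustration. It relies on the standard fact that, in one dimension, an optimal plan between two such empirical measures matches the sorted atoms.

```python
import math


def wasserstein_1d_pow(xs, ys, p=2):
    """W_p^p between two 1D empirical measures with equally many,
    equally weighted atoms (sorted matching is optimal in 1D)."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def sliced_wasserstein_2d(X, Y, p=2, n_dirs=360):
    """Approximate SW_p between empirical measures on R^2.

    X, Y: lists of (x0, x1) pairs of equal length.
    The average over the unit circle is replaced by an equispaced
    quadrature with n_dirs directions (an illustrative choice).
    """
    total = 0.0
    for k in range(n_dirs):
        angle = 2.0 * math.pi * k / n_dirs
        c, s = math.cos(angle), math.sin(angle)
        proj_x = [c * x0 + s * x1 for x0, x1 in X]  # projection of the first sample
        proj_y = [c * y0 + s * y1 for y0, y1 in Y]  # projection of the second sample
        total += wasserstein_1d_pow(proj_x, proj_y, p)
    return (total / n_dirs) ** (1.0 / p)
```

For instance, if `Y` is a translate of `X` by a vector $v$, every slice is translated by $\theta \cdot v$, so for $p = 2$ and $d = 2$ the function returns $|v|/\sqrt{2}$ up to rounding.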
The Radon transform provides a natural language to describe objects relating to the sliced Wasserstein distance. Consider an integrable function , for . We use both and to denote its Radon transform: For and
(1.4) |
where and is the -dimensional Lebesgue measure on . By Fubini’s theorem, exists for a.e. for each when .
Note that is even on , meaning that . This motivates defining the -dimensional “Radon domain” by
(1.5) |
The Radon transform can be extended to distributions (see [25, Chapter 1.5] and [57]); in particular, when is a bounded measure, the distributional extension is consistent with the definition of as a pushforward of by the projection map (see Remark A.1). Thus, for we have , and
Note that whereas ; we will sometimes use the latter when it is more convenient to consider measures on subspaces of than on .
Henceforth we will focus on the case , and write and .
1.2. Summary of results
We obtain a number of geometric and analytic properties of the distance and of the associated length space, and investigate their implications on the sliced Wasserstein gradient flows and statistical estimation rates in the metrics. Throughout this paper, we use as a shorthand for the inequality , where is a finite positive constant depending on . When summarizing results or making remarks, we sometimes omit the dependence on the parameters and write for the sake of simplicity; however, all rigorous statements contain clear characterizations of the constants.
Basic properties. In Section 2 we establish some basic properties of the metric. In particular in Proposition 2.4 we show that is a complete metric space, which we refer to as the space. We then turn to the intrinsic geometry of the space. Example 2.5 shows that, unlike the Wasserstein space, the space is not a geodesic space. That is, one cannot in general find a continuous curve connecting two measures in the SW space with its length equal to the distance between the measures.
Tangential structure of sliced Wasserstein space. We show in Section 3 that the space has a tangent structure which resembles the tangent structure of the Wasserstein space. Recall that in the Wasserstein space, each absolutely continuous curve defined on an interval corresponds to a measure-valued distributional solution of the continuity equation
where is the metric derivative w.r.t ; see [1, Theorem 8.3.1]. Theorem 3.9 establishes an analogous result for the sliced Wasserstein space – for each absolutely continuous curve in the sliced Wasserstein space, there exists a vector-valued flux such that is in a suitable subspace of and
We note two key differences: the metric derivative is characterized by the weighted norm in the Radon domain, finiteness of which does not imply in general (see Remark 3.10). Moreover, Sharafutdinov’s results of Radon transform on Sobolev spaces [57] imply that , thus we can formally understand as corresponding to a weighted high order negative Sobolev norm of the flux , in contrast to the weighted -norm in the Wasserstein case. Hence, at least for absolutely continuous measures, the formal Riemannian metric measuring infinitesimal length in the sliced Wasserstein space corresponds to a weaker space than the one for the Wasserstein metric. Furthermore in Section 3.2, we characterize its tangent space which has the following key property analogous to its Wasserstein counterpart: Each absolutely continuous curve is associated to a unique (up to a -null set) family of tangent vectors , which moreover attain the metric derivative through the quadratic form .
Intrinsic sliced Wasserstein length space. In Section 4, motivated by the general lack of geodesics in , we introduce the sliced Wasserstein length metric defined as the infimum of the lengths of curves between measures in the SW space. We establish the basic properties of the metric space ; in particular we prove in Proposition 4.5 that geodesics exist, which further implies that in general .
Comparison of sliced Wasserstein metric with negative Sobolev norms and Wasserstein metric. In Section 5 we establish some of the key results of this paper, namely the comparison theorems of metric with negative Sobolev norms near absolutely continuous measures and comparisons of with the Wasserstein metric near discrete measures. In particular, consider an absolutely continuous measure bounded away from zero and infinity on some bounded open convex domain . Theorem 5.2 establishes that
for all measures which are bounded above and below by constant multiples of and coincide with near the boundary of . In other words we show that near , is equivalent to .
These two results provide interesting insights about the metric. Near smooth measures it behaves like a Sobolev norm of highly negative order, in contrast to the Wasserstein metric which for such measures behaves like the norm as noted by Peyré [49], while near discrete measures behaves like the Wasserstein distance.
Approximation by discrete measures in sliced Wasserstein length. Manole, Balakrishnan, and Wasserman [37, Proposition 4] have shown that a finite random sample (i.e. the empirical measure of the set of random points) of a probability measure on estimates the measure in the sliced Wasserstein distance at a parametric rate for a large class of measures; see also [41]. This is in stark contrast with the Wasserstein distance, where the approximation error is poor in high dimensions and scales like . We start by pointing out a connection, identified by our results in Section 5, between the results on parametric finite-sample estimation in the sliced Wasserstein distance and results in the statistical literature. Namely, it is known that finite-sample estimation of measures with respect to maximum mean discrepancy (MMD) also enjoys a parametric rate [61, Theorem 3.3]. The MMD distance is nothing but the norm in the dual of a reproducing kernel Hilbert space (RKHS). In particular the results of [61] apply to the dual of the Sobolev space with (when the space embeds in the space of Hölder continuous functions and is an RKHS). Our Theorem 5.2 says that near absolutely continuous measures, SW behaves like the -norm; as the associated norm is an MMD, we can formally understand the SW metric as behaving like an MMD. Thus the MMD parametric estimation can be seen as a tangential or a linearized analogue of the finite sample estimation rates in the SW distance.
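The parametric behaviour described above can be explored numerically. The sketch below is an illustration only and no substitute for the results of [37]: it draws two independent samples of size $n$ from the uniform measure on the unit square (a hypothetical choice of measure) and computes an approximate $SW_2$ distance between their empirical measures. By the triangle inequality, this two-sample distance is at most twice the distance between an empirical measure and the underlying measure in expectation. The function names, the dimension $d = 2$, and the equispaced quadrature over directions are assumptions.

```python
import math
import random


def sw2_2d(X, Y, n_dirs=32):
    """Approximate SW_2 between two empirical measures on R^2 with
    equally many atoms, using equispaced directions on the circle."""
    total = 0.0
    for k in range(n_dirs):
        angle = 2.0 * math.pi * k / n_dirs
        c, s = math.cos(angle), math.sin(angle)
        px = sorted(c * x0 + s * x1 for x0, x1 in X)
        py = sorted(c * y0 + s * y1 for y0, y1 in Y)
        # In 1D, matching sorted atoms gives the squared W_2 distance.
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return math.sqrt(total / n_dirs)


def mean_two_sample_sw2(n, rng, repeats=5):
    """Average SW_2 between two independent n-samples of the uniform
    measure on the unit square (hypothetical test measure)."""
    values = []
    for _ in range(repeats):
        X = [(rng.random(), rng.random()) for _ in range(n)]
        Y = [(rng.random(), rng.random()) for _ in range(n)]
        values.append(sw2_2d(X, Y))
    return sum(values) / repeats


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (50, 200, 800, 3200):
        value = mean_two_sample_sw2(n, rng)
        # If the rate is parametric, value * sqrt(n) should stay of order one.
        print(n, value, value * math.sqrt(n))
```

The last printed column rescales the distance by $\sqrt{n}$, so a roughly constant column is what a parametric rate would look like in this experiment.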
Here we investigate the finite sample estimation rates in the SW intrinsic length metric . The goal is to gain a better understanding of the extent to which and share properties. In Theorem 6.3, we establish that the finite sample approximation in happens at the parametric rate up to a logarithmic correction, namely that
While this is consistent with the geometric view of as a curved or nonlinear dual of a reproducing kernel Hilbert space (see the beginning of Section 6 for discussion), the statement and the proof require dealing with discrete measures, where such a heuristic view does not hold.
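Implications on gradient flows. Section 7 applies the comparison results on , to obtain comparisons for the metric slopes. Given a metric space $(X, m)$, recall that the metric slope of a functional $\mathcal{E} : X \to (-\infty, +\infty]$ at a point $u$ with $\mathcal{E}(u) < \infty$ is defined by
(1.6) $|\partial \mathcal{E}|_{m}(u) := \limsup_{v \to u} \frac{\left(\mathcal{E}(u) - \mathcal{E}(v)\right)_+}{m(u,v)}$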
Let and consider the potential energy . Proposition 7.2 states that when is smooth and compactly supported, for suitable absolutely continuous it holds that
whereas Proposition 7.5 shows that the slope behaves quite differently at discrete measures, , namely that
By considering a sequence of discrete measures converging to an absolutely continuous measure , we may deduce that (resp. ) is not lower semicontinuous in (resp. ) in general, even when ; see Corollary 7.7. This implies that the potential energy is not -geodesically convex in . Furthermore we observe in Remark 7.6 that starting from discrete measures with a finite number of particles, the curves of maximal slope in the Wasserstein space, after a constant rescaling of time, are the curves of maximal slope in the SW space.
On the other hand, for smooth measures, the curves of maximal slope with respect to the Wasserstein metric are not curves of maximal slope in the SW space. We formally show that SW gradient flows of the potential energy satisfy a higher order equation given by a pseudodifferential operator of order , which is consistent with the rigorous results of Proposition 7.2. We conclude that the framework of gradient flows in metric spaces would not be the right tool to study such equations; PDE based approaches may provide an avenue for creating a well-posedness theory, which remains an open problem.
1.3. Related works
Since the introduction of the sliced Wasserstein distance [51], numerous variants have been considered. Deshpande et al. [20] proposed the max-sliced Wasserstein distance (max-SW distance), which is the maximum of the 1D Wasserstein distances, instead of the average as in the case. Niles-Weed and Rigollet [45] and Paty and Cuturi [48] independently proposed the -dimensional generalization (max--SW distance) for . Generalizations to spherical [9] and other nonlinear projections [28] have also been considered to more effectively capture the geometric structure of data. Based on the ideas of partial optimal transportation [23], Bai, Schmitzer, Thorpe, and Kolouri [4] introduced sliced optimal partial transport to compare measures with different masses. Further projection-based transport metrics include the distributional sliced Wasserstein distance introduced by Nguyen, Ho, Pham, and Bui [43] and the convolution sliced Wasserstein distance proposed by Nguyen and Ho [42].
The sliced Wasserstein distances have found numerous applications in image processing. In fact, the utility of the sliced Wasserstein barycenter for tasks such as image synthesis, color transfer, and texture mixing served as a motivation behind the introduction of the sliced Wasserstein distance [51]. Bonneel, Rabin, Peyré, and Pfister [11] further studied efficient numerical methods to compute sliced Wasserstein and related barycenters, and their applications. Kolouri, Park, and Rhode proposed the Radon cumulative distribution transport (Radon CDT) [29] for image classification; Radon CDT effectively computes the sliced Wasserstein ‘geodesic’ by taking the Radon inverse of the displacement interpolation between the Radon transforms of the measures. However, we note that such an inverse will in general fail to be a curve in the space of probability measures, as the Radon inverse of nonnegative functions need not be nonnegative.
Gradient flows related to the sliced Wasserstein distance have been applied to various machine learning and image processing tasks. Bonnotte noticed [12] that the continuous analogue of the isotropic Iterative Distribution Transfer (IDT) algorithm, introduced by [50] to transfer the color palette of a reference picture to a target picture, is the Wasserstein gradient flow of . Liutkus et al. [35] utilize the Wasserstein gradient flows of the entropy-regularized version of the same energy functional for generative modelling. Gradient flows of the same energy in the sliced Wasserstein space have been considered by Bonet et al. [10], also for generative modelling; they also study the JKO scheme with respect to , and establish existence and uniqueness of minimizers of the scheme when the optimization is restricted to probability measures supported on a common compact set [10, Section 3.2]. Sliced Iterative Normalizing Flows (SINF) [18], useful for sampling and density evaluation, can be seen as a max-SW variant of the isotropic IDT algorithm.
Other applications in machine learning include: sliced Wasserstein generative adversarial nets by Deshpande, Zhang, Schwing [21]; max-SW generative adversarial nets [20] for generative modelling; sliced Wasserstein autoencoder by [30]; and use of distance for unsupervised domain adaptation [31].
On the statistical side, Manole, Balakrishnan, and Wasserman [37] established, based on the 1-dimensional results by Bobkov and Ledoux [8], the parametric estimation rate for the empirical measure of i.i.d. samples of , and further investigated statistical properties of the trimmed sliced Wasserstein distances. Nietert, Goldfeld, Sadhu, and Kato [44] established empirical estimation rates in and max- for log-concave distributions with explicit constants dependent on the intrinsic dimension, showed robustness to data contamination, and explored efficient computational methods. Lin, Zheng, Chen, Cuturi, and Jordan [34] investigated the max--sliced distances and their integral variants, the integral projection robust Wasserstein (IPRW) distances, also known as the -sliced Wasserstein distances (-SW distances), and established several statistical properties including sample complexity . More recently, Olea, Rush, Velez, and Wiesel [46] explored the connection between certain linear predictor problems and distributionally robust optimization based on a modified max-SW. For applications in Approximate Bayesian Computation, we refer the readers to [40, Chapter 4] and the references therein.
Regarding analytic and topological properties, Bonnotte [12] showed that is indeed a distance on for , and established that, for measures supported on – i.e. compactly contained in – we have
More recently, Bayraktar and Guo [5] showed that , , and the -max-sliced Wasserstein distance induce the same topology on for . We note here that this does not directly imply completeness of , as not all Cauchy sequences in need be Cauchy in .
Bonnotte also showed the existence of the Wasserstein gradient flow of the energy functional for the target measure , despite the lack of geodesic convexity of the energy functional, and derived the corresponding PDE [12, Chapter 5], the continuous-time version of the previously mentioned isotropic IDT algorithm. Due to the lack of convexity of the energy functional, even the asymptotic convergence of the gradient flow remains open. Nevertheless, Li and Moosmüller [33] recently established almost sure convergence of the discrete isotropic IDT algorithm with step-sizes satisfying certain summability conditions. More recently, Cozzi and Santambrogio [17] established the convergence rate when the target measure is any isotropic Gaussian.
As our work was nearing completion, we became aware of the independent work by Kitagawa and Takatsu [26] on the sliced Wasserstein spaces. In their work, Kitagawa and Takatsu establish the metric completeness of sliced optimal-transportation-based spaces, which generalizes our Proposition 2.4, and also demonstrate that the SW spaces are not geodesic spaces, generalizing our Example 2.5. Their work focuses on isometrically embedding the SW type spaces into larger spaces and on the barycenter problem. See also [27] for their more recent work on disintegrated optimal transport for metric fiber bundles.
2. Basic properties of the sliced Wasserstein space
In this section, we examine the basic properties of the sliced Wasserstein space . We start by reviewing a few properties of the Radon transform that we use (Section 2.1). In Section 2.2 we establish basic metric properties including lower semicontinuity of and precompactness of balls in with respect to the narrow topology, from which completeness follows. We conclude the section by noting that, unlike the Wasserstein space, the sliced Wasserstein space is not a geodesic space.
2.1. Preliminaries on the Radon transform
Here we provide a brief overview of the key properties of the Radon transform. We refer the readers to Appendix A for precise statements and to the book by Helgason [25] for a more thorough introduction.
The dual Radon transform. Given an integrable function we define its dual Radon transform, which we write or , by
(2.1) |
As
by Fubini’s theorem is well-defined for -a.e. whenever . Furthermore, the dual transform satisfies
(2.2) |
whenever either or are absolutely integrable; see [25, Lemma 5.1] for further details. In particular, the extension of the Radon transform to finite measures as the pushforward of under the map is consistent with (2.2) (see Remark A.1). Consequently, we will often use the duality formula for bounded measures in the form
(2.3) |
Spaces related to the Radon transform. To add clarity, we denote the functions defined on by a different set of symbols – e.g. . We denote by the space of locally finite signed Borel measures on . We note that can be identified with
(2.4) |
We write for the space of bounded Borel measures on . Any is a Polish space, hence can be equivalently understood as a space of signed Radon measures. Finally, we denote by the space of vector valued Radon measures.
We will mostly treat as a parameter and as the variable, which is reflected in our notation. For instance, for a function we write . Denoting by the normalized volume measure on satisfying , for each we write for its disintegration with respect to – i.e. ; for precise statement of the disintegration theorem, see [19, III-70] or [1, Theorem 5.3.1]. We will always consider with its first marginal equal to .
We denote by with the Schwartz-Bruhat space of smooth rapidly decreasing functions [15, 47]. We note that is the usual Schwartz class, whereas can be identified with the subspace of of even functions, namely the set
(2.5) |
We write with to denote the space of continuous linear functionals on –i.e. the space of tempered distributions on .
The -Sobolev theory of Radon transforms will be crucial in understanding the differential structure of the sliced Wasserstein space. For this purpose, we use Sobolev spaces with attenuated () or amplified () low frequencies, introduced by Sharafutdinov [57]. For each let us denote by the -dimensional Fourier transform
For and , the Hilbert space is defined as the completion of under the norm
(2.6) |
Similarly we define the analogous space for and in the Radon domain as the completion of Schwartz functions on the Radon domain under the norm
(2.7) |
Here and in the sequel the one-dimensional Fourier transform always applies to the scalar variable when applied to functions defined on . We only use the norms or for . In the first case, we can view the norm as counting derivatives of order between and , as
Thus when we see coincides with the standard Sobolev space of order . On the other hand, the space can be understood as the dual of . Indeed, for any
(2.8) |
for details, see [57, Theorem 5.3] and its proof. We provide further information on the relationship between the Radon transform and Sobolev spaces in Appendix A.
Outside of Section 2 we will mostly be interested in the case , where with or is equivalent to the more familiar homogeneous Sobolev space.
We only consider the space where , and for which ensures that the identity map continuously embeds to and the same holds for and ; see [57, Theorem 5.3], which we have included in the appendix (Theorem A.11) for completeness. Thus, for , the spaces can be seen as a complete normed subspace of . We stress that, while we use the norm in comparison to , generic elements of spaces with are not considered in this paper.
Furthermore, when , continuously embeds to by Gagliardo-Nirenberg-Sobolev inequality for fractional Sobolev spaces; see for instance [32, Theorem 11.31]. Thus, we can consider as a space of functions in this case.
We note that in some works the definition of homogeneous Sobolev spaces for differs from the one we use. Namely is defined as the subset of for which the seminorm (2.6) for is bounded; in this case, elements in are uniquely defined in modulo polynomials; see for instance [62, Remark 3, Section 5.1].
Sharafutdinov showed [57, Theorem 2.1] that the Radon transform can be extended as a bijective isometry between and – i.e. when
(2.9) |
The special case was observed by Reshetnyak, recorded in [24, Section 1.1.5] and also in [25, Chapter 1, Theorem 4.1]. Whenever , the -norm is stronger than the topology of , thus the continuous extension of the Radon transform applied to any function is unambiguously defined as an element of independently of and . Therefore in the remainder of this paper we refer to this extension simply as the Radon transform.
In Sections 5 and 6 we will make use of the weighted homogeneous Sobolev norm of order in dimension 1. Given , the -norm of is defined by
(2.10) |
Operators related to the Radon transform. Calculus using the Radon transform often involves , which is defined via
(2.11) |
When is even, the fractional power of the 1-dimensional Laplace operator is defined using the Hilbert transform; for precise definitions, see Definition A.7 in the appendix. Observe that is well-defined as an operator from to itself, and can be extended as a bounded operator from to for ; see Remark A.8.
The operators can be understood by their interaction with the Fourier transform, namely
(2.12) |
We also use fractional powers of the Laplace operator in -dimensions, which can be defined via the Fourier transform by
Again, observe that is well-defined as an operator from to itself, and can be extended as a bounded operator from to when . Rigorous definition of for fractional powers without relying on the Fourier transform can be found in [25, Chapter 7.6], which we have included in Proposition A.5 in the appendix for completeness. The inversion formulae are expressed using these operators: Setting , for each and we have
See Proposition A.9 for further details.
Whenever , we have and . In this case, straightforward calculations using the inversion formula and the Fourier transform imply
(2.13) |
For each of the domains , we will write
(2.14) |
2.2. Basic properties of sliced Wasserstein metric
In this section we establish some basic properties of the SW distance and the SW space.
Let be the set of Borel probability measures in the Radon domain with bounded second moment
(2.15) |
where is the projection in the first variable. In other words, is a family of measures in parametrized by additionally satisfying the evenness condition. Observe that for each , we can choose an orthonormal frame with and for . As , we have
Thus
Equivalently,
(2.16) |
Thus the finite second moment condition in the Euclidean and Radon domain coincide. Hence, if and only if .
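The computation behind this equivalence can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the paper: it works in $d = 2$, replaces the normalized average over directions by an equispaced quadrature on the circle, and uses hypothetical function names. It verifies that the average over directions $\theta$ of the second moment of the projected measure equals $1/d$ times the Euclidean second moment, which is the identity obtained from the orthonormal frame argument above.

```python
import math


def second_moment(points):
    """Euclidean second moment of the empirical measure of `points` in R^2."""
    return sum(x * x + y * y for x, y in points) / len(points)


def radon_second_moment(points, n_dirs=64):
    """Average over directions theta of the second moment of the
    projected (one-dimensional) empirical measure."""
    total = 0.0
    for k in range(n_dirs):
        angle = 2.0 * math.pi * k / n_dirs
        c, s = math.cos(angle), math.sin(angle)
        total += sum((c * x + s * y) ** 2 for x, y in points) / len(points)
    return total / n_dirs


if __name__ == "__main__":
    pts = [(3.0, 4.0), (1.0, -2.0), (0.0, 0.5)]
    d = 2
    # The two printed numbers should agree up to rounding.
    print(radon_second_moment(pts), second_moment(pts) / d)
```

In particular, one moment is finite exactly when the other is, which is the statement used in the text.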
Given , let be the set of optimal transport plans for the quadratic cost:
(2.17) |
where is as defined in (1.1). On the other hand, given , write
(2.18) |
where is the disintegration of with respect to . Then we define the set of slice-wise optimal transport plans by
(2.19) |
As noticed by Bonnotte [12], given , , and thus
We denote the optimal transport map between , if it exists, by – i.e. , where . Similarly, we denote by the family of optimal transport maps in the Radon domain – i.e. such that
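Recall that a sequence $(\mu_n)_{n \in \mathbb{N}}$ of probability measures converges narrowly to $\mu$ if for each bounded continuous function $\varphi$
(2.20) $\lim_{n \to \infty} \int \varphi \, d\mu_n = \int \varphi \, d\mu.$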
We begin by establishing lower semicontinuity of with respect to the narrow convergence.
Proposition 2.1 ( is lower semicontinuous with respect to the narrow topology).
The map from to is lower semicontinuous with respect to the narrow topology.
Proof.
Note that the analogous statement for for is a classical result; see [65, Remark 6.12] for instance (where they refer to the narrow convergence as weak convergence). Clearly implies . Thus, if narrowly, then by Fatou’s lemma
Thus we deduce that is lower semicontinuous with respect to the narrow convergence. ∎
Remark 2.2 (Lack of compactness in ).
We note here that the closed unit ball in is not compact in the topology. The argument is analogous to one that shows that is not compact with respect to the topology of the Wasserstein metric. The argument is as follows: Consider
where and choose such that . Quick calculations show that while , the second moments do not converge, as
Thus . As and induce the same topology, we deduce that is indeed not compact with respect to the topology. ∎
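The mechanism in Remark 2.2 can be made concrete with a small computation. The sequence used below, $\mu_n = (1 - 1/n)\,\delta_0 + (1/n)\,\delta_{(\sqrt{n}, 0)}$ in $\mathbb{R}^2$, is a hypothetical example of this type and may differ from the one in the remark; the function names and the equispaced quadrature over directions are likewise assumptions. The sketch shows that integrals of a bounded continuous test function converge to its value at the origin, while the second moment stays equal to $1$ and the $SW_2$ distance to $\delta_0$ stays equal to $1/\sqrt{2}$.

```python
import math


def mu_n(n):
    """Atoms and weights of (1 - 1/n) delta_0 + (1/n) delta_{(sqrt(n), 0)}."""
    return [((0.0, 0.0), 1.0 - 1.0 / n), ((math.sqrt(n), 0.0), 1.0 / n)]


def integrate(measure, phi):
    """Integral of phi against a discrete measure given as (atom, weight) pairs."""
    return sum(w * phi(x) for x, w in measure)


def sw2_to_dirac_at_origin(measure, n_dirs=64):
    """SW_2 distance to delta_0. In each slice the only transport plan
    sends every atom to 0, so the slice cost is the projected second moment."""
    total = 0.0
    for k in range(n_dirs):
        angle = 2.0 * math.pi * k / n_dirs
        c, s = math.cos(angle), math.sin(angle)
        total += integrate(measure, lambda x: (c * x[0] + s * x[1]) ** 2)
    return math.sqrt(total / n_dirs)


if __name__ == "__main__":
    phi = lambda x: 1.0 / (1.0 + x[0] ** 2 + x[1] ** 2)  # bounded, continuous
    for n in (10, 1000, 10 ** 6):
        m = mu_n(n)
        print(n,
              integrate(m, phi),                                # tends to phi(0) = 1
              integrate(m, lambda x: x[0] ** 2 + x[1] ** 2),    # stays equal to 1
              sw2_to_dirac_at_origin(m))                        # stays 1/sqrt(2)
```

Since any limit in the SW topology would also be the narrow limit $\delta_0$, this sequence has no subsequence converging in SW, although it stays in a closed ball around $\delta_0$.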
On the other hand we note that balls in sliced Wasserstein space are compact with respect to the narrow convergence of measures:
Proposition 2.3 (Narrow compactness of the sliced Wasserstein unit ball).
Let be fixed. Then the closed unit ball
is compact with respect to the topology of narrow convergence.
Proof.
Recall from (2.16) that
Thus, for all , we have
Hence the second moment of probability measures in is uniformly bounded, hence is tight, as
Thus by Prokhorov’s theorem, for any sequence in we can find a subsequence narrowly converging to . Fix such a subsequence without relabeling. Moreover , as
by lower semicontinuity of (Proposition 2.1). ∎
We can deduce completeness from narrow compactness, lower semicontinuity, and the topological equivalence. While the authors believe this is known, we record the proof here as we could not locate the statement of completeness in the literature.
Proposition 2.4 (Completeness).
is a complete metric space.
Proof.
Suppose is a Cauchy sequence with respect to . Then we can find a closed unit ball in that contains all for sufficiently large , which is relatively compact by Proposition 2.3 hence has a subsequential narrow limit . Fix such a subsequence without relabeling. Then, by the lower semicontinuity established in Proposition 2.1,
By the triangle inequality we deduce for the original sequence in . ∎
We conclude this section by noting that the sliced Wasserstein space is not a geodesic space. Indeed, let , and suppose is a constant-speed -geodesic from to – i.e. for any , . Then for any and ,
Using this and the triangle inequality ,
As the integrand above is nonnegative, it is zero for a.e. . By for instance considering all rational sequences and arguing by density, we may deduce that for a.e. the curve must be a -Wasserstein geodesic between and , hence characterized by the displacement interpolation based on the 1D transport plan.
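The one-dimensional displacement interpolation invoked here has a simple form for empirical measures. The following sketch is an illustration and not the paper's construction: it assumes two measures on the line with the same number of equally weighted atoms, and the function names are hypothetical. In that setting the optimal plan matches sorted atoms, and the interpolant at time $t$ moves each atom linearly towards its partner, so that the curve has constant speed in $W_2$.

```python
import math


def w2_1d(xs, ys):
    """W_2 between two 1D empirical measures with equally many,
    equally weighted atoms (sorted matching is optimal)."""
    xs, ys = sorted(xs), sorted(ys)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs))


def displacement_interpolation_1d(xs, ys, t):
    """Atoms of the displacement interpolant at time t in [0, 1]."""
    xs, ys = sorted(xs), sorted(ys)
    # Each atom travels along the segment to its sorted partner.
    return [(1.0 - t) * a + t * b for a, b in zip(xs, ys)]


if __name__ == "__main__":
    mu0 = [0.0, 1.0, 4.0]
    mu1 = [2.0, 3.0, 10.0]
    t = 0.25
    mu_t = displacement_interpolation_1d(mu0, mu1, t)
    # Constant speed: W_2(mu0, mu_t) = t * W_2(mu0, mu1), up to rounding.
    print(w2_1d(mu0, mu_t), t * w2_1d(mu0, mu1))
```

Applying this construction slice by slice gives the family of one-dimensional interpolants discussed in the text; whether that family is the Radon transform of a probability measure is exactly the question addressed next.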
Thus the problem of identifying the geodesic comes down to invertibility of displacement interpolant . In principle, sufficient regularity of guarantees the existence of a function such that for . However, we additionally require .
While the Radon transform preserves nonnegativity, the inverse Radon transform does not. For example, consider
For sufficiently small we see , whereas near the origin. Mollifying within radius , we see that the additional regularity does not resolve the issue. In general, it is difficult to determine when the Radon inversion is nonnegative, as the inversion formula involves high order derivatives and . Indeed, in many cases the geodesic cannot exist, as we can see in the following example.
Example 2.5 ( is not a geodesic space).
Consider and . Suppose is the constant speed geodesic from to . As the quadratic cost is strictly convex, for each , should be the displacement interpolation between and . Considering in particular, should satisfy
However, this is impossible: The first and the third line imply that must be a convex combination of , and , which contradicts the second requirement. By continuity of , similar properties hold for close to , which shows that the geodesic cannot exist.
Note that regularity is not the only issue; by convolving with a smooth kernel with a sufficiently small radius, we can argue similarly that there cannot be a geodesic.
In fact, we will see in Corollary 4.8 that is not even a length metric on – i.e. in general it cannot be approximated by the sliced Wasserstein length of absolutely continuous curves. This motivates us to investigate the length (and geodesic) metric induced by on .
3. Curves in the sliced Wasserstein space and the tangential structure
In this section we study absolutely continuous curves in the SW space. In Section 3.1 we investigate the sliced Wasserstein metric derivative and prove the main result of this section, Theorem 3.9, characterizing absolutely continuous curves by corresponding distributional solutions of the continuity equation in the flux form
Section 3.2 characterizes the tangent space . The main result regarding the tangent space, Proposition 3.12, states that if is a solution of the continuity equation corresponding to an absolutely continuous curve, attains if and only if , up to a -null set.
While these results are reminiscent of the analogous statements for the Wasserstein space [1, Proposition 8.4.5], we note here a few differences. Firstly, from Theorem 3.9 we know that a generic absolutely continuous curve admits the representation for some , where
(3.1) |
We will see in Lemma 3.4 that the above space is well-defined as a subspace of , and that
In general need not be measures, in which case it does not even make sense to consider the Radon-Nikodym derivative . Thus the distributional fluxes will be the main objects on which the tangential structure is based. Moreover, not all fluxes in preserve the nonnegativity of . Therefore the tangent vectors attainable by curves in the space form a (convex) cone, whereas the tangent space of the 2-Wasserstein space is a vector space. Thus, we can formally consider the space as a manifold with corners; see Remark 3.13 for further details.
3.1. Absolutely continuous curves in the sliced Wasserstein space
Let be a complete metric space. We say a curve belongs to if there exists such that
(3.2) |
Furthermore, for any , the metric derivative
(3.3) |
exists for -a.e. , and . We will often write to denote an interval, which is assumed to be open but not necessarily bounded, unless otherwise stated. When there is no room for confusion regarding the interval, we simply write
Prior to studying the length associated to , let us first investigate the metric derivative . As for any , it immediately follows that
for all at which both metric derivatives are well-defined.
Recall [1, Theorem 8.3.1] that each absolutely continuous curve in the Wasserstein space has a corresponding distributional solution of the continuity equation such that
We want to establish an analogous result for the sliced Wasserstein space. Suppose is well-defined at all . Then
Assume that satisfy the continuity equation
From direct calculations one can readily verify (see Proposition A.6 for the proof)
(3.4) |
Formally applying the Radon transform to the continuity equation and using (3.4), we obtain
Rewriting in the velocity formulation,
Thus, by applying [1, Theorem 8.3.1] for each , we deduce that formally
(3.5) |
Observe that is even in , hence is a function on . For simplicity we will often write instead. Furthermore, note that the existence of the velocity that saturates the inequality is nontrivial; for a.e. the projection of the velocity must saturate the corresponding 1D inequality.
Before we begin investigating the metric derivative in detail, let us first consider examples that compare and ; Example 3.1 demonstrates that they coincide for paths of discrete measures, whereas Example 3.2 shows that the ratio is in general not bounded from below.
Example 3.1 (Discrete measures).
Let where is continuously differentiable for each . Then
Indeed,
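The simplest instance, a single moving atom, can be checked numerically. The sketch below assumes that the SW distance averages the squared one-dimensional distances over the unit sphere with respect to the normalized surface measure, in which case the squared distance between Dirac masses at x and y equals |x - y|^2 / d. The helper names, the test points, and the Monte Carlo sampling of directions are illustrative choices and not part of the argument.

```python
import numpy as np


def random_directions(num, dim, rng):
    """Directions drawn uniformly from the unit sphere in R^dim
    (normalized Gaussian vectors)."""
    g = rng.normal(size=(num, dim))
    return g / np.linalg.norm(g, axis=1, keepdims=True)


def sw2_squared_diracs(x, y, thetas):
    """Monte Carlo estimate of SW_2^2 between Dirac masses at x and y.
    The projected measures are Dirac masses at theta.x and theta.y, so the
    1D squared distance in direction theta is (theta . (x - y))^2."""
    return np.mean((thetas @ (x - y)) ** 2)


rng = np.random.default_rng(0)
dim = 3
x = np.array([1.0, -2.0, 0.5])   # hypothetical atom positions
y = np.array([0.0, 1.0, 2.0])

thetas = random_directions(200_000, dim, rng)
estimate = sw2_squared_diracs(x, y, thetas)
exact = np.sum((x - y) ** 2) / dim   # |x - y|^2 / d

print(estimate, exact)
```

Under this normalization, a single atom moving with velocity v has SW speed |v| divided by the square root of d, which is the scaling that relates the two metric derivatives along paths of discrete measures.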
Example 3.2 (Two sliding lines).
Consider two parallel line segments close to each other moving in opposite directions. A significant portion of the shearing velocity is cancelled out after projection, causing a significant gap between the metric derivatives with respect to and .
More precisely, let and let be uniformly distributed on a segment with endpoints . Similarly, let be the analogous measure slightly below, with endpoints . Then define and – i.e. is translated to the right and to the left.
Defining , one can check by direct calculations that
We begin the rigorous study of the metric derivative by considering the Benamou-Brenier functional for the sliced Wasserstein distance. Consider the 1D Benamou-Brenier functional for valued flux, namely defined by
(3.6) |
enjoys several desirable properties such as joint convexity in the arguments and the lower semicontinuity with respect to the narrow convergence. See [54, Section 5.3.1] for a more complete list of properties (note they refer to narrow convergence as weak convergence). By (3.5) we expect
Thus it is natural to define the Benamou-Brenier functional for the sliced Wasserstein distance by
(3.7) |
Note that only depends on the Radon transforms of the inputs. By definition (3.1) of , hence its disintegration with respect to is well-defined for a.e. , so is the integral in (3.7). In general
and is not necessarily a (vector-valued) measure. However, the Radon transform is well-defined for such , unlike for general tempered distributions; see Remark A.1 for further details. We will see later in Lemma 3.4 that fluxes of interest lie in .
We record some basic properties that inherits from the Benamou-Brenier functional .
Proposition 3.3 (Properties of the ).
The functional is convex, and satisfies the following properties:
-
(i)
Let and be such that and narrowly converge to in . Then
(3.8) -
(ii)
-
(iii)
only if and for a.e.
-
(iv)
Let and suppose for a.e. . Then we can write
Proof.
The following lemma relates attenuated Sobolev norms and , and shows that the domain of interest is a subset of . Given , we henceforth write to say that and are parallel.
Lemma 3.4 ( and ).
Let be such that is a finite measure. Then and there exists such that and for all
(3.9) |
If instead and satisfy
-
(i)
(-upper bound)
-
(ii)
(parallel to ) for -a.e.
then is a finite measure, and thus there exists such that . Moreover, for each there exists such that
(3.10) |
Remark 3.5.
Proof.
Let and first consider the case is a finite measure. Then for each test function and for any ,
where we have used the Sobolev embedding theorem in one dimension in the last line. Thus and (2.9) implies the existence of such that
To conclude (3.9) it suffices to note that is independent of the choice of . Indeed, suppose , then (2.13) implies that for all
On the other hand, suppose (i) and (ii) hold. By (ii) , which by (i) is finite a.e. . Thus, by the property of analogous to Proposition 3.3 (iii), we may write for some satisfying and . By Jensen’s inequality,
thus by the previous part we can find with . Finally, (3.10) follows directly from property (ii) and (3.9). ∎
We characterize absolutely continuous curves in the sliced Wasserstein space by identifying them with solutions of the continuity equation, defined in the following way.
Definition 3.6.
Let be an open interval. We denote by the set of all pairs satisfying the following conditions:
-
(i)
The curve is narrowly continuous in with respect to ;
-
(ii)
for some vector-valued Borel measure where the inverse Radon transform is applied in the -variable and
(3.11) where admits the disintegration ;
-
(iii)
is a distributional solution of the continuity equation – i.e.
Moreover, we write the set of pairs satisfying and where and .
Remark 3.7.
Given satisfying (3.11), for each the disintegration theorem with respect to implies that is well-defined and is bounded for a.e. . By considering an increasing countable sequence of such that , we may define for -a.e. . Then Lemma 3.4 allows us to define the Radon inverse for a.e. .
Furthermore, condition (ii) is not restrictive whenever , as
and we are only interested in absolutely continuous curves, for which we will see that the right-hand side is finite. ∎
We state a useful technical lemma, the proof of which we delay to Appendix B.
Lemma 3.8.
Let be an open interval. Let as defined in Definition 3.6. Then for a.e. , is a distributional solution of the continuity equation on – i.e.
(3.12) |
In case , then by approximation solve the continuity equation against test functions . Thus for each , we may take test functions of the form for any and deduce that (3.12) holds. However, in general enjoys less regularity, and thus we instead need to rely on an approximation argument involving fine properties of the Radon transform. As the proof relies on technical tools that are not used elsewhere in this paper, we defer it to the appendix.
We are now ready to establish the main theorem of this section.
Theorem 3.9 (AC curves in the sliced Wasserstein metric space).
Let be as defined in (3.7), and an open interval.
-
(i)
Suppose satisfying . Then, for -a.e.
(3.13) and .
-
(ii)
Conversely, let . Then there exists
(3.14) such that and
(3.15)
Proof.
We adapt the proof of the analogous result by Ambrosio, Gigli, and Savaré [1, Theorem 8.3.1]; we leave the structure parallel to their proof, so readers can compare the objects arising in the sliced Wasserstein setting to the Wasserstein counterparts.
Step 1°. By assumption and thus we know is well-defined for a.e. . We may deduce by Proposition 3.3 (iii) that for a.e. and a.e. ; hence write and note
By Lemma 3.8, we have, in the sense of distributions,
Thus, by [1, Theorem 8.3.1] we see that for a.e. , each , and further
Combining, we have
Step 2°. It remains to prove the converse. Let . Then by the Radon inversion formula (A.11)
and thus for each , duality formula (2.3) implies
Let . Denoting by for each ,
Furthermore, define by
Then
Let and . Then
(3.16) |
where the Radon transform is applied in the spatial variable, and is any interval such that . Let
and let be the closure of under the norm
As , any vector evaluated at each time is parallel to – i.e. for all . By the estimate (3.16), we can extend the functional
uniquely to a bounded linear functional on , such that
(3.17) |
Consider the minimization problem
Note that the functional we are minimizing is the sum of a quadratic term and a bounded linear functional in , thus is coercive in and lower semicontinuous in the weak topology of . Thus, by the direct method of calculus of variations, this minimization problem admits a solution . Furthermore, the functional is strictly convex, thus this minimizer is unique. Moreover, the minimizer satisfies the Euler-Lagrange equation
As for -a.e. , by Lemma 3.4 we can find such that for -a.e. . From this we deduce satisfies the continuity equation in the sense of distributions, as (2.13) implies that for each , writing ,
In order to establish the pointwise inequality (3.15), first recall that, as for a.e. ,
Choose an interval and with , and such that converges to the minimizer in . By replacing with a suitable subsequence, we may further assume converges to in for -a.e. . Thus, using the bounds (3.16), we obtain
Letting , we have
As , by reorganizing and squaring both sides we obtain
As was arbitrary, we deduce
One can readily check that the pair satisfies all the conditions in Definition 3.6, hence is in ; see Remark 3.7. ∎
Remark 3.10.
The characterization of the space (3.14) to which the identified flux belongs will be useful in characterizing the tangent space in Section 3.2. Note that this space consists of vectors satisfying , and thus assumption (ii) in Lemma 3.4 is not restrictive, as noted in Remark 3.5.
As we do not know in general if the flux is a vector-valued measure, the question of whether may not even make sense. Furthermore, even if is sufficiently regular, it is obtained by the Radon inversion and thus it is difficult to determine whether the flux is absolutely continuous with respect to the measure ; the finiteness of implies but not necessarily . For instance, let be a normalized measure on the unit sphere in and be a suitable normalization of . Letting for , one can check that is an absolutely continuous curve in the SW space, as in each projection for all . On the other hand, for any the curve cannot be absolutely continuous in the Wasserstein space, as mass is created outside the support of . ∎
3.2. Tangential structure of the sliced Wasserstein space
Unlike absolutely continuous curves in the Wasserstein space which allow corresponding solutions of the form of the continuity equation, Theorem 3.9 only guarantees the solution of the continuity equation in the flux form (see Remark 3.10), where we only know in general. Furthermore, from the proof we saw that (3.14) must hold in order to ensure for a.e. . This motivates us to define the tangent space at after the space appearing in (3.14), namely
(3.18) |
In this section we highlight some properties of the tangent space. To begin, recall that in the case of the Wasserstein space, the tangent space satisfies the optimality property [1, Lemma 8.4.2]
We see that satisfies the analogous property.
Proposition 3.11 (Tangent space and optimality).
Let and such that . Then if and only if
(3.19) |
for all such that and (in the sense of distributions). Moreover, such a minimizer is unique.
Proof.
Squaring both sides of (3.19), a simple scaling argument reveals that (3.19) is true if and only if for all such . Indeed, as we may apply the duality formula (2.13) to see
As in the sense of distributions, we see that if and only if is in the closure of , which characterizes .
Furthermore, uniqueness follows from strict convexity of the -norm and the linearity of the continuity equation in the flux. ∎
From this we deduce the following key property of the tangent space.
Proposition 3.12.
Let . For any , we have
In particular, such is determined uniquely for a.e. given .
Proof.
Remark 3.13 (Nonexistence of -geodesics in some directions).
We emphasize that not every flux can be attained as a velocity flux of an absolutely continuous curve. The following example illustrates that some fluxes are admissible while is not. Fix a small and consider
To allow sufficient regularity of the objects constructed from the Radon inversion, we consider ; the choice of the convolution radius ensures that when then . On the other hand in the interior of its support as long as is chosen sufficiently small. Clearly , and thus by Theorem 3.9 we can find a corresponding in the tangent space, and as is smooth, must also be a smooth function. As
proceeding in the direction from introduces negative mass in . From this we see is achievable as a tangent vector to a curve, but is not.
In general, not all fluxes in the tangent space vanish outside the support of , hence cannot be attained by absolutely continuous curves in the space of probability measures. Thus, despite the definition (3.18) of as a vector space, the tangent vectors attainable by curves form a convex cone rather than a linear space. This suggests that should be (formally) considered as a manifold with corners. ∎
4. The sliced Wasserstein length space
Given an interval and a curve , define its sliced Wasserstein length by
(4.1) |
We note that (4.1) is consistent with the usual notion of length in a metric space (see [16, Theorem 2.7.6]):
(4.2) |
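The partition-based notion of length can be illustrated on a curve of Dirac masses. The sketch below assumes the closed form SW(delta_x, delta_y) = |x - y| / sqrt(d), which holds when directions are averaged with the normalized measure on the sphere, and sums the distances between consecutive points of uniform partitions. The curve (a Dirac mass travelling along a half circle in the plane) and all function names are hypothetical choices for illustration.

```python
import numpy as np


def sw_dirac(x, y):
    """SW_2 distance between Dirac masses at x and y, assuming the
    normalized average over directions: |x - y| / sqrt(d)."""
    dim = x.shape[0]
    return np.linalg.norm(x - y) / np.sqrt(dim)


def partition_length(curve, a, b, num_pieces):
    """Sum of SW distances between consecutive points of the curve
    t -> delta_{curve(t)} along a uniform partition of [a, b]."""
    ts = np.linspace(a, b, num_pieces + 1)
    pts = [curve(t) for t in ts]
    return sum(sw_dirac(p, q) for p, q in zip(pts[:-1], pts[1:]))


def half_circle(t):
    """Position of the moving atom at time t in [0, pi]."""
    return np.array([np.cos(t), np.sin(t)])


pieces = (1, 2, 4, 8, 1000)
lengths = [partition_length(half_circle, 0.0, np.pi, k) for k in pieces]
limit = np.pi / np.sqrt(2.0)   # Euclidean arc length pi, scaled by 1/sqrt(d)

for k, value in zip(pieces, lengths):
    print(k, value)
print("limit:", limit)
```

The sums increase along the nested refinements 1, 2, 4, 8 by the triangle inequality, and they approach the scaled arc length, which is strictly larger than the SW distance between the endpoints (the single-piece sum).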
In this section we examine the length metric induced by
(4.3) |
and the associated length space . As and coincides with its length metric, it immediately follows that .
While in general the study of the intrinsic metric and the geometry is mathematically natural, it is particularly relevant for for the following reasons. Firstly, using the characterization of metric derivatives via the quadratic functional in Theorem 3.9, we can consider a formal Riemannian structure on analogous to that on . Furthermore, in applications we are often interested in continuous deformations of probability measures, hence the geodesic distance that can be attained by absolutely continuous curves can be more relevant than the original distance.
After noting that is a complete metric space, in Lemma 4.3 we show the narrow precompactness of absolutely continuous curves in and the lower semicontinuity of . From this we deduce the lower semicontinuity of with respect to the narrow convergence in Corollary 4.4 and the existence of geodesics in Proposition 4.5; in particular, the latter implies that in general , as we have seen in Example 2.5 that is not a geodesic space.
We first note that completeness of follows from completeness of .
Corollary 4.1 (Completeness).
is a complete metric space.
Proof.
In locally compact metric spaces the compactness of paths, lower semicontinuity of length, and existence of geodesics follow by classical arguments; see Section 4 of [2]. However, in balls are not precompact; see Remark 2.2. On the other hand balls are precompact with respect to the narrow topology (Proposition 2.3) and the SW distance is lower semicontinuous with respect to narrow convergence. This allows us to use instead the following refined version of the Ascoli-Arzelà theorem [1, Proposition 3.3.1] to construct limiting curves and establish the existence of geodesics.
Proposition 4.2 (Proposition 3.3.1. of [1]).
Let be a complete metric space. Let and be a sequentially compact set with respect to topology , and let be curves such that
(4.4) |
for a symmetric function , such that
where is an (at most) countable subset of . Then there exists an increasing subsequence and a limit curve such that
Setting and to be the topology generated by the narrow convergence in Proposition 4.2, we can modify the standard arguments to show pointwise narrow compactness of curves.
Lemma 4.3 (Pointwise narrow compactness for curves and lower semicontinuity of length).
Let be a closed interval
and suppose a sequence of curves satisfies
Then, up to a reparametrization, there exists a curve continuous in such that along a subsequence (which we do not relabel)
Moreover,
(4.5) |
In particular, .
Proof.
As each is an absolutely continuous curve with uniformly bounded length, we may instead consider their Lipschitz reparametrizations [1, Lemma 1.1.4] to the interval , with each Lipschitz constant bounded above by the length of the curve. Thus the equicontinuity condition (4.4) is satisfied at all points with . Furthermore, the condition allows us to choose such that is finite. Then
By Proposition 2.3, is compact with respect to the narrow topology. Thus the refined Ascoli-Arzelà Theorem (Proposition 4.2) implies the existence of a curve continuous in such that pointwise converge narrowly at all as .
By Proposition 2.1 is lower semicontinuous with respect to narrow convergence of measures. Thus for any fixed partition , we can find sufficiently large such that for all and thus
Letting and , we see that indeed .∎
From Lemma 4.3 we deduce the lower semicontinuity of .
Corollary 4.4 (lower semicontinuity of ).
The map on is lower semicontinuous with respect to the narrow convergence of measures.
Proof.
Let be narrowly convergent sequences in with respective limits . Fix , and for each let , , be the arc-length parametrized curve [1, Lemma 1.1.4] such that
By setting the for , we can define all on a common bounded interval . As is lower semicontinuous, . Thus by Lemma 4.3, there exists a limiting curve such that
Moreover, is a curve connecting and , and by (4.5)
We conclude by letting . ∎
Existence of geodesics in also follows from Lemma 4.3.
Proposition 4.5 ( is a geodesic metric).
For each there exists a length minimizing curve such that . In particular, is a geodesic space.
Proof.
Remark 4.6.
In case the geodesic attains the sliced Wasserstein distance between , the geodesic can be characterized as the Radon inverse of the 1D displacement interpolant between and . However, in general such a Radon inverse is not a probability measure, as noted in Example 2.5.
While the -geodesic remains in , we cannot guarantee that the corresponding pair satisfies , or even that is a measure for a.e. . See Remark 3.10. However, it can be approximated by solutions of the continuity equation with by concatenating with the Wasserstein geodesics from to and to , where is a suitable smooth convolution kernel with bandwidth . ∎
Remark 4.7.
We note that the projection to closed balls is not a sliced Wasserstein contraction. This property holds for all transportation distances which are increasing with Euclidean distance, but we show by an explicit example that it does not hold for the sliced Wasserstein distance: let
Then . Let be the projection onto . Then via explicit computation one can verify that
In light of this observation, the following question is nontrivial: Consider supported in a closed unit ball centered at 0. Does it hold that for all the measures along the geodesic are supported within the same ball? This property sounds natural as would be supported on for all , but remains an open problem. ∎
Recall that Example 2.5 established that in general the geodesics do not exist, whereas geodesics between always exist by Proposition 4.5. Thus we conclude this section with the following corollary.
Corollary 4.8.
is not a length space.
5. Comparisons with negative Sobolev norms and the Wasserstein distance
We establish two comparison results, Theorem 5.2, near absolutely continuous measures, and Theorem 5.5, near discrete measures. The former states that for suitable absolutely continuous measures, is equivalent to and both are comparable to the -norm as a consequence of the averaging effect of the Radon transform. On the other hand, Theorem 5.5 states that, roughly speaking, and are very close to near discrete measures, as the smoothing effect due to averaging does not take place.
For absolutely continuous measures with densities bounded below by and above by , Peyre [49] established the metric equivalence
see [36, Proposition 2.8] for an earlier proof of the first inequality above. Our results can be seen as providing analogous comparisons between the SW distance and a norm in a Hilbert space. As differs from SW, a question particular to our setup is whether the intrinsic distance also enjoys such comparison with norm. We answer this affirmatively in Theorem 5.2.
Recall that a measure is log-concave if for any and Borel measurable sets
(5.1) |
We first prove a useful lemma, which relies on the facts that log-concavity is preserved by the pushforward under projections [13], and that log-concave measures have log-concave densities [14].
Lemma 5.1.
Let be a log-concave measure. Let , and suppose there exists such that
(5.2) |
Then for a.e. the displacement interpolation from to satisfies the same upper bound for all .
Proof.
As projection preserves log-concavity of a measure, each is log-concave, and thus has a log-concave density with respect to unless it is a Dirac mass [14, Theorem 3.2]. For such that is a Dirac mass, the conclusion of this lemma is trivial, so we consider such that . It follows that , thus we can fix the optimal transport map mapping to , and define
In the remainder of this proof, we identify the measures with their densities with respect to . The displacement interpolation is given by . As , it suffices to show for all . Arguing as in the proof of [38, Proposition D.2],
where we have used the harmonic mean-geometric mean inequality. On the other hand, log-concavity of implies
∎
Theorem 5.2 (Comparison between and the -norm.).
Let , and let such that
(5.3) |
Then we have the following.
-
(i)
If is log-concave then
(5.4) -
(ii)
If with , then
(5.5) Suppose further for an open connected bounded . Furthermore, let on for some . Then there exists such that
(5.6)
In particular, if in (ii) is also convex, then
(5.7) |
Remark 5.3.
We leave a few remarks on the conditions of Theorem 5.2. A simple, and useful, condition that implies (5.3) is the following:
(5.8) |
We note that (5.3) only requires the comparison to hold after integrating over hyperplanes.
The condition is satisfied whenever is compactly supported and has bounded density. Indeed, denoting by the ball of radius in centered at 0, if ,
thus whenever . However, need not be compactly supported for ; for instance, consider a Gaussian measure on .
Observe also that the second part of (ii) requires connectedness but not convexity of , whereas the comparison with on requires to be convex [49]. This is because we only use displacement interpolation between 1D projections, and connected and convex sets coincide in . In fact, our proof only requires connectedness for each projection defined in (5.9). However, to keep the statement simpler we use a stronger assumption that is connected. ∎
Proof.
Our proof is a careful adaptation of the argument by Peyre [49] to the sliced Wasserstein setting. The main difficulty comes from the fact that the density of the projections with respect to is not bounded away from zero, near the edge of their supports.
In this proof, we will use to denote the projection of in the direction , namely
(5.9) |
Defining to be the interior of in the case (i), observe that in both cases (i) and (ii) is connected for each , hence convex; in particular, the displacement interpolation between and remains in .
Noting that is log-concave whenever is convex, (5.7) follows directly from (5.4) and (5.6). Thus it suffices to prove items (i) and (ii).
Step 1°. In this step we show that when is log-concave, the condition (5.3) implies the upper bound
We do this by comparing the distances of the projections along each with the corresponding weighted norms. Consider the linear interpolation and write . Then and thus by duality . Hence, using the Benamou-Brenier formula for each projection, we have
(5.10) |
Note that (5.10) required no assumption on . For each let be a constant speed -geodesic from to . By Lemma 5.1 for all . As , we have
and thus by duality
As is a constant speed geodesic, and thus
Hence
Step 2°. In this step we establish the lower bound
under the assumption that and . By construction
As is log-concave, by Lemma 5.1 the displacement interpolation between and satisfies for all and a.e. . Arguing as in [49, Theorem 5], we have
By averaging over we obtain
Moreover, identifying the measures and their densities, we have . Thus by the Fourier slicing theorem (Proposition A.2) and change of variables , we have
Step 3°. We show (5.6) under the additional assumption that for some bounded connected and in . Let . Then
Recall . Furthermore, as is open and connected, on the interval . Thus there exists some constant depending only on such that
Combining this with (5.10), we can find some such that
By Jensen’s inequality and the Radon isometry argument as in Step 2°, we deduce
∎
Remark 5.4.
We make a few further remarks. Firstly, the lower bound
does not hold for general measures. In fact, while the Sobolev embedding theorem implies , the same is not true for . Recalling that is a constant, we see that for any
By (A.13), if then the right-hand side must be controlled by , which is clearly not true; for instance, consider increasingly concentrated Gaussians centered at zero. Similarly, in general for , whereas .
We further note that the upper bound (5.6) requires the additional condition that near the boundary . Indeed, letting for some bounded domain , the density of is not bounded away from zero; this makes it difficult to control with . Indeed, denoting by the cumulative distribution function (CDF) of , does not in general guarantee
To see this, let and . As is radially symmetric, without loss of generality we restrict our attention to the projection onto the direction. Then for , and is symmetric about . Consider one-dimensional measures such that their CDFs satisfy on while , on ; note this is possible by prescribing suitable behavior on the interval . Then we have on . From direct calculations one can check that
As is radially symmetric, we can come up with examples of measures in such that their projections satisfy similar estimates. ∎
We now study the behavior of around discrete measures. The -Wasserstein distance is defined by
(5.11) |
We have seen that for any . Similarly, if is a discrete measure with support and is sufficiently small, any optimal transport map should map all the mass of near to . Moreover, for most directions the same is true at the level of projections as well. This allows us to show that within -balls of a discrete measure, metric can be well approximated by .
Theorem 5.5.
Assume is a discrete probability measure: where all masses are positive and all points are distinct. Let . Then there exists only dependent on such that if , we have
(5.12) |
Thus, we have the comparison
(5.13) |
Proof.
Let . We claim that we can find such that
(5.14) |
The desired result (5.12) follows from the claim (5.14). Indeed, setting whenever , as implies .
Thus it remains to prove (5.14). To this end, let be the -transport plan. As we know that for some transport map , which is also the optimal transport map for the quadratic cost, and satisfies
For each , let . For each , define the set of angles where the differs from the 1D-coupling induced by – i.e.
To control , it suffices to control the size of , as
As ,
where we have used that for
Thus by Chebyshev’s inequality
for some . As is also the optimal transport map for the quadratic cost,
As this is precisely our claim (5.14), we conclude the proof. ∎
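The comparison near discrete measures can be explored numerically in the plane. The following sketch is an illustration under stated assumptions and not the construction used in the proof: both measures are uniform on five atoms, the second is obtained by moving each atom of the first by a small displacement, the exact Wasserstein cost is found by brute force over permutations, and the SW integral is replaced by an average over equally spaced directions. The point configuration, the perturbation, and the function names are hypothetical. With the normalized average over directions one expects d * SW^2 / W^2 to be at most 1 and close to 1 for small perturbations.

```python
import itertools

import numpy as np


def w2_squared_uniform(xs, ys):
    """Exact W_2^2 between uniform measures on the rows of xs and ys,
    by brute force over all matchings (only feasible for few atoms)."""
    n = len(xs)
    return min(
        np.mean(np.sum((xs - ys[list(p)]) ** 2, axis=1))
        for p in itertools.permutations(range(n))
    )


def sw2_squared_uniform_2d(xs, ys, num_angles=3600):
    """SW_2^2 in the plane, approximated by averaging the 1D W_2^2 of the
    projections over equally spaced directions in [0, pi)."""
    angles = np.linspace(0.0, np.pi, num_angles, endpoint=False)
    thetas = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    px = np.sort(xs @ thetas.T, axis=0)   # sorted projections, one column per angle
    py = np.sort(ys @ thetas.T, axis=0)
    return np.mean((px - py) ** 2)


# Hypothetical discrete measure and a small perturbation of it.
xs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 2.0]])
vs = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5], [0.5, -1.0], [1.0, 1.0]])
eps = 1e-3
ys = xs + eps * vs

w2_sq = w2_squared_uniform(xs, ys)
sw2_sq = sw2_squared_uniform_2d(xs, ys)
ratio = 2.0 * sw2_sq / w2_sq   # d * SW^2 / W^2 with d = 2

print(w2_sq, sw2_sq, ratio)
```

The ratio falls below 1 only because, for directions nearly orthogonal to the segment joining two atoms, the sorted matching of the projections can differ from the projection of the optimal matching; the set of such directions shrinks with the size of the perturbation.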
6. Statistical properties of the sliced Wasserstein length
In this section we investigate the approximation error in distance between absolutely continuous measures and the empirical measure of their i.i.d. samples, with . The parametric rate of estimation for has already been observed, for instance by Manole, Balakrishnan, and Wasserman [37, Proposition 4] in the form .
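The decay of the estimation error with the sample size can be observed in a small experiment. The sketch below is a rough illustration under assumptions of our own: the measure is uniform on the unit disc in the plane, the unknown distance between the measure and its empirical measure is replaced by the SW distance between two independent empirical measures of the same size (which bounds nothing by itself, but is controlled by the sum of the two estimation errors via the triangle inequality), and the average over directions uses equally spaced angles. All names and parameters are illustrative.

```python
import numpy as np


def sample_unit_disc(n, rng):
    """n i.i.d. points distributed uniformly on the unit disc."""
    r = np.sqrt(rng.uniform(size=n))
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.stack([r * np.cos(phi), r * np.sin(phi)], axis=1)


def sw2_empirical(xs, ys, num_angles=180):
    """SW_2 between two empirical measures with equally many atoms,
    averaging the sorted-matching cost over equally spaced directions."""
    angles = np.linspace(0.0, np.pi, num_angles, endpoint=False)
    thetas = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    px = np.sort(xs @ thetas.T, axis=0)
    py = np.sort(ys @ thetas.T, axis=0)
    return np.sqrt(np.mean((px - py) ** 2))


rng = np.random.default_rng(0)
sizes = [100, 400, 1600, 6400]
errors = [
    sw2_empirical(sample_unit_disc(n, rng), sample_unit_disc(n, rng))
    for n in sizes
]
# If the error decays like n^(-1/2), these rescaled values stay comparable.
scaled = [e * np.sqrt(n) for e, n in zip(errors, sizes)]

for n, e, s in zip(sizes, errors, scaled):
    print(n, e, s)
```

The experiment concerns the SW distance only; it says nothing about the length metric, for which one has to exhibit short curves connecting the two measures.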
The main result of this section is Theorem 6.3 which shows that the corresponding concentration result holds for the distance, namely that
Note that this directly implies with high probability, which is also new, to the best of our knowledge. We note that while proving only requires showing the estimation of one-dimensional Wasserstein distances holds in an integrated form over all projections to lines, showing estimates for requires constructing curves of length at most connecting and . We first provide a geometric intuition as to why this is to be expected. If , then
where we obtain the last equality by choosing such that . Thus
From (2.9) we know that the Radon transform is an isometry from to . We note that the related Sobolev space is a Reproducing Kernel Hilbert Space (RKHS). Heuristically we can view as having an RKHS as a dual at each point . It is important to note that dual metrics of RKHS norms – also known as Maximum Mean Discrepancy (MMD) – can be approximated at parametric rate [61]. Thus it is reasonable that the same holds for the nonlinear analogue, .
We will see that, under suitable assumptions, considering linear interpolation between is sufficient to establish the parametric rate of estimation in . Recall that in (5.10) we established
Take and suppose for a.e. . Write and let and denote the cumulative distribution functions (CDFs) of and , respectively. Then, for each test function we have
as the (weak) derivative of is , and the boundary term from integration by parts vanishes as . Thus from (2.10) we conclude
Thus the key is to uniformly bound relative to , which can decay rapidly near the boundary. This can be done using the relative VC-inequality due to Vapnik and Chervonenkis [64, Theorem 1] (see also [63, Chapter 3]). We state below the version of the relative VC inequality that can be found in [3, Theorem 2.1] and [22, Exercise 3.3]. The theorem provides an upper bound in terms of the shattering number (also known as the growth function or the shattering coefficient) of a class of sets, which quantifies richness or complexity of the class; we refer the readers to [63, Section 2.7] for a precise definition.
Theorem 6.1 (Vapnik and Chervonenkis, Theorem 1 of [64]).
Let and be the empirical measure of i.i.d. samples . Let be a class of measurable subsets of , and let be its shattering number for points. Then
(6.1) |
By considering to be the collection of half-spaces, we can deduce the following uniform concentration result of the empirical CDFs .
Corollary 6.2.
Let be as in Theorem 6.1, and for each let be the respective cumulative distribution functions of . Then
(6.2) |
Proof.
We establish the parametric rate of for measures with finite values of defined by
(6.3) |
where and are respectively the density and the CDF of , and we use the convention . The functional , introduced in [37], is a sliced analogue of the functional introduced by Bobkov and Ledoux [8] for one dimensional measures; in general, is defined as the -density of the absolutely continuous component of , which need not be absolutely continuous with respect to . In the 1D case, finiteness of is necessary and sufficient for to decay at rate [8, Section 5].
Manole, Balakrishnan, and Wasserman established [37, Proposition 4]
for some constant independent of . Theorem 6.3 provides an analogous concentration result for only with the additional assumption that for a.e. ; note that this assumption holds whenever is absolutely continuous with respect to the Lebesgue measure on an affine hyperplane of dimension at least 1.
Theorem 6.3 (Parametric estimation rate of empirical measures in ).
Let be such that for a.e. . Let where are i.i.d. samples of . Then for each , we have
(6.4) |
with probability at least , where is as defined in (6.3).
Proof.
As noted earlier, letting for a.e. we have
where and are the respective CDFs of and . By Corollary 6.2 we have
Choosing , with probability at least we have
∎
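The reason a relative bound on the empirical CDF is needed, rather than a plain uniform one, can be seen in a one-dimensional experiment. The sketch below is illustrative only: it compares the plain sup-deviation of an empirical CDF with a deviation normalized by the scale sqrt(F(1 - F)). That normalization, the uniform distribution, the grid, and the name `relative_cdf_deviation` are assumptions chosen for the illustration and need not match the exact form of Corollary 6.2.

```python
import numpy as np

def relative_cdf_deviation(samples, cdf, grid):
    """Return (plain, relative) sup-deviations of the empirical CDF on a grid."""
    samples = np.sort(np.asarray(samples, dtype=float))
    F = cdf(grid)
    # Empirical CDF F_n(t) = #{i : x_i <= t} / n, evaluated on the grid.
    F_n = np.searchsorted(samples, grid, side="right") / samples.size
    diff = np.abs(F_n - F)
    # Assumed normalization: the standard-deviation scale sqrt(F (1 - F)).
    scale = np.sqrt(F * (1.0 - F))
    return diff.max(), (diff / scale).max()

rng = np.random.default_rng(0)
grid = np.linspace(0.001, 0.999, 999)   # stay strictly inside (0, 1)
uniform_cdf = lambda t: np.clip(t, 0.0, 1.0)
for n in (100, 10_000):
    plain, relative = relative_cdf_deviation(rng.uniform(size=n), uniform_cdf, grid)
    print(n, plain, relative)
```

Since sqrt(F(1 - F)) is at most 1/2, the plain deviation is always at most half the relative one; the relative quantity is the stronger control near the ends of the support, where F(1 - F) is small.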
We now turn to providing practical, intuitive, and geometric conditions for finiteness of . We show that one can uniformly bound the ratio using a Cheeger-type isoperimetric constant of the probability measure , defined in the following way by Bobkov [6].
Definition 6.4 (Cheeger-type isoperimetric constant).
Let . The isoperimetric constant of is defined by
(6.5) |
where the infimum is taken over all Borel sets and is defined by
where is the open -neighborhood of .
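To make the definition concrete in one dimension, one can restrict the infimum to half-lines. The sketch below assumes the usual form of the constant, namely the infimum of the boundary measure of a set divided by the smaller of its measure and the measure of its complement; for a half-line ending at t and a continuous density f with CDF F this ratio is f(t) / min(F(t), 1 - F(t)). Restricting to half-lines can only increase the infimum, so the computed value is an upper bound on the isoperimetric constant. The function name `halfline_ratio` and the two test distributions are hypothetical choices for illustration.

```python
import numpy as np

def halfline_ratio(pdf, cdf, grid):
    """Smallest value of f(t) / min(F(t), 1 - F(t)) over the grid points t."""
    f, F = pdf(grid), cdf(grid)
    denom = np.minimum(F, 1.0 - F)
    mask = denom > 0                     # skip points where a tail underflows
    return np.min(f[mask] / denom[mask])

def logistic_cdf(t, loc=0.0):
    return 1.0 / (1.0 + np.exp(-(t - loc)))

def logistic_pdf(t, loc=0.0):
    s = logistic_cdf(t, loc)
    return s * (1.0 - s)

grid = np.linspace(-20.0, 20.0, 4001)

# Log-concave example: the standard logistic law, where f = F (1 - F).
unimodal = halfline_ratio(logistic_pdf, logistic_cdf, grid)

# Bottleneck example: an equal mixture of two well-separated logistic laws.
bimodal = halfline_ratio(
    lambda t: 0.5 * (logistic_pdf(t, -8.0) + logistic_pdf(t, 8.0)),
    lambda t: 0.5 * (logistic_cdf(t, -8.0) + logistic_cdf(t, 8.0)),
    grid,
)
print(unimodal, bimodal)
```

For the logistic law the ratio equals max(F, 1 - F), so its smallest value is 1/2, attained at the median. For the mixture the ratio at the midpoint is twice the (tiny) density there, which illustrates how a narrow bottleneck drives the constant toward zero.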
Corollary 6.5.
Let be a probability measure with such that and for a.e. . Then
(6.6) |
In particular, letting where are i.i.d. samples of , we have, for each
(6.7) |
with probability at least .
Proof.
Remark 6.6.
The isoperimetric constant quantifies the narrowness of the ‘bottleneck’ of . Note that if the support of is disconnected then . On the other hand, for log-concave Bobkov established a positive lower bound on [6, Theorem 1.2].
Furthermore, is bounded from below by the -Poincaré constant of . Indeed, can be alternatively characterized as the largest constant satisfying the inequality
for all integrable locally Lipschitz with median (while median is nonunique, the statement holds for every median) with respect to the measure ; see for instance the proof of [7, Theorem 3.1]. As
the constant is at least as large as the -Poincaré constant of . Consequently, for any bounded open connected with Lipschitz boundary, the measure satisfies .
Moreover, if satisfy for some , then we have . Indeed, for each Borel set
whereas . Thus
Thus any measure comparable to (bounded above and below by) satisfies , given that is bounded open connected and has a Lipschitz boundary. ∎
Remark 6.7.
Tudor Manole pointed out to us that the -rate concentration bound of Corollary 6.5 is likely not sharp, as the relative VC inequality (Theorem 6.1) may be suboptimal when applied to CDFs.
Indeed, the asymptotically sharp uniform bound on is of order , where is the CDF of and the corresponding empirical CDF; see the recent survey [55, Section 3.1] and references therein. This would lead to an improvement to -rate in Corollary 6.5. As the relative VC inequality allows a convenient uniform bound on empirical CDFs over all , we do not pursue this refinement in this paper. ∎
7. Metric slopes and gradient flows in the sliced Wasserstein space
We examine the consequences of the local geometry of the sliced Wasserstein space on metric slopes and gradient flows. As in the previous sections, we contrast the behaviors of the metric slopes and gradient flows at absolutely continuous and discrete measures.
Since we are dealing with both the SW metric and the induced intrinsic distance , we start by commenting on the relationship between gradient flows with respect to the ambient and the intrinsic metric. In a general metric space the “gradient flows” of an energy are defined as curves of maximal slope, namely the continuous curves that satisfy
(7.1) |
where is the metric derivative (3.3) and is the metric slope defined in (1.6); see [1, Definition 1.3.2] for a precise and more general definition.
Suppose is the length metric induced by ; the definition above allows one to consider curves of maximal slope of in as well. We note that if is a Riemannian manifold isometrically embedded in and is the Euclidean metric, then is the Riemannian distance with respect to the Riemannian metric of the manifold. It is straightforward to see that the gradient flows in the classical sense on the manifold coincide with the curves of maximal slope in both and .
This equivalence is not as clear in full generality for curves of maximal slopes. As , in general and for any absolutely continuous curve . Furthermore,
where the last equality holds by absolute continuity. Consequently, any curve of maximal slope with respect to is a curve of maximal slope in .
However, it is in general unclear exactly when holds. Muratori and Savaré showed the equivalence for approximately -convex functionals [39, Proposition 2.1.6]. On a different note, the weighted energy dissipation (WED) approach to constructing curves of maximal slope, studied by Rossi, Savaré, Segatti, and Stefanelli [53], relies on functionals that only involve metric derivatives and the energy, but not the metric slope, and thus does not distinguish between and . The authors construct solutions of (7.1) with metric slope replaced by its relaxation , provided is a strong upper gradient, which is not the case in general, and in particular is not true for potential energies in the SW space; see Corollary 7.7.
Let us now return to the discussion of metric slopes in the SW space. At an absolutely continuous measure where we have the comparison (see Theorem 5.2)
we formally expect
(7.2) |
On the other hand, at a discrete measure , where we have comparison
we expect
(7.3) |
Of course, the comparison theorems of Section 5 require restrictive conditions on and thus the comparisons of above are formal; rigorously establishing this in generality would be challenging. Hence, we provide rigorous proofs of (7.2) and (7.3) for the potential energy for suitable , at absolutely continuous measures in Section 7.2 and at discrete measures in Section 7.3, respectively. Understanding of the metric slope allows us to show instability of curves of maximal slope in terms of initial data; see Proposition 7.5 and Remark 7.6.
7.1. Formal sliced Wasserstein gradient flows at smooth densities
We begin by formally deriving partial differential equations corresponding to sliced Wasserstein gradient flows, emphasizing that they are of order higher than their Wasserstein counterparts. For this purpose, it is convenient to limit our attention to the space of smooth positive measures , defined by
(7.4) |
Consider and set . Writing , note that the (standard) Wasserstein gradient satisfies
Since the quadratic form characterizes the local metric of the SW space at , formally the sliced Wasserstein gradient flux satisfies
(7.5) |
Suppose there exists some such that for all
(7.6) |
then satisfies (7.5). For simplicity, suppose and thus the inversion formula is valid. Then by (2.13)
satisfies (7.6), and thus by the inversion formula,
By the definition (3.18), if for some potential then . Thus, formally, the gradient flow of in satisfies the equation
(7.7) |
Observe that the order of (7.7) is higher than that of the corresponding Wasserstein gradient flow equation. Namely, each is a differential operator of order , whereas and jointly regularize the function by derivatives. Note that the energy dissipation for (7.7) is formally
7.2. Metric slopes of potential energies at absolutely continuous measures
In the formal computations we have seen that a gradient flow of satisfies
Letting for smooth , we know . Thus along the -gradient flow , we have
Remark 7.1 shows that the dissipation of -gradient flow of is of the same order:
Remark 7.1 (Gradient flows with respect to the norm).
Let be a measure with density, and let us identify with its density. Let be a functional that admits an gradient – i.e. at suitable there exists such that for each with
Assuming is sufficiently smooth, the gradient of with respect to the norm is formally given by
as . Thus the gradient flow of formally satisfies the PDE
and we see that the PDE is precisely of order higher than that of the gradient flow equation; note that the -gradient flow equation has the structure , whereas the Wasserstein gradient flow satisfies an equation formulated in terms of the continuity equation. Furthermore, dissipation of the gradient flow is
For , we note that , and hence -gradient flow of satisfies
∎
Applying Theorem 5.2, we demonstrate that (7.2) holds for potential energy functionals with smooth compactly supported .
Proposition 7.2 (Slope of potential energies at absolutely continuous measures).
Let be smooth and compactly supported. Let be the potential energy functional
Let be an open bounded connected domain containing with , and let be an absolutely continuous probability measure such that
Then
(7.8) |
Proof.
To obtain the upper bound, note that as the Radon inversion (A.11) and duality formula for finite measures (2.3) imply that for any , writing , we have
Furthermore, as is of class ,
and thus
Hence
and we deduce the upper bound
Note that , where is as in Theorem 5.2 with . Thus by the Radon isometry (2.9)
As , and it only remains to prove the lower bound
for satisfying the provided conditions. To do so, define for each
As , and is bounded away from zero on , when is sufficiently small. Furthermore, as , has bounded second moments, and integrating by parts in a sufficiently large ball containing we may deduce , hence . Moreover, has uniformly bounded second moments and converges to narrowly, thus in as . Therefore
As further on , by the comparison theorem at absolutely continuous measures (Theorem 5.2),
∎
7.3. Metric slopes of potential energies at discrete measures
In this section we focus on the equivalence of the Wasserstein and the sliced Wasserstein metric slopes of potential energies at discrete measures.
Given a functional , a metric , a time-step , and a base point let us write
(7.9) |
We denote by the Moreau-Yosida approximation of with respect to the metric and time step
(7.10) |
Existence and uniqueness of the minimizer of (7.9) with in certain cases was discussed in [10]. General existence readily follows from the direct method of the calculus of variations, as we will see in the proof of Lemma 7.4.
We will impose two weak regularity assumptions on , namely lower semicontinuity with respect to and coercivity; we say is coercive if there exists and such that
(7.11) |
It is well-known that the potential energy functional is coercive for instance when the negative part of grows at most quadratically – i.e. for some . Moreover, lower semicontinuity of in implies lower semicontinuity of with respect to the narrow topology, hence with respect to and .
We stress that while we utilize the variational problem (7.10) in this section to characterize the metric slope via the duality formula, we do not study the limiting curves of the minimizing movements scheme.
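As a sanity check on how the duality formula recovers the slope, consider the simplest possible setting: the real line with the Euclidean distance in place of the space of measures, and the quadratic energy $\varphi(x) = x^2/2$. This toy computation is an illustration under those assumptions only; the symbols $\varphi$, $\varphi_\tau$, and $y_\tau$ are introduced here and are not the notation of (7.9) and (7.10).

```latex
\[
\varphi_\tau(x) \;=\; \min_{y\in\mathbb{R}} \Big\{ \tfrac{1}{2}y^2 + \tfrac{1}{2\tau}(x-y)^2 \Big\},
\qquad
y_\tau = \frac{x}{1+\tau},
\qquad
\varphi_\tau(x) = \frac{x^2}{2(1+\tau)} .
\]
\[
\frac{\varphi(x)-\varphi_\tau(x)}{\tau}
\;=\; \frac{1}{\tau}\cdot\frac{x^2}{2}\Big(1-\frac{1}{1+\tau}\Big)
\;=\; \frac{x^2}{2(1+\tau)}
\;\xrightarrow[\tau\downarrow 0]{}\; \frac{x^2}{2}
\;=\; \frac{1}{2}\,|\varphi'(x)|^2 .
\]
```

Here the minimizer $y_\tau$ is obtained from the first-order condition $y + (y-x)/\tau = 0$, and the limit reproduces half the squared slope, in agreement with the form of the duality formula in [1, Lemma 3.1.5].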
The duality formula for the local slope [1, Lemma 3.1.5] in terms of the minimizers of the functional (7.9) along with Theorem 5.5 allows us to establish the following sufficient condition for an energy functional to satisfy at discrete measures.
Lemma 7.3.
Let be coercive and lower semicontinuous with respect to . Additionally, suppose that at each discrete measure , for sufficiently small the functional as defined in (7.9) admits minimizers such that .
Then the slope of at each discrete probability measure with respect to coincides with the slope with respect to – i.e.
(7.12) |
Proof.
Fix . By hypothesis, for sufficiently small we can find minimizers of the JKO functional such that . By coercivity and lower semicontinuity with respect to , we can apply the duality formula for the local slope [1, Lemma 3.1.5] to choose a sequence such that
As , by Theorem 5.5 we have
On the other hand, the Moreau-Yosida approximation satisfies , and thus
Combining the estimates and again using the duality formula for ,
∎
Our last step is to verify that the hypotheses of Lemma 7.3 are satisfied for a general class of potential energy functionals.
Lemma 7.4.
Let be continuous and let . Then, for each discrete measure there exists such that for all , as defined in (7.9) admits a minimizer such that
(7.13) |
Proof.
As is nonnegative, for any fixed and the corresponding sublevel set
is contained in the ball , which is sequentially compact with respect to the narrow convergence of measures by Proposition 2.3. On the other hand, is lower semicontinuous with respect to the narrow convergence: is lower semicontinuous as is continuous and we know from Lemma 2.1 that is also narrowly lower semicontinuous. In conclusion, for each minimizer of the JKO functional exists.
In the remainder of the proof, we denote by a constant only dependent on the dimension such that
(7.14) |
Step 1°. Let be a minimizer of . Let us write
and decompose into
(7.15) |
Here is the (normalized) part of that is far from the support of , whereas is the part of close to for each . Let
Intuitively, is the mass outside the balls of radius about , and is the ‘total misplaced mass’. We want to show that both and are 0 when is sufficiently small. To deduce this, note that , and thus
where is the uniform modulus of continuity of in, say . On the other hand, we claim
(7.16) |
where . Note that this implies
(7.17) |
The choice of radius in (7.15) allows, after squaring, the coefficient of in the left-hand side to blow up as , and may be replaced by for any . From this we deduce that we can find such that for all ,
By definition of , this means has exactly mass in each , thus
Thus it remains to prove (7.16). Without loss of generality we may assume . Otherwise, we approximate by its convolution with a smooth and compactly supported kernel. Then all the quantities involved in (7.17) such as , converge as , hence we can deduce (7.17) in the limit as .
Furthermore, as is a bijection on for a.e. , we have the existence of transport map for a.e. such that
Step 2°. To prove (7.16), we separately consider and – namely
We first deal with the term
Let
As , we deduce and . On we have , and thus
(7.18) |
Step 3°. To deal with the second term , define
The threshold is a lower bound on how far incorrectly assigned mass must travel. Thus
Furthermore, on we have
Using these two properties, we deduce
Let
As by Chebyshev’s inequality we have , and thus
whereas
Thus
Collecting the estimates, we have
Recalling , we combine the above estimate with (7.18) to obtain (7.16). ∎
We are ready to state the main result of this section.
Proposition 7.5 (Slope of potential energy at discrete measures).
Let be continuously differentiable. Let be the potential energy functional
Then the slope of at discrete probability measures with respect to coincides with the slope with respect to – i.e.
(7.19) |
Proof.
We conclude this section by discussing implications of Proposition 7.5. We first note in Remark 7.6 that the curves of maximal slope of the potential energy with respect to are not stable in the initial data.
Remark 7.6 (Lack of stability of sliced Wasserstein gradient flows).
We note that the curves of maximal slopes of with respect to the Wasserstein metric starting at any discrete measure are, up to a multiplicative constant, curves of maximal slopes with respect to (or ) metric. To see this, recall that the -gradient flow of starting at is given by where solves . Thus by Proposition 7.5
whereas for a.e. by Theorem 5.5. Consequently
thus is a curve of maximal slope of with respect to or, equivalently, is a curve of maximal slope of with respect to . Note that we do not claim that is the only such curve of maximal slope, as uniqueness is unknown.
If is semiconvex, the Wasserstein gradient flow of is stable in initial data, for instance by the Evolution Variational Inequality [1, Theorem 11.1.4]. Thus, if , then converge with respect to to the Wasserstein gradient flow starting at , which satisfies
On the other hand, if further , Proposition 7.2 asserts that
for suitable bounded away from zero on a bounded open convex set compactly containing . Thus, if a gradient flow starting at were to exist, it must satisfy
Note that the Wasserstein gradient flow does not satisfy this. So the SW gradient flow (should one exist) is in this case distinct from the Wasserstein gradient flow. This implies that potential energy is not -convex in the SW geometry and furthermore suggests that we cannot hope for stability of sliced Wasserstein gradient flows in initial data in the set of measures, even for smooth potential energies. We believe that equation (7.7) may be more amenable to PDE-based approaches. ∎
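The particle description of the Wasserstein gradient flow of a potential energy started at a discrete measure is easy to simulate. The sketch below assumes the standard form of that flow, in which each atom follows the negative gradient of the potential, and discretizes it with an explicit Euler scheme; the quadratic potential, the step size, and the function names are hypothetical choices. By the remark above, the same particle trajectories, reparametrized in time by a multiplicative constant, give a curve of maximal slope in the sliced Wasserstein metric; the code does not attempt to compute that constant.

```python
import numpy as np

def potential_flow(x0, grad_V, dt=1e-2, n_steps=500):
    """Explicit Euler scheme for the particle system x_i' = -grad V(x_i)."""
    x = np.array(x0, dtype=float)
    trajectory = [x.copy()]
    for _ in range(n_steps):
        x = x - dt * grad_V(x)          # each atom moves independently
        trajectory.append(x.copy())
    return np.stack(trajectory)          # shape (n_steps + 1, n_atoms, d)

# Assumed example: V(x) = |x|^2 / 2, so grad V(x) = x.
V = lambda x: 0.5 * np.sum(x ** 2, axis=-1)
grad_V = lambda x: x

rng = np.random.default_rng(0)
x0 = rng.normal(size=(5, 2))             # five atoms of equal mass in the plane
traj = potential_flow(x0, grad_V)

# Potential energy of the discrete measure along the discretized flow.
energy = V(traj).mean(axis=1)
print(energy[0], energy[-1])
```

For this quadratic potential every Euler step multiplies the positions by (1 - dt), so the energy decreases monotonically along the scheme.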
Proposition 7.5 also implies that the local slope is not lower semicontinuous with respect to or the narrow convergence. This lack of regularity makes it difficult to rigorously study the limit of the minimizing movements scheme as the time-step vanishes, which we do not pursue in this paper.
More precisely, solutions of the minimizing movements scheme for gradient flows converge to the curve of maximal slope with respect to the relaxed slope [1, Chapter 2]; recall that the relaxed slope of in metric space at each is defined by
(7.20) |
Under regularity assumptions such as the -geodesic-convexity of with respect to , we have the equivalence . However, Proposition 7.5 implies that this is not the case for the SW-slopes of potential energies. Namely we show that even for smooth potentials the relaxed slope in the SW metric of the potential energy coincides with rather than .
Corollary 7.7 (Relaxed slope of the potential energy).
Let be continuously differentiable with uniformly bounded derivatives. Denote by the lower semicontinuous envelope of with respect to the narrow topology as defined in (7.20). Then
(7.21) |
Proof.
It is well-known [1, Proposition 10.4.2] that
which is continuous with respect to the narrow convergence when . Fix any . Then, for any narrowly converging sequence ,
thus we conclude by taking infimum over such sequences.
On the other hand, approximating by discrete measures in , and taking a suitable subsequence to ensure , we have by Proposition 7.5
∎
Remark 7.8.
From the proof it is clear that (7.21) also holds if the lower semicontinuous envelope is defined with respect to the topology generated by .
Remark 7.9.
We note that in general is not a strong upper gradient. In particular, consider on a bounded convex domain for some and let with . We claim that fails to be an upper gradient. As all derivatives of are bounded, for small time the path satisfies on , and in particular remains in the space of probability measures. Furthermore, and thus
and . On the other hand,
Thus, choosing sufficiently oscillatory such that
we see which verifies that is not an upper gradient. ∎
Acknowledgements. The authors are grateful to Jun Kitagawa for stimulating discussions, and also to Tudor Manole for pointing us to the literature that led to Remark 6.7. The authors acknowledge the support of the National Science Foundation via the grant DMS-2206069. They are also thankful to the Center for Nonlinear Analysis for its support. SP was also supported by the NSF grant DMS-2106534. The authors would also like to thank the anonymous referees for careful readings and numerous helpful suggestions, which greatly helped improve the exposition of this manuscript.
References
- [1] L. Ambrosio, N. Gigli, and G. Savaré, Gradient flows in metric spaces and in the space of probability measures, Lectures in Mathematics ETH Zürich, Birkhäuser Verlag, Basel, second ed., 2008.
- [2] L. Ambrosio and P. Tilli, Topics on analysis in metric spaces, vol. 25 of Oxford Lecture Series in Mathematics and its Applications, Oxford University Press, Oxford, 2004.
- [3] M. Anthony and J. Shawe-Taylor, A result of Vapnik with applications, Discrete Applied Mathematics, 47 (1993), pp. 207–217.
- [4] Y. Bai, B. Schmitzer, M. Thorpe, and S. Kolouri, Sliced optimal partial transport, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 13681–13690.
- [5] E. Bayraktar and G. Guo, Strong equivalence between metrics of Wasserstein type, Electronic Communications in Probability, 26 (2021), pp. 1 – 13.
- [6] S. G. Bobkov, Isoperimetric and analytic inequalities for log-concave probability measures, The Annals of Probability, 27 (1999), pp. 1903 – 1921.
- [7] S. G. Bobkov and C. Houdré, Isoperimetric constants for product probability measures, The Annals of Probability, 25 (1997), pp. 184 – 205.
- [8] S. G. Bobkov and M. Ledoux, One-dimensional empirical measures, order statistics, and Kantorovich Transport Distances, American Mathematical Society, 2019.
- [9] C. Bonet, P. Berg, N. Courty, F. Septier, L. Drumetz, and M. T. Pham, Spherical sliced-Wasserstein, in The Eleventh International Conference on Learning Representations, 2023.
- [10] C. Bonet, N. Courty, F. Septier, and L. Drumetz, Efficient gradient flows in sliced-Wasserstein space, Transactions on Machine Learning Research, (2022).
- [11] N. Bonneel, J. Rabin, G. Peyré, and H. Pfister, Sliced and Radon Wasserstein barycenters of measures, Journal of Mathematical Imaging and Vision, 51 (2014), pp. 22–45.
- [12] N. Bonnotte, Unidimensional and Evolution Methods for Optimal Transportation, PhD thesis, Université Paris-Sud, 2013.
- [13] C. Borell, Convex measures on locally convex spaces, Arkiv för Matematik, 12 (1974), pp. 239–252.
- [14] C. Borell, Convex set functions in d-space, Periodica Mathematica Hungarica, 6 (1975), pp. 111–136.
- [15] F. Bruhat, Distributions sur un groupe localement compact et applications à l’étude des représentations des groupes -adiques, Bulletin de la Société Mathématique de France, 89 (1961), pp. 43–75.
- [16] D. Burago, Y. Burago, and S. Ivanov, A course in metric geometry, vol. 33 of Graduate Studies in Mathematics, American Mathematical Society, 2001.
- [17] G. Cozzi and F. Santambrogio, Long-time asymptotics of the sliced-Wasserstein flow, arXiv preprint arXiv:2405.06313, (2024).
- [18] B. Dai and U. Seljak, Sliced iterative normalizing flows, arXiv preprint arXiv:2007.00674, (2020).
- [19] C. Dellacherie and P.-A. Meyer, Probabilities and potential, vol. 29 of North-Holland Mathematics Studies, North-Holland Publishing Co., Amsterdam-New York, 1978.
- [20] I. Deshpande, Y.-T. Hu, R. Sun, A. Pyrros, N. Siddiqui, S. Koyejo, Z. Zhao, D. Forsyth, and A. G. Schwing, Max-sliced Wasserstein distance and its use for GANs, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 10648–10656.
- [21] I. Deshpande, Z. Zhang, and A. G. Schwing, Generative modeling using the sliced Wasserstein distance, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
- [22] L. Devroye and G. Lugosi, Combinatorial methods in density estimation, Springer, 2001.
- [23] A. Figalli, The optimal partial transport problem, Archive for Rational Mechanics and Analysis, 195 (2009), pp. 533–560.
- [24] I. M. Gelfand, M. I. Graev, N. Y. Vilenkin, and E. J. Saletan, Generalized functions. Vol. 5: Integral Geometry and Representation Theory, Academic Press, 1966.
- [25] S. Helgason, Integral geometry and Radon transforms, Springer, 1 ed., 2010.
- [26] J. Kitagawa and A. Takatsu, Sliced optimal transport: is it a suitable replacement?, arXiv preprint arXiv:2311.15874, (2023).
- [27] J. Kitagawa and A. Takatsu, Disintegrated optimal transport for metric fiber bundles, arXiv preprint arXiv:2407.01879, (2024).
- [28] S. Kolouri, K. Nadjahi, U. Simsekli, R. Badeau, and G. Rohde, Generalized sliced Wasserstein distances, in Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, eds., vol. 32, Curran Associates, Inc., 2019.
- [29] S. Kolouri, S. R. Park, and G. K. Rohde, The Radon cumulative distribution transform and its application to image classification, IEEE Transactions on Image Processing, 25 (2016), pp. 920–934.
- [30] S. Kolouri, P. E. Pope, C. E. Martin, and G. K. Rohde, Sliced Wasserstein auto-encoders, in International Conference on Learning Representations, 2019.
- [31] C.-Y. Lee, T. Batra, M. H. Baig, and D. Ulbricht, Sliced Wasserstein discrepancy for unsupervised domain adaptation, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
- [32] G. Leoni, A first course in fractional Sobolev spaces, vol. 229 of Graduate Studies in Mathematics, American Mathematical Society, Providence, RI, 2023.
- [33] S. Li and C. Moosmüller, Measure transfer via stochastic slicing and matching, arXiv preprint arXiv:2307.05705, (2023).
- [34] T. Lin, Z. Zheng, E. Chen, M. Cuturi, and M. Jordan, On projection robust optimal transport: Sample complexity and model misspecification, in Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, A. Banerjee and K. Fukumizu, eds., vol. 130 of Proceedings of Machine Learning Research, PMLR, 13–15 Apr 2021, pp. 262–270.
- [35] A. Liutkus, U. Simsekli, S. Majewski, A. Durmus, and F.-R. Stöter, Sliced-Wasserstein flows: Nonparametric generative modeling via optimal transport and diffusions, in Proceedings of the 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov, eds., vol. 97 of Proceedings of Machine Learning Research, PMLR, 09–15 Jun 2019, pp. 4104–4113.
- [36] G. Loeper, Uniqueness of the solution to the Vlasov–Poisson system with bounded density, Journal de Mathématiques Pures et Appliquées, 86 (2006), pp. 68–79.
- [37] T. Manole, S. Balakrishnan, and L. Wasserman, Minimax confidence intervals for the sliced Wasserstein distance, Electronic Journal of Statistics, 16 (2022), pp. 2252 – 2345.
- [38] R. J. McCann, A convexity theory for interacting gases and equilibrium crystals, PhD thesis, Princeton University, 1994.
- [39] M. Muratori and G. Savaré, Gradient flows and evolution variational inequalities in metric spaces. I: Structural properties, Journal of Functional Analysis, 278 (2020), p. 108347.
- [40] K. Nadjahi, Sliced-Wasserstein distance for large-scale machine learning: theory, methodology and extensions, PhD thesis, Institut Polytechnique de Paris, Nov. 2021.
- [41] K. Nadjahi, A. Durmus, L. Chizat, S. Kolouri, S. Shahrampour, and U. Simsekli, Statistical and topological properties of sliced probability divergences, in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, eds., vol. 33, Curran Associates, Inc., 2020, pp. 20802–20812.
- [42] K. Nguyen and N. Ho, Revisiting sliced Wasserstein on images: From vectorization to convolution, in Advances in Neural Information Processing Systems, A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, eds., 2022.
- [43] K. Nguyen, N. Ho, T. Pham, and H. Bui, Distributional sliced-Wasserstein and applications to generative modeling, in International Conference on Learning Representations, 2021.
- [44] S. Nietert, Z. Goldfeld, R. Sadhu, and K. Kato, Statistical, robustness, and computational guarantees for sliced Wasserstein distances, in Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, eds., vol. 35, Curran Associates, Inc., 2022, pp. 28179–28193.
- [45] J. Niles-Weed and P. Rigollet, Estimation of Wasserstein distances in the Spiked Transport Model, Bernoulli, 28 (2022), pp. 2663 – 2688.
- [46] J. L. M. Olea, C. Rush, A. Velez, and J. Wiesel, The out-of-sample prediction error of the square-root-LASSO and related estimators, arXiv preprint arXiv:2211.07608, (2023).
- [47] M. Osborne, On the Schwartz-Bruhat space and the Paley-Wiener theorem for locally compact abelian groups, Journal of Functional Analysis, 19 (1975), pp. 40–49.
- [48] F.-P. Paty and M. Cuturi, Subspace robust Wasserstein distances, in Proceedings of the 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov, eds., vol. 97 of Proceedings of Machine Learning Research, PMLR, 09–15 Jun 2019, pp. 5072–5081.
- [49] R. Peyré, Comparison between $W_2$ distance and $\dot{H}^{-1}$ norm, and localization of Wasserstein distance, ESAIM: COCV, 24 (2018), pp. 1489–1501.
- [50] F. Pitié, A. C. Kokaram, and R. Dahyot, Automated colour grading using colour distribution transfer, Computer Vision and Image Understanding, 107 (2007), pp. 123–137. Special issue on color image processing.
- [51] J. Rabin, G. Peyré, J. Delon, and M. Bernot, Wasserstein barycenter and its application to texture mixing, in Scale Space and Variational Methods in Computer Vision, A. M. Bruckstein, B. M. ter Haar Romeny, A. M. Bronstein, and M. M. Bronstein, eds., Berlin, Heidelberg, 2012, Springer Berlin Heidelberg, pp. 435–446.
- [52] A. G. Ramm, Radon transform on distributions, Proceedings of the Japan Academy, Series A, Mathematical Sciences, 71 (1995), pp. 202 – 206.
- [53] R. Rossi, G. Savaré, A. Segatti, and U. Stefanelli, Weighted energy-dissipation principle for gradient flows in metric spaces, Journal de Mathématiques Pures et Appliquées, 127 (2019), pp. 1–66.
- [54] F. Santambrogio, Optimal transport for applied mathematicians: calculus of variations, PDEs, and modeling, Birkhäuser, 2015.
- [55] S. Sarkar and A. K. Kuchibhotla, Post-selection inference for conformal prediction: Trading off coverage for precision, arXiv preprint arXiv:2304.06158, (2023).
- [56] N. Sauer, On the density of families of sets, Journal of Combinatorial Theory, Series A, 13 (1972), pp. 145–147.
- [57] V. Sharafutdinov, Radon transform on Sobolev spaces, Siberian Mathematical Journal, 50 (2021), pp. 560–580.
- [58] S. Shelah, A combinatorial problem; stability and order for models and theories in infinitary languages., Pacific Journal of Mathematics, 41 (1972), pp. 247 – 261.
- [59] K. T. Smith, D. C. Solmon, and S. L. Wagner, Practical and mathematical aspects of the problem of reconstructing objects from radiographs, Bull. Amer. Math. Soc., 83 (1977), pp. 1227–1270.
- [60] D. C. Solmon, Asymptotic formulas for the dual Radon transform and applications, Mathematische Zeitschrift, 195 (1987), pp. 321–343.
- [61] B. Sriperumbudur, On the optimal estimation of probability measures in weak and strong topologies, Bernoulli, 22 (2016), pp. 1839 – 1893.
- [62] H. Triebel, Theory of function spaces, Modern Birkhäuser Classics, Birkhäuser/Springer Basel AG, Basel, 2010. Reprint of 1983 edition.
- [63] V. N. Vapnik, The nature of statistical learning theory, Springer, 2 ed., 2013.
- [64] V. N. Vapnik and A. Y. Chervonenkis, Ordered risk minimization. I, Automat. Remote Control, 35 (1974), pp. 1226–1235.
- [65] C. Villani, Optimal transport, old and new, vol. 338 of Grundlehren der mathematischen Wissenschaften, Springer-Verlag, Berlin, 2009.
Appendix A Preliminaries on the Radon transform
In this appendix we record some basic properties of the Radon transform in further detail.
Remark A.1 (Radon transform of measures and distributions).
The duality formula (2.2) is used to extend the Radon transform to distributions. For general distributions there are ambiguities, as does not necessarily decay rapidly at infinity even for ; see [25, Chapter 1.5] and [52]. However, for bounded measures, pushforward by the projection map is consistent with the duality formula. To see this, let – i.e. is continuous in and vanishes as . Then, it can be verified that . Thus by Fubini’s theorem
In the second-to-last equality we used the change of variables formula and that for all . As equipped with the total variation norm is the dual of , the Radon transform can be unambiguously extended to .
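The consistency claimed in Remark A.1 can be sketched as the following chain of equalities. The notation is an assumption, since the symbols are not fixed in this appendix: $\pi^\theta(x) = x\cdot\theta$ is the projection onto the line spanned by $\theta$, $\mu$ is a bounded Borel measure on $\mathbb{R}^d$, $\varphi$ is continuous and vanishes at infinity, and the dual transform is taken to be $R^*\varphi(x) = \int_{S^{d-1}} \varphi(\theta, x\cdot\theta)\, d\theta$. A different normalization of the measure on the sphere changes only constants.

```latex
% Sketch under assumed notation: \pi^\theta(x) = x \cdot \theta,
% R^*\varphi(x) = \int_{S^{d-1}} \varphi(\theta, x\cdot\theta)\, d\theta.
% \varphi is bounded and continuous, \mu is a bounded measure, so Fubini applies.
\begin{align*}
\int_{S^{d-1}} \int_{\mathbb{R}} \varphi(\theta,s)\, d(\pi^\theta_{\#}\mu)(s)\, d\theta
 &= \int_{S^{d-1}} \int_{\mathbb{R}^d} \varphi(\theta, x\cdot\theta)\, d\mu(x)\, d\theta
    && \text{(change of variables for pushforwards)} \\
 &= \int_{\mathbb{R}^d} \int_{S^{d-1}} \varphi(\theta, x\cdot\theta)\, d\theta\, d\mu(x)
    && \text{(Fubini)} \\
 &= \int_{\mathbb{R}^d} R^*\varphi(x)\, d\mu(x).
\end{align*}
% Example: for \mu = \delta_{x_0} both sides equal
% \int_{S^{d-1}} \varphi(\theta, x_0\cdot\theta)\, d\theta,
% which corresponds to R\delta_{x_0}(\theta,\cdot) = \delta_{x_0\cdot\theta}.
```

In this reading, the Radon transform of a point mass is simply the family of its one-dimensional projections, which is the picture used for discrete measures.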
Another important property of the Radon transform is its relationship to the Fourier transform.
Proposition A.2 (The Fourier slicing property).
For , let denote the -dimensional Fourier transform from to itself. Then for each
(A.1) |
Moreover, (A.1) holds a.e. for .
Proof.
By definition,
Note that all the equalities above are justified for a.e. when . ∎
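For orientation, the computation behind the Fourier slicing property can be sketched as follows. The conventions are assumptions and may differ from the paper's by constant factors: the Fourier transform is taken unnormalized, $\mathcal{F}_k g(\xi) = \int_{\mathbb{R}^k} e^{-i x\cdot\xi} g(x)\, dx$, and the Radon transform is written as an integral over the hyperplane orthogonal to $\theta$.

```latex
% Assumed conventions (not fixed by the text shown here):
% \mathcal{F}_k g(\xi) = \int_{\mathbb{R}^k} e^{-i x\cdot\xi} g(x)\, dx,
% Rf(\theta,s) = \int_{\theta^\perp} f(s\theta + y)\, dy.
\begin{align*}
\mathcal{F}_1\big(Rf(\theta,\cdot)\big)(\tau)
 &= \int_{\mathbb{R}} e^{-i\tau s} \int_{\theta^\perp} f(s\theta+y)\, dy\, ds \\
 &= \int_{\mathbb{R}^d} e^{-i\tau\, x\cdot\theta} f(x)\, dx
    && (x = s\theta + y,\ s = x\cdot\theta) \\
 &= \mathcal{F}_d f(\tau\theta).
\end{align*}
% Check with the Gaussian f(x) = e^{-|x|^2/2}:
% Rf(\theta,s) = (2\pi)^{(d-1)/2} e^{-s^2/2},
% \mathcal{F}_1(Rf(\theta,\cdot))(\tau) = (2\pi)^{d/2} e^{-\tau^2/2} = \mathcal{F}_d f(\tau\theta).
```

With a normalized Fourier transform the same identity holds up to a dimension-dependent power of $2\pi$.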
From (A.1) it follows that, for
(A.2) |
where on the left-hand side denotes the -dimensional convolution and on the right-hand side denotes the -dimensional convolution. Indeed, as and , we have
Moreover, the same computation is justified when and and is well-defined. In particular, (A.2) holds for and with , which includes the case .
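A minimal sketch of how the convolution identity follows from the slicing property is given below. It assumes that (A.1) reads $\mathcal{F}_1(Rf(\theta,\cdot))(\tau) = \mathcal{F}_d f(\tau\theta)$ with an unnormalized Fourier transform, so that the convolution theorem carries no extra constants. The probabilistic reading in the final comment is an added illustration and not part of the original argument.

```latex
% Assumed convention: \mathcal{F}_k g(\xi) = \int e^{-i x\cdot\xi} g(x)\,dx,
% for which \mathcal{F}_k(g * h) = \mathcal{F}_k g \cdot \mathcal{F}_k h with no extra constant.
\begin{align*}
\mathcal{F}_1\big(R(f*g)(\theta,\cdot)\big)(\tau)
 &= \mathcal{F}_d(f*g)(\tau\theta)
  = \mathcal{F}_d f(\tau\theta)\, \mathcal{F}_d g(\tau\theta) \\
 &= \mathcal{F}_1\big(Rf(\theta,\cdot)\big)(\tau)\, \mathcal{F}_1\big(Rg(\theta,\cdot)\big)(\tau)
  = \mathcal{F}_1\big(Rf(\theta,\cdot) * Rg(\theta,\cdot)\big)(\tau).
\end{align*}
% Probabilistic reading: if X ~ \mu and Y ~ \nu are independent, then
% \theta\cdot(X+Y) = \theta\cdot X + \theta\cdot Y, so
% \pi^\theta_{\#}(\mu * \nu) = (\pi^\theta_{\#}\mu) * (\pi^\theta_{\#}\nu).
```

Injectivity of the one-dimensional Fourier transform then gives the identity for each fixed direction.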
Next we record the smoothing effect of .
Proposition A.3 (Regularizing property of the Radon transform).
Let us denote by the surface area of the -dimensional sphere. Then
(A.3) |
Proof.
By using polar coordinates and Fubini’s Theorem, we see
∎
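The polar-coordinate computation can be sketched as follows, under assumptions that may differ from the paper's normalization: the dual transform integrates over the whole sphere with unnormalized surface measure, and $|S^{d-2}|$ denotes the surface area of the unit sphere in $\mathbb{R}^{d-1}$. Under these assumptions one arrives at $R^*Rf = |S^{d-2}|\, |\cdot|^{-1} * f$.

```latex
% Assumptions: Rf(\theta,s) = \int_{\theta^\perp} f(s\theta+y)\,dy,
% R^*\varphi(x) = \int_{S^{d-1}} \varphi(\theta, x\cdot\theta)\,d\theta (unnormalized surface measure).
% The third equality uses rotation invariance:
% \int_{S^{d-1}} \int_{S^{d-1}\cap\theta^\perp} h(\omega)\,d\omega\,d\theta = |S^{d-2}| \int_{S^{d-1}} h(\omega)\,d\omega.
\begin{align*}
R^*Rf(x)
 &= \int_{S^{d-1}} \int_{\theta^\perp} f(x+y)\, dy\, d\theta \\
 &= \int_{S^{d-1}} \int_0^\infty \int_{S^{d-1}\cap\theta^\perp} f(x+r\omega)\, r^{d-2}\, d\omega\, dr\, d\theta \\
 &= |S^{d-2}| \int_0^\infty \int_{S^{d-1}} f(x+r\omega)\, r^{d-2}\, d\omega\, dr \\
 &= |S^{d-2}| \int_{\mathbb{R}^d} \frac{f(x+z)}{|z|}\, dz
  = |S^{d-2}| \big(|\cdot|^{-1} * f\big)(x).
\end{align*}
% For d = 2 the inner "sphere" S^1 \cap \theta^\perp consists of two points and |S^0| = 2.
```

The kernel $|z|^{-1}$ is locally integrable for $d \geq 2$, which is the source of the smoothing.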
This confirms the intuition that should be more regular than , as for
for some dimension dependent constant . To examine the regularizing property in more detail, we first introduce the Riesz potentials; see [25, Chapter VII] for further details.
Definition A.4 (Riesz potential).
For and we define its Riesz potential by
(A.4) |
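For reference, a standard normalization of the Riesz potential and its Fourier multiplier is sketched below. This is an assumption: the constants in Definition A.4 may be chosen differently, and the Fourier transform is taken unnormalized. The closing comment relates the case $\alpha = d-1$ to the kernel $|\cdot|^{-1}$, assuming (A.3) has the form $R^*Rf = |S^{d-2}|\, |\cdot|^{-1} * f$.

```latex
% Assumed (standard) normalization; the paper's Definition A.4 may differ by constants.
% Fourier convention: \hat f(\xi) = \int_{\mathbb{R}^d} e^{-i x\cdot\xi} f(x)\,dx.
\begin{align*}
I^{\alpha} f(x) &= \frac{1}{\gamma_d(\alpha)} \int_{\mathbb{R}^d} \frac{f(y)}{|x-y|^{d-\alpha}}\, dy,
 \qquad
 \gamma_d(\alpha) = \frac{2^{\alpha} \pi^{d/2}\, \Gamma(\alpha/2)}{\Gamma((d-\alpha)/2)},
 \qquad 0 < \alpha < d, \\
\widehat{I^{\alpha} f}(\xi) &= |\xi|^{-\alpha}\, \hat f(\xi).
\end{align*}
% With \alpha = d-1 and |S^{d-2}| = 2\pi^{(d-1)/2} / \Gamma((d-1)/2):
% |S^{d-2}|\, \gamma_d(d-1) = 2 (2\pi)^{d-1}, so
% |S^{d-2}|\, |\cdot|^{-1} * f = 2 (2\pi)^{d-1} I^{d-1} f,
% whose Fourier transform is 2 (2\pi)^{d-1} |\xi|^{1-d} \hat f(\xi).
```

In this normalization the multiplier $|\xi|^{1-d}$ makes the gain of $d-1$ derivatives explicit.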
Proposition A.5 (Properties of the Riesz potential).
Let . Then
We record some properties of the Radon transform; see [25, Chapter 1] for proofs. The intertwining property between the Laplacian and the Radon transform follows from a direct calculation.
Proposition A.6 (Intertwining property).
For and , we have
(A.8) |
Proof.
The second item is immediate from the definition, as
∎
We give a precise definition of the -order differential operator involved in the inversion formulae for the Radon transform and its dual.
Definition A.7.
Let be defined by
(A.9) |
where is the Hilbert transform in the scalar variable
(A.10) |
Remark A.8.
From the interaction of derivatives and the Hilbert transform with the Fourier transform, we can easily verify that for each
Consequently, for we have
Thus we may extend as a bijective linear isometry from to when . ∎
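The Fourier-side verification mentioned in Remark A.8 can be sketched as follows. The normalizations are assumptions: the one-dimensional Fourier transform is unnormalized, the Hilbert transform has kernel $\frac{1}{\pi}\,\mathrm{p.v.}\,\frac{1}{s}$, and the operator is taken to be $\partial_s^{d-1}$ for odd $d$ and $H\partial_s^{d-1}$ for even $d$. The paper's (A.9) and (A.10) may carry additional constant factors.

```latex
% Assumptions: \hat g(\tau) = \int_{\mathbb{R}} e^{-i\tau s} g(s)\,ds,
% Hg(s) = \frac{1}{\pi}\, \mathrm{p.v.}\! \int_{\mathbb{R}} \frac{g(t)}{s-t}\,dt,
% \Lambda = \partial_s^{d-1} for d odd and \Lambda = H\partial_s^{d-1} for d even.
\begin{align*}
\widehat{\partial_s^{k} g}(\tau) &= (i\tau)^{k}\, \hat g(\tau),
 \qquad
 \widehat{Hg}(\tau) = -i\, \operatorname{sgn}(\tau)\, \hat g(\tau), \\
d \text{ odd:}\quad \widehat{\Lambda g}(\tau) &= (i\tau)^{d-1} \hat g(\tau)
   = (-1)^{(d-1)/2}\, |\tau|^{d-1}\, \hat g(\tau), \\
d \text{ even:}\quad \widehat{\Lambda g}(\tau) &= -i\,\operatorname{sgn}(\tau)\, (i\tau)^{d-1} \hat g(\tau)
   = (-1)^{d/2 + 1}\, |\tau|^{d-1}\, \hat g(\tau).
\end{align*}
% In both cases |\widehat{\Lambda g}(\tau)| = |\tau|^{d-1} |\hat g(\tau)|.
```

Since the modulus of the multiplier is $|\tau|^{d-1}$ in either parity, the operator shifts homogeneous Sobolev weights in the scalar variable by $d-1$, which is the mechanism behind the isometry stated in the remark.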
For Schwartz functions, we have the following inversion formulae [60, Theorem 8.1].
Proposition A.9 (Inversion formula for the Radon and the dual transform).
For all
(A.11) |
where . Similarly, for all
(A.12) |
Here, is defined as in (2.5). Furthermore, with , hence for .
Remark A.10 (Formal derivation of the inversion formula).
Let . Denote by with the Fourier transform in -dimensions. Recalling , define in the Fourier domain by
As is even, the above is well-defined, and we have
By injectivity of the Fourier transform , we have . Thus the key to proving the inversion formulae is to justify the Fourier and the inverse Fourier transforms, which comes down to regularity of the functions.
Formally, we can find an expression for following the argument in Theorem 12.6 of [59]. For any test function ,
where we have used polar coordinates and the Plancherel formula for the Fourier transform. Thus
and we may conclude, for for some constant only depending on the dimension ,
As , this gives us (A.11), and yields (A.12) by applying to both sides. ∎
We end this section with a few results on the Sobolev spaces with attenuated/amplified low frequencies. We first note that embeds continuously in [57, Theorem 5.3].
Theorem A.11 ( and ).
Let and . Then the identity map of extends to a continuous embedding . In other words, consists of tempered distributions.
Sketch of Proof.
In general, for all one can show
(A.13) |
see [57, Theorem 5.3] for further details. Whenever , for each we have . Thus, for any , we can unambiguously define its action on each as a limit of actions by the approximating sequence – i.e.
Furthermore, for each fixed the estimate (A.13) is preserved for , hence we deduce . ∎
Sharafutdinov also established the following supercritical Sobolev-embedding type result [57, Theorem 5.4].
Theorem A.12 ( and continuous functions).
If , , then consists of bounded continuous functions.
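To indicate why a supercritical regularity threshold yields bounded continuous functions, the classical argument for the inhomogeneous space $H^s(\mathbb{R}^d)$ with $s > d/2$ is sketched below. This is only the familiar analogue, not the proof for the spaces of [57], and the norm and Fourier conventions are assumptions.

```latex
% Classical analogue on \mathbb{R}^d (not the spaces of [57]); assumed conventions:
% \hat f(\xi) = \int e^{-i x\cdot\xi} f(x)\,dx,
% \|f\|_{H^s}^2 = \int (1+|\xi|^2)^s |\hat f(\xi)|^2\, d\xi.
\begin{align*}
\sup_{x\in\mathbb{R}^d} |f(x)|
 &\le (2\pi)^{-d} \int_{\mathbb{R}^d} |\hat f(\xi)|\, d\xi \\
 &\le (2\pi)^{-d} \Big( \int_{\mathbb{R}^d} (1+|\xi|^2)^{-s}\, d\xi \Big)^{1/2}
      \Big( \int_{\mathbb{R}^d} (1+|\xi|^2)^{s}\, |\hat f(\xi)|^2\, d\xi \Big)^{1/2}.
\end{align*}
% The first factor is finite exactly when 2s > d; continuity of f follows
% because \hat f is integrable.
```

The second inequality is Cauchy–Schwarz, and integrability of the weight is what the supercritical condition encodes.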
Appendix B Continuity equation in each projection
In this section we provide a proof of Lemma 3.8, which was used to establish Theorem 3.9 (i). Our proof relies on the -norms for – introduced by Sharafutdinov [57] – which generalize the -norms defined in (2.6) and (2.7) to include regularity in the direction . In the simple case where and are nonnegative integers, the -norm is the sum of -norms of derivatives of order in the -variable and derivatives of order in the scalar variable. We provide only the details necessary to prove Lemma 3.8 and refer the interested reader to [57, Section 3] and references therein for further information.
We say that is a spherical harmonic of degree if for a homogeneous polynomial of degree on satisfying . The space of spherical harmonics of degree on has finite dimension , thus we can choose an orthonormal basis for the space. The spherical harmonics of degree are eigenfunctions of the spherical Laplacian , where the sign is chosen to ensure that is positive definite,
We can represent the Fourier transform of each by
(B.1) |
where the coefficients and decay rapidly at infinity. Similarly, for each
(B.2) |
with coefficients satisfying .
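For concreteness, the standard dimension count and eigenvalue relation for the spherical harmonics used in these expansions are recorded below. The symbols $N(d,l)$ and $\Delta_S$ are notation assumed here for illustration and need not match the paper's.

```latex
% Standard facts, stated with assumed notation: N(d,l) is the dimension of the space of
% spherical harmonics of degree l on S^{d-1}, and \Delta_S is the Laplace--Beltrami
% operator on S^{d-1}.
\begin{align*}
N(d,0) &= 1, \qquad
N(d,l) = \frac{(2l+d-2)\,(l+d-3)!}{l!\,(d-2)!} \quad (l \ge 1), \\
-\Delta_S Y &= l(l+d-2)\, Y \quad \text{for every spherical harmonic } Y \text{ of degree } l, \\
d=2:&\quad N(2,l) = 2, \quad Y \in \operatorname{span}\{\cos(l\phi), \sin(l\phi)\}, \quad -\Delta_S Y = l^2\, Y, \\
d=3:&\quad N(3,l) = 2l+1, \quad -\Delta_S Y = l(l+1)\, Y.
\end{align*}
% The eigenvalue follows from writing the harmonic polynomial as P = r^l Y and using
% \Delta = \partial_r^2 + \frac{d-1}{r}\partial_r + \frac{1}{r^2}\Delta_S.
```

In dimension two the expansion in spherical harmonics is therefore an ordinary Fourier series in the angle.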
For any and , the -norm is defined by
(B.3) |
The norm is independent of the choice of the orthonormal basis; see [57, Sections 3-4]. Similarly, for and the -norm is defined by
(B.4) |
The spaces and are respectively the closures of under the corresponding norm.
In fact, the -norm for with is exactly the corresponding -norm; see the proof of [57, Theorem 5.1]. Moreover, when , is continuously embedded in . Hence for for and , is continuously embedded in .
Sharafutdinov showed [57, Theorem 4.3] that the Radon transform extends to a bijective Hilbert space isometry between -spaces. Namely, for all and
(B.5) |
The crucial property we use in this section is the supercritical Sobolev-embedding-type inequality for spaces, which is due to Sharafutdinov [57, Corollary 5.11]. While the proof is omitted, the result readily follows from the analogous arguments for and [57, Corollary 5.5].
Theorem B.1 (Supercritical Sobolev embedding for ).
If , , and then is a continuous embedding.
We now present a proof of Lemma 3.8.
Proof of Lemma 3.8.
Note that by hypothesis, for a.e. and we have and . It suffices to show that for a.e.
(B.6) |
Indeed, linear combinations of test functions of the form are dense in for every compact , hence this implies (3.12); see, for instance, [54, Proposition 4.2 and Exercise 4.23].
Step 1°. We first show that
(B.7) |
To this end, we first note by Proposition A.3 that for some dimension-dependent constant , where (see Definition A.4). Fix any . Then by (B.5) we can find some constant such that
As , . Hence we can choose a sequence such that .
Observe that this implies
(B.8) |
Indeed, the first inequality is a consequence of Theorem B.1 applied to and . The second inequality is in fact an equality up to a constant; from the definition of the norms, the isometry (2.9), and the intertwining property , we have
In the last line we have used the inversion formula (A.12).
Without loss of generality, we can choose , for instance by noting that is dense in in the Schwartz topology and that is a continuous embedding. Thus
Recalling , and applying the duality formulae for bounded measures (2.3) and distributions (2.13),
Let be the compact interval containing the support of . Then
Step 2°. Let , and let . Then is an even function. By a standard approximation argument using smooth cutoff functions in the -variable, together with the continuity equation in the Radon space (B.7), we deduce that for any , , and
(B.9) |
Indeed, as no derivatives in appear in the continuity equation, the passage to the limit is justified by the dominated convergence theorem.
As the integrand in (B.9) with respect to is clearly , for each and the Lebesgue differentiation theorem yields that there exists a null set such that
By separability of for we can find a null set such that the above holds for all and .
As for every one can find with we conclude that for all
∎