Orlicz space regularization of continuous optimal transport problems
Abstract
In this work we analyze regularized optimal transport problems in the so-called Kantorovich form, i.e. given two Radon measures on two compact sets, the aim is to find a transport plan, which is another Radon measure on the product of the sets, that has these two measures as marginals and minimizes the sum of a certain linear cost function and a regularization term. We focus on regularization terms where a Young’s function applied to the (density of the) transport plan is integrated against a product measure. This forces the transport plan to belong to a certain Orlicz space. The predual problem is derived and proofs for strong duality and existence of primal solutions of the regularized problem are presented. Existence of (pre-)dual solutions is shown for the special case of -regularization for . Moreover, two results regarding Γ-convergence are stated: The first is concerned with marginals that do not lie in the appropriate Orlicz space and guarantees Γ-convergence to the original Kantorovich problem when smoothing the marginals. The second result gives convergence of a regularized and discretized problem to the unregularized, continuous problem.
1 Introduction
In this paper we consider the optimal transport problem in the Kantorovich form in the following setting: For compact sets , probability measures on , respectively, and a real-valued continuous cost function we want to solve
(OT) |
where the infimum is taken over all probability measures on which have and as their first and second marginals, respectively. This problem has been well studied, and an overview is given in the recent books [31, 33, 27]. For example, it is known that the problem has a solution and that the support of is contained in the so-called -superdifferential of a -concave function on , see [1, Theorem 1.13]. In the case where is the squared Euclidean distance, this implies that the support of an optimal plan is singular with respect to the Lebesgue measure. This motivates the use of regularization of the continuous problem to obtain approximate solutions that are absolutely continuous w.r.t. given measures. That in turn allows one to apply classical discretization techniques to solve the regularized problem approximately.
A regularization method that received much attention recently is regularization with the negative entropy of , i.e. adding a term with and some measure on [8, 11, 12, 10, 2]. Since is a measure, one has to interpret appropriately: One should think of as the Radon-Nikodym derivative of with respect to the regularization measure , and we will make this distinction explicit in the following. In [10] entropic regularization with respect to the Lebesgue measure is considered and it is shown that the analysis of entropically regularized optimal transport problems naturally takes place in the function space (also called Zygmund space [3]) and that optimal plans for entropic regularization are always in and exist if and only if the marginals are in the spaces . These spaces are an example of so-called Orlicz spaces [28]. This motivates the analysis of regularization in arbitrary Orlicz spaces in this paper. Another motivation to study a more general regularization comes from the fact that regularization with the -norm has been shown to be beneficial in some applications, see [29, 4, 13, 20]. Using the product of the marginals for regularization has been considered in the case of entropic regularization [17, 27, 32]. In this case one can show existence of the dual problem with different techniques. These observations motivate us to consider regularization with Young’s function with respect to general measures. Notable regularizations that our approach covers are regularization with arbitrary and the Tsallis entropy [25].
The notion of Orlicz spaces in the context of convex integral functionals has previously been used in [18], where the author considers a more general setting than the one presented here. More precisely, the spaces used in [18], known as Musielak-Orlicz spaces [24], are a generalization of the Orlicz spaces used here. Existence of both primal and dual optimizers is covered. By choosing with and regularization parameter , a problem similar to the one considered here is recovered. The difference lies in the fact that the cost function is part of the definition of the relevant Musielak-Orlicz spaces in this case and hence, the analysis takes place in different spaces. As the aim of [18] is to weaken the necessary assumptions as much as possible, the overall setting is more abstract and the proofs rely heavily on the author’s work [19]. Here we aim for a self-contained, more elementary treatment of the problem.
During the review process of the present work, we became aware of the preprint [22], which also considers regularization of the Kantorovich problem with Young’s functions. While the underlying domains are chosen to be general complete separable metric spaces, only regularization with respect to the marginal measures is considered and this allows the authors to derive existence of dual solutions independent of the form of the regularization.
Moreover, in [26] regularization with general convex functionals is considered. A duality result similar to our Theorem 3.1 is derived and existence of primal solutions is covered. However, [26, Theorem 2] implies the existence of continuous dual optimizers. As Example 3.7 below demonstrates, this can not be the case in general.
1.1 Notation and problem statement
Let us first fix some notation before we formulate our problem. The spaces of Radon and probability measures on will be denoted by and , respectively. The cone of non-negative Radon measures will be denoted by . With and we denote the spaces of continuous functions and bounded, continuous functions, respectively. The Lebesgue measure will be denoted by and integrals w.r.t. the Lebesgue measure are simply denoted by with the appropriate integration variable .
In the following we will consider compact domains , equipped with finite measures and , respectively. The measures and will be assumed to have full support, i.e. , for . We will denote and . For the space of -integrable functions on with respect to the measure , the symbol will be used. When a measure is absolutely continuous with respect to another measure , written as , the Radon-Nikodym derivative of w.r.t. , i.e. the density of w.r.t. , will be denoted by .
The characteristic function of a set will be denoted by . For two functions and , denote by the outer sum of and . This notation generalizes to measures , on , , respectively, by . For and , the pushforward of by will be denoted as , i.e. the measure on defined by . Most importantly, the pushforward of the coordinate projections , will be used. Note that is the -th marginal of . For real-valued functions we denote by the positive part and by the negative part. Finally, for a function denote by its extension to the real line by infinity, i.e.
The Orlicz space regularized Kantorovich problem of optimal transport considered in this work now reads as
(P) |
where is a so-called Young’s function. Note that the regularization of is employed w.r.t. some product measure . Important cases are and . Note also that is required to be absolutely continuous w.r.t. . This is due to the fact that even for Young’s functions satisfying modest conditions like , as , by e.g. [16, Theorem 5.19] the optimal may have a singular part w.r.t. . That, however, would make the process of regularizing futile. Therefore we will require throughout the paper.
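To make the structure of the objective in (P) concrete, the following minimal Python sketch evaluates the transport term and the regularization term for a discretized plan. The grid, the entropy-type quasi-Young's function, and all names (C, s, mu, gamma) are illustrative assumptions of ours, not notation from the paper, and the marginal constraints are not enforced here.

```python
import numpy as np

# Illustrative discretization of the two domains (taken here to be intervals).
n, m = 50, 60
x = np.linspace(0.0, 1.0, n)
y = np.linspace(0.0, 1.0, m)
C = (x[:, None] - y[None, :]) ** 2            # sample cost: squared distance

# Reference product measure (uniform cell weights for both factors).
mu1 = np.full(n, 1.0 / n)
mu2 = np.full(m, 1.0 / m)
mu = mu1[:, None] * mu2[None, :]

# Density s of the plan w.r.t. the product measure; s = 1 is the independent coupling.
s = np.ones((n, m))

def entropy_young(t):
    """Quasi-Young's function t*log(t) - t + 1 (entropic-type regularization)."""
    t = np.asarray(t, dtype=float)
    out = np.ones_like(t)                     # limit value 1 at t = 0
    pos = t > 0
    out[pos] = t[pos] * np.log(t[pos]) - t[pos] + 1.0
    return out

def regularized_objective(s, C, mu, gamma, Phi=entropy_young):
    """Transport cost of the plan s*mu plus gamma times the integrated Young's function."""
    transport = np.sum(C * s * mu)            # discrete version of the linear cost term
    penalty = np.sum(Phi(s) * mu)             # discrete version of the Orlicz-type penalty
    return transport + gamma * penalty

print(regularized_objective(s, C, mu, gamma=0.1))
```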
Now consider the marginal constraints. As we will see later in Lemmas 2.9 and 3.4, it is necessary that the marginals are absolutely continuous with respect to and that is integrable with respect to . To formulate the marginal constraints in terms of the densities, we recall that if for all -measurable sets it holds that . In terms of densities and integrals, this reads as
Using Fubini’s theorem we get
and hence, the marginal constraints read as
Note that for the integral to exist, the cost function does not need to be continuous and the problem may be formulated for more general cost functions. However, some of the results in this work require to be continuous and for simplicity this shall be assumed throughout the paper.
Let us summarize our assumptions:
Assumption 1.1.
For let be compact domains equipped with finite measures with . Let be a continuous cost function. Let be such that . Finally, for let such that and is integrable with respect to .
1.2 Contribution and Organization
The notions of Young’s functions and Orlicz spaces are introduced in Section 2 alongside some auxiliary results that will be used in the later sections. Section 3 deals with the question of existence of solutions in the framework of Fenchel duality. The first contribution (Theorem 3.4) guarantees existence of solutions of problem (P), which generalizes the corresponding results of [10, 20]. Afterwards, the predual problem is analyzed for the special case for . As a second contribution, Theorem 3.13 gives existence of dual optimizers in , where and . This generalizes the corresponding result of [20]. In Section 4, Γ-convergence of different related problems is considered. First, a continuous, regularized problem with arbitrary marginals is considered. Theorem 4.2 extends [10, Theorem 5.1] and guarantees Γ-convergence to the unregularized problem (OT) when smoothing the marginals. Note that the case of Γ-convergence for fixed marginals in has been treated in [8, Theorem 2.7] for . While Theorem 4.2 is stated only for compact , it allows for marginals not in and a coupled reduction of the regularization and the smoothing parameter. The final contribution (Theorem 4.9) is the proof of Γ-convergence of a discretized and regularized optimal transport problem to the unregularized continuous problem (OT). The result covers both entropic and quadratic regularization as special cases. Some of the results in this paper are direct generalizations of results from previous papers, and their proofs follow the same general strategy.
2 Young’s functions and Orlicz spaces
In this section, some notions about Young’s functions and Orlicz spaces are introduced. For a more detailed introduction, see [3, 28].
Definition 2.1 (Young’s function [3, Definitions IV.8.1, IV.8.11]).
-
i)
Let be increasing and lower semi-continuous, with . Suppose that is neither identically zero nor identically infinite on . Then the function defined by is said to be a Young’s function.
-
ii)
Let . Then, the function defined by is said to be the complementary Young’s function of .
By definition, Young’s functions are convex and for a Young’s function it holds that the complementary Young’s function is also a Young’s function and actually equal to the convex conjugate .
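As a standard illustration (our choice of example, not tied to any specific choice made later in the paper), take the power function $\Phi(t) = t^p / p$ for some $1 < p < \infty$. Its complementary Young's function is computed directly from the convex conjugate:
\[
\Phi^*(s) = \sup_{t \ge 0} \Bigl( s t - \frac{t^p}{p} \Bigr) = \frac{s^q}{q},
\qquad \frac{1}{p} + \frac{1}{q} = 1, \quad s \ge 0,
\]
since the supremum is attained at $t = s^{q-1}$, where the derivative $s - t^{p-1}$ vanishes.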
The negative entropy regularization uses the function which is not a Young’s function, but the function is. Hence, we introduce a slight generalization of the notion of Young’s function to be able to treat this case as well.
Definition 2.2 (Quasi-Young’s functions).
We say that is a quasi-Young’s function if it is convex, lower semi-continuous and is a Young’s function.
Note that convexity of shows that is bounded from below. Moreover, any Young’s function is also a quasi-Young’s function.
Example 2.3.
The function is a quasi-Young’s function because is a Young’s function.
Definition 2.4 (Luxemburg and Orlicz spaces [3, Definition IV.8.10]).
Let be a Young’s function, and . Define the Luxemburg norm of a measurable function w.r.t. as
Then the space
of measurable functions on with finite Luxemburg norm is called the Orlicz space of w.r.t. .
Remark 2.5 ([9, Remark 1]).
The bound in the definition of the Luxemburg norm can be replaced by any . That is, all norms defined by
are equivalent. This can be seen by combining the inequalities and for .
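The Luxemburg norm is defined through a one-dimensional scaling condition, which suggests a simple bisection. The sketch below is our own illustration for a purely discrete measure (finitely many points with weights w) and is not code from the paper; the threshold 1 is the bound from the definition.

```python
import numpy as np

def luxemburg_norm(f, w, Phi, tol=1e-10):
    """Luxemburg norm of f w.r.t. the discrete measure with weights w:
    the smallest lam > 0 such that sum_i w_i * Phi(|f_i| / lam) <= 1."""
    f = np.abs(np.asarray(f, dtype=float))
    w = np.asarray(w, dtype=float)
    if not np.any(f * w):
        return 0.0

    def modular(lam):
        return float(np.sum(w * Phi(f / lam)))

    # Bracket the norm: enlarge lam until feasible, shrink until infeasible.
    hi = 1.0
    while modular(hi) > 1.0:
        hi *= 2.0
    lo = hi
    while modular(lo) <= 1.0 and lo > tol:
        lo /= 2.0
    # Bisection on the monotonically decreasing map lam -> modular(lam).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if modular(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

# Sanity check: Phi(t) = t**2 reproduces the weighted L^2 norm.
w = np.full(4, 0.25)
f = np.array([1.0, -2.0, 0.5, 3.0])
print(luxemburg_norm(f, w, lambda t: t ** 2))
print(np.sqrt(np.sum(w * f ** 2)))   # should agree up to the tolerance
```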
The definition of Orlicz spaces does not immediately allow for the concept of quasi-Young’s functions to be incorporated. However, the following results establish the desired connection.
Lemma 2.6.
Let be a Young’s function and , with and let . Then defined by is a Young’s function and .
A proof of Lemma 2.6 can be found in Appendix A.
Corollary 2.7.
Let be a Young’s function, a quasi-Young’s function with , a measure and with . Then, if and only if
for some .
Proof.
For the other implication, by Remark 2.5 it suffices to show that whenever . However, this holds trivially, since is bounded from below and has finite measure w.r.t. . ∎
Lemma 2.6 and Corollary 2.7 state that the definitions of and are essentially independent of whether is a Young’s function or just a quasi-Young’s function. To simplify notation, and will therefore be used for quasi-Young’s functions as well. Note that for a quasi-Young’s function with , , while in general and are equivalent but not equal. Moreover, for complementary Young’s functions and which are proper, locally integrable and satisfy a certain growth condition, called the -property near infinity, it holds that is canonically isometrically isomorphic to (see, e.g. [14]).
Example 2.8 ( and ).
Let and . The space of measurable functions with is called . By the above corollary, the space of measurable functions with is equal to . The complementary Young’s function of is given by
As satisfies the -property near infinity, the dual space of is thus given by the space of measurable functions that satisfy , which is called .
The following result states that the marginals of a transport plan with density in also have density in the respective space.
Lemma 2.9.
Let be such that , for and set . Let .
If for a quasi-Young’s function , then for with
Proof.
Setting , one observes that , and similarly for . The proof given for [10, Lemma 2.11] then holds with minor modifications. ∎
Remark 2.10.
The above result immediately yields that no optimal solution of problem (P) can exist for any , if for .
Next, a few facts are derived, which will be useful for the analysis of both the primal and dual regularized optimal transport problems and for Γ-convergence.
Lemma 2.11.
Let be a quasi-Young’s function and such that . Then, .
Proof.
Noting that , the proof given for [10, Lemma 2.6] holds with minor adjustments. ∎
For complementary Young’s functions and , we have the following result on conjugating scaled Young’s functions.
Lemma 2.12.
Let be a Young’s function, and .
-
1.
Then for it holds that .
-
2.
Let be a quasi-Young’s function with . Then for it holds that .
Proof.
The assertion follows directly from the definition of the convex conjugate and the Fenchel-Moreau theorem. ∎
The following Lemma 2.13 gives some useful insights into the behavior of the objective function of problem (P) under perturbation of . Recall from Assumption 1.1 that for Young’s functions we required .
Lemma 2.13.
Let , and such that and . Let for some quasi-Young’s function . Then the following statements hold.
-
1.
Let . Then
-
2.
Let . Then
Proof.
Since grows superlinearly at , the recession function (which is independent of ) is infinite for all . By [16, Theorem 5.19], it holds for every sequence which weakly* converges to that
where denotes the unique measure singular to in the Lebesgue decomposition (e.g. [16, Theorem 1.115]) and denotes the corresponding measure with .
-
1.
It suffices to show for every . First note that since , is non-negative and hence . Let now be a bounded, convex closed set containing the origin in its interior. Then by [16, Definition 1.156], for every it holds that
since for all because of . In fact,
and for every .
-
2.
The second statement follows directly, since for it holds that . ∎
3 Existence of Solutions
In this section, we show strong duality for the regularized mass transport (P) using Fenchel duality in the spaces and . The result will then be used to study the question of existence of solutions for both the primal and the dual problem.
Theorem 3.1 (Strong duality).
Let be a quasi-Young’s function and Assumption 1.1 hold. If is integrable w.r.t. , then the predual problem to (P) is
(P*) |
and strong duality holds. Furthermore, if the supremum is finite, (P) possesses a minimizer.
Proof.
Strong duality holds by standard arguments (see e.g. [6, Theorem 4.4.3]) and (assuming finiteness of the supremum) the primal problem (P) possesses a minimizer.
To derive the dual problem, we start from the primal problem, express the equality conditions and as suprema over continuous functions and get
The integrand of the first integral in (P*) is normal, so that it can be conjugated pointwise [30, Theorem 2]. Carrying out the conjugation with the help of Lemma 2.12, one obtains the claim. ∎
Example 3.2.
Remark 3.3.
Theorem 3.1 does not claim that the supremum is attained, i.e. that the predual problem (P*) admits a solution. Moreover, the solutions of (P*) cannot be unique since one can add and subtract constants to and , respectively, without changing the functional value. If, however, is strictly convex, then the functional in (P*) is strictly concave up to such a constant and therefore any solution is uniquely determined by this constant. This is the case, e.g., for functions with superlinear growth at .
3.1 Existence result for the primal problem
The duality result can now be used to address the question of existence of a solution to (P).
Theorem 3.4 (Existence of solutions of (P)).
Let Assumption 1.1 hold. If is integrable w.r.t. and
(3.1) |
we have that problem (P) admits a minimizer if and only if for . In this case, and the minimizer is unique, if is strictly convex.
Proof.
Example 3.5.
Since , condition (3.1) is satisfied e.g. when satisfies either for some or for some . For , , both conditions hold trivially. For the second condition holds, since .
3.2 Existence result for the predual problem with regularization
The question of existence of solutions to the predual problem (P*) proves to be difficult for general Young’s functions. There are results that show existence for the predual problem in the entropic case [10] and in the quadratic case [20] (both considering the penalty w.r.t. the Lebesgue measure), but their proofs are quite different in nature. In the case where (i.e. the product of the marginals) and , one can show dual existence in a different way, see [17] for a proof based on the convergence of the Sinkhorn method and [32] for a sketch of a proof that uses regularity of the cost function. Our methods do not allow us to show existence of solutions for the dual problem in the general case considered up to now. Hence, we consider a special case in the following: We specialize to the case of -regularization for a measure with , . That is, we use the Young’s functions for , and thus, , with , and the predual is actually also the dual. Moreover, . To keep notation clean, will be abbreviated as with slight abuse of notation, and similarly for .
Assumption 3.6.
Let Assumption 1.1 hold. In addition, let for and let the cost function be continuous and fulfill for some constant . Furthermore, let the marginals with satisfy -a.e. for .
Note that the latter condition, i.e. that the densities should be bounded away from zero, can be guaranteed by proper choice of , e.g. gives . Problem (P*) can not be expected to have continuous optimizers , , as the following example demonstrates.
Example 3.7.
For let , , where denotes the Dirac measure at zero, and . Moreover, let . Clearly the minimum in problem (P) is , which is attained for . By Theorem 3.1, the optimal value of problem (P*) is , as well. Furthermore, it holds
If , the supremum () can clearly not be attained. Hence, assume for some , which implies , as . Thus,
where equality holds for . Due to the strict inequality, the supremum in problem (P*) can not be attained for , .
However, with the help of the following result, we can define a variant of the predual problem, for which existence of minimizers can be shown.
Lemma 3.8.
Let Assumption 1.1 hold and let , be such that . Then , .
Proof.
First note that
so that . This implies -a.e. for .
We now consider , separately and start with . Let be such that -a.e. on . If the assertion holds trivially, so we assume . Then,
where the first inequality is justified because, thanks to , we can take the supremum over . With the same argument, we can prove . ∎
Hence, the objective function of problem (P*) is also well defined for functions , , with . We now consider the following variant of the predual problem:
(P†) |
The strategy in this section is as follows.
-
1.
First, show that problem (P†) admits a solution .
-
2.
Then, prove that and possess higher regularity, namely that they are functions in .
The objective function needs to be extended in order to deal with weak-* convergent sequences. To that end, define
where . Note that in the case , the variable is given by . Then, thanks to the normalization of and ,
Of course, is also well defined as a functional on the feasible set of problem (P†) and this functional will be denoted by the same symbol to ease notation. In order to extend to the space of Radon measures, consider for a given measure the Hahn-Jordan decomposition and assume . Then, is finite for as defined above and as in Assumption 3.6. Regarding the negative part, we set , whenever this expression is not properly defined, as and are both positive. Combining this, we always have and define
Remark 3.9.
If , then and -a.e. in . Hence, both functionals denoted by coincide on , which justifies this notation.
The following auxiliary results are generalizations of the corresponding results in [20]. The first lemma covers the coercivity of in . To keep notation simple, from now on we will abbreviate by and similarly for , , where the underlying space will be clear from the context.
Lemma 3.10.
Let Assumption 3.6 hold and suppose that a sequence fulfills
for some . Then, the sequences and are bounded in and , respectively.
Proof.
This proof follows the outline of the proof in [20, Lemma 2.5].
We rewrite as . The positivity of implies
which gives the first assertion. The second one can be seen by making use of with from Assumption 3.6, which yields the estimate
Since is already known to be bounded, the second assertion holds. ∎
The next lemma provides a lower semi-continuity result for w.r.t. weak-* convergence in . Note that the extension of as introduced above is needed here.
Lemma 3.11.
Let Assumption 3.6 hold and a sequence be given such that in and for all . Then it holds that with and
Proof.
The proof given in [20, Lemma 2.6] only uses Lemma 3.10 and fundamental properties of , which also hold for , , and can thus be readily extended. ∎
Now all prerequisites for proving the existence result for problem (P†) are gathered.
Proposition 3.12.
Let Assumption 3.6 hold. Then, problem (P†) admits a solution .
Proof.
In [20, Proposition 2.9] the statement is proven for via the classical direct method of the calculus of variations using only [20, Lemmas 2.7 & 2.8] and Lemmas 3.10 and 3.11, where [20, Lemmas 2.7 & 2.8] are rather technical results holding independently of the choice of and , . Hence, the proof also holds for . ∎
Next, it is shown that , are indeed functions in .
Theorem 3.13.
Let Assumption 3.6 hold and let . Then every optimal solution from Proposition 3.12 satisfies , . Moreover, the negative parts of are bounded and the function has the marginals and .
Proof.
We only consider the negative parts, as for the positive parts we already have by Lemma 3.8.
First note, that for functions , and it holds
where is to be understood as the constant mapping . Let now and fix some . Then, thanks to
(3.2) |
Proposition 3.12 implies that , so that is feasible for problem (P†). Therefore, the optimality of for problem (P†) yields
for all . Owing to the continuous differentiability of , the first integrand converges to -a.e. in .
Moreover, for , the mapping is Lipschitz continuous with Lipschitz constant . Together with (3.2), this gives
(3.3) |
since for all , . As with implies for all (see e.g. [15, Proposition 6.12]), the right-hand side is integrable.
Hence, due to Lebesgue’s dominated convergence theorem, passing to the limit is allowed and yields
Since was arbitrary, the fundamental lemma of the calculus of variations gives
(3.4) |
-a.e. in . Next, define a sequence of functions by
where is the lower bound for from Assumption 3.6. It holds that , which can be seen as follows: Since , clearly . Furthermore, since is compact and , it holds . Consequently, and is also integrable for every .
The functions satisfy and -a.e. in for , so that the monotone convergence theorem gives
Thus, there exists such that
with the threshold from Assumption 3.6. Now assume that -a.e. on a set with . Then
-a.e. in , which is a contradiction to (3.4). Therefore -a.e. in , which even implies that . Concerning , one may argue in exactly the same way to conclude that , too. ∎
Remark 3.14.
While it seems clear that proving a generalization of Theorem 3.13 to general Young’s functions or even quasi-Young’s functions is likely to be complicated or even impossible without making strict assumptions on , not even the existence result for optimizers in can be generalized directly. The problem occurs in Lemma 3.10, which could not be extended to the case of Young’s functions or quasi-Young’s functions in this work. That additional assumptions on might be necessary for Lemma 3.10 to hold can be seen as follows.
In the general case, the function would be defined as
where and in the proof of Lemma 3.10 an inequality of the form
would be necessary. For this to hold, it would suffice to know
for (quasi-)Young’s functions , but this is not true in general as the Young’s function and shows. Indeed for for some one readily computes that and the above mentioned inequality would be
which is not possible for any constant independent of . (We thank the user harfe from MathOverflow, who provided this counterexample in response to our question https://mathoverflow.net/q/333925.) This counterexample indicates that both the growth of at infinity and at zero are important properties for this problem.
9 Γ-Convergence
We return to the general setting, i.e. we only assume that Assumption 1.1 holds, and consider results on Γ-convergence of problems related to problem (P). Recall from, e.g., [7], that a sequence of functionals on a metric space is said to Γ-converge to a functional , written , if
-
(i)
for every sequence with ,
-
(ii)
for every , there is a sequence with and
It is a straightforward consequence of this definition that if Γ-converges to and is a minimizer of for every , then every cluster point of the sequence is a minimizer of . Furthermore, Γ-convergence is stable under perturbations by continuous functionals.
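This consequence can be checked on a textbook toy example, which is ours and not taken from the paper: the functions F_n(x) = x^2 + sin(nx)/n converge uniformly to the continuous function F(x) = x^2, hence Γ-converge to F, and the minimizers of F_n accumulate at the unique minimizer 0 of F. A minimal numerical check in Python:

```python
import numpy as np

# Toy illustration: F_n(x) = x**2 + sin(n*x)/n converges uniformly to F(x) = x**2,
# which is continuous, so F_n Gamma-converges to F; minimizers of F_n should
# therefore accumulate at the minimizer x = 0 of F.
x = np.linspace(-2.0, 2.0, 200001)

def F_n(x, n):
    return x ** 2 + np.sin(n * x) / n

for n in (1, 10, 100, 1000):
    x_min = x[np.argmin(F_n(x, n))]
    print(f"n = {n:5d}: grid minimizer of F_n at x = {x_min:+.4f}")
# The printed minimizers approach 0 as n grows.
```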
4.1 Continuous case
When considering arbitrary measures as marginals, their densities w.r.t. may not be in , and by Theorem 3.4, problem (P) will not admit a solution in that case. One may therefore consider smoothed marginals with , , converging to and , respectively, and show that the regularized problem with these marginals Γ-converges to the unregularized problem with the original marginals.
Let be a smooth, compactly supported, non-negative kernel on with unit integral (w.r.t. the Lebesgue measure) and set
Since the marginals and transport plans will be smoothed by convolutions, the domains and will be extended slightly to take care of boundary effects. Hence, let be compact supersets of , respectively, such that
which is large enough to contain the supports of the smoothed marginals , for (and the width of the convolution kernels will be assumed to be small enough for this in the following). For a function (or measure) on denote by the extension of onto by zero (and analogously for functions and measures on and ). Let be the extension of onto by the Lebesgue measure and . Let be a continuous extension of onto and let
where we set the second integral to , if , and
for , .
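The smoothing by convolution can be pictured with a short numerical sketch; the grid, the triangular kernel, and the spiky sample marginal below are our own illustrative choices, and the convolution is carried out with NumPy rather than in the continuous setting of the results that follow.

```python
import numpy as np

# Grid on an interval slightly larger than the original domain [0, 1], so that the
# support of the smoothed marginal still fits (cf. the enlarged domains above).
h = 1e-3
x = np.arange(-0.2, 1.2 + h, h)

# A "spiky" discrete stand-in for a marginal that is far from having a nice density.
nu = np.zeros_like(x)
nu[np.searchsorted(x, [0.25, 0.5, 0.75])] = 1.0 / (3.0 * h)   # three near-atoms, total mass 1

def mollifier(eps, h):
    """Compactly supported, non-negative kernel with unit integral (triangular shape)."""
    t = np.arange(-eps, eps + h, h)
    k = np.maximum(0.0, 1.0 - np.abs(t) / eps)
    return k / (np.sum(k) * h)

def smooth(nu, eps, h):
    """Convolve the marginal with the mollifier of width eps; the total mass is preserved."""
    return np.convolve(nu, mollifier(eps, h), mode="same") * h

for eps in (0.2, 0.05, 0.01):
    nu_eps = smooth(nu, eps, h)
    print(f"eps = {eps:4.2f}: mass = {np.sum(nu_eps) * h:.4f}, max density = {nu_eps.max():.1f}")
```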
First, we state an auxiliary result ensuring that the marginal constraints are preserved by convolution. For simplicity, we state it for measures on (but we could restrict everything to the respective domains , ). Note that for the expression can – thanks to the smoothness of – be interpreted either as a smooth function or as a measure in with that smooth function as density. Here, we choose to interpret as a measure. A proof is given in Appendix B.
Lemma 4.1.
Let and let with , . Let and . Then for .
Theorem 4.2 (Γ-convergence for smoothed marginals).
Let Assumption 1.1 hold and let denote
Define the smoothed marginals as for . Then it holds that
with respect to weak-* convergence in . Moreover, if are chosen such that
then does not have a finite Γ-limit. More precisely, even for feasible (i.e. with marginals ) it holds that
Proof.
This proof follows the outline of the proof in [10, Theorem 5.1].
-
i)
liminf-condition: Let , then due to being continuous and bounded. Furthermore,
for some only dependent on for any . Thus,
Finally, the pushforward operator is weak-* continuous, which implies that the marginal constraints are preserved under weak-* convergence of , , and (note that ).
-
ii)
limsup-condition: It suffices to consider a recovery sequence for , because for the marginal conditions for and can never be satisfied.
If , then for every and the condition holds trivially. Let therefore be finite and set . Then and as well as for , by Lemma 4.1. Finally, by Young’s convolution inequality,
for some constant . Thus,
(4.1) and the right hand side vanishes for by the assumption on the (coupled) convergence of and . Therefore,
-
iii)
For the second statement, recall from Lemma 2.9 that
By Lemma 2.11, this immediately yields , and the assertion follows.∎
In the case , the assumption is much stronger than necessary for some Young’s functions. For example, consider with or . In those cases, the condition
suffices, as the following result states. For this gives exactly the result in [10, Theorem 5.1].
Corollary 4.3.
Let and let be a quasi-Young’s function such that is monotone. Then it suffices to assume
in Theorem 4.2.
Proof.
A refinement of estimate (4.1) can be given. Using the monotonicity of we obtain
where again the right hand side vanishes by assumption. The assertion follows as in Theorem 4.2. ∎
4.2 Discretized problems
Here we describe a discretization of problem (P) and show two approximation results (Theorems 4.7 and 4.9 below).
We recall the problem data: Marginals and finite measures with for and a continuous and positive cost function on . We have and aim to discretize the problem
(4.2) |
We do a Galerkin discretization with piecewise constant functions. For let be a sequence of finite partitions of such that for every there is an with and such that , , and satisfy the following assumption:
Assumption 4.4.
Let be a Borel set and . Then there exists some such that for all the sets , defined by
satisfy
(4.3) |
for all .
Remark 4.5.
If Assumption 4.4 is fulfilled for , condition (4.3) holds analogously for and , which can be seen as follows: For , let and be defined analogously based on . It holds
Thus,
by (4.3) and the argument holds analogously for .
Assumption 4.4 yields the following auxiliary result about piecewise constant approximation of measures.
Lemma 4.6.
A proof of Lemma 4.6 can be found in Appendix C. We now set
Then by Lemma 4.6
Note that division by zero is not a problem here, since and were assumed to have full support and hence for all . We define the finite-dimensional spaces
The discrete problem is then one in , namely
(PD) |
If we discretize the sought-after density by
we can derive the optimization problem for the unknown coefficients as follows: The objective function is
The first marginal constraint is given by
and this leads to the equation
A similar equation can be derived for the other marginal constraint. With
we arrive at the fully discretized problem
This is a finite-dimensional convex minimization problem with linear constraints. Several general methods could be used to solve this problem numerically, see, e.g. [4, 27, 20, 21]. The following theorem now guarantees that the discretized and regularized problem converges to the continuous regularized problem (P). The proof makes use of Jensen’s inequality (see e.g. [5, Theorem 2.12.19]), which we recall for the convenience of the reader: Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $A \in \mathcal{A}$ with $0 < \mu(A) < \infty$. Let $\varphi$ be convex and let $f$ be a $\mu$-integrable function such that $\varphi \circ f$ is $\mu$-integrable. Then,
\[
\varphi\left(\frac{1}{\mu(A)} \int_A f \,\mathrm{d}\mu\right) \le \frac{1}{\mu(A)} \int_A \varphi(f) \,\mathrm{d}\mu .
\]
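As an illustration of one of the solution methods alluded to above, the following sketch assembles a small instance of the fully discretized problem and treats the entropically regularized case with Sinkhorn iterations. The uniform cells, the cost, the marginals and the stopping rule are our own choices; the sketch does not reproduce the Galerkin construction of this section in detail.

```python
import numpy as np

# Cell midpoints of uniform partitions of two one-dimensional domains.
n1, n2 = 64, 64
x = (np.arange(n1) + 0.5) / n1
y = (np.arange(n2) + 0.5) / n2
C = (x[:, None] - y[None, :]) ** 2            # piecewise constant cost on the cells

# Discretized marginals (cell masses of two simple densities on [0, 1]).
a = 1.0 + 0.5 * np.cos(2 * np.pi * x); a /= a.sum()
b = 1.0 + 0.5 * np.sin(2 * np.pi * y); b /= b.sum()

def sinkhorn(C, a, b, gamma, iters=5000, tol=1e-10):
    """Entropic case: minimize <C, P> plus gamma times an entropy-type penalty subject
    to the marginal constraints; the optimizer has the form diag(u) K diag(v)."""
    K = np.exp(-C / gamma)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u_new = a / (K @ v)
        if np.max(np.abs(u_new - u)) < tol:
            u = u_new
            break
        u = u_new
    return u[:, None] * K * v[None, :]

for gamma in (1.0, 0.1, 0.01):
    P = sinkhorn(C, a, b, gamma)
    print(f"gamma = {gamma:4.2f}: transport cost = {np.sum(C * P):.5f}, "
          f"row marginal error = {np.abs(P.sum(axis=1) - a).max():.1e}")
# As gamma decreases, the transport cost approaches the unregularized optimal value,
# in line with the Gamma-convergence results discussed below.
```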
Theorem 4.7.
Let Assumptions 1.1 and 4.4 hold. Let , be a quasi-Young’s function, and let . Then the minimization problem (PD) Γ-converges to problem (4.2) w.r.t. weak-* convergence in as .
Proof.
We define, via
and
Given an arbitrary , we now check the two conditions for Γ-convergence.
-
i)
liminf-condition: Let such that .
If , pass to a subsequence (denoted by the same symbol) with finite values . Because , we have
and moreover, by Lemma 2.13.
If , assume for a contradiction that . Pass to a subsequence (not renamed) with .
Since, and, by Lemma 4.6, , and , we see that always satisfies the marginal constraints and positivity. Hence, can not occur due to violation of these constraints.
So, if satisfies the marginals but or with , then it holds by Lemma 2.13, which again is a contradiction.
Thus, in every case.
-
ii)
limsup-condition: If , the condition is trivially fulfilled for . Hence, consider and define by
Note that by assumption and therefore by definition. One easily sees that
i.e., . In particular, by Lemma 4.6.
It remains to show . As before, we have
Moreover, we get from Jensen’s inequality that
With this we obtain
which shows the limsup-condition.∎
Γ-convergence for simultaneously decreasing the regularization parameter and refining the discretization proves to be a harder problem. The following example shows that the convergence rate of must be linked to the convergence rate of the discretization.
Example 4.8.
Let , the Dirac measure at zero and the Lebesgue measure on . Then clearly is the only feasible and thus the optimal transport plan. Let be a sequence, which is to be chosen.
We consider and have the discretized optimal plan
However, it holds that
and hence, the condition can not be fulfilled if is such that (which holds, e.g., for ).
Let now be the Dirac measure at zero. The discretized optimal plan now is
and hence,
Thus, may be chosen arbitrarily in this case.
Using the insights of Example 4.8, the desired result can be formulated.
Theorem 4.9.
Under the assumptions of Theorem 4.7, let be a positive sequence with for all and let be a sequence converging to zero such that
(4.4) |
Then the minimization problem
Γ-converges to (OT) w.r.t. weak-* convergence in as .
Proof.
Let be defined as in the proof of Theorem 4.7 with and let be defined via
Given an arbitrary , we now check the two conditions for Γ-convergence.
-
i)
liminf-condition: Let be any measure and be such that .
The case can be treated similarly to the proof of Theorem 4.7, with the difference that we do not have to consider the case . For we get
Moreover, since is bounded from below, we can extract a subsequence such that is bounded from above and below and obtain
which proves .
-
ii)
limsup-condition: For we have nothing to prove.
If , define as in the proof of Theorem 4.7. Then, the marginal constraints are satisfied and . Hence and it remains to show that
Using and monotonicity of we get
(4.5) Hence,
as desired.∎
Corollary 4.10.
Theorems 4.7 and 4.9 remain true if instead of a continuous cost function , a sequence of step functions , constant on the sets and converging to uniformly, is used.
Proof.
Rewrite the integral as the dual pairing , where converges strongly and converges weakly-*. Because the variation norm is bounded ([23, Corollary 2.6.10]), can be seen by standard arguments, which yields the assertion. ∎
Similarly to Corollary 4.3, for monotone the assumption on can be weakened.
Corollary 4.11.
Let be a quasi-Young’s function such that is monotone. Then it suffices to assume
instead of condition (4.4) in Theorem 4.9.
Proof.
5 Conclusion
Employing Orlicz space regularization in the optimal transport problem allows us to generalize the existence results of [10, 20] for both the primal and the predual problem, and under mild assumptions, the results hold for regularization w.r.t. product measures . More precisely, primal solutions exist if and only if the marginals are functions in the appropriate Orlicz spaces, and existence of optimizers in for the predual problem has been shown for the special case , .
A combined regularization and smoothing approach leads to a family of well-posed approximations that Γ-converge to the original Kantorovich formulation if the regularization and smoothing parameters are coupled in an appropriate way. This gives a generalization of the corresponding result [10, Theorem 5.1] for . Similarly, a combined regularization and discretization approach leads to another family of approximations. It could be proven that, again, Γ-convergence is guaranteed if the regularization parameter and the discretization fineness are coupled in an appropriate way.
Existence of solutions of the dual problem for general (quasi-)Young’s functions has been considered in a different framework in [18]. Still, future work might investigate whether the result can also be achieved by the approach considered here. Moreover, numerical methods for solving the regularized problem (P) have not been discussed here.
Acknowledgments
The authors would like to thank two anonymous reviewers for their suggestions regarding the presentation and valuable comments, which also helped to close some gaps in the argumentation.
Appendix
Appendix A Proof of Lemma 2.6
Proof.
Let be as in Definition 2.1. Then with it holds that and hence, is a Young’s function.
To show it suffices to show that for any
If , then trivially. Let therefore , i.e. there is a such that
where . Let and write
Since and is convex, for every it holds that and we obtain an upper bound for the first integral by
for some constant . With the same argument,
Finally, since , the same estimate holds for the second integral. Combining the estimates for the three integrals one obtains
which is less than or equal to for small enough. ∎
Appendix B Proof of Lemma 4.1
Proof.
We only treat the case (the case being analogous). Recalling and using Fubini’s Theorem we get
where the third equality follows directly from the definition of the Lebesgue integral via simple functions. ∎
Appendix C Proof of Lemma 4.6
Proof.
Let be a Borel set, and , as in Assumption 4.4. Then for all it holds
and similarly for . In combination with (4.3) this yields
for large enough. Using an analogous argument for a lower bound, we get
which yields the first assertion. An analogous argument proves the result for . ∎
References
- [1] L. Ambrosio and N. Gigli. A user’s guide to optimal transport. In Modelling and Optimisation of Flows on Networks, pages 1–155. Springer, 2013. doi:10.1007/978-3-642-32160-3_1.
- [2] J.-D. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyré. Iterative Bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2):A1111–A1138, 2015. doi:10.1137/141000439.
- [3] C. Bennett and R. Sharpley. Interpolation of Operators, volume 129 of Pure and Applied Mathematics. Academic Press, Inc., Boston, MA, 1988. doi:10.1016/S0079-8169(13)62909-8.
- [4] M. Blondel, V. Seguy, and A. Rolet. Smooth and sparse optimal transport. In A. Storkey and F. Perez-Cruz, editors, Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, volume 84 of Proceedings of Machine Learning Research, pages 880–889. PMLR, 09–11 Apr 2018.
- [5] V. I. Bogachev. Measure theory. Springer, Berlin New York, 2007.
- [6] J. Borwein and Q. J. Zhu. Techniques of variational analysis. Springer, New York, 2005.
- [7] A. Braides. Γ-convergence for Beginners, volume 22 of Oxford Lecture Series in Mathematics and its Applications. Oxford University Press, Oxford, 2002. doi:10.1093/acprof:oso/9780198507840.001.0001.
- [8] G. Carlier, V. Duval, G. Peyré, and B. Schmitzer. Convergence of entropic schemes for optimal transport and gradient flows. SIAM Journal on Mathematical Analysis, 49, 12 2015. doi:10.1137/15M1050264.
- [9] A. Caruso. Two properties of norms in Orlicz spaces. Le Matematiche, 56(1):183–194, 2001.
- [10] C. Clason, D. A. Lorenz, H. Mahler, and B. Wirth. Entropic regularization of continuous optimal transport problems. Journal of Mathematical Analysis and Applications, 494(1):124432, Feb. 2021. doi:10.1016/j.jmaa.2020.124432.
- [11] M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transportation distances. Advances in Neural Information Processing Systems, 26, 06 2013.
- [12] M. Cuturi, G. Peyré, and A. Rolet. A smoothed dual approach for variational Wasserstein problems. SIAM Journal on Imaging Sciences, 9, 03 2015. doi:10.1137/15M1032600.
- [13] A. Dessein, N. Papadakis, and J.-L. Rouas. Regularized optimal transport and the rot mover’s distance. The Journal of Machine Learning Research, 19(1):590–642, 2018.
- [14] L. Diening, P. Harjulehto, P. Hästö, and M. Růžička. Lebesgue and Sobolev Spaces with Variable Exponents, volume 2017 of Lecture Notes in Mathematics. Springer, 4 2011. doi:10.1007/978-3-642-18363-8.
- [15] G. B. Folland. Real analysis : modern techniques and their applications. Wiley, New York, 1999.
- [16] I. Fonseca and G. Leoni. Modern Methods in the Calculus of Variations: L^p Spaces. Springer Monographs in Mathematics. Springer, 2007. doi:10.1007/978-0-387-69006-3.
- [17] A. Genevay. Entropy-regularized optimal transport for machine learning. PhD thesis, Université Paris Dauphine, 2019.
- [18] C. Léonard. Minimization of entropy functionals. Journal of Mathematical Analysis and Applications, 346(1):183 – 204, 2008. doi:10.1016/j.jmaa.2008.04.048.
- [19] C. Léonard. Convex minimization problems with weak constraint qualifications. Journal of Convex Analysis, 17(1):321–348, 2010.
- [20] D. Lorenz, P. Manns, and C. Meyer. Quadratically regularized optimal transport. Applied Mathematics & Optimization, 09 2019. doi:10.1007/s00245-019-09614-w.
- [21] D. A. Lorenz and H. Mahler. Orlicz-space regularization for optimal transport and algorithms for quadratic regularization, 2019. arXiv:1909.06082.
- [22] S. D. Marino and A. Gerolin. Optimal transport losses and Sinkhorn algorithm with general convex regularization, 2020. arXiv:2007.00976.
- [23] R. E. Megginson. An Introduction to Banach Space Theory (Graduate Texts in Mathematics). Springer, 1998. doi:10.1007/978-1-4612-0603-3.
- [24] J. Musielak. Orlicz Spaces and Modular Spaces. Springer Berlin Heidelberg, 1983. doi:10.1007/bfb0072210.
- [25] B. Muzellec, R. Nock, G. Patrini, and F. Nielsen. Tsallis regularized optimal transport and ecological inference. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1), Feb. 2017. URL: https://ojs.aaai.org/index.php/AAAI/article/view/10854.
- [26] F.-P. Paty and M. Cuturi. Regularized optimal transport is ground cost adversarial. In H. Daumé III and A. Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 7532–7542. PMLR, 13–18 Jul 2020. URL: http://proceedings.mlr.press/v119/paty20a.html.
- [27] G. Peyré and M. Cuturi. Computational optimal transport. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019. doi:10.1561/2200000073.
- [28] M. M. Rao and Z. D. Ren. Theory of Orlicz Spaces. Pure and Applied Mathematics. Dekker, 1991.
- [29] L. Roberts, L. Razoumov, L. Su, and Y. Wang. Gini-regularized optimal transport with an application to spatio-temporal forecasting, 2017. arXiv:1712.02512.
- [30] R. T. Rockafellar. Integrals which are convex functionals. Pacific J. Math., 24:525–539, 1968. doi:10.2140/pjm.1968.24.525.
- [31] F. Santambrogio. Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling (Progress in Nonlinear Differential Equations and Their Applications). Birkhäuser, 2015. doi:10.1007/978-3-319-20828-2.
- [32] F.-X. Vialard. An elementary introduction to entropic regularization and proximal methods for numerical optimal transport. Lecture, May 2019. URL: https://hal.archives-ouvertes.fr/hal-02303456.
- [33] C. Villani. Optimal Transport. Old and New, volume 338 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, Berlin, 2009. doi:10.1007/978-3-540-71050-9.