Monotonicity in Quadratically Regularized Linear Programs
Abstract
In optimal transport, quadratic regularization is a sparse alternative to entropic regularization: the solution measure tends to have small support. Computational experience suggests that the support decreases monotonically to the unregularized counterpart as the regularization parameter is relaxed. We find it useful to investigate this monotonicity more abstractly for linear programs over polytopes, regularized with the squared norm. Here, monotonicity can be stated as an invariance property of the curve mapping the regularization parameter to the solution: once the curve enters a face of the polytope, does it remain in that face forever? We show that this invariance is equivalent to a geometric property of the polytope, namely that each face contains the minimum norm point of its affine hull. Returning to the optimal transport problem and its associated Birkhoff polytope, we verify this property for low dimensions, but show that it fails for dimension $d = n^2 \geq 25$, i.e., for marginals with $n \geq 5$ points. As a consequence, the conjectured monotonicity of the support fails in general, even if experiments suggest that monotonicity holds for many cost matrices.
Keywords Linear Program, Quadratic Regularization, Optimal Transport, Sparsity
AMS 2020 Subject Classification 49N10; 49N05; 90C25
1 Introduction
Let $P \subset \mathbb{R}^d$ be a polytope and $c \in \mathbb{R}^d$ a cost vector. The regularized linear program

$$\min\ \langle c, x \rangle \quad \text{s.t.} \quad x \in P,\ \|x\| \leq \delta \tag{1}$$

has a unique solution $x(\delta)$ as long as $\delta$ belongs to a certain interval $[\delta_{\min}, \delta_{\max}]$, hence we can consider the curve $\delta \mapsto x(\delta)$ which travels across various faces of $P$ as $\delta$ increases (i.e., as the regularizing constraint relaxes). We are interested in the following invariance property: once $x(\delta)$ enters a given face, does it ever leave that face? (Cf. Figure 1.) In the optimal transport applications that motivate our study (see below), this property corresponds to the monotonicity of the optimal support wrt. the regularization strength. Our abstract result, Theorem 3.2, geometrically characterizes all polytopes $P$ such that the invariance property holds (for any cost $c$). We show that this property holds for the set of probability measures (unit simplex), but fails for the set of transport plans (Birkhoff polytope) when the marginals have $n \geq 5$ points. As a consequence—which may be surprising given numerical experience—the optimal support in quadratically regularized optimal transport problems is not always monotone.
1.1 Motivation
This study is motivated by optimal transport and related minimization problems over probability measures. In its simplest form, the transport problem between probability measures $\mu$ and $\nu$ is

$$\inf_{\pi \in \Pi(\mu,\nu)} \int c \, d\pi \tag{OT}$$

where $c$ is a given “cost” function and $\Pi(\mu,\nu)$ denotes the set of couplings; i.e., joint probability measures with marginals $\mu$ and $\nu$. See [34] for a detailed discussion. In many applications such as machine learning or statistics (see [24, 31] for surveys), the marginals encode observed data points $x_1, \dots, x_n$ and $y_1, \dots, y_n$ which are represented by their empirical measures $\mu = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}$ and $\nu = \frac{1}{n} \sum_{j=1}^{n} \delta_{y_j}$. Denoting by $C = (c(x_i, y_j))_{i,j}$ the resulting cost matrix, (OT) then becomes a linear program

$$\min_{x \in P}\ \langle C, x \rangle \tag{LP}$$
where the polytope $P$ is (up to a constant factor) the Birkhoff polytope of doubly stochastic matrices of size $n$. For different choices of polytope, (LP) includes other problems of recent interest, such as multi-marginal optimal transport and Wasserstein barycenters [2] or adapted Wasserstein distances [3]. The optimal transport problem is computationally costly when $n$ is large. The impactful paper [13] proposed to regularize (OT) by penalizing with the Kullback–Leibler divergence (entropy). Then, solutions can be computed using the Sinkhorn–Knopp algorithm, which has led to an explosion of high-dimensional applications (see [32]). More generally, [14] introduced regularized optimal transport with regularization by a divergence. Different divergences give rise to different properties of the solution. Entropic regularization always leads to couplings whose support contains all data pairs $(x_i, y_j)$, even though (OT) typically has a sparse solution. In some applications that is undesirable; for instance, it may correspond to blurrier images in an image processing task [6]. The second-most prominent regularization is the $\chi^2$-divergence or, equivalently, the squared $L^2$ norm, as proposed in [6]. (See also [17] for a similar formulation of minimum-cost flow problems, and the predecessors referenced therein. Our model with a general polytope includes such problems, and many others.) This regularization gives rise to sparse solutions. In the Eulerian formulation of [14], this is exactly our problem (1) with $P$ being the (scaled) Birkhoff polytope. Alternately, the problem can be stated in Lagrangian form as in [6], making the squared-norm penalty explicit:
$$\inf_{\pi \in \Pi(\mu,\nu)} \int c \, d\pi + \frac{\varepsilon}{2} \left\| \frac{d\pi}{d(\mu \otimes \nu)} \right\|_{L^2(\mu \otimes \nu)}^2 \tag{QOT}$$

where $\frac{d\pi}{d(\mu \otimes \nu)}$ denotes the density of $\pi$ wrt. the product measure $\mu \otimes \nu$ and the regularization strength is now parameterized by $\varepsilon > 0$. In the general setting, this corresponds to

$$\min_{x \in P}\ \langle c, x \rangle + \frac{\varepsilon}{2} \|x\|^2 \tag{2}$$

which has a unique solution for any $\varepsilon > 0$. The curves of solutions to (1) and (2), respectively, coincide up to a simple reparametrization.
In optimal transport, the regularized problem is often solved to approximate the linear problem (OT). The latter has a generically unique (see [12]) solution $\pi_*$, which is recovered in the limit of vanishing regularization ($\varepsilon \to 0$): $\pi_\varepsilon \to \pi_*$. (For problems with non-unique solution, the limit recovers the minimum-norm solution.) In particular, the support of $\pi_\varepsilon$ converges to the support of $\pi_*$, which is generically sparse ($n$ out of $n^2$ possible points, again by [12]). In numerous experiments, it has been observed not only that the support of $\pi_\varepsilon$ is sparse when the regularization is weak, but also that it monotonically decreases to the support of $\pi_*$ (e.g., [6]). If this monotonicity holds, then in particular $\operatorname{supp}(\pi_*) \subseteq \operatorname{supp}(\pi_\varepsilon)$, meaning that $\operatorname{supp}(\pi_\varepsilon)$ can be used as a multivariate confidence band for the (unknown) solution $\pi_*$.
1.2 Summary of Results
When $P$ is a set of measures, the support of $x \in P$ can be defined as usual. When $P$ is a general polytope, the set of vertices of the minimal face containing $x$ yields a similar notion. With that in mind, we can ask whether the support of the solution is monotone wrt. the strength of regularization. (The two notions of support yield the same result when $P$ is the simplex or the Birkhoff polytope; cf. Lemma 4.2.) Geometrically, this corresponds to the aforementioned invariance property: if $x(\delta) \in F$ for some face $F$, does it follow that $x(\delta') \in F$ for all $\delta' \geq \delta$? The answer may of course depend on the cost $c$; we call $P$ monotone if the invariance holds for any $c$.
We show that monotonicity can be characterized by the geometry of $P$. Namely, monotonicity of $P$ is equivalent to two properties: for any proper face $F$, the minimum-norm point of the affine hull of $F$ must lie in $F$, and moreover, the minimum-norm point of the affine hull of $P$ must lie in the relative interior $\operatorname{ri} P$. See Theorem 3.2. Once the right point of view is taken, the proof is quite elementary. An example satisfying both properties, and hence monotonicity, is the unit simplex, for any dimension. For that choice of polytope, the solution is a sparse soft-min of $c$, converging to the hard minimum in the unregularized limit. On the other hand, we show that the Birkhoff polytope violates the first condition whenever the marginals have $n \geq 5$ data points (meaning that $d = n^2 \geq 25$). As a result, the optimal support in quadratically regularized optimal transport problems is not always monotone, even if it appears so in numerous experiments. In fact, we exhibit a particularly egregious failure of monotonicity where the support of the limiting (unregularized) optimal transport plan is not contained in the support of the regularized solution for a reasonably large regularization parameter. Our counterexample is constructed by hand, based on theoretical considerations. Our numerical experiments using random cost matrices have failed to locate counterexamples, suggesting that there are, in some sense, “few” faces violating our condition on the minimum-norm point. (See also the proof of Lemma 4.10.) This is an interesting problem for further study in combinatorics, where the Birkhoff polytope has remained an active topic even after decades of research (e.g., [30] and the references therein).
2 Preliminaries
This section collects notation and well-known or straightforward results for ease of reference. Let $\langle \cdot, \cdot \rangle$ be an inner product on $\mathbb{R}^d$ and $\| \cdot \|$ the induced norm. Let $P \subset \mathbb{R}^d$ be a polytope; i.e., the convex hull of finitely many points. The minimal set of such points, called vertices or extreme points, is denoted $\operatorname{ext}(P)$. A face of $P$ is a subset $F \subset P$ such that any open line segment $(a,b) \subset P$ with $(a,b) \cap F \neq \emptyset$ satisfies $\operatorname{cl}(a,b) \subset F$, where $\operatorname{cl}$ denotes closure. Alternately, a nonempty face of $P$ is the intersection of $P$ with a tangent hyperplane $H$. Here a hyperplane $H$ is called tangent if $P \cap H \neq \emptyset$ and $P$ is contained in one of the closed halfspaces bounded by $H$. We denote by $\operatorname{ri} S$ and $\operatorname{rb} S$ the relative interior and relative boundary of a set $S$, and by $\operatorname{aff} S$ its affine span. See [8] for background on polytopes and related notions.
Next, we introduce the regularized linear program and the interval of parameters where the set of feasible points is nonempty and the constraint is binding.
Lemma 2.1 (Eulerian Formulation).
Denote $\delta_{\min} := \min \{ \|x\| : x \in P \}$ and $\delta_{\max} := \min \{ \|x\| : x \in \arg\min_{x' \in P} \langle c, x' \rangle \}$. Then for each $\delta \in [\delta_{\min}, \delta_{\max}]$, the problem

$$x(\delta) := \arg\min \{ \langle c, x \rangle : x \in P,\ \|x\| \leq \delta \} \tag{3}$$

has a unique solution. Moreover, $\delta \mapsto x(\delta)$ is continuous and $\|x(\delta)\| = \delta$. In particular, $\{x \in P : \|x\| \leq \delta_{\min}\}$ is the singleton $\{x(\delta_{\min})\}$, and $x(\delta_{\max})$ is the minimum-norm solution of $\min_{x \in P} \langle c, x \rangle$.
Proof.
The identity holds as is increasing in . Fix . Let and . If , then for sufficiently small . By optimality it follows that , meaning that . Thus, for , we have . In particular, the set is contained in the sphere . As the set is also convex, it must be a singleton by the strict convexity of . Continuity of is straightforward.
It remains to deal with the boundary cases. For , we note that is trivially ruled out, hence we can conclude as above. Clearly . Define as the minimum-norm solution of , which is unique by strict convexity. Clearly . Thus when , we must have for any . Continuity and are again straightforward. ∎
The next lemma recalls the standard projection theorem (e.g., [7, Theorem 5.2]).
Lemma 2.2 (Projection).
Let $A \subset \mathbb{R}^d$ be a nonempty closed convex set. For every $z \in \mathbb{R}^d$ there is a unique point $\Pi_A(z) \in A$ minimizing $\|z - a\|$ over $a \in A$, and it is characterized by the variational inequality $\langle z - \Pi_A(z), a - \Pi_A(z) \rangle \leq 0$ for all $a \in A$.
Below, we often use $\Pi_A(0)$ as a convenient notation for the minimum-norm point of $A$. In the Lagrangian formulation of our regularized linear program, the solution can be expressed as a projection onto $P$ as follows.
Lemma 2.3 (Lagrangian Formulation).
Define

$$y(\eta) := \arg\min_{x \in P}\ \eta \langle c, x \rangle + \tfrac{1}{2} \|x\|^2, \qquad \eta \geq 0. \tag{6}$$

Then

$$y(\eta) = \Pi_P(-\eta c), \tag{7}$$

$$\lim_{\eta \to \infty} y(\eta) = \text{the minimum-norm solution of } \min_{x \in P} \langle c, x \rangle. \tag{8}$$

The limit is stationary; i.e., there exists $\eta_0 \geq 0$ such that $y(\eta) = \lim_{\eta' \to \infty} y(\eta')$ for all $\eta \geq \eta_0$.
Proof.
The algorithm of [23] solves the problem of projecting a point onto a polyhedron, hence can be used to find the Lagrangian solution numerically. The solutions of the Eulerian formulation (3) and of the Lagrangian formulation (6) are related as follows.
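For general polyhedra, the projection in (7) requires an iterative method such as [23]. When the polytope happens to be the unit simplex (a running example in Section 4.1), the projection admits a classical exact routine based on sorting. The following sketch is an illustration under that simplifying assumption; it is not the algorithm of [23], and the function name is ours.

```python
import numpy as np

def proj_simplex(v):
    """Euclidean projection of v onto the unit simplex {x : x >= 0, sum(x) = 1}."""
    d = len(v)
    u = np.sort(v)[::-1]                  # sort entries in decreasing order
    css = np.cumsum(u)
    ks = np.arange(1, d + 1)
    rho = np.nonzero(u + (1.0 - css) / ks > 0)[0][-1]   # last active index
    tau = (1.0 - css[rho]) / (rho + 1)    # shift making the result sum to 1
    return np.maximum(v + tau, 0.0)

# Lagrangian solution y(eta) = projection of -eta * c onto P (here P = simplex)
c = np.array([0.9, 0.2, 0.5])
y = proj_simplex(-2.0 * c)
```

For larger η the projection concentrates its mass on the smallest entries of the cost vector, in line with the sparse soft-min discussed in Section 4.1.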
Lemma 2.4 (Euler Lagrange).
Given , there exists a unique such that , namely . The function is nondecreasing.
Conversely, let . Then there exists (possibly non-unique) such that . Define as the minimal with .333This is merely for concreteness; any choice of selector will do. Then is strictly increasing on .
Proof.
Let and recall that . Note also that is continuous and that is nondecreasing, with range . Hence there exists such that . We see from (7) that minimizes among all , which is to say that . The converse follows. ∎
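The monotone relation between the two parametrizations can be observed numerically. A minimal sketch on the unit simplex (our choice of polytope; the helper implements the classical sorting-based simplex projection, so that the Lagrangian solution is the projection of $-\eta c$):

```python
import numpy as np

def proj_simplex(v):
    """Euclidean projection onto the unit simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v + (1.0 - css[rho]) / (rho + 1), 0.0)

c = np.array([1.0, 0.3, 0.2, 0.7])
etas = np.linspace(0.0, 10.0, 201)
# delta(eta) = norm of the Lagrangian solution y(eta)
deltas = [np.linalg.norm(proj_simplex(-eta * c)) for eta in etas]
```

At $\eta = 0$ the solution is the uniform vector (the minimum-norm point), and the norms increase with $\eta$ toward the norm of the limiting vertex, matching the nondecreasing map in the lemma.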
3 Abstract Result
Recall that $x(\delta)$ and $y(\eta)$ denote the solutions of the Eulerian (3) and Lagrangian (6) formulations, respectively. Lemma 2.4 shows that the curves $\delta \mapsto x(\delta)$ and $\eta \mapsto y(\eta)$ are reparametrizations of one another. In particular, they trace out the same trajectory, and only the trajectory matters for the subsequent definition.
Definition 3.1.
Let $P$ be a polytope and $c \in \mathbb{R}^d$. We say that $P$ is $c$-monotone if for any face $F$ of $P$,

$$x(\delta) \in F \quad \Longrightarrow \quad x(\delta') \in F \ \text{ for all } \delta' \in [\delta, \delta_{\max}], \tag{9}$$

or equivalently,

$$y(\eta) \in F \quad \Longrightarrow \quad y(\eta') \in F \ \text{ for all } \eta' \geq \eta. \tag{10}$$

We say that $P$ is monotone if it is $c$-monotone for all $c \in \mathbb{R}^d$.
This definition means that the “support” of the solution is monotone decreasing in the following sense. For any $x \in P$, there is a unique face $F$ such that $x \in \operatorname{ri} F$; moreover, $x \in \operatorname{ri} F$ if and only if $x$ is a convex combination of the vertices $\operatorname{ext}(F)$ with strictly positive weights [8, Exercise 3.1, Theorem 5.6]. Thus $\operatorname{ext}(F)$ is a notion of support for $x$, and then the property (9) indeed means that the support of $x(\delta)$ is monotone decreasing for inclusion. When $P$ is the unit simplex, this boils down to the usual measure-theoretic notion of support, namely $\{i : x_i > 0\}$ for $x \in P$. This identification breaks down when $P$ is the Birkhoff polytope, but monotonicity is nevertheless equivalent for the two notions of support (see Lemma 4.2). (Clearly this equivalence does not hold in general; e.g., for a polytope contained in the open positive orthant, all points have the same measure-theoretic support.) Next, we characterize monotonicity in geometric terms.
Theorem 3.2.
A polytope $P$ is monotone (Definition 3.1) if and only if
-
(H1) $\Pi_{\operatorname{aff} P}(0) \in \operatorname{ri} P$,
and
-
(H2) $\Pi_{\operatorname{aff} F}(0) \in F$
for each proper face $F$ of $P$.
The two conditions are similar, with the requirement for the proper faces being less stringent than for the improper face $P$ itself: while $\Pi_{\operatorname{aff} F}(0) \in \operatorname{ri} F$ is a sufficient condition for (H2), the condition (H2) also includes the boundary case where the projection lands on the relative boundary of $F$. We shall see that (H1) and (H2) play different roles in the proof. Simple examples show that changing $\operatorname{ri} P$ to $P$ in (H1) or $F$ to $\operatorname{ri} F$ in (H2) invalidates the equivalence in Theorem 3.2.
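Conditions (H1) and (H2) are straightforward to check computationally for small polytopes. As a concrete sketch (anticipating the unit simplex of Corollary 4.1, and using our own encoding of faces by index sets), the minimum-norm point of the affine hull of each simplex face is the barycenter of its vertices, which lies in the relative interior:

```python
import numpy as np
from itertools import combinations

d = 4
ok = True
for k in range(1, d + 1):
    for S in combinations(range(d), k):
        # affine hull of the face conv{e_i : i in S} is
        # { x : x_i = 0 for i not in S, sum(x) = 1 }
        rows = [np.eye(d)[i] for i in range(d) if i not in S]
        A = np.vstack(rows + [np.ones(d)])
        b = np.r_[np.zeros(d - k), 1.0]
        x, *_ = np.linalg.lstsq(A, b, rcond=None)   # minimum-norm solution
        # barycenter of the face, strictly positive on S => relative interior
        ok = ok and np.allclose(x[list(S)], 1.0 / k) and (x[list(S)] > 0).all()
```

Here `np.linalg.lstsq` returns the minimum-norm solution of the (consistent, underdetermined) linear system describing the affine hull, which is exactly the projection $\Pi_{\operatorname{aff} F}(0)$.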
3.1 Proof of Theorem 3.2
We first prove the “if” implication, using the Lagrangian formulation (Lemma 2.3). This parametrization is convenient because the curve is piecewise affine. While the latter is well known (even for some more general norms; see [18] and the references therein), we detail the statement for completeness.
Lemma 3.3.
The curve is piecewise affine. The affine pieces correspond to faces of as follows. Fix and let be the unique face of such that . Let . Then is an interval containing and is affine. In particular, the curve does not return to after leaving it.
Proof.
Consider in and let . As , we have . Since is an affine subspace, the curve is affine. In particular, convexity of implies that for all . Thus is an interval and is affine, and this extends to the closure by continuity. ∎
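Both the piecewise-affine structure and the “no return” property of Lemma 3.3 are easy to observe numerically on the unit simplex (the cost vector, grid, and tolerance are arbitrary choices of ours; the helper is the classical sorting-based simplex projection):

```python
import numpy as np

def proj_simplex(v):
    """Euclidean projection onto the unit simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v + (1.0 - css[rho]) / (rho + 1), 0.0)

c = np.array([1.0, 0.4, 0.1])
grid = np.linspace(0.0, 20.0, 2001)
pats = [tuple(proj_simplex(-e * c) > 1e-12) for e in grid]

# each support pattern occupies one contiguous parameter interval (no return)
first, last = {}, {}
for i, p in enumerate(pats):
    first.setdefault(p, i)
    last[p] = i
contiguous = all(pats[i] == p for p in first for i in range(first[p], last[p] + 1))

# within one face, eta -> y(eta) is affine: the midpoint interpolates
y1, y2, y3 = (proj_simplex(-e * c) for e in (0.4, 0.5, 0.6))
affine = np.allclose(y2, 0.5 * (y1 + y3))
```

For this cost, the curve passes through three support patterns, each visited exactly once, and on each corresponding face the curve is affine in the parameter.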
Recall that denotes the relative boundary of . Consider the first time the curve touches a given face of . That point may lie in or in . As seen in Lemma 3.5 below, the boundary situation indicates that left another face when it entered , meaning that is not monotone. Hence, we analyze that situation in detail in the next lemma.
Lemma 3.4.
Let be a face of and . Let . If , then exactly one of the following holds true:
-
(i)
;
-
(ii)
, and then . In particular, .
Proof.
Recall from Lemma 3.3 that is an interval and is affine. Let and consider . Recalling Lemma 2.2, we have for that and in particular .
As for and both curves are continuous, it follows that . Therefore, our assumption that implies that .
Since is an affine space, is affine. We have and for all , where . (Note that cannot be a singleton when .) As is convex, it follows that for . If , we are in the first case. Whereas if , it follows that . Thus . Moreover, implies . ∎
Lemma 3.5.
Let and let be the unique face of with . Suppose there exists such that and let . For sufficiently small , there is a unique face such that for all . (When traveling along the curve, this is the first face outside whose interior is visited.) If , then satisfies .
Proof.
For , let denote the unique face such that . As seen in Lemma 3.3, the map has finitely many values and the preimage of each value is an interval. Hence the map must be constant on for sufficiently small . Let , be the corresponding face. As and are faces with , we have . Thus implies , and now Lemma 3.4 applies. ∎
Proof of Theorem 3.2.
Step 1: (H1), (H2) monotone. If is not monotone, then Lemma 3.5 shows that either (H1) or (H2) must be violated.
Step 2: Not (H1) not monotone. Suppose that . As , there exists a hyperplane with
Define and and , and denote . Then and in particular . Moreover,
(11) |
For any , we have , and then (11) implies . In particular, is not monotone for the cost .
Step 3: Not (H2) not monotone. Given the assumption, there exists a face of such that does not belong to . Set . As is a face of , there is a hyperplane such that
Define and . Then for every ,
showing that . Since , Lemma 2.2 implies that
is a face of —and a fortiori a face of —such that
We claim that for all . Indeed, for any it holds that
whereas for any , it holds that
In summary, for all and . In view of , it follows that for all and in particular that is not monotone for the cost . ∎
4 Applications
4.1 Soft-Min
To begin with a straightforward example, let $P = \Delta_d$ be the unit simplex in Euclidean space $\mathbb{R}^d$ and $c \in \mathbb{R}^d$. Then

$$\min_{x \in \Delta_d} \langle c, x \rangle$$

corresponds to finding the minimum value of $c$. More specifically, $x$ is a minimizer if and only if $x$ is supported in the set of minimizing indices; i.e., $x_i > 0$ only if $c_i = \min_j c_j$. When this linear program is regularized with the entropy of $x$, we obtain the usual soft-min (counterpart of log-sum-exp) which gives large weights to the small values of $c$ but non-zero weights to all values. The quadratic regularization,

$$y(\eta) := \arg\min_{x \in \Delta_d}\ \eta \langle c, x \rangle + \tfrac{1}{2} \|x\|^2, \tag{12}$$

yields a sparse soft-min: as $\eta \to \infty$ (or as $\varepsilon \to 0$), the support of the solution tends to $\arg\min_i c_i$.
Corollary 4.1.
The unit simplex $\Delta_d$ is monotone for all $d$; i.e., the support of the soft-min $y(\eta)$ defined in (12) decreases monotonically to $\arg\min_i c_i$ as $\eta$ increases (or as $\varepsilon$ decreases).
Proof.
We have $\operatorname{ext}(\Delta_d) = \{e_1, \dots, e_d\}$, the canonical basis of $\mathbb{R}^d$. Let $F$ be a face; then there exist indices $i_1 < \dots < i_k$ such that $F = \operatorname{conv}\{e_{i_1}, \dots, e_{i_k}\}$. Note that $\Pi_{\operatorname{aff} F}(0)$ is the solution of

$$\min \Big\{ \|x\| : x = \sum_{j=1}^{k} \lambda_j e_{i_j},\ \sum_{j=1}^{k} \lambda_j = 1 \Big\},$$

namely the barycenter $\frac{1}{k} \sum_{j=1}^{k} e_{i_j}$. In particular, $\Pi_{\operatorname{aff} F}(0) \in \operatorname{ri} F$. Now Theorem 3.2 applies. ∎
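The monotone shrinking of the sparse soft-min support can be illustrated numerically (the cost vector is an arbitrary choice of ours; the helper is the classical sorting-based simplex projection, so that $y(\eta)$ is the projection of $-\eta c$):

```python
import numpy as np

def proj_simplex(v):
    """Euclidean projection onto the unit simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v + (1.0 - css[rho]) / (rho + 1), 0.0)

c = np.array([0.3, 1.2, 0.1, 0.8, 0.5])
supports = []
for eta in np.linspace(0.1, 50.0, 500):
    y = proj_simplex(-eta * c)                 # sparse soft-min of c
    supports.append({i for i in range(len(c)) if y[i] > 1e-12})

# supports only shrink as eta grows (Corollary 4.1) ...
nested = all(t <= s for s, t in zip(supports, supports[1:]))
# ... ending at the argmin of c
final = supports[-1]
```

For this cost the support shrinks from all five indices down to the single minimizing index.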
4.2 Optimal Transport
We recall the quadratically regularized optimal transport problem

$$\inf_{\pi \in \Pi(\mu,\nu)} \int c \, d\pi + \frac{\varepsilon}{2} \left\| \frac{d\pi}{d(\mu \otimes \nu)} \right\|_{L^2(\mu \otimes \nu)}^2 \tag{QOT}$$

where $\Pi(\mu,\nu)$ denotes the set of couplings of the probability measures $\mu, \nu$. We will be concerned with the discrete case as introduced in [6, 14, 17]. See also [26, 27, 29] for more general theory, [4, 16, 19] for asymptotic aspects, [15, 20, 22, 25, 33] for computational approaches and [25, 35] for some recent applications.
Fix $n$ and two sets of $n$ distinct points, $\{x_1, \dots, x_n\}$ and $\{y_1, \dots, y_n\}$. Let $\mu$ and $\nu$ denote the associated empirical measures, and let $c$ be a (cost) function. We note that the density $\frac{d\pi}{d(\mu \otimes \nu)}$ is simply the ratio of the probability mass functions. Any coupling $\pi \in \Pi(\mu,\nu)$ can be identified with its matrix of probability weights $x = (x_{ij})_{i,j \leq n}$ via $x_{ij} := \pi(\{(x_i, y_j)\})$. Writing similarly $C = (c(x_i, y_j))_{i,j}$, (QOT) can be identified with a problem of the form (2) over the polytope $P$ of such weight matrices. Clearly $P = \frac{1}{n} B_n$, where

$$B_n := \Big\{ x \in [0,1]^{n \times n} : \textstyle\sum_{i} x_{ij} = 1 \ \forall j,\ \sum_{j} x_{ij} = 1 \ \forall i \Big\}$$

denotes the Birkhoff polytope of doubly stochastic matrices; see, e.g., [9]. The monotonicity (in the sense of Definition 3.1) of $P$ is equivalent to that of $B_n$.
The following result connects the measure-theoretic support with the geometric notion discussed below Definition 3.1.
Lemma 4.2.
Let . Denote by the unique face such that . For , the following are equivalent:
-
(i)
,
-
(ii)
,
-
(iii)
if for some face , then .
Proof.
As a consequence of Lemma 4.2, we have the following equivalence.
Corollary 4.3.
Let $\pi_\varepsilon$ denote the solution of (QOT) for $\varepsilon > 0$. The Birkhoff polytope $B_n$ is monotone if and only if the optimal support $\operatorname{supp}(\pi_\varepsilon)$ is monotone decreasing as $\varepsilon \downarrow 0$, for all costs $C$.
Theorem 4.4.
The Birkhoff polytope $B_n$ is monotone for $n \leq 4$ but not monotone for $n \geq 5$. In particular, when $n \geq 5$, the optimal support is not monotone for some costs $C$.
Example 4.5.
We exhibit an example for $n = 5$ where the support is not monotone. (This is not easily achieved by brute-force numerical experiment.) Our starting point is the matrix in (14) below, which is used to describe a face where (H2) fails. Step 3 in the proof of Theorem 3.2 then suggests to perturb in a suitable direction to find a cost exhibiting non-monotonicity. With some geometric considerations, this leads us to propose the cost matrix
For , or equivalently regularization strength , the corresponding problem (QOT) has an exact solution coupling with probability weights given by
(This can be verified by noticing that the stated coupling is induced by the dual potentials ; see for instance [29, Theorem 2.2 and Remark 2.3].) We observe in particular that the location is not in the support. On the other hand, because the diagonal of features the smallest costs, the solution of the unregularized transport problem is to put all mass on the diagonal; i.e., to transport all mass from to for each . Because of the stationary convergence mentioned in Lemma 2.3, that is also the solution of the regularized problem for large enough value of (e.g., will do):
In particular, is part of the support for large , completing the example. Figure 2 shows in more detail the weight at as a function of (the inverse of) .
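Regularized plans of this kind can be reproduced with a generic quadratic-programming solver over the Birkhoff polytope. The sketch below uses SciPy with a small random cost as placeholder data (not the cost matrix of this example) and the Lagrangian objective $\eta \langle C, x \rangle + \frac{1}{2}\|x\|^2$ of Lemma 2.3:

```python
import numpy as np
from scipy.optimize import minimize

n = 3
rng = np.random.default_rng(0)
C = rng.random((n, n)).ravel()          # placeholder cost, not the matrix above

# row sums and column sums equal to 1 (Birkhoff polytope, unscaled)
A = np.zeros((2 * n, n * n))
for i in range(n):
    A[i, i * n:(i + 1) * n] = 1.0       # row i
    A[n + i, i::n] = 1.0                # column i
b = np.ones(2 * n)

def y(eta):
    """min eta*<C,x> + 0.5*||x||^2 over doubly stochastic matrices."""
    res = minimize(
        lambda x: eta * (C @ x) + 0.5 * (x @ x),
        x0=np.full(n * n, 1.0 / n),
        jac=lambda x: eta * C + x,
        bounds=[(0.0, 1.0)] * (n * n),
        constraints={"type": "eq", "fun": lambda x: A @ x - b,
                     "jac": lambda x: A},
        method="SLSQP",
    )
    return res.x

# supports along the curve; by Theorem 4.4 these are nested for n <= 4
supports = [frozenset(np.flatnonzero(y(e) > 1e-6)) for e in np.linspace(0.0, 5.0, 11)]
```

At $\eta = 0$ the solver returns the uniform matrix (the minimum-norm doubly stochastic matrix); searching for non-monotone supports would require $n \geq 5$ and, in view of the discussion above, a carefully chosen cost rather than random data.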
Remark 4.6.
The early work of [17] considered a minimum-cost flow problem with quadratic regularization that predates the optimal transport literature for this regularization. The authors point out that the solution can have non-monotone support already in a minimal setting [17, Figure 1]. Similarly, it is immediate to obtain non-monotonicity if instead of $\mu \otimes \nu$ we use a different measure to define the $L^2$-norm in (QOT), as in [29]. In those examples, the mechanism causing non-monotonicity is straightforward and quite different from the one in the present work, where non-monotonicity arises only in higher dimensions.
4.3 Proof of Lemma 4.2 and Theorem 4.4
Birkhoff’s celebrated theorem [5] states that the vertices $\operatorname{ext}(B_n)$ are the permutation matrices of size $n$; that is, the elements of $B_n$ with binary entries. Following [11], the faces of $B_n$ can be described using the so-called permanent function. If $M$ is a binary matrix, its permanent $\operatorname{per}(M)$ is defined as the number of permutation matrices $\sigma$ with $\sigma \leq M$ (meaning that $\sigma_{ij} \leq M_{ij}$ for all $i,j$). Denoting by $\operatorname{conv}$ the convex hull, the following characterization is contained in [11, Theorem 2.1].
Lemma 4.7.
Let $\sigma_1, \dots, \sigma_k \in \operatorname{ext}(B_n)$ and $F = \operatorname{conv}\{\sigma_1, \dots, \sigma_k\}$. Let $M$ be the binary matrix such that $M_{ij} = 1$ if there exists $l$ with $(\sigma_l)_{ij} = 1$, and $M_{ij} = 0$ otherwise. Then $F$ is a face of $B_n$ if and only if $\operatorname{per}(M) = k$.
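In the counting form used here, the permanent of a 0–1 matrix is straightforward to compute by enumeration for small $n$, which suffices for the faces discussed below. A sketch (a brute-force illustration, not an efficient algorithm such as Ryser's):

```python
import numpy as np
from itertools import permutations

def per(M):
    """Permanent of a 0-1 matrix = number of permutation matrices sigma <= M."""
    n = len(M)
    return sum(all(M[i][s[i]] for i in range(n)) for s in permutations(range(n)))

assert per(np.ones((3, 3), dtype=int)) == 6   # all of ext(B_3)
assert per(np.eye(3, dtype=int)) == 1         # a single vertex
M = np.ones((3, 3), dtype=int)
M[0, 0] = 0
assert per(M) == 4                            # permutations avoiding the zero entry
```

The last example corresponds to a facet of $B_3$: forcing one entry to zero removes the two permutations of $S_3$ that use it.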
We use Lemma 4.7 to relate faces with the distribution of zeros.
Lemma 4.8.
Let . The following are equivalent:
-
(i)
there exists such that and ;
-
(ii)
there exists a face of such that and .
Proof.
(i) (ii): Let be such that and . Assume w.l.o.g. that . Then belongs to the set
where is the matrix
As we see that is a face. Clearly does not belong to , proving the claim.
(ii) (i): Let be such that and let be the matrix such that if there exists such that and otherwise. We have by Lemma 4.7. As an element of , can be written as a convex combination of permutation matrices, i.e.,
As , there exists with . Using the fact that , we derive the existence of such that but . In particular, but , proving the claim. ∎
Lemma 4.9.
Let . Define the permutation matrices , by
That is, permutes the first with the -th element. Then is a face of and
(13) |
As a consequence,
and in particular (H2) is violated for .
Proof.
Define by
(14) |
so that is the entry-wise maximum . One readily verifies that . Hence by Lemma 4.7, is a face of . To determine , we consider the minimization problem
The Lagrangian for this problem is
and the resulting optimality equations are
By symmetry, the unique optimal satisfies for all , so that
and finally
Solving for and yields (13). Note that for any whereas for , for and for , implying the last claim. ∎
The matrix of (14) is inspired by the matrix in [10, Example 1.5], where the authors are interested in a different problem (and, in turn, credit [1]). (In [10], the conclusion is that “not every zero pattern of a fully indecomposable matrix is realizable as the zero pattern of a doubly stochastic matrix whose diagonal sums avoiding the zeros are constant.”) For the present question of monotonicity, this matrix yields a counterexample only for $n \geq 5$. As an aside, the subsequent proof shows that for $n = 4$, the face considered in Lemma 4.9 is (up to symmetries) the only face where the projection $\Pi_{\operatorname{aff} F}(0)$ lands on the relative boundary of $F$, whereas all other faces satisfy $\Pi_{\operatorname{aff} F}(0) \in \operatorname{ri} F$.
Lemma 4.10.
Let $n \leq 4$. Then $B_n$ is monotone.
Proof.
We verify (H1) and (H2). Note that $\Pi_{\operatorname{aff} B_n}(0)$ is the matrix with all entries equal (corresponding, after scaling, to the product measure). This matrix is clearly in the relative interior of $B_n$, showing (H1). The property (H2) is trivial for $n \leq 2$. For $n = 3$ and $n = 4$, we give a computer-assisted proof in the interest of brevity; the code is available at https://github.com/marcelnutz/birkhoff-polytope-faces. (An analytic proof is also available, but requires us to go through 52 different cases. The smaller cases can also be obtained as a corollary: one can check directly that if the Birkhoff polytope is (not) monotone for some $n$, then the same holds for all smaller (larger) $n$.) Specifically, we generate all permutation matrices and determine all families of permutation matrices that form the vertices of a nonempty face by using the permanent function as in Lemma 4.7. There are 49 nonempty faces for $n = 3$ and 7443 for $n = 4$. For each face $F$, we can numerically compute $\Pi_{\operatorname{aff} F}(0)$, or more specifically scalar coefficients $\lambda_l$ such that $\Pi_{\operatorname{aff} F}(0) = \sum_l \lambda_l \sigma_l$ and $\sum_l \lambda_l = 1$, where $\sigma_l$ are the vertices of $F$. The coefficients are non-unique for some faces; in that case, we choose positive weights if possible. Note that as the $\sigma_l$ are binary matrices and the systems are small, the computation can be done with accuracy close to machine precision. It turns out that for $n = 3$, all coefficients are bounded away from zero (much larger than machine precision), establishing that $\Pi_{\operatorname{aff} F}(0) \in \operatorname{ri} F$ and in particular (H2). For $n = 4$, most faces have coefficients bounded away from zero, whereas for 96 faces, one coefficient is numerically close to zero, hence requiring an analytic argument. We verify that all those 96 faces are equivalent up to permutations of rows and columns, corresponding to a relabeling of the points $x_i$ and $y_j$. Specifically, they are all equivalent to the particular face analyzed in Lemma 4.9, where we have seen that the projection belongs to the face. We conclude that (H2) holds for all faces when $n \leq 4$, completing the proof. ∎
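The face enumeration used in this proof can be replicated in a few lines for $n = 3$: every nonempty face is the smallest face containing some set of vertices, and its vertex set consists of all permutation matrices dominated by the entrywise maximum of that set (cf. Lemma 4.7). A sketch of our own enumeration, matching the count of 49 reported above:

```python
import numpy as np
from itertools import permutations, combinations

n = 3
perms = list(permutations(range(n)))   # permutations encode permutation matrices

def dominated(M):
    """Vertex set of the smallest face with support M: all permutations <= M."""
    return frozenset(p for p in perms if all(M[i][p[i]] for i in range(n)))

faces = set()
for k in range(1, len(perms) + 1):
    for S in combinations(perms, k):
        M = np.zeros((n, n), dtype=int)
        for p in S:
            for i in range(n):
                M[i][p[i]] = 1          # entrywise maximum of the chosen vertices
        faces.add(dominated(M))

count = len(faces)                      # number of nonempty faces of B_3
```

Distinct vertex sets correspond to distinct faces, and the enumeration recovers the full face lattice: 6 vertices, 15 edges, 18 two-dimensional faces, 9 facets, and the polytope itself.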
References
- [1] E. Achilles. Doubly stochastic matrices with some equal diagonal sums. Linear Algebra Appl., 22:293–296, 1978.
- [2] M. Agueh and G. Carlier. Barycenters in the Wasserstein space. SIAM J. Math. Anal., 43(2):904–924, 2011.
- [3] J. Backhoff-Veraguas, D. Bartl, M. Beiglböck, and M. Eder. All adapted topologies are equal. Probab. Theory Related Fields, 178(3-4):1125–1172, 2020.
- [4] E. Bayraktar, S. Eckstein, and X. Zhang. Stability and sample complexity of divergence regularized optimal transport. Preprint arXiv:2212.00367v1, 2022.
- [5] G. Birkhoff. Tres observaciones sobre el algebra lineal. Univ. Nac. Tucumán. Revista A., 5:147–151, 1946.
- [6] M. Blondel, V. Seguy, and A. Rolet. Smooth and sparse optimal transport. volume 84 of Proceedings of Machine Learning Research, pages 880–889, 2018.
- [7] H. Brezis. Functional analysis, Sobolev spaces and partial differential equations. Universitext. Springer, New York, 2011.
- [8] A. Brøndsted. An introduction to convex polytopes, volume 90 of Graduate Texts in Mathematics. Springer-Verlag, New York-Berlin, 1983.
- [9] R. A. Brualdi. Combinatorial matrix classes, volume 108 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 2006.
- [10] R. A. Brualdi and G. Dahl. Diagonal sums of doubly stochastic matrices. Linear Multilinear Algebra, 70(20):4946–4972, 2022.
- [11] R. A. Brualdi and P. M. Gibson. Convex polyhedra of doubly stochastic matrices. I. Applications of the permanent function. J. Combinatorial Theory Ser. A, 22(2):194–230, 1977.
- [12] J. A. Cuesta-Albertos and A. Tuero-Díaz. A characterization for the solution of the Monge-Kantorovich mass transference problem. Statist. Probab. Lett., 16(2):147–152, 1993.
- [13] M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems 26, pages 2292–2300. 2013.
- [14] A. Dessein, N. Papadakis, and J.-L. Rouas. Regularized optimal transport and the rot mover’s distance. J. Mach. Learn. Res., 19(15):1–53, 2018.
- [15] S. Eckstein and M. Kupper. Computation of optimal transport and related hedging problems via penalization and neural networks. Appl. Math. Optim., 83(2):639–667, 2021.
- [16] S. Eckstein and M. Nutz. Convergence rates for regularized optimal transport via quantization. Math. Oper. Res., 49(2):1223–1240, 2024.
- [17] M. Essid and J. Solomon. Quadratically regularized optimal transport on graphs. SIAM J. Sci. Comput., 40(4):A1961–A1986, 2018.
- [18] M. Finzel and W. Li. Piecewise affine selections for piecewise polyhedral multifunctions and metric projections. J. Convex Anal., 7(1):73–94, 2000.
- [19] A. Garriz-Molina, A. González-Sanz, and G. Mordant. Infinitesimal behavior of quadratically regularized optimal transport and its relation with the porous medium equation. Preprint arXiv:2407.21528v1, 2024.
- [20] A. Genevay, M. Cuturi, G. Peyré, and F. Bach. Stochastic optimization for large-scale optimal transport. In Advances in Neural Information Processing Systems 29, pages 3440–3448, 2016.
- [21] A. González-Sanz and M. Nutz. Quantitative convergence of quadratically regularized linear programs. Preprint arXiv:2408.04088v1, 2024.
- [22] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville. Improved training of Wasserstein GANs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 5769–5779, 2017.
- [23] W. W. Hager and H. Zhang. Projection onto a polyhedron that exploits sparsity. SIAM J. Optim., 26(3):1773–1798, 2016.
- [24] S. Kolouri, S. R. Park, M. Thorpe, D. Slepcev, and G. K. Rohde. Optimal mass transport: Signal processing and machine-learning applications. IEEE Signal Processing Magazine, 34(4):43–59, 2017.
- [25] L. Li, A. Genevay, M. Yurochkin, and J. Solomon. Continuous regularized Wasserstein barycenters. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 17755–17765. Curran Associates, Inc., 2020.
- [26] D. Lorenz and H. Mahler. Orlicz space regularization of continuous optimal transport problems. Appl. Math. Optim., 85(2):Paper No. 14, 33, 2022.
- [27] D. Lorenz, P. Manns, and C. Meyer. Quadratically regularized optimal transport. Appl. Math. Optim., 83(3):1919–1949, 2021.
- [28] O. L. Mangasarian. Normal solutions of linear programs. Math. Programming Stud., 22:206–216, 1984. Mathematical programming at Oberwolfach, II (Oberwolfach, 1983).
- [29] M. Nutz. Quadratically regularized optimal transport: Existence and multiplicity of potentials. Preprint arXiv:2404.06847v1, 2024.
- [30] A. Paffenholz. Faces of Birkhoff polytopes. Electron. J. Combin., 22(1):Paper 1.67, 36, 2015.
- [31] V. M. Panaretos and Y. Zemel. Statistical aspects of Wasserstein distances. Annu. Rev. Stat. Appl., 6:405–431, 2019.
- [32] G. Peyré and M. Cuturi. Computational optimal transport: With applications to data science. Foundations and Trends in Machine Learning, 11(5-6):355–607, 2019.
- [33] V. Seguy, B. B. Damodaran, R. Flamary, N. Courty, A. Rolet, and M. Blondel. Large scale optimal transport and mapping estimation. In International Conference on Learning Representations, 2018.
- [34] C. Villani. Optimal transport, old and new, volume 338 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, Berlin, 2009.
- [35] S. Zhang, G. Mordant, T. Matsumoto, and G. Schiebinger. Manifold learning with sparse regularised optimal transport. Preprint arXiv:2307.09816v1, 2023.