On Combinatorial Properties of
Greedy Wasserstein Minimization
Abstract.
We discuss a phenomenon where Optimal Transport leads to a remarkable amount of combinatorial regularity. Consider infinite sequences in constructed in a greedy manner: given , the new point is chosen so as to minimize the Wasserstein distance between the empirical measure of the points and the Lebesgue measure,
This leads to fascinating sequences (for example: for some ) which coincide with sequences recently introduced by Ralph Kritzinger in a different setting. Numerically, the regularity of these sequences rival the best known constructions from Combinatorics or Number Theory. We prove a regularity result below the square root barrier.
Key words and phrases:
Wasserstein distance, Greedy Sequences, Irregularities of Distribution, Kronecker, van der Corput, Kritzinger, discrepancy1. Introduction and Main Results
1.1. Motivation.
Given any finite set of points , we propose adding a new point in such a way that the Wasserstein distance between the new empirical measure and the Lebesgue measure on is minimized. Formally, we consider a notion of energy via
and then choose to be any number in which assumes a global minimum. Throughout this paper, we will only consider the distance between a discrete measure and the uniform measure on for which the following convenient formula exists: assuming , then
This procedure is perhaps best illustrated with an example. Consider the initial set of points on . A computation shows that the point that one needs to add to minimize the transport cost is . Repeating the computation after having added , the optimal point in the next step is . Continuing the computation, one arrives at the sequence
which has a number of remarkable properties. The elements are rational (with numerator odd and denominator being , see [18] or §2.1). They are uniformly distributed in the unit interval in a highly regular manner that is illustrated in Fig. 1: newly added points avoid existing points and fill in existing gaps – considering the construction, this is not too surprising. However, they seem to do so at the optimal rate which is very difficult to do (see §1.3).
We consider another example (see Fig. 1, right) where we start with 50 randomly sampled points in . The greedy construction is eager to avoid existing clusters while filling in existing gaps – what is totally remarkable is that after the greedy construction has added another 50 elements, it seems to have completely repaired the existing defects and continues in a most regular fashion.
1.2. Kritzinger sequences.
This coincides with another type of construction that was recently given by Ralph Kritzinger [18] using a seemingly different metric.
Theorem 1.
Greedy minimization produces a Kritzinger sequence.
Kritzinger was motivated by a different notion of regularity which is a classical object in the irregularities of distribution: the discrepancy. Formally, Kritzinger proposes, given to pick so that it minimizes
This expression is at first glance perhaps a bit difficult to interpret while the Wasserstein interpretation allows access to the theory of Optimal Transport. Theorem 1 is only true on the unit interval (due to the rigidity of Optimal Transport in one dimension) and fails on domains in dimensions where minimization and Kritzinger sequences are different. We first collect some of the results from [18].
Theorem (Kritzinger [18]).
If is constructed using the greedy method, then it is different from and for some . Moreover, any Kritzinger sequence satisfies, for all ,
where the constant only depends on the initial set of points.
We give a new proof of this result in our framework before going on to further refining the estimate. The estimate essentially says that for most points
This is the typical square root law that one would also expect from random variables: it is a common theme in the construction of greedy sequences that one can usually establish a error bound using an averaging trick. Simultaneously, it seems difficult to go past it in most cases.
1.3. How regular are these sequences really?
Suppose now that is an arbitrary sequence in . In 1935 van der Corput [10, 11] asked whether there exists an infinite sequence so that for some universal constant and all
This was shown to be impossible by van Aardenne-Ehrenfest [1] and started the theory of irregularities of distribution. After seminal contributions by K. F. Roth [21], it was then proven by W. M. Schmidt [22] in 1972 that for every sequence there are infinitely many integers such that
The optimal constant is now known to satisfy where the lower bound is due to Larcher & Puchhammer [16] and the upper bound is due to Ostromoukhov [19]. There are two classical types of sequences with very good behavior: the Kronecker sequences, also known as irrational rotations on the torus, . The other classical example is the van der Corput sequence in base 2 defined via digit expansions in base 2 (see [9, 12, 13, 17] for details).
We compared the maximum deviation (multiplied with to rescale the leading asymptotic order) between the Kritzinger sequence started with , the van der Corput sequence in base 2 and the Kronecker sequence with . The Kritzinger sequence is somewhat superior to the van der Corput sequence and very comparable to the golden ratio Kronecker sequence (albeit it seems to have less variance). We emphasize that these two sequences are among the most regular sequences known and their behavior is within a small factor from the theoretical limit: indeed, irrational rotations on the torus are among the most well studied object in many different fields of mathematics while the van der Corput sequence was specifically designed to be as regular as possible. The Kritzinger sequence appears to draws its regularity from the geometry of the Wasserstein distance.
1.4. Improved Regularity
Our main result is a new regularity statement for Kritzinger sequences. Such a sequence is assumed to be initialized with an arbitrary finite set of real numbers and then extended in the greedy fashion.
Theorem 2.
Every Kritzinger sequence has infinitely many with
In light of Fig. 2, we naturally expect that can be replaced by . The scaling of the discrepancy suggests that may be a plausible guess. It is not clear to us whether it could be as small as . Theorem 2 breaks the square root barrier, in particular, it shows in a concrete way that Kritzinger sequences are more regular than i.i.d. random variables (for which this quantity is w.h.p).
1.5. A curious inequality.
A new technical ingredient of the proof is a curious new inequality for real-valued functions.
Lemma (Main Lemma).
Let be continuous and . Then
This inequality is sharp up to at most a factor of 8: take to be the characteristic function on . Then
while
This inequality plays a role in a part of the proof that is somewhat modular and could relatively easily be exchanged by something else. Any lower bound
could be used to derive a regularity statement for Kritzinger sequences.
1.6. Related Results.
The problem of irregularities of distributions in dimensions has proven to be challenging [2, 3, 5, 6]. Indeed, at this point there is not even a clear consensus on what the optimal scale of regularity should be. Most examples of highly regular sequences seem to follow the spirit of van der Corput sequences or generalized Kronecker sequences (we refer to the textbooks [9, 12, 13, 17]). This naturally suggests investigating the possibility of new constructions. Such constructions were given by the author in [23, 24] and by Brown and the author in [7, 8]. Pausinger [20] showed that a large class of greedy constructions can replicate the van der Corput sequence: this means that, at least for some suitable initial conditions, they can produce sequences at the theoretical limit of regularity. One of the most interesting example of a greedy sequence was recently given by Ralph Kritzinger [18]: the arising sequence is, empirically, as regular as any of the other greedy sequences but additionally consists of rational numbers which greatly simplifies computation. Most of these greedy constructions can easily be shown to have regularity , a barrier which has resisted improvement. A notable exception is [25] where additional structure in the complex plane could be used to show that a classic greedy constructions for polynomials has optimal rate up to possibly logarithmic factors (see also work of Beck [4] resolving a question of Erdős [14]). The Wasserstein distance has been introduced to the study of irregularities of distributions in [26] and was then further developed in [7]. We especially emphasize work of C. Graham [15] who established an irregularities of distributions phenomenon for . An explicit inequality relating the distances to the Green Energy [27] has lead to the following rather general principle (established by Brown and the author in [7]): on compact manifolds in dimension normalized to have volume 1, there always exist greedy sequences obtained by minimizing the Green energy for which uniformly in .
2. Proofs
§3.1 gives two proofs of Theorem 1. §2.2 derives a useful alternative representation, §2.3 proves an Lemma which is enough to prove a regularity rate of . §2.4 proves the Main Lemma which is then used in §2.5 to prove Theorem 2.
2.1. Proof of Theorem 1
Proof.
Let be given. Observe that
Rescaling by a factor of , we have
Let us now consider the original set and let us add an additional point . We introduce the variables such that
and . A computation shows that
Among these terms only two, and the last sum, actually depend on . The next step consists in simplifying the difference between the last two sums to exploit some additional cancellation. There are now three cases: (1) , (2) or (3) there exists such that
We will only deal with case (3), the other two cases can be dealt with analogously. Under this assumption, we necessarily have
Thus we have
Collecting all the terms that actually depend on , our goal is to minimize
which is exactly the Kritzinger functional. ∎
Kritzinger’s functional can be derived from trying to minimize the discrepancy
In hindsight, the fact that these functionals are related is perhaps not too surprising: the fact that Wasserstein distances can be equivalently defined over empirical cumulative distribution functions is a feature of us working in one dimension (see, for example, Vallender [28]). Another quick proof of Theorem 1 could be given by appealing to the formula (see e.g. [18, Equation 10]) that for ordered points
while
This quickly implies that
for a constant that is independent of the set of points. Therefore, minimizing one of the functionals is equivalent to minimizing the other. The explicit form of allows to prove that minimizers are rational numbers.
Proposition (Kritzinger [18]).
Let and let
Then the minimizer is different from all the existing points and of the form for for some odd integer .
Proof.
Note that the function is continuous, not differentiable in the points but differentiable everywhere else. In particular, if is different from the points, then
The only candidates for roots outside of the existing points are numbers of the form
which is exactly of the desired form. However, we also note that is piecewise linear between the points. If it were now the case that assumes its global minimum in , then we would have to have and for sufficiently small values of which is ruled out by the explicit form of . ∎
2.2. Some Preparatory Work.
Since we now know that minimizing the cost is equivalent to minimizing the discrepancy, we will instead work with this new, renormalized functional. To this end we introduce, for any given set of points in the unit interval, the counting function
We will also use, throughout the rest of the paper, the rescaled version
Assume is being added. A straightforward computation shows that
Note that some of these terms do not depend on while others do. Collecting only those terms that the depend on shows that one needs to solve
where the function is defined as
For the remainder of the paper, we will work with this representation.
2.3. An Elementary Lemma
The purpose of this section is to introduce a basic Lemma and to use it to prove the missing part of Kritzinger’s Theorem: that
where only depends on the initial configuration. Subsequent arguments will then both use and refine this step.
Lemma (Basic Form).
Let be continuous. Then
Proof.
Using integration by parts
Introducing the anti-derivative via we have, integrating again by parts,
as well as Thus
Every function has to be at least as large as its average somewhere and
∎
One sees that the proof is merely a sequence of identities: the part where one can hope to gain is the last step which is clearly only sharp whenever is constant. The elementary Lemma now implies the missing part of Corollary 1.
Proposition.
We have
Proof.
Note that we have
There exists such that from which the result follows since
∎
2.4. A refined Lemma.
The purpose of this section is to establish what is perhaps the core of the improvement: instead of merely using that a function has to exceed its average, as in the proof of the basic Lemma, we quantify the gain from below.
Lemma (Main Lemma).
Let be continuous and . Then
Proof.
Using again the antiderivative
the computations in the basic Lemma already established that
We will work with this expression instead. In order to emphasize that the integral in this expression is truly a constant, the average value, we will abbreviate
We start by arguing that if assumes a value much smaller than the average, then it also has to assume values that are larger than the average. The difficulty here is that ‘much smaller than the average’ is a pointwise statement which we have to turn into a statement about the integral. This is summarized in the following statement.
Fact. We have
Proof of the Fact..
For simplicity of exposition,we abbreviate
Let us assume the minimum is assumed in
Let be the shortest interval containing both and a point for which
The existence of such a point follows from the fact that is continuous and
and the function assumes both the value and the value and the intermediate value theorem applies. If there is nothing to prove, thus we can assume implying . The shortest interval containing both has positive length and are its endpoints. Using Cauchy-Schwarz on ,
This provides an upper bound on . Our next goal will be to show that cannot be too long: if it is long, then has to be large which is exactly what we are trying to show. This follows at once from
from which we infer that
We can now combine these bounds. We have
which implies that
∎
The fact will now allow us to conclude the proof of the Main Lemma. Consider the values assumed by . Since is continuous, we have
where . We introduce . As a first observation, we have
Note that and thus from which we deduce that
We will now show that a large value of will imply that has to exceed its maximum by at least a little bit (depending on ). For this, we write , where and . Then, tautologically,
Simultaneously, appealing to the fact proved above, we have
Altogether, we have
which we turn into a smoother object via
Differentiation shows that
which shows that the minimum is assumed for . Thus
Altogether, we obtain, as desired,
∎
2.5. Proof of Theorem 2
Proof.
Suppose now that is a Kritzinger sequence. We note that
where only depends on the initial configuration. Suppose now there exists such that for all we have
We will use this to deduce a contradiction. We have
The Main Lemma implies
However, for sufficiently large (say where depends only on ), this implies, for all
which leads to a contradiction since these integrals are always positive. ∎
Remark. We note that this allows for a small refinement: the argument shows that for all sufficiently large, there always exist for which
References
- [1] T. van Aardenne-Ehrenfest, Proof of the Impossibility of a Just Distribution of an Infinite Sequence Over an Interval, Proc. Kon. Ned. Akad. Wetensch. 48, 3-8, 1945.
- [2] J. Beck, A two-dimensional van Aardenne-Ehrenfest theorem in irregularities of distribution. Compositio Math. 72 3, 269–339 (1989).
- [3] J. Beck and W. Chen, Irregularities of Distribution, Cambridge Tracts in Mathematics (No. 89), Cambridge University Press, 1987.
- [4] J. Beck, The modulus of polynomials with zeros on the unit circle: A problem of Erdos, Annals of Mathematics, 134 (1991), p. 609–651
- [5] D. Bilyk and M. Lacey, On the small ball Inequality in three dimensions, Duke Math. J. 143 (2008), no. 1, 81–115.
- [6] D. Bilyk, M. Lacey and A. Vagharshakyan, On the small ball inequality in all dimensions, J. Funct. Anal. 254 (2008), no. 9, 2470–2502.
- [7] L. Brown and S. Steinerberger, Positive-definite Functions, Exponential Sums and the Greedy Algorithm: a curious Phenomenon, Journal of Complexity 61 (2020), 101485
- [8] L. Brown and S. Steinerberger, On the Wasserstein distance between classical sequences and the Lebesgue measure, Trans. Amer. Math. Soc. 373 (2020), p. 8943–8962.
- [9] B. Chazelle, The discrepancy method. Randomness and complexity. Cambridge University Press, Cambridge, 2000.
- [10] J. van der Corput, Verteilungsfunktionen I, Proc. Akad. Wetensch. , 38 (1935), 813–821.
- [11] J. van der Corput, Verteilungsfunktionen II, Akad. Wetensch., Proc. 38 (1935), 1058–1068
- [12] J. Dick and F. Pillichshammer, Digital nets and sequences. Discrepancy theory and quasi-Monte Carlo integration. Cambridge University Press, Cambridge, 2010.
- [13] M. Drmota, R. Tichy, Sequences, discrepancies and applications. Lecture Notes in Mathematics, 1651. Springer-Verlag, Berlin, 1997.
- [14] P. Erdős, Some unsolved problems, Michigan Math. J. 4 (1957), p. 291–300
- [15] C. Graham, Irregularity of distribution in Wasserstein distance. Journal of Fourier Analysis and Applications, 26 (2020), p. 1-21.
- [16] G. Larcher and F. Puchhammer, An improved bound for the star discrepancy of sequences in the unit interval, Unif. Distrib. Theory 11 (2016), p. 1–14
- [17] L. Kuipers and H. Niederreiter, Uniform distribution of sequences. Pure and Applied Mathematics. Wiley-Interscience, New York-London-Sydney, 1974.
- [18] R. Kritzinger, Uniformly distributed sequences generated by a greedy minimization of the discrepancy, arXiv:2109.06298.
- [19] V. Ostromoukhov. Recent progress in improvement of extreme discrepancy and star discrepancy of one-dimensional sequences. MCQMC 2008, pages 561–572.
- [20] F. Pausinger, Greedy energy minimization can count in binary: point charges and the van der Corput sequence, Annali di Matematica Pura ed Applicata 200 (2021), p. 165–186.
- [21] K. F. Roth, On irregularities of distribution. Mathematika 1, 73–79 (1954).
- [22] W. Schmidt, Irregularities of distribution. VII. Acta Arith. 21 (1972), 45–50.
- [23] S. Steinerberger, A Nonlocal Functional promoting Low-Discrepancy Point Sets, Journal of Complexity 54 (2019), 101410
- [24] S. Steinerberger, Dynamically Defined Sequences with Small Discrepancy, Monatshefte fur Mathematik 191 (2020), p. 639–655
- [25] S. Steinerberger, Polynomials with zeros on the unit circle: regularity of Leja sequences, Mathematika 67 (2021), p. 553–568
- [26] S. Steinerberger, Wasserstein distance, Fourier series and applications, Monatshefte fur Mathematik 194 (2021), p. 305-338
- [27] S. Steinerberger, A Wasserstein inequality and minimal Green energy on compact manifolds, Journal of Functional Analysis 281(2021):109076
- [28] S. Vallender, Calculation of the Wasserstein Distance Between Probability Distributions on the Line Theory of Probability & Its Applications 18 (1974), p. 435.