This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

On Combinatorial Properties of
Greedy Wasserstein Minimization

Stefan Steinerberger Department of Mathematics, University of Washington, Seattle, WA 98195, USA [email protected]
Abstract.

We discuss a phenomenon where Optimal Transport leads to a remarkable amount of combinatorial regularity. Consider infinite sequences (xk)k=1(x_{k})_{k=1}^{\infty} in [0,1][0,1] constructed in a greedy manner: given x1,,xnx_{1},\dots,x_{n}, the new point xn+1x_{n+1} is chosen so as to minimize the Wasserstein distance W2W_{2} between the empirical measure of the n+1n+1 points and the Lebesgue measure,

xn+1=argminxW2(1n+1k=1nδxk+δxn+1,dx).x_{n+1}=\arg\min_{x}~{}W_{2}\left(\frac{1}{n+1}\sum_{k=1}^{n}\delta_{x_{k}}+\frac{\delta_{x}}{n+1},dx\right).

This leads to fascinating sequences (for example: xn+1=(2k+1)/(2n+2)x_{n+1}=(2k+1)/(2n+2) for some kk\in\mathbb{Z}) which coincide with sequences recently introduced by Ralph Kritzinger in a different setting. Numerically, the regularity of these sequences rival the best known constructions from Combinatorics or Number Theory. We prove a regularity result below the square root barrier.

Key words and phrases:
Wasserstein distance, Greedy Sequences, Irregularities of Distribution, Kronecker, van der Corput, Kritzinger, L2L^{2}-discrepancy
S.S. is supported by the NSF (DMS-2123224) and the Alfred P. Sloan Foundation.

1. Introduction and Main Results

1.1. Motivation.

Given any finite set of points x1,,xn[0,1]x_{1},\dots,x_{n}\in[0,1], we propose adding a new point in such a way that the Wasserstein W2W_{2}-distance between the new empirical measure and the Lebesgue measure on [0,1][0,1] is minimized. Formally, we consider a notion of energy E:[0,1]0E:[0,1]\rightarrow\mathbb{R}_{\geq 0} via

E(x)=W2(1n+1k=1nδxk+δxn+1,dx)E(x)=W_{2}\left(\frac{1}{n+1}\sum_{k=1}^{n}\delta_{x_{k}}+\frac{\delta_{x}}{n+1},dx\right)

and then choose xn+1x_{n+1} to be any number in which EE assumes a global minimum. Throughout this paper, we will only consider the W2W_{2} distance between a discrete measure and the uniform measure on [0,1][0,1] for which the following convenient formula exists: assuming 0a1a2an10\leq a_{1}\leq a_{2}\leq\dots\leq a_{n}\leq 1, then

W2(1nk=1nδak,dx)2=i=1n(i1)/ni/n(xai)2𝑑x.W_{2}\left(\frac{1}{n}\sum_{k=1}^{n}\delta_{a_{k}},dx\right)^{2}=\sum_{i=1}^{n}\int_{(i-1)/n}^{i/n}\left(x-a_{i}\right)^{2}dx.

This procedure is perhaps best illustrated with an example. Consider the initial set of points {1/π,1/e,1/2}\left\{1/\pi,1/e,1/\sqrt{2}\right\} on [0,1][0,1]. A computation shows that the point x4x_{4} that one needs to add to minimize the W2W_{2}-transport cost is x4=7/8x_{4}=7/8. Repeating the computation after having added x4x_{4}, the optimal point in the next step is x5=1/10x_{5}=1/10. Continuing the computation, one arrives at the sequence

1π,1e,12,78,110,712,714,1316,318,\frac{1}{\pi},\frac{1}{e},\frac{1}{\sqrt{2}},\frac{7}{8},\frac{1}{10},\frac{7}{12},\frac{7}{14},\frac{13}{16},\frac{3}{18},\dots

which has a number of remarkable properties. The elements are rational (with numerator odd and denominator being 2n2n, see [18] or §2.1). They are uniformly distributed in the unit interval in a highly regular manner that is illustrated in Fig. 1: newly added points avoid existing points and fill in existing gaps – considering the construction, this is not too surprising. However, they seem to do so at the optimal rate which is very difficult to do (see §1.3).

Refer to captionRefer to caption
Figure 1. An illustration of the first nn elements of a sequence: the points (k/n,xk)(k/n,~{}x_{k}) for 1kn1\leq k\leq n (here n=250n=250). Left: starting with 1/π,1/e,1/21/\pi,1/e,1/\sqrt{2}. Right: starting with 50 random elements.

We consider another example (see Fig. 1, right) where we start with 50 randomly sampled points in [0,1][0,1]. The greedy construction is eager to avoid existing clusters while filling in existing gaps – what is totally remarkable is that after the greedy construction has added another 50 elements, it seems to have completely repaired the existing defects and continues in a most regular fashion.

1.2. Kritzinger sequences.

This coincides with another type of construction that was recently given by Ralph Kritzinger [18] using a seemingly different metric.

Theorem 1.

Greedy W2W_{2} minimization produces a Kritzinger sequence.

Kritzinger was motivated by a different notion of regularity which is a classical object in the irregularities of distribution: the L2L^{2}-discrepancy. Formally, Kritzinger proposes, given x1,,xnx_{1},\dots,x_{n} to pick xn+1x_{n+1} so that it minimizes

F(x)=2k=1nmax{xk,x}+(n+1)2x2x.F(x)=-2\sum_{k=1}^{n}\max\left\{x_{k},x\right\}+(n+1)^{2}x^{2}-x.

This expression is at first glance perhaps a bit difficult to interpret while the Wasserstein W2W_{2} interpretation allows access to the theory of Optimal Transport. Theorem 1 is only true on the unit interval (due to the rigidity of Optimal Transport in one dimension) and fails on domains in d2d\geq 2 dimensions where W2W_{2} minimization and Kritzinger sequences are different. We first collect some of the results from [18].

Theorem (Kritzinger [18]).

If xnx_{n} is constructed using the greedy method, then it is different from x1,,xn1x_{1},\dots,x_{n-1} and xn=(2k+1)/(2n)x_{n}=(2k+1)/(2n) for some k0k\in\mathbb{N}_{\geq 0}. Moreover, any Kritzinger sequence satisfies, for all nn\in\mathbb{N},

01|#{1kn:xkx}nx|2𝑑xn3+c,\int_{0}^{1}\left|\#\left\{1\leq k\leq n:x_{k}\leq x\right\}-nx\right|^{2}dx\leq\frac{n}{3}+c,

where the constant cc only depends on the initial set of points.

We give a new proof of this result in our framework before going on to further refining the estimate. The estimate essentially says that for most points 0<x<10<x<1

#{1kn:0xkx}=nx±𝒪(n).\#\left\{1\leq k\leq n:0\leq x_{k}\leq x\right\}=n\cdot x\pm\mathcal{O}(\sqrt{n}).

This is the typical square root law that one would also expect from random variables: it is a common theme in the construction of greedy sequences that one can usually establish a 𝒪(n)\mathcal{O}(\sqrt{n}) error bound using an averaging trick. Simultaneously, it seems difficult to go past it in most cases.

1.3. How regular are these sequences really?

Suppose now that (xk)k=1(x_{k})_{k=1}^{\infty} is an arbitrary sequence in [0,1][0,1]. In 1935 van der Corput [10, 11] asked whether there exists an infinite sequence so that for some universal constant CC and all nn\in\mathbb{N}

max0x1|#{1kn:xkx}nx|C.\max_{0\leq x\leq 1}\left|\#\left\{1\leq k\leq n:x_{k}\leq x\right\}-nx\right|\leq C.

This was shown to be impossible by van Aardenne-Ehrenfest [1] and started the theory of irregularities of distribution. After seminal contributions by K. F. Roth [21], it was then proven by W. M. Schmidt [22] in 1972 that for every sequence there are infinitely many integers nn\in\mathbb{N} such that

max0x1|#{1kn:xkx}nx|clogn.\max_{0\leq x\leq 1}\left|\#\left\{1\leq k\leq n:x_{k}\leq x\right\}-nx\right|\geq c\log{n}.

The optimal constant is now known to satisfy 0.065c0.22230.065\leq c\leq 0.2223 where the lower bound is due to Larcher & Puchhammer [16] and the upper bound is due to Ostromoukhov [19]. There are two classical types of sequences with very good behavior: the Kronecker sequences, also known as irrational rotations on the torus, xn=nαmod1x_{n}=n\alpha~{}\mod 1. The other classical example is the van der Corput sequence in base 2 defined via digit expansions in base 2 (see [9, 12, 13, 17] for details).

Refer to captionRefer to caption
Figure 2. Comparison to van der Corput (left) and nϕmod1n\phi\mod 1 (right).

We compared the maximum deviation (multiplied with 1/logn1/\log{n} to rescale the leading asymptotic order) between the Kritzinger sequence started with x1=1/2x_{1}=1/2, the van der Corput sequence in base 2 and the Kronecker sequence with α=(1+5)/2\alpha=(1+\sqrt{5})/2. The Kritzinger sequence is somewhat superior to the van der Corput sequence and very comparable to the golden ratio Kronecker sequence (albeit it seems to have less variance). We emphasize that these two sequences are among the most regular sequences known and their behavior is within a small factor from the theoretical limit: indeed, irrational rotations on the torus are among the most well studied object in many different fields of mathematics while the van der Corput sequence was specifically designed to be as regular as possible. The Kritzinger sequence appears to draws its regularity from the geometry of the Wasserstein distance.

1.4. Improved Regularity

Our main result is a new regularity statement for Kritzinger sequences. Such a sequence is assumed to be initialized with an arbitrary finite set of real numbers and then extended in the greedy fashion.

Theorem 2.

Every Kritzinger sequence (xk)k=1(x_{k})_{k=1}^{\infty} has infinitely many nn\in\mathbb{N} with

max0x1|0x#{1kn:xky}nydy|2n1/3.\max_{0\leq x\leq 1}\left|\int_{0}^{x}\#\left\{1\leq k\leq n:x_{k}\leq y\right\}-ny~{}dy\right|\leq 2\cdot n^{1/3}.

In light of Fig. 2, we naturally expect that n1/3n^{1/3} can be replaced by 𝒪(logn)\mathcal{O}(\log{n}). The scaling of the L1L^{1}-discrepancy suggests that 𝒪(logn)\mathcal{O}(\sqrt{\log{n}}) may be a plausible guess. It is not clear to us whether it could be as small as 𝒪(1)\mathcal{O}(1). Theorem 2 breaks the square root barrier, in particular, it shows in a concrete way that Kritzinger sequences are more regular than i.i.d. random variables (for which this quantity is n1/2\sim n^{1/2} w.h.p).

1.5. A curious inequality.

A new technical ingredient of the proof is a curious new inequality for real-valued functions.

Lemma (Main Lemma).

Let g:[0,1]g:[0,1]\rightarrow\mathbb{R} be continuous and g0g\not\equiv 0. Then

max0z10zg(x)x𝑑xz1g(x)(1x)𝑑x1161gL22(max0x1|0xg(y)𝑑y|)3.\max_{0\leq z\leq 1}~{}\int_{0}^{z}g(x)x~{}dx-\int_{z}^{1}g(x)(1-x)~{}dx\geq\frac{1}{16}\frac{1}{\|g\|_{L^{2}}^{2}}\left(\max_{0\leq x\leq 1}\left|\int_{0}^{x}g(y)dy\right|\right)^{3}.

This inequality is sharp up to at most a factor of 8: take gg to be the characteristic function on [0,ε][0,\varepsilon]. Then

max0z10zg(x)x𝑑xz1g(x)(1x)𝑑x=0εg(x)x𝑑x=ε22\max_{0\leq z\leq 1}~{}\int_{0}^{z}g(x)x~{}dx-\int_{z}^{1}g(x)(1-x)~{}dx=\int_{0}^{\varepsilon}g(x)xdx=\frac{\varepsilon^{2}}{2}

while

1161gL22(max0x1|0xg(y)𝑑y|)3=116εε3=ε216.\frac{1}{16}\frac{1}{\|g\|_{L^{2}}^{2}}\left(\max_{0\leq x\leq 1}\left|\int_{0}^{x}g(y)dy\right|\right)^{3}=\frac{1}{16\varepsilon}\cdot\varepsilon^{3}=\frac{\varepsilon^{2}}{16}.

This inequality plays a role in a part of the proof that is somewhat modular and could relatively easily be exchanged by something else. Any lower bound

max0z10zg(x)x𝑑xz1g(x)(1x)𝑑xinformative quantity\max_{0\leq z\leq 1}~{}\int_{0}^{z}g(x)x~{}dx-\int_{z}^{1}g(x)(1-x)~{}dx\geq\mbox{informative quantity}

could be used to derive a regularity statement for Kritzinger sequences.

1.6. Related Results.

The problem of irregularities of distributions in dimensions d2d\geq 2 has proven to be challenging [2, 3, 5, 6]. Indeed, at this point there is not even a clear consensus on what the optimal scale of regularity should be. Most examples of highly regular sequences seem to follow the spirit of van der Corput sequences or generalized Kronecker sequences (we refer to the textbooks [9, 12, 13, 17]). This naturally suggests investigating the possibility of new constructions. Such constructions were given by the author in [23, 24] and by Brown and the author in [7, 8]. Pausinger [20] showed that a large class of greedy constructions can replicate the van der Corput sequence: this means that, at least for some suitable initial conditions, they can produce sequences at the theoretical limit of regularity. One of the most interesting example of a greedy sequence was recently given by Ralph Kritzinger [18]: the arising sequence is, empirically, as regular as any of the other greedy sequences but additionally consists of rational numbers which greatly simplifies computation. Most of these greedy constructions can easily be shown to have regularity n1/2\lesssim n^{-1/2}, a barrier which has resisted improvement. A notable exception is [25] where additional structure in the complex plane could be used to show that a classic greedy constructions for polynomials has optimal rate up to possibly logarithmic factors (see also work of Beck [4] resolving a question of Erdős [14]). The Wasserstein distance has been introduced to the study of irregularities of distributions in [26] and was then further developed in [7]. We especially emphasize work of C. Graham [15] who established an irregularities of distributions phenomenon for d=1d=1. An explicit inequality relating the W2W_{2} distances to the Green Energy [27] has lead to the following rather general principle (established by Brown and the author in [7]): on compact manifolds in dimension d3d\geq 3 normalized to have volume 1, there always exist greedy sequences (xk)k=1(x_{k})_{k=1}^{\infty} obtained by minimizing the Green energy for which W2(1nk=1nδak,dx)n1/dW_{2}\left(\frac{1}{n}\sum_{k=1}^{n}\delta_{a_{k}},dx\right)\lesssim n^{-1/d} uniformly in nn.

2. Proofs

§3.1 gives two proofs of Theorem 1. §2.2 derives a useful alternative representation, §2.3 proves an Lemma which is enough to prove a regularity rate of n1/2\lesssim n^{1/2}. §2.4 proves the Main Lemma which is then used in §2.5 to prove Theorem 2.

2.1. Proof of Theorem 1

Proof.

Let 0x1x2xn10\leq x_{1}\leq x_{2}\leq\dots\leq x_{n}\leq 1 be given. Observe that

W2(1nk=1nδxk,dx)2\displaystyle W_{2}\left(\frac{1}{n}\sum_{k=1}^{n}\delta_{x_{k}},dx\right)^{2} =k=1nk1nkn(xkz)2𝑑z\displaystyle=\sum_{k=1}^{n}\int_{\frac{k-1}{n}}^{\frac{k}{n}}(x_{k}-z)^{2}dz
=k=1nk1nknxk22xkz+z2dz\displaystyle=\sum_{k=1}^{n}\int_{\frac{k-1}{n}}^{\frac{k}{n}}x_{k}^{2}-2x_{k}z+z^{2}dz
=13+1nk=1nxk2k=1nxk2k1n2.\displaystyle=\frac{1}{3}+\frac{1}{n}\sum_{k=1}^{n}x_{k}^{2}-\sum_{k=1}^{n}x_{k}\frac{2k-1}{n^{2}}.

Rescaling by a factor of n2n^{2}, we have

n2W2(1nk=1nδxk,dx)2=n23+nk=1nxk2k=1nxk(2k1).n^{2}\cdot W_{2}\left(\frac{1}{n}\sum_{k=1}^{n}\delta_{x_{k}},dx\right)^{2}=\frac{n^{2}}{3}+n\sum_{k=1}^{n}x_{k}^{2}-\sum_{k=1}^{n}x_{k}(2k-1).

Let us now consider the original set and let us add an additional point xx. We introduce the variables such that

{x1,,xn,x}={y1,,yn+1}\left\{x_{1},\dots,x_{n},x\right\}=\left\{y_{1},\dots,y_{n+1}\right\}

and y1y2yn+1y_{1}\leq y_{2}\leq\dots\leq y_{n+1}. A computation shows that

(n+1)2W22(1n+1k=1n+1δyk,dx)\displaystyle(n+1)^{2}\cdot W_{2}^{2}\left(\frac{1}{n+1}\sum_{k=1}^{n+1}\delta_{y_{k}},dx\right) =(n+1)23+(n+1)k=1n+1yk2k=1n+1yk(2k1)\displaystyle=\frac{(n+1)^{2}}{3}+(n+1)\sum_{k=1}^{n+1}y_{k}^{2}-\sum_{k=1}^{n+1}y_{k}(2k-1)
=n2W22(1nk=1nδxk,dx)+(n+1)2n23\displaystyle=n^{2}\cdot W_{2}^{2}\left(\frac{1}{n}\sum_{k=1}^{n}\delta_{x_{k}},dx\right)+\frac{(n+1)^{2}-n^{2}}{3}
+(n+1)x2+k=1nxk2\displaystyle+(n+1)x^{2}+\sum_{k=1}^{n}x_{k}^{2}
+k=1nxk(2k1)k=1n+1yk(2k1).\displaystyle+\sum_{k=1}^{n}x_{k}\cdot(2k-1)-\sum_{k=1}^{n+1}y_{k}\cdot(2k-1).

Among these terms only two, (n+1)x2(n+1)x^{2} and the last sum, actually depend on xx. The next step consists in simplifying the difference between the last two sums to exploit some additional cancellation. There are now three cases: (1) x<x1x<x_{1}, (2) x>xnx>x_{n} or (3) there exists jj such that

xjx<xj+1.x_{j}\leq x<x_{j+1}.

We will only deal with case (3), the other two cases can be dealt with analogously. Under this assumption, we necessarily have

yi={xiifijxifi=j+1xi1ifij+2y_{i}=\begin{cases}x_{i}\quad&\mbox{if}~{}i\leq j\\ x\qquad&\mbox{if}~{}i=j+1\\ x_{i-1}\qquad&\mbox{if}~{}i\geq j+2\end{cases}

Thus we have

k=1n+1yk(2k1)k=1nxk(2k1)\displaystyle\sum_{k=1}^{n+1}y_{k}(2k-1)-\sum_{k=1}^{n}x_{k}(2k-1) =(2j+1)x+2kj+1xk\displaystyle=(2j+1)x+2\sum_{k\geq j+1}x_{k}
=x+2k=1nmax{x,xk}.\displaystyle=x+2\sum_{k=1}^{n}\max\left\{x,x_{k}\right\}.

Collecting all the terms that actually depend on xx, our goal is to minimize

F(x)=(n+1)x2x2i=1nmax{x,xi}.F(x)=(n+1)x^{2}-x-2\sum_{i=1}^{n}\max\left\{x,x_{i}\right\}.

which is exactly the Kritzinger functional. ∎

Kritzinger’s functional can be derived from trying to minimize the L2L^{2}-discrepancy

01|#{1kn:xkx}nx|2𝑑x.\int_{0}^{1}\left|\#\left\{1\leq k\leq n:x_{k}\leq x\right\}-nx\right|^{2}dx.

In hindsight, the fact that these functionals are related is perhaps not too surprising: the fact that Wasserstein distances can be equivalently defined over empirical cumulative distribution functions is a feature of us working in one dimension (see, for example, Vallender [28]). Another quick proof of Theorem 1 could be given by appealing to the formula (see e.g. [18, Equation 10]) that for ordered points

01|#{1kn:xkx}nx|2𝑑x=nk=1n(xk2k12n)2+112\int_{0}^{1}\left|\#\left\{1\leq k\leq n:x_{k}\leq x\right\}-nx\right|^{2}dx=n\sum_{k=1}^{n}\left(x_{k}-\frac{2k-1}{2n}\right)^{2}+\frac{1}{12}

while

n2W2(1nk=1nδxk,dx)2=n23+nk=1nxk2k=1nxk(2k1).n^{2}\cdot W_{2}\left(\frac{1}{n}\sum_{k=1}^{n}\delta_{x_{k}},dx\right)^{2}=\frac{n^{2}}{3}+n\sum_{k=1}^{n}x_{k}^{2}-\sum_{k=1}^{n}x_{k}(2k-1).

This quickly implies that

01|#{1kn:xkx}nx|2𝑑xn2W22(1nk=1nδxk,dx)=cn\int_{0}^{1}\left|\#\left\{1\leq k\leq n:x_{k}\leq x\right\}-nx\right|^{2}dx-n^{2}\cdot W_{2}^{2}\left(\frac{1}{n}\sum_{k=1}^{n}\delta_{x_{k}},dx\right)=c_{n}

for a constant cnc_{n} that is independent of the set of points. Therefore, minimizing one of the functionals is equivalent to minimizing the other. The explicit form of FF allows to prove that minimizers are rational numbers.

Proposition (Kritzinger [18]).

Let 0x1x2xn10\leq x_{1}\leq x_{2}\leq\dots\leq x_{n}\leq 1 and let

F(x)=(n+1)x2x2k=1nmax{x,xk}.F(x)=(n+1)x^{2}-x-2\sum_{k=1}^{n}\max\left\{x,x_{k}\right\}.

Then the minimizer is different from all the existing points and of the form x=j/(2n)x=j/(2n) for for some odd integer j2+1j\in 2\mathbb{Z}+1.

Proof.

Note that the function FF is continuous, not differentiable in the points x1,,xnx_{1},\dots,x_{n} but differentiable everywhere else. In particular, if xx is different from the nn points, then

F(x)=2(n+1)x12#{1kn:xk<x}.F^{\prime}(x)=2(n+1)x-1-2\cdot\#\left\{1\leq k\leq n:x_{k}<x\right\}.

The only candidates for roots outside of the existing points are numbers of the form

x=1+2#{1kn:xk<x}2n+2x=\frac{1+2\#\left\{1\leq k\leq n:x_{k}<x\right\}}{2n+2}

which is exactly of the desired form. However, we also note that FF^{\prime} is piecewise linear between the points. If it were now the case that FF assumes its global minimum in xkx_{k}, then we would have to have F(xkε)0F^{\prime}(x_{k}-\varepsilon)\leq 0 and F(xk+ε)0F^{\prime}(x_{k}+\varepsilon)\geq 0 for sufficiently small values of ε\varepsilon which is ruled out by the explicit form of FF^{\prime}. ∎

2.2. Some Preparatory Work.

Since we now know that minimizing the W2W_{2} cost is equivalent to minimizing the L2L^{2}-discrepancy, we will instead work with this new, renormalized functional. To this end we introduce, for any given set of nn points in the unit interval, the counting function

fn(x)=#{1kn:xkx}.f_{n}(x)=\#\left\{1\leq k\leq n:x_{k}\leq x\right\}.

We will also use, throughout the rest of the paper, the rescaled version

gn(x)=#{1kn:xkx}nx.g_{n}(x)=\#\left\{1\leq k\leq n:x_{k}\leq x\right\}-nx.

Assume xn+1x_{n+1} is being added. A straightforward computation shows that

01(fn+1(x)(n+1)x)2𝑑x\displaystyle\int_{0}^{1}(f_{n+1}(x)-(n+1)x)^{2}dx =0xn+1((fn(x)nx)x)2𝑑x\displaystyle=\int_{0}^{x_{n+1}}((f_{n}(x)-nx)-x)^{2}dx
+xn+11((fn(x)nx)+(1x))2𝑑x\displaystyle+\int_{x_{n+1}}^{1}((f_{n}(x)-nx)+(1-x))^{2}dx
=01(fn(x)nx)2𝑑x20xn+1(fn(x)nx)x𝑑x\displaystyle=\int_{0}^{1}(f_{n}(x)-nx)^{2}dx-2\int_{0}^{x_{n+1}}(f_{n}(x)-nx)x~{}dx
+2xn+11(fn(x)nx)(1x)𝑑x+xn+133+(1xn+1)33.\displaystyle+2\int_{x_{n+1}}^{1}(f_{n}(x)-nx)(1-x)dx+\frac{x_{n+1}^{3}}{3}+\frac{(1-x_{n+1})^{3}}{3}.

Note that some of these terms do not depend on xn+1x_{n+1} while others do. Collecting only those terms that the depend on xn+1x_{n+1} shows that one needs to solve

E(z)+z3+(1z)33min,E(z)+\frac{z^{3}+(1-z)^{3}}{3}\rightarrow\min,

where the function EE is defined as

E(z)=20z(fn(x)nx)x𝑑x+2z1(fn(x)nx)(1x)𝑑x.\displaystyle E(z)=-2\int_{0}^{z}(f_{n}(x)-nx)x~{}dx+2\int_{z}^{1}(f_{n}(x)-nx)(1-x)~{}dx.

For the remainder of the paper, we will work with this representation.

2.3. An Elementary Lemma

The purpose of this section is to introduce a basic Lemma and to use it to prove the missing part of Kritzinger’s Theorem: that

01(fn(x)nx)2𝑑xn3+c,\int_{0}^{1}(f_{n}(x)-nx)^{2}dx\leq\frac{n}{3}+c,

where c>0c>0 only depends on the initial configuration. Subsequent arguments will then both use and refine this step.

Lemma (Basic Form).

Let g:[0,1]g:[0,1]\rightarrow\mathbb{R} be continuous. Then

max0z10zg(x)x𝑑xz1g(x)(1x)𝑑x0.\max_{0\leq z\leq 1}~{}\int_{0}^{z}g(x)x~{}dx-\int_{z}^{1}g(x)(1-x)~{}dx\geq 0.
Proof.

Using integration by parts

0zg(x)x𝑑xz1g(x)(1x)𝑑x\displaystyle\int_{0}^{z}g(x)x~{}dx-\int_{z}^{1}g(x)(1-x)~{}dx =0zg(x)x𝑑x+z1g(x)(x1)𝑑x\displaystyle=\int_{0}^{z}g(x)x~{}dx+\int_{z}^{1}g(x)(x-1)~{}dx
=01g(x)x𝑑xz1g(x)𝑑x.\displaystyle=\int_{0}^{1}g(x)xdx-\int_{z}^{1}g(x)dx.

Introducing the anti-derivative G:[0,1]G:[0,1]\rightarrow\mathbb{R} via G(s)=0sg(x)𝑑xG(s)=\int_{0}^{s}g(x)dx we have, integrating again by parts,

01g(x)x𝑑x=G(x)x|0101G(x)𝑑x=G(1)01G(x)𝑑x\displaystyle\int_{0}^{1}g(x)xdx=G(x)x\big{|}_{0}^{1}-\int_{0}^{1}G(x)dx=G(1)-\int_{0}^{1}G(x)dx

as well as z1g(x)𝑑x=G(1)G(z).\int_{z}^{1}g(x)dx=G(1)-G(z). Thus

0zg(x)x𝑑xz1g(x)(1x)𝑑x\displaystyle\int_{0}^{z}g(x)x~{}dx-\int_{z}^{1}g(x)(1-x)~{}dx =G(z)01G(x)𝑑x.\displaystyle=G(z)-\int_{0}^{1}G(x)dx.

Every function has to be at least as large as its average somewhere and

max0z1G(z)01G(x)𝑑x0.\max_{0\leq z\leq 1}G(z)-\int_{0}^{1}G(x)dx\geq 0.

One sees that the proof is merely a sequence of identities: the part where one can hope to gain is the last step which is clearly only sharp whenever GG is constant. The elementary Lemma now implies the missing part of Corollary 1.

Proposition.

We have

01(fn+1(x)(n+1)x)2𝑑x01(fn(x)nx)2𝑑x+13.\int_{0}^{1}(f_{n+1}(x)-(n+1)x)^{2}dx\leq\int_{0}^{1}(f_{n}(x)-nx)^{2}dx+\frac{1}{3}.
Proof.

Note that we have

01(fn+1(x)(n+1)x)2𝑑x\displaystyle\int_{0}^{1}(f_{n+1}(x)-(n+1)x)^{2}dx =01(fn(x)nx)2𝑑x2E(xn+1)\displaystyle=\int_{0}^{1}(f_{n}(x)-nx)^{2}dx-2\cdot E(x_{n+1})
+xn+13+(1xn+1)33.\displaystyle+\frac{x_{n+1}^{3}+(1-x_{n+1})^{3}}{3}.

There exists xn+1x_{n+1} such that E(xn+1)0E(x_{n+1})\geq 0 from which the result follows since

0z1z3+(1z)3313\forall~{}0\leq z\leq 1\qquad\frac{z^{3}+(1-z)^{3}}{3}\leq\frac{1}{3}

2.4. A refined Lemma.

The purpose of this section is to establish what is perhaps the core of the improvement: instead of merely using that a function has to exceed its average, as in the proof of the basic Lemma, we quantify the gain from below.

Lemma (Main Lemma).

Let g:[0,1]g:[0,1]\rightarrow\mathbb{R} be continuous and g0g\not\equiv 0. Then

max0z10zg(x)x𝑑xz1g(x)(1x)𝑑x1161gL22(max0x1|0xg(y)𝑑y|)3.\max_{0\leq z\leq 1}~{}\int_{0}^{z}g(x)x~{}dx-\int_{z}^{1}g(x)(1-x)~{}dx\geq\frac{1}{16}\frac{1}{\|g\|_{L^{2}}^{2}}\left(\max_{0\leq x\leq 1}\left|\int_{0}^{x}g(y)dy\right|\right)^{3}.
Proof.

Using again the antiderivative

G(x)=0xg(y)𝑑yG(x)=\int_{0}^{x}g(y)dy

the computations in the basic Lemma already established that

max0z10zg(x)x𝑑xz1g(x)(1x)𝑑x=max0z1G(z)01G(x)𝑑x0.\max_{0\leq z\leq 1}~{}\int_{0}^{z}g(x)x~{}dx-\int_{z}^{1}g(x)(1-x)~{}dx=\max_{0\leq z\leq 1}G(z)-\int_{0}^{1}G(x)dx\geq 0.

We will work with this expression instead. In order to emphasize that the integral in this expression is truly a constant, the average value, we will abbreviate

G¯=01G(x)𝑑x\overline{G}=\int_{0}^{1}G(x)dx\in\mathbb{R}

We start by arguing that if GG assumes a value much smaller than the average, then it also has to assume values that are larger than the average. The difficulty here is that ‘much smaller than the average’ is a pointwise statement which we have to turn into a statement about the integral. This is summarized in the following statement.

Fact. We have

max0z1G(z)G¯18gL22(G¯min0x1G(x))3.\max_{0\leq z\leq 1}G(z)-\overline{G}\geq\frac{1}{8\|g\|_{L^{2}}^{2}}\cdot\left(\overline{G}-\min_{0\leq x\leq 1}G(x)\right)^{3}.
Proof of the Fact..

For simplicity of exposition,we abbreviate

M=G¯min0x1G(x)0.M=\overline{G}-\min_{0\leq x\leq 1}G(x)\geq 0.

Let us assume the minimum is assumed in

G(z1)=min0x1G(x).G(z_{1})=\min_{0\leq x\leq 1}G(x).

Let JJ be the shortest interval containing both z1z_{1} and a point 0z210\leq z_{2}\leq 1 for which

G(z2)=G(z1)+M2.G(z_{2})=G(z_{1})+\frac{M}{2}.

The existence of such a point z2z_{2} follows from the fact that GG is continuous and

G(z1)G(z1)+M2G(z1)+M=G¯G(z_{1})\leq G(z_{1})+\frac{M}{2}\leq G(z_{1})+M=\overline{G}

and the function GG assumes both the value G(z1)G(z_{1}) and the value G¯\overline{G} and the intermediate value theorem applies. If M=0M=0 there is nothing to prove, thus we can assume M0M\neq 0 implying z1z2z_{1}\neq z_{2}. The shortest interval JJ containing both z1z_{1} has positive length and z1,z2z_{1},z_{2} are its endpoints. Using Cauchy-Schwarz on JJ,

M2\displaystyle\frac{M}{2} =|G(z2)G(z1)|=|Jg(x)𝑑x|\displaystyle=\left|G(z_{2})-G(z_{1})\right|=\left|\int_{J}g(x)dx\right|
|J|1/2(Jg(x)2𝑑x)1/2|J|1/2(01g(x)2𝑑x)1/2=|J|1/2gL2.\displaystyle\leq|J|^{1/2}\left(\int_{J}g(x)^{2}dx\right)^{1/2}\leq|J|^{1/2}\left(\int_{0}^{1}g(x)^{2}dx\right)^{1/2}=|J|^{1/2}\cdot\|g\|_{L^{2}}.

This provides an upper bound on MM. Our next goal will be to show that |J||J| cannot be too long: if it is long, then max0z1G(z)G¯\max_{0\leq z\leq 1}G(z)-\overline{G} has to be large which is exactly what we are trying to show. This follows at once from

G¯\displaystyle\overline{G} =01G(x)𝑑x=JG(x)𝑑x+[0,1]JG(x)𝑑x\displaystyle=\int_{0}^{1}G(x)dx=\int_{J}G(x)dx+\int_{[0,1]\setminus J}G(x)dx
(G¯M2)|J|+(1|J|)(G¯+[max0z1G(z)G¯])\displaystyle\leq\left(\overline{G}-\frac{M}{2}\right)\cdot|J|+(1-|J|)\cdot\left(\overline{G}+\left[\max_{0\leq z\leq 1}G(z)-\overline{G}\right]\right)
G¯M2|J|+[max0z1G(z)G¯]\displaystyle\leq\overline{G}-\frac{M}{2}\cdot|J|+\left[\max_{0\leq z\leq 1}G(z)-\overline{G}\right]

from which we infer that

max0z1G(z)G¯|J|M2.\max_{0\leq z\leq 1}G(z)-\overline{G}\geq|J|\cdot\frac{M}{2}.

We can now combine these bounds. We have

M24gL22|J|2M[max0z1G(z)G¯]\frac{M^{2}}{4\|g\|_{L^{2}}^{2}}\leq|J|\leq\frac{2}{M}\cdot\left[\max_{0\leq z\leq 1}G(z)-\overline{G}\right]

which implies that

max0z1G(z)G¯M38gL22.\max_{0\leq z\leq 1}G(z)-\overline{G}\geq\frac{M^{3}}{8\|g\|_{L^{2}}^{2}}.

The fact will now allow us to conclude the proof of the Main Lemma. Consider the values assumed by GG. Since GG is continuous, we have

{G(x):0x1}=[A,B],\left\{G(x):0\leq x\leq 1\right\}=[A,B],

where AG¯BA\leq\overline{G}\leq B. We introduce Y=BAY=B-A. As a first observation, we have

max0x1|0xg(y)𝑑y|=max0x1|G(x)|.\max_{0\leq x\leq 1}\left|\int_{0}^{x}g(y)dy\right|=\max_{0\leq x\leq 1}|G(x)|.

Note that G(0)=0G(0)=0 and thus A0BA\leq 0\leq B from which we deduce that

Y2max0x1|0xg(y)𝑑y|=max0x1|G(x)|Y.\frac{Y}{2}\leq\max_{0\leq x\leq 1}\left|\int_{0}^{x}g(y)dy\right|=\max_{0\leq x\leq 1}|G(x)|\leq Y.

We will now show that a large value of YY will imply that GG has to exceed its maximum by at least a little bit (depending on YY). For this, we write Y=Y1+Y2Y=Y_{1}+Y_{2}, where Y1=BG¯Y_{1}=B-\overline{G} and Y2=G¯AY_{2}=\overline{G}-A. Then, tautologically,

max0z1G(z)G¯=Y1.\max_{0\leq z\leq 1}G(z)-\overline{G}=Y_{1}.

Simultaneously, appealing to the fact proved above, we have

max0z1G(z)G¯Y238gL22.\max_{0\leq z\leq 1}G(z)-\overline{G}\geq\frac{Y_{2}^{3}}{8\|g\|_{L^{2}}^{2}}.

Altogether, we have

max0z1G(z)G¯max{Y1,Y238gL22}=max{Y1,(YY1)38gL22}\max_{0\leq z\leq 1}G(z)-\overline{G}\geq\max\left\{Y_{1},\frac{Y_{2}^{3}}{8\|g\|_{L^{2}}^{2}}\right\}=\max\left\{Y_{1},\frac{(Y-Y_{1})^{3}}{8\|g\|_{L^{2}}^{2}}\right\}

which we turn into a smoother object via

max0z1G(z)G¯12(Y1+(YY1)38gL22).\max_{0\leq z\leq 1}G(z)-\overline{G}\geq\frac{1}{2}\left(Y_{1}+\frac{(Y-Y_{1})^{3}}{8\|g\|_{L^{2}}^{2}}\right).

Differentiation shows that

Y1[Y1+(YY1)38gL22]=1+38(YY1)2gL221\frac{\partial}{\partial Y_{1}}\left[Y_{1}+\frac{(Y-Y_{1})^{3}}{8\|g\|_{L^{2}}^{2}}\right]=1+\frac{3}{8}\frac{(Y-Y_{1})^{2}}{\|g\|_{L^{2}}^{2}}\geq 1

which shows that the minimum is assumed for Y1=0Y_{1}=0. Thus

max0z1G(z)G¯116Y3gL22.\max_{0\leq z\leq 1}G(z)-\overline{G}\geq\frac{1}{16}\frac{Y^{3}}{\|g\|_{L^{2}}^{2}}.

Altogether, we obtain, as desired,

max0z1G(z)G¯1161gL22(max0x1|0xg(y)𝑑y|)3.\max_{0\leq z\leq 1}G(z)-\overline{G}\geq\frac{1}{16}\frac{1}{\|g\|_{L^{2}}^{2}}\left(\max_{0\leq x\leq 1}\left|\int_{0}^{x}g(y)dy\right|\right)^{3}.

2.5. Proof of Theorem 2

Proof.

Suppose now that (xk)k=1(x_{k})_{k=1}^{\infty} is a Kritzinger sequence. We note that

01(fn(x)nx)2𝑑x=01gn(x)2𝑑xn3+c,\int_{0}^{1}(f_{n}(x)-nx)^{2}dx=\int_{0}^{1}g_{n}(x)^{2}dx\leq\frac{n}{3}+c,

where cc only depends on the initial configuration. Suppose now there exists NN\in\mathbb{N} such that for all nNn\geq N we have

max0x1|0xgn(y)𝑑y|2n1/3.\max_{0\leq x\leq 1}\left|\int_{0}^{x}g_{n}(y)dy\right|\geq 2\cdot n^{1/3}.

We will use this to deduce a contradiction. We have

01gn+1(x)2𝑑x\displaystyle\int_{0}^{1}g_{n+1}(x)^{2}dx 01gn(x)2𝑑x+13\displaystyle\leq\int_{0}^{1}g_{n}(x)^{2}dx+\frac{1}{3}
max0z12(0zgn(x)x𝑑xz1gn(x)(1x)𝑑x).\displaystyle-\max_{0\leq z\leq 1}2\left(\int_{0}^{z}g_{n}(x)x~{}dx-\int_{z}^{1}g_{n}(x)(1-x)~{}dx\right).

The Main Lemma implies

max0z12(0zgn(x)x𝑑xz1gn(x)(1x)𝑑x)\displaystyle\max_{0\leq z\leq 1}2\left(\int_{0}^{z}g_{n}(x)x~{}dx-\int_{z}^{1}g_{n}(x)(1-x)~{}dx\right) 181gnL22(max0x1|0xgn(y)𝑑y|)3\displaystyle\geq\frac{1}{8}\frac{1}{\|g_{n}\|_{L^{2}}^{2}}\left(\max_{0\leq x\leq 1}\left|\int_{0}^{x}g_{n}(y)dy\right|\right)^{3}
181n3+c8n=nn/3+c.\displaystyle\geq\frac{1}{8}\frac{1}{\frac{n}{3}+c}8n=\frac{n}{n/3+c}.

However, for nn sufficiently large (say nN1n\geq N_{1} where N1N_{1} depends only on cc), this implies, for all nmax(N,N1)n\geq\max(N,N_{1})

01gn+1(x)2𝑑x01gn(x)2𝑑x2\int_{0}^{1}g_{n+1}(x)^{2}dx\leq\int_{0}^{1}g_{n}(x)^{2}dx-2

which leads to a contradiction since these integrals are always positive. ∎

Remark. We note that this allows for a small refinement: the argument shows that for all NN sufficiently large, there always exist Nn100NN\leq n\leq 100N for which

max0x1|0xgn(y)𝑑y|2n1/3.\max_{0\leq x\leq 1}\left|\int_{0}^{x}g_{n}(y)dy\right|\leq 2\cdot n^{1/3}.

References

  • [1] T. van Aardenne-Ehrenfest, Proof of the Impossibility of a Just Distribution of an Infinite Sequence Over an Interval, Proc. Kon. Ned. Akad. Wetensch. 48, 3-8, 1945.
  • [2] J. Beck, A two-dimensional van Aardenne-Ehrenfest theorem in irregularities of distribution. Compositio Math. 72 3, 269–339 (1989).
  • [3] J. Beck and W. Chen, Irregularities of Distribution, Cambridge Tracts in Mathematics (No. 89), Cambridge University Press, 1987.
  • [4] J. Beck, The modulus of polynomials with zeros on the unit circle: A problem of Erdos, Annals of Mathematics, 134 (1991), p. 609–651
  • [5] D. Bilyk and M. Lacey, On the small ball Inequality in three dimensions, Duke Math. J. 143 (2008), no. 1, 81–115.
  • [6] D. Bilyk, M. Lacey and A. Vagharshakyan, On the small ball inequality in all dimensions, J. Funct. Anal. 254 (2008), no. 9, 2470–2502.
  • [7] L. Brown and S. Steinerberger, Positive-definite Functions, Exponential Sums and the Greedy Algorithm: a curious Phenomenon, Journal of Complexity 61 (2020), 101485
  • [8] L. Brown and S. Steinerberger, On the Wasserstein distance between classical sequences and the Lebesgue measure, Trans. Amer. Math. Soc. 373 (2020), p. 8943–8962.
  • [9] B. Chazelle, The discrepancy method. Randomness and complexity. Cambridge University Press, Cambridge, 2000.
  • [10] J. van der Corput, Verteilungsfunktionen I, Proc. Akad. Wetensch. , 38 (1935), 813–821.
  • [11] J. van der Corput, Verteilungsfunktionen II, Akad. Wetensch., Proc. 38 (1935), 1058–1068
  • [12] J. Dick and F. Pillichshammer, Digital nets and sequences. Discrepancy theory and quasi-Monte Carlo integration. Cambridge University Press, Cambridge, 2010.
  • [13] M. Drmota, R. Tichy, Sequences, discrepancies and applications. Lecture Notes in Mathematics, 1651. Springer-Verlag, Berlin, 1997.
  • [14] P. Erdős, Some unsolved problems, Michigan Math. J. 4 (1957), p. 291–300
  • [15] C. Graham, Irregularity of distribution in Wasserstein distance. Journal of Fourier Analysis and Applications, 26 (2020), p. 1-21.
  • [16] G. Larcher and F. Puchhammer, An improved bound for the star discrepancy of sequences in the unit interval, Unif. Distrib. Theory 11 (2016), p. 1–14
  • [17] L. Kuipers and H. Niederreiter, Uniform distribution of sequences. Pure and Applied Mathematics. Wiley-Interscience, New York-London-Sydney, 1974.
  • [18] R. Kritzinger, Uniformly distributed sequences generated by a greedy minimization of the L2L_{2} discrepancy, arXiv:2109.06298.
  • [19] V. Ostromoukhov. Recent progress in improvement of extreme discrepancy and star discrepancy of one-dimensional sequences. MCQMC 2008, pages 561–572.
  • [20] F. Pausinger, Greedy energy minimization can count in binary: point charges and the van der Corput sequence, Annali di Matematica Pura ed Applicata 200 (2021), p. 165–186.
  • [21] K. F. Roth, On irregularities of distribution. Mathematika 1, 73–79 (1954).
  • [22] W. Schmidt, Irregularities of distribution. VII. Acta Arith. 21 (1972), 45–50.
  • [23] S. Steinerberger, A Nonlocal Functional promoting Low-Discrepancy Point Sets, Journal of Complexity 54 (2019), 101410
  • [24] S. Steinerberger, Dynamically Defined Sequences with Small Discrepancy, Monatshefte fur Mathematik 191 (2020), p. 639–655
  • [25] S. Steinerberger, Polynomials with zeros on the unit circle: regularity of Leja sequences, Mathematika 67 (2021), p. 553–568
  • [26] S. Steinerberger, Wasserstein distance, Fourier series and applications, Monatshefte fur Mathematik 194 (2021), p. 305-338
  • [27] S. Steinerberger, A Wasserstein inequality and minimal Green energy on compact manifolds, Journal of Functional Analysis 281(2021):109076
  • [28] S. Vallender, Calculation of the Wasserstein Distance Between Probability Distributions on the Line Theory of Probability & Its Applications 18 (1974), p. 435.