
Pointwise convergence of ergodic averages with Möbius weight

Joni Teräväinen Department of Mathematics and Statistics
University of Turku, 20014 Turku
Finland
Abstract.

Let $(X,\nu,T)$ be a measure-preserving system, and let $P_{1},\ldots,P_{k}$ be polynomials with integer coefficients. We prove that, for any $f_{1},\ldots,f_{k}\in L^{\infty}(X)$, the Möbius-weighted polynomial multiple ergodic averages

\[\frac{1}{N}\sum_{n\leq N}\mu(n)f_{1}(T^{P_{1}(n)}x)\cdots f_{k}(T^{P_{k}(n)}x)\]

converge to $0$ pointwise almost everywhere. Specialising to $P_{1}(y)=y$, $P_{2}(y)=2y$, this solves a problem of Frantzikinakis. We also prove pointwise convergence for a more general class of multiplicative weights for multiple ergodic averages involving distinct degree polynomials. For the proofs we establish some quantitative generalised von Neumann theorems for polynomial configurations that are of independent interest.

1. Introduction

Let $(X,\nu)$ be a probability space and $T\colon X\to X$ an invertible measure-preserving map, meaning that $\nu(T^{-1}(E))=\nu(E)$ for all measurable sets $E\subset X$. The triple $(X,\nu,T)$ is called a measure-preserving system. Given functions $f_{1},\ldots,f_{k}\in L^{\infty}(X)$ and polynomials $P_{1},\ldots,P_{k}\in\mathbb{Z}[y]$, we form the polynomial multiple ergodic averages

\[\frac{1}{N}\sum_{n\leq N}f_{1}(T^{P_{1}(n)}x)\cdots f_{k}(T^{P_{k}(n)}x),\quad x\in X.\tag{1.1}\]

The convergence properties of these averages as $N\to\infty$ have been studied intensively. The question of their $L^{2}(X)$ norm convergence was settled by Host–Kra [20] and Leibman [24] after substantial progress by several authors. The question of pointwise convergence, however, remains open and is the subject of the celebrated Furstenberg–Bergelson–Leibman conjecture [2, Section 5.5]. Pointwise convergence has so far been established in two notable cases, namely the case where $k=1$ and the case where $k=2$ and $P_{1}$ is linear. The former follows from celebrated work of Bourgain [3], and the latter follows from another well-known work of Bourgain [4] if $P_{2}$ is linear and from a recent breakthrough of Krause, Mirek and Tao [22] if $P_{2}$ has degree at least $2$.

Let $\mu$ be the Möbius function, defined by $\mu(n)=(-1)^{k}$ if $n$ is the product of $k$ distinct primes and by $\mu(n)=0$ if $n$ is divisible by the square of a prime. In this paper, we shall consider the Möbius-weighted polynomial multiple ergodic averages

\[\frac{1}{N}\sum_{n\leq N}\mu(n)f_{1}(T^{P_{1}(n)}x)\cdots f_{k}(T^{P_{k}(n)}x),\quad x\in X.\]
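For concreteness, the following Python sketch (an illustration only, not used anywhere in the arguments) evaluates these Möbius-weighted averages for the circle rotation $Tx=x+\alpha\ (\mathrm{mod}\ 1)$ on $X=\mathbb{R}/\mathbb{Z}$. The helper names `mobius_sieve` and `mobius_weighted_average`, the choice of system and the test functions are all assumptions made for the example.

```python
import math


def mobius_sieve(limit):
    """Return a list mu with mu[n] equal to the Möbius function of n for 1 <= n <= limit."""
    mu = [1] * (limit + 1)
    mu[0] = 0  # index 0 is unused
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            # Each prime divisor flips the sign once.
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]
            # Integers divisible by p^2 are not squarefree.
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0
    return mu


def mobius_weighted_average(N, x, alpha, polys, funcs, mu):
    """Average of mu(n) f_1(T^{P_1(n)} x) ... f_k(T^{P_k(n)} x) over n <= N,
    for the (assumed) rotation T x = x + alpha mod 1 on the circle."""
    total = 0.0
    for n in range(1, N + 1):
        if mu[n] == 0:
            continue
        term = float(mu[n])
        for P, f in zip(polys, funcs):
            term *= f((x + P(n) * alpha) % 1.0)
        total += term
    return total / N


if __name__ == "__main__":
    N = 20000
    mu = mobius_sieve(N)
    polys = [lambda n: n, lambda n: 2 * n]  # P_1(y) = y, P_2(y) = 2y
    funcs = [lambda t: math.cos(2 * math.pi * t)] * 2
    print(mobius_weighted_average(N, 0.1, math.sqrt(2), polys, funcs, mu))
```

A single system and a single starting point are of course no evidence for an almost-everywhere statement; the sketch is only meant to make the objects concrete.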

Based on the Möbius randomness principle (see [21, Section 13]), it is natural to conjecture the following.

Conjecture 1.1.

Let $k\in\mathbb{N}$, and let $P_{1},\ldots,P_{k}$ be polynomials with integer coefficients. Let $(X,\nu,T)$ be a measure-preserving system. Then, for any $f_{1},\ldots,f_{k}\in L^{\infty}(X)$, we have

\[\lim_{N\to\infty}\frac{1}{N}\sum_{n\leq N}\mu(n)f_{1}(T^{P_{1}(n)}x)\cdots f_{k}(T^{P_{k}(n)}x)=0\]

for almost all $x\in X$.

The corresponding $L^{2}(X)$ norm convergence result was proven by Frantzikinakis and Host [10] (this could also be deduced from the proof of [5, Theorem 1.3], combined with [17, Theorem 1.1]). The only case we are aware of where Conjecture 1.1 was previously known is the case $k=1$, proven in [7, Theorem 2.2] (see also [8, Proposition 3.1] for the case of a linear polynomial and [10, Theorem C] for a generalisation of that result to other multiplicative functions).

Our first main theorem settles Conjecture 1.1 in full. In fact, somewhat unusually for ergodic theorems, we get a quantitative (polylogarithmic) rate of convergence. This result goes beyond what is currently known about the unweighted averages (1.1).

Theorem 1.2.

Let $k\in\mathbb{N}$, and let $P_{1},\ldots,P_{k}$ be polynomials with integer coefficients. Let $1<q_{1},\ldots,q_{k}\leq\infty$ satisfy $\frac{1}{q_{1}}+\cdots+\frac{1}{q_{k}}<1$. Let $(X,\nu,T)$ be a measure-preserving system. Then, for any $f_{1}\in L^{q_{1}}(X),\ldots,f_{k}\in L^{q_{k}}(X)$ and $A\geq 0$, we have

\[\lim_{N\to\infty}\frac{(\log N)^{A}}{N}\sum_{n\leq N}\mu(n)f_{1}(T^{P_{1}(n)}x)\cdots f_{k}(T^{P_{k}(n)}x)=0\]

for almost all $x\in X$.

Let us make a few remarks about Theorem 1.2.

  1. (1)

    Specialising to $k=2$ and $P_{1}(y)=y$, $P_{2}(y)=2y$ (and $A=0$, $q_{1}=q_{2}=\infty$), Theorem 1.2 settles the Möbius case of Problem 12 of Frantzikinakis’ open problems survey [9]. [Footnote 1: Frantzikinakis also asked about the convergence of the same multiple ergodic averages weighted by any multiplicative function $g\colon\mathbb{N}\to\mathbb{C}$, taking values in the unit disc, which has a mean value in every arithmetic progression. Theorem 1.6 below makes progress on this more general question.] This problem was also stated by Frantzikinakis and Host in [10].

  2. (2)

    In the case where the $P_{i}$ are linear, we can allow iterates of different commuting transformations in the result; see Theorem 1.3 below.

  3. (3)

    It is likely that, at least for $k=2$ and $P_{1}$ linear, the region of $(q_{1},\ldots,q_{k})$ in the theorem could be improved to $\frac{1}{q_{1}}+\cdots+\frac{1}{q_{k}}\leq 1+\delta_{d}$ for some $\delta_{d}>0$, hence “breaking duality” in this problem. This is thanks to the $L^{p}$-improving estimates of Lacey [23] (see also [6]) and Han–Kovač–Lacey–Madrid–Yang [19]. See also [22, Section 11]. We leave the details to the interested reader.

  4. (4)

    The theorem continues to hold if we replace the Möbius function $\mu$ with the Liouville function $\lambda$ (defined by $\lambda(n)=(-1)^{\Omega(n)}$, where $\Omega(n)$ is the number of prime factors of $n$ with multiplicities); see the more general Theorem 7.1 along with Remark 5.3. Similarly, all the results in this paper regarding the Möbius function also hold for the Liouville function.

1.1. Ergodic averages with commuting transformations

Let a probability space $(X,\nu)$ and invertible, commuting, measure-preserving maps $T_{1},\ldots,T_{k}\colon X\to X$ be given. For any polynomials $P_{1},\ldots,P_{k}$ with integer coefficients and $f_{1},\ldots,f_{k}\in L^{\infty}(X)$, one can consider polynomial ergodic averages with commuting transformations

\[\frac{1}{N}\sum_{n\leq N}f_{1}(T_{1}^{P_{1}(n)}x)\cdots f_{k}(T_{k}^{P_{k}(n)}x),\quad x\in X\tag{1.2}\]

and their Möbius-weighted versions

\[\frac{1}{N}\sum_{n\leq N}\mu(n)f_{1}(T_{1}^{P_{1}(n)}x)\cdots f_{k}(T_{k}^{P_{k}(n)}x),\quad x\in X.\tag{1.3}\]

The commuting case seems to be rather more difficult, since in the unweighted case (1.2) pointwise convergence is not currently known even for $k=2$ and $P_{1},P_{2}$ linear (see Problem 19 of [9]). However, we mention that Walsh [35] proved $L^{2}(X)$ norm convergence of the averages (1.2) in a groundbreaking work, and the Furstenberg–Bergelson–Leibman conjecture asserts that pointwise convergence should hold also with commuting transformations.

Our next theorem states that for the Möbius averages (1.3) we have pointwise convergence in the case where the $P_{i}$ are linear.

Theorem 1.3.

Let $k\in\mathbb{N}$. Let $(X,\nu)$ be a probability space and let $T_{1},\ldots,T_{k}\colon X\to X$ be invertible, commuting, measure-preserving maps. Let $1<q_{1},\ldots,q_{k}\leq\infty$ satisfy $\frac{1}{q_{1}}+\cdots+\frac{1}{q_{k}}<1$, and let $f_{1}\in L^{q_{1}}(X),\ldots,f_{k}\in L^{q_{k}}(X)$. Then, for any $A\geq 0$, we have

\[\lim_{N\to\infty}\frac{(\log N)^{A}}{N}\sum_{n\leq N}\mu(n)f_{1}(T_{1}^{n}x)\cdots f_{k}(T_{k}^{n}x)=0\]

for almost all $x\in X$.

Naturally, this theorem implies Theorem 1.2 for linear $P_{i}$ by taking $T_{1},\ldots,T_{k}$ to be powers of the same transformation.

It seems likely that also the general case of Theorem 1.2 could be obtained for commuting transformations by extending Theorem 4.2, which goes into its proof, to multivariate functions (which could likely be done with a more complicated PET induction scheme). We leave the details to the interested reader.

1.2. Multiplicative weights

We also consider more general weighted polynomial multiple ergodic averages

\[\frac{1}{N}\sum_{n\leq N}g(n)f_{1}(T^{P_{1}(n)}x)\cdots f_{k}(T^{P_{k}(n)}x),\quad x\in X,\tag{1.4}\]

with $g\colon\mathbb{N}\to\mathbb{C}$ a function. Already for $k=1$ and $P_{1}$ linear, a necessary condition for the pointwise convergence of these averages is that $g$ has convergent means, meaning that $\lim_{N\to\infty}\frac{1}{N}\sum_{n\leq N}g(an+b)$ exists for all $a,b\in\mathbb{N}$ (this is seen by taking $X$ to be a finite set).

In Theorem 7.1 we will show, assuming a mild growth condition on $g$, that good decay bounds on the Gowers $U^{s}[N]$ norms of $g$ (or for certain weaker $u^{s}[N]$ norms that depend on the maximal correlation with polynomial phases) imply the convergence of the averages (1.4) to $0$. Such results should have applications also in cases where $g$ has no arithmetic structure (for example, for random weights); however, here we focus on applications with $g$ being multiplicative. [Footnote 2: We say that $g\colon\mathbb{N}\to\mathbb{C}$ is multiplicative if $g(mn)=g(m)g(n)$ whenever $m,n$ are coprime.]

Frantzikinakis [9] conjectured that if $g\colon\mathbb{N}\to\mathbb{C}$ is $1$-bounded, multiplicative and has convergent means, then we have the pointwise almost everywhere convergence of

\[\frac{1}{N}\sum_{n\leq N}g(n)f_{1}(T^{n}x)f_{2}(T^{2n}x).\]

There is an obvious extension of this conjecture to polynomial multiple ergodic averages.

Conjecture 1.4.

Let $k\in\mathbb{N}$, and let $P_{1},\ldots,P_{k}$ be polynomials with integer coefficients. Let $g\colon\mathbb{N}\to\mathbb{C}$ be a multiplicative function taking values in the unit disc and having convergent means. Let $(X,\nu,T)$ be a measure-preserving system. Then, for any $f_{1},\ldots,f_{k}\in L^{\infty}(X)$, the limit

\[\lim_{N\to\infty}\frac{1}{N}\sum_{n\leq N}g(n)f_{1}(T^{P_{1}(n)}x)\cdots f_{k}(T^{P_{k}(n)}x)\]

exists for almost all $x\in X$.

While we are not able to prove this statement in full, we can prove it for a natural class of multiplicative functions, namely those satisfying a Siegel–Walfisz assumption (stated below). Most practically occurring multiplicative functions of mean 0 satisfy this property, and this class arises naturally in several problems in analytic number theory, in particular in connection with the Bombieri–Vinogradov theorem (see e.g. [15]).

Definition 1.5.

We say that a function $g\colon\mathbb{N}\to\mathbb{C}$ satisfies the Siegel–Walfisz assumption if the following hold:

  1. (1)

    $g$ is divisor-bounded: for some $C\geq 0$, we have $|g(n)|\leq d(n)^{C}$ for all $n\in\mathbb{N}$, with $d(n)$ denoting the number of positive divisors of $n$.

  2. (2)

    For all $A>0$ and $N\geq 3$ we have

    \[\max_{1\leq a\leq q\leq(\log N)^{A}}\left|\sum_{\begin{subarray}{c}n\leq N\\ n\equiv a\ (\mathrm{mod}\ q)\end{subarray}}g(n)\right|\ll_{A}\frac{N}{(\log N)^{A}}.\]

Examples of multiplicative functions satisfying the Siegel–Walfisz assumption include $h(n)d(n)^{j}\chi(n)n^{it}$ for any integer $j\geq 0$, real $t$ and Dirichlet character $\chi$, where $h\colon\mathbb{N}\to\mathbb{C}$ is any bounded multiplicative function that is “pretending” to be the Möbius function in the sense that $\sum_{p}\frac{1+\textnormal{Re}(h(p))}{p}<\infty$. [Footnote 3: That these functions are examples can be verified by using Perron’s formula and standard estimates for Dirichlet $L$-functions close to the $1$-line.]
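To make condition (2) of Definition 1.5 concrete, here is a small Python sketch that computes the quantity on its left-hand side, the largest sum of $g$ over a residue class with modulus at most a given bound. The function name `max_progression_sum` and the parameter `Q` (playing the role of $(\log N)^{A}$) are assumptions for the example; a finite computation like this illustrates the definition and cannot verify an asymptotic bound.

```python
def mobius_sieve(limit):
    """Möbius function values mu[0..limit] (mu[0] is unused and set to 0)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0
    return mu


def max_progression_sum(g, N, Q):
    """Maximum over 1 <= a <= q <= Q of |sum of g(n) over n <= N with n = a mod q|.

    g is a list indexed by n (g[0] is ignored)."""
    best = 0
    for q in range(1, Q + 1):
        sums = [0] * q  # sums[r] collects the class n = r mod q (r = 0 is a = q)
        for n in range(1, N + 1):
            sums[n % q] += g[n]
        best = max(best, max(abs(s) for s in sums))
    return best


if __name__ == "__main__":
    N = 100000
    mu = mobius_sieve(N)
    # Compare the normalised maximum with 1 (the trivial bound for a 1-bounded g).
    print(max_progression_sum(mu, N, 10) / N)
```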

We are now ready to state a result on the pointwise convergence of multiple ergodic averages with a multiplicative weight satisfying the Siegel–Walfisz assumption.

Theorem 1.6.

Let $k\in\mathbb{N}$, and let $P_{1},\ldots,P_{k}$ be polynomials with integer coefficients and with distinct degrees. Let $1<q_{1},\ldots,q_{k}\leq\infty$ satisfy $\frac{1}{q_{1}}+\cdots+\frac{1}{q_{k}}<1$. Let $(X,\nu,T)$ be a measure-preserving system. Let $g\colon\mathbb{N}\to\mathbb{C}$ be a multiplicative function satisfying the Siegel–Walfisz assumption. Then, for any $f_{1}\in L^{q_{1}}(X),\ldots,f_{k}\in L^{q_{k}}(X)$, we have

\[\lim_{N\to\infty}\frac{1}{N}\sum_{n\leq N}g(n)f_{1}(T^{P_{1}(n)}x)\cdots f_{k}(T^{P_{k}(n)}x)=0\]

for almost all $x\in X$.

Note that we allow the function $g$ to be unbounded, hence proving pointwise convergence in some cases not covered by Conjecture 1.4.

Somewhat curiously, Theorem 1.6 requires an argument that is rather different from our proof of Theorem 1.2; both proofs will be discussed in Section 2.

1.3. Prime ergodic averages

The arguments presented in this paper are not limited to polynomial ergodic averages weighted by the Möbius function, and indeed apply to similar ergodic averages with any weight that satisfies certain Gowers uniformity assumptions as well as some weak upper bound assumptions that are easy to verify; see Theorem 7.1. In particular, thanks to the quantitative Gowers uniformity estimates for the von Mangoldt function in [25], these general theorems can be applied to reduce the problem of convergence of polynomial ergodic averages weighted by the primes to the problem of convergence of the same averages weighted by integers with no small prime factors, which is an easier problem (though still highly nontrivial). Pointwise convergence of polynomial multiple ergodic averages weighted by the primes will be studied in a future joint work with Krause, Mousavi and Tao.

1.4. Further applications of the proof method

Key ingredients in the proofs of our main theorems are some new polynomial generalised von Neumann theorems with quantitative dependencies that we establish in Section 4. These results are likely to have applications also to other problems, such as to bounds for sets of integers lacking progressions of the form $x,x+P_{1}(p-1),\ldots,x+P_{k}(p-1)$ with $p$ prime and with $P_{1},\ldots,P_{k}$ polynomials of distinct degrees with $P_{i}(0)=0$. Such applications will be investigated in future works.

1.5. Acknowledgements

The author thanks Nikos Frantzikinakis, Ben Krause, Sarah Peluse, Sean Prendiville and Terence Tao for helpful discussions and suggestions. The author was supported by funding from the European Union’s Horizon Europe research and innovation programme under Marie Skłodowska-Curie grant agreement No 101058904.

2. Proof ideas

We now give an overview of the arguments used to prove Theorems 1.2 and 1.6, presenting the steps of the proof in a somewhat different order than in the actual proof and focusing on the case of functions $f_{i}\in L^{\infty}(X)$ for simplicity.

We begin with Theorem 1.2. Let $P_{1},\ldots,P_{k}$ be polynomials with integer coefficients and with highest degree $d$. The first step for the proofs of our pointwise convergence results is a lacunary subsequence trick, which combined with the Borel–Cantelli lemma and Markov’s inequality reduces matters to obtaining strong quantitative pointwise bounds of the form

\[\begin{aligned} &\left|\frac{1}{N^{d+1}}\sum_{m\leq N^{d}}\sum_{n\leq N}\mu(n)\phi(T^{m}x)f_{1}(T^{m+P_{1}(n)}x)\cdots f_{k}(T^{m+P_{k}(n)}x)\right|\\ &\quad\ll\|\phi\|_{L^{\infty}(X)}\|f_{1}\|_{L^{\infty}(X)}\cdots\|f_{k}\|_{L^{\infty}(X)}(\log N)^{-A}\end{aligned}\tag{2.1}\]

for $\nu$-almost all $x$ and all $\phi\in L^{\infty}(X)$, with $A$ large enough. This reduction is presented in Section 7.

We then establish a polynomial generalised von Neumann theorem for counting operators on the left-hand side of (2.1), which bounds the averages in (2.1) in terms of the $U^{s}[N]$ Gowers norm of $\mu$ for some $s\in\mathbb{N}$; see Theorem 4.2. This result is proven by repeated applications of van der Corput’s inequality coupled with the PET induction scheme. Crucially, the bounds we obtain for (2.1) in terms of the Gowers norm $\|\mu\|_{U^{s}[N]}$ are quantitative with polynomial (in fact, linear) dependencies.

After establishing this generalised von Neumann theorem, we can conclude the proof by applying the strongest known quantitative bounds for the $U^{s}[N]$ norms of the Möbius function (see Lemma 5.1), which save an arbitrary power of logarithm, thanks to recent work of Leng [25] that builds on the work of Leng–Sah–Sawhney [26].

In the case of Theorem 1.6, we repeat the lacunary subsequence argument to reduce to (2.1), with $\mu$ replaced by a multiplicative function $g$ satisfying the Siegel–Walfisz assumption. If one were now to apply Theorem 4.2 again, one would not be able to obtain a sufficiently strong bound on the $U^{s}[N]$ norm of $g$, since the Leng–Sah–Sawhney inverse theorem [26] is quasipolynomial rather than polynomial. We overcome this by establishing a different polynomial generalised von Neumann theorem, Theorem 4.1, that (perhaps unexpectedly at first) allows bounding (2.1) in terms of a weaker norm than the $U^{s}[N]$ norm of the weight function. This weaker norm, called the $u^{s}[N]$ norm and defined in (3.3), expresses the maximal correlation of the weight $g$ with a polynomial phase of degree at most $s-1$. The proof of Theorem 4.1 draws motivation from the Peluse–Prendiville degree lowering theory [28], [29]. The proof proceeds by induction on the length of the progression and involves showing that the first two functions in a weighted progression can be assumed to be “locally linear” phase functions in a suitable sense. This conclusion is then boosted to global linearity with some extra work, which allows reducing the length of the progression, hence completing the induction.

Since the $u^{s}[N]$ norm already involves correlations with polynomial phases, we are able to bypass the need for the inverse theorem for the $U^{s}[N]$ norm when working with distinct degree polynomials. In Subsection 5.2, we show (using in particular a restriction to typical factorisations and bilinear estimates for polynomial exponential sums) that if $g$ is multiplicative and satisfies the Siegel–Walfisz assumption, then $g$ is close in $L^{1}[N]$ norm to a function $\widetilde{g}$ whose $u^{s}[N]$ norms decay faster than any power of logarithm. This together with Theorem 4.1 mentioned above suffices for concluding the proof of Theorem 1.6. The Siegel–Walfisz assumption gives just the right decay for (2.1): with any weaker assumption we would not be able to prove this (although the averages (1.4) should still converge).

We lastly remark that the approach based on Theorem 4.1 also gives a different and arguably simpler proof of Theorem 1.2 for distinct degree polynomials that is independent of any inverse theorems for the Gowers norms and only uses classical analytic number theory input (the only property needed of the Möbius function is an exponential sum estimate that essentially goes back to the work of Vinogradov [34] from 1939; see Remark 5.2).

3. Notation and preliminaries

3.1. Asymptotic notation, indicators and averages

We use the Vinogradov and Landau asymptotic notations $\ll,\gg,o(\cdot),O(\cdot)$. Thus, we write $X\ll Y$, $X=O(Y)$ or $Y\gg X$ if there is a constant $C$ such that $|X|\leq CY$. We use $X\asymp Y$ to denote $X\ll Y\ll X$. We write $X=o(Y)$ as $N\to\infty$ if $|X|\leq c(N)Y$ for some function $c(N)\to 0$ as $N\to\infty$. If we add subscripts to these notations, then the implied constants can depend on these subscripts. Thus, for example, $X\ll_{A}Y$ means that $|X|\leq C_{A}Y$ for some $C_{A}>0$ depending on $A$.

For a set $E$, we define the indicator function $1_{E}(x)$ as the function that equals $1$ if $x\in E$ and equals $0$ otherwise. Similarly, if $P$ is a proposition, the expression $1_{P}$ equals $1$ if $P$ is true and $0$ if $P$ is false.

For a nonempty finite set $A$ and a function $f\colon A\to\mathbb{C}$, we define the average

\[\mathbb{E}_{a\in A}f(a)\coloneqq\frac{\sum_{a\in A}f(a)}{\sum_{a\in A}1}.\]

For a real number $N\geq 1$, we denote $[N]\coloneqq\{n\in\mathbb{N}\colon\,\,n\leq N\}$. For integers $m,n$, we denote their greatest common divisor by $(m,n)$ and write $m\mid n^{\infty}$ to mean that $m\mid n^{k}$ for some natural number $k$. Unless otherwise specified, all our sums and averages run over the positive integers, with the exception that the symbol $p$ is reserved for primes.

3.2. Gowers norms

For $s\in\mathbb{N}$ and a function $f\colon\mathbb{Z}\to\mathbb{C}$ with finite support, we define its unnormalised $U^{s}$ Gowers norm as

\[\|f\|_{\widetilde{U}^{s}(\mathbb{Z})}\coloneqq\left(\sum_{x,h_{1},\ldots,h_{s}\in\mathbb{Z}}\prod_{\omega\in\{0,1\}^{s}}\mathcal{C}^{|\omega|}f(x+\omega\cdot(h_{1},\ldots,h_{s}))\right)^{1/2^{s}},\]

where $\mathcal{C}(z)=\overline{z}$ is the complex conjugation operator and for a vector $(\omega_{1},\ldots,\omega_{s})$ we write $|\omega|\coloneqq|\omega_{1}|+\cdots+|\omega_{s}|$. For $N\geq 1$, we then define the $U^{s}[N]$ Gowers norm of a function $f\colon\mathbb{Z}\to\mathbb{C}$ as

\[\|f\|_{U^{s}[N]}\coloneqq\frac{\|f1_{[N]}\|_{\widetilde{U}^{s}(\mathbb{Z})}}{\|1_{[N]}\|_{\widetilde{U}^{s}(\mathbb{Z})}}.\]

As is well known (see for example [16, Appendix B]), for $s\geq 2$ the $U^{s}[N]$ norm is indeed a norm and for $s=1$ it is a seminorm, and the function $s\mapsto\|f\|_{U^{s}[N]}$ is increasing.

We recall the classical $U^{2}[N]$ inverse theorem: if $f\colon[N]\to\mathbb{C}$ satisfies $|f|\leq 1$ and $\delta\in(0,1)$, then

\[\|f\|_{U^{2}[N]}\geq\delta\implies\sup_{\alpha\in\mathbb{R}}|\mathbb{E}_{n\in[N]}f(n)e(-\alpha n)|\gg\delta^{2},\tag{3.1}\]

where $e(x)\coloneqq e^{2\pi ix}$. This follows from the identity $\|f\|_{\widetilde{U}^{2}(\mathbb{Z})}^{4}=\int_{0}^{1}|\sum_{n\in\mathbb{Z}}f(n)e(-\alpha n)|^{4}\,\mathrm{d}\alpha$ (which can be verified by expanding out the right-hand side) combined with Parseval’s identity.
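As a sanity check on these definitions, the following Python sketch computes the fourth power of the unnormalised $\widetilde{U}^{2}(\mathbb{Z})$ norm in two ways: directly from the definition, and through the Fourier side, where the integral over $\alpha$ is replaced by an equally spaced Riemann sum (exact here because the integrand is a trigonometric polynomial of degree at most $2N-2$). The function names and the convention that a list entry `f[i]` stands for $f(i+1)$ are assumptions of the example.

```python
import cmath


def u2_fourth_power_direct(f):
    """Sum over x, h1, h2 of f(x) conj(f(x+h1)) conj(f(x+h2)) f(x+h1+h2).

    f is a list with f[i] standing for the value at n = i + 1, extended by 0."""
    N = len(f)
    vals = [complex(v) for v in f]

    def val(i):
        return vals[i] if 0 <= i < N else 0j

    total = 0j
    for x in range(N):
        for h1 in range(-N + 1, N):
            for h2 in range(-N + 1, N):
                total += (val(x) * val(x + h1).conjugate()
                          * val(x + h2).conjugate() * val(x + h1 + h2))
    return total


def u2_fourth_power_fourier(f):
    """Integral over [0,1] of |sum_n f(n) e(-alpha n)|^4, via an exact Riemann sum."""
    N = len(f)
    L = 2 * N  # any L >= 2N - 1 sample points make the Riemann sum exact
    total = 0.0
    for xi in range(L):
        F = sum(complex(f[i]) * cmath.exp(-2j * cmath.pi * xi * (i + 1) / L)
                for i in range(N))
        total += abs(F) ** 4
    return total / L


def u2_norm(f):
    """Normalised U^2[N] norm with N = len(f)."""
    ones = [1] * len(f)
    return (u2_fourth_power_fourier(f) / u2_fourth_power_fourier(ones)) ** 0.25


if __name__ == "__main__":
    f = [1, -1, 1j, 0.5, -1, 1]
    print(u2_fourth_power_direct(f), u2_fourth_power_fourier(f))
```

For instance, a linear phase $n\mapsto e(\alpha n)$ has $U^{2}[N]$ norm exactly $1$, since the phases cancel in every term of the defining sum.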

For the $\widetilde{U}^{s}(\mathbb{Z})$ norms we have the Gowers–Cauchy–Schwarz inequality (see for example [32, (4.2)]), which states that, for any functions $(f_{\omega})_{\omega\in\{0,1\}^{s}}$ from $\mathbb{Z}$ to $\mathbb{C}$ with finite support, we have

\[\left|\sum_{x,h_{1},\ldots,h_{s}\in\mathbb{Z}}\prod_{\omega\in\{0,1\}^{s}}f_{\omega}(x+\omega\cdot(h_{1},\ldots,h_{s}))\right|\leq\prod_{\omega\in\{0,1\}^{s}}\|f_{\omega}\|_{\widetilde{U}^{s}(\mathbb{Z})}.\tag{3.2}\]

We also define the $u^{s}[N]$ norm of a function $f\colon\mathbb{Z}\to\mathbb{C}$ as

\[\|f\|_{u^{s}[N]}\coloneqq\sup_{\begin{subarray}{c}P(y)\in\mathbb{R}[y]\\ \deg(P)\leq s-1\end{subarray}}\left|\mathbb{E}_{n\in[N]}f(n)e(-P(n))\right|.\tag{3.3}\]

We remark that, as is well known, for $s=2$ the $u^{s}[N]$ and $U^{s}[N]$ norms are equivalent (up to polynomial losses), but for all $s\geq 3$ the $u^{s}[N]$ norm is weaker in the sense that the $U^{s}[N]$ norm controls the $u^{s}[N]$ norm but not vice versa; see [18], [14, Section 4].
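The supremum in (3.3) can be explored numerically by restricting the coefficients of $P$ to a finite grid, which gives a lower bound for $\|f\|_{u^{s}[N]}$. The Python sketch below does this; the name `us_norm_lower_bound` and the restriction to coefficients in $\frac{1}{\texttt{grid}}\mathbb{Z}$ are assumptions of the example (the constant term of $P$ does not affect the absolute value, so it is omitted).

```python
import cmath
import itertools


def us_norm_lower_bound(f, s, grid):
    """Lower bound for the u^s[N] norm of f (N = len(f), f[i] is the value at n = i + 1).

    Only polynomial phases P(n) = (c_1 n + ... + c_{s-1} n^{s-1}) / grid with
    integer 0 <= c_i < grid are tested, so the true supremum may be larger."""
    N = len(f)
    best = 0.0
    for coeffs in itertools.product(range(grid), repeat=s - 1):
        total = 0j
        for n in range(1, N + 1):
            # Reduce the numerator modulo grid so that the phase is computed exactly.
            numerator = sum(c * n ** (i + 1) for i, c in enumerate(coeffs)) % grid
            total += complex(f[n - 1]) * cmath.exp(-2j * cmath.pi * numerator / grid)
        best = max(best, abs(total) / N)
    return best


if __name__ == "__main__":
    # A quadratic phase f(n) = e(n^2 / 4): fully detected for s = 3, only partially for s = 2.
    f = [cmath.exp(2j * cmath.pi * (n * n % 4) / 4) for n in range(1, 41)]
    print(us_norm_lower_bound(f, 2, 4), us_norm_lower_bound(f, 3, 4))
```

The quadratic phase in the usage example correlates perfectly with a degree $2$ phase, while among the linear phases on this grid the best correlation is $1/\sqrt{2}$.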

3.3. Van der Corput’s inequality

For $H\geq 1$, define the weight

\[\mu_{H}(h)\coloneqq\frac{|\{(h_{1},h_{2})\in[H]^{2}\colon h_{1}-h_{2}=h\}|}{\lfloor H\rfloor^{2}},\]

where $\lfloor x\rfloor$ is the floor function of $x$ (this weight should not be confused with the Möbius function $\mu$). For an integer $h$, we define the differencing operator $\Delta_{h}$ by setting $\Delta_{h}f(x)=\overline{f(x+h)}f(x)$ for any $f\colon\mathbb{Z}\to\mathbb{C}$ and $x\in\mathbb{Z}$.

We will frequently use van der Corput’s inequality in the following form.

Lemma 3.1.

For any $N\geq H\geq 1$ and any function $f\colon\mathbb{Z}\to\mathbb{C}$ supported on $[N]$, we have

\[\left|\mathbb{E}_{n\in[N]}f(n)\right|^{2}\leq\frac{N+H}{\lfloor N\rfloor}\sum_{h\in\mathbb{Z}}\mu_{H}(h)\mathbb{E}_{n\in[N]}\Delta_{h}f(n).\tag{3.4}\]
Proof.

See for example [30, Lemma 3.1]. ∎
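The inequality (3.4) is easy to test numerically. The following Python sketch evaluates both sides for integer $N\geq H\geq 1$ and a function given as a list; the names `mu_H` and `van_der_corput_sides` are assumptions of the example, and for integer $H$ we use the closed form $\mu_{H}(h)=\max(H-|h|,0)/H^{2}$.

```python
import random


def mu_H(h, H):
    """Weight mu_H(h) for integer H: the number of pairs (h1, h2) in [H]^2 with
    h1 - h2 = h is max(H - |h|, 0), and we divide by H^2."""
    return max(H - abs(h), 0) / H ** 2


def van_der_corput_sides(f, H):
    """Return (lhs, rhs) of (3.4) for f supported on [N], where N = len(f)
    and f[i] is the value at n = i + 1."""
    N = len(f)
    vals = [complex(v) for v in f]

    def val(n):
        return vals[n - 1] if 1 <= n <= N else 0j

    lhs = abs(sum(vals) / N) ** 2
    rhs = 0j
    for h in range(-H + 1, H):  # mu_H(h) vanishes for |h| >= H
        # Average over n in [N] of Delta_h f(n) = conj(f(n + h)) f(n).
        corr = sum(val(n + h).conjugate() * val(n) for n in range(1, N + 1)) / N
        rhs += mu_H(h, H) * corr
    rhs *= (N + H) / N
    # The sum is real because mu_H is even; discard rounding noise.
    return lhs, rhs.real


if __name__ == "__main__":
    rng = random.Random(0)
    f = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(50)]
    print(van_der_corput_sides(f, 7))
```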

3.4. Vinogradov’s Fourier expansion

For a real number $x$, we write $\|x\|$ for the distance from $x$ to the nearest integer(s).

We shall need a Fourier approximation for the indicator function of an interval that goes back to Vinogradov.

Lemma 3.2.

For any real numbers $-1/2\leq\alpha<\beta\leq 1/2$ and $\eta\in(0,\min\{1/2-\|\alpha\|,1/2-\|\beta\|,\|\alpha-\beta\|/2\})$, there exists a $1$-periodic function $g\colon\mathbb{R}\to[0,1]$ with the following properties.

  1. (1)

    $g(x)=1$ for $x\in[\alpha+\eta,\beta-\eta]$, $g(x)=0$ for $x\in[-1/2,1/2]\setminus[\alpha-\eta,\beta+\eta]$, and $0\leq g(x)\leq 1$ for all $x\in[-1/2,1/2]$.

  2. (2)

    For some $|c_{j}|\leq 10\eta$, we have the pointwise convergent Fourier representation

    \[g(x)=\beta-\alpha-\eta+\sum_{|j|>0}c_{j}e(jx).\]
  3. (3)

    For any $K\geq 1$, we have

    \[\sum_{|j|>K}|c_{j}|\leq\frac{10\eta^{-1}}{K}.\]
Proof.

This follows from [33, Lemma 12] (taking $r=1$ there). ∎

4. Polynomial generalised von Neumann theorems

In this section, we prove generalised von Neumann theorems for the weighted polynomial counting operators

\[\Lambda^{M,N}_{P_{1},\ldots,P_{k}}(\theta;f_{0},f_{1},\ldots,f_{k})\coloneqq\frac{1}{MN}\sum_{m\in\mathbb{Z}}\sum_{n\in[N]}\theta(n)f_{0}(m)f_{1}(m+P_{1}(n))\cdots f_{k}(m+P_{k}(n)),\tag{4.1}\]

where $P_{1},\ldots,P_{k}\in\mathbb{Z}[y]$ and $\theta\colon[N]\to\mathbb{C}$ is a weight function (which in applications we take to be a multiplicative function) and $f_{0},f_{1},\ldots,f_{k}\colon\mathbb{Z}\to\mathbb{C}$ are functions supported on $[-M,M]$ (with $M\asymp N^{\max_{j}(\deg P_{j})}$). Thus, we bound the expression (4.1) in terms of some Gowers norm (or related norm) of $\theta$ or $f_{0}$. It is important for the proofs of our main theorems that the obtained results are quantitative, with polynomial dependencies. The two main results of this section (of independent interest) are Theorems 4.1 and 4.2; they are used for proving Theorems 1.6 and 1.2, respectively.
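For small parameters the counting operator (4.1) can be evaluated by brute force. The Python sketch below does so; the function name `counting_operator`, the representation of $\theta$ as a list and of the $f_{i}$ and $P_{i}$ as callables are assumptions made for the example, and the functions are truncated to $[-M,M]$ inside the routine to enforce the support condition.

```python
def counting_operator(theta, f0, fs, polys, M):
    """Brute-force value of the operator (4.1).

    theta: list with theta[n - 1] the weight at n, for n in [N], N = len(theta)
    f0, fs: callable f_0 and list of callables f_1, ..., f_k on the integers
    polys: list of callables P_1, ..., P_k with integer values
    M: support parameter; every f_i is truncated to [-M, M]"""
    N = len(theta)

    def restrict(f):
        return lambda m: f(m) if -M <= m <= M else 0

    f0r = restrict(f0)
    fsr = [restrict(f) for f in fs]
    total = 0
    # Since f_0 is supported on [-M, M], the sum over all integers m is finite.
    for m in range(-M, M + 1):
        a = f0r(m)
        if a == 0:
            continue
        for n in range(1, N + 1):
            term = theta[n - 1] * a
            for f, P in zip(fsr, polys):
                term *= f(m + P(n))
            total += term
    return total / (M * N)


if __name__ == "__main__":
    one = lambda m: 1
    # k = 1, P_1(y) = y^2, N = 2, M = 4, trivial weight.
    print(counting_operator([1, 1], one, [one], [lambda n: n * n], 4))
```

In the usage example the operator simply counts pairs $(m,n)$ with $m$ and $m+n^{2}$ both in $[-4,4]$ (there are $8+5=13$ of them) and divides by $MN=8$.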

The first main result of this section states that if $P_{1},\ldots,P_{k}$ have distinct degrees, then the polynomial counting operator (4.1) is bounded in terms of the $u^{s}[N]$ norm of the weight $\theta$ for some $s$, with polynomial dependencies. It is important to have the $u^{s}[N]$ norm rather than the $U^{s}[N]$ norm here, since for the $U^{s}[N]$ norm we do not currently have a polynomial inverse theorem, meaning that the Siegel–Walfisz assumption from Definition 1.5 would be insufficient if we only had a bound in terms of these norms.

Theorem 4.1 (A polynomial generalised von Neumann theorem with $u^{s}$ control).

Let $d,k\in\mathbb{N}$ and $C\geq 1$. Let $P_{1},\ldots,P_{k}$ be polynomials with integer coefficients satisfying $\deg P_{1}<\deg P_{2}<\cdots<\deg P_{k}=d$. Let $N\geq 1$, and let $f_{0},\ldots,f_{k}\colon\mathbb{Z}\to\mathbb{C}$ be functions supported on $[-CN^{d},CN^{d}]$ with $|f_{i}|\leq 1$ for all $0\leq i\leq k$, and let $\theta\colon[N]\to\mathbb{C}$ be a function with $|\theta|\leq 1$. Then, for some $1\leq K\ll_{d}1$, we have

\[\left|\frac{1}{N^{d+1}}\sum_{m\in\mathbb{Z}}\sum_{n\in[N]}\theta(n)f_{0}(m)f_{1}(m+P_{1}(n))\cdots f_{k}(m+P_{k}(n))\right|\ll_{C,P_{1},\ldots,P_{k}}(N^{-1}+\|\theta\|_{u^{d+1}[N]})^{1/K}.\]

The proof of Theorem 4.1 is given in the next three subsections. In Subsection 4.1, we show that $f_{0},f_{1}$ can be assumed to be “locally linear phase functions” in a suitable sense. In Subsection 4.2, we prove the $k=1$ case of the theorem using the circle method; this serves as the base case for the proof, which is by induction on $k$. Finally, in Subsection 4.3, we use the conclusions of the preceding subsections together with an iterative argument for the function $f_{1}$ to conclude the proof.

The second main result of this section is that, for any polynomials $P_{1},\ldots,P_{k}$, the polynomial counting operator (4.1) is always bounded by some $U^{s}[N]$ norm of the weight $\theta$, with linear dependence on the Gowers norm. In what follows, for any finite nonempty collection $\mathcal{Q}$ of polynomials with integer coefficients, we define its degree $\deg\mathcal{Q}$ as the largest of the degrees of the polynomials in $\mathcal{Q}$.

Theorem 4.2 (A polynomial generalised von Neumann theorem with $U^{s}$ control).

Let $d\in\mathbb{N}$ and $C\geq 1$. Let $N\geq 1$, and let $\theta\colon[N]\to\mathbb{C}$ be a function. Let $\mathcal{Q}$ be a finite collection of polynomials with integer coefficients satisfying $\deg\mathcal{Q}=d$ and $Q([N])\subset[-CN^{d},CN^{d}]$ for all $Q\in\mathcal{Q}$. For each $Q\in\mathcal{Q}$ let $f_{Q}\colon\mathbb{Z}\to\mathbb{C}$ be a function supported on $[-CN^{d},CN^{d}]$ with $|f_{Q}|\leq 1$. Then, for some natural number $s\ll_{|\mathcal{Q}|,\deg\mathcal{Q}}1$, we have

\[\left|\frac{1}{N^{d+1}}\sum_{m\in\mathbb{Z}}\sum_{n\in[N]}\theta(n)\prod_{Q\in\mathcal{Q}}f_{Q}(m+Q(n))\right|\ll_{|\mathcal{Q}|,\deg\mathcal{Q},C}\|\theta\|_{U^{s}[N]}.\tag{4.2}\]

The proof of Theorem 4.2 is based on the PET induction scheme and is given in Subsection 4.4. We also present there a multidimensional version of the special case where $\deg\mathcal{Q}=1$ (Lemma 4.7); this will be needed for the proof of Theorem 1.3.

4.1. Transferring to locally linear functions

The first step in the proof of Theorem 4.1 is to show that if a polynomial average of the form (4.1) is large, then the functions $f_{0},f_{1}$ can be assumed to be locally linear phase functions. In what follows, we say that a function $\phi\colon\mathbb{Z}\to\mathbb{C}$ is a locally linear phase function of resolution $M$ if for some real numbers $\alpha_{m}$ we have $\phi(m)=e(\alpha_{m}m)$ for all $m\in\mathbb{Z}$ and if additionally there is a partition of $\mathbb{Z}$ into discrete intervals of length $M$ such that $m\mapsto\alpha_{m}$ is constant on the cells of that partition. We call the set $\{\alpha_{m}\ (\mathrm{mod}\ 1)\colon m\in\mathbb{Z}\}$ the spectrum of $\phi$.
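To illustrate the definition, the Python sketch below builds a locally linear phase function of resolution $M$ from a rule assigning a frequency to each cell of the partition. The names `locally_linear_phase` and `spectrum`, the `offset` parameter and the specific choice of cells $[\texttt{offset}+cM,\texttt{offset}+(c+1)M)$ are assumptions of the example.

```python
import cmath
import math


def locally_linear_phase(alphas, M, offset=0):
    """Return phi with phi(m) = e(alpha_c * m), where c = floor((m - offset) / M)
    is the index of the cell of length M containing m, and alphas(c) is the
    frequency attached to that cell."""
    def phi(m):
        cell = (m - offset) // M
        return cmath.exp(2j * math.pi * alphas(cell) * m)
    return phi


def spectrum(alphas, cells):
    """Frequencies modulo 1 attached to the given (finite) collection of cells."""
    return {alphas(c) % 1.0 for c in cells}


if __name__ == "__main__":
    alphas = lambda c: (c % 3) / 8  # frequencies 0, 1/8, 2/8 repeating over cells
    phi = locally_linear_phase(alphas, M=5)
    # Inside one cell, phi behaves like a genuine linear phase.
    print(phi(6) / phi(5), cmath.exp(2j * math.pi / 8))
    print(sorted(spectrum(alphas, range(6))))
```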

Proposition 4.3 (Reduction to locally linear phases).

Let C1C\geq 1, d,kd,k\in\mathbb{N}, and let P1,,PkP_{1},\ldots,P_{k} be polynomials with integer coefficients and with degP1<<degPk=d\deg P_{1}<\cdots<\deg P_{k}=d. Let N1N\geq 1, and let f0,f1,,fk:f_{0},f_{1},\ldots,f_{k}\colon\mathbb{Z}\to\mathbb{C} be functions supported on [CNd,CNd][-CN^{d},CN^{d}] and with |fi|1|f_{i}|\leq 1 for all 0ik0\leq i\leq k. Let θ:[N]\theta\colon[N]\to\mathbb{C} be a function with |θ|1|\theta|\leq 1. Then, for some 1Kd11\leq K\ll_{d}1, we have

|1Nd+1mn[N]θ(n)f0(m)f1(m+P1(n))fk(m+Pk(n))|\displaystyle\left|\frac{1}{N^{d+1}}\sum_{m\in\mathbb{Z}}\sum_{n\in[N]}\theta(n)f_{0}(m)f_{1}(m+P_{1}(n))\cdots f_{k}(m+P_{k}(n))\right|
C,P1,,Pk(N1+|1Nd+1mn[N]θ(n)ϕ0(m)ϕ1(m+P1(n))fk(m+Pk(n))|)1/K\displaystyle\quad\ll_{C,P_{1},\ldots,P_{k}}\left(N^{-1}+\left|\frac{1}{N^{d+1}}\sum_{m\in\mathbb{Z}}\sum_{n\in[N]}\theta(n)\phi_{0}(m)\phi_{1}(m+P_{1}(n))\cdots f_{k}(m+P_{k}(n))\right|\right)^{1/K}

for some locally linear phase functions $\phi_0,\phi_1$ of resolution $\gg_{C,P_1,\ldots,P_k}\delta^{O_d(1)}N^{\deg P_1}$, where $\delta$ denotes the left-hand side of the above inequality. Moreover, we may assume that the spectra of $\phi_0,\phi_1$ belong to $\frac{1}{N^{\deg P_1}}\mathbb{Z}$.

Proposition 4.3 may be compared with, and is motivated by, the work of Peluse and Prendiville [29, Theorem 1.5], where in the case k=2k=2, θ1\theta\equiv 1 and P1(y)=y,P2(y)=y2P_{1}(y)=y,P_{2}(y)=y^{2}, it is proven that f0,f1,f2f_{0},f_{1},f_{2} can be replaced more strongly with major arc locally linear phase functions. In the more general setup of Proposition 4.3, it is not possible to reduce to major arc locally linear phase functions.

For the proof of Proposition 4.3, we need Peluse’s inverse theorem.

Lemma 4.4 (Peluse’s inverse theorem).

Let k,d1,dk,d_{1},d\in\mathbb{N} and C1C\geq 1. Let P1,,PkP_{1},\ldots,P_{k} be polynomials with integer coefficients satisfying Pi(0)=0P_{i}(0)=0 for all 1ik1\leq i\leq k and d1=degP1<<degPk=dd_{1}=\deg P_{1}<\ldots<\deg P_{k}=d, and with all the coefficients of the polynomials PiP_{i} being bounded by CC in modulus.

Let $N\geq 1$ and $\delta\in(0,1/2)$. Let $f_0,f_1,\ldots,f_k\colon\mathbb{Z}\to\mathbb{C}$ be functions supported on $[-CN^d,CN^d]$ with $|f_i|\leq 1$ for all $0\leq i\leq k$. Then there exists $1\leq B\ll_d 1$ such that for either $j\in\{0,1\}$ we have

(4.3) |1Nd+1mn[N]f0(m)f1(m+P1(n))fk(m+Pk(n))|C,dδ+(N1+maxq[δB]N[δBNd1,Nd1]1Ndm|1Nn[N]fj(m+qn)|)1/B.\displaystyle\begin{split}&\left|\frac{1}{N^{d+1}}\sum_{m\in\mathbb{Z}}\sum_{n\in[N]}f_{0}(m)f_{1}(m+P_{1}(n))\cdots f_{k}(m+P_{k}(n))\right|\\ &\quad\ll_{C,d}\delta+\left(N^{-1}+\max_{\begin{subarray}{c}q\in[\delta^{-B}]\\ N^{\prime}\in[\delta^{B}N^{d_{1}},N^{d_{1}}]\end{subarray}}\frac{1}{N^{d}}\sum_{m\in\mathbb{Z}}\left|\frac{1}{N^{\prime}}\sum_{n\in[N^{\prime}]}f_{j}(m+qn)\right|\right)^{1/B}.\end{split}
Proof.

This will follow from [28, Theorem 3.3] after some reductions. It suffices to show that for each j{0,1}j\in\{0,1\} there exists 1Bd11\leq B\ll_{d}1 such that (4.3) holds, since we may increase BB if necessary.

Suppose first that j=0j=0. Using the notation (4.1), let

(4.4) η|ΛP1,,PkCNd,N(1;f0,,fk)|.\displaystyle\eta\coloneqq|\Lambda^{CN^{d},N}_{P_{1},\ldots,P_{k}}(1;f_{0},\ldots,f_{k})|.

Then CηC\eta is equal to the left-hand side of (4.3). We may clearly assume that ηδ\eta\geq\delta and that δ1/L\delta\leq 1/L for any given constant L=LC,dL=L_{C,d}. We may further assume that NδKηKN\geq\delta^{-K}\geq\eta^{-K} for any given constant K=KdK=K_{d}, since otherwise by taking B=Kd1B=Kd_{1} the claim follows (with q=1q=1 in (4.3)) from the crude triangle inequality bound

|ΛP1,,PkCNd,N(1;f0,,fk)|1CNdm[CNd,CNd]|f0(m)|.|\Lambda^{CN^{d},N}_{P_{1},\ldots,P_{k}}(1;f_{0},\ldots,f_{k})|\leq\frac{1}{CN^{d}}\sum_{m\in[-CN^{d},CN^{d}]}|f_{0}(m)|.

Now, applying [28, Theorem 3.3] (with $\eta$ in place of $\delta$; in [28, Theorem 3.3] the functions $f_i$ are assumed to be supported on $[CN^d]$ instead of $[-CN^d,CN^d]$, but this makes no difference in the argument) we see that there exists some $1\leq B\ll_d 1$ such that

maxq[δB]N[δBNd1,Nd1]1Ndm|1Nn[N]fj(m+qn)|C,dηB,\displaystyle\max_{\begin{subarray}{c}q\in[\delta^{-B}]\\ N^{\prime}\in[\delta^{B}N^{d_{1}},N^{d_{1}}]\end{subarray}}\frac{1}{N^{d}}\sum_{m\in\mathbb{Z}}\left|\frac{1}{N^{\prime}}\sum_{n\in[N^{\prime}]}f_{j}(m+qn)\right|\gg_{C,d}\eta^{B},

which in view of (4.4) implies the claim.

Suppose then that j=1j=1. Then, making the change of variables m=m+P1(n)m^{\prime}=m+P_{1}(n) in (4.1), we see that

ΛP1,,PkCNd,N(1;f0,f1,f2,,fk)=ΛP1,P2P1,,PkP1CNd,N(1;f1,f0,f2,,fk)\displaystyle\Lambda^{CN^{d},N}_{P_{1},\ldots,P_{k}}(1;f_{0},f_{1},f_{2},\ldots,f_{k})=\Lambda^{CN^{d},N}_{-P_{1},P_{2}-P_{1},\ldots,P_{k}-P_{1}}(1;f_{1},f_{0},f_{2},\ldots,f_{k})

Now the claim follows from the j=0j=0 case handled above. ∎

Proof of Proposition 4.3.

We begin with a few reductions. Firstly, we may extend θ\theta to a function on \mathbb{Z} by setting it equal to 0 outside [N][N]. Secondly, we may assume that Pi(0)=0P_{i}(0)=0 for 1ik1\leq i\leq k by translating the functions fif_{i} if necessary. Thirdly, we may assume that CC is large enough in terms of P1,,PkP_{1},\ldots,P_{k} so that

max1ikmaxn[N]|Pi(n)|CNd.\displaystyle\max_{1\leq i\leq k}\max_{n\in[N]}|P_{i}(n)|\leq CN^{d}.

Let

(4.5) η|ΛP1,,PkCNd,N(θ;f0,f1,,fk)|.\displaystyle\eta\coloneqq|\Lambda_{P_{1},\ldots,P_{k}}^{CN^{d},N}(\theta;f_{0},f_{1},\ldots,f_{k})|.

We may assume that NηKN\geq\eta^{-K} for any given constant KK depending on dd, as otherwise the claim readily follows.

We first wish to replace f0f_{0} with a locally linear phase function. For mm\in\mathbb{Z}, define the first dual function

F(m)𝔼n[N]θ(n)f1(m+P1(n))fk(m+Pk(n)).\displaystyle F(m)\coloneqq\mathbb{E}_{n\in[N]}\theta(n)f_{1}(m+P_{1}(n))\cdots f_{k}(m+P_{k}(n)).

Then by (4.5) we have

|mf0(m)F(m)|CηNd,\displaystyle\left|\sum_{m\in\mathbb{Z}}f_{0}(m)F(m)\right|\gg_{C}\eta N^{d},

so by the Cauchy–Schwarz inequality we get

(4.6) |mF¯(m)F(m)|Cη2Nd.\displaystyle\left|\sum_{m\in\mathbb{Z}}\overline{F}(m)F(m)\right|\gg_{C}\eta^{2}N^{d}.

By the definition of FF, (4.6) expands out as

|m𝔼n[N]θ(n)F¯(m)f1(m+P1(n))fk(m+Pk(n))|Cη2Nd.\displaystyle\left|\sum_{m\in\mathbb{Z}}\mathbb{E}_{n\in[N]}\theta(n)\overline{F}(m)f_{1}(m+P_{1}(n))\cdots f_{k}(m+P_{k}(n))\right|\gg_{C}\eta^{2}N^{d}.

Denote d1degP1d_{1}\coloneqq\deg P_{1}. Applying Cauchy–Schwarz and van der Corput’s inequality (Lemma 3.1), this implies that

|h[Nd1,Nd1]μNd1(h)𝔼n[N]mΔhF¯(m)Δhf1(m+P1(n))Δhfk(m+Pk(n))|Cη4Nd.\displaystyle\left|\sum_{h\in[-N^{d_{1}},N^{d_{1}}]}\mu_{N^{d_{1}}}(h)\mathbb{E}_{n\in[N]}\sum_{m\in\mathbb{Z}}\Delta_{h}\overline{F}(m)\Delta_{h}f_{1}(m+P_{1}(n))\cdots\Delta_{h}f_{k}(m+P_{k}(n))\right|\gg_{C}\eta^{4}N^{d}.

Noting that Nd1|μNd1(h)|1|h|Nd1N^{d_{1}}|\mu_{N^{d_{1}}}(h)|\ll 1_{|h|\leq N^{d_{1}}}, from the triangle inequality and the pigeonhole principle we now see that

(4.7) |𝔼n[N]mΔhF¯(m)Δhf1(m+P1(n))Δhfk(m+Pk(n))|Cη4Nd\displaystyle\left|\mathbb{E}_{n\in[N]}\sum_{m\in\mathbb{Z}}\Delta_{h}\overline{F}(m)\Delta_{h}f_{1}(m+P_{1}(n))\cdots\Delta_{h}f_{k}(m+P_{k}(n))\right|\gg_{C}\eta^{4}N^{d}

for Cη4Nd1\gg_{C}\eta^{4}N^{d_{1}} integers h[Nd1,Nd1]h\in[-N^{d_{1}},N^{d_{1}}].

Applying Lemma 4.4 and the pigeonhole principle, from (4.7) we conclude that there exists a constant 1Bd11\leq B\ll_{d}1 and an integer 1qC,P1,,PkηB1\leq q\ll_{C,P_{1},\ldots,P_{k}}\eta^{-B} such that

(4.8) 𝔼h[Nd1,Nd1]maxM[ηBNd1,Nd1]𝔼m[2CNd,2CNd]|𝔼y[qM]1y0(modq)ΔhF(m+y)|C,P1,,PkηB.\displaystyle\mathbb{E}_{h\in[-N^{d_{1}},N^{d_{1}}]}\max_{M\in[\eta^{B}N^{d_{1}},N^{d_{1}}]}\mathbb{E}_{m\in[-2CN^{d},2CN^{d}]}\left|\mathbb{E}_{y\in[qM]}1_{y\equiv 0\ (\mathrm{mod}\ q)}\Delta_{h}F(m+y)\right|\gg_{C,P_{1},\ldots,P_{k}}\eta^{B}.

Using the simple bound

(4.9) 𝔼m[X,X]|𝔼y[Y]a(m+y)|𝔼m[X,X]|𝔼y[Y]a(m+y)|+Y/Y+Y/X,\displaystyle\mathbb{E}_{m\in[-X,X]}|\mathbb{E}_{y\in[Y]}a(m+y)|\ll\mathbb{E}_{m\in[-X,X]}|\mathbb{E}_{y\in[Y^{\prime}]}a(m+y)|+Y^{\prime}/Y+Y/X,

valid for any bounded sequence a:a\colon\mathbb{Z}\to\mathbb{C} and 1YYX1\leq Y^{\prime}\leq Y\leq X, and setting

N=η4B+2Nd1,N^{\prime}=\eta^{4B+2}N^{d_{1}},

from (4.8) we deduce that

𝔼h[Nd1,Nd1]𝔼m[2CNd,2CNd]|𝔼y[N]1y0(modq)ΔhF(m+y)|C,P1,,PkηB.\displaystyle\mathbb{E}_{h\in[-N^{d_{1}},N^{d_{1}}]}\mathbb{E}_{m\in[-2CN^{d},2CN^{d}]}\left|\mathbb{E}_{y\in[N^{\prime}]}1_{y\equiv 0\ (\mathrm{mod}\ q)}\Delta_{h}F(m+y)\right|\gg_{C,P_{1},\ldots,P_{k}}\eta^{B}.

Splitting the hh average into intervals of length 2N2N^{\prime} and applying the pigeonhole principle, we see that there exists some integer ||Nd1|\ell|\leq N^{d_{1}} such that

𝔼h[N,N]𝔼m[2CNd,2CNd]|𝔼y[N]1y0(modq)Δh+F(m+y)|C,P1,,PkηB.\displaystyle\mathbb{E}_{h\in[-N^{\prime},N^{\prime}]}\mathbb{E}_{m\in[-2CN^{d},2CN^{d}]}\left|\mathbb{E}_{y\in[N^{\prime}]}1_{y\equiv 0\ (\mathrm{mod}\ q)}\Delta_{h+\ell}F(m+y)\right|\gg_{C,P_{1},\ldots,P_{k}}\eta^{B}.

From Cauchy–Schwarz and van der Corput’s inequality, we then see that

(4.10) 𝔼h[N,N]𝔼h[N,N]NμN(h)𝔼m[2CNd,2CNd]𝔼y[N]1y,y+h0(modq)Δh+ΔhF(m+y)C,P1,,Pkη2B.\displaystyle\begin{split}&\mathbb{E}_{h\in[-N^{\prime},N^{\prime}]}\mathbb{E}_{h^{\prime}\in[-N^{\prime},N^{\prime}]}\lfloor N^{\prime}\rfloor\mu_{N^{\prime}}(h^{\prime})\mathbb{E}_{m\in[-2CN^{d},2CN^{d}]}\mathbb{E}_{y\in[N^{\prime}]}1_{y,y+h^{\prime}\equiv 0\ (\mathrm{mod}\ q)}\Delta_{h+\ell}\Delta_{h^{\prime}}F(m+y)\\ &\quad\gg_{C,P_{1},\ldots,P_{k}}\eta^{2B}.\end{split}

We wish to remove the weight μN(h)\mu_{N^{\prime}}(h^{\prime}) from (4.10). To this end, we note the easily verified Fourier expansion

1|x|=12+2π2n1(mod 2)1n2e(n2x)\displaystyle 1-|x|=\frac{1}{2}+\frac{2}{\pi^{2}}\sum_{n\equiv 1\ (\mathrm{mod}\ 2)}\frac{1}{n^{2}}e\left(\frac{n}{2}x\right)

for $x\in[-1,1]$, which allows us to write for $|h'|\leq N'$ the expansion

$$\lfloor N'\rfloor\mu_{N'}(h')=1-\left|\frac{h'}{\lfloor N'\rfloor}\right|=\frac{1}{2}+\frac{2}{\pi^2}\sum_{n\equiv 1\ (\mathrm{mod}\ 2)}\frac{1}{n^2}e\left(\frac{h'}{2\lfloor N'\rfloor}n\right).$$
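The Fourier expansion of $1-|x|$ with constant term $1/2$ and coefficients $\frac{2}{\pi^2 n^2}$ at the odd integers $n$ can be sanity-checked numerically. The following Python sketch is only an illustration under stated assumptions: the function name `tent_partial_sum` and the truncation parameter `terms` are hypothetical, and the terms for $n$ and $-n$ are paired so that each pair contributes $2\cos(\pi n x)/n^2$.

```python
import math


def tent_partial_sum(x, terms=2000):
    """Partial sum of 1/2 + (2/pi^2) * sum over odd n of e(n*x/2) / n^2.

    The sum runs over positive and negative odd n; pairing n with -n turns
    e(n*x/2) + e(-n*x/2) into 2*cos(pi*n*x), so the result is real.
    """
    total = 0.5
    for n in range(1, 2 * terms, 2):  # positive odd n below 2*terms
        total += (2 / math.pi ** 2) * 2 * math.cos(math.pi * n * x) / n ** 2
    return total
```

Since the coefficients decay like $1/n^2$, the truncated sum converges uniformly on $[-1,1]$, which is what makes the weight easy to remove term by term.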

Substituting this into (4.10), expanding $1_{y\equiv 0\ (\mathrm{mod}\ q)}=\frac{1}{q}\sum_{0\leq r<q}e(ry/q)$ and using $e(\xi h')=e(\xi(y+h+h'))e(-\xi(y+h))$ and $e(\xi y)=e(\xi(y+h))e(\xi(y+h'))e(-\xi(y+h+h'))$, we see that

𝔼h[N,N]𝔼h[N,N]𝔼m[2CNd,2CNd]𝔼y[N]ω{0,1}2Fω(m+y+ω(h,h))\displaystyle\mathbb{E}_{h\in[-N^{\prime},N^{\prime}]}\mathbb{E}_{h^{\prime}\in[-N^{\prime},N^{\prime}]}\mathbb{E}_{m\in[-2CN^{d},2CN^{d}]}\mathbb{E}_{y\in[N^{\prime}]}\prod_{\omega\in\{0,1\}^{2}}F_{\omega}(m+y+\omega\cdot(h,h^{\prime}))
C,P1,,Pkη2B,\displaystyle\quad\gg_{C,P_{1},\ldots,P_{k}}\eta^{2B},

where F(0,0)=FF_{(0,0)}=F and |Fω(x)|C1|F_{\omega}(x)|\ll_{C}1 for all x,ωx,\omega. Hence, by the Gowers–Cauchy–Schwarz inequality (3.2), we find

𝔼m[2CNd,2CNd]FU2[m,m+N]C,P1,,Pkη2B.\displaystyle\mathbb{E}_{m\in[-2CN^{d},2CN^{d}]}\|F\|_{U^{2}[m,m+N^{\prime}]}\gg_{C,P_{1},\ldots,P_{k}}\eta^{2B}.
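The two elementary facts used on the way to the last display are the expansion of the indicator of $y\equiv 0\ (\mathrm{mod}\ q)$ into additive characters (which carries a normalising factor $1/q$) and the splitting of the phases $e(\xi h')$ and $e(\xi y)$ into factors depending on the vertices $y+\omega\cdot(h,h')$. A small Python check of both, with hypothetical helper names, might look as follows.

```python
import cmath


def e(t):
    """Additive character e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)


def indicator_via_characters(y, q):
    """(1/q) * sum_{0 <= r < q} e(r*y/q), which should equal 1 if q | y, else 0."""
    return sum(e(r * y / q) for r in range(q)) / q


def split_phase(xi, y, h, h2):
    """Right-hand sides of the two phase identities (h2 plays the role of h')."""
    # Should equal e(xi * h2).
    rhs_shift = e(xi * (y + h + h2)) * e(-xi * (y + h))
    # Should equal e(xi * y).
    rhs_point = e(xi * (y + h)) * e(xi * (y + h2)) * e(-xi * (y + h + h2))
    return rhs_shift, rhs_point
```

Each factor on the right-hand sides depends on only one of the four points $y$, $y+h$, $y+h'$, $y+h+h'$, which is why the phases can be absorbed into the functions $F_\omega$.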

Applying the pigeonhole principle, we deduce that

(4.11) FU2[m,m+N]C,P1,,Pkη2B\displaystyle\|F\|_{U^{2}[m,m+N^{\prime}]}\gg_{C,P_{1},\ldots,P_{k}}\eta^{2B}

for η2BNd/N\gg\eta^{2B}N^{d}/N^{\prime} integers mN[CNd,CNd]m\in N^{\prime}\mathbb{Z}\cap[-CN^{d},CN^{d}].

Note that for any complex number zz we have |z|10maxj{0,1,2}{Re(e(j/3)z)}|z|\leq 10\max_{j\in\{0,1,2\}}\{\textnormal{Re}(e(j/3)z)\}. For j{0,1,2}j\in\{0,1,2\}, let j\mathcal{M}_{j} be the set of mN[CNd,CNd]m\in N^{\prime}\mathbb{Z}\cap[-CN^{d},CN^{d}] for which

(4.12) supα[0,1]Re(e(j3)𝔼x[m,m+N]F(x)e(αx))η4B+1.\displaystyle\sup_{\alpha\in[0,1]}\textnormal{Re}\left(e\left(\frac{j}{3}\right)\mathbb{E}_{x\in[m,m+N^{\prime}]}F(x)e(\alpha x)\right)\geq\eta^{4B+1}.
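The elementary inequality behind the sets $\mathcal{M}_j$ is that one of the three rotations $e(j/3)z$ always has real part at least $|z|/2$, because one of them has argument within $\pi/3$ of $0$; so the constant $10$ is far from tight. The sketch below (hypothetical name `best_rotation`, sampling grid chosen arbitrarily) checks this on the unit circle.

```python
import cmath


def e(t):
    """Additive character e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)


def best_rotation(z):
    """Return (j, Re(e(j/3) * z)) for the j in {0, 1, 2} maximising the real part."""
    return max(((j, (e(j / 3) * z).real) for j in range(3)), key=lambda pair: pair[1])


# Worst case over the unit circle, sampled at 3600 equally spaced points.
worst = min(best_rotation(e(t / 3600))[1] for t in range(3600))
```

The minimum is attained at arguments such as $\pi/3$, where two of the rotations tie with real part $1/2$.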

Then, by the pigeonhole principle, the $U^2[N']$ inverse theorem (3.1) and (4.11), we have $|\mathcal{M}_{j_0}|\gg\eta^{2B}N^d/N'$ for some $j_0\in\{0,1,2\}$. For any $m\in\mathcal{M}_{j_0}$, let $\alpha_m'\in[0,1]$ be a point where the supremum in (4.12) is attained, and let $\alpha_m$ be an element of $\frac{1}{N^{d_1}}\mathbb{Z}$ nearest to $\alpha_m'$. Recalling the definition of $N'$, we have

Re(e(j03)𝔼x[m,m+N]F(x)e(αmx))C,P1,,Pkη4B+1.\displaystyle\textnormal{Re}\left(e\left(\frac{j_{0}}{3}\right)\mathbb{E}_{x\in[m,m+N^{\prime}]}F(x)e(\alpha_{m}x)\right)\gg_{C,P_{1},\ldots,P_{k}}\eta^{4B+1}.

For mNj0m\in N^{\prime}\mathbb{Z}\setminus\mathcal{M}_{j_{0}}, note that by Parseval’s identity there is some αm1Nd1\alpha_{m}\in\frac{1}{N^{d_{1}}}\mathbb{Z} for which

|x[m,m+N]F(x)e(αmx)|C(N)1/2.\displaystyle\left|\sum_{x\in[m,m+N^{\prime}]}F(x)e(\alpha_{m}x)\right|\ll_{C}(N^{\prime})^{1/2}.

Now, extend the definition of αm\alpha_{m} from NN^{\prime}\mathbb{Z} to all of \mathbb{Z} by letting αm=αm\alpha_{m}=\alpha_{m^{\prime}}, where mm^{\prime} is the largest element of NN^{\prime}\mathbb{Z} that is at most mm. Then, define the locally linear phase function ϕ0(m)=e(αmm)\phi_{0}(m)=e(\alpha_{m}m), which has resolution NN^{\prime}. We now have

$$\left|\sum_{x\in[-CN^d,CN^d]}F(x)\phi_0(x)\right|\gg_{C,P_1,\ldots,P_k}\eta^{2B}\cdot\eta^{4B+1}N^d.$$

Recalling the definition of FF, we conclude that

(4.13) |mn[N]θ(n)ϕ0(m)f1(m+P1(n))fk(m+Pk(n))|C,P1,,PkηBNd+1,\displaystyle\left|\sum_{m\in\mathbb{Z}}\sum_{n\in[N]}\theta(n)\phi_{0}(m)f_{1}(m+P_{1}(n))\cdots f_{k}(m+P_{k}(n))\right|\gg_{C,P_{1},\ldots,P_{k}}\eta^{B^{\prime}}N^{d+1},

where B=6B+1B^{\prime}=6B+1.

We proceed to replace also f1f_{1} with a locally linear phase function. For mm\in\mathbb{Z}, define the second dual function as

G(m)𝔼n[N]θ(n)ϕ0(mP1(n))f2(m+P2(n)P1(n))fk(m+Pk(n)P1(n)).\displaystyle G(m)\coloneqq\mathbb{E}_{n\in[N]}\theta(n)\phi_{0}(m-P_{1}(n))f_{2}(m+P_{2}(n)-P_{1}(n))\cdots f_{k}(m+P_{k}(n)-P_{1}(n)).

Making a change of variables, from (4.13) it follows that

|mf1(m)G(m)|C,P1,,PkηBNd.\displaystyle\left|\sum_{m\in\mathbb{Z}}f_{1}(m)G(m)\right|\gg_{C,P_{1},\ldots,P_{k}}\eta^{B^{\prime}}N^{d}.

Arguing verbatim as above, we deduce that there exists a locally linear phase function ϕ1:\phi_{1}\colon\mathbb{Z}\to\mathbb{C} of resolution N′′=ηB′′Nd1N^{\prime\prime}=\eta^{B^{\prime\prime}}N^{d_{1}} and with spectrum in 1Nd1\frac{1}{N^{d_{1}}}\mathbb{Z} such that

|mG(m)ϕ1(m)|C,P1,,PkηB′′Nd,\displaystyle\left|\sum_{m\in\mathbb{Z}}G(m)\phi_{1}(m)\right|\gg_{C,P_{1},\ldots,P_{k}}\eta^{B^{\prime\prime}}N^{d},

where $B''=6B'+1$.

Recalling the definition of GG, this means that

|mn[N]θ(n)ϕ0(m)ϕ1(m+P1(n))f2(m+P2(n))fk(m+Pk(n))|C,P1,,PkηB′′Nd+1.\displaystyle\left|\sum_{m\in\mathbb{Z}}\sum_{n\in[N]}\theta(n)\phi_{0}(m)\phi_{1}(m+P_{1}(n))f_{2}(m+P_{2}(n))\cdots f_{k}(m+P_{k}(n))\right|\gg_{C,P_{1},\ldots,P_{k}}\eta^{B^{\prime\prime}}N^{d+1}.

This gives the desired claim. ∎

4.2. A circle method bound

The proof of Theorem 4.1 will proceed by induction on kk, so we first need to bound the weighted averages (4.1) with k=1k=1. These averages can be controlled simply by using classical Fourier analysis.

Lemma 4.5.

Let dd\in\mathbb{N} and C1C\geq 1. Let PP be a polynomial of degree dd with integer coefficients. Let N1N\geq 1, and let f0,f1:f_{0},f_{1}\colon\mathbb{Z}\to\mathbb{C} be functions supported on [CNd,CNd][-CN^{d},CN^{d}] with |fi|1|f_{i}|\leq 1 for both i{0,1}i\in\{0,1\}, and let θ:[N]\theta\colon[N]\to\mathbb{C} be a function. Then we have

(4.14) |1Nd+1mn[N]θ(n)f0(m)f1(m+P(n))|C,Pθud+1[N].\displaystyle\left|\frac{1}{N^{d+1}}\sum_{m\in\mathbb{Z}}\sum_{n\in[N]}\theta(n)f_{0}(m)f_{1}(m+P(n))\right|\ll_{C,P}\|\theta\|_{u^{d+1}[N]}.
Proof.

Let $C'\geq 1$ (depending on $C,P$) be such that $\max_{n\in[N]}|P(n)|\leq(C'-C)N^d$. Then, by the orthogonality of characters, the left-hand side of (4.14) without absolute values equals

$$\begin{aligned}&\frac{1}{N^{d+1}}\sum_{|a|\leq C'N^d}\sum_{|m|\leq CN^d}\sum_{n\in[N]}\theta(n)f_0(m)f_1(a)1_{a=m+P(n)}\\ &\quad=\frac{1}{N^{d+1}}\int_0^1\left(\sum_{|a|\leq C'N^d}f_1(a)e(\xi a)\right)\left(\sum_{|m|\leq CN^d}f_0(m)e(-\xi m)\right)\left(\sum_{n\in[N]}\theta(n)e(-\xi P(n))\right)\,\mathrm{d}\xi.\end{aligned}$$

Now the claim follows by bounding the exponential sum involving $\theta$ pointwise by $N\|\theta\|_{u^{d+1}[N]}$ and by applying the Cauchy–Schwarz inequality and Parseval's identity to the remaining two exponential sums. ∎
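The orthogonality step in the proof of Lemma 4.5 can be illustrated on a toy example. The Python sketch below is not the author's computation: it replaces the integral over $\xi\in[0,1]$ by an average over $\xi=t/R$, which detects $a=m+P(n)$ exactly provided $R$ exceeds every possible value of $|a-m-P(n)|$. The function names and the sample data are hypothetical, and functions are stored as dictionaries on their supports.

```python
import cmath


def e(t):
    """Additive character e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)


def direct_sum(theta, f0, f1, P):
    """Sum of theta(n) * f0(m) * f1(m + P(n)) over the supports of theta and f0."""
    return sum(theta[n] * f0[m] * f1.get(m + P(n), 0) for n in theta for m in f0)


def fourier_sum(theta, f0, f1, P):
    """The same sum, computed as an average of products of three exponential sums.

    Assumption of this sketch: xi ranges over t/R for 0 <= t < R, where R is
    larger than any value of |a - m - P(n)| on the supports.
    """
    R = 1 + max(map(abs, f1)) + max(map(abs, f0)) + max(abs(P(n)) for n in theta)
    total = 0
    for t in range(R):
        xi = t / R
        sum_f1 = sum(v * e(xi * a) for a, v in f1.items())
        sum_f0 = sum(v * e(-xi * m) for m, v in f0.items())
        sum_theta = sum(v * e(-xi * P(n)) for n, v in theta.items())
        total += sum_f1 * sum_f0 * sum_theta
    return total / R


# Illustrative data with N = 6 and P(n) = n^2 (all values are made up).
theta = {n: (-1) ** n for n in range(1, 7)}
f0 = {m: 1 if m % 2 == 0 else -1 for m in range(-5, 6)}
f1 = {a: 1 if a % 3 == 0 else -0.5 for a in range(-41, 42)}
P = lambda n: n * n
```

The weight $\theta$ enters the Fourier side only through the single exponential sum with phase $e(-\xi P(n))$, which is the sum that the $u^{d+1}[N]$ norm controls pointwise in $\xi$.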

4.3. Proof of Theorem 4.1

We are now ready to prove the claimed estimate for the operator (4.1) in the case of distinct degree polynomials.

Proof of Theorem 4.1.

We use induction on kk. The base case k=1k=1 follows from Lemma 4.5. Suppose that the case k1k-1 has been proven for some k2k\geq 2, and consider the case kk.

Step 1: Reduction to locally linear phase functions. Let δ|ΛP1,,PkCNd,N(θ;f0,f1,,fk)|\delta\coloneqq|\Lambda_{P_{1},\ldots,P_{k}}^{CN^{d},N}(\theta;f_{0},f_{1},\ldots,f_{k})|. We may assume that 1/K′′δN1/K1/K^{\prime\prime}\geq\delta\geq N^{-1/K^{\prime}} for any large constants K=KdK^{\prime}=K^{\prime}_{d} and K′′=KC,P1,,Pk′′K^{\prime\prime}=K^{\prime\prime}_{C,P_{1},\ldots,P_{k}}, as otherwise there is nothing to prove. By Proposition 4.3, there exist Cd1C_{d}\geq 1 and locally linear phase functions ϕ0,ϕ1:\phi_{0},\phi_{1}\colon\mathbb{Z}\to\mathbb{C} of resolution N\geq N^{\prime} for some NC,P1,,PkδCdNdegP1N^{\prime}\gg_{C,P_{1},\ldots,P_{k}}\delta^{C_{d}}N^{\deg P_{1}}, and with the spectra of ϕ0,ϕ1\phi_{0},\phi_{1} belonging to 1NdegP1\frac{1}{N^{\deg P_{1}}}\mathbb{Z}, such that

(4.15) |ΛP1,,PkCNd,N(θ;ϕ0,ϕ1,f2,,fk)|δCd.\displaystyle|\Lambda_{P_{1},\ldots,P_{k}}^{CN^{d},N}(\theta;\phi_{0},\phi_{1},f_{2},\ldots,f_{k})|\geq\delta^{C_{d}}.

Step 2: An iteration for the locally linear phase function. We can write $\phi_1(m)=e(\alpha_m m)$ for some $\alpha_m\in\frac{1}{N^{\deg P_1}}\mathbb{Z}$, with $m\mapsto\alpha_m$ being constant on the intervals $[jN'+a,(j+1)N'+a)$ for some $a\in[N']$ and all $j\in\mathbb{Z}$. For any set $S\subset\mathbb{R}$, write $\phi_{1,S}(m)\coloneqq\phi_1(m)1_{\alpha_m\not\in S}$.

Claim. If CdC_{d}^{\prime} and KdK^{\prime}_{d} are large, the following holds. For any η(0,N1/Kd)\eta\in(0,N^{-1/K_{d}^{\prime}}) and any finite (possibly empty) set SS, if

(4.16) |ΛP1,,PkCNd,N(θ;ϕ0,ϕ1,S,f2,,fk)|η,\displaystyle|\Lambda_{P_{1},\ldots,P_{k}}^{CN^{d},N}(\theta;\phi_{0},\phi_{1,S},f_{2},\ldots,f_{k})|\geq\eta,

then either

(4.17) |ΛP1,,PkCNd,N(θ;ϕ0,ϕ1ϕ1,{α},f2,,fk)|η2Cd\displaystyle|\Lambda_{P_{1},\ldots,P_{k}}^{CN^{d},N}(\theta;\phi_{0},\phi_{1}-\phi_{1,\{\alpha\}},f_{2},\ldots,f_{k})|\geq\eta^{2C_{d}^{\prime}}

or there exists αS\alpha\not\in S such that αm=α\alpha_{m}=\alpha for ηCdNd\geq\eta^{C_{d}^{\prime}}N^{d} integers m[CNd,CNd]m\in[-CN^{d},CN^{d}] and

(4.18) |ΛP1,,PkCNd,N(θ;ϕ0,ϕ1,S{α},f2,,fk)|ηη2Cd.\displaystyle|\Lambda_{P_{1},\ldots,P_{k}}^{CN^{d},N}(\theta;\phi_{0},\phi_{1,S\cup\{\alpha\}},f_{2},\ldots,f_{k})|\geq\eta-\eta^{2C_{d}^{\prime}}.

To prove this claim, we first apply van der Corput's inequality (Lemma 3.1) to (4.16) to conclude that there is a set $\mathcal{H}\subset[-CN^d,CN^d]$ of size $\gg\eta^2N^d$ such that for $h\in\mathcal{H}$ we have

|ΛP1,,PkCNd,N(1;Δhϕ0,Δhϕ1,S,Δhf2,,Δhfk)|η2.\displaystyle|\Lambda_{P_{1},\ldots,P_{k}}^{CN^{d},N}(1;\Delta_{h}\phi_{0},\Delta_{h}\phi_{1,S},\Delta_{h}f_{2},\ldots,\Delta_{h}f_{k})|\gg\eta^{2}.

From Lemma 4.4 and the pigeonhole principle, we now conclude that for some constant Bd1B_{d}\geq 1, some integer 1qηBd1\leq q\leq\eta^{-B_{d}} and some M[ηBdNdegP1,NdegP1]M\in[\eta^{B_{d}}N^{\deg P_{1}},N^{\deg P_{1}}], we have

𝔼|x|2CNd|𝔼m[M]ϕ1,S¯(x+qm+h)ϕ1,S(x+qm)|C,P1,,PkηBd\displaystyle\mathbb{E}_{|x|\leq 2CN^{d}}|\mathbb{E}_{m\in[M]}\overline{\phi_{1,S}}(x+qm+h)\phi_{1,S}(x+qm)|\gg_{C,P_{1},\ldots,P_{k}}\eta^{B_{d}}

for C,P1,,PkηBdNd\gg_{C,P_{1},\ldots,P_{k}}\eta^{B_{d}}N^{d} integers hh\in\mathcal{H}. Let \mathcal{H}^{\prime} be the set of such hh. By (4.9) we also have

(4.19) 𝔼|x|2CNd|𝔼m[M]ϕ1,S¯(x+qm+h)ϕ1,S(x+qm)|C,P1,,PkηBd\displaystyle\mathbb{E}_{|x|\leq 2CN^{d}}|\mathbb{E}_{m\in[M^{\prime}]}\overline{\phi_{1,S}}(x+qm+h)\phi_{1,S}(x+qm)|\gg_{C,P_{1},\ldots,P_{k}}\eta^{B_{d}}

for hh\in\mathcal{H}^{\prime}, where M=η4Bd+2NM^{\prime}=\eta^{4B_{d}+2}N^{\prime}.

From the pigeonhole principle, we see that there exist $h_0\in[N']$ and $\mathcal{H}''\subset\mathcal{H}'$ with $|\mathcal{H}''|\gg_{C,P_1,\ldots,P_k}\eta^{4B_d}N^d$ such that $\|(h-h_0)/N'\|\leq\eta^{3B_d}$ for all $h\in\mathcal{H}''$. Let $\mathcal{X}$ be the set of $x\in[-CN^d,CN^d]\cap\mathbb{Z}$ for which

minh{0,h0}x+haNη2Bd.\displaystyle\min_{h^{\prime}\in\{0,h_{0}\}}\left\|\frac{x+h^{\prime}-a}{N^{\prime}}\right\|\geq\eta^{2B_{d}}.

Then |𝒳|(2CO(η2Bd))Nd|\mathcal{X}|\geq(2C-O(\eta^{2B_{d}}))N^{d}. Note that, for any x𝒳x\in\mathcal{X}, h′′h\in\mathcal{H}^{\prime\prime} and m[η2BdN/2]m\in[\eta^{2B_{d}}N^{\prime}/2], we have

ϕ1,S¯(x+m+h)ϕ1,S(x+m)=1αxS1αx+hSe((αxαx+h)(x+m)hαx+h).\displaystyle\overline{\phi_{1,S}}(x+m+h)\phi_{1,S}(x+m)=1_{\alpha_{x}\not\in S}\cdot 1_{\alpha_{x+h}\not\in S}e((\alpha_{x}-\alpha_{x+h})(x+m)-h\alpha_{x+h}).

Hence, recalling (4.19) and applying the pigeonhole principle, we see that there is some x0𝒳x_{0}\in\mathcal{X} with αx0S\alpha_{x_{0}}\not\in S such that for all h′′h\in\mathcal{H}^{\prime\prime} we have αx0+hS\alpha_{x_{0}+h}\not\in S and

|𝔼m[M]e((αx0αx0+h)qm)|C,P1,,PkηBd.\displaystyle|\mathbb{E}_{m\in[M^{\prime}]}e((\alpha_{x_{0}}-\alpha_{x_{0}+h})qm)|\gg_{C,P_{1},\ldots,P_{k}}\eta^{B_{d}}.

By the geometric sum formula, we conclude that, for all h′′h\in\mathcal{H}^{\prime\prime}, we have

q(αx0αx0+h)C,P1,,PkηBdNdegP1.\displaystyle\|q(\alpha_{x_{0}}-\alpha_{x_{0}+h})\|\ll_{C,P_{1},\ldots,P_{k}}\frac{\eta^{-B_{d}}}{N^{\deg P_{1}}}.
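The geometric sum step rests on the standard bound $|\mathbb{E}_{m\in[M]}e(\beta m)|\leq\min(1,1/(2M\|\beta\|))$, so an average of size $\gg\eta^{B_d}$ forces $\|\beta\|\ll\eta^{-B_d}/M$. A minimal numerical check of this bound, with hypothetical helper names and arbitrarily chosen test frequencies, could read as follows.

```python
import cmath


def e(t):
    """Additive character e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)


def dist_to_integers(beta):
    """The quantity ||beta||: distance from beta to the nearest integer."""
    return abs(beta - round(beta))


def averaged_phase(beta, M):
    """|E_{m in [M]} e(beta * m)| computed directly from the definition."""
    return abs(sum(e(beta * m) for m in range(1, M + 1))) / M


def geometric_bound(beta, M):
    """The standard bound min(1, 1 / (2 * M * ||beta||))."""
    d = dist_to_integers(beta)
    return 1.0 if d == 0 else min(1.0, 1 / (2 * M * d))
```

In the argument above this is applied with $\beta=q(\alpha_{x_0}-\alpha_{x_0+h})$.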

Recalling that αm1NdegP1\alpha_{m}\in\frac{1}{N^{\deg P_{1}}}\mathbb{Z}, by the pigeonhole principle we conclude that there is some constant Cd1C_{d}^{\prime}\geq 1 and some α1NdegP1\alpha\in\frac{1}{N^{\deg P_{1}}}\mathbb{Z} such that αm=α\alpha_{m}=\alpha for ηCdNd\geq\eta^{C_{d}^{\prime}}N^{d} integers m[CNd,CNd]m\in[-CN^{d},CN^{d}]. We have αS\alpha\not\in S, since αx0+hS\alpha_{x_{0}+h}\not\in S for h′′h\in\mathcal{H}^{\prime\prime}. Now the claim follows by writing ϕ1,S=(ϕ1ϕ1,{α})+ϕ1,S{α}\phi_{1,S}=(\phi_{1}-\phi_{1,\{\alpha\}})+\phi_{1,S\cup\{\alpha\}} and applying the pigeonhole principle.

Step 3: Concluding the argument. We now apply the claim established above repeatedly, starting with $S=\emptyset$ (in which case the assumption (4.16) holds for $\eta=\delta^{C_d'}$ by (4.15)). After $\ll\delta^{-C_d'}$ iterations (4.18) cannot hold (since the number of different values that $\alpha_m$ takes at least $(\delta/2)^{C_d'}N^d$ times is $\ll_d\delta^{-C_d'}C$), so (4.17) holds with $\eta=\delta^{A_d}$ for some constant $A_d$.

Now that (4.17) holds with η=δAd\eta=\delta^{A_{d}}, by the pigeonhole principle there exist intervals [N1,N2][1,N][N_{1},N_{2}]\subset[1,N] of length δ3AdCdN\delta^{3A_{d}C_{d}^{\prime}}N and I[CNd,CNd]I\subset[-CN^{d},CN^{d}] of length δ3AdCdNd\delta^{3A_{d}C_{d}^{\prime}}N^{d} such that, denoting ψα(m)1αm=α\psi_{\alpha}(m)\coloneqq 1_{\alpha_{m}=\alpha}, we have

$$|\Lambda_{P_1,\ldots,P_k}^{CN^d,N}(\theta e(\alpha P_1(\cdot))1_{[N_1,N_2]};\phi_0e(\alpha\cdot)1_I,\psi_\alpha,f_2,\ldots,f_k)|\geq\delta^{8A_dC_d'}.$$

But since the function $\psi_\alpha$ is constant on intervals of the form $[jN'+a,(j+1)N'+a)$ with $j\in\mathbb{Z}$, this implies

$$|\Lambda_{P_2,\ldots,P_k}^{CN^d,N}(\theta e(\alpha P_1(\cdot))1_{[N_1,N_2]};\phi_0e(\alpha\cdot)1_I\psi_\alpha(\cdot+P_1(N_1)),f_2,\ldots,f_k)|\gg\delta^{8A_dC_d'}.$$

By the induction assumption and the fact that d>degP1d>\deg P_{1}, for some AdA_{d}^{\prime} we now obtain

θ1[N1,N2]ud+1[N]C,P1,,PkδAd.\displaystyle\|\theta 1_{[N_{1},N_{2}]}\|_{u^{d+1}[N]}\gg_{C,P_{1},\ldots,P_{k}}\delta^{A_{d}^{\prime}}.

By Vinogradov’s Fourier expansion (Lemma 3.2), we can write

1[N1,N2](n)=1jδ3Adcje(βjn)+E(n)1_{[N_{1},N_{2}]}(n)=\sum_{1\leq j\leq\delta^{-3A_{d}^{\prime}}}c_{j}e(\beta_{j}n)+E(n)

for some real numbers βj\beta_{j}, some complex numbers |cj|1|c_{j}|\leq 1 and some |E(n)|1|E(n)|\ll 1 satisfying n[N]|E(n)|δ2AdN\sum_{n\in[N]}|E(n)|\ll\delta^{2A_{d}^{\prime}}N. Hence, we conclude that

θud+1[N]C,P1,,PkδOC,P1,,Pk(1),\displaystyle\|\theta\|_{u^{d+1}[N]}\gg_{C,P_{1},\ldots,P_{k}}\delta^{O_{C,P_{1},\ldots,P_{k}}(1)},

as desired. ∎

4.4. Proof of Theorem 4.2

For the proof of Theorem 4.2, we need the following generalised von Neumann theorem for arithmetic progressions. This is well known (see for example [11, Lemma 2]), although the result is typically presented for functions defined on a cyclic group.

Lemma 4.6 (A generalised von Neumann theorem for arithmetic progressions).

Let $k\in\mathbb{N}$, $N\geq 1$, $C\geq 1$, and let $\theta\colon[N]\to\mathbb{C}$. Let $L_1,\ldots,L_k$ be polynomials with integer coefficients of degree at most $1$ satisfying $L_i([N])\subset[-CN,CN]$ for all $i\in[k]$. Let $f_1,\ldots,f_k\colon\mathbb{Z}\to\mathbb{C}$ be functions supported on $[-CN,CN]$ with $|f_i|\leq 1$ for $1\leq i\leq k$. Then we have

|1N2mn[N]θ(n)f1(m+L1(n))fk(m+Lk(n))|k,CθUk[N].\displaystyle\left|\frac{1}{N^{2}}\sum_{m\in\mathbb{Z}}\sum_{n\in[N]}\theta(n)f_{1}(m+L_{1}(n))\cdots f_{k}(m+L_{k}(n))\right|\ll_{k,C}\|\theta\|_{U^{k}[N]}.

Lemma 4.6 could be proven directly without difficulty, but we deduce it as an immediate consequence of the following more general lemma that deals with multidimensional averages. This multidimensional version is needed for proving Theorem 1.3 (but for Theorem 4.2 the one-dimensional case suffices).

Lemma 4.7 (A multidimensional generalised von Neumann theorem for arithmetic progressions).

Let $k,r\in\mathbb{N}$, $N\geq 1$, $C\geq 1$, and let $\theta\colon[N]\to\mathbb{C}$. Let $L_1,\ldots,L_k\colon\mathbb{Z}\to\mathbb{Z}^r$ be polynomials with integer coefficients of degree at most $1$ satisfying $L_i([N])\subset[-CN,CN]^r$ for all $i\in[k]$. Let $f_1,\ldots,f_k\colon\mathbb{Z}^r\to\mathbb{C}$ be functions supported on $[-CN,CN]^r$ with $|f_i|\leq 1$ for $1\leq i\leq k$. Then we have

(4.20) |1Nr+1𝐦rn[N]θ(n)f1(𝐦+L1(n))fk(𝐦+Lk(n))|k,r,CθUk[N].\displaystyle\begin{split}&\left|\frac{1}{N^{r+1}}\sum_{\mathbf{m}\in\mathbb{Z}^{r}}\sum_{n\in[N]}\theta(n)f_{1}(\mathbf{m}+L_{1}(n))\cdots f_{k}(\mathbf{m}+L_{k}(n))\right|\\ &\quad\ll_{k,r,C}\|\theta\|_{U^{k}[N]}.\end{split}
Proof of Lemma 4.7.

For convenience, we extend the definition of θ\theta to all of \mathbb{Z} by setting it equal to 0 outside [N][N].

We use induction on $k$. In the case $k=1$, the claim is immediate by making the change of variables $\mathbf{m}'=\mathbf{m}+L_1(n)$ and noting that $|\mathbb{E}_{n\in[N]}\theta(n)|=\|\theta\|_{U^1[N]}$.

Suppose then that the claim holds in the case k1k-1\in\mathbb{N} and consider the case kk. Let SS be the expression inside the absolute values in (4.20). By making the change of variables 𝐦=𝐦+L1(n)\mathbf{m}^{\prime}=\mathbf{m}+L_{1}(n), we have

S=1Nr+1𝐦rn[N]θ(n)f1(𝐦)j=2kfj(𝐦+Lj(n)L1(n)).\displaystyle S=\frac{1}{N^{r+1}}\sum_{\mathbf{m}^{\prime}\in\mathbb{Z}^{r}}\sum_{n\in[N]}\theta(n)f_{1}(\mathbf{m}^{\prime})\prod_{j=2}^{k}f_{j}(\mathbf{m}^{\prime}+L_{j}(n)-L_{1}(n)).

By the Cauchy–Schwarz inequality, we obtain

|S|2r,C1Nr+2𝐦r|n[N]θ(n)j=2kfj(𝐦+Lj(n)L1(n))|2.\displaystyle|S|^{2}\ll_{r,C}\frac{1}{N^{r+2}}\sum_{\mathbf{m}\in\mathbb{Z}^{r}}\left|\sum_{n\in[N]}\theta(n)\prod_{j=2}^{k}f_{j}(\mathbf{m}+L_{j}(n)-L_{1}(n))\right|^{2}.

In what follows, for a function f:rf\colon\mathbb{Z}^{r}\to\mathbb{C} and 𝐯r\mathbf{v}\in\mathbb{Z}^{r}, denote Δh𝐯f(x)f(x+h𝐯)¯f(x)\Delta_{h}^{\mathbf{v}}f(x)\coloneqq\overline{f(x+h\mathbf{v})}f(x). From van der Corput’s inequality (Lemma 3.1), we conclude that

|S|2r,C1Nr+1hμN(h)𝐦rn[N]Δhθ(n)j=2kΔh𝐯jfj(𝐦+Lj(n)L1(n)),\displaystyle|S|^{2}\ll_{r,C}\frac{1}{N^{r+1}}\sum_{h\in\mathbb{Z}}\mu_{N}(h)\sum_{\mathbf{m}\in\mathbb{Z}^{r}}\sum_{n\in[N]}\Delta_{h}\theta(n)\prod_{j=2}^{k}\Delta_{h}^{\mathbf{v}_{j}}f_{j}(\mathbf{m}+L_{j}(n)-L_{1}(n)),

where 𝐯jr\mathbf{v}_{j}\in\mathbb{Z}^{r} is such that Lj(n)L1(n)=n𝐯j+Lj(0)L1(0)L_{j}(n)-L_{1}(n)=n\mathbf{v}_{j}+L_{j}(0)-L_{1}(0). By the induction assumption and the fact that |μN(h)|1N1|h|N|\mu_{N}(h)|\ll\frac{1}{N}1_{|h|\leq N}, we see that

(4.21) |S|2k,r,C𝔼h[N,N]ΔhθUk1[N].\displaystyle|S|^{2}\ll_{k,r,C}\mathbb{E}_{h\in[-N,N]}\|\Delta_{h}\theta\|_{U^{k-1}[N]}.

But by Hölder’s inequality and the definition of the Uk1[N]U^{k-1}[N] norm, the right-hand side of (4.21) is

k,C(𝔼h[N,N]𝔼n,h1,,hk1[N,N]ω{0,1}k1𝒞|ω|Δhθ(n+ω(h1,,hk1)))1/2k1\displaystyle\ll_{k,C}\left(\mathbb{E}_{h\in[-N,N]}\mathbb{E}_{n,h_{1},\ldots,h_{k-1}\in[-N,N]}\prod_{\omega\in\{0,1\}^{k-1}}\mathcal{C}^{|\omega|}\Delta_{h}\theta(n+\omega\cdot(h_{1},\ldots,h_{k-1}))\right)^{1/2^{k-1}}
kθUk[N]2.\displaystyle\ll_{k}\|\theta\|_{U^{k}[N]}^{2}.

This completes the induction. ∎

Proof of Theorem 4.2.

For convenience, we extend the definition of θ\theta to all of \mathbb{Z} by setting it equal to 0 outside [N][N].

We apply the PET induction scheme of Bergelson and Leibman [1]. For any finite collection 𝒬\mathcal{Q} of polynomials, define its type as (d,wd,,w1)(d,w_{d},\ldots,w_{1}), where wjw_{j} is the number of different leading coefficients among the polynomials in the subcollection {Q𝒬:degQ=j}\{Q\in\mathcal{Q}\colon\deg Q=j\}. We introduce the lexicographic order << on the types of polynomials. In other words, we write (d,wd,,w1)<(d,wd,,w1)(d,w_{d},\ldots,w_{1})<(d^{\prime},w_{d^{\prime}}^{\prime},\ldots,w_{1}^{\prime}) if there exists j0j\geq 0 such that the first jj coordinates of the vectors are equal and the coordinate of order j+1j+1 is larger for the second vector. In this way, we have introduced an order << on the set of all finite collections of polynomials based on the order of their type. Note that the length of any descending chain of collections of polynomials with maximal element 𝒬\mathcal{Q} is bounded as a function of |𝒬||\mathcal{Q}| and deg𝒬\deg\mathcal{Q}.

We shall prove Theorem 4.2 by induction on the type of 𝒬\mathcal{Q}. The base case is that of collections of type (1,k)(1,k) with kk\in\mathbb{N}, that is, collections where all the polynomials have degree at most 11. This case follows readily from Lemma 4.6.

Suppose that 𝒬\mathcal{Q} is a finite collection of polynomials for which Theorem 4.2 fails and that there is no smaller collection 𝒬<𝒬\mathcal{Q}^{\prime}<\mathcal{Q} with this property. Then deg𝒬2\deg\mathcal{Q}\geq 2. Let us write 𝒬={Q1,,Qk}\mathcal{Q}=\{Q_{1},\ldots,Q_{k}\}, where Q1Q_{1} is one of the polynomials of 𝒬\mathcal{Q} with the least positive degree.

Let SS be the left-hand side of (4.2). By the Cauchy–Schwarz inequality, van der Corput’s inequality (3.4) and a change of variables, we have

(4.22) |S|21Nd+1hμN(h)mn[N]Δhθ(n)Q𝒬ΔhfQ(m+Q(n))=1Nd+1hμN(h)mn[N]Δhθ(n)Q𝒬fQ(m+Q(n+h)Q1(n))fQ¯(m+Q(n)Q1(n))=1Nd+1hμN(h)mn[N]Δhθ(n)Q~𝒬hf~Q~(m+Q~(n)),\displaystyle\begin{split}|S|^{2}&\ll\frac{1}{N^{d+1}}\sum_{h\in\mathbb{Z}}\mu_{N}(h)\sum_{m\in\mathbb{Z}}\sum_{n\in[N]}\Delta_{h}\theta(n)\prod_{Q\in\mathcal{Q}}\Delta_{h}f_{Q}(m+Q(n))\\ &=\frac{1}{N^{d+1}}\sum_{h\in\mathbb{Z}}\mu_{N}(h)\sum_{m\in\mathbb{Z}}\sum_{n\in[N]}\Delta_{h}\theta(n)\prod_{Q\in\mathcal{Q}}f_{Q}(m+Q(n+h)-Q_{1}(n))\overline{f_{Q}}(m+Q(n)-Q_{1}(n))\\ &=\frac{1}{N^{d+1}}\sum_{h\in\mathbb{Z}}\mu_{N}(h)\sum_{m\in\mathbb{Z}}\sum_{n\in[N]}\Delta_{h}\theta(n)\prod_{\widetilde{Q}\in\mathcal{Q}_{h}}\widetilde{f}_{\widetilde{Q}}(m+\widetilde{Q}(n)),\end{split}

where f~Q~\widetilde{f}_{\widetilde{Q}} are some 11-bounded functions and 𝒬h\mathcal{Q}_{h} is the collection (of size 2|𝒬|\leq 2|\mathcal{Q}| and degree deg𝒬\leq\deg\mathcal{Q}) given by

𝒬h={Qj(y+h)Q1(y),Qj(y)Q1(y)}j=1k.\displaystyle\mathcal{Q}_{h}=\{Q_{j}(y+h)-Q_{1}(y),Q_{j}(y)-Q_{1}(y)\}_{j=1}^{k}.

We claim that for every hh\in\mathbb{Z} the type of 𝒬h\mathcal{Q}_{h} is less than the type of 𝒬\mathcal{Q}. Let d1=degQ1d_{1}=\deg Q_{1}. Let (d,wd,,w1)(d,w_{d},\ldots,w_{1}) be the type of 𝒬\mathcal{Q} and let (dh,wd,h,,w1,h)(d_{h},w_{d,h},\ldots,w_{1,h}) be the type of 𝒬h\mathcal{Q}_{h}. We have dhdd_{h}\leq d, and if dh=dd_{h}=d, then we=we,hw_{e}=w_{e,h} for d1<edd_{1}<e\leq d. Moreover, wd1,h<wd1w_{d_{1},h}<w_{d_{1}}, since if c1,,crc_{1},\ldots,c_{r} are the distinct leading coefficients of the degree d1d_{1} polynomials in 𝒬\mathcal{Q}, the leading coefficients of degree d1d_{1} polynomials in 𝒬h\mathcal{Q}_{h} are c2c1,,crc1c_{2}-c_{1},\ldots,c_{r}-c_{1}. Hence we have 𝒬h<𝒬\mathcal{Q}_{h}<\mathcal{Q}.
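The type computation and the passage from the collection Q to Q_h can be checked on small examples. The following is a minimal sketch, not part of the original argument: polynomials are represented as coefficient tuples (c_0, c_1, ..., c_d), and the helper names `trim`, `degree`, `shift`, `sub`, `poly_type` and `vdc_step` are hypothetical names introduced only for illustration. Python's built-in tuple comparison is lexicographic, which matches the order on types used above.

```python
from math import comb

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes ()."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return tuple(p)

def degree(p):
    """Degree of a coefficient tuple (c_0, ..., c_d); -1 for the zero polynomial."""
    return len(trim(p)) - 1

def shift(p, h):
    """Coefficients of y -> p(y + h), via the binomial theorem."""
    out = [0] * len(p)
    for i, c in enumerate(p):
        for j in range(i + 1):
            out[j] += c * comb(i, j) * h ** (i - j)
    return trim(out)

def sub(p, q):
    """Coefficients of p - q."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim(a - b for a, b in zip(p, q))

def poly_type(polys):
    """Type (d, w_d, ..., w_1): w_j counts distinct leading coefficients in degree j."""
    polys = [trim(p) for p in polys]
    d = max(degree(p) for p in polys)
    counts = []
    for j in range(d, 0, -1):
        counts.append(len({p[-1] for p in polys if degree(p) == j}))
    return (d, *counts)

def vdc_step(polys, h):
    """The collection Q_h = {Q(y+h) - Q_1(y), Q(y) - Q_1(y)}, Q_1 of least positive degree."""
    polys = [trim(p) for p in polys]
    q1 = min((p for p in polys if degree(p) >= 1), key=degree)
    new = set()
    for q in polys:
        new.add(sub(shift(q, h), q1))
        new.add(sub(q, q1))
    return new
```

For instance, for the collection {y, y^2} the type is (2, 1, 1), and one van der Corput step with any shift h produces a collection of type (2, 1, 0), in line with the claim above.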

By (4.22), the estimate |μN(h)|1N1|h|N|\mu_{N}(h)|\ll\frac{1}{N}1_{|h|\leq N} and the pigeonhole principle, we have

|S|2\displaystyle|S|^{2} 𝔼h[N,N]max|j|(C+1)Nddh|1Ndh+1m1(jNdh,(j+1)Ndh](m)n[N]Δhθ(n)Q~𝒬hf~Q~(m+Q~(n))|\displaystyle\ll\mathbb{E}_{h\in[-N,N]}\max_{|j|\leq(C+1)N^{d-d_{h}}}\left|\frac{1}{N^{d_{h}+1}}\sum_{m\in\mathbb{Z}}1_{(jN^{d_{h}},(j+1)N^{d_{h}}]}(m)\sum_{n\in[N]}\Delta_{h}\theta(n)\prod_{\widetilde{Q}\in\mathcal{Q}_{h}}\widetilde{f}_{\widetilde{Q}}(m+\widetilde{Q}(n))\right|
=𝔼h[N,N]max|j|(C+1)Nddh|1Ndh+1mn[N]Δhθ(n)1[Ndh](m)Q~𝒬hf~Q~,jNdh(m+Q~(n))|,\displaystyle=\mathbb{E}_{h\in[-N,N]}\max_{|j|\leq(C+1)N^{d-d_{h}}}\left|\frac{1}{N^{d_{h}+1}}\sum_{m^{\prime}\in\mathbb{Z}}\sum_{n\in[N]}\Delta_{h}\theta(n)1_{[N^{d_{h}}]}(m^{\prime})\prod_{\widetilde{Q}\in\mathcal{Q}_{h}}\widetilde{f}_{\widetilde{Q},jN^{d_{h}}}(m^{\prime}+\widetilde{Q}(n))\right|,

where f~Q~,a=f~Q~(+a)\widetilde{f}_{\widetilde{Q},a}=\widetilde{f}_{\widetilde{Q}}(\cdot+a).

Since by assumption Q([N])[CNd,CNd]Q([N])\subset[-CN^{d},CN^{d}], basic linear algebra gives that the coefficients of QQ are C,d1\ll_{C,d}1 in modulus. Then, by the mean value theorem, for any Q~𝒬h\widetilde{Q}\in\mathcal{Q}_{h} we have

maxn[N]|Q~(n)||Q~(1)|+Nmaxy[1,N]|Q~(y)|deg𝒬,CNdh,\max_{n\in[N]}|\widetilde{Q}(n)|\ll|\widetilde{Q}(1)|+N\max_{y\in[1,N]}|\widetilde{Q}^{\prime}(y)|\ll_{\deg\mathcal{Q},C}N^{d_{h}},

so for all n[N]n\in[N] we have Q~(n)[CNdh,CNdh]\widetilde{Q}(n)\in[-C^{\prime}N^{d_{h}},C^{\prime}N^{d_{h}}] for some Cdeg𝒬,C1C^{\prime}\ll_{\deg\mathcal{Q},C}1. Now, by the induction assumption (and the fact that the functions f~Q~,jNdh\widetilde{f}_{\widetilde{Q},jN^{d_{h}}} may be assumed to be supported on [(C+1)Ndh,(C+1)Ndh][-(C^{\prime}+1)N^{d_{h}},(C^{\prime}+1)N^{d_{h}}]), we conclude that for some natural number s|𝒬|,deg𝒬1s^{\prime}\ll_{|\mathcal{Q}|,\deg\mathcal{Q}}1 we have

|S|2|𝒬|,deg𝒬,C𝔼h[N,N]ΔhθUs[N].\displaystyle|S|^{2}\ll_{|\mathcal{Q}|,\deg\mathcal{Q},C}\mathbb{E}_{h\in[-N,N]}\|\Delta_{h}\theta\|_{U^{s^{\prime}}[N]}.

Applying Hölder’s inequality as in the proof of Lemma 4.6, this implies that

|S||𝒬|,deg𝒬,CθUs+1[N].\displaystyle|S|\ll_{|\mathcal{Q}|,\deg\mathcal{Q},C}\|\theta\|_{U^{s^{\prime}+1}[N]}.

This completes the induction. ∎

5. Quantitative uniformity of multiplicative functions

In this section, we give quantitative bounds for the uniformity norms of the Möbius function and other multiplicative functions that we need for the proofs of the main theorems.

5.1. The Möbius function

The only arithmetic property we need of the Möbius function is encapsulated in the following recent result of Leng [25], building on [26] and improving on [32].

Lemma 5.1 (Quantitative UkU^{k}-uniformity of μ\mu).

Let kk\in\mathbb{N}, A>0A>0 and N3N\geq 3. Then we have

μUk[N]A(logN)A.\displaystyle\|\mu\|_{U^{k}[N]}\ll_{A}(\log N)^{-A}.
Proof.

This follows by combining Leng’s result [25, Theorem 6] with [32, Theorem 2.5] (where we can use Siegel’s bound qSiegelA(logN)Aq_{\textnormal{Siegel}}\gg_{A}(\log N)^{A}) and the triangle inequality for the Gowers norms. ∎

Remark 5.2.

We mention that the weaker bound μuk[N]A(logN)A\|\mu\|_{u^{k}[N]}\ll_{A}(\log N)^{-A}, which turns out to be all that we need for the proof of Theorem 1.2 in the case of distinct degree polynomials, is much simpler to prove. Indeed, it follows from the method for estimating bilinear exponential sums developed by Vinogradov [33].

Remark 5.3.

Lemma 5.1 continues to hold if the Möbius function μ\mu is replaced with the Liouville function λ\lambda. This follows easily from the identities

λ(n)=d1μ(n/d)1d2n,andμ(n/d)1d2n=n=end2ed(d,n)=1μ(e/d)μ(n),\displaystyle\lambda(n)=\sum_{d\geq 1}\mu(n/d)1_{d^{2}\mid n},\quad\textnormal{and}\quad\mu(n/d)1_{d^{2}\mid n}=\sum_{\begin{subarray}{c}n=en^{\prime}\\ d^{2}\mid e\mid d^{\infty}\\ (d,n^{\prime})=1\end{subarray}}\mu(e/d)\mu(n^{\prime}),

which can be truncated to d2eKd^{2}\leq e\leq K for any K1K\geq 1 at the cost of an error term that is bounded by O(1/K)O(1/K) in L1[N]L^{1}[N] norm.

5.2. Multiplicative functions satisfying the Siegel–Walfisz condition

Our goal in this subsection is to show that any multiplicative function satisfying the Siegel–Walfisz assumption (Definition 1.5) is close to a function whose uk[N]u^{k}[N] norm decays faster than any power of logarithm.

Proposition 5.4.

Let kk\in\mathbb{N} and A>0A>0. Let g:g\colon\mathbb{N}\to\mathbb{C} be a multiplicative function satisfying the Siegel–Walfisz property. Then we have a decomposition g=g1+g2g=g_{1}+g_{2} with g1,g2:g_{1},g_{2}\colon\mathbb{N}\to\mathbb{C} satisfying |g1|,|g2||g||g_{1}|,|g_{2}|\leq|g| and such that for N3N\geq 3 we have

g1uk[N]A,k(logN)A,\displaystyle\|g_{1}\|_{u^{k}[N]}\ll_{A,k}(\log N)^{-A},

and for r1r\geq 1 we have

𝔼nN|g2(n)|rr(logN)1+o(1).\displaystyle\mathbb{E}_{n\leq N}|g_{2}(n)|^{r}\ll_{r}(\log N)^{-1+o(1)}.

Throughout this section, let

Qnexp((loglogn)2),Rnexp((logn)/(loglogn)2)\displaystyle Q_{n}\coloneqq\exp((\log\log n)^{2}),\quad R_{n}\coloneqq\exp((\log n)/(\log\log n)^{2})

for n3n\geq 3 and Q1=Q2=R1=R2=1Q_{1}=Q_{2}=R_{1}=R_{2}=1, and define the function

(5.1) g~(n)g(n)(11pnp(Qn,Rn)).\displaystyle\widetilde{g}(n)\coloneqq g(n)(1-1_{p\mid n\implies p\not\in(Q_{n},R_{n})}).

In other words, g~\widetilde{g} is the restriction of gg to those integers having at least one prime factor from (Qn,Rn)(Q_{n},R_{n}). The following lemma shows that the function g~\widetilde{g} is close to gg in Lr[N]L^{r}[N] norm.

Lemma 5.5.

Let C>0C>0, r1r\geq 1, and let g:g\colon\mathbb{N}\to\mathbb{C} satisfy |g(n)|d(n)C|g(n)|\leq d(n)^{C} for all nn\in\mathbb{N}. Then for N3N\geq 3 we have

(5.2) 𝔼nN|g(n)g~(n)|rC,r(logN)1+o(1).\displaystyle\mathbb{E}_{n\leq N}|g(n)-\widetilde{g}(n)|^{r}\ll_{C,r}(\log N)^{-1+o(1)}.
Proof.

By the triangle inequality and the upper bound on gg, the left-hand side of (5.2) is

𝔼nNd(n)Cr1pnp(QN,RN)+O(N1/2+o(1))S+O(N1/2+o(1)).\displaystyle\ll\mathbb{E}_{n\leq N}d(n)^{Cr}1_{p\mid n\implies p\not\in(Q_{N},R_{\sqrt{N}})}+O(N^{-1/2+o(1)})\coloneqq S+O(N^{-1/2+o(1)}).

Applying Shiu’s bound [31, Theorem 1] followed by Mertens’s theorem, we see that

S\displaystyle S p[1,N](QN,RN)(1+d(p)Cr1p)p(QN,RN)(11p)\displaystyle\ll\prod_{p\in[1,N]\setminus(Q_{N},R_{\sqrt{N}})}\left(1+\frac{d(p)^{Cr}-1}{p}\right)\prod_{p\in(Q_{N},R_{\sqrt{N}})}\left(1-\frac{1}{p}\right)
=pN(1+2Cr1p)p(QN,RN)(1+2Cr1p)1(11p)\displaystyle=\prod_{p\leq N}\left(1+\frac{2^{Cr}-1}{p}\right)\prod_{p\in(Q_{N},R_{\sqrt{N}})}\left(1+\frac{2^{Cr}-1}{p}\right)^{-1}\left(1-\frac{1}{p}\right)
C,r(logN)2Cr1(logQNlogRN)2Cr\displaystyle\ll_{C,r}(\log N)^{2^{Cr}-1}\left(\frac{\log Q_{N}}{\log R_{\sqrt{N}}}\right)^{2^{Cr}}
C,r(logN)1+o(1),\displaystyle\ll_{C,r}(\log N)^{-1+o(1)},

as desired. ∎

The next lemma shows that the condition on the prime factors in the definition of g~\widetilde{g} can be replaced with a sieve weight up to a small error.

Lemma 5.6.

Let N3N\geq 3 and C>0C>0. There exist real numbers λr[1,1]\lambda_{r}\in[-1,1] such that for any A>0A>0 we have

(5.3) 𝔼NN(logN)A<nN|1pnp(Qn,Rn)rnrN1/10λr|d(n)CA,C(logN)A/2.\displaystyle\mathbb{E}_{N-N(\log N)^{-A}<n\leq N}\left|1_{p\mid n\implies p\not\in(Q_{n},R_{n})}-\sum_{\begin{subarray}{c}r\mid n\\ r\leq N^{1/10}\end{subarray}}\lambda_{r}\right|\cdot d(n)^{C}\ll_{A,C}(\log N)^{-A/2}.
Proof.

By splitting intervals into shorter ones if necessary, we may assume that AA is large enough in terms of CC. Let λr\lambda_{r} be the upper bound linear sieve coefficients of level N1/10N^{1/10} and sifting range (QN,RNN(logN)A)(Q_{N},R_{N-N(\log N)^{-A}}), as defined in [13, Section 12]. From the definition we have λr{1,0,+1}\lambda_{r}\in\{-1,0,+1\}. Let ν(n)=rnλr\nu(n)=\sum_{r\mid n}\lambda_{r}. Then the left-hand side of (5.3) is

(logN)A/2𝔼NN(logN)A<nN|ν(n)1pnp(Qn,Rn)|\displaystyle\ll(\log N)^{A/2}\mathbb{E}_{N-N(\log N)^{-A}<n\leq N}|\nu(n)-1_{p\mid n\implies p\not\in(Q_{n},R_{n})}|
+𝔼NN(logN)A<nNd(n)C+11d(n)(logN)A/C\displaystyle\quad+\mathbb{E}_{N-N(\log N)^{-A}<n\leq N}d(n)^{C+1}1_{d(n)\geq(\log N)^{A/C}}
S1+S2.\displaystyle\coloneqq S_{1}+S_{2}.

Using Shiu’s bound [31, Theorem 1], we see that

S2(logN)2A𝔼NN(logN)A<nNd(n)2C+1(logN)2A+22C+11(logN)A\displaystyle S_{2}\ll(\log N)^{-2A}\mathbb{E}_{N-N(\log N)^{-A}<n\leq N}d(n)^{2C+1}\ll(\log N)^{-2A+2^{2C+1}-1}\ll(\log N)^{-A}

by the assumption that AA is large in terms of CC.

Since

ν(n)1pnp(QN,RNN(logN)A)1pnp(Qn,Rn),\nu(n)\geq 1_{p\mid n\implies p\not\in(Q_{N},R_{N-N(\log N)^{-A}})}\geq 1_{p\mid n\implies p\not\in(Q_{n},R_{n})},

for n(NN(logN)A,N]n\in(N-N(\log N)^{-A},N], we have

𝔼NN(logN)A<nN|ν(n)1pnp(Qn,Rn)|\displaystyle\mathbb{E}_{N-N(\log N)^{-A}<n\leq N}|\nu(n)-1_{p\mid n\implies p\not\in(Q_{n},R_{n})}|
=𝔼NN(logN)A<nN(ν(n)1pnp(QN,RNN(logN)A))\displaystyle\quad=\mathbb{E}_{N-N(\log N)^{-A}<n\leq N}(\nu(n)-1_{p\mid n\implies p\not\in(Q_{N},R_{N-N(\log N)^{-A}})})
+O(p(QNN(logN)A,QN)(RNN(logN)A,RN)1p2).\displaystyle\quad\quad+O\left(\sum_{p\in(Q_{N-N(\log N)^{-A}},Q_{N})\cup(R_{N-N(\log N)^{-A}},R_{N})}\frac{1}{p^{2}}\right).

By the fundamental lemma of sieve theory ([21, Lemma 6.8]) and Mertens’s theorem, this is

exp((loglogN)2/20)+(logN)A/2,\displaystyle\ll\exp(-(\log\log N)^{2}/20)+(\log N)^{-A/2},

which suffices. ∎

We are now ready to show that the Siegel–Walfisz property for gg implies the same property for g~\widetilde{g}.

Lemma 5.7.

Let g:g\colon\mathbb{N}\to\mathbb{C} be a multiplicative function satisfying the Siegel–Walfisz property. Then g~\widetilde{g} satisfies the Siegel–Walfisz property.

Proof.

Since g~(n)=g(n)g(n)1pnp(Qn,Rn)\widetilde{g}(n)=g(n)-g(n)1_{p\mid n\implies p\not\in(Q_{n},R_{n})}, it suffices to show that the function

ng(n)1pnp(Qn,Rn)\displaystyle n\mapsto g(n)1_{p\mid n\implies p\not\in(Q_{n},R_{n})}

satisfies the Siegel–Walfisz property. By splitting a long interval into shorter ones, it suffices to show that for any large A>0A>0 and any 1aq(logN)A1\leq a\leq q\leq(\log N)^{A} we have

|𝔼NN(logN)A<nNg(n)1pnp(Qn,Rn)1na(modq)|A(logN)A/2.\displaystyle\left|\mathbb{E}_{N-N(\log N)^{-A}<n\leq N}g(n)1_{p\mid n\implies p\not\in(Q_{n},R_{n})}1_{n\equiv a\ (\mathrm{mod}\ q)}\right|\ll_{A}(\log N)^{-A/2}.

By Lemma 5.6, it suffices to show that for any |λr|1|\lambda_{r}|\leq 1 we have

|𝔼NN(logN)A<nNg(n)rnrN1/10λr1na(modq)|A(logN)A/2.\displaystyle\left|\mathbb{E}_{N-N(\log N)^{-A}<n\leq N}g(n)\sum_{\begin{subarray}{c}r\mid n\\ r\leq N^{1/10}\end{subarray}}\lambda_{r}1_{n\equiv a\ (\mathrm{mod}\ q)}\right|\ll_{A}(\log N)^{-A/2}.

Exchanging the order of summation and applying the triangle inequality, it suffices to show that

rN1/101r|𝔼(NN(logN)A)/r<mN/rg(rm)1rma(modq)|A(logN)A/2.\displaystyle\sum_{r\leq N^{1/10}}\frac{1}{r}\left|\mathbb{E}_{(N-N(\log N)^{-A})/r<m\leq N/r}g(rm)1_{rm\equiv a\ (\mathrm{mod}\ q)}\right|\ll_{A}(\log N)^{-A/2}.

Let 𝒜N\mathcal{A}_{N} be the set of rr\in\mathbb{N} with ω(r)(loglogN)3/2\omega(r)\leq(\log\log N)^{3/2}. Then by Shiu’s bound we have

rN1/10r𝒜N1r|𝔼(NN(logN)A)/r<mN/rg(rm)1rma(modq)|\displaystyle\sum_{\begin{subarray}{c}r\leq N^{1/10}\\ r\not\in\mathcal{A}_{N}\end{subarray}}\frac{1}{r}\left|\mathbb{E}_{(N-N(\log N)^{-A})/r<m\leq N/r}g(rm)1_{rm\equiv a\ (\mathrm{mod}\ q)}\right|
(logN)2C1rN1/10r𝒜Nd(r)Cr\displaystyle\ll(\log N)^{2^{C}-1}\sum_{\begin{subarray}{c}r\leq N^{1/10}\\ r\not\in\mathcal{A}_{N}\end{subarray}}\frac{d(r)^{C}}{r}
(logN)2C1rN1/10d(r)C+1r2(loglogN)3/2\displaystyle\leq(\log N)^{2^{C}-1}\sum_{r\leq N^{1/10}}\frac{d(r)^{C+1}}{r\cdot 2^{(\log\log N)^{3/2}}}
2(loglogN)3/2/2,\displaystyle\ll 2^{-(\log\log N)^{3/2}/2},

say. In view of this, it suffices to show that for all r𝒜N[1,N1/10]r\in\mathcal{A}_{N}\cap[1,N^{1/10}] we have

|𝔼(NN(logN)A)/r<mN/rg(rm)1rma(modq)|A(logN)3A/21.\displaystyle\left|\mathbb{E}_{(N-N(\log N)^{-A})/r<m\leq N/r}g(rm)1_{rm\equiv a\ (\mathrm{mod}\ q)}\right|\ll_{A}(\log N)^{-3A/2-1}.

By writing the sum over ((NN(logN)A)/r,N/r]((N-N(\log N)^{-A})/r,N/r] as a difference of two sums, it suffices to show that for y[N9/10/2,N]y\in[N^{9/10}/2,N] we have

(5.4) |𝔼myg(rm)1rma(modq)|A(logN)A1.\displaystyle\left|\mathbb{E}_{m\leq y}g(rm)1_{rm\equiv a\ (\mathrm{mod}\ q)}\right|\ll_{A}(\log N)^{-A-1}.

We can uniquely factorise m=mm=\ell m^{\prime}, where r\ell\mid r^{\infty} and (m,r)=1(m^{\prime},r)=1. By multiplicativity of gg, we then reduce to showing that

(5.5) r1|𝔼my/g(m)1rma(modq)1(m,r)=1|A(logN)3A/21.\displaystyle\sum_{\ell\mid r^{\infty}}\frac{1}{\ell}\left|\mathbb{E}_{m^{\prime}\leq y/\ell}g(m^{\prime})1_{r\ell m^{\prime}\equiv a\ (\mathrm{mod}\ q)}1_{(m^{\prime},r)=1}\right|\ll_{A}(\log N)^{-3A/2-1}.

Let S1S_{1} denote the part of the left-hand side of (5.5) with <(logN)4A\ell<(\log N)^{4A}, and let S2S_{2} denote the part with (logN)4A\ell\geq(\log N)^{4A}. By Shiu’s bound, we can crudely estimate

(5.6) S2r(logN)4A1(logN)2C1(logN)4A/2+2C1r11/2=(logN)2A+2C1pr(1+1p1/2+1p21/2+)(logN)2A+2C1exp(O(ω(r)1/2)),\displaystyle\begin{aligned} S_{2}&\ll\sum_{\begin{subarray}{c}\ell\mid r^{\infty}\\ \ell\geq(\log N)^{4A}\end{subarray}}\frac{1}{\ell}\cdot(\log N)^{2^{C}-1}\\ &\ll(\log N)^{-4A/2+2^{C}-1}\sum_{\ell\mid r^{\infty}}\frac{1}{\ell^{1/2}}\\ &=(\log N)^{-2A+2^{C}-1}\prod_{p\mid r}\left(1+\frac{1}{p^{1/2}}+\frac{1}{p^{2\cdot 1/2}}+\cdots\right)\\ &\ll(\log N)^{-2A+2^{C}-1}\exp(O(\omega(r)^{1/2})),\end{aligned}

where for the last line we used the simple inequality

pr1p1/2pω(r)1p1/2ω(r)1/2.\sum_{p\mid r}\frac{1}{p^{1/2}}\leq\sum_{p\leq\omega(r)}\frac{1}{p^{1/2}}\ll\omega(r)^{1/2}.

Since ω(r)(loglogN)3/2\omega(r)\leq(\log\log N)^{3/2} for r𝒜Nr\in\mathcal{A}_{N}, we see that S2A(logN)3A/21S_{2}\ll_{A}(\log N)^{-3A/2-1}.

The remaining task is to show that S1A(logN)3A/21S_{1}\ll_{A}(\log N)^{-3A/2-1}. For this, it suffices to show that for any integer 1(logN)4A1\leq\ell\leq(\log N)^{4A} we have

(5.7) |𝔼my/g(m)1rma(modq)1(m,r)=1|A(logN)3A/22.\displaystyle\left|\mathbb{E}_{m^{\prime}\leq y/\ell}g(m^{\prime})1_{r\ell m^{\prime}\equiv a\ (\mathrm{mod}\ q)}1_{(m^{\prime},r)=1}\right|\ll_{A}(\log N)^{-3A/2-2}.

From Shiu’s bound and the Siegel–Walfisz assumption on gg, for any integer 1uN1/101\leq u\leq N^{1/10} and any r𝒜N[1,N1/10]r\in\mathcal{A}_{N}\cap[1,N^{1/10}] we have

(5.8) |𝔼my/g(m)1rma(modq)1um|A,Cmin{(d(u)d(q))Cu,(logN)10A}.\displaystyle\left|\mathbb{E}_{m^{\prime}\leq y/\ell}g(m^{\prime})1_{r\ell m^{\prime}\equiv a\ (\mathrm{mod}\ q)}1_{u\mid m^{\prime}}\right|\ll_{A,C}\min\left\{\frac{(d(u)d(q))^{C}}{u},(\log N)^{-10A}\right\}.

Substituting the Möbius inversion formula

1(m,r)=1=uru(logN)6Aμ(u)1um+uru>(logN)6Aμ(u)1um\displaystyle 1_{(m^{\prime},r)=1}=\sum_{\begin{subarray}{c}u\mid r\\ u\leq(\log N)^{6A}\end{subarray}}\mu(u)1_{u\mid m^{\prime}}+\sum_{\begin{subarray}{c}u\mid r\\ u>(\log N)^{6A}\end{subarray}}\mu(u)1_{u\mid m^{\prime}}

and (5.8) into (5.7), and estimating d(q)qo(1)=(logN)o(1)d(q)\ll q^{o(1)}=(\log N)^{o(1)}, we reduce to showing that

uru>(logN)6Ad(u)Cu(logN)2C1A,C(logN)3A/23.\displaystyle\sum_{\begin{subarray}{c}u\mid r\\ u>(\log N)^{6A}\end{subarray}}\frac{d(u)^{C}}{u}(\log N)^{2^{C}-1}\ll_{A,C}(\log N)^{-3A/2-3}.

Estimating crudely using d(u)C/u(logN)5A/2/u1/2d(u)^{C}/u\ll(\log N)^{-5A/2}/u^{1/2} for u(logN)6Au\geq(\log N)^{6A}, and recalling that AA is large, it suffices to show that

(5.9) ur1u1/2logN,\displaystyle\sum_{u\mid r}\frac{1}{u^{1/2}}\ll\log N,

say. Since r𝒜Nr\in\mathcal{A}_{N}, the left-hand side is

pr(1+1p1/2+1p21/2+)exp(O(ω(r)1/2))exp(O((loglogN)3/4)),\displaystyle\prod_{p\mid r}\left(1+\frac{1}{p^{1/2}}+\frac{1}{p^{2\cdot 1/2}}+\cdots\right)\ll\exp(O(\omega(r)^{1/2}))\ll\exp(O((\log\log N)^{3/4})),

which suffices. ∎

For the proof of Proposition 5.4 we also need a bilinear estimate for polynomial phases.

Lemma 5.8.

Let A,C>0A,C>0, ss\in\mathbb{N}, and let αa\alpha_{a}, βa\beta_{a} be complex sequences with |αa|,|βa|d(a)C|\alpha_{a}|,|\beta_{a}|\leq d(a)^{C} for aa\in\mathbb{N}, and with αa,βa\alpha_{a},\beta_{a} supported on aexp((loglogN)2)a\geq\exp((\log\log N)^{2}). Let P(y)=0jscjyjP(y)=\sum_{0\leq j\leq s}c_{j}y^{j} be a polynomial with real coefficients, and suppose that

(5.10) |abNαaβbe(P(ab))|N(logN)A.\displaystyle\left|\sum_{ab\leq N}\alpha_{a}\beta_{b}e(P(ab))\right|\geq N(\log N)^{-A}.

Then there exists an integer 1A,C,s(logN)OA,C,s(1)1\leq\ell\ll_{A,C,s}(\log N)^{O_{A,C,s}(1)} such that

cjA,C,s(logN)OA,C,s(1)Nj\displaystyle\|\ell c_{j}\|\ll_{A,C,s}\frac{(\log N)^{O_{A,C,s}(1)}}{N^{j}}

for all integers 1js1\leq j\leq s.

Proof.

We may assume that C>0C>0 is large and that A>0A>0 is large in terms of CC. Write αa=αa(1)+αa(2)\alpha_{a}=\alpha_{a}^{(1)}+\alpha_{a}^{(2)} and βa=βa(1)+βa(2)\beta_{a}=\beta_{a}^{(1)}+\beta_{a}^{(2)}, where

αa(1)=αa1|αa|(logN)A/10,βa(1)=βa1|βa|(logN)A/10.\displaystyle\alpha_{a}^{(1)}=\alpha_{a}1_{|\alpha_{a}|\leq(\log N)^{A/10}},\quad\beta_{a}^{(1)}=\beta_{a}1_{|\beta_{a}|\leq(\log N)^{A/10}}.

Then, since AA is large in terms of CC, we have

(5.11) nN|αn(2)|2nNd(n)2C1d(n)C>(logN)A/10(logN)3AnNd(n)32C(logN)5A/2,\displaystyle\begin{aligned} \sum_{n\leq N}|\alpha_{n}^{(2)}|^{2}&\leq\sum_{n\leq N}d(n)^{2C}1_{d(n)^{C}>(\log N)^{A/10}}\\ &\leq(\log N)^{-3A}\sum_{n\leq N}d(n)^{32C}\\ &\ll(\log N)^{-5A/2},\end{aligned}

and similarly with βn(2)\beta_{n}^{(2)} in place of αn(2)\alpha_{n}^{(2)}. Now, applying the decompositions αa=αa(1)+αa(2)\alpha_{a}=\alpha_{a}^{(1)}+\alpha_{a}^{(2)}, βb=βb(1)+βb(2)\beta_{b}=\beta_{b}^{(1)}+\beta_{b}^{(2)}, (5.11) and Cauchy–Schwarz, the assumption (5.10) yields

|abNαaβbe(P(ab))|N(logN)A/2,\displaystyle\left|\sum_{ab\leq N}\alpha_{a}^{\prime}\beta_{b}^{\prime}e(P(ab))\right|\gg N(\log N)^{-A/2},

where αa=αa(1)/(logN)A/10\alpha_{a}^{\prime}=\alpha_{a}^{(1)}/(\log N)^{A/10}, βb=βb(1)/(logN)A/10\beta_{b}^{\prime}=\beta_{b}^{(1)}/(\log N)^{A/10}. Since αa,βb\alpha_{a}^{\prime},\beta_{b}^{\prime} are 11-bounded, the result follows e.g. from [27, Proposition 2.2] (which is an exponential sum estimate over short intervals; weaker results would also suffice). ∎

Proof of Proposition 5.4.

Recall the definition of g~\widetilde{g} from (5.1). We take g1=g~g_{1}=\widetilde{g}, g2=gg~g_{2}=g-\widetilde{g}. Then we immediately have |g1|,|g2||g||g_{1}|,|g_{2}|\leq|g|. In view of Lemma 5.5, it suffices to show that g1uk[N]A,k(logN)A\|g_{1}\|_{u^{k}[N]}\ll_{A,k}(\log N)^{-A}.

Let C>0C>0 be such that |g(n)|d(n)C|g(n)|\leq d(n)^{C} for all nn. By splitting into short intervals, it suffices to show that for any A>0A>0 and any polynomial P(y)=1jk1cjyj[y]P(y)=\sum_{1\leq j\leq k-1}c_{j}y^{j}\in\mathbb{R}[y] we have

(5.12) |𝔼NN(logN)A<nNg~(n)e(P(n))|A(logN)A/2.\displaystyle\left|\mathbb{E}_{N-N(\log N)^{-A}<n\leq N}\widetilde{g}(n)e(P(n))\right|\ll_{A}(\log N)^{-A/2}.

Let

g~N(n)=g(n)(11pnp(QN,RN)).\widetilde{g}_{N}(n)=g(n)(1-1_{p\mid n\implies p\not\in(Q_{N},R_{N})}).

Write 𝒥N=(QNN(logN)A,QN](RNN(logN)A,RN]\mathcal{J}_{N}=(Q_{N-N(\log N)^{-A}},Q_{N}]\cup(R_{N-N(\log N)^{-A}},R_{N}] for brevity. Then for n(NN(logN)A,N]n\in(N-N(\log N)^{-A},N] we have

g~(n)=g~N(n)+O(d(n)C1p𝒥N:pn).\widetilde{g}(n)=\widetilde{g}_{N}(n)+O(d(n)^{C}1_{\exists\,p\in\mathcal{J}_{N}\colon\,\,p\mid n}).

Hence, by Shiu’s bound, for any 1Y1<Y2N1\leq Y_{1}<Y_{2}\leq N with Y2Y1+N1/2Y_{2}\geq Y_{1}+N^{1/2} we can estimate

(5.13) 𝔼Y1<nY2|g~(n)g~N(n)|p𝒥N1p𝔼Y1/p<mY2/pd(pm)CA(logN)2C1A.\displaystyle\mathbb{E}_{Y_{1}<n\leq Y_{2}}|\widetilde{g}(n)-\widetilde{g}_{N}(n)|\ll\sum_{p\in\mathcal{J}_{N}}\frac{1}{p}\mathbb{E}_{Y_{1}/p<m\leq Y_{2}/p}d(pm)^{C}\ll_{A}(\log N)^{2^{C}-1-A}.

Hence, it suffices to prove (5.12) with g~N\widetilde{g}_{N} in place of g~\widetilde{g}.

For II an interval, let ωI(n)\omega_{I}(n) denote the number of prime factors from II without multiplicities. Then we immediately have the Ramaré identity

g~N(n)=p(QN,RN)n=pmg(pm)ω(QN,RN)(pm).\displaystyle\widetilde{g}_{N}(n)=\sum_{p\in(Q_{N},R_{N})}\,\,\sum_{n=pm}\frac{g(pm)}{\omega_{(Q_{N},R_{N})}(pm)}.
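The combinatorial content of the Ramaré identity is that, for an integer n with at least one prime factor in an interval I, summing the weight 1/ω_I(n) over the primes p in I dividing n gives exactly 1, while the sum is empty otherwise. A minimal numerical sketch of this fact follows; the helper names and the small test interval are illustrative assumptions, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def prime_factors(n):
    """Distinct prime factors of n, found by trial division."""
    out, p = set(), 2
    while p * p <= n:
        if n % p == 0:
            out.add(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        out.add(n)
    return out

def ramare_sum(n, lo, hi):
    """Sum over primes p in (lo, hi) dividing n of 1 / omega_I(n), with I = (lo, hi)."""
    in_interval = [p for p in prime_factors(n) if lo < p < hi]
    return sum(Fraction(1, len(in_interval)) for _ in in_interval)

def has_factor_in(n, lo, hi):
    """Indicator that n has at least one prime factor in (lo, hi)."""
    return int(any(lo < p < hi for p in prime_factors(n)))
```

Multiplying both sides by g(n) recovers the displayed identity for the restriction of g to integers with a prime factor in the interval.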

By multiplicativity, g(pm)=g(p)g(m)g(pm)=g(p)g(m) and ω(QN,RN)(pm)=1+ω(QN,RN)(m)\omega_{(Q_{N},R_{N})}(pm)=1+\omega_{(Q_{N},R_{N})}(m) unless pmp\mid m, so

g~N(n)\displaystyle\widetilde{g}_{N}(n) =p(QN,RN)n=pmg(p)g(m)ω(QN,RN)(m)+1+O(1p(QN,RN):p2n)\displaystyle=\sum_{p\in(Q_{N},R_{N})}\,\,\sum_{n=pm}\frac{g(p)g(m)}{\omega_{(Q_{N},R_{N})}(m)+1}+O(1_{\exists\,p\in(Q_{N},R_{N})\colon\,\,p^{2}\mid n})
gN(n)+O(1p(QN,RN):p2n).\displaystyle\coloneqq g^{\prime}_{N}(n)+O(1_{\exists\,p\in(Q_{N},R_{N})\colon\,\,p^{2}\mid n}).

We trivially have

(5.14) 𝔼NN(logN)A<nN1p(QN,RN):p2np(QN,RN)1p21QN.\displaystyle\mathbb{E}_{N-N(\log N)^{-A}<n\leq N}1_{\exists\,p\in(Q_{N},R_{N})\colon\,\,p^{2}\mid n}\ll\sum_{p\in(Q_{N},R_{N})}\frac{1}{p^{2}}\ll\frac{1}{Q_{N}}.

Since gN(n)g^{\prime}_{N}(n) is by definition of the form n=abαaβb\sum_{n=ab}\alpha_{a}\beta_{b} with |αa|1|\alpha_{a}|\leq 1, |βb|d(b)C|\beta_{b}|\ll d(b)^{C}, and since αa,βb\alpha_{a},\beta_{b} are supported on (QN,N/QN)(Q_{N},N/Q_{N}), by Lemma 5.8 and (5.14) we conclude that

|𝔼NN(logN)A<nNg~N(n)e(P(n))|(logN)A\displaystyle\left|\mathbb{E}_{N-N(\log N)^{-A}<n\leq N}\widetilde{g}_{N}(n)e(P(n))\right|\leq(\log N)^{-A}

unless for some integer 1A(logN)OA(1)1\leq\ell\ll_{A}(\log N)^{O_{A}(1)} the coefficients of PP satisfy

(5.15) cjA,C,k(logN)OA,C,k(1)Nj\displaystyle\|\ell c_{j}\|\ll_{A,C,k}\frac{(\log N)^{O_{A,C,k}(1)}}{N^{j}}

for all integers 1jk11\leq j\leq k-1.

Suppose that (5.15) holds. Let BB be large in terms of A,C,kA,C,k. Then e(P(n))=e(P(N))(1+O((logN)A))e(P(n))=e(P(N))(1+O((\log N)^{-A})) for nn belonging to any subinterval of [1,N][1,N] of length N(logN)AN(\log N)^{-A}. Now, splitting into progressions modulo \ell and into short intervals, we see that

|𝔼NN(logN)A<nNg~N(n)e(P(n))|\displaystyle\left|\mathbb{E}_{N-N(\log N)^{-A}<n\leq N}\widetilde{g}_{N}(n)e(P(n))\right|
\displaystyle\ll maxb(mod)maxz(NN(logN)A,N)|𝔼zN(logN)B<nzg~N(n)e(P(n))1nb(mod)|\displaystyle\ell\max_{b\ (\mathrm{mod}\ \ell)}\,\,\max_{z\in(N-N(\log N)^{-A},N)}\left|\mathbb{E}_{z-N(\log N)^{-B}<n\leq z}\widetilde{g}_{N}(n)e(P(n))1_{n\equiv b\ (\mathrm{mod}\ \ell)}\right|
\displaystyle\ll maxb(mod)maxz(NN(logN)A,N)|𝔼zN(logN)B<nzg~N(n)1nb(mod)|+OA((logN)A).\displaystyle\ell\max_{b\ (\mathrm{mod}\ \ell)}\,\,\max_{z\in(N-N(\log N)^{-A},N)}\left|\mathbb{E}_{z-N(\log N)^{-B}<n\leq z}\widetilde{g}_{N}(n)1_{n\equiv b\ (\mathrm{mod}\ \ell)}\right|+O_{A}((\log N)^{-A}).

Now the claim follows from (5.13) and the Siegel–Walfisz assumption on g~\widetilde{g} (which we have thanks to Lemma 5.7). ∎

6. Lemmas for the main proofs

6.1. A lacunary subsequence trick

We use a lacunary subsequence trick in the proofs of our pointwise convergence results. Such a trick roughly states that if AN:XA_{N}\colon X\to\mathbb{C} are some measurable functions and we have strong quantitative decay for ANjL1(X)\|A_{N_{j}}\|_{L^{1}(X)} for some lacunary sequence (Nj)j(N_{j})_{j\in\mathbb{N}}, then provided that ANA_{N} does not vary too much on intervals of the form [Nj,Nj+1][N_{j},N_{j+1}] in L1(X)L^{1}(X) norm, the sequence ANA_{N} must converge to 0 pointwise almost everywhere. Variants of this idea are frequently used to establish convergence of ergodic averages; see for example [12, Section 5].

Lemma 6.1.

Let B40B\geq 40. Let (X,ν)(X,\nu) be a probability space, and for N1N\geq 1 let AN:XA_{N}\colon X\to\mathbb{C} be a measurable function. If for any N3N\geq 3 we have

(6.1) ANL1(X)(logN)B\displaystyle\|A_{N}\|_{L^{1}(X)}\ll(\log N)^{-B}

and for 3N<MN(1+(logN)(B/21))3\leq N<M\leq N(1+(\log N)^{-(B/2-1)}) and for ν\nu-almost all xXx\in X we have

(6.2) |AM(x)AN(x)|(logN)B/10B/2+1,\displaystyle|A_{M}(x)-A_{N}(x)|\leq(\log N)^{B/10-B/2+1},

then for ν\nu-almost all xXx\in X we have limN(logN)2B/52|AN(x)|=0\lim_{N\to\infty}(\log N)^{2B/5-2}|A_{N}(x)|=0.

Proof.

Let η>0\eta>0 be a small enough constant, and set Nj=exp(ηj2/B)N_{j}=\exp(\eta\cdot j^{2/B}) for all jj\in\mathbb{N}. Note that Nj+1Nj(1+(logNj)(B/21))N_{j+1}\leq N_{j}(1+(\log N_{j})^{-(B/2-1)}) for all large enough jj\in\mathbb{N}, so by (6.2) we have

(6.3) supM[Nj,Nj+1]|AM(x)ANj(x)|(logNj)B/10B/2+1.\displaystyle\sup_{M\in[N_{j},N_{j+1}]}|A_{M}(x)-A_{N_{j}}(x)|\leq(\log N_{j})^{B/10-B/2+1}.

For δ(0,1/2)\delta\in(0,1/2), denote

(6.4) E(δ)={xX:lim supN(logN)2B/52|AN(x)|δ},Ej(δ)={xX:|ANj(x)|δ2(logNj)2B/5+2}.\displaystyle\begin{split}E(\delta)&=\{x\in X\colon\limsup_{N\to\infty}\,(\log N)^{2B/5-2}|A_{N}(x)|\geq\delta\},\\ E_{j}(\delta)&=\{x\in X\colon|A_{N_{j}}(x)|\geq\frac{\delta}{2}(\log N_{j})^{-2B/5+2}\}.\end{split}

Then by (6.3) we have

(6.5) E(δ)i=3jiEj(δ).\displaystyle E(\delta)\subset\bigcap_{i=3}^{\infty}\bigcup_{j\geq i}E_{j}(\delta).

By Markov’s inequality and (6.1), we have the bound

ν(Ej(δ))\displaystyle\nu(E_{j}(\delta)) δ(logNj)2B/5B2j1.1.\displaystyle\ll_{\delta}(\log N_{j})^{2B/5-B-2}\ll j^{-1.1}.

Since j3j1.1<\sum_{j\geq 3}j^{-1.1}<\infty, from (6.5) and the Borel–Cantelli lemma we conclude that ν(E(δ))=0\nu(E(\delta))=0. By the countable additivity of ν\nu, this then implies that

ν(δ>0E(δ))=ν(j1E(1/j))=0,\nu\left(\bigcup_{\delta>0}E(\delta)\right)=\nu\left(\bigcup_{j\geq 1}E(1/j)\right)=0,

as desired. ∎
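The two numerical facts used in the proof above, namely that consecutive terms of the sequence N_j = exp(η j^{2/B}) satisfy the gap condition required by (6.2), and that the Markov bound on ν(E_j(δ)) is summable in j, can be sanity-checked as follows. This is a sketch under stated assumptions: the values B = 40 and η = 0.5 in the usage note are illustrative choices (the proof only asks for η small enough), and everything is computed on the logarithmic scale to avoid overflow.

```python
import math

def log_N(j, B, eta):
    """log N_j for the sequence N_j = exp(eta * j^(2/B))."""
    return eta * j ** (2.0 / B)

def gap_ok(j, B, eta):
    """Check N_{j+1} <= N_j * (1 + (log N_j)^(-(B/2 - 1))) after taking logarithms."""
    lhs = log_N(j + 1, B, eta) - log_N(j, B, eta)
    rhs = math.log1p(log_N(j, B, eta) ** (-(B / 2.0 - 1.0)))
    return lhs <= rhs

def markov_exponent(B):
    """Exponent e such that (log N_j)^(2B/5 - B - 2) is a constant times j^(-e)."""
    return (2.0 / B) * (3.0 * B / 5.0 + 2.0)
```

For example, `markov_exponent(40)` evaluates to 1.3, which exceeds the exponent 1.1 used in the Borel–Cantelli step, and `gap_ok(j, 40, 0.5)` can be evaluated over any range of j of interest.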

6.2. A simple L1L^{1} bound

We also need a simple L1L^{1} estimate for ergodic averages that follows from Hölder’s inequality.

Lemma 6.2.

Let kk\in\mathbb{N}. Let (X,ν)(X,\nu) be a probability space, and let T1,,Tk:XXT_{1},\ldots,T_{k}\colon X\to X be invertible measure-preserving maps. Let g1,,gk:g_{1},\ldots,g_{k}\colon\mathbb{Z}\to\mathbb{Z} be functions. Also let 1<q1,,qk1<q_{1},\ldots,q_{k}\leq\infty satisfy 1q1++1qk1\frac{1}{q_{1}}+\cdots+\frac{1}{q_{k}}\leq 1. Then, for any f1Lq1(X),,fkLqk(X)f_{1}\in L^{q_{1}}(X),\ldots,f_{k}\in L^{q_{k}}(X), θ:\theta\colon\mathbb{N}\to\mathbb{C} and N1N\geq 1, we have

𝔼nNθ(n)j=1kfj(Tjgj(n))L1(X)(𝔼nN|θ(n)|r)1/rf1Lq1(X)fkLqk(X),\displaystyle\|\mathbb{E}_{n\leq N}\,\theta(n)\prod_{j=1}^{k}f_{j}(T_{j}^{g_{j}(n)}\cdot)\|_{L^{1}(X)}\leq\left(\mathbb{E}_{n\leq N}|\theta(n)|^{r}\right)^{1/r}\|f_{1}\|_{L^{q_{1}}(X)}\cdots\|f_{k}\|_{L^{q_{k}}(X)},

where 1r1\leq r\leq\infty satisfies 1/r+1/q1++1/qk=11/r+1/q_{1}+\cdots+1/q_{k}=1.

Proof.

By Hölder’s inequality and the TjT_{j}-invariance of ν\nu, we have

𝔼nNθ(n)j=1kfj(Tjgj(n))L1(X)\displaystyle\|\mathbb{E}_{n\leq N}\theta(n)\prod_{j=1}^{k}f_{j}(T_{j}^{g_{j}(n)}\cdot)\|_{L^{1}(X)}
(𝔼nN|θ(n)|r)1/rj=1k(X𝔼nN|fj(Tjgj(n)x)|qj dν(x))1/qj\displaystyle\quad\leq\left(\mathbb{E}_{n\leq N}|\theta(n)|^{r}\right)^{1/r}\prod_{j=1}^{k}\left(\int_{X}\mathbb{E}_{n\leq N}|f_{j}(T_{j}^{g_{j}(n)}x)|^{q_{j}}\textnormal{ d}\nu(x)\right)^{1/q_{j}}
=(𝔼nN|θ(n)|r)1/rj=1kfjLqj(X),\displaystyle\quad=\left(\mathbb{E}_{n\leq N}|\theta(n)|^{r}\right)^{1/r}\prod_{j=1}^{k}\|f_{j}\|_{L^{q_{j}}(X)},

as claimed. ∎
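The inequality of Lemma 6.2 can be tested numerically on a finite model. The sketch below makes several simplifying assumptions that are not part of the lemma: X is the cyclic group Z/mZ with uniform probability measure, all maps T_j are the same shift x -> x + 1, the exponents are q_1 = q_2 = 3 (so that r = 3), and the weight and functions are random. The function name `holder_sides` is hypothetical.

```python
import random

def lp_norm(values, q):
    """L^q norm with respect to the uniform probability measure on a finite set."""
    return (sum(abs(v) ** q for v in values) / len(values)) ** (1.0 / q)

def holder_sides(seed, m=17, N=30, qs=(3.0, 3.0)):
    """Return (lhs, rhs) of the Lemma 6.2 inequality for random data on Z/mZ."""
    rng = random.Random(seed)
    # r is determined by 1/r + 1/q_1 + ... + 1/q_k = 1.
    r = 1.0 / (1.0 - sum(1.0 / q for q in qs))
    theta = [0.0] + [rng.uniform(-1, 1) for _ in range(N)]  # theta[0] is unused
    fs = [[rng.uniform(-2, 2) for _ in range(m)] for _ in qs]
    gs = [lambda n: n, lambda n: n * n]  # the integer-valued functions g_j
    lhs = 0.0
    for x in range(m):
        avg = 0.0
        for n in range(1, N + 1):
            term = theta[n]
            for f, g in zip(fs, gs):
                term *= f[(x + g(n)) % m]  # f_j(T^{g_j(n)} x) for the shift T
            avg += term
        lhs += abs(avg / N)
    lhs /= m  # L^1(X) norm of the average
    rhs = lp_norm(theta[1:], r)
    for f, q in zip(fs, qs):
        rhs *= lp_norm(f, q)
    return lhs, rhs
```

For every seed, the first returned value should not exceed the second, up to floating-point error.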

7. Proofs of the pointwise ergodic theorems

All of our main theorems will be proven in the more general setting of weighted polynomial ergodic averages

(7.1) ANP1,,Pk(θ;f1,,fk)(x)1NnNθ(n)f1(TP1(n)x)fk(TPk(n)x),\displaystyle A_{N}^{P_{1},\ldots,P_{k}}(\theta;f_{1},\ldots,f_{k})(x)\coloneqq\frac{1}{N}\sum_{n\leq N}\theta(n)f_{1}(T^{P_{1}(n)}x)\cdots f_{k}(T^{P_{k}(n)}x),

where θ:\theta\colon\mathbb{N}\to\mathbb{C} is a function satisfying suitable uniformity norm estimates. We note that the following result does not require θ\theta to be multiplicative; its hypotheses are also satisfied, for example, when θ\theta is a suitably normalised version of the von Mangoldt function (by [25, Theorem 6]).

Theorem 7.1 (Pointwise convergence of polynomial ergodic averages with nice weight).

Let d,k,Kd,k,K\in\mathbb{N}. Let θ:\theta\colon\mathbb{N}\to\mathbb{C} satisfy

(7.2) |θ(n)|(logn)Kd(n)K\displaystyle|\theta(n)|\leq(\log n)^{K}d(n)^{K}

for all n3n\geq 3. Let P1,,PkP_{1},\ldots,P_{k} be polynomials with integer coefficients satisfying degP1degPk=d\deg P_{1}\leq\cdots\leq\deg P_{k}=d. Suppose that one of the following holds:

  1. (i)

    We have

    θUs+1[M]B,s(logM)B\|\theta\|_{U^{s+1}[M]}\ll_{B,s}(\log M)^{-B}

    for any M3M\geq 3, ss\in\mathbb{N} and B>0B>0.

  2. (ii)

    We have

    θud+1[M]B(logM)B\|\theta\|_{u^{d+1}[M]}\ll_{B}(\log M)^{-B}

    for any M3M\geq 3 and B>0B>0, and the polynomials P1,,PkP_{1},\ldots,P_{k} have pairwise distinct degrees.

Let (X,ν,T)(X,\nu,T) be a measure-preserving system, and let 1<q1,,qk1<q_{1},\ldots,q_{k}\leq\infty satisfy 1q1++1qk<1\frac{1}{q_{1}}+\cdots+\frac{1}{q_{k}}<1. Then, for any f1Lq1(X),,fkLqk(X)f_{1}\in L^{q_{1}}(X),\ldots,f_{k}\in L^{q_{k}}(X) and A0A\geq 0, we have

limN(logN)ANnNθ(n)f1(TP1(n)x)fk(TPk(n)x)=0\displaystyle\lim_{N\to\infty}\frac{(\log N)^{A}}{N}\sum_{n\leq N}\theta(n)f_{1}(T^{P_{1}(n)}x)\cdots f_{k}(T^{P_{k}(n)}x)=0

for almost all xXx\in X.
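The weighted averages (7.1) are straightforward to compute for concrete systems. The following sketch evaluates them with the Möbius weight for a circle rotation; the choice of system (rotation by a user-supplied angle on [0, 1)), the functions, and the names `mobius_sieve`, `weighted_average` and `rotation` are all illustrative assumptions, and a finite computation of this kind of course says nothing rigorous about almost-everywhere convergence.

```python
import math

def mobius_sieve(N):
    """Values mu(1), ..., mu(N) via a linear sieve; index 0 is unused."""
    mu = [1] * (N + 1)
    is_composite = [False] * (N + 1)
    primes = []
    for i in range(2, N + 1):
        if not is_composite[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > N:
                break
            is_composite[i * p] = True
            if i % p == 0:
                mu[i * p] = 0  # p^2 divides i * p
                break
            mu[i * p] = -mu[i]
    return mu

def weighted_average(theta, polys, fs, T_power, x, N):
    """(1/N) * sum_{n <= N} theta(n) * prod_j f_j(T^{P_j(n)} x)."""
    total = 0.0
    for n in range(1, N + 1):
        term = theta[n]
        for P, f in zip(polys, fs):
            term *= f(T_power(x, P(n)))
        total += term
    return total / N

def rotation(alpha):
    """Return the map (x, k) -> T^k x for the rotation T x = x + alpha mod 1."""
    return lambda x, k: (x + k * alpha) % 1.0
```

A possible usage, with P_1(n) = n and P_2(n) = n^2: take `mu = mobius_sieve(N)`, `f = lambda t: math.cos(2 * math.pi * t)`, and call `weighted_average(mu, [lambda n: n, lambda n: n * n], [f, f], rotation(math.sqrt(2)), 0.3, N)` for increasing values of N.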

Let us first see how this theorem implies our main theorems.

Proof of Theorems 1.2 and 1.6 assuming Theorem 7.1.

Theorem 1.2 follows by taking θ=μ\theta=\mu and applying Lemma 5.1.

For proving Theorem 1.6, we first use Proposition 5.4 to obtain a decomposition g=g1+g2g=g_{1}+g_{2} with g1uk[N]B,k(logN)B\|g_{1}\|_{u^{k}[N]}\ll_{B,k}(\log N)^{-B}, 𝔼nN|g2(n)|rr(logN)1+o(1)\mathbb{E}_{n\leq N}|g_{2}(n)|^{r}\ll_{r}(\log N)^{-1+o(1)} and |g1|,|g2||g||g_{1}|,|g_{2}|\leq|g|. Applying Theorem 7.1, we reduce to showing that

limN1NnNg2(n)f1(TP1(n)x)fk(TPk(n)x)=0\displaystyle\lim_{N\to\infty}\frac{1}{N}\sum_{n\leq N}g_{2}(n)f_{1}(T^{P_{1}(n)}x)\cdots f_{k}(T^{P_{k}(n)}x)=0

for almost all xXx\in X.

Since 1/q1++1/qk<11/q_{1}+\cdots+1/q_{k}<1, we can find some 0<ε<min1jk(qj1)0<\varepsilon<\min_{1\leq j\leq k}(q_{j}-1) and r1r\geq 1 such that 1/r+1/(q1ε)++1/(qkε)=11/r+1/(q_{1}-\varepsilon)+\cdots+1/(q_{k}-\varepsilon)=1. Then by Hölder’s inequality we have

|𝔼nNg2(n)f1(TP1(n)x)fk(TPk(n)x)|\displaystyle\left|\mathbb{E}_{n\leq N}g_{2}(n)f_{1}(T^{P_{1}(n)}x)\cdots f_{k}(T^{P_{k}(n)}x)\right|
(𝔼nN|g2(n)|r)1/rj=1k(𝔼nN|fj(TPj(n)x)|qjε)1/(qjε).\displaystyle\quad\leq\left(\mathbb{E}_{n\leq N}|g_{2}(n)|^{r}\right)^{1/r}\prod_{j=1}^{k}\left(\mathbb{E}_{n\leq N}|f_{j}(T^{P_{j}(n)}x)|^{q_{j}-\varepsilon}\right)^{1/(q_{j}-\varepsilon)}.

In view of the bound 𝔼nN|g2(n)|rr(logN)1+o(1)\mathbb{E}_{n\leq N}|g_{2}(n)|^{r}\ll_{r}(\log N)^{-1+o(1)}, it suffices to show for small enough δ>0\delta>0 that for all 1jk1\leq j\leq k we have

(7.3) ν({xX:lim supN(logN)δ𝔼nN|fj(TPj(n)x)|qjε1})=0.\displaystyle\nu(\{x\in X\colon\limsup_{N\to\infty}\,(\log N)^{-\delta}\mathbb{E}_{n\leq N}|f_{j}(T^{P_{j}(n)}x)|^{q_{j}-\varepsilon}\geq 1\})=0.

Using Markov’s inequality and Bourgain’s maximal inequality for polynomial ergodic averages (see [22, Theorem 1.8(iii)]) together with the fact that fjqjεLqj/(qjε)(X)f_{j}^{q_{j}-\varepsilon}\in L^{q_{j}/(q_{j}-\varepsilon)}(X), we see that for any 1jk1\leq j\leq k we have

ν({xX:supN[22m,22m+1]𝔼nN|fj(TPj(n)x)|qjε2(δ/2)m})\displaystyle\nu(\{x\in X\colon\sup_{N\in[2^{2^{m}},2^{2^{m+1}}]}\mathbb{E}_{n\leq N}|f_{j}(T^{P_{j}(n)}x)|^{q_{j}-\varepsilon}\geq 2^{(\delta/2)m}\})
(2(δ/2)m)qj/(qjε)X(supN1𝔼nN|fj(TPj(n)x)|qjε)qj/(qjε) dν(x)\displaystyle\quad\leq(2^{(\delta/2)m})^{-q_{j}/(q_{j}-\varepsilon)}\int_{X}\left(\sup_{N\geq 1}\mathbb{E}_{n\leq N}|f_{j}(T^{P_{j}(n)}x)|^{q_{j}-\varepsilon}\right)^{q_{j}/(q_{j}-\varepsilon)}\textnormal{ d}\nu(x)
2qjδm/(2(qjε))fjLqj(X)qj.\displaystyle\quad\ll 2^{-q_{j}\delta m/(2(q_{j}-\varepsilon))}\|f_{j}\|_{L^{q_{j}}(X)}^{q_{j}}.

Since m12cm<\sum_{m\geq 1}2^{-cm}<\infty for any c>0c>0, the claim (7.3) follows from the above estimate and the Borel–Cantelli lemma. ∎
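The role of the doubly exponential scales $[2^{2^{m}},2^{2^{m+1}}]$ in this Borel–Cantelli step can be checked numerically. The sketch below rests on the observation that $\log N\geq 2^{m}\log 2$ on such a scale, so the threshold $2^{(\delta/2)m}$ is eventually negligible against $(\log N)^{\delta}$; the helper names are hypothetical and chosen for illustration only.

```python
import math


def normalised_threshold(m, delta):
    """Upper bound for (log N)^(-delta) * 2^((delta/2) m) when N >= 2^(2^m)."""
    # log N at the left endpoint N = 2^(2^m) of the m-th scale.
    log_n_min = (2 ** m) * math.log(2)
    return 2 ** (delta * m / 2) / log_n_min ** delta


def tail_sum(c, m_max):
    """Partial sum of the geometric series sum_{m=1}^{m_max} 2^(-c m)."""
    return sum(2 ** (-c * m) for m in range(1, m_max + 1))
```

Here `normalised_threshold(m, delta)` equals $2^{-\delta m/2}(\log 2)^{-\delta}$, which tends to $0$ as $m\to\infty$, while `tail_sum` stays bounded, matching the summability needed for the measures of the exceptional sets.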

We will reduce Theorem 7.1 to the following quantitative $L^{1}$ estimate.

Proposition 7.2.

Let the notation and assumptions be as in Theorem 7.1. Then, for all $N\geq 3$, we have

\[
\|A_{N}^{P_{1},\ldots,P_{k}}(\theta;f_{1},\ldots,f_{k})\|_{L^{1}(X)}\ll_{A,K,P_{1},\ldots,P_{k}}\left(\prod_{j=1}^{k}(1+\|f_{j}\|_{L^{q_{j}}(X)})\right)^{B(q_{1},\ldots,q_{k})}(\log N)^{-A}
\tag{7.4}
\]

for some $B(q_{1},\ldots,q_{k})>0$.

Proof that Proposition 7.2 implies Theorem 7.1.

We are going to apply Lemma 6.1. Recall the notation (7.1). Let $r\geq 1$ satisfy $1/r+1/q_{1}+\cdots+1/q_{k}=1$. Applying first the triangle inequality, then Lemma 6.2 and finally (7.2) and Shiu’s bound, we see that for any $B>0$, $N\geq 3$ and $M\in[N,(1+(\log N)^{-(B/2-1)})N]$ we have

\begin{align*}
&\|A_{M}^{P_{1},\ldots,P_{k}}(\theta;f_{1},\ldots,f_{k})-A_{N}^{P_{1},\ldots,P_{k}}(\theta;f_{1},\ldots,f_{k})\|_{L^{1}(X)}\\
&\quad\leq\int_{X}\left(\frac{1}{N}\sum_{N\leq n\leq M}|\theta(n)|\prod_{j=1}^{k}|f_{j}(T^{P_{j}(n)}x)|+\left(\frac{1}{N}-\frac{1}{M}\right)\sum_{n\leq N}|\theta(n)|\prod_{j=1}^{k}|f_{j}(T^{P_{j}(n)}x)|\right)\textnormal{ d}\nu(x)\\
&\quad\ll\left(\frac{M-N}{N}\left(\mathbb{E}_{N\leq n<M}|\theta(n)|^{r}\right)^{1/r}+(\log N)^{-(B/2-1)}\left(\mathbb{E}_{n\leq N}|\theta(n)|^{r}\right)^{1/r}\right)\prod_{j=1}^{k}\|f_{j}\|_{L^{q_{j}}(X)}\\
&\quad\ll(\log N)^{-B/2+1+K+(2^{Kr}-1)/r}\prod_{j=1}^{k}\|f_{j}\|_{L^{q_{j}}(X)}.
\end{align*}

If $B$ is large enough, the exponent of the logarithm above is at most $-0.49B$, say. Hence, by Lemma 6.1, the conclusion of Theorem 7.1 follows from (7.4). ∎
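The exponent $(2^{Kr}-1)/r$ above reflects the classical fact that the mean of $d(n)^{k}$ over $n\leq N$ has order $(\log N)^{2^{k}-1}$. As a sanity check only (not the short-interval bound of Shiu that the proof uses), the sketch below computes these moments by a sieve; the function names are assumptions made for illustration.

```python
import math


def divisor_counts(limit):
    """Return a list d with d[n] = number of positive divisors of n, 1 <= n <= limit."""
    d = [0] * (limit + 1)
    for a in range(1, limit + 1):
        # Every multiple of a gains a as a divisor.
        for multiple in range(a, limit + 1, a):
            d[multiple] += 1
    return d


def moment_ratio(limit, k):
    """Mean of d(n)^k over n <= limit, divided by (log limit)^(2^k - 1)."""
    d = divisor_counts(limit)
    mean = sum(x ** k for x in d[1:]) / limit
    return mean / math.log(limit) ** (2 ** k - 1)
```

For $k=1$ the ratio is close to $1$ (Dirichlet's divisor problem), and for larger $k$ it remains bounded as the limit grows, which is the behaviour the logarithmic exponents in the proof rely on.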

Proof of Proposition 7.2.

We first reduce the proof of Proposition 7.2 to the case $q_{1}=\cdots=q_{k}=\infty$.

Reduction to the case of bounded functions. We claim that it suffices to prove Proposition 7.2 in the case $q_{1}=\cdots=q_{k}=\infty$. Suppose that this case has been proven. Let $1<q_{1},\ldots,q_{k}<\infty$ and $f_{i}\in L^{q_{i}}(X)$ for $i\in\{1,\ldots,k\}$ (we can assume that all the $q_{i}$ are $<\infty$, since we have $\|f\|_{L^{q}(X)}\leq\|f\|_{L^{\infty}(X)}$ for any $q<\infty$ and any measurable $f$). Also let $A>0$, and let $C>0$ be large enough in terms of $A,K,q_{1},\ldots,q_{k}$.

For $j\in\{1,\ldots,k\}$ and $N\geq 2$, we split

\[
f_{j}=f_{j,N}^{(1)}+f_{j,N}^{(2)},\quad\textnormal{where}\quad f_{j,N}^{(1)}(x)=f_{j}(x)1_{|f_{j}(x)|\leq(\log N)^{C}},\,\,f_{j,N}^{(2)}(x)=f_{j}(x)1_{|f_{j}(x)|>(\log N)^{C}}.
\]

Then by linearity we see that

\[
\begin{split}
&A_{N}^{P_{1},\ldots,P_{k}}(\theta;f_{1},\ldots,f_{k})\\
&\quad=A_{N}^{P_{1},\ldots,P_{k}}(\theta;f_{1,N}^{(1)},\ldots,f_{k,N}^{(1)})+\sum_{(i_{1},\ldots,i_{k})\in[2]^{k}\setminus(1,\ldots,1)}A_{N}^{P_{1},\ldots,P_{k}}(\theta;f_{1,N}^{(i_{1})},\ldots,f_{k,N}^{(i_{k})}).
\end{split}
\tag{7.5}
\]

Since $\|(\log N)^{-C}f_{j,N}^{(1)}\|_{L^{\infty}(X)}\leq 1$ for $j\in\{1,\ldots,k\}$, by the case $q_{1}=\cdots=q_{k}=\infty$ of Proposition 7.2 we have

\[
\|A_{N}^{P_{1},\ldots,P_{k}}(\theta;f_{1,N}^{(1)},\ldots,f_{k,N}^{(1)})\|_{L^{1}(X)}\ll_{A,C,K,P_{1},\ldots,P_{k}}(\log N)^{-A}.
\]

To bound the error term in (7.5), it suffices to show that for $(i_{1},\ldots,i_{k})\in[2]^{k}\setminus(1,\ldots,1)$ we have

\[
\|A_{N}^{P_{1},\ldots,P_{k}}(\theta;f_{1,N}^{(i_{1})},\ldots,f_{k,N}^{(i_{k})})\|_{L^{1}(X)}\ll_{C,q_{1},\ldots,q_{k}}(\log N)^{K+(2^{Kr}-1)/r-\delta C}\prod_{j=1}^{k}(1+\|f_{j}\|_{L^{q_{j}}(X)}^{1+\delta})
\]

for some $\delta=\delta(q_{1},\ldots,q_{k})>0$, since $C$ can be taken to be large enough in terms of $A,K,q_{1},\ldots,q_{k}$ so that $K+(2^{Kr}-1)/r-\delta C\leq-A$.

For $(i_{1},\ldots,i_{k})\in[2]^{k}\setminus(1,\ldots,1)$, there is some $\ell$ such that $i_{\ell}=2$; for the sake of notation, assume that $\ell=1$. Fix some $\delta>0$ and $1\leq r<\infty$ such that $\frac{1}{r}+\frac{1+\delta}{q_{1}}+\frac{1}{q_{2}}+\cdots+\frac{1}{q_{k}}=1$. Using the triangle inequality, Lemma 6.2 and Shiu’s bound combined with the assumption $|\theta(n)|\leq(\log n)^{K}d(n)^{K}$, we obtain

\begin{align*}
&\|A_{N}^{P_{1},\ldots,P_{k}}(\theta;f_{1,N}^{(i_{1})},\ldots,f_{k,N}^{(i_{k})})\|_{L^{1}(X)}\\
&\quad\leq(\log N)^{-\delta C}\|A_{N}^{P_{1},\ldots,P_{k}}(\theta;|f_{1,N}^{(i_{1})}|^{1+\delta},\ldots,|f_{k,N}^{(i_{k})}|)\|_{L^{1}(X)}\\
&\quad\leq(\log N)^{-\delta C}\left(\mathbb{E}_{n\leq N}|\theta(n)|^{r}\right)^{1/r}\|f_{1}\|_{L^{q_{1}}(X)}^{1+\delta}\cdots\|f_{k}\|_{L^{q_{k}}(X)}\\
&\quad\ll_{K,r}(\log N)^{K+(2^{Kr}-1)/r-\delta C}\|f_{1}\|_{L^{q_{1}}(X)}^{1+\delta}\cdots\|f_{k}\|_{L^{q_{k}}(X)}.
\end{align*}

Now the proof of Proposition 7.2 has been reduced to the case $q_{1}=\cdots=q_{k}=\infty$.
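The gain of $(\log N)^{-\delta C}$ in the reduction above comes from a pointwise inequality: wherever $|f|$ exceeds the truncation height $H=(\log N)^{C}$, one has $|f|\leq H^{-\delta}|f|^{1+\delta}$. The following sketch checks this on sampled values; the names `split_at_height` and `tail_gain_holds`, and the use of a finite list of samples in place of a function on $X$, are assumptions made for illustration.

```python
def split_at_height(values, height):
    """Split sampled values of f into the part with |f| <= height and the rest."""
    small = [v if abs(v) <= height else 0.0 for v in values]
    large = [v if abs(v) > height else 0.0 for v in values]
    return small, large


def tail_gain_holds(values, height, delta):
    """Check |f^(2)| <= height^(-delta) * |f|^(1 + delta) at every sample point."""
    _, large = split_at_height(values, height)
    # A tiny relative slack absorbs floating point rounding.
    return all(
        abs(tail) <= height ** (-delta) * abs(v) ** (1 + delta) * (1 + 1e-12)
        for v, tail in zip(values, large)
    )
```

The two pieces returned by `split_at_height` always sum back to the original samples, mirroring the decomposition of $f_{j}$ into its bounded part and its large part.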

The case of bounded functions. It now remains to show (7.4) in the case $q_{1}=\cdots=q_{k}=\infty$. In the rest of the proof, we abbreviate $A_{N}(x)\coloneqq A_{N}^{P_{1},\ldots,P_{k}}(\theta;f_{1},\ldots,f_{k})(x)$.

By Cauchy–Schwarz, it suffices to show that

\[
\int_{X}|A_{N}(x)|^{2}\textnormal{ d}\nu(x)\ll_{A,K,P}\|f_{1}\|_{L^{\infty}(X)}^{2}\cdots\|f_{k}\|_{L^{\infty}(X)}^{2}(\log N)^{-A}.
\]

The assumption $|\theta(n)|\leq(\log n)^{K}d(n)^{K}$ gives

\[
\|A_{N}\|_{L^{\infty}(X)}\ll(\log N)^{K+2^{K}-1}\|f_{1}\|_{L^{\infty}(X)}\cdots\|f_{k}\|_{L^{\infty}(X)},
\]

so it suffices to show that for any $\phi\in L^{\infty}(X)$ we have

\[
\int_{X}\phi(x)A_{N}(x)\textnormal{ d}\nu(x)\ll_{A,K,P}\|f_{1}\|_{L^{\infty}(X)}\cdots\|f_{k}\|_{L^{\infty}(X)}\|\phi\|_{L^{\infty}(X)}(\log N)^{-A}.
\tag{7.6}
\]

By the $T$-invariance of $\nu$, the claim (7.6) is equivalent to

\[
\begin{split}
&\int_{X}\frac{1}{N^{d+1}}\sum_{m\in[N^{d}],n\in[N]}\theta(n)\phi(T^{m}x)f_{1}(T^{m+P_{1}(n)}x)\cdots f_{k}(T^{m+P_{k}(n)}x)\textnormal{ d}\nu(x)\\
&\quad\ll_{A,K,P_{1},\ldots,P_{k}}\|f_{1}\|_{L^{\infty}(X)}\cdots\|f_{k}\|_{L^{\infty}(X)}\|\phi\|_{L^{\infty}(X)}(\log N)^{-A}.
\end{split}
\tag{7.7}
\]
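The passage from (7.6) to (7.7) uses only that integrating $g(T^{m}x)$ gives the same value for every shift $m$. A toy verification is sketched below on the rotation $Tx=x+1$ of $\mathbb{Z}/M\mathbb{Z}$ with uniform measure; the finite system, the list representation of functions, and all function names are assumptions for illustration, and exact rational arithmetic is used so that equality can be tested directly.

```python
from fractions import Fraction


def weighted_average(theta, fs, polys, n_max, x, size):
    """A_N(x) = (1/N) sum_{n<=N} theta(n) prod_j f_j(T^{P_j(n)} x) on Z/size."""
    total = Fraction(0)
    for n in range(1, n_max + 1):
        term = Fraction(theta(n))
        for f, poly in zip(fs, polys):
            term *= f[(x + poly(n)) % size]  # T^{P_j(n)} x = x + P_j(n) mod size
        total += term
    return total / n_max


def pairing(phi, theta, fs, polys, n_max, size):
    """Integral of phi * A_N against the uniform measure on Z/size."""
    return sum(
        phi[x] * weighted_average(theta, fs, polys, n_max, x, size)
        for x in range(size)
    ) / size


def shifted_pairing(phi, theta, fs, polys, n_max, degree, size):
    """Same integral after averaging the integrand over shifts T^m, m in [N^d]."""
    shifts = range(1, n_max ** degree + 1)
    total = Fraction(0)
    for x in range(size):
        for m in shifts:
            y = (x + m) % size
            total += phi[y] * weighted_average(theta, fs, polys, n_max, y, size)
    return total / (size * len(shifts))
```

In this toy model `pairing` corresponds to the left-hand side of (7.6) and `shifted_pairing` to the left-hand side of (7.7), and the two agree exactly for any choice of integer-valued inputs.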

By the definition of the $L^{\infty}(X)$ norm, there exists a set $X^{\prime}\subset X$ such that

\[
\nu(X^{\prime})=1\quad\textnormal{and}\quad|\phi(x)|\leq\|\phi\|_{L^{\infty}(X)},\,|f_{i}(x)|\leq\|f_{i}\|_{L^{\infty}(X)}\,\,\textnormal{for all}\,\,1\leq i\leq k,\,x\in X^{\prime}.
\]

Restricting the integral in (7.7) to $X^{\prime}$, it suffices to show that for all $x\in X^{\prime}$ we have the bound

\[
\begin{split}
&\left|\frac{1}{N^{d+1}}\sum_{m\in[N^{d}],n\in[N]}\frac{\theta(n)}{(\log N)^{A}}\phi(T^{m}x)f_{1}(T^{m+P_{1}(n)}x)\cdots f_{k}(T^{m+P_{k}(n)}x)\right|\\
&\quad\ll_{A,P_{1},\ldots,P_{k}}\|f_{1}\|_{L^{\infty}(X)}\cdots\|f_{k}\|_{L^{\infty}(X)}\|\phi\|_{L^{\infty}(X)}(\log N)^{-2A}.
\end{split}
\tag{7.8}
\]

Write $\theta=\theta_{1}+\theta_{2}$ where $\theta_{1}(n)=\theta(n)1_{|\theta(n)|\leq(\log n)^{A}}$, $\theta_{2}(n)=\theta(n)1_{|\theta(n)|>(\log n)^{A}}$. Since $d(n)^{K}\geq(\log n)^{A-K}$ in the support of $\theta_{2}$, we have

\[
\begin{aligned}
\mathbb{E}_{n\leq N}|\theta_{2}(n)|&\ll(\log N)^{K}\mathbb{E}_{n\leq N}(\log(n+1))^{-(A-K)\log\log A}d(n)^{K+K\log\log A}\\
&\ll(\log N)^{-(A\log\log A)/2}
\end{aligned}
\tag{7.9}
\]

if $A$ is large enough in terms of $K$.

Now, if assumption (i) of the theorem holds, by the triangle inequality and (7.9) we have

\[
\|\theta_{1}\|_{U^{s}[N]}\leq\|\theta\|_{U^{s}[N]}+\|\theta_{2}\|_{U^{s}[N]}\leq\|\theta\|_{U^{s}[N]}+\|\theta_{2}\|_{U^{1}[N]}\ll_{A,s,P_{1},\ldots,P_{k}}(\log N)^{-(A\log\log A)/2}.
\tag{7.10}
\]

If instead assumption (ii) holds, we similarly have

\[
\|\theta_{1}\|_{u^{s}[N]}\ll_{A,s,P_{1},\ldots,P_{k}}(\log N)^{-(A\log\log A)/2}.
\tag{7.11}
\]

In either case, since we may assume that $\phi,f_{1},\ldots,f_{k}$ are supported on $[-CN^{d},CN^{d}]$ for some $C\ll_{P}1$ and since $|\theta_{1}(n)|(\log n)^{-A}\leq 1$, we can use either Theorem 4.2 or 4.1 (depending on whether we have (7.11) or (7.10)) to conclude that (7.8) holds with $\theta_{1}$ in place of $\theta$. Hence it suffices to show that (7.8) holds with $\theta_{2}$ in place of $\theta$. For this it suffices to show that

\[
\mathbb{E}_{n\leq N}|\theta_{2}(n)|\ll(\log N)^{-A}.
\]

Here the left-hand side is

\[
\leq(\log N)^{K-2(A-K)}\mathbb{E}_{n\leq N}d(n)^{2K}\ll(\log N)^{-A}
\]

if $A$ is large enough in terms of $K$. This completes the proof. ∎

Lastly, we prove Theorem 1.3.

Proof of Theorem 1.3.

Following the proof of Theorem 7.1 verbatim, we reduce matters to showing that

\[
\begin{split}
&\left|\frac{1}{N^{k+1}}\sum_{m_{1},\ldots,m_{k},n\in[N]}\mu(n)\phi(T_{1}^{m_{1}}\cdots T_{k}^{m_{k}}x)\prod_{j=1}^{k}f_{j}(T_{1}^{m_{1}}\cdots T_{k}^{m_{k}}T_{j}^{n}x)\right|\\
&\quad\ll_{A}\|f_{1}\|_{L^{\infty}(X)}\cdots\|f_{k}\|_{L^{\infty}(X)}\|\phi\|_{L^{\infty}(X)}(\log N)^{-A}
\end{split}
\tag{7.12}
\]

for any functions $\phi,f_{1},\ldots,f_{k}\in L^{\infty}(X)$ and any $x\in X$ for which $|\phi(x)|\leq\|\phi\|_{L^{\infty}(X)},|f_{j}|\leq\|f_{j}\|_{L^{\infty}(X)}$. By setting

\begin{align*}
F_{j}((m_{1},\ldots,m_{k}))&=f_{j}(T_{1}^{m_{1}}\cdots T_{k}^{m_{k}}x),\\
G((m_{1},\ldots,m_{k}))&=\phi(T_{1}^{m_{1}}\cdots T_{k}^{m_{k}}x),
\end{align*}

the estimate (7.12) reduces to

\begin{align*}
&\left|\frac{1}{N^{k+1}}\sum_{\mathbf{m}\in[N]^{k},n\in[N]}\mu(n)G(\mathbf{m})F_{1}(\mathbf{m}+(n,0,\ldots,0))\cdots F_{k}(\mathbf{m}+(0,\ldots,0,n))\right|\\
&\quad\ll_{A}\|F_{1}\|_{L^{\infty}(X)}\cdots\|F_{k}\|_{L^{\infty}(X)}\|G\|_{L^{\infty}(X)}(\log N)^{-A}.
\end{align*}

But this bound follows directly from Lemmas 4.7 and 5.1. ∎

References

  • [1] V. Bergelson and A. Leibman. Polynomial extensions of van der Waerden’s and Szemerédi’s theorems. J. Amer. Math. Soc., 9(3):725–753, 1996.
  • [2] V. Bergelson and A. Leibman. A nilpotent Roth theorem. Invent. Math., 147(2):429–470, 2002.
  • [3] J. Bourgain. On the pointwise ergodic theorem on $L^{p}$ for arithmetic sets. Israel J. Math., 61(1):73–84, 1988.
  • [4] J. Bourgain. Double recurrence and almost sure convergence. J. Reine Angew. Math., 404:140–161, 1990.
  • [5] Q. Chu. Convergence of weighted polynomial multiple ergodic averages. Proc. Amer. Math. Soc., 137(4):1363–1369, 2009.
  • [6] Y. Do, R. Oberlin, and E. A. Palsson. Variation-norm and fluctuation estimates for ergodic bilinear averages. Indiana Univ. Math. J., 66(1):55–99, 2017.
  • [7] T. Eisner. A polynomial version of Sarnak’s conjecture. C. R. Math. Acad. Sci. Paris, 353(7):569–572, 2015.
  • [8] E. H. El Abdalaoui, J. Kułaga-Przymus, M. Lemańczyk, and T. de la Rue. The Chowla and the Sarnak conjectures from ergodic theory point of view. Discrete Contin. Dyn. Syst., 37(6):2899–2944, 2017.
  • [9] N. Frantzikinakis. Some open problems on multiple ergodic averages. Bull. Hellenic Math. Soc., 60:41–90, 2016.
  • [10] N. Frantzikinakis and B. Host. Multiple ergodic theorems for arithmetic sets. Trans. Amer. Math. Soc., 369(10):7085–7105, 2017.
  • [11] N. Frantzikinakis, B. Host, and B. Kra. Multiple recurrence and convergence for sequences related to the prime numbers. J. Reine Angew. Math., 611:131–144, 2007.
  • [12] N. Frantzikinakis, E. Lesigne, and M. Wierdl. Random sequences and pointwise convergence of multiple ergodic averages. Indiana Univ. Math. J., 61(2):585–617, 2012.
  • [13] J. Friedlander and H. Iwaniec. Opera de cribro, volume 57 of American Mathematical Society Colloquium Publications. American Mathematical Society, Providence, RI, 2010.
  • [14] W. T. Gowers. A new proof of Szemerédi’s theorem. Geom. Funct. Anal., 11(3):465–588, 2001.
  • [15] A. Granville and X. Shao. When does the Bombieri-Vinogradov theorem hold for a given multiplicative function? Forum Math. Sigma, 6:Paper No. e15, 23, 2018.
  • [16] B. Green and T. Tao. Linear equations in primes. Ann. of Math. (2), 171(3):1753–1850, 2010.
  • [17] B. Green and T. Tao. The Möbius function is strongly orthogonal to nilsequences. Ann. of Math. (2), 175(2):541–566, 2012.
  • [18] B. Green, T. Tao, and T. Ziegler. An inverse theorem for the Gowers $U^{s+1}[N]$-norm. Ann. of Math. (2), 176(2):1231–1372, 2012.
  • [19] R. Han, V. Kovač, M. T. Lacey, J. Madrid, and F. Yang. Improving estimates for discrete polynomial averages. J. Fourier Anal. Appl., 26(3):Paper No. 42, 11, 2020.
  • [20] B. Host and B. Kra. Convergence of polynomial ergodic averages. Israel J. Math., 149:1–19, 2005. Probability in mathematics.
  • [21] H. Iwaniec and E. Kowalski. Analytic number theory, volume 53 of American Mathematical Society Colloquium Publications. American Mathematical Society, Providence, RI, 2004.
  • [22] B. Krause, M. Mirek, and T. Tao. Pointwise ergodic theorems for non-conventional bilinear polynomial averages. Ann. of Math. (2), 195(3):997–1109, 2022.
  • [23] M. T. Lacey. The bilinear maximal functions map into $L^{p}$ for $2/3<p\leq 1$. Ann. of Math. (2), 151(1):35–57, 2000.
  • [24] A. Leibman. Convergence of multiple ergodic averages along polynomials of several variables. Israel J. Math., 146:303–315, 2005.
  • [25] J. Leng. Efficient Equidistribution of Nilsequences. arXiv e-prints, page arXiv:2312.10772, December 2023.
  • [26] J. Leng, A. Sah, and M. Sawhney. Quasipolynomial bounds on the inverse theorem for the Gowers $U^{s+1}[N]$-norm. arXiv e-prints, page arXiv:2402.17994, February 2024.
  • [27] K. Matomäki and X. Shao. Discorrelation between primes in short intervals and polynomial phases. Int. Math. Res. Not. IMRN, (16):12330–12355, 2021.
  • [28] S. Peluse. Bounds for sets with no polynomial progressions. Forum Math. Pi, 8:e16, 55, 2020.
  • [29] S. Peluse and S. Prendiville. A polylogarithmic bound in the nonlinear Roth theorem. Int. Math. Res. Not. IMRN, (8):5658–5684, 2022.
  • [30] S. Prendiville. Quantitative bounds in the polynomial Szemerédi theorem: the homogeneous case. Discrete Anal., pages Paper No. 5, 34, 2017.
  • [31] P. Shiu. A Brun-Titchmarsh theorem for multiplicative functions. J. Reine Angew. Math., 313:161–170, 1980.
  • [32] T. Tao and J. Teräväinen. Quantitative bounds for Gowers uniformity of the Möbius and von Mangoldt functions. To appear in J. Eur. Math. Soc.
  • [33] I. M. Vinogradov. The method of trigonometrical sums in the theory of numbers. Dover Publications, Inc., Mineola, NY, 2004. Translated from the Russian, revised and annotated by K. F. Roth and Anne Davenport, Reprint of the 1954 translation.
  • [34] I. Vinogradow. Simplest trigonometrical sums with primes. C. R. (Doklady) Acad. Sci. URSS (N.S.), 23:615–617, 1939.
  • [35] M. N. Walsh. Norm convergence of nilpotent ergodic averages. Ann. of Math. (2), 175(3):1667–1688, 2012.