Quasipolynomial bounds on the inverse theorem for the Gowers $U^{s+1}[N]$ -norm

James Leng Department of Mathematics, UCLA, Los Angeles, CA 90095, USA [email protected] , Ashwin Sah and Mehtaab Sawhney Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA {asah,msawhney}@mit.edu

Abstract.

We prove quasipolynomial bounds on the inverse theorem for the Gowers $U^{s+1}[N]$ -norm. The proof is modeled after work of Green, Tao, and Ziegler and uses as a crucial input recent work of the first author regarding the equidistribution of nilsequences. In a companion paper, this result will be used to improve the bounds on Szemerédi’s theorem.

Leng was supported by NSF Graduate Research Fellowship Grant No. DGE-2034835. Sah and Sawhney were supported by NSF Graduate Research Fellowship Program DGE-2141064.

1. Introduction

We recall the definition of the Gowers $U^{s}$ -norm on $\mathbb{Z}/N\mathbb{Z}$ and $[N]$ . Throughout we let $[N]=\{1,\ldots,N\}$ .

Definition 1.1.

Given $f\colon\mathbb{Z}/N\mathbb{Z}\to\mathbb{C}$ and $s\geq 1$ , we define

\lVert f\rVert_{U^{s}(\mathbb{Z}/N\mathbb{Z})}^{2^{s}}=\mathbb{E}_{x,h_{1},\ldots,h_{s}\in\mathbb{Z}/N\mathbb{Z}}\Delta_{h_{1},\ldots,h_{s}}f(x)

where $\Delta_{h}f(x)=f(x)\overline{f(x+h)}$ is the multiplicative discrete derivative (extended to lists by composition). Given a natural number $N$ and a function $f\colon[N]\to\mathbb{C}$ , we choose a number $\widetilde{N}\geq 2^{s}N$ and define $\widetilde{f}\colon\mathbb{Z}/\widetilde{N}\mathbb{Z}\to\mathbb{C}$ via $\widetilde{f}(x)=f(x)$ for $x\in[N]$ and $0$ otherwise. Then

\lVert f\rVert_{U^{s}[N]}:=\lVert\widetilde{f}\rVert_{U^{s}(\mathbb{Z}/\widetilde{N}\mathbb{Z})}/\lVert\mathbbm{1}_{[N]}\rVert_{U^{s}(\mathbb{Z}/\widetilde{N}\mathbb{Z})}.
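For concreteness, Definition 1.1 can be evaluated numerically for small $N$ . The following is a minimal sketch, not an efficient implementation: it assumes NumPy, takes $\widetilde{N}=2^{s}N$ , and places the support of $\widetilde{f}$ at indices $0,\ldots,N-1$ (harmless by translation invariance). The function names are illustrative and do not come from the paper.

```python
import numpy as np

def mean_of_derivatives(f, s):
    """E_{x,h_1,...,h_s} Delta_{h_1,...,h_s} f(x) on Z/nZ, where n = len(f)."""
    if s == 0:
        return f.mean()
    n = len(f)
    # Delta_h f(x) = f(x) * conj(f(x+h)); np.roll(f, -h)[x] equals f[(x+h) % n].
    return sum(
        mean_of_derivatives(f * np.conj(np.roll(f, -h)), s - 1) for h in range(n)
    ) / n

def gowers_norm_cyclic(f, s):
    """U^s norm of f on Z/nZ (brute force, cost about n^(s+1))."""
    f = np.asarray(f, dtype=complex)
    value = mean_of_derivatives(f, s).real  # real and nonnegative up to rounding
    return max(value, 0.0) ** (1.0 / 2 ** s)

def gowers_norm_interval(f, s):
    """U^s[N] norm: extend f by zero to Z/(2^s N)Z and normalize by the indicator."""
    f = np.asarray(f, dtype=complex)
    n = len(f)
    n_tilde = 2 ** s * n  # assumed choice; any n_tilde >= 2^s * n is allowed
    f_tilde = np.zeros(n_tilde, dtype=complex)
    f_tilde[:n] = f
    indicator = np.zeros(n_tilde, dtype=complex)
    indicator[:n] = 1
    return gowers_norm_cyclic(f_tilde, s) / gowers_norm_cyclic(indicator, s)
```

For instance, a linear phase $n\mapsto e(\alpha n)$ on $[N]$ has $U^{2}[N]$ -norm exactly $1$ , since every second multiplicative derivative of the phase is identically $1$ on its support.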

Remark.

This is known to be well-defined and independent of $\widetilde{N}$ , and a norm if $s\geq 2$ ; see [27, Lemma B.5].

Our main result is quasi-polynomial bounds on the inverse theorem for the Gowers $U^{s+1}$ -norm over the integers. This builds on earlier work [43, Section 8] of the first author which handled the case of the $U^{4}$ -norm.

Theorem 1.2.

Fix $\delta\in(0,1/2)$ . Suppose that $f\colon[N]\to\mathbb{C}$ is $1$ -bounded and

\lVert f\rVert_{U^{s+1}[N]}\geq\delta.

Then there exists a nilmanifold $G/\Gamma$ of degree $s$ , complexity at most $M$ , and dimension at most $d$ , a polynomial sequence $g\colon\mathbb{Z}\to G$ , as well as a function $F$ on $G/\Gamma$ which is at most $K$ -Lipschitz such that

|\mathbb{E}_{n\in[N]}[f(n)\overline{F(g(n)\Gamma)}]|\geq\varepsilon,

where we may take

d\leq\log(1/\delta)^{O_{s}(1)}\text{ and }\varepsilon^{-1},K,M\leq\exp(\log(1/\delta)^{O_{s}(1)}).

Remark.

Throughout this paper, we will abusively write $\log$ for $\max(\log(\cdot),e^{e})$ ; this is to avoid issues with small numbers.

We have not formally defined a nilmanifold or notions of complexity; our definition is identical to that in work of Green and Tao [29] and will be recalled precisely in Sections 2 and 3.

In the companion paper to this work [46], we will use Theorem 1.2 in order to improve the long standing bounds of Gowers [16, 18] on Szemerédi’s theorem.

Theorem 1.3 (Theorem 1.1 in [46]).

Let $r_{k}(N)$ denote the size of the largest $S\subseteq[N]$ such that $S$ has no $k$ -term arithmetic progressions. For $k\geq 5$ , there is $c_{k}\in(0,1)$ such that

r_{k}(N)\ll N\exp(-(\log\log N)^{c_{k}}).
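To make the quantity $r_{k}(N)$ concrete, the following brute-force sketch computes it for very small $N$ by exhaustive search. It is purely illustrative (the running time is exponential in $N$ ) and the function names are ours, not notation from the paper.

```python
from itertools import combinations

def has_k_ap(subset, k):
    """True if subset contains a k-term arithmetic progression with nonzero difference."""
    members = set(subset)
    if not members:
        return False
    top = max(members)
    for start in members:
        step = 1
        while start + (k - 1) * step <= top:
            if all(start + j * step in members for j in range(1, k)):
                return True
            step += 1
    return False

def r_k(n, k):
    """Size of the largest subset of {1, ..., n} with no k-term arithmetic progression."""
    for size in range(n, 0, -1):
        for subset in combinations(range(1, n + 1), size):
            if not has_k_ap(subset, k):
                return size
    return 0
```

For example, `r_k(9, 3)` returns `5`, witnessed by the set $\{1,2,4,8,9\}$ .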

1.1. History and previous results

A long standing conjecture of Erdős and Turán [13] stated that $r_{k}(N)=o(N)$ . In full generality, this conjecture remained open until a combinatorial tour de force of Szemerédi [54, 55] which established the Erdős and Turán conjecture.

Theorem 1.4.

For $k\geq 3$ , we have that

r_{k}(N)=o_{k}(N).

Due to uses of the van der Waerden theorem and the regularity lemma (which was introduced in this work), Szemerédi’s density saving over the trivial bound was exceedingly small. In particular, Szemerédi’s result provided no improvement on known bounds for van der Waerden’s theorem which was part of Erdős and Turán’s original motivation.

The first result in the effort to prove reasonable bounds for $r_{k}(N)$ , e.g. giving a density saving of at least a finite iterated logarithmic type, came from work of Roth [50] which proved

r_{3}(N)\ll N(\log\log N)^{-1}.

Being based on Fourier analysis, the methods used in Roth's paper did not obviously generalize to $k\geq 4$ . An estimate for $r_{k}(N)$ which was “reasonable” would have to wait until the pioneering work of Gowers [16, 18].

The starting point of the work of Gowers [16, 18] is the observation, via an iterative application of the Cauchy–Schwarz inequality, that if a set $A$ of density $\delta$ in $[N]$ has no $(s+2)$ -term arithmetic progressions then $\lVert f\rVert_{U^{s+1}[N]}\geq\delta^{O_{s}(1)}$ where $f$ is a shifted indicator function of the set. In doing so, Gowers provided the correct notion of “pseudorandomness” generalizing Fourier coefficients which was suitable for understanding arithmetic patterns in subsets of the integers and therefore created “higher order Fourier analysis”. The key technical ingredient in the work of Gowers was a certain “local inverse theorem” for the $U^{s+1}[N]$ -norm. Gowers proved that given a $1$ -bounded function $f$ such that $\lVert f\rVert_{U^{s+1}[N]}\geq\delta$ , there exists a decomposition of $[N]$ into arithmetic progressions of length roughly $N^{c_{s}}$ and a $1$ -bounded function $g$ which is constant along these arithmetic progressions such that

\mathbb{E}_{x\in[N]}f(x)\overline{g(x)}\geq\delta^{O_{s}(1)};

i.e., $f$ correlates with $g$ . This result, coupled with the density increment strategy as introduced by Roth [50], provided the bound

r_{k}(N)\ll N(\log\log N)^{-c_{k}}

for Szemerédi’s theorem. These bounds have remained the best known for general $k$ until this work. For the sake of comparison, a long sequence of works have attacked the special case of $k=3$ , culminating in a recent breakthrough work of Kelley and Meka [41] which proved

r_{3}(N)\ll N\exp(-c(\log N)^{1/12});

the constant $1/12$ was refined to $1/9$ in work of Bloom and Sisask [5]. The only other improvements to the bound of Gowers were due to works of Green and Tao [25, 30] which ultimately established that

r_{4}(N)\ll N(\log N)^{-c},

and very recent work of the authors [45] which handled the case $k=5$ of Theorem 1.3.

Notice however that the “local inverse theorem” of Gowers only gives correlations on arithmetic progressions of length $N^{c_{s}}$ and that the converse of this result is not true. In particular, a function may have small $U^{s+1}[N]$ -norm and still correlate with a function which is constant on progressions of length $N^{c_{s}}$ . To construct such an example, break $[N]$ into consecutive segments of length $\sqrt{N}$ and include each segment with probability $1/2$ ; while this set with high probability has large “local correlations” it has polynomially small Gowers norm. To obtain a full inverse result (analogous to the quality of Freiman’s theorem, say), one must carefully pin down the global structure as well. Such a task is not straightforward, since the natural generalization of Fourier characters to exponentials of polynomials does not suffice.

A crucial development in the theory towards the inverse conjecture for the Gowers norm was the discovery of the role of nilpotent Lie groups. In groundbreaking work, Furstenberg [14] gave an alternate proof of Szemerédi based on ergodic theory; this work naturally led to seeking to understand certain nonconventional ergodic averages. In works of Conze and Lesigne [11] and Furstenberg and Weiss [15] regarding nonconventional ergodic averages, nilmanifolds $G/\Gamma$ where $G$ is nilpotent and $\Gamma$ is a discrete cocompact subgroup were brought to the forefront. Host and Kra [38] and independently Ziegler [62], proved convergence of such nonconventional ergodic averages. Crucial to these works was establishing that such averages are controlled by projections on certain characteristic factors which naturally give rise to nilmanifolds. The role of nilsequences (derived from polynomial sequences on nilmanifolds) was further highlighted in work of Bergelson, Host, and Kra [3].

The statement of the inverse conjecture (without the given quantification) we will prove was first formulated in work of Green and Tao [27]. Conditional on this inverse conjecture and that the Möbius function does not correlate with nilsequences, Green and Tao were able to prove asymptotic counts for all linear patterns in the primes of “finite complexity”, vastly generalizing the celebrated Green–Tao theorem [24]. Both of these conjectures were resolved; the second being resolved in work of Green and Tao [28] while the first was resolved in work of Green, Tao, and Ziegler [34]. We remark the cases $s=2$ and $s=3$ of the inverse conjecture were proven earlier by Green and Tao [23] and Green, Tao, and Ziegler [32] respectively. A crucial ingredient in the cases $s\geq 3$ was work of Green and Tao [29] on the equidistribution behavior of polynomial orbits on nilmanifolds. An alternative approach to the inverse conjecture was initiated by Szegedy [53], involving the development of the theory of nilspaces by Camarena and Szegedy [6]; a detailed treatment of these papers was given by Candela [8, 7]. This nilspace approach has been further developed in works of Gutman, Manners, and Varjú [36, 35, 37]. Both of these approaches to the inverse theorem, however, at least formally, gave no bounds on the complexity or dimension of the nilsequences with which the function correlates in the cases $s\geq 4$ . A third approach due to Manners [47] will be discussed later in this section.

We remark that the study of the inverse conjecture for the Gowers norm makes sense beyond the setting of functions on the interval or on the cyclic group $\mathbb{Z}/N\mathbb{Z}$ . Work of Bergelson, Tao, and Ziegler [4] and Tao and Ziegler [59, 60] resolved the analogue of the inverse conjecture for the Gowers norm over $\mathbb{F}_{p}^{n}$ . Candela and Szegedy [10] gave a version of the inverse conjecture for the Gowers norm over all compact abelian groups. This final work falls within the context of giving proofs which, broadly speaking, attempt to handle various abelian groups in a uniform manner. There has been substantial further work in this rough direction including works of Jamneshan and Tao [40], Jamneshan, Shalom, and Tao [39], and Candela, González-Sánchez, and Szegedy [9].

The inverse theorem has had numerous further applications within additive combinatorics; we highlight just two. First, Tao and Ziegler [61] gave an asymptotic for the number of polynomial patterns $x+P_{1}(y),\ldots,x+P_{j}(y)$ in the primes where $P_{1}(0)=\cdots=P_{j}(0)=0$ with top degree terms $P_{1},\ldots,P_{j}$ being distinct. Second, works of Green and Tao [26] and Altman [2, 1] used the inverse conjecture in combination with an arithmetic regularity lemma to establish the true complexity conjectures of Gowers and Wolf [21].

Due to its importance in the theory of additive patterns, establishing quantitative bounds on the inverse theorem for the Gowers norm has been seen as a central problem in additive combinatorics, with Green suggesting it as “perhaps the biggest open question in the subject” [22, Problem 56]. For the case of $s=2$ , work of Green and Tao [23] gave quantitative bounds for the inverse theorem over the integers and work of Sanders [51] combined with the strategy in [23] proves Theorem 1.2 for the case of $s=2$ . For general $s$ , until roughly five years ago no quantitative bounds were known for the inverse theorem and this was considered a major open problem. This state of affairs was substantially improved in remarkable work of Manners [47] which proves a version of the inverse theorem where, in the notation of Theorem 1.2,

d\leq\delta^{-O_{s}(1)}\text{ and }\varepsilon^{-1},K,M\leq\exp(\exp(\delta^{-O_{s}(1)})).

This result was subsequently used as a crucial input in work of Tao and Teräväinen [57] to give an effective result for the counts of linear equations in the primes. We remark that a quantitative version of the inverse conjecture over finite fields of high characteristic was proven in work of Gowers and Milićević [20, 19].

At the highest level, the quantitative proofs of Manners [47] and Gowers and Milićević [20, 19] examine when the iterated derivatives of a function are $0$ with positive probability. Deriving useful information from this hypothesis over finite fields and the integers are very different problems but fundamentally one glues information from higher derivatives together into information regarding lower derivatives iteratively.

Our proof instead operates via induction on $s$ and attempts to glue degree $(s-1)$ nilmanifolds into a degree $s$ one exactly as in work of Green, Tao, and Ziegler [34]. Our proof in fact is very closely modeled on their work and borrows large sections of their work essentially verbatim. In fact, we believe that the proof in [34], if appropriately quantified, itself yields a bound involving $O(s^{2})$ many iterated exponentials. The primary improvement of our proof over theirs stems from the use of improved quantitative equidistribution results on nilmanifolds [43, 42] rather than the results of [29]. The reason we obtain quasi-polynomial bounds is that our proof, even though it inducts on $s$ , gives quasi-polynomial bounds for each step of the induction. Since an iterated composition of finitely many quasi-polynomial functions is still quasi-polynomial, it follows that our bounds should remain quasi-polynomial. In contrast, we believe that the proof in [34], appropriately quantified, results in adding $O(t)$ iterated exponentials in each step $t$ of the induction, which when iterated totals $O(s^{2})$ iterated exponentials. Here the results of [43, 42] play a crucial role in eliminating the logarithms accumulated in the induction step. We further remark that the case $s=3$ of the main theorem (i.e., the $U^{4}$ -inverse theorem) of the strength in Theorem 1.2 was proven earlier by the first author in [43, Section 8] and may be a useful stepping stone for reading this paper (although this paper is logically independent).

1.2. Organization of the paper I

We briefly discuss the next three sections of the paper. In Section 2, we define a number of basic notions regarding nilmanifolds and set various conventions which will be used throughout the paper. Our conventions differ in various extremely minor ways from those in the work of Green, Tao, and Ziegler [34] but we record them explicitly to recall a number of definitions which will be used throughout the paper. In Section 3, we set various complexity notions that will be used throughout the paper. In the case of nilmanifolds which are given a degree filtration (as is the case in Theorem 1.2), our conventions match those of Green and Tao [29]. With these notions in hand, we will be in position to outline the main proof in greater detail in Section 4.

Acknowledgements

The first author thanks Terence Tao for advisement. The authors thank Ben Green and Terence Tao for useful discussions regarding [34, 31]. The authors are grateful to Dan Altman, Ben Green, and Zach Hunter for comments. Finally the authors are especially grateful to Sarah Peluse for exceptionally detailed and useful comments on the manuscript.

2. Conventions on nilmanifolds

We will recall a large portion of setup regarding nilsequences. In order to discuss this in a quantitative manner, various complexity notions are required which are formally defined in Section 3. This section contains little more than bare definitions; a number of these concepts are developed and motivated in a beautiful manner in [34, Section 6].

2.1. Basic group theory

We briefly record various basic group theory notations which will be used throughout the paper; our notation is identical to that of [34, Section 3].

Given a group $G$ and a subset $A$ , we define $\langle A\rangle$ to be the subgroup generated by the subset $A$ . Given a collection of subgroups $(H_{i})_{i\in I}$ in $G$ , we define $\bigvee_{i\in I}H_{i}$ to be the smallest subgroup containing all the $H_{i}$ . Given $h,k\in G$ , we denote the commutator of $h$ and $k$ to be

[h,k]=h^{-1}k^{-1}hk.

Given a sequence of elements $g_{1},\ldots,g_{r}\in G$ , we define the set of $(r-1)$ -fold commutators inductively. The only $0$ -fold commutator of a single element $g_{i}$ is $g_{i}$ itself. For $r>1$ , an $(r-1)$ -fold commutator is $[w,w^{\prime}]$ where $w$ and $w^{\prime}$ are $(s-1)$ -fold and $(s^{\prime}-1)$ -fold commutators of $g_{i_{1}},\ldots,g_{i_{s}}$ and $g_{i_{1}^{\prime}},\ldots,g_{i_{s^{\prime}}^{\prime}}$ respectively with $\{i_{1},\ldots,i_{s}\}\cup\{i_{1}^{\prime},\ldots,i_{s^{\prime}}^{\prime}\}=\{1,\ldots,r\}$ and $s+s^{\prime}=r$ . For instance, $[[g_{3},g_{4}],[g_{1},g_{2}]]$ and $[g_{1},[g_{3},[g_{2},g_{4}]]]$ are $3$ -fold commutators of $g_{1}$ , $g_{2}$ , $g_{3}$ , and $g_{4}$ .

We let $H\leqslant G$ denote that $H$ is a subgroup of $G$ . Given $H,K\leqslant G$ , we denote the commutator subgroup

[H,K]=\langle[h,k]\colon h\in H,k\in K\rangle.

The following pair of elementary lemmas will be used throughout the paper to verify various commutator identities; the first is [34, Lemma 3.1].

Lemma 2.1.

Let $H=\langle A\rangle$ and $K=\langle B\rangle$ be normal subgroups of a nilpotent group $G$ . Then $[H,K]$ is also normal and is generated by the $(i+j-1)$ -fold iterated commutators of $a_{1},\ldots,a_{i},b_{1},\ldots,b_{j}$ over all choices of $a_{1},\ldots,a_{i}\in A$ , $b_{1},\ldots,b_{j}\in B$ and $i,j\geq 1$ .

This implies (see [34, p. 1242]) that for families $(H_{i})_{i\in I}$ , $(K_{j})_{j\in J}$ which are normal in a nilpotent group $G$ ,

\Big{[}\bigvee_{i\in I}H_{i},\bigvee_{j\in J}K_{j}\Big{]}=\bigvee_{i\in I,j\in J}[H_{i},K_{j}].

We next require that normality and various filtration conditions can be checked at the level of generators.

Lemma 2.2.

Suppose $K\leqslant H$ with $H=\langle A\rangle,K=\langle B\rangle$ where $A=A^{-1}$ and $B=B^{-1}$ . Then:

•

If $[a,b]\in K$ for all $a\in A$ and $b\in B$ then $K$ is normal in $H$ .
•

Suppose $L\leqslant K\cap H$ is a normal subgroup with respect to both $K$ and $H$ , and suppose for $a\in A$ , $b\in B$ , we have $[a,b]\in L$ . Then $[H,K]\leqslant L$ .

Remark.

Suppose we wish to prove that $(G_{i})_{i\in I}$ forms an $I$ -filtration (see Definition 2.3). This lemma implies that it suffices to check the commutator filtration conditions simply at the level of generators: if for each $i,j\in I$ we know $[g_{i},g_{j}]\in G_{i+j}$ for all generators $g_{i}$ for $G_{i}$ and $g_{j}$ for $G_{j}$ , then we can deduce that $G_{i+j}$ is normal in $G_{i}$ using the first bullet point above, and then deduce that $[G_{i},G_{j}]\leqslant G_{i+j}$ using the second bullet point above.

Proof.

For $a\in A,b\in B$ we have $[a,b]\in K$ , hence $a^{-1}b^{-1}a=[a,b]b^{-1}\in K$ . Since $B=B^{-1}$ generates $K$ , we find $a^{-1}Ka\leqslant K$ . Since $A=A^{-1}$ generates $H$ , we deduce that $K$ is normal in $H$ .

For the second item, note that

[xy,z]=y^{-1}[x,z]y\cdot[y,z]\text{ and }[x,zy]=[x,y]\cdot y^{-1}[x,z]y.

Repeatedly expanding $[h,k]$ for $h\in H,k\in K$ into generators proves the result. ∎
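The two commutator identities used in the proof hold in every group, with the convention $[h,k]=h^{-1}k^{-1}hk$ . As a sanity check, the sketch below verifies them exhaustively in a small symmetric group. Permutations are encoded as tuples; this encoding and the helper names are our own illustrative choices.

```python
from itertools import permutations, product

def mul(p, q):
    """Composition p∘q of permutations given as tuples: (p∘q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def inv(p):
    """Inverse permutation."""
    out = [0] * len(p)
    for i, image in enumerate(p):
        out[image] = i
    return tuple(out)

def comm(h, k):
    """Commutator [h, k] = h^{-1} k^{-1} h k."""
    return mul(mul(inv(h), inv(k)), mul(h, k))

def conj_by(y, g):
    """Conjugate y^{-1} g y."""
    return mul(mul(inv(y), g), y)

def identities_hold(x, y, z):
    # [xy, z] = y^{-1}[x, z]y * [y, z]
    first = comm(mul(x, y), z) == mul(conj_by(y, comm(x, z)), comm(y, z))
    # [x, zy] = [x, y] * y^{-1}[x, z]y
    second = comm(x, mul(z, y)) == mul(comm(x, y), conj_by(y, comm(x, z)))
    return first and second

def check_symmetric_group(n):
    """Check both identities for all triples of elements of S_n."""
    group = list(permutations(range(n)))
    return all(identities_hold(x, y, z) for x, y, z in product(group, repeat=3))
```

Calling `check_symmetric_group(4)` tests all $24^{3}$ triples in $S_{4}$ .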

Finally, and most importantly, we will require the following versions of the Baker–Campbell–Hausdorff formula (see [34, (3.2)]). Given $g_{1},g_{2}$ in a nilpotent group $G$ and $n_{1},n_{2}\in\mathbb{N}$ , we have

(2.1)

g_{1}^{n_{1}}g_{2}^{n_{2}}=g_{2}^{n_{2}}g_{1}^{n_{1}}\prod_{a}g_{a}^{P_{a}(n_{1},n_{2})}

where $g_{a}$ ranges over all iterated commutators of $g_{1}$ and $g_{2}$ with at least $1$ copy of each and $P_{a}(n_{1},n_{2})\colon\mathbb{Z}\times\mathbb{Z}\to\mathbb{Z}$ is a polynomial in $n_{1}$ and $n_{2}$ . Furthermore if $g_{a}$ involves $d_{1}$ copies of $g_{1}$ and $d_{2}$ copies of $g_{2}$ we have that $P_{a}$ has degree at most $d_{1}$ in $n_{1}$ and degree at most $d_{2}$ in $n_{2}$ . Here the $a$ have been ordered in some arbitrary manner.
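The simplest nontrivial instance of (2.1) occurs in a $2$ -step nilpotent group, where the only surviving commutator is $[g_{1},g_{2}]$ and one may take $P(n_{1},n_{2})=n_{1}n_{2}$ . The sketch below checks this in the integer Heisenberg group, an example chosen by us for illustration. A triple $(a,b,c)$ stands for the upper unitriangular matrix with entries $a$ and $b$ on the superdiagonal and $c$ in the corner.

```python
def mul(p, q):
    """Product in the integer Heisenberg group, elements encoded as (a, b, c)."""
    a, b, c = p
    a2, b2, c2 = q
    return (a + a2, b + b2, c + c2 + a * b2)

def inv(p):
    """Inverse of (a, b, c)."""
    a, b, c = p
    return (-a, -b, -c + a * b)

def power(p, n):
    """p^n for an integer n >= 0, by repeated multiplication."""
    result = (0, 0, 0)
    for _ in range(n):
        result = mul(result, p)
    return result

def comm(h, k):
    """Commutator [h, k] = h^{-1} k^{-1} h k."""
    return mul(mul(inv(h), inv(k)), mul(h, k))

def bch_holds(g1, g2, n1, n2):
    """Check g1^n1 g2^n2 == g2^n2 g1^n1 [g1, g2]^(n1*n2) (the 2-step case of (2.1))."""
    left = mul(power(g1, n1), power(g2, n2))
    right = mul(mul(power(g2, n2), power(g1, n1)), power(comm(g1, g2), n1 * n2))
    return left == right
```

In groups of higher step, further iterated commutators appear with polynomial exponents of higher degree; this sketch does not cover those cases.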

If $G$ is a connected, simply connected nilpotent Lie group, then we denote the Lie algebra of $G$ as $\log G$ and let $\exp\colon\log G\to G$ denote the exponential map while $\log\colon G\to\log G$ is the inverse (the exponential map being a homeomorphism in this situation). When we refer to nilpotent Lie groups, they will henceforth be connected and simply connected. For $g\in G$ and $t\in\mathbb{R}$ , we define

g^{t}=\exp(t\log g).

The Baker–Campbell–Hausdorff formula also implies that

\exp(t_{1}\log g_{1}+t_{2}\log g_{2})=g_{1}^{t_{1}}g_{2}^{t_{2}}\prod_{a}g_{a}^{R_{a}(t_{1},t_{2})}

where $g_{a}$ ranges over all iterated commutators of $g_{1}$ and $g_{2}$ with at least $1$ copy of each and $R_{a}$ is a polynomial with rational coefficients satisfying identical degree constraints to $P_{a}$ . Finally we require the following, most standard version, of the Baker–Campbell–Hausdorff formula which states that if $X,Y\in\log G$ , then

\exp(X)\exp(Y)=\exp\Big{(}X+Y+\frac{1}{2}[X,Y]+\cdots\Big{)}

where the remaining terms in the expansion are iterated commutators in $X$ and $Y$ with all higher terms having at least one “copy” of $X$ and $Y$ within them. In particular, this implies that

(2.2)

\exp(-X)\exp(-Y)\exp(X)\exp(Y)=\exp\big{(}[X,Y]+\cdots\big{)}

where all higher order terms have at least one copy of $X$ and $Y$ in them and are $r$ -fold commutators with $r\geq 3$ . In all versions of Baker–Campbell–Hausdorff, it is important for us that nilpotency means these expressions are finite.
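In the $2$ -step case all higher order terms vanish, so (2.2) reads $\exp(-X)\exp(-Y)\exp(X)\exp(Y)=\exp([X,Y])$ exactly. The following sketch verifies this, together with $\exp(X)\exp(Y)=\exp(X+Y+\frac{1}{2}[X,Y])$ , in the Heisenberg Lie algebra of strictly upper triangular $3\times 3$ matrices. The choice of example and the helper names are ours. Exact rational arithmetic is used, and the exponential series terminates since $X^{3}=0$ .

```python
from fractions import Fraction

def lie(a, b, c):
    """Strictly upper triangular 3x3 matrix (an element of the Heisenberg Lie algebra)."""
    zero = Fraction(0)
    return [[zero, Fraction(a), Fraction(c)],
            [zero, zero, Fraction(b)],
            [zero, zero, zero]]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def add(p, q, scale=1):
    """Entrywise p + scale * q."""
    return [[p[i][j] + scale * q[i][j] for j in range(3)] for i in range(3)]

def neg(p):
    return [[-entry for entry in row] for row in p]

def bracket(x, y):
    """Lie bracket [X, Y] = XY - YX."""
    return add(matmul(x, y), matmul(y, x), scale=-1)

def exp(x):
    """Matrix exponential I + X + X^2/2, valid because X^3 = 0 here."""
    identity = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
    return add(add(identity, x), matmul(x, x), scale=Fraction(1, 2))

def group_commutator(x, y):
    """exp(-X) exp(-Y) exp(X) exp(Y)."""
    return matmul(matmul(exp(neg(x)), exp(neg(y))), matmul(exp(x), exp(y)))
```

For any two such matrices `X` and `Y`, the expression `group_commutator(X, Y) == exp(bracket(X, Y))` evaluates to `True`.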

2.2. Filtrations

We next require the notion of an ordering and an associated filtration (see [34, Definition 6.7]).

Definition 2.3.

An ordering $I=(I,\preceq,+,0)$ is a set $I$ with a distinguished element $0$ , binary operation $+\colon I\times I\to I$ , and a partial order $\preceq$ on $I$ such that

•

$+$ is associative and commutative with $0$ acting as an identity element;
•

$\preceq$ has $0$ as the minimal element;
•

For all $i,j,k\in I$ , if $i\preceq j$ then $i+k\preceq j+k$ ;
•

The initial segments $\{i\in I\colon i\preceq d\}$ are finite for all $d$ .

We define the following three orderings, with addition being the standard addition:

•

The degree ordering is given by the standard ordering on $\mathbb{N}$ , denoted $I=\mathbb{N}$ for short;
•

The degree-rank ordering is given by $\{(d,r)\in\mathbb{N}^{2}\colon 0\leq r\leq d\}$ with the ordering that $(d^{\prime},r^{\prime})\preceq(d,r)$ if $d^{\prime}<d$ or $d^{\prime}=d$ and $r^{\prime}\leq r$ , denoted $I=\mathrm{DR}$ for short;
•

The multidegree ordering is given by $\mathbb{N}^{k}$ with $(i_{1}^{\prime},\ldots,i_{k}^{\prime})\preceq(i_{1},\ldots,i_{k})$ when $i_{j}^{\prime}\leq i_{j}$ for all $1\leq j\leq k$ , denoted $I=\mathbb{N}^{k}$ for short.

An $I$ -filtration of $G$ is a collection of subgroups $G_{I}=(G_{i})_{i\in I}$ such that $G_{0}=G$ and:

•

(Nesting) If $i,j\in I$ are such that $i\preceq j$ then $G_{i}\geqslant G_{j}$ ;
•

(Commutator) For $i,j\in I$ , we have $[G_{i},G_{j}]\leqslant G_{i+j}$ .

We say that a filtered group $G$ has degree $\leq d$ (for $d\in I$ ) if $G_{i}$ is trivial for $i\not\preceq d$ . $G$ has degree $\subseteq J$ for a downset $J$ if $G_{i}$ is trivial whenever $i\notin J$ .

Note that the commutator condition implies nested subgroups are normal within each other. We next define degree, degree-rank, and multidegree filtrations.
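The three orderings of Definition 2.3 are straightforward to encode, which can help when checking index computations by hand. The sketch below is illustrative only, with function names of our choosing. It implements $\preceq$ for each ordering and tests the translation-invariance axiom on a finite range of indices.

```python
from itertools import product

def leq_degree(i, j):
    """Degree ordering on N."""
    return i <= j

def leq_degree_rank(a, b):
    """Degree-rank ordering: (d', r') <= (d, r) iff d' < d, or d' = d and r' <= r."""
    (d1, r1), (d2, r2) = a, b
    return d1 < d2 or (d1 == d2 and r1 <= r2)

def leq_multidegree(a, b):
    """Multidegree ordering on N^k: coordinatewise comparison."""
    return all(x <= y for x, y in zip(a, b))

def add_tuples(a, b):
    return tuple(x + y for x, y in zip(a, b))

def dr_elements(max_degree):
    """All pairs (d, r) with 0 <= r <= d <= max_degree."""
    return [(d, r) for d in range(max_degree + 1) for r in range(d + 1)]

def translation_monotone(elements, leq, add):
    """Check that i <= j implies i + k <= j + k over the given finite sample."""
    return all(leq(add(i, k), add(j, k))
               for i, j, k in product(elements, repeat=3) if leq(i, j))

def initial_segment(elements, leq, bound):
    """Elements of the sample lying below the given bound."""
    return [i for i in elements if leq(i, bound)]
```

Note that the degree-rank ordering is total, whereas the multidegree ordering is only partial: $(1,0)$ and $(0,1)$ are incomparable in $\mathbb{N}^{2}$ .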

Definition 2.4.

Given $d\in\mathbb{N}$ , we say a group $G$ is given a degree filtration of degree $d$ if:

•

$G$ is given a $\mathbb{N}$ -filtration $(G_{i})_{i\in\mathbb{N}}$ with degree $\leq d$ ;
•

$G_{0}=G_{1}$ .

Given $(d,r)\in\mathbb{N}^{2}$ with $0\leq r\leq d$ , $G$ is given a degree-rank filtration of degree-rank $(d,r)$ if:

•

$G$ is given a $\mathrm{DR}$ -filtration $(G_{i})_{i\in\mathrm{DR}}$ with degree $\leq(d,r)$ ;
•

$G_{(0,0)}=G_{(1,0)}$ and $G_{(i,0)}=G_{(i,1)}$ for $i\geq 1$ . (We also let $G_{(i,j)}=G_{(i+1,0)}$ for $j>i$ .)

The associated degree filtration with respect to this degree-rank filtration is $(G_{(i,0)})_{i\geq 0}$ .

Given $(d_{1},\ldots,d_{k})\in\mathbb{N}^{k}$ , $G$ is given a multidegree filtration of multidegree $J$ (where $J\subseteq\mathbb{N}^{k}$ is a downset) if:

•

$G$ is given a $\mathbb{N}^{k}$ -filtration $(G_{i})_{i\in\mathbb{N}^{k}}$ with degree $\subseteq J$ ;
•

$G_{\vec{0}}=\bigvee_{i=1}^{k}G_{\vec{e_{i}}}$ .

The associated degree filtration with respect to the multidegree filtration is $(\bigvee_{|\vec{i}|=i}G_{\vec{i}})_{i\geq 0}$ . Here $|\vec{i}|=i_{1}+\ldots+i_{k}$ .

Remark.

This definition imposes some additional equalities of subgroups in order to say a group is given a degree-rank filtration versus a $\mathrm{DR}$ -filtration (for example). In particular, the concept of “degree-rank” filtration and $\mathrm{DR}$ -filtration are distinct. The difference is minor, but causes a number of technical checks to be required, most notably in Appendix C. We will almost exclusively operate with these additional conditions; this is so that we can invoke equidistribution theory safely.

We now define polynomial sequences of an $I$ -filtered group. The notion of a polynomial sequence for a group $G$ given a degree-rank filtration will be the same as treating this ordering as a $\mathrm{DR}$ -filtration; the same applies for degree and multidegree filtrations.

Definition 2.5.

Given $g\colon H\to G$ a map between groups (not necessarily a homomorphism) and $h\in H$ , we define the derivative $\partial_{h}g\colon H\to G$ via $\partial_{h}g(n)=g(hn)g(n)^{-1}$ for all $n\in H$ . If $H,G$ are $I$ -filtered, we say that this map $g$ is polynomial if for all $m\geq 0$ and $i_{1},\ldots,i_{m}\in I$ , we have

\partial_{h_{1}}\cdots\partial_{h_{m}}g(n)\in G_{i_{1}+\cdots+i_{m}}

for all choices of $h_{j}\in H_{i_{j}}$ and $n\in H_{0}$ . The space of all polynomial maps with respect to this data is denoted $\operatorname{poly}(H_{I}\to G_{I})$ .
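To illustrate Definition 2.5 in a concrete case, take $H=\mathbb{Z}$ with $H_{0}=H_{1}=\mathbb{Z}$ and $H_{i}=\{0\}$ for $i\geq 2$ , and let $G$ be the integer Heisenberg group with $G_{0}=G_{1}=G$ , $G_{2}$ the center, and $G_{i}$ trivial for $i\geq 3$ . This example is ours and is chosen only for illustration. The condition then says that any $m$ derivatives of $g$ land in $G_{m}$ . The sketch below tests this on a finite window of shifts and points, so it can refute polynomiality but cannot prove it. Triples $(a,b,c)$ encode upper unitriangular matrices.

```python
from itertools import product

def mul(p, q):
    """Product in the integer Heisenberg group, elements encoded as (a, b, c)."""
    a, b, c = p
    a2, b2, c2 = q
    return (a + a2, b + b2, c + c2 + a * b2)

def inv(p):
    a, b, c = p
    return (-a, -b, -c + a * b)

def derivative(g, h):
    """The map n -> g(n + h) g(n)^{-1} (the derivative of Definition 2.5 for H = Z)."""
    return lambda n: mul(g(n + h), inv(g(n)))

def in_filtration(element, level):
    """Membership in G_level for the filtration G_0 = G_1 = G, G_2 = center, G_3 = {id}."""
    a, b, _ = element
    if level <= 1:
        return True
    if level == 2:
        return a == 0 and b == 0
    return element == (0, 0, 0)

def looks_polynomial(g, shifts=range(-2, 3), points=range(-3, 4), max_order=3):
    """Finite sanity check of the derivative condition up to the given order."""
    for order in range(1, max_order + 1):
        for hs in product(shifts, repeat=order):
            d = g
            for h in hs:
                d = derivative(d, h)
            if not all(in_filtration(d(n), order) for n in points):
                return False
    return True

def sample_sequence(n):
    """g(n) = a^n b^n with a = (1, 0, 0) and b = (0, 1, 0), which equals (n, n, n^2)."""
    return mul((n, 0, 0), (0, n, 0))
```

Here `looks_polynomial(sample_sequence)` returns `True`, while the maps $n\mapsto(n^{2},0,0)$ and $n\mapsto(0,0,n^{3})$ both fail the check: the former has second derivatives outside the center, and the latter has nontrivial third derivatives.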

We will require various general properties of polynomial sequences established in [34, Appendix B]. We will only consider $H=\mathbb{Z}^{k}$ for $k\geq 1$ and the following $I$ -filtrations on $H$ .

Definition 2.6.

We define the following filtrations on $H=\mathbb{Z}^{k}$ :

•

The (domain) degree filtration is with $I=\mathbb{N}$ the degree ordering and $H_{0}=H_{1}=\mathbb{Z}^{k}$ , and $H_{i}=\{0\}$ for $i\geq 2$ ;
•

The (domain) multidegree filtration is with $I=\mathbb{N}^{k}$ the multidegree ordering, $H_{\vec{0}}=\mathbb{Z}^{k}$ , $H_{\vec{e}_{i}}=\mathbb{Z}\vec{e}_{i}$ for $i\in[k]$ , and $H_{\vec{v}}=\{0\}$ otherwise, where $\vec{e}_{i}$ forms the standard basis of $\mathbb{Z}^{k}$ ;
•

The (domain) degree-rank filtration is with $I=\mathrm{DR}$ the degree-rank ordering and $H_{(0,0)}=H_{(1,0)}=\mathbb{Z}^{k}$ and $H_{(d,r)}=\{0\}$ otherwise.

We now define the notion of a nilmanifold, which is essentially a compact quotient of a filtered nilpotent Lie group.

Definition 2.7.

We define an $I$ -filtered nilmanifold $G/\Gamma$ to be the data of a connected, simply connected nilpotent Lie group $G$ with $I$ -filtration (of Lie subgroups) and discrete cocompact subgroup $\Gamma\leqslant G$ which is rational with respect to $G_{I}$ (i.e., $\Gamma_{i}:=\Gamma\cap G_{i}$ is cocompact in $G_{i}$ for all $i\in I$ ). We say it has degree $\leq d$ or $\subseteq J$ if $G$ has degree $\leq d$ or $\subseteq J$ .

If $I=\mathbb{N}$ and the $I$ -filtration is furthermore a degree filtration with degree $\leq d$ , then $G/\Gamma$ is a degree $d$ nilmanifold. If $I=\mathrm{DR}$ and the $I$ -filtration is furthermore a degree-rank filtration with degree $\leq(d,r)$ , then $G/\Gamma$ is a degree-rank $(d,r)$ nilmanifold. Finally if $I=\mathbb{N}^{k}$ and the $I$ -filtration is furthermore a multidegree filtration with degree $\subseteq J$ , then $G/\Gamma$ is a multidegree $J$ nilmanifold.

Remark.

Note that $\Gamma$ can naturally be given the structure of an $I$ -filtered group $\Gamma_{I}$ .

We finally (very occasionally) will require the lower central series of a group $G$ .

Definition 2.8.

Given a nilpotent group $G$ , define the lower central series inductively via $G_{(0)}=G_{(1)}=G$ and $G_{(i+1)}=[G,G_{(i)}]$ . The step of $G$ is the minimal $j$ such that $G_{(j+1)}=\mathrm{Id}_{G}$ .

2.3. Horizontal tori and Taylor coefficients

The next notion, that of a horizontal character, plays a vital role when discussing the equidistribution of nilsequences.

Definition 2.9.

Given a connected, simply connected nilpotent group $G$ and a discrete, cocompact subgroup $\Gamma$ , a horizontal character $\eta$ is a continuous homomorphism $\eta\colon G\to\mathbb{R}$ such that $\eta(\Gamma)\subseteq\mathbb{Z}$ . We say a horizontal character is nontrivial when $\eta$ is not identically zero.

Remark.

Throughout the literature on nilmanifolds, horizontal characters are continuous homomorphisms $\eta\colon G\to\mathbb{R}/\mathbb{Z}$ such that $\eta$ annihilates $\Gamma$ . It is straightforward to prove (using Mal’cev bases) that these two notions are identical up to reduction modulo $1$ . The reason we operate with the above definition is that the kernel of $\eta$ as defined is then a subspace of $G/[G,G]\simeq\mathbb{R}^{\dim(G)-\dim([G,G])}$ .

We next require the notion of horizontal tori with respect to a degree-rank filtration. These tori will play a starring role in Sections 8, 9, and 10; our definition is exactly that of [34, Definition 9.6].

Definition 2.10.

Let $G$ be a degree-rank filtered nilpotent Lie group with filtration $G_{\mathrm{DR}}=(G_{(d,r)})_{(d,r)\in\mathrm{DR}}$ . Given a subgroup $\Gamma$ of $G$ , we define various horizontal tori for $i\geq 1$ as

\begin{aligned}\operatorname{Horiz}_{i}(G)&:=G_{(i,1)}/G_{(i,2)},\\ \operatorname{Horiz}_{i}(\Gamma)&:=(\Gamma\cap G_{(i,1)})/(\Gamma\cap G_{(i,2)}),\\ \operatorname{Horiz}_{i}(G/\Gamma)&:=\operatorname{Horiz}_{i}(G)/\operatorname{Horiz}_{i}(\Gamma).\end{aligned}

Given a polynomial sequence $g\in\operatorname{poly}(\mathbb{Z}_{\mathrm{DR}}\to G_{\mathrm{DR}})$ we define the $i$ -th horizontal Taylor coefficient to be

\begin{aligned}\operatorname{Taylor}_{i}(g)&:=\partial_{1}\cdots\partial_{1}g(n)~\mathrm{mod}~G_{(i,2)}\in\operatorname{Horiz}_{i}(G),\\ \operatorname{Taylor}_{i}(g\Gamma)&:=\operatorname{Taylor}_{i}(g)~\mathrm{mod}~\operatorname{Horiz}_{i}(\Gamma)\in\operatorname{Horiz}_{i}(G/\Gamma),\end{aligned}

where we take $i$ iterated derivatives.

We also require the notion of $i$ -th horizontal characters.

Definition 2.11.

Consider a nilmanifold $G/\Gamma$ with a degree-rank filtration. A continuous homomorphism $\eta\colon G_{(i,1)}\to\mathbb{R}$ is an $i$ -th horizontal character if $\eta(G_{(i,2)})=0$ and $\eta(G_{(i,1)}\cap\Gamma)\subseteq\mathbb{Z}$ .

The name Taylor coefficient is also used in the context of Taylor coefficients of polynomial factorizations. The following elementary lemma relates these two notions; we remark that a very closely related proof appears in [26, Lemma A.8].

Lemma 2.12.

Let $G$ be given a degree-rank filtration of degree-rank $(d,r)$ and consider a sequence $g\in\operatorname{poly}(\mathbb{Z}_{\mathrm{DR}}\to G_{\mathrm{DR}})$ . Then we may write $g(n)=\prod_{i=0}^{d}g_{i}^{\binom{n}{i}}$ for elements $g_{i}\in G_{(i,0)}$ and for $1\leq i\leq d$ we have

\operatorname{Taylor}_{i}(g)=g_{i}~{}\mathrm{mod}~{}G_{(i,2)}.

Proof.

The representation of $g(n)$ in the specified product form follows immediately from the existence of Taylor expansion, see [34, Lemma B.9].

We next prove $\operatorname{Taylor}_{j}(g)=g_{j}~{}\mathrm{mod}~{}G_{(j,2)}$ for each $1\leq j\leq d$ individually. Notice that it suffices to consider $\widetilde{g}(n)$ which is $g(n)~{}\mathrm{mod}~{}G_{(j,2)}$ , i.e., we consider the group $G/G_{(j,2)}$ with quotiented filtration. This group is easily seen to be at most $j$ -step nilpotent and furthermore $[G_{(i,1)}/G_{(j,2)},G_{(j-i,1)}/G_{(j,2)}]=\mathrm{Id}_{G/G_{(j,2)}}$ for $0\leq i\leq j$ (one should check the cases $j=1$ and $i\in\{0,j\}$ manually). Let $\widetilde{G}_{i}=G_{(i,1)}/G_{(j,2)}$ for $0\leq i\leq j$ and note $\widetilde{G}_{0}=\widetilde{G}_{1}$ .

We see that $\widetilde{G}_{0}\geqslant\cdots\geqslant\widetilde{G}_{j}$ is an $\mathbb{N}$ -filtration for $\widetilde{G}_{0}$ with $[\widetilde{G}_{i},\widetilde{G}_{j-i}]=\mathrm{Id}_{\widetilde{G}_{0}}$ for all $0\leq i\leq j$ . Note that $\widetilde{g}(n)=\prod_{i=0}^{j}\widetilde{g}_{i}^{\binom{n}{i}}$ where $\widetilde{g}_{i}$ is $g_{i}~{}\mathrm{mod}~{}G_{(j,2)}$ .

It suffices to prove the claim that $\widetilde{g}(n+1)\widetilde{g}(n)^{-1}=\prod_{i=0}^{j-1}(\widetilde{g}_{i}^{\prime})^{\binom{n}{i}}$ with $\widetilde{g}_{i}^{\prime}\in\widetilde{G}_{i+1}$ and $\widetilde{g}_{j-1}^{\prime}=\widetilde{g}_{j}$ . If this is the case, then we may modify the filtration $\widetilde{G}_{0}\geqslant\widetilde{G}_{1}\geqslant\cdots\geqslant\widetilde{G}_{j}$ by stripping off the top group, which maintains the necessary inductive properties. Iterating this procedure $j$ times we obtain the desired Taylor equality.

This claim is a consequence of the Taylor expansion for general polynomial sequences and the Baker–Campbell–Hausdorff formula and counting the depths of nested commutators. The crucial reason that $\widetilde{g}_{j-1}^{\prime}=\widetilde{g}_{j}$ is that any “higher order” terms which arise in the Baker–Campbell–Hausdorff formula and could contribute are in fact annihilated due to $[\widetilde{G}_{i},\widetilde{G}_{j-i}]=\mathrm{Id}_{\widetilde{G}_{0}}$ for $0\leq i\leq j$ . ∎
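As a sanity check on the statement of Lemma 2.12, the following minimal sketch treats only the toy abelian case $G=\mathbb{R}$ written additively, where the derivative $g(n+1)g(n)^{-1}$ becomes the forward difference $g(n+1)-g(n)$. It does not model the quotient by $G_{(i,2)}$ or any noncommutativity. The coefficients and the helper names `binomial_polynomial` and `iterated_difference` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from math import comb


def binomial_polynomial(alphas):
    """Return the map n -> sum_i alphas[i] * binom(n, i) for integers n >= 0."""
    return lambda n: sum(a * comb(n, i) for i, a in enumerate(alphas))


def iterated_difference(g, i):
    """Apply the additive derivative n -> g(n + 1) - g(n) a total of i times."""
    for _ in range(i):
        g = (lambda f: lambda n: f(n + 1) - f(n))(g)
    return g


# Hypothetical coefficients alpha_0, ..., alpha_3, chosen only for illustration.
alphas = [Fraction(1, 3), Fraction(-2, 5), Fraction(7, 2), Fraction(5, 11)]
g = binomial_polynomial(alphas)
d = len(alphas) - 1

# The d-th difference is constant in n and equals alpha_d.
top_values = [iterated_difference(g, d)(n) for n in range(6)]

# For i < d the i-th difference still depends on n, but its value at 0 is alpha_i.
values_at_zero = [iterated_difference(g, i)(0) for i in range(d + 1)]
```

In this abelian model the top coefficient is recovered exactly by $d$ iterated differences, which is the additive shadow of the identity $\operatorname{Taylor}_{i}(g)=g_{i}~\mathrm{mod}~G_{(i,2)}$.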

We also have the following linearity of the $i$ -th Taylor coefficients.

Lemma 2.13.

Assume the setup of Lemma 2.12. We have

\operatorname{Taylor}_{i}(gh)=\operatorname{Taylor}_{i}(g)+\operatorname{Taylor}_{i}(h)

and if

g(n)=\exp\bigg{(}\sum_{i=0}^{d}g_{i}\binom{n}{i}\bigg{)}

for $g_{i}\in\log(G_{i})$ we have

\operatorname{Taylor}_{i}(g)=\exp(g_{i})~{}\mathrm{mod}~{}G_{(i,2)}.

Remark 2.14.

Note that $G_{(i,1)}/G_{(i,2)}$ is abelian and hence additive notation may be used when considering Taylor coefficients.

Proof.

The first claim follows from Lemma 2.12, the Baker–Campbell–Hausdorff formula, and the commutator relationship that $[G_{(i,0)},G_{(j-i,0)}]=[G_{(i,1)},G_{(j-i,1)}]\subseteq G_{(j,2)}$ . (Note that this is using $G_{(0,0)}=G_{(0,1)}=G_{(1,0)}$ in the case $i=0$ .)

For the second claim, suppose that

g(n)=\exp\bigg{(}\sum_{i=0}^{d}g_{i}\binom{n}{i}\bigg{)}=\prod_{i=0}^{d}(g_{i}^{\prime})^{\binom{n}{i}}.

Via iterated applications of the Baker–Campbell–Hausdorff formula and the commutator relationship that $[G_{(i,0)},G_{(j-i,0)}]=[G_{(i,1)},G_{(j-i,1)}]\subseteq G_{(j,2)}$ , we see that $g_{j}^{\prime}=\exp(g_{j})~{}\mathrm{mod}~{}G_{(j,2)}$ and the result follows. ∎

2.4. Vertical tori and nilcharacters

Given a polynomial sequence $g$ on an $I$ -filtered Lie group with $I=\mathbb{N}$ , we can define a sequence of vectors by considering a smooth vector-valued function $F$ on $G/\Gamma$ and looking at $F(g(n)\Gamma)$ . However, we will be particularly interested in those which “have a Fourier coefficient” with respect to various subgroups of the center.

Definition 2.15.

Consider a nilmanifold $G/\Gamma$ and a function $F\colon G/\Gamma\to\mathbb{C}$ . Given a connected, simply connected subgroup $T$ of the center $Z(G)$ which is rational (i.e., $\Gamma\cap T$ is cocompact in $T$ ) and a continuous homomorphism $\eta\colon T\to\mathbb{R}$ such that $\eta(T\cap\Gamma)\subseteq\mathbb{Z}$ , if

F(gx)=e(\eta(g))F(x)\text{ for all }g\in T

we say that $F$ has a $T$ -vertical character (or $T$ -vertical frequency) $\eta$ .

Remark.

Note that $T/(\Gamma\cap T)$ is isomorphic to a torus and thus one can modify functions under consideration to have vertical characters via appropriate Fourier decomposition.

A particular case which will arise frequently in our applications comes from the fact that given a filtration satisfying the conditions of Definition 2.4, we have that the “bottom group” is contained in the center. For example, a group $G$ given a degree filtration of degree $d$ satisfies $[G,G_{d}]=[G_{1},G_{d}]=\mathrm{Id}_{G}$ hence $G_{d}\leqslant Z(G)$ . One special class of functions with a vertical frequency which will be of particular importance is that of nilcharacters.

Definition 2.16.

A nilcharacter of degree $d$ and output dimension $D$ is the following data. Consider an $I$ -filtered nilmanifold $G/\Gamma$ of degree $d$ such that $[G,G_{d}]=\mathrm{Id}_{G}$ and an $I$ -filtered abelian group $H$ . Let $g\in\operatorname{poly}(H_{I}\to G_{I})$ and consider a function $F\colon G/\Gamma\to\mathbb{C}^{D}$ such that:

•

$\lVert F(x)\rVert_{2}=1$ for all $x\in G/\Gamma$ pointwise;
•

$F(g_{d}x)=e(\eta(g_{d}))F(x)$ for all $g_{d}\in G_{d}$ where $\eta$ is some continuous homomorphism $G_{d}\to\mathbb{R}$ such that $\eta(\Gamma\cap G_{d})\subseteq\mathbb{Z}$ .

The values of the nilcharacter are given by $\chi\colon H\to\mathbb{C}^{D}$ where $\chi(n)=F(g(n)\Gamma)$ for $n\in H$ .

Remark.

We work with vector-valued nilcharacters for precisely the same topological reason given in [34, p. 1254].

2.5. Additional miscellaneous conventions

We end this section with a brief discussion of various miscellaneous conventions. Throughout the paper we use $\{\cdot\}$ to denote the map $\mathbb{R}\to(-1/2,1/2]$ (or $\mathbb{R}/\mathbb{Z}\to(-1/2,1/2]$ , abusively) which takes the representative $~{}\mathrm{mod}~{}1$ closest to $0$ . Furthermore given $x\in\mathbb{R}/\mathbb{Z}$ and $y\in\mathbb{R}$ we will treat $x-y\in\mathbb{R}/\mathbb{Z}$ in the obvious manner. As used above, we let $e\colon\mathbb{R}/\mathbb{Z}\to\mathbb{C}$ denote the exponential function $e(x)=\exp(2\pi ix)$ , which is lifted to $\mathbb{R}$ in the obvious manner.
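The two conventions above can be sketched in a few lines. The function names `centered_frac` and `e` are our own labels for $\{\cdot\}$ and $e(\cdot)$; exact rationals are used for $\{\cdot\}$ so that the boundary case $1/2$ is handled without rounding error.

```python
import cmath
import math
from fractions import Fraction


def centered_frac(x):
    """Representative of x mod 1 lying in the half-open interval (-1/2, 1/2]."""
    return x - math.ceil(x - Fraction(1, 2))


def e(x):
    """The character e(x) = exp(2 * pi * i * x), which depends only on x mod 1."""
    return cmath.exp(2j * math.pi * float(x))
```

Note that $-1/2$ and $1/2$ both map to $1/2$, matching the choice of the interval $(-1/2,1/2]$.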

We use standard asymptotic notation. Given functions $f=f(n)$ and $g=g(n)$ , we write $f=O(g)$ , $f\ll g$ , $g=\Omega(f)$ , or $g\gg f$ to mean that there is a constant $C$ such that $|f(n)|\leq Cg(n)$ for sufficiently large $n$ . We write $f\asymp g$ or $f=\Theta(g)$ to mean that $f\ll g$ and $g\ll f$ , and write $f=o(g)$ or $g=\omega(f)$ to mean $f(n)/g(n)\to 0$ as $n\to\infty$ . Subscripts indicate dependence on parameters.

Finally in various arguments throughout the paper it will be convenient to denote appropriately bounded functions as $b(n)$ or $b(n_{1},\ldots,n_{k})$ , and $B(n),B(n_{1},\ldots,n_{k})$ when vector-valued. When using such notation, the functions $b,B$ may change from line to line and within a line may refer to different functions.

3. Various complexity notions

3.1. Rationality of bases and Lipschitz norms

We will now discuss the definitions chosen for complexity of nilmanifolds. We start by defining first- and second-kind coordinates given a basis $\mathcal{X}$ for $\log G$ .

Definition 3.1.

Consider a connected, simply connected nilpotent Lie group $G$ of dimension $d$ . Given a basis $\mathcal{X}=\{X_{1},\ldots,X_{d}\}$ of $\log G$ and $g\in G$ , there exists $(t_{1},\ldots,t_{d})\in\mathbb{R}^{d}$ such that

g=\exp(t_{1}X_{1}+t_{2}X_{2}+\cdots+t_{d}X_{d}).

We define Mal’cev coordinates of first-kind $\psi_{\exp}=\psi_{\exp,\mathcal{X}}\colon G\to\mathbb{R}^{d}$ for $g$ relative to $\mathcal{X}$ by

\psi_{\exp}(g):=(t_{1},\ldots,t_{d}).

Given $g\in G$ there also exists $(u_{1},\ldots,u_{d})\in\mathbb{R}^{d}$ such that

g=\exp(u_{1}X_{1})\cdots\exp(u_{d}X_{d}),

and we define the Mal’cev coordinates of second-kind $\psi=\psi_{\mathcal{X}}\colon G\to\mathbb{R}^{d}$ for $g$ relative to $\mathcal{X}$ by

\psi(g):=(u_{1},\ldots,u_{d}).
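To make the two coordinate systems concrete, here is a minimal sketch for the three-dimensional Heisenberg group of upper unitriangular matrices. The basis $X_{1}=E_{12}$, $X_{2}=E_{23}$, $X_{3}=E_{13}$ is our assumption for illustration and is not taken from the text, and all helper names are hypothetical. The sketch computes both coordinate vectors in closed form and checks them by rebuilding the matrix.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def heis(x, y, z):
    """The matrix [[1, x, z], [0, 1, y], [0, 0, 1]] with exact rational entries."""
    x, y, z = Fraction(x), Fraction(y), Fraction(z)
    one, zero = Fraction(1), Fraction(0)
    return [[one, x, z], [zero, one, y], [zero, zero, one]]


def exp_first_kind(t1, t2, t3):
    """exp(t1*X1 + t2*X2 + t3*X3) = I + A + A^2/2, valid because A^3 = 0."""
    A = heis(t1, t2, t3)
    for i in range(3):
        A[i][i] = Fraction(0)  # strip the identity to get the Lie algebra element
    A2 = matmul(A, A)
    I = heis(0, 0, 0)
    return [[I[i][j] + A[i][j] + A2[i][j] / 2 for j in range(3)] for i in range(3)]


def exp_second_kind(u1, u2, u3):
    """exp(u1*X1) * exp(u2*X2) * exp(u3*X3)."""
    return matmul(matmul(heis(u1, 0, 0), heis(0, u2, 0)), heis(0, 0, u3))


def first_kind(x, y, z):
    """psi_exp of heis(x, y, z)."""
    x, y, z = Fraction(x), Fraction(y), Fraction(z)
    return (x, y, z - x * y / 2)


def second_kind(x, y, z):
    """psi of heis(x, y, z)."""
    x, y, z = Fraction(x), Fraction(y), Fraction(z)
    return (x, y, z - x * y)
```

For integer matrices the second-kind coordinates are integers, while the first-kind coordinates may be half-integers; for instance `first_kind(1, 1, 0)` has last entry $-1/2$. This is why first-kind coordinates of a lattice are in general only sandwiched between two scaled copies of $\mathbb{Z}^{d}$.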

Note that the above definition does not account for the cocompact subgroup $\Gamma$ . The next set of definitions account for how “rational” $\mathcal{X}$ is with respect to itself and $\Gamma$ .

Definition 3.2.

The height of a number $x$ is $\max(|a|,|b|)$ if $x=a/b$ with $\gcd(a,b)=1$ and $\infty$ if $x$ is irrational.
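For rational inputs the height is straightforward to compute; the sketch below uses Python's exact `Fraction` type, which stores numbers in lowest terms. The function name `height` is our own, and irrational inputs (height $\infty$) are outside its scope.

```python
from fractions import Fraction


def height(x):
    """Height max(|a|, |b|) of the rational x = a/b written in lowest terms."""
    x = Fraction(x)  # Fraction normalizes so that gcd(a, b) = 1 and b > 0
    return max(abs(x.numerator), abs(x.denominator))
```

Under this definition $0=0/1$ has height $1$, and $-6/4=-3/2$ has height $3$.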

Definition 3.3.

Given a nilmanifold $G/\Gamma$ of dimension $d$ , consider a basis $\mathcal{X}=\{X_{1},\ldots,X_{d}\}$ of $\mathfrak{g}=\log G$ . $\mathcal{X}$ is said to be a weak basis of rationality $Q$ with respect to $\Gamma$ if:

•

There exist rationals $c_{ijk}$ of height at most $Q$ such that

$[X_{i},X_{j}]=\sum_{k}c_{ijk}X_{k};$
•

There exists an integer $1\leq q\leq Q$ such that

$q\cdot\mathbb{Z}^{d}\subseteq\psi_{\mathrm{exp},\mathcal{X}}(\Gamma)\subseteq q^{-1}\cdot\mathbb{Z}^{d}.$

$\mathcal{X}$ is a Mal’cev basis of $\log G$ with respect to $\Gamma$ of rationality $Q$ if:

•

There exist rationals $c_{ijk}$ of height at most $Q$ such that

$[X_{i},X_{j}]=\sum_{k}c_{ijk}X_{k};$
•

$\psi_{\mathcal{X}}(\Gamma)=\mathbb{Z}^{d}$ .

We say that $\mathcal{X}$ has the degree $k$ nesting property if there exist $\ell_{1}\leq\cdots\leq\ell_{k}$ such that if $\mathfrak{g}_{t}=\operatorname{span}_{\mathbb{R}}(X_{\ell_{t}+1},\ldots,X_{d})$ then $[\mathfrak{g},\mathfrak{g}]\subseteq\mathfrak{g}_{1}$ , $[\mathfrak{g},\mathfrak{g}_{t}]\subseteq\mathfrak{g}_{t+1}$ for $1\leq t<k$ , and $[\mathfrak{g},\mathfrak{g}_{k}]=0$ .

Finally we say that a Mal’cev basis is adapted to a sequence of nested subgroups $G=G_{0}\geqslant G_{1}\geqslant G_{2}\geqslant\cdots\geqslant G_{\ell}\geqslant\mathrm{Id}_{G}$ if

\operatorname{span}_{\mathbb{R}}(\{X_{j}\colon d-\dim(G_{i})<j\leq d\})=\log G_{i}

for $1\leq i\leq\ell$ .

We next state the definition of the Lipschitz property for a function on $G/\Gamma$ .

Definition 3.4.

We define a metric $d=d_{G,\mathcal{X}}$ on $G$ by

d(x,y):=\inf\bigg{\{}\sum_{i=1}^{n}\min(\lVert\psi(x_{i}x_{i+1}^{-1})\rVert,\lVert\psi(x_{i+1}x_{i}^{-1})\rVert)\colon n\in\mathbb{N},x_{1},\ldots,x_{n+1}\in G,x_{1}=x,x_{n+1}=y\bigg{\}},

where $\lVert\cdot\rVert$ denotes the $\ell^{\infty}$ -norm on $\mathbb{R}^{\dim(G)}$ , and define a metric on $G/\Gamma$ by

d(x\Gamma,y\Gamma)=\inf_{\gamma,\gamma^{\prime}\in\Gamma}d(x\gamma,y\gamma^{\prime}).

Furthermore, for any function $F\colon G/\Gamma\to\mathbb{C}$ we define

\lVert F\rVert_{\mathrm{Lip}}:=\lVert F\rVert_{\infty}+\sup_{x\neq y\in G/\Gamma}\frac{|F(x)-F(y)|}{d(x,y)}.

Given a function $F\colon G/\Gamma\to\mathbb{C}^{D}$ such that $F=(F_{1},\ldots,F_{D})$ we define

\lVert F\rVert_{\mathrm{Lip}}:=\max_{1\leq i\leq D}\lVert F_{i}\rVert_{\mathrm{Lip}}.

Remark.

Note that the metric on $G$ is right-invariant. We may omit the subscript $\mathcal{X}$ for the distance function when clear from context.

3.2. Complexity of nilmanifolds

We now define the complexity of a nilmanifold with respect to either a degree or a degree-rank filtration.

Definition 3.5.

Let $s\geq 1$ be an integer and let $M\geq 1$ . A nilmanifold $G/\Gamma$ of degree $s$ , dimension $d$ , and complexity at most $M$ consists of a degree $s$ filtration of $G$ along with a Mal’cev basis $\mathcal{X}=\{X_{1},\ldots,X_{d}\}$ of $\log G$ which satisfies the following:

•

$\{X_{1},\ldots,X_{d}\}$ is a Mal’cev basis for $\log G$ with respect to $\Gamma$ of rationality at most $M$ ;
•

$\mathcal{X}$ is adapted to the sequence of subgroups $(G_{i})_{i\in\mathbb{N}}$ .

Analogously a nilmanifold $G/\Gamma$ of degree-rank $(s,r)$ , dimension $d$ , and complexity at most $M$ consists of a degree-rank $(s,r)$ filtration of $G$ along with a Mal’cev basis $\mathcal{X}=\{X_{1},\ldots,X_{d}\}$ of $\log G$ which satisfies the following:

•

$\{X_{1},\ldots,X_{d}\}$ is a Mal’cev basis for $\log G$ with respect to $\Gamma$ of rationality at most $M$ ;
•

$\mathcal{X}$ is adapted to the sequence of subgroups $(G_{i})_{i\in\mathrm{DR}}$ .

Remark.

The only difference in complexity for a degree versus degree-rank filtration is that we require the Mal’cev basis to be adapted with respect to the appropriate filtration. This definition unfortunately does not extend to the case of multidegree filtrations since the subgroups are not nested in a total order. Furthermore note that a degree-rank nilmanifold of complexity $M$ is also a degree nilmanifold of the same complexity by taking the associated degree filtration.

Finally, whenever discussing the complexity of nilmanifolds, this is always with respect to a given Mal’cev basis $\mathcal{X}$ . We will abusively write phrases such as “nilmanifold $G/\Gamma$ of complexity $M$ ” throughout the paper; such a statement should always be understood with a corresponding implicitly provided adapted Mal’cev basis of the Lie algebra.

Remark.

We will also in passing require the notion of a degree $0$ nilmanifold. A degree $0$ nilmanifold is simply the trivial group $\mathrm{Id}_{G}$ . All scalar-valued functions on degree $0$ nilmanifolds are constants and the Lipschitz norm is defined to be the absolute value of this constant.

We will next need the notion of a rational subgroup with respect to a Mal’cev basis; this will be crucial when giving the definition of complexity with respect to a multidegree filtration.

Definition 3.6.

A closed, connected subgroup $G^{\prime}\leqslant G$ is $Q$ -rational with respect to a basis $\mathcal{X}=\{X_{1},\ldots,X_{m}\}$ of $\log G$ if $\log G^{\prime}$ has a basis $\mathcal{X}^{\prime}=\{X_{1}^{\prime},\ldots,X_{m^{\prime}}^{\prime}\}$ where $X_{i}^{\prime}=\sum_{j=1}^{m}c_{ij}X_{j}$ for $1\leq i\leq m^{\prime}$ with $c_{ij}\in\mathbb{Q}$ having heights bounded by $Q$ .

We will repeatedly use the following fact about rational subgroups without further comment.

Fact 3.7.

Suppose $G$ is a connected, simply connected nilpotent Lie group of step $s$ and dimension $d$ with a discrete cocompact subgroup $\Gamma$ . Suppose that $G/\Gamma$ has a weak basis $\mathcal{X}$ of rationality at most $Q$ . Let $H_{1},\ldots,H_{j}$ be subgroups which are each $Q$ -rational and normal in $G$ . Then

H=\bigvee_{i=1}^{j}H_{i}

is an $O_{s}(Q^{O_{s}(d^{O_{s}(1)})})$ -rational subgroup.

Proof.

Let $\mathcal{X}^{i}$ denote the underlying basis of $H_{i}$ witnessing low height. By applying Baker–Campbell–Hausdorff, we have that $\log H$ is spanned by taking all $(\leq s)$ -fold commutators of elements in $\mathcal{X}^{i}$ (possibly for different $i$ ). Each such element of the Lie algebra is easily seen to be an $O_{s}(Q^{O_{s}(d^{O_{s}(1)})})$ -rational combination of $\mathcal{X}$ (using the weak basis property of $\mathcal{X}$ ). Taking a subset of these commutators which forms a basis of $\log H$ gives the desired result. ∎

We are now in position to define the complexity of a multidegree nilsequence. This definition is admittedly rather artificial but is designed to be the most flexible given various lemmas scattered throughout the literature.

Definition 3.8.

Consider a downset $J$ with respect to the multidegree ordering on $\mathbb{N}^{k}$ . Consider a group $G$ with a multidegree filtration of degree $\subseteq J$ . Recall the associated degree filtration

G_{i}=\bigvee_{\vec{v}:|\vec{v}|=i}G_{\vec{v}}

and define the associated degree to be $\sup_{\vec{v}\in J}|\vec{v}|$ . We say a multidegree $J$ nilmanifold $G/\Gamma$ of dimension $d$ with Mal’cev basis $\mathcal{X}$ has complexity at most $M$ if:

•

$\{X_{1},\ldots,X_{d}\}$ is a Mal’cev basis for $\log G$ with respect to $\Gamma$ of rationality at most $M$ ;
•

$\mathcal{X}$ is adapted to the sequence of subgroups $(G_{i})_{i\in\mathbb{N}}$ ;
•

$G_{\vec{v}}$ is an $M$ -rational subgroup for all $\vec{v}\in\mathbb{N}^{k}$ .

We next note the trivial fact that complexity is bounded appropriately with respect to taking direct products; we implicitly invoke this when handling the complexity of direct products.

Fact 3.9.

Consider nilmanifolds $G/\Gamma$ , $H/\Gamma^{\prime}$ given degree $s$ filtrations $(G_{i})_{i\geq 0}$ , $(H_{i})_{i\geq 0}$ and adapted Mal’cev bases $\mathcal{X},\mathcal{X}^{\prime}$ each of complexity at most $M$ . Then $(G\times H)/(\Gamma\times\Gamma^{\prime})$ has complexity at most $M$ with respect to the Mal’cev basis

\mathcal{X}^{\ast}=\{(X,0)\colon X\in\mathcal{X}\}\cup\{(0,X^{\prime})\colon X^{\prime}\in\mathcal{X}^{\prime}\}.

$\mathcal{X}^{\ast}$ may be adapted to the degree $s$ filtration $G_{i}\times H_{i}$ by creating an ordering with suffixes

\displaystyle\big{\{}(X_{j},0)\colon X_{j}\in\mathcal{X},0\leq\dim(G)-j<\dim(G_{i})\big{\}}\cup\big{\{}(0,X_{j}^{\prime})\colon X_{j}^{\prime}\in\mathcal{X}^{\prime},0\leq\dim(H)-j<\dim(H_{i})\big{\}}.

Furthermore given $F\colon G/\Gamma\to\mathbb{C}$ and $F^{\prime}\colon H/\Gamma^{\prime}\to\mathbb{C}$ which are $M$ -Lipschitz,

\widetilde{F}((g,h)(\Gamma\times\Gamma^{\prime})):=F(g\Gamma)F^{\prime}(h\Gamma^{\prime})

is $3M^{2}$ -Lipschitz on $(G\times H)/(\Gamma\times\Gamma^{\prime})$ . Analogous statements hold for degree-rank filtrations and multidegree filtrations.

We finally end by noting that quotients by normal subgroups of bounded rationality have appropriate complexity.

Lemma 3.10.

Consider a nilmanifold $G/\Gamma$ with $G$ given a degree $s$ filtration $(G_{i})$ and of complexity at most $M$ with respect to an adapted Mal’cev basis $\mathcal{X}$ .

Suppose that $H$ is a normal subgroup of $G$ which is $M$ -rational with respect to $\mathcal{X}$ . Then the quotient nilmanifold $(G/H)/(\Gamma/(\Gamma\cap H))$ may be given an adapted Mal’cev basis $\mathcal{X}^{\ast}$ , where the degree $s$ filtration is $(G_{i}/(G_{i}\cap H))$ , which is an $M^{O_{s}(d^{O_{s}(1)})}$ -rational combination of

\mathcal{X}^{\prime}=\{X~{}\mathrm{mod}~{}\log H\colon X\in\mathcal{X}\}.

Analogous statements hold for degree-rank filtrations and multidegree filtrations. Finally if $H\leqslant Z(G)$ and $F$ is an $M$ -Lipschitz function on $G/\Gamma$ which is $H$ -invariant then $F$ descends to $(G/H)/(\Gamma/(\Gamma\cap H))$ and is $M^{O_{s}(d^{O_{s}(1)})}$ -Lipschitz with respect to $\mathcal{X}^{\ast}$ .

Proof.

We may find a subset $S$ such that

\mathcal{X}^{\prime}=\{X_{i}~{}\mathrm{mod}~{}\log H\colon X_{i}\in\mathcal{X},i\in S\}

is a basis for $\log(G/H)$ . Since $H$ is $M$ -rational with respect to $\mathcal{X}$ , it follows from Cramer’s rule that for $j\not\in S$ , $X_{j}~{}\mathrm{mod}~{}\log H$ is an $M^{O_{s}(d^{O_{s}(1)})}$ -rational combination of $X_{i}~{}\mathrm{mod}~{}\log H$ with $i\in S$ . Hence, $\mathcal{X}^{\prime}$ is a weak Mal’cev basis for $(G/H)/(\Gamma/(\Gamma\cap H))$ of rationality $M^{O_{s}(d^{O_{s}(1)})}$ . By [42, Lemma B.11], we may find a Mal’cev basis adapted to $(G/H)/(\Gamma/(\Gamma\cap H))$ with complexity $M^{O_{s}(d^{O_{s}(1)})}$ . Now, if $H\leqslant Z(G)$ and $F$ is an $M$ -Lipschitz function on $G/\Gamma$ which is $H$ -invariant, it follows trivially that $F$ descends to a function $\overline{F}$ on $(G/H)/(\Gamma/(\Gamma\cap H))$ . The Lipschitz bounds for $\overline{F}$ follow from [42, Lemma B.3]. ∎

3.3. Size of vertical and horizontal characters

We now define the size of vertical and horizontal characters. We first define the size of a horizontal character.

Definition 3.11.

Given a nilmanifold $G/\Gamma$ and a Mal’cev basis $\mathcal{X}$ , note that any horizontal character $\eta\colon G\to\mathbb{R}$ can be expressed in the form

\eta(g)=k\cdot\psi(g)

for some $k\in\mathbb{Z}^{\dim(G)}$ . We define the size of the horizontal character as $\lVert k\rVert_{\infty}$ .
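As an illustration of this definition, the following minimal sketch works in the Heisenberg group of upper unitriangular $3\times 3$ matrices, encoded as triples $(x,y,z)$, with second-kind coordinates $\psi(x,y,z)=(x,y,z-xy)$ relative to the basis $E_{12},E_{23},E_{13}$. This choice of group and basis is our assumption and not part of the text, and the helper names are hypothetical. It checks that $k\cdot\psi$ is additive when $k$ is supported on the first two coordinates, and that a nonzero last coordinate breaks additivity.

```python
def mul(g, h):
    """Product of the unitriangular matrices encoded by the triples g and h."""
    (x, y, z), (a, b, c) = g, h
    return (x + a, y + b, z + c + x * b)


def psi(g):
    """Second-kind coordinates relative to the basis E12, E23, E13."""
    x, y, z = g
    return (x, y, z - x * y)


def eta(k, g):
    """The candidate character g -> k . psi(g)."""
    return sum(ki * ui for ki, ui in zip(k, psi(g)))


def size(k):
    """Size of the character, i.e. the sup norm of the integer vector k."""
    return max(abs(ki) for ki in k)
```

Here `eta((2, -3, 0), .)` is a homomorphism of size $3$, whereas `eta((0, 0, 1), .)` fails additivity on the pair $(0,1,0)$, $(1,0,0)$, reflecting that horizontal characters vanish on the commutator subgroup.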

We next define the size of an $i$ -th horizontal character.

Definition 3.12.

Consider a nilmanifold $G/\Gamma$ with $G$ given a degree-rank filtration of degree-rank $(s,r)$ and a Mal’cev basis $\mathcal{X}=\{X_{1},\ldots,X_{\dim(G)}\}$ adapted to the degree-rank filtration. Note that any $i$ -th horizontal character $\eta_{i}\colon G_{(i,1)}\to\mathbb{R}$ can be expressed in the form

\eta_{i}(g)=k\cdot\psi(g)

for some $k\in\mathbb{Z}^{\dim(G)}$ which is nonzero only on coordinates $j$ with $\dim(G)-\dim(G_{(i,1)})<j\leq\dim(G)-\dim(G_{(i,2)})$ . We define the size of the $i$ -th horizontal character as $\lVert k\rVert_{\infty}$ .

We finally define the size of a vertical character.

Definition 3.13.

Consider a nilmanifold $G/\Gamma$ with $G$ given a degree filtration of degree $k$ and a Mal’cev basis $\mathcal{X}=\{X_{1},\ldots,X_{\dim(G)}\}$ adapted to the degree filtration. Consider a continuous vertical character $\xi\colon T\to\mathbb{R}$ from a rational subgroup $T\leqslant Z(G)$ . We define the height of $\xi$ as

\sup_{x\neq y\in T/(\Gamma\cap T)}\frac{|\xi(x)-\xi(y)|}{d_{G}(x\Gamma,y\Gamma)};

this will be denoted as $|\xi|$ .

Remark.

We now justify the terminology “height” given for the complexity of a vertical character. Suppose that $G/\Gamma$ has dimension $d$ and complexity $M$ (given $\mathcal{X}$ ) with respect to a degree filtration of degree $k$ and $T$ is $Q$ -rational. We have that $T$ has a Mal’cev basis which is a $(QM)^{O_{k}(d^{O(1)})}$ -rational combination of $\mathcal{X}$ by [42, Lemma B.12]; denote this $\mathcal{X}^{\prime}$ . By [42, Lemma B.9], we have that for $x,y\in T$ ,

d_{G,\mathcal{X}}(x\Gamma,y\Gamma)\leq(QM)^{O_{k}(d^{O(1)})}d_{T,\mathcal{X}^{\prime}}(x(\Gamma\cap T),y(\Gamma\cap T))\leq(QM)^{O_{k}(d^{O(1)})}d_{G,\mathcal{X}}(x\Gamma,y\Gamma).

With respect to $\mathcal{X}^{\prime}=\{X_{1}^{\prime},\ldots,X_{\dim(T)}^{\prime}\}$ , we have that $\xi$ is an integer vector and the definition of height is equivalent up to a multiplicative factor of $(QM)^{O_{k}(d^{O(1)})}$ to the height of this vector.

3.4. Correlation

We will also require the notion of a sequence being biased of some order.

Definition 3.14.

A function $f\colon[N]\to\mathbb{C}^{D}$ is $s$ -biased of correlation $\eta$ , complexity $M$ , and dimension $d$ if there exists a nilmanifold $G/\Gamma$ with a degree $s$ filtration such that $G$ has dimension at most $d$ , $G/\Gamma$ has complexity at most $M$ , and there exists an $M$ -Lipschitz function $F$ and a polynomial sequence $g\in\operatorname{poly}(\mathbb{Z}_{\mathbb{N}}\to G_{\mathbb{N}})$ such that

\lVert\mathbb{E}_{n\in[N]}[f(n)\overline{F(g(n)\Gamma)}]\rVert_{\infty}\geq\eta.

We will denote this as $f\in\operatorname{Corr}(s,\eta,M,d)$ .

3.5. Miscellaneous complexity notions

We will also require the following definition regarding smoothness norms of polynomial sequences.

Definition 3.15.

Given $\vec{v}\in\mathbb{N}^{k}$ and $\vec{n}\in\mathbb{N}^{k}$ , we define

\binom{\vec{n}}{\vec{v}}=\prod_{i=1}^{k}\binom{n_{i}}{v_{i}}.

Any polynomial sequence $g\colon\mathbb{Z}^{k}\to\mathbb{R}$ can be expressed uniquely as

g(\vec{n})=\sum_{\vec{\ell}\in\mathbb{N}^{k}}\alpha_{\vec{\ell}}\binom{\vec{n}}{\vec{\ell}}

with $\alpha_{\vec{\ell}}\in\mathbb{R}$ . We define

\lVert g\rVert_{C^{\infty}[N]}:=\max_{\vec{\ell}\neq\vec{0}}N^{|\vec{\ell}|}\cdot\lVert\alpha_{\vec{\ell}}\rVert_{\mathbb{R}/\mathbb{Z}}

where $|\vec{\ell}|=\sum_{j=1}^{k}\ell_{j}$ .

Remark.

Note that the above definition is only sensitive to the values of $g~{}\mathrm{mod}~{}1$ .
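The smoothness norm can be computed directly from the binomial expansion. The sketch below handles only the one-variable case $k=1$ with exact rational values, recovering $\alpha_{\ell}$ as the $\ell$-th forward difference of $g$ at $0$. The function names and the sample polynomial are hypothetical, and `degree` must be at least $1$ and at least the true degree of $g$.

```python
from fractions import Fraction
from math import comb


def dist_to_integers(a):
    """The norm ||a||_{R/Z}, i.e. the distance from a to the nearest integer."""
    a = Fraction(a)
    return abs(a - round(a))


def binomial_coeffs(g, degree):
    """Coefficients alpha_l with g(n) = sum_l alpha_l * binom(n, l), via differences at 0."""
    row = [Fraction(g(n)) for n in range(degree + 1)]
    coeffs = []
    for _ in range(degree + 1):
        coeffs.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return coeffs


def smoothness_norm(g, degree, N):
    """max over l >= 1 of N^l * ||alpha_l||_{R/Z} for a one-variable polynomial g."""
    alphas = binomial_coeffs(g, degree)
    return max(N**l * dist_to_integers(a) for l, a in enumerate(alphas) if l >= 1)


# Sample polynomial g(n) = n^2 / 1000, so alpha_1 = 1/1000 and alpha_2 = 1/500.
def g(n):
    return Fraction(n * n, 1000)


# Adding an integer-valued polynomial changes g but not g mod 1.
def g_shifted(n):
    return g(n) + 3 * comb(n, 2) + 5 * n
```

With $N=10$ both `g` and `g_shifted` have norm $\max(10/1000,100/500)=1/5$, consistent with the remark that only the values of $g~\mathrm{mod}~1$ matter.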

We now define when a polynomial sequence is rational and smooth.

Definition 3.16.

Consider a nilmanifold $G/\Gamma$ given either a degree, degree-rank, or multidegree filtration with Mal’cev basis $\mathcal{X}$ and $g$ a domain $\mathbb{Z}^{k}$ polynomial sequence on $G$ with respect to the given filtration. We say that $g$ is $(M,N)$ -smooth if:

•

$d_{G,\mathcal{X}}(g(\vec{0}),\mathrm{Id}_{G})\leq M$ ;
•

$d_{G,\mathcal{X}}(g(\vec{v}),g(\vec{v}+\vec{e}_{i}))\leq M\cdot N^{-1}$ for $\vec{v}\in[N]^{k}$ and $1\leq i\leq k$ .

We say that $g$ is $M$ -rational if there is $1\leq m\leq M$ such that for all $\vec{n}\in\mathbb{N}^{k}$ we have that

\psi_{\mathcal{X}}(g(\vec{n}))\in\frac{1}{m}\cdot\mathbb{Z}^{\dim(G)}.

4. Proof outline

We are now in position to discuss the proof of Theorem 1.2; as our proof is closely modeled on that of Green, Tao, and Ziegler [34], the announcement of [33] may prove a useful starting point for certain readers. For various parts of this outline we will restrict to the case of the $U^{5}$ -inverse theorem and discuss the proof as if the analysis were performed with bracket polynomials.

4.1. Induction on degree and additive quadruples

Suppose that $f\colon[N]\to\mathbb{C}$ is $1$ -bounded such that

\lVert f\rVert_{U^{5}[N]}\geq\delta.

Via the inductive definition of the Gowers norm, we have for at least $\delta^{O(1)}N$ values of $h\in[N]$ that

\lVert\Delta_{h}f\rVert_{U^{4}[N]}\geq\delta^{O(1)}.

Call this set of indices $H$ . Applying Theorem 1.2 inductively (when converted to bracket polynomials; see e.g. [45, Proposition 1.4]) we may choose $d_{1},d_{2},d_{3}\leq\log(1/\delta)^{O(1)}$ and coefficients $a_{i,h}$ etc. such that

\bigg{\|}\mathbb{E}_{n\in[N]}\Delta_{h}f(n)\cdot e\bigg{(}\sum_{i=1}^{d_{1}}a_{i,h}n[b_{i,h}n][c_{i,h}n]+\sum_{i=1}^{d_{2}}d_{i,h}n^{2}[e_{i,h}n]+\sum_{i=1}^{d_{3}}f_{i,h}n[g_{i,h}n]+j_{h}n^{3}+\ell_{h}n^{2}+m_{h}n\bigg{)}\bigg{\|}\geq\exp(-\log(1/\delta)^{O(1)});

we have padded with extra coefficients to make the dimensions $d_{i}$ not $h$ -dependent. Set

\overline{G_{h}(n)}=e\bigg{(}\sum_{i=1}^{d_{1}}a_{i,h}n[b_{i,h}n][c_{i,h}n]+\sum_{i=1}^{d_{2}}d_{i,h}n^{2}[e_{i,h}n]+\sum_{i=1}^{d_{3}}f_{i,h}n[g_{i,h}n]+j_{h}n^{3}+\ell_{h}n^{2}+m_{h}n\bigg{)}.

For the sake of clarity, we will let $L_{h}(n)$ denote terms of degree $\leq 2$ which are possibly $h$ -dependent. We have

\bigg{|}\mathbb{E}_{n\in[N]}\Delta_{h}f(n)\cdot e\bigg{(}\sum_{i=1}^{d_{1}}a_{i,h}n[b_{i,h}n][c_{i,h}n]+\sum_{i=1}^{d_{2}}d_{i,h}n^{2}[e_{i,h}n]+j_{h}n^{3}+L_{h}(n)\bigg{)}\bigg{|}\geq\exp(-\log(1/\delta)^{O(1)}).

The first crucial step, via a Cauchy–Schwarz argument due to Gowers [16] (see [32, Proposition 6.1] or Lemma 7.2) is that for many additive quadruples $(h_{1},h_{2},h_{3},h_{4})$ , i.e. $h_{1}+h_{2}=h_{3}+h_{4}$ , we have

|\mathbb{E}_{n\in[N]}G_{h_{1}}(n)G_{h_{2}}(n+h_{1}-h_{4})\overline{G_{h_{3}}(n)}\overline{G_{h_{4}}(n+h_{1}-h_{4})}|\geq\exp(-\log(1/\delta)^{O(1)}).
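The inductive structure of the Gowers norm used at the start of this subsection can be checked numerically. The following brute-force sketch works on the cyclic group $\mathbb{Z}/N\mathbb{Z}$ of Definition 1.1 rather than on the interval $[N]$, and is only feasible for very small $N$ and $s$. The function names are our own. The recursion encodes $\lVert f\rVert_{U^{s+1}}^{2^{s+1}}=\mathbb{E}_{h}\lVert\Delta_{h}f\rVert_{U^{s}}^{2^{s}}$.

```python
import cmath
import math


def e(x):
    """e(x) = exp(2 * pi * i * x)."""
    return cmath.exp(2j * math.pi * x)


def derivative(f, h):
    """Multiplicative derivative Delta_h f(x) = f(x) * conj(f(x + h)) on Z/NZ."""
    N = len(f)
    return [f[x] * f[(x + h) % N].conjugate() for x in range(N)]


def gowers_power(f, s):
    """The quantity ||f||_{U^s(Z/NZ)}^{2^s}, averaging Delta_{h_1,...,h_s} f(x)."""
    N = len(f)
    if s == 0:
        return sum(f) / N
    return sum(gowers_power(derivative(f, h), s - 1) for h in range(N)) / N


# A quadratic phase on Z/7Z: its U^3 norm is 1 while its U^2 norm is small.
N = 7
quadratic_phase = [e(x * x / N) for x in range(N)]
u2_fourth_power = gowers_power(quadratic_phase, 2)
u3_eighth_power = gowers_power(quadratic_phase, 3)
```

For this quadratic phase the second derivative is $e(2h_{1}h_{2}/7)$, whose average is $1/7$, while the third derivative is identically $1$. This is the basic reason that degree $s$ polynomial phases are obstructions to $U^{s+1}$ uniformity.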

4.2. Sunflower and linearization for the top degree-rank

Via bracket polynomial manipulations, we see that the “top degree-rank” term of the above expression is

\sum_{i=1}^{d_{1}}(a_{i,h_{1}}n[b_{i,h_{1}}n][c_{i,h_{1}}n]+a_{i,h_{2}}n[b_{i,h_{2}}n][c_{i,h_{2}}n]-a_{i,h_{3}}n[b_{i,h_{3}}n][c_{i,h_{3}}n]-a_{i,h_{4}}n[b_{i,h_{4}}n][c_{i,h_{4}}n]).

The heart of the proof is demonstrating that these “top degree-rank terms line up” in an appropriate sense across a dense set of additive quadruples in $H$ . Such a conclusion is at least plausible since for generic coefficients the associated bracket polynomial equidistributes $~{}\mathrm{mod}~{}1$ , which would violate the given condition on $G_{h_{1}}(n)G_{h_{2}}(n+h_{1}-h_{4})\overline{G_{h_{3}}(n)}\overline{G_{h_{4}}(n+h_{1}-h_{4})}$ . One possibility where the top degree-rank term is exactly zero is when we can write $a_{i,h_{1}}=a_{i}h_{1}$ , $b_{i,h_{1}}=b_{i}^{\ast}$ , $c_{i,h_{1}}=c_{i}^{\ast}$ . The key point is that, up to controlled modifications, this is the only way for that to occur in a robust sense.
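The exact cancellation in the model case can be verified by hand or with the short sketch below. It assumes that $[\cdot]$ denotes the floor function and uses rational stand-ins for the real coefficients so that the arithmetic is exact; both are our assumptions for illustration. It evaluates a single summand of the top degree-rank term with $a_{h}=ah$ and $h$-independent $b,c$.

```python
from fractions import Fraction
from math import floor


def top_term(a_h, b, c, n):
    """One summand a_h * n * [b n] * [c n], with [.] taken to be the floor."""
    return a_h * n * floor(b * n) * floor(c * n)


def quadruple_sum(a, b, c, n, h1, h2, h3, h4):
    """Signed sum over (h1, h2, h3, h4) in the model case a_h = a * h."""
    return (top_term(a * h1, b, c, n) + top_term(a * h2, b, c, n)
            - top_term(a * h3, b, c, n) - top_term(a * h4, b, c, n))


# Hypothetical rational coefficients standing in for generic reals.
a, b, c = Fraction(1, 7), Fraction(5, 3), Fraction(7, 4)
```

Since the summand is linear in $a_{h}$, the signed sum equals $a(h_{1}+h_{2}-h_{3}-h_{4})n[bn][cn]$, which vanishes precisely because $h_{1}+h_{2}=h_{3}+h_{4}$.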

The first modification is that we can replace in the above example the expression $a_{i,h_{1}}=a_{i}h_{1}$ with $a_{i,h_{1}}=\Theta_{i}\{\Theta_{i}^{\prime}h_{1}\}$ or more generally a bracket linear form. The second modification is that we may not get a description that respects the presented structure of the sum. Instead the coordinates of the bracket linear form may only appear in these “fixed”, “fixed”, “bracket linear” triples after a linear change of variables. We prove the existence of this structure in two steps, as in [34]. The first step proves that the bracket form is “fixed”, “fixed”, “ $h$ -dependent” and the second step then proves that the “ $h$ -dependent” part in fact has a bracket linear structure. These steps will fall under the names sunflower and linearization respectively.

4.3. Degree-rank iteration

Once we have learned this refined form for $\sum_{i=1}^{d_{1}}a_{i,h}n[b_{i,h}n][c_{i,h}n]$ , we iterate and then learn the refined form for the next highest degree-rank term $\sum_{i=1}^{d_{2}}d_{i,h}n^{2}[e_{i,h}n]$ , and then finally we learn the refined form for $j_{h}n^{3}$ . Given these refined forms, Green, Tao, and Ziegler prove that the top degree terms in fact have the form of a multidegree $(1,3)$ nilsequence (in variables $h$ and $n$ ). Finally given such a correlation, a symmetrization argument as in [34] concludes the proof. We remark here that while terms such as $a_{i,h}$ and $e_{i,h}$ correspond to Taylor coefficients on the first horizontal torus, terms such as $d_{i,h}$ belong to the second horizontal torus, and $j_{h}$ to the third horizontal torus. Furthermore to handle terms of the form $\sum_{i=1}^{d_{2}}d_{i,h}n^{2}[e_{i,h}n]$ correctly we must realize such terms via a degree-rank $(3,2)$ nilmanifold, hence the need for the finer degree-rank notion.

4.4. Nilcharacters and horizontal tori

We now make this description more precise in terms of nilcharacters and horizontal tori. Let $F(g_{h}(n)\Gamma)=G_{h}(n)$ be a nilcharacter of degree-rank $(s,r)$ ; here $e(an[bn][cn])$ should be thought of as an “almost” degree-rank $(3,3)$ nilcharacter and $e(an[bn^{2}])$ as an “almost” degree-rank $(3,2)$ nilcharacter. The sunflower step proves that the nilsequence $F(g_{h}(n)\Gamma)$ can be realized as a bracket polynomial whose top degree-rank part is a sum of terms with $(r-1)$ iterated brackets where each term consists of $(r-1)$ $h$ -independent phases of $g_{h}$ , and possibly one $h$ -dependent phase of $g_{h}$ . Here, “phase” will correspond to components of the Taylor coefficients of $g_{h}$ , $\operatorname{Taylor}_{i}(g_{h})$ . This corresponds to showing that the $i$ -th horizontal torus $G_{(i,1)}/G_{(i,2)}$ contains vector spaces $V_{i,\mathrm{Dep}}\leqslant V_{i}$ such that:

•

$\operatorname{Taylor}_{i}(g_{h})-\operatorname{Taylor}_{i}(g_{h^{\prime}})\in V_{i,\mathrm{Dep}}$ and $\operatorname{Taylor}_{i}(g_{h})\in V_{i}$ ;
•

If $i_{1}+\cdots+i_{r}=s$ , then $[v_{i_{1}},v_{i_{2}},\ldots,v_{i_{r}}]=0$ whenever $v_{i_{j}}\in V_{i_{j}}$ and there are at least two indices $j$ such that $v_{i_{j}}\in V_{i_{j},\mathrm{Dep}}$ .

Here we have implicitly descended an iterated commutator to the vector spaces $G_{(i,1)}/G_{(i,2)}$ which corresponds to a multilinear form in this case. Such a result is proven via combining quantitative equidistribution theory of nilsequences [43, 42] with a “Furstenberg–Weiss argument” as in [32, 34, 43]; see [56] for further examples of the Furstenberg–Weiss argument.

The linearization step then proves that the remaining $h$ -dependent phases are “bracket linear” in $h$ . In practice, we require an additional case that the $h$ -dependent phase may be a petal phase: a top degree-rank term with the petal phase can be realized as a “lower order term”, or more precisely a bracket phase with at most $(r-2)$ iterated brackets or of total degree at most $s-1$ . Thus, the statement we ultimately prove is that we may decompose a subspace of the $i$ -th horizontal torus into the sum of three linearly disjoint vector spaces $W_{i,\ast}$ , $W_{i,\mathrm{Lin}}$ , and $W_{i,\mathrm{Pet}}$ such that:

	$\displaystyle\operatorname{Taylor}_{i}(g_{h})$	$\displaystyle\in W_{i,\ast}+W_{i,\mathrm{Lin}}+W_{i,\mathrm{Pet}},$
	$\displaystyle\operatorname{Taylor}_{i}(g_{h})-\operatorname{Taylor}_{i}(g_{h^{\prime}})$	$\displaystyle\in W_{i,\mathrm{Lin}}+W_{i,\mathrm{Pet}},$

and the projection of $\operatorname{Taylor}_{i}(g_{h})$ onto $W_{i,\mathrm{Lin}}$ is bracket linear. In addition, we require that if $i_{1}+\cdots+i_{r}=s$ , then $[v_{i_{1}},v_{i_{2}},\ldots,v_{i_{r}}]=0$ whenever $v_{i_{j}}\in W_{i_{j},\ast}+W_{i_{j},\mathrm{Lin}}+W_{i_{j},\mathrm{Pet}}$ for all $j$ and either $v_{i_{j}}\in W_{i_{j},\mathrm{Pet}}$ for at least one index $j$ or $v_{i_{j}}\in W_{i_{j},\mathrm{Lin}}$ for at least two distinct indices $j$ . Thus even though we have not improved our understanding of the Taylor coefficients on $W_{i,\mathrm{Pet}}$ , we have improved the vanishing of the top degree-rank commutator bracket on this vector space. The linearization step is proved by a combination of quantitative equidistribution theory of nilmanifolds [43, 42] and inverse sumset theory. We refer the reader to [43] for a simpler case of the argument given here.

4.5. Quantitative bounds

The heart of this paper is performing the sunflower and linearization steps efficiently. Green, Tao, and Ziegler [34] accomplish this (when unwinding the correspondence between nilmanifolds and bracket polynomials) via iteratively learning relations between the coefficients $a_{i,h},b_{i,h},c_{i,h}$ and performing a dimension reduction argument. (This is performed in [34, Section 10] via a “rank minimality” argument; this requires passing to an ultralimit. When performed in finitary language this becomes a dimension reduction argument and is also present in the proof of [34, Theorem D.5].) Furthermore, the underlying equidistribution theorem used in the work of Green, Tao, and Ziegler [34], proven in work of Green and Tao [29], relies on an induction on dimension argument. The use of any induction on dimension argument essentially immediately results in $O(s)$ iterated logarithms and thus must be avoided.

The use of induction on dimension in the equidistribution theorem was avoided in work of the first author [43, 42]. The key point in Sections 8 and 9 therefore is to perform the sunflower and linearization steps without any use of induction on dimension. The precise details, while mainly utilizing elementary linear algebra, require a bit of precision. This argument, extending the case of the $U^{4}$ -inverse theorem from [43], demonstrates that a dimension-independent number of applications of equidistribution theory is sufficient to derive the necessary decrease in degree-rank. (Note that the argument in [34] morally uses that one can in fact assume that there are no “short linear relations” between various coefficients, but such a result necessitates exponential in dimension dependencies in the exponent.) Another crucial point in our work is that the length of the associated bracket linear form that is obtained is not “very long”. This is, by now, a standard consequence of the quasi-polynomial bounds of Sanders [52] towards the polynomial Bogolyubov conjecture.

We finally remark that the quantitative equidistribution theorem we use is slightly different than the one derived in work of the first author [43, 42]. The work of the first author is most naturally phrased as factoring ill-distributed polynomial sequences into a smooth part, a rational part, and a polynomial sequence which (up to taking a certain quotient) lives in a lower step nilmanifold. For our purposes, it is critical to instead lower the degree of the nilmanifold. This is most easily seen from the above bracket polynomial example where we are attempting to linearize a function of the form

e\bigg{(}\sum_{i=1}^{d_{3}}f_{i,h}n[g_{i,h}n]+j_{h}n^{3}+\ell_{h}n^{2}+m_{h}n\bigg{)}.

At this step we wish to linearize $j_{h}n^{3}$ instead of handling the terms $f_{i,h}n[g_{i,h}n]$ ; the $j_{h}n^{3}$ term, while having the highest degree, does not correspond to the highest step part of the nilmanifold. This phenomenon only occurs when proving the $U^{s+1}$ -inverse theorem for $s\geq 4$ . Thus a crucial ingredient in our work is bootstrapping, as a black box, the efficient version of equidistribution with respect to step in order to obtain an efficient version of equidistribution with respect to degree; this is Theorem 5.4.
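To make the distinction between degree and step concrete, the following standard illustration may help. It is a sketch and not taken from the argument above: the Heisenberg model and the particular filtrations $G_{i}$ are the usual textbook choices and are an assumption about the intended picture. A pure cubic phase lives on an abelian (step $1$ ) group with a degree $3$ filtration, while a bracket term of the shape $f_{i,h}n[g_{i,h}n]$ arises, up to sign conventions and smoothing, from a step $2$ group with only a degree $2$ filtration.

```latex
% Pure polynomial phase: abelian (1-step) group, but a degree 3 filtration is needed.
G=\mathbb{R},\quad\Gamma=\mathbb{Z},\quad G_{0}=G_{1}=G_{2}=G_{3}=\mathbb{R},\quad G_{4}=\{0\},\qquad g(n)=j_{h}n^{3},\quad F(x\Gamma)=e(x).

% Bracket phase: the Heisenberg group is 2-step, and the lower central series (degree 2) filtration suffices.
G=\begin{pmatrix}1&\mathbb{R}&\mathbb{R}\\0&1&\mathbb{R}\\0&0&1\end{pmatrix},\quad G_{0}=G_{1}=G,\quad G_{2}=[G,G],\quad G_{3}=\{\mathrm{id}_{G}\},\qquad g(n)=\begin{pmatrix}1&f_{i,h}n&0\\0&1&g_{i,h}n\\0&0&1\end{pmatrix}.
```

In the first case the top degree part of the filtration is all of $G$ , whereas the top step part $G_{(2)}$ is trivial; an equidistribution statement organized by step therefore does not see the cubic term first.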

4.6. Organization of the paper II

In Section 5 we prove the necessary quantitative equidistribution theorem with respect to degree. In Section 6, we perform the setup and give various definitions which will be used to perform the sunflower and linearization steps. In Section 7, we derive that many additive quadruples exhibit a bias. In Section 8 we perform the sunflower step while in Section 9 we perform the linearization step. In Sections 10 and 11 we then convert information regarding the Taylor coefficients into correlation with a multidegree $(1,s-1)$ nilsequence and a nilsequence of lower degree-rank. Iterating this argument we eventually obtain correlation with a multidegree $(1,s-1)$ nilsequence. In Section 12, we symmetrize this nilsequence to obtain Theorem 1.2.

Appendix A collects certain standard results regarding approximate homomorphisms (this is ultimately where work of Sanders [52] is invoked). In Appendix B, we collect a number of miscellaneous propositions which are deferred throughout the paper. Finally in Appendix C we collect a number of propositions regarding nilcharacters.

5. Efficient equidistribution theory of nilsequences

In order to state the primary equidistribution input of this paper we will need the notion of when an element in $G/[G,G]$ and a horizontal character are orthogonal.

Definition 5.1.

Consider a nilmanifold $G/\Gamma$ , a horizontal character $\eta\colon G\to\mathbb{R}$ , and $w\in G/[G,G]$ . We say that $\eta$ and $w$ are orthogonal if $\eta(w)=0$ .

The primary equidistribution input into our results will be the following result of the first author [42, Theorem 3]. This result is ultimately the driving force of this paper.

Theorem 5.2.

Fix an integer $\ell\geq 1$ , $\delta\in(0,1/10)$ , $M,d\geq 1$ , and $F\colon G/\Gamma\to\mathbb{C}$ . Suppose that $G$ is a dimension $d$ , at most $s$ -step connected, simply connected nilpotent Lie group with a given degree $k$ filtration, and the nilmanifold $G/\Gamma$ is complexity at most $M$ with respect to this filtration. Let $g$ be a polynomial sequence on $G$ with respect to this filtration.

Furthermore suppose that $\lVert F\rVert_{\mathrm{Lip}}\leq 1$ and $F$ has $G_{(s)}$ -vertical frequency $\xi$ such that the height of $\xi$ is bounded by $M/\delta$ . Suppose that $N\geq(M/\delta)^{\Omega_{k,\ell}(d^{\Omega_{k,\ell}(1)})}$ and

\big{|}\mathbb{E}_{\vec{n}\in[N]^{\ell}}F(g(\vec{n})\Gamma)\big{|}\geq\delta.

There exists an integer $0\leq r\leq\dim(G/[G,G])$ such that:

•

We have horizontal characters $\eta_{1},\ldots,\eta_{r}\colon G\to\mathbb{R}$ with heights bounded by $(M/\delta)^{O_{k,\ell}(d^{O_{k,\ell}(1)})}$ ;
•

For all $1\leq i\leq r$ , we have $\lVert\eta_{i}\circ g\rVert_{C^{\infty}[N]}\leq(M/\delta)^{O_{k,\ell}(d^{O_{k,\ell}(1)})}$ ;
•

For any $w_{1},\ldots,w_{s}\in G/[G,G]$ such that $w_{i}$ are orthogonal to all of $\eta_{1},\ldots,\eta_{r}$ , we have

$\xi([[[w_{1},w_{2}],w_{3}],\ldots,w_{s}])=0.$
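As a sanity check on the shape of this conclusion, consider the following toy case. It is an illustration and not part of the source: we take the Heisenberg group with $s=2$ , and the coordinates $(x,y,z)$ for the three upper-triangular matrix entries, as well as the integers $a,b,m$ , are hypothetical notation introduced only here.

```latex
% Heisenberg group: G = {(x,y,z)}, G_{(2)} = [G,G] = {(0,0,z)}, G/[G,G] \cong \mathbb{R}^{2}.
[(x_{1},y_{1},z_{1}),(x_{2},y_{2},z_{2})]=(0,0,x_{1}y_{2}-x_{2}y_{1}),\qquad\xi(0,0,z)=mz\quad(m\in\mathbb{Z}\setminus\{0\}).

% A single nonzero horizontal character (r = 1):
\eta_{1}(x,y,z)=ax+by,\qquad(a,b)\in\mathbb{Z}^{2}\setminus\{(0,0)\}.

% Elements orthogonal to \eta_{1} lie on the line ax+by=0, so any two of them are parallel:
w_{1}=\lambda_{1}(-b,a),\quad w_{2}=\lambda_{2}(-b,a)\ \Longrightarrow\ \xi([w_{1},w_{2}])=m\lambda_{1}\lambda_{2}\big((-b)a-(-b)a\big)=0.
```

In this toy case $r=0$ is impossible, since $m(x_{1}y_{2}-x_{2}y_{1})$ does not vanish identically; so the final bullet forces at least one nontrivial horizontal character, and the subgroup cut out by it is abelian.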

Remark 5.3.

Note that $G_{(s)}$ (and in fact any group in the lower central series) is seen to be $O_{s,k}(M^{O_{s,k}(1)})$ -rational due to Lemma 2.1. This guarantees that the height definition used in [42] and here are compatible.

Remark.

Let $W=\bigcap_{i=1}^{r}\operatorname{ker}(\eta_{i})$ . The crucial property of the lemma output is that

\widetilde{G}:=W/\ker(\xi)

is trivially seen to be at most $(s-1)$ -step nilpotent. (Note that if $G$ is abelian then we have that $\widetilde{G}$ is trivial.) This is due to the fact that defining $W=W_{0}=W_{1}$ and $W_{j}=[W_{1},W_{j-1}]$ for $j\geq 2$ yields $W_{(s)}\leqslant G_{(s)}$ and $\xi(W_{(s)})=0$ . Additionally, the statement in [42, Theorem 3] assumes $G$ is exactly $s$ -step nilpotent and $\xi$ is nonzero. In the case when $G$ is strictly less than $s$ -step nilpotent, taking no horizontal characters (i.e., $W=G$ ) gives the desired statement. Furthermore when $\xi$ is zero we may similarly take no horizontal characters and note that the final statement is vacuous.

The following variant of Theorem 5.2 will essentially be the primary equidistribution tool in our paper. For the sake of argumentation, we first prove the result in the case when the vertical frequency considered lives on a $1$ -dimensional torus and then bootstrap to the general case.

This theorem and its proof are motivated by [34, Lemma E.11]. The key point is that Theorem 5.2 allows us to give a procedure that relies on an induction on step rather than an induction on dimension. The main technical issue is that at each stage we pass to a quotient group, obtained by quotienting by the kernel of a certain vertical character, and thus we must iteratively “lift” these factorizations.

Theorem 5.4.

Let $\ell\geq 1$ be an integer, $\delta\in(0,1/10)$ , $M\geq 1$ , and $F\colon G/\Gamma\to\mathbb{C}$ . Suppose that $G$ is dimension $d$ , is $s$ -step nilpotent with a given degree $k$ filtration, and the nilmanifold $G/\Gamma$ is complexity at most $M$ with respect to this filtration.

Suppose that $T\leqslant Z(G)$ is a $1$ -dimensional subgroup of the center which is $M$ -rational with respect to $G$ . Further suppose that $F$ has a nonzero $T$ -vertical character $\xi$ with $|\xi|\leq M/\delta$ , $\lVert F\rVert_{\mathrm{Lip}}\leq M$ , $N\geq(M/\delta)^{\Omega_{k,\ell}(d^{\Omega_{k,\ell}(1)})}$ , and $g$ is a polynomial sequence with respect to the degree $k$ filtration such that $g(0)=\mathrm{id}_{G}$ . Then if

\big{|}\mathbb{E}_{\vec{n}\in[N]^{\ell}}F(g(\vec{n})\Gamma)\big{|}\geq\delta

there exists a factorization

g=\varepsilon g^{\prime}\gamma

such that:

•

$\varepsilon(0)=g^{\prime}(0)=\gamma(0)=\mathrm{id}_{G}$ ;
•

$g^{\prime}$ lives in an $(M/\delta)^{O_{k,\ell}(d^{O_{k,\ell}(1)})}$ -rational subgroup $H$ such that $H\cap T=\mathrm{Id}_{G}$ ;
•

$\gamma$ is an $(M/\delta)^{O_{k,\ell}(d^{O_{k,\ell}(1)})}$ -rational polynomial sequence;
•

$\varepsilon$ is an $((M/\delta)^{O_{k,\ell}(d^{O_{k,\ell}(1)})},N)$ -smooth polynomial sequence.

Proof.

The proof proceeds by iteratively “simplifying” $g$ to live on successively lower-step nilmanifolds. We treat $\ell$ as constant and allow implicit constants to depend on $\ell$ .

Step 1: Iteration setup. We will define a sequence of parameters $M_{i},\delta_{i}$ and $Q_{i},N_{i},v_{i}$ (where the domain of $\vec{n}$ at stage $i$ will be $v_{i}+Q_{i}\cdot[N_{i}]^{\ell}$ ) satisfying:

M_{i+1}\leq(M_{i}/\delta_{i})^{O_{k}(d^{O_{k}(1)})},\quad\delta_{i+1}\geq(\delta_{i}/M_{i})^{O_{k}(d^{O_{k}(1)})};

Q_{i+1}\leq Q_{i}\cdot(M_{i}/\delta_{i})^{O_{k}(d^{O_{k}(1)})},\quad N_{i+1}\geq N_{i}\cdot(\delta_{i}/M_{i})^{O_{k}(d^{O_{k}(1)})},\quad Q_{i+1}\cdot N_{i+1}+\lVert v_{i+1}\rVert_{\infty}\leq N.

During the iteration, we have a sequence of nilpotent Lie groups

G^{0},G^{1},\ldots,G^{t},\ldots

such that $G^{t}$ is at most $(s-t)$ -step nilpotent with associated lattice $\Gamma^{t}$ and is complexity at most $M_{t}$ . This in particular will imply that there are at most $s$ stages in the iteration. We also maintain a sequence of subgroups

K^{0},\ldots,K^{t},\ldots

which are $M_{t}$ -rational subgroups of $G$ .

We will define homomorphisms $\pi_{t+1}\colon G^{t}\to G^{t}/\operatorname{ker}(\xi_{t})=:\widetilde{G}^{t+1}$ , where $\xi_{t}$ is a $G^{t}_{(s-t)}$ -frequency (recall $H_{(i)}$ denotes the lower central series filtration of a group $H$ ). $G^{t+1}$ will be an appropriately rational subgroup of $\widetilde{G}^{t+1}$ . We will always maintain the invariant that $\operatorname{ker}(\xi_{t})\cap(\pi_{t}\circ\cdots\circ\pi_{1}(T))=\mathrm{Id}_{G^{t}}$ . We will furthermore maintain that the function $F_{t}$ has a $\pi_{t}\circ\cdots\circ\pi_{1}(T)$ -character given by descending $\xi$ on $G$ via $\pi_{t}\circ\cdots\circ\pi_{1}$ .

We inductively maintain the following pair of relations:

•

$\pi_{t}\circ\cdots\circ\pi_{1}(K^{t})=G^{t}$ ;
•

$\pi_{t}\circ\cdots\circ\pi_{1}(g_{t})=\widetilde{g}_{t}$ ;

where $g_{t}$ and $\widetilde{g}_{t}$ are polynomial sequences living in $K^{t}$ and $G^{t}$ respectively.

The iteration terminates when $G^{t}\cap(\pi_{t}\circ\cdots\circ\pi_{1}(T))=\mathrm{Id}_{G^{t}}$ . Before termination note that $G^{t}\cap(\pi_{t}\circ\cdots\circ\pi_{1}(T))=\pi_{t}\circ\cdots\circ\pi_{1}(T)$ since $\pi_{t}\circ\cdots\circ\pi_{1}(T)$ is $1$ -dimensional. Note that this in particular ensures that before the termination of the iteration, $\pi_{t}\circ\cdots\circ\pi_{1}(T)$ is well-defined even though $\pi_{j}$ is not fully defined on the image of $\pi_{j-1}$ ! Using the invariant that $\operatorname{ker}(\xi_{t})\cap(\pi_{t}\circ\cdots\circ\pi_{1}(T))=\mathrm{Id}_{G^{t}}$ we also have that $\xi$ (defined on $T$ ) naturally descends to $G^{t}$ . We define $J^{t}=\pi_{t}\circ\cdots\circ\pi_{1}(T)$ .

Furthermore at each stage of the iteration we have that

g_{t}=\varepsilon_{t+1}\cdot g_{t+1}\cdot\gamma_{t+1}

where:

•

$\varepsilon_{t+1}$ and $\gamma_{t+1}$ are polynomial sequences lying in $K^{t}$ ;
•

$g_{t+1}$ is a polynomial sequence lying in $K^{t+1}$ ;
•

$\gamma_{t+1}$ is $M_{t+1}$ -rational;
•

$\varepsilon_{t+1}$ is $(M_{t+1},N)$ -smooth.

Finally, in each stage of the iteration we will maintain a function $F_{t}\colon G^{t}/\Gamma^{t}\to\mathbb{C}$ such that

|\mathbb{E}_{\vec{n}\in v_{t}+Q_{t}\cdot[N_{t}]^{\ell}}[F_{t}(\widetilde{g}_{t}(\vec{n})\Gamma^{t})]|\geq\delta_{t}.

Throughout the iterations, nilmanifolds at stage $i$ will have complexity bounded by $M_{i}$ , $F_{i}$ is $M_{i}$ -Lipschitz, and various horizontal and vertical characters constructed will have size and height bounded by $M_{i}$ . The starting conditions are $G^{0}=G$ , $\Gamma^{0}=\Gamma$ , $M_{0}=M$ , $F_{0}=F$ , $N_{0}=N$ , $v_{0}=0$ , $Q_{0}=1$ , $\delta_{0}=\delta$ , $K^{0}=G$ (and $J^{0}=T$ ), and $g_{0}=\widetilde{g}_{0}=g$ .
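The recursion for $M_{i}$ and $\delta_{i}$ can be unwound as follows. This is a sketch under the simplifying assumption that a single constant $C\geq 1$ (a hypothetical name, with $C=O_{k}(d^{O_{k}(1)})$ ) bounds every implied exponent in the recursion; it uses $M_{i}/\delta_{i}\geq 1$ and the fact that an $s$ -step group carrying a degree $k$ filtration has $s\leq k$ .

```latex
% One step: combine the bounds M_{i+1} \leq (M_i/\delta_i)^C and 1/\delta_{i+1} \leq (M_i/\delta_i)^C.
\frac{M_{i+1}}{\delta_{i+1}}\leq\Big(\frac{M_{i}}{\delta_{i}}\Big)^{C}\cdot\Big(\frac{M_{i}}{\delta_{i}}\Big)^{C}=\Big(\frac{M_{i}}{\delta_{i}}\Big)^{2C}.

% Iterating t times from M_0 = M, \delta_0 = \delta, with t \leq s \leq k:
\frac{M_{t}}{\delta_{t}}\leq\Big(\frac{M}{\delta}\Big)^{(2C)^{t}}\leq\Big(\frac{M}{\delta}\Big)^{(2C)^{k}},\qquad(2C)^{k}=O_{k}(d^{O_{k}(1)}).
```

The number of stages is bounded in terms of the step and not the dimension $d$ , which is why the final exponent remains polynomial in $d$ .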

Step 2: Applying equidistribution. We now run a single step of the iteration. We have

|\mathbb{E}_{\vec{n}\in v_{t}+Q_{t}\cdot[N_{t}]^{\ell}}[F_{t}(\widetilde{g}_{t}(\vec{n})\Gamma^{t})]|\geq\delta_{t}.

By definition, we have that $F_{t}$ has a $J^{t}$ -frequency (a descent of $\xi$ ); this is not sufficient to apply Theorem 5.2. We perform an additional Fourier-analytic step to obtain a $G^{t}_{(s-t)}$ -vertical frequency. Since $F_{t}$ is $M_{t}$ -Lipschitz, via [42, Lemma A.6] we may write

F_{t}(z\Gamma^{t})=\sum_{|\xi^{\prime}|\leq(M_{t}/\delta_{t})^{O_{k}(d^{O_{k}(1)})}}F_{\xi^{\prime},t}(z\Gamma^{t})+\tau(z\Gamma^{t})

such that

•

$F_{\xi^{\prime},t}$ has $G^{t}_{(s-t)}$ -vertical frequency $\xi^{\prime}$ ;
•

$\lVert\tau\rVert_{\infty}\leq\delta_{t}/2$ ;
•

$F_{\xi^{\prime},t}$ is $(M_{t}/\delta_{t})^{O_{k}(d^{O_{k}(1)})}$ -Lipschitz on $G^{t}/\Gamma^{t}$ .

Given this representation, recall that $F_{t}$ has $\xi$ (appropriately descended) as a $J^{t}$ -vertical frequency. We abusively write this as $\xi$ . Therefore

	$\displaystyle F_{t}(z\Gamma^{t})$	$\displaystyle=\int_{g\in J^{t}/\Gamma^{t}}e(-\xi(g))F_{t}(zg\Gamma^{t})dJ^{t}(g)$
		$\displaystyle=\sum_{\|\xi^{\prime}\|\leq(M_{t}/\delta_{t})^{O_{k}(d^{O_{k}(1)})}}\int_{g\in J^{t}/\Gamma^{t}}e(-\xi(g))F_{\xi^{\prime},t}(zg\Gamma^{t})dJ^{t}(g)+\int_{g\in J^{t}/\Gamma^{t}}e(-\xi(g))\tau(zg\Gamma^{t})dJ^{t}(g)$
		$\displaystyle=\sum_{\|\xi^{\prime}\|\leq(M_{t}/\delta_{t})^{O_{k}(d^{O_{k}(1)})}}\widetilde{F}_{\xi^{\prime},t}(z\Gamma^{t})+\int_{g\in J^{t}/\Gamma^{t}}e(-\xi(g))\tau(zg\Gamma^{t})dJ^{t}(g),$

where $dJ^{t}$ represents the Haar measure on $J^{t}/\Gamma^{t}$ . Thus $F_{t}$ may be decomposed into a sum of functions with $G^{t}_{(s-t)}$ -vertical characters up to an $L^{\infty}$ error of $\delta_{t}/2$ . Furthermore, each vertical character $\xi^{\prime}$ in question must agree with $\xi$ on $J^{t}\cap G^{t}_{(s-t)}$ . If not, then the corresponding integral in the second line will average to $0$ and we may remove it.
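The vanishing claim can be spelled out in the case $J^{t}\leqslant G^{t}_{(s-t)}$ , which is the only case in which the restriction is not vacuous. The following is a sketch using only the defining property of a vertical frequency and orthogonality of characters on the compact torus $J^{t}/\Gamma^{t}$ (notation as in the display above).

```latex
% For g \in J^{t} \leqslant G^{t}_{(s-t)}, the vertical frequency property gives
F_{\xi^{\prime},t}(zg\Gamma^{t})=e(\xi^{\prime}(g))F_{\xi^{\prime},t}(z\Gamma^{t}).

% Hence the twisted average factors as
\int_{g\in J^{t}/\Gamma^{t}}e(-\xi(g))F_{\xi^{\prime},t}(zg\Gamma^{t})dJ^{t}(g)=F_{\xi^{\prime},t}(z\Gamma^{t})\int_{g\in J^{t}/\Gamma^{t}}e((\xi^{\prime}-\xi)(g))dJ^{t}(g),

% and the remaining integral is 1 if \xi^{\prime}=\xi on J^{t} and 0 otherwise.
\int_{g\in J^{t}/\Gamma^{t}}e((\xi^{\prime}-\xi)(g))dJ^{t}(g)=\mathbbm{1}[\xi^{\prime}|_{J^{t}}=\xi].
```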

Applying Pigeonhole, there exists $|\xi^{\prime}|\leq(M_{t}/\delta_{t})^{O_{k}(d^{O_{k}(1)})}$ such that

(5.1)

|\mathbb{E}_{\vec{n}\in v_{t}+Q_{t}\cdot[N_{t}]^{\ell}}[\widetilde{F}_{\xi^{\prime},t}(\widetilde{g}_{t}(\vec{n})\Gamma^{t})]|\geq(\delta_{t}/M_{t})^{O_{k}(d^{O_{k}(1)})}.

We have the following trichotomy:

•

$\xi^{\prime}$ is nonzero and $J^{t}\cap G^{t}_{(s-t)}=J^{t}$ ;
•

$\xi^{\prime}$ is nonzero and $J^{t}\cap G^{t}_{(s-t)}=\mathrm{Id}_{G^{t}}$ ;
•

$\xi^{\prime}=0$ in $\widehat{G^{t}_{(s-t)}}$ .

We define $\pi_{t+1}\colon G^{t}\to G^{t}/\ker(\xi^{\prime})=:\widetilde{G}^{t+1}$ (in particular we let $\xi_{t}=\xi^{\prime}$ ). Let $\widetilde{\Gamma}^{t+1}:=\Gamma^{t}/(\Gamma^{t}\cap\ker(\xi^{\prime}))=\pi_{t+1}(\Gamma^{t})$ . We now apply Theorem 5.2 to (5.1), obtaining horizontal characters $\eta_{1},\ldots,\eta_{r}\colon G^{t}\to\mathbb{R}$ . Let their common kernel be $H^{\ast}$ and let $G^{t+1}=\pi_{t+1}(H^{\ast})\leqslant\widetilde{G}^{t+1}$ . Note that in the case when $G^{t}$ is abelian, we do not necessarily have that the $\eta_{i}$ are trivial on $\operatorname{ker}(\xi^{\prime})$ . However, replacing $H^{\ast}$ by $H^{\ast}\operatorname{ker}(\xi^{\prime})$ (and abusively referring to this as $H^{\ast}$ ), we may then replace the $\eta_{i}$ by characters $\eta_{i}^{\prime}$ which instead cut out $H^{\ast}\operatorname{ker}(\xi^{\prime})$ , and note that the $\eta_{i}^{\prime}$ may be taken to be $(M_{t}/\delta_{t})^{O_{k}(d^{O_{k}(1)})}$ -height integer combinations of the $\eta_{i}$ . We abusively rename these characters as $\eta_{i}$ and then proceed with the proof in this abelian edge case.

By applying [42, Lemma A.1], we obtain a factorization of $\widetilde{g}_{t}(Q_{t}n+v_{t})$ into three nilsequences which are “smooth”, supported on a rational subgroup, and “rational”. We may change variables and then apply $\pi_{t+1}$ to obtain

\pi_{t+1}(\widetilde{g}_{t})=:\varepsilon_{t+1}^{\ast}g_{t+1}^{\ast}\gamma_{t+1}^{\ast}

where:

•

$g_{t+1}^{\ast}\in G^{t+1}$ , and $G^{t+1}$ is at most $(s-t-1)$ -step nilpotent. Furthermore $G^{t+1}$ is trivially seen to be $(M_{t}/\delta_{t})^{O_{k}(d^{O_{k}(1)})}$ -rational with respect to $\widetilde{G}^{t+1}$ ;
•

$\gamma_{t+1}^{\ast}$ is an $(M_{t}/\delta_{t})^{O_{k}(d^{O_{k}(1)})}$ -rational polynomial sequence within $\widetilde{G}^{t+1}$ ;
•

$\varepsilon_{t+1}^{\ast}$ is $((M_{t}/\delta_{t})^{O_{k}(d^{O_{k}(1)})},N)$ -smooth.

We remark that changing variables is easily seen to not affect the smoothness and rationality in a substantial manner due to the bounds on $Q_{t}$ . We can see that the step of $G^{t+1}$ decreases appropriately.

Step 3: Lifting the factorization data. Note that $G^{t+1}$ can be defined via a set of horizontal characters $\eta_{1}^{\prime},\ldots,\eta_{r^{\prime}}^{\prime}$ of $\widetilde{G}^{t+1}$ such that

G^{t+1}=\{x\in\widetilde{G}^{t+1}\colon\eta_{i}^{\prime}(x)=0\text{ for all }1\leq i\leq r^{\prime}\}

and we let $\Gamma^{t+1}=G^{t+1}\cap\widetilde{\Gamma}^{t+1}$ . Note that the $\eta_{i}^{\prime}$ are the natural descents of the $\eta_{i}$ , since the $\eta_{i}$ are trivial on $\operatorname{ker}(\xi^{\prime})$ ; this is precisely why we earlier modified the characters in the abelian case.

We define

K^{t+1}=\{x\in K^{t}\colon\eta_{i}^{\prime}(\pi_{t+1}\circ\pi_{t}\circ\cdots\circ\pi_{1}(x))=0\text{ for all }1\leq i\leq r^{\prime}\}.

The trivial (but key) point is that $\pi_{t+1}\circ\pi_{t}\circ\cdots\circ\pi_{1}(K^{t+1})\leqslant G^{t+1}$ . The key issue is noting that the map is well-defined; this is because $\pi_{t}\circ\cdots\circ\pi_{1}(K^{t})\leqslant G^{t}$ by induction so that we are allowed to apply $\pi_{t+1}$ to any such values. We further see that $\pi_{t+1}\circ\pi_{t}\circ\cdots\circ\pi_{1}(K^{t+1})=G^{t+1}$ because $\pi_{t+1}\circ\pi_{t}\circ\cdots\circ\pi_{1}(K^{t})=\pi_{t+1}(G^{t})=\widetilde{G}^{t+1}$ and $K^{t+1}$ is the subgroup of $K^{t}$ such that the image under $\pi_{t+1}\circ\cdots\circ\pi_{1}$ is precisely in the intersection of kernels defining $G^{t+1}$ within $\widetilde{G}^{t+1}$ .

Recall by induction that

\pi_{t}\circ\cdots\circ\pi_{1}(g_{t})=\widetilde{g}_{t}

and thus

\pi_{t+1}\circ\cdots\circ\pi_{1}(g_{t})=\varepsilon_{t+1}^{\ast}g_{t+1}^{\ast}\gamma_{t+1}^{\ast}.

Applying $\eta_{i}^{\prime}$ , we find that there exists a nonzero integer $T_{i}\leq(M_{t}/\delta_{t})^{O_{k}(d^{O_{k}(1)})}$ such that

(5.2)

\lVert T_{i}\cdot\eta_{i}^{\prime}(\pi_{t+1}\circ\cdots\circ\pi_{1}(g_{t}))\rVert_{C^{\infty}[N]}\leq(M_{t}/\delta_{t})^{O_{k}(d^{O_{k}(1)})}.

We now claim that $\eta_{i}^{\prime}(\pi_{t+1}\circ\cdots\circ\pi_{1}(\cdot))$ is a horizontal character on $K^{t}$ . It is a homomorphism since the $\pi_{i}$ are homomorphisms and it is well-defined by the above. In addition, we may inductively show that $\pi_{t+1}\circ\cdots\circ\pi_{1}(\Gamma\cap K^{t})=\widetilde{\Gamma}^{t+1}$ and $\pi_{t+1}\circ\cdots\circ\pi_{1}(\Gamma\cap K^{t+1})=\Gamma^{t+1}$ . Hence $\eta_{i}^{\prime}(\pi_{t+1}\circ\cdots\circ\pi_{1}(\Gamma\cap K^{t}))\leqslant\mathbb{Z}$ , which verifies the property of being a horizontal character. That the horizontal character has appropriately bounded height is an immediate consequence of induction and the fact that $|\xi^{\prime}|\leq(M_{t}/\delta_{t})^{O_{k}(d^{O_{k}(1)})}$ .

Now we use this data to construct the required factorization. By applying [42, Lemma A.1] with the horizontal characters $T_{i}\cdot\eta_{i}^{\prime}(\pi_{t+1}\circ\cdots\circ\pi_{1})$ defined on $K^{t}$ with the hypotheses (5.2), we may write

g_{t}=\varepsilon_{t+1}^{\prime}g_{t+1}^{\prime}\gamma_{t+1}^{\prime}

where:

•

$g_{t+1}^{\prime}$ takes values in $K^{t+1}$ ;
•

$\varepsilon_{t+1}^{\prime}$ and $\gamma_{t+1}^{\prime}$ take values in $K^{t}$ ;
•

$\gamma_{t+1}^{\prime}$ is an $(M_{t}/\delta_{t})^{O_{k}(d^{O_{k}(1)})}$ -rational polynomial sequence;
•

$\varepsilon_{t+1}^{\prime}$ is $((M_{t}/\delta_{t})^{O_{k}(d^{O_{k}(1)})},N)$ -smooth.

Let $Q^{\prime}$ denote the least common multiple of the periods of the $\ell$ different directions for $\gamma_{t+1}^{\prime}\Gamma$ ; note that such periods exist and we have $Q^{\prime}\leq(M_{t}/\delta_{t})^{O_{k}(d^{O_{k}(1)})}$ by [42, Lemma B.14]. Divide $v_{t}+Q_{t}\cdot[N_{t}]^{\ell}$ into boxes of common difference $Q_{t}Q^{\prime}$ . By Pigeonhole there exists $v^{\prime}$ such that

|\mathbb{E}_{\vec{n}\in v^{\prime}+Q^{\prime}Q_{t}\cdot[N_{t}/Q^{\prime}]^{\ell}}[\widetilde{F}_{\xi^{\prime},t}(\widetilde{g}_{t}(\vec{n})\Gamma^{t})]|\geq(\delta_{t}/M_{t})^{O_{k}(d^{O_{k}(1)})}.
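The pigeonhole step can be sketched as follows, under the simplifying assumption that $Q^{\prime}$ divides $N_{t}$ (otherwise there is a small boundary error that is absorbed into the implied constants). Here $a(\vec{n})$ is a hypothetical shorthand for the summand $\widetilde{F}_{\xi^{\prime},t}(\widetilde{g}_{t}(\vec{n})\Gamma^{t})$ , and $\vec{r},\vec{m}$ are auxiliary variables introduced only for this illustration.

```latex
% Split each coordinate of [N_t] into residue classes modulo Q': n = r + Q'm.
[N_{t}]=\bigsqcup_{r\in[Q^{\prime}]}\{r+Q^{\prime}m\colon 0\leq m<N_{t}/Q^{\prime}\}.

% The full average is then an average of averages over subprogressions of common difference Q_t Q'.
\mathbb{E}_{\vec{n}\in v_{t}+Q_{t}\cdot[N_{t}]^{\ell}}a(\vec{n})=\mathbb{E}_{\vec{r}\in[Q^{\prime}]^{\ell}}\mathbb{E}_{\vec{m}\in\{0,\ldots,N_{t}/Q^{\prime}-1\}^{\ell}}a(v_{t}+Q_{t}\vec{r}+Q_{t}Q^{\prime}\vec{m}).

% Hence some residue class \vec{r} has inner average at least as large in absolute value.
\max_{\vec{r}\in[Q^{\prime}]^{\ell}}\big|\mathbb{E}_{\vec{m}}a(v_{t}+Q_{t}\vec{r}+Q_{t}Q^{\prime}\vec{m})\big|\geq\big|\mathbb{E}_{\vec{n}\in v_{t}+Q_{t}\cdot[N_{t}]^{\ell}}a(\vec{n})\big|.
```

One may then take $v^{\prime}$ to be the base point of the selected class, up to the indexing shift between $\{0,\ldots,N_{t}/Q^{\prime}-1\}$ and $[N_{t}/Q^{\prime}]$ .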

Note that

\widetilde{g}_{t}=\pi_{t}\circ\cdots\circ\pi_{1}(g_{t})=\pi_{t}\circ\cdots\circ\pi_{1}(\varepsilon_{t+1}^{\prime})\cdot\pi_{t}\circ\cdots\circ\pi_{1}(g_{t+1}^{\prime})\cdot\pi_{t}\circ\cdots\circ\pi_{1}(\gamma_{t+1}^{\prime}).

Since the differences we are considering are divisible by $Q^{\prime}$ , there is $\gamma_{\mathrm{Rep}}$ such that

\gamma_{\mathrm{Rep}}^{-1}\gamma_{t+1}^{\prime}(v^{\prime}+Q^{\prime}Q_{t}\cdot\vec{n})\in\Gamma

for all $\vec{n}\in\mathbb{Z}^{\ell}$ , where $\gamma_{\mathrm{Rep}}\in K^{t}$ and $d_{G}(\gamma_{\mathrm{Rep}},\mathrm{id}_{G})\leq(M_{t}/\delta_{t})^{O_{k}(d^{O_{k}(1)})}$ . Since $\pi_{t}\circ\cdots\circ\pi_{1}(\Gamma\cap K^{t})\leqslant\Gamma^{t}$ we have that

|\mathbb{E}_{\vec{n}\in v^{\prime}+Q^{\prime}Q_{t}\cdot[N_{t}/Q^{\prime}]^{\ell}}[\widetilde{F}_{\xi^{\prime},t}(\pi_{t}\circ\cdots\circ\pi_{1}(\varepsilon_{t+1}^{\prime}\gamma_{\mathrm{Rep}})\cdot\pi_{t}\circ\cdots\circ\pi_{1}(\gamma_{\mathrm{Rep}}^{-1}g_{t+1}^{\prime}\gamma_{\mathrm{Rep}})\Gamma^{t})]|\geq(\delta_{t}/M_{t})^{O_{k}(d^{O_{k}(1)})}.

Step 4: Completing the induction. The first key polynomial sequence we shall define is

g_{t+1}=\gamma_{\mathrm{Rep}}^{-1}\cdot g_{t+1}^{\prime}\cdot\gamma_{\mathrm{Rep}}.

Note that $K^{t+1}$ is normal within $K^{t}$ and since $\gamma_{\mathrm{Rep}}\in K^{t}$ we have that $g_{t+1}$ takes on values in $K^{t+1}$ as desired. Further let $\varepsilon_{t+1}=\varepsilon_{t+1}^{\prime}\cdot\gamma_{\mathrm{Rep}}$ and $\gamma_{t+1}=\gamma_{\mathrm{Rep}}^{-1}\cdot\gamma_{t+1}^{\prime}$ ; these are trivially seen to lie in $K^{t}$ and have the necessary rationality and smoothness properties due to the above analysis.

We now break $[N_{t}/Q^{\prime}]^{\ell}$ into a collection of boxes of length $N_{t+1}\geq N_{t}/Q^{\prime}\cdot(M_{t}/\delta_{t})^{-O_{k}(d^{O_{k}(1)})}$ . There exists a box such that

|\mathbb{E}_{\vec{n}\in v^{\prime\prime}+Q^{\prime}Q_{t}\cdot[N_{t+1}]^{\ell}}[\widetilde{F}_{\xi^{\prime},t}(\pi_{t}\circ\cdots\circ\pi_{1}(\varepsilon_{t+1}^{\prime}\cdot\gamma_{\mathrm{Rep}})\cdot\pi_{t}\circ\cdots\circ\pi_{1}(g_{t+1})\Gamma^{t})]|\geq(\delta_{t}/M_{t})^{O_{k}(d^{O_{k}(1)})}.

Taking $N_{t+1}$ sufficiently small, we may replace the “smooth” polynomial sequence $\varepsilon_{t+1}^{\prime}\cdot\gamma_{\mathrm{Rep}}$ by a fixed element $\varepsilon^{\ast}\in K^{t}$ with $d_{G}(\varepsilon^{\ast},\mathrm{id}_{G})\leq(M_{t}/\delta_{t})^{O_{k}(d^{O_{k}(1)})}$ such that

|\mathbb{E}_{\vec{n}\in v^{\prime\prime}+Q^{\prime}Q_{t}\cdot[N_{t+1}]^{\ell}}[\widetilde{F}_{\xi^{\prime},t}(\pi_{t}\circ\cdots\circ\pi_{1}(\varepsilon^{\ast})\cdot\pi_{t}\circ\cdots\circ\pi_{1}(g_{t+1})\Gamma^{t})]|\geq(\delta_{t}/M_{t})^{O_{k}(d^{O_{k}(1)})}.

The new function $F_{t+1}$ is given by descending $g\mapsto\widetilde{F}_{\xi^{\prime},t}(\pi_{t}\circ\cdots\circ\pi_{1}(\varepsilon^{\ast})\cdot g\Gamma^{t})$ from $G^{t}$ to $\widetilde{G}^{t+1}$ (and later we may implicitly restrict to $G^{t+1}$ ). Explicitly, for $g\in G^{t}$ we have

\widetilde{F}_{\xi^{\prime},t}(\pi_{t}\circ\cdots\circ\pi_{1}(\varepsilon^{\ast})g\Gamma^{t})=F_{t+1}(\pi_{t+1}(g)\widetilde{\Gamma}^{t+1})

which is possible because $\widetilde{F}_{\xi^{\prime},t}$ has vertical frequency $\xi^{\prime}$ . Therefore we have

(5.3)

|\mathbb{E}_{\vec{n}\in v^{\prime\prime}+Q^{\prime}Q_{t}\cdot[N_{t+1}]^{\ell}}[F_{t+1}(\pi_{t+1}\circ\pi_{t}\circ\cdots\circ\pi_{1}(g_{t+1}(\vec{n}))\widetilde{\Gamma}^{t+1})]|\geq(\delta_{t}/M_{t})^{O_{k}(d^{O_{k}(1)})}.

We let

\widetilde{g}_{t+1}:=\pi_{t+1}\circ\pi_{t}\circ\cdots\circ\pi_{1}(g_{t+1})

and we may replace $\widetilde{\Gamma}^{t+1}$ with $\Gamma^{t+1}=\widetilde{\Gamma}^{t+1}\cap G^{t+1}$ in (5.3).

We now check that $\operatorname{ker}(\xi^{\prime})\cap J^{t}=\mathrm{id}_{G^{t}}$ , which is one of the invariants we are maintaining (we take $\xi_{t}=\xi^{\prime}$ ). We will have to distinguish between cases:

•

If $\xi^{\prime}$ is nonzero and $J^{t}\cap G^{t}_{(s-t)}=J^{t}$ note that $\operatorname{ker}(\xi^{\prime})\cap J^{t}=\mathrm{Id}_{G^{t}}$ . This is due to the fact that $\xi^{\prime}$ restricted to $J^{t}$ is (the descended version of) $\xi$ which is nonzero as given.
•

If $\xi^{\prime}$ is nonzero and $J^{t}\cap G^{t}_{(s-t)}=\mathrm{Id}_{G^{t}}$ then note that $\operatorname{ker}(\xi^{\prime})\cap J^{t}\leqslant J^{t}\cap G^{t}_{(s-t)}=\mathrm{Id}_{G^{t}}$ .
•

If $\xi^{\prime}=0$ then note that as $\xi$ (appropriately descended) was nonzero we have that $J^{t}\cap G^{t}_{(s-t)}=\mathrm{Id}_{G^{t}}$ is forced in this case. The result then follows as in the previous case.

Now, if $G^{t+1}\cap\pi_{t+1}\circ\cdots\circ\pi_{1}(T)=\pi_{t+1}\circ\cdots\circ\pi_{1}(T)$ then we continue with the iteration and do not terminate. If we have reached termination, we therefore have that $G^{t+1}\cap\pi_{t+1}\circ\cdots\circ\pi_{1}(T)=\mathrm{Id}_{G^{t+1}}$ . We claim that this implies that $K^{t+1}\cap T=\mathrm{Id}_{G}$ (and therefore we may take the output group to be $H=K^{t+1}$ ). For the sake of contradiction, suppose instead that $T\leqslant K^{t+1}$ (this is the only alternative since $T$ is $1$ -dimensional). Applying $\pi_{t+1}\circ\cdots\circ\pi_{1}$ we have that

\pi_{t+1}\circ\cdots\circ\pi_{1}(T)\leqslant\pi_{t+1}\circ\cdots\circ\pi_{1}(K^{t+1})=G^{t+1}

which contradicts the termination condition.

Finally, note that if $G^{t+1}\cap\pi_{t+1}\circ\cdots\circ\pi_{1}(T)=\pi_{t+1}\circ\cdots\circ\pi_{1}(T)$ then $F_{t+1}$ , when viewed as a function on $G^{t+1}/\Gamma^{t+1}$ , is seen to have a nonzero $\pi_{t+1}\circ\cdots\circ\pi_{1}(T)$ -vertical character (which is given by descending $\xi$ on $G$ through $\pi_{t+1}\circ\cdots\circ\pi_{1}$ in the obvious manner), so one can continue the iteration in this case.

Step 5: Fixing the value at $0$ . To see that this completes the proof, if the iteration terminates at some stage $t$ then note that

g=\varepsilon_{1}\cdots\varepsilon_{t}\cdot g_{t}\cdot\gamma_{t}\cdots\gamma_{1}.

Since products of smooth sequences are appropriately smooth, and analogously for rational sequences, we deduce the necessary outputs. However, we have not guaranteed that the factors take the value $\mathrm{id}_{G}$ at $0$ . For this, let $g_{t}(0)=\{g_{t}(0)\}[g_{t}(0)]$ with $[g_{t}(0)]\in K^{t}\cap\Gamma$ and $d_{G}(\{g_{t}(0)\},\mathrm{id}_{G})\leq(M/\delta)^{O_{k}(d^{O_{k}(1)})}$ . We then have that

g=\varepsilon_{1}\cdots\varepsilon_{t}\cdot\{g_{t}(0)\}\cdot(\{g_{t}(0)\}^{-1}g_{t}[g_{t}(0)]^{-1})\cdot[g_{t}(0)]\cdot\gamma_{t}\cdots\gamma_{1}.

As $g(0)=\mathrm{id}_{G}$ , we have that $\tau=[g_{t}(0)]\cdot\gamma_{t}(0)\cdots\gamma_{1}(0)$ satisfies $d_{G}(\tau,\mathrm{id}_{G})\leq(M/\delta)^{O_{k}(d^{O_{k}(1)})}$ and $\tau$ is $(M/\delta)^{O_{k}(d^{O_{k}(1)})}$ -rational. Thus

g=\varepsilon_{1}\cdots\varepsilon_{t}\cdot\{g_{t}(0)\}\tau\cdot(\tau^{-1}\{g_{t}(0)\}^{-1}g_{t}[g_{t}(0)]^{-1}\tau)\cdot\tau^{-1}[g_{t}(0)]\cdot\gamma_{t}\cdots\gamma_{1}

and note that $(\tau^{-1}\{g_{t}(0)\}^{-1}g_{t}[g_{t}(0)]^{-1}\tau)$ takes values in the conjugated subgroup $\tau^{-1}K^{t}\tau$ which is $(M/\delta)^{O_{k}(d^{O_{k}(1)})}$ -rational by [42, Lemma B.15]. Note however that despite modifying the output group $H$ via conjugation, since $T$ is central we have $\tau^{-1}K^{t}\tau\cap T=\tau^{-1}K^{t}\tau\cap\tau^{-1}T\tau=\mathrm{Id}_{G}$ as desired. ∎

We now remove the assumption of a $1$ -dimensional vertical torus via a reduction to this case.

Corollary 5.5.

Suppose that $T\leqslant Z(G)$ is a subgroup of the center which is $M$ -rational. Further suppose that $F$ has a nonzero $T$ -vertical character $\xi$ with $|\xi|\leq M/\delta$ , $\lVert F\rVert_{\mathrm{Lip}}\leq M$ , $N\geq(M/\delta)^{\Omega_{k,\ell}(d^{\Omega_{k,\ell}(1)})}$ , and $g$ is a polynomial sequence with respect to the degree $k$ filtration. Then if

\big{|}\mathbb{E}_{\vec{n}\in[N]^{\ell}}F(g(\vec{n})\Gamma)\big{|}\geq\delta

there exists a factorization

g=\varepsilon g^{\prime}\gamma

such that:

•

$g^{\prime}$ lives in an $(M/\delta)^{O_{k,\ell}(d^{O_{k,\ell}(1)})}$ -rational subgroup $H$ such that $\xi(H\cap T)=0$ ;
•

$\gamma$ is an $(M/\delta)^{O_{k,\ell}(d^{O_{k,\ell}(1)})}$ -rational polynomial sequence;
•

$\varepsilon$ is an $((M/\delta)^{O_{k,\ell}(d^{O_{k,\ell}(1)})},N)$ -smooth polynomial sequence.

Furthermore if $g(0)=\mathrm{id}_{G}$ then we may take $\varepsilon(0)=g^{\prime}(0)=\gamma(0)=\mathrm{id}_{G}$ .

Proof.

We first reduce to the case where $g(0)=\mathrm{id}_{G}$ as is standard. We factor $g(0)=\{g(0)\}[g(0)]$ such that $[g(0)]\in\Gamma$ and $\psi_{G}(\{g(0)\})\in[0,1)^{\dim(G)}$ . Replacing $F$ by $F(\{g(0)\}\cdot)$ and $g$ by $\{g(0)\}^{-1}g[g(0)]^{-1}$ we may clearly reduce to the case where $g(0)=\mathrm{id}_{G}$ at the cost of replacing $M$ by $M^{O_{k}(d^{O_{k}(1)})}$ which leaves the conclusion unchanged.

Using Lemma 3.10 to bound the complexity of $G/\operatorname{ker}(\xi)$ and noting that $F$ descends to an $(M/\delta)^{O_{k}(d^{O_{k}(1)})}$ -Lipschitz function on $G/\operatorname{ker}(\xi)$ , by Theorem 5.4 we have that

(g~{}\mathrm{mod}~{}\operatorname{ker}(\xi))=\varepsilon g^{\prime}\gamma

where $\varepsilon,g^{\prime},\gamma$ satisfy:

•

$\varepsilon(0)=g^{\prime}(0)=\gamma(0)=\mathrm{id}_{G/\operatorname{ker}(\xi)}$ ;
•

$g^{\prime}$ lives in an $(M/\delta)^{O_{k,\ell}(d^{O_{k,\ell}(1)})}$ -rational subgroup $H$ such that $H\cap(T/\operatorname{ker}(\xi))=\mathrm{id}_{G/\operatorname{ker}(\xi)}$ ;
•

$\gamma$ is an $(M/\delta)^{O_{k,\ell}(d^{O_{k,\ell}(1)})}$ -rational polynomial sequence;
•

$\varepsilon$ is an $((M/\delta)^{O_{k,\ell}(d^{O_{k,\ell}(1)})},N)$ -smooth polynomial sequence.

We now “lift” this factorization. Consider the Mal’cev basis $\mathcal{X}^{\prime}$ for $G/\operatorname{ker}(\xi)$ . For each element $X_{i}^{\prime}\in\mathcal{X}^{\prime}$ we may lift to $Z_{i}\in\log G$ such that:

•

$\exp(X_{i}^{\prime})=\exp(Z_{i})~{}\mathrm{mod}~{}\operatorname{ker}(\xi)$ ;
•

$d_{G}(\exp(Z_{i}),\mathrm{id}_{G})\leq(M/\delta)^{O_{k}(d^{O_{k}(1)})}$ ;
•

$Z_{i}$ is an $(M/\delta)^{O_{k}(d^{O_{k}(1)})}$ -rational combination of the elements of $\mathcal{X}$ .

Writing $\varepsilon$ as

\varepsilon(\vec{n})=\exp\bigg{(}\sum_{|\vec{i}|\leq k}\mathfrak{\varepsilon}_{\vec{i}}\binom{\vec{n}}{\vec{i}}\bigg{)}

where $\mathfrak{\varepsilon}_{\vec{i}}\in\log(G_{|\vec{i}|}/(\operatorname{ker}(\xi)\cap G_{|\vec{i}|}))$ , we lift via the above mapping on $\mathcal{X}^{\prime}$ to

\widetilde{\varepsilon}(\vec{n})=\exp\bigg{(}\sum_{|\vec{i}|\leq k}\widetilde{\mathfrak{\varepsilon}}_{\vec{i}}\binom{\vec{n}}{\vec{i}}\bigg{)}

where $\widetilde{\mathfrak{\varepsilon}}_{\vec{i}}\in\log(G_{|\vec{i}|})$ and analogously for $g^{\prime},\gamma$ .

We easily see that $\widetilde{\varepsilon}$ is an $((M/\delta)^{O_{k,\ell}(d^{O_{k,\ell}(1)})},N)$ -smooth polynomial sequence, that $\widetilde{\gamma}$ is an $(M/\delta)^{O_{k,\ell}(d^{O_{k,\ell}(1)})}$ -rational polynomial sequence, and that $\widetilde{g}^{\prime}$ takes values in the subgroup $H^{\prime}=\exp(\log(H)+\log(\operatorname{ker}(\xi)))$ . Furthermore $H^{\prime}$ is seen to be $(M/\delta)^{O_{k,\ell}(d^{O_{k,\ell}(1)})}$ -rational and $\xi(H^{\prime}\cap T)=0$ . Finally note that $\widetilde{\varepsilon}~{}\mathrm{mod}~{}\operatorname{ker}(\xi)=\varepsilon$ and analogously for $\widetilde{g}^{\prime},\widetilde{\gamma}$ . Therefore

g\cdot(\widetilde{\varepsilon}\widetilde{g}^{\prime}\widetilde{\gamma})^{-1}\equiv\mathrm{id}_{G}~{}\mathrm{mod}~{}\operatorname{ker}(\xi)

as polynomial sequences. Thus

g=g\cdot(\widetilde{\varepsilon}\widetilde{g}^{\prime}\widetilde{\gamma})^{-1}\cdot(\widetilde{\varepsilon}\widetilde{g}^{\prime}\cdot\widetilde{\gamma})=\widetilde{\varepsilon}\cdot((g\cdot(\widetilde{\varepsilon}\widetilde{g}^{\prime}\widetilde{\gamma})^{-1})\cdot\widetilde{g}^{\prime})\cdot\widetilde{\gamma}

gives the desired factorization noting that $\operatorname{ker}(\xi)\leqslant H^{\prime}$ and $\operatorname{ker}(\xi)$ is central and therefore $g\cdot(\widetilde{\varepsilon}\widetilde{g}^{\prime}\widetilde{\gamma})^{-1}$ may be commuted to the right. ∎

6. Setup for Sunflower and Linearization Iteration

We now set up the iteration which will take up the bulk of the following four sections. The idea is to inductively assume the statement of Theorem 1.2 for $s-1$ (i.e., the quantitative inverse theorem for the $U^{s}[N]$ -norm) and the remaining goal is to prove it for $s$ . The key step is to show that for many $h\in[N]$ , $\Delta_{h}f$ correlates with a multidegree $(1,s-1)$ nilcharacter; this is a quantitative version of [34, Theorem 7.1]. For the remainder of the analysis until Section 12 we will be concerned with the notion of a correlation structure, which can be thought of as refining the notion in Definition 3.14 with intermediate bracket information.

Definition 6.1.

A correlation structure associated to the function $f\colon[N]\to\mathbb{C}$ with parameters $\rho$ , $M$ , $d$ , and $D$ and degree-rank $(s-1,r^{\ast})$ is the following data:

•

A subset $H\subseteq[N]$ such that $|H|\geq\rho N$ ;
•

A multidegree $(1,s-1)$ nilcharacter $\chi(h,n)$ that lives on a nilmanifold $G^{\ast}/\Gamma^{\ast}$ where $\chi$ has a $G^{\ast}_{(1,s-1)}$ -vertical frequency $\eta^{\ast}$ . Furthermore $G^{\ast}/\Gamma^{\ast}$ has dimension bounded by $d$ and complexity bounded by $M$ , the function $F^{\ast}$ underlying $\chi$ is $M$ -Lipschitz, $\eta^{\ast}$ has height bounded by $M$ , and the output dimension of $\chi$ is bounded by $D$ . We let $g(h,n)$ denote the underlying polynomial sequence of $\chi$ ;
•

A collection of degree-rank $(s-1,r^{\ast})$ nilcharacters $\chi_{h}(n)$ which live on $G/\Gamma$ where every $\chi_{h}$ has the same $G_{(s-1,r^{\ast})}$ -vertical frequency $\eta$ . Furthermore $G/\Gamma$ has dimension bounded by $d$ and complexity bounded by $M$ (with Mal’cev basis $\mathcal{X}$ ), the function underlying $\chi_{h}$ is $M$ -Lipschitz, $\eta$ has height bounded by $M$ , and $\chi_{h}$ has output dimension bounded by $D$ . We let $g_{h}(n)$ denote the polynomial sequence underlying $\chi_{h}$ . Finally, the function underlying $\chi_{h}$ , which we will denote $F$ , is independent of $h$ ;
•

The polynomial sequences satisfy $g_{h}(0)=\mathrm{id}_{G}$ ;
•

For all $h\in H$ we have

$\Delta_{h}f(n)\otimes\overline{\chi(h,n)}\otimes\overline{\chi_{h}(n)}\in\operatorname{Corr}(s-2,\rho,M,d).$

If the input function $f$ we are considering for the proof of Theorem 1.2 satisfies

\lVert f\rVert_{U^{s+1}[N]}\geq\delta,

then our proof will always maintain bounds of the form

\rho^{-1},M,D\leq\exp(\log(1/\delta)^{O_{s}(1)})\text{ and }d\leq\log(1/\delta)^{O_{s}(1)}

on intermediate correlation structures, although the precise dependence may decay over roughly $s$ stages (wherein we reduce $r^{\ast}$ from $s-1$ to $0$ ).

To get started, we first note that given a function $f$ with large $U^{s+1}$ -norm we may associate to it a correlation structure of degree-rank $(s-1,s-1)$ ; this is little more than chasing definitions and applying induction.

Lemma 6.2.

Fix $\delta\in(0,1/2)$ and $s\geq 2$ . Assume Theorem 1.2 for $s-1$ . Let $f\colon[N]\to\mathbb{C}$ be a $1$ -bounded function such that

\lVert f\rVert_{U^{s+1}[N]}\geq\delta.

Then there exists a degree-rank $(s-1,s-1)$ correlation structure associated to $f$ with parameters $\rho$ , $M$ , $d$ , and $D$ such that

\rho^{-1},M,D\leq\exp(\log(1/\delta)^{O_{s}(1)})\text{ and }d\leq\log(1/\delta)^{O_{s}(1)}.

Proof.

Note that $\lVert f\rVert_{U^{s+1}[N]}\geq\delta$ implies that

\mathbb{E}_{h\in[N]}\lVert\Delta_{h}f\rVert_{U^{s}[N]}^{2^{s}}\geq\delta^{O_{s}(1)};

this implicitly uses that $\lVert\Delta_{h}f\rVert_{U^{s}[N]}=\lVert\Delta_{-h}f\rVert_{U^{s}[N]}$ and that $\Delta_{h}f$ is identically zero for $|h|>N$ .

Therefore there exists $H\subseteq[N]$ with $|H|\geq\delta^{O_{s}(1)}N$ such that

\lVert\Delta_{h}f\rVert_{U^{s}[N]}^{2^{s}}\geq\delta^{O_{s}(1)}

for $h\in H$ .
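The passage from the averaged bound to the set $H$ is a standard popularity argument. A minimal sketch follows; the shorthand $X_{h}$ and $\eta$ is ours and is introduced only for this illustration.

```latex
% Illustrative shorthand (not notation from the paper):
%   X_h  := \lVert \Delta_h f \rVert_{U^s[N]}^{2^s}, so that 0 \le X_h \le 1 because \Delta_h f is 1-bounded;
%   \eta := the lower bound \delta^{O_s(1)} on \mathbb{E}_{h\in[N]} X_h.
% Put H := \{ h \in [N] : X_h \ge \eta/2 \} and split the average over H and its complement:
\eta \;\le\; \mathbb{E}_{h\in[N]} X_h
     \;\le\; \frac{|H|}{N}\cdot 1 + \frac{\eta}{2},
\qquad\text{hence}\qquad
|H| \;\ge\; \frac{\eta}{2}\,N \;=\; \delta^{O_s(1)} N .
```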

By the inductive hypothesis (Theorem 1.2 for $s-1$ ), for every $h\in H$ there exist a nilmanifold $G_{h}/\Gamma_{h}$ with a degree $s-1$ filtration, an $M$ -Lipschitz function $F_{h}$ on $G_{h}/\Gamma_{h}$ , and an associated polynomial sequence $g_{h}(\cdot)$ such that

\big{|}\mathbb{E}_{n\in[N]}[\Delta_{h}f(n)\overline{F_{h}(g_{h}(n)\Gamma)}]\big{|}\geq\rho

where $G_{h}/\Gamma_{h}$ has complexity bounded by $M$ and dimension bounded by $d$ . We may take

M,\rho^{-1}\leq\exp(\log(1/\delta)^{O_{s}(1)})\text{ and }d\leq\log(1/\delta)^{O_{s}(1)}.

Note that via writing $g_{h}(0)=\{g_{h}(0)\}[g_{h}(0)]$ where $\psi_{G_{h}}(\{g_{h}(0)\})\in[0,1)^{\dim(G_{h})}$ and $[g_{h}(0)]\in\Gamma_{h}$ , we have that

\begin{aligned}F_{h}(g_{h}(n)\Gamma)&=F_{h}(\{g_{h}(0)\}\{g_{h}(0)\}^{-1}g_{h}(n)[g_{h}(0)]^{-1}\cdot[g_{h}(0)]\Gamma)\\ &=F_{h}(\{g_{h}(0)\}\{g_{h}(0)\}^{-1}g_{h}(n)[g_{h}(0)]^{-1}\Gamma).\end{aligned}

Note that $g_{h}^{\prime}(n)=\{g_{h}(0)\}^{-1}g_{h}(n)[g_{h}(0)]^{-1}$ has $g_{h}^{\prime}(0)=\mathrm{id}_{G_{h}}$ and $F_{h}^{\prime}=F_{h}(\{g_{h}(0)\}\cdot)$ is appropriately Lipschitz (as $\{g_{h}(0)\}$ has appropriately bounded coordinates by [42, Lemma B.2]). Therefore without loss we may assume that $g_{h}(0)=\mathrm{id}_{G}$ for all $h\in H$ .

Next note that there are only $O_{s}(M)^{O_{s}(d^{O(1)})}$ nilmanifolds of dimension at most $d$ with degree $(s-1)$ filtration of complexity bounded by $M$ (up to isomorphism). This follows from Lie’s third theorem on the correspondence between Lie algebras and connected, simply connected Lie groups and counting the total possible number of different structure constants and filtration choices for the Lie algebra. Therefore by Pigeonhole we may assume, at the cost of decreasing the size of the set $H$ by a multiplicative factor of $O_{s}(M)^{-O_{s}(d^{O(1)})}$ , that $G_{h}/\Gamma_{h}=G/\Gamma$ (and the corresponding filtration) is independent of $h\in H$ .

We next remove the dependence on $h$ for the function $F_{h}$ . Let $\gamma$ be a parameter to be chosen later; by applying Lemma B.3 we may write

F_{h}(g\Gamma)=\sum_{j\in I}\tau_{j}(g\Gamma)^{2}\cdot F_{h}(g\Gamma)

where $|I|\leq(1/\gamma)^{O_{s}(d^{O_{s}(1)})}$ , every $g\Gamma$ lies in the support of at most $2^{O_{s}(d)}$ of the functions $\tau_{j}$ , and the $\tau_{j}$ are $(M/\gamma)^{O_{s}(d^{O_{s}(1)})}$ -Lipschitz. Furthermore each $\tau_{j}$ is supported on a width $2\gamma$ cube near the origin (in Mal’cev coordinates); see the third item of Lemma B.3 for a precise description. Since each $F_{h}$ is an $M$ -Lipschitz function, choosing $\gamma$ to be sufficiently small with respect to $(\rho/M)^{O_{s}(d^{O_{s}(1)})}$ , we find that

\sup_{g\in G}|F_{h}(g\Gamma)-\sum_{j\in I}a_{j}\tau_{j}(g\Gamma)^{2}|\leq\rho/2

by taking $a_{j}$ to be the mean of $F_{h}$ on the support of $\tau_{j}$ . Note that $|a_{j}|\leq M$ . Pigeonholing over $j\in I$ and decreasing $\rho$ and the size of $H$ by appropriate factors of $O_{s}(M)^{-O_{s}(d^{O_{s}(1)})}$ , we may assume that $F_{h}=F$ for all $h\in H$ .

We finally want to replace $F$ by a nilcharacter with a vertical frequency and the claimed output dimension bound. We first give $G$ a degree-rank $(s-1,s-1)$ filtration induced by its degree $s-1$ filtration. This is done via [34, Example 6.11] (i.e., $G_{(d,r)}$ is generated by iterated commutators which either have filtration depths adding to greater than $d$ or adding to exactly $d$ with at least $r$ participating elements). Lemma 2.1 guarantees each subgroup is $M^{O_{s}(d^{O_{s}(1)})}$ -rational. Via [42, Lemma B.11], we may give $G$ a Mal’cev basis adapted to this degree-rank $(s-1,s-1)$ filtration with complexity $M^{O_{s}(d^{O_{s}(1)})}$ .

Via Fourier expansion (see [42, Lemma A.6]) and the triangle inequality we may additionally assume that $F$ has a vertical $G_{(s-1,s-1)}$ -frequency $\eta$ with height at most $O_{s}(M/\rho)^{O_{s}(d^{O_{s}(1)})}=\exp(\log(1/\delta)^{O_{s}(1)})$ . (Here we apply [42, Lemma A.6] to the degree filtration $G_{(0,0)}=G_{(1,0)}\geqslant G_{(2,0)}\geqslant\cdots\geqslant G_{(s-1,0)}\geqslant G_{(s-1,s-1)}\geqslant\mathrm{Id}_{G}$ .) Given $F$ , there exists a nilcharacter $F_{\eta}$ by Lemma B.4 with vertical frequency $\eta$ , output dimension bounded by $2^{O_{s}(d)}$ , and such that each coordinate is $O_{s}(M)^{O_{s}(d^{O_{s}(1)})}$ -Lipschitz. The function $(F/(2\lVert F\rVert_{\infty}),F_{\eta}\cdot\sqrt{1-|F/(2\lVert F\rVert_{\infty})|^{2}})$ demonstrates that without loss of generality, we may assume $F$ is a coordinate of a nilcharacter.

To complete the deduction, we take $G^{\ast}/\Gamma^{\ast}$ to be the trivial nilmanifold and $g(h,n)$ to be a constant sequence. ∎

The heart of this paper is the following quantification of [34, Theorem 7.2], the proof of which is the goal of the next few sections culminating in Section 11.2.

Lemma 6.3.

Fix $s\geq 2$ and $1\leq r^{\ast}\leq s-1$ . Suppose $f\colon[N]\to\mathbb{C}$ is a $1$ -bounded function and $N\geq\exp(\Omega_{s}((d\log(MD/\rho))^{\Omega_{s}(1)}))$ .

Furthermore suppose that there exists a degree-rank $(s-1,r^{\ast})$ correlation structure associated to $f$ with parameters $\rho$ , $M$ , $d$ , and $D$ . Then there exists a degree-rank $(s-1,r^{\ast}-1)$ correlation structure associated to $f$ with parameters $\rho^{\prime}$ , $M^{\prime}$ , $d^{\prime}$ , and $D^{\prime}$ such that

\rho^{\prime-1},M^{\prime},D^{\prime}\leq\exp(O_{s}((d\log(MD/\rho))^{O_{s}(1)}))\text{ and }d^{\prime}\leq O_{s}((d\log(MD/\rho))^{O_{s}(1)}).

Combining Lemma 6.3 along with the observation that degree-rank $(s-1,0)$ nilmanifolds induce a degree $(s-2)$ filtration (coming from the groups $G_{(i,0)}$ ), we immediately obtain the following. In particular, these can now be “hidden” inside the nilmanifolds implicit in $\operatorname{Corr}(\cdot,\cdot,\cdot,\cdot)$ .

Theorem 6.4.

Fix $\delta\in(0,1/2)$ and $s\geq 2$ . Assume Theorem 1.2 for $s-1$ . Let $f\colon[N]\to\mathbb{C}$ be a $1$ -bounded function such that

\lVert f\rVert_{U^{s+1}[N]}\geq\delta.

Then the following data exists:

•

A subset $H\subseteq[N]$ of size at least $\rho N$ ;
•

A multidegree $(1,s-1)$ nilcharacter $\chi(h,n)$ which lives on a nilmanifold $G^{\ast}/\Gamma^{\ast}$ where $\chi$ has a $G^{\ast}_{(1,s-1)}$ -vertical frequency $\eta^{\ast}$ . Furthermore $G^{\ast}/\Gamma^{\ast}$ has dimension bounded by $d$ and complexity bounded by $M$ , the function underlying $\chi$ is $M$ -Lipschitz, $\eta^{\ast}$ has height bounded by $M$ , and the output dimension of $\chi$ is bounded by $D$ ;
•

For all $h\in H$ we have that

$\Delta_{h}f(n)\otimes\overline{\chi(h,n)}\in\operatorname{Corr}(s-2,\rho,M,d).$

Furthermore, we can find such data satisfying

\rho^{-1},M,D\leq\exp(\log(1/\delta)^{O_{s}(1)})\text{ and }d\leq\log(1/\delta)^{O_{s}(1)}.
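The bookkeeping behind these final bounds can be sketched as follows, assuming (as the statements of Lemma 6.2 and Lemma 6.3 suggest) that Lemma 6.3 is applied once for each $r^{\ast}=s-1,\ldots,1$ , starting from the output of Lemma 6.2. The quantity $L_{j}$ and the constant $A$ are our own shorthand, and the normalizations $L_{0}\geq 1$ , $d_{j}\geq 1$ , $\log(M_{j}D_{j}/\rho_{j})\geq 1$ are assumptions made only to keep the arithmetic clean.

```latex
% Shorthand (ours): L_j := d_j \log(M_j D_j / \rho_j) for the correlation structure
% obtained after j applications of Lemma 6.3; A = A_s \ge 1 is a constant depending only on s.
% Lemma 6.2 gives L_0 \le \log(1/\delta)^{O_s(1)}.
% Lemma 6.3 bounds both d_{j+1} and \log(M_{j+1} D_{j+1} / \rho_{j+1}) by O_s(L_j^{O_s(1)}), so
L_{j+1} \;\le\; A\,L_j^{A}
\quad\Longrightarrow\quad
L_j \;\le\; (A L_0)^{(2A)^{j}}
\qquad\text{(induction on $j$, using $1 + A(2A)^{j} \le (2A)^{j+1}$ and $A \le A L_0$).}
% After the s-1 steps that bring r^* from s-1 down to 0:
L_{s-1} \;\le\; (A L_0)^{(2A)^{s-1}} \;=\; O_s\big(\log(1/\delta)^{O_s(1)}\big),
% and d_{s-1} \le L_{s-1}, \ \log(M_{s-1} D_{s-1} / \rho_{s-1}) \le L_{s-1} give the stated quasipolynomial shape.
```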

Remark.

The case when $N$ is small (i.e., $N\leq\exp(\log(1/\delta)^{O_{s}(1)})$ ) is handled via noting that $\lVert\Delta_{h}f\rVert_{L^{2}[N]}\geq\exp(-\log(1/\delta)^{O_{s}(1)})\cdot N^{-O(1)}$ for many $h$ and then applying Fourier analysis. Such an analysis always loses factors of $N$ and thus is only useful in this crude edge case. We will not comment further on such issues.

7. On a Cauchy–Schwarz Argument of Gowers

The proof of Lemma 6.3 is performed in a sequence of stages. We first deduce that the functions correlating with $\Delta_{h}f$ are not arbitrary. Indeed for many additive quadruples $(h_{1},h_{2},h_{3},h_{4})$ we have that the associated tensor product of $\chi_{h}(n)$ exhibits correlation with a degree $(s-2)$ nilsequence.

We first need the following elementary Fourier-analytic lemma which converts correlation on long progressions to correlation with a major-arc Fourier phase; this is essentially [32, Lemma 3.5(ii)].

Lemma 7.1.

Let $\delta\in(0,1/2)$ . Suppose that $g\colon[N]\to\mathbb{C}$ is $1$ -bounded and there exists an arithmetic progression $P$ of length $\delta N$ with common difference $q$ within $[N]$ such that

\big{|}\mathbb{E}_{n\in P}g(n)\big{|}\geq\delta.

Then there exists $\Theta\in\mathbb{R}$ such that $\lVert q\Theta\rVert_{\mathbb{R}/\mathbb{Z}}\leq\delta^{-O(1)}N^{-1}$ and

\big{|}\mathbb{E}_{n\in[N]}e(\Theta n)g(n)\big{|}\geq\delta^{O(1)}.

Proof.

Extend $g$ to be zero beyond the interval $[N]$ . Let $P^{\prime}$ be the arithmetic progression of length $\delta^{2}N$ with common difference $q$ centered at $0$ . We have

\bigg{|}\sum_{n\in\mathbb{Z}}(\mathbbm{1}_{P}\ast(|P^{\prime}|^{-1}\mathbbm{1}_{P^{\prime}}))(n)g(n)\bigg{|}\geq\delta^{O(1)}N.

Via Fourier inversion, we have

\bigg{|}\int_{\Theta\in\mathbb{T}}\widehat{g}(\Theta)\overline{\widehat{\mathbbm{1}_{P}}(\Theta)\widehat{\mathbbm{1}_{P^{\prime}}}(\Theta)}d\Theta\bigg{|}\geq\delta^{O(1)}N^{2}.

Now via standard bounds on linear exponential sums, we have

|\widehat{\mathbbm{1}_{P}}(\Theta)|,|\widehat{\mathbbm{1}_{P^{\prime}}}(\Theta)|\lesssim\min(\lVert q\Theta\rVert_{\mathbb{R}/\mathbb{Z}}^{-1},N).

Since $|\widehat{g}(\Theta)|\leq N$ , we have that

\bigg{|}\int_{\lVert q\Theta\rVert_{\mathbb{R}/\mathbb{Z}}\geq T/N}\widehat{g}(\Theta)\widehat{\mathbbm{1}_{P}}(\Theta)\widehat{\mathbbm{1}_{P^{\prime}}}(\Theta)d\Theta\bigg{|}\lesssim N^{2}/T.

Therefore, taking $T=\delta^{-O(1)}$ sufficiently large we have that

N^{2}\int_{\lVert q\Theta\rVert_{\mathbb{R}/\mathbb{Z}}\leq T/N}|\widehat{g}(\Theta)|d\Theta\geq\bigg{|}\int_{\lVert q\Theta\rVert_{\mathbb{R}/\mathbb{Z}}\leq T/N}\widehat{g}(\Theta)\widehat{\mathbbm{1}_{P}}(\Theta)\widehat{\mathbbm{1}_{P^{\prime}}}(\Theta)d\Theta\bigg{|}\geq\delta^{O(1)}N^{2}.

Thus

\sup_{\lVert q\Theta\rVert_{\mathbb{R}/\mathbb{Z}}\leq T/N}|\widehat{g}(\Theta)|\geq\delta^{O(1)}T^{-1}N,

which is exactly the desired conclusion (recalling that $T=\delta^{-O(1)}$ ). ∎
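For completeness, the tail estimate used in the proof above can be written out as follows. This is a sketch: $C$ denotes the implied constant in the exponential sum bound (our notation), and we assume $T/N\leq 1/2$ , since otherwise the region of integration is empty.

```latex
% q is a nonzero integer, so \Theta \mapsto q\Theta preserves Lebesgue measure on \mathbb{T}.
% Inputs: |\widehat{g}(\Theta)| \le N and
%         |\widehat{\mathbbm{1}_P}(\Theta)|, |\widehat{\mathbbm{1}_{P'}}(\Theta)| \le C \lVert q\Theta \rVert_{\mathbb{R}/\mathbb{Z}}^{-1}.
\int_{\lVert q\Theta\rVert_{\mathbb{R}/\mathbb{Z}} \ge T/N}
   |\widehat{g}(\Theta)|\,|\widehat{\mathbbm{1}_{P}}(\Theta)|\,|\widehat{\mathbbm{1}_{P'}}(\Theta)|\,d\Theta
\;\le\; C^{2} N \int_{\lVert q\Theta\rVert_{\mathbb{R}/\mathbb{Z}} \ge T/N} \lVert q\Theta\rVert_{\mathbb{R}/\mathbb{Z}}^{-2}\,d\Theta
\;=\; C^{2} N \int_{\lVert x\rVert_{\mathbb{R}/\mathbb{Z}} \ge T/N} \lVert x\rVert_{\mathbb{R}/\mathbb{Z}}^{-2}\,dx
\;=\; 2 C^{2} N \int_{T/N}^{1/2} x^{-2}\,dx
\;\le\; \frac{2 C^{2} N^{2}}{T}.
```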

The following lemma is due ultimately to Gowers but essentially appears as [32, Proposition 6.1]. We include the proof for the sake of completeness.

Lemma 7.2.

Suppose $\delta\in(0,1/2)$ , $f_{1},f_{2}\colon[N]\to\mathbb{C}$ are $1$ -bounded, and $\chi_{h}\colon\mathbb{Z}\to\mathbb{C}$ are all $1$ -bounded. Suppose that

\mathbb{E}_{h\in[N]}|\mathbb{E}_{n\in[N]}f_{2}(n)\Delta_{h}f_{1}(n)\overline{\chi_{h}(n)}|\geq\delta.

Then there exists $\Theta$ such that $\lVert\Theta\rVert_{\mathbb{R}/\mathbb{Z}}\leq\delta^{-O(1)}/N$ and

\mathbb{E}_{\begin{subarray}{c}h_{1}+h_{2}=h_{3}+h_{4}\\ h_{i}\in[N]\end{subarray}}\bigg{|}\mathbb{E}_{n\in[N]}\chi_{h_{1}}(n)\chi_{h_{2}}(n+h_{1}-h_{4})\overline{\chi_{h_{3}}(n)}\overline{\chi_{h_{4}}(n+h_{1}-h_{4})}\cdot e\big{(}\Theta n\big{)}\bigg{|}\geq\delta^{O(1)}.

Proof.

Note that we assume that $\chi_{h}(n)=0$ for $h\notin[N]$ and that $\chi_{h}(n)=0$ for $n\notin[N]$ via replacing $\chi_{h}(n)$ with $\chi_{h}(n)\cdot\mathbbm{1}_{n\in[N]}$ ; we will remove this truncation at the end of the argument. We extend these functions by $0$ to $\mathbb{Z}/\widetilde{N}\mathbb{Z}$ where $\widetilde{N}$ is a prime between $4N$ and $8N$ .

By Cauchy–Schwarz, we have

\mathbb{E}_{h\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}|\mathbb{E}_{n\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}f_{2}(n)\Delta_{h}f_{1}(n)\overline{\chi_{h}(n)}|^{2}\gg\delta^{2}.

Expanding, this is equivalent to

\mathbb{E}_{h\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}\mathbb{E}_{n_{1},n_{2}\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}f_{2}(n_{1})f_{1}(n_{1})\overline{f_{1}(n_{1}+h)}\overline{f_{2}(n_{2})f_{1}(n_{2})}f_{1}(n_{2}+h)\overline{\chi_{h}(n_{1})}\chi_{h}(n_{2})\gg\delta^{2}.

We set $n=n_{1}$ , $k=n_{2}-n_{1}$ , and $m=n_{1}+h$ and find that

\mathbb{E}_{m,n\in\mathbb{Z}/\widetilde{N}\mathbb{Z},k\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}\Delta_{k}(f_{2}f_{1})(n)\Delta_{k}\overline{f_{1}(m)}\Delta_{k}\overline{\chi_{m-n}(n)}\gtrsim\delta^{2}.

This implies that

\mathbb{E}_{k\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}|\mathbb{E}_{m,n\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}\Delta_{k}(f_{2}f_{1})(n)\Delta_{k}\overline{f_{1}(m)}\Delta_{k}\overline{\chi_{m-n}(n)}|^{4}\gtrsim\delta^{8}.
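The passage to fourth powers is an instance of Hölder's (or Jensen's) inequality; a short sketch, with $Y_{k}$ and $c$ as our own shorthand:

```latex
% Shorthand (ours):
%   Y_k := \mathbb{E}_{m,n} \Delta_k(f_2 f_1)(n)\, \Delta_k\overline{f_1(m)}\, \Delta_k\overline{\chi_{m-n}(n)},
%   c > 0 the implied constant in the bound |\mathbb{E}_k Y_k| \ge c\,\delta^{2}.
\mathbb{E}_{k}|Y_k|^{4}
\;\ge\; \big(\mathbb{E}_{k}|Y_k|\big)^{4}
\;\ge\; \big|\mathbb{E}_{k} Y_k\big|^{4}
\;\ge\; c^{4}\delta^{8}.
```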

Recall the box-norm inequality that for $a,b,\Phi$ which are $1$ -bounded, we have

\begin{aligned}\big{|}\mathbb{E}_{n,m\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}a(n)b(m)\Phi(n,m)\big{|}^{4}&\leq\big{(}\mathbb{E}_{n\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}\big{|}\mathbb{E}_{m\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}b(m)\Phi(n,m)\big{|}\big{)}^{4}\\ &\leq\big{(}\mathbb{E}_{n\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}\big{|}\mathbb{E}_{m\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}b(m)\Phi(n,m)\big{|}^{2}\big{)}^{2}\\ &=\big{(}\mathbb{E}_{n\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}\mathbb{E}_{m,m^{\prime}\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}b(m)\overline{b(m^{\prime})}\Phi(n,m)\overline{\Phi(n,m^{\prime})}\big{)}^{2}\\ &\leq\big{(}\mathbb{E}_{m,m^{\prime}\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}\big{|}\mathbb{E}_{n\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}\Phi(n,m)\overline{\Phi(n,m^{\prime})}\big{|}\big{)}^{2}\\ &\leq\mathbb{E}_{m,m^{\prime}\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}\big{|}\mathbb{E}_{n\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}\Phi(n,m)\overline{\Phi(n,m^{\prime})}\big{|}^{2}\\ &=\mathbb{E}_{n,n^{\prime},m,m^{\prime}\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}\Phi(n,m)\overline{\Phi(n,m^{\prime})}\overline{\Phi(n^{\prime},m)}\Phi(n^{\prime},m^{\prime}).\end{aligned}\tag{7.1}

Applying this for each fixed $k$ , we have that

\mathbb{E}_{k\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}\mathbb{E}_{n,n^{\prime},m,m^{\prime}\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}\Delta_{k}\overline{\chi_{m-n}(n)}\Delta_{k}\overline{\chi_{m^{\prime}-n^{\prime}}(n^{\prime})}\Delta_{k}\chi_{m^{\prime}-n}(n)\Delta_{k}\chi_{m-n^{\prime}}(n^{\prime})\gtrsim\delta^{8}.

Take $m^{\prime}-n=h_{1}$ , $m-n^{\prime}=h_{2}$ , $m-n=h_{3}$ , $m^{\prime}-n^{\prime}=h_{4}$ . Note that $n^{\prime}-n=h_{1}-h_{4}$ and $h_{1}+h_{2}=h_{3}+h_{4}$ and noting that $n$ and $n+k$ range over the whole cyclic group, this is exactly

\mathbb{E}_{\begin{subarray}{c}h_{1}+h_{2}=h_{3}+h_{4}\\ h_{i}\in\mathbb{Z}/\widetilde{N}\mathbb{Z}\end{subarray}}\big{|}\mathbb{E}_{n\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}\chi_{h_{1}}(n)\chi_{h_{2}}(n+h_{1}-h_{4})\overline{\chi_{h_{3}}(n)}\overline{\chi_{h_{4}}(n+h_{1}-h_{4})}\big{|}^{2}\gtrsim\delta^{8}.
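The identity behind the phrase “this is exactly” can be checked as follows. This is a sketch in which $G$ is our own shorthand; it uses that $(n,n^{\prime},m,m^{\prime})\mapsto(n,h_{1},h_{2},h_{3})$ is a bijection of $(\mathbb{Z}/\widetilde{N}\mathbb{Z})^{4}$ , with $h_{4}=h_{1}+h_{2}-h_{3}$ determined.

```latex
% For fixed (h_1,h_2,h_3,h_4) with h_1 + h_2 = h_3 + h_4 put (our shorthand)
%   G(n) := \chi_{h_1}(n)\,\chi_{h_2}(n+h_1-h_4)\,\overline{\chi_{h_3}(n)}\,\overline{\chi_{h_4}(n+h_1-h_4)}.
% Since n' = n + h_1 - h_4, the four \Delta_k factors in the preceding display multiply to
%   \Delta_k G(n) = G(n)\overline{G(n+k)},  with the derivative taken in n and the h_i held fixed.
\mathbb{E}_{n,k\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}\, G(n)\,\overline{G(n+k)}
\;=\; \Big(\mathbb{E}_{n} G(n)\Big)\,\overline{\Big(\mathbb{E}_{n''} G(n'')\Big)}
\;=\; \big|\mathbb{E}_{n\in\mathbb{Z}/\widetilde{N}\mathbb{Z}} G(n)\big|^{2},
% because n'' = n + k is uniform on \mathbb{Z}/\widetilde{N}\mathbb{Z} and independent of n.
```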

Since $\chi_{h}(n)=0$ identically for $h\notin[N]$ , we in fact have

\mathbb{E}_{\begin{subarray}{c}h_{1}+h_{2}=h_{3}+h_{4}\\ h_{i}\in[N]\end{subarray}}\big{|}\mathbb{E}_{n\in\mathbb{Z}/\widetilde{N}\mathbb{Z}}\chi_{h_{1}}(n)\chi_{h_{2}}(n+h_{1}-h_{4})\overline{\chi_{h_{3}}(n)}\overline{\chi_{h_{4}}(n+h_{1}-h_{4})}\big{|}^{2}\gtrsim\delta^{8}.

For the inner sum, recall that we “truncated” $\chi_{h}(n)$ with $\mathbbm{1}_{n\in[N]}$ . In particular, extracting the truncation term we have that

\mathbb{E}_{\begin{subarray}{c}h_{1}+h_{2}=h_{3}+h_{4}\\ h_{i}\in[N]\end{subarray}}\bigg{|}\mathbb{E}_{n\in[N]}\mathbbm{1}_{1\leq n+h_{1}-h_{4}\leq N}\chi_{h_{1}}(n)\chi_{h_{2}}(n+h_{1}-h_{4})\overline{\chi_{h_{3}}(n)}\overline{\chi_{h_{4}}(n+h_{1}-h_{4})}\bigg{|}^{2}\gtrsim\delta^{8}.

Via an application of Lemma 7.1, there exist choices of $\Theta_{\vec{h}}$ with $\lVert\Theta_{\vec{h}}\rVert_{\mathbb{R}/\mathbb{Z}}\leq\delta^{-O(1)}/N$ such that

\mathbb{E}_{\begin{subarray}{c}h_{1}+h_{2}=h_{3}+h_{4}\\ h_{i}\in[N]\end{subarray}}\big{|}\mathbb{E}_{n\in[N]}\chi_{h_{1}}(n)\chi_{h_{2}}(n+h_{1}-h_{4})\overline{\chi_{h_{3}}(n)}\overline{\chi_{h_{4}}(n+h_{1}-h_{4})}e(\Theta_{\vec{h}}n)\big{|}^{2}\gtrsim\delta^{O(1)}.

Rounding $\Theta_{\vec{h}}$ to a lattice of spacing $\delta^{O(1)}/N$ and Pigeonholing then gives the desired result. ∎

The next proof requires a notion of when two nilcharacters are “equivalent” (i.e., have the same symbol, in a quantified version of the sense of [34, Appendix E]).

Definition 7.3.

We say nilcharacters $\chi,\chi^{\prime}$ are $(M,D,d)$ -equivalent for multidegree $J$ if $\chi,\chi^{\prime}$ have output dimensions bounded by $D$ and all coordinates of

\chi\otimes\overline{\chi^{\prime}}

can be represented as sums of at most $M$ nilsequences of multidegree $J$ such that the underlying functions of each nilsequence are $M$ -Lipschitz and the underlying nilmanifolds have complexity bounded by $M$ and dimension bounded by $d$ .

The key reason for the definition of equivalence is the following proposition, which states that given equivalent nilcharacters $\chi$ and $\chi^{\prime}$ , correlations with them are equivalent modulo introducing a term of multidegree $J$ . This is a finitary quantification of [34, Lemma E.7].

Lemma 7.4.

Given a function $f\colon\Omega\to\mathbb{C}^{L}$ and nilcharacters $\chi,\chi^{\prime}$ which are $(M,D,d)$ -equivalent for multidegree $J$ , if

\lVert\mathbb{E}_{\vec{n}\in\Omega}f(\vec{n})\otimes\chi(\vec{n})\rVert_{\infty}\geq\rho

then

\lVert\mathbb{E}_{\vec{n}\in\Omega}f(\vec{n})\otimes\chi^{\prime}(\vec{n})\cdot\psi(\vec{n})\rVert_{\infty}\geq(\rho/(MD))^{O(1)},

where $\psi$ can be taken to be one of the nilsequences used as part of a representation of one of the coordinates in $\chi\otimes\overline{\chi^{\prime}}$ . In particular, $\psi$ is a nilsequence of multidegree $J$ such that the underlying nilmanifold has complexity bounded by $M$ and dimension bounded by $d$ and the underlying function has Lipschitz constant bounded by $M$ .

Remark.

The additional condition that $\psi$ can be taken to be an explicit nilsequence occurring in a witness for the equivalence of $\chi,\chi^{\prime}$ is used primarily to allow us to Pigeonhole the choice of $\psi$ in cases where we may need to apply this statement “on average”.

Proof.

Notice that since $\chi^{\prime}$ is a nilcharacter, we have that the trace of

\chi^{\prime}\otimes\overline{\chi^{\prime}}

is the constant function $1$ . Furthermore note that the trace is the sum of at most $D$ coordinates of $\chi^{\prime}\otimes\overline{\chi^{\prime}}$ and therefore

\lVert\mathbb{E}_{\vec{n}\in\Omega}f(\vec{n})\otimes\chi(\vec{n})\otimes\overline{\chi^{\prime}(\vec{n})}\otimes\chi^{\prime}(\vec{n})\rVert_{\infty}\geq\rho/D.

Consider the coordinate of $f(\vec{n})\otimes\chi(\vec{n})\otimes\overline{\chi^{\prime}(\vec{n})}\otimes\chi^{\prime}(\vec{n})$ which achieves the $L^{\infty}$ norm above, and in particular the associated coordinate of $\chi(\vec{n})\otimes\overline{\chi^{\prime}(\vec{n})}$ that contributes. Applying the definition of equivalence and the triangle inequality, there exists $\psi(\vec{n})$ of the desired form such that

\lVert\mathbb{E}_{\vec{n}\in\Omega}f(\vec{n})\otimes\chi^{\prime}(\vec{n})\cdot\psi(\vec{n})\rVert_{\infty}\geq(\rho/(MD))^{O(1)}.\qed
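The two pigeonhole steps in this proof can be spelled out in coordinates. The following is a sketch; the indices $a,b,c,i,t$ and the dimension $D^{\prime}\leq D$ are our own notation, and we use that a nilcharacter has pointwise unit norm, as implied by the trace statement above.

```latex
% Write \chi' = (\chi'_1,\ldots,\chi'_{D'}) with D' \le D. Pointwise,
\operatorname{tr}\big(\chi'\otimes\overline{\chi'}\big)
  \;=\; \sum_{i=1}^{D'} \chi'_i\,\overline{\chi'_i}
  \;=\; \sum_{i=1}^{D'} |\chi'_i|^{2} \;=\; 1 .
% Let (f\otimes\chi)_a = f_c\,\chi_b be a coordinate realizing the hypothesis. Then
\rho \;\le\; \Big|\mathbb{E}_{\vec n\in\Omega} f_c\,\chi_b\Big|
     \;=\; \Big|\sum_{i=1}^{D'} \mathbb{E}_{\vec n\in\Omega} f_c\,\chi_b\,\overline{\chi'_i}\,\chi'_i\Big|
     \;\le\; D \max_{i}\Big|\mathbb{E}_{\vec n\in\Omega} f_c\,\chi_b\,\overline{\chi'_i}\,\chi'_i\Big| .
% Equivalence gives \chi_b\,\overline{\chi'_i} = \sum_{t \le M} \psi_t with each \psi_t of the stated form, so for some i and t
\Big|\mathbb{E}_{\vec n\in\Omega} f_c\,\chi'_i\,\psi_t\Big| \;\ge\; \frac{\rho}{MD} .
```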

We are now in position to prove the quantification of [34, Proposition 7.3]. We remark that there was an error in the published version of [34, Proposition 8.3] which affected the proof of [34, Proposition 7.3]. We quantify a closely related approach to that given in the erratum [31]. For our proof we require various quantifications of [34, Appendix E]; all of these are completely mechanical.

Lemma 7.5.

Fix $s\geq 3$ and $1\leq r^{\ast}\leq s-1$ . Let $f\colon[N]\to\mathbb{C}$ be a $1$ -bounded function. Suppose that $f$ has a correlation structure with parameters $\rho$ , $M$ , $d$ , and $D$ and associated nilcharacters $\chi(h,n)$ and $\chi_{h}(n)$ . Then for at least $(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}N^{3}$ quadruples $h_{1},h_{2},h_{3},h_{4}\in H$ with $h_{1}+h_{2}=h_{3}+h_{4}$ we have

\chi_{h_{1}}(n)\otimes\chi_{h_{2}}(n+h_{1}-h_{4})\otimes\overline{\chi_{h_{3}}(n)}\otimes\overline{\chi_{h_{4}}(n+h_{1}-h_{4})}\in\operatorname{Corr}(s-2,\rho^{\prime},M^{\prime},d^{\prime})

with

\rho^{\prime-1},M^{\prime}\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}\text{ and }d^{\prime}\leq O_{s}(d^{O_{s}(1)}).

Remark 7.6.

For $s=2$ , the same statement holds modulo a correction term of $e(\Theta n)$ where $\Theta$ is such that $\lVert\Theta\rVert_{\mathbb{R}/\mathbb{Z}}\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}/N$ .

Proof.

By definition of correlation structures we have for $h\in H$ that

\lVert\mathbb{E}_{n\in[N]}(\Delta_{h}f)(n)\otimes\overline{\chi(h,n)}\otimes\overline{\chi_{h}(n)}\cdot\overline{\psi_{h}(n)}\rVert_{\infty}\geq\rho

where $\psi_{h}$ is a nilsequence of degree $(s-2)$ whose underlying function is at most $M$ -Lipschitz on a nilmanifold of complexity at most $M$ and dimension at most $d$ . Setting $\chi_{h}(n)$ to be zero for $h\notin H$ we have

\mathbb{E}_{h\in[N]}\lVert\mathbb{E}_{n\in[N]}f(n)\overline{f(n+h)}\otimes\overline{\chi(h,n)}\otimes\overline{\chi_{h}(n)}\cdot\overline{\psi_{h}(n)}\rVert_{\infty}\geq\rho^{2}.

Twisting $\psi_{h}$ by an appropriate $h$ -dependent constant complex phase so as to make the $L^{\infty}$ values be realized as positive real numbers, we may assume that

\lVert\mathbb{E}_{h\in[N]}\mathbb{E}_{n\in[N]}f(n)\overline{f(n+h)}\otimes\overline{\chi(h,n)}\otimes\overline{\chi_{h}(n)}\cdot\overline{\psi_{h}(n)}\rVert_{\infty}\geq\rho^{2}/D^{2}.
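The loss of $D^{2}$ in the last display comes from pigeonholing over coordinates before twisting. A sketch, with $c_{h,j}$ , $j_{0}$ , and $\theta_{h}$ as our own notation:

```latex
% c_{h,j} := the j-th coordinate of
%   \mathbb{E}_{n\in[N]} f(n)\overline{f(n+h)}\otimes\overline{\chi(h,n)}\otimes\overline{\chi_h(n)}\cdot\overline{\psi_h(n)} .
% j ranges over at most D^2 coordinates, since \chi(h,n) and \chi_h(n) each have output dimension at most D.
\rho^{2} \;\le\; \mathbb{E}_{h\in[N]} \max_{j}|c_{h,j}|
         \;\le\; \sum_{j} \mathbb{E}_{h\in[N]} |c_{h,j}|
\quad\Longrightarrow\quad
\exists\, j_0:\ \ \mathbb{E}_{h\in[N]} |c_{h,j_0}| \;\ge\; \rho^{2}/D^{2} .
% Replace \psi_h by \theta_h\psi_h with \theta_h := c_{h,j_0}/|c_{h,j_0}| (and \theta_h := 1 if c_{h,j_0} = 0).
% The j_0-th coordinate becomes \overline{\theta_h}\,c_{h,j_0} = |c_{h,j_0}| \ge 0, so
\Big|\mathbb{E}_{h\in[N]} \overline{\theta_h}\,c_{h,j_0}\Big| \;=\; \mathbb{E}_{h\in[N]}|c_{h,j_0}| \;\ge\; \rho^{2}/D^{2} .
```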

By Lemma C.5, we have that $\chi(h,n)$ is $((MD)^{O_{s}(d^{O_{s}(1)})},(MD)^{O_{s}(d^{O_{s}(1)})},d^{O_{s}(1)})$ -equivalent for degree $(s-1)$ to some $\widetilde{\chi}(h,n,\ldots,n)$ which is a multidegree $(1,\ldots,1)$ nilcharacter with output dimension, complexity of underlying nilmanifold, Lipschitz constant of underlying function for each coordinate, and vertical frequency height all bounded by $(MD)^{O_{s}(d^{O_{s}(1)})}$ . ( $\widetilde{\chi}$ has $s$ total arguments.) Thus, applying Lemma 7.4, we have that

\displaystyle\lVert\mathbb{E}_{n,h\in[N]}f(n)\overline{f(n+h)}\otimes\overline{\widetilde{\chi}(h,n,\ldots,n)}\otimes\overline{\chi_{h}(n)}\cdot\overline{\psi_{h}(n)}\cdot\widetilde{\psi}(h,n)\rVert_{\infty}\geq(MD/\rho)^{-O_{s}(d^{O_{s}(1)})},

where $\widetilde{\psi}(h,n)$ is a degree $(s-1)$ nilsequence whose underlying function has Lipschitz norm bounded by $(MD)^{O_{s}(d^{O_{s}(1)})}$ and whose underlying nilmanifold has complexity bounded by $(MD)^{O_{s}(d^{O_{s}(1)})}$ and dimension bounded by $O_{s}(d^{O_{s}(1)})$ . The nilsequence $\widetilde{\psi}(h,n)$ can also be viewed as a multidegree $(0,s-1)\cup(s-1,s-2)$ nilsequence (i.e., we take the union of the down-sets generated by these elements), with the same bounds on the Lipschitz norm, complexity, and dimension.

Thus, applying Lemma C.6 (splitting) we have

\displaystyle\lVert\mathbb{E}_{n,h\in[N]}f(n)\overline{f(n+h)}\otimes\overline{\widetilde{\chi}(h,n,\ldots,n)}\otimes\overline{\chi_{h}(n)}\cdot\overline{\widetilde{\psi_{h}}(n)}\cdot b(n)\rVert_{\infty}\geq(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}

where $\widetilde{\psi_{h}}$ are degree $(s-2)$ nilsequences in $n$ where complexity and Lipschitz constant are bounded by $(MD)^{O_{s}(d^{O_{s}(1)})}$ and the dimension of the underlying nilmanifold is bounded by $O_{s}(d^{O_{s}(1)})$ while $b(n)$ is $(MD)^{O_{s}(d^{O_{s}(1)})}$ -bounded. Therefore, applying Lemma 7.2, we have

	$\displaystyle\mathbb{E}_{\begin{subarray}{c}h_{1}+h_{2}=h_{3}+h_{4}\\ h_{i}\in[N]\end{subarray}}\lVert\mathbb{E}_{n\in[N]}\widetilde{\chi}(h_{1},n,\ldots,n)\otimes\widetilde{\chi}(h_{2},n+h_{1}-h_{4},\ldots,n+h_{1}-h_{4})\otimes\overline{\widetilde{\chi}(h_{3},n,\ldots,n)}$
	$\displaystyle\otimes\overline{\widetilde{\chi}(h_{4},n+h_{1}-h_{4},\ldots,n+h_{1}-h_{4})}\otimes\chi_{h_{1}}(n)\otimes\chi_{h_{2}}(n+h_{1}-h_{4})\otimes\overline{\chi_{h_{3}}(n)}\otimes\overline{\chi_{h_{4}}(n+h_{1}-h_{4})}$
	$\displaystyle\cdot\widetilde{\psi_{h_{1}}}(n)\widetilde{\psi_{h_{2}}}(n)\overline{\widetilde{\psi_{h_{3}}}(n)}\overline{\widetilde{\psi_{h_{4}}}(n)}e(\Theta n)\rVert_{\infty}\geq(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}.$

We may combine $\widetilde{\psi_{h_{1}}}(n)\widetilde{\psi_{h_{2}}}(n)\overline{\widetilde{\psi_{h_{3}}}(n)}\overline{\widetilde{\psi_{h_{4}}}(n)}e(\Theta n)$ to form $\psi_{h_{1},h_{2},h_{3},h_{4}}^{\ast}(n)$ , which is degree $(s-2)$ in $n$ and has complexity bounds identical to those of $\widetilde{\psi_{h_{1}}}$ modulo changing implicit constants. Additionally, we may twist $\psi_{h_{1},h_{2},h_{3},h_{4}}^{\ast}$ by an $(h_{1},h_{2},h_{3},h_{4})$ -dependent complex phase to bring the outer expectation inside the norm. Thus we have

	$\displaystyle\lVert\mathbb{E}_{\begin{subarray}{c}h_{1}+h_{2}=h_{3}+h_{4}\\ h_{i}\in[N]\end{subarray}}\mathbb{E}_{n\in[N]}\widetilde{\chi}(h_{1},n,\ldots,n)\otimes\widetilde{\chi}(h_{2},n+h_{1}-h_{4},\ldots,n+h_{1}-h_{4})\otimes\overline{\widetilde{\chi}(h_{3},n,\ldots,n)}$
	$\displaystyle\qquad\otimes\overline{\widetilde{\chi}(h_{4},n+h_{1}-h_{4},\ldots,n+h_{1}-h_{4})}\otimes\chi_{h_{1}}(n)\otimes\chi_{h_{2}}(n+h_{1}-h_{4})\otimes\overline{\chi_{h_{3}}(n)}$
	$\displaystyle\qquad\otimes\overline{\chi_{h_{4}}(n+h_{1}-h_{4})}\cdot\psi^{\ast}_{h_{1},h_{2},h_{3},h_{4}}(n)\rVert_{\infty}\geq(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}.$

By Lemma C.5, $\widetilde{\chi}(h_{2},n+h_{1}-h_{4},\ldots,n+h_{1}-h_{4})$ is $((MD)^{O_{s}(d^{O_{s}(1)})},(MD)^{O_{s}(d^{O_{s}(1)})},d^{O_{s}(1)})$ -equivalent for degree $(s-1)$ to

\bigotimes_{k=0}^{s-1}\widetilde{\chi}(h_{2},n,\ldots,n,h_{1}-h_{4},\ldots,h_{1}-h_{4})

where there are $s-k-1$ copies of $n$ and $k$ copies of $h_{1}-h_{4}$ and we have a similar expansion for $\widetilde{\chi}(h_{4},n+h_{1}-h_{4},\ldots,n+h_{1}-h_{4})$ . Note that all terms in this expansion except for $k=0$ may be absorbed into $\psi^{\ast}$ . Therefore applying Lemma 7.4, we have that

	$\displaystyle\lVert\mathbb{E}_{\begin{subarray}{c}h_{1}+h_{2}=h_{3}+h_{4}\\ h_{i}\in[N]\end{subarray}}\mathbb{E}_{n\in[N]}\widetilde{\chi}(h_{1},n,\ldots,n)\otimes\widetilde{\chi}(h_{2},n,\ldots,n)\otimes\overline{\widetilde{\chi}(h_{3},n,\ldots,n)}$
	$\displaystyle\qquad\otimes\overline{\widetilde{\chi}(h_{4},n,\ldots,n)}\otimes\chi_{h_{1}}(n)\otimes\chi_{h_{2}}(n+h_{1}-h_{4})\otimes\overline{\chi_{h_{3}}(n)}$
	$\displaystyle\qquad\otimes\overline{\chi_{h_{4}}(n+h_{1}-h_{4})}\cdot\psi^{\ast}_{h_{1},h_{2},h_{3},h_{4}}(n)\cdot\tau(n,h_{1},h_{2},h_{3},h_{4})\rVert_{\infty}\geq(MD/\rho)^{-O_{s}(d^{O_{s}(1)})};$

here $\tau(n,h_{1},h_{2},h_{3},h_{4})$ is a degree $(s-1)$ nilsequence where the underlying function has Lipschitz norm and complexity of underlying nilmanifold bounded by $(MD)^{O_{s}(d^{O_{s}(1)})}$ while the dimension of the underlying nilmanifold is bounded by $O_{s}(d^{O_{s}(1)})$ and we have folded certain terms into $\psi^{\ast}$ while guaranteeing it is a degree $(s-2)$ nilsequence (and the complexity bounds have not changed modulo implicit constants). Finally via Lemma C.5, we have that

\widetilde{\chi}(h_{1},n,\ldots,n)\otimes\widetilde{\chi}(h_{2},n,\ldots,n)\otimes\overline{\widetilde{\chi}(h_{3},n,\ldots,n)}\otimes\overline{\widetilde{\chi}(h_{4},n,\ldots,n)}

and $\widetilde{\chi}(h_{1}+h_{2}-h_{3}-h_{4},n,\ldots,n)$ are $((MD)^{O_{s}(d^{O_{s}(1)})},(MD)^{O_{s}(d^{O_{s}(1)})},d^{O_{s}(1)})$ -equivalent for degree $(s-1)$ . Thus applying Lemma 7.4, we have

	$\displaystyle\mathbb{E}_{\begin{subarray}{c}h_{1}+h_{2}=h_{3}+h_{4}\\ h_{i}\in[N]\end{subarray}}\lVert\mathbb{E}_{n\in[N]}\chi_{h_{1}}(n)\otimes\chi_{h_{2}}(n+h_{1}-h_{4})\otimes\overline{\chi_{h_{3}}(n)}\otimes\overline{\chi_{h_{4}}(n+h_{1}-h_{4})}$
	$\displaystyle\qquad\qquad\cdot\widetilde{\chi}(0,n,\ldots,n)\psi^{\ast}_{h_{1},h_{2},h_{3},h_{4}}(n)\tau(n,h_{1},h_{2},h_{3},h_{4})\rVert_{\infty}\geq(MD/\rho)^{-O_{s}(d^{O_{s}(1)})};$

here we have folded in various terms into $\tau(n,h_{1},\ldots,h_{4})$ and the complexity bounds have not changed modulo implicit constants. Note that by Lemma C.2, $\widetilde{\chi}(0,n,\ldots,n)$ is a degree $(s-1)$ nilsequence in $n$ and thus may abusively also be absorbed into $\tau$ . Finally noting that a degree $(s-1)$ nilsequence may also be viewed as a multidegree $(s-1,0,\ldots,0)\cup(s-2,s-1,\ldots,s-1)$ nilsequence and thus applying Lemma C.6 we have

	$\displaystyle\lVert\mathbb{E}_{\begin{subarray}{c}h_{1}+h_{2}=h_{3}+h_{4}\\ h_{i}\in[N]\end{subarray}}\mathbb{E}_{n\in[N]}\chi_{h_{1}}(n)\otimes\chi_{h_{2}}(n+h_{1}-h_{4})\otimes\overline{\chi_{h_{3}}(n)}\otimes\overline{\chi_{h_{4}}(n+h_{1}-h_{4})}\cdot\psi^{\ast}_{h_{1},h_{2},h_{3},h_{4}}(n)b(n)\rVert_{\infty}$
	$\displaystyle\qquad\geq(MD/\rho)^{-O_{s}(d^{O_{s}(1)})},$

where $b(n)$ is an $(MD)^{O_{s}(d^{O_{s}(1)})}$ -bounded function and $\psi^{\ast}$ has been modified but the underlying complexity bounds have not changed modulo implicit constants. Note $\psi^{\ast}$ is degree $(s-2)$ .

We now reparameterize with

h_{1}=m-n,h_{2}=m^{\prime}-n^{\prime},h_{3}=m^{\prime}-n,h_{4}=m-n^{\prime}.
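As a quick sanity check on this change of variables (a sketch that uses only the four substitutions displayed above), one can verify that the reparameterized tuple is automatically an additive quadruple, and that the shifted argument $n+h_{1}-h_{4}$ becomes $n^{\prime}$ :

```latex
% Additivity of the reparameterized quadruple.
\[
h_{1}+h_{2}=(m-n)+(m^{\prime}-n^{\prime})=(m^{\prime}-n)+(m-n^{\prime})=h_{3}+h_{4}.
\]
% The shifted argument appearing in \chi_{h_2} and \chi_{h_4}.
\[
n+h_{1}-h_{4}=n+(m-n)-(m-n^{\prime})=n^{\prime}.
\]
% Inverse map: (n,h_1,h_3,h_4) determines (m,m',n').
\[
m=n+h_{1},\qquad m^{\prime}=n+h_{3},\qquad n^{\prime}=n+h_{1}-h_{4}.
\]
```

In particular the map $(m,m^{\prime},n,n^{\prime})\mapsto(n,h_{1},h_{2},h_{3},h_{4})$ is a bijection onto pairs consisting of a base point and an additive quadruple, which is why averages over the two parameterizations can be compared.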

By approximating with regions where we take $m,m^{\prime},n,n^{\prime}$ to live in short intervals, there exist intervals $I_{1},\ldots,I_{4}$ each of density $(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}$ in $[\pm 2N]$ such that

	$\displaystyle\bigg{\lVert}\mathbb{E}_{m\in I_{1},m^{\prime}\in I_{2},n\in I_{3},n^{\prime}\in I_{4}}$	$\displaystyle\chi_{m-n}(n)\otimes\chi_{m^{\prime}-n^{\prime}}(n^{\prime})\otimes\overline{\chi_{m^{\prime}-n}(n)}\otimes\overline{\chi_{m-n^{\prime}}(n^{\prime})}$
		$\displaystyle\qquad\cdot\psi^{\ast}_{m-n,m^{\prime}-n^{\prime},m^{\prime}-n,m-n^{\prime}}(n)b(n)\bigg{\rVert}_{\infty}\geq(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}$

where $\psi^{\ast}_{m-n,m^{\prime}-n^{\prime},m^{\prime}-n,m-n^{\prime}}$ is a degree $(s-2)$ nilsequence. Now by Cauchy–Schwarz, duplicating the variable $m$ and denoting the copies by $m,m^{\prime\prime}$ , we obtain

	$\displaystyle\bigg{\lVert}\mathbb{E}_{m,m^{\prime\prime}\in I_{1},m^{\prime}\in I_{2},n\in I_{3},n^{\prime}\in I_{4}}$	$\displaystyle\chi_{m-n}(n)\otimes\overline{\chi_{m^{\prime\prime}-n}(n)}\otimes\overline{\chi_{m-n^{\prime}}(n^{\prime})}\otimes\chi_{m^{\prime\prime}-n^{\prime}}(n^{\prime})$
		$\displaystyle\cdot\psi^{\ast}_{m-n,m^{\prime}-n^{\prime},m^{\prime}-n,m-n^{\prime},m^{\prime\prime}-n,m^{\prime\prime}-n^{\prime}}(n)\bigg{\rVert}_{\infty}\geq(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}.$

Note that every term not involving $m$ was removed using appropriate boundedness. Now we may Pigeonhole on $m^{\prime}-n=t$ and deduce

	$\displaystyle\bigg{\lVert}\mathbb{E}_{m,m^{\prime\prime}\in I_{1},n\in I_{3},n^{\prime}\in I_{4}}$	$\displaystyle\chi_{m-n}(n)\otimes\overline{\chi_{m^{\prime\prime}-n}(n)}\otimes\overline{\chi_{m-n^{\prime}}(n^{\prime})}\otimes\chi_{m^{\prime\prime}-n^{\prime}}(n^{\prime})$
		$\displaystyle\qquad\cdot\psi^{\ast}_{m-n,m-n^{\prime},m^{\prime\prime}-n,m^{\prime\prime}-n^{\prime}}(n)\cdot\mathbbm{1}[n+t\in I_{2}]\bigg{\rVert}_{\infty}\geq(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}.$

Let $m^{\prime\prime}-n=h_{1}$ , $m-n^{\prime}=h_{2}$ , $m-n=h_{3}$ , and $m^{\prime\prime}-n^{\prime}=h_{4}$ (abusively). We have

	$\displaystyle\bigg{\lVert}$	$\displaystyle\mathbb{E}_{n\in[N]}\mathbb{E}_{\begin{subarray}{c}h_{1}+h_{2}=h_{3}+h_{4}\\ h_{i}\in[\pm N]\end{subarray}}\chi_{h_{1}}(n)\otimes\chi_{h_{2}}(n+h_{1}-h_{4})\otimes\overline{\chi_{h_{3}}(n)}\otimes\overline{\chi_{h_{4}}(n+h_{1}-h_{4})}$
		$\displaystyle\quad\cdot\chi_{h_{1},h_{2},h_{3},h_{4}}(n)\cdot\mathbbm{1}[n+h_{3},n+h_{1}\in I_{1},n\in I_{3},n+h_{1}-h_{4}\in I_{4},n+t\in I_{2}]\bigg{\rVert}_{\infty}\geq(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}$

where $\chi_{h_{1},h_{2},h_{3},h_{4}}(n)$ is a degree $(s-2)$ nilsequence (for each fixed $h_{1},h_{2},h_{3},h_{4}$ ) where the underlying nilmanifold and Lipschitz constant of underlying function are bounded by $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ and the dimension is bounded by $O_{s}(d^{O_{s}(1)})$ .

Therefore, by the triangle inequality we have that

	$\displaystyle\mathbb{E}_{\begin{subarray}{c}h_{1}+h_{2}=h_{3}+h_{4}\\ h_{i}\in[\pm N]\end{subarray}}\bigg{\lVert}\mathbb{E}_{n\in[N]}\chi_{h_{1}}(n)\otimes\chi_{h_{2}}(n+h_{1}-h_{4})\otimes\overline{\chi_{h_{3}}(n)}\otimes\overline{\chi_{h_{4}}(n+h_{1}-h_{4})}$
	$\displaystyle\qquad\cdot\chi_{h_{1},h_{2},h_{3},h_{4}}(n)\cdot\mathbbm{1}[n+h_{3},n+h_{1}\in I_{1},n\in I_{3},n+h_{1}-h_{4}\in I_{4},n+t\in I_{2}]\bigg{\rVert}_{\infty}\gtrsim(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}.$

Finally, the last term is the indicator of an $\vec{h}$ -dependent interval. Applying Lemma 7.1 (and noting that $s\geq 3$ allows us to fold in the major arc Fourier term) completes the proof. ∎

8. Sunflower Step

For the next stage of our proof, as outlined in Section 4, we wish to provide more structure on $h$ -dependent nilcharacters $\chi_{h}$ given information about additive quadruples as established in Section 7. As setup we will require the notion of a rational subspace with respect to a specified basis, and establish some basic control over Taylor coefficients of bounded polynomial sequences.

Definition 8.1.

A vector subspace $V^{\prime}\leqslant V$ is $Q$ -rational with respect to $V$ given the basis $\mathcal{B}=\{B_{1},\ldots,B_{\dim(V)}\}$ (of $V$ ) if there exists a basis $\mathcal{B}^{\prime}=\{B_{1}^{\prime},\ldots,B_{\dim(V^{\prime})}^{\prime}\}$ of $V^{\prime}$ such that each $B_{j}^{\prime}$ is a linear combination of elements of $\mathcal{B}$ with coefficients of height at most $Q$ .

Lemma 8.2.

Consider a nilmanifold $G/\Gamma$ equipped with a degree-rank filtration of degree-rank $(s,r)$ , of dimension $d$ and complexity at most $M$ . Let $\mathcal{X}$ denote the underlying adapted Mal’cev basis and assign the basis

\mathcal{X}_{i}=(\mathcal{X}\cap\log(G_{(i,1)}))/\log(G_{(i,2)})

for $G_{(i,1)}/G_{(i,2)}$ . Suppose $\varepsilon$ is a polynomial sequence such that

d_{G,\mathcal{X}}(\mathrm{id}_{G},\varepsilon(n))\leq M

for $n\in[N]$ . Then for $1\leq i\leq s$ , we have

d_{G_{(i,1)}/G_{(i,2)},\mathcal{X}_{i}}(\operatorname{Taylor}_{i}(\varepsilon),\mathrm{id}_{G_{(i,1)}/G_{(i,2)}})\leq M^{O_{s}(d^{O_{s}(1)})}N^{-i}.

Proof.

We may write

\varepsilon(n)=\exp\bigg{(}\sum_{j=0}^{s}\varepsilon_{j}\binom{n}{j}\bigg{)}

where $\varepsilon_{j}\in\log(G_{(j,0)})$ . By Lemma 2.12, we have

\operatorname{Taylor}_{i}(\varepsilon)=\exp(\varepsilon_{i})~{}\mathrm{mod}~{}G_{(i,2)}.

We have that

\lVert\psi_{\mathrm{exp}}(\varepsilon(n))\rVert_{\infty}\leq M^{O_{s}(d^{O_{s}(1)})}

for all $n\in[N]$ by [42, Lemmas B.1, B.3]. This implies that

\bigg{\lVert}\sum_{t=0}^{j}(-1)^{t}\binom{j}{t}\psi_{\mathrm{exp}}(\varepsilon(t\cdot\lfloor N/(2j)\rfloor+1))\bigg{\rVert}_{\infty}\leq M^{O_{s}(d^{O_{s}(1)})}.

This is exactly the $j$ -th discrete derivative and thus terms coming from $\varepsilon_{i}$ with $i<j$ vanish. This implies that

\lVert\varepsilon_{j}N^{j}~{}\mathrm{mod}~{}\log(G_{(j,2)})\rVert_{\infty}\leq M^{O_{s}(d^{O_{s}(1)})},

where the basis we assign to $\log(G_{(j,1)}/G_{(j,2)})$ is $\mathcal{X}_{j}$ . The result follows by dividing by $N^{j}$ and noting, by say [45, Lemma 2.6], that the distance in first- and second-kind coordinates is comparable. ∎
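The discrete-derivative step in the preceding proof can be illustrated by the following computation. This is a sketch: the shorthand $L=\lfloor N/(2j)\rfloor$ is ours and does not appear in the original argument. It records why the alternating sum annihilates the binomial terms of index $i<j$ and isolates the $j$ -th coefficient with a factor comparable to $N^{j}$ .

```latex
% Shorthand (our assumption, not notation from the paper): L = \lfloor N/(2j) \rfloor.
% The alternating sum is, up to the sign (-1)^j, the j-th forward difference
% with step L, evaluated at the point 1.
\[
\sum_{t=0}^{j}(-1)^{t}\binom{j}{t}\binom{tL+1}{i}
=\begin{cases}
0 & \text{if } 0\leq i<j,\\
(-1)^{j}L^{j} & \text{if } i=j.
\end{cases}
\]
% Worked instance with j=2 and i=2:
\[
\binom{1}{2}-2\binom{L+1}{2}+\binom{2L+1}{2}
=0-L(L+1)+L(2L+1)=L^{2}.
\]
% Worked instance with j=2 and i=1 (a lower-order term, which vanishes):
\[
\binom{1}{1}-2\binom{L+1}{1}+\binom{2L+1}{1}=1-2(L+1)+(2L+1)=0.
\]
```

Since $L\geq N/(4j)$ once $N\geq 2j$ , the surviving factor $L^{j}$ is at least $(4j)^{-j}N^{j}$ , and all evaluation points $tL+1$ with $0\leq t\leq j$ lie in $[N]$ .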

We now come to the first of two crucial arguments in this paper where we “improve” the correlation structure. At the cost of restricting the set $H$ , we force the Taylor coefficients of $g_{h}$ , the polynomial sequences underlying the $\chi_{h}$ , to live in certain restricted subspaces and their differences to lie in an even finer restriction.

This step is closely related to the “sunflower” arguments of [32, Step 1] and [34, Lemma 11.3]; a quantitative version for the $U^{4}$ -inverse theorem due to the first author can be found in [43]. The precise statement of the lemma should also be compared with [34, Theorem 11.1(i)]. We note however that unlike [32, 34], our proof is completely free of any iteration (or equivalently passing to a subgroup where polynomial sequences are “totally equidistributed”, which necessitates too much loss in the relevant parameters).

Thus, the crucial point of the following technical statement is the final condition, which essentially captures that two $h$ -dependent frequencies in the improved correlation structure cannot “simultaneously” affect the bottom degree-rank portion.

Lemma 8.3.

Fix $s\geq 2$ and $1\leq r^{\ast}\leq s-1$ . Let $f\colon[N]\to\mathbb{C}$ be a $1$ -bounded function. Suppose that $f$ has a degree-rank $(s-1,r^{\ast})$ correlation structure with parameters $\rho$ , $M$ , $d$ , and $D$ and that $N\geq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ and data labeled as in Definition 6.1. Furthermore let $\mathcal{X}_{i}=(\mathcal{X}\cap\log(G_{(i,1)}))/\log(G_{(i,2)})$ .

We output a new degree-rank $(s-1,r^{\ast})$ correlation structure for $f$ with parameters

\rho^{\prime-1}\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})},\quad M^{\prime}\leq O(M),\quad D^{\prime}=D,\quad d^{\prime}\leq O(d),

with set $H^{\prime}\subseteq H$ , with multidegree $(1,s-1)$ nilcharacter $\chi^{\prime}(h,n)={F^{\ast}}^{\prime}(g^{\prime}(h,n){\Gamma^{\ast}}^{\prime})$ on $(G^{\ast})^{\prime}=G^{\ast}\times\mathbb{R}$ , and with $h$ -dependent nilcharacters $\chi_{h}^{\prime}(n)=F^{\prime}(g_{h}^{\prime}(n)\Gamma)$ having underlying polynomial sequences $g_{h}^{\prime}$ on $G^{\prime}=G$ . This correlation structure satisfies:

•

$(G^{\ast})^{\prime}$ is given the multidegree filtration

$(G^{\ast})^{\prime}_{(i,j)}=(G^{\ast})_{(i,j)}\times\{0\}$

if $(i,j)\neq(0,0)$ or $(0,1)$ . For $(i,j)\in\{(0,0),(0,1)\}$ we set

$(G^{\ast})^{\prime}_{(i,j)}=(G^{\ast})_{(i,j)}\times\mathbb{R}.$

We have ${F^{\ast}}^{\prime}((x,z)(\Gamma^{\ast}\times\mathbb{Z}))=F^{\ast}(x\Gamma^{\ast})\cdot e(z)$ . We have $g^{\prime}(h,n)=(g(h,n),\Theta n)$ for some appropriate value of $\Theta$ ;
•

There exists a collection of $\mathbb{R}$ -vector spaces $V_{i,\mathrm{Dep}}\leqslant V_{i}\leqslant G_{(i,1)}/G_{(i,2)}$ which are all $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ -rational with respect to $\exp(\mathcal{X}_{i})$ for each $i$ ;

•

For $1\leq i\leq s-1$ and $h,h_{1},h_{2}\in H^{\prime}$ we have

\operatorname{Taylor}_{i}(g_{h}^{\prime})\in V_{i},\qquad\operatorname{Taylor}_{i}(g_{h_{1}}^{\prime})-\operatorname{Taylor}_{i}(g_{h_{2}}^{\prime})\in V_{i,\mathrm{Dep}};

•

$F^{\prime}$ is $M^{\prime}$ -Lipschitz and has the same vertical frequency $\eta$ as $F$ ;
•

For integers $i_{1}+\cdots+i_{r^{\ast}}=s-1$ , suppose that $v_{i_{\ell}}\in V_{i_{\ell}}$ and for at least two distinct indices $\ell_{1},\ell_{2}$ we have $v_{i_{\ell_{1}}}\in V_{i_{\ell_{1}},\mathrm{Dep}}$ and $v_{i_{\ell_{2}}}\in V_{i_{\ell_{2}},\mathrm{Dep}}$ . Then for $w$ which is any $(r^{\ast}-1)$ -fold commutator of $v_{i_{1}},\ldots,v_{i_{r^{\ast}}}$ , we have

$\eta(w)=0.$

Remark.

Consider $g_{i_{j}}=\exp(X_{i_{j}})$ with $g_{i_{j}}\in G_{(i_{j},0)}$ for $1\leq j\leq r^{\ast}$ and $i_{1}+\cdots+i_{r^{\ast}}=s-1$ . Fixing any $(r^{\ast}-1)$ -fold commutator $w$ of $g_{i_{1}},\ldots,g_{i_{r^{\ast}}}$ , repeated application of the commutator version of Baker–Campbell–Hausdorff (e.g. (2.2)) implies that

w=\exp([X_{i_{1}},\ldots,X_{i_{r^{\ast}}}])

where the associated commutator has the same “form” as that defining $w$ . (All higher terms are annihilated since $G$ has degree-rank $(s-1,r^{\ast})$ .) Note that this implies that one can define the associated commutator given inputs in $G_{(i_{1},1)}/G_{(i_{1},2)},\ldots,G_{(i_{r^{\ast}},1)}/G_{(i_{r^{\ast}},2)}$ and furthermore we see that the associated commutator form on the Lie algebra is a multilinear form of the vector arguments (since $G_{(i,1)}/G_{(i,2)}$ and $G_{(s-1,r^{\ast})}$ are real vector spaces and the commutator bracket on the Lie algebra is multilinear).
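To make the remark concrete, consider the smallest nontrivial case $r^{\ast}=2$ . This is an illustrative specialization, not an additional claim; the scalars $a,a^{\prime}$ and the vectors $X,X^{\prime},Y$ below are hypothetical names introduced only for the example. Here there is a single commutator form, and the final bullet of Lemma 8.3 reduces to a bilinear vanishing condition.

```latex
% Case r^* = 2: w = [g_{i_1}, g_{i_2}] with i_1 + i_2 = s-1.
% Higher Baker--Campbell--Hausdorff terms vanish by the degree-rank hypothesis.
\[
[\exp(X_{i_{1}}),\exp(X_{i_{2}})]=\exp\big([X_{i_{1}},X_{i_{2}}]\big).
\]
% Multilinearity of the commutator form in each argument (shown in the first slot):
\[
[aX+a^{\prime}X^{\prime},Y]=a[X,Y]+a^{\prime}[X^{\prime},Y].
\]
% The final condition of Lemma 8.3 in this case:
\[
v_{i_{1}}\in V_{i_{1},\mathrm{Dep}},\quad v_{i_{2}}\in V_{i_{2},\mathrm{Dep}}
\quad\Longrightarrow\quad \eta([v_{i_{1}},v_{i_{2}}])=0.
\]
```

Because the form is multilinear, it suffices to check the vanishing on bases of the spaces $V_{i,\mathrm{Dep}}$ .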

Proof.

We first note that the statement of the lemma is trivial for $r^{\ast}=1$ since we may take $V_{i}=V_{i,\mathrm{Dep}}=G_{(i,1)}^{\prime}/G_{(i,2)}^{\prime}$ ; it is impossible to have two distinct indices in the final bullet point. Taking $g_{h}^{\prime}(n)=g_{h}(n)$ and $g^{\prime}(h,n)=(g(h,n),0)$ completes the proof in this case. For $s=2$ , the only possible case is $r^{\ast}=1$ and therefore for the remainder of the proof we will consider $s\geq 3$ . Similarly, if $\eta$ is trivial, the result is once again immediate. Thus throughout the remainder of the proof we will assume that $s-1\geq r^{\ast}\geq 2$ and $\eta$ is nontrivial.

Step 1: Setup for invoking equidistribution theory. By Lemma 7.5, we have

\displaystyle\bigg{\lVert}\mathbb{E}\bigg{[}\chi_{h_{1}}(n)\otimes\chi_{h_{2}}(n+h_{1}-h_{4})\otimes\overline{\chi_{h_{3}}(n)}\otimes\overline{\chi_{h_{4}}(n+h_{1}-h_{4})}\cdot\psi_{\vec{h}}(g_{\vec{h}}(n)\Gamma^{\prime})\bigg{]}\bigg{\rVert}_{\infty}\geq(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}

for at least $(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}$ fraction of additive quadruples $h_{1}+h_{2}=h_{3}+h_{4}$ . Furthermore $g_{\vec{h}}(n)$ is a polynomial sequence on a group $G_{\mathrm{Error}}$ which has a degree $(s-2)$ filtration, dimension bounded by $O_{s}(d^{O_{s}(1)})$ , and the complexity of $G_{\mathrm{Error}}/\Gamma_{\mathrm{Error}}$ and the Lipschitz constant of the function for $\psi_{\vec{h}}$ are bounded by $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ . Note that a priori $G_{\mathrm{Error}}/\Gamma_{\mathrm{Error}}$ and the associated Mal’cev basis depend on $\vec{h}$ . However, applying Pigeonhole on the choice of the associated structure constants allows us to assume, at the cost of passing to a density $(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}$ subset of the additive quadruples, that $G_{\mathrm{Error}}/\Gamma_{\mathrm{Error}}$ is independent of $\vec{h}$ . Finally, we may assume as usual that $g_{\vec{h}}(0)=\mathrm{id}_{G_{\mathrm{Error}}}$ via by-now standard manipulations.

We now consider the group $\widetilde{G}=G\times G\times G\times G\times G_{\mathrm{Error}}$ . $\widetilde{G}$ may naturally be given a degree-rank $(s-1,r^{\ast})$ product filtration (where we use [34, Example 6.11] to assign $G_{\mathrm{Error}}$ a degree-rank $(s-2,s-2)$ structure) and Mal’cev basis. Furthermore if $\chi_{h_{i}}(n)=F(g_{h_{i}}(n))$ we have that the five-fold function $F(x_{1}\Gamma)\otimes F(x_{2}\Gamma)\otimes\overline{F(x_{3}\Gamma)}\otimes\overline{F(x_{4}\Gamma)}\cdot\psi_{\vec{h}}(x_{5}\Gamma_{\mathrm{Error}})$ has a vertical frequency $\eta_{\mathrm{Prod}}=(\eta,\eta,-\eta,-\eta,0)$ . (Note that $(G_{\mathrm{Error}})_{(s-1,i)}=\mathrm{Id}_{G_{\mathrm{Error}}}$ for all $i\geq 0$ .)

For the sake of convenience, we set

g_{\vec{h}}^{\ast}(n)=(g_{h_{1}}(n),g_{h_{2}}(n+h_{1}-h_{4}),g_{h_{3}}(n),g_{h_{4}}(n+h_{1}-h_{4}),g_{\vec{h}}(n))

and note that the function $F\otimes F\otimes\overline{F}\otimes\overline{F}\cdot\psi_{\vec{h}}$ is seen to be $M^{O_{s}(d^{O_{s}(1)})}$ -Lipschitz on $\widetilde{G}$ . Note that by the second item of Lemma 2.13, we immediately have that

\operatorname{Taylor}_{i}(g_{h_{2}}(n+h_{1}-h_{4}))=\operatorname{Taylor}_{i}(g_{h_{2}}(n))

for $1\leq i\leq s-1$ and analogously for $g_{h_{4}}(n+h_{1}-h_{4})$ .
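The shift invariance of the Taylor coefficients can be seen heuristically in an abelian toy model. This is only a sketch: the polynomial $p$ , its coefficients $\alpha_{j}$ , and the shift $k$ are hypothetical names, and the rigorous non-abelian statement is the one supplied by Lemma 2.13.

```latex
% Toy model: p(n) = \sum_{j=0}^{s-1} \alpha_j \binom{n}{j}, shifted by an integer k.
% Vandermonde's identity: \binom{n+k}{j} = \sum_{l=0}^{j} \binom{k}{j-l}\binom{n}{l}.
\[
p(n+k)=\sum_{i=0}^{s-1}\binom{n}{i}\bigg(\alpha_{i}+\sum_{j>i}\binom{k}{j-i}\alpha_{j}\bigg).
\]
% The coefficient of \binom{n}{i} changes only by terms involving \alpha_j with j > i.
```

In the setting of the paper the coefficient $\alpha_{j}$ lies in $\log(G_{(j,0)})$ , so provided $G_{(j,0)}\subseteq G_{(i,2)}$ for $j>i$ (which we take here as an assumption on the nesting of the degree-rank filtration), the correction terms vanish modulo $G_{(i,2)}$ and the $i$ -th Taylor coefficient is unchanged by the shift.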

Step 2: Invoking equidistribution theory. By applying Corollary 5.5 (since $\eta$ is nonzero), there exists a $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ -rational subgroup $J=J_{\vec{h}}$ of $\widetilde{G}$ such that $\eta_{\mathrm{Prod}}(J\cap\widetilde{G}_{(s-1,r^{\ast})})=0$ and such that

g_{\vec{h}}^{\ast}=\varepsilon_{\vec{h}}\cdot\widetilde{g_{\vec{h}}}\cdot\gamma_{\vec{h}}

where:

•

$\varepsilon_{\vec{h}}(0)=\widetilde{g_{\vec{h}}}(0)=\gamma_{\vec{h}}(0)=\mathrm{id}_{\widetilde{G}}$ ;
•

$\widetilde{g_{\vec{h}}}$ takes values in $J$ ;
•

$\gamma_{\vec{h}}$ is $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ -rational (with respect to the lattice $\Gamma\times\Gamma\times\Gamma\times\Gamma\times\Gamma_{\mathrm{Error}}$ );
•

$d(\varepsilon_{\vec{h}}(n),\varepsilon_{\vec{h}}(n-1))\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}N^{-1}$ for $n\in[N]$ .

By passing to a subset of additive quadruples of density $(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}$ we may in fact assume that the group $J$ is independent of $\vec{h}$ under consideration.

We define

J_{i}^{\prime}:=(J\cap\widetilde{G}_{(i,1)})/(J\cap\widetilde{G}_{(i,2)}),\qquad J_{i}:=\tau_{i}(J_{i}^{\prime})

where $\tau_{i}\colon\operatorname{Horiz}_{i}(G)^{\otimes 4}\times\operatorname{Horiz}_{i}(G_{\mathrm{Error}})\to\operatorname{Horiz}_{i}(G)^{\otimes 4}$ is the natural projection map to the four-fold product. Since $\eta_{\mathrm{Prod}}(J\cap\widetilde{G}_{(s-1,r^{\ast})})=0$ (due to the output of Corollary 5.5), we have

\eta_{\mathrm{Prod}}([J_{i_{1}}^{\prime},\ldots,J_{i_{r^{\ast}}}^{\prime}])=0

for $i_{1}+\cdots+i_{r^{\ast}}=s-1$ where the commutator bracket is taken with respect to $\widetilde{G}$ and $[\cdot,\ldots,\cdot]$ denotes any possible $(r^{\ast}-1)$ -fold commutator bracket.

Since $G_{\mathrm{Error}}$ has been given a degree-rank $<(s-1,r^{\ast})$ filtration, we have that in fact

\eta_{\mathrm{Prod}}([J_{i_{1}},\ldots,J_{i_{r^{\ast}}}])=0

where we abusively descend $\eta_{\mathrm{Prod}}$ to $G^{\otimes 4}$ . Less formally, we are noting that the final coordinate of elements in $\widetilde{G}$ plays no role in commutators of the depth being considered.

Step 3: Furstenberg–Weiss commutator argument. We now perform the crucial Furstenberg–Weiss commutator argument. Given $T\subseteq[4]$ , we define $\pi_{T}((v_{1},\ldots,v_{4}))=(v_{i})_{i\in T}$ with the coordinates represented in increasing order of index.

We define

	$\displaystyle\pi_{123}(J_{i})^{\ast}$	$\displaystyle=\pi_{123}(J_{i})\cap\{(v,0,0)\colon v\in\operatorname{Horiz}_{i}(G)\},$
	$\displaystyle\pi_{124}(J_{i})^{\ast}$	$\displaystyle=\pi_{124}(J_{i})\cap\{(v,0,0)\colon v\in\operatorname{Horiz}_{i}(G)\}.$

Note that $\pi_{123}(J_{i})^{\ast}$ and $\pi_{124}(J_{i})^{\ast}$ may (abusively) be viewed as subspaces of $\operatorname{Horiz}_{i}(G)$ . The crucial claim is that

\eta([v_{i_{1}},\ldots,v_{i_{r^{\ast}}}])=0

if $i_{1}+\cdots+i_{r^{\ast}}=s-1$ , each $v_{i_{\ell}}\in\pi_{1}(J_{i_{\ell}})$ , and for two distinct indices $\ell_{1},\ell_{2}$ we have that $v_{i_{\ell_{1}}}\in\pi_{123}(J_{i_{\ell_{1}}})^{\ast}$ and $v_{i_{\ell_{2}}}\in\pi_{124}(J_{i_{\ell_{2}}})^{\ast}$ . Note that $\eta$ lives on $G$ and the commutator brackets are taken with respect to $G$ , not $G^{\otimes 4}$ . The Furstenberg–Weiss commutator argument is required to capture precisely this difference.

Note that an element $v_{i_{\ell}}\in\pi_{1}(J_{i_{\ell}})$ lifts to an element $\widetilde{v_{i_{\ell}}}$ of the form $(v_{i_{\ell}},\cdot,\cdot,\cdot)\in\operatorname{Horiz}_{i_{\ell}}(G)^{\otimes 4}$ . Furthermore note that $v_{i_{\ell}}\in\pi_{123}(J_{i_{\ell}})^{\ast}$ “lifts” to an element $\widetilde{v_{i_{\ell}}}$ of the form $(v_{i_{\ell}},0,0,\cdot)\in\operatorname{Horiz}_{i_{\ell}}(G)^{\otimes 4}$ while $v_{i_{\ell}}\in\pi_{124}(J_{i_{\ell}})^{\ast}$ lifts to an element $\widetilde{v_{i_{\ell}}}$ of the form $(v_{i_{\ell}},0,\cdot,0)\in\operatorname{Horiz}_{i_{\ell}}(G)^{\otimes 4}$ .

Given the above setup, we have

[\widetilde{v_{i_{1}}},\ldots,\widetilde{v_{i_{{r^{\ast}}}}}]=([v_{i_{1}},\ldots,v_{i_{{r^{\ast}}}}],\mathrm{id}_{G},\mathrm{id}_{G},\mathrm{id}_{G}).

To see this, note that the iterated commutator of elements in $G\times\mathrm{Id}_{G}\times\mathrm{Id}_{G}\times G$ (with any elements in $G^{\otimes 4}$ ) remains in the subgroup $G\times\mathrm{Id}_{G}\times\mathrm{Id}_{G}\times G$ ; an analogous fact holds true for $G\times\mathrm{Id}_{G}\times G\times\mathrm{Id}_{G}$ . Since we assumed that our commutator contains elements in both $G\times\mathrm{Id}_{G}\times\mathrm{Id}_{G}\times G$ and $G\times\mathrm{Id}_{G}\times G\times\mathrm{Id}_{G}$ , the commutator must in fact live in $G\times\mathrm{Id}_{G}\times\mathrm{Id}_{G}\times\mathrm{Id}_{G}$ , and the first coordinate of the desired commutators is trivially seen to match.
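The mechanism is already visible in the case $r^{\ast}=2$ , written at the level of Lie brackets taken coordinatewise in the product. This is a sketch: the entries $a$ and $b$ are hypothetical names for the unconstrained fourth and third coordinates of the two lifts.

```latex
% One lift has the shape (v_1, 0, 0, a); the other has the shape (v_2, 0, b, 0).
% Brackets in a product are computed coordinate by coordinate.
\[
[(v_{1},0,0,a),(v_{2},0,b,0)]
=\big([v_{1},v_{2}],[0,0],[0,b],[a,0]\big)
=\big([v_{1},v_{2}],0,0,0\big).
\]
% Every coordinate other than the first contains at least one zero entry,
% so only the first coordinate survives.
```

For larger $r^{\ast}$ the same observation applies to each nested bracket: once a coordinate is zero in one argument, it stays zero in every further commutator.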

Recalling that we have

\eta_{\mathrm{Prod}}([J_{i_{1}},\ldots,J_{i_{r^{\ast}}}])=0,

and noting that $\eta_{\mathrm{Prod}}$ descends to $\eta$ on the subgroup $G_{(s-1,r^{\ast})}\times\mathrm{Id}_{G}^{\otimes 3}$ , we have

\eta([v_{i_{1}},\ldots,v_{i_{{r^{\ast}}}}])=0

as claimed.

Step 4: Finding $(h_{2},h_{3})$ and $(h_{2}^{\prime},h_{4}^{\prime})$ which extend to many “good” $h_{1}$ . Recall that we are looking at the at least $(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}$ fraction of additive quadruples $(h_{1},h_{2},h_{3},h_{4})\in H^{4}\subseteq[N]^{4}$ which are such that $\widetilde{g_{\vec{h}}}$ lives on a specified subgroup $J$ . Call this set of quadruples $\mathcal{S}$ .

So by Markov, there are at least $(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}N$ many $h_{1}\in[N]$ which extend to at least $(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}N^{2}$ quadruples in $\mathcal{S}$ . Thus there are at least $(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}N^{5}$ pairs of additive tuples of the form

(h_{1},h_{2},h_{3},h_{1}+h_{2}-h_{3}),~{}(h_{1},h_{2}^{\prime},h_{1}+h_{2}^{\prime}-h_{4}^{\prime},h_{4}^{\prime})\in\mathcal{S}.

By averaging, there exists a pair of pairs $(h_{2},h_{3})$ and $(h_{2}^{\prime},h_{4}^{\prime})$ such that there are at least $(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}N$ many $h_{1}\in[N]$ which live in such additive tuples. We fix such a pair of pairs and define $\mathcal{T}$ to denote the set of $h_{1}\in[N]$ such that $(h_{1},h_{2},h_{3},h_{1}+h_{2}-h_{3})\in\mathcal{S}$ and $(h_{1},h_{2}^{\prime},h_{1}+h_{2}^{\prime}-h_{4}^{\prime},h_{4}^{\prime})\in\mathcal{S}$ .
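The counting in this step can be made explicit as follows. This is a sketch under simplifying assumptions: $\delta$ is our shorthand for a quantity of the form $(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}$ normalized so that $|\mathcal{S}|\geq\delta N^{3}$ , and $d(h_{1})$ is a hypothetical name for the number of quadruples in $\mathcal{S}$ with first coordinate $h_{1}$ .

```latex
% Setup: a quadruple with fixed h_1 is determined by two further coordinates,
% so d(h_1) \le N^2.
\[
\sum_{h_{1}\in[N]}d(h_{1})=|\mathcal{S}|\geq\delta N^{3},\qquad 0\leq d(h_{1})\leq N^{2}.
\]
% Markov-type step: the h_1 with d(h_1) < \delta N^2/2 contribute less than \delta N^3/2.
\[
\#\{h_{1}\in[N]\colon d(h_{1})\geq\delta N^{2}/2\}\geq\delta N/2.
\]
% Pairs of quadruples in \mathcal{S} sharing the first coordinate h_1:
\[
\sum_{h_{1}\in[N]}d(h_{1})^{2}\geq(\delta N/2)\cdot(\delta N^{2}/2)^{2}=\delta^{3}N^{5}/8.
\]
% Averaging over the at most N^4 choices of (h_2,h_3,h_2',h_4'):
\[
|\mathcal{T}|\geq\delta^{3}N/8\quad\text{for some choice of }(h_{2},h_{3}),(h_{2}^{\prime},h_{4}^{\prime}).
\]
```

All losses are polynomial in $\delta$ , so they are absorbed into the implicit constants in $(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}$ .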

Step 5: Extracting coefficient data. Consider $h_{1}\in\mathcal{T}$ and define

h^{123}=(h_{1},h_{2},h_{3},h_{1}+h_{2}-h_{3}),\quad h^{124}=(h_{1},h_{2}^{\prime},h_{1}+h_{2}^{\prime}-h_{4}^{\prime},h_{4}^{\prime}).

Recall $\mathcal{X}_{i}=(\mathcal{X}\cap\log(G_{(i,1)}))/\log(G_{(i,2)})$ and assign the basis $\exp(\mathcal{X}_{i})$ to $G_{(i,1)}/G_{(i,2)}$ (viewed as a vector space). Finally we assign the basis $\mathcal{Z}_{i}=\bigcup_{Y_{i}\in\exp(\mathcal{X}_{i})}\{(Y_{i},0,0),(0,Y_{i},0),(0,0,Y_{i})\}$ to $(G_{(i,1)}/G_{(i,2)})^{\otimes 3}$ .

By Lemma 2.13, we have

	$\displaystyle\operatorname{Taylor}_{i}(g_{h^{123}}^{\ast})$	$\displaystyle=\operatorname{Taylor}_{i}(\varepsilon_{h^{123}})+\operatorname{Taylor}_{i}(\widetilde{g_{h^{123}}})+\operatorname{Taylor}_{i}(\gamma_{h^{123}})$
	$\displaystyle\operatorname{Taylor}_{i}(g_{h^{124}}^{\ast})$	$\displaystyle=\operatorname{Taylor}_{i}(\varepsilon_{h^{124}})+\operatorname{Taylor}_{i}(\widetilde{g_{h^{124}}})+\operatorname{Taylor}_{i}(\gamma_{h^{124}})$

Therefore, by Lemma 8.2, for all $h_{1}\in\mathcal{T}$ we have

	$\displaystyle\operatorname{dist}(\operatorname{Taylor}_{i}((g_{h_{1}},g_{h_{2}},g_{h_{3}})),\pi_{123}(J_{i})+T_{h_{1}}^{-1}\operatorname{Horiz}_{i}(\Gamma^{\otimes 3}))\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}N^{-i},$
	$\displaystyle\operatorname{dist}(\operatorname{Taylor}_{i}((g_{h_{1}},g_{h_{2}^{\prime}},g_{h_{4}^{\prime}})),\pi_{124}(J_{i})+T_{h_{1}}^{\prime-1}\operatorname{Horiz}_{i}(\Gamma^{\otimes 3}))\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}N^{-i},$

where $T_{h_{1}}$ and $T_{h_{1}}^{\prime}$ are positive integers bounded by $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ . Here we have identified the basis $\mathcal{Z}_{i}$ (for $(G_{(i,1)}/G_{(i,2)})^{\otimes 3}$ ) with the standard basis vectors in $\mathbb{R}^{3\dim(\operatorname{Horiz}_{i}(G))}$ and taken the $L^{\infty}$ metric on the latter (for the notion of $\operatorname{dist}$ ). At the cost of shrinking the set $\mathcal{T}$ by a multiplicative factor of $(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}$ we may assume that $T_{h_{1}}=T$ and $T_{h_{1}}^{\prime}=T^{\prime}$ for all $h_{1}\in\mathcal{T}$ .

We now consider a basis $\mathcal{B}_{i}$ for $\pi_{123}(J_{i})$ which is in row-echelon form where one orders the coordinates corresponding to the second copy of $G$ (in the four-fold $G^{\otimes 4}$ ) at the front, then the third copy, and then the first copy. In particular, the “final block” of basis vectors spans $\pi_{123}(J_{i})^{\ast}$ . Note that one can take such $\mathcal{B}_{i}$ such that the coordinates are integers bounded by $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ due to the rationality of $\pi_{123}(J_{i})$ .

For $h_{1},h_{1}^{\prime}\in\mathcal{T}$ , we have

(8.1)		$\displaystyle\operatorname{Taylor}_{i}((g_{h_{1}},g_{h_{2}},g_{h_{3}}))$	$\displaystyle=\sum_{R_{j}\in\mathcal{B}_{i}}a_{j}R_{j}+T^{-1}\mathbb{Z}^{3\dim(\operatorname{Horiz}_{i}(G))}+v_{h_{1}}$
(8.2)		$\displaystyle\operatorname{Taylor}_{i}((g_{h_{1}^{\prime}},g_{h_{2}},g_{h_{3}}))$	$\displaystyle=\sum_{R_{j}\in\mathcal{B}_{i}}a_{j}^{\prime}R_{j}+T^{-1}\mathbb{Z}^{3\dim(\operatorname{Horiz}_{i}(G))}+v_{h_{1}^{\prime}}$

where $\lVert v_{h_{1}}\rVert_{\infty},\lVert v_{h_{1}^{\prime}}\rVert_{\infty}\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}N^{-i}$ . For each basis vector $R_{j}\in\mathcal{B}_{i}$ where the first nonzero element is either in coordinates corresponding to second or third copy of $G$ , there exists a dual vector which is zero on the coordinates corresponding to the first copy of $G$ and whose inner product with all of $\mathcal{B}_{i}$ but $R_{j}$ is zero.

Call this vector $v_{j}$ and note one may take $v_{j}$ to have integral coordinates bounded by $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ and divisible by $T$ . Then from (8.1) and (8.2),

0=v_{j}\cdot(\operatorname{Taylor}_{i}((g_{h_{1}},g_{h_{2}},g_{h_{3}}))-\operatorname{Taylor}_{i}((g_{h_{1}^{\prime}},g_{h_{2}},g_{h_{3}})))=M_{j}(a_{j}-a_{j}^{\prime})+\mathbb{Z}\pm(MD/\rho)^{O_{s}(d^{O_{s}(1)})}N^{-i},

where $M_{j}$ is a nonzero integer bounded by $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ . (That is, $M_{j}(a_{j}-a_{j}^{\prime})$ is within $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}N^{-i}$ of an integer.)

We may now use this information about such indices $j$ in conjunction with (8.1) and (8.2). We deduce that for all $h_{1},h_{1}^{\prime}\in\mathcal{T}$ ,

\operatorname{dist}(\operatorname{Taylor}_{i}(g_{h_{1}})-\operatorname{Taylor}_{i}(g_{h_{1}^{\prime}}),\pi_{123}(J_{i})^{\ast}+{T_{1}}^{-1}\mathbb{Z}^{\dim(\operatorname{Horiz}_{i}(G))})\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}N^{-i},

where $T_{1}$ is an integer of size bounded by $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ . Analogously,

\operatorname{dist}(\operatorname{Taylor}_{i}(g_{h_{1}})-\operatorname{Taylor}_{i}(g_{h_{1}^{\prime}}),\pi_{124}(J_{i})^{\ast}+{T_{1}^{\prime}}^{-1}\mathbb{Z}^{\dim(\operatorname{Horiz}_{i}(G))})\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}N^{-i}

where $T_{1}^{\prime}$ is an integer bounded by $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ . Putting it together, we may deduce that

\operatorname{dist}(\operatorname{Taylor}_{i}(g_{h_{1}})-\operatorname{Taylor}_{i}(g_{h_{1}^{\prime}}),\pi_{123}(J_{i})^{\ast}\cap\pi_{124}(J_{i})^{\ast}+T_{2}^{-1}\mathbb{Z}^{\dim(\operatorname{Horiz}_{i}(G))})\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}N^{-i}

with $T_{2}$ a nonzero integer bounded by $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ . To see this, simply construct a bounded integral basis of the orthogonal complement of $\pi_{123}(J_{i})^{\ast}\cap\pi_{124}(J_{i})^{\ast}$ (treated as a subspace of the dual space to $G_{(i,1)}/G_{(i,2)}\simeq\mathbb{R}^{\dim(\operatorname{Horiz}_{i}(G))}$ ). Then the two input inequalities imply that any basis vector for the intersection space dual will map $T_{1}T_{1}^{\prime}(\operatorname{Taylor}_{i}(g_{h_{1}})-\operatorname{Taylor}_{i}(g_{h_{1}^{\prime}}))$ to a near-integral scalar, which gives the claim.

Now by Lemma 2.13 we therefore have

(8.3)

\operatorname{dist}(\operatorname{Taylor}_{i}(g_{h_{1}}g_{h_{1}^{\prime}}^{-1}),\pi_{123}(J_{i})^{\ast}\cap\pi_{124}(J_{i})^{\ast}+T_{2}^{-1}\mathbb{Z}^{\dim(\operatorname{Horiz}_{i}(G))})\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}N^{-i}.

It is also trivial by restricting the factorization to the first coordinate that

(8.4)

\operatorname{dist}(\operatorname{Taylor}_{i}(g_{h_{1}}),\pi_{1}(J_{i})+T_{3}^{-1}\mathbb{Z}^{\dim(\operatorname{Horiz}_{i}(G))})\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}N^{-i}

with $T_{3}$ a nonzero integer bounded by $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ .

Step 6: Extracting initial factorizations. For the remainder of the proof fix $h_{1}^{\ast}\in\mathcal{T}$ . Given $h_{1}^{\prime}\in\mathcal{T}$ we have

g_{h_{1}^{\prime}}=g_{h_{1}^{\prime}}g_{h_{1}^{\ast}}^{-1}\cdot g_{h_{1}^{\ast}}.

Note that $\pi_{1}(J_{i})$ and $\pi_{123}(J_{i})^{\ast}\cap\pi_{124}(J_{i})^{\ast}$ may each be defined as the kernel of a set of $i$ -th horizontal characters (on $G$ ) of height at most $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ . Recall (8.3) and (8.4). Scaling the horizontal characters by at most $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ and applying Lemma B.2, we may write

	$\displaystyle g_{h_{1}^{\prime}}g_{h_{1}^{\ast}}^{-1}$	$\displaystyle=\varepsilon_{h_{1}^{\prime}}\cdot\widetilde{g_{h_{1}^{\prime}}}\cdot\gamma_{h_{1}^{\prime}},$
	$\displaystyle g_{h_{1}^{\ast}}$	$\displaystyle=\varepsilon\cdot\widetilde{g^{\prime}}\cdot\gamma^{\prime},$

where:

•

$\varepsilon_{h_{1}^{\prime}}(0)=\widetilde{g_{h_{1}^{\prime}}}(0)=\gamma_{h_{1}^{\prime}}(0)=\varepsilon(0)=\widetilde{g^{\prime}}(0)=\gamma^{\prime}(0)=\mathrm{id}_{G}$ ;
•

$\operatorname{Taylor}_{i}(\widetilde{g_{h_{1}^{\prime}}})\in\pi_{123}(J_{i})^{\ast}\cap\pi_{124}(J_{i})^{\ast}$ and $\operatorname{Taylor}_{i}(\widetilde{g^{\prime}})\in\pi_{1}(J_{i})$ ;
•

$\gamma_{h_{1}^{\prime}},\gamma^{\prime}$ are $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ -rational;
•

$d_{G}(\varepsilon_{h_{1}^{\prime}}(n),\varepsilon_{h_{1}^{\prime}}(n-1))+d_{G}(\varepsilon(n),\varepsilon(n-1))\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}N^{-1}$ for $n\in[N]$ .

Therefore

g_{h_{1}^{\prime}}=g_{h_{1}^{\prime}}g_{h_{1}^{\ast}}^{-1}\cdot g_{h_{1}^{\ast}}=\varepsilon_{h_{1}^{\prime}}\varepsilon\cdot(\varepsilon^{-1}\widetilde{g_{h_{1}^{\prime}}}\varepsilon)\cdot(\varepsilon^{-1}\gamma_{h_{1}^{\prime}}\varepsilon\gamma_{h_{1}^{\prime}}^{-1})\cdot(\gamma_{h_{1}^{\prime}}\cdot\widetilde{g^{\prime}}\gamma_{h_{1}^{\prime}}^{-1})\cdot\gamma_{h_{1}^{\prime}}\gamma^{\prime}.
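The five-factor expression above can be checked by direct cancellation. The following is a sketch using only the two factorizations $g_{h_{1}^{\prime}}g_{h_{1}^{\ast}}^{-1}=\varepsilon_{h_{1}^{\prime}}\widetilde{g_{h_{1}^{\prime}}}\gamma_{h_{1}^{\prime}}$ and $g_{h_{1}^{\ast}}=\varepsilon\,\widetilde{g^{\prime}}\gamma^{\prime}$ stated before it.

```latex
% Cancel adjacent inverse pairs from left to right.
\begin{align*}
&\varepsilon_{h_{1}^{\prime}}\varepsilon\cdot(\varepsilon^{-1}\widetilde{g_{h_{1}^{\prime}}}\varepsilon)
\cdot(\varepsilon^{-1}\gamma_{h_{1}^{\prime}}\varepsilon\gamma_{h_{1}^{\prime}}^{-1})
\cdot(\gamma_{h_{1}^{\prime}}\widetilde{g^{\prime}}\gamma_{h_{1}^{\prime}}^{-1})
\cdot\gamma_{h_{1}^{\prime}}\gamma^{\prime}\\
&\qquad=\varepsilon_{h_{1}^{\prime}}\widetilde{g_{h_{1}^{\prime}}}
\cdot\gamma_{h_{1}^{\prime}}\varepsilon\gamma_{h_{1}^{\prime}}^{-1}
\cdot\gamma_{h_{1}^{\prime}}\widetilde{g^{\prime}}\gamma_{h_{1}^{\prime}}^{-1}
\cdot\gamma_{h_{1}^{\prime}}\gamma^{\prime}\\
&\qquad=\big(\varepsilon_{h_{1}^{\prime}}\widetilde{g_{h_{1}^{\prime}}}\gamma_{h_{1}^{\prime}}\big)
\cdot\big(\varepsilon\,\widetilde{g^{\prime}}\gamma^{\prime}\big)
=g_{h_{1}^{\prime}}g_{h_{1}^{\ast}}^{-1}\cdot g_{h_{1}^{\ast}}.
\end{align*}
```

The point of the regrouping is that the smooth part $\varepsilon_{h_{1}^{\prime}}\varepsilon$ sits on the left, the rational part $\gamma_{h_{1}^{\prime}}\gamma^{\prime}$ sits on the right, and the middle factors are conjugates or commutators whose Taylor coefficients are controlled by Lemma 2.13.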

By Lemma 2.13, we have

	$\displaystyle\operatorname{Taylor}_{i}(\gamma_{h_{1}^{\prime}}\widetilde{g^{\prime}}\gamma_{h_{1}^{\prime}}^{-1})=\operatorname{Taylor}_{i}(\widetilde{g^{\prime}})$	$\displaystyle\in\pi_{1}(J_{i}),$
	$\displaystyle\operatorname{Taylor}_{i}((\varepsilon^{-1}\widetilde{g_{h_{1}^{\prime}}}\varepsilon)\cdot(\varepsilon^{-1}\gamma_{h_{1}^{\prime}}\varepsilon\gamma_{h_{1}^{\prime}}^{-1}))=\operatorname{Taylor}_{i}(\widetilde{g_{h_{1}^{\prime}}})$	$\displaystyle\in\pi_{123}(J_{i})^{\ast}\cap\pi_{124}(J_{i})^{\ast}.$

We say that $h_{1}^{\prime},h_{1}^{\prime\prime}\in\mathcal{T}$ have matching rational parts if

(\gamma_{h_{1}^{\prime}}\gamma^{\prime})^{-1}\cdot(\gamma_{h_{1}^{\prime\prime}}\gamma^{\prime})

is a polynomial sequence valued in $\Gamma$ . By restricting $\mathcal{T}$ to an appropriate subset of density $(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}$ , we may assume that all $h_{1}^{\prime}\in\mathcal{T}$ have matching rational parts. (This is most easily seen in first-kind coordinates: if $\gamma_{h_{1}^{\prime}}\gamma^{\prime}$ and $\gamma_{h_{1}^{\prime\prime}}\gamma^{\prime}$ have all coefficients differing by $T_{4}\cdot\operatorname{span}(\mathcal{X},\mathbb{Z})$ where $T_{4}$ is an appropriate integer of size bounded by $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ then the two sequences match up to a polynomial sequence in $\Gamma$ .)

So, ultimately we may assume that for all $h_{1}^{\prime}\in\mathcal{T}$ we have

\displaystyle g_{h_{1}^{\prime}}=\varepsilon_{h_{1}^{\prime}}^{\ast}\cdot g_{h_{1}^{\prime}}^{\ast}\cdot\gamma^{\ast}\cdot\widetilde{\gamma_{h_{1}^{\prime}}}

where:

•

$\varepsilon_{h_{1}^{\prime}}^{\ast}(0)=g_{h_{1}^{\prime}}^{\ast}(0)=\gamma^{\ast}(0)=\widetilde{\gamma_{h_{1}^{\prime}}}(0)=\mathrm{id}_{G}$ ;
•

$\operatorname{Taylor}_{i}(g_{h_{1}^{\prime}}^{\ast}\cdot(g_{h_{1}^{\prime\prime}}^{\ast})^{-1})\in\pi_{123}(J_{i})^{\ast}\cap\pi_{124}(J_{i})^{\ast}$ and $\operatorname{Taylor}_{i}(g_{h_{1}^{\prime}}^{\ast})\in\pi_{1}(J_{i})$ for all $h_{1}^{\prime\prime}\in\mathcal{T}$ ;
•

$\gamma^{\ast}$ is $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ -rational;
•

$\widetilde{\gamma_{h_{1}^{\prime}}}$ takes values in $\Gamma$ ;
•

$d_{G}(\varepsilon_{h_{1}^{\prime}}^{\ast}(n),\varepsilon_{h_{1}^{\prime}}^{\ast}(n-1))\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}N^{-1}$ for $n\in[N]$ .

Step 7: Removing periodic and smooth pieces of factorization. Let $Q$ be the period of $\gamma^{\ast}\Gamma$ and let $\delta=(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}$ be a parameter to be chosen later. We break $[N]$ into a collection of arithmetic progressions with difference $Q$ and length between $\delta N$ and $2\delta N$ ; there are at most $\delta^{-1}$ such progressions. Call these progressions $P_{1},\ldots,P_{\ell}$ and note that

\bigg{\lVert}\mathbb{E}_{n\in[N]}(\Delta_{h}f)(n)\sum_{i=1}^{\ell}\mathbbm{1}_{n\in P_{i}}\cdot\chi(h,n)\otimes\chi_{h}(n)\cdot\psi_{h}(n)\bigg{\rVert}_{\infty}\geq\rho

where $\psi_{h}(n)$ is the degree $(s-2)$ nilsequence coming from the condition $\Delta_{h}f(n)\otimes\overline{\chi(h,n)}\otimes\overline{\chi_{h}(n)}\in\operatorname{Corr}(s-1,\rho,M,d)$ from the original correlation structure. For $h\in\mathcal{T}$ we may write

\chi_{h}(n)=F(g_{h}(n)\Gamma)=F(\varepsilon_{h}^{\ast}g_{h}^{\ast}\gamma^{\ast}\Gamma);

here we are using that $\widetilde{\gamma_{h_{1}^{\prime}}}$ takes values in $\Gamma$ so may be dropped for the remainder of the analysis.

Since $Q$ is the period of $\gamma^{\ast}$ , we may replace $\gamma^{\ast}$ by a value $\gamma_{P_{i}}$ for each progression where $\gamma_{P_{i}}\gamma^{\ast}(n)^{-1}\in\Gamma$ for $n\in P_{i}$ and $\lVert\psi(\gamma_{P_{i}})\rVert_{\infty}\leq 1$ . Then

\bigg{\lVert}\mathbb{E}_{n\in[N]}(\Delta_{h}f)(n)\cdot\chi(h,n)\otimes\bigg{(}\sum_{i=1}^{\ell}\mathbbm{1}_{n\in P_{i}}F(\varepsilon_{h}^{\ast}g_{h}^{\ast}\gamma_{P_{i}}\Gamma)\bigg{)}\cdot\psi_{h}(n)\bigg{\rVert}_{\infty}\geq\rho.

Furthermore as $\varepsilon_{h}^{\ast}$ is sufficiently smooth we may replace $\varepsilon_{h}^{\ast}$ with the constant $\varepsilon_{h,P_{i}}=\varepsilon_{h}^{\ast}(\min(P_{i}))$ and have

\bigg{\lVert}\mathbb{E}_{n\in[N]}(\Delta_{h}f)(n)\cdot\chi(h,n)\otimes\bigg{(}\sum_{i=1}^{\ell}\mathbbm{1}_{n\in P_{i}}F(\varepsilon_{h,P_{i}}g_{h}^{\ast}\gamma_{P_{i}}\Gamma)\bigg{)}\cdot\psi_{h}(n)\bigg{\rVert}_{\infty}\geq\rho/2,

as long as $\delta$ was chosen sufficiently small.
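The size of the loss in this replacement can be sketched as follows. This is only a sketch under assumptions not stated in this section: that $d_{G}$ is right-invariant, that $F$ is $M$ -Lipschitz with respect to the induced metric on $G/\Gamma$ , and that each $P_{i}$ has diameter at most $2Q\delta N$ (which holds under either reading of "length"). The symbol $C$ abbreviates the factor $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ in the smoothness bound, $m=\min(P_{i})$ , $n\in P_{i}$ , and $x\in G$ is arbitrary; these names are introduced purely for illustration.

```latex
% Sketch only; C, m, x are illustrative notation, not taken from the paper.
\begin{align*}
d_G(\varepsilon_h^{\ast}(n),\varepsilon_h^{\ast}(m))
  &\le \sum_{k=m+1}^{n} d_G(\varepsilon_h^{\ast}(k),\varepsilon_h^{\ast}(k-1))
   \le (n-m)\,C N^{-1} \le 2Q\delta C, \\
\lVert F(\varepsilon_h^{\ast}(n)x\Gamma)-F(\varepsilon_h^{\ast}(m)x\Gamma)\rVert_{\infty}
  &\le M\, d_G(\varepsilon_h^{\ast}(n)x,\varepsilon_h^{\ast}(m)x)
   = M\, d_G(\varepsilon_h^{\ast}(n),\varepsilon_h^{\ast}(m))
   \le 2MQ\delta C.
\end{align*}
```

If the remaining factors in the average are $1$ -bounded and $Q\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ , then any $\delta$ with $2MQ\delta C\leq\rho/2$ accounts for the drop from $\rho$ to $\rho/2$ , which is consistent with the choice $\delta=(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}$ .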

By the triangle inequality there exists some $P_{i}$ which is distance at least $\delta^{1/2}N$ from the ends of the interval $[N]$ such that

\lVert\mathbb{E}_{n\in[N]}(\Delta_{h}f)(n)\cdot\chi(h,n)\otimes\mathbbm{1}_{n\in P_{i}}F(\varepsilon_{h,P_{i}}g_{h}^{\ast}\gamma_{P_{i}}\Gamma)\cdot\psi_{h}(n)\rVert_{\infty}\geq\delta^{2}.

By paying a $\delta^{O(1)}$ -fraction in the size of $\mathcal{T}$ we may assume that the choice of index $i$ is independent of $h$ , hence writing $P_{i}=P$ . Furthermore note that there is a $\delta^{O(1)}$ -net of size $\delta^{-O_{s}(d)}$ for the set of $g$ satisfying $d_{G}(g,\mathrm{id}_{G})\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ . If the net size is chosen small enough, we may shift $\varepsilon_{h,P_{i}}$ to a nearby value in the net without much loss. Then we can pay a $\delta^{O_{s}(d)}$ -fraction in the size of $\mathcal{T}$ to Pigeonhole onto a single point in the net, writing $\varepsilon_{h,P_{i}}=\varepsilon_{P}$ .

Overall, for all $h\in\mathcal{T}$ we have

\lVert\mathbb{E}_{n\in[N]}(\Delta_{h}f)(n)\cdot\chi(h,n)\otimes\mathbbm{1}_{P}(n)F(\varepsilon_{P}g_{h}^{\ast}\gamma_{P}\Gamma)\cdot\psi_{h}(n)\rVert_{\infty}\geq\delta^{3}

for some $P$ at least $\delta^{1/2}N$ from the endpoints of the interval. Thus by Lemma 7.1, for each $h\in\mathcal{T}$ there exists $\Theta_{h}$ with $\lVert Q\cdot\Theta_{h}\rVert_{\mathbb{R}/\mathbb{Z}}\leq\delta^{-O(1)}N^{-1}$ and

(8.5)

\lVert\mathbb{E}_{n\in[N]}(\Delta_{h}f)(n)\cdot\chi(h,n)\otimes e(\Theta_{h}n)F(\varepsilon^{\ast}g_{h}^{\ast}\gamma_{P}\Gamma)\cdot\psi_{h}(n)\rVert_{\infty}\geq\delta^{O(1)}.

Rounding $\Theta_{h}$ to a net of distance $\delta^{O(1)}N^{-1}$ and paying a $Q^{-1}\delta^{O(1)}$ -fraction in the size of $\mathcal{T}$ to Pigeonhole the resulting point, we may write $\Theta_{h}=\Theta$ for all $h\in\mathcal{T}$ . We are now finally in position to define the output data. Define

	$\displaystyle g_{h}^{\prime}$	$\displaystyle=\gamma_{P}^{-1}g_{h}^{\ast}\gamma_{P},$
	$\displaystyle F^{\prime}$	$\displaystyle=F(\varepsilon^{\ast}\gamma_{P}\cdot),$
	$\displaystyle\chi_{h}^{\prime}(n)$	$\displaystyle=F^{\prime}(g_{h}^{\prime}(n)),$
	$\displaystyle\chi^{\prime}(h,n)$	$\displaystyle=\chi(h,n)\cdot e(\Theta n).$

Note that $g^{\prime}(h,n)=(g(h,n),\Theta n)$ is the polynomial sequence underlying $\chi^{\prime}$ , and $\chi^{\prime}(h,n)={F^{\ast}}^{\prime}(g^{\prime}(h,n){\Gamma^{\ast}}^{\prime})$ . It is easy to check the relevant properties of Definition 6.1 to see that we obtain a degree-rank $(s-1,r^{\ast})$ correlation structure with appropriately modified underlying parameters (we set $H^{\prime}$ to be the final refined version of $\mathcal{T}$ ); in particular, (8.5) demonstrates the necessary correlation fact.
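The pigeonholing cost for $\Theta_{h}$ quoted before the definition of the output data can be checked by a short count. In the sketch below, $A$ and $B$ are hypothetical names for the implicit exponents in $\lVert Q\cdot\Theta_{h}\rVert_{\mathbb{R}/\mathbb{Z}}\leq\delta^{-A}N^{-1}$ and in the net spacing $\delta^{B}N^{-1}$ , and we assume $\delta^{-A}N^{-1}<1/2$ .

```latex
% Sketch only; A and B are illustrative names for the implicit O(1) exponents.
\begin{align*}
\{\theta\in\mathbb{R}/\mathbb{Z}\colon \lVert Q\theta\rVert_{\mathbb{R}/\mathbb{Z}}\le\delta^{-A}N^{-1}\}
  &=\bigcup_{a=0}^{Q-1}\Big\{\theta\colon \lVert\theta-a/Q\rVert_{\mathbb{R}/\mathbb{Z}}\le\delta^{-A}(QN)^{-1}\Big\}, \\
\#\{\text{net points needed}\}
  &\le Q\cdot\bigg(\frac{2\delta^{-A}(QN)^{-1}}{\delta^{B}N^{-1}}+1\bigg)
   =2\delta^{-A-B}+Q\le 3Q\,\delta^{-A-B}.
\end{align*}
```

Hence some net point is the rounded value of $\Theta_{h}$ for at least a $Q^{-1}\delta^{A+B}/3$ fraction of $h\in\mathcal{T}$ , matching the stated $Q^{-1}\delta^{O(1)}$ loss.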

Finally, taking

V_{i,\mathrm{Dep}}=\pi_{123}(J_{i})^{\ast}\cap\pi_{124}(J_{i})^{\ast}\text{ and }V_{i}=\pi_{1}(J_{i})

we finish the proof: in particular, the result from Step 3 demonstrates the final item of the conclusion, and the result from Step 6 demonstrates the third item. ∎

9. Linearization Step

We now come to the second crucial argument of this paper. Prior to this stage we have modified the degree-rank $(s-1,r^{\ast})$ correlation structure to one in which various Taylor coefficients of $g_{h}$ for $h\in H$ (upon factoring) differ only on certain special subspaces. In this next stage, we deduce that either these Taylor coefficients differ on a further refined subspace which is seen to be essentially “annihilated” by $\eta$ , or $g_{h}$ has a certain “bracket linear” form. This step is ultimately where we invoke the results of Sanders [52] on quasi-polynomial bounds for the Bogolyubov lemma.

This step is closely modeled after [32, Step 2] and the closely related proof of [34, Lemma 11.5]; a quantitative version for the $U^{4}$ -inverse theorem due to the first author can be found in [43]. The precise statement of the lemma should also be compared with [34, Theorem 11.1(ii)].

Lemma 9.1.

Fix $s\geq 2$ and $1\leq r^{\ast}\leq s-1$ . Let $f\colon[N]\to\mathbb{C}$ be a $1$ -bounded function. Suppose that $f$ has a degree-rank $(s-1,r^{\ast})$ correlation structure with parameters $\rho$ , $M$ , $d$ , and $D$ and that $N\geq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ . Furthermore let $\mathcal{X}_{i}=(\mathcal{X}\cap\log(G_{(i,1)}))/\log(G_{(i,2)})$ .

We output a new degree-rank $(s-1,r^{\ast})$ correlation structure for $f$ with parameters

\rho^{\prime-1}\leq\exp(O_{s}((d\log(MD/\rho))^{O_{s}(1)})),\quad M^{\prime}\leq O(M),\quad D^{\prime}=D,\quad d^{\prime}\leq O(d),

•

$(G^{\ast})^{\prime}$ is given the multidegree filtration

$(G^{\ast})^{\prime}_{(i,j)}=(G^{\ast})_{(i,j)}\times\{0\}$

if $(i,j)\neq(0,0)$ or $(0,1)$ . For $(i,j)\in\{(0,0),(0,1)\}$ , we set

$(G^{\ast})^{\prime}_{(i,j)}=(G^{\ast})_{(i,j)}\times\mathbb{R}.$

We have ${F^{\ast}}^{\prime}((x,z)(\Gamma^{\ast}\times\mathbb{Z}))=F^{\ast}(x\Gamma^{\ast})\cdot e(z)$ . We have $g^{\prime}(h,n)=(g(h,n),\Theta n)$ for some appropriate value of $\Theta$ ;
•

There is a collection of $\mathbb{R}$ -vector spaces $W_{i,\ast},W_{i,\mathrm{Lin}},W_{i,\mathrm{Pet}}\leqslant G_{(i,1)}/G_{(i,2)}$ for each $i$ ;
•

If $W_{i}:=W_{i,\ast}+W_{i,\mathrm{Lin}}+W_{i,\mathrm{Pet}}$ then $\dim(W_{i})=\dim(W_{i,\ast})+\dim(W_{i,\mathrm{Lin}})+\dim(W_{i,\mathrm{Pet}})$ , i.e., the three spaces are linearly disjoint;
•

There exist bases $\mathcal{X}_{i,\ast}$ , $\mathcal{X}_{i,\mathrm{Lin}}$ , and $\mathcal{X}_{i,\mathrm{Pet}}$ of the corresponding spaces which are composed of $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ -rational combinations of elements of $(\mathcal{X}\cap G_{(i,1)})/G_{(i,2)}$ ;

•

For $1\leq i\leq s-1$ and $h,h_{1},h_{2}\in H^{\prime}$ we have

	$\displaystyle\operatorname{Taylor}_{i}(g_{h}^{\prime})$	$\displaystyle\in W_{i,\ast}+W_{i,\mathrm{Lin}}+W_{i,\mathrm{Pet}}=W_{i},$
	$\displaystyle\operatorname{Taylor}_{i}(g_{h_{1}}^{\prime})-\operatorname{Taylor}_{i}(g_{h_{2}}^{\prime})$	$\displaystyle\in W_{i,\mathrm{Lin}}+W_{i,\mathrm{Pet}},$
	$\displaystyle\operatorname{Proj}_{W_{i,\mathrm{Lin}}}(\operatorname{Taylor}_{i}(g_{h}^{\prime}))$	$\displaystyle=\sum_{Z_{i,j}\in\mathcal{X}_{i,\mathrm{Lin}}}\bigg{(}\gamma_{i,j}+\sum_{k=1}^{d^{\ast}}\alpha_{i,j,k}\{\beta_{k}h\}\bigg{)}Z_{i,j},$

with $d^{\ast}\leq(d\log(MD/\rho))^{O_{s}(1)}$ and $\beta_{k}\in(1/N^{\prime})\mathbb{Z}$ where $N^{\prime}$ is a prime in $[100N,200N]$ ;

•

$F^{\prime}$ is $M^{\prime}$ -Lipschitz and has the same vertical frequency $\eta$ as $F$ ;
•

For any integers $i_{1}+\cdots+i_{r^{\ast}}=s-1$ , suppose that $v_{i_{j}}\in V_{i_{j}}$ for all $j$ . If for at least one index $\ell$ we have $v_{i_{\ell}}\in W_{i_{\ell},\mathrm{Pet}}$ , then if $w$ is any $(r^{\ast}-1)$ -fold commutator of $v_{i_{1}},\ldots,v_{i_{r^{\ast}}}$ we have

$\eta(w)=0.$

Furthermore, if instead for at least two indices $\ell_{1},\ell_{2}$ we have $v_{i_{\ell_{1}}}\in W_{i_{\ell_{1}},\mathrm{Lin}}$ and $v_{i_{\ell_{2}}}\in W_{i_{\ell_{2}},\mathrm{Lin}}$ , then if $w$ is any $(r^{\ast}-1)$ -fold commutator of $v_{i_{1}},\ldots,v_{i_{r^{\ast}}}$ we have

$\eta(w)=0.$

Remark.

The projection map $\operatorname{Proj}_{W_{i,\mathrm{Lin}}}\colon W_{i}\to W_{i,\mathrm{Lin}}$ is well-defined due to the linear disjointness condition. Furthermore we have written Taylor coefficients with additive notation, since $G_{(i,1)}/G_{(i,2)}$ can be identified with $\mathbb{R}^{\dim(\operatorname{Horiz}_{i}(G))}$ .

Proof.

For the majority of the proof we will assume $s\geq 3$ ; we indicate the minor changes required for $s=2$ at the end of the proof (and the case $s=2$ is not used in the proof of Theorem 1.2). Note that the case when $\eta$ is trivial follows via taking $W_{i,\mathrm{Pet}}=G_{(i,1)}/G_{(i,2)}$ , $W_{i,\mathrm{Lin}}$ and $W_{i,\ast}$ to be trivial, $g_{h}^{\prime}=g_{h}$ , and $g^{\prime}(h,n)=(g(h,n),0)$ ; therefore we may assume that $\eta$ is nontrivial for the remainder of the proof.

Step 1: Applying Lemma 8.3 and linear-algebraic setup. We apply Lemma 8.3 and treat the resulting correlation structure as the input to the lemma. Up to changing implicit constants in the output this leaves the lemma unchanged except for noting that

\chi(h,n)=e(\Theta n)\cdot F^{\ast}(g(h,n)\Gamma^{\ast})

which is defined on the group $(G^{\ast})^{\prime}=G^{\ast}\times\mathbb{R}$ . In particular, we will abusively overwrite notation and relabel the resulting $H^{\prime}$ from the application of Lemma 8.3 as $H$ , $g_{h}^{\prime}$ as $g_{h}$ , and $\chi^{\prime}_{h}(n)$ as $\chi_{h}(n)$ and thus assume the output properties without further comment.

It will also be crucial to define certain linear-algebraic operators on $V_{i}$ . Consider a basis for $V_{i,\mathrm{Dep}}$ , an extension to a basis of $V_{i}$ , and then to $G_{(i,1)}/G_{(i,2)}$ such that all basis elements are $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ -rational combinations of the basis $\exp(\mathcal{X}_{i})~{}\mathrm{mod}~{}G_{(i,2)}$ . In particular, write

	$\displaystyle V_{i,\mathrm{Dep}}$	$\displaystyle=\operatorname{span}_{\mathbb{R}}(w_{i,1},\ldots,w_{i,\dim(V_{i,\mathrm{Dep}})})$
	$\displaystyle V_{i}$	$\displaystyle=\operatorname{span}_{\mathbb{R}}(w_{i,1},\ldots,w_{i,\dim(V_{i,\mathrm{Dep}})},w_{i,\dim(V_{i,\mathrm{Dep}})+1},\ldots,w_{i,\dim(V_{i})}),$
	$\displaystyle G_{(i,1)}/G_{(i,2)}$	$\displaystyle=\operatorname{span}_{\mathbb{R}}(w_{i,1},\ldots,w_{i,\dim(\operatorname{Horiz}_{i}(G))}).$

Given $v\in V_{i}$ , there is a unique linear combination

v=\sum_{j=1}^{\dim(V_{i})}\alpha_{j}w_{i,j}.

We define

P_{i}v=\sum_{j=\dim(V_{i,\mathrm{Dep}})+1}^{\dim(V_{i})}\alpha_{j}w_{i,j},\qquad Q_{i}v=\sum_{j=1}^{\dim(V_{i,\mathrm{Dep}})}\alpha_{j}w_{i,j}.

By construction $P_{i}^{2}=P_{i}$ , $Q_{i}^{2}=Q_{i}$ , $Q_{i}(V_{i})\cap P_{i}(V_{i})=0$ , and $P_{i}v+Q_{i}v=v$ for $v\in V_{i}$ . We also (abusively) extend the operator $P_{i}$ to $V_{i}^{\otimes\ell}$ and $(G_{(i,1)}/G_{(i,2)})^{\otimes 4}$ in the obvious manner by acting on each copy of $V_{i}$ separately (and zeroing out basis elements $w_{i,\dim(V_{i})+1},\ldots,w_{i,\dim(\operatorname{Horiz}_{i}(G))}$ ).
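A toy instance may help fix the roles of $P_{i}$ and $Q_{i}$ . The following sketch assumes, purely for illustration, that $\dim(V_{i,\mathrm{Dep}})=1$ and $\dim(V_{i})=2$ , and it drops the index $i$ ; the alternative basis vector $w_{2}^{\prime}$ is hypothetical.

```latex
% Toy example; the dimensions and the vector w_2' are illustrative assumptions.
\begin{align*}
v&=\alpha_{1}w_{1}+\alpha_{2}w_{2},
 &Qv&=\alpha_{1}w_{1},
 &Pv&=\alpha_{2}w_{2},
 &P(Pv)&=Pv,\quad P(Qv)=0. \\
\intertext{With the alternative extension $w_{2}^{\prime}=w_{2}+w_{1}$ of the basis of $V_{\mathrm{Dep}}$ one has}
v&=(\alpha_{1}-\alpha_{2})w_{1}+\alpha_{2}w_{2}^{\prime},
 &Q^{\prime}v&=(\alpha_{1}-\alpha_{2})w_{1},
 &P^{\prime}v&=\alpha_{2}w_{2}^{\prime}\neq Pv
 &&\text{if }\alpha_{2}\neq 0.
\end{align*}
```

In particular the operators depend on the chosen extension of the basis and not only on the pair of subspaces $V_{i,\mathrm{Dep}}\leqslant V_{i}$ , which is why the basis is fixed once and for all at this point.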

Step 2: Invoking equidistribution theory. Applying Lemma 7.5 when $s\geq 3$ , we have

\displaystyle\lVert\mathbb{E}[\chi_{h_{1}}(n)\otimes\chi_{h_{2}}(n+h_{1}-h_{4})\otimes\overline{\chi_{h_{3}}(n)}\otimes\overline{\chi_{h_{4}}(n+h_{1}-h_{4})}\cdot\psi_{\vec{h}}(g_{\vec{h}}(n)\Gamma^{\prime})]\rVert_{\infty}\geq(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}

for a $(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}$ density of additive tuples. We define $G_{\mathrm{Error}}$ , $\widetilde{G}$ , and $\eta_{\mathrm{Prod}}$ as in the proof of Lemma 8.3 and as before we may assume that $g_{\vec{h}}(0)=\mathrm{id}_{G_{\mathrm{Error}}}$ . Define

g_{\vec{h}}^{\ast}(n)=(g_{h_{1}}(n),g_{h_{2}}(n+h_{1}-h_{4}),g_{h_{3}}(n),g_{h_{4}}(n+h_{1}-h_{4}),g_{\vec{h}}(n)).

By applying Corollary 5.5, we have

g_{\vec{h}}^{\ast}=\varepsilon_{\vec{h}}\cdot\widetilde{g_{\vec{h}}}\cdot\gamma_{\vec{h}}

with

•

$\varepsilon_{\vec{h}}(0)=\widetilde{g_{\vec{h}}}(0)=\gamma_{\vec{h}}(0)=\mathrm{id}_{\widetilde{G}}$ ;
•

$\widetilde{g_{\vec{h}}}$ takes values in $K$ ;
•

$\gamma_{\vec{h}}$ is $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ -rational;
•

$d(\varepsilon_{\vec{h}}(n),\varepsilon_{\vec{h}}(n-1))\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}N^{-1}$ for $n\in[N]$ ,

where $\eta_{\mathrm{Prod}}(K\cap\widetilde{G}_{(s-1,r^{\ast})})=0$ and $K$ is a $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ -rational subgroup of $\widetilde{G}$ . By passing to a subset of additive quadruples of density $(MD/\rho)^{-O_{s}(d^{O_{s}(1)})}$ we may in fact assume that the group $K$ is independent of $\vec{h}$ under consideration.

Step 3: Linear algebra deductions from equidistribution theory. Note that at present the subgroup $K$ does not account for the deductions given in Lemma 8.3; these initial deductions are designed essentially to account for this. Let $\tau_{i}\colon\operatorname{Horiz}_{i}(G)^{\otimes 4}\times\operatorname{Horiz}_{i}(G_{\mathrm{Error}})\to\operatorname{Horiz}_{i}(G)^{\otimes 4}$ be the natural projection to the four-fold product. We define the following set of vector spaces:

	$\displaystyle R_{i}$	$\displaystyle:=\{(Q_{i}v_{1},Q_{i}v_{2},Q_{i}v_{3},Q_{i}v_{4})\in V_{i}^{\otimes 4}\colon Q_{i}v_{1}+Q_{i}v_{2}-Q_{i}v_{3}-Q_{i}v_{4}=0\},$
	$\displaystyle K_{i}$	$\displaystyle:=\tau_{i}(K\cap\widetilde{G}_{(i,1)}~{}\mathrm{mod}~{}\widetilde{G}_{(i,2)}),$
	$\displaystyle S_{i}$	$\displaystyle:=\{(v_{1},v_{2},v_{3},v_{4})\in V_{i}^{\otimes 4}\colon P_{i}v_{1}=P_{i}v_{2}=P_{i}v_{3}=P_{i}v_{4}\},$
	$\displaystyle K_{i,1}$	$\displaystyle:=K_{i}\cap S_{i},$
	$\displaystyle\widetilde{K_{i}}$	$\displaystyle:=K_{i,1}+R_{i},$
	$\displaystyle L_{i}$	$\displaystyle:=\pi_{1}(\widetilde{K_{i}}\cap\{(v_{1},v_{2},v_{3},v_{4})\in V_{i}^{\otimes 4}\colon Q_{i}v_{2}=Q_{i}v_{3}=Q_{i}v_{4}=0\})+Q_{i}(V_{i}).$

By inspection, we have $R_{i}\leqslant S_{i}$ hence $\widetilde{K_{i}}\leqslant S_{i}$ . Note that

\eta_{\mathrm{Prod}}([K_{i_{1},1},\ldots,K_{i_{r^{\ast}},1}])=0

whenever one has that $i_{1}+\cdots+i_{r^{\ast}}=s-1$ and $[\cdot,\ldots,\cdot]$ denotes any possible $(r^{\ast}-1)$ -fold commutator bracket. This is a consequence of the fact that $\eta_{\mathrm{Prod}}(K\cap\widetilde{G}_{(s-1,r^{\ast})})=0$ and noting that $K_{i,1}\leqslant K_{i}$ . Note that we are implicitly using that $\eta_{\mathrm{Prod}}$ is trivial on $G_{\mathrm{Error}}$ as well, and we abusively descend $\eta_{\mathrm{Prod}}$ to $G^{\otimes 4}$ .

We now claim that

\eta_{\mathrm{Prod}}([v_{i_{1}},\ldots,v_{i_{r^{\ast}}}])=0

if $v_{i_{\ell}}\in S_{i_{\ell}}$ for all $\ell$ and there is at least one index $j$ such that $v_{i_{j}}\in R_{i_{j}}$ .

To prove this, note by the final bullet point of Lemma 8.3 and multilinearity that

	$\displaystyle\eta_{\mathrm{Prod}}([v_{i_{1}},\ldots,v_{i_{r^{\ast}}}])$	$\displaystyle=\eta_{\mathrm{Prod}}([P_{i_{1}}v_{i_{1}},\ldots,P_{i_{r^{\ast}}}v_{i_{r^{\ast}}}])+\sum_{k=1}^{r^{\ast}}\eta_{\mathrm{Prod}}([P_{i_{1}}v_{i_{1}},\ldots,Q_{i_{k}}v_{i_{k}},\ldots,P_{i_{r^{\ast}}}v_{i_{r^{\ast}}}])$
		$\displaystyle=\eta_{\mathrm{Prod}}([P_{i_{1}}v_{i_{1}},\ldots,Q_{i_{j}}v_{i_{j}},\ldots,P_{i_{r^{\ast}}}v_{i_{r^{\ast}}}])=0.$

The first equality uses that every bracket with at least two $Q_{i_{k}}v_{i_{k}}$ has two $V_{i,\mathrm{Dep}}$ terms so is $0$ , the second equality uses $P_{i_{j}}v_{i_{j}}=0$ , and the third equality follows by noting that

P_{i}v_{i}\in\{(v_{1},v_{2},v_{3},v_{4})\in V_{i}^{\otimes 4}\colon P_{i}v_{1}=P_{i}v_{2}=P_{i}v_{3}=P_{i}v_{4},Q_{i}v_{1}=Q_{i}v_{2}=Q_{i}v_{3}=Q_{i}v_{4}=0\}

and $\eta_{\mathrm{Prod}}=(\eta,\eta,-\eta,-\eta)$ . Now, we may ultimately deduce

\eta_{\mathrm{Prod}}([\widetilde{K_{i_{1}}},\ldots,\widetilde{K_{i_{r^{\ast}}}}])=0

because $\widetilde{K_{i}}=K_{i,1}+R_{i}$ and $R_{i},K_{i,1}\leqslant S_{i}$ .
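The vanishing of the single surviving bracket in the claim above can be unpacked coordinatewise. The sketch below assumes, as the text does when invoking multilinearity, that $\eta\circ[\cdot,\ldots,\cdot]$ is multilinear in the horizontal components and that brackets in $G^{\otimes 4}$ are computed coordinate by coordinate; the names $p_{\ell}$ and $q_{1},\ldots,q_{4}$ are introduced only for this illustration.

```latex
% Sketch only; p_l and q_1,...,q_4 are illustrative names.
% For l \neq j write P_{i_l} v_{i_l} = (p_l, p_l, p_l, p_l)   (since v_{i_l} \in S_{i_l}),
% and Q_{i_j} v_{i_j} = (q_1, q_2, q_3, q_4) with q_1 + q_2 - q_3 - q_4 = 0   (since v_{i_j} \in R_{i_j}).
\begin{align*}
\eta_{\mathrm{Prod}}([P_{i_{1}}v_{i_{1}},\ldots,Q_{i_{j}}v_{i_{j}},\ldots,P_{i_{r^{\ast}}}v_{i_{r^{\ast}}}])
 &=\eta([p_{1},\ldots,q_{1},\ldots,p_{r^{\ast}}])+\eta([p_{1},\ldots,q_{2},\ldots,p_{r^{\ast}}]) \\
 &\quad-\eta([p_{1},\ldots,q_{3},\ldots,p_{r^{\ast}}])-\eta([p_{1},\ldots,q_{4},\ldots,p_{r^{\ast}}]) \\
 &=\eta([p_{1},\ldots,q_{1}+q_{2}-q_{3}-q_{4},\ldots,p_{r^{\ast}}])=0.
\end{align*}
```

The sign pattern $(\eta,\eta,-\eta,-\eta)$ of $\eta_{\mathrm{Prod}}$ is exactly what pairs with the defining relation of $R_{i_{j}}$ .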

Finally, let $\pi_{T}$ for $T\subseteq[4]$ be as in the proof of Lemma 8.3 (namely, an appropriate projection map). We have

\pi_{1}(\widetilde{K_{i}})\leqslant L_{i}.

This follows because if

((Qv_{1},Pv_{1}),(Qv_{2},Pv_{2}),(Qv_{3},Pv_{3}),(Qv_{4},Pv_{4}))\in\widetilde{K_{i}}

then

((Q(v_{1}+v_{2}-v_{3}-v_{4}),Pv_{1}),(0,Pv_{2}),(0,Pv_{3}),(0,Pv_{4}))\in\widetilde{K_{i}}.
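For completeness, the implication just stated, and the way it yields $\pi_{1}(\widetilde{K_{i}})\leqslant L_{i}$ , can be spelled out as follows. The names $x$ and $r$ are introduced only for this sketch, and elements of $R_{i}$ are written in the same $(Q\text{-part},P\text{-part})$ notation with vanishing $P$ -parts.

```latex
% Sketch only; x and r are illustrative names.
\begin{align*}
x&=((Qv_{1},Pv_{1}),(Qv_{2},Pv_{2}),(Qv_{3},Pv_{3}),(Qv_{4},Pv_{4}))\in\widetilde{K_{i}}, \\
r&=((Q(v_{2}-v_{3}-v_{4}),0),(-Qv_{2},0),(-Qv_{3},0),(-Qv_{4},0)), \\
&\quad Q(v_{2}-v_{3}-v_{4})+(-Qv_{2})-(-Qv_{3})-(-Qv_{4})=0
 \;\Longrightarrow\; r\in R_{i}\leqslant\widetilde{K_{i}}, \\
x+r&=((Q(v_{1}+v_{2}-v_{3}-v_{4}),Pv_{1}),(0,Pv_{2}),(0,Pv_{3}),(0,Pv_{4}))\in\widetilde{K_{i}}, \\
\pi_{1}(x)&=\pi_{1}(x+r)-Q(v_{2}-v_{3}-v_{4})\in L_{i}.
\end{align*}
```

The last membership uses that $x+r$ has vanishing $Q$ -parts in its final three coordinates, so $\pi_{1}(x+r)$ lies in $L_{i}$ by definition, together with $Q_{i}(V_{i})\leqslant L_{i}$ .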

Step 4: Constructing a decomposition of $Q_{i}(V_{i})$ . We will now decompose $Q_{i}(V_{i})=V_{i,\mathrm{Dep}}$ into a pair of subspaces. On one of these subspaces we will deduce an improved vanishing for the commutator while on the other subspace we will deduce an approximate linearity for $\operatorname{Taylor}_{i}(g_{h})$ . Let

L_{i}^{\ast}=\{(v_{1},v_{2},v_{3},v_{4})\in S_{i}\colon Pv_{1}=0,v_{2}=v_{3}=v_{4}=0\}\cap\widetilde{K_{i}}.

Note that $L_{i}^{\ast}$ may abusively be viewed as a subspace of $V_{i}$ (instead of $V_{i}^{\otimes 4}$ ) and under this identification $L_{i}^{\ast}\leqslant Q_{i}(V_{i})=V_{i,\mathrm{Dep}}\leqslant L_{i}$ .

The key claim in our analysis is that if $i_{1}+\cdots+i_{r^{\ast}}=s-1$ , $v_{i_{\ell}}\in L_{i_{\ell}}$ for all indices $\ell$ , and $v_{i_{j}}\in L_{i_{j}}^{\ast}$ for at least one index $j$ , then we have

\eta([v_{i_{1}},\ldots,v_{i_{r^{\ast}}}])=0.

To prove this, note that $Q_{i_{j}}v_{i_{j}}=v_{i_{j}}$ and $P_{i_{j}}v_{i_{j}}=0$ and using the last bullet point of Lemma 8.3, we have

\eta([v_{i_{1}},\ldots,v_{i_{r^{\ast}}}])=\eta([P_{i_{1}}v_{i_{1}},\ldots,Q_{i_{j}}v_{i_{j}},\ldots,P_{i_{r^{\ast}}}v_{i_{r^{\ast}}}]),

similar to the argument in Step 3.

Next note that $P_{i}Q_{i}v=0$ for all $v\in V_{i}$ and therefore

P_{i}(L_{i})\leqslant P_{i}(\pi_{1}(\widetilde{K_{i}}\cap\{(v_{1},v_{2},v_{3},v_{4})\in V_{i}^{\otimes 4}\colon Q_{i}v_{2}=Q_{i}v_{3}=Q_{i}v_{4}=0\})).

Therefore we may lift $P_{i_{\ell}}v_{i_{\ell}}$ for $\ell\neq j$ to $\widetilde{v_{i_{\ell}}}=(P_{i_{\ell}}v_{i_{\ell}}+w_{i_{\ell}},P_{i_{\ell}}v_{i_{\ell}},P_{i_{\ell}}v_{i_{\ell}},P_{i_{\ell}}v_{i_{\ell}})\in\widetilde{K_{i_{\ell}}}$ where $w_{i_{\ell}}\in Q_{i_{\ell}}(V_{i_{\ell}})$ . We lift $v_{i_{j}}$ to $\widetilde{v_{i_{j}}}$ which has the form $(Q_{i_{j}}v_{i_{j}},0,0,0)\in\widetilde{K_{i_{j}}}$ .

Note that we have

	$\displaystyle 0$	$\displaystyle=\eta_{\mathrm{Prod}}([\widetilde{v_{i_{1}}},\ldots,\widetilde{v_{i_{r^{\ast}}}}])=\eta([P_{i_{1}}v_{i_{1}}+w_{i_{1}},\ldots,Q_{i_{j}}v_{i_{j}},\ldots,P_{i_{r^{\ast}}}v_{i_{r^{\ast}}}+w_{i_{r^{\ast}}}])$
		$\displaystyle=\eta([P_{i_{1}}v_{i_{1}},\ldots,Q_{i_{j}}v_{i_{j}},\ldots,P_{i_{r^{\ast}}}v_{i_{r^{\ast}}}])$

where in the first equality we have used for all $\ell$ that $\widetilde{v_{i_{\ell}}}\in\widetilde{K_{i_{\ell}}}$ and the result from Step 3, in the second equality that $\widetilde{v_{i_{j}}}$ has the final three coordinates identically zero, and in the final equality that $w_{i_{\ell}}\in Q_{i_{\ell}}(V_{i_{\ell}})=V_{i_{\ell},\mathrm{Dep}}$ and the final item of Lemma 8.3.

The desired decomposition of spaces for the lemma will have

W_{i,\mathrm{Pet}}:=L_{i}^{\ast},\quad W_{i,\ast}:=P_{i}(L_{i})\leqslant L_{i}\cap P_{i}(V_{i}).

The fact $P_{i}(L_{i})\leqslant L_{i}\cap P_{i}(V_{i})$ is deduced from $Q_{i}(V_{i})\leqslant L_{i}$ . $W_{i,\mathrm{Lin}}$ will be constructed explicitly in the next step but is chosen so that

W_{i,\mathrm{Lin}}\leqslant Q_{i}(L_{i})=Q_{i}(V_{i})=V_{i,\mathrm{Dep}}

and $W_{i,\mathrm{Lin}}+W_{i,\mathrm{Pet}}=V_{i,\mathrm{Dep}}\leqslant L_{i}$ . Given these properties of $W_{i,\mathrm{Lin}}$ , note that the above analysis, along with Lemma 8.3, establishes the final bullet point for our output.

Step 5: Controlling approximate homomorphisms. Recall $\widetilde{K_{i}}\leqslant S_{i}$ and there is a natural isomorphism of groups

S_{i}\simeq\{(v,v_{1},v_{2},v_{3},v_{4})\colon v\in P_{i}(V_{i}),v_{1},\ldots,v_{4}\in Q_{i}(V_{i})\}.

Using this as an identification, we may write

\widetilde{K_{i}}=\bigcap_{j=1}^{\dim(S_{i})-\dim(\widetilde{K_{i}})}\operatorname{ker}((\xi_{j}^{P_{i}},\xi_{j}^{Q_{i}},\xi_{j}^{Q_{i}},-\xi_{j}^{Q_{i}},-\xi_{j}^{Q_{i}}))

where $\xi_{j}^{P_{i}}\in P_{i}(V_{i})^{\vee}$ and $\xi_{j}^{Q_{i}}\in Q_{i}(V_{i})^{\vee}$ (i.e., corresponding dual vector spaces). Note that the annihilators all have the special form of $(\cdot,\xi_{j}^{Q_{i}},\xi_{j}^{Q_{i}},-\xi_{j}^{Q_{i}},-\xi_{j}^{Q_{i}})$ since

\{(Q_{i}v_{1},Q_{i}v_{2},Q_{i}v_{3},Q_{i}v_{4})\in V_{i}^{\otimes 4}\colon Q_{i}v_{1}+Q_{i}v_{2}-Q_{i}v_{3}-Q_{i}v_{4}=0\}=R_{i}\leqslant\widetilde{K_{i}}.

Note that

L_{i}^{\ast}=\{v\in V_{i}\colon P_{i}v=0\text{ and }\xi_{j}^{Q_{i}}(Q_{i}v)=0\text{ for all }j\}

since $v\in L_{i}^{\ast}$ is equivalent under this identification to $(0,Q_{i}v,0,0,0)\in\widetilde{K_{i}}$ . Without loss of generality we may assume that for $1\leq j\leq\dim(V_{i,\mathrm{Dep}})-\dim(L_{i}^{\ast})$ , vectors $\xi_{j}^{Q_{i}}$ are independent in $Q_{i}(V_{i})^{\vee}$ (and they must span the orthogonal space to $L_{i}^{\ast}$ within $Q_{i}(V_{i})^{\vee}$ ).
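Both observations above reduce to evaluating the functionals on a few explicit vectors. In the sketch below a general linear functional on $S_{i}$ is written as $(\xi,a_{1},a_{2},a_{3},a_{4})$ under the identification; the names $a_{1},\ldots,a_{4}$ and the test vectors are introduced only for illustration, with $q\in Q_{i}(V_{i})$ arbitrary.

```latex
% Sketch only; a_1,...,a_4 and the test vectors are illustrative.
% (1) A functional vanishing on R_i has the special sign pattern.
%     The vectors (0,q,-q,0,0), (0,q,0,q,0), (0,q,0,0,q) all lie in R_i, hence
\begin{align*}
a_{1}(q)-a_{2}(q)=0,\qquad a_{1}(q)+a_{3}(q)=0,\qquad a_{1}(q)+a_{4}(q)=0
\;\Longrightarrow\; (a_{1},a_{2},a_{3},a_{4})=(a_{1},a_{1},-a_{1},-a_{1}).
\end{align*}
% (2) Membership in L_i^* tests only the Q-part of the first coordinate.
\begin{align*}
(\xi_{j}^{P_{i}},\xi_{j}^{Q_{i}},\xi_{j}^{Q_{i}},-\xi_{j}^{Q_{i}},-\xi_{j}^{Q_{i}})(0,Q_{i}v,0,0,0)
 =\xi_{j}^{P_{i}}(0)+\xi_{j}^{Q_{i}}(Q_{i}v)=\xi_{j}^{Q_{i}}(Q_{i}v).
\end{align*}
```

Consequently the restrictions of the $\xi_{j}^{Q_{i}}$ span the annihilator of $L_{i}^{\ast}$ in $Q_{i}(V_{i})^{\vee}$ , which has dimension $\dim(V_{i,\mathrm{Dep}})-\dim(L_{i}^{\ast})$ ; this is the count behind the choice of independent functionals.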

By appropriate scaling, we may assume $\xi_{j}^{Q_{i}}(w_{i,j^{\prime}})$ is an integer bounded by $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ for $1\leq j^{\prime}\leq\dim(V_{i,\mathrm{Dep}})$ . We extend each $\xi_{j}^{Q_{i}}$ to an element of $(G_{(i,1)}/G_{(i,2)})^{\vee}$ by setting $\xi_{j}^{Q_{i}}(w_{i,j^{\prime}})=0$ for $j^{\prime}>\dim(V_{i,\mathrm{Dep}})$ . Possibly at the cost of another $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ scaling, we may assume that $\xi_{j}^{Q_{i}}(\Gamma\cap G_{(i,1)}~{}\mathrm{mod}~{}G_{(i,2)})\in\mathbb{Z}$ . We extend $\xi_{j}^{P_{i}}(\cdot)$ in an analogous manner to $(G_{(i,1)}/G_{(i,2)})^{\vee}$ by setting $\xi_{j}^{P_{i}}(w_{i,j^{\prime}})=0$ for $1\leq j^{\prime}\leq\dim(V_{i,\mathrm{Dep}})$ and $j^{\prime}>\dim(V_{i})$ . Again, we may scale such that $\xi_{j}^{P_{i}}(\Gamma\cap G_{(i,1)}~{}\mathrm{mod}~{}G_{(i,2)})\in\mathbb{Z}$ . The crucial point here is that now $\xi_{j}^{P_{i}}$ and $\xi_{j}^{Q_{i}}$ are $i$ -th horizontal characters of height at most $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ .

We have

\tau_{i}(\operatorname{Taylor}_{i}(\widetilde{g}_{\vec{h}}))\in K_{i}

and thus

\operatorname{dist}(\tau_{i}(\operatorname{Taylor}_{i}(g_{\vec{h}}^{\ast})),S_{i}+T^{-1}\operatorname{Horiz}_{i}(\Gamma^{\otimes 4}))\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}N^{-i}

after Pigeonholing $\vec{h}$ appropriately. Here distance is in $L^{\infty}$ after expressing both of these expressions in the basis $\exp(\mathcal{X}_{i})^{\otimes 4}$ and $T$ is an integer bounded by $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ . We have used Lemma 2.13 and the properties of the original factorization; a very similar argument appears in Step 5 of the proof of Lemma 8.3.

Furthermore note that

\tau_{i}(\operatorname{Taylor}_{i}(g_{\vec{h}}^{\ast}))\in S_{i}

by Lemma 8.3. So if we choose a set of horizontal characters of height $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ relative to $\operatorname{Horiz}_{i}(\Gamma^{\otimes 4})$ which cut out $\widetilde{K_{i}}$ as their common kernel, then noting that $K_{i}\cap S_{i}\leqslant\widetilde{K_{i}}$ and applying Lemma B.2 we may assume that

(9.1)

\tau_{i}(\operatorname{Taylor}_{i}(\widetilde{g}_{\vec{h}}))\in\widetilde{K_{i}}

and $\varepsilon_{\vec{h}},\gamma_{\vec{h}}$ have identical properties up to changing implicit constants. We will assume this refined property of the factorization for the remainder of our analysis.

Given the factorization of $g_{\vec{h}}^{\ast}$ , we thus deduce (taking an appropriate least common multiple)

	$\displaystyle\bigg{\lVert}T_{1}\cdot\bigg{(}\xi_{j}^{P_{i}}$	$\displaystyle(\operatorname{Taylor}_{i}(g_{h_{2}}))+\xi_{j}^{Q_{i}}(\operatorname{Taylor}_{i}(g_{h_{1}}))+\xi_{j}^{Q_{i}}(\operatorname{Taylor}_{i}(g_{h_{2}}))$
(9.2)			$\displaystyle\quad-\xi_{j}^{Q_{i}}(\operatorname{Taylor}_{i}(g_{h_{3}}))-\xi_{j}^{Q_{i}}(\operatorname{Taylor}_{i}(g_{h_{4}}))\bigg{)}\bigg{\rVert}_{\mathbb{R}/\mathbb{Z}}\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}N^{-i}$

for all $1\leq j\leq\dim(V_{i,\mathrm{Dep}})-\dim(L_{i}^{\ast})$ where $T_{1}$ is an integer bounded by $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ . Here we have used that $\xi_{j}^{P_{i}}(\operatorname{Taylor}_{i}(g_{h}))$ is equal for all $h\in H$ by Lemma 8.3.

We define functions $f,g\colon H\to\mathbb{R}^{\sum_{i=1}^{s-1}\dim(V_{i,\mathrm{Dep}})-\dim(L_{i}^{\ast})}$ via

	$\displaystyle f(h)$	$\displaystyle=(T_{1}\xi_{j}^{Q_{i}}(\operatorname{Taylor}_{i}(g_{h})))_{1\leq i\leq s-1,~{}1\leq j\leq\dim(V_{i,\mathrm{Dep}})-\dim(L_{i}^{\ast})},$
	$\displaystyle g(h)$	$\displaystyle=(T_{1}\xi_{j}^{P_{i}}(\operatorname{Taylor}_{i}(g_{h}))+T_{1}\xi_{j}^{Q_{i}}(\operatorname{Taylor}_{i}(g_{h})))_{1\leq i\leq s-1,~{}1\leq j\leq\dim(V_{i,\mathrm{Dep}})-\dim(L_{i}^{\ast})}.$

Note that for the additive quadruples on which we have (9.2), we are exactly in the situation necessary to apply results on approximate homomorphisms.
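To make the connection explicit, (9.2) can be rewritten coordinatewise in terms of $f$ and $g$ . In the sketch below $C$ abbreviates the factor $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ and the subscript $(i,j)$ denotes the corresponding coordinate of the vector-valued functions; both are notation introduced only for this illustration.

```latex
% Sketch only; C and the coordinate subscripts (i,j) are illustrative notation.
\begin{align*}
f(h_{1})_{i,j}+g(h_{2})_{i,j}-f(h_{3})_{i,j}-f(h_{4})_{i,j}
 &=T_{1}\Big(\xi_{j}^{P_{i}}(\operatorname{Taylor}_{i}(g_{h_{2}}))
   +\xi_{j}^{Q_{i}}(\operatorname{Taylor}_{i}(g_{h_{1}}))
   +\xi_{j}^{Q_{i}}(\operatorname{Taylor}_{i}(g_{h_{2}})) \\
 &\qquad\quad-\xi_{j}^{Q_{i}}(\operatorname{Taylor}_{i}(g_{h_{3}}))
   -\xi_{j}^{Q_{i}}(\operatorname{Taylor}_{i}(g_{h_{4}}))\Big), \\
\lVert f(h_{1})_{i,j}+g(h_{2})_{i,j}-f(h_{3})_{i,j}-f(h_{4})_{i,j}\rVert_{\mathbb{R}/\mathbb{Z}}
 &\le C N^{-i}.
\end{align*}
```

Thus the $\xi_{j}^{P_{i}}$ contribution, which is the same for every $h\in H$ , is absorbed into the single function $g$ , and the remaining three terms involve only $f$ .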

In particular, we may apply Lemma A.1. We see that there exists $H^{\prime}\subseteq H$ having density at least $\exp(-O_{s}(d\log(MD/\rho))^{O_{s}(1)})$ such that for all $i,j$ and $h\in H^{\prime}$ , we have

(9.3)

\bigg{\lVert}T_{1}\xi_{j}^{Q_{i}}(\operatorname{Taylor}_{i}(g_{h}))-\bigg{(}\gamma_{i,j}+\sum_{k=1}^{d^{\ast}}\alpha_{i,j,k}\{\beta_{k}h\}\bigg{)}\bigg{\rVert}_{\mathbb{R}/\mathbb{Z}}\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}N^{-i},

where:

•

$d^{\ast}\leq(d\log(MD/\rho))^{O_{s}(1)}$ ;
•

$\beta_{k}\in(1/N^{\prime})\mathbb{Z}$ where $N^{\prime}$ is a prime between $100N$ and $200N$ .

At this point, for each $i$ we find elements $Z_{i,j}$ for $1\leq j\leq\dim(V_{i,\mathrm{Dep}})-\dim(L_{i}^{\ast})$ which are $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ -rational combinations of $\{w_{i,j}\colon 1\leq j\leq\dim(V_{i,\mathrm{Dep}})\}$ such that

(9.4)

T_{1}\xi_{j}^{Q_{i}}(Z_{i,j})=1\text{ and }\xi_{j^{\prime}}^{Q_{i}}(Z_{i,j})=0

for $j^{\prime}\neq j$ such that $1\leq j,j^{\prime}\leq\dim(V_{i,\mathrm{Dep}})-\dim(L_{i}^{\ast})$ . We define

W_{i,\mathrm{Lin}}=\operatorname{span}_{\mathbb{R}}((Z_{i,j})_{1\leq j\leq\dim(V_{i,\mathrm{Dep}})-\dim(L_{i}^{\ast})}).

We see that there are no nontrivial linear relations between $W_{i,\ast}$ and $W_{i,\mathrm{Pet}}+W_{i,\mathrm{Lin}}$ since $W_{i,\ast}\leqslant P_{i}(V_{i})$ and $W_{i,\mathrm{Pet}}+W_{i,\mathrm{Lin}}\leqslant Q_{i}(V_{i})$ . There are no linear relations between $W_{i,\mathrm{Pet}}$ and $W_{i,\mathrm{Lin}}$ as $W_{i,\mathrm{Pet}}$ lies in the joint kernel of the $\xi_{j}^{Q_{i}}$ and therefore using (9.4) one can prove any such relation is trivial. Furthermore, by construction we have $V_{i,\mathrm{Dep}}=Q_{i}(V_{i})=W_{i,\mathrm{Lin}}+W_{i,\mathrm{Pet}}$ . Finally $L_{i}=P_{i}(L_{i})+Q_{i}(L_{i})=W_{i,\ast}+W_{i,\mathrm{Lin}}+W_{i,\mathrm{Pet}}$ ; this implicitly uses $Q_{i}(L_{i})=Q_{i}(V_{i})\leqslant L_{i}$ .
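The use of (9.4) in the disjointness of $W_{i,\mathrm{Pet}}$ and $W_{i,\mathrm{Lin}}$ amounts to the following short computation. Here $m=\dim(V_{i,\mathrm{Dep}})-\dim(L_{i}^{\ast})$ and the coefficients $c_{j}$ are names introduced only for this sketch.

```latex
% Sketch only; m and c_j are illustrative names.
\begin{align*}
w=\sum_{j=1}^{m}c_{j}Z_{i,j}\in W_{i,\mathrm{Pet}}=L_{i}^{\ast}
 \;&\Longrightarrow\;
 0=T_{1}\xi_{j^{\prime}}^{Q_{i}}(w)=\sum_{j=1}^{m}c_{j}\,T_{1}\xi_{j^{\prime}}^{Q_{i}}(Z_{i,j})=c_{j^{\prime}}
 \quad\text{for all }1\le j^{\prime}\le m, \\
 \;&\Longrightarrow\; w=0 .
\end{align*}
% The same computation applied to w = 0 shows that the Z_{i,j} are linearly independent, so
\begin{align*}
\dim(W_{i,\mathrm{Lin}}+W_{i,\mathrm{Pet}})=m+\dim(L_{i}^{\ast})=\dim(V_{i,\mathrm{Dep}}).
\end{align*}
```

Since each $Z_{i,j}$ is a combination of $w_{i,1},\ldots,w_{i,\dim(V_{i,\mathrm{Dep}})}$ , the sum $W_{i,\mathrm{Lin}}+W_{i,\mathrm{Pet}}$ is contained in $V_{i,\mathrm{Dep}}$ , and the dimension count then gives equality.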

Step 6: Constructing the desired factorizations and completing the proof. Using the refined factorization (9.1) implies that

\pi_{1}(\tau_{i}(\operatorname{Taylor}_{i}(\widetilde{g}_{\vec{h}})))\in L_{i}

since $\pi_{1}(\widetilde{K_{i}})\leqslant L_{i}$ . Applying $g_{\vec{h}}^{\ast}=\varepsilon_{\vec{h}}\cdot\widetilde{g_{\vec{h}}}\cdot\gamma_{\vec{h}}$ in the first coordinate then implies that

(9.5)

\operatorname{dist}(\operatorname{Taylor}_{i}(g_{h_{1}}),L_{i}+T_{2}^{-1}\operatorname{Horiz}_{i}(\Gamma))\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}N^{-i}

for $h_{1}\in H^{\prime}$ where distance is in $L^{\infty}$ after expressing values in terms of $\exp(\mathcal{X}_{i})$ . Here $T_{2}$ is an integer bounded by $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ . Furthermore recall from Lemma 8.3 that

(9.6)

\operatorname{Taylor}_{i}(g_{h})-\operatorname{Taylor}_{i}(g_{h^{\prime}})\in V_{i,\mathrm{Dep}}=Q_{i}(V_{i})

for $h,h^{\prime}\in H^{\prime}\subseteq H$ .

Let $Y_{i,j}\in\operatorname{span}_{\mathbb{R}}(\mathcal{X}\cap\log(G_{(i,1)})\setminus\mathcal{X}\cap\log(G_{(i,2)}))$ be such that $\exp(Y_{i,j})~{}\mathrm{mod}~{}G_{(i,2)}=Z_{i,j}$ . Then for $h\in H^{\prime}$ , we define

(9.7)

\widetilde{g}_{h}(n)=\prod_{i=1}^{s}\prod_{j=1}^{\dim(W_{i,\mathrm{Lin}})}\exp(Y_{i,j})^{T_{1}^{-1}\binom{n}{i}\cdot(\gamma_{i,j}+\sum_{k=1}^{d^{\ast}}\alpha_{i,j,k}\{\beta_{k}h\})}.

By construction and Lemma 2.13, for $h,h^{\prime}\in H^{\prime}$ we have

	$\displaystyle\operatorname{Taylor}_{i}(\widetilde{g}_{h}^{-1}g_{h})-\operatorname{Taylor}_{i}(\widetilde{g}_{h^{\prime}}^{-1}g_{h^{\prime}})$	$\displaystyle\in Q_{i}(V_{i}),$
	$\displaystyle\operatorname{dist}(\operatorname{Taylor}_{i}(\widetilde{g}_{h}^{-1}g_{h}),L_{i}+T_{2}^{-1}\operatorname{Horiz}_{i}(\Gamma))$	$\displaystyle\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}\cdot N^{-i},$
	$\displaystyle\lVert T_{1}\xi_{j}^{Q_{i}}(\operatorname{Taylor}_{i}(\widetilde{g}_{h}^{-1}g_{h}))\rVert_{\mathbb{R}/\mathbb{Z}}$	$\displaystyle\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}\cdot N^{-i},$

where $1\leq i\leq s-1$ and $1\leq j\leq\dim(V_{i,\mathrm{Dep}})-\dim(L_{i}^{\ast})$ . The first line comes from (9.6), the second line from (9.5), and the third from (9.3) and (9.7), in conjunction with (9.4).

We now fix an element $h_{2}\in H^{\prime}$ . For each $h_{1}\in H^{\prime}$ we write

g_{h_{1}}^{\prime}=\widetilde{g}_{h_{1}}\cdot(\widetilde{g}_{h_{1}}^{-1}g_{h_{1}}^{\prime})=\widetilde{g}_{h_{1}}\cdot(\widetilde{g}_{h_{1}}^{-1}g_{h_{1}}^{\prime})\cdot(\widetilde{g}_{h_{2}}^{-1}g_{h_{2}}^{\prime})^{-1}\cdot(\widetilde{g}_{h_{2}}^{-1}g_{h_{2}}^{\prime}).

By applying Lemma B.2, we may write

(\widetilde{g}_{h_{1}}^{-1}g_{h_{1}}^{\prime})\cdot(\widetilde{g}_{h_{2}}^{-1}g_{h_{2}}^{\prime})^{-1}=\varepsilon_{h_{1}}^{\ast}g_{h_{1}}^{\ast}\gamma_{h_{1}}^{\ast},\qquad(\widetilde{g}_{h_{2}}^{-1}g_{h_{2}}^{\prime})=\varepsilon^{\ast}g^{\ast}\gamma^{\ast}

where $\gamma^{\ast},\gamma_{h_{1}}^{\ast}$ are $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ -rational, $\varepsilon^{\ast},\varepsilon_{h_{1}}^{\ast}$ are $((MD/\rho)^{O_{s}(d^{O_{s}(1)})},N)$ -smooth, and we have $\operatorname{Taylor}_{i}(g_{h_{1}}^{\ast})\in L_{i}^{\ast}=W_{i,\mathrm{Pet}}$ using the first and third lines above and $\operatorname{Taylor}_{i}(g^{\ast})\in L_{i}^{\ast}+P_{i}(L_{i})=W_{i,\ast}+W_{i,\mathrm{Pet}}$ using the second and third lines above. (Recall that $L_{i}^{\ast}\leqslant Q_{i}(V_{i})$ is cut out by the $\xi_{j}^{Q_{i}}$ .) Additionally, these sequences are the identity at $0$ .

Therefore, for $h_{1}\in H^{\prime}$ we have

	$\displaystyle g_{h_{1}}^{\prime}$	$\displaystyle=\varepsilon_{h_{1}}^{\ast}\varepsilon^{\ast}((\varepsilon_{h_{1}}^{\ast}\varepsilon^{\ast})^{-1}\widetilde{g}_{h_{1}}(\varepsilon_{h_{1}}^{\ast}\varepsilon^{\ast}))((\varepsilon^{\ast})^{-1}g_{h_{1}}^{\ast}\varepsilon^{\ast})((\varepsilon^{\ast})^{-1}\gamma_{h_{1}}^{\ast}\varepsilon^{\ast}(\gamma_{h_{1}}^{\ast})^{-1})(\gamma_{h_{1}}^{\ast}g^{\ast}(\gamma_{h_{1}}^{\ast})^{-1})(\gamma_{h_{1}}^{\ast}\gamma^{\ast})$
		$\displaystyle=:(\varepsilon_{h_{1}}^{\ast}\varepsilon^{\ast})\cdot g_{h_{1}}^{\triangle}\cdot(\gamma_{h_{1}}^{\ast}\gamma^{\ast}).$

So, for $h_{3},h_{4}\in H^{\prime}$ we deduce using Lemma 2.13 and the above analysis that

	$\displaystyle\operatorname{Taylor}_{i}(\widetilde{g}_{h_{3}})$	$\displaystyle\in W_{i,\mathrm{Lin}},$
	$\displaystyle\operatorname{Taylor}_{i}(g_{h_{3}}^{\triangle})$	$\displaystyle\in L_{i}=W_{i,\ast}+W_{i,\mathrm{Lin}}+W_{i,\mathrm{Pet}},$
	$\displaystyle\operatorname{Proj}_{W_{i,\mathrm{Lin}}}(\operatorname{Taylor}_{i}(g_{h_{3}}^{\triangle}))$	$\displaystyle=\operatorname{Proj}_{W_{i,\mathrm{Lin}}}(\operatorname{Taylor}_{i}(\widetilde{g}_{h_{3}})),$
	$\displaystyle\operatorname{Taylor}_{i}(\widetilde{g}_{h_{3}}^{-1}g_{h_{3}}^{\triangle})-\operatorname{Taylor}_{i}(\widetilde{g}_{h_{4}}^{-1}g_{h_{4}}^{\triangle})$	$\displaystyle\in W_{i,\mathrm{Pet}}.$

Furthermore note that $\varepsilon_{h_{1}}^{\ast}\varepsilon^{\ast}$ is sufficiently smooth and $\gamma_{h_{1}}^{\ast}\gamma^{\ast}$ is appropriately rational. This nearly gives the desired result except we need to remove the rational and smooth parts exactly as in Step 7 of Lemma 8.3; we omit the details, although note that the only difference between $g_{h}^{\triangle}$ and the output is a conjugation by a fixed element which leaves all properties unchanged and the Fourier phase on the $\mathbb{R}$ part of $(G^{\ast})^{\prime}$ may be modified. Additionally, the set $H^{\prime}$ will be made smaller by acceptable factors due to Pigeonhole.

Step 7: Handling the exceptional case $s=2$ . In this exceptional case, we have $r^{\ast}=1$ and $s=2$ , and $\eta$ is nontrivial. The difference here versus the prior analysis is that the error term $\psi_{h}(g_{\vec{h}}(n)\Gamma^{\prime})$ is replaced by $e(\Theta_{\vec{h}}n)$ with $\lVert\Theta_{\vec{h}}\rVert_{\mathbb{R}/\mathbb{Z}}\leq(MD/\rho)^{O_{s}(d^{O_{s}(1)})}N^{-1}$ by Remark 7.6 regarding Lemma 7.5 for $s=2$ .

We take $G^{\mathrm{Error}}=\mathbb{R}$ , $\Gamma^{\mathrm{Error}}=\mathbb{Z}$ , $g_{\vec{h}}(n)=\Theta_{\vec{h}}n$ , and $\psi_{\vec{h}}(z)=e(z)$ . $\widetilde{G}$ is defined as before. Taking $\eta^{\ast}=(\eta,\eta,-\eta,-\eta,1)$ , by Corollary 5.5 we may factor

g_{\vec{h}}^{\ast}=\varepsilon_{\vec{h}}\cdot\widetilde{g}_{\vec{h}}\cdot\gamma_{\vec{h}}

where $\varepsilon_{\vec{h}}$ is $((MD/\rho)^{O_{s}(d^{O_{s}(1)})},N)$ -smooth, $\gamma_{\vec{h}}$ is $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ -rational, and $\widetilde{g}_{\vec{h}}$ lies in a $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ -rational subgroup $K$ such that $\eta^{\ast}(K\cap\widetilde{G}_{(1,1)})=0$ . Note however that

g_{\vec{h}}^{\ast}=(\mathrm{id}_{G},\mathrm{id}_{G},\mathrm{id}_{G},\mathrm{id}_{G},\Theta_{\vec{h}}n)\cdot(\tau(g_{\vec{h}}^{\ast}),0)

where $\tau\colon\widetilde{G}\to G^{\otimes 4}$ is the natural projection. Let $K^{\ast}=K\cap(G^{\otimes 4}\times\{0\})$ and note that $K^{\ast}$ can be defined as the joint kernel of certain horizontal characters of height $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ (namely, ones defining $K$ along with one of the form $(0,0,0,0,1)$ ). Since $\Theta_{\vec{h}}$ is small and is the only part in the fifth coordinate, arguments similar to before allow us to refine the first factorization (up to changing implicit constants) and instead assume that $\widetilde{g}_{\vec{h}}$ lies in $K^{\ast}$ .

Furthermore note that if $\eta_{\mathrm{Prod}}=(\eta,\eta,-\eta,-\eta,0)$ we have that $\eta_{\mathrm{Prod}}(K^{\ast})=0$ as $\eta_{\mathrm{Prod}}$ and $\eta^{\ast}$ agree on the initial four groups. At this point we are exactly in the situation of the earlier analysis and we may complete the proof. (Various simplifications are possible in this case since the underlying groups are all abelian; in particular, invoking Corollary 5.5 reduces to summing a geometric series.) ∎

We remark that modulo minor annoyances, the strategy of using Lemma 7.5, deducing an approximate homomorphism, and then applying results coming from the Bogolyubov lemma was introduced by Gowers [16] in his seminal work on four-term arithmetic progressions. It was similarly applied in work of Green and Tao [23] on the $U^{3}$ -inverse theorem. In a certain sense, the previous two sections can be thought of as showing that, given an appropriate equidistribution theorem and defining a number of notions for nilmanifolds, this analysis can be modified to make sense in the greater generality of nilmanifolds where the group is not abelian.

10. Setup for extracting a $(1,s-1)$ -nilsequence

Before diving into the formal proof, we motivate how we extract the “top degree-rank” part and why lifting to the universal nilmanifold plays a role in our argument at this stage. We remark that Green, Tao, and Ziegler [34] work with the universal nilmanifold throughout their argument (in the form of a representation of a degree-rank nilcharacter; see [34, Definition 9.11]).

Recall the bracket polynomial $U^{5}$ -inverse sketch discussed in Section 4; we started with functions

e\bigg{(}\sum_{i=1}^{d_{1}}a_{i,h}n[b_{i,h}n][c_{i,h}n]+\sum_{i=1}^{d_{2}}d_{i,h}n^{2}[e_{i,h}n]+\sum_{i=1}^{d_{3}}f_{i,h}n[g_{i,h}n]+j_{h}n^{3}+\ell_{h}n^{2}+m_{h}n\bigg{)}

which correlate with $\Delta_{h}f$ . At this point, we have proven that

\sum_{i=1}^{d_{1}}a_{i,h}n[b_{i,h}n][c_{i,h}n]

is equivalent, up to terms of lower degree-rank, to a bracket polynomial of the form

\sum_{i=1}^{d_{1}^{\prime}}\delta_{i}\{\varepsilon_{i}h\}n[\beta_{i,\ast}n][\gamma_{i,\ast}n].

Our goal at this stage is to isolate

e\bigg{(}\sum_{i=1}^{d_{1}^{\prime}}\delta_{i}\{\varepsilon_{i}h\}n[\beta_{i,\ast}n][\gamma_{i,\ast}n]\bigg{)};

in the next section we will then convert this “top degree-rank” bracket phase into a $(1,s-1)$ -nilsequence.

The reason lifting to a universal nilmanifold proves so technically useful is that it enables us to isolate various components of the horizontal tori as “separate subgroups”. For the sake of simplicity, consider a $2$ -step group $G$ in the $U^{4}$ -inverse case given the degree-rank filtration $G_{(0,0)}=G_{(1,0)}=G_{(1,1)}=G$ , $G_{(2,0)}=G_{(2,1)}=G_{(2,2)}=[G,G]$ , where the remaining groups are trivial. In this case, the output of Lemma 9.1 gives the linearly disjoint subspaces $W_{\ast}$ , $W_{\mathrm{Lin}}$ , $W_{\mathrm{Pet}}$ of $V=G/[G,G]$ such that the commutator of any two elements in $W_{\mathrm{Lin}}+W_{\mathrm{Pet}}$ vanishes and the commutator of any element of $W_{\ast}$ and $W_{\mathrm{Pet}}$ vanishes.

Let $\mathcal{Z}_{\ast}$ denote the rational basis of $\log(W_{\ast})~{}\mathrm{mod}~{}[G,G]$ and $\mathcal{Z}_{\mathrm{Lin}}$ and $\mathcal{Z}_{\mathrm{Pet}}$ be analogous. We also have a decomposition of our polynomial

g_{h}=g_{h,\ast}+g_{h,\mathrm{Lin}}+g_{h,\mathrm{Pet}}~{}\mathrm{mod}~{}[G,G]

where

	$\displaystyle g_{h,\mathrm{Lin}}(n)$	$\displaystyle=\prod_{i=1}^{\dim(W_{\mathrm{Lin}})}\exp(\delta_{i}n\{\varepsilon_{i}h\}Z_{i}^{\mathrm{Lin}}),\qquad g_{h,\ast}(n)=\prod_{i=1}^{\dim(W_{\ast})}\exp(\beta_{i}nZ_{i}^{\ast}),$
	$\displaystyle g_{h,\mathrm{Pet}}(n)$	$\displaystyle=\prod_{i=1}^{\dim(W_{\mathrm{Pet}})}\exp(\gamma_{i}^{h}nZ_{i}^{\mathrm{Pet}}).$

Therefore we may write

g_{h}(n)=g_{h,\ast}(n)g_{h,\mathrm{Lin}}(n)g_{h,\mathrm{Pet}}(n)g_{h,\mathrm{Rem}}(n)

with $g_{h,\mathrm{Rem}}(n)\in[G,G]$ pointwise. The top order term which we seek to isolate is heuristically similar to

e\bigg{(}\sum_{i=1}^{\dim(W_{\mathrm{Lin}})}\sum_{j=1}^{\dim(W_{\ast})}\delta_{i}n\{\varepsilon_{i}h\}[\beta_{j}n][Z_{i}^{\mathrm{Lin}},Z_{j}^{\ast}]\bigg{)}.

Note that given the factorization of $g_{h}$ , we have established no control over $g_{h,\mathrm{Pet}}$ and $g_{h,\mathrm{Rem}}$ . This may suggest that we wish to quotient out by the subgroup $W_{\mathrm{Pet}}[G,G]$ in order to kill these terms; note however that $G/(W_{\mathrm{Pet}}[G,G])$ is now abelian and such a projection “kills” the higher order degree-rank term calculated above. This suggests that the group $W_{\mathrm{Pet}}[G,G]$ is “too large” a quotient. The solution is to “enlarge” the group $G$ so that the subgroup $[W_{\mathrm{Lin}},W_{\ast}]$ and the subgroup $G^{\prime}$ that corresponds to the remaining phases $\gamma_{h}n^{2}+\delta_{h}n$ are disjoint. We can then quotient by $W_{\mathrm{Pet}}G^{\prime}$ . This disjointness is accomplished by lifting to the universal nilmanifold of degree-rank $(2,2)$ .

10.1. Unwinding the output of Lemma 9.1

We first require the following elementary lemma regarding lattice elements when presented in first-kind coordinates.

Lemma 10.1.

Fix an integer $k\geq 1$ . Consider a nilmanifold $G/\Gamma$ of dimension $d$ with a Mal’cev basis $\mathcal{X}=\{X_{1},\ldots,X_{d}\}$ of $\log G$ which is $Q$ -rational and such that $\mathcal{X}$ has the degree $k$ nesting property. Then there exists a positive integer $Q^{\prime}\leq O_{k}(Q^{O_{k}(d^{O_{k}(1)})})$ such that if $z_{j}\in Q^{\prime}\cdot\mathbb{Z}$ then

\exp\bigg{(}\sum_{j=1}^{d}z_{j}X_{j}\bigg{)}\in\Gamma.

Proof.

Note that $\Gamma=\psi_{\mathcal{X}}(\mathbb{Z}^{d})$ . By [42, Lemma B.1], $\psi_{\mathcal{X}}\circ\psi_{\mathrm{exp},\mathcal{X}}^{-1}$ is a degree $O_{k}(1)$ polynomial with coefficients of height at most $Q^{O_{k}(d^{O(1)})}$ . The desired result then follows by taking $Q^{\prime}$ to be the least common multiple of all denominators of all coefficients present in this polynomial (since there are only $O_{k}(d^{O_{k}(1)})$ total coefficients). Note that the polynomial corresponding to $\psi_{\mathcal{X}}\circ\psi_{\mathrm{exp},\mathcal{X}}^{-1}$ has no constant term by observing the image of $\mathrm{id}_{G}$ . ∎
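As an illustrative sanity check (not part of the original argument), consider the Heisenberg group with $\log G$ spanned by $X_{1},X_{2},X_{3}=[X_{1},X_{2}]$ . We assume, purely for this sketch, that second-kind coordinates are ordered as $\exp(t_{1}X_{1})\exp(t_{2}X_{2})\exp(t_{3}X_{3})$ and that $\Gamma$ consists of the elements with integer $t_{1},t_{2},t_{3}$ . Baker–Campbell–Hausdorff then makes the change of coordinates explicit:

```latex
\begin{align*}
\exp(t_1X_1)\exp(t_2X_2)\exp(t_3X_3)
&=\exp\Big(t_1X_1+t_2X_2+\big(t_3+\tfrac{1}{2}t_1t_2\big)X_3\Big),\\
\exp(z_1X_1+z_2X_2+z_3X_3)
&=\exp(z_1X_1)\exp(z_2X_2)\exp\Big(\big(z_3-\tfrac{1}{2}z_1z_2\big)X_3\Big).
\end{align*}
```

The only denominator appearing is $2$ , so in this toy case one may take $Q^{\prime}=2$ : if $z_{1},z_{2},z_{3}\in 2\mathbb{Z}$ then $z_{3}-\frac{1}{2}z_{1}z_{2}\in\mathbb{Z}$ and the element lies in $\Gamma$ . The polynomial has no constant term, in line with the final observation of the proof.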

We next require the following additional elementary lemma which gives a Taylor series expansion which is “graded by the Mal’cev basis”.

Lemma 10.2.

Consider a nilmanifold $G/\Gamma$ of degree $k$ with an adapted Mal’cev basis $\mathcal{X}=\{X_{1},\ldots,X_{\dim(G)}\}$ and a polynomial sequence $g(n)$ . There exists a representation

g(n)=\prod_{i=0}^{k}\prod_{j=\dim(G)-\dim(G_{i})+1}^{\dim(G)}\exp(X_{j})^{\alpha_{i,j}\cdot\frac{n^{i}}{i!}}

where $\alpha_{i,j}\in\mathbb{R}$ .

Proof.

Note via Baker–Campbell–Hausdorff and existence of Taylor expansions, we may write

g(n)=\exp\bigg{(}\sum_{i=0}^{k}g_{i}\cdot\frac{n^{i}}{i!}\bigg{)}

with $g_{i}\in\log(G_{i})$ . Let $g_{0}(n)=g(n)$ and $g_{0,i}=g_{i}$ . Then iteratively define $g_{\ell+1}(n)$ by the following process: write $\sum_{j=\dim(G)-\dim(G_{\ell})+1}^{\dim(G)}\alpha_{\ell,j}X_{j}=g_{\ell,\ell}$ . Then let

g_{\ell+1}(n):=\Bigg{(}\prod_{j=\dim(G)-\dim(G_{\ell})+1}^{\dim(G)}\exp(X_{j})^{\alpha_{\ell,j}\cdot\frac{n^{\ell}}{\ell!}}\Bigg{)}^{-1}g_{\ell}(n)

and write

g_{\ell+1}(n)=\exp\bigg{(}\sum_{i=\ell+1}^{k}g_{\ell+1,i}\cdot\frac{n^{i}}{i!}\bigg{)}

in order to define $g_{\ell+1,i}$ . There exists a valid choice of $\alpha_{\ell,j}$ at each step since $\mathcal{X}$ is a filtered Mal’cev basis and there exists a valid choice of $g_{\ell+1,i}$ for $i\geq\ell+1$ by Baker–Campbell–Hausdorff. This process terminates with the identity sequence, and unraveling gives the desired representation. ∎
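To see the iteration in the smallest nonabelian case, here is a hedged worked example that is not taken from the original text. We assume the Heisenberg group with basis $X_{1},X_{2},X_{3}=[X_{1},X_{2}]$ , the degree $2$ filtration $G_{1}=G$ , $G_{2}=[G,G]$ , and a sequence with $g(0)=\mathrm{id}_{G}$ (so the $i=0$ factor is trivial); the real parameters $a,b,c_{1},c_{2}$ are hypothetical.

```latex
\begin{align*}
g(n)&=\exp\Big(n\,(aX_1+bX_2+c_1X_3)+\tfrac{n^{2}}{2}\,c_2X_3\Big),\\
\exp(X_1)^{an}\exp(X_2)^{bn}\exp(X_3)^{c_1n}
&=\exp\Big(n\,(aX_1+bX_2+c_1X_3)+\tfrac{n^{2}}{2}\,ab\,X_3\Big),\\
g_{2}(n)&=\big(\exp(X_1)^{an}\exp(X_2)^{bn}\exp(X_3)^{c_1n}\big)^{-1}g(n)
=\exp(X_3)^{(c_2-ab)\cdot\frac{n^{2}}{2}}.
\end{align*}
```

So in this example $\alpha_{1,1}=a$ , $\alpha_{1,2}=b$ , $\alpha_{1,3}=c_{1}$ and $\alpha_{2,3}=c_{2}-ab$ : removing the degree $1$ factor only perturbs the coefficient of $\frac{n^{2}}{2}$ , and that perturbation lies in $\log(G_{2})$ .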

Remark.

Note that in the above proof, the reason we do not use the basis $\binom{n}{i}$ is that $\binom{n}{i}\binom{n}{j}$ is not a linear combination of polynomials of the form $\binom{n}{t}$ for $t\geq\max(i,j)+1$ and hence the Baker–Campbell–Hausdorff argument used to construct $g_{\ell+1,i}$ fails (one needs lower-degree terms with $i\leq\ell$ ).
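The contrast between the two bases can be checked directly; the following identities are a small illustration added for concreteness. Products of the monomials $\frac{n^{i}}{i!}$ stay within a single higher degree, whereas already the simplest product of binomial coefficients produces a term of degree $1$ :

```latex
\begin{align*}
\frac{n^{i}}{i!}\cdot\frac{n^{j}}{j!}&=\binom{i+j}{i}\,\frac{n^{i+j}}{(i+j)!},\\
\binom{n}{1}\binom{n}{1}&=n^{2}=2\binom{n}{2}+\binom{n}{1}.
\end{align*}
```

For $i,j\geq 1$ the first product has degree $i+j\geq\max(i,j)+1$ , while the second identity needs the lower-degree term $\binom{n}{1}$ .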

We now explicitly unwind, for the sake of clarity, the conclusion of Lemma 9.1. We will use the notation and conclusions here throughout Sections 10 and 11. Suppose we have a $1$ -bounded function $f\colon[N]\to\mathbb{C}$ with a degree-rank $(s-1,r^{\ast})$ correlation structure with parameters $\rho,M,D,d$ . (We apologize to the reader; there is a rather incredible amount of data which is floating around at this point. The crucial details to track are data regarding Taylor coefficients and the associated decompositions of the vector spaces corresponding to horizontal tori.) Then by Lemma 9.1 and some relabeling there exists a degree-rank $(s-1,r^{\ast})$ correlation structure with parameters

\rho^{\prime-1}\leq\exp(O_{s}((d\log(MD/\rho))^{O_{s}(1)})),\quad M^{\prime}\leq O(M),\quad D^{\prime}=D,\quad d^{\prime}\leq O(d)

and

•

A subset $H\subseteq[N]$ with $|H|\geq\rho^{\prime}N$ ;
•

A multidegree $(1,s-1)$ nilcharacter $\chi(h,n)$ with a frequency $\eta^{\ast}$ with height at most $M$ . Furthermore $\chi$ lives on a nilmanifold $(G^{\ast}\times\mathbb{R})/(\Gamma^{\ast}\times\mathbb{Z})$ with dimension bounded by $d^{\prime}$ , output dimension bounded by $D^{\prime}$ , complexity bounded by $M^{\prime}$ , and the function underlying $\chi$ is $M^{\prime}$ -Lipschitz. We let $g(h,n)$ denote the underlying polynomial sequence;
•

A collection of degree-rank $(s-1,r^{\ast})$ nilcharacters $\chi_{h}(n)$ with a frequency $\eta$ of height at most $M$ . Furthermore $\chi_{h}$ lives on a nilmanifold $G/\Gamma$ with dimension bounded by $d$ , output dimension bounded by $D$ , $G/\Gamma$ has complexity bounded by $M$ and the function underlying $\chi_{h}$ (which is independent of $h$ ) is $M^{\prime}$ -Lipschitz. We let $g_{h}$ denote the underlying polynomial sequence and we have $g_{h}(0)=\mathrm{id}_{G}$ ;
•

For all $h\in H$ , we have

$\Delta_{h}f(n)\otimes\chi(h,n)\otimes\chi_{h}(n)\in\operatorname{Corr}(s-2,\rho^{\prime},M^{\prime},d^{\prime});$
•

There exists a collection of subspaces $W_{i,\ast},W_{i,\mathrm{Lin}},W_{i,\mathrm{Pet}}\leqslant G_{(i,1)}/G_{(i,2)}$ for $1\leq i\leq s-1$ which are $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ -rational with respect to $\exp(\mathcal{X})\cap G_{(i,1)}~{}\mathrm{mod}~{}G_{(i,2)}$ ;
•

If $W_{i}=W_{i,\ast}+W_{i,\mathrm{Lin}}+W_{i,\mathrm{Pet}}$ then $\dim(W_{i})=\dim(W_{i,\ast})+\dim(W_{i,\mathrm{Lin}})+\dim(W_{i,\mathrm{Pet}})$ ;
•

Let $Z_{i,1}^{\ast},\ldots,Z_{i,\dim(W_{i,\ast})}^{\ast}$ be a sequence of integral linear combinations of $\mathcal{X}\cap G_{(i,1)}\setminus\mathcal{X}\cap G_{(i,2)}$ such that $\operatorname{span}_{\mathbb{R}}(\exp(Z_{i,1}^{\ast})~{}\mathrm{mod}~{}G_{(i,2)},\ldots,\exp(Z_{i,\dim(W_{i,\ast})}^{\ast})~{}\mathrm{mod}~{}G_{(i,2)})=W_{i,\ast}$ . We may let the coefficients of $Z_{i,j}^{\ast}$ be $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ -bounded and $\exp(Z_{i,j}^{\ast})\in\Gamma$ .
•

Let $Z_{i,1}^{\mathrm{Lin}},\ldots,Z_{i,\dim(W_{i,\mathrm{Lin}})}^{\mathrm{Lin}}$ be a sequence of integral linear combinations of $\mathcal{X}\cap G_{(i,1)}\setminus\mathcal{X}\cap G_{(i,2)}$ such that $\operatorname{span}_{\mathbb{R}}(\exp(Z_{i,1}^{\mathrm{Lin}})~{}\mathrm{mod}~{}G_{(i,2)},\ldots,\exp(Z_{i,\dim(W_{i,\mathrm{Lin}})}^{\mathrm{Lin}})~{}\mathrm{mod}~{}G_{(i,2)})=W_{i,\mathrm{Lin}}$ . We may let the coefficients of $Z_{i,j}^{\mathrm{Lin}}$ be $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ -bounded and $\exp(Z_{i,j}^{\mathrm{Lin}})\in\Gamma$ .
•

Let $Z_{i,1}^{\mathrm{Pet}},\ldots,Z_{i,\dim(W_{i,\mathrm{Pet}})}^{\mathrm{Pet}}$ be a sequence of integral linear combinations of $\mathcal{X}\cap G_{(i,1)}\setminus\mathcal{X}\cap G_{(i,2)}$ such that $\operatorname{span}_{\mathbb{R}}(\exp(Z_{i,1}^{\mathrm{Pet}})~{}\mathrm{mod}~{}G_{(i,2)},\ldots,\exp(Z_{i,\dim(W_{i,\mathrm{Pet}})}^{\mathrm{Pet}})~{}\mathrm{mod}~{}G_{(i,2)})=W_{i,\mathrm{Pet}}$ . We may let the coefficients of $Z_{i,j}^{\mathrm{Pet}}$ be $(MD/\rho)^{O_{s}(d^{O_{s}(1)})}$ -bounded and let $\exp(Z_{i,j}^{\mathrm{Pet}})\in\Gamma$ .

•

For $1\leq i\leq s-1$ and $h\in H$ , we have

	$\displaystyle\operatorname{Taylor}_{i}(g_{h})$	$\displaystyle=\prod_{j=1}^{\dim(W_{i,\ast})}\exp(Z_{i,j}^{\ast})^{z_{i,j}^{\ast}}\cdot\prod_{j=1}^{\dim(W_{i,\mathrm{Pet}})}\exp(Z_{i,j}^{\mathrm{Pet}})^{z_{i,j}^{h,\mathrm{Pet}}}$
		$\displaystyle\qquad\qquad\cdot\prod_{j=1}^{\dim(W_{i,\mathrm{Lin}})}\exp(Z_{i,j}^{\mathrm{Lin}})^{z_{i,j}^{h,\mathrm{Lin}}}~{}\mathrm{mod}~{}G_{(i,2)}$

where

z_{i,j}^{h,\mathrm{Lin}}=\gamma_{i,j}+\sum_{k=1}^{d^{\ast}}\alpha_{i,j,k}\{\beta_{k}h\}

where $d^{\ast}\leq(d\log(MD/\rho))^{O_{s}(1)}$ and $\beta_{k}\in(1/N^{\prime})\mathbb{Z}$ where $N^{\prime}$ is a prime in $[100N,200N]$ .

•

For any integers $i_{1}+\cdots+i_{r^{\ast}}=s-1$ , suppose that $v_{i_{j}}\in W_{i_{j}}$ for all $j$ . If for at least one index $\ell$ we have $v_{i_{\ell}}\in W_{i_{\ell},\mathrm{Pet}}$ , then if $w$ is any $(r^{\ast}-1)$ -fold commutator of $v_{i_{1}},\ldots,v_{i_{r^{\ast}}}$ , we have

$\eta(w)=0.$

Furthermore, if instead for at least two indices $\ell_{1},\ell_{2}$ we have $v_{i_{\ell_{1}}}\in W_{i_{\ell_{1}},\mathrm{Lin}}$ and $v_{i_{\ell_{2}}}\in W_{i_{\ell_{2}},\mathrm{Lin}}$ , then if $w$ is any $(r^{\ast}-1)$ -fold commutator of $v_{i_{1}},\ldots,v_{i_{r^{\ast}}}$ we have

$\eta(w)=0.$

We have relabeled $H^{\prime}$ as $H$ , $g_{h}^{\prime}$ as $g_{h}$ , $g^{\prime}(h,n)$ as $g(h,n)$ , $\chi^{\prime}$ as $\chi$ , and $\chi_{h}^{\prime}$ as $\chi_{h}$ . We have applied Lemma 10.1 and scaling to guarantee that $\exp(Z_{i,j}^{\ast}),\exp(Z_{i,j}^{\mathrm{Lin}}),\exp(Z_{i,j}^{\mathrm{Pet}})\in\Gamma$ .

Let $\mathcal{X}=\{X_{1},\ldots,X_{\dim(G)}\}$ denote the filtered Mal’cev basis given for $G/\Gamma$ . Via Lemma 10.2, for $h\in H$ we may define

	$\displaystyle g_{h}^{\ast}$	$\displaystyle=\prod_{i=1}^{s-1}\prod_{j=1}^{\dim(W_{i,\ast})}\exp(Z_{i,j}^{\ast})^{z_{i,j}^{\ast}\cdot\frac{n^{i}}{i!}}\cdot\prod_{i=1}^{s-1}\prod_{j=1}^{\dim(W_{i,\mathrm{Lin}})}\exp(Z_{i,j}^{\mathrm{Lin}})^{\gamma_{i,j}\cdot\frac{n^{i}}{i!}},$
	$\displaystyle g_{h}^{\mathrm{Lin}}$	$\displaystyle=\prod_{i=1}^{s-1}\prod_{j=1}^{\dim(W_{i,\mathrm{Lin}})}\exp(Z_{i,j}^{\mathrm{Lin}})^{(z_{i,j}^{h,\mathrm{Lin}}-\gamma_{i,j})\cdot\frac{n^{i}}{i!}},$
	$\displaystyle g_{h}^{\mathrm{Pet}}$	$\displaystyle=\prod_{i=1}^{s-1}\prod_{j=1}^{\dim(W_{i,\mathrm{Pet}})}\exp(Z_{i,j}^{\mathrm{Pet}})^{z_{i,j}^{h,\mathrm{Pet}}\cdot\frac{n^{i}}{i!}},$

and define $g_{h}^{\mathrm{Rem}}$ via

g_{h}=g_{h}^{\ast}\cdot g_{h}^{\mathrm{Lin}}\cdot g_{h}^{\mathrm{Pet}}\cdot g_{h}^{\mathrm{Rem}}.

Using Lemma 10.2 again, we may write

g_{h}^{\mathrm{Rem}}=\prod_{i=1}^{s-1}\prod_{j=\dim(G)-\dim(G_{(i,1)})+1}^{\dim(G)}\exp(X_{j})^{\kappa_{i,j}^{h}\cdot\frac{n^{i}}{i!}}.

The fact that when applying Lemma 10.2 for $g_{h}^{\mathrm{Rem}}$ we observe no coefficients for $\frac{n^{i}}{i!}$ corresponding to basis elements in $\mathcal{X}\cap\log(G_{(i,1)})\setminus\mathcal{X}\cap\log(G_{(i,2)})$ follows from the fact that $g_{h}$ and $g_{h}^{\ast}\cdot g_{h}^{\mathrm{Lin}}\cdot g_{h}^{\mathrm{Pet}}$ have Taylor coefficients which match exactly for $1\leq i\leq s-1$ .

We now reach the first stage of “rewriting” where we realize the nilsequence $\chi_{h}(n)=F(g_{h}(n)\Gamma)$ on a universal nilmanifold.

10.2. Rewriting degree-rank nilsequences on the universal nilmanifold

We recall the universal nilmanifold of a given degree-rank (see [34, Definition 9.1]).

Definition 10.3.

The universal nilmanifold of degree-rank $(s-1,r^{\ast})$ and the associated discrete cocompact subgroup are defined as follows. We write $G_{\mathrm{Univ}}=G_{\mathrm{Univ}}^{\vec{D}}$ where $\vec{D}=\vec{D}^{\ast}+\vec{D}^{\mathrm{Lin}}+\vec{D}^{\mathrm{Pet}}$ with $\vec{D}^{\ast},\vec{D}^{\mathrm{Lin}},\vec{D}^{\mathrm{Pet}}\in(\mathbb{Z}_{\geq 0})^{s-1}$ . We specify $G_{\mathrm{Univ}}^{\vec{D}}$ by formal generators of the Lie algebra $e_{i,j}$ for $1\leq i\leq s-1$ and $1\leq j\leq D_{i}$ where $D_{i}=D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}+D_{i}^{\mathrm{Pet}}$ with the relations:

•

Any $(r-1)$ -fold commutator of $e_{i_{1},j_{1}},\ldots,e_{i_{r},j_{r}}$ with $i_{1}+\cdots+i_{r}>(s-1)$ vanishes;
•

Any $(r-1)$ -fold commutator of $e_{i_{1},j_{1}},\ldots,e_{i_{r},j_{r}}$ with $i_{1}+\cdots+i_{r}=(s-1)$ and $r>r^{\ast}$ vanishes.

The associated discrete group which we will be concerned with is $\Gamma_{\mathrm{Univ}}$ which is the discrete group generated by $\exp(e_{i,j})$ for $1\leq i\leq s-1$ and $1\leq j\leq D_{i}$ .
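As a hedged illustration of the definition (a toy case chosen here, not one used in the paper), take $s-1=2$ and $\vec{D}=(2,1)$ , and assume the Lie algebra is free apart from the two listed families of relations. The two possible values of $r^{\ast}$ then give:

```latex
\begin{align*}
&s-1=2,\quad \vec{D}=(D_1,D_2)=(2,1),\quad\text{generators } e_{1,1},\,e_{1,2},\,e_{2,1};\\
&[e_{1,j},e_{2,1}]=0\ \text{ and all $2$-fold commutators vanish (total degree $>2$)};\\
&r^{\ast}=2:\quad \log(G_{\mathrm{Univ}})=\operatorname{span}_{\mathbb{R}}\{e_{1,1},\,e_{1,2},\,e_{2,1},\,[e_{1,1},e_{1,2}]\},\quad \dim(G_{\mathrm{Univ}})=4;\\
&r^{\ast}=1:\quad [e_{1,1},e_{1,2}]=0,\quad \log(G_{\mathrm{Univ}})=\operatorname{span}_{\mathbb{R}}\{e_{1,1},\,e_{1,2},\,e_{2,1}\},\quad \dim(G_{\mathrm{Univ}})=3.
\end{align*}
```

For $r^{\ast}=2$ the group is a Heisenberg group times $\mathbb{R}$ , while for $r^{\ast}=1$ the second relation forces the group to be abelian.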

Remark.

Note that in this definition, $G_{\mathrm{Univ}}^{\vec{D}}$ depends only on $\vec{D}$ ; however, the quotient we will consider later depends on $\vec{D}^{\ast},\vec{D}^{\mathrm{Lin}},\vec{D}^{\mathrm{Pet}}$ . Furthermore, we have presented $G_{\mathrm{Univ}}$ as a Lie algebra and not as a Lie group; via the general theory of nilpotent Lie algebras this is sufficient. Note that the Lie algebra defined is trivially seen to be nilpotent. By the Birkhoff Embedding Theorem (see remark following [12, Theorem 1.1.11]), we may realize any real nilpotent Lie algebra $\mathfrak{g}$ as a Lie subalgebra of the $n\times n$ real strictly upper triangular matrices. The proof of [12, Theorem 1.2.1] then realizes the $n\times n$ real strictly upper triangular matrices as a logarithm of a connected, simply connected Lie group $N_{n}$ where the exponential map is bijective. The Baker–Campbell–Hausdorff formula then demonstrates $\mathfrak{g}$ is the logarithm of a connected, simply connected subgroup $G\leqslant N_{n}$ (and by construction the logarithm is a bijection between $G$ and $\mathfrak{g}$ ). The group $G$ constructed is unique up to isomorphism by Lie’s third theorem.

We first prove the fact that $G_{\mathrm{Univ}}$ may be given a degree-rank filtration and that $G_{\mathrm{Univ}}/\Gamma_{\mathrm{Univ}}$ has reasonable complexity.

Lemma 10.4.

Let $G_{\mathrm{Univ}}=G_{\mathrm{Univ}}^{\vec{D}}$ and define $(G_{\mathrm{Univ}})_{(d,r)}$ by taking the group generated by all $(r^{\prime}-1)$ -fold iterated commutators of $\exp(t_{i_{1},j_{1}}e_{i_{1},j_{1}}),\ldots,\exp(t_{i_{r^{\prime}},j_{r^{\prime}}}e_{i_{r^{\prime}},j_{r^{\prime}}})$ with $t_{i_{k},j_{k}}\in\mathbb{R}$ , and either $i_{1}+\cdots+i_{r^{\prime}}>d$ or $i_{1}+\cdots+i_{r^{\prime}}=d$ and $r^{\prime}\geq r$ .

Then $(G_{\mathrm{Univ}})_{(d,r)}$ forms a valid degree-rank $(s-1,r^{\ast})$ filtration of $G_{\mathrm{Univ}}$ . Furthermore the dimension of $G_{\mathrm{Univ}}$ is bounded by $O_{s}(\lVert D\rVert_{\infty}^{O_{s}(1)})$ and one may find an adapted Mal’cev basis $\mathcal{X}_{\mathrm{Univ}}$ such that the complexity of $G_{\mathrm{Univ}}/\Gamma_{\mathrm{Univ}}$ is at most $\exp(\lVert D\rVert_{\infty}^{O_{s}(1)})$ .

Proof.

We will be brief with details; that the associated filtration is valid follows via a straightforward computation with Lemma 2.2. Note $(G_{\mathrm{Univ}})_{(i,0)}=(G_{\mathrm{Univ}})_{(i,1)}$ as $r^{\prime}\geq 1$ in the set of generators always. Also, $(G_{\mathrm{Univ}})_{(0,0)}=(G_{\mathrm{Univ}})_{(0,1)}$ since for all generators $e_{i,j}$ we have $i\geq 1$ .

To establish the complexity bounds, the key point is noting that taking all $(r^{\prime}-1)$ -fold iterated commutators of $e_{i_{1},j_{1}},\ldots,e_{i_{r^{\prime}},j_{r^{\prime}}}$ with $i_{1}+\cdots+i_{r^{\prime}}\leq s-2$ or $i_{1}+\cdots+i_{r^{\prime}}=s-1$ and $r^{\prime}\leq r^{\ast}$ gives a spanning set for $\log(G_{\mathrm{Univ}})$ . This immediately gives the specified dimension bound. These generators are not linearly independent; however, all relations are generated by either antisymmetry ( $[x,y]+[y,x]=0$ ) or the Jacobi identity ( $[x,[y,z]]+[y,[z,x]]+[z,[x,y]]=0$ ) applied to the set of generators specified.

To simplify matters, note that all linear relations can be reduced to those between these generators with the “same type” (i.e., relations between the set of $(r^{\prime}-1)$ -fold commutators of a given set of generators $e_{i_{1},j_{1}},\ldots,e_{i_{r^{\prime}},j_{r^{\prime}}}$ ). These can be collected into disconnected non-interacting “components” which are $O_{s}(1)$ in size. We may take a linearly spanning set within each group; each generator not in the spanning set may be written as a linear combination of height $O_{s}(1)$ . Define $\mathcal{X}$ to be the union of all these spanning elements in $\log(G_{\mathrm{Univ}})$ . This gives us a basis. Note the subspaces $\log((G_{\mathrm{Univ}})_{(d,r)})$ are clearly compatible with natural subsets of these “components” and their associated spanning sets, demonstrating that the basis is appropriately adapted to these vector spaces $\log((G_{\mathrm{Univ}})_{(d,r)})$ .

The last matter to check is that there exists $C_{s}\geq 1$ such that $C_{s}\mathbb{Z}^{\dim(G_{\mathrm{Univ}})}\subseteq\psi_{\mathrm{exp},\mathcal{X}}(\Gamma_{\mathrm{Univ}})\subseteq C_{s}^{-1}\mathbb{Z}^{\dim(G_{\mathrm{Univ}})}$ . This follows by noting that each element $\gamma\in\Gamma_{\mathrm{Univ}}$ may be written as

\gamma=\prod_{k=1}^{t}\exp(e_{i_{k},j_{k}})^{s_{k}}

with $s_{k}\in\mathbb{Z}$ . We prove the first implication first; we prove that $\log(\gamma)$ may be written as a linear combination of iterated commutators where $(r^{\prime}-1)$ -fold commutators have denominator bounded by $C_{s}^{r^{\prime}}$ . This is trivial to prove inductively via Baker–Campbell–Hausdorff and noting that all $s$ -fold commutators vanish.

For the reverse direction, consider expressions of the form

\gamma^{\prime}=\exp\bigg{(}\sum_{\alpha}c_{\alpha}e_{\alpha}\bigg{)}

where $e_{\alpha}$ ranges over all possible iterated commutators (here e.g. $e_{[(1,2),(1,3)]}:=[e_{(1,2)},e_{(1,3)}]$ ) where $c_{\alpha}$ are sufficiently divisible integers. Let $f_{\alpha}$ be defined as the commutator of the exponential of associated elements; e.g. $f_{[(1,2),(1,3)]}=[\exp(e_{(1,2)}),\exp(e_{(1,3)})]$ . Choose a generator $\alpha^{\prime}$ with the fewest number of commutators in $\gamma^{\prime}$ such that $c_{\alpha^{\prime}}\neq 0$ . It is straightforward to see via Baker–Campbell–Hausdorff that there is an integer $M_{s}$ such that if $c_{\alpha}$ are all divisible by $M_{s}$ then

f_{\alpha^{\prime}}^{-c_{\alpha^{\prime}}}\gamma^{\prime}=\exp\bigg{(}\sum_{\alpha}c_{\alpha}^{\ast}e_{\alpha}\bigg{)}

has each $c_{\alpha}^{\ast}$ still divisible by $M_{s}$ and $c_{\alpha^{\prime}}^{\ast}=0$ (without introducing backwards corrections).

The desired result then follows from [42, Lemma B.11], noting that the subgroups $(G_{\mathrm{Univ}})_{(d,r)}$ form a nested sequence in the degree-rank ordering. ∎

We now represent the nilsequences $\chi_{h}(n)=F(g_{h}(n)\Gamma)$ on the universal nilmanifold. We define

D_{i}^{\ast}=\dim(W_{i,\ast})+\dim(W_{i,\mathrm{Lin}}),\quad D_{i}^{\mathrm{Pet}}=\dim(W_{i,\mathrm{Pet}})+\dim(G_{(i,2)}),\quad D_{i}^{\mathrm{Lin}}=d^{\ast}\dim(W_{i,\mathrm{Lin}}).

Note that $D_{i}^{\mathrm{Lin}}\leq(d\log(MD/\rho))^{O_{s}(1)}$ and trivially $D_{i}^{\ast},D_{i}^{\mathrm{Pet}}\leq d$ .

Recall that $\mathcal{X}=\{X_{1},\ldots,X_{\dim(G)}\}$ is the filtered Mal’cev basis and $Z_{i,j}^{\ast}$ , $Z_{i,j}^{\mathrm{Lin}}$ , $Z_{i,j}^{\mathrm{Pet}}$ are representatives of $\log(W_{i,\ast})$ , $\log(W_{i,\mathrm{Lin}})$ , and $\log(W_{i,\mathrm{Pet}})$ respectively.

We define a homomorphism $\phi\colon G_{\mathrm{Univ}}\to G$ by defining the map on generators. Define

\displaystyle\phi(\exp(e_{i,j}))=\begin{cases}\exp(Z_{i,j}^{\ast})&\text{ if }1\leq j\leq\dim(W_{i,\ast}),\\ \exp(Z_{i,j-\dim(W_{i,\ast})}^{\mathrm{Lin}})&\text{ if }\dim(W_{i,\ast})+1\leq j\leq\dim(W_{i,\ast})+\dim(W_{i,\mathrm{Lin}})=D_{i}^{\ast},\\ \exp(Z_{i,\ell}^{\mathrm{Lin}})&\text{ if }1+(\ell-1)d^{\ast}\leq j-D_{i}^{\ast}\leq\ell d^{\ast}\text{ for }1\leq\ell\leq\dim(W_{i,\mathrm{Lin}}),\\ \exp(Z_{i,j-D_{i}^{\ast}-D_{i}^{\mathrm{Lin}}}^{\mathrm{Pet}})&\text{ if }D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}+1\leq j\leq D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}+\dim(W_{i,\mathrm{Pet}}),\\ \exp(X_{j-D_{i}+\dim(G)})&\text{ if }D_{i}-\dim(G_{(i,2)})+1\leq j\leq D_{i}.\end{cases}

That this extends to a well-defined homomorphism is an immediate consequence of the fact that the only relations imposed on the universal nilmanifold also hold in the group $G$ , since $G$ has degree-rank $(s-1,r^{\ast})$ .
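The index bookkeeping in the definition of $\phi$ is easiest to read off from a small instance. The dimensions below are hypothetical values chosen only to illustrate how the five cases partition $1\leq j\leq D_{i}$ at a fixed level $i$ :

```latex
\begin{align*}
&\dim(W_{i,\ast})=1,\quad \dim(W_{i,\mathrm{Lin}})=1,\quad d^{\ast}=2,\quad \dim(W_{i,\mathrm{Pet}})=1,\quad \dim(G_{(i,2)})=1,\\
&D_i^{\ast}=2,\quad D_i^{\mathrm{Lin}}=2,\quad D_i^{\mathrm{Pet}}=2,\quad D_i=6,\\
&\phi(\exp(e_{i,1}))=\exp(Z_{i,1}^{\ast}),\qquad \phi(\exp(e_{i,2}))=\exp(Z_{i,1}^{\mathrm{Lin}}),\\
&\phi(\exp(e_{i,3}))=\phi(\exp(e_{i,4}))=\exp(Z_{i,1}^{\mathrm{Lin}}),\\
&\phi(\exp(e_{i,5}))=\exp(Z_{i,1}^{\mathrm{Pet}}),\qquad \phi(\exp(e_{i,6}))=\exp(X_{\dim(G)}).
\end{align*}
```

In particular each $Z_{i,\ell}^{\mathrm{Lin}}$ is hit $1+d^{\ast}$ times: once from the “ $\ast$ ” block (carrying the constant term $\gamma_{i,\ell}$ ) and $d^{\ast}$ times from the “linear” block (one copy for each bracket term $\{\beta_{k}h\}$ ).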

The function with which we will be concerned is

\widetilde{F}(g\Gamma_{\mathrm{Univ}}):=F(\phi(g)\Gamma).

This is well-defined since $\phi(\Gamma_{\mathrm{Univ}})\leqslant\Gamma$ ; it suffices to check that the generators $\exp(e_{i,j})$ map to within $\Gamma$ but this is trivial by construction. (This is precisely why we scaled $Z_{i,\cdot}^{\ast}$ , $Z_{i,\cdot}^{\mathrm{Lin}}$ , and $Z_{i,\cdot}^{\mathrm{Pet}}$ so that when exponentiated they live within $\Gamma$ .)

We now note a series of basic properties of $\widetilde{F}$ and the homomorphism $\phi$ .

Lemma 10.5.

Given the above setup we have:

•

$\lVert\widetilde{F}(g\Gamma_{\mathrm{Univ}})\rVert_{2}=1$ for all $g\in G_{\mathrm{Univ}}$ ;
•

$\widetilde{F}$ has a vertical frequency $\eta_{\mathrm{Univ}}$ with height at most $(MD/\rho)^{O_{s}(\dim(G_{\mathrm{Univ}})^{O_{s}(1)})}$ ;
•

$\widetilde{F}$ is $(MD/\rho)^{O_{s}(\dim(G_{\mathrm{Univ}})^{O_{s}(1)})}$ -Lipschitz;
•

Consider $e_{i_{1},j_{1}},\ldots,e_{i_{r^{\ast}},j_{r^{\ast}}}$ with $i_{1}+\cdots+i_{r^{\ast}}=s-1$ . If for at least one index $\ell$ we have $j_{\ell}>D_{i_{\ell}}^{\ast}+D_{i_{\ell}}^{\mathrm{Lin}}$ , then

$\eta_{\mathrm{Univ}}([\exp(e_{i_{1},j_{1}}),\ldots,\exp(e_{i_{r^{\ast}},j_{r^{\ast}}})])=0.$

Furthermore, if instead for two indices $\ell_{1},\ell_{2}$ we have $j_{\ell_{1}}>D_{i_{\ell_{1}}}^{\ast}$ and $j_{\ell_{2}}>D_{i_{\ell_{2}}}^{\ast}$ then

$\eta_{\mathrm{Univ}}([\exp(e_{i_{1},j_{1}}),\ldots,\exp(e_{i_{r^{\ast}},j_{r^{\ast}}})])=0.$

Proof.

The first property is trivial. For the second property, note that $\phi$ is an $I$ -filtered homomorphism (e.g. $\phi((G_{\mathrm{Univ}})_{(s-1,r^{\ast})})\leqslant G_{(s-1,r^{\ast})}$ ). Thus given $g\in G_{\mathrm{Univ}}$ , $g^{\prime}\in(G_{\mathrm{Univ}})_{(s-1,r^{\ast})}$ we have

\widetilde{F}(gg^{\prime}\Gamma_{\mathrm{Univ}})=F(\phi(g)\phi(g^{\prime})\Gamma)=e(\eta(\phi(g^{\prime})))F(\phi(g)\Gamma)

and therefore we may set $\eta_{\mathrm{Univ}}=\eta\circ\phi$ . To check the complexity of $\eta_{\mathrm{Univ}}$ it suffices to check the magnitude of $\eta_{\mathrm{Univ}}$ on $[\exp(e_{i_{1},j_{1}}),\ldots,\exp(e_{i_{r^{\ast}},j_{r^{\ast}}})]$ where we use Remark 5.3 to convert between this notion and the notion of height defined. The resulting magnitude is bounded because $Z_{i,j}^{\ast}$ , $Z_{i,j}^{\mathrm{Lin}}$ , $Z_{i,j}^{\mathrm{Pet}}$ are appropriately bounded integral combinations of elements in $\mathcal{X}$ which itself has bounded complexity.

We omit a careful justification that $\widetilde{F}$ has an appropriately bounded Lipschitz constant. The crucial point is that the Mal’cev basis constructed in Lemma 10.4 is made up of appropriately bounded linear combinations of commutators of $e_{i_{1},j_{1}},\ldots,e_{i_{r},j_{r}}$ and each such commutator is seen to map to a bounded element of $G$ since $\phi$ maps each generator to a bounded element.

The final property is an immediate consequence of the properties of $W_{i,\mathrm{Lin}}$ , $W_{i,\mathrm{Pet}}$ , and $W_{i,\ast}$ established in Lemma 9.1 and recorded above. The additional generators which are lifted to the “petal” position on the $i$ -th level come from $G_{(i,2)}$ and otherwise we have only artificially placed certain elements in the “linear” class upward to the “ $\ast$ ” class. (These will correspond to the constant terms in the linear part of the nilsequences.) ∎

We now lift the polynomial sequences in question to the universal nilmanifold. We define:

	$\displaystyle g_{h}^{\ast,\mathrm{Univ}}(n)$	$\displaystyle=\prod_{i=1}^{s-1}\prod_{j=1}^{\dim(W_{i,\ast})}\exp(e_{i,j})^{z_{i,j}^{\ast}\cdot\frac{n^{i}}{i!}}\prod_{i=1}^{s-1}\prod_{j=1}^{\dim(W_{i,\mathrm{Lin}})}\exp(e_{i,j+\dim(W_{i,\ast})})^{\gamma_{i,j}\cdot\frac{n^{i}}{i!}},$
	$\displaystyle g_{h}^{\mathrm{Lin},\mathrm{Univ}}(n)$	$\displaystyle=\prod_{i=1}^{s-1}\prod_{j=1}^{\dim(W_{i,\mathrm{Lin}})}\prod_{k=1}^{d^{\ast}}\exp(e_{i,D_{i}^{\ast}+(j-1)d^{\ast}+k})^{\alpha_{i,j,k}\{\beta_{k}h\}\cdot\frac{n^{i}}{i!}},$
	$\displaystyle g_{h}^{\mathrm{Pet},\mathrm{Univ}}(n)$	$\displaystyle=\prod_{i=1}^{s-1}\prod_{j=1}^{\dim(W_{i,\mathrm{Pet}})}\exp(e_{i,j+D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}})^{z_{i,j}^{h,\mathrm{Pet}}\cdot\frac{n^{i}}{i!}},$
	$\displaystyle g_{h}^{\mathrm{Rem},\mathrm{Univ}}(n)$	$\displaystyle=\prod_{i=1}^{s-1}\prod_{j=1}^{\dim(G_{(i,2)})}\exp(e_{i,j+D_{i}-\dim(G_{(i,2)})})^{\kappa_{i,j}^{h}\cdot\frac{n^{i}}{i!}}.$

We define

g_{h}^{\mathrm{Univ}}:=g_{h}^{\ast,\mathrm{Univ}}\cdot g_{h}^{\mathrm{Lin},\mathrm{Univ}}\cdot g_{h}^{\mathrm{Pet},\mathrm{Univ}}\cdot g_{h}^{\mathrm{Rem},\mathrm{Univ}}.

The key claim, which is trivial by construction, is the following equality.

Claim 10.6.

Given the above setup, we have

\widetilde{F}(g_{h}^{\mathrm{Univ}}(n)\Gamma_{\mathrm{Univ}})=F(g_{h}(n)\Gamma)=\chi_{h}(n).

Proof.

The final equality is by definition of $\chi_{h}(n)$ . The first equality follows by checking that $\phi(g_{h}^{\ast,\mathrm{Univ}})=g_{h}^{\ast},~{}\phi(g_{h}^{\mathrm{Lin},\mathrm{Univ}})=g_{h}^{\mathrm{Lin}},~{}\phi(g_{h}^{\mathrm{Pet},\mathrm{Univ}})=g_{h}^{\mathrm{Pet}},~{}\text{ and }\phi(g_{h}^{\mathrm{Rem},\mathrm{Univ}})=g_{h}^{\mathrm{Rem}}$ by construction. Therefore since $\phi$ is a homomorphism we conclude that $\phi(g_{h}^{\mathrm{Univ}})=g_{h}$ . ∎
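For concreteness, here is the check for the “linear” factor written out; it is a sketch of the computation the proof leaves implicit. It uses only that $\phi(\exp(e_{i,D_{i}^{\ast}+(j-1)d^{\ast}+k}))=\exp(Z_{i,j}^{\mathrm{Lin}})$ for every $1\leq k\leq d^{\ast}$ , that $\phi$ respects real powers of one-parameter subgroups, and the definition of $z_{i,j}^{h,\mathrm{Lin}}$ :

```latex
\begin{align*}
\phi\big(g_{h}^{\mathrm{Lin},\mathrm{Univ}}(n)\big)
&=\prod_{i=1}^{s-1}\prod_{j=1}^{\dim(W_{i,\mathrm{Lin}})}\prod_{k=1}^{d^{\ast}}\exp(Z_{i,j}^{\mathrm{Lin}})^{\alpha_{i,j,k}\{\beta_{k}h\}\cdot\frac{n^{i}}{i!}}\\
&=\prod_{i=1}^{s-1}\prod_{j=1}^{\dim(W_{i,\mathrm{Lin}})}\exp(Z_{i,j}^{\mathrm{Lin}})^{\big(\sum_{k=1}^{d^{\ast}}\alpha_{i,j,k}\{\beta_{k}h\}\big)\cdot\frac{n^{i}}{i!}}\\
&=\prod_{i=1}^{s-1}\prod_{j=1}^{\dim(W_{i,\mathrm{Lin}})}\exp(Z_{i,j}^{\mathrm{Lin}})^{(z_{i,j}^{h,\mathrm{Lin}}-\gamma_{i,j})\cdot\frac{n^{i}}{i!}}
=g_{h}^{\mathrm{Lin}}(n).
\end{align*}
```

The middle step is legitimate because the $d^{\ast}$ factors with the same $(i,j)$ are adjacent in the product and are powers of the single element $\exp(Z_{i,j}^{\mathrm{Lin}})$ , so their exponents add. The other three factors are checked in the same way, with no merging needed.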

Note that at this stage we have simply replaced the group $G$ in our correlation structure with $G_{\mathrm{Univ}}$ at the cost of replacing $d$ by $\dim(G_{\mathrm{Univ}})=d^{O_{s}(1)}\log(MD\rho^{-1})^{O_{s}(1)}$ and $M$ by $\exp(d^{O_{s}(1)}\log(MD\rho^{-1})^{O_{s}(1)})$ .

This may seem as if we have gone backwards; the key point is that in Lemma 10.5 we have encoded various “vanishing conditions” on the commutator brackets at the level of the generators of the group. This will allow us to translate the “vanishing conditions” obtained in Lemma 9.1 into the statement that we can, up to a degree-rank $(s-1,r^{\ast}-1)$ error term, pass to a quotient nilmanifold.

10.3. Passing to a quotient nilmanifold

We now construct two additional nilmanifolds; these are essentially the quotients $G^{\ast}$ and $\widetilde{G}$ constructed in [34, Section 12].⁵

⁵ There is a minor issue in [34, p. 1309] when defining $G^{\ast}$ ; we follow the definitions given in the erratum [31].

Definition 10.7.

We define $G_{\mathrm{Rel}}=G_{\mathrm{Rel}}^{\vec{D}^{\ast},\vec{D}^{\mathrm{Lin}},\vec{D}^{\mathrm{Pet}}}$ as the Lie subgroup of $G_{\mathrm{Univ}}$ where $\log(G_{\mathrm{Rel}})$ is spanned by:

•

Any $(r-1)$ -fold commutator of $e_{i_{1},j_{1}},\ldots,e_{i_{r},j_{r}}$ with at least one index $\ell$ such that $j_{\ell}>D^{\ast}_{i_{\ell}}+D^{\mathrm{Lin}}_{i_{\ell}}$ ;
•

Any $(r-1)$ -fold commutator of $e_{i_{1},j_{1}},\ldots,e_{i_{r},j_{r}}$ with $j_{\ell}>D^{\ast}_{i_{\ell}}$ for at least two distinct indices $\ell$ .

We then define $G_{\mathrm{Quot}}$ as $G_{\mathrm{Quot}}:=G_{\mathrm{Univ}}/G_{\mathrm{Rel}}$ and $\Gamma_{\mathrm{Quot}}=\Gamma_{\mathrm{Univ}}/(\Gamma_{\mathrm{Univ}}\cap G_{\mathrm{Rel}})$ .

Remark 10.8.

Note that we may set $r=1$ in the definition of $G_{\mathrm{Rel}}$ ; in particular $\exp(e_{i,j})\in G_{\mathrm{Rel}}$ for $j>D^{\ast}_{i}+D^{\mathrm{Lin}}_{i}$ . Additionally, $\log(G_{\mathrm{Quot}})$ may be realized as the following. Consider formal generators of a Lie algebra, $\widetilde{e}_{i,j}$ for $1\leq j\leq D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}$ , with the property that:

•

Any $(r-1)$ -fold commutator of $\widetilde{e}_{i_{1},j_{1}},\ldots,\widetilde{e}_{i_{r},j_{r}}$ with either $i_{1}+\cdots+i_{r}>(s-1)$ or $i_{1}+\cdots+i_{r}=(s-1)$ and $r>r^{\ast}$ vanishes;
•

Any $(r-1)$ -fold commutator of $\widetilde{e}_{i_{1},j_{1}},\ldots,\widetilde{e}_{i_{r},j_{r}}$ with $j_{\ell}>D^{\ast}_{i_{\ell}}$ for at least two distinct indices $\ell$ vanishes.

This realization is given by taking $\widetilde{e}_{i,j}:=\log(\exp(e_{i,j})~{}\mathrm{mod}~{}G_{\mathrm{Rel}})$ .
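The two vanishing rules in this remark can be transcribed into a small predicate. The following is only a sketch: the function name `forced_to_vanish`, the encoding of a commutator as a list of index pairs $(i,j)$, and the sample parameters are assumptions made for illustration. It tests only the two listed rules, so it does not detect commutators that vanish for other reasons (for instance by antisymmetry).

```python
def forced_to_vanish(word, s, r_star, d_star):
    """Check the two vanishing rules for an (r-1)-fold commutator.

    word   : list of (i, j) pairs, one per generator ~e_{i,j} in the commutator
    s      : the parameter s (the filtration has degree s-1)
    r_star : the rank parameter r^*
    d_star : dict mapping i to D_i^* (hypothetical sample values in the demo)
    """
    r = len(word)
    total_degree = sum(i for i, _ in word)
    # Rule 1: the degree-rank of the commutator exceeds (s-1, r^*).
    if total_degree > s - 1 or (total_degree == s - 1 and r > r_star):
        return True
    # Rule 2: at least two generators come from the range j > D_i^*.
    linear_count = sum(1 for i, j in word if j > d_star[i])
    return linear_count >= 2


# Hypothetical sample parameters: s = 4, r^* = 2, D_1^* = 2, D_2^* = D_3^* = 1.
d_star = {1: 2, 2: 1, 3: 1}
print(forced_to_vanish([(1, 1), (1, 3)], 4, 2, d_star))  # one "linear" generator only
print(forced_to_vanish([(1, 3), (2, 2)], 4, 2, d_star))  # two "linear" generators
```

Encoding the rules at the level of index tuples mirrors how the relations are imposed on formal generators rather than on a basis.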

We first check that $G_{\mathrm{Quot}}$ is well-defined.

Claim 10.9.

For $\vec{D}^{\ast},\vec{D}^{\mathrm{Lin}},\vec{D}^{\mathrm{Pet}}\in(\mathbb{Z}_{\geq 0})^{s-1}$ , $G_{\mathrm{Rel}}$ is a well-defined normal subgroup of $G_{\mathrm{Univ}}$ .

Proof.

It is clear from definition that $\log(G_{\mathrm{Rel}})$ is closed under brackets, so forms a Lie subalgebra within $\log(G_{\mathrm{Univ}})$ . Thus $G_{\mathrm{Rel}}$ is indeed a Lie subgroup. To prove that $G_{\mathrm{Rel}}$ is normal it suffices to prove that it is furthermore a Lie algebra ideal, i.e., $[\log(G_{\mathrm{Univ}}),\log(G_{\mathrm{Rel}})]\leqslant\log(G_{\mathrm{Rel}})$ .

Recall that $\log(G_{\mathrm{Univ}})$ is spanned by all the $(r-1)$ -fold commutators of $e_{i_{1},j_{1}},\ldots,e_{i_{r},j_{r}}$ (although as discussed in Lemma 10.4 this is not a basis). It suffices to check the containment at the level of generators of the respective Lie algebras. The result then follows since taking a commutator does not decrease the number of “petal” or “linear” generators. ∎

We also have the following complexity bound on $G_{\mathrm{Quot}}$ . This may be proved via the Lie algebra presentation given in Remark 10.8 and repeating the proof of Lemma 10.4, or by noting that $G_{\mathrm{Rel}}$ is a sufficiently rational subgroup of $G_{\mathrm{Univ}}$ and applying Lemma 3.10. We omit the details.

Lemma 10.10.

Given the above setup, let $G_{\mathrm{Quot}}=G_{\mathrm{Quot}}^{\vec{D}}$ and note that $G_{\mathrm{Quot}}$ has a degree-rank $(s-1,r^{\ast})$ filtration given by

(G_{\mathrm{Quot}})_{(d,r)}=(G_{\mathrm{Univ}})_{(d,r)}/((G_{\mathrm{Univ}})_{(d,r)}\cap G_{\mathrm{Rel}}).

Furthermore the dimension of $G_{\mathrm{Univ}}$ is bounded by $O_{s}(\lVert D\rVert_{\infty}^{O_{s}(1)})$ and one may find an adapted Mal’cev basis $\mathcal{X}_{\mathrm{Quot}}$ such that the complexity of $G_{\mathrm{Quot}}/\Gamma_{\mathrm{Quot}}$ is $\exp(\lVert D\rVert_{\infty}^{O_{s}(1)})$ .

A key point in this analysis is that this quotient is compatible with $\eta_{\mathrm{Univ}}$ .

Lemma 10.11.

Given the above setup, define $\eta_{\mathrm{Quot}}\colon(G_{\mathrm{Quot}})_{(s-1,r^{\ast})}\to\mathbb{R}$ via

\eta_{\mathrm{Quot}}(g~{}\mathrm{mod}~{}G_{\mathrm{Rel}}):=\eta_{\mathrm{Univ}}(g)

for all $g\in(G_{\mathrm{Univ}})_{(s-1,r^{\ast})}$ . The map $\eta_{\mathrm{Quot}}$ is well-defined and in fact is a vertical character of $G_{\mathrm{Quot}}$ of height at most $(MD/\rho)^{O_{s}(\dim(G_{\mathrm{Univ}})^{O_{s}(1)})}$ .

Proof.

To be well-defined as a map, it suffices to show that $G_{\mathrm{Rel}}\cap(G_{\mathrm{Univ}})_{(s-1,r^{\ast})}\leqslant\operatorname{ker}(\eta_{\mathrm{Univ}})$ . This comes exactly from the final item of Lemma 10.5. That $\eta_{\mathrm{Quot}}$ is a vertical character then follows as $\Gamma_{\mathrm{Quot}}=\Gamma_{\mathrm{Univ}}/(\Gamma_{\mathrm{Univ}}\cap G_{\mathrm{Rel}})$ .

To bound the height of $\eta_{\mathrm{Quot}}$ note that taking a quotient by $G_{\mathrm{Rel}}$ maps $\exp(e_{i,j})$ to $\exp(\widetilde{e}_{i,j})$ in the sense of Remark 10.8. Furthermore the construction of $\mathcal{X}_{\mathrm{Quot}}$ has the property that $\mathcal{X}_{\mathrm{Quot}}\cap\log((G_{\mathrm{Quot}})_{(s-1,r^{\ast})})$ are sufficiently rational combinations of $(r^{\ast}-1)$ -fold commutators of $\widetilde{e}_{i_{1},j_{1}},\ldots,\widetilde{e}_{i_{r^{\ast}},j_{r^{\ast}}}$ with $i_{1}+\cdots+i_{r^{\ast}}=s-1$ . By Baker–Campbell–Hausdorff, we have that the $(r^{\ast}-1)$ -fold commutator of $\exp(\widetilde{e}_{i_{1},j_{1}}),\ldots,\exp(\widetilde{e}_{i_{r^{\ast}},j_{r^{\ast}}})$ is the same $~{}\mathrm{mod}~{}G_{\mathrm{Rel}}$ as the corresponding one for $\exp(e_{i_{1},j_{1}}),\ldots,\exp(e_{i_{r^{\ast}},j_{r^{\ast}}})$ . However, $\eta_{\mathrm{Univ}}$ maps the latter commutator to a sufficiently bounded integer by the complexity bound on $\eta_{\mathrm{Univ}}$ and the result follows. ∎

We will require

(10.1)

\displaystyle\begin{split}g_{h}^{\ast,\mathrm{Quot}}(n)&=\prod_{i=1}^{s-1}\prod_{j=1}^{\dim(W_{i,\ast})}\exp(\widetilde{e}_{i,j})^{z_{i,j}^{\ast}\cdot\frac{n^{i}}{i!}}\prod_{i=1}^{s-1}\prod_{j=1}^{\dim(W_{i,\mathrm{Lin}})}\exp(\widetilde{e}_{i,j+\dim(W_{i,\ast})})^{\gamma_{i,j}\cdot\frac{n^{i}}{i!}}\\ g_{h}^{\mathrm{Lin},\mathrm{Quot}}(n)&=\prod_{i=1}^{s-1}\prod_{j=1}^{\dim(W_{i,\mathrm{Lin}})}\prod_{k=1}^{d^{\ast}}\exp(\widetilde{e}_{i,D_{i}^{\ast}+(j-1)d^{\ast}+k})^{\alpha_{i,j,k}\{\beta_{k}h\}\cdot\frac{n^{i}}{i!}};\end{split}

note that

g_{h}^{\ast,\mathrm{Quot}}=g_{h}^{\ast,\mathrm{Univ}}~{}\mathrm{mod}~{}G_{\mathrm{Rel}},\qquad g_{h}^{\mathrm{Lin},\mathrm{Quot}}=g_{h}^{\mathrm{Lin},\mathrm{Univ}}~{}\mathrm{mod}~{}G_{\mathrm{Rel}}.

Furthermore we have

g_{h}^{\mathrm{Pet},\mathrm{Univ}}~{}\mathrm{mod}~{}G_{\mathrm{Rel}}=g_{h}^{\mathrm{Rem},\mathrm{Univ}}~{}\mathrm{mod}~{}G_{\mathrm{Rel}}=\mathrm{id}_{G_{\mathrm{Quot}}}

pointwise. Finally we define

g_{h}^{\mathrm{Quot}}:=g_{h}^{\ast,\mathrm{Quot}}\cdot g_{h}^{\mathrm{Lin},\mathrm{Quot}}.

For the remainder of this section and Section 11, fix a nilcharacter $F^{\ast}$ on $G_{\mathrm{Quot}}$ with a $G_{(s-1,r^{\ast})}$ -vertical frequency $\eta_{\mathrm{Quot}}$ . Furthermore by Lemma B.4⁶ we may take $F^{\ast}$ which is $(MD/\rho)^{O_{s}(\dim(G_{\mathrm{Univ}})^{O_{s}(1)})}$ -Lipschitz with output dimension bounded by $2^{O_{s}(\dim(G_{\mathrm{Univ}}))}$ .

⁶ The lemma is stated for degree filtrations. However, one can give $G_{\mathrm{Quot}}$ the degree filtration $(G_{\mathrm{Quot}})_{(0,0)}=(G_{\mathrm{Quot}})_{(1,0)}\geqslant(G_{\mathrm{Quot}})_{(2,0)}\geqslant\cdots\geqslant(G_{\mathrm{Quot}})_{(s-1,0)}\geqslant(G_{\mathrm{Quot}})_{(s-1,r^{\ast})}\geqslant\mathrm{Id}_{G_{\mathrm{Quot}}};$ a vertical nilcharacter with respect to this filtration is a vertical nilcharacter with respect to the original degree-rank filtration. $\mathcal{X}_{{\mathrm{Quot}}}$ is adapted to this degree filtration (as it is adapted to the original degree-rank filtration).

The reason it will be sufficient to study $F^{\ast}(g_{h}^{\mathrm{Quot}}\Gamma_{\mathrm{Quot}})$ is the following lemma, which proves that it is equal to $\widetilde{F}(g_{h}^{\mathrm{Univ}}\Gamma_{\mathrm{Univ}})$ up to a term which is lower-order in degree-rank.

Lemma 10.12.

Given the above setup, let

G_{\mathrm{Univ}}^{\triangle}:=\{(g,g~{}\mathrm{mod}~{}G_{\mathrm{Rel}})\in G_{\mathrm{Univ}}\times G_{\mathrm{Quot}}\colon g\in G_{\mathrm{Univ}}\}

which is given the degree-rank filtration

(G_{\mathrm{Univ}}^{\triangle})_{(d,r)}:=\{(g,g~{}\mathrm{mod}~{}G_{\mathrm{Rel}})\in(G_{\mathrm{Univ}})_{(d,r)}\times(G_{\mathrm{Quot}})_{(d,r)}\colon g\in(G_{\mathrm{Univ}})_{(d,r)}\}.

Define $\Gamma_{\mathrm{Univ}}^{\triangle}=G_{\mathrm{Univ}}^{\triangle}\cap(\Gamma_{\mathrm{Univ}}\times\Gamma_{\mathrm{Quot}})$ . We have:

•

$(g_{h}^{\mathrm{Univ}},g_{h}^{\mathrm{Quot}})$ is a polynomial sequence on $G_{\mathrm{Univ}}^{\triangle}$ with respect to the given degree-rank filtration;
•

The function

$(g,g^{\prime})\mapsto\widetilde{F}(g\Gamma_{\mathrm{Univ}})\otimes\overline{F^{\ast}}(g^{\prime}\Gamma_{\mathrm{Quot}})$

for $(g,g^{\prime})\in G_{\mathrm{Univ}}^{\triangle}$ is $(G_{\mathrm{Univ}}^{\triangle})_{(s-1,r^{\ast})}$ -invariant;
•

$G_{\mathrm{Univ}}^{\triangle}$ has complexity bounded by $(MD/\rho)^{O_{s}(\dim(G_{\mathrm{Univ}})^{O_{s}(1)})}$ ;
•

Each coordinate of $\widetilde{F}(g\Gamma_{\mathrm{Univ}})\otimes\overline{F^{\ast}}(g^{\prime}\Gamma_{\mathrm{Quot}})$ is $(MD/\rho)^{O_{s}(\dim(G_{\mathrm{Univ}})^{O_{s}(1)})}$ -Lipschitz.

Remark 10.13.

The second item implies that $\widetilde{F}(g\Gamma_{\mathrm{Univ}})\otimes\overline{F^{\ast}}(g^{\prime}\Gamma_{\mathrm{Quot}})$ is $(G_{\mathrm{Univ}}^{\triangle})_{(s-1,r^{\ast})}$ -invariant and thus can be realized on a degree-rank $(s-1,r^{\ast}-1)$ nilmanifold $G_{\mathrm{Univ}}^{\triangle}/(G_{\mathrm{Univ}}^{\triangle})_{(s-1,r^{\ast})}$ with $\Gamma_{\mathrm{Univ}}^{\triangle}/(\Gamma_{\mathrm{Univ}}^{\triangle}\cap(G_{\mathrm{Univ}}^{\triangle})_{(s-1,r^{\ast})})$ being the lattice.

Proof.

It is trivial to verify that the degree-rank filtration on $G_{\mathrm{Univ}}^{\triangle}$ is valid. Noting that

\{(X_{i},X_{i}~{}\mathrm{mod}~{}\log(G_{\mathrm{Rel}}))\colon X_{i}\in\mathcal{X}_{\mathrm{Univ}}\}

is a valid Mal’cev basis for $G_{\mathrm{Univ}}^{\triangle}$ bounds the complexity of $G_{\mathrm{Univ}}^{\triangle}$ . The complexity bounds on $\widetilde{F}(g\Gamma_{\mathrm{Univ}})\otimes\overline{F^{\ast}}(g^{\prime}\Gamma_{\mathrm{Quot}})$ follow by noting that $\widetilde{F}$ is appropriately Lipschitz on $G_{\mathrm{Univ}}/\Gamma_{\mathrm{Univ}}$ and similarly for $\overline{F^{\ast}}$ . For $F^{\ast}$ , we note that each coordinate of $\{X_{i}~{}\mathrm{mod}~{}\log(G_{\mathrm{Rel}})\}$ is appropriately rational with respect to the Mal’cev basis $\mathcal{X}_{\mathrm{Quot}}$ , by construction.

Furthermore for $(h,h~{}\mathrm{mod}~{}G_{\mathrm{Rel}})\in(G_{\mathrm{Univ}}^{\triangle})_{(s-1,r^{\ast})}$ we have

	$\displaystyle\widetilde{F}(gh\Gamma_{\mathrm{Univ}})\otimes\overline{F^{\ast}}(g^{\prime}(h~{}\mathrm{mod}~{}G_{\mathrm{Rel}})\Gamma_{\mathrm{Quot}})$
	$\displaystyle\qquad=\widetilde{F}(g\Gamma_{\mathrm{Univ}})\otimes\overline{F^{\ast}}(g^{\prime}\Gamma_{\mathrm{Quot}})\cdot e(\eta_{\mathrm{Univ}}(h))\overline{e(\eta_{\mathrm{Quot}}(h~{}\mathrm{mod}~{}G_{\mathrm{Rel}}))}$
	$\displaystyle\qquad=\widetilde{F}(g\Gamma_{\mathrm{Univ}})\otimes\overline{F^{\ast}}(g^{\prime}\Gamma_{\mathrm{Quot}})$

where in the final line we have used the definition of $\eta_{\mathrm{Quot}}$ .

Finally to verify that $(g_{h}^{\mathrm{Univ}},g_{h}^{\mathrm{Quot}})$ is a polynomial sequence with respect to this degree-rank filtration, note via Taylor expansion (e.g. [34, Lemma B.9]) that all polynomial sequences $h$ with respect to $G_{\mathrm{Univ}}^{\triangle}$ are of the form $(h^{\prime},h^{\prime}~{}\mathrm{mod}~{}G_{\mathrm{Rel}})$ where $h^{\prime}$ is a polynomial sequence with respect to $G_{\mathrm{Univ}}$ (and its specified degree-rank filtration). The result then follows due to the property

g_{h}^{\mathrm{Univ}}~{}\mathrm{mod}~{}G_{\mathrm{Rel}}=g_{h}^{\mathrm{Quot}}

noted above, which was by construction. ∎

11. Extracting a $(1,s-1)$ -nilsequence

The goal of this section is to realize

F^{\ast}(g_{h}^{\mathrm{Quot}}(n)\Gamma_{\mathrm{Quot}})

as a multidegree $(1,s-1)$ nilsequence in $(h,n)$ . We accomplish this via a construction of Green, Tao, and Ziegler [34, Section 12] and then use this construction in order to complete the proof of Lemma 6.3. After this, the main business of the paper is essentially done and all that remains to prove Theorem 1.2 is the symmetrization argument which will be carried out in the next section.

11.1. Constructing the $(1,s-1)$ -nilsequence

Our analysis at this point is essentially verbatim that of [34, pp. 1313-1315]. We reproduce the details here (and discuss various complexity issues which are completely routine in the appendix). For the sake of simplicity, we may clean up notation from (10.1) and write

g_{h}^{\mathrm{Quot}}(n)=\prod_{i=1}^{s-1}\prod_{j=1}^{D_{i}^{\ast}}\exp(\widetilde{e}_{i,j})^{\gamma_{i,j}\cdot\frac{n^{i}}{i!}}\cdot\prod_{i=1}^{s-1}\prod_{j=D_{i}^{\ast}+1}^{D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}}\exp(\widetilde{e}_{i,j})^{\alpha_{i,j}\{\beta_{i,j}h\}\cdot\frac{n^{i}}{i!}},

where we have abusively reindexed various coefficients $\gamma,\alpha,\beta$ but nothing else.

We now define $G_{\mathrm{Lin}}$ to be the Lie subgroup of $G_{\mathrm{Quot}}$ such that $\log(G_{\mathrm{Lin}})$ is the subspace generated by all $(r-1)$ -fold iterated commutators (with $r\geq 1$ ) of $\widetilde{e}_{i_{1},j_{1}},\ldots,\widetilde{e}_{i_{r},j_{r}}$ with $j_{\ell}>D_{i_{\ell}}^{\ast}$ for exactly one index $\ell$ . We have the following pair of basic observations.

Claim 11.1.

We have that $G_{\mathrm{Lin}}$ is well-defined, abelian, and normal with respect to $G_{\mathrm{Quot}}$ .

Proof.

Similar to the proof of Claim 10.9, $G_{\mathrm{Lin}}$ is well-defined and normal. The only modification to the proof is noting that a commutator of $\widetilde{e}_{i_{k},j_{k}}$ with at least two indices $\ell$ with $j_{\ell}>D_{i_{\ell}}^{\ast}$ vanishes by the definition of $G_{\mathrm{Quot}}$ .

To see that $G_{\mathrm{Lin}}$ is abelian, it suffices to prove that the commutator of any pair of generators is the identity. This immediately follows from the fact that commutators with at least two generators of the form $\widetilde{e}_{i_{\ell},j_{\ell}}$ with $j_{\ell}>D_{i_{\ell}}^{\ast}$ vanish. ∎

Due to normality, $G_{\mathrm{Quot}}$ acts on $G_{\mathrm{Lin}}$ via conjugation. In particular, we define $G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}$ with the group law given by

(g,g_{1})(g^{\prime},g_{1}^{\prime}):=(gg^{\prime},g_{1}^{g^{\prime}}g_{1}^{\prime})=(gg^{\prime},((g^{\prime})^{-1}g_{1}g^{\prime})g_{1}^{\prime}).

We now introduce a manner in which the additive group $R=\mathbb{R}^{\sum_{i=1}^{s-1}D_{i}^{\mathrm{Lin}}}$ , with elements denoted

t=(t_{i,j})_{1\leq i\leq s-1,~{}D_{i}^{\ast}<j\leq D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}},

acts on $G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}$ . Specifically, we will define an action $\rho(t)$ on this group for all $t\in R$ and use this to construct

G_{\mathrm{Multi}}=R\ltimes_{\rho}(G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}).

This action will allow us to simultaneously “raise” parts of $G_{\mathrm{Lin}}$ to various different fractional powers of $h$ , allowing us to incorporate our “ $h$ -linear” family of nilsequences into a multidegree $(1,s-1)$ nilsequence (in variables $(h,n)$ ).

For each $t\in R$ , we define the homomorphism $g\mapsto g^{t}$ from $G_{\mathrm{Quot}}$ to itself on generators. We map $\exp(\widetilde{e}_{i,j})\to\exp(\widetilde{e}_{i,j})^{t_{i,j}}$ for $1\leq i\leq s-1$ and $D_{i}^{\ast}<j\leq D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}$ while $\exp(\widetilde{e}_{i,j})$ is fixed for $1\leq i\leq s-1$ and $1\leq j\leq D_{i}^{\ast}$ . The defining relations of $G_{\mathrm{Quot}}$ are preserved by this transformation, so this is easily seen to be a well-defined homomorphism. At the Lie algebra this transformation is essentially replacing appropriate $\widetilde{e}_{i,j}$ by $t_{i,j}\widetilde{e}_{i,j}$ .

For $g\in G_{\mathrm{Quot}}$ and $t,t^{\prime}\in R$ we have

(g^{t})^{t^{\prime}}=g^{tt^{\prime}},

and for $g,g^{\prime}\in G_{\mathrm{Lin}}$ we have

g^{t}g^{t^{\prime}}=g^{t+t^{\prime}}\text{ and }g^{t}g^{\prime t}=(gg^{\prime})^{t}.

These are trivial since $G_{\mathrm{Lin}}$ is abelian.

We next claim that if $g\in G_{\mathrm{Quot}}$ and $g^{\prime}\in G_{\mathrm{Lin}}$ then

(11.1)

(gg^{\prime}g^{-1})^{t}=gg^{\prime t}g^{-1}.

To prove this note that it suffices to prove the claim for powers of generators of the groups $G_{\mathrm{Quot}}$ and $G_{\mathrm{Lin}}$ (since conjugation and $g\mapsto g^{t}$ are homomorphisms). If $g\in G_{\mathrm{Lin}}$ the result is trivial due to the abelian property, and if $g\notin G_{\mathrm{Lin}}$ (and is the power of a generator) then $g^{t}=g$ by definition so $(gg^{\prime}g^{-1})^{t}=g^{t}g^{\prime t}(g^{-1})^{t}=gg^{\prime t}g^{-1}$ as desired.
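The identities for $g\mapsto g^{t}$ and (11.1) can be checked by hand in a small example. The sketch below uses a toy model that is not taken from the paper: the Heisenberg group stands in for $G_{\mathrm{Quot}}$, with one “ $\ast$ ” generator, one “linear” generator, and their commutator, and $R=\mathbb{R}$ is one-dimensional. All function names are illustrative.

```python
from fractions import Fraction as Fr

# Toy model (assumption): Heisenberg group with elements (a, b, c), where
# a is the coordinate of the "*" generator, b of the "linear" generator and
# c of their commutator. The toy G_Lin is the subgroup {a = 0}.
def mul(x, y):
    (a, b, c), (a2, b2, c2) = x, y
    return (a + a2, b + b2, c + c2 + a * b2)

def inv(x):
    a, b, c = x
    return (-a, -b, -c + a * b)

def power(x, t):
    # g -> g^t: rescale the linear generator (and hence the commutator) by t.
    a, b, c = x
    return (a, t * b, t * c)

def conj(g, x):
    # g x g^{-1}
    return mul(mul(g, x), inv(g))

g = (Fr(2), Fr(1, 3), Fr(5))        # arbitrary element of the toy G_Quot
lin = (Fr(0), Fr(3, 7), Fr(-1, 2))  # element of the toy G_Lin
t, u = Fr(2, 5), Fr(-3, 4)

# Identity (11.1) in the toy model: (g g' g^{-1})^t = g g'^t g^{-1}.
print(power(conj(g, lin), t) == conj(g, power(lin, t)))
```

Exact rational arithmetic is used so that the group identities can be compared with `==` rather than up to rounding.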

We now define $\rho\colon R\to\operatorname{Aut}(G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}})$ by

\rho(t)(g,g_{1}):=(g\cdot g_{1}^{t},g_{1}).

The map $\rho(t)$ is clearly bijective and we have

\rho(s)(\rho(t)(g,g_{1}))=\rho(s)((g\cdot g_{1}^{t},g_{1}))=(g\cdot g_{1}^{t+s},g_{1})=\rho(t+s)(g,g_{1}),

so to check this is a group action it suffices to show $\rho(t)$ gives a valid homomorphism of $G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}$ . This follows because

	$\displaystyle\rho(t)((g,g_{1})\cdot(g^{\prime},g_{1}^{\prime}))$	$\displaystyle=\rho(t)(gg^{\prime},(g^{\prime})^{-1}g_{1}g^{\prime}g_{1}^{\prime})$
		$\displaystyle=(gg^{\prime}(g^{\prime})^{-1}g_{1}^{t}g^{\prime}(g_{1}^{\prime})^{t},(g^{\prime})^{-1}g_{1}g^{\prime}g_{1}^{\prime})=(gg_{1}^{t}g^{\prime}(g_{1}^{\prime})^{t},(g^{\prime})^{-1}g_{1}g^{\prime}g_{1}^{\prime}),$

by (11.1), while

	$\displaystyle\rho(t)(g,g_{1})\rho(t)(g^{\prime},g_{1}^{\prime})$	$\displaystyle=(gg_{1}^{t},g_{1})\cdot(g^{\prime}(g_{1}^{\prime})^{t},g_{1}^{\prime})=(gg_{1}^{t}g^{\prime}(g_{1}^{\prime})^{t},(g^{\prime}(g_{1}^{\prime})^{t})^{-1}g_{1}g^{\prime}(g_{1}^{\prime})^{t}g_{1}^{\prime})$
		$\displaystyle=(gg_{1}^{t}g^{\prime}(g_{1}^{\prime})^{t},(g_{1}^{\prime})^{-t}((g^{\prime})^{-1}g_{1}g^{\prime})(g_{1}^{\prime})^{t}g_{1}^{\prime})=(gg_{1}^{t}g^{\prime}(g_{1}^{\prime})^{t},(g_{1}^{\prime})^{-t}(g_{1}^{\prime})^{t}((g^{\prime})^{-1}g_{1}g^{\prime})g_{1}^{\prime})$
		$\displaystyle=(gg_{1}^{t}g^{\prime}(g_{1}^{\prime})^{t},(g^{\prime})^{-1}g_{1}g^{\prime}g_{1}^{\prime}),$

where we have used that $G_{\mathrm{Lin}}$ is abelian and normal.
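The verification above can be mirrored numerically. The following sketch again uses a toy model that is an assumption for illustration only (the Heisenberg group in place of $G_{\mathrm{Quot}}$, its subgroup $\{a=0\}$ in place of $G_{\mathrm{Lin}}$, and $R=\mathbb{R}$); it implements the semidirect product law and the map $\rho(t)$ exactly as written in the text.

```python
from fractions import Fraction as Fr

# Toy Heisenberg model (assumption): elements (a, b, c); G_Lin = {a = 0}.
def mul(x, y):
    (a, b, c), (a2, b2, c2) = x, y
    return (a + a2, b + b2, c + c2 + a * b2)

def inv(x):
    a, b, c = x
    return (-a, -b, -c + a * b)

def power(x, t):
    # g -> g^t rescales the "linear" coordinate b and the commutator c.
    a, b, c = x
    return (a, t * b, t * c)

def sd_mul(p, q):
    # Group law on G_Quot x| G_Lin: (g, g1)(h, h1) = (g h, (h^{-1} g1 h) h1).
    (g, g1), (h, h1) = p, q
    return (mul(g, h), mul(mul(mul(inv(h), g1), h), h1))

def rho(t, p):
    # rho(t)(g, g1) = (g * g1^t, g1).
    g, g1 = p
    return (mul(g, power(g1, t)), g1)

p = ((Fr(1), Fr(2), Fr(3)), (Fr(0), Fr(1, 2), Fr(-1)))
q = ((Fr(-2), Fr(1, 3), Fr(0)), (Fr(0), Fr(5), Fr(2, 7)))
t = Fr(-7, 3)

# rho(t) respects the semidirect product law on this pair.
print(rho(t, sd_mul(p, q)) == sd_mul(rho(t, p), rho(t, q)))
```

The second coordinates of `p` and `q` are chosen in the toy $G_{\mathrm{Lin}}$, since the group law is only defined for such pairs.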

We are now in position to define the group of interest which will support the multidegree $(1,s-1)$ nilsequence. Let

G_{\mathrm{Multi}}=R\ltimes_{\rho}(G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}})

where multiplication is given by

(t,(g,g_{1}))(t^{\prime},(g^{\prime},g_{1}^{\prime}))=(t+t^{\prime},(\rho(t^{\prime})(g,g_{1}))\cdot(g^{\prime},g_{1}^{\prime})).

This is seen to be a connected, simply connected Lie group. We give it a multidegree filtration $(G_{\mathrm{Multi}})_{(d_{1},d_{2})}$ defined by:

•

If $d_{1}>1$ then $(G_{\mathrm{Multi}})_{(d_{1},d_{2})}=\mathrm{Id}_{G_{\mathrm{Multi}}}$ ;
•

If $d_{2}>0$ then $(G_{\mathrm{Multi}})_{(1,d_{2})}=\{(0,(g,\mathrm{id}_{G_{\mathrm{Lin}}}))\colon g\in(G_{\mathrm{Quot}})_{(d_{2},0)}\cap G_{\mathrm{Lin}}\}$ ;
•

$(G_{\mathrm{Multi}})_{(1,0)}=\{(t,(g,\mathrm{id}_{G_{\mathrm{Lin}}}))\colon t\in R,g\in(G_{\mathrm{Quot}})_{(0,0)}\cap G_{\mathrm{Lin}}\}$ or equivalently just $\{(t,(g,\mathrm{id}_{G_{\mathrm{Lin}}}))\colon t\in R,g\in G_{\mathrm{Lin}}\}$ ;
•

If $d_{2}>0$ then $(G_{\mathrm{Multi}})_{(0,d_{2})}=\{(0,(g,g_{1}))\colon g\in(G_{\mathrm{Quot}})_{(d_{2},0)},g_{1}\in(G_{\mathrm{Quot}})_{(d_{2},0)}\cap G_{\mathrm{Lin}}\}$ ;
•

$(G_{\mathrm{Multi}})_{(0,0)}=G_{\mathrm{Multi}}$ .

Claim 11.2.

$(G_{\mathrm{Multi}})_{(d_{1},d_{2})}$ is a valid multidegree filtration on $G_{\mathrm{Multi}}$ .

Proof.

Note that

(t,(g,g_{1}))=(t,(\mathrm{id}_{G_{\mathrm{Quot}}},\mathrm{id}_{G_{\mathrm{Lin}}}))\cdot(0,(g,g_{1}))

and therefore $(G_{\mathrm{Multi}})_{(0,0)}=(G_{\mathrm{Multi}})_{(1,0)}\vee(G_{\mathrm{Multi}})_{(0,1)}$ . We next check various commutator relations. First note that

[(G_{\mathrm{Multi}})_{(1,0)},(G_{\mathrm{Multi}})_{(1,0)}]=\mathrm{Id}_{G_{\mathrm{Multi}}}.

This follows because if $g,h\in G_{\mathrm{Lin}}$ we have $gh=hg$ hence

(t,(g,\mathrm{id}_{G_{\mathrm{Lin}}}))\cdot(t^{\prime},(h,\mathrm{id}_{G_{\mathrm{Lin}}}))=(t+t^{\prime},(gh,\mathrm{id}_{G_{\mathrm{Lin}}}))=(t^{\prime},(h,\mathrm{id}_{G_{\mathrm{Lin}}}))\cdot(t,(g,\mathrm{id}_{G_{\mathrm{Lin}}})).

Therefore it suffices to verify that

	$\displaystyle[(G_{\mathrm{Multi}})_{(0,a)},(G_{\mathrm{Multi}})_{(0,b)}]$	$\displaystyle\leqslant(G_{\mathrm{Multi}})_{(0,a+b)},$
	$\displaystyle[(G_{\mathrm{Multi}})_{(1,a)},(G_{\mathrm{Multi}})_{(0,b)}]$	$\displaystyle\leqslant(G_{\mathrm{Multi}})_{(1,a+b)}.$

We first tackle the first claim, in which we may reduce to the case $a,b>0$ . We wish to show

[(g,g_{1}),(g^{\prime},g_{1}^{\prime})]\in\{(h,h_{1})\colon h\in(G_{\mathrm{Quot}})_{(a+b,0)},h_{1}\in(G_{\mathrm{Quot}})_{(a+b,0)}\cap G_{\mathrm{Lin}}\}

if $g,g_{1}\in(G_{\mathrm{Quot}})_{(a,0)}$ , $g^{\prime},g_{1}^{\prime}\in(G_{\mathrm{Quot}})_{(b,0)}$ , and $g_{1},g_{1}^{\prime}\in G_{\mathrm{Lin}}$ . Via Lemma 2.2, it suffices to prove $(G_{\mathrm{Quot}})_{(a+b,0)}$ is normal in $(G_{\mathrm{Quot}})_{(a,0)}$ and $(G_{\mathrm{Quot}})_{(b,0)}$ and then check at the level of generators.

To check normality, we have

	$\displaystyle(g,g_{1})(g^{\prime},g_{1}^{\prime})(g,g_{1})^{-1}$	$\displaystyle=(g,g_{1})(g^{\prime},g_{1}^{\prime})(g^{-1},gg_{1}^{-1}g^{-1})$
		$\displaystyle=(gg^{\prime},(g^{\prime})^{-1}g_{1}g^{\prime}\cdot g_{1}^{\prime})(g^{-1},gg_{1}^{-1}g^{-1})$
		$\displaystyle=(gg^{\prime}g^{-1},(g(g^{\prime})^{-1})g_{1}(g^{\prime}g^{-1})\cdot gg_{1}^{\prime}g^{-1}\cdot gg_{1}^{-1}g^{-1})$

and the result follows noting that $G_{\mathrm{Lin}},(G_{\mathrm{Quot}})_{(j,0)}$ are normal in $G_{\mathrm{Quot}}$ for all $j\geq 0$ .

Since

(g,g_{1})=(g,\mathrm{id}_{G_{\mathrm{Lin}}})\cdot(\mathrm{id}_{G_{\mathrm{Quot}}},g_{1})

and it suffices to check the claim on generators, we may reduce to the case where exactly one of $g,g_{1}$ and exactly one of $g^{\prime},g_{1}^{\prime}$ is the identity. The result is clear when $g,g^{\prime}$ are trivial, and the case when $g_{1},g_{1}^{\prime}$ are trivial follows from the fact that we have a valid filtration on $G_{\mathrm{Quot}}$ . In the remaining cases we may assume by symmetry that $g_{1}=\mathrm{id}_{G_{\mathrm{Lin}}}$ and $g^{\prime}=\mathrm{id}_{G_{\mathrm{Quot}}}$ . We have

(g^{-1},\mathrm{id}_{G_{\mathrm{Lin}}})(\mathrm{id}_{G_{\mathrm{Quot}}},(g_{1}^{\prime})^{-1})(g,\mathrm{id}_{G_{\mathrm{Lin}}})(\mathrm{id}_{G_{\mathrm{Quot}}},g_{1}^{\prime})=(\mathrm{id}_{G_{\mathrm{Quot}}},g^{-1}(g_{1}^{\prime})^{-1}gg_{1}^{\prime})

and we see that the final coordinate satisfies $[g,g_{1}^{\prime}]\in(G_{\mathrm{Quot}})_{(a+b,0)}\cap G_{\mathrm{Lin}}$ . We have finished verifying the first claim.

Now note that $\{(h,\mathrm{id}_{G_{\mathrm{Lin}}})\colon h\in G_{\mathrm{Lin}}\}$ is a normal subgroup of $G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}$ , since $G_{\mathrm{Lin}}$ is abelian. Thus combining with the first claim gives the second claim, namely

[(G_{\mathrm{Multi}})_{(1,a)},(G_{\mathrm{Multi}})_{(0,b)}]\leqslant(G_{\mathrm{Multi}})_{(1,a+b)},

for $a>0$ .

The only nontrivial case left is $a=0$ and $b>0$ for the second claim. Furthermore, combining what we know it suffices to check the case when $(t,(\mathrm{id}_{G_{\mathrm{Quot}}},\mathrm{id}_{G_{\mathrm{Lin}}}))$ is the element from $(G_{\mathrm{Multi}})_{(1,0)}$ . Note however that

	$\displaystyle(t,$	$\displaystyle(\mathrm{id}_{G_{\mathrm{Quot}}},\mathrm{id}_{G_{\mathrm{Lin}}}))\cdot(0,(g,g_{1}))\cdot(-t,(\mathrm{id}_{G_{\mathrm{Quot}}},\mathrm{id}_{G_{\mathrm{Lin}}}))\cdot(0,(g,g_{1}))^{-1}$
		$\displaystyle=(t,(g,g_{1}))\cdot(-t,(\mathrm{id}_{G_{\mathrm{Quot}}},\mathrm{id}_{G_{\mathrm{Lin}}}))\cdot(0,(g^{-1},gg_{1}^{-1}g^{-1}))=(0,(gg_{1}^{-t},g_{1}))\cdot(0,(g^{-1},gg_{1}^{-1}g^{-1}))$
		$\displaystyle=(0,(gg_{1}^{-t}g^{-1},\mathrm{id}_{G_{\mathrm{Lin}}})).$

The claim then follows from the fact that if $g,g_{1}\in(G_{\mathrm{Quot}})_{(b,0)}$ and $g_{1}\in G_{\mathrm{Lin}}$ then $gg_{1}^{-t}g^{-1}\in(G_{\mathrm{Quot}})_{(b,0)}$ . This holds because if $g_{1}\in(G_{\mathrm{Quot}})_{(b,0)}\cap G_{\mathrm{Lin}}$ then $g_{1}^{t}$ is in the same group. ∎

Writing $t=(t_{i,j})_{1\leq i\leq s-1,~{}D_{i}^{\ast}<j\leq D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}}$ , we define

\Gamma_{\mathrm{Multi}}=\{(t,(g,g_{1}))\colon t_{i,j}\in\mathbb{Z},g\in\Gamma_{\mathrm{Quot}},g_{1}\in\Gamma_{\mathrm{Quot}}\cap G_{\mathrm{Lin}}\}.

To see this is a group, observe that for $g\in\Gamma_{\mathrm{Quot}}$ we have $g^{t}\in\Gamma_{\mathrm{Quot}}$ if all coordinates of $t$ are integral. This is clear for the generators of $\Gamma_{\mathrm{Quot}}$ and the rest follows from recalling that “taking $t$ -th powers” is a homomorphism on $G_{\mathrm{Quot}}$ .

We now define the relevant functions which will be used to represent $F^{\ast}(g_{h}^{\mathrm{Quot}}(n)\Gamma_{\mathrm{Quot}})$ . Let $\delta=\exp(-O_{s}((d\log(MD/\rho))^{O_{s}(1)}))$ , where the implicit constants are chosen sufficiently large.

Let $\phi\colon\mathbb{R}\to\mathbb{R}$ be a $1$ -bounded, $1$ -periodic function such that:

•

$\phi(x)=1$ if $|\{x\}|\leq 1/2-2\delta$ ;
•

$\phi(x)=0$ if $|\{x\}|\geq 1/2-\delta$ ;
•

$\phi$ is $O(1/\delta)$ -Lipschitz.
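One concrete choice of such a cutoff is the piecewise linear one. The sketch below is only an illustration (the paper does not fix a particular $\phi$ ); the helper names `centered_frac` and `phi` are hypothetical, and $\{x\}$ is taken to be the representative of $x$ modulo $1$ in $(-1/2,1/2]$ .

```python
from fractions import Fraction as Fr
import math

def centered_frac(x):
    # Representative of x mod 1 in the interval (-1/2, 1/2].
    return x - math.ceil(x - Fr(1, 2))

def phi(x, delta):
    # Piecewise linear cutoff: equals 1 when |{x}| <= 1/2 - 2*delta,
    # equals 0 when |{x}| >= 1/2 - delta, and is linear with slope
    # 1/delta in between, hence (1/delta)-Lipschitz and 1-periodic.
    r = abs(centered_frac(Fr(x)))
    return min(Fr(1), max(Fr(0), (Fr(1, 2) - delta - r) / delta))

delta = Fr(1, 10)
for x in (Fr(1, 4), Fr(7, 20), Fr(9, 20), Fr(1, 4) + 3):
    print(x, phi(x, delta))
```

Any smoother cutoff with the same three properties would serve equally well; the piecewise linear form is chosen here only because its Lipschitz constant is transparent.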

Define $H^{\ast}\subseteq H$ to be the set of $h$ such that for all $1\leq i\leq s-1$ and $D_{i}^{\ast}<j\leq D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}$ we have $|\{\beta_{i,j}h\}|\leq 1/2-2\delta$ . Using that $\beta_{i,j}\in(1/N^{\prime})\mathbb{Z}$ where $N^{\prime}$ is a prime between $100N$ and $200N$ , we see that there are at most $O(\delta\cdot N\cdot\sum_{i=1}^{s-1}D_{i}^{\mathrm{Lin}})$ values of $h$ which do not satisfy the criterion; choosing $\delta$ sufficiently small, we may assume that $H^{\ast}$ is at least half the size of $H$ .
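The counting step can be checked numerically on small parameters. The sketch below is a toy check under assumptions not taken from the paper: it uses all of $[N]$ in place of $H$ , a few hypothetical numerators for the $\beta_{i,j}$ , and it counts those $h$ for which some $|\{\beta_{i,j}h\}|$ exceeds $1/2-2\delta$ (so that the cutoff $\phi$ need not equal $1$ ). Since $h\mapsto ah~\mathrm{mod}~N^{\prime}$ is injective on $[N]$ for $a\not\equiv 0$ , each frequency contributes at most $4\delta N^{\prime}+1$ such $h$ .

```python
from fractions import Fraction as Fr
import math

def is_prime(m):
    # Trial division; adequate for the small moduli used here.
    return m >= 2 and all(m % p for p in range(2, math.isqrt(m) + 1))

def centered_frac(x):
    # Representative of x mod 1 in (-1/2, 1/2].
    return x - math.ceil(x - Fr(1, 2))

def count_bad(N, N_prime, numerators, delta):
    # Number of h in [N] with |{a*h/N'}| > 1/2 - 2*delta for some numerator a.
    threshold = Fr(1, 2) - 2 * delta
    return sum(
        1
        for h in range(1, N + 1)
        if any(abs(centered_frac(Fr(a * h, N_prime))) > threshold for a in numerators)
    )

N = 200
N_prime = next(m for m in range(100 * N, 200 * N) if is_prime(m))
numerators = [1, 17, 4242]   # hypothetical numerators of the beta_{i,j}
delta = Fr(1, 20000)

bad = count_bad(N, N_prime, numerators, delta)
bound = len(numerators) * (4 * delta * N_prime + 1)
print(bad, "<=", float(bound))
```

With $N^{\prime}\leq 200N$ the bound is $O(\delta N)$ per frequency, matching the count stated in the text.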

Given $(t,(g,g_{1}))\in G_{\mathrm{Multi}}$ , we may find $(t^{\prime},(g^{\prime},g_{1}^{\prime}))\in(t,(g,g_{1}))\Gamma_{\mathrm{Multi}}$ such that $(t^{\prime})_{i,j}\in(-1/2,1/2]$ for all $i,j$ . Define

F_{\mathrm{Multi}}((t,(g,g_{1}))\Gamma_{\mathrm{Multi}})=F^{\ast}(g^{\prime}\Gamma_{\mathrm{Quot}})\cdot\prod_{\begin{subarray}{c}1\leq i\leq s-1\\ D_{i}^{\ast}<j\leq D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}\end{subarray}}\phi(t_{i,j}^{\prime});

we check that this in fact gives a well-defined function on $G_{\mathrm{Multi}}/\Gamma_{\mathrm{Multi}}$ . Note that if $(t^{\prime},(g^{\prime},g_{1}^{\prime}))\in(t,(g,g_{1}))\Gamma_{\mathrm{Multi}}$ and $t_{i,j}^{\prime}\in(-1/2,1/2]$ then $t_{i,j}^{\prime}=\{t_{i,j}\}$ and hence $t^{\prime}$ is unique. Furthermore note that

(t^{\prime},(g^{\prime},g_{1}^{\prime}))\cdot(0,(\gamma^{\prime},\gamma_{1}^{\prime}))=(t^{\prime},(g^{\prime},g_{1}^{\prime})\cdot(\gamma^{\prime},\gamma_{1}^{\prime}))=(t^{\prime},(g^{\prime}\gamma^{\prime},(\gamma^{\prime})^{-1}g_{1}^{\prime}\gamma^{\prime}\gamma_{1}^{\prime}))

and trivially

F^{\ast}(g^{\prime}\Gamma_{\mathrm{Quot}})=F^{\ast}(g^{\prime}\gamma^{\prime}\Gamma_{\mathrm{Quot}})

if $\gamma^{\prime}\in\Gamma_{\mathrm{Quot}}$ . Now recall that

g_{h}^{\mathrm{Quot}}(n)=\prod_{i=1}^{s-1}\prod_{j=1}^{D_{i}^{\ast}}\exp(\widetilde{e}_{i,j})^{\gamma_{i,j}\cdot\frac{n^{i}}{i!}}\cdot\prod_{i=1}^{s-1}\prod_{j=D_{i}^{\ast}+1}^{D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}}\exp(\widetilde{e}_{i,j})^{\alpha_{i,j}\{\beta_{i,j}h\}\cdot\frac{n^{i}}{i!}}.

We set

g_{0}(n)=\prod_{i=1}^{s-1}\prod_{j=1}^{D_{i}^{\ast}}\exp(\widetilde{e}_{i,j})^{\gamma_{i,j}\cdot\frac{n^{i}}{i!}},\quad g_{1}(n)=\prod_{i=1}^{s-1}\prod_{j=D_{i}^{\ast}+1}^{D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}}\exp(\widetilde{e}_{i,j})^{\alpha_{i,j}\cdot\frac{n^{i}}{i!}}

and define

	$\displaystyle g_{\mathrm{Final}}(h,n)$	$\displaystyle=(0,(g_{0}(n),g_{1}(n)))\cdot((\beta_{i,j}h)_{\begin{subarray}{c}1\leq i\leq s-1\\ D_{i}^{\ast}<j\leq D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}\end{subarray}},(\mathrm{id}_{G_{\mathrm{Quot}}},\mathrm{id}_{G_{\mathrm{Lin}}}))$
		$\displaystyle=(0,(g_{0}(n),\mathrm{id}_{G_{\mathrm{Lin}}}))\cdot(0,(\mathrm{id}_{G_{\mathrm{Quot}}},g_{1}(n)))\cdot((\beta_{i,j}h)_{\begin{subarray}{c}1\leq i\leq s-1\\ D_{i}^{\ast}<j\leq D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}\end{subarray}},(\mathrm{id}_{G_{\mathrm{Quot}}},\mathrm{id}_{G_{\mathrm{Lin}}})).$

$g_{\mathrm{Final}}(h,n)$ is seen to be a polynomial sequence with respect to the filtration given to $G_{\mathrm{Multi}}$ as each piece is trivially a polynomial sequence and the polynomial sequences form a group under pointwise multiplication (see [34, Corollary B.4]).

Note that for all $h\in H$ we have

	$\displaystyle g_{\mathrm{Final}}(h,n)\Gamma_{\mathrm{Multi}}$	$\displaystyle=(0,(g_{0}(n),g_{1}(n)))\cdot((\{\beta_{i,j}h\})_{\begin{subarray}{c}1\leq i\leq s-1\\ D_{i}^{\ast}<j\leq D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}\end{subarray}},(\mathrm{id}_{G_{\mathrm{Quot}}},\mathrm{id}_{G_{\mathrm{Lin}}}))\Gamma_{\mathrm{Multi}}$
		$\displaystyle=((\{\beta_{i,j}h\})_{\begin{subarray}{c}1\leq i\leq s-1\\ D_{i}^{\ast}<j\leq D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}\end{subarray}},(\mathrm{id}_{G_{\mathrm{Quot}}},\mathrm{id}_{G_{\mathrm{Lin}}}))\cdot(0,(g_{0,h}^{\ast}(n),g_{1}(n)))\Gamma_{\mathrm{Multi}},$

writing

g_{0,h}^{\ast}(n)=g_{0}(n)(g_{1}(n))^{t(h)}

where $t(h)=(\{\beta_{i,j}h\})_{1\leq i\leq s-1,~{}D_{i}^{\ast}<j\leq D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}}\in R$ . This is precisely the desired sense, discussed earlier, in which we have used the group action to “raise” parts of $G_{\mathrm{Lin}}$ to $h$ -fractional powers.
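The way the $R$ -coordinate “raises” $g_{1}(n)$ to the power $\{\beta h\}$ can be traced in a small example. The sketch below is a toy model, not the paper's construction: the Heisenberg group stands in for $G_{\mathrm{Quot}}$ , $R=\mathbb{R}$ is one-dimensional, only degree-one terms appear, and the coefficients `gamma`, `alpha`, `beta` are hypothetical. It multiplies in the toy $G_{\mathrm{Multi}}$ , reduces the $R$ -coordinate to $(-1/2,1/2]$ by an element of the lattice, and compares the $G_{\mathrm{Quot}}$ -coordinate with $g_{h}^{\mathrm{Quot}}(n)$ .

```python
from fractions import Fraction as Fr
import math

ID = (Fr(0), Fr(0), Fr(0))

# Toy Heisenberg model (assumption): elements (a, b, c); G_Lin = {a = 0}.
def mul(x, y):
    (a, b, c), (a2, b2, c2) = x, y
    return (a + a2, b + b2, c + c2 + a * b2)

def inv(x):
    a, b, c = x
    return (-a, -b, -c + a * b)

def power(x, t):
    a, b, c = x
    return (a, t * b, t * c)

def sd_mul(p, q):
    # (g, g1)(h, h1) = (g h, (h^{-1} g1 h) h1)
    (g, g1), (h, h1) = p, q
    return (mul(g, h), mul(mul(mul(inv(h), g1), h), h1))

def rho(t, p):
    # rho(t)(g, g1) = (g * g1^t, g1)
    g, g1 = p
    return (mul(g, power(g1, t)), g1)

def multi_mul(x, y):
    # (t, k)(u, k') = (t + u, rho(u)(k) * k')
    (t, p), (u, q) = x, y
    return (t + u, sd_mul(rho(u, p), q))

def centered_frac(x):
    return x - math.ceil(x - Fr(1, 2))

gamma, alpha, beta = Fr(3, 5), Fr(2, 7), Fr(11, 101)  # hypothetical coefficients

def g_final(h, n):
    g0 = (gamma * n, Fr(0), Fr(0))
    g1 = (Fr(0), alpha * n, Fr(0))
    return multi_mul((Fr(0), (g0, g1)), (beta * h, (ID, ID)))

def reduce_t(x):
    # Right-multiply by the lattice element (-m, (id, id)), m an integer,
    # so that the R-coordinate lands in (-1/2, 1/2].
    m = x[0] - centered_frac(x[0])
    return multi_mul(x, (-m, (ID, ID)))

def g_quot(h, n):
    t = centered_frac(beta * h)
    return mul((gamma * n, Fr(0), Fr(0)), (Fr(0), alpha * t * n, Fr(0)))

h, n = 37, 5
t_red, (g_red, _) = reduce_t(g_final(h, n))
print(t_red == centered_frac(beta * h), g_red == g_quot(h, n))
```

The reduction step mirrors the choice of representative with $t^{\prime}_{i,j}\in(-1/2,1/2]$ used in the definition of $F_{\mathrm{Multi}}$ .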

Therefore, for all $h\in H^{\ast}$ we have

(11.2)

F_{\mathrm{Multi}}(g_{\mathrm{Final}}(h,n)\Gamma_{\mathrm{Multi}})=F^{\ast}(g_{h}^{\mathrm{Quot}}(n)\Gamma_{\mathrm{Quot}}).

We now state various complexity claims regarding $G_{\mathrm{Multi}}/\Gamma_{\mathrm{Multi}}$ and the Lipschitz nature of the function $F_{\mathrm{Multi}}$ . We defer the rather uninspiring task of checking these bounds to the end of Appendix B.

Lemma 11.3.

Given the above setup, we have that $G_{\mathrm{Multi}}/\Gamma_{\mathrm{Multi}}$ has the structure of a multidegree $(1,s-1)$ nilmanifold and it may be given a basis $\mathcal{X}_{\mathrm{Multi}}$ of complexity bounded by $\exp(O_{s}((d\log(MD/\rho))^{O_{s}(1)}))$ . Furthermore $F_{\mathrm{Multi}}$ is $\exp(O_{s}((d\log(MD/\rho))^{O_{s}(1)}))$ -Lipschitz under this metric.

11.2. Extracting correlation

We now complete the proof of Lemma 6.3. The proof is little more than stitching together results proven in this and the previous section and noting that if two nilcharacters “differ by a lower degree-rank term” then one may pass from one to the other at the cost of introducing a lower order term. (This is essentially [34, Lemma E.7].)

Proof of Lemma 6.3.

We return to the correlation structure discussed in Section 10 (that is output by Lemma 9.1). Again, we will abuse notation slightly as discussed. So, for all $h\in H$ (where $|H|\geq\rho^{\prime}N$ ) we have

\lVert\mathbb{E}_{n\in[N]}(\Delta_{h}f)(n)\otimes\overline{\chi(h,n)}\otimes\overline{\chi_{h}(n)}\cdot\overline{\psi_{h}(n)}\rVert_{\infty}\geq\exp(-O_{s}((d\log(MD/\rho))^{O_{s}(1)}))

where $\psi_{h}$ is a complexity $M^{\prime}$ nilsequence of degree $(s-2)$ and dimension at most $d^{\prime}$ . We adopt the notation developed in Sections 10 and 11. Applying Claim 10.6, we have

\lVert\mathbb{E}_{n\in[N]}(\Delta_{h}f)(n)\otimes\overline{\chi(h,n)}\otimes\overline{\widetilde{F}(g_{h}^{\mathrm{Univ}}(n)\Gamma_{\mathrm{Univ}})}\cdot\overline{\psi_{h}(n)}\rVert_{\infty}\geq\exp(-O_{s}((d\log(MD/\rho))^{O_{s}(1)})).

Next note that

F^{\ast}(g^{\prime}\Gamma_{\mathrm{Quot}})\otimes\overline{F^{\ast}(g^{\prime}\Gamma_{\mathrm{Quot}})}

has trace equal to $1$ as $F^{\ast}$ is a nilcharacter. Since the output dimension of $F^{\ast}$ is bounded by $\exp((d\log(MD/\rho))^{O_{s}(1)})$ , we have for all $h\in H$ that

	$\displaystyle\lVert\mathbb{E}_{n\in[N]}(\Delta_{h}f)(n)\otimes\overline{\chi(h,n)}$	$\displaystyle\otimes\overline{\widetilde{F}(g_{h}^{\mathrm{Univ}}(n)\Gamma_{\mathrm{Univ}})}\otimes F^{\ast}(g_{h}^{\mathrm{Quot}}(n)\Gamma_{\mathrm{Quot}})$
		$\displaystyle\otimes\overline{F^{\ast}(g_{h}^{\mathrm{Quot}}(n)\Gamma_{\mathrm{Quot}})}\cdot\overline{\psi_{h}(n)}\rVert_{\infty}\geq\exp(-O_{s}((d\log(MD/\rho))^{O_{s}(1)})).$
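The trace step above can be spelled out as follows. This is only a sketch: the symbols $v(n)$ , $F_{i}$ and $D$ are introduced here for illustration and are not notation from the source. We write $v(n)$ for the vector inside the previous expectation, $D$ for the output dimension of $F^{\ast}$ , and $F_{i}(n)$ for the $i$ -th coordinate of $F^{\ast}(g_{h}^{\mathrm{Quot}}(n)\Gamma_{\mathrm{Quot}})$ , and we use that a nilcharacter has pointwise Euclidean norm $1$ .

```latex
% Assumed notation (not in the source): v(n), F_i(n), D as described in the lead-in.
% The diagonal coordinates of F^* \otimes \overline{F^*} are |F_i|^2, which sum to 1.
\begin{align*}
\lVert\mathbb{E}_{n\in[N]}v(n)\rVert_{\infty}
&=\Big\lVert\mathbb{E}_{n\in[N]}v(n)\sum_{i=1}^{D}F_{i}(n)\overline{F_{i}(n)}\Big\rVert_{\infty}\\
&\leq\sum_{i=1}^{D}\lVert\mathbb{E}_{n\in[N]}v(n)F_{i}(n)\overline{F_{i}(n)}\rVert_{\infty}\\
&\leq D\cdot\lVert\mathbb{E}_{n\in[N]}v(n)\otimes F^{\ast}(n)\otimes\overline{F^{\ast}(n)}\rVert_{\infty}.
\end{align*}
```

The last inequality holds because each $v\cdot F_{i}\overline{F_{i}}$ is a block of coordinates of $v\otimes F^{\ast}\otimes\overline{F^{\ast}}$ . Since $D\leq\exp((d\log(MD/\rho))^{O_{s}(1)})$ , the loss is absorbed into the stated lower bound.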

Using (11.2), we in fact may write for $h\in H^{\ast}$ that

	$\displaystyle\lVert\mathbb{E}_{n\in[N]}(\Delta_{h}f)(n)$	$\displaystyle\otimes\overline{\chi(h,n)}\otimes\overline{\widetilde{F}(g_{h}^{\mathrm{Univ}}(n)\Gamma_{\mathrm{Univ}})}\otimes F^{\ast}(g_{h}^{\mathrm{Quot}}(n)\Gamma_{\mathrm{Quot}})\otimes\overline{F_{\mathrm{Multi}}(g_{\mathrm{Final}}(h,n))}\cdot\overline{\psi_{h}(n)}\rVert_{\infty}$
		$\displaystyle\geq\exp(-d^{O_{s}(1)}\log(MD\rho^{-1})^{O_{s}(1)}).$

Now we may pay a cost of $\exp(-(d\log(MD/\rho))^{O_{s}(1)})$ in the size of $H^{\ast}$ by Pigeonhole to choose a single coordinate function of $\overline{\widetilde{F}(g_{h}^{\mathrm{Univ}}(n)\Gamma_{\mathrm{Univ}})}\otimes F^{\ast}(g^{\prime}\Gamma_{\mathrm{Quot}})$ , call it $\psi_{h}^{\ast}(n)$ , such that

	$\displaystyle\lVert\mathbb{E}_{n\in[N]}(\Delta_{h}f)(n)$	$\displaystyle\otimes\overline{\chi(h,n)}\otimes\psi_{h}^{\ast}(n)\otimes\overline{F_{\mathrm{Multi}}(g_{\mathrm{Final}}(h,n))}\cdot\overline{\psi_{h}(n)}\rVert_{\infty}$
		$\displaystyle\geq\exp(-O_{s}((d\log(MD/\rho))^{O_{s}(1)})).$

By Lemma 10.12 and using Remark 10.13, $\psi_{h}^{\ast}(n)$ can be realized on a nilmanifold with a degree-rank $(s-1,r^{\ast}-1)$ filtration. Furthermore the function underlying $\psi_{h}^{\ast}(n)$ has Lipschitz constant bounded by $\exp((d\log(MD/\rho))^{O_{s}(1)})$ and the nilmanifold it lives on has dimension at most $(d\log(MD/\rho))^{O_{s}(1)}$ and complexity bounded by $\exp((d\log(MD/\rho))^{O_{s}(1)})$ due to Lemma 10.12.

By applying [42, Lemma A.6] with the subgroup corresponding to the $(s-1,r^{\ast}-1)$ degree-rank and Pigeonholing in the associated vertical frequency, we may assume that $\psi_{h}^{\ast}(n)$ has a vertical frequency with height bounded by $\exp((d\log(MD/\rho))^{O_{s}(1)})$ ; this may reduce the subset of $H^{\ast}$ under consideration by a further admissible fraction. We then extend $\psi_{h}^{\ast}(n)$ to a nilcharacter by using Lemma B.4 (see footnote 7); we refer to this nilcharacter as $\psi_{h}^{\mathrm{Output}}$ and note it is a degree-rank $(s-1,r^{\ast}-1)$ nilcharacter with appropriate complexity.

Footnote 7. We have that $\psi_{h}^{\ast}$ lives on the group $G_{\mathrm{Univ}}^{\triangle}/(G_{\mathrm{Univ}}^{\triangle})_{(s-1,r^{\ast})}$ . We may give it the degree filtration

\begin{split}G_{\mathrm{Univ}}^{\triangle}/(G_{\mathrm{Univ}}^{\triangle})_{(s-1,r^{\ast})}&=(G_{\mathrm{Univ}}^{\triangle})_{(1,0)}/(G_{\mathrm{Univ}}^{\triangle})_{(s-1,r^{\ast})}\geqslant(G_{\mathrm{Univ}}^{\triangle})_{(2,0)}/(G_{\mathrm{Univ}}^{\triangle})_{(s-1,r^{\ast})}\\ &\geqslant\cdots\geqslant(G_{\mathrm{Univ}}^{\triangle})_{(s-1,0)}/(G_{\mathrm{Univ}}^{\triangle})_{(s-1,r^{\ast})}\geqslant(G_{\mathrm{Univ}}^{\triangle})_{(s-1,r^{\ast}-1)}/(G_{\mathrm{Univ}}^{\triangle})_{(s-1,r^{\ast})}\geqslant\mathrm{Id}_{G_{\mathrm{Univ}}^{\triangle}/(G_{\mathrm{Univ}}^{\triangle})_{(s-1,r^{\ast})}}\end{split}

and we apply Lemma B.4 to this filtration to get a nilcharacter $H$ . We then embed $\psi_{h}^{\ast}$ by taking the underlying function, call it $Q$ , and taking the nilcharacter $(Q/(2\cdot\lVert Q\rVert_{\infty}),\sqrt{1-|Q/(2\cdot\lVert Q\rVert_{\infty})|^{2}}\cdot H)$ .

We thus have

	$\displaystyle\lVert\mathbb{E}_{n\in[N]}(\Delta_{h}f)(n)$	$\displaystyle\otimes\overline{\chi(h,n)}\otimes\psi_{h}^{\mathrm{Output}}(n)\otimes\overline{F_{\mathrm{Multi}}(g_{\mathrm{Final}}(h,n))}\cdot\overline{\psi_{h}(n)}\rVert_{\infty}$
		$\displaystyle\geq\exp(-O_{s}((d\log(MD/\rho))^{O_{s}(1)})).$
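The embedding of $\psi_{h}^{\ast}$ as a coordinate of a nilcharacter relies on the enlarged vector having pointwise norm $1$ . A short check, writing $q=Q/(2\cdot\lVert Q\rVert_{\infty})$ (a shorthand introduced only for this sketch) and using that the nilcharacter $H$ from Lemma B.4 satisfies $|H|=1$ pointwise:

```latex
% Assumed shorthand (not in the source): q = Q / (2 ||Q||_infty), so |q| <= 1/2 pointwise.
% |.| denotes the Euclidean norm of a vector-valued function at a point.
\begin{align*}
\big|\big(q,\sqrt{1-|q|^{2}}\cdot H\big)\big|^{2}
=|q|^{2}+(1-|q|^{2})\cdot|H|^{2}
=|q|^{2}+(1-|q|^{2})
=1.
\end{align*}
```

One plausible reading of the factor $2$ in the normalization is that it keeps $1-|q|^{2}\geq 3/4$ , so that the square root stays away from its singularity at $0$ and the new coordinates remain Lipschitz with comparable constants.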

By Pigeonholing in $h$ once again we may pass to $F_{\mathrm{Multi}}^{\ast}$ , which is a fixed coordinate of $F_{\mathrm{Multi}}$ , so that

	$\displaystyle\lVert\mathbb{E}_{n\in[N]}(\Delta_{h}f)(n)$	$\displaystyle\otimes\overline{\chi(h,n)}\otimes\psi_{h}^{\mathrm{Output}}(n)\cdot\overline{F_{\mathrm{Multi}}^{\ast}(g_{\mathrm{Final}}(h,n))}\cdot\overline{\psi_{h}(n)}\rVert_{\infty}$
		$\displaystyle\geq\exp(-d^{O_{s}(1)}\log(MD\rho^{-1})^{O_{s}(1)})$

on a $\exp(-(d\log(MD/\rho))^{O_{s}(1)})$ fraction of indices. The function $F_{\mathrm{Multi}}^{\ast}$ lives on the group $G_{\mathrm{Multi}}$ and via [42, Lemma A.6], Pigeonholing in $h$ so that we have the same frequency, and embedding in a nilcharacter via Lemma B.4 as in the above argument, we have for all $h\in H^{\ast}$ that

	$\displaystyle\lVert\mathbb{E}_{n\in[N]}(\Delta_{h}f)(n)$	$\displaystyle\otimes\overline{\chi(h,n)}\otimes\psi_{h}^{\mathrm{Output}}(n)\otimes\overline{F_{\mathrm{Multi}}^{\mathrm{Output}}(g_{\mathrm{Final}}(h,n))}\cdot\overline{\psi_{h}(n)}\rVert_{\infty}$
		$\displaystyle\geq\exp(-O_{s}((d\log(MD/\rho))^{O_{s}(1)}))$

where $F_{\mathrm{Multi}}^{\mathrm{Output}}$ is a multidegree $(1,s-1)$ nilcharacter on $G_{\mathrm{Multi}}$ with vertical frequency height, output dimension, and Lipschitz constant of each coordinate bounded by $\exp((d\log(MD/\rho))^{O_{s}(1)})$ while the dimension of the underlying nilmanifold is bounded by $(d\log(MD/\rho))^{O_{s}(1)}$ .

This completes the proof with $\chi(h,n)\otimes F_{\mathrm{Multi}}^{\mathrm{Output}}(g_{\mathrm{Final}}(h,n))$ being the new multidegree $(1,s-1)$ nilcharacter, $\overline{\psi_{h}^{\mathrm{Output}}(n)}$ being the degree-rank $(s-1,r^{\ast}-1)$ nilcharacter and noting that the density of indices $h$ which remain is at least $\exp(-O_{s}((d\log(MD/\rho))^{O_{s}(1)}))$ . ∎

12. Symmetrization argument

We now perform the necessary symmetrization argument. In particular, at this stage in the argument due to Theorem 6.4 we have shown that for many $h$ , $\Delta_{h}f$ correlates with $\chi(h,n)$ which is a multidegree $(1,s-1)$ nilcharacter. We now demonstrate that $\chi(h,n)$ is “symmetric up to lower order terms” in $h$ and $n$ (after multilinearizing the $n$ variable) via an argument of Green, Tao, and Ziegler [34], which in turn is closely related to an earlier argument of Green and Tao [23] which proved such a result for the $U^{3}$ -norm. Our treatment is slightly simpler than in [34]. Importantly, this argument is fundamentally based on a finite number of applications of Cauchy–Schwarz and a single call to equidistribution theory and therefore naturally comes with good bounds.

All references to Appendix C are simply quantified versions of lemmas which appear in the work of Green, Tao, and Ziegler [34, Appendix E] and a discussion of the correspondence is given more carefully in Appendix C. The reader may benefit from glancing at the statements in Appendix C or those in [34, Appendix E].

For the remainder of this section and Appendix C, to lighten statements, we say a nilsequence $\chi$ has complexity $(M,d)$ if the underlying nilmanifold $G/\Gamma$ has complexity $M$ , the underlying function is $M$ -Lipschitz, and the dimension of $G$ is bounded by $d$ . We will say a nilcharacter $\chi$ has complexity $(M,d)$ if the underlying nilmanifold $G/\Gamma$ has complexity $M$ , the output dimension of $\chi$ is bounded by $M$ , the underlying function has all coordinates being $M$ -Lipschitz, the vertical character underlying $\chi$ has height bounded by $M$ , and the dimension of $G$ is bounded by $d$ . In this section, $M$ will always be of the form $M(\delta):=\exp(\log(1/\delta)^{O_{s}(1)})$ while the underlying $d$ will be of the form $d(\delta):=\log(1/\delta)^{O_{s}(1)}$ in our analysis, where the implicit constants may, by abuse of notation, vary from line to line.
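To see why letting the implicit constants vary from line to line is harmless, the following sketch records the closure properties being used. The exponents $A,B,C\geq 1$ and the shorthand $L=\log(1/\delta)$ are introduced only for this illustration, and we assume $\delta\leq e^{-2}$ so that $L\geq 2$ .

```latex
% Assumed notation (not in the source): L = log(1/delta) >= 2, and A, B, C >= 1 are constants depending on s.
\begin{align*}
\exp(L^{A})\cdot\exp(L^{B})&=\exp(L^{A}+L^{B})\leq\exp(2L^{\max(A,B)})\leq\exp(L^{\max(A,B)+1}),\\
\big(\exp(L^{A})\big)^{L^{B}}&=\exp(L^{A+B}),\\
\exp\big((L^{B})^{C}\big)&=\exp(L^{BC}).
\end{align*}
```

Thus products and bounded powers of quantities of the form $M(\delta)$ , as well as $\exp(d(\delta)^{O_{s}(1)})$ , are again of the form $M(\delta)$ , provided only $O_{s}(1)$ such operations are performed.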

We now recall the output of Theorem 6.4. We have

\mathbb{E}_{h\in[N]}\lVert\mathbb{E}_{n\in[N]}\Delta_{h}f(n)\otimes\chi(h,n)\psi_{h}(n)\rVert_{\infty}\geq M(\delta)^{-1}.

Here $\psi_{h}(n)$ is a degree $(s-2)$ nilsequence and $\chi(h,n)=F(g(h,n)\Gamma)$ is a multidegree $(1,s-1)$ nilcharacter. Furthermore $\chi$ has complexity $(M(\delta),d(\delta))$ while $\psi_{h}(n)$ has complexity $(M(\delta),d(\delta))$ .

Our first step is to multilinearize $\chi$ in the $n$ variable, replacing it by a multidegree $(1,1,\ldots,1)$ nilcharacter which is symmetric in the final $(s-1)$ variables.

Lemma 12.1.

Fix $s\geq 2$ . Suppose that

\mathbb{E}_{h\in[N]}\lVert\mathbb{E}_{n\in[N]}\Delta_{h}f(n)\otimes\chi(h,n)\cdot\psi_{h}(n)\rVert_{\infty}\geq 1/M(\delta)

with $\chi(h,n)$ being a periodic multidegree $(1,s-1)$ -nilcharacter and $\psi_{h}(n)$ being degree $(s-2)$ nilsequences, each of complexity $(M(\delta),d(\delta))$ .

There exist a multidegree $(1,\ldots,1)$ nilcharacter $\widetilde{\chi}$ (with $s$ ones), a degree $(s-1)$ nilsequence $\widetilde{\psi}$ , and degree $(s-2)$ nilcharacters $\widetilde{\psi_{h}}(n)$ , all having complexity $(M(\delta),d(\delta))$ , such that

\mathbb{E}_{h\in[N]}\lVert\mathbb{E}_{n\in[N]}\Delta_{h}f(n)\otimes\widetilde{\chi}(h,n,\ldots,n)\cdot\widetilde{\psi}(n)\otimes\widetilde{\psi_{h}}(n)\rVert_{\infty}\geq 1/M(\delta).

Furthermore $\widetilde{\chi}$ is symmetric in the final $(s-1)$ coordinates, i.e., for any $\sigma\in\mathfrak{S}_{s-1}$ we have

\widetilde{\chi}(h,n_{1},\ldots,n_{s-1})=\widetilde{\chi}(h,n_{\sigma(1)},\ldots,n_{\sigma(s-1)}).

Proof.

This is essentially an immediate consequence of multilinearization (see e.g. [34, Theorem E.10]). By applying Lemma C.5, there is a multidegree $(1,\ldots,1)$ nilcharacter $\widetilde{\chi}$ of complexity $(M(\delta),d(\delta))$ such that $\chi(h,n)$ and $\widetilde{\chi}(h,n,\ldots,n)$ are $(M(\delta),M(\delta),d(\delta))$ -equivalent for degree $(s-1)$ . Furthermore $\widetilde{\chi}$ is symmetric in the final $(s-1)$ coordinates.

Thus applying Lemma 7.4 (and the remark following), there exists a nilsequence $\psi^{\ast}(h,n)$ of degree $\leq(s-1)$ and complexity $(M(\delta),d(\delta))$ such that

\mathbb{E}_{h\in[N]}\lVert\mathbb{E}_{n\in[N]}\Delta_{h}f(n)\otimes\widetilde{\chi}(h,n,\ldots,n)\otimes\psi^{\ast}(h,n)\cdot\psi_{h}(n)\rVert_{\infty}\geq 1/M(\delta).

Note that a degree $(s-1)$ nilsequence of complexity $(M(\delta),d(\delta))$ in two variables $(h,n)$ is also a multidegree $(0,s-1)\cup(s-1,s-2)$ nilsequence of complexity $(M(\delta),d(\delta))$ via taking the filtration $G_{\vec{i}}:=G_{|\vec{i}|}$ . Therefore by Lemma C.6 and the first item of Lemma C.2, there exist nilsequences $\widetilde{\psi}(n)$ and $\psi_{h}^{\ast}(n)$ of degree $(s-1)$ and $(s-2)$ respectively and complexity $(M(\delta),d(\delta))$ such that

\mathbb{E}_{h\in[N]}\lVert\mathbb{E}_{n\in[N]}\Delta_{h}f(n)\otimes\widetilde{\chi}(h,n,\ldots,n)\otimes\widetilde{\psi}(n)\cdot\psi_{h}^{\ast}(n)\cdot\psi_{h}(n)\rVert_{\infty}\geq 1/M(\delta).

Now, $\psi_{h}^{\ast}(n)\cdot\psi_{h}(n)$ is a degree $(s-2)$ nilsequence of complexity $(M(\delta),d(\delta))$ . Applying [42, Lemma A.6], we may replace this product by $\psi_{h}^{\prime}(n)$ which is a degree $(s-2)$ nilsequence of complexity $(M(\delta),d(\delta))$ with a vertical frequency of height $\exp(\log(1/\delta)^{O_{s}(1)})$ . Finally apply Lemma B.4 and embed $\psi_{h}^{\prime}(n)$ as a coordinate of a nilcharacter $\widetilde{\psi_{h}}(n)$ of complexity $(M(\delta),d(\delta))$ , as in the proof of Lemma 6.3. We thus have

\mathbb{E}_{h\in[N]}\lVert\mathbb{E}_{n\in[N]}\Delta_{h}f(n)\otimes\widetilde{\chi}(h,n,\ldots,n)\cdot\widetilde{\psi}(n)\otimes\widetilde{\psi_{h}}(n)\rVert_{\infty}\geq 1/M(\delta)

where $\widetilde{\psi}$ and $\widetilde{\psi_{h}}$ have the appropriate properties. ∎
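As a sanity check on what multilinearization does, consider the following toy model, which is an illustration only and not an object from the source: a scalar polynomial phase with $e(t)=\exp(2\pi it)$ , a real parameter $\alpha$ , and a polynomial $P$ of degree $s-1$ with leading coefficient $c$ (all hypothetical names).

```latex
% Toy model (assumption of this sketch): chi is a scalar polynomial phase, not a general nilcharacter.
\begin{align*}
\chi(h,n)&=e(\alpha hP(n)),\qquad P(n)=cn^{s-1}+(\text{terms of degree}\leq s-2),\\
\widetilde{\chi}(h,n_{1},\ldots,n_{s-1})&=e(\alpha c\,h\,n_{1}n_{2}\cdots n_{s-1}),\\
\chi(h,n)\cdot\overline{\widetilde{\chi}(h,n,\ldots,n)}&=e\big(\alpha h\,(P(n)-cn^{s-1})\big).
\end{align*}
```

Here $\widetilde{\chi}$ is visibly symmetric in $n_{1},\ldots,n_{s-1}$ and linear in each variable, while the discrepancy on the last line is a phase of total degree at most $s-1$ in $(h,n)$ . This mirrors, in the simplest case, the statement that $\chi(h,n)$ and $\widetilde{\chi}(h,n,\ldots,n)$ agree up to a nilsequence of degree at most $(s-1)$ .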

We are now in position to complete the proof of Theorem 1.2 via a symmetrization argument. Our argument is analogous to that of Green, Tao, and Ziegler [34, Section 13] modulo certain minor simplifications to the underlying Cauchy–Schwarz arguments.

Proof of Theorem 1.2.

We may assume that $s\geq 3$ . The case $s=0$ is trivial, $s=1$ is standard Fourier analysis, and the case $s=2$ follows from work of Sanders [52] (see [44, Theorem 8]). Furthermore, throughout the analysis we will assume implicitly that $N\geq\exp(\log(1/\delta)^{\Omega_{s}(1)})$ ; in the case when $N$ is small one may deduce the statement via Fourier analysis. We proceed by induction, assuming that the inverse theorem is known for smaller $s$ .

By Theorem 6.4 and then Lemma 12.1 we may assume that

(12.1)

\mathbb{E}_{h\in[N]}\lVert\mathbb{E}_{n\in[N]}\Delta_{h}f(n)\otimes\chi(h,n,\ldots,n)\cdot\psi(n)\otimes\psi_{h}(n)\rVert_{\infty}\geq 1/M(\delta).

Here $\chi$ is a multidegree $(1,\ldots,1)$ nilcharacter which is symmetric in the final $(s-1)$ variables, $\psi$ is a degree $(s-1)$ nilsequence, and $\psi_{h}$ are degree $(s-2)$ nilcharacters, all of complexity $(M(\delta),d(\delta))$ . For $h\notin[N]$ , we take $\psi_{h}(n)$ to be the constant function $1$ (which is a degree $0$ nilcharacter) throughout the argument. Additionally, we may use differently indexed versions of functions $\psi$ that are defined at intermediate stages of the argument; although an abuse of notation, it will always be clear from context.

Step 1: Initial setup for Cauchy–Schwarz argument. For the sake of shorthand, we will denote $\widetilde{\chi}(h,n)=\chi(h,n,\ldots,n)$ where there are $(s-1)$ copies of the variable $n$ . By Lemma 7.2 (taking $f_{1}=f$ and $f_{2}=\psi(n)$ ), we have

	$\displaystyle\mathbb{E}_{\begin{subarray}{c}h_{1}+h_{2}=h_{3}+h_{4}\\ h_{i}\in[N]\end{subarray}}\lVert\mathbb{E}_{n\in[N]}\widetilde{\chi}(h_{1},n)\otimes\widetilde{\chi}(h_{2},n+h_{1}-h_{4})\otimes\overline{\widetilde{\chi}(h_{3},n)}\otimes\overline{\widetilde{\chi}(h_{4},n+h_{1}-h_{4})}$
	$\displaystyle\qquad\qquad\qquad\otimes\psi_{h_{1}}(n)\otimes\psi_{h_{2}}(n+h_{1}-h_{4})\otimes\overline{\psi_{h_{3}}(n)}\otimes\overline{\psi_{h_{4}}(n+h_{1}-h_{4})}\cdot e(\Theta n)\rVert_{\infty}\geq 1/M(\delta)$

for some $\lVert\Theta\rVert_{\mathbb{R}/\mathbb{Z}}\leq M(\delta)/N$ . Note that Lemma 7.2 is stated for scalar functions; here we are using that we may Pigeonhole on coordinates of the vector $\chi(h,n,\ldots,n)\cdot\psi(n)\otimes\psi_{h}(n)$ before using Lemma 7.2.

We next change variables with $h_{1}=h+x$ , $h_{2}=h+y$ , $h_{3}=h+x+y$ , and $h_{4}=h$ . The above then implies that

	$\displaystyle\mathbb{E}_{h\in[N],x,y\in[\pm N]}\lVert\mathbb{E}_{n\in[N]}\widetilde{\chi}(h+x,n)\otimes\widetilde{\chi}(h+y,n+x)\otimes\overline{\widetilde{\chi}(h+x+y,n)}\otimes\overline{\widetilde{\chi}(h,n+x)}$
	$\displaystyle\qquad\qquad\qquad\otimes\psi_{h+x}(n)\otimes\psi_{h+y}(n+x)\otimes\overline{\psi_{h+x+y}(n)}\otimes\overline{\psi_{h}(n+x)}e(\Theta n)\rVert_{\infty}\geq 1/M(\delta).$
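The change of variables can be verified directly; the following lines only unpack the substitution stated above.

```latex
% Direct check of the substitution h_1 = h + x, h_2 = h + y, h_3 = h + x + y, h_4 = h.
\begin{align*}
h_{1}+h_{2}&=(h+x)+(h+y)=(h+x+y)+h=h_{3}+h_{4},\\
n+h_{1}-h_{4}&=n+x,\\
h&=h_{4},\qquad x=h_{1}-h_{4},\qquad y=h_{2}-h_{4}.
\end{align*}
```

In particular the additive constraint is automatically satisfied, and since each $h_{i}\in[N]$ the differences $x$ and $y$ lie in $[\pm N]$ , which is the range used in the new average.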

By the first item of Lemma C.3, $\psi_{h+y}(n+x)$ and $\psi_{h+y}(n)$ are $(M(\delta),M(\delta),d(\delta))$ -equivalent for degree $(s-3)$ . We use that $s\geq 3$ precisely here so that this is a well-defined term.

Therefore by Lemma 7.4, there exists a collection $\psi_{h,x,y}(n)$ of degree $(s-3)$ nilsequences each of complexity $(M(\delta),d(\delta))$ such that

	$\displaystyle\mathbb{E}_{h\in[N],x,y\in[\pm N]}\lVert\mathbb{E}_{n\in[N]}\widetilde{\chi}(h+x,n)\otimes\widetilde{\chi}(h+y,n+x)\otimes\overline{\widetilde{\chi}(h+x+y,n)}\otimes\overline{\widetilde{\chi}(h,n+x)}$
	$\displaystyle\qquad\qquad\qquad\otimes\psi_{h+x}(n)\otimes\psi_{h+y}(n)\otimes\overline{\psi_{h+x+y}(n)}\otimes\overline{\psi_{h}(n+x)}\psi_{h,x,y}(n)\cdot e(\Theta n)\rVert_{\infty}\geq 1/M(\delta).$

We will use $B$ to denote vector-valued functions (which may vary term to term) with coordinates which are $1$ -bounded such that the dimension is bounded by $M(\delta)$ . The key point is that nearly all terms may be folded into $1$ -bounded terms. In particular, we have

\displaystyle\mathbb{E}_{h\in[N],x,y\in[\pm N]}\lVert\mathbb{E}_{n\in[N]}

\displaystyle\widetilde{\chi}(h+y,n+x)\cdot\psi_{h,x,y}(n)\otimes B(h,x,n)\otimes B(h,y,n)\otimes B(h,x+y,n)\rVert_{\infty}\geq 1/M(\delta).
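For bookkeeping, here is one consistent way to distribute the factors of the previous display among the bounded functions. The assignment is a sketch of ours rather than something specified in the source; any grouping in which each $B$ depends only on the listed variables works equally well.

```latex
% One possible assignment (an assumption of this sketch, not taken from the source).
% Each factor is placed in a B whose arguments determine it.
\begin{align*}
B(h,x,n)&\colon\ \widetilde{\chi}(h+x,n),\ \overline{\widetilde{\chi}(h,n+x)},\ \psi_{h+x}(n),\ \overline{\psi_{h}(n+x)},\ e(\Theta n),\\
B(h,y,n)&\colon\ \psi_{h+y}(n),\\
B(h,x+y,n)&\colon\ \overline{\widetilde{\chi}(h+x+y,n)},\ \overline{\psi_{h+x+y}(n)},\\
\text{kept explicitly}&\colon\ \widetilde{\chi}(h+y,n+x),\ \psi_{h,x,y}(n).
\end{align*}
```

Each grouped factor is a tensor product of boundedly many $1$ -bounded vectors of dimension at most $M(\delta)$ , so its dimension is again bounded by a quantity of the form $M(\delta)$ .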

Noting that $\psi_{h,x,y}(n)$ may be twisted by an appropriate complex phase depending on $h$ , we may in fact assume that

\displaystyle\lVert\mathbb{E}_{h,n\in[N],x,y\in[\pm N]}\widetilde{\chi}(h+y,n+x)\cdot\psi_{h,x,y}(n)\otimes B(h,x,n)\otimes B(h,y,n)\otimes B(h,x+y,n)\rVert_{\infty}\geq 1/M(\delta).

By applying Pigeonhole in $h$ , we may fix $h^{\ast}$ such that

\displaystyle\lVert\mathbb{E}_{n\in[N],x,y\in[\pm N]}\widetilde{\chi}(h^{\ast}+y,n+x)\cdot\psi_{h^{\ast},x,y}(n)\otimes B(x,n)\otimes B(y,n)\otimes B(x+y,n)\rVert_{\infty}\geq 1/M(\delta).

Taking the coordinate which achieves the infinity norm, we may assume that $B(\cdot,\cdot)$ are in fact all scalar and thus

\displaystyle\lVert\mathbb{E}_{x,y\in[\pm N]}\mathbb{E}_{n\in[N]}\widetilde{\chi}(h^{\ast}+y,n+x)\cdot\psi_{x,y}(n)\cdot b(x,n)\cdot b(y,n)\cdot b(x+y,n)\rVert_{\infty}\geq 1/M(\delta);

we have dropped $h^{\ast}$ in one subscript here.

By applying the second item of Lemma C.3 and the second item of Lemma C.2, we have that $\widetilde{\chi}(h^{\ast}+y,n+x)$ and $\widetilde{\chi}(y,n+x)$ are $(M(\delta),M(\delta),d(\delta))$ -equivalent for degree $(s-1)$ . Thus by Lemma 7.4 there exists a nilsequence $\psi^{\ast}(x,y,n)$ of degree $(s-1)$ and complexity $(M(\delta),d(\delta))$ such that

\displaystyle\lVert\mathbb{E}_{x,y\in[\pm N]}\mathbb{E}_{n\in[N]}\widetilde{\chi}(y,n+x)\cdot\psi^{\ast}(x,y,n)\cdot\psi_{x,y}(n)\cdot b(x,n)b(y,n)b(x+y,n)\rVert_{\infty}\geq 1/M(\delta).

Note that a degree $(s-1)$ nilsequence in variables $x,y,n$ is a multidegree $(s-1,s-1,s-3)\cup(1,0,s-2)\cup(0,1,s-2)\cup(0,0,s-1)$ -nilsequence. Therefore applying Lemma C.6 and applying Pigeonhole, we may adjust $\psi_{x,y}$ and the $1$ -bounded functions and remove $\psi^{\ast}$ and thus we may assume that

\displaystyle\lVert\mathbb{E}_{x,y\in[\pm N]}\mathbb{E}_{n\in[N]}\widetilde{\chi}(y,n+x)\cdot\psi_{x,y}(n)\cdot b(x,n)b(y,n)b(x+y,n)\rVert_{\infty}\geq 1/M(\delta);

note that $\psi_{x,y}$ and the functions $b$ have all been modified but we have abusively maintained the same notation. In particular, $\psi_{x,y}(n)$ is degree $(s-3)$ .

By Lemma C.4, the second item of Lemma C.2, and Lemma C.1 (and symmetry of $\chi$ in the final $(s-1)$ coordinates), we have that $\widetilde{\chi}(y,n+x)$ and

\bigotimes_{k=0}^{s-1}\chi(y,n,\ldots,n,x,\ldots,x)^{\otimes\binom{s-1}{k}}

are $(M(\delta),M(\delta),d(\delta))$ -equivalent for degree $(s-1)$ . In this notation there are $k$ copies of $n$ and $s-1-k$ copies of $x$ . Now by Lemma 7.4, we have

	$\displaystyle\bigg{\lVert}\mathbb{E}_{x,y\in[\pm N]}\mathbb{E}_{n\in[N]}\psi^{\ast}(x,y,n)\cdot\bigotimes_{k=0}^{s-1}\chi(y,n,\ldots,n,x,\ldots,x)^{\otimes\binom{s-1}{k}}\cdot\psi_{x,y}(n)$
	$\displaystyle\qquad\qquad\qquad\qquad\cdot b(x,n)b(y,n)b(x+y,n)\bigg{\rVert}_{\infty}\geq 1/M(\delta)$

where $\psi^{\ast}(x,y,n)$ is a new degree $(s-1)$ nilsequence of complexity $(M(\delta),d(\delta))$ . Applying Lemma C.6 as before, we may adjust $\psi_{x,y}(n)$ and the $1$ -bounded functions and remove this term to have that

\displaystyle\bigg{\lVert}\mathbb{E}_{x,y\in[\pm N]}\mathbb{E}_{n\in[N]}\bigotimes_{k=0}^{s-1}\chi(y,n,\ldots,n,x,\ldots,x)^{\otimes\binom{s-1}{k}}\cdot\psi_{x,y}(n)b(x,n)b(y,n)b(x+y,n)\bigg{\rVert}_{\infty}\geq 1/M(\delta).

Note that the only terms of $\bigotimes_{k=0}^{s-1}\chi(y,n,\ldots,n,x,\ldots,x)^{\otimes\binom{s-1}{k}}$ which involve all of $x,y,n$ with $n$ appearing at least $s-2$ times have exactly one copy of $x$ , one copy of $y$ and $n$ exactly $(s-2)$ times. Therefore taking the coordinate of

\bigotimes_{\begin{subarray}{c}0\leq k\leq s-1\\ k\neq s-2\end{subarray}}\chi(y,n,\ldots,n,x,\ldots,x)^{\otimes\binom{s-1}{k}}

which achieves the infinity norm and adjusting $\psi_{x,y}$ , $b$ , and adding a term $b(x,y)$ we have

\displaystyle\bigg{\lVert}\mathbb{E}_{x,y\in[\pm N]}\mathbb{E}_{n\in[N]}\chi(y,x,n,\ldots,n)^{\otimes(s-1)}\cdot\psi_{x,y}(n)b(x,n)b(y,n)b(x+y,n)b(x,y)\bigg{\rVert}_{\infty}\geq 1/M(\delta).
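The binomial expansion and the selection of the $k=s-2$ term are easiest to see in a toy model. The following is an illustration under the assumption (ours, not the source's) that $\chi$ is the scalar phase $\chi(y,n_{1},\ldots,n_{s-1})=e(\alpha\,y\,n_{1}\cdots n_{s-1})$ with $e(t)=\exp(2\pi it)$ ; in this model the equivalences of Appendix C become exact identities.

```latex
% Toy model (assumption of this sketch): chi(y, n_1, ..., n_{s-1}) = e(alpha * y * n_1 ... n_{s-1}).
\begin{align*}
\widetilde{\chi}(y,n+x)
&=e\big(\alpha y(n+x)^{s-1}\big)
=\prod_{k=0}^{s-1}e\big(\alpha y\,n^{k}x^{s-1-k}\big)^{\binom{s-1}{k}},\\
k=s-1&\colon\ e(\alpha y\,n^{s-1})\ \text{depends only on }(y,n),\\
k=s-2&\colon\ e(\alpha y\,x\,n^{s-2})^{s-1}\ \text{corresponds to }\chi(y,x,n,\ldots,n)^{\otimes(s-1)},\\
k\leq s-3&\colon\ \text{degree at most }(s-3)\text{ in }n\text{ for fixed }(x,y),\\
k=0&\colon\ e(\alpha y\,x^{s-1})\ \text{depends only on }(x,y).
\end{align*}
```

This matches the bookkeeping in the text: the $k=s-1$ term can be placed in $b(y,n)$ , the terms with $k\leq s-3$ in $\psi_{x,y}(n)$ , a factor depending only on $(x,y)$ in $b(x,y)$ , and only the $k=s-2$ term, with multiplicity $\binom{s-1}{s-2}=s-1$ , survives as a genuine top-order term.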

Step 2: Cauchy–Schwarz to remove $1$ -bounded functions. Applying Cauchy–Schwarz to each coordinate of the associated vector, duplicating the variable $y$ , and using that $b(x,n)$ is $1$ -bounded, we find that

	$\displaystyle\lVert\mathbb{E}_{n\in[N],x\in[\pm N]}\mathbb{E}_{y,y^{\prime}\in[\pm N]}\chi(y,x,n,\ldots,n)^{\otimes(s-1)}\otimes\overline{\chi(y^{\prime},x,n,\ldots,n)^{\otimes(s-1)}}\cdot\psi_{x,y}(n)\overline{\psi_{x,y^{\prime}}(n)}$
	$\displaystyle\qquad\qquad\qquad\qquad\cdot b(y,n)\overline{b(y^{\prime},n)}\cdot b(x+y,n)\overline{b(x+y^{\prime},n)}\cdot b(x,y)\overline{b(x,y^{\prime})}\rVert_{\infty}\geq 1/M(\delta).$

By Lemma C.4, Lemma C.2, and Lemma C.1, we have that

\chi(y,x,n,\ldots,n)^{\otimes(s-1)}\otimes\overline{\chi(y^{\prime},x,n,\ldots,n)^{\otimes(s-1)}}\text{ and }\chi(y-y^{\prime},x,n,\ldots,n)^{\otimes(s-1)}

are $(M(\delta),M(\delta),d(\delta))$ -equivalent for degree $(s-1)$ . Therefore by Lemma 7.4, there exists $\psi^{\ast}(x,y,y^{\prime},n)$ a degree $(s-1)$ nilsequence of complexity $(M(\delta),d(\delta))$ such that

	$\displaystyle\lVert\mathbb{E}_{n\in[N],x,y,y^{\prime}\in[\pm N]}\chi(y-y^{\prime},x,n,\ldots,n)^{\otimes(s-1)}\cdot\psi^{\ast}(x,y,y^{\prime},n)\cdot\psi_{x,y}(n)\overline{\psi_{x,y^{\prime}}(n)}$
	$\displaystyle\qquad\qquad\cdot b(y,n)\overline{b(y^{\prime},n)}\cdot b(x+y,n)\overline{b(x+y^{\prime},n)}\cdot b(x,y)\overline{b(x,y^{\prime})}\rVert_{\infty}\geq 1/M(\delta).$

Note that $z=x+y+y^{\prime}$ ranges in the set $[-3N,3N]$ . Take $\rho=\exp(-\log(1/\delta)^{O_{s}(1)})$ sufficiently small. Then there exists $z^{\ast}\in[-(3-\rho)N,(3-\rho)N]$ such that

	$\displaystyle\lVert\mathbb{E}_{n\in[N]}\mathbb{E}_{\begin{subarray}{c}x,y,y^{\prime}\in[\pm N]\\ x+y+y^{\prime}=z^{\ast}\end{subarray}}\chi(y-y^{\prime},x,n,\ldots,n)^{\otimes(s-1)}\cdot\psi^{\ast}(x,y,y^{\prime},n)\cdot\psi_{x,y}(n)\overline{\psi_{x,y^{\prime}}(n)}$
	$\displaystyle\qquad\qquad\cdot b(y,n)\overline{b(y^{\prime},n)}\cdot b(x+y,n)\overline{b(x+y^{\prime},n)}\cdot b(x,y)\overline{b(x,y^{\prime})}\rVert_{\infty}\geq 1/M(\delta).$

This implies that

	$\displaystyle\lVert\mathbb{E}_{n\in[N]}\mathbb{E}_{\begin{subarray}{c}x,y,y^{\prime}\in[\pm N]\\ x+y+y^{\prime}=z^{\ast}\end{subarray}}\chi(y-y^{\prime},z^{\ast}-y-y^{\prime},n,\ldots,n)^{\otimes(s-1)}\cdot\psi^{\ast}(z^{\ast}-y-y^{\prime},y,y^{\prime},n)\cdot\psi_{x,y}(n)\overline{\psi_{x,y^{\prime}}(n)}$
	$\displaystyle\cdot b(y,n)\overline{b(y^{\prime},n)}\cdot b(z^{\ast}-y^{\prime},n)\overline{b(z^{\ast}-y,n)}\cdot b(z^{\ast}-y-y^{\prime},y)\overline{b(z^{\ast}-y-y^{\prime},y^{\prime})}\rVert_{\infty}\geq 1/M(\delta).$

By applying the first item of Lemma C.3, Lemma C.4, Lemma C.2, and Lemma C.1 we have that

\chi(y-y^{\prime},z^{\ast}-y-y^{\prime},n,\ldots,n)^{\otimes(s-1)}

and

\chi(y^{\prime},y^{\prime},n,\ldots,n)^{\otimes(s-1)}\chi(y^{\prime},y,n,\ldots,n)^{\otimes(s-1)}\overline{\chi(y,y,n,\ldots,n)}^{\otimes(s-1)}\overline{\chi(y,y^{\prime},n,\ldots,n)}^{\otimes(s-1)}

are $(M(\delta),M(\delta),d(\delta))$ -equivalent for degree $(s-1)$ . Thus by Lemma 7.4 and letting $\widetilde{\psi}$ denote a degree $(s-1)$ nilsequence in $y,y^{\prime},n$ of complexity $(M(\delta),d(\delta))$ we have that

	$\displaystyle\lVert\mathbb{E}_{n\in[N]}\mathbb{E}_{\begin{subarray}{c}y,y^{\prime}\in[\pm N]\\ \|z^{\ast}-y-y^{\prime}\|\leq N\end{subarray}}\chi(y^{\prime},y^{\prime},n,\ldots,n)^{\otimes(s-1)}\chi(y^{\prime},y,n,\ldots,n)^{\otimes(s-1)}$
	$\displaystyle\overline{\chi(y,y,n,\ldots,n)}^{\otimes(s-1)}\overline{\chi(y,y^{\prime},n,\ldots,n)}^{\otimes(s-1)}\cdot\widetilde{\psi}(y,y^{\prime},n)\cdot\psi_{z^{\ast}-y-y^{\prime},y}(n)\overline{\psi_{z^{\ast}-y-y^{\prime},y^{\prime}}(n)}$
	$\displaystyle\cdot b(y,n)\overline{b(y^{\prime},n)}\cdot b(z^{\ast}-y^{\prime},n)\overline{b(z^{\ast}-y,n)}\cdot b(z^{\ast}-y-y^{\prime},y)\cdot\overline{b(z^{\ast}-y-y^{\prime},y^{\prime})}\rVert_{\infty}\geq 1/M(\delta).$
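The four-term product above comes from expanding in the first two slots of $\chi$ . As a sketch, let $\beta(\cdot,\cdot)$ be a bi-additive form that is not assumed symmetric; $\beta$ is a stand-in introduced only for this illustration, playing the role of the dependence of $\chi$ on its first two arguments.

```latex
% Assumption of this sketch: beta is bi-additive (additive in each slot) and NOT assumed symmetric.
\begin{align*}
\beta(y-y^{\prime},z^{\ast}-y-y^{\prime})
&=\beta(y,z^{\ast})-\beta(y^{\prime},z^{\ast})\\
&\quad+\beta(y^{\prime},y^{\prime})+\beta(y^{\prime},y)-\beta(y,y)-\beta(y,y^{\prime}).
\end{align*}
```

The second line corresponds, in order, to the factors $\chi(y^{\prime},y^{\prime},\ldots)$ , $\chi(y^{\prime},y,\ldots)$ , $\overline{\chi(y,y,\ldots)}$ and $\overline{\chi(y,y^{\prime},\ldots)}$ . The two terms on the first line involve the fixed value $z^{\ast}$ in one slot, so they are of lower order in the variables $(y,y^{\prime},n)$ and can be absorbed into $\widetilde{\psi}$ . Because $\beta$ need not be symmetric, the terms $\beta(y^{\prime},y)$ and $\beta(y,y^{\prime})$ do not cancel, which is precisely the asymmetry the argument is designed to control.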

Here we have “folded” $\psi^{\ast}(z^{\ast}-y-y^{\prime},y,y^{\prime},n)$ via Lemma C.2 into $\widetilde{\psi}$ . We may collapse various $1$ -bounded functions (and pass to the coordinates of $\chi(y^{\prime},y^{\prime},n,\ldots,n)^{\otimes(s-1)}$ and $\overline{\chi(y,y,n,\ldots,n)}^{\otimes(s-1)}$ which achieve the $L^{\infty}$ norm) and obtain

	$\displaystyle\lVert\mathbb{E}_{n\in[N]}\mathbb{E}_{\begin{subarray}{c}y,y^{\prime}\in[\pm N]\\ \|z^{\ast}-y-y^{\prime}\|\leq N\end{subarray}}\chi(y^{\prime},y,n,\ldots,n)^{\otimes(s-1)}\overline{\chi(y,y^{\prime},n,\ldots,n)}^{\otimes(s-1)}$
	$\displaystyle\qquad\qquad\cdot\widetilde{\psi}(y,y^{\prime},n)\cdot\psi_{y,y^{\prime}}(n)\cdot b(y,n)b(y^{\prime},n)b(y,y^{\prime})\rVert_{\infty}\geq 1/M(\delta);$

here the $\psi_{y,y^{\prime}}(n)$ are degree $(s-3)$ nilsequences of complexity $(M(\delta),d(\delta))$ . Furthermore as $\widetilde{\psi}(y,y^{\prime},n)$ is a degree $(s-1)$ nilsequence, we have that it is a multidegree $(s-1,s-1,s-3)\cup(1,0,s-1)\cup(0,1,s-1)$ nilsequence. Therefore by Lemma C.6 and Lemma C.2, we may remove $\widetilde{\psi}$ at the cost of adjusting $b$ and $\psi_{y,y^{\prime}}$ to obtain

	$\displaystyle\lVert\mathbb{E}_{n\in[N]}\mathbb{E}_{\begin{subarray}{c}y,y^{\prime}\in[\pm N]\\ \|z^{\ast}-y-y^{\prime}\|\leq N\end{subarray}}\chi(y^{\prime},y,n,\ldots,n)^{\otimes(s-1)}\overline{\chi(y,y^{\prime},n,\ldots,n)}^{\otimes(s-1)}$
	$\displaystyle\qquad\qquad\cdot\psi_{y,y^{\prime}}(n)\cdot b(y,n)b(y^{\prime},n)b(y,y^{\prime})\rVert_{\infty}\geq 1/M(\delta).$

This implies that

	$\displaystyle\lVert\mathbb{E}_{n\in[N]}\mathbb{E}_{y,y^{\prime}\in[\pm N]}\chi(y^{\prime},y,n,\ldots,n)^{\otimes(s-1)}\overline{\chi(y,y^{\prime},n,\ldots,n)}^{\otimes(s-1)}$
	$\displaystyle\qquad\qquad\cdot\psi_{y,y^{\prime}}(n)\cdot b(y,n)b(y^{\prime},n)b(y,y^{\prime})\cdot\mathbbm{1}_{\|z^{\ast}-y-y^{\prime}\|\leq N}\rVert_{\infty}\geq 1/M(\delta)$

as $\mathbb{P}_{y,y^{\prime}\in[\pm N]}[\|z^{\ast}-y-y^{\prime}\|\leq N]\gtrsim\rho^{2}$ . Note that the final indicator may be absorbed into $b(y,y^{\prime})$ to obtain

	$\displaystyle\lVert\mathbb{E}_{n\in[N]}\mathbb{E}_{y,y^{\prime}\in[\pm N]}\chi(y^{\prime},y,n,\ldots,n)^{\otimes(s-1)}\overline{\chi(y,y^{\prime},n,\ldots,n)}^{\otimes(s-1)}$
	$\displaystyle\qquad\qquad\cdot\psi_{y,y^{\prime}}(n)\cdot b(y,n)b(y^{\prime},n)b(y,y^{\prime})\rVert_{\infty}\geq 1/M(\delta).$
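The lower bound of order $\rho^{2}$ on the probability can be checked by a continuous heuristic. This is a sketch under the simplifying assumption that $y,y^{\prime}$ are uniform on the real interval $[-N,N]$ (the discrete count differs by rounding errors only), taking the extreme case $z^{\ast}=(3-\rho)N$ .

```latex
% Heuristic (assumption of this sketch): y, y' uniform on the real interval [-N, N]; worst case z^* = (3 - rho) N.
\begin{align*}
\|z^{\ast}-y-y^{\prime}\|\leq N
&\iff(2-\rho)N\leq y+y^{\prime}\leq(4-\rho)N
\iff y+y^{\prime}\geq(2-\rho)N,\\
\mathbb{P}\big[y+y^{\prime}\geq(2-\rho)N\big]
&=\frac{\tfrac{1}{2}(\rho N)^{2}}{(2N)^{2}}=\frac{\rho^{2}}{8}.
\end{align*}
```

The region in question is a right triangle with legs of length $\rho N$ in the corner $y=y^{\prime}=N$ of the square. For smaller $|z^{\ast}|$ the window $[z^{\ast}-N,z^{\ast}+N]$ captures more of the (symmetric, unimodal) distribution of $y+y^{\prime}$ , so the probability only increases.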

Defining $G(y,y^{\prime},n)=\chi(y,y^{\prime},n,\ldots,n)^{\otimes(s-1)}\otimes\overline{\chi(y^{\prime},y,n,\ldots,n)}^{\otimes(s-1)}$ , we have

\displaystyle\lVert\mathbb{E}_{n\in[N],y,y^{\prime}\in[\pm N]}G(y,y^{\prime},n)\cdot\overline{\psi_{y,y^{\prime}}(n)}\cdot b(y,n)b(y^{\prime},n)b(y,y^{\prime})\rVert_{\infty}\geq 1/M(\delta).

Applying Cauchy–Schwarz in $n$ , then $y$ , and then $y^{\prime}$ (analogously to Lemma 7.2) we may remove the bounded functions $b$ and we have

	$\displaystyle\lVert\mathbb{E}_{n_{1},n_{2}\in[N],y_{1},y_{2},y_{1}^{\prime},y_{2}^{\prime}\in[\pm N]}\bigotimes_{\varepsilon\in\{1,2\}^{3}}\mathcal{C}^{\|\varepsilon\|-1}G(y_{\varepsilon_{1}},y_{\varepsilon_{2}}^{\prime},n_{\varepsilon_{3}})\cdot\psi_{y_{1},y_{2},y_{1}^{\prime},y_{2}^{\prime}}(n_{1})\cdot\overline{\psi_{y_{1},y_{2},y_{1}^{\prime},y_{2}^{\prime}}(n_{2})}\rVert_{\infty}$
	$\displaystyle\qquad\qquad\qquad\qquad\qquad\geq 1/M(\delta),$

where $\mathcal{C}$ denotes conjugation and the $\psi_{y_{1},y_{2},y_{1}^{\prime},y_{2}^{\prime}}(n_{i})$ are degree at most $(s-3)$ nilsequences of complexity $(M(\delta),d(\delta))$ . Applying Pigeonhole in $n_{2},y_{2},y_{2}^{\prime}$ and applying Lemma C.2 to specialize variables, reindexing $n_{1},y_{1},y_{1}^{\prime}$ to $n,y,y^{\prime}$ , and taking the maximal coordinate we have

\displaystyle\lVert\mathbb{E}_{n\in[N],y,y^{\prime}\in[\pm N]}G(y,y^{\prime},n)\cdot\psi_{1}(y,n)\cdot\psi_{2}(y^{\prime},n)\cdot\psi_{y,y^{\prime}}(n)\rVert_{\infty}\geq 1/M(\delta).

Here $\psi_{y,y^{\prime}}$ is degree at most $(s-3)$ in $n$ while $\psi_{1}(y,n)$ and $\psi_{2}(y^{\prime},n)$ are multidegree $(1,s-2)$ and all have complexity $(M(\delta),d(\delta))$ . Finally by the triangle inequality we have

\displaystyle\mathbb{E}_{y,y^{\prime}\in[\pm N]}\lVert\mathbb{E}_{n\in[N]}G(y,y^{\prime},n)\cdot\psi_{1}(y,n)\cdot\psi_{2}(y^{\prime},n)\cdot\psi_{y,y^{\prime}}(n)\rVert_{\infty}\geq 1/M(\delta).
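The mechanism by which each Cauchy–Schwarz application in Step 2 removes a bounded function can be sketched for a single step. Here $f$ is a generic scalar function and $b$ a $1$ -bounded function that does not depend on $n$ ; both are placeholders introduced for this illustration rather than the exact objects in the argument.

```latex
% Generic single step (assumption of this sketch): b is 1-bounded and independent of n; f is arbitrary.
\begin{align*}
\Big|\mathbb{E}_{y,y^{\prime}}b(y,y^{\prime})\,\mathbb{E}_{n}f(n,y,y^{\prime})\Big|^{2}
&\leq\mathbb{E}_{y,y^{\prime}}|b(y,y^{\prime})|^{2}\cdot\mathbb{E}_{y,y^{\prime}}\Big|\mathbb{E}_{n}f(n,y,y^{\prime})\Big|^{2}\\
&\leq\mathbb{E}_{y,y^{\prime}}\mathbb{E}_{n_{1},n_{2}}f(n_{1},y,y^{\prime})\overline{f(n_{2},y,y^{\prime})}.
\end{align*}
```

Duplicating $n$ thus eliminates the function that is independent of $n$ , and likewise duplicating $y$ and $y^{\prime}$ eliminates the functions independent of $y$ and of $y^{\prime}$ respectively. Three such steps produce the eight copies of $G$ indexed by $\varepsilon\in\{1,2\}^{3}$ , with the lower bound $1/M(\delta)$ replaced by its eighth power, which is again of the form $1/M(\delta)$ .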

Step 3: Converse of the inverse theorem and polarization. By the converse of the inverse theorem, see Lemma B.5, we have that

\displaystyle\mathbb{E}_{y,y^{\prime}\in[\pm N]}\lVert G(y,y^{\prime},\cdot)\psi_{1}(y,\cdot)\psi_{2}(y^{\prime},\cdot)\rVert_{U^{s-2}[N]}^{2^{s-2}}\geq 1/M(\delta).

Expanding out the definition of the $U^{s-2}$ -norm, we find that

	$\displaystyle\bigg{\lVert}\mathbb{E}_{y,y^{\prime}\in[\pm N]}\mathbb{E}_{n\in[N],h_{1},\ldots,h_{s-2}\in[\pm N]}\bigotimes_{\varepsilon\in\{0,1\}^{s-2}}\Big{(}\mathcal{C}^{\|\varepsilon\|+s}(G(y,y^{\prime},n+\varepsilon\cdot\vec{h})\cdot\psi_{1}(y,n+\varepsilon\cdot\vec{h})\cdot\psi_{2}(y^{\prime},n+\varepsilon\cdot\vec{h}))$
	$\displaystyle\qquad\qquad\cdot\mathbbm{1}_{n+\varepsilon\cdot\vec{h}\in[N]}\Big{)}\bigg{\rVert}_{\infty}\geq 1/M(\delta).$

The crucial point is that by repeatedly applying Lemma C.4, Lemma C.2, and Lemma C.1 we have that

\bigotimes_{\varepsilon\in\{0,1\}^{s-2}}\mathcal{C}^{\|\varepsilon\|+s}(G(y,y^{\prime},n+\varepsilon\cdot\vec{h}))

and

\chi(y,y^{\prime},h_{1},\ldots,h_{s-2})^{\otimes(s-1)!}\cdot\overline{\chi(y^{\prime},y,h_{1},\ldots,h_{s-2})}^{\otimes(s-1)!}

are $(M(\delta),M(\delta),d(\delta))$ -equivalent for degree $(s-1)$ . Therefore by Lemma 7.4, there exists a nilsequence $\widetilde{\psi}$ of degree $(s-1)$ and complexity $(M(\delta),d(\delta))$ such that

	$\displaystyle\lVert\mathbb{E}_{y,y^{\prime}\in[\pm N]}\mathbb{E}_{n\in[N],h_{1},\ldots,h_{s-2}\in[\pm N]}\chi(y,y^{\prime},h_{1},\ldots,h_{s-2})^{\otimes(s-1)!}\cdot\overline{\chi(y^{\prime},y,h_{1},\ldots,h_{s-2})}^{\otimes(s-1)!}$
	$\displaystyle\qquad\cdot\widetilde{\psi}(n,y,y^{\prime},h_{1},\ldots,h_{s-2})\cdot\mathbbm{1}_{n+\varepsilon\cdot\vec{h}\in[N]}\rVert_{\infty}\geq 1/M(\delta).$
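The exponent $(s-1)!$ can be traced in a toy model. As an illustration only, suppose $\chi(y,y^{\prime},n_{1},\ldots,n_{s-2})=e(\beta(y,y^{\prime})\,n_{1}\cdots n_{s-2})$ for some bi-additive $\beta$ (a hypothetical stand-in, with $e(t)=\exp(2\pi it)$ ), so that $G(y,y^{\prime},n)=e\big((s-1)(\beta(y,y^{\prime})-\beta(y^{\prime},y))\,n^{s-2}\big)$ . Writing $k=s-2$ , the alternating sum over the cube is the $k$ -fold finite difference of $n^{k}$ :

```latex
% Toy model (assumption of this sketch): chi is a scalar phase, bi-additive in (y, y') via beta.
% Conjugating (|eps| + s) times contributes the sign (-1)^{|eps| + s} = (-1)^{k - |eps|} with k = s - 2.
\begin{align*}
\sum_{\varepsilon\in\{0,1\}^{k}}(-1)^{k-\|\varepsilon\|}(n+\varepsilon\cdot\vec{h})^{k}&=k!\,h_{1}h_{2}\cdots h_{k},\\
\text{e.g. }k=2\colon\quad n^{2}-(n+h_{1})^{2}-(n+h_{2})^{2}+(n+h_{1}+h_{2})^{2}&=2h_{1}h_{2},\\
(s-1)\cdot(s-2)!&=(s-1)!.
\end{align*}
```

In the toy model the product over $\varepsilon$ of the conjugated copies of $G$ therefore equals $\big(\chi(y,y^{\prime},h_{1},\ldots,h_{s-2})\overline{\chi(y^{\prime},y,h_{1},\ldots,h_{s-2})}\big)^{(s-1)!}$ exactly, with no dependence on $n$ ; for genuine nilcharacters the same identity holds only up to the lower-order nilsequence $\widetilde{\psi}$ .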

Via Fourier expansion (a multidimensional version of the argument in Lemma 7.1), we may fold $\mathbbm{1}_{n+\varepsilon\cdot\vec{h}\in[N]}$ into $\widetilde{\psi}(n,y,y^{\prime},h_{1},\ldots,h_{s-2})$ (see footnote 8).

Footnote 8. To be precise, we convolve $\mathbbm{1}_{n+\varepsilon\cdot\vec{h}\in[N]}$ with $\mathbbm{1}_{|n|\leq\rho N}\cdot\prod_{i=1}^{s-2}\mathbbm{1}_{|h_{i}|\leq\rho N}$ where $\rho=1/M(\delta)$ is sufficiently small. This function has the necessary Fourier decay to apply the analysis in Lemma 7.1.

We reduce to

	$\displaystyle\lVert\mathbb{E}_{y,y^{\prime}\in[\pm N]}\mathbb{E}_{n\in[N],h_{1},\ldots,h_{s-2}\in[\pm N]}\chi(y,y^{\prime},h_{1},\ldots,h_{s-2})^{\otimes(s-1)!}\cdot\overline{\chi(y^{\prime},y,h_{1},\ldots,h_{s-2})}^{\otimes(s-1)!}$
	$\displaystyle\qquad\cdot\widetilde{\psi}(n,y,y^{\prime},h_{1},\ldots,h_{s-2})\rVert_{\infty}\geq 1/M(\delta).$

Applying Pigeonhole in $n$ and applying the first item of Lemma C.2, we reduce to

(12.2)

\displaystyle\begin{split}&\lVert\mathbb{E}_{y,y^{\prime}\in[\pm N]}\mathbb{E}_{h_{1},\ldots,h_{s-2}\in[\pm N]}\chi(y,y^{\prime},h_{1},\ldots,h_{s-2})^{\otimes(s-1)!}\cdot\overline{\chi(y^{\prime},y,h_{1},\ldots,h_{s-2})}^{\otimes(s-1)!}\\ &\qquad\qquad\qquad\qquad\otimes\widetilde{\psi}(y,y^{\prime},h_{1},\ldots,h_{s-2})\rVert_{\infty}\geq 1/M(\delta);\end{split}

once again we have abusively updated $\widetilde{\psi}$ , which has degree $(s-1)$ .

Step 4: Invoking equidistribution theory. This is the unique moment we have the ability to apply equidistribution theory; up to this point we have been applying “elementary” facts regarding nilsequences. Let

\chi(y,y^{\prime},h_{1},\ldots,h_{s-2})=F(g(y,y^{\prime},h_{1},\ldots,h_{s-2})\Gamma)

and let $\xi$ denote the vertical $G_{(1,\ldots,1)}$ frequency of $F$ on the multidegree $(1,\ldots,1)$ nilmanifold $G/\Gamma$ . We write

\widetilde{\psi}(y,y^{\prime},h_{1},\ldots,h_{s-2})=\widetilde{F}(g^{\ast}(y,y^{\prime},h_{1},\ldots,h_{s-2})\Gamma^{\prime})

on the multidegree $(s-1)$ nilmanifold $G^{\prime}/\Gamma^{\prime}$ . Note that

(g(y,y^{\prime},h_{1},\ldots,h_{s-2}),g(y^{\prime},y,h_{1},\ldots,h_{s-2}),g^{\ast}(y,y^{\prime},h_{1},\ldots,h_{s-2}))

may be viewed as a polynomial sequence on $G\times G\times G^{\prime}$ where $G^{\prime}$ is given a degree $(s-1)$ filtration. $G\times G\times G^{\prime}$ is given a degree $s$ filtration where the $t$ -th group is

(G\times G\times G^{\prime})_{t}=\bigvee_{|\vec{i}|=t}G_{\vec{i}}\times\bigvee_{|\vec{i}|=t}G_{\vec{i}}\times(G^{\prime})_{t}.

Note that $F\otimes\overline{F}\otimes\widetilde{F}$ has $(G\times G\times G^{\prime})_{s}$ -vertical frequency $\xi^{\prime}=(\xi,-\xi,0)$ , noting that $(G^{\prime})_{s}=\mathrm{Id}_{G^{\prime}}$ . By applying Corollary 5.5 with (12.2) to

F(g(y,y^{\prime},h_{1},\ldots,h_{s-2})\Gamma)^{\otimes(s-1)!}\otimes\overline{F(g(y^{\prime},y,h_{1},\ldots,h_{s-2})\Gamma)}^{\otimes(s-1)!}\cdot\widetilde{F}(g^{\ast}(y,y^{\prime},h_{1},\ldots,h_{s-2})\Gamma^{\prime}),

and restricting the factorization to $G\times G$ , we have

	$\displaystyle(g(y,y^{\prime},h_{1},\ldots,h_{s-2}),g(y^{\prime},y,h_{1},\ldots,h_{s-2}))$
	$\displaystyle\qquad=\varepsilon(y,y^{\prime},h_{1},\ldots,h_{s-2})\cdot g^{\mathrm{Output}}(y,y^{\prime},h_{1},\ldots,h_{s-2})\cdot\gamma(y,y^{\prime},h_{1},\ldots,h_{s-2}),$

where

•

$g^{\mathrm{Output}}$ lives in an $M(\delta)$ -rational subgroup $H$ such that $\xi^{\prime}(H\cap(G\times G)_{s})=0$ ;
•

$\gamma$ is an $M(\delta)$ -rational polynomial sequence;
•

$\varepsilon$ is $(M(\delta),N)$ -smooth.

Note that when we apply Corollary 5.5 the vertical frequency of the function we have is $(s-1)!\cdot\xi^{\prime}$ and we obtain $(s-1)!\xi^{\prime}(H\cap(G\times G)_{s})=0$ ; we may divide by $(s-1)!$ to obtain the above. Additionally, we have implicitly used that $\xi^{\prime}$ is trivial in the $G^{\prime}$ part and abuse notation to descend $\xi^{\prime}$ to $G\times G$ .

Let $F^{\ast}=F\otimes\overline{F}$ and note that therefore

\displaystyle\chi(h,n,\ldots,n)\otimes\overline{\chi(n,h,n,\ldots,n)}=F^{\ast}(\varepsilon(h,n,\ldots,n)g^{\mathrm{Output}}(h,n,\ldots,n)\cdot\gamma(h,n,\ldots,n)(\Gamma\times\Gamma)).

Step 5: The finishing touch. We now recall from (12.1) that

\mathbb{E}_{h\in[N]}\lVert\mathbb{E}_{n\in[N]}\Delta_{h}f(n)\otimes\chi(h,n,\ldots,n)\cdot\psi(n)\cdot\psi_{h}(n)\rVert_{\infty}\geq 1/M(\delta);

here we have restricted to a coordinate of $\psi_{h}$ and we treat it as a degree $(s-2)$ nilsequence (rather than using the nilcharacter). By applying Pigeonhole there exist $q,q^{\prime}\in[s]$ such that

\mathbb{E}_{h\in[N/s]}\lVert\mathbb{E}_{n\in[N/s]}\Delta_{sh+q^{\prime}}f(sn+q)\otimes\chi(sh+q^{\prime},sn+q,\ldots,sn+q)\cdot\psi(sn+q)\cdot\psi_{h}(sn+q)\rVert_{\infty}\geq 1/M(\delta).

By Lemma C.3, we have that

\chi(sh+q^{\prime},sn+q,\ldots,sn+q)\text{ and }\chi(sh,sn,\ldots,sn)

are $(M(\delta),M(\delta),d(\delta))$ -equivalent for degree $(s-1)$ . Applying Lemma C.6 (splitting) and adjusting $\psi,\psi_{h}$ , we may instead assume that

\displaystyle\mathbb{E}_{h\in[N/s]}\lVert\mathbb{E}_{n\in[N/s]}\Delta_{sh+q^{\prime}}f(sn+q)\otimes\chi(sh,sn,\ldots,sn)\cdot\psi(n)\cdot\psi_{h}(n)\rVert_{\infty}\geq 1/M(\delta)

for $\psi$ of degree $(s-1)$ and $\psi_{h}$ of degree $(s-2)$ . Now define

T(h,n):=\chi(n+h,\ldots,n+h)\otimes\overline{\chi(h,n,\ldots,n)}\otimes\overline{\chi(n,h,\ldots,n)}^{\otimes(s-1)}\otimes\overline{\chi(n,n,\ldots,n)}.

Since this is a nilcharacter, we automatically know

	$\displaystyle\mathbb{E}_{h\in[N/s]}\lVert\mathbb{E}_{n\in[N/s]}\Delta_{sh+q^{\prime}}f(sn+q)\otimes\chi(sh,sn,\ldots,sn)\cdot\psi(n)\cdot\psi_{h}(n)$
	$\displaystyle\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\otimes T(h,n)^{\otimes s^{s-1}}\otimes\overline{T(h,n)}^{\otimes s^{s-1}}\rVert_{\infty}\geq 1/M(\delta).$

We define

	$\displaystyle\widetilde{f}_{1}(n)$	$\displaystyle=f(sn+q)\cdot\overline{\chi(n,\ldots,n)}^{\otimes s^{s-1}},$
	$\displaystyle\widetilde{f}_{2}(n+h)$	$\displaystyle=\overline{f(s(n+h)+q+q^{\prime})}\cdot\chi(n+h,\ldots,n+h)^{\otimes s^{s-1}},$

which yields

	$\displaystyle\mathbb{E}_{h\in[N/s]}\lVert\mathbb{E}_{n\in[N/s]}\widetilde{f}_{1}(n)\otimes\widetilde{f}_{2}(n+h)\otimes\chi(sh,sn,\ldots,sn)\cdot\psi(n)\cdot\psi_{h}(n)$
	$\displaystyle\qquad\qquad\otimes(\overline{\chi(h,n,\ldots,n)}\otimes\overline{\chi(n,h,n,\ldots,n)}^{\otimes(s-1)})^{\otimes s^{s-1}}\otimes\overline{T(h,n)}^{\otimes s^{s-1}}\rVert_{\infty}\geq 1/M(\delta).$

By applications of Lemma C.4, Lemma C.2, and Lemma C.1 we have that $T(h,n)$ and

\bigotimes_{k=1}^{s-1}\chi(h,h,\ldots,h,n,\ldots,n)^{\binom{s-1}{k}}\otimes\bigotimes_{k=2}^{s-1}\chi(n,h,\ldots,h,n,\ldots,n)^{\binom{s-1}{k}}

are $(M(\delta),M(\delta),d(\delta))$ -equivalent for degree $(s-1)$ . (There are $k+1$ many $h$ ’s in the first term and $k$ many $h$ ’s in the second term.) Applying Lemma C.6, we may approximate each coordinate as a sum of products of multidegree $(s-1,s-2)$ and $(0,s-1)$ nilsequences in variables $(h,n)$ . Furthermore, by the second item of Lemma C.2 this new nilsequence is of similar type. So, folding everything into $\psi(n)$ of degree $(s-1)$ and the $\psi_{h}(n)$ of degree $(s-2)$ , we find

	$\displaystyle\mathbb{E}_{h\in[N/s]}\bigg{\lVert}\mathbb{E}_{n\in[N/s]}\widetilde{f}_{1}(n)\otimes\widetilde{f}_{2}(n+h)\otimes\chi(sh,sn,\ldots,sn)\cdot\psi(n)\cdot\psi_{h}(n)$
	$\displaystyle\qquad\qquad\qquad\qquad\otimes(\overline{\chi(h,n,\ldots,n)}\otimes\overline{\chi(n,h,\ldots,n)}^{\otimes(s-1)})^{\otimes s^{s-1}}\bigg{\rVert}_{\infty}\geq 1/M(\delta).$

Furthermore note by Lemma C.3 that

\chi(sh,sn,\ldots,sn)\text{ and }\chi(h,n,\ldots,n)^{\otimes s^{s}}

are $(M(\delta),M(\delta),d(\delta))$ -equivalent for degree $(s-1)$ . Applying Lemma 7.4 and Lemma C.6 and adjusting $\psi$ and $\psi_{h}$ yet again we have

	$\displaystyle\mathbb{E}_{h\in[N/s]}\bigg{\lVert}\mathbb{E}_{n\in[N/s]}\widetilde{f}_{1}(n)\otimes\widetilde{f}_{2}(n+h)\otimes\chi(h,n,\ldots,n)^{\otimes s^{s}}\cdot\psi(n)\cdot\psi_{h}(n)$
	$\displaystyle\qquad\qquad\qquad\qquad\otimes(\overline{\chi(h,n,\ldots,n)}\otimes\overline{\chi(n,h,\ldots,n)}^{\otimes(s-1)})^{\otimes s^{s-1}}\bigg{\rVert}_{\infty}\geq 1/M(\delta).$

Now by Lemma C.3 we have that

\chi(h,n,\ldots,n)^{\otimes s^{s}}\otimes\overline{\chi(h,n,\ldots,n)}^{\otimes s^{s-1}}\text{ and }\chi(h,n,\ldots,n)^{\otimes(s-1)\cdot s^{s-1}}

are $(M(\delta),M(\delta),d(\delta))$ -equivalent for degree $(s-1)$ . Thus applying Lemma 7.4 and Lemma C.6 and adjusting $\psi$ and $\psi_{h}$ once again we have

	$\displaystyle\mathbb{E}_{h\in[N/s]}\bigg{\lVert}\mathbb{E}_{n\in[N/s]}\widetilde{f}_{1}(n)\otimes\widetilde{f}_{2}(n+h)\cdot\psi(n)\cdot\psi_{h}(n)\otimes(\chi(h,n,\ldots,n)\otimes\overline{\chi(n,h,\ldots,n)})^{\otimes(s-1)s^{s-1}}\bigg{\rVert}_{\infty}$
	$\displaystyle\qquad\qquad\qquad\qquad\qquad\geq 1/M(\delta).$

This is finally where we may apply our earlier factorization for $\chi(h,n,\ldots,n)\otimes\overline{\chi(n,h,\ldots,n)}$ . Recall that

\displaystyle\chi(h,n,\ldots,n)\otimes\overline{\chi(n,h,n,\ldots,n)}=F^{\ast}(\varepsilon(h,n,\ldots,n)g^{\mathrm{Output}}(h,n,\ldots,n)\cdot\gamma(h,n,\ldots,n)(\Gamma\times\Gamma))

where $\gamma$ is $M(\delta)$ -periodic and $\varepsilon$ is $(M(\delta),N)$ -smooth. Let $Q$ denote the period of $\gamma$ (i.e., changing any argument by a multiple of $Q$ keeps its $\Gamma\times\Gamma$ coset the same) and take $\rho=\exp(-\log(1/\delta)^{O_{s}(1)})$ where the implicit constant is sufficiently large. Break $[N/s]$ into arithmetic progressions of length roughly $\rho N$ and common difference $Q$ ; call these $\mathcal{P}_{1},\ldots,\mathcal{P}_{\ell}$ . There exist $\varepsilon_{\mathcal{P}_{i,h}}$ and $\gamma_{\mathcal{P}_{i,h}}$ such that

	$\displaystyle\mathbb{E}_{h\in[N/s]}\bigg{\lVert}\mathbb{E}_{n\in[N/s]}\sum_{i=1}^{\ell}\mathbbm{1}_{n\in\mathcal{P}_{i}}\widetilde{f}_{1}(n)\otimes\widetilde{f}_{2}(n+h)\cdot\psi(n)\cdot\psi_{h}(n)$
	$\displaystyle\qquad\qquad\otimes(F^{\ast}(\varepsilon_{\mathcal{P}_{i,h}}\gamma_{\mathcal{P}_{i,h}}(\gamma_{\mathcal{P}_{i,h}}^{-1}g^{\mathrm{Output}}(h,n,\ldots,n)\gamma_{\mathcal{P}_{i,h}})(\Gamma\times\Gamma)))^{\otimes(s-1)s^{s-1}}\bigg{\rVert}_{\infty}\geq 1/M(\delta)$

where $d_{G\times G}(\varepsilon_{\mathcal{P}_{i,h}},\mathrm{id}_{G\times G})+d_{G\times G}(\gamma_{\mathcal{P}_{i,h}},\mathrm{id}_{G\times G})\leq\exp(\log(1/\delta)^{O_{s}(1)})$ and $\gamma_{\mathcal{P}_{i,h}}$ is $\exp(\log(1/\delta)^{O_{s}(1)})$ -rational.

By Pigeonhole, there exists an index $i$ such that

	$\displaystyle\mathbb{E}_{h\in[N/s]}\bigg{\lVert}\mathbb{E}_{n\in[N/s]}\mathbbm{1}_{n\in\mathcal{P}_{i}}\cdot\widetilde{f}_{1}(n)\otimes\widetilde{f}_{2}(n+h)\cdot\psi(n)\cdot\psi_{h}(n)$
	$\displaystyle\qquad\qquad\otimes(F^{\ast}(\varepsilon_{\mathcal{P}_{i,h}}\gamma_{\mathcal{P}_{i,h}}(\gamma_{\mathcal{P}_{i,h}}^{-1}g^{\mathrm{Output}}(h,n,\ldots,n)\cdot\gamma_{\mathcal{P}_{i,h}})(\Gamma\times\Gamma)))^{\otimes(s-1)s^{s-1}}\bigg{\rVert}_{\infty}\geq 1/M(\delta).$

As $\gamma_{\mathcal{P}_{i,h}}$ is $\exp(\log(1/\delta)^{O_{s}(1)})$ -rational and bounded, it takes on only $\exp(\log(1/\delta)^{O_{s}(1)})$ possible values. Thus by Pigeonhole, there is $\gamma\in\Gamma\times\Gamma$ such that $d_{G\times G}(\gamma,\mathrm{id}_{G\times G})\leq\exp(\log(1/\delta)^{O_{s}(1)})$ and $\gamma$ is $\exp(\log(1/\delta)^{O_{s}(1)})$ -rational such that

	$\displaystyle\mathbb{E}_{h\in[N/s]}\bigg{\lVert}\mathbb{E}_{n\in[N/s]}\mathbbm{1}_{n\in\mathcal{P}_{i}}\widetilde{f}_{1}(n)\otimes\widetilde{f}_{2}(n+h)\cdot\psi(n)\cdot\psi_{h}(n)$
	$\displaystyle\qquad\qquad\otimes(F^{\ast}(\varepsilon_{\mathcal{P}_{i,h}}\gamma g^{\mathrm{Conj}}(h,n,\ldots,n)(\Gamma\times\Gamma)))^{\otimes(s-1)s^{s-1}}\bigg{\rVert}_{\infty}\geq 1/M(\delta),$

where $g^{\mathrm{Conj}}=\gamma^{-1}g^{\mathrm{Output}}\gamma$ . Finally, rounding $\varepsilon_{\mathcal{P}_{i,h}}\gamma$ to a $\exp(-\log(1/\delta)^{O_{s}(1)})$ -net and noting it is $\exp(\log(1/\delta)^{O_{s}(1)})$ -bounded, there exists $\varepsilon$ such that

	$\displaystyle\mathbb{E}_{h\in[N/s]}\bigg{\lVert}\mathbb{E}_{n\in[N/s]}\mathbbm{1}_{n\in\mathcal{P}_{i}}\cdot\widetilde{f}_{1}(n)\otimes\widetilde{f}_{2}(n+h)\cdot\psi(n)\cdot\psi_{h}(n)$
	$\displaystyle\qquad\qquad\otimes(F^{\ast}(\varepsilon g^{\mathrm{Conj}}(h,n,\ldots,n)(\Gamma\times\Gamma)))^{\otimes(s-1)s^{s-1}}\bigg{\rVert}_{\infty}\geq 1/M(\delta)$

and $d_{G\times G}(\varepsilon,\mathrm{id}_{G\times G})\leq\exp(\log(1/\delta)^{O_{s}(1)})$ , as long as $\rho$ was chosen small enough.

By Lemma 7.1, there exists $\Theta_{h}$ such that

	$\displaystyle\mathbb{E}_{h\in[N/s]}\bigg{\lVert}\mathbb{E}_{n\in[N/s]}e(\Theta_{h}n)\cdot\widetilde{f}_{1}(n)\otimes\widetilde{f}_{2}(n+h)\cdot\psi(n)\cdot\psi_{h}(n)$
	$\displaystyle\qquad\qquad\otimes(F^{\ast}(\varepsilon g^{\mathrm{Conj}}(h,n,\ldots,n)(\Gamma\times\Gamma)))^{\otimes(s-1)s^{s-1}}\bigg{\rVert}_{\infty}\geq 1/M(\delta).$

As $(s-2)\geq 1$ , we may absorb $e(\Theta_{h}n)$ into $\psi_{h}(n)$ and obtain

	$\displaystyle\mathbb{E}_{h\in[N/s]}\bigg{\lVert}\mathbb{E}_{n\in[N/s]}\widetilde{f}_{1}(n)\otimes\widetilde{f}_{2}(n+h)\cdot\psi(n)\cdot\psi_{h}(n)$
	$\displaystyle\qquad\qquad\otimes(F^{\ast}(\varepsilon g^{\mathrm{Conj}}(h,n,\ldots,n)(\Gamma\times\Gamma)))^{\otimes(s-1)s^{s-1}}\bigg{\rVert}_{\infty}\geq 1/M(\delta).$

Replacing $F^{\ast}$ with $F^{\mathrm{Final}}(g)=F^{\ast}(\varepsilon g\Gamma)$ and writing $g^{\mathrm{Final}}(h,n)=g^{\mathrm{Conj}}(h,n,\ldots,n)$ , we have

	$\displaystyle\mathbb{E}_{h\in[N/s]}\bigg{\lVert}\mathbb{E}_{n\in[N/s]}\widetilde{f}_{1}(n)\otimes\widetilde{f}_{2}(n+h)\cdot\psi(n)\cdot\psi_{h}(n)$
	$\displaystyle\qquad\qquad\otimes(F^{\mathrm{Final}}(g^{\mathrm{Final}}(h,n)(\Gamma\times\Gamma)))^{\otimes(s-1)s^{s-1}}\bigg{\rVert}_{\infty}\geq 1/M(\delta).$

Now $g^{\mathrm{Final}}(h,n)$ takes values in $\gamma^{-1}H\gamma$ such that $\xi^{\prime}(\gamma^{-1}H\gamma\cap(G\times G)_{s})=0$ . The key point is to note that $F^{\mathrm{Final}}$ is right-invariant under $(\gamma^{-1}H\gamma)\cap(G\times G)_{s}$ since it has $(G\times G)_{s}$ -vertical frequency $\xi^{\prime}$ . Note that $\gamma^{-1}H\gamma$ has complexity bounded by $M(\delta)$ due to [42, Lemma B.15]. Furthermore $F^{\mathrm{Final}}$ is $M(\delta)$ -Lipschitz on $\gamma^{-1}H\gamma$ by [42, Lemma B.9, B.15]. Taking the quotient by $(\gamma^{-1}H\gamma)\cap(G\times G)_{s}$ gives that each coordinate of $(F^{\mathrm{Final}}(g^{\mathrm{Final}}(h,n)(\Gamma\times\Gamma)))^{\otimes s^{s-1}}$ may be realized as a complexity $(M(\delta),d(\delta))$ nilsequence of degree $(s-1)$ .

We apply Pigeonhole in the coordinates of $(F^{\mathrm{Final}})^{\otimes s^{s-1}}$ and then Lemma C.6 to approximate each coordinate as a sum of products of multidegree $(s-1,s-2)$ and $(0,s-1)$ nilsequences in variables $(h,n)$ . Again folding everything into $\psi(n)$ of degree $(s-1)$ and $\psi_{h}(n)$ of degree $(s-2)$ , we find

\displaystyle\mathbb{E}_{h\in[N/s]}\lVert\mathbb{E}_{n\in[N/s]}\widetilde{f}_{1}(n)\otimes\widetilde{f}_{2}(n+h)\cdot\psi(n)\cdot\psi_{h}(n)\rVert_{\infty}\geq 1/M(\delta)

The functions $\widetilde{f}_{1}$ and $\widetilde{f}_{2}$ are vector-valued, but by Pigeonhole there exist coordinates $j_{1},j_{2}$ of the vectors $\widetilde{f}_{1}$ and $\widetilde{f}_{2}$ such that

\displaystyle\mathbb{E}_{h\in[N/s]}|\mathbb{E}_{n\in[N/s]}\widetilde{f}_{1,j_{1}}(n)\widetilde{f}_{2,j_{2}}(n+h)\cdot\psi(n)\cdot\psi_{h}(n)|\geq 1/M(\delta).

Since $\psi_{h}(n)$ is a nilsequence of degree $(s-2)$ and complexity $(M(\delta),d(\delta))$ , by the converse of the inverse theorem (see Lemma B.5) we have that

\displaystyle\mathbb{E}_{h\in[N/s]}\lVert\widetilde{f}_{1,j_{1}}(\cdot)\widetilde{f}_{2,j_{2}}(\cdot+h)\psi(\cdot)\rVert_{U^{s-1}[N/s]}^{2^{s-1}}\geq 1/M(\delta).

By the Gowers–Cauchy–Schwarz inequality (e.g. [17, Lemma 3.8]), we have that

\displaystyle\lVert\widetilde{f}_{1,j_{1}}(\cdot)\psi(\cdot)\rVert_{U^{s}[N/s]}\geq 1/M(\delta).

By induction, there is a nilsequence $\Theta(n)$ of degree $(s-1)$ and complexity $(M(\delta),d(\delta))$ such that

\displaystyle|\mathbb{E}_{n\in[N/s]}\widetilde{f}_{1,j_{1}}(n)\psi(n)\Theta(n)|\geq 1/M(\delta).

Now recall that

\widetilde{f}_{1}(n)=f(sn+q)\cdot\overline{\chi(n,\ldots,n)}^{\otimes s^{s-1}}.

Each coordinate of $\overline{\chi(n,\ldots,n)}^{\otimes s^{s-1}}$ is a degree $s$ nilsequence of complexity $(M(\delta),d(\delta))$ ; say the $j_{1}$ -th coordinate is $\Theta^{\prime}(n)$ , and thus we have

\displaystyle|\mathbb{E}_{n\in[N/s]}f(sn+q)\Theta^{\prime}(n)\psi(n)\Theta(n)|\geq 1/M(\delta).

This is equivalent to

\displaystyle|\mathbb{E}_{n\in[N]}\mathbbm{1}[n\equiv q~{}\mathrm{mod}~{}s]f(n)\Theta^{\prime}((n-q)/s)\psi((n-q)/s)\Theta((n-q)/s)|\geq 1/M(\delta).

Note the identity

\mathbbm{1}[n\equiv q~{}\mathrm{mod}~{}s]=s^{-1}\sum_{j=0}^{s-1}e\bigg{(}\frac{j\cdot(n-q)}{s}\bigg{)}

and thus there exists $j$ such that

\displaystyle|\mathbb{E}_{n\in[N]}f(n)\Theta^{\prime}((n-q)/s)\psi((n-q)/s)\Theta((n-q)/s)e(jn/s)|\geq 1/M(\delta).

The desired nilsequence is then

\overline{\Theta^{\prime}((n-q)/s)\psi((n-q)/s)\Theta((n-q)/s)e(jn/s)}

which is seen to have degree $s$ and complexity $(M(\delta),d(\delta))$ . We have finally won. ∎
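The identity expressing $\mathbbm{1}[n\equiv q~\mathrm{mod}~s]$ as an average of the characters $e(j(n-q)/s)$ , used in the last step of the proof, can be checked numerically. The following Python sketch is only a sanity check; the function names are ours and do not appear elsewhere in the paper.

```python
import cmath

def e(x):
    """The additive character e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)

def indicator_via_characters(n, q, s):
    """Evaluate s^{-1} * sum_{j=0}^{s-1} e(j*(n-q)/s)."""
    return sum(e(j * (n - q) / s) for j in range(s)) / s

def max_deviation(s, q, n_range):
    """Largest gap between the character average and the indicator 1[n = q mod s]."""
    return max(
        abs(indicator_via_characters(n, q, s) - (1 if (n - q) % s == 0 else 0))
        for n in n_range
    )
```

Up to floating-point error, `max_deviation` should be zero for any modulus and residue, which is why a single frequency $j$ can be extracted by Pigeonhole at the cost of a factor of $s$ .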

Appendix A On approximate homomorphisms

In this section, we give a number of basic results regarding approximate homomorphisms. The results in this section are, by now, well known consequences of work of Sanders [52]. The proof we give is essentially that in [43], modulo being forced to deal with slight error terms and operating over $\mathbb{Z}$ . We dispose of these error terms via a rounding trick of Green, Tao, and Ziegler [32, Appendix C].

Lemma A.1.

Fix $\delta\in(0,1/2)$ , let $H_{1},H_{2},H_{3},H_{4}\subseteq[N]$ and let functions $f_{i}\colon H_{i}\to\mathbb{R}^{d}$ be such that there are at least $\delta N^{3}$ additive tuples $h_{1}+h_{2}=h_{3}+h_{4}$ with

\lVert(f_{1}(h_{1})+f_{2}(h_{2})-f_{3}(h_{3})-f_{4}(h_{4}))_{j}\rVert_{\mathbb{R}/\mathbb{Z}}\leq\varepsilon_{j}

for all $1\leq j\leq d$ . Then there exists $H_{1}^{\prime}\subseteq H_{1}$ with $|H_{1}^{\prime}|\geq\exp(-(d\log(1/\delta))^{O(1)})N$ such that

\bigg{\lVert}\bigg{(}f_{1}(h)-\sum_{i=1}^{d^{\prime}}a_{i}\{\alpha_{i}h\}-b\bigg{)}_{j}\bigg{\rVert}_{\mathbb{R}/\mathbb{Z}}\leq\varepsilon_{j}

for all $h\in H_{1}^{\prime}$ , for appropriate choices of $d^{\prime}\leq(d\log(1/\delta))^{O(1)}$ , $a_{i},b\in\mathbb{R}^{d}$ , and $\alpha_{i}\in(1/N^{\prime})\mathbb{Z}$ where $N^{\prime}$ is a prime between $100N$ and $200N$ .

We deduce the result from the following variant which is the same statement modulo not having an error term.

Lemma A.2.

Fix $\delta\in(0,1/2)$ . Let $H_{1},H_{2},H_{3},H_{4}\subseteq[N]$ and $f_{i}\colon H_{i}\to\mathbb{R}^{d}$ be such that there are at least $\delta N^{3}$ additive tuples $h_{1}+h_{2}=h_{3}+h_{4}$ with

f_{1}(h_{1})+f_{2}(h_{2})-f_{3}(h_{3})-f_{4}(h_{4})\in\mathbb{Z}^{d}.

Then there exists $H_{1}^{\prime}\subseteq H_{1}$ with $|H_{1}^{\prime}|\geq\exp(-\log(1/\delta)^{O(1)})N$ such that

f_{1}(h)-\sum_{i=1}^{d^{\prime}}a_{i}\{\alpha_{i}h\}-b\in\mathbb{Z}^{d}

for all $h\in H_{1}^{\prime}$ , for appropriate choices of $d^{\prime}\leq\log(1/\delta)^{O(1)}$ , $a_{i},b\in\mathbb{R}^{d}$ , and $\alpha_{i}\in(1/N^{\prime})\mathbb{Z}$ where $N^{\prime}$ is a prime between $100N$ and $200N$ .

We briefly give the deduction, and then in the sequel focus on Lemma A.2.

Proof of Lemma A.1 given Lemma A.2.

Round each value of $f_{i}$ to the nearest point in the lattice $(\varepsilon_{1}\mathbb{Z},\ldots,\varepsilon_{d}\mathbb{Z})$ to form $\widetilde{f_{i}}$ (breaking ties arbitrarily). We have that

\lVert(\widetilde{f_{1}}(h_{1})+\widetilde{f_{2}}(h_{2})-\widetilde{f_{3}}(h_{3})-\widetilde{f_{4}}(h_{4}))_{j}\rVert_{\mathbb{R}/\mathbb{Z}}\leq 5\varepsilon_{j}

for at least $\delta N^{3}$ additive tuples.

Note however that

\widetilde{f_{1}}(h_{1})+\widetilde{f_{2}}(h_{2})-\widetilde{f_{3}}(h_{3})-\widetilde{f_{4}}(h_{4})\in(\varepsilon_{1}\mathbb{Z},\ldots,\varepsilon_{d}\mathbb{Z})

and that there are at most $11^{d}$ lattice points in $(\varepsilon_{1}\mathbb{Z},\ldots,\varepsilon_{d}\mathbb{Z})$ which are at most $5\varepsilon_{j}$ in the $j$ -th direction from the origin in all $d$ directions. Thus there is a vector $w\in(\varepsilon_{1}\mathbb{Z},\ldots,\varepsilon_{d}\mathbb{Z})$ such that

\widetilde{f_{1}}(h_{1})+\widetilde{f_{2}}(h_{2})-\widetilde{f_{3}}(h_{3})-\widetilde{f_{4}}(h_{4})+w\in\mathbb{Z}^{d}

for at least $11^{-d}\delta N^{3}$ additive tuples. Applying Lemma A.2 with $\widetilde{f_{1}}$ , $\widetilde{f_{2}}$ , $\widetilde{f_{3}}$ , and $\widetilde{f_{4}}-w$ immediately gives the desired result. ∎
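The rounding step in the deduction above can be illustrated in a single coordinate. The Python sketch below (an illustration with hypothetical helper names, not part of the argument) samples real values whose alternating sum is within $\varepsilon$ of an integer, rounds each to $\varepsilon\mathbb{Z}$ , and records the worst resulting error. By the triangle inequality the error is at most $\varepsilon+4\cdot(\varepsilon/2)=3\varepsilon$ , which is within the stated bound of $5\varepsilon$ .

```python
import random

def dist_to_Z(x):
    """Distance from x to the nearest integer (the R/Z norm)."""
    return abs(x - round(x))

def round_to_lattice(x, eps):
    """A nearest point of eps*Z to x (ties broken by Python's round)."""
    return eps * round(x / eps)

def worst_rounded_error(eps, trials=2000, seed=0):
    """Worst R/Z norm of f1 + f2 - f3 - f4 after rounding each value to eps*Z."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        f1, f2, f3 = (rng.uniform(-10, 10) for _ in range(3))
        # Choose f4 so that f1 + f2 - f3 - f4 lies within eps of an integer.
        f4 = f1 + f2 - f3 - rng.randint(-3, 3) - rng.uniform(-eps, eps)
        r = [round_to_lattice(v, eps) for v in (f1, f2, f3, f4)]
        worst = max(worst, dist_to_Z(r[0] + r[1] - r[2] - r[3]))
    return worst
```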

We now require the notion of a Bohr set in an abelian group.

Definition A.3.

Given an abelian group $G$ and a set $S\subseteq\widehat{G}$ , we define the Bohr set of radius $\rho$ to be

B(S,\rho):=\{x\in G\colon\lVert s\cdot x\rVert_{\mathbb{R}/\mathbb{Z}}\leq\rho\text{ for all }s\in S\}.
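For concreteness, here is a minimal Python sketch of this definition in the case $G=\mathbb{Z}/N\mathbb{Z}$ . We assume, as an illustrative convention, that a character is identified with a frequency $s\in\{0,\ldots,N-1\}$ acting by $s\cdot x=sx/N$ ; the function names are ours.

```python
def rz_norm(x):
    """Distance from the real number x to the nearest integer."""
    return abs(x - round(x))

def bohr_set(N, S, rho):
    """Elements x of Z/NZ with ||s*x/N||_{R/Z} <= rho for every frequency s in S."""
    return [x for x in range(N) if all(rz_norm(s * x / N) <= rho for s in S)]
```

For example, with $N=12$ and $\rho=0.2$ , the frequency set $\{1\}$ gives the interval $\{0,\pm 1,\pm 2\}$ modulo $12$ , and adding the frequency $5$ cuts this down to $\{0,\pm 2\}$ .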

We first require the fact that the four-fold sumset of a set with small doubling contains a Bohr set of small dimension and large radius. This is an immediate consequence of work of Sanders [52, Theorem 1.1] which produces a large symmetric coset progression and a proposition of Milićević [48, Proposition 27] which produces a Bohr set inside a large symmetric coset progression. This is explicitly [48, Corollary 28].

Lemma A.4.

Let $A\subseteq\mathbb{Z}/N\mathbb{Z}$ be such that $|A|\geq N/K$ . Then there exists $S\subseteq\widehat{\mathbb{Z}/N\mathbb{Z}}$ with $|S|\leq\log(2K)^{O(1)}$ and $1/\rho\leq\log(2K)^{O(1)}$ such that $B(S,\rho)\subseteq 2A-2A$ .

We next require the notion of a Freiman homomorphism.

Definition A.5.

A function $f\colon A\to B$ (with $A$ and $B$ being subsets of possibly different abelian groups) is a $k$ -Freiman homomorphism if for all $a_{i},a_{i}^{\prime}\in A$ satisfying

a_{1}+\cdots+a_{k}=a_{1}^{\prime}+\cdots+a_{k}^{\prime}

we have

f(a_{1})+\cdots+f(a_{k})=f(a_{1}^{\prime})+\cdots+f(a_{k}^{\prime}).

When $k$ is not specified, we will implicitly have $k=2$ .
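The definition can be tested by brute force on small examples. The Python sketch below assumes $A$ is a finite set of integers and that $f$ takes integer values; the name `is_freiman_hom` is ours. Since both sides of the defining relation are symmetric in their arguments, it suffices to range over unordered $k$ -tuples.

```python
from itertools import combinations_with_replacement

def is_freiman_hom(f, A, k=2):
    """Check that a_1+...+a_k determines f(a_1)+...+f(a_k) for all a_i in A."""
    image_sum = {}
    for tup in combinations_with_replacement(sorted(set(A)), k):
        s = sum(tup)
        t = sum(f(a) for a in tup)
        # setdefault stores t on the first visit to s and returns the stored value.
        if image_sum.setdefault(s, t) != t:
            return False
    return True
```

For instance, $a\mapsto a^{2}$ is a $2$ -Freiman homomorphism on $\{0,1,3\}$ (all pairwise sums are distinct) but not a $3$ -Freiman homomorphism, since $0+0+3=1+1+1$ while $0+0+9\neq 1+1+1$ .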

We will also require the following basic lemma which converts the Freiman homomorphism on a Bohr set into a “bracket” linear function on a slightly smaller Bohr set; the proof is a simplification of [23, Proposition 10.8].

Lemma A.6.

Consider $S\subseteq\widehat{\mathbb{Z}/N\mathbb{Z}}$ and $\rho\in(0,1/4)$ with Freiman homomorphism $f\colon B(S,\rho)\to\mathbb{R}/\mathbb{Z}$ . Taking $\rho^{\prime}=\rho\cdot|S|^{-2|S|}$ , we have for all $n\in B(S,\rho^{\prime})$ that

f(n)-\Big{(}\sum_{\alpha_{i}\in S}a_{i}\{\alpha_{i}n\}+\gamma\Big{)}\in\mathbb{Z},

for appropriate choices of $a_{i},\gamma\in\mathbb{R}$ .

Proof.

By [23, Proposition 10.5], we have that

B(S,\rho\cdot|S|^{-2|S|})\subseteq P\subseteq B(S,\rho)

where $P$ is a proper generalized arithmetic progression $\{\sum_{i=1}^{d}\ell_{i}n_{i}\colon n_{i}\in[\pm N_{i}]\}$ of rank $d\leq|S|$ . Furthermore $(\{\alpha\cdot\ell_{i}\})_{\alpha\in S}$ for $1\leq i\leq d$ are linearly independent as vectors in $\mathbb{R}^{S}$ .

Note that for $|n_{i}|\leq N_{i}$ , we have

(A.1)

f\bigg{(}\sum_{i=1}^{d}\ell_{i}n_{i}\bigg{)}-f(0)=\sum_{i=1}^{d}n_{i}(f(\ell_{i})-f(0)).

Furthermore letting $\Phi\colon B(S,\rho)\to\mathbb{R}^{S}$ denote $\Phi(x)=(\{\alpha\cdot x\})_{\alpha\in S}$ we have that

\Phi(x)+\Phi(y)=\Phi(x+y);

we have used crucially that $\rho<1/4$ here. Therefore, by a simple inductive argument we see

\Phi\bigg{(}\sum_{i=1}^{d}\ell_{i}n_{i}\bigg{)}=\sum_{i=1}^{d}n_{i}\Phi(\ell_{i})

if $n_{i}\in[\pm N_{i}]$ for all $1\leq i\leq d$ .

By the above linear independence, there exists $u_{i}\in\mathbb{R}^{S}$ such that $u_{i}\cdot\Phi(\ell_{i})=1$ and $u_{i}\cdot\Phi(\ell_{j})=0$ for $j\neq i$ . Therefore if $n\in P$ is such that $n=\sum_{i=1}^{d}\ell_{i}n_{i}$ , we have that

n_{i}=u_{i}\cdot\sum_{j=1}^{d}n_{j}\Phi(\ell_{j})=u_{i}\cdot\Phi(n)=\sum_{\alpha\in S}(u_{i})_{\alpha}\cdot\{\alpha n\}.

The lemma then follows by plugging into (A.1). ∎
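The additivity $\Phi(x)+\Phi(y)=\Phi(x+y)$ on $B(S,\rho)$ , and the role of the hypothesis $\rho<1/4$ , can be seen in a small exact computation. The following Python sketch uses rational arithmetic and takes the fractional part in $(-1/2,1/2]$ ; as an assumption for illustration, frequencies are integers $\alpha$ acting on $\mathbb{Z}/N\mathbb{Z}$ by $x\mapsto\alpha x/N$ , and all function names are ours.

```python
from fractions import Fraction
import math

def frac(x):
    """Fractional part of a Fraction x, normalized to lie in (-1/2, 1/2]."""
    return x - math.ceil(x - Fraction(1, 2))

def phi(x, S, N):
    """The map Phi(x) = ({alpha * x / N})_{alpha in S}."""
    return tuple(frac(Fraction(a * x, N)) for a in S)

def bohr_set(N, S, rho):
    """Elements of Z/NZ all of whose coordinates under Phi have size at most rho."""
    return [x for x in range(N) if all(abs(c) <= rho for c in phi(x, S, N))]

def phi_is_additive(N, S, rho):
    """Check Phi(x) + Phi(y) = Phi(x + y) for all x, y in B(S, rho)."""
    B = bohr_set(N, S, rho)
    return all(
        phi((x + y) % N, S, N)
        == tuple(a + b for a, b in zip(phi(x, S, N), phi(y, S, N)))
        for x in B
        for y in B
    )
```

With $N=101$ and $S=\{1,7\}$ , additivity holds for $\rho=1/5$ but fails for $\rho=2/5$ : taking $x=y=40$ , the first coordinate wraps around since $\{40/101\}+\{40/101\}>1/2$ .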

We now recall the definition of additive energy.

Definition A.7.

Given (finite) subsets $A_{1},A_{2},A_{3},A_{4}$ of an abelian group $G$ , define the additive energy $E(A_{1},A_{2},A_{3},A_{4})$ to be

E(A_{1},A_{2},A_{3},A_{4})=\sum_{x_{i}\in A_{i}}\mathbbm{1}[x_{1}+x_{2}=x_{3}+x_{4}]

and let $E(A)=E(A,A,A,A)$ .

Note that one has the trivial bound $E(A)\leq|A|^{3}$ . Furthermore via a standard Cauchy–Schwarz argument (similar to e.g. [58, Corollary 2.10]) we have

E(A_{1},A_{2},A_{3},A_{4})\leq\prod_{i=1}^{4}E(A_{i})^{1/4}.
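Both bounds are easy to test on small sets of integers. The Python sketch below (function names ours) counts additive quadruples by tabulating the sums $x_{1}+x_{2}$ and $x_{3}+x_{4}$ and matching them.

```python
from collections import Counter

def additive_energy(A1, A2, A3, A4):
    """Number of (x1, x2, x3, x4) in A1 x A2 x A3 x A4 with x1 + x2 = x3 + x4."""
    left = Counter(x1 + x2 for x1 in A1 for x2 in A2)
    right = Counter(x3 + x4 for x3 in A3 for x4 in A4)
    # A Counter returns 0 for sums that never occur on the right-hand side.
    return sum(count * right[total] for total, count in left.items())

def energy(A):
    """The additive energy E(A) = E(A, A, A, A)."""
    return additive_energy(A, A, A, A)
```

For $A=\{0,1,2\}$ one finds $E(A)=19\leq 27=|A|^{3}$ , and the fourth power of a mixed energy can be compared directly against the product of the individual energies.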

Proof of Lemma A.2.

Let $\Gamma_{i}=\{(h_{i},f_{i}(h_{i})~{}\mathrm{mod}~{}\mathbb{Z}^{d})\colon h_{i}\in H_{i}\}\subseteq\mathbb{Z}\times(\mathbb{R}/\mathbb{Z})^{d}$ , which is a graph (i.e., for every $x\in\mathbb{Z}$ there is at most one $y\in(\mathbb{R}/\mathbb{Z})^{d}$ with $(x,y)\in\Gamma_{i}$ ). By assumption we have

E(\Gamma_{1},\Gamma_{2},\Gamma_{3},\Gamma_{4})\geq\delta N^{3}.

We have

E(\Gamma_{1},\Gamma_{2},\Gamma_{3},\Gamma_{4})\leq\prod_{i=1}^{4}E(\Gamma_{i})^{1/4}\leq E(\Gamma_{1})^{1/4}N^{9/4}

and therefore $E(\Gamma_{1})\geq\delta^{4}N^{3}$ . By Balog–Szemerédi–Gowers (see [23, Theorem 5.2]), there is $\Gamma^{\prime}\subseteq\Gamma_{1}$ such that $|\Gamma^{\prime}|\geq\delta^{O(1)}N$ while $|\Gamma^{\prime}-\Gamma^{\prime}|\leq\delta^{-O(1)}N$ .

Let $A=(8\Gamma^{\prime}-8\Gamma^{\prime})\cap(\{0\}\times(\mathbb{R}/\mathbb{Z})^{d})$ . Since $\Gamma^{\prime}$ is a graph, we have that $|\Gamma^{\prime}+A|=|\Gamma^{\prime}||A|$ . However $|\Gamma^{\prime}+A|\leq|9\Gamma^{\prime}-8\Gamma^{\prime}|\leq\delta^{-O(1)}N$ by the Plünnecke–Ruzsa inequality (e.g. [23, Theorem 5.3]) and thus $|A|\leq\delta^{-O(1)}$ .

Now, by abuse of notation we may view $A$ as a subset of $(\mathbb{R}/\mathbb{Z})^{d}$ . We claim there exists $T\subseteq\mathbb{Z}^{d}$ with $|T|\leq O(\log(1/\delta))$ such that $A\cap B(T,1/4)=\{0\}$ ; we give a proof which is essentially identical to that in [23, Lemma 8.3]. Note that given any $w\in(\mathbb{R}/\mathbb{Z})^{d}\setminus\{0\}$ we have

\limsup_{M\to\infty}\mathbb{P}_{v\in\{-M,\ldots,M\}^{d}}[\lVert v\cdot w\rVert_{\mathbb{R}/\mathbb{Z}}<1/4]\leq 3/4.

This follows immediately from noting that if $w$ has an irrational coordinate the probability tends to $1/2$ by Weyl’s equidistribution criterion while if $w$ is rational the limiting probability is at most say $2/3$ . Choosing an integer vector $v$ which kills at least $1/4$ of the set iteratively then immediately gives the desired lemma.

Let $\psi\colon(\mathbb{R}/\mathbb{Z})^{d}\to(\mathbb{R}/\mathbb{Z})^{T}$ be defined as $\psi(\xi)=(t(\xi))_{t\in T}$ . Now let $\tau=2^{-7}$ . By averaging there exists a cube $Q=\vec{x}+[0,\tau)^{T}$ such that

\widetilde{\Gamma}:=\{(h,f_{1}(h))\in\Gamma^{\prime}\colon\psi(f_{1}(h))\in Q\}

with $|\widetilde{\Gamma}|\geq\tau^{|T|}|\Gamma^{\prime}|$ , so $|\widetilde{\Gamma}|\geq\delta^{O(1)}N$ . Fix such a cube $Q$ .

We claim that $4\widetilde{\Gamma}-4\widetilde{\Gamma}$ is a graph. For the sake of contradiction suppose not. Then there exist $h_{1},\ldots,h_{8}$ and $h_{1}^{\prime},\ldots,h_{8}^{\prime}$ such that

	$\displaystyle h_{1}+\cdots+h_{4}-h_{5}-\cdots-h_{8}$	$\displaystyle=h_{1}^{\prime}+\cdots+h_{4}^{\prime}-h_{5}^{\prime}-\cdots-h_{8}^{\prime},$
	$\displaystyle f_{1}(h_{1})+\cdots+f_{1}(h_{4})-f_{1}(h_{5})-\cdots-f_{1}(h_{8})$	$\displaystyle\not\equiv f_{1}(h_{1}^{\prime})+\cdots+f_{1}(h_{4}^{\prime})-f_{1}(h_{5}^{\prime})-\cdots-f_{1}(h_{8}^{\prime})~{}\mathrm{mod}~{}1.$

However,

	$\displaystyle\bigg{\lVert}\psi\big{(}\big{(}f_{1}(h_{1})+\cdots+f_{1}(h_{4})-f_{1}(h_{5})-\cdots-f_{1}(h_{8})\big{)}-\big{(}f_{1}(h_{1}^{\prime})+\cdots+f_{1}(h_{4}^{\prime})-f_{1}(h_{5}^{\prime})-\cdots-f_{1}(h_{8}^{\prime})\big{)}\big{)}\bigg{\rVert}_{\infty}$
	$\displaystyle\qquad\qquad\qquad\qquad\leq 16\cdot\tau<1/4$

by definition of $\widetilde{\Gamma}$ . Since $A\cap B(T,1/4)=\{0\}$ , it follows that

\big{(}f_{1}(h_{1})+\cdots+f_{1}(h_{4})-f_{1}(h_{5})-\cdots-f_{1}(h_{8})\big{)}-\big{(}f_{1}(h_{1}^{\prime})+\cdots+f_{1}(h_{4}^{\prime})-f_{1}(h_{5}^{\prime})-\cdots-f_{1}(h_{8}^{\prime})\big{)}\in\mathbb{Z}^{d}

contradicting the assumption that $4\widetilde{\Gamma}-4\widetilde{\Gamma}$ is not a graph.

Let $H^{\ast}$ denote the projection of $\widetilde{\Gamma}$ onto the first coordinate. Since $f_{1}$ is an $8$ -Freiman homomorphism on $H^{\ast}$ (because $4\widetilde{\Gamma}-4\widetilde{\Gamma}$ is a graph), we have that $f_{1}$ is a Freiman homomorphism on $2H^{\ast}-2H^{\ast}$ (where $f_{1}$ is extended via linearity). We now view $H^{\ast}$ (which is a subset of integers) as a subset of $\mathbb{Z}/N^{\prime}\mathbb{Z}$ where $N^{\prime}$ is a prime in $[100N,200N]$ . Note here that $H^{\ast}\subseteq[-4N,4N]$ and thus $4\widetilde{\Gamma}-4\widetilde{\Gamma}$ when viewed as a subset of $(\mathbb{Z}/N^{\prime}\mathbb{Z})\times(\mathbb{R}/\mathbb{Z})^{d}$ is still a graph. Note that $|H^{\ast}|\geq\delta^{O(1)}N$ .

By Lemma A.4, we have that $2H^{\ast}-2H^{\ast}$ contains a Bohr set $B(S,\rho)$ with $|S|,\rho^{-1}\leq(\log(1/\delta))^{O(1)}$ . Then by applying Lemma A.6 to each coordinate of $f_{1}$ on $B(S,\rho^{\prime})\subseteq 2H^{\ast}-2H^{\ast}$ with $\rho^{\prime-1}\leq\exp(\log(1/\delta)^{O(1)})$ , we have that

(A.2)

f_{1}(h_{1})=\sum_{\alpha_{i}\in S}a_{i}\{\alpha_{i}h_{1}\}+\gamma~{}\mathrm{mod}~{}1

for all $h_{1}\in B(S,\rho^{\prime})$ , for appropriate choices of $a_{i},\gamma\in\mathbb{R}^{d}$ . Here $\alpha_{i}\in(1/N^{\prime})\mathbb{Z}$ .

We now undo this transformation and we abusively view $B(S,\rho^{\prime})\subseteq 2H^{\ast}-2H^{\ast}$ as a subset of integers in $[-4N,4N]$ instead of $\mathbb{Z}/N^{\prime}\mathbb{Z}$ , noting that the fractional part remains identical in both cases. As a slight technical annoyance, $B(S,\rho^{\prime})$ might not intersect $H^{\ast}$ . But, by Pigeonhole there exists $x^{\ast}\in[-5N,5N]$ such that $|(x^{\ast}+B(S,\rho^{\prime}/2))\cap H^{\ast}|\geq\exp(-(\log(1/\delta))^{O(1)})N$ . (This requires a lower bound on the size of a Bohr set, see [58, Lemma 4.20].)

Fix $h^{\ast}\in B(S,\rho^{\prime}/2)$ such that $x^{\ast}+h^{\ast}\in H^{\ast}$ and consider any $h_{1}\in B(S,\rho^{\prime}/2)$ such that $h_{1}+x^{\ast}\in H^{\ast}$ . We have that

f_{1}(h_{1}-h^{\ast})+f_{1}(x^{\ast}+h^{\ast})=f_{1}(h_{1}+x^{\ast})+f_{1}(0)~{}\mathrm{mod}~{}1

since $4\widetilde{\Gamma}-4\widetilde{\Gamma}$ is a graph (note that $h_{1}-h^{\ast}\in B(S,\rho^{\prime})\subseteq 2H^{\ast}-2H^{\ast}$ ). Thus we have

	$\displaystyle f_{1}(h_{1}+x^{\ast})$	$\displaystyle=f_{1}(h_{1}-h^{\ast})+f_{1}(x^{\ast}+h^{\ast})-f_{1}(0)~{}\mathrm{mod}~{}1$
		$\displaystyle=\sum_{\alpha_{i}\in S}a_{i}\{\alpha_{i}((h_{1}+x^{\ast})-(x^{\ast}+h^{\ast}))\}+\gamma^{\prime}~{}\mathrm{mod}~{}1.$

The second line holds since $x^{\ast},h^{\ast}$ are viewed as fixed and $h_{1}-h^{\ast}\in B(S,\rho^{\prime})$ hence we may apply (A.2).

So, letting $H^{\prime}$ be the set of values $h_{1}+x^{\ast}\in H_{1}$ where $h_{1}\in B(S,\rho^{\prime}/2)$ , this nearly gives the desired result. The only issue is that there are shifts inside the brackets. Note that

\displaystyle\{z_{1}+z_{2}\}=\begin{cases}\{z_{1}\}+\{z_{2}\}-1&\text{if }\{z_{1}\}+\{z_{2}\}>1/2,\\ \{z_{1}\}+\{z_{2}\}+1&\text{if }\{z_{1}\}+\{z_{2}\}\leq-1/2,\\ \{z_{1}\}+\{z_{2}\}&\text{otherwise.}\end{cases}

Given this, we may Pigeonhole possible values $h_{1}+x^{\ast}$ into one of $3^{|S|}$ cases based on the corresponding shift for each $\alpha_{i}\in S$ . Applying the above relation with $z_{1}=\alpha_{i}(h_{1}+x^{\ast})$ and $z_{2}=-\alpha_{i}(x^{\ast}+h^{\ast})$ and taking the most common case then gives the desired result. ∎
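The three-case relation for $\{z_{1}+z_{2}\}$ used at the end of the proof can be verified exactly on rational inputs. The Python sketch below takes the fractional part in $(-1/2,1/2]$ ; the helper names are ours and the grid of test points is an arbitrary choice.

```python
from fractions import Fraction
import math

def frac(x):
    """Fractional part of a Fraction x, normalized to lie in (-1/2, 1/2]."""
    return x - math.ceil(x - Fraction(1, 2))

def shift(z1, z2):
    """The correction c in {-1, 0, 1} with {z1 + z2} = {z1} + {z2} + c."""
    t = frac(z1) + frac(z2)
    if t > Fraction(1, 2):
        return -1
    if t <= Fraction(-1, 2):
        return 1
    return 0

def identity_holds_on_grid(denominator, span):
    """Check the relation for all z1, z2 in (1/denominator)Z with |z| <= span."""
    grid = [Fraction(k, denominator)
            for k in range(-span * denominator, span * denominator + 1)]
    return all(
        frac(z1 + z2) == frac(z1) + frac(z2) + shift(z1, z2)
        for z1 in grid
        for z2 in grid
    )
```

Since the correction takes one of three values for each $\alpha_{i}\in S$ , there are at most $3^{|S|}$ cases in total, matching the Pigeonhole step.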

Appendix B Miscellaneous deferred results

We first require the following elementary lemma which will be used in the following deduction.

Lemma B.1.

Fix an integer $H\geq 2$ . Consider vectors $v_{1},\ldots,v_{\ell}\in\mathbb{Z}^{d}$ with integer coordinates bounded by $H$ and $w\in\mathbb{R}^{d}$ such that $\operatorname{dist}(v_{i}\cdot w,\mathbb{Z})\leq\delta$ for $1\leq i\leq\ell$ . We may write $w=w_{\mathrm{small}}+w_{\mathrm{rat}}+(w-w_{\mathrm{small}}-w_{\mathrm{rat}})$ where $w_{\mathrm{rat}}$ has coordinates which are rationals with denominators bounded by $H^{O(d^{O(1)})}$ , $\lVert w_{\mathrm{small}}\rVert_{\infty}\leq\delta\cdot H^{O(d^{O(1)})}$ , and $v_{i}\cdot(w-w_{\mathrm{small}}-w_{\mathrm{rat}})=0$ for $1\leq i\leq\ell$ .

Proof.

Note that by passing to a subset we may assume that $v_{1},\ldots,v_{\ell}\in\mathbb{Z}^{d}$ are linearly independent. By Cramer’s rule, there exist $w_{1},\ldots,w_{\ell}\in\mathbb{R}^{d}$ which have coordinates which are height $H^{O(d^{O(1)})}$ rationals such that $w_{j}\cdot v_{k}=\mathbbm{1}_{j=k}$ . Taking $w_{\mathrm{rat}}=\sum_{j=1}^{\ell}(v_{j}\cdot w-\{v_{j}\cdot w\})\cdot w_{j}$ and $w_{\mathrm{small}}=\sum_{j=1}^{\ell}\{v_{j}\cdot w\}\cdot w_{j}$ we immediately have the desired result. Recall that we have chosen the fractional part $\{\cdot\}$ to live within $(-1/2,1/2]$ . ∎
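The decomposition in this proof can be carried out exactly for a small example. The Python sketch below assumes the $v_{j}$ are already linearly independent and builds the dual vectors $w_{j}$ from the inverse Gram matrix, which is one concrete way of realizing the Cramer's rule step rather than necessarily the construction intended above; all function names are ours.

```python
from fractions import Fraction
import math

def frac(x):
    """Fractional part of a Fraction x, normalized to lie in (-1/2, 1/2]."""
    return x - math.ceil(x - Fraction(1, 2))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def invert(G):
    """Gauss-Jordan inverse of a square matrix of Fractions (assumed invertible)."""
    n = len(G)
    M = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(G)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [x - M[r][c] * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def decompose(vs, w):
    """Return (w_small, w_rat, rest) with w = w_small + w_rat + rest and v . rest = 0."""
    d, ell = len(w), len(vs)
    Ginv = invert([[Fraction(dot(a, b)) for b in vs] for a in vs])
    # Dual vectors w_j in the span of the v_k with w_j . v_k = 1 if j == k else 0.
    duals = [[sum(Ginv[j][k] * vs[k][i] for k in range(ell)) for i in range(d)]
             for j in range(ell)]
    pairings = [dot(v, w) for v in vs]
    w_small = [sum(frac(t) * wj[i] for t, wj in zip(pairings, duals))
               for i in range(d)]
    w_rat = [sum((t - frac(t)) * wj[i] for t, wj in zip(pairings, duals))
             for i in range(d)]
    rest = [a - b - c for a, b, c in zip(w, w_small, w_rat)]
    return w_small, w_rat, rest
```

For $v_{1}=(1,2,0)$ , $v_{2}=(0,1,3)$ and $w=(1/3+1/1000,1/3,2/9)$ , the pairings are $1+1/1000$ and $1$ , so $w_{\mathrm{small}}$ has size at most $1/1000$ , $w_{\mathrm{rat}}$ has denominators dividing the Gram determinant $46$ , and the remainder is orthogonal to both $v_{i}$ .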

We now prove the following elementary lemma which takes a set of horizontal characters (at potentially different levels) and produces a factorization.

Lemma B.2.

Consider a nilmanifold $G/\Gamma$ of degree-rank $(s,r)$ of dimension $d$ and complexity $M$ . Consider a polynomial sequence $g$ such that $g(0)=\mathrm{id}_{G}$ and consider a set of horizontal characters $\psi_{i,j}$ for $1\leq j\leq\ell_{i}$ and where $\psi_{i,\cdot}$ is an $i$ -th horizontal character of height at most $H$ . Furthermore suppose that for all $i,j$ ,

\operatorname{dist}(\psi_{i,j}(\operatorname{Taylor}_{i}(g)),\mathbb{Z})\leq H\cdot N^{-i}.

Then one may factor

g=\varepsilon\cdot g^{\prime}\cdot\gamma

where:

•

$\varepsilon(0)=g^{\prime}(0)=\gamma(0)=\mathrm{id}_{G}$ ;
•

$\psi_{i,j}(\operatorname{Taylor}_{i}(g^{\prime}))=0$ ;
•

$\gamma$ is $(MH)^{O_{s}(d^{O_{s}(1)})}$ -rational;
•

$d_{G}(\varepsilon(n),\varepsilon(n-1))\leq(MH)^{O_{s}(d^{O_{s}(1)})}\cdot N^{-1}$ for $n\in[N]$ .

Proof.

By the classification of polynomial sequences in terms of coordinates of the second-kind, we have that

g(n)=\exp\Big{(}\sum_{k=1}^{s}\binom{n}{k}g_{k}\Big{)}

for some $g_{k}\in\log(G_{(k,0)})=\log(G_{(k,1)})$ . Note that

\operatorname{Taylor}_{i}(g)=\exp(g_{i})~{}\mathrm{mod}~{}G_{(i,2)}

and note that each $\psi_{i,j}$ can be descended to a linear map on $\log(G_{(i,1)})$ with the property that $\psi_{i,j}(\log(\Gamma\cap G_{(i,1)}))\in\mathbb{Z}$ and $\psi_{i,j}(\log(G_{(i,2)}))=0$ . That $\psi_{i,j}$ descends uses the fact that $\log(x)+\log(y)\equiv\log(xy)~{}\mathrm{mod}~{}\log(G_{(i,2)})$ for $x,y\in G_{(i,1)}$ , which follows from Baker–Campbell–Hausdorff.

We now apply Lemma B.1. As $\operatorname{dist}(\psi_{i,j}(\operatorname{Taylor}_{i}(g)),\mathbb{Z})\leq H\cdot N^{-i}$ by assumption, we may write $g_{i}=g_{i,\mathrm{small}}+g_{i,\mathrm{rat}}+(g_{i}-g_{i,\mathrm{small}}-g_{i,\mathrm{rat}})$ such that $g_{i,\mathrm{rat}}$ is an $H^{O_{s}(d^{O_{s}(1)})}$ -rational combination of elements in $\mathcal{X}\cap\log(G_{(i,1)})$ , such that $\lVert g_{i,\mathrm{small}}\rVert_{\infty}\leq(MH)^{O_{s}(d^{O_{s}(1)})}\cdot N^{-i}$ , and such that $\psi_{i,j}(g_{i}-g_{i,\mathrm{small}}-g_{i,\mathrm{rat}})=0$ . Defining

\gamma:=\exp\Big{(}\sum_{k=1}^{s}\binom{n}{k}g_{k,\mathrm{rat}}\Big{)},\quad\varepsilon:=\exp\Big{(}\sum_{k=1}^{s}\binom{n}{k}g_{k,\mathrm{small}}\Big{)},

and $g^{\prime}:=\varepsilon^{-1}g\gamma^{-1}$ , we immediately have that $\gamma\Gamma$ is $(MH)^{O_{s}(d^{O_{s}(1)})}$ -periodic by [42, Lemma B.14]. That $\varepsilon$ is sufficiently smooth is an immediate consequence of [42, Lemmas B.1, B.3]. ∎

We next require the following result regarding the existence of a nilmanifold partition of unity. As a remark, a similar statement (e.g. with $\sum_{j}\tau_{j}=1$ ) appears as [45, Lemma 2.4]. The proof there does not adapt in a straightforward manner to the present setting.

Lemma B.3.

Fix $\varepsilon\in(0,1/2)$ and a nilmanifold $G/\Gamma$ of degree $s$ , dimension $d$ , and complexity $M$ . There exists an index set $I$ and a collection of nonnegative smooth functions $\tau_{j}\colon G/\Gamma\to\mathbb{R}^{\geq 0}$ for $j\in I$ such that:

•

For all $g\in G$ , we have $\sum_{j\in I}\tau_{j}(g\Gamma)^{2}=1$ ;
•

$|I|\leq(1/\varepsilon)^{O_{s}(d^{O_{s}(1)})}$ ;
•

For each $j\in I$ , there exists $\beta\in[-2,2]^{d}$ so that for any $g\Gamma\in\operatorname{supp}(\tau_{j})$ there exists $g^{\prime}\in g\Gamma$ such that $\psi_{G}(g^{\prime})\in\prod_{i=1}^{d}[\beta_{i}-\varepsilon,\beta_{i}+\varepsilon]$ ;
•

$\tau_{j}$ are $(M/\varepsilon)^{O_{s}(d^{O_{s}(1)})}$ -Lipschitz on $G/\Gamma$ ;
•

For any $g\in G$ , $g\Gamma$ is contained in the support of at most $2^{O_{s}(d)}$ terms.

Proof.

We will prove the statement inductively based on the degree of the nilmanifold. For degree $1$ nilmanifolds $G/\Gamma$ , note that $G/\Gamma\simeq\mathbb{T}^{d}$ . There exists a set of functions $\rho_{1},\ldots,\rho_{2k}\colon\mathbb{T}\to\mathbb{R}^{\geq 0}$ such that:

•

$\operatorname{supp}(\rho_{j})\subseteq[j/(2k),j/(2k)+1/k]~{}\mathrm{mod}~{}1$ ;
•

$\sum_{j=1}^{2k}\rho_{j}^{2}=1$ ;
•

$\rho_{j}$ are $O(k)$ -Lipschitz.

Taking $k=O(\varepsilon^{-1})$ , we have that

1=\sum_{(j_{1},\ldots,j_{d})\in[2k]^{d}}\prod_{\ell=1}^{d}\rho_{j_{\ell}}((\psi_{G}(g))_{\ell})^{2}

where $(\psi_{G})_{\ell}$ denotes the $\ell$ -th coordinate of $\psi_{G}$ . For $\vec{j}\in[2k]^{d}$ we take

\tau_{\vec{j}}(g)=\prod_{\ell=1}^{d}\rho_{j_{\ell}}((\psi_{G}(g))_{\ell})

and note that this function is $\Gamma$ -invariant since multiplying by an element in $\Gamma$ shifts all coordinates by an integer. Furthermore, by [42, Lemma B.3] we have that the standard $\ell^{\infty}$ -metric on $G/\Gamma$ is equivalent to $d_{G/\Gamma}$ up to a factor of $O(M)^{O(d^{O(1)})}$ . This completes the proof in this case.
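One concrete family with the support and sum-of-squares properties required of the $\rho_{j}$ can be sketched as follows. This is only an illustrative assumption: the sine bump below is Lipschitz with constant $\pi k$ but not smooth, so a mollified variant would be needed to match the smoothness in the lemma, and the names `rho` and `sum_of_squares` are hypothetical.

```python
import math

def rho(j, k, x):
    """Bump supported on [j/(2k), j/(2k) + 1/k] mod 1 (one hypothetical choice)."""
    u = ((x - j / (2 * k)) % 1.0) * k  # offset into the support, rescaled to [0, 1]
    return math.sin(math.pi * u) if u <= 1.0 else 0.0

def sum_of_squares(k, x):
    """Sum of rho_j(x)^2 over j = 1, ..., 2k."""
    return sum(rho(j, k, x) ** 2 for j in range(1, 2 * k + 1))

# Neighbouring bumps are offset by half a support length, so on each overlap
# the two active terms are sin^2 and cos^2 of the same angle.
for k in (1, 2, 5):
    for i in range(97):
        assert abs(sum_of_squares(k, i / 97) - 1.0) < 1e-9
```

Taking products over the $d$ coordinates then gives $\sum_{\vec{j}}\tau_{\vec{j}}^{2}=1$ , since the sum of products factors as a product of sums.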

When considering the case of a degree $s\geq 2$ filtration on $G$ , suppose that $G_{0}=G_{1}\geqslant G_{2}\geqslant\cdots\geqslant G_{s}\geqslant\mathrm{Id}_{G}$ is the given filtration. Note that if $\mathcal{X}=\{X_{1},\ldots,X_{d}\}$ is the adapted Mal’cev basis for $G/\Gamma$ then

\widetilde{\mathcal{X}}:=\{X_{1},\ldots,X_{\dim(G)-\dim(G_{s})}\}~{}\mathrm{mod}~{}\log(G_{s})

is a valid Mal’cev basis for $\widetilde{G}:=G/G_{s}$ . Furthermore define $\widetilde{\Gamma}:=\Gamma/(\Gamma\cap G_{s})$ . The complexity of $\widetilde{\mathcal{X}}$ is always bounded by $M$ by definition. The filtration on $\widetilde{G}$ is lower degree.

By induction, we have functions $(\widetilde{\tau_{j}})_{j\in I}$ with $|I|\leq(M/\varepsilon)^{O_{s}(d^{O_{s}(1)})}$ such that

1=\sum_{j\in I}\widetilde{\tau_{j}}(\widetilde{g}\widetilde{\Gamma})^{2}

and satisfying various other appropriate properties. We may lift these functions to $G/\Gamma$ via

\tau_{j}(g\Gamma)=\widetilde{\tau_{j}}((g~{}\mathrm{mod}~{}G_{s})\widetilde{\Gamma}).

Note that this is well-defined since $g\Gamma~{}\mathrm{mod}~{}G_{s}=(g~{}\mathrm{mod}~{}G_{s})\cdot(\Gamma~{}\mathrm{mod}~{}G_{s})=(g~{}\mathrm{mod}~{}G_{s})\widetilde{\Gamma}$ .

We view each $\tau_{j}$ as a function on $\prod_{i=1}^{\dim(\widetilde{G})}(\beta_{i}-1/2,\beta_{i}+1/2]\times\mathbb{T}^{\dim(G_{s})}$ which only depends on the first $\dim(\widetilde{G})$ coordinates and such that the support is only within some $\prod_{i=1}^{\dim(\widetilde{G})}[\beta_{i}-\varepsilon,\beta_{i}+\varepsilon]\times\mathbb{T}^{\dim(G_{s})}$ . This is via identifying the fundamental domain of $G/\Gamma$ via Mal’cev coordinates of the second-kind (see the proof of [42, Lemma B.6]). We let $\psi_{\beta}\colon G/\Gamma\to\prod_{i=1}^{\dim(\widetilde{G})}(\beta_{i}-1/2,\beta_{i}+1/2]\times\mathbb{T}^{\dim(G_{s})}$ denote this identification. (Note that the choice of $\beta$ depends on $j\in I$ , which we will fix through the remainder of the proof.)

We now have

\tau_{j}(g\Gamma)^{2}=\widetilde{\tau_{j}}((g~{}\mathrm{mod}~{}G_{s})\widetilde{\Gamma})^{2}\cdot\sum_{(t_{1},\ldots,t_{\dim(G_{s})})\in[2k]^{\dim(G_{s})}}\prod_{\ell=1}^{\dim(G_{s})}\rho_{t_{\ell}}((\psi_{\beta}(g\Gamma))_{\ell+\dim(\widetilde{G})})^{2}

where $k=O(1/\varepsilon)$ and $\rho$ are defined as above.

The fact that each piece

\tau_{j,\vec{t}}(g\Gamma)^{2}:=\tau_{j}(g\Gamma)^{2}\cdot\prod_{\ell=1}^{\dim(G_{s})}\rho_{t_{\ell}}((\psi_{\beta}(g\Gamma))_{\ell+\dim(\widetilde{G})})^{2}

is $\Gamma$ -invariant on the right is trivial by construction, and the sum of squares property is trivial.

Identifying $\rho_{j}$ with a function $\mathbb{R}\to\mathbb{R}^{\geq 0}$ where $\operatorname{supp}(\rho_{j})\subseteq[j/(2k),j/(2k)+1/k]$ , we may identify $\tau_{j,\vec{t}}$ with a function on the fundamental domain (with respect to second-kind coordinates) of the form

\prod_{i=1}^{\dim(\widetilde{G})}(\beta_{i}-1/2,\beta_{i}+1/2]\times\prod_{\ell=1}^{\dim(G_{s})}((t_{\ell}+1)/(2k)-1/2,(t_{\ell}+1)/(2k)+1/2].

To check that this function is sufficiently Lipschitz, we note that each element $g\Gamma$ has a unique representative in this domain.

Consider $\tau_{j,\vec{t}}(x\Gamma)$ and $\tau_{j,\vec{t}}(y\Gamma)$ ; by multiplying by the lattice we may assume that $\psi(x),\psi(y)$ are in the specified fundamental domain. Furthermore if $d_{G/\Gamma}(x\Gamma,y\Gamma)\geq\varepsilon^{\prime}=M^{-O_{s}(d^{O_{s}(1)})}$ we immediately win as $\tau_{j,\vec{t}}$ is $1$ -bounded. We claim that if $d_{G/\Gamma}(x\Gamma,y\Gamma)\leq\varepsilon^{\prime}$ then $d_{G/\Gamma}(x\Gamma,y\Gamma)=d_{G}(x,y)$ . In particular, note that

\displaystyle d_{G/\Gamma}(x\Gamma,y\Gamma)

\displaystyle=\min_{\gamma\in\Gamma}d_{G}(x\gamma,y)

and that

\min_{\gamma\in\Gamma\setminus\{\mathrm{id}_{G}\}}d_{G}(x\gamma,y)\geq M^{-O_{s}(d^{O_{s}(1)})}\cdot\min_{\gamma\in\Gamma\setminus\{\mathrm{id}_{G}\}}d_{G}(\gamma,x^{-1}y)\geq M^{-O_{s}(d^{O_{s}(1)})}

which gives the desired contradiction assuming that various implicit constants defining $\varepsilon^{\prime}$ are chosen appropriately.

Now we may assume that $x,y$ are such that

\psi(x),\psi(y)\in\prod_{i=1}^{\dim(\widetilde{G})}[\beta_{i}-2\varepsilon,\beta_{i}+2\varepsilon)\times\prod_{\ell=1}^{\dim(G_{s})}[t_{\ell}/(2k)-\varepsilon,t_{\ell}/(2k)+1/k+\varepsilon),

else both function values vanish (again supposing $\varepsilon^{\prime}$ is sufficiently small). This is because $d_{G}(x,y)$ is equivalent to $\lVert\psi(x)-\psi(y)\rVert_{\infty}$ (up to a factor of $M^{O_{s}(d^{O_{s}(1)})}$ ) for bounded elements by [42, Lemma B.3], and due to the condition on the support of $\rho_{t_{\ell}}$ .

In particular, $\psi(x),\psi(y)$ are seen to lie in the interior of the domain. The result then follows immediately noting that $\tau_{j}$ is appropriately Lipschitz and $\rho_{t_{\ell}}$ is an appropriately Lipschitz function on $\mathbb{R}$ . The claim that $g\Gamma$ is contained in the support of at most $2^{O_{s}(d)}$ terms follows trivially by construction. ∎

Given this we are now in position to show the existence of nilcharacters on $G/\Gamma$ .

Lemma B.4.

Fix $\varepsilon\in(0,1/2)$ and a nilmanifold $G/\Gamma$ of degree $s$ , dimension $d$ , and complexity $M$ . Fix $\eta$ a vertical $G_{s}$ -frequency with height bounded by $M$ . There exists a nilcharacter $F$ with frequency $\eta$ such that the output dimension is bounded by $2^{O_{s}(d^{O_{s}(1)})}$ and each coordinate is $O_{s}(M)^{O_{s}(d^{O_{s}(1)})}$ -Lipschitz.

Proof.

Let $\widetilde{G}=G/G_{s}$ and $\widetilde{\Gamma}=\Gamma/(\Gamma\cap G_{s})$ . Apply Lemma B.3 on $\widetilde{G}/\widetilde{\Gamma}$ with $\varepsilon=1/4$ to obtain $\widetilde{\tau_{j}}$ for $j\in I$ . For $\eta=0$ , we may take the coordinates of $F$ to be

\tau_{j}(g\Gamma)=\widetilde{\tau_{j}}((g~{}\mathrm{mod}~{}G_{s})\widetilde{\Gamma}).

In general, for appropriate $\beta$ depending on $j$ , we have that $g\Gamma$ is naturally identified with a unique point inside $\prod_{i=1}^{\dim(\widetilde{G})}(\beta_{i}-1/2,\beta_{i}+1/2]\times\mathbb{T}^{\dim(G_{s})}$ as in the proof of Lemma B.3 and we let $\psi_{\beta}(g\Gamma)$ denote this map. The key point is to write

\tau_{j}(g\Gamma)=\widetilde{\tau_{j}}((g~{}\mathrm{mod}~{}G_{s})\widetilde{\Gamma})\cdot\exp(\eta\cdot\psi_{\beta}(g\Gamma))

and note that $\sum_{j\in I}|\tau_{j}(g\Gamma)|^{2}=1$ as before. Here we have identified $\eta$ with an integer vector using the last $\dim(G_{s})$ elements of the Mal’cev basis and extending by $0$ . Note that this is trivially a function on $G/\Gamma$ and by construction it has the $G_{s}$ -vertical frequency $\eta$ . The only technical point is verifying that this function is indeed Lipschitz, which we check for each coordinate $\tau_{j}$ .

Consider $x\Gamma$ and $y\Gamma$ . If $\tau_{j}(x\Gamma)=\tau_{j}(y\Gamma)=0$ the Lipschitz condition is obviously satisfied. Thus at least one value is nonzero, and without loss of generality we may assume $\tau_{j}(x\Gamma)\neq 0$ . Furthermore, noting that $\tau_{j}$ is $1$ -bounded, we may assume that $d_{G/\Gamma}(x\Gamma,y\Gamma)\leq M^{-O_{s}(d^{O_{s}(1)})}$ . As $\tau_{j}(x\Gamma)\neq 0$ , possibly shifting $x$ on the right by an element in the lattice allows us to assume

\psi(x)\in\prod_{i=1}^{\dim(\widetilde{G})}(\beta_{i}-1/4,\beta_{i}+1/4]\times(0,1]^{\dim(G_{s})}.

Via an argument analogous to that in the proof of Lemma B.3, there exists $y^{\prime}$ such that $y^{\prime}\Gamma=y\Gamma$ ,

\psi(y^{\prime})\in\prod_{i=1}^{\dim(\widetilde{G})}(\beta_{i}-1/3,\beta_{i}+1/3]\times(-1/2,3/2]^{\dim(G_{s})},

and $\lVert\psi(x)-\psi(y^{\prime})\rVert_{\infty}\leq M^{O_{s}(d^{O_{s}(1)})}d_{G/\Gamma}(x\Gamma,y\Gamma)$ . Since $\vec{z}\mapsto\exp(\eta\cdot\vec{z})$ is an appropriately Lipschitz function on the torus if $\eta\in\mathbb{Z}^{\dim(G_{s})}$ , the desired result follows immediately. ∎

We will also require the following converse of the $U^{s+1}$ -inverse theorem; this is verbatim in [32, Appendix G] modulo various complexity details being omitted.

Lemma B.5.

Fix $\varepsilon\in(0,1/2)$ and let $G/\Gamma$ be a degree $s$ nilmanifold of dimension $d$ and complexity $M$ , and let $g(n)$ be a polynomial sequence with respect to this filtration. Furthermore let $F\colon G/\Gamma\to\mathbb{C}$ satisfy $\lVert F\rVert_{\mathrm{Lip}}\leq M$ . If $f\colon[N]\to\mathbb{C}$ is a $1$ -bounded function such that

\big{|}\mathbb{E}_{n\in[N]}f(n)\overline{F(g(n)\Gamma)}\big{|}\geq\varepsilon,

then

\lVert f\rVert_{U^{s+1}[N]}\geq(\varepsilon/M)^{O_{s}(d^{O_{s}(1)})}.

Proof.

In the degenerate case when $s=0$ , we take a degree $s$ nilsequence of complexity $M$ to be a constant function $\psi$ bounded by $M$ . This implies that

|\mathbb{E}_{n\in[N]}f(n)|\geq\varepsilon/M

and by Cauchy–Schwarz we have

\mathbb{E}_{n,n^{\prime}\in[N]}f(n)\overline{f(n^{\prime})}\geq(\varepsilon/M)^{2}.

By unwinding definitions this implies the case $s=0$ .

For larger $s$ , by applying [42, Lemma A.6] we may assume that

\big{|}\mathbb{E}_{n\in[N]}f(n)\overline{F_{\xi}(g(n)\Gamma)}\big{|}\geq(\varepsilon/M)^{O_{s}(d^{O_{s}(1)})}

where $F_{\xi}$ is a $(M/\varepsilon)^{O_{s}(d^{O_{s}(1)})}$ -Lipschitz function with $G_{s}$ -vertical frequency $\xi$ bounded in height by $(M/\varepsilon)^{O_{s}(d^{O_{s}(1)})}$ , after pigeonholing. Cauchy–Schwarz implies that

\mathbb{E}_{n,n^{\prime}\in[N]}f(n)\overline{f(n^{\prime})}F_{\xi}(g(n^{\prime})\Gamma)\overline{F_{\xi}(g(n)\Gamma)}\geq(\varepsilon/M)^{O_{s}(d^{O_{s}(1)})}.

Note that we may rewrite this as

\mathbb{E}_{n\in[N],h\in[\pm N]}f(n)\overline{f(n+h)}F_{\xi}(g(n+h)\Gamma)\overline{F_{\xi}(g(n)\Gamma)}\geq(\varepsilon/M)^{O_{s}(d^{O_{s}(1)})},

where we extend $f$ by $0$ in the usual manner. We define

G^{\Box}=\{(g,g^{\prime})\colon g,g^{\prime}\in G,g^{-1}g^{\prime}\in G_{2}\}

and note that this has a filtration $(G^{\Box})_{i}=\{(g,g^{\prime})\colon g,g^{\prime}\in G_{i},g^{-1}g^{\prime}\in G_{i+1}\}$ by [42, Lemma A.3] (with $G^{\Box}=(G^{\Box})_{1}$ ). Let $\Gamma^{\Box}=(\Gamma\times\Gamma)\cap G^{\Box}$ and note that

\widetilde{F}_{\xi}((x,y)(\Gamma\times\Gamma)):=F_{\xi}(x\Gamma)\overline{F_{\xi}(y\Gamma)}

is invariant under $G^{\Box}_{s}$ . Note that $\widetilde{F}_{\xi}$ is $(M/\varepsilon)^{O_{s}(d^{O_{s}(1)})}$ -Lipschitz on $G\times G$ and on $G^{\Box}$ , and $G^{\Box}/\Gamma^{\Box}$ is a nilmanifold of appropriate complexity by [42, Lemma A.3].

Let

(g(0),g(h))=\{(g(0),g(h))\}\cdot[(g(0),g(h))]

with $d_{G\times G}(\{(g(0),g(h))\},\mathrm{id}_{G\times G})\leq M^{O_{s}(d^{O_{s}(1)})}$ and $[(g(0),g(h))]\in\Gamma\times\Gamma$ . Define

g_{h}^{\prime}(n)=\{(g(0),g(h))\}^{-1}(g(n),g(n+h))[(g(0),g(h))]^{-1};

this is easily seen to be a polynomial sequence with respect to $G^{\Box}$ . Thus

\mathbb{E}_{n\in[N],h\in[\pm N]}f(n)\overline{f(n+h)\widetilde{F}_{\xi}(\{(g(0),g(h))\}g_{h}^{\prime}(n)(\Gamma\times\Gamma))}\geq(\varepsilon/M)^{O_{s}(d^{O_{s}(1)})}.

Define $\widetilde{F}_{\xi,h}(x,y):=\widetilde{F}_{\xi}(\{(g(0),g(h))\}(x,y)(\Gamma\times\Gamma))$ and note that it is $(M/\varepsilon)^{O_{s}(d^{O_{s}(1)})}$ -Lipschitz on $G\times G$ and on $G^{\Box}$ by [42, Lemma B.4]. Applying the triangle inequality and restricting to $G^{\Box}$ we have

\mathbb{E}_{h\in[\pm N]}\Big{|}\mathbb{E}_{n\in[N]}\Delta_{h}f(n)\cdot\overline{\widetilde{F}_{\xi,h}(g_{h}^{\prime}(n)\Gamma^{\Box})}\Big{|}\geq(\varepsilon/M)^{O_{s}(d^{O_{s}(1)})}.

Since $\widetilde{F}_{\xi}$ is invariant under $(G^{\Box})_{s}$ , passing to $G^{\Box}/(G^{\Box})_{s}$ gives a nilmanifold of degree $(s-1)$ and complexity $M^{O_{s}(d^{O_{s}(1)})}$ . Thus we may apply induction and deduce that

\mathbb{E}_{h\in[\pm N]}\lVert\Delta_{h}f\rVert_{U^{s}[N]}\geq(\varepsilon/M)^{O_{s}(d^{O_{s}(1)})}.

Since

\mathbb{E}_{h\in[\pm N]}\lVert\Delta_{h}f\rVert_{U^{s}[N]}^{2^{s}}\lesssim_{s}\lVert f\rVert_{U^{s+1}[N]}^{2^{s+1}},

the desired result follows. ∎
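The final step of this proof rests on the recursive structure of the Gowers norms. On the cyclic group $\mathbb{Z}/N\mathbb{Z}$ the analogous relation is an exact identity, $\lVert f\rVert_{U^{s+1}}^{2^{s+1}}=\mathbb{E}_{h}\lVert\Delta_{h}f\rVert_{U^{s}}^{2^{s}}$ , which follows directly from Definition 1.1. The sketch below checks this numerically for $s=1$ by brute force; it concerns the cyclic-group norm only (not the $[N]$ normalization, where one only has $\lesssim_{s}$ ), and the function names and the test function are illustrative assumptions.

```python
import cmath
import itertools

def delta(f, h):
    """Multiplicative derivative (Delta_h f)(x) = f(x) * conj(f(x + h)) on Z/NZ."""
    N = len(f)
    return [f[x] * f[(x + h) % N].conjugate() for x in range(N)]

def gowers_power(f, s):
    """Return ||f||_{U^s(Z/NZ)}^{2^s} by averaging over x, h_1, ..., h_s."""
    N = len(f)
    total = 0
    for hs in itertools.product(range(N), repeat=s):
        g = f
        for h in hs:
            g = delta(g, h)
        total += sum(g)
    return total / N ** (s + 1)

N = 5
# Arbitrary 1-bounded test function (a truncated quadratic phase).
f = [cmath.exp(2j * cmath.pi * (x * x + 3 * x) / 7) for x in range(N)]
lhs = gowers_power(f, 2)
rhs = sum(gowers_power(delta(f, h), 1) for h in range(N)) / N
assert abs(lhs - rhs) < 1e-9
```

The brute-force average has $N^{s+1}$ terms, so this is only practical for very small $N$ and $s$ .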

We now check the deferred Lemma 11.3.

Proof of Lemma 11.3.

We first construct a weak basis for $G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}$ . Note that each element $(g,g^{\prime})\in G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}$ may be written as

(g,g^{\prime})=(g,\mathrm{id}_{G_{\mathrm{Lin}}})\cdot(\mathrm{id}_{G_{\mathrm{Quot}}},g^{\prime}).

Consider $\widetilde{e}_{i,j}$ and consider $(r-1)$ -fold commutators of $\widetilde{e}_{i_{1},j_{1}},\ldots,\widetilde{e}_{i_{r},j_{r}}$ with $i_{1}+\cdots+i_{r}\leq s-2$ or $i_{1}+\cdots+i_{r}=s-1$ , $r\leq r^{\ast}$ and at most one generator has $j_{\ell}>D_{i_{\ell}}^{\ast}$ . We define the type of the commutator to be given by the multiset $\{\widetilde{e}_{i_{1},j_{1}},\ldots,\widetilde{e}_{i_{r},j_{r}}\}$ and we say that said type is linear if $j_{\ell}>D_{i_{\ell}}^{\ast}$ for exactly one index $\ell$ . We define the degree of a commutator to be $i_{1}+\cdots+i_{r}$ . As discussed in Lemmas 10.4 and 10.10, commutators of all types span $\log(G_{\mathrm{Quot}})$ and commutators of linear type span $\log(G_{\mathrm{Lin}})$ , and all relations between these elements are spanned by relations between commutators of the same type of height $O_{s}(1)$ .

Given this, for each collection of commutators of a given type choose a subset which “spans the type” (similar to in the proof of Lemma 10.4). Let $\mathcal{X}_{1}$ denote the set of selected commutators and $\mathcal{X}_{2}$ denote the selected commutators which are of linear type. Our weak basis for $G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}$ will be

\mathcal{X}=\{(X,0)\colon X\in\mathcal{X}_{1}\}\cup\{(0,X)\colon X\in\mathcal{X}_{2}\};

this is seen to be a basis for the Lie algebra of $G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}$ . That it spans is trivial, and if there were a relation note that there could be no elements of the form $(X,0)$ in the relation since projecting onto the first coordinate we recover multiplication in $G_{\mathrm{Quot}}$ . Given that there are no elements of the form $(X,0)$ , within this relation multiplication then acts exactly as in $G_{\mathrm{Lin}}$ and the claimed independence follows.

We give $G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}$ a multidegree filtration by taking the multidegree filtration of $G_{\mathrm{Multi}}$ and intersecting with the subgroup of elements of the form $(0,(g,g_{1}))$ . We see that all the subgroups of the filtration are in fact spanned by subsets of $\mathcal{X}$ . This is simply by taking the generators in $\mathcal{X}$ of the appropriate degree-rank; for instance

(G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}})_{(0,d)}=\{(g,g_{1})\colon g\in(G_{\mathrm{Quot}})_{(d,0)},g_{1}\in(G_{\mathrm{Quot}})_{(d,0)}\cap G_{\mathrm{Lin}}\}

and we take the subsets of $\{(X,0)\colon X\in\mathcal{X}_{1}\}$ and $\{(0,X)\colon X\in\mathcal{X}_{2}\}$ where $X$ has degree at least $d$ . This is similarly true for $\bigvee_{|\vec{i}|=k}(G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}})_{(i_{1},i_{2})}$ which will ultimately form the underlying degree filtration for $G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}$ . Furthermore ordering the basis according to whether they lie in the degree ordering associated to $G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}$ proves that the basis has the degree $O_{s}(1)$ nesting property. Thus it suffices to check the complexity of various commutators.

Note the identity

\displaystyle[V,W]

\displaystyle=\frac{d}{ds}\frac{d}{dt}\exp(sV)\exp(tW)\exp(-sV)\exp(-tW)\bigg{|}_{s,t=0}

which holds for any Lie group and the associated Lie bracket. It is therefore immediate that

[(X,0),(X^{\prime},0)]=([X,X^{\prime}],0)\text{ and }[(0,X),(0,X^{\prime})]=(0,[X,X^{\prime}]),

and we have

	$\displaystyle[(X,0),(0,X^{\prime})]$
	$\displaystyle\qquad=\frac{d}{ds}\frac{d}{dt}(\exp(sX),\mathrm{id}_{G_{\mathrm{Lin}}})\cdot(\mathrm{id}_{G_{\mathrm{Quot}}},\exp(tX^{\prime}))\cdot(\exp(-sX),\mathrm{id}_{G_{\mathrm{Lin}}})\cdot(\mathrm{id}_{G_{\mathrm{Quot}}},\exp(-tX^{\prime}))\bigg{|}_{s,t=0}$
	$\displaystyle\qquad=\frac{d}{ds}\frac{d}{dt}(\mathrm{id}_{G_{\mathrm{Quot}}},\exp(sX)\exp(tX^{\prime})\exp(-sX)\exp(-tX^{\prime}))\bigg{|}_{s,t=0}$
	$\displaystyle\qquad=(0,[X,X^{\prime}]).$

This immediately implies that the structure constants associated to the weak basis $\mathcal{X}$ are of height $O_{s}(1)$ .
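The identity expressing $[V,W]$ as a mixed derivative of the group commutator can be checked exactly in a simple model case. The sketch below uses the Lie algebra of strictly upper triangular $3\times 3$ matrices (the Heisenberg algebra), which is an assumption made purely for illustration and is not one of the groups in the paper. There the group commutator equals $I+st[V,W]$ exactly, so its mixed derivative at $s=t=0$ is $[V,W]$ . All helper names are hypothetical.

```python
from fractions import Fraction

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

I3 = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]

def exp3(A):
    """exp of a strictly upper triangular 3x3 matrix: I + A + A^2/2, as A^3 = 0."""
    return add(add(I3, A), scale(Fraction(1, 2), matmul(A, A)))

def bracket(V, W):
    """Lie bracket [V, W] = VW - WV."""
    return add(matmul(V, W), scale(-1, matmul(W, V)))

def group_commutator(V, W, s, t):
    """exp(sV) exp(tW) exp(-sV) exp(-tW)."""
    out = exp3(scale(s, V))
    for M in (exp3(scale(t, W)), exp3(scale(-s, V)), exp3(scale(-t, W))):
        out = matmul(out, M)
    return out

V = [[0, 1, 2], [0, 0, 3], [0, 0, 0]]
W = [[0, 4, 5], [0, 0, 6], [0, 0, 0]]
s, t = 2, 3
# In this step-2 model the commutator is exactly I + s*t*[V, W].
assert group_commutator(V, W, s, t) == add(I3, scale(s * t, bracket(V, W)))
```

In higher-step groups the commutator has further terms of order at least three in $(s,t)$ , which do not affect the mixed derivative at the origin.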

When including the semi-direct action, we will use the weak basis given by taking elements $\log((\vec{e}_{ij},(\mathrm{id}_{G_{\mathrm{Quot}}},\mathrm{id}_{G_{\mathrm{Lin}}})))$ where $\vec{e}_{i,j}$ denotes the elementary basis vector in the corresponding direction in $R$ , placed at the start of $\mathcal{X}$ . This is easily seen to preserve the nesting property.

To compute the associated structure constants, first note that

[(\vec{e}_{ij},(\mathrm{id}_{G_{\mathrm{Quot}}},\mathrm{id}_{G_{\mathrm{Lin}}})),(0,(g,\mathrm{id}_{G_{\mathrm{Lin}}}))]=\mathrm{id}_{G_{\mathrm{Multi}}}

and thus all Lie bracket structure constants of the corresponding form vanish. Furthermore note that

\displaystyle[\log((\vec{e}_{ij},(\mathrm{id}_{G_{\mathrm{Quot}}},\mathrm{id}_{G_{\mathrm{Lin}}}))),(0,(0,X^{\prime}))]=\frac{d}{ds}\frac{d}{dt}(0,(\exp(tX^{\prime})^{s\cdot\vec{e}_{i,j}},\mathrm{id}_{G_{\mathrm{Lin}}}))\bigg{|}_{s,t=0}.

We have that if the type of $X^{\prime}$ does not contain $\widetilde{e}_{i,j}$ then $\exp(tX^{\prime})^{s\cdot\vec{e}_{i,j}}=\mathrm{id}_{G_{\mathrm{Quot}}}$ and otherwise $\exp(tX^{\prime})^{s\cdot\vec{e}_{i,j}}=\exp(stX^{\prime})$ (recall the definition of exponentiation by elements of $R$ given in Section 11.1). In either case the structure constant is appropriately rational. Therefore we may construct a Mal’cev basis adapted to $G_{\mathrm{Multi}}$ with the appropriate complexity by applying [42, Lemma B.11] to $\mathcal{X}$ to construct a Mal’cev basis for $G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}$ and adding the semi-direct Mal’cev basis elements described above to the front of the list. We define this basis to be $\mathcal{X}_{\mathrm{Multi}}$ and define the initial segment corresponding to the semi-direct Mal’cev basis elements to be $\mathcal{X}_{\mathrm{Multi},R}$ and the remaining elements to be $\mathcal{X}_{\mathrm{Multi},G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}}$ .

We finally check that $F_{\mathrm{Multi}}$ is an appropriately Lipschitz function. Let $\delta$ be defined as in Section 11.1. Fix a pair $x,y\in G_{\mathrm{Multi}}$ . Note that if

d_{G_{\mathrm{Multi}}/\Gamma_{\mathrm{Multi}}}(x\Gamma_{\mathrm{Multi}},y\Gamma_{\mathrm{Multi}})\geq\delta^{O_{s}(d^{O_{s}(1)})}

we have that

\frac{|F_{\mathrm{Multi}}(x\Gamma_{\mathrm{Multi}})-F_{\mathrm{Multi}}(y\Gamma_{\mathrm{Multi}})|}{d_{G_{\mathrm{Multi}}/\Gamma_{\mathrm{Multi}}}(x\Gamma_{\mathrm{Multi}},y\Gamma_{\mathrm{Multi}})}\leq\delta^{-O_{s}(d^{O_{s}(1)})}\cdot 2\lVert F_{\mathrm{Multi}}\rVert_{\infty}

which is sufficiently bounded. Therefore to check the Lipschitz constant it suffices to consider $x,y$ such that $d_{G_{\mathrm{Multi}}/\Gamma_{\mathrm{Multi}}}(x\Gamma_{\mathrm{Multi}},y\Gamma_{\mathrm{Multi}})\leq\delta^{O_{s}(d^{O_{s}(1)})}$ (where the implicit constants are chosen sufficiently large for the remainder of the argument). By multiplying by elements in the lattice, we may assume that $d_{G_{\mathrm{Multi}}/\Gamma_{\mathrm{Multi}}}(x\Gamma_{\mathrm{Multi}},y\Gamma_{\mathrm{Multi}})=d_{G_{\mathrm{Multi}}}(x,y)$ , that $F_{\mathrm{Multi}}(x\Gamma_{\mathrm{Multi}})\neq 0$ , and

\psi_{\mathcal{X}_{\mathrm{Multi}}}(x)\in[-1/2,1/2)^{\dim(G_{\mathrm{Multi}})}.

Note that to assume that $F_{\mathrm{Multi}}(x\Gamma_{\mathrm{Multi}})\neq 0$ we may need to swap $x$ and $y$ (if both are zero there is nothing to check with respect to the Lipschitz constant).

Since $F_{\mathrm{Multi}}(x\Gamma_{\mathrm{Multi}})\neq 0$ we in fact have that the first $\sum_{i=1}^{s-1}D_{i}^{\mathrm{Lin}}$ coordinates of $\psi_{\mathcal{X}_{\mathrm{Multi}}}(x)$ are in $[-1/2+\delta,1/2-\delta]$ . This implies, due to the distance bound between $x$ and $y$ and by [42, Lemma B.3], that the first $\sum_{i=1}^{s-1}D_{i}^{\mathrm{Lin}}$ coordinates of $\psi_{\mathcal{X}_{\mathrm{Multi}}}(y)$ are in $[-1/2+\delta/2,1/2-\delta/2]$ . Therefore if $x=(t_{1},(g_{1},g_{1}^{\prime}))$ and $y=(t_{2},(g_{2},g_{2}^{\prime}))$ then

	$\displaystyle F_{\mathrm{Multi}}(x\Gamma_{\mathrm{Multi}})$	$\displaystyle=F^{\ast}(g_{1}^{\prime}\Gamma_{\mathrm{Quot}})\cdot\prod_{\begin{subarray}{c}1\leq i\leq s-1\\ D_{i}^{\ast}<j\leq D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}\end{subarray}}\phi((t_{1})_{i,j}),$
	$\displaystyle F_{\mathrm{Multi}}(y\Gamma_{\mathrm{Multi}})$	$\displaystyle=F^{\ast}(g_{2}^{\prime}\Gamma_{\mathrm{Quot}})\cdot\prod_{\begin{subarray}{c}1\leq i\leq s-1\\ D_{i}^{\ast}<j\leq D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}\end{subarray}}\phi((t_{2})_{i,j}).$

Note that

	$\displaystyle\|F_{\mathrm{Multi}}(x\Gamma_{\mathrm{Multi}})-F_{\mathrm{Multi}}(y\Gamma_{\mathrm{Multi}})\|$	$\displaystyle\leq\lVert F^{\ast}\rVert_{\infty}\cdot\bigg{\|}\prod_{\begin{subarray}{c}1\leq i\leq s-1\\ D_{i}^{\ast}<j\leq D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}\end{subarray}}\phi((t_{1})_{i,j})-\prod_{\begin{subarray}{c}1\leq i\leq s-1\\ D_{i}^{\ast}<j\leq D_{i}^{\ast}+D_{i}^{\mathrm{Lin}}\end{subarray}}\phi((t_{2})_{i,j})\bigg{\|}$
		$\displaystyle+\|F^{\ast}(g_{1}^{\prime}\Gamma_{\mathrm{Quot}})-F^{\ast}(g_{2}^{\prime}\Gamma_{\mathrm{Quot}})\|,$

where we have used that $\phi$ is $1$ -bounded. Next note that distance in $\psi_{\mathcal{X},\mathrm{exp}}$ controls the distance in $\psi_{\mathcal{X}}$ for bounded elements by [42, Lemma B.1] and distance in $\psi_{\mathcal{X}}$ controls distance in $d_{G_{\mathrm{Multi}}}$ by [42, Lemma B.3]. The first term is therefore sufficiently bounded as $\phi$ is $O(1/\delta)$ -Lipschitz.

Finally note that

	$\displaystyle(g_{1},g_{1}^{\prime})$	$\displaystyle=\prod_{(X_{i},X_{i}^{\prime})\in\mathcal{X}_{\mathrm{Multi},G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}}}\exp((X_{i},X_{i}^{\prime}))^{x_{i}},$
	$\displaystyle(g_{2},g_{2}^{\prime})$	$\displaystyle=\prod_{(X_{i},X_{i}^{\prime})\in\mathcal{X}_{\mathrm{Multi},G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}}}\exp((X_{i},X_{i}^{\prime}))^{y_{i}},$

where $x_{i}$ and $y_{i}$ are the coordinates of $x$ and $y$ in $\psi_{\mathcal{X}_{\mathrm{Multi}}}$ in the coordinates corresponding to $\mathcal{X}_{\mathrm{Multi},G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}}$ . This is using that

(t,(\mathrm{id}_{G_{\mathrm{Quot}}},\mathrm{id}_{G_{\mathrm{Lin}}}))\cdot(0,(g,g^{\prime}))=(t,(g,g^{\prime})).

Therefore we have that

	$\displaystyle g_{1}$	$\displaystyle=\prod_{(X_{i},X_{i}^{\prime})\in\mathcal{X}_{\mathrm{Multi},G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}}}\exp(X_{i})^{x_{i}}$
	$\displaystyle g_{2}$	$\displaystyle=\prod_{(X_{i},X_{i}^{\prime})\in\mathcal{X}_{\mathrm{Multi},G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}}}\exp(X_{i})^{y_{i}};$

and note that $\exp(X_{i})$ are appropriately bounded elements in $G_{\mathrm{Quot}}$ since $\mathcal{X}_{\mathrm{Multi},G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}}$ are low height combinations of elements in $\mathcal{X}$ . Via telescoping, and using that the metric $d_{G_{\mathrm{Quot}}}$ is right-invariant and essentially left-invariant under multiplication by bounded elements (e.g. [42, Lemma B.4]), we have that

	$\displaystyle d_{G_{\mathrm{Quot}}}(g_{1},g_{2})$	$\displaystyle\leq\delta^{-O_{s}(d^{O_{s}(1)})}\cdot\sum_{(X_{i},X_{i}^{\prime})\in\mathcal{X}_{\mathrm{Multi},G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}}}d_{G_{\mathrm{Quot}}}(\exp(X_{i})^{x_{i}-y_{i}},\mathrm{id}_{G_{\mathrm{Quot}}})$
		$\displaystyle\leq\delta^{-O_{s}(d^{O_{s}(1)})}\cdot\sum_{(X_{i},X_{i}^{\prime})\in\mathcal{X}_{\mathrm{Multi},G_{\mathrm{Quot}}\ltimes G_{\mathrm{Lin}}}}\|x_{i}-y_{i}\|$
		$\displaystyle\leq\delta^{-O_{s}(d^{O_{s}(1)})}\cdot\lVert\psi_{\mathcal{X}_{\mathrm{Multi}}}(x)-\psi_{\mathcal{X}_{\mathrm{Multi}}}(y)\rVert\leq\delta^{-O_{s}(d^{O_{s}(1)})}\cdot d_{G_{\mathrm{Multi}}}(x,y).$

This completes the proof upon noting that $F^{\ast}$ is appropriately Lipschitz on $G_{\mathrm{Quot}}$ . ∎

Appendix C Nilcharacters

This section is essentially a straightforward quantification of various statements regarding nilcharacters proven in [34, Appendix E].

We first require that equivalence of nilcharacters is a transitive relation; this is a quantified version of [34, Lemma E.7]. Recall the notion of complexity $(M,d)$ that we carry over from Section 12.

Lemma C.1.

Consider three nilcharacters $\chi_{1},\chi_{2},\chi_{3}$ each of complexity $(M,d)$ and such that the pair $\chi_{1}$ and $\chi_{2}$ and the pair $\chi_{2}$ and $\chi_{3}$ are $(M,D,d)$ -equivalent for multidegree $J$ . Then $\chi_{1}$ and $\chi_{3}$ are $((MD)^{O_{|J|}(1)},(MD)^{O_{|J|}(1)},O(d))$ -equivalent for multidegree $J$ .

Proof.

Notice that each coordinate of $\chi_{1}\otimes\overline{\chi_{3}}$ may be expressed as the sum of at most $D$ coordinates of the nilcharacter

\chi_{1}\otimes(\overline{\chi_{2}}\otimes\chi_{2})\otimes\overline{\chi_{3}};

this follows since the trace of $\overline{\chi_{2}}\otimes\chi_{2}$ is $1$ . The result then follows by rewriting

\chi_{1}\otimes(\overline{\chi_{2}}\otimes\chi_{2})\otimes\overline{\chi_{3}}=(\chi_{1}\otimes\overline{\chi_{2}})\otimes(\chi_{2}\otimes\overline{\chi_{3}})

and applying the assumption. ∎
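The contraction used in this proof is a pointwise statement about unit vectors: if $\sum_{j}|b_{j}|^{2}=1$ then each coordinate $a_{i}\overline{c_{k}}$ of $a\otimes\overline{c}$ is the sum over $j$ of the coordinates indexed by $(i,j,j,k)$ of $a\otimes\overline{b}\otimes b\otimes\overline{c}$ . The sketch below checks this for fixed complex vectors standing in for the values of $\chi_{1},\chi_{2},\chi_{3}$ at a single point; the vectors and function names are illustrative assumptions.

```python
def tensor(u, v):
    """Coordinates of the tensor product of u and v as a flat row-major list."""
    return [a * b for a in u for b in v]

def conj(u):
    return [complex(a).conjugate() for a in u]

def contract(a, b, c):
    """Sum the (i, j, j, k) coordinates of a (x) conj(b) (x) b (x) conj(c) over j."""
    D, C = len(b), len(c)
    big = tensor(tensor(tensor(a, conj(b)), b), conj(c))
    return [sum(big[((i * D + j) * D + j) * C + k] for j in range(D))
            for i in range(len(a)) for k in range(C)]

# Stand-ins for the values of three nilcharacters at one point.
a = [0.6, 0.8j]
b = [0.8, 0.6j, 0.0]   # unit vector, so the trace of conj(b) (x) b is 1
c = [0.6j, -0.8]
direct = tensor(a, conj(c))
via_b = contract(a, b, c)
assert all(abs(x - y) < 1e-12 for x, y in zip(direct, via_b))
```

Each output coordinate is a sum of $D=\dim(b)$ coordinates of the larger tensor product, matching the count in the proof.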

We will generally require the following specialization lemmas; these are rather straightforward consequences of the definitions modulo the need to handle slight filtration issues.

Lemma C.2.

We have the following:

•

Consider a nilsequence $\chi(h_{1},\ldots,h_{k})$ of multidegree $(s_{1},\ldots,s_{k})$ and complexity $(M,d)$ . Given $h^{\ast}\in\mathbb{Z}$ , the function $\chi(h^{\ast},h_{2},\ldots,h_{k})$ , treating $h^{\ast}$ as fixed, is a multidegree $(s_{2},\ldots,s_{k})$ nilsequence of complexity $(M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})},d)$ .
•

Consider homomorphisms $L_{i}\colon\mathbb{Z}^{\ell}\to\mathbb{Z}$ . If $\chi(h_{1},\ldots,h_{k})$ is a nilsequence of degree $s$ of complexity $(M,d)$ then $\chi(L_{1}(t_{1},\ldots,t_{\ell}),\ldots,L_{k}(t_{1},\ldots,t_{\ell}))$ is a degree $s$ nilsequence in variables $t_{1},\ldots,t_{\ell}$ of complexity $(M,d)$ .
•

If $\chi(h_{1},\ldots,h_{k})$ is a nilsequence of multidegree $(s_{1},\ldots,s_{k})$ of complexity $(M,d)$ then it is also a nilsequence of degree $s_{1}+\cdots+s_{k}$ of complexity $(M,d)$ .

Remark.

This result allows us to interpret expressions such as $\chi(h_{1}+h_{1}^{\prime},h_{2},\ldots,h_{k})$ as an appropriate degree nilcharacter in $k+1$ variables, if $\chi$ is a nilcharacter in $k$ variables with multidegree $(s_{1},\ldots,s_{k})$ .

Proof.

We handle these items in reverse order (which is also the order of increasing difficulty of these claims). Let

\chi(h_{1},\ldots,h_{k})=F(g(h_{1},\ldots,h_{k})\Gamma)

with the underlying nilmanifold being $G/\Gamma$ and the specified Mal’cev basis being $\mathcal{X}$ .

For the last claim, note that $\mathcal{X}$ (by the definition of complexity for multidegree nilmanifolds) is adapted to the degree filtration $G_{t}=\bigvee_{|\vec{i}|=t}G_{\vec{i}}$ . Furthermore by the inclusion given on [34, p. 1264] or direct inspection given the Taylor expansion in [34, Lemma B.9], we have that $g(h_{1},\ldots,h_{k})$ is a polynomial sequence with respect to the degree filtration $G_{0}=G_{1}\geqslant G_{2}\geqslant\cdots\geqslant G_{s_{1}+\cdots+s_{k}}\geqslant\mathrm{Id}_{G}$ . The desired result follows immediately.

For the second item, notice that if a polynomial $P(x_{1},\ldots,x_{k})$ has total degree $s$ , then for any linear maps $L_{i}\colon\mathbb{R}^{\ell}\to\mathbb{R}$ we have that $P(L_{1}(y_{1},\ldots,y_{\ell}),\ldots,L_{k}(y_{1},\ldots,y_{\ell}))$ has total degree $s$ . This coupled with Taylor expansion [34, Lemma B.9] and the fact that the set of polynomial sequences with respect to a given $I$ -filtration is a group (by [34, Corollary B.4]) implies the result.

We now handle the first item; this is the only nontrivial part. Write $g(h^{\ast},0,\ldots,0)=\{g(h^{\ast},0,\ldots,0)\}[g(h^{\ast},0,\ldots,0)]$ with $\psi_{G,\mathcal{X}}(\{g(h^{\ast},0,\ldots,0)\})\in[0,1)^{\dim(G)}$ and $[g(h^{\ast},0,\ldots,0)]\in\Gamma$ . We replace the polynomial sequence $g$ by $g^{\prime}=\{g(h^{\ast},0,\ldots,0)\}^{-1}g[g(h^{\ast},0,\ldots,0)]^{-1}$ and $F$ by the function $F^{\prime}(\cdot)=F(\{g(h^{\ast},0,\ldots,0)\}\cdot)$ . We may thus assume, at the cost of replacing $M$ by $M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})}$ , that $g(h^{\ast},0,\ldots,0)=\mathrm{id}_{G}$ .

We now apply [34, Lemma B.9] to see

g(h_{1},\ldots,h_{k})=\prod_{i_{1},\ldots,i_{k}}g_{(i_{1},\ldots,i_{k})}^{\binom{h_{1}}{i_{1}}\cdots\binom{h_{k}}{i_{k}}}

where we order $(i_{1},\ldots,i_{k})$ lexicographically with indices considered in reverse order in the product (in particular, the first few terms are $(0,\ldots,0)$ , $(1,\ldots,0)$ , $(2,\ldots,0)$ , and so on) and $g_{(i_{1},\ldots,i_{k})}\in G_{(i_{1},\ldots,i_{k})}$ . As $g(h^{\ast},0,\ldots,0)=\mathrm{id}_{G}$ , we have that

g(h^{\ast},h_{2},\ldots,h_{k})=\prod_{\begin{subarray}{c}i_{2},\ldots,i_{k}\\ i_{2}+\cdots+i_{k}>0\end{subarray}}g_{(i_{1},\ldots,i_{k})}^{\binom{h^{\ast}}{i_{1}}\cdot\binom{h_{2}}{i_{2}}\cdots\binom{h_{k}}{i_{k}}}.

It then follows that $g(h^{\ast},h_{2},\ldots,h_{k})$ is a polynomial with respect to

G^{\ast}=\bigvee_{\ell=2}^{k}G_{\vec{e}_{\ell}}

which we give a multidegree filtration $G^{\ast}_{(i_{2},\ldots,i_{k})}=G_{(0,i_{2},\ldots,i_{k})}$ for $i_{2}+\cdots+i_{k}>0$ and $G^{\ast}_{(0,\ldots,0)}=G^{\ast}$ . Note that all subgroups in this filtration are $M$ -rational with respect to $\mathcal{X}$ and that $(G^{\ast})_{t}=\bigvee_{|\vec{i}|=t}G^{\ast}_{\vec{i}}$ is a degree $|\vec{s}|-s_{1}$ filtration. Therefore applying [42, Lemma B.11] guarantees that we may find a Mal’cev basis $\mathcal{X}^{\ast}$ for $G^{\ast}$ (which is $M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})}$ -rational with respect to $\mathcal{X}$ ). Descending $F$ to $G^{\ast}$ gives the desired result with the necessary Lipschitz bound following from [42, Lemma B.9]. ∎

We now state a quantified version of [34, Lemma E.8]. Recall the notion of equivalence (Definition 7.3).

Lemma C.3.

Consider a nilcharacter $\chi$ with complexity $(M,d)$ of multidegree $\vec{s}=(s_{1},\ldots,s_{k})$ with $|\vec{s}|=s_{1}+\cdots+s_{k}$ . We have that:

•

The nilcharacters

$\chi(\cdot)\emph{ and }\chi(\cdot)$

are $(M^{O_{|\vec{s}|}(1)},M^{O_{|\vec{s}|}(1)},O(d))$ -equivalent for multidegree $<(s_{1},\ldots,s_{k})$ . (This means we take the down-set generated by $(s_{1},\ldots,s_{k})$ and then remove $(s_{1},\ldots,s_{k})$ .)
•

Fix $h^{\ast}\in\mathbb{Z}$ . The nilcharacters

$\chi(\cdot+h^{\ast}\vec{e}_{j})\emph{ and }\chi(\cdot)$

are $(M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})},M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})},O(d))$ -equivalent for multidegree $<(s_{1},\ldots,s_{k})$ .
•

Fix $q\in\mathbb{Z}$ . Then

$\chi^{\otimes q^{|\vec{s}|}}(\cdot)\emph{ and }\chi(q\cdot)$

are $(M^{O_{|\vec{s}|,q}(d^{O_{|\vec{s}|,q}(1)})},M^{O_{|\vec{s}|,q}(d^{O_{|\vec{s}|,q}(1)})},d^{O_{q}(1)})$ -equivalent for multidegree $<(s_{1},\ldots,s_{k})$ .
•

Fix $q\in\mathbb{Z}^{>0}$ . There exists a nilcharacter $\widetilde{\chi}$ of complexity $(M^{O_{|\vec{s}|,q}(d^{O_{|\vec{s}|,q}(1)})},d^{O_{q}(1)})$ such that

$\chi(\cdot)\emph{ and }\widetilde{\chi}^{\otimes q}(\cdot)$

are $(M^{O_{|\vec{s}|,q}(d^{O_{|\vec{s}|,q}(1)})},M^{O_{|\vec{s}|,q}(d^{O_{|\vec{s}|,q}(1)})},d^{O_{q}(1)})$ -equivalent for multidegree $<(s_{1},\ldots,s_{k})$ .

Remark.

$\chi^{-\otimes q}$ for $q\in\mathbb{Z}^{>0}$ is interpreted as $\overline{\chi}^{\otimes q}$ .
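To see what the four items say in the simplest case, consider (as an illustrative assumption) the one-variable abelian nilcharacter $\chi(n)=e(\alpha n^{2})$ with $e(x)=e^{2\pi ix}$ and a hypothetical $\alpha\in\mathbb{R}$, so that $k=1$ and $|\vec{s}|=2$. For scalar-valued $\chi$ the tensor power is an ordinary power, and one can check directly:

```latex
\begin{aligned}
\chi(n)\,\overline{\chi(n)}&=1,\\
\chi(n+h^{\ast})\,\overline{\chi(n)}&=e\big(2\alpha h^{\ast}n+\alpha(h^{\ast})^{2}\big),\\
\chi(n)^{q^{2}}\,\overline{\chi(qn)}&=e\big(\alpha q^{2}n^{2}-\alpha q^{2}n^{2}\big)=1,\\
\chi(n)\,\overline{\widetilde{\chi}(n)^{q}}&=1\quad\text{for }\widetilde{\chi}(n)=e(\alpha n^{2}/q).
\end{aligned}
```

In each line the right-hand side is a nilsequence of degree at most $1$, that is, of degree strictly less than $2$, which is what equivalence amounts to in this toy case.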

Proof.

Throughout the proof, we let

\chi(\vec{n})=F(g(\vec{n})\Gamma)

where the underlying nilmanifold is $G/\Gamma$ and the underlying Mal’cev basis is $\mathcal{X}$ . When going from item to item, we may reuse variables (e.g., $G^{\prime}$ will be defined in multiple different manners throughout the proof). Additionally, the following analysis implicitly uses that $|\vec{s}|\geq 1$ ; in the remaining case $\vec{s}=0$ all nilsequences become fixed constants and the result is obvious.

For the first item, note that coordinates of $\chi\otimes\overline{\chi}$ are multidegree $(s_{1},\ldots,s_{k})$ polynomial sequences with respect to the group $G^{\prime}=\{(g,g)\colon g\in G\}$ given the filtration

G^{\prime}_{\vec{i}}=\{(g,g)\colon g\in G_{\vec{i}}\}.

As all coordinates of $\chi$ have the same vertical frequency, we have that the coordinates of $\chi\otimes\overline{\chi}$ are invariant under $G^{\prime}_{(s_{1},\ldots,s_{k})}$ . This immediately gives the desired result upon taking a quotient and using Lemma 3.10.

For the second item, note that $G^{+\vec{e}_{j}}=(G_{\vec{i}+\vec{e}_{j}})_{\vec{i}\in I}$ is a shifted filtration. Note that this is an $I$ -filtration with respect to the multidegree ordering. We define the group

G^{\prime}=\bigvee_{\ell=1}^{k}(G_{\vec{e}_{\ell}}\times\mathrm{Id}_{G})\vee\{(g,g)\colon g\in G\},

let $\Gamma^{\prime}=G^{\prime}\cap(\Gamma\times\Gamma)$ , and define the following $I$ -filtration with respect to the multidegree ordering:

G^{\prime}_{\vec{i}}=\bigvee_{\ell=1}^{k}(G_{\vec{i}+\vec{e}_{\ell}}\times\mathrm{Id}_{G})\vee\{(g,g)\colon g\in G_{\vec{i}}\}.

By using Lemma 2.2 we may see that this is a valid filtration. We define the cocompact groups similarly. Now the proof of [34, Lemma E.8] shows that

(g(\vec{n}+h^{\ast}\vec{e}_{j}),g(\vec{n}))

is a polynomial sequence with respect to this filtration and that

\widetilde{F}((x,y)(\Gamma\times\Gamma))=F(x\Gamma)\otimes\overline{F(y\Gamma)}

is invariant under the action of $G_{\vec{s}}^{\prime}=\{(g,g)\colon g\in G_{(s_{1},\ldots,s_{k})}\}$ .

We first construct a Mal’cev basis $\mathcal{X}^{\prime}$ on $G^{\prime}$ . Define $G_{t}^{\ast}=\bigvee_{|\vec{i}|=t}G_{\vec{i}+\vec{e}_{j}}$ and note that $G^{\ast}=G_{0}^{\ast}$ has a degree filtration

G_{0}^{\ast}=G_{0}^{\ast}\geqslant G_{1}^{\ast}\geqslant G_{2}^{\ast}\geqslant\cdots

and all these subgroups are $M^{O_{|\vec{s}|}(1)}$ -rational with respect to $\mathcal{X}$ . Therefore $G^{\ast}$ has a Mal’cev basis $\mathcal{X}^{\ast}$ which is adapted to this filtration and all elements are height at most $M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})}$ combinations of elements in $\mathcal{X}$ by [42, Lemma B.11]. Note that

\{(X,X)\colon X\in\mathcal{X}\}\cup\{(X^{\ast},0)\colon X^{\ast}\in\mathcal{X}^{\ast}\}

is easily shown to be a weak basis of rationality $M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})}$ for $G^{\prime}/\Gamma^{\prime}$ and has the degree $O_{|\vec{s}|}(1)$ nesting property. Letting $G_{t}^{\prime}=\bigvee_{|\vec{i}|=t}G_{\vec{i}}^{\prime}$ we see that

G^{\prime}=G^{\prime}\geqslant G_{1}^{\prime}\geqslant G_{2}^{\prime}\geqslant\cdots

form a sequence of subgroups such that $[G^{\prime},G_{i}^{\prime}]\leqslant G_{i+1}^{\prime}$ for $i\geq 0$ (with $G_{0}^{\prime}=G^{\prime}$ ). Thus by [42, Lemma B.11] we can find a Mal’cev basis $\mathcal{X}^{\prime}$ adapted to this sequence such that each element is a height $M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})}$ linear combination of

\{(X,X)\colon X\in\mathcal{X}\}\cup\{(X^{\ast},0)\colon X^{\ast}\in\mathcal{X}^{\ast}\}.

At present, however, we see that $G^{\prime}$ has not been given a multidegree filtration (only an $I$ -filtration with respect to the multidegree ordering; recall Definition 2.4). We replace $G^{\prime}$ by

\widetilde{G}=\bigvee_{\ell=1}^{k}G^{\prime}_{\vec{e}_{\ell}}=G_{1}^{\prime}

and note that $\widetilde{G}$ is appropriately rational with respect to $\mathcal{X}^{\prime}$ and $G^{\prime}_{\vec{i}}\leqslant\widetilde{G}$ for $\vec{i}\neq 0$ . Note that $\widetilde{G}$ is easily seen to have a multidegree $(s_{1},\ldots,s_{k})$ filtration. Furthermore, removing the initial $\dim(G^{\prime})-\dim(\widetilde{G})$ elements, we see that the truncation of $\mathcal{X}^{\prime}$ is a valid Mal’cev basis for $\widetilde{G}$ of complexity $M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})}$ and all subgroups in the multidegree filtration are $M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})}$ -rational.

We now write $(g(h^{\ast}\vec{e}_{j}),g(0))=\{(g(h^{\ast}\vec{e}_{j}),g(0))\}[(g(h^{\ast}\vec{e}_{j}),g(0))]$ where

\lVert\psi_{\mathcal{X}^{\prime}}(\{(g(h^{\ast}\vec{e}_{j}),g(0))\})\rVert_{\infty}\leq M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})}

and $[(g(h^{\ast}\vec{e}_{j}),g(0))]\in\Gamma^{\prime}$ . We consider the modified polynomial sequence

g^{\prime}(\vec{n})=\{(g(h^{\ast}\vec{e}_{j}),g(0))\}^{-1}(g(\vec{n}+h^{\ast}\vec{e}_{j}),g(\vec{n}))[(g(h^{\ast}\vec{e}_{j}),g(0))]^{-1};

evaluating at $\vec{n}=0$ this is now seen to be a polynomial sequence in $\widetilde{G}$ . Defining

F^{\prime}((x,y)(\Gamma\times\Gamma))=\widetilde{F}(\{(g(h^{\ast}\vec{e}_{j}),g(0))\}(x,y)(\Gamma\times\Gamma)),

we have that $F^{\prime}$ is invariant under $(\widetilde{G})_{(s_{1},\ldots,s_{k})}$ and

F^{\prime}(g^{\prime}(\vec{n}))=F(g(\vec{n}+h^{\ast}\vec{e}_{j}))\overline{F}(g(\vec{n})).

We may pass to the quotient group $\widetilde{G}/\widetilde{G}_{(s_{1},\ldots,s_{k})}$ and the desired result is essentially an immediate consequence of Lemma 3.10.

We now come to the third item; we only maintain the notation from the first sentence of the proof. Note that via writing $g(0)=\{g(0)\}[g(0)]$ with $[g(0)]\in\Gamma$ and $\psi_{\mathcal{X}}(\{g(0)\})\in[0,1)^{\dim(G)}$ , replacing $g(\vec{n})$ by $\{g(0)\}^{-1}g(\vec{n})[g(0)]^{-1}$ and $F$ by $F(\{g(0)\}\cdot)$ , up to replacing $M$ by $M^{O_{|\vec{s}|}(1)}$ we may assume that $g(0)=\mathrm{id}_{G}$ .

Define

G^{\prime}_{\vec{i}}=\bigvee_{\vec{j}>\vec{i}}(G_{\vec{j}}\times G_{\vec{j}})\vee\bigvee\{(g^{q^{|\vec{i}|}},g)\colon g\in G_{\vec{i}}\};

here $\vec{j}>\vec{i}$ means $\vec{j}$ is coordinate-wise at least as large as $\vec{i}$ and not identical. Furthermore let $\Gamma^{\prime}=G^{\prime}\cap(\Gamma\times\Gamma)$ ; note that $G^{\prime}$ is isomorphic to $G\times G$ , but we have given the group an alternate filtration. This is verified to be an $I$ -filtration with respect to the multidegree ordering in [34, p. 1356]. Furthermore the proof of [34, Lemma E.8] shows that

(g(q\vec{n}),g(\vec{n}))

is a polynomial sequence with respect to this filtration and that

\widetilde{F}((x,y)(\Gamma\times\Gamma))=F(x\Gamma)\otimes\overline{F(y\Gamma)}^{\otimes q^{|\vec{s}|}}

is invariant under the action of $G^{\prime}_{\vec{s}}=\{(g^{q^{|\vec{s}|}},g)\colon g\in G_{(s_{1},\ldots,s_{k})}\}$ . The primary technical issue, as before, is that while this is an $I$ -filtration with respect to the multidegree ordering, it is not a multidegree filtration (Definition 2.4).

We first give $G^{\prime}$ a Mal’cev basis. Note that $\mathcal{X}$ is adapted to the degree filtration on $G$ given by $G_{t}=\bigvee_{|\vec{i}|=t}G_{\vec{i}}$ (Definition 3.8). It is immediate to see that

\mathcal{X}^{\ast}=\{(X,0)\colon X\in\mathcal{X}\cap G_{1}\}\cup\{(0,X)\colon X\in\mathcal{X}\cap G_{1}\}

is a Mal’cev basis for the product filtration on $G$ . Then using [42, Lemma B.11] on

G_{0}^{\prime}=G_{0}^{\prime}\geqslant G_{1}^{\prime}\geqslant\cdots

where $G_{t}^{\prime}=\bigvee_{|\vec{i}|=t}G_{\vec{i}}^{\prime}$ , which is seen to satisfy $[G_{0}^{\prime},G_{t}^{\prime}]\leqslant G_{t+1}^{\prime}$ for $t\geq 0$ , we easily construct a Mal’cev basis $\mathcal{X}^{\prime}$ for $G^{\prime}/\Gamma^{\prime}$ coming from combinations of $\mathcal{X}^{\ast}$ . (We implicitly use that $G=\bigvee_{j}G_{\vec{e}_{j}}$ .) As $G$ has complexity $M$ , it is trivial to see that all subgroups in the filtration of $G^{\prime}$ are $M^{O_{|\vec{s}|,q}(d^{O_{|\vec{s}|,q}(1)})}$ -rational. The Mal’cev basis $\mathcal{X}^{\prime}$ clearly has the nesting property of order $|\vec{s}|$ since $\mathcal{X}$ does.

We define $\widetilde{G}$ as

\widetilde{G}=\bigvee_{\ell=1}^{k}G^{\prime}_{\vec{e}_{\ell}}

and this group is seen to be appropriately rational with respect to $\mathcal{X}^{\prime}$ and is given the multidegree filtration $\widetilde{G}_{\vec{i}}=G^{\prime}_{\vec{i}}$ for $\vec{i}\neq 0$ . Noting that the constant term of the Taylor expansion of $(g(q\vec{n}),g(\vec{n}))$ is $(\mathrm{id}_{G},\mathrm{id}_{G})$ , we have that this is in fact a polynomial sequence with respect to the multidegree filtration given to $\widetilde{G}$ . (This is where we use that we reduced to $g(0)=\mathrm{id}_{G}$ .) Furthermore, letting $\widetilde{G}_{t}=\bigvee_{|\vec{i}|=t}\widetilde{G}_{\vec{i}}$ , we see that a truncation of $\mathcal{X}^{\prime}$ is an adapted Mal’cev basis to $\widetilde{G}_{0}=\widetilde{G}_{1}\geqslant\widetilde{G}_{2}\geqslant\cdots$ where each element is an $M^{O_{|\vec{s}|,q}(d^{O_{|\vec{s}|,q}(1)})}$ -rational combination of $\mathcal{X}^{\ast}$ . As $\widetilde{F}$ is invariant under $\widetilde{G}_{(s_{1},\ldots,s_{k})}$ , by passing to the quotient $\widetilde{G}/\widetilde{G}_{(s_{1},\ldots,s_{k})}$ and applying Lemma 3.10 we immediately finish the proof.

We finally deduce the fourth item from the third item. Let $g^{\prime}(n)=g(n/q)$ ; note that $g$ may be extended to take on rational input via using Mal’cev coordinates and we may treat $g^{\prime}$ as a valid polynomial sequence. By applying the third item, we have that

F(g(n))\text{ and }F(g^{\prime}(n))^{\otimes q^{|\vec{s}|}}

are $(M^{O_{|\vec{s}|,q}(d^{O_{|\vec{s}|,q}(1)})},M^{O_{|\vec{s}|,q}(d^{O_{|\vec{s}|,q}(1)})},d^{O_{q}(1)})$ -equivalent for multidegree $<(s_{1},\ldots,s_{k})$ . Outputting $F(g^{\prime}(n))^{\otimes q^{|\vec{s}|-1}}$ then gives the desired result. ∎

The next lemma is a quantified version of [34, Lemma 13.2]. The proof is once again essentially identical modulo noting slight changes in the filtration notions.

Lemma C.4.

Consider $\chi\colon\mathbb{Z}^{k}\to\mathbb{C}$ which is a multidegree $(1,\ldots,1)$ nilcharacter of complexity $(M,d)$ . Then

\chi(h_{1}+h_{1}^{\prime},h_{2},\ldots,h_{k})\emph{ and }\chi(h_{1},h_{2},\ldots,h_{k})\otimes\chi(h_{1}^{\prime},h_{2},\ldots,h_{k})

are $(M^{O_{k}(d^{O_{k}(1)})},d^{O_{k}(1)})$ -equivalent for degree $(k-1)$ .
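In the abelian model (an assumption made only for illustration) with $k=2$ and $\chi(h_{1},h_{2})=e(\alpha h_{1}h_{2})$, where $e(x)=e^{2\pi ix}$ and $\alpha\in\mathbb{R}$ is hypothetical, the equivalence is an exact identity, since the phase is linear in $h_{1}$:

```latex
\chi(h_{1}+h_{1}^{\prime},h_{2})\,\overline{\chi(h_{1},h_{2})}\,\overline{\chi(h_{1}^{\prime},h_{2})}
=e\big(\alpha(h_{1}+h_{1}^{\prime})h_{2}-\alpha h_{1}h_{2}-\alpha h_{1}^{\prime}h_{2}\big)=1.
```

For nonabelian $G$ the corresponding product need not be constant, and the lemma asserts only the lower-degree equivalence with the stated complexity.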

Proof.

Let $\chi(h_{1},\ldots,h_{k})=F(g(h_{1},\ldots,h_{k})\Gamma)$ where the underlying nilmanifold is $G/\Gamma$ and the underlying Mal’cev basis is $\mathcal{X}$ . Let $g(0,\ldots,0)=\{g(0,\ldots,0)\}[g(0,\ldots,0)]$ with $[g(0,\ldots,0)]\in\Gamma$ and $\psi_{G,\mathcal{X}}(\{g(0,\ldots,0)\})\in[0,1)^{\dim(G)}$ . We have that

F(g(h_{1},\ldots,h_{k})\Gamma)=F(\{g(0,\ldots,0)\}\cdot(\{g(0,\ldots,0)\}^{-1}g(h_{1},\ldots,h_{k})[g(0,\ldots,0)]^{-1}\Gamma)).

We let $F^{\prime}(\cdot\Gamma)=F(\{g(0,\ldots,0)\}\cdot\Gamma)$ and $g^{\prime}(h_{1},\ldots,h_{k})=\{g(0,\ldots,0)\}^{-1}g(h_{1},\ldots,h_{k})[g(0,\ldots,0)]^{-1}$ . Thus we may replace $F$ by $F^{\prime}$ and $g$ by $g^{\prime}$ (at the cost of replacing $M$ by $M^{O_{k}(d^{O_{k}(1)})}$ ) and assume that $g(0)=\mathrm{id}_{G}$ .

Consider, for $t\geq 1$ ,

G^{\prime}_{t}=\bigvee_{|\vec{i}|>t}(G_{\vec{i}}\times G_{\vec{i}}\times G_{\vec{i}})\vee\bigvee_{|\vec{i}|=t}\{(g_{1}g_{2},g_{1},g_{2}):g_{i}\in G_{\vec{i}}\}

and take $G^{\prime}=G^{\prime}_{1}$ . Via Baker–Campbell–Hausdorff this gives a valid degree $k$ filtration

G^{\prime}=G^{\prime}_{1}\geqslant G^{\prime}_{2}\geqslant\cdots\geqslant G^{\prime}_{k}\geqslant\mathrm{Id}_{G^{\prime}}.

We define $\Gamma^{\prime}=G^{\prime}\cap(\Gamma\times\Gamma\times\Gamma)$ . We now verify that

(g(h_{1}+h_{1}^{\prime},h_{2},\ldots,h_{k}),g(h_{1},h_{2},\ldots,h_{k}),g(h_{1}^{\prime},h_{2},\ldots,h_{k}))

is a polynomial sequence with respect to this degree filtration. This is immediate upon noting that by Taylor expansion [34, Lemma B.9] and the condition at $0$ , we have

g(h_{1},\ldots,h_{k})=\prod_{(i_{1},\ldots,i_{k})\in\{0,1\}^{k}\setminus\{\vec{0}\}}g_{i_{1},\ldots,i_{k}}^{\binom{h_{1}}{i_{1}}\cdots\binom{h_{k}}{i_{k}}}

with $g_{i_{1},\ldots,i_{k}}\in G_{(i_{1},\ldots,i_{k})}$ . The desired polynomiality of the tripled sequence then follows easily from Baker–Campbell–Hausdorff and the fact that the degree of the exponents in the Taylor expansion for the $h_{1}$ term is at most $1$ .

We now construct a Mal’cev basis for $G^{\prime}$ . Let $G_{t}=\bigvee_{|\vec{i}|=t}G_{\vec{i}}$ and note that by definition $\mathcal{X}$ is adapted to $G_{t}$ . We may prove that

\begin{aligned}\mathcal{X}^{\prime}&=\{(X,0,0)\colon X\in\mathcal{X}\cap\log(G_{2})\}\cup\{(0,X,0)\colon X\in\mathcal{X}\cap\log(G_{2})\}\cup\{(0,0,X)\colon X\in\mathcal{X}\cap\log(G_{2})\}\\ &\quad\cup\{(X,X,0)\colon X\in(\mathcal{X}\cap\log(G_{1}))\setminus(\mathcal{X}\cap\log(G_{2}))\}\cup\{(X,0,X)\colon X\in(\mathcal{X}\cap\log(G_{1}))\setminus(\mathcal{X}\cap\log(G_{2}))\}\end{aligned}

is a weak basis for $G^{\prime}$ . Furthermore this basis is easily seen to have the nesting property of order $k$ , and all subgroups $G_{t}^{\prime}$ are easily seen to be $M^{O_{k}(d^{O_{k}(1)})}$ -rational. Thus applying [42, Lemma B.11] we may find a Mal’cev basis $\widetilde{\mathcal{X}}$ for $G^{\prime}$ adapted to the given filtration of complexity $M^{O_{k}(d^{O_{k}(1)})}$ and such that all basis elements are $M^{O_{k}(d^{O_{k}(1)})}$ -rational combinations of $\mathcal{X}^{\prime}$ .

The function $\widetilde{F}$ we will consider is

\widetilde{F}((x,y,z)\Gamma^{\otimes 3})=F(x\Gamma)\otimes\overline{F(y\Gamma)}\otimes\overline{F(z\Gamma)}.

This is easily seen to be Lipschitz on $G\times G\times G$ when given the Mal’cev basis $\mathcal{X}^{\ast}=\{(X,0,0)\colon X\in\mathcal{X}\}\cup\{(0,X,0)\colon X\in\mathcal{X}\}\cup\{(0,0,X)\colon X\in\mathcal{X}\}$ . As $\widetilde{\mathcal{X}}$ has basis elements which are height $M^{O_{k}(d^{O_{k}(1)})}$ rational combinations of $\mathcal{X}^{\ast}$ , we find that $\widetilde{F}$ is $M^{O_{k}(d^{O_{k}(1)})}$ -Lipschitz on $G^{\prime}/\Gamma^{\prime}$ .

Note that $\widetilde{F}$ is invariant under the group $G_{k}^{3}$ and therefore taking the output quotient group $G^{3}/G_{k}^{3}$ with lattice $\Gamma^{3}/(\Gamma^{3}\cap G_{k}^{3})$ and using Lemma 3.10 completes the proof. ∎

We now come to the most technical of the complexity justifications we will need to perform, multilinearization. We will give a rather barebones analysis (citing much from [34, Proposition E.9, E.10]); the reader may find the discussion in [34, pp. 1360-1363] where an extended example is discussed useful. (We prove a slightly weaker statement which is all that is used in the analysis to ease checking extra complexity details.)

Lemma C.5.

Consider a nilcharacter $\chi(h_{1},\ldots,h_{k})$ of multidegree $(s_{1},\ldots,s_{k})$ and complexity $(M,d)$ . There exists a multidegree $(1,\ldots,1)$ nilcharacter

\chi^{\prime}(h_{1,1},\ldots,h_{1,s_{1}},h_{2,1},\ldots,h_{2,s_{2}},\ldots,h_{k,1},\ldots,h_{k,s_{k}})

of complexity $(M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})},d^{O_{|\vec{s}|}(1)})$ such that

\chi(h_{1},\ldots,h_{k})\emph{ and }\chi^{\prime}(h_{1},\ldots,h_{1},h_{2},\ldots,h_{2},\ldots,h_{k},\ldots,h_{k})

are $(M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})},M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})},d^{O_{|\vec{s}|}(1)})$ -equivalent for degree $|\vec{s}|-1$ and furthermore, for each $1\leq i\leq k$ , $\chi^{\prime}$ is symmetric in the variables $h_{i,1},\ldots,h_{i,s_{i}}$ .

Remark.

We will only require the above lemma for multidegree $(1,s-1)$ nilsequences.
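A minimal sketch of multilinearization, assuming the abelian one-variable case $k=1$, $s_{1}=2$ with $\chi(h)=e(\alpha h^{2})$, where $e(x)=e^{2\pi ix}$ and $\alpha\in\mathbb{R}$ is hypothetical: one may take the symmetric bilinear phase

```latex
\chi^{\prime}(h_{1,1},h_{1,2})=e(\alpha h_{1,1}h_{1,2}),\qquad
\chi^{\prime}(h,h)=e(\alpha h^{2})=\chi(h),\qquad
\chi^{\prime}(h_{1,1},h_{1,2})=\chi^{\prime}(h_{1,2},h_{1,1}).
```

Here $\chi^{\prime}$ has multidegree $(1,1)$ and the equivalence is an equality. In this toy case the root $\chi^{\ast}(h)=e(\alpha h^{2}/2)$ satisfies $(\chi^{\ast})^{2!}=\chi$, and the factor $s_{1}!=2$ appearing in the proof is what restores the coefficient $\alpha$ in the bilinear phase.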

Proof.

By Lemma C.3, there exists $\chi^{\ast}$ such that

\chi(h_{1},\ldots,h_{k})\text{ and }\chi^{\ast}(h_{1},\ldots,h_{k})^{\otimes\prod_{i=1}^{k}s_{i}!}

are $(M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})},M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})},d^{O_{|\vec{s}|}(1)})$ -equivalent for degree $|\vec{s}|-1$ . Therefore by Lemma 7.4, it suffices to produce $\chi^{\prime}$ such that

\chi^{\ast}(h_{1},\ldots,h_{k})^{\otimes\prod_{i=1}^{k}s_{i}!}\text{ and }\chi^{\prime}(h_{1},\ldots,h_{1},h_{2},\ldots,h_{2},\ldots,h_{k},\ldots,h_{k})

are $(M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})},M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})},d^{O_{|\vec{s}|}(1)})$ -equivalent for degree $|\vec{s}|-1$ .

Let

\chi^{\ast}(h_{1},\ldots,h_{k})=F(g(h_{1},\ldots,h_{k})\Gamma)

with the underlying nilmanifold being $G/\Gamma$ and the associated Mal’cev basis being $\mathcal{X}$ . Via a standard manipulation which has been performed several times already, we may assume that $g(0)=\mathrm{id}_{G}$ (at the cost of an insignificant change in parameters). Furthermore assume that $\eta$ is the vertical character, so

F(g_{(s_{1},\ldots,s_{k})}x\Gamma)=e(\eta(g_{(s_{1},\ldots,s_{k})}))\cdot F(x\Gamma).

Given $J\subseteq[|\vec{s}|]$ , we denote

\|J\|:=(|J\cap\{s_{1}+\cdots+s_{i-1}+1,\ldots,s_{1}+\cdots+s_{i-1}+s_{i}\}|)_{1\leq i\leq k}.

The group $\widetilde{G}$ we will ultimately use to construct our nilsequence will be given by constructing the associated nilpotent Lie algebra. We take

\log(\widetilde{G})=\bigoplus_{\emptyset\neq J\subseteq[|\vec{s}|]}\log(G_{\|J\|})

and for each $\emptyset\neq J\subseteq[|\vec{s}|]$ let $\iota_{J}\colon\log(G_{\|J\|})\hookrightarrow\log(\widetilde{G})$ denote the embedding into the direct sum. We endow $\widetilde{G}$ with a Lie bracket such that if $J\cap K\neq\emptyset$ then

[\iota_{J}(x_{J}),\iota_{K}(y_{K})]=0

and if $J\cap K=\emptyset$ then

[\iota_{J}(x_{J}),\iota_{K}(y_{K})]=\iota_{J\cup K}([x_{J},y_{K}]),

where the bracket between $x_{J},y_{K}$ is taken in the ambient space $\log(G)$ and is seen to lie in $\log(G_{\|J\cup K\|})$ by the commutator property of the original filtration on $G$ .

To verify that this gives a valid Lie algebra it suffices to verify that this operation is antisymmetric and satisfies the Jacobi identity. Furthermore, by bilinearity it suffices to verify these relations on the generators. For antisymmetry on $\iota_{J}(x_{J}),\iota_{K}(y_{K})$ , if $J\cap K\neq\emptyset$ it is trivial, and otherwise

[\iota_{J}(x_{J}),\iota_{K}(y_{K})]=\iota_{J\cup K}([x_{J},y_{K}])=-\iota_{J\cup K}([y_{K},x_{J}])=-[\iota_{K}(y_{K}),\iota_{J}(x_{J})]

as desired. For the Jacobi identity, when checked on generators $\iota_{J}(x_{J}),\iota_{K}(y_{K}),\iota_{L}(z_{L})$ , if $(J\cap K)\cup(K\cap L)\cup(L\cap J)\neq\emptyset$ the result is trivial. Otherwise we have

\begin{aligned}&[\iota_{J}(x_{J}),[\iota_{K}(y_{K}),\iota_{L}(z_{L})]]+[\iota_{K}(y_{K}),[\iota_{L}(z_{L}),\iota_{J}(x_{J})]]+[\iota_{L}(z_{L}),[\iota_{J}(x_{J}),\iota_{K}(y_{K})]]\\ &\quad=\iota_{J\cup K\cup L}([x_{J},[y_{K},z_{L}]]+[y_{K},[z_{L},x_{J}]]+[z_{L},[x_{J},y_{K}]])=0\end{aligned}

as desired.

The associated $I$ -filtration with respect to the multidegree ordering is given as follows. For any $(a_{1},\ldots,a_{|\vec{s}|})\in\mathbb{N}^{|\vec{s}|}$ , let $\log(\widetilde{G}_{(a_{1},\ldots,a_{|\vec{s}|})})$ be the Lie subalgebra of $\log(\widetilde{G})$ generated by $\iota_{J}(x_{J})$ for which $1_{J}(j)\geq a_{j}$ for each $j=1,\ldots,|\vec{s}|$ , and $x_{J}\in\log(G_{\|J\|})$ . It follows that this is an $I$ -filtration with respect to the multidegree ordering because for vectors $a,b\in\{0,1\}^{|\vec{s}|}$ , if $1_{J}(j)\geq a_{j}$ and $1_{K}(j)\geq b_{j}$ we either have $J\cap K\neq\emptyset$ in which case the commutator is trivial or $1_{K\cup J}(j)\geq a_{j}+b_{j}$ in which case the result also follows easily. Noting that by construction $\widetilde{G}=\bigvee_{\ell=1}^{|\vec{s}|}\widetilde{G}_{\vec{e}_{\ell}}$ , the above immediately implies that we have a multidegree $(1,\ldots,1)$ filtration on $\widetilde{G}$ .

We now construct a weak basis for $\widetilde{G}$ . Recall we have a Mal’cev basis $\mathcal{X}$ for $G$ . Given $\|J\|$ , we define the filtration

G_{\|J\|}^{t}=\bigvee_{|\vec{i}|=t}G_{\|J\|+\vec{i}}.

Note that $G_{\|J\|}=G_{\|J\|}^{0}$ and that $G_{\|J\|}^{0}=G_{\|J\|}^{0}\geqslant G_{\|J\|}^{1}\geqslant G_{\|J\|}^{2}\geqslant\cdots$ is a valid degree filtration when $\|J\|\neq\vec{0}$ . Thus by [34, Lemma B.11], we may find a Mal’cev basis $\mathcal{X}^{\|J\|}$ for each $G_{\|J\|}$ which is an $M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})}$ -rational combination of $\mathcal{X}$ .

Define

\widetilde{\mathcal{X}}=\bigcup_{\emptyset\neq J\subseteq[|\vec{s}|]}\iota_{J}(\mathcal{X}^{\|J\|}).

Furthermore define $\widetilde{\Gamma}$ to be the group generated by $\exp(L!\cdot\iota_{J}(\Gamma\cap G_{\|J\|}))$ where $L$ is a sufficiently large constant depending only on $|\vec{s}|$ (and in particular not on $M$ or $d$ ). Direct computation with Baker–Campbell–Hausdorff implies that $\widetilde{\Gamma}\cap\widetilde{G}_{(1,\ldots,1)}$ is contained in $\iota_{[|\vec{s}|]}(\Gamma\cap G_{(s_{1},\ldots,s_{k})})$ . Furthermore we see that $\widetilde{G}/\widetilde{\Gamma}$ is compact, $\widetilde{\mathcal{X}}$ is a weak basis of rationality $M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})}$ for $\widetilde{G}$ , and $\widetilde{\mathcal{X}}$ has the degree $O_{|\vec{s}|}(1)$ nesting property. As all groups within the multidegree filtration are $M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})}$ -rational with respect to $\widetilde{\mathcal{X}}$ , by applying [42, Lemma B.11] we may construct a basis with respect to the canonical associated degree filtration of $\widetilde{G}$ which certifies that $\widetilde{G}$ with the given multidegree filtration has complexity bounded by $M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})}$ . Furthermore the adapted Mal’cev basis $\widetilde{\mathcal{X}}^{\ast}$ is an $M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})}$ -rational combination of $\widetilde{\mathcal{X}}$ (lifted to $\log(\widetilde{G})$ appropriately).

We define the $\widetilde{G}_{(1,\ldots,1)}$ -vertical frequency as

\widetilde{\eta}(\exp(\iota_{(1,\ldots,1)}(\log(g_{(s_{1},\ldots,s_{k})})))):=\eta(g_{(s_{1},\ldots,s_{k})})

and it is trivial to use the construction of $\widetilde{\mathcal{X}}^{\ast}$ to certify that $\widetilde{\eta}$ has height bounded by $M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})}$ . We take $\widetilde{F}$ to be a nilcharacter with frequency $\widetilde{\eta}$ produced by Lemma B.4 (which is applied to the canonical degree filtration of $\widetilde{G}$ ) and this construction gives output dimension $M^{O_{|\vec{s}|}(d)}$ and Lipschitz constant $M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})}$ with respect to $\widetilde{\mathcal{X}}^{\ast}$ .

We now define $\widetilde{g}$ . Note that

g(h_{1},\ldots,h_{k})=\prod_{\vec{0}\neq(i_{1},\ldots,i_{k})\leq(s_{1},\ldots,s_{k})}(g_{(i_{1},\ldots,i_{k})}^{\mathrm{Tay}})^{h_{1}^{i_{1}}\cdots h_{k}^{i_{k}}}

via [34, Lemma B.9] and the condition at $0$ to rule out the need for a coefficient with $(i_{1},\ldots,i_{k})=\vec{0}$ . (We are using monomials instead of binomials, which is a minor but easy alteration.) The product here is taken in increasing lexicographic order. We define

\widetilde{g}(h_{1},\ldots,h_{|\vec{s}|}):=\prod_{\vec{0}\neq(i_{1},\ldots,i_{k})\leq(s_{1},\ldots,s_{k})}\exp\bigg{(}i_{1}!\cdots i_{k}!\sum_{\begin{subarray}{c}J\subseteq\{1,\ldots,|\vec{s}|\}\\ \|J\|=(i_{1},\ldots,i_{k})\end{subarray}}\big{(}\prod_{i\in J}h_{i}\big{)}\cdot\iota_{J}(\log(g_{(i_{1},\ldots,i_{k})}^{\mathrm{Tay}}))\bigg{)}.

Let $G^{\ast}$ denote the subgroup of $G\times\widetilde{G}$ generated by

G^{\ast}:=\{(g_{(s_{1},\ldots,s_{k})},\exp(s_{1}!\cdots s_{k}!\iota_{(1,\ldots,1)}(\log(g_{(s_{1},\ldots,s_{k})}))))\colon g_{(s_{1},\ldots,s_{k})}\in G_{(s_{1},\ldots,s_{k})}\}.

Note that the function

(g,\widetilde{g})\mapsto F(g\Gamma)^{\otimes s_{1}!\cdots s_{k}!}\otimes\overline{\widetilde{F}(\widetilde{g}\widetilde{\Gamma})}

is invariant under the action of $G^{\ast}$ .

We will construct $G^{\prime}$ which is a subgroup of $G\times\widetilde{G}$ with a degree $|\vec{s}|$ filtration such that the final group is $G^{\ast}$ and such that

(g(h_{1},\ldots,h_{k}),\widetilde{g}(h_{1},\ldots,h_{1},h_{2},\ldots,h_{2},\ldots,h_{k},\ldots,h_{k}))

is a polynomial sequence with respect to this filtration. Let $G_{j}^{\prime}$ (for $j\geq 1$ ) be generated by elements of the form

(C.1)

\bigg{(}g_{(i_{1},\ldots,i_{k})},\exp\bigg{(}i_{1}!\cdots i_{k}!\sum_{\begin{subarray}{c}J\subseteq\{1,\ldots,|\vec{s}|\}\\ \|J\|=(i_{1},\ldots,i_{k})\end{subarray}}\iota_{J}(\log(g_{(i_{1},\ldots,i_{k})}))\bigg{)}\bigg{)}

where $|\vec{i}|=j$ , as well as

\big{(}g_{(i_{1},\ldots,i_{k})},\mathrm{id}_{\widetilde{G}}\big{)},\text{ and }\big{(}\mathrm{id}_{G},\widetilde{G}_{J}\big{)}

where $|\vec{i}|\geq j+1$ in the first case, and $|J|\geq j+1$ in the second case. Furthermore set $G^{\prime}=G_{0}^{\prime}:=G_{1}^{\prime}$ ; we trivially see that $G_{|\vec{s}|}^{\prime}=G^{\ast}$ . That this is a filtration follows from liberal application of Baker–Campbell–Hausdorff; we use crucially that the number of ways to break a set of size $(i+j)$ into two labeled sets of size $i$ and $j$ which are disjoint is $(i+j)!/(i!\cdot j!)$ , which modifies the factorial prefactors in (C.1) appropriately.
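The role of the factorial prefactors can be sketched at the level of the leading Lie bracket (ignoring the higher-order Baker–Campbell–Hausdorff terms, which is a simplification made only for this illustration). Write $\vec{i}!=i_{1}!\cdots i_{k}!$ as shorthand introduced here, and take $x\in\log(G_{\vec{i}})$ and $y\in\log(G_{\vec{j}})$. Using the bracket on $\log(\widetilde{G})$ and counting the ordered disjoint pairs $(J_{1},J_{2})$ with a given union $J$, one finds:

```latex
\begin{aligned}
\Big[\vec{i}!\sum_{\|J_{1}\|=\vec{i}}\iota_{J_{1}}(x),\ \vec{j}!\sum_{\|J_{2}\|=\vec{j}}\iota_{J_{2}}(y)\Big]
&=\vec{i}!\,\vec{j}!\sum_{\begin{subarray}{c}\|J_{1}\|=\vec{i},\ \|J_{2}\|=\vec{j}\\ J_{1}\cap J_{2}=\emptyset\end{subarray}}\iota_{J_{1}\cup J_{2}}([x,y])\\
&=\vec{i}!\,\vec{j}!\prod_{\ell=1}^{k}\frac{(i_{\ell}+j_{\ell})!}{i_{\ell}!\,j_{\ell}!}\sum_{\|J\|=\vec{i}+\vec{j}}\iota_{J}([x,y])\\
&=(\vec{i}+\vec{j})!\sum_{\|J\|=\vec{i}+\vec{j}}\iota_{J}([x,y]).
\end{aligned}
```

The result again has the shape of the second coordinate in (C.1), now at level $\vec{i}+\vec{j}$. For instance, with $k=1$ and $i=j=1$ the set $\{1,2\}$ splits as $(\{1\},\{2\})$ or $(\{2\},\{1\})$, giving the factor $2=2!/(1!\cdot 1!)$.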

Furthermore it is trivial to see that the $G_{j}^{\prime}$ are $M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})}$ -rational with respect to the Mal’cev basis for $G\times\widetilde{G}$ given by

\{(X,0)\colon X\in\mathcal{X}\}\cup\{(0,X)\colon X\in\widetilde{\mathcal{X}}^{\ast}\},

and therefore applying [42, Lemma B.11] we may construct a Mal’cev basis $\mathcal{X}^{\prime}$ of complexity $M^{O_{|\vec{s}|}(d^{O_{|\vec{s}|}(1)})}$ for $G^{\prime}/(G^{\prime}\cap(\Gamma\times\widetilde{\Gamma}))$ . Furthermore $F^{\otimes s_{1}!\cdots s_{k}!}\otimes\overline{\widetilde{F}}$ is appropriately Lipschitz with respect to $\mathcal{X}^{\prime}$ . Finally, since

\bigg{(}g_{(i_{1},\ldots,i_{k})},\exp\big{(}i_{1}!\cdots i_{k}!\sum_{\begin{subarray}{c}J\subseteq\{1,\ldots,|\vec{s}|\}\\ \|J\|=(i_{1},\ldots,i_{k})\end{subarray}}\iota_{J}(\log(g_{(i_{1},\ldots,i_{k})}))\big{)}\bigg{)}

is in $G_{i_{1}+\cdots+i_{k}}^{\prime}$ by definition, we see that

(g(h_{1},\ldots,h_{k}),\widetilde{g}(h_{1},\ldots,h_{1},h_{2},\ldots,h_{2},\ldots,h_{k},\ldots,h_{k}))

is a polynomial sequence with respect to the filtration. Quotienting out by $G^{\ast}=G_{|\vec{s}|}^{\prime}$ (using that $F^{\otimes s_{1}!\cdots s_{k}!}\otimes\overline{\widetilde{F}}$ is invariant under $G^{\ast}$ ) and using Lemma 3.10, we finally complete the proof. ∎

We now reach the final technical lemma of the paper which states that a nilsequence of multidegree $J\cup J^{\prime}$ can be approximated by a sum of products of nilsequences in $J$ and $J^{\prime}$ . This “splitting” lemma is a quantified version of [34, Lemma E.4]; the proof here is ever so slightly different as we are forced to not use the Stone–Weierstrass theorem.

Lemma C.6.

Let $J$ and $J^{\prime}$ be finite downsets in $\mathbb{N}^{k}$ and fix $\varepsilon\in(0,1/2)$ . Suppose that $\beta(h_{1},\ldots,h_{k})$ is a nilsequence of multidegree $J\cup J^{\prime}$ with complexity $(M,d)$ . Then there exists $1\leq L\leq(M/\varepsilon)^{O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})}$ such that

\bigg{\lVert}\beta(h_{1},\ldots,h_{k})-\sum_{j=1}^{L}\beta_{j}(h_{1},\ldots,h_{k})\beta_{j}^{\prime}(h_{1},\ldots,h_{k})\bigg{\rVert}_{L^{\infty}(\mathbb{Z}^{k})}\leq\varepsilon

with the $\beta_{j}$ being nilsequences of multidegree $J$ , the $\beta_{j}^{\prime}$ being nilsequences of multidegree $J^{\prime}$ , and $\beta_{j},\beta_{j}^{\prime}$ having complexity $((M/\varepsilon)^{O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})},d^{O_{J,J^{\prime}}(1)})$ .
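As a toy instance of the splitting (an illustrative assumption, not the general construction), take $k=2$, $J=\{(0,0),(1,0)\}$, $J^{\prime}=\{(0,0),(0,1)\}$, and the abelian nilsequence below with hypothetical frequencies $\alpha,\gamma\in\mathbb{R}$ and $e(x)=e^{2\pi ix}$:

```latex
\beta(h_{1},h_{2})=e(\alpha h_{1}+\gamma h_{2})
=\underbrace{e(\alpha h_{1})}_{\beta_{1}\colon\ \text{multidegree }J}\cdot\underbrace{e(\gamma h_{2})}_{\beta_{1}^{\prime}\colon\ \text{multidegree }J^{\prime}}.
```

Here a single product ($L=1$) reproduces $\beta$ with no error. For a general Lipschitz function $F$ on the torus $(\mathbb{R}/\mathbb{Z})^{2}$ in place of $e(x+y)$, one would expect to need many such products, for example from a suitably truncated Fourier expansion of $F$, which is consistent with $L$ growing as $\varepsilon$ shrinks.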

Proof.

We let

\beta(h_{1},\ldots,h_{k})=F(g(h_{1},\ldots,h_{k})\Gamma)

where the underlying nilmanifold is $G/\Gamma$ . As is standard, we may assume that $g(0,\ldots,0)=\mathrm{id}_{G}$ up to the insignificant change of adjusting $M$ to $M^{O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})}$ . Furthermore let the adapted Mal’cev basis for $G$ be $\mathcal{X}$ .

We have for each $\vec{j}\neq\vec{0}$ that the groups

G_{t}^{\vec{j}}=\bigvee_{|\vec{i}|=t}G_{\vec{j}+\vec{i}}

form a degree filtration $G_{0}^{\vec{j}}\geqslant G_{1}^{\vec{j}}\geqslant G_{2}^{\vec{j}}\geqslant\cdots$ , where the length of the filtration is $O_{J,J^{\prime}}(1)$ . As these subgroups are all $M$ -rational with respect to $\mathcal{X}$ , there exists a Mal’cev basis $\mathcal{X}^{\vec{j}}$ adapted to this filtration of complexity $M^{O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})}$ where each element is an $M^{O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})}$ -rational combination of elements in $\mathcal{X}$ by [42, Lemma B.11].

Using a variant of Lemma 10.2, adapted to multidegree filtrations, we may write

g(h_{1},\ldots,h_{k})=\prod_{\vec{j}\neq\vec{0}}\prod_{X_{\vec{j},i}\in\mathcal{X}^{\vec{j}}}\exp(X_{\vec{j},i})^{\alpha_{\vec{j},i}\prod_{\ell=1}^{k}(h_{\ell}^{j_{\ell}}/j_{\ell}!)}.

The product here is taken over $\vec{j}$ in increasing order of $|\vec{j}|$ and then in lexicographic order, with the $X_{\vec{j},i}$ taken in increasing order of $i$ . The modified proof of such a representation involves iteratively handling terms in increasing order of $|\vec{j}|$ (handling terms with the same value of $|\vec{j}|$ in an arbitrary order); we omit a careful proof.

The first key part of the proof is lifting to the universal nilmanifold. We define the universal nilmanifold $\widetilde{G}$ to be generated by generators $\exp(e_{\vec{j},i})^{t_{\vec{j},i}}$ for $\vec{j}\neq\vec{0}$ , $1\leq i\leq\dim(G_{\vec{j}})$ , and $t_{\vec{j},i}\in\mathbb{R}$ . The only relations these generators satisfy are that any $(r-1)$ -fold commutator for $r\geq 1$ between $\exp(e_{\vec{j_{1}},i_{1}}),\ldots,\exp(e_{\vec{j_{r}},i_{r}})$ vanishes if $\vec{j_{1}}+\cdots+\vec{j_{r}}$ is not in $J\cup J^{\prime}$ . We give $\widetilde{G}$ the structure of a multidegree $J\cup J^{\prime}$ nilmanifold by letting $(\widetilde{G})_{\vec{j}^{\ast}}$ be generated by the set of $(r-1)$ -fold commutators (for any $r\geq 1$ ) of $\exp(e_{\vec{j_{1}},i_{1}}),\ldots,\exp(e_{\vec{j_{r}},i_{r}})$ where $\vec{j_{1}}+\cdots+\vec{j_{r}}\geq\vec{j}^{\ast}$ (here $\geq$ means that each coordinate is at least as large). This is easily proven to be an $I$ -filtration with respect to the multidegree ordering; since we have no generators with $\vec{j}=\vec{0}$ , it is in fact a multidegree filtration. Finally we let $\widetilde{\Gamma}$ be the lattice generated by the elements $\exp(e_{\vec{j},i})$ .

The analysis in Lemma 10.4 can easily be extended to prove that $\widetilde{G}$ has a filtered Mal’cev basis $\widetilde{\mathcal{X}}$ of complexity $M^{O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})}$ where the basis elements are height $M^{O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})}$ linear combinations of $(r-1)$ -fold commutators of $e_{\vec{j_{1}},i_{1}},\ldots,e_{\vec{j_{r}},i_{r}}$ . Furthermore note that the dimension of $\widetilde{G}$ is $d^{O_{J,J^{\prime}}(1)}$ .

We now lift $\beta$ to $\widetilde{G}/\widetilde{\Gamma}$ . Define the homomorphism $\phi\colon\widetilde{G}\to G$ via

\phi(\exp(e_{\vec{j},i}))=\exp(X_{\vec{j},i});

here we are writing $\mathcal{X}^{\vec{j}}=\{X_{\vec{j},1},\ldots,X_{\vec{j},\dim(G_{\vec{j}})}\}$ . That this is a homomorphism follows from noting that all relations in $\widetilde{G}$ are present in $G$ because $G$ has multidegree $J\cup J^{\prime}$ . We next lift the polynomial sequence $g$ to

\widetilde{g}(h_{1},\ldots,h_{k})=\prod_{\vec{j}\neq\vec{0}}\prod_{i=1}^{\dim(G_{\vec{j}})}\exp(e_{\vec{j},i})^{\alpha_{\vec{j},i}\prod_{\ell=1}^{k}(h_{\ell}^{j_{\ell}}/j_{\ell}!)}

and $F$ to $\widetilde{F}$ via

\widetilde{F}(\widetilde{g}\widetilde{\Gamma})=F(\phi(\widetilde{g})\Gamma).

Note that since $\phi(\widetilde{\Gamma})\leqslant\Gamma$ , this is a well-defined function on $\widetilde{G}/\widetilde{\Gamma}$ . Furthermore, noting various properties of $\widetilde{\mathcal{X}}$ and that elements of $\mathcal{X}^{\vec{j}}$ are appropriately bounded and rational linear combinations of $\mathcal{X}$ , we have that $\widetilde{F}$ is $M^{O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})}$ -Lipschitz with respect to the Mal’cev basis specified by $\widetilde{\mathcal{X}}$ . Therefore for the remainder of the proof we operate with the nilsequence

\widetilde{F}(\widetilde{g}(h_{1},\ldots,h_{k})\widetilde{\Gamma}).

For the remainder of the analysis we furthermore assume that there exists $g^{\ast}\in\widetilde{G}$ with $d_{\widetilde{G},\widetilde{\mathcal{X}}}(g^{\ast})\leq M^{O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})}$ such that if $\psi_{\mathrm{exp},\widetilde{G}}(g^{\ast})+(-1/2,1/2]^{\dim(\widetilde{G})}$ is identified with $\widetilde{G}/\widetilde{\Gamma}$ then $\operatorname{supp}(\widetilde{F})$ lies in $\psi_{\mathrm{exp},\widetilde{G}}(g^{\ast})+(-\delta,\delta]^{\dim(\widetilde{G})}$ . We will ultimately take $\delta=M^{-O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})}$ sufficiently small. If we prove the lemma with $\varepsilon^{\prime}=\varepsilon\cdot\delta^{O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})}$ for functions with such restricted support then the result in full generality follows by Lemma B.3.

The universal nilmanifold is used precisely in defining the following two nilmanifolds for the split terms. Let $\widetilde{G}_{>J}$ be the group generated by $\widetilde{G}_{\vec{i}}$ with $\vec{i}\in J^{\prime}\setminus J$ and $\widetilde{G}_{>J^{\prime}}$ be the group generated by $\widetilde{G}_{\vec{i}}$ with $\vec{i}\in J\setminus J^{\prime}$ . It is trivial to see that $\widetilde{G}_{>J},\widetilde{G}_{>J^{\prime}}$ are normal and $M^{O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})}$ -rational with respect to $\widetilde{\mathcal{X}}$ . Let $\widetilde{\mathcal{X}}^{J}$ and $\widetilde{\mathcal{X}}^{J^{\prime}}$ be bases for the Lie algebras $\log(\widetilde{G}_{>J})$ and $\log(\widetilde{G}_{>J^{\prime}})$ which are $M^{O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})}$ -rational bounded combinations of $\widetilde{\mathcal{X}}$ .

We consider the nilmanifolds $\widetilde{G}/(\widetilde{G}_{>J}\widetilde{\Gamma})$ and $\widetilde{G}/(\widetilde{G}_{>J^{\prime}}\widetilde{\Gamma})$ . The first is clearly a multidegree $J^{\prime}$ nilmanifold while the second is a multidegree $J$ nilmanifold, each of complexity $M^{O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})}$ . Furthermore we can choose underlying Mal’cev bases which are $M^{O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})}$ -rational combinations of $\widetilde{\mathcal{X}}~{}\mathrm{mod}~{}\widetilde{\mathcal{X}}^{J}$ and $\widetilde{\mathcal{X}}~{}\mathrm{mod}~{}\widetilde{\mathcal{X}}^{J^{\prime}}$ , respectively. (See, e.g., the arguments regarding $G_{\mathrm{Quot}}$ in Section 10.3.) Note here that $\widetilde{\Gamma}_{>J}=\widetilde{\Gamma}/(\widetilde{\Gamma}\cap\widetilde{G}_{>J})$ and analogously for $\widetilde{\Gamma}_{>J^{\prime}}$ .

The key point is that by construction, $\widetilde{G}_{>J}\cap\widetilde{G}_{>J^{\prime}}=\mathrm{Id}_{\widetilde{G}}$ . This implies that there exist linear maps $A$ and $B$ such that

(C.2)

A\circ\psi_{\exp,\widetilde{G}/\widetilde{G}_{>J}}(z~{}\mathrm{mod}~{}\widetilde{G}_{>J})+B\circ\psi_{\exp,\widetilde{G}/\widetilde{G}_{>J^{\prime}}}(z~{}\mathrm{mod}~{}\widetilde{G}_{>J^{\prime}})=\psi_{\exp,\widetilde{G}}(z)

for all $z\in\widetilde{G}$ . Furthermore one can take $A$ and $B$ bounded in the sense that

d_{\widetilde{G}}(A\circ\psi_{\exp,\widetilde{G}/\widetilde{G}_{>J}}(\exp(\widetilde{X}_{i})~{}\mathrm{mod}~{}\widetilde{G}_{>J}),\mathrm{id}_{\widetilde{G}})\leq M^{O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})}

for all $\widetilde{X}_{i}\in\widetilde{\mathcal{X}}$ and analogously for $B$ and $J^{\prime}$ .
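The existence of $A$ and $B$ is a linear algebra statement at the level of Lie algebras. The following toy computation is a sketch, assuming that $\psi_{\exp}$ records the coordinates of $\log z$ in the relevant basis; the three-dimensional abelian Lie algebra $\mathfrak{g}$ and the ideals $\mathfrak{a}=\operatorname{span}(e_{2})$ and $\mathfrak{b}=\operatorname{span}(e_{3})$ , which play the roles of $\log(\widetilde{G}_{>J})$ and $\log(\widetilde{G}_{>J^{\prime}})$ , are hypothetical.

```latex
% Hypothetical example: \mathfrak{g} = \mathbb{R}^3 with basis e_1, e_2, e_3,
% \mathfrak{a} = span(e_2), \mathfrak{b} = span(e_3), so \mathfrak{a} \cap \mathfrak{b} = 0.
\begin{align*}
\psi(z) &= (x_1, x_2, x_3), \\
\psi_{\mathfrak{g}/\mathfrak{a}}(z \bmod \mathfrak{a}) &= (x_1, x_3), &
\psi_{\mathfrak{g}/\mathfrak{b}}(z \bmod \mathfrak{b}) &= (x_1, x_2), \\
A(u_1, u_3) &= (u_1, 0, u_3), &
B(v_1, v_2) &= (0, v_2, 0), \\
A(x_1, x_3) + B(x_1, x_2) &= (x_1, x_2, x_3) = \psi(z).
\end{align*}
```

In general, the map sending $\log z$ to the pair $(\log z~\mathrm{mod}~\mathfrak{a},\log z~\mathrm{mod}~\mathfrak{b})$ is linear, and it is injective exactly when $\mathfrak{a}\cap\mathfrak{b}=0$ . It therefore admits a linear left inverse, which can be written as $(u,v)\mapsto Au+Bv$ . The maps $A$ and $B$ are not unique; in the toy example $B$ could equally well have carried the $x_{1}$ coordinate.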

We now identify $\widetilde{G}/\widetilde{\Gamma}$ via $\psi_{\widetilde{G}}$ with the domain $\psi_{\mathrm{exp},\widetilde{G}}(g^{\ast})+(-1/2,1/2]^{\dim(\widetilde{G})}$ and recall that $\widetilde{F}$ is supported in $\psi_{\mathrm{exp},\widetilde{G}}(g^{\ast})+(-\delta,\delta]^{\dim(\widetilde{G})}$ . Given $x\in\widetilde{G}$ such that $\psi_{\mathrm{exp},\widetilde{G}}(x)\in\psi_{\mathrm{exp},\widetilde{G}}(g^{\ast})+(-\delta,\delta]^{\dim(\widetilde{G})}$ , we have that

	$\displaystyle\psi_{\mathrm{exp},\widetilde{G}/\widetilde{G}_{>J}}(x~{}\mathrm{mod}~{}\widetilde{G}_{>J})\in\psi_{\mathrm{exp},\widetilde{G}/\widetilde{G}_{>J}}(g^{\ast}~{}\mathrm{mod}~{}\widetilde{G}_{>J})+(-\delta,\delta]^{\dim(\widetilde{G}/\widetilde{G}_{>J})}\cdot M^{O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})},$
	$\displaystyle\psi_{\mathrm{exp},\widetilde{G}/\widetilde{G}_{>J^{\prime}}}(x~{}\mathrm{mod}~{}\widetilde{G}_{>J^{\prime}})\in\psi_{\mathrm{exp},\widetilde{G}/\widetilde{G}_{>J^{\prime}}}(g^{\ast}~{}\mathrm{mod}~{}\widetilde{G}_{>J^{\prime}})+(-\delta,\delta]^{\dim(\widetilde{G}/\widetilde{G}_{>J^{\prime}})}\cdot M^{O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})}.$

Given that $\delta=M^{-O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})}$ is sufficiently small, these are contained in

	$\displaystyle\psi_{\mathrm{exp},\widetilde{G}/\widetilde{G}_{>J}}(g^{\ast}~{}\mathrm{mod}~{}\widetilde{G}_{>J})+(-1/4,1/4]^{\dim(\widetilde{G}/\widetilde{G}_{>J})},$
	$\displaystyle\psi_{\mathrm{exp},\widetilde{G}/\widetilde{G}_{>J^{\prime}}}(g^{\ast}~{}\mathrm{mod}~{}\widetilde{G}_{>J^{\prime}})+(-1/4,1/4]^{\dim(\widetilde{G}/\widetilde{G}_{>J^{\prime}})},$

respectively.

Identify $\psi_{\mathrm{exp},\widetilde{G}}(g^{\ast})+(-1/2,1/2]^{\dim(\widetilde{G})}$ with the torus (note that the boundaries are glued differently than in $\widetilde{G}/\widetilde{\Gamma}$ , but we are near the center so it is not an issue). We have that $\widetilde{F}$ is an $(M/\delta)^{O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})}$ -Lipschitz function with respect to the standard torus metric (see e.g. [45, Lemma 2.3] and [42, Lemma B.3]). Thus for $x\in\widetilde{G}$ such that $\psi_{\mathrm{exp},\widetilde{G}}(x)\in\psi_{\mathrm{exp},\widetilde{G}}(g^{\ast})+(-1/2,1/2]^{\dim(\widetilde{G})}$ , via standard Fourier approximation (see e.g. [49, Lemma A.8]), for $\xi\in\mathbb{Z}^{\dim(\widetilde{G})}$ there exist $c_{\xi}$ with $|c_{\xi}|\leq(M/(\delta\varepsilon^{\prime}))^{O(\dim(\widetilde{G}))}$ such that

\bigg{\lVert}\widetilde{F}(x\widetilde{\Gamma})-\sum_{\lVert\xi\rVert_{\infty}\leq(M/(\delta\varepsilon^{\prime}))^{O(\dim(\widetilde{G}))}}c_{\xi}e(\xi\cdot\psi_{\exp,\widetilde{G}}(x))\bigg{\rVert}_{\infty}\leq\varepsilon^{\prime}

where the sum is over $\xi\in\mathbb{Z}^{\dim(\widetilde{G})}$ . Using (C.2) we may write this equivalently as

\bigg{\lVert}\widetilde{F}(x\widetilde{\Gamma})-\sum_{\xi}c_{\xi}e(\xi\cdot(A\circ\psi_{\exp,\widetilde{G}/\widetilde{G}_{>J}}(x~{}\mathrm{mod}~{}\widetilde{G}_{>J})))e(\xi\cdot(B\circ\psi_{\exp,\widetilde{G}/\widetilde{G}_{>J^{\prime}}}(x~{}\mathrm{mod}~{}\widetilde{G}_{>J^{\prime}})))\bigg{\rVert}_{\infty}\leq\varepsilon^{\prime},

where again the sum is over $\lVert\xi\rVert_{\infty}\leq(M/(\delta\varepsilon^{\prime}))^{O(\dim(\widetilde{G}))}$ .
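The passage from the Fourier expansion to a sum of products uses only that $e(\cdot)$ is a character. As a sketch, with $e(t)=\exp(2\pi it)$ and with $u$ and $v$ introduced here as shorthand (not notation from the text) for the two quotient coordinate vectors:

```latex
% Shorthand (assumed): u = \psi_{\exp,\widetilde{G}/\widetilde{G}_{>J}}(x \bmod \widetilde{G}_{>J}),
%                      v = \psi_{\exp,\widetilde{G}/\widetilde{G}_{>J'}}(x \bmod \widetilde{G}_{>J'}).
\begin{align*}
e\big(\xi\cdot\psi_{\exp,\widetilde{G}}(x)\big)
  &= e\big(\xi\cdot(Au + Bv)\big) && \text{by (C.2)} \\
  &= e(\xi\cdot Au)\, e(\xi\cdot Bv) && \text{since } e(s+t) = e(s)\,e(t) \\
  &= e\big((A^{\mathsf{T}}\xi)\cdot u\big)\, e\big((B^{\mathsf{T}}\xi)\cdot v\big).
\end{align*}
```

The first factor depends on $x$ only through $x~\mathrm{mod}~\widetilde{G}_{>J}$ and the second only through $x~\mathrm{mod}~\widetilde{G}_{>J^{\prime}}$ . Since $A^{\mathsf{T}}\xi$ and $B^{\mathsf{T}}\xi$ need not be integer vectors, these factors are not automatically periodic, which is consistent with the cutoff $\rho$ used in the next step.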

For $z$ such that $\psi_{\mathrm{exp},\widetilde{G}/\widetilde{G}_{>J}}(z)-\psi_{\mathrm{exp},\widetilde{G}/\widetilde{G}_{>J}}(g^{\ast}~{}\mathrm{mod}~{}\widetilde{G}_{>J})\in(-1/2,1/2]^{\dim(\widetilde{G}/\widetilde{G}_{>J})}$ , we let

\tau_{>J,\xi}(z)=\rho(\lVert\psi_{\mathrm{exp},\widetilde{G}/\widetilde{G}_{>J}}(z)-\psi_{\mathrm{exp},\widetilde{G}/\widetilde{G}_{>J}}(g^{\ast}~{}\mathrm{mod}~{}\widetilde{G}_{>J})\rVert)\cdot e(\xi\cdot(A\circ\psi_{\exp,\widetilde{G}/\widetilde{G}_{>J}}(z)))

with $\rho(x)=1$ for $|x|\leq 1/4$ , $\rho(x)=0$ for $|x|\geq 1/3$ , and $\rho$ being $O(1)$ -Lipschitz; we extend $\tau_{>J,\xi}$ to $\widetilde{G}/(\widetilde{G}_{>J}\widetilde{\Gamma})$ via periodicity. Then $\tau_{>J,\xi}$ is seen to be an $(M/(\delta\varepsilon^{\prime}))^{O_{J,J^{\prime}}(d^{O_{J,J^{\prime}}(1)})}$ -Lipschitz function on $\widetilde{G}/(\widetilde{G}_{>J}\widetilde{\Gamma})$ . This follows from the size of $\xi$ and the fact that distance in $d_{\widetilde{G}}$ controls distance in first-kind coordinates (see e.g. [42, Lemmas B.1, B.3]). Define $\tau_{>J^{\prime},\xi}$ in the same manner. We have that

\bigg{\lVert}\widetilde{F}(x\widetilde{\Gamma})-\sum_{\lVert\xi\rVert_{\infty}\leq(M/(\delta\varepsilon^{\prime}))^{O(\dim(\widetilde{G}))}}c_{\xi}\tau_{>J,\xi}((x~{}\mathrm{mod}~{}\widetilde{G}_{>J})\widetilde{\Gamma}_{>J})\tau_{>J^{\prime},\xi}((x~{}\mathrm{mod}~{}\widetilde{G}_{>J^{\prime}})\widetilde{\Gamma}_{>J^{\prime}})\bigg{\rVert}_{\infty}\leq\varepsilon^{\prime}.

As this holds for all $x$ such that $\psi_{\mathrm{exp},\widetilde{G}}(x)\in\psi_{\mathrm{exp},\widetilde{G}}(g^{\ast})+(-1/2,1/2]^{\dim(\widetilde{G})}$ and the approximating function is invariant under $\widetilde{\Gamma}$ , this holds for all $x\in\widetilde{G}$ . This completes the proof, plugging in $x=\widetilde{g}(h_{1},\ldots,h_{k})$ and noting that $\widetilde{g}~{}\mathrm{mod}~{}\widetilde{G}_{>J}$ and $\widetilde{g}~{}\mathrm{mod}~{}\widetilde{G}_{>J^{\prime}}$ are multidegree $J^{\prime}$ and $J$ polynomial sequences on $\widetilde{G}/(\widetilde{G}_{>J}\widetilde{\Gamma})$ and $\widetilde{G}/(\widetilde{G}_{>J^{\prime}}\widetilde{\Gamma})$ respectively. ∎
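To see that the number of terms in the restricted-support case is of the claimed size, one can count the frequencies $\xi$ . The following bookkeeping is a sketch only: $D$ , $K$ and $C$ are shorthand introduced here, we assume $M\geq 1$ , and the additional losses incurred when applying Lemma B.3 are not tracked.

```latex
% Shorthand (assumed): D = \dim(\widetilde{G}) = d^{O_{J,J'}(1)}, K = (M/(\delta\varepsilon'))^{O(D)},
% and C = O_{J,J'}(d^{O_{J,J'}(1)}), with implied constants allowed to change from line to line.
\begin{align*}
\#\{\xi\in\mathbb{Z}^{D} : \lVert\xi\rVert_\infty\le K\}
  &= (2\lfloor K\rfloor+1)^{D} \le \big(M/(\delta\varepsilon')\big)^{O(D^{2})}, \\
\delta^{-1} = M^{C},\quad \varepsilon' = \varepsilon\,\delta^{C}
  &\Longrightarrow\ \frac{M}{\delta\varepsilon'} \le \Big(\frac{M}{\varepsilon}\Big)^{C}, \\
L &\le (M/\varepsilon)^{C}.
\end{align*}
```

The last line combines the first two, using that $D^{2}=d^{O_{J,J^{\prime}}(1)}$ is absorbed into the exponent $C$ .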

References

[1] D. Altman, A non-flag arithmetic regularity lemma and counting lemma, arXiv:2209.14083.
[2] D. Altman, On a conjecture of Gowers and Wolf, Discrete Anal. (2022), Paper No. 10, 13.
[3] V. Bergelson, B. Host, and B. Kra, Multiple recurrence and nilsequences, Invent. Math. 160 (2005), 261–303, With an appendix by Imre Ruzsa.
[4] V. Bergelson, T. Tao, and T. Ziegler, An inverse theorem for the uniformity seminorms associated with the action of $\mathbb{F}^{\infty}_{p}$ , Geom. Funct. Anal. 19 (2010), 1539–1596.
[5] T. F. Bloom and O. Sisask, An improvement to the Kelley-Meka bounds on three-term arithmetic progressions, arXiv:2309.02353.
[6] O. A. Camarena and B. Szegedy, Nilspaces, nilmanifolds and their morphisms, arXiv:1009.3825.
[7] P. Candela, Notes on compact nilspaces, Discrete Anal. (2017), Paper No. 16, 57.
[8] P. Candela, Notes on nilspaces: algebraic aspects, Discrete Anal. (2017), Paper No. 15, 59.
[9] P. Candela, D. González-Sánchez, and B. Szegedy, On the inverse theorem for Gowers norms in abelian groups of bounded torsion, arXiv:2311.13899.
[10] P. Candela and B. Szegedy, Regularity and inverse theorems for uniformity norms on compact abelian groups and nilmanifolds, J. Reine Angew. Math. 789 (2022), 1–42.
[11] J.-P. Conze and E. Lesigne, Théorèmes ergodiques pour des mesures diagonales, Bull. Soc. Math. France 112 (1984), 143–175.
[12] L. J. Corwin and F. P. Greenleaf, Representations of nilpotent Lie groups and their applications. Part I, Cambridge Studies in Advanced Mathematics, vol. 18, Cambridge University Press, Cambridge, 1990, Basic theory and examples.
[13] P. Erdős and P. Turán, On Some Sequences of Integers, J. London Math. Soc. 11 (1936), 261–264.
[14] H. Furstenberg, Ergodic behavior of diagonal measures and a theorem of Szemerédi on arithmetic progressions, J. Analyse Math. 31 (1977), 204–256.
[15] H. Furstenberg and B. Weiss, A mean ergodic theorem for $(1/N)\sum^{N}_{n=1}f(T^{n}x)g(T^{n^{2}}x)$ , Convergence in ergodic theory and probability (Columbus, OH, 1993), Ohio State Univ. Math. Res. Inst. Publ., vol. 5, de Gruyter, Berlin, 1996, pp. 193–227.
[16] W. T. Gowers, A new proof of Szemerédi’s theorem for arithmetic progressions of length four, Geom. Funct. Anal. 8 (1998), 529–551.
[17] W. T. Gowers, Arithmetic progressions in sparse sets, Current developments in mathematics, 2000, Int. Press, Somerville, MA, 2001, pp. 149–196.
[18] W. T. Gowers, A new proof of Szemerédi’s theorem, Geom. Funct. Anal. 11 (2001), 465–588.
[19] W. T. Gowers and L. Milićević, An inverse theorem for Freiman multi-homomorphisms, arXiv:2002.11667.
[20] W. T. Gowers and L. Milićević, A quantitative inverse theorem for the $U^{4}$ norm over finite fields, arXiv:1712.00241.
[21] W. T. Gowers and J. Wolf, The true complexity of a system of linear equations, Proc. Lond. Math. Soc. (3) 100 (2010), 155–176.
[22] B. Green, 100 open problems, manuscript, available on request.
[23] B. Green and T. Tao, An inverse theorem for the Gowers $U^{3}(G)$ norm, Proc. Edinb. Math. Soc. (2) 51 (2008), 73–153.
[24] B. Green and T. Tao, The primes contain arbitrarily long arithmetic progressions, Ann. of Math. (2) 167 (2008), 481–547.
[25] B. Green and T. Tao, New bounds for Szemerédi’s theorem. II. A new bound for $r_{4}(N)$ , Analytic number theory, Cambridge Univ. Press, Cambridge, 2009, pp. 180–204.
[26] B. Green and T. Tao, An arithmetic regularity lemma, an associated counting lemma, and applications, An irregular mind, Bolyai Soc. Math. Stud., vol. 21, János Bolyai Math. Soc., Budapest, 2010, pp. 261–334.
[27] B. Green and T. Tao, Linear equations in primes, Ann. of Math. (2) 171 (2010), 1753–1850.
[28] B. Green and T. Tao, The Möbius function is strongly orthogonal to nilsequences, Ann. of Math. (2) 175 (2012), 541–566.
[29] B. Green and T. Tao, The quantitative behaviour of polynomial orbits on nilmanifolds, Ann. of Math. (2) 175 (2012), 465–540.
[30] B. Green and T. Tao, New bounds for Szemerédi’s theorem, III: a polylogarithmic bound for $r_{4}(N)$ , Mathematika 63 (2017), 944–1040.
[31] B. Green, T. Tao, and T. Ziegler, Erratum for “An inverse theorem for the Gowers $U^{s+1}[N]$ -norm”, manuscript.
[32] B. Green, T. Tao, and T. Ziegler, An inverse theorem for the Gowers $U^{4}$ -norm, Glasg. Math. J. 53 (2011), 1–50.
[33] B. Green, T. Tao, and T. Ziegler, An inverse theorem for the Gowers $U^{s+1}[N]$ -norm, Electron. Res. Announc. Math. Sci. 18 (2011), 69–90.
[34] B. Green, T. Tao, and T. Ziegler, An inverse theorem for the Gowers $U^{s+1}[N]$ -norm, Ann. of Math. (2) 176 (2012), 1231–1372.
[35] Y. Gutman, F. W. R. M. Manners, and P. P. Varjú, The structure theory of nilspaces II: Representation as nilmanifolds, Trans. Amer. Math. Soc. 371 (2019), 4951–4992.
[36] Y. Gutman, F. W. R. M. Manners, and P. P. Varjú, The structure theory of nilspaces I, J. Anal. Math. 140 (2020), 299–369.
[37] Y. Gutman, F. W. R. M. Manners, and P. P. Varjú, The structure theory of nilspaces III: Inverse limit representations and topological dynamics, Adv. Math. 365 (2020), 107059, 53.
[38] B. Host and B. Kra, Nonconventional ergodic averages and nilmanifolds, Ann. of Math. (2) 161 (2005), 397–488.
[39] A. Jamneshan, O. Shalom, and T. Tao, The structure of arbitrary Conze–Lesigne systems, Comm. Amer. Math. Soc. 4 (2024), 182–229.
[40] A. Jamneshan and T. Tao, The inverse theorem for the $U^{3}$ Gowers uniformity norm on arbitrary finite abelian groups: Fourier-analytic and ergodic approaches, Discrete Anal. (2023), Paper No. 11, 48.
[41] Z. Kelley and R. Meka, Strong bounds for 3-progressions, arXiv:2302.05537.
[42] J. Leng, Efficient Equidistribution of Nilsequences, arXiv:2312.10772.
[43] J. Leng, Efficient Equidistribution of Periodic Nilsequences and Applications, arXiv:2306.13820.
[44] J. Leng, Improved Quadratic Gowers Uniformity for the Möbius Function, arXiv:2212.09635.
[45] J. Leng, A. Sah, and M. Sawhney, Improved bounds for five-term arithmetic progressions, arXiv:2312.10776.
[46] J. Leng, A. Sah, and M. Sawhney, Improved Bounds for Szemerédi’s Theorem, arXiv:2402.17995.
[47] F. W. R. M. Manners, Quantitative bounds in the inverse theorem for the Gowers ${U}^{s+1}$ -norms over cyclic groups, arXiv:1811.00718.
[48] L. Milićević, Bilinear Bogolyubov Argument in Abelian Groups, arXiv:2109.03093.
[49] S. Peluse, A. Sah, and M. Sawhney, Effective bounds for Roth’s theorem with shifted square common difference, arXiv:2309.08359.
[50] K. F. Roth, On certain sets of integers. II, J. London Math. Soc. 29 (1954), 20–26.
[51] T. Sanders, On certain other sets of integers, J. Anal. Math. 116 (2012), 53–82.
[52] T. Sanders, On the Bogolyubov-Ruzsa lemma, Anal. PDE 5 (2012), 627–655.
[53] B. Szegedy, On higher order Fourier analysis, arXiv:1203.2260.
[54] E. Szemerédi, On sets of integers containing no four elements in arithmetic progression, Number Theory (Colloq., János Bolyai Math. Soc., Debrecen, 1968), Colloq. Math. Soc. János Bolyai, vol. 2, North-Holland, Amsterdam-London, 1970, pp. 197–204.
[55] E. Szemerédi, On sets of integers containing no $k$ elements in arithmetic progression, Acta Arith. 27 (1975), 199–245.
[56] T. Tao, Goursat and Furstenberg–Weiss type lemmas, 2021, blog post. https://terrytao.wordpress.com/2021/05/07/goursat-and-furstenberg-weiss-type-lemmas/.
[57] T. Tao and J. Teräväinen, Quantitative bounds for Gowers uniformity of the Möbius and von Mangoldt functions, J. Eur. Math. Soc. (2023), 1–64.
[58] T. Tao and V. H. Vu, Additive combinatorics, Cambridge Studies in Advanced Mathematics, vol. 105, Cambridge University Press, Cambridge, 2010, Paperback edition [of MR2289012].
[59] T. Tao and T. Ziegler, The inverse conjecture for the Gowers norm over finite fields via the correspondence principle, Anal. PDE 3 (2010), 1–20.
[60] T. Tao and T. Ziegler, The inverse conjecture for the Gowers norm over finite fields in low characteristic, Ann. Comb. 16 (2012), 121–188.
[61] T. Tao and T. Ziegler, Polynomial patterns in the primes, Forum Math. Pi 6 (2018), e1, 60.
[62] T. Ziegler, Universal characteristic factors and Furstenberg averages, J. Amer. Math. Soc. 20 (2007), 53–97.

