Cutoff for random Cayley graphs of nilpotent groups

Jonathan Hermon (Department of Mathematics, University of British Columbia, BC, Canada) and Xiangying Huang (Department of Statistics and Operations Research, 304 Hanes Hall, University of North Carolina at Chapel Hill, US)

Abstract.

We consider the random Cayley graphs of a sequence of finite nilpotent groups of diverging sizes $G=G(n)$ , whose ranks and nilpotency classes are uniformly bounded. For some $k=k(n)$ such that $1\ll\log k\ll\log|G|$ , we pick a random set of generators $S=S(n)$ by sampling $k$ elements $Z_{1},\ldots,Z_{k}$ from $G$ uniformly at random with replacement, and set $S:=\{Z_{j}^{\pm 1}:1\leq j\leq k\}$ . We show that the simple random walk on Cay $(G,S)$ exhibits cutoff with high probability.

Some of our results apply to a general set of generators. Namely, we show that there is a constant $c>0$ , depending only on the rank and the nilpotency class of $G$ , such that for all symmetric sets of generators $S$ of size at most $\frac{c\log|G|}{\log\log|G|}$ , the spectral gap and the $\varepsilon$ -mixing time of the simple random walk $X=(X_{t})_{t\geq 0}$ on Cay $(G,S)$ are asymptotically the same as those of the projection of $X$ to the abelianization of $G$ , given by $[G,G]X_{t}$ . In particular, $X$ exhibits cutoff if and only if its projection does.

Key words and phrases:

cutoff, mixing times, random walk, random Cayley graphs, nilpotent groups

2020 Mathematics Subject Classification:

Primary: 05C48, 05C80, 05C81; 20D15; 60B15, 60J27, 60K37.

1. Introduction

1.1. Motivation and Objectives of Paper

We examine the random walk (RW) $X=(X_{t})_{t\geq 0}$ on a Cayley graph $\Gamma:=\mathrm{Cay}(G,S)$ of a finite nilpotent group $G$ w.r.t. a symmetric set of generators $S$ . We are interested in the asymptotic behavior of the mixing time and the spectral gap of the walk as $|G|\to\infty$ , while the step (also called nilpotency class) of $G$ and its rank (minimal size of a set of generators), denoted respectively by $L=L(G)$ and $r=r(G)$ , remain bounded.

We also investigate the occurrence of the cutoff phenomenon for the walk, which in the above setup can only occur when $|S|\gg 1$ (i.e., $|S|$ diverges as $|G|\to\infty$ ). In particular, we prove that the walk exhibits cutoff with high probability when $S$ is obtained by picking $k$ elements of $G$ , $Z_{1},\ldots,Z_{k}$ , uniformly at random, with replacement, and then setting $S:=\{Z_{i}^{\pm 1}:1\leq i\leq k\}$ under the necessary condition that $1\ll\log k\ll\log|G|$ .

The overarching theme of this work is that in a certain strong quantitative sense the mixing behavior of the walk is governed by that of the projection of the walk to the abelianization of $G$ , denoted by $G_{\mathrm{ab}}:=G/[G,G]$ , where $[G,G]$ is the commutator subgroup of $G$ . The projected walk $Y_{t}:=[G,G]X_{t}$ is a random walk on the projected Cayley graph $\Gamma_{\mathrm{ab}}:=\mathrm{Cay}(G_{\mathrm{ab}},S_{[G,G]})$ , where $S_{[G,G]}:=\{[G,G]s:s\in S\}$ is the projection of $S$ to $G_{\mathrm{ab}}$ .

As will be stated precisely in Theorem 2 and Corollary 1, for arbitrary symmetric $S$ such that $|S|\leq\frac{c\log|G|}{\log\log|G|}$ for some constant $c=c(r,L)>0$ depending only on the rank and step of $G$ , the relaxation times (inverses of the spectral gaps) of $\Gamma=\mathrm{Cay}(G,S)$ and $\Gamma_{\mathrm{ab}}=\mathrm{Cay}(G_{\mathrm{ab}},S_{[G,G]})$ are equal:

t^{G}_{\mathrm{rel}}=t^{G^{\mathrm{ab}}}_{\mathrm{rel}},

and the total variation mixing times of the random walks on $\Gamma$ and $\Gamma_{\mathrm{ab}}$ satisfy

t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon)\leq t_{\mathrm{mix}}^{G}(\varepsilon)\leq t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon-\delta)

for $\varepsilon\in(0,1)$ and $\delta=\delta(|G|):=|G|\exp\left(-\left(\log|G|\right)^{L}\right)$ . In other words, in order to determine the mixing time or the occurrence of cutoff for the walk on $\Gamma$ , it suffices to do so for $\Gamma_{\mathrm{ab}}$ .

In the case that $S=\{Z_{i}^{\pm 1}:1\leq i\leq k\}$ , where $Z_{1},\ldots,Z_{k}$ are i.i.d. and uniformly distributed over $G$ , we extend the analysis to the regime $1\ll\log k\ll\log|G|$ and prove that the random walk exhibits cutoff with high probability around time $\max\{t_{0}(k,|G_{\mathrm{ab}}|),\log_{k}|G|\}$ , where $t_{0}(k,n)$ is defined as the time at which the entropy of the rate 1 continuous time simple random walk on $\mathbb{Z}^{k}$ is $\log n$ .

1.1.1. Motivations

To motivate our investigation, let us consider for now the scenario where $G$ is a finite group, $k$ is an integer (allowed to depend on $G$ ) and $\mathcal{G}_{k}$ denotes the Cayley graph of $G$ with respect to $k$ independently and uniformly chosen random generators. These are elements of $G$ that will generate $G$ with high probability when $k$ is sufficiently large and hence, with slight abuse of language, will be referred to as generators of $G$ . We consider values of $k$ with $1\ll\log k\ll\log|G|$ for which $\mathcal{G}_{k}$ is connected with high probability, that is, with probability tending to 1 as $|G|\to\infty$ .

Universality of cutoff.

Aldous and Diaconis [1] introduced the term “cutoff phenomenon”, which describes the phenomenon where the total variation distance (TV) between the distribution of a random walk and its equilibrium distribution sharply decreases from nearly 1 to nearly 0 within a time interval of smaller order than the mixing time. The material in this paper is motivated by their conjecture on the “universality of cutoff” for the RW on the random Cayley graph $\mathcal{G}_{k}$ given in [1].

Conjecture (Aldous and Diaconis, [1]).

For any group $G$ , if $k\gg\log|G|$ and $\log k\ll\log|G|$ , then the random walk on $\mathcal{G}_{k}$ exhibits cutoff with high probability.

Additionally, a secondary aspect of the conjecture suggests that the cutoff time does not rely on the algebraic structure of the group, but rather it can be expressed solely as a function of $k$ and $|G|$ .

This conjecture has sparked a substantial body of research, see e.g., [11, 12, 22, 23, 24, 30, 33]. The curious reader is referred to Section 1.3.1 in [17] for a detailed exposition of the literature on the progress towards the Aldous-Diaconis universality conjecture. Here we provide a more condensed overview of the literature related to this conjecture, which serves as motivation for our work.

In [11, 12], Dou and Hildebrand confirmed the Aldous-Diaconis universality conjecture for all abelian groups. Additionally, their upper bound on the mixing time holds true for all groups. Furthermore, when $\log k\gg\log\log|G|$ , this upper bound matches the trivial diameter lower bound of $\log_{k}|G|$ , confirming the aforementioned secondary aspect of the conjecture. Hermon and Olesker-Taylor [17, 18, 20, 21] extend this conjecture to the regime $1\ll k\lesssim\log|G|$ for abelian groups, establishing cutoff under the condition $k-r(G)\gg 1$ , where $r(G)$ is the minimal size of a generating subset of $G$ . Moreover, when $k-r(G)\asymp k\gg 1$ , the cutoff time is given by

\max\{\log_{k}|G|,t_{0}(k,|G|)\},

(1)

where $t_{0}(k,|G|)$ is the time at which the entropy of the rate 1 random walk $W$ on $\mathbb{Z}^{k}$ is $\log|G|$ . Due to this definition $t_{0}(k,|G|)$ is also referred to as the entropic time (see Definition 3). Their work confirms that for abelian groups the cutoff time of RW only depends on $|G|$ and $k$ .

Building upon this point, the next natural step is to explore the mixing behavior of random walks on nilpotent groups (see Section 1.4.1 for a brief overview of the literature concerning this topic). Hermon and Olesker-Taylor [19] study two canonical families of nilpotent groups: the $d\times d$ unit-upper triangular matrices $U_{m,d}$ with entries in $\mathbb{Z}_{m}$ and the $d$ -dimensional Heisenberg group $H_{m,d}$ over $\mathbb{Z}_{m}$ where $m\in\mathbb{N}$ (the results hold under certain assumptions on $m$ depending on the regimes of $k$ ). Let $G$ be either $U_{m,d}$ or $H_{m,d}$ and $G_{\mathrm{ab}}:=G/[G,G]$ its abelianization. They prove that for $1\ll\log k\ll\log|G|$ the random walk on the Cayley graph $\mathcal{G}_{k}$ exhibits cutoff with high probability at time

\max\{\log_{k}|G|,t_{0}(k,|G_{\mathrm{ab}}|)\},

(2)

where $t_{0}(k,|G_{\mathrm{ab}}|)$ is the time at which the entropy of the rate 1 random walk $W$ on $\mathbb{Z}^{k}$ is $\log|G_{\mathrm{ab}}|$ . We compare (2) with (1), the latter of which gives the characterization of cutoff time for abelian groups. Indeed, $t_{0}(k,|G_{\mathrm{ab}}|)$ can be interpreted as the time at which the projection of the walk on $G$ onto the abelianization $G_{\mathrm{ab}}$ exhibits cutoff. This characterization of the cutoff time for random walks on $U_{m,d}$ and $H_{m,d}$ raises a natural question: does (2) characterize the cutoff time for random walks on nilpotent groups in general? This question will be explored in our investigation.

Another natural extension of the current research on random walks on groups involves extending the choice of generators. Rather than requiring the $k$ generators to be chosen independently and uniformly at random from the group $G$ , the aim is to advance our understanding to encompass scenarios involving arbitrary choices of generators.

More often than not, the analysis of the mixing of the random walk relies heavily on the specific selection of generators. For example, there is a line of research focusing on understanding the mixing properties of random walks on the unit upper triangular matrix group $U_{p,d}$ with $p$ prime, wherein the set of generators is either $\{I\pm E_{i,i+1}:1\leq i\leq d-1\}$ or $\{I+aE_{i,i+1}:1\leq i\leq d-1,a\in\mathbb{Z}_{p}\}$ , where $E_{i,j}$ represents the $d\times d$ matrix with 1 at the entry $(i,j)$ and 0 elsewhere. See Section 1.4.1 for an overview of existing research. The current analysis critically hinges on the fact that an operation $I+aE_{i,i+1}$ corresponds to a row addition/subtraction, allowing the decomposition of the walk’s mixing behavior into the first row and the remaining part of the matrix, the latter of which can be regarded as a $(d-1)\times(d-1)$ matrix. However, such methodologies become inapplicable when dealing with arbitrary generators. It is hence of interest to develop techniques that enable the study of mixing properties of random walks on Cayley graphs using a wider range of prescribed generators.

Leading role of the abelianization in random walk mixing.

When $G$ is a nilpotent group and $S$ is a symmetric set of generators consisting of i.i.d. uniform random elements of $G$ , we shall see in Theorem 1 that the mixing time of the RW is determined by the abelianization $G_{\mathrm{ab}}$ of $G$ , given by the expression (2).

When the generator set is predetermined, in many instances it has also been demonstrated that the mixing time of random walks on groups is primarily determined by the abelianization of the group. In a recent and notable study, Diaconis and Hough [9] introduced a novel approach to establishing a central limit theorem for random walks on unipotent matrix groups driven by a probability measure $\mu$ , under certain general constraints. It is worth noting that their methodology applies to various choices of generators. For unit-upper triangular matrices $U_{p,d}$ , it has been established that an individual coordinate on the $k$ -th diagonal mixes in order $p^{2/k}$ steps, implying the leading role of abelianization (which corresponds to the first diagonal) in the mixing process of this random walk.

Nestoridi and Sly [27] studied the mixing behavior of the rate 1 RW on $U_{m,d}$ under the canonical set of generators $\{I\pm E_{i,i+1}:i\in[d-1]\}$ . In their analysis, it is proved that the mixing time of the RW on $U_{m,d}$ is bounded by $O(m^{2}d\log d+d^{2}m^{o(1)})$ , where the former term vaguely characterizes the mixing behavior of the RW on the abelianization. This observation becomes clearer when we consider the projected walk on the abelianization, which, under the canonical set of generators, can be viewed as a product chain on $\mathbb{Z}_{m}^{d-1}$ and thus has mixing time of order $m^{2}d\log d$ . Essentially, when $m$ is considerably larger than $d$ , this upper bound is predominantly dictated by the mixing on the abelianization.

One might naturally inquire about the extent to which the abelianization dictates the mixing time of the RW on a general group. In addition, there is interest in explicitly identifying the dependence of the mixing time on the abelianization, which leads to our next motivation.

“Entropic time paradigm”.

As previously discussed, although the entropic time is the mixing time for “most” choices of generators (when $1\ll k\lesssim\log|G|$ ) for abelian groups and nilpotent groups, finding an explicit choice of generators which gives rise to cutoff at the entropic time is still open, even for the cyclic group of prime order. Part of our motivation is to understand the extent to which this paradigm applies for a given set of generators.

It is worth pointing out that for a general non-random choice of generators, the cutoff time is not necessarily given by the entropic time. For instance, Hough [24, Theorem 1.11] shows that for the cyclic group $\mathbb{Z}_{p}$ of prime order the choice of generators $S:=\{0\}\cup\{\pm 2^{i}:0\leq i\leq\lfloor\log_{2}p\rfloor-1\}$ , which he describes as “an approximate embedding of the classical hypercube walk into the cycle”, gives rise to a random walk on $\mathbb{Z}_{p}$ that exhibits cutoff, where the cutoff time is not the entropic time.

Mixing under minimal sets of generators.

A minimal set of generators is a set of elements that generates the group which is minimal in terms of size. For $p$ -groups, it is known that the minimal sets of generators can be described by the Frattini subgroup $\Phi=\Phi(G)$ in the sense that any set $\{x_{1},\dots,x_{r}\}\subseteq G$ such that the cosets $\{\Phi x_{i}:1\leq i\leq r\}$ form a basis of $G/\Phi$ gives a generating set of $G$ . See, e.g., Diaconis and Saloff-Coste [10, Section 5.C]. A random walk supported on a minimal set of generators is thus referred to as a Frattini walk. Examples of such walks are discussed in Section 5.C of [10]. For the Heisenberg group $H_{p,3}$ with prime $p$ , it can be shown that all minimal sets of generators are equivalent from a group-theoretic point of view.

Diaconis and Saloff-Coste additionally remarked that based on their experience with the circle and symmetric group, if the number of generators is fixed, most sets of generators should lead to the same convergence rate for the random walk. Motivated by these examples, they ask the following open question (see Remark 2 on Page 23 of [10]): to what extent does the choice of generators affect the mixing behavior?

We give a neat partial answer to this question in Theorem 3. Suppose that $G$ is a $p$ -group with $G_{\mathrm{ab}}\cong\mathbb{Z}_{p^{\alpha_{1}}}\oplus\cdots\oplus\mathbb{Z}_{p^{\alpha_{r}}}$ or that $G_{\mathrm{ab}}\cong\mathbb{Z}_{m}^{r}$ for some $m\in\mathbb{N}$ . Under very mild assumptions on the rank and step of $G$ , for all minimal (symmetric) sets of generators, the corresponding mixing times on $G$ are the same up to smaller order terms and the corresponding relaxation times are the same.

1.1.2. Objectives

Motivated by the questions discussed in the preceding section, our primary focus in this paper is as follows.

(i) Study the random walk on $\mathcal{G}_{k}$ for general nilpotent groups. Expanding upon the current understanding of random walks on groups, our goal is to establish cutoff for random walks on $\mathcal{G}_{k}$ when $G$ is a nilpotent group and $1\ll\log k\ll\log|G|$ . In particular, we are interested in a general characterization of the cutoff time. An important implication of the findings in [19] is that for certain regimes of $k$ , the cutoff time for $G=U_{m,d}$ (or $G=H_{m,d}$ ) does not depend only on $k$ and $|G|$ . Nevertheless, the only additional information required to determine the cutoff times for these two examples is the size of the abelianization, as indicated by (2). We hope to generalize the characterization of the cutoff time in (2) to general nilpotent groups.

We thank Péter Varjú for suggesting to us the problem of extending the analysis from [19] to other nilpotent groups (namely, the case that $G$ is step 2 and $G_{\mathrm{ab}}\cong\mathbb{Z}_{p}^{r}$ , i.e., $G_{\mathrm{ab}}$ is elementary abelian). We also wish to thank him for providing invaluable insights regarding how certain components of the argument from [19] could be interpreted in terms of the general theory of nilpotent groups.

(ii) Develop techniques applicable when the generators are chosen arbitrarily. As indicated by previous discussions, the mixing time of random walks on a group under various choices of generators is largely determined by the abelianization of the group. We aim to explore the extent to which this leading role of the abelianization holds in a broader context.

In essence, our objective is to develop techniques for studying the mixing properties of random walks, applicable not only under arbitrary choices of generators but also for general groups, without dependence on specific group structures.

1.2. Definitions and Notation

We give the precise definitions of the Cayley graph on $G$ and the random walk on Cayley graphs.

Let $G$ be a nilpotent group with lower central series

G=G_{1}\trianglerighteq G_{2}\trianglerighteq\cdots\trianglerighteq G_{L}\trianglerighteq G_{L+1}=\{\mathrm{id}\}

where $G_{i+1}:=[G_{i},G]=\langle\{[g,g^{\prime}]:g\in G_{i},g^{\prime}\in G\}\rangle$ . In particular, $G_{2}=[G,G]$ denotes the commutator subgroup of $G$ . We also denote by $G_{\mathrm{ab}}:=G/G_{2}$ the abelianization of $G$ . The rank of a nilpotent group $G$ , denoted by $r=r(G)$ , is the smallest integer $r$ such that $G$ can be generated by a set containing $r$ elements of $G$ and their inverses. The number $L=L(G)$ is called the step (or the nilpotency class) of $G$ , i.e., $|G_{L}|>1$ and $|G_{L+1}|=1$ .
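To make these definitions concrete, the following Python sketch computes the lower central series by brute force for the group $U_{m,d}$ of unit upper triangular matrices with $m=2$ and $d=4$ . The parameters, the helper names and the commutator convention $[g,h]=g^{-1}h^{-1}gh$ are choices made for this illustration only; the subgroups $G_{i}$ do not depend on which of the standard commutator conventions is used.

```python
from itertools import product

M, D = 2, 4  # modulus and matrix size (illustrative choices, not from the paper)
IDENTITY = tuple(tuple(int(i == j) for j in range(D)) for i in range(D))

def mat_mul(x, y):
    """Product of two D x D matrices with entries reduced mod M."""
    return tuple(
        tuple(sum(x[i][l] * y[l][j] for l in range(D)) % M for j in range(D))
        for i in range(D)
    )

def inverse(x):
    """Inverse in a finite group: the last power of x before the identity."""
    y = x
    while mat_mul(y, x) != IDENTITY:
        y = mat_mul(y, x)
    return y

def commutator(g, h):
    """[g, h] := g^{-1} h^{-1} g h (convention assumed for this sketch)."""
    return mat_mul(mat_mul(inverse(g), inverse(h)), mat_mul(g, h))

def all_elements():
    """All unit upper triangular D x D matrices over Z_M."""
    positions = [(i, j) for i in range(D) for j in range(i + 1, D)]
    elements = []
    for values in product(range(M), repeat=len(positions)):
        rows = [list(row) for row in IDENTITY]
        for (i, j), v in zip(positions, values):
            rows[i][j] = v
        elements.append(tuple(tuple(row) for row in rows))
    return elements

def generated_subgroup(gens):
    """Closure of the identity under right multiplication by gens.

    In a finite group this closure is the subgroup generated by gens.
    """
    seen = {IDENTITY}
    frontier = [IDENTITY]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = mat_mul(g, s)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen

def lower_central_series():
    """Return [G_1, G_2, ..., G_{L+1}] with G_{i+1} = [G_i, G]."""
    group = all_elements()
    series = [set(group)]
    while len(series[-1]) > 1:
        comms = {commutator(g, h) for g in series[-1] for h in group}
        series.append(generated_subgroup(comms))
    return series

sizes = [len(h) for h in lower_central_series()]
```

For these parameters the sizes are 64, 8, 2, 1, so the step is $L=3$ and $|G_{\mathrm{ab}}|=|G|/|G_{2}|=8$ .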

For a finite group $G$ , let $S\subseteq G$ be a symmetric subset, i.e., $s\in S$ if and only if $s^{-1}\in S$ . We will refer to $S$ as the set of generators when $S$ generates $G$ . The undirected Cayley graph of $G$ generated by $S$ is defined as follows.

Definition 1 (Cayley multi-graph generated by a set of generators).

Fix a symmetric set $S:=\{s_{i}^{\pm 1}:i\in[k]\}\subseteq G$ of generators. Let $\text{Cay}(G,S)$ denote the (right) Cayley multi-graph of $G$ with respect to $S$ , with vertex set $\mathbb{V}:=\{g:g\in G\}$ and edge set $\mathbb{E}:=\{\{g,gs\}:g\in G,s\in S\}$ . We allow parallel edges and self loops (if $\mathrm{id}\in S$ ) so that the Cayley graph $\text{Cay}(G,S)$ is regular with degree $2k$ .

Random walk on Cayley graphs. We will consider the undirected random walk $X_{t}$ on the Cayley graph $\text{Cay}(G,S)$ which jumps at rate 1, where $S:=\{s_{i}^{\pm 1}:i\in[k]\}$ . Let $\{\sigma_{i}\}_{i\in\mathbb{N}}$ be an i.i.d. sequence of indices uniformly sampled from $[k]$ , and let $\{\eta_{i}\}_{i\in\mathbb{N}}$ be an i.i.d. sequence of signs uniformly sampled from $\{\pm 1\}$ . At the $i$ -th jump, the generator $s_{\sigma_{i}}^{\eta_{i}}$ is applied to the walk $X$ in the sense that we multiply the current location of $X$ on the right by $s_{\sigma_{i}}^{\eta_{i}}$ . That is, the random walk $X$ can be written as the product

X=\prod_{i=1}^{N}s^{\eta_{i}}_{\sigma_{i}}=s_{\sigma_{1}}^{\eta_{1}}s_{\sigma_{2}}^{\eta_{2}}\cdots s_{\sigma_{N}}^{\eta_{N}},

where $N:=N(t)$ is the number of steps taken by $X$ by time $t$ and $s^{\eta_{i}}_{\sigma_{i}}$ denotes the $i$ -th step taken by the random walk with $\sigma_{i}\in[k],\eta_{i}\in\{\pm 1\}$ .
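The construction above can be sketched in Python for one concrete group. The example below uses the Heisenberg group $H_{m,3}$ , encoded as triples $(a,b,c)$ standing for the matrix with rows $(1,a,c)$ , $(0,1,b)$ , $(0,0,1)$ ; the modulus, the function names and the use of exponential waiting times to realise the rate 1 clock are assumptions of this illustration.

```python
import random

M = 7  # modulus of the Heisenberg group H_{M,3}; an illustrative choice

def mul(x, y):
    """Product of two Heisenberg elements encoded as triples (a, b, c)."""
    return ((x[0] + y[0]) % M, (x[1] + y[1]) % M, (x[2] + y[2] + x[0] * y[1]) % M)

def inv(x):
    """Inverse of (a, b, c), namely (-a, -b, ab - c)."""
    a, b, c = x
    return (-a % M, -b % M, (a * b - c) % M)

def simulate_walk(gens, t, rng):
    """Run the rate 1 walk from the identity up to time t.

    Returns (X_t, W), where W[i] is the number of times gens[i] was applied
    minus the number of times its inverse was applied.
    """
    k = len(gens)
    x = (0, 0, 0)
    w = [0] * k
    clock = rng.expovariate(1.0)       # time of the first jump
    while clock <= t:
        sigma = rng.randrange(k)       # uniform index in [k]
        eta = rng.choice((1, -1))      # uniform sign
        step = gens[sigma] if eta == 1 else inv(gens[sigma])
        x = mul(x, step)               # multiply on the right
        w[sigma] += eta
        clock += rng.expovariate(1.0)  # i.i.d. Exp(1) waiting times
    return x, w

def abelian_part(gens, w):
    """Image of Z_1^{W_1} ... Z_k^{W_k} in the abelianization Z_M x Z_M."""
    a = sum(wi * g[0] for g, wi in zip(gens, w)) % M
    b = sum(wi * g[1] for g, wi in zip(gens, w)) % M
    return (a, b)

rng = random.Random(0)
gens = [tuple(rng.randrange(M) for _ in range(3)) for _ in range(4)]
x_t, w_t = simulate_walk(gens, 25.0, rng)
```

In this encoding the first two coordinates of `x_t` are the projection of the walk to the abelianization, and they depend on the trajectory only through the counts `w_t`.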

Notation. Throughout the paper, we use standard asymptotic notation: “ $\ll$ ” or “ $o(\cdot)$ ” means “of smaller order”; “ $\lesssim$ ” or “ $\mathcal{O}(\cdot)$ ” means “of order at most”; “ $\asymp$ ” means “of the same order”; “ $\eqsim$ ” means “asymptotically equivalent”. We will abbreviate “with high probability” by whp.

Assumptions. Throughout the paper, we will let $G$ be a finite nilpotent group of step $L\geq 2$ and rank $r$ where $r,L\asymp 1$ .

1.3. Overview of Main Results

We focus on the mixing behavior of the random walk on a Cayley graph $\text{Cay}(G,S)$ of a finite nilpotent group $G$ with a symmetric generator set $S=\{s_{i}^{\pm 1}:i\in[k]\}$ . We consider the limit as $|G|\to\infty$ under the assumption that $1\ll\log k\ll\log|G|$ . The condition $1\ll\log k\ll\log|G|$ is necessary for the random walk to exhibit cutoff on $\text{Cay}(G,S)$ for all nilpotent $G$ , see the remark below.

Remark 1.

For any choice of generators, it was established by Diaconis and Saloff-Coste [10] that there is no cutoff when $k\asymp 1$ for all nilpotent groups, which is a class of groups that satisfies their concept of moderate growth. The interested reader can find a short exposition of their argument in [20, §4]. When $\log k\asymp\log|G|$ and with $k$ i.i.d. uniform generators, there is no cutoff for all groups, see [17, §7.2]. Dou [12, Theorems 3.3.1 and 3.4.7] establishes a more general result for $\log k\asymp\log|G|$ .

1.3.1. Cutoff for Random Walks on Nilpotent Groups

We use standard notation and definitions for mixing and cutoff, see e.g. [32, §4 and §18].

Definition 2.

A sequence $(X_{N})_{N\in\mathbb{N}}$ of Markov chains is said to exhibit cutoff if there exists a sequence of times $(t_{N})_{N\in\mathbb{N}}$ with

\limsup_{N\to\infty}d_{N}((1-\varepsilon)t_{N})=1\quad\text{and}\quad\limsup_{N\to\infty}d_{N}((1+\varepsilon)t_{N})=0\quad\text{ for all }\varepsilon\in(0,1),

where $d_{N}(\cdot)$ is the TV distance of $X_{N}(\cdot)$ from its equilibrium distribution for each $N\in\mathbb{N}$ .

We say that a RW on a sequence of random graphs $(H_{N})_{N\in\mathbb{N}}$ exhibits cutoff around time $(t_{N})_{N\in\mathbb{N}}$ whp if, for all fixed $\varepsilon$ , in the limit $N\to\infty$ , the TV distance at time $(1+\varepsilon)t_{N}$ converges in distribution to 0 and at time $(1-\varepsilon)t_{N}$ to 1, where the randomness is over $H_{N}$ .

In other words, $(X_{N})_{N\in\mathbb{N}}$ is said to exhibit cutoff when the TV distance of the distribution of the chain from equilibrium drops from close to 1 to close to 0 in a short time interval of smaller order than the mixing time.

As briefly discussed in Section 1.1.1, there has been considerable interest in studying the cutoff behavior of random walks on groups. Our goal is to generalize the characterization of cutoff time as $\max\{\log_{k}|G|,t_{0}(k,|G_{\mathrm{ab}}|)\}$ to general nilpotent groups (for random i.i.d. generators).

We now give the formal definition of the entropic time $t_{0}:=t_{0}(k,|G_{\mathrm{ab}}|)$ and the proposed mixing time.

Definition 3.

(i) Let $t_{0}(k,N)$ be the time at which the entropy of the rate 1 random walk $W$ on $\mathbb{Z}^{k}$ is $\log N$ . We refer to $t_{0}(k,|G_{\mathrm{ab}}|)$ as the entropic time.
(ii) Define $t_{*}(k,G):=\max\{t_{0}(k,|G_{\mathrm{ab}}|),\log_{k}|G|\}$ . We refer to $t_{*}(k,G)$ as the cutoff time or the mixing time.

The entropic time $t_{0}(k,|G_{\mathrm{ab}}|)$ is identified as the cutoff time for the projected random walk $Y_{t}:=G_{2}X_{t}$ on $G_{\mathrm{ab}}$ , see [17], which is naturally a lower bound on the mixing time of the RW $X_{t}$ on $G$ . To offer insight into the definition of the cutoff time, note that we need to run the RW sufficiently long to ensure that all elements of the group can be reached with reasonable probability, which leads to a lower bound of $\log_{k}|G|$ .
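Definition 3 can be evaluated numerically. The sketch below relies on the fact that each coordinate of the rate 1 walk on $\mathbb{Z}^{k}$ at time $t$ is distributed as the difference of two independent Poisson variables of mean $t/(2k)$ , and that the coordinates are independent. The truncation of the Poisson support, the bisection search and all function names are assumptions of this illustration.

```python
import math

def poisson_pmf(lam, n_max):
    """[P(Poisson(lam) = n) for n = 0..n_max], computed in log space."""
    if lam == 0:
        return [1.0] + [0.0] * n_max
    return [math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))
            for n in range(n_max + 1)]

def coordinate_entropy(lam):
    """Entropy (in nats) of P1 - P2 with P1, P2 i.i.d. Poisson(lam)."""
    n_max = int(lam + 12 * math.sqrt(lam) + 30)  # truncation point (heuristic)
    p = poisson_pmf(lam, n_max)
    h = 0.0
    for d in range(n_max + 1):
        # P(P1 - P2 = d) = P(P1 - P2 = -d) by symmetry
        q = sum(p[n] * p[n + d] for n in range(n_max + 1 - d))
        if q > 0:
            h -= (1 if d == 0 else 2) * q * math.log(q)
    return h

def walk_entropy(t, k):
    """Entropy of the rate 1 walk on Z^k at time t (k independent coordinates)."""
    return k * coordinate_entropy(t / (2 * k))

def entropic_time(k, n):
    """t_0(k, n): the time at which walk_entropy equals log n, by bisection."""
    target = math.log(n)
    lo, hi = 0.0, 1.0
    while walk_entropy(hi, k) < target:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if walk_entropy(mid, k) < target:
            lo = mid
        else:
            hi = mid
    return hi

def cutoff_time(k, size_g, size_gab):
    """t_*(k, G) = max{ t_0(k, |G_ab|), log_k |G| }."""
    return max(entropic_time(k, size_gab), math.log(size_g) / math.log(k))
```

For instance, for the Heisenberg group $H_{m,3}$ one has $|G|=m^{3}$ and $|G_{\mathrm{ab}}|=m^{2}$ , so `cutoff_time(k, m**3, m**2)` evaluates $t_{*}(k,G)$ for that group. The values are only meaningful asymptotically, in the regime $1\ll\log k\ll\log|G|$ .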

Our first result establishes cutoff around time $t_{*}(k,G)$ for the random walk $X$ on $\text{Cay}(G,S)$ where $S$ consists of i.i.d. uniform generators.

Theorem 1.

Let $G$ be a finite nilpotent group with $r(G),L(G)\asymp 1$ . Let $S=\{Z_{i}^{\pm 1}:i\in[k]\}$ with $Z_{1},\dots,Z_{k}\overset{iid}{\sim}\mathrm{Unif}(G)$ . Assume $1\ll\log k\ll\log|G|$ . As $|G|\to\infty$ , the random walk on $\text{Cay}(G,S)$ exhibits cutoff with high probability at time $t_{*}(k,G)$ , which is the cutoff time defined in Definition 3.

1.3.2. Random Walk on Non-random Cayley Graphs: Reduction to Abelianization

For a nilpotent group $G$ and any symmetric set of generators $S\subseteq G$ whose size satisfies an upper bound, we show that the mixing time of the random walk on $G$ is completely determined (up to smaller order terms) by the mixing time of the projected walk on $G_{\mathrm{ab}}$ .

Theorem 2.

Let $G$ be a finite nilpotent group such that $r(G),L(G)\asymp 1$ and $S\subseteq G$ be a symmetric set of generators. Suppose $|S|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}$ . For any fixed $\varepsilon\in(0,1)$ and $\delta\in(0,\varepsilon)$ we have

t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon)\leq t_{\mathrm{mix}}^{G}(\varepsilon)\leq t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon-\delta)

when $|G|$ is sufficiently large (more precisely, when $|G|\exp(-(\log|G|)^{L})\leq\delta$ ).

Remark 2.

The assumption $|S|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}$ is to guarantee that $\mathrm{Diam}_{S}(G_{2})$ is of smaller order than $\mathrm{Diam}_{S}(G_{\mathrm{ab}})$ so that the mixing of $X_{t}$ is governed by its projected walk onto $G_{\mathrm{ab}}$ . With more specific knowledge on the structure of $G$ , one can expect to obtain a much less stringent constraint on $S$ . Also see Remark 4.
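To get a feeling for the scale of the quantities in Theorem 2, the following sketch evaluates the size constraint on $S$ and the error term $\delta$ as functions of $\log|G|$ . Working with $\log|G|$ rather than $|G|$ avoids overflow; the function names are hypothetical.

```python
import math

def max_generators(log_size, step, rank):
    """Bound log|G| / (8 L r^L log log|G|) on |S|, given log|G|, L and r."""
    return log_size / (8 * step * rank ** step * math.log(log_size))

def delta(log_size, step):
    """delta = |G| exp(-(log|G|)^L), evaluated as exp(log|G| - (log|G|)^L)."""
    return math.exp(log_size - log_size ** step)
```

With $r=L=2$ and $\log|G|=1000$ the bound on $|S|$ is about 2.26, which shows that the constraint is asymptotic in nature, while $\delta$ is already roughly $10^{-39}$ at $\log|G|=10$ .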

As a direct consequence of the proof of Theorem 2, we establish that under the same conditions, the spectral gap of the random walk on $G$ is likewise determined by the spectral gap of its projection onto $G_{\mathrm{ab}}$ .

Corollary 1.

Let $t^{G}_{\mathrm{rel}}$ and $t^{G^{\mathrm{ab}}}_{\mathrm{rel}}$ be the relaxation times of the walks $X_{t}$ and $Y_{t}=G_{2}X_{t}$ , respectively. Then

t^{G^{\mathrm{ab}}}_{\mathrm{rel}}\leq t^{G}_{\mathrm{rel}}\leq\max\{t^{G^{\mathrm{ab}}}_{\mathrm{rel}},|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2}\}.

In particular, when $|S|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}$ we have $t^{G}_{\mathrm{rel}}=t^{G^{\mathrm{ab}}}_{\mathrm{rel}}$ .

As a consequence of the above results, we can see that for a class of nilpotent groups $G$ whose abelianization has a unique representation, with a symmetric set of generators $S$ of minimal size, the mixing time and the relaxation time (inverse of the spectral gap) of the random walk do not depend on the choice of $S$ . In this case, the choice of generators does not affect the mixing behavior. This provides a partial answer to the open question posed in Section 1.1.1.

Theorem 3.

Suppose $G$ is a nilpotent group with rank $r$ and step $L$ such that either (i) $G_{\mathrm{ab}}\cong\mathbb{Z}^{r}_{m}$ where $m\in\mathbb{N}$ or (ii) $G$ is a $p$ -group. Suppose the rank and step satisfy $Lr^{L+1}\leq\frac{\log|G|}{16\log\log|G|}$ . For any symmetric set of generators $S\subseteq G$ of minimal size and any given $\varepsilon\in(0,1)$ , the mixing time $t^{G,S}_{\mathrm{mix}}(\varepsilon)$ is the same up to smaller order terms, and the relaxation time $t_{\mathrm{rel}}^{G,S}$ is the same.

The mixing property of the random walk $X_{t}$ on the Cayley graph of $G$ is closely related to that of the projected random walk on the Cayley graph of $G_{\mathrm{ab}}$ . More precisely, denoting by $Y_{t}:=G_{2}X_{t}$ the projected RW on $G_{\mathrm{ab}}$ and starting with the walk $X_{t}$ being uniform over $G_{2}$ , one can observe (see Lemma 3.1) that

\|\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}=\|\mathbb{P}_{G_{2}}(Y_{t}=\cdot)-\pi_{G_{\mathrm{ab}}}\|_{\mathrm{TV}}.

As suggested by the following triangle inequality

\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}\leq\|\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}+\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}},

if the total variation distance between $\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)$ and $\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)$ can be nicely controlled then the mixing property of $X_{t}$ is primarily characterized by the mixing of its projection on the abelianization, which we refer to as the reduction to abelianization.
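The identity and the triangle inequality above can be checked numerically on a small example. The sketch below computes the exact law of the rate 1 walk on the Heisenberg group $H_{5,3}$ with generators $\{s_{1}^{\pm 1},s_{2}^{\pm 1}\}$ , $s_{1}=(1,0,0)$ , $s_{2}=(0,1,0)$ , as a Poisson mixture of discrete-time laws. The group, the generators, the truncation level `n_max` and all names are assumptions of this illustration.

```python
import math
from itertools import product

M = 5  # modulus of H_{M,3}; an illustrative choice
ELEMENTS = list(product(range(M), repeat=3))
GENS = [(1, 0, 0), (M - 1, 0, 0), (0, 1, 0), (0, M - 1, 0)]  # s1, s1^-1, s2, s2^-1

def mul(x, y):
    """Product of two Heisenberg elements encoded as triples (a, b, c)."""
    return ((x[0] + y[0]) % M, (x[1] + y[1]) % M, (x[2] + y[2] + x[0] * y[1]) % M)

def law_at_time(start, t, n_max=200):
    """Law of X_t: sum over n of P(Poisson(t) = n) times the n-step law.

    The sum is truncated at n_max, which is harmless for moderate t.
    """
    dist = dict(start)
    law = dict.fromkeys(ELEMENTS, 0.0)
    weight = math.exp(-t)              # P(Poisson(t) = 0)
    for n in range(n_max + 1):
        for g, p in dist.items():
            law[g] += weight * p
        new = dict.fromkeys(ELEMENTS, 0.0)
        for g, p in dist.items():
            for s in GENS:
                new[mul(g, s)] += p / len(GENS)
        dist = new
        weight *= t / (n + 1)          # P(Poisson(t) = n + 1)
    return law

def tv(mu, nu):
    """Total variation distance between two measures given as dicts."""
    keys = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in keys)

def project(law):
    """Push a law on G forward to G_ab by forgetting the central coordinate."""
    out = {}
    for (a, b, _), p in law.items():
        out[(a, b)] = out.get((a, b), 0.0) + p
    return out

UNIFORM_G = dict.fromkeys(ELEMENTS, 1.0 / M**3)
UNIFORM_AB = {(a, b): 1.0 / M**2 for a in range(M) for b in range(M)}
DELTA_ID = {(0, 0, 0): 1.0}
PI_G2 = {(0, 0, c): 1.0 / M for c in range(M)}  # uniform on G_2 (the centre)

def distances(t):
    """The four TV distances appearing in the two displays above."""
    from_id = law_at_time(DELTA_ID, t)
    from_g2 = law_at_time(PI_G2, t)
    return {
        "d_G": tv(from_id, UNIFORM_G),
        "d_G_from_G2": tv(from_g2, UNIFORM_G),
        "d_ab": tv(project(from_id), UNIFORM_AB),
        "gap": tv(from_id, from_g2),
    }
```

Calling `distances(t)` for a few values of `t` shows `d_G_from_G2` agreeing with `d_ab` up to rounding, and `d_G` lying between `d_ab` and `d_ab + gap`.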

We will prove in Lemma 3.2 that indeed $\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}}$ decays exponentially fast in time with rate at least $(|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2})^{-1}$ , where $\mathrm{Diam}_{S}(G_{2})$ is the diameter of $G_{2}$ in $\text{Cay}(G,S)$ . This provides a quantitative criterion to determine when the mixing of the walk $X_{t}$ is governed by its projection onto the abelianization. In particular, if the mixing of the projected walk $Y_{t}$ occurs after $\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}}$ has become vanishingly small then the mixing time of $X_{t}$ is roughly that of $Y_{t}$ .

Due to the well known connection between the mixing time and the diameter of the graph, see, e.g., [25, Proposition 13.7], for our purpose it is sufficient to prove $\mathrm{Diam}_{S}(G_{2})$ is small enough compared to $\mathrm{Diam}_{S}(G_{\mathrm{ab}})$ . Section 2 is devoted to proving an upper bound on $\mathrm{Diam}_{S}(G_{2})$ where the roles of $L$ , $|S|$ and $\mathrm{Diam}_{S}(G_{\mathrm{ab}})$ are made explicit.

Theorem 4.

Let $S\subseteq G$ be a symmetric set of generators and let $R\subseteq S$ be such that $|\{s,s^{-1}\}\cap R|=1$ for all $s\in S$ . For $2\leq i\leq L$ , we have

\begin{aligned}\mathrm{Diam}_{S}(G_{2})&\leq\sum_{i=2}^{L}\mathrm{Diam}_{S}(G_{i}/G_{i+1})\\&\leq\sum_{i=2}^{L}2^{5i+7}\|R\|^{i}\left(2^{2i}+L\cdot\lceil\mathrm{Diam}_{S}(G_{\mathrm{ab}})/\|R\|\rceil^{1/i}\right).\end{aligned}

(3)

As a consequence, for any set of generators satisfying $|S|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}$ , one has $\mathrm{Diam}_{S}(G_{2})\lesssim\mathrm{Diam}_{S}(G_{\mathrm{ab}})^{3/4}$ and hence the mixing of $X_{t}$ can be reduced to the mixing of its projection onto the abelianization.

1.3.3. Our Methodology

We describe our methodology in relation to the objectives described in Section 1.1.2.

(i) Representation of random walk. A substantial body of work has been devoted to the study of random walks on unipotent matrix groups, see Section 1.4.1. The analysis in many existing works heavily depends on the favorable matrix structure specific to unipotent matrix groups, a feature not necessarily present in general nilpotent groups.

There has been some progress towards treating general nilpotent groups. In [17, §6], partial results were obtained by comparing the mixing time of a general nilpotent group $G$ with that of a “corresponding” abelian group $\bar{G}:=\oplus_{\ell=1}^{L}G_{\ell}/G_{\ell+1}$ . See Section 1.2 for the definition of $\{G_{\ell}\}_{\ell\in[L]}$ . More specifically, denoting by $\mathcal{G}_{k}$ and $\bar{\mathcal{G}}_{k}$ respectively the random Cayley graphs generated by $k$ i.i.d. uniform generators in $G$ and $\bar{G}$ , it is shown that $t_{\mathrm{mix}}(\mathcal{G}_{k})/t_{\mathrm{mix}}(\bar{\mathcal{G}}_{k})\leq 1+o(1)$ with high probability, thereby offering an upper bound on the mixing time on $\mathcal{G}_{k}$ .

This comparison leads to a tight upper bound and thus establishes cutoff when $G$ is a nilpotent group with a relatively small commutator subgroup $[G,G]$ . Examples of such groups include $p$ -groups with “small” commutators and Heisenberg groups of diverging dimension, see [17, Corollary D.1 and D.2]. However, for general nilpotent groups this comparison is not sharp.

While the comparison technique discussed in [17] may not ensure a sharp upper bound on the mixing time for general nilpotent groups, it underscores the approach of examining the mixing behavior in relation to each quotient group $\{G_{\ell}/G_{\ell+1}\}_{\ell\in[L]}$ . To obtain the tight upper bound and establish cutoff, we give an accurate representation of the random walk dynamics through the lens of quotient groups.

To give a bit of intuition, let us consider the free nilpotent group of step 2 (i.e., $G_{3}=\{\mathrm{id}\}$ ). Let $S=\{Z_{i}^{\pm 1}:i\in[k]\}$ be a set of i.i.d. uniform generators. Let $W:=W(t)=(W_{1}(t),\dots,W_{k}(t))$ be the auxiliary process associated with the random walk $X:=X_{t}$ , where $W_{i}(t)$ is the number of times the generator $Z_{i}$ has been applied minus the number of times $Z_{i}^{-1}$ has been applied up to time $t$ . By rearranging generators, we can express the walk in the form

X=Z_{1}^{W_{1}}\cdots Z_{k}^{W_{k}}\prod_{a,b\in[k]:a<b}[Z_{a},Z_{b}]^{m_{ba}},

(4)

where $(m_{ba})_{a,b\in[k],a<b}$ results from the rearrangement of generators, see (30) for more details. Roughly speaking, $Z_{1}^{W_{1}}\cdots Z_{k}^{W_{k}}$ keeps track of the walk on $G_{\mathrm{ab}}=G/G_{2}$ whereas the term $\prod_{a,b\in[k]:a<b}[Z_{a},Z_{b}]^{m_{ba}}$ , which belongs to $G_{2}$ , corresponds to the mixing on the quotient group $G_{2}/G_{3}$ .
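To make the rearrangement in (4) concrete, here is a minimal Python sketch. As a stand-in for the free step-2 group it uses the Heisenberg group over $\mathbb{Z}_{p}$ ; the modulus `P`, the triple encoding, the discrete-time word and all function names are illustrative assumptions and are not taken from the paper. The sketch samples a word, records $W$ , and checks that $(Z_{1}^{W_{1}}\cdots Z_{k}^{W_{k}})^{-1}X$ lies in the centre, which equals $G_{2}$ for this group.

```python
import random

P = 7  # hypothetical prime modulus; the group is the Heisenberg group over Z_P (step 2)

def mul(x, y):
    # (a, b, c) encodes the matrix [[1, a, c], [0, 1, b], [0, 0, 1]] with entries mod P
    return ((x[0] + y[0]) % P, (x[1] + y[1]) % P, (x[2] + y[2] + x[0] * y[1]) % P)

def inv(x):
    a, b, c = x
    return ((-a) % P, (-b) % P, (a * b - c) % P)

def power(x, n):
    # x**n for any integer n, by repeated multiplication
    base = x if n >= 0 else inv(x)
    out = (0, 0, 0)
    for _ in range(abs(n)):
        out = mul(out, base)
    return out

def sample_walk(k, steps, rng):
    # k uniform generators Z_1..Z_k; each step applies a uniform element of {Z_i, Z_i^{-1}}
    Z = [tuple(rng.randrange(P) for _ in range(3)) for _ in range(k)]
    X, W = (0, 0, 0), [0] * k
    for _ in range(steps):
        i, sign = rng.randrange(k), rng.choice((1, -1))
        X = mul(X, power(Z[i], sign))
        W[i] += sign
    return Z, X, W

def commutator_part(Z, X, W):
    # returns (Z_1^{W_1} ... Z_k^{W_k})^{-1} X, the factor left after rearranging
    ordered = (0, 0, 0)
    for z, w in zip(Z, W):
        ordered = mul(ordered, power(z, w))
    return mul(inv(ordered), X)

Z, X, W = sample_walk(k=4, steps=200, rng=random.Random(0))
rest = commutator_part(Z, X, W)
print("commutator part:", rest)
```

The first two coordinates of `rest` vanish, so the leftover factor is central; only its third coordinate, which plays the role of the product of commutators in (4), carries information beyond the abelianization.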

We demonstrate in Section 4.3 that this line of reasoning applies to general nilpotent groups of step $L\geq 2$ , see (33). Although the rearranging of generators leads to the presence of multi-fold commutators such as $[[Z_{1},Z_{2}],Z_{3}]$ when $L\geq 3$ , we will argue, through a further careful simplification, that the presence of multi-fold commutators does not add to the complexity of the analysis, and one only needs to control the distribution of $\prod_{a,b\in[k]:a<b}[Z_{a},Z_{b}]^{m_{ba}}$ as with the case where $L=2$ .

(ii.a) Comparison argument: reduction to abelianization. We develop a comparison argument that addresses the mixing of the random walk on general groups with an arbitrary generator set $S$ . Under mild assumptions on the size of $S$ , the mixing time on $G$ is the same as the mixing time of the projected walk on $G_{\mathrm{ab}}$ (up to smaller order terms), see the precise statement in Theorem 2. That is, within the scope of Theorem 2, the mixing time on $G$ is completely determined by that on the abelianization $G_{\mathrm{ab}}$ .

Theorem 2 further implies that for a certain class of nilpotent groups with specific structures in their abelianization, the mixing time remains the same (up to smaller order terms) regardless of the choice of a minimal-sized symmetric set of generators. See Theorem 3 for the precise statement.

(ii.b) Geometry of the Cayley graph on nilpotent groups. We derive a quantitative upper bound on the diameter of the commutator subgroup $G_{2}$ in terms of the diameter of the abelianization $G_{\mathrm{ab}}$ , with explicit dependence on the rank $r=r(G)$ and step $L=L(G)$ of the group $G$ , as detailed in Theorem 4. This, combined with the aforementioned comparison argument, allows us to provide sufficient conditions under which the mixing behavior of the random walk on $G$ is governed by that of the projected walk on $G_{\mathrm{ab}}$ .

1.4. Historic Overview

1.4.1. Random Walks on Unipotent Matrix Groups

Consider the group $\mathbb{U}_{n}$ of upper-triangular matrices with $1$ ’s along the diagonal, that is,

\mathbb{U}_{n}=\left\{{\begin{pmatrix}1&*&\cdots&*&*\\ 0&1&\cdots&*&*\\ \vdots&\vdots&&\vdots&\vdots\\ 0&0&\cdots&1&*\\ 0&0&\cdots&0&1\end{pmatrix}}\right\}.

Then, a unipotent group can be defined as a subgroup of some $\mathbb{U}_{n}$ . This includes the two families of nilpotent groups discussed earlier: the $d\times d$ unit-upper triangular matrices $U_{m,d}$ with entries in $\mathbb{Z}_{m}$ and the $d$ -dimensional Heisenberg group $H_{m,d}$ over $\mathbb{Z}_{m}$ where $m\in\mathbb{N}$ .

The exploration of random walks on unit upper triangular matrices has led to a substantial body of research. One avenue of investigation involves a simple walk on $U_{m,d}$ , the $d\times d$ unit upper triangular matrix group with entries over $\mathbb{Z}_{m}$ for some $m\in\mathbb{N}$ : a row is chosen uniformly and added to or subtracted from the row above. Ellenberg [15] studied the diameter of the associated Cayley graph with $d$ growing, and this was subsequently improved by Ellenberg and Tymoczko [16]. Stong [31] gave mixing bounds via analysis of eigenvalues. Coppersmith and Pak [8, 28] looked directly at mixing. Further work along this line includes Peres and Sly [29], Nestoridi [26] and Nestoridi and Sly [27]. Notably, Nestoridi and Sly [27] were the first to optimize bounds for $m$ and $d$ simultaneously. Diaconis and Hough [9] introduced a new method for proving a central limit theorem for random walks on unipotent matrix groups.

In the context of i.i.d. uniformly chosen generators, Hermon and Olesker-Taylor [19] prove the characterization of the cutoff time as the entropic time of the projected walk onto the abelianization for the two families of nilpotent groups: the $d\times d$ unit-upper triangular matrices $U_{m,d}$ with entries in $\mathbb{Z}_{m}$ and the $d$ -dimensional Heisenberg group $H_{m,d}$ over $\mathbb{Z}_{m}$ where $m\in\mathbb{N}$ .

1.4.2. The Entropic Methodology

A common theme in the study of mixing times is that “generic” instances often exhibit the cutoff phenomenon. Moreover, this can often be handled via the entropic method, see, e.g., [3, 4, 5]. A more detailed exposition of the known literature can be found in a previous article of one of the authors, see [17, §1.3.5]. Additionally, the entropic method has been applied within the context of random walks on groups, as discussed in [17, 19], which we now explain in a little more depth.

The main idea is to relate the mixing of the random walk $X=X_{t}$ on $\text{Cay}(G,S)$ to that of an auxiliary process $W_{t}$ and study the entropy of $W_{t}$ . Suppose $S=\{s_{i}^{\pm 1}:i\in[k]\}$ is given. The auxiliary process $W=W_{t}:=(W_{1}(t),\dots,W_{k}(t))$ is defined based on $X_{t}$ where $W_{i}(t)$ is the number of times generator $s_{i}$ has been applied minus the number of times $s_{i}^{-1}$ has been applied in the random walk $X_{t}$ . The observation that $W$ is a rate 1 random walk on $\mathbb{Z}^{k}$ (whose entropy reveals information regarding the mixing of the walk $X$ ) leads naturally to the definition of the entropic times, see Definition 3. More specifically, the auxiliary process $W$ is related to the original random walk $X$ as follows. We sample two independent copies of the random walk and the auxiliary process, denoted by $(X,W)$ and $(X^{\prime},W^{\prime})$ . By the Cauchy-Schwarz inequality one has

4\|\mathbb{P}_{S}(X_{t}=\cdot|W_{t})-\pi_{G}\|^{2}_{\mathrm{TV}}\leq|G|\cdot\mathbb{P}_{S}(X_{t}=X^{\prime}_{t}|W_{t},W^{\prime}_{t})-1,

which relates the mixing of $X$ to the hitting probability of $X$ and $X^{\prime}$ , i.e., the probability that $X(X^{\prime})^{-1}=\mathrm{id}$ , where the index $t$ is suppressed as it is clear from the context.
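The unconditional form of this bound can be checked numerically. The sketch below is only an illustration: it replaces the conditional law of $X_{t}$ given $W_{t}$ by an arbitrary probability vector `mu` on a finite set of size $n$ (an assumption made for the example), uses that two independent samples from `mu` coincide with probability $\sum_{x}\mu(x)^{2}$ , and verifies $4\|\mu-\pi\|_{\mathrm{TV}}^{2}\leq n\sum_{x}\mu(x)^{2}-1$ . All names are hypothetical.

```python
import random

def tv_to_uniform(mu):
    # total variation distance between mu and the uniform distribution
    n = len(mu)
    return 0.5 * sum(abs(p - 1.0 / n) for p in mu)

def collision_probability(mu):
    # probability that two independent samples from mu are equal
    return sum(p * p for p in mu)

def l2_bound_holds(mu, tol=1e-12):
    # Cauchy-Schwarz: (sum |mu - 1/n|)^2 <= n * sum (mu - 1/n)^2 = n * sum mu^2 - 1
    n = len(mu)
    return 4 * tv_to_uniform(mu) ** 2 <= n * collision_probability(mu) - 1 + tol

rng = random.Random(0)
weights = [rng.random() for _ in range(8)]
mu = [w / sum(weights) for w in weights]
print(l2_bound_holds(mu))
```

The right-hand side vanishes exactly when `mu` is uniform, which is why a collision probability close to $1/n$ forces the total variation distance to be small.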

When the group $G$ is abelian, given the choice of generators $S$ , the total variation distance is a function of $W_{t}$ alone, see [17]. When the group is not abelian, this is not the case. When $G$ is nilpotent, the auxiliary process $W_{t}$ still provides useful (albeit partial) information on $X_{t}$ . In this case, to get a full picture of the mixing of the RW, we will combine the knowledge on the auxiliary process $W_{t}$ with further information obtained through analyzing the mixing on the quotient groups $\{Q_{\ell}\}_{\ell\in[L]}$ separately. See Sections 4.6 and 4.8 for the complete discussion.

2. Geometry of Cayley Graphs

The definition of Cayley graph of a group $G$ can be naturally extended to its quotient groups. For $H\trianglelefteq G$ , the Cayley graph of $G/H$ , denoted by $\text{Cay}(G/H,\{Hs:s\in S\})$ , consists of vertex set $G/H$ and edge set $\{\{Hg,Hgs\}:g\in G,s\in S\}$ .

Let $\mathrm{dist}_{S}(\cdot,\cdot)$ denote the graph distance on $\text{Cay}(G,S)$ . Define

S_{H}:=\{Hs:s\in S\}.

(5)

Similarly, let $\mathrm{dist}_{S_{H}}(\cdot,\cdot)$ denote the graph distance on $\text{Cay}(G/H,S_{H})$ . For a subgroup $H$ of $G$ , we define the diameter of $H$ with respect to the graph distance $\mathrm{dist}_{S}(\cdot,\cdot)$ on $\text{Cay}(G,S)$ by

\mathrm{Diam}_{S}(H):=\max\{\mathrm{dist}_{S}(\mathrm{id},h):h\in H\}.

(6)

For $H\trianglelefteq H^{\prime}\trianglelefteq G$ such that $H\trianglelefteq G$ (so that $G/H$ is a group), with slight abuse of notation, we can define the diameter of $H^{\prime}/H$ with respect to the graph distance $\mathrm{dist}_{S_{H}}(\cdot,\cdot)$ on $\text{Cay}(G/H,S_{H})$ ,

\mathrm{Diam}_{S}(H^{\prime}/H):=\max\{\mathrm{dist}_{S_{H}}(H,Hh^{\prime}):h^{\prime}\in H^{\prime}\},

whose definition is consistent with (6) with $G,H,S$ replaced respectively by $G/H,H^{\prime}/H,S_{H}$ .

We have the following triangle inequality in terms of the diameter of a group $H^{\prime}$ and that of its subgroup $H$ and the quotient group $H^{\prime}/H$ .

Proposition 1.

For all $H\trianglelefteq H^{\prime}\trianglelefteq G$ such that $H\trianglelefteq G$ the following holds:

\mathrm{Diam}_{S}(H^{\prime})\leq\mathrm{Diam}_{S}(H^{\prime}/H)+\mathrm{Diam}_{S}(H).

(7)

Proof.

Let $h^{\prime}\in H^{\prime}$ . By the definition of $\operatorname{Diam}_{S}(H^{\prime}/H)$ , there exist $s_{1},\dots,s_{m}\in S$ with $m\leq\mathrm{Diam}_{S}(H^{\prime}/H)$ such that $Hh^{\prime}=Hs_{1}\cdots s_{m}$ , i.e., there exists $h\in H$ such that $h^{\prime}=hs_{1}\cdots s_{m}$ . Hence

\mathrm{dist}_{S}(\mathrm{id},h^{\prime})=\mathrm{dist}_{S}(\mathrm{id},hs_{1}\cdots s_{m})\leq\mathrm{dist}_{S}(\mathrm{id},h)+m\leq\mathrm{Diam}_{S}(H)+\mathrm{Diam}_{S}(H^{\prime}/H),

which concludes the proof of (7). ∎
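Inequality (7) can be observed on a small example by breadth-first search. The following sketch is an illustration under assumptions that are not part of the paper: it takes $G$ to be the Heisenberg group over $\mathbb{Z}_{3}$ with the symmetric generating set $S=\{x^{\pm 1},y^{\pm 1}\}$ , sets $H=G_{2}$ (the centre) and $H^{\prime}=G$ , and compares $\mathrm{Diam}_{S}(G)$ with $\mathrm{Diam}_{S}(G_{\mathrm{ab}})+\mathrm{Diam}_{S}(G_{2})$ . The encoding by triples and all names are hypothetical.

```python
from collections import deque

P = 3  # hypothetical small modulus so that the whole group can be enumerated

def mul(x, y):
    # Heisenberg group over Z_P, elements encoded as triples (a, b, c)
    return ((x[0] + y[0]) % P, (x[1] + y[1]) % P, (x[2] + y[2] + x[0] * y[1]) % P)

def mul_ab(x, y):
    # the abelianization G/G_2 is Z_P x Z_P, obtained by forgetting the coordinate c
    return ((x[0] + y[0]) % P, (x[1] + y[1]) % P)

def bfs_distances(identity, gens, op):
    # graph distances from the identity in the Cayley graph with edges {g, g s}
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        for s in gens:
            h = op(g, s)
            if h not in dist:
                dist[h] = dist[g] + 1
                queue.append(h)
    return dist

S = [(1, 0, 0), (P - 1, 0, 0), (0, 1, 0), (0, P - 1, 0)]  # x, x^{-1}, y, y^{-1}
S_ab = [(a, b) for a, b, _ in S]                            # projected generators

dist_G = bfs_distances((0, 0, 0), S, mul)
dist_ab = bfs_distances((0, 0), S_ab, mul_ab)

diam_G = max(dist_G.values())
diam_G2 = max(dist_G[(0, 0, c)] for c in range(P))  # Diam_S(G_2), measured inside Cay(G, S)
diam_ab = max(dist_ab.values())
print(diam_G, diam_ab, diam_G2)
```

Note that `diam_G2` is measured with the distance of $\text{Cay}(G,S)$ , as in (6), and not in a Cayley graph of $G_{2}$ itself.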

Applying the triangle inequality (7) iteratively bounds $\mathrm{Diam}_{S}(G_{2})$ by the sum of $\mathrm{Diam}_{S}(G_{i}/G_{i+1})$ over $2\leq i\leq L$ , i.e.,

\mathrm{Diam}_{S}(G_{2})\leq\sum_{i=2}^{L}\mathrm{Diam}_{S}(G_{i}/G_{i+1}).

Breuillard and Tointon [6, Lemma 4.11] showed that the diameter of $G_{2}$ is at most $C_{S,L}(\mathrm{Diam}_{S}(G)^{1/2})$ , where $C_{S,L}$ is a constant depending on the size of $S$ and $L:=L(G)$ . In fact, they showed for all $i\in[L]$ that $\mathrm{Diam}_{S}(G_{i}/G_{i+1})$ is at most $C_{S,L}\mathrm{Diam}_{S}(G)^{1/i}$ . El-Baz and Pagano [14] proved the same estimates using somewhat similar arguments. In addition, they observed that $\mathrm{Diam}_{S}(G)\leq\mathrm{Diam}_{S}(G_{2})+\mathrm{Diam}_{S}(G_{\mathrm{ab}})$ and hence one can estimate $\mathrm{Diam}_{S}(G_{i}/G_{i+1})$ in terms of $\mathrm{Diam}_{S}(G_{\mathrm{ab}})$ .

As discussed in Remark 1, a necessary condition for the random walk on $\text{Cay}(G,S)$ to exhibit cutoff when $L$ is bounded is for $|S|$ to diverge. Consequently, as opposed to [6] and [14] which did not quantify the dependence of the constant $C_{S,L}$ on $|S|$ and $L$ , it is necessary for us to quantify this dependence. Our approach for upper bounding $\mathrm{Diam}_{S}(G_{2})$ adheres to the framework in El-Baz and Pagano [14], but with considerably more attention devoted to quantifying the influence of $|S|$ as well as $L$ .

Theorem 4.

Let $S\subseteq G$ be a symmetric set of generators and let $R\subseteq S$ be such that $|\{s,s^{-1}\}\cap R|=1$ for all $s\in S$ . For $2\leq i\leq L$ , we have

\begin{aligned}\mathrm{Diam}_{S}(G_{2})&\leq\sum_{i=2}^{L}\mathrm{Diam}_{S}(G_{i}/G_{i+1})\\ &\leq\sum_{i=2}^{L}2^{5i+7}|R|^{i}\left(2^{2i}+L\cdot\lceil\mathrm{Diam}_{S}(G_{\mathrm{ab}})/|R|\rceil^{1/i}\right).\end{aligned}

The following comparison between $\mathrm{Diam}_{S}(G_{2})$ and $\mathrm{Diam}_{S}(G_{\mathrm{ab}})$ is what we will use in the proof of Theorem 2.

Corollary 2.

For any fixed $L\in\mathbb{N}$ , we have

\mathrm{Diam}_{S}(G_{2})\lesssim\mathrm{Diam}_{S}(G_{\mathrm{ab}})^{3/4}

when $\mathrm{Diam}_{S}(G_{\mathrm{ab}})\gg|R|^{4L}$ . In particular, this condition holds when $|R|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}$ .

Remark 3.

The statement above is a special case of the following more general claim: For all $\varepsilon>0$ , if $\mathrm{Diam}_{S}(G_{\mathrm{ab}})\gg|R|^{L/\varepsilon}$ then $\mathrm{Diam}_{S}(G_{2})\lesssim\mathrm{Diam}_{S}(G_{\mathrm{ab}})^{1/2+\varepsilon}$ , which holds when $|R|\leq\frac{\varepsilon\log|G|}{2Lr^{L}\log\log|G|}$ .

Proof.

Knowing that $\mathrm{Diam}_{S}(G_{\mathrm{ab}})\gg|R|^{4L}$ , it is an easy consequence of Theorem 4 that

\begin{aligned}\mathrm{Diam}_{S}(G_{2})&\leq\sum_{i=2}^{L}2^{5i+7}|R|^{i}\left(2^{2i}+L\lceil\mathrm{Diam}_{S}(G_{\mathrm{ab}})/|R|\rceil^{1/i}\right)\\ &\lesssim|R|^{L}\cdot\mathrm{Diam}_{S}(G_{\mathrm{ab}})^{1/2}\lesssim\mathrm{Diam}_{S}(G_{\mathrm{ab}})^{3/4}.\end{aligned}

It remains to prove that $\mathrm{Diam}_{S}(G_{\mathrm{ab}})\gg|R|^{4L}$ for the given range of $|R|$ . Using the fact that $|G_{\mathrm{ab}}|\geq|G|^{1/(2r^{L})}$ from Corollary 4, we observe that

|G_{\mathrm{ab}}|^{1/|R|}\geq|G|^{\frac{1}{2r^{L}|R|}}\geq(\log|G|)^{4L}\gg|R|^{4L}

(8)

for $|R|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}$ . Based on (8), it suffices to show $\mathrm{Diam}_{S}(G_{\mathrm{ab}})\gtrsim|G_{\mathrm{ab}}|^{1/|R|}$ for the given range of $|R|$ .

To prove $\mathrm{Diam}_{S}(G_{\mathrm{ab}})\gtrsim|G_{\mathrm{ab}}|^{1/|R|}$ , set $k:=|R|$ and consider the Cayley graph $\text{Cay}(G_{\mathrm{ab}},S_{G_{2}})$ . Since $G_{\mathrm{ab}}$ is abelian, every element at distance at most $\ell$ from the identity is determined by the net number of times each generator in $R$ is used, so we have $|B_{G_{\mathrm{ab}}}(\ell)|\leq|B_{k}(\ell)|$ , where $B_{G_{\mathrm{ab}}}(\ell):=\{g\in G_{\mathrm{ab}}:\mathrm{dist}_{S_{G_{2}}}(G_{2},g)\leq\ell\}$ is the ball of radius $\ell$ in $\text{Cay}(G_{\mathrm{ab}},S_{G_{2}})$ and $B_{k}(\ell):=\{\bm{z}\in\mathbb{Z}^{k}:\|\bm{z}\|_{1}\leq\ell\}$ is the $k$ -dimensional lattice ball of radius $\ell$ . Thus $\mathrm{Diam}_{S}(G_{\mathrm{ab}})\geq\min\{\ell:|B_{k}(\ell)|\geq|G_{\mathrm{ab}}|\}.$ It follows from Lemma E.2a in [18] that $|B_{k}(\ell)|\leq 2^{k\wedge\ell}{\ell+k\choose k}\leq(4\ell)^{k}$ for $\ell\geq k$ , which implies $(4\mathrm{Diam}_{S}(G_{\mathrm{ab}}))^{k}\geq|G_{\mathrm{ab}}|$ . Hence we have $\mathrm{Diam}_{S}(G_{\mathrm{ab}})\gtrsim|G_{\mathrm{ab}}|^{1/|R|}$ .

∎
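The lattice-ball estimate used in the last step of the proof can be checked directly for small parameters. The sketch below counts $|B_{k}(\ell)|$ with the standard closed form $\sum_{j}2^{j}\binom{k}{j}\binom{\ell}{j}$ (choose the $j$ nonzero coordinates, their signs, and their absolute values), compares it with a brute-force enumeration, and tests the chain $|B_{k}(\ell)|\leq 2^{k\wedge\ell}\binom{\ell+k}{k}\leq(4\ell)^{k}$ for $\ell\geq k$ . The function names are illustrative; the closed form is a standard counting fact and is not quoted from [18].

```python
from itertools import product
from math import comb

def lattice_ball_size(k, ell):
    # number of z in Z^k with ||z||_1 <= ell
    return sum(2 ** j * comb(k, j) * comb(ell, j) for j in range(min(k, ell) + 1))

def brute_force_ball_size(k, ell):
    # direct enumeration, feasible only for small k and ell
    return sum(1 for z in product(range(-ell, ell + 1), repeat=k)
               if sum(abs(v) for v in z) <= ell)

def bound_chain_holds(k, ell):
    # |B_k(ell)| <= 2^{min(k, ell)} * C(ell + k, k) <= (4 ell)^k, stated for ell >= k >= 1
    middle = 2 ** min(k, ell) * comb(ell + k, k)
    return lattice_ball_size(k, ell) <= middle <= (4 * ell) ** k

print(lattice_ball_size(2, 2), brute_force_ball_size(2, 2))
```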

Before we turn to the proof of Theorem 4, some preliminary results that will be useful are presented in the next section.

2.1. Preliminaries

We begin by recalling some standard notation and stating several properties of commutators. For $x,y\in G$ we write $[x,y]:=x^{-1}y^{-1}xy=[y,x]^{-1}$ and $x^{y}:=y^{-1}xy=x[x,y]=[y,x^{-1}]x$ . Further observe that for $x,y,z\in G$ , $[x,yz]=[x,z][x,y]^{z}=[x,z][x,y][[x,y],z]$ . Define $\rho(x,y):=[x,y]$ for $x,y\in G$ as the two-fold commutator, and inductively

\rho(x_{1},...,x_{i}):=[\rho(x_{1},...,x_{i-1}),x_{i}]\quad\text{for}\quad i\geq 3\quad\text{and}\quad x_{1},...,x_{i}\in G.

(9)
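Since sign and ordering conventions for commutators vary between references, it can be useful to test the identities above mechanically. The following sketch does this in the symmetric group $S_{3}$ , chosen only because it is small and non-abelian; the identities hold in any group. The permutation encoding and the function names are illustrative assumptions.

```python
from itertools import permutations

def mul(p, q):
    # product of permutations stored as tuples: (p * q)(i) = p[q[i]]
    return tuple(p[i] for i in q)

def inv(p):
    out = [0] * len(p)
    for i, v in enumerate(p):
        out[v] = i
    return tuple(out)

def comm(x, y):
    # [x, y] := x^{-1} y^{-1} x y
    return mul(mul(inv(x), inv(y)), mul(x, y))

def conj(x, y):
    # x^y := y^{-1} x y
    return mul(mul(inv(y), x), y)

def rho(*xs):
    # iterated commutator rho(x_1, ..., x_i) = [rho(x_1, ..., x_{i-1}), x_i]
    out = comm(xs[0], xs[1])
    for x in xs[2:]:
        out = comm(out, x)
    return out

def identities_hold(x, y, z):
    c = comm(x, y)
    return (c == inv(comm(y, x))
            and conj(x, y) == mul(x, c)
            and comm(x, mul(y, z)) == mul(comm(x, z), conj(c, z))
            and comm(x, mul(y, z)) == mul(mul(comm(x, z), c), comm(c, z)))

GROUP = list(permutations(range(3)))
print(all(identities_hold(x, y, z) for x in GROUP for y in GROUP for z in GROUP))
```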

Some standard properties of commutators are collected in the following propositions, whose proofs can easily be found in the literature, see e.g. [13], and are thus omitted. The following is a fairly well-known result, which follows from an induction argument using the three subgroup lemma.

Proposition 2.

The lower central series of a nilpotent group $G$ is a strongly central series, i.e., $[G_{i},G_{j}]$ is a subgroup of $G_{i+j}$ for all $i,j\geq 1$ .

Proposition 3.

For $i\geq 0$ , the map $\phi:G\times G_{i}\to G_{i+1}/G_{i+2}$ given by $\phi(g,h):=G_{i+2}[g,h]$ is anti-symmetric and bi-linear. Namely, the following hold for all $x\in G$ and $y,z\in G_{i}$ :

\begin{aligned}G_{i+2}[x,y]&=G_{i+2}[y,x]^{-1},\\ G_{i+2}[x,yz]&=G_{i+2}[x,y][x,z],\\ G_{i+2}[yz,x]&=G_{i+2}[y,x][z,x],\\ G_{i+2}[x^{\ell},y^{j}]&=G_{i+2}[x,y]^{\ell j}\quad\text{for all}\quad\ell,j\in\mathbb{Z}.\end{aligned}

Moreover, for $i\geq 2$ and $j\leq i$ , if $x_{1},\ldots,x_{i}\in G$ and $y\in G$ , then we have the following linearity in the $j$ -th component, i.e.,

\displaystyle G_{i+1}\rho(x_{1},...,x_{j-1},x_{j}y,x_{j+1},...,x_{i})=G_{i+1}\rho(x_{1},...,x_{i})\widehat{x}_{y,j}=G_{i+1}\widehat{x}_{y,j}\rho(x_{1},...,x_{i}).

(10)

where $\widehat{x}_{y,j}:=\rho(x_{1},...,x_{j-1},y,x_{j+1},...,x_{i})$ , and so

G_{i+1}\rho(a,x_{2},\dots,x_{i})=G_{i+1}\rho(b,x_{2},\dots,x_{i})\quad\text{ if }ab^{-1}\in G_{2}.

(11)

Let $R\subseteq S$ be such that $|\{s,s^{-1}\}\cap R|=1$ for all $s\in S$ . Now define inductively

S_{1}:=S\quad\text{and}\quad S_{i}:=\{[s,s^{\prime}]\mid s\in R,\>s^{\prime}\in S_{i-1}\}\quad\text{for}\quad i\geq 2.

Write $\widehat{S}_{i}:=\{G_{i+1}s:s\in S_{i}\}$ for $i\geq 1$ . The following proposition can be proved by induction on $i$ using Proposition 3. We omit the details and refer the reader to [14].

Proposition 4.

Assume that $\widehat{S}_{1}$ generates $G_{\mathrm{ab}}$ . Then $\widehat{S}_{i}$ generates the Abelian group $G_{i}/G_{i+1}$ for all $i\geq 1$ . In particular, $S$ generates $G$ if and only if $\widehat{S}_{1}$ generates $G_{\mathrm{ab}}$ .

Corollary 3.

For $i\geq 1$ and any $g\in G_{i}$ we can write

G_{i+1}g=G_{i+1}\prod_{(x_{2},...,x_{i})\in R^{i-1}}\prod_{s\in S}\rho(s^{\ell_{(x_{2},...,x_{i}),g}(s)},x_{2},...,x_{i}),

(12)

where $\{\ell_{(x_{2},...,x_{i}),g}(\cdot):(x_{2},...,x_{i})\in R^{i-1}\}$ are functions from $S$ to $\mathbb{Z}_{+}$ belonging to the set

A:=\{\ell:S\to\mathbb{Z}_{+}\text{ s.t. }\sum_{s\in S}|\ell(s)|\leq\mathrm{Diam}_{S}(G_{\mathrm{ab}})\text{ and }\ell(s)\cdot\ell(s^{-1})=0\text{ for all }s\in S\text{ such that }s\neq s^{-1}\},

where the second condition means for all $s\in S$ such that $s\neq s^{-1}$ we have either $\ell(s)=0$ or $\ell(s^{-1})=0$ for any $\ell\in A$ .

Proof.

We know from Proposition 4 that $\widehat{S}_{i}$ generates $G_{i}/G_{i+1}$ , i.e., for any $g\in G_{i}$ we can express $G_{i+1}g$ as a product of elements in $\widehat{S}_{i}=\{G_{i+1}s:s\in S_{i}\}$ . Observe that $\rho:S\times R^{i-1}\to S_{i}$ is surjective due to the definition of $S_{i}$ . Hence we can express

G_{i+1}g=G_{i+1}\prod_{(x_{2},...,x_{i})\in R^{i-1}}\prod_{s\in S}\rho(s,x_{2},...,x_{i})^{\tilde{\ell}(s,x_{2},\dots,x_{i})},

where $\tilde{\ell}(s,x_{2},\dots,x_{i})\in\mathbb{Z}$ corresponds to the number of times $\rho(s,x_{2},...,x_{i})$ appears. Let $h_{(x_{2},\dots,x_{i}),g}:=\prod_{s\in S}s^{\tilde{\ell}(s,x_{2},\dots,x_{i})}\in G$ so that by (10)

G_{i+1}\prod_{s\in S}\rho(s,x_{2},...,x_{i})^{\tilde{\ell}(s,x_{2},\dots,x_{i})}=G_{i+1}\rho(\prod_{s\in S}s^{\tilde{\ell}(s,x_{2},\dots,x_{i})},x_{2},\dots,x_{i})=G_{i+1}\rho(h_{(x_{2},\dots,x_{i}),g},x_{2},\dots,x_{i}).

Then we can take $h^{ab}_{(x_{2},\dots,x_{i}),g}=G_{2}h_{(x_{2},\dots,x_{i}),g}\in G_{\mathrm{ab}}$ so that

G_{i+1}\rho(h_{(x_{2},\dots,x_{i}),g},x_{2},...,x_{i})=G_{i+1}\rho(h^{ab}_{(x_{2},\dots,x_{i}),g},x_{2},...,x_{i}).

(13)

The above expression contains a slight abuse of notation on the right hand side as $h^{ab}_{(x_{2},\dots,x_{i}),g}$ is not an element of $G$ while $\rho$ was defined to have inputs from $G$ . By (11) we see that for any $h^{\prime}\in G$ such that $h^{\prime}h_{(x_{2},\dots,x_{i}),g}^{-1}\in G_{2}$ ,

G_{i+1}\rho(h_{(x_{2},\dots,x_{i}),g},x_{2},...,x_{i})=G_{i+1}\rho(h^{\prime},x_{2},...,x_{i})

and hence what essentially determines the value of (13) is $h^{ab}_{(x_{2},\dots,x_{i}),g}=G_{2}h_{(x_{2},\dots,x_{i}),g}$ , which clarifies the meaning of the right hand side of (13). The point of doing so is that we can identify $G_{i+1}\prod_{s\in S}\rho(s,x_{2},...,x_{i})^{\tilde{\ell}(s,x_{2},\dots,x_{i})}$ with $G_{i+1}\rho(h^{ab}_{(x_{2},\dots,x_{i}),g},x_{2},...,x_{i})$ for some $h^{ab}_{(x_{2},\dots,x_{i}),g}\in G_{\mathrm{ab}}$ . As $G_{\mathrm{ab}}$ can be generated by $\widehat{S}_{1}$ , there exists some function $\hat{\ell}(\cdot,x_{2},\dots,x_{i})$ that satisfies $\sum_{s\in S}|\hat{\ell}(s,x_{2},\dots,x_{i})|\leq\mathrm{Diam}_{S}(G_{\mathrm{ab}})$ such that

G_{2}\prod_{s\in S}s^{\hat{\ell}(s,x_{2},\dots,x_{i})}=h^{ab}_{(x_{2},\dots,x_{i}),g},

i.e.,

G_{i+1}\prod_{s\in S}\rho(s,x_{2},...,x_{i})^{\tilde{\ell}(s,x_{2},\dots,x_{i})}=G_{i+1}\prod_{s\in S}\rho(s,x_{2},...,x_{i})^{\hat{\ell}(s,x_{2},\dots,x_{i})}.

This explains the first condition in the definition of $A$ .

To explain the second condition in the definition of $A$ , we observe that since $\rho(s^{-1},x_{2},\dots,x_{i})=\rho(s,x_{2},\dots,x_{i})^{-1}$ for $s\in S$ , only one of $\{s,s^{-1}\}$ needs to appear in the expression above. Given $g\in G_{i}$ and $(x_{2},\dots,x_{i})$ , for each $s\in S$ , we choose $s_{+}\in\{s,s^{-1}\}$ such that simplifying the product

G_{i+1}\prod_{s^{\prime}\in\{s,s^{-1}\}}\rho(s^{\prime},x_{2},...,x_{i})^{\hat{\ell}(s^{\prime},x_{2},\dots,x_{i})}=G_{i+1}\rho(s_{+},x_{2},\dots,x_{i})^{\ell_{(x_{2},\dots,x_{i}),g}(s)}

leads to a non-negative power $\ell_{(x_{2},\dots,x_{i}),g}(s)$ . We can view $\ell_{(x_{2},\dots,x_{i}),g}(\cdot)$ as a function from $S$ to $\mathbb{Z}_{+}$ such that at most one of $\{\ell_{(x_{2},\dots,x_{i}),g}(s),\ell_{(x_{2},\dots,x_{i}),g}(s^{-1})\}$ is nonzero for $s\in S$ . It is straightforward to verify that $\sum_{s\in S}|\ell_{(x_{2},\dots,x_{i}),g}(s)|\leq\mathrm{Diam}_{S}(G_{\mathrm{ab}})$ .

Finally, the proof is concluded by applying (10) with the above choice of $\ell_{(x_{2},\dots,x_{i}),g}(\cdot)$ .

∎

Corollary 4.

For $1\leq i\leq L$ ,

|G_{i}/G_{i+1}|\leq|G_{\mathrm{ab}}|^{r(G)^{i-1}}\quad\text{ and }\quad|G|\leq|G_{\mathrm{ab}}|^{2r(G)^{L}}.

Remark 4.

From a technical standpoint, the second inequality above is why the term $r^{L}$ is present in the condition $|S|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}$ of Theorem 2. This inequality can be improved in various scenarios with extra knowledge on the group structure.

Proof.

By (10) and (12), for any $g\in G_{i}$ , we can express $G_{i+1}g$ as

G_{i+1}g=G_{i+1}\prod_{(x_{2},...,x_{i})\in R^{i-1}}\rho(h_{(x_{2},\dots,x_{i}),g},x_{2},...,x_{i})

for some $h_{(x_{2},\dots,x_{i}),g}:=\prod_{s\in S}s^{\ell_{(x_{2},...,x_{i}),g}(s)}$ where $\ell_{(x_{2},...,x_{i}),g}(\cdot)\in A$ for all $(x_{2},\dots,x_{i})\in R^{i-1}$ where $A$ is as in Corollary 3. By the same argument as in the proof of Corollary 3, for any given $h_{(x_{2},\dots,x_{i}),g}\in G$ , we can take $h^{ab}_{(x_{2},\dots,x_{i}),g}=G_{2}h_{(x_{2},\dots,x_{i}),g}\in G_{\mathrm{ab}}$ so that

G_{i+1}\rho(h_{(x_{2},\dots,x_{i}),g},x_{2},...,x_{i})=G_{i+1}\rho(h^{ab}_{(x_{2},\dots,x_{i}),g},x_{2},...,x_{i}).

That is, for any $G_{i+1}g\in G_{i}/G_{i+1}$ , we can define a function $\phi_{g}:R^{i-1}\to G_{\mathrm{ab}}$ by $\phi_{g}(x_{2},\dots,x_{i})=h^{ab}_{(x_{2},\dots,x_{i}),g}$ , which determines $G_{i+1}g$ . Hence $|G_{i}/G_{i+1}|$ is at most the number of functions from $R^{i-1}$ to $G_{\mathrm{ab}}$ , namely $|G_{\mathrm{ab}}|^{|R|^{i-1}}$ . Multiplying over $i\in[L]$ gives

|G|=\prod_{i=1}^{L}|G_{i}/G_{i+1}|\leq\prod_{i=1}^{L}|G_{\mathrm{ab}}|^{|R^{i-1}|}\leq|G_{\mathrm{ab}}|^{2|R|^{L}}.

Taking $S$ such that $|R|=r(G)$ gives the desired inequality.

∎

2.2. Proof of Theorem 4

We begin with the following estimate that plays a key role in the proof of Theorem 4.

Lemma 2.1.

Let $i\in[L]$ be fixed. For any $1\leq m\leq\mathrm{Diam}_{S}(G_{\mathrm{ab}})$ and $(s,x_{2},...,x_{i})\in S\times R^{i-1}$

|G_{i+1}\rho(s^{m},x_{2},...,x_{i})|\leq 2^{5i+6}\left(2^{2i}+Lm^{1/i}\right).

In what follows, we first present the proof of Theorem 4 given Lemma 2.1 and then complete the proof of Lemma 2.1.

Proof of Theorem 4.

To simplify notation, we abbreviate $D:=\mathrm{Diam}_{S}(G_{\mathrm{ab}})$ . Recall from (5) that $S_{G_{i+1}}=\{G_{i+1}s:s\in S\}$ for $i\in[L]$ . Let $\mathrm{dist}_{S,i}(\cdot,\cdot)$ denote the graph distance on the Cayley graph $\text{Cay}(G_{i}/G_{i+1},S_{G_{i+1}})$ and write $|G_{i+1}g|:=\operatorname{dist}_{S,i}(\mathrm{id},g)$ for $g\in G$ .

The first inequality $\mathrm{Diam}_{S}(G_{2})\leq\sum_{i=2}^{L}\mathrm{Diam}_{S}(G_{i}/G_{i+1})$ follows from inductively applying (7) with $H=G_{i+1}$ and $H^{\prime}=G_{i}$ for $2\leq i\leq L$ .

We turn to the second inequality. The goal is to prove that for every $2\leq i\leq L$ ,

\mathrm{Diam}_{S}(G_{i}/G_{i+1})\leq 2^{5i+7}|R|^{i}\left(2^{2i}+L\lceil D/|R|\rceil^{1/i}\right).

(14)

By Corollary 3, in order to prove (14) it suffices to show that for any $g\in G_{i}$ ,

|G_{i+1}\prod_{(x_{2},...,x_{i})\in R^{i-1}}\prod_{s\in S}\rho(s^{\ell_{(x_{2},...,x_{i}),g}(s)},x_{2},...,x_{i})|\leq 2^{5i+7}|R|^{i}\left(2^{2i}+L\lceil D/|R|\rceil^{1/i}\right),

(15)

where $\{\ell_{(x_{2},...,x_{i}),g}(\cdot):(x_{2},...,x_{i})\in R^{i-1}\}$ are functions defined in Corollary 3, belonging to the set

A:=\{\ell:S\to\mathbb{Z}_{+}\text{ s.t. }\sum_{s\in S}|\ell(s)|\leq D\text{ and }\ell(s)\cdot\ell(s^{-1})=0\text{ for all }s\in S\text{ such that }s\neq s^{-1}\}.

For any $\ell\in A$ and $s\in S$ , we have $\ell(s)\leq D$ . Applying Lemma 2.1 to $\ell(s)$ and using the triangle inequality, we can obtain

|G_{i+1}\prod_{s\in S}\rho(s^{\ell(s)},x_{2},...,x_{i})|\leq 2^{5i+6}(2^{2i}|S|+L\max_{\ell\in A}\{\sum_{s\in S}\ell(s)^{1/i}\}).

Given the constraint $\ell\in A$ , a simple application of Lagrange multipliers gives

\max_{\ell\in A}\{\sum_{s\in S}\ell(s)^{1/i}\}\leq|R|\cdot\lceil D/|R|\rceil^{1/i}.

Plugging this into the previous display gives

|G_{i+1}\prod_{s\in S}\rho(s^{\ell(s)},x_{2},...,x_{i})|\leq 2^{5i+6}(2^{2i}|S|+L|R|\cdot\lceil D/|R|\rceil^{1/i}).

Summing over $(x_{2},...,x_{i})\in R^{i-1}$ using the triangle inequality gives the required bound in (15) and thus completes the proof.

∎
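The optimisation step in the proof above, namely that $\sum_{s}\ell(s)^{1/i}\leq|R|\cdot\lceil D/|R|\rceil^{1/i}$ whenever at most $|R|$ entries of $\ell$ are nonzero and $\sum_{s}\ell(s)\leq D$ , can be checked by brute force for small parameters. In the sketch below, `k` stands for $|R|$ and the vector `ell` lists only the entries that are allowed to be nonzero; these names and the parameter ranges are illustrative assumptions.

```python
from itertools import product

def max_root_sum(k, D, i):
    # maximum of sum_s ell(s)^(1/i) over integer vectors ell >= 0 of length k with sum <= D
    best = 0.0
    for ell in product(range(D + 1), repeat=k):
        if sum(ell) <= D:
            best = max(best, sum(v ** (1.0 / i) for v in ell))
    return best

def concavity_bound(k, D, i):
    # k * ceil(D / k)^(1/i), with the ceiling computed in integer arithmetic
    return k * (-(-D // k)) ** (1.0 / i)

def bound_holds(k, D, i, tol=1e-9):
    return max_root_sum(k, D, i) <= concavity_bound(k, D, i) + tol

print(max_root_sum(3, 7, 2), concavity_bound(3, 7, 2))
```

The bound reflects the concavity of $x\mapsto x^{1/i}$ : for a fixed total the sum is largest when the mass is spread evenly over the $k$ available entries.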

Proof of Lemma 2.1. The following simple estimate will play a major role in our proof. For $2\leq j\leq i$ , $(x_{1},...,x_{i})\in S\times R^{i-1}$ and $n\in\mathbb{N}$ , we have

|\rho(x_{1}^{n},x_{2}^{n},...,x_{j}^{n},x_{j+1},...,x_{i})|\leq 2^{i+2}n.

(16)

This follows from the fact that

|\rho(x_{1},...,x_{i})|\leq 2(|x_{i}|+|\rho(x_{1},...,x_{i-1})|),

which is a simple consequence of the definition $\rho(x_{1},\dots,x_{i})=[\rho(x_{1},\dots,x_{i-1}),x_{i}]$ .

Note that if $m=n^{j}$ for some $n\in\mathbb{N}$ and $j\geq 1$ , then by (10) we can simply write

G_{i+1}\rho(s^{m},x_{2},...,x_{i})=G_{i+1}\rho(s^{n},x_{2}^{n},...,x_{j}^{n},x_{j+1},...,x_{i})

and use (16) to conclude the proof. Otherwise, we can still try to decompose $m$ as a sum of terms of the form $\{n^{j}:j\in[i],n\in\mathbb{N}\}$ , which helps improve the upper bound on $|G_{i+1}\rho(s^{m},x_{2},...,x_{i})|$ . In other words, our goal is to express $G_{i+1}\rho(s^{m},x_{2},...,x_{i})$ as the product of elements of the form

\{G_{i+1}\rho(s^{n},x_{2}^{n},...,x_{j}^{n},x_{j+1},...,x_{i}):j\in[i],n\in\mathbb{N}\}.

To find the decomposition of $G_{i+1}\rho(s^{m},x_{2},...,x_{i})$ we can employ a greedy procedure to search for some set $W(j)\subseteq\mathbb{N}$ for each $j\in[i]$ so that $m=\sum_{j\in[i]}\sum_{n\in W(j)}n^{j}$ . In what follows we first define $W(i)$ and then find $W(j)$ for $j=i-1,i-2,\dots,1$ . Setting $E_{1}:=m$ and $D_{1}:=\lfloor m^{1/i}\rfloor$ , we define $E_{a}$ and $D_{a}$ inductively for $a\geq 1$ as follows.

For $a\geq 1$ such that $E_{a}\geq 4^{i^{2}}$ , let

E_{a+1}:=E_{a}-D_{a}^{i},\quad D_{a+1}:=\lfloor E_{a+1}^{1/i}\rfloor\quad\text{and}\quad y_{a}:=\rho(s^{D_{a}},x_{2}^{D_{a}},...,x_{i}^{D_{a}}).

We stop at the first time when $E_{a}<4^{i^{2}}$ and record $\ell_{i}:=\min\{a:E_{a}<4^{i^{2}}\}$ . Set $W(i):=\{D_{a}:1\leq a<\ell_{i}\}$ .

For each $j=i-1,i-2,\dots,2$ , in order to find $W(j)$ we proceed as follows: let

E_{a+1}:=E_{a}-D_{a}^{j},\quad D_{a+1}:=\lfloor E_{a+1}^{1/j}\rfloor\quad\text{and}\quad y_{a}:=\rho(s^{D_{a}},x_{2}^{D_{a}},...,x_{j}^{D_{a}},x_{j+1},...,x_{i}).

We stop at the first time when $E_{a}<4^{j^{2}}$ and record $\ell_{j}:=\min\{a:E_{a}<4^{j^{2}}\}$ . Set $W(j):=\{D_{a}:\ell_{j+1}\leq a<\ell_{j}\}$ .

Finally, we set $y_{\ell_{2}}:=\rho(s^{E_{\ell_{2}}},x_{2},...,x_{i})$ , $y:=\prod_{a=1}^{\ell_{2}}y_{a}$ and $W(1):=\{E_{\ell_{2}}\}$ .
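The greedy procedure just described can be sketched in a few lines of Python. The code works only with the integer $m$ and ignores the group elements $y_{a}$ ; it returns the list of pairs $(n,j)$ with $m=\sum n^{j}$ , in the order in which they are produced. The function names are illustrative, the nilpotency class is taken to be $L=i$ in the comparison, and the budget charges $2^{i+2}n$ per term with $j\geq 2$ (the estimate (16)) plus the final remainder.

```python
def integer_root(x, j):
    # largest integer n with n**j <= x, for integers x >= 0 and j >= 1
    lo, hi = 0, 1
    while hi ** j <= x:
        hi *= 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** j <= x:
            lo = mid
        else:
            hi = mid
    return lo

def greedy_decomposition(m, i):
    # peel off j-th powers for j = i, i-1, ..., 2 while the remainder is at least 4^(j^2)
    E, terms = m, []
    for j in range(i, 1, -1):
        while E >= 4 ** (j * j):
            D = integer_root(E, j)
            terms.append((D, j))
            E -= D ** j
    terms.append((E, 1))  # final remainder, used with exponent 1
    return terms

def word_length_budget(terms, i):
    # 2^(i+2) * n for each term with j >= 2, plus the remainder itself
    return sum(2 ** (i + 2) * n for n, j in terms if j >= 2) + terms[-1][0]

def within_lemma_bound(m, i, L):
    # compare with 2^(5i+6) * (2^(2i) + L * floor(m^(1/i))), which is at most the bound of Lemma 2.1
    budget = word_length_budget(greedy_decomposition(m, i), i)
    return budget <= 2 ** (5 * i + 6) * (2 ** (2 * i) + L * integer_root(m, i))

print(greedy_decomposition(10 ** 9 + 7, 3))
```

For example, `greedy_decomposition(300, 2)` peels off $17^{2}=289$ and leaves the remainder $11$ , which is below the threshold $4^{4}=256$ .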

By Proposition 3 we have

G_{i+1}\rho(s^{m},x_{2},...,x_{i})=G_{i+1}y.

That is, it suffices to upper bound $|G_{i+1}y|$ . For $1\leq a<\ell_{2}$ , it follows from (16) and the definition of $y_{a}$ that $|G_{i+1}y_{a}|\leq 2^{i+2}D_{a}$ . It is easy to see that $D_{a+1}\leq D_{a}$ for all $\ell_{j+1}\leq a<\ell_{j}$ and thus $D_{a}\leq D_{\ell_{j+1}}$ for $\ell_{j+1}\leq a<\ell_{j}$ . Lastly, by definition, $|G_{i+1}y_{\ell_{2}}|\leq E_{\ell_{2}}\leq 4^{2^{2}}=2^{8}$ . Combining these facts gives

\begin{aligned}|G_{i+1}y|&\leq\sum_{a=1}^{\ell_{2}}|G_{i+1}y_{a}|=\sum_{a=1}^{\ell_{i}-1}|G_{i+1}y_{a}|+\sum_{j=2}^{i-1}\sum_{\ell_{j+1}\leq a<\ell_{j}}|G_{i+1}y_{a}|+|G_{i+1}y_{\ell_{2}}|\\ &\leq\sum_{a=1}^{\ell_{i}-1}2^{i+2}D_{a}+\sum_{j=2}^{i-1}(\ell_{j}-\ell_{j+1})2^{i+2}D_{\ell_{j+1}}+2^{8}.\end{aligned}

(17)

We first upper bound the second term in (17). By definition $E_{\ell_{j+1}}<4^{(j+1)^{2}}$ and thus $D_{\ell_{j+1}}\leq E_{\ell_{j+1}}^{1/j}\leq 4^{(j+1)^{2}/j}\leq 4^{j+3}$ . To bound $\ell_{j}-\ell_{j+1}$ for $2\leq j\leq i-1$ , note that for $\ell_{j+1}\leq a<\ell_{j}$ , $E_{a}\geq 4^{j^{2}}$ and thus $D_{a}\geq 4^{j}$ , which implies that

4^{j^{2}}\leq E_{\ell_{j}-1}\leq E_{\ell_{j}-2}-4^{j^{2}}\leq E_{\ell_{j+1}}-(\ell_{j}-1-\ell_{j+1})4^{j^{2}},

i.e.,

(\ell_{j}-\ell_{j+1})\leq\frac{E_{\ell_{j+1}}}{4^{j^{2}}}<\frac{4^{(j+1)^{2}}}{4^{j^{2}}}=4^{2j+1}.

It follows that the second term in (17) satisfies

\sum_{j=2}^{i-1}(\ell_{j}-\ell_{j+1})2^{i+2}D_{\ell_{j+1}}\leq 2^{i+2}\sum_{j=2}^{i-1}4^{2j+1}4^{j+3}\leq 2^{7i+5}.

(18)

Next, we estimate $\sum_{a=1}^{\ell_{i}-1}D_{a}$ in (17). Observe that

E_{2}=m-\lfloor m^{1/i}\rfloor^{i}\leq(\lfloor m^{1/i}\rfloor+1)^{i}-\lfloor m^{1/i}\rfloor^{i}\leq 2^{i}\lfloor m^{1/i}\rfloor^{(i-1)}\leq 2^{i}m^{\frac{i-1}{i}}.

(19)
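The estimate (19) only involves integers and can be tested exhaustively on a range of values. The sketch below checks the integer form $m-\lfloor m^{1/i}\rfloor^{i}\leq 2^{i}\lfloor m^{1/i}\rfloor^{i-1}$ , which is the middle inequality of (19); the function names and the tested ranges are illustrative assumptions.

```python
def integer_root(x, j):
    # largest integer n with n**j <= x, for integers x >= 0 and j >= 1
    lo, hi = 0, 1
    while hi ** j <= x:
        hi *= 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** j <= x:
            lo = mid
        else:
            hi = mid
    return lo

def remainder_after_one_step(m, i):
    # E_2 = m - floor(m^(1/i))^i
    return m - integer_root(m, i) ** i

def bound_19_holds(m, i):
    # E_2 <= 2^i * floor(m^(1/i))^(i-1), which is at most 2^i * m^((i-1)/i)
    n = integer_root(m, i)
    return remainder_after_one_step(m, i) <= 2 ** i * n ** (i - 1)

print(all(bound_19_holds(m, i) for i in range(2, 6) for m in range(1, 5000)))
```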

Repeating the same calculation for $2^{i}m^{\frac{i-1}{i}}$ yields that $E_{3}\leq 2^{i(1+\frac{i-1}{i})}m^{(\frac{i-1}{i})^{2}}$ . More generally, for $1\leq a<\ell_{i}$ , since $\sum_{h=0}^{\infty}(\frac{i-1}{i})^{h}=i$ ,

E_{a+1}\leq(2^{i})^{\sum_{h=0}^{a-1}(\frac{i-1}{i})^{h}}m^{(\frac{i-1}{i})^{a}}\leq 2^{i^{2}}m^{(\frac{i-1}{i})^{a}}.

Since for $2\leq a<\ell_{i}$ , $D_{a}=\lfloor E_{a}^{1/i}\rfloor\leq\lfloor E_{2}^{1/i}\rfloor\leq E_{2}^{1/i}$ , by (19) and the fact that $D_{1}:=\lfloor m^{1/i}\rfloor$ we have

\sum_{a=1}^{\ell_{i}-1}D_{a}\leq\lfloor m^{1/i}\rfloor+\ell_{i}\cdot 2m^{\frac{i-1}{i^{2}}}.

It remains to upper bound $\ell_{i}$ . By definition, we have $\ell_{i}\leq\min\{a:2^{i^{2}}m^{(\frac{i-1}{i})^{a}}<4^{i^{2}}\}$ . Simple calculation shows that for any $2\leq i\leq L$ ,

\ell_{i}\leq\left\lceil\frac{\log\log m-\log\log(2^{i^{2}})}{\log(\frac{i}{i-1})}\right\rceil\leq\frac{\log\log m}{\log(\frac{L}{L-1})}\leq 2(L-1)\log\log m,

where the last inequality follows from the fact that $\log(1+x)\geq x/2$ for $x\in[0,1]$ . Therefore, for $2\leq i\leq L$ and $1\leq m\leq D$ ,

\sum_{a=1}^{\ell_{i}-1}D_{a}\leq\lfloor m^{1/i}\rfloor+2(L-1)(\log\log m)\cdot 2m^{\frac{i-1}{i^{2}}}\leq 4L\cdot\max\{m^{1/i},(\log\log m)m^{\frac{i-1}{i^{2}}}\}.

Noting that $\max_{1\leq m\leq e^{i^{2}}}\{m^{1/i},(\log\log m)m^{\frac{i-1}{i^{2}}}\}\leq(\log i)e^{i}$ and $\max_{m>e^{i^{2}}}\{m^{1/i},(\log\log m)m^{\frac{i-1}{i^{2}}}\}\leq(\log i)m^{1/i}$ , we have

\sum_{a=1}^{\ell_{i}-1}D_{a}\leq 4L(\log i)e^{i}m^{1/i}\leq 2^{4i+2}Lm^{1/i}.

(20)

Finally, plugging the upper bounds in (18) and (20) into (17) yields, for $2\leq i\leq L$ ,

\begin{aligned}|G_{i+1}y|&\leq 2^{i+2}\cdot 2^{4i+2}Lm^{1/i}+2^{7i+5}+2^{8}\leq 2^{5i+4}Lm^{1/i}+2^{7i+6}\\ &\leq 2^{5i+6}\left(2^{2i}+Lm^{1/i}\right),\end{aligned}

which completes the proof of Lemma 2.1. ∎

3. Reduction to Abelianization

Let $X_{t}$ be a rate 1 simple random walk on $\text{Cay}(G,S)$ and let $Y_{t}:=G_{2}X_{t}$ be the projected random walk of $X_{t}$ onto $G_{\mathrm{ab}}$ , which is a rate 1 simple random walk on $\text{Cay}(G_{\mathrm{ab}},S_{G_{2}})$ . Let $t_{\mathrm{mix}}^{G,S}(\varepsilon)$ denote the $\varepsilon$ -mixing time of the random walk on $\text{Cay}(G,S)$ and $t_{\mathrm{mix}}^{G_{\mathrm{ab}},S}(\varepsilon)$ the $\varepsilon$ -mixing time for the projected random walk on $\text{Cay}(G_{\mathrm{ab}},S_{G_{2}})$ . To simplify notation we will drop the $S$ in the superscript and write $t_{\mathrm{mix}}^{G}(\varepsilon)$ instead when the choice of $S$ is clear from the context.

Since $Y_{t}$ is the projection of $X_{t}$ onto the abelianization $G_{\mathrm{ab}}$ we can observe

\|\mathbb{P}_{G_{2}}(Y_{t}=\cdot)-\pi_{G_{\mathrm{ab}}}\|_{\mathrm{TV}}\leq\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}},

(21)

which implies $t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon)\leq t_{\mathrm{mix}}^{G}(\varepsilon)$ . Naturally we are interested in the mixing behavior of $X_{t}$ in comparison to that of $Y_{t}$ , that is, we hope to understand to what extent the mixing behavior of the random walk on $G$ is governed by its projection onto the abelianization $G_{\mathrm{ab}}$ . It turns out that for a nilpotent group $G$ of bounded step and rank, when the generator set $S$ is not too large, the mixing of $X_{t}$ is completely governed by that of $Y_{t}$ .

Theorem 2.

t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon)\leq t_{\mathrm{mix}}^{G}(\varepsilon)\leq t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon-\delta)

when $|G|$ is sufficiently large (more precisely, when $|G|\exp(-(\log|G|)^{L})\leq\delta$ ).

It is well known that the relaxation time is characterized by the exponential decay rate of the total variation distance between the walk and its equilibrium (see, e.g., Corollary 12.7 in [32]). As a consequence of the proof of Theorem 2, we obtain the following characterization of the relaxation time of $X_{t}$ in terms of its projection $Y_{t}$ .

Corollary 1.

Let $t^{G}_{\mathrm{rel}}$ and $t^{G^{\mathrm{ab}}}_{\mathrm{rel}}$ be the relaxation times of the walks $X_{t}$ and $Y_{t}$ respectively. Then

t^{G^{\mathrm{ab}}}_{\mathrm{rel}}\leq t^{G}_{\mathrm{rel}}\leq\max\{t^{G^{\mathrm{ab}}}_{\mathrm{rel}},|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2}\}.

In particular, when $|S|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}$ we have $t^{G}_{\mathrm{rel}}=t^{G^{\mathrm{ab}}}_{\mathrm{rel}}$ .

Before moving on to proving Theorem 2, we first explain why Corollary 1 is an easy consequence of the proof of Theorem 2 and delay its proof to the end of this section. It is useful to observe the following inequality

\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}\leq\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}}+\|\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}},

(22)

where $\pi_{A}$ denotes the uniform distribution over the set $A\subseteq G$ . We will establish that in the given regime of $|S|$ , the second term $\|\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}$ in the above inequality is the leading order term that determines the time of mixing. Moreover, Lemma 3.1 shows this term is fully characterized by the projected random walk $Y_{t}$ on $G_{\mathrm{ab}}$ .

Using the interpretation that the relaxation time is the exponential decay rate of the total variation distance, by taking power $1/t$ and letting $t\to\infty$ on both sides of (21) and (22), we shall see that $t^{G}_{\mathrm{rel}}=t^{G^{\mathrm{ab}}}_{\mathrm{rel}}$ if $|S|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}$ .

As a direct consequence of Theorem 2 and Corollary 1 we can now present the proof of Theorem 3.

Theorem 3.

Suppose $G$ is a nilpotent group with rank $r$ and step $L$ such that either (i) $G_{ab}\cong\mathbb{Z}^{r}_{m}$ where $m\in\mathbb{N}$ or (ii) $G$ is a $p$ -group. Suppose the rank and step satisfy $Lr^{L+1}\leq\frac{\log|G|}{16\log\log|G|}$ . For any symmetric set of generators $S\subseteq G$ of minimal size and any given $\varepsilon>0$ , the mixing time $t^{G,S}_{mix}(\varepsilon)$ is the same up to smaller order terms, and the relaxation time $t_{\mathrm{rel}}^{G,S}$ is the same.

Proof.

Let $S$ be a minimal size symmetric set of generators for $G$ and let $S_{G_{2}}=\{G_{2}s:s\in S\}$ . By Proposition 4 we can see that $S_{G_{2}}$ is a minimal size symmetric set of generators for $G_{\mathrm{ab}}$ . As $G_{\mathrm{ab}}$ is an abelian group, it can be expressed in the form

G_{\mathrm{ab}}\cong\mathbb{Z}_{m_{1}}\oplus\mathbb{Z}_{m_{2}}\oplus\cdots\oplus\mathbb{Z}_{m_{r}},

(23)

where $m_{1},\dots,m_{r}\in\mathbb{Z}$ and $r$ is the rank of $G$ . In general the choice of $m_{1},\dots,m_{r}$ is not unique (e.g., $\mathbb{Z}_{2}\oplus\mathbb{Z}_{15}\cong\mathbb{Z}_{6}\oplus\mathbb{Z}_{5}$ ), but when $G$ satisfies the assumption in the statement the expression in (23) is unique: in case (i) $G_{ab}\cong\mathbb{Z}^{r}_{m}$ ; in case (ii) we can write $G_{\mathrm{ab}}\cong\mathbb{Z}_{p^{\alpha_{1}}}\oplus\cdots\oplus\mathbb{Z}_{p^{\alpha_{r}}}$ for some $\alpha_{i}\in\mathbb{Z}$ with $\alpha_{1}\leq\dots\leq\alpha_{r}$ . Hence, any minimal size symmetric set $S_{G_{2}}$ that generates $G_{\mathrm{ab}}$ uniquely corresponds to $\{\pm e_{i}:i\in[r]\}$ , the standard basis vectors together with their negatives. Thus the mixing time $t^{G_{\mathrm{ab}},S}_{mix}(\varepsilon)$ on $G_{\mathrm{ab}}$ for any such $S$ is equal to the mixing time of the walk on $\mathbb{Z}^{r}_{m}$ (or $\mathbb{Z}_{p^{\alpha_{1}}}\oplus\cdots\oplus\mathbb{Z}_{p^{\alpha_{r}}}$ ) with generators $\{\pm e_{i}:i\in[r]\}$ .

As our assumptions on $r$ and $L$ guarantee $|S|=2r\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}$ , we can apply Theorem 2 to show that the mixing time $t^{G,S}_{mix}(\varepsilon)$ is equal (up to smaller order terms) to the mixing time of the walk on $\mathbb{Z}^{r}_{m}$ (or $\mathbb{Z}_{p^{\alpha_{1}}}\oplus\cdots\oplus\mathbb{Z}_{p^{\alpha_{r}}}$ ) with generators $\{\pm e_{i}:i\in[r]\}$ for any minimal size symmetric set of generators $S$ of $G$ . The result for the relaxation time follows similarly from Corollary 1. ∎

3.1. Proofs

Lemma 3.1.

For $t\geq 0$ ,

\|\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}=\|\mathbb{P}_{G_{2}}(Y_{t}=\cdot)-\pi_{G_{\mathrm{ab}}}\|_{\mathrm{TV}}.

Proof.

Write $G_{\mathrm{ab}}=\{G_{2}g_{i}:1\leq i\leq|G_{\mathrm{ab}}|\}$ . One can easily check that starting from the initial distribution ${\pi_{G_{2}}}$ , for any $1\leq i\leq|G_{\mathrm{ab}}|$ and $h,h^{\prime}\in G_{2}g_{i}$ , $\mathbb{P}_{\pi_{G_{2}}}(X_{t}=h)=\mathbb{P}_{\pi_{G_{2}}}(X_{t}=h^{\prime})$ . Hence,

\begin{aligned}2\|\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}&=\sum_{i=1}^{|G_{\mathrm{ab}}|}\sum_{x\in G_{2}g_{i}}\bigg|\mathbb{P}_{\pi_{G_{2}}}(X_{t}=x)-\frac{1}{|G|}\bigg|=\sum_{i=1}^{|G_{\mathrm{ab}}|}\bigg|\mathbb{P}_{\mathrm{id}}(X_{t}\in G_{2}g_{i})-\frac{|G_{2}|}{|G|}\bigg|\\&=\sum_{i=1}^{|G_{\mathrm{ab}}|}\bigg|\mathbb{P}_{G_{2}}(Y_{t}=G_{2}g_{i})-\frac{1}{|G_{\mathrm{ab}}|}\bigg|\\&=2\|\mathbb{P}_{G_{2}}(Y_{t}=\cdot)-\pi_{G_{\mathrm{ab}}}\|_{\mathrm{TV}}.\end{aligned}

∎
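As a sanity check of Lemma 3.1 and of (21), the following minimal sketch computes both sides on one small example. The choice of group (the Heisenberg group over $\mathbb{Z}_{3}$, which has step 2 and $G_{2}=\{(0,0,c)\}$), the generators, and all function names are assumptions made for illustration only; they do not come from the paper. For simplicity the sketch uses discrete-time steps, for which the same coset-uniformity argument applies.

```python
from itertools import product

Q = 3  # illustrative choice: Heisenberg group over Z_3 (27 elements, step 2)

def mul(g, h):
    """Group law (a,b,c)(a',b',c') = (a+a', b+b', c+c'+a*b') mod Q."""
    return ((g[0] + h[0]) % Q, (g[1] + h[1]) % Q, (g[2] + h[2] + g[0] * h[1]) % Q)

def inv(g):
    a, b, c = g
    return ((-a) % Q, (-b) % Q, (a * b - c) % Q)

ELEMENTS = list(product(range(Q), repeat=3))
X_GEN, Y_GEN = (1, 0, 0), (0, 1, 0)
S = [X_GEN, inv(X_GEN), Y_GEN, inv(Y_GEN)]  # symmetric generating set

def step(dist):
    """One step of the simple random walk g -> g*s with s uniform in S."""
    new = dict.fromkeys(ELEMENTS, 0.0)
    for g, p in dist.items():
        for s in S:
            new[mul(g, s)] += p / len(S)
    return new

def tv_to_uniform(dist):
    u = 1.0 / len(dist)
    return 0.5 * sum(abs(p - u) for p in dist.values())

def project(dist):
    """Push a law on G forward to G_ab = G/G_2, identified with Z_Q^2."""
    out = {ab: 0.0 for ab in product(range(Q), repeat=2)}
    for (a, b, _), p in dist.items():
        out[(a, b)] += p
    return out

def lemma_3_1_check(n_steps):
    """Rows (lhs, rhs, tv_id): both sides of Lemma 3.1 and the TV distance from id."""
    from_g2 = {g: (1.0 / Q if g[:2] == (0, 0) else 0.0) for g in ELEMENTS}
    from_id = {g: (1.0 if g == (0, 0, 0) else 0.0) for g in ELEMENTS}
    rows = []
    for _ in range(n_steps + 1):
        lhs = tv_to_uniform(from_g2)           # walk on G started from pi_{G_2}
        rhs = tv_to_uniform(project(from_id))  # projected walk started at the coset G_2
        rows.append((lhs, rhs, tv_to_uniform(from_id)))
        from_g2, from_id = step(from_g2), step(from_id)
    return rows
```

Calling `lemma_3_1_check(n)` returns, for each step up to `n`, the two sides of Lemma 3.1 (which should agree up to floating-point error) together with the distance from the identity start, which by (21) should dominate both.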

It remains to upper bound the difference $\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}}$ .

Lemma 3.2.

For $t\geq 0$ ,

\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}}\leq\frac{|G|}{2}\exp\left(-\frac{t}{|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2}}\right).

Remark 5.

The conclusion in Lemma 3.2 holds if we replace $G_{2}$ by any subgroup $H$ of $G$ .

Proof.

Let $P$ be the transition matrix of the simple random walk $X_{t}$ on $\text{Cay}(G,S)$ . For $t\geq 0$ , we can define the continuous time kernel by $P_{t}:=\sum_{n=0}^{\infty}\frac{(tP)^{n}}{n!}e^{-t}$ .

Consider the linear subspace of functions

\mathcal{A}:=\{f:G\to\mathbb{R}\big{|}\sum_{x\in G_{2}g}f(x)=0\text{ for all }g\in G\}.

We now show that $\mathcal{A}$ is invariant under the transition matrix $P$ , i.e., $Pf\in\mathcal{A}$ for all $f\in\mathcal{A}$ . For any $g\in G$ ,

\begin{aligned}\sum_{x\in G_{2}g}Pf(x)&=\sum_{h\in G_{2}}Pf(hg)=\sum_{h\in G_{2}}\sum_{y\in G}P(hg,y)f(y)=\sum_{h\in G_{2}}\sum_{z\in G}P(hg,hz)f(hz)\\&=\sum_{z\in G}P(g,z)\left(\sum_{h\in G_{2}}f(hz)\right)=0,\end{aligned}

where the second line uses the fact that $P$ is translation invariant, i.e., $P(hg,hz)=P(g,z)$ for any $h,g,z\in G$ and that $f\in\mathcal{A}$ .

Let $\tilde{P}$ denote the transition matrix of the SRW on $\text{Cay}(G,\tilde{S})$ with $\tilde{S}:=G_{2}$ . We can also check that $\tilde{P}f=0$ for all $f\in\mathcal{A}$ , i.e.,

\tilde{P}f(x)=\sum_{y\in G}\tilde{P}(x,y)f(y)=\sum_{y\in G}\frac{\mathbf{1}\{x^{-1}y\in G_{2}\}f(y)}{|G_{2}|}=0\quad\text{ for all }x\in G.

Hence, for $f\in\mathcal{A}$ we have that (below $X$ and $Y$ are independent)

\begin{aligned}\tilde{\mathcal{E}}(f,f)&:=\frac{1}{2}\mathbb{E}_{X\sim\pi,Y\sim\mathrm{Unif}(\tilde{S})}[(f(X)-f(XY))^{2}]=\|f\|_{2}^{2}-\mathbb{E}_{\pi}[f(\tilde{P}f)]=\|f\|_{2}^{2},\\\mathcal{E}(f,f)&:=\frac{1}{2}\mathbb{E}_{X\sim\pi,Y\sim\mathrm{Unif}(S)}[(f(X)-f(XY))^{2}]=\|f\|_{2}^{2}-\mathbb{E}_{\pi}[f(Pf)]\geq\|f\|_{2}^{2}-\|f\|_{2}\|Pf\|_{2},\end{aligned}

where $\|f\|_{2}:=\left(\mathbb{E}_{\pi}[f^{2}]\right)^{1/2}=\left(\sum_{g\in G}\frac{1}{|G|}f^{2}(g)\right)^{1/2}$ is the $\ell_{2}$ norm of $f$ . Applying Theorem 4.4 in [2], which gives a comparison of Dirichlet forms for two sets of generators, we see that for the two symmetric random walks on the finite group $G$ with transition matrices $P$ and $\tilde{P}$ defined as before,

\tilde{\mathcal{E}}(f,f)\leq|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2}\cdot\mathcal{E}(f,f)

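for all functions $f:G\to\mathbb{R}$ . Since $\tilde{\mathcal{E}}(f,f)=\|f\|_{2}^{2}$ for $f\in\mathcal{A}$ , this gives

\frac{\mathcal{E}(f,f)}{\|f\|_{2}^{2}}=\frac{\mathcal{E}(f,f)}{\tilde{\mathcal{E}}(f,f)}\geq\frac{1}{|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2}}\quad\text{ for all nonzero }f\in\mathcal{A}.

Write $\langle f,g\rangle_{\pi}:=\mathbb{E}_{\pi}[fg]$ , so that $\mathcal{E}(f,f)=\langle f,(I-P)f\rangle_{\pi}$ . As $P$ is symmetric and $\mathcal{A}$ is invariant under $P$ , the restriction of $I-P$ to $\mathcal{A}$ is self-adjoint with respect to $\langle\cdot,\cdot\rangle_{\pi}$ , and by the last display all of its eigenvalues are at least $\frac{1}{|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2}}$ .

Hence, for any $f\in\mathcal{A}$ and $t\geq 0$ , recalling that $P_{t}:=\sum_{n=0}^{\infty}\frac{(tP)^{n}}{n!}e^{-t}=e^{-t(I-P)}$ , we have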

\|P_{t}f\|_{2}\leq\|f\|_{2}\exp\left(-\frac{t}{|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2}}\right).

(24)

Define the function $\varphi:G\to\mathbb{R}$ by $\varphi(g)=\mathbf{1}\{g=\mathrm{id}\}-\mathbf{1}\{g\in G_{2}\}/|G_{2}|$ so that

P_{t}\varphi(g)=\mathbb{E}_{g}[\varphi(X_{t})]=\sum_{x\in G}P_{t}(g,x)\varphi(x)=\sum_{x\in G}P_{t}(x,g)\varphi(x).

By our choice of $\varphi$ it is easy to see that $P_{t}\varphi(g)=\mathbb{P}_{\mathrm{id}}(X_{t}=g)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=g)$ and $\varphi\in\mathcal{A}$ . By the definition of $\varphi$ one can check that $\|\varphi\|_{2}\leq\|\varphi\|_{\infty}\leq 1$ . Note that it follows from the Cauchy-Schwarz inequality that

4\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|^{2}_{\mathrm{TV}}=\left(\sum_{g\in G}|P_{t}\varphi(g)|\right)^{2}\leq|G|^{2}\cdot\|P_{t}\varphi\|^{2}_{2}.

Therefore, applying (24) to $\varphi$ gives

\begin{aligned}\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}}&\leq\frac{|G|}{2}\cdot\|P_{t}\varphi\|_{2}\\&\leq\frac{|G|}{2}\exp\left(-\frac{t}{|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2}}\right).\end{aligned}

∎
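The bound of Lemma 3.2 can be checked numerically on a small example. The sketch below is illustrative only: the group (Heisenberg group over $\mathbb{Z}_{3}$), the generators and the function names are assumptions, and $\mathrm{Diam}_{S}(G_{2})$ is taken to mean the largest word length in $S$ of an element of $G_{2}$, which is our reading of the notation defined earlier in the paper. The continuous-time law is obtained from a truncated Poisson mixture of discrete-time laws.

```python
import math
from collections import deque
from itertools import product

Q = 3  # illustrative choice: Heisenberg group over Z_3

def mul(g, h):
    return ((g[0] + h[0]) % Q, (g[1] + h[1]) % Q, (g[2] + h[2] + g[0] * h[1]) % Q)

def inv(g):
    a, b, c = g
    return ((-a) % Q, (-b) % Q, (a * b - c) % Q)

ELEMENTS = list(product(range(Q), repeat=3))
IDENTITY = (0, 0, 0)
S = [(1, 0, 0), inv((1, 0, 0)), (0, 1, 0), inv((0, 1, 0))]
G2 = [(0, 0, c) for c in range(Q)]  # commutator subgroup of this group

def diam_G2():
    """Largest word length (w.r.t. S) of an element of G_2, by breadth-first search."""
    dist = {IDENTITY: 0}
    queue = deque([IDENTITY])
    while queue:
        g = queue.popleft()
        for s in S:
            h = mul(g, s)
            if h not in dist:
                dist[h] = dist[g] + 1
                queue.append(h)
    return max(dist[g] for g in G2)

def step(dist):
    new = dict.fromkeys(ELEMENTS, 0.0)
    for g, p in dist.items():
        for s in S:
            new[mul(g, s)] += p / len(S)
    return new

def heat_kernel(start, t):
    """Law at time t > 0 of the rate-1 walk: sum_n e^{-t} t^n / n! * (start P^n)."""
    n_terms = int(t + 12 * math.sqrt(t) + 30)  # truncation far in the Poisson tail
    law = dict.fromkeys(ELEMENTS, 0.0)
    cur = dict(start)
    for n in range(n_terms + 1):
        weight = math.exp(-t + n * math.log(t) - math.lgamma(n + 1))
        for g, p in cur.items():
            law[g] += weight * p
        cur = step(cur)
    return law

def lemma_3_2_check(t):
    """Return (TV distance between the two starts at time t, bound of Lemma 3.2)."""
    from_id = heat_kernel({g: float(g == IDENTITY) for g in ELEMENTS}, t)
    from_g2 = heat_kernel({g: (1.0 / len(G2) if g in G2 else 0.0) for g in ELEMENTS}, t)
    tv = 0.5 * sum(abs(from_id[g] - from_g2[g]) for g in ELEMENTS)
    bound = len(ELEMENTS) / 2 * math.exp(-t / (len(S) * diam_G2() ** 2))
    return tv, bound
```

In this example the bound is far from tight, which is expected: the comparison constant $|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2}$ is a worst-case estimate.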

Proof of Theorem 2. The first inequality $t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon)\leq t_{\mathrm{mix}}^{G}(\varepsilon)$ follows directly from the projection of $G$ onto $G_{\mathrm{ab}}$ . To prove the second part of the inequality, observe that by the triangle inequality

\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}\leq\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}}+\|\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}.

Lemma 3.1 shows that $\|\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}\leq\varepsilon-\delta$ for $t:=t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon-\delta)$ .

It remains to prove $\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}}\leq\delta$ . Recall from Corollary 2 that $\mathrm{Diam}_{S}(G_{\mathrm{ab}})\gg|S|^{4L}$ when $|S|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}$ . Note that as a direct consequence of Proposition 13.7 in [25], which is a simple application of the Carne-Varopoulos inequality, one has that for all $\varepsilon\leq 1-|G_{\mathrm{ab}}|^{-1/4}$ ,

t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon)\gtrsim\frac{\mathrm{Diam}_{S}(G_{\mathrm{ab}})^{2}}{\log|G_{\mathrm{ab}}|}.

(25)

It then follows from (25) and Corollary 2 that

t^{G_{\mathrm{ab}}}_{\mathrm{mix}}(\varepsilon-\delta)\gtrsim\frac{\mathrm{Diam}_{S}(G_{\mathrm{ab}})^{2}}{\log|G_{\mathrm{ab}}|}\geq\frac{\mathrm{Diam}_{S}(G_{\mathrm{ab}})^{1/4}}{\log|G|}\cdot\mathrm{Diam}_{S}(G_{\mathrm{ab}})^{7/4}\gg(\log|G|)^{L}\cdot|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2},

where we recall that $L\geq 2$ . We can then apply Lemma 3.2 to $t:=t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon-\delta)$ and get

\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}}\ll|G|\exp\left(-(\log|G|)^{L}\right)\leq\delta

when $|G|$ is large enough. The proof is then complete.

∎

Proof of Corollary 1. It is straightforward to extend the conclusion in Corollary 12.7 of [32] to continuous time Markov chains to get

\lim_{t\to\infty}\frac{\log(d(t))}{t}=-\frac{1}{t_{\mathrm{rel}}},

where $d(t)$ denotes the total variation distance to stationarity and $t_{\mathrm{rel}}$ denotes the corresponding relaxation time. Recall from (21) and (22) that

\begin{aligned}\|\mathbb{P}_{G_{2}}(Y_{t}=\cdot)-\pi_{G_{\mathrm{ab}}}\|_{\mathrm{TV}}&\leq\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}\\&\leq\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}}+\|\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}.\end{aligned}

Define $f(\cdot)=\lim_{t\to\infty}\frac{\log(\cdot)}{t}$ . We can apply the function $f$ to both sides of the above inequality and use Lemmas 3.1 and 3.2 to obtain

t^{G^{\mathrm{ab}}}_{\mathrm{rel}}\leq t^{G}_{\mathrm{rel}}\leq\max\{t^{G^{\mathrm{ab}}}_{\mathrm{rel}},|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2}\}.

∎

4. On Cayley Graphs with Random i.i.d. Generators

We consider the random walk $X(t)$ on the Cayley graph of a finite nilpotent group $G$ with respect to $k$ generators chosen uniformly at random. The graph is denoted by $\text{Cay}(G,S)$ , where the generator set $S:=\{Z_{i}^{\pm 1}:i\in[k]\}$ with $Z_{1},\dots,Z_{k}\overset{iid}{\sim}\mathrm{Unif}(G)$ . The regime of interest in this paper is $1\ll\log k\ll\log|G|$ . In this section, our aim is to prove Theorem 1, which we have restated below for ease of reference.

Theorem 1.

The structure of this section is as follows: in Section 4.1 we will give an overview of the entropic method within the context of Theorem 1 and how it is useful as a framework for proving bounds on the mixing time; in Section 4.2 we prove the lower bound on the mixing time.

Establishing the corresponding upper bound on the mixing time requires significant effort, which involves meticulous analysis of the distribution of the RW on each quotient group $Q_{\ell}=G_{\ell}/G_{\ell+1}$ for $\ell\in[L]$ . To this end we give a representation of the random walk $X(t)$ on $\text{Cay}(G,S)$ in Section 4.3. Subsequently, we provide an outline of the proof for the upper bound on the mixing time in Section 4.4 and complete the proof in the remainder of the paper.

4.1. Entropic Method and Entropic Times

Let $S:=\{Z_{i}^{\pm 1}\in G:i\in[k]\}$ denote the symmetric set of generators of $G$ . We define the auxiliary process $W:=W(t)=(W_{1}(t),\dots,W_{k}(t))$ based on $X(t)$ , where $W_{i}(t)$ is the number of times generator $Z_{i}$ has been applied minus the number of times $Z_{i}^{-1}$ has been applied in the random walk $X(t)$ . It is easy to see that $W(t)$ is a rate 1 random walk on $\mathbb{Z}^{k}$ . To simplify notation, we sometimes drop the time index $t$ when it is clear from the context.

Let $(X,W)$ and $(X^{\prime},W^{\prime})$ be two independent copies of the random walk on $\text{Cay}(G,S)$ starting at $\mathrm{id}$ and its auxiliary process. Denote by $\mathcal{W}:=\mathcal{W}(t)\subseteq\mathbb{Z}^{k}$ a subset of the state space of the walk $W$ , where the index suggests that the precise choice of $\mathcal{W}$ (which is postponed to Definition 5) depends on the time $t$ . For now we will think of $\mathcal{W}$ as a set of “typical” locations of $W$ such that $\mathbb{P}(W(t)\notin\mathcal{W})=o(1)$ for the relevant choice of $t$ .

We use a “modified $L^{2}$ calculation”: first conditioning on $W$ being “typical”; then using a standard $L^{2}$ calculation on the conditioned law. The proof of the following lemma is quite straightforward and hence is omitted.

Lemma 4.1 (Lemma 2.6 of [17]).

For all $t\geq 0$ and all $\mathcal{W}\subseteq\mathbb{Z}^{k}$ the following inequalities hold:

\begin{aligned}d_{S}(t):=\|\mathbb{P}_{S}(X(t)\in\cdot)-\pi_{G}\|_{\mathrm{TV}}&\leq\|\mathbb{P}_{S}(X(t)\in\cdot|W(t)\in\mathcal{W})-\pi_{G}\|_{\mathrm{TV}}+\mathbb{P}(W(t)\notin\mathcal{W}),\\4\|\mathbb{P}_{S}(X(t)\in\cdot|W(t)\in\mathcal{W})-\pi_{G}\|^{2}_{\mathrm{TV}}&\leq|G|\cdot\mathbb{P}_{S}(X(t)=X^{\prime}(t)|W(t),W^{\prime}(t)\in\mathcal{W})-1,\end{aligned}

where $\mathbb{P}_{S}$ denotes the law of the random walk given the generator set $S$ starting at $X(0)=\mathrm{id}$ .

Note that when $S$ is a random set of generators, $d_{S}(t)$ is a random variable that is measurable with respect to $\sigma(S)$ , the $\sigma$ -field generated by the choice of $S$ . In what comes later in our arguments, we will take the expectation over the choices of $S$ and work with

\mathbb{E}[\mathbb{P}_{S}(X(t)=X^{\prime}(t)|W(t),W^{\prime}(t)\in\mathcal{W})]=:\mathbb{P}(X(t)=X^{\prime}(t)|W(t),W^{\prime}(t)\in\mathcal{W}).

A good choice of the set of “typical” locations $\mathcal{W}$ will greatly simplify the analysis. Hence, in the remainder of this section, we discuss in detail the choice of $\mathcal{W}$ , or in other words, the “typical event” that we will condition on. As our goal is to obtain an upper bound on the total variation mixing time, we will look at a time $t$ that is slightly larger than the proposed mixing time, which is expressed in terms of the entropic times discussed in Section 4.1.1. Based on this choice of time we then define the typical event in Section 4.1.2.

4.1.1. Asymptotics of Entropic Times

Recall from Definition 3 that the entropic time $t_{0}:=t_{0}(k,|G_{\mathrm{ab}}|)$ is the time at which the entropy of the rate 1 random walk $W$ on $\mathbb{Z}^{k}$ is $\log|G_{\mathrm{ab}}|$ .

For simplicity of notation, we will write $t_{1}:=\log_{k}|G|$ and let the cutoff time be denoted by $t_{*}:=t_{*}(k,G)=\max\{t_{0},t_{1}\}$ .

The entropy of the random walk on $\mathbb{Z}^{k}$ is well understood. The interested reader can find a detailed exposition of the entropic times in [18]. Via direct calculation with the simple random walk and Poisson laws, one can obtain asymptotics of entropic times, see, e.g., [18, Proposition A.2 and §A.5] for full details. Here we content ourselves with restating the result of this calculation, which can be found in Proposition 2.2 in [19] and yields the asymptotic values of $t_{0}$ :

\begin{array}[]{ll}t_{0}\eqsim k\cdot|G_{\mathrm{ab}}|^{2/k}/(2\pi e)&\quad\text{when }k\ll\log|G_{\mathrm{ab}}|,\\ t_{0}\eqsim k\cdot f(\lambda)&\quad\text{when }k\eqsim\lambda\log|G_{\mathrm{ab}}|,\\ t_{0}\eqsim k\cdot 1/(\kappa\log\kappa)&\quad\text{when }k\gg\log|G_{\mathrm{ab}}|,\end{array}

(26)

where $\kappa:=k/\log|G_{\mathrm{ab}}|$ and $f:(0,\infty)\to(0,\infty)$ is some continuous decreasing function whose exact value is unimportant for our analysis. Since we assume $r\asymp 1$ and $L\asymp 1$ , it follows from Corollary 4 that $\log|G_{\mathrm{ab}}|\asymp\log|G|$ . Consequently, it is possible to have $t_{1}>t_{0}$ only in the regime $k\gg\log|G_{\mathrm{ab}}|$ . To be more specific, writing $\rho=\frac{\log k}{\log\log|G_{\mathrm{ab}}|}$ , in the regime $k\gg\log|G_{\mathrm{ab}}|$ we have $\liminf_{|G|\to\infty}\rho\geq 1$ and

\begin{cases}t_{0}>t_{1}&\quad\text{ when }\frac{\rho}{\rho-1}>\frac{\log|G|}{\log|G_{\mathrm{ab}}|},\\ t_{0}\leq t_{1}&\quad\text{ otherwise.}\end{cases}

The following proposition gives the asymptotics of the cutoff time $t_{*}$ for the regimes of $k$ that we are interested in.

Proposition 5.

Writing $\rho=\frac{\log k}{\log\log|G_{\mathrm{ab}}|}$ , we have the following asymptotics of $t_{*}(k,G)$ :

t_{*}(k,G)\eqsim\begin{cases}k|G_{\mathrm{ab}}|^{2/k}/(2\pi e)&\quad\text{when}\quad k\ll\log|G_{\mathrm{ab}}|,\\ kf(\lambda)&\quad\text{when}\quad k\eqsim\lambda\log|G_{\mathrm{ab}}|,\\ k/(\kappa\log\kappa)&\quad\text{when}\quad k\gg\log|G_{\mathrm{ab}}|,\frac{\rho}{\rho-1}>\frac{\log|G|}{\log|G_{\mathrm{ab}}|},\\ \log_{k}|G|&\quad\text{when}\quad k\gg\log|G_{\mathrm{ab}}|,\frac{\rho}{\rho-1}\leq\frac{\log|G|}{\log|G_{\mathrm{ab}}|},\end{cases}

where $f$ is defined as in (26) and $\kappa:=k/\log|G_{\mathrm{ab}}|$ .
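To make the entropic time concrete, here is a minimal numerical sketch that solves for $t_{0}$ directly from its description as the time at which the entropy of $W$ equals $\log|G_{\mathrm{ab}}|$ . It uses the fact that each coordinate $W_{a}(t)$ is distributed as the difference of two independent $\mathrm{Poisson}(t/(2k))$ variables. The function names, the truncation rule and the bisection tolerance are assumptions made for this illustration and are not taken from [18] or [19].

```python
import math

def poisson_pmf(lam, size):
    """Poisson(lam) probabilities for 0, 1, ..., size - 1 (computed via logs)."""
    return [math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1)) for j in range(size)]

def coordinate_entropy(s):
    """Entropy (in nats) of nu_s, the law of Poisson(s/2) - Poisson(s/2), for s > 0."""
    lam = s / 2.0
    size = int(lam + 12 * math.sqrt(lam) + 30)  # truncation far in the Poisson tail
    p = poisson_pmf(lam, size)
    entropy = 0.0
    for w in range(size):
        mass = sum(p[j] * p[j + w] for j in range(size - w))  # P(W = w) = P(W = -w)
        if mass > 0.0:
            entropy -= (1 if w == 0 else 2) * mass * math.log(mass)
    return entropy

def entropic_time(k, log_size, tol=1e-9):
    """Solve k * H(nu_{t/k}) = log_size for t by bisection in s = t / k."""
    target = log_size / k
    lo, hi = 1e-9, 1.0
    while coordinate_entropy(hi) < target:  # entropy is increasing in s
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if coordinate_entropy(mid) < target:
            lo = mid
        else:
            hi = mid
    return k * 0.5 * (lo + hi)
```

For instance, `entropic_time(4, 16.0)` can be compared with the first line of (26), $k\cdot|G_{\mathrm{ab}}|^{2/k}/(2\pi e)$ with $k=4$ and $\log|G_{\mathrm{ab}}|=16$ . The comparison is only indicative, since (26) is an asymptotic statement.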

4.1.2. Typical Event

For simplicity of notation, in this section we will drop the index of time $t$ and write $W:=W(t)$ and $X:=X(t)$ . Write $W=(W_{1},W_{2},\dots,W_{k})$ , where for each $a\in[k]$ , $W_{a}$ is an independent rate $1/k$ random walk on $\mathbb{Z}$ . For each $a\in[k]$ , define $W_{a}^{+}$ to be the number of steps to the right and $W_{a}^{-}$ the number of steps to the left in the walk $W_{a}$ . It is then easy to see $W_{a}=W_{a}^{+}-W_{a}^{-}$ .

To upper bound the key quantity $\mathbb{P}(X=X^{\prime}|W,W^{\prime}\in\mathcal{W})$ from Lemma 4.1, we will separate it into cases according to whether or not $W=W^{\prime}$ :

\begin{aligned}\mathbb{P}(X=X^{\prime}|W,W^{\prime}\in\mathcal{W})&=\mathbb{P}(X=X^{\prime}|\{W=W^{\prime}\}\cap\{W,W^{\prime}\in\mathcal{W}\})\mathbb{P}(W=W^{\prime}|W,W^{\prime}\in\mathcal{W})\\&\quad+\mathbb{P}(X=X^{\prime}|\{W\neq W^{\prime}\}\cap\{W,W^{\prime}\in\mathcal{W}\})\mathbb{P}(W\neq W^{\prime}|W,W^{\prime}\in\mathcal{W}).\end{aligned}

As suggested by this decomposition, bounding $\mathbb{P}(W=W^{\prime}|W,W^{\prime}\in\mathcal{W})$ plays an important role in the proof. Since $W,W^{\prime}$ are two independent copies,

\mathbb{P}(\{W=W^{\prime}\}\cap\{W,W^{\prime}\in\mathcal{W}\})=\sum_{w\in\mathcal{W}}\mathbb{P}(W=w)\mathbb{P}(W^{\prime}=w)\leq\max_{w\in\mathcal{W}}\mathbb{P}(W=w),

i.e.,

\mathbb{P}(W=W^{\prime}|W,W^{\prime}\in\mathcal{W})\leq\frac{\max_{w\in\mathcal{W}}\mathbb{P}(W=w)}{\mathbb{P}(W,W^{\prime}\in\mathcal{W})}.

It now becomes clearer that in order to control the probability $\mathbb{P}(W=W^{\prime}|W,W^{\prime}\in\mathcal{W})$ , we would like to choose $\mathcal{W}$ so that $\mathbb{P}(W,W^{\prime}\in\mathcal{W})=1-o(1)$ and $\max_{w\in\mathcal{W}}\mathbb{P}(W=w)$ is sufficiently small.

For $t\geq 0$ , write $\mu_{t}$ for the law of $W(t)$ , the rate 1 random walk on $\mathbb{Z}^{k}$ , so that $\mu_{t}(w)=\mathbb{P}(W(t)=w)$ . Also write $\nu_{s}$ for the law of $W_{1}(sk)$ so that $\mu_{t}=\nu_{t/k}^{\otimes k}$ . Also, for each $a\in[k]$ , define

Q_{a}(t):=-\log\nu_{t/k}(W_{a}(t))\quad\text{ and }\quad Q(t):=-\log\mu_{t}(W(t))=\sum_{a=1}^{k}Q_{a}(t).

Then $\mathbb{E}(Q(t))$ (and respectively $\mathbb{E}(Q_{1}(t))$ ) is the entropy of $W(t)$ (and respectively $W_{1}(t)$ ).

We need an estimate on the entropy $\mathbb{E}(Q(t))$ shortly after the proposed mixing time, i.e., for $t\geq(1+\varepsilon)t_{*}(k,G)$ , for which we refer to results in [19]. To state their results, we need to define the following quantity.

Definition 4.

Define $h_{0}$ as follows

h_{0}:=\begin{cases}\log|G_{\mathrm{ab}}|&\text{ when }t_{0}(k,|G_{\mathrm{ab}}|)>\log_{k}|G|,\\ (1-\frac{1}{\rho})\log|G|&\text{ when }t_{0}(k,|G_{\mathrm{ab}}|)\leq\log_{k}|G|\end{cases}

where $\rho=\frac{\log k}{\log\log|G_{\mathrm{ab}}|}$ . Fix some $\omega$ such that $1\ll\omega\ll\min\{k,\log|G_{\mathrm{ab}}|\}$ , and set $h:=h_{0}+\omega$ .

Remark 6.

Note that $\log|G_{\mathrm{ab}}|\leq h_{0}$ in both cases.

Lemma 3.9 in [19] proves concentration of $Q(t)$ whereas we only need one side of their estimate, which we state below.

Lemma 4.2 (Lemma 3.9 in [19]).

Assume that $\omega\ll\min\{k,\log|G_{\mathrm{ab}}|\}$ . Let $\varepsilon>0$ and $t\geq(1+\varepsilon)t_{*}(k,G)$ . Then

\mathbb{P}(Q(t)\geq h)=\mathbb{P}(\mu_{t}(W(t))\leq e^{-h})=1-o(1).

Based on the above discussion and Lemma 4.2, it makes sense to define the (global) typical event as follows

\mathcal{W}_{glo}:=\{w\in\mathbb{Z}^{k}:\mathbb{P}(W(t)=w)\leq e^{-h}\}.

(27)

Write $W^{\pm}:=(W^{\pm}_{1},W^{\pm}_{2},\dots,W^{\pm}_{k})$ . Let $w\in\mathbb{Z}^{k}$ denote a realization of $W$ and the corresponding $(w^{+},w^{-})\in\mathbb{Z}^{k}_{+}\times\mathbb{Z}^{k}_{+}$ a realization of $(W^{+},W^{-})$ . To have better control over the behavior of each coordinate, we further define

\mathcal{W}_{loc}:=\{(w^{+},w^{-})\in\mathbb{Z}^{k}_{+}\times\mathbb{Z}^{k}_{+}:|w^{\pm}_{a}-\mathbb{E}(W^{\pm}_{a}(t))|\leq r_{*},\forall a\in[k]\}

(28)

where $r_{*}:=\frac{1}{2}|G_{\mathrm{ab}}|^{1/k}(\log k)^{2}$ . It can be observed that $r_{*}$ is defined based on $t_{0}(k,|G_{\mathrm{ab}}|)$ when $k\lesssim\log|G_{\mathrm{ab}}|$ so that $W^{\pm}(t)\in\mathcal{W}_{loc}$ whp by a union bound on the $k$ coordinates. In fact, we will use $\mathcal{W}_{loc}$ only in the regime $k\lesssim\log|G_{\mathrm{ab}}|$ .

In the regime $k\gtrsim\log|G_{\mathrm{ab}}|$ we have $t_{*}/k\lesssim 1$ . By Poisson thinning, for each $a\in[k]$ the arrivals of the generators $Z_{a}^{\pm 1}$ follow an independent Poisson process with rate $1/k$ . Then $t_{*}/k\lesssim 1$ implies that each generator is expected to appear $O(1)$ times in the walk $X$ . Thus in this regime we will focus on the collection of generators that appear exactly once. Define, for $(w^{+},w^{-})\in\mathbb{Z}_{+}^{k}\times\mathbb{Z}_{+}^{k}$ ,

\mathcal{J}(w^{+},w^{-}):=\{a\in[k]:w^{+}_{a}+w^{-}_{a}=1\}\quad\text{ and }\quad J(w^{+},w^{-})=|\mathcal{J}(w^{+},w^{-})|

so that $\mathcal{J}(w^{+},w^{-})$ is the index set of generators that appear exactly once in the realization $\{W=w\}$ .

Moreover, for a sufficiently small $\varepsilon>0$ , define

\mathcal{W}_{once}:=\{(w^{+},w^{-})\in\mathbb{Z}^{k}_{+}\times\mathbb{Z}^{k}_{+}:|J(w^{+},w^{-})-te^{-t/k}|\leq\frac{1}{2}\varepsilon te^{-t/k}\}.

(29)

We can observe that the distribution of $J$ is $\mathrm{Binomial}(k,(t/k)e^{-t/k})$ so that when $te^{-t/k}\gg 1$ (which holds when $k\gtrsim\log|G|$ and $\log k\ll\log|G|$ ), $\mathcal{W}_{once}$ occurs with high probability for any $\varepsilon>0$ .
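The concentration of $J$ around $te^{-t/k}$ can be illustrated numerically. The sketch below computes the exact $\mathrm{Binomial}(k,(t/k)e^{-t/k})$ probability of the window appearing in (29) and compares it with the elementary Chebyshev lower bound. The function names and the parameter values in the usage note are illustrative assumptions, not quantities from the paper.

```python
import math

def binom_log_pmf(n, p, j):
    """Logarithm of the Binomial(n, p) probability of the value j, for 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + j * math.log(p) + (n - j) * math.log1p(-p))

def prob_once_typical(k, t, eps):
    """P(|J - t e^{-t/k}| <= eps * t e^{-t/k} / 2) for J ~ Binomial(k, (t/k) e^{-t/k})."""
    p = (t / k) * math.exp(-t / k)  # a fixed generator appears exactly once
    mean = k * p                    # equals t * e^{-t/k}
    half_width = 0.5 * eps * mean
    lo = max(0, math.ceil(mean - half_width))
    hi = min(k, math.floor(mean + half_width))
    return sum(math.exp(binom_log_pmf(k, p, j)) for j in range(lo, hi + 1))

def chebyshev_lower_bound(k, t, eps):
    """Chebyshev: P(|J - mean| <= h) >= 1 - Var(J) / h^2 with h = eps * mean / 2."""
    p = (t / k) * math.exp(-t / k)
    mean, var = k * p, k * p * (1 - p)
    return 1.0 - var / (0.5 * eps * mean) ** 2
```

For example, with `k = 10_000`, `t = 10_000.0` and `eps = 0.1` the window is several standard deviations wide, so `prob_once_typical` is close to 1, while the Chebyshev bound is weaker but already nontrivial.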

Definition 5.

Let $\mathcal{W}_{glo},\mathcal{W}_{loc},\mathcal{W}_{once}$ be defined as in (27),(28) and (29). Define the typical event

\mathrm{\mathbf{typ}}:=\begin{cases}\{W,W^{\prime}\in\mathcal{W}_{glo}\}\cap\{(W^{+},W^{-}),((W^{\prime})^{+},(W^{\prime})^{-})\in\mathcal{W}_{loc}\}&\text{ when }k\ll\log|G_{\mathrm{ab}}|,\\ \{W,W^{\prime}\in\mathcal{W}_{glo}\}\cap\{(W^{+},W^{-}),((W^{\prime})^{+},(W^{\prime})^{-})\in\mathcal{W}_{loc}\cap\mathcal{W}_{once}\}&\text{ when }k\eqsim\lambda\log|G_{\mathrm{ab}}|,\\ \{W,W^{\prime}\in\mathcal{W}_{glo}\}\cap\{(W^{+},W^{-}),((W^{\prime})^{+},(W^{\prime})^{-})\in\mathcal{W}_{once}\}&\text{ when }k\gg\log|G_{\mathrm{ab}}|.\\ \end{cases}

Lemma 4.3.

$\mathbb{P}(\mathrm{\mathbf{typ}})=1-o(1)$ .

Proof.

Lemma 4.2 implies that $W,W^{\prime}\in\mathcal{W}_{glo}$ with high probability. The proof that $\mathbb{P}((W^{+},W^{-})\in\mathcal{W}_{loc})=1-o(1)$ when $k\lesssim\log|G_{\mathrm{ab}}|$ follows from standard large deviation estimates. The proof that $\mathbb{P}((W^{+},W^{-})\in\mathcal{W}_{once})=1-o(1)$ when $k\gtrsim\log|G_{\mathrm{ab}}|$ also follows from standard large deviation estimates. ∎

4.2. Lower Bound on Mixing Time

As projection does not increase the total variation distance, to find a lower bound on the mixing time we can consider the projection of the original random walk $X(t)$ , defined by $Y(t):=G_{2}X(t)$ , on the projected Cayley graph $\text{Cay}(G_{\mathrm{ab}},\{G_{2}Z_{i}^{\pm 1}:i\in[k]\})$ . As $G_{\mathrm{ab}}$ is abelian, the mixing behavior of $Y(t)$ is well understood, see [17, 19].

The key idea of proving the lower bound comes from a concentration result of the entropy, which has been proved in the literature and is restated here for the sake of self-containment.

Proposition 6 (Proposition 2.3 in [19]).

Assume that $k$ satisfies $1\ll\log k\ll\log|G|$ and recall $t_{0}:=t_{0}(k,|G_{\mathrm{ab}}|)$ as in Definition 3. Then $\mathrm{Var}(Q(t_{0}))\gg 1$ and further, for $\varepsilon>0$ , writing $\omega:=(\mathrm{Var}(Q(t_{0})))^{1/4}$ , we have

\mathbb{P}(Q((1-\varepsilon)t_{0})\geq\log|G_{\mathrm{ab}}|-\omega)\to 0.

The proof of the lower bound follows from a somewhat conventional argument within the entropic methodology. Our proof is essentially a restatement of the proof in Section 3.3 of [19], which we include for the purpose of being self-contained.

Lemma 4.4.

Assume that $1\ll\log k\ll\log|G|$ . Let $S=\{s^{\pm 1}_{i}:i\in[k]\}\subseteq G$ be a given generator set. For any $\varepsilon>0$ and $t\leq(1-\varepsilon)t_{*}(k,G)$ ,

\|\mathbb{P}_{S}(X(t)\in\cdot)-\pi_{G}\|_{\mathrm{TV}}\geq 1-o(1),

where $\mathbb{P}_{S}$ denotes the law of the random walk $X$ with $X(0)=\mathrm{id}$ given the generator set $S$ .

Proof.

Since the argument does not depend on the choice of $S:=\{s_{i}^{\pm 1}:i\in[k]\}$ we will suppress it from notation and write $\mathbb{P}$ for $\mathbb{P}_{S}$ .

Recall that $t_{*}(k,G)=\max\{\log_{k}|G|,t_{0}(k,|G_{\mathrm{ab}}|)\}$ . To argue that $(1-\varepsilon)\log_{k}|G|$ is a lower bound on the mixing time, first observe that in $m\in\mathbb{N}$ steps the support of the random walk $X$ has size at most $(2k)^{m}$ , since $|S|\leq 2k$ . When $m\leq(1-\varepsilon)\log_{k}|G|$ , the support has size at most $|G|^{(1-\varepsilon)(1+o(1))}$ because $\log k\gg 1$ , and hence the walk cannot be mixed in this many steps.

Let $t:=(1-\varepsilon)t_{0}(k,|G_{\mathrm{ab}}|)$ and define

\mathcal{E}:=\{\mu_{t}(W(t))\geq|G_{\mathrm{ab}}|^{-1}e^{\omega}\}=\{Q(t)\leq\log|G_{\mathrm{ab}}|-\omega\}

with $\omega\gg 1$ from Proposition 6, by which we have $\mathbb{P}(\mathcal{E})=1-o(1)$ .

Let $\Pi:G\to G_{\mathrm{ab}}$ denote the canonical projection. Consider

E:=\{x\in G_{\mathrm{ab}}:\exists w\in\mathbb{Z}^{k}\text{ s.t. }\mu_{t}(w)\geq|G_{\mathrm{ab}}|^{-1}e^{\omega}\text{ and }x=\Pi(Z_{1}^{w_{1}}\cdots Z_{k}^{w_{k}})\}\subseteq G_{\mathrm{ab}}.

Based on the definition of $E$ we have $\mathbb{P}(\Pi(X(t))\in E|\mathcal{E})=1$ . Every element $x\in E$ satisfies $x=\Pi(Z_{1}^{w^{x}_{1}}\cdots Z_{k}^{w^{x}_{k}})$ for some $w^{x}\in\mathbb{Z}^{k}$ with $\mu_{t}(w^{x})\geq|G_{\mathrm{ab}}|^{-1}e^{\omega}$ . Hence, for all $x\in E$ , we have

\mathbb{P}(\Pi(X(t))=x)\geq\mathbb{P}(W(t)=w^{x})=\mu_{t}(w^{x})\geq|G_{\mathrm{ab}}|^{-1}e^{\omega}.

Summing over $x\in E$ gives

1\geq\sum_{x\in E}\mathbb{P}(\Pi(X(t))=x)\geq|E|\cdot|G_{\mathrm{ab}}|^{-1}e^{\omega}

and hence $|E|/|G_{\mathrm{ab}}|\leq e^{-\omega}=o(1)$ . Therefore,

\|\mathbb{P}(X(t)\in\cdot)-\pi_{G}\|_{\mathrm{TV}}\geq\mathbb{P}(X(t)\in\Pi^{-1}(E))-\pi_{G}(\Pi^{-1}(E))\geq\mathbb{P}(\mathcal{E})-|E|/|G_{\mathrm{ab}}|=1-o(1),

which completes the proof.

∎

4.3. Representation of $X(t)$

Let $N:=N(t)$ be the number of steps taken by the continuous time rate 1 random walk $X:=X(t)$ on the group $G$ . For $i\in[N]$ , we can write (the increment of) the $i$ -th step taken by $X$ as $Z^{\eta_{i}}_{\sigma_{i}}$ , where $\sigma_{i}\overset{iid}{\sim}\mathrm{Unif}([k])$ and $\eta_{i}\overset{iid}{\sim}\mathrm{Unif}\{\pm 1\}$ . Then we can express $X=\prod_{i=1}^{N}Z^{\eta_{i}}_{\sigma_{i}}$ , and similarly $X^{\prime}=\prod_{i=1}^{N^{\prime}}Z^{\eta^{\prime}_{i}}_{\sigma^{\prime}_{i}}$ , where $N^{\prime},\{\eta^{\prime}_{i}:i\in[N^{\prime}]\},\{\sigma^{\prime}_{i}:i\in[N^{\prime}]\}$ are independent random variables defined analogously. Note that $W_{a}(t)=\sum_{i=1}^{N(t)}\mathbf{1}\{\sigma_{i}=a\}\eta_{i}$ for $a\in[k]$ .

We are interested in the event $\{X=X^{\prime}\}$ , which is equivalent to $\{\mathbf{X}=\mathrm{id}\}$ , where

\mathbf{X}:=X(X^{\prime})^{-1}=\prod_{i=1}^{N}Z^{\eta_{i}}_{\sigma_{i}}\cdot\left(\prod_{i=1}^{N^{\prime}}Z^{\eta^{\prime}_{i}}_{\sigma^{\prime}_{i}}\right)^{-1}=\prod_{i=1}^{N}Z^{\eta_{i}}_{\sigma_{i}}\cdot\prod_{j=0}^{N^{\prime}-1}Z^{-\eta^{\prime}_{N^{\prime}-j}}_{\sigma^{\prime}_{N^{\prime}-j}}.

(30)

It is easy to see the law of $\mathbf{X}$ is the same as that of a rate 2 simple random walk on the same Cayley graph $\text{Cay}(G,S)$ . In fact, for simplicity of notation we will write (30) as

\mathbf{X}=\prod_{i=1}^{N+N^{\prime}}Z^{\eta_{i}}_{\sigma_{i}}

(31)

where $\sigma_{i}:=\sigma^{\prime}_{N^{\prime}+N+1-i}$ and $\eta_{i}:=-\eta^{\prime}_{N^{\prime}+N+1-i}$ for $N<i\leq N+N^{\prime}$ .

Our goal is to express $X$ (and analogously $\mathbf{X}$ ) in a way that the role of $W$ (and analogously $W-W^{\prime}$ ) is understood. If $G$ is Abelian we can simply rearrange the sequence $X=\prod_{i=1}^{N}Z^{\eta_{i}}_{\sigma_{i}}$ to obtain $X=Z_{1}^{W_{1}}\cdots Z_{k}^{W_{k}}$ . Although we do not have this nice and simple relation between $X$ and $W$ when $G$ is not abelian, we can still rearrange the terms in (30) and pay the price of adding an extra commutator, i.e., for $x,y\in G$ we can rewrite $xy$ as $yx[x,y]$ . For this reason, in some of our analysis we also care about the specific order in which each generator appears in $\mathbf{X}$ , which is why we will sometimes also refer to $\mathbf{X}$ as a sequence to emphasize this perspective. To be more specific, when we refer to $\mathbf{X}$ as a “sequence” we are referring to the corresponding $(\sigma_{i},\eta_{i})_{i\in[N+N^{\prime}]}$ .

More generally, for $x,y,z\in G$ , consider as an example the element $xyzx\in G$ . In order to express $xyzx$ in our desired form we can rearrange the terms in the sequence $xyzx$ as follows

\begin{aligned}
xyzx&=xyxz[z,x]=x^{2}y[y,x]z[z,x]=x^{2}yz[y,x][[y,x],z][z,x]\\
&=x^{2}yz[y,x][z,x][[y,x],z][[[y,x],z],[z,x]].
\end{aligned}

(32)

We can see from this example that rearranging the whole sequence $\mathbf{X}$ will result in commutators of the form $\{\rho(x_{1},...,x_{i}):x_{j}\in G,j\in[i],i\geq 2\}$ , which was defined in (9).
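The rearrangement in (32) uses only the rule $xy=yx[x,y]$ with $[x,y]=x^{-1}y^{-1}xy$ , so it holds in any group. As a sanity check, the following minimal sketch tests it on random permutations. The choice of the symmetric group on 5 points and all function names are assumptions made for illustration only.

```python
import random

def mul(p, q):
    """Product pq of permutations stored as tuples: (pq)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def inv(p):
    """Inverse permutation."""
    out = [0] * len(p)
    for i, pi in enumerate(p):
        out[pi] = i
    return tuple(out)

def comm(x, y):
    """Commutator [x, y] = x^{-1} y^{-1} x y, so that x y = y x [x, y]."""
    return mul(mul(inv(x), inv(y)), mul(x, y))

def prod(*factors):
    """Ordered product of one or more group elements."""
    out = factors[0]
    for f in factors[1:]:
        out = mul(out, f)
    return out

def identity_32_holds(x, y, z):
    """Check xyzx = x^2 y z [y,x] [z,x] [[y,x],z] [[[y,x],z],[z,x]]."""
    c_yx, c_zx = comm(y, x), comm(z, x)
    c3 = comm(c_yx, z)      # the 3-fold commutator [[y,x],z]
    c5 = comm(c3, c_zx)     # the 5-fold commutator [[[y,x],z],[z,x]]
    return prod(x, y, z, x) == prod(x, x, y, z, c_yx, c_zx, c3, c5)

_rng = random.Random(0)

def random_perm(n=5):
    """A uniformly random permutation of {0, ..., n-1}."""
    p = list(range(n))
    _rng.shuffle(p)
    return tuple(p)
```

Calling `identity_32_holds(random_perm(), random_perm(), random_perm())` repeatedly exercises the identity on non-commuting elements.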

In terms of the sequence $\mathbf{X}$ , it will become clear later that we actually only need to keep track of the two-fold commutators of the form $\{[Z_{a},Z_{b}]:a,b\in[k]\}$ . Hence, we will write $V:=W-W^{\prime}$ and rearrange the terms in (31) to obtain the following expression

\mathbf{X}=Z_{1}^{V_{1}}\cdots Z_{k}^{V_{k}}\prod_{a,b\in[k]:a<b}[Z_{a},Z_{b}]^{m_{ba}}\varphi(Z_{1},\dots,Z_{k}),

(33)

where, letting $(\sigma_{i},\eta_{i})$ denote the $i$ -th generator and its sign in the sequence in (31),

m_{ba}:=-\sum_{i=1}^{N+N^{\prime}}\sum_{j<i}\eta_{i}\eta_{j}\mathbf{1}\{\sigma_{i}=a,\sigma_{j}=b\}\quad\text{ for }1\leq a<b\leq k,

(34)

and $\varphi(Z_{1},\dots,Z_{k})$ is the residual part as the result of the rearranging. To give a more specific description of $\varphi$ , define $\mathcal{C}_{\mathrm{com}}:=\{\rho(Z^{\pm 1}_{a_{1}},\dots,Z_{a_{i}}^{\pm 1}):a_{j}\in[k]\text{ for all }j\in[i],i\geq 2\}$ to be the collection of commutators of $\{Z^{\pm 1}_{1},\dots,Z^{\pm 1}_{k}\}$ . A multi-fold commutator of $\{Z^{\pm 1}_{1},\dots,Z^{\pm 1}_{k}\}$ refers to a term of the form $\rho(x_{1},\dots,x_{i})$ with $i\geq 2$ where $x_{j}\in\mathcal{C}_{\mathrm{com}}\cup\{Z^{\pm 1}_{1},\dots,Z^{\pm 1}_{k}\}$ for all $j\in[i]$ , which is not simply a two-fold commutator of the form $\{[Z_{a_{1}}^{\pm 1},Z_{a_{2}}^{\pm 1}]:a_{1},a_{2}\in[k]\}$ . A multi-fold commutator consisting of $i$ pairs of brackets is said to be $(i+1)$ -fold. For example, see (32) where $[[y,x],z]$ is a 3-fold commutator and $[[[y,x],z],[z,x]]$ is a 5-fold commutator.

It will become clear in our later arguments that the specific order in which terms appear in $\varphi(Z_{1},\dots,Z_{k})$ is of no interest to us, and thus with some abuse of language we will sometimes refer to $\varphi(\cdot)$ as a polynomial with terms that are multi-fold commutators of $\{Z^{\pm 1}_{1},\dots,Z^{\pm 1}_{k}\}$ .
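To make (33) and (34) concrete, here is a minimal sketch in a nilpotent group of class 2, where every 3-fold commutator is trivial and hence $\varphi$ is the identity. The group (the Heisenberg group over $\mathbb{Z}_{7}$ ), the generic random sequence standing in for (31), and all names are assumptions chosen for illustration. Generators are indexed from 0 in the code.

```python
import random
from itertools import combinations

P = 7          # modulus of the illustrative Heisenberg group (an assumption)
ID = (0, 0, 0)

def mul(x, y):
    # (a, b, c) stands for the unitriangular matrix [[1, a, c], [0, 1, b], [0, 0, 1]].
    return ((x[0] + y[0]) % P, (x[1] + y[1]) % P, (x[2] + y[2] + x[0] * y[1]) % P)

def inv(x):
    return (-x[0] % P, -x[1] % P, (x[0] * x[1] - x[2]) % P)

def power(x, n):
    """x^n for any integer n, by repeated multiplication."""
    base = x if n >= 0 else inv(x)
    out = ID
    for _ in range(abs(n)):
        out = mul(out, base)
    return out

def comm(x, y):
    """[x, y] = x^{-1} y^{-1} x y."""
    return mul(mul(inv(x), inv(y)), mul(x, y))

def representation_33_holds(k=4, steps=30, seed=0):
    rng = random.Random(seed)
    Z = [tuple(rng.randrange(P) for _ in range(3)) for _ in range(k)]
    seq = [(rng.randrange(k), rng.choice((-1, 1))) for _ in range(steps)]

    # Left-hand side: multiply the sequence in its given order.
    lhs = ID
    for sigma, eta in seq:
        lhs = mul(lhs, power(Z[sigma], eta))

    # V_a: signed number of occurrences of generator a in the sequence.
    V = [sum(eta for sigma, eta in seq if sigma == a) for a in range(k)]

    # m_{ba} as in (34), for a < b: minus the signed count of pairs j < i
    # with sigma_i = a and sigma_j = b.
    m = {}
    for a, b in combinations(range(k), 2):
        total = 0
        for i, (sigma_i, eta_i) in enumerate(seq):
            for sigma_j, eta_j in seq[:i]:
                if sigma_i == a and sigma_j == b:
                    total += eta_i * eta_j
        m[(b, a)] = -total

    # Right-hand side of (33) with trivial residual part.
    rhs = ID
    for a in range(k):
        rhs = mul(rhs, power(Z[a], V[a]))
    for a, b in combinations(range(k), 2):
        rhs = mul(rhs, power(comm(Z[a], Z[b]), m[(b, a)]))
    return lhs == rhs
```

In groups of higher nilpotency class the two sides differ by the residual term $\varphi$ , which this sketch does not attempt to compute.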

4.4. Upper Bound on Mixing Time

In this section we prove the upper bound on the mixing time, for which the precise statement is presented below. Recall that $\mathbb{P}_{S}$ denotes the law of the random walk given the generator set $S$ starting at $X(0)=\mathrm{id}$ .

Theorem 5.

Let $G$ be a nilpotent group with $r(G),L(G)\asymp 1$ . Let $S=\{Z_{i}^{\pm 1}:i\in[k]\}$ with $Z_{1},\dots,Z_{k}\overset{iid}{\sim}\mathrm{Unif}(G)$ and assume $1\ll\log k\ll\log|G|$ . For any $\varepsilon>0$ and $t\geq(1+\varepsilon)t_{*}(k,G)$ , we have $\|\mathbb{P}_{S}(X(t)\in\cdot)-\pi_{G}\|_{\mathrm{TV}}=o(1)$ with high probability.

Notice that when $1\ll k\ll\frac{\log|G|}{\log\log|G|}$ this theorem follows directly from Theorem 2 and hence in the proof we will focus on the regime $k\gtrsim\frac{\log|G|}{\log\log|G|}$ .

We will first define the notation that will be used throughout this section and explain why it is useful.

Definition 6.

For each $\ell\in[L]$ , define $Q_{\ell}:=G_{\ell}/G_{\ell+1}$ and let $r_{\ell}:=r(Q_{\ell})$ denote the rank of $Q_{\ell}$ . For each $Q_{\ell}$ , we will choose a set $R_{\ell}\subseteq G_{\ell}$ such that $|R_{\ell}|=|Q_{\ell}|$ and $Q_{\ell}=\{G_{\ell+1}g:g\in R_{\ell}\}$ .

Since

\{X(X^{\prime})^{-1}=\mathrm{id}\}=\cap_{\ell=1}^{L+1}\{X(X^{\prime})^{-1}\in G_{\ell}\}=\cap_{\ell=1}^{L+1}\{G_{\ell}X(X^{\prime})^{-1}=G_{\ell}\},

we will be interested in events related to $\{G_{\ell}X(X^{\prime})^{-1}\}_{\ell\in[L+1]}$ . In other words, we will decompose $X(X^{\prime})^{-1}$ with respect to the quotient groups $\{Q_{\ell}:\ell\in[L]\}$ and derive simplified expressions to make the distribution of $X(X^{\prime})^{-1}$ more tractable.

It will be tremendously useful if we can express $Z_{a}$ as a product of elements belonging to each layer of $\{G_{\ell}:\ell\in[L]\}$ . Indeed, the following lemma asserts that we can construct each generator $Z_{a}$ as a product of independent random variables $\{Z_{a,\ell}:\ell\in[L]\}$ .

Lemma 4.5 (Corollary 6.4 in [17]).

Let $\{Z_{a,\ell}:a\in[k],1\leq\ell\leq L\}$ be independent and such that $Z_{a,\ell}\sim\mathrm{Unif}(R_{\ell})$ . Then $G_{\ell+1}Z_{a,\ell}\sim\mathrm{Unif}(Q_{\ell})$ . Moreover, $Z_{a}:=\prod_{\ell=1}^{L}Z_{a,\ell}$ are i.i.d. uniform over $G$ for $a\in[k]$ .
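For intuition, the layer decomposition of Lemma 4.5 can be checked by brute force in a small example. The sketch below assumes the Heisenberg group over $\mathbb{Z}_{3}$ (so $L=2$ , $G_{2}$ is the center and $G_{3}$ is trivial) with the hypothetical representative sets `R1` and `R2`; it only verifies that the product map $R_{1}\times R_{2}\to G$ is a bijection, which is what makes $Z_{a,1}Z_{a,2}$ uniform on $G$ in this case.

```python
from collections import Counter
from itertools import product

P = 3  # illustrative modulus (an assumption)

def mul(x, y):
    # (a, b, c) stands for the unitriangular matrix [[1, a, c], [0, 1, b], [0, 0, 1]].
    return ((x[0] + y[0]) % P, (x[1] + y[1]) % P, (x[2] + y[2] + x[0] * y[1]) % P)

# Representatives of Q_1 = G / G_2 and of Q_2 = G_2 / G_3 = G_2.
R1 = [(a, b, 0) for a in range(P) for b in range(P)]
R2 = [(0, 0, c) for c in range(P)]

# Count how often each group element arises as a product z1 * z2.
counts = Counter(mul(z1, z2) for z1, z2 in product(R1, R2))
```

Every element of the group appears exactly once in `counts`, so independent uniform choices from `R1` and `R2` multiply to a uniform element of the group.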

4.4.1. Proof Framework for Theorem 5

As suggested by Lemma 4.1, in order to control the total variation distance to stationarity we will upper bound $D(t):=|G|\cdot\mathbb{P}(X=X^{\prime}|\mathrm{\mathbf{typ}})-1$ , where we also average over the choice of $S$ . Letting $V:=W-W^{\prime}$ , $D(t)$ can be further decomposed with respect to $V$ :

D(t)=|G|\cdot\mathbb{P}(X=X^{\prime},V\neq 0|\mathrm{\mathbf{typ}})+|G|\cdot\mathbb{P}(X=X^{\prime},V=0|\mathrm{\mathbf{typ}})-1.

(35)

Again, for simplicity of notation we will suppress the dependence of the mixing times on $k$ and $G$ , and write $t_{0}:=t_{0}(k,|G_{\mathrm{ab}}|),t_{1}:=\log_{k}|G|$ and $t_{*}:=t_{*}(k,G)=\max\{t_{0},t_{1}\}$ . It follows easily from (35) that in order to prove Theorem 5 it suffices to prove the following two results.

Proposition 7.

For any $\varepsilon>0$ and $t\geq(1+\varepsilon)t_{*}$ , we have

|G|\cdot\mathbb{P}(X=X^{\prime},V\neq 0|\mathrm{\mathbf{typ}})=1+o(1).

Proposition 8.

For any $\varepsilon>0$ and $t\geq(1+\varepsilon)t_{*}$ , we have

|G|\cdot\mathbb{P}(X=X^{\prime},V=0|\mathrm{\mathbf{typ}})=o(1).

The proofs of Propositions 7 and 8 both rely on the analysis of the events $\{G_{\ell}X(X^{\prime})^{-1}=G_{\ell}\}_{\ell\in[L+1]}$ . In the case where $V\neq 0$ , writing $V=(V_{1},\dots,V_{k})$ , one will see in Section 4.6 that the analysis boils down to understanding the distribution of $(G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell})_{\ell\in[L]}$ . When $V=0$ the analysis is significantly more involved, as in this case (33) turns into

X(X^{\prime})^{-1}=Z_{1}^{V_{1}}\cdots Z_{k}^{V_{k}}\prod_{a,b\in[k]:a<b}[Z_{a},Z_{b}]^{m_{ba}}\varphi(Z_{1},\dots,Z_{k})=\prod_{a,b\in[k]:a<b}[Z_{a},Z_{b}]^{m_{ba}}\varphi(Z_{1},\dots,Z_{k})

and hence we need to carefully understand the distribution of the product of commutators. The proof of Proposition 8 is therefore considerably more involved, prompting us to provide an outline here that captures the main steps.

Proof outline of Proposition 8. We analyze the events $\{G_{\ell}X(X^{\prime})^{-1}=G_{\ell}\}$ for each $2\leq\ell\leq L+1$ when $V=0$ .

The event $\{G_{2}X(X^{\prime})^{-1}=G_{2}\}$ is already guaranteed to occur if we condition on $\{V=0\}$ . In general, in order to understand each event $\{G_{\ell+2}X(X^{\prime})^{-1}=G_{\ell+2}\}$ for $1\leq\ell\leq L-1$ , we hope to show that for all $1\leq\ell\leq L-1$ , $G_{\ell+2}X(X^{\prime})^{-1}$ is close to being uniform over $Q_{\ell+1}:=G_{\ell+1}/G_{\ell+2}$ . To analyze the distribution of $G_{\ell+2}X(X^{\prime})^{-1}$ for each $\ell\in[L-1]$ , we observe from the representation of $X(X^{\prime})^{-1}$ given in (33) that $(m_{ba})_{a,b\in[k],a<b}$ , which is defined in (34), plays a crucial role in determining the distribution.

As we will later show in Section 4.8.3, after simplifying $G_{\ell+2}X(X^{\prime})^{-1}$ for $2\leq\ell\leq L-1$ (the case where $\ell=1$ is somewhat different and will be discussed separately in Section 4.8.4) one will eventually arrive at a term of the form

G_{\ell+2}\sum_{b\in\mathcal{K}}[\chi_{b},Z_{b,\ell}],

where $\chi_{b}\in G$ satisfies $G_{2}\chi_{b}=G_{2}(\sum_{a\in[k]:a<b}m_{ba}Z_{a,1}-\sum_{a\in[k]:b<a}m_{ab}Z_{a,1})$ and $\mathcal{K}\subseteq[k]$ is a subset to be chosen, see (4.8.3). The terms $(\chi_{b})_{b\in\mathcal{K}}$ explicitly indicate the role of $(m_{ba})_{a,b\in[k],a<b}$ . More specifically, since $(Z_{b,\ell})_{b\in\mathcal{K}}$ is a collection of independent uniform random variables, Lemma 4.12 below asserts that $G_{\ell+2}\sum_{b\in\mathcal{K}}[\chi_{b},Z_{b,\ell}]$ is uniform over the subgroup $\langle\{G_{\ell+2}[\chi_{b},g]:b\in\mathcal{K},g\in R_{\ell}\}\rangle$ generated by the corresponding commutators. As we would like $G_{\ell+2}X(X^{\prime})^{-1}$ to be close to being uniform over $Q_{\ell+1}$ , the choice of $\mathcal{K}$ will be made so that this subgroup takes up a sufficiently large fraction of $Q_{\ell+1}$ .

The above discussion suggests that we need to specify some conditions on $(m_{ba})_{a,b\in[k],a<b}$ to guarantee the existence of such an index set $\mathcal{K}$ and thus desired behavior of $\{G_{\ell+2}X(X^{\prime})^{-1}\}_{2\leq\ell\leq L-1}$ . These conditions are summarized in Definition 10 as a “good” event $\mathcal{A}$ . As a result, we will obtain the following estimate. Recall the definition of $h$ from Definition 4.

Proposition 9.

$|G|\cdot\mathbb{P}(X=X^{\prime}|\mathcal{A},V=0,\mathrm{\mathbf{typ}})\ll e^{h}$ .

Lastly, we would like $\mathcal{A}$ to be an event that occurs with sufficiently high probability, i.e.,

Proposition 10.

$|G|\cdot\mathbb{P}(\mathcal{A}^{c}|V=0,\mathrm{\mathbf{typ}})\ll e^{h}$ .

Proof of Proposition 8 given Propositions 9 and 10. It follows from the definition of $\mathrm{\mathbf{typ}}$ and a direct calculation that

\mathbb{P}(V=0,\mathrm{\mathbf{typ}})=\mathbb{P}(W=W^{\prime},W\in\mathcal{W}_{glo})=\sum_{w\in\mathcal{W}_{glo}}\mathbb{P}(W=w)\mathbb{P}(W^{\prime}=w)\leq e^{-h}.

That is,

\mathbb{P}(V=0|\mathrm{\mathbf{typ}})\leq e^{-h}/\mathbb{P}(\mathrm{\mathbf{typ}}).

(36)

Hence, by Proposition 9, Proposition 10 and (36),

\mathbb{P}(X=X^{\prime},V=0|\mathrm{\mathbf{typ}})\leq\left(\mathbb{P}(X=X^{\prime}|\mathcal{A},V=0,\mathrm{\mathbf{typ}})+\mathbb{P}(\mathcal{A}^{c}|V=0,\mathrm{\mathbf{typ}})\right)\cdot\mathbb{P}(V=0|\mathrm{\mathbf{typ}})\ll\frac{1}{|G|}.

∎

The remainder of this section is organized as follows: in Section 4.5 we define filtrations that will serve as useful tools in the subsequent proofs; in Section 4.6 we prove Proposition 7; in Section 4.7 we present some preliminary results on groups that are useful in the proof of Proposition 8. As discussed above, the proof of Proposition 8 is divided into two parts: Proposition 9 will be proved in Section 4.8 and Proposition 10 in Section 4.9.

4.5. Useful Filtrations

Recall that the random walk $X$ can be written as a sequence

X=\prod_{i=1}^{N}Z^{\eta_{i}}_{\sigma_{i}}=Z_{\sigma_{1}}^{\eta_{1}}Z_{\sigma_{2}}^{\eta_{2}}\cdots Z_{\sigma_{N}}^{\eta_{N}},

where $N$ is the number of steps taken by $X$ by time $t$ and $Z^{\eta_{i}}_{\sigma_{i}}$ denotes the $i$ -th step taken by the random walk. Similarly, we write $X^{\prime}=\prod_{i=1}^{N^{\prime}}Z^{\eta^{\prime}_{i}}_{\sigma^{\prime}_{i}}$ . Recall from Lemma 4.5 the decomposition $Z_{a}=\prod_{\ell=1}^{L}Z_{a,\ell}$ for $a\in[k]$ .

We will define the following $\sigma$ -fields and events that will be useful in later analysis.

Definition 7.

(i) Let $\widetilde{\mathcal{H}}$ be the $\sigma$ -field generated by $N,N^{\prime},(\sigma_{i},\eta_{i})_{i\in[N+N^{\prime}]}$ , i.e., $\widetilde{\mathcal{H}}$ contains information on the sequences $X,X^{\prime}$ , other than the identities of $(Z_{a})_{a\in[k]}$ .
(ii) For each $\ell\in[L]$ , let $\mathcal{F}_{\ell}=\sigma(\{Z_{a,i}:a\in[k],1\leq i\leq\ell\})$ . Let $\mathcal{F}_{0}$ be the trivial $\sigma$ -field.
(iii) For $\ell\in[L+1]$ , let $\mathcal{E}_{\ell}=\{X(X^{\prime})^{-1}\in G_{\ell}\}$ .

To make the intuitive picture a bit clearer, note that there are mainly two sources of randomness:
(i) At every step a generator is randomly chosen and applied, resulting in the random order of the sequences (encoded in $\widetilde{\mathcal{H}}$ );
(ii) For each $a\in[k]$ , the specific choice of the generator $Z_{a}$ is random (encoded in $\{\mathcal{F}_{\ell}:\ell\in[L]\}$ ).

Remark 7.

The following random variables are measurable with respect to $\widetilde{\mathcal{H}}$ : $V$ , $\mathbf{1}_{\mathrm{\mathbf{typ}}}$ and $(m_{ba})_{a,b\in[k],a<b}$ .

As preparation for later discussion, we first clarify the measurability of events of interest to us.

Lemma 4.6.

For $\ell\in[L]$ ,
(i) $\mathcal{E}_{\ell}$ is measurable with respect to $\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$ .
(ii) $\{\mathcal{E}_{\ell+1},V=0\}$ is measurable with respect to $\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$ .

Proof.

Recall from (33) we can write

X(X^{\prime})^{-1}=Z_{1}^{V_{1}}\cdots Z_{k}^{V_{k}}\prod_{a,b\in[k]:a<b}[Z_{a},Z_{b}]^{m_{ba}}\varphi(Z_{1},\dots,Z_{k}),

where $\varphi(Z_{1},\dots,Z_{k})$ is a product of multi-fold commutators of $\{Z^{\pm 1}_{1},\dots,Z^{\pm 1}_{k}\}$ as a result of the rearranging. (See the end of Section 4.3 for a detailed description of $\varphi$ and multi-fold commutators.) By Lemma 4.5, for each $a\in[k]$ , $Z_{a}$ can be decomposed into $Z_{a}=\prod_{\ell=1}^{L}Z_{a,\ell}$ . For $a\in[k]$ and $\ell\in[L]$ , we will write $Z_{a,<\ell}:=\prod_{j=1}^{\ell-1}Z_{a,j}$ .

Applying the decomposition $Z_{a}=\prod_{\ell=1}^{L}Z_{a,\ell}$ to the sequence above yields

\begin{aligned}
G_{\ell+1}X(X^{\prime})^{-1}&=G_{\ell+1}Z_{1}^{V_{1}}\cdots Z_{k}^{V_{k}}\prod_{a,b\in[k]:a<b}[Z_{a},Z_{b}]^{m_{ba}}\varphi(Z_{1},\dots,Z_{k})\\
&=G_{\ell+1}\left(\prod_{a=1}^{k}Z^{V_{a}}_{a,\ell}\right)\left(\prod_{a=1}^{k}Z^{V_{a}}_{a,<\ell}\right)\prod_{a,b\in[k]:a<b}\left[\prod_{i=1}^{\ell-1}Z_{a,i},\prod_{j=1}^{\ell-1}Z_{b,j}\right]^{m_{ba}}\varphi^{(\ell-2)}(\{Z_{a,u}:a\in[k],u\leq\ell-2\})
\end{aligned}

(37)

where $\varphi^{(\ell-2)}(\{Z_{a,u}:a\in[k],u\leq\ell-2\})$ is a certain polynomial with terms that are $i$ -fold commutators of $\{Z_{a,u}:a\in[k],u\leq\ell-2\}$ for $i\geq 3$ , satisfying that

G_{\ell+1}\varphi^{(\ell-2)}(\{Z_{a,u}:a\in[k],u\leq\ell-2\})=G_{\ell+1}\varphi(Z_{1},\dots,Z_{k}).

The reason that we restrict our attention above to $u\leq\ell-2$ is that by Proposition 2 any $i$ -fold commutator with $i\geq 3$ that involves $\{Z_{a,u}:a\in[k],u>\ell-2\}$ is in $G_{\ell+1}$ . Also note that we can exchange the order of $Z_{a,\ell}$ and $Z_{a^{\prime},<\ell}$ for any $a,a^{\prime}\in[k]$ as $[Z_{a,\ell},Z_{a^{\prime},<\ell}]\in G_{\ell+1}$ by Proposition 2.

On $\{V=0\}$ the right hand side of (37) only involves $\{Z_{a,i}:1\leq i\leq\ell-1\}$ and hence is measurable with respect to $\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$ . Therefore, $\{\mathcal{E}_{\ell+1},V=0\}$ is measurable with respect to $\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$ , proving (ii).

Next we prove (i). Writing

\begin{aligned}
f^{(\ell-1)}&=f^{(\ell-1)}(\{Z_{a,u}:a\in[k],u\leq\ell-1\})\\
&:=\left(\prod_{a=1}^{k}Z^{V_{a}}_{a,<\ell}\right)\prod_{a,b\in[k]:a<b}\left[\prod_{i=1}^{\ell-1}Z_{a,i},\prod_{j=1}^{\ell-1}Z_{b,j}\right]^{m_{ba}}\varphi^{(\ell-2)}(\{Z_{a,u}:a\in[k],u\leq\ell-2\}),
\end{aligned}

(38)

by (37) we have

G_{\ell+1}X(X^{\prime})^{-1}=G_{\ell+1}\left(\prod_{a=1}^{k}Z^{V_{a}}_{a,\ell}\right)f^{(\ell-1)}.

(39)

Note that $f^{(\ell-1)}$ is measurable with respect to $\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$ as it only involves $\{Z_{a,u}:1\leq u\leq\ell-1\}$ . Furthermore, since $\{Z_{a,\ell}:a\in[k]\}\subseteq G_{\ell}$ , the same kind of simplification as (37) leads to

G_{\ell}X(X^{\prime})^{-1}=G_{\ell}f^{(\ell-1)}(\{Z_{a,u}:a\in[k],u\leq\ell-1\}),

which implies that $\mathcal{E}_{\ell}=\{f^{(\ell-1)}\in G_{\ell}\}$ . Therefore, $\mathcal{E}_{\ell}$ is measurable with respect to $\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$ . ∎

4.6. Proof of Proposition 7

Linear combinations of independent uniform random variables in an abelian group are themselves uniform on their support. As preparation for the proof we present the following lemma, which is a restatement of Lemma 2.11 and Lemma 6.5 of [17].

Lemma 4.7.

Let $k\in\mathbb{N}$ . Let $H$ be an Abelian group and $U_{1},\dots,U_{k}\overset{iid}{\sim}\mathrm{Unif}(H)$ . For $v=(v_{1},\dots,v_{k})\in\mathbb{Z}^{k}$ , write $U=(U_{1},\dots,U_{k})$ and define $v\cdot U:=\sum_{i=1}^{k}v_{i}U_{i}$ . We have

v\cdot U\sim\mathrm{Unif}(\gamma H)\quad\text{ where }\gamma=\mathrm{gcd}(v_{1},\dots,v_{k},|H|).

Consequently,

\max_{h\in H}\mathbb{P}(v\cdot U=h)=\mathbb{P}(v\cdot U=0)

(40)
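Lemma 4.7 is easy to confirm by exhaustive enumeration in a small cyclic group. The sketch below is only an illustration: the group $H=\mathbb{Z}_{12}$ , the vector $v=(4,6,0)$ and the function name are assumptions, not taken from [17].

```python
from collections import Counter
from functools import reduce
from itertools import product
from math import gcd

def law_of_linear_combination(v, n):
    """Exact counts of v.U mod n as U ranges over all of (Z_n)^k."""
    return Counter(
        sum(vi * ui for vi, ui in zip(v, u)) % n
        for u in product(range(n), repeat=len(v))
    )

n, v = 12, (4, 6, 0)
gamma = reduce(gcd, v, n)              # gcd(v_1, ..., v_k, |H|)
counts = law_of_linear_combination(v, n)
support = sorted(counts)               # should be the subgroup gamma * Z_n
```

Here `gamma` equals 2, the support is $2\mathbb{Z}_{12}$ , and all six values are attained equally often, so the maximum in (40) is attained at $0$ .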

Applying Lemma 4.7 to $(Z_{1,\ell},\dots,Z_{k,\ell})$ and $H=Q_{\ell}$ for $\ell\in[L]$ and writing $\mathrm{gcd}(v,|Q_{\ell}|)=\mathrm{gcd}(v_{1},\dots,v_{k},|Q_{\ell}|)$ gives

G_{\ell+1}\sum_{a\in[k]}v_{a}Z_{a,\ell}\sim\mathrm{Unif}(\mathrm{gcd}(v,|Q_{\ell}|)Q_{\ell}).

(41)

The key to the proof of Proposition 7 is the following estimate, whose proof uses the simplified expression of $\{G_{\ell+1}X(X^{\prime})^{-1}\}_{\ell\in[L]}$ derived in the last section.

Lemma 4.8.

We have

\mathbb{P}(X=X^{\prime}|V)\leq\prod_{\ell=1}^{L}\mathbb{P}(G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell}=G_{\ell+1}|V).

Proof.

Recall from the discussion in the proof of Lemma 4.6 that $\mathcal{E}_{\ell}=\{G_{\ell}X(X^{\prime})^{-1}=G_{\ell}\}=\{f^{(\ell-1)}\in G_{\ell}\}$ is measurable with respect to $\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$ for $\ell\in[L]$ (where $\mathcal{F}_{0}$ is the trivial $\sigma$ -field). Since $\mathcal{E}_{\ell+1}\subseteq\mathcal{E}_{\ell}$ , it follows from (39) that for $\ell\in[L]$ ,

\begin{aligned}
\mathbb{P}(\mathcal{E}_{\ell+1}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})&=\mathbb{P}(\mathcal{E}_{\ell},G_{\ell+1}\left(\prod_{a=1}^{k}Z^{V_{a}}_{a,\ell}\right)f^{(\ell-1)}=G_{\ell+1}\bigg{|}\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})\\
&\leq\mathbf{1}_{\mathcal{E}_{\ell}}\cdot\max_{g\in G_{\ell}}\mathbb{P}(G_{\ell+1}\prod_{a=1}^{k}Z^{V_{a}}_{a,\ell}=G_{\ell+1}g\bigg{|}\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})\\
&=\mathbf{1}_{\mathcal{E}_{\ell}}\cdot\max_{g\in G_{\ell}}\mathbb{P}(G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell}=G_{\ell+1}g\bigg{|}\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}}),
\end{aligned}

where the last line follows from rewriting the term $G_{\ell+1}\left(\prod_{a=1}^{k}Z^{V_{a}}_{a,\ell}\right)$ in the form of summation, as $Q_{\ell}:=G_{\ell}/G_{\ell+1}$ is abelian. It then follows from (40) in Lemma 4.7 that

\begin{aligned}
\mathbb{P}(\mathcal{E}_{\ell+1}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})&\leq\mathbf{1}_{\mathcal{E}_{\ell}}\cdot\max_{g\in G_{\ell}}\mathbb{P}(G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell}=G_{\ell+1}g|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})\\
&=\mathbf{1}_{\mathcal{E}_{\ell}}\cdot\mathbb{P}(G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell}=G_{\ell+1}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})\\
&=\mathbf{1}_{\mathcal{E}_{\ell}}\cdot\mathbb{P}(G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell}=G_{\ell+1}|\widetilde{\mathcal{H}}),
\end{aligned}

where the last equality follows from observing that $G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell}$ is independent of $\mathcal{F}_{\ell-1}$ . It then follows from the tower property that

\begin{aligned}
\mathbb{P}(\mathcal{E}_{\ell+1}|\widetilde{\mathcal{H}})&=\mathbb{E}[\mathbb{P}(\mathcal{E}_{\ell+1}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})|\widetilde{\mathcal{H}}]\\
&\leq\mathbb{E}[\mathbf{1}_{\mathcal{E}_{\ell}}\cdot\mathbb{P}(G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell}=G_{\ell+1}|\widetilde{\mathcal{H}})|\widetilde{\mathcal{H}}]=\mathbb{P}(\mathcal{E}_{\ell}|\widetilde{\mathcal{H}})\cdot\mathbb{P}(G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell}=G_{\ell+1}|\widetilde{\mathcal{H}}).
\end{aligned}

(42)

Applying (42) iteratively for $\ell\in[L]$ , we obtain

\mathbb{P}(X=X^{\prime}|\widetilde{\mathcal{H}})=\mathbb{P}(\mathcal{E}_{L+1}|\widetilde{\mathcal{H}})\leq\prod_{\ell=1}^{L}\mathbb{P}(G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell}=G_{\ell+1}|\widetilde{\mathcal{H}}).

As $V$ is measurable with respect to $\widetilde{\mathcal{H}}$ , i.e., $\sigma(V)\subseteq\widetilde{\mathcal{H}}$ , by the tower property we have the desired result

\mathbb{P}(X=X^{\prime}|V)\leq\prod_{\ell=1}^{L}\mathbb{P}(G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell}=G_{\ell+1}|V).

∎

Given Lemma 4.8, we can obtain the following upper bound involving the greatest common divisor.

Lemma 4.9.

Let $\bar{r}:=\sum_{\ell=1}^{L}r_{\ell}$ and $\mathrm{gcd}(v):=\mathrm{gcd}(v_{1},\dots,v_{k},|G|)$ . We have

|G|\cdot\mathbb{P}(X=X^{\prime},V\neq 0|\mathrm{\mathbf{typ}})\leq\mathbb{E}\left[(\mathrm{gcd}(V))^{\bar{r}}\mathbf{1}\{V\neq 0\}|\mathrm{\mathbf{typ}}\right]

Proof.

For $v\in\mathbb{Z}^{k}\backslash\{0\}$ , it follows from Lemma 4.8 that

\begin{aligned}
\mathbb{P}(X(X^{\prime})^{-1}=\mathrm{id}|V=v)&\leq\prod_{\ell=1}^{L}\mathbb{P}(G_{\ell+1}\sum_{a\in[k]}v_{a}Z_{a,\ell}=G_{\ell+1})\leq\prod_{\ell=1}^{L}\frac{1}{|\mathrm{gcd}(v,|Q_{\ell}|)Q_{\ell}|}\\
&\leq\prod_{\ell=1}^{L}\frac{(\mathrm{gcd}(v,|Q_{\ell}|))^{r_{\ell}}}{|Q_{\ell}|},
\end{aligned}

where the last inequality follows from the fact that for an abelian group $H$ and $\gamma\in\mathbb{N}$ , $|H|/|\gamma H|\leq\gamma^{r(H)}$ , see Lemma 2.12 in [17] for instance.

Recalling from Definition 7 the definition of $\widetilde{\mathcal{H}}$ , one then has that

\begin{aligned}
|G|\cdot\mathbb{P}(X=X^{\prime},V\neq 0|\widetilde{\mathcal{H}})&\leq|G|\cdot\prod_{\ell=1}^{L}\frac{(\mathrm{gcd}(V,|Q_{\ell}|))^{r_{\ell}}\cdot\mathbf{1}\{V\neq 0\}}{|Q_{\ell}|}\\
&=\prod_{\ell=1}^{L}(\mathrm{gcd}(V,|Q_{\ell}|))^{r_{\ell}}\cdot\mathbf{1}\{V\neq 0\}\leq(\mathrm{gcd}(V))^{\bar{r}}\mathbf{1}\{V\neq 0\},
\end{aligned}

(43)

where the last inequality follows from the fact that $\mathrm{gcd}(V,|Q_{\ell}|)\leq\mathrm{gcd}(V)$ for all $1\leq\ell\leq L$ . Since $\mathrm{\mathbf{typ}}\in\widetilde{\mathcal{H}}$ , taking expectations of both sides of (43) conditional on $\mathrm{\mathbf{typ}}$ gives the desired bound

|G|\cdot\mathbb{P}(X=X^{\prime},V\neq 0|\mathrm{\mathbf{typ}})\leq\mathbb{E}\left[(\mathrm{gcd}(V))^{\bar{r}}\mathbf{1}\{V\neq 0\}|\mathrm{\mathbf{typ}}\right].

∎

Lemma 4.10.

Fix any $\varepsilon>0$ and let $s:=t/k$ for $t\geq(1+\varepsilon)t_{*}$ . For all $\gamma\geq 2$ and all $\lambda>0$ , there exists a constant $\tilde{\delta}_{\lambda}\in(0,1)$ that depends only on $\lambda$ such that

\mathbb{P}(\mathrm{gcd}(V)=\gamma,V\neq 0|\mathrm{\mathbf{typ}})\lesssim\begin{cases}\min\{e^{-t},\frac{k(2es)^{\gamma}}{\gamma^{\gamma}}\}&\text{ when }s\ll 1,\\ (1-\tilde{\delta}_{\lambda})^{k}&\text{ when }s\geq f(\lambda),\\ (1/\gamma+s^{-1/2})^{k}&\text{ when }s\gg 1,\end{cases}

where $f(\lambda)$ is defined as in (26).

Proof.

Regime: $s\ll 1$ . Let $N_{1}(s)$ denote a rate 1 Poisson process. Recall that $V_{1}$ is a rate $2/k$ simple random walk on $\mathbb{Z}$ . For a random walk to return to the origin it must have taken an even number of steps, which means

\mathbb{P}(V_{1}(t)=0)\leq\mathbb{P}(N_{1}(2s)\in 2\mathbb{Z}_{+})\leq\mathbb{P}(N_{1}(2s)=0)+\sum^{\infty}_{m=1}\mathbb{P}(N_{1}(2s)=2m)\leq e^{-2s}+8s^{2}.

To simplify notation, write $\mathrm{gcd}:=\mathrm{gcd}(V)$ from now on. Using a Chernoff bound argument on the Poisson random variable $N_{1}(2s)$ yields for $\gamma\geq 2$

\begin{aligned}
\mathbb{P}(\mathrm{gcd}=\gamma,V\neq 0|\mathrm{\mathbf{typ}})&\leq\mathbb{P}(\mathrm{gcd}=\gamma|\mathrm{\mathbf{typ}})\\
&\lesssim(\mathbb{P}(V_{1}=0)+\mathbb{P}(N_{1}(2s)\geq\gamma))^{k}\leq\left(e^{-2s}+8s^{2}+e^{-2s}\cdot\frac{(2es)^{\gamma}}{\gamma^{\gamma}}\right)^{k}\\
&\leq(e^{-2s}+8s^{2}+(es)^{2})^{k}\leq(1-s)^{k}\leq e^{-t}.
\end{aligned}

To prove the second part of the upper bound, note that for $\{\mathrm{gcd}=\gamma,V\neq 0\}$ to occur, there must be some $a\in[k]$ such that $V_{a}\neq 0$ and $\gamma$ divides $V_{a}$ . Writing $N_{a}(2s)$ for the number of steps taken by $V_{a}$ , this implies that $N_{a}(2s)\geq|V_{a}|\geq\gamma$ . Hence,

\mathbb{P}(\mathrm{gcd}=\gamma,V\neq 0|\mathrm{\mathbf{typ}})\lesssim k\cdot\mathbb{P}(N_{1}(2s)\geq\gamma)\leq k\cdot\frac{(2es)^{\gamma}e^{-2s}}{\gamma^{\gamma}}\leq\frac{k(2es)^{\gamma}}{\gamma^{\gamma}}.
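The elementary Poisson estimates used in the regime $s\ll 1$ can be checked numerically. The following sketch is a sanity check under assumed sample values of $s$ , not part of the proof; the function names are hypothetical. It evaluates the Poisson tail by direct summation and compares it with the Chernoff bound $e^{-\mu}(e\mu/\gamma)^{\gamma}$ for $\mu=2s$ .

```python
from math import cosh, e, exp

def poisson_tail(mu, gamma, terms=80):
    """P(Poisson(mu) >= gamma), summing the series term by term."""
    term = exp(-mu)
    for j in range(1, gamma + 1):
        term *= mu / j                 # after the loop: e^{-mu} mu^gamma / gamma!
    total = 0.0
    for j in range(gamma, gamma + terms):
        total += term
        term *= mu / (j + 1)
    return total

def chernoff_bound(mu, gamma):
    """Chernoff bound e^{-mu} (e mu / gamma)^gamma, valid for gamma > mu."""
    return exp(-mu) * (e * mu / gamma) ** gamma

def even_positive(mu):
    """P(Poisson(mu) is even and at least 2) = e^{-mu} (cosh(mu) - 1)."""
    return exp(-mu) * (cosh(mu) - 1.0)

def small_s_bounds_hold(s):
    mu = 2.0 * s
    ok = even_positive(mu) <= 8.0 * s * s
    ok = ok and all(poisson_tail(mu, g) <= chernoff_bound(mu, g) for g in range(2, 8))
    # The step (e^{-2s} + 8 s^2 + (e s)^2) <= 1 - s needs s small enough.
    ok = ok and exp(-mu) + 8.0 * s * s + (e * s) ** 2 <= 1.0 - s
    return ok
```

The last inequality fails for moderate values such as $s=0.1$ , which is consistent with the estimate being claimed only for $s\ll 1$ .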

Regime: $s\geq f(\lambda)$ . Define $c_{\gamma}:=\mathbb{P}(V_{1}(t)\in\gamma\mathbb{Z})$ . Since $s\geq f(\lambda)$ , we have $\mathbb{P}(V_{1}(t)=0)\leq\tilde{c}_{\lambda}<1$ for some constant $\tilde{c}_{\lambda}$ depending on $\lambda$ . So we can fix some small $\delta_{\lambda}>0$ and some large $\gamma_{\lambda}\in\mathbb{N}$ such that $\mathbb{P}(V_{1}=0)+1/\gamma_{\lambda}<1-\delta_{\lambda}$ . That is, for all $\gamma\geq\gamma_{\lambda}$ ,

c_{\gamma}=\mathbb{P}(V_{1}\in\gamma\mathbb{Z})\leq\mathbb{P}(V_{1}=0)+\mathbb{P}(V_{1}\in\gamma\mathbb{Z}|V_{1}\neq 0)\leq\mathbb{P}(V_{1}=0)+1/\gamma_{\lambda}<1-\delta_{\lambda}.

Thus there is some $\tilde{\delta}_{\lambda}>0$ so that

\max_{\gamma\geq 2}c_{\gamma}\leq\max\{\max_{2\leq\gamma\leq\gamma_{\lambda}}c_{\gamma},1-\delta_{\lambda}\}\leq 1-\tilde{\delta}_{\lambda}.

Therefore, for $\gamma\geq 2$ ,

\mathbb{P}(\mathrm{gcd}=\gamma,V\neq 0|\mathrm{\mathbf{typ}})\lesssim\mathbb{P}(V_{1}\in\gamma\mathbb{Z})^{k}\leq(1-\tilde{\delta}_{\lambda})^{k}.

Regime: $s\gg 1$ . Note that

\mathbb{P}(\mathrm{gcd}=\gamma,V\neq 0|\mathrm{\mathbf{typ}})\lesssim\mathbb{P}(V_{1}\in\gamma\mathbb{Z})^{k}\leq(\mathbb{P}(V_{1}=0)+\mathbb{P}(V_{1}\in\gamma\mathbb{Z}|V_{1}\neq 0))^{k}.

For the second term it follows from Lemma 2.14 of [17] that $\mathbb{P}(V_{1}\in\gamma\mathbb{Z}|V_{1}\neq 0)\leq 1/\gamma$ . As $V_{1}$ is a one-dimensional SRW with rate $2/k$ , Theorem A.4 of [18] implies that when $s\gg 1$ we have

\mathbb{P}(V_{1}(t)=0)\leq\frac{1}{\sqrt{2\pi(2s)}}\exp\left(\mathcal{O}\left(\frac{1}{\sqrt{2s}}\right)\right)\leq s^{-1/2}.

Hence,

\mathbb{P}(\mathrm{gcd}=\gamma,V\neq 0|\mathrm{\mathbf{typ}})\lesssim(\mathbb{P}(V_{1}=0)+1/\gamma)^{k}\leq(1/\gamma+s^{-1/2})^{k}.

∎
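The return probability bound used in the regime $s\gg 1$ can also be examined numerically. Since $V_{1}$ makes $\pm 1$ jumps at total rate $2/k$ , conditioning on the Poisson number of jumps gives $\mathbb{P}(V_{1}(t)=0)=e^{-2s}\sum_{m\geq 0}s^{2m}/(m!)^{2}$ . The sketch below evaluates this series for assumed sample values of $s$ ; the function name is hypothetical.

```python
from math import exp, pi, sqrt

def return_probability(s, terms=400):
    """P(V_1 = 0) = e^{-2s} * sum_m s^{2m} / (m!)^2 for the rate-2/k walk at time t = s k."""
    term = exp(-2.0 * s)               # the m = 0 term
    total = 0.0
    for m in range(terms):
        total += term
        term *= (s / (m + 1)) ** 2     # ratio of consecutive terms
    return total

# Compare with the bound s^{-1/2} and with the leading order 1 / sqrt(4 pi s).
samples = {s: return_probability(s) for s in (5.0, 20.0, 50.0)}
ratios = {s: p * sqrt(4.0 * pi * s) for s, p in samples.items()}
```

For these sample values the probabilities stay below $s^{-1/2}$ and the entries of `ratios` are close to 1, in line with the displayed local limit estimate.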

Lemma 4.11.

Suppose $1\ll\log k\ll\log|G|$ . For any $\varepsilon>0$ , when $t\geq(1+\varepsilon)t_{*}$ we have

\mathbb{E}\left[(\mathrm{gcd}(V))^{\bar{r}}\mathbf{1}\{V\neq 0\}|\mathrm{\mathbf{typ}}\right]=1+o(1).

Proof.

Again we will write $\mathrm{gcd}=\mathrm{gcd}(V)$ .

Regime: $1\ll k\ll\log|G_{\mathrm{ab}}|$ . Observe that $s:=t/k\geq|G_{\mathrm{ab}}|^{2/k}\gg 1$ . On $\mathrm{\mathbf{typ}}$ we have $\mathrm{gcd}$ is at most $2r_{*}=|G_{\mathrm{ab}}|^{1/k}(\log k)^{2}$ , which gives

\mathbb{E}\left[\mathrm{gcd}^{\bar{r}}1\{V\neq 0\}|\mathrm{\mathbf{typ}}\right]=1+\sum_{\gamma=2}^{2r_{*}}\gamma^{\bar{r}}\mathbb{P}(\mathrm{gcd}=\gamma,V\neq 0|\mathrm{\mathbf{typ}}).

Let $\delta\in(0,1)$ be sufficiently small. For $2\leq\gamma\leq\delta|G_{\mathrm{ab}}|^{1/k}$ , applying Lemma 4.10 gives

\mathbb{P}(\mathrm{gcd}=\gamma,V\neq 0|\mathrm{\mathbf{typ}})\lesssim(1/\gamma+1/|G_{\mathrm{ab}}|^{1/k})^{k}\leq(1/\gamma+1/(\gamma/\delta))^{k}=(1+\delta)^{k}/\gamma^{k}.

For $\gamma>\delta|G_{\mathrm{ab}}|^{1/k}$ , we use the bound $(a+b)^{k}\leq 2^{k}(a^{k}+b^{k})$ to get

\mathbb{P}(\mathrm{gcd}=\gamma|\mathrm{\mathbf{typ}})\lesssim 2^{k}(1/\gamma^{k}+1/|G_{\mathrm{ab}}|).

Therefore,

\begin{aligned}
\mathbb{E}\left[\mathrm{gcd}^{\bar{r}}\mathbf{1}\{V\neq 0\}|\mathrm{\mathbf{typ}}\right]-1&\lesssim\sum_{\gamma=2}^{\delta|G_{\mathrm{ab}}|^{1/k}}\frac{(1+\delta)^{k}}{\gamma^{k-\bar{r}}}+\sum_{\gamma=\delta|G_{\mathrm{ab}}|^{1/k}+1}^{2r_{*}}\gamma^{\bar{r}}2^{k}(1/\gamma^{k}+1/|G_{\mathrm{ab}}|)\\
&\lesssim e^{\delta k}2^{\bar{r}+1-k}+2^{k}(\delta|G_{\mathrm{ab}}|^{1/k})^{\bar{r}+1-k}+2^{k}(\log k)^{2(\bar{r}+1)}|G_{\mathrm{ab}}|^{(\bar{r}+1-k)/k}\\
&=o(1)
\end{aligned}

as $\bar{r}\asymp 1$ and $k\ll\log|G_{\mathrm{ab}}|$ .

Regime: $k\eqsim\lambda\log|G_{\mathrm{ab}}|$ . In this regime $s\geq(1+\varepsilon)t_{0}/k\geq f(\lambda)$ . It follows from Lemma 4.10 that there exists $\tilde{\delta}_{\lambda}\in(0,1)$ such that

\begin{aligned}
\mathbb{E}\left[\mathrm{gcd}^{\bar{r}}\mathbf{1}\{V\neq 0\}|\mathrm{\mathbf{typ}}\right]-1&\leq\sum_{\gamma=2}^{2r_{*}}\gamma^{\bar{r}}(1-\tilde{\delta}_{\lambda})^{k}\\
&\lesssim(\log k)^{2(\bar{r}+1)}(1-\tilde{\delta}_{\lambda})^{k}=o(1).
\end{aligned}

Regime: $k\gg\log|G_{\mathrm{ab}}|$ . In this regime $t_{*}/k\ll 1$ and thus for $t\geq(1+\varepsilon)t_{*}$ there are two regimes of $s=t/k$ to be discussed: $s\ll 1$ and $s\gtrsim 1$ . When $s\gtrsim 1$ , by the same argument as before we can show $\mathbb{E}\left[\mathrm{gcd}^{\bar{r}}\mathbf{1}\{V\neq 0\}|\mathrm{\mathbf{typ}}\right]-1=o(1)$ .

It remains to treat the case where $s\ll 1$ . When $\frac{\log\log k}{k}\ll s\ll 1$ , we apply the bound $\mathbb{P}(\mathrm{gcd}(V)=\gamma,V\neq 0|\mathrm{\mathbf{typ}})\lesssim e^{-t}$ in Lemma 4.10 to show

\mathbb{E}\left[\mathrm{gcd}^{\bar{r}}\mathbf{1}\{V\neq 0\}|\mathrm{\mathbf{typ}}\right]-1\leq\sum_{\gamma=2}^{2r_{*}}\gamma^{\bar{r}}e^{-t}\lesssim(2r_{*})^{\bar{r}+1}e^{-t}\lesssim(\log k)^{2(\bar{r}+1)}e^{-t}=o(1).

When $s\lesssim\frac{\log\log k}{k}$ , we apply the bound $\mathbb{P}(\mathrm{gcd}(V)=\gamma,V\neq 0|\mathrm{\mathbf{typ}})\lesssim\frac{k(2es)^{\gamma}}{\gamma^{\gamma}}$ in Lemma 4.10 to show

\begin{aligned}
\mathbb{E}\left[\mathrm{gcd}^{\bar{r}}\mathbf{1}\{V\neq 0\}|\mathrm{\mathbf{typ}}\right]-1&\lesssim\sum_{\gamma=2}^{2r_{*}}\gamma^{\bar{r}}\frac{k(2es)^{\gamma}}{\gamma^{\gamma}}\leq\sum_{\gamma=2}^{\bar{r}}\gamma^{\bar{r}-\gamma}k(2es)^{2}+\sum_{\bar{r}<\gamma\leq 2r_{*}}k(2es)^{\gamma}\\
&\lesssim C_{\bar{r}}ks^{2}+Cks^{\bar{r}}\lesssim C^{\prime}_{\bar{r}}ks^{2}\lesssim\frac{(\log\log k)^{2}}{k}=o(1).
\end{aligned}

∎

The proof of Proposition 7 is completed given Lemma 4.9 and Lemma 4.11.

4.7. Preliminary Results on Groups

Throughout this paper, the notation $\langle A\rangle$ denotes the subgroup generated by the set of elements $A\subseteq G$ . Recall from Definition 6 that $R_{\ell}$ are the representatives of $Q_{\ell}=G_{\ell}/G_{\ell+1}$ .

Lemma 4.12.

For fixed $h_{1},\dots,h_{n}\in R_{1}$ and $U_{1},\dots,U_{n}\overset{iid}{\sim}\mathrm{Unif}(R_{\ell})$ with $\ell\geq 1$ , we have

G_{\ell+2}\sum_{i\in[n]}[h_{i},U_{i}]\sim\mathrm{Unif}\left(\langle\{G_{\ell+2}[h_{i},g]:i\in[n],g\in R_{\ell}\}\rangle\right).

Proof.

Write $R_{\ell}=\{g_{u}:u\in[|R_{\ell}|]\}$ . Any $G_{\ell+2}x\in\langle\{G_{\ell+2}[h_{i},g]:i\in[n],g\in R_{\ell}\}\rangle$ can be expressed as

G_{\ell+2}x=G_{\ell+2}\sum_{i\in[n],u\in[|R_{\ell}|]}c_{i,u}[h_{i},g_{u}]=G_{\ell+2}\sum_{i\in[n]}\left[h_{i},\sum_{u\in[|R_{\ell}|]}c_{i,u}g_{u}\right]

for some integer coefficients $\{c_{i,u}:i\in[n],u\in[|R_{\ell}|]\}$ . Note that since $R_{\ell}$ is a representative set of $Q_{\ell}$ , there exists a $g^{\prime}_{i}\in R_{\ell}$ such that $G_{\ell+1}\sum_{u\in[|R_{\ell}|]}c_{i,u}g_{u}=G_{\ell+1}g^{\prime}_{i}$ , and thus by Proposition 2,

G_{\ell+2}\sum_{i\in[n]}\left[h_{i},\sum_{u\in[|R_{\ell}|]}c_{i,u}g_{u}\right]=G_{\ell+2}\sum_{i\in[n]}[h_{i},g^{\prime}_{i}].

Therefore, for any $G_{\ell+2}x\in\langle\{G_{\ell+2}[h_{i},g]:i\in[n],g\in R_{\ell}\}\rangle$ , there exist $g^{\prime}_{1},\dots,g^{\prime}_{n}\in R_{\ell}$ such that

G_{\ell+2}x=G_{\ell+2}\sum_{i\in[n]}[h_{i},g^{\prime}_{i}].

Hence, for any $G_{\ell+2}x\in\langle\{G_{\ell+2}[h_{i},g]:i\in[n],g\in R_{\ell}\}\rangle$ , by Proposition 3,

\displaystyle\mathbb{P}(G_{\ell+2}\sum_{i\in[n]}[h_{i},U_{i}]=G_{\ell+2}x)

\displaystyle=\mathbb{P}(G_{\ell+2}\sum_{i\in[n]}[h_{i},U_{i}\cdot(g^{\prime}_{i})^{-1}]=G_{\ell+2})=\mathbb{P}(G_{\ell+2}\sum_{i\in[n]}[h_{i},U_{i}]=G_{\ell+2}),

which proves the uniformity of $G_{\ell+2}\sum_{i\in[n]}[h_{i},U_{i}]$ . ∎
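As an illustration of Lemma 4.12, the following minimal sketch enumerates the exact law of a sum of commutators in a toy example that is not taken from the paper: the Heisenberg group over $\mathbb{Z}_{m}$, where $\ell=1$, $G_{3}$ is trivial, $G/G_{2}\cong\mathbb{Z}_{m}^{2}$ and the central coordinate of $[(a,b),(u,v)]$ is $av-bu$. The function names and the choice of group are assumptions made for the example only.

```python
from collections import Counter
from itertools import product


def commutator_sum_distribution(hs, m):
    """Exact counts of sum_i (a_i*v_i - b_i*u_i) mod m over all (u_i, v_i) in Z_m^2.

    Each h_i = (a_i, b_i) is a fixed coset representative in Z_m^2, and
    a_i*v_i - b_i*u_i is the central coordinate of [h_i, (u_i, v_i)]
    in the (assumed) Heisenberg group over Z_m.
    """
    counts = Counter()
    pairs = list(product(range(m), repeat=2))
    for us in product(pairs, repeat=len(hs)):
        total = sum(a * v - b * u for (a, b), (u, v) in zip(hs, us)) % m
        counts[total] += 1
    return dict(counts)


def generated_subgroup(hs, m):
    """Subgroup of Z_m generated by all commutator values [h, g], h in hs, g in Z_m^2."""
    gens = {(a * v - b * u) % m for (a, b) in hs for u in range(m) for v in range(m)}
    subgroup = {0}
    frontier = [0]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = (x + g) % m
            if y not in subgroup:
                subgroup.add(y)
                frontier.append(y)
    return subgroup
```

For instance, with $m=4$ and the single element $h=(2,0)$ the sum takes only the values $0$ and $2$, each with the same number of preimages, so it is uniform on the generated subgroup $\{0,2\}$ rather than on all of $\mathbb{Z}_{4}$.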

Lemma 4.13.

Let $n,\alpha\in\mathbb{N}$ be fixed and $p$ be a prime. Suppose $U_{1},U_{2},\dots,U_{n+2}$ are i.i.d. uniform random variables over $\mathbb{Z}_{p^{\alpha}}$ . We have

\mathbb{E}\left[\left(\frac{|\mathbb{Z}_{p^{\alpha}}|}{|\langle U_{1},\dots,U_{n+2}\rangle|}\right)^{n}\right]\leq\exp\left(\frac{1}{p^{2}-1}\right).

Proof.

Let $A_{i}=\{\text{$U_{1},\dots,U_{n+2}$ are all divisible by $p^{i}$}\}$ for $0\leq i\leq\alpha$ and $A_{\alpha+1}=\emptyset$ . On the event $A_{i}\backslash A_{i+1}$ we know $\frac{|\mathbb{Z}_{p^{\alpha}}|}{|\langle U_{1},\dots,U_{n+2}\rangle|}=p^{i}$ . Also note that $\mathbb{P}(A_{i})=p^{-(n+2)i}$ for $0\leq i\leq\alpha$ . It follows that

	$\displaystyle\mathbb{E}\left[\left(\frac{|\mathbb{Z}_{p^{\alpha}}|}{|\langle U_{1},\dots,U_{n+2}\rangle|}\right)^{n}\right]$	$\displaystyle=\sum_{i=0}^{\alpha}p^{ni}\mathbb{P}(A_{i}\backslash A_{i+1})$
		$\displaystyle\leq\sum_{i=0}^{\alpha}p^{ni}\cdot p^{-(n+2)i}\leq\frac{1}{1-p^{-2}}\leq\exp\left(\frac{1}{p^{2}-1}\right).$

∎
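The computation in the proof of Lemma 4.13 can be checked numerically. The sketch below evaluates the sum $\sum_{i}p^{ni}\mathbb{P}(A_{i}\backslash A_{i+1})$ in exact arithmetic and, for very small parameters, compares it with a brute-force enumeration; the function names are illustrative and not part of the paper.

```python
from fractions import Fraction
from itertools import product
from math import exp, gcd


def index_moment(p, alpha, n):
    """Exact E[(p^alpha / |<U_1,...,U_{n+2}>|)^n] via the events A_i of the proof."""
    total = Fraction(0)
    for i in range(alpha + 1):
        prob_a_i = Fraction(1, p ** ((n + 2) * i))
        # A_{alpha+1} is empty by convention.
        prob_a_next = Fraction(1, p ** ((n + 2) * (i + 1))) if i < alpha else Fraction(0)
        total += p ** (n * i) * (prob_a_i - prob_a_next)
    return total


def index_moment_bruteforce(p, alpha, n):
    """Same expectation by enumerating all (n+2)-tuples in Z_{p^alpha} (small cases only)."""
    q = p ** alpha
    total = Fraction(0)
    for us in product(range(q), repeat=n + 2):
        g = q
        for u in us:
            g = gcd(g, u)
        # <us> is generated by g = gcd(q, u_1, ..., u_{n+2}), so its index in Z_q is g.
        total += Fraction(g ** n)
    return total / q ** (n + 2)


def satisfies_bound(p, alpha, n):
    """Check the inequality of Lemma 4.13 for the given parameters."""
    return float(index_moment(p, alpha, n)) <= exp(1 / (p * p - 1))
```

For $p=2$, $\alpha=2$, $n=1$ both functions return $37/32$, which is below $\exp(1/3)$.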

Lemma 4.14.

Let $H$ be a subset of $G$ . For any $\ell\in[L-1]$ ,

\frac{|Q_{\ell+1}|}{|\langle\{G_{\ell+2}[h,g]:h\in H,g\in R_{\ell}\}\rangle|}\leq\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}h:h\in H\}\rangle|}\right)^{r_{\ell+1}}.

Proof.

For simplicity of notation, write $N:=\langle\{G_{2}h:h\in H\}\rangle$ and then let $\lambda=|G_{\mathrm{ab}}|/|N|$ . Since $G_{\mathrm{ab}}$ is abelian, it can be expressed in the form

G_{\mathrm{ab}}=\mathbb{Z}_{m_{1}}\oplus\cdots\oplus\mathbb{Z}_{m_{r}}

for some $m_{1},\dots,m_{r}\in\mathbb{N}$ where $r$ is the rank of $G$ . This decomposition allows us to see that $\lambda G_{\mathrm{ab}}\trianglelefteq N$ . As a consequence,

\langle\{G_{\ell+2}[h,g]:G_{2}h\in\lambda G_{\mathrm{ab}},g\in R_{\ell}\}\rangle\trianglelefteq\langle\{G_{\ell+2}[h,g]:h\in H,g\in R_{\ell}\}\rangle.

Note that

\langle\{G_{\ell+2}[h,g]:G_{2}h\in\lambda G_{\mathrm{ab}},g\in R_{\ell}\}\rangle=\langle\{\lambda G_{\ell+2}[h,g]:h\in G,g\in R_{\ell}\}\rangle=\lambda Q_{\ell+1}.

We then have

	$\displaystyle\frac{|Q_{\ell+1}|}{|\langle\{G_{\ell+2}[h,g]:h\in H,g\in R_{\ell}\}\rangle|}$	$\displaystyle\leq\frac{|Q_{\ell+1}|}{|\langle\{G_{\ell+2}[h,g]:G_{2}h\in\lambda G_{\mathrm{ab}},g\in R_{\ell}\}\rangle|}$
		$\displaystyle=\frac{|Q_{\ell+1}|}{|\lambda Q_{\ell+1}|}\leq\lambda^{r_{\ell+1}}.$

∎
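The first step of the proof of Lemma 4.14, namely that a subgroup $N$ of index $\lambda$ in a finite abelian group contains every $\lambda$-multiple, can be verified by enumeration in small cases. The sketch below does this for subgroups of $\mathbb{Z}_{m_{1}}\oplus\cdots\oplus\mathbb{Z}_{m_{r}}$; the helper names are hypothetical.

```python
from itertools import product


def span(gens, moduli):
    """Subgroup of Z_{m_1} + ... + Z_{m_r} generated by the tuples in gens."""
    zero = tuple(0 for _ in moduli)
    seen = {zero}
    stack = [zero]
    while stack:
        x = stack.pop()
        for g in gens:
            y = tuple((xi + gi) % m for xi, gi, m in zip(x, g, moduli))
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def contains_lambda_multiple(gens, moduli):
    """Return (lambda, whether lambda * g lies in N for every g in the ambient group)."""
    subgroup = span(gens, moduli)
    order = 1
    for m in moduli:
        order *= m
    lam = order // len(subgroup)
    ok = all(
        tuple((lam * gi) % m for gi, m in zip(g, moduli)) in subgroup
        for g in product(*[range(m) for m in moduli])
    )
    return lam, ok
```

For example, in $\mathbb{Z}_{4}\oplus\mathbb{Z}_{6}$ the subgroup generated by $(2,0)$ and $(0,3)$ has index $\lambda=6$, and every element multiplied by $6$ lands in it.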

Let $R$ be a non-trivial commutative ring with identity, and let $M=(m_{ij})_{n\times n}$ be a matrix over $R$ . For a maximal ideal $\mathcal{I}$ of $R$ , let $\pi:R\to R/\mathcal{I}$ be the natural homomorphism. Let $\pi^{n}:R^{n}\to(R/\mathcal{I})^{n}$ be defined as $\pi^{n}(g_{1},\dots,g_{n})=(\pi(g_{1}),\dots,\pi(g_{n}))$ for $g_{1},\dots,g_{n}\in R$ . The following result comes from the theory of matrices over a commutative ring and it is stated in [7, p. 259].

Proposition 11.

If $M:R^{n}\to R^{n}$ is a homomorphism, then $M$ is surjective if and only if for every maximal ideal $\mathcal{I}$ of $R$ , the map $\pi_{M}:(R/\mathcal{I})^{n}\to(R/\mathcal{I})^{n}$ is surjective, where $\pi_{M}\circ\pi^{n}=\pi^{n}\circ M$ .

We will be interested in $R=\mathbb{Z}_{p^{\alpha}}$ where $p$ is a prime. The unique maximal ideal of $\mathbb{Z}_{p^{\alpha}}$ is $\mathcal{I}=\langle p\rangle$ . Then $\mathbb{Z}_{p^{\alpha}}/\mathcal{I}\cong\mathbb{Z}_{p}$ .
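For $R=\mathbb{Z}_{p^{\alpha}}$ the criterion of Proposition 11 reduces to checking $M$ modulo $p$. As a sanity check, and only for $2\times 2$ matrices where the reduction is invertible exactly when $\det M\not\equiv 0 \pmod p$, the following sketch compares brute-force surjectivity over $\mathbb{Z}_{p^{\alpha}}$ with this determinant test. The restriction to $n=2$ and all function names are choices made for the example.

```python
from itertools import product


def is_surjective(M, q):
    """Brute-force test whether x -> M x is onto (Z_q)^n."""
    n = len(M)
    images = {
        tuple(sum(M[i][j] * x[j] for j in range(n)) % q for i in range(n))
        for x in product(range(q), repeat=n)
    }
    return len(images) == q ** n


def det2(M):
    """Determinant of a 2x2 integer matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def check_all_2x2(p, alpha):
    """Compare surjectivity over Z_{p^alpha} with invertibility mod p for every 2x2 matrix."""
    q = p ** alpha
    for entries in product(range(q), repeat=4):
        M = [list(entries[:2]), list(entries[2:])]
        if is_surjective(M, q) != (det2(M) % p != 0):
            return False
    return True
```

Calling `check_all_2x2(2, 2)` runs through all 256 matrices over $\mathbb{Z}_{4}$.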

Lemma 4.15.

Let $\alpha,n\in\mathbb{N}$ and $U:=(U_{1},\dots,U_{n})$ where $U_{i}\overset{iid}{\sim}\mathrm{Unif}(\mathbb{Z}_{p^{\alpha}})$ . Let $M$ denote an $n\times n$ matrix over $\mathbb{Z}_{p^{\alpha}}$ . If $M:(\mathbb{Z}_{p^{\alpha}})^{n}\to(\mathbb{Z}_{p^{\alpha}})^{n}$ is surjective, then $\tilde{U}:=MU$ has independent entries that are uniform over $\mathbb{Z}_{p^{\alpha}}$ .

Proof.

Define a surjective group homomorphism $f:(\mathbb{Z}_{p^{\alpha}})^{n}\to(\mathbb{Z}_{p^{\alpha}})^{n}$ by $f(\bm{x}):=M\bm{x}$ where $\bm{x}\in(\mathbb{Z}_{p^{\alpha}})^{n}$ . Since $f$ is surjective, for any $\bm{y}\in(\mathbb{Z}_{p^{\alpha}})^{n}$ there exists some $\bm{y}^{\prime}\in(\mathbb{Z}_{p^{\alpha}})^{n}$ such that $f(\bm{y}^{\prime})=\bm{y}$ . It follows that for any $\bm{y}\in(\mathbb{Z}_{p^{\alpha}})^{n}$

	$\displaystyle\mathbb{P}(\tilde{U}=\bm{y})$	$\displaystyle=\mathbb{P}(f(U)=\bm{y})=\mathbb{P}(f(U-\bm{y}^{\prime})=0)=\mathbb{P}(f(U)=0)=\mathbb{P}(\tilde{U}=0)$
		$\displaystyle=\frac{|\mathrm{Ker}(f)|}{|(\mathbb{Z}_{p^{\alpha}})^{n}|}=\frac{1}{|\mathrm{Im}(f)|}=\frac{1}{p^{\alpha n}},$

where the last equality uses the fact that $f$ is surjective. ∎
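The statement of Lemma 4.15 can be seen concretely by tabulating the law of $MU$. The sketch below counts, for a given integer matrix $M$ and modulus $q=p^{\alpha}$, how many inputs $U$ map to each output; the function name and the example matrices are illustrative assumptions.

```python
from collections import Counter
from itertools import product


def law_of_MU(M, q):
    """Counts of M u (mod q) as u ranges over (Z_q)^n, i.e. q^n times the law of MU."""
    n = len(M)
    law = Counter()
    for u in product(range(q), repeat=n):
        image = tuple(sum(M[i][j] * u[j] for j in range(n)) % q for i in range(n))
        law[image] += 1
    return law
```

Over $\mathbb{Z}_{4}$ the matrix with rows $(1,2)$ and $(3,1)$ is surjective, so each of the 16 vectors is hit exactly once and $MU$ is uniform. The matrix with rows $(2,0)$ and $(0,1)$ is not surjective, and $MU$ is supported on only 8 vectors.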

4.8. Proof of Proposition 9

4.8.1. Definitions

To prepare for the proof of Proposition 9, we first introduce several useful quantities and describe the selection of the good event $\mathcal{A}$ and the index set $\mathcal{K}\subseteq[k]$ . The reasons behind these definitions will become clearer as the proof of Proposition 9 proceeds.

To simplify notation we define the following matrix.

Definition 8.

Recall the definition of $(m_{ba})_{a,b\in[k],a<b}$ from (34). Define

\hat{m}_{ba}=\begin{cases}m_{ba}&\text{ if }a<b\\ -m_{ab}&\text{ if }b<a\\ 0&\text{ if }b=a.\end{cases}

(44)
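Definition 8 extends the coefficients $m_{ba}$ , which are only defined for $a<b$ , to an antisymmetric $k\times k$ matrix. A minimal sketch of this construction, using 0-based indices and a hypothetical function name, is:

```python
def antisymmetrize(m):
    """Build hat_m from a k x k array m whose entries m[b][a] are used only for a < b.

    hat_m[b][a] = m[b][a] if a < b, -m[a][b] if b < a, and 0 on the diagonal.
    """
    k = len(m)
    hat = [[0] * k for _ in range(k)]
    for b in range(k):
        for a in range(k):
            if a < b:
                hat[b][a] = m[b][a]
            elif b < a:
                hat[b][a] = -m[a][b]
    return hat
```

By construction the output satisfies $\hat{m}_{ab}=-\hat{m}_{ba}$ for all $a,b\in[k]$ .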

Definition 9.

Let $A=(A_{ba})_{a,b\in[k]}$ be a $k\times k$ matrix with entries in $\mathbb{Z}$ , and let $\mathcal{K}:=\mathcal{K}(A)\subseteq[k]$ denote a subset of indices which is known when $A$ is given. For $b\in[k]$ , we define $\chi_{b}(A)$ , and respectively $\psi_{b}(A)$ , to be an element in $G$ that satisfies

G_{2}\chi_{b}(A)=G_{2}\left(\sum_{a\in[k]}A_{ba}Z_{a,1}\right),

and respectively

G_{2}\psi_{b}(A)=G_{2}\left(\sum_{a\in[k]:a<b}A_{ba}Z_{a,1}+\sum_{a\in\mathcal{K}^{c}:a>b}A_{ba}Z_{a,1}\right).

In particular, for $b\in[k]$ , define $\chi_{b}:=\chi_{b}(\hat{m})$ and $\psi_{b}:=\psi_{b}(\hat{m})$ .

Since by definition

G_{2}\chi_{b}=G_{2}\left(\sum_{a\in[k]}\hat{m}_{ba}Z_{a,1}\right),

with slight abuse of notation we will write

\chi_{b}=\sum_{a\in[k]}\hat{m}_{ba}Z_{a,1}\quad\quad\text{for }b\in[k],

(45)

so that we can write, again with slight abuse of notation,

\chi:=(\chi_{1},\dots,\chi_{k})^{T}=\hat{m}(Z_{1,1},\dots,Z_{k,1})^{T},

which is the product of the matrix $\hat{m}$ and the vector $(Z_{1,1},\dots,Z_{k,1})$ .

Let

K:=\sum_{\ell=2}^{L}r_{\ell}+2.

(46)

For a $K\times K$ submatrix $M$ of $\hat{m}$ , define the matrix $M_{p}:=M\bmod p$ , i.e., $M_{p}(i,j):=M(i,j)\bmod p$ for the $(i,j)$ -th entry in $M$ . Note that $M_{p}$ is a matrix over the field $\mathbb{F}_{p}$ and hence its row rank is defined as the number of linearly independent rows in the matrix.

Definition 10.

A fixed $k\times k$ matrix $(A_{ba})_{a,b\in[k]}$ is said to be good if it satisfies the following two conditions:

(i)

There exists a set $\mathcal{K}\subseteq[k]$ such that

\{G_{2}\psi_{b}(A),G_{2}\chi_{b}(A):b\in\mathcal{K}\}\text{ are independent from }\{G_{2}Z_{b,1}:b\in\mathcal{K}\},

(47)

(ii)

Let $\Gamma:=\{p:p\text{ is a prime that divides }|G_{\mathrm{ab}}|\}$ . For each $p\in\Gamma$ there exists a $K\times K$ submatrix $M$ of $(A_{ba})_{b\in\mathcal{K},a\in[k]}$ such that $M_{p}:=M\mod p$ has rank $K$ over the field $\mathbb{F}_{p}$ , where as above $K=\sum_{\ell=2}^{L}r_{\ell}+2$ .

Define $\mathcal{A}:=\{\text{$\hat{m}$ is a good matrix}\}$ and let $\mathcal{K}$ be the corresponding subset of indices satisfying (i).

Note that by definition both $\mathcal{A}$ and $\mathcal{K}$ are measurable with respect to $\widetilde{\mathcal{H}}$ .

4.8.2. Outline of Proof

To prove Proposition 9, we derive an upper bound on $\mathbb{P}(X=X^{\prime}|\mathcal{A},V=0,\mathrm{\mathbf{typ}})$ inductively using the following proposition.

Proposition 12.

Let $\mathcal{K}\subseteq[k]$ be measurable with respect to $\widetilde{\mathcal{H}}$ . For $2\leq\ell\leq L-1$ , letting

H_{\mathcal{K},\ell+1}:=\langle\{G_{\ell+2}[\chi_{b},g]:b\in\mathcal{K},g\in R_{\ell}\}\rangle,

we have

\mathbf{1}\{\mathcal{E}_{\ell+1},V=0\}\cdot\mathbb{P}(\mathcal{E}_{\ell+2}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})\leq\mathbf{1}\{\mathcal{E}_{\ell+1},V=0\}\cdot|H_{\mathcal{K},\ell+1}|^{-1}.

Applying Lemma 4.14 to $H_{\mathcal{K},\ell+1}$ gives

|H_{\mathcal{K},\ell+1}|^{-1}\leq\frac{1}{|Q_{\ell+1}|}\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{r_{\ell+1}}.

(48)

Our goal is to choose $\mathcal{K}\subseteq[k]$ properly so that $\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle$ is sufficiently large compared to $G_{\mathrm{ab}}$ . To guarantee such a choice of $\mathcal{K}$ exists, we further define a “good” event $\mathcal{A}$ , see Definition 10, which is measurable with respect to $\widetilde{\mathcal{H}}$ and occurs with high probability.

Note that $\mathcal{A},\mathrm{\mathbf{typ}}$ and $\{V=0\}$ are measurable with respect to $\widetilde{\mathcal{H}}$ . By the tower property of conditional expectation and the fact that $\sigma(\mathcal{F}_{1},\widetilde{\mathcal{H}})\subseteq\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$ for $2\leq\ell\leq L-1$ , Proposition 12 leads to

	$\displaystyle\mathbb{P}(\mathcal{E}_{\ell+2},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\mathcal{F}_{1},\widetilde{\mathcal{H}})$	$\displaystyle=\mathbb{E}\left[\mathbb{P}(\mathcal{E}_{\ell+2},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})\big{|}\mathcal{F}_{1},\widetilde{\mathcal{H}}\right]$
		$\displaystyle\leq|H_{\mathcal{K},\ell+1}|^{-1}\cdot\mathbb{P}(\mathcal{E}_{\ell+1},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\mathcal{F}_{1},\widetilde{\mathcal{H}}),$

which implies

\displaystyle\mathbb{P}(\mathcal{E}_{L+1},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\mathcal{F}_{1},\widetilde{\mathcal{H}})

\displaystyle\leq\left(\prod_{\ell=2}^{L-1}|H_{\mathcal{K},\ell+1}|^{-1}\right)\cdot\mathbb{P}(\mathcal{E}_{3},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\mathcal{F}_{1},\widetilde{\mathcal{H}}).

Combined with (48), the above yields

		$\displaystyle\mathbb{P}(\mathcal{E}_{L+1},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\mathcal{F}_{1},\widetilde{\mathcal{H}})$
	$\displaystyle\leq$	$\displaystyle\left(\prod_{\ell=2}^{L-1}\frac{1}{|Q_{\ell+1}|}\right)\cdot\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{\sum_{\ell=2}^{L-1}r_{\ell+1}}\mathbb{P}(\mathcal{E}_{3},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\mathcal{F}_{1},\widetilde{\mathcal{H}}).$		(49)

Upper bounding the expectation of (49) is the key to proving Proposition 9. The choice of $\mathcal{A}$ and $\mathcal{K}$ in Definition 10 is made so that the expectation of the right-hand side of (49) leads to the desired result.

In Section 4.8.3 we prove Proposition 12. We complete the proof of Proposition 9 in Section 4.8.4 and finish the proof of a key lemma in Section 4.8.5.

4.8.3. Proof of Proposition 12

The key to proving Proposition 12 is the simplification of $G_{\ell+2}X(X^{\prime})^{-1}$ . The analysis in this section is somewhat similar to that in Section 4.5, where we obtained a simplified expression of $G_{\ell+1}X(X^{\prime})^{-1}$ when $V\neq 0$ . However, when $V=0$ the result of simplification is quite different. Instead of a simple quantity of the form $G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell}$ (see Lemma 4.8), we now have to deal with an expression involving commutators of $\{Z_{a,u}:a\in[k],u\in[L]\}$ .

Recall from Definition 7 the definition of $\widetilde{\mathcal{H}}$ , $\{\mathcal{F}_{\ell}:\ell\in[L]\}$ and $\{\mathcal{E}_{\ell}:\ell\in[L+1]\}$ . Define

\mathbf{X}^{(\ell+1)}:=\prod_{a,b\in[k]:a<b}\left[\prod_{i=1}^{\ell-1}Z_{a,i},\prod_{j=1}^{\ell-1}Z_{b,j}\right]^{m_{ba}}\varphi^{(\ell-2)}(\{Z_{a,u}:a\in[k],u\leq\ell-2\}),

(50)

which comes from the right hand side of (4.5).

Lemma 4.16.

On $\{\mathcal{E}_{\ell+1},V=0\}$ we have

G_{\ell+2}X(X^{\prime})^{-1}=G_{\ell+2}\sum_{a,b\in[k]:a<b}m_{ba}([Z_{a,1},Z_{b,\ell}]+[Z_{a,\ell},Z_{b,1}])+G_{\ell+2}(\tilde{\varphi}\mathbf{X}^{(\ell+1)}),

where $G_{\ell+2}(\tilde{\varphi}\mathbf{X}^{(\ell+1)})\in Q_{\ell+1}$ is measurable with respect to $\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$ and $\tilde{\varphi}:=\tilde{\varphi}(\{Z_{a,u}:a\in[k],u\leq\ell-1\})$ is a polynomial whose definition will be clarified in the proof.

Proof.

Recall from Lemma 4.6 that $\{\mathcal{E}_{\ell+1},V=0\}$ is measurable with respect to $\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$ . On the event $\{\mathcal{E}_{\ell+1},V=0\}$ we can write

\displaystyle G_{\ell+2}X(X^{\prime})^{-1}

\displaystyle=G_{\ell+2}\prod_{a,b\in[k]:a<b}\left[\prod_{i=1}^{\ell}Z_{a,i},\prod_{j=1}^{\ell}Z_{b,j}\right]^{m_{ba}}\varphi^{(\ell-1)}(\{Z_{a,u}:a\in[k],u\leq\ell-1\}),

(51)

where $\varphi^{(\ell-1)}$ is defined analogously to $\varphi^{(\ell-2)}$ in (4.5). Since all terms in $\varphi^{(\ell-2)}$ are present in $\varphi^{(\ell-1)}$ we can express $G_{\ell+2}\varphi^{(\ell-1)}:=G_{\ell+2}\varphi^{(\ell-2)}\cdot\tilde{\varphi},$ where $\tilde{\varphi}:=\tilde{\varphi}(\{Z_{a,u}:a\in[k],u\leq\ell-1\})$ is the polynomial that comes from excluding all the terms in $G_{\ell+2}\varphi^{(\ell-2)}$ from $G_{\ell+2}\varphi^{(\ell-1)}$ .

We can rewrite (51) as

\displaystyle G_{\ell+2}X(X^{\prime})^{-1}

\displaystyle=G_{\ell+2}\prod_{a,b\in[k]:a<b}([Z_{a,1},Z_{b,\ell}]\cdot[Z_{a,\ell},Z_{b,1}])^{m_{ba}}\cdot\tilde{\varphi}\cdot\mathbf{X}^{(\ell+1)}.

(52)

Note that $G_{\ell+2}\tilde{\varphi}\in G_{\ell+1}$ , as every $i$ -fold commutator with $i\geq 3$ in $\tilde{\varphi}$ must involve a $Z_{a,\ell-1}$ for some $a\in[k]$ . It follows from the proof of Lemma 4.6 that $\mathbf{X}^{(\ell+1)}\in G_{\ell+1}$ on $\{\mathcal{E}_{\ell+1},V=0\}$ . Furthermore, it is easy to see that both $G_{\ell+2}\tilde{\varphi}$ and $G_{\ell+2}\mathbf{X}^{(\ell+1)}$ are measurable with respect to $\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$ as they only involve terms in $\{Z_{a,i}:1\leq i\leq\ell-1\}$ .

Since $Q_{\ell+1}=G_{\ell+1}/G_{\ell+2}$ is abelian, we can equivalently write (52) in terms of addition and obtain the desired expression. ∎

Proof of Proposition 12. For simplicity of notation, let $f\in G_{\ell+1}$ be such that

G_{\ell+2}f:=G_{\ell+2}\sum_{a,b\in[k]:a<b}m_{ba}([Z_{a,1},Z_{b,\ell}]+[Z_{a,\ell},Z_{b,1}]).

It follows from Lemma 4.16 that $G_{\ell+2}X(X^{\prime})^{-1}=G_{\ell+2}f+G_{\ell+2}(\tilde{\varphi}\mathbf{X}^{(\ell+1)})$ on $\{\mathcal{E}_{\ell+1},V=0\}$ and hence

	$\displaystyle\mathbf{1}\{\mathcal{E}_{\ell+1},V=0\}\cdot\mathbb{P}(\mathcal{E}_{\ell+2}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$	$\displaystyle=\mathbf{1}\{\mathcal{E}_{\ell+1},V=0\}\cdot\mathbb{P}(G_{\ell+2}f+G_{\ell+2}(\tilde{\varphi}\mathbf{X}^{(\ell+1)})=G_{\ell+2}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$
		$\displaystyle\leq\max_{g_{\ell}\in G_{\ell+1}}\mathbf{1}\{\mathcal{E}_{\ell+1},V=0\}\cdot\mathbb{P}(G_{\ell+2}f=G_{\ell+2}g_{\ell}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}}).$		(53)

Let $\mathcal{G}_{\ell,\mathcal{K}^{c}}$ denote the $\sigma$ -field generated by $\{Z_{a,\ell}:a\in[k]\backslash\mathcal{K}\}$ . Observe that

$\displaystyle G_{\ell+2}f$	$\displaystyle=G_{\ell+2}\sum_{a,b\in[k]:a<b}m_{ba}([Z_{a,1},Z_{b,\ell}]+[Z_{a,\ell},Z_{b,1}])$
	$\displaystyle=G_{\ell+2}\sum_{b\in\mathcal{K},a\in[k]:a<b}m_{ba}[Z_{a,1},Z_{b,\ell}]+G_{\ell+2}\sum_{a\in\mathcal{K},b\in[k]:a<b}m_{ba}[Z_{a,\ell},Z_{b,1}]$
	$\displaystyle\quad+G_{\ell+2}\sum_{b\in\mathcal{K}^{c},a\in[k]:a<b}m_{ba}[Z_{a,1},Z_{b,\ell}]+G_{\ell+2}\sum_{a\in\mathcal{K}^{c},b\in[k]:a<b}m_{ba}[Z_{a,\ell},Z_{b,1}],$
	$\displaystyle=:G_{\ell+2}f_{unknown}+G_{\ell+2}f_{known}$	(54)

where the second-to-last line is known under $\sigma(\mathcal{G}_{\ell,\mathcal{K}^{c}},\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$ and thus will be denoted by $G_{\ell+2}f_{known}$ . It remains to consider the third-to-last line, i.e.,

$\displaystyle G_{\ell+2}f_{unknown}:=$	$\displaystyle G_{\ell+2}\sum_{b\in\mathcal{K},a\in[k]:a<b}m_{ba}[Z_{a,1},Z_{b,\ell}]+G_{\ell+2}\sum_{a\in\mathcal{K},b\in[k]:a<b}m_{ba}[Z_{a,\ell},Z_{b,1}]$
$\displaystyle=$	$\displaystyle G_{\ell+2}\sum_{b\in\mathcal{K}}\left[\sum_{a\in[k]:a<b}m_{ba}Z_{a,1}-\sum_{a\in[k]:b<a}m_{ab}Z_{a,1},Z_{b,\ell}\right]$
$\displaystyle=:$	$\displaystyle G_{\ell+2}\sum_{b\in\mathcal{K}}[\chi_{b},Z_{b,\ell}],$	(55)

where $\chi_{b}$ is as in Definition 9, i.e., $\chi_{b}$ is such that $G_{2}\chi_{b}=G_{2}(\sum_{a\in[k]:a<b}m_{ba}Z_{a,1}-\sum_{a\in[k]:b<a}m_{ab}Z_{a,1})$ .
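The regrouping in (55) uses only that the commutator map is bilinear and antisymmetric modulo $G_{\ell+2}$ . The sketch below checks the identity numerically in a toy model in which the commutator is replaced by the alternating form $(x,y)\mapsto x_{1}y_{2}-x_{2}y_{1}$ on $\mathbb{Z}_{m}^{2}$ ; this stand-in, the parameter values and all function names are assumptions of the example, not objects from the paper.

```python
import random


def bracket(x, y, m):
    """Toy alternating bilinear form on Z_m^2, standing in for the commutator map."""
    return (x[0] * y[1] - x[1] * y[0]) % m


def pairwise_sum(mm, Z1, Zl, K, m):
    """First line of (55): the two sums over pairs a < b with b in K, resp. a in K."""
    total = 0
    for b in range(len(Z1)):
        for a in range(b):
            if b in K:
                total += mm[b][a] * bracket(Z1[a], Zl[b], m)
            if a in K:
                total += mm[b][a] * bracket(Zl[a], Z1[b], m)
    return total % m


def grouped_sum(mm, Z1, Zl, K, m):
    """Last line of (55): the sum over b in K of [chi_b, Z_{b,l}]."""
    k = len(Z1)
    total = 0
    for b in K:
        chi = [0, 0]
        for a in range(k):
            if a == b:
                continue
            coeff = mm[b][a] if a < b else -mm[a][b]  # this is hat{m}_{ba}
            chi[0] += coeff * Z1[a][0]
            chi[1] += coeff * Z1[a][1]
        total += bracket((chi[0] % m, chi[1] % m), Zl[b], m)
    return total % m


def random_check(trials=200, seed=0, k=5, m=12):
    """Compare both sides of (55) on random inputs; True if they always agree."""
    rng = random.Random(seed)
    for _ in range(trials):
        mm = [[rng.randrange(-3, 4) if a < b else 0 for a in range(k)] for b in range(k)]
        Z1 = [(rng.randrange(m), rng.randrange(m)) for _ in range(k)]
        Zl = [(rng.randrange(m), rng.randrange(m)) for _ in range(k)]
        K = {b for b in range(k) if rng.random() < 0.5}
        if pairwise_sum(mm, Z1, Zl, K, m) != grouped_sum(mm, Z1, Zl, K, m):
            return False
    return True
```

The sign change in the second sum of $\chi_{b}$ comes from swapping the two arguments of the bracket, which is exactly where the entries $-m_{ab}$ of $\hat{m}$ appear.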

Lemma 4.12 shows that $G_{\ell+2}\sum_{b\in\mathcal{K}}[\chi_{b},Z_{b,\ell}]$ is uniform over $\langle\{G_{\ell+2}[\chi_{b},g]:b\in\mathcal{K},g\in R_{\ell}\}\rangle$ , which is measurable with respect to $\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$ since $(\chi_{b})_{b\in\mathcal{K}}$ are measurable with respect to $\sigma(\mathcal{F}_{1},\widetilde{\mathcal{H}})\subseteq\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$ . It turns out that if we further condition on the $\sigma$ -field $\mathcal{G}_{\ell,\mathcal{K}^{c}}$ , one has that for any $g_{\ell}\in G_{\ell+1}$ ,

	$\displaystyle\mathbb{P}(G_{\ell+2}f=G_{\ell+2}g_{\ell}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}},\mathcal{G}_{\ell,\mathcal{K}^{c}})$
$\displaystyle=$	$\displaystyle\mathbb{P}(G_{\ell+2}\sum_{b\in\mathcal{K}}[\chi_{b},Z_{b,\ell}]+G_{\ell+2}f_{known}=G_{\ell+2}g_{\ell}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}},\mathcal{G}_{\ell,\mathcal{K}^{c}})$
$\displaystyle\leq$	$\displaystyle\max_{\tilde{g}_{\ell}\in G_{\ell+1}}\mathbb{P}(G_{\ell+2}\sum_{b\in\mathcal{K}}[\chi_{b},Z_{b,\ell}]=G_{\ell+2}\tilde{g}_{\ell}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}},\mathcal{G}_{\ell,\mathcal{K}^{c}})$
$\displaystyle\leq$	$\displaystyle|\langle\{G_{\ell+2}[\chi_{b},g]:b\in\mathcal{K},g\in R_{\ell}\}\rangle|^{-1}=|H_{\mathcal{K},\ell+1}|^{-1}.$	(56)

Therefore, we can bound (53) from above by

\displaystyle\mathbb{P}(G_{\ell+2}f=G_{\ell+2}g_{\ell}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})

\displaystyle=\mathbb{E}_{\mathcal{G}_{\ell,\mathcal{K}^{c}}}\left[\mathbb{P}(G_{\ell+2}f=G_{\ell+2}g_{\ell}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}},\mathcal{G}_{\ell,\mathcal{K}^{c}})\right]\leq|H_{\mathcal{K},\ell+1}|^{-1},

(57)

where $\mathbb{E}_{\mathcal{G}_{\ell,\mathcal{K}^{c}}}[\cdot]$ means we are taking the expectation over $\{Z_{a,\ell}:a\in[k]\backslash\mathcal{K}\}$ . Combining (53), (56) and (57) then yields the conclusion of Proposition 12, i.e.,

\mathbf{1}\{\mathcal{E}_{\ell+1},V=0\}\cdot\mathbb{P}(\mathcal{E}_{\ell+2}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})\leq\mathbf{1}\{\mathcal{E}_{\ell+1},V=0\}\cdot|H_{\mathcal{K},\ell+1}|^{-1}.

∎

4.8.4. Proof of Proposition 9

We begin by addressing the last term in (49). Recall the definition of $(\chi_{b})_{b\in[k]},(\psi_{b})_{b\in[k]}$ from Definition 9. Recall that $\mathcal{K}$ is measurable with respect to $\widetilde{\mathcal{H}}$ .

Lemma 4.17.

Let $\sigma(\mathcal{G},\widetilde{\mathcal{H}}):=\sigma((G_{2}\psi_{b})_{b\in\mathcal{K}},(G_{2}\chi_{b})_{b\in\mathcal{K}},\widetilde{\mathcal{H}})$ . Then

\mathbb{P}(\mathcal{E}_{3},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\mathcal{G},\widetilde{\mathcal{H}})\leq\frac{\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}}{|Q_{2}|}\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\psi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{r_{2}}.

Proof.

Observe that

$\displaystyle G_{3}X(X^{\prime})^{-1}$	$\displaystyle=G_{3}\sum_{a<b}m_{ba}[Z_{a,1},Z_{b,1}]=G_{3}\sum_{b\in[k]}\sum_{a<b}m_{ba}[Z_{a,1},Z_{b,1}]$
	$\displaystyle=G_{3}\sum_{b\in\mathcal{K},a\in[k]:a<b}m_{ba}[Z_{a,1},Z_{b,1}]+G_{3}\sum_{b\in\mathcal{K}^{c},a\in\mathcal{K}:a<b}m_{ba}[Z_{a,1},Z_{b,1}]$
	$\displaystyle\quad+G_{3}\sum_{b\in\mathcal{K}^{c},a\in\mathcal{K}^{c}:a<b}m_{ba}[Z_{a,1},Z_{b,1}]$
	$\displaystyle=G_{3}\sum_{b\in\mathcal{K},a\in[k]:a<b}m_{ba}[Z_{a,1},Z_{b,1}]-G_{3}\sum_{b\in\mathcal{K},a\in\mathcal{K}^{c}:a>b}m_{ab}[Z_{a,1},Z_{b,1}]$
	$\displaystyle\quad+G_{3}\sum_{b\in\mathcal{K}^{c},a\in\mathcal{K}^{c}:a<b}m_{ba}[Z_{a,1},Z_{b,1}]$
	$\displaystyle=:G_{3}\tilde{f}+G_{3}\tilde{f}_{c},$	(58)

where

G_{3}\tilde{f}_{c}:=G_{3}\sum_{b\in\mathcal{K}^{c},a\in\mathcal{K}^{c}:a<b}m_{ba}[Z_{a,1},Z_{b,1}],

and

	$\displaystyle G_{3}\tilde{f}$	$\displaystyle:=G_{3}\sum_{b\in\mathcal{K},a\in[k]:a<b}m_{ba}[Z_{a,1},Z_{b,1}]-G_{3}\sum_{b\in\mathcal{K},a\in\mathcal{K}^{c}:a>b}m_{ab}[Z_{a,1},Z_{b,1}]$
		$\displaystyle=G_{3}\sum_{b\in\mathcal{K}}\left[\sum_{a\in[k]:a<b}m_{ba}Z_{a,1}+\sum_{a\in\mathcal{K}^{c}:a>b}\hat{m}_{ba}Z_{a,1},Z_{b,1}\right]=:G_{3}\sum_{b\in\mathcal{K}}\left[\psi_{b},Z_{b,1}\right],$

where $\psi_{b}$ is as in Definition 9.

By Definition 10, on $\mathcal{A}$ there exists a set $\mathcal{K}\subseteq[k]$ such that conditionally on $\widetilde{\mathcal{H}}$ , $(G_{2}\chi_{b})_{b\in\mathcal{K}},(G_{2}\psi_{b})_{b\in\mathcal{K}}$ are independent from $(G_{2}Z_{b,1})_{b\in\mathcal{K}}$ . Hence, conditioning on $\sigma(\mathcal{G},\widetilde{\mathcal{H}})$ , $(G_{2}Z_{b,1})_{b\in\mathcal{K}}$ are i.i.d. uniform over $G_{\mathrm{ab}}$ . In addition note that $G_{3}\tilde{f}_{c}$ is independent from $(Z_{b,1})_{b\in\mathcal{K}}$ since $G_{3}\tilde{f}_{c}$ only involves $(Z_{b,1})_{b\in\mathcal{K}^{c}}$ . Hence,

		$\displaystyle\mathbb{P}(G_{3}X(X^{\prime})^{-1}=G_{3}|\mathcal{G},\widetilde{\mathcal{H}})$
	$\displaystyle=$	$\displaystyle\mathbb{P}\left(G_{3}\sum_{b\in\mathcal{K}}[\psi_{b},Z_{b,1}]=-G_{3}\tilde{f}_{c}\bigg{|}\mathcal{G},\widetilde{\mathcal{H}}\right)\leq\max_{\tilde{g}\in G_{2}}\mathbb{P}\left(G_{3}\sum_{b\in\mathcal{K}}[\psi_{b},Z_{b,1}]=G_{3}\tilde{g}\bigg{|}\mathcal{G},\widetilde{\mathcal{H}}\right)$
	$\displaystyle\leq$	$\displaystyle|\langle\{G_{3}[\psi_{b},g]:b\in\mathcal{K},g\in R_{1}\}\rangle|^{-1}\leq\frac{1}{|Q_{2}|}\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\psi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{r_{2}},$

where the last line follows from Lemma 4.12 and Lemma 4.14 (applied with $\ell=1$ ). ∎

The last ingredient needed to complete the proof of Proposition 9 is the following estimate, whose proof is deferred to Section 4.8.5.

Lemma 4.18.

Let $(\chi_{b})_{b\in[k]},(\psi_{b})_{b\in[k]}$ be defined as in Definition 9. Let $\mathcal{A}$ and $\mathcal{K}$ be defined as in Definition 10. Then

\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\mathbb{E}\left[\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2}\bigg{|}\widetilde{\mathcal{H}}\right]\leq\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\exp\left(\sum_{i=1}^{\infty}\frac{r}{i^{2}}\right)

and

\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\mathbb{E}\left[\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\psi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2}\bigg{|}\widetilde{\mathcal{H}}\right]\leq\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\exp\left(\sum_{i=1}^{\infty}\frac{r}{i^{2}}\right).

Proof of Proposition 9. Recall that $\sigma(\mathcal{G},\widetilde{\mathcal{H}})=\sigma((G_{2}\psi_{b})_{b\in\mathcal{K}},(G_{2}\chi_{b})_{b\in\mathcal{K}},\widetilde{\mathcal{H}})$ so that $(G_{2}\chi_{b})_{b\in\mathcal{K}}$ are measurable with respect to $\sigma(\mathcal{G},\widetilde{\mathcal{H}})$ . Recall that $K=\sum_{\ell=2}^{L}r_{\ell}+2$ . Noting that $\sigma(\mathcal{G},\widetilde{\mathcal{H}})\subseteq\sigma(\mathcal{F}_{1},\widetilde{\mathcal{H}})$ , applying the tower property to (49) gives

		$\displaystyle\mathbb{P}(\mathcal{E}_{L+1},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\mathcal{G},\widetilde{\mathcal{H}})$
	$\displaystyle\leq$	$\displaystyle\left(\prod_{\ell=2}^{L-1}\frac{1}{|Q_{\ell+1}|}\right)\mathbb{E}\left[\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2-r_{2}}\mathbf{1}\{\mathcal{E}_{3},\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\bigg{|}\mathcal{G},\widetilde{\mathcal{H}}\right]$
	$\displaystyle=$	$\displaystyle\left(\prod_{\ell=2}^{L-1}\frac{1}{|Q_{\ell+1}|}\right)\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2-r_{2}}\mathbb{P}(\mathcal{E}_{3},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\mathcal{G},\widetilde{\mathcal{H}})$
	$\displaystyle\leq$	$\displaystyle\left(\prod_{\ell=1}^{L-1}\frac{1}{|Q_{\ell+1}|}\right)\cdot\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2-r_{2}}\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\psi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{r_{2}},$

where the last line follows from Lemma 4.17. Then

		$\displaystyle\mathbb{P}(\mathcal{E}_{L+1},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\widetilde{\mathcal{H}})=\mathbb{E}\left[\mathbb{P}(\mathcal{E}_{L+1},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\mathcal{G},\widetilde{\mathcal{H}})\big{|}\widetilde{\mathcal{H}}\right]$
	$\displaystyle\leq$	$\displaystyle\left(\prod_{\ell=1}^{L-1}\frac{1}{|Q_{\ell+1}|}\right)\cdot\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\mathbb{E}\left[\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2-r_{2}}\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\psi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{r_{2}}\bigg{|}\widetilde{\mathcal{H}}\right].$

We can bound the last term above by Hölder’s inequality and Lemma 4.18,

		$\displaystyle\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\mathbb{E}\left[\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2-r_{2}}\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\psi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{r_{2}}\bigg{|}\widetilde{\mathcal{H}}\right]$
	$\displaystyle\leq$	$\displaystyle\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\mathbb{E}\left[\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2}\bigg{|}\widetilde{\mathcal{H}}\right]^{\frac{K-2-r_{2}}{K-2}}\mathbb{E}\left[\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\psi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2}\bigg{|}\widetilde{\mathcal{H}}\right]^{\frac{r_{2}}{K-2}}$
	$\displaystyle\leq$	$\displaystyle\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\exp\left(\sum_{i=1}^{\infty}\frac{r}{i^{2}}\right).$

That is,

\mathbb{P}(\mathcal{E}_{L+1},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\widetilde{\mathcal{H}})\leq\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\prod_{\ell=1}^{L-1}\frac{1}{|Q_{\ell+1}|}\exp\left(\sum_{i=1}^{\infty}\frac{r}{i^{2}}\right).

Taking expectation over $\widetilde{\mathcal{H}}$ on both sides and letting $C:=\exp\left(\sum_{i=1}^{\infty}\frac{r}{i^{2}}\right)$ , we have

|G|\cdot\mathbb{P}(\mathcal{E}_{L+1}|\mathcal{A},V=0,\mathrm{\mathbf{typ}})\leq|G|\cdot C\prod_{\ell=1}^{L-1}\frac{1}{|Q_{\ell+1}|}=C|G_{\mathrm{ab}}|\ll e^{h},

where $h$ is defined as in Definition 4 and by definition $|G_{\mathrm{ab}}|\ll e^{h}$ . ∎

4.8.5. Proof of Lemma 4.18

Proof.

The proofs of the two inequalities are essentially the same, so we only prove the first one.

We can write

G_{\mathrm{ab}}=F_{p_{1}}\oplus\cdots\oplus F_{p_{\gamma}}

(59)

where $p_{1},\dots,p_{\gamma}$ are distinct primes and $F_{p}$ is a Sylow $p$ -subgroup of $G_{\mathrm{ab}}$ . Each $F_{p_{i}}$ has the form

F_{p_{i}}=\oplus_{j=1}^{\beta_{i}}\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}

and hence

G_{\mathrm{ab}}\cong\oplus_{i=1}^{\gamma}\oplus_{j=1}^{\beta_{i}}\mathbb{Z}_{p_{i}^{\alpha_{i,j}}},

(60)

where we can observe that $\max_{i\in[\gamma]}\beta_{i}\leq r$ .

Since $G_{2}Z_{a,1}\overset{iid}{\sim}\mathrm{Unif}(G_{\mathrm{ab}})$ , for each $Z_{a,1}$ with $a\in[k]$ , $G_{2}Z_{a,1}$ can be represented in the following form

\oplus_{i=1}^{\gamma}\oplus_{j=1}^{\beta_{i}}Z^{(a)}_{i,j}

for a collection of independent random variables $\{Z^{(a)}_{i,j}:1\leq i\leq\gamma,1\leq j\leq\beta_{i}\}$ such that $Z^{(a)}_{i,j}$ is uniform over $\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}$ . With slight abuse of notation, we will write

G_{2}Z_{a,1}=\oplus_{i=1}^{\gamma}\oplus_{j=1}^{\beta_{i}}Z^{(a)}_{i,j}.

Based on this we can further write

G_{2}\chi_{b}=\oplus_{i=1}^{\gamma}\oplus_{j=1}^{\beta_{i}}\left(\sum_{a\in[k]}\hat{m}_{ba}Z^{(a)}_{i,j}\right)=:\oplus_{i=1}^{\gamma}\oplus_{j=1}^{\beta_{i}}\chi^{(b)}_{i,j},

where $\chi^{(b)}_{i,j}:=\sum_{a\in[k]}\hat{m}_{ba}Z^{(a)}_{i,j}$ is an element in $\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}$ . Under the $\sigma$ -field $\widetilde{\mathcal{H}}$ , the coefficients $\{\hat{m}_{ba}:a,b\in[k]\}$ are known. Hence the collections $\{\chi^{(b)}_{i,j}:b\in[k]\}$ are independent for different $(i,j)$ ’s, and we can consider the generation of each subgroup $\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}$ by $\{\chi^{(b)}_{i,j}:b\in\mathcal{K}\}$ separately, i.e.,

\displaystyle\mathbb{E}\left[\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2}\bigg{|}\widetilde{\mathcal{H}}\right]

\displaystyle\leq\prod_{i=1}^{\gamma}\prod_{j=1}^{\beta_{i}}\mathbb{E}\left[\left(\frac{|\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}|}{|\langle\chi^{(b)}_{i,j}:b\in\mathcal{K}\rangle|}\right)^{K-2}\bigg{|}\widetilde{\mathcal{H}}\right],

(61)

where as above $K=\sum_{\ell=2}^{L}r_{\ell}+2$ .

For any $p_{i}\in\Gamma$ , we will argue that on the event $\mathcal{A}$ there exists a set of indices $\mathcal{K}_{p_{i}}\subseteq\mathcal{K}$ such that for any $j\in[\beta_{i}]$ , $(\chi^{(b)}_{i,j})_{b\in\mathcal{K}_{p_{i}}}$ is a collection of $K$ i.i.d. uniform random variables over $\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}$ , so that one can simply apply Lemma 4.13 to the right hand side of (61) to obtain the desired conclusion.

By Definition 10, on the event $\mathcal{A}$ there exists a $K\times K$ submatrix $M$ of $(\hat{m}_{ba})_{b\in\mathcal{K},a\in[k]}$ such that $M_{p_{i}}$ has rank $K$ . We collect the row indices of $M$ into the set $\mathcal{K}_{p_{i}}\subseteq\mathcal{K}$ and the column indices of $M$ into the set $\mathcal{K}_{p_{i},col}:=\{a_{1},\dots,a_{K}\}$ (let $\mathcal{K}_{p_{i},col}=\emptyset$ if $\mathcal{A}$ does not occur). Then we can define the $\sigma$ -field $\mathcal{G}_{i,j}:=\sigma((Z^{(a)}_{i,j})_{a\notin\mathcal{K}_{p_{i},col}},\widetilde{\mathcal{H}})$ , and express the vector $(\chi^{(b)}_{i,j})_{b\in\mathcal{K}_{p_{i}}}$ as a sum of two parts, the first of which is not determined by $\mathcal{G}_{i,j}$ whereas the second is measurable with respect to $\mathcal{G}_{i,j}$ :

M(Z^{(a_{1})}_{i,j},\dots,Z^{(a_{K})}_{i,j})+e((Z^{(a)}_{i,j})_{a\notin\mathcal{K}_{p_{i},col}}),

where $e(\cdot)$ is a function of $(Z^{(a)}_{i,j})_{a\notin\mathcal{K}_{p_{i},col}}$ whose value is known under $\mathcal{G}_{i,j}$ . Proposition 11 shows that $M$ is a surjective map from $(\mathbb{Z}_{p_{i}^{\alpha_{i,j}}})^{K}$ to $(\mathbb{Z}_{p_{i}^{\alpha_{i,j}}})^{K}$ . Lemma 4.15 further implies that the $K$ entries of $M\cdot(Z^{(a_{1})}_{i,j},\dots,Z^{(a_{K})}_{i,j})$ are i.i.d. uniform over $\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}$ . It is then straightforward to see that $(\chi^{(b)}_{i,j})_{b\in\mathcal{K}_{p_{i}}}$ are i.i.d. uniform over $\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}$ given $\mathcal{G}_{i,j}$ . That is, for any $\mathbf{x}:=(x_{b})_{b\in\mathcal{K}_{p_{i}}}\in(\mathbb{Z}_{p_{i}^{\alpha_{i,j}}})^{K}$ , we have

\mathbf{1}_{\mathcal{A}}\cdot\mathbb{P}\left((\chi^{(b)}_{i,j})_{b\in\mathcal{K}_{p_{i}}}=\mathbf{x}|\widetilde{\mathcal{H}}\right)=\mathbf{1}_{\mathcal{A}}\cdot\mathbb{E}\left(\mathbb{P}\left((\chi^{(b)}_{i,j})_{b\in\mathcal{K}_{p_{i}}}=\mathbf{x}|\mathcal{G}_{i,j}\right)\big{|}\widetilde{\mathcal{H}}\right)=\mathbf{1}_{\mathcal{A}}\cdot(p_{i}^{\alpha_{i,j}})^{-K}.

Therefore, applying Lemma 4.13 to the i.i.d. uniform $\{\chi^{(b)}_{i,j}:b\in\mathcal{K}_{p_{i}}\}$ gives

\begin{aligned}
\mathbf{1}_{\mathcal{A}}\cdot\mathbb{E}\left[\left(\frac{|\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}|}{|\langle\chi^{(b)}_{i,j}:b\in\mathcal{K}\rangle|}\right)^{K-2}\bigg{|}\widetilde{\mathcal{H}}\right]&\leq\mathbf{1}_{\mathcal{A}}\cdot\mathbb{E}\left[\left(\frac{|\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}|}{|\langle\chi^{(b)}_{i,j}:b\in\mathcal{K}_{p_{i}}\rangle|}\right)^{K-2}\bigg{|}\widetilde{\mathcal{H}}\right]\\
&\leq\mathbf{1}_{\mathcal{A}}\cdot\exp\left(\frac{1}{p_{i}^{2}-1}\right).
\end{aligned}

Combining all the subgroups $\{\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}:i\in[\gamma],j\in[\beta_{i}]\}$ , we can obtain using (61) that

\begin{aligned}
&\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\mathbb{E}\left[\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2}\bigg{|}\widetilde{\mathcal{H}}\right]\\
&\quad\leq\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\prod_{i=1}^{\gamma}\prod_{j=1}^{\beta_{i}}\exp\left(\frac{1}{p_{i}^{2}-1}\right)\leq\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\exp\left(\sum_{i=1}^{\infty}\frac{r}{i^{2}}\right)
\end{aligned}

as $\max_{i\in[\gamma]}\beta_{i}\leq r$ . ∎

4.9. Proof of Proposition 10

In the proof of Proposition 9 we have conditioned on the “good event” $\mathcal{A}$ that guarantees the existence of a subset $\mathcal{K}\subseteq[k]$ of indices that plays a critical role in the proof. In this section we aim to prove that indeed $\mathcal{A}$ will occur with a sufficiently high probability.

4.9.1. Regime: $\frac{\log|G_{\mathrm{ab}}|}{\log\log|G_{\mathrm{ab}}|}\lesssim k\ll\log|G_{\mathrm{ab}}|$

Recall the definition of $h$ from Definition 4. In this regime we will work with the unconditional probability and prove the following stronger bound than that in Proposition 10:

\mathbb{P}(\mathcal{A}^{c})\ll\frac{e^{h}}{|G|}=\frac{e^{\omega}}{|G_{2}|}.

(62)

Indeed, combining (62) with $\mathbb{P}(\mathrm{\mathbf{typ}})\asymp 1$ yields the statement in Proposition 10:

|G|\cdot\mathbb{P}(\mathcal{A}^{c},V=0|\mathrm{\mathbf{typ}})\leq\frac{|G|\cdot\mathbb{P}(\mathcal{A}^{c})}{\mathbb{P}(\mathrm{\mathbf{typ}})}\ll e^{h}.

First we specify the choice of $\mathcal{K}$ in this regime, and then verify that this choice satisfies (47) in Definition 10:

\mathcal{K}=\{b>k/2:\mathrm{gcd}(\{m_{ba}:a\leq k/2\})\text{ and }|G_{\mathrm{ab}}|\text{ are coprime}\}.

(63)

Note this choice purely depends on $(m_{ba})_{a,b\in[k],a<b}$ and hence is measurable with respect to $\widetilde{\mathcal{H}}$ . To see that $(G_{2}\psi_{b})_{b\in\mathcal{K}}$ are conditionally independent from $(G_{2}Z_{b,1})_{b\in\mathcal{K}}$ given $\widetilde{\mathcal{H}}$ , we observe that for any $b\in\mathcal{K}$ ,

G_{2}\psi_{b}=G_{2}\sum_{a\leq k/2:a<b}\hat{m}_{ba}Z_{a,1}+G_{2}\left(\sum_{a>k/2:a<b}\hat{m}_{ba}Z_{a,1}+\sum_{a\in\mathcal{K}^{c}:a>b}\hat{m}_{ba}Z_{a,1}\right),

where $G_{2}\sum_{a\leq k/2:a<b}\hat{m}_{ba}Z_{a,1}\sim\mathrm{Unif}(G_{\mathrm{ab}})$ due to the definition of $\mathcal{K}$ , see Lemma 4.7.
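The uniformity claim can be illustrated on a toy case. The following is a minimal sketch, assuming the cyclic group $\mathbb{Z}_{n}$ as a stand-in for $G_{\mathrm{ab}}$ (the general abelian case is what Lemma 4.7 covers); the function names `law_of_combination` and `is_uniform` are hypothetical and not taken from the paper. It enumerates all values of i.i.d. uniform $Z_{a}\in\mathbb{Z}_{n}$ and tabulates the law of $\sum_{a}m_{a}Z_{a}$.

```python
from collections import Counter
from itertools import product


def law_of_combination(coeffs, n):
    """Exact counts of sum(m_a * z_a) mod n over all (z_a) in (Z_n)^len(coeffs)."""
    counts = Counter(
        sum(m * z for m, z in zip(coeffs, zs)) % n
        for zs in product(range(n), repeat=len(coeffs))
    )
    return [counts[x] for x in range(n)]


def is_uniform(coeffs, n):
    """True if the linear combination is uniform over Z_n (all residues equally likely)."""
    return len(set(law_of_combination(coeffs, n))) == 1
```

For instance, with coefficients $(6,10)$ and $n=15$ the gcd is $2$, which is coprime to $15$, and the combination is uniform; with coefficients $(3,6)$ and $n=9$ the gcd is $3$ and uniformity fails.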

Furthermore, we can see that $G_{2}\sum_{a\leq k/2:a<b}\hat{m}_{ba}Z_{a,1}$ involves terms in $(Z_{a,1})_{a\leq k/2}$ , whereas $G_{2}\left(\sum_{a>k/2:a<b}\hat{m}_{ba}Z_{a,1}+\sum_{a\in\mathcal{K}^{c}:a>b}\hat{m}_{ba}Z_{a,1}\right)$ involves only terms in $(Z_{a,1})_{a>k/2}$ (since $b\in\mathcal{K}$ , in the second sum the condition $a>b$ leads to $a>k/2$ ). Therefore, conditioned on $\widetilde{\mathcal{H}}$ , we have that $G_{2}\sum_{a\leq k/2:a<b}\hat{m}_{ba}Z_{a,1}$ is independent from $G_{2}\left(\sum_{a>k/2:a<b}\hat{m}_{ba}Z_{a,1}+\sum_{a\in\mathcal{K}^{c}:a>b}\hat{m}_{ba}Z_{a,1}\right)$ . By the same reasoning we also have that $G_{2}\sum_{a\leq k/2:a<b}\hat{m}_{ba}Z_{a,1}$ is independent from $(G_{2}Z_{b,1})_{b\in\mathcal{K}}$ . With the information combined, we can see that conditionally on $\widetilde{\mathcal{H}}$ , for any $b\in\mathcal{K}$ , $G_{2}\psi_{b}$ is independent from $(G_{2}Z_{b,1})_{b\in\mathcal{K}}$ and uniform over $G_{\mathrm{ab}}$ .

Noting that

G_{2}\chi_{b}=G_{2}\sum_{a\in[k]}\hat{m}_{ba}Z_{a,1}=G_{2}\sum_{a\leq k/2}\hat{m}_{ba}Z_{a,1}+G_{2}\sum_{a>k/2}\hat{m}_{ba}Z_{a,1},

by the same reasoning as above we have that $G_{2}\sum_{a\leq k/2}\hat{m}_{ba}Z_{a,1}$ is independent from $G_{2}\sum_{a>k/2}\hat{m}_{ba}Z_{a,1}$ and $(G_{2}Z_{b,1})_{b\in\mathcal{K}}$ . Again by the definition of $\mathcal{K}$ we have $G_{2}\sum_{a\leq k/2}\hat{m}_{ba}Z_{a,1}\sim\mathrm{Unif}(G_{\mathrm{ab}})$ . That is, conditionally on $\widetilde{\mathcal{H}}$ , for any $b\in\mathcal{K}$ , $G_{2}\chi_{b}$ is independent from $(G_{2}Z_{b,1})_{b\in\mathcal{K}}$ and uniform over $G_{\mathrm{ab}}$ . Therefore we have verified the condition that $(G_{2}\psi_{b})_{b\in\mathcal{K}},(G_{2}\chi_{b})_{b\in\mathcal{K}}$ are independent from $(G_{2}Z_{b,1})_{b\in\mathcal{K}}$ in Definition 10.

Given the choice of $\mathcal{K}$ in (63), for each $p\in\Gamma$ we can define $\mathcal{A}_{p}(\mathcal{K})$ to be the event that there exists a $K\times K$ submatrix $M$ of $(\hat{m}_{ba})_{b\in\mathcal{K},a\in[k]}$ such that $M_{p}:=M\mod p$ has rank $K$ . Observe that $\cap_{p\in\Gamma}\mathcal{A}_{p}(\mathcal{K})\subseteq\mathcal{A}$ and hence

\mathbb{P}(\mathcal{A}^{c})\leq\sum_{p\in\Gamma}\mathbb{P}(\mathcal{A}_{p}(\mathcal{K})^{c}).

Recall that our goal is to show $\mathbb{P}(\mathcal{A}^{c})\ll\frac{e^{h}}{|G|}$ , where $h$ is as in Definition 4. As $|\Gamma|\lesssim\log|G|$ , it suffices to prove $\max_{p\in\Gamma}\mathbb{P}(\mathcal{A}_{p}(\mathcal{K})^{c})\ll\frac{e^{h}}{|G|\log|G|}$ . In this regime we can prove a much stronger estimate.

Proposition 13.

When $\frac{\log|G_{\mathrm{ab}}|}{\log\log|G_{\mathrm{ab}}|}\lesssim k\ll\log|G_{\mathrm{ab}}|$ , for any $C>0$ ,

\max_{p\in\Gamma}\mathbb{P}(\mathcal{A}_{p}(\mathcal{K})^{c})=o(|G|^{-C}).

In order to prove Proposition 13 we begin by discussing a useful property, which we refer to as the relative independence of the coefficient matrix $(\hat{m}_{ba})_{a,b\in[k]}$ , that holds in the regime $\frac{\log|G_{\mathrm{ab}}|}{\log\log|G_{\mathrm{ab}}|}\lesssim k\ll\log|G_{\mathrm{ab}}|$ .

Relative independence in $\hat{m}$ . Recall that in this regime $t_{*}=t_{0}\asymp k|G_{\mathrm{ab}}|^{2/k}$ . We will consider $t\geq(1+\varepsilon)t_{0}$ and hence $s:=t/k\gg 1$ , i.e., in both of the random walks $X(t)$ and $X^{\prime}(t)$ each generator in $\{Z_{a}:a\in[k]\}$ should typically appear many times. As a result, we will see certain “relative independence” among a subset of terms in $\{m_{ba}:a<b\}$ . Before explaining the meaning of relative independence we need to define some notation.

We can view $\mathbf{X}=X(X^{\prime})^{-1}$ as a sequence of generators, and express this sequence by $(\sigma_{i},\eta_{i})_{i\in[N+N^{\prime}]}$ . In the following construction, we will obtain partial information on $(\sigma_{i})_{i\in[N+N^{\prime}]}$ while conditioning on $\sigma((\eta_{i})_{i\in\mathbb{N}})$ . That is, at this stage of construction we will treat $(\eta_{i})_{i\in\mathbb{N}}$ as known.

Let $P\subseteq[k]$ be a subset of indices of size $d$, where $d$ is to be chosen later. We denote by $\mathbf{X}_{P}$ the subsequence of $\mathbf{X}$ consisting only of generators in $\{Z_{a}^{\pm 1}:a\in P\}$ and let $\mathcal{N}_{P}(t)$ denote the length of $\mathbf{X}_{P}$, which follows a Poisson distribution with mean $\frac{2td}{k}$ by Poisson thinning.

Based on the subsequence $\mathbf{X}_{P}$ , we will construct a collection of disjoint sets $\{B_{ab}:a,b\in P,a<b\}$ as follows:

(1)

Partition $\mathbf{X}_{P}$ into $\lfloor\mathcal{N}_{P}(t)/2\rfloor$ disjoint pairs, the $i$-th pair consisting of the $(2i-1)$-th and $2i$-th elements of $\mathbf{X}_{P}$.
(2)

For $a,b\in P$ with $a<b$, we look at the $i$-th pair in $\mathbf{X}_{P}$ for each $1\leq i\leq\lfloor\mathcal{N}_{P}(t)/2\rfloor$. If the two generators in the $i$-th pair are $Z^{\eta_{a}}_{a},Z^{\eta_{b}}_{b}$ for some $\eta_{a},\eta_{b}\in\{\pm 1\}$ (where $\eta_{a},\eta_{b}$ are known), regardless of the order in which they appear, then we record the pair of locations $(2i-1,2i)$ in the corresponding set $B_{ab}$.

Note that in the construction of $B_{ab}$ we have no knowledge of the exact order within the pair; that is, we only know that the pair is either $(Z_{a}^{\eta_{a}},Z_{b}^{\eta_{b}})$ or $(Z_{b}^{\eta_{b}},Z_{a}^{\eta_{a}})$, each with equal probability.

The subset $P$ is said to be nice if $|B_{ab}|\geq\lfloor\frac{t}{2dk}\rfloor$ for all $a,b\in P$ with $a<b$ . As shown in the following lemma, $P$ is nice with high probability when we pick the right size $d$ .
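The construction of the sets $B_{ab}$ and the niceness check can be sketched in a few lines. This is an illustrative sketch only: the names `build_pair_sets` and `is_nice` are hypothetical, the input is just the list of generator indices of $\mathbf{X}_{P}$ in order of appearance (signs play no role in the construction), and the threshold $\lfloor\frac{t}{2dk}\rfloor$ is passed in as a plain integer.

```python
from collections import defaultdict


def build_pair_sets(indices):
    """Group consecutive pairs of X_P by their unordered index pair.

    indices: generator indices of X_P in order of appearance.
    Returns {(a, b): [(2i-1, 2i), ...]} with a < b and 1-based locations.
    Pairs made of the same generator twice are not recorded anywhere.
    """
    pair_sets = defaultdict(list)
    for i in range(1, len(indices) // 2 + 1):
        x, y = indices[2 * i - 2], indices[2 * i - 1]
        if x != y:
            pair_sets[(min(x, y), max(x, y))].append((2 * i - 1, 2 * i))
    return dict(pair_sets)


def is_nice(pair_sets, P, threshold):
    """P is nice if every B_ab with a < b in P holds at least `threshold` pairs."""
    P = sorted(P)
    return all(
        len(pair_sets.get((a, b), [])) >= threshold
        for pos, a in enumerate(P)
        for b in P[pos + 1:]
    )
```

Only the unordered index pair is used when recording a location, which mirrors the fact that the order inside each recorded pair is left unrevealed at this stage.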

Lemma 4.19.

The probability that the subset $P$ is nice is at least $1-Cd^{2}\exp(-\frac{t}{10dk})$ for some constant $C>0$ independent of $d$.

Proof.

Since $\mathcal{N}_{P}(t)\sim\mathrm{Poisson}(\frac{2td}{k})$ , by a Chernoff bound argument

\mathbb{P}\left(\mathcal{N}_{P}(t)<\frac{td}{k}\right)\leq e^{-(c_{0}td)/k}

for some constant $c_{0}>0$. Since each pair in the subsequence belongs to a given $B_{ab}$ independently with probability $\frac{2}{d^{2}}$,

\begin{aligned}
\mathbb{P}(\text{$P$ is not nice})&\leq d^{2}\,\mathbb{P}\left(|B_{12}|<\lfloor\frac{t}{2dk}\rfloor\,\bigg{|}\,\mathcal{N}_{P}(t)\geq\frac{td}{k}\right)+e^{-(c_{0}td)/k}\\
&\leq d^{2}\,\mathbb{P}\left(\mathrm{Binomial}\left(\lfloor\frac{td}{2k}\rfloor,\frac{2}{d^{2}}\right)<\lfloor\frac{t}{2dk}\rfloor\right)+e^{-(c_{0}td)/k}\\
&\leq d^{2}\exp\left(-\frac{t}{10dk}\right)+e^{-(c_{0}td)/k}\leq Cd^{2}\exp\left(-\frac{t}{10dk}\right),
\end{aligned}

where the third line follows from a Chernoff bound. ∎

The reason to look at the collection $\{B_{ab}:a,b\in P,a<b\}$ is the following simple observation: when revealing the relative order of a pair of $\{Z_{a}^{\eta_{a}},Z_{b}^{\eta_{b}}\}$ in $B_{ab}$ , if we have $(Z_{b}^{\eta_{b}},Z_{a}^{\eta_{a}})$ then it leads to an increment of $-\eta_{a}\eta_{b}$ on the value of $m_{ba}$ ; if instead we have $(Z_{a}^{\eta_{a}},Z_{b}^{\eta_{b}})$ then the resulting increment on $m_{ba}$ is 0. Taking account of the fact that for different pairs in $B_{ab}$ , the corresponding $-\eta_{a}\eta_{b}$ ’s are i.i.d. random variables uniform in $\{\pm 1\}$ , we can expect diffusive behavior that “smoothes out” the probability of $m_{ba}$ taking a certain value.

Next we will translate this intuition into a rigorous proof. We first define the following quantity:

q(n):=\max_{x\in\mathbb{Z}}\mathbb{P}(\mathrm{Bin}(n,1/2)=x).

(64)

It is well known that $q(n)$ is non-increasing with respect to $n$ and there exists some $C>0$ such that $q(n)\leq Cn^{-1/2}$ for all $n\in\mathbb{N}$ .
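Both properties of $q(n)$ are easy to check numerically. The sketch below is only a sanity check on a finite range, not a proof; it uses the standard fact that the largest point mass of $\mathrm{Bin}(n,1/2)$ is attained at $x=\lfloor n/2\rfloor$, and the helper name `q` simply mirrors the notation in (64).

```python
from math import comb, sqrt


def q(n: int) -> float:
    """Largest point mass of Bin(n, 1/2), attained at x = n // 2."""
    return comb(n, n // 2) / 2 ** n


# Values of q(n) and of the rescaled quantity q(n) * sqrt(n) on a finite range.
q_values = [q(n) for n in range(1, 201)]
rescaled = [q(n) * sqrt(n) for n in range(1, 201)]
```

On this range `q_values` is non-increasing and `rescaled` stays below $1$, consistent with $q(n)\leq Cn^{-1/2}$ (the asymptotic value of $q(n)\sqrt{n}$ is $\sqrt{2/\pi}$).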

Lemma 4.20.

Let $S_{n}\sim\mathrm{Bin}(n,1/2)$ for some $n\in\mathbb{N}$ . For any prime $p$ ,

\max_{x\in\mathbb{Z}}\mathbb{P}(S_{n}=x\mod p)\leq\min\{2/p,1/2\}+q(n).

Proof.

The proof follows the idea used in Lemma 2.14 of [17]. It is easy to see that any distribution on $\mathbb{N}$ whose probability mass function is non-increasing can be written as a mixture of $\mathrm{Unif}(\{1,...,Y\})$ distributions, for different $Y\in\mathbb{N}$.

For an even $n\in\mathbb{N}$ the binomial distribution $\mathrm{Bin}(n,1/2)$ is known to be unimodal, i.e., the mode $x_{*}:=\mathrm{argmax}_{x}\mathbb{P}(S_{n}=x)$ is unique. When $n\in\mathbb{N}$ is odd, the maximum of $\mathbb{P}(S_{n}=x)$ is achieved at two adjacent values. Without loss of generality, let $x_{*}:=\min\{\mathrm{argmax}_{x}\mathbb{P}(S_{n}=x)\}$ .

Letting $\bar{S}_{n}:=S_{n}-x_{*}$ , it is easy to see the map $m\mapsto\mathbb{P}(|\bar{S}_{n}|=m)$ is non-increasing on $\mathbb{N}$ and hence we can write

|\bar{S}_{n}|\sim\mathrm{Unif}(\{1,...,Y\})\quad\text{conditional on}\quad\bar{S}_{n}\neq 0,

for some random variable $Y\in\mathbb{N}$ whose law is immaterial for our purposes. As a consequence, when $p>2$, for any $x\in\mathbb{Z}$,

\begin{aligned}
\mathbb{P}(\bar{S}_{n}=x\mod p\,|\,\bar{S}_{n}\neq 0)&\leq\mathbb{P}(|\bar{S}_{n}|+x\in p\mathbb{Z}\,|\,\bar{S}_{n}\neq 0)+\mathbb{P}(|\bar{S}_{n}|-x\in p\mathbb{Z}\,|\,\bar{S}_{n}\neq 0)\\
&\leq 2\mathbb{E}(\lfloor Y/p\rfloor/Y)\leq 2/p.
\end{aligned}

That is, for any $x\in\mathbb{Z}$ , $\mathbb{P}(S_{n}=x\mod p)=\mathbb{P}(\bar{S}_{n}=(x-x_{*})\mod p)\leq 2/p$ .

When $p=2$ , by the same reasoning we have the following bound

\max_{x\in\{0,1\}}\mathbb{P}(\bar{S}_{n}=x\mod 2|\bar{S}_{n}\neq 0)\leq 1/2.

Therefore,

\begin{aligned}
\max_{x\in\mathbb{Z}}\mathbb{P}(S_{n}=x\mod p)&=\max_{x\in\mathbb{Z}}\mathbb{P}(\bar{S}_{n}=x\mod p)\\
&\leq\mathbb{P}(\bar{S}_{n}=0)+\max_{x\in\mathbb{Z}}\mathbb{P}(\bar{S}_{n}=x\mod p\,|\,\bar{S}_{n}\neq 0)\leq q(n)+\min\{2/p,1/2\}.
\end{aligned}

∎
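The bound of Lemma 4.20 can be checked exactly for small parameters. The following sketch uses exact rational arithmetic; the function names `max_residue_mass` and `lemma_bound` are hypothetical, and the check is a finite sanity test rather than a substitute for the proof.

```python
from fractions import Fraction
from math import comb


def max_residue_mass(n, p):
    """Exact value of max_x P(S_n = x mod p) for S_n ~ Bin(n, 1/2)."""
    mass = [Fraction(0)] * p
    for x in range(n + 1):
        mass[x % p] += Fraction(comb(n, x), 2 ** n)
    return max(mass)


def lemma_bound(n, p):
    """Right-hand side min{2/p, 1/2} + q(n), with q(n) the largest point mass."""
    q_n = Fraction(comb(n, n // 2), 2 ** n)
    return min(Fraction(2, p), Fraction(1, 2)) + q_n
```

Comparing `max_residue_mass(n, p)` with `lemma_bound(n, p)` over a grid of small $n$ and small primes $p$ shows the inequality holding with room to spare; for $p$ larger than $n$ the left-hand side is exactly $q(n)$.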

Now we are ready to state the meaning of “relative independence” among the terms $(m_{ba})_{a,b\in P,a<b}$. The following lemma implies that, conditioned on $P$ being nice, for any $(x_{ba})_{a,b\in P:a<b}$, the collection $(\mathbf{1}\{m_{ba}=x_{ba}\})_{a,b\in P,a<b}$ (respectively, $(\mathbf{1}\{m_{ba}=x_{ba}\mod p\})_{a,b\in P,a<b}$) is stochastically dominated by i.i.d. Bernoulli random variables with probability $q(\lfloor\frac{t}{2dk}\rfloor)$ (respectively, $\min\{2/p,1/2\}+q(\lfloor\frac{t}{2dk}\rfloor)$).

Lemma 4.21.

Let $q:=q(\lfloor\frac{t}{2dk}\rfloor)$ . For $A\subseteq\{(a,b):a,b\in P,a<b\}$ , let $\mathcal{F}_{A^{c}}=\sigma((m_{ba})_{(a,b)\notin A,a,b\in P,a<b})$ . Then for any $(x_{ba})_{(a,b)\in A}\in\mathbb{Z}^{|A|}$ we have that

\mathbb{P}(\cap_{(a,b)\in A}\{m_{ba}=x_{ba}\}|\mathcal{F}_{A^{c}},P\text{ is nice})\leq q^{|A|},

(65)

and furthermore,

\mathbb{P}(\cap_{(a,b)\in A}\{m_{ba}=x_{ba}\mod p\}|\mathcal{F}_{A^{c}},P\text{ is nice})\leq(\min\{2/p,1/2\}+q)^{|A|}.

(66)

Proof.

For $a,b\in P$ with $a<b$ , we first recall from (34) the definition of $m_{ba}$ and observe that $m_{ba}$ is in fact determined purely by the subsequence $\mathbf{X}_{P}$ . When referring to $\mathbf{X}_{P}$ as a subsequence, we essentially use the information given by $(\sigma_{P,i},\eta_{P,i})_{i\in[\mathcal{N}_{P}]}$ , where $\sigma_{P,i}$ denotes the index of the $i$ -th term in $\mathbf{X}_{P}$ and $\eta_{P,i}$ its sign. Hence we can write

m_{ba}=-\sum_{j=1}^{\mathcal{N}_{P}}\sum_{i<j}\eta_{i}\eta_{j}\mathbf{1}\{\sigma_{P,j}=a,\sigma_{P,i}=b\}\quad\text{ for }a,b\in P\text{ with }a<b.

(67)
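Formula (67) translates directly into code. In the sketch below, which is illustrative only, a word is assumed to be a list of (index, sign) pairs representing the sequence of generators $Z_{\sigma}^{\eta}$ in order; the function name `m_ba` and this encoding are hypothetical conveniences, not notation from the paper.

```python
def m_ba(word, a, b):
    """Evaluate (67): minus the sum of eta_i * eta_j over positions i < j
    with sigma_i = b and sigma_j = a.

    word: list of (index, sign) pairs, sign in {+1, -1}.
    """
    total = 0
    for j, (sigma_j, eta_j) in enumerate(word):
        if sigma_j != a:
            continue
        for i in range(j):
            sigma_i, eta_i = word[i]
            if sigma_i == b:
                total -= eta_i * eta_j
    return total


def swap_increment(prefix, suffix, a, b, eta_a, eta_b):
    """Change in m_ba when an adjacent pair (Z_a, Z_b) is replaced by (Z_b, Z_a)."""
    in_order = prefix + [(a, eta_a), (b, eta_b)] + suffix
    swapped = prefix + [(b, eta_b), (a, eta_a)] + suffix
    return m_ba(swapped, a, b) - m_ba(in_order, a, b)
```

Whatever the surrounding word, `swap_increment` returns $-\eta_{a}\eta_{b}$: an adjacent pair in the order $(Z_{b}^{\eta_{b}},Z_{a}^{\eta_{a}})$ contributes $-\eta_{a}\eta_{b}$ to $m_{ba}$, while the order $(Z_{a}^{\eta_{a}},Z_{b}^{\eta_{b}})$ contributes $0$.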

With a slight abuse of notation, let $B^{c}:=\{i\in[\mathcal{N}_{P}]:i\notin\cup_{a,b\in P,a<b}B_{ab}\}$ denote the collection of locations in $\mathbf{X}_{P}$ that are not recorded in any of the sets $B_{ab}$. Let $\mathcal{G}_{\eta}:=\sigma(\{\eta_{P,i}:i\in\mathbb{N}\})$ (we reveal all the signs $\eta_{P,i}$ regardless of the value of $\mathcal{N}_{P}$) and define

\mathcal{G}:=\sigma(\{B_{ab}:a,b\in P,a<b\},(\sigma_{i})_{i\in B^{c}},\mathcal{G}_{\eta}).

From now on, we condition on the $\sigma$-field $\mathcal{G}$. By conditioning on $\mathcal{G}$, we treat all the signs as known and reveal the indices of the generators that are not in any of the sets $B_{ab}$. Hence, by (67), $m_{ba}$ can be written as the sum of two parts, one independent of $\mathcal{G}$ and the other known under $\mathcal{G}$:

m_{ba}=\sum_{i\in\mathbb{N}:(2i-1,2i)\in B_{ab}}-\eta_{2i-1}\eta_{2i}\mathbf{1}\{\sigma_{P,2i}=a,\sigma_{P,2i-1}=b\}+m^{known}_{ba}

(68)

where $m_{ba}^{known}$ represents the cumulative increment from the pairs of $\{Z_{a}^{\pm},Z_{b}^{\pm}\}$ that are not in $B_{ab}$, which is known under $\mathcal{G}$. We can further observe that the random variables $(\mathbf{1}\{\sigma_{P,2i}=a,\sigma_{P,2i-1}=b\})_{(2i-1,2i)\in B_{ab}}$ from the first part of (68) are i.i.d. $\mathrm{Bernoulli}(1/2)$ and independent of $\mathcal{G}$.

Let $n_{ab}^{\pm}:=|\{(2i-1,2i)\in B_{ab}:-\eta_{2i-1}\eta_{2i}=\pm 1\}|$. (Note that $n^{+}_{ab}+n^{-}_{ab}=|B_{ab}|$ and $n_{ab}^{\pm}$ are measurable with respect to $\mathcal{G}$.) We can then define two independent binomial random variables $Y^{+}_{ab}\sim\mathrm{Bin}(n_{ab}^{+},1/2)$, $Y^{-}_{ab}\sim\mathrm{Bin}(n^{-}_{ab},1/2)$ so that the increment of $m_{ba}$ resulting from revealing the orders of the pairs in $B_{ab}$ is given by $Y^{+}_{ab}-Y^{-}_{ab}=Y_{ab}-n^{-}_{ab}$, where $Y_{ab}:=Y^{+}_{ab}+(n^{-}_{ab}-Y^{-}_{ab})\sim\mathrm{Bin}(|B_{ab}|,1/2)$. It follows from (68) that, for any $x\in\mathbb{Z}$,

\begin{aligned}
\mathbb{P}(m_{ba}=x\,|\,\mathcal{G},P\text{ is nice})&\leq\max_{x^{\prime}\in\mathbb{Z}}\mathbb{P}\left(\sum_{(2i-1,2i)\in B_{ab}}-\eta_{2i-1}\eta_{2i}\mathbf{1}\{\sigma_{P,2i}=a,\sigma_{P,2i-1}=b\}=x^{\prime}\,\bigg{|}\,\mathcal{G},P\text{ is nice}\right)\\
&=\max_{x^{\prime}\in\mathbb{Z}}\mathbb{P}\left(Y^{+}_{ab}-Y^{-}_{ab}=x^{\prime}\,\bigg{|}\,\mathcal{G},P\text{ is nice}\right)\\
&=\max_{x^{\prime\prime}\in\mathbb{Z}}\mathbb{P}\left(Y_{ab}=x^{\prime\prime}\,\bigg{|}\,\mathcal{G},P\text{ is nice}\right)\\
&=\max_{x^{\prime\prime}\in\mathbb{Z}}\mathbb{P}\left(\mathrm{Bin}(|B_{ab}|,1/2)=x^{\prime\prime}\,|\,P\text{ is nice}\right)\leq q.
\end{aligned}

(69)

Hence we have the desired upper bound for a given pair $a,b\in P$ with $a<b$:

\mathbb{P}(m_{ba}=x|P\text{ is nice})\leq q.
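The distributional identity behind this bound, namely that $Y^{+}_{ab}-Y^{-}_{ab}+n^{-}_{ab}$ has the $\mathrm{Bin}(n^{+}_{ab}+n^{-}_{ab},1/2)$ law when $Y^{+}_{ab}$ and $Y^{-}_{ab}$ are independent, can be verified exactly for small parameters. The sketch below is an illustration with hypothetical helper names (`binom_half_pmf`, `law_of_difference`); it convolves the two binomial laws using exact rationals.

```python
from fractions import Fraction
from math import comb


def binom_half_pmf(n):
    """Exact pmf of Bin(n, 1/2) as a dict {value: probability}."""
    return {x: Fraction(comb(n, x), 2 ** n) for x in range(n + 1)}


def law_of_difference(n_plus, n_minus):
    """Exact law of Y_plus - Y_minus for independent Bin(n_plus, 1/2), Bin(n_minus, 1/2)."""
    law = {}
    for y_plus, p_plus in binom_half_pmf(n_plus).items():
        for y_minus, p_minus in binom_half_pmf(n_minus).items():
            key = y_plus - y_minus
            law[key] = law.get(key, Fraction(0)) + p_plus * p_minus
    return law
```

Shifting the output of `law_of_difference(n_plus, n_minus)` by `n_minus` reproduces `binom_half_pmf(n_plus + n_minus)`, so the largest point mass of the difference equals that of a single binomial with $|B_{ab}|$ trials.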

Now note that for different pairs $(a,b)\neq(a^{\prime},b^{\prime})$ with $a,a^{\prime},b,b^{\prime}\in P$ and $a<b,a^{\prime}<b^{\prime}$, by our construction the corresponding sets $B_{ab}$ and $B_{a^{\prime}b^{\prime}}$ are disjoint. Hence $(Y_{ab})_{a,b\in P,a<b}$ is a collection of independent random variables. Carrying out a calculation similar to (69) using $(Y_{ab})_{a,b\in P,a<b}$ leads to

\begin{aligned}
\max_{(x_{ba})_{(a,b)\in A}}\mathbb{P}(\cap_{(a,b)\in A}\{m_{ba}=x_{ba}\}\,|\,\mathcal{F}_{A^{c}},\text{$P$ is nice})&\leq\max_{(y_{ba})_{(a,b)\in A}}\mathbb{P}(\cap_{(a,b)\in A}\{Y_{ab}=y_{ba}\}\,|\,\mathcal{F}_{A^{c}},\text{$P$ is nice})\\
&\leq q^{|A|}.
\end{aligned}

Similarly, by Lemma 4.20

\begin{aligned}
&\max_{(x_{ba})_{(a,b)\in A}}\mathbb{P}(\cap_{(a,b)\in A}\{m_{ba}=x_{ba}\mod p\}\,|\,\mathcal{F}_{A^{c}},\text{$P$ is nice})\\
&\quad\leq\max_{(y_{ba})_{(a,b)\in A}}\mathbb{P}(\cap_{(a,b)\in A}\{Y_{ab}=y_{ba}\mod p\}\,|\,\mathcal{F}_{A^{c}},\text{$P$ is nice})\\
&\quad\leq\max_{(y_{ba})_{(a,b)\in A}}\prod_{(a,b)\in A}\mathbb{P}(\mathrm{Bin}(\lfloor\frac{t}{2dk}\rfloor,1/2)=y_{ba}\mod p)\leq(\min\{2/p,1/2\}+q)^{|A|}.
\end{aligned}

∎

Proof of Proposition 13. Recall that $K=\sum_{\ell=2}^{L}r_{\ell}+2$. Let $d\geq K$ be an even integer to be chosen later. Partition $\{a\in[k]:a\leq k/2\}$ into subsets $\{\mathcal{J}_{1,i}:1\leq i\leq\lfloor k/(2d)\rfloor\}$, each of size $d$, and omit the rest of the generators. Similarly, partition $\{b\in[k]:b>k/2\}$ into subsets $\{\mathcal{J}_{2,i}:1\leq i\leq\lfloor k/(2d)\rfloor\}$ of size $d$. Let $P_{i}=\mathcal{J}_{1,i}\cup\mathcal{J}_{2,i}$. Since the arrivals of generators whose indices lie in the disjoint sets $\{P_{i}:1\leq i\leq\lfloor k/(2d)\rfloor\}$ are independent, we can make $\lfloor k/(2d)\rfloor$ independent attempts to find a $K\times K$ submatrix $M$ in $(\hat{m}_{ba})_{b\in\mathcal{K}\cap\mathcal{J}_{2,i},a\in\mathcal{J}_{1,i}}$ such that $M_{p}$ has rank $K$.

For each $1\leq i\leq\lfloor k/(2d)\rfloor$ we perform the following trial. Let $\mathcal{K}$ be as in (63). Since we will only be looking at $b\in\mathcal{J}_{2,i}\cap\mathcal{K}$ , we need to determine if $|\mathcal{J}_{2,i}\cap\mathcal{K}|$ is large enough to begin with. If $|\mathcal{J}_{2,i}\cap\mathcal{K}|\geq d/2$ , we will look for the desired $K\times K$ submatrix $M$ in $(\hat{m}_{ba})_{b\in\mathcal{J}_{2,i}\cap\mathcal{K},a\in\mathcal{J}_{1,i}}$ . We will look at a batch of $K$ indices in $\mathcal{J}_{2,i}\cap\mathcal{K}$ at a time, which will be denoted by $\{b_{1},\dots,b_{K}\}$ . Let $\mathcal{J}^{\mathrm{1st}}_{1,i}$ denote the first half of $\mathcal{J}_{1,i}$ and $\mathcal{J}^{\mathrm{2nd}}_{1,i}$ the second half. Our goal is to search for column indices in $\mathcal{J}^{\mathrm{2nd}}_{1,i}$ , which will be labelled $a_{1},\dots,a_{K}$ , such that the submatrix $M$ induced by the rows $\{b_{1},\dots,b_{K}\}$ and the columns $\{a_{1},\dots,a_{K}\}$ of $\hat{m}$ satisfies the condition that $M_{p}$ has rank $K$ .

We will describe the steps of the $i$ -th trial now:

(1)

If $|\mathcal{J}_{2,i}\cap\mathcal{K}|\geq d/2$ proceed to the search of submatrix $M$ . Otherwise declare failure for this trial.
(2)

We look for the first $a\in\mathcal{J}^{\mathrm{2nd}}_{1,i}$ such that $\hat{m}_{b_{1},a}\neq 0\mod p$ and set it as $a_{1}$ . If there is no such $a$ declare this trial to be a failure.
(3)

The search then proceeds iteratively: for $u\geq 1$, given the choice of $\{a_{1},\dots,a_{u}\}$, we look for the first $a\in\mathcal{J}^{\mathrm{2nd}}_{1,i}\backslash\{a_{1},\dots,a_{u}\}$ such that, setting $a_{u+1}:=a$, the last row $(\hat{m}_{b_{u+1},a_{1}},\dots,\hat{m}_{b_{u+1},a_{u+1}})\mod p$ is not in the vector space spanned by the previous rows $\{(\hat{m}_{b_{i},a_{1}},\dots,\hat{m}_{b_{i},a_{u+1}})\mod p:1\leq i\leq u\}$. If no $a\in\mathcal{J}^{\mathrm{2nd}}_{1,i}\backslash\{a_{1},\dots,a_{u}\}$ works, then we declare failure for this trial.
(4)

The trial is a success if we have found $\{a_{1},\dots,a_{K}\}$ .
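Steps (2) to (4) of the trial amount to a greedy column search, which can be sketched as follows. This is a minimal illustration under stated assumptions: the matrix is a plain list of rows indexed as `m_hat[b][a]`, `p` is prime, and the names `rank_mod_p` and `greedy_columns` are hypothetical. The test "the new row is not in the span of the previous rows" is implemented as "the enlarged square block has full rank over $\mathbb{F}_{p}$", which is equivalent once the previous block has full rank.

```python
def rank_mod_p(rows, p):
    """Rank over F_p (p prime) of a matrix given as a list of rows."""
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][c]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][c], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][c]:
                f = rows[r][c]
                rows[r] = [(x - f * y) % p for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def greedy_columns(m_hat, row_batch, candidates, p):
    """For each new row, pick the first unused candidate column that keeps the
    square block of full rank mod p. Returns the chosen columns, or None if
    some step finds no suitable column (the trial fails)."""
    chosen = []
    for u in range(len(row_batch)):
        found = None
        for a in candidates:
            if a in chosen:
                continue
            cols = chosen + [a]
            block = [[m_hat[b][c] for c in cols] for b in row_batch[: u + 1]]
            if rank_mod_p(block, p) == u + 1:
                found = a
                break
        if found is None:
            return None
        chosen.append(found)
    return chosen
```

For a single row the rank condition reduces to $\hat{m}_{b_{1},a}\neq 0\mod p$, which is step (2); the same matrix may succeed for one prime and fail for another, which is why the argument treats each $p\in\Gamma$ separately.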

It remains to estimate the success probability for each trial, i.e., upper bound the failure probability in each step of the trial.

Step 1. For any $b\in\mathcal{J}_{2,i}$ , to upper bound the probability that $b\notin\mathcal{K}$ , we will reveal the orders in $\{B_{ab}:a\in\mathcal{J}^{\mathrm{1st}}_{1,i}\}$ . (By the relative independence in $\hat{m}$ , we can still control the failure probability of our later search in $\mathcal{J}^{\mathrm{2nd}}_{1,i}$ given that $\{B_{ab}:a\in\mathcal{J}^{\mathrm{1st}}_{1,i}\}$ has been revealed.)

Note that

\begin{aligned}
\{b\notin\mathcal{K}\}&\subseteq\{\mathrm{gcd}(\{m_{ba}:a\in\mathcal{J}^{\mathrm{1st}}_{1,i}\})\text{ is not coprime with }|G_{\mathrm{ab}}|\}\\
&=\cup_{p\in\Gamma}\cap_{a\in\mathcal{J}^{\mathrm{1st}}_{1,i}}\{m_{ba}=0\mod p\}.
\end{aligned}

For all $a\in[k]$ , let $N_{a}:=N_{a}(t)$ (and respectively $N^{\prime}_{a}:=N^{\prime}_{a}(t)$ ) denote the number of times the generator $Z_{a}^{\pm 1}$ appears in $X$ (and respectively $X^{\prime}$ ). Recall that the arrivals of each generator can be viewed as an independent Poisson process with rate $1/k$ . Letting

\mathcal{C}:=\{N_{c}\leq 2(t/k),N^{\prime}_{c}\leq 2(t/k)\text{ for all }c\in\{b\}\cup\mathcal{J}^{\mathrm{1st}}_{1,i}\},

we have

\mathbb{P}(\mathcal{C}^{c})\leq 2(|\mathcal{J}^{\mathrm{1st}}_{1,i}|+1)\mathbb{P}(N_{a}>2(t/k))\leq 2d\exp(-\Omega(t/k)).

Further observe that by (67) $|m_{ba}|\leq(N_{b}+N^{\prime}_{b})(N_{a}+N^{\prime}_{a})$ and hence $\mathcal{C}\subseteq\cap_{a\in\mathcal{J}^{\mathrm{1st}}_{1,i}}\{|m_{ba}|\leq 16(t/k)^{2}\}$ . This implies that when $p>16(t/k)^{2}$ , for $\mathcal{C}\cap(\cap_{a\in\mathcal{J}^{\mathrm{1st}}_{1,i}}\{m_{ba}=0\mod p\})$ to occur the only possible case is when $m_{ba}=0$ for all $a\in\mathcal{J}^{\mathrm{1st}}_{1,i}$ . That is,

\begin{aligned}
\{b\notin\mathcal{K}\}&\subseteq\{\{b\notin\mathcal{K}\}\cap\mathcal{C}\}\cup\mathcal{C}^{c}\\
&\subseteq\left(\cup_{p\in\Gamma:p\leq 16(t/k)^{2}}\cap_{a\in\mathcal{J}^{\mathrm{1st}}_{1,i}}\{m_{ba}=0\mod p\}\cap\mathcal{C}\right)\\
&\quad\cup(\cap_{a\in\mathcal{J}^{\mathrm{1st}}_{1,i}}\{m_{ba}=0\})\cup\mathcal{C}^{c}.
\end{aligned}

By Lemma 4.19 it is easy to see $\mathbb{P}(\text{$P$ is nice})\geq 1/2$ when $t\gg kd\log d$ , which will be guaranteed by our choice of $d$ . Recall that $q:=q(\lfloor\frac{t}{2dk}\rfloor)$ . It follows from Lemma 4.21 that

\begin{aligned}
\mathbb{P}(b\notin\mathcal{K}\,|\,\text{$P$ is nice})&\leq\sum_{p\in\Gamma:p\leq 16(t/k)^{2}}\mathbb{P}(\cap_{a\in\mathcal{J}^{\mathrm{1st}}_{1,i}}\{m_{ba}=0\mod p\}\,|\,\text{$P$ is nice})\\
&\quad+\mathbb{P}(\cap_{a\in\mathcal{J}^{\mathrm{1st}}_{1,i}}\{m_{ba}=0\}\,|\,\text{$P$ is nice})+\mathbb{P}(\mathcal{C}^{c}\,|\,\text{$P$ is nice})\\
&\leq 16(t/k)^{2}\left(\min\{2/p,1/2\}+q\right)^{d/2}+q^{d/2}+4d\exp(-\Omega(t/k))\lesssim\exp(-\Omega(d))=:\tilde{q}
\end{aligned}

when $\log(t/k)\ll d\ll(t/k)$ . Conditioned on $P$ being nice, by the relative independence the collection $(\mathbf{1}\{b\notin\mathcal{K}\})_{b\in\mathcal{J}_{2,i}}$ is dominated by i.i.d. Bernoulli random variables with probability $\tilde{q}$ . Since $\mathcal{J}_{2,i}$ has size $d$ ,

\mathbb{P}(|\mathcal{J}_{2,i}\cap\mathcal{K}|<d/2|\text{$P$ is nice})\leq\mathbb{P}(\mathrm{Binomial}(d,\tilde{q})>d/2)\leq 2^{d}(\tilde{q})^{d/2}\leq\exp(-\Omega(d^{2})).

(70)

Steps 2 and 3: the search for $a_{u}$ with $1\leq u\leq K$. Since $|\mathcal{J}_{2,i}\cap\mathcal{K}|\geq d/2$, we can try at least $\lfloor d/(2K)\rfloor$ batches of $K$ indices in $\mathcal{J}_{2,i}\cap\mathcal{K}$. These trials are not exactly independent, but using the relative independence for $\{\hat{m}_{ba}:a,b\in P_{i},a<b\}$ we can upper bound the probability that all trials are failures.

We begin by estimating the failure probability for a single trial. Consider the $i$-th trial, where we look for the candidates $\{a_{u}:1\leq u\leq K\}$ in $\mathcal{J}^{\mathrm{2nd}}_{1,i}$. Write $\{b_{1},\dots,b_{K}\}$ for the corresponding row indices in this trial. The probability that we fail to find an $a_{1}$ is $\mathbb{P}(\hat{m}_{b_{1},a}=0\mod p\text{ for all }a\in\mathcal{J}^{\mathrm{2nd}}_{1,i}|\text{$P$ is nice})$, which, by Lemma 4.21, satisfies

\mathbb{P}(\hat{m}_{b_{1},a}=0\mod p\text{ for all }a\in\mathcal{J}^{\mathrm{2nd}}_{1,i}|\text{$P$ is nice})\leq(\min\{2/p,1/2\}+q)^{\lfloor d/2\rfloor}.

For $u\in[K]$ , suppose we have found $\{a_{1},\dots,a_{u}\}$ such that the matrix induced by $\{b_{1},\dots,b_{u}\}$ and $\{a_{1},\dots,a_{u}\}$ has linearly independent rows. If a candidate $a\in\mathcal{J}^{\mathrm{2nd}}_{1,i}\backslash\{a_{1},\dots,a_{u}\}$ fails, it means that the new row

(\hat{m}_{b_{u+1},a_{1}},\dots,\hat{m}_{b_{u+1},a_{u}},\hat{m}_{b_{u+1},a})\mod p

is in the vector space spanned by the previous $u$ rows $\{(\hat{m}_{b_{i},a_{1}},\dots,\hat{m}_{b_{i},a_{u}},\hat{m}_{b_{i},a})\mod p:1\leq i\leq u\}$. Since by assumption the matrix induced by $\{b_{1},\dots,b_{u}\}$ and $\{a_{1},\dots,a_{u}\}$ has linearly independent rows, there exists a unique vector of coefficients $(c_{1},\dots,c_{u})\in\mathbb{Z}_{p}^{u}$ such that

(\hat{m}_{b_{u+1},a_{1}},\dots,\hat{m}_{b_{u+1},a_{u}})=\sum_{i=1}^{u}c_{i}(\hat{m}_{b_{i},a_{1}},\dots,\hat{m}_{b_{i},a_{u}})\mod p,

and thus the last column needs to satisfy

\hat{m}_{b_{u+1},a}=\sum_{i=1}^{u}c_{i}\hat{m}_{b_{i},a}\mod p.

Therefore, by Lemma 4.21 the failure probability for a candidate $a\in\mathcal{J}^{\mathrm{2nd}}_{1,i}\backslash\{a_{1},\dots,a_{u}\}$ is at most

\mathbb{P}(\hat{m}_{b_{u+1},a}=\sum_{i=1}^{u}c_{i}\hat{m}_{b_{i},a}\mod p|\text{$P$ is nice})\leq\min\{2/p,1/2\}+q.

The relative independence in $\{\hat{m}_{ba}:a,b\in P_{i},a<b\}$ implies that the probability of failing to find $a_{u+1}$ in the set $\mathcal{J}^{\mathrm{2nd}}_{1,i}\backslash\{a_{1},\dots,a_{u}\}$ is at most $(\min\{2/p,1/2\}+q)^{\lfloor d/2\rfloor-u}$ . Through a simple union bound we see that the batch $\{b_{1},\dots,b_{K}\}$ fails with probability at most

\sum_{u=1}^{K}\mathbb{P}(\text{the search for $a_{u}$ fails}|\text{$P$ is nice})\leq\sum_{u=1}^{K}(\min\{2/p,1/2\}+q)^{\lfloor d/2\rfloor-u}\leq K(\min\{2/p,1/2\}+q)^{d/2-K}.

(71)

Combining all these failure probabilities, i.e., Lemma 4.19, (70), (71), and using the fact that $\{\hat{m}_{ba}:a,b\in P_{i}\}$ are independent for the disjoint index sets $\{P_{i}:1\leq i\leq\lfloor k/(2d)\rfloor\}$ , we have

\begin{aligned}
\mathbb{P}(\mathcal{A}_{p}^{c})&\leq\mathbb{P}(\text{the 1st trial fails})^{\lfloor k/(2d)\rfloor}\\
&\leq\left(\mathbb{P}(P_{1}\text{ is not nice})+\mathbb{P}(|\mathcal{J}_{2,i}\cap\mathcal{K}|<d/2\,|\,\text{$P$ is nice})+\left(K(q+\min\{2/p,1/2\})^{(d/2)-K}\right)^{\lfloor d/(2K)\rfloor}\right)^{\lfloor k/(2d)\rfloor}\\
&\leq\left(Cd^{2}\exp\left(-\Omega\left(\frac{t}{dk}\right)\right)+\exp(-\Omega(d^{2}))+\left(K(q+\min\{2/p,1/2\})^{(d/2)-K}\right)^{\lfloor d/(2K)\rfloor}\right)^{\lfloor k/(2d)\rfloor}\\
&\leq 3^{k/(2d)}\left(\left(Cd^{2}\exp\left(-\Omega\left(\frac{t}{dk}\right)\right)\right)^{\lfloor k/(2d)\rfloor}+\exp(-\Omega(kd))+\exp(-\Omega(kd))\right)
\end{aligned}

(72)

by the elementary inequality $(a+b+c)^{n}\leq 3^{n}(a^{n}+b^{n}+c^{n})$ for $n\geq 1$ .

Recall that $t\geq(1+\varepsilon)t_{*}(k,G)\asymp k|G_{\mathrm{ab}}|^{2/k}$ in the regime currently considered. Our choice of $d$ should satisfy $1\ll d\ll k$ and $\frac{t}{dk}\gg 1$ so that $q=o(1)$. It was also required that $\log(t/k)\ll d\ll(t/k)$ right before (70). Furthermore, we choose $d$ satisfying $\frac{t}{dk}\gg\log d$ and $\frac{t}{d^{2}}\gg\log|G|$ so that the first term in (72) is $o(|G|^{-C})$ for any $C>0$. To control the second and third terms in (72), the choice of $d$ also needs to satisfy $kd\gg\log|G|$.

We choose $d=|G_{\mathrm{ab}}|^{\delta/k}$ for some sufficiently small $\delta>0$ , which satisfies all the conditions listed above. Consequently, for any $C>0$ ,

\mathbb{P}(\mathcal{A}_{p}^{c})=o(|G|^{-C}),

which leads to Proposition 13. ∎

4.9.2. Regime: $k\gtrsim\log|G_{\mathrm{ab}}|$

In this regime $s=t/k\lesssim 1$ and we no longer have the relative independence in $\hat{m}$ , so we will take a different approach.

Recall that $W_{a}^{\pm}:=\sum_{i=1}^{N(t)}\mathbf{1}\{\sigma_{i}=a,\eta_{i}=\pm 1\}$ tracks the number of times $Z^{\pm 1}_{a}$ appears in $X(t)$. We will only look at the set of generators

\mathcal{J}=\{a\in[k]:W_{a}^{+}+W_{a}^{-}=1\},

that appear exactly once in $X:=X(t)$ . In this regime, we will be conditioning on $\{V=0,\mathrm{\mathbf{typ}}\}$ , see Definition 5 for the definition of $\mathrm{\mathbf{typ}}$ in this regime. Denote by $X|_{\mathcal{J}}$ the subsequence of $X$ that contains only generators in $\mathcal{J}$ . For simplicity of notation, for any $a\in\mathcal{J}$ we will assume that $Z_{a}$ (instead of $Z_{a}^{-1}$ ) appears in $X$ . Otherwise we can just relabel $Z_{a}^{-1}$ as a new $Z_{a}$ . We also want to emphasize that only the order in which $(Z_{a})_{a\in\mathcal{J}}$ appear in $X|_{\mathcal{J}}$ is important in the following argument, and the values of $(Z_{a})_{a\in\mathcal{J}}$ are not.

By (67) it is easy to see that on the event $\{V=0\}$ we have $m_{ba}\in\{-1,0,1\}$ for $a,b\in\mathcal{J}$, since each of $Z_{a},Z_{b}$ appears exactly once in both $X$ and $X^{\prime}$. Concretely, if $Z_{a}$ and $Z_{b}$ appear in the same order in $X^{\prime}$ as in $X$, then $\hat{m}_{ba}=0$; otherwise $|\hat{m}_{ba}|=1$.

Our goal is to look for an upper triangular submatrix $M$ of $\hat{m}$ that has full rank $K$ . Indeed, since $M$ is upper triangular and all the entries of $M$ are in $\{-1,0,1\}$ , if $M$ has full rank $K$ then for all $p\in\Gamma$ , $M_{p}$ also has full rank over the field $\mathbb{F}_{p}$ . The proof is based on a combinatorial argument where we interpret $(|\hat{m}_{ba}|)_{a,b\in\mathcal{J}}$ in terms of the relative order of $Z_{a},Z_{b}$ in $X|_{\mathcal{J}}$ and $X^{\prime}|_{\mathcal{J}}$ .

Let $\{c_{i}:i\in\mathbb{N}\}$ be a set of distinct colors. Let $d$ be a positive integer to be determined later and $n:=\lfloor|\mathcal{J}|/d\rfloor$ . For $1\leq i\leq n$ , we will color the $((i-1)d+1)$ -th to the $id$ -th generator that appears in $X|_{\mathcal{J}}$ as color $c_{n-i+1}$ . In other words, the first $d$ generators in $X|_{\mathcal{J}}$ have color $c_{n}$ , followed by $d$ generators of color $c_{n-1}$ and so on. Let $\mathcal{J}_{i}$ denote the set of generators in $\mathcal{J}$ that are colored $c_{i}$ . Our coloring scheme implies that in the sequence $X|_{\mathcal{J}}$ , for any $i>i^{\prime}$ , any generator belonging to $\mathcal{J}_{i}$ is in front of any generator belonging to $\mathcal{J}_{i^{\prime}}$ . In order to understand $(|\hat{m}_{ba}|)_{a,b\in\mathcal{J}}$ it remains to determine the relative orders of $\{(Z_{a},Z_{b}):a,b\in\mathcal{J}\text{ and $Z_{a},Z_{b}$ are in different colors}\}$ in $X^{\prime}|_{\mathcal{J}}$ .

Note that $X^{\prime}|_{\mathcal{J}}$ has the distribution of a uniform permutation of $(Z_{a})_{a\in\mathcal{J}}$, which means we can construct this subsequence by inserting the generators into an existing sequence uniformly at random, one by one. Write $\mathcal{J}_{i}=(Z_{x_{i,1}},\dots,Z_{x_{i,d}})$ for $1\leq i\leq n$. To construct $X^{\prime}|_{\mathcal{J}}$, we first sample a uniform permutation of $(Z_{a})_{a\in\mathcal{J}_{1}}$ and denote it by $X^{\prime}|_{\mathcal{J}_{1}}$. Without loss of generality, we can label its entries, in order, as $(Z_{x_{1,1}},\dots,Z_{x_{1,d}})$. Next we insert the generators from $\mathcal{J}_{2}$ into this sequence, one by one, each at a uniformly random position. Once the generators in $\mathcal{J}_{2}$ have been inserted, we proceed to insert the generators from $\mathcal{J}_{3},\mathcal{J}_{4}$ and so on. Since every generator is inserted at a uniformly random position, the resulting sequence is a uniform permutation of the generators in $\mathcal{J}$.
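The insertion construction can be sanity-checked by brute force. The following Python sketch (function names and item labels are ours and purely illustrative) enumerates every possible sequence of slot choices for five labelled items and records the resulting orderings. If distinct slot choices always produce distinct orderings, then the map from slot choices to permutations is a bijection, so uniform slot choices yield a uniform permutation.

```python
from itertools import product


def insert_sequentially(items, slots):
    """Insert items one by one; slots[j] in {0, ..., j} is the position of the j-th item."""
    seq = []
    for item, slot in zip(items, slots):
        seq.insert(slot, item)
    return tuple(seq)


def all_insertion_outcomes(items):
    """Outcome of every possible sequence of slot choices (j + 1 slots at step j)."""
    slot_ranges = [range(j + 1) for j in range(len(items))]
    return [insert_sequentially(items, slots) for slots in product(*slot_ranges)]


# Five hypothetical labelled generators; 1 * 2 * 3 * 4 * 5 = 120 slot sequences.
outcomes = all_insertion_outcomes(["Z4", "Z5", "Z6", "Z1", "Z2"])
number_of_outcomes = len(outcomes)
number_of_distinct_orderings = len(set(outcomes))
```

Both counts equal $5!$, which is the bijection claimed above for this small case.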

Given the sequence $X^{\prime}|_{\mathcal{J}}$ , we can define a collection of good events $\{\mathcal{C}_{i}:2\leq i\leq n\}$ . As an example, we will first define $\mathcal{C}_{2}$ . For any $l\in[d]$ , if $Z_{x_{2,l}}$ is inserted between $Z_{x_{1,j}}$ and $Z_{x_{1,j+1}}$ for some $0\leq j\leq d$ (let $j=0$ if $Z_{x_{2,l}}$ is inserted in front of $Z_{x_{1,1}}$ whereas let $j=d$ if it is inserted behind $Z_{x_{1,d}}$ ), then

|\hat{m}_{x_{1,j^{\prime}},x_{2,l}}|=\begin{cases}1&\text{ for }1\leq j^{\prime}\leq j,\\ 0&\text{ for }j<j^{\prime}\leq d.\end{cases}

We will let $\mathcal{C}_{2}$ be the event that in the sequence $X^{\prime}|_{\mathcal{J}}$ there are at least $K$ distinct pairs from the set $\{(Z_{x_{1,j}},Z_{x_{1,j+1}}):0\leq j\leq d\}$ such that there is at least one generator from $\mathcal{J}_{2}$ that is between them. The reason for this definition is that if $\mathcal{C}_{2}$ occurs we can collect the first $K$ elements in

\{x_{1,j}:0\leq j\leq d,\text{ there is some $Z_{x_{2,l}}\in\mathcal{J}_{2}$ inserted between $Z_{x_{1,j}}$ and $Z_{x_{1,j+1}}$}\}

as the row indices of $M$ and the corresponding $x_{2,l}$’s as the column indices. This choice leads to an upper triangular $K\times K$ matrix $M$ whose entries on and above the diagonal are all $\pm 1$. Consequently, the induced matrix $M_{p}$ is still an upper triangular matrix that has full rank $K$. This argument is most easily explained through an example.

Example. For simplicity we assume $X|_{\mathcal{J}}=(Z_{1},Z_{2},\dots,Z_{6})$. Let $d=3$ so that there are 2 colors. In particular, $Z_{1},Z_{2},Z_{3}$ are in color 2 while $Z_{4},Z_{5},Z_{6}$ are in color 1. Suppose that when constructing $X^{\prime}|_{\mathcal{J}}$ we first sample a random permutation of $Z_{4},Z_{5},Z_{6}$ and get $(Z_{4},Z_{5},Z_{6})$, and then insert $Z_{1},Z_{2},Z_{3}$ into the current sequence, obtaining as a result

X^{\prime}|_{\mathcal{J}}=(Z_{4},Z_{1},Z_{5},Z_{2},Z_{6},Z_{3}).

It is easy to see that as a consequence of inserting $Z_{1}$ (of color 2) in between $Z_{4}$ and $Z_{5}$ (of color 1) in $X^{\prime}|_{\mathcal{J}}$ , the relative order of $(Z_{1},Z_{4})$ in $X^{\prime}|_{\mathcal{J}}$ is different from that in $X|_{\mathcal{J}}$ and hence $|\hat{m}_{41}|=1$ . Moreover, we can obtain an upper triangular $2\times 2$ submatrix

\begin{pmatrix}\hat{m}_{41}&\hat{m}_{42}\\ \hat{m}_{51}&\hat{m}_{52}\end{pmatrix}=\begin{pmatrix}\pm 1&\pm 1\\ 0&\pm 1\end{pmatrix}.
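The example can be reproduced mechanically. The Python sketch below is ours and rests on one assumption taken from the discussion above: we track only $|\hat{m}_{ba}|$, modelled as the indicator that $Z_{a}$ and $Z_{b}$ appear in different relative orders in $X|_{\mathcal{J}}$ and $X^{\prime}|_{\mathcal{J}}$; the signs are ignored. The function name `order_flip_matrix` is hypothetical.

```python
def order_flip_matrix(x, x_prime, rows, cols):
    """Entry (b, a) is 1 if Z_a and Z_b appear in different relative orders in x and x_prime."""
    pos = {g: i for i, g in enumerate(x)}
    pos_prime = {g: i for i, g in enumerate(x_prime)}
    return [
        [int((pos[b] < pos[a]) != (pos_prime[b] < pos_prime[a])) for a in cols]
        for b in rows
    ]


# Generators are represented by their indices: Z_1, ..., Z_6.
X = [1, 2, 3, 4, 5, 6]          # X restricted to J
X_prime = [4, 1, 5, 2, 6, 3]    # X' restricted to J, as in the example

# Rows: color-1 generators Z_4, Z_5; columns: the inserted color-2 generators Z_1, Z_2.
M_abs = order_flip_matrix(X, X_prime, rows=[4, 5], cols=[1, 2])

# Using all three gaps gives a 3 x 3 pattern of the same shape.
M_abs_full = order_flip_matrix(X, X_prime, rows=[4, 5, 6], cols=[1, 2, 3])
```

In both cases the pattern of nonzero entries is upper triangular with nonzero diagonal, in agreement with the displayed $2\times 2$ submatrix.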

In general, for $2\leq i\leq n$ we can define $\mathcal{C}_{i}$ to be the event that in the sequence $X^{\prime}|_{\mathcal{J}}$ there are at least $K$ distinct pairs from the set

\{(Z_{x_{i_{1},l_{1}}},Z_{x_{i_{2},l_{2}}}):i_{1},i_{2}\leq i-1,\ l_{1},l_{2}\in[d]\text{ and }Z_{x_{i_{1},l_{1}}},Z_{x_{i_{2},l_{2}}}\text{ are consecutive in }X^{\prime}|_{\mathcal{J}_{1}\cup\cdots\cup\mathcal{J}_{i-1}}\}

such that there is at least one generator from $\mathcal{J}_{i}$ between them. If any of the events $\{\mathcal{C}_{i}:2\leq i\leq n\}$ occurs, we can find a $K\times K$ submatrix $M$ satisfying our condition, and we obtain the set $\mathcal{K}$ (from the definition of $\mathcal{A}$ in Definition 10) by collecting the $K$ row indices of $M$. The corresponding column indices of $M$ then lie in $\mathcal{K}^{c}$, a fact we shall now use to verify condition (47) from the definition of $\mathcal{A}$. One can easily check that for any $b\in\mathcal{K}$,

\begin{aligned}G_{2}\psi_{b}&=G_{2}\sum_{a\in[k]:a<b}m_{ba}Z_{a,1}+\sum_{a\in\mathcal{K}^{c}:a>b}\hat{m}_{ba}Z_{a,1}\\ &=G_{2}\sum_{a\in\mathcal{K}^{c}}\hat{m}_{ba}Z_{a,1}+G_{2}\sum_{a\in\mathcal{K}:a<b}\hat{m}_{ba}Z_{a,1},\end{aligned}

where the first term is uniform in $G_{\mathrm{ab}}$ and independent from $(G_{2}Z_{b,1})_{b\in\mathcal{K}}$ . Therefore, condition (47) is satisfied for this choice of $\mathcal{K}$ and $\mathcal{A}_{p}(\mathcal{K})$ occurs for all $p\in\Gamma$ , i.e., $\mathcal{A}$ occurs as long as $\cup_{i=2}^{n}\mathcal{C}_{i}$ occurs. Therefore, it remains to upper bound

\mathbb{P}(\mathcal{A}^{c}|V=0,\mathrm{\mathbf{typ}})\leq\mathbb{P}(\cap_{i=2}^{n}\mathcal{C}^{c}_{i}).

For $1\leq i\leq n-1$ , let $\mathcal{G}_{i}$ be the $\sigma$ -field that encodes relative orders of generators in $\mathcal{J}_{1}\cup\cdots\cup\mathcal{J}_{i}$ in $X^{\prime}|_{\mathcal{J}}$ . Then for $2\leq j\leq n$ ,

\mathbb{P}(\cap_{i=2}^{j}\mathcal{C}^{c}_{i}|\mathcal{G}_{j-1})=\mathbf{1}\{\cap_{i=2}^{j-1}\mathcal{C}^{c}_{i}\}\cdot\mathbb{P}(\mathcal{C}^{c}_{j}|\mathcal{G}_{j-1}).

The key observation is that $\mathcal{C}_{j}$ is independent of $\mathcal{G}_{j-1}$, that is, $\mathbb{P}(\mathcal{C}^{c}_{j}|\mathcal{G}_{j-1})=\mathbb{P}(\mathcal{C}^{c}_{j})$. Applying this iteratively gives that

\mathbb{P}(\cap_{i=2}^{n}\mathcal{C}^{c}_{i})=\prod_{i=2}^{n}\mathbb{P}(\mathcal{C}_{i}^{c}).

We will calculate $\mathbb{P}(\mathcal{C}_{i+1}^{c})$ for $1\leq i\leq n-1$. Looking at the distribution of the subsequence $X^{\prime}|_{\mathcal{J}_{1}\cup\cdots\cup\mathcal{J}_{i+1}}$ is equivalent to looking at the subsequence $X^{\prime}|_{\mathcal{J}_{1}\cup\cdots\cup\mathcal{J}_{i}}$ and inserting the generators in $\mathcal{J}_{i+1}$ randomly, one by one, into this subsequence. This perspective allows us to calculate $\mathbb{P}(\mathcal{C}^{c}_{i+1})$ via a multi-type urn scheme.

The subsequence $X^{\prime}|_{\mathcal{J}_{1}\cup\cdots\cup\mathcal{J}_{i}}$ has length $di$ and we are inserting the generators from $\mathcal{J}_{i+1}$. If we view each gap between two consecutive generators in the existing sequence $X^{\prime}|_{\mathcal{J}_{1}\cup\cdots\cup\mathcal{J}_{i}}$ as a distinct type (also counting the gap before the first generator and the gap after the last one), then we have $di+1$ types and there is one ball of each type in the urn. We conduct $d$ steps. At each step, we choose a ball uniformly at random from the urn and return it to the urn together with a new ball of the same type. It is not difficult to see that this urn scheme is equivalent to inserting elements uniformly at random into a sequence.
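The equivalence between random insertion and the urn scheme can be checked exactly on a small instance. The Python sketch below is ours; the function names and the parameters `m` (length of the existing sequence, playing the role of $di$) and `d` are illustrative assumptions. It computes the law of the number of original gaps that receive at least one inserted element in two ways: by enumerating all insertion sequences, and by running the urn dynamics with exact rational arithmetic.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def occupied_gaps_by_insertion(m, d):
    """Exact law of the number of original gaps hit, by enumerating all insertions."""
    counts = Counter()
    # At step s the sequence has m + s elements, hence m + s + 1 insertion slots.
    slot_ranges = [range(m + s + 1) for s in range(d)]
    for slots in product(*slot_ranges):
        seq = ["old"] * m
        for slot in slots:
            seq.insert(slot, "new")
        gap, hit = 0, set()
        for x in seq:
            if x == "old":
                gap += 1          # index of the original gap we are currently in
            else:
                hit.add(gap)
        counts[len(hit)] += 1
    total = sum(counts.values())
    return {c: Fraction(v, total) for c, v in counts.items()}


def occupied_types_by_urn(m, d):
    """Exact law of the number of urn types with at least 2 balls after d steps."""
    law = {0: Fraction(1)}
    for s in range(d):
        total = m + 1 + s
        nxt = {}
        for c, p in law.items():
            # The c occupied types hold c + s balls; the other m + 1 - c types hold one each.
            for c_next, weight in ((c, c + s), (c + 1, m + 1 - c)):
                if weight > 0:
                    nxt[c_next] = nxt.get(c_next, Fraction(0)) + p * Fraction(weight, total)
        law = nxt
    return law


law_insert = occupied_gaps_by_insertion(3, 3)
law_urn = occupied_types_by_urn(3, 3)
```

For this instance the two laws coincide, which is the statement that a gap already containing $j$ inserted elements offers $j+1$ insertion slots, exactly as a type with $j+1$ balls in the urn.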

Our goal is to understand the probability of the event

\{\text{there are at most $K-1$ types with at least 2 balls after $d$ balls have been inserted}\},

which is the event equivalent to $\mathcal{C}^{c}_{i+1}$ in the urn model and hence has the same probability. This calculation can be simplified by first fixing $di+1-(K-1)$ of the $di+1$ types, each of which is to keep a single ball, and then grouping the remaining $K-1$ types together to form a new type, called type 0. This gives a new urn model with $di+3-K$ types (the $di+1-(K-1)$ fixed types and the new type 0), starting with $K-1$ balls of type 0 and one ball of each remaining type. The probability that we only ever choose balls of type 0 (so that in the original urn model only the $K-1$ grouped types can have at least 2 balls) is

\displaystyle\prod_{j=0}^{d-1}\frac{K-1+j}{di+1+j}=\frac{(di)!(K+d-2)!}{(d(i+1))!(K-2)!}.

Therefore,

\mathbb{P}(\mathcal{C}_{i+1}^{c})\leq{di+1\choose K-1}\frac{(di)!(K+d-2)!}{(d(i+1))!(K-2)!}\leq(di+1)^{K-1}\cdot\frac{(di)!(K+d-2)!}{(d(i+1))!(K-2)!}.
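As a sanity check on the two displays above, the following Python sketch (ours; all function names are illustrative, and $K\geq 2$ is assumed so that $(K-2)!$ is defined) verifies with exact rational arithmetic that the product equals the factorial expression, and compares the union bound with the exact urn probability of having at most $K-1$ types with at least 2 balls.

```python
from fractions import Fraction
from math import comb, factorial


def all_in_type0(d, i, K):
    """prod_{j=0}^{d-1} (K-1+j) / (di+1+j): every draw hits the merged type 0."""
    p = Fraction(1)
    for j in range(d):
        p *= Fraction(K - 1 + j, d * i + 1 + j)
    return p


def closed_form(d, i, K):
    """(di)! (K+d-2)! / ((d(i+1))! (K-2)!)."""
    return Fraction(factorial(d * i) * factorial(K + d - 2),
                    factorial(d * (i + 1)) * factorial(K - 2))


def exact_failure(d, i, K):
    """P(at most K-1 types have >= 2 balls) in the urn started with di+1 single balls."""
    types = d * i + 1
    law = {0: Fraction(1)}
    for s in range(d):
        total = types + s
        nxt = {}
        for c, p in law.items():
            for c_next, weight in ((c, c + s), (c + 1, types - c)):
                if weight > 0:
                    nxt[c_next] = nxt.get(c_next, Fraction(0)) + p * Fraction(weight, total)
        law = nxt
    return sum(p for c, p in law.items() if c <= K - 1)


def union_bound(d, i, K):
    """Number of ways to choose the K-1 grouped types times the type-0 probability."""
    return comb(d * i + 1, K - 1) * all_in_type0(d, i, K)


d, i, K = 4, 2, 3
exact = exact_failure(d, i, K)
bound = union_bound(d, i, K)
```

The bound is not tight, since a configuration with fewer than $K-1$ occupied types is counted for several choices of the grouped types, but it is all that the argument requires.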

Finally, we have

\mathbb{P}(\cap_{i=2}^{n}\mathcal{C}_{i}^{c})=\prod_{i=1}^{n-1}\mathbb{P}(\mathcal{C}_{i+1}^{c})\leq\left(\prod_{i=1}^{n-1}(di+1)^{K-1}\right)\cdot\prod_{i=1}^{n-1}\frac{(di)!(K+d-2)!}{(d(i+1))!(K-2)!},

where the second product is a telescoping product and hence can be simplified to yield

\begin{aligned}\mathbb{P}(\cap_{i=2}^{n}\mathcal{C}_{i}^{c})&\leq\exp(Kn\log(nd))\cdot\frac{d!}{(nd)!}\cdot\frac{((K+d-2)!)^{n-1}}{((K-2)!)^{n-1}}\\ &=\exp(Kn\log(nd))\cdot\frac{(d!)^{n}}{(nd)!}\cdot\left(\frac{(K+d-2)!}{d!(K-2)!}\right)^{n-1}\\ &\leq\exp(Kn\log(nd))\cdot(K+d-2)^{(K-2)(n-1)}\cdot\frac{(d!)^{n}}{(nd)!}.\end{aligned}\tag{73}
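The telescoping step and the regrouping leading to (73) can be verified exactly for small parameters. The Python sketch below is ours (function names are illustrative, and $K\geq 2$ is assumed); it checks that the product over $i$ collapses to $\frac{(d!)^{n}}{(nd)!}\binom{K+d-2}{K-2}^{n-1}$ and that the binomial coefficient is at most $(K+d-2)^{K-2}$.

```python
from fractions import Fraction
from math import comb, factorial


def telescoping_product(n, d, K):
    """prod_{i=1}^{n-1} (di)! (K+d-2)! / ((d(i+1))! (K-2)!), computed term by term."""
    prod = Fraction(1)
    for i in range(1, n):
        prod *= Fraction(factorial(d * i) * factorial(K + d - 2),
                         factorial(d * (i + 1)) * factorial(K - 2))
    return prod


def regrouped(n, d, K):
    """(d!)^n / (nd)! times binom(K+d-2, K-2)^(n-1)."""
    return Fraction(factorial(d) ** n, factorial(n * d)) * comb(K + d - 2, K - 2) ** (n - 1)


n, d, K = 4, 3, 5
lhs = telescoping_product(n, d, K)
rhs = regrouped(n, d, K)
binomial_is_small = comb(K + d - 2, K - 2) <= (K + d - 2) ** (K - 2)
```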

The key is to understand $\varphi(d):=\frac{(d!)^{n}}{(nd)!}$. By Stirling’s formula we have

\varphi(d)\lesssim\frac{(2\pi d)^{(n-1)/2}}{n^{1/2}}\cdot\exp(-nd\log n).

We collect the remaining factors as

\exp(Kn\log(nd))\cdot(K+d-2)^{(K-2)(n-1)}\cdot\frac{(2\pi d)^{(n-1)/2}}{n^{1/2}}\leq\exp(C_{K}n\log(nd))

for some constant $C_{K}>0$ . Recall that $nd\approx|\mathcal{J}|$ . We will choose $d\gg 1$ which implies that

n\log(nd)\ll nd\log n,\tag{74}

and consequently, for arbitrarily small $\delta>0$ , when $n$ is sufficiently large we have

\mathbb{P}(\cap_{i=2}^{n}\mathcal{C}_{i}^{c})\leq\exp(-(1-\delta)nd\log n).\tag{75}
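The leading-order behaviour of $\varphi(d)=\frac{(d!)^{n}}{(nd)!}$ that drives (75) can be observed numerically. The Python sketch below is ours and only illustrative: it fixes $n=5$ (an arbitrary choice) and evaluates $\log\varphi(d)$ via the log-gamma function to avoid overflow, then compares it with $-nd\log n$ for growing $d$.

```python
from math import lgamma, log


def log_phi(n, d):
    """log of (d!)^n / (nd)!, using lgamma(x + 1) = log(x!)."""
    return n * lgamma(d + 1) - lgamma(n * d + 1)


n = 5  # arbitrary fixed number of colors for the illustration
# Ratio of log(phi(d)) to the leading term -n d log n, for increasing d.
ratios = [log_phi(n, d) / (-n * d * log(n)) for d in (10, 100, 1000, 10000)]
```

The ratios increase towards 1 from below, reflecting that the polynomial prefactor from Stirling's formula contributes only lower-order terms in the exponent when $d\gg 1$.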

Recall from (29) and Definition 5 that when $k\gtrsim\log|G_{\mathrm{ab}}|$ , conditioning on $\mathrm{\mathbf{typ}}$ ensures $|\mathcal{J}|\geq(1-\varepsilon/2)te^{-t/k}\geq(1-\varepsilon)t$ .

• In the regime $k\eqsim\lambda\log|G_{\mathrm{ab}}|\asymp\log|G|$, the above implies that $|\mathcal{J}|\asymp k$. Since $nd\approx|\mathcal{J}|$ we have $\mathbb{P}(\cap_{i=2}^{n}\mathcal{C}_{i}^{c})\leq\exp(-(1-\delta)nd\log n)=o(|G|^{-1})$ as long as $n$ is a sufficiently large constant.

• For the regime $k\gg\log|G_{\mathrm{ab}}|$, we have $t/k\ll 1$. By typicality, $|\mathcal{J}|\geq(1-\varepsilon/2)te^{-t/k}\geq(1-\varepsilon)t$. Recall that we write $\rho=\frac{\log k}{\log\log|G_{\mathrm{ab}}|}$.

– When $t_{*}=t_{0}$ and $t\geq(1+3\varepsilon)t_{0}$, we can choose $n$ so that $nd\geq(1-\varepsilon/2)|\mathcal{J}|\geq(1+\varepsilon)t_{0}$. Recall that in this regime $\frac{\rho}{\rho-1}\geq\frac{\log|G|}{\log|G_{\mathrm{ab}}|}$, i.e., $\frac{1}{\rho-1}\geq\frac{\log|G_{2}|}{\log|G_{\mathrm{ab}}|}$, and $t_{0}\eqsim k\cdot\frac{1}{\kappa\log\kappa}$ where $\kappa=k/\log|G_{\mathrm{ab}}|$.

Letting $1\ll d\ll t_{0}^{\varepsilon/8}$ and $\delta=\varepsilon/4$, the failure probability given by (75) is at most

\begin{aligned}\exp(-(1-\delta)nd\log n)&\leq\exp(-(1-\delta)(1+\varepsilon)t_{0}\log(t_{0}/d))\leq\exp(-(1+\varepsilon/2)t_{0}\log t_{0})\\ &\leq\exp\left(-(1+\varepsilon/4)\frac{\log|G_{\mathrm{ab}}|}{\rho-1}\right)\leq|G_{2}|^{-(1+\varepsilon/4)},\end{aligned}

where in the second to the last inequality we use the fact that

t_{0}\log t_{0}\eqsim\log|G_{\mathrm{ab}}|\cdot\frac{\log\log|G_{\mathrm{ab}}|-\log\log(k/\log|G_{\mathrm{ab}}|)}{\log(k/\log|G_{\mathrm{ab}}|)}=\frac{\log|G_{\mathrm{ab}}|}{\rho-1}(1-o(1)).

– When $t_{*}=t_{1}=\log_{k}|G|$ and $t\geq(1+3\varepsilon)t_{1}$, we have $nd\geq(1+\varepsilon)t_{1}$ and similarly the failure probability is at most

\exp(-(1-\delta)nd\log n)\leq\exp(-(1+\varepsilon/2)t_{1}\log t_{1})\leq\exp\left(-(1+\varepsilon/4)\frac{\log|G|}{\rho}\right)=|G|^{-(1+\varepsilon/4)/\rho},

where the last inequality holds because $\rho=\frac{\log k}{\log\log|G_{\mathrm{ab}}|}\eqsim\frac{\log k}{\log\log|G|}$ and $t_{1}\log t_{1}=(1-o(1))t_{1}\log\log|G|$.

Therefore, we have proved that $\mathbb{P}(\mathcal{A}^{c}|V=0,\mathrm{\mathbf{typ}})\ll e^{h}/|G|$ (see Definition 4 for the value of $h$ in each regime) and thus completed the proof of Proposition 10.

References

[1] David Aldous and Persi Diaconis. Shuffling cards and stopping times. The American Mathematical Monthly, 93(5):333–348, 1986.
[2] Nathanaël Berestycki. Mixing times of Markov chains: Techniques and examples. Alea-Latin American Journal of Probability and Mathematical Statistics, 2016.
[3] Nathanaël Berestycki, Eyal Lubetzky, Yuval Peres, and Allan Sly. Random walks on the random graph. The Annals of Probability, 46(1):456–490, 2018.
[4] Charles Bordenave, Pietro Caputo, and Justin Salez. Cutoff at the “entropic time” for sparse Markov chains. Probability Theory and Related Fields, 173:261–292, 2019.
[5] Charles Bordenave and Hubert Lacoin. Cutoff at the entropic time for random walks on covered expander graphs. Journal of the Institute of Mathematics of Jussieu, 21(5):1571–1616, 2022.
[6] Emmanuel Breuillard and Matthew CH Tointon. Nilprogressions and groups with moderate growth. Advances in Mathematics, 289:1008–1055, 2016.
[7] Wai-Sin Ching. Linear equations over commutative rings. Linear Algebra and its Applications, 18(3):257–266, 1977.
[8] Don Coppersmith and Igor Pak. Random walk on upper triangular matrices mixes rapidly. Probability theory and related fields, 117:407–417, 2000.
[9] Persi Diaconis and Robert Hough. Random walk on unipotent matrix groups. In Annales scientifiques de l’École normale supérieure, volume 54, 2021.
[10] Persi Diaconis and Laurent Saloff-Coste. Moderate growth and random walk on finite groups. Geometric & Functional Analysis GAFA, 4:1–36, 1994.
[11] Carl Dou and Martin Hildebrand. Enumeration and random random walks on finite groups. The Annals of Probability, 24(2):987–1000, 1996.
[12] Carl CZ Dou. Studies of random walks on groups and random graphs. PhD thesis, Massachusetts Institute of Technology, 1992.
[13] David Steven Dummit and Richard M Foote. Abstract algebra, volume 3. Wiley Hoboken, 2004.
[14] Daniel El-Baz and Carlo Pagano. Diameters of random Cayley graphs of finite nilpotent groups. Journal of Group Theory, 24(5):1043–1053, 2021.
[15] J Ellenberg. A sharp diameter bound for upper triangular matrices. Senior Honors Thesis, Department of Mathematics, Harvard University, 1993.
[16] Jordan S Ellenberg and Julianna Tymoczko. A sharp diameter bound for unipotent groups of classical type over $\mathbb{Z}/p\mathbb{Z}$ . 2010.
[17] J Hermon and S Olesker-Taylor. Cutoff for almost all random walks on abelian groups. arXiv preprint arXiv:2102.02809, 2021.
[18] Jonathan Hermon and Sam Olesker-Taylor. Supplementary material for random Cayley graphs project. arXiv preprint arXiv:1810.05130, 2018.
[19] Jonathan Hermon and Sam Olesker-Taylor. Cutoff for random walks on upper triangular matrices. arXiv preprint arXiv:1911.02974, 2019.
[20] Jonathan Hermon and Sam Olesker-Taylor. Further results and discussions on random Cayley graphs. arXiv preprint arXiv:1911.02975, 2019.
[21] Jonathan Hermon and Sam Olesker-Taylor. Geometry of random Cayley graphs of abelian groups. arXiv preprint arXiv:2102.02801, 2021.
[22] Martin Hildebrand. Random walks supported on random points of $\mathbb{Z}/n\mathbb{Z}$ . Probability Theory and Related Fields, 100:191–203, 1994.
[23] Martin Hildebrand. A survey of results on random random walks on finite groups. 2005.
[24] Robert Hough. Mixing and cut-off in cycle walks. Electronic Journal of Probability, 22:1–49, 2017.
[25] Russell Lyons and Yuval Peres. Probability on trees and networks, volume 42. Cambridge University Press, 2017.
[26] Evita Nestoridi. Super-character theory and comparison arguments for a random walk on the upper triangular matrices. Journal of Algebra, 521:97–113, 2019.
[27] Evita Nestoridi and Allan Sly. The random walk on upper triangular matrices over $\mathbb{Z}/m\mathbb{Z}$ . arXiv preprint arXiv:2012.08731, 2020.
[28] Igor Pak et al. Two random walks on upper triangular matrices. Journal of Theoretical Probability, 13(4):1083–1100, 2000.
[29] Yuval Peres and Allan Sly. Mixing of the upper triangular matrix walk. Probability Theory and Related Fields, 156(3-4):581–591, 2013.
[30] Yuval Roichman. On random random walks. The Annals of Probability, 24(2):1001–1011, 1996.
[31] Richard Stong. Random walks on the groups of upper triangular matrices. The Annals of Probability, pages 1939–1949, 1995.
[32] EL Wilmer, David A Levin, and Yuval Peres. Markov chains and mixing times. American Mathematical Soc., Providence, 2009.
[33] David Bruce Wilson. Random random walks on $\mathbb{Z}_{2}^{d}$ . Probability Theory and Related Fields, 108:441–457, 1997.

