
Cutoff for random Cayley graphs of nilpotent groups

Jonathan Hermon (Department of Mathematics, University of British Columbia, BC, Canada) and Xiangying Huang (Department of Statistics and Operations Research, 304 Hanes Hall, University of North Carolina at Chapel Hill, US)
Abstract.

We consider the random Cayley graphs of a sequence of finite nilpotent groups of diverging sizes $G=G(n)$, whose ranks and nilpotency classes are uniformly bounded. For some $k=k(n)$ such that $1\ll\log k\ll\log|G|$, we pick a random set of generators $S=S(n)$ by sampling $k$ elements $Z_{1},\ldots,Z_{k}$ from $G$ uniformly at random with replacement, and set $S:=\{Z_{j}^{\pm 1}:1\leq j\leq k\}$. We show that the simple random walk on $\mathrm{Cay}(G,S)$ exhibits cutoff with high probability.

Some of our results apply to a general set of generators. Namely, we show that there is a constant $c>0$, depending only on the rank and the nilpotency class of $G$, such that for all symmetric sets of generators $S$ of size at most $\frac{c\log|G|}{\log\log|G|}$, the spectral gap and the $\varepsilon$-mixing time of the simple random walk $X=(X_{t})_{t\geq 0}$ on $\mathrm{Cay}(G,S)$ are asymptotically the same as those of the projection of $X$ to the abelianization of $G$, given by $[G,G]X_{t}$. In particular, $X$ exhibits cutoff if and only if its projection does.

Key words and phrases:
cutoff, mixing times, random walk, random Cayley graphs, nilpotent groups
2020 Mathematics Subject Classification:
Primary: 05C48, 05C80, 05C81; 20D15; 60B15, 60J27, 60K37.

1. Introduction

1.1. Motivation and Objectives of Paper

We examine the random walk (RW) $X=(X_{t})_{t\geq 0}$ on a Cayley graph $\Gamma:=\mathrm{Cay}(G,S)$ of a finite nilpotent group $G$ w.r.t. a symmetric set of generators $S$. We are interested in the asymptotic behavior of the mixing time and the spectral gap of the walk as $|G|\to\infty$, while the step (also called nilpotency class) of $G$ and its rank (minimal size of a set of generators), denoted respectively by $L=L(G)$ and $r=r(G)$, remain bounded.

We also investigate the occurrence of the cutoff phenomenon for the walk, which in the above setup can only occur when $|S|\gg 1$ (i.e., $|S|$ diverges as $|G|\to\infty$). In particular, we prove that the walk exhibits cutoff with high probability when $S$ is obtained by picking $k$ elements of $G$, $Z_{1},\ldots,Z_{k}$, uniformly at random, with replacement, and then setting $S:=\{Z_{i}^{\pm 1}:1\leq i\leq k\}$ under the necessary condition that $1\ll\log k\ll\log|G|$.

The overarching theme of this work is that in a certain strong quantitative sense the mixing behavior of the walk is governed by that of the projection of the walk to the abelianization of $G$, denoted by $G_{\mathrm{ab}}:=G/[G,G]$, where $[G,G]$ is the commutator subgroup of $G$. The projected walk $Y_{t}:=[G,G]X_{t}$ is a random walk on the projected Cayley graph $\Gamma_{\mathrm{ab}}:=\mathrm{Cay}(G_{\mathrm{ab}},S_{[G,G]})$, where $S_{[G,G]}:=\{[G,G]s:s\in S\}$ is the projection of $S$ to $G_{\mathrm{ab}}$.

As will be stated precisely in Theorem 2 and Corollary 1, for arbitrary symmetric $S$ such that $|S|\leq\frac{c\log|G|}{\log\log|G|}$ for some constant $c=c(r,L)>0$ depending only on the rank and step of $G$, the relaxation times (inverses of the spectral gaps) of $\Gamma=\mathrm{Cay}(G,S)$ and $\Gamma_{\mathrm{ab}}=\mathrm{Cay}(G_{\mathrm{ab}},S_{[G,G]})$ are equal:

$$t^{G}_{\mathrm{rel}}=t^{G^{\mathrm{ab}}}_{\mathrm{rel}},$$

and the total variation mixing times of the random walks on $\Gamma$ and $\Gamma_{\mathrm{ab}}$ satisfy

$$t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon)\leq t_{\mathrm{mix}}^{G}(\varepsilon)\leq t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon-\delta)$$

for $\varepsilon\in(0,1)$ and $\delta=\delta(|G|):=|G|\exp\left(-\left(\log|G|\right)^{L}\right)$. In other words, in order to determine the mixing time or the occurrence of cutoff for the walk on $\Gamma$, it suffices to do so for $\Gamma_{\mathrm{ab}}$.

In the case that $S=\{Z_{i}^{\pm 1}:1\leq i\leq k\}$, where $Z_{1},\ldots,Z_{k}$ are i.i.d. and uniformly distributed over $G$, we extend the analysis to the regime $1\ll\log k\ll\log|G|$ and prove that the random walk exhibits cutoff with high probability around time $\max\{t_{0}(k,|G_{\mathrm{ab}}|),\log_{k}|G|\}$, where $t_{0}(k,n)$ is defined as the time at which the entropy of the rate 1 continuous-time simple random walk on $\mathbb{Z}^{k}$ is $\log n$ (see Definition 3).

1.1.1. Motivations

To motivate our investigation, let us consider for now the scenario where $G$ is a finite group, $k$ is an integer (allowed to depend on $G$) and $\mathcal{G}_{k}$ denotes the Cayley graph of $G$ with respect to $k$ independently and uniformly chosen random generators. These are elements of $G$ that will generate $G$ with high probability when $k$ is sufficiently large and hence, with slight abuse of language, will be referred to as generators of $G$. We consider values of $k$ with $1\ll\log k\ll\log|G|$ for which $\mathcal{G}_{k}$ is connected with high probability, that is, with probability tending to 1 as $|G|\to\infty$.

Universality of cutoff.

Aldous and Diaconis [1] introduced the term “cutoff phenomenon”, which describes the phenomenon where the total variation distance (TV) between the distribution of a random walk and its equilibrium distribution sharply decreases from nearly 1 to nearly 0 within a time interval of smaller order than the mixing time. The material in this paper is motivated by their conjecture on the “universality of cutoff” for the RW on the random Cayley graph $\mathcal{G}_{k}$ given in [1].

Conjecture (Aldous and Diaconis, [1]).

For any group $G$, if $k\gg\log|G|$ and $\log k\ll\log|G|$, then the random walk on $\mathcal{G}_{k}$ exhibits cutoff with high probability.

Additionally, a secondary aspect of the conjecture suggests that the cutoff time does not rely on the algebraic structure of the group, but rather can be expressed solely as a function of $k$ and $|G|$.

This conjecture has sparked a substantial body of research, see e.g., [11, 12, 22, 23, 24, 30, 33]. The curious reader is referred to Section 1.3.1 in [17] for a detailed exposition of the literature regarding the progress on the Aldous-Diaconis universality conjecture. Here we provide a more condensed overview of the literature related to this conjecture, which serves as motivation for our work.

In [11, 12], Dou and Hildebrand confirmed the Aldous-Diaconis universality conjecture for all abelian groups. Additionally, their upper bound on the mixing time holds true for all groups. Furthermore, when $\log k\gg\log\log|G|$, this upper bound matches the trivial diameter lower bound of $\log_{k}|G|$, confirming the aforementioned secondary aspect of the conjecture. Hermon and Olesker-Taylor [17, 18, 20, 21] extend this conjecture to the regime $1\ll k\lesssim\log|G|$ for abelian groups, establishing cutoff under the condition $k-r(G)\gg 1$, where $r(G)$ is the minimal size of a generating subset of $G$. Moreover, when $k-r(G)\asymp k\gg 1$, the cutoff time is given by

$$\max\{\log_{k}|G|,t_{0}(k,|G|)\}, \tag{1}$$

where $t_{0}(k,|G|)$ is the time at which the entropy of the rate 1 random walk $W$ on $\mathbb{Z}^{k}$ is $\log|G|$. Due to this definition $t_{0}(k,|G|)$ is also referred to as the entropic time (see Definition 3). Their work confirms that for abelian groups the cutoff time of the RW only depends on $|G|$ and $k$.

Building upon this point, the next natural step is to explore the mixing behavior of random walks on nilpotent groups (see Section 1.4.1 for a brief overview of the literature concerning this topic). Hermon and Olesker-Taylor [19] study two canonical families of nilpotent groups: the $d\times d$ unit-upper triangular matrices $U_{m,d}$ with entries in $\mathbb{Z}_{m}$ and the $d$-dimensional Heisenberg group $H_{m,d}$ over $\mathbb{Z}_{m}$ where $m\in\mathbb{N}$ (the results hold under certain assumptions on $m$ depending on the regimes of $k$). Let $G$ be either $U_{m,d}$ or $H_{m,d}$ and $G_{\mathrm{ab}}:=G/[G,G]$ its abelianization. They prove that for $1\ll\log k\ll\log|G|$ the random walk on the Cayley graph $\mathcal{G}_{k}$ exhibits cutoff with high probability at time

$$\max\{\log_{k}|G|,t_{0}(k,|G_{\mathrm{ab}}|)\}, \tag{2}$$

where $t_{0}(k,|G_{\mathrm{ab}}|)$ is the time at which the entropy of the rate 1 random walk $W$ on $\mathbb{Z}^{k}$ is $\log|G_{\mathrm{ab}}|$. We compare (2) with (1), the latter of which gives the characterization of cutoff time for abelian groups. Indeed, $t_{0}(k,|G_{\mathrm{ab}}|)$ can be interpreted as the time at which the projection of the walk on $G$ onto the abelianization $G_{\mathrm{ab}}$ exhibits cutoff. This characterization of the cutoff time for random walks on $U_{m,d}$ and $H_{m,d}$ raises a natural question: does (2) characterize the cutoff time for random walks on nilpotent groups in general? This question will be explored in our investigation.


Another natural extension of the current research on random walks on groups involves extending the choice of generators. Rather than requiring the $k$ generators to be chosen independently and uniformly at random from the group $G$, the aim is to advance our understanding to encompass scenarios involving arbitrary choices of generators.

More often than not, the analysis of the mixing of the random walk relies heavily on the specific selection of generators. For example, there is a line of research focusing on understanding the mixing properties of random walks on the unit upper triangular matrix group $U_{p,d}$ with $p$ prime, wherein the set of generators is either $\{I\pm E_{i,i+1}:1\leq i\leq d-1\}$ or $\{I+aE_{i,i+1}:1\leq i\leq d-1,a\in\mathbb{Z}_{p}\}$, where $E_{i,j}$ represents the $d\times d$ matrix with 1 at the entry $(i,j)$ and 0 elsewhere. See Section 1.4.1 for an overview of existing research. The current analysis critically hinges on the fact that an operation $I+aE_{i,i+1}$ corresponds to a row addition/subtraction, allowing the decomposition of the walk’s mixing behavior into the first row and the remaining part of the matrix, the latter of which can be regarded as a $(d-1)\times(d-1)$ matrix. However, such methodologies become inapplicable when dealing with arbitrary generators. It is hence of interest to develop techniques that enable the study of mixing properties of random walks on Cayley graphs using a wider range of prescribed generators.

Leading role of the abelianization in random walk mixing.

When $G$ is a nilpotent group and $S$ is a symmetric set of generators consisting of i.i.d. uniform random elements of $G$, we shall see in Theorem 1 that the mixing time of the RW is determined by the abelianization $G_{\mathrm{ab}}$ of $G$, given by the expression (2).

When the generator set is predetermined, in many instances it has also been demonstrated that the mixing time of random walks on groups is primarily determined by the abelianization of the group. In a recent and notable study, Diaconis and Hough [9] introduced a novel approach to establishing a central limit theorem for random walks on unipotent matrix groups driven by a probability measure $\mu$, under certain general constraints. It is worth noting that their methodology applies to various choices of generators. For unit-upper triangular matrices $U_{p,d}$, it has been established that an individual coordinate on the $k$-th diagonal mixes in order $p^{2/k}$ steps, implying the leading role of abelianization (which corresponds to the first diagonal) in the mixing process of this random walk.

Nestoridi and Sly [27] studied the mixing behavior of the rate 1 RW on $U_{m,d}$ under the canonical set of generators $\{I\pm E_{i,i+1}:i\in[d-1]\}$. In their analysis, it is proved that the mixing time of the RW on $U_{m,d}$ is bounded by $O(m^{2}d\log d+d^{2}m^{o(1)})$, where the former term roughly characterizes the mixing behavior of the RW on the abelianization. This observation becomes clearer when we consider the projected walk on the abelianization, which, under the canonical set of generators, can be viewed as a product chain on $\mathbb{Z}_{m}^{d-1}$ and thus has mixing time of order $m^{2}d\log d$. Essentially, when $m$ is considerably larger than $d$, this upper bound is predominantly dictated by the mixing on the abelianization.

One might naturally inquire about the extent to which the abelianization dictates the mixing time of the RW on a general group. In addition, there is interest in explicitly identifying the dependence of the mixing time on the abelianization, which leads to our next motivation.

“Entropic time paradigm”.

As previously discussed, although the entropic time is the mixing time for “most” choices of generators (when $1\ll k\lesssim\log|G|$) for abelian groups and nilpotent groups, finding an explicit choice of generators which gives rise to cutoff at the entropic time is still open, even for the cyclic group of prime order. Part of our motivation is to understand the extent to which this paradigm applies for a given set of generators.

It is worth pointing out that for a general non-random choice of generators, the cutoff time is not necessarily given by the entropic time. For instance, Hough [24, Theorem 1.11] shows that for the cyclic group $\mathbb{Z}_{p}$ of prime order the choice of generators $S:=\{0\}\cup\{\pm 2^{i}:0\leq i\leq\lfloor\log_{2}p\rfloor-1\}$, which he describes as “an approximate embedding of the classical hypercube walk into the cycle”, gives rise to a random walk on $\mathbb{Z}_{p}$ that exhibits cutoff, where the cutoff time is not the entropic time.

Mixing under minimal sets of generators.

A minimal set of generators is a set of elements that generates the group and is minimal in terms of size. For $p$-groups, it is known that the minimal sets of generators can be described by the Frattini subgroup $\Phi=\Phi(G)$ in the sense that any set $\{x_{1},\dots,x_{r}\}\subseteq G$ such that the cosets $\{\Phi x_{i}:1\leq i\leq r\}$ form a basis of $G/\Phi$ gives a generating set of $G$. See, e.g., Diaconis and Saloff-Coste [10, Section 5.C]. A random walk supported on a minimal set of generators is thus referred to as a Frattini walk. Examples of such walks are discussed in Section 5.C of [10]. For the Heisenberg group $H_{p,3}$ with prime $p$, it can be shown that all minimal sets of generators are equivalent from a group-theoretic point of view.

Diaconis and Saloff-Coste additionally remarked that based on their experience with the circle and symmetric group, if the number of generators is fixed, most sets of generators should lead to the same convergence rate for the random walk. Motivated by these examples, they ask the following open question (see Remark 2 on Page 23 of [10]): to what extent does the choice of generators affect the mixing behavior?

We give a neat partial answer to this question in Theorem 3. Suppose that $G$ is a $p$-group with $G_{\mathrm{ab}}\cong\mathbb{Z}_{p^{\alpha_{1}}}\oplus\cdots\oplus\mathbb{Z}_{p^{\alpha_{r}}}$ or that $G_{\mathrm{ab}}\cong\mathbb{Z}_{m}^{r}$ for some $m\in\mathbb{N}$. Under very mild assumptions on the rank and step of $G$, for all minimal (symmetric) sets of generators, the corresponding mixing times on $G$ are the same up to smaller order terms and the corresponding relaxation times are the same.

1.1.2. Objectives

Motivated by the questions discussed in the preceding section, our primary focus in this paper is as follows.

(i) Study the random walk on $\mathcal{G}_{k}$ for general nilpotent groups. Expanding upon the current understanding of random walks on groups, our goal is to establish cutoff for random walks on $\mathcal{G}_{k}$ when $G$ is a nilpotent group and $1\ll\log k\ll\log|G|$. In particular, we are interested in a general characterization of the cutoff time. An important implication of the findings in [19] is that for certain regimes of $k$, the cutoff time for $G=U_{m,d}$ (or $G=H_{m,d}$) does not depend only on $k$ and $|G|$. Nevertheless, the only additional information required to determine the cutoff times for these two examples is the size of the abelianization, as indicated by (2). We hope to generalize the characterization of the cutoff time in (2) to general nilpotent groups.

We thank Péter Varjú for suggesting to us the problem of extending the analysis from [19] to other nilpotent groups (namely, the case that $G$ is step 2 and $G_{\mathrm{ab}}\cong\mathbb{Z}_{p}^{r}$, i.e., $G_{\mathrm{ab}}$ is elementary abelian). We also wish to thank him for providing invaluable insights regarding how certain components of the argument from [19] could be interpreted in terms of the general theory of nilpotent groups.

(ii) Develop techniques applicable when the generators are chosen arbitrarily. As indicated by previous discussions, the mixing time of random walks on a group under various choices of generators is largely determined by the abelianization of the group. We aim to explore the extent to which this leading role of the abelianization holds in a broader context.

In essence, our objective is to develop techniques for studying the mixing properties of random walks, applicable not only under arbitrary choices of generators but also for general groups, without dependence on specific group structures.

1.2. Definitions and Notation

We give the precise definitions of the Cayley graph on $G$ and the random walk on Cayley graphs.

Let $G$ be a nilpotent group with lower central series

$$G=G_{1}\trianglerighteq G_{2}\trianglerighteq\cdots\trianglerighteq G_{L}\trianglerighteq G_{L+1}=\{\mathrm{id}\}$$

where $G_{i+1}:=[G_{i},G]=\langle\{[g,g^{\prime}]:g\in G_{i},g^{\prime}\in G\}\rangle$. In particular, $G_{2}=[G,G]$ denotes the commutator subgroup of $G$. We also denote by $G_{\mathrm{ab}}:=G/G_{2}$ the abelianization of $G$. The rank of a nilpotent group $G$, denoted by $r=r(G)$, is the smallest integer $r$ such that $G$ can be generated by a set containing $r$ elements of $G$ and their inverses. The number $L=L(G)$ is called the step (or the nilpotency class) of $G$, i.e., $|G_{L}|>1$ and $|G_{L+1}|=1$.
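As a small illustration of these definitions, the following sketch computes the lower central series by brute force for the Heisenberg group $H_{m,3}$, with elements stored as triples $(a,b,c)$ standing for the unit upper triangular matrix with rows $(1,a,c)$, $(0,1,b)$, $(0,0,1)$ over $\mathbb{Z}_{m}$. The encoding and all function names are assumptions made for this example only and do not appear in the paper.

```python
from itertools import product

def mul(x, y, m):
    """Product in H_{m,3}; (a, b, c) encodes [[1, a, c], [0, 1, b], [0, 0, 1]] mod m."""
    return ((x[0] + y[0]) % m, (x[1] + y[1]) % m, (x[2] + y[2] + x[0] * y[1]) % m)

def inv(x, m):
    """Inverse of (a, b, c), namely (-a, -b, ab - c) mod m."""
    a, b, c = x
    return ((-a) % m, (-b) % m, (a * b - c) % m)

def commutator(g, h, m):
    """[g, h] = g^{-1} h^{-1} g h."""
    return mul(mul(inv(g, m), inv(h, m), m), mul(g, h, m), m)

def generated_subgroup(gens, m):
    """Closure of gens under right multiplication; in a finite group this is the
    subgroup generated by gens."""
    identity = (0, 0, 0)
    seen = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for s in gens:
            y = mul(x, s, m)
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen

def lower_central_series(m):
    """Return [G_1, G_2, ..., G_{L+1}] with G_{i+1} = [G_i, G] and G_{L+1} trivial."""
    G = set(product(range(m), repeat=3))
    series = [G]
    while len(series[-1]) > 1:
        commutators = {commutator(g, h, m) for g in series[-1] for h in G}
        series.append(generated_subgroup(commutators, m))
    return series

series = lower_central_series(3)
step = len(series) - 1                        # the step L
abelianization_size = len(series[0]) // len(series[1])  # |G_ab| = |G| / |G_2|
```

The loop terminates because the group is nilpotent; for a non-nilpotent group the series would stabilize at a non-trivial subgroup and the sketch would need an extra stopping condition.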

For a finite group $G$, let $S\subseteq G$ be a symmetric subset, i.e., $s\in S$ if and only if $s^{-1}\in S$. We will refer to $S$ as the set of generators when $S$ generates $G$. The undirected Cayley graph of $G$ generated by $S$ is defined as follows.

Definition 1 (Cayley multi-graph generated by a set of generators).

Fix a symmetric set $S:=\{s_{i}^{\pm 1}:i\in[k]\}\subseteq G$ of generators. Let $\mathrm{Cay}(G,S)$ denote the (right) Cayley multi-graph generated by $G$ with respect to $S$, where the vertex set is $\mathbb{V}:=\{g:g\in G\}$ and the edge set is $\mathbb{E}:=\{\{g,gs\}:g\in G,s\in S\}$. We allow parallel edges and self loops (if $\mathrm{id}\in S$) so that the Cayley graph $\mathrm{Cay}(G,S)$ is regular with degree $2k$.

Random walk on Cayley graphs. We will consider the undirected random walk $X_{t}$ on the Cayley graph $\mathrm{Cay}(G,S)$ which jumps at rate 1, where $S:=\{s_{i}^{\pm 1}:i\in[k]\}$. Let $\{\sigma_{i}\}_{i\in\mathbb{N}}$ be an i.i.d. sequence of indices uniformly sampled from $[k]$, and let $\{\eta_{i}\}_{i\in\mathbb{N}}$ be an i.i.d. sequence of signs uniformly sampled from $\{\pm 1\}$. At the $i$-th jump, the generator $s_{\sigma_{i}}^{\eta_{i}}$ is applied to the walk $X$ in the sense that we multiply the current location of $X$ on the right by $s_{\sigma_{i}}^{\eta_{i}}$. That is, the random walk $X$ can be written as a product

$$X=\prod_{i=1}^{N}s^{\eta_{i}}_{\sigma_{i}}=s_{\sigma_{1}}^{\eta_{1}}s_{\sigma_{2}}^{\eta_{2}}\cdots s_{\sigma_{N}}^{\eta_{N}},$$

where $N:=N(t)$ is the number of steps taken by $X$ by time $t$ and $s^{\eta_{i}}_{\sigma_{i}}$ denotes the $i$-th step taken by the random walk with $\sigma_{i}\in[k],\eta_{i}\in\{\pm 1\}$.
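The construction above can be simulated directly. The sketch below is only an illustration: it again uses the Heisenberg group $H_{m,3}$ with elements encoded as triples $(a,b,c)$, and the names `simulate_walk`, `poisson_count` and the signed-count vector `w` are hypothetical choices for this example. It draws $N(t)$ as the number of arrivals of a rate 1 Poisson process by time $t$ and multiplies the sampled generators on the right.

```python
import random

def mul(x, y, m):
    """Product in H_{m,3}; (a, b, c) encodes [[1, a, c], [0, 1, b], [0, 0, 1]] mod m."""
    return ((x[0] + y[0]) % m, (x[1] + y[1]) % m, (x[2] + y[2] + x[0] * y[1]) % m)

def inv(x, m):
    """Inverse of (a, b, c), namely (-a, -b, ab - c) mod m."""
    a, b, c = x
    return ((-a) % m, (-b) % m, (a * b - c) % m)

def poisson_count(rng, t):
    """Number of arrivals of a rate 1 Poisson process in [0, t], via exponential gaps."""
    n, clock = 0, rng.expovariate(1.0)
    while clock <= t:
        n += 1
        clock += rng.expovariate(1.0)
    return n

def simulate_walk(gens, t, m, rng):
    """Run the rate 1 walk on Cay(H_{m,3}, {s_i^{+-1}}) up to time t from the identity.

    Returns the final position and w, where w[i] is the number of times s_i was
    applied minus the number of times its inverse was applied."""
    k = len(gens)
    x = (0, 0, 0)
    w = [0] * k
    for _ in range(poisson_count(rng, t)):
        i = rng.randrange(k)          # sigma_i, uniform on the k generators
        eta = rng.choice((1, -1))     # eta_i, uniform sign
        step = gens[i] if eta == 1 else inv(gens[i], m)
        x = mul(x, step, m)           # right multiplication
        w[i] += eta
    return x, w

rng = random.Random(0)
gens = [(1, 0, 0), (0, 1, 0), (2, 3, 1)]
position, counts = simulate_walk(gens, 20.0, 5, rng)
```

In this encoding the first two coordinates of the position are the image of the walk in the abelianization, and they depend on the trajectory only through the signed counts `w`, which is one concrete way to see why the projected walk is easier to analyze.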

Notation. Throughout the paper, we use standard asymptotic notation: “$\ll$” or “$o(\cdot)$” means “of smaller order”; “$\lesssim$” or “$\mathcal{O}(\cdot)$” means “of order at most”; “$\asymp$” means “of the same order”; “$\eqsim$” means “asymptotically equivalent”. We will abbreviate “with high probability” by whp.

Assumptions. Throughout the paper, we will let $G$ be a finite nilpotent group of step $L\geq 2$ and rank $r$ where $r,L\asymp 1$.

1.3. Overview of Main Results

We focus on the mixing behavior of the random walk on a Cayley graph $\mathrm{Cay}(G,S)$ of a finite nilpotent group $G$ with a symmetric generator set $S=\{s_{i}^{\pm 1}:i\in[k]\}$. We consider the limit as $|G|\to\infty$ under the assumption that $1\ll\log k\ll\log|G|$. The condition $1\ll\log k\ll\log|G|$ is necessary for the random walk to exhibit cutoff on $\mathrm{Cay}(G,S)$ for all nilpotent $G$, see the remark below.

Remark 1.

For any choice of generators, it was established by Diaconis and Saloff-Coste [10] that there is no cutoff when $k\asymp 1$ for all nilpotent groups, which is a class of groups that satisfies their concept of moderate growth. The interested reader can find a short exposition of their argument in [20, §4]. When $\log k\asymp\log|G|$ and with $k$ i.i.d. uniform generators, there is no cutoff for all groups, see [17, §7.2]. Dou [12, Theorems 3.3.1 and 3.4.7] establishes a more general result for $\log k\asymp\log|G|$.

1.3.1. Cutoff for Random Walks on Nilpotent Groups

We use standard notation and definitions for mixing and cutoff, see e.g. [32, §4 and §18].

Definition 2.

A sequence $(X_{N})_{N\in\mathbb{N}}$ of Markov chains is said to exhibit cutoff if there exists a sequence of times $(t_{N})_{N\in\mathbb{N}}$ with

$$\limsup_{N\to\infty}d_{N}((1-\varepsilon)t_{N})=1\quad\text{and}\quad\limsup_{N\to\infty}d_{N}((1+\varepsilon)t_{N})=0\quad\text{ for all }\varepsilon\in(0,1),$$

where $d_{N}(\cdot)$ is the TV distance of $X_{N}(\cdot)$ from its equilibrium distribution for each $N\in\mathbb{N}$.

We say that a RW on a sequence of random graphs $(H_{N})_{N\in\mathbb{N}}$ exhibits cutoff around time $(t_{N})_{N\in\mathbb{N}}$ whp if, for all fixed $\varepsilon$, in the limit $N\to\infty$, the TV distance at time $(1+\varepsilon)t_{N}$ converges in distribution to 0 and at time $(1-\varepsilon)t_{N}$ to 1, where the randomness is over $H_{N}$.

In other words, $(X_{N})_{N\in\mathbb{N}}$ is said to exhibit cutoff when the TV distance of the distribution of the chain from equilibrium drops from close to 1 to close to 0 in a short time interval of smaller order than the mixing time.

As briefly discussed in Section 1.1.1, there has been considerable interest in studying the cutoff behavior of random walks on groups. Our goal is to generalize the characterization of cutoff time as $\max\{\log_{k}|G|,t_{0}(k,|G_{\mathrm{ab}}|)\}$ to general nilpotent groups (for random i.i.d. generators).

We now give the formal definition of the entropic time $t_{0}:=t_{0}(k,|G_{\mathrm{ab}}|)$ and the proposed mixing time.

Definition 3.

(i) Let $t_{0}(k,N)$ be the time at which the entropy of the rate 1 random walk $W$ on $\mathbb{Z}^{k}$ is $\log N$. We refer to $t_{0}(k,|G_{\mathrm{ab}}|)$ as the entropic time.
(ii) Define $t_{*}(k,G):=\max\{t_{0}(k,|G_{\mathrm{ab}}|),\log_{k}|G|\}$. We refer to $t_{*}(k,G)$ as the cutoff time or the mixing time.
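Definition 3 can be evaluated numerically. The sketch below rests on an assumption that is not spelled out in the text above: for the rate 1 walk on $\mathbb{Z}^{k}$ in which each jump picks a uniform coordinate and a uniform sign, the coordinates at time $t$ are independent, each distributed as the difference of two independent Poisson variables of mean $t/(2k)$, so the entropy is $k$ times the entropy of one coordinate. The function names, the truncation rule and the bisection tolerance are illustrative choices.

```python
import math

def poisson_pmf(mean, cutoff):
    """Poisson(mean) probabilities for 0, 1, ..., cutoff."""
    pmf = [math.exp(-mean)]
    for n in range(1, cutoff + 1):
        pmf.append(pmf[-1] * mean / n)
    return pmf

def coordinate_entropy(mean):
    """Entropy in nats of P - P', with P, P' independent Poisson(mean) (truncated sum)."""
    cutoff = int(mean + 12 * math.sqrt(mean) + 30)
    p = poisson_pmf(mean, cutoff)
    entropy = 0.0
    for d in range(0, cutoff + 1):
        # Probability that the difference equals d; by symmetry also that it equals -d.
        q = sum(p[n] * p[n - d] for n in range(d, cutoff + 1))
        if q > 0.0:
            term = -q * math.log(q)
            entropy += term if d == 0 else 2.0 * term
    return entropy

def entropy_at(t, k):
    """Entropy of the rate 1 walk on Z^k at time t, using independence of coordinates."""
    return k * coordinate_entropy(t / (2.0 * k))

def entropic_time(k, n, tol=1e-9):
    """Solve entropy_at(t, k) = log(n) for t by bisection (the entropy increases in t)."""
    target = math.log(n)
    lo, hi = 0.0, 1.0
    while entropy_at(hi, k) < target:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2.0
        if entropy_at(mid, k) < target:
            lo = mid
        else:
            hi = mid
    return hi

def cutoff_time(k, size_G, size_Gab):
    """t_*(k, G) = max{ t_0(k, |G_ab|), log_k |G| } as in Definition 3(ii)."""
    return max(entropic_time(k, size_Gab), math.log(size_G) / math.log(k))
```

The truncated sums are adequate for moderate values of $t/(2k)$; for very large means `math.exp(-mean)` underflows and a log-space computation would be needed instead.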

The entropic time $t_{0}(k,|G_{\mathrm{ab}}|)$ is identified as the cutoff time for the projected random walk $Y_{t}:=G_{2}X_{t}$ on $G_{\mathrm{ab}}$, see [17], which is naturally a lower bound on the mixing time of the RW $X_{t}$ on $G$. To offer insight into the definition of the cutoff time, note that we need to run the RW sufficiently long to ensure that all elements of the group can be reached with reasonable probability, which leads to a lower bound of $\log_{k}|G|$.

Our first result establishes cutoff around time $t_{*}(k,G)$ for the random walk $X$ on $\mathrm{Cay}(G,S)$ where $S$ consists of i.i.d. uniform generators.

Theorem 1.

Let $G$ be a finite nilpotent group with $r(G),L(G)\asymp 1$. Let $S=\{Z_{i}^{\pm 1}:i\in[k]\}$ with $Z_{1},\dots,Z_{k}\overset{iid}{\sim}\mathrm{Unif}(G)$. Assume $1\ll\log k\ll\log|G|$. As $|G|\to\infty$, the random walk on $\mathrm{Cay}(G,S)$ exhibits cutoff with high probability at time $t_{*}(k,G)$, which is the cutoff time defined in Definition 3.

1.3.2. Random Walk on Non-random Cayley Graphs: Reduction to Abelianization

For a nilpotent group $G$ and any symmetric set of generators $S\subseteq G$ whose size satisfies an upper bound, we show that the mixing time of the random walk on $G$ is completely determined (up to smaller order terms) by the mixing time of the projected walk on $G_{\mathrm{ab}}$.

Theorem 2.

Let $G$ be a finite nilpotent group such that $r(G),L(G)\asymp 1$ and $S\subseteq G$ be a symmetric set of generators. Suppose $|S|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}$. For any fixed $\varepsilon\in(0,1)$ and $\delta\in(0,\varepsilon)$ we have

$$t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon)\leq t_{\mathrm{mix}}^{G}(\varepsilon)\leq t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon-\delta)$$

when $|G|$ is sufficiently large (more precisely, when $|G|\exp(-(\log|G|)^{L})\leq\delta$).

Remark 2.

The assumption $|S|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}$ is to guarantee that $\mathrm{Diam}_{S}(G_{2})$ is of smaller order than $\mathrm{Diam}_{S}(G_{\mathrm{ab}})$ so that the mixing of $X_{t}$ is governed by its projected walk onto $G_{\mathrm{ab}}$. With more specific knowledge on the structure of $G$, one can expect to obtain a much less stringent constraint on $S$. Also see Remark 4.

As a direct consequence of the proof of Theorem 2, we establish that under the same conditions, the spectral gap of the random walk on $G$ is likewise determined by the spectral gap of its projection onto $G_{\mathrm{ab}}$.

Corollary 1.

Let $t^{G}_{\mathrm{rel}}$ and $t^{G^{\mathrm{ab}}}_{\mathrm{rel}}$ be the relaxation times of the walks $X_{t}$ and $Y_{t}=G_{2}X_{t}$ respectively. Then

$$t^{G^{\mathrm{ab}}}_{\mathrm{rel}}\leq t^{G}_{\mathrm{rel}}\leq\max\{t^{G^{\mathrm{ab}}}_{\mathrm{rel}},|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2}\}.$$

In particular, when $|S|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}$ we have $t^{G}_{\mathrm{rel}}=t^{G^{\mathrm{ab}}}_{\mathrm{rel}}$.

As a consequence of the above results, we can see that for a class of nilpotent groups $G$ whose abelianization has a unique representation, with a symmetric set of generators $S$ of minimal size, the mixing time and the relaxation time (inverse of the spectral gap) of the random walk do not depend on the choice of $S$. In this case, the choice of generators does not affect the mixing behavior. This provides a partial answer to the open question posed in Section 1.1.1.

Theorem 3.

Suppose $G$ is a nilpotent group with rank $r$ and step $L$ such that either (i) $G_{\mathrm{ab}}\cong\mathbb{Z}^{r}_{m}$ where $m\in\mathbb{N}$ or (ii) $G$ is a $p$-group. Suppose the rank and step satisfy $Lr^{L+1}\leq\frac{\log|G|}{16\log\log|G|}$. Then, over all symmetric sets of generators $S\subseteq G$ of minimal size and for any given $\varepsilon\in(0,1)$, the mixing time $t^{G,S}_{\mathrm{mix}}(\varepsilon)$ is the same up to smaller order terms, and the relaxation time $t_{\mathrm{rel}}^{G,S}$ is the same.

The mixing property of the random walk $X_{t}$ on the Cayley graph of $G$ is closely related to that of the projected random walk on the Cayley graph of $G_{\mathrm{ab}}$. More precisely, denoting by $Y_{t}:=G_{2}X_{t}$ the projected RW on $G_{\mathrm{ab}}$ and starting the walk $X_{t}$ from the uniform distribution $\pi_{G_{2}}$ on $G_{2}$, one can observe (see Lemma 3.1) that

$$\|\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}=\|\mathbb{P}_{G_{2}}(Y_{t}=\cdot)-\pi_{G_{\mathrm{ab}}}\|_{\mathrm{TV}}.$$

As suggested by the following triangle inequality

$$\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}\leq\|\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}+\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}},$$

if the total variation distance between $\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)$ and $\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)$ can be nicely controlled then the mixing property of $X_{t}$ is primarily characterized by the mixing of its projection on the abelianization, which we refer to as the reduction to abelianization.
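The identity and the triangle inequality above can be checked exactly on a small example. The sketch below makes several simplifying assumptions that are not in the paper: it uses the Heisenberg group $H_{3,3}$ with elements encoded as triples $(a,b,c)$, the symmetric generating set $\{(\pm 1,0,0),(0,\pm 1,0)\}$, and a discrete-time walk in place of the rate 1 continuous-time walk (both relations hold step by step, so the discretization does not matter for this check). All names are illustrative.

```python
from fractions import Fraction
from itertools import product

M = 3  # modulus of the Heisenberg group H_{M,3}; an arbitrary small choice

def mul_g(x, y):
    """Product in H_{M,3}; (a, b, c) encodes [[1, a, c], [0, 1, b], [0, 0, 1]] mod M."""
    return ((x[0] + y[0]) % M, (x[1] + y[1]) % M, (x[2] + y[2] + x[0] * y[1]) % M)

def mul_ab(x, y):
    """Product in the abelianization Z_M^2."""
    return ((x[0] + y[0]) % M, (x[1] + y[1]) % M)

def step(dist, gens, mul):
    """One step of the walk: multiply on the right by a uniform element of gens."""
    weight = Fraction(1, len(gens))
    new = {}
    for x, p in dist.items():
        for s in gens:
            y = mul(x, s)
            new[y] = new.get(y, 0) + p * weight
    return new

def tv(p, q):
    """Total variation distance between two distributions given as dicts."""
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in set(p) | set(q)) / 2

def uniform(elements):
    elements = list(elements)
    return {x: Fraction(1, len(elements)) for x in elements}

S = [(1, 0, 0), (M - 1, 0, 0), (0, 1, 0), (0, M - 1, 0)]
S_AB = [(a, b) for a, b, _ in S]

def check(t_max):
    """Rows (t, TV from id, TV from uniform on G_2, TV of projected walk, TV between starts)."""
    pi_G = uniform(product(range(M), repeat=3))
    pi_ab = uniform(product(range(M), repeat=2))
    from_id = {(0, 0, 0): Fraction(1)}
    from_G2 = uniform((0, 0, c) for c in range(M))   # G_2 = {(0, 0, c)}
    projected = {(0, 0): Fraction(1)}                # Y started at the coset G_2
    rows = []
    for t in range(t_max + 1):
        rows.append((t, tv(from_id, pi_G), tv(from_G2, pi_G),
                     tv(projected, pi_ab), tv(from_id, from_G2)))
        from_id = step(from_id, S, mul_g)
        from_G2 = step(from_G2, S, mul_g)
        projected = step(projected, S_AB, mul_ab)
    return rows

rows = check(8)
```

Exact rational arithmetic is used so that the equality in the first relation can be compared without floating-point tolerance.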

We will prove in Lemma 3.2 that indeed $\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}}$ decays exponentially fast in time with rate at least $(|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2})^{-1}$, where $\mathrm{Diam}_{S}(G_{2})$ is the diameter of $G_{2}$ in $\mathrm{Cay}(G,S)$. This provides a quantitative criterion to determine when the mixing of the walk $X_{t}$ is governed by its projection onto the abelianization. In particular, if the mixing of the projected walk $Y_{t}$ occurs after $\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}}$ has become vanishingly small then the mixing time of $X_{t}$ is roughly that of $Y_{t}$.

Due to the well known connection between the mixing time and the diameter of the graph, see, e.g., [25, Proposition 13.7], for our purpose it is sufficient to prove that $\mathrm{Diam}_{S}(G_{2})$ is small enough compared to $\mathrm{Diam}_{S}(G_{\mathrm{ab}})$. Section 2 is devoted to proving an upper bound on $\mathrm{Diam}_{S}(G_{2})$ where the roles of $L$, $|S|$ and $\mathrm{Diam}_{S}(G_{\mathrm{ab}})$ are made explicit.

Theorem 4.

Let $S\subseteq G$ be a symmetric set of generators and let $R\subseteq S$ be such that $|\{s,s^{-1}\}\cap R|=1$ for all $s\in S$. For $2\leq i\leq L$, we have

$$\begin{aligned}\mathrm{Diam}_{S}(G_{2})&\leq\sum_{i=2}^{L}\mathrm{Diam}_{S}(G_{i}/G_{i+1})\\&\leq\sum_{i=2}^{L}2^{5i+7}|R|^{i}\left(2^{2i}+L\cdot\lceil\mathrm{Diam}_{S}(G_{\mathrm{ab}})/|R|\rceil^{1/i}\right).\end{aligned} \tag{3}$$

As a consequence, for any set of generators satisfying $|S|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}$, one has $\mathrm{Diam}_{S}(G_{2})\lesssim\mathrm{Diam}_{S}(G_{\mathrm{ab}})^{3/4}$ and hence the mixing of $X_{t}$ can be reduced to the mixing of its projection onto the abelianization.
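To get a feel for the two diameters being compared, the following sketch computes them by breadth-first search in one concrete case. It assumes, for illustration only, the Heisenberg group $H_{m,3}$ with elements encoded as triples $(a,b,c)$ and the generating set $\{(\pm 1,0,0),(0,\pm 1,0)\}$, and it reads $\mathrm{Diam}_{S}(G_{2})$ as the largest word length in $S$ of an element of $G_{2}=\{(0,0,c)\}$. The function names are hypothetical.

```python
from collections import deque

def heisenberg_mul(x, y, m):
    """Product in H_{m,3}; (a, b, c) encodes [[1, a, c], [0, 1, b], [0, 0, 1]] mod m."""
    return ((x[0] + y[0]) % m, (x[1] + y[1]) % m, (x[2] + y[2] + x[0] * y[1]) % m)

def bfs_distances(start, gens, mul):
    """Word-length distance from start to every reachable element of the Cayley graph."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        for s in gens:
            y = mul(x, s)
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def diameters(m):
    """Return (Diam_S(G_2), Diam_S(G_ab)) for the generators described above."""
    S = [(1, 0, 0), (m - 1, 0, 0), (0, 1, 0), (0, m - 1, 0)]
    dist_G = bfs_distances((0, 0, 0), S, lambda x, y: heisenberg_mul(x, y, m))
    # Largest word length among elements of the commutator subgroup {(0, 0, c)}.
    diam_G2 = max(d for (a, b, c), d in dist_G.items() if a == 0 and b == 0)
    S_ab = [(a, b) for a, b, _ in S]
    dist_ab = bfs_distances(
        (0, 0), S_ab, lambda x, y: ((x[0] + y[0]) % m, (x[1] + y[1]) % m)
    )
    # Cayley graphs are vertex-transitive, so eccentricity from the identity is the diameter.
    diam_ab = max(dist_ab.values())
    return diam_G2, diam_ab

diam_G2, diam_ab = diameters(25)
```

In this example elements of $G_{2}$ are reached by short commutator-like loops, which is the mechanism behind the $1/i$ exponents in (3).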

1.3.3. Our Methodology

We describe our methodology in relation to the objectives described in Section 1.1.2.

(i) Representation of random walk. A substantial body of work has been devoted to the study of random walks on unipotent matrix groups, see Section 1.4.1. The analysis in many existing work heavily depends on the favorable matrix structure specific to unipotent matrix groups, a feature not necessarily present in general nilpotent groups.

There has been some progress towards treating general nilpotent groups. In [17, §6], partial results were obtained by comparing the mixing time of a general nilpotent group $G$ with that of a “corresponding” abelian group $\bar{G}:=\oplus_{\ell=1}^{L}G_{\ell}/G_{\ell+1}$. See Section 1.2 for the definition of $\{G_{\ell}\}_{\ell\in[L]}$. More specifically, denoting by $\mathcal{G}_{k}$ and $\bar{\mathcal{G}}_{k}$ respectively the random Cayley graphs generated by $k$ i.i.d. uniform generators in $G$ and $\bar{G}$, it is shown that $t_{\mathrm{mix}}(\mathcal{G}_{k})/t_{\mathrm{mix}}(\bar{\mathcal{G}}_{k})\leq 1+o(1)$ with high probability, thereby offering an upper bound on the mixing time on $\mathcal{G}_{k}$.

This comparison leads to a tight upper bound, and thus establishes cutoff, when the nilpotent group $G$ has a relatively small commutator subgroup $[G,G]$. Examples of such groups include $p$-groups with “small” commutators and Heisenberg groups of diverging dimension, see [17, Corollary D.1 and D.2]. However, for general nilpotent groups this comparison is not sharp.

While the comparison technique discussed in [17] may not ensure a sharp upper bound on the mixing time for general nilpotent groups, it underscores the approach of examining the mixing behavior in relation to each quotient group $\{G_{\ell}/G_{\ell+1}\}_{\ell\in[L]}$. To obtain the tight upper bound and establish cutoff, we give an accurate representation of the random walk dynamics through the lens of quotient groups.

To give a bit of intuition, let us consider the free nilpotent group of step 2 (i.e., $G_{3}=\{\mathrm{id}\}$). Let $S=\{Z_{i}^{\pm 1}:i\in[k]\}$ be a set of i.i.d. uniform generators. Let $W:=W(t)=(W_{1}(t),\dots,W_{k}(t))$ be an auxiliary process defined based on the random walk $X_{t}$, where $W_{i}(t)$ is the number of times the generator $Z_{i}$ has been applied minus the number of times $Z_{i}^{-1}$ has been applied in the random walk $X:=X_{t}$. By rearranging, we can express any word in the form

\[
X=Z_{1}^{W_{1}}\cdots Z_{k}^{W_{k}}\prod_{a,b\in[k]:a<b}[Z_{a},Z_{b}]^{m_{ba}}, \tag{4}
\]

where $(m_{ba})_{a,b\in[k],a<b}$ results from the rearrangement of generators, see (30) for more details. Roughly speaking, $Z_{1}^{W_{1}}\cdots Z_{k}^{W_{k}}$ keeps track of the walk on $G_{\mathrm{ab}}=G/G_{2}$, whereas the term $\prod_{a,b\in[k]:a<b}[Z_{a},Z_{b}]^{m_{ba}}$, which belongs to $G_{2}$, corresponds to the mixing on the quotient group $G_{2}/G_{3}$.

We demonstrate in Section 4.3 that this line of reasoning applies to general nilpotent groups of step $L\geq 2$, see (33). Although the rearranging of generators leads to the presence of multi-fold commutators such as $[[Z_{1},Z_{2}],Z_{3}]$ when $L\geq 3$, we will argue, through a further careful simplification, that the presence of multi-fold commutators does not add to the complexity of the analysis, and one only needs to control the distribution of $\prod_{a,b\in[k]:a<b}[Z_{a},Z_{b}]^{m_{ba}}$, as in the case $L=2$.

(ii.a) Comparison argument: reduction to abelianization. We develop a comparison argument that addresses the mixing of the random walk on general nilpotent groups with an arbitrary generator set $S$. Under mild assumptions on the size of $S$, the mixing time on $G$ is the same as the mixing time of the projected walk on $G_{\mathrm{ab}}$ (up to smaller order terms), see the precise statement in Theorem 2. That is, within the scope of Theorem 2, the mixing time on $G$ is completely determined by that on the abelianization $G_{\mathrm{ab}}$.

Theorem 2 further implies that for a certain class of nilpotent groups with specific structures in their abelianization, the mixing time remains the same (up to smaller order terms) regardless of the choice of a minimal-sized symmetric set of generators. See Theorem 3 for the precise statement.

(ii.b) Geometry of the Cayley graph on nilpotent groups. We derive a quantitative upper bound on the diameter of the commutator subgroup $G_{2}$ in terms of the diameter of the abelianization $G_{\mathrm{ab}}$, with explicit dependence on the rank $r=r(G)$ and step $L=L(G)$ of the group $G$, as detailed in Theorem 4. This, combined with the aforementioned comparison argument, allows us to provide sufficient conditions under which the mixing behavior of the random walk on $G$ is governed by that of the projected walk on $G_{\mathrm{ab}}$.

1.4. Historic Overview

1.4.1. Random Walks on Unipotent Matrix Groups

Consider the group $\mathbb{U}_{n}$ of upper-triangular matrices with $1$'s along the diagonal, that is, the group of matrices

\[
\mathbb{U}_{n}=\left\{\begin{pmatrix}1&*&\cdots&*&*\\ 0&1&\cdots&*&*\\ \vdots&\vdots&&\vdots&\vdots\\ 0&0&\cdots&1&*\\ 0&0&\cdots&0&1\end{pmatrix}\right\}.
\]

Then, a unipotent group can be defined as a subgroup of some $\mathbb{U}_{n}$. This includes the two families of nilpotent groups discussed earlier: the $d\times d$ unit-upper triangular matrices $U_{m,d}$ with entries in $\mathbb{Z}_{m}$ and the $d$-dimensional Heisenberg group $H_{m,d}$ over $\mathbb{Z}_{m}$, where $m\in\mathbb{N}$.

The exploration of random walks on unit upper triangular matrices has led to a substantial body of research. One avenue of investigation involves a simple walk on $U_{m,d}$, the $d\times d$ unit upper triangular matrix group with entries over $\mathbb{Z}_{m}$ for some $m\in\mathbb{N}$: a row is chosen uniformly and added to or subtracted from the row above. Ellenberg [15] studied the diameter of the associated Cayley graph, with $d$ growing, and this was subsequently improved by Ellenberg and Tymoczko [16]. Stong [31] gave mixing bounds via analysis of eigenvalues. Coppersmith and Pak [8, 28] looked directly at mixing. Further work along this line includes Peres and Sly [29], Nestoridi [26] and Nestoridi and Sly [27]. Notably, Nestoridi and Sly [27] were the first to optimize bounds for $m$ and $d$ simultaneously. Diaconis and Hough [9] introduced a new method for proving a central limit theorem for random walks on unipotent matrix groups.

In the context of i.i.d. uniformly chosen generators, Hermon and Olesker-Taylor [19] characterize the cutoff time as the entropic time of the projected walk onto the abelianization for the two families of nilpotent groups: the $d\times d$ unit-upper triangular matrices $U_{m,d}$ with entries in $\mathbb{Z}_{m}$ and the $d$-dimensional Heisenberg group $H_{m,d}$ over $\mathbb{Z}_{m}$, where $m\in\mathbb{N}$.

1.4.2. The Entropic Methodology

A common theme in the study of mixing times is that “generic” instances often exhibit the cutoff phenomenon. Moreover, this can often be handled via the entropic method, see, e.g., [3, 4, 5]. A more detailed exposition of the known literature can be found in a previous article of one of the authors, see [17, §1.3.5]. Additionally, the entropic method has been applied within the context of random walks on groups, as discussed in [17, 19], which we now explain in a little more depth.

The main idea is to relate the mixing of the random walk $X=X_{t}$ on $\mathrm{Cay}(G,S)$ to that of an auxiliary process $W_{t}$ and to study the entropy of $W_{t}$. Suppose $S=\{s_{i}^{\pm 1}:i\in[k]\}$ is given. The auxiliary process $W=W_{t}:=(W_{1}(t),\dots,W_{k}(t))$ is defined based on $X_{t}$, where $W_{i}(t)$ is the number of times the generator $s_{i}$ has been applied minus the number of times $s_{i}^{-1}$ has been applied in the random walk $X_{t}$. The observation that $W$ is a rate 1 random walk on $\mathbb{Z}^{k}$ (whose entropy reveals information regarding the mixing of the walk $X$) leads naturally to the definition of the entropic times, see Definition 3. More specifically, the auxiliary process $W$ is related to the original random walk $X$ as follows. We sample two independent copies of the random walk and the auxiliary process, denoted by $(X,W)$ and $(X^{\prime},W^{\prime})$. By the Cauchy-Schwarz inequality one has

\[
4\|\mathbb{P}_{S}(X_{t}=\cdot|W_{t})-\pi_{G}\|^{2}_{\mathrm{TV}}\leq|G|\cdot\mathbb{P}_{S}(X_{t}=X^{\prime}_{t}|W_{t},W^{\prime}_{t})-1,
\]

which relates the mixing of $X$ to the hitting probability of $X$ and $X^{\prime}$, i.e., the probability that $X(X^{\prime})^{-1}=\mathrm{id}$, where the index $t$ is suppressed as it is clear from the context.

When the group $G$ is abelian, given the choice of generators $S$, the total variation distance is a function of $W_{t}$ alone, see [17]. When the group is not abelian, this is not the case. When $G$ is nilpotent, the auxiliary process $W_{t}$ still provides useful (albeit partial) information on $X_{t}$. In this case, to get a full picture of the mixing of the RW, we will combine the knowledge of the auxiliary process $W_{t}$ with further information obtained by analyzing the mixing on the quotient groups $\{Q_{\ell}\}_{\ell\in[L]}$ separately. See Sections 4.6 and 4.8 for the complete discussion.

2. Geometry of Cayley Graphs

The definition of the Cayley graph of a group $G$ extends naturally to its quotient groups. For $H\trianglelefteq G$, the Cayley graph of $G/H$, denoted by $\mathrm{Cay}(G/H,\{Hs:s\in S\})$, has vertex set $G/H$ and edge set $\{\{Hg,Hgs\}:g\in G,s\in S\}$.

Let $\mathrm{dist}_{S}(\cdot,\cdot)$ denote the graph distance on $\mathrm{Cay}(G,S)$. Define

\[
S_{H}:=\{Hs:s\in S\}. \tag{5}
\]

Similarly, let $\mathrm{dist}_{S_{H}}(\cdot,\cdot)$ denote the graph distance on $\mathrm{Cay}(G/H,S_{H})$. For a subgroup $H$ of $G$, we define the diameter of $H$ with respect to the graph distance $\mathrm{dist}_{S}(\cdot,\cdot)$ on $\mathrm{Cay}(G,S)$ by

\[
\mathrm{Diam}_{S}(H):=\max\{\mathrm{dist}_{S}(\mathrm{id},h):h\in H\}. \tag{6}
\]

For $H\trianglelefteq H^{\prime}\trianglelefteq G$ such that $H\trianglelefteq G$ (so that $G/H$ is a group), with a slight abuse of notation, we define the diameter of $H^{\prime}/H$ with respect to the graph distance $\mathrm{dist}_{S_{H}}(\cdot,\cdot)$ on $\mathrm{Cay}(G/H,S_{H})$ by

\[
\mathrm{Diam}_{S}(H^{\prime}/H):=\max\{\mathrm{dist}_{S_{H}}(H,Hh^{\prime}):h^{\prime}\in H^{\prime}\},
\]

whose definition is consistent with (6) with $G,H,S$ replaced respectively by $G/H,H^{\prime}/H,S_{H}$.

We have the following triangle inequality relating the diameter of a group $H^{\prime}$ to those of its subgroup $H$ and the quotient group $H^{\prime}/H$.

Proposition 1.

For all $H\trianglelefteq H^{\prime}\trianglelefteq G$ such that $H\trianglelefteq G$ the following holds:

\[
\mathrm{Diam}_{S}(H^{\prime})\leq\mathrm{Diam}_{S}(H^{\prime}/H)+\mathrm{Diam}_{S}(H). \tag{7}
\]
Proof.

Let $h^{\prime}\in H^{\prime}$. By the definition of $\mathrm{Diam}_{S}(H^{\prime}/H)$, there exist $s_{1},\dots,s_{m}\in S$ with $m\leq\mathrm{Diam}_{S}(H^{\prime}/H)$ such that $Hh^{\prime}=Hs_{1}\cdots s_{m}$, i.e., there exists $h\in H$ such that $h^{\prime}=hs_{1}\cdots s_{m}$. Hence

\[
\mathrm{dist}_{S}(\mathrm{id},h^{\prime})=\mathrm{dist}_{S}(\mathrm{id},hs_{1}\cdots s_{m})\leq\mathrm{dist}_{S}(\mathrm{id},h)+m\leq\mathrm{Diam}_{S}(H)+\mathrm{Diam}_{S}(H^{\prime}/H),
\]

which concludes the proof of (7). ∎
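As an illustration of (7), the inequality can be checked by brute force on a small example. The sketch below is not part of the paper's argument: it assumes the three-dimensional Heisenberg group over $\mathbb{Z}_{m}$, written as triples $(a,b,c)$, with the generating set $S=\{x^{\pm1},y^{\pm1}\}$, and all function names are hypothetical. It takes $H^{\prime}=G$ and $H=G_{2}$, and uses the fact that the distance of a coset in the quotient graph equals the length of its shortest lift to $G$.

```python
from collections import deque
from itertools import product


def heisenberg_mul(g, h, m):
    """Multiply two elements (a, b, c) of the Heisenberg group over Z_m."""
    a1, b1, c1 = g
    a2, b2, c2 = h
    return ((a1 + a2) % m, (b1 + b2) % m, (c1 + c2 + a1 * b2) % m)


def word_lengths(m):
    """Breadth-first search from the identity in Cay(G, S), S = {x, x^-1, y, y^-1}."""
    gens = [(1, 0, 0), (m - 1, 0, 0), (0, 1, 0), (0, m - 1, 0)]
    identity = (0, 0, 0)
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        for s in gens:
            h = heisenberg_mul(g, s, m)  # edges are {g, gs}, as in the text
            if h not in dist:
                dist[h] = dist[g] + 1
                queue.append(h)
    return dist


def heisenberg_diameters(m):
    """Return (Diam_S(G), Diam_S(G_ab), Diam_S(G_2)) for this example."""
    dist = word_lengths(m)
    diam_g = max(dist.values())
    # Here G_2 = [G, G] consists of the elements (0, 0, c).
    diam_g2 = max(dist[(0, 0, c)] for c in range(m))
    # Distance of the coset G_2 g in the quotient graph = shortest lift in G.
    diam_ab = max(
        min(dist[(a, b, c)] for c in range(m))
        for a, b in product(range(m), repeat=2)
    )
    return diam_g, diam_ab, diam_g2
```

Calling `heisenberg_diameters(m)` for a few small values of `m` and comparing the first entry with the sum of the other two gives a quick sanity check of Proposition 1 in this special case.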

Applying the triangle inequality in (7) iteratively leads to a decomposition of $\mathrm{Diam}_{S}(G_{2})$ as the sum of $\mathrm{Diam}_{S}(G_{i}/G_{i+1})$ over $2\leq i\leq L$, i.e.,

\[
\mathrm{Diam}_{S}(G_{2})\leq\sum_{i=2}^{L}\mathrm{Diam}_{S}(G_{i}/G_{i+1}).
\]

Breuillard and Tointon [6, Lemma 4.11] showed that the diameter of $G_{2}$ is at most $C_{S,L}\,\mathrm{Diam}_{S}(G)^{1/2}$, where $C_{S,L}$ is a constant depending on the size of $S$ and $L:=L(G)$. In fact, they showed for all $i\in[L]$ that $\mathrm{Diam}_{S}(G_{i}/G_{i+1})$ is at most $C_{S,L}\,\mathrm{Diam}_{S}(G)^{1/i}$. El-Baz and Pagano [14] proved the same estimates using somewhat similar arguments. In addition, they observe that $\mathrm{Diam}_{S}(G)\leq\mathrm{Diam}_{S}(G_{2})+\mathrm{Diam}_{S}(G_{\mathrm{ab}})$, and hence one can estimate $\mathrm{Diam}_{S}(G_{i}/G_{i+1})$ in terms of $\mathrm{Diam}_{S}(G_{\mathrm{ab}})$.

As discussed in Remark 1, a necessary condition for the random walk on $\mathrm{Cay}(G,S)$ to exhibit cutoff when $L$ is bounded is for $|S|$ to diverge. Consequently, in contrast to [6] and [14], which did not quantify the dependence of the constant $C_{S,L}$ on $|S|$ and $L$, we need to make this dependence explicit. Our approach to upper bounding $\mathrm{Diam}_{S}(G_{2})$ follows the framework of El-Baz and Pagano [14], but with considerably more attention devoted to quantifying the influence of $|S|$ as well as $L$.

Theorem 4.

Let $S\subseteq G$ be a symmetric set of generators and let $R\subseteq S$ be such that $|\{s,s^{-1}\}\cap R|=1$ for all $s\in S$. For $2\leq i\leq L$, we have

\begin{align*}
\mathrm{Diam}_{S}(G_{2}) &\leq\sum_{i=2}^{L}\mathrm{Diam}_{S}(G_{i}/G_{i+1})\\
&\leq\sum_{i=2}^{L}2^{5i+7}|R|^{i}\left(2^{2i}+L\cdot\lceil\mathrm{Diam}_{S}(G_{\mathrm{ab}})/|R|\rceil^{1/i}\right).
\end{align*}

The following comparison between $\mathrm{Diam}_{S}(G_{2})$ and $\mathrm{Diam}_{S}(G_{\mathrm{ab}})$ is what we will use in the proof of Theorem 2.

Corollary 2.

For any fixed $L\in\mathbb{N}$, we have

\[
\mathrm{Diam}_{S}(G_{2})\lesssim\mathrm{Diam}_{S}(G_{\mathrm{ab}})^{3/4}
\]

when $\mathrm{Diam}_{S}(G_{\mathrm{ab}})\gg|R|^{4L}$. In particular, this condition holds when $|R|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}$.

Remark 3.

The statement above is a special case of the following more general claim: for all $\varepsilon>0$, if $\mathrm{Diam}_{S}(G_{\mathrm{ab}})\gg|R|^{L/\varepsilon}$ then $\mathrm{Diam}_{S}(G_{2})\lesssim\mathrm{Diam}_{S}(G_{\mathrm{ab}})^{1/2+\varepsilon}$, and the hypothesis holds when $|R|\leq\frac{\varepsilon\log|G|}{2Lr^{L}\log\log|G|}$.

Proof.

Knowing that $\mathrm{Diam}_{S}(G_{\mathrm{ab}})\gg|R|^{4L}$, it is an easy consequence of Theorem 4 that

\begin{align*}
\mathrm{Diam}_{S}(G_{2}) &\leq\sum_{i=2}^{L}2^{5i+7}|R|^{i}\left(2^{2i}+L\lceil\mathrm{Diam}_{S}(G_{\mathrm{ab}})/|R|\rceil^{1/i}\right)\\
&\lesssim|R|^{L}\cdot\mathrm{Diam}_{S}(G_{\mathrm{ab}})^{1/2}\lesssim\mathrm{Diam}_{S}(G_{\mathrm{ab}})^{3/4}.
\end{align*}

It remains to prove that $\mathrm{Diam}_{S}(G_{\mathrm{ab}})\gg|R|^{4L}$ for the given range of $|R|$. Using the fact that $|G_{\mathrm{ab}}|\geq|G|^{1/(2r^{L})}$ from Corollary 4, we observe that

\[
|G_{\mathrm{ab}}|^{1/|R|}\geq|G|^{\frac{1}{2r^{L}|R|}}\geq(\log|G|)^{4L}\gg|R|^{4L} \tag{8}
\]

for $|R|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}$. Based on (8), it suffices to show that $\mathrm{Diam}_{S}(G_{\mathrm{ab}})\gtrsim|G_{\mathrm{ab}}|^{1/|R|}$ for the given range of $|R|$.

To prove $\mathrm{Diam}_{S}(G_{\mathrm{ab}})\gtrsim|G_{\mathrm{ab}}|^{1/|R|}$, the key is to notice that for the Cayley graph $\mathrm{Cay}(G_{\mathrm{ab}},S_{G_{2}})$, setting $k:=|R|$, we trivially have $|B_{G_{\mathrm{ab}}}(\ell)|\leq|B_{k}(\ell)|$, where $B_{G_{\mathrm{ab}}}(\ell):=\{g\in G_{\mathrm{ab}}:\mathrm{dist}_{S_{G_{2}}}(G_{2},g)\leq\ell\}$ is the ball of radius $\ell$ in $\mathrm{Cay}(G_{\mathrm{ab}},S_{G_{2}})$ and $B_{k}(\ell):=\{\bm{z}\in\mathbb{Z}^{k}:\|\bm{z}\|_{1}\leq\ell\}$ is the $k$-dimensional lattice ball of radius $\ell$. Thus $\mathrm{Diam}_{S}(G_{\mathrm{ab}})\geq\min\{\ell:|B_{k}(\ell)|\geq|G_{\mathrm{ab}}|\}$. It follows from Lemma E.2a in [18] that $|B_{k}(\ell)|\leq 2^{k\wedge\ell}{\ell+k\choose k}\leq(4\ell)^{k}$ for $\ell\geq k$, which implies $(4\,\mathrm{Diam}_{S}(G_{\mathrm{ab}}))^{k}\geq|G_{\mathrm{ab}}|$. Hence we have $\mathrm{Diam}_{S}(G_{\mathrm{ab}})\gtrsim|G_{\mathrm{ab}}|^{1/|R|}$.
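The volume comparison used in this proof is easy to explore numerically. The following sketch is an illustration only and is not taken from [18]: it counts the lattice ball exactly via the standard identity $|B_{k}(\ell)|=\sum_{j}2^{j}\binom{k}{j}\binom{\ell}{j}$ (choose the $j$ nonzero coordinates, their signs, and positive magnitudes summing to at most $\ell$), and then returns the smallest radius whose ball is at least as large as a given group order. The function names are hypothetical.

```python
from math import comb


def lattice_ball_size(k, radius):
    """Number of points z in Z^k with l1-norm at most `radius`."""
    return sum(
        2**j * comb(k, j) * comb(radius, j)
        for j in range(min(k, radius) + 1)
    )


def diameter_lower_bound(k, group_order):
    """Smallest radius l with |B_k(l)| >= group_order.

    Any abelian Cayley graph on `group_order` vertices with k generators
    (up to inverses) has diameter at least this value.
    """
    radius = 0
    while lattice_ball_size(k, radius) < group_order:
        radius += 1
    return radius
```

For instance, `diameter_lower_bound(2, 25)` gives a lower bound for the diameter of any Cayley graph of an abelian group of order 25 with two generators and their inverses.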

Before we turn to the proof of Theorem 4, some preliminary results that will be useful are presented in the next section.

2.1. Preliminaries

We begin by recalling some standard notation and stating several properties of commutators. For $x,y\in G$ we write $[x,y]:=x^{-1}y^{-1}xy=[y,x]^{-1}$ and $x^{y}:=y^{-1}xy=x[x,y]=[y,x^{-1}]x$. Further observe that for $x,y,z\in G$, $[x,yz]=[x,z][x,y]^{z}=[x,z][x,y][[x,y],z]$. Define $\rho(x,y):=[x,y]$ for $x,y\in G$ as the two-fold commutator, and inductively

\[
\rho(x_{1},\dots,x_{i}):=[\rho(x_{1},\dots,x_{i-1}),x_{i}]\quad\text{for}\quad i\geq 3\quad\text{and}\quad x_{1},\dots,x_{i}\in G. \tag{9}
\]

Some standard properties of commutators are collected in the following propositions, whose proofs can easily be found in the literature, see e.g. [13], and are thus omitted. The following is a fairly well-known result that follows from an induction argument using the three subgroup lemma.

Proposition 2.

The lower central series of a nilpotent group $G$ is a strongly central series, i.e., $[G_{i},G_{j}]$ is a subgroup of $G_{i+j}$ for all $i,j\geq 1$.

Proposition 3.

For $i\geq 0$, the map $\phi:G\times G_{i}\to G_{i+1}/G_{i+2}$ given by $\phi(g,h):=G_{i+2}[g,h]$ is anti-symmetric and bi-linear. Namely, the following hold for all $x\in G$ and $y,z\in G_{i}$:

\begin{align*}
G_{i+2}[x,y] &=G_{i+2}[y,x]^{-1}\\
G_{i+2}[x,yz] &=G_{i+2}[x,y][x,z]\\
G_{i+2}[yz,x] &=G_{i+2}[y,x][z,x]\\
G_{i+2}[x^{\ell},y^{j}] &=G_{i+2}[x,y]^{\ell j}\quad\text{for all}\quad\ell,j\in\mathbb{Z}.
\end{align*}

Moreover, for $i\geq 2$ and $j\leq i$, if $x_{1},\ldots,x_{i}\in G$ and $y\in G$, then we have the following linearity in the $j$-th component:

\[
G_{i+1}\rho(x_{1},\dots,x_{j-1},x_{j}y,x_{j+1},\dots,x_{i})=G_{i+1}\rho(x_{1},\dots,x_{i})\widehat{x}_{y,j}=G_{i+1}\widehat{x}_{y,j}\rho(x_{1},\dots,x_{i}), \tag{10}
\]

where $\widehat{x}_{y,j}:=\rho(x_{1},\dots,x_{j-1},y,x_{j+1},\dots,x_{i})$, and so

\[
G_{i+1}\rho(a,x_{2},\dots,x_{i})=G_{i+1}\rho(b,x_{2},\dots,x_{i})\quad\text{ if }ab^{-1}\in G_{2}. \tag{11}
\]

Let $R\subseteq S$ be such that $|\{s,s^{-1}\}\cap R|=1$ for all $s\in S$. Now define inductively

\[
S_{1}:=S\quad\text{and}\quad S_{i}:=\{[s,s^{\prime}]\mid s\in R,\>s^{\prime}\in S_{i-1}\}\quad\text{for}\quad i\geq 2.
\]

Write $\widehat{S}_{i}:=\{G_{i+1}s:s\in S_{i}\}$ for $i\geq 1$. The following proposition can be proved by induction on $i$ using Proposition 3. We omit the proof and refer the reader to [14] for details.

Proposition 4.

Assume that $\widehat{S}_{1}$ generates $G_{\mathrm{ab}}$. Then $\widehat{S}_{i}$ generates the abelian group $G_{i}/G_{i+1}$ for all $i\geq 1$. In particular, $S$ generates $G$ if and only if $\widehat{S}_{1}$ generates $G_{\mathrm{ab}}$.

Corollary 3.

For $i\geq 1$ and any $g\in G_{i}$ we can write

\[
G_{i+1}g=G_{i+1}\prod_{(x_{2},\dots,x_{i})\in R^{i-1}}\prod_{s\in S}\rho(s^{\ell_{(x_{2},\dots,x_{i}),g}(s)},x_{2},\dots,x_{i}), \tag{12}
\]

where $\{\ell_{(x_{2},\dots,x_{i}),g}(\cdot):(x_{2},\dots,x_{i})\in R^{i-1}\}$ are functions from $S$ to $\mathbb{Z}_{+}$ belonging to the set

\[
A:=\Big\{\ell:S\to\mathbb{Z}_{+}\text{ s.t. }\sum_{s\in S}|\ell(s)|\leq\mathrm{Diam}_{S}(G_{\mathrm{ab}})\text{ and }\ell(s)\cdot\ell(s^{-1})=0\text{ for all }s\in S\text{ such that }s\neq s^{-1}\Big\},
\]

where the second condition means that for every $\ell\in A$ and every $s\in S$ with $s\neq s^{-1}$ we have either $\ell(s)=0$ or $\ell(s^{-1})=0$.

Proof.

We know from Proposition 4 that $\widehat{S}_{i}$ generates $G_{i}/G_{i+1}$, i.e., for any $g\in G_{i}$ we can express $G_{i+1}g$ as a product of elements in $\widehat{S}_{i}=\{G_{i+1}s:s\in S_{i}\}$. Observe that $\rho:S\times R^{i-1}\to S_{i}$ is surjective due to the definition of $S_{i}$. Hence we can express

\[
G_{i+1}g=G_{i+1}\prod_{(x_{2},\dots,x_{i})\in R^{i-1}}\prod_{s\in S}\rho(s,x_{2},\dots,x_{i})^{\tilde{\ell}(s,x_{2},\dots,x_{i})},
\]

where $\tilde{\ell}(s,x_{2},\dots,x_{i})\in\mathbb{Z}$ corresponds to the number of times $\rho(s,x_{2},\dots,x_{i})$ appears. Let $h_{(x_{2},\dots,x_{i}),g}:=\prod_{s\in S}s^{\tilde{\ell}(s,x_{2},\dots,x_{i})}\in G$, so that by (10)

\[
G_{i+1}\prod_{s\in S}\rho(s,x_{2},\dots,x_{i})^{\tilde{\ell}(s,x_{2},\dots,x_{i})}=G_{i+1}\rho\Big(\prod_{s\in S}s^{\tilde{\ell}(s,x_{2},\dots,x_{i})},x_{2},\dots,x_{i}\Big)=G_{i+1}\rho(h_{(x_{2},\dots,x_{i}),g},x_{2},\dots,x_{i}).
\]

Then we can take $h^{ab}_{(x_{2},\dots,x_{i}),g}=G_{2}h_{(x_{2},\dots,x_{i}),g}\in G_{\mathrm{ab}}$ so that

\[
G_{i+1}\rho(h_{(x_{2},\dots,x_{i}),g},x_{2},\dots,x_{i})=G_{i+1}\rho(h^{ab}_{(x_{2},\dots,x_{i}),g},x_{2},\dots,x_{i}). \tag{13}
\]

The above expression contains a slight abuse of notation on the right-hand side, as $h^{ab}_{(x_{2},\dots,x_{i}),g}$ is not an element of $G$ while $\rho$ was defined to have inputs from $G$. By (11) we see that for any $h^{\prime}\in G$ such that $h^{\prime}h_{(x_{2},\dots,x_{i}),g}^{-1}\in G_{2}$,

\[
G_{i+1}\rho(h_{(x_{2},\dots,x_{i}),g},x_{2},\dots,x_{i})=G_{i+1}\rho(h^{\prime},x_{2},\dots,x_{i}),
\]

and hence what essentially determines the value of (13) is $h^{ab}_{(x_{2},\dots,x_{i}),g}=G_{2}h_{(x_{2},\dots,x_{i}),g}$, which clarifies the meaning of the right-hand side of (13). The point of doing so is that we can identify $G_{i+1}\prod_{s\in S}\rho(s,x_{2},\dots,x_{i})^{\tilde{\ell}(s,x_{2},\dots,x_{i})}$ with $G_{i+1}\rho(h^{ab}_{(x_{2},\dots,x_{i}),g},x_{2},\dots,x_{i})$ for some $h^{ab}_{(x_{2},\dots,x_{i}),g}\in G_{\mathrm{ab}}$. As $G_{\mathrm{ab}}$ is generated by $\widehat{S}_{1}$, there exists some function $\hat{\ell}(\cdot,x_{2},\dots,x_{i})$ satisfying $\sum_{s\in S}|\hat{\ell}(s,x_{2},\dots,x_{i})|\leq\mathrm{Diam}_{S}(G_{\mathrm{ab}})$ such that

\[
G_{2}\prod_{s\in S}s^{\hat{\ell}(s,x_{2},\dots,x_{i})}=h^{ab}_{(x_{2},\dots,x_{i}),g},
\]

i.e.,

\[
G_{i+1}\prod_{s\in S}\rho(s,x_{2},\dots,x_{i})^{\tilde{\ell}(s,x_{2},\dots,x_{i})}=G_{i+1}\prod_{s\in S}\rho(s,x_{2},\dots,x_{i})^{\hat{\ell}(s,x_{2},\dots,x_{i})}.
\]

This explains the first condition in the definition of $A$.

To explain the second condition in the definition of $A$, we observe that since $\rho(s^{-1},x_{2},\dots,x_{i})=\rho(s,x_{2},\dots,x_{i})^{-1}$ for $s\in S$, only one of $\{s,s^{-1}\}$ needs to appear in the expression above. Given $g\in G_{i}$ and $(x_{2},\dots,x_{i})$, for each $s\in S$, we choose $s_{+}\in\{s,s^{-1}\}$ such that simplifying the product

\[
G_{i+1}\prod_{s^{\prime}\in\{s,s^{-1}\}}\rho(s^{\prime},x_{2},\dots,x_{i})^{\hat{\ell}(s^{\prime},x_{2},\dots,x_{i})}=G_{i+1}\rho(s_{+},x_{2},\dots,x_{i})^{\ell_{(x_{2},\dots,x_{i}),g}(s)}
\]

leads to a non-negative power $\ell_{(x_{2},\dots,x_{i}),g}(s)$. We can view $\ell_{(x_{2},\dots,x_{i}),g}(\cdot)$ as a function from $S$ to $\mathbb{Z}_{+}$ such that at most one of $\{\ell_{(x_{2},\dots,x_{i}),g}(s),\ell_{(x_{2},\dots,x_{i}),g}(s^{-1})\}$ is nonzero for $s\in S$. It is straightforward to verify that $\sum_{s\in S}|\ell_{(x_{2},\dots,x_{i}),g}(s)|\leq\mathrm{Diam}_{S}(G_{\mathrm{ab}})$.

Finally, the proof is concluded by applying (10) with the above choice of $\ell_{(x_{2},\dots,x_{i}),g}(\cdot)$.

Corollary 4.

For $1\leq i\leq L$,

\[
|G_{i}/G_{i+1}|\leq|G_{\mathrm{ab}}|^{r(G)^{i-1}}\quad\text{ and }\quad|G|\leq|G_{\mathrm{ab}}|^{2r(G)^{L}}.
\]
Remark 4.

From a technical standpoint, the second inequality above is why the term $r^{L}$ is present in the condition $|S|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}$ of Theorem 2. This inequality can be improved in various scenarios with extra knowledge of the group structure.

Proof.

By (10) and (12), for any $g\in G_{i}$, we can express $G_{i+1}g$ as

\[
G_{i+1}g=G_{i+1}\prod_{(x_{2},\dots,x_{i})\in R^{i-1}}\rho(h_{(x_{2},\dots,x_{i}),g},x_{2},\dots,x_{i})
\]

for some $h_{(x_{2},\dots,x_{i}),g}:=\prod_{s\in S}s^{\ell_{(x_{2},\dots,x_{i}),g}(s)}$, where $\ell_{(x_{2},\dots,x_{i}),g}(\cdot)\in A$ for all $(x_{2},\dots,x_{i})\in R^{i-1}$ and $A$ is as in Corollary 3. By the same argument as in the proof of Corollary 3, for any given $h_{(x_{2},\dots,x_{i}),g}\in G$, we can take $h^{ab}_{(x_{2},\dots,x_{i}),g}=G_{2}h_{(x_{2},\dots,x_{i}),g}\in G_{\mathrm{ab}}$ so that

\[
G_{i+1}\rho(h_{(x_{2},\dots,x_{i}),g},x_{2},\dots,x_{i})=G_{i+1}\rho(h^{ab}_{(x_{2},\dots,x_{i}),g},x_{2},\dots,x_{i}).
\]

That is, for any $G_{i+1}g\in G_{i}/G_{i+1}$, we can define a function $\phi_{g}:R^{i-1}\to G_{\mathrm{ab}}$ by $\phi_{g}(x_{2},\dots,x_{i})=h^{ab}_{(x_{2},\dots,x_{i}),g}$, and $G_{i+1}g$ is determined by $\phi_{g}$. This implies that $|G_{i}/G_{i+1}|$ is upper bounded by the number of functions from $R^{i-1}$ to $G_{\mathrm{ab}}$, namely $|G_{\mathrm{ab}}|^{|R|^{i-1}}$. Consequently,

\[
|G|=\prod_{i=1}^{L}|G_{i}/G_{i+1}|\leq\prod_{i=1}^{L}|G_{\mathrm{ab}}|^{|R^{i-1}|}\leq|G_{\mathrm{ab}}|^{2|R|^{L}}.
\]

Taking $S$ such that $|R|=r(G)$ gives the desired inequalities.

2.2. Proof of Theorem 4

We begin with the following estimate that plays a key role in the proof of Theorem 4.

Lemma 2.1.

Let $i\in[L]$ be fixed. For any $1\leq m\leq\mathrm{Diam}_{S}(G_{\mathrm{ab}})$ and $(s,x_{2},\dots,x_{i})\in S\times R^{i-1}$,

\[
|G_{i+1}\rho(s^{m},x_{2},\dots,x_{i})|\leq 2^{5i+6}\left(2^{2i}+Lm^{1/i}\right).
\]

In what follows, we first present the proof of Theorem 4 given Lemma 2.1 and then complete the proof of Lemma 2.1.

Proof of Theorem 4.

To simplify notation, we abbreviate $D:=\mathrm{Diam}_{S}(G_{\mathrm{ab}})$. Recall from (5) that $S_{G_{i+1}}=\{G_{i+1}s:s\in S\}$ for $i\in[L]$. Let $\mathrm{dist}_{S,i}(\cdot,\cdot)$ denote the graph distance on the Cayley graph $\mathrm{Cay}(G_{i}/G_{i+1},S_{G_{i+1}})$ and write $|G_{i+1}g|:=\mathrm{dist}_{S,i}(\mathrm{id},g)$ for $g\in G$.

The first inequality $\mathrm{Diam}_{S}(G_{2})\leq\sum_{i=2}^{L}\mathrm{Diam}_{S}(G_{i}/G_{i+1})$ follows from inductively applying (7) with $H=G_{i+1}$ and $H^{\prime}=G_{i}$ for $2\leq i\leq L$.

We turn to the second inequality. The goal is to prove that for every $2\leq i\leq L$,

\[
\mathrm{Diam}_{S}(G_{i}/G_{i+1})\leq 2^{5i+7}|R|^{i}\left(2^{2i}+L\lceil D/|R|\rceil^{1/i}\right). \tag{14}
\]

By Corollary 3, in order to prove (14) it suffices to show that for any $g\in G_{i}$,

\[
\Big|G_{i+1}\prod_{(x_{2},\dots,x_{i})\in R^{i-1}}\prod_{s\in S}\rho(s^{\ell_{(x_{2},\dots,x_{i}),g}(s)},x_{2},\dots,x_{i})\Big|\leq 2^{5i+7}|R|^{i}\left(2^{2i}+L\lceil D/|R|\rceil^{1/i}\right), \tag{15}
\]

where $\{\ell_{(x_{2},\dots,x_{i}),g}(\cdot):(x_{2},\dots,x_{i})\in R^{i-1}\}$ are the functions defined in Corollary 3, belonging to the set

\[
A:=\Big\{\ell:S\to\mathbb{Z}_{+}\text{ s.t. }\sum_{s\in S}|\ell(s)|\leq D\text{ and }\ell(s)\cdot\ell(s^{-1})=0\text{ for all }s\in S\text{ such that }s\neq s^{-1}\Big\}.
\]

For any $\ell\in A$ and $s\in S$, we have $\ell(s)\leq D$. Applying Lemma 2.1 to $\ell(s)$ and using the triangle inequality, we obtain

\[
\Big|G_{i+1}\prod_{s\in S}\rho(s^{\ell(s)},x_{2},\dots,x_{i})\Big|\leq 2^{5i+6}\Big(2^{2i}|S|+L\max_{\ell\in A}\Big\{\sum_{s\in S}\ell(s)^{1/i}\Big\}\Big).
\]

Given the constraint $\ell\in A$, a simple application of Lagrange multipliers gives

\[
\max_{\ell\in A}\Big\{\sum_{s\in S}\ell(s)^{1/i}\Big\}\leq|R|\cdot\lceil D/|R|\rceil^{1/i}.
\]

Plugging this into the previous display gives

\[
\Big|G_{i+1}\prod_{s\in S}\rho(s^{\ell(s)},x_{2},\dots,x_{i})\Big|\leq 2^{5i+6}\left(2^{2i}|S|+L|R|\cdot\lceil D/|R|\rceil^{1/i}\right).
\]

Summing over $(x_{2},\dots,x_{i})\in R^{i-1}$ using the triangle inequality gives the required bound in (15) and thus completes the proof.
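The Lagrange-multiplier step in the proof above can also be verified by a short concavity argument. The derivation below is a sketch rather than the authors' computation; the symbol $n$ (the number of $s\in S$ with $\ell(s)>0$) is our notation, and the argument assumes $n\geq 1$, the case $n=0$ being trivial. The second condition in the definition of $A$ gives $n\leq|R|$, since at most one element of each pair $\{s,s^{-1}\}$ carries a nonzero value.

```latex
% Sketch only: n := #{ s in S : ell(s) > 0 } is our notation, with 1 <= n <= |R|.
\begin{align*}
\sum_{s\in S}\ell(s)^{1/i}
  &\leq n\Big(\frac{1}{n}\sum_{s\in S}\ell(s)\Big)^{1/i}
     && \text{(Jensen: } x\mapsto x^{1/i}\text{ is concave)}\\
  &\leq n^{1-1/i}D^{1/i}
     && \text{(since } \textstyle\sum_{s\in S}\ell(s)\leq D\text{)}\\
  &\leq |R|^{1-1/i}D^{1/i}=|R|\,(D/|R|)^{1/i}
     && \text{(since } n\leq|R|\text{)}\\
  &\leq |R|\cdot\lceil D/|R|\rceil^{1/i}.
\end{align*}
```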

Proof of Lemma 2.1. The following simple estimate will play a major role in our proof. For $2\leq j\leq i$, $(x_{1},\dots,x_{i})\in S\times R^{i-1}$ and $n\in\mathbb{N}$, we have

\[
|\rho(x_{1}^{n},x_{2}^{n},\dots,x_{j}^{n},x_{j+1},\dots,x_{i})|\leq 2^{i+2}n. \tag{16}
\]

This follows from the fact that

\[
|\rho(x_{1},\dots,x_{i})|\leq 2(|x_{i}|+|\rho(x_{1},\dots,x_{i-1})|),
\]

which is a simple consequence of the definition $\rho(x_{1},\dots,x_{i})=[\rho(x_{1},\dots,x_{i-1}),x_{i}]$.
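For completeness, here is one way to unroll this recursion into (16). The shorthand $a_{k}$, denoting the word length of the $k$-fold commutator built from the first $k$ entries of $(x_{1}^{n},\dots,x_{j}^{n},x_{j+1},\dots,x_{i})$, is ours and is introduced only for this sketch; it uses that every entry has word length at most $n$.

```latex
% Sketch only: a_k is our shorthand; a_1 = |x_1^n| <= n and each entry has length <= n.
\begin{align*}
a_{k} &\leq 2\,(n+a_{k-1}) \qquad (2\leq k\leq i),\\
a_{k}+2n &\leq 2\,(a_{k-1}+2n)\leq\cdots\leq 2^{k-1}(a_{1}+2n)\leq 3\cdot 2^{k-1}n,\\
a_{i} &\leq 3\cdot 2^{i-1}n\leq 2^{i+2}n.
\end{align*}
```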

Note that if $m=n^{j}$ for some $n\in\mathbb{N}$ and $j\geq 1$, then by (10) we can simply write

\[
G_{i+1}\rho(s^{m},x_{2},\dots,x_{i})=G_{i+1}\rho(s^{n},x_{2}^{n},\dots,x_{j}^{n},x_{j+1},\dots,x_{i})
\]

and use (16) to conclude the proof. Otherwise, we can still try to decompose $m$ as a sum of terms of the form $\{n^{j}:j\in[i],n\in\mathbb{N}\}$, which helps improve the upper bound on $|G_{i+1}\rho(s^{m},x_{2},\dots,x_{i})|$. In other words, our goal is to express $G_{i+1}\rho(s^{m},x_{2},\dots,x_{i})$ as a product of elements of the form

\[
\{G_{i+1}\rho(s^{n},x_{2}^{n},\dots,x_{j}^{n},x_{j+1},\dots,x_{i}):j\in[i],n\in\mathbb{N}\}.
\]

To find the decomposition of Gi+1ρ(sm,x2,,xi)G_{i+1}\rho(s^{m},x_{2},...,x_{i}) we can employ a greedy procedure to search for some set W(j)W(j)\subseteq\mathbb{N} for each j[i]j\in[i] so that m=j[i]nW(j)njm=\sum_{j\in[i]}\sum_{n\in W(j)}n^{j}. In what follows we first define W(i)W(i) and then find W(j)W(j) for j=i1,i2,,1j=i-1,i-2,\dots,1. Setting E1:=mE_{1}:=m and D1:=m1/iD_{1}:=\lfloor m^{1/i}\rfloor, we will define EjE_{j} and DjD_{j} inductively for j=i1,i2,2.j=i-1,i-2,\dots 2.

For a1a\geq 1 such that Ea4i2E_{a}\geq 4^{i^{2}}, let

Ea+1:=EaDai,Da+1:=Ea+11/iandya:=ρ(sDa,x2Da,,xiDa).E_{a+1}:=E_{a}-D_{a}^{i},\quad D_{a+1}:=\lfloor E_{a+1}^{1/i}\rfloor\quad\text{and}\quad y_{a}:=\rho(s^{D_{a}},x_{2}^{D_{a}},...,x_{i}^{D_{a}}).

We stop at the first time when |Ea|<4i2|E_{a}|<4^{i^{2}} and record i:=min{a:Ea<4i2}\ell_{i}:=\min\{a:E_{a}<4^{i^{2}}\}. Set W(i):={Da:1a<i}W(i):=\{D_{a}:1\leq a<\ell_{i}\}.

For each j=i1,i2,,2j=i-1,i-2,\dots,2, in order to find W(j)W(j) we proceed as follows: let

Ea+1:=EaDaj,Da+1:=Ea+11/jandya:=ρ(sDa,x2Da,,xjDa,xj+1,,xi).E_{a+1}:=E_{a}-D_{a}^{j},\quad D_{a+1}:=\lfloor E_{a+1}^{1/j}\rfloor\quad\text{and}\quad y_{a}:=\rho(s^{D_{a}},x_{2}^{D_{a}},...,x_{j}^{D_{a}},x_{j+1},...,x_{i}).

We stop at the first time when Ea<4j2E_{a}<4^{j^{2}} and record j:=min{a:Ea<4j2}\ell_{j}:=\min\{a:E_{a}<4^{j^{2}}\}. Set W(j):={Da:j+1a<j}W(j):=\{D_{a}:\ell_{j+1}\leq a<\ell_{j}\}.

Finally, we set y2:=ρ(sE2,x2,,xi)y_{\ell_{2}}:=\rho(s^{E_{\ell_{2}}},x_{2},...,x_{i}), y:=a=12yay:=\prod_{a=1}^{\ell_{2}}y_{a} and W(1):={E2}W(1):=\{E_{\ell_{2}}\}.
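The greedy construction of the sets $W(j)$ can be sketched in a few lines of Python. This is only an illustration under our own conventions: the function names `integer_root` and `greedy_decomposition` are hypothetical, each $W(j)$ is stored as a list (the greedy step may select the same base twice), and the group elements $y_{a}$ are not formed, only the integer decomposition of $m$.

```python
def integer_root(x: int, j: int) -> int:
    """Return floor(x ** (1/j)) using exact integer arithmetic."""
    lo, hi = 0, 1
    while hi ** j <= x:
        hi *= 2
    # Invariant: lo**j <= x < hi**j.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** j <= x:
            lo = mid
        else:
            hi = mid
    return lo


def greedy_decomposition(m: int, i: int) -> dict:
    """Greedily write m as sum over j in [i] and n in W[j] of n**j."""
    W = {j: [] for j in range(1, i + 1)}
    remainder = m  # plays the role of E_a
    for j in range(i, 1, -1):
        threshold = 4 ** (j * j)  # phase j stops once E_a < 4^(j^2)
        while remainder >= threshold:
            base = integer_root(remainder, j)  # plays the role of D_a
            W[j].append(base)
            remainder -= base ** j
    W[1] = [remainder]  # leftover E_{ell_2}, used with exponent 1
    return W


m = 10**12 + 12345678
W = greedy_decomposition(m, 3)
total = sum(n ** j for j, bases in W.items() for n in bases)
print(W, total == m)
```

Within each phase the bases are non-increasing, which is the monotonicity of $D_{a}$ used in the estimates that follow.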

By Proposition 3 we have

Gi+1ρ(sm,x2,,xi)=Gi+1y.G_{i+1}\rho(s^{m},x_{2},...,x_{i})=G_{i+1}y.

That is, it suffices to upper bound |Gi+1y||G_{i+1}y|. For 1a<21\leq a<\ell_{2}, it follows from (16) and the definition of yay_{a} that |Gi+1ya|2i+2Da|G_{i+1}y_{a}|\leq 2^{i+2}D_{a}. It is easy to see that Da+1DaD_{a+1}\leq D_{a} for all j+1a<j\ell_{j+1}\leq a<\ell_{j} and thus DaDj+1D_{a}\leq D_{\ell_{j+1}} for j+1a<j\ell_{j+1}\leq a<\ell_{j}. Lastly, by definition, |Gi+1y2|E2422=28|G_{i+1}y_{\ell_{2}}|\leq E_{\ell_{2}}\leq 4^{2^{2}}=2^{8}. Combining these facts gives

\begin{align}
|G_{i+1}y| &\leq \sum_{a=1}^{\ell_{2}}|G_{i+1}y_{a}| = \sum_{a=1}^{\ell_{i}-1}|G_{i+1}y_{a}| + \sum_{j=2}^{i-1}\sum_{\ell_{j+1}\leq a<\ell_{j}}|G_{i+1}y_{a}| + |G_{i+1}y_{\ell_{2}}| \nonumber\\
&\leq \sum_{a=1}^{\ell_{i}-1}2^{i+2}D_{a} + \sum_{j=2}^{i-1}(\ell_{j}-\ell_{j+1})2^{i+2}D_{\ell_{j+1}} + 2^{8}. \tag{17}
\end{align}

We first upper bound the second term in (17). By definition $E_{\ell_{j+1}}<4^{(j+1)^{2}}$ and thus $D_{\ell_{j+1}}\leq E_{\ell_{j+1}}^{1/j}\leq 4^{(j+1)^{2}/j}\leq 4^{j+3}$. To bound $\ell_{j}-\ell_{j+1}$ for $2\leq j\leq i-1$, note that for $\ell_{j+1}\leq a<\ell_{j}$, $E_{a}\geq 4^{j^{2}}$ and thus $D_{a}\geq 4^{j}$, which implies that

4j2Ej1Ej24j2Ej+1(j1j+1)4j2,4^{j^{2}}\leq E_{\ell_{j}-1}\leq E_{\ell_{j}-2}-4^{j^{2}}\leq E_{\ell_{j+1}}-(\ell_{j}-1-\ell_{j+1})4^{j^{2}},

i.e.,

(jj+1)Ej+14j2<4(j+1)24j2=42j+1.(\ell_{j}-\ell_{j+1})\leq\frac{E_{\ell_{j+1}}}{4^{j^{2}}}<\frac{4^{(j+1)^{2}}}{4^{j^{2}}}=4^{2j+1}.

It follows that the second term in (17) satisfies

j=2i1(jj+1)2i+2Dj+12i+2j=2i142j+14j+327i+5.\sum_{j=2}^{i-1}(\ell_{j}-\ell_{j+1})2^{i+2}D_{\ell_{j+1}}\leq 2^{i+2}\sum_{j=2}^{i-1}4^{2j+1}4^{j+3}\leq 2^{7i+5}. (18)

Next, we estimate $\sum_{a=1}^{\ell_{i}-1}D_{a}$ in (17). Observe that

E2=mm1/ii(m1/i+1)im1/ii2im1/i(i1)2imi1i.E_{2}=m-\lfloor m^{1/i}\rfloor^{i}\leq(\lfloor m^{1/i}\rfloor+1)^{i}-\lfloor m^{1/i}\rfloor^{i}\leq 2^{i}\lfloor m^{1/i}\rfloor^{(i-1)}\leq 2^{i}m^{\frac{i-1}{i}}. (19)

Repeating the same calculation for 2imi1i2^{i}m^{\frac{i-1}{i}} yields that E32i(1+i1i)m(i1i)2E_{3}\leq 2^{i(1+\frac{i-1}{i})}m^{(\frac{i-1}{i})^{2}}. More generally, for 1a<i1\leq a<\ell_{i}, since h=0(i1i)h=i\sum_{h=0}^{\infty}(\frac{i-1}{i})^{h}=i,

Ea+1(2i)h=0a1(i1i)hm(i1i)a2i2m(i1i)a.E_{a+1}\leq(2^{i})^{\sum_{h=0}^{a-1}(\frac{i-1}{i})^{h}}m^{(\frac{i-1}{i})^{a}}\leq 2^{i^{2}}m^{(\frac{i-1}{i})^{a}}.

Since for 2a<i2\leq a<\ell_{i}, Da=Ea1/iE21/iE21/iD_{a}=\lfloor E_{a}^{1/i}\rfloor\leq\lfloor E_{2}^{1/i}\rfloor\leq E_{2}^{1/i}, by (19) and the fact that D1:=m1/iD_{1}:=\lfloor m^{1/i}\rfloor we have

a=1i1Dam1/i+i2mi1i2.\sum_{a=1}^{\ell_{i}-1}D_{a}\leq\lfloor m^{1/i}\rfloor+\ell_{i}\cdot 2m^{\frac{i-1}{i^{2}}}.

It remains to upper bound i\ell_{i}. By definition, we have imin{a:2i2m(i1i)a<4i2}\ell_{i}\leq\min\{a:2^{i^{2}}m^{(\frac{i-1}{i})^{a}}<4^{i^{2}}\}. Simple calculation shows that for any 2iL2\leq i\leq L,

iloglogmloglog(2i2)log(ii1)loglogmlog(LL1)2(L1)loglogm,\ell_{i}\leq\left\lceil\frac{\log\log m-\log\log(2^{i^{2}})}{\log(\frac{i}{i-1})}\right\rceil\leq\frac{\log\log m}{\log(\frac{L}{L-1})}\leq 2(L-1)\log\log m,

where the last inequality follows from the fact that log(1+x)x/2\log(1+x)\geq x/2 for x[0,1]x\in[0,1]. Therefore, for 2iL2\leq i\leq L and 1mD1\leq m\leq D,

a=1i1Da\displaystyle\sum_{a=1}^{\ell_{i}-1}D_{a} m1/i+2(L1)(loglogm)2mi1i24Lmax{m1/i,(loglogm)mi1i2}.\displaystyle\leq\lfloor m^{1/i}\rfloor+2(L-1)(\log\log m)\cdot 2m^{\frac{i-1}{i^{2}}}\leq 4L\cdot\max\{m^{1/i},(\log\log m)m^{\frac{i-1}{i^{2}}}\}.
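For readers who want the elementary steps behind the bound on $\ell_{i}$ spelled out, the following is a short sketch of the computation. It assumes that $m$ is large enough for $\log\log m$ to be defined and positive, an assumption made here for the illustration only.

```latex
% Step 1: solve the stopping condition for a.
\begin{align*}
2^{i^{2}}\, m^{(\frac{i-1}{i})^{a}} < 4^{i^{2}}
  &\iff \Big(\tfrac{i-1}{i}\Big)^{a} \log m < \log\big(2^{i^{2}}\big) \\
  &\iff a \,\log\Big(\tfrac{i}{i-1}\Big) > \log\log m - \log\log\big(2^{i^{2}}\big).
\end{align*}
% Step 2: bound the denominator from below, uniformly in 2 <= i <= L,
% using that i -> log(i/(i-1)) is decreasing and log(1+x) >= x/2 on [0,1].
\[
\log\Big(\tfrac{i}{i-1}\Big) \;\geq\; \log\Big(\tfrac{L}{L-1}\Big)
  = \log\Big(1+\tfrac{1}{L-1}\Big) \;\geq\; \frac{1}{2(L-1)},
\qquad\text{so}\qquad
\frac{1}{\log\big(\tfrac{L}{L-1}\big)} \;\leq\; 2(L-1).
\]
```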

Noting that max1mei2{m1/i,(loglogm)mi1i2}(logi)ei\max_{1\leq m\leq e^{i^{2}}}\{m^{1/i},(\log\log m)m^{\frac{i-1}{i^{2}}}\}\leq(\log i)e^{i} and maxm>ei2{m1/i,(loglogm)mi1i2}(logi)m1/i\max_{m>e^{i^{2}}}\{m^{1/i},(\log\log m)m^{\frac{i-1}{i^{2}}}\}\leq(\log i)m^{1/i}, we have

a=1i1Da4L(logi)eim1/i24i+2Lm1/i.\sum_{a=1}^{\ell_{i}-1}D_{a}\leq 4L(\log i)e^{i}m^{1/i}\leq 2^{4i+2}Lm^{1/i}. (20)

Finally, plugging the upper bounds in (18) and (20) into (17) yields, for $2\leq i\leq L$,

\begin{align*}
|G_{i+1}y| &\leq 2^{i+2}\cdot 2^{4i+2}Lm^{1/i}+2^{7i+5}+2^{8}\leq 2^{5i+4}Lm^{1/i}+2^{7i+6}\\
&\leq 2^{5i+6}\left(2^{2i}+Lm^{1/i}\right),
\end{align*}

which completes the proof of Lemma 2.1. ∎

3. Reduction to Abelianization

Let XtX_{t} be a rate 1 simple random walk on Cay(G,S)\text{Cay}(G,S) and let Yt:=G2XtY_{t}:=G_{2}X_{t} be the projected random walk of XtX_{t} onto GabG_{\mathrm{ab}}, which is a rate 1 simple random walk on Cay(Gab,SG2)\text{Cay}(G_{\mathrm{ab}},S_{G_{2}}). Let tmixG,S(ε)t_{\mathrm{mix}}^{G,S}(\varepsilon) denote the ε\varepsilon-mixing time of the random walk on Cay(G,S)\text{Cay}(G,S) and tmixGab,S(ε)t_{\mathrm{mix}}^{G_{\mathrm{ab}},S}(\varepsilon) the ε\varepsilon-mixing time for the projected random walk on Cay(Gab,SG2)\text{Cay}(G_{\mathrm{ab}},S_{G_{2}}). To simplify notation we will drop the SS in the superscript and write tmixG(ε)t_{\mathrm{mix}}^{G}(\varepsilon) instead when the choice of SS is clear from the context.

Since YtY_{t} is the projection of XtX_{t} onto the abelianization GabG_{\mathrm{ab}} we can observe

G2(Yt=)πGabTVid(Xt=)πGTV,\|\mathbb{P}_{G_{2}}(Y_{t}=\cdot)-\pi_{G_{\mathrm{ab}}}\|_{\mathrm{TV}}\leq\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}, (21)

which implies $t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon)\leq t_{\mathrm{mix}}^{G}(\varepsilon)$. Naturally we are interested in the mixing behavior of $X_{t}$ in comparison to that of $Y_{t}$; that is, we hope to understand to what extent the mixing behavior of the random walk on $G$ is governed by its projection onto the abelianization $G_{\mathrm{ab}}$. It turns out that for a nilpotent group $G$ of bounded step and rank, when the generator set $S$ is not too large, the mixing of $X_{t}$ is completely governed by that of $Y_{t}$.

Theorem 2.

Let GG be a finite nilpotent group such that r(G),L(G)1r(G),L(G)\asymp 1 and SGS\subseteq G be a symmetric set of generators. Suppose |S|log|G|8LrLloglog|G||S|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}. For any fixed ε(0,1)\varepsilon\in(0,1) and δ(0,ε)\delta\in(0,\varepsilon) we have

tmixGab(ε)tmixG(ε)tmixGab(εδ)t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon)\leq t_{\mathrm{mix}}^{G}(\varepsilon)\leq t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon-\delta)

when |G||G| is sufficiently large (more precisely, when |G|exp((log|G|)L)δ|G|\exp(-(\log|G|)^{L})\leq\delta).

It is well known that the relaxation time is characterized by the exponential decay rate of the total variation distance between the walk and its equilibrium (see e.g., Corollary 12.7 in [32]). As a consequence of the proof of Theorem 2, we obtain the following characterization of the relaxation time of $X_{t}$ in terms of its projection $Y_{t}$.

Corollary 1.

Let $t^{G}_{\mathrm{rel}}$ and $t^{G^{\mathrm{ab}}}_{\mathrm{rel}}$ be the relaxation times of the walks $X_{t}$ and $Y_{t}$, respectively. Then

trelGabtrelGmax{trelGab,|S|DiamS(G2)2}.t^{G^{\mathrm{ab}}}_{\mathrm{rel}}\leq t^{G}_{\mathrm{rel}}\leq\max\{t^{G^{\mathrm{ab}}}_{\mathrm{rel}},|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2}\}.

In particular, when |S|log|G|8LrLloglog|G||S|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|} we have trelG=trelGabt^{G}_{\mathrm{rel}}=t^{G^{\mathrm{ab}}}_{\mathrm{rel}}.

Before moving on to proving Theorem 2, we first explain why Corollary 1 is an easy consequence of the proof of Theorem 2 and delay its proof to the end of this section. It is useful to observe the following inequality

id(Xt=)πGTV\displaystyle\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}} id(Xt=)πG2(Xt=)TV+πG2(Xt=)πGTV,\displaystyle\leq\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}}+\|\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}, (22)

where πA\pi_{A} denotes the uniform distribution over the set AGA\subseteq G. We will establish that in the given regime of |S||S|, the second term πG2(Xt=)πGTV\|\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}} in the above inequality is the leading order term that determines the time of mixing. Moreover, Lemma 3.1 shows this term is fully characterized by the projected random walk YtY_{t} on GabG_{\mathrm{ab}}.

Using the interpretation that the relaxation time is the exponential decay rate of the total variation distance, by taking power 1/t1/t and letting tt\to\infty on both sides of (21) and (22), we shall see that trelG=trelGabt^{G}_{\mathrm{rel}}=t^{G^{\mathrm{ab}}}_{\mathrm{rel}} if |S|log|G|8LrLloglog|G||S|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}.

As a direct consequence of Theorem 2 and Corollary 1 we can now present the proof of Theorem 3.

Theorem 3.

Suppose GG is a nilpotent group with rank rr and step LL such that either (i) GabmrG_{ab}\cong\mathbb{Z}^{r}_{m} where mm\in\mathbb{N} or (ii) GG is a pp-group. Suppose the rank and step satisfy LrL+1log|G|16loglog|G|Lr^{L+1}\leq\frac{\log|G|}{16\log\log|G|}. For any symmetric set of generators SGS\subseteq G of minimal size and any given ε>0\varepsilon>0, the mixing time tmixG,S(ε)t^{G,S}_{mix}(\varepsilon) is the same up to smaller order terms, and the relaxation time trelG,St_{\mathrm{rel}}^{G,S} is the same.

Proof.

Let SS be a minimal size symmetric set of generators for GG and let SG2={G2s:sS}S_{G_{2}}=\{G_{2}s:s\in S\}. By Proposition 4 we can see that SG2S_{G_{2}} is a minimal size symmetric set of generators for GabG_{\mathrm{ab}}. As GabG_{\mathrm{ab}} is an abelian group, it can be expressed in the form

Gabm1m2mr,G_{\mathrm{ab}}\cong\mathbb{Z}_{m_{1}}\oplus\mathbb{Z}_{m_{2}}\oplus\cdots\oplus\mathbb{Z}_{m_{r}}, (23)

where $m_{1},\dots,m_{r}\in\mathbb{Z}$ and $r$ is the rank of $G$. In general the choice of $m_{1},\dots,m_{r}$ is not unique (e.g., $\mathbb{Z}_{2}\oplus\mathbb{Z}_{15}\cong\mathbb{Z}_{6}\oplus\mathbb{Z}_{5}$), but when $G$ satisfies the assumption in the statement the expression in (23) is unique: in case (i) $G_{\mathrm{ab}}\cong\mathbb{Z}^{r}_{m}$; in case (ii) we can write $G_{\mathrm{ab}}\cong\mathbb{Z}_{p^{\alpha_{1}}}\oplus\cdots\oplus\mathbb{Z}_{p^{\alpha_{r}}}$ for some $\alpha_{i}\in\mathbb{Z}$ with $\alpha_{1}\leq\dots\leq\alpha_{r}$. Hence, any minimal size symmetric set $S_{G_{2}}$ that generates $G_{\mathrm{ab}}$ uniquely corresponds to $\{\pm e_{i}:i\in[r]\}$, the standard basis vectors together with their negatives. Thus the mixing time $t^{G_{\mathrm{ab}},S}_{\mathrm{mix}}(\varepsilon)$ on $G_{\mathrm{ab}}$ for any such $S$ is equal to the mixing time of the walk on $\mathbb{Z}^{r}_{m}$ (or $\mathbb{Z}_{p^{\alpha_{1}}}\oplus\cdots\oplus\mathbb{Z}_{p^{\alpha_{r}}}$) with generators $\{\pm e_{i}:i\in[r]\}$.

As our assumptions on $r$ and $L$ guarantee $|S|=2r\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}$, we can apply Theorem 2 to show that the mixing time $t^{G,S}_{\mathrm{mix}}(\varepsilon)$ is equal (up to smaller order terms) to the mixing time of the walk on $\mathbb{Z}^{r}_{m}$ (or $\mathbb{Z}_{p^{\alpha_{1}}}\oplus\cdots\oplus\mathbb{Z}_{p^{\alpha_{r}}}$) with generators $\{\pm e_{i}:i\in[r]\}$, for any minimal size symmetric set of generators $S$ of $G$. The result for the relaxation time follows similarly from Corollary 1. ∎

3.1. Proofs

Lemma 3.1.

For t0t\geq 0,

πG2(Xt=)πGTV=G2(Yt=)πGabTV.\|\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}=\|\mathbb{P}_{G_{2}}(Y_{t}=\cdot)-\pi_{G_{\mathrm{ab}}}\|_{\mathrm{TV}}.
Proof.

Write Gab={G2gi:1i|Gab|}G_{\mathrm{ab}}=\{G_{2}g_{i}:1\leq i\leq|G_{\mathrm{ab}}|\}. One can easily check that starting from the initial distribution πG2{\pi_{G_{2}}}, for any 1i|Gab|1\leq i\leq|G_{\mathrm{ab}}| and h,hG2gih,h^{\prime}\in G_{2}g_{i}, πG2(Xt=h)=πG2(Xt=h)\mathbb{P}_{\pi_{G_{2}}}(X_{t}=h)=\mathbb{P}_{\pi_{G_{2}}}(X_{t}=h^{\prime}). Hence,

\begin{align*}
2\|\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}} &= \sum_{i=1}^{|G_{\mathrm{ab}}|}\sum_{x\in G_{2}g_{i}}\bigg|\mathbb{P}_{\pi_{G_{2}}}(X_{t}=x)-\frac{1}{|G|}\bigg| = \sum_{i=1}^{|G_{\mathrm{ab}}|}\bigg|\mathbb{P}_{\mathrm{id}}(X_{t}\in G_{2}g_{i})-\frac{|G_{2}|}{|G|}\bigg|\\
&= \sum_{i=1}^{|G_{\mathrm{ab}}|}\bigg|\mathbb{P}_{G_{2}}(Y_{t}=G_{2}g_{i})-\frac{1}{|G_{\mathrm{ab}}|}\bigg|\\
&= 2\|\mathbb{P}_{G_{2}}(Y_{t}=\cdot)-\pi_{G_{\mathrm{ab}}}\|_{\mathrm{TV}}.
\end{align*}
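As a sanity check, the identity of Lemma 3.1 and the inequalities (21) and (22) can be verified numerically on a small example. The sketch below makes several choices that are ours and not the paper's: the group is the Heisenberg group modulo $p=3$ (step 2, with $G_{2}$ equal to its centre), the generators are $x^{\pm 1},y^{\pm 1}$, and the continuous-time law is approximated by a truncated Poisson mixture of discrete-time laws.

```python
import math
from itertools import product

P_MOD = 3  # hypothetical prime p; the group has p**3 elements


def mul(g, h):
    """Multiply two elements (a, b, c) of the Heisenberg group mod p."""
    return ((g[0] + h[0]) % P_MOD,
            (g[1] + h[1]) % P_MOD,
            (g[2] + h[2] + g[0] * h[1]) % P_MOD)


G = list(product(range(P_MOD), repeat=3))
IDENTITY = (0, 0, 0)
# Symmetric generating set {x, x^{-1}, y, y^{-1}}.
S = [(1, 0, 0), (P_MOD - 1, 0, 0), (0, 1, 0), (0, P_MOD - 1, 0)]
G2 = [(0, 0, c) for c in range(P_MOD)]  # commutator subgroup (the centre)


def law_at_time(initial, t, n_terms=200):
    """Law of the rate-1 walk at time t, as a truncated Poisson mixture."""
    step_law = dict(initial)
    result = {g: 0.0 for g in G}
    weight = math.exp(-t)  # P(Poisson(t) = 0)
    for n in range(n_terms):
        for g, mass in step_law.items():
            result[g] += weight * mass
        nxt = {g: 0.0 for g in G}
        for g, mass in step_law.items():
            for s in S:
                nxt[mul(g, s)] += mass / len(S)  # one step: g -> g s
        step_law = nxt
        weight *= t / (n + 1)  # P(Poisson(t) = n + 1)
    return result


def tv(mu, nu):
    """Total variation distance between two laws on the same finite set."""
    return 0.5 * sum(abs(mu[g] - nu[g]) for g in mu)


t = 2.0
from_id = law_at_time({IDENTITY: 1.0}, t)
from_G2 = law_at_time({g: 1.0 / len(G2) for g in G2}, t)
uniform_G = {g: 1.0 / len(G) for g in G}

# Law of Y_t = G_2 X_t: add up the mass of each coset, indexed by (a, b).
cosets = list(product(range(P_MOD), repeat=2))
law_Y = {ab: sum(from_id[(ab[0], ab[1], c)] for c in range(P_MOD))
         for ab in cosets}
uniform_ab = {ab: 1.0 / len(cosets) for ab in cosets}

tv_Y = tv(law_Y, uniform_ab)          # left-hand side of (21)
tv_X = tv(from_id, uniform_G)         # right-hand side of (21)
tv_G2_start = tv(from_G2, uniform_G)  # left-hand side of Lemma 3.1
tv_gap = tv(from_id, from_G2)         # first term on the right of (22)
print(tv_Y, tv_G2_start, tv_X, tv_gap)
```

In this example `tv_G2_start` and `tv_Y` agree up to floating-point error, while `tv_X` lies between `tv_Y` and `tv_gap + tv_G2_start`.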

It remains to upper bound the difference id(Xt=)πG2(Xt=)TV\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}}.

Lemma 3.2.

For t0t\geq 0,

id(Xt=)πG2(Xt=)TV|G|2exp(t|S|DiamS(G2)2).\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}}\leq\frac{|G|}{2}\exp\left(-\frac{t}{|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2}}\right).
Remark 5.

The conclusion in Lemma 3.2 holds if we replace G2G_{2} by any subgroup HH of GG.

Proof.

Let PP be the transition matrix of the simple random walk XtX_{t} on Cay(G,S)\text{Cay}(G,S). For t0t\geq 0, we can define the continuous time kernel by Pt:=n=0(tP)nn!etP_{t}:=\sum_{n=0}^{\infty}\frac{(tP)^{n}}{n!}e^{-t}.

Consider the linear subspace of functions

𝒜:={f:G|xG2gf(x)=0 for all gG}.\mathcal{A}:=\{f:G\to\mathbb{R}\big{|}\sum_{x\in G_{2}g}f(x)=0\text{ for all }g\in G\}.

We now show that 𝒜\mathcal{A} is invariant under the transition matrix PP, i.e., Pf𝒜Pf\in\mathcal{A} for all f𝒜f\in\mathcal{A}. For any gGg\in G,

\begin{align*}
\sum_{x\in G_{2}g}Pf(x) &= \sum_{h\in G_{2}}Pf(hg)=\sum_{h\in G_{2}}\sum_{y\in G}P(hg,y)f(y)=\sum_{h\in G_{2}}\sum_{z\in G}P(hg,hz)f(hz)\\
&= \sum_{z\in G}P(g,z)\left(\sum_{h\in G_{2}}f(hz)\right)=0,
\end{align*}

where the second line uses the fact that PP is translation invariant, i.e., P(hg,hz)=P(g,z)P(hg,hz)=P(g,z) for any h,g,zGh,g,z\in G and that f𝒜f\in\mathcal{A}.

Let P~\tilde{P} denote the transition matrix of the SRW on Cay(G,S~)\text{Cay}(G,\tilde{S}) with S~:=G2\tilde{S}:=G_{2}. We can also check that P~f=0\tilde{P}f=0 for all f𝒜f\in\mathcal{A}, i.e.,

P~f(x)=yGP~(x,y)f(y)=yG𝟏{x1yG2}f(y)|G2|=0 for all xG.\tilde{P}f(x)=\sum_{y\in G}\tilde{P}(x,y)f(y)=\sum_{y\in G}\frac{\mathbf{1}\{x^{-1}y\in G_{2}\}f(y)}{|G_{2}|}=0\quad\text{ for all }x\in G.

Hence, for f𝒜f\in\mathcal{A} we have that (below XX and YY are independent)

\begin{align*}
\tilde{\mathcal{E}}(f,f) &:= \frac{1}{2}\mathbb{E}_{X\sim\pi,Y\sim\mathrm{Unif}(\tilde{S})}[(f(X)-f(XY))^{2}]=\|f\|_{2}^{2}-\mathbb{E}_{\pi}[f(\tilde{P}f)]=\|f\|_{2}^{2},\\
\mathcal{E}(f,f) &:= \frac{1}{2}\mathbb{E}_{X\sim\pi,Y\sim\mathrm{Unif}(S)}[(f(X)-f(XY))^{2}]=\|f\|_{2}^{2}-\mathbb{E}_{\pi}[f(Pf)]\geq\|f\|_{2}^{2}-\|f\|_{2}\|Pf\|_{2},
\end{align*}

where f2:=(𝔼π[f2])1/2=(gG1|G|f2(g))1/2\|f\|_{2}:=\left(\mathbb{E}_{\pi}[f^{2}]\right)^{1/2}=\left(\sum_{g\in G}\frac{1}{|G|}f^{2}(g)\right)^{1/2} is the 2\ell_{2} norm of ff. Applying Theorem 4.4 in [2], which gives a comparison of Dirichlet forms for two sets of generators, we see that for the two symmetric random walks on the finite group GG with transition matrices PP and P~\tilde{P} defined as before,

~(f,f)|S|DiamS(G2)2(f,f)\tilde{\mathcal{E}}(f,f)\leq|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2}\cdot\mathcal{E}(f,f)

for all functions f:Gf:G\to\mathbb{R}. That is,

1Pf2f2=(f,f)~(f,f)1|S|DiamS(G2)2,1-\frac{\|Pf\|_{2}}{\|f\|_{2}}=\frac{\mathcal{E}(f,f)}{\tilde{\mathcal{E}}(f,f)}\geq\frac{1}{|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2}},

and it follows by induction that

Pnf2f2(11|S|DiamS(G2)2)n.\|P^{n}f\|_{2}\leq\|f\|_{2}\left(1-\frac{1}{|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2}}\right)^{n}.

Hence, for any t0t\geq 0, recalling that Pt:=n=0(tP)nn!etP_{t}:=\sum_{n=0}^{\infty}\frac{(tP)^{n}}{n!}e^{-t}, we have

Ptf2f2exp(t|S|DiamS(G2)2).\|P_{t}f\|_{2}\leq\|f\|_{2}\exp\left(-\frac{t}{|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2}}\right). (24)
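The passage from the discrete-time contraction to (24) can be spelled out as follows. Here $\gamma:=1/(|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2})$ is a shorthand introduced only for this computation, and the first step is the triangle inequality applied to the series defining $P_{t}$.

```latex
\begin{align*}
\|P_{t}f\|_{2}
  &\leq \sum_{n=0}^{\infty} e^{-t}\,\frac{t^{n}}{n!}\,\|P^{n}f\|_{2}
   \leq \|f\|_{2}\, e^{-t}\sum_{n=0}^{\infty}\frac{\big(t(1-\gamma)\big)^{n}}{n!} \\
  &= \|f\|_{2}\, e^{-t}\, e^{t(1-\gamma)}
   = \|f\|_{2}\, e^{-\gamma t}.
\end{align*}
```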

Define the function φ:G\varphi:G\to\mathbb{R} by φ(g)=𝟏{g=id}𝟏{gG2}/|G2|\varphi(g)=\mathbf{1}\{g=\mathrm{id}\}-\mathbf{1}\{g\in G_{2}\}/|G_{2}| so that

Ptφ(g)=𝔼g[φ(Xt)]=xGPt(g,x)φ(x)=xGPt(x,g)φ(x).P_{t}\varphi(g)=\mathbb{E}_{g}[\varphi(X_{t})]=\sum_{x\in G}P_{t}(g,x)\varphi(x)=\sum_{x\in G}P_{t}(x,g)\varphi(x).

By our choice of $\varphi$ it is easy to see that $P_{t}\varphi(g)=\mathbb{P}_{\mathrm{id}}(X_{t}=g)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=g)$ and $\varphi\in\mathcal{A}$. By the definition of $\varphi$ one can check that $\|\varphi\|_{2}\leq\|\varphi\|_{\infty}\leq 1$. Note that it follows from the Cauchy-Schwarz inequality that

4id(Xt=)πG2(Xt=)TV2=(gG|Ptφ(g)|)2|G|2Ptφ22.4\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|^{2}_{\mathrm{TV}}=\left(\sum_{g\in G}|P_{t}\varphi(g)|\right)^{2}\leq|G|^{2}\cdot\|P_{t}\varphi\|^{2}_{2}.

Therefore, applying φ\varphi in (24) gives

\begin{align*}
\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}} &\leq \frac{|G|}{2}\cdot\|P_{t}\varphi\|_{2}\\
&\leq \frac{|G|}{2}\exp\left(-\frac{t}{|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2}}\right).
\end{align*}
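The bound of Lemma 3.2 can be checked numerically on a toy example. The sketch below again uses the Heisenberg group modulo $p=3$ with generators $x^{\pm 1},y^{\pm 1}$ (our choice, not the paper's). It also assumes that $\mathrm{Diam}_{S}(G_{2})$ means the largest word length, with respect to $S$, of an element of $G_{2}$; that reading is an assumption made for this illustration, and the continuous-time law is approximated by a truncated Poisson mixture.

```python
import math
from collections import deque
from itertools import product

P_MOD = 3  # hypothetical prime p


def mul(g, h):
    """Multiply two elements (a, b, c) of the Heisenberg group mod p."""
    return ((g[0] + h[0]) % P_MOD,
            (g[1] + h[1]) % P_MOD,
            (g[2] + h[2] + g[0] * h[1]) % P_MOD)


G = list(product(range(P_MOD), repeat=3))
IDENTITY = (0, 0, 0)
S = [(1, 0, 0), (P_MOD - 1, 0, 0), (0, 1, 0), (0, P_MOD - 1, 0)]
G2 = [(0, 0, c) for c in range(P_MOD)]


def word_lengths():
    """Breadth-first search from the identity in Cay(G, S)."""
    dist = {IDENTITY: 0}
    queue = deque([IDENTITY])
    while queue:
        g = queue.popleft()
        for s in S:
            h = mul(g, s)
            if h not in dist:
                dist[h] = dist[g] + 1
                queue.append(h)
    return dist


def law_at_time(initial, t, n_terms):
    """Law of the rate-1 walk at time t, as a truncated Poisson mixture."""
    step_law = dict(initial)
    result = {g: 0.0 for g in G}
    weight = math.exp(-t)
    for n in range(n_terms):
        for g, mass in step_law.items():
            result[g] += weight * mass
        nxt = {g: 0.0 for g in G}
        for g, mass in step_law.items():
            for s in S:
                nxt[mul(g, s)] += mass / len(S)
        step_law = nxt
        weight *= t / (n + 1)
    return result


dist = word_lengths()
diam_G2 = max(dist[g] for g in G2)   # assumed meaning of Diam_S(G_2)
t_scale = len(S) * diam_G2 ** 2      # |S| * Diam_S(G_2)^2


def tv_id_vs_G2(t):
    """TV distance between the walk started at id and at uniform on G_2."""
    n_terms = int(t + 12 * math.sqrt(t) + 50)  # covers the Poisson bulk
    from_id = law_at_time({IDENTITY: 1.0}, t, n_terms)
    from_G2 = law_at_time({g: 1.0 / len(G2) for g in G2}, t, n_terms)
    return 0.5 * sum(abs(from_id[g] - from_G2[g]) for g in G)


def lemma_bound(t):
    return len(G) / 2 * math.exp(-t / t_scale)


for t in (10.0, 100.0, 200.0):
    print(t, tv_id_vs_G2(t), lemma_bound(t))
```

For such a small group the bound is far from sharp: it only drops below 1 once $t$ exceeds $|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2}\log(|G|/2)$, by which time the actual distance is already negligible.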

Proof of Theorem 2. The first inequality tmixGab(ε)tmixG(ε)t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon)\leq t_{\mathrm{mix}}^{G}(\varepsilon) follows directly from the projection of GG onto GabG_{\mathrm{ab}}. To prove the second part of the inequality, observe that by the triangle inequality

\[
\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}\leq\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}}+\|\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}.
\]

Lemma 3.1 shows that πG2(Xt=)πGTVεδ\|\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}\leq\varepsilon-\delta for t:=tmixGab(εδ)t:=t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon-\delta).

It remains to prove id(Xt=)πG2(Xt=)TVδ\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}}\leq\delta. Recall from Corollary 2 that DiamS(Gab)|S|4L\mathrm{Diam}_{S}(G_{\mathrm{ab}})\gg|S|^{4L} when |S|log|G|8LrLloglog|G||S|\leq\frac{\log|G|}{8Lr^{L}\log\log|G|}. Note that as a direct consequence of Proposition 13.7 in [25], which is a simple application of the Carne-Varopoulos inequality, one has that for all ε1|Gab|1/4\varepsilon\leq 1-|G_{\mathrm{ab}}|^{-1/4},

tmixGab(ε)DiamS(Gab)2log|Gab|.t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon)\gtrsim\frac{\mathrm{Diam}_{S}(G_{\mathrm{ab}})^{2}}{\log|G_{\mathrm{ab}}|}. (25)

It then follows from (25) and Corollary 2 that

tmixGab(εδ)\displaystyle t^{G_{\mathrm{ab}}}_{\mathrm{mix}}(\varepsilon-\delta) DiamS(Gab)2log|Gab|DiamS(Gab)1/4log|G|DiamS(Gab)7/4(log|G|)L|S|DiamS(G2)2,\displaystyle\gtrsim\frac{\mathrm{Diam}_{S}(G_{\mathrm{ab}})^{2}}{\log|G_{\mathrm{ab}}|}\geq\frac{\mathrm{Diam}_{S}(G_{\mathrm{ab}})^{1/4}}{\log|G|}\cdot\mathrm{Diam}_{S}(G_{\mathrm{ab}})^{7/4}\gg(\log|G|)^{L}\cdot|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2},

where we recall that L2L\geq 2. We can then apply Lemma 3.2 to t:=tmixGab(εδ)t:=t_{\mathrm{mix}}^{G_{\mathrm{ab}}}(\varepsilon-\delta) and get

id(Xt=)πG2(Xt=)TV|G|exp((log|G|)L)δ\|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}}\ll|G|\exp\left(-(\log|G|)^{L}\right)\leq\delta

when |G||G| is large enough. The proof is then complete.

Proof of Corollary 1. It is straightforward to extend the conclusion in Corollary 12.7 of [32] to continuous time Markov chains to get

limtlog(d(t))t=1trel,\lim_{t\to\infty}\frac{\log(d(t))}{t}=-\frac{1}{t_{\mathrm{rel}}},

where d(t)d(t) denotes the total variation distance to stationarity and trelt_{\mathrm{rel}} denotes the corresponding relaxation time. Recall from (21) and (22) that

\begin{align*}
\|\mathbb{P}_{G_{2}}(Y_{t}=\cdot)-\pi_{G_{\mathrm{ab}}}\|_{\mathrm{TV}} &\leq \|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}\\
&\leq \|\mathbb{P}_{\mathrm{id}}(X_{t}=\cdot)-\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)\|_{\mathrm{TV}}+\|\mathbb{P}_{\pi_{G_{2}}}(X_{t}=\cdot)-\pi_{G}\|_{\mathrm{TV}}.
\end{align*}

Define $f(\cdot)=\lim_{t\to\infty}\frac{\log(\cdot)}{t}$. We can apply the function $f$ to both sides of the above inequality and use Lemmas 3.1 and 3.2 to obtain

trelGabtrelGmax{trelGab,|S|DiamS(G2)2}.t^{G^{\mathrm{ab}}}_{\mathrm{rel}}\leq t^{G}_{\mathrm{rel}}\leq\max\{t^{G^{\mathrm{ab}}}_{\mathrm{rel}},|S|\cdot\mathrm{Diam}_{S}(G_{2})^{2}\}.

4. On Cayley Graphs with Random i.i.d. Generators

We consider the random walk X(t)X(t) on the Cayley graph of a finite nilpotent group GG with respect to kk generators chosen uniformly at random. The graph is denoted by Cay(G,S)\text{Cay}(G,S), where the generator set S:={Zi±1:i[k]}S:=\{Z_{i}^{\pm 1}:i\in[k]\} with Z1,,ZkiidUnif(G)Z_{1},\dots,Z_{k}\overset{iid}{\sim}\mathrm{Unif}(G). The regime of interest in this paper is 1logklog|G|1\ll\log k\ll\log|G|. In this section, our aim is to prove Theorem 1, which we have restated below for ease of reference.

Theorem 1.

Let GG be a finite nilpotent group with r(G),L(G)1r(G),L(G)\asymp 1. Let S={Zi±1:i[k]}S=\{Z_{i}^{\pm 1}:i\in[k]\} with Z1,,ZkiidUnif(G)Z_{1},\dots,Z_{k}\overset{iid}{\sim}\mathrm{Unif}(G). Assume 1logklog|G|1\ll\log k\ll\log|G|. As |G||G|\to\infty, the random walk on Cay(G,S)\text{Cay}(G,S) exhibits cutoff with high probability at time t(k,G)t_{*}(k,G), which is the cutoff time defined in Definition 3.

The structure of this section is as follows: in Section 4.1 we will give an overview of the entropic method within the context of Theorem 1 and how it is useful as a framework for proving bounds on the mixing time; in Section 4.2 we prove the lower bound on the mixing time.

Establishing the corresponding upper bound on the mixing time requires significant effort, involving a careful analysis of the distribution of the RW on each quotient group $Q_{\ell}=G_{\ell}/G_{\ell+1}$ for $\ell\in[L]$. To this end we give a representation of the random walk $X(t)$ on $\text{Cay}(G,S)$ in Section 4.3. Subsequently, we provide an outline of the proof of the upper bound on the mixing time in Section 4.4 and complete the proof in the remainder of the paper.

4.1. Entropic Method and Entropic Times

Let $S:=\{Z_{i}^{\pm 1}\in G:i\in[k]\}$ denote the symmetric set of generators of $G$. We define the auxiliary process $W:=W(t)=(W_{1}(t),\dots,W_{k}(t))$ based on $X(t)$, where $W_{i}(t)$ is the number of times the generator $Z_{i}$ has been applied minus the number of times $Z_{i}^{-1}$ has been applied in the random walk $X(t)$. It is easy to see that $W(t)$ is a rate 1 random walk on $\mathbb{Z}^{k}$. To simplify notation, we sometimes drop the time index $t$ when it is clear from the context.
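To make the pair $(X,W)$ concrete, here is a minimal simulation sketch. The setting is our own choice for illustration: $G$ is the Heisenberg group modulo a hypothetical prime $p=101$, the generators are treated as a multiset (so a step picks an index $i$ uniformly and then a sign), and the rate-1 clock is simulated with exponential waiting times. The final lines check that the image of $X(t)$ in the abelianization is determined by $W(t)$ alone.

```python
import random

P_MOD = 101  # hypothetical prime; the group has P_MOD**3 elements


def mul(g, h):
    """Multiply two elements (a, b, c) of the Heisenberg group mod p."""
    return ((g[0] + h[0]) % P_MOD,
            (g[1] + h[1]) % P_MOD,
            (g[2] + h[2] + g[0] * h[1]) % P_MOD)


def inv(g):
    """Inverse of (a, b, c): (-a, -b, ab - c) mod p."""
    a, b, c = g
    return ((-a) % P_MOD, (-b) % P_MOD, (a * b - c) % P_MOD)


def simulate(k, t, rng):
    """Run the walk X and its auxiliary process W up to time t."""
    # Z_1, ..., Z_k are i.i.d. uniform elements of the group.
    Z = [tuple(rng.randrange(P_MOD) for _ in range(3)) for _ in range(k)]
    X, W = (0, 0, 0), [0] * k
    clock = rng.expovariate(1.0)  # first ring of a rate-1 Poisson clock
    while clock <= t:
        i = rng.randrange(k)
        sign = rng.choice((1, -1))
        X = mul(X, Z[i] if sign == 1 else inv(Z[i]))
        W[i] += sign  # +1 for Z_i, -1 for Z_i^{-1}
        clock += rng.expovariate(1.0)
    return Z, X, W


Z, X, W = simulate(k=5, t=50.0, rng=random.Random(0))
# Image in the abelianization: the first two coordinates simply add up.
projected = tuple(sum(w * z[c] for w, z in zip(W, Z)) % P_MOD for c in (0, 1))
print(W, X, projected)
```

The third coordinate of `X` is not a function of `W`; it depends on the order in which the generators were applied.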

Let (X,W)(X,W) and (X,W)(X^{\prime},W^{\prime}) be two independent copies of the random walk on Cay(G,S)\text{Cay}(G,S) starting at id\mathrm{id} and its auxiliary process. Denote by 𝒲:=𝒲(t)k\mathcal{W}:=\mathcal{W}(t)\subseteq\mathbb{Z}^{k} a subset of the state space of the walk WW, where the index suggests that the precise choice of 𝒲\mathcal{W} (which is postponed to Definition 5) depends on the time tt. For now we will think of 𝒲\mathcal{W} as a set of “typical” locations of WW such that (W(t)𝒲)=o(1)\mathbb{P}(W(t)\notin\mathcal{W})=o(1) for the relevant choice of tt.

We use a “modified L2L^{2} calculation”: first conditioning on WW being “typical”; then using a standard L2L^{2} calculation on the conditioned law. The proof of the following lemma is quite straightforward and hence is omitted.

Lemma 4.1 (Lemma 2.6 of [17]).

For all t0t\geq 0 and all 𝒲k\mathcal{W}\subseteq\mathbb{Z}^{k} the following inequalities hold:

\begin{align*}
d_{S}(t):=\|\mathbb{P}_{S}(X(t)\in\cdot)-\pi_{G}\|_{\mathrm{TV}} &\leq \|\mathbb{P}_{S}(X(t)\in\cdot|W(t)\in\mathcal{W})-\pi_{G}\|_{\mathrm{TV}}+\mathbb{P}(W(t)\notin\mathcal{W}),\\
4\|\mathbb{P}_{S}(X(t)\in\cdot|W(t)\in\mathcal{W})-\pi_{G}\|^{2}_{\mathrm{TV}} &\leq |G|\cdot\mathbb{P}_{S}(X(t)=X^{\prime}(t)|W(t),W^{\prime}(t)\in\mathcal{W})-1,
\end{align*}

where S\mathbb{P}_{S} denotes the law of the random walk given the generator set SS starting at X(0)=idX(0)=\mathrm{id}.

Note that when SS is a random set of generators, dS(t)d_{S}(t) is a random variable that is measurable with respect to σ(S)\sigma(S), the σ\sigma-field generated by the choice of SS. In what comes later in our arguments, we will take the expectation over the choices of SS and work with

𝔼[S(X(t)=X(t)|W(t),W(t)𝒲)]=:(X(t)=X(t)|W(t),W(t)𝒲).\mathbb{E}[\mathbb{P}_{S}(X(t)=X^{\prime}(t)|W(t),W^{\prime}(t)\in\mathcal{W})]=:\mathbb{P}(X(t)=X^{\prime}(t)|W(t),W^{\prime}(t)\in\mathcal{W}).

A good choice of the set of “typical” locations $\mathcal{W}$ will greatly simplify the analysis. Hence, in the remainder of this section, we discuss in detail the choice of $\mathcal{W}$ or, in other words, the “typical event” that we will condition on. As our goal is to obtain an upper bound on the total variation mixing time, we will look at a time $t$ that is slightly larger than the proposed mixing time, which is expressed in terms of the entropic times discussed in Section 4.1.1. Based on this choice of time we then define the typical event in Section 4.1.2.

4.1.1. Asymptotics of Entropic Times

Recall from Definition 3 that the entropic time t0:=t0(k,|Gab|)t_{0}:=t_{0}(k,|G_{\mathrm{ab}}|) is the time at which the entropy of the rate 1 random walk WW on k\mathbb{Z}^{k} is log|Gab|\log|G_{\mathrm{ab}}|.

For simplicity of notation, we will write $t_{1}:=\log_{k}|G|$ and denote the cutoff time by $t_{*}:=t_{*}(k,G)=\max\{t_{0},t_{1}\}$.

The entropy of the random walk on k\mathbb{Z}^{k} has been well understood. The interested reader can find a detailed exposition of the entropic times in [18]. Via direct calculation with the simple random walk and Poisson laws, one can obtain asymptotics of entropic times, see, e.g., [18, Proposition A.2 and §A.5] for full details. Here we content ourselves with restating the result of such calculation, which can be found in Proposition 2.2 in [19], that yields the asymptotic values of t0t_{0}:

\begin{equation}
\begin{array}{ll}
t_{0}\eqsim k\cdot|G_{\mathrm{ab}}|^{2/k}/(2\pi e)&\quad\text{when }k\ll\log|G_{\mathrm{ab}}|,\\
t_{0}\eqsim k\cdot f(\lambda)&\quad\text{when }k\eqsim\lambda\log|G_{\mathrm{ab}}|,\\
t_{0}\eqsim k\cdot 1/(\kappa\log\kappa)&\quad\text{when }k\gg\log|G_{\mathrm{ab}}|,
\end{array}
\tag{26}
\end{equation}

where κ:=k/log|Gab|\kappa:=k/\log|G_{\mathrm{ab}}| and f:(0,)(0,)f:(0,\infty)\to(0,\infty) is some continuous decreasing function whose exact value is unimportant for our analysis. Since we assume r1r\asymp 1 and L1L\asymp 1, it follows from Corollary 4 that log|Gab|log|G|\log|G_{\mathrm{ab}}|\asymp\log|G|. Consequently, it is possible to have t1>t0t_{1}>t_{0} only in the regime klog|Gab|k\gg\log|G_{\mathrm{ab}}|. To be more specific, writing ρ=logkloglog|Gab|\rho=\frac{\log k}{\log\log|G_{\mathrm{ab}}|}, in the regime klog|Gab|k\gg\log|G_{\mathrm{ab}}| we have lim inf|G|ρ1\liminf_{|G|\to\infty}\rho\geq 1 and

\[
\begin{cases}
t_{0}>t_{1}&\quad\text{ when }\frac{\rho}{\rho-1}>\frac{\log|G|}{\log|G_{\mathrm{ab}}|},\\
t_{0}\leq t_{1}&\quad\text{ otherwise.}
\end{cases}
\]

The following proposition gives the asymptotics of the cutoff time tt_{*} for the regimes of kk that we are interested in.

Proposition 5.

Writing ρ=logkloglog|Gab|\rho=\frac{\log k}{\log\log|G_{\mathrm{ab}}|}, we have the following asymptotics of t(k,G)t_{*}(k,G):

\[
t_{*}(k,G)\eqsim\begin{cases}
k|G_{\mathrm{ab}}|^{2/k}/(2\pi e)&\quad\text{when}\quad k\ll\log|G_{\mathrm{ab}}|,\\
kf(\lambda)&\quad\text{when}\quad k\eqsim\lambda\log|G_{\mathrm{ab}}|,\\
k/(\kappa\log\kappa)&\quad\text{when}\quad k\gg\log|G_{\mathrm{ab}}|,\ \frac{\rho}{\rho-1}>\frac{\log|G|}{\log|G_{\mathrm{ab}}|},\\
\log_{k}|G|&\quad\text{when}\quad k\gg\log|G_{\mathrm{ab}}|,\ \frac{\rho}{\rho-1}\leq\frac{\log|G|}{\log|G_{\mathrm{ab}}|},
\end{cases}
\]

where ff is defined as in (26) and κ:=k/log|Gab|\kappa:=k/\log|G_{\mathrm{ab}}|.
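The first regime in (26) can be explored numerically. The sketch below computes the entropy of the rate 1 walk on $\mathbb{Z}^{k}$ at the time predicted by the formula $k|G_{\mathrm{ab}}|^{2/k}/(2\pi e)$ and compares it with $\log|G_{\mathrm{ab}}|$. The values $k=3$ and $\log|G_{\mathrm{ab}}|=12$ are hypothetical, the helper names are ours, and each coordinate law is approximated by a truncated Poisson mixture of discrete-time simple random walk laws.

```python
import math


def coordinate_law(s, n_terms):
    """Law of a rate-1 continuous-time simple random walk on Z at time s.

    Returns a list of length 2 * n_terms + 1; entry x + n_terms holds
    (an approximation of) P(walk = x).
    """
    size = 2 * n_terms + 1
    centre = n_terms
    step = [0.0] * size   # law after n discrete steps
    step[centre] = 1.0
    law = [0.0] * size
    weight = math.exp(-s)  # P(Poisson(s) = 0)
    for n in range(n_terms):
        lo, hi = centre - n, centre + n
        for x in range(lo, hi + 1):
            law[x] += weight * step[x]
        nxt = [0.0] * size
        for x in range(lo, hi + 1):
            half = 0.5 * step[x]
            nxt[x - 1] += half
            nxt[x + 1] += half
        step = nxt
        weight *= s / (n + 1)  # P(Poisson(s) = n + 1)
    return law


def entropy(law):
    """Shannon entropy (natural logarithm) of a probability vector."""
    return -sum(p * math.log(p) for p in law if p > 0.0)


k = 3             # hypothetical number of generators
log_size = 12.0   # hypothetical value standing in for log|G_ab|
t_asymptotic = k * math.exp(2 * log_size / k) / (2 * math.pi * math.e)

# W(t) has k independent coordinates, each a rate-1/k walk on Z, so its
# entropy is k times the entropy of a rate-1 walk at time t / k.
H = k * entropy(coordinate_law(t_asymptotic / k, n_terms=600))
print(t_asymptotic, H, log_size)
```

With these values the computed entropy `H` is close to `log_size`, consistent with $t_{0}$ being the time at which the entropy of $W$ reaches $\log|G_{\mathrm{ab}}|$.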

4.1.2. Typical Event

For simplicity of notation, in this section we will drop the index of time tt and write W:=W(t)W:=W(t) and X:=X(t)X:=X(t). Write W=(W1,W2,,Wk)W=(W_{1},W_{2},\dots,W_{k}), where for each a[k]a\in[k], WaW_{a} is an independent rate 1/k1/k random walk on \mathbb{Z}. For each a[k]a\in[k], define Wa+W_{a}^{+} to be the number of steps to the right and WaW_{a}^{-} the number of steps to the left in the walk WaW_{a}. It is then easy to see Wa=Wa+WaW_{a}=W_{a}^{+}-W_{a}^{-}.

To upper bound the key quantity (X=X|W,W𝒲)\mathbb{P}(X=X^{\prime}|W,W^{\prime}\in\mathcal{W}) from Lemma 4.1, we will separate it into cases according to whether or not W=WW=W^{\prime}:

\begin{align*}
\mathbb{P}(X=X^{\prime}|W,W^{\prime}\in\mathcal{W}) &= \mathbb{P}(X=X^{\prime}|\{W=W^{\prime}\}\cap\{W,W^{\prime}\in\mathcal{W}\})\,\mathbb{P}(W=W^{\prime}|W,W^{\prime}\in\mathcal{W})\\
&\quad+\mathbb{P}(X=X^{\prime}|\{W\neq W^{\prime}\}\cap\{W,W^{\prime}\in\mathcal{W}\})\,\mathbb{P}(W\neq W^{\prime}|W,W^{\prime}\in\mathcal{W}).
\end{align*}

As suggested by this decomposition, bounding (W=W|W,W𝒲)\mathbb{P}(W=W^{\prime}|W,W^{\prime}\in\mathcal{W}) plays an important role in the proof. Since W,WW,W^{\prime} are two independent copies,

({W=W}{W,W𝒲})\displaystyle\mathbb{P}(\{W=W^{\prime}\}\cap\{W,W^{\prime}\in\mathcal{W}\}) =w𝒲(W=w)(W=w)maxw𝒲(W=w),\displaystyle=\sum_{w\in\mathcal{W}}\mathbb{P}(W=w)\mathbb{P}(W^{\prime}=w)\leq\max_{w\in\mathcal{W}}\mathbb{P}(W=w),

i.e.,

(W=W|W,W𝒲)maxw𝒲(W=w)(W,W𝒲).\mathbb{P}(W=W^{\prime}|W,W^{\prime}\in\mathcal{W})\leq\frac{\max_{w\in\mathcal{W}}\mathbb{P}(W=w)}{\mathbb{P}(W,W^{\prime}\in\mathcal{W})}.

It now becomes clearer that in order to control the probability (W=W|W,W𝒲)\mathbb{P}(W=W^{\prime}|W,W^{\prime}\in\mathcal{W}), we would like to choose 𝒲\mathcal{W} so that (W,W𝒲)=1o(1)\mathbb{P}(W,W^{\prime}\in\mathcal{W})=1-o(1) and maxw𝒲(W=w)\max_{w\in\mathcal{W}}\mathbb{P}(W=w) is sufficiently small.

For $t\geq0$, write $\mu_t$ for the law of $W(t)$, the rate 1 random walk on $\mathbb{Z}^k$, so that $\mu_t(w)=\mathbb{P}(W(t)=w)$, and write $\nu_s$ for the law of $W_1(sk)$, so that $\mu_t=\nu_{t/k}^{\otimes k}$. For each $a\in[k]$, define

\[Q_a(t):=-\log\nu_{t/k}(W_a(t))\quad\text{and}\quad Q(t):=-\log\mu_t(W(t))=\sum_{a=1}^{k}Q_a(t).\]

Then $\mathbb{E}(Q(t))$ (respectively $\mathbb{E}(Q_1(t))$) is the entropy of $W(t)$ (respectively $W_1(t)$).
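As a numerical illustration of these definitions, the sketch below computes $\nu_s$ (up to truncation) and its entropy $\mathbb{E}(Q_1)$. It assumes that each coordinate is a simple random walk, so that the numbers of right and left steps by time $sk$ are independent Poisson($s/2$) variables. The function names and the truncation level `n_max` are illustrative choices, not part of the paper.

```python
import math


def poisson_pmf(lam, n_max):
    """Poisson(lam) probabilities for 0..n_max, computed by the usual recursion."""
    out = [math.exp(-lam)]
    for n in range(1, n_max + 1):
        out.append(out[-1] * lam / n)
    return out


def nu(s, n_max=60):
    """Truncated law of W_1(sk) = N_plus - N_minus with N_plus, N_minus ~ Poisson(s/2)."""
    pois = poisson_pmf(s / 2.0, n_max)
    law = {}
    for plus, p in enumerate(pois):
        for minus, q in enumerate(pois):
            law[plus - minus] = law.get(plus - minus, 0.0) + p * q
    return law


def entropy(law):
    """Shannon entropy (natural logarithm) of a law given as a dict of probabilities."""
    return -sum(p * math.log(p) for p in law.values() if p > 0)


def total_entropy(t, k):
    """E(Q(t)) = k * E(Q_1(t)), using the product structure mu_t = nu_{t/k}^{(x) k}."""
    return k * entropy(nu(t / k))
```

For instance, `total_entropy(t, k)` can be compared with a target such as $\log|G_{\mathrm{ab}}|$ for hypothetical values of `t` and `k`; the truncation is only accurate when `t / k` is small compared with `n_max`.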

We need an estimate on the entropy 𝔼(Q(t))\mathbb{E}(Q(t)) shortly after the proposed mixing time, i.e., for t(1+ε)t(k,G)t\geq(1+\varepsilon)t_{*}(k,G), for which we refer to results in [19]. To state their results, we need to define the following quantity.

Definition 4.

Define h0h_{0} as follows

h0:={log|Gab| when t0(k,|Gab|)>logk|G|,(11ρ)log|G| when t0(k,|Gab|)logk|G|h_{0}:=\begin{cases}\log|G_{\mathrm{ab}}|&\text{ when }t_{0}(k,|G_{\mathrm{ab}}|)>\log_{k}|G|,\\ (1-\frac{1}{\rho})\log|G|&\text{ when }t_{0}(k,|G_{\mathrm{ab}}|)\leq\log_{k}|G|\end{cases}

where ρ=logkloglog|Gab|\rho=\frac{\log k}{\log\log|G_{\mathrm{ab}}|}. Fix some ω\omega such that 1ωmin{k,log|Gab|}1\ll\omega\ll\min\{k,\log|G_{\mathrm{ab}}|\}, and set h:=h0+ωh:=h_{0}+\omega.

Remark 6.

Note that log|Gab|h0\log|G_{\mathrm{ab}}|\leq h_{0} in both cases.

Lemma 3.9 in [19] proves concentration of Q(t)Q(t) whereas we only need one side of their estimate, which we state below.

Lemma 4.2 (Lemma 3.9 in [19]).

Assume that ωmin{k,log|Gab|}\omega\ll\min\{k,\log|G_{\mathrm{ab}}|\}. Let ε>0\varepsilon>0 and t(1+ε)t(k,G)t\geq(1+\varepsilon)t_{*}(k,G). Then

(Q(t)h)=(μt(W(t))eh)=1o(1).\mathbb{P}(Q(t)\geq h)=\mathbb{P}(\mu_{t}(W(t))\leq e^{-h})=1-o(1).

Based on the above discussion and Lemma 4.2, it makes sense to define the (global) typical event as follows

𝒲glo:={wk:(W(t)=w)eh}.\mathcal{W}_{glo}:=\{w\in\mathbb{Z}^{k}:\mathbb{P}(W(t)=w)\leq e^{-h}\}. (27)

Write $W^{\pm}:=(W^{\pm}_1,W^{\pm}_2,\dots,W^{\pm}_k)$. Let $w\in\mathbb{Z}^k$ denote a realization of $W$ and $(w^+,w^-)\in\mathbb{Z}^k_+\times\mathbb{Z}^k_+$ the corresponding realization of $(W^+,W^-)$. To have better control over the behavior of each coordinate, we further define

\[\mathcal{W}_{loc}:=\{(w^+,w^-)\in\mathbb{Z}^k_+\times\mathbb{Z}^k_+:|w^{\pm}_a-\mathbb{E}(W^{\pm}_a(t))|\leq r_*,\ \forall a\in[k]\},\tag{28}\]

where $r_*:=\frac{1}{2}|G_{\mathrm{ab}}|^{1/k}(\log k)^2$. The value of $r_*$ is chosen based on $t_0(k,|G_{\mathrm{ab}}|)$ when $k\lesssim\log|G_{\mathrm{ab}}|$, so that $W^{\pm}(t)\in\mathcal{W}_{loc}$ whp by a union bound over the $k$ coordinates. In fact, we will use $\mathcal{W}_{loc}$ only in the regime $k\lesssim\log|G_{\mathrm{ab}}|$.

In the regime $k\gtrsim\log|G_{\mathrm{ab}}|$ we have $t_*/k\lesssim1$. By Poisson thinning, for each $a\in[k]$ the arrivals of the generators $Z_a^{\pm1}$ follow an independent Poisson process with rate $1/k$. Then $t_*/k\lesssim1$ implies that each generator is expected to appear $O(1)$ times in the walk $X$. Thus in this regime we will focus on the collection of generators that appear exactly once. Define, for $(w^+,w^-)\in\mathbb{Z}_+^k\times\mathbb{Z}_+^k$,

\[\mathcal{J}(w^+,w^-):=\{a\in[k]:w^+_a+w^-_a=1\}\quad\text{and}\quad J(w^+,w^-):=|\mathcal{J}(w^+,w^-)|,\]

so that $\mathcal{J}(w^+,w^-)$ is the index set of generators that appear exactly once in the realization $\{(W^+,W^-)=(w^+,w^-)\}$.

Moreover, for a sufficiently small ε>0\varepsilon>0, define

𝒲once:={(w+,w)+k×+k:|J(w+,w)tet/k|12εtet/k}.\mathcal{W}_{once}:=\{(w^{+},w^{-})\in\mathbb{Z}^{k}_{+}\times\mathbb{Z}^{k}_{+}:|J(w^{+},w^{-})-te^{-t/k}|\leq\frac{1}{2}\varepsilon te^{-t/k}\}. (29)

We can observe that the distribution of JJ is Binomial(k,(t/k)et/k)\mathrm{Binomial}(k,(t/k)e^{-t/k}) so that when tet/k1te^{-t/k}\gg 1 (which holds when klog|G|k\gtrsim\log|G| and logklog|G|\log k\ll\log|G|), 𝒲once\mathcal{W}_{once} occurs with high probability for any ε>0\varepsilon>0.
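To make the last observation concrete, the sketch below computes the mean of $J\sim\mathrm{Binomial}(k,(t/k)e^{-t/k})$ and a Chebyshev bound on the probability that $J$ deviates from its mean by more than $\frac{1}{2}\varepsilon te^{-t/k}$. Chebyshev's inequality is used here only for illustration; it is not claimed to be the estimate used in the paper, and the function names are hypothetical.

```python
import math


def once_probability(t, k):
    """P(a fixed generator is used exactly once by time t) when its count is Poisson(t/k)."""
    lam = t / k
    return lam * math.exp(-lam)


def j_mean_and_chebyshev(t, k, eps):
    """Mean of J ~ Binomial(k, p) and a Chebyshev bound on P(|J - mean| > eps * mean / 2)."""
    p = once_probability(t, k)
    mean = k * p  # equals t * exp(-t / k)
    var = k * p * (1.0 - p)
    bound = var / (0.5 * eps * mean) ** 2
    return mean, min(1.0, bound)


mean_j, dev_bound = j_mean_and_chebyshev(t=2000.0, k=1000, eps=0.5)
```

Since the variance is at most the mean, the bound is at most $4/(\varepsilon^2te^{-t/k})$, which tends to zero exactly when $te^{-t/k}\gg1$.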

Definition 5.

Let 𝒲glo,𝒲loc,𝒲once\mathcal{W}_{glo},\mathcal{W}_{loc},\mathcal{W}_{once} be defined as in (27),(28) and (29). Define the typical event

𝐭𝐲𝐩:={{W,W𝒲glo}{(W+,W),((W)+,(W))𝒲loc} when klog|Gab|,{W,W𝒲glo}{(W+,W),((W)+,(W))𝒲loc𝒲once} when kλlog|Gab|,{W,W𝒲glo}{(W+,W),((W)+,(W))𝒲once} when klog|Gab|.\mathrm{\mathbf{typ}}:=\begin{cases}\{W,W^{\prime}\in\mathcal{W}_{glo}\}\cap\{(W^{+},W^{-}),((W^{\prime})^{+},(W^{\prime})^{-})\in\mathcal{W}_{loc}\}&\text{ when }k\ll\log|G_{\mathrm{ab}}|,\\ \{W,W^{\prime}\in\mathcal{W}_{glo}\}\cap\{(W^{+},W^{-}),((W^{\prime})^{+},(W^{\prime})^{-})\in\mathcal{W}_{loc}\cap\mathcal{W}_{once}\}&\text{ when }k\eqsim\lambda\log|G_{\mathrm{ab}}|,\\ \{W,W^{\prime}\in\mathcal{W}_{glo}\}\cap\{(W^{+},W^{-}),((W^{\prime})^{+},(W^{\prime})^{-})\in\mathcal{W}_{once}\}&\text{ when }k\gg\log|G_{\mathrm{ab}}|.\\ \end{cases}
Lemma 4.3.

(𝐭𝐲𝐩)=1o(1)\mathbb{P}(\mathrm{\mathbf{typ}})=1-o(1).

Proof.

Lemma 4.2 implies that $W,W'\in\mathcal{W}_{glo}$ with high probability. The bound $\mathbb{P}((W^+,W^-)\in\mathcal{W}_{loc})=1-o(1)$ when $k\lesssim\log|G_{\mathrm{ab}}|$ and the bound $\mathbb{P}((W^+,W^-)\in\mathcal{W}_{once})=1-o(1)$ when $k\gtrsim\log|G_{\mathrm{ab}}|$ both follow from standard large deviation estimates. The same bounds hold for $((W')^+,(W')^-)$, which has the same law. ∎

4.2. Lower Bound on Mixing Time

As projection does not increase the total variation distance, to find a lower bound on the mixing time we can consider the projection of the original random walk $X(t)$, defined by $Y(t):=G_2X(t)$, on the projected Cayley graph $\mathrm{Cay}(G_{\mathrm{ab}},\{G_2Z_i^{\pm1}:i\in[k]\})$. As $G_{\mathrm{ab}}$ is abelian, the mixing behavior of $Y(t)$ is well understood, see [17, 19].

The key idea of proving the lower bound comes from a concentration result of the entropy, which has been proved in the literature and is restated here for the sake of self-containment.

Proposition 6 (Proposition 2.3 in [19]).

Assume that kk satisfies 1logklog|G|1\ll\log k\ll\log|G| and recall t0:=t0(k,|Gab|)t_{0}:=t_{0}(k,|G_{\mathrm{ab}}|) as in Definition 3. Then Var(Q(t0))1\mathrm{Var}(Q(t_{0}))\gg 1 and further, for ε>0\varepsilon>0, writing ω:=(Var(Q(t0)))1/4\omega:=(\mathrm{Var}(Q(t_{0})))^{1/4}, we have

(Q((1ε)t0)log|Gab|ω)0.\mathbb{P}(Q((1-\varepsilon)t_{0})\geq\log|G_{\mathrm{ab}}|-\omega)\to 0.

The proof of the lower bound follows from a somewhat conventional argument within the entropic methodology. Our proof is essentially a restatement of the proof in Section 3.3 of [19], which we include for the purpose of being self-contained.

Lemma 4.4.

Assume that $1\ll\log k\ll\log|G|$. Let $S=\{s_i^{\pm1}:i\in[k]\}\subseteq G$ be a given generator set. For any $\varepsilon>0$ and $t\leq(1-\varepsilon)t_*(k,G)$,

S(X(t))πGTV1o(1),\|\mathbb{P}_{S}(X(t)\in\cdot)-\pi_{G}\|_{\mathrm{TV}}\geq 1-o(1),

where S\mathbb{P}_{S} denotes the law of the random walk XX with X(0)=idX(0)=\mathrm{id} given the generator set SS.

Proof.

Since the argument does not depend on the choice of S:={si±1:i[k]}S:=\{s_{i}^{\pm 1}:i\in[k]\} we will suppress it from notation and write \mathbb{P} for S\mathbb{P}_{S}.

Recall that $t_*(k,G)=\max\{\log_k|G|,t_0(k,|G_{\mathrm{ab}}|)\}$. To argue that $(1-\varepsilon)\log_k|G|$ is a lower bound on the mixing time, first observe that in $m\in\mathbb{N}$ steps the support of the random walk $X$ has size at most $(2k)^m=k^{m(1+o(1))}$, as $|S|\leq2k$ and $\log k\gg1$. When $m\leq(1-\varepsilon)\log_k|G|$, the support has size at most $|G|^{1-\varepsilon+o(1)}$ and hence the walk cannot be mixed in this many steps.

Let t:=(1ε)t0(k,|Gab|)t:=(1-\varepsilon)t_{0}(k,|G_{\mathrm{ab}}|) and define

:={μt(W(t))|Gab|1eω}={Q(t)log|Gab|ω}\mathcal{E}:=\{\mu_{t}(W(t))\geq|G_{\mathrm{ab}}|^{-1}e^{\omega}\}=\{Q(t)\leq\log|G_{\mathrm{ab}}|-\omega\}

with ω1\omega\gg 1 from Proposition 6, by which we have ()=1o(1)\mathbb{P}(\mathcal{E})=1-o(1).

Let Π:GGab\Pi:G\to G_{\mathrm{ab}} denote the canonical projection. Consider

E:={xGab:wk s.t. μt(w)|Gab|1eω and x=Π(Z1w1Zkwk)}Gab.E:=\{x\in G_{\mathrm{ab}}:\exists w\in\mathbb{Z}^{k}\text{ s.t. }\mu_{t}(w)\geq|G_{\mathrm{ab}}|^{-1}e^{\omega}\text{ and }x=\Pi(Z_{1}^{w_{1}}\cdots Z_{k}^{w_{k}})\}\subseteq G_{\mathrm{ab}}.

Based on the definition of EE we have (Π(X(t))E|)=1\mathbb{P}(\Pi(X(t))\in E|\mathcal{E})=1. Every element xEx\in E satisfies x=Π(Z1w1xZkwkx)x=\Pi(Z_{1}^{w^{x}_{1}}\cdots Z_{k}^{w^{x}_{k}}) for some wxkw^{x}\in\mathbb{Z}^{k} with μt(wx)|Gab|1eω\mu_{t}(w^{x})\geq|G_{\mathrm{ab}}|^{-1}e^{\omega}. Hence, for all xEx\in E, we have

(Π(X(t))=x)(W(t)=wx)=μt(wx)|Gab|1eω.\mathbb{P}(\Pi(X(t))=x)\geq\mathbb{P}(W(t)=w^{x})=\mu_{t}(w^{x})\geq|G_{\mathrm{ab}}|^{-1}e^{\omega}.

Summing over xEx\in E gives

1xE(Π(X(t))=x)|E||Gab|1eω1\geq\sum_{x\in E}\mathbb{P}(\Pi(X(t))=x)\geq|E|\cdot|G_{\mathrm{ab}}|^{-1}e^{\omega}

and hence $|E|/|G_{\mathrm{ab}}|\leq e^{-\omega}=o(1)$, since $\omega\gg1$. Therefore,

(X(t))πGTV(X(t)Π1(E))πG(Π1(E))()|E|/|Gab|=1o(1),\|\mathbb{P}(X(t)\in\cdot)-\pi_{G}\|_{\mathrm{TV}}\geq\mathbb{P}(X(t)\in\Pi^{-1}(E))-\pi_{G}(\Pi^{-1}(E))\geq\mathbb{P}(\mathcal{E})-|E|/|G_{\mathrm{ab}}|=1-o(1),

which completes the proof.

4.3. Representation of X(t)X(t)

Let $N:=N(t)$ be the number of steps taken by the continuous-time rate 1 random walk $X:=X(t)$ on the group $G$. For $i\in[N]$, we can write (the increment of) the $i$-th step taken by $X$ as $Z^{\eta_i}_{\sigma_i}$, where $\sigma_i\overset{iid}{\sim}\mathrm{Unif}([k])$ and $\eta_i\overset{iid}{\sim}\mathrm{Unif}\{\pm1\}$. Then we can express $X=\prod_{i=1}^{N}Z^{\eta_i}_{\sigma_i}$, and similarly $X'=\prod_{i=1}^{N'}Z^{\eta'_i}_{\sigma'_i}$, where $N',\{\eta'_i:i\in[N']\},\{\sigma'_i:i\in[N']\}$ are independent random variables defined analogously. Note that $W_a(t)=\sum_{i=1}^{N(t)}\mathbf{1}\{\sigma_i=a\}\eta_i$ for $a\in[k]$.

We are interested in the event {X=X}\{X=X^{\prime}\}, which is equivalent to {𝐗=id}\{\mathbf{X}=\mathrm{id}\}, where

𝐗:=X(X)1=i=1NZσiηi(i=1NZσiηi)1=i=1NZσiηij=0N1ZσNjηNj.\mathbf{X}:=X(X^{\prime})^{-1}=\prod_{i=1}^{N}Z^{\eta_{i}}_{\sigma_{i}}\cdot\left(\prod_{i=1}^{N^{\prime}}Z^{\eta^{\prime}_{i}}_{\sigma^{\prime}_{i}}\right)^{-1}=\prod_{i=1}^{N}Z^{\eta_{i}}_{\sigma_{i}}\cdot\prod_{j=0}^{N^{\prime}-1}Z^{-\eta^{\prime}_{N^{\prime}-j}}_{\sigma^{\prime}_{N^{\prime}-j}}. (30)

It is easy to see the law of 𝐗\mathbf{X} is the same as that of a rate 2 simple random walk on the same Cayley graph Cay(G,S)\text{Cay}(G,S). In fact, for simplicity of notation we will write (30) as

𝐗=i=1N+NZσiηi\mathbf{X}=\prod_{i=1}^{N+N^{\prime}}Z^{\eta_{i}}_{\sigma_{i}} (31)

where $\sigma_i:=\sigma'_{N'+N+1-i}$ and $\eta_i:=-\eta'_{N'+N+1-i}$ for $N<i\leq N+N'$.

Our goal is to express XX (and analogously 𝐗\mathbf{X}) in a way that the role of WW (and analogously WWW-W^{\prime}) is understood. If GG is Abelian we can simply rearrange the sequence X=i=1NZσiηiX=\prod_{i=1}^{N}Z^{\eta_{i}}_{\sigma_{i}} to obtain X=Z1W1ZkWkX=Z_{1}^{W_{1}}\cdots Z_{k}^{W_{k}}. Although we do not have this nice and simple relation between XX and WW when GG is not abelian, we can still rearrange the terms in (30) and pay the price of adding an extra commutator, i.e., for x,yGx,y\in G we can rewrite xyxy as yx[x,y]yx[x,y]. For this reason, in some of our analysis we also care about the specific order in which each generator appears in 𝐗\mathbf{X}, which is why we will sometimes also refer to 𝐗\mathbf{X} as a sequence to emphasize this perspective. To be more specific, when we refer to 𝐗\mathbf{X} as a “sequence” we are referring to the corresponding (σi,ηi)i[N+N](\sigma_{i},\eta_{i})_{i\in[N+N^{\prime}]}.

More generally, for x,y,zGx,y,z\in G, consider as an example the element xyzxGxyzx\in G. In order to express xyzxxyzx in our desired form we can rearrange the terms in the sequence xyzxxyzx as follows

xyzx\displaystyle xyzx =xyxz[z,x]=x2y[y,x]z[z,x]=x2yz[y,x][[y,x],z][z,x]\displaystyle=xyxz[z,x]=x^{2}y[y,x]z[z,x]=x^{2}yz[y,x][[y,x],z][z,x]
=x2yz[y,x][z,x][[y,x],z][[[y,x],z],[z,x]].\displaystyle=x^{2}yz[y,x][z,x][[y,x],z][[[y,x],z],[z,x]]. (32)

We can see from this example that rearranging the whole sequence 𝐗\mathbf{X} will result in commutators of the form {ρ(x1,,xi):xjG,j[i],i2}\{\rho(x_{1},...,x_{i}):x_{j}\in G,j\in[i],i\geq 2\}, which was defined in (9).
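The rearrangement in (32) is an identity in any group once the convention $[x,y]=x^{-1}y^{-1}xy$ (equivalently $xy=yx[x,y]$) is fixed. The sketch below checks it in a concrete nilpotent group, the $4\times4$ upper unitriangular matrices over $\mathbb{Z}_5$. The choice of group, the modulus `P` and the three test elements are assumptions made only for illustration.

```python
P = 5  # hypothetical prime; matrix entries are taken modulo P
N = 4  # 4x4 upper unitriangular matrices form a nilpotent group of step 3

IDENTITY = tuple(tuple(int(i == j) for j in range(N)) for i in range(N))


def mul(x, y):
    """Matrix product modulo P."""
    return tuple(
        tuple(sum(x[i][m] * y[m][j] for m in range(N)) % P for j in range(N))
        for i in range(N)
    )


def inv(x):
    """Inverse of a unitriangular matrix: I - A + A^2 - A^3 with A = x - I."""
    neg_a = tuple(
        tuple((IDENTITY[i][j] - x[i][j]) % P for j in range(N)) for i in range(N)
    )
    out, term = IDENTITY, IDENTITY
    for _ in range(N - 1):
        term = mul(term, neg_a)
        out = tuple(
            tuple((out[i][j] + term[i][j]) % P for j in range(N)) for i in range(N)
        )
    return out


def comm(x, y):
    """Commutator [x, y] = x^{-1} y^{-1} x y, so that xy = yx[x, y]."""
    return mul(mul(inv(x), inv(y)), mul(x, y))


def prod(*factors):
    """Ordered product of the given group elements."""
    out = IDENTITY
    for f in factors:
        out = mul(out, f)
    return out


def unitriangular(entries):
    """Build a matrix from its six above-diagonal entries, listed row by row."""
    it = iter(entries)
    return tuple(
        tuple(1 if i == j else (next(it) % P if j > i else 0) for j in range(N))
        for i in range(N)
    )


x = unitriangular([1, 2, 3, 4, 0, 1])
y = unitriangular([2, 0, 1, 1, 3, 2])
z = unitriangular([3, 1, 4, 2, 2, 1])

lhs = prod(x, y, z, x)
yx, zx = comm(y, x), comm(z, x)
yxz = comm(yx, z)
rhs = prod(x, x, y, z, yx, zx, yxz, comm(yxz, zx))
```

In this step-3 group the last factor $[[[y,x],z],[z,x]]$ is trivial, which illustrates how high-fold commutators drop out when the nilpotency class is bounded.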

In terms of the sequence 𝐗\mathbf{X}, it will become clear later that we actually only need to keep track of the two-fold commutators of the form {[Za,Zb]:a,b[k]}\{[Z_{a},Z_{b}]:a,b\in[k]\}. Hence, we will write V:=WWV:=W-W^{\prime} and rearrange the terms in (31) to obtain the following expression

𝐗=Z1V1ZkVka,b[k]:a<b[Za,Zb]mbaφ(Z1,,Zk),\mathbf{X}=Z_{1}^{V_{1}}\cdots Z_{k}^{V_{k}}\prod_{a,b\in[k]:a<b}[Z_{a},Z_{b}]^{m_{ba}}\varphi(Z_{1},\dots,Z_{k}), (33)

where, letting (σi,ηi)(\sigma_{i},\eta_{i}) denote the ii-th generator and its sign in the sequence in (31),

mba:=i=1N+Nj<iηiηj𝟏{σi=a,σj=b} for 1a<bk,m_{ba}:=-\sum_{i=1}^{N+N^{\prime}}\sum_{j<i}\eta_{i}\eta_{j}\mathbf{1}\{\sigma_{i}=a,\sigma_{j}=b\}\quad\text{ for }1\leq a<b\leq k, (34)

and $\varphi(Z_1,\dots,Z_k)$ is the residual part resulting from the rearrangement. To describe $\varphi$ more specifically, define $\mathcal{C}_{\mathrm{com}}:=\{\rho(Z^{\pm1}_{a_1},\dots,Z^{\pm1}_{a_i}):a_j\in[k]\text{ for all }j\in[i],\ i\geq2\}$ to be the collection of commutators of $\{Z^{\pm1}_1,\dots,Z^{\pm1}_k\}$. A multi-fold commutator of $\{Z^{\pm1}_1,\dots,Z^{\pm1}_k\}$ refers to a term of the form $\rho(x_1,\dots,x_i)$ with $i\geq2$, where $x_j\in\mathcal{C}_{\mathrm{com}}\cup\{Z^{\pm1}_1,\dots,Z^{\pm1}_k\}$ for all $j\in[i]$, which is not simply a two-fold commutator of the form $[Z_{a_1}^{\pm1},Z_{a_2}^{\pm1}]$ with $a_1,a_2\in[k]$. A multi-fold commutator consisting of $i$ pairs of brackets is said to be $(i+1)$-fold. For example, in (32), $[[y,x],z]$ is a 3-fold commutator and $[[[y,x],z],[z,x]]$ is a 5-fold commutator.

It will become clear in our later arguments that the specific order in which terms appear in φ(Z1,,Zk)\varphi(Z_{1},\dots,Z_{k}) is of no interest to us, and thus with some abuse of language we will sometimes refer to φ()\varphi(\cdot) as a polynomial with terms that are multi-fold commutators of {Z1±1,,Zk±1}\{Z^{\pm 1}_{1},\dots,Z^{\pm 1}_{k}\}.
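In a group of step 2 all multi-fold commutators vanish, so $\varphi$ is trivial and (33) with the exponents (34) can be checked exactly. The sketch below does this in the Heisenberg group over $\mathbb{Z}_7$ for random sequences $(\sigma_i,\eta_i)$. The group, the modulus `P`, the sequence length and the 0-based generator indices are assumptions made for illustration only.

```python
import random

P = 7  # hypothetical prime modulus

IDENT = (0, 0, 0)


def mul(x, y):
    """Product in the Heisenberg group mod P; (a, b, c) encodes [[1,a,c],[0,1,b],[0,0,1]]."""
    a, b, c = x
    d, e, f = y
    return ((a + d) % P, (b + e) % P, (c + f + a * e) % P)


def inv(x):
    a, b, c = x
    return ((-a) % P, (-b) % P, (a * b - c) % P)


def power(x, n):
    """x**n for any integer n (negative n uses the inverse)."""
    if n < 0:
        x, n = inv(x), -n
    out = IDENT
    for _ in range(n):
        out = mul(out, x)
    return out


def comm(x, y):
    """[x, y] = x^{-1} y^{-1} x y."""
    return mul(mul(inv(x), inv(y)), mul(x, y))


def rearrangement_holds(seed, k=4, steps=12):
    """Compare the ordered product with Z_1^{V_1}...Z_k^{V_k} prod_{a<b} [Z_a,Z_b]^{m_ba}."""
    rng = random.Random(seed)
    Z = [tuple(rng.randrange(P) for _ in range(3)) for _ in range(k)]
    seq = [(rng.randrange(k), rng.choice((-1, 1))) for _ in range(steps)]

    lhs = IDENT
    for sigma, eta in seq:
        lhs = mul(lhs, power(Z[sigma], eta))

    V = [sum(eta for sigma, eta in seq if sigma == a) for a in range(k)]
    rhs = IDENT
    for a in range(k):
        rhs = mul(rhs, power(Z[a], V[a]))
    for a in range(k):
        for b in range(a + 1, k):
            # Exponent from (34): minus the signed count of pairs j < i with
            # generator b at position j and generator a at position i.
            m_ba = -sum(
                seq[i][1] * seq[j][1]
                for i in range(steps)
                for j in range(i)
                if seq[i][0] == a and seq[j][0] == b
            )
            rhs = mul(rhs, power(comm(Z[a], Z[b]), m_ba))
    return lhs == rhs
```

Calling `rearrangement_holds(seed)` for several seeds exercises different generator choices and sequences. In groups of higher step the two sides would differ by the residual $\varphi$.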

4.4. Upper Bound on Mixing Time

In this section we prove the upper bound on the mixing time, for which the precise statement is presented below. Recall that S\mathbb{P}_{S} denotes the law of the random walk given the generator set SS starting at X(0)=idX(0)=\mathrm{id}.

Theorem 5.

Let GG be a nilpotent group with r(G),L(G)1r(G),L(G)\asymp 1. Let S={Zi±1:i[k]}S=\{Z_{i}^{\pm 1}:i\in[k]\} with Z1,,ZkiidUnif(G)Z_{1},\dots,Z_{k}\overset{iid}{\sim}\mathrm{Unif}(G) and assume 1logklog|G|1\ll\log k\ll\log|G|. For any ε>0\varepsilon>0 and t(1+ε)t(k,G)t\geq(1+\varepsilon)t_{*}(k,G), we have S(X(t))πGTV=o(1)\|\mathbb{P}_{S}(X(t)\in\cdot)-\pi_{G}\|_{\mathrm{TV}}=o(1) with high probability.

Notice that when 1klog|G|loglog|G|1\ll k\ll\frac{\log|G|}{\log\log|G|} this theorem follows directly from Theorem 2 and hence in the proof we will focus on the regime klog|G|loglog|G|k\gtrsim\frac{\log|G|}{\log\log|G|}.

We will first define the notation that will be used throughout this section and explain why it is useful.

Definition 6.

For each [L]\ell\in[L], define Q:=G/G+1Q_{\ell}:=G_{\ell}/G_{\ell+1} and let r:=r(Q)r_{\ell}:=r(Q_{\ell}) denote the rank of QQ_{\ell}. For each QQ_{\ell}, we will choose a set RGR_{\ell}\subseteq G_{\ell} such that |R|=|Q||R_{\ell}|=|Q_{\ell}| and Q={G+1g:gR}Q_{\ell}=\{G_{\ell+1}g:g\in R_{\ell}\}.

As

{X(X)1=id}==1L+1{X(X)1G}==1L+1{GX(X)1=G},\{X(X^{\prime})^{-1}=\mathrm{id}\}=\cap_{\ell=1}^{L+1}\{X(X^{\prime})^{-1}\in G_{\ell}\}=\cap_{\ell=1}^{L+1}\{G_{\ell}X(X^{\prime})^{-1}=G_{\ell}\},

we will be interested in events related to {GX(X)1}[L+1]\{G_{\ell}X(X^{\prime})^{-1}\}_{\ell\in[L+1]}. In other words, we will decompose X(X)1X(X^{\prime})^{-1} with respect to the quotient groups {Q:[L]}\{Q_{\ell}:\ell\in[L]\} and derive simplified expressions to make the distribution of X(X)1X(X^{\prime})^{-1} more tractable.

It will be tremendously useful if we can express ZaZ_{a} as a product of elements belonging to each layer of {G:[L]}\{G_{\ell}:\ell\in[L]\}. Indeed, the following lemma asserts that we can construct each generator ZaZ_{a} as a product of independent random variables {Za,:[L]}\{Z_{a,\ell}:\ell\in[L]\}.

Lemma 4.5 (Corollary 6.4 in [17]).

Let {Za,:a[k],1L}\{Z_{a,\ell}:a\in[k],1\leq\ell\leq L\} be independent and such that Za,Unif(R)Z_{a,\ell}\sim\mathrm{Unif}(R_{\ell}). Then G+1Za,Unif(Q)G_{\ell+1}Z_{a,\ell}\sim\mathrm{Unif}(Q_{\ell}). Moreover, Za:==1LZa,Z_{a}:=\prod_{\ell=1}^{L}Z_{a,\ell} are i.i.d. uniform over GG for a[k]a\in[k].
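The uniformity of $Z_a=\prod_{\ell}Z_{a,\ell}$ can be seen by exhaustive enumeration in a small example. The sketch below takes $G$ to be the Heisenberg group over $\mathbb{Z}_3$ (so $L=2$ and $G_2$ is the center), picks an arbitrary set `R1` of coset representatives of $G/G_2$ and sets `R2` equal to $G_2$. These concrete choices are assumptions for illustration and do not come from [17].

```python
from collections import Counter
from itertools import product

P = 3  # hypothetical prime modulus


def mul(x, y):
    """Product in the Heisenberg group mod P; (a, b, c) encodes [[1,a,c],[0,1,b],[0,0,1]]."""
    a, b, c = x
    d, e, f = y
    return ((a + d) % P, (b + e) % P, (c + f + a * e) % P)


# One representative per coset of G_2 = {(0, 0, c)}; the third coordinate is arbitrary.
R1 = [(a, b, (a * a + b) % P) for a in range(P) for b in range(P)]
# R_2 is all of G_2, since G_3 is trivial.
R2 = [(0, 0, c) for c in range(P)]

# Each pair (Z_{a,1}, Z_{a,2}) is equally likely, so counts are proportional to the law of Z_a.
counts = Counter(mul(z1, z2) for z1, z2 in product(R1, R2))
```

Every element of $G$ is hit exactly once, so the product of independent uniform picks from `R1` and `R2` is uniform on $G$.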

4.4.1. Proof Framework for Theorem 5

As suggested by Lemma 4.1, in order to control the total variation distance to stationarity we will upper bound D(t):=|G|(X=X|𝐭𝐲𝐩)1D(t):=|G|\cdot\mathbb{P}(X=X^{\prime}|\mathrm{\mathbf{typ}})-1, where we also average over the choice of SS. Letting V:=WWV:=W-W^{\prime}, D(t)D(t) can be further decomposed with respect to VV:

D(t)=|G|(X=X,V0|𝐭𝐲𝐩)+|G|(X=X,V=0|𝐭𝐲𝐩)1.D(t)=|G|\cdot\mathbb{P}(X=X^{\prime},V\neq 0|\mathrm{\mathbf{typ}})+|G|\cdot\mathbb{P}(X=X^{\prime},V=0|\mathrm{\mathbf{typ}})-1. (35)

Again, for simplicity of notation we will suppress the dependence of the mixing times on kk and GG, and write t0:=t0(k,|Gab|),t1:=logk|G|t_{0}:=t_{0}(k,|G_{\mathrm{ab}}|),t_{1}:=\log_{k}|G| and t:=t(k,G)=max{t0,t1}t_{*}:=t_{*}(k,G)=\max\{t_{0},t_{1}\}. It follows easily from (35) that in order to prove Theorem 5 it suffices to prove the following two results.

Proposition 7.

For any ε>0\varepsilon>0 and t(1+ε)tt\geq(1+\varepsilon)t_{*}, we have

|G|(X=X,V0|𝐭𝐲𝐩)\displaystyle|G|\cdot\mathbb{P}(X=X^{\prime},V\neq 0|\mathrm{\mathbf{typ}}) =1+o(1).\displaystyle=1+o(1).
Proposition 8.

For any ε>0\varepsilon>0 and t(1+ε)tt\geq(1+\varepsilon)t_{*}, we have

|G|(X=X,V=0|𝐭𝐲𝐩)\displaystyle|G|\cdot\mathbb{P}(X=X^{\prime},V=0|\mathrm{\mathbf{typ}}) =o(1).\displaystyle=o(1).

The proofs of both Proposition 7 and 8 rely on the analysis of the events {GX(X)1=G}[L+1]\{G_{\ell}X(X^{\prime})^{-1}=G_{\ell}\}_{\ell\in[L+1]}. In the case where V0V\neq 0, writing V=(V1,,Vk)V=(V_{1},\dots,V_{k}), one will see in Section 4.6 that the analysis boils down to understanding the distribution of (G+1a[k]VaZa,)[L](G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell})_{\ell\in[L]}. When V=0V=0 the analysis is significantly more involved, as in this case (33) turns into

X(X)1\displaystyle X(X^{\prime})^{-1} =Z1V1ZkVka,b[k]:a<b[Za,Zb]mbaφ(Z1,,Zk)=a,b[k]:a<b[Za,Zb]mbaφ(Z1,,Zk)\displaystyle=Z_{1}^{V_{1}}\cdots Z_{k}^{V_{k}}\prod_{a,b\in[k]:a<b}[Z_{a},Z_{b}]^{m_{ba}}\varphi(Z_{1},\dots,Z_{k})=\prod_{a,b\in[k]:a<b}[Z_{a},Z_{b}]^{m_{ba}}\varphi(Z_{1},\dots,Z_{k})

and hence we need to carefully understand the distribution of the product of commutators. We therefore first outline the main steps of the proof of Proposition 8.

Proof outline of Proposition 8. We will analyze $\{G_{\ell}X(X')^{-1}=G_{\ell}\}$ for each $2\leq\ell\leq L+1$ when $V=0$.

The event {G2X(X)1=G2}\{G_{2}X(X^{\prime})^{-1}=G_{2}\} is already guaranteed to occur if we condition on {V=0}\{V=0\}. In general, in order to understand each event {G+2X(X)1=G+2}\{G_{\ell+2}X(X^{\prime})^{-1}=G_{\ell+2}\} for 1L11\leq\ell\leq L-1, we hope to show that for all 1L11\leq\ell\leq L-1, G+2X(X)1G_{\ell+2}X(X^{\prime})^{-1} is close to being uniform over Q+1:=G+1/G+2Q_{\ell+1}:=G_{\ell+1}/G_{\ell+2}. To analyze the distribution of G+2X(X)1G_{\ell+2}X(X^{\prime})^{-1} for each [L1]\ell\in[L-1], we observe from the representation of X(X)1X(X^{\prime})^{-1} given in (33) that (mba)a,b[k],a<b(m_{ba})_{a,b\in[k],a<b}, which is defined in (34), plays a crucial role in determining the distribution.

As we will later show in Section 4.8.3, after simplifying G+2X(X)1G_{\ell+2}X(X^{\prime})^{-1} for 2L12\leq\ell\leq L-1 (the case where =1\ell=1 is somewhat different and will be discussed separately in Section 4.8.4) one will eventually arrive at a term of the form

G+2b𝒦[χb,Zb,],G_{\ell+2}\sum_{b\in\mathcal{K}}[\chi_{b},Z_{b,\ell}],

where $\chi_b\in G$ satisfies $G_2\chi_b=G_2\big(\sum_{a\in[k]:a<b}m_{ba}Z_{a,1}-\sum_{a\in[k]:b<a}m_{ab}Z_{a,1}\big)$ and $\mathcal{K}\subseteq[k]$ is a subset to be chosen, see (4.8.3). The terms $(\chi_b)_{b\in\mathcal{K}}$ make explicit the role of $(m_{ba})_{a,b\in[k],a<b}$. More specifically, since $(Z_{b,\ell})_{b\in\mathcal{K}}$ is a collection of independent uniform random variables, Lemma 4.12 below asserts that $G_{\ell+2}\sum_{b\in\mathcal{K}}[\chi_b,Z_{b,\ell}]$ is uniform over the subgroup $\langle\{G_{\ell+2}[\chi_b,g]:b\in\mathcal{K},g\in R_{\ell}\}\rangle$ generated by these commutators. As we would like $G_{\ell+2}X(X')^{-1}$ to be close to uniform over $Q_{\ell+1}$, we will choose $\mathcal{K}$ so that this subgroup takes up a sufficiently large fraction of $Q_{\ell+1}$.


The above discussion suggests that we need to specify some conditions on (mba)a,b[k],a<b(m_{ba})_{a,b\in[k],a<b} to guarantee the existence of such an index set 𝒦\mathcal{K} and thus desired behavior of {G+2X(X)1}2L1\{G_{\ell+2}X(X^{\prime})^{-1}\}_{2\leq\ell\leq L-1}. These conditions are summarized in Definition 10 as a “good” event 𝒜\mathcal{A}. As a result, we will obtain the following estimate. Recall the definition of hh from Definition 4.

Proposition 9.

|G|(X=X|𝒜,V=0,𝐭𝐲𝐩)eh|G|\cdot\mathbb{P}(X=X^{\prime}|\mathcal{A},V=0,\mathrm{\mathbf{typ}})\ll e^{h}.

Lastly, we would like 𝒜\mathcal{A} to be an event that occurs with sufficiently high probability, i.e.,

Proposition 10.

|G|(𝒜c|V=0,𝐭𝐲𝐩)eh|G|\cdot\mathbb{P}(\mathcal{A}^{c}|V=0,\mathrm{\mathbf{typ}})\ll e^{h}.

Proof of Proposition 8 given Propositions 9 and 10. Since $\mathrm{\mathbf{typ}}\subseteq\{W,W'\in\mathcal{W}_{glo}\}$, a direct calculation gives

\[\mathbb{P}(V=0,\mathrm{\mathbf{typ}})\leq\mathbb{P}(W=W',W\in\mathcal{W}_{glo})=\sum_{w\in\mathcal{W}_{glo}}\mathbb{P}(W=w)\mathbb{P}(W'=w)\leq e^{-h}.\]

That is,

(V=0|𝐭𝐲𝐩)eh/(𝐭𝐲𝐩).\mathbb{P}(V=0|\mathrm{\mathbf{typ}})\leq e^{-h}/\mathbb{P}(\mathrm{\mathbf{typ}}). (36)

Hence, by Proposition 9, Proposition 10 and (36),

(X=X,V=0|𝐭𝐲𝐩)\displaystyle\mathbb{P}(X=X^{\prime},V=0|\mathrm{\mathbf{typ}}) ((X=X|𝒜,V=0,𝐭𝐲𝐩)+(𝒜c|V=0,𝐭𝐲𝐩))(V=0|𝐭𝐲𝐩)1|G|.\displaystyle\leq\left(\mathbb{P}(X=X^{\prime}|\mathcal{A},V=0,\mathrm{\mathbf{typ}})+\mathbb{P}(\mathcal{A}^{c}|V=0,\mathrm{\mathbf{typ}})\right)\cdot\mathbb{P}(V=0|\mathrm{\mathbf{typ}})\ll\frac{1}{|G|}.

The remainder of this section is organized as follows: in Section 4.5 we define filtrations that will serve as useful tools in the subsequent proofs; in Section 4.6 we prove Proposition 7; in Section 4.7 we present some preliminary results on groups that are useful in the proof of Proposition 8. As discussed above, the proof of Proposition 8 is divided into two parts: Proposition 9 will be proved in Section 4.8 and Proposition 10 in Section 4.9.


4.5. Useful Filtrations

Recall that the random walk XX can be written as a sequence

X=i=1NZσiηi=Zσ1η1Zσ2η2ZσNηN,X=\prod_{i=1}^{N}Z^{\eta_{i}}_{\sigma_{i}}=Z_{\sigma_{1}}^{\eta_{1}}Z_{\sigma_{2}}^{\eta_{2}}\cdots Z_{\sigma_{N}}^{\eta_{N}},

where NN is the number of steps taken by XX by time tt and ZσiηiZ^{\eta_{i}}_{\sigma_{i}} denotes the ii-th step taken by the random walk. Similarly, we write X=i=1NZσiηiX^{\prime}=\prod_{i=1}^{N^{\prime}}Z^{\eta^{\prime}_{i}}_{\sigma^{\prime}_{i}}. Recall from Lemma 4.5 the decomposition Za==1LZa,Z_{a}=\prod_{\ell=1}^{L}Z_{a,\ell} for a[k]a\in[k].

We will define the following σ\sigma-fields and events that will be useful in later analysis.

Definition 7.

(i) Let ~\widetilde{\mathcal{H}} be the σ\sigma-field generated by N,N,(σi,ηi)i[N+N]N,N^{\prime},(\sigma_{i},\eta_{i})_{i\in[N+N^{\prime}]}, i.e., ~\widetilde{\mathcal{H}} contains information on the sequences X,XX,X^{\prime}, other than the identities of (Za)a[k](Z_{a})_{a\in[k]}.
(ii) For each [L]\ell\in[L], let =σ({Za,i:a[k],1i})\mathcal{F}_{\ell}=\sigma(\{Z_{a,i}:a\in[k],1\leq i\leq\ell\}). Let 0\mathcal{F}_{0} be the trivial σ\sigma-field.
(iii) For [L+1]\ell\in[L+1], let ={X(X)1G}\mathcal{E}_{\ell}=\{X(X^{\prime})^{-1}\in G_{\ell}\}.

To make the intuitive picture a bit clearer, note that there are mainly two sources of randomness:
(i) At every step a generator is randomly chosen and applied, resulting in the random order of the sequences (encoded in ~\widetilde{\mathcal{H}});
(ii) For each a[k]a\in[k], the specific choice of the generator ZaZ_{a} is random (encoded in {:[L]}\{\mathcal{F}_{\ell}:\ell\in[L]\}).

Remark 7.

The following random variables are measurable with respect to ~\widetilde{\mathcal{H}}: VV, 𝟏𝐭𝐲𝐩\mathbf{1}_{\mathrm{\mathbf{typ}}} and (mba)a,b[k],a<b(m_{ba})_{a,b\in[k],a<b}.

As preparation for later discussion, we first clarify the measurability of events of interest to us.

Lemma 4.6.

For [L]\ell\in[L],
(i) \mathcal{E}_{\ell} is measurable with respect to σ(1,~)\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}}).
(ii) {+1,V=0}\{\mathcal{E}_{\ell+1},V=0\} is measurable with respect to σ(1,~)\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}}).

Proof.

Recall from (33) we can write

X(X)1=Z1V1ZkVka,b[k]:a<b[Za,Zb]mbaφ(Z1,,Zk),X(X^{\prime})^{-1}=Z_{1}^{V_{1}}\cdots Z_{k}^{V_{k}}\prod_{a,b\in[k]:a<b}[Z_{a},Z_{b}]^{m_{ba}}\varphi(Z_{1},\dots,Z_{k}),

where φ(Z1,,Zk)\varphi(Z_{1},\dots,Z_{k}) is a product of multi-fold commutators of {Z1±1,,Zk±1}\{Z^{\pm 1}_{1},\dots,Z^{\pm 1}_{k}\} as a result of the rearranging. (See the end of Section 4.3 for a detailed description of φ\varphi and multi-fold commutators.) By Lemma 4.5, for each a[k]a\in[k], ZaZ_{a} can be decomposed into Za==1LZa,Z_{a}=\prod_{\ell=1}^{L}Z_{a,\ell}. For a[k]a\in[k] and [L]\ell\in[L], we will write Za,<:=j=11Za,jZ_{a,<\ell}:=\prod_{j=1}^{\ell-1}Z_{a,j}.

Applying the decomposition Za==1LZa,Z_{a}=\prod_{\ell=1}^{L}Z_{a,\ell} to the sequence above yields

G+1X(X)1=G+1Z1V1ZkVka,b[k]:a<b[Za,Zb]mbaφ(Z1,,Zk)\displaystyle G_{\ell+1}X(X^{\prime})^{-1}=G_{\ell+1}Z_{1}^{V_{1}}\cdots Z_{k}^{V_{k}}\prod_{a,b\in[k]:a<b}[Z_{a},Z_{b}]^{m_{ba}}\varphi(Z_{1},\dots,Z_{k})
=\displaystyle= G+1(a=1kZa,Va)(a=1kZa,<Va)a,b[k]:a<b[i=11Za,i,j=11Zb,j]mbaφ(2)({Za,u:a[k],u2})\displaystyle G_{\ell+1}\left(\prod_{a=1}^{k}Z^{V_{a}}_{a,\ell}\right)\left(\prod_{a=1}^{k}Z^{V_{a}}_{a,<\ell}\right)\prod_{a,b\in[k]:a<b}\left[\prod_{i=1}^{\ell-1}Z_{a,i},\prod_{j=1}^{\ell-1}Z_{b,j}\right]^{m_{ba}}\varphi^{(\ell-2)}(\{Z_{a,u}:a\in[k],u\leq\ell-2\}) (37)

where φ(2)({Za,u:a[k],u2})\varphi^{(\ell-2)}(\{Z_{a,u}:a\in[k],u\leq\ell-2\}) is a certain polynomial with terms that are ii-fold commutators of {Za,u:a[k],u2}\{Z_{a,u}:a\in[k],u\leq\ell-2\} for i3i\geq 3, satisfying that

G+1φ(2)({Za,u:a[k],u2})=G+1φ(Z1,,Zk).G_{\ell+1}\varphi^{(\ell-2)}(\{Z_{a,u}:a\in[k],u\leq\ell-2\})=G_{\ell+1}\varphi(Z_{1},\dots,Z_{k}).

The reason that we restrict our attention above to u2u\leq\ell-2 is because by Proposition 2 any ii-fold commutator with i3i\geq 3 that involves {Za,u:a[k],u>2}\{Z_{a,u}:a\in[k],u>\ell-2\} is in G+1G_{\ell+1}. Also note that we can exchange the order of Za,Z_{a,\ell} and Za,<Z_{a^{\prime},<\ell} for any a,a[k]a,a^{\prime}\in[k] as [Za,,Za,<]G+1[Z_{a,\ell},Z_{a^{\prime},<\ell}]\in G_{\ell+1} by Proposition 2.

On $\{V=0\}$ the right-hand side of (37) only involves $\{Z_{a,i}:a\in[k],1\leq i\leq\ell-1\}$ and hence is measurable with respect to $\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$. Therefore, $\{\mathcal{E}_{\ell+1},V=0\}$ is measurable with respect to $\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$, proving (ii).

Next we prove (i). Writing

f(1)=\displaystyle f^{(\ell-1)}= f(1)({Za,u:a[k],u1})\displaystyle f^{(\ell-1)}(\{Z_{a,u}:a\in[k],u\leq\ell-1\})
:=\displaystyle:= (a=1kZa,<Va)a,b[k]:a<b[i=11Za,i,j=11Zb,j]mba\displaystyle\left(\prod_{a=1}^{k}Z^{V_{a}}_{a,<\ell}\right)\prod_{a,b\in[k]:a<b}\left[\prod_{i=1}^{\ell-1}Z_{a,i},\prod_{j=1}^{\ell-1}Z_{b,j}\right]^{m_{ba}}
φ(2)({Za,u:a[k],u2}),\displaystyle\quad\varphi^{(\ell-2)}(\{Z_{a,u}:a\in[k],u\leq\ell-2\}), (38)

by (37) we have

G+1X(X)1=G+1(a=1kZa,Va)f(1).G_{\ell+1}X(X^{\prime})^{-1}=G_{\ell+1}\left(\prod_{a=1}^{k}Z^{V_{a}}_{a,\ell}\right)f^{(\ell-1)}. (39)

Note that $f^{(\ell-1)}$ is measurable with respect to $\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$ as it only involves $\{Z_{a,u}:a\in[k],1\leq u\leq\ell-1\}$. Furthermore, since $\{Z_{a,\ell}:a\in[k]\}\subseteq G_{\ell}$, the same kind of simplification as in (37) leads to

GX(X)1=Gf(1)({Za,u:a[k],u1}),G_{\ell}X(X^{\prime})^{-1}=G_{\ell}f^{(\ell-1)}(\{Z_{a,u}:a\in[k],u\leq\ell-1\}),

which implies that ={f(1)G}\mathcal{E}_{\ell}=\{f^{(\ell-1)}\in G_{\ell}\}. Therefore, \mathcal{E}_{\ell} is measurable with respect to σ(1,~)\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}}). ∎

4.6. Proof of Proposition 7

Linear combinations of independent uniform random variables in an abelian group are themselves uniform on their support. As preparation for the proof we present the following lemma, which is a restatement of Lemma 2.11 and Lemma 6.5 of [17].

Lemma 4.7.

Let kk\in\mathbb{N}. Let HH be an Abelian group and U1,,UkiidUnif(H)U_{1},\dots,U_{k}\overset{iid}{\sim}\mathrm{Unif}(H). For v=(v1,,vk)kv=(v_{1},\dots,v_{k})\in\mathbb{Z}^{k}, write U=(U1,,Uk)U=(U_{1},\dots,U_{k}) and define vU:=i=1kviUiv\cdot U:=\sum_{i=1}^{k}v_{i}U_{i}. We have

vUUnif(γH) where γ=gcd(v1,,vk,|H|).v\cdot U\sim\mathrm{Unif}(\gamma H)\quad\text{ where }\gamma=\mathrm{gcd}(v_{1},\dots,v_{k},|H|).

Consequently,

maxhH(vU=h)=(vU=0)\max_{h\in H}\mathbb{P}(v\cdot U=h)=\mathbb{P}(v\cdot U=0) (40)

Applying Lemma 4.7 to (Z1,,,Zk,)(Z_{1,\ell},\dots,Z_{k,\ell}) and H=QH=Q_{\ell} for [L]\ell\in[L] and writing gcd(v,|Q|)=gcd(v1,,vk,|Q|)\mathrm{gcd}(v,|Q_{\ell}|)=\mathrm{gcd}(v_{1},\dots,v_{k},|Q_{\ell}|) gives

G+1a[k]vaZa,Unif(gcd(v,|Q|)Q).G_{\ell+1}\sum_{a\in[k]}v_{a}Z_{a,\ell}\sim\mathrm{Unif}(\mathrm{gcd}(v,|Q_{\ell}|)Q_{\ell}). (41)
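Lemma 4.7 can be verified by brute force in a small cyclic case. The sketch below takes $H=\mathbb{Z}_{12}$ and $v=(4,6)$, both hypothetical choices, enumerates all values of $U$ and checks that $v\cdot U$ is uniform on $\gamma H$ with $\gamma=\gcd(4,6,12)=2$. The lemma itself covers general finite abelian $H$; the cyclic case is used only to keep the enumeration short.

```python
from collections import Counter
from functools import reduce
from itertools import product
from math import gcd


def law_of_combination(v, n):
    """Exact counts of v.U mod n as U ranges over all of (Z_n)^k."""
    return Counter(
        sum(vi * ui for vi, ui in zip(v, u)) % n
        for u in product(range(n), repeat=len(v))
    )


n, v = 12, (4, 6)
gamma = reduce(gcd, v, n)  # gcd(v_1, ..., v_k, |H|)
counts = law_of_combination(v, n)
```

Each of the six elements of $2\mathbb{Z}_{12}$ is attained by the same number of pairs $(U_1,U_2)$, and $0$ is among the most likely values, in line with (40).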

The key to the proof of Proposition 7 is the following estimate, whose proof uses the simplified expression of $\{G_{\ell+1}X(X')^{-1}\}_{\ell\in[L]}$ derived in the last section.

Lemma 4.8.

We have

(X=X|V)=1L(G+1a[k]VaZa,=G+1|V).\mathbb{P}(X=X^{\prime}|V)\leq\prod_{\ell=1}^{L}\mathbb{P}(G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell}=G_{\ell+1}|V).
Proof.

Recall from the discussion in the proof of Lemma 4.6 that $\mathcal{E}_{\ell}=\{G_{\ell}X(X')^{-1}=G_{\ell}\}=\{f^{(\ell-1)}\in G_{\ell}\}$ is measurable with respect to $\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$ for $\ell\in[L]$ (where $\mathcal{F}_0$ is the trivial $\sigma$-field). Since $\mathcal{E}_{\ell+1}\subseteq\mathcal{E}_{\ell}$, it follows from (39) that for $\ell\in[L]$,

\begin{align*}
\mathbb{P}(\mathcal{E}_{\ell+1}\mid\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}}) &= \mathbb{P}\Big(\mathcal{E}_{\ell},\ G_{\ell+1}\Big(\prod_{a=1}^{k}Z^{V_a}_{a,\ell}\Big)f^{(\ell-1)}=G_{\ell+1}\ \Big|\ \mathcal{F}_{\ell-1},\widetilde{\mathcal{H}}\Big)\\
&\leq \mathbf{1}_{\mathcal{E}_{\ell}}\cdot\max_{g\in G_{\ell}}\mathbb{P}\Big(G_{\ell+1}\prod_{a=1}^{k}Z^{V_a}_{a,\ell}=G_{\ell+1}g\ \Big|\ \mathcal{F}_{\ell-1},\widetilde{\mathcal{H}}\Big)\\
&= \mathbf{1}_{\mathcal{E}_{\ell}}\cdot\max_{g\in G_{\ell}}\mathbb{P}\Big(G_{\ell+1}\sum_{a\in[k]}V_aZ_{a,\ell}=G_{\ell+1}g\ \Big|\ \mathcal{F}_{\ell-1},\widetilde{\mathcal{H}}\Big),
\end{align*}

where the last line follows from rewriting the term G+1(a=1kZa,Va)G_{\ell+1}\left(\prod_{a=1}^{k}Z^{V_{a}}_{a,\ell}\right) in the form of summation, as Q:=G/G+1Q_{\ell}:=G_{\ell}/G_{\ell+1} is abelian. It then follows from (40) in Lemma 4.7 that

(+1|1,~)\displaystyle\mathbb{P}(\mathcal{E}_{\ell+1}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}}) 𝟏maxgG(G+1a[k]VaZa,=G+1g|1,~)\displaystyle\leq\mathbf{1}_{\mathcal{E}_{\ell}}\cdot\max_{g\in G_{\ell}}\mathbb{P}(G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell}=G_{\ell+1}g|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})
=𝟏(G+1a[k]VaZa,=G+1|1,~)\displaystyle=\mathbf{1}_{\mathcal{E}_{\ell}}\cdot\mathbb{P}(G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell}=G_{\ell+1}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})
=𝟏(G+1a[k]VaZa,=G+1|~),\displaystyle=\mathbf{1}_{\mathcal{E}_{\ell}}\cdot\mathbb{P}(G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell}=G_{\ell+1}|\widetilde{\mathcal{H}}),

where the last equality follows from observing that $G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell}$ is independent of $\mathcal{F}_{\ell-1}$. It then follows from the tower property that

\begin{align*}
\mathbb{P}(\mathcal{E}_{\ell+1}\mid\widetilde{\mathcal{H}}) &=\mathbb{E}\big[\mathbb{P}(\mathcal{E}_{\ell+1}\mid\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})\,\big|\,\widetilde{\mathcal{H}}\big]\\
&\leq\mathbb{E}\bigg[\mathbf{1}_{\mathcal{E}_{\ell}}\cdot\mathbb{P}\Big(G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell}=G_{\ell+1}\,\Big|\,\widetilde{\mathcal{H}}\Big)\,\bigg|\,\widetilde{\mathcal{H}}\bigg]\\
&=\mathbb{P}(\mathcal{E}_{\ell}\mid\widetilde{\mathcal{H}})\cdot\mathbb{P}\Big(G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell}=G_{\ell+1}\,\Big|\,\widetilde{\mathcal{H}}\Big). \tag{42}
\end{align*}

Applying (42) iteratively for $\ell\in[L]$, we obtain

(X=X|~)\displaystyle\mathbb{P}(X=X^{\prime}|\widetilde{\mathcal{H}}) =(L+1|~)=1L(G+1a[k]VaZa,=G+1|~).\displaystyle=\mathbb{P}(\mathcal{E}_{L+1}|\widetilde{\mathcal{H}})\leq\prod_{\ell=1}^{L}\mathbb{P}(G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell}=G_{\ell+1}|\widetilde{\mathcal{H}}).

As VV is measurable with respect to ~\widetilde{\mathcal{H}}, i.e., σ(V)~\sigma(V)\subseteq\widetilde{\mathcal{H}}, by the tower property we have the desired result

(X=X|V)=1L(G+1a[k]VaZa,=G+1|V).\mathbb{P}(X=X^{\prime}|V)\leq\prod_{\ell=1}^{L}\mathbb{P}(G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell}=G_{\ell+1}|V).

Given Lemma 4.8, we can obtain the following upper bound involving the greatest common divisor.

Lemma 4.9.

Let $\bar{r}:=\sum_{\ell=1}^{L}r_{\ell}$ and $\mathrm{gcd}(v):=\mathrm{gcd}(v_{1},\dots,v_{k},|G|)$. We have

\[ |G|\cdot\mathbb{P}(X=X^{\prime},V\neq 0\mid\mathrm{\mathbf{typ}})\leq\mathbb{E}\left[(\mathrm{gcd}(V))^{\bar{r}}\mathbf{1}\{V\neq 0\}\,\middle|\,\mathrm{\mathbf{typ}}\right]. \]
Proof.

For $v\in\mathbb{Z}^{k}\setminus\{0\}$, it follows from Lemma 4.8 together with (41) that

\begin{align*}
\mathbb{P}(X(X^{\prime})^{-1}=\mathrm{id}\mid V=v) &\leq\prod_{\ell=1}^{L}\mathbb{P}\Big(G_{\ell+1}\sum_{a\in[k]}v_{a}Z_{a,\ell}=G_{\ell+1}\Big)\leq\prod_{\ell=1}^{L}\frac{1}{|\mathrm{gcd}(v,|Q_{\ell}|)Q_{\ell}|}\\
&\leq\prod_{\ell=1}^{L}\frac{(\mathrm{gcd}(v,|Q_{\ell}|))^{r_{\ell}}}{|Q_{\ell}|},
\end{align*}

where the last inequality follows from the fact that for an abelian group $H$ and $\gamma\in\mathbb{N}$, $|H|/|\gamma H|\leq\gamma^{r(H)}$; see, for instance, Lemma 2.12 in [17].
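The inequality $|H|/|\gamma H|\leq\gamma^{r(H)}$ can be checked numerically on a small example. The sketch below assumes $H$ is given in invariant-factor form $\mathbb{Z}_{m_1}\oplus\cdots\oplus\mathbb{Z}_{m_r}$ with $m_1\mid m_2\mid\cdots\mid m_r$ and $m_1>1$, so that the number of factors equals the rank $r(H)$; the group $\mathbb{Z}_2\oplus\mathbb{Z}_{12}\oplus\mathbb{Z}_{36}$ and the helper name `index_of_multiple` are illustrative choices (Python 3.8+ for `math.prod`).

```python
from math import gcd, prod


def index_of_multiple(moduli, gamma):
    """Return |H| / |gamma H| for H = Z_{m_1} + ... + Z_{m_r}.

    In each cyclic factor, gamma * Z_m has order m / gcd(gamma, m),
    so the index is the product of gcd(gamma, m_i) over the factors.
    """
    return prod(gcd(gamma, m) for m in moduli)


# Illustrative group in invariant-factor form (2 | 12 | 36), hence of rank 3.
moduli = (2, 12, 36)
rank = len(moduli)

for gamma in range(1, 50):
    assert index_of_multiple(moduli, gamma) <= gamma**rank

# gamma = 2 attains the bound: every factor contributes gcd(2, m_i) = 2.
print(index_of_multiple(moduli, 2), 2**rank)
```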

Recalling from Definition 7 the definition of $\widetilde{\mathcal{H}}$, one then has that

\begin{align*}
|G|\cdot\mathbb{P}(X=X^{\prime},V\neq 0\mid\widetilde{\mathcal{H}}) &\leq|G|\cdot\mathbf{1}\{V\neq 0\}\prod_{\ell=1}^{L}\frac{(\mathrm{gcd}(V,|Q_{\ell}|))^{r_{\ell}}}{|Q_{\ell}|}\\
&=\mathbf{1}\{V\neq 0\}\prod_{\ell=1}^{L}(\mathrm{gcd}(V,|Q_{\ell}|))^{r_{\ell}}\leq(\mathrm{gcd}(V))^{\bar{r}}\mathbf{1}\{V\neq 0\}, \tag{43}
\end{align*}

where the equality uses $|G|=\prod_{\ell=1}^{L}|Q_{\ell}|$, and the last inequality follows from the fact that $\mathrm{gcd}(V,|Q_{\ell}|)\leq\mathrm{gcd}(V)$ for all $1\leq\ell\leq L$, since $|Q_{\ell}|$ divides $|G|$. Since $\mathrm{\mathbf{typ}}\in\widetilde{\mathcal{H}}$, taking the conditional expectation of both sides of (43) given $\mathrm{\mathbf{typ}}$ gives the desired bound

|G|(X=X,V0|𝐭𝐲𝐩)𝔼[(gcd(V))r¯𝟏{V0}|𝐭𝐲𝐩].|G|\cdot\mathbb{P}(X=X^{\prime},V\neq 0|\mathrm{\mathbf{typ}})\leq\mathbb{E}\left[(\mathrm{gcd}(V))^{\bar{r}}\mathbf{1}\{V\neq 0\}|\mathrm{\mathbf{typ}}\right].

Lemma 4.10.

Fix any $\varepsilon>0$ and let $s:=t/k$ for $t\geq(1+\varepsilon)t_{*}$. For all $\gamma\geq 2$ and all $\lambda>0$, there exists a constant $\tilde{\delta}_{\lambda}\in(0,1)$ that depends only on $\lambda$ such that

\[ \mathbb{P}(\mathrm{gcd}(V)=\gamma,V\neq 0\mid\mathrm{\mathbf{typ}})\lesssim\begin{cases}\min\left\{e^{-t},\frac{k(2es)^{\gamma}}{\gamma^{\gamma}}\right\}&\text{when }s\ll 1,\\ (1-\tilde{\delta}_{\lambda})^{k}&\text{when }s\geq f(\lambda),\\ (1/\gamma+s^{-1/2})^{k}&\text{when }s\gg 1,\end{cases} \]

where $f(\lambda)$ is defined as in (26).

Proof.

Regime: s1s\ll 1. Let N1(s)N_{1}(s) denote a rate 1 Poisson process. Recall that V1V_{1} is a rate 2/k2/k simple random walk on \mathbb{Z}. For a random walk to return to the origin it must have taken an even number of steps, which means

(V1(t)=0)(N1(2s)2+)(N1(2s)=0)+m=1(N1(2s)=2m)e2s+8s2.\mathbb{P}(V_{1}(t)=0)\leq\mathbb{P}(N_{1}(2s)\in 2\mathbb{Z}_{+})\leq\mathbb{P}(N_{1}(2s)=0)+\sum^{\infty}_{m=1}\mathbb{P}(N_{1}(2s)=2m)\leq e^{-2s}+8s^{2}.

To simplify notation, write gcd:=gcd(V)\mathrm{gcd}:=\mathrm{gcd}(V) from now on. Using a Chernoff bound argument on the Poisson random variable N1(2s)N_{1}(2s) yields for γ2\gamma\geq 2

(gcd=γ,V0|𝐭𝐲𝐩)\displaystyle\mathbb{P}(\mathrm{gcd}=\gamma,V\neq 0|\mathrm{\mathbf{typ}}) (gcd=γ|𝐭𝐲𝐩)\displaystyle\leq\mathbb{P}(\mathrm{gcd}=\gamma|\mathrm{\mathbf{typ}})
((V1=0)+(N1(2s)γ))k(e2s+8s2+e2s(2es)γγγ)k\displaystyle\lesssim(\mathbb{P}(V_{1}=0)+\mathbb{P}(N_{1}(2s)\geq\gamma))^{k}\leq\left(e^{-2s}+8s^{2}+e^{-2s}\cdot\frac{(2es)^{\gamma}}{\gamma^{\gamma}}\right)^{k}
(e2s+8s2+(es)2)k(1s)ket.\displaystyle\leq(e^{-2s}+8s^{2}+(es)^{2})^{k}\leq(1-s)^{k}\leq e^{-t}.

To prove the second part of the upper bound, note that for $\{\mathrm{gcd}=\gamma,V\neq 0\}$ to occur, there must be some $a\in[k]$ such that $V_{a}\neq 0$ and $\gamma$ divides $V_{a}$. Then $|V_{a}|\geq\gamma$, so the number of steps $N_{a}(2s)$ taken by $V_{a}$ satisfies $N_{a}(2s)\geq\gamma$. Hence,

(gcd=γ,V0|𝐭𝐲𝐩)k(N1(2s)γ)k(2es)γe2sγγk(2es)γγγ.\mathbb{P}(\mathrm{gcd}=\gamma,V\neq 0|\mathrm{\mathbf{typ}})\lesssim k\cdot\mathbb{P}(N_{1}(2s)\geq\gamma)\leq k\cdot\frac{(2es)^{\gamma}e^{-2s}}{\gamma^{\gamma}}\leq\frac{k(2es)^{\gamma}}{\gamma^{\gamma}}.
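The Poisson tail estimate $\mathbb{P}(N_{1}(2s)\geq\gamma)\leq e^{-2s}(2es)^{\gamma}/\gamma^{\gamma}$ used above can be compared with the exact tail numerically. The following is a minimal sketch; the function names, the grid of test values and the truncation of the tail sum at 80 terms are assumptions made for illustration, and the comparison is only meaningful for $\gamma>2s$.

```python
from math import e, exp, factorial


def poisson_tail(mu, gamma, extra_terms=80):
    """Approximate P(N >= gamma) for N ~ Poisson(mu) by summing the pmf directly.

    The sum is truncated after `extra_terms` terms, so it slightly
    underestimates the tail; for mu <= 1 the omitted mass is negligible.
    """
    return sum(exp(-mu) * mu**j / factorial(j)
               for j in range(gamma, gamma + extra_terms))


def chernoff_bound(s, gamma):
    """The bound e^{-2s} (2es)^gamma / gamma^gamma for P(N(2s) >= gamma)."""
    return exp(-2 * s) * (2 * e * s) ** gamma / gamma**gamma


for s in (0.01, 0.1, 0.3):
    for gamma in range(2, 11):
        tail = poisson_tail(2 * s, gamma)
        assert tail <= chernoff_bound(s, gamma)
        print(f"s={s:<5} gamma={gamma:<3} tail={tail:.3e} bound={chernoff_bound(s, gamma):.3e}")
```

Summing the probability mass function directly, rather than computing one minus the lower sum, avoids floating-point cancellation when the tail is far below machine precision.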

Regime: sf(λ)s\geq f(\lambda). Define cγ:=(V1(t)γ)c_{\gamma}:=\mathbb{P}(V_{1}(t)\in\gamma\mathbb{Z}). Since sf(λ)s\geq f(\lambda), we have (V1(t)=0)c~λ<1\mathbb{P}(V_{1}(t)=0)\leq\tilde{c}_{\lambda}<1 for some constant c~λ\tilde{c}_{\lambda} depending on λ\lambda. So we can fix some small δλ>0\delta_{\lambda}>0 and some large γλ\gamma_{\lambda}\in\mathbb{N} such that (V1=0)+1/γλ<1δλ\mathbb{P}(V_{1}=0)+1/\gamma_{\lambda}<1-\delta_{\lambda}. That is, for all γγλ\gamma\geq\gamma_{\lambda},

cγ=(V1γ)(V1=0)+(V1γ|V10)(V1=0)+1/γλ<1δλ.c_{\gamma}=\mathbb{P}(V_{1}\in\gamma\mathbb{Z})\leq\mathbb{P}(V_{1}=0)+\mathbb{P}(V_{1}\in\gamma\mathbb{Z}|V_{1}\neq 0)\leq\mathbb{P}(V_{1}=0)+1/\gamma_{\lambda}<1-\delta_{\lambda}.

Thus there is some δ~λ>0\tilde{\delta}_{\lambda}>0 so that

maxγ2cγmax{max2γγλcγ,1δλ}1δ~λ.\max_{\gamma\geq 2}c_{\gamma}\leq\max\{\max_{2\leq\gamma\leq\gamma_{\lambda}}c_{\gamma},1-\delta_{\lambda}\}\leq 1-\tilde{\delta}_{\lambda}.

Therefore, for γ2\gamma\geq 2,

(gcd=γ,V0|𝐭𝐲𝐩)(V1γ)k(1δ~λ)k.\mathbb{P}(\mathrm{gcd}=\gamma,V\neq 0|\mathrm{\mathbf{typ}})\lesssim\mathbb{P}(V_{1}\in\gamma\mathbb{Z})^{k}\leq(1-\tilde{\delta}_{\lambda})^{k}.

Regime: s1s\gg 1. Note that

(gcd=γ,V0|𝐭𝐲𝐩)(V1γ)k((V1=0)+(V1γ|V10))k.\mathbb{P}(\mathrm{gcd}=\gamma,V\neq 0|\mathrm{\mathbf{typ}})\lesssim\mathbb{P}(V_{1}\in\gamma\mathbb{Z})^{k}\leq(\mathbb{P}(V_{1}=0)+\mathbb{P}(V_{1}\in\gamma\mathbb{Z}|V_{1}\neq 0))^{k}.

For the second term, it follows from Lemma 2.14 of [17] that $\mathbb{P}(V_{1}\in\gamma\mathbb{Z}\mid V_{1}\neq 0)\leq 1/\gamma$. As $V_{1}$ is a one-dimensional SRW with rate $2/k$, Theorem A.4 of [18] implies that when $s\gg 1$ we have

(V1(t)=0)12π(2s)exp(𝒪(12s))s1/2.\mathbb{P}(V_{1}(t)=0)\leq\frac{1}{\sqrt{2\pi(2s)}}\exp\left(\mathcal{O}\left(\frac{1}{\sqrt{2s}}\right)\right)\leq s^{-1/2}.

Hence,

(gcd=γ,V0|𝐭𝐲𝐩)((V1=0)+1/γ)k(1/γ+s1/2)k.\mathbb{P}(\mathrm{gcd}=\gamma,V\neq 0|\mathrm{\mathbf{typ}})\lesssim(\mathbb{P}(V_{1}=0)+1/\gamma)^{k}\leq(1/\gamma+s^{-1/2})^{k}.
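The return-probability estimate $\mathbb{P}(V_{1}(t)=0)\leq s^{-1/2}$ can be illustrated numerically. The sketch below assumes, consistently with the use of $N_{1}(2s)$ in the proof, that the number of $\pm 1$ steps taken by $V_{1}$ up to time $t$ is Poisson with mean $2s$; the function name `return_probability` and the number of summed terms are illustrative choices.

```python
from math import exp, lgamma, log, pi, sqrt


def return_probability(s, terms=2000):
    """P(V(t) = 0) for a continuous-time SRW on Z whose step count is Poisson(2s).

    Given 2m steps, the walk is at 0 with probability C(2m, m) / 4^m, which
    yields the series sum_m e^{-2s} s^(2m) / (m!)^2. Terms are evaluated in
    log space to avoid overflow; requires s > 0.
    """
    return sum(exp(-2 * s + 2 * m * log(s) - 2 * lgamma(m + 1))
               for m in range(terms))


for s in (1, 5, 20, 50):
    p = return_probability(s)
    # Compare with the claimed bound s^{-1/2} and the leading term 1/sqrt(4*pi*s).
    print(f"s={s:<3} P(V=0)={p:.5f}  s^-1/2={s ** -0.5:.5f}  leading={1 / sqrt(4 * pi * s):.5f}")
    assert p <= s ** -0.5
```

For large $s$ the computed value approaches $1/\sqrt{2\pi(2s)}$, in line with the local limit estimate quoted from [18].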

Lemma 4.11.

Suppose 1logklog|G|1\ll\log k\ll\log|G|. For any ε>0\varepsilon>0, when t(1+ε)tt\geq(1+\varepsilon)t_{*} we have

𝔼[(gcd(V))r¯𝟏{V0}|𝐭𝐲𝐩]=1+o(1).\mathbb{E}\left[(\mathrm{gcd}(V))^{\bar{r}}\mathbf{1}\{V\neq 0\}|\mathrm{\mathbf{typ}}\right]=1+o(1).
Proof.

Again we will write gcd=gcd(V)\mathrm{gcd}=\mathrm{gcd}(V).

Regime: $1\ll k\ll\log|G_{\mathrm{ab}}|$. Observe that $s:=t/k\geq|G_{\mathrm{ab}}|^{2/k}\gg 1$. On $\mathrm{\mathbf{typ}}$, $\mathrm{gcd}$ is at most $2r_{*}=|G_{\mathrm{ab}}|^{1/k}(\log k)^{2}$, which gives

\[ \mathbb{E}\left[\mathrm{gcd}^{\bar{r}}\mathbf{1}\{V\neq 0\}\,\middle|\,\mathrm{\mathbf{typ}}\right]\leq 1+\sum_{\gamma=2}^{2r_{*}}\gamma^{\bar{r}}\,\mathbb{P}(\mathrm{gcd}=\gamma,V\neq 0\mid\mathrm{\mathbf{typ}}). \]

Let $\delta\in(0,1)$ be sufficiently small. For $2\leq\gamma\leq\delta|G_{\mathrm{ab}}|^{1/k}$, applying Lemma 4.10 gives

\[ \mathbb{P}(\mathrm{gcd}=\gamma,V\neq 0\mid\mathrm{\mathbf{typ}})\lesssim\left(1/\gamma+1/|G_{\mathrm{ab}}|^{1/k}\right)^{k}\leq\left(1/\gamma+1/(\gamma/\delta)\right)^{k}=(1+\delta)^{k}/\gamma^{k}. \]

For γ>δ|Gab|1/k\gamma>\delta|G_{\mathrm{ab}}|^{1/k}, we use the bound (a+b)k2k(ak+bk)(a+b)^{k}\leq 2^{k}(a^{k}+b^{k}) to get

(gcd=γ|𝐭𝐲𝐩)2k(1/γk+1/|Gab|).\mathbb{P}(\mathrm{gcd}=\gamma|\mathrm{\mathbf{typ}})\lesssim 2^{k}(1/\gamma^{k}+1/|G_{\mathrm{ab}}|).

Therefore,

\begin{align*}
\mathbb{E}\left[\mathrm{gcd}^{\bar{r}}\mathbf{1}\{V\neq 0\}\,\middle|\,\mathrm{\mathbf{typ}}\right]-1 &\lesssim\sum_{\gamma=2}^{\delta|G_{\mathrm{ab}}|^{1/k}}\frac{(1+\delta)^{k}}{\gamma^{k-\bar{r}}}+\sum_{\gamma=\delta|G_{\mathrm{ab}}|^{1/k}+1}^{2r_{*}}\gamma^{\bar{r}}2^{k}\left(1/\gamma^{k}+1/|G_{\mathrm{ab}}|\right)\\
&\lesssim e^{\delta k}2^{\bar{r}+1-k}+2^{k}\left(\delta|G_{\mathrm{ab}}|^{1/k}\right)^{\bar{r}+1-k}+2^{k}(\log k)^{2(\bar{r}+1)}|G_{\mathrm{ab}}|^{(\bar{r}+1-k)/k}\\
&=o(1)
\end{align*}

as $\bar{r}\asymp 1$ and $k\ll\log|G_{\mathrm{ab}}|$.

Regime: kλlog|Gab|k\eqsim\lambda\log|G_{\mathrm{ab}}|. In this regime s(1+ε)t0/kf(λ)s\geq(1+\varepsilon)t_{0}/k\geq f(\lambda). It follows from Lemma 4.10 that there exists δ~λ(0,1)\tilde{\delta}_{\lambda}\in(0,1) such that

𝔼[gcdr¯𝟏{V0}|𝐭𝐲𝐩]1\displaystyle\mathbb{E}\left[\mathrm{gcd}^{\bar{r}}\mathbf{1}\{V\neq 0\}|\mathrm{\mathbf{typ}}\right]-1 γ=22rγr¯(1δ~λ)k\displaystyle\leq\sum_{\gamma=2}^{2r_{*}}\gamma^{\bar{r}}(1-\tilde{\delta}_{\lambda})^{k}
(logk)2(r¯+1)(1δ~λ)k=o(1).\displaystyle\lesssim(\log k)^{2(\bar{r}+1)}(1-\tilde{\delta}_{\lambda})^{k}=o(1).

Regime: klog|Gab|k\gg\log|G_{\mathrm{ab}}|. In this regime t/k1t_{*}/k\ll 1 and thus for t(1+ε)tt\geq(1+\varepsilon)t_{*} there are two regimes of s=t/ks=t/k to be discussed: s1s\ll 1 and s1s\gtrsim 1. When s1s\gtrsim 1, by the same argument as before we can show 𝔼[gcdr¯𝟏{V0}|𝐭𝐲𝐩]1=o(1)\mathbb{E}\left[\mathrm{gcd}^{\bar{r}}\mathbf{1}\{V\neq 0\}|\mathrm{\mathbf{typ}}\right]-1=o(1).

It remains to treat the case where s1s\ll 1. When loglogkks1\frac{\log\log k}{k}\ll s\ll 1, we apply the bound (gcd(V)=γ,V0|𝐭𝐲𝐩)et\mathbb{P}(\mathrm{gcd}(V)=\gamma,V\neq 0|\mathrm{\mathbf{typ}})\lesssim e^{-t} in Lemma 4.10 to show

𝔼[gcdr¯𝟏{V0}|𝐭𝐲𝐩]1\displaystyle\mathbb{E}\left[\mathrm{gcd}^{\bar{r}}\mathbf{1}\{V\neq 0\}|\mathrm{\mathbf{typ}}\right]-1 γ=22rγr¯et(2r)r¯+1et(logk)2(r¯+1)et=o(1).\displaystyle\leq\sum_{\gamma=2}^{2r_{*}}\gamma^{\bar{r}}e^{-t}\lesssim(2r_{*})^{\bar{r}+1}e^{-t}\lesssim(\log k)^{2(\bar{r}+1)}e^{-t}=o(1).

When sloglogkks\lesssim\frac{\log\log k}{k}, we apply the bound (gcd(V)=γ,V0|𝐭𝐲𝐩)k(2es)γγγ\mathbb{P}(\mathrm{gcd}(V)=\gamma,V\neq 0|\mathrm{\mathbf{typ}})\lesssim\frac{k(2es)^{\gamma}}{\gamma^{\gamma}} in Lemma 4.10 to show

𝔼[gcdr¯𝟏{V0}|𝐭𝐲𝐩]1\displaystyle\mathbb{E}\left[\mathrm{gcd}^{\bar{r}}\mathbf{1}\{V\neq 0\}|\mathrm{\mathbf{typ}}\right]-1 γ=22rγr¯k(2es)γγγγ=2r¯γr¯γk(2es)2+r¯<γ2rk(2es)γ\displaystyle\lesssim\sum_{\gamma=2}^{2r_{*}}\gamma^{\bar{r}}\frac{k(2es)^{\gamma}}{\gamma^{\gamma}}\leq\sum_{\gamma=2}^{\bar{r}}\gamma^{\bar{r}-\gamma}k(2es)^{2}+\sum_{\bar{r}<\gamma\leq 2r_{*}}k(2es)^{\gamma}
Cr¯ks2+Cksr¯Cr¯ks2(loglogk)2k=o(1).\displaystyle\lesssim C_{\bar{r}}ks^{2}+Cks^{\bar{r}}\lesssim C^{\prime}_{\bar{r}}ks^{2}\lesssim\frac{(\log\log k)^{2}}{k}=o(1).

Combining Lemma 4.9 and Lemma 4.11 completes the proof of Proposition 7.

4.7. Preliminary Results on Groups

Throughout this paper, the notation A\langle A\rangle denotes the subgroup generated by the set of elements AGA\subseteq G. Recall from Definition 6 that RR_{\ell} are the representatives of Q=G/G+1Q_{\ell}=G_{\ell}/G_{\ell+1}.

Lemma 4.12.

For fixed h1,,hnR1h_{1},\dots,h_{n}\in R_{1} and U1,,UniidUnif(R)U_{1},\dots,U_{n}\overset{iid}{\sim}\mathrm{Unif}(R_{\ell}) with 1\ell\geq 1, we have

G+2i[n][hi,Ui]Unif({G+2[hi,g]:i[n],gR}).G_{\ell+2}\sum_{i\in[n]}[h_{i},U_{i}]\sim\mathrm{Unif}\left(\langle\{G_{\ell+2}[h_{i},g]:i\in[n],g\in R_{\ell}\}\rangle\right).
Proof.

Write R={gu:u[|R|]}R_{\ell}=\{g_{u}:u\in[|R_{\ell}|]\}. Any G+2x{G+2[hi,g]:i[n],gR}G_{\ell+2}x\in\langle\{G_{\ell+2}[h_{i},g]:i\in[n],g\in R_{\ell}\}\rangle can be expressed as

G+2x=G+2i[n],u[|R|]ci,u[hi,gu]=G+2i[n][hi,u[|R|]ci,ugu]G_{\ell+2}x=G_{\ell+2}\sum_{i\in[n],u\in[|R_{\ell}|]}c_{i,u}[h_{i},g_{u}]=G_{\ell+2}\sum_{i\in[n]}\left[h_{i},\sum_{u\in[|R_{\ell}|]}c_{i,u}g_{u}\right]

for some integer coefficients $\{c_{i,u}:i\in[n],u\in[|R_{\ell}|]\}$. Since $R_{\ell}$ is a set of representatives of $Q_{\ell}$, for each $i\in[n]$ there exists a $g^{\prime}_{i}\in R_{\ell}$ such that $G_{\ell+1}\sum_{u\in[|R_{\ell}|]}c_{i,u}g_{u}=G_{\ell+1}g^{\prime}_{i}$, and thus by Proposition 2,

\[ G_{\ell+2}\sum_{i\in[n]}\left[h_{i},\sum_{u\in[|R_{\ell}|]}c_{i,u}g_{u}\right]=G_{\ell+2}\sum_{i\in[n]}[h_{i},g^{\prime}_{i}]. \]

Therefore, for any $G_{\ell+2}x\in\langle\{G_{\ell+2}[h_{i},g]:i\in[n],g\in R_{\ell}\}\rangle$, there exist $g^{\prime}_{1},\dots,g^{\prime}_{n}\in R_{\ell}$ such that

G+2x=G+2i[n][hi,gi].G_{\ell+2}x=G_{\ell+2}\sum_{i\in[n]}[h_{i},g^{\prime}_{i}].

Hence, for any x{G+2[hi,g]:i[n],gR}x\in\langle\{G_{\ell+2}[h_{i},g]:i\in[n],g\in R_{\ell}\}\rangle, by Proposition 3,

(G+2i[n][hi,Ui]=G+2x)\displaystyle\mathbb{P}(G_{\ell+2}\sum_{i\in[n]}[h_{i},U_{i}]=G_{\ell+2}x) =(G+2i[n][hi,Ui(gi)1]=G+2)=(G+2i[n][hi,Ui]=G+2),\displaystyle=\mathbb{P}(G_{\ell+2}\sum_{i\in[n]}[h_{i},U_{i}\cdot(g^{\prime}_{i})^{-1}]=G_{\ell+2})=\mathbb{P}(G_{\ell+2}\sum_{i\in[n]}[h_{i},U_{i}]=G_{\ell+2}),

which proves the uniformity of G+2i[n][hi,Ui]G_{\ell+2}\sum_{i\in[n]}[h_{i},U_{i}]. ∎

Lemma 4.13.

Let $n,\alpha\in\mathbb{N}$ be fixed and let $p$ be a prime. Suppose $U_{1},U_{2},\dots,U_{n+2}$ are i.i.d. uniform random variables over $\mathbb{Z}_{p^{\alpha}}$. We have

\[ \mathbb{E}\left[\left(\frac{|\mathbb{Z}_{p^{\alpha}}|}{|\langle U_{1},\dots,U_{n+2}\rangle|}\right)^{n}\right]\leq\exp\left(\frac{1}{p^{2}-1}\right). \]
Proof.

Let $A_{i}=\{U_{1},\dots,U_{n+2}\text{ are all divisible by }p^{i}\}$ for $0\leq i\leq\alpha$ and $A_{\alpha+1}=\emptyset$. On the event $A_{i}\setminus A_{i+1}$ we know $\frac{|\mathbb{Z}_{p^{\alpha}}|}{|\langle U_{1},\dots,U_{n+2}\rangle|}=p^{i}$. Also note that $\mathbb{P}(A_{i})=p^{-(n+2)i}$ for $0\leq i\leq\alpha$. It follows that

\begin{align*}
\mathbb{E}\left[\left(\frac{|\mathbb{Z}_{p^{\alpha}}|}{|\langle U_{1},\dots,U_{n+2}\rangle|}\right)^{n}\right] &=\sum_{i=0}^{\alpha}p^{ni}\,\mathbb{P}(A_{i}\setminus A_{i+1})\\
&\leq\sum_{i=0}^{\alpha}p^{ni}\cdot p^{-(n+2)i}\leq\frac{1}{1-p^{-2}}=1+\frac{1}{p^{2}-1}\leq\exp\left(\frac{1}{p^{2}-1}\right).
\end{align*}
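For small parameters the expectation in Lemma 4.13 can be computed exactly by enumeration. The sketch below is an illustration only; the function name `index_moment` and the parameter choices are assumptions. It relies on the fact that in the cyclic group $\mathbb{Z}_{q}$ the subgroup generated by $u_{1},\dots,u_{m}$ has index $\mathrm{gcd}(u_{1},\dots,u_{m},q)$.

```python
from fractions import Fraction
from functools import reduce
from itertools import product
from math import exp, gcd


def index_moment(p, alpha, n):
    """Exact E[(|Z_q| / |<U_1, ..., U_{n+2}>|)^n] for q = p^alpha, by enumeration.

    In Z_q the subgroup generated by u_1, ..., u_m is generated by
    d = gcd(u_1, ..., u_m, q), so its index in Z_q equals d.
    """
    q = p**alpha
    total = Fraction(0)
    for u in product(range(q), repeat=n + 2):
        total += Fraction(reduce(gcd, u, q)) ** n
    return total / q ** (n + 2)


for p, alpha, n in [(2, 2, 1), (3, 1, 2), (2, 3, 1)]:
    value = index_moment(p, alpha, n)
    bound = exp(1 / (p**2 - 1))
    print(f"p={p}, alpha={alpha}, n={n}: E = {value} ~ {float(value):.4f}, bound = {bound:.4f}")
    assert float(value) <= bound
```

Using `Fraction` keeps the expectation exact, so the comparison with $\exp(1/(p^{2}-1))$ is not affected by rounding in the sum.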

Lemma 4.14.

Let HH be a subset of GG. For any [L1]\ell\in[L-1],

|Q+1||{G+2[h,g]:hH,gR}|(|Gab||{G2h:hH}|)r+1.\frac{|Q_{\ell+1}|}{|\langle\{G_{\ell+2}[h,g]:h\in H,g\in R_{\ell}\}\rangle|}\leq\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}h:h\in H\}\rangle|}\right)^{r_{\ell+1}}.
Proof.

For simplicity of notation, write N:={G2h:hH}N:=\langle\{G_{2}h:h\in H\}\rangle and then let λ=|Gab|/|N|\lambda=|G_{\mathrm{ab}}|/|N|. Since GabG_{\mathrm{ab}} is abelian, it can be expressed in the form

Gab=m1mrG_{\mathrm{ab}}=\mathbb{Z}_{m_{1}}\oplus\cdots\oplus\mathbb{Z}_{m_{r}}

for some m1,,mrm_{1},\dots,m_{r}\in\mathbb{N} where rr is the rank of GG. This decomposition allows us to see that λGabN\lambda G_{\mathrm{ab}}\trianglelefteq N. As a consequence,

{G+2[h,g]:G2hλGab,gR}{G+2[h,g]:hH,gR}\langle\{G_{\ell+2}[h,g]:G_{2}h\in\lambda G_{\mathrm{ab}},g\in R_{\ell}\}\rangle\trianglelefteq\langle\{G_{\ell+2}[h,g]:h\in H,g\in R_{\ell}\}\rangle

Note that

{G+2[h,g]:G2hλGab,gR}={λG+2[h,g]:hG,gR}=λQ+1.\langle\{G_{\ell+2}[h,g]:G_{2}h\in\lambda G_{\mathrm{ab}},g\in R_{\ell}\}\rangle=\langle\{\lambda G_{\ell+2}[h,g]:h\in G,g\in R_{\ell}\}\rangle=\lambda Q_{\ell+1}.

We then have

\begin{align*}
\frac{|Q_{\ell+1}|}{|\langle\{G_{\ell+2}[h,g]:h\in H,g\in R_{\ell}\}\rangle|} &\leq\frac{|Q_{\ell+1}|}{|\langle\{G_{\ell+2}[h,g]:G_{2}h\in\lambda G_{\mathrm{ab}},g\in R_{\ell}\}\rangle|}\\
&=\frac{|Q_{\ell+1}|}{|\lambda Q_{\ell+1}|}\leq\lambda^{r_{\ell+1}}.
\end{align*}

Let RR be a non-trivial commutative ring with identity, and let M=(mij)n×nM=(m_{ij})_{n\times n} be a matrix over RR. For a maximal ideal \mathcal{I} of RR, let π:RR/\pi:R\to R/\mathcal{I} be the natural homomorphism. Let πn:Rn(R/)n\pi^{n}:R^{n}\to(R/\mathcal{I})^{n} be defined as πn(g1,,gn)=(π(g1),,π(gn))\pi^{n}(g_{1},\dots,g_{n})=(\pi(g_{1}),\dots,\pi(g_{n})) for g1,,gnRg_{1},\dots,g_{n}\in R. The following result comes from the theory of matrices over a commutative ring and it is stated in [7, p. 259].

Proposition 11.

If M:RnRnM:R^{n}\to R^{n} is a homomorphism, then MM is surjective if and only if for every maximal ideal \mathcal{I} of RR, the map πM:(R/)n(R/)n\pi_{M}:(R/\mathcal{I})^{n}\to(R/\mathcal{I})^{n} is surjective, where πMπn=πnM\pi_{M}\circ\pi^{n}=\pi^{n}\circ M.

We will be interested in R=pαR=\mathbb{Z}_{p^{\alpha}} where pp is a prime. The unique maximal ideal of pα\mathbb{Z}_{p^{\alpha}} is =p\mathcal{I}=\langle p\rangle. Then pα/p\mathbb{Z}_{p^{\alpha}}/\mathcal{I}\cong\mathbb{Z}_{p}.

Lemma 4.15.

Let $\alpha,n\in\mathbb{N}$ and $U:=(U_{1},\dots,U_{n})$ where $U_{i}\overset{iid}{\sim}\mathrm{Unif}(\mathbb{Z}_{p^{\alpha}})$. Let $M$ denote an $n\times n$ matrix over $\mathbb{Z}_{p^{\alpha}}$. If $M:(\mathbb{Z}_{p^{\alpha}})^{n}\to(\mathbb{Z}_{p^{\alpha}})^{n}$ is surjective, then $\tilde{U}:=MU$ has independent entries that are uniform over $\mathbb{Z}_{p^{\alpha}}$.

Proof.

Define a surjective group homomorphism $f:(\mathbb{Z}_{p^{\alpha}})^{n}\to(\mathbb{Z}_{p^{\alpha}})^{n}$ by $f(\bm{x}):=M\bm{x}$ where $\bm{x}\in(\mathbb{Z}_{p^{\alpha}})^{n}$. Since $f$ is surjective, for any $\bm{y}\in(\mathbb{Z}_{p^{\alpha}})^{n}$ there exists some $\bm{y}^{\prime}\in(\mathbb{Z}_{p^{\alpha}})^{n}$ such that $f(\bm{y}^{\prime})=\bm{y}$. Since $U-\bm{y}^{\prime}$ is again uniform, it follows that for any $\bm{y}\in(\mathbb{Z}_{p^{\alpha}})^{n}$,

\begin{align*}
\mathbb{P}(\tilde{U}=\bm{y}) &=\mathbb{P}(f(U)=\bm{y})=\mathbb{P}(f(U-\bm{y}^{\prime})=0)=\mathbb{P}(f(U)=0)=\mathbb{P}(\tilde{U}=0)\\
&=\frac{|\mathrm{Ker}(f)|}{|(\mathbb{Z}_{p^{\alpha}})^{n}|}=\frac{1}{|\mathrm{Im}(f)|}=\frac{1}{p^{\alpha n}},
\end{align*}

where the last equality uses the fact that $f$ is surjective. Thus $\tilde{U}$ is uniform on $(\mathbb{Z}_{p^{\alpha}})^{n}$, which is equivalent to its entries being independent and uniform over $\mathbb{Z}_{p^{\alpha}}$. ∎
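Lemma 4.15 can be illustrated by enumerating the law of $MU$ over $\mathbb{Z}_{4}$. In the sketch below the two matrices and the function name `law_of_image` are illustrative assumptions; the first matrix has determinant that is a unit modulo 4 and is therefore surjective, while the second is not.

```python
from collections import Counter
from itertools import product


def law_of_image(M, q):
    """Count how often each vector M u (mod q) occurs as u ranges over (Z_q)^n."""
    n = len(M)
    counts = Counter()
    for u in product(range(q), repeat=n):
        image = tuple(sum(M[i][j] * u[j] for j in range(n)) % q for i in range(n))
        counts[image] += 1
    return counts


q = 4                            # Z_{p^alpha} with p = 2, alpha = 2 (illustrative)
surjective = [[1, 2], [3, 1]]    # det = -5 is odd, hence a unit mod 4
degenerate = [[2, 0], [0, 1]]    # det = 2 is not a unit mod 4

for name, M in [("surjective", surjective), ("degenerate", degenerate)]:
    counts = law_of_image(M, q)
    print(name, "distinct images:", len(counts), "multiplicities:", set(counts.values()))
```

In the surjective case every vector of $(\mathbb{Z}_{4})^{2}$ appears equally often, so the entries of $MU$ are independent and uniform; in the degenerate case the first entry only takes even values.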

4.8. Proof of Proposition 9

4.8.1. Definitions

To prepare for the proof of Proposition 9, we first introduce several useful quantities and describe the selection of the good event $\mathcal{A}$ and the index set $\mathcal{K}\subseteq[k]$. The reasons behind these definitions will become clearer as we proceed with the proof of Proposition 9.

To simplify notation we define the following matrix.

Definition 8.

Recall the definition of $(m_{ba})_{a,b\in[k],a<b}$ from (34). Define

\[ \hat{m}_{ba}=\begin{cases}m_{ba}&\text{if }a<b,\\ -m_{ab}&\text{if }b<a,\\ 0&\text{if }b=a.\end{cases} \tag{44} \]
Definition 9.

Let A=(Aba)a,b[k]A=(A_{ba})_{a,b\in[k]} be a k×kk\times k matrix with entries in \mathbb{Z}, and let 𝒦:=𝒦(A)[k]\mathcal{K}:=\mathcal{K}(A)\subseteq[k] denote a subset of indices which is known when AA is given. For b[k]b\in[k], we define χb(A)\chi_{b}(A), and respectively ψb(A)\psi_{b}(A), to be an element in GG that satisfies

G2χb(A)=G2(a[k]AbaZa,1),G_{2}\chi_{b}(A)=G_{2}\left(\sum_{a\in[k]}A_{ba}Z_{a,1}\right),

and respectively

G2ψb(A)=G2(a[k]:a<bAbaZa,1+a𝒦c:a>bAbaZa,1).G_{2}\psi_{b}(A)=G_{2}\left(\sum_{a\in[k]:a<b}A_{ba}Z_{a,1}+\sum_{a\in\mathcal{K}^{c}:a>b}A_{ba}Z_{a,1}\right).

In particular, for b[k]b\in[k], define χb:=χb(m^)\chi_{b}:=\chi_{b}(\hat{m}) and ψb:=ψb(m^)\psi_{b}:=\psi_{b}(\hat{m}).

Since by definition

G2χb=G2(a[k]m^baZa,1),G_{2}\chi_{b}=G_{2}\left(\sum_{a\in[k]}\hat{m}_{ba}Z_{a,1}\right),

with slight abuse of notation we will write

χb=a[k]m^baZa,1for b[k],\chi_{b}=\sum_{a\in[k]}\hat{m}_{ba}Z_{a,1}\quad\quad\text{for }b\in[k], (45)

so that we can write, again with slight abuse of notation,

χ:=(χ1,,χk)T=m^(Z1,1,,Zk,1)T,\chi:=(\chi_{1},\dots,\chi_{k})^{T}=\hat{m}(Z_{1,1},\dots,Z_{k,1})^{T},

which is the product of the matrix m^\hat{m} and the vector (Z1,1,,Zk,1)(Z_{1,1},\dots,Z_{k,1}).


Let

K:==2Lr+2.K:=\sum_{\ell=2}^{L}r_{\ell}+2. (46)

For a K×KK\times K submatrix MM of m^\hat{m}, define the matrix Mp:=MmodpM_{p}:=M\mod p, i.e., Mp(i,j):=M(i,j)modpM_{p}(i,j):=M(i,j)\mod p for the (i,j)(i,j)-th entry in MM. Note that MpM_{p} is a matrix over the field 𝔽p\mathbb{F}_{p} and hence its row rank is defined as the number of linearly independent rows in the matrix.

Definition 10.

A fixed $k\times k$ matrix $(A_{ba})_{a,b\in[k]}$ is said to be good if it satisfies the following two conditions:

  (i) There exists a set $\mathcal{K}\subseteq[k]$ such that

    \[ \{G_{2}\psi_{b}(A),G_{2}\chi_{b}(A):b\in\mathcal{K}\}\text{ are independent of }\{G_{2}Z_{b,1}:b\in\mathcal{K}\}. \tag{47} \]

  (ii) Let $\Gamma:=\{p:p\text{ is a prime that divides }|G_{\mathrm{ab}}|\}$. For each $p\in\Gamma$ there exists a $K\times K$ submatrix $M$ of $(A_{ba})_{b\in\mathcal{K},a\in[k]}$ such that $M_{p}:=M\bmod p$ has rank $K$ over the field $\mathbb{F}_{p}$, where as above $K=\sum_{\ell=2}^{L}r_{\ell}+2$.

Define 𝒜:={m^ is a good matrix}\mathcal{A}:=\{\text{$\hat{m}$ is a good matrix}\} and let 𝒦\mathcal{K} be the corresponding subset of indices satisfying (i).

Note that by definition both 𝒜\mathcal{A} and 𝒦\mathcal{K} are measurable with respect to ~\widetilde{\mathcal{H}}.
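Condition (ii) of Definition 10 asks for the rank of an integer matrix reduced modulo a prime. A minimal sketch of such a check via Gaussian elimination over $\mathbb{F}_{p}$ is given below; the function name `rank_mod_p` and the example matrices are assumptions for illustration and are not part of the paper's argument (the modular inverse via `pow(x, -1, p)` needs Python 3.8+).

```python
def rank_mod_p(M, p):
    """Rank of the integer matrix M reduced modulo the prime p (Gaussian elimination)."""
    rows = [[x % p for x in row] for row in M]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        # Find a row at or below position `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)           # inverse in F_p
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        # Clear the column in every other row.
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [(x - factor * y) % p for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


# Illustrative antisymmetric matrix, mimicking the shape of m-hat in (44).
example = [[0, 1, 2], [-1, 0, 3], [-2, -3, 0]]
print(rank_mod_p(example, 5))
print(rank_mod_p([[1, 2], [3, 4]], 2), rank_mod_p([[1, 2], [3, 4]], 3))
```

The same matrix can have full rank modulo one prime and drop rank modulo another, which is why the condition is imposed separately for each prime $p\in\Gamma$.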

4.8.2. Outline of Proof

To prove Proposition 9, we derive an upper bound on (X=X|𝒜,V=0,𝐭𝐲𝐩)\mathbb{P}(X=X^{\prime}|\mathcal{A},V=0,\mathrm{\mathbf{typ}}) inductively using the following proposition.

Proposition 12.

Let 𝒦[k]\mathcal{K}\subseteq[k] be measurable with respect to ~\widetilde{\mathcal{H}}. For 2L12\leq\ell\leq L-1, letting

H𝒦,+1:={G+2[χb,g]:b𝒦,gR},H_{\mathcal{K},\ell+1}:=\langle\{G_{\ell+2}[\chi_{b},g]:b\in\mathcal{K},g\in R_{\ell}\}\rangle,

we have

𝟏{+1,V=0}(+2|1,~)𝟏{+1,V=0}|H𝒦,+1|1.\mathbf{1}\{\mathcal{E}_{\ell+1},V=0\}\cdot\mathbb{P}(\mathcal{E}_{\ell+2}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})\leq\mathbf{1}\{\mathcal{E}_{\ell+1},V=0\}\cdot|H_{\mathcal{K},\ell+1}|^{-1}.

Applying Lemma 4.14 to H𝒦,+1H_{\mathcal{K},\ell+1} gives

|H𝒦,+1|11|Q+1|(|Gab||{G2χb:b𝒦}|)r+1.|H_{\mathcal{K},\ell+1}|^{-1}\leq\frac{1}{|Q_{\ell+1}|}\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{r_{\ell+1}}. (48)

Our goal is to choose 𝒦[k]\mathcal{K}\subseteq[k] properly so that {G2χb:b𝒦}\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle is sufficiently large compared to GabG_{\mathrm{ab}}. To guarantee such a choice of 𝒦\mathcal{K} exists, we further define a “good” event 𝒜\mathcal{A}, see Definition 10, which is measurable with respect to ~\widetilde{\mathcal{H}} and occurs with high probability.

Note that 𝒜,𝐭𝐲𝐩\mathcal{A},\mathrm{\mathbf{typ}} and {V=0}\{V=0\} are measurable with respect to ~\widetilde{\mathcal{H}}. By the tower property of conditional expectation and the fact that σ(1,~)σ(1,~)\sigma(\mathcal{F}_{1},\widetilde{\mathcal{H}})\subseteq\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}}) for 2L12\leq\ell\leq L-1, Proposition 12 leads to

(+2,𝒜,V=0,𝐭𝐲𝐩|1,~)\displaystyle\mathbb{P}(\mathcal{E}_{\ell+2},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\mathcal{F}_{1},\widetilde{\mathcal{H}}) =𝔼[(+2,𝒜,V=0,𝐭𝐲𝐩|1,~)|1,~]\displaystyle=\mathbb{E}\left[\mathbb{P}(\mathcal{E}_{\ell+2},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})\big{|}\mathcal{F}_{1},\widetilde{\mathcal{H}}\right]
|H𝒦,+1|1(+1,𝒜,V=0,𝐭𝐲𝐩|1,~),\displaystyle\leq|H_{\mathcal{K},\ell+1}|^{-1}\cdot\mathbb{P}(\mathcal{E}_{\ell+1},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\mathcal{F}_{1},\widetilde{\mathcal{H}}),

which implies, by iterating over $\ell=L-1,\dots,2$,

\[ \mathbb{P}(\mathcal{E}_{L+1},\mathcal{A},V=0,\mathrm{\mathbf{typ}}\mid\mathcal{F}_{1},\widetilde{\mathcal{H}})\leq\left(\prod_{\ell=2}^{L-1}|H_{\mathcal{K},\ell+1}|^{-1}\right)\cdot\mathbb{P}(\mathcal{E}_{3},\mathcal{A},V=0,\mathrm{\mathbf{typ}}\mid\mathcal{F}_{1},\widetilde{\mathcal{H}}). \]

Combined with (48), the above yields

\begin{align*}
&\mathbb{P}(\mathcal{E}_{L+1},\mathcal{A},V=0,\mathrm{\mathbf{typ}}\mid\mathcal{F}_{1},\widetilde{\mathcal{H}})\\
&\quad\leq\left(\prod_{\ell=2}^{L-1}\frac{1}{|Q_{\ell+1}|}\right)\cdot\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{\sum_{\ell=2}^{L-1}r_{\ell+1}}\mathbb{P}(\mathcal{E}_{3},\mathcal{A},V=0,\mathrm{\mathbf{typ}}\mid\mathcal{F}_{1},\widetilde{\mathcal{H}}). \tag{49}
\end{align*}

Upper bounding the expectation of the right-hand side of (49) is the key to proving Proposition 9. The choice of $\mathcal{A}$ and $\mathcal{K}$ in Definition 10 is made so that this expectation leads to the desired result.

In Section 4.8.3 we prove Proposition 12. We complete the proof of Proposition 9 in Section 4.8.4 and finish the proof of a key lemma in Section 4.8.5.

4.8.3. Proof of Proposition 12

The key to proving Proposition 12 is the simplification of G+2X(X)1G_{\ell+2}X(X^{\prime})^{-1}. The analysis in this section is somewhat similar to that in Section 4.5, where we obtained a simplified expression of G+1X(X)1G_{\ell+1}X(X^{\prime})^{-1} when V0V\neq 0. However, when V=0V=0 the result of simplification is quite different. Instead of a simple quantity of the form G+1a[k]VaZa,G_{\ell+1}\sum_{a\in[k]}V_{a}Z_{a,\ell} (see Lemma 4.8), we now have to deal with an expression involving commutators of {Za,u:a[k],u[L]}\{Z_{a,u}:a\in[k],u\in[L]\}.

Recall from Definition 7 the definition of $\widetilde{\mathcal{H}}$, $\{\mathcal{F}_{\ell}:\ell\in[L]\}$ and $\{\mathcal{E}_{\ell}:\ell\in[L+1]\}$. Define

\[ \mathbf{X}^{(\ell+1)}:=\prod_{a,b\in[k]:a<b}\left[\prod_{i=1}^{\ell-1}Z_{a,i},\prod_{j=1}^{\ell-1}Z_{b,j}\right]^{m_{ba}}\varphi^{(\ell-2)}(\{Z_{a,u}:a\in[k],u\leq\ell-2\}), \tag{50} \]

which comes from the right hand side of (4.5).

Lemma 4.16.

On {+1,V=0}\{\mathcal{E}_{\ell+1},V=0\} we have

G+2X(X)1=G+2a,b[k]:a<bmba([Za,1,Zb,]+[Za,,Zb,1])+G+2(φ~𝐗(+1)),G_{\ell+2}X(X^{\prime})^{-1}=G_{\ell+2}\sum_{a,b\in[k]:a<b}m_{ba}([Z_{a,1},Z_{b,\ell}]+[Z_{a,\ell},Z_{b,1}])+G_{\ell+2}(\tilde{\varphi}\mathbf{X}^{(\ell+1)}),

where G+2(φ~𝐗(+1))Q+1G_{\ell+2}(\tilde{\varphi}\mathbf{X}^{(\ell+1)})\in Q_{\ell+1} is measurable with respect to σ(1,~)\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}}) and φ~:=φ~({Za,u:a[k],u1})\tilde{\varphi}:=\tilde{\varphi}(\{Z_{a,u}:a\in[k],u\leq\ell-1\}) is a polynomial whose definition will be clarified in the proof.

Proof.

Recall from Lemma 4.6 that {+1,V=0}\{\mathcal{E}_{\ell+1},V=0\} is measurable with respect to σ(1,~)\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}}). On the event {+1,V=0}\{\mathcal{E}_{\ell+1},V=0\} we can write

G+2X(X)1\displaystyle G_{\ell+2}X(X^{\prime})^{-1} =G+2a,b[k]:a<b[i=1Za,i,j=1Zb,j]mbaφ(1)({Za,u:a[k],u1}),\displaystyle=G_{\ell+2}\prod_{a,b\in[k]:a<b}\left[\prod_{i=1}^{\ell}Z_{a,i},\prod_{j=1}^{\ell}Z_{b,j}\right]^{m_{ba}}\varphi^{(\ell-1)}(\{Z_{a,u}:a\in[k],u\leq\ell-1\}), (51)

where φ(1)\varphi^{(\ell-1)} is defined analogously to φ(2)\varphi^{(\ell-2)} in (4.5). Since all terms in φ(2)\varphi^{(\ell-2)} are present in φ(1)\varphi^{(\ell-1)} we can express G+2φ(1):=G+2φ(2)φ~,G_{\ell+2}\varphi^{(\ell-1)}:=G_{\ell+2}\varphi^{(\ell-2)}\cdot\tilde{\varphi}, where φ~:=φ~({Za,u:a[k],u1})\tilde{\varphi}:=\tilde{\varphi}(\{Z_{a,u}:a\in[k],u\leq\ell-1\}) is the polynomial that comes from excluding all the terms in G+2φ(2)G_{\ell+2}\varphi^{(\ell-2)} from G+2φ(1)G_{\ell+2}\varphi^{(\ell-1)}.

We can rewrite (51) as

G+2X(X)1\displaystyle G_{\ell+2}X(X^{\prime})^{-1} =G+2a,b[k]:a<b([Za,1,Zb,][Za,,Zb,1])mbaφ~𝐗(+1).\displaystyle=G_{\ell+2}\prod_{a,b\in[k]:a<b}([Z_{a,1},Z_{b,\ell}]\cdot[Z_{a,\ell},Z_{b,1}])^{m_{ba}}\cdot\tilde{\varphi}\cdot\mathbf{X}^{(\ell+1)}. (52)

Note that G+2φ~G+1G_{\ell+2}\tilde{\varphi}\in G_{\ell+1}, as every ii-fold commutator with i3i\geq 3 in φ~\tilde{\varphi} must involve a Za,1Z_{a,\ell-1} for some a[k]a\in[k]. It follows from the proof of Lemma 4.6 that 𝐗(+1)G+1\mathbf{X}^{(\ell+1)}\in G_{\ell+1} on {+1,V=0}\{\mathcal{E}_{\ell+1},V=0\}. Furthermore, it is easy to see that both G+2φ~G_{\ell+2}\tilde{\varphi} and G+2𝐗(+1)G_{\ell+2}\mathbf{X}^{(\ell+1)} are measurable with respect to σ(1,~)\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}}) as they only involve terms in {Za,i:1i1}\{Z_{a,i}:1\leq i\leq\ell-1\}.

Since Q+1=G+1/G+2Q_{\ell+1}=G_{\ell+1}/G_{\ell+2} is abelian, we can equivalently write (52) in terms of addition and obtain the desired expression. ∎


Proof of Proposition 12. For simplicity of notation, let fG+1f\in G_{\ell+1} be such that

G+2f:=G+2a,b[k]:a<bmba([Za,1,Zb,]+[Za,,Zb,1]).G_{\ell+2}f:=G_{\ell+2}\sum_{a,b\in[k]:a<b}m_{ba}([Z_{a,1},Z_{b,\ell}]+[Z_{a,\ell},Z_{b,1}]).

It follows from Lemma 4.16 that G+2X(X)1=G+2f+G+2(φ~𝐗(+1))G_{\ell+2}X(X^{\prime})^{-1}=G_{\ell+2}f+G_{\ell+2}(\tilde{\varphi}\mathbf{X}^{(\ell+1)}) on {+1,V=0}\{\mathcal{E}_{\ell+1},V=0\} and hence

𝟏{+1,V=0}(+2|1,~)\displaystyle\mathbf{1}\{\mathcal{E}_{\ell+1},V=0\}\cdot\mathbb{P}(\mathcal{E}_{\ell+2}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}}) =𝟏{+1,V=0}(G+2f+G+2(φ~𝐗(+1))=G+2|1,~)\displaystyle=\mathbf{1}\{\mathcal{E}_{\ell+1},V=0\}\cdot\mathbb{P}(G_{\ell+2}f+G_{\ell+2}(\tilde{\varphi}\mathbf{X}^{(\ell+1)})=G_{\ell+2}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})
maxgG+1𝟏{+1,V=0}(G+2f=G+2g|1,~)\displaystyle\leq\max_{g_{\ell}\in G_{\ell+1}}\mathbf{1}\{\mathcal{E}_{\ell+1},V=0\}\cdot\mathbb{P}(G_{\ell+2}f=G_{\ell+2}g_{\ell}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}}) (53)

Let 𝒢,𝒦c\mathcal{G}_{\ell,\mathcal{K}^{c}} denote the σ\sigma-field generated by {Za,:a[k]\𝒦}\{Z_{a,\ell}:a\in[k]\backslash\mathcal{K}\}. Observe that

G+2f\displaystyle G_{\ell+2}f =G+2a,b[k]:a<bmba([Za,1,Zb,]+[Za,,Zb,1])\displaystyle=G_{\ell+2}\sum_{a,b\in[k]:a<b}m_{ba}([Z_{a,1},Z_{b,\ell}]+[Z_{a,\ell},Z_{b,1}])
=G+2b𝒦,a[k]:a<bmba[Za,1,Zb,]+G+2a𝒦,b[k]:a<bmba[Za,,Zb,1]\displaystyle=G_{\ell+2}\sum_{b\in\mathcal{K},a\in[k]:a<b}m_{ba}[Z_{a,1},Z_{b,\ell}]+G_{\ell+2}\sum_{a\in\mathcal{K},b\in[k]:a<b}m_{ba}[Z_{a,\ell},Z_{b,1}]
+G+2b𝒦c,a[k]:a<bmba[Za,1,Zb,]+G+2a𝒦c,b[k]:a<bmba[Za,,Zb,1],\displaystyle\quad+G_{\ell+2}\sum_{b\in\mathcal{K}^{c},a\in[k]:a<b}m_{ba}[Z_{a,1},Z_{b,\ell}]+G_{\ell+2}\sum_{a\in\mathcal{K}^{c},b\in[k]:a<b}m_{ba}[Z_{a,\ell},Z_{b,1}],
=:G+2funknown+G+2fknown\displaystyle=:G_{\ell+2}f_{unknown}+G_{\ell+2}f_{known} (54)

where the terms on the second-to-last line are measurable with respect to $\sigma(\mathcal{G}_{\ell,\mathcal{K}^{c}},\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})$ and are therefore denoted by $G_{\ell+2}f_{known}$. It remains to consider the terms on the third-to-last line, i.e.,

G+2funknown:=\displaystyle G_{\ell+2}f_{unknown}:= G+2b𝒦,a[k]:a<bmba[Za,1,Zb,]+G+2a𝒦,b[k]:a<bmba[Za,,Zb,1]\displaystyle G_{\ell+2}\sum_{b\in\mathcal{K},a\in[k]:a<b}m_{ba}[Z_{a,1},Z_{b,\ell}]+G_{\ell+2}\sum_{a\in\mathcal{K},b\in[k]:a<b}m_{ba}[Z_{a,\ell},Z_{b,1}]
=\displaystyle= G+2b𝒦[a[k]:a<bmbaZa,1a[k]:b<amabZa,1,Zb,]\displaystyle G_{\ell+2}\sum_{b\in\mathcal{K}}\left[\sum_{a\in[k]:a<b}m_{ba}Z_{a,1}-\sum_{a\in[k]:b<a}m_{ab}Z_{a,1},Z_{b,\ell}\right]
=:\displaystyle=: G+2b𝒦[χb,Zb,],\displaystyle G_{\ell+2}\sum_{b\in\mathcal{K}}[\chi_{b},Z_{b,\ell}], (55)

where χb\chi_{b} is as in Definition 9, i.e., χb\chi_{b} is such that G2χb=G2(a[k]:a<bmbaZa,1a[k]:b<amabZa,1)G_{2}\chi_{b}=G_{2}(\sum_{a\in[k]:a<b}m_{ba}Z_{a,1}-\sum_{a\in[k]:b<a}m_{ab}Z_{a,1}).

Lemma 4.12 shows that G+2b𝒦[χb,Zb,]G_{\ell+2}\sum_{b\in\mathcal{K}}[\chi_{b},Z_{b,\ell}] is uniform over {G+2[χb,g]:b𝒦,gR}\langle\{G_{\ell+2}[\chi_{b},g]:b\in\mathcal{K},g\in R_{\ell}\}\rangle, which is measurable with respect to σ(1,~)\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}}) since (χb)b𝒦(\chi_{b})_{b\in\mathcal{K}} are measurable with respect to σ(1,~)σ(1,~)\sigma(\mathcal{F}_{1},\widetilde{\mathcal{H}})\subseteq\sigma(\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}}). It turns out that if we further condition on the σ\sigma-field 𝒢,𝒦c\mathcal{G}_{\ell,\mathcal{K}^{c}}, one has that for any gG+1g_{\ell}\in G_{\ell+1},

\begin{align}
&\mathbb{P}(G_{\ell+2}f=G_{\ell+2}g_{\ell}\mid\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}},\mathcal{G}_{\ell,\mathcal{K}^{c}}) \nonumber\\
={}&\mathbb{P}\Big(G_{\ell+2}\sum_{b\in\mathcal{K}}[\chi_{b},Z_{b,\ell}]+G_{\ell+2}f_{known}=G_{\ell+2}g_{\ell}\,\Big|\,\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}},\mathcal{G}_{\ell,\mathcal{K}^{c}}\Big) \nonumber\\
\leq{}&\max_{\tilde{g}_{\ell}\in G_{\ell+1}}\mathbb{P}\Big(G_{\ell+2}\sum_{b\in\mathcal{K}}[\chi_{b},Z_{b,\ell}]=G_{\ell+2}\tilde{g}_{\ell}\,\Big|\,\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}},\mathcal{G}_{\ell,\mathcal{K}^{c}}\Big) \nonumber\\
\leq{}&|\langle\{G_{\ell+2}[\chi_{b},g]:b\in\mathcal{K},g\in R_{\ell}\}\rangle|^{-1}=|H_{\mathcal{K},\ell+1}|^{-1}. \tag{56}
\end{align}

Therefore, we can bound (4.8.3) from above by

(G+2f=G+2g|1,~)\displaystyle\mathbb{P}(G_{\ell+2}f=G_{\ell+2}g_{\ell}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}}) =𝔼𝒢,𝒦c[(G+2f=G+2g|1,~,𝒢,𝒦c)]|H𝒦,+1|1.\displaystyle=\mathbb{E}_{\mathcal{G}_{\ell,\mathcal{K}^{c}}}\left[\mathbb{P}(G_{\ell+2}f=G_{\ell+2}g_{\ell}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}},\mathcal{G}_{\ell,\mathcal{K}^{c}})\right]\leq|H_{\mathcal{K},\ell+1}|^{-1}. (57)

where 𝔼𝒢,𝒦c[]\mathbb{E}_{\mathcal{G}_{\ell,\mathcal{K}^{c}}}[\cdot] means we are taking the expectation over {Za,:a[k]\𝒦}\{Z_{a,\ell}:a\in[k]\backslash\mathcal{K}\}. Combining (4.8.3), (4.8.3) and (57) then yields the conclusion of Proposition 12, i.e.,

𝟏{+1,V=0}(+2|1,~)𝟏{+1,V=0}|H𝒦,+1|1.\mathbf{1}\{\mathcal{E}_{\ell+1},V=0\}\cdot\mathbb{P}(\mathcal{E}_{\ell+2}|\mathcal{F}_{\ell-1},\widetilde{\mathcal{H}})\leq\mathbf{1}\{\mathcal{E}_{\ell+1},V=0\}\cdot|H_{\mathcal{K},\ell+1}|^{-1}.

4.8.4. Proof of Proposition 9

We begin by addressing the last term in (4.8.2). Recall the definition of (χb)b[k],(ψb)b[k](\chi_{b})_{b\in[k]},(\psi_{b})_{b\in[k]} from Definition 9. Recall that 𝒦\mathcal{K} is measurable with respect to ~\widetilde{\mathcal{H}}.

Lemma 4.17.

Let σ(𝒢,~):=σ((G2ψb)b𝒦,(G2χb)b𝒦,~)\sigma(\mathcal{G},\widetilde{\mathcal{H}}):=\sigma((G_{2}\psi_{b})_{b\in\mathcal{K}},(G_{2}\chi_{b})_{b\in\mathcal{K}},\widetilde{\mathcal{H}}). Then

(3,𝒜,V=0,𝐭𝐲𝐩|𝒢,~)𝟏{𝒜,V=0,𝐭𝐲𝐩}|Q2|(|Gab||{G2ψb:b𝒦}|)r2.\mathbb{P}(\mathcal{E}_{3},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\mathcal{G},\widetilde{\mathcal{H}})\leq\frac{\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}}{|Q_{2}|}\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\psi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{r_{2}}.
Proof.

Observe that

G3X(X)1\displaystyle G_{3}X(X^{\prime})^{-1} =G3a<bmba[Za,1,Zb,1]=G3b[k]a<bmba[Za,1,Zb,1]\displaystyle=G_{3}\sum_{a<b}m_{ba}[Z_{a,1},Z_{b,1}]=G_{3}\sum_{b\in[k]}\sum_{a<b}m_{ba}[Z_{a,1},Z_{b,1}]
=G3b𝒦,a[k]:a<bmba[Za,1,Zb,1]+G3b𝒦c,a𝒦:a<bmba[Za,1,Zb,1]\displaystyle=G_{3}\sum_{b\in\mathcal{K},a\in[k]:a<b}m_{ba}[Z_{a,1},Z_{b,1}]+G_{3}\sum_{b\in\mathcal{K}^{c},a\in\mathcal{K}:a<b}m_{ba}[Z_{a,1},Z_{b,1}]
+G3b𝒦c,a𝒦c:a<bmba[Za,1,Zb,1]\displaystyle\quad+G_{3}\sum_{b\in\mathcal{K}^{c},a\in\mathcal{K}^{c}:a<b}m_{ba}[Z_{a,1},Z_{b,1}]
=G3b𝒦,a[k]:a<bmba[Za,1,Zb,1]G3b𝒦,a𝒦c:a>bmab[Za,1,Zb,1]\displaystyle=G_{3}\sum_{b\in\mathcal{K},a\in[k]:a<b}m_{ba}[Z_{a,1},Z_{b,1}]-G_{3}\sum_{b\in\mathcal{K},a\in\mathcal{K}^{c}:a>b}m_{ab}[Z_{a,1},Z_{b,1}]
+G3b𝒦c,a𝒦c:a<bmba[Za,1,Zb,1]\displaystyle\quad+G_{3}\sum_{b\in\mathcal{K}^{c},a\in\mathcal{K}^{c}:a<b}m_{ba}[Z_{a,1},Z_{b,1}]
=:G3f~+G3f~c,\displaystyle=:G_{3}\tilde{f}+G_{3}\tilde{f}_{c}, (58)

where

G3f~c:=G3b𝒦c,a𝒦c:a<bmba[Za,1,Zb,1],G_{3}\tilde{f}_{c}:=G_{3}\sum_{b\in\mathcal{K}^{c},a\in\mathcal{K}^{c}:a<b}m_{ba}[Z_{a,1},Z_{b,1}],

and

G3f~\displaystyle G_{3}\tilde{f} :=G3b𝒦,a[k]:a<bmba[Za,1,Zb,1]G3b𝒦,a𝒦c:a>bmab[Za,1,Zb,1]\displaystyle:=G_{3}\sum_{b\in\mathcal{K},a\in[k]:a<b}m_{ba}[Z_{a,1},Z_{b,1}]-G_{3}\sum_{b\in\mathcal{K},a\in\mathcal{K}^{c}:a>b}m_{ab}[Z_{a,1},Z_{b,1}]
=G3b𝒦[a[k]:a<bmbaZa,1+a𝒦c:a>bm^baZa,1,Zb,1]=:G3b𝒦[ψb,Zb,1],\displaystyle=G_{3}\sum_{b\in\mathcal{K}}\left[\sum_{a\in[k]:a<b}m_{ba}Z_{a,1}+\sum_{a\in\mathcal{K}^{c}:a>b}\hat{m}_{ba}Z_{a,1},Z_{b,1}\right]=:G_{3}\sum_{b\in\mathcal{K}}\left[\psi_{b},Z_{b,1}\right],

where ψb\psi_{b} is as in Definition 9.

By Definition 10, on 𝒜\mathcal{A} there exists a set 𝒦[k]\mathcal{K}\subseteq[k] such that conditionally on ~\widetilde{\mathcal{H}}, (G2χb)b𝒦,(G2ψb)b𝒦(G_{2}\chi_{b})_{b\in\mathcal{K}},(G_{2}\psi_{b})_{b\in\mathcal{K}} are independent from (G2Zb,1)b𝒦(G_{2}Z_{b,1})_{b\in\mathcal{K}}. Hence, conditioning on σ(𝒢,~)\sigma(\mathcal{G},\widetilde{\mathcal{H}}), (G2Zb,1)b𝒦(G_{2}Z_{b,1})_{b\in\mathcal{K}} are i.i.d. uniform over GabG_{\mathrm{ab}}. In addition note that G3f~cG_{3}\tilde{f}_{c} is independent from (Zb,1)b𝒦(Z_{b,1})_{b\in\mathcal{K}} since G3f~cG_{3}\tilde{f}_{c} only involves (Zb,1)b𝒦c(Z_{b,1})_{b\in\mathcal{K}^{c}}. Hence,

(G3X(X)1=G3|𝒢,~)\displaystyle\mathbb{P}(G_{3}X(X^{\prime})^{-1}=G_{3}|\mathcal{G},\widetilde{\mathcal{H}})
=\displaystyle= (G3b𝒦[ψb,Zb,1]=G3f~c|𝒢,~)maxg~G2(G3b𝒦[ψb,Zb,1]=G3g~|𝒢,~)\displaystyle\mathbb{P}\left(G_{3}\sum_{b\in\mathcal{K}}[\psi_{b},Z_{b,1}]=-G_{3}\tilde{f}_{c}\bigg{|}\mathcal{G},\widetilde{\mathcal{H}}\right)\leq\max_{\tilde{g}\in G_{2}}\mathbb{P}\left(G_{3}\sum_{b\in\mathcal{K}}[\psi_{b},Z_{b,1}]=G_{3}\tilde{g}\bigg{|}\mathcal{G},\widetilde{\mathcal{H}}\right)
\displaystyle\leq |{G3[ψb,g]:b𝒦,gR1}|11|Q2|(|Gab||{G2ψb:b𝒦}|)r2,\displaystyle|\langle\{G_{3}[\psi_{b},g]:b\in\mathcal{K},g\in R_{1}\}\rangle|^{-1}\leq\frac{1}{|Q_{2}|}\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\psi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{r_{2}},

where the last line follows from Lemma 4.12 and Lemma 4.18. ∎

The last ingredient needed to complete the proof of Proposition 9 is the following estimate, whose proof is deferred to Section 4.8.5.

Lemma 4.18.

Let (χb)b[k],(ψb)b[k](\chi_{b})_{b\in[k]},(\psi_{b})_{b\in[k]} be defined as in Definition 9. Let 𝒜\mathcal{A} and 𝒦\mathcal{K} be defined as in Definition 10. Then

𝟏{𝒜,V=0,𝐭𝐲𝐩}𝔼[(|Gab||{G2χb:b𝒦}|)K2|~]𝟏{𝒜,V=0,𝐭𝐲𝐩}exp(i=1ri2)\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\mathbb{E}\left[\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2}\bigg{|}\widetilde{\mathcal{H}}\right]\leq\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\exp\left(\sum_{i=1}^{\infty}\frac{r}{i^{2}}\right)

and

𝟏{𝒜,V=0,𝐭𝐲𝐩}𝔼[(|Gab||{G2ψb:b𝒦}|)K2|~]𝟏{𝒜,V=0,𝐭𝐲𝐩}exp(i=1ri2).\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\mathbb{E}\left[\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\psi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2}\bigg{|}\widetilde{\mathcal{H}}\right]\leq\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\exp\left(\sum_{i=1}^{\infty}\frac{r}{i^{2}}\right).

Proof of Proposition 9. Recall that $\sigma(\mathcal{G},\widetilde{\mathcal{H}})=\sigma((G_{2}\psi_{b})_{b\in\mathcal{K}},(G_{2}\chi_{b})_{b\in\mathcal{K}},\widetilde{\mathcal{H}})$, so that $(G_{2}\chi_{b})_{b\in\mathcal{K}}$ are measurable with respect to $\sigma(\mathcal{G},\widetilde{\mathcal{H}})$. Recall also that $K=\sum_{\ell=2}^{L}r_{\ell}+2$. Noting that $\sigma(\mathcal{G},\widetilde{\mathcal{H}})\subseteq\sigma(\mathcal{F}_{1},\widetilde{\mathcal{H}})$, applying the tower property to (4.8.2) gives

(L+1,𝒜,V=0,𝐭𝐲𝐩|𝒢,~)\displaystyle\mathbb{P}(\mathcal{E}_{L+1},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\mathcal{G},\widetilde{\mathcal{H}})
\displaystyle\leq (=2L11|Q+1|)𝔼[(|Gab||{G2χb:b𝒦}|)K2r2𝟏{3,𝒜,V=0,𝐭𝐲𝐩}|𝒢,~]\displaystyle\left(\prod_{\ell=2}^{L-1}\frac{1}{|Q_{\ell+1}|}\right)\mathbb{E}\left[\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2-r_{2}}\mathbf{1}\{\mathcal{E}_{3},\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\bigg{|}\mathcal{G},\widetilde{\mathcal{H}}\right]
=\displaystyle= (=2L11|Q+1|)(|Gab||{G2χb:b𝒦}|)K2r2(3,𝒜,V=0,𝐭𝐲𝐩|𝒢,~)\displaystyle\left(\prod_{\ell=2}^{L-1}\frac{1}{|Q_{\ell+1}|}\right)\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2-r_{2}}\mathbb{P}(\mathcal{E}_{3},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\mathcal{G},\widetilde{\mathcal{H}})
\displaystyle\leq (=1L11|Q+1|)𝟏{𝒜,V=0,𝐭𝐲𝐩}(|Gab||{G2χb:b𝒦}|)K2r2(|Gab||{G2ψb:b𝒦}|)r2\displaystyle\left(\prod_{\ell=1}^{L-1}\frac{1}{|Q_{\ell+1}|}\right)\cdot\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2-r_{2}}\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\psi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{r_{2}}

where the last line follows from Lemma 4.17. Then

(L+1,𝒜,V=0,𝐭𝐲𝐩|~)=𝔼[(L+1,𝒜,V=0,𝐭𝐲𝐩|𝒢,~)|~]\displaystyle\mathbb{P}(\mathcal{E}_{L+1},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\widetilde{\mathcal{H}})=\mathbb{E}\left[\mathbb{P}(\mathcal{E}_{L+1},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\mathcal{G},\widetilde{\mathcal{H}})\big{|}\widetilde{\mathcal{H}}\right]
\displaystyle\leq (=1L11|Q+1|)𝟏{𝒜,V=0,𝐭𝐲𝐩}𝔼[(|Gab||{G2χb:b𝒦}|)K2r2(|Gab||{G2ψb:b𝒦}|)r2|~].\displaystyle\left(\prod_{\ell=1}^{L-1}\frac{1}{|Q_{\ell+1}|}\right)\cdot\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\mathbb{E}\left[\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2-r_{2}}\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\psi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{r_{2}}\bigg{|}\widetilde{\mathcal{H}}\right].

By Hölder’s inequality and Lemma 4.18, the last term above can be bounded as follows:

𝟏{𝒜,V=0,𝐭𝐲𝐩}𝔼[(|Gab||{G2χb:b𝒦}|)K2r2(|Gab||{G2ψb:b𝒦}|)r2|~]\displaystyle\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\mathbb{E}\left[\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2-r_{2}}\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\psi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{r_{2}}\bigg{|}\widetilde{\mathcal{H}}\right]
\displaystyle\leq 𝟏{𝒜,V=0,𝐭𝐲𝐩}𝔼[(|Gab||{G2χb:b𝒦}|)K2|~]K2r2K2𝔼[(|Gab||{G2ψb:b𝒦}|)K2|~]r2K2\displaystyle\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\mathbb{E}\left[\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2}\bigg{|}\widetilde{\mathcal{H}}\right]^{\frac{K-2-r_{2}}{K-2}}\mathbb{E}\left[\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\psi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2}\bigg{|}\widetilde{\mathcal{H}}\right]^{\frac{r_{2}}{K-2}}
\displaystyle\leq 𝟏{𝒜,V=0,𝐭𝐲𝐩}exp(i=1ri2).\displaystyle\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\exp\left(\sum_{i=1}^{\infty}\frac{r}{i^{2}}\right).

That is,

(L+1,𝒜,V=0,𝐭𝐲𝐩|~)𝟏{𝒜,V=0,𝐭𝐲𝐩}=1L11|Q+1|exp(i=1ri2).\mathbb{P}(\mathcal{E}_{L+1},\mathcal{A},V=0,\mathrm{\mathbf{typ}}|\widetilde{\mathcal{H}})\leq\mathbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\prod_{\ell=1}^{L-1}\frac{1}{|Q_{\ell+1}|}\exp\left(\sum_{i=1}^{\infty}\frac{r}{i^{2}}\right).

Taking expectation over ~\widetilde{\mathcal{H}} on both sides and letting C:=exp(i=1ri2)C:=\exp\left(\sum_{i=1}^{\infty}\frac{r}{i^{2}}\right), we have

|G|(L+1|𝒜,V=0,𝐭𝐲𝐩)|G|C=1L11|Q+1|=C|Gab|eh,|G|\cdot\mathbb{P}(\mathcal{E}_{L+1}|\mathcal{A},V=0,\mathrm{\mathbf{typ}})\leq|G|\cdot C\prod_{\ell=1}^{L-1}\frac{1}{|Q_{\ell+1}|}=C|G_{\mathrm{ab}}|\ll e^{h},

where hh is defined as in Definition 4 and by definition |Gab|eh|G_{\mathrm{ab}}|\ll e^{h}. ∎

4.8.5. Proof of Lemma 4.18

Proof.

The proofs of the two inequalities are essentially the same, so we only prove the first.

We can write

Gab=Fp1FpγG_{\mathrm{ab}}=F_{p_{1}}\oplus\cdots\oplus F_{p_{\gamma}} (59)

where $p_{1},\dots,p_{\gamma}$ are distinct primes and $F_{p_{i}}$ is the Sylow $p_{i}$-subgroup of $G_{\mathrm{ab}}$. Each $F_{p_{i}}$ has the form

Fpi=j=1βipiαi,jF_{p_{i}}=\oplus_{j=1}^{\beta_{i}}\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}

and hence

Gabi=1γj=1βipiαi,j,G_{\mathrm{ab}}\cong\oplus_{i=1}^{\gamma}\oplus_{j=1}^{\beta_{i}}\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}, (60)

where we can observe that maxi[γ]βir\max_{i\in[\gamma]}\beta_{i}\leq r.
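The decomposition (60) can be made concrete on a small example. The following is a minimal sketch, assuming the abelian group is handed to us as a product of cyclic groups $\mathbb{Z}_{n_1}\times\cdots\times\mathbb{Z}_{n_m}$; the helper names `prime_power_factors` and `primary_decomposition` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def prime_power_factors(n):
    """Return the prime-power factorization of n as a dict {p: p**alpha}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 1) * p
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 1) * n
    return factors

def primary_decomposition(cyclic_orders):
    """Split Z_{n_1} x ... x Z_{n_m} into cyclic groups of prime-power order.

    Returns {p: [p**alpha_1, p**alpha_2, ...]}, one list per prime p.
    """
    components = defaultdict(list)
    for n in cyclic_orders:
        for p, q in prime_power_factors(n).items():
            components[p].append(q)
    return {p: sorted(qs, reverse=True) for p, qs in components.items()}

# Z_12 x Z_2 splits as (Z_4 + Z_2) + Z_3.
decomp = primary_decomposition([12, 2])
```

In this example each prime contributes at most two cyclic summands, matching the number of cyclic factors we started from, which is the analogue of $\max_{i}\beta_{i}\leq r$.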

Since G2Za,1iidUnif(Gab)G_{2}Z_{a,1}\overset{iid}{\sim}\mathrm{Unif}(G_{\mathrm{ab}}), for each Za,1Z_{a,1} with a[k]a\in[k], G2Za,1G_{2}Z_{a,1} can be represented in the following form

i=1γj=1βiZi,j(a)\oplus_{i=1}^{\gamma}\oplus_{j=1}^{\beta_{i}}Z^{(a)}_{i,j}

for a collection of independent random variables {Zi,j(a):1iγ,1jβi}\{Z^{(a)}_{i,j}:1\leq i\leq\gamma,1\leq j\leq\beta_{i}\} such that Zi,j(a)Z^{(a)}_{i,j} is uniform over piαi,j\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}. With slight abuse of notation, we will write

G2Za,1=i=1γj=1βiZi,j(a).G_{2}Z_{a,1}=\oplus_{i=1}^{\gamma}\oplus_{j=1}^{\beta_{i}}Z^{(a)}_{i,j}.

Based on this we can further write

G2χb=i=1γj=1βi(a[k]m^baZi,j(a))=:i=1γj=1βiχi,j(b),G_{2}\chi_{b}=\oplus_{i=1}^{\gamma}\oplus_{j=1}^{\beta_{i}}\left(\sum_{a\in[k]}\hat{m}_{ba}Z^{(a)}_{i,j}\right)=:\oplus_{i=1}^{\gamma}\oplus_{j=1}^{\beta_{i}}\chi^{(b)}_{i,j},

where χi,j(b):=a[k]m^baZi,j(a)\chi^{(b)}_{i,j}:=\sum_{a\in[k]}\hat{m}_{ba}Z^{(a)}_{i,j} is an element in piαi,j\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}. Under the σ\sigma-field ~\widetilde{\mathcal{H}}, the coefficients {m^ba:a,b[k]}\{\hat{m}_{ba}:a,b\in[k]\} are known. Hence the collections {χi,j(b):b[k]}\{\chi^{(b)}_{i,j}:b\in[k]\} are independent for different (i,j)(i,j)’s, and we can consider the generation of each subgroup piαi,j\mathbb{Z}_{p_{i}^{\alpha_{i,j}}} by {χi,j(b):b𝒦}\{\chi^{(b)}_{i,j}:b\in\mathcal{K}\} separately, i.e.,

\begin{equation}
\mathbb{E}\left[\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2}\bigg|\widetilde{\mathcal{H}}\right]\leq\prod_{i=1}^{\gamma}\prod_{j=1}^{\beta_{i}}\mathbb{E}\left[\left(\frac{|\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}|}{|\langle\chi^{(b)}_{i,j}:b\in\mathcal{K}\rangle|}\right)^{K-2}\bigg|\widetilde{\mathcal{H}}\right], \tag{61}
\end{equation}

where as above K==2Lr+2K=\sum_{\ell=2}^{L}r_{\ell}+2.

For any piΓp_{i}\in\Gamma, we will argue that on the event 𝒜\mathcal{A} there exists a set of indices 𝒦pi𝒦\mathcal{K}_{p_{i}}\subseteq\mathcal{K} such that for any j[βi]j\in[\beta_{i}], (χi,j(b))b𝒦pi(\chi^{(b)}_{i,j})_{b\in\mathcal{K}_{p_{i}}} is a collection of KK i.i.d. uniform random variables over piαi,j\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}, so that one can simply apply Lemma 4.13 to the right hand side of (61) to obtain the desired conclusion.

By Definition 10, on the event $\mathcal{A}$ there exists a $K\times K$ submatrix $M$ of $(\hat{m}_{ba})_{b\in\mathcal{K},a\in[k]}$, with rows indexed by $\mathcal{K}_{p_{i}}$, such that $M_{p_{i}}$ has rank $K$. We collect the column indices of $M$ into the set $\mathcal{K}_{p_{i},col}:=\{a_{1},\dots,a_{K}\}$ (and let $\mathcal{K}_{p_{i},col}=\emptyset$ if $\mathcal{A}$ does not occur). We then define the $\sigma$-field $\mathcal{G}_{i,j}:=\sigma((Z^{(a)}_{i,j})_{a\notin\mathcal{K}_{p_{i},col}},\widetilde{\mathcal{H}})$ and express the vector $(\chi^{(b)}_{i,j})_{b\in\mathcal{K}_{p_{i}}}$ as a sum of two parts, the first of which is unknown under $\mathcal{G}_{i,j}$ while the second is known:

M(Zi,j(a1),,Zi,j(aK))+e((Zi,j(a))a𝒦pi,col),M(Z^{(a_{1})}_{i,j},\dots,Z^{(a_{K})}_{i,j})+e((Z^{(a)}_{i,j})_{a\notin\mathcal{K}_{p_{i},col}}),

where e()e(\cdot) is a function of (Zi,j(a))a𝒦pi,col(Z^{(a)}_{i,j})_{a\notin\mathcal{K}_{p_{i},col}} whose value is known under 𝒢i,j\mathcal{G}_{i,j}. Proposition 11 shows that MM is a surjective map from (piαi,j)K(\mathbb{Z}_{p_{i}^{\alpha_{i,j}}})^{K} to (piαi,j)K(\mathbb{Z}_{p_{i}^{\alpha_{i,j}}})^{K}. Lemma 4.15 further implies that the KK entries of M(Zi,j(a1),,Zi,j(aK))M\cdot(Z^{(a_{1})}_{i,j},\dots,Z^{(a_{K})}_{i,j}) are i.i.d. uniform over piαi,j\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}. It is then straightforward to see that (χi,j(b))b𝒦pi(\chi^{(b)}_{i,j})_{b\in\mathcal{K}_{p_{i}}} are i.i.d. uniform over piαi,j\mathbb{Z}_{p_{i}^{\alpha_{i,j}}} given 𝒢i,j\mathcal{G}_{i,j}. That is, for any 𝐱:=(xb)b𝒦pi(piαi,j)K\mathbf{x}:=(x_{b})_{b\in\mathcal{K}_{p_{i}}}\in(\mathbb{Z}_{p_{i}^{\alpha_{i,j}}})^{K}, we have

𝟏𝒜((χi,j(b))b𝒦pi=𝐱|~)=𝟏𝒜𝔼(((χi,j(b))b𝒦pi=𝐱|𝒢i,j)|~)=𝟏𝒜(piαi,j)K.\mathbf{1}_{\mathcal{A}}\cdot\mathbb{P}\left((\chi^{(b)}_{i,j})_{b\in\mathcal{K}_{p_{i}}}=\mathbf{x}|\widetilde{\mathcal{H}}\right)=\mathbf{1}_{\mathcal{A}}\cdot\mathbb{E}\left(\mathbb{P}\left((\chi^{(b)}_{i,j})_{b\in\mathcal{K}_{p_{i}}}=\mathbf{x}|\mathcal{G}_{i,j}\right)\big{|}\widetilde{\mathcal{H}}\right)=\mathbf{1}_{\mathcal{A}}\cdot(p_{i}^{\alpha_{i,j}})^{-K}.

Therefore, applying Lemma 4.13 to the i.i.d. uniform {χi,j(b):b𝒦pi}\{\chi^{(b)}_{i,j}:b\in\mathcal{K}_{p_{i}}\} gives

1𝒜𝔼[(|piαi,j||χi,j(b):b𝒦|)K2|~]\displaystyle\textbf{1}_{\mathcal{A}}\cdot\mathbb{E}\left[\left(\frac{|\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}|}{|\langle\chi^{(b)}_{i,j}:b\in\mathcal{K}\rangle|}\right)^{K-2}\bigg{|}\widetilde{\mathcal{H}}\right] 1𝒜𝔼[(|piαi,j||χi,j(b):b𝒦pi|)K2|~]\displaystyle\leq\textbf{1}_{\mathcal{A}}\cdot\mathbb{E}\left[\left(\frac{|\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}|}{|\langle\chi^{(b)}_{i,j}:b\in\mathcal{K}_{p_{i}}\rangle|}\right)^{K-2}\bigg{|}\widetilde{\mathcal{H}}\right]
1𝒜exp(1pi21).\displaystyle\leq\textbf{1}_{\mathcal{A}}\cdot\exp\left(\frac{1}{p_{i}^{2}-1}\right).

Combining all the subgroups {piαi,j:i[γ],j[βi]}\{\mathbb{Z}_{p_{i}^{\alpha_{i,j}}}:i\in[\gamma],j\in[\beta_{i}]\}, we can obtain using (61) that

1{𝒜,V=0,𝐭𝐲𝐩}𝔼[(|Gab||{G2χb:b𝒦}|)K2|~]\displaystyle\textbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\mathbb{E}\left[\left(\frac{|G_{\mathrm{ab}}|}{|\langle\{G_{2}\chi_{b}:b\in\mathcal{K}\}\rangle|}\right)^{K-2}\bigg{|}\widetilde{\mathcal{H}}\right]
\displaystyle\leq 1{𝒜,V=0,𝐭𝐲𝐩}i=1γj=1βiexp(1pi21)1{𝒜,V=0,𝐭𝐲𝐩}exp(i=1ri2)\displaystyle\textbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\prod_{i=1}^{\gamma}\prod_{j=1}^{\beta_{i}}\exp\left(\frac{1}{p_{i}^{2}-1}\right)\leq\textbf{1}\{\mathcal{A},V=0,\mathrm{\mathbf{typ}}\}\cdot\exp\left(\sum_{i=1}^{\infty}\frac{r}{i^{2}}\right)

as maxi[γ]βir\max_{i\in[\gamma]}\beta_{i}\leq r. ∎
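The last inequality in the display above uses that $\frac{1}{p^{2}-1}\leq\frac{1}{(p-1)^{2}}$ and that the numbers $p_{i}-1$ are distinct positive integers, so $\sum_{i}\beta_{i}/(p_{i}^{2}-1)\leq r\sum_{i\geq 1}i^{-2}$. As a quick numerical sanity check (a sketch only, with the cutoff $10^{4}$ chosen arbitrarily), one can compare the prime sum with $\sum_{i\geq 1}i^{-2}=\pi^{2}/6$:

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, is_p in enumerate(sieve) if is_p]

primes = primes_up_to(10_000)

# Partial sum of 1/(p^2 - 1) over primes, versus the full sum of 1/i^2.
prime_sum = sum(1.0 / (p * p - 1) for p in primes)
basel_sum = math.pi ** 2 / 6

# Termwise comparison used in the argument: 1/(p^2-1) <= 1/(p-1)^2.
termwise_ok = all(1.0 / (p * p - 1) <= 1.0 / (p - 1) ** 2 for p in primes)
```

The partial prime sum stays well below $\pi^{2}/6$, so the bound $\exp(\sum_{i\geq 1}r/i^{2})$ is far from tight but convenient.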

4.9. Proof of Proposition 10

In the proof of Proposition 9 we conditioned on the “good event” $\mathcal{A}$, which guarantees the existence of a subset $\mathcal{K}\subseteq[k]$ of indices that plays a critical role in the proof. In this section we prove that $\mathcal{A}$ indeed occurs with sufficiently high probability.

4.9.1. Regime: log|Gab|loglog|Gab|klog|Gab|\frac{\log|G_{\mathrm{ab}}|}{\log\log|G_{\mathrm{ab}}|}\lesssim k\ll\log|G_{\mathrm{ab}}|

Recall the definition of $h$ from Definition 4. In this regime we work with the unconditional probability and prove the following bound, which is stronger than the one in Proposition 10:

(𝒜c)eh|G|=eω|G2|.\mathbb{P}(\mathcal{A}^{c})\ll\frac{e^{h}}{|G|}=\frac{e^{\omega}}{|G_{2}|}. (62)

Indeed, (62) together with $\mathbb{P}(\mathrm{\mathbf{typ}})\asymp 1$ yields the statement of Proposition 10:

|G|(𝒜c,V=0|𝐭𝐲𝐩)|G|(𝒜c)(𝐭𝐲𝐩)eh.|G|\cdot\mathbb{P}(\mathcal{A}^{c},V=0|\mathrm{\mathbf{typ}})\leq\frac{|G|\cdot\mathbb{P}(\mathcal{A}^{c})}{\mathbb{P}(\mathrm{\mathbf{typ}})}\ll e^{h}.

First we specify the choice of $\mathcal{K}$ in this regime and then verify that this choice satisfies (47) in Definition 10:

𝒦\displaystyle\mathcal{K} ={b>k/2:gcd({mba:ak/2}) and |Gab| are coprime}.\displaystyle=\{b>k/2:\mathrm{gcd}(\{m_{ba}:a\leq k/2\})\text{ and }|G_{\mathrm{ab}}|\text{ are coprime}\}. (63)

Note this choice purely depends on (mba)a,b[k],a<b(m_{ba})_{a,b\in[k],a<b} and hence is measurable with respect to ~\widetilde{\mathcal{H}}. To see that (G2ψb)b𝒦(G_{2}\psi_{b})_{b\in\mathcal{K}} are conditionally independent from (G2Zb,1)b𝒦(G_{2}Z_{b,1})_{b\in\mathcal{K}} given ~\widetilde{\mathcal{H}}, we observe that for any b𝒦b\in\mathcal{K},

G2ψb=G2ak/2:a<bm^baZa,1+G2(a>k/2:a<bm^baZa,1+a𝒦c:a>bm^baZa,1),G_{2}\psi_{b}=G_{2}\sum_{a\leq k/2:a<b}\hat{m}_{ba}Z_{a,1}+G_{2}\left(\sum_{a>k/2:a<b}\hat{m}_{ba}Z_{a,1}+\sum_{a\in\mathcal{K}^{c}:a>b}\hat{m}_{ba}Z_{a,1}\right),

where G2ak/2:a<bm^baZa,1Unif(Gab)G_{2}\sum_{a\leq k/2:a<b}\hat{m}_{ba}Z_{a,1}\sim\mathrm{Unif}(G_{\mathrm{ab}}) due to the definition of 𝒦\mathcal{K}, see Lemma 4.7.

Furthermore, we can see that G2ak/2:a<bm^baZa,1G_{2}\sum_{a\leq k/2:a<b}\hat{m}_{ba}Z_{a,1} involves terms in (Za,1)ak/2(Z_{a,1})_{a\leq k/2}, whereas G2(a>k/2:a<bm^baZa,1+a𝒦c:a>bm^baZa,1)G_{2}\left(\sum_{a>k/2:a<b}\hat{m}_{ba}Z_{a,1}+\sum_{a\in\mathcal{K}^{c}:a>b}\hat{m}_{ba}Z_{a,1}\right) involves only terms in (Za,1)a>k/2(Z_{a,1})_{a>k/2} (since b𝒦b\in\mathcal{K}, in the second sum the condition a>ba>b leads to a>k/2a>k/2). Therefore, conditioned on ~\widetilde{\mathcal{H}}, we have that G2ak/2:a<bm^baZa,1G_{2}\sum_{a\leq k/2:a<b}\hat{m}_{ba}Z_{a,1} is independent from G2(a>k/2:a<bm^baZa,1+a𝒦c:a>bm^baZa,1)G_{2}\left(\sum_{a>k/2:a<b}\hat{m}_{ba}Z_{a,1}+\sum_{a\in\mathcal{K}^{c}:a>b}\hat{m}_{ba}Z_{a,1}\right). By the same reasoning we also have that G2ak/2:a<bm^baZa,1G_{2}\sum_{a\leq k/2:a<b}\hat{m}_{ba}Z_{a,1} is independent from (G2Zb,1)b𝒦(G_{2}Z_{b,1})_{b\in\mathcal{K}}. With the information combined, we can see that conditionally on ~\widetilde{\mathcal{H}}, for any b𝒦b\in\mathcal{K}, G2ψbG_{2}\psi_{b} is independent from (G2Zb,1)b𝒦(G_{2}Z_{b,1})_{b\in\mathcal{K}} and uniform over GabG_{\mathrm{ab}}.

Noting that

G2χb=G2a[k]m^baZa,1=G2ak/2m^baZa,1+G2a>k/2m^baZa,1,G_{2}\chi_{b}=G_{2}\sum_{a\in[k]}\hat{m}_{ba}Z_{a,1}=G_{2}\sum_{a\leq k/2}\hat{m}_{ba}Z_{a,1}+G_{2}\sum_{a>k/2}\hat{m}_{ba}Z_{a,1},

by the same reasoning as above we have that G2ak/2m^baZa,1G_{2}\sum_{a\leq k/2}\hat{m}_{ba}Z_{a,1} is independent from G2a>k/2m^baZa,1G_{2}\sum_{a>k/2}\hat{m}_{ba}Z_{a,1} and (G2Zb,1)b𝒦(G_{2}Z_{b,1})_{b\in\mathcal{K}}. Again by the definition of 𝒦\mathcal{K} we have G2ak/2m^baZa,1Unif(Gab)G_{2}\sum_{a\leq k/2}\hat{m}_{ba}Z_{a,1}\sim\mathrm{Unif}(G_{\mathrm{ab}}). That is, conditionally on ~\widetilde{\mathcal{H}}, for any b𝒦b\in\mathcal{K}, G2χbG_{2}\chi_{b} is independent from (G2Zb,1)b𝒦(G_{2}Z_{b,1})_{b\in\mathcal{K}} and uniform over GabG_{\mathrm{ab}}. Therefore we have verified the condition that (G2ψb)b𝒦,(G2χb)b𝒦(G_{2}\psi_{b})_{b\in\mathcal{K}},(G_{2}\chi_{b})_{b\in\mathcal{K}} are independent from (G2Zb,1)b𝒦(G_{2}Z_{b,1})_{b\in\mathcal{K}} in Definition 10.
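The selection rule (63) and the uniformity it is designed to produce can be illustrated on a toy case. The sketch below makes two simplifying assumptions that are not in the text: the abelianization is replaced by a single cyclic group $\mathbb{Z}_{n}$ (with `n_ab` standing in for $|G_{\mathrm{ab}}|$), and the coefficients are supplied as a plain table `m` with 1-based indices. The function names are hypothetical.

```python
from functools import reduce
from itertools import product
from math import gcd

def choose_K(m, k, n_ab):
    """Indices b > k/2 such that gcd(m[b][a] : a <= k/2) is coprime to n_ab."""
    half = k // 2
    K = set()
    for b in range(half + 1, k + 1):
        g = reduce(gcd, (m[b][a] for a in range(1, half + 1)), 0)
        if gcd(g, n_ab) == 1:
            K.add(b)
    return K

def residue_counts(coeffs, n):
    """Count how often sum_a coeffs[a] * z_a (mod n) hits each residue,
    as (z_a) ranges over all of (Z_n)^len(coeffs)."""
    counts = [0] * n
    for z in product(range(n), repeat=len(coeffs)):
        counts[sum(c * x for c, x in zip(coeffs, z)) % n] += 1
    return counts

# Toy coefficient table for k = 4 over Z_6: row b lists m[b][a] for a = 1, 2.
m = {3: {1: 2, 2: 3}, 4: {1: 2, 2: 4}}
K = choose_K(m, k=4, n_ab=6)

uniform_case = residue_counts([2, 3], 6)      # gcd 1, coprime to 6
degenerate_case = residue_counts([2, 4], 6)   # gcd 2, not coprime to 6
```

Row $b=3$ is selected and its linear combination is exactly uniform on $\mathbb{Z}_{6}$, while row $b=4$ is rejected and its combination only reaches the even residues.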

Given the choice of 𝒦\mathcal{K} in (63), for each pΓp\in\Gamma we can define 𝒜p(𝒦)\mathcal{A}_{p}(\mathcal{K}) to be the event that there exists a K×KK\times K submatrix MM of (m^ba)b𝒦,a[k](\hat{m}_{ba})_{b\in\mathcal{K},a\in[k]} such that Mp:=MmodpM_{p}:=M\mod p has rank KK. Observe that pΓ𝒜p(𝒦)𝒜\cap_{p\in\Gamma}\mathcal{A}_{p}(\mathcal{K})\subseteq\mathcal{A} and hence

(𝒜c)pΓ(𝒜p(𝒦)c).\mathbb{P}(\mathcal{A}^{c})\leq\sum_{p\in\Gamma}\mathbb{P}(\mathcal{A}_{p}(\mathcal{K})^{c}).
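For a given prime $p$, the event $\mathcal{A}_{p}(\mathcal{K})$ asks for a $K\times K$ submatrix of full rank modulo $p$, which is equivalent to the whole coefficient matrix having rank at least $K$ over $\mathbb{Z}_{p}$. A minimal sketch of such a check by Gaussian elimination follows; the function name `rank_mod_p` and the example matrices are illustrative assumptions, not objects from the paper.

```python
def rank_mod_p(matrix, p):
    """Rank of an integer matrix over the field Z_p (p prime)."""
    rows = [[x % p for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below position `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        # Clear the column in every other row.
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [(x - factor * y) % p for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

# The same integer matrix can have different ranks modulo different primes.
rank_mod_2 = rank_mod_p([[1, 2], [3, 4]], 2)
rank_mod_3 = rank_mod_p([[1, 2], [3, 4]], 3)
wide_rank = rank_mod_p([[1, 2, 0], [3, 4, 1]], 2)
```

The first matrix has determinant $-2$, so it is singular modulo $2$ but invertible modulo $3$; adding a column restores full row rank modulo $2$, which is why having more candidate columns helps.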

Recall that our goal is to show (𝒜c)eh|G|\mathbb{P}(\mathcal{A}^{c})\ll\frac{e^{h}}{|G|}, where hh is as in Definition 4. As |Γ|log|G||\Gamma|\lesssim\log|G|, it suffices to prove maxpΓ(𝒜p(𝒦)c)eh|G|log|G|\max_{p\in\Gamma}\mathbb{P}(\mathcal{A}_{p}(\mathcal{K})^{c})\ll\frac{e^{h}}{|G|\log|G|}. In this regime we can prove a much stronger estimate.

Proposition 13.

When log|Gab|loglog|Gab|klog|Gab|\frac{\log|G_{\mathrm{ab}}|}{\log\log|G_{\mathrm{ab}}|}\lesssim k\ll\log|G_{\mathrm{ab}}|, for any C>0C>0,

maxpΓ(𝒜p(𝒦)c)=o(|G|C).\max_{p\in\Gamma}\mathbb{P}(\mathcal{A}_{p}(\mathcal{K})^{c})=o(|G|^{-C}).

In order to prove Proposition 13 we begin by discussing a useful property, which we refer to as the relative independence of the coefficient matrix (m^ba)a,b[k](\hat{m}_{ba})_{a,b\in[k]}, that holds in the regime log|Gab|loglog|Gab|klog|Gab|\frac{\log|G_{\mathrm{ab}}|}{\log\log|G_{\mathrm{ab}}|}\lesssim k\ll\log|G_{\mathrm{ab}}|.

Relative independence in m^\hat{m}. Recall that in this regime t=t0k|Gab|2/kt_{*}=t_{0}\asymp k|G_{\mathrm{ab}}|^{2/k}. We will consider t(1+ε)t0t\geq(1+\varepsilon)t_{0} and hence s:=t/k1s:=t/k\gg 1, i.e., in both of the random walks X(t)X(t) and X(t)X^{\prime}(t) each generator in {Za:a[k]}\{Z_{a}:a\in[k]\} should typically appear many times. As a result, we will see certain “relative independence” among a subset of terms in {mba:a<b}\{m_{ba}:a<b\}. Before explaining the meaning of relative independence we need to define some notation.

We can view 𝐗=X(X)1\mathbf{X}=X(X^{\prime})^{-1} as a sequence of generators, and express this sequence by (σi,ηi)i[N+N](\sigma_{i},\eta_{i})_{i\in[N+N^{\prime}]}. In the following construction, we will obtain partial information on (σi)i[N+N](\sigma_{i})_{i\in[N+N^{\prime}]} while conditioning on σ((ηi)i)\sigma((\eta_{i})_{i\in\mathbb{N}}). That is, at this stage of construction we will treat (ηi)i(\eta_{i})_{i\in\mathbb{N}} as known.

Let P[k]P\subseteq[k] be a subset of indices with size dd, where dd is to be chosen later. We will denote by 𝐗P\mathbf{X}_{P} the subsequence of 𝐗\mathbf{X} consisting of only generators in {Za±1:aP}\{Z_{a}^{\pm 1}:a\in P\} and let 𝒩P(t)\mathcal{N}_{P}(t) denote the length of 𝐗P\mathbf{X}_{P}, which follows a Poisson distribution with rate 2tdk\frac{2td}{k} by Poisson thinning.

Based on the subsequence 𝐗P\mathbf{X}_{P}, we will construct a collection of disjoint sets {Bab:a,bP,a<b}\{B_{ab}:a,b\in P,a<b\} as follows:

  1. (1)

    Partition $\mathbf{X}_{P}$ into $\lfloor\mathcal{N}_{P}(t)/2\rfloor$ disjoint pairs, the $i$-th pair consisting of the $(2i-1)$-th and $2i$-th elements of $\mathbf{X}_{P}$.

  2. (2)

    For $a,b\in P$ with $a<b$, we look at the $i$-th pair in $\mathbf{X}_{P}$ for each $1\leq i\leq\lfloor\mathcal{N}_{P}(t)/2\rfloor$. If the two generators in the $i$-th pair are $Z^{\eta_{a}}_{a}$ and $Z^{\eta_{b}}_{b}$ for some $\eta_{a},\eta_{b}\in\{\pm 1\}$ (where $\eta_{a},\eta_{b}$ are known), regardless of the order in which they appear, then we record the pair of locations $(2i-1,2i)$ in the corresponding set $B_{ab}$.

Note that in the construction of $B_{ab}$ we have no knowledge of the exact order within the pair; that is, we only know that the pair is either $(Z_{a}^{\eta_{a}},Z_{b}^{\eta_{b}})$ or $(Z_{b}^{\eta_{b}},Z_{a}^{\eta_{a}})$, each with equal probability.
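The two-step construction of the sets $B_{ab}$ can be sketched as follows. This is an illustration under simplifying assumptions: the subsequence $\mathbf{X}_{P}$ is represented only by the list of generator indices at positions $1,2,\dots$ (signs are ignored since they are treated as known), and the function name `build_pair_sets` is hypothetical.

```python
from collections import defaultdict

def build_pair_sets(indices):
    """Group consecutive pairs of a generator-index sequence by unordered index pair.

    Returns {(a, b): [(2i-1, 2i), ...]} for a < b. Pairs whose two indices
    coincide, and a trailing unpaired element, are skipped.
    """
    B = defaultdict(list)
    for i in range(1, len(indices) // 2 + 1):
        # 1-based positions 2i-1 and 2i correspond to 0-based 2i-2 and 2i-1.
        first, second = indices[2 * i - 2], indices[2 * i - 1]
        if first != second:
            a, b = min(first, second), max(first, second)
            B[(a, b)].append((2 * i - 1, 2 * i))
    return dict(B)

# Pairs: (1,2), (2,1), (3,3), (1,3); the final element is left unpaired.
B = build_pair_sets([1, 2, 2, 1, 3, 3, 1, 3, 2])
```

Both orders of the pair $\{1,2\}$ land in the same set $B_{12}$, which reflects that the construction records locations without revealing the order.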

The subset PP is said to be nice if |Bab|t2dk|B_{ab}|\geq\lfloor\frac{t}{2dk}\rfloor for all a,bPa,b\in P with a<ba<b. As shown in the following lemma, PP is nice with high probability when we pick the right size dd.

Lemma 4.19.

The probability that the subset $P$ is nice is at least $1-Cd^{2}\exp(-\frac{t}{10dk})$ for some constant $C>0$ independent of $d$.

Proof.

Since 𝒩P(t)Poisson(2tdk)\mathcal{N}_{P}(t)\sim\mathrm{Poisson}(\frac{2td}{k}), by a Chernoff bound argument

(𝒩P(t)<tdk)e(c0td)/k\mathbb{P}\left(\mathcal{N}_{P}(t)<\frac{td}{k}\right)\leq e^{-(c_{0}td)/k}

for some constant $c_{0}>0$. Since each pair in the subsequence belongs to any given $B_{ab}$ independently with probability $\frac{2}{d^{2}}$, a union bound over the at most $d^{2}$ choices of $a<b$ gives

\begin{align*}
\mathbb{P}(\text{$P$ is not nice}) &\leq d^{2}\,\mathbb{P}\left(|B_{12}|<\Big\lfloor\frac{t}{2dk}\Big\rfloor\,\bigg|\,\mathcal{N}_{P}(t)\geq\frac{td}{k}\right)+e^{-(c_{0}td)/k}\\
&\leq d^{2}\,\mathbb{P}\left(\mathrm{Binomial}\left(\Big\lfloor\frac{td}{2k}\Big\rfloor,\frac{2}{d^{2}}\right)<\Big\lfloor\frac{t}{2dk}\Big\rfloor\right)+e^{-(c_{0}td)/k}\\
&\leq d^{2}\exp\left(-\frac{t}{10dk}\right)+e^{-(c_{0}td)/k}\leq Cd^{2}\exp\left(-\frac{t}{10dk}\right),
\end{align*}

where the third line follows from a Chernoff bound. ∎

The reason to look at the collection $\{B_{ab}:a,b\in P,a<b\}$ is the following simple observation: when revealing the relative order of a pair $\{Z_{a}^{\eta_{a}},Z_{b}^{\eta_{b}}\}$ in $B_{ab}$, if the order is $(Z_{b}^{\eta_{b}},Z_{a}^{\eta_{a}})$ then it contributes an increment of $-\eta_{a}\eta_{b}$ to the value of $m_{ba}$; if instead the order is $(Z_{a}^{\eta_{a}},Z_{b}^{\eta_{b}})$ then the resulting increment to $m_{ba}$ is $0$. Taking into account the fact that for different pairs in $B_{ab}$ the corresponding $-\eta_{a}\eta_{b}$’s are i.i.d. random variables uniform in $\{\pm 1\}$, we can expect diffusive behavior that “smooths out” the probability of $m_{ba}$ taking a certain value.

Next we will translate this intuition into a rigorous proof. We first define the following quantity:

q(n):=maxx(Bin(n,1/2)=x).q(n):=\max_{x\in\mathbb{Z}}\mathbb{P}(\mathrm{Bin}(n,1/2)=x). (64)

It is well known that q(n)q(n) is non-increasing with respect to nn and there exists some C>0C>0 such that q(n)Cn1/2q(n)\leq Cn^{-1/2} for all nn\in\mathbb{N}.
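The two properties of $q(n)$ quoted above are easy to check numerically for small $n$. The sketch below uses the fact that the maximum of the $\mathrm{Bin}(n,1/2)$ mass function is attained at $x=\lfloor n/2\rfloor$; the range $n\leq 200$ and the comparison constant $\sqrt{2/\pi}$ (suggested by Stirling's formula) are choices made for this illustration.

```python
from math import comb, pi, sqrt

def q(n):
    """max_x P(Bin(n, 1/2) = x), attained at x = n // 2."""
    return comb(n, n // 2) / 2 ** n

values = [q(n) for n in range(1, 201)]

# q is non-increasing in n on the tested range.
non_increasing = all(values[i] >= values[i + 1] for i in range(len(values) - 1))

# q(n) * sqrt(n) stays below sqrt(2/pi), so C = 1 works on this range.
scaled_max = max(q(n) * sqrt(n) for n in range(1, 201))
```

On this range the scaled quantity $q(n)\sqrt{n}$ never exceeds $\sqrt{2/\pi}$, consistent with the bound $q(n)\leq Cn^{-1/2}$.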

Lemma 4.20.

Let SnBin(n,1/2)S_{n}\sim\mathrm{Bin}(n,1/2) for some nn\in\mathbb{N}. For any prime pp,

maxx(Sn=xmodp)min{2/p,1/2}+q(n).\max_{x\in\mathbb{Z}}\mathbb{P}(S_{n}=x\mod p)\leq\min\{2/p,1/2\}+q(n).
Proof.

The proof follows the idea used in Lemma 2.14 of [17]. It is easy to see that any distribution on $\mathbb{N}$ whose probability mass function is non-increasing can be written as a mixture of $\mathrm{Unif}(\{1,\dots,Y\})$ distributions, for different $Y\in\mathbb{N}$.

For an even nn\in\mathbb{N} the binomial distribution Bin(n,1/2)\mathrm{Bin}(n,1/2) is known to be unimodal, i.e., the mode x:=argmaxx(Sn=x)x_{*}:=\mathrm{argmax}_{x}\mathbb{P}(S_{n}=x) is unique. When nn\in\mathbb{N} is odd, the maximum of (Sn=x)\mathbb{P}(S_{n}=x) is achieved at two adjacent values. Without loss of generality, let x:=min{argmaxx(Sn=x)}x_{*}:=\min\{\mathrm{argmax}_{x}\mathbb{P}(S_{n}=x)\}.

Letting S¯n:=Snx\bar{S}_{n}:=S_{n}-x_{*}, it is easy to see the map m(|S¯n|=m)m\mapsto\mathbb{P}(|\bar{S}_{n}|=m) is non-increasing on \mathbb{N} and hence we can write

|S¯n|Unif({1,,Y})conditional onS¯n0,|\bar{S}_{n}|\sim\mathrm{Unif}(\{1,...,Y\})\quad\text{conditional on}\quad\bar{S}_{n}\neq 0,

for some random variable $Y\in\mathbb{N}$ whose law is immaterial to us. As a consequence, when $p>2$, for any $x\in\mathbb{Z}$,

(S¯n=xmodp|S¯n0)\displaystyle\mathbb{P}(\bar{S}_{n}=x\mod p|\bar{S}_{n}\neq 0) (|S¯n|+xp|S¯n0)+(|S¯n|xp|S¯n0)\displaystyle\leq\mathbb{P}(|\bar{S}_{n}|+x\in p\mathbb{Z}|\bar{S}_{n}\neq 0)+\mathbb{P}(|\bar{S}_{n}|-x\in p\mathbb{Z}|\bar{S}_{n}\neq 0)
2𝔼(Y/p/Y)2/p.\displaystyle\leq 2\mathbb{E}(\lfloor Y/p\rfloor/Y)\leq 2/p.

That is, for any $x\in\mathbb{Z}$, $\mathbb{P}(S_{n}=x \bmod p\mid\bar{S}_{n}\neq 0)=\mathbb{P}(\bar{S}_{n}=(x-x_{*}) \bmod p\mid\bar{S}_{n}\neq 0)\leq 2/p$.

When p=2p=2, by the same reasoning we have the following bound

maxx{0,1}(S¯n=xmod2|S¯n0)1/2.\max_{x\in\{0,1\}}\mathbb{P}(\bar{S}_{n}=x\mod 2|\bar{S}_{n}\neq 0)\leq 1/2.

Therefore,

maxx(Sn=xmodp)\displaystyle\max_{x\in\mathbb{Z}}\mathbb{P}(S_{n}=x\mod p) =maxx(S¯n=xmodp)\displaystyle=\max_{x\in\mathbb{Z}}\mathbb{P}(\bar{S}_{n}=x\mod p)
(S¯n=0)+maxx(S¯n=xmodp|S¯n0)q(n)+min{2/p,1/2}.\displaystyle\leq\mathbb{P}(\bar{S}_{n}=0)+\max_{x\in\mathbb{Z}}\mathbb{P}(\bar{S}_{n}=x\mod p|\bar{S}_{n}\neq 0)\leq q(n)+\min\{2/p,1/2\}.
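The bound of Lemma 4.20 can be checked exactly for small parameters. The sketch below uses exact rational arithmetic; the function names and the ranges of $n$ and $p$ are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import comb

def max_residue_mass(n: int, p: int) -> Fraction:
    """max over x of P(Bin(n, 1/2) = x mod p), computed exactly."""
    mass = [Fraction(0)] * p
    for s in range(n + 1):
        mass[s % p] += Fraction(comb(n, s), 2 ** n)
    return max(mass)

def q(n: int) -> Fraction:
    """Largest point mass of Bin(n, 1/2)."""
    return Fraction(comb(n, n // 2), 2 ** n)

def lemma_bound(n: int, p: int) -> Fraction:
    """Right-hand side min{2/p, 1/2} + q(n) of the lemma."""
    return min(Fraction(2, p), Fraction(1, 2)) + q(n)

primes = [2, 3, 5, 7, 11, 13]
holds = all(max_residue_mass(n, p) <= lemma_bound(n, p)
            for n in range(1, 61) for p in primes)
```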

Now we are ready to state the meaning of “relative independence” in terms of $(m_{ba})_{a,b\in P,a<b}$. The following lemma implies that, conditioned on $P$ being nice, for any $(x_{ba})_{a,b\in P:a<b}$, the collection $(\mathbf{1}\{m_{ba}=x_{ba}\})_{a,b\in P,a<b}$ (respectively, $(\mathbf{1}\{m_{ba}=x_{ba} \bmod p\})_{a,b\in P,a<b}$) is stochastically dominated by i.i.d. Bernoulli random variables with probability $q(\lfloor\frac{t}{2dk}\rfloor)$ (respectively, $\min\{2/p,1/2\}+q(\lfloor\frac{t}{2dk}\rfloor)$).

Lemma 4.21.

Let q:=q(t2dk)q:=q(\lfloor\frac{t}{2dk}\rfloor). For A{(a,b):a,bP,a<b}A\subseteq\{(a,b):a,b\in P,a<b\}, let Ac=σ((mba)(a,b)A,a,bP,a<b)\mathcal{F}_{A^{c}}=\sigma((m_{ba})_{(a,b)\notin A,a,b\in P,a<b}). Then for any (xba)(a,b)A|A|(x_{ba})_{(a,b)\in A}\in\mathbb{Z}^{|A|} we have that

((a,b)A{mba=xba}|Ac,P is nice)q|A|,\mathbb{P}(\cap_{(a,b)\in A}\{m_{ba}=x_{ba}\}|\mathcal{F}_{A^{c}},P\text{ is nice})\leq q^{|A|}, (65)

and furthermore,

((a,b)A{mba=xbamodp}|Ac,P is nice)(min{2/p,1/2}+q)|A|.\mathbb{P}(\cap_{(a,b)\in A}\{m_{ba}=x_{ba}\mod p\}|\mathcal{F}_{A^{c}},P\text{ is nice})\leq(\min\{2/p,1/2\}+q)^{|A|}. (66)
Proof.

For a,bPa,b\in P with a<ba<b, we first recall from (34) the definition of mbam_{ba} and observe that mbam_{ba} is in fact determined purely by the subsequence 𝐗P\mathbf{X}_{P}. When referring to 𝐗P\mathbf{X}_{P} as a subsequence, we essentially use the information given by (σP,i,ηP,i)i[𝒩P](\sigma_{P,i},\eta_{P,i})_{i\in[\mathcal{N}_{P}]}, where σP,i\sigma_{P,i} denotes the index of the ii-th term in 𝐗P\mathbf{X}_{P} and ηP,i\eta_{P,i} its sign. Hence we can write

mba=j=1𝒩Pi<jηiηj𝟏{σP,j=a,σP,i=b} for a,bP with a<b.m_{ba}=-\sum_{j=1}^{\mathcal{N}_{P}}\sum_{i<j}\eta_{i}\eta_{j}\mathbf{1}\{\sigma_{P,j}=a,\sigma_{P,i}=b\}\quad\text{ for }a,b\in P\text{ with }a<b. (67)

With a slight abuse of notation, let Bc:={i[𝒩P]:ia,bP,a<bBab}B^{c}:=\{i\in[\mathcal{N}_{P}]:i\notin\cup_{a,b\in P,a<b}B_{ab}\} denote the collection of locations in 𝐗P\mathbf{X}_{P} that are not recorded in any BabB_{ab}’s. Let 𝒢η:=σ({ηP,i:i})\mathcal{G}_{\eta}:=\sigma(\{\eta_{P,i}:i\in\mathbb{N}\}) (we reveal all the signs ηP,i\eta_{P,i} regardless of the value of 𝒩P\mathcal{N}_{P}) and define

𝒢:=σ({Bab:a,bP,a<b},(σi)iBc,𝒢η).\mathcal{G}:=\sigma(\{B_{ab}:a,b\in P,a<b\},(\sigma_{i})_{i\in B^{c}},\mathcal{G}_{\eta}).

From now on, we will condition on the σ\sigma-field 𝒢\mathcal{G}. By conditioning on 𝒢\mathcal{G}, we are treating all the signs as known and revealing the indices of generators that are not in any BabB_{ab}’s. Hence, by (67) mbam_{ba} can be written as the sum of two parts, one independent from 𝒢\mathcal{G} and the other known under 𝒢\mathcal{G}:

mba=i:(2i1,2i)Babη2i1η2i𝟏{σP,2i=a,σP,2i1=b}+mbaknownm_{ba}=\sum_{i\in\mathbb{N}:(2i-1,2i)\in B_{ab}}-\eta_{2i-1}\eta_{2i}\mathbf{1}\{\sigma_{P,2i}=a,\sigma_{P,2i-1}=b\}+m^{known}_{ba} (68)

where $m_{ba}^{known}$ represents the cumulative increment from the pairs of $\{Z_{a}^{\pm},Z_{b}^{\pm}\}$ that are not in $B_{ab}$, which is known under $\mathcal{G}$. We can further observe that the random variables $(\mathbf{1}\{\sigma_{P,2i}=a,\sigma_{P,2i-1}=b\})_{(2i-1,2i)\in B_{ab}}$ from the first part of (68) are i.i.d. $\mathrm{Bernoulli}(1/2)$ and independent of $\mathcal{G}$.

Let $n_{ab}^{\pm}:=|\{(2i-1,2i)\in B_{ab}:-\eta_{2i-1}\eta_{2i}=\pm 1\}|$. (Note that $n^{+}_{ab}+n^{-}_{ab}=|B_{ab}|$ and $n_{ab}^{\pm}$ are measurable with respect to $\mathcal{G}$.) We can then define two independent binomial random variables $Y^{+}_{ab}\sim\mathrm{Bin}(n_{ab}^{+},1/2)$ and $Y^{-}_{ab}\sim\mathrm{Bin}(n^{-}_{ab},1/2)$ so that the increment on $m_{ba}$ resulting from revealing the orders of the pairs in $B_{ab}$ is given by $Y^{+}_{ab}-Y^{-}_{ab}=Y_{ab}-n^{-}_{ab}$, where $Y_{ab}:=Y^{+}_{ab}+(n^{-}_{ab}-Y^{-}_{ab})\sim\mathrm{Bin}(|B_{ab}|,1/2)$. It follows from (68) that, for any $x\in\mathbb{Z}$,

(mba=x|𝒢,P is nice)\displaystyle\mathbb{P}(m_{ba}=x|\mathcal{G},P\text{ is nice}) maxx((2i1,2i)Babη2i1η2i𝟏{σP,2i=a,σP,2i1=b}=x|𝒢,P is nice)\displaystyle\leq\max_{x^{\prime}\in\mathbb{Z}}\mathbb{P}\left(\sum_{(2i-1,2i)\in B_{ab}}-\eta_{2i-1}\eta_{2i}\mathbf{1}\{\sigma_{P,2i}=a,\sigma_{P,2i-1}=b\}=x^{\prime}\bigg{|}\mathcal{G},P\text{ is nice}\right)
=maxx(Yab+Yab=x|𝒢,P is nice)\displaystyle=\max_{x^{\prime}\in\mathbb{Z}}\mathbb{P}\left(Y^{+}_{ab}-Y^{-}_{ab}=x^{\prime}\bigg{|}\mathcal{G},P\text{ is nice}\right)
=maxx′′(Yab=x′′|𝒢,P is nice)\displaystyle=\max_{x^{\prime\prime}\in\mathbb{Z}}\mathbb{P}\left(Y_{ab}=x^{\prime\prime}\bigg{|}\mathcal{G},P\text{ is nice}\right)
=maxx′′(Bin(|Bab|,1/2)=x′′|P is nice)q.\displaystyle=\max_{x^{\prime\prime}\in\mathbb{Z}}\mathbb{P}\left(\mathrm{Bin}(|B_{ab}|,1/2)=x^{\prime\prime}|P\text{ is nice}\right)\leq q. (69)

Hence we have the desired upper bound for a given pair $a,b\in P$ with $a<b$:

(mba=x|P is nice)q.\mathbb{P}(m_{ba}=x|P\text{ is nice})\leq q.
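The identity in law between $Y^{+}_{ab}-Y^{-}_{ab}$ and a shifted $\mathrm{Bin}(|B_{ab}|,1/2)$ variable, which drives the bound by $q$, can be verified exactly for small values. This is an illustrative sketch; the function names and the values $n^{+}=3$, $n^{-}=4$ are assumptions.

```python
from collections import Counter
from fractions import Fraction
from math import comb

def binom_pmf(n):
    """Exact mass function of Bin(n, 1/2)."""
    return {k: Fraction(comb(n, k), 2 ** n) for k in range(n + 1)}

def difference_law(n_plus, n_minus):
    """Exact law of Y_plus - Y_minus for independent Bin(n_plus, 1/2), Bin(n_minus, 1/2)."""
    law = Counter()
    for a, pa in binom_pmf(n_plus).items():
        for b, pb in binom_pmf(n_minus).items():
            law[a - b] += pa * pb
    return dict(law)

def shifted_binomial_law(n_plus, n_minus):
    """Exact law of Y - n_minus with Y ~ Bin(n_plus + n_minus, 1/2)."""
    return {k - n_minus: pk for k, pk in binom_pmf(n_plus + n_minus).items()}

law = difference_law(3, 4)
largest_mass = max(law.values())  # equals q(7), the largest mass of Bin(7, 1/2)
```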

Now note that for different pairs $(a,b)\neq(a',b')$ with $a,a',b,b'\in P$ and $a<b$, $a'<b'$, by our construction the corresponding $B_{ab}$ and $B_{a'b'}$ are disjoint. Hence $(Y_{ab})_{a,b\in P,a<b}$ is a collection of independent random variables. Carrying out a calculation similar to (69) using $(Y_{ab})_{a,b\in P,a<b}$ leads to

max(xba)(a,b)A((a,b)A{mba=xba}|Ac,P is nice)\displaystyle\max_{(x_{ba})_{(a,b)\in A}}\mathbb{P}(\cap_{(a,b)\in A}\{m_{ba}=x_{ba}\}|\mathcal{F}_{A^{c}},\text{$P$ is nice}) max(yba)(a,b)A((a,b)A{Yab=yba}|Ac,P is nice)\displaystyle\leq\max_{(y_{ba})_{(a,b)\in A}}\mathbb{P}(\cap_{(a,b)\in A}\{Y_{ab}=y_{ba}\}|\mathcal{F}_{A^{c}},\text{$P$ is nice})
q|A|.\displaystyle\leq q^{|A|}.

Similarly, by Lemma 4.20

max(xba)(a,b)A((a,b)A{mba=xbamodp}|Ac,P is nice)\displaystyle\max_{(x_{ba})_{(a,b)\in A}}\mathbb{P}(\cap_{(a,b)\in A}\{m_{ba}=x_{ba}\mod p\}|\mathcal{F}_{A^{c}},\text{$P$ is nice})
\displaystyle\leq max(yba)(a,b)A((a,b)A{Yab=ybamodp}|Ac,P is nice)\displaystyle\max_{(y_{ba})_{(a,b)\in A}}\mathbb{P}(\cap_{(a,b)\in A}\{Y_{ab}=y_{ba}\mod p\}|\mathcal{F}_{A^{c}},\text{$P$ is nice})
\displaystyle\leq max(yba)(a,b)A(a,b)A(Bin(t2dk,1/2)=ybamodp)(min{2/p,1/2}+q)|A|.\displaystyle\max_{(y_{ba})_{(a,b)\in A}}\prod_{(a,b)\in A}\mathbb{P}(\mathrm{Bin}(\lfloor\frac{t}{2dk}\rfloor,1/2)=y_{ba}\mod p)\leq(\min\{2/p,1/2\}+q)^{|A|}.

Proof of Proposition 13. Recall that $K=\sum_{\ell=2}^{L}r_{\ell}+2$. Let $d\geq K$ be an even integer to be chosen later. Partition $\{a\in[k]:a\leq k/2\}$ into subsets $\{\mathcal{J}_{1,i}:1\leq i\leq\lfloor k/(2d)\rfloor\}$, each of size $d$, and omit the rest of the generators. Similarly, partition $\{b\in[k]:b>k/2\}$ into subsets $\{\mathcal{J}_{2,i}:1\leq i\leq\lfloor k/(2d)\rfloor\}$ of size $d$. Let $P_{i}=\mathcal{J}_{1,i}\cup\mathcal{J}_{2,i}$. Since the arrivals of generators whose indices are in the disjoint sets $\{P_{i}:1\leq i\leq\lfloor k/(2d)\rfloor\}$ are independent, we can make $\lfloor k/(2d)\rfloor$ independent attempts to find a $K\times K$ submatrix $M$ of $(\hat{m}_{ba})_{b\in\mathcal{K}\cap\mathcal{J}_{2,i},a\in\mathcal{J}_{1,i}}$ such that $M_{p}$ has rank $K$.

For each 1ik/(2d)1\leq i\leq\lfloor k/(2d)\rfloor we perform the following trial. Let 𝒦\mathcal{K} be as in (63). Since we will only be looking at b𝒥2,i𝒦b\in\mathcal{J}_{2,i}\cap\mathcal{K}, we need to determine if |𝒥2,i𝒦||\mathcal{J}_{2,i}\cap\mathcal{K}| is large enough to begin with. If |𝒥2,i𝒦|d/2|\mathcal{J}_{2,i}\cap\mathcal{K}|\geq d/2, we will look for the desired K×KK\times K submatrix MM in (m^ba)b𝒥2,i𝒦,a𝒥1,i(\hat{m}_{ba})_{b\in\mathcal{J}_{2,i}\cap\mathcal{K},a\in\mathcal{J}_{1,i}}. We will look at a batch of KK indices in 𝒥2,i𝒦\mathcal{J}_{2,i}\cap\mathcal{K} at a time, which will be denoted by {b1,,bK}\{b_{1},\dots,b_{K}\}. Let 𝒥1,i1st\mathcal{J}^{\mathrm{1st}}_{1,i} denote the first half of 𝒥1,i\mathcal{J}_{1,i} and 𝒥1,i2nd\mathcal{J}^{\mathrm{2nd}}_{1,i} the second half. Our goal is to search for column indices in 𝒥1,i2nd\mathcal{J}^{\mathrm{2nd}}_{1,i}, which will be labelled a1,,aKa_{1},\dots,a_{K}, such that the submatrix MM induced by the rows {b1,,bK}\{b_{1},\dots,b_{K}\} and the columns {a1,,aK}\{a_{1},\dots,a_{K}\} of m^\hat{m} satisfies the condition that MpM_{p} has rank KK.

We will describe the steps of the ii-th trial now:

  1. (1)

    If |𝒥2,i𝒦|d/2|\mathcal{J}_{2,i}\cap\mathcal{K}|\geq d/2 proceed to the search of submatrix MM. Otherwise declare failure for this trial.

  2. (2)

    We look for the first a𝒥1,i2nda\in\mathcal{J}^{\mathrm{2nd}}_{1,i} such that m^b1,a0modp\hat{m}_{b_{1},a}\neq 0\mod p and set it as a1a_{1}. If there is no such aa declare this trial to be a failure.

  3. (3)

    The search then proceeds iteratively: for $u\geq 1$, given the choice of $\{a_{1},\dots,a_{u}\}$, we look for the first $a\in\mathcal{J}^{\mathrm{2nd}}_{1,i}\backslash\{a_{1},\dots,a_{u}\}$ such that the last row $(\hat{m}_{b_{u+1},a_{1}},\dots,\hat{m}_{b_{u+1},a_{u}},\hat{m}_{b_{u+1},a}) \bmod p$ is not in the vector space spanned by the previous rows $\{(\hat{m}_{b_{i},a_{1}},\dots,\hat{m}_{b_{i},a_{u}},\hat{m}_{b_{i},a}) \bmod p:1\leq i\leq u\}$, and set it as $a_{u+1}$. If no such $a$ exists, we declare failure for this trial.

  4. (4)

    The trial is a success if we have found {a1,,aK}\{a_{1},\dots,a_{K}\}.
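Steps (2) to (4) of the trial amount to a greedy column search over $\mathbb{F}_{p}$. The following is a minimal sketch under stated assumptions: the matrix $\hat{m}$ is given as a nested list `m_hat[b][a]`, the function names `rank_mod_p` and `greedy_columns` are hypothetical, and the membership test of step (1) is assumed to have been carried out already. The sample matrix is invented for illustration.

```python
def rank_mod_p(rows, p):
    """Rank over F_p (p prime) of an integer matrix given as a list of rows."""
    mat = [[x % p for x in row] for row in rows]
    rank = 0
    n_cols = len(mat[0]) if mat else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(mat)) if mat[r][col]), None)
        if pivot is None:
            continue
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        inv = pow(mat[rank][col], -1, p)  # modular inverse (Python 3.8+)
        mat[rank] = [(x * inv) % p for x in mat[rank]]
        for r in range(len(mat)):
            if r != rank and mat[r][col]:
                factor = mat[r][col]
                mat[r] = [(x - factor * y) % p for x, y in zip(mat[r], mat[rank])]
        rank += 1
    return rank

def greedy_columns(m_hat, batch_rows, candidates, p):
    """Return the column indices a_1, ..., a_K of steps (2)-(4), or None on failure."""
    chosen = []
    for u in range(len(batch_rows)):
        rows = batch_rows[: u + 1]
        for a in candidates:
            if a in chosen:
                continue
            # Rows b_1..b_{u+1}, columns a_1..a_u plus the candidate a.
            sub = [[m_hat[b][c] for c in chosen + [a]] for b in rows]
            if rank_mod_p(sub, p) == u + 1:
                chosen.append(a)
                break
        else:
            return None  # no candidate works: the trial fails
    return chosen

# Invented 3 x 4 example over F_3.
m_hat = [[3, 1, 0, 2],
         [1, 1, 6, 0],
         [0, 2, 1, 1]]
columns = greedy_columns(m_hat, [0, 1, 2], [0, 1, 2, 3], 3)
```

Checking the rank of the enlarged square block is equivalent to checking that the new row is outside the span of the previous rows, since those rows are already linearly independent on the chosen columns.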

It remains to estimate the success probability for each trial, i.e., upper bound the failure probability in each step of the trial.

Step 1. For any b𝒥2,ib\in\mathcal{J}_{2,i}, to upper bound the probability that b𝒦b\notin\mathcal{K}, we will reveal the orders in {Bab:a𝒥1,i1st}\{B_{ab}:a\in\mathcal{J}^{\mathrm{1st}}_{1,i}\}. (By the relative independence in m^\hat{m}, we can still control the failure probability of our later search in 𝒥1,i2nd\mathcal{J}^{\mathrm{2nd}}_{1,i} given that {Bab:a𝒥1,i1st}\{B_{ab}:a\in\mathcal{J}^{\mathrm{1st}}_{1,i}\} has been revealed.)

Note that

{b𝒦}\displaystyle\{b\notin\mathcal{K}\} {gcd({mba:a𝒥1,i1st}) is not coprime with |Gab|}\displaystyle\subseteq\{\mathrm{gcd}(\{m_{ba}:a\in\mathcal{J}^{\mathrm{1st}}_{1,i}\})\text{ is not coprime with }|G_{\mathrm{ab}}|\}
=pΓa𝒥1,i1st{mba=0modp}.\displaystyle=\cup_{p\in\Gamma}\cap_{a\in\mathcal{J}^{\mathrm{1st}}_{1,i}}\{m_{ba}=0\mod p\}.

For all a[k]a\in[k], let Na:=Na(t)N_{a}:=N_{a}(t) (and respectively Na:=Na(t)N^{\prime}_{a}:=N^{\prime}_{a}(t)) denote the number of times the generator Za±1Z_{a}^{\pm 1} appears in XX (and respectively XX^{\prime}). Recall that the arrivals of each generator can be viewed as an independent Poisson process with rate 1/k1/k. Letting

𝒞:={Nc2(t/k),Nc2(t/k) for all c{b}𝒥1,i1st},\mathcal{C}:=\{N_{c}\leq 2(t/k),N^{\prime}_{c}\leq 2(t/k)\text{ for all }c\in\{b\}\cup\mathcal{J}^{\mathrm{1st}}_{1,i}\},

we have

(𝒞c)2(|𝒥1,i1st|+1)(Na>2(t/k))2dexp(Ω(t/k)).\mathbb{P}(\mathcal{C}^{c})\leq 2(|\mathcal{J}^{\mathrm{1st}}_{1,i}|+1)\mathbb{P}(N_{a}>2(t/k))\leq 2d\exp(-\Omega(t/k)).

Further observe that by (67) |mba|(Nb+Nb)(Na+Na)|m_{ba}|\leq(N_{b}+N^{\prime}_{b})(N_{a}+N^{\prime}_{a}) and hence 𝒞a𝒥1,i1st{|mba|16(t/k)2}\mathcal{C}\subseteq\cap_{a\in\mathcal{J}^{\mathrm{1st}}_{1,i}}\{|m_{ba}|\leq 16(t/k)^{2}\}. This implies that when p>16(t/k)2p>16(t/k)^{2}, for 𝒞(a𝒥1,i1st{mba=0modp})\mathcal{C}\cap(\cap_{a\in\mathcal{J}^{\mathrm{1st}}_{1,i}}\{m_{ba}=0\mod p\}) to occur the only possible case is when mba=0m_{ba}=0 for all a𝒥1,i1sta\in\mathcal{J}^{\mathrm{1st}}_{1,i}. That is,

{b𝒦}\displaystyle\{b\notin\mathcal{K}\} {{b𝒦}𝒞}𝒞c\displaystyle\subseteq\{\{b\notin\mathcal{K}\}\cap\mathcal{C}\}\cup\mathcal{C}^{c}
(pΓ:p16(t/k)2a𝒥1,i1st{mba=0modp}𝒞)\displaystyle\subseteq\left(\cup_{p\in\Gamma:p\leq 16(t/k)^{2}}\cap_{a\in\mathcal{J}^{\mathrm{1st}}_{1,i}}\{m_{ba}=0\mod p\}\cap\mathcal{C}\right)
(a𝒥1,i1st{mba=0})𝒞c\displaystyle\quad\cup(\cap_{a\in\mathcal{J}^{\mathrm{1st}}_{1,i}}\{m_{ba}=0\})\cup\mathcal{C}^{c}

By Lemma 4.19 it is easy to see (P is nice)1/2\mathbb{P}(\text{$P$ is nice})\geq 1/2 when tkdlogdt\gg kd\log d, which will be guaranteed by our choice of dd. Recall that q:=q(t2dk)q:=q(\lfloor\frac{t}{2dk}\rfloor). It follows from Lemma 4.21 that

(b𝒦|P is nice)\displaystyle\mathbb{P}(b\notin\mathcal{K}|\text{$P$ is nice}) pΓ:p16(t/k)2(a𝒥1,i1st{mba=0modp}|P is nice)\displaystyle\leq\sum_{p\in\Gamma:p\leq 16(t/k)^{2}}\mathbb{P}(\cap_{a\in\mathcal{J}^{\mathrm{1st}}_{1,i}}\{m_{ba}=0\mod p\}|\text{$P$ is nice})
+(a𝒥1,i1st{mba=0}|P is nice)+(𝒞c|P is nice)\displaystyle\quad+\mathbb{P}(\cap_{a\in\mathcal{J}^{\mathrm{1st}}_{1,i}}\{m_{ba}=0\}|\text{$P$ is nice})+\mathbb{P}(\mathcal{C}^{c}|\text{$P$ is nice})
16(t/k)2(min{2/p,1/2}+q)d/2+qd/2+4dexp(Ω(t/k))exp(Ω(d))=:q~\displaystyle\leq 16(t/k)^{2}\left(\min\{2/p,1/2\}+q\right)^{d/2}+q^{d/2}+4d\exp(-\Omega(t/k))\lesssim\exp(-\Omega(d))=:\tilde{q}

when log(t/k)d(t/k)\log(t/k)\ll d\ll(t/k). Conditioned on PP being nice, by the relative independence the collection (𝟏{b𝒦})b𝒥2,i(\mathbf{1}\{b\notin\mathcal{K}\})_{b\in\mathcal{J}_{2,i}} is dominated by i.i.d. Bernoulli random variables with probability q~\tilde{q}. Since 𝒥2,i\mathcal{J}_{2,i} has size dd,

(|𝒥2,i𝒦|<d/2|P is nice)(Binomial(d,q~)>d/2)2d(q~)d/2exp(Ω(d2)).\mathbb{P}(|\mathcal{J}_{2,i}\cap\mathcal{K}|<d/2|\text{$P$ is nice})\leq\mathbb{P}(\mathrm{Binomial}(d,\tilde{q})>d/2)\leq 2^{d}(\tilde{q})^{d/2}\leq\exp(-\Omega(d^{2})). (70)

Steps 2 and 3: the search for $a_{u}$ with $1\leq u\leq K$. Since $|\mathcal{J}_{2,i}\cap\mathcal{K}|\geq d/2$, we can try at least $\lfloor d/(2K)\rfloor$ batches of $K$ indices in $\mathcal{J}_{2,i}\cap\mathcal{K}$. These trials are not exactly independent, but using the relative independence for $\{\hat{m}_{ba}:a,b\in P_{i},a<b\}$ we can upper bound the probability that all trials are failures.

We begin by estimating the failure probability for a single trial. Consider the $i$-th trial where we look for the candidates $\{a_{u}:1\leq u\leq K\}$ in $\mathcal{J}^{\mathrm{2nd}}_{1,i}$. Write $\{b_{1},\dots,b_{K}\}$ for the corresponding row indices in this trial. The probability that we fail to find an $a_{1}$ is $\mathbb{P}(\hat{m}_{b_{1},a}=0 \bmod p\text{ for all }a\in\mathcal{J}^{\mathrm{2nd}}_{1,i}\mid\text{$P$ is nice})$, which, by Lemma 4.21, satisfies

\[
\mathbb{P}(\hat{m}_{b_{1},a}=0 \bmod p\text{ for all }a\in\mathcal{J}^{\mathrm{2nd}}_{1,i}\mid\text{$P$ is nice})\leq(\min\{2/p,1/2\}+q)^{\lfloor d/2\rfloor}.
\]

For u[K]u\in[K], suppose we have found {a1,,au}\{a_{1},\dots,a_{u}\} such that the matrix induced by {b1,,bu}\{b_{1},\dots,b_{u}\} and {a1,,au}\{a_{1},\dots,a_{u}\} has linearly independent rows. If a candidate a𝒥1,i2nd\{a1,,au}a\in\mathcal{J}^{\mathrm{2nd}}_{1,i}\backslash\{a_{1},\dots,a_{u}\} fails, it means that the new row

(m^bu+1,a1,,m^bu+1,au,m^bu+1,a)modp(\hat{m}_{b_{u+1},a_{1}},\dots,\hat{m}_{b_{u+1},a_{u}},\hat{m}_{b_{u+1},a})\mod p

is in the vector space spanned by previous uu rows {(m^bi,a1,,m^bi,au,m^bi,a)modp:1iu}\{(\hat{m}_{b_{i},a_{1}},\dots,\hat{m}_{b_{i},a_{u}},\hat{m}_{b_{i},a})\mod p:1\leq i\leq u\}. Since by assumption the matrix induced by {b1,,bu}\{b_{1},\dots,b_{u}\} and {a1,,au}\{a_{1},\dots,a_{u}\} has independent rows, there exists a unique linear combination (c1,,cu)pu(c_{1},\dots,c_{u})\in\mathbb{Z}_{p}^{u} such that

(m^bu+1,a1,,m^bu+1,au)=i=1uci(m^bi,a1,,m^bi,au)modp,(\hat{m}_{b_{u+1},a_{1}},\dots,\hat{m}_{b_{u+1},a_{u}})=\sum_{i=1}^{u}c_{i}(\hat{m}_{b_{i},a_{1}},\dots,\hat{m}_{b_{i},a_{u}})\mod p,

and thus the last column needs to satisfy

m^bu+1,a=i=1ucim^bi,amodp.\hat{m}_{b_{u+1},a}=\sum_{i=1}^{u}c_{i}\hat{m}_{b_{i},a}\mod p.

Therefore, by Lemma 4.21 the failure probability for a candidate a𝒥1,i2nd\{a1,,au}a\in\mathcal{J}^{\mathrm{2nd}}_{1,i}\backslash\{a_{1},\dots,a_{u}\} is at most

(m^bu+1,a=i=1ucim^bi,amodp|P is nice)min{2/p,1/2}+q.\mathbb{P}(\hat{m}_{b_{u+1},a}=\sum_{i=1}^{u}c_{i}\hat{m}_{b_{i},a}\mod p|\text{$P$ is nice})\leq\min\{2/p,1/2\}+q.

The relative independence in {m^ba:a,bPi,a<b}\{\hat{m}_{ba}:a,b\in P_{i},a<b\} implies that the probability of failing to find au+1a_{u+1} in the set 𝒥1,i2nd\{a1,,au}\mathcal{J}^{\mathrm{2nd}}_{1,i}\backslash\{a_{1},\dots,a_{u}\} is at most (min{2/p,1/2}+q)d/2u(\min\{2/p,1/2\}+q)^{\lfloor d/2\rfloor-u}. Through a simple union bound we see that the batch {b1,,bK}\{b_{1},\dots,b_{K}\} fails with probability at most

\[
\sum_{u=1}^{K}\mathbb{P}(\text{the search for $a_{u}$ fails}\mid\text{$P$ is nice})\leq\sum_{u=1}^{K}(\min\{2/p,1/2\}+q)^{\lfloor d/2\rfloor-u}\leq K(\min\{2/p,1/2\}+q)^{d/2-K}. \tag{71}
\]

Combining all these failure probabilities, i.e., Lemma 4.19, (70), (71), and using the fact that {m^ba:a,bPi}\{\hat{m}_{ba}:a,b\in P_{i}\} are independent for the disjoint index sets {Pi:1ik/(2d)}\{P_{i}:1\leq i\leq\lfloor k/(2d)\rfloor\}, we have

(𝒜pc)\displaystyle\mathbb{P}(\mathcal{A}_{p}^{c}) (the 1st trial fails)k/(2d)\displaystyle\leq\mathbb{P}(\text{the 1st trial fails})^{\lfloor k/(2d)\rfloor}
((P1 is not nice)+(|𝒥2,i𝒦|<d/2|P is nice)+(K(q+min{2/p,1/2})(d/2)K)d/(2K))k/(2d)\displaystyle\leq\left(\mathbb{P}(P_{1}\text{ is not nice})+\mathbb{P}(|\mathcal{J}_{2,i}\cap\mathcal{K}|<d/2|\text{$P$ is nice})+\left(K(q+\min\{2/p,1/2\})^{(d/2)-K}\right)^{\lfloor d/(2K)\rfloor}\right)^{\lfloor k/(2d)\rfloor}
(Cd2exp(Ω(tdk))+exp(Ω(d2))+(K(q+min{2/p,1/2})(d/2)K)d/(2K))k/(2d)\displaystyle\leq\left(Cd^{2}\exp\left(-\Omega\left(\frac{t}{dk}\right)\right)+\exp(-\Omega(d^{2}))+\left(K(q+\min\{2/p,1/2\})^{(d/2)-K}\right)^{\lfloor d/(2K)\rfloor}\right)^{\lfloor k/(2d)\rfloor}
3k/(2d)((Cd2exp(Ω(tdk)))k/(2d)+exp(Ω(kd))+exp(Ω(kd)))\displaystyle\leq 3^{k/(2d)}\left(\left(Cd^{2}\exp\left(-\Omega\left(\frac{t}{dk}\right)\right)\right)^{\lfloor k/(2d)\rfloor}+\exp(-\Omega(kd))+\exp(-\Omega(kd))\right) (72)

by the elementary inequality (a+b+c)n3n(an+bn+cn)(a+b+c)^{n}\leq 3^{n}(a^{n}+b^{n}+c^{n}) for n1n\geq 1.
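For completeness, the elementary inequality can be justified in one line for nonnegative $a,b,c$ and $n\geq 1$ (a reading aid, assuming only that the three terms are nonnegative, as they are here):

```latex
\[
(a+b+c)^{n}\leq\bigl(3\max\{a,b,c\}\bigr)^{n}=3^{n}\max\{a^{n},b^{n},c^{n}\}\leq 3^{n}(a^{n}+b^{n}+c^{n}).
\]
```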

Recall that $t\geq(1+\varepsilon)t_{*}(k,G)\asymp k|G_{\mathrm{ab}}|^{2/k}$ in the currently considered regime. Our choice of $d$ should satisfy $1\ll d\ll k$ and $\frac{t}{dk}\gg 1$ so that $q=o(1)$. It was also required that $\log(t/k)\ll d\ll t/k$ right before (70). Furthermore, we will choose $d$ satisfying $\frac{t}{dk}\gg\log d$ and $\frac{t}{d^{2}}\gg\log|G|$ so that the first term in (72) is $o(|G|^{-C})$ for any $C>0$. To control the second and third terms in (72), the choice of $d$ also needs to satisfy $kd\gg\log|G|$.

We choose d=|Gab|δ/kd=|G_{\mathrm{ab}}|^{\delta/k} for some sufficiently small δ>0\delta>0, which satisfies all the conditions listed above. Consequently, for any C>0C>0,

(𝒜pc)=o(|G|C),\mathbb{P}(\mathcal{A}_{p}^{c})=o(|G|^{-C}),

which leads to Proposition 13. ∎

4.9.2. Regime: klog|Gab|k\gtrsim\log|G_{\mathrm{ab}}|

In this regime s=t/k1s=t/k\lesssim 1 and we no longer have the relative independence in m^\hat{m}, so we will take a different approach.

Recall that $W_{a}^{\pm}:=\sum_{i=1}^{N(t)}\mathbf{1}\{\sigma_{i}=a,\eta_{i}=\pm 1\}$ tracks the number of times $Z^{\pm 1}_{a}$ appears in $X(t)$. We will only look at the set of generators

𝒥={a[k]:Wa++Wa=1},\mathcal{J}=\{a\in[k]:W_{a}^{+}+W_{a}^{-}=1\},

that appear exactly once in X:=X(t)X:=X(t). In this regime, we will be conditioning on {V=0,𝐭𝐲𝐩}\{V=0,\mathrm{\mathbf{typ}}\}, see Definition 5 for the definition of 𝐭𝐲𝐩\mathrm{\mathbf{typ}} in this regime. Denote by X|𝒥X|_{\mathcal{J}} the subsequence of XX that contains only generators in 𝒥\mathcal{J}. For simplicity of notation, for any a𝒥a\in\mathcal{J} we will assume that ZaZ_{a} (instead of Za1Z_{a}^{-1}) appears in XX. Otherwise we can just relabel Za1Z_{a}^{-1} as a new ZaZ_{a}. We also want to emphasize that only the order in which (Za)a𝒥(Z_{a})_{a\in\mathcal{J}} appear in X|𝒥X|_{\mathcal{J}} is important in the following argument, and the values of (Za)a𝒥(Z_{a})_{a\in\mathcal{J}} are not.

By (67) it is easy to see that on the event $\{V=0\}$ we have $m_{ba}\in\{-1,0,1\}$ for $a,b\in\mathcal{J}$, since $Z_{a}$ and $Z_{b}$ each appear only once in both $X$ and $X'$. Basically, if $Z_{a}$ and $Z_{b}$ appear in the same order in $X'$ as they do in $X$, then $\hat{m}_{ba}=0$; otherwise $|\hat{m}_{ba}|=1$.

Our goal is to look for an upper triangular submatrix MM of m^\hat{m} that has full rank KK. Indeed, since MM is upper triangular and all the entries of MM are in {1,0,1}\{-1,0,1\}, if MM has full rank KK then for all pΓp\in\Gamma, MpM_{p} also has full rank over the field 𝔽p\mathbb{F}_{p}. The proof is based on a combinatorial argument where we interpret (|m^ba|)a,b𝒥(|\hat{m}_{ba}|)_{a,b\in\mathcal{J}} in terms of the relative order of Za,ZbZ_{a},Z_{b} in X|𝒥X|_{\mathcal{J}} and X|𝒥X^{\prime}|_{\mathcal{J}}.

Let {ci:i}\{c_{i}:i\in\mathbb{N}\} be a set of distinct colors. Let dd be a positive integer to be determined later and n:=|𝒥|/dn:=\lfloor|\mathcal{J}|/d\rfloor. For 1in1\leq i\leq n, we will color the ((i1)d+1)((i-1)d+1)-th to the idid-th generator that appears in X|𝒥X|_{\mathcal{J}} as color cni+1c_{n-i+1}. In other words, the first dd generators in X|𝒥X|_{\mathcal{J}} have color cnc_{n}, followed by dd generators of color cn1c_{n-1} and so on. Let 𝒥i\mathcal{J}_{i} denote the set of generators in 𝒥\mathcal{J} that are colored cic_{i}. Our coloring scheme implies that in the sequence X|𝒥X|_{\mathcal{J}}, for any i>ii>i^{\prime}, any generator belonging to 𝒥i\mathcal{J}_{i} is in front of any generator belonging to 𝒥i\mathcal{J}_{i^{\prime}}. In order to understand (|m^ba|)a,b𝒥(|\hat{m}_{ba}|)_{a,b\in\mathcal{J}} it remains to determine the relative orders of {(Za,Zb):a,b𝒥 and Za,Zb are in different colors}\{(Z_{a},Z_{b}):a,b\in\mathcal{J}\text{ and $Z_{a},Z_{b}$ are in different colors}\} in X|𝒥X^{\prime}|_{\mathcal{J}}.

Note that X|𝒥X^{\prime}|_{\mathcal{J}} has the distribution of a uniform permutation of (Za)a𝒥(Z_{a})_{a\in\mathcal{J}}, which means we can construct this subsequence by inserting the generators to an existing sequence uniformly at random, one by one. Write 𝒥i=(Zxi,1,,Zxi,d)\mathcal{J}_{i}=(Z_{x_{i,1}},\dots,Z_{x_{i,d}}) for 1in1\leq i\leq n. In order to construct X|𝒥X^{\prime}|_{\mathcal{J}}, we first sample a uniform permutation of (Za)a𝒥1(Z_{a})_{a\in\mathcal{J}_{1}} and denote it by X|𝒥1X^{\prime}|_{\mathcal{J}_{1}}. Without loss of generality, we can label them as (Zx1,1,,Zx1,d)(Z_{x_{1,1}},\dots,Z_{x_{1,d}}). Next we will insert the generators from 𝒥2\mathcal{J}_{2} into this sequence, one by one. Once we are done with inserting the generators in 𝒥2\mathcal{J}_{2}, we proceed to insert the generators from 𝒥3,𝒥4\mathcal{J}_{3},\mathcal{J}_{4} and so on. Since all the generators are inserted at random and one by one, the resulting sequence will be a uniform permutation of the generators in 𝒥\mathcal{J}.
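The claim that one-by-one random insertion produces a uniform permutation can be confirmed by exhaustive enumeration for a small case. In this sketch the helper name `build_by_insertion` and the four labels are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

def build_by_insertion(items, positions):
    """Insert items one by one; positions[i] in {0, ..., i} is the slot of items[i]."""
    seq = []
    for item, pos in zip(items, positions):
        seq.insert(pos, item)
    return tuple(seq)

items = ["Z1", "Z2", "Z3", "Z4"]
# All 1 * 2 * 3 * 4 = 24 possible sequences of insertion slots.
slot_choices = product(*(range(i + 1) for i in range(len(items))))
counts = Counter(build_by_insertion(items, pos) for pos in slot_choices)
```

Every one of the $4!=24$ orderings arises from exactly one choice of slots, so uniformly random slots yield a uniformly random permutation.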

Given the sequence X|𝒥X^{\prime}|_{\mathcal{J}}, we can define a collection of good events {𝒞i:2in}\{\mathcal{C}_{i}:2\leq i\leq n\}. As an example, we will first define 𝒞2\mathcal{C}_{2}. For any l[d]l\in[d], if Zx2,lZ_{x_{2,l}} is inserted between Zx1,jZ_{x_{1,j}} and Zx1,j+1Z_{x_{1,j+1}} for some 0jd0\leq j\leq d (let j=0j=0 if Zx2,lZ_{x_{2,l}} is inserted in front of Zx1,1Z_{x_{1,1}} whereas let j=dj=d if it is inserted behind Zx1,dZ_{x_{1,d}}), then

\[
|\hat{m}_{x_{1,j'},x_{2,l}}|=\begin{cases}1&\text{ for }1\leq j'\leq j,\\ 0&\text{ for }j<j'\leq d.\end{cases}
\]

We will let 𝒞2\mathcal{C}_{2} be the event that in the sequence X|𝒥X^{\prime}|_{\mathcal{J}} there are at least KK distinct pairs from the set {(Zx1,j,Zx1,j+1):0jd}\{(Z_{x_{1,j}},Z_{x_{1,j+1}}):0\leq j\leq d\} such that there is at least one generator from 𝒥2\mathcal{J}_{2} that is between them. The reason for this definition is that if 𝒞2\mathcal{C}_{2} occurs we can collect the first KK elements in

{x1,j:0jd, there is some Zx2,l𝒥2 inserted between Zx1,j and Zx1,j+1}\{x_{1,j}:0\leq j\leq d,\text{ there is some $Z_{x_{2,l}}\in\mathcal{J}_{2}$ inserted between $Z_{x_{1,j}}$ and $Z_{x_{1,j+1}}$}\}

as the row indices of MM and the corresponding x2,lx_{2,l}’s as the column indices. This choice leads to an upper triangular K×KK\times K matrix MM where the top right entries are {±1}\{\pm 1\}. Consequently, the induced matrix MpM_{p} is still an upper triangular matrix that has full rank KK. This argument can be most easily explained through an example.

Example. For simplicity we assume $X|_{\mathcal{J}}=(Z_{1},Z_{2},\dots,Z_{6})$. Let $d=3$ so that there are 2 colors. In particular, $Z_{1},Z_{2},Z_{3}$ have color 2 while $Z_{4},Z_{5},Z_{6}$ have color 1. Suppose that, when constructing $X'|_{\mathcal{J}}$, we first sample a random permutation of $Z_{4},Z_{5},Z_{6}$ and get $(Z_{4},Z_{5},Z_{6})$, and then insert $Z_{1},Z_{2},Z_{3}$ into the current sequence, obtaining as a result

X|𝒥=(Z4,Z1,Z5,Z2,Z6,Z3).X^{\prime}|_{\mathcal{J}}=(Z_{4},Z_{1},Z_{5},Z_{2},Z_{6},Z_{3}).

It is easy to see that as a consequence of inserting Z1Z_{1} (of color 2) in between Z4Z_{4} and Z5Z_{5} (of color 1) in X|𝒥X^{\prime}|_{\mathcal{J}}, the relative order of (Z1,Z4)(Z_{1},Z_{4}) in X|𝒥X^{\prime}|_{\mathcal{J}} is different from that in X|𝒥X|_{\mathcal{J}} and hence |m^41|=1|\hat{m}_{41}|=1. Moreover, we can obtain an upper triangular 2×22\times 2 submatrix

(m^41m^42m^51m^52)=(±1±10±1).\begin{pmatrix}\hat{m}_{41}&\hat{m}_{42}\\ \hat{m}_{51}&\hat{m}_{52}\end{pmatrix}=\begin{pmatrix}\pm 1&\pm 1\\ 0&\pm 1\end{pmatrix}.
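The example can be reproduced mechanically from the rule that $|\hat{m}_{ba}|=1$ exactly when $Z_{a}$ and $Z_{b}$ swap their relative order between $X|_{\mathcal{J}}$ and $X'|_{\mathcal{J}}$. In this sketch generators are represented by their integer labels and the helper name `order_flip` is an assumption.

```python
def order_flip(x_order, x_prime_order):
    """Return a function giving |m_hat_{ba}| for generators that appear exactly once."""
    pos = {g: i for i, g in enumerate(x_order)}
    pos_prime = {g: i for i, g in enumerate(x_prime_order)}

    def flipped(b, a):
        # 1 if a and b appear in different relative orders in the two sequences
        return int((pos[a] < pos[b]) != (pos_prime[a] < pos_prime[b]))

    return flipped

X = [1, 2, 3, 4, 5, 6]          # X restricted to J
X_prime = [4, 1, 5, 2, 6, 3]    # X' restricted to J
flip = order_flip(X, X_prime)
# Rows b in (4, 5), columns a in (1, 2), entries in absolute value.
submatrix = [[flip(b, a) for a in (1, 2)] for b in (4, 5)]
```

The result is `[[1, 1], [0, 1]]`, matching the absolute values of the upper triangular matrix displayed above.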

In general, for 2in2\leq i\leq n we can define 𝒞i\mathcal{C}_{i} to be the event that in the sequence X|𝒥X^{\prime}|_{\mathcal{J}} there are at least KK distinct pairs from the set

\[
\{(Z_{x_{i_{1},l_{1}}},Z_{x_{i_{2},l_{2}}}):i_{1},i_{2}\leq i-1,\ l_{1},l_{2}\in[d]\text{ and }Z_{x_{i_{1},l_{1}}},Z_{x_{i_{2},l_{2}}}\text{ are consecutive in }X'|_{\mathcal{J}_{1}\cup\cdots\cup\mathcal{J}_{i-1}}\}
\]

such that there is at least one generator from 𝒥i\mathcal{J}_{i} that is between them. If any of the events {𝒞i:2in}\{\mathcal{C}_{i}:2\leq i\leq n\} occurred we would be able to find a K×KK\times K submatrix MM satisfying our condition and the set 𝒦\mathcal{K} (from the definition of 𝒜\mathcal{A} in Definition 10) by collecting the corresponding KK row indices of MM. Hence, the corresponding column indices of MM are in 𝒦c\mathcal{K}^{c}, a fact we shall now use to verify condition (47) from the definition of 𝒜\mathcal{A}. One can easily check that for any b𝒦b\in\mathcal{K},

G2ψb\displaystyle G_{2}\psi_{b} =G2a[k]:a<bmbaZa,1+a𝒦c:a>bm^baZa,1\displaystyle=G_{2}\sum_{a\in[k]:a<b}m_{ba}Z_{a,1}+\sum_{a\in\mathcal{K}^{c}:a>b}\hat{m}_{ba}Z_{a,1}
=G2a𝒦cm^baZa,1+G2a𝒦:a<bm^baZa,1\displaystyle=G_{2}\sum_{a\in\mathcal{K}^{c}}\hat{m}_{ba}Z_{a,1}+G_{2}\sum_{a\in\mathcal{K}:a<b}\hat{m}_{ba}Z_{a,1}

where the first term is uniform in GabG_{\mathrm{ab}} and independent from (G2Zb,1)b𝒦(G_{2}Z_{b,1})_{b\in\mathcal{K}}. Therefore, condition (47) is satisfied for this choice of 𝒦\mathcal{K} and 𝒜p(𝒦)\mathcal{A}_{p}(\mathcal{K}) occurs for all pΓp\in\Gamma, i.e., 𝒜\mathcal{A} occurs as long as i=2n𝒞i\cup_{i=2}^{n}\mathcal{C}_{i} occurs. Therefore, it remains to upper bound

(𝒜c|V=0,𝐭𝐲𝐩)(i=2n𝒞ic).\mathbb{P}(\mathcal{A}^{c}|V=0,\mathrm{\mathbf{typ}})\leq\mathbb{P}(\cap_{i=2}^{n}\mathcal{C}^{c}_{i}).

For 1in11\leq i\leq n-1, let 𝒢i\mathcal{G}_{i} be the σ\sigma-field that encodes relative orders of generators in 𝒥1𝒥i\mathcal{J}_{1}\cup\cdots\cup\mathcal{J}_{i} in X|𝒥X^{\prime}|_{\mathcal{J}}. Then for 2jn2\leq j\leq n,

\[
\mathbb{P}(\cap_{i=2}^{j}\mathcal{C}^{c}_{i}\mid\mathcal{G}_{j-1})=\mathbf{1}\{\cap_{i=2}^{j-1}\mathcal{C}^{c}_{i}\}\cdot\mathbb{P}(\mathcal{C}^{c}_{j}\mid\mathcal{G}_{j-1}).
\]

The key is to observe that $\mathbb{P}(\mathcal{C}^{c}_{j}\mid\mathcal{G}_{j-1})=\mathbb{P}(\mathcal{C}^{c}_{j})$, i.e., this conditional probability does not depend on $\mathcal{G}_{j-1}$. Applying this iteratively gives that

\[
\mathbb{P}(\cap_{i=2}^{n}\mathcal{C}^{c}_{i})=\prod_{i=2}^{n}\mathbb{P}(\mathcal{C}_{i}^{c}).
\]

We will calculate (𝒞i+1c)\mathbb{P}(\mathcal{C}_{i+1}^{c}) for 1in11\leq i\leq n-1. Looking at the distribution of the subsequence X|𝒥1𝒥i+1X^{\prime}|_{\mathcal{J}_{1}\cup\cdots\cup\mathcal{J}_{i+1}} is equivalent to looking at the subsequence X|𝒥1𝒥iX^{\prime}|_{\mathcal{J}_{1}\cup\cdots\cup\mathcal{J}_{i}} and inserting the generators in 𝒥i+1\mathcal{J}_{i+1} randomly and one by one to this subsequence. This perspective allows us to calculate (𝒞ic)\mathbb{P}(\mathcal{C}^{c}_{i}) via a multi-type urn scheme.

The subsequence X|𝒥1𝒥iX^{\prime}|_{\mathcal{J}_{1}\cup\cdots\cup\mathcal{J}_{i}} has length didi and we are inserting the generators from 𝒥i+1\mathcal{J}_{i+1}. If we view the gap between two generators in the existing sequence X|𝒥1𝒥iX^{\prime}|_{\mathcal{J}_{1}\cup\cdots\cup\mathcal{J}_{i}} as a distinct type (also taking into account the gap before and behind all generators), then we would have di+1di+1 types and there is one ball of each type in the urn. We will be conducting dd steps. At each step, we choose a ball randomly from the urn and place the ball together with a new ball of the same type back to the urn. It is not difficult to see this urn scheme is equivalent to inserting elements randomly to a sequence.
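The equivalence between random insertion and the urn scheme can be checked exactly in a small case by comparing the law of the number of new elements falling in each original gap. This is an illustrative sketch; the function names and the values $m=2$ (length of the existing sequence) and $d=3$ are assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def gap_counts_by_insertion(m, d):
    """Exact law of the number of inserted elements in each of the m + 1 original gaps."""
    slot_ranges = [range(m + 1 + s) for s in range(d)]
    total = 1
    for r in slot_ranges:
        total *= len(r)
    law = Counter()
    for slots in product(*slot_ranges):
        seq = ["old"] * m
        for pos in slots:
            seq.insert(pos, "new")
        counts, gap = [0] * (m + 1), 0
        for x in seq:
            if x == "old":
                gap += 1          # move on to the next original gap
            else:
                counts[gap] += 1  # a new element sits in the current gap
        law[tuple(counts)] += Fraction(1, total)
    return law

def gap_counts_by_urn(m, d):
    """Exact law of added balls per type: m + 1 types, one ball each, d reinforced draws."""
    law = Counter({tuple([0] * (m + 1)): Fraction(1)})
    for s in range(d):
        new_law = Counter()
        for counts, prob in law.items():
            for g in range(m + 1):
                nxt = list(counts)
                nxt[g] += 1
                new_law[tuple(nxt)] += prob * Fraction(1 + counts[g], m + 1 + s)
        law = new_law
    return law

insertion_law = gap_counts_by_insertion(2, 3)
urn_law = gap_counts_by_urn(2, 3)
```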

Our goal is to understand the probability of the event

\[\{\text{there are at most $K-1$ types with at least 2 balls after $d$ balls have been inserted}\},\]

which corresponds to $\mathcal{C}^{c}_{i+1}$ in the urn model and hence has the same probability. The calculation can be simplified by first fixing $di+1-(K-1)$ of the $di+1$ types, each of which is to end with at most one ball, and then grouping the remaining $K-1$ types together to form a new type, called type 0. This gives a new urn model with $di+3-K$ types (the $di+1-(K-1)$ fixed types and the new type 0), starting with $K-1$ balls of type 0 and one ball of each remaining type. The probability that all $d$ draws are of type 0 (so that in the original urn model only the $K-1$ grouped types can end with at least 2 balls) is

\[\prod_{j=0}^{d-1}\frac{K-1+j}{di+1+j}=\frac{(di)!\,(K+d-2)!}{(d(i+1))!\,(K-2)!}.\]
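As a quick sanity check of this identity, the sketch below compares the product with the factorial expression in exact arithmetic. The function names are ours, and $K\geq 2$ is assumed so that $(K-2)!$ is defined.

```python
from fractions import Fraction
from math import factorial


def only_type0_prob(d, i, K):
    """Product over j = 0, ..., d-1 of (K - 1 + j) / (d*i + 1 + j)."""
    p = Fraction(1)
    for j in range(d):
        p *= Fraction(K - 1 + j, d * i + 1 + j)
    return p


def closed_form(d, i, K):
    """(di)! (K+d-2)! / ((d(i+1))! (K-2)!), assuming K >= 2."""
    return Fraction(
        factorial(d * i) * factorial(K + d - 2),
        factorial(d * (i + 1)) * factorial(K - 2),
    )


if __name__ == "__main__":
    print(only_type0_prob(3, 2, 4), closed_form(3, 2, 4))
```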

Therefore,

\[\mathbb{P}(\mathcal{C}_{i+1}^{c})\leq{di+1\choose K-1}\frac{(di)!\,(K+d-2)!}{(d(i+1))!\,(K-2)!}\leq(di+1)^{K-1}\cdot\frac{(di)!\,(K+d-2)!}{(d(i+1))!\,(K-2)!}.\]
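To get a feeling for how much the union bound over the $\binom{di+1}{K-1}$ choices of grouped types loses, one can compare it with the exact probability on small instances. The sketch below relies on a standard fact that is not stated in the text: in a Pólya urn started with one ball of each of $m$ types, the vector of added balls after $d$ draws is uniform over all ways of writing $d$ as an ordered sum of $m$ non-negative integers. The function names are ours, and we assume $d\geq 1$ and $2\leq K\leq di+2$.

```python
from fractions import Fraction
from math import comb


def exact_prob(d, i, K):
    """Exact probability that at most K-1 of the m = d*i + 1 types receive
    an added ball, using uniformity over occupancy vectors."""
    m = d * i + 1
    total = comb(m + d - 1, d)  # number of occupancy vectors
    favourable = sum(
        comb(m, j) * comb(d - 1, j - 1)  # exactly j types receive balls
        for j in range(1, min(K - 1, m, d) + 1)
    )
    return Fraction(favourable, total)


def union_bound(d, i, K):
    """binom(m, K-1) times the probability that all d draws fall in a
    fixed group of K-1 types."""
    m = d * i + 1
    p = Fraction(1)
    for j in range(d):
        p *= Fraction(K - 1 + j, m + j)
    return comb(m, K - 1) * p


if __name__ == "__main__":
    for d, i, K in [(2, 1, 2), (3, 1, 3), (4, 2, 3)]:
        print(d, i, K, exact_prob(d, i, K), union_bound(d, i, K))
```

For $(d,i,K)=(2,1,2)$ the two quantities coincide, since the events "all draws fall in type $a$" are disjoint for different single types $a$. For larger $K$ the union bound can exceed 1 on small instances.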

Finally, we have

\[\mathbb{P}(\cap_{i=2}^{n}\mathcal{C}_{i}^{c})=\prod_{i=1}^{n-1}\mathbb{P}(\mathcal{C}_{i+1}^{c})\leq\left(\prod_{i=1}^{n-1}(di+1)^{K-1}\right)\cdot\prod_{i=1}^{n-1}\frac{(di)!\,(K+d-2)!}{(d(i+1))!\,(K-2)!},\]

where the second product is a telescoping product and hence can be simplified to yield

\[\begin{aligned}\mathbb{P}(\cap_{i=2}^{n}\mathcal{C}_{i}^{c})&\leq\exp(Kn\log(nd))\cdot\frac{d!}{(nd)!}\cdot\frac{((K+d-2)!)^{n-1}}{((K-2)!)^{n-1}}\\
&=\exp(Kn\log(nd))\cdot\frac{(d!)^{n}}{(nd)!}\cdot\left(\frac{(K+d-2)!}{d!\,(K-2)!}\right)^{n-1}\\
&\leq\exp(Kn\log(nd))\cdot(K+d-2)^{(K-2)(n-1)}\cdot\frac{(d!)^{n}}{(nd)!}.\end{aligned}\tag{73}\]
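Each step of the chain leading to (73) can be verified exactly for small parameters. The sketch below (function names ours, assuming $n\geq 2$, $d\geq 1$, $K\geq 2$) checks the telescoping, the rearrangement into a binomial coefficient, and the two elementary bounds $\prod_{i=1}^{n-1}(di+1)^{K-1}\leq(nd)^{Kn}$ and $\binom{K+d-2}{K-2}\leq(K+d-2)^{K-2}$.

```python
from fractions import Fraction
from math import comb, factorial


def first_product(n, d, K):
    """Product over i = 1, ..., n-1 of (d*i + 1)^(K-1)."""
    p = 1
    for i in range(1, n):
        p *= (d * i + 1) ** (K - 1)
    return p


def second_product(n, d, K):
    """Product over i = 1, ..., n-1 of (di)!(K+d-2)! / ((d(i+1))!(K-2)!)."""
    p = Fraction(1)
    for i in range(1, n):
        p *= Fraction(
            factorial(d * i) * factorial(K + d - 2),
            factorial(d * (i + 1)) * factorial(K - 2),
        )
    return p


def telescoped(n, d, K):
    """d!/(nd)! times ((K+d-2)!/(K-2)!)^(n-1)."""
    ratio = Fraction(factorial(K + d - 2), factorial(K - 2))
    return Fraction(factorial(d), factorial(n * d)) * ratio ** (n - 1)


def rearranged(n, d, K):
    """(d!)^n/(nd)! times binom(K+d-2, K-2)^(n-1)."""
    phi = Fraction(factorial(d) ** n, factorial(n * d))
    return phi * comb(K + d - 2, K - 2) ** (n - 1)


if __name__ == "__main__":
    n, d, K = 4, 3, 3
    print(second_product(n, d, K) == telescoped(n, d, K) == rearranged(n, d, K))
    print(first_product(n, d, K) <= (n * d) ** (K * n))
    print(comb(K + d - 2, K - 2) <= (K + d - 2) ** (K - 2))
```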

The key is to understand $\varphi(d):=\frac{(d!)^{n}}{(nd)!}$. By Stirling's formula we have

\[\varphi(d)\lesssim\frac{(2\pi d)^{(n-1)/2}}{n^{1/2}}\cdot\exp(-nd\log n).\]

Collecting the remaining factors, we obtain

\[\exp(Kn\log(nd))\cdot(K+d-2)^{(K-2)(n-1)}\cdot\frac{(2\pi d)^{(n-1)/2}}{n^{1/2}}\leq\exp(C_{K}n\log(nd))\]
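The Stirling estimate can be sanity-checked numerically. The sketch below (function names and sample values are ours) evaluates $\log\varphi(d)$ through `math.lgamma` and compares it with the leading term $-nd\log n$; the difference should be a lower-order correction, of order $n\log d$.

```python
from math import lgamma, log, pi


def log_phi(n, d):
    """log of (d!)^n / (nd)!, computed via log-gamma to avoid overflow."""
    return n * lgamma(d + 1) - lgamma(n * d + 1)


def leading_term(n, d):
    """The dominant exponent -n d log n from Stirling's formula."""
    return -n * d * log(n)


if __name__ == "__main__":
    n = 5
    for d in (50, 500, 5000):
        correction = log_phi(n, d) - leading_term(n, d)
        ratio = log_phi(n, d) / leading_term(n, d)
        print(d, round(correction, 3), round(ratio, 5))
```

For fixed $n$ the ratio tends to 1 as $d$ grows, while the correction grows only logarithmically in $d$.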

for some constant $C_{K}>0$. Recall that $nd\approx|\mathcal{J}|$. We will choose $d\gg 1$, which implies that

\[n\log(nd)\ll nd\log n,\tag{74}\]

and consequently, for arbitrarily small $\delta>0$, when $n$ is sufficiently large we have

\[\mathbb{P}(\cap_{i=2}^{n}\mathcal{C}_{i}^{c})\leq\exp(-(1-\delta)nd\log n).\tag{75}\]
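For completeness, the comparison (74) can be verified directly. Assuming $n\geq 2$ and $d\geq 1$, we have

\[\frac{n\log(nd)}{nd\log n}=\frac{1}{d}+\frac{\log d}{d\log n}\leq\frac{1}{d}+\frac{\log d}{d\log 2}\longrightarrow 0\quad\text{as }d\to\infty,\]

uniformly in $n\geq 2$, so the factor $\exp(C_{K}n\log(nd))$ is absorbed into $\exp(\delta nd\log n)$ once $d$ is large enough.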

Recall from (29) and Definition 5 that when $k\gtrsim\log|G_{\mathrm{ab}}|$, conditioning on $\mathrm{\mathbf{typ}}$ ensures $|\mathcal{J}|\geq(1-\varepsilon/2)te^{-t/k}\geq(1-\varepsilon)t$.

  • In the regime $k\eqsim\lambda\log|G_{\mathrm{ab}}|\asymp\log|G|$, the above implies that $|\mathcal{J}|\asymp k$. Since $nd\approx|\mathcal{J}|$ we have

    \[\mathbb{P}(\cap_{i=2}^{n}\mathcal{C}_{i}^{c})\leq\exp(-(1-\delta)nd\log n)=o(|G|^{-1})\]

    as long as $n$ is a sufficiently large constant.

  • For the regime $k\gg\log|G_{\mathrm{ab}}|$, we have $t/k\ll 1$. By typicality, $|\mathcal{J}|\geq(1-\varepsilon/2)te^{-t/k}\geq(1-\varepsilon)t$. Recall that we write $\rho=\frac{\log k}{\log\log|G_{\mathrm{ab}}|}$.

    • When $t_{*}=t_{0}$ and $t\geq(1+3\varepsilon)t_{0}$, we can choose $n$ so that $nd\geq(1-\varepsilon/2)|\mathcal{J}|\geq(1+\varepsilon)t_{0}$. Recall that in this regime $\frac{\rho}{\rho-1}\geq\frac{\log|G|}{\log|G_{\mathrm{ab}}|}$, i.e., $\frac{1}{\rho-1}\geq\frac{\log|G_{2}|}{\log|G_{\mathrm{ab}}|}$, and $t_{0}\eqsim k\cdot\frac{1}{\kappa\log\kappa}$ where $\kappa=k/\log|G_{\mathrm{ab}}|$.

      Letting $1\ll d\ll t_{0}^{\varepsilon/8}$ and $\delta=\varepsilon/4$, the failure probability given by (75) is at most

      \[\begin{aligned}\exp(-(1-\delta)nd\log n)&\leq\exp(-(1-\delta)(1+\varepsilon)t_{0}\log(t_{0}/d))\leq\exp(-(1+\varepsilon/2)t_{0}\log t_{0})\\
      &\leq\exp\left(-(1+\varepsilon/4)\frac{\log|G_{\mathrm{ab}}|}{\rho-1}\right)\leq|G_{2}|^{-(1+\varepsilon/4)},\end{aligned}\]

      where in the second-to-last inequality we use the fact that

      \[t_{0}\log t_{0}\eqsim\log|G_{\mathrm{ab}}|\cdot\frac{\log\log|G_{\mathrm{ab}}|-\log\log(k/\log|G_{\mathrm{ab}}|)}{\log(k/\log|G_{\mathrm{ab}}|)}=\frac{\log|G_{\mathrm{ab}}|}{\rho-1}(1-o(1)).\]
    • When $t_{*}=t_{1}=\log_{k}|G|$ and $t\geq(1+3\varepsilon)t_{1}$, we have $nd\geq(1+\varepsilon)t_{1}$ and similarly the failure probability is at most

      \[\exp(-(1-\delta)nd\log n)\leq\exp(-(1+\varepsilon/2)t_{1}\log t_{1})\leq\exp\left(-(1+\varepsilon/4)\frac{\log|G|}{\rho}\right)=|G|^{-(1+\varepsilon/4)/\rho},\]

      where the last inequality holds because $\rho=\frac{\log k}{\log\log|G_{\mathrm{ab}}|}\eqsim\frac{\log k}{\log\log|G|}$ and $t_{1}\log t_{1}=(1-o(1))t_{1}\log\log|G|$.
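The asymptotic relation between $t_{0}\log t_{0}$ and $\frac{\log|G_{\mathrm{ab}}|}{\rho-1}$ used in the case $t_{*}=t_{0}$ can be illustrated numerically. In the sketch below, `L` stands for $\log|G_{\mathrm{ab}}|$, the relation $t_{0}\eqsim k/(\kappa\log\kappa)$ is treated as an exact equality, and the sample values of `k` and `L` are purely hypothetical. All of these are simplifying assumptions made for illustration only.

```python
from math import log


def t0_log_t0(k, L):
    """t0 * log(t0) with t0 = k / (kappa * log(kappa)), kappa = k / L."""
    kappa = k / L
    t0 = k / (kappa * log(kappa))  # equals L / log(kappa)
    return t0 * log(t0)


def target(k, L):
    """L / (rho - 1) with rho = log(k) / log(L)."""
    rho = log(k) / log(L)
    return L / (rho - 1)


def ratio(k, L):
    """The factor written as 1 - o(1) in the text."""
    return t0_log_t0(k, L) / target(k, L)


def predicted_ratio(k, L):
    """Closed form of the same factor: 1 - log(log(kappa)) / log(L)."""
    return 1 - log(log(k / L)) / log(L)


if __name__ == "__main__":
    for k, L in [(1e9, 1e6), (1e30, 1e20), (1e150, 1e100)]:
        print(k, L, ratio(k, L), predicted_ratio(k, L))
```

The correction term is $\log\log\kappa/\log\log|G_{\mathrm{ab}}|$, so the factor approaches 1 only slowly, which is consistent with the slack $\varepsilon/4$ kept in the exponent.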

Therefore, we have proved that $\mathbb{P}(\mathcal{A}^{c}\mid V=0,\mathrm{\mathbf{typ}})\ll e^{h}/|G|$ (see Definition 4 for the value of $h$ in each regime), which completes the proof of Proposition 10.
