Cutoff for random Cayley graphs of nilpotent groups
Abstract.
We consider the random Cayley graphs of a sequence of finite nilpotent groups of diverging sizes $|G| \to \infty$, whose ranks and nilpotency classes are uniformly bounded. For some $k = k(|G|)$ such that $1 \ll \log k \ll \log |G|$, we pick a random set of generators $Z$ by sampling $k$ elements $Z_1, \dots, Z_k$ from $G$ uniformly at random with replacement, and set $Z := \{Z_i^{\pm 1} : i \in [k]\}$. We show that the simple random walk on $\mathrm{Cay}(G, Z)$ exhibits cutoff with high probability.
Some of our results apply to a general set of generators. Namely, we show that there is a constant $c > 0$, depending only on the rank and the nilpotency class of $G$, such that for all symmetric sets of generators whose size is suitably bounded in terms of $c$, the spectral gap and the $\varepsilon$-mixing time of the simple random walk on $\mathrm{Cay}(G,Z)$ are asymptotically the same as those of the projection of the walk to the abelianization of $G$, given by $G/[G,G]$. In particular, the walk exhibits cutoff if and only if its projection does.
Key words and phrases:
cutoff, mixing times, random walk, random Cayley graphs, nilpotent groups
2020 Mathematics Subject Classification:
Primary: 05C48, 05C80, 05C81; Secondary: 20D15, 60B15, 60J27, 60K37.
1. Introduction
1.1. Motivation and Objectives of Paper
We examine the random walk (RW) on a Cayley graph of a finite nilpotent group $G$ with respect to a symmetric set of generators. We are interested in the asymptotic behavior of the mixing time and the spectral gap of the walk as $|G| \to \infty$, while the step (also called the nilpotency class) of $G$ and its rank (the minimal size of a set of generators), denoted respectively by $\ell(G)$ and $r(G)$, remain bounded.
We also investigate the occurrence of the cutoff phenomenon for the walk, which in the above setup can only occur when $k \to \infty$ (i.e., the number of generators diverges as $|G| \to \infty$). In particular, we prove that the walk exhibits cutoff with high probability when $Z$ is obtained by picking $k$ elements $Z_1, \dots, Z_k$ of $G$ uniformly at random, with replacement, and then setting $Z := \{Z_i^{\pm 1} : i \in [k]\}$, under the necessary condition that $k \to \infty$.
The overarching theme of this work is that, in a certain strong quantitative sense, the mixing behavior of the walk is governed by that of the projection of the walk to the abelianization of $G$, denoted by $G^{\mathrm{ab}} := G/[G,G]$, where $[G,G]$ is the commutator subgroup of $G$. The projected walk is a random walk on the projected Cayley graph $\mathrm{Cay}(G^{\mathrm{ab}}, \bar{Z})$, where $\bar{Z}$ is the projection of $Z$ to $G^{\mathrm{ab}}$.
As will be stated precisely in Theorem 2 and Corollary 1, for arbitrary symmetric such that for some constant depending only on the rank and step of , the relaxation times (inverses of the spectral gaps) of and are equal:
and the total variation mixing times of the random walks on and satisfy
for and . In other words, in order to determine the mixing time or the occurrence of cutoff for the walk on , it suffices to do so for .
In the case that $Z = \{Z_i^{\pm 1} : i \in [k]\}$, where $Z_1, \dots, Z_k$ are i.i.d. and uniformly distributed over $G$, we extend the analysis to the regime $1 \ll \log k \ll \log |G|$ and prove that the random walk exhibits cutoff with high probability around time $t_0$, which is defined in terms of the entropic time $t_*$, the time at which the entropy of the rate-1 continuous-time simple random walk on $\mathbb{Z}^k$ is $\log |G^{\mathrm{ab}}|$.
1.1.1. Motivations
To motivate our investigation, let us consider for now the scenario where $G$ is a finite group, $k$ is an integer (allowed to depend on $|G|$) and $\mathrm{Cay}(G,Z)$ denotes the Cayley graph of $G$ with respect to $k$ independently and uniformly chosen random generators. These are elements of $G$ that will generate $G$ with high probability when $k$ is sufficiently large and hence, with slight abuse of language, will be referred to as generators of $G$. We consider values of $k$ for which $\mathrm{Cay}(G,Z)$ is connected with high probability, that is, with probability tending to 1 as $|G| \to \infty$.
Universality of cutoff.
Aldous and Diaconis [1] introduced the term “cutoff phenomenon”, which describes the phenomenon where the total variation distance (TV) between the distribution of a random walk and its equilibrium distribution sharply decreases from nearly 1 to nearly 0 within a time interval of smaller order than the mixing time. The material in this paper is motivated by their conjecture on the “universality of cutoff” for the RW on the random Cayley graph given in [1].
Conjecture (Aldous and Diaconis, [1]).
For any group $G$, if $k \gg \log |G|$ and $\log k \ll \log |G|$, then the random walk on $\mathrm{Cay}(G,Z)$ exhibits cutoff with high probability.
Additionally, a secondary aspect of the conjecture suggests that the cutoff time does not rely on the algebraic structure of the group, but rather can be expressed solely as a function of $|G|$ and $k$.
This conjecture has sparked a substantial body of research, see e.g., [11, 12, 22, 23, 24, 30, 33]. The curious reader is referred to Section 1.3.1 in [17] for a detailed exposition of the literature regarding progress on the Aldous–Diaconis universality conjecture. In this context, we provide a more condensed overview of the literature related to this conjecture, which serves as motivation for our work.
In [11, 12], Dou and Hildebrand confirmed the Aldous–Diaconis universality conjecture for all abelian groups. Additionally, their upper bound on the mixing time holds for all groups. Furthermore, when $\log k \asymp \log |G|$, this upper bound matches the trivial diameter lower bound of order $\log_k |G|$, confirming the aforementioned secondary aspect of the conjecture. Hermon and Olesker-Taylor [17, 18, 20, 21] extended this conjecture to the regime $1 \ll \log k \ll \log |G|$ for abelian groups, establishing cutoff under the condition $k - r(G) \to \infty$, where $r(G)$ is the minimal size of a generating subset of $G$. Moreover, in this regime the cutoff time is given by
(1) |
where $t_*$ is the time at which the entropy of the rate-1 random walk on $\mathbb{Z}^k$ is $\log |G|$. Due to this definition, $t_*$ is also referred to as the entropic time (see Definition 3). Their work confirms that for abelian groups the cutoff time of the RW depends only on $k$ and $|G|$.
Building upon this point, the next natural step is to explore the mixing behavior of random walks on nilpotent groups (see Section 1.4.1 for a brief overview of the literature concerning this topic). Hermon and Olesker-Taylor [19] study two canonical families of nilpotent groups: the unit-upper triangular matrices with entries in $\mathbb{Z}_p$ and the $d$-dimensional Heisenberg group over $\mathbb{Z}_p$ with $p$ prime (the results hold under certain assumptions on $k$ depending on the regimes of the other parameters). Let $G$ be either of these groups and $G^{\mathrm{ab}}$ its abelianization. They prove that the random walk on the Cayley graph $\mathrm{Cay}(G,Z)$ exhibits cutoff with high probability at time
(2) |
where $t_*$ is the time at which the entropy of the rate-1 random walk on $\mathbb{Z}^k$ is $\log |G^{\mathrm{ab}}|$. We compare (2) with (1), the latter of which characterizes the cutoff time for abelian groups. Indeed, (2) can be interpreted as the time at which the projection of the walk on $G$ onto the abelianization exhibits cutoff. This characterization of the cutoff time for random walks on these two families of groups raises a natural question: does (2) characterize the cutoff time for random walks on nilpotent groups in general? This question will be explored in our investigation.
Another natural extension of the current research on random walks on groups involves extending the choice of generators. Rather than requiring the generators to be chosen independently and uniformly at random from the group , the aim is to advance our understanding to encompass scenarios involving arbitrary choices of generators.
More often than not, the analysis of the mixing of the random walk relies heavily on the specific selection of generators. For example, there is a line of research focusing on understanding the mixing properties of random walks on the unit upper triangular matrix group $U_n(\mathbb{Z}_p)$ with $p$ prime, wherein the set of generators is either $\{I + E_{i,i+1}\}_{i < n}$ or $\{(I + E_{i,i+1})^{\pm 1}\}_{i < n}$, where $E_{i,j}$ represents the matrix with 1 at the $(i,j)$ entry and 0 elsewhere. See Section 1.4.1 for an overview of existing research. The current analysis critically hinges on the fact that applying such a generator corresponds to a row addition/subtraction, allowing the decomposition of the walk's mixing behavior into the first row and the remaining part of the matrix, the latter of which can be regarded as an $(n-1) \times (n-1)$ matrix. However, such methodologies become inapplicable when dealing with arbitrary generators. It is hence of interest to develop techniques that enable the study of mixing properties of random walks on Cayley graphs with a wider range of prescribed generators.
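As an illustration of how strongly such analyses lean on the row-operation structure, here is a minimal Python sketch (our own; the dimension, modulus and step count are arbitrary choices) of the canonical walk in which a uniformly chosen row is added to or subtracted from the row above, i.e., each step left-multiplies by $I \pm E_{i,i+1}$:

```python
import numpy as np

rng = np.random.default_rng(0)

def row_addition_walk(n=4, p=5, steps=2000):
    """Walk on U_n(Z_p): at each step pick a row uniformly and add (+1)
    or subtract (-1) it to/from the row above, i.e. left-multiply the
    current state by (I + e * E_{i,i+1}) with a uniform sign e."""
    X = np.eye(n, dtype=np.int64)
    for _ in range(steps):
        i = rng.integers(1, n)          # 0-based: row i is added to row i-1
        e = int(rng.choice([1, -1]))
        X[i - 1, :] = (X[i - 1, :] + e * X[i, :]) % p
    return X

print(row_addition_walk())
```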
Leading role of the abelianization in random walk mixing.
When $G$ is a nilpotent group and $Z$ is a symmetric set of generators consisting of i.i.d. uniform random elements of $G$, we shall see in Theorem 1 that the mixing time of the RW is determined by the abelianization of $G$, via the expression (2).
When the generator set is predetermined, in many instances it has also been demonstrated that the mixing time of random walks on groups is primarily determined by the abelianization of the group. In a recent and notable study, Diaconis and Hough [9] introduced a novel approach to establishing a central limit theorem for random walks on unipotent matrix groups driven by a probability measure, under certain general constraints. It is worth noting that their methodology applies to various choices of generators. For unit-upper triangular matrices, it has been established that an individual coordinate on the $j$-th diagonal mixes on its own characteristic time scale, implying the leading role of the abelianization (which corresponds to the first diagonal) in the mixing process of this random walk.
Nestoridi and Sly [27] studied the mixing behavior of the rate-1 RW on the unit upper triangular group under the canonical set of generators $\{(I + E_{i,i+1})^{\pm 1}\}$. In their analysis, it is proved that the mixing time of the RW is bounded by the maximum of two terms, the former of which roughly characterizes the mixing behavior of the RW on the abelianization. This observation becomes clearer when we consider the projected walk on the abelianization, which, under the canonical set of generators, can be viewed as a product chain on $\mathbb{Z}_p^{n-1}$ and thus has mixing time of order $n p^2 \log n$. Essentially, when $p$ is considerably larger than $n$, this upper bound is predominantly dictated by the mixing on the abelianization.
One might naturally inquire about the extent to which the abelianization dictates the mixing time of the RW on a general group. In addition, there is interest in explicitly identifying the dependence of the mixing time on the abelianization, which leads to our next motivation.
“Entropic time paradigm”.
As previously discussed, although the entropic time is the mixing time for “most” choices of generators (in the appropriate regime of $k$) for abelian groups and nilpotent groups, finding an explicit choice of generators which gives rise to cutoff at the entropic time is still open, even for the cyclic group of prime order. Part of our motivation is to understand the extent to which this paradigm applies for a given set of generators.
It is worth pointing out that for general non-random choices of generators, the cutoff time is not necessarily given by the entropic time. For instance, Hough [24, Theorem 1.11] shows that for the cyclic group of prime order, the choice of generators given by powers of 2, which he describes as “an approximate embedding of the classical hypercube walk into the cycle”, gives rise to a random walk that exhibits cutoff, where the cutoff time is not the entropic time.
Mixing under minimal sets of generators.
A minimal set of generators is a set of elements that generates the group and is minimal in terms of size. For $p$-groups, it is known that the minimal sets of generators can be described by the Frattini subgroup $\Phi(G)$, in the sense that any set whose cosets form a basis of $G/\Phi(G)$ gives a generating set of $G$. See, e.g., Diaconis and Saloff-Coste [10, Section 5.C]. A random walk supported on a minimal set of generators is thus referred to as a Frattini walk. Examples of such walks are discussed in Section 5.C of [10]. For the Heisenberg group with $p$ prime, it can be shown that all minimal sets of generators are equivalent via a group-theoretic argument.
Diaconis and Saloff-Coste additionally remarked that, based on their experience with the circle and the symmetric group, if the number of generators is fixed, most sets of generators should lead to the same convergence rate for the random walk. Motivated by these examples, they ask the following open question (see Remark 2 on page 23 of [10]): to what extent does the choice of generators affect the mixing behavior?
We give a neat partial answer to this question in Theorem 3. Suppose that $G$ is a $p$-group, or that its abelianization is of the homogeneous form described in Theorem 3. Under very mild assumptions on the rank and step of $G$, for all minimal (symmetric) sets of generators, the corresponding mixing times on $\mathrm{Cay}(G,Z)$ are the same up to smaller order terms and the corresponding relaxation times are equal.
1.1.2. Objectives
Motivated by the questions discussed in the preceding section, our primary focus in this paper is as follows.
(i) Study the random walk on $\mathrm{Cay}(G,Z)$ for general nilpotent groups. Expanding upon the current understanding of random walks on groups, our goal is to establish cutoff for random walks on $\mathrm{Cay}(G,Z)$ when $G$ is a nilpotent group and the generators are i.i.d. uniform. In particular, we are interested in a general characterization of the cutoff time. An important implication of the findings in [19] is that for certain regimes of $k$, the cutoff time does not depend only on $k$ and $|G|$. Nevertheless, the only additional information required to determine the cutoff times for those two examples is the size of the abelianization, as indicated by (2). We hope to generalize the characterization of the cutoff time in (2) to general nilpotent groups.
We thank Péter Varjú for suggesting to us the problem of extending the analysis from [19] to other nilpotent groups, namely the case where the group has step 2 and its abelianization is elementary abelian. We also wish to thank him for providing invaluable insights regarding how certain components of the argument from [19] could be interpreted in terms of the general theory of nilpotent groups.
(ii) Develop techniques applicable when the generators are chosen arbitrarily. As indicated by previous discussions, the mixing time of random walks on a group under various choices of generators is largely determined by the abelianization of the group. We aim to explore the extent to which this leading role of the abelianization holds in a broader context.
In essence, our objective is to develop techniques for studying the mixing properties of random walks, applicable not only under arbitrary choices of generators but also for general groups, without dependence on specific group structures.
1.2. Definitions and Notation
We give the precise definitions of the Cayley graph on and the random walk on Cayley graphs.
Let $G$ be a nilpotent group with lower central series
$G = G_1 \trianglerighteq G_2 \trianglerighteq \cdots \trianglerighteq G_{\ell} \trianglerighteq G_{\ell+1} = \{\mathrm{id}\}$, where $G_{i+1} := [G_i, G]$. In particular, $G_2 = [G,G]$ denotes the commutator subgroup of $G$. We also denote by $G^{\mathrm{ab}} := G/[G,G]$ the abelianization of $G$. The rank of a nilpotent group $G$, denoted by $r(G)$, is the smallest integer such that $G$ can be generated by a set containing $r(G)$ elements of $G$ and their inverses. The number $\ell$ is called the step (or the nilpotency class) of $G$, i.e., $G_{\ell} \neq \{\mathrm{id}\}$ and $G_{\ell+1} = \{\mathrm{id}\}$.
For a finite group , let be a symmetric subset, i.e., if and only if . We will refer to as the set of generators when generates . The undirected Cayley graph of generated by is defined as follows.
Definition 1 (Cayley multi-graph generated by a set of generators).
Fix a symmetric set of generators $S$. Let $\mathrm{Cay}(G,S)$ denote the (right) Cayley multi-graph of $G$ generated with respect to $S$, where the vertex set is $G$ and the edge set is $\{\{g, gs\} : g \in G,\, s \in S\}$. We allow parallel edges and self-loops (if $\mathrm{id} \in S$) so that the Cayley graph is regular with degree $|S|$.
Random walk on Cayley graphs. We will consider the undirected random walk $X = (X(t))_{t \ge 0}$ on the Cayley graph which jumps at rate 1, where $S = \{Z_i^{\pm 1} : i \in [k]\}$. Let $(i_j)_{j \ge 1}$ be an i.i.d. sequence of indices uniformly sampled from $[k]$, and let $(\epsilon_j)_{j \ge 1}$ be an i.i.d. sequence of signs uniformly sampled from $\{+1, -1\}$. At the $j$-th jump, the generator $Z_{i_j}^{\epsilon_j}$ is applied to the walk in the sense that we multiply $Z_{i_j}^{\epsilon_j}$ to the right of the current location of the walk. That is, the random walk can be written as a sequence
$X(t) = Z_{i_1}^{\epsilon_1} Z_{i_2}^{\epsilon_2} \cdots Z_{i_{N(t)}}^{\epsilon_{N(t)}}$,
where $N(t)$ is the number of steps taken by the walk by time $t$ and $Z_{i_j}^{\epsilon_j}$ denotes the $j$-th step taken by the random walk, with $X(0) = \mathrm{id}$.
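For concreteness, the following Python sketch simulates this rate-1 walk for the mod-$p$ Heisenberg group with i.i.d. uniform generators; the group, the helper names (rand_heis, inv, walk) and all parameter values are our own illustrative choices rather than anything prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(1)
p, k = 5, 3

def rand_heis():
    """Uniform element of the mod-p Heisenberg group (3x3 unitriangular)."""
    a, b, c = rng.integers(0, p, size=3)
    return np.array([[1, a, c], [0, 1, b], [0, 0, 1]], dtype=np.int64)

def inv(M):
    """Closed-form inverse of a 3x3 unitriangular matrix, mod p."""
    a, b, c = M[0, 1], M[1, 2], M[0, 2]
    return np.array([[1, -a, a * b - c], [0, 1, -b], [0, 0, 1]], dtype=np.int64) % p

gens = [rand_heis() for _ in range(k)]      # Z_1, ..., Z_k

def walk(t):
    """Rate-1 walk: N(t) ~ Poisson(t) jumps; the j-th jump right-multiplies
    by Z_{i_j}^{eps_j} with i_j uniform in [k] and eps_j a uniform sign."""
    X = np.eye(3, dtype=np.int64)
    for _ in range(rng.poisson(t)):
        i = rng.integers(k)
        Z = gens[i] if rng.random() < 0.5 else inv(gens[i])
        X = X @ Z % p
    return X

print(walk(10.0))
```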
Notation. Throughout the paper, we use standard asymptotic notation: “$o$” or “$\ll$” means “of smaller order”; “$O$” or “$\lesssim$” means “of order at most”; “$\asymp$” means “of the same order”; “$\sim$” means “asymptotically equivalent”. We will abbreviate “with high probability” by whp.
Assumptions. Throughout the paper, we will let $G$ be a finite nilpotent group of step $\ell$ and rank $r$, where $\ell$ and $r$ remain uniformly bounded as $|G| \to \infty$.
1.3. Overview of Main Results
We focus on the mixing behavior of the random walk on a Cayley graph of a finite nilpotent group $G$ with a symmetric generator set $Z$. We consider the limit as $|G| \to \infty$ under the assumption that $\log k \ll \log |G|$. The condition $k \to \infty$ is necessary for the random walk to exhibit cutoff on $\mathrm{Cay}(G,Z)$ for all nilpotent $G$; see the remark below.
Remark 1.
For any choice of generators, it was established by Diaconis and Saloff-Coste [10] that there is no cutoff when $k = O(1)$ for any nilpotent group, a class of groups satisfying their concept of moderate growth. The interested reader can find a short exposition of their argument in [20, §4]. When $k = O(1)$, with i.i.d. uniform generators there is no cutoff for any group, see [17, §7.2]. Dou [12, Theorems 3.3.1 and 3.4.7] establishes a more general result.
1.3.1. Cutoff for Random Walks on Nilpotent Groups
We use standard notation and definitions for mixing and cutoff, see e.g. [32, §4 and §18].
Definition 2.
A sequence of Markov chains is said to exhibit cutoff if there exists a sequence of times with
where is the TV distance of from its equilibrium distribution for each .
We say that a RW on a sequence of random graphs exhibits cutoff around time $t$ whp if, for all fixed $\epsilon > 0$, in the limit $|G| \to \infty$, the TV distance at time $(1+\epsilon)t$ converges in distribution to 0 and at time $(1-\epsilon)t$ to 1, where the randomness is over the choice of $Z$.
In other words, the chain is said to exhibit cutoff when the TV distance of the distribution of the chain from equilibrium drops from close to 1 to close to 0 within a time interval of smaller order than the mixing time.
As briefly discussed in Section 1.1.1, there has been considerable interest in studying the cutoff behavior of random walks on groups. Our goal is to generalize the characterization of the cutoff time as the entropic time of the projected walk to general nilpotent groups (for random i.i.d. generators).
We now give the formal definition of the entropic time and the proposed mixing time.
Definition 3.
(i) Let $t_*$ be the time at which the entropy of the rate-1 random walk on $\mathbb{Z}^k$ is $\log |G^{\mathrm{ab}}|$. We refer to $t_*$ as the entropic time.
(ii) Define $t_0 := t_* \vee \log_k |G|$. We refer to $t_0$ as the cutoff time or the mixing time.
The entropic time $t_*$ is identified as the cutoff time for the projected random walk on $G^{\mathrm{ab}}$, see [17], and it is naturally a lower bound on the mixing time of the RW on $G$. To offer insight into the definition of the cutoff time, note that we need to run the RW sufficiently long to ensure that all elements of the group can be reached with reasonable probability, which leads to a lower bound of order $\log_k |G|$.
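For the reader who wishes to compute entropic times numerically, the sketch below solves $H(t_*) = \log n$ for the rate-1 walk on $\mathbb{Z}^k$, using that each coordinate is an independent rate-$1/k$ walk on $\mathbb{Z}$ with $\mathbb{P}(W_i(t) = j) = e^{-t/k} I_j(t/k)$ ($I_j$ the modified Bessel function). This is a minimal sketch assuming our reading of Definition 3 with $n = |G^{\mathrm{ab}}|$; all function names are ours:

```python
import numpy as np
from scipy.special import ive          # ive(v, x) = exp(-x) * I_v(x)
from scipy.optimize import brentq

def entropy_Zk(t, k):
    """Entropy of the rate-1 continuous-time SRW on Z^k at time t.
    Coordinates are i.i.d. rate-1/k walks on Z, so H(t) = k * H_1(t/k)."""
    jmax = int(10 * np.sqrt(t / k)) + 50   # ~10 standard deviations
    j = np.arange(-jmax, jmax + 1)
    q = ive(np.abs(j), t / k)              # e^{-t/k} * I_j(t/k)
    q = q[q > 0]
    return -k * float(np.sum(q * np.log(q)))

def entropic_time(n_ab, k):
    """Solve H(t_*) = log(n_ab), with n_ab playing the role of |G^{ab}|."""
    return brentq(lambda t: entropy_Zk(t, k) - np.log(n_ab), 1e-9, 1e8)

print(entropic_time(n_ab=10**6, k=5))
```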
Our first result establishes cutoff around time for the random walk on where consists of i.i.d. uniform generators.
Theorem 1.
Let $G$ be a finite nilpotent group of uniformly bounded rank and step. Let $Z = \{Z_i^{\pm 1} : i \in [k]\}$ with $Z_1, \dots, Z_k$ i.i.d. uniform over $G$. Assume $1 \ll \log k \ll \log |G|$. As $|G| \to \infty$, the random walk on $\mathrm{Cay}(G, Z)$ exhibits cutoff with high probability at time $t_0$, the cutoff time defined in Definition 3.
1.3.2. Random Walk on Non-random Cayley Graphs: Reduction to Abelianization
For a nilpotent group and any symmetric set of generators whose size satisfies an upper bound, we show that the mixing time of the random walk on is completely determined (up to smaller order terms) by the mixing time of the projected walk on .
Theorem 2.
Let be a finite nilpotent group such that and be a symmetric set of generators. Suppose . For any fixed and we have
when is sufficiently large (more precisely, when ).
Remark 2.
The assumption is to guarantee that is of smaller order than so that the mixing of is governed by its projected walk onto . With more specific knowledge on the structure of , one can expect to obtain a much less stringent constraint on . Also see Remark 4.
As a direct consequence of the proof of Theorem 2, we establish that under the same conditions, the spectral gap of the random walk on is likewise determined by the spectral gap of its projection onto .
Corollary 1.
Let and be the relaxation time of the walk and respectively. Then
In particular, when we have .
As a consequence of the above results, we can see that for a class of nilpotent groups whose abelianization has a unique representation, with a symmetric set of generators of minimal size, the mixing time and the relaxation time (inverse of the spectral gap) of the random walk do not depend on the choice of generators. In this case, the choice of generators does not affect the mixing behavior. This provides a partial answer to the open question posed in Section 1.1.1.
Theorem 3.
Suppose is a nilpotent group with rank and step such that either (i) where or (ii) is a -group. Suppose the rank and step satisfy . For any symmetric set of generators of minimal size and any given , the mixing time is the same up to smaller order terms, and the relaxation time is the same.
The mixing property of the random walk on the Cayley graph of is closely related to that of the projected random walk on the Cayley graph of . More precisely, denoting by the projected RW on and starting with the walk being uniform over , one can observe (see Lemma 3.1) that
As suggested by the following triangle inequality
if the total variation distance between and can be nicely controlled then the mixing property of is primarily characterized by the mixing of its projection on the abelianization, which we refer to as the reduction to abelianization.
We will prove in Lemma 3.2 that this distance indeed decays exponentially fast in time, with an explicit rate involving the diameter of $[G,G]$ in $\mathrm{Cay}(G,Z)$. This provides a quantitative criterion for determining when the mixing of the walk is governed by its projection onto the abelianization. In particular, if the mixing of the projected walk occurs after this distance has become vanishingly small, then the mixing time of the walk is roughly that of its projection.
Due to the well known connection between the mixing time and the diameter of the graph, see, e.g., [25, Proposition 13.7], for our purpose it is sufficient to prove is small enough compared to . Section 2 is devoted to proving an upper bound on where the roles of , and are made explicit.
Theorem 4.
Let be a symmetric set of generators and let be such that for all . For , we have
(3) |
As a consequence, for any set of generators satisfying , one has and hence the mixing of can be reduced to the mixing of its projection onto the abelianization.
1.3.3. Our Methodology
We describe our methodology in relation to the objectives described in Section 1.1.2.
(i) Representation of random walk. A substantial body of work has been devoted to the study of random walks on unipotent matrix groups, see Section 1.4.1. The analysis in many existing work heavily depends on the favorable matrix structure specific to unipotent matrix groups, a feature not necessarily present in general nilpotent groups.
There has been some progress made towards treating general nilpotent groups. In [17, §6], partial results were obtained using a comparison between the mixing time of a general nilpotent group and that of a “corresponding” abelian group. More specifically, denoting by the random Cayley graphs generated by i.i.d. uniform generators in the two groups, it is shown that, with high probability, the mixing time of the former is bounded by that of the latter up to negligible terms, thereby offering an upper bound on the mixing time on the nilpotent group.
This comparison leads to a tight upper bound, and thus establishes cutoff, when $G$ is a nilpotent group with a relatively small commutator subgroup $[G,G]$. Examples of such groups include $p$-groups with “small” commutators and Heisenberg groups of diverging dimension, see [17, Corollaries D.1 and D.2]. However, for general nilpotent groups this comparison is not sharp.
While the comparison technique discussed in [17] may not ensure a sharp upper bound on the mixing time for general nilpotent groups, it underscores the approach of examining the mixing behavior in relation to each quotient group . To obtain the tight upper bound and establish cutoff, we give an accurate representation of the random walk dynamics through the lens of quotient groups.
To give a bit of intuition, let us consider the free nilpotent group of step 2 (i.e., $\ell = 2$). Let $Z = \{Z_i^{\pm 1} : i \in [k]\}$ be a set of i.i.d. uniform generators. Let $W(t) = (W_1(t), \dots, W_k(t))$ be an auxiliary process defined based on the random walk, where $W_i(t)$ is the number of times generator $Z_i$ has been applied minus the number of times $Z_i^{-1}$ has been applied in the random walk by time $t$. Through rearranging, we can express any word in the form
(4) |
where the last factor results from the rearrangement of generators, see (30) for more details. Roughly speaking, $W$ keeps track of the walk on the abelianization, whereas this factor, which belongs to $[G,G]$, corresponds to the mixing on the quotient group.
We demonstrate in Section 4.3 that this line of reasoning applies to general nilpotent groups of step $\ell$, see (33). Although the rearranging of generators leads to the presence of multi-fold commutators such as $[[a,b],c]$ when $\ell \ge 3$, we will argue, through a further careful simplification, that the presence of multi-fold commutators does not add to the complexity of the analysis, and one only needs to control the distribution of the two-fold commutators, as in the case $\ell = 2$.
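The identity that drives all such rearrangements is $ab = ba[a,b]$; in a step-2 group the correction $[a,b]$ is central and can be pushed to the end of the word. The following Python sketch verifies both facts numerically in the mod-$p$ Heisenberg group (a step-2 group); the group choice, helper names and parameters are our own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 7

def rand_heis():
    a, b, c = rng.integers(0, p, size=3)
    return np.array([[1, a, c], [0, 1, b], [0, 0, 1]], dtype=np.int64)

def inv(M):
    a, b, c = M[0, 1], M[1, 2], M[0, 2]
    return np.array([[1, -a, a * b - c], [0, 1, -b], [0, 0, 1]], dtype=np.int64) % p

def comm(a, b):
    """Commutator [a, b] = a^{-1} b^{-1} a b."""
    return inv(a) @ inv(b) @ a @ b % p

a, b, c = rand_heis(), rand_heis(), rand_heis()
# Rearrangement identity used in the text: ab = ba [a, b].
assert np.array_equal(a @ b % p, b @ a @ comm(a, b) % p)
# In a step-2 group commutators are central: the correction collected
# while rearranging a word commutes with everything.
assert np.array_equal(comm(a, b) @ c % p, c @ comm(a, b) % p)
print("rearrangement identities verified")
```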
(ii.a) Comparison argument: reduction to abelianization. We develop a comparison argument that addresses the mixing of the random walk on a general group $G$ with an arbitrary generator set $Z$. Under mild assumptions on the size of $Z$, the mixing time on $\mathrm{Cay}(G,Z)$ is the same as the mixing time of the projected walk on $\mathrm{Cay}(G^{\mathrm{ab}}, \bar{Z})$ (up to smaller order terms); see the precise statement in Theorem 2. That is, within the scope of Theorem 2, the mixing time on $G$ is completely determined by that on the abelianization $G^{\mathrm{ab}}$.
Theorem 2 further implies that for a certain class of nilpotent groups with specific structures in their abelianization, the mixing time remains the same (up to smaller order terms) regardless of the choice of a minimal-sized symmetric set of generators. See Theorem 3 for the precise statement.
(ii.b) Geometry of the Cayley graph on nilpotent groups. We derive a quantitative upper bound on the diameter of the commutator subgroup in terms of the diameter of the abelianization , with explicit dependence on the rank and step of the group , as detailed in Theorem 4. This, combined with the aforementioned comparison argument, allows us to provide sufficient conditions under which the mixing behavior of the random walk on is governed by that of the projected walk on .
1.4. Historic Overview
1.4.1. Random Walks on Unipotent Matrix Groups
Consider the group $U_n$ of $n \times n$ upper-triangular matrices with 1's along the diagonal. A unipotent group can then be defined as a subgroup of some $U_n$. This includes the two families of nilpotent groups discussed earlier: the unit-upper triangular matrices with entries in $\mathbb{Z}_p$ and the $d$-dimensional Heisenberg group over $\mathbb{Z}_p$, where $p$ is prime.
The exploration of random walks on unit upper triangular matrices has led to a substantial body of research. One avenue of investigation involves a simple walk on the unit upper triangular matrix group with entries over $\mathbb{Z}_p$ for some prime $p$: a row is chosen uniformly and added to or subtracted from the row above. Ellenberg [15] studied the diameter of the associated Cayley graph, with the dimension growing, and this was subsequently improved by Ellenberg and Tymoczko [16]. Stong [31] gave mixing bounds via an analysis of eigenvalues. Coppersmith and Pak [8, 28] looked directly at mixing. Further work along this line includes Peres and Sly [29], Nestoridi [26] and Nestoridi and Sly [27]. Notably, Nestoridi and Sly [27] were the first to optimize bounds in the dimension and the modulus simultaneously. Diaconis and Hough [9] introduced a new method for proving a central limit theorem for random walks on unipotent matrix groups.
In the context of i.i.d. uniformly chosen generators, Hermon and Olesker-Taylor [19] prove the characterization of the cutoff time as the entropic time of the projected walk onto the abelianization for two families of nilpotent groups: the unit-upper triangular matrices with entries in $\mathbb{Z}_p$ and the $d$-dimensional Heisenberg group over $\mathbb{Z}_p$, where $p$ is prime.
1.4.2. The Entropic Methodology
A common theme in the study of mixing times is that “generic” instances often exhibit the cutoff phenomenon. Moreover, this can often be handled via the entropic method, see, e.g., [3, 4, 5]. A more detailed exposition of the known literature can be found in a previous article of one of the authors, see [17, §1.3.5]. Additionally, the entropic method has been applied within the context of random walks on groups, as discussed in [17, 19], which we now explain in a little more depth.
The main idea is to relate the mixing of the random walk on $G$ to that of an auxiliary process and to study the entropy of the latter. Suppose the generator set $Z = \{Z_i^{\pm 1} : i \in [k]\}$ is given. The auxiliary process $W(t) = (W_1(t), \dots, W_k(t))$ is defined based on the walk, where $W_i(t)$ is the number of times generator $Z_i$ has been applied minus the number of times $Z_i^{-1}$ has been applied by time $t$. The observation that $W$ is a rate-1 random walk on $\mathbb{Z}^k$ (whose entropy reveals information regarding the mixing of the walk) leads naturally to the definition of the entropic times, see Definition 3. More specifically, the auxiliary process is related to the original random walk as follows. We sample two independent copies of the random walk and the auxiliary process, denoted by $(X, W)$ and $(X', W')$. By the Cauchy–Schwarz inequality one has
which relates the mixing of the walk to the collision probability of $W$ and $W'$, i.e., the probability that $W(t) = W'(t)$, where the time index is suppressed as it is clear from the context.
When the group is abelian, given the choice of generators, the total variation distance is a function of the law of $W$ alone, see [17]. When the group is not abelian, this is not the case. When $G$ is nilpotent, the auxiliary process still provides useful (albeit partial) information on the walk. In this case, to get a full picture of the mixing of the RW, we will combine the knowledge of the auxiliary process with further information obtained by analyzing the mixing on the quotient groups separately. See Sections 4.6 and 4.8 for the complete discussion.
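As a toy illustration of this collision-probability approach, the sketch below Monte Carlo estimates $\mathbb{P}(X(t) = X'(t)) = \sum_x \mathbb{P}(X(t) = x)^2$ for the mod-$p$ Heisenberg group and plugs it into the standard $L^2$ bound $4\|\mathbb{P}(X(t) \in \cdot) - \pi\|_{\mathrm{TV}}^2 \le |G|\,\mathbb{P}(X(t) = X'(t)) - 1$. The group, the parameters and the plug-in estimator (biased by roughly $1/\mathrm{trials}$) are our own choices:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(3)
p, k, t, trials = 5, 4, 30.0, 20000     # |G| = p^3 = 125

def rand_heis():
    a, b, c = rng.integers(0, p, size=3)
    return np.array([[1, a, c], [0, 1, b], [0, 0, 1]], dtype=np.int64)

def inv(M):
    a, b, c = M[0, 1], M[1, 2], M[0, 2]
    return np.array([[1, -a, a * b - c], [0, 1, -b], [0, 0, 1]], dtype=np.int64) % p

gens = [rand_heis() for _ in range(k)]

def endpoint(t):
    X = np.eye(3, dtype=np.int64)
    for _ in range(rng.poisson(t)):
        i = rng.integers(k)
        Z = gens[i] if rng.random() < 0.5 else inv(gens[i])
        X = X @ Z % p
    return tuple(X[np.triu_indices(3, 1)])   # (x_12, x_13, x_23) identifies X

# Estimate P(X(t) = X'(t)) = sum_x P(X(t) = x)^2 via empirical frequencies.
freq = Counter(endpoint(t) for _ in range(trials))
collision = sum((c / trials) ** 2 for c in freq.values())
print("L2-type upper bound on 4*TV^2:", p**3 * collision - 1)
```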
2. Geometry of Cayley Graphs
The definition of Cayley graph of a group can be naturally extended to its quotient groups. For , the Cayley graph of , denoted by , consists of vertex set and edge set .
Let denote the graph distance on . Define
(5) |
Similarly, let denote the graph distance on . For a subgroup of , we define the diameter of with respect to the graph distance on by
(6) |
For such that (so that is a group), with slight abuse of notation, we can define the diameter of with respect to the graph distance on ,
whose definition is consistent with (6) with replaced respectively by .
We have the following triangle inequality in terms of the diameter of a group and that of its subgroup and the quotient group .
Proposition 1.
For all such that the following holds:
(7) |
Proof.
Let . By the definition of , there exists with such that , i.e., there exists such that . Hence
which concludes the proof of (7). ∎
Applying the triangle inequality in (7) iteratively leads to a decomposition of as the sum of over , i.e.,
Breuillard and Tointon [6, Lemma 4.11] showed that the diameter of is at most , where is a constant depending on the size of and . In fact, they showed for all that is at most . El-Baz and Pagano [14] proved the same estimates using somewhat similar arguments. In addition, they observe that and hence one can estimate in terms of .
As discussed in Remark 1, a necessary condition for the random walk on $\mathrm{Cay}(G,Z)$ to exhibit cutoff when the rank and step are bounded is that $k$ diverges. Consequently, as opposed to [6] and [14], which did not quantify the dependence of the constant on the relevant parameters, it is necessary for us to quantify this dependence. Our approach for upper bounding the diameter adheres to the framework in El-Baz and Pagano [14], but with considerably more attention devoted to quantifying this influence.
Theorem 4.
Let be a symmetric set of generators and let be such that for all . For , we have
The following comparison between and is what we will use in the proof of Theorem 2.
Corollary 2.
For any fixed , we have
when . In particular, this condition holds when .
Remark 3.
The statement above is a special case of the following more general claim: For all , if then , which holds when .
Proof.
Knowing that , it is an easy consequence of Theorem 4 that
It remains to prove that for the given range of . Using the fact from Corollary 4 we can observe that
(8) |
for . Based on (8), it suffices to show for the given range of .
To prove this, the key is to notice that for the Cayley graph , setting , we trivially have , where is the ball of radius in and is the -dimensional lattice ball of radius . It then follows from Lemma E.2a in [18] that for , which implies . Hence we have .
∎
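The lattice-ball comparison invoked in the proof can be made concrete: for an abelian group generated by $k$ elements, every ball in the Cayley graph is the image of the corresponding $\ell^1$-ball in $\mathbb{Z}^k$, so $|G| \le |B_{\mathbb{Z}^k}(\mathrm{diam})|$, yielding a diameter lower bound. A short sketch under this reading (function names are ours):

```python
import math

def lattice_ball(k, r):
    """|B_{Z^k}(r)|: integer points with l1-norm <= r, counted by choosing
    i nonzero coordinates, their signs, and their magnitudes:
    sum_i 2^i * C(k, i) * C(r, i)."""
    return sum(2**i * math.comb(k, i) * math.comb(r, i)
               for i in range(min(k, r) + 1))

def diam_lower_bound(group_size, k):
    """Smallest r with |B_{Z^k}(r)| >= group_size; for an abelian group
    with k generators, the Cayley-graph diameter is at least this r."""
    r = 0
    while lattice_ball(k, r) < group_size:
        r += 1
    return r

print(diam_lower_bound(10**9, 5))   # e.g. a bound for |G^{ab}| = 10^9, k = 5
```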
Before we turn to the proof of Theorem 4, some preliminary results that will be useful are presented in the next section.
2.1. Preliminaries
We begin by recalling some standard notation and stating several properties of commutators. For $g, h \in G$ we write $[g,h] := g^{-1}h^{-1}gh$ and $g^h := h^{-1}gh$. Further observe that for $g, h \in G$, $gh = hg[g,h]$. Define the two-fold commutator accordingly, and inductively
(9) |
Some standard properties of commutators are collected into the following propositions, whose proofs can easily be found in the literature, see e.g. [13], and are thus omitted. The following is a fairly well-known result, which follows from an induction argument using the three subgroup lemma.
Proposition 2.
The lower central series of a nilpotent group is a strongly central series, i.e., is a subgroup of for all .
Proposition 3.
For , the map given by is anti-symmetric and bi-linear. Namely, the following hold for all and :
Moreover, for and , if and , then we have the following linearity in the -th component, i.e.,
(10) |
where , and so
(11) |
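These commutator identities are easy to sanity-check numerically in the mod-$p$ Heisenberg group, where the step is 2 and hence the bi-linearity relations hold exactly rather than only modulo higher-order commutators (all choices below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
p = 7

def rand_heis():
    a, b, c = rng.integers(0, p, size=3)
    return np.array([[1, a, c], [0, 1, b], [0, 0, 1]], dtype=np.int64)

def inv(M):
    a, b, c = M[0, 1], M[1, 2], M[0, 2]
    return np.array([[1, -a, a * b - c], [0, 1, -b], [0, 0, 1]], dtype=np.int64) % p

def comm(a, b):
    return inv(a) @ inv(b) @ a @ b % p

a, b, c = rand_heis(), rand_heis(), rand_heis()
# Anti-symmetry: [a, b] = [b, a]^{-1}.
assert np.array_equal(comm(a, b), inv(comm(b, a)))
# Bi-linearity, exact in a step-2 group: [ab, c] = [a, c][b, c], etc.
assert np.array_equal(comm(a @ b % p, c), comm(a, c) @ comm(b, c) % p)
assert np.array_equal(comm(a, b @ c % p), comm(a, b) @ comm(a, c) % p)
print("commutator identities verified")
```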
Let be such that for all . Now define inductively
Write for . The following proposition can be proved by induction on using Proposition 3. We omit the details, and refer the reader to [14] for additional details.
Proposition 4.
Assume that $Z$ generates $G$. Then the induced set generates the abelian group $G_i/G_{i+1}$ for all $i$. In particular, a set generates $G$ if and only if its projection generates $G^{\mathrm{ab}}$.
Corollary 3.
For and any we can write
(12) |
where are functions from to belonging to the set
where the second condition means for all such that we have either or for any .
Proof.
We know from Proposition 4 that generates , i.e., for any we can express as a product of elements in . Observe that is surjective due to the definition of . Hence we can express
where corresponds to the number of times appears. Let so that by (10)
Then we can take so that
(13) |
The above expression contains a slight abuse of notation on the right hand side as is not an element of while was defined to have inputs from . By (11) we see that for any such that ,
and hence what essentially determines the value of (13) is , which clarifies the meaning of the right hand side of (13). The point of doing so is that we can identify with for some . As can be generated by , there exists some function that satisfies such that
i.e.,
This explains the first condition in the definition of .
To explain the second condition in the definition of , we observe that since for , only one of needs to appear in the expression above. Given and , for each , we choose such that simplifying the product
leads to a non-negative power . We can view this as a function from to such that only one of is nonzero for . It is straightforward to verify .
Finally, the proof is concluded by applying (10) with the above choice of .
∎
Corollary 4.
For ,
Remark 4.
From a technical standpoint, the second inequality above is why the term is present in the condition of Theorem 2. This inequality can be improved in various scenarios given extra knowledge of the group structure.
Proof.
By (10) and (12), for any , we can express as
for some where for all where is as in Corollary 3. By the same argument as in the proof of Corollary 3, for any given , we can take so that
That is, for any , we can define a function by , which implies is upper bounded by the number of functions from to .
Taking such that gives the desired inequality.
∎
2.2. Proof of Theorem 4
We begin with the following estimate that plays a key role in the proof of Theorem 4.
Lemma 2.1.
Let be fixed. For any and
In what follows, we first present the proof of Theorem 4 given Lemma 2.1 and then complete the proof of Lemma 2.1.
Proof of Theorem 4.
To simplify notation, we abbreviate . Recall from (5) that for . Let denote the graph distance on the Cayley graph and write for .
The first inequality follows from inductively applying (7) with and for .
We turn to the second inequality. The goal is to prove that for every ,
(14) |
By Corollary 3, in order to prove (14) it suffices to show that for any ,
(15) |
where are functions defined in Corollary 3, belonging to the set
For any and , we have . Applying Lemma 2.1 to and using the triangle inequality, we can obtain
Given the constraint , a simple application of Lagrange multipliers gives
Plugging this into the previous display gives
Summing over using the triangle inequality gives the required bound in (15) and thus completes the proof.
∎
Proof of Lemma 2.1. The following simple estimate will play a major role in our proof. For , and , we have
(16) |
This follows from the fact that
which is a simple consequence of the definition .
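In a step-2 group this estimate rests on the exact identity $[a^m, b^n] = [a,b]^{mn}$; in higher step the identity holds only modulo higher commutators. A quick numerical check of the exact step-2 case, with the mod-$p$ Heisenberg group and all parameters being our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)
p = 11

def rand_heis():
    a, b, c = rng.integers(0, p, size=3)
    return np.array([[1, a, c], [0, 1, b], [0, 0, 1]], dtype=np.int64)

def inv(M):
    a, b, c = M[0, 1], M[1, 2], M[0, 2]
    return np.array([[1, -a, a * b - c], [0, 1, -b], [0, 0, 1]], dtype=np.int64) % p

def comm(a, b):
    return inv(a) @ inv(b) @ a @ b % p

def power(M, m):
    X = np.eye(3, dtype=np.int64)
    for _ in range(m):
        X = X @ M % p
    return X

a, b = rand_heis(), rand_heis()
m, n = 4, 7
# Exact in a step-2 group such as Heisenberg: [a^m, b^n] = [a, b]^{mn}.
assert np.array_equal(comm(power(a, m), power(b, n)), power(comm(a, b), m * n))
print("power-commutator identity verified")
```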
Note that if for some and , then by (10) we can simply write
and use (16) to conclude the proof. Otherwise, we can still try to decompose as a sum of terms of the form , which helps improve the upper bound on . In other words, our goal is to express as a product of elements of the form
To find the decomposition of we can employ a greedy procedure to search for some set for each so that . In what follows we first define and then find for . Setting and , we will define and inductively for
For such that , let
We stop at the first time when and record . Set .
For each , in order to find we proceed as follows: let
We stop at the first time when and record . Set .
Finally, we set , and .
By Proposition 3 we have
That is, it suffices to upper bound . For , it follows from (16) and the definition of that . It is easy to see that for all and thus for . Lastly, by definition, . Combining these facts gives
(17) |
We first upper bound the second term in (17). By definition and thus . To bound for , note that for , and thus , which implies that
i.e.,
It follows that the second term in (17) satisfies
(18) |
Next, we estimate the first term in (17). Observe that
(19) |
Repeating the same calculation for yields that . More generally, for , since ,
Since for , , by (19) and the fact that we have
It remains to upper bound . By definition, we have . A simple calculation shows that for any ,
where the last inequality follows from the fact that for . Therefore, for and ,
Noting that and , we have
(20) |
3. Reduction to Abelianization
Let be a rate 1 simple random walk on and let be the projected random walk of onto , which is a rate 1 simple random walk on . Let denote the -mixing time of the random walk on and the -mixing time for the projected random walk on . To simplify notation we will drop the in the superscript and write instead when the choice of is clear from the context.
Since is the projection of onto the abelianization we can observe
(21) |
which implies that the walk on $G$ mixes no faster than its projection. Naturally, we are interested in the mixing behavior of the walk in comparison to that of its projection; that is, we hope to understand to what extent the mixing behavior of the random walk on $G$ is governed by its projection onto the abelianization $G^{\mathrm{ab}}$. It turns out that for a nilpotent group of bounded step and rank, when the generator set is not too large, the mixing of the walk is completely governed by that of its projection.
Theorem 2.
Let be a finite nilpotent group such that and be a symmetric set of generators. Suppose . For any fixed and we have
when is sufficiently large (more precisely, when ).
It is well known that the relaxation time is characterized by the exponential decay rate of the total variation distance between the walk and its equilibrium (see, e.g., Corollary 12.7 in [32]). As a consequence of the proof of Theorem 2, we obtain the following characterization of the relaxation time of the walk in terms of its projection.
Corollary 1.
Let and be the relaxation time of the walk and respectively. Then
In particular, when we have .
Before moving on to proving Theorem 2, we first explain why Corollary 1 is an easy consequence of the proof of Theorem 2 and delay its proof to the end of this section. It is useful to observe the following inequality
(22) |
where denotes the uniform distribution over the set . We will establish that in the given regime of , the second term in the above inequality is the leading order term that determines the time of mixing. Moreover, Lemma 3.1 shows this term is fully characterized by the projected random walk on .
Using the interpretation of the relaxation time as the exponential decay rate of the total variation distance, by taking $t$-th roots and letting $t \to \infty$ on both sides of (21) and (22), we shall see that the relaxation times of the walk and of its projection coincide under the stated condition.
Theorem 3.
Suppose is a nilpotent group with rank and step such that either (i) where or (ii) is a -group. Suppose the rank and step satisfy . For any symmetric set of generators of minimal size and any given , the mixing time is the same up to smaller order terms, and the relaxation time is the same.
Proof.
Let be a minimal size symmetric set of generators for and let . By Proposition 4 we can see that is a minimal size symmetric set of generators for . As is an abelian group, it can be expressed in the form
(23) |
where and is the rank of . In general the choice of is not unique (e.g., ), but when satisfies the assumption in the statement the expression in (23) is unique: in case (i) ; in case (ii) we can write for some with . Hence, any minimal-size symmetric set that generates uniquely corresponds to , the collection of standard basis elements. Thus the mixing time on for any such set is equal to the mixing time of the walk on (or ) with generators .
As our assumptions on the rank and step guarantee the required condition, we can apply Theorem 2 to show that the mixing time is equal (up to smaller order terms) to the mixing time of the walk on (or ) with the corresponding generators, for any minimal-size symmetric set of generators of $G$. The result for the relaxation time follows similarly from Corollary 1. ∎
3.1. Proofs
Lemma 3.1.
For ,
Proof.
Write . One can easily check that starting from the initial distribution , for any and , . Hence,
∎
It remains to upper bound the difference .
Lemma 3.2.
For ,
Remark 5.
The conclusion in Lemma 3.2 holds if we replace by any subgroup of .
Proof.
Let be the transition matrix of the simple random walk on . For , we can define the continuous time kernel by .
Consider the linear subspace of functions
We now show that is invariant under the transition matrix , i.e., for all . For any ,
where the second line uses the fact that is translation invariant, i.e., for any and that .
Let denote the transition matrix of the SRW on with . We can also check that for all , i.e.,
Hence, for we have that (below and are independent)
where is the norm of . Applying Theorem 4.4 in [2], which gives a comparison of Dirichlet forms for two sets of generators, we see that for the two symmetric random walks on the finite group with transition matrices and defined as before,
for all functions . That is,
and it follows by induction that
Hence, for any , recalling that , we have
(24) |
Define the function by so that
By our choice of it is easy to see that and . By the definition of one can check that . Note that it follows from the Cauchy–Schwarz inequality that
Therefore, applying in (24) gives
∎
Proof of Theorem 2. The first inequality follows directly from the projection of onto . To prove the second part of the inequality, observe that by the triangle inequality
Lemma 3.1 shows that for .
It remains to prove . Recall from Corollary 2 that when . Note that as a direct consequence of Proposition 13.7 in [25], which is a simple application of the Carne-Varopoulos inequality, one has that for all ,
(25) |
It then follows from (25) and Corollary 2 that
where we recall that . We can then apply Lemma 3.2 to and get
when is large enough. The proof is then complete.
∎
4. On Cayley Graphs with Random i.i.d. Generators
We consider the random walk on the Cayley graph of a finite nilpotent group $G$ with respect to generators chosen uniformly at random. The graph is denoted by $\mathrm{Cay}(G, Z)$, where the generator set is $Z = \{Z_i^{\pm 1} : i \in [k]\}$ with $Z_1, \dots, Z_k$ i.i.d. uniform over $G$. The regime of interest in this paper is $1 \ll \log k \ll \log |G|$. In this section, our aim is to prove Theorem 1, which we have restated below for ease of reference.
Theorem 1.
Let $G$ be a finite nilpotent group of uniformly bounded rank and step. Let $Z = \{Z_i^{\pm 1} : i \in [k]\}$ with $Z_1, \dots, Z_k$ i.i.d. uniform over $G$. Assume $1 \ll \log k \ll \log |G|$. As $|G| \to \infty$, the random walk on $\mathrm{Cay}(G, Z)$ exhibits cutoff with high probability at time $t_0$, the cutoff time defined in Definition 3.
The structure of this section is as follows: in Section 4.1 we will give an overview of the entropic method within the context of Theorem 1 and how it is useful as a framework for proving bounds on the mixing time; in Section 4.2 we prove the lower bound on the mixing time.
Establishing the corresponding upper bound on the mixing time requires significant effort, which involves a meticulous analysis of the distribution of the RW on each quotient group of the lower central series. To this end we give a representation of the random walk in Section 4.3. Subsequently, we provide an outline of the proof of the upper bound on the mixing time in Section 4.4 and complete the proof in the remainder of the paper.
4.1. Entropic Method and Entropic Times
Let $Z = \{Z_i^{\pm 1} : i \in [k]\}$ denote the symmetric set of generators of $G$. We define the auxiliary process $W(t) = (W_1(t), \dots, W_k(t))$ based on the walk, where $W_i(t)$ is the number of times generator $Z_i$ has been applied minus the number of times $Z_i^{-1}$ has been applied in the random walk by time $t$. It is easy to see that $W$ is a rate-1 random walk on $\mathbb{Z}^k$. To simplify notation, we sometimes drop the time index when it is clear from the context.
Let and be two independent copies of the random walk on starting at and its auxiliary process. Denote by a subset of the state space of the walk , where the index suggests that the precise choice of (which is postponed to Definition 5) depends on the time . For now we will think of as a set of “typical” locations of such that for the relevant choice of .
We use a “modified calculation”: first conditioning on being “typical”; then using a standard calculation on the conditioned law. The proof of the following lemma is quite straightforward and hence is omitted.
Lemma 4.1 (Lemma 2.6 of [17]).
For all and all the following inequalities hold:
where denotes the law of the random walk given the generator set starting at .
Note that when is a random set of generators, is a random variable that is measurable with respect to , the -field generated by the choice of . In what comes later in our arguments, we will take the expectation over the choices of and work with
A good choice of the set of “typical” locations will greatly simplify the analysis. Hence, in the remainder of this section, we will discuss in detail the choice of this set, or in other words, the “typical event” that we will condition on. As our goal is to obtain an upper bound on the total variation mixing time, we will look at a time that is slightly larger than the proposed mixing time, namely the entropic time defined in Section 4.1.1. Based on this choice of time we then define the typical event in Section 4.1.2.
4.1.1. Asymptotics of Entropic Times
Recall from Definition 3 that the entropic time $t_*$ is the time at which the entropy of the rate-1 random walk on $\mathbb{Z}^k$ is $\log |G^{\mathrm{ab}}|$.
For simplicity of notation, we will write $t_*$ for the entropic time and denote the cutoff time by $t_0$.
The entropy of the random walk on $\mathbb{Z}^k$ is well understood. The interested reader can find a detailed exposition of the entropic times in [18]. Via direct calculation with the simple random walk and Poisson laws, one can obtain asymptotics of entropic times, see, e.g., [18, Proposition A.2 and §A.5] for full details. Here we content ourselves with restating the result of such calculations, which can be found in Proposition 2.2 in [19] and yields the asymptotic value of the entropic time:
(26) |
where and is some continuous decreasing function whose exact value is unimportant for our analysis. Since we assume and , it follows from Corollary 4 that . Consequently, it is possible to have only in the regime . To be more specific, writing , in the regime we have and
The following proposition gives the asymptotics of the cutoff time for the regimes of that we are interested in.
Proposition 5.
4.1.2. Typical Event
For simplicity of notation, in this section we will drop the index of time and write $W = W(t)$ and $X = X(t)$. Write $W = (W_1, \dots, W_k)$, where for each $i$, $W_i$ is an independent rate-$1/k$ random walk on $\mathbb{Z}$. For each $i$, define $N_i^+$ to be the number of steps to the right and $N_i^-$ the number of steps to the left in the walk $W_i$. It is then easy to see that $W_i = N_i^+ - N_i^-$.
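The Poisson thinning used here says that splitting the rate-1 jump clock uniformly over the $k$ generator indices, and then over the two signs, makes each $N_i^+$ (and each $N_i^-$) an independent Poisson variable with mean $t/(2k)$. A small simulation check (all parameters are our own choices):

```python
import numpy as np

rng = np.random.default_rng(6)
k, t, trials = 5, 40.0, 5000

plus_counts = np.zeros((trials, k), dtype=np.int64)
for s in range(trials):
    for _ in range(rng.poisson(t)):     # rate-1 global jump clock
        i = rng.integers(k)             # uniform generator index
        if rng.random() < 0.5:          # uniform sign; record N_i^+ only
            plus_counts[s, i] += 1

# Poisson thinning: N_i^+ ~ Poisson(t / (2k)), so mean and variance
# should both be close to t / (2k).
print("empirical mean/var of N_1^+:",
      plus_counts[:, 0].mean(), plus_counts[:, 0].var())
print("predicted t/(2k):", t / (2 * k))
```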
To upper bound the key quantity from Lemma 4.1, we will separate it into cases according to whether or not :
As suggested by this decomposition, bounding plays an important role in the proof. Since are two independent copies,
i.e.,
It now becomes clearer that in order to control the probability , we would like to choose so that and is sufficiently small.
For , write for the law of , the rate 1 random walk on , so that . Also write for the law of so that . Also, for each , define
Then (and respectively ) is the entropy of (and respectively ).
We need an estimate on the entropy shortly after the proposed mixing time, i.e., for , for which we refer to results in [19]. To state their results, we need to define the following quantity.
Definition 4.
Define as follows
where . Fix some such that , and set .
Remark 6.
Note that in both cases.
Lemma 3.9 in [19] proves a two-sided concentration estimate, of which we only need one side, stated below.
Lemma 4.2 (Lemma 3.9 in [19]).
Assume that . Let and . Then
Based on the above discussion and Lemma 4.2, it makes sense to define the (global) typical event as follows
(27) |
Write . Let denote a realization of and the corresponding a realization of . To have better control over the behavior of each coordinate, we further define
(28) |
where . It can be observed that is defined based on when so that whp by a union bound on the coordinates. In fact, we will use only in the regime .
In the regime we have . By Poisson thinning, for each $i$ the arrivals of generator $Z_i$ follow an independent Poisson process of rate $1/k$, so each generator is expected to appear $t/k$ times in the walk. Thus in this regime we will focus on the collection of generators that appear exactly once. Define, for ,
so that is the index set of generators that appear exactly once in the realization .
Moreover, for a sufficiently small , define
(29) |
We can observe that the distribution of is so that when (which holds when and ), occurs with high probability for any .
Lemma 4.3.
.
Proof.
Lemma 4.2 implies that the first event holds with high probability. The bounds for the remaining two events, in their respective regimes of $k$, follow from standard large deviation estimates. ∎
4.2. Lower Bound on Mixing Time
As projection does not increase the total variation distance, to find a lower bound on the mixing time we can consider the projection of the original random walk onto the projected Cayley graph $\mathrm{Cay}(G^{\mathrm{ab}}, \bar{Z})$. As $G^{\mathrm{ab}}$ is abelian, the mixing behavior of the projected walk is well understood, see [17, 19].
The key idea in proving the lower bound is a concentration result for the entropy, which has been proved in the literature and is restated here to keep the paper self-contained.
Proposition 6 (Proposition 2.3 in [19]).
Assume that satisfies and recall as in Definition 3. Then and further, for , writing , we have
The proof of the lower bound follows from a somewhat conventional argument within the entropic methodology. Our proof is essentially a restatement of the proof in Section 3.3 of [19], which we include for the purpose of being self-contained.
Lemma 4.4.
Assume that . Let be a given generator set. For any and ,
where denotes the law of the random walk with given the generator set .
Proof.
Since the argument does not depend on the choice of we will suppress it from notation and write for .
Recall that . To argue that this is a lower bound on the mixing time, first observe that in steps the support of the random walk has size at most . When , the support has size at most and hence the walk cannot be mixed in this many steps.
Let denote the canonical projection. Consider
Based on the definition of we have . Every element satisfies for some with . Hence, for all , we have
Summing over gives
and hence . Therefore,
which completes the proof.
∎
4.3. Representation of
Let $N(t)$ be the number of steps taken by the continuous-time rate-1 random walk on the group by time $t$. For $j \le N(t)$, we can write (the increment of) the $j$-th step taken by the walk as $Z_{i_j}^{\epsilon_j}$, where $i_j \in [k]$ and $\epsilon_j \in \{\pm 1\}$. Then we can express $X(t) = Z_{i_1}^{\epsilon_1} \cdots Z_{i_{N(t)}}^{\epsilon_{N(t)}}$, and similarly for $X'(t)$, where the analogous random variables are independent and defined in the same way. Note that for , for .
We are interested in the event , which is equivalent to , where
(30) |
It is easy to see that the law of this product is the same as that of a rate-2 simple random walk on the same Cayley graph. In fact, for simplicity of notation we will write (30) as
(31) |
where and for .
Our goal is to express $X(t)$ (and analogously $X'(t)$) in a way that makes the role of $W$ (and analogously $W'$) transparent. If $G$ is abelian we can simply rearrange the sequence to obtain the desired form. Although we do not have this nice and simple relation when $G$ is not abelian, we can still rearrange the terms in (30) and pay the price of adding an extra commutator, i.e., for $g, h \in G$ we can rewrite $gh$ as $hg[g,h]$. For this reason, in some of our analysis we also care about the specific order in which each generator appears, which is why we will sometimes refer to the word as a sequence to emphasize this perspective. To be more specific, when we refer to the word as a “sequence” we are referring to the corresponding ordered sequence of steps.
More generally, for , consider as an example the element . In order to express in our desired form we can rearrange the terms in the sequence as follows
(32) |
We can see from this example that rearranging the whole sequence will result in commutators of the form , which was defined in (9).
In terms of the sequence , it will become clear later that we actually only need to keep track of the two-fold commutators of the form . Hence, we will write and rearrange the terms in (31) to obtain the following expression
(33) |
where, letting denote the -th generator and its sign in the sequence in (31),
(34) |
and is the residual part as the result of the rearranging. To give a more specific description of , define to be the collection of commutators of . A multi-fold commutator of refers to a term of the form with where for all , which is not simply a two-fold commutator of the form . A multi-fold commutator consisting of pairs of brackets is said to be -fold. For example, see (4.3) where is a 3-fold commutator and is a 5-fold commutator.
It will become clear in our later arguments that the specific order in which terms appear in is of no interest to us, and thus with some abuse of language we will sometimes refer to as a polynomial with terms that are multi-fold commutators of .
4.4. Upper Bound on Mixing Time
In this section we prove the upper bound on the mixing time, for which the precise statement is presented below. Recall that denotes the law of the random walk given the generator set starting at .
Theorem 5.
Let be a nilpotent group with . Let with and assume . For any and , we have with high probability.
Notice that when this theorem follows directly from Theorem 2 and hence in the proof we will focus on the regime .
We will first define the notation that will be used throughout this section and explain why it is useful.
Definition 6.
For each , define and let denote the rank of . For each , we will choose a set such that and .
As
we will be interested in events related to . In other words, we will decompose with respect to the quotient groups and derive simplified expressions to make the distribution of more tractable.
It will be tremendously useful if we can express group elements as products of elements belonging to each layer of the lower central series. Indeed, the following lemma asserts that we can construct each generator as a product of independent random variables, one from each layer.
Lemma 4.5 (Corollary 6.4 in [17]).
Let be independent and such that . Then . Moreover, are i.i.d. uniform over for .
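To illustrate the flavor of Lemma 4.5 in the simplest two-layer case, the sketch below checks in the mod-$p$ Heisenberg group that multiplying a uniform coset representative of $[G,G]$ by an independent uniform element of $[G,G]$ produces a uniform element of the whole group; the restriction to two layers and all parameter choices are our own simplifications, while the lemma itself proceeds layer by layer along the lower central series:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(7)
p, trials = 3, 100000

def mult(u, v):
    """Product in the mod-p Heisenberg group, with coordinates (a, b, c)
    standing for the unitriangular matrix entries a = x_12, b = x_23,
    c = x_13; the c-coordinate picks up the cross term a1 * b2."""
    a1, b1, c1 = u
    a2, b2, c2 = v
    return ((a1 + a2) % p, (b1 + b2) % p, (c1 + c2 + a1 * b2) % p)

freq = Counter()
for _ in range(trials):
    coset_rep = (int(rng.integers(p)), int(rng.integers(p)), 0)  # transversal of [G, G]
    central = (0, 0, int(rng.integers(p)))                       # uniform over [G, G]
    freq[mult(coset_rep, central)] += 1

# The product should be uniform over G: all p^3 = 27 empirical
# frequencies should be comparable (about trials / 27 each).
print("min/max frequency over", p**3, "elements:",
      min(freq.values()), max(freq.values()))
```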
4.4.1. Proof Framework for Theorem 5
As suggested by Lemma 4.1, in order to control the total variation distance to stationarity we will upper bound , where we also average over the choice of . Letting , can be further decomposed with respect to :
(35) |
Again, for simplicity of notation we will suppress the dependence of the mixing times on and , and write and . It follows easily from (35) that in order to prove Theorem 5 it suffices to prove the following two results.
Proposition 7.
For any and , we have
Proposition 8.
For any and , we have
The proofs of both Propositions 7 and 8 rely on the analysis of the events . In the case where , writing , one will see in Section 4.6 that the analysis boils down to understanding the distribution of . For groups of higher step the analysis is significantly more involved, as in this case (33) turns into
and hence we need to carefully understand the distribution of the product of commutators. The proof of Proposition 8 is therefore considerably more involved, prompting us to provide an outline here that captures the main steps.
Proof outline of Proposition 8. To give a brief and intuitive explanation of why Proposition 8 holds, we outline its proof here. We will analyze for each when .
The event is already guaranteed to occur if we condition on . In general, in order to understand each event for , we hope to show that for all , is close to being uniform over . To analyze the distribution of for each , we observe from the representation of given in (33) that , which is defined in (34), plays a crucial role in determining the distribution.
As we will later show in Section 4.8.3, after simplifying for (the case where is somewhat different and will be discussed separately in Section 4.8.4) one will eventually arrive at a term of the form
where satisfies and is a subset to be chosen, see (4.8.3). The terms explicitly indicate the role of . More specifically, since is a collection of independent uniform random variables, Lemma 4.12 below asserts that the distribution of is uniform over the subgroup generated, denoted by . As we would like to be close to being uniform over , the choice of will be made so that the set takes up a sufficiently large fraction of .
The above discussion suggests that we need to specify some conditions on to guarantee the existence of such an index set and thus desired behavior of . These conditions are summarized in Definition 10 as a “good” event . As a result, we will obtain the following estimate. Recall the definition of from Definition 4.
Proposition 9.
.
Lastly, we would like to be an event that occurs with sufficiently high probability, i.e.,
Proposition 10.
.
Proof of Proposition 8 given Propositions 9 and 10. It follows from the definition of and a direct calculation that
That is,
(36)
∎
The remainder of this section is organized as follows: in Section 4.5 we define filtrations that will serve as useful tools in the subsequent proofs; in Section 4.6 we prove Proposition 7; in Section 4.7 we present some preliminary results on groups that are useful in the proof of Proposition 8. As discussed above, the proof of Proposition 8 is divided into two parts: Proposition 9 will be proved in Section 4.8 and Proposition 10 in Section 4.9.
4.5. Useful Filtrations
Recall that the random walk can be written as a sequence
where is the number of steps taken by by time and denotes the -th step taken by the random walk. Similarly, we write . Recall from Lemma 4.5 the decomposition for .
We will define the following -fields and events that will be useful in later analysis.
Definition 7.
(i) Let be the -field generated by , i.e., contains information on the sequences , other than the identities of .
(ii) For each , let . Let be the trivial -field.
(iii) For , let .
To make the intuitive picture a bit clearer, note that there are mainly two sources of randomness:
(i) At every step a generator is randomly chosen and applied, resulting in the random order of the sequences (encoded in );
(ii) For each , the specific choice of the generator is random (encoded in ).
Remark 7.
The following random variables are measurable with respect to : , and .
As preparation for later discussion, we first clarify the measurability of events of interest to us.
Lemma 4.6.
For ,
(i) is measurable with respect to .
(ii) is measurable with respect to .
Proof.
Recall from (33) we can write
where is a product of multi-fold commutators of resulting from the rearrangement. (See the end of Section 4.3 for a detailed description of and multi-fold commutators.) By Lemma 4.5, for each , can be decomposed into . For and , we will write .
Applying the decomposition to the sequence above yields
(37)
where is a certain polynomial with terms that are -fold commutators of for , satisfying that
The reason we restrict our attention above to is that, by Proposition 2, any -fold commutator with that involves is in . Also note that we can exchange the order of and for any , as by Proposition 2.
On the right hand side of (4.5) only involves and hence is measurable with respect to . Therefore, is measurable with respect to , proving (ii).
4.6. Proof of Proposition 7
Linear combinations of independent uniform random variables in an abelian group are themselves uniform on their support. As preparation for the proof we present the following lemma, which is a restatement of Lemmas 2.11 and 6.5 of [17].
Lemma 4.7.
Let . Let be an abelian group and . For , write and define . We have
Consequently,
(40)
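A minimal instance of Lemma 4.7, in hypothetical notation of our own (the lemma's symbols were lost in extraction): if $Z$ is uniform over $\mathbb{Z}_n$ and $a \in \mathbb{Z}$, then $aZ$ is uniform over the subgroup of $\mathbb{Z}_n$ generated by $\gcd(a,n)$, which has $n/\gcd(a,n)$ elements; in particular
% worked one-variable case of the uniformity phenomenon
\[
\mathbb{P}(aZ = 0) \;=\; \frac{\gcd(a,n)}{n},
\]
and sums of independent such terms are uniform over the subgroup generated by the individual supports.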
Applying Lemma 4.7 to and for and writing gives
(41)
The key to the proof of Proposition 7 is the following estimate, whose proof uses the simplified expression of derived in the last section.
Lemma 4.8.
We have
Proof.
Recall from the discussion in the proof of Lemma 4.6 that is measurable with respect to for (where is the trivial -field). Since , it follows from (39) that for ,
where the last line follows from rewriting the term as a summation, as is abelian. It then follows from (40) in Lemma 4.7 that
where the last equality follows from observing that is independent of . It then follows from the tower property that
(42)
Applying (4.6) iteratively for , we obtain
As is measurable with respect to , i.e., , by the tower property we have the desired result
∎
Given Lemma 4.8, we can obtain the following upper bound involving the greatest common divisor.
Lemma 4.9.
Let and . We have
Proof.
Lemma 4.10.
Fix any and let for . For all and all , there exists a constant that depends only on such that
where is defined as in (26).
Proof.
Regime: . Let denote a rate 1 Poisson process. Recall that is a rate simple random walk on . For a random walk to return to the origin it must have taken an even number of steps, which means
To simplify notation, write from now on. Using a Chernoff bound argument on the Poisson random variable yields for
To prove the second part of the upper bound, note that for to occur, there must be some such that and divides , which implies that . Hence,
Regime: . Define . Since , we have for some constant depending on . So we can fix some small and some large such that . That is, for all ,
Thus there is some so that
Therefore, for ,
Regime: . Note that
For the second term it follows from Lemma 2.14 of [17] that . As is a one-dimensional SRW with rate , Theorem A.4 of [18] implies that when we have
Hence,
∎
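For reference, the Chernoff bounds for Poisson variables invoked in the first regime above take the following standard form: if $N \sim \mathrm{Poisson}(\lambda)$, then
% standard tail bounds, obtained by optimizing $\mathbb{E}[e^{\theta N}] = e^{\lambda(e^{\theta}-1)}$ over $\theta$
\[
\mathbb{P}(N \ge a) \;\le\; e^{-\lambda}\Big(\frac{e\lambda}{a}\Big)^{a} \ \ (a > \lambda),
\qquad
\mathbb{P}(N \le a) \;\le\; e^{-\lambda}\Big(\frac{e\lambda}{a}\Big)^{a} \ \ (a < \lambda).
\]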
Lemma 4.11.
Suppose . For any , when we have
Proof.
Again we will write .
Regime: . Observe that . On , we have that is at most , which gives
Let be sufficiently small. For , applying Lemma 4.10 gives
For , we use the bound to get
Therefore,
as and .
Regime: . In this regime . It follows from Lemma 4.10 that there exists such that
Regime: . In this regime and thus for there are two regimes of to be discussed: and . When , by the same argument as before we can show .
It remains to treat the case where . When , we apply the bound in Lemma 4.10 to show
When , we apply the bound in Lemma 4.10 to show
∎
4.7. Preliminary Results on Groups
Throughout this paper, the notation denotes the subgroup generated by the set of elements . Recall from Definition 6 that are the representatives of .
Lemma 4.12.
For fixed and with , we have
Proof.
Lemma 4.13.
Let be fixed and be a prime. Suppose are i.i.d. uniform random variables over . We have
Proof.
Let for and . On the event we know . Also note that for . It follows that
∎
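Although the precise statement of Lemma 4.13 was garbled above, estimates of this type typically reduce to the following standard fact, recorded here for orientation (in our own notation): if $v_1,\dots,v_k$ are i.i.d. uniform over $\mathbb{F}_p^{\,d}$ with $k \ge d$, then
% standard spanning probability over a finite field
\[
\mathbb{P}\big(\mathrm{span}(v_1,\dots,v_k) = \mathbb{F}_p^{\,d}\big) \;=\; \prod_{i=0}^{d-1}\big(1 - p^{\,i-k}\big) \;\ge\; 1 - \frac{p^{\,d-k}}{p-1}.
\]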
Lemma 4.14.
Let be a subset of . For any ,
Proof.
For simplicity of notation, write and then let . Since is abelian, it can be expressed in the form
for some where is the rank of . This decomposition allows us to see that . As a consequence,
Note that
We then have
∎
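For the reader's convenience, the decomposition invoked in the proof above is presumably the invariant factor form of the structure theorem for finite abelian groups:
% standard structure theorem; $A$, $n_i$, $r$ are our placeholder symbols
\[
A \;\cong\; \mathbb{Z}_{n_1} \oplus \mathbb{Z}_{n_2} \oplus \cdots \oplus \mathbb{Z}_{n_r}, \qquad n_1 \mid n_2 \mid \cdots \mid n_r,
\]
in which $r$ equals the rank of $A$, i.e., the minimal number of generators.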
Let be a non-trivial commutative ring with identity, and let be a matrix over . For a maximal ideal of , let be the natural homomorphism. Let be defined as for . The following result comes from the theory of matrices over a commutative ring and is stated in [7, p. 259].
Proposition 11.
If is a homomorphism, then is surjective if and only if for every maximal ideal of , the map is surjective, where .
We will be interested in where is a prime. The unique maximal ideal of is . Then .
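As a concrete toy instance of Proposition 11 in this setting (with a matrix of our own choosing, purely for illustration): over $R = \mathbb{Z}_{8}$ the maximal ideal is $(2)$ and $R/(2) \cong \mathbb{F}_2$, and
% hypothetical example; reduction mod 2 detects surjectivity
\[
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \ \text{over } \mathbb{Z}_8
\quad\longmapsto\quad
\bar{A} = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix} \ \text{over } \mathbb{F}_2,
\]
where $\bar{A}$ has rank 1 and is not surjective, so by Proposition 11 neither is $A$ on $\mathbb{Z}_8^{\,2}$.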
Lemma 4.15.
Let and where . Let denote a matrix over . If is surjective, then has independent entries that are uniform over .
Proof.
Define a surjective group homomorphism by where . Since is surjective, for any there exists some such that . It follows that for any
where the last equality uses the fact that is surjective. ∎
4.8. Proof of Proposition 9
4.8.1. Definitions
To prepare for the proof of Proposition 9, we first introduce several useful quantities and describe the selection of the good event and the index set ; the reasons for their definitions will become clearer as we proceed with the proof of Proposition 9.
To simplify notation we define the following matrix.
Definition 8.
Recall the definition of from (34). Define
(44)
Definition 9.
Let be a matrix with entries in , and let denote a subset of indices which is known when is given. For , we define , and respectively , to be an element in that satisfies
and respectively
In particular, for , define and .
Since by definition
with slight abuse of notation we will write
(45)
so that we can write, again with slight abuse of notation,
which is the product of the matrix and the vector .
Let
(46)
For a submatrix of , define the matrix , i.e., for the -th entry in . Note that is a matrix over the field and hence its row rank is defined as the number of linearly independent rows in the matrix.
Definition 10.
A fixed matrix is said to be good if it satisfies the following two conditions:
(i) There exists a set such that
(47)
(ii) Let . For each there exists a submatrix of such that has rank over the field , where as above .
Define and let be the corresponding subset of indices satisfying (i).
Note that by definition both and are measurable with respect to .
4.8.2. Outline of Proof
To prove Proposition 9, we derive an upper bound on inductively using the following proposition.
Proposition 12.
Let be measurable with respect to . For , letting
we have
Applying Lemma 4.14 to gives
(48)
Our goal is to choose properly so that is sufficiently large compared to . To guarantee that such a choice of exists, we further define a “good” event , see Definition 10, which is measurable with respect to and occurs with high probability.
Note that and are measurable with respect to . By the tower property of conditional expectation and the fact that for , Proposition 12 leads to
which implies
Combined with (48), the above yields
(49)
4.8.3. Proof of Proposition 12
The key to proving Proposition 12 is the simplification of . The analysis in this section is somewhat similar to that in Section 4.5, where we obtained a simplified expression of when . However, when the result of simplification is quite different. Instead of a simple quantity of the form (see Lemma 4.8), we now have to deal with an expression involving commutators of .
Recall from Definition 7 the definition of , and . Define
(50)
which comes from the right hand side of (4.5).
Lemma 4.16.
On we have
where is measurable with respect to and is a polynomial whose definition will be clarified in the proof.
Proof.
Recall from Lemma 4.6 that is measurable with respect to . On the event we can write
(51)
where is defined analogously to in (4.5). Since all terms in are present in , we can express where is the polynomial obtained by excluding all the terms of from .
We can rewrite (51) as
(52)
Note that , as every -fold commutator with in must involve a for some . It follows from the proof of Lemma 4.6 that on . Furthermore, it is easy to see that both and are measurable with respect to as they only involve terms in .
Since is abelian, we can equivalently write (52) in terms of addition and obtain the desired expression. ∎
Proof of Proposition 12. For simplicity of notation, let be such that
It follows from Lemma 4.16 that on and hence
(53)
Let denote the -field generated by . Observe that
(54)
where the second-to-last line is known under and thus will be denoted by . It remains to consider the third-to-last line, i.e.,
(55)
where is as in Definition 9, i.e., is such that .
Lemma 4.12 shows that is uniform over , which is measurable with respect to since are measurable with respect to . It turns out that if we further condition on the -field , one has that for any ,
(56)
Therefore, we can bound (4.8.3) from above by
(57)
where means we are taking the expectation over . Combining (4.8.3), (4.8.3) and (57) then yields the conclusion of Proposition 12, i.e.,
∎
4.8.4. Proof of Proposition 9
We begin by addressing the last term in (4.8.2). Recall the definition of from Definition 9. Recall that is measurable with respect to .
Lemma 4.17.
Let . Then
Proof.
The last ingredient needed to complete the proof of Proposition 9 is the following estimate, whose proof is deferred until Section 4.8.5.
Proof of Proposition 9. Recall that so that are measurable with respect to . Recall that . Noting that , applying the tower property to (4.8.2) gives
where the last line follows from Lemma 4.17. Then
We can bound the last term above by Hölder’s inequality and Lemma 4.18,
That is,
Taking expectation over on both sides and letting , we have
where is defined as in Definition 4 and by definition . ∎
4.8.5. Proof of Lemma 4.18
Proof.
The proof of the two inequalities is essentially the same. Without loss of generality we only prove the first inequality.
We can write
(59)
where are distinct primes and is a Sylow -subgroup of . Each has the form
and hence
(60)
where we can observe that .
Since , for each with , can be represented in the following form
for a collection of independent random variables such that is uniform over . With slight abuse of notation, we will write
Based on this we can further write
where is an element in . Under the -field , the coefficients are known. Hence the collections are independent for different ’s, and we can consider the generation of each subgroup by separately, i.e.,
(61)
where as above .
For any , we will argue that on the event there exists a set of indices such that for any , is a collection of i.i.d. uniform random variables over , so that one can simply apply Lemma 4.13 to the right hand side of (61) to obtain the desired conclusion.
By Definition 10, on the event there exists a submatrix of such that has rank . We will collect the column indices of into the set (let if does not occur). Then we can define the -field , and express the vector as a sum of two parts, one of which is unknown under whereas the other is known:
where is a function of whose value is known under . Proposition 11 shows that is a surjective map from to . Lemma 4.15 further implies that the entries of are i.i.d. uniform over . It is then straightforward to see that are i.i.d. uniform over given . That is, for any , we have
4.9. Proof of Proposition 10
In the proof of Proposition 9 we conditioned on the “good event” that guarantees the existence of a subset of indices that plays a critical role in the proof. In this section we aim to prove that indeed occurs with sufficiently high probability.
4.9.1. Regime:
Recall the definition of from Definition 4. In this regime we will work with the unconditional probability and prove the following stronger bound than that in Proposition 10:
(62)
Indeed, with (62) and this yields the statement in Proposition 10 as follows
First we will specify the choice of in this regime and verify that this choice satisfies (47) in Definition 10,
(63)
Note that this choice depends only on and hence is measurable with respect to . To see that are conditionally independent of given , we observe that for any ,
where due to the definition of , see Lemma 4.7.
Furthermore, we can see that involves terms in , whereas involves only terms in (since , in the second sum the condition leads to ). Therefore, conditioned on , we have that is independent of . By the same reasoning we also have that is independent of . Combining these observations, we see that conditionally on , for any , is independent of and uniform over .
Noting that
by the same reasoning as above we have that is independent of and . Again by the definition of we have . That is, conditionally on , for any , is independent of and uniform over . Therefore we have verified the condition in Definition 10 that are independent of .
Given the choice of in (63), for each we can define to be the event that there exists a submatrix of such that has rank . Observe that and hence
Recall that our goal is to show , where is as in Definition 4. As , it suffices to prove . In this regime we can prove a much stronger estimate.
Proposition 13.
When , for any ,
In order to prove Proposition 13 we begin by discussing a useful property, which we refer to as the relative independence of the coefficient matrix , that holds in the regime .
Relative independence in . Recall that in this regime . We will consider and hence , i.e., in both of the random walks and each generator in should typically appear many times. As a result, we will see certain “relative independence” among a subset of terms in . Before explaining the meaning of relative independence we need to define some notation.
We can view as a sequence of generators, and express this sequence by . In the following construction, we will obtain partial information on while conditioning on . That is, at this stage of construction we will treat as known.
Let be a subset of indices with size , where is to be chosen later. We will denote by the subsequence of consisting of only generators in and let denote the length of , which follows a Poisson distribution with rate by Poisson thinning.
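Recall the thinning property being used here (stated in our own notation): if $(N_t)$ is a Poisson process of rate $\lambda$ and each arrival independently belongs to a given class with probability $q$, then the class arrivals form a Poisson process of rate $q\lambda$, so the number of class arrivals by time $t$ satisfies
% standard Poisson thinning; $\lambda$, $q$, $t$ are our placeholder symbols
\[
N^{\mathrm{class}}_t \;\sim\; \mathrm{Poisson}(q\lambda t).
\]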
Based on the subsequence , we will construct a collection of disjoint sets as follows:
(1) Partition into disjoint pairs, each pair consisting of the and -th elements in .
(2) For with , we look at the -th pair in for each . If the two corresponding generators in the -th pair are for some (where are known), regardless of the order in which they appear, then we record the pair of locations in the corresponding set .
Note that in the construction of , we have no knowledge of the exact order of the pair; that is, we only know that the pair is either or with equal probability.
The subset is said to be nice if for all with . As shown in the following lemma, is nice with high probability when we pick the right size .
Lemma 4.19.
The probability that the subset is nice is at least for some positive constant independent of .
Proof.
Since , by a Chernoff bound argument
for some constant . Since each pair in the subsequence belongs to some independently with probability ,
where the third line follows from a Chernoff bound. ∎
The reason to look at the collection is the following simple observation: when revealing the relative order of a pair of in , if we have then it leads to an increment of on the value of ; if instead we have then the resulting increment on is 0. Taking into account the fact that for different pairs in , the corresponding ’s are i.i.d. random variables uniform in , we can expect diffusive behavior that “smooths out” the probability of taking a certain value.
Next we will translate this intuition into a rigorous proof. We first define the following quantity:
(64)
It is well known that is non-increasing with respect to and there exists some such that for all .
Lemma 4.20.
Let for some . For any prime ,
Proof.
The proof follows from the idea used in Lemma 2.14 of [17]. It is easy to see that any distribution on whose probability mass function is non-decreasing can be written as a mixture of distributions, for different .
For an even the binomial distribution is known to be unimodal, i.e., the mode is unique. When is odd, the maximum of is achieved at two adjacent values. Without loss of generality, let .
Letting , it is easy to see the map is non-increasing on and hence we can write
for some random variable whose law is insignificant to us. As a consequence, when for ,
That is, for any , .
When , by the same reasoning we have the following bound
Therefore,
∎
Now we are ready to state the meaning of “relative independence” in terms of . The following lemma implies that conditioned on being nice, for any , the collection (respectively, ) is stochastically dominated by i.i.d. Bernoulli random variables with probability (respectively, ).
Lemma 4.21.
Let . For , let . Then for any we have that
(65)
and furthermore,
(66)
Proof.
For with , we first recall from (34) the definition of and observe that is in fact determined purely by the subsequence . When referring to as a subsequence, we essentially use the information given by , where denotes the index of the -th term in and its sign. Hence we can write
(67)
With a slight abuse of notation, let denote the collection of locations in that are not recorded in any ’s. Let (we reveal all the signs regardless of the value of ) and define
From now on, we will condition on the -field . By conditioning on , we are treating all the signs as known and revealing the indices of generators that are not in any ’s. Hence, by (67), can be written as the sum of two parts, one independent of and the other known under :
(68)
where represents the accumulated increment from the pairs of that are not in , which is known under . We can further observe that the collection of random variables from the first part of (68) is i.i.d. and independent of .
Let . (Note that and are measurable with respect to .) We can then define two independent binomial random variables , so that the increment on resulting from revealing the orders of the pairs in is given by , where .
(69)
Hence we have the desired upper bound for a given pair of .
Now note that for different pairs with and , by our construction their corresponding and are disjoint. Hence is a collection of independent random variables. Carrying out a calculation similar to (4.9.1) using leads to
Similarly, by Lemma 4.20
∎
Proof of Proposition 13. Recall that . Let be an even integer to be chosen later. Partition into subsets , each of size , and omit the rest of the generators. Without loss of generality assume is even. Similarly we partition into subsets of size . Let . Since the arrivals of generators whose indices are in disjoint sets are independent, we can independently perform trials searching for a submatrix in such that has rank .
For each we perform the following trial. Let be as in (63). Since we will only be looking at , we need to determine if is large enough to begin with. If , we will look for the desired submatrix in . We will look at a batch of indices in at a time, which will be denoted by . Let denote the first half of and the second half. Our goal is to search for column indices in , which will be labelled , such that the submatrix induced by the rows and the columns of satisfies the condition that has rank .
We will describe the steps of the -th trial now:
(1) If , proceed to the search for the submatrix . Otherwise declare failure for this trial.
(2) We look for the first such that and set it as . If there is no such , declare this trial to be a failure.
(3) The search will then proceed iteratively: for , given the choice of , we will look for the first such that the last row is not in the vector space spanned by the previous rows . If there is no that works, then we declare failure for this trial.
(4) The trial is a success if we have found .
It remains to estimate the success probability for each trial, i.e., to upper bound the failure probability in each step of the trial.
Step 1. For any , to upper bound the probability that , we will reveal the orders in . (By the relative independence in , we can still control the failure probability of our later search in given that has been revealed.)
Note that
For all , let (respectively ) denote the number of times the generator appears in (respectively ). Recall that the arrivals of each generator can be viewed as an independent Poisson process with rate . Letting
we have
Further observe that by (67) and hence . This implies that when , for to occur the only possible case is when for all . That is,
By Lemma 4.19 it is easy to see when , which will be guaranteed by our choice of . Recall that . It follows from Lemma 4.21 that
when . Conditioned on being nice, by the relative independence the collection is dominated by i.i.d. Bernoulli random variables with probability . Since has size ,
(70)
Steps 2 and 3: the search for with . Since , we can try at least batches of indices in . These trials are not exactly independent, but using the relative independence for we can upper bound the probability that all trials fail.
We begin by estimating the failure probability for a single trial. Consider the -th trial where we will look for the candidates in . Write as the corresponding row indices in this trial. The probability that we fail to find a is , which, by Lemma 4.21, satisfies
For , suppose we have found such that the matrix induced by and has linearly independent rows. If a candidate fails, it means that the new row
is in the vector space spanned by the previous rows . Since by assumption the matrix induced by and has linearly independent rows, there exists a unique linear combination such that
and thus the last column needs to satisfy
Therefore, by Lemma 4.21 the failure probability for a candidate is at most
The relative independence in implies that the probability of failing to find in the set is at most . Through a simple union bound we see that the batch fails with probability at most
(71)
Combining all these failure probabilities, i.e., Lemma 4.19, (70), (71), and using the fact that are independent for the disjoint index sets , we have
(72)
by the elementary inequality for .
Recall that in the currently considered regime. Our choice of should satisfy and so that . It was also required that right before (70). Furthermore, we will choose satisfying and so that the first term in (4.9.1) is for any . To control the second and third terms in (4.9.1), the choice of also needs to satisfy .
We choose for some sufficiently small , which satisfies all the conditions listed above. Consequently, for any ,
which leads to Proposition 13. ∎
4.9.2. Regime:
In this regime and we no longer have the relative independence in , so we will take a different approach.
Recall that tracks the number of times appears in . We will only look at the set of generators
that appear exactly once in . In this regime, we will be conditioning on , see Definition 5 for the definition of in this regime. Denote by the subsequence of that contains only generators in . For simplicity of notation, for any we will assume that (instead of ) appears in . Otherwise we can just relabel as a new . We also want to emphasize that only the order in which appear in is important in the following argument, and the values of are not.
By (67) it is easy to see that on the event we have for since only appear once in both and . Basically, if appears in the same order in as they do in then we have , otherwise .
Our goal is to look for an upper triangular submatrix of that has full rank . Indeed, since is upper triangular and all the entries of are in , if has full rank then for all , also has full rank over the field . The proof is based on a combinatorial argument where we interpret in terms of the relative order of in and .
Let be a set of distinct colors. Let be a positive integer to be determined later and . For , we will color the -th to the -th generator that appears in as color . In other words, the first generators in have color , followed by generators of color and so on. Let denote the set of generators in that are colored . Our coloring scheme implies that in the sequence , for any , any generator belonging to is in front of any generator belonging to . In order to understand it remains to determine the relative orders of in .
Note that has the distribution of a uniform permutation of , which means we can construct this subsequence by inserting the generators into an existing sequence uniformly at random, one by one. Write for . In order to construct , we first sample a uniform permutation of and denote it by . Without loss of generality, we can label them as . Next we will insert the generators from into this sequence, one by one. Once we are done with inserting the generators in , we proceed to insert the generators from and so on. Since all the generators are inserted at random and one by one, the resulting sequence will be a uniform permutation of the generators in .
Given the sequence , we can define a collection of good events . As an example, we will first define . For any , if is inserted between and for some (let if is inserted in front of whereas let if it is inserted behind ), then
We will let be the event that in the sequence there are at least distinct pairs from the set such that there is at least one generator from that is between them. The reason for this definition is that if occurs we can collect the first elements in
as the row indices of and the corresponding ’s as the column indices. This choice leads to an upper triangular matrix where the top right entries are . Consequently, the induced matrix is still an upper triangular matrix that has full rank . This argument can be most easily explained through an example.
Example. For simplicity we assume . Let so that there are 2 colors. In particular, are in color 2 while are in color 1. Suppose when constructing we first sample a random permutation of and get , and then insert into the current sequence, obtaining as a result
It is easy to see that as a consequence of inserting (of color 2) in between and (of color 1) in , the relative order of in is different from that in and hence . Moreover, we can obtain an upper triangular submatrix
In general, for we can define to be the event that in the sequence there are at least distinct pairs from the set
such that there is at least one generator from that is between them. If any of the events occurs, we can find a submatrix satisfying our condition and the set (from the definition of in Definition 10) by collecting the corresponding row indices of . Hence, the corresponding column indices of are in , a fact we shall now use to verify condition (47) from the definition of . One can easily check that for any ,
where the first term is uniform in and independent of . Therefore, condition (47) is satisfied for this choice of and occurs for all , i.e., occurs as long as occurs. Hence, it remains to upper bound
For , let be the -field that encodes the relative orders of generators in in . Then for ,
The key is to observe that is independent of . Applying this iteratively gives that
We will calculate for . Looking at the distribution of the subsequence is equivalent to looking at the subsequence and inserting the generators in randomly, one by one, into this subsequence. This perspective allows us to calculate via a multi-type urn scheme.
The subsequence has length and we are inserting the generators from . If we view the gap between two generators in the existing sequence as a distinct type (also counting the gaps before and after all the generators), then we have types and there is one ball of each type in the urn. We will be conducting steps. At each step, we choose a ball randomly from the urn and place it, together with a new ball of the same type, back into the urn. It is not difficult to see that this urn scheme is equivalent to inserting elements randomly into a sequence.
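To make the urn computation concrete (with placeholder parameters of our own, since the actual counts were lost in extraction): if the urn starts with $m$ balls of type 0 and $s$ balls of other types, then the probability that all of the next $j$ draws pick type 0 is
% Pólya urn computation: each type-0 draw returns the ball plus one new type-0 ball
\[
\prod_{i=0}^{j-1} \frac{m+i}{m+s+i},
\]
since each type-0 draw increases both the type-0 count and the total count by one.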
Our goal is to understand the probability of the event
which is the event equivalent to in the urn model and hence has the same probability. This calculation can be simplified by first fixing the types, each with at most one ball, from all types, and then grouping the remaining types together to form a new type, called type 0. Then we have a new urn model with types (coming from the types and the new type 0), starting with balls of type 0 and one ball of each remaining type. The probability that we only ever choose balls of type 0 (so that in the original urn model there are at most types that can potentially have at least 2 balls) is
Therefore,
Finally, we have
where the second product is a telescoping product and hence can be simplified to yield
(73)
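Before analyzing this product, we record for reference the form of Stirling's approximation used next:
% standard asymptotics
\[
n! \;=\; \sqrt{2\pi n}\,\Big(\frac{n}{e}\Big)^{n}\big(1 + O(1/n)\big),
\qquad\text{equivalently}\qquad
\log n! \;=\; n\log n - n + O(\log n).
\]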
The key is to understand . By Stirling's formula we have
We collect the first terms as
for some constant . Recall that . We will choose which implies that
(74)
and consequently, for arbitrarily small , when is sufficiently large we have
(75)
• In the regime , the above implies that . Since we have
as long as is a sufficiently large constant.
• For the regime , we have . By typicality, . Recall that we write .
– When and , we can choose so that . Recall that in this regime , i.e., , and where .
Letting and , the failure probability given by (75) is at most
where in the second-to-last inequality we use the fact that
– When and , we have and similarly the failure probability is at most
where the last inequality holds because and .
References
- [1] David Aldous and Persi Diaconis. Shuffling cards and stopping times. The American Mathematical Monthly, 93(5):333–348, 1986.
- [2] Nathanaël Berestycki. Mixing times of Markov chains: techniques and examples. ALEA Latin American Journal of Probability and Mathematical Statistics, 2016.
- [3] Nathanaël Berestycki, Eyal Lubetzky, Yuval Peres, and Allan Sly. Random walks on the random graph. The Annals of Probability, 46(1):456–490, 2018.
- [4] Charles Bordenave, Pietro Caputo, and Justin Salez. Cutoff at the “entropic time” for sparse Markov chains. Probability Theory and Related Fields, 173:261–292, 2019.
- [5] Charles Bordenave and Hubert Lacoin. Cutoff at the entropic time for random walks on covered expander graphs. Journal of the Institute of Mathematics of Jussieu, 21(5):1571–1616, 2022.
- [6] Emmanuel Breuillard and Matthew CH Tointon. Nilprogressions and groups with moderate growth. Advances in Mathematics, 289:1008–1055, 2016.
- [7] Wai-Sin Ching. Linear equations over commutative rings. Linear Algebra and its Applications, 18(3):257–266, 1977.
- [8] Don Coppersmith and Igor Pak. Random walk on upper triangular matrices mixes rapidly. Probability theory and related fields, 117:407–417, 2000.
- [9] Persi Diaconis and Robert Hough. Random walk on unipotent matrix groups. Annales scientifiques de l'École normale supérieure, volume 54, 2021.
- [10] Persi Diaconis and Laurent Saloff-Coste. Moderate growth and random walk on finite groups. Geometric & Functional Analysis GAFA, 4:1–36, 1994.
- [11] Carl Dou and Martin Hildebrand. Enumeration and random random walks on finite groups. The Annals of Probability, 24(2):987–1000, 1996.
- [12] Carl CZ Dou. Studies of random walks on groups and random graphs. PhD thesis, Massachusetts Institute of Technology, 1992.
- [13] David Steven Dummit and Richard M Foote. Abstract algebra, volume 3. Wiley Hoboken, 2004.
- [14] Daniel El-Baz and Carlo Pagano. Diameters of random Cayley graphs of finite nilpotent groups. Journal of Group Theory, 24(5):1043–1053, 2021.
- [15] J Ellenberg. A sharp diameter bound for upper triangular matrices. Senior Honors Thesis, Department of Mathematics, Harvard University, 1993.
- [16] Jordan S Ellenberg and Julianna Tymoczko. A sharp diameter bound for unipotent groups of classical type over . 2010.
- [17] J Hermon and S Olesker-Taylor. Cutoff for almost all random walks on abelian groups (2021). arXiv preprint arXiv:2102.02809, 2021.
- [18] Jonathan Hermon and Sam Olesker-Taylor. Supplementary material for random cayley graphs project. arXiv preprint arXiv:1810.05130, 2018.
- [18] Jonathan Hermon and Sam Olesker-Taylor. Supplementary material for random Cayley graphs project. arXiv preprint arXiv:1810.05130, 2018.
- [20] Jonathan Hermon and Sam Olesker-Taylor. Further results and discussions on random cayley graphs. arXiv preprint arXiv:1911.02975, 2019.
- [21] Jonathan Hermon and Sam Olesker-Taylor. Geometry of random Cayley graphs of abelian groups. arXiv preprint arXiv:2102.02801, 2021.
- [22] Martin Hildebrand. Random walks supported on random points of . Probability Theory and Related Fields, 100:191–203, 1994.
- [23] Martin Hildebrand. A survey of results on random random walks on finite groups. 2005.
- [24] Robert Hough. Mixing and cut-off in cycle walks. Electronic Journal of Probability, 22:1–49, 2017.
- [25] Russell Lyons and Yuval Peres. Probability on trees and networks, volume 42. Cambridge University Press, 2017.
- [26] Evita Nestoridi. Super-character theory and comparison arguments for a random walk on the upper triangular matrices. Journal of Algebra, 521:97–113, 2019.
- [27] Evita Nestoridi and Allan Sly. The random walk on upper triangular matrices over . arXiv preprint arXiv:2012.08731, 2020.
- [28] Igor Pak. Two random walks on upper triangular matrices. Journal of Theoretical Probability, 13(4):1083–1100, 2000.
- [29] Yuval Peres and Allan Sly. Mixing of the upper triangular matrix walk. Probability Theory and Related Fields, 156(3-4):581–591, 2013.
- [30] Yuval Roichman. On random random walks. The Annals of Probability, 24(2):1001–1011, 1996.
- [31] Richard Stong. Random walks on the groups of upper triangular matrices. The Annals of Probability, pages 1939–1949, 1995.
- [32] EL Wilmer, David A Levin, and Yuval Peres. Markov chains and mixing times. American Mathematical Soc., Providence, 2009.
- [33] David Bruce Wilson. Random random walks on . Probability Theory and Related Fields, 108:441–457, 1997.