Cutoff phenomenon for the warp-transpose top with random shuffle
Abstract.
Let be a sequence of non-trivial finite groups. In this paper, we study a random walk on the complete monomial group generated by elements of the form and for . We call this the warp-transpose top with random shuffle on . We find the spectrum of the transition probability matrix of this shuffle. We prove that the mixing time of this shuffle is . We show that this shuffle exhibits -cutoff at and total variation cutoff at .
Key words and phrases:
random walk, complete monomial group, mixing time, cutoff, Young-Jucys-Murphy elements
2020 Mathematics Subject Classification:
60J10, 60B15, 60C05.
1. Introduction
The number of shuffles required to mix up a deck of cards is the central question in card shuffling problems, and it has received considerable attention over the last few decades. Card shuffling problems can be described as random walks on the symmetric group, and the generalisation obtained by replacing the symmetric group with other finite groups is also well studied in probability theory [24]. A random walk converges to a unique stationary distribution under certain natural conditions. For a convergent random walk, the mixing time (the number of steps required to reach the stationary distribution up to a given tolerance) is one of the main quantities of interest. Knowing the eigenvalues and eigenvectors of the transition matrix is helpful for studying the convergence rate of random walks. More generally, convergence rate questions for finite Markov chains arise in many subjects, including statistical physics, computer science and biology [22].
In the eighties, Diaconis and Shahshahani introduced non-commutative Fourier analysis techniques in their work on the random transposition shuffle [8]. They proved that this shuffle on distinct cards has total variation cutoff (sharp mixing time; the formal definition of cutoff is given later) at . The upper bound estimate in that case mainly uses the fact that the total variation distance is at most half of the -distance, which relies on the spectrum of the transition matrix. After this landmark work, the theory of random walks on finite groups developed into an independent field with its own problems and techniques. Other techniques have since emerged for dealing with random walks on finite groups (viz. the coupling argument [1] and the strong stationary time approach [2, 3]). However, the spectral approach has become a standard technique for answering mixing time questions for random walks on finite groups [5]. For a random walk model with known cutoff, a natural question is how the transition to stationarity occurs at cutoff. In 2020, Teyssier [27] studied the limit profile for the random transposition model, providing a precise description of the transition at cutoff. Later, Nestoridi and Olesker-Taylor [20] generalised Teyssier’s result to reversible Markov chains. More recently, Nestoridi further developed these results by studying the limit profile for the transpose top with random shuffle (the shuffling algorithm is given in the next paragraph) [19]. The present paper focuses on obtaining the sharp mixing time; the limit profile computation will be considered in future work.
Our model is mainly inspired by the transpose top with random shuffle on the symmetric group [10]. Given a deck of distinct cards, this shuffle chooses a card from the deck uniformly at random and transposes it with the top card. This shuffle exhibits total variation cutoff at [5, 6]. The transpose top with random shuffle was recently generalised by the author to cards with two orientations, known as the flip-transpose top with random shuffle on the hyperoctahedral group [13]. The flip-transpose top with random shuffle on has total variation cutoff at . In the extended abstract [11], the author introduced a generalisation of the flip-transpose top with random shuffle to the complete monomial group , and announced a result on total variation cutoff under the restriction for all . In this work, we further generalise it by removing the restriction on the size of , which motivates us to consider both the -distance and the total variation distance. Moreover, if , then our model provides an example where spectral analysis fails to yield the sharp total variation mixing time; in other words, the -bound on the total variation distance is not sufficient for computing the sharp total variation mixing time. Thus we consider both the -distance and the total variation distance in the present paper. For another notable random walk on the complete monomial group, we mention the work of Schoolfield Jr. [25], a generalisation of the random transposition model to for a finite group . However, that walk was generated by a probability measure which is constant on conjugacy classes, whereas the generating measure of our model is not. In general, it is not easy to study a random walk generated by a probability measure that is not constant on conjugacy classes (cf. [4, 10, 12, 13]). For other random walks on the complete monomial group, see [9, 16].
Before describing our random walk model, let us first recall the definition of the complete monomial group.
Definition 1.1.
Let be a finite group and be the symmetric group of permutations of elements of the set . The complete monomial group is the wreath product of with , denoted by , and can be described as follows: The elements of are -tuples where and . The multiplication in is given by . Therefore .
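To make the group law of Definition 1.1 concrete, here is a minimal computational sketch. It takes the base group to be the integers modulo m (the paper allows any finite group), and the precise multiplication convention below — permute the first colour tuple by the second permutation, then combine componentwise — is an assumption, since conventions for wreath products vary; the function names are hypothetical.

```python
# Sketch of the complete monomial group G wr S_n with G = Z/mZ.
# An element is a pair (x, pi): x is an n-tuple of colours in Z/mZ and
# pi is a permutation of {0, ..., n-1}, with pi[i] the image of i.

def wreath_multiply(a, b, m):
    """Product (x, pi) * (y, sigma) under an assumed convention:
    colours of the first factor are permuted by sigma, then combined."""
    (x, pi), (y, sigma) = a, b
    n = len(x)
    new_x = tuple((x[sigma[i]] + y[i]) % m for i in range(n))
    new_pi = tuple(pi[sigma[i]] for i in range(n))  # composition pi o sigma
    return (new_x, new_pi)

def wreath_identity(n):
    """Identity element: all colours trivial, identity permutation."""
    return (tuple([0] * n), tuple(range(n)))
```

Associativity and the identity law can be checked directly on sample elements, and the group has order |G|^n · n!, consistent with the order of the wreath product.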
Now let be a sequence of non-trivial finite groups. We consider the complete monomial groups for each positive integer . Let be the identity of and be the identity of . For an element , let and for , let
Unless otherwise stated, from now on denotes the element of with in the th position and in the th position, for . One can check that is equal to for .
In this work we consider a random walk on the complete monomial group driven by a probability measure , defined as follows:
(1)
We call this the warp-transpose top with random shuffle because at most times the th component is multiplied by and the th component is multiplied by simultaneously, , . We now give a combinatorial description of this model as follows:
Let denote the set of all arrangements of coloured cards in a row such that the colours of the cards are indexed by the set . For example, if denotes the additive group of integers modulo , then elements of can be identified with the elements of (the hyperoctahedral group). For , by saying update the colour using colour we mean the colour is updated to colour . Elements of can be identified with the elements of as follows: The element is identified with the arrangement in such that the label of the th card is , and its colour is , for each . Given an arrangement of coloured cards in , the warp-transpose top with random shuffle on is the following: Choose a positive integer uniformly from . Also choose a colour uniformly from , independent of the choice of the integer .
(1) If : update the colour of the th card using colour .
(2) If : first transpose the th and th cards. Then simultaneously update the colour of the th card using colour and update the colour of the th card using colour .
The flip-transpose top with random shuffle on the hyperoctahedral group corresponds to the case for all [13]. We now state the main theorems of this paper.
Theorem 1.1.
The total variation and mixing times for the warp-transpose top with random shuffle on are .
Theorem 1.2.
The warp-transpose top with random shuffle on exhibits -cutoff at .
Theorem 1.3.
The warp-transpose top with random shuffle on exhibits total variation cutoff at time .
In view of Theorem 1.3, the distribution after transitions is close to uniform in total variation distance. On the other hand, if , then Theorem 1.2 and
say that the distribution after transitions is far from uniform in -distance. Therefore, the standard spectral approach (which mainly uses the fact that the total variation distance is at most half of the -distance) for obtaining the total variation mixing time fails when . The proofs of Theorems 1.1 and 1.2 will be presented in Section 3, and the proof of Theorem 1.3 at the end of Section 5.
Let us recall some concepts and terminology from the representation theory of finite groups and from discrete time Markov chains with finite state space, to make this paper self-contained. Readers with a representation-theoretic background may skip Subsection 1.1, and those with a probabilistic background may skip Subsection 1.2.
1.1. Representation theory of finite groups
Let be a finite group and be a finite dimensional complex vector space. Also let be the set of all invertible linear operators on . A linear representation of is a homomorphism from to . Sometimes this representation is also denoted by the pair . The dimension of the vector space is called the dimension of the representation. is called the -module corresponding to the representation in this case. Let be the group algebra consisting of complex linear combinations of elements of . In particular taking , we define the right regular representation of by
A vector subspace of is said to be stable ( or ‘invariant’) under if for all in . The representation is irreducible if is non-trivial and has no non-trivial proper stable subspace. Two representations and of are said to be isomorphic if there exists an invertible linear map such that the following diagram commutes for all :
For each , can also be thought of as an invertible complex matrix of size . The trace of the matrix is said to be the character value of at and is denoted by . It can be easily seen that the character values are constant on conjugacy classes, hence characters are class functions. If denotes the complex conjugate of , then one can check that for all . Let be the complex vector space of class functions of . Then a ‘standard’ inner product on is defined as follows:
An important theorem in this context is the following [26, Theorem 6]: The characters corresponding to the non-isomorphic irreducible representations of form an -orthonormal basis of .
If denotes the tensor product of the vector spaces and , then the tensor product of two representations and is a representation denoted by and defined by,
We use some results from the representation theory of finite groups without recalling their proofs. For details on representations of finite groups, see [21, 23, 26].
1.2. Discrete time Markov chain with finite state space
Let be a finite set. A sequence of random variables is a discrete time Markov chain with state space and transition matrix if for all , all , and all events satisfying , we have
(2)
Equation (2) says that given the present, the future is independent of the past. Let denote the distribution after transitions, i.e. is the row (probability) vector . Then for all , which implies . In particular if the chain starts at , then its distribution after transitions is , i.e. . Here is defined on as follows:
A Markov chain is said to be irreducible if it is possible for the chain to reach any state from any other state using only transitions of positive probability. The period of a state is defined to be the greatest common divisor of the set of all times at which it is possible for the chain to return to the starting state . All states of an irreducible Markov chain have the same period [15, Lemma 1.6]. An irreducible Markov chain is said to be aperiodic if the common period of all its states is . A probability distribution is said to be a stationary distribution of the Markov chain if . Any irreducible Markov chain possesses a unique stationary distribution with for all [15, Proposition 1.14]. Moreover, if the chain is aperiodic then as [15, Theorem 4.9]. For an irreducible chain, we first define the -distance between the distribution after transitions and the stationary distribution.
Definition 1.2.
Let denote the distribution after transitions of an irreducible discrete time Markov chain with finite state space , and denote its stationary distribution. Then the -distance between and is defined by
We now define the total variation distance between two probability measures.
Definition 1.3.
Let and be two probability measures on . The total variation distance between and is defined by
It can be easily seen that (see [15, Proposition 4.2]).
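For distributions on a finite set, both distances can be written out directly. The sketch below (hypothetical helper names) also makes the bound just quoted easy to verify numerically: by the Cauchy-Schwarz inequality, the total variation distance is at most half of the l2-distance taken with respect to a strictly positive stationary distribution.

```python
import math

# The two distances used in the paper, for distributions given as
# dictionaries over a common finite state space.

def total_variation(mu, nu):
    """Total variation distance: half the l^1 distance."""
    return 0.5 * sum(abs(mu[x] - nu[x]) for x in mu)

def l2_distance(mu, nu):
    """l^2(nu)-distance of mu from nu, i.e. ||mu/nu - 1|| in l^2(nu);
    nu must be strictly positive (e.g. a stationary distribution)."""
    return math.sqrt(sum((mu[x] / nu[x] - 1.0) ** 2 * nu[x] for x in mu))
```

Indeed, the l^1 norm of mu − nu equals the nu-weighted l^1 norm of mu/nu − 1, which Cauchy-Schwarz bounds by the l^2(nu) norm, giving TV ≤ l2/2.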
For an irreducible and aperiodic chain, the quantity of interest is the minimum number of transitions required to get close to stationarity within a given tolerance . We first define the maximal -distance (respectively total variation distance) between the distribution after transitions and the stationary distribution as follows:
For , the -mixing time (respectively total variation mixing time) with tolerance level is defined by
Most of the notations of this subsection are borrowed from [15].
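The convergence statements above can be illustrated numerically. The 3-state birth-and-death chain below is an example chosen here for illustration (it is not from the paper): it is irreducible and aperiodic, and the rows of its matrix powers converge to the stationary distribution.

```python
# Rows of P^t converge to the stationary distribution pi as t grows,
# for an irreducible, aperiodic chain.

def mat_mult(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]

pi = [0.25, 0.50, 0.25]   # stationary: one checks pi P = pi directly

Pt = P
for _ in range(49):       # Pt = P^50
    Pt = mat_mult(Pt, P)
# every row of Pt is now numerically indistinguishable from pi
```

The second largest eigenvalue of this particular matrix is 1/2, so the distance to stationarity decays like 2^{-t}, a toy instance of the spectral control of convergence rates discussed above.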
1.3. Non-commutative Fourier analysis and random walks on finite groups
Let be two probability measures on a finite group . We define the convolution of and by . The Fourier transform of at the right regular representation is defined by the matrix . The matrix can be thought of as the action of the group algebra element on by multiplication on the right. It can be easily seen that .
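On a cyclic group, where every irreducible representation is one-dimensional, the Fourier transform reduces to the classical discrete Fourier transform, and the multiplicativity of the Fourier transform under convolution can be checked directly. The sketch below uses the cyclic group as a stand-in for a general finite group; the function names are hypothetical.

```python
import cmath

# Convolution of probability measures on Z/nZ, and the Fourier
# transform at the (one-dimensional) characters of Z/nZ.

def convolve(mu, nu):
    """(mu * nu)(x) = sum_y mu(x - y) nu(y) on Z/nZ (measures as lists)."""
    n = len(mu)
    return [sum(mu[(x - y) % n] * nu[y] for y in range(n)) for x in range(n)]

def fourier(mu, k):
    """Fourier coefficient of mu at the character x -> e^{2 pi i k x / n}."""
    n = len(mu)
    return sum(mu[x] * cmath.exp(2j * cmath.pi * k * x / n) for x in range(n))
```

Convolution powers of a measure whose support generates the group flatten toward the uniform distribution, which is the abelian shadow of the random walk convergence discussed next.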
A random walk on a finite group driven by a probability measure is a Markov chain with state space and transition probabilities , . The transition matrix is , and the distribution after the th transition is (the convolution of with itself times); i.e., the probability of reaching state from state after transitions is . The random walk on driven by is irreducible if and only if the support of generates [24, Proposition 2.3]. The stationary distribution for an irreducible random walk on driven by is the uniform distribution on (since for all ). From now on, the uniform distribution on a group will be denoted by . For the random walk on driven by , it suffices to focus on and because,
for any two elements . We now define the cutoff phenomenon for a sequence of random walks on finite groups.
Definition 1.4.
Let be a sequence of finite groups. For each , let be a probability measure on such that the support of generates . Consider the sequence of irreducible and aperiodic random walks on driven by . We say that the -cutoff phenomenon (respectively total variation cutoff phenomenon) holds for the family if there exists a sequence of positive real numbers tending to infinity as , such that the following hold:
(1) For any and ,
(2) For any and ,
Here denotes the floor of (the largest integer less than or equal to ).
Informally, we will say that has an -cutoff (respectively total variation cutoff) at time . This says that for sufficiently large , the leading order term of the mixing time does not depend on the tolerance level . In other words, the distribution after transitions is very close to the stationary distribution if , but far from it if . Although in most cases the cutoff phenomenon depends on the multiplicity of the second largest eigenvalue of the transition matrix [7], sometimes the behaviour is different: for the random-to-random shuffle [4], many eigenvalues are almost equal to the second largest eigenvalue, and they affect the (total variation) cutoff time. We now show that the random walk under consideration is irreducible and aperiodic.
Proposition 1.4.
The warp-transpose top with random shuffle on is irreducible and aperiodic.
Proof.
The support of is and it can be easily seen that is a generating set of .
(3)
Thus (3) implies that generates , and hence the warp-transpose top with random shuffle on is irreducible. Moreover, given any , the set of all times at which it is possible for the chain to return to the starting state contains the integer (as the support of contains the identity element of ). Therefore the period of the state is , and hence, by irreducibility, all states of this chain have period . Thus the chain is aperiodic. ∎
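The generating-set argument can be checked by brute force in a small case. The sketch below takes the base group to be the integers modulo 2 and n = 3, builds generators of the shapes described in the shuffle (a colour change at the top position, and a transposition with the top position combined with opposite colour updates; the sign convention is an assumption), and verifies that their closure is the whole group of order 2^3 · 3! = 48.

```python
# Brute-force irreducibility check in a small case: the support of the
# driving measure generates the complete monomial group (Z/mZ) wr S_n.

def multiply(a, b, m):
    """Wreath-product multiplication, under an assumed convention."""
    (x, pi), (y, sigma) = a, b
    n = len(x)
    return (tuple((x[sigma[i]] + y[i]) % m for i in range(n)),
            tuple(pi[sigma[i]] for i in range(n)))

def support(n, m):
    """Generator shapes of the warp-transpose top shuffle (assumed)."""
    top = n - 1
    gens = []
    for g in range(m):                    # colour change at the top card
        x = [0] * n; x[top] = g
        gens.append((tuple(x), tuple(range(n))))
    for i in range(n - 1):                # transpose (i, top), colours g / -g
        for g in range(m):
            x = [0] * n; x[i] = g; x[top] = (-g) % m
            perm = list(range(n)); perm[i], perm[top] = perm[top], perm[i]
            gens.append((tuple(x), tuple(perm)))
    return gens

def closure(gens, m):
    """Breadth-first closure of gens under group multiplication."""
    seen, frontier = set(gens), list(gens)
    while frontier:
        nxt = []
        for a in frontier:
            for g in gens:
                c = multiply(a, g, m)
                if c not in seen:
                    seen.add(c); nxt.append(c)
        frontier = nxt
    return seen
```

For m = 2 and n = 3 the closure has all 48 elements, so the support indeed generates the whole group in this toy case.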
Proposition 1.4 says that the warp-transpose top with random shuffle on converges to the uniform distribution as the number of transitions goes to infinity. In Section 2 we will find the spectrum of . We will prove Theorems 1.1 and 1.2 in Section 3. In Section 4, we will obtain an upper bound of using the coupling argument. In Section 5, the lower bound of will be discussed and Theorem 1.3 will be proved.
2. Spectrum of the transition matrix
In this section we find the eigenvalues of the transition matrix , the Fourier transform of at the right regular representation of . To find the eigenvalues of we will use the representation theory of the wreath product of with the symmetric group . First we briefly discuss the representation theory of , following the notation from [17]. We refer to the exposition [17] for more details on the representation theory of .
A partition of a positive integer (denoted ) is a weakly decreasing finite sequence of positive integers such that . The partition can be pictorially visualised as a left-justified arrangement of rows of boxes with boxes in the th row, . This pictorial arrangement of boxes is known as the Young diagram of . For example, there are five partitions of the positive integer , viz. , , , and . The Young diagrams corresponding to the partitions of are shown in Figure 2.
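Partitions are easy to enumerate programmatically; the generator below yields each partition as a weakly decreasing tuple of row lengths, i.e., the row profile of the corresponding Young diagram.

```python
# Enumerate partitions of n as weakly decreasing tuples of parts.

def partitions(n, max_part=None):
    """Yield all partitions of n with every part at most max_part."""
    if max_part is None or max_part > n:
        max_part = n
    if n == 0:
        yield ()            # the unique empty partition (empty diagram)
        return
    for k in range(max_part, 0, -1):
        for rest in partitions(n - k, k):
            yield (k,) + rest
```

For n = 4 this yields exactly the five partitions mentioned above: (4), (3,1), (2,2), (2,1,1) and (1,1,1,1).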
Definition 2.1.
Let denote the set of all Young diagrams (there is a unique Young diagram with zero boxes) and denote the set of all Young diagrams with boxes. For example, elements of are shown in Figure 2. For a finite set , we define . For , define , where is the number of boxes of the Young diagram and define .
Let be a fixed positive integer. Let denote the (finite) set of all non-isomorphic irreducible representations of . Given , we denote by the corresponding irreducible -module (the space of the corresponding irreducible representation of ). Elements of are called Young -diagrams, and elements of are called Young -diagrams with boxes. For example, if and (the additive group of integers modulo ), then an element of is given in Figure 3. Let . A Young tableau of shape is obtained by taking the Young diagram and filling its boxes (bijectively) with the numbers . A Young tableau is said to be standard if the numbers in the boxes strictly increase along each row and each column of the Young diagram of . The set of all standard Young tableaux of shape is denoted by . Elements of are listed in Figure 4. Let . A Young -tableau of shape is obtained by taking the Young -diagram and filling its boxes (bijectively) with the numbers . A Young -tableau is said to be standard if the numbers in the boxes strictly increase along each row and each column of all Young diagrams occurring in . Let , where , denote the set of all standard Young -tableaux of shape and let . An element of is given in Figure 5; its shape is given in Figure 3.
Definition 2.2.
Let and . If appears in the Young diagram , where is the shape of and , we write . For the example given in Figure 5, we have , , , , , , , , , . The content of a box in row and column of a Young diagram is the integer . Let denote the box of in which the number resides, and let denote the content of the box . For the example given in Figure 5, we also have , , , , , , , , , .
The irreducible representations of can be parametrised by elements of [17, Lemma 6.2 and Theorem 6.4]. Given , denotes the set of all Young -diagrams obtained from by removing one of the inner corners of the Young diagram ; see Figure 6 for an example. The branching rule [17, Theorem 6.6] for the pair is given as follows: Let (respectively ) denote the irreducible -module (respectively -module) indexed by (respectively ). Then,
(4)
where is the irreducible -module indexed by . For an illustration of (4), consider from Figure 3, and recall from Figure 6. Then
Here we have used the same notation for a singleton set and its element.
Definition 2.3.
Let be the subgroup
of for . In particular and .
The subgroup is isomorphic to (the direct product of and ) via the isomorphism sending to . Here we use the same notation for permutations of and of , for . The irreducible -modules are given by tensor products of irreducible -modules with irreducible -modules [26, Theorem 10]. Therefore we may parametrise the irreducible representations of by elements of . The branching rule for the pair is given as follows: Let (respectively ) denote the irreducible -module (respectively -module) indexed by (respectively ). Then,
(5)
In particular, (irreducible -module), for . To illustrate (5), consider from Figure 3, and recall from Figure 6. Then
Here we have used the same notation for a singleton set and its element.
Remark 2.1.
Although the simple (multiplicity-free) branching of the pair was established in [17, Section 4], no explicit branching rule was given there. To see the branching rule (5), we note the following: a straightforward generalisation of [18, Theorem 4.3] proves the branching rule (5) when . In view of the isomorphism , it suffices to prove (5) for .
Definition 2.4.
The (generalised) Young-Jucys-Murphy elements of or are given by and
The Young-Jucys-Murphy elements generate a maximal commutative subalgebra of and act as scalars on the Gelfand-Tsetlin subspaces of irreducible -modules. We now define the Gelfand-Tsetlin subspaces and the Gelfand-Tsetlin decomposition.
Let and consider the irreducible -module (the space for the representation ). Since the branching is simple (recall (5)), the decomposition into irreducible -modules is given by
where the sum is over all with (i.e., there is an edge from to in the branching multi-graph), and the decomposition is canonical. Here we note that and . Iterating this decomposition down to irreducible -submodules, we obtain
(6)
where the sum is over all possible chains with and . We call (6) the Gelfand-Tsetlin decomposition of and each in (6) a Gelfand-Tsetlin subspace of . We note that if from the definition of .
Theorem 2.2 ([17, Theorem 6.5]).
Let . Then we may index the Gelfand-Tsetlin subspaces of by standard Young -tableaux of shape and write the Gelfand-Tsetlin decomposition as
where each is closed under the action of and as a -module, is isomorphic to the irreducible -module
For the eigenvalues of on are given by . Here we recall and from Definition 2.2, is the irreducible -module indexed by , and is the th Young-Jucys-Murphy element of .
Theorem 2.3 ([17, Theorem 6.7]).
Let . Write the elements of as and set for each . Then
Here denotes the number of standard Young tableaux of shape , for each .
Lemma 2.4.
Let G be a finite group and . If (respectively ) denotes the irreducible -module (respectively character) and is the dimension of , then the action of the group algebra element on is given by the following scalar matrix
Here is the identity matrix of order and is the trivial representation of .
Proof.
It is clear that is in the centre of . Therefore by Schur’s lemma ([26, Proposition 4]), we have for some . The value of can be obtained by equating the traces of and . ∎
Remark 2.5.
Let and , where (the trivial representation of ). We write as the tuple , where for each . We also denote by the irreducible -module corresponding to , and for each . Thus , and depend on , i.e., on . To avoid notational complications, the dependence of , and on is suppressed. We note that for , the dimension of is .
Theorem 2.6.
For each , let denote the restriction of to the irreducible -module . Then the eigenvalues of are given by
for each .
Proof.
3. Order of the mixing time and -cutoff
In this section, using the spectrum of the transition matrix , we find upper bounds for and when . We also prove Theorems 1.1 and 1.2. Before proving the main results of this section, we first set some notation and prove two useful lemmas. For any positive integer , we write to denote that is a partition of . Given a partition of the integer (here we allow to take the value ), throughout this section denotes the largest part of . In particular, if then (as there is a unique Young diagram with zero boxes) and we set .
Theorem 3.1 (Plancherel formula, [5, Theorem 4.1]).
Let and be two functions on the finite group G. Then
where the sum is over all irreducible representations of and is the dimension of .
Recall that is the uniform distribution on the group . Then using Lemma 2.4 we have the following
Moreover, given any probability measure on the finite group , we have . Therefore setting , we have the following
(9)
We now state the Diaconis-Shahshahani upper bound lemma. The proof follows from the Cauchy-Schwarz inequality and (9).
Lemma 3.2 ([5, Lemma 4.2]).
Let be a probability measure on a finite group such that for all . Suppose the random walk on driven by is irreducible. Then we have the following
where the sum is over all non-trivial irreducible representations of and is the dimension of .
Definition 3.1.
Let be a non-empty set. Then the indicator function of is denoted by and is defined by
Lemma 3.3.
Let be a positive integer and be any non-negative real number. Then we have
Proof.
An immediate corollary of Lemma 3.3 follows from the fact
Corollary 3.4.
Following the notations of Lemma 3.3, we have
Lemma 3.5.
Let . Recall that (respectively ) denotes the largest part of (respectively its conjugate) for . Then we have
where and for each .
Proof.
Proposition 3.6.
For the warp-transpose top with random shuffle on , we have
for all .
Proof.
Let us recall that , the trivial representation of . Given , throughout this proof we write , where , , and . Now using Lemma 3.2, we have
(12)
First we partition the set into two disjoint subsets as follows:
It can be easily seen that the ’s are disjoint. Therefore, using Theorem 2.6, Remark 2.7, and , the inequality (12) becomes
(13)
The sum of the first two terms on the right hand side of (3) is equal to
(14)
Now, recalling that (respectively ) is the largest part of (respectively its conjugate), we have the following:
This implies
Thus using for , , and for all , the expression in (3) is bounded above by
(15)
The inequality in (3) follows from Corollary 3.4 and Lemma 3.3. Now, recalling , and using Lemma 3.5, the third term on the right hand side of (3) is less than
(16)
We now deal with (16) by considering two separate cases, namely and . Now using
the partial sum corresponding to in (16) is equal to,
(17)
Using for , and the multinomial theorem
the expression in (17) can be written as
(18)
The inequality in (3) follows from Lemma 3.3. As , we have
Thus, writing for , the expression in (3) is less than or equal to
(19)
Now using for all and the expression in (19) is less than or equal to
(20)
Now using the notation to denote , and
the partial sum corresponding to in (16) turns out to be
(21)
where . Using for , and the multinomial theorem
the expression given in (21) is equal to the following
(22)
The inequality in (22) follows from Lemma 3.3. As , we have
Thus, writing for and using , the expression in (22) is less than or equal to
(23)
Now using for all and for all , the expression in (23) is less than or equal to
(24)
Therefore the proposition follows from (3), (3), (20), (3) and for all . ∎
Theorem 3.7.
For the random walk on driven by we have the following:
(1) Let . If , then
(2) For any , if we set , then
Proof.
Proof of Theorem 1.1.
Let , and let (respectively ) be the -mixing time (respectively total variation mixing time) with tolerance level for the warp-transpose top with random shuffle on . We choose such that . Then the first part of Theorem 3.7 ensures the existence of a positive integer such that the following hold for all :
Finally, using for all , we can conclude that
Thus the theorem follows. ∎
We now establish a lower bound of that will be useful in proving the -cutoff.
Proposition 3.8.
For large , we have
Proof.
Recall that the irreducible representations of are parametrised by the elements of . We now use Theorem 2.6 to compute the eigenvalues of the restriction of to some irreducible -modules. The eigenvalues of the restriction of to the irreducible -module indexed by
(27)
are given below.
Eigenvalues:
Multiplicities:
The eigenvalues of the restriction of to the irreducible -modules indexed by Young -diagram with boxes of the following form
(28)
are given below.
Eigenvalues:
Multiplicities:
Now (9) implies
(29)
Here ‘’ means ‘ is asymptotic to ’ i.e. as . We have used and to obtain (29). Therefore (29) implies
for large . ∎
Proof of Theorem 1.2.
For any , the second part of Theorem 3.7 implies
(30)
Again, Proposition 3.8 implies the following
(31)
for large . The right hand side of the inequality (31) tends to infinity as . Therefore from (30) and (31), we can conclude that the warp-transpose top with random shuffle on satisfies the -cutoff phenomenon with cutoff time . This completes the proof of Theorem 1.2. ∎
4. Upper bound for total variation distance
In this section, we use a coupling argument to obtain an upper bound for when . Our method uses the known upper bound for the transpose top with random shuffle [5, Theorem 5.1] together with a coupling argument. Let us first recall the notion of a Markov chain coupling.
Definition 4.1.
A coupling of Markov chains with transition matrix is a process such that both and are Markov chains with transition matrix , possibly with different initial distributions.
Given a coupling of a Markov chain with transition matrix , suppose that is a random time such that
Then for every pair of initial distributions,
(32)
The coupling is called successful if
The random time is known as the coupling time. The coupling is called maximal if equality holds in (32), i.e.
Theorem 4.1 ([14, Theorem 4]).
Any Markov chain has a maximal coupling achieving equality in (32). Thus there exists a successful maximal coupling for if and only if is weakly ergodic, i.e., irrespective of the initial distribution, the chain converges to a unique distribution as the number of transitions approaches infinity.
In particular, an irreducible and aperiodic random walk on a finite group is weakly ergodic.
Before proceeding to the main theorem, we recall the coupon collector problem [15, Section 2.2] and prove a useful lemma. Suppose a shop sells different types of coupons. A collector visits the shop and buys coupons, each of which is equally likely to be any of the types; the collector desires a complete set of all types. Let be the minimum (random) number of coupons the collector must buy to obtain all types; is usually known as the coupon collector random variable. Then we have the following [15, Proposition 2.4]:
(33)
Let be the minimum (random) number of coupons the collector must buy to obtain a coupon of type . Then we define the twisted coupon collector random variable as follows:
(34)
i.e., is the minimum (random) number of coupons the collector must buy so that every type is collected at least once and the last collected coupon is of type .
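The coupon collector random variable is straightforward to simulate; its expectation is n(1 + 1/2 + ... + 1/n), consistent with the n log n behaviour used above. The sketch below is illustrative (sample size and tolerance are ad-hoc choices); simulating the twisted variant only additionally requires recording the type of the last coupon drawn.

```python
import random

# Simulation of the coupon collector random variable: the number of
# uniform draws needed to see all n coupon types at least once.

def coupon_collector_time(n, rng):
    """Draw uniform coupons until all n types are seen; return the count."""
    seen, t = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        t += 1
    return t
```

Averaging over many runs recovers the harmonic-sum mean n H_n to within sampling error.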
Lemma 4.2.
For the twisted coupon collector random variable defined in (34), we have
Proof.
The definition of the twisted coupon collector random variable implies that
Therefore,
Theorem 4.3.
For the random walk on driven by we have the following:
- (1)
-
(2)
For any ,
Proof.
We construct a coupling for the warp-transpose top with random shuffle using a successful maximal coupling of the transpose top with random shuffle on . Let be the identity element of , and be a random element of (chosen uniformly). Thus the last element of is the identity permutation , and the last element of is a random permutation (say) .
First we focus on the transpose top with random shuffle on . Let us define the probability measure on that generates the transpose top with random shuffle on .
(35)
where denotes the transposition in interchanging and ; here we set . Recall that the transpose top with random shuffle is irreducible and aperiodic. Thus , the distribution after transitions, converges to the unique stationary distribution as . Therefore, Theorem 4.1 ensures the existence of a successful maximal coupling for the transpose top with random shuffle (on ) such that
- starts at the identity permutation and starts at .
- Both and evolve according to the law , i.e., and .
- is the coupling time, i.e., , , and
(36)
Define by , and by . In other words, is obtained from , and is obtained from . Thus the sequence (and likewise ) consists of independent uniformly distributed random variables taking values in . We note that and depend on each other via the coupling .
The known upper bound of the transpose top with random shuffle on (see [5, Theorem 5.1]) provides
Therefore, using (36), we have the following:
(37)
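Since the displayed bound concerns the transpose top with random shuffle, its convergence can be seen concretely on a small example. The sketch below (with the illustrative choice n = 4; all identifiers are ours) evolves the exact distribution of the walk started at the identity, where each step applies the identity or a transposition with the top position, each with probability 1/n, and records the total variation distance to the uniform distribution, which is nonincreasing and tends to 0.

```python
import itertools
from collections import defaultdict

n = 4  # illustrative deck size; the paper's n is general
perms = list(itertools.permutations(range(n)))
uniform = 1.0 / len(perms)

def transpose_top(p, i):
    """Transpose positions i and n-1 (the 'top') in the arrangement p."""
    q = list(p)
    q[i], q[n - 1] = q[n - 1], q[i]
    return tuple(q)

# Point mass at the identity, evolved by the exact transition kernel:
# each step picks i uniformly from {1, ..., n}; i = n gives the pass move.
dist = {p: 0.0 for p in perms}
dist[tuple(range(n))] = 1.0
tv_trace = []
for _ in range(16):
    new = defaultdict(float)
    for p, mass in dist.items():
        for i in range(n):  # i = n-1 leaves p unchanged (identity move)
            new[transpose_top(p, i)] += mass / n
    dist = dict(new)
    tv = 0.5 * sum(abs(m - uniform) for m in dist.values())
    tv_trace.append(tv)

print(tv_trace[0], tv_trace[-1])  # the distance decreases toward 0
```

Evolving the full distribution is feasible only for tiny n, but it gives the exact total variation curve rather than a Monte Carlo estimate.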
Now we construct the coupling as follows:
- Recall that , and starts from a random element of .
- For , set
where we choose such that the th positions of and match; also choose such that the th positions of and match. We note that may happen.
Under the above coupling , if a position matches in and , then that position agrees in and for . Therefore, we have the following:
- The first positions of and match when (defined in (34)), as they are updated by time with the final update at position .
- The last position of and matches when (the coupling time for ), but does not match when .
Thus, the coupling time for cannot exceed ,
Hence, by using (32), we have
(38)
for and the constant in (37). The inequality in (4) follows from (37) and Lemma 4.2. The first part of this theorem follows from the fact that decreases as increases, and the second part follows from the first part by setting . ∎
Remark 4.4.
We now describe an explicit coupling (not necessarily optimal) for the transpose top with random shuffle on , which provides a proof of (37) without using Theorem 4.1. We use the same notation (respectively ) for the coupling (respectively the coupling time). Let and . The coupling is described as follows: let , i.e., the set of positions where and agree. Now choose uniformly at random.
- If , then and . Thus .
- If but , then and , where . Thus .
- If but , then and . Thus .
- If and , then and , where . Thus .
Thus we have the following: each element of is equally likely to be added to if , implies for all , implies , and when . Therefore , where is the coupon collector random variable satisfying (33). Finally, (37) follows by using the fact
5. Lower bound for total variation distance
This section focuses on the lower bound of for and proves Theorem 1.3. The idea is to use the fact that a projected chain mixes at least as fast as the original chain. We define a group homomorphism from onto the symmetric group which projects the warp-transpose top with random shuffle on to the transpose top with random shuffle on . Recall the lower bound results for the transpose top with random shuffle on from [13, Section 4]. Although the detailed analysis for the transpose top with random shuffle on first appeared in [5, Chapter 5(C), p. 27], the lower bound results we need here are directly available in [13, Section 4].
Recall that the transpose top with random shuffle is the random walk on driven by (defined in (35)), and is the distribution after transitions. Then as . Given , if denotes the number of fixed points in , then we have
(39)
where (respectively ) denotes the expected value of (respectively ) with respect to the probability measure . Now recall the expressions for and obtained in [13].
(40)
(41)
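The role of the number of fixed points as a distinguishing statistic can be illustrated exactly for small n: under the uniform distribution on the symmetric group the expected number of fixed points is exactly 1, while after only a few steps of the transpose top with random shuffle it is noticeably larger. The sketch below (n = 5 and six steps, chosen purely for illustration; identifiers are ours) computes both expectations exactly by evolving the full distribution.

```python
import itertools
from collections import defaultdict

n = 5  # illustrative; the lower-bound argument takes n large
perms = list(itertools.permutations(range(n)))

def fixed_points(p):
    """Number of indices that p maps to themselves."""
    return sum(1 for i, v in enumerate(p) if v == i)

# Under the uniform distribution the expected number of fixed points is 1:
# each position is fixed in (n-1)! of the n! permutations.
unif_mean = sum(fixed_points(p) for p in perms) / len(perms)

def step(dist):
    """One transition of the transpose top with random shuffle:
    transpose a uniformly chosen position with the top (i = n-1 is a pass)."""
    new = defaultdict(float)
    for p, mass in dist.items():
        for i in range(n):
            q = list(p)
            q[i], q[n - 1] = q[n - 1], q[i]
            new[tuple(q)] += mass / n
    return new

dist = {tuple(range(n)): 1.0}
means = []
for _ in range(6):
    dist = step(dist)
    means.append(sum(fixed_points(p) * m for p, m in dist.items()))

print(unif_mean)  # exactly 1.0
print(means)      # approaches 1 as the walk mixes
```

After one step the mean is 3.4: with probability 1/5 the walk stays at the identity (5 fixed points), otherwise it is a transposition (3 fixed points).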
Let us define a homomorphism from onto as follows:
(42)
It can be checked that the mapping defined in (42) is a surjective homomorphism. Moreover, projects the warp-transpose top with random shuffle on to the transpose top with random shuffle on , i.e., . Here is defined by for . We now prove a lemma that will be useful in proving the main result of this section.
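Before the lemma, the projection onto the permutation part can be made concrete. In the sketch below (our own encoding, with G = Z/2 and n = 3 purely for illustration), an element of the complete monomial group is a pair (vector of group elements, permutation), multiplication twists the vector by the standard wreath-product action, and the map forgetting the vector is checked to be a homomorphism on random pairs.

```python
import itertools
import random

n, m = 3, 2  # illustrative: G = Z/2, coordinates permuted by S_3

def compose(p, q):
    """Permutation composition p after q, with permutations as tuples."""
    return tuple(p[q[i]] for i in range(n))

def multiply(x, y):
    """Multiplication in G wr S_n: (g, p)(h, q) = (g + p.h, p q),
    where (p.h)_i = h_{p^{-1}(i)} is the standard wreath-product twist."""
    (g, p), (h, q) = x, y
    p_inv = tuple(p.index(i) for i in range(n))
    twisted = tuple((g[i] + h[p_inv[i]]) % m for i in range(n))
    return (twisted, compose(p, q))

def phi(x):
    """Projection onto the symmetric group: forget the G-part."""
    return x[1]

rng = random.Random(0)
perms = list(itertools.permutations(range(n)))
for _ in range(200):
    x = (tuple(rng.randrange(m) for _ in range(n)), rng.choice(perms))
    y = (tuple(rng.randrange(m) for _ in range(n)), rng.choice(perms))
    # phi(xy) = phi(x)phi(y): the twist only affects the G-part.
    assert phi(multiply(x, y)) == compose(phi(x), phi(y))
print("homomorphism property verified on 200 random pairs")
```

The check is convention-independent: any semidirect-product multiplication whose permutation part is the product of the permutation parts makes the forgetful map a homomorphism.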
Lemma 5.1.
For any positive integer we have .
Proof.
We use the first principle of mathematical induction on . The base case for is true by definition. Now assume the induction hypothesis, i.e., for some positive integer . Let be chosen arbitrarily. Then for the inductive step we have the following:
(43)
Now using the fact that is a homomorphism, we have the following:
Therefore the expression in (5) becomes
Thus the lemma follows from the first principle of mathematical induction. ∎
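Lemma 5.1 is an instance of a general fact: pushing a distribution forward along a group homomorphism commutes with convolution. The toy check below (Z/6 projected onto Z/3 by reduction mod 3, with a randomly chosen step distribution; every choice here is ours for illustration, not the paper's setting) verifies this numerically for a 4-fold convolution.

```python
import random

def convolve(p, q, add):
    """Convolution of two distributions on a finite group with law `add`."""
    r = [0.0] * len(p)
    for x, px in enumerate(p):
        for y, qy in enumerate(q):
            r[add(x, y)] += px * qy
    return r

def pushforward(p, phi, size):
    """Image of the distribution p under the homomorphism phi."""
    r = [0.0] * size
    for x, px in enumerate(p):
        r[phi(x)] += px
    return r

rng = random.Random(0)
weights = [rng.random() for _ in range(6)]
total = sum(weights)
P = [w / total for w in weights]      # a step distribution on Z/6
phi = lambda x: x % 3                  # homomorphism Z/6 -> Z/3

# 4-fold convolution on Z/6, then project ...
Pt = P
for _ in range(3):
    Pt = convolve(Pt, P, lambda x, y: (x + y) % 6)
lhs = pushforward(Pt, phi, 3)

# ... versus projecting first and convolving on Z/3.
Q = pushforward(P, phi, 3)
rhs = Q
for _ in range(3):
    rhs = convolve(rhs, Q, lambda x, y: (x + y) % 3)

assert all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs))
print("pushforward of the 4-fold convolution equals the 4-fold convolution of the pushforward")
```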
Theorem 5.2.
For the random walk on driven by we have the following:
- (1) For large ,
when and .
- (2) For any ,
Proof.
We know that, given two probability distributions and on and a mapping , we have , where is finite [15, Lemma 7.9]. Therefore we have the following:
(44)
using . Therefore, (39), (40), (41), and (5) imply that
(45)
for sufficiently large . The equality in (5) holds because of the following:
Now if is large, and , then by (5), we have the first part of this theorem.
Proof of Theorem 1.3.
Acknowledgement
I extend sincere thanks to my PhD advisor Arvind Ayyer for all the insightful discussions during the preparation of this paper. I am very grateful to the anonymous referees of the Journal of Algebraic Combinatorics for many constructive suggestions. I would like to thank an anonymous referee of Algebraic Combinatorics for the valuable comments, which helped improve the total variation upper bound result and simplify the proof of the total variation lower bound. I am grateful to Professor Tullio Ceccherini-Silberstein for his encouragement and inspiring comments. I would also like to thank Guy Blachar, Ashish Mishra, and Shangjie Yang for helpful discussions. I acknowledge support in part from a UGC Centre for Advanced Study grant.
Statements and Declarations
The author has no conflict of interest to disclose. This article is part of the author's PhD dissertation. An extended abstract of this article was accepted at FPSAC 2020 (online).
Data Availability Statements
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
Copyright Statements
This version of the article has been accepted for publication in the Journal of Algebraic Combinatorics: An International Journal, after peer review but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: http://dx.doi.org/10.1007/s10801-023-01271-1. Use of this Accepted Version is subject to the publisher’s Accepted Manuscript terms of use https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms.
References
- [1] David Aldous. Random walks on finite groups and rapidly mixing Markov chains. In Seminar on probability, XVII, volume 986 of Lecture Notes in Math., pages 243–297. Springer, Berlin, 1983.
- [2] David Aldous and Persi Diaconis. Shuffling cards and stopping times. Amer. Math. Monthly, 93(5):333–348, 1986.
- [3] David Aldous and Persi Diaconis. Strong uniform times and finite random walks. Adv. in Appl. Math., 8(1):69–97, 1987.
- [4] Megan Bernstein and Evita Nestoridi. Cutoff for random to random card shuffle. Ann. Probab., 47(5):3303–3320, 2019.
- [5] Persi Diaconis. Applications of non-commutative Fourier analysis to probability problems. In École d'Été de Probabilités de Saint-Flour XV–XVII, 1985–87, pages 51–100. Springer, 1988.
- [6] Persi Diaconis. Group representations in probability and statistics, volume 11 of Institute of Mathematical Statistics Lecture Notes—Monograph Series. Institute of Mathematical Statistics, Hayward, CA, 1988.
- [7] Persi Diaconis. The cutoff phenomenon in finite Markov chains. Proc. Nat. Acad. Sci. U.S.A., 93(4):1659–1664, 1996.
- [8] Persi Diaconis and Mehrdad Shahshahani. Generating a random permutation with random transpositions. Z. Wahrsch. Verw. Gebiete, 57(2):159–179, 1981.
- [9] James Allen Fill and Clyde H. Schoolfield, Jr. Mixing times for Markov chains on wreath products and related homogeneous spaces. Electron. J. Probab., 6:no. 11, 22, 2001.
- [10] L. Flatto, A. M. Odlyzko, and D. B. Wales. Random shuffles and group representations. Ann. Probab., 13(1):154–178, 1985.
- [11] Subhajit Ghosh. Cutoff for the warp-transpose top with random shuffle. Sém. Lothar. Combin., 84B:Art. 69, 12, 2020.
- [12] Subhajit Ghosh. Total variation cutoff for the transpose top-2 with random shuffle. J. Theoret. Probab., 33(4):1832–1854, 2020.
- [13] Subhajit Ghosh. Total variation cutoff for the flip-transpose top with random shuffle. ALEA Lat. Am. J. Probab. Math. Stat., 18(1):985–1006, 2021.
- [14] David Griffeath. A maximal coupling for Markov chains. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 31:95–106, 1974/75.
- [15] David A. Levin, Yuval Peres, and Elizabeth L. Wilmer. Markov chains and mixing times. American Mathematical Society, Providence, RI, 2009. With a chapter by James G. Propp and David B. Wilson.
- [16] Oliver Matheau-Raven. Random walks on the symmetric group: cutoff for one-sided transposition shuffles. PhD thesis, University of York, 2020.
- [17] Ashish Mishra and Murali K. Srinivasan. The Okounkov-Vershik approach to the representation theory of . J. Algebraic Combin., 44(3):519–560, 2016.
- [18] Ashish Mishra and Shraddha Srivastava. On representation theory of partition algebras for complex reflection groups. Algebr. Comb., 3(2):389–432, 2020.
- [19] Evita Nestoridi. The limit profile of star transpositions. arXiv preprint arXiv:2111.03622, 2021.
- [20] Evita Nestoridi and Sam Olesker-Taylor. Limit profiles for reversible Markov chains. Probab. Theory Related Fields, 182(1-2):157–188, 2022.
- [21] Amritanshu Prasad. Representation theory: a combinatorial viewpoint, volume 147. Cambridge University Press, 2015.
- [22] Dana Randall. Rapidly mixing Markov chains with applications in computer science and physics. Computing in Science and Engg., 8(2):30 – 41, 2006.
- [23] Bruce E. Sagan. The symmetric group, volume 203 of Graduate Texts in Mathematics. Springer-Verlag, New York, second edition, 2001. Representations, combinatorial algorithms, and symmetric functions.
- [24] Laurent Saloff-Coste. Random walks on finite groups. In Probability on discrete structures, volume 110 of Encyclopaedia Math. Sci., pages 263–346. Springer, Berlin, 2004.
- [25] Clyde H. Schoolfield, Jr. Random walks on wreath products of groups. J. Theoret. Probab., 15(3):667–693, 2002.
- [26] Jean-Pierre Serre. Linear representations of finite groups. Springer-Verlag, New York-Heidelberg, 1977. Translated from the second French edition by Leonard L. Scott, Graduate Texts in Mathematics, Vol. 42.
- [27] Lucas Teyssier. Limit profile for random transpositions. Ann. Probab., 48(5):2323–2343, 2020.