Random Transpositions on Contingency Tables
Abstract
Contingency tables are useful objects in statistics for representing 2-way data. With fixed row and column sums, and a total of $n$ entries, contingency tables correspond to parabolic double cosets of the symmetric group $S_n$. The uniform distribution on $S_n$ induces the Fisher-Yates distribution, classical for its use in the chi-squared test for independence. A Markov chain on $S_n$ can then induce a random process on the space of contingency tables through the double coset correspondence. The random transpositions Markov chain on $S_n$ induces a natural ‘swap’ Markov chain on the space of contingency tables; the stationary distribution of this Markov chain is the Fisher-Yates distribution. This paper describes the Markov chain and shows that its eigenfunctions are orthogonal polynomials of the Fisher-Yates distribution. Results for the mixing time are discussed, as well as connections with sampling from the uniform distribution on contingency tables, and data analysis.
1 Introduction
Contingency tables are a classical object of study, useful in statistics for representing data with two categorical features. For example, rows could be hair colors and columns could be eye colors. The number in an entry of the table counts the number of individuals from a sample with a given hair color and eye color combination. Various statistical tests exist to test the hypothesis that the row and column features are independent, as well as other more general models. See e.g. [36], [1] and Section 5 of [16] for many more references.
Given two partitions $\lambda$ and $\mu$ of $n$, the set of contingency tables $\mathcal{T}_{\lambda,\mu}$ is the set of tables with non-negative integer entries, row sums $\lambda$, and column sums $\mu$. As explained below, the space $\mathcal{T}_{\lambda,\mu}$ is in bijection with the space of double cosets $S_\lambda \backslash S_n / S_\mu$, where $S_n$ is the symmetric group on $n$ elements and $S_\lambda, S_\mu$ are the Young subgroups defined by $\lambda$ and $\mu$. Though this is a classical fact [29], recently it was studied in [16] with probabilistic motivation. The general question is: Given a finite group $G$ and two subgroups $H, K$, what is the distribution on the double cosets $H \backslash G / K$ induced by picking an element uniformly from $G$ and mapping it to its double coset? For $S_\lambda \backslash S_n / S_\mu$, the induced distribution on $\mathcal{T}_{\lambda,\mu}$ is the Fisher-Yates distribution:
$$\pi_{\lambda,\mu}(T) \;=\; \frac{\prod_{i}\lambda_i!\,\prod_{j}\mu_j!}{n!\,\prod_{i,j}T_{ij}!}. \qquad (1)$$
The Fisher-Yates distribution is classically defined as the conditional distribution of a contingency table sampled with cell probabilities $p_{ij}$ which satisfy $p_{ij} = p_{i\cdot}\, p_{\cdot j}$ (the assumption that row and column features are independent). It is the distribution of the table conditioned on the sufficient statistics (the row and column sums). It is a pleasant surprise that this same distribution is induced by the double coset decomposition.
Whenever there is a Markov chain on some state space, one can consider the lumped process on any partitioning of this space. In [12] the question is studied for double cosets: Given a Markov chain on $G$, is the lumped process on $H \backslash G / K$ also a Markov chain? An affirmative answer is proven when the Markov chain on $G$ is induced by a probability measure on $G$ which is constant on conjugacy classes.
Many Markov chains on $S_n$ have been studied, and perhaps the most famous and easiest to describe is the random transpositions Markov chain [14]. This is generated by the class function which gives equal probability to all transpositions. The Markov chain on contingency tables induced by random transpositions turns out to be novel and easy to describe. One move involves adding a sub-matrix of the form $\left(\begin{smallmatrix} +1 & -1 \\ -1 & +1 \end{smallmatrix}\right)$ to the current table; the probability of selecting the rows/columns for the sub-matrix depends on the current configuration. Since the stationary distribution of the random transpositions chain on $S_n$ is uniform, the induced chain on contingency tables has the Fisher-Yates distribution as its stationary distribution. These statements are proven in Section 1.1 below.
This same move has previously been studied for a Markov chain on contingency tables, with the rows and columns for the sub-matrix chosen uniformly at random (and moves which would give a negative entry in the table are rejected). This ‘uniform swap’ chain was proposed in [11] to sample contingency tables from the uniform distribution, as a way of approximately counting the number of tables, which is a #P-complete problem.
The purpose of this paper is to describe the random transpositions chain on contingency tables, characterize its eigenvalues and eigenvectors, and analyze the mixing time in special cases. This chain has polynomial eigenvectors, which can be used to give orthogonal polynomials for the Fisher-Yates distribution. In the special case of a two-rowed table, the Fisher-Yates distribution is the multivariate hypergeometric distribution and the orthogonal polynomials are the multivariate Hahn polynomials. More generally, this Markov chain gives a characterization of the natural analog of Hahn polynomials for the Fisher-Yates distribution.
The remainder of this section reviews the correspondence between contingency tables and double cosets, defines the Markov chain explicitly, states the main results of this paper, and reviews related work.
1.1 Contingency Tables as Double Cosets
This section describes the relationship between contingency tables and double cosets of $S_n$. Let $n$ be a positive integer and suppose $\lambda = (\lambda_1, \ldots, \lambda_I)$ is a partition of $n$. That is, the $\lambda_i$ are all positive integers, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_I$, and $\lambda_1 + \cdots + \lambda_I = n$.
Definition 1.1 (Young Subgroup).
The parabolic subgroup, or Young subgroup, $S_\lambda \le S_n$ is the set of all permutations in $S_n$ which permute $\{1, \ldots, \lambda_1\}$ only among themselves, $\{\lambda_1 + 1, \ldots, \lambda_1 + \lambda_2\}$ only among themselves, and so on. Thus, $S_\lambda \cong S_{\lambda_1} \times S_{\lambda_2} \times \cdots \times S_{\lambda_I}$.
If $G$ is an arbitrary finite group and $H, K$ are two subgroups of $G$, then the double cosets $H \backslash G / K$ are the equivalence classes defined by the equivalence relation $s \sim t$ if and only if $s = h t k$ for some $h \in H$, $k \in K$.
Let $\mu = (\mu_1, \ldots, \mu_J)$ be a second partition of $n$, and let $\mathcal{T}_{\lambda,\mu}$ denote the space of contingency tables with row sums given by $\lambda$ and column sums $\mu$. That is, an element of $\mathcal{T}_{\lambda,\mu}$ is an $I \times J$ table with non-negative integer entries, row sums $\lambda$, and column sums $\mu$. The following bijection is described in further detail in [29], and the induced distribution, along with many more references, in [16].
Lemma 1.2.
The set of double cosets $S_\lambda \backslash S_n / S_\mu$ is in bijection with $\mathcal{T}_{\lambda,\mu}$. The distribution on $\mathcal{T}_{\lambda,\mu}$ induced by sampling a uniform $\sigma \in S_n$ and mapping it to its corresponding double coset is the Fisher-Yates distribution
$$\pi_{\lambda,\mu}(T) \;=\; \frac{\prod_{i}\lambda_i!\,\prod_{j}\mu_j!}{n!\,\prod_{i,j}T_{ij}!}. \qquad (2)$$
Remark 1.
Without loss of generality, we assume that the rows and columns of the contingency tables are ordered, i.e. the first row (column) has the largest sum and the row (column) sums are non-increasing. This ordering does not affect any of the results in this paper.
The mapping from permutations to tables is easy to describe: Fix $\sigma \in S_n$. Inspect the first $\lambda_1$ positions of $\sigma$. Let $T_{11}$ be the number of elements from $\{1, \ldots, \mu_1\}$ occurring in these positions, $T_{12}$ the number of elements from $\{\mu_1 + 1, \ldots, \mu_1 + \mu_2\}$, ... and $T_{1J}$ the number of elements from the last block. In general, $T_{ij}$ is the number of elements from the $j$th block of $\mu$ which occur in the positions $\lambda_1 + \cdots + \lambda_{i-1} + 1$ up to $\lambda_1 + \cdots + \lambda_i$.
Example 1.3.
When , , there are five possible tables:
Listed below each table is a permutation in the corresponding double coset, and the total size of the double coset. The double coset representatives are chosen to be the ones of minimal length, i.e. requiring the fewest adjacent transpositions to change the permutation to the identity. It is always possible to find a unique shortest double coset representative [4]. This is easy to identify: Given the table, build the permutation sequentially, left to right, block by block, each time putting down the lowest available numbers in the block, in order. In Example 1.3, the shortest double coset representatives are shown.
Thanks to Zhihan Li, another way to describe the mapping from permutations to contingency tables is to consider the permutation matrix defined by $\sigma$. The partitions $\lambda$ and $\mu$ divide the rows and columns of this matrix into blocks. The table corresponding to $\sigma$ is defined by taking $T_{ij}$ to be the number of $1$s in the $i$th block of rows and the $j$th block of columns. For example, one permutation in $S_5$ gives the permutation matrix
$$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix}$$
which defines the table .
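As a concrete illustration of this bijection, here is a short Python sketch (our own, not from the paper; the function name and the convention that rows correspond to blocks of $\lambda$ and columns to blocks of $\mu$ are assumptions) that maps a permutation in one-line notation to its contingency table.

```python
import numpy as np

def table_from_permutation(sigma, lam, mu):
    """Map a permutation (one-line notation, values 1..n) to its contingency
    table: T[i][j] counts how many values from the j-th block of mu appear
    among the positions in the i-th block of lam."""
    n = sum(lam)
    assert sorted(sigma) == list(range(1, n + 1)) and sum(mu) == n
    # block_of_value[v] = index of the mu-block containing the value v
    block_of_value, start = {}, 0
    for j, m in enumerate(mu):
        for v in range(start + 1, start + m + 1):
            block_of_value[v] = j
        start += m
    T = np.zeros((len(lam), len(mu)), dtype=int)
    pos = 0
    for i, l in enumerate(lam):
        for v in sigma[pos:pos + l]:          # values in the i-th block of positions
            T[i, block_of_value[v]] += 1
        pos += l
    return T

# Example: a uniform permutation mapped to a table with row sums (3, 2), column sums (2, 3).
rng = np.random.default_rng(0)
sigma = [int(x) for x in rng.permutation(np.arange(1, 6))]
print(table_from_permutation(sigma, lam=(3, 2), mu=(2, 3)))
```

Sampling the permutation uniformly and tabulating it this way produces a draw from the Fisher-Yates distribution of Lemma 1.2.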
1.2 Markov Chain and Main Results
The random transpositions chain on $S_n$ is easy to describe. If a permutation is viewed as an arrangement of a deck of $n$ unique cards, the random transpositions chain can be described as follows: Pick one card with your left hand and one card with your right hand (allowing for the possibility of picking the same card), and swap the two cards. The transition probabilities are generated by the probability measure $Q$ on $S_n$ defined by
$$Q(\sigma) = \begin{cases} 1/n & \text{if } \sigma = \mathrm{id}, \\ 2/n^2 & \text{if } \sigma = (i\,j) \text{ is a transposition}, \\ 0 & \text{otherwise}, \end{cases} \qquad (3)$$
where $(i\,j)$ denotes the transposition of elements $i$ and $j$. The transition matrix for the random transpositions Markov chain is then $P(\sigma, \eta) = Q(\eta \sigma^{-1})$.
Note that $Q$ is constant on conjugacy classes: If $(i\,j)$ is a transposition and $\sigma \in S_n$, then $\sigma (i\,j) \sigma^{-1} = (\sigma(i)\,\sigma(j))$.
Thus the conjugate of a transposition is again a transposition, and so $Q(\sigma \tau \sigma^{-1}) = Q(\tau)$ for all $\sigma, \tau \in S_n$.
The random transpositions chain on $S_n$ induces a Markov chain on contingency tables as a ‘lumping’. From a table $T$, choose a permutation $\sigma$ that maps to the double coset corresponding to $T$. Sample a transposition from $Q$, apply it to $\sigma$, and move to the contingency table corresponding to the resulting permutation. More details of this lumping are discussed in Section 2.1, and this perspective will be useful for relating the well-studied eigenvalues of the chain on $S_n$ to the chain on contingency tables. It is also possible to write down the transition probabilities independently of the chain on $S_n$.
Definition 1.4 (Random Transpositions Markov chain on $\mathcal{T}_{\lambda,\mu}$).
Let $\lambda, \mu$ be two partitions of $n$. For $T \in \mathcal{T}_{\lambda,\mu}$ and indices $i \neq i'$ and $j \neq j'$, let $T^{(i,i'),(j,j')}$ denote the table with
$$T^{(i,i'),(j,j')}_{ij} = T_{ij} - 1, \quad T^{(i,i'),(j,j')}_{i'j'} = T_{i'j'} - 1, \quad T^{(i,i'),(j,j')}_{ij'} = T_{ij'} + 1, \quad T^{(i,i'),(j,j')}_{i'j} = T_{i'j} + 1,$$
and all other entries of $T^{(i,i'),(j,j')}$ the same as $T$. The random transpositions Markov chain on $\mathcal{T}_{\lambda,\mu}$ is defined by the transition matrix with
$$P\left(T, T^{(i,i'),(j,j')}\right) = \frac{2\, T_{ij}\, T_{i'j'}}{n^2},$$
and $P(T, T)$ equal to the remaining probability, which corresponds to choosing two entries from the same row or the same column.
Note that if $T_{ij} = 0$ or $T_{i'j'} = 0$, then $T^{(i,i'),(j,j')}$ has negative values and is not an element of $\mathcal{T}_{\lambda,\mu}$. Definition 1.4 does not allow selecting this possibility, since such a move is proposed with probability $2\, T_{ij}\, T_{i'j'} / n^2 = 0$.
There are other ways of thinking about the Markov chain. Suppose the contingency table is created from $n$ data pairs of the type $(x, y)$, where $x$ indicates the row feature and $y$ indicates the column feature. One step of the chain is to pick two pairs uniformly (possibly picking the same pair) and swap their column values (or equivalently, swap their row values). This is exactly the swap Markov chain on bipartite graphs, which has been studied for various special cases, e.g. [2]; a simulation sketch of this description is given below.
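The following Python sketch (our own illustration, not code from the paper) implements one step of the chain in this data-pairs form: two of the $n$ pairs are chosen uniformly with replacement and their column labels are swapped, which changes the table exactly as in Definition 1.4.

```python
import random
from collections import Counter

def swap_step(pairs):
    """One step of the induced chain: pairs is a list of (row, col) labels.
    Pick two pairs uniformly (with replacement) and swap their column labels."""
    n = len(pairs)
    a, b = random.randrange(n), random.randrange(n)
    (ra, ca), (rb, cb) = pairs[a], pairs[b]
    pairs[a], pairs[b] = (ra, cb), (rb, ca)   # swap column values
    return pairs

def to_table(pairs):
    """Tabulate the pairs into a contingency table (dict keyed by (row, col))."""
    return Counter(pairs)

# Example: 5 individuals with row sums (3, 2) and column sums (2, 3).
pairs = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1)]
for _ in range(10):
    pairs = swap_step(pairs)
print(to_table(pairs))
```

Each move preserves the row and column margins, so the simulated table always stays in the same state space.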
Example 1.5.
The probability of this transition for the tables is , since there are two possible transpositions in the permutation that would result in the table . Representing the table as a set of data points, this transition is:
Remark 2.
If $\lambda$ and $\mu$ each have two parts, then this chain is related to the Bernoulli-Laplace chain: Consider two urns. The first contains $\lambda_1$ balls and the second has $\lambda_2$ balls. Of these $n$ total balls, $\mu_1$ are green and $\mu_2$ are red. In the contingency table, the rows represent urns and the columns colors. In the traditional Bernoulli-Laplace urn, one ball is picked uniformly from each urn and the two are swapped. The random transpositions chain of Definition 1.4 can instead be described by picking two balls uniformly from all $n$ balls and swapping them (which creates the possibility that both are selected from the same urn).
Using the spherical Fourier transform of the Gelfand pair , Diaconis and Shahshahani proved mixing time of order steps for the Bernoulli-Laplace chain in a special case (for total balls in the system); for thorough analyses of the Bernoulli-Laplace chain, see [15], [21], [6]. An extension of the Bernoulli-Laplace chain with urns is studied in [46]; this state space corresponds to for partitions of with .
Remark 3.
The random transpositions chain is sometimes defined with the probability measure
$$\widetilde{Q}(\sigma) = \begin{cases} 1/\binom{n}{2} & \text{if } \sigma \text{ is a transposition}, \\ 0 & \text{otherwise}. \end{cases}$$
That is, the random walk on $S_n$ is described: pick a card with your left hand and pick a different card with your right hand, then swap the two. This means the permutation will always change at each step, but then the process is periodic. Using $Q$ from (3) (i.e. allowing the possibility of picking the same card) the chain is aperiodic, which allows for easier analysis of the chain on contingency tables.
Note that as long as $\lambda$ and $\mu$ are not both equal to $(1, 1, \ldots, 1)$ (in which case the space of tables is simply $S_n$), the chain using the probabilities $\widetilde{Q}$ is not periodic when lumped to tables: If there is at least one row or column containing at least two entries, then picking two such entries to swap does not change the table. Thus, using $\widetilde{Q}$ to define the induced process on contingency tables is also possible. The analysis and mixing times are the same for both $Q$ and $\widetilde{Q}$, except the chains have slightly different eigenvalues.
The interpretation of the chain as a lumping of the random transpositions chain on $S_n$ is advantageous because the chain on contingency tables inherits its eigenvalues from the chain on $S_n$, and these eigenvalues are well known. Representation theory also gives an expression for the multiplicities of the eigenvalues in terms of Kostka numbers corresponding to partitions. Section 2.1 reviews this background in more detail, and thoroughly explains parts (a) and (b) of the following theorem, which give the results for eigenvalues and multiplicities.
Theorem 1.6.
Let $\lambda, \mu$ be partitions of $n$ and let $P$ be the random transpositions Markov chain on the space of contingency tables $\mathcal{T}_{\lambda,\mu}$.

(a) The eigenvalues are of the form
$$\beta_\nu = \frac{1}{n} + \frac{1}{n^2} \sum_{i} \left[ \nu_i^2 - (2i - 1)\nu_i \right]$$
for $\nu$ a partition of $n$.

(b) The multiplicity of $\beta_\nu$ for $P$ is $K_{\nu\lambda} \cdot K_{\nu\mu}$, where $K_{\nu\lambda}$ is the Kostka number (defined below).

(c) Eigenfunctions of $P$ are the orthogonal polynomials for the Fisher-Yates distribution.
Theorem 1.6 (a) and (b) arise from the general theory for Markov chains on double cosets developed in [12] and [18] and summarized in Section 2.1. Section 3 contains a proof of Theorem 1.6(c), as well as a description of the multiplicities. It seems unlikely that there would be a simple formula for the number and multiplicity of the eigenvalues, as this would give a way of enumerating the size of the state space $\mathcal{T}_{\lambda,\mu}$, which is a #P-complete problem [11]. The orthogonal polynomials of the Fisher-Yates distribution have not been explicitly computed. Section 3.2 computes linear and quadratic eigenfunctions of $P$, for any size table, which can be used as a starting point for finding orthogonal polynomials.
For tables with two rows, the stationary distribution is multivariate hypergeometric, and the computation of the eigenvalues simplifies to give a more complete picture.
Theorem 1.7.
Let $\lambda = (\lambda_1, \lambda_2)$ with $\lambda_1 \ge \lambda_2$, let $\mu$ be any partition of $n$, and let $P$ be the random swap Markov chain on the space of contingency tables $\mathcal{T}_{\lambda,\mu}$. Then the eigenvalues of $P$ are
$$\beta_k = 1 - \frac{2k(n + 1 - k)}{n^2}, \qquad 0 \le k \le \lambda_2,$$
with eigenbasis the orthogonal polynomials of degree $k$ for the multivariate hypergeometric distribution (defined below). The multiplicity of $\beta_k$ is the size of the set
The eigenvalues and eigenvectors allow an analysis of the convergence rate of this Markov chain. The measure that we study for the convergence rate is the mixing time. That is, for a Markov chain on a space $\mathcal{X}$ with transition probability $P$ and stationary distribution $\pi$, the mixing time is defined as
$$t_{\mathrm{mix}}(\epsilon) = \min\left\{ t \ge 0 : \max_{x \in \mathcal{X}} \| P^t(x, \cdot) - \pi \|_{TV} \le \epsilon \right\},$$
where $\| \cdot \|_{TV}$ is the total-variation distance between probability measures.
It is challenging to analyze the mixing time in full generality, but understanding the linear and quadratic eigenfunctions of the chain gives the following results:

- For $2 \times 2$ tables, the time to stationarity, averaged over starting positions sampled from the stationary distribution, is bounded above (Section 3.1).
- For two-rowed tables, the multivariate Hahn polynomials are used to compute an upper bound for the time to stationarity starting from extreme states in which the second row has a single non-zero entry (Section 4.1).
- For any $\lambda, \mu$, the mixing time is bounded below, with a computable constant, using Wilson's method (Section 4.2).
The mixing time of the random transpositions chain on $S_n$ is of order $n \log n$, which is an upper bound for the mixing time of the lumped chain on contingency tables. The results listed above suggest that there is not much of a speed-up.
1.3 Related Work
Note that for two-way contingency tables it is easy to sample directly from the Fisher-Yates distribution (by generating a random permutation and using the argument of Lemma 1.2). One reason for interest in the Markov chain mixing time is that, for three- and higher-way tables, the Markov chain method is the only available approach, so it is important to see how it works in the two-way case.
The uniform swap Markov chain on contingency tables was first proposed in [11], with the motivation of studying the uniform distribution [10]. The mixing time of this chain has been analyzed in special cases, e.g. if and the number of rows and columns is fixed, then the mixing time is of order . When the table has two rows, the chain mixes in time polynomial in the number of columns and [25]. Chung, Graham, and Yau [7] indicate that a modified version of the chain converges in time polynomial in the number of rows and columns.
A contingency table can be thought of as a bipartite graph with prescribed degree sequences, and a similar swap Markov chain can be defined for any graph with fixed degree sequence to sample from the uniform distribution, e.g. [2]. There is a large literature on methods for sampling graphs with given degree sequences (not necessarily bipartite); see [20] for a recent review. Recently the chain was studied for contingency tables with entries in $\mathbb{Z}/q\mathbb{Z}$, and mixing time with cut-off was proven [41].
A Markov chain on $\mathcal{T}_{\lambda,\mu}$ for which the stationary distribution is Fisher-Yates was studied in [17], motivated by the study of conditional distributions. The Markov chain is created by taking the Gibbs sampler for the uniform swap chain defined in [11]. This gives the chain: Pick a pair of rows and a pair of columns uniformly at random. This determines a $2 \times 2$ submatrix. Replace it with a $2 \times 2$ sub-table with the same margins, chosen from the hypergeometric distribution. As noted in Section 2 of [17], while it is straightforward to sample directly from the Fisher-Yates distribution for two-way tables, there is not a simple algorithm for sampling from the analogous distribution on three-way or higher-dimensional tables. Thus, Markov chains with the Fisher-Yates distribution as stationary distribution are valuable for this application; multiway tables are discussed further in Section 5.2.
1.4 Outline
Section 2 contains an overview of basic definitions and facts for Markov chains, double cosets, and orthogonal polynomials. Section 3 establishes the eigenvalues and multiplicities, as well as the eigenfunctions of the Markov chain as polynomials, and gives formulas for the linear and quadratic polynomial eigenfunctions. These are then used in Section 4 to find explicit upper and lower bounds on the mixing time of the chain, in specific cases. Section 5 discusses some future directions.
Acknowledgments
The author thanks Persi Diaconis for helpful discussion and suggestions. This work was supported by a National Defense Science & Engineering Graduate Fellowship and a Lieberman Fellowship at Stanford University.
2 Background
This section reviews necessary facts and background on Markov chains, orthogonal polynomials, and the random transpositions chain on .
2.1 Double Coset Markov Chains
The results of this section use basic tools of representation theory, in particular induced representations and Frobenius reciprocity. For background on these topics, see [27], [30], [47]. In the specific case at hand – the symmetric group – additional tools are available: Young's rule and the hook-length formula. These are clearly developed in [31]. See also [29].
Let $H, K$ be subgroups of the finite group $G$, and let $Q$ be a probability on $G$. If the support of $Q$ is not contained in a coset of a proper subgroup, then the random walk on $G$ induced by $Q$ is ergodic with uniform stationary distribution. This random walk on $G$ is defined by picking a random element from $Q$ and multiplying the current state. That is, the transitions are
$$P(s, t) = Q(t s^{-1}), \qquad s, t \in G.$$
See [8] for an introduction to random walks on groups.
The double cosets of $H, K$ in $G$ partition the group, and any Markov chain on $G$ defines a random process on the set of double cosets by keeping track of which double coset the process on $G$ is in, giving a ‘lumped’ process. Proposition 2.1 gives a condition under which the random walk induced by $Q$ on $G$ is also a Markov chain on the double cosets. For the statement, we fix double coset representatives $x$ and write $HxK$ for the double coset containing $x$. The result is proven in [12].
Proposition 2.1.
Let $Q$ be a probability on $G$ which is $H$-conjugacy invariant ($Q(h^{-1} s h) = Q(s)$ for all $h \in H$, $s \in G$). Then the image of the random walk driven by $Q$ on $G$ maps to a Markov chain on the double cosets $H \backslash G / K$ with transition kernel
$$\bar{P}(x, y) = Q(H y K x^{-1}) = \sum_{s \in H y K} Q(s x^{-1}).$$
The stationary distribution is $\pi(x) = |H x K| / |G|$. If $Q(s) = Q(s^{-1})$ for all $s$, then the chain is reversible.
Remark 4.
It is special for a function of a Markov chain to also be Markov. In this setting, many of the famous Markov chains on $S_n$ are not Markov when they are lumped to the double cosets. For example, the adjacent transpositions random walk on $S_n$ is induced by a measure supported on the adjacent transpositions $(i, i+1)$, which does not satisfy the conditions of Proposition 2.1. See [43] for a survey on lumped Markov chains.
Since transpositions form a conjugacy class in $S_n$, the probability $Q$ from (3) satisfies the conditions of Proposition 2.1, so the walk lumped to contingency tables gives a Markov process. The following lemma also directly proves that the lumping of the random transpositions chain to contingency tables is Markov, and equivalent to Definition 1.4.
Lemma 2.2.
Proof.
Suppose . Define and similarly use to define . Then the contingency table corresponding to the double coset of is defined by
Let be a transposition in . If then and . Suppose . If or then . Otherwise, , as defined in Definition 1.4.
Note that if are contained in the same double coset, i.e. , then for any
In words, for the random transpositions chain on , the probability of transitioning from the double coset to another double coset doesn’t depend on the choice of double coset representative. This is Dynkin’s condition for lumped Markov chains ([37], [43]).
∎
Eigenvalues
Let $1 = \beta_0 \ge \beta_1 \ge \cdots \ge \beta_{|\mathcal{X}| - 1} \ge -1$ be the eigenvalues of the transition matrix $P$ with corresponding eigenfunctions $f_0, f_1, \ldots, f_{|\mathcal{X}| - 1}$. That is, $f_0 \equiv 1$ and
$$\sum_{y \in \mathcal{X}} P(x, y) f_i(y) = \beta_i f_i(x) \qquad \text{for all } x \in \mathcal{X}.$$
For reversible Markov chains, the eigenfunctions can be chosen to be orthonormal with respect to the stationary distribution:
$$\sum_{x \in \mathcal{X}} f_i(x) f_j(x) \pi(x) = \delta_{ij}.$$
The eigenvalues and eigenfunctions give an exact formula for the chi-squared distance between the chain and the stationary distribution, defined
$$\chi_x^2(t) = \sum_{y \in \mathcal{X}} \frac{\left( P^t(x, y) - \pi(y) \right)^2}{\pi(y)}.$$
The chi-squared distance then gives an upper bound for the total variation distance. This information is summarized in the following lemma, see [37] Chapter 12.
Lemma 2.3.
For any $x \in \mathcal{X}$ and $t \ge 0$,
$$4 \, \| P^t(x, \cdot) - \pi \|_{TV}^2 \;\le\; \chi_x^2(t) \;=\; \sum_{i = 1}^{|\mathcal{X}| - 1} \beta_i^{2t} f_i(x)^2.$$
Note that the worst-case mixing time is defined by looking at the total variation distance from the worst-case starting point. Thus to bound the mixing time one needs to be able to analyze $\chi_x^2(t)$ for any starting state $x$. Even when the eigenfunctions are explicitly known, this can be challenging. For chains on $\mathcal{X}$ where a group acts transitively (preserving the transition probabilities), the distance from all starting states is the same and the right hand side of the bound in Lemma 2.3 does not depend on $x$. Another bound, for any starting state $x$, is
$$\| P^t(x, \cdot) - \pi \|_{TV} \;\le\; \frac{1}{2} \sqrt{\frac{1 - \pi(x)}{\pi(x)}} \; \beta_*^t, \qquad (4)$$
with $\beta_* = \max\{ \beta_1, |\beta_{|\mathcal{X}| - 1}| \}$. If $\pi$ is not uniform this bound can vary widely, as in the following example.
Example 2.4.
The paper [12] analyzes the random transvections chain on $GL_n(\mathbb{F}_q)$ lumped to the double cosets $B \backslash GL_n(\mathbb{F}_q) / B$, with $B$ the Borel subgroup of upper triangular matrices. Here double cosets are indexed by permutations in $S_n$, and the uniform distribution on $GL_n(\mathbb{F}_q)$ induces the Mallows distribution on $S_n$ with parameter $q$:
where $I(\omega)$ is the number of inversions in the permutation $\omega$. In the setting $q > 1$, the reversal permutation has the largest probability and the identity the smallest. Transvections form a conjugacy class in $GL_n(\mathbb{F}_q)$, and so character theory gives the eigenvalues, and the mixing time from special starting states can be analyzed. Starting from the identity the mixing time is of the same order as the random transvections walk on $GL_n(\mathbb{F}_q)$, while starting from the reversal permutation the mixing time is of a different order.
The analysis in Example 2.4 is amenable because, in the setting of Proposition 2.1, the measure was a class function, i.e. $Q(t^{-1} s t) = Q(s)$ for all $s, t \in G$. In this case, the eigenvalues of the walk on $H \backslash G / K$ are expressed using the characters of the irreducible complex representations of $G$. Let $\widehat{G}$ be an index set for these representations, $d_\rho$ the dimension, and $\chi_\rho$ the character of the representation $\rho \in \widehat{G}$. The restriction of $\rho$ to $H$ is written $\rho|_H$, and $m_H(\rho)$ is the number of times the trivial representation of $H$ appears in $\rho|_H$. By Frobenius reciprocity, this is the multiplicity of $\rho$ in the representation of $G$ induced from the trivial representation of $H$.
The following theorem is proven in [12].
Proposition 2.5.
Let $Q$ be a class function on $G$ and let $H, K$ be subgroups of $G$. The induced chain of Proposition 2.1 on $H \backslash G / K$ has eigenvalues
$$\beta_\rho = \frac{1}{d_\rho} \sum_{s \in G} Q(s) \chi_\rho(s), \qquad \rho \in \widehat{G}, \qquad (5)$$
with multiplicity
$$m_H(\rho) \, m_K(\rho). \qquad (6)$$
Further, the average square chi-squared distance to stationarity satisfies
$$\sum_{x} \pi(x) \, \chi_x^2(t) = \sum_{\rho \neq \mathrm{triv}} m_H(\rho) \, m_K(\rho) \, \beta_\rho^{2t}. \qquad (7)$$
2.2 Random Transpositions on
The driving Markov chain on for contingency tables is the random transpositions Markov chain studied in [14]. This uses the tools of representation theory and the character theory of the symmetric group. An expository account, aimed at probabilists and statisticians is in Chapter 3D in [8]. One of the main results used below is an explicit determination of the eigenvalues of this chain.
Recall the probability measure $Q$ from (3) which defines the random walk on $S_n$:
$$Q(\mathrm{id}) = \frac{1}{n}, \qquad Q(\tau) = \frac{2}{n^2} \text{ for each transposition } \tau.$$
This measure is not concentrated on a single conjugacy class (the identity is not in the conjugacy class of transpositions). However, we can write
$$Q = \frac{1}{n} \delta_{\mathrm{id}} + \left( 1 - \frac{1}{n} \right) \widetilde{Q},$$
where $\delta_{\mathrm{id}}$ is the point mass at the identity and $\widetilde{Q}$ is the uniform distribution on the conjugacy class of transpositions. (This is the same measure as the one discussed in Remark 3.) Then $\widetilde{Q}$ is concentrated on a single conjugacy class, so the eigenvalues of the random walk driven by $\widetilde{Q}$ are equal to the character ratios
$$\frac{\chi^\nu(\tau)}{d_\nu}, \qquad \tau \text{ a fixed transposition},$$
for each partition $\nu$ of $n$. The formula for this character ratio was determined by Frobenius; see [38] for a modern exposition, and [44] for a proof of this special case:
$$\frac{\chi^\nu(\tau)}{d_\nu} = \frac{1}{n(n-1)} \sum_{i} \left[ \nu_i^2 - (2i - 1) \nu_i \right].$$
The eigenvalues for the walk driven by $Q$ are then related by
$$\beta_\nu = \frac{1}{n} + \left( 1 - \frac{1}{n} \right) \frac{\chi^\nu(\tau)}{d_\nu}.$$
This information is summarized in the following lemma; see [14] for more details.
Lemma 2.6 ([14], Corollary 1 & Lemma 7).
If $P$ is the random transpositions chain on $S_n$ driven by $Q$, then $P$ has an eigenvalue $\beta_\nu$ for each partition $\nu$ of $n$. The eigenvalue corresponding to $\nu$ is
$$\beta_\nu = \frac{1}{n} + \frac{1}{n^2} \sum_{i} \left[ \nu_i^2 - (2i - 1) \nu_i \right].$$
In [14], the chain is proven to mix in total variation distance, with cut-off, after $\frac{1}{2} n \log n$ steps. This gives an initial upper bound on the mixing time of the random transpositions chain on contingency tables.
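As a quick sanity check on the formula in Lemma 2.6, the following Python sketch (our own, not from the paper) computes $\beta_\nu$ for a partition $\nu$ of $n$; for example $\nu = (n)$ gives $\beta_\nu = 1$ and $\nu = (n-1, 1)$ gives $1 - 2/n$.

```python
def rt_eigenvalue(nu):
    """Eigenvalue of the random transpositions chain (measure Q from (3))
    attached to the partition nu, via the Frobenius character-ratio formula."""
    n = sum(nu)
    s = sum(part**2 - (2 * i - 1) * part for i, part in enumerate(nu, start=1))
    return 1.0 / n + s / n**2

n = 10
print(rt_eigenvalue((n,)))          # trivial partition: eigenvalue 1
print(rt_eigenvalue((n - 1, 1)))    # eigenvalue 1 - 2/n
print(rt_eigenvalue((1,) * n))      # sign representation: eigenvalue 2/n - 1
```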
2.3 Orthogonal Polynomials
This section reviews orthogonal polynomials and records the formula for multivariate Hahn polynomials. See [19] for a thorough exposition on multivariate orthogonal polynomials; especially Section 3 for sufficient conditions for an orthonormal basis to exists for a probability distribution.
Let $\pi$ be a probability distribution on a finite space $\mathcal{X}$. Let $L^2(\pi)$ be the space of functions $f : \mathcal{X} \to \mathbb{R}$ with the inner product
$$\langle f, g \rangle_\pi = \sum_{x \in \mathcal{X}} f(x) g(x) \pi(x).$$
A set of functions $\{f_i\}$ is orthogonal in $L^2(\pi)$ if $\langle f_i, f_j \rangle_\pi = 0$ whenever $i \neq j$.
For the following lemma, $\mathbf{m} = (m_1, \ldots, m_d)$ denotes an index vector and $|\mathbf{m}| = m_1 + \cdots + m_d$ is the total degree of the polynomial defined by the vector.
Lemma 2.7 ([35], Lemma 3.2).
Suppose is a distribution on the space , for some , and is an orthogonal basis of , where is a polynomial of exact degree . Let be a reversible Markov chain with transition matrix and stationary distribution . Suppose that,
Then has eigenvalue with eigenbasis .
Write $\beta_k$ for the eigenvalue attached to the degree-$k$ polynomials. The chi-square distance between $P^t(x, \cdot)$ and $\pi$ is
$$\chi_x^2(t) = \sum_{k \ge 1} \beta_k^{2t} \, h_k(x, x),$$
where
$$h_k(x, y) = \sum_{|\mathbf{m}| = k} Q_{\mathbf{m}}(x) \, Q_{\mathbf{m}}(y)$$
is called the kernel polynomial of degree $k$.
Multivariate Hahn Polynomials
For contingency tables with only two rows, the Fisher-Yates distribution is simply the multivariate hypergeometric distribution. That is, if $\lambda = (\lambda_1, \lambda_2)$ and $\mu = (\mu_1, \ldots, \mu_J)$, then a contingency table in $\mathcal{T}_{\lambda,\mu}$ is determined by its second row, which can be represented by a $J$-dimensional vector in the space
$$\mathcal{X} = \Big\{ x \in \mathbb{Z}_{\ge 0}^{J} : 0 \le x_j \le \mu_j, \; x_1 + \cdots + x_J = \lambda_2 \Big\}.$$
The multivariate hypergeometric distribution on $\mathcal{X}$ is
$$\pi(x) = \frac{\prod_{j=1}^{J} \binom{\mu_j}{x_j}}{\binom{n}{\lambda_2}}.$$
For example, this distribution arises from sampling without replacement: An urn contains $n$ balls of $J$ different colors, $\mu_j$ of color $j$. If $x$ is the vector which records the number of each color drawn in a sample (without replacement) of size $\lambda_2$, then $x$ has the multivariate hypergeometric distribution. The orthogonal polynomials for this distribution are called the multivariate Hahn polynomials. An overview of univariate Hahn polynomials is in [28]. Following the notation from [35] (Section 2.2.1) and [26], define
For vectors ,
The multivariate Hahn polynomials, defined in Section 2.2.1 of [35], are
(8) |
where
is the classical univariate Hahn polynomial.
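For readers who want to experiment numerically, here is a minimal sketch of the classical univariate Hahn polynomial, using the standard terminating series $Q_k(x; \alpha, \beta, N) = {}_3F_2(-k, k+\alpha+\beta+1, -x; \alpha+1, -N; 1)$. This normalization follows the standard orthogonal-polynomial references and is our own choice; it is not a formula quoted from this paper.

```python
from math import factorial, prod

def hahn(k, x, alpha, beta, N):
    """Classical univariate Hahn polynomial Q_k(x; alpha, beta, N),
    evaluated as a terminating 3F2 hypergeometric sum (assumes k <= N)."""
    def poch(a, m):                      # Pochhammer symbol (a)_m
        return prod(a + i for i in range(m))
    return sum(
        poch(-k, m) * poch(k + alpha + beta + 1, m) * poch(-x, m)
        / (poch(alpha + 1, m) * poch(-N, m) * factorial(m))
        for m in range(k + 1)
    )

# Q_0 is identically 1; Q_1 is linear in x.
print(hahn(0, 3, 0.5, 0.5, 10), hahn(1, 3, 0.5, 0.5, 10))
```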
For calculating the expression for the chi-squared distance, if the orthogonal polynomials are the eigenfunctions, then it is the kernel polynomials which need to be evaluated. These were constructed by Griffiths [23], [24]. It is an open problem to find a useful expression for the kernel polynomials evaluated at more general states.
Proposition 2.8 (Proposition 2.6 from [35]).
Suppose for all . Let be the vector with in the th index and elsewhere. Then,
(9) |
2.4 Fisher-Yates Distribution
The measure induced on contingency tables by the uniform distribution on $S_n$ is
$$\pi_{\lambda,\mu}(T) = \frac{\prod_{i}\lambda_i!\,\prod_{j}\mu_j!}{n!\,\prod_{i,j}T_{ij}!}. \qquad (10)$$
This is the Fisher-Yates distribution on contingency tables, a mainstay of applied statistical work in chi-squared tests of independence. In applications it is natural to test the assumption that row and column features are independent, i.e. that the observed table is a sample with cell probabilities $p_{ij} = p_{i\cdot}\, p_{\cdot j}$. For this model, the row and column sums are sufficient statistics; conditioning on the row and column sums of the table gives the Fisher-Yates distribution.
A useful way of describing a sample from the Fisher-Yates distribution is by sampling without replacement: Suppose that an urn contains $n$ total balls of $J$ different colors, $\mu_j$ of color $j$. To empty the urn, make $I$ sets of draws of unequal sizes. First draw $\lambda_1$ balls, next $\lambda_2$, and so on until there are $\lambda_I$ balls left. Create a contingency table by setting $T_{ij}$ to be the number of balls of color $j$ in the $i$th draw.
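A direct simulation of this description (our own sketch; the function name is hypothetical) samples a Fisher-Yates table by shuffling the $n$ colored balls and cutting the shuffled order into blocks of sizes $\lambda_1, \ldots, \lambda_I$.

```python
import numpy as np

def sample_fisher_yates(lam, mu, rng=np.random.default_rng()):
    """Sample a contingency table with row sums lam and column sums mu
    from the Fisher-Yates distribution, by sampling without replacement."""
    n = sum(lam)
    assert sum(mu) == n
    balls = np.repeat(np.arange(len(mu)), mu)   # one 'ball' per unit, labeled by its color
    rng.shuffle(balls)
    T = np.zeros((len(lam), len(mu)), dtype=int)
    pos = 0
    for i, l in enumerate(lam):                  # the i-th draw takes the next lam[i] balls
        for color in balls[pos:pos + l]:
            T[i, color] += 1
        pos += l
    return T

# Example using the margins of Table 2 below.
print(sample_fisher_yates((262, 245, 287, 384, 265, 217), (307, 602, 362, 389)))
```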
This description is useful for calculating moments of entries in a table, by utilizing the fact that the draws are exchangeable: Fix a color $j$ and let $X_{ik}$, for $1 \le k \le \lambda_i$, be $1$ if in the $i$th round of drawings the $k$th draw was color $j$, and $0$ otherwise. That is,
$$T_{ij} = \sum_{k=1}^{\lambda_i} X_{ik}.$$
The expectation of one entry in the table is
$$\mathbb{E}[T_{ij}] = \sum_{k=1}^{\lambda_i} \mathbb{E}[X_{ik}] = \lambda_i \cdot \frac{\mu_j}{n},$$
using that $\mathbb{E}[X_{ik}] = \mu_j / n$ for any $k$, since the draws are exchangeable. These calculations for the moments of Fisher-Yates tables are needed in the following section to normalize eigenfunctions.
The usual test of the independence model uses the chi-squared statistic:
$$\chi^2(T) = \sum_{i,j} \frac{\left( T_{ij} - \lambda_i \mu_j / n \right)^2}{\lambda_i \mu_j / n}.$$
Under general conditions, see e.g. [33], the chi-squared statistic has a limiting chi-squared distribution with $(I-1)(J-1)$ degrees of freedom. This result says that, if the null hypothesis is true, most tables will be close to the ‘independence table’ with entries $\lambda_i \mu_j / n$. Another feature to investigate for the independence model is the number of zero entries in a contingency table. Since most tables will be close to the independence table, which has no zeros, zeros are a pointer to the breakdown of the independence model. Section 5.2 of [16] proves that the number of zeros is asymptotically Poisson, under natural hypotheses on the row and column sums. In [42], limit theorems for fixed points, descents, and inversions of permutations chosen uniformly from fixed double cosets are studied.
Majorization Order
Let $T$ and $S$ be tables with the same row and column sums. Say that $T \succeq S$ (‘$T$ majorizes $S$’) if the largest element in $T$ is at least as large as the largest element in $S$, the sum of the two largest elements in $T$ is at least as large as the sum of the two largest elements in $S$, and so on. Of course, the sum of all elements in $T$ equals the sum of all elements of $S$.
Example 2.9.
For tables with , there is the following ordering
Majorization is a standard partial order on vectors [40] and Harry Joe [32] has shown it is useful for contingency tables.
Proposition 2.10.
Let $T$ and $S$ be tables with the same row and column sums given by $\lambda, \mu$, and let $\pi_{\lambda,\mu}$ be the Fisher-Yates distribution. If $T \succeq S$, then $\pi_{\lambda,\mu}(T) \le \pi_{\lambda,\mu}(S)$.
Proof.
From the definition, we have $\pi_{\lambda,\mu}(T) = c \cdot \prod_{i,j} \frac{1}{T_{ij}!}$ for a constant $c$ depending only on $\lambda$ and $\mu$. This form makes it clear the right hand side is a symmetric function of the numbers $\{T_{ij}\}$. The log convexity of the Gamma function shows that it is concave. A symmetric concave function is Schur concave: That is, order-reversing for the majorization order [40]. ∎
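To make the order concrete, here is a small Python sketch (our own illustration) that checks whether one table majorizes another with the same margins and compares their Fisher-Yates probabilities, consistent with Proposition 2.10.

```python
from math import factorial, prod

def majorizes(T, S):
    """True if table T majorizes table S: entries sorted decreasingly,
    partial sums of T dominate those of S."""
    a = sorted((x for row in T for x in row), reverse=True)
    b = sorted((x for row in S for x in row), reverse=True)
    assert sum(a) == sum(b)
    partial_a = partial_b = 0
    for x, y in zip(a, b):
        partial_a, partial_b = partial_a + x, partial_b + y
        if partial_a < partial_b:
            return False
    return True

def fisher_yates_prob(T):
    """Fisher-Yates probability of a table, computed from its own margins."""
    lam = [sum(row) for row in T]
    mu = [sum(col) for col in zip(*T)]
    n = sum(lam)
    num = prod(factorial(r) for r in lam) * prod(factorial(c) for c in mu)
    return num / (factorial(n) * prod(factorial(x) for row in T for x in row))

T = [[2, 0], [0, 2]]          # more 'spread out' table
S = [[1, 1], [1, 1]]          # independence-like table with the same margins
print(majorizes(T, S), fisher_yates_prob(T), fisher_yates_prob(S))
```

In this example $T \succeq S$ and the Fisher-Yates probability of $T$ ($1/6$) is smaller than that of $S$ ($2/3$), as the proposition predicts.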
Remark 5.
Joe [32] shows that, among the real-valued tables with given row and column sums, the independence table (with entries $\lambda_i \mu_j / n$) is the unique smallest table in majorization order. He further shows that if an integer-valued table is, entry-wise, sufficiently close to the real independence table, then it is the unique smallest table with integer entries. In this case, the corresponding double coset is the largest.
Example 2.11.
Fix a positive integer $c$ and consider an $I \times J$ table with all entries equal to $c$. This has constant row sums $cJ$ and column sums $cI$. It is the unique smallest table with these row and column sums, and so corresponds to the largest double coset. For a small case, this table is
Contingency tables with fixed row and column sums form a graph, with edges between tables that can be obtained from one another by one move of the following form: pick two rows $i, i'$ and two columns $j, j'$. Add $+1$ to the $(i, j)$ entry, $-1$ to the $(i, j')$ entry, $-1$ to the $(i', j)$ entry, and $+1$ to the $(i', j')$ entry (one move of the Markov chain in Definition 1.4). This graph is connected, and a move shifts the table up or down in the majorization order as the chosen $2 \times 2$ sub-table moves up or down. See Example 2.9 above.
3 Eigenvalues and Eigenfunctions
From Proposition 2.5, the multiplicity of the eigenvalue $\beta_\nu$ in the contingency table chain is $K_{\nu\lambda} \, K_{\nu\mu}$, the product of the number of times the trivial representation appears in the restriction of the $\nu$-representation to $S_\lambda$ and to $S_\mu$.
These multiplicities can be determined by Young's Rule [9], [31]: Given a partition $\mu$ of $n$, take $\mu_1$ copies of the symbol ‘1’, $\mu_2$ copies of the symbol ‘2’, and so on. The value $K_{\nu\mu}$ is equal to the number of ways of arranging these symbols into a tableau of shape $\nu$ with strictly increasing columns and weakly increasing rows. In other words, $K_{\nu\mu}$ is the number of semistandard Young tableaux of shape $\nu$ and weight $\mu$. These are called Kostka numbers, and have a long enumerative history in connection with symmetric functions (see Chapter 7 of [48]).
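The following Python sketch (our own, purely illustrative and practical only for small shapes) counts semistandard Young tableaux of shape $\nu$ and weight $\mu$ by filling cells in row-major order; this computes the Kostka numbers $K_{\nu\mu}$ appearing in the multiplicities.

```python
def kostka(nu, mu):
    """Number of semistandard Young tableaux of shape nu and weight mu:
    rows weakly increase, columns strictly increase, symbol j used mu[j-1] times."""
    rows = [[0] * r for r in nu]         # 0 marks an empty cell
    remaining = list(mu)                 # how many copies of each symbol are left

    def fill(i, j):
        if i == len(nu):                 # all cells filled: one valid tableau
            return 1
        ni, nj = (i, j + 1) if j + 1 < nu[i] else (i + 1, 0)
        total = 0
        for s in range(1, len(mu) + 1):
            if remaining[s - 1] == 0:
                continue
            if j > 0 and s < rows[i][j - 1]:              # row must weakly increase
                continue
            if i > 0 and s <= rows[i - 1][j]:             # column must strictly increase
                continue
            rows[i][j], remaining[s - 1] = s, remaining[s - 1] - 1
            total += fill(ni, nj)
            rows[i][j], remaining[s - 1] = 0, remaining[s - 1] + 1
        return total

    return fill(0, 0)

# K_{(2,1),(1,1,1)} = 2, the number of standard Young tableaux of shape (2,1).
print(kostka((2, 1), (1, 1, 1)))
```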
Example 3.1.
Consider and . The possible eigenvalues and their multiplicities are in Table 1.
1 | 1 | 1 | ||
2 | 2 | 4 | ||
1 | 2 | 2 | ||
1 | 1 | 1 |
For example, for the symbols are and there are ways of arranging them into a tableau of shape with increasing rows and strictly increasing columns:
Thus for , the contribution to the multiplicity is , as shown in Table 1. While there is not an explicit formula for the result of Young's rule (indeed, this would provide a formula for the number of contingency tables with fixed row and column sums), we can say more in special cases. See the exercises in Chapter 6.4 of [38] for more properties.
Corollary 3.2.
Let $\lambda, \mu$ be partitions of $n$ and let $\ell(\nu)$ denote the number of parts of a partition $\nu$, with $\ell(\lambda) = I$ and $\ell(\mu) = J$. Let $\beta_\nu$ denote the eigenvalue in the random transpositions chain corresponding to $\nu$, and $m_\nu$ the multiplicity of $\beta_\nu$ in the chain lumped to $\mathcal{T}_{\lambda,\mu}$.
(a) The multiplicity of the second-largest eigenvalue is $(I - 1)(J - 1)$.

(b) If , then and .

(c) If for and then

(d) If , and , then
Remark 6.
Corollary 3.2(b) shows that a table with only two rows or columns will only have eigenvalues with . The eigenvalue defined by is . This has multiplicity unless for which .
Proof.
(a): An arrangement of symbols determined by into a tableau of shape is determined by the choice of the symbol to be in the second row. To fit the constraint the columns in the tableau are strictly increasing, any symbol except ‘1’ could be placed in the second row. Thus, there are possibilities. For example, if , there are 2 possibilities:
(b): Consider the first column of an arrangement of the symbols determined by into a tableau of shape . For the column to be strictly increasing, there must be at least symbols from , to give the column . Thus, if then . All of the ‘1’ symbols must be contained in the first row of the tableau, which gives the constraint .
(c): If and then there is at most one way of arranging the symbols of into a tableau of shape . For example, :
To satisfy the constraint that columns are strictly increasing it is necessary that the second row only contains symbols ‘2’. If , this is not possible. Note that the assumption ensures that there would never need to be a column with only the symbol ‘2’.
(d): Now suppose with . The assumption ensures that any assignment of the second row will obey the strictly increasing column constraint. That is, any selection of symbols from the symbols which are greater than would be a valid assignment for the second row of the tableau. There are possible symbols, of each type. If denotes the number of symbol in row , then the second row is determined by with and . The number of possibilities is exactly the number of contingency tables with row sums and column sums .
∎
The multiplicities also behave well with respect to the majorization order on partitions: If $\nu, \rho$ are partitions of $n$, then $\nu \succeq \rho$ if $\nu_1 + \cdots + \nu_k \ge \rho_1 + \cdots + \rho_k$ for all $k$. In other words, $\nu$ can be obtained from $\rho$ by successively ‘moving up boxes’ ([38] Chapter 1). For example,
Remark 7.
The eigenvalues are monotonic with respect to this ordering: If $\nu \succeq \rho$, then $\beta_\nu \ge \beta_\rho$; see Chapter 3 of [8] for more properties. This tells us that $\nu = (n-1, 1)$ gives the largest eigenvalue not equal to $1$.
The following lemma is well-known in the literature, e.g. [22].
Lemma 3.3.
Let be partitions of . Then,
-
(a)
if and only if .
-
(b)
If , then .
Remark 8.
No such monotonicity exists in . For example, with , , we have yet , , .
3.1 Average Case Mixing Time
For $2 \times 2$ contingency tables, Corollary 3.2 suffices to determine the multiplicities of all eigenvalues.
Lemma 3.4.
For for if for then
If , then
Proof.
Suppose with . By Corollary 3.2(c) , so the multiplicity of in the transpositions chain on is . For any other partition , the multiplicity is . Thus,
which is for .
The lower bound comes from using the term in the sum with (which gives the largest eigenvalue, with multiplicity ):
if . The last inequality is Bernoulli’s inequality. ∎
Remark 9.
A table in $\mathcal{T}_{\lambda,\mu}$ with $\lambda = (\lambda_1, \lambda_2)$ and $\mu = (\mu_1, \mu_2)$ is determined by one entry, say the entry $T_{11} = x$, so the table is
$$\begin{pmatrix} x & \lambda_1 - x \\ \mu_1 - x & \lambda_2 - \mu_1 + x \end{pmatrix}.$$
Then $x$ has the univariate hypergeometric distribution (sampling $\lambda_1$ balls without replacement from an urn containing $\mu_1$ balls of one color and $\mu_2$ of another). The transitions for $x$ are
$$P(x, x+1) = \frac{2(\lambda_1 - x)(\mu_1 - x)}{n^2}, \qquad P(x, x-1) = \frac{2x(n - \lambda_1 - \mu_1 + x)}{n^2},$$
with $P(x, x) = 1 - P(x, x+1) - P(x, x-1)$.
Remark 10.
In contrast to the previous remark, the Bernoulli-Laplace urn has transitions
In [14], for the special case (which corresponds to the state space ), the Bernoulli-Laplace chain is proven to mix steps. The paper [21] studies a more general model in which balls are swapped between the two urns; the chain is shown to have mixing time . These are special cases of the random walk on a distance regular graph studied in [3].
Remark 11.
Note that the random transpositions Markov chain on $S_n$ is transitive, and so the behavior of the chain, especially the mixing time, does not depend on the starting state. The chain lumped to contingency tables is not transitive, and the mixing time could heavily depend on the starting state. It is expected that the mixing time is fastest starting from states with high probability (large double cosets) and slowest from the states with low probability (small double cosets). From Remark 5, the highest probability table is the one closest to the ‘independence table’; the tables with low probability are the sparse tables with many zeros.
Remark 12.
For any $\lambda, \mu$, suppose $T \in \mathcal{T}_{\lambda,\mu}$. From the table $T$ we can create a $2 \times 2$ table by ‘collapsing’ rows and columns: fix a row $i$ and a column $j$, and collapse all other rows into a single row and all other columns into a single column.
Then the collapsed table has row sums $(\lambda_i, n - \lambda_i)$ and column sums $(\mu_j, n - \mu_j)$. Furthermore, its $(1,1)$ entry has the univariate hypergeometric distribution. The random transpositions chain on $\mathcal{T}_{\lambda,\mu}$ lumps to the univariate Markov chain on the collapsed tables. Thus the mixing time for the collapsed-table Markov chain is a lower bound for the mixing time of the full process.
3.2 Orthogonal Polynomials
This section proves part (c) of Theorem 1.6, that the eigenfunctions of the random transpositions chain on contingency tables are orthogonal polynomials. While it is difficult to find an explicit formula for all of these polynomials, analysis of the chain provides a way to calculate the linear and quadratic polynomials.
Theorem 3.5.
Let $\lambda, \mu$ be partitions of $n$ and let $P$ be the transition matrix of the random transpositions Markov chain on the space of contingency tables $\mathcal{T}_{\lambda,\mu}$. For any entry $(i, j)$ and any integer $m \ge 1$, the quantity $\sum_{T'} P(T, T') (T'_{ij})^m$ is a polynomial of degree at most $m$ in the entries of $T$.
The eigenfunctions for $P$ are polynomials.
Proof.
Let $(T^{(t)})_{t \ge 0}$ represent the Markov chain on $\mathcal{T}_{\lambda,\mu}$, with $T^{(t)}_{ij}$ denoting the value in the $(i, j)$ cell. Let $x = T^{(t)}_{ij}$. Marginally, each entry of the table is a birth-death process: For any $t$,
$$\mathbb{P}\left( T^{(t+1)}_{ij} = x + 1 \mid T^{(t)} \right) = \frac{2 (\lambda_i - x)(\mu_j - x)}{n^2}, \qquad (11)$$
$$\mathbb{P}\left( T^{(t+1)}_{ij} = x - 1 \mid T^{(t)} \right) = \frac{2 x (n - \lambda_i - \mu_j + x)}{n^2}. \qquad (12)$$
These transitions allow for the calculation of $\mathbb{E}\big[ (T^{(t+1)}_{ij})^m \mid T^{(t)} \big]$, for any integer power $m$, and show that this conditional moment is a polynomial of degree at most $m$ in $T^{(t)}_{ij}$.
By Lemma 2.7, this condition means the eigenfunctions for $P$ are polynomials. ∎
Straightforward calculation of the transition probabilities gives the linear eigenfunctions:
Lemma 3.6.
For any $T \in \mathcal{T}_{\lambda,\mu}$ and any entry $(i, j)$, the functions
$$f_{ij}(T) = T_{ij} - \frac{\lambda_i \mu_j}{n}$$
are eigenvectors of $P$ with eigenvalue $1 - \frac{2}{n}$.
Proof.
From (11) and (12), $\mathbb{E}\big[ T^{(t+1)}_{ij} \mid T^{(t)} \big] = \left( 1 - \frac{2}{n} \right) T^{(t)}_{ij} + \frac{2 \lambda_i \mu_j}{n^2}$, which gives the result. ∎
Remark 13.
To find the quadratic eigenvectors, the following computations are needed.
Lemma 3.7.
For any , and ,
(15) |
Proof.
For the first case, if and then the only situation in which both the and entries are chosen is if they are opposite corners of the box chosen for the swap move. In the following calculation, (16) is the case where both the and entries change, (17) is the case where only the entry changes, and (18) is the case where only the entry changes.
(16) | ||||
(17) | ||||
(18) | ||||
Now suppose and . In this case the and entries of the table could only both change if one increases and the other decreases. This gives
(19) | ||||
(20) | ||||
(21) | ||||
Lemma 3.8 (Quadratic Eigenfunctions).
Let be partitions of with . Let with . For the Markov chain on , the following functions are eigenfunctions with eigenvalue , defined for :
-
(a)
-
(b)
-
(c)
Remark 14.
Using Equation 14, one can check that , for any .
Proof.
The results follow from straightforward calculations using Lemma 3.7 and Equation 13. As an illustration, (a) is computed: In , the degree terms will be
Degree terms arise from the expectation of six of the terms in :
Collecting the terms gives
and the computation is the same for the other linear terms. Finally, the constant terms arise from:
This gives
Further details are omitted. ∎
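As a numerical check on Definition 1.4 and Lemma 3.6, the following sketch (our own; it assumes the off-diagonal transition probabilities $2 T_{ij} T_{i'j'} / n^2$ written above) builds the full transition matrix for a small $2 \times 2$ case and verifies that the centered linear statistic is an eigenfunction with eigenvalue $1 - 2/n$.

```python
import numpy as np

def tables_2x2(lam, mu):
    """All 2x2 tables with row sums lam and column sums mu, indexed by the (0,0) entry."""
    out = []
    for a in range(min(lam[0], mu[0]) + 1):
        T = np.array([[a, lam[0] - a], [mu[0] - a, lam[1] - (mu[0] - a)]])
        if (T >= 0).all():
            out.append(T)
    return out

def transition_matrix(tabs, n):
    """Swap chain of Definition 1.4 on 2x2 tables: a swap at cells (i,j),(i',j')
    is proposed with probability 2*T_ij*T_i'j' / n^2."""
    index = {int(T[0, 0]): k for k, T in enumerate(tabs)}
    P = np.zeros((len(tabs), len(tabs)))
    for k, T in enumerate(tabs):
        a = int(T[0, 0])
        down = 2 * T[0, 0] * T[1, 1] / n**2    # move that decreases the (0,0) entry
        up = 2 * T[0, 1] * T[1, 0] / n**2      # move that increases the (0,0) entry
        if down > 0:
            P[k, index[a - 1]] = down
        if up > 0:
            P[k, index[a + 1]] = up
        P[k, k] = 1 - down - up                # picks in the same row or column
    return P

lam, mu = (3, 2), (2, 3)
n = sum(lam)
tabs = tables_2x2(lam, mu)
P = transition_matrix(tabs, n)
f = np.array([T[0, 0] - lam[0] * mu[0] / n for T in tabs])   # centered linear statistic
print(np.allclose(P @ f, (1 - 2 / n) * f))                   # expected output: True
```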
4 Mixing Time
This section contains results on the mixing time for special cases. Throughout, keep in mind that random transpositions on $S_n$ has mixing time of order $n \log n$, which is an upper bound for the mixing time on $\mathcal{T}_{\lambda,\mu}$. The question then is: Does the lumping speed up mixing, and if so by what order?
4.1 Upper Bound for Two-Rowed Tables
Due to the complicated nature of the orthogonal polynomials, the upper bound can only be analyzed for the specific starting states for which the kernel polynomials simplify. Suppose , for , and at least one such that .
Let be the table with the second row all except in the th column. For example, when , we have the following table
Note that the assumption ensures that this table exists.
Recall Proposition 2.8 which says, evaluated at these states, the kernel polynomials for the multivariate orthogonal polynomials have a simple closed-form expression:
(22) |
Recall the notation $(a)_k = a(a-1)\cdots(a-k+1)$ for the decreasing factorial, with $(a)_0 = 1$.
Theorem 4.1.
Let be the transition matrix for the swap Markov chain , with and with for at least one index . For any and such that ,
-
(a)
If
then .
-
(b)
If
then .
Remark 15.
In [35] the kernel polynomials for the multivariate hypergeometric distribution are similarly used to analyze three classes of Bernoulli-Laplace type Markov chains, which have multivariate hypergeometric stationary distribution. Theorem 4.1 can be compared to Proposition 4.2.1 of [35]. The random transpositions chain on two-rowed contingency tables is essentially a special case of the Bernoulli-Laplace level model.
Proof.
Using the expression (22) for the kernel polynomials, the chi-squared distance is
To help bound this sum, let be the th term in the summand, i.e.
Then the ratio of consecutive terms can be bounded:
If and
then the ratio is less than . Also the first term is the largest term, and
Thus,
The bound for part (b) comes from considering the contribution to the sum from the largest eigenvalues, which is for . This gives
using that for . The sum is then for . ∎
Example 4.2.
4.2 Lower Bound
Because the linear eigenfunctions are known for contingency tables of any size, Wilson's method can be used to give a general lower bound.
Theorem 4.3 (Wilson’s Method, Theorem 13.5 in [37]).
Let be an irreducible aperiodic Markov chain with state space and transition matrix . Let be an eigenfunction of with eigenvalue with . Fix and let satisfy
Then for any ,
For the contingency table Markov chain on $\mathcal{T}_{\lambda,\mu}$, the linear functions
$$f_{ij}(T) = T_{ij} - \frac{\lambda_i \mu_j}{n},$$
for any entry $(i, j)$, are eigenfunctions with eigenvalue $1 - \frac{2}{n}$. These will be used in Wilson's method to get the following result.
Lemma 4.4.
Let be any partitions of . For any and ,
where .
Proof.
This allows the calculation of for :
First suppose that are such that , and note that for . A bound then is
This is the constant that can be applied to Wilson’s method. Using such that :
If , then
and in this case
Finally, note that, with and using the bound ,
∎
Example 4.5.
Suppose with (as in Example 4.2). Then , so the second case in Lemma 4.4 always applies to the entry of the table. Since , a lower bound is
For example, if , then the expression inside the is equal to .
If are small enough so that , then the first case of Lemma 4.4 applies to the entry, with . The lower bound is then
For example, , then the expression inside the is equal to .
5 Further Directions
This section discusses some future directions and applications of the random transpositions chain on contingency tables. Section 5.1 explores how the linear and quadratic eigenfunctions of the Fisher-Yates distribution could be used in statistical applications, to decompose the chi-squared statistic. Section 5.2 notes how the random transpositions chain can be extended to multi-way tables (coming from data with more than two categorical features). Section 5.3 considers the random transpositions chain transformed via the Metropolis-Hastings algorithm to a new Markov chain which has uniform stationary distribution (and vice-versa, the symmetric swap chain can be transformed to have the Fisher-Yates stationary distribution). The relaxation times of the chains can be compared. The constants in the comparison depend on $\lambda, \mu$ and the size of the table, but the conclusion is that for sparse tables the Metropolized version of random transpositions has a significantly smaller relaxation time than the symmetric swap chain. Section 5.4 notes that a $q$-analog of the Fisher-Yates distribution arises via double cosets.
5.1 Data Analysis
This section discusses potential statistical applications of the eigenfunctions for the random transpositions chain on contingency tables. First, some classic datasets are described.
Datasets.
Tables 2, 3, and 4 are classical real data tables with large chi-squared statistics. Figure 1 shows a histogram of the quadratic eigenfunctions evaluated on each of these tables, as well as a plot of the Pearson residuals. We have not succeeded in finding any extra structure from these displays but believe that they may sometimes be informative.
Well | Mild | Moderate | Impaired | Total | |
---|---|---|---|---|---|
A | 64 | 94 | 58 | 46 | 262 |
B | 57 | 94 | 54 | 40 | 245 |
C | 57 | 105 | 65 | 60 | 287 |
D | 72 | 141 | 77 | 94 | 384 |
E | 36 | 97 | 54 | 78 | 265 |
F | 21 | 71 | 54 | 71 | 217 |
Total | 307 | 602 | 362 | 389 | 1660 |
Table 2 shows data from an epidemiological survey known as the Midtown Manhattan Mental Health Study [39]. Rows record parent's socioeconomic status (ranging from A = high to F = low) and columns the severity of mental illness. The chi-squared statistic is large, on $(6-1)(4-1) = 15$ degrees of freedom.
Jan | Feb | March | April | May | June | July | Aug | Sep | Oct | Nov | Dec | Total | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Jan | 1 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 1 | 0 | 1 | 0 | 6 |
Feb | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 5 |
March | 1 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 5 |
April | 3 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 1 | 3 | 1 | 1 | 12 |
May | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 12 |
June | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 |
July | 2 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 2 | 10 |
Aug | 0 | 0 | 0 | 3 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 2 | 7 |
Sep | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 3 |
Oct | 1 | 1 | 0 | 2 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 7 |
Nov | 0 | 1 | 1 | 1 | 2 | 0 | 0 | 2 | 0 | 1 | 1 | 0 | 9 |
Dec | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 3 |
Total | 13 | 4 | 7 | 10 | 8 | 4 | 5 | 3 | 4 | 9 | 7 | 8 | 82 |
Table 3 records the month of birth and death for descendants of Queen Victoria; it occurs as an example in [17]. The chi-squared statistic, on $(12-1)(12-1) = 121$ degrees of freedom, gives a $p$-value suggesting we do not reject the null hypothesis of independence. The classical rule of thumb for validity of the chi-square approximation is a minimum of $5$ expected counts per cell; this assumption is badly violated in Table 3, and there are too many tables with these margins for exact enumeration.
Black | Brown | Red | Blond | Total | |
---|---|---|---|---|---|
Brown | 68 | 119 | 26 | 7 | 220 |
Blue | 20 | 84 | 17 | 94 | 215 |
Hazel | 15 | 54 | 14 | 10 | 93 |
Green | 5 | 29 | 14 | 16 | 64 |
Total | 108 | 286 | 71 | 127 | 592 |
[Figure 1: histograms of the normalized quadratic eigenfunctions and plots of the Pearson residuals for Tables 2, 3, and 4.]
Residuals.
The linear eigenfunctions derived in Lemma 3.6 are well known as ‘Pearson residuals’ in the classical analysis of contingency tables [1]. The naturally scaled eigenfunctions
$$R_{ij}(T) = \frac{T_{ij} - \lambda_i \mu_j / n}{\sqrt{\lambda_i \mu_j / n}}$$
measure the departure of the table from the null model. Standard practice displays all of these in a two-way array. Another scaling is inspired by the inner product space with respect to the Fisher-Yates distribution, so that the functions have norm $1$:
Well | Mild | Moderate | Impaired | |
---|---|---|---|---|
A | 2.233 | -0.104 | 0.114 | -1.965 |
B | 1.737 | 0.546 | 0.078 | -2.298 |
C | 0.538 | 0.090 | 0.305 | -0.885 |
D | 0.117 | 0.148 | -0.737 | 0.423 |
E | -1.858 | 0.092 | -0.498 | 2.018 |
F | -3.020 | -0.867 | 0.971 | 2.826 |
Well | Mild | Moderate | Impaired | |
---|---|---|---|---|
A | 2.695 | -0.142 | 0.141 | -2.446 |
B | 2.083 | 0.741 | 0.096 | -2.844 |
C | 0.656 | 0.124 | 0.379 | -1.111 |
D | 0.147 | 0.211 | -0.950 | 0.551 |
E | -2.245 | 0.125 | -0.615 | 2.515 |
F | -3.587 | -1.165 | 1.177 | 3.462 |
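The first of the two residual arrays above can be reproduced with a few lines of Python (our own sketch; it computes only the classically scaled residuals $(T_{ij} - \lambda_i\mu_j/n)/\sqrt{\lambda_i\mu_j/n}$, not the Fisher-Yates-normalized version):

```python
import numpy as np

# Table 2: Midtown Manhattan Mental Health Study (rows A-F, columns Well..Impaired).
T = np.array([
    [64, 94, 58, 46],
    [57, 94, 54, 40],
    [57, 105, 65, 60],
    [72, 141, 77, 94],
    [36, 97, 54, 78],
    [21, 71, 54, 71],
])
n = T.sum()
expected = np.outer(T.sum(axis=1), T.sum(axis=0)) / n   # independence table lambda_i * mu_j / n
residuals = (T - expected) / np.sqrt(expected)          # Pearson residuals
chi_squared = (residuals**2).sum()
print(np.round(residuals, 3))
print(round(chi_squared, 1))
```

The sum of the squared residuals is the chi-squared statistic of (23).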
Chi-squared Decomposition.
It is natural to try to use the quadratic eigenfunctions in a similar way. They do not have an interpretation as (observed − expected) for some observed statistic, but they do have expectation $0$ with respect to the Fisher-Yates distribution.
A potential use for the polynomial eigenfunctions is in a decomposition of the chi-squared statistic
$$\chi^2(T) = \sum_{i,j} \frac{\left( T_{ij} - \lambda_i \mu_j / n \right)^2}{\lambda_i \mu_j / n}. \qquad (23)$$
Under the null hypothesis that row and column features are independent, $\chi^2$ has a limiting chi-squared distribution with $(I-1)(J-1)$ degrees of freedom. A drawback of using $\chi^2$ is that when it is large enough to reject the null hypothesis, no additional information is given about why the sample fails the test. This motivates the subject of partitioning the statistic into terms which could give insight into what parts of the table fail the independence hypothesis. The thesis [45] contains a thorough review of this problem, and a theory for decomposing statistics using Markov chains on the sample space. Presented below is a natural decomposition using the linear and quadratic eigenfunctions.
Let , be such that
so that . Then,
Figure 1 shows histograms of the normalized quadratic eigenfunctions .
5.2 Higher Dimensional Tables
The random swap Markov chain can be used as inspiration for Markov chains on $k$-way contingency tables with fixed margins, for $k \ge 3$. This section describes how that could go for $k = 3$.
Let $\lambda, \mu, \eta$ be three partitions of $n$. Let $\mathcal{T}_{\lambda,\mu,\eta}$ be the set of three-way tables with fixed margins determined by $\lambda, \mu, \eta$. That is, a table $T$ with entries $T_{ijk}$ belongs to $\mathcal{T}_{\lambda,\mu,\eta}$ if
$$\sum_{j,k} T_{ijk} = \lambda_i, \qquad \sum_{i,k} T_{ijk} = \mu_j, \qquad \sum_{i,j} T_{ijk} = \eta_k.$$
The partitions $\lambda, \mu, \eta$ are the sufficient statistics for the complete independence model: the probability of an observation falling in cell $(i, j, k)$ is a product $p_i q_j r_k$. A table can be thought of as a tri-partite hypergraph, where each edge connects exactly three vertices.
Representing a table by a set of $n$ tuples $(x, y, z)$, we can describe a similar swap Markov chain: Pick two tuples uniformly (with replacement), pick one of the three coordinates uniformly, and swap that coordinate between the two tuples. For example,
This move still corresponds to adding a $\pm 1$ pattern to a sub-array of the contingency table; the probability of picking the two tuples depends on the current entries of the table. To be precise about the Markov chain, let $T'$ denote the table after the swap has been made in the indicated entries at the chosen coordinate. Then,
These same moves were proposed in [17] as input for a Metropolis Markov chain for log-linear models on multi-way tables (Section 4.2); for that Markov chain the tuples to be swapped are chosen uniformly from all entries (and if the swap move results in a negative value in the table, it is rejected), so the chain has uniform stationary distribution. A simulation sketch of the tuple-swap move is given below.
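The following Python sketch (our own illustration) performs one tuple-swap move in the three-way setting: two of the $n$ data tuples are chosen uniformly with replacement, a coordinate is chosen uniformly, and the two tuples exchange that coordinate.

```python
import random
from collections import Counter

def three_way_swap_step(tuples):
    """One move of the 3-way swap chain: tuples is a list of (x, y, z) feature labels."""
    n = len(tuples)
    a, b = random.randrange(n), random.randrange(n)
    coord = random.randrange(3)                      # which of the three features to swap
    ta, tb = list(tuples[a]), list(tuples[b])
    ta[coord], tb[coord] = tb[coord], ta[coord]
    tuples[a], tuples[b] = tuple(ta), tuple(tb)
    return tuples

# Example: 4 observations with three binary features; every move preserves the 1-way margins.
data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
for _ in range(20):
    data = three_way_swap_step(data)
print(Counter(data))
```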
Theorem 5.1.
The Markov chain on $\mathcal{T}_{\lambda,\mu,\eta}$ is connected and reversible with respect to the natural analog of Fisher-Yates for 3-way tables:
$$\pi_{\lambda,\mu,\eta}(T) = \frac{\prod_i \lambda_i! \, \prod_j \mu_j! \, \prod_k \eta_k!}{(n!)^2 \, \prod_{i,j,k} T_{ijk}!}.$$
Proof.
To see that the Markov chain is connected, we can define a partial order on the space $\mathcal{T}_{\lambda,\mu,\eta}$, analogous to the majorization order on $\mathcal{T}_{\lambda,\mu}$ used in Proposition 2.10: Write $T \succeq S$ if
One move of the Markov chain corresponds to one edge in the lattice defined by this partial order (i.e. if then or ). Thus, every table can transition to the largest element and so the space is connected under .
To see that the chain is reversible with the correct stationary distribution, given suppose with so that . Then,
Then,
The calculation is analogous for . ∎
Remark 16.
It would be interesting to have a double-coset description of $\mathcal{T}_{\lambda,\mu,\eta}$, but we have not discovered one.
5.3 Comparison of Markov Chains
This new chain on contingency tables can be compared to other Markov chains on contingency tables using simple spectral comparison techniques, as developed in [13].
The comparison technique relies on the variational characterization of the spectral gap of a Markov chain: If $P$ is a reversible Markov chain on $\mathcal{X}$ with stationary distribution $\pi$ and $\gamma$ is the spectral gap, then
$$\gamma = \min_{f :\, \mathrm{Var}_\pi(f) \neq 0} \frac{\mathcal{E}(f)}{\mathrm{Var}_\pi(f)},$$
where
$$\mathcal{E}(f) = \frac{1}{2} \sum_{x, y \in \mathcal{X}} \left( f(x) - f(y) \right)^2 \pi(x) P(x, y)$$
is the Dirichlet form. The relaxation time is defined as the inverse of the absolute spectral gap $\gamma_* = 1 - \beta_*$, and can be used as one measure of mixing.
Lemma 5.2 (Lemma 13.22 in [37]).
Let $P$ and $\widetilde{P}$ be reversible transition matrices on $\mathcal{X}$ with stationary distributions $\pi$ and $\widetilde{\pi}$, respectively. If $\widetilde{\mathcal{E}}(f) \le \alpha \, \mathcal{E}(f)$ for all $f$, then
$$\widetilde{\gamma} \;\le\; \alpha \left[ \max_{x \in \mathcal{X}} \frac{\pi(x)}{\widetilde{\pi}(x)} \right] \gamma. \qquad (24)$$
The maximum-ratio term makes it difficult to compare Markov chains with stationary distributions which are very different. The uniform distribution and the Fisher-Yates distribution on $\mathcal{T}_{\lambda,\mu}$ are exponentially different, and the ratio of the distributions cannot be bounded by a constant.
Example 5.3.
With , with , there are tables. The table with the smallest Fisher-Yates probability is
with . Taking for example gives
by Stirling’s approximation. This is in sharp contrast with the uniform probability .
Instead, we can use the Metropolis algorithm on either the symmetric swap Markov chain or the random transpositions Markov chain to get two chains with the same stationary distribution to compare. This is done in both directions in the following sections.
Let $P$ be the random transpositions Markov chain on $\mathcal{T}_{\lambda,\mu}$ from Definition 1.4. Let $\widetilde{P}$ be the symmetric swap Markov chain, which has uniform stationary distribution. That is,
Let $P^U$ be the chain obtained from Metropolizing $P$ to have uniform stationary distribution, and let $\widetilde{P}^{FY}$ be the result of Metropolizing $\widetilde{P}$ to have the Fisher-Yates stationary distribution.
Theorem 5.4.
Let be the associated relaxation times for the Markov chains . Define
If , then
-
(a)
-
(b)
For any , note that and .
Example 5.5.
If , , then and . Then Theorem 5.4 shows
The intuition is that when the tables are relatively sparse, the Metropolized random transpositions chain is much more likely than the uniform swap chain to propose moves which are accepted. Accepting more moves means exploring the state space more quickly, and thus reaching stationarity sooner. Similarly, for the chains with Fisher-Yates stationary distribution in this setting,
Again, in this sparse regime the Markov chain utilizing the random transpositions moves has a significantly smaller relaxation time than the Markov chain created by using the uniform swap moves with the Metropolis algorithm. Note that if $\lambda = \mu = (1, \ldots, 1)$, then the Fisher-Yates distribution is exactly the uniform distribution.
The remainder of this section reviews the Metropolis-Hastings algorithm, finds the transition probabilities of the Metropolized chains, and then proves Theorem 5.4.
Metropolis-Hastings Algorithm
The random transpositions chain could be used as the proposal Markov chain in the Metropolis-Hastings algorithm to sample from the uniform distribution. Suppose $P$ is a Markov chain on $\mathcal{X}$ and $\pi^*$ is a target distribution. A new Markov chain on $\mathcal{X}$ which is reversible with respect to $\pi^*$ is defined as follows: From the current state $x$,
1. Generate a random candidate $y$ according to one step of the proposal chain, i.e. with probability $P(x, y)$.

2. Compute the acceptance probability
$$A(x, y) = \min\left\{ 1, \; \frac{\pi^*(y) \, P(y, x)}{\pi^*(x) \, P(x, y)} \right\}.$$

3. With probability $A(x, y)$, move to $y$. Otherwise, stay at the current state $x$.
The transitions for the new Markov chain are defined, for $x \neq y$, by
$$P^*(x, y) = P(x, y) \, A(x, y), \qquad P^*(x, x) = 1 - \sum_{y \neq x} P^*(x, y).$$
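A generic implementation of this recipe is short; the sketch below (our own, with hypothetical function arguments) wraps any proposal sampler and target distribution, and could be applied to the swap chain with either the uniform or the Fisher-Yates target.

```python
import random

def metropolis_hastings_step(x, propose, proposal_prob, target):
    """One Metropolis-Hastings step.
    propose(x)          -- samples a candidate y from the proposal chain at x
    proposal_prob(x, y) -- transition probability P(x, y) of the proposal chain
    target(x)           -- target probability (up to a constant), e.g. uniform or Fisher-Yates
    """
    y = propose(x)
    ratio = (target(y) * proposal_prob(y, x)) / (target(x) * proposal_prob(x, y))
    if random.random() < min(1.0, ratio):
        return y          # accept the candidate
    return x              # reject: stay at the current state

# Toy usage: a symmetric +/-1 proposal on {0,...,9} Metropolized toward weights w(x) = x + 1.
propose = lambda x: max(0, min(9, x + random.choice([-1, 1])))
proposal_prob = lambda x, y: 0.5          # the clamped +/-1 proposal is symmetric between states
target = lambda x: x + 1                  # unnormalized target weights
state = 0
for _ in range(1000):
    state = metropolis_hastings_step(state, propose, proposal_prob, target)
print(state)
```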
Uniform Distribution Comparison
For two states such that , the acceptance probability can be computed. One way to see this is to note that since is reversible with respect to the Fisher-Yates distribution , it is
supposing that .
In conclusion, the Metropolis chain has transitions
This is clearly a symmetric Markov chain and has uniform stationary distribution. Note that this holds in general: any chain, when transformed via Metropolis to have the uniform stationary distribution, has off-diagonal transitions of the form $\min\{ P(x, y), P(y, x) \}$.
To make the comparison of Dirichlet forms, since the stationary distributions are the same it is enough to compare the transition probabilities. This can be done,
and similarly
This gives the result, since
Fisher-Yates Distribution Comparison
In the other direction, we could take the symmetric swap Markov chain and Metropolize it to have stationary distribution Fisher-Yates. The acceptance probability in this direction is
The transition probabilities, for are then
To compare this with the random transpositions chain ,
And similarly
This proves part (b) of Theorem 5.4.
5.4 Parabolic Subgroups of $GL_n(\mathbb{F}_q)$
Contingency tables also arise from double cosets using the parabolic subgroups of $GL_n(\mathbb{F}_q)$, as described in [34].
Definition 5.6.
Let $\lambda$ be a partition of $n$. The parabolic subgroup $P_\lambda \le GL_n(\mathbb{F}_q)$ consists of all invertible block upper-triangular matrices with diagonal block sizes $\lambda_1, \ldots, \lambda_I$.
Section 4 of [34] shows that if $\lambda, \mu$ are two partitions of $n$, the double cosets $P_\lambda \backslash GL_n(\mathbb{F}_q) / P_\mu$ are indexed by contingency tables with row sums $\lambda$ and column sums $\mu$. Proposition 4.37 contains the size of the double coset which corresponds to the table $T$:
(25) |
Dividing (25) by $|GL_n(\mathbb{F}_q)|$ and setting $q = 1$ recovers the usual Fisher-Yates distribution for partitions $\lambda, \mu$. It remains an open problem to investigate these probability distributions and Markov chains on the double cosets of $GL_n(\mathbb{F}_q)$ from parabolic subgroups.
References
- [1] A. Agresti. A survey of exact inference for contingency tables. Statist. Sci., 7(1):131–177, 1992. With comments and a rejoinder by the author.
- [2] G. Amanatidis and P. Kleer. Rapid mixing of the switch Markov chain for strongly stable degree sequences. Random Structures & Algorithms, 57(3):637–657, 2020.
- [3] E. D. Belsley. Rates of convergence of random walk on distance regular graphs. Probability theory and related fields, 112(4):493–533, 1998.
- [4] S. C. Billey, M. Konvalinka, T. K. Petersen, W. Slofstra, and B. E. Tenner. Parabolic double cosets in coxeter groups, 2017.
- [5] O. Bormashenko. A coupling argument for the random transposition walk. arXiv preprint arXiv:1109.3915, 2011.
- [6] T. Ceccherini Silberstein, F. Scarabotti, and F. Tolli. Harmonic analysis on finite groups, volume 108. Cambridge University Press, 2008.
- [7] F. R. K. Chung, R. L. Graham, and S.-T. Yau. On sampling with Markov chains. Random Structures & Algorithms, 9(1-2):55–77, 1996.
- [8] P. Diaconis. Group representations in probability and statistics, volume 11 of Institute of Mathematical Statistics Lecture Notes—Monograph Series. Institute of Mathematical Statistics, Hayward, CA, 1988.
- [9] P. Diaconis. A generalization of spectral analysis with application to ranked data. The Annals of Statistics, pages 949–979, 1989.
- [10] P. Diaconis and B. Efron. Testing for independence in a two-way table: new interpretations of the chi-square statistic. Ann. Statist., 13(3):845–913, 1985. With discussions and with a reply by the authors.
- [11] P. Diaconis and A. Gangolli. Rectangular arrays with fixed margins. In Discrete probability and algorithms, pages 15–41. Springer, 1995.
- [12] P. Diaconis, A. Ram, and M. Simper. Double coset markov chains and random transvections, 2022.
- [13] P. Diaconis and L. Saloff-Coste. Comparison theorems for reversible markov chains. The Annals of Applied Probability, 3(3):696–730, 1993.
- [14] P. Diaconis and M. Shahshahani. Generating a random permutation with random transpositions. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 57(2):159–179, 1981.
- [15] P. Diaconis and M. Shahshahani. Time to reach stationarity in the Bernoulli-Laplace diffusion model. SIAM J. Math. Anal., 18(1):208–218, 1987.
- [16] P. Diaconis and M. Simper. Statistical enumeration of groups by double cosets. Journal of Algebra, 607:214–246, 2022. Special Issue dedicated to J. Saxl.
- [17] P. Diaconis and B. Sturmfels. Algebraic algorithms for sampling from conditional distributions. Annals of statistics, 26(1):363–397, 1998.
- [18] P. Diaconis and P. M. Wood. Random doubly stochastic tridiagonal matrices. Random Structures & Algorithms, 42(4):403–437, 2013.
- [19] C. F. Dunkl and Y. Xu. Orthogonal Polynomials of Several Variables. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2 edition, 2014.
- [20] U. Dutta, B. K. Fosdick, and A. Clauset. Sampling random graphs with specified degree sequences. arXiv e-prints, pages arXiv–2105, 2021.
- [21] A. Eskenazis and E. Nestoridi. Cutoff for the Bernoulli–Laplace urn model with swaps. In Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, volume 56, pages 2621–2639. Institut Henri Poincaré, 2020.
- [22] M. Fayers. A note on Kostka numbers. arXiv preprint arXiv:1903.12499, 2019.
- [23] B. Griffiths. Orthogonal polynomials on the multinomial distribution. Australian Journal of Statistics, 13(1):27–35, 1971.
- [24] R. C. Griffiths and D. Spano. Multivariate Jacobi and Laguerre polynomials, infinite-dimensional extensions, and their probabilistic connections with multivariate Hahn and Meixner polynomials. Bernoulli, 17(3):1095–1125, 2011.
- [25] D. Hernek. Random generation of $2 \times n$ contingency tables. Random Structures & Algorithms, 13(1):71–79, 1998.
- [26] P. Iliev and Y. Xu. Discrete orthogonal polynomials and difference equations of several variables. Advances in Mathematics, 212(1):1–36, 2007.
- [27] I. M. Isaacs. Character theory of finite groups, volume 69. Courier Corporation, 1994.
- [28] M. E. H. Ismail. Classical and Quantum Orthogonal Polynomials in One Variable. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2005.
- [29] G. D. James and A. Kerber. The Representation Theory of the Symmetric Group. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 1984.
- [30] G. James, M. W. Liebeck, and M. Liebeck. Representations and characters of groups. Cambridge University Press, 2001.
- [31] G. D. James. The representation theory of the symmetric groups. In Proc. Symposia in Pure Math, volume 47, pages 111–126, 1987.
- [32] H. Joe. An ordering of dependence for contingency tables. Linear algebra and its applications, 70:89–103, 1985.
- [33] S.-h. Kang and J. Klotz. Limiting conditional distribution for tests of independence in the two way table. Comm. Statist. Theory Methods, 27(8):2075–2082, 1998.
- [34] S. N. Karp and H. Thomas. $q$-Whittaker functions, finite fields, and Jordan forms. arXiv preprint arXiv:2207.12590, 2022.
- [35] K. Khare and H. Zhou. Rates of convergence of some multivariate Markov chains with polynomial eigenfunctions. The Annals of Applied Probability, 19(2):737–777, 2009.
- [36] H. O. Lancaster. The chi-squared distribution. John Wiley & Sons, Inc., New York-London-Sydney, 1969.
- [37] D. A. Levin and Y. Peres. Markov chains and mixing times, volume 107. American Mathematical Soc., 2017.
- [38] I. G. Macdonald. Symmetric functions and Hall polynomials. Oxford university press, 1998.
- [39] B. MacMahon. Mental health in the metropolis: The midtown Manhattan study, 1963.
- [40] A. W. Marshall, I. Olkin, and B. C. Arnold. Inequalities: theory of majorization and its applications, volume 143. Springer, 1979.
- [41] E. Nestoridi and O. Nguyen. On the mixing time of the Diaconis–Gangolli random walk on contingency tables over $\mathbb{Z}/q\mathbb{Z}$. In Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, volume 56, pages 983–1001. Institut Henri Poincaré, 2020.
- [42] J. Paguyo. Fixed points, descents, and inversions in parabolic double cosets of the symmetric group. arXiv preprint arXiv:2112.07728, 2021.
- [43] C. Pang. Lumpings of algebraic Markov chains arise from subquotients. Journal of Theoretical Probability, 32(4):1804–1844, 2019.
- [44] R. E. Ingram. Some characters of the symmetric group. Proceedings of the American Mathematical Society, pages 358–369, 1950.
- [45] J. Salzman. Spectral analysis with Markov chains. PhD thesis, Stanford University, 2007. Copyright - Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works; Last updated - 2021-09-28.
- [46] F. Scarabotti. Time to reach stationarity in the bernoulli–laplace diffusion model with many urns. Advances in Applied Mathematics, 18(3):351–371, 1997.
- [47] J.-P. Serre. Linear representations of finite groups, volume 42. Springer, 1977.
- [48] R. P. Stanley. Enumerative Combinatorics, volume 1 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2 edition, 2011.