
Random Transpositions on Contingency Tables

Mackenzie Simper [email protected] Stanford University
Abstract

Contingency tables are useful objects in statistics for representing 2-way data. With fixed row and column sums, and a total of $n$ entries, contingency tables correspond to parabolic double cosets of $S_n$. The uniform distribution on $S_n$ induces the Fisher-Yates distribution, classical for its use in the chi-squared test for independence. A Markov chain on $S_n$ can then induce a random process on the space of contingency tables through the double cosets correspondence. The random transpositions Markov chain on $S_n$ induces a natural 'swap' Markov chain on the space of contingency tables; the stationary distribution of this Markov chain is the Fisher-Yates distribution. This paper describes the Markov chain and shows that its eigenfunctions are orthogonal polynomials of the Fisher-Yates distribution. Results for the mixing time are discussed, as well as connections with sampling from the uniform distribution on contingency tables, and data analysis.

1 Introduction

Contingency tables are a classical object of study, useful in statistics for representing data with two categorical features. For example, rows could be hair colors and columns could be eye colors. The number in an entry in the table counts the number of individuals from a sample with a fixed hair, eye color combination. Various statistical tests exist to test the hypothesis that the row and column features are independent, and other more general models. See e.g. [36], [1] and Section 5 of [16] for many more references.

Given two partitions $\lambda=(\lambda_1,\dots,\lambda_I)$ and $\mu=(\mu_1,\dots,\mu_J)$ of $n$, the set of contingency tables $\mathcal{T}_{\lambda,\mu}$ is the set of $I\times J$ tables with non-negative integer entries, row sums $\lambda_1,\dots,\lambda_I$, and column sums $\mu_1,\dots,\mu_J$. As explained below, the space $\mathcal{T}_{\lambda,\mu}$ is in bijection with the space of double cosets $S_\lambda\backslash S_n/S_\mu$, where $S_n$ is the symmetric group on $n$ elements and $S_\lambda,S_\mu$ are the Young subgroups defined by $\lambda,\mu$. Though this is a classical fact [29], recently it was studied in [16] with probabilistic motivation. The general question is: given a finite group $G$ and two subgroups $H,K$, what is the distribution on $H\backslash G/K$ induced by picking an element uniformly from $G$ and mapping it to its double coset? For $S_\lambda\backslash S_n/S_\mu$, the induced distribution on $\mathcal{T}_{\lambda,\mu}$ is the Fisher-Yates distribution:

\pi_{\lambda,\mu}(T)=\frac{\prod_{i}\lambda_{i}!\,\prod_{j}\mu_{j}!}{n!\,\prod_{i,j}T_{ij}!},\quad T=\{T_{ij}\}_{1\leq i\leq I,\,1\leq j\leq J}\in\mathcal{T}_{\lambda,\mu}. (1)

The Fisher-Yates distribution is classically defined as the conditional distribution of a contingency table sampled with cell probabilities $p_{ij}$ which satisfy $p_{ij}=\theta_i\eta_j$ (the assumption that row and column features are independent). It is the distribution of the table conditioned on the sufficient statistics (the row and column sums). It is a pleasant surprise that this same distribution is induced by the double coset decomposition.
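For concreteness, here is a minimal Python sketch (ours, not from the paper; the function name fisher_yates_prob is hypothetical) evaluating the Fisher-Yates probability (1) of a table given as a list of rows.

from math import factorial, prod

def fisher_yates_prob(T):
    # row and column sums are read off the table itself
    lam = [sum(row) for row in T]                     # row sums lambda_i
    mu = [sum(col) for col in zip(*T)]                # column sums mu_j
    n = sum(lam)
    num = prod(factorial(a) for a in lam) * prod(factorial(b) for b in mu)
    den = factorial(n) * prod(factorial(x) for row in T for x in row)
    return num / den

# The first table of Example 1.3 below has double coset size 24, so probability 24/5! = 0.2
print(fisher_yates_prob([[2, 1, 0], [0, 1, 1]]))      # 0.2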

Whenever there is a Markov chain on some state space, one can consider the lumped process on any partitioning of this space. In [12] the question is studied for double cosets: Given a Markov chain on GG, is the lumped process on H\G/KH\backslash G/K also a Markov chain? An affirmative answer is proven when the Markov chain on GG is induced by a probability measure on GG which is constant on conjugacy classes.

Many Markov chains on $S_n$ have been studied, and perhaps the most famous and easiest to describe is the random transpositions Markov chain [14]. This is generated by the class function $Q$ which gives equal probability to transpositions. The Markov chain on $\mathcal{T}_{\lambda,\mu}$ induced by random transpositions turns out to be novel and easy to describe. One move involves adding a sub-matrix of the form $\begin{pmatrix}+1&-1\\-1&+1\end{pmatrix}$ to the current table; the probability of selecting the rows and columns for the sub-matrix depends on the current configuration. Since the stationary distribution of the random transpositions chain on $S_n$ is uniform, the induced chain on $\mathcal{T}_{\lambda,\mu}$ has the Fisher-Yates distribution as its stationary distribution. These statements are proven in Section 1.1 below.

This same move has previously been studied for a Markov chain on contingency tables with rows and columns for the sub-matrix chosen uniformly at random (and moves which would give a negative entry in the table are rejected). This 'uniform swap' chain was proposed in [11] to sample contingency tables from the uniform distribution, as a way of approximately counting the number of tables, which is a #P-complete problem.

The purpose of this paper is to describe the random transposition chain on 𝒯λ,μ\mathcal{T}_{\lambda,\mu}, characterize its eigenvalues and eigenvectors, and analyze the mixing time in special cases. This chain has polynomial eigenvectors, which can be used to give orthogonal polynomials for the Fisher-Yates distribution. In the special case of a two-rowed table, the Fisher-Yates distribution is the multivariate hypergeometric distribution and the orthogonal polynomials are the multivariate Hahn polynomials. More generally, this Markov chain gives a characterization of the natural analog of Hahn polynomials for the Fisher-Yates distribution.

The remainder of this section reviews the correspondence between contingency tables and double cosets, defines the Markov chain explicitly, states the main results of this paper, and reviews related work.

1.1 Contingency Tables as Double Cosets

This section describes the relationship between contingency tables and double cosets of SnS_{n}. Let nn be a positive integer and suppose λ=(λ1,λ2,,λI)\lambda=(\lambda_{1},\lambda_{2},\dots,\lambda_{I}) is a partition of nn. That is, λi\lambda_{i} are all positive integers, λ1λ2λI>0\lambda_{1}\geq\lambda_{2}\geq\ldots\geq\lambda_{I}>0, and λ1+λ2++λI=n\lambda_{1}+\lambda_{2}+\ldots+\lambda_{I}=n.

Definition 1.1 (Young Subgroup).

The parabolic subgroup, or Young subgroup, SλS_{\lambda} is the set of all permutations in SnS_{n} which permute only {1,2,,λ1}\{1,2,\dots,\lambda_{1}\} among themselves, only {λ1+1,,λ1+λ2}\{\lambda_{1}+1,\dots,\lambda_{1}+\lambda_{2}\} among themselves, and so on. Thus,

SλSλ1×Sλ2××SλI.S_{\lambda}\cong S_{\lambda_{1}}\times S_{\lambda_{2}}\times\ldots\times S_{\lambda_{I}}.

If GG is an arbitrary finite group and H,KH,K are two sub-groups of GG, then the HKH-K double-cosets are the equivalence classes defined by the equivalence relation

sts=htkfors,tG,hH,kK.s\sim t\iff s=htk\,\,\,\,\,\,\,\,\text{for}\,\,\,\,\,\,\,\,s,t\in G,\,\,\,h\in H,\,\,\,k\in K.

Let μ=(μ1,,μJ)\mu=(\mu_{1},\dots,\mu_{J}) be a second partition of nn, let 𝒯λ,μ\mathcal{T}_{\lambda,\mu} denote the space of contingency tables with row sums given by λ\lambda and column sums μ\mu. That is, an element of 𝒯λ,μ\mathcal{T}_{\lambda,\mu} is an I×JI\times J table with non-negative integer entries, row sums λ1,,λI\lambda_{1},\dots,\lambda_{I} and column sums μ1,,μJ\mu_{1},\dots,\mu_{J}. The following bijection is described in further detail in [29], and the induced distribution, along with many more references, in [16].

Lemma 1.2.

The set of double cosets $S_\lambda\backslash S_n/S_\mu$ is in bijection with $\mathcal{T}_{\lambda,\mu}$. The distribution on $\mathcal{T}_{\lambda,\mu}$ induced by sampling a uniform $\sigma\in S_n$ and mapping it to its corresponding double coset is the Fisher-Yates distribution

\pi_{\lambda,\mu}(T)=\frac{\prod_{i}\lambda_{i}!\,\prod_{j}\mu_{j}!}{n!\,\prod_{i,j}T_{ij}!},\quad T\in\mathcal{T}_{\lambda,\mu}. (2)
Remark 1.

Without loss of generality, we assume that the rows and columns of the contingency tables in $\mathcal{T}_{\lambda,\mu}$ are ordered, i.e. the first row/column has the largest sum and the row/column sums are decreasing. This ordering does not affect any of the results in this paper.

The mapping from $S_n$ to tables is easy to describe: Fix $\sigma\in S_n$. Inspect the first $\lambda_1$ positions in $\sigma$. Let $T_{11}$ be the number of elements from $\{1,2,\dots,\mu_1\}$ occurring in these positions, $T_{12}$ the number of elements from $\{\mu_1+1,\dots,\mu_1+\mu_2\}$, ..., and $T_{1J}$ the number of elements from $\{n-\mu_J+1,\dots,n\}$. In general, $T_{ij}$ is the number of elements from $\{\mu_1+\ldots+\mu_{j-1}+1,\dots,\mu_1+\ldots+\mu_j\}$ which occur in positions $\lambda_1+\ldots+\lambda_{i-1}+1$ up to $\lambda_1+\ldots+\lambda_i$.
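The following short Python sketch (ours, not from the paper; perm_to_table is a hypothetical name) implements this map from a permutation in one-line notation to its contingency table.

def perm_to_table(sigma, lam, mu):
    # block index (0-based) of a value v under a composition (p_1, p_2, ...)
    def block(v, parts):
        total = 0
        for b, p in enumerate(parts):
            total += p
            if v <= total:
                return b
    T = [[0] * len(mu) for _ in lam]
    for pos, val in enumerate(sigma, start=1):
        T[block(pos, lam)][block(val, mu)] += 1
    return T

print(perm_to_table([1, 2, 5, 3, 4], (3, 2), (2, 2, 1)))   # [[2, 0, 1], [0, 2, 0]]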

Example 1.3.

When n=5n=5, λ=(3,2)\lambda=(3,2), μ=(2,2,1)\mu=(2,2,1) there are five possible tables:

\begin{pmatrix}2&1&0\\0&1&1\end{pmatrix}\qquad\begin{pmatrix}2&0&1\\0&2&0\end{pmatrix}\qquad\begin{pmatrix}1&2&0\\1&0&1\end{pmatrix}\qquad\begin{pmatrix}1&1&1\\1&1&0\end{pmatrix}\qquad\begin{pmatrix}0&2&1\\2&0&0\end{pmatrix}
\sigma=12345\qquad\ \ \sigma=12534\qquad\ \ \sigma=13425\qquad\ \ \sigma=13524\qquad\ \ \sigma=34512
\ \ 24\qquad\qquad\ \ 12\qquad\qquad\ \ 24\qquad\qquad\ \ 48\qquad\qquad\ \ 12

Listed below each table is a permutation in the corresponding double coset, and the total size of the double coset. The double coset representatives are chosen to be the ones of minimal length, i.e. requiring the fewest adjacent transpositions to change the permutation to the identity. It is always possible to find a unique shortest double coset representative [4]. This is easy to identify: Given $T$, build $\sigma$ sequentially, left to right, by putting down $1,2,\dots,T_{11}$, then $\mu_1+1,\mu_1+2,\dots,\mu_1+T_{12}$, ..., each time putting down the smallest available numbers in the $\mu_j$ block, in order. In Example 1.3, the shortest double coset representatives are shown.
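As a sketch of this construction (ours; the helper name min_representative is not from the paper), the minimal-length representative can be built directly from the table:

def min_representative(T, mu):
    # next_val[j] is the smallest unused value in the j-th mu block
    next_val, s = [], 1
    for m in mu:
        next_val.append(s)
        s += m
    sigma = []
    for row in T:
        for j, count in enumerate(row):
            sigma.extend(range(next_val[j], next_val[j] + count))
            next_val[j] += count
    return sigma

# Reproduces the representatives of Example 1.3, e.g. the last table gives 34512:
print(min_representative([[0, 2, 1], [2, 0, 0]], (2, 2, 1)))   # [3, 4, 5, 1, 2]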

Thanks to Zhihan Li, another way to describe the mapping from $S_n$ to contingency tables is to consider the $n\times n$ permutation matrix defined by $\sigma$. The partition $\mu$ divides the columns into $J$ blocks and $\lambda$ divides the rows into $I$ blocks. The table $T$ corresponding to $\sigma$ is defined by letting $T_{ij}$ be the number of $1$s in the $i$-th block of rows and $j$-th block of columns. For example, $\sigma=12543$ gives the permutation matrix

μ1\mu_{1} μ2\mu_{2} μ3\mu_{3}
λ1\lambda_{1} 1 0 0 0 0
0 1 0 0 0
0 0 0 0 1
λ2\lambda_{2} 0 0 0 1 0
0 0 1 0 0

which defines the table $\begin{pmatrix}2&0&1\\0&2&0\end{pmatrix}$.

1.2 Markov Chain and Main Results

The random transpositions chain on SnS_{n} is easy to describe. If a permutation σSn\sigma\in S_{n} is viewed as an arrangement of a deck of nn unique cards, the random transpositions chain can be described: Pick one card with your left hand and one card with your right hand (allowing for the possibility of picking the same card), and swap the two cards. The transition probabilities are generated by the probability measure QQ on SnS_{n} defined

Q(\sigma)=\begin{cases}\frac{2}{n^{2}}&\text{if }\sigma=(i,j),\ i<j\\ \frac{1}{n}&\text{if }\sigma=\mathrm{id}\\ 0&\text{otherwise},\end{cases} (3)

where (i,j)(i,j) denotes the transposition of elements ii and jj. The transition matrix for the random transposition Markov chain is then

P(x,y)=Q(yx1),x,ySn.P(x,y)=Q(y\cdot x^{-1}),\,\,\,\,\,\,\,\,x,y\in S_{n}.

Note that $Q$ is constant on conjugacy classes: if $i<j$, $k<\ell$ and $x=(ij)$, $y=(k\ell)$ are two transpositions, then

yxy^{-1}=(k\ell)\cdot(ij)\cdot(k\ell)=\begin{cases}(ij)&\text{if }\{i,j\}\cap\{k,\ell\}=\emptyset\ \text{or}\ (i,j)=(k,\ell)\\ (j\ell)&\text{if }i=k,\ j\neq\ell\\ (ki)&\text{if }j=\ell,\ i\neq k\\ (jk)&\text{if }i=\ell\\ (i\ell)&\text{if }j=k.\end{cases}

In every case $yxy^{-1}$ is a transposition. Since transpositions generate $S_n$, it follows that $yxy^{-1}$ is a transposition for any $y\in S_n$ whenever $x$ is a transposition, and thus $Q(yxy^{-1})=Q(x)$.

The random transpositions chain on SnS_{n} induces a Markov chain on 𝒯λ,μ\mathcal{T}_{\lambda,\mu} as a ‘lumping’. From a table T𝒯λ,μT\in\mathcal{T}_{\lambda,\mu}, choose a permutation xSnx\in S_{n} that maps to the double coset corresponding to TT. From xx, sample yy from P(x,)P(x,\cdot) and move to the contingency table corresponding to yy. More details of this lumping are discussed in Section 2.1, and this perspective will be useful for relating the well-studied eigenvalues of the chain on SnS_{n} to the chain on 𝒯λ,μ\mathcal{T}_{\lambda,\mu}. It is also possible to write down the transition probabilities independently of the chain on SnS_{n}.

Definition 1.4 (Random Transpositions Markov chain on 𝒯λ,μ\mathcal{T}_{\lambda,\mu}).

Let $\lambda=(\lambda_1,\dots,\lambda_I)$, $\mu=(\mu_1,\dots,\mu_J)$ be two partitions of $n$. For $T\in\mathcal{T}_{\lambda,\mu}$ and indices $1\leq i_1\neq i_2\leq I$ and $1\leq j_1\neq j_2\leq J$, let $F_{(i_1,j_1),(i_2,j_2)}(T)=T^{\prime}$ denote the table with

Ti1,j1=Ti1,j11,\displaystyle T_{i_{1},j_{1}}^{\prime}=T_{i_{1},j_{1}}-1,
Ti2,j2=Ti2,j21,\displaystyle T_{i_{2},j_{2}}^{\prime}=T_{i_{2},j_{2}}-1,
Ti1,j2=Ti1,j2+1,\displaystyle T_{i_{1},j_{2}}^{\prime}=T_{i_{1},j_{2}}+1,
Ti2,j1=Ti2,j1+1,\displaystyle T_{i_{2},j_{1}}^{\prime}=T_{i_{2},j_{1}}+1,

and all other entries of TT^{\prime} the same as TT. The random transpositions Markov chain on 𝒯λ,μ\mathcal{T}_{\lambda,\mu} is defined by the transition matrix PP with:

P(T,T^{\prime})=\begin{cases}\frac{2\,T_{i_1,j_1}T_{i_2,j_2}}{n^{2}}&\text{if }T^{\prime}=F_{(i_1,j_1),(i_2,j_2)}(T)\\ 1-\sum_{i_1<i_2}\sum_{j_1\neq j_2}\frac{2\,T_{i_1,j_1}T_{i_2,j_2}}{n^{2}}&\text{if }T^{\prime}=T\\ 0&\text{otherwise}.\end{cases}

Note if Ti1,j1=0T_{i_{1},j_{1}}=0 or Ti2,j2=0T_{i_{2},j_{2}}=0, then F(i1,j1),(i2,j2)(T)F_{(i_{1},j_{1}),(i_{2},j_{2})}(T) has negative values and is not an element of 𝒯λ,μ\mathcal{T}_{\lambda,\mu}. Definition 1.4 does not allow selecting this possibility, since only pairs with Ti1,j1Ti2,j21T_{i_{1},j_{1}}T_{i_{2},j_{2}}\geq 1 will be chosen.

There are other ways of thinking about the Markov chain. Suppose the contingency table is created from nn data pairs of the type (i,j)(i,j), where ii indicates the row feature and jj indicates the column feature. One step of the chain is to pick two pairs uniformly (possibly picking the same pair) and swapping the column values (or equivalently, swapping the row values). This is exactly the swap Markov chain on bi-partite graphs, which has been studied for various special cases, e.g. [2].
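A minimal simulation sketch (ours, in Python) of one step of Definition 1.4, using exactly this data-pairs description; chain_step is a hypothetical name and modifies the table in place.

import random

def chain_step(T):
    # list each of the n items once, labelled by its (row, column) cell
    items = [(i, j) for i, row in enumerate(T) for j, c in enumerate(row) for _ in range(c)]
    i1, j1 = random.choice(items)
    i2, j2 = random.choice(items)            # with replacement, matching Q in (3)
    if i1 != i2 and j1 != j2:                # otherwise the table is unchanged
        T[i1][j1] -= 1; T[i2][j2] -= 1
        T[i1][j2] += 1; T[i2][j1] += 1
    return T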

Example 1.5.

$\lambda=(3,2)$, $\mu=(2,2,1)$:

T^{x}=\begin{pmatrix}2&1&0\\0&1&1\end{pmatrix}\qquad\longrightarrow\qquad T^{y}=\begin{pmatrix}1&2&0\\1&0&1\end{pmatrix}
x=12345\qquad\qquad\longrightarrow\qquad y=1\mathbf{4}3\mathbf{2}5

The probability of this transition for the tables is $4/5^{2}$: there are two possible transpositions in the permutation $x$ that would result in the table $T^{y}$, each chosen with probability $2/5^{2}$. Representing the table as a set of $n$ data points, this transition is:

T^{x}:(1,1),(1,1),(1,2),(2,2),(2,3)\longrightarrow T^{y}:(1,1),(1,\mathbf{2}),(1,2),(2,\mathbf{1}),(2,3).
Remark 2.

If I=J=2I=J=2, then this is related to the Bernoulli-Laplace chain: Suppose λ=(nk,k),μ=(nj,j)\lambda=(n-k,k),\mu=(n-j,j). Consider two urns. The first contains nkn-k total balls and the second has kk balls. Of these nn total balls, njn-j are green and jj are red. In the contingency table, the rows represent urns and the columns colors. In the traditional Bernoulli-Laplace urn, one ball is picked uniformly from each urn and swapped. The random transpositions chain Definition 1.4 can be described by instead picking 22 balls uniformly from all nn balls and swapping (which creates the possibility both are selected from the same urn).

Using the spherical Fourier transform of the Gelfand pair (S2n,Sn×Sn)(S_{2n},S_{n}\times S_{n}), Diaconis and Shahshahani proved mixing time of order (n/4)log(n)(n/4)\log(n) steps for the Bernoulli-Laplace chain in a special case (for 2n2n total balls in the system); for thorough analyses of the Bernoulli-Laplace chain, see [15], [21], [6]. An extension of the Bernoulli-Laplace chain with mm urns is studied in [46]; this state space corresponds to 𝒯λ,μ\mathcal{T}_{\lambda,\mu} for λ,μ\lambda,\mu partitions of mnmn with λ=1mn,μ=(n,n,,n)\lambda=1^{mn},\mu=(n,n,\dots,n).

Remark 3.

The random transpositions chain is sometimes defined with the probability measure:

\widetilde{Q}(\sigma)=\begin{cases}\frac{1}{\binom{n}{2}}&\text{if }\sigma=(i,j),\ i<j\\ 0&\text{else}.\end{cases}

That is, the random walk on SnS_{n} is described: pick a card with your left hand and pick a different card with your right hand. This means the permutation will always change at each step, but then the process is periodic. Using QQ from (3) (i.e. allowing the possibility of picking the same card) the chain is aperiodic, which allows for easier analysis of the chain on SnS_{n}.

Note that as long as $\lambda,\mu$ are not both equal to $(1,1,\dots,1)$ (in which case $S_\lambda\backslash S_n/S_\mu\cong S_n$), the chain using probabilities $1/\binom{n}{2}$ is not periodic: if some row or column contains at least two of the $n$ items, say in cells $(i,j)$ and $(i,l)$, then picking these two items to swap does not change the table. Thus, using $\widetilde{Q}$ to define the induced process on $\mathcal{T}_{\lambda,\mu}$ is also possible. The analysis and mixing times are the same for both $Q$ and $\widetilde{Q}$, except the chains have slightly different eigenvalues.

The interpretation of the chain as a lumping of the random transpositions chain on SnS_{n} is advantageous because the chain on contingency tables inherits its eigenvalues from the chain on SnS_{n}, and these eigenvalues are well-known. Representation theory also gives an expression for the multiplicities of the eigenvalues in terms of Kostka numbers corresponding to partitions. Section 2.1 reviews this background in more detail, and thoroughly explains part (b) of the following theorem, which are the results for eigenvalues and multiplicities.

Theorem 1.6.

Let λ=(λ1,,λI),μ=(μ1,,μJ)\lambda=(\lambda_{1},\dots,\lambda_{I}),\mu=(\mu_{1},\dots,\mu_{J}) be partitions of nn and PP the random transpositions Markov chain on the space of contingency tables 𝒯λ,μ\mathcal{T}_{\lambda,\mu}.

  1. (a)

    The eigenvalues βρ\beta_{\rho} are of the form

    βρ=1n+1n2j=1k[(ρjj)(ρjj+1)j(j1)],\beta_{\rho}=\frac{1}{n}+\frac{1}{n^{2}}\sum_{j=1}^{k}\left[(\rho_{j}-j)(\rho_{j}-j+1)-j(j-1)\right],

    for ρ=(ρ1,,ρk)\rho=(\rho_{1},\dots,\rho_{k}) a partition of nn.

  2. (b)

    The multiplicity of βρ\beta_{\rho} for PP is mρλmρμm_{\rho}^{\lambda}\cdot m_{\rho}^{\mu}, where mρλm_{\rho}^{\lambda} is the Kostka number (defined below).

  3. (c)

    Eigenfunctions of PP are the orthogonal polynomials for the Fisher-Yates distribution.

Theorem 1.6 (a) and (b) arise from the general theory for Markov chains on double cosets developed in [12] and [18] and summarized in Section 2.1. Section 3 contains a proof of Theorem 1.6(c) as well as a description of the multiplicities. It seems unlikely that there would be a simple formula for the number and multiplicity of the eigenvalues, as this would give a way of enumerating the size of the state space $\mathcal{T}_{\lambda,\mu}$, which is a #P-complete problem [11]. The orthogonal polynomials of the Fisher-Yates distribution have not been explicitly computed. Section 3.2 computes linear and quadratic eigenfunctions of $P$, for any size table, which can be used as a starting point for finding orthogonal polynomials.

For 2×J2\times J tables, the stationary distribution is multi-variate hypergeometric, and the computation of the eigenvalues simplifies to give a more complete picture.

Theorem 1.7.

Let λ=(nk,k)\lambda=(n-k,k) for some kn/2k\leq\lfloor n/2\rfloor, μ=(μ1,,μJ)\mu=(\mu_{1},\dots,\mu_{J}), and PP be the random swap Markov chain on the space of contingency tables 𝒯λ,μ\mathcal{T}_{\lambda,\mu}. Then the eigenvalues of PP are

βm=12m(n+1m)n2,      0mk,\beta_{m}=1-\frac{2m(n+1-m)}{n^{2}},\,\,\,\,\,\,0\leq m\leq k,

with eigenbasis the orthogonal polynomials of degree mm for the multivariate hypergeometric distribution (defined below). The multiplicity of βm\beta_{m} is the size of the set

{(x1,,xJ1)J1:j=1J1xj=m,xj<μJj+1}.\left\{(x_{1},\dots,x_{J-1})\in\mathbb{N}^{J-1}:\sum_{j=1}^{J-1}x_{j}=m,x_{j}<\mu_{J-j+1}\right\}.

The eigenvalues and eigenvectors allow an analysis of the convergence rates of this Markov chain. The measure that we study for convergence rate is the mixing time. That is, for a Markov chain on a space Ω\Omega with transition probability PP and stationary distribution π\pi, the mixing time is defined as

tmix(ϵ)=supx0Ωinf{t>0:Pt(x0,)π()TV<ϵ},ϵ>0,t_{mix}(\epsilon)=\sup_{x_{0}\in\Omega}\inf\{t>0:\|P^{t}(x_{0},\cdot)-\pi(\cdot)\|_{TV}<\epsilon\},\quad\epsilon>0,

where μπTV=supAΩ|μ(A)π(A)|\|\mu-\pi\|_{TV}=\sup_{A\subset\Omega}|\mu(A)-\pi(A)| is the total-variation distance between probability measures.
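In computations it is convenient to use the equivalent formula $\|\mu-\pi\|_{TV}=\frac{1}{2}\sum_x|\mu(x)-\pi(x)|$; a small Python sketch (ours, with the hypothetical name tv_distance):

def tv_distance(p, q):
    # p, q: dictionaries mapping states to probabilities
    states = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in states)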

It is challenging to analyze the mixing time in full generality, but understanding the linear and quadratic eigenfunctions of the chain give the following results:

  • For λ=(nk,k),μ=(n,)\lambda=(n-k,k),\mu=(n-\ell,\ell), the time to stationarity, averaged over starting positions sampled from πλ,μ\pi_{\lambda,\mu}, is bounded above by (n/4)log(min(k,))(n/4)\log(\min(k,\ell)).

  • For λ=(nk,k)\lambda=(n-k,k) and any μ\mu, the multivariate Hahn polynomials are used to compute an upper bound for the time to stationarity starting from extreme states in which the second row has a single non-zero entry.

  • For any λ,μ\lambda,\mu, the mixing time is bounded below by ((n/4)1)log(Cλ,μn)((n/4)-1)\log(C_{\lambda,\mu}n), where Cλ,μC_{\lambda,\mu} is a computable constant.

The mixing time of the random transpositions chain on SnS_{n} is (1/2)nlog(n)+cn(1/2)n\log(n)+cn, which is an upper bound for the mixing time of the lumped chain on 𝒯λ,μ\mathcal{T}_{\lambda,\mu}. The results listed above suggest that there is not much of a speed-up.

1.3 Related Work

Note that for two way contingency tables it is easy to sample directly from the Fisher-Yates distribution (by generating a random permutation and using the argument of Lemma 1.2). One reason for interest in the Markov chain mixing time is that, for three and higher way tables, the Markov chain method is the only available approach, so it is important to see how it works in the two way case.

The uniform swap Markov chain on contingency tables was first proposed in [11], with the motivation of studying the uniform distribution [10]. The mixing time of this chain has been analyzed in special cases, e.g.  if n=λi=μjn=\sum\lambda_{i}=\sum\mu_{j} and the number of rows and columns is fixed, then the mixing time is of order n2n^{2}. When the table has two rows, the chain mixes in time polynomial in the number of columns and nn [25]. Chung, Graham, and Yau [7] indicate that a modified version of the chain converges in time polynomial in the number of rows and columns.

A contingency table can be thought of as a bi-partite graph with prescribed degree sequences, and a similar swap Markov chain defined for any graph with fixed degree sequences to sample from the uniform distribution, e.g.  [2]. There is a large literature on methods for sampling graphs with given degree sequences (not necessarily bi-partite); see [20] for a recent review. Recently the chain was studied for contingency tables with entries in /q\mathbb{Z}/q\mathbb{Z} and mixing time with cut-off proven at cqn2log(n)c_{q}n^{2}\log(n) [41].

A Markov chain on $\mathcal{T}_{\lambda,\mu}$ for which the stationary distribution is Fisher-Yates was studied in [17], motivated by studying conditional distributions. The Markov chain is created by taking the Gibbs sampler of the uniform swap chain defined in [11]. This gives the chain: pick a pair of rows $i_1,i_2$ and columns $j_1,j_2$ uniformly at random. This determines a $2\times2$ sub-table. Replace it with a $2\times2$ sub-table with the same margins, chosen from the hypergeometric distribution. As noted in Section 2 of [17], while it is straightforward to sample directly from the Fisher-Yates distribution for $2$-way tables, there is not a simple algorithm for sampling from the analogous distribution on $3$-way or higher-dimensional tables. Thus, Markov chains with the Fisher-Yates distribution as stationary distribution are valuable for this application; multiway tables are discussed further in Section 5.2.

1.4 Outline

Section 2 contains an overview of basic definitions and facts for Markov chains, double cosets, and orthogonal polynomials. Section 3 establishes the eigenvalues and multiplicities, as well as the eigenfunctions of the Markov chain as polynomials, and gives formulas for the linear and quadratic polynomial eigenfunctions. These are then used in Section 4 to find explicit upper and lower bounds on the mixing time of the chain, in specific cases. Section 5 discusses some future directions.

Acknowledgments

The author thanks Persi Diaconis for helpful discussion and suggestions. This work was supported by a National Defense Science & Engineering Graduate Fellowship and a Lieberman Fellowship at Stanford University.

2 Background

This section reviews necessary facts and background on Markov chains, orthogonal polynomials, and the random transpositions chain on SnS_{n}.

2.1 Double Coset Markov Chains

The results of this section use basic tools of representation theory; in particular, induced representations and Frobenius reciprocity. For background on these topics, see [27], [30], [47]. In the specific case in hand – the symmetric group – additional tools are available; Young’s rule and the hook-length formula. These are clearly developed in [31]. See also, [29].

Let $H,K$ be subgroups of the finite group $G$, and $Q$ a probability on $G$. If $\mathrm{supp}(Q)$ is not contained in a coset of a proper subgroup, then the random walk on $G$ induced by $Q$ is ergodic with the uniform stationary distribution. This random walk on $G$ is defined by picking a random element from $Q$ and multiplying it by the current state. That is, transitions are

P(x,y)=Q(yx1).P(x,y)=Q(yx^{-1}).

See [8] for an introduction to random walks on groups.

The double cosets of H\G/KH\backslash G/K partition the space GG and any Markov chain on GG defines a random process on the set of double cosets by keeping track of which double coset the process on GG is in, giving a ‘lumped’ process. Proposition 2.1 gives a condition for when the random walk induced by QQ on GG is also a Markov chain on the double cosets. For the statement we fix double coset representatives and write xx for the double coset HxKHxK. The result is proven in [12].

Proposition 2.1.

Let $Q$ be a probability on $G$ which is $H$-conjugacy invariant ($Q(s)=Q(h^{-1}sh)$ for $h\in H$, $s\in G$). Then, the image of the random walk driven by $Q$ on $G$ maps to a Markov chain on $H\backslash G/K$ with transition kernel

P(x,y)=Q(HyKx1).P(x,y)=Q(HyKx^{-1}).

The stationary distribution is π(x)=|HxK|/|G|\pi(x)=|HxK|/|G|. If Q(s)=Q(s1)Q(s)=Q(s^{-1}), then (P,π)(P,\pi) is reversible.

Remark 4.

It is special for a function of a Markov chain to also be Markov. In this setting, many of the famous Markov chains on $S_n$ are not Markov when they are lumped to the double cosets $S_\lambda\backslash S_n/S_\mu$. For example, the adjacent transpositions random walk on $S_n$ is induced by $Q((i,i+1))=1/(n-1)$, which does not satisfy the conditions of Proposition 2.1. See [43] for a survey on lumped Markov chains.

Since transpositions form a conjugacy class in $S_n$, the probability $Q$ from (3) satisfies the conditions of Proposition 2.1, so the process lumped to contingency tables is a Markov chain. The following lemma also directly proves that the lumping of the random transpositions chain to contingency tables is Markov, and equivalent to Definition 1.4.

Lemma 2.2.

The random transpositions chain on $S_n$ induced by $Q$ from (3), when lumped to the double cosets $S_\lambda\backslash S_n/S_\mu$, is equivalent to the Markov chain defined in Definition 1.4.

Proof.

Suppose $\lambda=(\lambda_1,\dots,\lambda_I)$, $\mu=(\mu_1,\dots,\mu_J)$. Define $L_1=\{1,\dots,\lambda_1\}$, $L_2=\{\lambda_1+1,\dots,\lambda_1+\lambda_2\}$, ..., $L_I=\{n-\lambda_I+1,\dots,n\}$ and similarly use $\mu$ to define $M_1,\dots,M_J$. Then the contingency table corresponding to the double coset of $x\in S_n$ is defined by

Tx={Tijx},Tijx=#{Li:x()Mj}.T^{x}=\{T_{ij}^{x}\},\quad T_{ij}^{x}=\#\{\ell\in L_{i}:x(\ell)\in M_{j}\}.

Let (ab)(ab) be a transposition in SnS_{n}. If y=(ab)xy=(ab)x then y(a)=x(b),y(b)=x(a)y(a)=x(b),y(b)=x(a) and y(c)=x(c),ca,by(c)=x(c),c\neq a,b. Suppose aLi1,bLi2,x(a)Mj1,x(b)Mj2a\in L_{i_{1}},b\in L_{i_{2}},x(a)\in M_{j_{1}},x(b)\in M_{j_{2}}. If i1=i2i_{1}=i_{2} or j1=j2j_{1}=j_{2} then ySλxSμy\in S_{\lambda}xS_{\mu}. Otherwise, Ty=F(i1,j1),(i2,j2)(Tx)T^{y}=F_{(i_{1},j_{1}),(i_{2},j_{2})}(T^{x}), as defined in Definition 1.4.

Note that if x,xSnx,x^{\prime}\in S_{n} are contained in the same double coset, i.e. Tx=TxT^{x}=T^{x^{\prime}}, then for any ySny\in S_{n}

ySλySμQ(yx1)=ySλySμQ(y(x)1).\sum_{y^{\prime}\in S_{\lambda}yS_{\mu}}Q(y^{\prime}x^{-1})=\sum_{y^{\prime}\in S_{\lambda}yS_{\mu}}Q(y^{\prime}(x^{\prime})^{-1}).

In words, for the random transpositions chain on SnS_{n}, the probability of transitioning from the double coset SλxSμS_{\lambda}xS_{\mu} to another double coset SλySμS_{\lambda}yS_{\mu} doesn’t depend on the choice of double coset representative. This is Dynkin’s condition for lumped Markov chains ([37], [43]).

Eigenvalues

Let 1=β0>β1β|Ω|111=\beta_{0}>\beta_{1}\geq\ldots\geq\beta_{|\Omega|-1}\geq-1 be the eigenvalues of PP with corresponding eigenfunctions f0,f1,,f|Ω|1f_{0},f_{1},\dots,f_{|\Omega|-1}. That is, fi:Ωf_{i}:\Omega\to\mathbb{R} and

𝔼[fi(X1)X0=x]=βifi(x),xΩ.\operatorname{{\mathbb{E}}}[f_{i}(X_{1})\mid X_{0}=x]=\beta_{i}\cdot f_{i}(x),\,\,\,\,\,x\in\Omega.

For reversible Markov chains, the eigenfunctions {fi}i0\{f_{i}\}_{i\geq 0} can be chosen to be orthonormal with respect to the stationary distribution:

xΩfi(x)fj(x)π(x)=1(i=j).\sum_{x\in\Omega}f_{i}(x)\cdot f_{j}(x)\cdot\pi(x)=\textbf{1}(i=j).

The eigenvalues and eigenfunctions give an exact formula for the chi-squared distance between the chain and the stationary distribution, defined

χx2(t):=yΩ|Pt(x,y)π(y)|2π(y).\chi^{2}_{x}(t):=\sum_{y\in\Omega}\frac{\left|P^{t}(x,y)-\pi(y)\right|^{2}}{\pi(y)}.

The chi-squared distance then gives an upper bound for the total variation distance. This information is summarized in the following lemma, see [37] Chapter 12.

Lemma 2.3.

For any xΩx\in\Omega and t>0t>0,

Pt(x,)π()TV214χx2(t)=14i=1|Ω|1βi2tfi2(x).\|P^{t}(x,\cdot)-\pi(\cdot)\|_{TV}^{2}\leq\frac{1}{4}\chi^{2}_{x}(t)=\frac{1}{4}\cdot\sum_{i=1}^{|\Omega|-1}\beta_{i}^{2t}\cdot f_{i}^{2}(x).
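A quick numerical sketch (ours, assuming NumPy is available) illustrating the bound of Lemma 2.3 on a tiny reversible chain, computing both sides directly from matrix powers:

import numpy as np

P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])        # stationary; detailed balance holds
t, x = 5, 0
Pt = np.linalg.matrix_power(P, t)
chi2 = float(sum((Pt[x, y] - pi[y]) ** 2 / pi[y] for y in range(3)))
tv = 0.5 * float(np.abs(Pt[x] - pi).sum())
print(tv ** 2 <= chi2 / 4, tv, chi2)     # the total variation bound of Lemma 2.3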

Note that the worst-case mixing time is defined by looking at the total variation distance from the worst-case starting point. Thus, to bound the mixing time one needs to be able to analyze $f_i^2(x)$ for any $x\in\Omega$. Even when the eigenfunctions are explicitly known, this can be challenging. For chains $P(x,y)$ on $\Omega$ where a group $G$ acts transitively (for all $x,y\in\Omega$, $g\in G$, $P(x,y)=P(gx,gy)$), the distance from all starting states is the same and the right hand side of the bound in Lemma 2.3 is $\sum_i\beta_i^{2t}$. Another bound, for any starting state $x$, is

4PxπTV21π(x)β2,4\|P_{x}^{\ell}-\pi\|_{TV}^{2}\leq\frac{1}{\pi(x)}\beta_{*}^{2\ell}, (4)

with β=maxλ1|βλ|\beta_{*}=\max_{\lambda\neq 1}|\beta_{\lambda}|. If π(x)\pi(x) is not uniform this bound can vary widely, as in the following example.

Example 2.4.

The paper [12] analyzes the random transvections chain on GLn(𝔽q)GL_{n}(\mathbb{F}_{q}) lumped to double cosets \GLn(𝔽q)/\mathcal{B}\backslash GL_{n}(\mathbb{F}_{q})/\mathcal{B}, with \mathcal{B} the Borel subgroup of upper triangular matrices. Here double cosets are indexed by permutations in SnS_{n}, and the uniform distribution on GLn(𝔽q)GL_{n}(\mathbb{F}_{q}) induces the Mallows distribution on SnS_{n} with parameter qq:

pq(ω)=qI(ω)[n]q!,[n]q!:=(1+q)(1+q+q2)(1+q++qn1),p_{q}(\omega)=\frac{q^{I(\omega)}}{[n]_{q}!},\quad[n]_{q}!:=(1+q)(1+q+q^{2})\ldots(1+q+\ldots+q^{n-1}),

where I(ω)I(\omega) is the number of inversions in ω\omega. In the setting q>1q>1, the reversal permutation has largest probability and the identity has smallest. Transvections form a conjugacy class in GLn(𝔽q)GL_{n}(\mathbb{F}_{q}), and so character theory gives the eigenvalues and the mixing time from special starting states can be analyzed. Starting from the identity the mixing time is order nn (the same order as the random transvections walk on GLn(𝔽q)GL_{n}(\mathbb{F}_{q})), but starting from the reversal permutation the mixing time is order log(n)\log(n).

The analysis in Example 2.4 is amenable because, in the setting of Proposition 2.1, the measure $Q$ was a class function, i.e. $Q(s)=Q(t^{-1}st)$ for all $s,t\in G$. In this case, the eigenvalues of the walk on $G$ are expressed using the characters of the irreducible complex representations of $G$. Let $\widehat{G}$ be an index set for these representations, $\lambda\in\widehat{G}$, and $\chi_\lambda$ the character of $\lambda$. The restriction of $\chi_\lambda$ to $H$ is written $\chi_\lambda|_H$ and $\left\langle\chi_\lambda|_H,1\right\rangle$ is the number of times the trivial representation of $H$ appears in $\chi_\lambda|_H$. By Frobenius reciprocity, this is $\left\langle\chi_\lambda,\mathrm{Ind}_H^G(1)\right\rangle$.

The following theorem is proven in [12].

Proposition 2.5.

Let QQ be a class function on GG and let H,KH,K be subgroups of GG. The induced chain P(x,y)P(x,y) of Proposition 2.1 on H\G/KH\backslash G/K has eigenvalues

\beta_{\lambda}=\frac{1}{\chi_{\lambda}(1)}\sum_{s\in G}Q(s)\chi_{\lambda}(s),\quad\lambda\in\widehat{G} (5)

with multiplicity

mλ=χλ|H,1χλ|K,1.m_{\lambda}=\left\langle\chi_{\lambda}|_{H},1\right\rangle\cdot\left\langle\chi_{\lambda}|_{K},1\right\rangle. (6)

Further, the average chi-squared distance to stationarity satisfies

\sum_{x\in\Omega}\pi(x)\chi^{2}_{x}(\ell)=\sum_{\lambda\in\widehat{G},\,\lambda\neq 1}m_{\lambda}\beta_{\lambda}^{2\ell}. (7)

2.2 Random Transpositions on SnS_{n}

The driving Markov chain on SnS_{n} for contingency tables is the random transpositions Markov chain studied in [14]. This uses the tools of representation theory and the character theory of the symmetric group. An expository account, aimed at probabilists and statisticians is in Chapter 3D in [8]. One of the main results used below is an explicit determination of the eigenvalues of this chain.

Recall the probability measure which defines the random walk on SnS_{n}:

Q(\sigma)=\begin{cases}\frac{2}{n^{2}}&\text{if }\sigma=(i,j),\ i<j\\ \frac{1}{n}&\text{if }\sigma=\mathrm{id}\\ 0&\text{otherwise}.\end{cases}

This measure QQ is not concentrated on a single conjugacy class (the identity is not in the conjugacy class of transpositions). However, we can write

Q=1nI+n1nQ~,Q=\frac{1}{n}I+\frac{n-1}{n}\widetilde{Q},

where $I(\sigma)=\mathbf{1}(\sigma=\mathrm{id})$ and $\widetilde{Q}(\sigma)=\frac{1}{\binom{n}{2}}\mathbf{1}(\sigma\in\mathcal{C})$, where $\mathcal{C}\subset S_n$ denotes the conjugacy class of transpositions. (This $\widetilde{Q}$ is the same as the one discussed in Remark 3.) Since $\widetilde{Q}$ is concentrated on the single conjugacy class $\mathcal{C}$, the eigenvalues of its random walk are equal to the character ratios,

\widetilde{\beta}_{\rho}=\frac{1}{\chi_{\rho}(1)}\sum_{s\in G}\widetilde{Q}(s)\chi_{\rho}(s)=\frac{\chi_{\rho}(\mathcal{C})}{\chi_{\rho}(1)},

for each partition $\rho$ of $n$. The formula for this character ratio was determined by Frobenius; see [38] for a modern exposition, and [44] for a proof of this special case:

β~ρ=1n(n1)j=1k[ρj2(2j1)ρj].\widetilde{\beta}_{\rho}=\frac{1}{n(n-1)}\sum_{j=1}^{k}\left[\rho_{j}^{2}-(2j-1)\rho_{j}\right].

The eigenvalues for the walk driven by QQ are then related:

βρ=1n+n1nβ~ρ.\beta_{\rho}=\frac{1}{n}+\frac{n-1}{n}\widetilde{\beta}_{\rho}.

This information is summarized in the following lemma; see [14] for more details.

Lemma 2.6 ([14], Corollary 1 & Lemma 7).

If PP is the random transpositions chain on SnS_{n} driven by QQ, then PP has an eigenvalue βρ\beta_{\rho} for each partition ρ=(ρ1,ρ2,,ρk)\rho=(\rho_{1},\rho_{2},\dots,\rho_{k}) of nn. The eigenvalue corresponding to ρ\rho is

βρ=1n+1n2j=1k[ρj2(2j1)ρj].\beta_{\rho}=\frac{1}{n}+\frac{1}{n^{2}}\sum_{j=1}^{k}\left[\rho_{j}^{2}-(2j-1)\rho_{j}\right].

In [14], the chain is proven to mix in total variation distance, with cut-off, after $(n/2)\log(n)$ steps. This gives an initial upper bound on the mixing time of the random transpositions chain on contingency tables.
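As a quick numerical check of the eigenvalue formula in Lemma 2.6 (a Python sketch, ours, not from the paper):

def beta(rho):
    # eigenvalue of the lazy random transpositions chain for a partition rho of n
    n = sum(rho)
    return 1 / n + sum(r * r - (2 * j - 1) * r for j, r in enumerate(rho, start=1)) / n ** 2

print(beta((5,)))               # trivial partition: 1.0
print(beta((4, 1)))             # 1 - 2*1*5/25 = 0.6
print(beta((1, 1, 1, 1, 1)))    # sign representation: (2 - n)/n = -0.6 for n = 5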

2.3 Orthogonal Polynomials

This section reviews orthogonal polynomials and records the formula for the multivariate Hahn polynomials. See [19] for a thorough exposition on multivariate orthogonal polynomials; especially Section 3 for sufficient conditions for an orthonormal basis to exist for a probability distribution.

Let π\pi be a probability distribution on a finite space Ωd\Omega\subset\mathbb{N}^{d}. Let 2(π)\ell^{2}(\pi) be the space of functions f:Ωf:\Omega\to\mathbb{R} with the inner product

f,gπ=𝔼π[f(X)g(X)]=xΩf(x)g(x)π(x).\left\langle f,g\right\rangle_{\pi}=\operatorname{{\mathbb{E}}}_{\pi}[f(X)g(X)]=\sum_{x\in\Omega}f(x)g(x)\pi(x).

A set of functions {qm}0m<|Ω|\{q_{m}\}_{0\leq m<|\Omega|} are orthogonal in 2(π)\ell^{2}(\pi) if

qm,qπ=dm21(m=).\left\langle q_{m},q_{\ell}\right\rangle_{\pi}=d_{m}^{2}\textbf{1}(m=\ell).

For the following lemma, 𝐦=(m1,,mN)\mathbf{m}=(m_{1},\dots,m_{N}) denotes an index vector and |𝐦|=mi|\mathbf{m}|=\sum m_{i} is the total degree of the polynomial defined by the vector.

Lemma 2.7 ([35], Lemma 3.2).

Suppose π\pi is a distribution on the space ΩN\Omega\subset\mathbb{N}^{N}, for some NN, and {q𝐦}\{q_{\mathbf{m}}\} is an orthogonal basis of 2(π)\ell^{2}(\pi), where q𝐦q_{\mathbf{m}} is a polynomial of exact degree |𝐦||\mathbf{m}|. Let (Xt)t0(X_{t})_{t\geq 0} be a reversible Markov chain with transition matrix PP and stationary distribution π\pi. Suppose that,

𝔼[X1𝐦X0=𝐱]=β|𝐦|x𝐦+(terms in x of degree<|𝐦|).\operatorname{{\mathbb{E}}}\left[X_{1}^{\mathbf{m}}\mid X_{0}=\mathbf{x}\right]=\beta_{|\mathbf{m}|}x^{\mathbf{m}}+\left(\text{terms in x of degree}\,\,\,<|\mathbf{m}|\right).

Then PP has eigenvalue β|𝐦|\beta_{|\mathbf{m}|} with eigenbasis {q𝐦}|𝐦|=m\{q_{\mathbf{m}}\}_{|\mathbf{m}|=m}.

Write βm=β|𝐦|\beta_{m}=\beta_{|\mathbf{m}|} for |𝐦|=m|\mathbf{m}|=m. The chi-square distance between Pt(𝐱,)P^{t}(\mathbf{x},\cdot) and π\pi is

χ𝐱2(t)=m1βm2thm(𝐱,𝐱),\chi^{2}_{\mathbf{x}}(t)=\sum_{m\geq 1}\beta_{m}^{2t}\cdot h_{m}(\mathbf{x},\mathbf{x}),

where

hm(𝐱,𝐲):=|𝐦|=mq𝐦(𝐱)q𝐦(𝐲)q𝐦,q𝐦π1h_{m}(\mathbf{x},\mathbf{y}):=\sum_{|\mathbf{m}|=m}q_{\mathbf{m}}(\mathbf{x})q_{\mathbf{m}}(\mathbf{y})\left\langle q_{\mathbf{m}},q_{\mathbf{m}}\right\rangle_{\pi}^{-1}

is called the kernel polynomial of degree mm.

Multivariate Hahn Polynomials

For contingency tables with only two rows, the Fisher-Yates distribution is simply the multivariate hypergeometric distribution. That is, if $\lambda=(n-k,k)$, $k\leq\lfloor n/2\rfloor$, and $\mu=(\mu_1,\dots,\mu_J)$, then a contingency table in $\mathcal{T}_{\lambda,\mu}$ is determined by its second row, which can be represented by a vector in the space

𝐗k,μJ={𝐱=(x1,,xJ)0J:|𝐱|=k,xjμj},\mathbf{X}_{k,\mu}^{J}=\{\mathbf{x}=(x_{1},\dots,x_{J})\in\mathbb{N}_{0}^{J}:|\mathbf{x}|=k,x_{j}\leq\mu_{j}\},

where |𝐱|=ixi|\mathbf{x}|=\sum_{i}x_{i}. The multivariate hypergeometric distribution is

Hk,μ(𝐱)=j=1J(μjxj)(nk),H_{k,\mu}(\mathbf{x})=\frac{\prod_{j=1}^{J}\binom{\mu_{j}}{x_{j}}}{\binom{n}{k}},

where $n=\sum_j\mu_j$. For example, this distribution arises from sampling without replacement: an urn contains $n$ balls of $J$ different colors, $\mu_j$ of color $j$. If $\mathbf{X}$ is the vector which records the number of each color drawn in a sample (without replacement) of size $k$, then $\mathbf{X}$ has the multivariate hypergeometric distribution. The orthogonal polynomials for this distribution are called the multivariate Hahn polynomials. An overview of univariate Hahn polynomials is in [28]. Following the notation from [35] (Section 2.2.1) and [26], define

a(k)=a(a+1)(a+k1),\displaystyle a_{(k)}=a(a+1)\ldots(a+k-1),
a[k]=a(a1)(ak+1),\displaystyle a_{[k]}=a(a-1)\ldots(a-k+1),
a(0)=a[0]=1.\displaystyle a_{(0)}=a_{[0]}=1.

For a vector $\mathbf{x}=(x_1,\dots,x_d)$,

|𝐱|=i=1dxi,|𝐱i|=j=1ixj,|𝐱i|=j=idxj.\displaystyle|\mathbf{x}|=\sum_{i=1}^{d}x_{i},\quad|\mathbf{x}_{i}|=\sum_{j=1}^{i}x_{j},\quad|\mathbf{x}^{i}|=\sum_{j=i}^{d}x_{j}.

The multivariate Hahn polynomials, defined in Section 2.2.1 of [35], are

Q𝐦(𝐱;k,μ)=(1)|𝐦|(k)[|𝐦|]j=1J1(k+|𝐱j1|+|𝐦j+1|)(mj)Qmj(xj;k|𝐱j1||𝐦j+1|,μj,|μj+1|+2|𝐦j+1|),Q_{\mathbf{m}}(\mathbf{x};k,\mu)=\frac{(-1)^{|\mathbf{m}|}}{(k)_{[|\mathbf{m}|]}}\prod_{j=1}^{J-1}\left(-k+|\mathbf{x}_{j-1}|+|\mathbf{m}^{j+1}|\right)_{(m_{j})}\cdot Q_{m_{j}}\left(x_{j};k-|\mathbf{x}_{j-1}|-|\mathbf{m}^{j+1}|,-\mu_{j},|\mu^{j+1}|+2|\mathbf{m}^{j+1}|\right), (8)

where

Q_{m}(x;k,\alpha,\beta)={}_{3}F_{2}\!\left(\begin{matrix}-m,\ m+\alpha+\beta-1,\ -x\\ \alpha,\ -k\end{matrix}\ ;\ 1\right)

is the classical univariate Hahn polynomial.
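A direct way to evaluate such a terminating ${}_3F_2(\,\cdot\,;1)$ series is to sum it term by term; the Python sketch below (ours) follows the parameter convention displayed above, which should be checked against [28] and [35] before serious use.

def rising(a, s):
    # rising factorial a_(s)
    out = 1.0
    for t in range(s):
        out *= a + t
    return out

def hahn_Q(m, x, k, alpha, beta):
    # terminating 3F2(-m, m+alpha+beta-1, -x; alpha, -k; 1); assumes no denominator
    # factor vanishes before the series terminates (e.g. m <= k)
    total = 0.0
    for s in range(m + 1):
        total += (rising(-m, s) * rising(m + alpha + beta - 1, s) * rising(-x, s)
                  / (rising(alpha, s) * rising(-k, s) * rising(1, s)))
    return total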

For calculating the expression for chi-squared distance, if the orthogonal polynomials are the eigenfunctions, then it is the kernel polynomials which need to be solved for. These were constructed by Griffiths [23], [24]. It is an open problem to find a useful equation for the kernel polynomials evaluated at more general states.

Proposition 2.8 (Proposition 2.6 from [35]).

Suppose μjk\mu_{j}\geq k for all jj. Let 𝐞j\mathbf{e}_{j} be the vector with 11 in the jjth index and 0 elsewhere. Then,

hm(k𝐞j,k𝐞j)=(km)(n2m+1)n[m1](nμj)[m](nk)[m](μj)[m].h_{m}(k\mathbf{e}_{j},k\mathbf{e}_{j})=\binom{k}{m}\frac{(n-2m+1)n_{[m-1]}(n-\mu_{j})_{[m]}}{(n-k)_{[m]}(\mu_{j})_{[m]}}. (9)

2.4 Fisher-Yates Distribution

The measure induced on contingency tables by the uniform distribution on SnS_{n} is

\pi_{\lambda,\mu}(T)=\frac{\prod_{i}\lambda_{i}!\,\prod_{j}\mu_{j}!}{n!\,\prod_{i,j}T_{ij}!}. (10)

This is the Fisher-Yates distribution on contingency tables, a mainstay of applied statistical work in chi-squared tests of independence. In applications it is natural to test the assumption that row and column features are independent and that the observed table is a sample with cell probabilities pij=θiηjp_{ij}=\theta_{i}\eta_{j}. For this model, the row and column sums are sufficient statistics; conditioning on the row and column sums of the table gives the Fisher-Yates distribution.

A useful way of describing a sample from $\pi_{\lambda,\mu}$ is by sampling without replacement: suppose that an urn contains $n$ total balls of $J$ different colors, $\mu_j$ of color $j$. To empty the urn, make $I$ successive sets of draws. First draw $\lambda_1$ balls, next $\lambda_2$, and so on until there are $\lambda_I=n-\sum_{i=1}^{I-1}\lambda_i$ balls left. Create a contingency table by setting $T_{ij}$ to be the number of balls of color $j$ in the $i$th set of draws.
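This urn description translates directly into a sampler; a Python sketch (ours, with the hypothetical name sample_fisher_yates):

import random

def sample_fisher_yates(lam, mu):
    balls = [j for j, m in enumerate(mu) for _ in range(m)]   # one ball per unit of mu_j
    random.shuffle(balls)                                     # a uniform random ordering
    T, pos = [], 0
    for size in lam:
        draw = balls[pos:pos + size]
        T.append([draw.count(j) for j in range(len(mu))])
        pos += size
    return T

print(sample_fisher_yates((3, 2), (2, 2, 1)))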

This description is useful for calculating moments of entries in a table, by utilizing the fact that the draws are exchangeable: let $X_s^{(i,j)}$, $1\leq s\leq\lambda_i$, be $1$ if in the $i$th round of drawings the $s$th draw was color $j$, and $0$ otherwise. That is,

Tij=s=1λiXs(i,j).T_{ij}=\sum_{s=1}^{\lambda_{i}}X_{s}^{(i,j)}.

The expectation of one entry in the table is

𝔼πλ,μ[Tij]=s=1λi(Xs(i,j)=1)=λiμjn,\displaystyle\operatorname{{\mathbb{E}}}_{\pi_{\lambda,\mu}}[T_{ij}]=\sum_{s=1}^{\lambda_{i}}\operatorname{\mathbb{P}}(X_{s}^{(i,j)}=1)=\frac{\lambda_{i}\mu_{j}}{n},

using that (Xs(i,j)=1)=μj/n\operatorname{\mathbb{P}}(X_{s}^{(i,j)}=1)=\mu_{j}/n for any ss since the variables are exchangeable in ss. These calculations for the moments of Fisher-Yates tables are needed in the following section to normalize eigenfunctions.

The usual test of the independence model uses the chi-squared statistic:

χ2(T)=i,j(Tijλiμj/n)2λiμj/n.\chi^{2}(T)=\sum_{i,j}\frac{(T_{ij}-\lambda_{i}\mu_{j}/n)^{2}}{\lambda_{i}\mu_{j}/n}.

Under general conditions, see e.g. [33], the chi-squared statistic has a limiting chi-squared distribution with $(I-1)\cdot(J-1)$ degrees of freedom. This result says that, if the null hypothesis is true, most tables will be close to a table $T^{*}$ with entries $T^{*}_{ij}=\lambda_i\mu_j/n$. Another feature to investigate for the independence model is the number of zero entries in a contingency table. Since most tables will be close to the table $T^{*}$, which has no zeros, zeros are a pointer to the breakdown of the independence model. Section 5.2 of [16] proves that the number of zeros is asymptotically Poisson, under natural hypotheses on the row and column sums. In [42], limit theorems for fixed points, descents, and inversions of permutations chosen uniformly from fixed double cosets are studied.

Majorization Order

Let $T$ and $T^{\prime}$ be tables with the same row and column sums. Say that $T\prec T^{\prime}$ ('$T^{\prime}$ majorizes $T$') if the largest element of $T^{\prime}$ is at least as large as the largest element of $T$, the sum of the two largest elements of $T^{\prime}$ is at least the sum of the two largest elements of $T$, and so on. Of course the sum of all elements of $T^{\prime}$ equals the sum of all elements of $T$.

Example 2.9.

For tables with n=8,λ1=λ2=μ1=μ2=4n=8,\lambda_{1}=\lambda_{2}=\mu_{1}=\mu_{2}=4, there is the following ordering

\begin{pmatrix}2&2\\2&2\end{pmatrix}\prec\begin{pmatrix}3&1\\1&3\end{pmatrix}\prec\begin{pmatrix}4&0\\0&4\end{pmatrix}.
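A sketch (ours) of checking the majorization order by comparing partial sums of the sorted entries, verifying the chain of Example 2.9:

def majorizes(T_big, T_small):
    a = sorted((x for row in T_big for x in row), reverse=True)
    b = sorted((x for row in T_small for x in row), reverse=True)
    pa = [sum(a[:i + 1]) for i in range(len(a))]
    pb = [sum(b[:i + 1]) for i in range(len(b))]
    return all(u >= v for u, v in zip(pa, pb))

A, B, C = [[2, 2], [2, 2]], [[3, 1], [1, 3]], [[4, 0], [0, 4]]
print(majorizes(B, A), majorizes(C, B))   # True True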

Majorization is a standard partial order on vectors [40] and Harry Joe [32] has shown it is useful for contingency tables.

Proposition 2.10.

Let TT and TT^{\prime} be tables with the same row and column sums given by λ,μ\lambda,\mu and πλ,μ\pi_{\lambda,\mu} the Fisher-Yates distribution. If TTT\prec T^{\prime}, then

πλ,μ(T)>πλ,μ(T).\pi_{\lambda,\mu}(T)>\pi_{\lambda,\mu}(T^{\prime}).
Proof.

From the definition, we have $\log(\pi_{\lambda,\mu}(T))=C-\sum_{i,j}\log(T_{ij}!)$ for a constant $C$ depending only on $\lambda,\mu$. This form makes it clear the right hand side is a symmetric function of the $IJ$ numbers $\{T_{ij}\}$. The log-convexity of the Gamma function shows that each term $-\log(T_{ij}!)$ is concave, so the sum is concave. A symmetric concave function is Schur concave: that is, order-reversing for the majorization order [40]. ∎

Remark 5.

Joe [32] shows that, among the real-valued tables with given row and column sums, the independence table $T^{*}$ is the unique smallest table in majorization order. He further shows that if an integer-valued table $T$ is, entry-wise, within $1$ of the real independence table, then $T$ is the unique smallest table with integer entries. In this case, the corresponding double coset is the largest, i.e. $\pi_{\lambda,\mu}(T)$ is maximal.

Example 2.11.

Fix a positive integer aa and consider an I×JI\times J table TT with all entries equal to aa. This has constant row sums JaJ\cdot a and column sums IaI\cdot a. It is the unique smallest table with these row and column sums, and so corresponds to the largest double coset. For a=2,I=2,J=3a=2,I=2,J=3, this table is

T=(222222).T=\begin{pmatrix}2&2&2\cr 2&2&2\end{pmatrix}.

Contingency tables with fixed row and column sums form a graph with edges between tables that can be obtained by one move of the following: pick two rows i,ii,i^{\prime} and two columns j,jj,j^{\prime}. Add +1+1 to the (i,j)(i,j) entry, 1-1 to the (i,j)(i^{\prime},j) entry, +1+1 to the (i,j)(i^{\prime},j^{\prime}) entry, and 1-1 to the (i,j)(i,j^{\prime}) entry (one move of the Markov chain Definition 1.4). This graph is connected and moves up or down in the majorization order as the 2×22\times 2 table with rows i,ii,i^{\prime} and columns j,jj,j^{\prime} moves up or down. See Example 2.9 above.

3 Eigenvalues and Eigenfunctions

From Proposition 2.5, the multiplicity of the eigenvalue βρ\beta_{\rho} in the contingency table chain is

mρ=mρλmρμ\displaystyle m_{\rho}=m_{\rho}^{\lambda}\cdot m_{\rho}^{\mu}
mρλ:=χρ,IndSλSn(1)\displaystyle m_{\rho}^{\lambda}:=\left\langle\chi_{\rho},\mathrm{Ind}_{S_{\lambda}}^{S_{n}}(1)\right\rangle

These multiplicities can be determined by Young’s Rule [9], [31]: Given λ\lambda a partition of nn, take λ1\lambda_{1} of the symbol ‘11’, λ2\lambda_{2} of the symbol ‘2’, and so on. The value χρ,IndSλSn(1)\left\langle\chi_{\rho},\mathrm{Ind}_{S_{\lambda}}^{S_{n}}(1)\right\rangle is equal to the number of ways of arranging these symbols into a tableau of shape ρ\rho with strictly increasing columns and weakly increasing rows. In other words, mρλm_{\rho}^{\lambda} is the number of semistandard Young tableau of shape ρ\rho and weight λ\lambda. These are also called Kostka numbers, and have a long enumerative history in connection with symmetric functions (see Chapter 7 of [48]).

Example 3.1.

Consider n=5n=5 and λ=(3,1,1),μ=(2,2,1)\lambda=(3,1,1),\mu=(2,2,1). The possible eigenvalues and their multiplicities are in Table 1.

ρ         β_ρ     m_ρ^λ   m_ρ^μ   m_ρ
(5)       1.0     1       1       1
(4,1)     0.6     2       2       4
(3,2)     0.36    1       2       2
(3,1,1)   0.20    1       1       1
Table 1: Eigenvalues and multiplicities for λ=(3,1,1), μ=(2,2,1).

For example, for μ=(2,2,1)\mu=(2,2,1) the symbols are 1122311223 and there are 22 ways of arranging them into a tableau of shape (3,2)(3,2) with increasing rows and strictly increasing columns:

\begin{matrix}1&1&2\\2&3&\end{matrix},\qquad\begin{matrix}1&1&3\\2&2&\end{matrix}.

Thus for $\rho=(3,2)$, the contribution to the multiplicity is $m_\rho^\mu=2$, as shown in Table 1. While there is not an explicit formula for the result of Young's rule (indeed, this would provide a formula for the number of contingency tables with fixed row and column sums), we can say more in special cases. See the exercises in Chapter 6.4 of [38] for more properties.
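For small cases, Young's rule can be checked by brute force; the Python sketch below (ours) counts semistandard tableaux of shape ρ and weight λ directly, and reproduces Table 1.

def kostka(rho, weight):
    # count fillings of the diagram of rho, in reading order, using weight[s] copies
    # of symbol s, with rows weakly increasing and columns strictly increasing
    cells = [(r, c) for r, length in enumerate(rho) for c in range(length)]

    def fill(idx, tab, remaining):
        if idx == len(cells):
            return 1
        r, c = cells[idx]
        count = 0
        for sym in range(len(weight)):
            if remaining[sym] == 0:
                continue
            if c > 0 and tab[(r, c - 1)] > sym:      # rows weakly increase
                continue
            if r > 0 and tab[(r - 1, c)] >= sym:     # columns strictly increase
                continue
            tab[(r, c)] = sym
            remaining[sym] -= 1
            count += fill(idx + 1, tab, remaining)
            remaining[sym] += 1
            del tab[(r, c)]
        return count

    return fill(0, {}, list(weight))

lam, mu = (3, 1, 1), (2, 2, 1)
for rho in [(5,), (4, 1), (3, 2), (3, 1, 1)]:
    print(rho, kostka(rho, lam), kostka(rho, mu))    # matches Table 1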

Corollary 3.2.

Let ρ,λ,μ\rho,\lambda,\mu be partitions of nn and |ρ||\rho| denote the number of parts of the partition, with |λ|=I,|μ|=J|\lambda|=I,|\mu|=J. Let βρ\beta_{\rho} denote the eigenvalue in the random transpositions chain corresponding to ρ\rho and mρm_{\rho} the multiplicity of βρ\beta_{\rho} in the chain lumped to 𝒯λ,μ\mathcal{T}_{\lambda,\mu}.

  1. (a)

    The multiplicity of the second-largest eigenvalue β(n1,1)=12/n\beta_{(n-1,1)}=1-2/n is m(n1,1)=(I1)(J1)m_{(n-1,1)}=(I-1)\cdot(J-1).

  2. (b)

    If mρ>0m_{\rho}>0, then |ρ|min(I,J)|\rho|\leq\min(I,J) and ρ1max(λ1,μ1)\rho_{1}\geq\max(\lambda_{1},\mu_{1}).

  3. (c)

    If ρ=(nk,k)\rho=(n-k,k) for 1kn/21\leq k\leq\lfloor n/2\rfloor and λ=(nj,j)\lambda=(n-j,j) then

    mρλ={0ifk>j1else.m_{\rho}^{\lambda}=\begin{cases}0&\text{if}\,\,\,k>j\\ 1&\text{else}\end{cases}.
  4. (d)

    If ρ=(nk,k)\rho=(n-k,k), μ=(μ1,,μJ)\mu=(\mu_{1},\dots,\mu_{J}) and μ1k\mu_{1}\geq k, then

m_{\rho}^{\mu}=\#\left\{(x_{1},\dots,x_{J-1})\in\mathbb{N}^{J-1}:\sum_{j=1}^{J-1}x_{j}=k,\ x_{j}\leq\mu_{j+1}\right\}=|\mathcal{T}_{(k,n-\mu_{1}-k),(\mu_{2},\mu_{3},\dots,\mu_{J})}|.
Remark 6.

Corollary 3.2(b) shows that a table with only two rows or columns will only have eigenvalues $\beta_\rho$ with $\rho=(n-m,m)$. The eigenvalue defined by $\rho=(1,1,\dots,1)$ is $\beta_\rho=(2-n)/n$ (it equals $-1$ for the non-lazy chain driven by $\widetilde{Q}$). This eigenvalue has multiplicity $0$ unless $\lambda=\mu=(1,1,\dots,1)$, for which $\mathcal{T}_{\lambda,\mu}\simeq S_n$.

Proof.

(a): An arrangement of symbols determined by λ\lambda into a tableau of shape (n1,1)(n-1,1) is determined by the choice of the symbol to be in the second row. To fit the constraint the columns in the tableau are strictly increasing, any symbol except ‘1’ could be placed in the second row. Thus, there are |λ|1|\lambda|-1 possibilities. For example, if λ=(3,2,1)\lambda=(3,2,1), there are 2 possibilities:

\begin{matrix}1&1&1&2&2\\3&&&&\end{matrix},\qquad\begin{matrix}1&1&1&2&3\\2&&&&\end{matrix}.

(b): Consider the first column of an arrangement of the symbols determined by λ\lambda into a tableau of shape ρ\rho. For the column to be strictly increasing, there must be at least |ρ||\rho| symbols from λ\lambda, to give the column 1,2,,|ρ|1,2,\dots,|\rho|. Thus, if |λ|<|ρ||\lambda|<|\rho| then mρλ=0m_{\rho}^{\lambda}=0. All of the ‘1’ symbols must be contained in the first row of the tableau, which gives the constraint λ1ρ1\lambda_{1}\leq\rho_{1}.

(c): If $|\lambda|=2$ and $|\rho|=2$ then there is at most one way of arranging the symbols of $\lambda$ into a tableau of shape $\rho$. For example, $\lambda=(3,2)$, $\rho=(4,1)$:

\begin{matrix}1&1&1&2\\2&&&\end{matrix}.

To satisfy the constraint that columns are strictly increasing it is necessary that the second row only contains symbols ‘2’. If λ1>ρ1\lambda_{1}>\rho_{1}, this is not possible. Note that the assumption λ2λ1\lambda_{2}\leq\lambda_{1} ensures that there would never need to be a column with only the symbol ‘2’.

(d): Now suppose $\mu=(\mu_1,\dots,\mu_J)$ with $J>2$. The assumption $\mu_1\geq k$ ensures that any assignment of the second row will obey the strictly increasing column constraint. That is, any selection of $k$ symbols from the $n-\mu_1$ symbols which are greater than $1$ gives a valid assignment for the second row of the tableau. There are $J-1$ possible symbols, with $\mu_{j+1}$ copies of symbol $j+1$. If $x_j$ denotes the number of copies of symbol $j+1$ in row $2$, then the second row is determined by $(x_1,\dots,x_{J-1})$ with $x_j\leq\mu_{j+1}$ and $\sum_{j=1}^{J-1}x_j=k$. The number of possibilities is exactly the number of $2\times(J-1)$ contingency tables with row sums $k$, $n-\mu_1-k$ and column sums $\mu_2,\dots,\mu_J$. ∎

The multiplicities also behave well with respect to the majorization order on partitions: if $\lambda,\lambda^{\prime}$ are partitions of $n$, then $\lambda\prec\lambda^{\prime}$ if $\lambda_1+\lambda_2+\ldots+\lambda_k\leq\lambda_1^{\prime}+\lambda_2^{\prime}+\ldots+\lambda_k^{\prime}$ for all $1\leq k\leq|\lambda|$. In other words, $\lambda^{\prime}$ can be obtained from $\lambda$ by successively 'moving up boxes' ([38] Chapter 1). For example, $\lambda=(3,2,1)\prec\lambda^{\prime}=(4,2)$.
Remark 7.

The eigenvalues βρ\beta_{\rho} are monotonic with respect to this ordering: If ρρ\rho\prec\rho^{\prime}, then βρβρ\beta_{\rho}\leq\beta_{\rho^{\prime}}; see Chapter 3 of [8] for more properties. This tells us that ρ=(n1,1)\rho=(n-1,1) gives the largest eigenvalue not equal to 11.

The following lemma is well-known in the literature, e.g. [22].

Lemma 3.3.

Let λ,λ,ρ\lambda,\lambda^{\prime},\rho be partitions of nn. Then,

  1. (a)

    mρλ0m_{\rho}^{\lambda}\neq 0 if and only if λρ\lambda\prec\rho.

  2. (b)

    If λλρ\lambda\prec\lambda^{\prime}\prec\rho, then mρλmρλm_{\rho}^{\lambda}\geq m_{\rho}^{\lambda^{\prime}}.

Remark 8.

No such monotonicity exists in ρ\rho. For example, with n=4n=4, λ=(1,1,1,1)\lambda=(1,1,1,1), we have (1,1,1,1)(2,1,1)(2,2)(1,1,1,1)\prec(2,1,1)\prec(2,2) yet m(1,1,1,1)λ=1m_{(1,1,1,1)}^{\lambda}=1, m(2,1,1)λ=3m_{(2,1,1)}^{\lambda}=3, m(2,2)λ=2m_{(2,2)}^{\lambda}=2.

3.1 Average Case Mixing Time

For 2×22\times 2 contingency tables, Corollary 3.2 suffices to determine the multiplicities of all eigenvalues.

Lemma 3.4.

Let λ=(nk,k),μ=(n,)\lambda=(n-k,k),\mu=(n-\ell,\ell) with kn/2k\leq\ell\leq\lfloor n/2\rfloor. If t=(n/4)(log(k)+c)t=(n/4)(\log(k)+c) for c>0c>0, then

𝐱𝒯λ,μπλ,μ(𝐱)P𝐱tπλ,μTV2ec.\sum_{\mathbf{x}\in\mathcal{T}_{\lambda,\mu}}\pi_{\lambda,\mu}(\mathbf{x})\|P^{t}_{\mathbf{x}}-\pi_{\lambda,\mu}\|_{TV}^{2}\leq e^{-c}.

If t=cn/4t=cn/4, then

𝐱𝒯λ,μπλ,μ(𝐱)χ𝐱2(t)1c.\sum_{\mathbf{x}\in\mathcal{T}_{\lambda,\mu}}\pi_{\lambda,\mu}(\mathbf{x})\chi^{2}_{\mathbf{x}}(t)\geq 1-c.
Proof.

Suppose ρ=(nm,m)\rho=(n-m,m) with mkm\leq k\leq\ell. By Corollary 3.2(c) mρλ=mρμ=1m_{\rho}^{\lambda}=m_{\rho}^{\mu}=1, so the multiplicity of βρ=12m(n+1m)/n2\beta_{\rho}=1-2m(n+1-m)/n^{2} in the transpositions chain on 𝒯λ,μ\mathcal{T}_{\lambda,\mu} is 11. For any other partition ρ\rho, the multiplicity is 0. Thus,

𝐱𝒯λ,μπ(𝐱)χ𝐱2(t)\displaystyle\sum_{\mathbf{x}\in\mathcal{T}_{\lambda,\mu}}\pi(\mathbf{x})\chi^{2}_{\mathbf{x}}(t) =ρ(n)βρ2tmρ2\displaystyle=\sum_{\rho\neq(n)}\beta_{\rho}^{2t}m_{\rho}^{2}
=m=1k(12m(n+1m)n2)2t\displaystyle=\sum_{m=1}^{k}\left(1-\frac{2m(n+1-m)}{n^{2}}\right)^{2t}
m=1kexp(2t2m(n+1m)n2)exp(4tn+log(k)),\displaystyle\leq\sum_{m=1}^{k}\exp\left(-2t\frac{2m(n+1-m)}{n^{2}}\right)\leq\exp\left(-\frac{4t}{n}+\log(k)\right),

which is ec\leq e^{-c} for t=(n/4)(log(k)+c)t=(n/4)(\log(k)+c).

The lower bound comes from using the term in the sum with ρ=(n1,1)\rho=(n-1,1) (which gives the largest eigenvalue, with multiplicity 11):

𝐱𝒯λ,μπ(𝐱)χ𝐱2(t)\displaystyle\sum_{\mathbf{x}\in\mathcal{T}_{\lambda,\mu}}\pi(\mathbf{x})\chi^{2}_{\mathbf{x}}(t) β(n1,1)2tm(n1,1)2\displaystyle\geq\beta_{(n-1,1)}^{2t}m_{(n-1,1)}^{2}
=(12n)2t\displaystyle=\left(1-\frac{2}{n}\right)^{2t}
14tn=1c,\displaystyle\geq 1-\frac{4t}{n}=1-c,

if t=cn/4t=cn/4. The last inequality is Bernoulli’s inequality. ∎

Remark 9.

A table in 𝒯(nk,k),(n,)\mathcal{T}_{(n-k,k),(n-\ell,\ell)} is determined by one entry, say the (2,2)(2,2) entry XX, so the table is

(nk+XXkXX).\begin{pmatrix}n-k-\ell+X&\ell-X\cr k-X&X\end{pmatrix}.

Then X{0,1,,k}X\in\{0,1,\dots,k\} has the univariate hypergeometric distribution with parameters n,k,n,k,\ell. The transitions for XX are

P(a,a+1)=2(ka)(a)n2,a<k\displaystyle P(a,a+1)=\frac{2(k-a)(\ell-a)}{n^{2}},\quad a<k
P(a,a1)=2a(nk+a)n2,a>0.\displaystyle P(a,a-1)=\frac{2a(n-k-\ell+a)}{n^{2}},\quad a>0.
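
To make Remark 9 concrete, the transitions above can be simulated directly. The following sketch (with arbitrarily chosen values of n, k, and the second column sum, not taken from the paper) runs the birth-death chain for the (2,2) entry and compares its occupation frequencies with the hypergeometric stationary distribution.

```python
# Minimal sketch (parameters chosen for illustration): simulate the (2,2)-entry
# birth-death chain from Remark 9 and compare its long-run occupation frequencies
# with the hypergeometric stationary distribution.
import random
from math import comb

n, k, l = 30, 8, 12          # margins lambda = (n-k, k), mu = (n-l, l)
steps = 200_000

x = 0                        # starting value of the (2,2) entry
counts = [0] * (k + 1)
for _ in range(steps):
    up = 2 * (k - x) * (l - x) / n**2        # P(x, x+1)
    down = 2 * x * (n - k - l + x) / n**2    # P(x, x-1)
    u = random.random()
    if u < up:
        x += 1
    elif u < up + down:
        x -= 1
    counts[x] += 1

# Hypergeometric law: pi(x) = C(k, x) C(n-k, l-x) / C(n, l)
pi = [comb(k, x) * comb(n - k, l - x) / comb(n, l) for x in range(k + 1)]
for x in range(k + 1):
    print(f"x={x:2d}  empirical={counts[x] / steps:.4f}  stationary={pi[x]:.4f}")
```
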
Remark 10.

In contrast to the previous remark, the Bernoulli-Laplace urn has transitions

P(a,a+1)=2(ka)(a)k(nk),a<k\displaystyle P(a,a+1)=\frac{2(k-a)(\ell-a)}{k(n-k)},\quad a<k
P(a,a1)=2a(nk+a)k(nk),a>0.\displaystyle P(a,a-1)=\frac{2a(n-k-\ell+a)}{k(n-k)},\quad a>0.

In [14], for the special case k==n/2k=\ell=n/2 (which corresponds to the state space 𝒯(n/2,n/2),(n/2,n/2)\mathcal{T}_{(n/2,n/2),(n/2,n/2)}), the Bernoulli-Laplace chain is proven to mix in (n/8)log(n)(n/8)\log(n) steps. The paper [21] studies a more general model in which kk balls are swapped between the two urns; the chain is shown to have mixing time (n/4k)log(n)(n/4k)\log(n). These are special cases of the random walk on a distance-regular graph studied in [3].

Remark 11.

Note that the random transpositions Markov chain on SnS_{n} is transitive, and so the behavior of the chain, especially the mixing time, does not depend on the starting state. The chain lumped to 𝒯λ,μ\mathcal{T}_{\lambda,\mu} is not transitive, and the mixing time can depend heavily on the starting state. It is expected that the mixing time is fastest starting from states with high probability (large double cosets) and slowest from the states with low probability (small double cosets). From Remark 5, the highest probability table is the one closest to the ‘independence table’; the tables with low probability are the sparse tables with many zeros.

Remark 12.

For any λ,μ\lambda,\mu, suppose T𝒯λ,μT\in\mathcal{T}_{\lambda,\mu}. From the table TT we can create a 2×22\times 2 table by ‘collapsing’ columns 2:J2:J and rows 2:I2:I. That is, set

T~(1,1)=T(1,1)\displaystyle\widetilde{T}(1,1)=T(1,1)
T~(1,2)=j=2JT(1,j)\displaystyle\widetilde{T}(1,2)=\sum_{j=2}^{J}T(1,j)
T~(2,1)=i=2IT(i,1)\displaystyle\widetilde{T}(2,1)=\sum_{i=2}^{I}T(i,1)
\displaystyle\widetilde{T}(2,2)=(n-\lambda_{1})-\widetilde{T}(2,1)=(n-\mu_{1})-\widetilde{T}(1,2).

Then T~𝒯(λ1,nλ1),(μ1,nμ1)\widetilde{T}\in\mathcal{T}_{(\lambda_{1},n-\lambda_{1}),(\mu_{1},n-\mu_{1})}. Furthermore, T(1,1)T(1,1) has the univariate hypergeometric distribution with parameters n,μ1,λ1n,\mu_{1},\lambda_{1}. The random transpositions chain on 𝒯λ,μ\mathcal{T}_{\lambda,\mu} lumps to the univariate Markov chain on 𝒯(λ1,nλ1),(μ1,nμ1)\mathcal{T}_{(\lambda_{1},n-\lambda_{1}),(\mu_{1},n-\mu_{1})}. Thus the mixing time for the (1,1)(1,1)-entry Markov chain is a lower bound for the mixing time of the full process.
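
A small sketch of the collapsing map in Remark 12 may be helpful (the table below is an arbitrary example, not one of the datasets discussed later):

```python
# Sketch of the collapsing map of Remark 12: an I x J table with margins
# (lambda, mu) is reduced to a 2 x 2 table with margins (lambda_1, n - lambda_1)
# and (mu_1, n - mu_1).  The example table is arbitrary.
def collapse(T):
    n = sum(sum(row) for row in T)
    t11 = T[0][0]
    t12 = sum(T[0][1:])                   # rest of the first row
    t21 = sum(row[0] for row in T[1:])    # rest of the first column
    t22 = n - t11 - t12 - t21
    return [[t11, t12], [t21, t22]]

T = [[4, 2, 1],
     [1, 3, 2],
     [0, 1, 4]]       # row sums (7, 6, 5), column sums (5, 6, 7), n = 18
print(collapse(T))    # [[4, 3], [1, 10]]: margins (7, 11) and (5, 13)
```
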

3.2 Orthogonal Polynomials

This section proves part (3) of Theorem 1.6, that the eigenfunctions of the random transpositions chain on contingency tables are orthogonal polynomials. While it is difficult to find an explicit formula for all of these polynomials, analysis of the chain provides a way to calculate the linear and quadratic polynomials.

Theorem 3.5.

Let λ=(λ1,,λI),μ=(μ1,,μJ)\lambda=(\lambda_{1},\dots,\lambda_{I}),\mu=(\mu_{1},\dots,\mu_{J}) be partitions of nn and {Tt}t0\{T_{t}\}_{t\geq 0} the random transpositions Markov chain on the space of contingency tables 𝒯λ,μ\mathcal{T}_{\lambda,\mu} with transition matrix PP. For any mm and 𝐱𝒯λ,μ\mathbf{x}\in\mathcal{T}_{\lambda,\mu},

𝔼[Tt+1(i,j)mTt=𝐱]\displaystyle\operatorname{{\mathbb{E}}}[T_{t+1}(i,j)^{m}\mid T_{t}=\mathbf{x}] =xijm(12m(n+1m)n2)+(terms in x of degree<m).\displaystyle=x_{ij}^{m}\left(1-\frac{2m(n+1-m)}{n^{2}}\right)+(\text{terms in x of degree}\,\,\,<m).

The eigenfunctions for PP are polynomials.

Proof.

Let {Tt}t0\{T_{t}\}_{t\geq 0} represent the Markov chain on 𝒯λ,μ\mathcal{T}_{\lambda,\mu}, with Tt(i,j)T_{t}(i,j) denoting the value in the (i,j)(i,j) cell. Let 𝐱𝒯λ,μ\mathbf{x}\in\mathcal{T}_{\lambda,\mu}. Marginally, each entry of the table is a birth-death process: For any 1iI,1jJ1\leq i\leq I,1\leq j\leq J,

(Tt+1(i,j)=xij+1Tt=𝐱)=jki2xixkjn2=2(λixij)(μjxij)n2\displaystyle\operatorname{\mathbb{P}}\left(T_{t+1}(i,j)=x_{ij}+1\mid T_{t}=\mathbf{x}\right)=\sum_{\ell\neq j}\sum_{k\neq i}\frac{2\cdot x_{i\ell}x_{kj}}{n^{2}}=\frac{2(\lambda_{i}-x_{ij})(\mu_{j}-x_{ij})}{n^{2}} (11)
(Tt+1(i,j)=xij1Tt=𝐱)=jki2xijxkln2=2xij(nλiμj+xij)n2.\displaystyle\operatorname{\mathbb{P}}\left(T_{t+1}(i,j)=x_{ij}-1\mid T_{t}=\mathbf{x}\right)=\sum_{\ell\neq j}\sum_{k\neq i}\frac{2x_{ij}x_{kl}}{n^{2}}=\frac{2x_{ij}(n-\lambda_{i}-\mu_{j}+x_{ij})}{n^{2}}. (12)

These transitions allow for the calculation of 𝔼[Tt+1(i,j)mTt=𝐱]\operatorname{{\mathbb{E}}}[T_{t+1}(i,j)^{m}\mid T_{t}=\mathbf{x}], for any integer power mm. That is,

𝔼[Tt+1(i,j)mTt=𝐱]\displaystyle\operatorname{{\mathbb{E}}}[T_{t+1}(i,j)^{m}\mid T_{t}=\mathbf{x}] =(xij+1)m2(λixij)(μjxij)n2+(xij1)m2xij(nλiμj+xij)n2\displaystyle=(x_{ij}+1)^{m}\cdot\frac{2(\lambda_{i}-x_{ij})(\mu_{j}-x_{ij})}{n^{2}}+(x_{ij}-1)^{m}\cdot\frac{2x_{ij}(n-\lambda_{i}-\mu_{j}+x_{ij})}{n^{2}}
+xijm(12xij(nλiμj+xij)n22(λixij)(μjxij)n2)\displaystyle\,\,\,\,\,\,\,\,\,\,+x_{ij}^{m}\left(1-\frac{2x_{ij}(n-\lambda_{i}-\mu_{j}+x_{ij})}{n^{2}}-\frac{2(\lambda_{i}-x_{ij})(\mu_{j}-x_{ij})}{n^{2}}\right)
=xijm+(mxijm1+m(m1)2xijm2+=3m(m)xijm)2(λixij)(μjxij)n2\displaystyle=x_{ij}^{m}+\left(mx_{ij}^{m-1}+\frac{m(m-1)}{2}x_{ij}^{m-2}+\sum_{\ell=3}^{m}\binom{m}{\ell}x_{ij}^{m-\ell}\right)\cdot\frac{2(\lambda_{i}-x_{ij})(\mu_{j}-x_{ij})}{n^{2}}
+(mxijm1+m(m1)2xijm2+=3m(m)xijm(1))2xij(nλiμj+xij)n2\displaystyle\,\,\,\,\,\,\,\,\,\,+\left(-mx_{ij}^{m-1}+\frac{m(m-1)}{2}x_{ij}^{m-2}+\sum_{\ell=3}^{m}\binom{m}{\ell}x_{ij}^{m-\ell}(-1)^{\ell}\right)\cdot\frac{2x_{ij}(n-\lambda_{i}-\mu_{j}+x_{ij})}{n^{2}}
=xijm(12m(n+1m)n2)+(terms in x of degree<m).\displaystyle=x_{ij}^{m}\left(1-\frac{2m(n+1-m)}{n^{2}}\right)+(\text{terms in x of degree}\,\,\,<m).

By Lemma 2.7, this condition means the eigenfunctions for PP are polynomials. ∎

Straightforward calculation of the transition probabilities gives the linear eigenfunctions:

Lemma 3.6.

For any λ,μ\lambda,\mu and 1iI,1jJ1\leq i\leq I,1\leq j\leq J, the functions

fij(𝐱):=xijλiμjnf_{ij}(\mathbf{x}):=x_{ij}-\frac{\lambda_{i}\mu_{j}}{n}

are eigenvectors with eigenvalue 12/n1-2/n.

Proof.

For 𝐱𝒯λ,μ\mathbf{x}\in\mathcal{T}_{\lambda,\mu}, the expected value of one entry of the table after one step of the chain can be computed using (11) and (12):

𝔼[T1(i,j)T0=𝐱]\displaystyle\operatorname{{\mathbb{E}}}[T_{1}(i,j)\mid T_{0}=\mathbf{x}] =(xij+1)2(λixij)(μjxij)n2+(xij1)2xij(nλiμj+xij)n2\displaystyle=(x_{ij}+1)\cdot\frac{2(\lambda_{i}-x_{ij})(\mu_{j}-x_{ij})}{n^{2}}+(x_{ij}-1)\cdot\frac{2x_{ij}(n-\lambda_{i}-\mu_{j}+x_{ij})}{n^{2}}
+xij(12xij(nλiμj+xij)n22(λixij)(μjxij)n2)\displaystyle\,\,\,\,\,\,\,\,\,\,+x_{ij}\left(1-\frac{2x_{ij}(n-\lambda_{i}-\mu_{j}+x_{ij})}{n^{2}}-\frac{2(\lambda_{i}-x_{ij})(\mu_{j}-x_{ij})}{n^{2}}\right)
=2(λixij)(μjxij)n22xij(nλiμj+xij)n2+xij\displaystyle=\frac{2(\lambda_{i}-x_{ij})(\mu_{j}-x_{ij})}{n^{2}}-\frac{2x_{ij}(n-\lambda_{i}-\mu_{j}+x_{ij})}{n^{2}}+x_{ij}
=xij(12n)+2λiμjn2.\displaystyle=x_{ij}\left(1-\frac{2}{n}\right)+\frac{2\lambda_{i}\mu_{j}}{n^{2}}. (13)

Thus,

𝔼[fij(T1)T0=𝐱]\displaystyle\operatorname{{\mathbb{E}}}[f_{ij}(T_{1})\mid T_{0}=\mathbf{x}] =xij(12n)+2λiμjn2λiμjn\displaystyle=x_{ij}\left(1-\frac{2}{n}\right)+\frac{2\lambda_{i}\mu_{j}}{n^{2}}-\frac{\lambda_{i}\mu_{j}}{n}
=(12n)(xijλiμjn)=(12n)fij(𝐱).\displaystyle=\left(1-\frac{2}{n}\right)\left(x_{ij}-\frac{\lambda_{i}\mu_{j}}{n}\right)=\left(1-\frac{2}{n}\right)\cdot f_{ij}(\mathbf{x}).
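
Lemma 3.6 can also be checked numerically. The sketch below (with arbitrarily chosen margins) enumerates the tables with the given row and column sums, builds the transition matrix of the swap chain by summing the move probabilities 2 x_{i_1 j_1} x_{i_2 j_2} / n^2, and verifies that f_{11} is an eigenvector with eigenvalue 1 - 2/n.

```python
# Numerical check of Lemma 3.6 (a sketch; margins chosen arbitrarily).  Enumerate
# all tables with row sums lam and column sums mu, build the transition matrix of
# the swap chain induced by random transpositions, and verify that
# f_11(x) = x_11 - lam_1 * mu_1 / n satisfies P f = (1 - 2/n) f.
import itertools
import numpy as np

lam, mu = (4, 3), (3, 2, 2)       # margins, n = 7
n, I, J = sum(lam), len(lam), len(mu)

def tables(rows, cols):
    """All tables with the given row sums and column sums."""
    out = []
    for first in itertools.product(*(range(min(rows[0], c) + 1) for c in cols)):
        if sum(first) != rows[0]:
            continue
        if len(rows) == 1:
            out.append((first,))
        else:
            rest_cols = tuple(c - f for c, f in zip(cols, first))
            out.extend((first,) + rest for rest in tables(rows[1:], rest_cols))
    return out

states = tables(lam, mu)
index = {T: s for s, T in enumerate(states)}
P = np.zeros((len(states), len(states)))
cells = list(itertools.product(range(I), range(J)))
for s, T in enumerate(states):
    for (i1, j1), (i2, j2) in itertools.permutations(cells, 2):
        if i1 == i2 or j1 == j2 or T[i1][j1] * T[i2][j2] == 0:
            continue
        y = [list(row) for row in T]
        y[i1][j1] -= 1; y[i2][j2] -= 1; y[i1][j2] += 1; y[i2][j1] += 1
        P[s, index[tuple(map(tuple, y))]] += T[i1][j1] * T[i2][j2] / n**2
    P[s, s] = 1.0 - P[s].sum()            # holding probability
f = np.array([T[0][0] - lam[0] * mu[0] / n for T in states])
print(np.allclose(P @ f, (1 - 2 / n) * f))   # True
```
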

Remark 13.

The set {fij}1iI1,1jJ1\{f_{ij}\}_{1\leq i\leq I-1,1\leq j\leq J-1} of eigenvectors from Lemma 3.6 is a basis for the eigenspace with eigenvalue 12/n1-2/n (as Corollary 3.2 (a) showed the multiplicity of 12/n1-2/n is (I1)(J1)(I-1)(J-1)). The current version of the functions fijf_{ij} are not orthogonal with respect to πλ,μ\pi_{\lambda,\mu}, because

\operatorname{{\mathbb{E}}}_{\pi_{\lambda,\mu}}[T_{ij}T_{kl}]=\begin{cases}\frac{\lambda_{i}\mu_{j}\lambda_{k}\mu_{l}}{n(n-1)}&\text{if}\quad i\neq k,\,j\neq l\\ \frac{\lambda_{i}\mu_{j}\lambda_{k}(\mu_{j}-1)}{n(n-1)}&\text{if}\quad i\neq k,\,j=l\\ \frac{\lambda_{i}^{2}\mu_{j}^{2}}{n^{2}}+\frac{\lambda_{i}\mu_{j}(n-\lambda_{i})(n-\mu_{j})}{n^{2}(n-1)}&\text{if}\quad i=k,\,j=l\end{cases}. (14)

If ik,jli\neq k,j\neq l, then

𝔼πλ,μ[fi,j(T)fk,l(T)]\displaystyle\operatorname{{\mathbb{E}}}_{\pi_{\lambda,\mu}}[f_{i,j}(T)f_{k,l}(T)] =λiμjλkμln(n1)2λiμjλkμln2+λiμjλkμln2\displaystyle=\frac{\lambda_{i}\mu_{j}\lambda_{k}\mu_{l}}{n(n-1)}-2\frac{\lambda_{i}\mu_{j}\lambda_{k}\mu_{l}}{n^{2}}+\frac{\lambda_{i}\mu_{j}\lambda_{k}\mu_{l}}{n^{2}}
=λiμjλkμln(1n(n1)).\displaystyle=\frac{\lambda_{i}\mu_{j}\lambda_{k}\mu_{l}}{n}\left(\frac{1}{n(n-1)}\right).

To find the quadratic eigenvectors, the following computations are needed.

Lemma 3.7.

For any λ,μ\lambda,\mu, 1i,kI,1j,lJ1\leq i,k\leq I,1\leq j,l\leq J and 𝐱𝒯λ,μ\mathbf{x}\in\mathcal{T}_{\lambda,\mu},

𝔼[T1(i,j)T1(k,l)T0=𝐱]={xijxkl(14n+2n2)+2n2(xklλiμj+xijλkμl+xilxkj)ik,jlxijxkj(14n+4n2)+2n2(xkj(λiμjλi)+xij(λkμjλk))ik,j=lxij2(14n+4n2)+2n2(xij(2λiμj2λi2μj+n)+λiμj)i=k,j=l.\operatorname{{\mathbb{E}}}\left[T_{1}(i,j)T_{1}(k,l)\mid T_{0}=\mathbf{x}\right]=\begin{cases}x_{ij}x_{kl}\left(1-\frac{4}{n}+\frac{2}{n^{2}}\right)+\frac{2}{n^{2}}\left(x_{kl}\lambda_{i}\mu_{j}+x_{ij}\lambda_{k}\mu_{l}+x_{il}x_{kj}\right)&i\neq k,j\neq l\\ x_{ij}x_{kj}\left(1-\frac{4}{n}+\frac{4}{n^{2}}\right)+\frac{2}{n^{2}}\left(x_{kj}(\lambda_{i}\mu_{j}-\lambda_{i})+x_{ij}(\lambda_{k}\mu_{j}-\lambda_{k})\right)&i\neq k,j=l\\ x_{ij}^{2}\left(1-\frac{4}{n}+\frac{4}{n^{2}}\right)+\frac{2}{n^{2}}\left(x_{ij}(2\lambda_{i}\mu_{j}-2\lambda_{i}-2\mu_{j}+n)+\lambda_{i}\mu_{j}\right)&i=k,j=l\end{cases}. (15)
Proof.

For the first case, if iki\neq k and jlj\neq l then the only situation in which both the (i,j)(i,j) and (k,l)(k,l) entries are chosen is if they are opposite corners of the box chosen for the swap move. In the following calculation, (16) is the case where both the (i,j)(i,j) and (k,l)(k,l) entries change, (17) is the case where only the (i,j)(i,j) entry changes, and (18) is the case where only the (k,l)(k,l) entry changes.

𝔼[T1(i,j)T1(k,l)\displaystyle\operatorname{{\mathbb{E}}}\left[T_{1}(i,j)T_{1}(k,l)\right. xijxklT0=𝐱]=(xijxkl+1)2xijxkln2+(xij+xkl+1)2xilxkjn2\displaystyle\left.-x_{ij}x_{kl}\mid T_{0}=\mathbf{x}\right]=(-x_{ij}-x_{kl}+1)\frac{2x_{ij}x_{kl}}{n^{2}}+(x_{ij}+x_{kl}+1)\frac{2x_{il}x_{kj}}{n^{2}} (16)
xkl2xij(nλiμj+xijxkl)n2+xkl2((λixij)(μjxij)xilxkj)n2\displaystyle-x_{kl}\frac{2x_{ij}(n-\lambda_{i}-\mu_{j}+x_{ij}-x_{kl})}{n^{2}}+x_{kl}\frac{2((\lambda_{i}-x_{ij})(\mu_{j}-x_{ij})-x_{il}x_{kj})}{n^{2}} (17)
xij2xkl(nλkμl+xklxij)n2+xij2((λkxkl)(μlxkl)xilxkj)n2\displaystyle-x_{ij}\frac{2x_{kl}(n-\lambda_{k}-\mu_{l}+x_{kl}-x_{ij})}{n^{2}}+x_{ij}\frac{2((\lambda_{k}-x_{kl})(\mu_{l}-x_{kl})-x_{il}x_{kj})}{n^{2}} (18)
=2n2(xijxkl2nxijxkl+xijλkμl+xklλiμj+xilxkj)\displaystyle=\frac{2}{n^{2}}\left(x_{ij}x_{kl}-2nx_{ij}x_{kl}+x_{ij}\lambda_{k}\mu_{l}+x_{kl}\lambda_{i}\mu_{j}+x_{il}x_{kj}\right)

Now suppose iki\neq k and j=lj=l. In this case the (i,j)(i,j) and (k,l)(k,l) entries of the table could only both change if one increases and the other decreases. This gives

𝔼[T1(i,j)T1(k,j)\displaystyle\operatorname{{\mathbb{E}}}\left[T_{1}(i,j)T_{1}(k,j)\right. xijxkjT0=𝐱]=(xijxkj1)2xij(λkxkj)n2+(xij+xkj1)2xkj(λixij)n2\displaystyle\left.-x_{ij}x_{kj}\mid T_{0}=\mathbf{x}\right]=(x_{ij}-x_{kj}-1)\frac{2x_{ij}(\lambda_{k}-x_{kj})}{n^{2}}+(-x_{ij}+x_{kj}-1)\frac{2x_{kj}(\lambda_{i}-x_{ij})}{n^{2}} (19)
xkj2xij(nλiλkμj+xij+xkj)n2+xkj2(λixij)(μjxijxkj)n2\displaystyle-x_{kj}\frac{2x_{ij}(n-\lambda_{i}-\lambda_{k}-\mu_{j}+x_{ij}+x_{kj})}{n^{2}}+x_{kj}\frac{2(\lambda_{i}-x_{ij})(\mu_{j}-x_{ij}-x_{kj})}{n^{2}} (20)
xij2xkj(nλkλiμj+xkj+xij)n2+xij2(λkxkj)(μjxkjxij)n2\displaystyle-x_{ij}\frac{2x_{kj}(n-\lambda_{k}-\lambda_{i}-\mu_{j}+x_{kj}+x_{ij})}{n^{2}}+x_{ij}\frac{2(\lambda_{k}-x_{kj})(\mu_{j}-x_{kj}-x_{ij})}{n^{2}} (21)
=2n2(2xijxkj2nxijxkjλkxijλixkj+xkjλiμj+xijλkμj).\displaystyle=\frac{2}{n^{2}}\left(2x_{ij}x_{kj}-2nx_{ij}x_{kj}-\lambda_{k}x_{ij}-\lambda_{i}x_{kj}+x_{kj}\lambda_{i}\mu_{j}+x_{ij}\lambda_{k}\mu_{j}\right).

Finally for the case i=k,j=li=k,j=l, this is the second moment of a birth death process. Using the transitions (11) and (12), the calculation is

𝔼[T1(i,j)2T0=𝐱]\displaystyle\operatorname{{\mathbb{E}}}[T_{1}(i,j)^{2}\mid T_{0}=\mathbf{x}] =(xij+1)2(2(λixij)(μjxij)n2)+(xij1)2(2xij(nλiμj+xij)n2)\displaystyle=(x_{ij}+1)^{2}\left(\frac{2(\lambda_{i}-x_{ij})(\mu_{j}-x_{ij})}{n^{2}}\right)+(x_{ij}-1)^{2}\left(\frac{2x_{ij}(n-\lambda_{i}-\mu_{j}+x_{ij})}{n^{2}}\right)
+xij2(T1(i,j)=xijT0=𝐱)\displaystyle\,\,\,\,\,\,\,\,\,\,\,+x_{ij}^{2}\operatorname{\mathbb{P}}(T_{1}(i,j)=x_{ij}\mid T_{0}=\mathbf{x})
=xij2(14n+4n2)+2n2(xij(2λiμj2λi2μj+n)+λiμj).\displaystyle=x_{ij}^{2}\left(1-\frac{4}{n}+\frac{4}{n^{2}}\right)+\frac{2}{n^{2}}\left(x_{ij}(2\lambda_{i}\mu_{j}-2\lambda_{i}-2\mu_{j}+n)+\lambda_{i}\mu_{j}\right).

Lemma 3.8 (Quadratic Eigenfunctions).

Let λ,μ\lambda,\mu be partitions of nn with |λ|=I,|μ|=J|\lambda|=I,|\mu|=J, and let 1i,kI,1j,lJ1\leq i,k\leq I,1\leq j,l\leq J. For the Markov chain on 𝒯λ,μ\mathcal{T}_{\lambda,\mu}, the following functions, defined for 𝐱𝒯λ,μ\mathbf{x}\in\mathcal{T}_{\lambda,\mu}, are eigenfunctions with eigenvalue 14/n+4/n21-4/n+4/n^{2} (part (a) assumes ik,jli\neq k,j\neq l, and part (b) assumes iki\neq k):

  1. (a)
    f(i,j),(k,l)(𝐱)\displaystyle f_{(i,j),(k,l)}(\mathbf{x}) :=xijxklxijλkμln2xklλiμjn2\displaystyle:=x_{ij}x_{kl}-x_{ij}\frac{\lambda_{k}\mu_{l}}{n-2}-x_{kl}\frac{\lambda_{i}\mu_{j}}{n-2}
    +xilxkjxilλkμjn2xkjλiμln2+2λkμlλiμj(n1)(n2).\displaystyle\,\,\,\,\,\,\,+x_{il}x_{kj}-x_{il}\frac{\lambda_{k}\mu_{j}}{n-2}-x_{kj}\frac{\lambda_{i}\mu_{l}}{n-2}+\frac{2\lambda_{k}\mu_{l}\lambda_{i}\mu_{j}}{(n-1)(n-2)}.
  2. (b)
    \displaystyle f_{(i,j),(k,j)}(\mathbf{x})=x_{ij}x_{kj}-x_{ij}\frac{\lambda_{k}(\mu_{j}-1)}{n-2}-x_{kj}\frac{\lambda_{i}(\mu_{j}-1)}{n-2}+\frac{\lambda_{i}\lambda_{k}\mu_{j}(\mu_{j}-1)}{(n-1)(n-2)}.
  3. (c)
    f(i,j),(i,j)(𝐱)\displaystyle f_{(i,j),(i,j)}(\mathbf{x}) =xij2xij2λiμj2λi2μj+nn2+λiμj(1+λiμjλiμj)(n1)(n2).\displaystyle=x_{ij}^{2}-x_{ij}\frac{2\lambda_{i}\mu_{j}-2\lambda_{i}-2\mu_{j}+n}{n-2}+\frac{\lambda_{i}\mu_{j}(1+\lambda_{i}\mu_{j}-\lambda_{i}-\mu_{j})}{(n-1)(n-2)}.
Remark 14.

Using Equation 14, one can check that 𝔼πλ,μ[f(i,j),(k,l)(T)]=0\operatorname{{\mathbb{E}}}_{\pi_{\lambda,\mu}}[f_{(i,j),(k,l)}(T)]=0, for any i,j,k,li,j,k,l.

Proof.

The results follow from straightforward calculations using Lemma 3.7 and Equation 13. As an illustration, (a) is computed: In 𝔼[f(i,j),(k,l)(T1)T0=𝐱]\operatorname{{\mathbb{E}}}[f_{(i,j),(k,l)}(T_{1})\mid T_{0}=\mathbf{x}], the degree 22 terms will be

\displaystyle x_{ij}x_{kl}\left(1-\frac{4}{n}+\frac{2}{n^{2}}\right)+\frac{2}{n^{2}}x_{il}x_{kj}+x_{il}x_{kj}\left(1-\frac{4}{n}+\frac{2}{n^{2}}\right)+\frac{2}{n^{2}}x_{ij}x_{kl}=\left(1-\frac{4}{n}+\frac{4}{n^{2}}\right)\left(x_{ij}x_{kl}+x_{il}x_{kj}\right).

Degree 11 terms arise from the expectation of six of the terms in f(i,j),(k,l)(T1)f_{(i,j),(k,l)}(T_{1}):

𝔼[T1(i,j)T1(k,l)T0=𝐱]2n2(xklλiμj+xijλkμl)\displaystyle\operatorname{{\mathbb{E}}}[T_{1}(i,j)T_{1}(k,l)\mid T_{0}=\mathbf{x}]\to\frac{2}{n^{2}}\left(x_{kl}\lambda_{i}\mu_{j}+x_{ij}\lambda_{k}\mu_{l}\right)
𝔼[T1(i,l)T1(k,j)T0=𝐱]2n2(xkjλiμl+xilλkμj)\displaystyle\operatorname{{\mathbb{E}}}[T_{1}(i,l)T_{1}(k,j)\mid T_{0}=\mathbf{x}]\to\frac{2}{n^{2}}\left(x_{kj}\lambda_{i}\mu_{l}+x_{il}\lambda_{k}\mu_{j}\right)
𝔼[T1(i,j)λkμln2T0=𝐱](12n)xijλkμln2\displaystyle\operatorname{{\mathbb{E}}}\left[-T_{1}(i,j)\frac{\lambda_{k}\mu_{l}}{n-2}\mid T_{0}=\mathbf{x}\right]\to-\left(1-\frac{2}{n}\right)x_{ij}\frac{\lambda_{k}\mu_{l}}{n-2}
𝔼[T1(k,l)λiμjn2T0=𝐱](12n)xklλiμjn2\displaystyle\operatorname{{\mathbb{E}}}\left[-T_{1}(k,l)\frac{\lambda_{i}\mu_{j}}{n-2}\mid T_{0}=\mathbf{x}\right]\to-\left(1-\frac{2}{n}\right)x_{kl}\frac{\lambda_{i}\mu_{j}}{n-2}
𝔼[T1(i,l)λkμjn2T0=𝐱](12n)xilλkμjn2\displaystyle\operatorname{{\mathbb{E}}}\left[-T_{1}(i,l)\frac{\lambda_{k}\mu_{j}}{n-2}\mid T_{0}=\mathbf{x}\right]\to-\left(1-\frac{2}{n}\right)x_{il}\frac{\lambda_{k}\mu_{j}}{n-2}
𝔼[T1(k,j)λiμln2T0=𝐱](12n)xkjλiμln2\displaystyle\operatorname{{\mathbb{E}}}\left[-T_{1}(k,j)\frac{\lambda_{i}\mu_{l}}{n-2}\mid T_{0}=\mathbf{x}\right]\to-\left(1-\frac{2}{n}\right)x_{kj}\frac{\lambda_{i}\mu_{l}}{n-2}

Collecting the xijx_{ij} terms gives

xij(2n2λkμl1nλkμl)\displaystyle x_{ij}\left(\frac{2}{n^{2}}\lambda_{k}\mu_{l}-\frac{1}{n}\lambda_{k}\mu_{l}\right) =(n2)(2n21n)xijλkμln2\displaystyle=(n-2)\left(\frac{2}{n^{2}}-\frac{1}{n}\right)x_{ij}\frac{\lambda_{k}\mu_{l}}{n-2}
=(14n+4n2)xijλkμln2,\displaystyle=-\left(1-\frac{4}{n}+\frac{4}{n^{2}}\right)x_{ij}\frac{\lambda_{k}\mu_{l}}{n-2},

and the computation is the same for the other linear terms. Finally, the constant terms arise from:

𝔼[T1(i,j)λkμln2T0=𝐱]2λiμjn2λkμln2\displaystyle\operatorname{{\mathbb{E}}}\left[-T_{1}(i,j)\frac{\lambda_{k}\mu_{l}}{n-2}\mid T_{0}=\mathbf{x}\right]\to-\frac{2\lambda_{i}\mu_{j}}{n^{2}}\cdot\frac{\lambda_{k}\mu_{l}}{n-2}
𝔼[T1(k,l)λiμjn2T0=𝐱]2λkμln2λiμjn2\displaystyle\operatorname{{\mathbb{E}}}\left[-T_{1}(k,l)\frac{\lambda_{i}\mu_{j}}{n-2}\mid T_{0}=\mathbf{x}\right]\to-\frac{2\lambda_{k}\mu_{l}}{n^{2}}\cdot\frac{\lambda_{i}\mu_{j}}{n-2}
𝔼[T1(i,l)λkμjn2T0=𝐱]2λiμln2λkμjn2\displaystyle\operatorname{{\mathbb{E}}}\left[-T_{1}(i,l)\frac{\lambda_{k}\mu_{j}}{n-2}\mid T_{0}=\mathbf{x}\right]\to-\frac{2\lambda_{i}\mu_{l}}{n^{2}}\cdot\frac{\lambda_{k}\mu_{j}}{n-2}
𝔼[T1(k,j)λiμln2T0=𝐱]2λkμjn2λiμln2.\displaystyle\operatorname{{\mathbb{E}}}\left[-T_{1}(k,j)\frac{\lambda_{i}\mu_{l}}{n-2}\mid T_{0}=\mathbf{x}\right]\to-\frac{2\lambda_{k}\mu_{j}}{n^{2}}\cdot\frac{\lambda_{i}\mu_{l}}{n-2}.

This gives

2λkμlλiμj(n1)(n2)8λkμlλiμjn2(n2)=(14n+4n2)2λkμlλiμj(n1)(n2).\displaystyle\frac{2\lambda_{k}\mu_{l}\lambda_{i}\mu_{j}}{(n-1)(n-2)}-\frac{8\lambda_{k}\mu_{l}\lambda_{i}\mu_{j}}{n^{2}(n-2)}=\left(1-\frac{4}{n}+\frac{4}{n^{2}}\right)\frac{2\lambda_{k}\mu_{l}\lambda_{i}\mu_{j}}{(n-1)(n-2)}.

Further details are omitted. ∎
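
As a further numerical sanity check (a sketch only, with margins chosen arbitrarily), Remark 14 can be verified directly: enumerating the tables with fixed margins and computing the Fisher-Yates probabilities shows that the diagonal quadratic eigenfunction of Lemma 3.8(c) has mean zero.

```python
# Sketch verifying Remark 14 numerically: the quadratic eigenfunction of
# Lemma 3.8(c) has mean zero under the Fisher-Yates distribution.  The margins
# below are chosen arbitrarily for the check.
import itertools
from math import factorial

lam, mu = (4, 3, 2), (3, 3, 3)
n = sum(lam)

def tables(rows, cols):
    out = []
    for first in itertools.product(*(range(min(rows[0], c) + 1) for c in cols)):
        if sum(first) != rows[0]:
            continue
        if len(rows) == 1:
            out.append((first,))
        else:
            rest_cols = tuple(c - f for c, f in zip(cols, first))
            out.extend((first,) + rest for rest in tables(rows[1:], rest_cols))
    return out

def fisher_yates(T):
    """pi(T) = (1/n!) * prod_i lam_i! * prod_j mu_j! / prod_{i,j} T_ij!"""
    p = 1.0 / factorial(n)
    for li in lam:
        p *= factorial(li)
    for mj in mu:
        p *= factorial(mj)
    for row in T:
        for t in row:
            p /= factorial(t)
    return p

def f_diag(T, i, j):      # quadratic eigenfunction f_{(i,j),(i,j)} of Lemma 3.8(c)
    x = T[i][j]
    K = (2 * lam[i] * mu[j] - 2 * lam[i] - 2 * mu[j] + n) / (n - 2)
    L = lam[i] * mu[j] * (1 + lam[i] * mu[j] - lam[i] - mu[j]) / ((n - 1) * (n - 2))
    return x * x - K * x + L

ts = tables(lam, mu)
print(abs(sum(fisher_yates(T) for T in ts) - 1.0) < 1e-9)              # True
print(abs(sum(fisher_yates(T) * f_diag(T, 0, 0) for T in ts)) < 1e-9)  # True
```
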

4 Mixing Time

This section contains results on the mixing time for special cases. Throughout, keep in mind that random transpositions on SnS_{n} has mixing time (n/2)log(n)(n/2)\log(n), which is an upper bound for the mixing time on 𝒯λ,μ\mathcal{T}_{\lambda,\mu}. The question then is: Does the lumping speed up mixing, and if so by what order?

4.1 Upper Bound for 2×J2\times J tables

Due to the complicated nature of the orthogonal polynomials, the upper bound can only be analyzed for the specific starting states for which the kernel polynomials simplify. Suppose λ=(nk,k),μ=(μ1,,μJ)\lambda=(n-k,k),\mu=(\mu_{1},\dots,\mu_{J}), with kn/2k\leq\lfloor n/2\rfloor, and suppose there is at least one index 1jJ1\leq j\leq J such that μj>k\mu_{j}>k.

Let k𝐞jk\mathbf{e}_{j} be the table with the second row all 0 except kk in the jjth column. For example, when n=10,μ=(4,3,3),λ=(8,2)n=10,\mu=(4,3,3),\lambda=(8,2), we have the following table

2𝐞1=(233200).2\mathbf{e}_{1}=\begin{pmatrix}2&3&3\\ 2&0&0\end{pmatrix}.

Note that the assumption μj>k\mu_{j}>k ensures that this table exists.

Recall Proposition 2.8 which says, evaluated at these states, the kernel polynomials for the multivariate orthogonal polynomials have a simple closed-form expression:

hm(k𝐞j,k𝐞j)=(km)(n2m+1)n[m1](nμj)[m](nk)[m](μj)[m].h_{m}(k\mathbf{e}_{j},k\mathbf{e}_{j})=\binom{k}{m}\frac{(n-2m+1)n_{[m-1]}(n-\mu_{j})_{[m]}}{(n-k)_{[m]}(\mu_{j})_{[m]}}. (22)

Recall the notation a[m]=a(a1)(am+1)a_{[m]}=a(a-1)\ldots(a-m+1) for the decreasing factorial, with a[0]:=1a_{[0]}:=1.

Theorem 4.1.

Let PP be the transition matrix for the swap Markov chain 𝒯(nk,k),μ\mathcal{T}_{(n-k,k),\mu}, with kn/2k\leq\lfloor n/2\rfloor and μ=(μ1,,μJ)\mu=(\mu_{1},\dots,\mu_{J}) with μj>k\mu_{j}>k for at least one index 1jJ1\leq j\leq J. For any c>0c>0 and 1jJ1\leq j\leq J such that μj>k\mu_{j}>k,

  1. (a)

    If

    t=(n4+k(k1)2(n2k))(log(kn(nμj)(n2k)(μjk))+c),t=\left(\frac{n}{4}+\frac{k(k-1)}{2(n-2k)}\right)\cdot\left(\log\left(\frac{k\cdot n\cdot(n-\mu_{j})}{(n-2k)\cdot(\mu_{j}-k)}\right)+c\right),

    then χk𝐞j2(t)ec\chi^{2}_{k\mathbf{e}_{j}}(t)\leq e^{-c}.

  2. (b)

    If

    t=n8(log(k(n1)(nμj)(nk)μj)c),t=\frac{n}{8}\cdot\left(\log\left(\frac{k\cdot(n-1)\cdot(n-\mu_{j})}{(n-k)\cdot\mu_{j}}\right)-c\right),

    then χk𝐞j2(t)ec\chi^{2}_{k\mathbf{e}_{j}}(t)\geq e^{c}.

Remark 15.

In [35] the kernel polynomials for the multivariate hypergeometric distribution are similarly used to analyze three classes of Bernoulli-Laplace type Markov chains, which have multivariate hypergeometric stationary distribution. Theorem 4.1 can be compared to Proposition 4.2.1 of [35]. The random transpositions chain on 𝒯(nk,k),μ\mathcal{T}_{(n-k,k),\mu} is essentially a special case of the Bernoulli-Laplace level model.

Proof.

Using the expression (22) for the kernel polynomials, the chi-squared distance is

χk𝐞j2(t)\displaystyle\chi^{2}_{k\mathbf{e}_{j}}(t) =m=1k(12m(n+1m)n2)2t(km)(n2m+1)n[m1](nμj)[m](nk)[m](μj)[m].\displaystyle=\sum_{m=1}^{k}\left(1-\frac{2m(n+1-m)}{n^{2}}\right)^{2t}\cdot\binom{k}{m}\frac{(n-2m+1)n_{[m-1]}(n-\mu_{j})_{[m]}}{(n-k)_{[m]}(\mu_{j})_{[m]}}.

To help bound this sum, let SmS_{m} be the mmth term in the summand, i.e.

Sm=(12m(n+1m)n2)2t(km)(n2m+1)n[m1](nμj)[m](nk)[m](μj)[m].S_{m}=\left(1-\frac{2m(n+1-m)}{n^{2}}\right)^{2t}\cdot\binom{k}{m}\frac{(n-2m+1)n_{[m-1]}(n-\mu_{j})_{[m]}}{(n-k)_{[m]}(\mu_{j})_{[m]}}.

Then the ratio of consecutive terms can be bounded:

Sm+1Sm\displaystyle\frac{S_{m+1}}{S_{m}}\leq (12(n2m)n22m(n+1m))2tkmm+1n2m1n2m+1(nm+1)(nμjm)(nkm)(μjm)\displaystyle\left(1-\frac{2(n-2m)}{n^{2}-2m(n+1-m)}\right)^{2t}\cdot\frac{k-m}{m+1}\cdot\frac{n-2m-1}{n-2m+1}\cdot\frac{(n-m+1)(n-\mu_{j}-m)}{(n-k-m)(\mu_{j}-m)}
(12(n2k)n22k(n+1k))2tk2nn2knμjμjk\displaystyle\,\,\,\,\,\,\,\,\,\,\leq\left(1-\frac{2(n-2k)}{n^{2}-2k(n+1-k)}\right)^{2t}\cdot\frac{k}{2}\cdot\frac{n}{n-2k}\cdot\frac{n-\mu_{j}}{\mu_{j}-k}
k2nn2knμjμjkexp(2t(2(n2k)n22k(n+1k))).\displaystyle\,\,\,\,\,\,\,\,\,\,\leq\frac{k}{2}\cdot\frac{n}{n-2k}\cdot\frac{n-\mu_{j}}{\mu_{j}-k}\exp\left(-2t\left(\frac{2(n-2k)}{n^{2}-2k(n+1-k)}\right)\right).

If c>0c>0 and

t=(n22k(n+1k)4(n2k))(log(knn2knμjμjk)+c),t=\left(\frac{n^{2}-2k(n+1-k)}{4(n-2k)}\right)\cdot\left(\log\left(k\cdot\frac{n}{n-2k}\cdot\frac{n-\mu_{j}}{\mu_{j}-k}\right)+c\right),

then the ratio Sm+1/SmS_{m+1}/S_{m} is less than ec/21/2e^{-c}/2\leq 1/2. Also the first term S1S_{1} is the largest term, and

S1=(12n)2tk(n1)(nμj)(nk)(μj)ec.S_{1}=\left(1-\frac{2}{n}\right)^{2t}\cdot\frac{k(n-1)(n-\mu_{j})}{(n-k)(\mu_{j})}\leq e^{-c}.

Thus,

m=1kSmS1m=012mec.\displaystyle\sum_{m=1}^{k}S_{m}\leq S_{1}\sum_{m=0}^{\infty}\frac{1}{2^{m}}\leq e^{-c}.

The bound for part (b) comes from considering the contribution to the sum from the largest eigenvalues, which is for m=1m=1. This gives

χk𝐞j2(t)\displaystyle\chi^{2}_{k\mathbf{e}_{j}}(t) (12n)2tk(n1)n[0](nμj)[1](nk)[1](μj)[1]=(12n)2tk(n1)(nμj)(nk)μj\displaystyle\geq\left(1-\frac{2}{n}\right)^{2t}\cdot k\cdot\frac{(n-1)n_{[0]}(n-\mu_{j})_{[1]}}{(n-k)_{[1]}(\mu_{j})_{[1]}}=\left(1-\frac{2}{n}\right)^{2t}\cdot k\cdot\frac{(n-1)\cdot(n-\mu_{j})}{(n-k)\cdot\mu_{j}}
=exp(2tlog(12n)+log(k(n1)(nμj)(nk)μj))\displaystyle=\exp\left(2t\log\left(1-\frac{2}{n}\right)+\log\left(k\cdot\frac{(n-1)\cdot(n-\mu_{j})}{(n-k)\cdot\mu_{j}}\right)\right)
exp(2t(4n)+log(k(n1)(nμj)(nk)μj)),\displaystyle\geq\exp\left(2t\left(-\frac{4}{n}\right)+\log\left(k\cdot\frac{(n-1)\cdot(n-\mu_{j})}{(n-k)\cdot\mu_{j}}\right)\right),

using that log(1x)xx2>2x\log(1-x)\geq-x-x^{2}>-2x for 0x1/20\leq x\leq 1/2. The sum is then ec\geq e^{c} for t=(n/8)(log(k(n1)(nμj)/((nk)μj))c)t=(n/8)\left(\log\left(k(n-1)(n-\mu_{j})/((n-k)\mu_{j})\right)-c\right). ∎

Example 4.2.

Suppose λ=(nk,k),μ=(n,)\lambda=(n-k,k),\mu=(n-\ell,\ell) with >k\ell>k. Then Theorem 4.1 gives χk𝐞22(t)ec\chi^{2}_{k\mathbf{e}_{2}}(t)\leq e^{-c} for

t=(n4+k(k1)2(n2k))(log(kn(n)(n2k)(k))+c).t=\left(\frac{n}{4}+\frac{k(k-1)}{2(n-2k)}\right)\cdot\left(\log\left(\frac{k\cdot n\cdot(n-\ell)}{(n-2k)\cdot(\ell-k)}\right)+c\right).

In particular if nn is even, k=n/21k=n/2-1, =n/2\ell=n/2, then

t=\left(\frac{n}{4}+\frac{(n/2-1)(n/2-2)}{4}\right)\cdot\left(\log\left(\frac{(n/2-1)\cdot n\cdot(n/2)}{2}\right)+c\right)\sim\left(\frac{n}{4}+\frac{n^{2}}{16}\right)\left(\log(n^{3})+c\right).

Note that this is a factor of nn larger than the bound in Lemma 3.4 for the distance averaged over all starting states. This could be due to slower mixing from the extreme starting state:

k\mathbf{e}_{2}=\begin{pmatrix}n-\ell&\ell-k\\ 0&k\end{pmatrix},

which has small probability under π(nk,k),(n,)\pi_{(n-k,k),(n-\ell,\ell)}.
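
The bound in Theorem 4.1 can also be explored numerically; the sketch below (parameters chosen arbitrarily, not from the paper) evaluates the exact chi-squared distance from the starting state k𝐞j using the kernel formula (22), so that the decay in t can be examined directly.

```python
# Sketch: evaluate the exact chi-squared distance from the starting state k*e_j
# using the kernel-polynomial formula (22).  Parameters n, k, mu_j are chosen
# arbitrarily for illustration.
from math import comb

def falling(a, m):
    """Decreasing factorial a_[m] = a (a-1) ... (a-m+1), with a_[0] = 1."""
    out = 1
    for i in range(m):
        out *= a - i
    return out

def chi_sq(t, n, k, mu_j):
    total = 0.0
    for m in range(1, k + 1):
        beta = 1 - 2 * m * (n + 1 - m) / n**2
        mult = comb(k, m) * (n - 2 * m + 1) * falling(n, m - 1) * falling(n - mu_j, m) \
               / (falling(n - k, m) * falling(mu_j, m))
        total += beta ** (2 * t) * mult
    return total

n, k, mu_j = 100, 10, 40     # lambda = (90, 10), one column sum mu_j = 40 > k
for t in [10, 50, 100, 200, 400]:
    print(t, chi_sq(t, n, k, mu_j))
```
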

4.2 Lower Bound

Because the linear eigenfunctions are known for any size contingency table, Wilson’s method works to give a general lower bound.

Theorem 4.3 (Wilson’s Method, Theorem 13.5 in [37]).

Let (Xt)t0(X_{t})_{t\geq 0} be an irreducible aperiodic Markov chain with state space Ω\Omega and transition matrix PP. Let ϕ\phi be an eigenfunction of PP with eigenvalue λ\lambda with 1/2<λ<11/2<\lambda<1. Fix 0<ϵ<10<\epsilon<1 and let R>0R>0 satisfy

𝔼[|ϕ(X1)ϕ(x)|2X0=x]R,xΩ.\operatorname{{\mathbb{E}}}\left[|\phi(X_{1})-\phi(x)|^{2}\mid X_{0}=x\right]\leq R,\,\,\,\,\,\,\,\,\forall x\in\Omega.

Then for any xΩx\in\Omega,

tmix(ϵ)12log(1/λ)[log((1λ)ϕ(x)22R)+log(1ϵϵ)].t_{mix}(\epsilon)\geq\frac{1}{2\log(1/\lambda)}\left[\log\left(\frac{(1-\lambda)\phi(x)^{2}}{2R}\right)+\log\left(\frac{1-\epsilon}{\epsilon}\right)\right].

For the contingency table Markov chain on 𝒯λ,μ\mathcal{T}_{\lambda,\mu} the linear functions

fij(𝐱)=xijλiμjn,𝐱𝒯λ,μf_{ij}(\mathbf{x})=x_{ij}-\frac{\lambda_{i}\mu_{j}}{n},\quad\mathbf{x}\in\mathcal{T}_{\lambda,\mu}

for any i,ji,j, are eigenfunctions with eigenvalue 12/n1-2/n. These will be used in Wilson’s method to get the following result.

Lemma 4.4.

Let λ=(λ1,,λI),μ=(μ1,,μJ)\lambda=(\lambda_{1},\dots,\lambda_{I}),\mu=(\mu_{1},\dots,\mu_{J}) be any partitions of nn. For any i,ji,j and c>0c>0,

tmix{(n412)(log(mijλiμjn)c)ifn2(λi+μj)(n412)(log(12(nmijλiμj)2n(n+2)λiμj)c)ifn<2(λi+μj),t_{mix}\geq\begin{cases}\left(\frac{n}{4}-\frac{1}{2}\right)\left(\log\left(m_{ij}-\frac{\lambda_{i}\mu_{j}}{n}\right)-c\right)&\text{if}\,\,\,n\geq 2(\lambda_{i}+\mu_{j})\\ \left(\frac{n}{4}-\frac{1}{2}\right)\left(\log\left(\frac{1}{2}\frac{(nm_{ij}-\lambda_{i}\mu_{j})^{2}}{n(n+2)\lambda_{i}\mu_{j}}\right)-c\right)&\text{if}\,\,\,n<2(\lambda_{i}+\mu_{j})\end{cases},

where mij=min(λi,μj)m_{ij}=\min(\lambda_{i},\mu_{j}).

Proof.

From Lemmas 3.6 and 3.7,

𝔼[Tt+1(i,j)Tt=𝐱]=xij(12n)+2λiμjn2\displaystyle\operatorname{{\mathbb{E}}}[T_{t+1}(i,j)\mid T_{t}=\mathbf{x}]=x_{ij}\left(1-\frac{2}{n}\right)+\frac{2\lambda_{i}\mu_{j}}{n^{2}}
𝔼[Tt+1(i,j)2Tt=𝐱]=xij2(14n+4n2)+2n2(xij(2λiμj2λi2μj+n)+λiμj)).\displaystyle\operatorname{{\mathbb{E}}}[T_{t+1}(i,j)^{2}\mid T_{t}=\mathbf{x}]=x_{ij}^{2}\left(1-\frac{4}{n}+\frac{4}{n^{2}}\right)+\frac{2}{n^{2}}\left(x_{ij}(2\lambda_{i}\mu_{j}-2\lambda_{i}-2\mu_{j}+n)+\lambda_{i}\mu_{j})\right).

This allows the calculation of 𝔼[|ϕ(T1)ϕ(𝐱)|2T0=𝐱]\operatorname{{\mathbb{E}}}[|\phi(T_{1})-\phi(\mathbf{x})|^{2}\mid T_{0}=\mathbf{x}] for ϕ=fij\phi=f_{ij}:

𝔼[|fij(T1)fij(𝐱)|2T0=𝐱]\displaystyle\operatorname{{\mathbb{E}}}[|f_{ij}(T_{1})-f_{ij}(\mathbf{x})|^{2}\mid T_{0}=\mathbf{x}] =𝔼[|T1(i,j)xij|2T0=𝐱]\displaystyle=\operatorname{{\mathbb{E}}}[|T_{1}(i,j)-x_{ij}|^{2}\mid T_{0}=\mathbf{x}]
=xij2(14n+4n2)+2n2(xij(2λiμj2λi2μj+n)+λiμj))\displaystyle=x_{ij}^{2}\left(1-\frac{4}{n}+\frac{4}{n^{2}}\right)+\frac{2}{n^{2}}\left(x_{ij}(2\lambda_{i}\mu_{j}-2\lambda_{i}-2\mu_{j}+n)+\lambda_{i}\mu_{j})\right)
2xij(xij(12n)+2λiμjn2)+xij2\displaystyle\,\,\,\,\,\,\,\,\,\,-2x_{ij}\left(x_{ij}\left(1-\frac{2}{n}\right)+\frac{2\lambda_{i}\mu_{j}}{n^{2}}\right)+x_{ij}^{2}
=xij24n2+xij2n2(n2λi2μj)+2λiμjn2=:A\displaystyle=x_{ij}^{2}\cdot\frac{4}{n^{2}}+x_{ij}\cdot\frac{2}{n^{2}}\left(n-2\lambda_{i}-2\mu_{j}\right)+\frac{2\lambda_{i}\mu_{j}}{n^{2}}=:A

First suppose that i,ji,j are such that n2(λi+μj)n\geq 2(\lambda_{i}+\mu_{j}), and note that xijmin(λi,μj)=:mijx_{ij}\leq\min(\lambda_{i},\mu_{j})=:m_{ij} for 𝐱𝒯λ,μ\mathbf{x}\in\mathcal{T}_{\lambda,\mu}. A bound then is

A\displaystyle A 4mij2n2+mij2n2(n2λi2μj)+2λiμjn2\displaystyle\leq\frac{4m_{ij}^{2}}{n^{2}}+m_{ij}\frac{2}{n^{2}}\left(n-2\lambda_{i}-2\mu_{j}\right)+\frac{2\lambda_{i}\mu_{j}}{n^{2}}
=2mijn2λiμjn2.\displaystyle=\frac{2m_{ij}}{n}-\frac{2\lambda_{i}\mu_{j}}{n^{2}}.

This is the constant RR that can be applied to Wilson’s method. Using 𝐱𝒯λ,μ\mathbf{x}\in\mathcal{T}_{\lambda,\mu} such that xij=mij=min(λi,μj)x_{ij}=m_{ij}=\min(\lambda_{i},\mu_{j}):

(1λ)fij(𝐱)22R\displaystyle\frac{(1-\lambda)f_{ij}(\mathbf{x})^{2}}{2R} =2n(xijλiμjn)2/2(2mijn2λiμjn2)\displaystyle=\left.\frac{2}{n}\left(x_{ij}-\frac{\lambda_{i}\mu_{j}}{n}\right)^{2}\middle/2\left(\frac{2m_{ij}}{n}-\frac{2\lambda_{i}\mu_{j}}{n^{2}}\right)\right.
=2n(mijλiμjn)2/2(2mijn2λiμjn2)=12(mijλiμjn)\displaystyle=\left.\frac{2}{n}\left(m_{ij}-\frac{\lambda_{i}\mu_{j}}{n}\right)^{2}\middle/2\left(\frac{2m_{ij}}{n}-\frac{2\lambda_{i}\mu_{j}}{n^{2}}\right)\right.=\frac{1}{2}\left(m_{ij}-\frac{\lambda_{i}\mu_{j}}{n}\right)

If n<2(λi+μj)n<2(\lambda_{i}+\mu_{j}), then

A4mij2n2+2λiμjn2λiμjn(1+2n),A\leq\frac{4m_{ij}^{2}}{n^{2}}+\frac{2\lambda_{i}\mu_{j}}{n}\leq\frac{2\lambda_{i}\mu_{j}}{n}\left(1+\frac{2}{n}\right),

and in this case

(1λ)fij(𝐱)22R=2n(mijλiμjn)2/4λiμjn(1+2n)=12(nmijλiμj)2n(n+2)λiμj\displaystyle\frac{(1-\lambda)f_{ij}(\mathbf{x})^{2}}{2R}=\left.\frac{2}{n}\left(m_{ij}-\frac{\lambda_{i}\mu_{j}}{n}\right)^{2}\middle/\frac{4\lambda_{i}\mu_{j}}{n}\left(1+\frac{2}{n}\right)\right.=\frac{1}{2}\frac{(nm_{ij}-\lambda_{i}\mu_{j})^{2}}{n(n+2)\lambda_{i}\mu_{j}}

Finally, note that, with λ=12/n\lambda=1-2/n and using the bound 1/log(1γ)1/γ11/-\log(1-\gamma)\geq 1/\gamma-1,

12log(1/λ)=12log(12/n)n412.\frac{1}{2\log(1/\lambda)}=\frac{1}{-2\log(1-2/n)}\geq\frac{n}{4}-\frac{1}{2}.

Example 4.5.

Suppose λ=(nk,k),μ=(n,)\lambda=(n-k,k),\mu=(n-\ell,\ell) with k<n/2k<\ell\leq n/2 (as in Example 4.2). Then 2nk>n2n-\ell-k>n, so the second case in Lemma 4.4 always applies to the (1,1)(1,1) entry of the table. Since m11=min(n,nk)=nm_{11}=\min(n-\ell,n-k)=n-\ell, a lower bound is

tmix\displaystyle\mathrm{t_{mix}} (n412)(log(12(n(n)(n)(nk))2n(n+2)(n)(nk))c)\displaystyle\geq\left(\frac{n}{4}-\frac{1}{2}\right)\left(\log\left(\frac{1}{2}\frac{(n(n-\ell)-(n-\ell)(n-k))^{2}}{n(n+2)(n-\ell)(n-k)}\right)-c\right)
=(n412)(log(12(n)k2n(n+2)(nk))c).\displaystyle=\left(\frac{n}{4}-\frac{1}{2}\right)\left(\log\left(\frac{1}{2}\frac{(n-\ell)k^{2}}{n(n+2)(n-k)}\right)-c\right).

For example, if k==n/2k=\ell=n/2, then the expression inside the log\log is equal to (n/2)2/(2n(n+2))1/8(n/2)^{2}/(2n(n+2))\sim 1/8.

If ,k\ell,k are small enough so that k+n/2k+\ell\leq n/2, then the first case of Lemma 4.4 applies to the (2,2)(2,2) entry, with m22=min(k,)=km_{22}=\min(k,\ell)=k. The lower bound is then

\displaystyle\mathrm{t_{mix}}\geq\left(\frac{n}{4}-\frac{1}{2}\right)\left(\log\left(k-\frac{k\ell}{n}\right)-c\right).

For example, if =k=n/4\ell=k=n/4, then the expression inside the log\log is equal to (n2/4n2/16)/n=(3/16)n(n^{2}/4-n^{2}/16)/n=(3/16)n.
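
For a quick numerical illustration (margins chosen arbitrarily, not from the paper), the lower bound of Lemma 4.4 can be evaluated directly for a chosen cell:

```python
# Sketch: evaluate the lower bound of Lemma 4.4 for given margins and a given
# cell (i, j).  The margins below are arbitrary; c is the slack parameter.
from math import log

def wilson_lower_bound(lam, mu, i, j, c=1.0):
    n = sum(lam)
    m_ij = min(lam[i], mu[j])
    if n >= 2 * (lam[i] + mu[j]):
        inside = m_ij - lam[i] * mu[j] / n
    else:
        inside = 0.5 * (n * m_ij - lam[i] * mu[j]) ** 2 / (n * (n + 2) * lam[i] * mu[j])
    return (n / 4 - 0.5) * (log(inside) - c)

lam = (60, 20, 20)
mu = (50, 30, 20)
print(wilson_lower_bound(lam, mu, 1, 1))   # bound obtained from the (2,2) entry
```
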

5 Further Directions

This section discusses some future directions and applications of the random transpositions chain on contingency tables. Section 5.1 explores how the linear and quadratic eigenfunctions of the Fisher-Yates distribution could be used in statistical applications, to decompose the chi-squared statistic. Section 5.2 notes how the random transpositions chain can be extended to multi-way tables (coming from data with more than 22 categorical features). Section 5.3 considers the random transpositions chain on 𝒯λ,μ\mathcal{T}_{\lambda,\mu} transformed via the Metropolis-Hastings algorithm to a new Markov chain which has uniform stationary distribution (and vice-versa, the symmetric swap chain can be transformed to have Fisher-Yates stationary distribution). The relaxation times of the chains can be compared. The constants in the comparison depend on λ,μ\lambda,\mu and the size of the table, but the conclusion is that for sparse tables the metropolized version of random transpositions has a significantly smaller relaxation time than the symmetric swap chain. Section 5.4 notes that the qq-analog of the Fisher-Yates distribution arises via double cosets.

5.1 Data Analysis

This section discusses potential statistical applications of the eigenfunctions for the random transpositions chain on contingency tables. First, some classic datasets are described.

Datasets.

Tables 2, 3, 4 are classical real data tables with large χ2\chi^{2} statistic. Figure 1 shows histograms of the quadratic eigenfunctions evaluated on each of these tables, as well as plots of the Pearson residuals. We have not succeeded in finding any extra structure from these displays but believe that they may sometimes be informative.

Well Mild Moderate Impaired Total
A 64 94 58 46 262
B 57 94 54 40 245
C 57 105 65 60 287
D 72 141 77 94 384
E 36 97 54 78 265
F 21 71 54 71 217
Total 307 602 362 389 1660
Table 2: Midtown Manhattan Mental Health Study data

Table 2 shows data from an epidemiological survey known as the Midtown Manhattan Mental Health Study [39]. Rows record parent’s socioeconomic status (ranging from A = high, to F = low) and columns record the severity of mental illness. The χ2\chi^{2} statistic is 45.9845.98 on 1515 degrees of freedom.

Jan Feb March April May June July Aug Sep Oct Nov Dec Total
Jan 1 0 0 0 1 2 0 0 1 0 1 0 6
Feb 1 0 0 1 0 0 0 0 0 1 0 2 5
March 1 0 0 0 2 1 0 0 0 0 0 1 5
April 3 0 2 0 0 0 1 0 1 3 1 1 12
May 2 1 1 1 1 1 1 1 1 1 1 0 12
June 2 0 0 0 1 0 0 0 0 0 0 0 3
July 2 0 2 1 0 0 0 0 1 1 1 2 10
Aug 0 0 0 3 0 0 1 0 0 1 0 2 7
Sep 0 0 0 1 1 0 0 0 0 0 1 0 3
Oct 1 1 0 2 0 0 1 0 0 1 1 0 7
Nov 0 1 1 1 2 0 0 2 0 1 1 0 9
Dec 0 1 1 0 0 0 1 0 0 0 0 0 3
Total 13 4 7 10 8 4 5 3 4 9 7 8 82
Table 3: Birth and deathday for Queen Victoria’s descendants.

Table 3 records the month of birth and death for 8282 descendants of Queen Victoria; it appears as an example in [17]. The χ2\chi^{2} statistic is 115.6115.6 with 121121 degrees of freedom, which gives pp-value 0.6210.621, suggesting we do not reject the null hypothesis of independence. The classical rule of thumb for validity of the chi-squared approximation is a minimum of 55 expected counts per cell; this assumption is badly violated in Table 3, and there are too many tables with these margins for exact enumeration.

Black Brown Red Blond Total
Brown 68 119 26 7 220
Blue 20 84 17 94 215
Hazel 15 54 14 10 93
Green 5 29 14 16 64
Total 108 286 71 127 592
Table 4: Eye color vs. hair color for n=592n=592 individuals.

Table 4 was analyzed in [10]; the χ2\chi^{2} statistic is 138.29138.29 with 99 degrees of freedom.

Figure 1: The figures in the left column are the Pearson residuals, which are the normalized linear eigenfunctions; the xx-axis indicates an arbitrary ordering of the residuals and the yy-axis gives the values of the residuals. The right column shows histograms of the normalized quadratic eigenfunctions.

Residuals.

The linear eigenfunctions derived in Lemma 3.6 are well known as ‘Pearson residuals’ in the classical analysis of contingency tables [1]. The naturally scaled eigenfunctions

f^ij(T)=Tijλiμj/nλiμj/n\widehat{f}_{ij}(T)=\frac{T_{ij}-\lambda_{i}\mu_{j}/n}{\sqrt{\lambda_{i}\mu_{j}/n}}

measure the departure of the table from the null model. Standard practice displays all of these in a two way array. Another scaling is inspired by the inner product space with respect to the Fisher-Yates distribution, so that the functions have norm 11:

f~ij(T)=Tijλiμj/ncij,cij:=λiμj(nλi)(nμj)n2(n1)=𝔼πλ,μ[(Tijλiμj/n)2].\widetilde{f}_{ij}(T)=\frac{T_{ij}-\lambda_{i}\mu_{j}/n}{\sqrt{c_{ij}}},\quad c_{ij}:=\frac{\lambda_{i}\mu_{j}(n-\lambda_{i})(n-\mu_{j})}{n^{2}(n-1)}=\operatorname{{\mathbb{E}}}_{\pi_{\lambda,\mu}}\left[(T_{ij}-\lambda_{i}\mu_{j}/n)^{2}\right].

Table 7 compares f^ij\widehat{f}_{ij} and f~ij\widetilde{f}_{ij} for the Midtown Manhattan Mental Health dataset from Table 2.

Well Mild Moderate Impaired
A 2.233 -0.104 0.114 -1.965
B 1.737 0.546 0.078 -2.298
C 0.538 0.090 0.305 -0.885
D 0.117 0.148 -0.737 0.423
E -1.858 0.092 -0.498 2.018
F -3.020 -0.867 0.971 2.826
Table 5: The standard Pearson residuals, f^ij\widehat{f}_{ij}.
Well Mild Moderate Impaired
A 2.695 -0.142 0.141 -2.446
B 2.083 0.741 0.096 -2.844
C 0.656 0.124 0.379 -1.111
D 0.147 0.211 -0.950 0.551
E -2.245 0.125 -0.615 2.515
F -3.587 -1.165 1.177 3.462
Table 6: The linear eigenfunctions standardized to have norm 11, f~ij\widetilde{f}_{ij}.
Table 7: Comparison of the different normalizations f^ij\widehat{f}_{ij} and f~ij\widetilde{f}_{ij} of the linear eigenfunctions, for the dataset from Table 2.
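
Both normalizations are straightforward to compute; the following sketch recomputes Tables 5 and 6 from the counts in Table 2.

```python
# Sketch comparing the two normalizations of the linear eigenfunctions:
# the Pearson residuals and the Fisher-Yates-normalized residuals.
# The table below is the Midtown Manhattan data of Table 2 (without the totals).
T = [[64, 94, 58, 46],
     [57, 94, 54, 40],
     [57, 105, 65, 60],
     [72, 141, 77, 94],
     [36, 97, 54, 78],
     [21, 71, 54, 71]]

lam = [sum(row) for row in T]                       # row sums
mu = [sum(col) for col in zip(*T)]                  # column sums
n = sum(lam)

pearson, normed = [], []
for i, row in enumerate(T):
    pr, nr = [], []
    for j, t in enumerate(row):
        e = lam[i] * mu[j] / n                      # expected count under independence
        c = lam[i] * mu[j] * (n - lam[i]) * (n - mu[j]) / (n**2 * (n - 1))
        pr.append((t - e) / e**0.5)                 # hat f_ij
        nr.append((t - e) / c**0.5)                 # tilde f_ij
    pearson.append(pr)
    normed.append(nr)

print(pearson[0])   # first row of Table 5 (approximately)
print(normed[0])    # first row of Table 6 (approximately)
```
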

Chi-squared Decomposition.

It is natural to try to use the quadratic eigenfunctions in a similar way. They do not have an interpretation as (observed - expected) for some observed statistic, but they do have expectation 0 with respect to πλ,μ\pi_{\lambda,\mu}.

A potential use for the polynomial eigenfunctions is in a decomposition of the chi-squared statistic

χ2(T)=i,j(Tijλiμj/n)2λiμj/n.\displaystyle\chi^{2}(T)=\sum_{i,j}\frac{(T_{ij}-\lambda_{i}\mu_{j}/n)^{2}}{\lambda_{i}\mu_{j}/n}. (23)

Under the null hypothesis that row and column features are independent, χ2(T)\chi^{2}(T) has a limiting chi-squared distribution with (I1)(J1)(I-1)\cdot(J-1) degrees of freedom. A drawback of using χ2(T)\chi^{2}(T) is that when it is large enough to reject the null hypothesis, no additional information is given about why the sample fails the test. This motivates the subject of partitioning the statistic into terms which could give insight into what parts of the table fail the independence hypothesis. The thesis [45] contains a thorough review of this problem, and a theory for decomposing χ2(T)\chi^{2}(T) using Markov chains on the space {1,,I}×{1,,J}\{1,\dots,I\}\times\{1,\dots,J\}. Presented below is a natural decomposition using the linear and quadratic eigenfunctions fijf_{ij} and f(i,j),(i,j)f_{(i,j),(i,j)}.

Let Mij=λiμj/nM_{ij}=\lambda_{i}\mu_{j}/n, Kij,LijK_{ij},L_{ij} be such that

Kij=2λiμj2λi2μj+nn2\displaystyle K_{ij}=\frac{2\lambda_{i}\mu_{j}-2\lambda_{i}-2\mu_{j}+n}{n-2}
Lij=λiμj(1+λiμjλiμj)(n1)(n2),\displaystyle L_{ij}=\frac{\lambda_{i}\mu_{j}(1+\lambda_{i}\mu_{j}-\lambda_{i}-\mu_{j})}{(n-1)(n-2)},

so that f(i,j),(i,j)(T)=Tij2KijTij+Lijf_{(i,j),(i,j)}(T)=T_{ij}^{2}-K_{ij}T_{ij}+L_{ij}. Then,

\displaystyle\chi^{2}(T)=\sum_{i,j}\frac{1}{M_{ij}}(T_{ij}-M_{ij})^{2}=\sum_{i,j}\frac{1}{M_{ij}}(T_{ij}^{2}-2M_{ij}T_{ij}+M_{ij}^{2})
\displaystyle=\sum_{i,j}\frac{1}{M_{ij}}\left(T_{ij}^{2}-K_{ij}T_{ij}+L_{ij}+(K_{ij}-2M_{ij})T_{ij}+M_{ij}^{2}-L_{ij}\right)
\displaystyle=\sum_{i,j}\frac{1}{M_{ij}}f_{(i,j),(i,j)}(T)+\sum_{i,j}\frac{K_{ij}-2M_{ij}}{M_{ij}}f_{ij}(T)+\sum_{i,j}\frac{K_{ij}M_{ij}-M_{ij}^{2}-L_{ij}}{M_{ij}},
where the last equality uses T_{ij}=f_{ij}(T)+M_{ij}.
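
This decomposition is easy to check numerically; the sketch below evaluates the three sums for the hair and eye color counts of Table 4 and confirms that they add up to the chi-squared statistic.

```python
# Sketch: verify the chi-squared decomposition above on the data of Table 4,
# checking that the three sums reproduce the chi-squared statistic.
T = [[68, 119, 26, 7],
     [20, 84, 17, 94],
     [15, 54, 14, 10],
     [5, 29, 14, 16]]

lam = [sum(row) for row in T]
mu = [sum(col) for col in zip(*T)]
n = sum(lam)

chi_sq = quad = lin = const = 0.0
for i, row in enumerate(T):
    for j, t in enumerate(row):
        M = lam[i] * mu[j] / n
        K = (2 * lam[i] * mu[j] - 2 * lam[i] - 2 * mu[j] + n) / (n - 2)
        L = lam[i] * mu[j] * (1 + lam[i] * mu[j] - lam[i] - mu[j]) / ((n - 1) * (n - 2))
        f_lin = t - M                              # linear eigenfunction f_ij
        f_quad = t * t - K * t + L                 # quadratic eigenfunction f_(i,j),(i,j)
        chi_sq += (t - M) ** 2 / M
        quad += f_quad / M
        lin += (K - 2 * M) / M * f_lin
        const += (K * M - M * M - L) / M

print(round(chi_sq, 2))                     # 138.29, as reported for Table 4
print(round(quad + lin + const, 2))         # equals the chi-squared statistic
```
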

Figure 1 shows histograms of the normalized quadratic eigenfunctions f_{(i,j),(k,l)}/\sqrt{\operatorname{{\mathbb{E}}}_{\pi}[f_{(i,j),(k,l)}^{2}]}.

5.2 Higher Dimensional Tables

The random swap Markov chain can be used as inspiration for Markov chains on mm-way contingency tables with fixed margins for m>2m>2. This section describes how that could go for m=3m=3.

Let λ,μ,ρ\lambda,\mu,\rho be three partitions of nn with |λ|=I,|μ|=J,|ρ|=K|\lambda|=I,|\mu|=J,|\rho|=K. Let 𝒯λ,μ,ρ\mathcal{T}_{\lambda,\mu,\rho} be the set of three-way tables with fixed margins determined by λ,μ,ρ\lambda,\mu,\rho. That is, T={Tijk:1iI,1jJ,1kK}𝒯λ,μ,ρT=\{T_{ijk}:1\leq i\leq I,1\leq j\leq J,1\leq k\leq K\}\in\mathcal{T}_{\lambda,\mu,\rho} if

jkTijk=λi,ikTijk=μj,ijTijk=ρk,for all   1iI,1jJ,1kK.\displaystyle\sum_{jk}T_{ijk}=\lambda_{i},\quad\sum_{ik}T_{ijk}=\mu_{j},\quad\sum_{ij}T_{ijk}=\rho_{k},\quad\text{for all}\,\,\,1\leq i\leq I,1\leq j\leq J,1\leq k\leq K.

The partitions λ,μ,ρ\lambda,\mu,\rho are the sufficient statistics for the complete independence model: The probability of an entry being (i,j,k)(i,j,k) is pijk=θiηjγkp_{ijk}=\theta_{i}\eta_{j}\gamma_{k}. A table can be thought of as a tri-partite hypergraph, where each edge connects exactly three vertices.

Representing a table by a set of nn tuples (i,j,k)(i,j,k), we can describe a similar adjacent swap Markov chain as: Pick 22 tuples (i1,j1,k1),(i2,j2,k2)(i_{1},j_{1},k_{1}),(i_{2},j_{2},k_{2}), pick r{1,2,3}r\in\{1,2,3\} uniformly, and swap the rrth entry in the two tuples. For example,

(i1,j1,k1),(i2,j2,k2){(i2,j1,k1),(i1,j2,k2)(i1,j2,k1),(i2,j1,k2)(i1,j1,k2),(i2,j2,k1).(i_{1},j_{1},k_{1}),(i_{2},j_{2},k_{2})\to\begin{cases}(i_{2},j_{1},k_{1}),(i_{1},j_{2},k_{2})\\ (i_{1},j_{2},k_{1}),(i_{2},j_{1},k_{2})\\ (i_{1},j_{1},k_{2}),(i_{2},j_{2},k_{1})\end{cases}.

This move still corresponds to adding (1111)\begin{pmatrix}-1&1\cr 1&-1\end{pmatrix} to a 2×22\times 2 submatrix of the contingency table. The probability of picking the two tuples is 2Ti1j1k1Ti2j2k2/n22T_{i_{1}j_{1}k_{1}}\cdot T_{i_{2}j_{2}k_{2}}/n^{2}. To be precise about the Markov chain, let F(i1,j1,k1),(i2,j2,k2),r(T)F_{(i_{1},j_{1},k_{1}),(i_{2},j_{2},k_{2}),r}(T) denote the table where the swap was made in the indicated entries of the table at index rr, with r{1,2,3}r\in\{1,2,3\}. Then,

P(T,F(i1,j1,k1),(i2,j2,k2),r(T))=132Ti1j1k1Ti2j2k2n2.\displaystyle P(T,F_{(i_{1},j_{1},k_{1}),(i_{2},j_{2},k_{2}),r}(T))=\frac{1}{3}\cdot\frac{2T_{i_{1}j_{1}k_{1}}T_{i_{2}j_{2}k_{2}}}{n^{2}}.

These same moves were proposed in [17] as input for a Metropolis Markov chain for log-linear models on multi-way tables (Section 4.2); for that Markov chain the 22 tuples to be swapped are chosen uniformly from all entries (and if the swap move results in a negative value in the table, it is rejected), thus the chain has uniform stationary distribution.
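
A small sketch of one move of this chain follows (the function name and the example table are illustrative choices, not from the paper); tables are stored sparsely as a dictionary of tuple counts.

```python
# Sketch of one move of the swap chain on 3-way tables: pick two of the n tuples
# with probability proportional to the entries, pick a coordinate r uniformly,
# and swap that coordinate between the two tuples.  The example table is arbitrary.
import random

def step(T):
    """One move; T is a dict {(i, j, k): count} with counts summing to n."""
    cells = list(T)
    weights = [T[c] for c in cells]
    a = random.choices(cells, weights)[0]      # first tuple, probability T[a]/n
    b = random.choices(cells, weights)[0]      # second tuple, probability T[b]/n
    r = random.randrange(3)                    # coordinate to swap
    a2 = tuple(b[r] if s == r else a[s] for s in range(3))
    b2 = tuple(a[r] if s == r else b[s] for s in range(3))
    if a2 == a:                                # the swap changes nothing
        return T
    T = dict(T)
    T[a] -= 1
    T[b] -= 1
    T[a2] = T.get(a2, 0) + 1
    T[b2] = T.get(b2, 0) + 1
    return {c: v for c, v in T.items() if v > 0}

T = {(0, 0, 0): 3, (0, 1, 1): 2, (1, 0, 1): 2, (1, 1, 0): 1}   # n = 8
for _ in range(5):
    T = step(T)
print(T)
```
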

Theorem 5.1.

The Markov chain PP on 𝒯λ,μ,ρ\mathcal{T}_{\lambda,\mu,\rho} is connected and reversible with respect to the natural analog of Fisher-Yates for 3-way tables:

πλ,μ,ρ(T):=1n!i,j,kλi!μj!ρk!Tijk!.\pi_{\lambda,\mu,\rho}(T):=\frac{1}{n!}\prod_{i,j,k}\frac{\lambda_{i}!\mu_{j}!\rho_{k}!}{T_{ijk}!}.
Proof.

To see that the Markov chain is connected, we can define a partial order on the space 𝒯λ,μ,ρ\mathcal{T}_{\lambda,\mu,\rho}, analogous to the majorization order on 𝒯λ,μ\mathcal{T}_{\lambda,\mu} used in Proposition 2.10: Write TTT\prec T^{\prime} if

\sum_{i=1}^{R}\sum_{j=1}^{S}\sum_{k=1}^{U}T_{ijk}\leq\sum_{i=1}^{R}\sum_{j=1}^{S}\sum_{k=1}^{U}T^{\prime}_{ijk},\quad\text{for all}\,\,\,1\leq R\leq I,\ 1\leq S\leq J,\ 1\leq U\leq K.

One move of the Markov chain corresponds to one edge in the lattice defined by this partial order (i.e. if P(T,T)>0P(T,T^{\prime})>0 then TTT\prec T^{\prime} or TTT^{\prime}\prec T). Thus, every table can transition to the largest element and so the space is connected under PP.

To see that the chain is reversible with the correct stationary distribution, given T𝒯λ,μ,ρT\in\mathcal{T}_{\lambda,\mu,\rho} suppose T=F(i1,j1,k1),(i2,j2,k2),r(T)T^{\prime}=F_{(i_{1},j_{1},k_{1}),(i_{2},j_{2},k_{2}),r}(T) with r=1r=1 so that P(T,T)>0P(T,T^{\prime})>0. Then,

πλ,μ,ρ(T)=πλ,μ,ρ(T)Ti1j1k1Ti2j2k2(Ti2j1k1+1)(Ti1j2k2+1).\displaystyle\pi_{\lambda,\mu,\rho}(T^{\prime})=\pi_{\lambda,\mu,\rho}(T)\cdot\frac{T_{i_{1}j_{1}k_{1}}T_{i_{2}j_{2}k_{2}}}{(T_{i_{2}j_{1}k_{1}}+1)(T_{i1j_{2}k_{2}}+1)}.

Then,

πλ,μ,ρ(T)P(T,T)\displaystyle\pi_{\lambda,\mu,\rho}(T^{\prime})P(T^{\prime},T) =πλ,μ,ρ(T)Ti1j1k1Ti2j2k2(Ti2j1k1+1)(Ti1j2k2+1)2(Ti2j1k1+1)(Ti1j2k2+1)3n2\displaystyle=\pi_{\lambda,\mu,\rho}(T)\cdot\frac{T_{i_{1}j_{1}k_{1}}T_{i_{2}j_{2}k_{2}}}{(T_{i_{2}j_{1}k_{1}}+1)(T_{i1j_{2}k_{2}}+1)}\cdot\frac{2(T_{i_{2}j_{1}k_{1}}+1)(T_{i1j_{2}k_{2}}+1)}{3n^{2}}
=πλ,μ,ρ(T)2Ti1j1k1Ti2j2k23n2=πλ,μ,ρ(T)P(T,T).\displaystyle=\pi_{\lambda,\mu,\rho}(T)\cdot\frac{2T_{i_{1}j_{1}k_{1}}T_{i_{2}j_{2}k_{2}}}{3n^{2}}=\pi_{\lambda,\mu,\rho}(T)P(T,T^{\prime}).

The calculation is analogous for r=2,3r=2,3. ∎

Remark 16.

It would be interesting to have a double-coset representation for 𝒯λ,μ,ρ\mathcal{T}_{\lambda,\mu,\rho}, but we have not discovered one.

5.3 Comparison of Markov Chains

This new chain on contingency tables can be compared to other Markov chains on contingency tables using simple spectral comparison techniques, as developed in [13].

The comparison technique relies on the variational characterization of the spectral gap of a Markov chain: If PP is a reversible Markov chain on Ω\Omega with stationary distribution π\pi and γ=1λ2\gamma=1-\lambda_{2} the spectral gap, then

\gamma=\min_{f:\Omega\to\mathbb{R},\,\text{Var}_{\pi}(f)\neq 0}\frac{\mathcal{E}(f)}{\text{Var}_{\pi}(f)},

where

(f):=12x,yΩ(f(x)f(y))2π(x)P(x,y).\mathcal{E}(f):=\frac{1}{2}\sum_{x,y\in\Omega}\left(f(x)-f(y)\right)^{2}\pi(x)P(x,y).

The relaxation time τ\tau is defined as the inverse of the absolute spectral gap τ=1/γ\tau=1/\gamma, and can be used as one measure of mixing.

Lemma 5.2 (Lemma 13.22 in [37]).

Let PP and P~\widetilde{P} be reversible transition matrices on Ω\Omega with stationary distributions π\pi and π~\widetilde{\pi}, respectively. If ~(f)α(f)\widetilde{\mathcal{E}}(f)\leq\alpha\mathcal{E}(f) for all ff, then

γ~(maxxΩπ(x)π~(x))αγ.\widetilde{\gamma}\leq\left(\max_{x\in\Omega}\frac{\pi(x)}{\widetilde{\pi}(x)}\right)\alpha\gamma. (24)

The term maxxΩπ(x)/π~(x)\max_{x\in\Omega}\pi(x)/\widetilde{\pi}(x) makes it difficult to compare Markov chains with stationary distributions which are very different. The uniform distribution and the Fisher-Yates distribution on 𝒯λ,μ\mathcal{T}_{\lambda,\mu} are exponentially different and the ratio of the two distributions cannot be bounded by a constant.

Example 5.3.

With λ=(nk,k),μ=(n,)\lambda=(n-k,k),\mu=(n-\ell,\ell), with kn/2k\leq\ell\leq n/2, there are k+1k+1 tables. The table with the smallest Fisher-Yates probability is

T=(nkk0),T=\begin{pmatrix}n-\ell-k&\ell\cr k&0\end{pmatrix},

with πFY(T)=(nk)!(n)!/((nk)!n!)\pi_{FY}(T)=(n-k)!(n-\ell)!/((n-k-\ell)!n!). Taking k==n/2k=\ell=n/2 for example gives

πFY(T)=(n/2)!(n/2)!n!(n/2)n/2(n/2)n/2nn=12n,\pi_{FY}(T)=\frac{(n/2)!(n/2)!}{n!}\sim\frac{(n/2)^{n/2}(n/2)^{n/2}}{n^{n}}=\frac{1}{2^{n}},

by Stirling’s approximation. This is in sharp contrast with the uniform probability πU(T)=1/(k+1)2/n\pi_{U}(T)=1/(k+1)\sim 2/n.

Instead, we can use the Metropolis algorithm on either the symmetric swap Markov chain or the random transpositions Markov chain to obtain two chains with the same stationary distribution, which can then be compared. This is done in both cases in the following sections.

Let PFYP_{FY} be the random transpositions Markov chain on 𝒯λ,μ\mathcal{T}_{\lambda,\mu} from Definition 1.4. Let PUP_{U} be the symmetric swap Markov chain which has stationary uniform distribution. That is,

P_{U}(\mathbf{x},\mathbf{y})=\begin{cases}\frac{2}{(IJ)^{2}}&\text{if}\quad\mathbf{y}=F_{(i_{1},j_{1}),(i_{2},j_{2})}(\mathbf{x})\\ 1-\sum_{i_{1}<i_{2}}\sum_{j_{1}<j_{2}}\frac{2}{(IJ)^{2}}&\text{if}\quad\mathbf{y}=\mathbf{x}\\ 0&\text{otherwise}\end{cases}.

Let PUMP_{U}^{M} be the chain from Metropolizing PFYP_{FY} to have uniform stationary distribution and PFYMP_{FY}^{M} be the result of Metropolizing PUP_{U} to have stationary distribution Fisher-Yates.

Theorem 5.4.

Let τFY,τFYM,τU,τUM\tau_{FY},\tau_{FY}^{M},\tau_{U},\tau_{U}^{M} be the associated relaxation times for the Markov chains PFY,PFYM,PU,PUMP_{FY},P_{FY}^{M},P_{U},P_{U}^{M}. Define

\displaystyle m_{\lambda,\mu}=\min\{x_{ij}>0:\mathbf{x}\in\mathcal{T}_{\lambda,\mu}\}
\displaystyle M_{\lambda,\mu}=\max\{x_{ij}>0:\mathbf{x}\in\mathcal{T}_{\lambda,\mu}\}.

If |λ|=I,|μ|=J|\lambda|=I,|\mu|=J, then

  1. (a)
    mλ,μ2(IJ)2n2τUMτUMλ,μ2(IJ)2n2τUM,\frac{m_{\lambda,\mu}^{2}(IJ)^{2}}{n^{2}}\tau_{U}^{M}\leq\tau_{U}\leq\frac{M_{\lambda,\mu}^{2}(IJ)^{2}}{n^{2}}\tau_{U}^{M},
  2. (b)
    n2(IJ)2Mλ,μ4τFYMτFYn2(IJ)2mλ,μ4τFYM.\frac{n^{2}}{(IJ)^{2}M_{\lambda,\mu}^{4}}\tau_{FY}^{M}\leq\tau_{FY}\leq\frac{n^{2}}{(IJ)^{2}m_{\lambda,\mu}^{4}}\tau_{FY}^{M}.

For any λ,μ\lambda,\mu, note that mλ,μ1m_{\lambda,\mu}\geq 1 and Mλ,μmax(λi,μj)M_{\lambda,\mu}\leq\max(\lambda_{i},\mu_{j}).

Example 5.5.

If λ=μ=(c,c,,c)\lambda=\mu=(c,c,\dots,c), I=J=n/cI=J=n/c, then mλ,μ=1m_{\lambda,\mu}=1 and Mλ,μ=cM_{\lambda,\mu}=c. Then Theorem 5.4 shows

n2c4τUMτUn2c2τUM.\frac{n^{2}}{c^{4}}\tau_{U}^{M}\leq\tau_{U}\leq\frac{n^{2}}{c^{2}}\tau_{U}^{M}.

The intuition is that if nn is growing and cc is fixed then the tables are relatively sparse, and so the Metropolis chain is much more likely than the uniform swap chain to propose moves which are accepted. Accepting more moves means exploring the state space more quickly, and thus achieving stationarity. Similarly, for the chains with Fisher-Yates stationary distribution in this setting,

1n2τFYMτFYc4n2τFYM.\frac{1}{n^{2}}\tau_{FY}^{M}\leq\tau_{FY}\leq\frac{c^{4}}{n^{2}}\tau_{FY}^{M}.

Again, if cc is fixed and nn is larger the Markov chain utilizing the random transpositions has significantly smaller relaxation time than the Markov chain created by using the uniform swap moves with the Metropolis algorithm. Note that if c=1c=1, then the Fisher-Yates distribution is exactly the uniform distribution.

The remainder of this section reviews the Metropolis-Hastings Algorithm, finds the transition probabilities PUMP_{U}^{M} and PFYMP_{FY}^{M}, and then proves Theorem 5.4.

Metropolis-Hastings Algorithm

The random transpositions chain could be used as the proposal Markov chain in the Metropolis-Hastings algorithm to sample from the uniform distribution. Suppose PP is a Markov chain on Ω\Omega and π\pi is a target distribution. A new Markov chain on Ω\Omega which is reversible with respect to π\pi is defined as follows: From xx,

  1. 1.

    Generate a random candidate yy according to one step of the PP chain, i.e. yP(x,)y\sim P(x,\cdot).

  2. 2.

    Compute the acceptance probability

    A(x,y)=min(1,π(y)P(y,x)π(x)P(x,y)).A(x,y)=\min\left(1,\frac{\pi(y)P(y,x)}{\pi(x)P(x,y)}\right).
  3. 3.

    With probability A(x,y)A(x,y), move to yy. Otherwise, stay at the current state xx.

The transitions for the new Markov chain are defined

K(x,y)=P(x,y)A(x,y).K(x,y)=P(x,y)\cdot A(x,y).
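
As an illustration (a sketch only; the function name and starting table are arbitrary), the three steps above can be implemented with the random transpositions proposal and the uniform target; this corresponds to the chain PUMP_{U}^{M} discussed in the next subsection.

```python
# Sketch: one Metropolis step with the random transpositions proposal and the
# uniform target distribution on tables with fixed margins.  The starting table
# is arbitrary.
import random

def metropolis_uniform_step(T):
    I, J = len(T), len(T[0])
    cells = [(i, j) for i in range(I) for j in range(J)]
    w = [T[i][j] for i, j in cells]
    # Step 1: propose a move by picking two symbols and swapping their columns.
    (i1, j1), (i2, j2) = random.choices(cells, w)[0], random.choices(cells, w)[0]
    if i1 == i2 or j1 == j2:
        return T                               # proposal leaves the table unchanged
    # Step 2: acceptance probability min(1, P(y, x) / P(x, y)) for the uniform target.
    accept = (T[i1][j2] + 1) * (T[i2][j1] + 1) / (T[i1][j1] * T[i2][j2])
    # Step 3: accept or stay at the current table.
    if random.random() < min(1.0, accept):
        T = [list(row) for row in T]
        T[i1][j1] -= 1; T[i2][j2] -= 1; T[i1][j2] += 1; T[i2][j1] += 1
    return T

T = [[4, 2, 1], [1, 3, 2], [0, 1, 4]]
for _ in range(10):
    T = metropolis_uniform_step(T)
print(T)
```
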

Uniform Distribution Comparison

For two states 𝐱,𝐲𝒯λ,μ\mathbf{x},\mathbf{y}\in\mathcal{T}_{\lambda,\mu} such that P(𝐱,𝐲)>0P(\mathbf{x},\mathbf{y})>0, the acceptance probability can be computed explicitly: since PP is reversible with respect to the Fisher-Yates distribution πλ,μ\pi_{\lambda,\mu}, the ratio is

P(𝐲,𝐱)P(𝐱,𝐲)=πλ,μ(𝐱)πλ,μ(𝐲)=(xi1,j2+1)(xi2,j1+1)xi1,j1xi2,j2,\frac{P(\mathbf{y},\mathbf{x})}{P(\mathbf{x},\mathbf{y})}=\frac{\pi_{\lambda,\mu}(\mathbf{x})}{\pi_{\lambda,\mu}(\mathbf{y})}=\frac{(x_{i_{1},j_{2}}+1)(x_{i_{2},j_{1}}+1)}{x_{i_{1},j_{1}}\cdot x_{i_{2},j_{2}}},

supposing that 𝐲=F(i1,j1),(i2,j2)(𝐱)\mathbf{y}=F_{(i_{1},j_{1}),(i_{2},j_{2})}(\mathbf{x}).

In conclusion, the Metropolis chain has transitions

\displaystyle P_{U}^{M}(\mathbf{x},\mathbf{y})=\frac{2x_{i_{1},j_{1}}x_{i_{2},j_{2}}}{n^{2}}\cdot\min\left(1,\frac{(x_{i_{1},j_{2}}+1)(x_{i_{2},j_{1}}+1)}{x_{i_{1},j_{1}}\cdot x_{i_{2},j_{2}}}\right)=\min\left(\frac{2x_{i_{1},j_{1}}x_{i_{2},j_{2}}}{n^{2}},\frac{2(x_{i_{1},j_{2}}+1)(x_{i_{2},j_{1}}+1)}{n^{2}}\right)
=min(PFY(𝐱,𝐲),PFY(𝐲,𝐱)).\displaystyle=\min\left(P_{FY}(\mathbf{x},\mathbf{y}),P_{FY}(\mathbf{y},\mathbf{x})\right).

This is clearly a symmetric Markov chain and has uniform stationary distribution. Note that, in general, any chain P(x,y)P(x,y), when transformed via Metropolis to have the uniform stationary distribution, is of the form min(P(x,y),P(y,x))\min(P(x,y),P(y,x)) for xyx\neq y.

To compare the Dirichlet forms, since the stationary distributions are the same it is enough to compare the transition probabilities. This can be done as follows:

PU(𝐱,𝐲)=n2mλ,μ2(IJ)22mλ,μ2n2n2mλ,μ2(IJ)2PUM(𝐱,𝐲),P_{U}(\mathbf{x},\mathbf{y})=\frac{n^{2}}{m_{\lambda,\mu}^{2}(IJ)^{2}}\cdot\frac{2m_{\lambda,\mu}^{2}}{n^{2}}\leq\frac{n^{2}}{m_{\lambda,\mu}^{2}(IJ)^{2}}P_{U}^{M}(\mathbf{x},\mathbf{y}),

and similarly

PFY(𝐱,𝐲)=n2Mλ,μ2(IJ)22Mλ,μ2n2n2Mλ,μ2(IJ)2PUM(𝐱,𝐲).P_{FY}(\mathbf{x},\mathbf{y})=\frac{n^{2}}{M_{\lambda,\mu}^{2}(IJ)^{2}}\cdot\frac{2M_{\lambda,\mu}^{2}}{n^{2}}\geq\frac{n^{2}}{M_{\lambda,\mu}^{2}(IJ)^{2}}P_{U}^{M}(\mathbf{x},\mathbf{y}).

This gives the result, since

α1PUM(𝐱,𝐲)PU(𝐱,𝐲)α2PUM(𝐱,𝐲)1α2τUMτU1α1τUM.\displaystyle\alpha_{1}P_{U}^{M}(\mathbf{x},\mathbf{y})\leq P_{U}(\mathbf{x},\mathbf{y})\leq\alpha_{2}P_{U}^{M}(\mathbf{x},\mathbf{y})\implies\frac{1}{\alpha_{2}}\tau_{U}^{M}\leq\tau_{U}\leq\frac{1}{\alpha_{1}}\tau_{U}^{M}.

Fisher-Yates Distribution Comparison

In the other direction, we could take the symmetric swap Markov chain and Metropolize it to have stationary distribution Fisher-Yates. The acceptance probability in this direction is

A(𝐱,𝐲)\displaystyle A(\mathbf{x},\mathbf{y}) =min(1,πFY(𝐲)PU(𝐲,𝐱)πFY(𝐱)PU(𝐱,𝐲))\displaystyle=\min\left(1,\frac{\pi_{FY}(\mathbf{y})P_{U}(\mathbf{y},\mathbf{x})}{\pi_{FY}(\mathbf{x})P_{U}(\mathbf{x},\mathbf{y})}\right)
=min(1,πFY(𝐲)πFY(𝐱))=min(1,xi1,j1xi2,j2(xi1,j2+1)(xi2,j1+1)).\displaystyle=\min\left(1,\frac{\pi_{FY}(\mathbf{y})}{\pi_{FY}(\mathbf{x})}\right)=\min\left(1,\frac{x_{i_{1},j_{1}}\cdot x_{i_{2},j_{2}}}{(x_{i_{1},j_{2}}+1)(x_{i_{2},j_{1}}+1)}\right).

The transition probabilities, for 𝐲=F(i1,j1),(i2,j2)(𝐱)\mathbf{y}=F_{(i_{1},j_{1}),(i_{2},j_{2})}(\mathbf{x}) are then

PFYM(𝐱,𝐲)=2(IJ)2min(1,xi1,j1xi2,j2(xi1,j2+1)(xi2,j1+1)).\displaystyle P_{FY}^{M}(\mathbf{x},\mathbf{y})=\frac{2}{(IJ)^{2}}\min\left(1,\frac{x_{i_{1},j_{1}}\cdot x_{i_{2},j_{2}}}{(x_{i_{1},j_{2}}+1)(x_{i_{2},j_{1}}+1)}\right).

To compare this with the random transpositions chain P_{FY},

P_{FY}(\mathbf{x},\mathbf{y})=\frac{2x_{i_{1},j_{1}}x_{i_{2},j_{2}}}{n^{2}}\leq\frac{2M_{\lambda,\mu}^{2}}{n^{2}}
=\frac{(IJ)^{2}M_{\lambda,\mu}^{4}}{n^{2}m_{\lambda,\mu}^{2}}\cdot\frac{2m_{\lambda,\mu}^{2}}{M_{\lambda,\mu}^{2}(IJ)^{2}}\leq\frac{(IJ)^{2}M_{\lambda,\mu}^{4}}{n^{2}m_{\lambda,\mu}^{2}}\cdot P_{FY}^{M}(\mathbf{x},\mathbf{y}),

and similarly

P_{FY}(\mathbf{x},\mathbf{y})=\frac{2x_{i_{1},j_{1}}x_{i_{2},j_{2}}}{n^{2}}\geq\frac{2m_{\lambda,\mu}^{2}}{n^{2}}
=\frac{(IJ)^{2}m_{\lambda,\mu}^{4}}{n^{2}M_{\lambda,\mu}^{2}}\cdot\frac{2M_{\lambda,\mu}^{2}}{m_{\lambda,\mu}^{2}(IJ)^{2}}\geq\frac{(IJ)^{2}m_{\lambda,\mu}^{4}}{n^{2}M_{\lambda,\mu}^{2}}\cdot P_{FY}^{M}(\mathbf{x},\mathbf{y}).

This proves part (b) of Theorem 5.4.

5.4 Parabolic Subgroups of GL_{n}

Contingency tables also arise from double cosets of the parabolic subgroups of GL_{n}(q), as described in [34].

Definition 5.6.

Let \alpha=(\alpha_{1},\dots,\alpha_{k}) be a partition of n. The parabolic subgroup P_{\alpha}\subset GL_{n}(q) consists of all invertible block upper-triangular matrices with diagonal block sizes \alpha_{1},\dots,\alpha_{k}.
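As an illustration of Definition 5.6, the following Python sketch tests whether a matrix lies in P_{\alpha}, taking q to be a prime so that entries can be reduced mod q; the function name in_parabolic is illustrative, and the floating-point determinant test limits it to small matrices.

import numpy as np
from itertools import accumulate

def in_parabolic(M, alpha, q):
    """Check that M is invertible over F_q (q prime) and block upper-triangular
    with diagonal block sizes alpha, i.e. that M represents an element of P_alpha."""
    M = np.asarray(M) % q
    n = M.shape[0]
    assert M.shape == (n, n) and sum(alpha) == n
    cuts = list(accumulate(alpha))
    # block[i] = index of the diagonal block containing row/column i
    block = [next(b for b, c in enumerate(cuts) if i < c) for i in range(n)]
    # Entries strictly below the block diagonal must vanish.
    for i in range(n):
        for j in range(n):
            if block[i] > block[j] and M[i, j] != 0:
                return False
    # Invertibility over F_q for prime q: determinant nonzero mod q.
    return round(np.linalg.det(M)) % q != 0

For example, with alpha = (2, 1) the condition forces the bottom-left 1 x 2 block to be zero, which is exactly the block upper-triangular shape in the definition.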

Section 4 of [34] shows that if \alpha,\beta are two partitions of n, then the double cosets P_{\alpha}\backslash GL_{n}(q)/P_{\beta} are indexed by contingency tables with row sums \alpha and column sums \beta. Proposition 4.37 of [34] gives the size of the double coset corresponding to a table T; in terms of \theta=1/q, and with I, J denoting the numbers of parts of \alpha, \beta, it is

\theta^{-n^{2}+\sum_{1\leq i<i^{\prime}\leq I,\,1\leq j<j^{\prime}\leq J}T_{ij}T_{i^{\prime}j^{\prime}}}(1-\theta)^{n}\cdot\frac{\prod_{i=1}^{I}[\alpha_{i}]_{\theta}!\prod_{j=1}^{J}[\beta_{j}]_{\theta}!}{\prod_{i,j}[T_{ij}]_{\theta}!}. (25)

Dividing (25) by (1-\theta)^{n} and setting \theta=1 recovers the usual Fisher-Yates distribution for the partitions \alpha,\beta. It remains an open problem to investigate these probability distributions, and the associated Markov chains, on the double cosets of GL_{n}(q) coming from parabolic subgroups.
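As a small computational aid, the following Python sketch evaluates the expression in (25) exactly as displayed, using the \theta-factorial [k]_{\theta}!=\prod_{i=1}^{k}(1+\theta+\dots+\theta^{i-1}); the function names are illustrative and the code is only a direct transcription of the displayed formula.

import numpy as np

def theta_factorial(k, theta):
    """[k]_theta! = [1]_theta [2]_theta ... [k]_theta, with [i]_theta = 1 + theta + ... + theta^(i-1)."""
    out = 1.0
    for i in range(1, k + 1):
        out *= sum(theta**m for m in range(i))
    return out

def weight_25(T, alpha, beta, theta):
    """Evaluate the quantity in (25) for a table T with row sums alpha and column sums beta."""
    T = np.asarray(T)
    I, J = T.shape
    n = int(T.sum())
    # Exponent of theta: -n^2 plus the sum of T_{ij} T_{i'j'} over pairs i < i', j < j'.
    expo = -n**2 + sum(
        T[i, j] * T[ip, jp]
        for i in range(I) for ip in range(i + 1, I)
        for j in range(J) for jp in range(j + 1, J)
    )
    num = np.prod([theta_factorial(a, theta) for a in alpha]) * \
          np.prod([theta_factorial(b, theta) for b in beta])
    den = np.prod([theta_factorial(T[i, j], theta) for i in range(I) for j in range(J)])
    return theta**float(expo) * (1 - theta)**n * num / den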

References

  • [1] A. Agresti. A survey of exact inference for contingency tables. Statist. Sci., 7(1):131–177, 1992. With comments and a rejoinder by the author.
  • [2] G. Amanatidis and P. Kleer. Rapid mixing of the switch Markov chain for strongly stable degree sequences. Random Structures & Algorithms, 57(3):637–657, 2020.
  • [3] E. D. Belsley. Rates of convergence of random walk on distance regular graphs. Probability theory and related fields, 112(4):493–533, 1998.
  • [4] S. C. Billey, M. Konvalinka, T. K. Petersen, W. Slofstra, and B. E. Tenner. Parabolic double cosets in Coxeter groups, 2017.
  • [5] O. Bormashenko. A coupling argument for the random transposition walk. arXiv preprint arXiv:1109.3915, 2011.
  • [6] T. Ceccherini-Silberstein, F. Scarabotti, and F. Tolli. Harmonic analysis on finite groups, volume 108. Cambridge University Press, 2008.
  • [7] F. R. K. Chung, R. L. Graham, and S.-T. Yau. On sampling with Markov chains. Random Structures & Algorithms, 9(1-2):55–77, 1996.
  • [8] P. Diaconis. Group representations in probability and statistics, volume 11 of Institute of Mathematical Statistics Lecture Notes—Monograph Series. Institute of Mathematical Statistics, Hayward, CA, 1988.
  • [9] P. Diaconis. A generalization of spectral analysis with application to ranked data. The Annals of Statistics, pages 949–979, 1989.
  • [10] P. Diaconis and B. Efron. Testing for independence in a two-way table: new interpretations of the chi-square statistic. Ann. Statist., 13(3):845–913, 1985. With discussions and with a reply by the authors.
  • [11] P. Diaconis and A. Gangolli. Rectangular arrays with fixed margins. In Discrete probability and algorithms, pages 15–41. Springer, 1995.
  • [12] P. Diaconis, A. Ram, and M. Simper. Double coset Markov chains and random transvections, 2022.
  • [13] P. Diaconis and L. Saloff-Coste. Comparison theorems for reversible Markov chains. The Annals of Applied Probability, 3(3):696–730, 1993.
  • [14] P. Diaconis and M. Shahshahani. Generating a random permutation with random transpositions. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 57(2):159–179, 1981.
  • [15] P. Diaconis and M. Shahshahani. Time to reach stationarity in the Bernoulli-Laplace diffusion model. SIAM J. Math. Anal., 18(1):208–218, 1987.
  • [16] P. Diaconis and M. Simper. Statistical enumeration of groups by double cosets. Journal of Algebra, 607:214–246, 2022. Special Issue dedicated to J. Saxl.
  • [17] P. Diaconis and B. Sturmfels. Algebraic algorithms for sampling from conditional distributions. Annals of statistics, 26(1):363–397, 1998.
  • [18] P. Diaconis and P. M. Wood. Random doubly stochastic tridiagonal matrices. Random Structures & Algorithms, 42(4):403–437, 2013.
  • [19] C. F. Dunkl and Y. Xu. Orthogonal Polynomials of Several Variables. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2 edition, 2014.
  • [20] U. Dutta, B. K. Fosdick, and A. Clauset. Sampling random graphs with specified degree sequences. arXiv e-prints, pages arXiv–2105, 2021.
  • [21] A. Eskenazis and E. Nestoridi. Cutoff for the Bernoulli–Laplace urn model with o(n) swaps. In Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, volume 56, pages 2621–2639. Institut Henri Poincaré, 2020.
  • [22] M. Fayers. A note on Kostka numbers. arXiv preprint arXiv:1903.12499, 2019.
  • [23] B. Griffiths. Orthogonal polynomials on the multinomial distribution. Australian Journal of Statistics, 13(1):27–35, 1971.
  • [24] R. C. Griffiths and D. Spano. Multivariate Jacobi and Laguerre polynomials, infinite-dimensional extensions, and their probabilistic connections with multivariate Hahn and Meixner polynomials. Bernoulli, 17(3):1095–1125, 2011.
  • [25] D. Hernek. Random generation of 2×n contingency tables. Random Structures & Algorithms, 13(1):71–79, 1998.
  • [26] P. Iliev and Y. Xu. Discrete orthogonal polynomials and difference equations of several variables. Advances in Mathematics, 212(1):1–36, 2007.
  • [27] I. M. Isaacs. Character theory of finite groups, volume 69. Courier Corporation, 1994.
  • [28] M. E. H. Ismail. Classical and Quantum Orthogonal Polynomials in One Variable. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2005.
  • [29] G. James and A. Kerber. The Representation Theory of the Symmetric Group. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 1984.
  • [30] G. James, M. W. Liebeck, and M. Liebeck. Representations and characters of groups. Cambridge University Press, 2001.
  • [31] G. D. James. The representation theory of the symmetric groups. In Proc. Symposia in Pure Math, volume 47, pages 111–126, 1987.
  • [32] H. Joe. An ordering of dependence for contingency tables. Linear algebra and its applications, 70:89–103, 1985.
  • [33] S.-h. Kang and J. Klotz. Limiting conditional distribution for tests of independence in the two way table. Comm. Statist. Theory Methods, 27(8):2075–2082, 1998.
  • [34] S. N. Karp and H. Thomas. q-Whittaker functions, finite fields, and Jordan forms. arXiv preprint arXiv:2207.12590, 2022.
  • [35] K. Khare and H. Zhou. Rates of convergence of some multivariate Markov chains with polynomial eigenfunctions. The Annals of Applied Probability, 19(2):737–777, 2009.
  • [36] H. O. Lancaster. The chi-squared distribution. John Wiley & Sons, Inc., New York-London-Sydney, 1969.
  • [37] D. A. Levin and Y. Peres. Markov chains and mixing times, volume 107. American Mathematical Soc., 2017.
  • [38] I. G. Macdonald. Symmetric functions and Hall polynomials. Oxford university press, 1998.
  • [39] B. MacMahon. Mental health in the metropolis: The midtown Manhattan study, 1963.
  • [40] A. W. Marshall, I. Olkin, and B. C. Arnold. Inequalities: theory of majorization and its applications, volume 143. Springer, 1979.
  • [41] E. Nestoridi and O. Nguyen. On the mixing time of the Diaconis–Gangolli random walk on contingency tables over \mathbb{Z}/q\mathbb{Z}. In Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, volume 56, pages 983–1001. Institut Henri Poincaré, 2020.
  • [42] J. Paguyo. Fixed points, descents, and inversions in parabolic double cosets of the symmetric group. arXiv preprint arXiv:2112.07728, 2021.
  • [43] C. Pang. Lumpings of algebraic Markov chains arise from subquotients. Journal of Theoretical Probability, 32(4):1804–1844, 2019.
  • [44] R. E. Ingram. Some characters of the symmetric group. Proceedings of the American Mathematical Society, pages 358–369, 1950.
  • [45] J. Salzman. Spectral analysis with Markov chains. PhD thesis, Stanford University, 2007. Copyright - Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works; Last updated - 2021-09-28.
  • [46] F. Scarabotti. Time to reach stationarity in the Bernoulli–Laplace diffusion model with many urns. Advances in Applied Mathematics, 18(3):351–371, 1997.
  • [47] J.-P. Serre. Linear representations of finite groups, volume 42. Springer, 1977.
  • [48] R. P. Stanley. Enumerative Combinatorics, volume 1 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2 edition, 2011.