An Adjacent-Swap Markov Chain on Coalescent Trees

Mackenzie Simper, Julia A. Palacios

Abstract

The standard coalescent is widely used in evolutionary biology and population genetics to model the ancestral history of a sample of molecular sequences as a rooted and ranked binary tree. In this paper, we present a representation of the space of ranked trees as a space of constrained ordered matched pairs. We use this representation to define ergodic Markov chains on labeled and unlabeled ranked tree shapes analogously to transposition chains on the space of permutations. We show that an adjacent-swap chain on labeled and unlabeled ranked tree shapes has mixing time at least of order $n^{3}$ , and at most of order $n^{4}$ . Bayesian inference methods rely on Markov chain Monte Carlo methods on the space of trees. Thus, it is important to define good Markov chains which are easy to simulate and for which rates of convergence can be studied.

1 Introduction

The standard coalescent (Kingman:1982uj, ) is often used in evolutionary biology and population genetics to model the set of ancestral relationships among $n$ individuals as a rooted and timed binary tree called a genealogy (wakeley_coalescent_2008, ). In this manuscript, we are concerned with discrete tree topologies only and assume a unit interval length between consecutive coalescing (branching) events.

A labeled ranked tree shape $T^{L}\in\mathcal{T}^{L}_{n}$ with $n$ leaves is a rooted labeled and ranked binary tree. The leaves are labeled by the set $\{\ell_{1},\ell_{2},\ldots,\ell_{n}\}$ and internal nodes are labeled $1,\ldots,n-1$ in increasing order, with label $n-1$ at the root (Figure 1(A)). The cardinality of $\mathcal{T}^{L}_{n}$ is given by

|\mathcal{T}^{L}_{n}|=\frac{n!(n-1)!}{2^{n-1}}.

(1)

In many applications, the quantity of interest is the overall shape of the tree (Kirkpatrick1993, ; Frost2013, ; Maliet2018, ); the specific labels of the samples are un-important. In fact, a key feature of simple coalescent models, such as the Kingman coalescent, is the assumption that the leaves are exchangeable. The Tajima coalescent, a lumping of the Kingman coalescent that ignores leaf labels, was recently proposed for inferring past population size trajectories from binary molecular data (Palacios2019, ). This motivates the study of unlabeled ranked tree shapes with $n$ leaves $\mathcal{T}_{n}$ (Figure 1(B)). The appealing feature of modeling unlabeled ranked tree shapes is the reduction in the cardinality of the state space. The cardinality of $\mathcal{T}_{n}$ is given by the ( $n-1$ )-th Euler zig-zag number $E_{n-1}\ll|T^{L}_{n}|$ (kuznetsov1994increasing, ).

Refer to caption — Figure 1: Examples of coalescent tree topologies. A ranked and unlabeled tree shape $T$ has internal nodes labeled by their ranking from leaves to root and unlabeled leaves. (A) has ordered matched pairs: $(0,0)^{1},(0,0)^{2},(0,1)^{3},(0,3)^{4},(2,4)^{5},(0,5)^{6}$ . A ranked and labeled tree $T^{L}$ has unique leaf and internal node labels. (B) has ordered matched pairs: $(\ell_{3},\ell_{4})^{1},(\ell_{1},\ell_{2})^{2},(\ell_{5},1)^{3},(\ell_{6},3)^{4},(2,4)^{5},(\ell_{7},5)^{6}$ .

Other types of trees are used in the context of evolutionary models, such as cladeograms (or phylogenies). Cladeograms are rooted or unrooted binary trees with labeled leaves and no ranking of internal nodes. In diaconis1998matchings , the space of cladeograms with $l$ leaves is shown to be in bijection with the space of perfect matchings of $2n$ labels, with $n=l-1$ . The bijection with perfect matchings is used to define a Markov chain on the space of cladeograms. Using the perfect matching perspective, the chain is analogous to a random transpositions chain on the space $S_{2n}$ of permutations of $2n$ elements. This representation allows the use of known results for the random transpositions chain to be applied to the chain on trees to determine sharp bounds on rates of convergence diaconis2002random . Motivated by this result, we propose a new representation of labeled and unlabeled ranked tree shapes as a type of matching. We use this representation to define an adjacent-swap chain on the spaces of labeled and unlabeled ranked tree shapes, and to study their mixing time. It is important to study the mixing time for chains on ranked trees because these results have implications for the design of MCMC algorithms to infer evolutionary parameters from molecular data in a Bayesian setting (drummond2012bayesian, ).

In the rest of this section we define a bijective representation of ranked trees as constrained ordered matchings and define the adjacent-swap chain. We then state our main results concerning mixing times and discuss related work. In Section 2, we state known results on Markov chains that will be used in the following sections. In section 3 we state important properties of the proposed adjacent-swap chain. The lower bound for the mixing time of the adjacent-swap chains is proven in section 4 and it is obtained by finding a specific function that gives a lower bound for the relaxation time of the chains.In section 5, we prove the upper bounds on the mixing time of the adjacent-swap chains via a coupling argument. In the discussion section we mention future directions and other related chains on labeled and un-labeled coalescent trees which could be defined using the matching representations, as well as discuss implications for MCMC methods applied in practice.

1.1 Matching representations

We first establish some definitions. Let $m$ be an integer and $S$ a multiset of size $2m$ . A matching of $S$ objects is a partition of the elements into $m$ disjoint subsets of size two. An ordered matching is a matching with a linear ordering (ranking) of the $m$ pairs.

Definition 1.

The space $COM_{m}(S)$ (Constrained Ordered Matchings) is the set of all ordered matchings of $S$ with pairs labeled $\textbf{p}_{1},\textbf{p}_{2},\dots,\textbf{p}_{m}$ , which satisfy, for $k=1,\dots,m$ and $a,b\in S$ ,

\textbf{p}_{k}=(a,b)^{k}\implies a,b<k\text{ if }a,b\in\mathbb{Z}\text{, or }a,b\in\{\ell_{1},\ldots,\ell_{n}\}

(2)

To represent a labeled ranked tree shape as a constrained ordered matching, we take $S$ to be the set of leaf and internal node labels, and each pair in the matching represents a coalescence in the tree, with $\textbf{p}_{k}$ representing the $k$ -th coalescence event. The condition (2) then simply ensures that no interior node can be merged before it has been introduced. A leaf can merge at any point and thus there is no constraint on a leaf’s position in the matching.

Lemma 1.

With $S=\{\ell_{1},\ell_{2},\dots,\ell_{n},1,2,\dots,n-2\}$ , the space $COM_{n-1}(S)$ is in bijection with $\mathcal{T}_{n}^{L}$ . With multiset $S=\{0,0,\dots,0,1,2,\dots,n-2\}$ , with $0$ repeated $n$ times, the space $COM_{n-1}(S)$ is in bijection with with $\mathcal{T}_{n}$ .

Proof.

First assume that $S=\{\ell_{1},\ell_{2},\dots,\ell_{n},1,2,\dots,n-2\}$ . The first matched pair can be formed by any two leaves in $\binom{n}{2}$ possible ways. The second pair can be formed by any of the $n-2$ remaining leaves or $1$ , the label of the first matched pair, in $\binom{n-1}{2}$ ways; in general, the $k$ -th matched pair can be formed in $\binom{n-k+1}{2}$ ways, because there are $n-k+1$ unique labels that could be matched at that time. Thus, $|COM_{n-1}(S)|=\prod^{n}_{i=2}\binom{i}{2}=|\mathcal{T}^{L}_{n}|$ and for every element $T^{L}\in\mathcal{T}^{L}_{n}$ there is a one-to-one mapping between every coalescence event in $T^{L}$ and every matched pair in one element of $COM_{n-1}(S)$ . These two facts imply there is a one-to-one mapping between $\mathcal{T}^{L}_{n}$ and $COM_{n-1}(S)$ .

With the leaf labels $\ell_{1},\dots,\ell_{n}$ replaced with the repeated element $0$ , the space of matchings is in bijection with $\mathcal{T}_{n}$ due to the fact that the space of $\mathcal{T}_{n}$ is equivalent to the space of $\mathcal{T}_{n}^{L}$ with the leaf labels removed. That is, there is a surjection $\mathcal{T}_{n}^{L}\to\mathcal{T}_{n}$ equivalent to the surjection $COM_{n-1}(S_{1})\to COM_{n-1}(S_{2})$ , with $S_{1}=\{\ell_{1},\ell_{2},\dots,\ell_{n},1,2,\dots,n-2\}$ and $S_{2}=\{0,0,\dots,0,1,2,\dots,n-2\}$ . The bijective matching representation of the ranked trees $\mathcal{T}_{n}$ and $\mathcal{T}_{n}^{L}$ of Figure 1 are shown in the legend of Figure 1. ∎

1.2 Markov chain and main results

Definition 2.

The adjacent-swap Markov chain on the space $COM_{n-1}(S)$ is defined by the following update move:

1.

Pick an index $k\in\{1,\dots,n-2\}$ uniformly at random.
2.
Pick labels $l_{1}$ and $l_{2}$ uniformly from the pairs $\mathbf{p}_{k}$ and $\mathbf{p}_{k+1}$ respectively.
1. (a)
  
  If swapping the positions of $l_{1},l_{2}$ does not violate constraint (2), make the swap.
2. (b)
  
  Otherwise, remain in the current state.

For the sets $S$ which give the bijections with $\mathcal{T}_{n}$ and $\mathcal{T}_{n}^{L}$ , this Markov chain is connected and reversible. In the case of $\mathcal{T}_{n}^{L}$ it is also symmetric and hence, it has a uniform stationary distribution. For $\mathcal{T}_{n}$ , the chain is not symmetric; the stationary distribution is the Tajima distribution on the space of ranked un-labeled trees. These facts, as well as the relationship between the two chains, will be proved in a later section.

We are interested in the convergence rates of this Markov chain and how this rate compares depending on the space $\mathcal{T}_{n}$ or $\mathcal{T}_{n}^{L}$ . The measure that we study for convergence rate is the mixing time. That is, for a Markov chain on a space $\Omega$ with transition probability $P$ and stationary distribution $\pi$ , the mixing time is defined as

t_{mix}=\sup_{x_{0}\in\Omega}\inf\{t>0:\|P^{t}(x_{0},\cdot)-\pi(\cdot)\|_{TV}<1/4\}.

To state the result, let $t_{n}$ and $t_{n}^{L}$ be the mixing times for the chains on $\mathcal{T}_{n}$ and $\mathcal{T}_{n}^{L}$ respectively. Our main result is:

Theorem 1.

There exists constants $C_{1},C_{2},C_{3}$ such that

C_{1}\cdot n^{3}\leq t_{n}\leq C_{2}\cdot n^{4},

and

C_{1}\cdot n^{3}\leq t_{n}^{L}\leq C_{3}\cdot n^{4}.

The lower bound of $n^{3}$ comes from the relaxation time, and involves a standard trick of finding a specific function for which the variance under the stationary distribution can be bounded. The chain on $\mathcal{T}_{n}$ is a certain type of “lumping” of the chain on $\mathcal{T}_{n}^{L}$ , and so the same lower bound applies to both spaces. The upper bounds are obtained using a coupling argument. We conjecture that the mixing time $t_{n}$ is indeed smaller than the time $t_{n}^{L}$ , though it is not evident whether this is by a significant order, or just a constant factor.

1.3 Related work

Mixing of chains on other tree spaces have been previously studied, in particular, the mixing of Markov chains on the space of cladeograms. An $n$ -leaf cladeogram is a rooted or unrooted tree with $n$ labeled leaves and unlabeled internal nodes of degree 3. The main difference between cladeograms and the tree topologies studied in this manuscript, is that cladeograms do not rank internal branching events. The cardinality of the space of cladeograms of $n$ leaves is $(2(n-1))!/2^{n-1}(n-1)!$ , smaller than the cardinality of the space of ranked and labeled trees, but larger than the cardinality of the space of ranked unlabeled trees. Cladeograms are fundamental objects in phylogenetics to model ancestral relationships at the species level, while ranked tree shapes are fundamental objects in population genetics and phylodynamics of infectious diseases. Ranked tree shapes are used to model ancestral relationships of a sample of individuals from a single population of the same species (felsenstein2004inferring, ).

Markov chains on cladeograms. Aldous aldous2000mixing studied a chain on unrooted cladeograms that removes a leaf chosen uniformly at random from the current cladeogram and reattaches it to a random edge. Using coupling methods, Aldous aldous2000mixing showed the relaxation time for this chain is $\tau_{rel}\leq c_{2}n^{3}$ . The same chain was later analyzed by Schweinsberg (schweinsberg2002n2, ) who showed that $\tau_{rel}=O(n^{2})$ using a modified method of distinguished paths. In diaconis1998matchings , the authors show that the space of rooted cladeograms with $n$ leaves is in bijection with the space of perfect matchings in the complete graph on $2(n-1)$ vertices. This bijection was later used to define a Markov chain by randomly choosing two pairs and transposing two randomly selected entries of each pair diaconis2002random . The authors showed that this random transpositions chain mixed after $\frac{1}{2}n\log n$ steps. In SHK2014 , the authors study the nearest neighbor interchange (NNI) chain and a subtree prune and regraft (SPR) chain on the space of rooted cladeograms. The authors showed that the upper bounds on the relaxation time of the SPR and the NNI chains are $O(n^{5/2})$ and $O(n^{4})$ respectively.

Other related work include the study of mixing times of chains whose stationary distribution is the posterior distribution over cladeograms (mossel2006limitations, ; vstefankovivc2011fast, ). In (mossel2006limitations, ), the authors show exponential mixing times under a misspecified model in which data is generated from a mixture of cladeograms.

Markov chains on related spaces.

The adjacent swap Markov chain on the space of ranked trees studied in this manuscript is closely related to the adjacent transpositions chain on the space $S_{n}$ of permutations. As it is often done, one can think of an element $\sigma\in S_{n}$ as an ordering of a deck of cards with unique labels $1,\dots,n$ . In (aldous1983random, , Example 4.10), the chain is defined by picking at random two adjacent cards from the deck; with probability 1/2, the cards are swapped and with probability 1/2 nothing happens. The chain is symmetric with uniform stationary distribution. Aldous (aldous1983random, ) showed that the mixing time of this chain has a lower bound of $c_{1}n^{3}$ and an upper bound of $c_{2}n^{3}\log(n)$ on the mixing time. In wilson2004mixing , the now-famous “Wilson’s method” was introduced and applied to the adjacent transpositions chain as an example to improve the lower bound to order $n^{3}log(n)$ with the explicit constant $(1/\pi^{2})n\log(n)$ , as well as an upper bound of $(2/\pi^{2})n\log(n)$ . Finally, in lacoin2016mixing ,the upper bound was improved to $(1/\pi^{2})n\log(n)$ , so that the upper and lower bounds matched; in addition, the chain was proven to follow the cut-off phenomenon. Durrett generalized the adjacent transpositions chain to the $L$ -reversal chain, introduced as a model for the evolution of DNA in a chromosome durrett2003shuffling .

To the best of our knowledge, bounds on the mixing time of chains on ranked tree shapes have not been studied before (misra2011optimization, ).

2 Preliminaries

In this section, we review some results of Markov chains theory that will be used to bound the mixing time of the adjacent swap chains on $\mathcal{T}^{L}_{n}$ and $\mathcal{T}_{n}$ . We use the coupling method to find the upper bounds. A coupling of Markov chains with transition matrix $P$ is a process $(X_{t},Y_{t})_{t\geq 0}$ such that both $(X_{t})$ and $(Y_{t})$ are marginally Markov chains with transition matrix $P$ . The goal is to define a coupling of two chains from different starting distributions such that the two chains will quickly reach the same state, and once this occurs the two chains stay matched forever. This coupling time gives an upper bound on the mixing time, see Chapter 5 from LevinPeresWilmer2006 for more details.

Theorem 2 (Theorem 5.4 (LevinPeresWilmer2006, )).

Suppose that for each pair of states $x,y\in\mathcal{X}$ there is a coupling $(X_{t},Y_{t})$ with $X_{0}=x$ and $Y_{0}=y$ . For each such coupling, let $\tau_{\text{couple}}$ be the coalescence time of the chains, i.e.

\tau_{\text{couple}}:=\min\{t:X_{s}=Y_{s}\text{ for all }s\geq t\}.

Then

t_{mix}\leq 4\max_{x,y}E_{x,y}(\tau_{\text{couple}})

To find a lower bound on the mixing time, we bound the relaxation time and use the following result:

Theorem 3 (Theorem 12.5 (LevinPeresWilmer2006, )).

Suppose $P$ is the transition matrix of an irreducible, aperiodic and reversible Markov chain. Then

t_{mix}\geq(\tau_{n}-1)\log(2)

(3)

The relaxation time is defined as the inverse of the spectral gap: $\tau_{n}=1/\gamma$ , where $\gamma=1-|\lambda_{n,2}|$ , one minus the absolute value of the second-largest eigenvalue of the transition matrix of the chain. We will bound the relaxation time using the following variational characterization:

Theorem 4 (Lemma 13.7 (LevinPeresWilmer2006, )).

Let P be the transition matrix for a reversible Markov chain. The spectral gap $\gamma$ satisfies

\gamma=\min_{f:\mathcal{X}\rightarrow\mathbb{R},\text{Var}_{\pi}(f)\neq 0}\frac{\mathcal{E}(f)}{\text{Var}_{\pi}(f)}

where

\mathcal{E}(f):=\frac{1}{2}\sum_{x,y\in\mathcal{X}}[f(x)-f(y)]^{2}\pi(x)P(x,y)

3 The adjacent-swap chain on ranked tree shapes

Lemma 2.

The adjacent-swap Markov chain on $\mathcal{T}_{n}^{L}$ is irreducible, aperiodic and reversible with respect to the uniform stationary distribution $\pi^{L}(x):=1/|\mathcal{T}_{n}^{L}|$ .

Proof.

This can be seen by noting the transition matrix is symmetric. Let $P^{L}$ be the transition matrix for the adjacent-swap chain on $\mathcal{T}_{n}^{L}$ , then

P^{L}(x,y)=\frac{1}{n-2}\cdot\frac{1}{4},

where $y$ is obtained by swapping two randomly chosen elements from two adjacent pairs in $x$ , chosen uniformly at random, e.g. consider the following states at pairs $k$ and $k+1$ :

	$\displaystyle x:(a,b)^{k},(c,d)^{k+1}$
	$\displaystyle y:(a,c)^{k},(b,d)^{k+1}.$

In a labeled ranked tree shape $T^{L}_{n}$ , all pairs are formed by distinct elements, so $a,b,c,d$ are all unique and the only way to transition from $x$ to $y$ is to swap the labels $b$ and $c$ , which happens with probability $1/4(n-2)$ if $c<k$ or $c\in\{\ell_{1},\ldots,\ell_{n}\}$ .

To see that the chain is irreducible, we note that any label can be moved to any pair to the right, one pair (or step) at a time with positive probability and that any label can be moved to a pair with index $k$ to the left (one pair at a time) as long as the label corresponds to a leaf or an internal node smaller than $k$ , satisfying constraint (2). To see that the chain is aperiodic, we note that $P^{L}(x,x)>0$ , for all $x\in\mathcal{T}^{L}_{n}$ . Independent of the current state, we can pick pair $k=n-2$ and attempt to swap any element of the $n-2$ pair with label $n-2$ from the $n-1$ pair with probability $1/2(n-2)$ . Since this move violates constraint (2), the move is rejected and the chain remains in the current state. ∎

We note that the transition matrix $P$ of the adjacent-swap chain on unlabeled ranked tree shapes is not symmetric. For example, consider the transition $x$ to $y$ of the type:

(0,a)^{k},(0,b)^{k+1}\to(0,0)^{k},(a,b)^{k+1},

(4)

for labels $a,b\neq 0$ (recall the $0$ s represent leaf labels). The probability of this transition is $P(x,y)=1/(n-2)\cdot 1/4$ , since once pairs $k$ and $k+1$ are chosen, there is a $1/4$ chance of choosing label $a$ from pair $k$ and label $0$ from pair $k+1$ . However, the reverse transition:

(0,0)^{k},(a,b)^{k+1}\to(0,a)^{k},(0,b)^{k+1}

has probability $1/(n-2)\cdot 1/2$ . It turns out that the stationary distribution $\pi$ for the chain is the Tajima coalescent distribution veber (also known as Yule distribution). For $T\in\mathcal{T}_{n}$

\pi(T)=\frac{2^{n-c(T)-1}}{(n-1)!},

where $c(T)$ is the number of cherries of $T$ , i.e. the number of pairs of the type $(0,0)$ in the $COM_{n-1}(S)$ representation. Indeed, for transitions $x$ to $y$ of the type of (4), $y$ has one more cherry than $x$ and $\pi(x)P(x,y)=\pi(y)P(y,x)$ ; for other types of transitions $P(x,y)>0$ that do not affect the number of cherries, $P(x,y)=P(y,x)$ , $\pi(x)=\pi(y)$ and the detailed balance equation is satisfied.

Another way of proving the Tajima coalescent distribution is the stationary distribution on unlabeled ranked tree shapes is to view the chain as a lumping of the chain on $\mathcal{T}^{L}_{n}$ . This perspective also gives us an initial comparison of the relaxation times of the chains. The space $\mathcal{T}_{n}$ of unlabeled ranked-tree shapes can be considered as a set of equivalency classes of the trees $\mathcal{T}_{n}^{L}$ . That is, for trees $x,y\in\mathcal{T}_{n}^{L}$ , define the equivalence relation $x\sim y$ if the trees have the same ranked tree shape. From the $COM_{n-1}(S)$ perspective with $S=\{\ell_{1},\dots,\ell_{n},1,2,\dots,n-2\}$ , two matchings are equivalent if all internal node labels $1,\ldots,n-2$ occur in the same pairs.

This equivalence relation induces a partition of $\mathcal{T}_{n}^{L}$ using equivalence classes. That is, we can write $\mathcal{T}_{n}^{L}$ as the disjoint union of sets $\Omega_{1},\dots,\Omega_{M}$ , where $M=|\mathcal{T}_{n}|$ , and all trees in $\Omega_{i}$ have the same ranked-tree shape.

Lemma 3.

For any $x,x^{\prime}\in\mathcal{T}_{n}^{L}$ , equivalence class $\Omega_{i}$ , and $x\sim x^{\prime}$ , the following relation holds:

P^{L}(x,\Omega_{i}):=\sum_{y\in\Omega_{i}}P^{L}(x,y)=P^{L}(x^{\prime},\Omega_{i}).

Proof.

If $x\in\Omega_{i}$ , then the transition from $x$ to $y\in\Omega_{i}$ is of the type:

(a,b)^{k},(c,d)^{k+1}\to(c,b)^{k},(a,d)^{k+1}.

(5)

where $a,c\in\{\ell_{1},\ell_{2},\dots,\ell_{n}\}$ correspond to leaf labels. Since $x$ and $x^{\prime}$ differ only at leaf labels, the transition probability of swapping two leaf labels is the same and $P^{L}(x,\Omega_{i})=P^{L}(x^{\prime},\Omega_{i})$ . If $x\not\in\Omega_{i}$ , and therefore $x^{\prime}\not\in\Omega_{i}$ , then the transition from $x$ to $y$ is of the type:

(a,b)^{k},(c,d)^{k+1}\to(c,b)^{k},(a,d)^{k+1}.

where $a$ and $c$ are not both in $\{\ell_{1},\ell_{2},\dots,\ell_{n}\}$ . Since $x$ and $x^{\prime}$ differ only at leaf labels, again $P^{L}(x,\Omega_{i})=P^{L}(x^{\prime},\Omega_{i})$ . ∎

In words, Lemma 3 says that probability of transitioning to a different ranked-tree shape is independent of the leaf configuration of the current tree. A well-known result (e.g. Lemma 2.5 in LevinPeresWilmer2006 ) is that the induced chain on the space of equivalence classes defined by $\tilde{P}([x],[y]):=P(x,[y])$ is a Markov chain; then observe that the induced chain $\tilde{P}$ is equivalent to adjacent-swap Markov chain $P$ on the space $\mathcal{T}_{n}$ . This allows a comparison of the spectral gaps of the two chains. Lemma 12.9 in (LevinPeresWilmer2006, ) states the eigenvalues of the transition matrix of the lumped chain are eigenvalues of the transition matrix of the full chain, hence we have the following lemma.

Lemma 4.

Let $\gamma,\gamma^{L}$ be the spectral gaps of the chains $P,P^{L}$ respectively. Then, $\gamma^{L}\leq\gamma$ .

From Lemma 3 we can immediately see that stationary distribution for $P$ is indeed the Tajima coalescent distribution. Suppose $T\in\mathcal{T}_{n}^{U}$ corresponds to the equivalence class $\Omega_{i}\subset\mathcal{T}_{n}^{L}$ . Then,

\pi(T)=\sum_{y\in\Omega_{i}}\pi^{L}(y)=\frac{|\Omega_{i}|}{|\mathcal{T}_{n}^{L}|}.

A uniformly sampled labeled ranked tree shape with the leaf-labels erased gives an unlabeled ranked tree shape according to the Tajima distribution (veber, ). This implies that $|\Omega_{i}|/|\mathcal{T}_{n}^{L}|$ is equal to the Tajima distribution for the ranked tree shape corresponding to $\Omega_{i}$ .

4 Lower bound

To prove the lower bounds in Theorem 1, we use the variational characterization of the relaxation time:

\tau_{n}=\sup_{f:\Omega\to\mathbb{R}}\frac{Var_{\pi}(f)}{\mathcal{E}(f)}.

(6)

We will use the internal tree length as a function $\varphi(x):\mathcal{T}_{n}\rightarrow\mathbb{R}$ to find a lower bound for $\tau_{n}$ . Since the internal tree length is independent of the leaf labels, we get the same lower mixing time bound for the adjacent-swap chains on $\mathcal{T}_{n}$ and $\mathcal{T}^{L}_{n}$ . For ease of exposition we will focus the following discussion on $\mathcal{T}_{n}$ .

We will define the internal tree length function on the space $\mathcal{T}_{n}$ using the correspondence with $COM_{n-1}(S)$ and assuming unit length between consecutive coalescence events. For a label $j\in\{1,\dots,n-2\}$ , let $I_{x}(j)$ denote the pair index $k$ such that $j\in\textbf{p}_{k}$ . In the example of Figure 3, $I_{x}(2)=5$ . Note that from the constraints (Eq. (2)), we have that

I_{x}(j)\in\{j+1,\dots,n-1\}.

We now define the internal tree length function on $\mathcal{T}_{n}$ as follows:

\varphi(x)=\sum_{j=1}^{n-2}(I_{x}(j)-j).

(7)

As an example consider the ranked unlabeled tree shape with $n=7$ depicted in Figure 3. Note that the function is minimized for the “caterpillar tree”: $(0,0)^{1},(0,1)^{2},(0,2)^{3},\ldots,(0,n-2)^{n-1}$ , where $I_{x}(j)=j+1$ for all $j$ , giving $\varphi(x)=n-2$ .

This function $\varphi$ is useful in bounding the relaxation time because it is “local” with respect to the Markov chain. That is, suppose $P(x,y)>0$ and $x\neq y$ . The change from $x$ to $y$ could have moved an interior node label to the left, in which case $\varphi(y)=\varphi(x)-1$ . If it moved an interior node label to the right, then $\varphi(y)=\varphi(x)+1$ . If two interior node labels were swapped, then $\varphi(y)=\varphi(x)$ . Thus, $(\varphi(x)-\varphi(y))^{2}\leq 1$ . The denominator of Equation (6) can then be upper-bounded by $1/2$ .

To find the variance of the internal tree length, we note that Var ${}_{\pi}[\varphi(X)]=$ Var ${}_{\pi^{L}}[\varphi(X)]$ since the internal tree length is independent of the leaf labels. We now re-state the standard coalescent of labeled ranked tree shapes as an urn process.

Consider an urn containing $n$ balls labeled $\ell_{1},\ldots,\ell_{n}$ . At step k, draw two balls without replacement from the urn. Let $l_{1},l_{2}$ be the labels of the balls drawn, then set $\mathbf{p}_{k}=(l_{1},l_{2})^{k}$ . Add in a new ball with label $k$ . Repeat this process for $k=1,\dots,n-1$ until only a single ball labeled $(n-1)$ remains in the urn. The resulting sequence of pairs $\mathbf{p}=(\mathbf{p}_{1},\dots,\mathbf{p}_{n-1})$ corresponds to a labeled ranked tree shape $T\in\mathcal{T}^{L}_{n}$ drawn from the standard coalescent, i.e. $\mathbf{p}\sim\pi^{L}$ .

We can now simplify the urn process as follows: start with $n$ white balls in the urn. At each step, draw two balls and add back in a single red ball (representing an interior node of the tree). Let $R_{0}=0$ and $R_{k}$ be the number of red balls in the urn after $k$ coalescence events. Note that after $k$ mergers (coalescence events), there are $n-k$ total balls left and $R_{n-1}=1$ .

The simplified urn process is useful because the internal tree length can be computed by counting the number of red balls at each step and adding them all together (Figure 3), i.e.,

\varphi(\mathbf{p})=\sum_{k=1}^{n-2}R_{k}.

The quantities $\{R_{k}\}_{k=1}^{n-2}$ are fairly easy to analyze and have been studied before in various contexts. In janson2011 , the values are used to study the asymptotic behavior of the external tree length of the Kingman coalescent and it is shown that:

	$\displaystyle\mathbb{E}[R_{k}]=\frac{k(n-k)}{n-1},$
	$\displaystyle Cov(R_{k}\cdot R_{l})=\frac{k(k-1)(n-l)(n-l-1)}{(n-1)^{2}(n-2)},\,\,\,\,k\leq l.$

Using this, we get

\mathbb{E}_{\pi}[\varphi(\mathbf{p})]=\sum_{k=1}^{n-2}\frac{k(n-k)}{n-1}=\frac{1}{6}(n^{2}+n-6),

Then, we calculate

	$\displaystyle\sum_{k=1}^{n-2}\mathbb{E}[R_{k}^{2}]=\frac{1}{30}(n^{3}+3n^{2}+2n-30),$		(8)
	$\displaystyle 2\cdot\sum_{k=1}^{n-3}\sum_{l=k+1}^{n-2}\mathbb{E}[R_{k}R_{l}]=\frac{1}{180}(n-3)(5n^{3}+21n^{2}-14n-120).$		(9)

The second moment $\mathbb{E}_{\pi}[\varphi(\mathbf{p})^{2}]$ is the sum of these two lines (8) + (9), which gives

\text{Var}_{\pi}(\varphi(\mathbf{p}))=\mathbb{E}_{\pi}[\varphi(\mathbf{p})^{2}]-\mathbb{E}_{\pi}[\varphi(\mathbf{p})]^{2}=\frac{1}{90}n(n+1)(n-3).

Theorem 5.

Let $\tau_{n},\tau_{n}^{L}$ be the relaxation time of the adjacent-swap chains on $\mathcal{T}_{n}$ and $\mathcal{T}^{L}_{n}$ , respectively. Then,

\tau^{L}_{n}\geq\tau_{n}\geq\frac{\text{Var}_{\pi}(\varphi)}{\mathcal{E}(\varphi)}\geq\frac{2}{90}n(n+1)(n-3).

The lower bound on the relaxation time is applicable to labeled and unlabeled ranked tree shapes, however Lemma 4 further allows us to state the first inequality on the left hand side of Theorem 5.

5 Upper bound

In this section we prove the upper bounds in Theorem 1 using a coupling argument. The coupling is similar to the one used by Aldous for analyzing the adjacent transpositions chain on $S_{n}$ aldous1983random .

We analyze a lazy version of the chain to make the coupling. That is, at each step, we generate a random coin flip $\theta\sim\text{Bernoulli}(1/2)$ . If $\theta=1$ , attempt a move of the chain, and if $\theta=0$ then make no change. Let $X_{t}=(\textbf{p}_{1},\dots,\textbf{p}_{n-1})$ and $Y_{t}=(\textbf{q}_{1},\dots,\textbf{q}_{n-1})$ be the two copies of the chain at time $t$ , one started from $x$ and the other from $y$ . We will first describe the coupling for the chain on $\mathcal{T}_{n}$ .

Coupling of unlabeled ranked tree shapes.

We’ll define a coupling that jointly matches the internal node labels of the two copies in the order $n-2,n-3,n-4,\dots,1$ . Note that label $n-2$ is already jointly matched in the $n-1$ pairs because it can only occur in the final pair. Let $N_{t}\in\{1,2,\ldots,n-2\}$ be the maximum label that is not jointly matched, i.e., the maximum element that occurs in different pairs in $X_{t}$ and $Y_{t}$ . Let $X_{t}(a)$ denote the index of the pair in the matching $X_{t}$ that contains $a$ , i.e. $X_{t}(a)=i$ if $a\in\textbf{p}_{i}$ at time $t$ . Then, at time $t$ , the elements $\geq N_{t}$ match in both $X_{t}$ and $Y_{t}$ , i.e. for every $a\geq N_{t}$ $X_{t}(a)=Y_{t}(a)$ .

For any label $a\in\{1,2,\dots,n-2\}$ , the coupling will have two properties:

Property 1. If $a\geq N_{t}$ , then $X_{s}(a)=Y_{s}(a)$ for all $s\geq t$ .

Property 2. If $a=N_{t}-1$ and $X_{t}(a)<Y_{t}(a)$ , then $X_{t+1}(a)\leq Y_{t+1}(a)$ and if $X_{t}(a)>Y_{t}(a)$ , then $X_{t+1}(a)\geq Y_{t+1}(a)$ . This condition will ensure that $a$ will eventually get matched in the two copies.

Define the following quantities:

•

Let $M_{t}$ be the set of all indices $2\leq i\leq n-2$ that contain labels $\geq N_{t}$ jointly matched. That is, there is a label $a\geq N_{t}$ such that $X_{t}(a)=Y_{t}(a)=i$ . Note that it is possible to have other matches for labels $<N_{t}$ , but we do not keep track of those matches and breaking those matches does not violate the two properties of the coupling.
•

Let $AM_{t}=i$ if label $N_{t}-1$ is in pairs $\textbf{p}_{i}$ and $\textbf{q}_{i+1}$ , or if it is in pairs $\textbf{p}_{i+1}$ and $\textbf{q}_{i}$ . Otherwise, set $AM_{t}=0$ .

At each step, pick an index $1\leq i\leq n-2$ uniformly at random and consider the following cases:

1.

$i,i+1\notin M_{t}$ and $i\neq AM_{t}$ . There are no joint matchings of labels $\geq N_{t}$ in pairs $i$ and $i+1$ and swapping two elements in both copies will not match label $N_{t}-1$ . In this case, propose a swap in $X_{t}$ and $Y_{t}$ according to their (lazy) marginal dynamics.
2.

$i\in M_{t}$ and/or $i+1\in M_{t}$ , and $i\neq AM_{t}$ . There is at least one joint match of labels $\geq N_{t}$ in pairs $i$ or $i+1$ and swapping two elements in both copies will not create a new joint match of $N_{t}-1$ . To preserve Property 1 it is necessary to perform the same move (or no move at all) on both chains. Toss a single coin $\theta$ to determine whether a move will be proposed on both copies or not.
(a) If $\theta=1$ and the pairs at $i$ and $i+1$ are of the type:

$\displaystyle X:(a,b)^{i},(c,d)^{i+1}$

$\displaystyle Y:(a,e)^{i},(f,g)^{i+1},$

With the possibility of $b=e$ . Then, if a matched label e.g. $a$ is chosen to be swapped in chain $X$ , it must also be chosen for $Y$ . Now, label $a<i$ since it is a label that merges at $i$ ; and $a\geq N_{t}$ since it is a label that was jointly matched before. Then $c,d,f,$ or $g$ cannot be label $i$ since $i$ is already a match. This implies that there are no constraints on $c,d,f,g$ and the marginal probability of swapping $a$ would be the same in each chain.

(b) If $\theta=1$ and the pairs at $i$ and $i+1$ are of the type:

$\displaystyle X:(a,b)^{i},(c,d)^{i+1}$

$\displaystyle Y:(e,f)^{i},(g,d)^{i+1},$

With the possibility of $d=i$ , or

$\displaystyle X:(a,b)^{i},(i,d)^{i+1}$

$\displaystyle Y:(e,f)^{i},(i,d)^{i+1},$

Then, if a matched label e.g. $d$ is chosen to be swapped in chain $X$ , it must also be swapped in $Y$ .
3.

$i,i+1\notin M_{t}$ and $i=AM_{t}$ . There are no joint matchings of labels $\geq N_{t}$ in pairs $i$ and $i+1$ , and label $N_{t}-1$ is either in $\textbf{p}_{i}$ and $\textbf{q}_{i+1}$ or in $\textbf{p}_{i+1}$ and $\textbf{q}_{i}$ . In order to preserve property 2, simply set $\theta_{Y}=1-\theta_{X}$ . This ensures that label $a=N_{t}-1$ will only possibly move in one of the chains.

$i\in M_{t}$ and/or $i+1\in M_{t}$ , and $i=AM_{t}$ . There is one joint match in pair $i$ and/or $i+1$ , and label $N_{t}-1$ is either in $\textbf{p}_{i}$ and $\textbf{q}_{i+1}$ or in $\textbf{p}_{i+1}$ and $\textbf{q}_{i}$ . To preserve Properties 1 and 2 while keeping the correct marginal transition probabilities for each chain we define the joint transitions as follows.

(a) Suppose $c=N_{t}-1$ and the pairs at indices $i,i+1$ are of the type:

	$\displaystyle X:(a,b)^{i},(c,d)^{i+1}$
	$\displaystyle Y:(a,c)^{i},(f,g)^{i+1}.$

By the same argument of case 2, $c,d,g,f\neq i$ . Table 1 defines the joint proposal probabilities that preserve Properties $1$ and $2$ and correct marginal transition probabilities.

Table 1: Coupling transition probabilities in case 4(a)

Move in $X$	Move in $Y$	Probability
No change	No change	$3/8$
No change	$c\leftrightarrow f$	$1/16$
No change	$c\leftrightarrow g$	$1/16$
$b\leftrightarrow c$	No change	$1/8$
$b\leftrightarrow d$	$c\leftrightarrow f$	$1/16$
$b\leftrightarrow d$	$c\leftrightarrow g$	$1/16$
$a\leftrightarrow c$	$a\leftrightarrow f$	$1/16$
$a\leftrightarrow c$	$a\leftrightarrow g$	$1/16$
$a\leftrightarrow d$	$a\leftrightarrow f$	$1/16$
$a\leftrightarrow d$	$a\leftrightarrow g$	$1/16$

(b) Suppose now that $c=N_{t}-1$ and that pairs at indices $i$ and $i+1$ are of the type:

	$\displaystyle X:(a,b)^{i},(c,i)^{i+1}$
	$\displaystyle Y:(a,c)^{i},(f,i)^{i+1}.$

Table 2 defines the joint proposal probabilities that preserve Properties 1 and 2 with correct marginal transition probabilities.

Table 2: Coupling transition probabilities in case 4(b)

Move in $X$	Move in $Y$	Probability
No change	No change	$5/8$
No change	$c\leftrightarrow f$	$1/8$
$b\leftrightarrow c$	No change	$1/8$
$a\leftrightarrow c$	$a\leftrightarrow f$	$1/8$

	$\displaystyle X:(a,c)^{i},(b,i)^{i+1}$
	$\displaystyle Y:(d,e)^{i},(c,i)^{i+1}.$

Table 3 defines the joint proposal probabilities that preserve Properties 1 and 2 with correct marginal transition probabilities.

Table 3: Coupling transition probability in case 4(c).

Move in $X$	Move in $Y$	Probability
No change	No change	$5/8$
No change	$e\leftrightarrow c$	$1/8$
$b\leftrightarrow c$	No change	$1/8$
$a\leftrightarrow b$	$d\leftrightarrow c$	$1/8$

Coupling of labeled ranked tree shapes.

The coupling for $\mathcal{T}_{n}^{L}$ can proceed exactly as the coupling for $\mathcal{T}_{n}$ until every interior-node label is matched, say at time $T$ . After that point, the leaf labels can be matched in any order. We extend the set $M_{t}$ to contain the indices $1\leq i\leq n-2$ of any label $a\in\{\ell_{1},\ldots,\ell_{n}\}$ such that $X_{t}(a)=Y_{t}(a)$ and $AM_{t}$ to be the set of indices $i$ such that any label $a\in\{\ell_{1},\ldots,\ell_{n}\}$ is in pair $\textbf{p}_{i}$ and $\textbf{q}_{i+1}$ or $\textbf{p}_{i+1}$ and $\textbf{q}_{i}$ . The transition probabilities defined above work with these sets $M_{t}$ , $AM_{t}$ and have the following properties, for $t\geq T$ and label $a$ :

1.

Property 1 If $X_{t}(a)=Y_{t}(a)$ , then $X_{s}(a)=Y_{s}(a)$ for all $s\geq t$ .
2.

Property 2 If $X_{t}(a)<Y_{t}(a)$ then $X_{s}(a)\leq Y_{s}(a)$ for all $s\geq t$ . If $X_{t}(a)>Y_{t}(a)$ then $X_{s}(a)\geq Y_{s}(a)$ for all $s\geq t$ .

5.1 Coupling time

For the chain on unlabeled ranked tree shapes $\mathcal{T}_{n}$ , the time to couple is $T=T_{n-3}+T_{n-4}+\dots+T_{1}$ , where $T_{a}$ is the time it takes to match interior node label $a$ , after the labels $a+1,\dots,n-2$ are already matched. Property 2 is crucial to study $T_{a}$ . Suppose that at time $t=T_{n-3}+\dots+T_{a+1}$ when label $a+1$ is matched we have $X_{t}(a)<Y_{t}(a)$ . Let $S_{a}$ be the time it takes for $Y_{t}$ to hit the left boundary of it’s range, i.e. $Y_{t}(a)=a+1$ . Then necessarily at this time, by Property 2, $X_{t}(a)=Y_{t}(a)$ .

Note that $X_{t}(a)\in\{a+1,\dots,n-2\}$ . If $a$ is not at the left boundary, the probability of moving $a$ to the left is:

\mathbb{P}(X_{t+1}(a)=X_{t}(a)-1\mid X_{t}\neq a+1)=\frac{1}{n-2}\cdot\frac{1}{2}\cdot\frac{1}{2}=\frac{1}{4(n-2)}.

The probability that $a$ moves to the right will depend on the exact configuration and whether or not there is a constraint, e.g. in the situation

\displaystyle(a,b)^{i},(c,i)^{i+1},

in which the probability $a$ moves to the right is $1/8(n-3)$ . Thus, we can bound

\frac{1}{8(n-2)}\leq\mathbb{P}(X_{t+1}(a)=X_{t}(a)+1\mid X_{t}\neq n-2)\leq\frac{1}{4(n-2)}.

The following is an easy result about the hitting time of a symmetric random walk on a line.

Lemma 5.

Let $(Z_{t})_{t\geq 0}$ be a random walk on the line $\{1,2,3,\dots,m\}$ with the transitions

	$\displaystyle P(i,i+1)=p\,\,\,\,\,\,\,\,\,\,\,i\neq m$
	$\displaystyle P(i,i-1)=p\,\,\,\,\,\,\,\,\,\,\,i\neq 1$
	$\displaystyle P(i,i)=1-2p\,\,\,\,\,\,\,\,\,\,\,i\neq 1,m$
	$\displaystyle P(m,m)=1-p$
	$\displaystyle P(1,1)=1$

for some $0<p\leq 1/2$ . Suppose $Z_{0}=m$ and let $T_{m}=\inf\{t>0:Z_{t}=1\}$ . Then $\mathbb{E}[T_{m}]=(1/2p)\cdot m(m-1)$ .

Proof.

We can prove the result for $p=1/2$ , then scale. Let $a_{x}$ be the expected first time of hitting $1$ if $Z_{0}=x$ . Note that the following identities are satisfied:

	$\displaystyle a_{1}=0$
	$\displaystyle a_{x}=1+\frac{1}{2}(a_{x+1}+a_{x-1})\,\,\,\,\,\,x\in\{2,3,\dots,m-1\}$
	$\displaystyle a_{m}=1+\frac{1}{2}a_{m-1}+\frac{1}{2}a_{m}.$

This is a second-order recursion, with solution $a_{x}=x(2m-x+1)-2m$ . So this means $a_{m}=m(m-1)$ . When $p<1/2$ the probability of moving is $2p<1$ , so the expected time to hit $1$ started from $m$ is $(1/2p)m(m-1)$ . ∎

Let $Z_{t}$ be a random walk on the line $a+1,\dots,n-2$ with $p=1/4(n-2)$ , defined as in Lemma 5. We can couple $Z_{t}$ with $Y_{t}(a)$ so that $Y_{t}(a)\leq Z_{t}$ almost surely for all $t\geq 0$ , because $Z_{t}$ is moving to the right with probability $\geq$ $Y_{t}(a)$ . In conclusion,

\mathbb{E}[T_{a}]\leq 2(n-2)(n-2-a)^{2}

and thus the total coupling time is

\mathbb{E}[\tau_{couple}]\leq\sum_{a=1}^{n-3}\mathbb{E}[T_{a}]\leq\sum_{a=1}^{n-3}2(n-2)(n-2-a)^{2}=\frac{1}{3}(n-1)(n-2)(n-3)(2n-5)\leq\frac{2}{3}n^{4}.

This together with Theorem 2 gives the upper bound in Theorem 1 for $\mathcal{T}_{n}$ :

t_{n}\leq 4\cdot\mathbb{E}[\tau_{couple}]\leq\frac{8}{3}n^{4}.

Leaf-Labeled Trees

The coupling time for leaf-labeled trees can be written $\tau_{couple}^{L}=T+T^{leaves}$ , where $T$ is the time from the previous section for the interior labels to match. For the time $T^{leaves}$ for the leaves to match, note that time is saved because the leaves are allowed to match in any order. Moreover, leaf labels can be moved without constraints. In analogy with the result from Example 4.10 in aldous1983random for adjacent transpositions on $S_{n}$ , this would take time of order $n^{3}\log(n)$ . Therefore, $\mathbb{E}[\tau_{couple}^{L}]\leq\frac{2}{3}n^{4}+C_{4}n^{3}\log(n)$ .

6 Discussion

The representation of ranked tree shapes as ordered matchings can be used to define many other Markov chains in analogy to well-studied chains on $S_{n}$ . For example, in addition to the adjacent-swap, another natural chain would be a random-swap Markov chain: pick two pairs uniformly at random, within each pair pick a label, and swap the two elements if it is allowed. The internal tree length function defined in Section 4 gives a lower bound of $n$ for this chain, because a single move could change the function by at most $n$ , and thus the Dirichlet form can be bounded by $n^{2}$ . A naive coupling argument would give an upper bound of order $n^{3}$ . The focus of this paper was on the adjacent-swap chain because it is a local-move chain which could be more useful for Metropolis algorithms in applications.

The upper bound on the mixing time relies on a coupling that matches one label at a time in a specific order giving a sub-optimal upper bound. We conjecture that the mixing time of the adjacent-swap chain on unlabeled ranked tree shapes is of the order of $n^{3}\log(n)$ as it is in the case of adjacent transpositions on $S_{n}$ . Figure 4 shows the total variation distance between the adjacent-swap chain on unlabeled ranked tree shapes and the Tajima stationary distribution for trees with $n<10$ leaves. Due to the large size of the state space, this calculation could not be extended for larger trees in order to inform about the presence of a cutoff phenomena.

We also note another connection between trees and permutations: Unlabeled ranked tree shapes are in bijection with alternating permutations (donaghey1975alternating, ; richard1999enumerative, ) and hence some results concerning Markov chains on alternating permutations could be used to analyze the mixing of Markov chains on unlabeled ranked trees and vice versa. However, we are not aware of mixing time results of Markov chains on the space alternating permutations (diaconis2013random, ).

Finally, in this work we analyzed Markov chains on trees whose stationary distributions are coalescent distributions. These coalescent distributions are used as prior distributions over the set of ancestral relationships of samples of DNA (Palacios2019, ). In practice, these Markov chains are used to approximate the posterior distribution, and hence, the state space is more restricted and subject of future exploration.

Acknowledgments.

MS is supported by a National Defense Science & Engineering Graduate fellowship. JAP is supported by National Institutes of Health Grant R01-GM-131404 and the Alfred P. Sloan Foundation.

References

[1] D. Aldous. Random walks on finite groups and rapidly mixing Markov chains. In Séminaire de Probabilités XVII 1981/82, pages 243–297. Springer, 1983.
[2] D. J. Aldous. Mixing time for a Markov chain on cladograms. Combinatorics, Probability and Computing, 9(3):191–204, 2000.
[3] P. Diaconis, S. Holmes, et al. Random walks on trees and matchings. Electronic Journal of Probability, 7, 2002.
[4] P. Diaconis and P. M. Wood. Random doubly stochastic tridiagonal matrices. Random Structures & Algorithms, 42(4):403–437, 2013.
[5] P. W. Diaconis and S. P. Holmes. Matchings and phylogenetic trees. Proceedings of the National Academy of Sciences, 95(25):14600–14602, 1998.
[6] R. Donaghey. Alternating permutations and binary increasing trees. Journal of Combinatorial Theory, Series A, 18(2):141–148, 1975.
[7] A. Drummond, M. Suchard, D. Xie, and A. Rambaut. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution, 29:1969–1973, 2012.
[8] R. Durrett. Shuffling chromosomes. Journal of Theoretical Probability, 16(3):725–750, 2003.
[9] J. Felsenstein. Inferring phylogenies, volume 2. Sinauer associates Sunderland, MA, 2004.
[10] S. D. W. Frost and E. M. Volz. Modelling tree shape and structure in viral phylodynamics. Philosophical Transactions of the Royal Society B: Biological Sciences, 368(1614):20120208, 2013.
[11] S. Janson and G. Kersting. On the total external length of the Kingman coalescent. Electron. J. Probab., 16:2203–2218, 2011.
[12] J. Kingman. The coalescent. Stochastic Processes and their Applications, 13(3):235–248, 1982.
[13] M. Kirkpatrick and M. Slatkin. Searching for evolutionary patterns in the shape of a phylogenetic tree. Evolution, 47(4):1171–1181, 1993.
[14] A. G. Kuznetsov, I. M. Pak, and A. E. Postnikov. Increasing trees and alternating permutations. Russian Mathematical Surveys, 49(6):79, 1994.
[15] H. Lacoin et al. Mixing time and cutoff for the adjacent transposition shuffle and the simple exclusion. The Annals of Probability, 44(2):1426–1487, 2016.
[16] D. A. Levin and Y. Peres. Markov chains and mixing times, volume 107. American Mathematical Society, 2017.
[17] O. Maliet, F. Gascuel, and A. Lambert. Ranked tree shapes, nonrandom extinctions, and the loss of phylogenetic diversity. Systematic Biology, 67(6):1025–1040, 04 2018.
[18] N. Misra, G. Blelloch, R. Ravi, and R. Schwartz. An optimization-based sampling scheme for phylogenetic trees. In International Conference on Research in Computational Molecular Biology, pages 252–266. Springer, 2011.
[19] E. Mossel, E. Vigoda, et al. Limitations of markov chain monte carlo algorithms for bayesian inference of phylogeny. The Annals of Applied Probability, 16(4):2215–2234, 2006.
[20] J. A. Palacios, A. Véber, L. Cappello, Z. Wang, J. Wakeley, and S. Ramachandran. Bayesian estimation of population size changes by sampling Tajima’s trees. Genetics, 213(3):967–986, 2019.
[21] P. S. Richard. Enumerative combinatorics. Volume I, 1999.
[22] R. Sainudiin, T. Stadler, and A. Véber. Finding the best resolution for the Kingman-Tajima coalescent: theory and applications. Journal of Mathematical Biology, 70(6):1207–1247, 2015.
[23] J. Schweinsberg. An ${O}(n^{2})$ bound for the relaxation time of a Markov chain on cladograms. Random Structures & Algorithms, 20(1):59–70, 2002.
[24] D. Spade, R. Herbei, and L. Kubatko. A note on the relaxation time of two Markov chains on rooted phylogenetic tree spaces. Statistics & Probability Letters, 84:247–252, 01 2014.
[25] D. Štefankovič and E. Vigoda. Fast convergence of Markov chain Monte Carlo algorithms for phylogenetic reconstruction with homogeneous data on closely related species. SIAM Journal on Discrete Mathematics, 25(3):1194–1211, 2011.
[26] J. Wakeley. Coalescent Theory: An Introduction. Roberts & Company Publishers, June 2008.
[27] D. B. Wilson et al. Mixing times of lozenge tiling and card shuffling Markov chains. The Annals of Applied Probability, 14(1):274–325, 2004.

	$\displaystyle X:(a,b)^{i},(c,d)^{i+1}$
	$\displaystyle Y:(a,e)^{i},(f,g)^{i+1},$

	$\displaystyle X:(a,b)^{i},(c,d)^{i+1}$
	$\displaystyle Y:(e,f)^{i},(g,d)^{i+1},$

	$\displaystyle X:(a,b)^{i},(i,d)^{i+1}$
	$\displaystyle Y:(e,f)^{i},(i,d)^{i+1},$