This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

An Adjacent-Swap Markov Chain on Coalescent Trees

Mackenzie Simper, Julia A. Palacios
Abstract

The standard coalescent is widely used in evolutionary biology and population genetics to model the ancestral history of a sample of molecular sequences as a rooted and ranked binary tree. In this paper, we present a representation of the space of ranked trees as a space of constrained ordered matched pairs. We use this representation to define ergodic Markov chains on labeled and unlabeled ranked tree shapes analogously to transposition chains on the space of permutations. We show that an adjacent-swap chain on labeled and unlabeled ranked tree shapes has mixing time at least of order n3n^{3}, and at most of order n4n^{4}. Bayesian inference methods rely on Markov chain Monte Carlo methods on the space of trees. Thus, it is important to define good Markov chains which are easy to simulate and for which rates of convergence can be studied.

1 Introduction

The standard coalescent (Kingman:1982uj, ) is often used in evolutionary biology and population genetics to model the set of ancestral relationships among nn individuals as a rooted and timed binary tree called a genealogy (wakeley_coalescent_2008, ). In this manuscript, we are concerned with discrete tree topologies only and assume a unit interval length between consecutive coalescing (branching) events.

A labeled ranked tree shape TL𝒯nLT^{L}\in\mathcal{T}^{L}_{n} with nn leaves is a rooted labeled and ranked binary tree. The leaves are labeled by the set {1,2,,n}\{\ell_{1},\ell_{2},\ldots,\ell_{n}\} and internal nodes are labeled 1,,n11,\ldots,n-1 in increasing order, with label n1n-1 at the root (Figure 1(A)). The cardinality of 𝒯nL\mathcal{T}^{L}_{n} is given by

|𝒯nL|=n!(n1)!2n1.|\mathcal{T}^{L}_{n}|=\frac{n!(n-1)!}{2^{n-1}}. (1)

In many applications, the quantity of interest is the overall shape of the tree (Kirkpatrick1993, ; Frost2013, ; Maliet2018, ); the specific labels of the samples are un-important. In fact, a key feature of simple coalescent models, such as the Kingman coalescent, is the assumption that the leaves are exchangeable. The Tajima coalescent, a lumping of the Kingman coalescent that ignores leaf labels, was recently proposed for inferring past population size trajectories from binary molecular data (Palacios2019, ). This motivates the study of unlabeled ranked tree shapes with nn leaves 𝒯n\mathcal{T}_{n} (Figure 1(B)). The appealing feature of modeling unlabeled ranked tree shapes is the reduction in the cardinality of the state space. The cardinality of 𝒯n\mathcal{T}_{n} is given by the (n1n-1)-th Euler zig-zag number En1|TnL|E_{n-1}\ll|T^{L}_{n}| (kuznetsov1994increasing, ).

Refer to caption
Figure 1: Examples of coalescent tree topologies. A ranked and unlabeled tree shape TT has internal nodes labeled by their ranking from leaves to root and unlabeled leaves. (A) has ordered matched pairs: (0,0)1,(0,0)2,(0,1)3,(0,3)4,(2,4)5,(0,5)6(0,0)^{1},(0,0)^{2},(0,1)^{3},(0,3)^{4},(2,4)^{5},(0,5)^{6}. A ranked and labeled tree TLT^{L} has unique leaf and internal node labels. (B) has ordered matched pairs: (3,4)1,(1,2)2,(5,1)3,(6,3)4,(2,4)5,(7,5)6(\ell_{3},\ell_{4})^{1},(\ell_{1},\ell_{2})^{2},(\ell_{5},1)^{3},(\ell_{6},3)^{4},(2,4)^{5},(\ell_{7},5)^{6}.

Other types of trees are used in the context of evolutionary models, such as cladeograms (or phylogenies). Cladeograms are rooted or unrooted binary trees with labeled leaves and no ranking of internal nodes. In diaconis1998matchings , the space of cladeograms with ll leaves is shown to be in bijection with the space of perfect matchings of 2n2n labels, with n=l1n=l-1. The bijection with perfect matchings is used to define a Markov chain on the space of cladeograms. Using the perfect matching perspective, the chain is analogous to a random transpositions chain on the space S2nS_{2n} of permutations of 2n2n elements. This representation allows the use of known results for the random transpositions chain to be applied to the chain on trees to determine sharp bounds on rates of convergence diaconis2002random . Motivated by this result, we propose a new representation of labeled and unlabeled ranked tree shapes as a type of matching. We use this representation to define an adjacent-swap chain on the spaces of labeled and unlabeled ranked tree shapes, and to study their mixing time. It is important to study the mixing time for chains on ranked trees because these results have implications for the design of MCMC algorithms to infer evolutionary parameters from molecular data in a Bayesian setting (drummond2012bayesian, ).

In the rest of this section we define a bijective representation of ranked trees as constrained ordered matchings and define the adjacent-swap chain. We then state our main results concerning mixing times and discuss related work. In Section 2, we state known results on Markov chains that will be used in the following sections. In section 3 we state important properties of the proposed adjacent-swap chain. The lower bound for the mixing time of the adjacent-swap chains is proven in section 4 and it is obtained by finding a specific function that gives a lower bound for the relaxation time of the chains.In section 5, we prove the upper bounds on the mixing time of the adjacent-swap chains via a coupling argument. In the discussion section we mention future directions and other related chains on labeled and un-labeled coalescent trees which could be defined using the matching representations, as well as discuss implications for MCMC methods applied in practice.

1.1 Matching representations

We first establish some definitions. Let mm be an integer and SS a multiset of size 2m2m. A matching of SS objects is a partition of the elements into mm disjoint subsets of size two. An ordered matching is a matching with a linear ordering (ranking) of the mm pairs.

Definition 1.

The space COMm(S)COM_{m}(S) (Constrained Ordered Matchings) is the set of all ordered matchings of SS with pairs labeled p1,p2,,pm\textbf{p}_{1},\textbf{p}_{2},\dots,\textbf{p}_{m}, which satisfy, for k=1,,mk=1,\dots,m and a,bSa,b\in S,

pk=(a,b)ka,b<k if a,b, or a,b{1,,n}\textbf{p}_{k}=(a,b)^{k}\implies a,b<k\text{ if }a,b\in\mathbb{Z}\text{, or }a,b\in\{\ell_{1},\ldots,\ell_{n}\} (2)

To represent a labeled ranked tree shape as a constrained ordered matching, we take SS to be the set of leaf and internal node labels, and each pair in the matching represents a coalescence in the tree, with pk\textbf{p}_{k} representing the kk-th coalescence event. The condition (2) then simply ensures that no interior node can be merged before it has been introduced. A leaf can merge at any point and thus there is no constraint on a leaf’s position in the matching.

Lemma 1.

With S={1,2,,n,1,2,,n2}S=\{\ell_{1},\ell_{2},\dots,\ell_{n},1,2,\dots,n-2\}, the space COMn1(S)COM_{n-1}(S) is in bijection with 𝒯nL\mathcal{T}_{n}^{L}. With multiset S={0,0,,0,1,2,,n2}S=\{0,0,\dots,0,1,2,\dots,n-2\}, with 0 repeated nn times, the space COMn1(S)COM_{n-1}(S) is in bijection with with 𝒯n\mathcal{T}_{n}.

Proof.

First assume that S={1,2,,n,1,2,,n2}S=\{\ell_{1},\ell_{2},\dots,\ell_{n},1,2,\dots,n-2\}. The first matched pair can be formed by any two leaves in (n2)\binom{n}{2} possible ways. The second pair can be formed by any of the n2n-2 remaining leaves or 11, the label of the first matched pair, in (n12)\binom{n-1}{2} ways; in general, the kk-th matched pair can be formed in (nk+12)\binom{n-k+1}{2} ways, because there are nk+1n-k+1 unique labels that could be matched at that time. Thus, |COMn1(S)|=i=2n(i2)=|𝒯nL||COM_{n-1}(S)|=\prod^{n}_{i=2}\binom{i}{2}=|\mathcal{T}^{L}_{n}| and for every element TL𝒯nLT^{L}\in\mathcal{T}^{L}_{n} there is a one-to-one mapping between every coalescence event in TLT^{L} and every matched pair in one element of COMn1(S)COM_{n-1}(S). These two facts imply there is a one-to-one mapping between 𝒯nL\mathcal{T}^{L}_{n} and COMn1(S)COM_{n-1}(S).

With the leaf labels 1,,n\ell_{1},\dots,\ell_{n} replaced with the repeated element 0, the space of matchings is in bijection with 𝒯n\mathcal{T}_{n} due to the fact that the space of 𝒯n\mathcal{T}_{n} is equivalent to the space of 𝒯nL\mathcal{T}_{n}^{L} with the leaf labels removed. That is, there is a surjection 𝒯nL𝒯n\mathcal{T}_{n}^{L}\to\mathcal{T}_{n} equivalent to the surjection COMn1(S1)COMn1(S2)COM_{n-1}(S_{1})\to COM_{n-1}(S_{2}), with S1={1,2,,n,1,2,,n2}S_{1}=\{\ell_{1},\ell_{2},\dots,\ell_{n},1,2,\dots,n-2\} and S2={0,0,,0,1,2,,n2}S_{2}=\{0,0,\dots,0,1,2,\dots,n-2\}. The bijective matching representation of the ranked trees 𝒯n\mathcal{T}_{n} and 𝒯nL\mathcal{T}_{n}^{L} of Figure 1 are shown in the legend of Figure 1. ∎

1.2 Markov chain and main results

Definition 2.

The adjacent-swap Markov chain on the space COMn1(S)COM_{n-1}(S) is defined by the following update move:

  1. 1.

    Pick an index k{1,,n2}k\in\{1,\dots,n-2\} uniformly at random.

  2. 2.

    Pick labels l1l_{1} and l2l_{2} uniformly from the pairs 𝐩k\mathbf{p}_{k} and 𝐩k+1\mathbf{p}_{k+1} respectively.

    1. (a)

      If swapping the positions of l1,l2l_{1},l_{2} does not violate constraint (2), make the swap.

    2. (b)

      Otherwise, remain in the current state.

For the sets SS which give the bijections with 𝒯n\mathcal{T}_{n} and 𝒯nL\mathcal{T}_{n}^{L}, this Markov chain is connected and reversible. In the case of 𝒯nL\mathcal{T}_{n}^{L} it is also symmetric and hence, it has a uniform stationary distribution. For 𝒯n\mathcal{T}_{n}, the chain is not symmetric; the stationary distribution is the Tajima distribution on the space of ranked un-labeled trees. These facts, as well as the relationship between the two chains, will be proved in a later section.

Refer to caption
Figure 2: Example of an adjacent-swap move on ranked tree shapes. The pair k=6k=6 is selected uniformly at random among pairs 2,,62,\ldots,6. We then select uniformly at random one element from pair 66 and one element from pair 77. The only allowable swaps are 55 and 0 or 33 and 0. The new state (right tree) is obtained by swapping 55 and 0 from pairs 66 and 77 respectively in the left tree.

We are interested in the convergence rates of this Markov chain and how this rate compares depending on the space 𝒯n\mathcal{T}_{n} or 𝒯nL\mathcal{T}_{n}^{L}. The measure that we study for convergence rate is the mixing time. That is, for a Markov chain on a space Ω\Omega with transition probability PP and stationary distribution π\pi, the mixing time is defined as

tmix=supx0Ωinf{t>0:Pt(x0,)π()TV<1/4}.t_{mix}=\sup_{x_{0}\in\Omega}\inf\{t>0:\|P^{t}(x_{0},\cdot)-\pi(\cdot)\|_{TV}<1/4\}.

To state the result, let tnt_{n} and tnLt_{n}^{L} be the mixing times for the chains on 𝒯n\mathcal{T}_{n} and 𝒯nL\mathcal{T}_{n}^{L} respectively. Our main result is:

Theorem 1.

There exists constants C1,C2,C3C_{1},C_{2},C_{3} such that

C1n3tnC2n4,C_{1}\cdot n^{3}\leq t_{n}\leq C_{2}\cdot n^{4},

and

C1n3tnLC3n4.C_{1}\cdot n^{3}\leq t_{n}^{L}\leq C_{3}\cdot n^{4}.

The lower bound of n3n^{3} comes from the relaxation time, and involves a standard trick of finding a specific function for which the variance under the stationary distribution can be bounded. The chain on 𝒯n\mathcal{T}_{n} is a certain type of “lumping” of the chain on 𝒯nL\mathcal{T}_{n}^{L}, and so the same lower bound applies to both spaces. The upper bounds are obtained using a coupling argument. We conjecture that the mixing time tnt_{n} is indeed smaller than the time tnLt_{n}^{L}, though it is not evident whether this is by a significant order, or just a constant factor.

1.3 Related work

Mixing of chains on other tree spaces have been previously studied, in particular, the mixing of Markov chains on the space of cladeograms. An nn-leaf cladeogram is a rooted or unrooted tree with nn labeled leaves and unlabeled internal nodes of degree 3. The main difference between cladeograms and the tree topologies studied in this manuscript, is that cladeograms do not rank internal branching events. The cardinality of the space of cladeograms of nn leaves is (2(n1))!/2n1(n1)!(2(n-1))!/2^{n-1}(n-1)!, smaller than the cardinality of the space of ranked and labeled trees, but larger than the cardinality of the space of ranked unlabeled trees. Cladeograms are fundamental objects in phylogenetics to model ancestral relationships at the species level, while ranked tree shapes are fundamental objects in population genetics and phylodynamics of infectious diseases. Ranked tree shapes are used to model ancestral relationships of a sample of individuals from a single population of the same species (felsenstein2004inferring, ).

Markov chains on cladeograms. Aldous aldous2000mixing studied a chain on unrooted cladeograms that removes a leaf chosen uniformly at random from the current cladeogram and reattaches it to a random edge. Using coupling methods, Aldous aldous2000mixing showed the relaxation time for this chain is τrelc2n3\tau_{rel}\leq c_{2}n^{3}. The same chain was later analyzed by Schweinsberg (schweinsberg2002n2, ) who showed that τrel=O(n2)\tau_{rel}=O(n^{2}) using a modified method of distinguished paths. In diaconis1998matchings , the authors show that the space of rooted cladeograms with nn leaves is in bijection with the space of perfect matchings in the complete graph on 2(n1)2(n-1) vertices. This bijection was later used to define a Markov chain by randomly choosing two pairs and transposing two randomly selected entries of each pair diaconis2002random . The authors showed that this random transpositions chain mixed after 12nlogn\frac{1}{2}n\log n steps. In SHK2014 , the authors study the nearest neighbor interchange (NNI) chain and a subtree prune and regraft (SPR) chain on the space of rooted cladeograms. The authors showed that the upper bounds on the relaxation time of the SPR and the NNI chains are O(n5/2)O(n^{5/2}) and O(n4)O(n^{4}) respectively.

Other related work include the study of mixing times of chains whose stationary distribution is the posterior distribution over cladeograms (mossel2006limitations, ; vstefankovivc2011fast, ). In (mossel2006limitations, ), the authors show exponential mixing times under a misspecified model in which data is generated from a mixture of cladeograms.

Markov chains on related spaces.

The adjacent swap Markov chain on the space of ranked trees studied in this manuscript is closely related to the adjacent transpositions chain on the space SnS_{n} of permutations. As it is often done, one can think of an element σSn\sigma\in S_{n} as an ordering of a deck of cards with unique labels 1,,n1,\dots,n. In (aldous1983random, , Example 4.10), the chain is defined by picking at random two adjacent cards from the deck; with probability 1/2, the cards are swapped and with probability 1/2 nothing happens. The chain is symmetric with uniform stationary distribution. Aldous (aldous1983random, ) showed that the mixing time of this chain has a lower bound of c1n3c_{1}n^{3} and an upper bound of c2n3log(n)c_{2}n^{3}\log(n) on the mixing time. In wilson2004mixing , the now-famous “Wilson’s method” was introduced and applied to the adjacent transpositions chain as an example to improve the lower bound to order n3log(n)n^{3}log(n) with the explicit constant (1/π2)nlog(n)(1/\pi^{2})n\log(n), as well as an upper bound of (2/π2)nlog(n)(2/\pi^{2})n\log(n). Finally, in lacoin2016mixing ,the upper bound was improved to (1/π2)nlog(n)(1/\pi^{2})n\log(n), so that the upper and lower bounds matched; in addition, the chain was proven to follow the cut-off phenomenon. Durrett generalized the adjacent transpositions chain to the LL-reversal chain, introduced as a model for the evolution of DNA in a chromosome durrett2003shuffling .

To the best of our knowledge, bounds on the mixing time of chains on ranked tree shapes have not been studied before (misra2011optimization, ).

2 Preliminaries

In this section, we review some results of Markov chains theory that will be used to bound the mixing time of the adjacent swap chains on 𝒯nL\mathcal{T}^{L}_{n} and 𝒯n\mathcal{T}_{n}. We use the coupling method to find the upper bounds. A coupling of Markov chains with transition matrix PP is a process (Xt,Yt)t0(X_{t},Y_{t})_{t\geq 0} such that both (Xt)(X_{t}) and (Yt)(Y_{t}) are marginally Markov chains with transition matrix PP. The goal is to define a coupling of two chains from different starting distributions such that the two chains will quickly reach the same state, and once this occurs the two chains stay matched forever. This coupling time gives an upper bound on the mixing time, see Chapter 5 from LevinPeresWilmer2006 for more details.

Theorem 2 (Theorem 5.4 (LevinPeresWilmer2006, )).

Suppose that for each pair of states x,y𝒳x,y\in\mathcal{X} there is a coupling (Xt,Yt)(X_{t},Y_{t}) with X0=xX_{0}=x and Y0=yY_{0}=y. For each such coupling, let τcouple\tau_{\text{couple}} be the coalescence time of the chains, i.e.

τcouple:=min{t:Xs=Ys for all st}.\tau_{\text{couple}}:=\min\{t:X_{s}=Y_{s}\text{ for all }s\geq t\}.

Then

tmix4maxx,yEx,y(τcouple)t_{mix}\leq 4\max_{x,y}E_{x,y}(\tau_{\text{couple}})

To find a lower bound on the mixing time, we bound the relaxation time and use the following result:

Theorem 3 (Theorem 12.5 (LevinPeresWilmer2006, )).

Suppose PP is the transition matrix of an irreducible, aperiodic and reversible Markov chain. Then

tmix(τn1)log(2)t_{mix}\geq(\tau_{n}-1)\log(2) (3)

The relaxation time is defined as the inverse of the spectral gap: τn=1/γ\tau_{n}=1/\gamma, where γ=1|λn,2|\gamma=1-|\lambda_{n,2}|, one minus the absolute value of the second-largest eigenvalue of the transition matrix of the chain. We will bound the relaxation time using the following variational characterization:

Theorem 4 (Lemma 13.7 (LevinPeresWilmer2006, )).

Let P be the transition matrix for a reversible Markov chain. The spectral gap γ\gamma satisfies

γ=minf:𝒳,Varπ(f)0(f)Varπ(f)\gamma=\min_{f:\mathcal{X}\rightarrow\mathbb{R},\text{Var}_{\pi}(f)\neq 0}\frac{\mathcal{E}(f)}{\text{Var}_{\pi}(f)}

where

(f):=12x,y𝒳[f(x)f(y)]2π(x)P(x,y)\mathcal{E}(f):=\frac{1}{2}\sum_{x,y\in\mathcal{X}}[f(x)-f(y)]^{2}\pi(x)P(x,y)

3 The adjacent-swap chain on ranked tree shapes

Lemma 2.

The adjacent-swap Markov chain on 𝒯nL\mathcal{T}_{n}^{L} is irreducible, aperiodic and reversible with respect to the uniform stationary distribution πL(x):=1/|𝒯nL|\pi^{L}(x):=1/|\mathcal{T}_{n}^{L}|.

Proof.

This can be seen by noting the transition matrix is symmetric. Let PLP^{L} be the transition matrix for the adjacent-swap chain on 𝒯nL\mathcal{T}_{n}^{L}, then

PL(x,y)=1n214,P^{L}(x,y)=\frac{1}{n-2}\cdot\frac{1}{4},

where yy is obtained by swapping two randomly chosen elements from two adjacent pairs in xx, chosen uniformly at random, e.g. consider the following states at pairs kk and k+1k+1:

x:(a,b)k,(c,d)k+1\displaystyle x:(a,b)^{k},(c,d)^{k+1}
y:(a,c)k,(b,d)k+1.\displaystyle y:(a,c)^{k},(b,d)^{k+1}.

In a labeled ranked tree shape TnLT^{L}_{n}, all pairs are formed by distinct elements, so a,b,c,da,b,c,d are all unique and the only way to transition from xx to yy is to swap the labels bb and cc, which happens with probability 1/4(n2)1/4(n-2) if c<kc<k or c{1,,n}c\in\{\ell_{1},\ldots,\ell_{n}\}.

To see that the chain is irreducible, we note that any label can be moved to any pair to the right, one pair (or step) at a time with positive probability and that any label can be moved to a pair with index kk to the left (one pair at a time) as long as the label corresponds to a leaf or an internal node smaller than kk, satisfying constraint (2). To see that the chain is aperiodic, we note that PL(x,x)>0P^{L}(x,x)>0, for all x𝒯nLx\in\mathcal{T}^{L}_{n}. Independent of the current state, we can pick pair k=n2k=n-2 and attempt to swap any element of the n2n-2 pair with label n2n-2 from the n1n-1 pair with probability 1/2(n2)1/2(n-2). Since this move violates constraint (2), the move is rejected and the chain remains in the current state. ∎

We note that the transition matrix PP of the adjacent-swap chain on unlabeled ranked tree shapes is not symmetric. For example, consider the transition xx to yy of the type:

(0,a)k,(0,b)k+1(0,0)k,(a,b)k+1,(0,a)^{k},(0,b)^{k+1}\to(0,0)^{k},(a,b)^{k+1}, (4)

for labels a,b0a,b\neq 0 (recall the 0s represent leaf labels). The probability of this transition is P(x,y)=1/(n2)1/4P(x,y)=1/(n-2)\cdot 1/4, since once pairs kk and k+1k+1 are chosen, there is a 1/41/4 chance of choosing label aa from pair kk and label 0 from pair k+1k+1. However, the reverse transition:

(0,0)k,(a,b)k+1(0,a)k,(0,b)k+1(0,0)^{k},(a,b)^{k+1}\to(0,a)^{k},(0,b)^{k+1}

has probability 1/(n2)1/21/(n-2)\cdot 1/2. It turns out that the stationary distribution π\pi for the chain is the Tajima coalescent distribution veber (also known as Yule distribution). For T𝒯nT\in\mathcal{T}_{n}

π(T)=2nc(T)1(n1)!,\pi(T)=\frac{2^{n-c(T)-1}}{(n-1)!},

where c(T)c(T) is the number of cherries of TT, i.e. the number of pairs of the type (0,0)(0,0) in the COMn1(S)COM_{n-1}(S) representation. Indeed, for transitions xx to yy of the type of (4), yy has one more cherry than xx and π(x)P(x,y)=π(y)P(y,x)\pi(x)P(x,y)=\pi(y)P(y,x); for other types of transitions P(x,y)>0P(x,y)>0 that do not affect the number of cherries, P(x,y)=P(y,x)P(x,y)=P(y,x), π(x)=π(y)\pi(x)=\pi(y) and the detailed balance equation is satisfied.

Another way of proving the Tajima coalescent distribution is the stationary distribution on unlabeled ranked tree shapes is to view the chain as a lumping of the chain on 𝒯nL\mathcal{T}^{L}_{n}. This perspective also gives us an initial comparison of the relaxation times of the chains. The space 𝒯n\mathcal{T}_{n} of unlabeled ranked-tree shapes can be considered as a set of equivalency classes of the trees 𝒯nL\mathcal{T}_{n}^{L}. That is, for trees x,y𝒯nLx,y\in\mathcal{T}_{n}^{L}, define the equivalence relation xyx\sim y if the trees have the same ranked tree shape. From the COMn1(S)COM_{n-1}(S) perspective with S={1,,n,1,2,,n2}S=\{\ell_{1},\dots,\ell_{n},1,2,\dots,n-2\}, two matchings are equivalent if all internal node labels 1,,n21,\ldots,n-2 occur in the same pairs.

This equivalence relation induces a partition of 𝒯nL\mathcal{T}_{n}^{L} using equivalence classes. That is, we can write 𝒯nL\mathcal{T}_{n}^{L} as the disjoint union of sets Ω1,,ΩM\Omega_{1},\dots,\Omega_{M}, where M=|𝒯n|M=|\mathcal{T}_{n}|, and all trees in Ωi\Omega_{i} have the same ranked-tree shape.

Lemma 3.

For any x,x𝒯nLx,x^{\prime}\in\mathcal{T}_{n}^{L}, equivalence class Ωi\Omega_{i}, and xxx\sim x^{\prime}, the following relation holds:

PL(x,Ωi):=yΩiPL(x,y)=PL(x,Ωi).P^{L}(x,\Omega_{i}):=\sum_{y\in\Omega_{i}}P^{L}(x,y)=P^{L}(x^{\prime},\Omega_{i}).
Proof.

If xΩix\in\Omega_{i}, then the transition from xx to yΩiy\in\Omega_{i} is of the type:

(a,b)k,(c,d)k+1(c,b)k,(a,d)k+1.(a,b)^{k},(c,d)^{k+1}\to(c,b)^{k},(a,d)^{k+1}. (5)

where a,c{1,2,,n}a,c\in\{\ell_{1},\ell_{2},\dots,\ell_{n}\} correspond to leaf labels. Since xx and xx^{\prime} differ only at leaf labels, the transition probability of swapping two leaf labels is the same and PL(x,Ωi)=PL(x,Ωi)P^{L}(x,\Omega_{i})=P^{L}(x^{\prime},\Omega_{i}). If xΩix\not\in\Omega_{i}, and therefore xΩix^{\prime}\not\in\Omega_{i}, then the transition from xx to yy is of the type:

(a,b)k,(c,d)k+1(c,b)k,(a,d)k+1.(a,b)^{k},(c,d)^{k+1}\to(c,b)^{k},(a,d)^{k+1}.

where aa and cc are not both in {1,2,,n}\{\ell_{1},\ell_{2},\dots,\ell_{n}\}. Since xx and xx^{\prime} differ only at leaf labels, again PL(x,Ωi)=PL(x,Ωi)P^{L}(x,\Omega_{i})=P^{L}(x^{\prime},\Omega_{i}). ∎

In words, Lemma 3 says that probability of transitioning to a different ranked-tree shape is independent of the leaf configuration of the current tree. A well-known result (e.g. Lemma 2.5 in LevinPeresWilmer2006 ) is that the induced chain on the space of equivalence classes defined by P~([x],[y]):=P(x,[y])\tilde{P}([x],[y]):=P(x,[y]) is a Markov chain; then observe that the induced chain P~\tilde{P} is equivalent to adjacent-swap Markov chain PP on the space 𝒯n\mathcal{T}_{n}. This allows a comparison of the spectral gaps of the two chains. Lemma 12.9 in (LevinPeresWilmer2006, ) states the eigenvalues of the transition matrix of the lumped chain are eigenvalues of the transition matrix of the full chain, hence we have the following lemma.

Lemma 4.

Let γ,γL\gamma,\gamma^{L} be the spectral gaps of the chains P,PLP,P^{L} respectively. Then, γLγ\gamma^{L}\leq\gamma.

From Lemma 3 we can immediately see that stationary distribution for PP is indeed the Tajima coalescent distribution. Suppose T𝒯nUT\in\mathcal{T}_{n}^{U} corresponds to the equivalence class Ωi𝒯nL\Omega_{i}\subset\mathcal{T}_{n}^{L}. Then,

π(T)=yΩiπL(y)=|Ωi||𝒯nL|.\pi(T)=\sum_{y\in\Omega_{i}}\pi^{L}(y)=\frac{|\Omega_{i}|}{|\mathcal{T}_{n}^{L}|}.

A uniformly sampled labeled ranked tree shape with the leaf-labels erased gives an unlabeled ranked tree shape according to the Tajima distribution (veber, ). This implies that |Ωi|/|𝒯nL||\Omega_{i}|/|\mathcal{T}_{n}^{L}| is equal to the Tajima distribution for the ranked tree shape corresponding to Ωi\Omega_{i}.

4 Lower bound

To prove the lower bounds in Theorem 1, we use the variational characterization of the relaxation time:

τn=supf:ΩVarπ(f)(f).\tau_{n}=\sup_{f:\Omega\to\mathbb{R}}\frac{Var_{\pi}(f)}{\mathcal{E}(f)}. (6)

We will use the internal tree length as a function φ(x):𝒯n\varphi(x):\mathcal{T}_{n}\rightarrow\mathbb{R} to find a lower bound for τn\tau_{n}. Since the internal tree length is independent of the leaf labels, we get the same lower mixing time bound for the adjacent-swap chains on 𝒯n\mathcal{T}_{n} and 𝒯nL\mathcal{T}^{L}_{n}. For ease of exposition we will focus the following discussion on 𝒯n\mathcal{T}_{n}.

We will define the internal tree length function on the space 𝒯n\mathcal{T}_{n} using the correspondence with COMn1(S)COM_{n-1}(S) and assuming unit length between consecutive coalescence events. For a label j{1,,n2}j\in\{1,\dots,n-2\}, let Ix(j)I_{x}(j) denote the pair index kk such that jpkj\in\textbf{p}_{k}. In the example of Figure 3, Ix(2)=5I_{x}(2)=5. Note that from the constraints (Eq. (2)), we have that

Ix(j){j+1,,n1}.I_{x}(j)\in\{j+1,\dots,n-1\}.

We now define the internal tree length function on 𝒯n\mathcal{T}_{n} as follows:

φ(x)=j=1n2(Ix(j)j).\varphi(x)=\sum_{j=1}^{n-2}(I_{x}(j)-j). (7)

As an example consider the ranked unlabeled tree shape with n=7n=7 depicted in Figure 3. Note that the function is minimized for the “caterpillar tree”: (0,0)1,(0,1)2,(0,2)3,,(0,n2)n1(0,0)^{1},(0,1)^{2},(0,2)^{3},\ldots,(0,n-2)^{n-1}, where Ix(j)=j+1I_{x}(j)=j+1 for all jj, giving φ(x)=n2\varphi(x)=n-2.

Refer to caption
Figure 3: Example of internal tree length calculation. A ranked tree shape TT with ordered matched pairs: x=(0,0)1,(0,0)2,(0,1)3,(0,3)4,(2,4)5,(0,5)6x=(0,0)^{1},(0,0)^{2},(0,1)^{3},(0,3)^{4},(2,4)^{5},(0,5)^{6}. The corresponding internal tree length is the sum of the lengths of the red branches: φ(x)=(31)+(52)+(43)+(54)+(65)=8\varphi(x)=(3-1)+(5-2)+(4-3)+(5-4)+(6-5)=8.

This function φ\varphi is useful in bounding the relaxation time because it is “local” with respect to the Markov chain. That is, suppose P(x,y)>0P(x,y)>0 and xyx\neq y. The change from xx to yy could have moved an interior node label to the left, in which case φ(y)=φ(x)1\varphi(y)=\varphi(x)-1. If it moved an interior node label to the right, then φ(y)=φ(x)+1\varphi(y)=\varphi(x)+1. If two interior node labels were swapped, then φ(y)=φ(x)\varphi(y)=\varphi(x). Thus, (φ(x)φ(y))21(\varphi(x)-\varphi(y))^{2}\leq 1. The denominator of Equation (6) can then be upper-bounded by 1/21/2.

To find the variance of the internal tree length, we note that Var[φ(X)]π={}_{\pi}[\varphi(X)]=Var[φ(X)]πL{}_{\pi^{L}}[\varphi(X)] since the internal tree length is independent of the leaf labels. We now re-state the standard coalescent of labeled ranked tree shapes as an urn process.

Consider an urn containing nn balls labeled 1,,n\ell_{1},\ldots,\ell_{n}. At step k, draw two balls without replacement from the urn. Let l1,l2l_{1},l_{2} be the labels of the balls drawn, then set 𝐩k=(l1,l2)k\mathbf{p}_{k}=(l_{1},l_{2})^{k}. Add in a new ball with label kk. Repeat this process for k=1,,n1k=1,\dots,n-1 until only a single ball labeled (n1)(n-1) remains in the urn. The resulting sequence of pairs 𝐩=(𝐩1,,𝐩n1)\mathbf{p}=(\mathbf{p}_{1},\dots,\mathbf{p}_{n-1}) corresponds to a labeled ranked tree shape T𝒯nLT\in\mathcal{T}^{L}_{n} drawn from the standard coalescent, i.e. 𝐩πL\mathbf{p}\sim\pi^{L}.

We can now simplify the urn process as follows: start with nn white balls in the urn. At each step, draw two balls and add back in a single red ball (representing an interior node of the tree). Let R0=0R_{0}=0 and RkR_{k} be the number of red balls in the urn after kk coalescence events. Note that after kk mergers (coalescence events), there are nkn-k total balls left and Rn1=1R_{n-1}=1.

The simplified urn process is useful because the internal tree length can be computed by counting the number of red balls at each step and adding them all together (Figure 3), i.e.,

φ(𝐩)=k=1n2Rk.\varphi(\mathbf{p})=\sum_{k=1}^{n-2}R_{k}.

The quantities {Rk}k=1n2\{R_{k}\}_{k=1}^{n-2} are fairly easy to analyze and have been studied before in various contexts. In janson2011 , the values are used to study the asymptotic behavior of the external tree length of the Kingman coalescent and it is shown that:

𝔼[Rk]=k(nk)n1,\displaystyle\mathbb{E}[R_{k}]=\frac{k(n-k)}{n-1},
Cov(RkRl)=k(k1)(nl)(nl1)(n1)2(n2),kl.\displaystyle Cov(R_{k}\cdot R_{l})=\frac{k(k-1)(n-l)(n-l-1)}{(n-1)^{2}(n-2)},\,\,\,\,k\leq l.

Using this, we get

𝔼π[φ(𝐩)]=k=1n2k(nk)n1=16(n2+n6),\mathbb{E}_{\pi}[\varphi(\mathbf{p})]=\sum_{k=1}^{n-2}\frac{k(n-k)}{n-1}=\frac{1}{6}(n^{2}+n-6),

Then, we calculate

k=1n2𝔼[Rk2]=130(n3+3n2+2n30),\displaystyle\sum_{k=1}^{n-2}\mathbb{E}[R_{k}^{2}]=\frac{1}{30}(n^{3}+3n^{2}+2n-30), (8)
2k=1n3l=k+1n2𝔼[RkRl]=1180(n3)(5n3+21n214n120).\displaystyle 2\cdot\sum_{k=1}^{n-3}\sum_{l=k+1}^{n-2}\mathbb{E}[R_{k}R_{l}]=\frac{1}{180}(n-3)(5n^{3}+21n^{2}-14n-120). (9)

The second moment 𝔼π[φ(𝐩)2]\mathbb{E}_{\pi}[\varphi(\mathbf{p})^{2}] is the sum of these two lines (8) + (9), which gives

Varπ(φ(𝐩))=𝔼π[φ(𝐩)2]𝔼π[φ(𝐩)]2=190n(n+1)(n3).\text{Var}_{\pi}(\varphi(\mathbf{p}))=\mathbb{E}_{\pi}[\varphi(\mathbf{p})^{2}]-\mathbb{E}_{\pi}[\varphi(\mathbf{p})]^{2}=\frac{1}{90}n(n+1)(n-3).
Theorem 5.

Let τn,τnL\tau_{n},\tau_{n}^{L} be the relaxation time of the adjacent-swap chains on 𝒯n\mathcal{T}_{n} and 𝒯nL\mathcal{T}^{L}_{n}, respectively. Then,

τnLτnVarπ(φ)(φ)290n(n+1)(n3).\tau^{L}_{n}\geq\tau_{n}\geq\frac{\text{Var}_{\pi}(\varphi)}{\mathcal{E}(\varphi)}\geq\frac{2}{90}n(n+1)(n-3).

The lower bound on the relaxation time is applicable to labeled and unlabeled ranked tree shapes, however Lemma 4 further allows us to state the first inequality on the left hand side of Theorem 5.

5 Upper bound

In this section we prove the upper bounds in Theorem 1 using a coupling argument. The coupling is similar to the one used by Aldous for analyzing the adjacent transpositions chain on SnS_{n} aldous1983random .

We analyze a lazy version of the chain to make the coupling. That is, at each step, we generate a random coin flip θBernoulli(1/2)\theta\sim\text{Bernoulli}(1/2). If θ=1\theta=1, attempt a move of the chain, and if θ=0\theta=0 then make no change. Let Xt=(p1,,pn1)X_{t}=(\textbf{p}_{1},\dots,\textbf{p}_{n-1}) and Yt=(q1,,qn1)Y_{t}=(\textbf{q}_{1},\dots,\textbf{q}_{n-1}) be the two copies of the chain at time tt, one started from xx and the other from yy. We will first describe the coupling for the chain on 𝒯n\mathcal{T}_{n}.

Coupling of unlabeled ranked tree shapes.

We’ll define a coupling that jointly matches the internal node labels of the two copies in the order n2,n3,n4,,1n-2,n-3,n-4,\dots,1. Note that label n2n-2 is already jointly matched in the n1n-1 pairs because it can only occur in the final pair. Let Nt{1,2,,n2}N_{t}\in\{1,2,\ldots,n-2\} be the maximum label that is not jointly matched, i.e., the maximum element that occurs in different pairs in XtX_{t} and YtY_{t}. Let Xt(a)X_{t}(a) denote the index of the pair in the matching XtX_{t} that contains aa, i.e. Xt(a)=iX_{t}(a)=i if apia\in\textbf{p}_{i} at time tt. Then, at time tt, the elements Nt\geq N_{t} match in both XtX_{t} and YtY_{t}, i.e. for every aNta\geq N_{t} Xt(a)=Yt(a)X_{t}(a)=Y_{t}(a).

For any label a{1,2,,n2}a\in\{1,2,\dots,n-2\}, the coupling will have two properties:

Property 1. If aNta\geq N_{t}, then Xs(a)=Ys(a)X_{s}(a)=Y_{s}(a) for all sts\geq t.

Property 2. If a=Nt1a=N_{t}-1 and Xt(a)<Yt(a)X_{t}(a)<Y_{t}(a), then Xt+1(a)Yt+1(a)X_{t+1}(a)\leq Y_{t+1}(a) and if Xt(a)>Yt(a)X_{t}(a)>Y_{t}(a), then Xt+1(a)Yt+1(a)X_{t+1}(a)\geq Y_{t+1}(a). This condition will ensure that aa will eventually get matched in the two copies.

Define the following quantities:

  • Let MtM_{t} be the set of all indices 2in22\leq i\leq n-2 that contain labels Nt\geq N_{t} jointly matched. That is, there is a label aNta\geq N_{t} such that Xt(a)=Yt(a)=iX_{t}(a)=Y_{t}(a)=i. Note that it is possible to have other matches for labels <Nt<N_{t}, but we do not keep track of those matches and breaking those matches does not violate the two properties of the coupling.

  • Let AMt=iAM_{t}=i if label Nt1N_{t}-1 is in pairs pi\textbf{p}_{i} and qi+1\textbf{q}_{i+1}, or if it is in pairs pi+1\textbf{p}_{i+1} and qi\textbf{q}_{i}. Otherwise, set AMt=0AM_{t}=0.

At each step, pick an index 1in21\leq i\leq n-2 uniformly at random and consider the following cases:

  1. 1.

    i,i+1Mti,i+1\notin M_{t} and iAMti\neq AM_{t}. There are no joint matchings of labels Nt\geq N_{t} in pairs ii and i+1i+1 and swapping two elements in both copies will not match label Nt1N_{t}-1. In this case, propose a swap in XtX_{t} and YtY_{t} according to their (lazy) marginal dynamics.

  2. 2.

    iMti\in M_{t} and/or i+1Mti+1\in M_{t}, and iAMti\neq AM_{t}. There is at least one joint match of labels Nt\geq N_{t} in pairs ii or i+1i+1 and swapping two elements in both copies will not create a new joint match of Nt1N_{t}-1. To preserve Property 1 it is necessary to perform the same move (or no move at all) on both chains. Toss a single coin θ\theta to determine whether a move will be proposed on both copies or not.
    (a) If θ=1\theta=1 and the pairs at ii and i+1i+1 are of the type:

    X:(a,b)i,(c,d)i+1\displaystyle X:(a,b)^{i},(c,d)^{i+1}
    Y:(a,e)i,(f,g)i+1,\displaystyle Y:(a,e)^{i},(f,g)^{i+1},

    With the possibility of b=eb=e. Then, if a matched label e.g. aa is chosen to be swapped in chain XX, it must also be chosen for YY. Now, label a<ia<i since it is a label that merges at ii; and aNta\geq N_{t} since it is a label that was jointly matched before. Then c,d,f,c,d,f, or gg cannot be label ii since ii is already a match. This implies that there are no constraints on c,d,f,gc,d,f,g and the marginal probability of swapping aa would be the same in each chain.

    (b) If θ=1\theta=1 and the pairs at ii and i+1i+1 are of the type:

    X:(a,b)i,(c,d)i+1\displaystyle X:(a,b)^{i},(c,d)^{i+1}
    Y:(e,f)i,(g,d)i+1,\displaystyle Y:(e,f)^{i},(g,d)^{i+1},

    With the possibility of d=id=i, or

    X:(a,b)i,(i,d)i+1\displaystyle X:(a,b)^{i},(i,d)^{i+1}
    Y:(e,f)i,(i,d)i+1,\displaystyle Y:(e,f)^{i},(i,d)^{i+1},

    Then, if a matched label e.g. dd is chosen to be swapped in chain XX, it must also be swapped in YY.

  3. 3.

    i,i+1Mti,i+1\notin M_{t} and i=AMti=AM_{t}. There are no joint matchings of labels Nt\geq N_{t} in pairs ii and i+1i+1, and label Nt1N_{t}-1 is either in pi\textbf{p}_{i} and qi+1\textbf{q}_{i+1} or in pi+1\textbf{p}_{i+1} and qi\textbf{q}_{i}. In order to preserve property 2, simply set θY=1θX\theta_{Y}=1-\theta_{X}. This ensures that label a=Nt1a=N_{t}-1 will only possibly move in one of the chains.

  4. 4.

    iMti\in M_{t} and/or i+1Mti+1\in M_{t}, and i=AMti=AM_{t}. There is one joint match in pair ii and/or i+1i+1, and label Nt1N_{t}-1 is either in pi\textbf{p}_{i} and qi+1\textbf{q}_{i+1} or in pi+1\textbf{p}_{i+1} and qi\textbf{q}_{i}. To preserve Properties 1 and 2 while keeping the correct marginal transition probabilities for each chain we define the joint transitions as follows.

    (a) Suppose c=Nt1c=N_{t}-1 and the pairs at indices i,i+1i,i+1 are of the type:

    X:(a,b)i,(c,d)i+1\displaystyle X:(a,b)^{i},(c,d)^{i+1}
    Y:(a,c)i,(f,g)i+1.\displaystyle Y:(a,c)^{i},(f,g)^{i+1}.

    By the same argument of case 2, c,d,g,fic,d,g,f\neq i. Table 1 defines the joint proposal probabilities that preserve Properties 11 and 22 and correct marginal transition probabilities.

    Table 1: Coupling transition probabilities in case 4(a)
    Move in XX Move in YY Probability
    No change No change 3/83/8
    No change cfc\leftrightarrow f 1/161/16
    No change cgc\leftrightarrow g 1/161/16
    bcb\leftrightarrow c No change 1/81/8
    bdb\leftrightarrow d cfc\leftrightarrow f 1/161/16
    bdb\leftrightarrow d cgc\leftrightarrow g 1/161/16
    aca\leftrightarrow c afa\leftrightarrow f 1/161/16
    aca\leftrightarrow c aga\leftrightarrow g 1/161/16
    ada\leftrightarrow d afa\leftrightarrow f 1/161/16
    ada\leftrightarrow d aga\leftrightarrow g 1/161/16

    (b) Suppose now that c=Nt1c=N_{t}-1 and that pairs at indices ii and i+1i+1 are of the type:

    X:(a,b)i,(c,i)i+1\displaystyle X:(a,b)^{i},(c,i)^{i+1}
    Y:(a,c)i,(f,i)i+1.\displaystyle Y:(a,c)^{i},(f,i)^{i+1}.

    Table 2 defines the joint proposal probabilities that preserve Properties 1 and 2 with correct marginal transition probabilities.

    Table 2: Coupling transition probabilities in case 4(b)
    Move in XX Move in YY Probability
    No change No change 5/85/8
    No change cfc\leftrightarrow f 1/81/8
    bcb\leftrightarrow c No change 1/81/8
    aca\leftrightarrow c afa\leftrightarrow f 1/81/8

    (c) Suppose now that c=Nt1c=N_{t}-1 and that pairs at indices ii and i+1i+1 are of the type:

    X:(a,c)i,(b,i)i+1\displaystyle X:(a,c)^{i},(b,i)^{i+1}
    Y:(d,e)i,(c,i)i+1.\displaystyle Y:(d,e)^{i},(c,i)^{i+1}.

    Table 3 defines the joint proposal probabilities that preserve Properties 1 and 2 with correct marginal transition probabilities.

    Table 3: Coupling transition probability in case 4(c).
    Move in XX Move in YY Probability
    No change No change 5/85/8
    No change ece\leftrightarrow c 1/81/8
    bcb\leftrightarrow c No change 1/81/8
    aba\leftrightarrow b dcd\leftrightarrow c 1/81/8

Coupling of labeled ranked tree shapes.

The coupling for 𝒯nL\mathcal{T}_{n}^{L} can proceed exactly as the coupling for 𝒯n\mathcal{T}_{n} until every interior-node label is matched, say at time TT. After that point, the leaf labels can be matched in any order. We extend the set MtM_{t} to contain the indices 1in21\leq i\leq n-2 of any label a{1,,n}a\in\{\ell_{1},\ldots,\ell_{n}\} such that Xt(a)=Yt(a)X_{t}(a)=Y_{t}(a) and AMtAM_{t} to be the set of indices ii such that any label a{1,,n}a\in\{\ell_{1},\ldots,\ell_{n}\} is in pair pi\textbf{p}_{i} and qi+1\textbf{q}_{i+1} or pi+1\textbf{p}_{i+1} and qi\textbf{q}_{i}. The transition probabilities defined above work with these sets MtM_{t}, AMtAM_{t} and have the following properties, for tTt\geq T and label aa:

  1. 1.

    Property 1 If Xt(a)=Yt(a)X_{t}(a)=Y_{t}(a), then Xs(a)=Ys(a)X_{s}(a)=Y_{s}(a) for all sts\geq t.

  2. 2.

    Property 2 If Xt(a)<Yt(a)X_{t}(a)<Y_{t}(a) then Xs(a)Ys(a)X_{s}(a)\leq Y_{s}(a) for all sts\geq t. If Xt(a)>Yt(a)X_{t}(a)>Y_{t}(a) then Xs(a)Ys(a)X_{s}(a)\geq Y_{s}(a) for all sts\geq t.

5.1 Coupling time

For the chain on unlabeled ranked tree shapes 𝒯n\mathcal{T}_{n}, the time to couple is T=Tn3+Tn4++T1T=T_{n-3}+T_{n-4}+\dots+T_{1}, where TaT_{a} is the time it takes to match interior node label aa, after the labels a+1,,n2a+1,\dots,n-2 are already matched. Property 2 is crucial to study TaT_{a}. Suppose that at time t=Tn3++Ta+1t=T_{n-3}+\dots+T_{a+1} when label a+1a+1 is matched we have Xt(a)<Yt(a)X_{t}(a)<Y_{t}(a). Let SaS_{a} be the time it takes for YtY_{t} to hit the left boundary of it’s range, i.e.  Yt(a)=a+1Y_{t}(a)=a+1. Then necessarily at this time, by Property 2, Xt(a)=Yt(a)X_{t}(a)=Y_{t}(a).

Note that Xt(a){a+1,,n2}X_{t}(a)\in\{a+1,\dots,n-2\}. If aa is not at the left boundary, the probability of moving aa to the left is:

(Xt+1(a)=Xt(a)1Xta+1)=1n21212=14(n2).\mathbb{P}(X_{t+1}(a)=X_{t}(a)-1\mid X_{t}\neq a+1)=\frac{1}{n-2}\cdot\frac{1}{2}\cdot\frac{1}{2}=\frac{1}{4(n-2)}.

The probability that aa moves to the right will depend on the exact configuration and whether or not there is a constraint, e.g. in the situation

(a,b)i,(c,i)i+1,\displaystyle(a,b)^{i},(c,i)^{i+1},

in which the probability aa moves to the right is 1/8(n3)1/8(n-3). Thus, we can bound

18(n2)(Xt+1(a)=Xt(a)+1Xtn2)14(n2).\frac{1}{8(n-2)}\leq\mathbb{P}(X_{t+1}(a)=X_{t}(a)+1\mid X_{t}\neq n-2)\leq\frac{1}{4(n-2)}.

The following is an easy result about the hitting time of a symmetric random walk on a line.

Lemma 5.

Let (Zt)t0(Z_{t})_{t\geq 0} be a random walk on the line {1,2,3,,m}\{1,2,3,\dots,m\} with the transitions

P(i,i+1)=pim\displaystyle P(i,i+1)=p\,\,\,\,\,\,\,\,\,\,\,i\neq m
P(i,i1)=pi1\displaystyle P(i,i-1)=p\,\,\,\,\,\,\,\,\,\,\,i\neq 1
P(i,i)=12pi1,m\displaystyle P(i,i)=1-2p\,\,\,\,\,\,\,\,\,\,\,i\neq 1,m
P(m,m)=1p\displaystyle P(m,m)=1-p
P(1,1)=1\displaystyle P(1,1)=1

for some 0<p1/20<p\leq 1/2. Suppose Z0=mZ_{0}=m and let Tm=inf{t>0:Zt=1}T_{m}=\inf\{t>0:Z_{t}=1\}. Then 𝔼[Tm]=(1/2p)m(m1)\mathbb{E}[T_{m}]=(1/2p)\cdot m(m-1).

Proof.

We can prove the result for p=1/2p=1/2, then scale. Let axa_{x} be the expected first time of hitting 11 if Z0=xZ_{0}=x. Note that the following identities are satisfied:

a1=0\displaystyle a_{1}=0
ax=1+12(ax+1+ax1)x{2,3,,m1}\displaystyle a_{x}=1+\frac{1}{2}(a_{x+1}+a_{x-1})\,\,\,\,\,\,x\in\{2,3,\dots,m-1\}
am=1+12am1+12am.\displaystyle a_{m}=1+\frac{1}{2}a_{m-1}+\frac{1}{2}a_{m}.

This is a second-order recursion, with solution ax=x(2mx+1)2ma_{x}=x(2m-x+1)-2m. So this means am=m(m1)a_{m}=m(m-1). When p<1/2p<1/2 the probability of moving is 2p<12p<1, so the expected time to hit 11 started from mm is (1/2p)m(m1)(1/2p)m(m-1). ∎

Let ZtZ_{t} be a random walk on the line a+1,,n2a+1,\dots,n-2 with p=1/4(n2)p=1/4(n-2), defined as in Lemma 5. We can couple ZtZ_{t} with Yt(a)Y_{t}(a) so that Yt(a)ZtY_{t}(a)\leq Z_{t} almost surely for all t0t\geq 0, because ZtZ_{t} is moving to the right with probability \geq Yt(a)Y_{t}(a). In conclusion,

𝔼[Ta]2(n2)(n2a)2\mathbb{E}[T_{a}]\leq 2(n-2)(n-2-a)^{2}

and thus the total coupling time is

𝔼[τcouple]a=1n3𝔼[Ta]a=1n32(n2)(n2a)2=13(n1)(n2)(n3)(2n5)23n4.\mathbb{E}[\tau_{couple}]\leq\sum_{a=1}^{n-3}\mathbb{E}[T_{a}]\leq\sum_{a=1}^{n-3}2(n-2)(n-2-a)^{2}=\frac{1}{3}(n-1)(n-2)(n-3)(2n-5)\leq\frac{2}{3}n^{4}.

This together with Theorem 2 gives the upper bound in Theorem 1 for 𝒯n\mathcal{T}_{n}:

tn4𝔼[τcouple]83n4.t_{n}\leq 4\cdot\mathbb{E}[\tau_{couple}]\leq\frac{8}{3}n^{4}.

Leaf-Labeled Trees

The coupling time for leaf-labeled trees can be written τcoupleL=T+Tleaves\tau_{couple}^{L}=T+T^{leaves}, where TT is the time from the previous section for the interior labels to match. For the time TleavesT^{leaves} for the leaves to match, note that time is saved because the leaves are allowed to match in any order. Moreover, leaf labels can be moved without constraints. In analogy with the result from Example 4.10 in aldous1983random for adjacent transpositions on SnS_{n}, this would take time of order n3log(n)n^{3}\log(n). Therefore, 𝔼[τcoupleL]23n4+C4n3log(n)\mathbb{E}[\tau_{couple}^{L}]\leq\frac{2}{3}n^{4}+C_{4}n^{3}\log(n).

6 Discussion

The representation of ranked tree shapes as ordered matchings can be used to define many other Markov chains in analogy to well-studied chains on SnS_{n}. For example, in addition to the adjacent-swap, another natural chain would be a random-swap Markov chain: pick two pairs uniformly at random, within each pair pick a label, and swap the two elements if it is allowed. The internal tree length function defined in Section 4 gives a lower bound of nn for this chain, because a single move could change the function by at most nn, and thus the Dirichlet form can be bounded by n2n^{2}. A naive coupling argument would give an upper bound of order n3n^{3}. The focus of this paper was on the adjacent-swap chain because it is a local-move chain which could be more useful for Metropolis algorithms in applications.

The upper bound on the mixing time relies on a coupling that matches one label at a time in a specific order giving a sub-optimal upper bound. We conjecture that the mixing time of the adjacent-swap chain on unlabeled ranked tree shapes is of the order of n3log(n)n^{3}\log(n) as it is in the case of adjacent transpositions on SnS_{n}. Figure 4 shows the total variation distance between the adjacent-swap chain on unlabeled ranked tree shapes and the Tajima stationary distribution for trees with n<10n<10 leaves. Due to the large size of the state space, this calculation could not be extended for larger trees in order to inform about the presence of a cutoff phenomena.

Refer to caption
Figure 4: Total variation distance for adjacent-swap chain on unlabeled ranked tree shapes. Each curve depicts the total variation distance between the probability of the chain at time tt and the Tajima stationary distribution for trees with n=6,7,8n=6,7,8 and 99 leaves.

We also note another connection between trees and permutations: Unlabeled ranked tree shapes are in bijection with alternating permutations (donaghey1975alternating, ; richard1999enumerative, ) and hence some results concerning Markov chains on alternating permutations could be used to analyze the mixing of Markov chains on unlabeled ranked trees and vice versa. However, we are not aware of mixing time results of Markov chains on the space alternating permutations (diaconis2013random, ).

Finally, in this work we analyzed Markov chains on trees whose stationary distributions are coalescent distributions. These coalescent distributions are used as prior distributions over the set of ancestral relationships of samples of DNA (Palacios2019, ). In practice, these Markov chains are used to approximate the posterior distribution, and hence, the state space is more restricted and subject of future exploration.

Acknowledgments.

MS is supported by a National Defense Science & Engineering Graduate fellowship. JAP is supported by National Institutes of Health Grant R01-GM-131404 and the Alfred P. Sloan Foundation.

References

  • [1] D. Aldous. Random walks on finite groups and rapidly mixing Markov chains. In Séminaire de Probabilités XVII 1981/82, pages 243–297. Springer, 1983.
  • [2] D. J. Aldous. Mixing time for a Markov chain on cladograms. Combinatorics, Probability and Computing, 9(3):191–204, 2000.
  • [3] P. Diaconis, S. Holmes, et al. Random walks on trees and matchings. Electronic Journal of Probability, 7, 2002.
  • [4] P. Diaconis and P. M. Wood. Random doubly stochastic tridiagonal matrices. Random Structures & Algorithms, 42(4):403–437, 2013.
  • [5] P. W. Diaconis and S. P. Holmes. Matchings and phylogenetic trees. Proceedings of the National Academy of Sciences, 95(25):14600–14602, 1998.
  • [6] R. Donaghey. Alternating permutations and binary increasing trees. Journal of Combinatorial Theory, Series A, 18(2):141–148, 1975.
  • [7] A. Drummond, M. Suchard, D. Xie, and A. Rambaut. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution, 29:1969–1973, 2012.
  • [8] R. Durrett. Shuffling chromosomes. Journal of Theoretical Probability, 16(3):725–750, 2003.
  • [9] J. Felsenstein. Inferring phylogenies, volume 2. Sinauer associates Sunderland, MA, 2004.
  • [10] S. D. W. Frost and E. M. Volz. Modelling tree shape and structure in viral phylodynamics. Philosophical Transactions of the Royal Society B: Biological Sciences, 368(1614):20120208, 2013.
  • [11] S. Janson and G. Kersting. On the total external length of the Kingman coalescent. Electron. J. Probab., 16:2203–2218, 2011.
  • [12] J. Kingman. The coalescent. Stochastic Processes and their Applications, 13(3):235–248, 1982.
  • [13] M. Kirkpatrick and M. Slatkin. Searching for evolutionary patterns in the shape of a phylogenetic tree. Evolution, 47(4):1171–1181, 1993.
  • [14] A. G. Kuznetsov, I. M. Pak, and A. E. Postnikov. Increasing trees and alternating permutations. Russian Mathematical Surveys, 49(6):79, 1994.
  • [15] H. Lacoin et al. Mixing time and cutoff for the adjacent transposition shuffle and the simple exclusion. The Annals of Probability, 44(2):1426–1487, 2016.
  • [16] D. A. Levin and Y. Peres. Markov chains and mixing times, volume 107. American Mathematical Society, 2017.
  • [17] O. Maliet, F. Gascuel, and A. Lambert. Ranked tree shapes, nonrandom extinctions, and the loss of phylogenetic diversity. Systematic Biology, 67(6):1025–1040, 04 2018.
  • [18] N. Misra, G. Blelloch, R. Ravi, and R. Schwartz. An optimization-based sampling scheme for phylogenetic trees. In International Conference on Research in Computational Molecular Biology, pages 252–266. Springer, 2011.
  • [19] E. Mossel, E. Vigoda, et al. Limitations of markov chain monte carlo algorithms for bayesian inference of phylogeny. The Annals of Applied Probability, 16(4):2215–2234, 2006.
  • [20] J. A. Palacios, A. Véber, L. Cappello, Z. Wang, J. Wakeley, and S. Ramachandran. Bayesian estimation of population size changes by sampling Tajima’s trees. Genetics, 213(3):967–986, 2019.
  • [21] P. S. Richard. Enumerative combinatorics. Volume I, 1999.
  • [22] R. Sainudiin, T. Stadler, and A. Véber. Finding the best resolution for the Kingman-Tajima coalescent: theory and applications. Journal of Mathematical Biology, 70(6):1207–1247, 2015.
  • [23] J. Schweinsberg. An O(n2){O}(n^{2}) bound for the relaxation time of a Markov chain on cladograms. Random Structures & Algorithms, 20(1):59–70, 2002.
  • [24] D. Spade, R. Herbei, and L. Kubatko. A note on the relaxation time of two Markov chains on rooted phylogenetic tree spaces. Statistics & Probability Letters, 84:247–252, 01 2014.
  • [25] D. Štefankovič and E. Vigoda. Fast convergence of Markov chain Monte Carlo algorithms for phylogenetic reconstruction with homogeneous data on closely related species. SIAM Journal on Discrete Mathematics, 25(3):1194–1211, 2011.
  • [26] J. Wakeley. Coalescent Theory: An Introduction. Roberts & Company Publishers, June 2008.
  • [27] D. B. Wilson et al. Mixing times of lozenge tiling and card shuffling Markov chains. The Annals of Applied Probability, 14(1):274–325, 2004.