This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Bijections for Ranked Tree-Child Networks

Alessandra Caraceni
Istituto Nazionale di Alta Matematica
Unità di ricerca SNS
P.zza dei Cavalieri 7, Pisa
Italy
   Michael Fuchs
Department of Mathematical Sciences
National Chengchi University
Taipei 116
Taiwan
Supported by MOST under the research grant MOST-109-2115-M-004-003-MY2.
   Guan-Ru Yu
Department of Mathematics
National Kaohsiung Normal University
Kaohsiung 824
Taiwan
Abstract

The class of ranked tree-child networks, tree-child networks arising from an evolution process with a fixed embedding into the plane, has recently been introduced by Bienvenu, Lambert, and Steel. These authors derived counting results for this class. In this note, we will give bijective proofs of three of their results. Two of our bijections answer questions raised in their paper.

1 Introduction

Phylogenetic trees and networks are important discrete structures from biology where they are used to model evolution; see [9, 10, 11]. Of these two types of structures, phylogenetic trees are simpler and more classical but they are less suitable to model evolutionary scenarios that involve reticulation events. Thus, in many recent studies, they have been replaced by (the more general) phylogenetic networks. However, the majority of the classes of phylogenetic networks are not recursive and thus they are a poor model for processes that evolve over time. In order to be able to model such processes, Bienvenu et al. recently proposed the class of ranked tree-child networks; see [1].

We recall some definitions. First, a (rooted, binary) phylogenetic network is a simple DAG (directed acyclic graph) with a unique root of indegree 0 and outdegree 11 such that all other nodes belong to one of the following types:

  • leaves, which are nodes with indegree 11 and outdegree 0; these are bijectively labeled by the set {1,,}\{1,\ldots,\ell\} where \ell is the total number of leaves;

  • tree nodes, which are nodes with indegree 11 and outdegree 22; and

  • reticulation nodes, which are nodes with indegree 22 and outdegree 11.

A phylogenetic network is called a tree-child network if each internal (i.e., non-leaf) node has at least one child which is not a reticulation node. For such a network, we call a tree node whose two children are both not reticulation nodes a branching event and a reticulation node together with both its parents a reticulation event; see Figure 1-(a). A ranked tree-child network (or RTCN for short) is now a tree-child network which is drawn in such a way that one starts with a branching event and then successively adds either a branching event or a reticulation event until the leaves are reached; see Figure 1-(b).

Refer to caption
Figure 1: (a) The two types of events of a ranked tree-child network; (b) An example of a RTCN.

The class of RTCNs has the advantage over other network classes that the special embedding into the plane makes counting relatively straightforward. (In contrast, few of the other classes of phylogenetic networks have so far been counted; see, e.g., [2, 3, 4, 5, 6, 7, 8] for some recent progress.) For instance, the following simple formula was obtained in [1] for the number RTC\mathrm{RTC}_{\ell} of RTCNs with \ell leaves:

RTC=!(1)!221.\mathrm{RTC}_{\ell}=\frac{\ell!(\ell-1)!^{2}}{2^{\ell-1}}. (1)

Curiously, the same number is the answer to the following counting problem: find the number of ways for \ell people to cross a river with a two-person boat where the boat trips follow the pattern 22 send, 11 returns, 22 send, 11 returns, etc; see A167484 in the OEIS. Thus, one can ask for a bijection between RTCNs and such boat sequences; however, in Remark 2.8 in [1] it was claimed that a natural bijection seems to be unlikely because for =3\ell=3, 33 out of the 66 RTCNs contain a reticulation event and 33 do not (see bottom resp. top row of Figure 2) whereas all 66 possible boat trips are completely equivalent. Nevertheless, we will give a simple bijection in the next section; again see Figure 2 for the correspondence between RTCNs and boat sequences for =3\ell=3 arising from our bijection.

Refer to caption
Figure 2: All RTCNs and their corresponding boat sequences for =3\ell=3.

Using this bijection, branching events are, e.g., mapped to the following events of boat sequences. Assume that the \ell people are ranked according to their skill at steering the boat. Then, the number of branching events corresponds to the number XX_{\ell} of return trips where the most skilled person among those on the opposite shore takes the boat back. Since it was proved in Corollary 2.7 in [1] that the number of branching events in a RTCN chosen uniformly at random from all RTCNs with \ell leaves satisfies a central limit theorem, we obtain the following result.

Theorem 1.

For a boat sequence of \ell people chosen uniformly at random from all boat sequences, the number of times XX_{\ell} that the most skilled person takes the boat back satisfies the limit law:

XloglogdN(0,1),\frac{X_{\ell}-\log\ell}{\sqrt{\log\ell}}\stackrel{{\scriptstyle d}}{{\longrightarrow}}N(0,1),

where d\stackrel{{\scriptstyle d}}{{\longrightarrow}} denotes convergence in distribution and N(0,1)N(0,1) denotes the standard normal distribution.

In fact, the number RTC,b\mathrm{RTC}_{\ell,b} of RTCNs with \ell leaves and bb branching events was counted in [1] as well:

RTC,b=[1b]RT,\mathrm{RTC}_{\ell,b}=\genfrac{[}{]}{0.0pt}{}{\ell-1}{b}\cdot\mathrm{RT}_{\ell},

where the bracket denotes the (signless) Stirling numbers of the first kind and RT=!(1)!/21\mathrm{RT}_{\ell}=\ell!(\ell-1)!/2^{\ell-1} is the number of ranked trees (i.e., RTCNs without reticulation events). Since the Stirling numbers of the first kind count permutations with a fixed number of cycles, this result suggests that there should be a simple bijection between RTCNs with a fixed number of branching events and pairs consisting of permutations with a fixed number of cycles and ranked trees. We will give such a bijection in Section 3.

Refer to caption
Figure 3: A ranked tree TT (upper left corner) with =4\ell=4 leaves and all 1515 RTCNs which contain TT where incoming edges of reticulation nodes which need to be removed to obtain TT are indicated by arrows. (Note that one of the RTCNs is TT itself.)

A final bijection discussed in this note is related to another result in [1] which concerns containment of ranked trees in RTCNs. Here, we say that a ranked tree TT is contained in a RTCN NN, in symbols TNT\sqsubset N, if TT can be obtained from NN by choosing one of the incoming edges of each reticulation node of NN, removing them, and then suppressing resulting nodes with indegree 11 and outdegree 11; see Figure 3.

For a fixed ranked tree TT with \ell leaves, it was proved in [1] that

#{N:TN}=135(23)=:(23)!!.\#\{N\ :\ T\sqsubset N\}=1\cdot 3\cdot 5\cdots(2\ell-3)=:(2\ell-3)!!. (2)

Note that (23)!!(2\ell-3)!! is the number of phylogenetic trees with \ell leaves where a phylogenetic tree is a phylogenetic network without reticulation nodes; see, e.g., Corollary 2.2.4 in [10]. Thus, one again would like to have a simple bijection between RTCNs which contain TT and phylogetetic trees; see Remark 3.7 in [1]. In the final section of this note, we will give such a bijection.

2 RTCNs and boat sequences

In this section, we will describe a bijection between RTCNs with \ell leaves and boat sequences involving \ell people. This will be done in three steps. First, we will explain how to construct all RTCNs with +1\ell+1 leaves from those with \ell leaves; then, we will do the same for boat sequences; and finally we will synchronize these two constructions to get the desired bijection.

Recursive construction of RTCNs.

Here, we give a top-down construction which (uniquely) produces all RTCNs with +1\ell+1 leaves. A similar construction was described in [1] where the authors used the concept of decorated RTCNs. We will sidestep this concept which gives a (slightly) simpler construction.

Refer to caption
Figure 4: The two operations (O-i) and (O-ii) in the recursive construction of RTCNs.

Assume we have a given RTCN with \ell leaves. Then, we perform either one of the following two operations (see Figure 4 for examples).

(O-i)

Pick two labels a,ba,b from the set {1,,+1}\{1,\ldots,\ell+1\}, remove the label \ell from the RTCN and attach to its leaf two children with labels a,ba,b. Finally, relabel all the other leaves of the RTCN in an order-consistent way with the remaining labels from {1,,+1}{a,b}\{1,\ldots,\ell+1\}\setminus\{a,b\}.

(O-ii)

Pick two labels a<ba<b from {1,,+1}\{1,\ldots,\ell+1\} and a label cc from {1,,+1}{a,b}\{1,\ldots,\ell+1\}\setminus\{a,b\}. Next, find the relative ranks, say a<ba^{\prime}<b^{\prime}, of a<ba<b in {1,,+1}{c}\{1,\ldots,\ell+1\}\setminus\{c\}. Remove a,ba^{\prime},b^{\prime} from the RTCN and attach a reticulation event to their leaves with the children corresponding to the leaf which had the label aa^{\prime} being a leaf with label aa and the reticulation vertex, the children corresponding to the leaf which had the label bb^{\prime} being a leaf with label bb and the (same) reticulation vertex and the child of the reticulation vertex being a leaf with label cc. Finally, relabel all the other leaves of the RTCN in an order-consistent way with the labels from {1,,+1}{a,b,c}\{1,\ldots,\ell+1\}\setminus\{a,b,c\}.

Since both of these operations are reversible, we have the following result.

Proposition 1.

Starting from all RTCNs with \ell leaves and performing for each of them all the operations above gives (exactly once) all the RTCNs with +1\ell+1 leaves.

Moreover, this construction also immediately gives (1).

Corollary 1.

We have

RTC=!(1)!221.\mathrm{RTC}_{\ell}=\frac{\ell!(\ell-1)!^{2}}{2^{\ell-1}}.
Proof.

The number of different ways to perform (i) and (ii) above is

(+12)+(+12)(1)=(+1)22.\binom{\ell+1}{2}+\binom{\ell+1}{2}(\ell-1)=\frac{(\ell+1)\ell^{2}}{2}.

Thus,

RTC=j=11(+1)22=!(1)!221\mathrm{RTC}_{\ell}=\prod_{j=1}^{\ell-1}\frac{(\ell+1)\ell^{2}}{2}=\frac{\ell!(\ell-1)!^{2}}{2^{\ell-1}}

which proves the claim.   

Refer to caption
Figure 5: An example of a boat sequence involving 55 people. Top: the people on both sides of the shores after each step. Bottom: the representation of the boat sequence as array and the array of rankings obtained from it by applying the map rr.

Recursive construction of boat sequences.

Here, we explain how all boat sequences involving +1\ell+1 people can be constructed from those involving \ell people. (Note that this reveals a recursive structure inherent to boat sequences that was not considered in [1].)

First, note that a boat sequence involving \ell people can be represented by an array with 22 rows where the first row has 1\ell-1 entries and the second has 2\ell-2 entries, namely, the first row contains the set of numbers of the people sent in the ii-th step and the second row contains the singleton of the number of the person who returned in the ii-th step; see bottom left array in Figure 5 for an example.

Now, each such array can be mapped to a corresponding array where the numbers are replaced by the relative ranks of the people within the group on their shore, e.g., if {2,5}\{2,5\} are sent with {2,3,5,6}\{2,3,5,6\} at the same shore, then {2,5}\{2,5\} is replaced by {1,3}\{1,3\}. Call the resulting map rr; see bottom of Figure 5.

Note that the map rr is clearly a bijection from the above arrays representing boat sequences to arrays where in the first row, we have a subset of size 22 of {1,,}\{1,\ldots,\ell\}, followed by a subset of size 22 of {1,,1}\{1,\ldots,\ell-1\}, etc. until the set {1,2}\{1,2\} and in the second row, we have a subset of size 11 of {1,2}\{1,2\}, followed by a subset of size 11 of {1,2,3}\{1,2,3\}, etc. until a subset of size 11 of {1,,1}\{1,\ldots,\ell-1\}.

Now, we apply the following operations on the latter arrays: we shift all entries of the first row to the right by one position, add a subset of size 22 of {1,,+1}\{1,\ldots,\ell+1\} in the (now empty) first position and add a subset of size 11 of {1,,}\{1,\ldots,\ell\} at the end of the second row; see the downwards arrow in Figure 6.

Finally, by using the inverse of rr, we obtain an array for a boat sequence involving +1\ell+1 people; see the bottom left array in Figure 6.

Refer to caption
Figure 6: A boat sequence with 66 people constructed from the boat sequence with 55 people from Figure 5.

Since the above process is reversible, we obtain the following result.

Proposition 2.

Starting from a boat sequence involving \ell people and using the map rr, all possible operations above, and the inverse of rr, we obtain (exactly once) all boat sequences involving +1\ell+1 people.

Refer to caption
Figure 7: A RTCN (top left corner) with =4\ell=4 and the process of constructing its image under the bijection. Top: the two steps of reducing the RTCN to the one with =2\ell=2. Right-most column: The boat sequence corresponding to the initial case and its array of rankings. Bottom: the process of extending the array of rankings. Left-most column: applying r1r^{-1} gives the desired result (middle).

Synchronization of the two constructions.

We will now explain how to synchronize the constructions from the above two paragraphs to get a bijection between RTCNs and boat sequences. This bijection is defined recursively.

Assume we have given a RTCN on +1\ell+1 leaves. Reverse the recursive construction from Proposition 1 to obtain a RTCN on \ell leaves. By induction hypothesis, this RTCN is mapped on a boat sequence which is represented by an array as described in the previous paragraph. We apply the map rr to this array. Now, depending on which of the two operations was used in Proposition 1 to construct the RTCN on +1\ell+1 leaves from that of \ell leaves, we do the following to the array of ranks.

(i)

If Operation (O-i) was used, then move all the pairs from the first row by one position and add {a,b}\{a,b\} as the first entry. Also, add {}\{\ell\} at the end of the second row.

(ii)

If Operation (O-ii) was used, then again move all the pairs from the first row by one position and add {a,b}\{a,b\} as the first entry. Add the relative rank of cc in {1,,+1}{a,b}\{1,\ldots,\ell+1\}\setminus\{a,b\} to the end of the second row.

Finally, apply the inverse of rr to the above array to obtain the desired boat sequence.

More generally, in order to get the corresponding boat sequence for a RTCN on \ell leaves, we first have to reduce it to the RTCN on 22 leaves by applying Proposition 1 2\ell-2 times and retain the operations from the first paragraph for each step. Then, we can start from the (trivial) boat sequence and use the above procedure 2\ell-2 times to construct the corresponding boat sequence; see Figure 7 for an example.

Also, from the above, it is clear that branching events correspond to steps where the person with the maximal rank was picked for the return trip (or, using the language introduced just before Theorem 1, the most skilled person on the opposite shore). This proves Theorem 1 from the introduction.

3 Branching events, permutations and ranked binary trees

Refer to caption
Figure 8: Left: a RTCN with =6\ell=6 leaves and k=3k=3 reticulation events; the profile equals (qi)i=15=(1,0,1,1,0)(q_{i})_{i=1}^{5}=(1,0,1,1,0) and the sequence of positions of reticulation events equals (ai)i=13=(1,3,4)(a_{i})_{i=1}^{3}=(1,3,4). Right: the same RTCN redrawn as described in Section 3.

In this section, we describe an explicit bijection between the set of RTCNs with \ell leaves and kk reticulation events (equivalently, with 1k\ell-1-k branching events) and the set of pairs (T,σ)(T,\sigma) where TT is a ranked binary tree with \ell leaves and σ\sigma is a permutation of {1,,1}\{1,\ldots,\ell-1\} consisting of 1k\ell-1-k disjoint cycles.

From a RTCN to a ranked binary tree and permutation.

Given a RTCN with \ell leaves labeled 11 to \ell, define its profile as the sequence (qi)i=11{0,1}1(q_{i})_{i=1}^{\ell-1}\in\{0,1\}^{\ell-1} where qi=0q_{i}=0 if and only if the ii-th event from the bottom is a branching event (see left of Figure 8). Note that we necessarily have q1=0q_{\ell-1}=0, as only two lineages (i.e., vertical edges) are available to merge for the last event. Suppose i=11qi=k\sum_{i=1}^{\ell-1}q_{i}=k, that is, that the RTCN under consideration has kk reticulation events, and let a1<a2<<aka_{1}<a_{2}<\ldots<a_{k} be the indices i{1,,2}i\in\{1,\ldots,\ell-2\} such that qi=1q_{i}=1, corresponding to the reticulation events in question (note that the reticulation event aia_{i} happens at time ai\ell-a_{i}, with the notation from Figure 1-(b)).

Next, note that we can consider lineages above (and below) each event to be naturally ordered as follows. Assuming that lineages just below event ii are labeled 11 to i+1\ell-i+1 and that the event is a branching event, label the i\ell-i lineages just above the event with the integers 11 to i\ell-i so that their order reflects the order after the branching event, with the parent lineage identified with its child whose label is smaller. Similarly, assuming ii is a reticulation event, order the lineages present just above the event ii consistently with their order after the event (ignoring the hybrid lineage resulting from the reticulation event). This is by no means the only possible convention (other conventions would yield equivalent bijections), but we may use it to determine the left-to-right order of lineages when the RTCN is drawn in the plane (so that we can say that the i\ell-i lineages above event ii are labeled “left-to-right” with the numbers 11 to i\ell-i). The planar embedding induced by this convention is depicted on the right of Figure 8: the RTCN on the left is redrawn on the right with lineages ordered left to right according to their current labels, which are explicitly indicated above and below each event; in the representation on the right, parent lineages of reticulation events are drawn directly above the respective non-hybrid child lineages.

In order to obtain a ranked binary tree, we will replace each of the kk reticulation events of the RTCN with a branching event; at the same time, we shall keep track of additional information allowing the recovery of the original reticulation event.

L1L_{1}L3L_{3}L2L_{2}l1l_{1}l2l_{2}L1L_{1}L3L_{3}L2L_{2}l1l_{1}^{\prime}l2l_{2}^{\prime}
Figure 9: A reticulation event is replaced by a branching event: the hybrid lineage L3L_{3} becomes the future of the former rightmost parent lineage, which is no longer involved in the event. Note that the number L3L_{3} need not be between L1L_{1} and L2L_{2}: for examples where it is not, see the second and third transformations performed in Figure 10.
Refer to caption
Figure 10: From the RTCN displayed to the far left to its corresponding ranked binary tree obtained on the right, with its accompanying permutation σ=(a1,a1+b1)(a2,a2+b2)(a3,a3+b3)=(1,4)(3,5)(4,5)=(1,5,3,4)(2)\sigma=(a_{1},a_{1}+b_{1})(a_{2},a_{2}+b_{2})(a_{3},a_{3}+b_{3})=(1,4)(3,5)(4,5)=(1,5,3,4)(2), which is the product of 1k=613=2\ell-1-k=6-1-3=2 disjoint cycles. The three subsequent steps turn reticulation events 1, 3, 4 into branching events. The values of (L1,L2,L3,l1,l2,l1,l2)(L_{1},L_{2},L_{3},l_{1},l_{2},l_{1}^{\prime},l_{2}^{\prime}) are (1,5,4,1,4,1,4)(1,5,4,1,4,1,4) for the first step, (1,2,4,1,2,1,3)(1,2,4,1,2,1,3) for the second, and (2,3,1,1,2,2,1)(2,3,1,1,2,2,1) for the third.

Build the corresponding ranked binary tree to our RTCN by considering in turn the events a1,,aka_{1},\ldots,a_{k}. The result of the reticulation event a1a_{1} is given by two lineages L1<L2L_{1}<L_{2}, labeled l1<l2l_{1}<l_{2} above the reticulation event, and a third hybrid lineage L3L_{3}; replace this by a branching event that yields L1,L2L_{1},L_{2} from a single parent lineage PP and leaves L3L_{3} intact. Some convention is needed in order to match lineages P,L3P,L_{3} to lineages l1,l2l_{1},l_{2}; we identify PP with l1l_{1} and assign the evolution of the hybrid lineage L3L_{3} to the lineage l2l_{2} (Figure 9). Labels assigned to lineages below event a1a_{1} are unchanged by this procedure, but labels above the event are recomputed according to the rules we explained previously. In particular, labels immediately above the event a1a_{1} range between 11 and a1\ell-a_{1}; the lineage that used to be labeled l1l_{1} is now labeled l1l_{1}^{\prime} and the lineage that used to be labeled l2l_{2} is now labeled l2l_{2}^{\prime} where the values of l1,l2l_{1}^{\prime},l_{2}^{\prime} depend on L1,L2,L3L_{1},L_{2},L_{3}: if L3<L1L_{3}<L_{1}, then l1=L1=l1+1l_{1}^{\prime}=L_{1}=l_{1}+1 and l2=L3l_{2}^{\prime}=L_{3}; if L2<L3L_{2}<L_{3}, then l1=L1=l1l_{1}^{\prime}=L_{1}=l_{1} and l2=L31l_{2}^{\prime}=L_{3}-1; if L1<L3<L2L_{1}<L_{3}<L_{2}, then l1=L1=l1l_{1}^{\prime}=L_{1}=l_{1} and l2=L3l_{2}^{\prime}=L_{3}.

The result of this operation is a RTCN T1T_{1} with k1k-1 reticulation events. In order to be able to reinstate the original reticulation, it is enough to keep track of a1a_{1} (identifying the branching event to be turned back into a reticulation event) and a number b1b_{1} that identifies the lineage to turn back into the original hybrid. Note that the label l2l_{2}^{\prime}, for example, would allow the recovery of the original RTCN; l2l_{2}^{\prime} is an integer between 11 and a1\ell-a_{1}, but cannot take all values in the range independently of T1T_{1}, as it cannot be equal to l1l_{1}^{\prime}, that is, to the label of the parent lineage of the branching event a1a_{1}. We therefore define the number b1b_{1} as l2l_{2}^{\prime} if l2<l1l_{2}^{\prime}<l_{1}^{\prime} and l21l_{2}^{\prime}-1 otherwise. This way, we have 1b11a11\leq b_{1}\leq\ell-1-a_{1}; since l1l_{1}^{\prime} can be determined from T1T_{1}, the pair (a1,b1)(a_{1},b_{1}) is enough to recover l2l_{2}^{\prime} and thus the original RTCN (as we will further discuss later).

We repeat this operation for each reticulation to obtain a final ranked binary tree TkT_{k} and a sequence (a1,b1),,(ak,bk)(a_{1},b_{1}),\ldots,(a_{k},b_{k}) where 1bi1ai1\leq b_{i}\leq\ell-1-a_{i}: see Figure 10 for an example.

This sequence of kk pairs can be naturally encoded by the permutation σk=(a1,b1+a1)(a2,b2+a2)(ak,ak+bk)\sigma_{k}=(a_{1},b_{1}+a_{1})\cdot(a_{2},b_{2}+a_{2})\cdot\ldots\cdot(a_{k},a_{k}+b_{k}) in the symmetric group 𝒮1\mathcal{S}_{\ell-1} (where we adopt the convention that composition of permutations is obtained via multiplication from the right); this is the product of kk transpositions such that a1<<aka_{1}<\ldots<a_{k} and bi+ai>aib_{i}+a_{i}>a_{i}. The latter condition is actually equivalent to the permutation being a product of 1k\ell-1-k disjoint cycles, as clarified by the following lemma:

Lemma 1.

For all integers n>k>0n>k>0, a permutation σ𝒮n\sigma\in\mathcal{S}_{n} is the product of nkn-k disjoint cycles if and only if it can be expressed as a product of transpositions of the form (x1,y1)(xk,yk)(x_{1},y_{1})\cdot\ldots\cdot(x_{k},y_{k}) where 0<x1<<xk<n0<x_{1}<\ldots<x_{k}<n and xi<yinx_{i}<y_{i}\leq n for 1ik1\leq i\leq k; moreover, if such an expression for σ\sigma exists, then it is unique.

Proof.

First of all, given a positive integer nn, we show the fact that any product of k<nk<n transpositions with the properties described in the statement is the product of nkn-k disjoint cycles by induction on kk. Given a positive integer i<ni<n, consider a permutation τ𝒮n\tau\in\mathcal{S}_{n} of the form τ=(α1,β1)(αi1,βi1)\tau=(\alpha_{1},\beta_{1})\cdot\ldots\cdot(\alpha_{i-1},\beta_{i-1}) where α1<α2<<αi1\alpha_{1}<\alpha_{2}<\ldots<\alpha_{i-1} and αj<βj\alpha_{j}<\beta_{j} for 1ji11\leq j\leq i-1; assume that τ\tau has ni+1n-i+1 cycles and that each jj in {1,,n}{α1,,αi1}\{1,\ldots,n\}\setminus\{\alpha_{1},\ldots,\alpha_{i-1}\} is the maximum element of the cycle it belongs to in the factorisation of τ\tau as a product of disjoint cycles. Consider any permutation of the form τ(αi,βi)\tau\cdot(\alpha_{i},\beta_{i}) where nβi>αi>αi1n\geq\beta_{i}>\alpha_{i}>\alpha_{i-1}. The elements αi,βi\alpha_{i},\beta_{i} must belong to separate cycles in τ\tau and each must be the maximum of its own cycle, as neither can be in {α1,,αi1}\{\alpha_{1},\ldots,\alpha_{i-1}\}. Denote by cαc_{\alpha} the cycle of τ\tau containing αi\alpha_{i} and by cβc_{\beta} the one containing βi\beta_{i}; the disjoint cycles of the permutation τ(αi,βi)\tau\cdot(\alpha_{i},\beta_{i}) are those of τ\tau, with the cycles cαc_{\alpha} and cβc_{\beta} merged into one.

Note that the property that elements not in {α1,,αi}\{\alpha_{1},\ldots,\alpha_{i}\} are maximal in their cycles holds for τ(αi,βi)\tau\cdot(\alpha_{i},\beta_{i}). Indeed, if some element is not in {α1,,αi1}\{\alpha_{1},\ldots,\alpha_{i-1}\} and does not belong to cαc_{\alpha} or cβc_{\beta}, then it is maximal in its cycle in τ\tau, and thus it is maximal in the same cycle of τ(αi,βi)\tau\cdot(\alpha_{i},\beta_{i}). The elements αi\alpha_{i} and βi\beta_{i}, which do not appear in the list {α1,,αi1}\{\alpha_{1},\ldots,\alpha_{i-1}\}, are the maximums of the cycles cαc_{\alpha} and cβc_{\beta}, respectively. It follows that the element βi\beta_{i} is maximal in its cycle of τ(αi,βi)\tau\cdot(\alpha_{i},\beta_{i}), which is the union of cαc_{\alpha} and cβc_{\beta}. Finally, suppose by contradiction that an element of the new cycle other than βi\beta_{i} belongs to {1,,n}{α1,,αi}\{1,\ldots,n\}\setminus\{\alpha_{1},\ldots,\alpha_{i}\}; this would imply that it was maximal in its cycle in τ\tau, and therefore in cαc_{\alpha} or cβc_{\beta}; it would thus belong to {βi,αi}\{\beta_{i},\alpha_{i}\}, which we have excluded.

It follows from the fact that the identity of 𝒮n\mathcal{S}_{n} (the product of zero transpositions) consists of nn disjoint cycles of length one (and is thus such that each element in {1,,n}\{1,\ldots,n\} is maximal in its cycle) that any permutation of the form described in the statement is the product of nkn-k disjoint cycles.

Conversely, consider a permutation σ𝒮n\sigma\in\mathcal{S}_{n} made up of nk<nn-k<n disjoint cycles; let us determine positive integers x1,,xk,y1,,ykx_{1},\ldots,x_{k},y_{1},\ldots,y_{k} between 11 and nn such that x1<<xkx_{1}<\ldots<x_{k}, xi<yix_{i}<y_{i} for 1ik1\leq i\leq k, and σ=(x1,y1)(xk,yk)\sigma=(x_{1},y_{1})\cdot\ldots\cdot(x_{k},y_{k}). The element x1x_{1} must necessarily be the smallest one that is not fixed by σ\sigma. The element y1y_{1} must be such that σ(y1)=x1\sigma(y_{1})=x_{1}, as x1x_{1} is not moved by any of the other transpositions. Consequently, we have y1=σ1(x1)>x1y_{1}=\sigma^{-1}(x_{1})>x_{1}. We can repeat this procedure recursively on the permutation (x1,σ1(x1))σ(x_{1},\sigma^{-1}(x_{1}))\cdot\sigma (which fixes all elements up to and including x1x_{1}) to obtain the subsequent transpositions (x2,y2),,(xk,yk)(x_{2},y_{2}),\ldots,(x_{k},y_{k}).   

As a result, from our initial RTCN we have constructed a pair given by a ranked binary tree and a permutation in 𝒮1\mathcal{S}_{\ell-1} which, by applying Lemma 1 with n=1n=\ell-1, we find to be a product of as many disjoint cycles as the number of branching events in the original RTCN (that is, 1k\ell-1-k). We have already sketched how we can revert each change made to recover the original RTCN, but we give a more complete description of the inverse construction below.

From a ranked binary tree and a permutation to a RTCN.

We are given a ranked binary tree with \ell labeled leaves and a permutation σ\sigma in 𝒮1\mathcal{S}_{\ell-1} with 1k\ell-1-k cycles.

First of all, by Lemma 1 we can express σ\sigma in a unique way as a product of transpositions of the form previously described, as (x1,y1)(xk,yk)(x_{1},y_{1})\cdot\ldots\cdot(x_{k},y_{k}) with 0<x1<<xk0<x_{1}<\ldots<x_{k} and xi<yinx_{i}<y_{i}\leq n for 1ik1\leq i\leq k. We then obtain our pairs (ai,bi)i=1k(a_{i},b_{i})_{i=1}^{k} as ai=xia_{i}=x_{i}, bi=yixib_{i}=y_{i}-x_{i}.

All that is left to do is replace the branching events ak,,a1a_{k},\ldots,a_{1} in our ranked binary tree (numbered from the bottom) by reticulation events. We can still number our lineages from 11 to i\ell-i above branching event ii (again, think of the order induced by assigning to the parent the label of the smaller child).

For i=k,,1i=k,\ldots,1 in turn, consider lineages 11 to ai\ell-a_{i} right above branching event aia_{i} and let ll be the label assigned to the parent of the branching event aia_{i}; pick the past of lineage bi+sb_{i}+s where s=1s=1 if bilb_{i}\geq l and s=0s=0 otherwise, to become the past of the larger child issued from the branching event (while the past of the branching event becomes the past of the smaller child), and make the future of lineage bi+sb_{i}+s into a hybridised version of the two lineages involved in the branching event.

This operation yields a RTCN and is precisely the inverse of the one previously described (see again Figure 10 where one can follow the steps backwards to recover the RTCN from the final ranked binary tree).

4 Ranked tree containment and phylogenetic trees

In this final section, we give a bijection between the set of RTCNs which contain a fixed ranked tree and the set of phylogenetic trees; see the last two paragraphs of Section 1.

We first recall how all RTCNs which contain a fixed ranked tree TT are constructed from TT; see [1] where this was used to prove (2). Starting from the leaves and moving back in time, for every branching event of TT a decision is taken whether or not the branching event is turned into a reticulation event and if yes, which of the (future) lineages not in the branching event is connected with which of the (future) lineages from the branching event; see Figure 11-(b) for an example. We will call the resulting RTCN NN and will directly translate each step in the above process of building NN from TT into a corresponding step for building a phylogenetic tree (time will, however, be reversed).

Refer to caption
Figure 11: (a) A ranked tree TT on 66 leaves which is redrawn such that the leaf labels are in increasing order; (b) A RTCN NN containing the ranked tree.

In order to describe this, we first draw the ranked tree TT so that all the leaves are in increasing order when read from left to right; see Figure 11-(a). Then, we consider the resulting NN (with the same order of the leaves). Now, starting from a tree consisting of a root with one child labeled by 11, we build the (rooted) tree τ\tau with leaf labeled by 1,,1,\ldots,\ell and internal nodes labeled by 1¯,,1¯\overline{1},\ldots,\overline{\ell-1} where \ell is the number of labels of TT and we assume that 1<<<1¯<<1¯1<\cdots<\ell<\overline{1}<\cdots<\overline{\ell-1}, by processing the events from NN top-down. More precisely, for the kk-th event in NN, we do the following:

(a)

If the kk-th event is a branching event, we insert a node with label k¯\overline{k} into the root edge of τ\tau and attach to this node a leaf with label k+1k+1;

(b)

If the kk-th event is a reticulation event that was created from the kk-th branching event of TT by connecting the ii-th (future) lineage (not counting the lineages of the kk-th branching event), then, we insert a node with label k¯\overline{k} into one of the edges to the children of the internal node with label i¯\overline{i} of τ\tau where the child with the smaller (larger) label is chosen depending on whether the left or the right lineage of the kk-th branching event of TT was used to form the reticulation event; moreover, we again attach a leaf with label k+1k+1 to the inserted node.

See Figure 12 for a visualization of the the above construction for the RTCN from Figure 11-(b).

Refer to caption
Figure 12: The rooted labeled tree constructed from the RTCN from Figure 11-(b) which contains the ranked tree from Figure 11-(a).

Next, we remove the labels of all international nodes and the root edge of τ\tau. The resulting tree is a phylogenetic tree; see the tree on the right of Figure 13. This tree is the image of NN.

Refer to caption
Figure 13: The image of the RTCN from Figure 11-(b) under the bijection.

Finally, it is not hard to see that the above construction is reversible: e.g., if the phylogenetic tree on the right of Figure 13 is given, find the largest label (55); its parent will have label 4¯\overline{4}. Then, remove the largest label, the parent label and the edge which joins them. Continue until all internal nodes have received a label (where the number of the label is decreased by 11 in every step). This gives the right-most tree in Figure 12 which unambiguously encodes the whole tree construction process from Figure 12. Clearly, this process can be used to re-construct NN.

Acknowledgments

We thank François Bienvenu for putting us into contact. Moreover, we thank the two anonymous referees for a careful reading. The second author acknowledges financial support by the Ministry of Science and Technology, Taiwan under the research grant MOST-109-2115-M004-003-MY2; the third author was partially supported by the same funding agency under the grant MOST-110-2115-M-017-003-MY3.

References

  • [1] F. Bienvenu, A. Lambert, M. Steel. Combinatorial and stochastic properties of ranked tree-child networks, Random Struct. Algor., in press. https://doi.org/10.1002/rsa.21048
  • [2] M. Bouvel, P. Gambette, M. Mansouri (2020). Counting phylogenetic networks of level 1 and 2, J. Math. Biol., 81:6-7, 1357–1395.
  • [3] C. McDiarmid, C. Semple, D. Welsh (2015). Counting phylogenetic networks, Ann. Comb., 19:1, 205–224.
  • [4] M. Fuchs, B. Gittenberger, M. Mansouri (2019). Counting phylogenetic networks with few reticulation vertices: tree-child and normal networks, Australas. J. Combin., 73, 385–423.
  • [5] M. Fuchs, B. Gittenberger, M. Mansouri (2021). Counting phylogenetic networks with few reticulation vertices: exact enumeration and corrections, Australas. J. Combin., 82:2, 257–282.
  • [6] M. Fuchs, E.-Y. Huang, G.-R. Yu. Counting phylogenetic networks with few reticulation vertices: a second approach, arXiv:2104.07842.
  • [7] M. Fuchs, G.-R. Yu, L. Zhang (2021). On the asymptotic growth of the number of tree-child networks, European J. Combin., 93, 103278, 20pp.
  • [8] M. Fuchs, G.-R. Yu, L. Zhang. Asymptotic enumeration and distributional properties of galled networks, arXiv:2010.13324.
  • [9] D. H. Huson, R. Rupp, C. Scornavacca. Phylogenetic Networks: Concepts, Algorithms and Applications, Cambridge University Press, 1st edition, 2010.
  • [10] C. Semple and M. Steel. Phylogenetics, Oxford University Press, Oxford, 2003.
  • [11] M. Steel. Phylogeny—Discrete and Random Processes in Evolution, CBMS-NSF Regional Conference Series in Applied Mathematics, 89, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2016.