Parameter Identifiability of a Multitype Pure-Birth Model of Speciation
Abstract
Diversification models describe the random growth of evolutionary trees, modeling the historical relationships of species through speciation and extinction events. One class of such models allows for independently changing traits, or types, of the species within the tree, upon which speciation and extinction rates depend. Although identifiability of parameters is necessary to justify parameter estimation with a model, it has not been formally established for these models, despite their adoption for inference. This work establishes generic identifiability up to label swapping for the parameters of one of the simpler forms of such a model, a multitype pure birth model of speciation, from an asymptotic distribution derived from a single tree observation as its depth goes to infinity. Crucially for applications to available data, no observation of types is needed at any internal points in the tree, nor even at the leaves.
1 Introduction
Species diversification models are used in biology to make inferences about historical speciation and extinction rates over the time since a group of species, or taxa, evolved from a common ancestor. By providing information on rates of speciation and extinction, inference with these models seeks to give insight into the evolutionary dynamics leading to the present diversity of life. These models have a long history, starting with Yule’s constant-rate pure-birth model [18], and a fairly large literature has developed.
Diversification models describe a process beginning with a single lineage at some time in the past, which as time progresses may speciate or go extinct. When a speciation occurs the edge bifurcates into two edges, with the number of lineages increasing by 1. When an extinction occurs, the lineage ends, and the number of lineages decreases by 1. After either event, the process continues forward, independently on all lineages, producing a growing tree structure until the present time is reached. This tree, which has both topological and metric structure, constitutes an observation. (In applications, it may be necessary to consider the reconstructed tree, which is obtained by removing all tree edges with no descendants at the present [12, 5].)
Two basic sorts of these models have found common use in empirical studies. In the first, the speciation and extinction rates are functions of time, and apply to all taxon lineages present at any moment. This can be thought of as modeling exogenous factors, such as environmental conditions, that affect all taxa in the tree identically. Since all lineages behave in the same probabilistic way at any moment, it is not hard to show that the exact branching pattern of the tree structure is irrelevant, with all the information in a tree observation being captured by the number of lineages as a function of time. Thus the work on time-dependent birth-death models by Kendall is foundational [7].
In the second sort of diversification model, which we call the multitype birth-death tree model, lineages are assigned one of a finite number of types at each moment, with the model’s speciation and extinction rates dependent only on the type. Over time, however, species may change types at fixed switching rates. This models endogenous factors, such as a particular biological trait a taxon may possess, including, for instance, a morphological feature, behavior, or whether a particular gene is present and active in an organism. A given type might correlate with faster or slower speciation than another, and/or affect the extinction rate. For these models the branching structure of a tree observation does matter, as taxa present at a given time may each have different types, and thus different tendencies to speciate or go extinct.
The Binary State-specific Speciation and Extinction (BiSSE) model formalized the multitype framework for biological applications [10]. Multitype (MuSSE) and quantitative-type (QuaSSE) variants of the model were subsequently proposed [4]. Although these works assumed the type is observed for the extant taxa at the leaves of a tree, we consider the multitype birth-death tree model with no type information observable for any lineage at any time, as type observations are unnecessary for our results. Indeed, the usefulness of these models to infer correlation between observed types and diversification rates from data with type information for extant taxa has been called into question [14].
Many other diversification models have been proposed, combining or extending these basic frameworks, with [16] offering one review. New variants continue to be developed, e.g., [11, 17, 15, 3].
When these models are used for inference, the data is taken as a single tree assumed to show the true evolutionary relationships of the taxa. (In practice, this tree itself must be inferred, usually from sequence data using phylogenetic and/or phylogenomic methods which we do not discuss here.) Multiple trees which one can reasonably hypothesize were generated with the same parameter values are simply not available. If the tree is sufficiently large, researchers hope it provides enough information to infer the speciation and extinction parameters reasonably well. More precisely, it has been implicitly assumed that the inference is statistically consistent, in the sense that as the number of taxa increases toward infinity (i.e., the tree grows larger), the probability of inferring model parameters arbitrarily close to the generating ones approaches 1. Establishing such a result, however, requires showing identifiability of the model parameters: A distribution derived from an observation of a single tree has a limit, as the number of taxa approaches infinity, that uniquely determines all parameter values.
Of course, a full proof of the statistical consistency of a particular estimator requires additional arguments. For instance, the standard results on the consistency of maximum likelihood assume the availability of multiple independent samples, and therefore cannot be applied. Leroux’s result on the consistency of maximum likelihood inference from a single sequence of observations from a Hidden Markov Model [8] is analogous to what is needed for applications of these diversification models. Nonetheless, establishing parameter identifiability is the first step toward this goal.
Recent work has shown that the first type of diversification model, with time-dependent rates, does not in fact have identifiable parameters, despite its widespread use by empiricists [9]. This result, which holds even if one allows for identification to be based on arbitrarily many independent tree observations with the same underlying rate parameters, was compellingly illustrated by construction of examples of wildly different rate functions producing identical tree distributions. An instance of this lack of identifiability had in fact appeared earlier, in an argument in which speciation rates were modified and extinction rates set to zero without changing the model distribution [12].
Little work, however, has addressed identifiability questions for multitype birth-death tree models. The strongest results on parameter identifiability for a pure birth model focus on a tree’s topological features but assume the types of both leaf nodes and their parents are observed [13]. In biological applications, however, the type of a leaf of the tree may be observable, but the type of the parent nodes is virtually never known. Thus no identifiability result relevant to typical data analyses has been produced.
One might hope that the analysis of multitype birth-death tree models would be simpler than for a time-dependent rate model, as its parameter space is finite dimensional. On the other hand, while trees produced by the time-dependent rate models can be summarized by the counts of lineages through time with no loss of information, this is not true for the multitype models. Effectively extracting information from a tree with both topological and metric structure requires a new approach.
In this paper, we investigate parameter identifiability of the multitype pure-birth tree (MPBT) model with any finite number of types. We thus restrict extinction rates for all classes to be zero. This model has also been called the multitype Yule model [13]. We assume only that the metric tree is observable, with no information on the types either at points internal to the tree or at the leaves. More formally, we establish generic identifiability of parameters up to label swapping. “Generic” means the result holds if we exclude parameters lying in a measure-zero subset of the parameter space. We give an explicit characterization of such a measure-zero exceptional set, as the zero set of a certain polynomial. “Up to label swapping” means that there are certain symmetries of the parameter space, arising from interchanging types, so that their corresponding speciation and switching rates are also interchanged, that have no effect on the model’s behavior. Generic identifiability up to label swapping is often the strongest form of identifiability that holds in models with hidden variables [1], and since we treat the types as unobservable, its appearance here is not surprising.
Our arguments draw on several earlier works. The first is [2] on Multitype Continuous Time Markov Branching Processes. In fact, these models and the MPBT model have the same underlying structure. But much of the classical branching process literature allows only for observing type counts over time, and not for observing the tree structure indicating the branching of specific lineages. The MPBT model, in contrast, treats the tree structure as observable, with type information hidden. Thus while providing an important tool in this work, the results of [2] are not immediately applicable to the MPBT model.
The second result crucial to our work is a general theorem on identifiability up to label swapping of parameters of a mixture model of product distributions [1]. In applying this to the MPBT model, we consider the joint distribution of edge lengths around a node on a uniformly-at-random chosen edge of a random tree, as the random tree grows arbitrarily large. Due to conditional independence of edge lengths conditioned on the type of the shared node, this joint distribution takes the form of a mixture distribution (over types) of product distributions. Although additional work is necessary to show parameter identifiability, this theorem is a crucial ingredient in our argument.
Although we do not address the multitype birth-death tree model with non-zero extinction rates here, we believe that our approach provides a pathway toward a more general result.
This paper is structured as follows. In Section 2 we provide a more formal definition of the MPBT model, and begin its analysis by deriving formulas related to the generation of a single edge in the tree in Section 3. Section 4 uses the results in [2] to obtain asymptotic results on the distribution of types across lineages in the tree at times increasingly distant from the root of the tree. Then, in Section 5, we bring these ingredients together, and apply the theorem of [1] to obtain our main results. Concluding remarks appear in Section 6.
2 Model definition
In this section we formalize the Multitype Pure-Birth Tree model, in a form useful for our analysis.
Let be a positive integer denoting the number of types, and denote the set of types by .
The parameter space of the MPBT model with types is all 3-tuples described as follows:
A root distribution , with , gives probabilities of type being chosen for the tree root. A vector with non-negative entries gives speciation rates for type . An matrix with non-negative off-diagonal entries and rows summing to 0 gives scalar type switching rates from type to type , . Note that is determined by the independent scalar switching rates.
2.1 The edge process model
We first describe how an edge of a tree is produced under the model. As edges of the tree are produced independently conditioned on their starting types, a description of a single edge is sufficient.
We view an edge as growing with time, randomly changing the type of its leading point as it does so. At any time the edge may speciate, at a rate determined by its current type . When speciation occurs, the edge ceases to grow, and in the full model two new edge processes are started for its descendent edges. However, in formalizing the edge process we describe the speciation of an edge as the process entering an absorbing state, for mathematical convenience.
For each type , define two states . At any time, state indicates that the current leading point of the edge has type and that the edge has not yet speciated. The absorbing state represents that a speciation has occurred and at the time of speciation the leading point had type . The parameter , , is thus a rate of change from state to state , while is the rate of change from state to . No other instantaneous state changes are allowed.
Definition 2.1.
The -type pure-birth edge process with , , is the -state continuous-time Markov process over with states
initial state distribution , and transition rate matrix
where the rows and columns of are ordered by states as above. Here is a vector or matrix of 0s, and is the diagonal matrix formed from vector .
The transition probability matrix associated to is
with giving the probability that an edge is in state at time given that it was in state at time .
Definition 2.2.
The speciation time associated to is the -valued random variable
A realization of the edge process that reaches a “+” state is viewed as an edge of length , the time at which a speciation occurs. Each point (time ) along the edge is “colored” by type if the process is in state (or state at its endpoint) at that time. Under mild assumptions, the edge length is finite with probability 1, as is shown below. Although for the MPBT model colors on edges are ultimately hidden, they play an important role in our arguments.
The terminal edges of the tree are produced by terminating edge processes at a specific time, before they may have reached an absorbing state. Formally defining such a truncated edge process and the colored edge it produces is straightforward.
Due to the time-homogeneous Markov formulation of the edge process, we may equivalently produce an edge either from a single process reaching a “+” state, or by starting the process, truncating it before it enters a “+” state, starting a new process in the final state of the truncated one, and then conjoining the edges produced. Likewise, to produce an edge from the truncated process, we may allow the process to continue to a later time, and then truncate the edge that was produced to an initial segment.
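The edge process described in this subsection can be simulated with a standard Gillespie-style loop. The following is only a minimal sketch under stated assumptions: the names `simulate_edge`, `lam` (speciation rates), and `Q` (switching-rate matrix with rows summing to 0) are hypothetical, types are indexed from 0, and all speciation rates are taken to be positive.

```python
import random

def simulate_edge(start_type, lam, Q, rng, max_time=None):
    """Simulate one colored edge of the pure-birth edge process.

    lam[i] is the assumed speciation rate of type i, and Q[i][j] (i != j) the
    assumed switching rate from type i to type j. Returns (length, segments,
    speciated): segments is a list of (type, duration) pairs coloring the edge,
    and speciated is False when the edge was truncated at max_time.
    """
    n = len(lam)
    t, i, segments = 0.0, start_type, []
    while True:
        switch_rates = [Q[i][j] if j != i else 0.0 for j in range(n)]
        total = lam[i] + sum(switch_rates)
        wait = rng.expovariate(total)          # time until the next event
        if max_time is not None and t + wait >= max_time:
            segments.append((i, max_time - t))
            return max_time, segments, False   # truncated edge, no "+" state reached
        t += wait
        segments.append((i, wait))
        if rng.random() * total < lam[i]:
            return t, segments, True           # speciation: absorbing "+" state of type i
        # Otherwise the leading point switches type.
        i = rng.choices(range(n), weights=switch_rates)[0]

rng = random.Random(0)
length, segments, speciated = simulate_edge(0, [1.0, 3.0], [[-1.0, 1.0], [2.0, -2.0]], rng)
```

Passing `max_time` gives the truncated edge process used for terminal edges; the returned segments record the hidden coloring that is discarded when the tree is uncolored.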
2.2 The multitype pure-birth tree model
We now define the MPBT model, as a generative model producing a tree. Let be the depth (length of all paths from root to any tip) of the tree to be sampled.
1. The process begins with a root node. With parameters generate from an edge process a colored descendent edge from the root to a node of type , the only current tip of the tree.
If the length of this edge is , truncate it to length , and go to Step 4.
Otherwise, at this node attach two descendent edges of length 0, with points on them colored by . The tree now has 2 tips.
2. If the tree currently has tips, for each tip generate a descendent edge via independent edge processes with parameters , where is the type of the tip and the standard basis vector in . Truncate all edge processes at the time when the first reaches a “+” state. The colored edges for each tip are conjoined to the edges (possibly of length 0) leading to the tip.
If the path length from the root to a tip of the tree is , truncate all terminal edges so that all paths from root to leaves have length , and go to Step 4.
Otherwise, at the tip that arose from reaching state , we attach two descendent edges of length 0 with points on them colored by .
3. Go to Step 2.
4. Uncolor all edges to obtain a sampled tree.
An example simulation of a colored tree from a binary-type model is shown in Figure 1, with the color hidden in Figure 2.
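The generative steps above can be sketched in code. By the equivalence noted at the end of Section 2.1, growing each edge with its own independent edge process and truncating at the tree depth should give the same distribution as the step-by-step procedure; the sketch below relies on that. All names (`grow_edge`, `simulate_tree`, `pi`, `lam`, `Q`, `depth`) are hypothetical, types are indexed from 0, and for brevity only edge lengths are returned, not the tree topology.

```python
import random

def grow_edge(i, lam, Q, budget, rng):
    """Run the edge process from type i for at most `budget` time units.
    Returns (length, end_type, speciated)."""
    n = len(lam)
    t = 0.0
    while True:
        rates = [Q[i][j] if j != i else 0.0 for j in range(n)]
        total = lam[i] + sum(rates)
        t += rng.expovariate(total)
        if t >= budget:
            return budget, i, False            # truncated terminal edge
        if rng.random() * total < lam[i]:
            return t, i, True                  # speciation while in type i
        i = rng.choices(range(n), weights=rates)[0]   # type switch

def simulate_tree(pi, lam, Q, depth, rng):
    """Return (internal_edge_lengths, terminal_edge_lengths) of an uncolored tree."""
    root_type = rng.choices(range(len(pi)), weights=pi)[0]
    stack = [(root_type, 0.0)]                 # (start type, start time) of pending edges
    internal, terminal = [], []
    while stack:
        i, start = stack.pop()
        length, end_type, speciated = grow_edge(i, lam, Q, depth - start, rng)
        if speciated:
            internal.append(length)
            # Both child edges start in the type at which speciation occurred.
            stack += [(end_type, start + length)] * 2
        else:
            terminal.append(length)
    return internal, terminal

rng = random.Random(42)
internal, terminal = simulate_tree([0.5, 0.5], [1.0, 3.0],
                                   [[-1.0, 1.0], [2.0, -2.0]], 2.0, rng)
```

Types are used only internally and never returned, which mirrors Step 4: the sampled tree carries no coloring.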
Remark.
Inherent in the model are several notions of time. For an individual edge process, is a time variable, with at the parental node in the edge. For the tree generation process overall, we use as the time variable, with at the root. If the edge process starting at the root enters a “+” state at time , then that root edge has length and at its child node . Then if the edge process for an edge descending from the first speciation produces an edge of length , then at its child node . In general, a point on any edge at time has
We can thus view a random tree as growing with time , as its terminal edges lengthen while changing type, and speciate.
Remark.
While we have defined the MPBT model as starting with a single edge descending from the root node, it is equally common to define diversification models starting at a bifurcating root. The modifications to the definition that are necessary to do so are straightforward, and working in that context would have no substantive impact on the arguments which follow.
Remark.
Even if , a single observed tree does not allow for the identification of , so we focus on identifying the pair . This factor of the parameter space can be identified with the non-negative orthant of .

3 The edge process
For parameters , let and , so that the edge process has Markov rate matrix
Lemma 3.1.
The transition probability matrix for is
where satisfies .
Proof.
For
so
∎
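For readers who want the computation behind Lemma 3.1 spelled out, the following sketch may help. The symbols are assumptions made for illustration and may differ from the notation of the lemma: $\Lambda$ denotes the diagonal matrix of speciation rates, $M = Q - \Lambda$ the upper-left block of the rate matrix, $R$ the full rate matrix, and $B(t)$ the upper-right block of its exponential.

```latex
% Assumed notation: \Lambda = \mathrm{diag}(\lambda), M = Q - \Lambda.
\[
R = \begin{pmatrix} M & \Lambda \\ 0 & 0 \end{pmatrix},
\qquad
R^k = \begin{pmatrix} M^k & M^{k-1}\Lambda \\ 0 & 0 \end{pmatrix}
\quad (k \ge 1),
\]
\[
e^{tR} = I + \sum_{k \ge 1} \frac{t^k}{k!} R^k
       = \begin{pmatrix} e^{tM} & B(t) \\ 0 & I \end{pmatrix},
\qquad
B(t) = \sum_{k \ge 1} \frac{t^k}{k!} M^{k-1}\Lambda,
\]
\[
M\,B(t) = \sum_{k \ge 1} \frac{t^k}{k!} M^{k}\Lambda = \left(e^{tM} - I\right)\Lambda .
\]
```

Stating the result through the relation $M\,B(t) = (e^{tM} - I)\Lambda$ avoids inverting $M$, whose non-singularity is only established later under Assumption 1.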
For technical reasons we impose the following assumption, which is also biologically plausible.
Assumption 1.
The speciation rates are positive for all .
Lemma 3.2.
Let (, ) be parameters for an MPBT edge process satisfying Assumption 1. Then is non-singular and all eigenvalues of have negative real part.
Proof.
The assumption implies that is strictly diagonally dominant, that is, the absolute value of each diagonal entry is strictly greater than the sum of the absolute values of all other entries in its row. Thus is non-singular [6]. Since the diagonal entries are also negative, by the Gershgorin Circle Theorem every eigenvalue of will have negative real part. ∎
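The diagonal-dominance argument can be checked numerically on an example. In the sketch below, `lam` and `Q` are hypothetical names for the speciation rates and switching-rate matrix, and the matrix of the lemma is assumed to be the upper-left block `Q - diag(lam)` of the edge-process rate matrix. Each Gershgorin disc is centered at `Q[i][i] - lam[i]` with radius `-Q[i][i]`, so its rightmost point is `-lam[i]`.

```python
def edge_generator(lam, Q):
    """Upper-left block Q - diag(lam) of the edge-process rate matrix (assumed form)."""
    n = len(lam)
    return [[Q[i][j] - (lam[i] if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def gershgorin_bounds(M):
    """For each row, the largest real part permitted by its Gershgorin disc."""
    n = len(M)
    return [M[i][i] + sum(abs(M[i][j]) for j in range(n) if j != i)
            for i in range(n)]

lam = [1.0, 3.0]
Q = [[-1.0, 1.0], [2.0, -2.0]]
bounds = gershgorin_bounds(edge_generator(lam, Q))
# Every bound equals -lam[i], so it is negative whenever Assumption 1 holds.
assert all(b < 0 for b in bounds)
```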
Proposition 3.3.
Let denote the cdf of the speciation time conditioned on , and be the vector of s. Then is given by the -th entry of
Moreover, under Assumption 1, is finite with probability .
Proof.
Since is the time first enters any of the absorbing states , is the sum across the row of the upper right block of . From Lemma 3.1, using that , the column vector of the s is therefore given by
Proposition 3.4.
Let denote the asymptotic probability of transition to conditioned on . Then under Assumption 1, is the -entry of .
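As a numerical illustration of Proposition 3.4, the asymptotic transition probabilities can be computed by solving a linear system. This is a sketch under assumptions: with `M = Q - diag(lam)` (hypothetical names), letting time go to infinity in the block formula of Lemma 3.1 suggests that the matrix of these probabilities solves `M X = -diag(lam)`. The helper `solve` is a plain Gauss-Jordan routine written for this example.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def speciation_type_probs(lam, Q):
    """Entry [i][j]: probability that an edge starting in type i speciates in type j."""
    n = len(lam)
    M = [[Q[i][j] - (lam[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
    neg_Lam = [[-lam[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return solve(M, neg_Lam)

P = speciation_type_probs([1.0, 3.0], [[-1.0, 1.0], [2.0, -2.0]])
```

Each row of `P` should sum to 1, reflecting that under Assumption 1 every edge eventually speciates.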
4 Type Counting Process
Another ingredient of our approach to establishing the identifiability of MPBT model parameters is an analysis of an associated classical branching process, in which only the type counts are observed. More specifically, it records the number of edges of the tree which have each type as a function of time, but retains no information on the topology of the tree. We call this the type counting process, and in this section use established results to determine the asymptotic behavior of the relative frequencies of each type.
Definition 4.1.
For , let denote the number of edges in a colored random tree arising from the colored MPBT model that exist at time and are of type at that moment. The type counting process is the -valued continuous-time stochastic process over defined by . The relative frequency process is , provided the denominator is non-zero.
The asymptotics of the relative frequencies follow from results of [2] on multitype continuous-time Markov branching processes, specifically Theorems 1 and 2 of that work, which are paraphrased below as Theorem 4.5. Such a model can be described as a process where individuals of type live an exponentially-distributed length of time (whose rate only depends on type) and on death may be replaced by individuals of any type according to a distribution over .
To place the type counting process of the MPBT model into this framework, both speciation and change in type are viewed as deaths. Speciation results in replacement by 2 individuals of the same type, and change in type results in replacement by an individual of a different type. Since a speciation “death” of a type individual occurs with rate , and a type change “death” of a type individual followed by replacement with type occurs with rate , the combined rate of death for type is . When a death occurs, it is a speciation with probability
and a change to type with probability
Basic properties of the type counting process are summarized in the following.
Lemma 4.2.
The type counting process of the MPBT model is a strong Markov, continuous-time, -type branching process, where each type death has an offspring distribution defined by the multivariable probability generating function
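The embedding of the type counting process into the branching-process framework can be made concrete with a short sketch. It assumes the reading given in the text: a type-`i` individual dies at total rate `lam[i] - Q[i][i]`, the death is a speciation (two type-`i` offspring) with probability `lam[i]` divided by that total, and a switch to type `j` (one type-`j` offspring) with probability `Q[i][j]` divided by that total. The function names are hypothetical.

```python
def death_law(i, lam, Q):
    """Return (total_rate, p_speciation, p_switch) for a type-i individual."""
    n = len(lam)
    total = lam[i] - Q[i][i]   # lam[i] plus the off-diagonal rates of row i
    p_switch = [Q[i][j] / total if j != i else 0.0 for j in range(n)]
    return total, lam[i] / total, p_switch

def offspring_pgf(i, s, lam, Q):
    """Offspring generating function of a type-i death, evaluated at the vector s."""
    _, p_spec, p_switch = death_law(i, lam, Q)
    # Speciation contributes s_i^2; a switch to type j contributes s_j.
    return p_spec * s[i] ** 2 + sum(p * sj for p, sj in zip(p_switch, s))

lam = [1.0, 3.0]
Q = [[-1.0, 1.0], [2.0, -2.0]]
total, p_spec, p_switch = death_law(0, lam, Q)
```

The generating function has degree 2 in `s`, which is the degree condition that Theorem 4.5 places on the offspring distribution.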
We introduce yet another matrix defined in terms of the MPBT model parameters, as its leading eigenvalue and corresponding eigenvector play a large role in the counting process’s behavior.
Definition 4.3.
Given parameters of the MPBT model, let
A leading eigenvalue of is an eigenvalue, , with the largest real part, and a normalized leading left eigenvector of , is a left eigenvector for with .
The matrix is the infinitesimal generator of the conditional expectation of the s. More precisely,
with
where is the -th standard basis vector.
We will shortly show and are uniquely determined, under an additional assumption.
Assumption 2.
The off-diagonal entries of are positive, i.e., for .
Lemma 4.4.
For parameters of the MPBT model satisfying Assumption 2,
-
1.
has positive entries for .
-
2.
A has a unique leading eigenvalue , which is both simple and real. Moreover the corresponding normalized left eigenvector can be chosen to have all positive components.
Proof.
Fix . Then, using Assumption 2, has positive off-diagonal entries, so there is a real such that has positive entries. Since commute, it follows that . Since has positive entries, does as well. Thus, has positive entries.
The Perron-Frobenius Theorem applied to shows it has a unique dominant (i.e., of maximal absolute value) eigenvalue which is also positive and simple, with a unique normalized left eigenvector whose components are all positive. Since has the same eigenvectors, and eigenvalues shifted by and scaled by , the second claim follows. ∎
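The shift argument in this proof also suggests a simple way to compute the leading eigenvalue and normalized left eigenvector numerically, by power iteration on a shifted, entrywise positive matrix. The sketch below assumes that the matrix of Definition 4.3 is `A = Q + diag(lam)`, which is the mean-growth generator implied by the description of speciation and switching; that form, and all names used, are assumptions rather than the paper's notation.

```python
def leading_left_eigen(lam, Q, iters=200):
    """Power iteration for the leading eigenvalue and sum-normalized left eigenvector."""
    n = len(lam)
    A = [[Q[i][j] + (lam[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
    # With positive off-diagonal rates (Assumption 2), A + shift*I is entrywise positive.
    shift = 1.0 + max(abs(A[i][i]) for i in range(n))
    v = [1.0 / n] * n
    rho = 0.0
    for _ in range(iters):
        # Left multiplication: w = v (A + shift*I).
        w = [sum(v[i] * (A[i][j] + (shift if i == j else 0.0)) for i in range(n))
             for j in range(n)]
        total = sum(w)          # estimate of the leading eigenvalue of A + shift*I
        v = [x / total for x in w]
        rho = total - shift     # undo the shift
    return rho, v

rho, v = leading_left_eigen([1.0, 2.0], [[-1.0, 1.0], [1.0, -1.0]])
```

The fixed iteration count is a simplification; a convergence test on successive iterates would be more robust when the two largest eigenvalues of the shifted matrix are close.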
Key properties of the counting process follow from the following more general theorem on classical branching processes.
Theorem 4.5.
[2] Let be a strong Markov, continuous-time, -type branching process over which takes values in . Let be the conditional expectation matrix. Let be the offspring probability generating function for type .
If has positive entries for some , and is of degree for all , then as ,
where is a non-negative random variable, is the leading eigenvalue of , and is the positive normalized left eigenvector of associated with .
Moreover, if are random variables with generating functions , then
(1)
for all if and only if for all
Corollary 4.6.
Proof.
Using the assumptions and Lemmas 4.2 and 4.4, the hypotheses of Theorem 4.5 are met, including inequality (1). Thus
where is the leading eigenvalue of , is its positive normalized left eigenvector, and is a non-negative random variable.
Since the random variable is non-decreasing, the probability of extinction is zero:
Thus we find , implying regardless of . Then by the continuous mapping theorem,
for each . ∎
Remark.
In studying diversification models with a single type but time-dependent rates of speciation and extinction, it is common to consider the random function giving the number of lineages through time in a tree. This loses no information on parameters from the full tree, as each change in its value (speciation or extinction) is equally likely to have occurred on any lineage, and the growth of this function is thus highly informative on parameter values. For the multitype pure-birth model, however, the function should not capture all information in the tree, as speciation may not be equally likely on all lineages. Corollary 4.6 indicates its growth is determined only by , the largest eigenvalue of .
5 Identifiability of the MPBT model
Using the distributions of edge lengths and relative frequencies of each type of edge in a tree at a given time found in Sections 3 and 4, we are ready to establish identifiability of the MPBT parameters. To do so, we consider an asymptotic joint distribution of the lengths of 3 edges around a common node in the tree (see Figure 2). We seek to show that from this distribution the model parameters can be determined, up to label swapping.

Due to the conditional independence of the lengths of three edges sharing a common node, given that node’s type, this distribution is a mixture of product distributions, with the mixing distribution and the components of the products closely related to distributions previously computed. This structure allows for the application of the following theorem, to obtain unmixed distributions of edge lengths conditioned on the type of the parental node. Thus even though we have no observation of type at any point in the tree, we can extract a distribution that is conditioned on type.
The following is a variant of Theorem 8 of [1], with the hypotheses modified as discussed on p. 3116 of that paper.
Theorem 5.1.
[1] For , let
be a product of independent, absolutely continuous distributions on . With , let be a distribution on . For each , suppose the set of distributions has the property that every subset of elements is linearly independent, and that
Then, up to label swapping in , the and are determined by the mixture distribution
More precisely, determines distributions and such that for some permutation of the set ,
To apply this theorem, we make a further technical assumption, denoting the vector of s by .
Assumption 3.
Parameters are such that the matrix
is non-singular.
While the role of this assumption in our arguments will be clear in our proofs of Lemma 5.5 and Theorem 5.6 below, to understand its implications concretely, consider first the case . Then
so
The non-singularity of thus is equivalent to That these speciation rates would need to be different for parameters to be identifiable is intuitively clear, since otherwise type changes governed by would have no impact on the structure of the uncolored tree.
For general , Assumption 3 is equivalent to the non-vanishing of , a degree polynomial in the independent entries of . Its non-vanishing thus excludes an algebraic variety of codimension 1, a set of Lebesgue measure 0 in the unrestricted parameter space. An explicit calculation in the case shows the polynomial to be an irreducible polynomial in the and , .
The non-vanishing of always requires that the vector not be a multiple of (so that the first two columns of are linearly independent), and hence that not all are the same. However, the additional restrictions it imposes on the parameters are more opaque to intuition without considering special cases.
For instance, when , if all the are equal, so the type switching behavior is identical for all types, the polynomial simplifies considerably, and factors as
Non-vanishing of the polynomial then requires that the three be distinct, as one would expect is needed for identifiability, for otherwise several types would behave identically. However, for other choices of the , two of the can be equal without the polynomial vanishing.
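The special cases discussed above can be explored numerically. The sketch below rests on an assumption about the form of the matrix in Assumption 3: it takes that matrix to be equivalent, up to an invertible factor, to the Krylov matrix with columns `1, M1, ..., M^(n-1) 1`, where `M = Q - diag(lam)` and `1` is the all-ones vector. This reading is suggested by the Maclaurin-series argument used in the proofs, but it is not taken from the text, and all names are hypothetical.

```python
def krylov_det(lam, Q):
    """Determinant of the assumed Krylov matrix [1, M1, ..., M^(n-1) 1]."""
    n = len(lam)
    M = [[Q[i][j] - (lam[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
    cols, x = [], [1.0] * n
    for _ in range(n):
        cols.append(x)
        x = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
    K = [[cols[c][r] for c in range(n)] for r in range(n)]
    # Determinant by Gaussian elimination with partial pivoting.
    det = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(K[r][c]))
        if K[p][c] == 0.0:
            return 0.0
        if p != c:
            K[c], K[p] = K[p], K[c]
            det = -det
        det *= K[c][c]
        for r in range(c + 1, n):
            f = K[r][c] / K[c][c]
            K[r] = [a - f * b for a, b in zip(K[r], K[c])]
    return det

# Three types with all switching rates equal to 1 (rows of Q sum to 0).
Q3 = [[-2.0, 1.0, 1.0], [1.0, -2.0, 1.0], [1.0, 1.0, -2.0]]
distinct = krylov_det([1.0, 2.0, 4.0], Q3)   # speciation rates all different
repeated = krylov_det([1.0, 1.0, 4.0], Q3)   # two speciation rates coincide
```

Under this reading, with equal switching rates the determinant is, up to sign, the product of the pairwise differences of the speciation rates, so it vanishes exactly when two of them coincide.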
Next, we define the joint edge length distribution for several edges of a tree.
Definition 5.2.
For some , consider the following three random variables: Sample an (uncolored) tree of depth under the MPBT model. From among the edges of the tree existing at time choose one uniformly at random. Then with , the time at which that edge speciates, let denote the time interval until it speciates, and let and , respectively denote the lengths of the immediate descendent edges (where the edges are designated 1,2 uniformly at random). Then the joint distribution of these three variables , , and is
We call the joint distribution of edge lengths around a node.
The three edge lengths used in the definition of are depicted in Figure 2, for . The conditioning in the definition of ensures it only considers edges in which the edge process has led to speciation, that is, the edge processes for the parental and child edges are not truncated.
Lemma 5.3.
Proof.
Note that the event which is conditioned upon in the definition of excludes edge lengths resulting from truncated edge processes, so that all edge lengths under consideration are in fact speciation times . Thus
where the function is the difference of the conditional and non-conditional probabilities above. But since the probability of as , it follows that . We henceforth focus on rather than .
Letting denote the event that the uniformly-at-random chosen edge is of type at time and denote the event that that edge speciates in color , and recalling that edge processes around a node are independent when conditioned on the type of that node, we have
Remark.
While the specific time is used in this Lemma, our arguments would be essentially unchanged if this were replaced by any function with and as .
This immediately gives that is a finite mixture of product distributions.
Corollary 5.4.
The asymptotic joint distribution of edge lengths around a node, can be expressed as a -component mixture of products of univariate distributions:
where , , , and is as defined in Proposition 3.4.
In order to apply Theorem 5.1 to , we need to verify that some of the univariate distributions in its decomposition above are linearly independent. To do so, the following lemma is needed.
We now introduce an additional assumption, which holds for generic parameters.
Assumption 4.
The speciation parameters satisfy for all .
Lemma 5.5.
Proof.
Since , we need only consider the cases .
Consider first the case . Consider the vector of functions . Then by Proposition 3.3,
Suppose for some vector . Since , it follows that where is defined in Assumption 3. Since is non-singular, , so the entries of are independent.
For , it is enough to show the independence of each pair of functions
From Lemma 3.1 the vector of all is given by
Suppose for some vector . Since , it follows that
In particular, for we find . For , since and , we have
To show every pair of the s is independent, consider all of whose entries except possibly two are zero. Without loss of generality suppose the exceptions are . Then the equations become
Using , computing the determinant of this matrix shows it is non-singular, and hence .
∎
We now arrive at our main result.
Theorem 5.6.
Proof.
Suppose two parameter choices, and , induce the same asymptotic distribution . Denoting the various distributions of conditional branching times, asymptotic transition probabilities, eigenvectors of matrices, etc. associated to parameters as earlier in this work, we use the same notation with a “” appended to denote the corresponding entities associated to parameters .
By Theorem 5.1, Corollary 5.4, and Lemma 5.5 the distributions , for , are determined from , up to label swapping in . Thus for some permutation .
Using Proposition 3.3 the equations for all can be represented in matrix form as
(3)
where is the permutation matrix representing . Equating coefficients of the Maclaurin series yields for that
(4)
Using equation (4) and the definition of in Assumption 3 shows
(5)
Equation (4) further implies
Using equation (5) then yields
and since is non-singular,
Since and each row of adds to 0, multiplying the last equation by on the right gives Since this implies , it follows that as well. Thus the parameters differ only up to label swapping. ∎
Remark.
Theorem 5.6 establishes that an asymptotic distribution associated to the MPBT model, as tree depth goes to infinity, yields parameter identifiability. This suggests that with a sample of many trees of arbitrarily large size, there is potential for statistically consistent inference, where “consistency” would be understood in the limit as both the number of trees and the tree depth go to infinity. However, this is not the framework in which data analysis with this model is performed, since while a tree may be large, only one tree observation is available [10].
Fortunately, a minor modification to the proofs above again yields identifiability of parameters from an asymptotic distribution derived from a single observation, as the depth of the tree goes to infinity. Indeed, modify Definition 5.2 so that is the distribution of edge lengths around a node from a single growing tree. The proof of Lemma 5.3, then, is modified only in its last line, as , a random variable rather than its expected value. Nonetheless, by Corollary 4.6, we again find , so the conclusion is unchanged.
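The single-tree setting can be explored with a small simulation. The sketch below is a simplified illustration, not the construction used in the proofs: it tracks only the number of lineages of each type in one growing tree, not the tree shape or edge lengths. It assumes that a lineage of type i speciates at rate `lam[i]`, that both daughter lineages inherit the parent's type, and that a lineage switches from type i to type j at rate `Q[i][j]`. The function name and all parameter values are hypothetical.

```python
import random

def simulate_type_counts(lam, Q, depth, start_type=0, seed=0):
    """Gillespie-style simulation of lineage counts by type up to time `depth`."""
    rng = random.Random(seed)
    k = len(lam)
    counts = [0] * k
    counts[start_type] = 1
    t = 0.0
    while True:
        # List every possible next event with its total rate.
        events, weights = [], []
        for i in range(k):
            if counts[i] == 0:
                continue
            events.append(("birth", i, i))
            weights.append(counts[i] * lam[i])
            for j in range(k):
                if j != i and Q[i][j] > 0:
                    events.append(("switch", i, j))
                    weights.append(counts[i] * Q[i][j])
        total = sum(weights)
        if total <= 0:
            break  # nothing can happen any more
        t += rng.expovariate(total)
        if t > depth:
            break  # next event would fall beyond the observation depth
        kind, i, j = rng.choices(events, weights=weights)[0]
        if kind == "birth":
            counts[i] += 1   # assumed: daughters keep the parent's type
        else:
            counts[i] -= 1   # a lineage changes type from i to j
            counts[j] += 1
    return counts

# Hypothetical two-type parameters; rows of Q sum to 0.
lam = [1.0, 2.0]
Q = [[-0.5, 0.5],
     [0.3, -0.3]]

counts = simulate_type_counts(lam, Q, depth=4.0, seed=1)
fractions = [c / sum(counts) for c in counts]
```

Rerunning with larger values of `depth` and different seeds gives an informal view of how the type composition of a single tree behaves as the tree grows.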
6 Discussion
Theorem 5.6 and the Remark following it show that parameters of the MPBT model can be identified from an asymptotic distribution as the tree depth grows, whether or not the number of sampled trees grows. Although this is not sufficient to conclude that estimation of parameters by maximum likelihood (ML) from a single tree, as suggested in [10], is statistically consistent, it does at least indicate that consistency is a possibility. A similar question on ML inference of parameters for a hidden Markov model from a single sequence of observations was addressed by [8], with the consistency of ML estimation established as the sequence length goes to infinity.
For applications, it would be highly desirable to extend our identifiability result to a model incorporating constant extinction rates for each type. In most biological settings, however, the obtainable “data” is not the tree with edges stopping at extinction events, but rather the pruned tree in which all edges with no extant descendants are removed.
For a single type, parameter identifiability of a model with pruning was essentially considered in [12], where it was shown that the lineages-through-time function’s rate of change allowed the speciation and extinction rates to be determined, by separately considering the time regimes much earlier than the tree tips, and near the tree tips. An analysis combining the insight from [12] with the mixture distribution framework used in this work might be successful in showing parameters can be recovered from a single large tree observation for the multitype model.
Another interesting identifiability question for multitype tree models concerns what information on parameters is contained in the tree topology alone, or from weaker metric information than precise branch lengths. While our analysis depends heavily on metric features of the tree, that of [13] required no metric information. However, it did use type observations at the tips of the tree, and at their parental nodes. While types at tree tips may be observed in some biological studies, types of the parental nodes are generally not observable, as data is generally collected only from the taxa extant at the present. Even if ancient DNA or other trait data from earlier times is available, it is unlikely to be from the time of the last speciation.
ESA and JAR were supported in part by NSF Grant DMS-2051760.
References
- [1] Allman, E. S., Matias, C. and Rhodes, J. A. (2009). Identifiability of Parameters in Latent Structure Models With Many Observed Variables. Ann. Statist. 37 3099–3132. doi:10.1214/09-AOS689
- [2] Athreya, K. B. (1968). Some Results on Multitype Continuous Time Markov Branching Processes. The Annals of Mathematical Statistics 39 347–357. doi:10.1214/aoms/1177698395
- [3] Barido-Sottani, J., Vaughan, T. G. and Stadler, T. (2020). A Multitype Birth–Death Model for Bayesian Inference of Lineage-Specific Birth and Death Rates. Systematic Biology 69 973–986. doi:10.1093/sysbio/syaa016
- [4] FitzJohn, R. G. (2012). Diversitree: comparative phylogenetic analyses of diversification in R. Methods in Ecology and Evolution 3 1084–1092.
- [5] Harvey, P. H., May, R. M. and Nee, S. (1994). Phylogenies Without Fossils. Evolution 48 523–529.
- [6] Horn, R. A. and Johnson, C. R. (2012). Matrix Analysis. Cambridge University Press.
- [7] Kendall, D. G. (1948). On the Generalized “Birth-And-Death” Process. The Annals of Mathematical Statistics 19 1–15. doi:10.1214/aoms/1177730285
- [8] Leroux, B. G. (1992). Maximum-likelihood estimation for hidden Markov models. Stochastic Processes and their Applications 40 127–143. doi:10.1016/0304-4149(92)90141-C
- [9] Louca, S. and Pennell, M. W. (2020). Extant Timetrees Are Consistent With a Myriad of Diversification Histories. Nature 580 502–505. doi:10.1038/s41586-020-2176-1
- [10] Maddison, W. P., Midford, P. E. and Otto, S. P. (2007). Estimating a Binary Character’s Effect on Speciation and Extinction. Systematic Biology 56 701–710. doi:10.1080/10635150701607033
- [11] Maliet, O., Hartig, F. and Morlon, H. (2019). A model with many small shifts for estimating species-specific diversification rates. Nature Ecology & Evolution 3 1086–1092. doi:10.1038/s41559-019-0908-0
- [12] Nee, S., May, R. M. and Harvey, P. H. (1994). The Reconstructed Evolutionary Process. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 344 305–311. doi:10.1098/rstb.1994.0068
- [13] Popovic, L. and Rivas, M. (2016). Topology and inference for Yule trees with multiple states. J. Math. Biol. 73 1251–1291. doi:10.1007/s00285-016-0992-6
- [14] Rabosky, D. L. and Goldberg, E. E. (2015). Model Inadequacy and Mistaken Inferences of Trait-Dependent Speciation. Systematic Biology 64 340–355. doi:10.1093/sysbio/syu131
- [15] Rasmussen, D. A. and Stadler, T. (2019). Coupling adaptive molecular evolution to phylodynamics using fitness-dependent birth-death models. eLife 8 e45562. doi:10.7554/eLife.45562
- [16] Stadler, T. (2013). Recovering Speciation and Extinction Dynamics Based on Phylogenies. Journal of Evolutionary Biology 26 1203–1219. doi:10.1111/jeb.12139
- [17] Stadler, T. (2019). Species-specific diversification. Nature Ecology & Evolution 3 1003–1004. doi:10.1038/s41559-019-0923-1
- [18] Yule, G. U. (1925). A Mathematical Theory of Evolution, Based on the Conclusions of Dr. J. C. Willis, F.R.S. Philosophical Transactions of the Royal Society of London. Series B, Containing Papers of a Biological Character 213 21–87. doi:10.1098/rstb.1925.0002