This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Expected number of induced subtrees shared by two independent copies of the terminal tree in a critical branching process

Boris Pittel Department of Mathematics, The Ohio State University, Columbus, Ohio 43210, USA [email protected]
Abstract.

Consider a rooted tree TT of minimum degree 22 at least, with leaf-set [n][n]. A rooted tree 𝒯\mathcal{T} with leaf-set S[n]S\subset[n] is induced by SS in TT if 𝒯\mathcal{T} is the lowest common ancestor subtree for SS, with all its degree-2 vertices suppressed. A “maximum agreement subtree” (MAST) for a pair of two trees TT^{\prime} and T′′T^{\prime\prime} is a tree 𝒯\mathcal{T} with a largest leaf-set S[n]S\subset[n] such that 𝒯\mathcal{T} is induced by SS both in TT^{\prime} and T′′T^{\prime\prime}. Bryant et al. [7] and Bernstein et al. [6] proved, among other results, that for TT^{\prime} and T′′T^{\prime\prime} being two independent copies of a random binary (uniform or Yule-Harding distributed) tree TT, the likely magnitude order of MAST(T,T′′)\text{MAST}(T^{\prime},T^{\prime\prime}) is O(n1/2)O(n^{1/2}). In this paper we prove this bound for a wide class of random rooted trees: TT is a terminal tree of a branching process with an offspring distribution of mean 11, conditioned on “total number of leaves is nn”.

Key words and phrases:
Random, binary tree, asymptotics
2010 Mathematics Subject Classification:
05C30, 05C80, 05C05, 34E05, 60C05

1. Introduction, results

Consider a rooted binary tree TT, with nn leaves (pendant vertices) labelled by elements from [n][n]. We visualize this tree with the root on top and the leaves at bottom. Given S[n]S\subset[n], let v(S)V(T)v(S)\in V(T) denote the lowest common ancestor of leaves in SS, (LCA(S)\text{LCA}(S).) Introduce the subtree of TT formed by the paths from v(S)v(S) to leaves in SS. Ignoring (suppressing) degree-2 vertices of this subtree (except the root itself), we obtain a rooted binary tree with leaf-set SS. This binary tree is called “a tree induced by SS in TT”.

Finden and Gordon [11] and Gordon [13] introduced a notion of a “maximum agreement subtree” (MAST) for a pair of such trees TT^{\prime} and T′′T^{\prime\prime}: it is a tree 𝒯\mathcal{T} with a largest leaf-set S[n]S\subset[n] such that 𝒯\mathcal{T} is induced by SS both in TT^{\prime} and T′′T^{\prime\prime}. In a pioneering paper [7], Bryant, McKenzie and Steel addressed the problem of a likely order of MAST(Tn,Tn′′)\text{MAST}(T^{\prime}_{n},T^{\prime\prime}_{n}) when TnT^{\prime}_{n} and Tn′′T^{\prime\prime}_{n} are two independent copies of a random binary tree TnT_{n}. To quote from [7], such a problem is “relevant when comparing evolutionary trees for the same set of species that have been constructed from two quite different types of data”.

It was proved in [7] that MAST(Tn,Tn′′)(1+o(1))e21/2n1/2\text{MAST}(T^{\prime}_{n},T^{\prime\prime}_{n})\leq(1+o(1))e2^{-1/2}n^{1/2} with probability 1o(1)1-o(1) as nn\to\infty. The proof was based on a remarkable property of the uniformly random rooted binary tree, and of few other tree models, known as “sampling consistency”, see Aldous [1]. As observed by Aldous [3], [4], sampling consistency makes this model conceptually close to a uniformly random permutation of [n][n]. Combinatorially, it means that a rooted binary tree 𝒯\mathcal{T} with with a leaf-set S[n]S\subset[n], |S|=s|S|=s, is induced by SS in exactly (2n3)!!(2a3)!!\frac{(2n-3)!!}{(2a-3)!!} rooted binary trees with leaf-set [n][n], regardless of choice of 𝒯\mathcal{T}. Probabilistically, the rooted binary tree induced by SS in TnT_{n} is distributed uniformly on the set of all (2s3)!!(2s-3)!! such trees. Mike Steel [23] pointed out that the sampling consistency of the rooted binary tree follows directly from a recursive process for generating all the rooted trees in which SS induces 𝒯\mathcal{T}.

Bernstein, Ho, Long, Steel, St. John, and Sullivant [6] established a qualitatively similar upper bound O(n1/2)O(n^{1/2}) for the likely size of a common induced subtree in a harder case of Yule-Harding tree, again relying on sampling consistency of this tree model. Recently Misra and Sullivant [19] were able to prove the estimate Θ(n1/2)\Theta(n^{1/2}) for the case when two independent binary trees with nn labelled leaves are obtained by selecting independently, and uniformly at random, two leaf-labelings of the same unlabelled tree. Using the classic results on the length of the longest increasing subsequence in the uniformly random permutation, the authors of [6] established a first power-law lower bound cn1/8cn^{1/8} for the likely size of the common induced subtree in the case of the uniform rooted binary tree, and a lower bound cnao(1)cn^{a-o(1)}, a=0.344a=0.344\dots, for the Yule-Harding model. Very recently, Aldous [3] proved that a maximum agreement rooted subtree for two independent, uniform, unrooted trees is likely to have n312o(1)n0.366n^{\frac{\sqrt{3}-1}{2}-o(1)}\approx n^{0.366} leaves, at least. It was mentioned in [3] that an upper bound O(n1/2)O(n^{1/2}) could be obtained by “the first moment method (calculating the expected number of large common subtrees)”.

In this paper we show that the total number of unrooted trees with leaf-set [n][n], which contains a rooted subtree induced by S[n]S\subset[n], s<ns<n, is (2n5)!!(2s3)!!\frac{(2n-5)!!}{(2s-3)!!}. It follows that a rooted binary tree induced by SS in the uniformly random unrooted tree on [n][n] is again distributed uniformly on the set of all (2s3)!!(2s-3)!! rooted trees. Using the asymptotic estimate from [7], we have: a maximum agreement rooted subtree for two independent copies of the uniformly random unrooted tree is likely to have at most (1+o(1))e21/2n1/2(1+o(1))e2^{-1/2}n^{1/2} leaves.

The proof of this (2n5)!!(2s3)!!\frac{(2n-5)!!}{(2s-3)!!} result suggested that a bound O(n1/2)O(n^{1/2}) might, just might, be obtained for a broad class of random rooted trees that includes the rooted binary tree, by using a probabilistic counterpart of the two-phase counting procedure. Consider a Markov branching process with a given offspring distribution 𝕡={pj}j0\mathbb{p}=\{p_{j}\}_{j\geq 0}. If p0>0p_{0}>0 and jjpj=1\sum_{j}jp_{j}=1 (critical case), then the process is almost surely extinct.

Let TtT_{t} be the random terminal tree, and let TnT_{n} be TtT_{t} conditioned on the event “TtT_{t} has nn leaves”, which we label, uniformly at random, by elements of [n][n]. The uniform binary rooted tree is a special case corresponding to p0=p2=1/2p_{0}=p_{2}=1/2. In general, we assume that p1=0p_{1}=0, g.c.d.(j:pj+1>0)=1\text{g.c.d.}(j:\,p_{j+1}>0)=1, and that P(s):=jpjsjP(s):=\sum_{j}p_{j}s^{j} has convergence radius R>1R>1, all the conditions being met by the binary case. We will show that n:=(Tt has n leaves)>0\mathbb{P}_{n}:=\mathbb{P}(T_{t}\text{ has }n\text{ leaves})>0 for all nn, meaning that TnT_{n} is well-defined for all nn.

Finally, an out-degree of a vertex in TnT_{n} may now exceed 22. So we add to the definition of a tree 𝒯\mathcal{T}, induced by SS in TnT_{n}, a condition: the out-degree of every vertex from V(𝒯)V(\mathcal{T}) in TnT_{n} is the same as its out-degree in 𝒯\mathcal{T}.

Under the conditions above, we prove that a rooted binary tree 𝒯\mathcal{T} with leaf-set S[n]S\subset[n], and edge-set E(𝒯)E(\mathcal{T}), is induced by SS in TnT_{n} with probability

(ns)!n!n(𝒯)[yns](1Φ1(y))e(𝒯)Φ(y),(e(𝒯):=|E(𝒯)|),\displaystyle\,\,\frac{(n-s)!}{n!\,\mathbb{P}_{n}}\,\mathbb{P}(\mathcal{T})\,[y^{n-s}](1-\Phi_{1}(y))^{-e(\mathcal{T})}\Phi^{\prime}(y),\quad(e(\mathcal{T}):=|E(\mathcal{T})|),
Φ(y)=P(Φ(y))+p0(y1),Φ1(y)=(1p0)1j>1pjΦj1(y);\displaystyle\Phi(y)=P(\Phi(y))+p_{0}(y-1),\quad\Phi_{1}(y)=(1-p_{0})^{-1}\sum_{j>1}p_{j}\Phi^{j-1}(y);
(𝒯)=vVint(𝒯)pd(v,𝒯);d(v,𝒯):=out-degree of v in 𝒯.\displaystyle\quad\quad\mathbb{P}(\mathcal{T})=\!\!\!\prod_{v\in V_{\text{int}}(\mathcal{T})}\!\!p_{d(v,\mathcal{T})};\quad d(v,\mathcal{T}):=\text{out-degree of }v\text{ in }\mathcal{T}.

Here Φ(y)\Phi(y) (Φ1(y)\Phi_{1}(y), respectively) is the probability generating function of the total number of leaves in the terminal tree (conditioned on the event “the progenitor has at least two children”, respectively.). We will check that for p0=p2=1/2p_{0}=p_{2}=1/2 this formula reduces to 1(2s3)!!\frac{1}{(2s-3)!!}. Note that in general, because of the factor (𝒯)\mathbb{P}(\mathcal{T}), and e(𝒯)e(\mathcal{T}), the probability of 𝒯\mathcal{T} being induced by SS in TnT_{n} depends not only on |S||S|, but also on the whole out-degree sequence of 𝒯\mathcal{T}.

We use the above identity to prove the following claim. Let

c(𝕡):=λe3/2(1j=2pj2)1/2,\displaystyle\qquad\qquad\qquad\qquad c(\mathbb{p}):=\lambda e^{3/2}\Bigl{(}1-\sum\nolimits_{j=2}^{\infty}p_{j}^{2}\Bigr{)}^{1/2},
λ:=max(χ4,χ2),χ:=(2p03)1/2(1p0)σ,σ2:=j=2j(j1)pj.\displaystyle\,\,\lambda:=\max(\chi^{-4},\chi^{-2}),\quad\chi:=(2p_{0}^{3})^{-1/2}(1-p_{0})\sigma,\quad\sigma^{2}:=\sum\nolimits_{j=2}^{\infty}j(j-1)p_{j}.

Then, for ε(0,1/2]\varepsilon\in(0,1/2], with probability 1(1ε)(1+ε)c(𝕡)n1/2\geq 1-(1-\varepsilon)^{(1+\varepsilon)c(\mathbb{p})n^{1/2}}, the largest number of leaves in an induced subtree shared by two independent copies of the conditioned terminal tree TnT_{n} is at most (1+ε)c(𝕡)n1/2(1+\varepsilon)c(\mathbb{p})n^{1/2}.

For a wide ranging exposition of combinatorial/probabilistic problems and methods in theory of phylogeny, we refer the reader to a book [22] by Steel.

2. Uniform binary trees

Consider a rooted binary tree TT with leaf-set [n][n]. For a given S[n]S\subset[n], there exists a subtree with leaf-set SS, which is rooted at the lowest vertex common to all |S||S| paths leading away from SS toward the root of TT. The vertex set of this lowest common ancestor (LCA) tree is the set of all vertices from the paths in question. Ignoring degree-2 vertices of this subtree (except the root itself), we obtain a rooted binary tree 𝒯\mathcal{T}. This LCA subtree has a name “a tree induced by SS in TT”, see [3].

Let Tn,Tn′′T^{\prime}_{n},T^{{}^{\prime\prime}}_{n} be two independent copies of the uniformly random rooted binary tree with leaf-set [n][n]. Let Xn,aX_{n,a} denote the random total number of leaf-sets S[n]S\subset[n] of cardinality aa that induce the same rooted subtree in TnT_{n}^{\prime} and Tn′′T^{{}^{\prime\prime}}_{n}. Bernstein et al. [6] proved that

(2.1) 𝔼[Xn,a]=(na)(2a3)!!\mathbb{E}[X_{n,a}]=\frac{\binom{n}{a}}{(2a-3)!!}

The proof was based on sampling consistency of the random tree TnT_{n}, so that N(𝒯)N(\mathcal{T}), the number of rooted trees on [n][n] in which SS induces a given rooted tree 𝒯\mathcal{T} on SS, is (2n3)!!(2a3)!!\frac{(2n-3)!!}{(2a-3)!!}, thus dependent only on the leaf-set size.

Following [3] (see Introduction), we consider the case when a binary tree with leaf-set [n][n] is unrooted. Let now Tn,Tn′′T^{\prime}_{n},T^{{}^{\prime\prime}}_{n} be two independent copies of the uniformly random (unrooted) binary tree with leaf-set [n][n]. Let Xn,aX_{n,a} denote the random total number of leaf-sets S[n]S\subset[n] of cardinality aa that induce the same rooted subtree in TnT_{n}^{\prime} and Tn′′T^{{}^{\prime\prime}}_{n}.

Lemma 2.1.

Let a=|S|<na=|S|<n. Then

(2.2) 𝔼[Xn,a]=(na)(2a3)!!.\mathbb{E}[X_{n,a}]=\frac{\binom{n}{a}}{(2a-3)!!}.

Equivalently 𝒩(𝒯)\mathcal{N}(\mathcal{T}), the number of unrooted trees TT on [n][n], in which SS induces a given rooted tree 𝒯\mathcal{T} with leaf set SS, is (2n5)!!(2a3)!!\frac{(2n-5)!!}{(2a-3)!!}.

Note. So the expectation is the same as for the rooted trees TT on [n][n].

Proof.

For an unrooted tree TT with nn leaves, the notion of a rooted subtree induced by a leaf-set SS with |S|=a>1|S|=a>1 makes sense only for a<na<n, and this subtree uniquely exists for any such SS. Indeed a vertex vv adjacent to any fixed leaf [n]S\ell^{*}\in[n]\setminus S is joined by a unique path to each leaf in SS. Any other vertex vv^{\prime} common to some aa paths emanating from aa leaves must be common to all the paths from SS to vv. It follows that there exists a unique vertex vv^{*} which is the LCA of the aa leaves. The paths from vv^{*} to SS form the subtree induced by SS, and \ell^{*} is connected by an external path to vv^{*}, if v\ell^{*}\neq v^{*}.

Let us evaluate 𝒩(𝒯)\mathcal{N}(\mathcal{T}). Consider a generic rooted tree with aa leaves. For 𝒯\mathcal{T} to be induced by its leaves in TT with nn leaves, it has to be obtained by ignoring degree-22 (non-root) vertices in the LCA subtree for leaf-set SS.

The outside (third) neighbors of the ignored vertices are the roots of subtrees with some bb leaves from the remaining nan-a leaves, selected in (nab)\binom{n-a}{b} ways. The roots of possible trees, attached to internal points chosen from some of 2(a1)2(a-1) edges of 𝒯\mathcal{T}, can be easily ordered. Introduce F(b,k)F(b,k), the total number of ordered forests of kk rooted trees with bb leaves altogether. By Lemma 4 of Carter et al. [9] (for the count of unordered trees), we have

(2.3) F(b,k)=k(2bk1)!(bk)! 2bk.F(b,k)=\frac{k(2b-k-1)!}{(b-k)!\,2^{b-k}}.

It was indicated in [6] that (2.3) follows from

(2.4) F(b,k)=b![xb]B(x)k,B(x):=112x,F(b,k)=b!\cdot[x^{b}]\,B(x)^{k},\quad B(x):=1-\sqrt{1-2x},

(Semple and Steel [21]). For the reader’s convenience here is a sketch proof of (2.4) and (2.3). We have

F(b,k)\displaystyle F(b,k) =b!t1++tk=bj[k](2tj3)!!tj!=b!t1++tk=bj[k]1tj2tj1(2(tj1)tj1)\displaystyle=b!\sum_{t_{1}+\cdots+t_{k}=b}\prod_{j\in[k]}\frac{(2t_{j}-3)!!}{t_{j}!}=b!\sum_{t_{1}+\cdots+t_{k}=b}\prod_{j\in[k]}\frac{1}{t_{j}2^{t_{j}-1}}\binom{2(t_{j}-1)}{t_{j}-1}
=b![xb][t1xtt2t1(2(t1)t1)]k=b![xb]B(x)k=k(2bk1)!(bk)! 2bk;\displaystyle=b![x^{b}]\biggl{[}\sum_{t\geq 1}\frac{x^{t}}{t2^{t-1}}\binom{2(t-1)}{t-1}\biggr{]}^{k}=b![x^{b}]B(x)^{k}=\frac{k(2b-k-1)!}{(b-k)!\,2^{b-k}};

for the last two steps we used Equations (2.5.10), (2.5.16) in Wilf [24].

Introduce (b,k)\mathcal{F}(b,k), the total number of the ordered forests of kk binary trees with roots attached to internal points of 𝒯\mathcal{T}’s edges, with bb leaves altogether. (These leaves have to be chosen from [n](S{}[n]\setminus(S\cup\{\ell^{*}\}, so bna1b\leq n-a-1. ) Since the total number of integer compositions (ordered partitions) of kk with j2(a1)j\leq 2(a-1) positive parts is

(k1j1)(2(a1)j)=(k1j1)(2(a1)2(a1)j),\binom{k-1}{j-1}\binom{2(a-1)}{j}=\binom{k-1}{j-1}\binom{2(a-1)}{2(a-1)-j},

(2.3) implies

(2.5) (b,k)=F(b,k)j2(a1)(k1j1)(2(a1)2(a1)j)=F(b,k)(k+2a32a3)=b!(k+2a32a3)[xb]B(x)k.\mathcal{F}(b,k)=F(b,k)\sum_{j\leq 2(a-1)}\binom{k-1}{j-1}\binom{2(a-1)}{2(a-1)-j}\\ =F(b,k)\binom{k+2a-3}{2a-3}=b!\binom{k+2a-3}{2a-3}[x^{b}]B(x)^{k}.

Now, kb(b,k)\sum_{k\leq b}\mathcal{F}(b,k) is the total number of ways to expand the host subtree into a full binary subtree rooted at the lowest common ancestor of the aa leaves. To evaluate this sum, first denote α=2a3\alpha=2a-3, β=B(x)\beta=B(x) and write

k0(k+αα)βk\displaystyle\sum_{k\geq 0}\binom{k+\alpha}{\alpha}\beta^{k} =k0(β)k(α1k)=(1β)α1.\displaystyle=\sum_{k\geq 0}(-\beta)^{k}\binom{-\alpha-1}{k}=(1-\beta)^{-\alpha-1}.

Therefore

kb(k+αα)[xb]B(x)k\displaystyle\sum_{k\leq b}\binom{k+\alpha}{\alpha}[x^{b}]B(x)^{k} =[xb]k0(k+αα)B(x)k=[xb]1(1B(x))α+1\displaystyle=[x^{b}]\sum_{k\geq 0}\binom{k+\alpha}{\alpha}B(x)^{k}=[x^{b}]\frac{1}{(1-B(x))^{\alpha+1}}
=[xb](12x)α+12.\displaystyle=[x^{b}](1-2x)^{-\frac{\alpha+1}{2}}.

We conclude that

(2.6) kb(b,k)=b!.[xb](12x)α+12|α=2a3=b![xb](12x)(a1).\sum_{k\leq b}\mathcal{F}(b,k)=b!\Bigl{.}[x^{b}](1-2x)^{-\frac{\alpha+1}{2}}\Bigr{|}_{\alpha=2a-3}=b![x^{b}](1-2x)^{-(a-1)}.

Recall that bb leaves were chosen from [n](S{})[n]\setminus(S\cup\{\ell^{*}\}). If b=na1b=n-a-1, then attaching the single remaining leaf to the root vv^{*} we get a binary tree TT with leaf-set [n][n]. If bna2b\leq n-a-2, we view the expanded subtree as a single leaf, and form an unrooted binary tree with 1+(nab)31+(n-a-b)\geq 3 leaves, in [2(nab)3]!![2(n-a-b)-3]!! ways. Therefore 𝒩(𝒯)\mathcal{N}(\mathcal{T}) depends on aa only, and with ν:=na1\nu:=n-a-1, it is given by

𝒩(𝒯)=bν(νb)b![xb](12x)(a1)(2(νb)1)!!=ν!bν[xb](12x)(a1)[xνb](12x)1/2=ν![xν](12x)a+1/2=j=0na2(2a1+2j)=(2n5)!!(2a3)!!;\mathcal{N}(\mathcal{T})=\sum_{b\leq\nu}\binom{\nu}{b}b!\,[x^{b}](1-2x)^{-(a-1)}\bigl{(}2(\nu-b)-1\bigr{)}!!\\ =\nu!\sum_{b\leq\nu}[x^{b}](1-2x)^{-(a-1)}\cdot[x^{\nu-b}](1-2x)^{-1/2}\\ =\nu![x^{\nu}](1-2x)^{-a+1/2}=\prod_{j=0}^{n-a-2}(2a-1+2j)=\frac{(2n-5)!!}{(2a-3)!!};

in the first line (1)!!:=1(-1)!!:=1, and for the second line we used

(2k1)!!k!=(2k)!2k(k!)2=2k(2kk)=[xk](12x)1/2.\frac{(2k-1)!!}{k!}=\frac{(2k)!}{2^{k}(k!)^{2}}=2^{-k}\binom{2k}{k}=[x^{k}](1-2x)^{-1/2}.

Consequently

(2.7) 𝔼[Xn,a]=(na)(2a3)!![𝒩(𝒯)(2n5)!!]2=(na)(2a3)!!.\mathbb{E}[X_{n,a}]=\binom{n}{a}(2a-3)!!\biggl{[}\frac{\mathcal{N}(\mathcal{T})}{(2n-5)!!}\biggr{]}^{2}=\frac{\binom{n}{a}}{(2a-3)!!}.

3. Branching Process Framework

Consider a branching process initiated by a single progenitor. This process is visualized as a growing rooted tree. The root is the progenitor, connected by edges to each of the vertices, that represent the root’s “children”, i.e. the root’s immediate offspring. Each of the children becomes the root of the corresponding (sub)tree, so that the ordered children of all these roots are the grandchildren of the progenitor. We obviously get a recursively defined process; it delivers a nested sequence of trees, which is either infinite, or terminates at a moment when none of the members of the last generation have children.

The classic Galton-Watson branching process is the case when the number of each member’s children (a) is independent of those numbers for all members from the preceding and current generations and (b) has the same distribution {pj}j0\{p_{j}\}_{j\geq 0}, (jpj=1)(\sum_{j}p_{j}=1). It is well-known that if p0>0p_{0}>0 and j0jpj=1\sum_{j\geq 0}jp_{j}=1, then the process terminates with probability 11, Harris [15]. Let TtT_{t} denote the terminal tree. Given a finite rooted tree, TT, we have

(Tt=T)=vV(T)pd(v,T),\mathbb{P}(T_{t}=T)=\prod_{v\in V(T)}p_{d(v,T)},

where d(v,T)d(v,T) is the out-degree of vertex vV(T)v\in V(T). Xt:=|V(Tt)|X_{t}:=|V(T_{t})|, the total population size by the extinction time, has the probability generating function (p.g.f) F(x):=𝔼[xXt]F(x):=\mathbb{E}[x^{X_{t}}], |x|1|x|\leq 1, that satisfies

(3.1) F(x)=xP(F(x)),P(ξ):=j0pjξj,(|ξ|1).F(x)=xP(F(x)),\quad P(\xi):=\sum_{j\geq 0}p_{j}\xi^{j},\,\,(|\xi|\leq 1).

Indeed, introducing Fτ(x)F_{\tau}(x) the p.g.f. of the total cardinality of the first τ\tau generations, we have

Fτ+1(x)=xj0pj[Fτ(x)]j=xP(Fτ(x)).F_{\tau+1}(x)=x\sum_{j\geq 0}p_{j}\bigl{[}F_{\tau}(x)\bigr{]}^{j}=xP(F_{\tau}(x)).

So letting τ\tau\to\infty, we obtain (3.1). In the same vein, consider the pair (Xt,Yt)(X_{t},Y_{t}), where Yt:|{vV(Tt):d(v,Tt)=0}|Y_{t}:\bigl{|}\{v\in V(T_{t}):\,d(v,T_{t})=0\}\bigr{|} is the total number of leaves (zero out-degree vertices) of the terminal tree. Then denoting G(x,y)=𝔼[xXtyYt]G(x,y)=\mathbb{E}\bigl{[}x^{X_{t}}y^{Y_{t}}\bigr{]}, (|x|,|y|1)(|x|,\,|y|\leq 1), we have

(3.2) G(x,y)=p0xy+xj1pj[G(x,y)]j=xP(G(x,y))+p0x(y1).G(x,y)=p_{0}xy+x\sum_{j\geq 1}p_{j}\bigl{[}G(x,y)\bigr{]}^{j}=xP(G(x,y))+p_{0}x(y-1).

So, with Φ(y):=𝔼[yY]=G(1,y)\Phi(y):=\mathbb{E}\bigl{[}y^{Y}\bigr{]}=G(1,y), we get

(3.3) Φ(y)=j1pjΦj(y)+p0y=P(Φ(y))+p0(y1).\Phi(y)=\sum_{j\geq 1}p_{j}\Phi^{j}(y)+p_{0}y=P(\Phi(y))+p_{0}(y-1).

Importantly, this identity allows us to deal directly with the leaf set at the extinction moment: k:=[yk]Φ(y)\mathbb{P}_{k}:=[y^{k}]\,\Phi(y) is the probability that TtT_{t} has kk leaves. In particular, 1=[y1]Φ(y)=p0>0\mathbb{P}_{1}=[y^{1}]\Phi(y)=p_{0}>0. More generally, k>0\mathbb{P}_{k}>0 for all k1k\geq 1. meaning that (Tt has k leaves)>0\mathbb{P}(T_{t}\text{ has }k\text{ leaves})>0 for all k1k\geq 1. Indeed, for k2k\geq 2, we have

k=j1pjk1++kj=kk1,,kj1k1kj;\mathbb{P}_{k}=\sum_{j\geq 1}p_{j}\sum_{k_{1}+\cdots+k_{j}=k\atop k_{1},\dots,k_{j}\geq 1}\mathbb{P}_{k_{1}}\cdots\mathbb{P}_{k_{j}};

so the claim follows by easy induction on kk.

If p0=p2=1/2p_{0}=p_{2}=1/2, then the branching process is a nested sequence of binary trees. The equation (3.3) yields

Φ(y)=1(1y)1/2=n1(y2)n(2n3)!!n!,|y|1.\Phi(y)=1-(1-y)^{1/2}=\sum_{n\geq 1}\left(\frac{y}{2}\right)^{n}\frac{(2n-3)!!}{n!},\quad|y|\leq 1.

So the terminal tree TtT_{t} has nn leaves with probability (2n3)!!2nn!>0\frac{(2n-3)!!}{2^{n}n!}>0. On this event, call it AnA_{n}, the total number of vertices is 2n12n-1, and each of rooted binary trees with 2n12n-1 vertices is a value of the terminal tree of the same probability (1/2)2n1(1/2)^{2n-1}. Conditionally on the event AnA_{n}, we label, uniformly at random, the leaves of TtT_{t} by elements of [n][n] and use notation TnT_{n} for the resulting uniformly random, rooted binary tree.

This is a promising sign that we can extend what we did for the uniformly random binary trees, i.e. for p0=p2=1/2p_{0}=p_{2}=1/2, for a more general offspring distribution {pj}\{p_{j}\}.

We continue to assume that p1=0p_{1}=0. The notion of an induced subtree needs to be expanded, since an out-degree of a vertex now may exceed 22. Let 𝒯\mathcal{T} be a tree with a leaf-set S[n]S\subset[n], such that every non-leaf vertex of 𝒯\mathcal{T} has out-degree 22, at least. We say that SS induces 𝒯\mathcal{T} in a tree TnT_{n} provided that: (a) the LCA subtree for SS in TnT_{n} is 𝒯\mathcal{T} if we ignore vertices of total degree 22 in this LCA subtree; (b) the out-degree of every other vertex in the LCA of SS in TnT_{n} is the same as its out-degree in 𝒯\mathcal{T}.

Let 𝒯\mathcal{T} be a tree with the leaf-set SS, |S|=a<n|S|=a<n, bnab\leq n-a. Let An(𝒯,b)AnA_{n}(\mathcal{T},b)\subset A_{n} be the event: (i) some bb elements from [n]S[n]\setminus S are chosen as the leaves of all the complementary subtrees rooted at degree -2 vertices sprinkled on the edges of 𝒯\mathcal{T}, forming a composed tree with a+ba+b leaves; (ii) the tree with nn leaves is obtained by using the remaining nabn-a-b leaves and an extra leaf which is the root of the tree composed in step (i). The event An(𝒯,b)A_{n}(\mathcal{T},b) is partitioned into disjoint (nab)\binom{n-a}{b} events corresponding to all choices to select bb elements of [n]S[n]\setminus S in question.

Let e(𝒯)e(\mathcal{T}) be the total number of edges in 𝒯\mathcal{T}. For each of these choices, on the event An(𝒯,b)A_{n}(\mathcal{T},b) we must have some kbk\leq b trees with ordered roots on some of e(𝒯)e(\mathcal{T}) edges, and with their respective, nonempty, leaf-set labels forming an ordered set partition of the set of bb leaves. The root of each of these trees has one child down the host “edge” of 𝒯\mathcal{T}, and all the remaining children outside edges of 𝒯\mathcal{T}. Since p1=0p_{1}=0, the number of other children of a root is jj with conditional probability (1p0)1pj+1(1-p_{0})^{-1}p_{j+1}. So the number of leaves of the subtrees rooted at the children is ii with probability [yi]Φ1(y)[y^{i}]\Phi_{1}(y), where

(3.4) Φ1(y)=(1p0)1j1pj+1Φj(y).\Phi_{1}(y)=(1-p_{0})^{-1}\sum_{j\geq 1}p_{j+1}\Phi^{j}(y).

Therefore, conditionally on “SS induces 𝒯\mathcal{T}”, a given set of bb elements of [n][n] is the leaf-set of these complementary trees with probability

b!jkb(k1j1)(e(𝒯)j)b1++bk=bt=1k[ybt]Φ1(y)=b!jkb(k1j1)(e(𝒯)e(𝒯)j)[yb]Φ1k(y)=b![yb]k(k+e(𝒯)1k)Φ1k(y)=b![yb](1Φ1(y))e(𝒯).b!\sum_{j\leq k\leq b}\binom{k-1}{j-1}\binom{e(\mathcal{T})}{j}\sum_{b_{1}+\cdots+b_{k}=b}\prod_{t=1}^{k}\,[y^{b_{t}}]\Phi_{1}(y)\\ =b!\!\!\sum_{j\leq k\leq b}\binom{k-1}{j-1}\binom{e(\mathcal{T})}{e(\mathcal{T})-j}[y^{b}]\Phi_{1}^{k}(y)=b![y^{b}]\sum_{k}\binom{k+e(\mathcal{T})-1}{k}\Phi_{1}^{k}(y)\\ =b![y^{b}](1-\Phi_{1}(y))^{-e(\mathcal{T})}.

Explanation: kk is a generic total number of trees rooted at ordered internal points of some jj edges of 𝒯\mathcal{T}; btb_{t} is a generic number of leaves of a tt-th tree; the product of two binomial coefficients in the top line is the number of ways to pick jj edges of 𝒯\mathcal{T} and to select an ordered, jj-long, composition of kk; the sum is the probability that the kk trees have bb leaves in total.

With these complementary trees attached, we obtain a tree with a+ba+b leaves. So for An(𝒯,b)A_{n}(\mathcal{T},b) to hold, we view the root of this tree (i.e. the root of 𝒯\mathcal{T}) as a leaf and complete determination of a tree with nn leaves by constructing an auxiliary tree with 11 plus (nab)(n-a-b) remaining leaves. Therefore using the definition of Φ(y)\Phi(y), we have

(An(𝒯,b))\displaystyle\mathbb{P}\bigl{(}A_{n}(\mathcal{T},b)\bigr{)} =1n!(nab)×b![yb](1Φ1(y))e(𝒯)\displaystyle=\frac{1}{n!}\binom{n-a}{b}\,\times b!\,[y^{b}](1-\Phi_{1}(y))^{-e(\mathcal{T})}
×(𝒯)(nab+1)![ynab+1]Φ(y).\displaystyle\quad\times\mathbb{P}(\mathcal{T})\cdot(n-a-b+1)!\,[y^{n-a-b+1}]\Phi(y).

Here (𝒯):=vVint(𝒯)pd(v,𝒯)\mathbb{P}(\mathcal{T}):=\prod_{v\in V_{\text{int}}(\mathcal{T})}p_{d(v,\mathcal{T})}, where {d(v,𝒯)}\{d(v,\mathcal{T})\} is the out-degree sequence of non-leaf vertices of 𝒯\mathcal{T}.

Using j[yj]Φ(y)=[yj1]Φ(y)j\,[y^{j}]\Phi(y)=[y^{j-1}]\Phi^{\prime}(y), j1j\geq 1, and summing the last equation for 0bna0\leq b\leq n-a, we obtain

(3.5) (An(𝒯))=(na)!n!(𝒯)[yna](1Φ1(y))e(𝒯)Φ(y).\mathbb{P}\bigl{(}A_{n}(\mathcal{T})\bigr{)}=\frac{(n-a)!}{n!}\,\mathbb{P}(\mathcal{T})\,[y^{n-a}](1-\Phi_{1}(y))^{-e(\mathcal{T})}\Phi^{\prime}(y).

For a partial check of (3.5), let us return to p0=p2=1/2p_{0}=p_{2}=1/2. Here

(𝒯)=p2a1=2a+1,Φ(y)=1(1y)1/2,Φ1(y)=Φ(y).\mathbb{P}(\mathcal{T})=p_{2}^{a-1}=2^{-a+1},\quad\Phi(y)=1-(1-y)^{1/2},\quad\Phi_{1}(y)=\Phi(y).

Therefore

[yna](1Φ1(y))e(𝒯)Φ(y)=[yna](1Φ(y))2a+2Φ(y)=[yna](1y)a+112(1y)1/2=12[yna](1y)a+1/2=2n+a1(2n3)!!(na)!(2a3)!!,[y^{n-a}](1-\Phi_{1}(y))^{-e(\mathcal{T})}\Phi^{\prime}(y)=[y^{n-a}](1-\Phi(y))^{-2a+2}\Phi^{\prime}(y)\\ =[y^{n-a}](1-y)^{-a+1}\cdot\frac{1}{2}(1-y)^{-1/2}=\frac{1}{2}[y^{n-a}](1-y)^{-a+1/2}\\ =2^{-n+a-1}\frac{(2n-3)!!}{(n-a)!\,(2a-3)!!},

so, by (3.5), we have

(3.6) (An(𝒯))=(2n3)!!2nn!(2a3)!!.\mathbb{P}\bigl{(}A_{n}(\mathcal{T})\bigr{)}=\frac{(2n-3)!!}{2^{n}n!\,(2a-3)!!}.

Since (An)=(2n3)!!2nn!\mathbb{P}(A_{n})=\frac{(2n-3)!!}{2^{n}n!}, we conclude that

(An(𝒯)|An)=(An(𝒯))(An)=1(2a3)!!,\mathbb{P}(A_{n}(\mathcal{T})|\,A_{n})=\frac{\mathbb{P}\bigl{(}A_{n}(\mathcal{T})\bigr{)}}{\mathbb{P}(A_{n})}=\frac{1}{(2a-3)!!},

for every binary tree 𝒯\mathcal{T} with leaf-set S[n]S\subset[n], |S|=a|S|=a. The LHS is the probability that SS induces 𝒯\mathcal{T} in the uniformly random binary tree TnT_{n}.

To summarize, we proved

Lemma 3.1.

Consider the branching process with the immediate offspring distribution {pj}\{p_{j}\}, such that p0>0p_{0}>0, p1=0p_{1}=0, and j0jpj=1\sum_{j\geq 0}jp_{j}=1. With probability 11, the process eventually stops, so that a finite terminal tree TtT_{t} is a.s. well-defined. (1) Setting An={Tt has n leaves}A_{n}=\{T_{t}\text{ has }n\text{ leaves}\}, we have

(An)=[yn]Φ(y),Φ(y)=P(Φ(y))+p0(y1),\mathbb{P}(A_{n})=[y^{n}]\Phi(y),\quad\Phi(y)=P(\Phi(y))+p_{0}(y-1),

where P(η):=j0ηjpjP(\eta):=\sum_{j\geq 0}\eta^{j}p_{j}. (2) On the event AnA_{n}, we define TnT_{n} as the tree TtT_{t} with leaves labelled uniformly at random by elements from [n][n]. Given a rooted tree 𝒯\mathcal{T} with leaf-set S[n]S\subset[n], set An(𝒯)A_{n}(\mathcal{T}):= “AnA_{n} holds and SS induces 𝒯\mathcal{T} in TnT_{n}”. Then (An(𝒯)|An)=(An(𝒯))(An)\mathbb{P}(A_{n}(\mathcal{T})|\,A_{n})=\frac{\mathbb{P}(A_{n}(\mathcal{T}))}{\mathbb{P}(A_{n})}, where

(3.7) (An(𝒯))=(na)!n!(𝒯)[yna](1Φ1(y))e(𝒯)Φ(y),\displaystyle\qquad\mathbb{P}\bigl{(}A_{n}(\mathcal{T})\bigr{)}=\frac{(n-a)!}{n!}\,\mathbb{P}(\mathcal{T})\,[y^{n-a}](1-\Phi_{1}(y))^{-e(\mathcal{T})}\Phi^{\prime}(y),
Φ(y)=P(Φ(y))+p0(y1),Φ1(y)=(1p0)1j>1pjΦj1(y);\displaystyle\Phi(y)=P(\Phi(y))+p_{0}(y-1),\quad\Phi_{1}(y)=(1-p_{0})^{-1}\sum_{j>1}p_{j}\Phi^{j-1}(y);
(𝒯)=vVint(𝒯)pd(v,𝒯);\displaystyle\qquad\qquad\qquad\quad\mathbb{P}(\mathcal{T})=\!\!\!\prod_{v\in V_{\text{int}}(\mathcal{T})}\!\!p_{d(v,\mathcal{T})};

Vint(𝒯)V_{\text{int}}(\mathcal{T}) and e(𝒯)e(\mathcal{T}) are the set of non-leaves and the number of edges of 𝒯\mathcal{T}.

Note. For p0=p2=1/2p_{0}=p_{2}=1/2, (An(𝒯))\mathbb{P}\bigl{(}A_{n}(\mathcal{T})\bigr{)} turned out to be dependent only on the number of leaves of 𝒯\mathcal{T}. The formula (3.7) clearly shows that, in general, this probability depends on shape of 𝒯\mathcal{T}. Fortunately, this dependence is confined to a single factor (𝒯)\mathbb{P}(\mathcal{T}), since the rest depends on two scalars, aa and e(𝒯)e(\mathcal{T}). Importantly, these quantities are of the same order of magnitude. Indeed, if (𝒯)>0\mathbb{P}(\mathcal{T})>0 then e(𝒯)max(a,2|Vint(𝒯)|)e(\mathcal{T})\geq\max\bigl{(}a,2|V_{\text{int}}(\mathcal{T})|\bigr{)}, i.e.

ae(𝒯)=|V(𝒯)|1=|Vint(𝒯)|+a1e(𝒯)2+a1,a\leq e(\mathcal{T})=|V(\mathcal{T})|-1=|V_{\text{int}}(\mathcal{T})|+a-1\leq\frac{e(\mathcal{T})}{2}+a-1,

so that

(3.8) ae(𝒯)2(a1).a\leq e(\mathcal{T})\leq 2(a-1).

3.1. Asymptotics

From now on we assume that the series jpjsj\sum_{j}p_{j}s^{j} has convergence radius R>1R>1.

Lemma 3.2.

Suppose that d:=g.c.d.{j1:pj+1>0}=1d:=\text{g.c.d.}\{j\geq 1:\,p_{j+1}>0\}=1. Let σ2:=j0j(j1)pj\sigma^{2}:=\sum_{j\geq 0}j(j-1)p_{j}, i.e. σ2\sigma^{2} is the variance of the immediate offspring, since j0jpj=1\sum_{j\geq 0}jp_{j}=1. Then

(An)=(2p0)1/2σ(2n3)!!2nn!+O(n2)=(p02πσ2)1/2n3/2+O(n2).\mathbb{P}(A_{n})=\frac{(2p_{0})^{1/2}}{\sigma}\frac{(2n-3)!!}{2^{n}\,n!}+O(n^{-2})=\biggl{(}\frac{p_{0}}{2\pi\sigma^{2}}\biggr{)}^{1/2}\!\!\!n^{-3/2}+O(n^{-2}).
Proof.

According to Lemma 3.1, we need to determine an asymptotic behavior of the coefficient in the power series Φ(z)=n1zn(An)\Phi(z)=\sum_{n\geq 1}z^{n}\mathbb{P}(A_{n}), where Φ(z)\Phi(z) is given implicitly by the functional equation Φ(z)=j0pjΦj(z)+p0(z1),(|z|1)\Phi(z)=\sum_{j\geq 0}p_{j}\Phi^{j}(z)+p_{0}(z-1),\,(|z|\leq 1).

In 1974 Bender [5] sketched a proof of the following general claim.

Theorem 3.3.

Assume that the power series w(z)=nanznw(z)=\sum_{n}a_{n}z^{n} with nonnegative coefficients satisfies F(z,w)0F(z,w)\equiv 0. Suppose that there exist r>0r>0 and s>0s>0 such that: (i) for some R>rR>r and S>sS>s, F(z,w)F(z,w) is analytic for |z|<R|z|<R and w<Sw<S; (ii) F(r,s)=Fw(r,s)=0F(r,s)=F_{w}(r,s)=0; (iii) Fz(r,s)0F_{z}(r,s)\neq 0 and Fww(r,s)0F_{ww}(r,s)\neq 0; (iv) if |z|r|z|\leq r, |w|s|w|\leq s, and F(z,w)=Fw(z,w)=0F(z,w)=F_{w}(z,w)=0, then z=rz=r and w=sw=s. Then an((rFz(r,s))/(2πFww(r,s)))1/2n3/2rna_{n}\sim\bigl{(}(rF_{z}(r,s))/(2\pi F_{ww}(r,s))\bigr{)}^{1/2}n^{-3/2}r^{-n}.

The remainder term aside, that’s exactly what we claim in Lemma 3.2 for our Φ(z)\Phi(z). The proof in [5] relied on an appealing conjecture that, under the conditions (i)-(iv), rr is the radius of convergence for the power series for w(z)w(z), and z=rz=r is the only singularity for w(z)w(z) on the circle |z|=r|z|=r. However, ten years later Canfield [8] found an example of F(z,w)F(z,w) meeting the four conditions in which rr and the radius of convergence for w(z)w(z) are not the same. Later Meir and Moon found some conditions sufficient for validity of the conjecture. Our equation Φ(z)=P(Φ(z))+p0(z1)\Phi(z)=P(\Phi(z))+p_{0}(z-1) is a special case of w=ϕ(w)+h(z)w=\phi(w)+h(z) considered in [17]. For the conditions from [17] to work in our case, it would have been necessary to have |P(w)/P(w)|1\big{|}P^{\prime}(w)/P(w)\big{|}\leq 1 for complex ww with |w|1|w|\leq 1, a strong condition difficult to check. (An interesting discussion of these issues can be found in an encyclopedic book by Flajolet and Sedgewick [12] and an authoritative survey by Odlyzko [20].)

Let rr be the convergence radius for the powers series representing Φ(z)\Phi(z); so that r1r\geq 1, since (An)1\mathbb{P}(A_{n})\leq 1. By implicit differentiation, we have

limx1Φ(x)=limx1p01w(Φ(x))=,\lim_{x\uparrow 1}\Phi^{\prime}(x)=\lim_{x\uparrow 1}\frac{p_{0}}{1-\mathbb{P}_{w}(\Phi(x))}=\infty,

since limx1Pw(Φ(x))=jjpj=1\lim_{x\uparrow 1}P_{w}(\Phi(x))=\sum_{j}jp_{j}=1. Therefore r=1r=1. Turn to complex zz. For |z|<1|z|<1, we have F(z,Φ(z))=0F(z,\Phi(z))=0, where F(z,w):=p0(z1)+P(w)wF(z,w):=p_{0}(z-1)+P(w)-w is analytic as a function of zz and ww subject to |w|<1|w|<1. Observe that Fw(z,w)=P(w)1=0F_{w}(z,w)=P^{\prime}(w)-1=0 is possible only if |w|1|w|\geq 1, since for |w|<1|w|<1 we have |P(w)|P(|w|)<P(1)=1|P^{\prime}(w)|\leq P^{\prime}(|w|)<P^{\prime}(1)=1. If |w|=1|w|=1 then P(w)=j2jpjwj1=1P^{\prime}(w)=\sum_{j\geq 2}jp_{j}w^{j-1}=1 if and only if w=wk:=exp(i2πkd)w=w_{k}:=\exp\Bigl{(}i\frac{2\pi k}{d}\Bigr{)}, and k=1,,dk=1,\dots,d. Notice also that

(3.9) P(wk)=j0pjwkj=p0+wkj2pjwkj1=p0+wk(1p0).P(w_{k})=\sum_{j\geq 0}p_{j}w_{k}^{j}=p_{0}+w_{k}\sum_{j\geq 2}p_{j}w_{k}^{j-1}=p_{0}+w_{k}(1-p_{0}).

Now, zz is a singular point of Φ(z)\Phi(z) if and only if |z|=1|z|=1 and (P(w)1)|w=Φ(z)=0(P^{\prime}(w)\!-1\!)\big{|}_{w=\Phi(z)}\!=\!0, i.e. if and only if Φ(z)=wk\Phi(z)=w_{k} for some k[d]k\in[d], which is equivalent to

p0(z1)+P(wk)wk=0.p_{0}(z-1)+P(w_{k})-w_{k}=0.

Combination of this condition with (3.9) yields z=wkz=w_{k}. Therefore the set of all singular points of Φ(z)\Phi(z) on the circle |z|=1|z|=1 is the set of all wkw_{k} such that Φ(wk)=wk\Phi(w_{k})=w_{k}. Notice that

P′′(wk)=wk1j2j(j1)pjwkj1=wk1j2j(j1)pj=wk1σ20.P^{\prime\prime}(w_{k})=w_{k}^{-1}\sum_{j\geq 2}j(j-1)p_{j}w_{k}^{j-1}=w_{k}^{-1}\sum_{j\geq 2}j(j-1)p_{j}=w_{k}^{-1}\sigma^{2}\neq 0.

So none of wkw_{k} is an accumulation point of roots of P(w)1P^{\prime}(w)-1 outside the circle |w|=1|w|=1, i.e. {wk}k[d]\{w_{k}\}_{k\in[d]} is the full root set of P(w)1P^{\prime}(w)-1 inside the circle |w|=1+ρ0|w|=1+\rho_{0}, for some small ρ0>0\rho_{0}>0.

Consequently, if d=1d=1, then z=1z=1 is the only singular point of Φ(z)\Phi(z) on the circle |z|=1|z|=1. Define the argument arg(z)\text{arg}(z) by the condition arg(z)[0,2π)\text{arg}(z)\in[0,2\pi). By the analytic implicit function theorem applied to F(z,w)F(z,w), for every zα=eiαz_{\alpha}=e^{i\alpha}, εα2πε\varepsilon\leq\alpha\leq 2\pi-\varepsilon, a small ε>0\varepsilon>0 being fixed, there exists an analytic function Ψα(z)\Psi_{\alpha}(z) defined on Dzα(ρ)D_{z_{\alpha}}(\rho)–an open disc centered at zαz_{\alpha}, of a radius ρ=ρ(ε)<ρ0\rho=\rho(\varepsilon)<\rho_{0} small enough to make ε/2arg(z)2πε/2\varepsilon/2\leq\text{arg}(z)\leq 2\pi-\varepsilon/2 for all zDzαz\in D_{z_{\alpha}}– such that F(z,Ψα(z))=0F(z,\Psi_{\alpha}(z))=0 for zDzαz\in D_{z_{\alpha}} and Ψα(z)=Φ(z)\Psi_{\alpha}(z)=\Phi(z) for zDzαz\in D_{z_{\alpha}} with |z|1|z|\leq 1. Together, these local analytic continuations determine an analytic continuation of Φ(z)\Phi(z) to a function Ψ^(z)\hat{\Psi}(z) determined, and bounded, for zz with |z|<1+ρ|z|<1+\rho, arg(z)[ε,2πε]\text{arg}(z)\in[\varepsilon,2\pi-\varepsilon].

Since z0=1z_{0}=1 is the singular point of Φ(z)\Phi(z), there is no analytic continuation of Φ(z)\Phi(z) for |z|>1|z|>1 and |z1||z-1| small. So instead we delete an interval [1,1+ρ)[1,1+\rho) and continue Φ(z)\Phi(z) analytically into the remaining part of a disc centered at 11. Here is how. We have Fww(1,Φ(1))=P′′(1)=jj(j1)pj=σ2>0F_{ww}(1,\Phi(1))=P^{\prime\prime}(1)=\sum_{j}j(j-1)p_{j}=\sigma^{2}>0. By a “preparation” theorem due to Weierstrass, (Ebeling [10], Krantz [16]), already used by Bender [5] for the same purpose in the general setting, there exist two open discs D1D_{1} and 𝒟1\mathcal{D}_{1} such that for zD1z\in D_{1} and w𝒟1w\in\mathcal{D}_{1} we have

F(z,w)=[(w1)2+c1(z)(w1)+c2(z)]g(z,w),F(z,w)=\bigl{[}(w-1)^{2}+c_{1}(z)(w-1)+c_{2}(z)\bigr{]}g(z,w),

where cj(z)c_{j}(z) are analytic on D1D_{1}, cj(1)=0c_{j}(1)=0, and g(z,w)g(z,w) is analytic, non-vanishing, on D1×𝒟1D_{1}\times\mathcal{D}_{1}. (The degree 22 of the polynomial is exactly the order of the first non-vanishing derivative of F(z,w)F(z,w) with respect to ww at (1,1)(1,1).) So for zD1z\in D_{1} and w𝒟1w\in\mathcal{D}_{1}, F(z,w)=0F(z,w)=0 is equivalent to

(w1)2+c1(z)(w1)+c2(z)=0w=1+(z1)1/2h(z),(w-1)^{2}+c_{1}(z)(w-1)+c_{2}(z)=0\Longrightarrow w=1+(z-1)^{1/2}h(z),

where h(z)h(z) is analytic at z=1z=1. Plugging the power series w=1+(z1)1/2h(z)=1+(z1)1/2j0hj(z1)jw=1+(z-1)^{1/2}h(z)=1+(z-1)^{1/2}\sum_{j\geq 0}h_{j}(z-1)^{j} into equation F(z,w)=0F(z,w)=0, and expanding P(w)P(w) in powers of w1w-1, we can compute the coefficients hjh_{j}. In particular,

w(z)=1γ(1z)1/2+O(|z1|),γ:=(2p0)1/2σ1.w(z)=1-\gamma(1-z)^{1/2}+O(|z-1|),\quad\gamma:=(2p_{0})^{1/2}\sigma^{-1}.

For zz real and z(0,1)z\in(0,1), we have Φ(z)=1γ(1z)1/2+O(1z)\Phi(z)=1-\gamma(1-z)^{1/2}+O(1-z). So to use w(z)w(z) as an extension Ψ~(z)\tilde{\Psi}(z) we need to choose ξ=|ξ|1/2exp(iArg(ξ)/2)\sqrt{\xi}=|\xi|^{1/2}\exp\bigl{(}i\text{Arg}(\xi)/2\bigr{)}, where Arg(ξ)(π,π)\text{Arg}(\xi)\in(-\pi,\pi).

The continuations Ψ^(z)\hat{\Psi}(z) and Ψ~(z)\tilde{\Psi}(z) together determine an analytic continuation of Φ(z)\Phi(z) into a function Ψ(z)\Psi(z) which is analytic and bounded on a disc D=D0(1+ρ)D^{*}=D_{0}(1+\rho^{*}) minus a cut [1,1+ρ)[1,1+\rho^{*}), ρ>0\rho^{*}>0 being chosen sufficiently small, such that

Ψ(z)=zD[1,1+ρ)z11γ(1z)1/2+O(|z1|).\Psi(z)\underset{z\in D^{*}\setminus[1,1+\rho^{*})\atop z\to 1}{=}1-\gamma(1-z)^{1/2}+O(|z-1|).

It follows that

(An)=12πi|z|=1Φ(z)zn+1𝑑z=12π|z|=1+ρΨ(z)zn+1+12πi11+ρ2iγ(x1)1/2+O(x1)xn+1𝑑x=O((1+ρ)n+n2)+γπ1(x1)1/2xn+1𝑑x.\mathbb{P}(A_{n})=\frac{1}{2\pi i}\oint_{|z|=1}\frac{\Phi(z)}{z^{n+1}}\,dz\\ =\frac{1}{2\pi}\oint_{|z|=1+\rho*}\frac{\Psi(z)}{z^{n+1}}+\frac{1}{2\pi i}\int_{1}^{1+\rho*}\frac{2i\gamma(x-1)^{1/2}+O(x-1)}{x^{n+1}}\,dx\\ =O\bigl{(}(1+\rho^{*})^{-n}+n^{-2}\bigr{)}+\frac{\gamma}{\pi}\int_{1}^{\infty}\frac{(x-1)^{1/2}}{x^{n+1}}\,dx.

For the second line we integrated Ψ(z)/zn+1\Psi(z)/z^{n+1} along the limit contour : it consists of the directed circular arc z=(1+ρ)eiαz=(1+\rho^{*})e^{i\alpha}, 0<α<2π0<\alpha<2\pi and a detour part formed by two opposite-directed line segments, one from z=(1+ρ)ei(2π0)z=(1+\rho^{*})e^{i(2\pi-0)} to z=ei(2π0)z=e^{i(2\pi-0)} and another from z=ei(+0)z=e^{i(+0)} to z=(1+ρ)ei(+0)z=(1+\rho^{*})e^{i(+0)}. By the formula 3.191(2) from Gradshteyn and Ryzik [14], we have

1(x1)1/2xn+1𝑑x=B(n1/2,3/2)=Γ(n1/2)Γ(3/2)Γ(n+1)\displaystyle\int_{1}^{\infty}\frac{(x-1)^{1/2}}{x^{n+1}}\,dx=B(n-1/2,3/2)=\frac{\Gamma(n-1/2)\,\Gamma(3/2)}{\Gamma(n+1)}
=m=2n(n2m12)n!12Γ2(1/2)=π(2n3)!!2nn!.\displaystyle=\frac{\prod_{m=2}^{n}\Bigl{(}n-\frac{2m-1}{2}\Bigr{)}}{n!}\cdot\frac{1}{2}\Gamma^{2}(1/2)=\pi\frac{(2n-3)!!}{2^{n}\,n!}.

Therefore

γπ1(x1)1/2xn+1𝑑x=γ(2n3)!!2nn!=γ2π1/2n3/2+O(n2).\frac{\gamma}{\pi}\int_{1}^{\infty}\frac{(x-1)^{1/2}}{x^{n+1}}\,dx=\gamma\frac{(2n-3)!!}{2^{n}\,n!}=\frac{\gamma}{2\pi^{1/2}n^{3/2}}+O(n^{-2}).

Recalling that γ=(2p0)1/2σ1\gamma=(2p_{0})^{1/2}\sigma^{-1}, we complete the proof of the Lemma. ∎

We will use 𝕡\mathbb{p} to denote the offspring distribution {pj}\{p_{j}\}. Using Lemma 3.1 and Lemma 3.2 we prove

Theorem 3.4.

Let c(𝕡):=λe3/2(1j2pj2)1/2c(\mathbb{p}):=\lambda e^{3/2}\Bigl{(}1-\sum_{j\geq 2}p_{j}^{2}\Bigr{)}^{1/2}, where λ=max(χ4,χ2)\lambda=\max(\chi^{-4},\chi^{-2}) and χ=(2p03)1/2(1p0)σ\chi=(2p_{0}^{3})^{-1/2}(1-p_{0})\sigma. Then, for ε(0,1/2]\varepsilon\in(0,1/2] and a(1+ε)c(𝕡)n1/2a\geq(1+\varepsilon)c(\mathbb{p})n^{1/2}, we have 𝔼[Xn,a](1ε)a\mathbb{E}[X_{n,a}]\leq(1-\varepsilon)^{a}. Consequently, with probability 1(1ε)(1+ε)c(𝕡)n1/2\geq 1-(1-\varepsilon)^{(1+\varepsilon)c(\mathbb{p})n^{1/2}}, the largest number of leaves in a common induced subtree is at most (1+ε)c(𝕡)n1/2(1+\varepsilon)c(\mathbb{p})n^{1/2}.

Proof.

By Lemma 3.1,

(3.10) (An(𝒯))=(na)!n!(𝒯)[yna](1Φ1(y))e(𝒯)Φ(y),\displaystyle\qquad\mathbb{P}\bigl{(}A_{n}(\mathcal{T})\bigr{)}=\frac{(n-a)!}{n!}\,\mathbb{P}(\mathcal{T})\,[y^{n-a}](1-\Phi_{1}(y))^{-e(\mathcal{T})}\Phi^{\prime}(y),
Φ(y)=P(Φ(y))+p0(y1),Φ1(y)=(1p0)1j>1pjΦj1(y);\displaystyle\Phi(y)=P(\Phi(y))+p_{0}(y-1),\quad\Phi_{1}(y)=(1-p_{0})^{-1}\sum_{j>1}p_{j}\Phi^{j-1}(y);
(𝒯)=vVint(𝒯)pd(v,𝒯).\displaystyle\qquad\qquad\qquad\quad\mathbb{P}(\mathcal{T})=\!\!\!\prod_{v\in V_{\text{int}}(\mathcal{T})}\!\!p_{d(v,\mathcal{T})}.

Start with the [yna][y^{n-a}] factor. Observe that

(1Φ1(y))e(𝒯)\displaystyle(1-\Phi_{1}(y))^{-e(\mathcal{T})} =j0(Φ1(y))j(e(𝒯)+j1j),\displaystyle=\sum_{j\geq 0}\bigl{(}\Phi_{1}(y)\bigr{)}^{j}\binom{e(\mathcal{T})+j-1}{j},
Φ(y)\displaystyle\Phi^{\prime}(y) =p01P(Φ(y))=p0j0(P(Φ(y)))j.\displaystyle=\frac{p_{0}}{1-P^{\prime}(\Phi(y))}=p_{0}\sum_{j\geq 0}\bigl{(}P^{\prime}(\Phi(y))\bigr{)}^{j}.

The power series for both Φ(y)\Phi(y) and Φ1(y)\Phi_{1}(y) around y=0y=0, which start with y1y^{1}, have non-negative coefficients only, and so does the power series for P(w)P^{\prime}(w) at w=0w=0. Therefore the power series for (1Φ1(y))e(𝒯)Φ(y)(1-\Phi_{1}(y))^{-e(\mathcal{T})}\Phi^{\prime}(y) around y=0y=0 has only non-negative coefficients. So we obtain a Chernoff-type bound: for all r(0,1)r\in(0,1),

(3.11) [yna](1Φ1(y))e(𝒯)Φ(y)(1Φ1(r))e(𝒯)Φ(r)rna.[y^{n-a}](1-\Phi_{1}(y))^{-e(\mathcal{T})}\Phi^{\prime}(y)\leq\frac{(1-\Phi_{1}(r))^{-e(\mathcal{T})}\Phi^{\prime}(r)}{r^{n-a}}.

As Φ(r)=1γ(1r)1/2+O(1r)\Phi(r)=1-\gamma(1-r)^{1/2}+O(1-r), we have

1Φ1(y)\displaystyle 1-\Phi_{1}(y) =1(1p0)1j>1pjΦj1(y)=(1p0)1P(Φ(y))p0Φ(y)\displaystyle=1-(1-p_{0})^{-1}\sum_{j>1}p_{j}\Phi^{j-1}(y)=(1-p_{0})^{-1}\frac{P(\Phi(y))-p_{0}}{\Phi(y)}
=11p0+P(1)(Φ(y)1)+O(1r)1γ(1r)1/2+O(1r)\displaystyle\qquad=1-\frac{1-p_{0}+P^{\prime}(1)(\Phi(y)-1)+O(1-r)}{1-\gamma(1-r)^{1/2}+O(1-r)}
=11γ(1r)1/2/(1p0)1γ(1r)1/2(1+O(1r))\displaystyle\qquad\qquad=1-\frac{1-\gamma(1-r)^{1/2}/(1-p_{0})}{1-\gamma(1-r)^{1/2}}(1+O(1-r))
=γp01p0(1r)1/2(1+O((1r)1/2)),\displaystyle\qquad\qquad\qquad=\frac{\gamma p_{0}}{1-p_{0}}(1-r)^{1/2}\bigl{(}1+O((1-r)^{1/2})\bigr{)},

and Φ(y)=O((1r)1/2)\Phi^{\prime}(y)=O((1-r)^{-1/2}). So the RHS is essentially of order f(r):=(1r)e(𝒯)/2rn+af(r):=(1-r)^{-e(\mathcal{T})/2}r^{-n+a}, and f(r)f(r) attains its maximum

[na+e(𝒯)/2]na+e(𝒯)/2(na)na[e(𝒯)/2]e(𝒯)/2cn1/2(na+e(𝒯)/2na)\frac{\bigl{[}n-a+e(\mathcal{T})/2\bigr{]}^{n-a+e(\mathcal{T})/2}}{(n-a)^{n-a}\,\bigl{[}e(\mathcal{T})/2\bigr{]}^{e(\mathcal{T})/2}}\leq cn^{1/2}\binom{n-a+e(\mathcal{T})/2}{n-a}

at rn=nana+e(𝒯)/2r_{n}=\frac{n-a}{n-a+e(\mathcal{T})/2}, which is 1Θ(a/n)1-\Theta(a/n), since e(𝒯)[a,2a]e(\mathcal{T})\in[a,2a] and a=o(n)a=o(n), see (3.8). In addition, the binomial factor is at most (nna)=(na)\binom{n}{n-a}=\binom{n}{a}. So, denoting 𝕡={pj}\mathbb{p}=\{p_{j}\},

[yna](1Φ1(y))e(𝒯)Φ(y)c1n(1p0γp0c2(a/n)1/2)e(𝒯)(na)\displaystyle[y^{n-a}](1-\Phi_{1}(y))^{-e(\mathcal{T})}\Phi^{\prime}(y)\leq c_{1}n\biggl{(}\frac{1-p_{0}}{\gamma p_{0}}-c_{2}(a/n)^{1/2}\biggr{)}^{-e(\mathcal{T})}\binom{n}{a}
c1nλa(na),λ>λ(𝕡)={(1p0γp0)4,1p0γp01,(1p0γp0)2,1p0γp0>1.\displaystyle\leq c_{1}n\lambda^{a}\binom{n}{a},\quad\lambda>\lambda(\mathbb{p})=\left\{\begin{aligned} &\left(\frac{1-p_{0}}{\gamma p_{0}}\right)^{-4},&&\frac{1-p_{0}}{\gamma p_{0}}\leq 1,\\ &\left(\frac{1-p_{0}}{\gamma p_{0}}\right)^{-2},&&\frac{1-p_{0}}{\gamma p_{0}}>1.\end{aligned}\right.

(λ(𝕡)=1\lambda(\mathbb{p})=1 for the benchmark case p0=p2=1/2p_{0}=p_{2}=1/2. ) Hence, using the top line equation in (3.10), we obtain

(An(𝒯))c1na!λa(𝒯).\mathbb{P}\bigl{(}A_{n}(\mathcal{T})\bigr{)}\leq\frac{c_{1}n}{a!}\,\lambda^{a}\,\mathbb{P}(\mathcal{T}).

Since (An)=Θ(n3/2)\mathbb{P}(A_{n})=\Theta(n^{-3/2}), we conclude that

(3.12) (An(𝒯)|An)c1n5/2a!λa(𝒯).\mathbb{P}(A_{n}(\mathcal{T})|\,A_{n})\leq\frac{c_{1}n^{5/2}}{a!}\,\lambda^{a}\,\mathbb{P}(\mathcal{T}).

Recall that (𝒯)=vVint(𝒯)pd(v,𝒯)\mathbb{P}(\mathcal{T})=\prod_{v\in V_{\text{int}}(\mathcal{T})}p_{d(v,\mathcal{T})}, where {d(v,𝒯)}\{d(v,\mathcal{T})\} is the out-degree sequence of non-leaf vertices of a generic 𝒯\mathcal{T} with aa leaves labelled by the elements of SS. The RHS in (3.12) does not depend on how the aa labels are assigned to the leaves. Therefore we have the following upper bound for 𝔼[Xn,a]\mathbb{E}[X_{n,a}]:

𝔼[Xn,a]=(na)𝒯[(An(𝒯)|An)]2a!(na)(c1n5/2λaa!)2𝒯2(𝒯);\mathbb{E}[X_{n,a}]=\binom{n}{a}\sum_{\mathcal{T}}\Bigl{[}\mathbb{P}(A_{n}(\mathcal{T})|\,A_{n})\Bigr{]}^{2}\leq a!\binom{n}{a}\Bigl{(}\frac{c_{1}n^{5/2}\lambda^{a}}{a!}\Bigr{)}^{2}\sum_{\mathcal{T}}\mathbb{P}^{2}(\mathcal{T});

the last sum is over all (finite) rooted trees 𝒯\mathcal{T} with aa unlabelled leaves. Define the probability distribution 𝕢={qj}\mathbb{q}=\{q_{j}\}: q0=1j2pj2>0q_{0}=1-\sum_{j\geq 2}p_{j}^{2}>0, q1=0q_{1}=0, qj=pj2q_{j}=p_{j}^{2} for j2j\geq 2. Then we have

2(𝒯)=vVint(𝒯)pd(v,𝒯)2=q0avV(𝒯)qd(v,𝒯).\mathbb{P}^{2}(\mathcal{T})=\prod_{v\in V_{\text{int}}(\mathcal{T})}\!\!p^{2}_{d(v,\mathcal{T})}=q_{0}^{-a}\prod_{v\in V(\mathcal{T})}q_{d(v,\mathcal{T})}.

Observe that j2jqj<j2jpj=1\sum_{j\geq 2}jq_{j}<\sum_{j\geq 2}jp_{j}=1. Therefore the process with the offspring distribution {qj}\{q_{j}\} is almost surely extinct, implying that

𝒯2(𝒯)=q0a𝒯vV(𝒯)qd(v,𝒯)q0a=(1j2pj2)a.\sum_{\mathcal{T}}\mathbb{P}^{2}(\mathcal{T})=q_{0}^{-a}\sum_{\mathcal{T}}\prod_{v\in V(\mathcal{T})}q_{d(v,\mathcal{T})}\leq q_{0}^{-a}=\Bigl{(}1-\sum\nolimits_{j\geq 2}p_{j}^{2}\Bigr{)}^{-a}.

A close look shows that, in fact,

𝒯2(𝒯)=o(ρa),ρ:=maxη1(ηj2pj2ηj).\sum_{\mathcal{T}}\mathbb{P}^{2}(\mathcal{T})=o(\rho^{-a}),\quad\rho:=\max_{\eta\geq 1}\Bigl{(}\eta-\sum\nolimits_{j\geq 2}p_{j}^{2}\eta^{j}\Bigr{)}.

So using a!(a/e)aa!\geq(a/e)^{a}, (na)(ne/a)a\binom{n}{a}\leq(ne/a)^{a}, we obtain then

𝔼[Xn,a]cn5(na)a!(λ2q0)acn5(na2(eλ)2q0)a0,\mathbb{E}[X_{n,a}]\leq c\frac{n^{5}\binom{n}{a}}{a!}\Bigl{(}\frac{\lambda^{2}}{q_{0}}\Bigr{)}^{a}\leq cn^{5}\biggl{(}\frac{n}{a^{2}}\,\frac{(e\lambda)^{2}}{q_{0}}\biggr{)}^{a}\to 0,

if a(1+ε)c(𝕡)n1/2a\geq(1+\varepsilon)c(\mathbb{p})n^{1/2}, c(𝕡):=eλ(𝕡)q01/2c(\mathbb{p}):=e\lambda(\mathbb{p})q_{0}^{-1/2}. ∎

Acknowledgment. I owe a debt of genuine gratitude to Ovidiu Costin and Jeff McNeal for guiding me to the Weierstrass separation theorem. I thank Daniel Bernstein, Mike Steel, and Seth Sullivant for an important feedback regarding the references [7], [6] and [22].

References

  • [1] D. J. Aldous, Probability distributions on cladograms, Random Discrete Structures, The IMA Volumes in Mathematics and its Applications, 76, D. Aldous and R. Pemantle, Eds., Springer (1996) 1–18.
  • [2] D. J. Aldous and L. Popovic, A critical branching process model for biodiversity, Adv. Appl. Prob., 37 (2005) 1094–1115.
  • [3] D. J. Aldous, On the largest common subtree of random leaf-labelled binary trees, https://arxiv.org/pdf/2006.10545.pdf
  • [4] D. J. Aldous, Open problems: Largest common substructures in probabilistic combinatorics, (2003) https://www.stat.berkeley.edu/-aldous/Research/OP.
  • [5] E. A. Bender, Asymptotic methods in enumeration, SIAM Review, 16 (1974) 485–515.
  • [6] D. I. Bernstein, L. S. T. Ho, C. Long, M. Steel, K. St. John, A. S. Sullivant, Bounds on the expected size of the maximum agreement subtree, SIAM J. Discrete Math., 29 (2015) 2065–2074.
  • [7] D. Bryant, A. McKenzie, and M. Steel, The size of a maximum agreement subtree for random binary trees, in BioConsensus, DIMACS Ser. Discrete Math. Theoret. Comput. Sci.61, AMS (2003) 55–65.
  • [8] E. R. Canfield, Remarks on an asymptotic method in combinatorics, J. Comb. Theory, A, 37 (1984) 348–352.
  • [9] M. Carter, M. Hendy, D. Penny, L. A. Székely, and N. C. Wormald, On the distribution of lengths of evolutionary trees, SIAM J. Discrete Math., 3 (1990) 38–47.
  • [10] W. Ebeling, Functions of Several Complex Variables and Their Singularities, American Mathematical Society (2007).
  • [11] C. R. Finden and A. D. Gordon, Obtaining common pruned trees, J. Classification, 2 (1985) 255–276.
  • [12] P. Flajolet and R. Sedgewick, Analytic Combinatorics, Cambridge University Press (2009).
  • [13] A. Gordon, On the assessment and comparison of classifications, in (R. Tomassine, Ed.), Analyse de Données et Informatique, Le Chesnay, INRIA, France (1980) 149–160.
  • [14] L. S. Gradshteyn and I. M. Ryzhik, Table of Integrals, Series, and Products, 6th Edition, Academic Press (2000).
  • [15] T. E. Harris, The Theory of Branching Processes, Springer–Verlag (1963).
  • [16] S. G. Krantz and H. R. Parks, The Implicit Function Theorem: History, Theory, and Applications, Birkhäuser (2013).
  • [17] A. Meir and J. W. Moon, On an asymptotic method in enumeration, J. Comb. Theory, A, 51 (1989) 77–89.
  • [18] A. Meir and J. W. Moon, Erratum: “On an asymptotic method in enumeration”, J. Comb. Theory, A, 52 (1989) 163.
  • [19] P. Misra and S. Sullivant, Bounds on the expected size of the maximum agreement subtree for a given tree shape, SIAM J. Discrete Math., 33 (2019) 2316–2325.
  • [20] A. M. Odlyzko, Asymptotic Enumeration Methods, in Handbook of Combinatorics, II, The MIT Press (1995) 1063–1229.
  • [21] C. Semple and M. Steel, Phylogenetics, Oxford University Press (2003).
  • [22] M. Steel, Phylogeny: Discrete and Random Processes in Evolution, Society for Industrial and Applied Mathematics (2016).
  • [23] M. Steel, Private communication.
  • [24] H. S. Wilf, Generatingfunctionology, 2nd Edition, Academic Press (1994).