
Species Abundance Distribution and Species Accumulation Curve: A General Framework and Results

Cheuk Ting Li ([email protected]), Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
Kim-Hung Li ([email protected]), Asian Cities Research Centre Ltd., New Treasure Centre, San Po Kong, Hong Kong SAR, China
Abstract

We build a general framework which establishes a one-to-one correspondence between the species abundance distribution (SAD) and the species accumulation curve (SAC). The appearance rates of the species and the appearance times of individuals of each species are modeled as Poisson processes. The number of species can be finite or infinite. Hill numbers are extended to the framework. We introduce a linear derivative ratio family of models, $\mathrm{LDR}_{1}$, in which the ratio of the first and second derivatives of the expected SAC is a linear function. A D1/D2 plot is proposed to detect this linear pattern in the data. By extrapolation of the curve in the D1/D2 plot, a species richness estimator that extends the Chao1 estimator is introduced. The SAD of $\mathrm{LDR}_{1}$ is Engen's extended negative binomial distribution, and the SAC encompasses several popular parametric forms including the power law. The family $\mathrm{LDR}_{1}$ is extended in two ways: $\mathrm{LDR}_{2}$, which allows species with zero detection probability, and $\mathrm{RDR}_{1}$, where the derivative ratio is a rational function. Real data are analyzed to demonstrate the proposed methods. We also consider the scenario where we record only a few leading appearance times of each species. We show how maximum likelihood inference can be performed when only the empirical SAC is observed, and elucidate its advantages over the traditional curve-fitting method.

MSC: 62P10, 92D40

Keywords: Diagnostic plots, Hill numbers, Power law, Rarefaction curve, Species-time relationship, Species richness

1 Introduction

Estimating the diversity of classes in a population is a problem encountered in many fields. We may be interested in the diversity of words a person knows from his/her writings [18], the number of illegal immigrants from apprehension records [3], the distinct attributes in a database [24, 16], or the distinct responses to a crowdsourcing query [48]. Among the different applications, species abundance is the one that receives the most attention. For this reason, it is chosen as the theme of this paper, with the understanding that the proposed framework and methods are applicable in other applications as well.

Understanding the species abundance in an ecological community has long been an important task for ecologists. Such knowledge is paramount in conservation planning and biodiversity management [35]. An exhaustive species inventory is too labor- and resource-intensive to be practical. Information about species abundance is thus acquired mainly through surveys.

Let $N=(N_{0},N_{1},\ldots)$ where $N_{k}$ is the number of species in a community that are represented exactly $k$ times in a survey. We do not observe the whole $N$, but $\tilde{N}=(N_{1},N_{2},\ldots)$, which is the zero-truncated $N$. In other words, we do not know how many species are not seen in the survey. We call the vector $\tilde{N}$ the frequency of frequencies (FoF) [21].

A plethora of species abundance models have been proposed for $\tilde{N}$. Comprehensive reviews of the field can be found in Bunge and Fitzpatrick [6] and Matthews and Whittaker [34]. A typical assumption in purely statistical models is

\tilde{N}\mid N_{+}\sim\mathrm{Multinomial}(N_{+},p),   (1.1)

where $N_{+}=\sum_{k=1}^{\infty}N_{k}$ is the total number of recorded species, and $p=(p_{1},p_{2},\ldots)$ is a probability vector such that $p_{k}$ is the probability for a randomly selected recorded species to be observed $k$ times in the survey, for $k=1,2,\ldots$. We call this vector $p$ the species abundance distribution (SAD). The great significance of (1.1) relies on three assumptions: (A1) the species names are noninformative; (A2) the observed data for different recorded species are independent and identically distributed (iid); and (A3) for each recorded species, its observed frequency contains all useful information.

The species accumulation curve (SAC) is another popular tool in the analysis of species abundance data. The survey is viewed as a data-collection process in which more and more sampling effort is invested. The individual-based SAC is the number of recorded species expressed as a function of the amount of sampling effort.

Despite the different emphases of the SAD and the SAC, the two approaches have an overlapping target: estimating $D$, the total number of species in the community. For the SAD, this means estimating $N_{0}$, the number of unseen species, as $D=N_{0}+N_{+}$. For the SAC, $D$ is the total number of species seen when the sampling effort is unlimited.

As the SAD depends largely on the sampling effort, it is necessary to include the sampling effort explicitly in the model in order to make comparisons of different SADs possible. This addition establishes a link between the SAD and the SAC. Sampling effort can be of continuous type, such as the area of land or the volume of water sampled, or the duration of the survey. A discrete-type sampling effort can be the sample size. To emphasize the sampling effort considered, the SAC is called the species-time curve, species-area curve, or species-sample-size curve when the sampling effort is time, area, or sample size respectively. The species-area curve has been studied extensively in the literature. Reviews of it can be found in Tjørve [46, 47], Dengler [15] and Williams et al. [50]. In this paper, we use time as the measure of sampling effort. We consider the vector $N$ to be a function of the time $t$, denoted as $N(t)$. The notations $N_{k}$, $\tilde{N}$, $N_{+}$, $p$ and $p_{k}$ are likewise written as $N_{k}(t)$, $\tilde{N}(t)$, $N_{+}(t)$, $p(t)$ and $p_{k}(t)$ respectively. The (empirical) SAC is $N_{+}(t)$. Throughout the paper, we assume without loss of generality that the survey starts at time 0 and ends at time $t_{0}>0$. Because of the great similarity between the species-time relationship and the species-area relationship [39, 32], time and area are treated as if they are interchangeable in this paper. When the SAC is a species-area curve, we refer to the Type I species-area curve [43], where the areas sampled are nested (smaller areas are included in larger areas), resulting in a curve that is always nondecreasing.

The study of the relation between the (empirical) SAD and the (empirical) SAC started early, with the introduction of the rarefaction curve [1, 42], which describes how we interpolate the SAC using the observed FoF. Let $n_{k}(t_{0})$ be the observed $N_{k}(t_{0})$ for $k=1,2,\ldots$, $n_{+}(t_{0})=\sum_{k=1}^{\infty}n_{k}(t_{0})$, and $\tilde{n}(t_{0})=\{n_{k}(t_{0})\}_{k\geq 1}$. The rarefaction curve for the species-time relationship is

\hat{N}_{+}(t)=\sum_{k=1}^{\infty}n_{k}(t_{0})\left(1-\left(1-\frac{t}{t_{0}}\right)^{k}\right),\qquad(0\leq t\leq t_{0}).   (1.2)

The rationale of (1.2) is that $1-(1-t/t_{0})^{k}$ is the probability for a species with frequency $k$ in the time interval $[0,t_{0}]$ to be observed before time $t$, if its appearance times in $[0,t_{0}]$ are iid $U[0,t_{0}]$ distributed. An equivalent formula for the species-area curve appears earlier in Arrhenius [1]. Good and Toulmin [22] proposed an extrapolation formula for the species-sample-size curve, which when expressed as a species-time curve is

\hat{N}_{+}(t)=n_{+}(t_{0})+\sum_{k=1}^{\infty}(-1)^{k+1}n_{k}(t_{0})\left(\frac{t}{t_{0}}-1\right)^{k},\qquad(t>t_{0}).   (1.3)

Equation (1.3) is (1.2) with a change of domain. Define the empirical SAD $\hat{p}_{k}(t)=n_{k}(t)/n_{+}(t)$. Equation (1.3) becomes

\hat{N}_{+}(t)=n_{+}(t_{0})\left[1-\sum_{k=1}^{\infty}\hat{p}_{k}(t_{0})\left(1-\frac{t}{t_{0}}\right)^{k}\right]=n_{+}(t_{0})\left[1-\hat{h}_{t_{0}}\left(1-\frac{t}{t_{0}}\right)\right],   (1.4)

where $\hat{h}_{t}(s)$ is an estimator of $h_{t}(s)$, the probability generating function of the SAD at time $t$. Equation (1.4) delineates a simple yet convincing formula to obtain the SAC from the SAD and vice versa. The aim of this paper is to propose and study a statistical framework under which the model in (1.1) and the relation in (1.4) hold when $\hat{N}_{+}(t)$, $n_{+}(t_{0})$ and $\hat{h}_{t_{0}}(s)$ are replaced by $E(N_{+}(t))$, $E(N_{+}(t_{0}))$ and $h_{t_{0}}(s)$ respectively. Our framework assumes (A1), (A2) and that the appearance times of each species follow a Poisson process, which is sufficient for (A3) and the validity of (1.2).
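As a concrete illustration of (1.2) and (1.3), the following sketch (in Python; the function name is ours and purely illustrative) computes the rarefaction curve and the Good-Toulmin extrapolation from an observed FoF.

    import numpy as np

    def rarefaction_extrapolation(fof, t0, t):
        """Interpolate (t <= t0, Eq. 1.2) or extrapolate (t > t0, Eq. 1.3) the
        empirical SAC from the observed FoF, where fof[k-1] = n_k(t0)."""
        fof = np.asarray(fof, dtype=float)
        k = np.arange(1, len(fof) + 1)
        if t <= t0:
            # Eq. (1.2): a species seen k times is detected by time t
            # with probability 1 - (1 - t/t0)^k.
            return np.sum(fof * (1.0 - (1.0 - t / t0) ** k))
        # Eq. (1.3): alternating-series extrapolation of Good and Toulmin.
        return fof.sum() + np.sum((-1.0) ** (k + 1) * fof * (t / t0 - 1.0) ** k)

    # Example with 10 singletons, 5 doubletons and 2 tripletons observed by t0 = 1:
    fof = [10, 5, 2]
    print(rarefaction_extrapolation(fof, t0=1.0, t=0.5))   # rarefied richness
    print(rarefaction_extrapolation(fof, t0=1.0, t=1.5))   # short-term extrapolation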

A finite $D$ is commonplace in the SAD approach, although no consensus has been reached. Empirically, the log-series distribution, an SAD that assumes infinite $D$, is one of the most successful models. Many SACs do not have an asymptote. It is common that the observed number of rare species is large and shows no sign of decreasing. In the Bayesian nonparametric approach to genomic diversity studies, the Poisson-Dirichlet process, which assumes infinite $D$, is used (see for example Lijoi et al. [31]). In reality, the existence of transient species (species which are observed erratically and infrequently [38]) and errors in the species identification process (a well-known example is missequencing in pyrosequencing of DNA; see for example Dickie [17]) are continual sources of rare species, making the species number larger than expected. In our framework, $D$ is random and can be finite, or infinite with probability one. We use species richness to refer to $E(D)$ when $D$ is random, and to $D$ when $D$ is deterministic.

A special feature of our framework is that species can have zero detection probability. We call such species zero-rate species, and all other species positive-rate species. A zero-rate species is either seen exactly once or not seen at all in a survey. If there are finitely many zero-rate species, the probability of observing any of them is zero. Therefore, if a zero-rate species is observed in a survey, almost surely there are infinitely many of them. We interpret observed individuals of positive-rate species as outcomes of a discrete distribution, where individuals of the same species can appear any nonnegative number of times in a survey. On the other hand, individuals of zero-rate species are outcomes of a continuous distribution, and no two such individuals belong to the same species. In our framework, the distribution is allowed to be a mixture of the two. Suppose we want to estimate the population of a town by recording each person we meet on the street; then tourists from distant countries can be viewed as "zero-rate species". In the first example in Section 9, it is suspected that sequencing error in pyrosequencing of DNA may be a source of zero-rate species.

The Poisson distribution is a main component of our framework. Though Poisson distributions are common in existing SAD models, time $t$ scarcely plays a role in them. In the mixed Poisson model (see for example Fisher et al. [20], Bulmer [5]), the observed frequencies of the species are independent and each follows a Poisson distribution with its own rate, say $\lambda_{i}$ for species $i$. The value $D$ is a fixed unknown finite value and $\{\lambda_{i}\}$ is an iid sample from a mixing distribution. Another related model was proposed in Zhou et al. [51]; that paper focuses mainly on finite $D$, and neither time nor zero-rate species are included in the model.

The outline of this paper is as follows. We propose in Section 2 a new model for the sampling process, called the mixed Poisson partition process (MPPP). We emphasize the parametric approach, where additional assumptions are made on top of the framework so that the process depends only on a few parameters. Once the parameters are estimated, we can make inference on different characteristics of the population, say the SAD at any fixed time, the Hill numbers, or the expected future data. We study $\tilde{N}(t)$ in Section 3. In Section 4, we consider the expected species accumulation curve (ESAC). We prove the one-to-one correspondence among (i) an MPPP, (ii) an ESAC which is a Bernstein function that passes through the origin, and (iii) the expected number of recorded species and the SAD at time $t_{0}$ such that the first derivative of its probability generating function is absolutely monotone in $(-\infty,1)$. In Section 5, we introduce $\mathrm{LDR}_{1}$, a parametric family of ESACs. The SAD of $\mathrm{LDR}_{1}$ is Engen's extended negative binomial distribution. This family has the attractive property that the ratio of the first and second derivatives of the ESAC is a linear function of $t$. We extend $\mathrm{LDR}_{1}$ to $\mathrm{LDR}_{2}$, which allows zero-rate species. In Section 6, a D1/D2 plot for $\mathrm{LDR}_{1}$ and some diagnostic plots for specified $\mathrm{LDR}_{1}$ distributions are proposed. Extrapolation of the curve in the D1/D2 plot is considered in Section 7. An estimator of species richness is suggested based on a modified first-order extrapolation. In Section 8, $\mathrm{LDR}_{1}$ is generalized so that the derivative ratio is a rational function instead of a linear function of $t$. Four real data sets are analyzed in Section 9 to demonstrate the applications of the proposed models and the suggested plots. In Section 10, we propose and study a design where only a few leading appearance times of each species are recorded. In Section 11, we consider the maximum likelihood approach on the empirical SAC. We give a discussion in Section 12.

2 Mixed Poisson Partition Process

The Poisson process is the backbone of the framework. A Poisson process is a point process characterized by an intensity measure over the $n$-dimensional space $\mathbb{R}^{n}$ (we usually have $n=1$ in this paper). The intensity measure, which we denote as $\omega$, delineates how many points are present on average in different parts of $\mathbb{R}^{n}$. More precisely, the number of points in a set $A\subseteq\mathbb{R}^{n}$ follows the $\mathrm{Poisson}(\omega(A))$ distribution. Furthermore, for any finite collection of disjoint subsets $A_{1},\ldots,A_{k}\subseteq\mathbb{R}^{n}$, the counts $S_{1},\ldots,S_{k}$ are mutually independent, where $S_{i}$ is the number of points in $A_{i}$. If $\omega(dx)=f(x)dx$, we call $f(x)$ the intensity function. A simple example is the homogeneous Poisson process where $\omega(dx)=\lambda dx$ for a constant $\lambda$. The parameter $\lambda$ is called the rate of the process.

If $\omega$ is finite (i.e., $\int\omega(dx)<\infty$), simulation of a Poisson process can be performed in two steps: (i) simulate the total number of points $W\sim\mathrm{Poisson}(\int\omega(dx))$, and (ii) simulate $X_{1},\ldots,X_{W}$ iid from the probability measure $\omega/\int\omega(dx)$. Nevertheless, if $\omega$ is infinite, then the number of points is infinite, and it is impossible to simulate all the points in the process. In this case, we can only simulate a selected finite subset of the points of the process, using the thinning property of Poisson processes [29]. For each point $x$ in the Poisson process, let $\alpha(x)\in[0,1]$ be the probability for the point $x$ to be selected, and $1-\alpha(x)$ the probability for it to be discarded. Then the selected points form a Poisson process with intensity measure $\alpha\omega$, where $\alpha\omega(A)=\int_{A}\alpha(x)\omega(dx)$. We can simulate the selected points using the aforementioned method when the measure $\alpha\omega$ is finite. For example, if we are interested only in the points of a homogeneous Poisson process with rate $\lambda$ that lie in a bounded region $A\subset\mathbb{R}^{n}$, then $\alpha(x)=\mathbf{1}\{x\in A\}$, where $\mathbf{1}\{\cdot\}$ is the indicator function. The number of selected points follows the $\mathrm{Poisson}(\lambda\int_{A}dx)$ distribution, and the selected points are iid uniformly distributed in $A$.
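A minimal sketch of the two-step simulation and of thinning (Python; the function names are ours, not from the paper):

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_homogeneous_pp(rate, a, b):
        """Two-step simulation of a homogeneous Poisson process on [a, b]:
        (i) W ~ Poisson(rate*(b-a)) points, (ii) the W points iid uniform on [a, b]."""
        w = rng.poisson(rate * (b - a))
        return np.sort(rng.uniform(a, b, size=w))

    def thin(points, keep_prob):
        """Thinning: keep each point x independently with probability keep_prob(x);
        the retained points again form a Poisson process."""
        points = np.asarray(points, dtype=float)
        keep = rng.random(points.size) < np.array([keep_prob(x) for x in points])
        return points[keep]

    pts = simulate_homogeneous_pp(rate=5.0, a=0.0, b=10.0)
    thinned = thin(pts, keep_prob=lambda x: np.exp(-0.2 * x))  # location-dependent thinning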

We model the observations in a species abundance survey by a stochastic process in which the rates of the species follow a Poisson process.

Definition: (Mixed Poisson partition process) A mixed Poisson partition process (MPPP), $G$, is characterized by a nonzero species intensity measure $\nu$, which is a measure over $\mathbb{R}_{\geq 0}$, the set of all nonnegative real numbers, satisfying

\int_{0}^{\infty}\min\{1,\lambda^{-1}\}\nu(d\lambda)<\infty.   (2.1)

The definition of an MPPP consists of three steps:

  1. (Generation of positive rates of species) Given $\nu$, define $\tilde{\nu}$ to be a measure over $\mathbb{R}_{>0}$ ($\mathbb{R}_{>0}$ is the set of positive real numbers) by $\tilde{\nu}(d\lambda)=\nu(d\lambda)/\lambda$ (i.e., $d\tilde{\nu}/d\nu=1/\lambda$ for $\lambda>0$). Let $\lambda_{1},\lambda_{2},\ldots$ (a finite or countably infinite sequence) be a realization of a Poisson process with intensity measure $\tilde{\nu}$.

  2. (Generation of individuals of positive-rate species) For each $\lambda_{i}$ in Step 1, we generate a realization $\eta_{i}$ (independently across $i$) of a Poisson process with rate $\lambda_{i}$. The realization $\eta_{i}$ represents the arrival times of a species with rate $\lambda_{i}$.

  3. (Generation of individuals of zero-rate species) We generate a realization $\eta_{0}$ of a Poisson process with rate $\nu(\{0\})$, independent of $\eta_{1},\eta_{2},\ldots$. This represents the times of appearance of all the zero-rate species.

Finally, we take $G=\{\eta_{1},\eta_{2},\ldots\}\cup\eta_{0}$. For any $i\geq 1$, all points in $\eta_{i}$ are arrival times of the same species, whereas each point in $\eta_{0}$ is from a different zero-rate species (with a slight abuse of notation, we treat each point in $\eta_{0}$ as a point process with only one point).

Measures $\nu$ and $\tilde{\nu}$ may be finite or infinite. Let $\Lambda=\int\nu(d\lambda)$. As the expected number of individuals seen in the time interval $[0,t]$ is equal to $t\Lambda$ (see (3.4) in Section 3), we can interpret $\Lambda$ as the expected total rate. When $\nu$ is finite (i.e., $\Lambda<\infty$), conditional on the event that there is an individual observed exactly at time $t$, the distribution of the rate of the species of that individual is $\nu/\Lambda$ (see Proposition A in Appendix A for a proof). Therefore, we can regard $\nu$ as the intensity measure of $\lambda$ for the observed individuals. If $\tilde{\nu}$ is finite, the expected number of positive-rate species in the community is finite. From Step 3 in the definition, if $\nu(\{0\})>0$, there are infinitely many zero-rate species and they arrive at a constant rate. With probability one, $D$ is finite if and only if $\nu(\{0\})=0$ and $\tilde{\nu}(\mathbb{R}_{>0})<\infty$. In that case, $D$ follows a $\mathrm{Poisson}(\tilde{\nu}(\mathbb{R}_{>0}))$ distribution. The measure $\tilde{\nu}$ specifies the distribution of the rates of the positive-rate species. More precisely, $\tilde{\nu}([\lambda_{0},\lambda_{1}])$ with $\lambda_{0}>0$ is the average number of species with rate in $[\lambda_{0},\lambda_{1}]$. The measure $\nu$ specifies the distribution of the rates of the species of the individuals, including those of the zero-rate species. More precisely, $\nu([\lambda_{0},\lambda_{1}])$ is the average number of individuals (per unit time) belonging to species whose rates lie in $[\lambda_{0},\lambda_{1}]$. Condition (2.1) is essential because it is equivalent to the finiteness of the ESAC (i.e., $E(N_{+}(t))<\infty$ for any finite nonnegative $t$). A proof is given in Appendix B. Figure 1 illustrates the generation of the MPPP.

When $\tilde{\nu}$ is infinite, we cannot simulate all the $\lambda_{i}$'s in Step 1 in practice. We can use the following method to simulate a realization of the process in the interval $[0,t_{0}]$. The probability for a species with rate $\lambda$ to be recorded in $[0,t_{0}]$ is $1-\exp(-\lambda t_{0})$. Applying the thinning property, the intensity of the recorded positive-rate species is $(1-\exp(-\lambda t_{0}))\tilde{\nu}$, which is always finite. To include also the recorded zero-rate species, the intensity is $(1-\exp(-\lambda t_{0}))\lambda^{-1}\nu$ (we take $(1-\exp(-\lambda t_{0}))\lambda^{-1}=t_{0}$ when $\lambda=0$). After generating $\lambda_{1},\lambda_{2},\ldots$ according to a Poisson process with this intensity measure, we simulate for each $i$ a realization $\eta_{i}$ (independently across $i$) of a Poisson process with rate $\lambda_{i}$, conditional on the event that $\eta_{i}$ has at least one point in $[0,t_{0}]$ (if $\lambda_{i}=0$, then $\eta_{i}$ contains one uniformly distributed point in $[0,t_{0}]$). An alternative, equivalent definition that unifies the generation of individuals of zero-rate species and positive-rate species is given in Appendix C.
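The following sketch simulates a realization of an MPPP on $[0,t_{0}]$ for one concrete, illustrative choice of $\nu$, namely $\nu(d\lambda)=\theta\lambda e^{-\lambda}d\lambda+\nu(\{0\})\delta_{0}(d\lambda)$, so that $\tilde{\nu}$ is finite and the three steps of the definition can be followed directly; all names are our own.

    import numpy as np

    rng = np.random.default_rng(1)

    def simulate_mppp(t0, theta=50.0, zero_rate=2.0):
        """One realization of an MPPP on [0, t0] for the illustrative intensity
        nu(dl) = theta*l*exp(-l) dl + zero_rate*delta_0(dl), so that
        nu_tilde(dl) = theta*exp(-l) dl has finite total mass theta.
        Returns a list of arrays; each array holds the arrival times of one
        recorded species."""
        # Step 1: rates of the positive-rate species (Poisson process with intensity nu_tilde).
        n_pos = rng.poisson(theta)                   # nu_tilde(R_{>0}) = theta
        rates = rng.exponential(1.0, size=n_pos)     # normalized nu_tilde is Exp(1)

        species = []
        # Step 2: arrival times of each positive-rate species, restricted to [0, t0].
        for lam in rates:
            k = rng.poisson(lam * t0)
            if k > 0:
                species.append(np.sort(rng.uniform(0.0, t0, size=k)))

        # Step 3: zero-rate species arrive at rate nu({0}); each arrival is a distinct
        # species seen exactly once.
        k0 = rng.poisson(zero_rate * t0)
        species.extend(np.array([s]) for s in rng.uniform(0.0, t0, size=k0))
        return species

    obs = simulate_mppp(t0=5.0)
    fof = np.bincount([len(s) for s in obs])[1:]     # empirical frequency of frequencies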

Figure 1: An illustration of the MPPP. In Step 1, we generate the rates $\lambda_{i}$ of the positive-rate species according to a Poisson process with intensity measure $\tilde{\nu}$. If $\tilde{\nu}(\mathbb{R}_{>0})<\infty$, this is equivalent to first generating $n_{\mathrm{pos}}$, the number of positive-rate species, according to $\mathrm{Poisson}(\tilde{\nu}(\mathbb{R}_{>0}))$, and then generating a random sample $\lambda_{1},\ldots,\lambda_{n_{\mathrm{pos}}}$ of size $n_{\mathrm{pos}}$ from the probability measure $\tilde{\nu}/\tilde{\nu}(\mathbb{R}_{>0})$. In Step 2, we generate the individuals of species $i$ according to a Poisson process with rate $\lambda_{i}$ for $i=1,\ldots,n_{\mathrm{pos}}$. In Step 3, we generate the individuals of zero-rate species (species 6 to 9 in the figure) according to a Poisson process with rate $\nu(\{0\})$.

A special feature of the framework is that $D$ is random and can be infinite with probability one. This change necessitates modification of biodiversity measures, among which Hill numbers are popular. When $D$ is deterministic, the Hill number of order $q$ [25] for $q\geq 0$ and $q\neq 1$ is ${}^{q}\!D=\big(\sum_{i}(p_{\mathrm{sp}}(i))^{q}\big)^{1/(1-q)}$, where $p_{\mathrm{sp}}(i)$ is the relative abundance of species $i$ in an assemblage (the number of individuals of species $i$ divided by the total population). When $q=1$, ${}^{1}\!D$ is defined as $\lim_{q\to 1}{}^{q}\!D=\exp(-\sum_{i}p_{\mathrm{sp}}(i)\log(p_{\mathrm{sp}}(i)))$. Hill numbers are interpreted as the effective number of species. The order $q$ determines the sensitivity of the measure to the species relative abundances. When $q=0$, all species, regardless of their abundance, are treated equally. A larger $q$ puts more weight on abundant species. Hill numbers encompass three important diversity measures as special cases: the species richness ($q=0$), the exponential of Shannon entropy ($q=1$), and the inverse of the Simpson index ($q=2$). A notable property of Hill numbers is that they obey the replication principle: the diversity of an assemblage formed by pooling $m$ equally abundant and equally large assemblages with no species in common is $m$ times the diversity of a single assemblage.

Under our framework, each species corresponds to a Poisson process, and the relative abundance of a species is its rate divided by the expected total rate $\Lambda$. Reasonable modifications to the definition of Hill numbers are to replace $p_{\mathrm{sp}}(i)$ with $\lambda/\Lambda$, where $\lambda$ is the rate of a species, and to replace the summation over species with integration with respect to the measure $\lambda^{-1}\nu$, so that the integral of $\lambda/\Lambda$ is one. This works well when $\Lambda$ is finite, but fails when $\Lambda$ is infinite. To fix the problem, we need a meaningful surrogate rate which fulfills two requirements: (i) the expected total rate is finite, and (ii) it approaches the true rate $\lambda$ as a limit. The first appearance time of a species with rate $\lambda$ follows the exponential distribution with density function $\lambda\exp(-\lambda t)$, which can be regarded as the instantaneous rate of its first appearance at time $t$. This rate approaches $\lambda$ as $t$ decreases to zero. The expected total instantaneous rate of first appearances over all species at time $t$ is $\Lambda_{t}=\int\exp(-\lambda t)\nu(d\lambda)$ ($=E(N_{1}(t))/t$ from (3.1)), which is always finite for positive $t$. Replacing $p_{\mathrm{sp}}(i)$ with $\lambda\exp(-\lambda t)/\Lambda_{t}$ and taking $t\to 0$, we define the Hill numbers ${}^{q}\!D_{\nu}$ for our framework as

{}^{q}\!D_{\nu}=\begin{cases}\lim_{t\to 0}\Big(\Lambda_{t}^{-q}\int\lambda^{q-1}\exp(-\lambda qt)\,\nu(d\lambda)\Big)^{1/(1-q)}&(q\geq 0,\ q\neq 1)\\ \lim_{t\to 0}\Lambda_{t}\exp\Big(-\frac{1}{\Lambda_{t}}\int\left(\log\lambda-\lambda t\right)\exp(-\lambda t)\,\nu(d\lambda)\Big)&(q=1).\end{cases}

When $\nu(\{0\})>0$ and $0\leq q\leq 1$, ${}^{q}\!D_{\nu}=\infty$. When $\Lambda$ is finite,

{}^{q}\!D_{\nu}=\begin{cases}\Big(\Lambda^{-q}\int\lambda^{q-1}\,\nu(d\lambda)\Big)^{1/(1-q)}&(q\geq 0,\ q\neq 1)\\ \Lambda\exp\Big(-\frac{1}{\Lambda}\int\log(\lambda)\,\nu(d\lambda)\Big)&(q=1).\end{cases}

The diversity ${}^{q}\!D_{\nu}$ is non-increasing in $q$. Unlike the classical Hill numbers, ${}^{q}\!D_{\nu}$ can be less than one. For instance, from (3.3), ${}^{0}\!D_{\nu}=E(D)$, which can be any positive value. It can be proved that for any positive constant $\gamma$, ${}^{q}\!D_{\nu}=\gamma({}^{q}\!D_{\nu/\gamma})$, an analogue of the replication principle for Hill numbers.
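When $\Lambda$ is finite and $\nu$ is purely atomic, the integrals above reduce to finite sums; a small illustrative helper (our own, not from the paper):

    import numpy as np

    def hill_number(q, lam, mass):
        """Hill number ^qD_nu for a purely atomic intensity
        nu = sum_j mass[j]*delta_{lam[j]} with finite Lambda and all lam[j] > 0."""
        lam, mass = np.asarray(lam, float), np.asarray(mass, float)
        Lam = mass.sum()
        if q == 1:
            return Lam * np.exp(-np.sum(np.log(lam) * mass) / Lam)
        return (Lam ** (-q) * np.sum(lam ** (q - 1.0) * mass)) ** (1.0 / (1.0 - q))

    # One species of rate 1 and one of rate 3: nu({lam}) = lam, so nu_tilde({lam}) = 1.
    lam, mass = [1.0, 3.0], [1.0, 3.0]
    print(hill_number(0, lam, mass))   # = E(D) = 2
    print(hill_number(1, lam, mass))   # exponential of Shannon entropy, about 1.75
    print(hill_number(2, lam, mass))   # inverse Simpson index, 1.6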

3 Frequency of Frequencies

A sufficient statistic for a realization $G$ of an MPPP in the time interval $[0,t_{0}]$ is $\tilde{N}(t_{0})$. Consider a time interval $[0,t]$. Each $\lambda_{i}$ in Step 2 of the definition contributes one count to one of the $N_{k}(t)$, where $k$ follows a $\mathrm{Poisson}(\lambda_{i}t)$ distribution. By the splitting property of Poisson processes [29], the total contribution to $N_{k}(t)$ follows a $\mathrm{Poisson}(\int(\lambda t)^{k}(k!)^{-1}e^{-\lambda t}\tilde{\nu}(d\lambda))$ distribution, independently across $k$. The zero-rate species only increase $N_{1}(t)$ by a $\mathrm{Poisson}(\nu(\{0\})t)$ random variate. Therefore, for $k\geq 1$,

N_{k}(t)\sim\mathrm{Poisson}\left(\int\frac{(\lambda t)^{k}\exp(-\lambda t)}{k!}\tilde{\nu}(d\lambda)+\mathbf{1}\{k=1\}\nu(\{0\})t\right)=\mathrm{Poisson}\left(\int\frac{\lambda^{k-1}t^{k}\exp(-\lambda t)}{k!}\nu(d\lambda)\right)

(we use the convention that $0^{0}=1$). Equations (3.2) and (3.3) follow from (3.1) for $k\geq 1$. The correctness of (3.1) when $k=0$ follows from (3.2), (3.3) and the fact that $E(N_{0}(t))=E(D)-E(N_{+}(t))$.

E(N_{k}(t))=\int\frac{\lambda^{k-1}t^{k}\exp(-\lambda t)}{k!}\nu(d\lambda),\qquad(k\geq 0)   (3.1)

E(N_{+}(t))=\sum_{k=1}^{\infty}E(N_{k}(t))=\int\frac{1-\exp(-\lambda t)}{\lambda}\nu(d\lambda),   (3.2)

E(D)=\lim_{t\to\infty}E(N_{+}(t))=\int\lambda^{-1}\nu(d\lambda).   (3.3)

Again, for (3.2), we set $(1-\exp(-\lambda t))/\lambda=t$ when $\lambda=0$. When $\nu(\{0\})>0$, $E(D)$ is infinite. Since the elements of $\{N_{k}(t)\}_{k\geq 1}$ are independent and Poisson distributed, the variable $N_{+}(t)$ is Poisson distributed, and so are $D$ and $N_{0}(t)$ when their expected values are finite.
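For a purely atomic $\nu$ (as in the earlier sketches), (3.1) and (3.2) become finite sums; a short illustration with our own helper names:

    import numpy as np
    from scipy.special import factorial

    def expected_fof(t, lam, mass, zero_mass=0.0, kmax=20):
        """E(N_k(t)) for k = 1..kmax (Eq. 3.1) under an atomic intensity
        nu = sum_j mass[j]*delta_{lam[j]} + zero_mass*delta_0 with lam[j] > 0."""
        lam, mass = np.asarray(lam, float), np.asarray(mass, float)
        k = np.arange(1, kmax + 1)[:, None]
        e_nk = np.sum(lam ** (k - 1) * t ** k * np.exp(-lam * t) / factorial(k) * mass, axis=1)
        e_nk[0] += zero_mass * t          # zero-rate species contribute to E(N_1(t)) only
        return e_nk

    def expected_sac(t, lam, mass, zero_mass=0.0):
        """E(N_+(t)) (Eq. 3.2); the zero-rate atom contributes zero_mass * t."""
        lam, mass = np.asarray(lam, float), np.asarray(mass, float)
        return np.sum((1.0 - np.exp(-lam * t)) / lam * mass) + zero_mass * t

    lam, mass = [1.0, 3.0], [1.0, 3.0]
    print(expected_fof(2.0, lam, mass)[:3])   # E(N_1), E(N_2), E(N_3) at t = 2
    print(expected_sac(2.0, lam, mass))       # approaches E(D) = 2 as t grows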

Write $S(t)=\sum_{k=1}^{\infty}kN_{k}(t)$, which is the number of individuals observed before time $t$. Then

E(S(t))=\int\sum_{k=1}^{\infty}\frac{\lambda^{k-1}t^{k}\exp(-\lambda t)}{(k-1)!}\nu(d\lambda)=t\int\nu(d\lambda)=t\Lambda.   (3.4)

From (3.1), for $k\geq 0$,

\frac{(k+1)E(N_{k+1}(t_{0}))}{t_{0}}=\int\lambda\,\frac{(\lambda t_{0})^{k}\exp(-\lambda t_{0})}{k!\,\lambda}\nu(d\lambda).

By the thinning property of Poisson processes,

\frac{(\lambda t_{0})^{k}\exp(-\lambda t_{0})}{k!\,\lambda}\nu(d\lambda)

is the intensity measure of the rate for the species represented $k$ times in $[0,t_{0}]$. We can interpret $(k+1)E(N_{k+1}(t_{0}))/t_{0}$ as the expected total rate of all species represented $k$ times in $[0,t_{0}]$. As pointed out in Section 2 and from (3.4), $\Lambda=E(S(t_{0}))/t_{0}$ is the expected total rate of all species. Conditional on the event that we will observe an individual at a given future time $t_{1}>t_{0}$, the probability that this individual belongs to a species represented $k$ times in $[0,t_{0}]$ is equal to

\frac{(k+1)E(N_{k+1}(t_{0}))/t_{0}}{E(S(t_{0}))/t_{0}}=\frac{(k+1)E(N_{k+1}(t_{0}))}{E(S(t_{0}))}

(a formal proof is given in Appendix D), which corresponds to the renowned Good-Turing frequency estimator in Good [21]. It is worth pointing out that the above equation holds for an individual observed at a given future time rather than for the next observed individual.

From the well-known relation between independent Poisson random variables and the multinomial distribution, model (1.1) is valid under the framework with

p_{k}(t)=\frac{E(N_{k}(t))}{E(N_{+}(t))}=\frac{\int(k!)^{-1}\lambda^{k-1}t^{k}\exp(-\lambda t)\nu(d\lambda)}{\int\lambda^{-1}(1-\exp(-\lambda t))\nu(d\lambda)},\qquad(k=1,2,\ldots).   (3.5)

A formal proof is given in Appendix E. If $E(D)$ is finite, $\lim_{t\to\infty}p_{k}(t)=0$ for any fixed $k$. The joint probability mass function of $\tilde{N}(t_{0})$ is

P(\tilde{N}(t_{0})=\tilde{n}(t_{0})\mid\nu)=\exp\left(-E(N_{+}(t_{0}))\right)\prod_{k=1}^{\infty}\frac{(E(N_{k}(t_{0})))^{n_{k}(t_{0})}}{n_{k}(t_{0})!}.

In terms of the expected FoF, the log-likelihood function is

\log(\mathcal{L}(\{E(N_{k}(t_{0}))\}_{k\geq 1}\mid\tilde{n}(t_{0})))=-E(N_{+}(t_{0}))+\sum_{k=1}^{\infty}n_{k}(t_{0})\log(E(N_{k}(t_{0}))).   (3.6)

In terms of $p(t_{0})$ and $E(N_{+}(t_{0}))$, it is

\log(\mathcal{L}(p(t_{0}),E(N_{+}(t_{0}))\mid\tilde{n}(t_{0})))=-E(N_{+}(t_{0}))+n_{+}(t_{0})\log(E(N_{+}(t_{0})))+\sum_{k=1}^{\infty}n_{k}(t_{0})\log(p_{k}(t_{0})).

If the unknown vector $p(t_{0})$ and the quantity $E(N_{+}(t_{0}))$ are unrelated, the above log-likelihood function implies that the maximum likelihood estimator (MLE) of $p(t_{0})$ is the conditional maximum likelihood estimator (conditional on the observed $n_{+}(t_{0})$) for the multinomial distribution in (1.1). The MLE of $E(N_{+}(t_{0}))$ is $n_{+}(t_{0})$.
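A direct numerical use of (3.6) is to evaluate it for a candidate model whose expected FoF has been truncated at a sufficiently large index; a minimal sketch (our own helper):

    import numpy as np

    def log_likelihood_fof(expected_fof, observed_fof):
        """Log-likelihood (3.6) of a model specified through its expected FoF,
        mu[k-1] = E(N_k(t0)), truncated at the same length as the observed FoF
        n[k-1] = n_k(t0)."""
        mu = np.asarray(expected_fof, float)
        n = np.asarray(observed_fof, float)
        ll = -mu.sum()                            # approximates -E(N_+(t0)) after truncation
        pos = n > 0
        ll += np.sum(n[pos] * np.log(mu[pos]))    # -inf if the model puts no mass where data exist
        return ll

Maximizing this quantity over the model parameters (for example over $(a,b,c)$ in the $\mathrm{LDR}_{1}$ family of Section 5) gives the MLE discussed above.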

4 Expected Species Accumulation Curve

Denote the expected (empirical) SAC (ESAC) as $\psi(t)=E(N_{+}(t))$. Condition (2.1) guarantees that $\psi(t)$ is finite for any finite $t$. For a real-valued function $g(t)$, let $g^{(m)}(t)$ stand for the $m$th-order derivative of $g(t)$. Clearly $\psi(0)=0$. From (3.2),

\psi^{(k)}(t)=\int(-\lambda)^{k-1}\exp(-\lambda t)\nu(d\lambda),\qquad(k=1,2,\ldots).   (4.1)

Note that $\psi^{(1)}(0)=\int\nu(d\lambda)=\Lambda$. From (3.1) and (4.1),

\psi^{(k)}(t)=(-1)^{k+1}\frac{k!}{t^{k}}E(N_{k}(t)),\qquad(k=1,2,\ldots).   (4.2)

An analogous expression to (4.2) appears in Béguinot [2] as an approximate formula, under the multinomial model with a fixed total number of observed individuals, for the species-sample-size curve, with the derivative operator replaced by the difference operator.

Before studying the link between the ESAC and the SAD, two mathematical terms are needed. A function $g(t)$ is a Bernstein function if it is a nonnegative real-valued function on $[0,\infty)$ such that $(-1)^{k+1}g^{(k)}(t)\geq 0$ for all positive integers $k$ [44]. An infinitely differentiable function $f(s)$ on an interval $A$ is called absolutely monotone in $A$ if $f^{(k)}(s)\geq 0$ for $k=0,1,\ldots$ and $s\in A$. The relation between absolutely monotone functions and probability generating functions is well known (see for example Strook [45]).

Proposition 1:

a. A function $\psi(t)$ is the ESAC of an MPPP if and only if $\psi(t)$ is a Bernstein function that passes through the origin.

b. A function $h_{t}(s)$ is the probability generating function of $p(t)$ for a fixed positive $t$ of an MPPP if and only if $h_{t}(0)=0$, $h_{t}(1)=1$, and $h_{t}^{(1)}(s)$ is absolutely monotone in $(-\infty,1)$.

c. An MPPP is uniquely determined by its $\psi(t)$, or by $(h_{t_{0}}(s),\psi(t_{0}))$, where $h_{t_{0}}(s)$ is the probability generating function of $p(t_{0})$ for a fixed positive $t_{0}$.

d. For any MPPP and $t>0$, we have

h_{t}(s)=1-\frac{\psi((1-s)t)}{\psi(t)},\qquad(s\in(-\infty,1]).   (4.3)

Proof of Proposition 1 is given in Appendix F. Equation (4.3) is the population version of (1.4).

It is worth pointing out that Boneh et al. [4] proved that $(-1)^{k+1}\psi^{(k)}(t)\geq 0$ in a different setting ($D$ independent Poisson processes with different rates) and called it "infinite order alternating copositivity".

From (3.6), the log-likelihood function can be re-expressed as a function of $\psi(t)$:

\log(\mathcal{L}(\psi\mid\tilde{n}(t_{0})))=-\psi(t_{0})+\sum_{k=1}^{\infty}n_{k}(t_{0})\log(|\psi^{(k)}(t_{0})|).   (4.4)

It can be shown that the Taylor expansion of $\psi(t)$ at $t_{0}$ converges to $\psi(t)$ when $0\leq t<2t_{0}$:

\psi(t)=E(N_{+}(t_{0}))+\sum_{k=1}^{\infty}(-1)^{k+1}E(N_{k}(t_{0}))\left(\frac{t}{t_{0}}-1\right)^{k},\qquad(0\leq t<2t_{0}).   (4.5)

This signifies that the Good-Toulmin estimator in (1.3) performs satisfactorily in short-term extrapolation when $t_{0}<t<2t_{0}$. We deduce from (4.5) the following unbiased estimator of the derivatives of $\psi(t)$ of different orders:

\hat{\psi}^{(j)}(t)=\frac{(-1)^{j+1}}{t_{0}^{j}}\sum_{k=j}^{\infty}\frac{k!\,n_{k}(t_{0})}{(k-j)!}\left(1-\frac{t}{t_{0}}\right)^{k-j},\qquad(j\geq 1,\ 0<t<2t_{0}).   (4.6)

We use (4.6) only for $0\leq t\leq t_{0}$ (we include $t=0$ here for completeness; recall that $\psi^{(1)}(0)=\Lambda$ can be infinite), because outside this interval $\hat{\psi}^{(j)}(t)$ may not have the correct sign $(-1)^{j+1}$.

The estimator in (4.6) is useful. For example, a concave downward curve in the plot of $1/\hat{\psi}^{(1)}(t)$ for $t\in(0,t_{0}]$ (this corresponds to the diagnostic check for the log-series distribution in Table 1 in Section 5) is an indication that $E(D)=\infty$, because if we believe that there is a linear function $b+ct$ with positive $b$ and $c$ such that $b+ct\geq 1/\psi^{(1)}(t)$ for all $t\geq 0$, then

E(D)=\int_{0}^{\infty}\psi^{(1)}(x)dx\geq\int_{0}^{\infty}(b+cx)^{-1}dx=\infty.

From (4.2), a result parallel to (4.6) is the following relation among expected FoFs:

E(N_{j}(t))=\sum_{k=j}^{\infty}E(N_{k}(t_{0}))\binom{k}{j}\left(\frac{t}{t_{0}}\right)^{j}\left(1-\frac{t}{t_{0}}\right)^{k-j},\qquad(j\geq 1,\ 0\leq t<2t_{0}),   (4.7)

which can be proved using the law of iterated expectations when $0\leq t\leq t_{0}$. Furthermore, for $0\leq t\leq t_{0}$,

E(N_{j}(t)\mid\tilde{N}(t_{0})=\tilde{n}(t_{0}))=\sum_{k=j}^{\infty}n_{k}(t_{0})\binom{k}{j}\left(\frac{t}{t_{0}}\right)^{j}\left(1-\frac{t}{t_{0}}\right)^{k-j},\qquad(j\geq 1).   (4.8)
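Both (4.6) and (4.8) are straightforward to compute from the observed FoF; a short sketch (our own helper functions):

    import numpy as np
    from scipy.special import comb, perm

    def psi_hat_deriv(j, t, fof, t0):
        """Unbiased estimator (4.6) of psi^(j)(t) for 0 < t <= t0, from fof[k-1] = n_k(t0)."""
        fof = np.asarray(fof, float)
        k = np.arange(1, len(fof) + 1)
        m = k >= j
        s = np.sum(perm(k[m], j) * fof[m] * (1.0 - t / t0) ** (k[m] - j))
        return (-1.0) ** (j + 1) / t0 ** j * s

    def rarefied_fof(j, t, fof, t0):
        """Conditional expectation (4.8): E(N_j(t) | FoF at t0), for 0 <= t <= t0."""
        fof = np.asarray(fof, float)
        k = np.arange(1, len(fof) + 1)
        m = k >= j
        return np.sum(fof[m] * comb(k[m], j) * (t / t0) ** j * (1.0 - t / t0) ** (k[m] - j))

    fof, t0 = [10, 5, 2], 1.0
    print(psi_hat_deriv(1, t0, fof, t0))   # equals n_1(t0)/t0, the plug-in from (4.2)
    print(rarefied_fof(1, 0.5, fof, t0))   # expected number of singletons at t = t0/2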

5 Linear First Derivative Ratio Family

The MPPP is nonparametric in nature. It is defined by an intensity measure, an ESAC or an SAD. When a parametric approach is preferred, we restrict our interest to a family of distributions in the MPPP, say by putting constraints on the SAD. A way to portray an SAD is to delineate its probability ratio, $p_{j}(t)/p_{j+1}(t)$ for $j=1,2,\ldots$. It is equivalent to examine $-\psi^{(j)}(t)/\psi^{(j+1)}(t)=tp_{j}(t)/[(j+1)p_{j+1}(t)]$, which we call the $j$th derivative ratio. From (4.1) and the Cauchy-Schwarz inequality, for $j=1,2,\ldots$ and $t\geq 0$, $\psi^{(j)}(t)\psi^{(j+2)}(t)\geq(\psi^{(j+1)}(t))^{2}$. It follows that the $j$th derivative ratio is always a nonnegative, nondecreasing function of $t$. Among all derivative ratios, the first derivative ratio is the most important because $-\psi^{(1)}(t)/\psi^{(2)}(t)=-[d\log(\psi^{(1)}(t))/dt]^{-1}$, which relates to the logarithmic derivative of $\psi^{(1)}(t)=E(N_{1}(t))/t$, the expected total rate of unseen species at time $t$. The following proposition gives a sufficient condition for a function to be the first derivative ratio of an MPPP.

Proposition 2: A sufficient condition for a function $\xi(t)$ on $[0,\infty)$ to be the first derivative ratio of an MPPP is that (i) $\xi(t)$ is a Bernstein function, and (ii) $\xi(0)>0$ or $\xi^{(1)}(0)>1$.

We prove Proposition 2 in Appendix G. Hereafter we use $\xi(t)=-\psi^{(1)}(t)/\psi^{(2)}(t)$ to denote the first derivative ratio.

The simplest nontrivial Bernstein function is a positive linear function on $[0,\infty)$. This suggests the following fundamental family of distributions in the MPPP.

Definition: (Linear first derivative ratio family) An MPPP is said to belong to the linear first derivative ratio family (denoted $\mathrm{LDR}_{1}$) if its first derivative ratio takes the linear form $\xi(t)=b+ct$ for $b\geq 0$ and $c\geq 0$, such that $c>1$ if $b=0$.

The family $\mathrm{LDR}_{1}$ encompasses some common SADs and ESACs. We list the characteristics of all models in $\mathrm{LDR}_{1}$ in Table 1. When $b=0$, $c$ must be larger than 1 (see condition (ii) in Proposition 2); otherwise, from the second row of Table 1, $\lim_{b\downarrow 0}\psi(t)=\infty$ for finite $t$. For some models, Table 1 also presents an equality satisfied by a transformation of $\psi^{(1)}(t)$. Such equalities can be used to produce diagnostic checks for a specified SAD in $\mathrm{LDR}_{1}$ when $\psi^{(1)}(t)$ is replaced by its estimator in (4.6).

Table 1: Models in $\mathrm{LDR}_{1}$ (in the expressions, $a$ is a positive scale parameter)

$c=0$ ($\Rightarrow b>0$)
  SAD (zero-truncated Poisson distribution)$^{1}$: $p_{k}(t)=(t/b)^{k}\exp(-t/b)/[k!(1-\exp(-t/b))]$
  ESAC (negative exponential law): $\psi(t)=ab\exp(1/b)(1-\exp(-t/b))$
  Diagnostic check: $\log(\psi^{(1)}(t))=1/b+\log(a)-t/b$

$c\neq 0,1$, $b>0$
  SAD$^{2}$: $p_{k}(t)=\dfrac{(c-1)c^{k-1}t^{k}\Gamma(1/c+k-1)(b+ct)^{1-1/c-k}}{k!\,\Gamma(1/c)[(b+ct)^{1-1/c}-b^{1-1/c}]}$
  ESAC: $\psi(t)=\dfrac{a(b+c)}{c-1}\left(\left(\dfrac{b+ct}{b+c}\right)^{1-1/c}-\left(\dfrac{b}{b+c}\right)^{1-1/c}\right)$

$c=1/2$ ($\Rightarrow b>0$)
  SAD (geometric distribution)$^{3}$: $p_{k}(t)=(2b/t)[t/(2b+t)]^{k}$
  ESAC (hyperbola law)$^{4}$: $\psi(t)=a(2b+1)^{2}t/(2b(t+2b))$
  Diagnostic check: $(\psi^{(1)}(t))^{-1/2}=(t+2b)/[a^{1/2}(2b+1)]$

$c=1$ ($\Rightarrow b>0$)
  SAD (log-series distribution): $p_{k}(t)=[t/(t+b)]^{k}/(k\log(1+t/b))$
  ESAC (Kobayashi's logarithm law)$^{5}$: $\psi(t)=a(b+1)\log(1+t/b)$
  Diagnostic check: $1/\psi^{(1)}(t)=(b+t)/[a(b+1)]$

$b=0$ ($\Rightarrow c>1$ or $c=\infty$)
  SAD$^{6}$: when $1<c<\infty$, $p_{k}(t)=(c-1)\Gamma(1/c+k-1)/[k!\,c\,\Gamma(1/c)]$; when $c=\infty$, $p_{1}(t)=1$ and $p_{k}(t)=0$ for $k>1$.
  ESAC (power law): when $1<c<\infty$, $\psi(t)=act^{1-1/c}/(c-1)$; when $c=\infty$, $\psi(t)=at$.
  Diagnostic check: $\log(\psi^{(1)}(t))=\log(a)-\log(t)/c$

Footnotes to Table 1:
  1. The simplest MPPP, with all species having the same rate $1/b$.
  2. When $0<c<1$ ($\Rightarrow b>0$), it is the zero-truncated negative binomial distribution.
  3. A special case of the zero-truncated negative binomial distribution.
  4. Also known as the Michaelis-Menten equation and the Monod model.
  5. Kobayashi [30] (see also Fisher et al. [20] and May [36]).
  6. This distribution appears in Zhou et al. [51].

$\mathrm{LDR}_{1}$ has three parameters $a$, $b$ and $c$. Parameters $b$ and $c$ determine the SAD, and $a=\psi^{(1)}(1)$ is a scale parameter. As pointed out at the end of Section 3, the MLE of $b$ and $c$ is equivalent to the conditional MLE (conditional on the observed $n_{+}(t_{0})$) of the multinomial model in (1.1). The MLE of $a$ is chosen to make $\hat{E}(N_{+}(t_{0}))$ equal to $n_{+}(t_{0})$.

From Table 1, we can find $E(N_{k}(t))$ and $\psi^{(k)}(t)$ using the relations $E(N_{k}(t))=p_{k}(t)\psi(t)$ and

\psi^{(k)}(t)=(-1)^{k+1}\frac{k!}{t^{k}}p_{k}(t)\psi(t).

It can be shown that for $\mathrm{LDR}_{1}$, $\nu(\{0\})=0$ and

\tilde{\nu}(d\lambda)=\begin{cases}\frac{a((b+c)/c)^{1/c}\lambda^{1/c-2}}{\Gamma(1/c)}\exp(-b\lambda/c)\,d\lambda&(c>0),\\ ab\exp(1/b)\,\delta_{1/b}(d\lambda)&(c=0),\end{cases}   (5.1)

where $\delta_{1/b}(A)=\mathbf{1}\{1/b\in A\}$ is the Dirac measure. The intensity measure $\tilde{\nu}$ takes the form of a gamma distribution with extended shape parameter $1/c-1$ for nonnegative $c$. Therefore, $p(t)$ is Engen's extended negative binomial distribution [19] with support $\{1,2,\ldots\}$. The parameter $c$ determines the shape of the gamma distribution.
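To make the Table 1 expressions concrete, the following sketch (our own helper, not from the paper) evaluates the $\mathrm{LDR}_{1}$ ESAC for the different parameter cases.

    import numpy as np

    def ldr1_esac(t, a, b, c):
        """Expected SAC psi(t) of an LDR_1 model with xi(t) = b + c*t (Table 1);
        requires a > 0, b >= 0, c >= 0, and c > 1 when b = 0."""
        t = np.asarray(t, float)
        if c == 0:                            # negative exponential law
            return a * b * np.exp(1.0 / b) * (1.0 - np.exp(-t / b))
        if b == 0:                            # power law (c > 1)
            return a * c * t ** (1.0 - 1.0 / c) / (c - 1.0)
        if c == 1:                            # Kobayashi's logarithm law
            return a * (b + 1.0) * np.log(1.0 + t / b)
        return (a * (b + c) / (c - 1.0)) * (((b + c * t) / (b + c)) ** (1.0 - 1.0 / c)
                                            - (b / (b + c)) ** (1.0 - 1.0 / c))

    print(ldr1_esac(5.0, a=1.0, b=1.5, c=0.0))   # zero-truncated Poisson SAD
    print(ldr1_esac(5.0, a=1.0, b=0.5, c=1.0))   # log-series SAD
    print(ldr1_esac(5.0, a=1.0, b=0.0, c=1.5))   # power law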

The Hill number of order $q$ for $\mathrm{LDR}_{1}$ is

{}^{q}\!D_{\nu}=\begin{cases}ab\exp(1/b)&(c=0)\\ a((b+c)/b)^{1/c}(b/c)\big(\Gamma(\tfrac{1}{c}+q-1)/\Gamma(\tfrac{1}{c})\big)^{1/(1-q)}&(c>0,\ q>1-\tfrac{1}{c},\ q\neq 1)\\ a((b+c)/b)^{1/c}(b/c)\exp(-\Psi(\tfrac{1}{c}))&(c>0,\ q=1)\\ \infty&(c>0,\ q\leq 1-\tfrac{1}{c}),\end{cases}

where $\Psi(x)$ is the digamma function. $E(D)$ can be found either as ${}^{0}\!D_{\nu}$ or as $\lim_{t\to\infty}\psi(t)$. The species richness $E(D)={}^{0}\!D_{\nu}=\infty$ if and only if $c\geq 1$. In this case, $E(N_{k}(t))$ is increasing in $t$ for any fixed $k$. When $E(D)<\infty$, $E(N_{k}(t))$ is unimodal with respect to $t$.

An ESAC is said to follow a power law if $\psi(t)\propto t^{\tau}$ for some $0<\tau\leq 1$. From Table 1, the SAD at time $t$ for a power law has the form

P(X=k)=\frac{(1-\beta)(\beta)_{k-1}}{k!},\qquad(k=1,2,\ldots)

where $0\leq\beta<1$ ($\beta$ is $1/c$ in Table 1), and $(a)_{i}=a(a+1)\cdots(a+i-1)$ for $i\geq 1$ with $(a)_{0}=1$ is the rising factorial (this distribution appears in Zhou et al. [51]). This SAD does not depend on $t$. We call this discrete distribution the power species abundance distribution (PSAD). If $X$ follows a PSAD with parameter $\beta$, then $X-1$ follows the generalized hypergeometric distribution ${}_{2}F_{1}(\beta,1;2;1)$ [28, 27].

Proposition 3: Under the MPPP, the power law is the only ESAC whose SAD does not depend on $t$. Furthermore, if an SAD under the MPPP has a proper limiting distribution as $t$ approaches infinity, then the limiting distribution must be a PSAD.

The proof of Proposition 3 is given in Appendix H.

Three distributions in $\mathrm{LDR}_{1}$ are exceptional. They represent three boundary situations. The first is the zero-truncated Poisson distribution ($c=0$). It models the extreme case in which all species have equal abundance. In the approach that assumes finite $D$, it is a common reference distribution, say in interpreting ${}^{q}\!D_{\nu}$ and in deriving nonparametric estimators of $D$.

The second is the log-series distribution ($c=1$). It is where $E(D)$ jumps from a finite value to an infinite value in $\mathrm{LDR}_{1}$. Therefore, if we are interested only in finite-$D$ models, it is a boundary case. Because of this property, it is used in this paper as a reference distribution in the graphical check for the finiteness of $E(D)$ (refer to the end of Section 4, and the checking of slope 1 in the D1/D2 plot introduced later).

The last is the power law ($b=0$). It is the only possible limiting distribution of an SAD in the MPPP as $t$ approaches $\infty$. All SADs in $\mathrm{LDR}_{1}$ with $c>1$ converge to it as $t$ increases without bound. It is also the only distribution in $\mathrm{LDR}_{1}$ that has $E(S(t))=\infty$ for any $t>0$.

We extend the linear first derivative ratio family to the linear $j$th derivative ratio family, which we denote as $\mathrm{LDR}_{j}$. A $\psi(t)$ belongs to $\mathrm{LDR}_{j}$ if $-\psi^{(j)}(t)/\psi^{(j+1)}(t)$ is a linear function of $t$. We prove in Appendix I that $\mathrm{LDR}_{2}=\mathrm{LDR}_{3}=\cdots$, and that $\mathrm{LDR}_{2}$ is simply a mixture of zero-rate species and $\mathrm{LDR}_{1}$ (i.e., the $\tilde{\nu}$ of $\mathrm{LDR}_{2}$ satisfies (5.1), but $\nu(\{0\})$ can be positive).

6 Diagnostic Plots

An advantage of $\mathrm{LDR}_{1}$ is that it has a simple diagnostic plot: draw $\hat{\xi}(t)=-\hat{\psi}^{(1)}(t)/\hat{\psi}^{(2)}(t)$ as a function of $t\in[0,t_{0}]$, with $\hat{\psi}^{(1)}(t)$ and $\hat{\psi}^{(2)}(t)$ defined in (4.6). If the curve in the plot is almost linear, $\mathrm{LDR}_{1}$ is an appropriate model. The approximate intercept and slope of the curve can be used as initial estimates of $b$ and $c$ in finding the MLE. We call the plot the D1/D2 plot, and the curve for $\hat{\xi}(t)$ in the plot the D1/D2 curve.
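A sketch of how the D1/D2 curve can be computed from the observed FoF (our own helper, built on the estimator in (4.6)):

    import numpy as np
    from scipy.special import perm

    def d1_d2_curve(fof, t0, num=200):
        """D1/D2 curve: xi_hat(t) = -psi_hat^(1)(t)/psi_hat^(2)(t) on (0, t0],
        with both derivative estimates taken from (4.6)."""
        fof = np.asarray(fof, float)
        k = np.arange(1, len(fof) + 1)

        def deriv(j, t):
            m = k >= j
            s = np.sum(perm(k[m], j) * fof[m] * (1.0 - t / t0) ** (k[m] - j))
            return (-1.0) ** (j + 1) / t0 ** j * s

        ts = np.linspace(t0 / num, t0, num)
        xi_hat = np.array([-deriv(1, t) / deriv(2, t) for t in ts])
        return ts, xi_hat

    # Under LDR_1 the curve is close to the line b + c*t; a rough intercept and slope
    # read off the plot give starting values for the MLE of (b, c).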

Similarly, to investigate how well $\mathrm{LDR}_{2}$ fits the data, we can plot the function $-\hat{\psi}^{(2)}(t)/\hat{\psi}^{(3)}(t)$ for $t\in[0,t_{0}]$, where $\hat{\psi}^{(2)}(t)$ and $\hat{\psi}^{(3)}(t)$ are defined in (4.6). We call the plot the D2/D3 plot, and the curve in it the D2/D3 curve.

As $N_{1}(t_{0}),N_{2}(t_{0}),\ldots$ are independent and Poisson distributed, by the delta method we can approximate $Var(\hat{\xi}(t))$ by

\widehat{Var}(\hat{\xi}(t))=\frac{\widehat{Var}(\hat{\psi}^{(1)}(t))}{\hat{\psi}^{(2)2}(t)}+\frac{\hat{\psi}^{(1)2}(t)\widehat{Var}(\hat{\psi}^{(2)}(t))}{\hat{\psi}^{(2)4}(t)}-\frac{2\hat{\psi}^{(1)}(t)\widehat{Cov}(\hat{\psi}^{(1)}(t),\hat{\psi}^{(2)}(t))}{\hat{\psi}^{(2)3}(t)},

where

\widehat{Var}(\hat{\psi}^{(1)}(t))=\frac{1}{t_{0}^{2}}\sum_{k=1}^{\infty}k^{2}N_{k}(t_{0})(1-t/t_{0})^{2k-2},   (6.1)

\widehat{Var}(\hat{\psi}^{(2)}(t))=\frac{1}{t_{0}^{4}}\sum_{k=2}^{\infty}k^{2}(k-1)^{2}N_{k}(t_{0})(1-t/t_{0})^{2k-4},

and

\widehat{Cov}(\hat{\psi}^{(1)}(t),\hat{\psi}^{(2)}(t))=-\frac{1}{t_{0}^{3}}\sum_{k=2}^{\infty}k^{2}(k-1)N_{k}(t_{0})(1-t/t_{0})^{2k-3}.

We use $\hat{\xi}(t)\pm 1.96\sqrt{\widehat{Var}(\hat{\xi}(t))}$ as an approximate 95% pointwise confidence band for $\hat{\xi}(t)$ in the D1/D2 plot. A similar confidence band can be constructed in the D2/D3 plot (see Appendix J).

The diagnostic checks in Table 1 suggest graphs to examine the fit of the zero-truncated Poisson distribution, the geometric distribution, the log-series distribution, and the power law to the FoF. Graphical checks for the first three distributions exist in the statistical literature; see for example Hoaglin and Tukey [26]. Our plots are new additions from a totally new point of view. The plot focuses on $\hat{\psi}^{(1)}(t)$, the estimated expected total rate of unseen species at time $t$. The graphical check detects discrepancies between the estimated function and its expected pattern for $t\in[0,t_{0}]$ when a specified distribution is assumed. Since the plotted $y$-value in the diagnostic plot has the form $g(\hat{\psi}^{(1)}(t))$, an approximate pointwise confidence band for the curve can be obtained using the approximation $Var(g(\hat{\psi}^{(1)}(t)))\approx[g^{(1)}(\hat{\psi}^{(1)}(t))]^{2}\widehat{Var}(\hat{\psi}^{(1)}(t))$, with $\widehat{Var}(\hat{\psi}^{(1)}(t))$ defined in (6.1).

Currently, a standard diagnostic plot for the power law is the log-log plot, which plots $\log(\hat{\psi}(t))$ against $\log(t)$ for $0<t\leq t_{0}$. As $d\log(\psi(t))/d\log(t)=p_{1}(t)$, the log-log plot detects whether $p_{1}(t)$ is a constant function. As suggested by the diagnostic check of the power law in Table 1, we can instead plot $\log(\hat{\psi}^{(1)}(t))$ against $\log(t)$ for $0<t\leq t_{0}$. We call it the log(D)-log plot. The log(D)-log plot checks whether $p_{2}(t)/p_{1}(t)$ is a constant function because $d\log(\psi^{(1)}(t))/d\log(t)=-2p_{2}(t)/p_{1}(t)$. The log(D)-log plot is more sensitive to discrepancies from the power law because $p_{1}(t)$ changes very slowly with respect to $t$ for many SADs. It is well known in species-area relationship studies that the curve in the log-log plot is approximately linear for various dissimilar SADs [39, 40, 36, 33].

In Figure 2, we consider the four $\mathrm{LDR}_{1}$ distributions for which the diagnostic graphs are designed. The parameters $(b,c)$ are $(1.5,0)$ (zero-truncated Poisson distribution), $(1,0.5)$ (geometric distribution), $(0.5,1)$ (log-series distribution) and $(0,1.5)$ (power law). The parameter $a$ is chosen so that $\psi(5)=200$. We draw the ESAC in panel (a) and the SAD at $t=5$ in panel (b). We can discriminate the power law (the black curve in panel (a)) from the other distributions because its $\psi^{(1)}(0)=\infty$ (recall that $\psi^{(1)}(0)=\Lambda$ is the expected total rate). Other than this, we learn little about the parameters $b$ and $c$ from visual inspection of the plots in panels (a) and (b). We need special plots to discriminate different parameter values. In the D1/D2 plot in panel (c), the four distributions correspond to four straight lines with different slopes $c$. The graphical checks in panels (d)-(g) are designed for each special distribution so that when the FoF follows that distribution, the curve in the plot is a straight line, drawn as a heavy line in Figure 2. Therefore, how straight the curve is can be used to assess how well the distribution fits the data.

Figure 2: Plots for four $\mathrm{LDR}_{1}$ models with $\xi(t)=(1.5-c)+ct$ for $c=1.5,1,0.5,0$. When $c=1.5$ ($b=0$), it is a power law and the curve is shown in black. When $c=1$, it is a log-series distribution and the curve is shown in red. When $c=0.5$, it is a geometric distribution and the curve is shown in green. When $c=0$, it is a zero-truncated Poisson distribution and the curve is shown in blue. In panel (a), we draw the ESAC. In panel (b), we draw the SAD at time $t=5$. In panel (c), we draw the D1/D2 plot. In panels (d)-(g), we draw the graphical checks for the zero-truncated Poisson distribution, the geometric distribution, the log-series distribution and the power law respectively. The straight lines in the plots are drawn with a heavier weight.

7 Species Richness via Extrapolation

Assuming the whole curve $\xi(t)$ to be linear may be too ambitious. If species richness is our main concern, our aim is to estimate $E(N_{0}(t_{0}))$, and it is convenient to concentrate only on the rare species. We classify species into two groups: rare species and abundant species. Assume that (B1) all species represented 0, 1, 2, and 3 times in the time interval $[0,t_{0}]$ are rare, and (B2) the rare species follow an $\mathrm{LDR}_{1}$ distribution. (The categorization of species into rare and abundant in (B1) and (B2) is vague and data-dependent. A clear-cut threshold at 3 for the abundant species is artificial. It leads to an intuitive but debatable conclusion: all abundant species are represented at least four times in $[0,t_{0}]$. Results deduced from (B1) and (B2) can only be approximations.) Observations $n_{1}(t_{0})$, $n_{2}(t_{0})$ and $n_{3}(t_{0})$ are data from this model truncated at both ends. We do not know how many rare species are represented four times or more in $[0,t_{0}]$ because the observed $n_{j}(t_{0})$ for $j\geq 4$ may include abundant species.

It can be shown that for $\mathrm{LDR}_{1}$,

\frac{E(N_{j}(t_{0}))E(N_{j+2}(t_{0}))}{E^{2}(N_{j+1}(t_{0}))}=\frac{(j+1)(jc+1)}{(j+2)((j-1)c+1)},   (7.1)

for $j=1,2,\ldots$, and the ratio does not depend on $t_{0}$. When $j=1$, we have

\frac{E(N_{1}(t_{0}))E(N_{3}(t_{0}))}{E^{2}(N_{2}(t_{0}))}=\frac{2(c+1)}{3}.

The information about the rare species from $N_{1}(t_{0})$, $N_{2}(t_{0})$ and $N_{3}(t_{0})$ is barely enough to make $c$ estimable. When $N_{2}(t_{0})>0$, a plug-in estimator of $c$ is

\hat{c}^{*}=\frac{3N_{1}(t_{0})N_{3}(t_{0})}{2N_{2}^{2}(t_{0})}-1.

The estimator $\hat{c}^{*}$ needs modification to take into account two inequality constraints: $c\geq 0$ and $b\geq 0$. Under $\mathrm{LDR}_{1}$, the constraint $b\geq 0$ is equivalent to $c$, the slope of $\xi(t)$, satisfying $E(N_{1}(t_{0}))/(2E(N_{2}(t_{0})))=\xi(t_{0})/t_{0}\geq c$. Therefore, our final estimator of $c$ when $N_{2}(t_{0})>0$ is

\hat{c}_{F}=\max(\min(\hat{c}^{*},\,N_{1}(t_{0})/(2N_{2}(t_{0}))),\,0).

If $0\leq c<1$, $E(N_{0}(t_{0}))$ ($=E(D)-\psi(t_{0})$) is finite, and (7.1) remains true when $j=0$. We have

E(N_{0}(t_{0}))=\begin{cases}E^{2}(N_{1}(t_{0}))/[2(1-c)E(N_{2}(t_{0}))]&(0\leq c<1)\\ \infty&(c\geq 1).\end{cases}   (7.2)

Using the estimator $\hat{c}_{F}$ and (7.2), we have the following estimator of $E(N_{0}(t_{0}))$:

\hat{E}(N_{0}(t_{0}))=\begin{cases}N_{1}^{2}(t_{0})/[2(1-\hat{c}_{F})N_{2}(t_{0})]&(N_{2}(t_{0})>0,\ 0\leq\hat{c}_{F}<1)\\ \infty&(N_{2}(t_{0})>0,\ \hat{c}_{F}\geq 1)\ \text{or}\ (N_{1}(t_{0})>0,\ N_{2}(t_{0})=0)\\ 0&(N_{1}(t_{0})=N_{2}(t_{0})=0).\end{cases}   (7.3)

From (B1), all unseen species are rare. Estimator E^(N0(t0))\hat{E}(N_{0}(t_{0})) depends only on rare species data N1(t0),N2(t0)N_{1}(t_{0}),N_{2}(t_{0}), and N3(t0)N_{3}(t_{0}). When c^F=0\hat{c}_{F}=0, E^(N0(t0))\hat{E}(N_{0}(t_{0})) reduces to Chao1 estimator [7] of E(N0(t0))E(N_{0}(t_{0})), which is N12(t0)/[2N2(t0)]N_{1}^{2}(t_{0})/[2N_{2}(t_{0})]. If N2(t0)>0N_{2}(t_{0})>0 and c^F1\hat{c}_{F}\geq 1, a reasonable estimate of E(N0(t0))E(N_{0}(t_{0})) is infinity. When N2(t0)=0N_{2}(t_{0})=0, our estimate of E(N0(t0))E(N_{0}(t_{0})) is \infty when N1(t0)>0N_{1}(t_{0})>0, and is 0 when N1(t0)=0N_{1}(t_{0})=0. From (B1), all abundant species are observed before time t0t_{0}. Their contribution to DD is included in N+(t0)N_{+}(t_{0}). Our estimator of E(D)E(D) is E^(D)=N+(t0)+E^(N0(t0))\hat{E}^{*}(D)=N_{+}(t_{0})+\hat{E}(N_{0}(t_{0})).

The Chao1 estimator [7] is a lower bound estimator of DD and works well when the rare species have equal abundance, which corresponds to the LDR1\mathrm{LDR}_{1} model with c=0c=0. Our estimator E^(D)\hat{E}^{*}(D) extends the Chao1 estimator in the sense that we assume the rare species follow the LDR1\mathrm{LDR}_{1} model and estimate the parameter cc using the FoF of the rare species. It reduces to the Chao1 estimator when c^F=0\hat{c}_{F}=0.
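To make the computation concrete, the short Python sketch below (an illustration of ours, not software from the paper) evaluates c_F, Equation (7.3), and E*(D) from the rare-species counts N1(t0), N2(t0), N3(t0) and the number of seen species N+(t0).

```python
import math

def extended_chao1(n_plus, n1, n2, n3):
    """Extended Chao1 estimate E*(D) = N_+(t0) + E(N_0(t0)) of Section 7,
    using only N_1(t0), N_2(t0), N_3(t0) and the number of seen species N_+(t0)."""
    if n2 > 0:
        c_star = 3.0 * n1 * n3 / (2.0 * n2 ** 2) - 1.0        # plug-in estimator of c
        c_f = max(min(c_star, n1 / (2.0 * n2)), 0.0)          # impose c >= 0 and b >= 0
        n0_hat = n1 ** 2 / (2.0 * (1.0 - c_f) * n2) if c_f < 1.0 else math.inf  # (7.3)
    else:
        n0_hat = math.inf if n1 > 0 else 0.0
    return n_plus + n0_hat

# Bird abundance data in Table 2: N1 = 11, N2 = 12, N3 = 10, N_+(t0) = 72.
print(round(extended_chao1(72, 11, 12, 10), 1))   # 77.9, as reported in Section 9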

A better way to understand the estimator E^(D)\hat{E}^{*}(D) is to relate it to an extrapolation of ξ^(t)\hat{\xi}(t), because we can then assess its suitability in the D1/D2 plot. Abundant species usually appear early; species that show up late are likely rare. The rear part of ξ^(t)\hat{\xi}(t) therefore depends mainly on the rare species. Extrapolation is a technique to extend our knowledge about the rare species to the unseen species. (B1) and (B2) imply that ξ(t)\xi(t) is approximately linear when tt0t\geq t_{0}. (Footnote 4: We do not require ξ(t)\xi(t) to be exactly linear when tt0t\geq t_{0} because, given the rear part of ξ(t)\xi(t), we can find E(Ni(t))E(N_{i}(t)) for i1i\geq 1 and tt0t\geq t_{0} and then perform interpolation using Equation (4.7) to find ξ(t)\xi(t) for all t0t\geq 0. If ξ(t)\xi(t) is linear after t0t_{0}, it can be shown that ξ(t)\xi(t) is linear in [0,)[0,\infty).) Consider two simple linear extrapolation methods in the D1/D2 plot. The zeroth-order extrapolation uses ξ^(t)=ξ^(t0)=t0N1(t0)/[2N2(t0)]\hat{\xi}(t)=\hat{\xi}(t_{0})=t_{0}N_{1}(t_{0})/[2N_{2}(t_{0})] for t>t0t>t_{0}. Another extrapolation uses

ξ^(t)=ξ^(t0)+c^F(tt0),(t>t0).\hat{\xi}(t)=\hat{\xi}(t_{0})+\hat{c}_{F}(t-t_{0}),~{}~{}~{}(t>t_{0}). (7.4)

As c^=ξ^(1)(t0)\hat{c}^{*}=\hat{\xi}^{(1)}(t_{0}), (7.4) is the first-order extrapolation of ξ^(t)\hat{\xi}(t) when c^F=c^\hat{c}_{F}=\hat{c}^{*}. We call (7.4) a modified first-order extrapolation. It is always true that

ψ(t)=ψ(t0)+ψ(1)(t0)t0texp(t0y1ξ(x)𝑑x)𝑑y,(tt0).\psi(t)=\psi(t_{0})+\psi^{(1)}(t_{0})\int_{t_{0}}^{t}\exp\left(-\int_{t_{0}}^{y}\frac{1}{\xi(x)}dx\right)dy,~{}~{}(t\geq t_{0}). (7.5)

If we estimate ψ(t0)\psi(t_{0}) by its MLE N+(t0)N_{+}(t_{0}) and ψ(1)(t0)=E(N1(t0))/t0\psi^{(1)}(t_{0})=E(N_{1}(t_{0}))/t_{0} by its plug-in estimator N1(t0)/t0N_{1}(t_{0})/t_{0}, then we can estimate ψ(t)\psi(t) for all t>t0t>t_{0} using (7.5) once an extrapolation rule for ξ(t)\xi(t) is chosen. Species richness is just the limiting value of this ψ(t)\psi(t). If the zeroth-order extrapolation is used, the estimator of E(D)E(D) is the Chao1 estimator. If the modified first-order extrapolation rule is used, the estimator of E(D)E(D) is E^(D)\hat{E}^{*}(D). As ξ(t)\xi(t) is nondecreasing, the zeroth-order extrapolation is a lower bound of all reasonable extrapolations. This explains why the Chao1 estimator estimates a lower bound of E(D)E(D).
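As an illustration, the sketch below evaluates the extrapolated psi(t) in (7.5) numerically under a linear extrapolation rule xi(x) = xi(t0) + c*(x - t0) for x >= t0; the function name and the simple midpoint quadrature are our own choices.

```python
import math

def psi_extrapolate(t, t0, n_plus, n1, n2, c):
    """Extrapolated psi(t) for t >= t0 via Equation (7.5), assuming the linear rule
    xi(x) = xi(t0) + c*(x - t0) with xi(t0) = t0*N1/(2*N2); psi(t0) and psi^(1)(t0)
    are replaced by their estimates N_+(t0) and N1/t0."""
    xi0 = t0 * n1 / (2.0 * n2)
    def survivor(y):
        # exp(-int_{t0}^{y} dx/xi(x)), available in closed form for a linear xi
        if c == 0.0:
            return math.exp(-(y - t0) / xi0)
        return (1.0 + c * (y - t0) / xi0) ** (-1.0 / c)
    m = 20000                                  # midpoint rule for int_{t0}^{t} survivor(y) dy
    h = (t - t0) / m
    integral = h * sum(survivor(t0 + (i + 0.5) * h) for i in range(m))
    return n_plus + (n1 / t0) * integral
```

Setting c = 0 gives the zeroth-order rule, whose limit for large t is the Chao1 estimate; setting c equal to a value of c_F less than 1 gives the modified first-order rule, whose limit is E*(D), in agreement with the discussion above.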

All ξ(t)\xi(t)'s that fulfill the sufficient condition in Proposition 2 are asymptotically linear because their derivatives are nonnegative and non-increasing and thus must have a finite limit. Furthermore, for concave ξ(t)\xi(t), the modified first-order extrapolation likely coincides with the first-order extrapolation. Therefore, under the sufficient condition in Proposition 2, the modified first-order extrapolation is an adequate extrapolation curve when t0t_{0} is large, and E^(D)\hat{E}^{*}(D) should give a plausible estimate.

Equation (7.5) is also useful for assessing the finiteness of E(D)E(D). If ξ(t)β+t\xi(t)\geq\beta+t for a β>0\beta>0 when tt0t\geq t^{*}\geq 0, then

ψ(t)\displaystyle\psi(t) \displaystyle\geq ψ(t)+ψ(1)(t)ttexp(ty1β+x𝑑x)𝑑y\displaystyle\psi(t^{*})+\psi^{(1)}(t^{*})\int_{t^{*}}^{t}\exp\left(-\int_{t^{*}}^{y}\frac{1}{\beta+x}dx\right)dy
=\displaystyle= ψ(t)+ψ(1)(t)(β+t)log(β+tβ+t)when t.\displaystyle\psi(t^{*})+\psi^{(1)}(t^{*})(\beta+t^{*})\log\left(\frac{\beta+t}{\beta+t^{*}}\right)\to\infty~{}~{}\mbox{when }t\to\infty.

Similarly, if ξ(t)<β+γt\xi(t)<\beta+\gamma t for a β>0\beta>0 and 0γ<10\leq\gamma<1 when tt0t\geq t^{*}\geq 0, then

ψ(t)\displaystyle\psi(t) <\displaystyle< ψ(t)+ψ(1)(t)ttexp(ty1β+γx𝑑x)𝑑y\displaystyle\psi(t^{*})+\psi^{(1)}(t^{*})\int_{t^{*}}^{t}\exp\left(-\int_{t^{*}}^{y}\frac{1}{\beta+\gamma x}dx\right)dy
=\displaystyle= ψ(t)+ψ(1)(t)(β+γt)1/γ1γ((β+γt)11/γ(β+γt)11/γ)\displaystyle\psi(t^{*})+\psi^{(1)}(t^{*})\frac{(\beta+\gamma t^{*})^{1/\gamma}}{1-\gamma}\left((\beta+\gamma t^{*})^{1-1/\gamma}-(\beta+\gamma t)^{1-1/\gamma}\right)

where the last expression has a finite limit as tt increases to infinity. These two facts can be used to detect the finiteness of E(D)E(D) in the D1/D2 plot. If we can judge from the ξ^(t)\hat{\xi}(t) in the D1/D2 plot that there are positive values β\beta and tt^{*} such that ξ^(t)β+t\hat{\xi}(t)\geq\beta+t whenever ttt\geq t^{*}, then E(D)E(D) is likely infinite. On the other hand, if ξ^(t)β+γt\hat{\xi}(t)\leq\beta+\gamma t for some 0γ<10\leq\gamma<1, it is reasonable to believe that E(D)E(D) is finite. To help judge whether the rear part of ξ^(t)\hat{\xi}(t) has slope larger or smaller than 1, it is helpful to add grid lines with slope 1 to the D1/D2 plot, as demonstrated in Figure 3.

We can assess the adequacy of the Chao1 estimator by investigating how well N1(t0)N_{1}(t_{0}), N2(t0)N_{2}(t_{0}) and N3(t0)N_{3}(t_{0}) fit a truncated Poisson distribution, that is, by testing

H0:3E(N1(t0))E(N3(t0))2E2(N2(t0))0H_{0}:3E(N_{1}(t_{0}))E(N_{3}(t_{0}))-2E^{2}(N_{2}(t_{0}))\leq 0

against

H1:3E(N1(t0))E(N3(t0))2E2(N2(t0))>0.H_{1}:3E(N_{1}(t_{0}))E(N_{3}(t_{0}))-2E^{2}(N_{2}(t_{0}))>0.

If the null hypothesis is not rejected, the Chao1 estimator gives a reasonable estimate of E(D)E(D); otherwise it only estimates a lower bound of E(D)E(D). An unbiased estimator of 3E(N1(t0))E(N3(t0))2E2(N2(t0))3E(N_{1}(t_{0}))E(N_{3}(t_{0}))-2E^{2}(N_{2}(t_{0})) is T=3N1(t0)N3(t0)2N2(t0)(N2(t0)1)T=3N_{1}(t_{0})N_{3}(t_{0})-2N_{2}(t_{0})(N_{2}(t_{0})-1), which is our test statistic. An unbiased estimator of Var(T)Var(T) is

Var^(T)=9N1(t0)N3(t0)(N1(t0)+N3(t0)1)+8N2(t0)(35N2(t0)+2N22(t0)).\widehat{Var}(T)=9N_{1}(t_{0})N_{3}(t_{0})(N_{1}(t_{0})+N_{3}(t_{0})-1)+8N_{2}(t_{0})(3-5N_{2}(t_{0})+2N_{2}^{2}(t_{0})).

The approximate p-value of the test is P(N(0,1)T/Var^(T))P(N(0,1)\geq T/\sqrt{\widehat{Var}(T)}~{}), where N(0,1)N(0,1) stands for a standard normal random variable. We reject H0H_{0} at significance level α\alpha when the p-value is less than α\alpha.
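A small sketch of this test (our own transcription of the displayed formulas) follows; with the bird abundance data of Table 2 (N1 = 11, N2 = 12, N3 = 10) it returns a p-value of about 0.374, the value quoted in Section 9.

```python
import math

def chao1_adequacy_test(n1, n2, n3):
    """One-sided test of H0: 3E(N1)E(N3) - 2E(N2)^2 <= 0 described above."""
    t_stat = 3.0 * n1 * n3 - 2.0 * n2 * (n2 - 1.0)             # unbiased estimator T
    var_hat = (9.0 * n1 * n3 * (n1 + n3 - 1.0)
               + 8.0 * n2 * (3.0 - 5.0 * n2 + 2.0 * n2 ** 2))  # unbiased estimator of Var(T)
    z = t_stat / math.sqrt(var_hat)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))              # upper-tail N(0,1) probability
    return t_stat, p_value
```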

8 Rational First Derivative Ratio Family

When the curve in the D1/D2 plot is not linear but concave, it is natural to model ξ(t)\xi(t) as a rational function. A quotient of two linear functions is too restrictive because it is in general asymptotically flat; it implies not only that E(D)E(D) is usually finite, but also that the rare species are homogeneously abundant. Therefore, we consider the ratio of a quadratic polynomial to a linear function. This form of ξ(t)\xi(t) is asymptotically linear with a flexible slope. After simple manipulation, we obtain the following expression for ξ(t)\xi(t).

Definition: (Rational First Derivative Ratio Family) A first derivative ratio, ξ(t)\xi(t) belongs to the rational first derivative ratio family (denoted as RDR1\mathrm{RDR}_{1}) if

ξ(t)=1c1/(t+b1)+c2/(t+b2),\xi(t)=\frac{1}{c_{1}/(t+b_{1})+c_{2}/(t+b_{2})}, (8.1)

where b2b_{2} and c1c_{1} are positive parameters, and b1b_{1} and c2c_{2} are nonnegative parameters. For uniqueness we assume b1<b2b_{1}<b_{2}. If b1=0b_{1}=0, we require c1<1c_{1}<1. This additional restriction is to ensure that ψ(t)\psi(t) is finite for all finite tt.

Function ξ(t)\xi(t) in (8.1) fulfills the sufficient condition in Proposition 2. When c2=0c_{2}=0, RDR1\mathrm{RDR}_{1} model becomes LDR1\mathrm{LDR}_{1} model. RDR1\mathrm{RDR}_{1} has five parameters, a=ψ(1)(t0)a=\psi^{(1)}(t_{0}), b1b_{1}, b2b_{2}, c1c_{1}, and c2c_{2}.

To calculate the log-likelihood function using (4.4), we need to compute ψ(t0)\psi(t_{0}) and ψ(k)(t0)\psi^{(k)}(t_{0}) for all kk such that nk(t0)>0n_{k}(t_{0})>0. The following equality, which can be proved by mathematical induction for k1k\geq 1, is helpful:

k(k1+c1+c2)ψ(k)(t)+[c1t+b2c1+c2t+b1c2+k(2t+b1+b2)]ψ(k+1)(t)\displaystyle k(k-1+c_{1}+c_{2})\psi^{(k)}(t)+[c_{1}t+b_{2}c_{1}+c_{2}t+b_{1}c_{2}+k(2t+b_{1}+b_{2})]\psi^{(k+1)}(t)
+(t+b1)(t+b2)ψ(k+2)(t)=0.\displaystyle+(t+b_{1})(t+b_{2})\psi^{(k+2)}(t)=0.

To use it, choose a value for t0t_{0}. Given a=ψ(1)(t0)a=\psi^{(1)}(t_{0}), which is a scale parameter, and the values of the parameters b1b_{1}, b2b_{2}, c1c_{1} and c2c_{2}, we can find ψ(2)(t0)=ψ(1)(t0)/ξ(t0)\psi^{(2)}(t_{0})=-\psi^{(1)}(t_{0})/\xi(t_{0}). We then apply the above recurrence relation at t=t0t=t_{0} to compute all the ψ(k)(t0)\psi^{(k)}(t_{0}) needed in (4.4). The ESAC is

ψ(t)=a(t0+b1)c1(t0+b2)c20t(x+b1)c1(x+b2)c2𝑑x.\psi(t)=a(t_{0}+b_{1})^{c_{1}}(t_{0}+b_{2})^{c_{2}}\int_{0}^{t}(x+b_{1})^{-c_{1}}(x+b_{2})^{-c_{2}}dx.

Thus E(D)<E(D)<\infty if and only if c1+c2>1c_{1}+c_{2}>1. It can be proved that ν({0})=0\nu(\{0\})=0. We give the expression for Dνq{}^{q}\!D_{\nu} in Appendix L.
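A minimal sketch of this computation is given below; it returns psi^(k)(t0) for k = 1, ..., kmax under RDR1, starting from psi^(1)(t0) = a and psi^(2)(t0) = -a/xi(t0) and solving the displayed recurrence for the higher derivatives. The function name and interface are our own; the log-likelihood (4.4) itself is not reproduced here.

```python
def rdr1_psi_derivatives(a, b1, b2, c1, c2, t0, kmax):
    """psi^(k)(t0), k = 1..kmax, under the RDR1 model via the recurrence above."""
    xi_t0 = 1.0 / (c1 / (t0 + b1) + c2 / (t0 + b2))       # Equation (8.1) evaluated at t0
    psi = {1: a, 2: -a / xi_t0}
    for k in range(1, kmax - 1):                          # solve the recurrence for psi^(k+2)
        num = (k * (k - 1.0 + c1 + c2) * psi[k]
               + (c1 * (t0 + b2) + c2 * (t0 + b1) + k * (2.0 * t0 + b1 + b2)) * psi[k + 1])
        psi[k + 2] = -num / ((t0 + b1) * (t0 + b2))
    return psi
```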

9 Examples

Four real FoF data sets are presented in Table 2. Nonparametric analyses of the data can be found in Böhning and Schön [3], Lijoi et al. [31], Wang [49], Chee and Wang [12], Norris and Pollock [37], and Chiu and Chao [13]. In this section, we fit the data using the parametric models in Sections 5 and 8, demonstrate the use of various diagnostic plots, and compare different diversity estimates. Without loss of generality, we set t0=1t_{0}=1. The significance level of the tests is fixed at 5%.

Table 2: Four real FoF data and the MLE for the selected model using AIC
(i) Swine feces data
        (a^=8025.0,b^1=0.429,b^2=3.178,c^1=0.115,c^2=0.294\hat{a}=8025.0,\hat{b}_{1}=0.429,\hat{b}_{2}=3.178,\hat{c}_{1}=0.115,\hat{c}_{2}=0.294 under RDR1\mathrm{RDR}_{1})
kk 1 2 3 4 5 6 7 8 9 10 11
nk(t0)n_{k}(t_{0}) 8025 605 129 41 16 8 4 2 1 1 1
(ii) Accident data
        (a^=1318.1,b^1=0.617,b^2=198.9,c^1=0.211,c^2=45.362\hat{a}=1318.1,\hat{b}_{1}=0.617,\hat{b}_{2}=198.9,\hat{c}_{1}=0.211,\hat{c}_{2}=45.362 under RDR1\mathrm{RDR}_{1})
kk 1 2 3 4 5 6 7
nk(t0)n_{k}(t_{0}) 1317 239 42 14 4 4 1
(iii) Tomato flowers data
        (a^=1433.7,b^1=0.050,b^2=1.451,c^1=0.074,c^2=0.693\hat{a}=1433.7,\hat{b}_{1}=0.050,\hat{b}_{2}=1.451,\hat{c}_{1}=0.074,\hat{c}_{2}=0.693 under RDR1\mathrm{RDR}_{1})
kk 1 2 3 4 5 6 7 8 9 10 11 12 13
nk(t0)n_{k}(t_{0}) 1434 253 71 33 11 6 2 3 1 2 2 1 1
kk 14 16 23 27
nk(t0)n_{k}(t_{0}) 1 2 1 1
(iv) Bird abundance data (a^=14.696,b^=0.044,c^=0.772\hat{a}=14.696,\hat{b}=0.044,\hat{c}=0.772 under LDR1\mathrm{LDR}_{1})
kk 1 2 3 4 5 6 7 8 9 10 12 13 14
nk(t0)n_{k}(t_{0}) 11 12 10 6 2 5 1 3 2 4 1 1 1
kk 15 16 18 25 29 30 32 39 44 53 54
nk(t0)n_{k}(t_{0}) 2 1 2 1 1 1 1 1 1 1 1

The first data set is the swine feces data, which appeared and was analyzed in Chiu and Chao [13]. It consists of the pooled contig spectra from seven non-medicated swine feces. The large n1(t0)n_{1}(t_{0}) relative to the other frequencies is viewed as a signal of sequencing errors. Chiu and Chao [13] proposed a nonparametric estimate of the singleton count based on the other counts, and the difference between this estimate and the observed singleton count is interpreted as the outcome of missequencing. An implicit assumption of this approach is that sequencing errors inflate only the singleton count, and all other frequency counts are unaffected. This is equivalent to claiming that the sequencing errors create solely zero-rate species.

To investigate whether zero-rate species really exist, we draw the D1/D2 and D2/D3 plots with 95% pointwise confidence bands in panels (a) and (b) of Figure 3 respectively. The approximately linear curve in both plots indicates that both LDR1\mathrm{LDR}_{1} and LDR2\mathrm{LDR}_{2} are reasonable models. The heavy dashed lines in panels (a) and (b) are the lines fitted by MLE under LDR1\mathrm{LDR}_{1}. The fitted line is close to the curve in the D1/D2 plot, but not in the D2/D3 plot. As the dashed line lies inside the confidence bands in panel (b), the disagreement between the singleton count and the other counts is not strong enough to reject that they come from the fitted LDR1\mathrm{LDR}_{1}. The Pearson's chi-square test statistic after grouping all cells with expected frequency less than 5 is 4.325 with 4 degrees of freedom. The estimated E(N1(t0))E(N_{1}(t_{0})) under this LDR1\mathrm{LDR}_{1} is 8027.6, which is marginally larger than n1(t0)n_{1}(t_{0}). As E^(N1(t0))>n1(t0)\hat{E}(N_{1}(t_{0}))>n_{1}(t_{0}), the MLE of ν({0})\nu(\{0\}) under LDR2\mathrm{LDR}_{2} should be zero. There is no significant evidence for the existence of zero-rate species. The heavy green curve in panel (a) is the fitted curve under RDR1\mathrm{RDR}_{1}, and it is very close to the D1/D2 curve. RDR1\mathrm{RDR}_{1} performs better than LDR1\mathrm{LDR}_{1} in terms of the Akaike information criterion (AIC): the AIC for LDR1\mathrm{LDR}_{1} is -136060.6, and that for RDR1\mathrm{RDR}_{1} is -136061. (Footnote 5: The standard likelihood ratio test is not valid as the tested parameter value is on the boundary of the parameter space.)

In panel (a), blue grid lines with slope 1 are drawn. From Section 7, a slope of the rear part of ξ^(t)\hat{\xi}(t) larger (or smaller) than 1 is an indication that E(D)E(D) is infinite (or finite). Apparently ξ^(t)\hat{\xi}(t) always has slope larger than 1, so we are confident that E(D)E(D) is infinite. The modified first-order extrapolation line for E^(D)\hat{E}^{*}(D) (the purple line) and the zeroth-order extrapolation line for the Chao1 estimator (the orange line) are drawn. As c^F>1\hat{c}_{F}>1, E^(D)=\hat{E}^{*}(D)=\infty. The orange extrapolation line does not look good; the Chao1 estimator does not perform satisfactorily.

Figure 3: D1/D2 and D2/D3 plots for the data. Panels (a), (c), (d) and (e) are the D1/D2 plots for the four examples, and panel (b) is the D2/D3 plot for the first example. The light black solid curves are the D1/D2 or D2/D3 curves. The light black dashed curves are the 95% pointwise confidence bands. The heavy black dashed lines are the lines fitted by the MLE under LDR1\mathrm{LDR}_{1}. The heavy green solid curves in panels (a), (c), (d) and (e) are the fitted curves under RDR1\mathrm{RDR}_{1}. The blue dotted lines in the D1/D2 plots are grid lines with slope 1; they are added to assist in checking the slope of the curve. A D1/D2 curve with slope larger than 1 is a signal for infinite E(D)E(D), while a slope less than 1 is a signal for finite E(D)E(D). The zeroth-order and the modified first-order extrapolation lines are drawn in the D1/D2 plots in orange and purple respectively. They correspond to the extrapolations used by the Chao1 estimator (orange line) and E^(D)\hat{E}^{*}(D) (purple line) respectively.

The second data set comes from 9461 accident insurance policies issued by an insurance company. It was used in Böhning and Schön [3], Wang [49], and Chee and Wang [12]. The species correspond to the policies, and the frequency count to the number of claims during a particular year. The ξ^(t)\hat{\xi}(t) in the D1/D2 plot for the accident data in panel (c) is concave but not far from linear. The heavy dashed line is the fitted line under LDR1\mathrm{LDR}_{1}. The p-value of the Pearson chi-square test for LDR1\mathrm{LDR}_{1} is 0.0979. The MLE of cc is 1.1146, which is larger than 1; therefore, the MLE of E(D)E(D) is infinite. Compared with the blue grid lines with slope 1, the average slope of the D1/D2 curve is close to one, showing uncertainty about the finiteness of E(D)E(D). The heavy green curve in panel (c) is the fitted curve under RDR1\mathrm{RDR}_{1}. It is preferred to LDR1\mathrm{LDR}_{1} because it has a smaller AIC (the AICs for LDR1\mathrm{LDR}_{1} and RDR1\mathrm{RDR}_{1} are -18691.95 and -18692.15 respectively). The MLE of E(D)E(D) under RDR1\mathrm{RDR}_{1} is 6354. It is less than the true value D=9461D=9461, and is comparable to the nonparametric estimates presented in Chee and Wang [12], which range from 4016 to 7374. The purple line is the modified first-order extrapolation used by E^(D)\hat{E}^{*}(D). As the slope of the purple line is less than one, E^(D)\hat{E}^{*}(D) is finite. For these data, c^F=0.4525\hat{c}_{F}=0.4525 and E^(D)=8249.2\hat{E}^{*}(D)=8249.2, which is closer to the true value. The orange extrapolation line for the Chao1 estimator is also drawn. The difference between the two extrapolation lines is not small.

The third data set comes from a cDNA library of the expressed sequence tags of tomato flowers. It was studied in Böhning and Schön [3] and Lijoi et al. [31]. The D1/D2 plot in panel (d) shows that the LDR1\mathrm{LDR}_{1} model does not fit the data well (the p-value of the Pearson chi-square test is 0.0059). The curve looks like a Bernstein function. An RDR1\mathrm{RDR}_{1} model is fitted, and the fitted curve is shown in the plot as a heavy green curve. The model fits the data well (the Pearson chi-square statistic after grouping all cells with expected frequency less than 5 is 1.470 with 2 degrees of freedom). As c^1+c^2=0.767<1\hat{c}_{1}+\hat{c}_{2}=0.767<1, the estimated E(D)E(D) is infinite. This estimate of E(D)E(D) agrees with the finding when we compare the slope of the rear part of the curve with one with the aid of the blue grid lines. The modified first-order and zeroth-order extrapolation lines are drawn in purple and orange respectively. The difference between the slopes of the two lines is not small.

The fourth data set is the bird abundance data for the Wisconsin route of the North American Breeding Bird Survey for 1995. In total, 645 birds from 72 species were recorded. The data were studied in Norris and Pollock [37], where a mixture of five Poisson distributions was fitted and the estimated DD is 76. Compared with the blue grid lines, the slope of the D1/D2 curve is less than one, so species richness should be finite. The ξ^(t)\hat{\xi}(t) in the D1/D2 plot does not look like a straight line. However, the LDR1\mathrm{LDR}_{1} model is not rejected, as the p-value of the Pearson's chi-square test is 0.116. The estimated E(D)E(D) is 124.50. We also fit an RDR1\mathrm{RDR}_{1} model to the data. The fitted curve for this model is shown in the figure as a heavy green curve. The MLE of E(D)E(D) under RDR1\mathrm{RDR}_{1} is 80.7. LDR1\mathrm{LDR}_{1} is preferred to RDR1\mathrm{RDR}_{1} as its AIC, -25.63, is less than that for RDR1\mathrm{RDR}_{1}, which is -24.53. RDR1\mathrm{RDR}_{1} does not fit the rear part of ξ^(t)\hat{\xi}(t) well because of an abrupt change of slope around t=0.5t=0.5. After t=0.6t=0.6, ξ^(t)\hat{\xi}(t) looks quite linear, which suggests that E^(D)\hat{E}^{*}(D) may be a good estimate. For these data, E^(D)=77.90\hat{E}^{*}(D)=77.90, which is close to the Chao1 estimate of 77.04.

Figure 4: Diagnostic plots for the data. Each column corresponds to one diagnostic plot. The first four columns are the plots suggested in Table 1. They check c=0c=0 (zero-truncated Poisson distribution), c=0.5c=0.5 (geometric distribution), c=1c=1 (log-series distribution) and b=0b=0 (log(D)-log plot for the power law). The last column is the log-log plot used to check the power law. The two red dashed curves are the 95% approximate pointwise confidence band. There are five rows. The first four rows are for the real data: the swine feces data, accident data, tomato flowers data and bird abundance data respectively. The last row is for data simulated from the LDR1\mathrm{LDR}_{1} distribution to be checked. The parameter vectors (a,b,c)(a,b,c) are (200,1,0) (zero-truncated Poisson distribution), (200,1,0.5) (geometric distribution), (200,1,1) (log-series distribution), (200,0,3) (power law), and (200,0,3) (power law) respectively. When the curve is close to a straight line, it means that the checked distribution fits the data.

We investigate the performance of the various special graphical checks in Figure 4. If the curve in a plot is close to a straight line, we take it that the corresponding distribution fits the data. The red dashed curves are the 95% approximate pointwise confidence bands. The five columns correspond to five graphical checks: the plot for the zero-truncated Poisson distribution (check c=0c=0), for the geometric distribution (check c=0.5c=0.5), for the log-series distribution (check c=1c=1), for the power law (check b=0b=0; it is the log(D)-log plot), and the log-log plot for the power law. The first four rows of plots are for the four real FoF data sets. The last row is for data simulated from the distribution that the diagnostic plot is designed to check; the curves in this row are close to straight lines, as expected. The log-log plot in the last column does not perform satisfactorily: its first three graphs show almost perfect straight lines. Comparatively, the log(D)-log plot correctly declares the discrepancy between the power law and the four real data sets. The graphical check for the log-series distribution correctly detects that the accident data follow a log-series distribution closely (the MLE of cc under LDR1\mathrm{LDR}_{1} is 1.1146). The curves for the swine feces data and the tomato flowers data are concave, which implies that E(D)E(D) is infinite, as pointed out at the end of Section 4. The plots for the geometric distribution and the zero-truncated Poisson distribution look reasonable except for the accident data.

Consider the estimation of Hill numbers. In Table 3, we compare the model-based estimator developed in this paper with the Chao-Jost estimator proposed in Chao and Jost [10]. For our model-based estimator, 95%95\% confidence intervals are constructed using the parametric bootstrap method; details are given in Appendix K. The difference between our estimators and the Chao-Jost estimator is mild when q=2q=2. In other situations, the difference can be large, and sometimes the two confidence intervals do not overlap. The width of the 95% confidence interval for our parametric estimator is larger than that of the corresponding interval for the Chao-Jost estimator. The interval estimate for the Chao-Jost estimator is obtained based on the assumption that DD is finite and all unseen species have equal abundance. For the accident data, the true DD is contained in the model-based 95% confidence interval, but not in that for the Chao-Jost estimator.


Table 3: Chao-Jost Estimates and Our Parametric Estimates of Hill Numbers for Selected Models (Values enclosed in square brackets are 95% confidence intervals)
Data Model/Method Dν0{}^{0}\!D_{\nu} Dν1{}^{1}\!D_{\nu} Dν2{}^{2}\!D_{\nu}
Swine RDR1\mathrm{RDR}_{1} \infty 194649 27698
feces [105408,][105408,\infty] [62348,407027][62348,407027] [23940,31195][23940,31195]
data Chao-Jost 1 62051 53835 27801
[59197,64905][59197,64905] [51634,56035][51634,56035] [26076,29526][26076,29526]
Accident RDR1\mathrm{RDR}_{1} 6354 5203 3557
data [4741,16013][4741,16013] [4458,6684][4458,6684] [2996,4210][2996,4210]
(True DD Chao-Jost 5248 4788 3606
=9461=9461) [4713,5783][4713,5783] [4449,5126][4449,5126] [3322,3890][3322,3890]
Tomato RDR1\mathrm{RDR}_{1} \infty 5941 1311
flowers [7849,][7849,\infty] [4054,7741][4054,7741] [699,2120][699,2120]
data Chao-Jost 5887 4079 1450
[5405,6370][5405,6370] [3809,4349][3809,4349] [1254,1646][1254,1646]
Bird LDR1\mathrm{LDR}_{1} 124.51 43.84 28.43
abundance [63.63,417.98][63.63,417.98] [32.82,57.58][32.82,57.58] [19.55,39.91][19.55,39.91]
data Chao-Jost 77.03 41.94 27.57
[61.68,92.39][61.68,92.39] [39.35,44.54][39.35,44.54] [24.72,30.41][24.72,30.41]
  • 1

    Hill numbers estimates are the estimator derived in Chao and Jost [10]. Point and interval estimates are computed using R package SpadeR [8].

Consider the estimation of species richness. In Table 4, we compare E^(D)\hat{E}^{*}(D), the Chao1 estimator, the iChao1 estimator in Chiu et al. [14], and the abundance-based coverage estimator (ACE) in Chao and Lee [9] and Chao et al. [11]. We construct the 95%95\% confidence intervals for E(D)E(D) based on E^(D)\hat{E}^{*}(D) using the parametric bootstrap method; details are given in Appendix K. Estimator E^(D)\hat{E}^{*}(D) usually gives a larger estimate. For the bird abundance data, all estimators give similar point estimates. For the accident data, E^(D)\hat{E}^{*}(D) gives the best estimate of the true DD. For the remaining two data sets, the differences are huge: our estimate is infinite, while the other estimates are finite as assumed.

The difference between E^(D)\hat{E}^{*}(D) and the other three methods can well be explained by the test on whether the observed N1(t0)N_{1}(t_{0}), N2(t0)N_{2}(t_{0}) and N3(t0)N_{3}(t_{0}) fit a truncated Poisson distribution. The p-value of the test at the end of Section 7 for the bird abundance data is 0.374; for these data, all estimates are close to each other. The p-value for the accident data is 0.040. The difference is significant, but not huge. The p-values for the other two data sets are smaller than 6×1066\times 10^{-6}, implying that the deviations are huge. The 95% confidence interval for the swine feces data is [,][\infty,\infty], a much stronger signal when compared to the confidence interval in Table 3. For E^(D)\hat{E}^{*}(D), only the 95% confidence interval for the bird abundance data has finite width. Its lower endpoint is close to the lower endpoints of the other confidence intervals, but its upper endpoint is much larger.

Estimator E^(D)\hat{E}^{*}(D) gives a wider confidence interval than the Chao1 estimator because it uses a modified first-order extrapolation that requires estimating the slope, whereas the zeroth-order extrapolation does not. Nevertheless, the first-order extrapolation looks more rational than the zeroth-order extrapolation, which can only estimate a lower bound.


Table 4: Point and 95% Interval Estimates of Species Richness
Data Estimator Estimate 95%Lower 95%Upper
Swine E^(D)\hat{E}^{*}(D) \infty \infty \infty
feces Chao1 1 62051.3 57409.9 67136.2
data iChao1 2 67615.0 62889.6 72753.5
ACE 3 68827.7 62928.4 75370.3
Accident E^(D)\hat{E}^{*}(D) 8249.2 5087.4 \infty
data Chao1 5247.8 4682.7 5917.3
(True D iChao1 5966.7 5396.9 6622.6
= 9461) ACE 5683.8 5031.1 6461.5
Tomato E^(D)\hat{E}^{*}(D) \infty 18108.3 \infty
flowers Chao1 5887.4 5274.4 6609.2
data iChao1 6512.3 5951.4 7149.5
ACE 6645.5 7779.2 11859.5
Bird E^(D)\hat{E}^{*}(D) 77.9 72.5 297.0
abundance Chao1 77.0 73.3 92.1
data iChao1 77.5 74.6 83.3
ACE 78.7 74.3 91.7
  • 1

    Chao1 estimate and the confidence interval are computed using R package SpadeR [8].

  • 2

    iChao1 estimator is an improved Chao1 estimator proposed in Chiu et al. [14]. The point and the interval estimates are computed using R package SpadeR [8].

  • 3

    Abundance-based coverage estimate (ACE) [9] and the confidence interval are computed using R package SpadeR [8].

10 ρ\rho-appearance Design

Usually we collect all available information within the survey period. Under our framework, all useful information is in the FoF. If labor saving is a concern, we may neglect some minor information, say by halting the recording of a species as soon as its observed frequency reaches a fixed positive integer ρ\rho in the study period [0,t0][0,t_{0}]. In this case, the observation period for each species varies: it is short for abundant species and long for rare species. We call this design the ρ\rho-appearance design. It places more emphasis on rare species than on abundant species, which is in line with the common understanding that information about the rare species is critical when our interest is in DD. In a bird survey, species can be identified by distant sightings or short bursts of song; stopping the recording of abundant species early helps the researcher concentrate more on the rare species. When ρ=\rho=\infty, we obtain n~(t0)\tilde{n}(t_{0}). When ρ=1\rho=1, we record only the first appearance time of each seen species, which is exactly the information available in the empirical SAC.

We call the appearance time of the ρ\rhoth individual of a species in a survey the ρ\rho-appearance time of that species. Our observations are {n1(t0),,nρ1(t0),\{n_{1}(t_{0}),\ldots,n_{\rho-1}(t_{0}), r1,r2,,rm}r_{1},r_{2},\ldots,r_{m}\}, where r1,,rmr_{1},\ldots,r_{m} are the observed ρ\rho-appearance times. The log-likelihood function given a realization {nj(t0)}j=1,,ρ1,\{n_{j}(t_{0})\}_{j=1,\ldots,\rho-1}, {ri}i=1,,m\{r_{i}\}_{i=1,\ldots,m} is proved in Appendix M to be

log((ψ|{nj(t0)},{ri}))=ψ(t0)+j=1ρ1nj(t0)log(|ψ(j)(t0)|)+i=1mlog(|ψ(ρ)(ri)|).\log(\mathcal{L}(\psi|\{n_{j}(t_{0})\},\{r_{i}\}))=-\psi(t_{0})+\sum_{j=1}^{\rho-1}n_{j}(t_{0})\log(|\psi^{(j)}(t_{0})|)+\sum_{i=1}^{m}\log(|\psi^{(\rho)}(r_{i})|). (10.1)

A numerical advantage of (10.1) is that it does not require a general expression for ψ(k)(t)\psi^{(k)}(t) for all k1k\geq 1. A small simulation experiment in Appendix N shows that the loss of information of the ρ\rho-appearance design relative to the standard design is marked when ρ=1\rho=1, and minor when ρ=4\rho=4.
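The sketch below evaluates (10.1) for a concrete choice of ESAC; we use the power-law form psi(t) = A*t**z purely for illustration because its derivatives have a closed form, and the helper names are our own.

```python
import math

def loglik_rho_appearance(A, z, t0, n, r):
    """Log-likelihood (10.1) for the rho-appearance design, illustrated with the
    power-law ESAC psi(t) = A*t**z (assume 0 < z < 1).  n = [n_1(t0), ..., n_{rho-1}(t0)],
    r = observed rho-appearance times."""
    rho = len(n) + 1

    def psi_deriv(t, k):                       # k-th derivative of A*t**z
        coef = A
        for i in range(k):
            coef *= (z - i)
        return coef * t ** (z - k)

    ll = -A * t0 ** z                          # -psi(t0)
    for j, nj in enumerate(n, start=1):        # + sum_j n_j(t0) log|psi^(j)(t0)|
        ll += nj * math.log(abs(psi_deriv(t0, j)))
    for ri in r:                               # + sum_i log|psi^(rho)(r_i)|
        ll += math.log(abs(psi_deriv(ri, rho)))
    return ll
```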

By the displacement theorem of Poisson process [29], the ρ\rho-appearance times form a Poisson process with intensity function

fρ(r)=(λr)ρ1eλr(ρ1)!ν(dλ)=rρ1(ρ1)!(1)ρ1ψ(ρ)(r).f_{\rho}(r)=\int\frac{(\lambda r)^{\rho-1}e^{-\lambda r}}{(\rho-1)!}\nu(d\lambda)=\frac{r^{\rho-1}}{(\rho-1)!}(-1)^{\rho-1}\psi^{(\rho)}(r). (10.2)

From (4.2), another expression for fρ(r)f_{\rho}(r) is fρ(r)=ρE(Nρ(r))/rf_{\rho}(r)=\rho E(N_{\rho}(r))/r. Equation (10.2) gives another interpretation of ψ(k)(t)\psi^{(k)}(t). For example, ψ(1)(r)\psi^{(1)}(r) is the intensity function of the 1-appearance times (thus ψ(1)(r)/ψ(t0)\psi^{(1)}(r)/\psi(t_{0}) is the density function of the 1-appearance times of all seen species in [0,t0][0,t_{0}]), and r|ψ(2)(r)|r|\psi^{(2)}(r)| is the intensity function of the 2-appearance times.

11 Inference on Empirical Species Accumulation Curve

Suppose we only observe the empirical SAC, n+(t)n_{+}(t) for t[0,t0]t\in[0,t_{0}], or equivalently, the 1-appearance times of all seen species in [0,t0][0,t_{0}], say r1,,rn+(t0)r_{1},\ldots,r_{n_{+}(t_{0})} (i.e., the case ρ=1\rho=1 in Section 10). From (10.1), the log-likelihood function is log((ψ{ri}))=i=1n+(t0)log(ψ(1)(ri))ψ(t0)\log(\mathcal{L}(\psi\mid\{r_{i}\}))=\sum_{i=1}^{n_{+}(t_{0})}\log(\psi^{(1)}(r_{i}))-\psi(t_{0}). If ψ(t)\psi(t) has a free scale parameter, the MLE of ψ(t0)\psi(t_{0}) is n+(t0)n_{+}(t_{0}). The MLE of ψ(t)\psi(t) for the power law has the simple closed form n+(t0)(t/t0)zn_{+}(t_{0})(t/t_{0})^{z}, where z=min{n+(t0)/ilog(t0/ri),1}z=\min\{n_{+}(t_{0})/\sum_{i}\log(t_{0}/r_{i}),1\}.
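For instance, a direct transcription of this closed-form MLE (an illustrative helper of our own) is:

```python
import math

def power_law_mle(r, t0):
    """MLE of the power-law ESAC from the 1-appearance times r in [0, t0]:
    psi_hat(t) = n_+(t0) * (t/t0)**z with z = min{n_+(t0)/sum_i log(t0/r_i), 1}."""
    n_plus = len(r)
    z = min(n_plus / sum(math.log(t0 / ri) for ri in r), 1.0)
    return (lambda t: n_plus * (t / t0) ** z), z
```

Calling psi_hat, z = power_law_mle(r, t0) returns both the fitted curve and the fitted exponent; evaluating psi_hat at t > t0 gives the corresponding extrapolation.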

Given a parametric form of ψ(t)\psi(t), a traditional approach is to fit it to the empirical SAC by a linear or nonlinear least-squares method. Two differences between the MLE approach and the curve-fitting method are noteworthy. First, the MLE of ψ(t0)\psi(t_{0}) is equal to n+(t0)n_{+}(t_{0}) whenever ψ(t)\psi(t) has a free scale parameter, which is not the case in the curve-fitting approach. Second, the MLE method fits the density function to the 1-appearance times, while the curve-fitting method fits a function to the empirical SAC. The curve-fitting methods do not take the interdependence among the points in the empirical SAC into consideration. Since such interdependence is present in species-time curves and Type I species-area curves [43], the curve-fitting approach is theoretically flawed. Although improvements can be made in the curve-fitting approach through transformations and/or weights, the maximum likelihood approach is preferred for its proven effectiveness under the assumption that the model is correct.

In certain situations, only the values of the empirical SAC at a finite set of times are available. For example, only the cumulative number of species observed after day 1, day 2, and so on are recorded. Suppose the observed N+(i)N_{+}(\ell_{i}) is n+(i)n_{+}(\ell_{i}) for i=1,,mi=1,\ldots,m with 0=0<1<<m=t00=\ell_{0}<\ell_{1}<\ldots<\ell_{m}=t_{0}. The log-likelihood function is

log((ψ{n+(i)}i=1,,m))\displaystyle\log(\mathcal{L}(\psi\mid\{n_{+}(\ell_{i})\}_{i=1,\ldots,m}))
=i=1m(n+(i)n+(i1))log(ψ(i)ψ(i1))n+(t0)log(ψ(t0)).\displaystyle=\sum_{i=1}^{m}(n_{+}(\ell_{i})-n_{+}(\ell_{i-1}))\log(\psi(\ell_{i})-\psi(\ell_{i-1}))-n_{+}(t_{0})\log(\psi(t_{0})).
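A direct transcription of this log-likelihood is given below (an illustrative helper; the callable psi is any candidate parametric ESAC with psi(0) = 0).

```python
import math

def loglik_binned_sac(psi, times, counts):
    """Log-likelihood displayed above; times = [l_1, ..., l_m] with l_m = t0,
    counts[i] = n_+(l_i), and psi a candidate ESAC satisfying psi(0) = 0."""
    ll = 0.0
    prev_t, prev_n = 0.0, 0
    for t, n in zip(times, counts):
        ll += (n - prev_n) * math.log(psi(t) - psi(prev_t))
        prev_t, prev_n = t, n
    return ll - counts[-1] * math.log(psi(times[-1]))
```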

In the simulation study in Appendix O, the MLE has a smaller root mean squared relative error in extrapolation than the curve-fitting method.

The distribution function of the 1-appearance times in the time interval [0,t0][0,t_{0}] is ψ(t)/ψ(t0)\psi(t)/\psi(t_{0}). This fact holds in general for any nondecreasing ψ(t)\psi(t) with ψ(0)=0\psi(0)=0, including sigmoid functions such as the cumulative Weibull function. If we assume that the 1-appearance times are independent, we can perform maximum likelihood inference conditional on n+(t0)n_{+}(t_{0}) when a parametric form of ψ(t)\psi(t) is given. Full maximum likelihood calculation is possible when a further assumption on the distribution of N+(t0)N_{+}(t_{0}) is made. As the empirical SAC is proportional to the empirical distribution function of the 1-appearance times, statistical tools for the empirical distribution function can be used. For example, we can apply the Dvoretzky-Kiefer-Wolfowitz inequality to construct confidence bands for the distribution function ψ(t)/ψ(t0)\psi(t)/\psi(t_{0}).
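As a sketch of the last point, the band below follows from the Dvoretzky-Kiefer-Wolfowitz inequality applied to the empirical distribution function of the 1-appearance times; the function name and interface are our own.

```python
import math

def dkw_band(r, alpha=0.05):
    """DKW confidence band for F(t) = psi(t)/psi(t0), built from the
    1-appearance times r observed in [0, t0]."""
    n = len(r)
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))     # DKW half-width
    r_sorted = sorted(r)

    def band(t):
        f_hat = sum(ri <= t for ri in r_sorted) / n        # empirical distribution function
        return max(f_hat - eps, 0.0), min(f_hat + eps, 1.0)

    return band
```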

12 Discussion

A general framework, MPPP, is proposed for species abundance data. This framework has the following novel features: (1) it delineates the relation between two analytic tools – SAD and SAC, and an MPPP can be characterized by either one of them; (2) it characterizes the class of possible ESACs as the class of Bernstein functions, giving solid theoretical support to the empirical observation that SACs are nondecreasing concave functions; (3) it allows the existence of zero-rate species; (4) it contains the LDR1\mathrm{LDR}_{1} model, a parametric model where the first derivative ratio of the ESAC is linear, which admits several graphical checks, and two generalizations, namely LDR2\mathrm{LDR}_{2}, which includes zero-rate species, and the RDR1\mathrm{RDR}_{1} model, where the first derivative ratio is a rational function; (5) it admits a new species richness estimator E^(D)\hat{E}^{*}(D); (6) it admits a natural generalization of Hill numbers as measures of species diversity, which are well-defined even when the number of species is random and infinite; and (7) it allows inference to be performed when only the first ρ\rho appearance times of each species are recorded, which we refer to as the ρ\rho-appearance design.

Compared to the conventional curve-fitting approach to the SAC, our framework has a more solid theoretical underpinning, as an MPPP is uniquely characterized by an ESAC that is a Bernstein function. In the study of species-area curves, the power law is popular. Nevertheless, under our model, the power law is an extreme case: its E(D)E(D) and E(S(t))E(S(t)) for any positive tt are infinite, and its PSAD has a heavy right tail. Even though curve fitting can give reasonable estimates, the MPPP framework provides a parametric generative model in which maximum likelihood estimates can be obtained. Matthews and Whittaker [34] suggest using maximum likelihood methods instead of least-squares approaches in SAD fitting. We would like to extend their advice to ESAC fitting.

One possible future direction is to apply MPPP in a nonparametric setting, which lessens the need for model selection. Another future direction is to understand the tradeoff that arises in the ρ\rho-appearance design: a small ρ\rho can reduce the sampling effort, at the expense of less accurate parameter estimates. Investigating this tradeoff in theoretical and empirical settings can provide guidance for the design.

Acknowledgments

The authors are grateful to the associate editor and two anonymous referees for helpful suggestions that improved the presentation of this article.

Appendix A Conditional Distribution of the Rate of an Individual at a Fixed Time

Informally, if we condition on the event that there is an individual at exactly time tt, then the rate of its species follows the distribution ν/Λ\nu/\Lambda if Λ<\Lambda<\infty. The formal statement below involves a limit argument.

Proposition A: Fix t0t\geq 0, and assume Λ<\Lambda<\infty. Conditional on the event that there is at least one individual in the time interval [t,t+ϵ][t,t+\epsilon] for ϵ>0\epsilon>0, the probability that there is exactly one individual in [t,t+ϵ][t,t+\epsilon] tends to 11, and the rate of its species converges in distribution to ν/Λ\nu/\Lambda as ϵ0\epsilon\to 0.

Proof. Let AA be the event that there is at least one individual in the time interval [t,t+ϵ][t,t+\epsilon], and BB be the event that there is exactly one individual in the time interval [t,t+ϵ][t,t+\epsilon] for ϵ>0\epsilon>0. The probability that a species with rate λ\lambda is observed during [t,t+ϵ][t,t+\epsilon] is 1exp(λϵ)1-\exp(-\lambda\epsilon). By the thinning property, the rates of the positive-rate species observed during [t,t+ϵ][t,t+\epsilon] form a Poisson process with intensity measure (1exp(λϵ))ν~(dλ)(1-\exp(-\lambda\epsilon))\tilde{\nu}(d\lambda). Considering also the zero-rate species, the rates of the species observed during [t,t+ϵ][t,t+\epsilon] form a Poisson process with intensity measure (1exp(λϵ))λ1ν(dλ)(1-\exp(-\lambda\epsilon))\lambda^{-1}\nu(d\lambda) (where (1exp(λϵ))λ1=ϵ(1-\exp(-\lambda\epsilon))\lambda^{-1}=\epsilon when λ=0\lambda=0). Event AA is that this Poisson process has at least one point. As 1exp(x)x1-\exp(-x)\leq x when x0x\geq 0,

P(A)\displaystyle P(A) =\displaystyle= 1exp(0(1exp(λϵ))λ1ν(dλ))\displaystyle 1-\exp\left(-\int_{0}^{\infty}(1-\exp(-\lambda\epsilon))\lambda^{-1}\nu(d\lambda)\right)
\displaystyle\leq 0(1exp(λϵ))λ1ν(dλ)ϵΛ.\displaystyle\int_{0}^{\infty}(1-\exp(-\lambda\epsilon))\lambda^{-1}\nu(d\lambda)\leq\epsilon\Lambda.

Similarly, the rates of the species observed exactly one time during [t,t+ϵ][t,t+\epsilon] form a Poisson process with intensity measure

[exp(λϵ)λϵ]λ1ν(dλ)=ϵexp(λϵ)ν(dλ).[\exp(-\lambda\epsilon)\lambda\epsilon]\lambda^{-1}\nu(d\lambda)=\epsilon\exp(-\lambda\epsilon)\nu(d\lambda).

Event BB is that this Poisson process has exactly one point. Thus

P(B)=exp(ϵ0exp(λϵ)ν(dλ))0ϵexp(λϵ)ν(dλ),P(B)=\exp\left(-\epsilon\int_{0}^{\infty}\exp(-\lambda\epsilon)\nu(d\lambda)\right)\int_{0}^{\infty}\epsilon\exp(-\lambda\epsilon)\nu(d\lambda),

which is equal to ϵΛ(1+o(1))\epsilon\Lambda(1+o(1)) when ϵ0\epsilon\to 0 because Λ\Lambda is finite. Therefore,

limϵ0P(BA)=limϵ0P(B)P(A)limϵ0ϵΛ(1+o(1))ϵΛ=1.\lim_{\epsilon\to 0}P(B\mid A)=\lim_{\epsilon\to 0}\frac{P(B)}{P(A)}\geq\lim_{\epsilon\to 0}\frac{\epsilon\Lambda(1+o(1))}{\epsilon\Lambda}=1.

As probability cannot be larger than 1, it implies that P(BA)1P(B\mid A)\to 1 as ϵ0\epsilon\to 0.

Given that exactly one individual appears in [t,t+ϵ][t,t+\epsilon], let XX be the rate of the species to which the individual belongs and FF be its distribution function. For any positive λ0\lambda_{0}, let Cλ0C_{\lambda_{0}} be the event that XX is less than or equal to λ0\lambda_{0}. Then

F(λ0)\displaystyle F(\lambda_{0}) =\displaystyle= P(Cλ0)P(B)=exp(ϵ0λ0exp(λϵ)ν(dλ))0λ0ϵexp(λϵ)ν(dλ)exp(ϵ0exp(λϵ)ν(dλ))0ϵexp(λϵ)ν(dλ).\displaystyle\frac{P(C_{\lambda_{0}})}{P(B)}=\frac{\exp\left(-\epsilon\int_{0}^{\lambda_{0}}\exp(-\lambda\epsilon)\nu(d\lambda)\right)\int_{0}^{\lambda_{0}}\epsilon\exp(-\lambda\epsilon)\nu(d\lambda)}{\exp\left(-\epsilon\int_{0}^{\infty}\exp(-\lambda\epsilon)\nu(d\lambda)\right)\int_{0}^{\infty}\epsilon\exp(-\lambda\epsilon)\nu(d\lambda)}.

It follows that

limϵ0F(λ0)=0λ0ν(dλ)Λ.\lim_{\epsilon\to 0}F(\lambda_{0})=\frac{\int_{0}^{\lambda_{0}}\nu(d\lambda)}{\Lambda}.

Therefore, XX converges in distribution to the distribution ν/Λ\nu/\Lambda as ϵ0\epsilon\to 0.      \Box

For the case Λ=\Lambda=\infty, as ϵ0\epsilon\to 0, the rate of the species observed in [t,t+ϵ][t,t+\epsilon] diverges to infinity.

Proposition B: Fix t0t\geq 0, and assume Λ=\Lambda=\infty. Conditional on the event that there is at least one individual observed in the time interval [t,t+ϵ][t,t+\epsilon] for ϵ>0\epsilon>0, the minimum rate among the observed species converges in probability to \infty as ϵ0\epsilon\to 0.

Proof. Let AA be the event that there is at least one individual in the time interval [t,t+ϵ][t,t+\epsilon]. As in the case Λ<\Lambda<\infty in Proposition A, for any fixed λ00\lambda_{0}\geq 0, we have

P(at least one individual in [t,t+ϵ] with rate λ0)\displaystyle P\left(\text{at least one individual in $[t,t+\epsilon]$ with rate $\leq\lambda_{0}$}\right)
=1exp(0λ0(1exp(λϵ))λ1ν(dλ))\displaystyle=1-\exp\left(-\int_{0}^{\lambda_{0}}(1-\exp(-\lambda\epsilon))\lambda^{-1}\nu(d\lambda)\right)
0λ0(1exp(λϵ))λ1ν(dλ)\displaystyle\leq\int_{0}^{\lambda_{0}}(1-\exp(-\lambda\epsilon))\lambda^{-1}\nu(d\lambda)
ϵ0λ0ν(dλ)=O(ϵ)\displaystyle\leq\epsilon\int_{0}^{\lambda_{0}}\nu(d\lambda)=O(\epsilon)

as ϵ0\epsilon\to 0 since 0λ0ν(dλ)<\int_{0}^{\lambda_{0}}\nu(d\lambda)<\infty due to 0min{1,λ1}ν(dλ)<\int_{0}^{\infty}\min\{1,\lambda^{-1}\}\nu(d\lambda)<\infty by the definition of MPPP. On the other hand, as 1exp(x)xmax{1x/2,0}1-\exp(-x)\geq x\max\{1-x/2,0\} when x0x\geq 0,

P(A)=P(N+(ϵ)>0)=1exp(ψ(ϵ))ψ(ϵ)max{1ψ(ϵ)/2,0}.P(A)=P(N_{+}(\epsilon)>0)=1-\exp(-\psi(\epsilon))\geq\psi(\epsilon)\max\left\{1-\psi(\epsilon)/2,0\right\}.

Since limϵ0ψ(ϵ)/ϵ=ψ(1)(0)=Λ=\lim_{\epsilon\to 0}\psi(\epsilon)/\epsilon=\psi^{(1)}(0)=\Lambda=\infty, for any λ0>0\lambda_{0}>0,

limϵ0P(at least one individual in [t,t+ϵ] with rate λ0A)\displaystyle\lim_{\epsilon\to 0}P(\mbox{at least one individual in }[t,t+\epsilon]\mbox{ with rate }\leq\lambda_{0}\mid A)
limϵ0O(ϵ)/ϵ(ψ(ϵ)/ϵ)max{1ψ(ϵ)/2,0}=0.\displaystyle\leq\lim_{\epsilon\to 0}\frac{O(\epsilon)/\epsilon}{(\psi(\epsilon)/\epsilon)\max\left\{1-\psi(\epsilon)/2,0\right\}}=0.~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}\Box

Appendix B Proof of the equivalence between Condition (2.1) and the finiteness of ESAC

Suppose Condition (2.1) holds. Then

1eλtλν(dλ)min{t,1λ}ν(dλ)max{t,1}min{1,1λ}ν(dλ)<\int\frac{1-e^{-\lambda t}}{\lambda}\nu(d\lambda)\leq\int\min\left\{t,\frac{1}{\lambda}\right\}\nu(d\lambda)\leq\max\{t,1\}\int\min\left\{1,\frac{1}{\lambda}\right\}\nu(d\lambda)<\infty

for t0t\geq 0. On the other hand, if E(N+(1))<E(N_{+}(1))<\infty, then

min{1,λ1}ν(dλ)11exp(1)1exp(λ)λν(dλ)=E(N+(1))1exp(1)<.\int\min\{1,\lambda^{-1}\}\nu(d\lambda)\leq\frac{1}{1-\exp(-1)}\int\frac{1-\exp(-\lambda)}{\lambda}\nu(d\lambda)=\frac{E(N_{+}(1))}{1-\exp(-1)}<\infty.

Appendix C Equivalent definition of the MPPP

In the definition of the MPPP in the paper, the individuals of the zero-rate species and the individuals of the positive-rate species are generated separately. Here we present an alternative definition where all individuals are generated in a unified manner.

Definition: (Equivalent definition of MPPP) An MPPP GG is characterized by a nonzero species intensity measure ν\nu, which is a measure over 0\mathbb{R}_{\geq 0} satisfying 0min{1,λ1}ν(dλ)<\int_{0}^{\infty}\min\{1,\lambda^{-1}\}\nu(d\lambda)<\infty. Define ν̊\mathring{\nu} to be a measure over 02\mathbb{R}_{\geq 0}^{2} by

ν̊(A)=0𝟏{(λ,t)A}eλt𝑑tν(dλ)\mathring{\nu}(A)=\int\int_{0}^{\infty}\mathbf{1}\{(\lambda,t)\in A\}e^{-\lambda t}dt\cdot\nu(d\lambda)

for any measurable set A02A\subseteq\mathbb{R}_{\geq 0}^{2}. Generate (λ1,t1),(λ2,t2),(\lambda_{1},t_{1}),(\lambda_{2},t_{2}),\ldots (a finite or countably infinite sequence) according to a Poisson process with intensity measure ν̊\mathring{\nu}. For each simulated (λi,ti)(\lambda_{i},t_{i}), we generate a realization ηi\eta_{i} (independently across ii) of a Poisson process with rate λi\lambda_{i}, conditioned on the event that the first point is at time tit_{i}. (That is, it contains the point tit_{i} together with a Poisson process starting at time tit_{i}. If λi=0\lambda_{i}=0, then ηi\eta_{i} contains only one point tit_{i}.) Each ηi\eta_{i} stores the appearance times of one species with rate λi\lambda_{i}. Finally, we take G={η1,η2,}G=\{\eta_{1},\eta_{2},\ldots\}.

This definition models the species that will eventually be observed in a study. The first appearance time of each such species (i.e., the first point of each ηi\eta_{i}) is explicitly included in the definition of ν̊\mathring{\nu}.

Appendix D Proof of the Probability that an Individual Observed in a Given Future Time Belongs to a Species Represented kk times in [0,t0][0,t_{0}]

Let λ\lambda^{*} be the rate of the species observed in a given future time. Suppose Λ<\Lambda<\infty. From Proposition A in Appendix A, the probability measure of λ\lambda^{*} is ν/Λ\nu/\Lambda. Thus

P(individual observed in a given future time belongs to a species\displaystyle P(\mbox{individual observed in a given future time belongs to a species }
represented k times in [0,t0])\displaystyle\mbox{represented }k\mbox{ times in }[0,t_{0}])
=E[P(individual observed in a given future time belongs to a species\displaystyle=E[P(\mbox{individual observed in a given future time belongs to a species }
represented k times in [0,t0]|λ)]\displaystyle\mbox{represented }k\mbox{ times in }[0,t_{0}]~{}|~{}\lambda^{*})]
=E[(λt0)kexp(λt0)k!]\displaystyle=E\left[\frac{(\lambda^{*}t_{0})^{k}\exp(-\lambda^{*}t_{0})}{k!}\right]
=1Λ(λt0)kexp(λt0)k!ν(dλ)\displaystyle=\frac{1}{\Lambda}\int\frac{(\lambda^{*}t_{0})^{k}\exp(-\lambda^{*}t_{0})}{k!}\nu(d\lambda^{*})
=(k+1)E(Nk+1(t0))E(S(t0)).\displaystyle=\frac{(k+1)E(N_{k+1}(t_{0}))}{E(S(t_{0}))}.

The last expression also holds when Λ=\Lambda=\infty (i.e., the above probability is zero) because, from Proposition B in Appendix A, the rate of the future individual approaches infinity and thus that species should have been seen an infinite number of times in [0,t0][0,t_{0}].

Appendix E Proof of Equation (3.5)

pk(t)\displaystyle p_{k}(t) =\displaystyle= Pr(a species is represented k times in time interval [0,t]\displaystyle\Pr(\mbox{a species is represented }k\mbox{ times in time interval }[0,t]\mid
it is observed in time interval [0,t])\displaystyle\mbox{it is observed in time interval }[0,t])
=\displaystyle= E(Pr(a species is observed k times in time interval [0,t]\displaystyle E(\Pr(\mbox{a species is observed }k\mbox{ times in time interval }[0,t]\mid
the rate of the recorded species is λ)the species is observed in\displaystyle\mbox{the rate of the recorded species is }\lambda)~{}\mid~{}\mbox{the species is observed in }
time interval [0,t])\displaystyle\mbox{ time interval }[0,t])
=\displaystyle= E((λt)kexp(λt)/k!1exp(λt)|it is observed in time interval [0,t]).\displaystyle E\left(\left.\frac{(\lambda t)^{k}\exp(-\lambda t)/k!}{1-\exp(-\lambda t)}~{}\right|~{}\mbox{it is observed in time interval }[0,t]\right).

From the thinning property of Poisson processes, the probability measure for λ\lambda given that the species is observed in time interval [0,t][0,t] is

λ1(1exp(λt))νλ1(1exp(λt))ν(dλ).\frac{\lambda^{-1}(1-\exp(-\lambda t))\nu}{\int\lambda^{*-1}(1-\exp(-\lambda^{*}t))\nu(d\lambda^{*})}.

Therefore,

pk(t)\displaystyle p_{k}(t) =\displaystyle= (λt)kexp(λt)/k!1exp(λt)λ1(1exp(λt))λ1(1exp(λt))ν(dλ)ν(dλ)\displaystyle\int\frac{(\lambda t)^{k}\exp(-\lambda t)/k!}{1-\exp(-\lambda t)}\frac{\lambda^{-1}(1-\exp(-\lambda t))}{\int\lambda^{*-1}(1-\exp(-\lambda^{*}t))\nu(d\lambda^{*})}\nu(d\lambda)
=\displaystyle= λk1tkexp(λt)/k!ν(dλ)λ1(1exp(λt))ν(dλ)\displaystyle\frac{\int\lambda^{k-1}t^{k}\exp(-\lambda t)/k!\nu(d\lambda)}{\int\lambda^{*-1}(1-\exp(-\lambda^{*}t))\nu(d\lambda^{*})}
=\displaystyle= E(Nk(t))E(N+(t)).\displaystyle\frac{E(N_{k}(t))}{E(N_{+}(t))}.

Appendix F Proof of Proposition 1

For any MPPP, from (4.1), ψ(t)\psi(t) is a Bernstein function. Clearly ψ(0)=0\psi(0)=0. It proves the necessity part of (a). The sufficiency part of (a) follows from the fact that every Bernstein function g(t)g(t) with g(0)=0g(0)=0 has a unique Lévy-Khintchine representation

g(t)=κt+0(1exp(λt))μ(dλ),g(t)=\kappa t+\int_{0}^{\infty}(1-\exp(-\lambda t))\mu(d\lambda),

where κ0\kappa\geq 0, and μ\mu is a measure over [0,)[0,\infty) such that 0min{1,λ}μ(dλ)<\int_{0}^{\infty}\min\{1,\lambda\}\mu(d\lambda)<\infty. Define an MPPP with ν({0})=κ\nu(\{0\})=\kappa and ν~=μ\tilde{\nu}=\mu. The condition 0min{1,λ}μ(dλ)<\int_{0}^{\infty}\min\{1,\lambda\}\mu(d\lambda)<\infty becomes Condition (2.1). From (3.2), the ESAC of this MPPP is equal to g(t)g(t). It proves the sufficiency part of (a). It also proves that ψ(t)\psi(t) uniquely determines an MPPP in (c) because Lévy-Khintchine representation is unique.

To prove (d), we only need to show that ht(s)h_{t}(s) in (4.3) is the probability generating function of p(t)p(t). Clearly ht(0)=0h_{t}(0)=0. From (4.2), for k1k\geq 1, ht(k)(0)/k!=(1)k+1tkψ(k)(t)/(k!ψ(t))=E(Nk(t))/ψ(t)=pk(t)h_{t}^{(k)}(0)/k!=(-1)^{k+1}t^{k}\psi^{(k)}(t)/(k!\psi(t))=E(N_{k}(t))/\psi(t)=p_{k}(t). It completes the proof of (d).

Given ht0(s)h_{t_{0}}(s) and ψ(t0)\psi(t_{0}) for a fixed t0>0t_{0}>0, from (4.3), ψ(t)=ψ(t0)(1ht0(1t/t0))\psi(t)=\psi(t_{0})(1-h_{t_{0}}(1-t/t_{0})). The ESAC is uniquely determined and so is the MPPP. It completes the proof for the remaining part of (c).

To prove the necessity of (b), let g(s)g(s) be the probability generating function of p(t0)p(t_{0}). From (4.3),

g(s)=1ψ((1s)t0)ψ(t0).g(s)=1-\frac{\psi((1-s)t_{0})}{\psi(t_{0})}.

Thus g(0)=0g(0)=0 and g(1)=1g(1)=1. Furthermore,

ψ(t)=ψ(t0)(1g(1t/t0)).\psi(t)=\psi(t_{0})(1-g(1-t/t_{0})).

It follows that when k1k\geq 1, ψ(k)(t)=(1)k+1ψ(t0)g(k)(1t/t0)/t0k\psi^{(k)}(t)=(-1)^{k+1}\psi(t_{0})g^{(k)}(1-t/t_{0})/t_{0}^{k}. Because ψ(k)(t)\psi^{(k)}(t) has sign (1)k+1(-1)^{k+1}, g(k)(s)0g^{(k)}(s)\geq 0 for s(,1)s\in(-\infty,1). It completes the proof of the necessity part. For the sufficiency part, suppose g(0)=0,g(1)=1g(0)=0,g(1)=1, g(s)g(s) is absolutely monotone in (,1)(-\infty,1). Let t0t_{0} and ψ(t0)\psi(t_{0}) be two given positive values. Define χ(t)=ψ(t0)(1g(1t/t0))\chi(t)=\psi(t_{0})(1-g(1-t/t_{0})). It can be shown that χ(t)\chi(t) is a Bernstein function such that χ(0)=0\chi(0)=0. From (a), there is an MPPP with ESAC equal to χ(t)\chi(t). We have χ(t0)=ψ(t0)\chi(t_{0})=\psi(t_{0}). From (4.3), g(s)=ht0(s)g(s)=h_{t_{0}}(s) is the probability generating function of p(t0)p(t_{0}) of this MPPP. It completes the proof of (b).

Appendix G Proof of Proposition 2

Let ξ(t)\xi(t) satisfy conditions (i) and (ii) in Proposition 2. Therefore, there exist ϵ>0\epsilon>0 and δ>0\delta>0 such that for all 0<t<δ0<t<\delta, we have ξ(t)>(1+ϵ)t\xi(t)>(1+\epsilon)t. Let g(t)=δ0texp(yδ(1/ξ(x))𝑑x)𝑑yg(t)=\delta\int_{0}^{t}\exp(\int_{y}^{\delta}(1/\xi(x))dx)dy. We show that g(t)g(t) is a ψ(t)\psi(t) in the Proposition.

When 0<tδ0<t\leq\delta,

g(t)δ0δexp(yδ1(1+ϵ)x𝑑x)𝑑y=δ0δ(δy)1/(1+ϵ)𝑑y=δ21+ϵϵ<.g(t)\leq\delta\int_{0}^{\delta}\exp\left(\int_{y}^{\delta}\frac{1}{(1+\epsilon)x}dx\right)dy=\delta\int_{0}^{\delta}\left(\frac{\delta}{y}\right)^{1/(1+\epsilon)}dy=\delta^{2}\frac{1+\epsilon}{\epsilon}<\infty.

When t>δt>\delta,

g(t)\displaystyle g(t) =\displaystyle= δ0δexp(yδ(1/ξ(x))𝑑x)𝑑y+δδtexp(δy(1/ξ(x))𝑑x)𝑑y\displaystyle\delta\int_{0}^{\delta}\exp\left(\int_{y}^{\delta}(1/\xi(x))dx\right)dy+\delta\int_{\delta}^{t}\exp\left(-\int_{\delta}^{y}(1/\xi(x))dx\right)dy
\displaystyle\leq δ2(1+ϵ)/ϵ+δδt1𝑑y<.\displaystyle\delta^{2}(1+\epsilon)/\epsilon+\delta\int_{\delta}^{t}1dy<\infty.

Therefore, g(t)g(t) is finite for any t>0t>0. As g(1)(t)=δexp(tδ(1/ξ(x))𝑑x)g^{(1)}(t)=\delta\exp(\int_{t}^{\delta}(1/\xi(x))dx), and g(2)(t)=δexp(tδ(1/ξ(x))𝑑x)(1/ξ(t))g^{(2)}(t)=-\delta\exp(\int_{t}^{\delta}(1/\xi(x))dx)(1/\xi(t)), we have g(1)(t)/g(2)(t)=ξ(t)-g^{(1)}(t)/g^{(2)}(t)=\xi(t). Since g(0)=0g(0)=0, from Proposition 1(a), it is sufficient if we can prove that

(1)k1g(k)(t)0,(k1).(-1)^{k-1}g^{(k)}(t)\geq 0,~{}~{}~{}(k\geq 1). (G.1)

Clearly (G.1) holds when k=1k=1. Suppose (G.1) holds when kdk\leq d for a d1d\geq 1. As g(2)(t)ξ(t)=g(1)(t)g^{(2)}(t)\xi(t)=-g^{(1)}(t), after taking derivative (d1)(d-1) times on both sides, we have

i=0d1(d1i)g(d+1i)(t)ξ(i)(t)=g(d)(t).\sum_{i=0}^{d-1}{d-1\choose i}g^{(d+1-i)}(t)\xi^{(i)}(t)=-g^{(d)}(t).

For i1i\geq 1, sign(g(d+1i)(t)ξ(i)(t))=sign((1)di(1)i+1)=(1)d+1sign(g^{(d+1-i)}(t)\xi^{(i)}(t))=sign((-1)^{d-i}(-1)^{i+1})=(-1)^{d+1}. Furthermore, sign(g(d)(t))=(1)dsign(-g^{(d)}(t))=(-1)^{d}. Thus sign(g(d+1)(t)ξ(t))=(1)dsign(g^{(d+1)}(t)\xi(t))=(-1)^{d}. It implies that sign(g(d+1)(t))=(1)dsign(g^{(d+1)}(t))=(-1)^{d} completing the proof of (G.1) by induction.

Appendix H Proof of Proposition 3

If an SAD is time invariant, then its probability generating function, ht(s)h_{t}(s) does not depend on tt. From (4.3), there is a positive function gg such that ψ(ty)=ψ(t)g(y)\psi(ty)=\psi(t)g(y) for y[0,)y\in[0,\infty). Take logarithm on both sides, take derivative with respect to tt, and then set t=1t=1. We have

dlog(ψ(y))dy=ψ(1)(1)ψ(1)y.\frac{d\log(\psi(y))}{dy}=\frac{\psi^{(1)}(1)}{\psi(1)y}.

The solution of the above differential equation is ψ(y)=ψ(1)yψ(1)(1)/ψ(1)\psi(y)=\psi(1)y^{\psi^{(1)}(1)/\psi(1)}. As ψ(t)\psi(t) is a Bernstein function, 0<ψ(1)(1)/ψ(1)10<\psi^{(1)}(1)/\psi(1)\leq 1. Hence the power law is the only law with p(t)p(t) independent of tt.

Consider the second part of Proposition 3. Suppose an SAD in MPPP converges to a proper distribution with probability generating function f(s)f(s). Then the ht(s)h_{t}(s) of this SAD converges to f(s)f(s) for any s[0,1]s\in[0,1]. Let xx and yy be any two values in [0,1][0,1]. From (4.3), 1f(1xy)=limtψ(xyt)/ψ(t)=limtψ(xyt)/ψ(xt)limtψ(xt)/ψ(t)=(1f(1y))(1f(1x))1-f(1-xy)=\lim_{t\to\infty}\psi(xyt)/\psi(t)=\lim_{t\to\infty}\psi(xyt)/\psi(xt)\lim_{t\to\infty}\psi(xt)/\psi(t)=(1-f(1-y))(1-f(1-x)). Write π(x)=(1f(1x))\pi(x)=(1-f(1-x)). Then π(xy)=π(x)π(y)\pi(xy)=\pi(x)\pi(y). Similar to the proof above, we have π(x)=xα\pi(x)=x^{\alpha} when x[0,1]x\in[0,1]. It implies that f(s)=1π(1s)=1(1s)αf(s)=1-\pi(1-s)=1-(1-s)^{\alpha}. As f(1)=1f(1)=1, α>0\alpha>0. Since P(1)=f(1)(0)=αP(1)=f^{(1)}(0)=\alpha, we have 0<α10<\alpha\leq 1. It completes the proof as the probability generating function of power law is 1(1s)11/c1-(1-s)^{1-1/c}.

Appendix I Proof of LDR1LDR2=LDR3=\mathrm{LDR}_{1}\subsetneq\mathrm{LDR}_{2}=\mathrm{LDR}_{3}=\ldots

For any positive integer jj, condition ψ(j)(t)=(b+ct)ψ(j+1)(t)-\psi^{(j)}(t)=(b+ct)\psi^{(j+1)}(t) implies

ψ(j+1)(t)=cψ(j+1)(t)+(b+ct)ψ(j+2)(t).-\psi^{(j+1)}(t)=c\psi^{(j+1)}(t)+(b+ct)\psi^{(j+2)}(t).

Therefore, LDRjLDRj+1\mathrm{LDR}_{j}\subseteq\mathrm{LDR}_{j+1} for j1j\geq 1. Let ϕ(t)=αt+ψ(t)\phi(t)=\alpha t+\psi(t) where α>0\alpha>0 and ψ(t)LDR1\psi(t)\in\mathrm{LDR}_{1}. Then ϕ(2)(t)/ϕ(3)(t)=ψ(2)(t)/ψ(3)(t)-\phi^{(2)}(t)/\phi^{(3)}(t)=-\psi^{(2)}(t)/\psi^{(3)}(t) which is a linear function of tt because ψ(t)LDR1LDR2\psi(t)\in\mathrm{LDR}_{1}\subseteq\mathrm{LDR}_{2}. Clearly ϕ(t)LDR2\phi(t)\in\mathrm{LDR}_{2} but not in LDR1\mathrm{LDR}_{1}. Therefore, LDR1LDR2\mathrm{LDR}_{1}\neq\mathrm{LDR}_{2}. It can be shown that every element in LDR2\mathrm{LDR}_{2} has the form αt+ψ(t)\alpha t+\psi(t) for a ψ(t)LDR1\psi(t)\in\mathrm{LDR}_{1}. It means that LDR2\mathrm{LDR}_{2} is a mixture of zero-rate species and LDR1\mathrm{LDR}_{1}.

Consider ψ(t)LDRj\psi(t)\in\mathrm{LDR}_{j} for j3j\geq 3. Let ψ(j)(t)/ψ(j+1)(t)=b+ct-\psi^{(j)}(t)/\psi^{(j+1)}(t)=b+ct (i.e., dlog(ψ(j)(t))/dt=1/(b+ct)d\log(\psi^{(j)}(t))/dt=-1/(b+ct)). As jjth derivative ratio is always a nonnegative nondecreasing function of tt, both bb and cc are nonnegative. For simplicity, we only consider the case when c>0c>0 and c1c\neq 1 (cases when c=0c=0 and c=1c=1 can be studied through letting c0c\to 0 and c1c\to 1 respectively). Then

ψ(j)(t)=ψ(j)(1)[(b+ct)/(b+c)]1/c,\psi^{(j)}(t)=\psi^{(j)}(1)[(b+ct)/(b+c)]^{-1/c}, (I.1)
ψ(j1)(t)=[ψ(j1)(1)ψ(j)(1)(b+cc1)]+ψ(j)(1)(b+cc1)(b+ctb+c)11/c,\psi^{(j-1)}(t)=\left[\psi^{(j-1)}(1)-\psi^{(j)}(1)\left(\frac{b+c}{c-1}\right)\right]+\psi^{(j)}(1)\left(\frac{b+c}{c-1}\right)\left(\frac{b+ct}{b+c}\right)^{1-1/c}, (I.2)

and

ψ(j2)(t)\displaystyle\psi^{(j-2)}(t) =\displaystyle= ψ(j2)(1)+[ψ(j1)(1)ψ(j)(1)(b+cc1)](t1)\displaystyle\psi^{(j-2)}(1)+\left[\psi^{(j-1)}(1)-\psi^{(j)}(1)\left(\frac{b+c}{c-1}\right)\right](t-1) (I.3)
+(b+c)2ψ(j)(1)(c1)(2c1)[(b+ctb+c)21/c1].\displaystyle+\frac{(b+c)^{2}\psi^{(j)}(1)}{(c-1)(2c-1)}\left[\left(\frac{b+ct}{b+c}\right)^{2-1/c}-1\right].

If c>1c>1 and ψ(j)(1)0\psi^{(j)}(1)\neq 0, from (I.2), sign(ψ(j1)(t))=sign(ψ(j)(1))0sign(\psi^{(j-1)}(t))=sign(\psi^{(j)}(1))\neq 0 when tt is large. It is impossible because they should have opposite signs. If c>1c>1 and ψ(j)(1)=0\psi^{(j)}(1)=0, from (I.1), ψ(j)(t)\psi^{(j)}(t) is a zero function. It is impossible as ψ(j)(t)/ψ(j+1)(t)-\psi^{(j)}(t)/\psi^{(j+1)}(t) is undefined. If 0<c<10<c<1, from (I.2) and (I.3), sign(ψ(j1)(t))=sign(ψ(j1)(1)+ψ(j)(1)[(b+c)/(1c)])=sign(ψ(j2)(t))sign(\psi^{(j-1)}(t))=sign(\psi^{(j-1)}(1)+\psi^{(j)}(1)[(b+c)/(1-c)])=sign(\psi^{(j-2)}(t)) when tt is large. It is possible only when sign(ψ(j1)(t))=0sign(\psi^{(j-1)}(t))=0. It implies that ψ(j1)(1)+ψ(j)(1)[(b+c)/(1c)]=0\psi^{(j-1)}(1)+\psi^{(j)}(1)[(b+c)/(1-c)]=0. From (I.1) and (I.2), ψ(j1)(t)/ψ(j)(t)=(b+ct)/(1c)-\psi^{(j-1)}(t)/\psi^{(j)}(t)=(b+ct)/(1-c). Thus ψ(t)LDRj1\psi(t)\in\mathrm{LDR}_{j-1}. It follows that LDRj=LDRj1\mathrm{LDR}_{j}=\mathrm{LDR}_{j-1} for all j3j\geq 3.

Appendix J Pointwise Confidence Band for D2/D3 plot

Similar to the confidence band for the D1/D2 plot, an approximate 95% pointwise confidence band for the D2/D3 plot is -\hat{\psi}^{(2)}(t)/\hat{\psi}^{(3)}(t)\pm 1.96\sqrt{\widehat{Var}(-\hat{\psi}^{(2)}(t)/\hat{\psi}^{(3)}(t))}, where

Var^(ψ^(2)(t)ψ^(3)(t))\displaystyle\widehat{Var}\left(-\frac{\hat{\psi}^{(2)}(t)}{\hat{\psi}^{(3)}(t)}\right)
=1ψ^(3)2(t)Var^(ψ^(2)(t))+ψ^(2)2(t)ψ^(3)4(t)Var^(ψ^(3)(t))2ψ^(2)(t)ψ^(3)3(t)Cov^(ψ^(2)(t),ψ^(3)(t)),\displaystyle=\frac{1}{\hat{\psi}^{(3)2}(t)}\widehat{Var}(\hat{\psi}^{(2)}(t))+\frac{\hat{\psi}^{(2)2}(t)}{\hat{\psi}^{(3)4}(t)}\widehat{Var}(\hat{\psi}^{(3)}(t))-\frac{2\hat{\psi}^{(2)}(t)}{\hat{\psi}^{(3)3}(t)}\widehat{Cov}(\hat{\psi}^{(2)}(t),\hat{\psi}^{(3)}(t)),

with

Var^(ψ^(2)(t))=1t04k=2k2(k1)2Nk(t0)(1tt0)2k4,\widehat{Var}(\hat{\psi}^{(2)}(t))=\frac{1}{t_{0}^{4}}\sum_{k=2}^{\infty}k^{2}(k-1)^{2}N_{k}(t_{0})\left(1-\frac{t}{t_{0}}\right)^{2k-4},
Var^(ψ^(3)(t))=1t06k=3k2(k1)2(k2)2Nk(t0)(1tt0)2k6,\widehat{Var}(\hat{\psi}^{(3)}(t))=\frac{1}{t_{0}^{6}}\sum_{k=3}^{\infty}k^{2}(k-1)^{2}(k-2)^{2}N_{k}(t_{0})\left(1-\frac{t}{t_{0}}\right)^{2k-6},

and

Cov^(ψ^(2)(t),ψ^(3)(t))=1t05k=3k2(k1)2(k2)Nk(t0)(1tt0)2k5.\widehat{Cov}(\hat{\psi}^{(2)}(t),\hat{\psi}^{(3)}(t))=-\frac{1}{t_{0}^{5}}\sum_{k=3}^{\infty}k^{2}(k-1)^{2}(k-2)N_{k}(t_{0})\left(1-\frac{t}{t_{0}}\right)^{2k-5}.
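The band can be computed directly from the FoF at time t_{0}. Below is a minimal R sketch (ours, not code from the paper); it assumes the plug-in estimators \hat{\psi}^{(j)}(t)=(-1)^{j-1}t_{0}^{-j}\sum_{k\geq j}[k!/(k-j)!]N_{k}(t_{0})(1-t/t_{0})^{k-j}, which are the estimators whose variances and covariance are displayed above; the function name and arguments are our own choices.

d23_band <- function(fof, t0, t, level = 0.95) {
  ## fof[k] = N_k(t0); returns the D2/D3 value at time t <= t0 with a pointwise band
  k  <- seq_along(fof)
  u  <- 1 - t / t0
  i2 <- k >= 2
  i3 <- k >= 3
  f2 <- k * (k - 1)                 # falling factorial k!/(k-2)!
  f3 <- k * (k - 1) * (k - 2)       # falling factorial k!/(k-3)!
  psi2 <- -sum(f2[i2] * fof[i2] * u^(k[i2] - 2)) / t0^2   # assumed plug-in estimator of psi''(t)
  psi3 <-  sum(f3[i3] * fof[i3] * u^(k[i3] - 3)) / t0^3   # assumed plug-in estimator of psi'''(t)
  v2   <-  sum(f2[i2]^2 * fof[i2] * u^(2 * k[i2] - 4)) / t0^4
  v3   <-  sum(f3[i3]^2 * fof[i3] * u^(2 * k[i3] - 6)) / t0^6
  cv   <- -sum(f2[i3]^2 * (k[i3] - 2) * fof[i3] * u^(2 * k[i3] - 5)) / t0^5
  ratio   <- -psi2 / psi3
  v.ratio <- v2 / psi3^2 + psi2^2 * v3 / psi3^4 - 2 * psi2 * cv / psi3^3
  z <- qnorm(1 - (1 - level) / 2)
  c(estimate = ratio, lower = ratio - z * sqrt(v.ratio), upper = ratio + z * sqrt(v.ratio))
}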

Appendix K Parametric Bootstrap Confidence Interval for Hill Numbers

We have \hat{\theta}, an estimator of \theta, where \theta is either E(N_{0}(t_{0})) or a Hill number of order q. We want to construct a 100(1-\alpha)\% confidence interval for \theta. Let B be the total number of bootstrap samples, chosen so that \alpha(B+1)/2 is an integer. In this paper, we use \alpha=0.05 and B=2999. As \hat{\theta} can be infinite, we avoid methods that require arithmetic on \hat{\theta}, such as the basic bootstrap method. The procedure to construct a parametric bootstrap confidence interval is as follows:

For i=1i=1 to BB do {

Generate a bootstrap sample from a parametric model.

Compute \hat{\theta} based on the bootstrap sample and denote it by \hat{\theta}_{i}. }

Sort θ^1,θ^2,,θ^B\hat{\theta}_{1},\hat{\theta}_{2},\ldots,\hat{\theta}_{B} in ascending order. Denote the sorted θ^1,θ^2,,θ^B\hat{\theta}_{1},\hat{\theta}_{2},\ldots,\hat{\theta}_{B} as θ^(1)θ^(2)θ^(B)\hat{\theta}_{(1)}\leq\hat{\theta}_{(2)}\leq\ldots\leq\hat{\theta}_{(B)}. Set θ^(0)=0\hat{\theta}_{(0)}=0 and θ^(B+1)=\hat{\theta}_{(B+1)}=\infty. A 100(1α)%100(1-\alpha)\% confidence interval for θ\theta is

[θ^(j),θ^((B+1)(1α)+j)].[~{}\hat{\theta}_{(j)}~{},~{}\hat{\theta}_{((B+1)(1-\alpha)+j)}~{}].

where j is an integer with 0\leq j\leq(B+1)\alpha chosen so that the above 100(1-\alpha)\% confidence interval has the smallest width.

Infinity is a special value of \hat{\theta} that can carry positive probability mass. By using the smallest 100(1-\alpha)\% confidence interval, we attempt to construct a confidence interval with a finite upper endpoint whenever possible.
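To make the interval-selection step concrete, here is a minimal R sketch (ours, not the authors' code) of choosing the shortest interval of the form [\hat{\theta}_{(j)},\hat{\theta}_{((B+1)(1-\alpha)+j)}] from the B bootstrap replicates; the function name is hypothetical.

shortest_boot_ci <- function(theta.boot, alpha = 0.05) {
  ## theta.boot: B bootstrap replicates of theta-hat (Inf is allowed)
  B  <- length(theta.boot)                       # e.g. B = 2999 with alpha = 0.05
  th <- c(0, sort(theta.boot), Inf)              # theta_(0) = 0, ..., theta_(B+1) = Inf
  j  <- 0:round((B + 1) * alpha)
  lo <- th[j + 1]                                # theta_(j); R vectors are 1-indexed
  hi <- th[round((B + 1) * (1 - alpha)) + j + 1] # theta_((B+1)(1-alpha)+j)
  best <- which.min(hi - lo)                     # smallest width; finite upper endpoints are favoured
  c(lower = lo[best], upper = hi[best])
}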

When \hat{\theta} is an estimator of a Hill number under parametric model \mathrm{LDR}_{1} or \mathrm{RDR}_{1}, the bootstrap sample is a simulated FoF under the selected model with the parameter set equal to the MLE. The simulation consists of two steps: (i) simulate N_{+}(t_{0}) from the \mathrm{Poisson}(\hat{E}(N_{+}(t_{0}))) distribution, and (ii) generate a random sample of size N_{+}(t_{0}) from the fitted SAD; the resulting frequencies form the FoF.

When \hat{\theta}=\hat{E}(N_{0}(t_{0})) in (7.3), the bootstrap sample is \{N_{i}^{*}(t_{0})\}_{i=1,2,3}. We simulate N_{i}^{*}(t_{0}) from the \mathrm{Poisson}(N_{i}(t_{0})) distribution for i=1,2,3, independently across i. After a confidence interval for E(N_{0}(t_{0})) is constructed, add N_{+}(t_{0}) to it to obtain a confidence interval for E(D). If at least one of N_{1}(t_{0}), N_{2}(t_{0}) or N_{3}(t_{0}) is zero, a modification is recommended to avoid any N_{i}^{*}(t_{0}) (i=1,2,3) being fixed at 0. When there is j\geq 3 such that N_{j}(t_{0})>0, we move the ending time t_{0} slightly backward to t, say t=t_{0}-t_{0}/S(t_{0}) where S(t_{0}) is the total number of individuals observed in the time interval [0,t_{0}], and find \{\hat{N}_{i}(t)\}_{i=1,2,3} using (4.8) as shown below:

N^i(t)=k=iNk(t0)(ki)(tt0)i(1tt0)ki.\hat{N}_{i}(t)=\sum_{k=i}^{\infty}N_{k}(t_{0}){k\choose i}\left(\frac{t}{t_{0}}\right)^{i}\left(1-\frac{t}{t_{0}}\right)^{k-i}.

Clearly \hat{N}_{i}(t)>0 for i=1,2,3. We use \{\hat{N}_{i}(t)\}_{i=1,2,3} in place of \{N_{i}(t_{0})\}_{i=1,2,3} in the bootstrap. We estimate N_{+}(t) (whose mean is \psi(t)) using the relation in (4.5), and add this estimate to the confidence interval for N_{0}(t) to construct a confidence interval for E(D).
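For concreteness, the thinning estimate \hat{N}_{i}(t) in the display above can be evaluated directly from the observed FoF; the following minimal R sketch (ours) does so, using the fact that each summand is a binomial probability mass. The function name is hypothetical.

thin_fof <- function(fof, t0, t, i) {
  ## N_hat_i(t) = sum_{k >= i} N_k(t0) * choose(k, i) * (t/t0)^i * (1 - t/t0)^(k - i)
  k <- seq_along(fof)                 # fof[k] = N_k(t0)
  keep <- k >= i
  sum(fof[keep] * dbinom(i, size = k[keep], prob = t / t0))
}
## e.g. with t = t0 - t0/S(t0):  sapply(1:3, function(i) thin_fof(fof, t0, t, i))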

Appendix L Expressions of Dνq{}^{q}\!D_{\nu} for RDR1\mathrm{RDR}_{1}

If c1c_{1} or c2c_{2} is zero, RDR1\mathrm{RDR}_{1} reduces to LDR1\mathrm{LDR}_{1}. Assume that c1c_{1} and c2c_{2} are positive.

dν~dλ\displaystyle\frac{d\tilde{\nu}}{d\lambda} =\displaystyle= a(t0+b1)c1(t0+b2)c2exp(b1λ)λc1+c22Γ(c1)Γ(c2)\displaystyle\frac{a(t_{0}+b_{1})^{c_{1}}(t_{0}+b_{2})^{c_{2}}\exp(-b_{1}\lambda)\lambda^{c_{1}+c_{2}-2}}{\Gamma(c_{1})\Gamma(c_{2})}
01yc21(1y)c11exp((b2b1)λy)𝑑y.\displaystyle\int_{0}^{1}y^{c_{2}-1}(1-y)^{c_{1}-1}\exp(-(b_{2}-b_{1})\lambda y)dy.

Assume further that b1>0b_{1}>0, which implies that Λ\Lambda is finite. When q0q\geq 0 and q1q\neq 1,

Dνq{}^{q}\!D_{\nu} =\displaystyle= a(t0+b1b1)c1(t0+b2b2)c2(b1b2Γ(c1+c2+q1)Γ(c1)Γ(c2)\displaystyle a\left(\frac{t_{0}+b_{1}}{b_{1}}\right)^{c_{1}}\left(\frac{t_{0}+b_{2}}{b_{2}}\right)^{c_{2}}\left(\frac{b_{1}b_{2}\Gamma(c_{1}+c_{2}+q-1)}{\Gamma(c_{1})\Gamma(c_{2})}\right.
01ϑ(y)c21(1ϑ(y))c11((b2b1)y+b1)q1dy)1/(1q),\displaystyle\left.\int_{0}^{1}\vartheta(y)^{c_{2}-1}(1-\vartheta(y))^{c_{1}-1}((b_{2}-b_{1})y+b_{1})^{-q-1}dy\right)^{1/(1-q)},

where ϑ(y)=b2y/(b2y+b1(1y))\vartheta(y)=b_{2}y/(b_{2}y+b_{1}(1-y)). When q=1q=1,

Dν1{}^{1}\!D_{\nu} =\displaystyle= a(t0+b1b1)c1(t0+b2b2)c2exp(b1b2B(c1,c2)\displaystyle a\left(\frac{t_{0}+b_{1}}{b_{1}}\right)^{c_{1}}\left(\frac{t_{0}+b_{2}}{b_{2}}\right)^{c_{2}}\exp\left(-\frac{b_{1}b_{2}}{B(c_{1},c_{2})}\right.
01ϑ(y)c21(1ϑ(y))c11Ψ(c1+c2)log((b2b1)y+b1)((b2b1)y+b1)2dy),\displaystyle\left.\int_{0}^{1}\vartheta(y)^{c_{2}-1}(1-\vartheta(y))^{c_{1}-1}\frac{\Psi(c_{1}+c_{2})-\log((b_{2}-b_{1})y+b_{1})}{((b_{2}-b_{1})y+b_{1})^{2}}dy\right),

where B(x,y)B(x,y) is the Beta function and Ψ(x)\Psi(x) is the digamma function.
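Since the one-dimensional integrals above have no simple closed form in general, they can be evaluated numerically. A minimal R sketch (ours) for the case q\geq 0, q\neq 1 follows; it assumes b_{1},b_{2},c_{1},c_{2}>0 (the endpoint singularities when c_{1}<1 or c_{2}<1 are integrable) and the function name is hypothetical.

hill_rdr1 <- function(q, a, b1, b2, c1, c2, t0) {
  ## numerical evaluation of qD_nu for RDR1 when q >= 0 and q != 1
  ## (requires c1 + c2 + q > 1 so that the Gamma factor is defined)
  theta <- function(y) b2 * y / (b2 * y + b1 * (1 - y))          # the function vartheta(y)
  integrand <- function(y)
    theta(y)^(c2 - 1) * (1 - theta(y))^(c1 - 1) * ((b2 - b1) * y + b1)^(-q - 1)
  I <- integrate(integrand, 0, 1)$value
  inner <- b1 * b2 * gamma(c1 + c2 + q - 1) / (gamma(c1) * gamma(c2)) * I
  a * ((t0 + b1) / b1)^c1 * ((t0 + b2) / b2)^c2 * inner^(1 / (1 - q))
}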

Appendix M Derivation of the log-likelihood function for the ρ\rho-appearance design

Let Y_{i} be the length of the observation period for species i, and J_{i} be the observed frequency of species i in its observation period. For species i, we observe (Y_{i},J_{i}). Either Y_{i}=t_{0} (species i appears fewer than \rho times in [0,t_{0}]) or J_{i}=\rho (the observation period of species i ends at its \rhoth appearance time Y_{i}<t_{0}). For a species with rate \lambda, the probability function of J is

P(J=jλ)={(λt0)jexp(λt0)/j!(j<ρ)k=ρ(λt0)kexp(λt0)/k!(j=ρ).P(J=j\mid\lambda)=\begin{cases}(\lambda t_{0})^{j}\exp(-\lambda t_{0})/j!&(j<\rho)\\ \sum_{k=\rho}^{\infty}(\lambda t_{0})^{k}\exp(-\lambda t_{0})/k!&(j=\rho).\end{cases} (M.1)

Since the appearance time of the \rhoth individual follows the \mathrm{Erlang}(\rho,\lambda) distribution,

P(Y[y,y+dy),J=ρλ)\displaystyle P(Y\in[y,y+dy),\,J=\rho\,\mid\,\lambda) =λρyρ1exp(λy)(ρ1)!dy.\displaystyle=\frac{\lambda^{\rho}y^{\rho-1}\exp(-\lambda y)}{(\rho-1)!}dy. (M.2)

By (M.1) and (M.2), the joint probability density function of the observations {(Yi,Ji)}in+(t0)\{(Y_{i},J_{i})\}_{i\leq n_{+}(t_{0})}, say {(yi,ji)}in+(t0)\{(y_{i},j_{i})\}_{i\leq n_{+}(t_{0})} given ν\nu is proportional to

eψ(t0)(i:ji<ρλji1t0jieλt0ji!ν(dλ))(i:ji=ρλρ1yiρ1eλyi(ρ1)!ν(dλ)).e^{-\psi(t_{0})}\left(\prod_{i:\,j_{i}<\rho}\int\frac{\lambda^{j_{i}-1}t_{0}^{j_{i}}e^{-\lambda t_{0}}}{j_{i}!}\nu(d\lambda)\right)\left(\prod_{i:\,j_{i}=\rho}\int\frac{\lambda^{\rho-1}y_{i}^{\rho-1}e^{-\lambda y_{i}}}{(\rho-1)!}\nu(d\lambda)\right).

Therefore, the log-likelihood function is

log((ψ{yi,ji}i))=ψ(t0)+j=1ρ1nj(t0)log(ψ(j)(t0))+yi<t0log(ψ(ρ)(yi)).\log(\mathcal{L}(\psi\mid\{y_{i},j_{i}\}_{i}))=-\psi(t_{0})+\sum_{j=1}^{\rho-1}n_{j}(t_{0})\log(\mid\psi^{(j)}(t_{0})\mid)+\sum_{y_{i}<t_{0}}\log(\mid\psi^{(\rho)}(y_{i})\mid).
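To illustrate how this log-likelihood is used, the following minimal R sketch (ours) evaluates it for one concrete parametric family, the log-series SAC \psi(t)=\alpha\log(1+t/b) (a reparametrization of the form in Appendix O), for which |\psi^{(j)}(t)|=\alpha(j-1)!/(b+t)^{j}; the function name and arguments are our own choices.

loglik_rho_logseries <- function(par, nj, y, rho, t0) {
  ## par = c(alpha, b);  nj = (n_1(t0), ..., n_{rho-1}(t0));
  ## y = rho-th appearance times y_i < t0 of the species seen at least rho times
  alpha <- par[1]; b <- par[2]
  if (alpha <= 0 || b <= 0) return(-Inf)
  log.abs.deriv <- function(j, t) log(alpha) + lgamma(j) - j * log(b + t)  # log|psi^(j)(t)|
  j <- seq_len(rho - 1)
  -alpha * log(1 + t0 / b) + sum(nj * log.abs.deriv(j, t0)) + sum(log.abs.deriv(rho, y))
}
## MLE, e.g.:  optim(c(20, 0.1), function(p) -loglik_rho_logseries(p, nj, y, rho, t0))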

Appendix N Simulation experiment on the loss of information of the ρ\rho-appearance design

Let us reconsider the bird abundance data for the Wisconsin route of the North American Breeding Bird Survey for 1995 in Section 9. We choose \nu(d\lambda)=\gamma\lambda f(\lambda\mid\mu,\sigma)d\lambda where f(\lambda\mid\mu,\sigma) is the density function of the \mathrm{Lognormal}(\mu,\sigma^{2}) distribution. The parameter \gamma is the expected total number of species. From Section 3, the MLE of (\mu,\sigma) is identical to the conditional maximum likelihood estimate of the corresponding Poisson-lognormal model [5]. The fitted lognormal mixing distribution is the \mathrm{Lognormal}(\hat{\mu},\hat{\sigma}^{2}) distribution with \hat{\mu}=1.23 and \hat{\sigma}=1.30. Let \omega_{i}(\mu,\sigma)=P(Y=i) where Y is a Poisson-lognormal random variable with parameters \mu and \sigma. We use the function “dpoilog” in the R package “poilog” [23] to compute this probability. The estimated \gamma is \hat{\gamma}=n_{+}(t_{0})/(1-\omega_{0}(\hat{\mu},\hat{\sigma}))=85.2.

Without loss of generality, set t_{0}=1. We use this data set to investigate the information loss of the \rho-appearance design. The \rho-appearance data are simulated from the observed data using the following procedure:

Simulation procedure: Suppose species i has observed frequency m_{i} in the time interval [0,1]. If m_{i}<\rho, our data for this species is m_{i}, its frequency in [0,1]. If m_{i}\geq\rho, we simulate the \rhoth appearance time of the species, r_{i}, from the \mathrm{Beta}(\rho,m_{i}+1-\rho) distribution, which is the distribution of the \rhoth order statistic of m_{i} samples from the U(0,1) distribution.
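A minimal R sketch (ours) of this step: m is the vector of observed species frequencies in [0,1], and the returned r are the simulated \rhoth appearance times; the function name is hypothetical.

simulate_rho_data <- function(m, rho) {
  ## species with fewer than rho appearances keep their frequency;
  ## the others get a simulated rho-th appearance time r_i ~ Beta(rho, m_i + 1 - rho)
  keep <- m < rho
  list(freq = m[keep],
       r    = rbeta(sum(!keep), shape1 = rho, shape2 = m[!keep] + 1 - rho))
}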

It can be shown that E(Nk(t))=γωk(μ+log(t),σ)E(N_{k}(t))=\gamma\omega_{k}(\mu+\log(t),\sigma). The log-likelihood function is

log((γ,μ,σ{nj(1)}j=1,,ρ1,{ri}))\displaystyle\log(\mathcal{L}(\gamma,\mu,\sigma\mid\{n_{j}(1)\}_{j=1,\ldots,\rho-1},\{r_{i}\}))
=γ(1ω0(μ,σ))+k=1ρ1nk(1)log(γωk(μ,σ))+rilog(γωρ(μ+log(ri),σ)).\displaystyle=-\gamma(1-\omega_{0}(\mu,\sigma))+\sum_{k=1}^{\rho-1}n_{k}(1)\log(\gamma\omega_{k}(\mu,\sigma))+\sum_{r_{i}}\log(\gamma\omega_{\rho}(\mu+\log(r_{i}),\sigma)).

As the MLE of E(N_{+}(1))=\gamma(1-\omega_{0}(\mu,\sigma)) is n_{+}(1), the MLEs of \mu and \sigma, say \hat{\mu} and \hat{\sigma} respectively, can be found by maximizing the following function:

n+(1)log(1ω0(μ,σ))+k=1ρ1nk(1)log(ωk(μ,σ))+rilog(ωρ(μ+log(ri),σ)).-n_{+}(1)\log(1-\omega_{0}(\mu,\sigma))+\sum_{k=1}^{\rho-1}n_{k}(1)\log(\omega_{k}(\mu,\sigma))+\sum_{r_{i}}\log(\omega_{\rho}(\mu+\log(r_{i}),\sigma)).

The MLE of \gamma is \hat{\gamma}=n_{+}(1)/(1-\omega_{0}(\hat{\mu},\hat{\sigma})). We consider \rho=1,2,\ldots,6. For each value of \rho, we simulate 100 independent sets of \rho-appearance data. For each simulated data set, \mu, \sigma and \gamma are estimated. The sample mean and sample standard deviation of the estimates are presented in Table 5, and a graphical display is given in Figure 5.
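A minimal R sketch (ours) of this fit, using dpoilog() from the poilog package for \omega_{k}(\mu,\sigma); the function name, starting values and optimizer settings are our own choices, and \rho\geq 2 is assumed.

library(poilog)

fit_rho_poilog <- function(nk, r, rho, start = c(mu = 1, sig = 1)) {
  ## nk = (n_1(1), ..., n_{rho-1}(1));  r = simulated rho-th appearance times
  nplus <- sum(nk) + length(r)                        # n_+(1)
  negll <- function(par) {
    mu <- par[1]; sig <- par[2]
    if (sig <= 0) return(1e10)                        # keep the optimizer in the valid region
    om.rho <- sapply(r, function(ri) dpoilog(rho, mu + log(ri), sig))
    -(-nplus * log(1 - dpoilog(0, mu, sig)) +
        sum(nk * log(dpoilog(seq_len(rho - 1), mu, sig))) +
        sum(log(om.rho)))
  }
  est <- optim(start, negll)$par
  gam <- nplus / (1 - dpoilog(0, est[1], est[2]))     # MLE of gamma
  c(est, gamma = gam)
}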

The means of the estimates are close to those based on \tilde{n}(1). The standard deviation of each estimator decreases as \rho increases. From the simulation results, the standard deviations of the estimators for \rho=1 are considerably larger than those for \rho=2. The value \rho=4 performs well for these data. The total number of species with frequency less than 4 is 33, around 46% of the seen species.

Table 5: Mean and standard deviation of MLE for North American breeding bird survey data (1995)
ρ\rho 1 2 3 4 5 6 \infty
mean of μ^\hat{\mu} 1.10 1.28 1.23 1.21 1.21 1.22 1.23
sd of μ^\hat{\mu} 0.56 0.09 0.07 0.04 0.04 0.03 0*
mean of σ^\hat{\sigma} 1.39 1.24 1.29 1.29 1.31 1.30 1.30
sd of σ^\hat{\sigma} 0.48 0.15 0.13 0.08 0.09 0.07 0*
mean of γ^\hat{\gamma} 90.4 83.8 85.0 85.5 85.7 85.3 85.2
sd of γ^\hat{\gamma} 18.7 2.52 2.32 1.38 1.56 1.29 0*

* When \rho=\infty, we always observe the full data, and the sample standard deviation (sd) of the estimator across simulations is zero.

Figure 5: Box-and-whisker plots of the estimators across different values of \rho in the simulation. The horizontal dashed line in each plot shows the estimate obtained when the full Wisconsin route data of the North American breeding bird survey for 1995 are used. The spread of the estimates reflects the additional noise introduced by the \rho-appearance design.

Appendix O Simulation comparison of the MLE method and the curve-fitting method in extrapolation when a finite number of points on the empirical SAC is available

Without loss of generality, we set t_{0}=1. Our data are \{n_{+}(0.1),n_{+}(0.2),\ldots,n_{+}(1)\}. We consider three distributions: the power law, the log-series distribution and the geometric distribution. The curve-fitting methods for these distributions are described below:

(a) Power law: ψ(t)=τt11/c\psi(t)=\tau t^{1-1/c}. For curve fitting, we regress log(n+(t))\log(n_{+}(t)) on log(t)\log(t).

(b) Log-series distribution: \psi(t)=\tau\log(1+t/b)/\log(1+1/b). For curve fitting, we regress n_{+}(t) on \log(t). This is the standard approximation method, which assumes that 1+t/b\approx t/b.

(c) Geometric distribution (hyperbola law): \psi(t)=\tau(1+2b)t/(t+2b). There are various curve-fitting methods for the hyperbola law (see, for example, Raaijmakers [41]). In this simulation, we regress 1/n_{+}(t) on 1/t.

The parameter \tau=\psi(1) is the expected number of recorded species at time t_{0}=1. In the experiment, \tau takes the values 200 and 1000. The distribution parameter takes 6 values. For the power law, c can be 1.25, 1.5, 2, 3, 4 and 5. For the log-series distribution, b can be 0.01, 0.02, 0.03, 0.05, 0.1 and 0.2. For the geometric distribution, b can be 0.05, 0.1, 0.2, 0.4, 0.6 and 0.8. For each combination of parameters, we simulate 5000 data sets. The MLE and the curve-fitting estimate of \psi(2) and \psi(4) are computed for each simulated data set. We evaluate the performance of an estimator by the root mean squared relative error (RMSRE): for an estimator \hat{\theta} of \theta, \mathrm{RMSRE}=\sqrt{\sum_{i=1}^{n}((\hat{\theta}_{i}-\theta)/\theta)^{2}/n}. The results of the simulation are presented in Tables 6, 7 and 8. The RMSRE for the MLE is smaller than that for the curve-fitting method throughout this simulation study.
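As an illustration of the curve-fitting side, here is a minimal R sketch (ours) of the power-law fit in (a) and of the RMSRE criterion; the empirical SAC values n_{+}(0.1),\ldots,n_{+}(1) are supplied in nplus, and the function names are hypothetical.

powerlaw_extrapolate <- function(nplus, times, t.new) {
  ## fit log n_+(t) = log(tau) + (1 - 1/c) log(t) by least squares, then extrapolate
  fit <- lm(log(nplus) ~ log(times))
  exp(predict(fit, newdata = data.frame(times = t.new)))      # estimate of psi(t.new)
}

rmsre <- function(est, truth) sqrt(mean(((est - truth) / truth)^2))

## e.g. psi2.hat <- powerlaw_extrapolate(nplus, times = seq(0.1, 1, by = 0.1), t.new = 2)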

Table 6: Simulation results for Power law in extrapolation to tt for fixed ψ(1)\psi(1) given ten equally spaced observations of SAC in time interval [0,1][0,1]
(t,ψ(1)t,\psi(1)) c=1.25c=1.25 c=1.5c=1.5 c=2c=2 c=3c=3 c=4c=4 c=5c=5
(2,200) RMSRE(Curve-fitting) 0.075 0.078 0.083 0.091 0.096 0.100
RMSRE(MLE) 0.073 0.074 0.077 0.080 0.082 0.083
(2,1000) RMSRE(Curve-fitting) 0.034 0.035 0.037 0.040 0.042 0.044
RMSRE(MLE) 0.033 0.034 0.035 0.036 0.037 0.037
(4,200) RMSRE(Curve-fitting) 0.081 0.089 0.103 0.120 0.132 0.140
RMSRE(MLE) 0.078 0.084 0.093 0.103 0.109 0.112
(4,1000) RMSRE(Curve-fitting) 0.036 0.040 0.045 0.052 0.057 0.060
RMSRE(MLE) 0.035 0.038 0.042 0.046 0.049 0.050
Table 7: Simulation results for log-series distribution in extrapolation to tt for fixed ψ(1)\psi(1) given ten equally spaced observations of SAC in time interval [0,1][0,1]
(t,ψ(1)t,\psi(1)) b=.01b=.01 b=.02b=.02 b=.03b=.03 b=.05b=.05 b=.1b=.1 b=.2b=.2
(2,200) RMSRE(Curve-fitting) 0.072 0.073 0.073 0.077 0.090 0.122
RMSRE(MLE) 0.072 0.072 0.072 0.072 0.073 0.076
(2,1000) RMSRE(Curve-fitting) 0.033 0.034 0.036 0.043 0.065 0.106
RMSRE(MLE) 0.032 0.032 0.032 0.033 0.033 0.034
(4,200) RMSRE(Curve-fitting) 0.074 0.075 0.077 0.084 0.110 0.166
RMSRE(MLE) 0.073 0.074 0.074 0.076 0.079 0.086
(4,1000) RMSRE(Curve-fitting) 0.034 0.037 0.042 0.055 0.091 0.155
RMSRE(MLE) 0.033 0.033 0.034 0.034 0.035 0.038
Table 8: Simulation results for geometric distribution in extrapolation to tt for fixed ψ(1)\psi(1) given ten equally spaced observations of SAC in time interval [0,1][0,1]
(t,ψ(1)t,\psi(1)) b=.05b=.05 b=.1b=.1 b=.2b=.2 b=.4b=.4 b=.6b=.6 b=.8b=.8
(2,200) RMSRE(Curve-fitting) 0.322 0.458 0.586 0.687 0.731 0.756
RMSRE(MLE) 0.085 0.106 0.136 0.165 0.174 0.174
(2,1000) RMSRE(Curve-fitting) 0.316 0.454 0.583 0.684 0.728 0.753
RMSRE(MLE) 0.054 0.079 0.111 0.134 0.136 0.132
(4,200) RMSRE(Curve-fitting) 0.320 0.462 0.601 0.716 0.768 0.798
RMSRE(MLE) 0.102 0.145 0.218 0.311 0.366 0.397
(4,1000) RMSRE(Curve-fitting) 0.315 0.458 0.598 0.713 0.766 0.796
RMSRE(MLE) 0.075 0.122 0.188 0.254 0.279 0.287

References

  • Arrhenius [1921] Arrhenius, O. (1921). Species and area. Journal of Ecology 9 95–99.
  • Béguinot [2016] Béguinot, J. (2016). Extrapolation of the species accumulation curve associated to “chao” estimator of the number of unrecorded species: a mathematically consistent derivation. Annual Research & Review in Biology 11 1–19.
  • Böhning and Schön [2005] Böhning, D. and Schön, D. (2005). Nonparametric maximum likelihood estimation of population size based on the counting distribution. Journal of the Royal Statistical Society: Series C (Applied Statistics) 54 721–737.
  • Boneh et al. [1998] Boneh, S., Boneh, A., and Caron, R. J. (1998). Estimating the prediction function and the number of unseen species in sampling with replacement. Journal of the American Statistical Association 93 372–379.
  • Bulmer [1974] Bulmer, M. G. (1974). On fitting the Poisson lognormal distribution to species-abundance data. Biometrics 30 101–110.
  • Bunge and Fitzpatrick [1993] Bunge, J. and Fitzpatrick, M. (1993). Estimating the number of species: a review. Journal of the American Statistical Association 88 364–373.
  • Chao [1984] Chao, A. (1984). Nonparametric estimation of the number of classes in a population. Scandinavian Journal of Statistics 11 265–270.
  • Chao, Ma, Hsieh and Chiu [2016] Chao, A., Ma, K. H., Hsieh, T. C. and Chiu, C.-H. (2016). User’s guide for online program SpadeR (species-richness prediction and diversity estimation in R). DOI:10.13140/RG.2.2.20744.62722
  • Chao and Lee [1992] Chao, A. and Lee, S. M. (1992). Estimating the number of classes via sample coverage. Journal of the American Statistical Association, 87 210–217.
  • Chao and Jost [2015] Chao, A. and Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution 6 873–882.
  • Chao et al. [1993] Chao, A., Ma, M.-C. and Yang, M. C. K. (1993). Stopping rules and estimation for recapture debugging with unequal failure rates. Biometrika 80 193–201.
  • Chee and Wang [2016] Chee, C.-S. and Wang, Y. (2016). Nonparametric estimation of species richness using discrete k-monotone distributions. Computational Statistics and Data Analysis 93 107–118.
  • Chiu and Chao [2016] Chiu, C.-H. and Chao, A. (2016). Estimating and comparing microbial diversity in the presence of sequencing errors. PeerJ 4 1634.
  • Chiu et al. [2014] Chiu, C. H., Wang, Y. T., Walther, B. A. and Chao, A. (2014). An improved nonparametric lower bound of species richness via a modified Good-Turing frequency formula. Biometrics 70 671–682.
  • Dengler [2009] Dengler, J. (2009). Which function describes the species–area relationship best? A review and empirical evaluation. Journal of Biogeography 36 728–744.
  • Deolalikar and Laffitte [2016] Deolalikar, V. and Laffitte, H. (2016). Extensive large-scale study of error surfaces in sampling-based distinct value estimators for databases. IEEE International Conference on Big Data pp. 1579–1586.
  • Dickie [2010] Dickie, I. A. (2010). Insidious effects of sequencing errors on perceived diversity in molecular surveys. The New Phytologist 188 916–918.
  • Efron and Thisted [1976] Efron, B. and Thisted, R. (1976). Estimating the number of unseen species: How many words did Shakespeare know? Biometrika 63 435–447.
  • Engen [1974] Engen, S. (1974). On species frequency models. Biometrika 61 263–270.
  • Fisher et al. [1943] Fisher, R. A., Corbet, A. S., and Williams, C. B. (1943). The relation between the number of species and the number of individuals in a random sample of an animal population. Journal of Animal Ecology 12 42–58.
  • Good [1953] Good, I. J. (1953). The population frequencies of species and the estimation of population parameters. Biometrika 40 237–264.
  • Good and Toulmin [1956] Good, I. J. and Toulmin, G. (1956). The number of new species, and the increase in population coverage, when a sample is increased. Biometrika 43 45–63.
  • Grøtan and Engen [2015] Grøtan, V. and Engen, S. (2015). Package ‘poilog’. R package.
  • Haas et al. [1995] Haas, P. J., Naughton, J. F., Seshadri, S. and Stokes, L. (1995). Sampling-based estimation of the number of distinct values of an attribute. VLDB 95 311–322.
  • Hill [1973] Hill, M. O. (1973). Diversity and evenness: a unifying notation and its consequences. Ecology 54 427–432.
  • Hoaglin and Tukey [1985] Hoaglin, D. C. and Tukey, J. W. (1985). Checking the shape of discrete distributions. In Exploring Data Tables, Trends, and Shapes, Eds. D. C. Hoaglin, F. Mosteller & J. W. Tukey, pp. 345–416. New York: John Wiley and Sons.
  • Kemp [1968] Kemp, A. W. (1968). A wide class of discrete distributions and the associated differential equations. Sankhya, Series A 30 401–410.
  • Kemp and Kemp [1956] Kemp, C. D. and Kemp, A. W. (1956). Generalized hypergeometric distributions. Journal of the Royal Statistical Society, Series B 18 202–211.
  • Kingman [1993] Kingman, J. F. C. (1993). Poisson Processes. Oxford: Oxford University Press.
  • Kobayashi [1975] Kobayashi, S. (1975). The species-area relation ii. a second model for continuous sampling. Researches on Population Ecology 16 265–280.
  • Lijoi et al. [2007] Lijoi, A., Mena, R. H., and Prünster, I. (2007). Bayesian nonparametric estimation of the probability of discovering new species. Biometrika 94 769–786.
  • Magurran [2007] Magurran, A. E. (2007). Species abundance distributions over time. Ecology Letters 10 347–354.
  • Martin and Goldenfeld [2006] Martin, H. G. and Goldenfeld, N. (2006). On the origin and robustness of power-law species-area relationships in ecology. PNAS 103 10310–10315.
  • Matthews and Whittaker [2014] Matthews, T. J. and Whittaker, R. J. (2014). Fitting and comparing competing models of the species abundance distribution: assessment and prospect. Frontiers of Biogeography 6 67–82.
  • Matthews and Whittaker [2015] Matthews, T. J. and Whittaker, R. J. (2015). On the species abundance distribution in applied ecology and biodiversity management. Journal of Applied Ecology 52 443–454.
  • May [1975] May, R. M. (1975). Patterns of species abundance and diversity. In Ecology and Evolution of Communities, Eds. M. L. Cody & J. M. Diamond, pp. 81–120. Cambridge: Belknap Press.
  • Norris and Pollock [1998] Norris, J. L. and Pollock, K. H. (1998). Non-parametric MLE for Poisson species abundance models allowing for heterogeneity between species. Environmental and Ecological Statistics 5 391–402.
  • Novotny and Basset [2000] Novotny, V. and Basset, Y. (2000). Rare species in communities of tropical insect herbivores: pondering the mystery of singletons. Oikos 89 564–572.
  • Preston [1960] Preston, F. W. (1960). Time and space and the variation of species. Ecology 41 612–627.
  • Preston [1962] Preston, F. W. (1962). The canonical distribution of commonness and rarity. Ecology 43 185–215.
  • Raaijmakers [1987] Raaijmakers, J. G. W. (1987). Statistical analysis of the Michaelis-Menten equation. Biometrics 43 793–803.
  • Sanders [1968] Sanders, H. L. (1968). Marine benthic diversity: a comparative study. American Naturalist 102 243–282.
  • Scheiner [2003] Scheiner, S. M. (2003). Six types of species-area curves. Global Ecology & Biogeography 12 441–447.
  • Schilling et al. [2012] Schilling, R. L., Song, R. and Vondracek, Z. (2012). Bernstein Functions: Theory and Applications. de Gruyter.
  • Stroock [2012] Stroock, D. W. (2012). Probability Theory: An Analytic View. Cambridge: Cambridge University Press.
  • Tjørve [2003] Tjørve, E. (2003). Shapes and functions of species-area curves: a review of possible models. Journal of Biogeography 30 827–835.
  • Tjørve [2009] Tjørve, E. (2009). Shapes and functions of species-area curves (ii): a review of new models and parameterizations. Journal of Biogeography 36 1435–1445.
  • Trushkowsky et al. [2012] Trushkowsky, B., Kraska, T., Franklin, M. J., and Sarkar, P. (2012). Getting it all from the crowd. arXiv preprint arXiv:1202.2335.
  • Wang [2010] Wang, J.-P. (2010). Estimating species richness by a Poisson-compound gamma model. Biometrika 97 727–740.
  • Williams et al. [2009] Williams, M. R., Lamont, B. B., and Henstridge, J. D. (2009). Species-area functions revisited. Journal of Biogeography 36 1994–2004.
  • Zhou et al. [2017] Zhou, M., Favaro, S., and Walker, S. G. (2017). Frequency of frequencies distributions and size-dependent exchangeable random partitions. Journal of the American Statistical Association 112 1623–1635.