Information criteria for inhomogeneous spatial point processes
Abstract
The theoretical foundation for a number of model selection criteria is established in the context of inhomogeneous point processes and under various asymptotic settings: infill, increasing domain, and combinations of these. For inhomogeneous Poisson processes we consider Akaike’s information criterion and the Bayesian information criterion, and in particular we identify the point process analogue of ‘sample size’ needed for the Bayesian information criterion. Considering general inhomogeneous point processes we derive new composite likelihood and composite Bayesian information criteria for selecting a regression model for the intensity function. The proposed model selection criteria are evaluated using simulations of Poisson processes and cluster point processes.
Keywords: Akaike’s information criterion, Bayesian information criterion, composite information criterion, composite likelihood, inhomogeneous point process, intensity function, model selection.
1 Introduction
Fitting a regression model to the intensity function of a point process is one of the most fundamental tasks in statistical analysis of point pattern data; see e.g. Møller and Waagepetersen (2017) or Coeurjolly and Lavancier (2019) for a recent review of this problem. If the data in question can be viewed as a realization of a Poisson process, regression parameters are usually estimated by maximum likelihood. If the point process is not Poisson, the likelihood function is often computationally intractable. In such cases the Poisson likelihood function can still be used as a composite likelihood function for estimating regression parameters. This is, for example, the approach underlying the kppm procedure of the popular spatstat R package (Baddeley et al., 2015).
Considering regression models, model selection is often a pertinent task. In case of a Poisson process, the Akaike information criterion (AIC) (Akaike, 1973) seems an obvious approach (and is implemented in the function logLik.ppm of spatstat) since in this case the likelihood function is available. If the assumption of a Poisson process is not tenable, generalization of the AIC to a composite likelihood information criterion (CIC) (Varin and Vidoni, 2005) is relevant. Yet another alternative is the Bayesian information criterion (BIC) (Schwarz, 1978). The classical framework for the information criteria mentioned typically involves a sample of independent observations, where the sample size plays a crucial role for the BIC. However, in the case of a single realization of a point process, it is not obvious how to define sample size. In some sense it is one, but this choice is obviously not useful for asymptotic justifications of information criteria. Instead, sample size must be linked to properties of the observation window or the point process intensity function. Several proposals for defining ‘sample size’ have been considered in the literature. Choiruddin et al. (2018) use the size of the observation window while Thurman et al. (2015) consider the number of observed points. Jeffrey et al. (2018) use the sum of the number of data points and the number of dummy points used in a numerical approximation of the likelihood (Berman and Turner, 1992).
In this paper we first establish the theoretical foundation for AIC and CIC in the context of intensity function model selection for a point process. This includes asymptotic results for estimates of the ‘least false parameter value’, see e.g. Claeskens and Hjort (2008, Section 2.2). Next we derive the BIC in the case of a Poisson process and thereby identify the meaning of sample size in this context. We also consider the generalization of BIC to composite likelihood BIC (CBIC) (Gao and Song, 2010) using a concept of effective degrees of freedom derived for the CIC. Our asymptotic developments are established under an original setting which embraces both infill asymptotics (the number of points in a fixed domain increases) and increasing domain asymptotics (the volume of the observation window tends to infinity), the two frameworks most often considered in the literature.
The rest of the article is organized as follows. The problem of selecting a model for the intensity function is specified in Section 2. In Section 3 we discuss asymptotic results for intensity function regression parameter estimators under a ‘double’ asymptotic framework. We derive the AIC and CIC for spatial point processes in Section 4 and develop the BIC and CBIC in Section 5. The different model selection criteria are compared in a simulation study in Section 6. Section 7 gives some concluding remarks. Proofs are given in Appendices A-E.
2 Intensity model selection
A spatial point process defined on is a locally finite random subset of . If for bounded we denote by the cardinality of , locally finite means that is a finite integer almost surely. For a bounded domain , denotes the volume of . The intensity function and the pair correlation function of are defined (if they exist) by the equations
for any bounded .
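In one standard notation (the symbols $X$, $N$, $\rho$ and $g$ are assumed here for illustration and may differ from those used elsewhere in the paper), the two defining equations are usually written as follows, for bounded Borel sets $A$ and $B$:

```latex
\mathrm{E}\, N(A) = \int_A \rho(u)\,\mathrm{d}u,
\qquad
\mathrm{E}\{N(A)\,N(B)\} = \int_{A\cap B} \rho(u)\,\mathrm{d}u
  + \int_A\!\int_B \rho(u)\,\rho(v)\,g(u,v)\,\mathrm{d}u\,\mathrm{d}v .
```

The first term of the second equation collects the pairs with $u = v$; the double integral covers distinct pairs.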
If the counts are Poisson distributed, is said to be a Poisson process. In this case counts are independent whenever the subsets are disjoint and the pair correlation function is identically equal to one. For our asymptotic considerations, we assume that a sequence of spatial point processes is observed within a sequence of bounded windows , . We denote by and the intensity and pair correlation function of . With an abuse of notation, we denote for any expectation and variance under the sampling distribution of by and .
For modelling the intensity function we assume that covariates are available where for each , is a locally integrable function on . Let , denote the subsets of and let where is the cardinality of . We consider models for specified in terms of the subsets of the covariates. For each and we define the log-linear model
(1) |
where and . The quantity should not be regarded as a parameter to be estimated. For , could e.g. represent a timespan over which is observed. In the following, with an abuse of notation, we just write for and similarly for related quantities.
The problem we consider is to select among the intensity models , , given by
The distinction between and the other parameters is necessary because our objective is to select among different models which all contain an intercept term. Note that the true intensity function does not necessarily correspond to any of the suggested models , .
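As a minimal sketch of this model family, the following Python code enumerates candidate covariate subsets and evaluates a log-linear intensity at one location. The function names, the argument layout, and the inclusion of the empty subset (intercept-only model) are assumptions made for illustration; `theta` stands for the known multiplicative quantity that is not estimated.

```python
import math
from itertools import combinations


def candidate_models(n_covariates):
    """Return every subset of covariate indices, from the empty subset
    (intercept only, an assumption of this sketch) to the full set."""
    indices = range(n_covariates)
    return [subset
            for size in range(n_covariates + 1)
            for subset in combinations(indices, size)]


def loglinear_intensity(theta, beta0, beta, z, subset):
    """Evaluate theta * exp(beta0 + sum_k beta[k] * z[subset[k]]).

    theta  : known positive scaling quantity (not estimated)
    beta0  : intercept, present in every candidate model
    beta   : one coefficient per covariate index in `subset`
    z      : full covariate vector at the location of interest
    """
    if len(beta) != len(subset):
        raise ValueError("need one coefficient per selected covariate")
    linear = beta0 + sum(beta[k] * z[j] for k, j in enumerate(subset))
    return theta * math.exp(linear)
```

With six covariates, `candidate_models(6)` returns 64 subsets, which is the size of the search space when every subset is a candidate.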
3 Estimation of the intensity function
In this section we discuss estimation of the intensity function using a Poisson likelihood function and associated asymptotic results.
3.1 The Poisson likelihood function
The density of a Poisson point process with intensity and observed in is given by (see e.g. Møller and Waagepetersen (2004))
(2) |
for locally finite point configurations . We emphasize that in the following we assume neither that introduced in the previous section is a Poisson process nor that coincides with the intensity function of .
Combining (1) and (2), up to a constant, the logarithm of (2) evaluated at becomes
(3) |
Let be the corresponding estimating function given by
(4) |
For any , the sensitivity (or Fisher information) matrix is
(5) |
We assume is positive definite for all (see also condition C5 in the next section). We can then define the estimator of as
(6) |
If is indeed a Poisson process with intensity function , then is the maximum likelihood estimator and the sensitivity (5) equals the observed information matrix .
If is not Poisson, may be viewed as a composite likelihood estimator (Schoenberg, 2005; Waagepetersen, 2007). In the situation where the intensity function coincides with the true intensity function , asymptotic properties of maximum likelihood or composite likelihood estimators obtained as maximizers of Poisson likelihood functions have been established in various settings by Rathbun and Cressie (1994), Waagepetersen (2007), Guan and Loh (2007) and Waagepetersen and Guan (2009). In the next section we investigate the more intriguing situation where the intensity model is misspecified.
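The quantities of this section can be sketched numerically as follows, assuming a log-linear intensity whose covariate vector `z` includes a leading 1 for the intercept (with the known scaling quantity absorbed into it) and a simple quadrature rule with hypothetical points `z_quad` and weights `weights` summing to the window volume. This is an illustration of the generic Poisson log-likelihood, its score, and the sensitivity matrix, not the paper's implementation.

```python
import math


def linear_predictor(beta, z):
    """Inner product of coefficients and covariates (log-intensity)."""
    return sum(b * zj for b, zj in zip(beta, z))


def poisson_loglik(beta, z_data, z_quad, weights):
    """Sum of log-intensities at the data points minus the
    quadrature approximation of the integrated intensity."""
    log_terms = sum(linear_predictor(beta, z) for z in z_data)
    integral = sum(w * math.exp(linear_predictor(beta, z))
                   for z, w in zip(z_quad, weights))
    return log_terms - integral


def score(beta, z_data, z_quad, weights):
    """Gradient of the log-likelihood: covariates summed over data points
    minus the intensity-weighted integral of the covariates."""
    p = len(beta)
    e = [sum(z[j] for z in z_data) for j in range(p)]
    for z, w in zip(z_quad, weights):
        mass = w * math.exp(linear_predictor(beta, z))
        for j in range(p):
            e[j] -= mass * z[j]
    return e


def sensitivity(beta, z_quad, weights):
    """Intensity-weighted integral of z z^T (minus the Hessian)."""
    p = len(beta)
    s = [[0.0] * p for _ in range(p)]
    for z, w in zip(z_quad, weights):
        mass = w * math.exp(linear_predictor(beta, z))
        for j in range(p):
            for k in range(p):
                s[j][k] += mass * z[j] * z[k]
    return s
```

For an intercept-only model the score vanishes when the fitted intensity equals the number of points divided by the window volume, which gives a quick sanity check of the sketch.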
3.2 Framework and asymptotic results for misspecified intensity functions
To handle the situation where does not coincide with the intensity function of , we follow Varin and Vidoni (2005) and define a (composite) Kullback-Leibler divergence between the model with parameter and the true sampling distribution. That is,
(7) |
where is the Poisson log-likelihood obtained with the true intensity . For a window and model we let
denote the ‘least wrong parameter value’ under model , provided the maximum exists. It is easy to see by explicit evaluation of the right hand side that
and so condition C5 stated below implies that is well-defined as a unique maximum when is large enough. Also it is easy to see that
(8) |
which means that given by (6) is a candidate to estimate .
The remainder of this section is devoted to asymptotic results for within the above framework of a misspecified intensity function. We thereby extend the results in the references mentioned in Section 3.1. In contrast to these references which used either increasing domain or infill asymptotics, we moreover consider a ‘double asymptotic’ framework as formalized by condition C3 presented below.
Two matrices are crucial for the asymptotic results. The sensitivity matrix is given by (5) regardless of whether the model is misspecified or not. The variance-covariance matrix of (4) is, using the Campbell theorem, given by
(9) |
Observe that does not depend on (whence its notation). Our results will be based on the following assumptions where for a square matrix , (resp. ) stands for the smallest (resp. largest) eigenvalue. We use to denote that and .
- [C1] As , .
- [C2] is continuous, .
- [C3] The sequence is an increasing sequence, such that and . The sets are convex and compact.
- [C4] As , almost surely.
- [C5] For any and , is positive definite. In addition, for any , there exists a set such that and a such that . Finally, we assume that .
- [C6] As , .
- [C7] As , in distribution.
We can then state the following asymptotic result which is verified in Appendix A.
Theorem 1.
We stress that Theorem 1 (ii)-(iii) do not require the strong consistency of to 0, i.e. condition C4. We conclude with some remarks regarding the assumptions C1-C7. Condition C1 ensures that the sequence of ‘least wrong parameter values’ does not diverge with .
Condition C3 is different from existing conditions as it embraces both of the standard asymptotic frameworks considered in the literature:
- infill asymptotics: with a bounded set of and as .
- increasing domain asymptotics: and is a sequence of bounded domains of such that .
It is also valid if both asymptotics are considered at the same time. The assumed convexity in C3 enables the use of the mean value theorem in the proofs of our theoretical results.
In Condition C2, assuming an upper bound for is quite standard. The upper bound further implies that assuming a lower bound is not really restrictive since for each covariate we can always find some so that for any . Replacing by while changing the intercept from to leaves the model unchanged. The continuity assumption is used to prove the strong consistency of .
Condition C4 is also used to ensure the strong consistency of . This condition can be seen as a law of large numbers. In Example 1, we present a class of models where such an assumption is valid under the generalized asymptotic condition C3. We can observe that if there exists such that and , then C4 ensues from an application of Borel-Cantelli’s lemma. At least in the increasing domain framework, assuming such a moment assumption is quite standard to derive a central limit theorem.
Condition C5 is very similar to the assumption required by Rathbun and Cressie (1994) under the Poisson case or by Waagepetersen and Guan (2009) for more general point processes, within the increasing domain asymptotic framework. Note in particular that C5 combined with C1-C2 ensures that .
Under the Poisson case, and so C6 is obviously satisfied if . For more general point processes, assume further that does not depend on and is invariant under translations. Then, thanks to C2, the assumption
will imply C6. This assumption is quite standard and satisfied by a large class of models, see e.g. Waagepetersen and Guan (2009).
Continuing within the increasing domain framework, C7 was established by Rathbun and Cressie (1994) in the Poisson case, by Guan and Loh (2007) and Waagepetersen and Guan (2009) for α-mixing point processes, and by Lavancier et al. (2020) for determinantal point processes. The infill asymptotic framework was used to establish C7 in case of Poisson cluster processes in Waagepetersen (2007). However, the ‘double’ asymptotic framework has never been considered. Below, we provide an example of a model which satisfies C4, C6 and C7 under the new general asymptotic setting C3.
Example 1.
Let be a homogeneous Poisson point process on with intensity . Given , let , , be independent inhomogeneous Poisson point processes on with intensity where , is a symmetric density on , and is a non-negative bounded function. Then, is an inhomogeneous Poisson cluster point process (with inhomogeneous offspring). It can be shown that and where the notation denotes convolution. Assuming that , there exists such that
Thus C6 is satisfied. In Appendix B we show that this model also satisfies C4 and C7 within the ‘double’ asymptotic framework. Along the same lines one can show that C7 holds for the inhomogeneous Poisson point process with intensity function .
4 Akaike and composite information criteria
One criterion for model selection would be to choose the model that minimizes the Kullback-Leibler divergence, i.e. the model for which (7) evaluated at is smallest. Of course this criterion is not useful in practice since the true model is unknown. Following Varin and Vidoni (2005) we instead choose the model that minimizes an estimate of the expected value of (7) evaluated at , or equivalently, that minimizes an estimate of with . We follow Varin and Vidoni (2005, Lemmas 1-2) to derive in our context the following result.
Proposition 1.
Remark 1.
The technical condition (13) implies that the sequence is a uniformly integrable sequence of random variables. This ensures that convergence in probability of to 0 implies that .
To estimate the effective degrees of freedom we first simply estimate by . The estimation of is more difficult. Following Claeskens and Hjort (2008, p. 31), note that if coincides with the true intensity function of , then coincides with given by where
In this case we get
We propose to use as an approximation of . In practice we replace by and by an estimate obtained by fitting a valid parametric model for and thus obtain . Our composite model selection criterion then becomes
(14) |
For a Poisson process, in which case and (14) reduces to the popular Akaike’s information criterion. For a clustered point process, meaning that . Thus we penalize model complexity more heavily in the case of a clustered point process. This seems to make sense since random clustering of points (i.e. not due to covariates) may erroneously be picked up by covariates that actually had no effect in the data generating mechanism. Hence there is a greater risk of picking a too complex model for the intensity function in case of a clustered point process than for a Poisson process.
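The composite criterion can be sketched in Python as follows. The sketch assumes that (14) has the usual composite-likelihood form, minus twice the maximized Poisson log-likelihood plus twice an effective degrees of freedom given by the trace of the inverse sensitivity matrix times the variance matrix of the score. The function names and the small Gauss-Jordan solver are illustrative choices, not part of the paper.

```python
def solve(a, b):
    """Solve A X = B for small dense matrices by Gauss-Jordan
    elimination with partial pivoting."""
    n, m = len(a), len(b[0])
    aug = [[float(v) for v in a[i]] + [float(v) for v in b[i]]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + m):
                    aug[r][c] -= factor * aug[col][c]
    return [[aug[i][n + j] / aug[i][i] for j in range(m)] for i in range(n)]


def effective_df(sens, var):
    """trace(S^{-1} Sigma): equals the number of parameters when Sigma = S."""
    x = solve(sens, var)
    return sum(x[i][i] for i in range(len(sens)))


def cic(loglik, sens, var):
    """Assumed form of the criterion: -2 * loglik + 2 * trace(S^{-1} Sigma)."""
    return -2.0 * loglik + 2.0 * effective_df(sens, var)
```

When the variance matrix equals the sensitivity matrix, as for a Poisson process, `effective_df` returns the parameter count and `cic` coincides with the AIC; inflating the variance matrix, as clustering does, inflates the penalty.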
5 Bayesian information criterion
The motivation of the Bayesian information criterion (BIC) is quite different from the derivation of the AIC and CIC criteria considered in the previous section. A main difference is that there is initially no reference to a Kullback-Leibler distance or asymptotics related to a ‘least false parameter value’. Instead, the true model is considered to be one of the models and the idea is to choose the model that has maximum posterior probability within the specified Bayesian framework. However, the asymptotic concepts again play a role in order to derive asymptotic expansions of the posterior probabilities. Section 5.1 covers the Poisson process case. Section 5.2 proposes a composite likelihood BIC in the case where data is not generated from a Poisson process. Note that in this case, it is still assumed that the true intensity function corresponds to one of the intensity functions for the models .
5.1 BIC in the Poisson process case
The BIC criterion (see e.g. Schwarz (1978); Lebarbier and Mary-Huard (2006)) defines the best model as
(15) |
From Bayes formula,
Letting denote the prior density of given ,
since is the conditional density of given and . We assume that the prior distribution over models is non-informative, so that the BIC criterion defines the best model as
(16) |
In principle, one could evaluate the using numerical quadrature and then determine . However, this is computationally costly and also the need to elicit a specific prior for each model may be a nuisance. Our next result therefore proposes an asymptotic expansion of . The methodology is standard (basically a Laplace approximation) and well-known in the literature, see Tierney and Kadane (1986) or e.g. Lebarbier and Mary-Huard (2006) and the references therein. However, due to our spatial framework and the double asymptotic point of view considered in this paper, the standard results do not apply straightforwardly. Due to the use of asymptotic results we again need to rely on the notions of a true model and the ‘least false parameter value’ .
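The Laplace approximation referred to here can be sketched as follows. The notation is assumed for illustration: $\ell(\beta)$ is the log-likelihood of a model with $p$ parameters, $\pi(\beta)$ the prior density, $\hat\beta$ the maximizer of $\ell$, and $H(\hat\beta) = -\partial^2 \ell(\hat\beta)/\partial\beta\,\partial\beta^\top$. This is the generic textbook form, not the precise expansion established in the paper:

```latex
\log \int \exp\{\ell(\beta)\}\,\pi(\beta)\,\mathrm{d}\beta
\;\approx\;
\ell(\hat\beta) + \log \pi(\hat\beta)
+ \frac{p}{2}\log(2\pi) - \frac{1}{2}\log\det H(\hat\beta).
```

When the entries of $H(\hat\beta)$ grow proportionally to a ‘sample size’ $m$, $\log\det H(\hat\beta) = p\log m + O(1)$, so multiplying by $-2$ and dropping bounded terms leaves the familiar $-2\ell(\hat\beta) + p\log m$.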
We impose the following conditions on the prior for and the mean .
- [C8] The prior density of given is continuously differentiable on .
- [C9] .
Note that is the marginal mean of under the true intensity model as discussed in Section 2. Condition C9 seems reasonable since it would hold if coincided with any of the specified parametric intensity models . We also need to slightly strengthen assumption C1 and replace it by
- [C1′] As , there exists such that .
The following result is verified in Appendix D using Łapiński (2019, Theorem 2), which is a rigorous statement of a multivariate Laplace approximation.
Proposition 2.
The criterion (16) is defined entirely within the specified Bayesian framework, so it requires neither a reference to a true model nor asymptotic results. However, this changes when we derive the expansion (17) for (16). The expansion is around and for technical reasons, when applying the Laplace approximation, convergence of is needed. Therefore we need the assumptions that ensure strong consistency of .
Since and from the strong consistency of , we have that while C8 ensures that is . So, if we neglect terms which are in (17), we follow the standard heuristic and suggest to define a first version of the BIC criterion as
(18) |
where we recall that is the length of .
In practice is not known. However, since it follows that . This justifies defining the criterion in the following natural way
(19) |
5.2 Composite likelihood BIC
Suppose has an intensity function of the form (1) but is not a Poisson process. Then as mentioned in Section 3.1, (3) may be viewed as a composite likelihood score for estimating . In this case, following Gao and Song (2010), (16) may be viewed as a composite likelihood BIC. Again we obtain (19) from (16) by Laplace approximation. However, Gao and Song (2010) suggest replacing by the ‘effective degrees of freedom’ considered in Section 4. Thus our proposed composite likelihood BIC is
which becomes equal to the ordinary BIC in (19) for a Poisson process. From a practical point of view, we simply estimate as in (14).
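A minimal Python sketch of this criterion follows, assuming the form minus twice the maximized (composite) log-likelihood plus the effective degrees of freedom times the logarithm of the observed number of points, with smaller values preferred. The function names and the `fits` dictionary layout are hypothetical; passing the plain parameter count as `effective_df` gives the ordinary BIC under the same assumption.

```python
import math


def cbic(loglik, effective_df, n_points):
    """Assumed form: -2 * loglik + effective_df * log(number of points)."""
    if n_points < 1:
        raise ValueError("need at least one observed point")
    return -2.0 * loglik + effective_df * math.log(n_points)


def best_model(fits, n_points):
    """fits maps a model label to (loglik, effective_df); return the
    label with the smallest criterion value."""
    return min(fits,
               key=lambda label: cbic(fits[label][0], fits[label][1], n_points))
```

For example, with 200 observed points a model with log-likelihood -100 and 2 effective degrees of freedom is preferred over one with log-likelihood -98 and 6, since the gain in fit does not offset the larger penalty.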
6 Simulation study
To evaluate the proposed model selection criteria we conduct two simulation studies with six spatial covariates, of which four have zero effect. The first study in Section 6.1 considers AIC and BIC in the Poisson point process case. The second study in Section 6.2 considers the clustered Thomas point process, where we employ the CIC and CBIC criteria to select the best model and compare with results obtained using AIC and BIC (wrongly assuming that the simulated data are from a Poisson point process).
The covariates are obtained from the BCI dataset (Hubbell and Foster, 1983; Condit et al., 1996; Condit, 1998) which, in addition to locations of around 300 species of trees observed in a 1000 m × 500 m plot, contains a number of spatial covariates. In particular, we center and scale the two topographical covariates (elevation and slope of elevation) and four soil nutrients (aluminium, boron, calcium, and copper). The six covariates are depicted in Figure 1.
Figure 1: The six centered and scaled covariates (elevation, slope of elevation, aluminium, boron, calcium, and copper).
Skipping the dependence on in the notation, we model the intensity function as
(20) |
where are the centered and scaled covariates as in Figure 1 and are the regression coefficients. We consider different settings for and . Note that plays the role of and will be adjusted to obtain desired expected numbers of points in . When the simulation involves an observation window different from , the covariates are simply rescaled to fit .
The model selection criteria are compared in terms of the true positive rate (TPR), false positive rate (FPR), expected Kullback-Leibler divergence (MKL), and mean integrated squared error of the intensity function (MISE). The TPR (resp. FPR) are the expected fractions of informative (resp. non-informative) covariates included in the selected model. For a point process observed on , with intensity where stands for the true parameter estimated by , MKL and MISE are estimated by averaging the following KL and ISE across simulations:
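Separately from the KL and ISE expressions, the TPR and FPR of a single selected model can be computed as in the following Python sketch; averaging over simulations then estimates the expected fractions. The function name and the covariate labels in the usage note are illustrative assumptions.

```python
def tpr_fpr(selected, informative, all_covariates):
    """Fractions of informative and non-informative covariates that
    appear in the selected model."""
    selected = set(selected)
    informative = set(informative)
    noninformative = set(all_covariates) - informative
    if not informative or not noninformative:
        raise ValueError("need at least one covariate of each kind")
    tpr = len(selected & informative) / len(informative)
    fpr = len(selected & noninformative) / len(noninformative)
    return tpr, fpr
```

With six covariates of which two are informative, selecting one informative and one non-informative covariate gives a TPR of 0.5 and an FPR of 0.25.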
6.1 Poisson point process model
We consider two different scenarios to illustrate both types of asymptotics.
- Scenario 1 (infill asymptotics). . We adjust such that equals either or .
- Scenario 2 (increasing domain asymptotics). or and is chosen so that or .
For each scenario and the choices of and , 500 simulations are generated from inhomogeneous Poisson point processes with intensity function given by (20) using the function rpoispp from the spatstat package. We set and to represent moderate effects of elevation and slope, and set . For each simulation, parameters are estimated by maximizing the Berman-Turner approximation (see e.g. Baddeley and Turner, 2000) of the Poisson log-likelihood (3) using a number of quadrature dummy points to approximate the integral in (3). The estimation is done using the ppm function. Then, the model is selected according to the AIC and BIC-type criteria. We consider different variants of the BIC criterion, namely
where represents a penalty. Note that (omitting dependence on ) and that () corresponds to the criterion used by Thurman et al. (2015) and is also the criterion suggested by the present paper. We also consider used by Choiruddin et al. (2018) and considered by Jeffrey et al. (2018).
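The three notions of ‘sample size’ compared here can be contrasted with a short Python sketch. It assumes that each variant multiplies the number of parameters by the logarithm of, respectively, the observed number of points, the window volume, and the number of points plus the number of dummy points; the variant labels and the function name are hypothetical.

```python
import math


def log_sample_size(variant, n_points, window_area=None, n_dummy=None):
    """Per-parameter BIC penalty under three definitions of sample size."""
    if variant == "N":
        size = n_points
    elif variant == "|W|":
        if window_area is None:
            raise ValueError("window_area is required for variant '|W|'")
        size = window_area
    elif variant == "N+nd":
        if n_dummy is None:
            raise ValueError("n_dummy is required for variant 'N+nd'")
        size = n_points + n_dummy
    else:
        raise ValueError("unknown variant: " + repr(variant))
    if size <= 0:
        raise ValueError("sample size must be positive")
    return math.log(size)
```

On a unit window the volume-based penalty is exactly zero, and it changes sign with the length unit (a window of volume 0.5 in one unit may have volume 500000 in another), whereas the count-based penalty is unaffected by rescaling.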
Measure | AIC | BIC | BIC | BIC | |
---|---|---|---|---|---|
TPR | 70 | 59 | 49 | 100 | |
FPR | 16 | 5 | 1 | 100 | |
MISE | 6.5 | 6.0 | 6.0 | 7.9 | |
MKL | 3.0 | 2.9 | 2.9 | 3.6 | |
TPR | 94 | 83 | 63 | 100 | |
FPR | 17 | 2 | 0 | 100 | |
MISE | 2.6 | 2.3 | 3.3 | 3.2 | |
MKL | 2.9 | 2.8 | 4.2 | 3.5 | |
TPR | 95 | 83 | 63 | 62 | |
FPR | 16 | 2 | 0 | 0 | |
MISE | 1.0 | 0.9 | 1.3 | 1.3 | |
MKL | 2.8 | 2.7 | 4.1 | 4.2 | |
TPR | 100 | 99 | 99 | 98 | |
FPR | 18 | 1 | 0 | 0 | |
MISE | 10.4 | 6.5 | 6.7 | 7.0 | |
MKL | 2.9 | 1.8 | 1.9 | 2.0 |
For both scenarios, we perform estimation with (the rule of thumb suggested by spatstat) and model selection with the criteria , , and . Similar results are obtained with and and we omit the results for the latter. We also perform estimation and selection with and only report results for since the results with the criteria , and are similar to those obtained when the estimation is performed with .
When is small, especially when , the criterion obviously fails as it selects the most complex model regardless of the value of . The FPR thereby becomes 100%. In addition, as indicated in the second and third rows of Table 1, where the point patterns have the same average number of points, it is worth noticing that selects quite different models. In particular, this criterion has an undesirably strong dependence on the choice of length unit.
The criterion with a large also fails since the TPR with this criterion and is very small compared to the other criteria, especially when the expected number of points is small or moderate. The AIC criterion achieves a high TPR in all situations but suffers from a high FPR, even in Scenario 2 where .
In all cases, provides the best trade-off between TPR and FPR and the results improve when or is increased. The minimal values of MKL and MISE are further always obtained with . In case of a Poisson process, we therefore recommend (simply denoted BIC in the following).
6.2 Thomas point process model
To generate a simulation from a Thomas point process with intensity (20), we first generate a parent point pattern from a stationary Poisson point process with intensity . Given , clusters , , are generated from inhomogeneous Poisson point processes with intensity functions
where is the density for . Finally, is an inhomogeneous Thomas point process with intensity (20). The regression parameters are set as follows: , , . We consider and two scale parameters and . A lower value for tends to produce more clustered patterns. We consider the observation domains and with adjusted to give expected numbers of points and for the two windows. The chosen value of implies on average 50 parent points on and 200 parent points on .
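The construction can be sketched in pure Python as follows. The sketch swaps in an equivalent two-step recipe rather than sampling the inhomogeneous clusters directly: each parent receives a Poisson number of Gaussian-displaced offspring with mean `rho_max / kappa`, and each offspring is kept with probability `rho(x, y) / rho_max`. All names are hypothetical, `rho` must not exceed `rho_max` on the window, and extending the parent window by four standard deviations is an ad hoc way of limiting edge effects.

```python
import math
import random


def poisson_draw(mean, rng):
    """Poisson variate by Knuth's product method, applied in chunks
    so that exp(-chunk) never underflows."""
    total = 0
    while mean > 0:
        chunk = min(mean, 30.0)
        limit = math.exp(-chunk)
        k, prod = 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        total += k
        mean -= chunk
    return total


def simulate_thomas(rho, rho_max, kappa, omega, width, height, rng):
    """Simulate an inhomogeneous Thomas process on [0, width] x [0, height].

    rho     : target intensity function rho(x, y), bounded by rho_max
    kappa   : intensity of the stationary Poisson parent process
    omega   : standard deviation of the Gaussian offspring displacement
    """
    if rho_max <= 0 or kappa <= 0 or omega <= 0:
        raise ValueError("rho_max, kappa and omega must be positive")
    pad = 4.0 * omega  # parents just outside the window still contribute
    area_ext = (width + 2 * pad) * (height + 2 * pad)
    mean_offspring = rho_max / kappa
    points = []
    for _ in range(poisson_draw(kappa * area_ext, rng)):
        cx = rng.uniform(-pad, width + pad)
        cy = rng.uniform(-pad, height + pad)
        for _ in range(poisson_draw(mean_offspring, rng)):
            x, y = rng.gauss(cx, omega), rng.gauss(cy, omega)
            inside = 0.0 <= x <= width and 0.0 <= y <= height
            # independent thinning turns the constant level rho_max into rho
            if inside and rng.random() < rho(x, y) / rho_max:
                points.append((x, y))
    return points
```

Passing a seeded `random.Random` instance makes a simulation reproducible, which is convenient when the same patterns are to be analysed under several criteria.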
Measure | AIC | BIC | CIC | CBIC | |
---|---|---|---|---|---|
TPR | 98 | 96 | 88 | 81 | |
FPR | 81 | 65 | 24 | 17 | |
MISE | 4.5 | 4.4 | 2.7 | 2.5 | |
MKL | 9.7 | 9.6 | 7.5 | 9.1 | |
Mean | - | - | 103.6 | 76.6 | |
SD | - | - | 122.7 | 70.1 | |
TPR | 100 | 100 | 97 | 94 | |
FPR | 81 | 65 | 20 | 11 | |
MISE | 3.9 | 3.8 | 2.6 | 2.4 | |
MKL | 10.3 | 10.2 | 7.9 | 8.2 | |
Mean | - | - | 101.1 | 81.3 | |
SD | - | - | 104.6 | 70.2 | |
TPR | 100 | 99 | 95 | 93 | |
FPR | 70 | 49 | 43 | 32 | |
MISE | 2.1 | 2.0 | 1.8 | 1.8 | |
MKL | 5.2 | 5.0 | 4.9 | 5.4 | |
Mean | - | - | 48.9 | 40.9 | |
SD | - | - | 77.3 | 61.3 | |
TPR | 100 | 100 | 96 | 94 | |
FPR | 78 | 57 | 36 | 24 | |
MISE | 2.8 | 2.8 | 2.3 | 2.5 | |
MKL | 7.7 | 7.5 | 7.3 | 9.3 | |
Mean | - | - | 96.1 | 78.1 | |
SD | - | - | 172.1 | 124.6 |
The R function rThomas is used to simulate the point patterns and the function kppm to estimate the regression parameters. We use dummy points for the different integral approximations. Models are then selected using four criteria: AIC, BIC, CIC and CBIC. When using AIC or BIC we implicitly assume wrongly that the simulated patterns come from a Poisson point process and thus we set . For the composite likelihood type criteria CIC and CBIC, we compute using the R function vcov.kppm. The parameters and are estimated using minimum contrast estimation with the tuning parameter (Waagepetersen and Guan, 2009) set to 20 when and to 50 when .
Results are reported in Table 2. The AIC and BIC criteria ignore the second-order structure of the simulated point patterns and we observe that overall AIC and BIC produce high TPR but also very high FPR. The CIC and CBIC give much more reasonable trade-offs between TPR and FPR and also (with one exception) give smaller MKL and MISE than AIC and BIC.
Focusing on CIC and CBIC, we first notice that the means of the estimates of are high. This is because the considered clustered point processes are far from Poisson models. We also notice that even when the number of points is quite large, the estimation of is inaccurate, with standard deviations of the estimates of the same order as the means. Nevertheless, the resulting CIC and CBIC give reasonable results. Comparing CIC and CBIC, CIC in general has a higher FPR than CBIC but on the other hand always gives the smallest MKL. The TPR and MISE are quite similar for CIC and CBIC. Hence in the case of the clustered point process considered here, CIC and CBIC clearly outperform AIC and BIC but there is no clear winner between CIC and CBIC.
7 Discussion
In this paper we establish a theoretical foundation for various model selection criteria for inhomogeneous point processes under various asymptotic settings. In the case of a Poisson process, a main contribution is to identify, in relation to BIC, the correct interpretation of ‘sample size’: based on our theoretical derivation, it is the expected number of points, which in practice is estimated by the observed number of points. This interpretation is supported by our simulation study, which also supports the common understanding that BIC may be preferable to AIC, which tends to pick overly complex models.
More generally, for selecting a regression model for the intensity function of a general point process we develop composite model selection criteria, CIC and CBIC, that clearly outperform AIC and BIC in the simulation study for a clustered point process. One issue regarding CIC and CBIC is to estimate the bias correction for the estimate of the composite Kullback-Leibler divergence, which depends on the unknown true intensity function and pair correlation function. Here, inspired by the approach underlying AIC, for a given model we simply plug in the fitted intensity function and pair correlation function for the model in question. This is computationally convenient and we leave it as an open problem to develop more precise estimates.
Further interesting topics for future research would be to study the theoretical foundation of criteria for selecting models for the conditional intensity of a Gibbs process fitted by pseudo-likelihood. Another interesting problem is selection of the penalization parameter when the intensity function is estimated using regularization methods like the lasso (see e.g. Thurman et al., 2015; Choiruddin et al., 2018). This is not covered by our theoretical results which rely on asymptotic results for unbiased estimating functions or Bayesian considerations.
Acknowledgments
The research of J.-F. Coeurjolly is supported by the Natural Sciences and Engineering Research Council. Rasmus Waagepetersen is supported by The Danish Council for Independent Research | Natural Sciences, grant DFF - 7014-00074 "Statistics for point processes in space and beyond", and by the Centre for Stochastic Geometry and Advanced Bioimaging, funded by grant 8721 from the Villum Foundation.
Appendix A Proof of Theorem 1
Proof.
(i) Using a zero order Taylor expansion of (which equals zero by definition) around expressed with integral remainder term, we have
By rearranging the right-hand side of the latter equation and using Fubini’s theorem, we have
Now, using C2-C3, we can apply a mean value theorem for multiple integrals: there exists such that
(21) |
Since almost surely by condition C2, we deduce (10) from (21) using a little algebra. Let
By conditions C1-C4, it is clear that almost surely as . Using the continuity of and again condition C2, we deduce that tends to 0 almost surely.
(ii) The proof consists in applying a modified version of Waagepetersen and Guan (2009, Theorem 2) to the sequence of estimating functions . The modification is given in Appendix E and is needed to handle a sequence of ‘least false’ parameter values instead of a unique ‘true’ value . Thus, we need to prove the assumptions G1-G4 in Appendix E. We leave it to the reader to check that conditions C3 and C5 imply conditions G1 and G2 with and .
To prove assumption G3, we have to prove that as ,
where . Consider a such that . Under conditions C1-C3, there exist such that
(22) |
To verify condition G4, it is sufficient to note that condition C6 implies that the variance of is bounded. Letting denote the (unique for large enough) solution of or, equivalently,
we can then conclude that is a root- consistent estimator of , that is, is bounded in probability, which proves (11).
(iii) We use a Taylor expansion around : there exists and such that
where and where . We now show that . Let and be given as above and let . Using (22) we obtain
Now from (ii), for any , by choosing large enough, for sufficiently large whereby for sufficiently large.
Appendix B Conditions C4 and C7 for the inhomogeneous Poisson cluster point process
In this section, we show that the inhomogeneous Poisson cluster point process presented at the end of Section 3.2 satisfies C4 and C7. Recall where .
Proof.
For , let be independent inhomogeneous Poisson point processes with intensity . By the property of any Poisson point process, has the same distribution as . Define . Then,
Applying the Slivnyak-Mecke theorem twice (see e.g. Theorem 3.1 in Møller and Waagepetersen, 2004),
where . Hence,
Moreover, by (8),
so that with . Since C6 is satisfied as shown in Example 1 it follows that the elements of are whereby (elementwise). Therefore, we can apply Kolmogorov’s strong law of large numbers to establish that tends almost surely to 0. Using the same conditions and C5, we can also apply the Lindeberg-Feller theorem and obtain that as , in distribution. ∎
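The proof above represents a process as a union of independent copies, which rests on the superposition property of Poisson processes: the union of n independent Poisson processes with a common intensity function is again Poisson, with n times that intensity. The sketch below checks this numerically through the point counts, whose mean and variance should both equal the integrated intensity. The intensity function, the unit-square window and all names are assumptions chosen for illustration only.

```python
import numpy as np

def intensity(x, y, scale=1.0):
    """Hypothetical intensity on the unit square, bounded above by 40 * scale."""
    return scale * 40.0 * np.exp(-x)

def simulate_poisson(rng, scale=1.0):
    """Simulate a Poisson process on [0, 1]^2 with the intensity above, by thinning."""
    bound = 40.0 * scale
    n_prop = rng.poisson(bound)                       # homogeneous proposals
    pts = rng.uniform(size=(n_prop, 2))
    keep = rng.uniform(size=n_prop) * bound < intensity(pts[:, 0], pts[:, 1], scale)
    return pts[keep]

rng = np.random.default_rng(0)
n, reps = 5, 4000

# Number of points in the union of n independent copies with the base intensity.
counts_union = np.array([
    sum(len(simulate_poisson(rng)) for _ in range(n)) for _ in range(reps)
])
# Number of points in a single process with n times the base intensity.
counts_single = np.array([len(simulate_poisson(rng, scale=n)) for _ in range(reps)])

# Integral of n times the base intensity over the unit square.
expected = n * 40.0 * (1.0 - np.exp(-1.0))

print(expected, counts_union.mean(), counts_union.var())
print(expected, counts_single.mean(), counts_single.var())
```

Writing the process as a sum of n independent and identically distributed pieces is what makes a strong law of large numbers and a central limit theorem directly applicable.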
Appendix C Proof of Proposition 1
For a multi-index with cardinality and a times differentiable function we define for ,
and we also use the notation .
Proof.
We follow the sketch of the proof of Varin and Vidoni (2005, Lemmas 1-2). Let be an independent copy of . Let and be the composite likelihood and its corresponding estimating equation evaluated at . We also recall the notation , , and . We have
Using a first order Taylor expansion of around with integral remainder term, we have
where and where the integral remainder term can be expressed as
It is worth noticing that for any and any such that , is a deterministic function of . Therefore does not depend on . Using this and the unbiasedness of we have
Hence,
(23)
Now, using a first order Taylor expansion of around , we have
(24)
Combining (23) and the expectation of (24) we obtain
(25)
Since by definition of , where for some we continue with
(26)
where . Since is bounded in probability by C6 and converges in probability since converges to zero in probability, we obtain converges in probability. This combined with uniform integrability (13) implies . ∎
Appendix D Proof of Proposition 2
Proof.
We recall the notation .
(27)
Now, we are in the situation where we can apply Łapiński (2019, Theorem 2), which gives rigorous conditions under which a multivariate Laplace approximation holds. First, condition C8 ensures that is regular enough. Second, condition C5 ensures that has a nonsingular Hessian matrix for any . Third, for with , we have for any (since )
Therefore, under condition C2, is uniformly bounded for in any compact subset of .
The fact that is sufficient to ensure Łapiński (2019, condition (5)). Finally, by the strong consistency of to 0 and from condition C1′, the last condition to check is formulated as follows: we need to verify that there exists such that for , has a unique maximum in a closed ball for some (where is given by condition C1′) and such that
(28)
By condition C5, for any , (and thus ) has indeed a unique maximum. Letting there exists such that almost surely . Consider with such and . Note that for large and any , almost surely.
Using a first order Taylor expansion around and using the fact that , we have
where the remainder term is
where . Consider the set given by C5. Since for large , and since is nonnegative and has a unique minimum at zero, we have by condition C5 that
for some . Therefore,
Again, by definition of , the strong consistency of and condition C2, we conclude that for large there exists such that whereby we deduce that .
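As a generic illustration of the Laplace approximation invoked in this proof, the sketch below compares a numerically computed log-integral of exp(-n h(theta)) with its Laplace approximation, -n h(theta_min) + (p/2) log(2 pi) - (p/2) log n - (1/2) log det H, where H is the Hessian of h at its minimizer. The test function h(theta) = cosh(theta) - 1 (so p = 1, minimizer 0, H = 1) and all names are assumptions for illustration; the sketch does not check the conditions of Łapiński (2019).

```python
import numpy as np

def log_integral_numeric(n, grid):
    """Log of the integral of exp(-n * h(theta)), h = cosh - 1, by the trapezoid rule."""
    y = np.exp(-n * (np.cosh(grid) - 1.0))
    step = grid[1] - grid[0]
    return np.log(step * (y.sum() - 0.5 * (y[0] + y[-1])))

def log_integral_laplace(n, p=1, hess_det=1.0, h_min=0.0):
    """Laplace approximation to the same log-integral."""
    return (-n * h_min + 0.5 * p * np.log(2.0 * np.pi)
            - 0.5 * p * np.log(n) - 0.5 * np.log(hess_det))

grid = np.linspace(-4.0, 4.0, 400001)
errors = {n: log_integral_numeric(n, grid) - log_integral_laplace(n)
          for n in (10, 100, 1000)}

for n, err in errors.items():
    print(n, err)   # the error shrinks as n grows
```

The term -(p/2) log n is the source of the familiar dimension penalty in a Bayesian information criterion once the approximation is applied to a marginal likelihood; which quantity takes the place of n in the point process setting is precisely the question the paper addresses.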
Appendix E Modified version of Theorem 2 in Waagepetersen and Guan (2009)
Consider a sequence of estimating functions , whose distribution is determined by some underlying probability measure generating the data at hand. For a matrix , , and we let , assuming that is differentiable.
Theorem 2.
Assume that there exists a sequence of invertible symmetric matrices and a sequence of parameter values such that
G1 .
G2 There exists an so that tends to zero where
G3 For any ,
in probability under .
G4 The sequence is bounded in probability (i.e. for each there exists a so that for sufficiently large).
Then for each , there exists a such that
(29)
whenever is sufficiently large.
Proof.
The event
occurs if for all with since this implies for some (Lemma 2 in Aitchison and Silvey, 1958). Hence we need to show that there is a such that
for sufficiently large . To this end we write
where . Then
The first term can be made arbitrarily small by picking a sufficiently large and letting tend to infinity. The second term converges to zero as tends to infinity. ∎
References
- Aitchison and Silvey [1958] J. Aitchison and S. D. Silvey. Maximum-likelihood estimation of parameters subject to restraints. The Annals of Mathematical Statistics, 29:813–825, 1958.
- Akaike [1973] H. Akaike. Information Theory and an Extension of the Maximum Likelihood Principle, pages 199–213. Springer New York, New York, NY, 1973.
- Baddeley and Turner [2000] A. Baddeley and R. Turner. Practical maximum pseudolikelihood for spatial point patterns. Australian & New Zealand Journal of Statistics, 42(3):283–322, 2000.
- Baddeley et al. [2015] A. Baddeley, E. Rubak, and R. Turner. Spatial Point Patterns: Methodology and Applications with R. CRC Press, 2015.
- Berman and Turner [1992] M. Berman and R. Turner. Approximating point process likelihoods with GLIM. Applied Statistics, 41:31–38, 1992.
- Choiruddin et al. [2018] A. Choiruddin, J.-F. Coeurjolly, and F. Letué. Convex and non-convex regularization methods for spatial point processes intensity estimation. Electronic Journal of Statistics, 12:1210–1255, 2018.
- Claeskens and Hjort [2008] G. Claeskens and N.L. Hjort. Model Selection and Model Averaging. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2008.
- Coeurjolly and Lavancier [2019] J.-F. Coeurjolly and F. Lavancier. Understanding spatial point patterns through intensity and conditional intensities. In Stochastic Geometry, number 2237 in Lecture Notes in Mathematics, chapter 2. Springer Verlag, 2019.
- Condit [1998] R. Condit. Tropical Forest Census Plots. Springer-Verlag and R. G. Landes Company, Berlin, Germany and Georgetown, Texas, 1998.
- Condit et al. [1996] R. Condit, S. P. Hubbell, and R. B. Foster. Changes in tree species abundance in a neotropical forest: impact of climate change. Journal of Tropical Ecology, 12(2):231–256, 1996.
- Gao and Song [2010] X. Gao and P. X.-K. Song. Composite likelihood Bayesian information criteria for model selection in high-dimensional data. Journal of the American Statistical Association, 105:1531–1540, 2010.
- Guan and Loh [2007] Y. Guan and J. M. Loh. A thinned block bootstrap procedure for modeling inhomogeneous spatial point patterns. Journal of the American Statistical Association, 102:1377–1386, 2007.
- Hubbell and Foster [1983] S. P. Hubbell and R. B. Foster. Diversity of canopy trees in a neotropical forest and implications for conservation. In S. L. Sutton, T. C. Whitmore, and A. C. Chadwick, editors, Tropical Rain Forest: Ecology and Management, pages 25–41. Blackwell Scientific Publications, Oxford, 1983.
- Jeffrey et al. [2018] D. Jeffrey, J. Horrocks, and G. J. Umphrey. Penalized composite likelihoods for inhomogeneous Gibbs point process models. Computational Statistics & Data Analysis, 124:104–116, 2018.
- Łapiński [2019] T. M. Łapiński. Multivariate Laplace’s approximation with estimated error and application to limit theorems. Journal of Approximation Theory, 248:105305, 2019.
- Lavancier et al. [2020] F. Lavancier, A. Poinas, and R. Waagepetersen. Adaptive estimating function inference for non-stationary determinantal point processes. Scandinavian Journal of Statistics, 2020. Appeared online.
- Lebarbier and Mary-Huard [2006] E. Lebarbier and T. Mary-Huard. Une introduction au critère BIC: fondements théoriques et interprétation. Journal de la Société Française de Statistique, 147(1):39–58, 2006.
- Møller and Waagepetersen [2004] J. Møller and R. P. Waagepetersen. Statistical Inference and Simulation for Spatial Point Processes. Chapman and Hall/CRC, Boca Raton, 2004.
- Møller and Waagepetersen [2017] J. Møller and R. P. Waagepetersen. Some recent developments in statistics for spatial point processes. Annual Review of Statistics and Its Application, 4:317–342, 2017.
- Rathbun and Cressie [1994] S. L. Rathbun and N. Cressie. Asymptotic properties of estimators for the parameters of spatial inhomogeneous Poisson point processes. Advances in Applied Probability, 26:122–154, 1994.
- Schoenberg [2005] F.P. Schoenberg. Consistent parametric estimation of the intensity of a spatial-temporal point process. Journal of Statistical Planning and Inference, 128:79–93, 2005.
- Schwarz [1978] G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6(2):461–464, 1978.
- Thurman et al. [2015] A. L. Thurman, R. Fu, Y. Guan, and J. Zhu. Regularized estimating equations for model selection of clustered spatial point processes. Statistica Sinica, 25:173–188, 2015.
- Tierney and Kadane [1986] L. Tierney and J.B. Kadane. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association, 81:82–86, 1986.
- Varin and Vidoni [2005] C. Varin and P. Vidoni. A note on composite likelihood inference and model selection. Biometrika, 92(3):519–528, 2005.
- Waagepetersen [2007] R. Waagepetersen. An estimating function approach to inference for inhomogeneous Neyman-Scott processes. Biometrics, 63:252–258, 2007.
- Waagepetersen and Guan [2009] R. Waagepetersen and Y. Guan. Two-step estimation for inhomogeneous spatial point processes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71:685–702, 2009.