
What is the Fourier Transform of a Spatial Point Process?

Tuomas Rajala, Sofia C. Olhede, Jake P. Grainger and David J. Murrell. T. Rajala is with Natural Resources Institute Finland, 00790 Helsinki, Finland. S. C. Olhede and J. P. Grainger are with the Institute of Mathematics, EPFL, Lausanne, Switzerland. D. J. Murrell is with the Research Department of Genetics, Evolution and Environment, Centre for Biodiversity and Environment Research, University College London, UK.
Abstract

This paper determines how to define a discretely implemented Fourier transform when analysing an observed spatial point process. To develop this transform we answer four questions: first, what is the natural definition of a Fourier transform, and what are its spectral moments; second, we calculate fourth-order moments of the Fourier transform using Campbell's theorem; third, we determine how to implement tapering, an important component of the spectral analysis of other stochastic processes; fourth, we answer the question of how to produce an isotropic representation of the Fourier transform of the process. This determines the basic spectral properties of an observed spatial point process.

Keywords:

Spectral density function, Spatial point processes, Debiased periodogram, Tapering.

1 Introduction

Spatial point processes are an important form of observational data structure, for example in forest ecology [77], communications networks [57, 49], epidemiology [31], social science [72], pharmacology [35] and medicine [2] amongst many other application fields. Understanding the properties of a point process can be approached from many different perspectives [24], and the aim of this paper is to determine how to extract the frequency (or scale/wavenumber) behaviour of an observed point process, as well as connect that to its spectral representation.

The most common assumption used when analysing spatial processes is that of spatial homogeneity (or stationarity). This means that if you shift all of the observations by a fixed spatial shift, the distribution of those observations does not change from the distribution of the original sample; and if the observational window is changed, then understanding the full set of observations remains tractable. The consequence of this probabilistic invariance in distribution is the spectral representation of a stochastic process. The existence of the spectral representation of a stochastic process means the Fourier transform fully characterises the second order properties of that stochastic process. The Fourier transform also characterises the second order properties of a point process, see e.g. [20]. Yet unlike random fields and time series, spectral analysis of point processes is still in its infancy, see also [5, 6, 56], and critically, the digital processing of a point process remains fully outstanding. Recent interest in Fourier features in machine learning based approaches for point patterns, such as [38, 44], shows the potential of using Fourier based information as features for estimation and detection. The work in this manuscript both establishes what Fourier features to calculate for homogeneous processes from a sampling perspective, and their large but finite sampling area properties, just like [33] determined the large but finite properties of Fourier representations for random fields. We note that machine learning techniques [73, 48] utilised Fourier features in learning algorithms for parametric models before precise and general understanding was established.

Thus despite inspirational work by the aforementioned authors, several key aspects of discrete spectral analysis lie unresolved when applied to observed point processes. In particular, 1) given the Fast Fourier Transform is unavailable (unlike the case for regularly sampled time series or random fields), how do we efficiently implement the calculation of whichever discrete Fourier transform we choose to define? 2) How does that discrete transform relate to the underlying spectral measure of the process? 3) Can the discrete transform be improved by linear operations such as tapering, as is the case for other stochastic processes? 4) If yes, how do we then select a taper? 5) How do we define a radial or isotropic transformation to simplify the representation of the spatial point process? There are many, many more questions to answer before bringing the spectral analysis of spatial point processes to the sophistication of the analysis of random fields, but these currently unresolved questions are positioned as the first hurdles to overcome in the path of whoever wishes to develop more sophisticated spectral analysis techniques. Once the fundamental properties of the spectral representation of a stationary point process have been developed and understood, results could be extended to inhomogeneous point processes, but that is not the aim of this work.

To put this in the context of information theory: understanding how to compute the Fourier transform from a spatially compact signal had already sparked a lengthy debate from the first introduction of tapering [70], and selection of optimal tapers using localization operators [67, 66, 17]. While some aspects of tapering in the context of point patterns are reflected by topics in mature (multi)dimensional tapering and spectral estimation [1, 45, 46, 33], understanding how to adapt these ideas to spatial point processes remains fully outstanding, and must be answered using our understanding of irregularly sampled processes [11]. Understanding and characterising point patterns is naturally a problem of general interest, e.g. [12, 13].

Why then do we want to understand the spectral information of a point pattern? The spectral information of any stochastic process is the same as the spatial information, as there is a bijection between the two, but the former directly shows the scale of variation of the process. To illustrate, let us simulate a complex point pattern and compare its spectral content with its usual spatial representation. Log-Gaussian Cox processes [55], popularly used in applications, are marginally stationary processes whose patterns are more variable than models that do not depend on latent random variables. The pattern (a single realization), and its estimated spectrum (the subject of this paper), are shown in Figure 1. We have chosen a process which is isotropic, so that its usual spatial representation and spectrum are isotropic also, and thus easier to interpret. In the left hand subplot we see the pattern itself, in the middle subplot we show a common spatial summary, the pair correlation function (pcf), comparing both the estimate and truth, and in the right hand subplot we show the spectrum on a decibel scale ($10\log_{10}$), both estimated and true. While the raw pattern (left) looks mainly unremarkable, its periodicity is naturally present in the pcf (middle), but is more obvious in the spectrum (right). We can even estimate the periodicity as having frequency 1. This is immediately recoverable directly from the estimated spectrum.

Figure 1: A realisation of a log-Gaussian Cox process (left), the estimated against true product density (middle), and estimated against true spectral density function on a decibel scale (right).

To get to the point where we can estimate the spectrum of a point pattern, we need to answer the five questions posed earlier. We will therefore start by determining how to compute Fourier transforms of point processes, discussing the question of how to form direct spectral estimators [58] in this setting. This will be answered in terms of the mean and (co)-variance of the different discrete Fourier transforms that we define. The first surprising result is that the natural direct spectral estimator is biased unless it is mean corrected (this bias was noted by [23], but only corrected in an ad-hoc fashion). We show how to perfectly eliminate this bias.

Second, to understand the variance and covariance of a direct spectral estimator, we need to calculate a fourth order moment of our chosen discrete spectral transform. This is complicated since Campbell's formula is required to derive its form for a point process. This approach can be contrasted with using Isserlis' theorem to determine higher order moments for a Gaussian process [40]. Calculating covariances between direct spectral estimators gives us a way to determine on what grid of wavenumbers the spectral estimator is uncorrelated, and thus where to evaluate it.

Third, tapering [70] is required to define the direct spectral estimator and avoid leakage; but how to taper a point pattern is an open question. Most tapers are defined for regularly spaced stochastic processes; but there are some continuous tapers. We choose to use the continuous tapers of Riedel and Sidorenko [60] to construct the direct spectral estimator, and use multitapers to reduce the variance of such estimators. We will determine how to implement both.

Fourth, many spatial models are radially symmetric, or isotropic. This prompts us to describe how to construct an isotropic representation of the spectral content of a point process. There are two possible approaches, namely using the Bessel function [23] to do the transformation, or isotropizing a general spectral representation [25]. We describe how to define the appropriate tapering in this instance, inspired by other isotropic harmonic decompositions [41].

Calculating the Fourier transform of a point process is useful beyond the point pattern itself. If we sample a stochastic process with a point process, then the spectrum of the point process is just as important as that of the stochastic process in determining the spectrum of the observations, e.g. [50]. For this reason, the study of the frequency information of an observed process is of interest in its own right, and beyond, and we will discuss the implications that arise for point processes.

In Section 2 we define the basic concepts required by the second order representation of a point pattern. In Section 3 we discuss tapering of the DFT and the direct moments of the DFT. In Section 4 we determine how to form a spectral density estimator from our understanding of how to form the DFT and its mean and variance. The next step is to study the covariance structure of such spectral density estimators, see Section 5. We then use that understanding to define how to do linear smoothing in Section 6. For multidimensional spectral representations, the full anisotropic structure can be hard to interpret; we therefore propose 1d isotropic or isotropised summaries to characterise such processes, in Section 7. We then present a simulation study in Section 8, and demonstrate the potential of the proposed methodology in an example from forest ecology in Section 9. We conclude with a discussion in Section 10.

2 Notation and Definitions

In this Section we shall give the basic notation necessary for the spectral analysis of a spatially homogeneous point process. We assume that we observe a spatially homogeneous point process $X$ with intensity $\lambda=\rho^{(1)}>0$ and with $\rho^{(2)}(z)$ defined as the (second order) product density of $X$. Assuming $X$ is spatially homogeneous means that $\rho^{(2)}(z)$ only has one argument ($z\in{\mathbb{R}}^{d}$, namely the spatial shift) rather than depending on two local spatial variables (say $x$ and $y$ rather than just $z=x-y$). We assume that $X$ is a simple point process on ${\mathbb{R}}^{d}$, meaning no duplicate points are allowed. We take $d=2,3,\dots$, and so exclude the case of $d=1$, which is relatively well-studied, see for example the references in [19, 43, 75]. For a (Borel) set $A$ in ${\mathbb{R}}^{d}$, we write $|A|$ for the volume of $A$ and $N(A)=|X\cap A|$ for the number of points of $X\subset{\mathbb{R}}^{d}$ in $A$. For a set $A$ and $z\in\mathbb{R}^{d}$, denote the $z$-shifted set by $A_{z}=\{x+z:x\in A\}$. We denote by $B$ any subset of ${\mathbb{R}}^{d}$ where points are observed (equivalently, our observation domain). For any complex number $z$ we use the superscript $z^{\ast}$ to denote the conjugate of $z$.

In complete analogue with a random field we shall define the spectral density of $X$ as the Fourier transform of the complete covariance function of $X$. This is the standard approach, and was first proposed by [6]. The complete covariance function of a stationary point process is therefore

\gamma(z)\equiv\lambda\delta(z)+\rho^{(2)}(z)-\lambda^{2},\quad z\in{\mathbb{R}^{d}}, (1)

where $\delta(\cdot)$ is the Dirac delta function. The spectral density function (sdf) [6] of the point process $X$ is then defined as the Fourier transform of the complete covariance function:

f({k})\equiv\mathcal{F}[\gamma]({k})=\int_{{\mathbb{R}}^{d}}e^{-2\pi i{k}\cdot z}\gamma(z)\,dz=\lambda+\int_{{\mathbb{R}^{d}}}e^{-2\pi i{k}\cdot z}[\rho^{(2)}(z)-\lambda^{2}]\,dz,\qquad{k}\in{\mathbb{R}}^{d}. (2)

This representation is in direct analogy with the corresponding spectral decomposition of a random field or a time series. The symbol $\mathcal{F}$ denotes the Fourier transform, with $i$ as the imaginary unit. We refer to the argument $k$ as the "wavenumber" rather than frequency to acknowledge its multi-dimensionality. This is a natural choice of Fourier transform in analogy to the analysis of random fields [68]. To avoid additional constants in the inverse Fourier transforms, we parameterise the Fourier transform with wavenumber instead of the customary angular frequency $\omega=2\pi k$ used by [6] and some others.

For a time series or random field there are a number of spectral results, ranging from the spectral representation theorem (I) [58, p130] to Bochner's (Herglotz) theorem (II) [58]. Neither result is exactly available in the point process setting, but for a time series $X_{t}$ we can note them:

X_{t}\overset{(I)}{=}\mu+\int_{-\frac{1}{2}}^{\frac{1}{2}}e^{2\pi i{k}t}\,dZ({k}),\quad{\mathrm{\bf Var}}\{dZ({k})\}\overset{(II)}{=}f({k})\,d{k}, (3)

\gamma(\tau)\overset{(III)}{=}\int_{-\frac{1}{2}}^{\frac{1}{2}}f({k})e^{2\pi i{k}\tau}\,d{k},\quad f({k})\overset{(IV)}{=}\sum_{\tau}\gamma(\tau)e^{-2\pi i{k}\tau}. (4)

Equation (3) decomposes $X_{t}$ into random contributions associated with each frequency $k$. The covariance in Eqn (4) is also decomposed into weighted frequency contributions. Both of these observations are important for the interpretation of the Fourier representation of $X_{t}$. "Important" contributions correspond to $f(k)$ being considerably larger relative to other $f(k')$, as in that scenario ${\mathrm{\bf Var}}\{dZ(k)\}$ is bigger than ${\mathrm{\bf Var}}\{dZ(k')\}$, and therefore $|dZ(k)|$ is likely to be larger than $|dZ(k')|$. Spectral representations of point processes are also discussed in [20, Ch. 8]. We can yet again decompose the (complete) covariance in terms of a spectral representation as in (II), and it takes the form, in terms of a $d$-dimensional Fourier transform, of

f({k})=\int e^{-2\pi i{k}\cdot z}\gamma(z)\,dz=\lambda+\int e^{-2\pi i{k}\cdot z}\left\{\rho^{(2)}(z)-\lambda^{2}\right\}\,dz. (5)

We refer to $f(k)$ as the spectrum of $X$. We note that the Fourier transform can be inverted to yield the equality of (an analogy of III):

\rho^{(2)}(z)=\lambda^{2}+\int_{{\mathbb{R}}^{d}}f({k})e^{2\pi i{k}\cdot z}\,d{k}. (6)

We assume the spectrum is the primary object of interest in our study of point patterns, as the covariance $\gamma(z)$ can be fully determined from it, and the covariance characterises the point process. We see directly from (5) that $\rho^{(2)}(z)-\lambda^{2}$, and not the complete covariance $\gamma(z)$, is playing the role of a time series covariance. From a nonparametric perspective the spectrum characterises which wavenumbers are more notable (distinct) in the process. As (5) assigns a constant level $\lambda$ to all wavenumbers, the notable (distinct) wavenumbers are determined from the Fourier transform of $\rho^{(2)}(z)-\lambda^{2}$.

We have to exercise some caution when interpreting the Fourier transform as a bijective transform. Yes, it should contain the same information about scale, but the meaning of the word "scale" will be different from a simple spatial understanding. In time or space the notion is associated with the support of $\gamma(z)$ or $\rho^{(2)}(z)$. Being supported over a wavenumber $k$ means variation over scales $1/\|k\|$ is present in the correlation function, or to represent the variability in $\gamma(z)$ we need wavenumbers $\|k\|$ present in the spectral representation. The function $\rho^{(2)}(z)-\lambda^{2}$ is often approached in terms of what scales it is non-zero at, but the variation in the function can also be associated with long or short scales. So for a covariance function, we now have two notions of what it means to possess scales $\|k\|$: either $\gamma(\mathbf{u}/\|k\|)$ is non-zero for a unit vector $\mathbf{u}$, or $\gamma(z)$ is variable (changing) and the scale of the change is $\|k\|$. Imagine observing a sinusoid with period $k$. The function will hit unity at a regular set of $z$ values, and so it will be supported at those $z$ values, but also its Fourier transform will be supported at $1/k$.

This is clear from Figure 2, where covariance is present smoothly at small and medium scales for the clustered Thomas processes, and is repelled for the Matérn hard-core processes. As the Matérn process's complete covariance function is discontinuous, its Fourier transform is supported over all scales (due to the Heisenberg-Gabor uncertainty principle [14]). This can be deduced from (6): to reproduce the discontinuity we need all scales in the Fourier representation.

Figure 2: Pair correlation function $\rho^{(2)}/\lambda^{2}$ (pcf, left) and corresponding scaled spectral density function $f/\lambda$ (sdf/$\lambda$, right) for two stationary and isotropic non-Poisson point process models, two variants each, with known pcfs. For the Thomas processes the sdf is given in the text. For the regular Matérn II process the sdf was numerically approximated using the Hankel transform (cf. Section 7). See details of the processes and the variants in Section 8.

We see from (5) that not all wavenumbers are given equal weighting. First all wavenumbers are given an equal weighting $\lambda$, and then the Fourier transform of $\rho^{(2)}(z)-\lambda^{2}$ determines the wavenumbers that are up-weighted (added) or down-weighted (subtracted) relative to the overall level of $\lambda$. We then observe which wavenumbers are important to the representation of the point process, which gives more information, conveniently decomposed in a scale-by-scale manner.

For a random field or a stochastic process in $d$ dimensions a few cartoon characteristics are important. For a discretely sampled process in $\mathbb{Z}^{d}$ the simplest random process, white noise, is constant across wavenumbers, yielding a spectrum that takes the form of $\sigma^{2}$ on $[-\frac{\pi}{\Delta},\frac{\pi}{\Delta}]^{d}$, where $\Delta>0$ is the sampling period, and zero otherwise. For a point process the choice of definition of the spectral density does not imply a decay, because of the inclusion of the term $\lambda\delta(z)$ in space. However once we remove this term, we expect a decay of the remaining spectrum $f(k)-\lambda$ as $\|k\|\rightarrow\infty$; otherwise (5) would contain additional singularities. Furthermore at $k=0$ we retain

f(0)=\lambda+\int\left\{\rho^{(2)}(z)-\lambda^{2}\right\}\,dz. (7)

The second term in this expansion is not required to be zero; for the Thomas process, for example, it is not, as the sdf takes the form [56, p 55]:

f({k})=\lambda+\lambda\mu e^{-4\pi^{2}\|{k}\|^{2}\sigma^{2}}, (8)

where $\mu$ is the per-cluster expected point count and $\sigma$ is the standard deviation of the Gaussian dispersal kernel. It is clear from this expression that, due to the exponential, the second term never vanishes.
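As a sanity check on (8): for the Thomas process, $\rho^{(2)}(z)-\lambda^{2}$ is $\lambda\mu$ times a bivariate Gaussian density with covariance $2\sigma^{2}I$, so its Fourier integral should reproduce the exponential term. The following sketch (parameter values and all names are our illustrative choices, not from the paper) verifies this in 2D by brute-force quadrature:

```python
import numpy as np

# Numerical check of the Thomas-process sdf (8): the 2D Fourier transform of
# rho2(z) - lam^2 = lam*mu * N(0, 2 sigma^2 I) density should give
# lam*mu*exp(-4 pi^2 ||k||^2 sigma^2). Parameters are illustrative.
lam, mu, sigma = 5.0, 4.0, 0.3

def f_thomas(knorm):
    """Closed-form sdf of Eqn (8)."""
    return lam + lam * mu * np.exp(-4 * np.pi**2 * knorm**2 * sigma**2)

# brute-force Fourier integral of rho2(z) - lam^2 on a grid over [-L, L]^2
L, n = 4.0, 801
g = np.linspace(-L, L, n)
dz = g[1] - g[0]
zx, zy = np.meshgrid(g, g)
dens = lam * mu * np.exp(-(zx**2 + zy**2) / (4 * sigma**2)) / (4 * np.pi * sigma**2)
for kx in (0.0, 0.5, 1.0):
    # the imaginary part vanishes by symmetry, so only the cosine term remains
    num = lam + (dens * np.cos(2 * np.pi * kx * zx)).sum() * dz * dz
    print(kx, num, f_thomas(kx))
```

The two printed columns should agree to quadrature accuracy, confirming that the Gaussian cluster kernel leaves a strictly positive contribution at every wavenumber.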

As we shall be studying the moments of a point process, it is convenient to restate Campbell's theorem [10, Sec. 4.3.3], which we shall use multiple times. The theorem applies to any measurable function $h:{\mathbb{R}}^{nd}\mapsto{\mathbb{R}}^{+}$ with $n=1,2,\dots$, and states that (assuming the product densities $\rho^{(n)}$ of order $n$ are well defined for any given $n$)

{\mathbf{E}}\sum_{x_{1},\dots,x_{n}\in X}^{\neq}h(x_{1},\dots,x_{n})=\int_{{\mathbb{R}}^{nd}}h(x_{1},\dots,x_{n})\rho^{(n)}(x_{1},\dots,x_{n})\,dx_{1}\cdots dx_{n}, (9)

where the summation is over distinct point tuples. Note that if the point pattern $X$ is stationary then $\rho^{(1)}(x)=\lambda$ is constant for any $x$, and $\rho^{(n)}(x_{1},\dots,x_{n})=\rho^{(n)}(x_{1}-x_{n},\dots,x_{n-1}-x_{n},0)$. With some abuse of notation, we shall use the same symbol for non-stationary as well as stationary product densities, where the latter function has $n-1$ arguments for an $n$th order product density, e.g. we write $\rho^{(n)}(z_{1},\dots,z_{n-1})=\rho^{(n)}(z_{1},\dots,z_{n-1},0)$.
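Campbell's theorem (9) is easy to check by simulation in its simplest case $n=1$: for a homogeneous Poisson process on $B$ with intensity $\lambda$, $\mathbf{E}\sum_{x\in X}h(x)=\lambda\int_{B}h(x)\,dx$. The sketch below (intensity, window, replicate count and the Gaussian test function are all illustrative choices of ours) compares the two sides by Monte Carlo:

```python
import numpy as np

# Monte Carlo check of Campbell's theorem (Eqn 9) for n = 1 on B = [0,1]^2:
# E sum_{x in X} h(x) = lam * int_B h(x) dx for a Poisson process X.
rng = np.random.default_rng(1)
lam, side, n_rep = 50.0, 1.0, 4000      # intensity, window side, replicates

def h(pts):
    """A smooth test function on B (a Gaussian bump centred in the window)."""
    return np.exp(-np.sum((pts - 0.5) ** 2, axis=-1))

totals = []
for _ in range(n_rep):
    n = rng.poisson(lam * side ** 2)     # Poisson-distributed point count
    pts = rng.uniform(0.0, side, size=(n, 2))
    totals.append(h(pts).sum())
mc = np.mean(totals)                     # left-hand side of (9), estimated

# Right-hand side: lam * int_B h, via a fine midpoint grid (|B| = 1 here).
g = (np.arange(400) + 0.5) / 400
xx, yy = np.meshgrid(g, g)
rhs = lam * np.mean(np.exp(-((xx - 0.5) ** 2 + (yy - 0.5) ** 2)))
print(mc, rhs)                           # should agree to Monte Carlo error
```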

We define a new parameterisation to capture how the process's $f(k)$ differs from that of a Poisson process. We define the $n$th deviation of the $n$th order product density as

\widetilde{\rho}^{(n)}(z_{1},\dots,z_{n-1})=\frac{{\rho}^{(n)}(z_{1},\dots,z_{n-1})-\lambda^{n}}{\lambda^{n}},\quad n=2,3,\dots, (10)

where each argument of the product density satisfies $z_{l}\in{\mathbb{R}}^{d}$ for $l=1,\dots,n-1$. For a Poisson process these deviations from Poissonianity, $n=2,3,\dots$, are all identically zero. By understanding deviations from Poisson behaviour, we gain greater insight into the underlying process of interest. For completeness we also define the Fourier transform of the deviations:

\widetilde{f}^{(n)}({k})={\cal F}\left\{\widetilde{\rho}^{(n)}(z)\right\},

where if $n=2$ we suppress the superscript. With this definition we find that

\widetilde{f}({k})=\frac{f({k})-\lambda}{\lambda^{2}},\quad{k}\in{\mathbb{R}}^{d},\qquad f({k})=\lambda+\lambda^{2}\widetilde{f}({k}). (11)

Thus in a sense, $\widetilde{f}(k)$ encapsulates the deviation of the spectral density from the constant level $\lambda$ via the term $\lambda^{2}\widetilde{f}(k)$.

Already [20, p. 337] discusses some differences in the spectral representation of time series versus that of point processes. We would argue that decay of the spectrum is still reasonable to assume once $\lambda$ has been subtracted (so decay of $\widetilde{f}(k)$ at large magnitude wavenumbers is reasonable to assume). Having established the theoretical spectral description of point processes, we now turn to their sampling properties.

3 Direct Spectral Summaries of Point Patterns

In this section, we shall revisit the possible definitions of a spectral estimator for a point pattern. We shall start by defining a direct Fourier transform, and taper this definition to ameliorate edge effects, resulting in families of spectral estimators. We also discuss other choices of spectral estimators already used in the literature. To see the spectral characteristics of $X$ we start from what is known as Bartlett's periodogram estimator [23], based on observing a point pattern in region $B\subset{\mathbb{R}}^{d}$, written as

I_{0}({k}):=\hat{\lambda}+|B|^{-1}\sum_{x,y\in X\cap B}^{\neq}e^{-2\pi i{k}\cdot(x-y)},\quad{k}\in{\mathbb{R}^{d}}, (12)

where we have set $\hat{\lambda}=|B|^{-1}|X\cap B|$. This definition uses all the data (or points) available to us, $\{x:\;x\in X\cap B\}$, but is simply one possible choice amongst all direct spectral estimators. Normally a direct spectral estimator is one which is bilinear in the DFT of the data, see for example the discussion in [58, Ch 5–6]. We recall that a bilinear form is a function $D({\mathbf{x}},{\mathbf{y}})$ satisfying, in the first argument, $D({\mathbf{x}}_{1}+{\mathbf{x}}_{2},{\mathbf{y}})=D({\mathbf{x}}_{1},{\mathbf{y}})+D({\mathbf{x}}_{2},{\mathbf{y}})$, and equivalently in the second argument. Bilinear forms have been thoroughly discussed in signal processing for the analysis of time series [52]. As a point process consists of locations, the best we can hope for in terms of bilinearity is sesquilinearity in the Fourier transform, as the point locations appear in the argument of the complex exponential. Sesquilinearity of $D({\mathbf{x}},{\mathbf{y}})$ simply generalises a bilinear form to Hermitian symmetry, additionally requiring $D({\mathbf{x}},{\mathbf{y}})=D({\mathbf{y}}^{\ast},{\mathbf{x}})$ as well as $D({\mathbf{x}},{\mathbf{y}})$ satisfying $D({\mathbf{x}}_{1}+{\mathbf{x}}_{2},{\mathbf{y}})=D({\mathbf{x}}_{1},{\mathbf{y}})+D({\mathbf{x}}_{2},{\mathbf{y}})$.
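Since $\sum_{x\neq y}e^{-2\pi i k\cdot(x-y)}=|\sum_{x}e^{-2\pi i k\cdot x}|^{2}-N(B)$, Bartlett's periodogram (12) collapses to $|B|^{-1}|\sum_{x}e^{-2\pi i k\cdot x}|^{2}$, which is how one would compute it in practice. A minimal sketch (the function name and the Poisson demo pattern are our own, not the paper's):

```python
import numpy as np

# Bartlett's periodogram (Eqn 12) on B = [0, side]^2, computed via the
# identity I0(k) = |B|^{-1} |sum_x e^{-2 pi i k.x}|^2.
def bartlett_I0(pts, k, area):
    """pts: (N, d) point locations; k: (m, d) wavenumbers; area: |B|."""
    phase = np.exp(-2j * np.pi * (k @ pts.T))   # (m, N) complex exponentials
    J = phase.sum(axis=1)                       # DFT of the point pattern
    return np.abs(J) ** 2 / area

rng = np.random.default_rng(7)
side, lam = 10.0, 5.0
pts = rng.uniform(0, side, size=(rng.poisson(lam * side ** 2), 2))
k = np.array([[0.3, 0.0], [1.0, 1.0]])
print(bartlett_I0(pts, k, side ** 2))
```

For a Poisson pattern the values should fluctuate around $\lambda$ away from the origin, consistent with the flat sdf of a Poisson process.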

In time series the bilinear form has been chosen to ensure that spectral estimators are real-valued, and often non-negative, though some bilinear estimators are not, see e.g. Guyon's spectral estimator [34]. We define the tapered DFT of a point process $X$, for a specified general (square) integrable function $h(x)$ (referred to as a 'data taper' by [58]) with Fourier transform $H(k)$ and unit norm (i.e. $\|h\|_{2}=1$), to be

J_{h}({k}):=\sum_{x\in X\cap B}h(x)e^{-2\pi i{k}\cdot x},\quad{k}\in{\mathbb{R}}^{d}. (13)

Spectral estimators in time series are bilinear in the observed real-valued process $X_{t}$ and sesquilinear in its Fourier transform $J_{h}(k)$. We shall still require that the form of the estimator is sesquilinear in the DFT of the point process, but the estimators will not be bilinear in $X$, the point pattern, as this will not be possible to achieve. We shall also define

\widetilde{J}_{h}({k})=J_{h}({k})-\lambda H({k}).

Because $x$ appears in the argument of the complex exponential, we cannot define an estimator that is bilinear directly in the data. In practice one also has to decide what taper function to use. For 1D point processes tapering has been used [15], and multitapering has been used on interpolated data [21]. Here, we do not interpolate, do not implement localised analysis, and do not work in 1D. Interpolation is more complex to implement in higher dimensions, as point locations are not ordered, unlike the time argument of a time series.

Based on $J_{h}(k)$, the natural spectral estimator becomes its modulus square, or

I_{h}({k}):=\left|J_{h}({k})\right|^{2}=\sum_{x\in X\cap B}h(x)h^{*}(x)+\sum_{x,y\in X\cap B}^{\neq}h(x)h^{*}(y)e^{-2\pi i{k}\cdot(x-y)}. (14)

If we take $h(z)=h_{0}(z)=|B|^{-\frac{1}{2}}\mathbf{1}(z\in B)$ then we recover Bartlett's periodogram (12) from (14); with any other taper Bartlett's periodogram is not exactly recovered. The point of using a general function $h(z)$ is that abruptly ending the inclusion of points when we leave the region $B$ causes ripples in wavenumber, and therefore worse estimation of the spectrum, as is also the case for time series when not using a taper [58]. We study the whole family of estimators $I_{h}(k)$, where a new member of the family is defined for each choice of $h$. We also note that (14) is a direct spectral estimator [58, p. 207]. Multidimensional tapers are available to us [36, 65], and can be pre-computed. Our choice of tapering corresponds to using continuous tapers evaluated at random locations. We have deduced an asymptotic distribution for $J_{h}(k)$ using [7], but this result cannot be applied to $I_{h}(k)$, even if it superficially may seem to be of the correct form.

Note that the sum in (14) cannot be evaluated in a computationally efficient manner, unlike the DFT, as the locations $\{x\}$ are not regularly spaced; this is unfortunate, but unavoidable. Finally, (14) is bilinear in $J_{h}(k)$ but it is not bilinear in $X$, unlike the commensurate expressions for the DFT of time series and random fields.
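The tapered DFT (13) and periodogram (14) can nevertheless be evaluated directly as an $O(N)$ sum per wavenumber. A minimal sketch on $B=[0,s]^{2}$, using a separable sine taper of unit $L_{2}$ norm as a simple stand-in for the continuous tapers discussed above (the taper choice, names and parameters are ours):

```python
import numpy as np

# Tapered periodogram I_h(k) = |J_h(k)|^2 (Eqn 14) on B = [0, s]^2, with a
# separable sine taper; this is an illustrative taper choice, not the paper's.
def sine_taper(x, s, m=1):
    """m-th 1D sine taper on [0, s]; unit L2 norm by construction."""
    return np.sqrt(2.0 / s) * np.sin(np.pi * m * x / s)

def tapered_periodogram(pts, k, s):
    h = sine_taper(pts[:, 0], s) * sine_taper(pts[:, 1], s)   # h(x), (N,)
    J = (h * np.exp(-2j * np.pi * (k @ pts.T))).sum(axis=1)   # Eqn (13)
    return np.abs(J) ** 2                                     # Eqn (14)

rng = np.random.default_rng(3)
s, lam = 10.0, 5.0
pts = rng.uniform(0, s, size=(rng.poisson(lam * s * s), 2))
k = np.array([[0.5, 0.25], [2.0, 0.0]])
print(tapered_periodogram(pts, k, s))
```

The taper goes smoothly to zero at the boundary of $B$, which is exactly the property that suppresses the wavenumber ripples described above.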

For completeness we also note the isotropic estimator of [23, Eqn. 3.3] in 2D, namely

\bar{I}_{0}(\|{k}\|):=\hat{\lambda}+\frac{1}{|B|}\sum_{x,y\in X\cap B}^{\neq}\mathcal{J}_{0}\left(2\pi\|{k}\|\,\|x-y\|\right), (15)

where $\mathcal{J}_{0}(x)$ denotes the Bessel function of order 0, specified in Section 7. The $d$-dimensional extension is given later in equation (18). $\bar{I}_{0}$ is less clearly bilinear in the data, but if we start from the estimator $I_{0}$ of (12) and average it over orientations analytically, then we arrive at this form. With a slight abuse of our terminology we therefore also refer to it as a bilinear estimator. There is one additional correction made by [23]. As we shall see, for low wavenumbers there is a bias inherent in (15). To address this problem, [23, Eqn. 3.4] suggests taking, for some choice of lower bound $t_{0}>0$,

I_{D}(\|{k}\|)=\left\{\begin{array}{lcr}\bar{I}_{0}(t_{0})&{\mathrm{if}}&\|{k}\|\leq t_{0}\\ \bar{I}_{0}(\|{k}\|)&{\mathrm{if}}&\|{k}\|>t_{0}\end{array}\right. (16)

The authors of [23] suggest taking $t_{0}$ so that $\bar{I}_{0}(t_{0})$ is at a minimum, and also suggest smoothing the estimated $I_{D}$, which we clarify in Section 6. The authors also discuss iterated methods of bias correction in their Section 5, and correcting biased estimators of the correlation.

Note also that we have modified the estimator in (15) to divide by $|B|$ rather than the observed number of points in the region, $N(B)$, as for example [23] do, as it is much preferable not to have a random denominator. We shall discuss the usage of $I_{0}(\cdot)$ versus $I_{D}(\cdot)$ further in Section 7, together with the implications of whether the process is truly isotropic or not.
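A sketch of the isotropic estimator (15) with the low-wavenumber fix (16), with the $|B|$ normalisation just described. To keep the example dependency-free, $\mathcal{J}_{0}$ is evaluated by quadrature via $\mathcal{J}_{0}(x)=\frac{1}{\pi}\int_{0}^{\pi}\cos(x\sin t)\,dt$; all names, parameters, and the wavenumber grid are our illustrative choices:

```python
import numpy as np

# Isotropic Bartlett estimator (Eqn 15) plus the t0 correction of Eqn (16).
def j0(x, n=200):
    """Bessel J0 by midpoint quadrature of (1/pi) int_0^pi cos(x sin t) dt."""
    t = (np.arange(n) + 0.5) * np.pi / n
    return np.cos(np.outer(np.atleast_1d(x), np.sin(t))).mean(axis=1)

def bar_I0(pts, knorms, area):
    iu = np.triu_indices(len(pts), 1)
    r = np.hypot(pts[iu[0], 0] - pts[iu[1], 0],
                 pts[iu[0], 1] - pts[iu[1], 1])   # distinct pair distances
    lamhat = len(pts) / area
    # factor 2: the sum in (15) runs over ordered pairs
    return lamhat + np.array([2.0 * j0(2 * np.pi * kk * r).sum() / area
                              for kk in knorms])

rng = np.random.default_rng(11)
s, lam = 5.0, 4.0
pts = rng.uniform(0, s, size=(rng.poisson(lam * s * s), 2))
ks = np.linspace(0.05, 3.0, 25)
I = bar_I0(pts, ks, s * s)
t0 = ks[np.argmin(I)]                 # t0 at the minimum, as suggested by [23]
I_D = np.where(ks <= t0, I.min(), I)  # Eqn (16)
print(t0, I_D[:3])
```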

As can be surmised from (14), the expectation of the general estimator ${\mathbf{E}}\{I_{h}({k})\}$ is a convolution between the Fourier transform of the observation window $B$ and the true spectrum, and if $h(z)$ does not go to zero nicely at the boundary of $B$ then the periodogram becomes quite complicated in terms of its expectation. Inspection of (14) also raises a second problem, namely that there is a constant contribution $|B|^{-1}|X\cap B|$ which does not give any wavenumber-specific information, and is correlated between wavenumbers. This term is new relative to time series and random fields, if not new to the expectation of the periodogram of randomly sampled stochastic processes, see [50, 11].

Should we choose to taper isotropically in 2D, then we instead use an isotropic taper $h_{I}(\cdot)$:

\bar{I}_{h}(\|{k}\|):=\hat{\lambda}+\sum_{x,y\in X\cap B}^{\neq}h_{I}\left(\|x-y\|\right)\mathcal{J}_{0}\left(2\pi\|{k}\|\cdot\|x-y\|\right). (17)

Note that such a taper cannot necessarily be constructed. Say $J_{h}(\cdot)$ with a compactly supported isotropic taper were constructed on the largest spherical domain; what would that imply for $I_{h}(\cdot)$? Radial tapers can be determined as described in [65], whether in 2D continuous space or discrete space. Equation (17) can of course be extended into higher dimensions. In general, in dimension $d=1,2,3,4,\dots$ we would have

I¯h(k):=λ^+x,yXBhI(xy)𝒥d/21(2πkxy)xy(d/21).\bar{I}_{h}(\|{k}\|):=\hat{\lambda}+\sum_{x,y\in X\cap B}^{\neq}h_{I}\left(\|x-y\|\right)\mathcal{J}_{d/2-1}\left(2\pi\|{k}\|\|x-y\|\right)\|x-y\|^{-(d/2-1)}. (18)
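As a concrete illustration, the d=2d=2 case (17) can be sketched numerically. In the sketch below the triangular radial taper and its normalisation are placeholder assumptions (the text leaves the choice of radial taper to [65]), and the Bessel function 𝒥0\mathcal{J}_{0} is evaluated by simple quadrature of its integral representation so that only numpy is needed:

```python
import numpy as np

def bessel_j0(z, n=400):
    """J_0(z) via its integral representation, by midpoint quadrature."""
    theta = (np.arange(n) + 0.5) * np.pi / n
    return np.mean(np.cos(np.asarray(z, float)[..., None] * np.sin(theta)), axis=-1)

def iso_periodogram(pts, area, knorm, h_iso):
    """Sketch of (17): lambda_hat + sum over ordered pairs x != y of
    h_I(||x - y||) J_0(2 pi ||k|| ||x - y||)."""
    lam_hat = len(pts) / area
    iu = np.triu_indices(len(pts), 1)
    r = np.linalg.norm(pts[iu[0]] - pts[iu[1]], axis=1)  # pairwise distances
    # each unordered pair appears twice in the sum over x != y:
    return lam_hat + 2.0 * np.sum(h_iso(r) * bessel_j0(2 * np.pi * knorm * r))

# demo on uniform points with a placeholder triangular radial taper
rng = np.random.default_rng(0)
pts = rng.uniform(-5, 5, size=(60, 2))
h_iso = lambda r: np.clip(1 - r / 5.0, 0, 1) / 100.0
print(iso_periodogram(pts, 100.0, 0.1, h_iso))
```

The quadratic cost in the number of points is apparent here: the estimator is a sum over all ordered pairs of observed points.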

We can also use more than one taper: we take a sequence of orthonormal functions {hm}\{h_{{m}}\} [58] and use them to form a new estimator. We assume the tapers are all of unit energy and pairwise orthogonal, that is hm2=1||h_{{m}}||_{2}=1 and

dhm(z)hm(z)𝑑z=δmm.\int_{{\mathbb{R}}^{d}}h_{{m}}(z)h_{{m^{\prime}}}(z)\,dz=\delta_{{{m}}{{m^{\prime}}}}. (19)

We shall study the properties of direct spectral estimators of the form of Eqn (14); this will help us to characterise the second order properties of the process XX. Most of time series analysis is based on regularly spaced discrete sampling, and so most tapers are designed for that scenario. We choose to use continuous tapers, rather than interpolating the points to a regular grid. This leaves as possible tapers the spheroidal wavefunctions (continuous but hard to compute), as well as the cosine tapers of [60]. These correspond to separable taper choices; non-separable choices will be discussed in Section 7.
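As a concrete example of a continuous taper family satisfying (19), the sketch below uses the sine tapers (the family behind the "sine-tapers" of Figure 3; choosing them here is our assumption), and verifies unit energy and orthogonality by a Riemann sum:

```python
import numpy as np

def sine_taper(m, l):
    """m-th continuous sine taper on [-l/2, l/2], a unit-energy family."""
    return lambda x: np.sqrt(2.0 / l) * np.sin(np.pi * m * (x / l + 0.5))

l = 10.0
x = np.linspace(-l / 2, l / 2, 20001)
dx = x[1] - x[0]
h1, h2 = sine_taper(1, l)(x), sine_taper(2, l)(x)
print(np.sum(h1 * h1) * dx)  # ~1 : unit energy ||h_m||_2 = 1
print(np.sum(h1 * h2) * dx)  # ~0 : orthogonality as in (19)
```

No gridding of the point pattern is needed: each taper is a function that can be evaluated at the observed (continuous) point locations.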

4 Distributional Properties of Bilinear Spectral Estimators

In this Section we derive the asymptotic marginal distribution of the tapered periodogram, and its asymptotic expectation. We shall start from the simplest spectral estimator, namely the Bartlett periodogram as specified by (12) and calculate its properties. We also define the transfer function corresponding to a taper hh to be

H=[h],H0=[h0].H=\mathcal{F}[h],\quad H_{0}=\mathcal{F}[h_{0}]. (20)

We shall now be quite concrete, considering some common cases of spatial data and their sampling, and choosing Cartesian product domains as our first special cases. We shall focus on the cuboid sampling domain:

B(𝒍)=[l1/2,l1/2]××[ld/2,ld/2].{B}_{\square}(\bm{l})=[-{l_{1}}/{2},{l_{1}}/{2}]\times\dots\times[-{l_{d}}/{2},{l_{d}}/{2}].
Lemma 4.1.

Assume that XX is a homogeneous point process with intensity λ\lambda and twice differentiable spectrum f(k)f({k}) at all values of k0{k}\neq 0. Assume XX is observed in a cuboid domain B(𝐥)B_{\square}(\bm{l}) with centroid at 0, which is growing in every dimension, that is minjlj\min_{j}l_{j}\rightarrow\infty. Then the expected value of the periodogram I0(k)I_{0}({k}) satisfies the equation

𝐄I0(k)=f(k)+|B|1λ2T(B,k)+o(1),k0,{\mathbf{E}}I_{0}({k})={f}({k})+|B|^{-1}\lambda^{2}T(B,{k})+o(1),\quad{k}\neq 0, (21)

for

T(B,k)=j=1dsin2(πkjlj)(πkj)2.T(B,{k})=\prod_{j=1}^{d}\frac{\sin^{2}(\pi{k}_{j}l_{j})}{(\pi{k}_{j})^{2}}.
Proof.

See Appendix A. ∎
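The bias kernel T(B,k)T(B,{k}) of Lemma 4.1 is easy to evaluate directly. The sketch below (function name is ours) confirms that it vanishes at wavenumbers of the form kj=pj/ljk_{j}=p_{j}/l_{j} with integer pj0p_{j}\neq 0, which anticipates the Fourier grid of Definition 4.1:

```python
import numpy as np

def T_box(k, l):
    """T(B, k) = prod_j sin^2(pi k_j l_j) / (pi k_j)^2, taking the
    limiting value l_j^2 of each factor at k_j = 0."""
    k = np.atleast_1d(np.asarray(k, float))
    l = np.atleast_1d(np.asarray(l, float))
    with np.errstate(divide="ignore", invalid="ignore"):
        vals = np.where(k == 0.0, l ** 2,
                        np.sin(np.pi * k * l) ** 2 / (np.pi * k) ** 2)
    return float(np.prod(vals))

l = [10.0, 20.0]
print(T_box([1 / 10, 3 / 20], l))   # ~0: vanishes when k_j = p_j / l_j, p_j != 0
print(T_box([0.12, 0.07], l))       # strictly positive off that grid
```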

This is not what we would expect as a large sample expectation of the spectrum, given our experience with time series and random fields: the term |B|1λ2T(B,k)|B|^{-1}\lambda^{2}T(B,{k}) has been added. To understand where this term comes from, we need to understand the DFT Jh(k)J_{h}({k}) better.

We note that in general h(x)e2πikxh(x)e^{-2\pi i{k}\cdot x} is identically distributed but not independent across points for many choices of point process XX, making the choice of a Central Limit Theorem (CLT) a bit more complex. For a large class of processes one such CLT is given by [7]. If we compare the quantity in (13) with [7], then we see that q=1q=1 and p=1p=1 in their notation; we do not have to worry about e2πikxe^{-2\pi i{k}\cdot x} being a function of several of the points, and this is why q=1q=1. Appealing to Theorem 1 of [7], we can deduce that as |B||B| diverges Jh(k)J_{h}({k}) is asymptotically Gaussian. We recover the second moments by direct calculation:

Proposition 4.1.

Assume that XX is a stationary point process with intensity λ\lambda and with spectrum f(k)f({k}), and that hh is a unit energy taper supported only in the domain B(𝐥)B_{\square}(\bm{l}). Then the first two moments of the tapered DFT Jh(k)J_{h}({k}) are given by:

𝐄{Jh(k)}\displaystyle{\mathbf{E}}\left\{J_{h}({k})\right\} =𝐄xXBh(x)e2πikx\displaystyle={\mathbf{E}}\sum_{x\in X\cap B}h(x)e^{-2\pi i{k}\cdot x} (22)
=λdh(x)e2πikx𝑑x=λH(k),kd\displaystyle=\lambda\int_{\mathbb{R}^{d}}h(x)e^{-2\pi i{k}\cdot x}\;dx=\lambda H({k}),\quad{k}\in{\mathbb{R}^{d}}
𝐕𝐚𝐫{Jh(k)}\displaystyle{\mathrm{\bf Var}}\left\{J_{h}({k})\right\} =λ+λ2d|H(kk)|2f~(k)𝑑k+λ2|H(k)|2λ2|H(k)|2\displaystyle=\lambda+\lambda^{2}\int_{\mathbb{R}^{d}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\;d{k}^{\prime}+\lambda^{2}\left|H({k})\right|^{2}-\lambda^{2}|H({k})|^{2}
=λ+λ2d|H(kk)|2f~(k)𝑑k\displaystyle=\lambda+\lambda^{2}\int_{\mathbb{R}^{d}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\;d{k}^{\prime} (23)
Rel{Jh(k)}\displaystyle{\mathrm{Rel}}\left\{J_{h}({k})\right\} =λdH(k2k)H(k)𝑑k+dU(k,z)e2πikz{ρ(z)λ2}𝑑z,\displaystyle=\lambda\int_{\mathbb{R}^{d}}H({k}^{\prime}-2{k})H({k}^{\prime})d{k}^{\prime}+\int_{\mathbb{R}^{d}}U({k},z)e^{-2\pi i{k}\cdot z}\{\rho(z)-\lambda^{2}\}dz, (24)

with U(k,z)=h(x)h(z+x)e2πik(2x)𝑑xU({k},z)=\int h(x)h(z+x)e^{-2\pi i{k}\cdot(2x)}\;dx.

Proof.

See Appendix B and notice the definition of (20). ∎

Why do we bother with determining the first two moments of the DFT? The spectrum characterises any stationary/homogeneous process and so is a key summary to estimate [8]. Estimation will be linear in the periodogram, and once Jh(k)J_{h}({k}) is approximately Gaussian, the distribution of the periodogram is determined from these moments, using theory developed for quadratic forms in normal variates [42].

In the above Proposition the term Rel{}{\mathrm{Rel}}\left\{\cdot\right\} denotes “relation”, also known as the “complementary covariance”, see e.g. [54]. With these moments, and the result of [7], we deduce the following corollary.

Corollary 4.1.

Assume XX satisfies the constraints of [7], then the DFT satisfies:

Jh(k)\displaystyle J_{h}({k}) 𝐿NC(λH(k),f(k),r(k)),\displaystyle\overset{L}{\rightarrow}N_{C}\left(\lambda H({k}),f({k}),r({k})\right), (25)
r(k)\displaystyle r({k}) =λdH(k2k)H(k)𝑑k+dU(k,z)e2πikz{ρ(z)λ2}𝑑z,U(k,z)=h(x)h(z+x)e2πik(2x)𝑑x.\displaystyle=\lambda\int_{\mathbb{R}^{d}}H({k}^{\prime}-2{k})H({k}^{\prime})d{k}^{\prime}+\int_{\mathbb{R}^{d}}U({k},z)e^{-2\pi i{k}\cdot z}\{\rho(z)-\lambda^{2}\}dz,\;U({k},z)=\int h(x)h(z+x)e^{-2\pi i{k}\cdot(2x)}\;dx.

We can view Jh(k)J_{h}({k}) as a complex-valued scalar or a real-valued two-vector. NC(μ,Σ,C)N_{C}(\mu,\Sigma,C) is the general complex normal and its arguments are its mean μ\mu, its covariance Σ\Sigma, and its relation or complementary covariance CC. It should be contrasted with the proper complex normal NC(μ,Σ),N_{C}(\mu,\Sigma), which has zero relation.

What can we then say about Ih(k)I_{h}({k})? Using the continuous mapping theorem [22] we can deduce from (25) that if we consider only arguments k{k} so that the complementary covariance is negligible then

|Jh(k)|2𝐿f(k)2χ22(λ2|H(k)|2f(k)),|J_{h}({k})|^{2}\overset{L}{\rightarrow}\frac{f({k})}{2}\chi^{2}_{2}\left(\frac{\lambda^{2}|H({k})|^{2}}{f({k})}\right), (26)

where χν2(μ2)\chi^{2}_{\nu}(\mu^{2}) denotes the non-central chi-squared distribution with ν\nu degrees of freedom and non-centrality parameter μ2\mu^{2}. We give the form of H(k)H({k}) in (20). We can also use the uniform integrability of Jh(k)J_{h}({k}) to get the asymptotic moments of I0(k)I_{0}({k}) from these limits. In fact, the expectation of any member of this family of tapered estimates takes the form:

𝐄{Ih(k)}=d|H(kk)|2f(k)𝑑k+λ2|H(k)|2.{\mathbf{E}}\left\{I_{h}({k})\right\}=\int_{{\mathbb{R}^{d}}}|H({k}^{\prime}-{k})|^{2}f({k}^{\prime})d{k}^{\prime}+\lambda^{2}|H({k})|^{2}. (27)

This can be derived straightforwardly by direct computation from (14) by taking expectations with Campbell’s formula and using the convolution theorem. We see immediately the bias of this estimator, namely the λ2|H(k)|2\lambda^{2}|H({k})|^{2} term. To remove the non-centrality bias, we define the bias-corrected periodogram from the de-biased discrete Fourier transform

I~h(k)=|J~h(k)|2.\widetilde{I}_{h}({k})=\left|\widetilde{J}_{h}({k})\right|^{2}. (28)

Note also the discussion in [20, p. 292], where the theory of signed measures is used to make the equivalent definition, albeit with additional mathematical sophistication. Therefore Jh(k)λH(k)=J~h(k)J_{h}({k})-\lambda H({k})=\widetilde{J}_{h}({k}) can be called the signed measure. The quantity J~h(k)\widetilde{J}_{h}({k}) is the DFT of the process X0X^{0} dual to the mean-corrected random measure N0(dx)=N(dx)λdxN^{0}(dx)=N(dx)-\lambda dx, where NN is the dual counting measure of the point process XX. As we have subtracted λH(k)\lambda H({k}) from the discrete Fourier transform, the underlying measure is no longer positive, but once we take the modulus square we are guaranteed to arrive at a real-valued non-negative quantity.
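Expression (27) can be checked by simulation in the simplest case. For a homogeneous Poisson process the spectrum is flat, f(k)=λf({k})=\lambda, and at a nonzero wavenumber of the form kj=pj/ljk_{j}=p_{j}/l_{j} the untapered transfer function vanishes, so (27) predicts 𝐄I0(k)=λ{\mathbf{E}}I_{0}({k})=\lambda. A minimal Monte Carlo sketch (intensity, domain size and replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
l, lam, reps = 10.0, 5.0, 400
area = l * l
k = np.array([2 / l, 3 / l])        # nonzero wavenumber on the Fourier grid
vals = []
for _ in range(reps):
    n = rng.poisson(lam * area)     # homogeneous Poisson on the box
    pts = rng.uniform(-l / 2, l / 2, size=(n, 2))
    J = np.sum(np.exp(-2j * np.pi * pts @ k)) / np.sqrt(area)   # J_0(k) of (29)
    vals.append(abs(J) ** 2)
# f(k) = lam for Poisson and H_0(k) = 0 at this k, so (27) predicts E I_0(k) = lam:
print(np.mean(vals))                # close to lam = 5
```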

There is one wavenumber which is problematic, namely k=0{k}=0. For the periodogram we have h0(x)=𝟏B(x)/|B|h_{0}(x)=\mathbf{1}_{B}(x)/\sqrt{|B|}. Thus we have

J0(k)=1|B|xXBe2πikx,J0(0)=1|B|N(B)=|B|λ^.J_{0}({k})=\frac{1}{\sqrt{|B|}}\sum_{x\in X\cap B}e^{-2\pi i{k}\cdot x},\quad J_{0}(0)=\frac{1}{\sqrt{|B|}}N(B)=\sqrt{|B|}\widehat{\lambda}. (29)

Subtracting this contribution removes the problem of a non-zero mean of the DFT, as long as we assume that we know the intensity λ\lambda, and so are in a position to remove this effect. Assuming knowledge of this quantity is not a major hurdle, as it can be estimated consistently for large regions (|B||B|\rightarrow\infty) by counting the number of points and dividing by the area. For completeness we also define the signed measure DFT as

J~h(k)=Jh(k)λ^H(k).\widetilde{J}_{h}({k})=J_{h}({k})-\hat{\lambda}H({k}). (30)

We note directly from (25), that

J~h(k)NC{0,f(k),r(k)}.\widetilde{J}_{h}({k})\rightarrow N_{C}\{0,f({k}),r({k})\}. (31)

Also it follows

J~0(0)=|B|λ^λ^H0(0)=|B|λ^|B|λ^=0,\widetilde{J}_{0}(0)=\sqrt{|B|}\widehat{\lambda}-\widehat{\lambda}H_{0}(0)=\sqrt{|B|}\widehat{\lambda}-\sqrt{|B|}\widehat{\lambda}=0, (32)

trivially. Thus we cannot estimate the DFT at wavenumber zero. For a time series, when calculating DFTs we subtract off the sample mean, and then the mean-corrected DFT is likewise zero at wavenumber zero; in that setting the periodogram at wavenumber zero is often not plotted.
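The identity (32) can be verified numerically for the untapered case; a sketch (function names are ours), with H0H_{0} the transfer function of h0=𝟏B/|B|h_{0}=\mathbf{1}_{B}/\sqrt{|B|}:

```python
import numpy as np

rng = np.random.default_rng(1)
l = np.array([10.0, 10.0])
area = float(np.prod(l))
pts = rng.uniform(-l / 2, l / 2, size=(rng.poisson(2.0 * area), 2))

def J0_dft(k):
    """Untapered DFT of (29)."""
    return np.sum(np.exp(-2j * np.pi * pts @ k)) / np.sqrt(area)

def H0(k):
    """Transfer function of h_0(x) = 1_B(x) / sqrt(|B|) on the box."""
    with np.errstate(divide="ignore", invalid="ignore"):
        f = np.where(k == 0.0, l, np.sin(np.pi * k * l) / (np.pi * k))
    return float(np.prod(f)) / np.sqrt(area)

lam_hat = len(pts) / area
k0 = np.zeros(2)
# (32): the signed measure DFT (30) vanishes identically at k = 0
print(abs(J0_dft(k0) - lam_hat * H0(k0)))   # ~0 (machine precision)
```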

We note that a second way to do bias correction is by means of

I˘(k;h)=x,yXh(x)h(y)e2πik(xy)λ2^|H(k)|2.\breve{I}({k};h)=\sum_{x,y\in X}h(x)h(y)e^{-2\pi i{k}\cdot(x-y)}-\widehat{\lambda^{2}}\left|H({k})\right|^{2}. (33)

The advantage of the first estimator in (28) is that it is naturally non-negative and removes the error before we take the modulus square. This should also have a positive impact on the variance. In addition, whereas λ^\hat{\lambda} is an unbiased estimator of λ\lambda, λ2^\widehat{\lambda^{2}} can be a biased estimator of λ2\lambda^{2}.

For additional generality, we could consider a more sophisticated estimator than (33) or (28) see e.g. [14, Ch. 6], and define

Ig(k)=xXByXBg(x,y)e2πik(xy)λ2^G(k,k),I_{g}({k})=\sum_{x\in X\cap B}\sum_{y\in X\cap B}g(x,y)e^{-2\pi i{k}\cdot\left(x-y\right)}-\widehat{\lambda^{2}}G({k},{k}), (34)

where g(x,y)g(x,y) is a kernel, possibly of specified support, that has to be selected, and G(k,k)=xyg(x,y)e2πik(xy)G({k},{k})=\sum_{x}\sum_{y}g(x,y)e^{-2\pi i{k}\cdot\left(x-y\right)}. This estimator is not positive by design. For example we could define g(x,y)=h(x)h(y)g(x,y)=h(x)h^{\ast}(y) with h(x)h(x) a multi-dimensional data taper [36], or we could make the non-separable choice g(x,y)=h(xy)g(x,y)=h(\|x-y\|). Depending on the choice of g(x,y)g(x,y), the quantity Ig(k)I_{g}({k}) may satisfy a number of desirable characteristics such as positivity, asymptotic unbiasedness and computational efficiency.

Looking at the debiased periodogram, we can now determine further properties of Ih(k)I_{h}({k}), or properties of I~h(k)\widetilde{I}_{h}({k}). To produce an estimator that is smoothed we need to study the covariance and variance of I~h(k1)\widetilde{I}_{h}({k}_{1}) and I~h(k2)\widetilde{I}_{h}({k}_{2}). We first look at 𝐄{I~h(k)}{\mathbf{E}}\left\{\widetilde{I}_{h}({k})\right\}. We find that its form for large spatial regions is specified by the following lemma.

Lemma 4.2.

Assume that XX is a homogeneous point process with a spectral density f(k)f({k}). Then the bias-corrected tapered periodogram has a first moment given by:

𝐄{I~h(k)}\displaystyle{\mathbf{E}}\left\{\widetilde{I}_{h}({k})\right\} =d|H(kk)|2f(k)𝑑k.\displaystyle=\int_{{\mathbb{R}^{d}}}|H({k}^{\prime}-{k})|^{2}f({k}^{\prime})d{k}^{\prime}.
Proof.

See Appendix C. ∎

Thus as long as H(k)H({k}) is getting more concentrated in wavenumber (e.g. H(k)δ(k)H({k})\rightarrow\delta({k})), this is asymptotically in |B||B| an unbiased estimator of f(k)f({k}). Using the signed measure DFT of (30) to study XX is more convenient, as this lets us avoid the contribution that affects the low wavenumbers.

Let us write down the grid on which we obtain large sample unbiasedness, customarily called the Fourier grid; this is useful when the observational domain BB is a box.

Definition 4.1.

The Fourier wavenumber grid for a point process observed on a cuboid domain B(𝐥)B_{\square}(\bm{l}) corresponds to the points

𝒦(𝒍):=𝒦(B(𝒍))={kn:kn=(pn1/l1,,pnd/ld),pnj}.{\cal K}(\bm{l}):={\cal K}(B_{\square}(\bm{l}))=\{{k}_{n}\;:\;{k}_{n}=\begin{pmatrix}p_{n1}/l_{1},\dots,p_{nd}/l_{d}\end{pmatrix},\quad p_{nj}\in{\mathbb{Z}}\}. (35)

Note that the physical units of the wavenumber elements are inverse length, such as m1m^{-1}. Referring to Lemma 4.1 we see that the taper-h0h_{0} related bias term T(B,k)T(B,{k}) vanishes on this grid, not only removing any asymptotic bias at low wavenumbers, but also giving us a sampling of the wavenumbers that approximately leads to independent periodogram ordinates, rather like in the random field case, see Corollary 5.1.
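The grid of Definition 4.1 is straightforward to generate; a sketch for a truncated version, where the cutoff |pj|pmax|p_{j}|\leq p_{\max} is a practical choice not part of the definition:

```python
import numpy as np
from itertools import product

def fourier_grid(l, pmax):
    """Truncated Fourier wavenumber grid (35): k_n = (p_1/l_1, ..., p_d/l_d),
    with each integer p_j ranging over {-pmax, ..., pmax}."""
    axes = [np.arange(-pmax, pmax + 1) / lj for lj in l]
    return np.array([k for k in product(*axes)])

K = fourier_grid([10.0, 20.0], 2)
print(K.shape)            # (25, 2): (2 * pmax + 1)^d wavenumbers
print(K.min(), K.max())   # -0.2 to 0.2 per unit length
```

Note that the grid spacing 1/lj1/l_{j} shrinks as the observation window grows, so larger windows resolve finer wavenumber detail.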

These results establish the wavenumber grid at which we should evaluate a standard spectral estimator, in analogy to the Fourier frequencies [58, p. 197–198] of time series. Their basic importance follows because we can expect the Fourier transform to be asymptotically Gaussian, so that uncorrelatedness implies independence.

A second feature of time series is the notion of the Nyquist wavenumber [58, p. 98]. This does not exist for point processes. It may seem counterintuitive that there is no upper limit to the wavenumbers we can estimate. When analysing a process that has been sampled on a grid, such as a random field, we expect to see aliasing: variation that is happening very rapidly is confused with slower cycles, because rapid variation cannot be resolved from sparse samples. For the random locations of a point process the pairwise distances can take any real value, so the Nyquist wavenumber does not (in a sense) exist.

Theorem 4.1.

Large–Domain Expectation of the tapered periodogram. Assume that XX is a stationary point process with intensity λ\lambda and twice differentiable spectrum f(k)f({k}) at all values of k0{k}\neq 0, and that hh is a unit energy taper (i.e. h2=1\|h\|_{2}=1) supported only in the cuboid domain B(𝐥)B_{\square}(\bm{l}). Assume XX is observed in B(𝐥)B_{\square}(\bm{l}), which is growing in every dimension, that is minlj\min l_{j}\rightarrow\infty. Then the expected value of the tapered periodogram Ih(k)I_{h}({k}) satisfies the equation

𝐄Ih(k)=f(k)+λ2|H(k)|2+o(1),k0,{\mathbf{E}}I_{h}({k})=f({k})+\lambda^{2}|H({k})|^{2}+o(1),\quad{k}\neq 0, (36)

where H(k)H({k}) is the Fourier transform of h(x)h(x), as defined by (20).

Proof.

See Appendix E. ∎

This theorem establishes that the tapered periodogram is a biased estimator of the spectrum of the point pattern. Previous authors proposed ad hoc corrections [23] to remove the bias. Our result will suggest a method to arrive at a positive spectral density estimator that is not biased. This is a basic and important result, equivalent to [8, Section V]. In fact, such asymptotics are well understood both for time series and random fields; our previous understanding of asymptotics for spectral estimation of point processes hearkens back to [20] or [23]. In the latter reference the authors informally refer back to the chemistry literature of Hansen and McDonald, discussing the properties of gases.

Corollary 4.2.

Bias–corrected periodogram. Assume that XX is a stationary point process with intensity λ\lambda and twice differentiable spectrum f(k)f({k}) at all values of k0{k}\neq 0, and that hh is a unit energy taper supported only in the cuboid domain B(𝐥)B_{\square}(\bm{l}). Assume XX is observed in B(𝐥)B_{\square}(\bm{l}), which is growing in every dimension. Then the expected value of the bias-corrected tapered periodogram I~h(k)\widetilde{I}_{h}({k}) satisfies the equation

𝐄I~h(k)=f(k)+o(1),k0.{\mathbf{E}}\widetilde{I}_{h}({k})=f({k})+o(1),\quad{k}\neq 0. (37)
Proof.

See Appendix D. ∎

This corollary then informs us how to do estimation, by removing the bias. It shows the importance of removing the spectral bias before squaring – otherwise it becomes hard to isolate and remove the effect at zero wavenumber, which is exacerbated at higher intensities (growing λ\lambda), as is clear from (36). Note that the estimator I~h(k)\widetilde{I}_{h}({k}) is debiased relative to 𝐄{Ih(k)}{\mathbf{E}}\left\{I_{h}({k})\right\}, but is still guaranteed to be non-negative. This can be compared to removing μ^\widehat{\mu} from a time series before calculating the periodogram.

Finally, to employ smoothing for purposes of variance reduction, we will extend the tapered estimator to include multiple tapers. Define M1{M}\geq 1 estimates of the spectrum via

Imt(k)=x,yXBhm(x)ei2πkxhm(y)ei2πky,m=1,,M,I_{{{{m}}}}^{t}({k})=\sum_{x,y\in X\cap B}h_{{{m}}}(x)e^{-i2\pi{k}\cdot x}h^{\ast}_{{{m}}}(y)e^{i2\pi{k}\cdot y},\quad{{m}}=1,\dots,{M}, (38)

which is not bias-corrected but where we assume

dhm(x)hm(x)𝑑x=δmm,\int_{{\mathbb{R}^{d}}}h_{{{m}}}(x)h^{\ast}_{{m^{\prime}}}(x)dx=\delta_{{{m}}{{m^{\prime}}}},

where δmm=1\delta_{{{m}}{{m^{\prime}}}}=1 if m=m{{m}}={{m^{\prime}}} and 0 otherwise. Of course (38) still suffers from low-wavenumber bias. To define a debiased estimator we take

I~mt(k)=(xXBhm(x)ei2πkxλHm(k))(yXBhm(y)ei2πkyλHm(k)),m=1,,M.\widetilde{I}_{{{{m}}}}^{t}({k})=\left(\sum_{x\in X\cap B}h_{{m}}(x)e^{-i2\pi{k}\cdot x}-\lambda H_{{m}}({k})\right)\left(\sum_{y\in X\cap B}h^{\ast}_{{m}}(y)e^{i2\pi{k}\cdot y}-\lambda H^{\ast}_{{m}}({k})\right),\quad{{m}}=1,\dots,{M}. (39)

We subsequently average these estimates over m{{m}} to reduce variance.

In (38) we have a product of hm(x)hm(y)h_{{m}}(x)h^{\ast}_{{m}}(y). This is an inevitable consequence of the bilinear form of the periodogram from the Fourier transform. The most natural way of making multidimensional and separable tapers would be for x=(x1xd)Tx=\begin{pmatrix}x_{1}&\dots&x_{d}\end{pmatrix}^{T} to define the component-wise product taper hm(x)=h~m1(x1)h~md(xd)h_{{m}}(x)=\tilde{h}_{{{m}}_{1}}(x_{1})\dots\tilde{h}_{{{m}}_{d}}(x_{d}). We will then need to re-enumerate m1,,md{{m}}_{1},\dots,{{m}}_{d} into a single index. For instance in 2D if we have three tapers in 1D, then we will end up with nine tapers in 2D, and m{{m}} will range from one to nine. These estimates can also be arrived at starting from (13) and Jh(k)J_{h}({k}). As we shall use M{M} tapers linearly, and M2{M}^{2} for the quadratic estimators, we shall use the subscript ‘0’ to refer to the untapered periodogram, and m=1,,M{{m}}=1,\dots,{M} to refer to the subsequent tapers.
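The product construction and re-enumeration described above can be sketched directly. Here the 1D family is again the sine tapers (as used for Figure 3; the function names are ours), and the multitaper estimate averages the MM tapered periodograms of (38):

```python
import numpy as np

def sine1d(m, l):
    """1D unit-energy sine taper on [-l/2, l/2]."""
    return lambda t: np.sqrt(2.0 / l) * np.sin(np.pi * m * (t / l + 0.5))

def product_taper(m1, m2, l1, l2):
    """Separable 2D taper h_m(x) = h_{m1}(x_1) h_{m2}(x_2)."""
    return lambda x: sine1d(m1, l1)(x[:, 0]) * sine1d(m2, l2)(x[:, 1])

l1 = l2 = 10.0
# three 1D tapers give nine 2D tapers, re-enumerated by a single index m
tapers = [product_taper(m1, m2, l1, l2) for m1 in (1, 2, 3) for m2 in (1, 2, 3)]
print(len(tapers))   # 9

def multitaper_periodogram(pts, k, tapers):
    """Average of the M tapered periodograms (38), not bias-corrected."""
    ests = [abs(np.sum(h(pts) * np.exp(-2j * np.pi * pts @ k))) ** 2
            for h in tapers]
    return np.mean(ests)

rng = np.random.default_rng(3)
pts = rng.uniform(-5, 5, size=(200, 2))
print(multitaper_periodogram(pts, np.array([0.2, 0.1]), tapers))
```

The debiased version (39) would subtract λHm(k)\lambda H_{{m}}({k}) from each tapered DFT before taking the modulus square.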

Tapering is very common for stochastic processes. Initially the idea was hotly contested, but its utility is now firmly established both for time series and random fields [18, 76]. The idea of tapering is to ameliorate edge effects that leads to the asymptotically leading bias term. This is important in 1D [70, 76] but of greater importance in 2D and higher [18]. In 2D and higher the edge effects become more pronounced, and indeed asymptotically dominant [47, 53]. We also use multitapering to stabilise the variance of any estimator, as described for time series in [76]. We shall describe the necessary steps to perform linear estimation of the spectrum of a 2D and higher dimension point process in Section 6.

Let us set theory aside for a moment and consider some spectral estimates for archetypal point patterns. Figure 3 demonstrates the debiased periodogram, the debiased multitapered periodogram with sine-tapers, and a rotationally averaged 1D summary of the periodogram (defined in Sections 6&7) for four point patterns exhibiting different structural behaviour. The aspect ratio of the figure panels is kept the same only to have an orderly figure: due to debiasing we can estimate the periodogram on wavenumbers of our own choosing, and not just on the Fourier grid, which in these examples would be different for different tapers. The wavenumber grid in this illustration is a regular grid with a step size of 0.005 in both dimensions.

We see that information is present at wavenumbers up to k0.2\|{k}\|\approx 0.2. The tapering smooths the periodogram, as expected. There is a dark well near the origin for the regular pattern, and a bright hump for the clustered pattern, and both features transfer to the rotationally averaged curves (compare with Figure 2). The anisotropy of the fourth pattern is hinted at by the anisotropy visible in the 2D periodograms as an elongation of the “hump” in the second dimension. The elongation is perpendicular to the elongation of the clusters in the data because wavenumbers relate inversely to spatial units. The rotationally averaged periodogram was also computed from the (not shown) non-debiased periodogram for comparison. There is a very prominent bias near k=0\|{k}\|=0 for the non-debiased version, which illustrates why the proposed debiasing step is relevant when applying the method in practice.

Figure 3: Top row: Four example data patterns in a rectangle of area 2e4 units, exhibiting regularity, complete randomness, small scale clustering, and clustering with a left-right axis directionality. Second row: Their debiased periodograms excluding k=0{k}=0. Third row: Their multitapered debiased periodograms, 3×33\times 3 sine-tapers. Bottom row: Their rotationally averaged (un-tapered) periodograms, including the case without debiasing. The 2D summaries use a k{k}-grid with a step size 0.0050.005 in both directions. The rotational averaging was estimated at k=0.0050,0.0075,,0.2500\|{k}\|=0.0050,0.0075,...,0.2500 using a box kernel having a radius 1.5 times the k{k}-grid step size 0.005.

5 Covariance of bilinear spectral estimators

In this Section we determine the covariance of the periodogram, in anticipation of smoothing it. Comparing the developments of the previous sections to classical spectral analysis, it would seem that we are in a good position to estimate the spectrum, especially as we can assume the spectral deviation function f~(k)\widetilde{f}({k}) is smooth. The smoothness of f~(k)\widetilde{f}({k}) follows from the decay of ρ~(z)\widetilde{\rho}(z): simple Fourier theory stipulates that the decay of ρ~(z)\widetilde{\rho}(z) yields the smoothness of f~(k)\widetilde{f}({k}). From a mean-square error perspective, to understand how to smooth the periodogram away from zero wavenumber we need to further study the variance and covariance of direct spectral estimators.

Our main concerns are: 1) Is the variance finite? 2) Can we find a grid of wavenumbers so that the estimated spectrum is uncorrelated at these points? To answer such questions we must study the variance and covariance of the DFT, and so determine the second order properties of spectral estimators. Core to our understanding of smoothing will be the variance and covariance of the tapered estimators, established in the following theorem. For brevity define

ϕk(x)=e2πikx,k,xd.\phi_{k}(x)=e^{-2\pi i{k}\cdot x},\quad{k},x\in{\mathbb{R}}^{d}. (40)

With this notation we have

Imt(k)=x,yXBhm(x)hm(y)ϕk(xy),m=1,,M.I^{t}_{{m}}({k})=\sum_{x,y\in X\cap B}h_{{{m}}}(x)h_{{{m}}}^{\ast}(y)\phi_{k}(x-y),\quad{{m}}=1,\dots,M. (41)

Above we note that we have the same value of m{{m}} across xx and yy in (41) as this corresponds to a modulus square.

Theorem 5.1.

Let Imt(k)I^{t}_{{m}}({k}) denote the (tapered) periodogram given by  (38). The covariance between the (tapered) periodogram and itself across two wavenumbers k1,k2d{k}_{1},{k}_{2}\in{\mathbb{R}}^{d} then takes the form of

𝐂𝐨𝐯\displaystyle{\mathrm{\bf Cov}} {Imt(k1),Imt(k2)}=B4ρ(4)(x,y,z,v)hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(zv)𝑑x𝑑y𝑑v𝑑z\displaystyle\{I^{t}_{{m}}({k}_{1}),I^{t}_{{m^{\prime}}}({k}_{2})\}=\int_{B^{4}}\rho^{(4)}(x,y,z,v)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v)\;dxdydvdz
+B3ρ(3)(x,y,v)[hm(x)hm(y)hm(x)hm(v)ϕk1(xy){ϕk2(xv)+ϕk2(xv)}\displaystyle+\int_{B^{3}}\rho^{(3)}(x,y,v)\left[h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-v)+\phi_{-{k}_{2}}(x-v)\right\}\right.
+hm(x)hm(y)hm(y)hm(v)ϕk1(xy){ϕk2(yv)+ϕk2(yv)}\displaystyle\quad+h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(y-v)+\phi_{-{k}_{2}}(y-v)\right\}
+ϕk2(xy)hm2(v)hm(x)hm(y)+ϕk1(xy)hm2(v)hm(x)hm(y)]dxdydv\displaystyle\quad\left.+\phi_{{k}_{2}}(x-y)h_{{m}}^{2}(v)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+\phi_{{k}_{1}}(x-y)h_{{m^{\prime}}}^{2}(v)h_{{m}}(x)h_{{m}}(y)\right]\;dxdydv
+B2ρ(2)(x,y)[hm(x)hm(y)hm(x)hm(y)ϕk1(xy){ϕk2(xy)+ϕk2(xy)}\displaystyle+\int_{B^{2}}\rho^{(2)}(x,y)\left[h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-y)+\phi_{-{k}_{2}}(x-y)\right\}\right.
+{hm2(x)hm(x)hm(y)+hm2(x)hm(x)hm(y)}{ϕk1(xy)+ϕk2(xy)}+hm2(x)hm2(y)]dxdy\displaystyle\quad+\left\{h_{{m}}^{2}(x)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+h_{{m^{\prime}}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\right\}\left.\left\{\phi_{{k}_{1}}(x-y)+\phi_{{k}_{2}}(x-y)\right\}+h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(y)\right]\,dxdy
+λBhm2(x)hm2(x)𝑑xλ2λB2hm(x)hm(y)hm(x)hm(y)[ϕk1(xy)+ϕk2(xy)]ρ(2)(x,y)𝑑x𝑑y\displaystyle+\lambda\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\,dx-\lambda^{2}-\lambda\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\left[\phi_{{k}_{1}}(x-y)+\phi_{{k}_{2}}(x-y)\right]\rho^{(2)}(x,y)dxdy
{B2hm(x)hm(y)ϕk1(xy)ρ(2)(x,y)𝑑x𝑑y}{B2hm(x)hm(y)ϕk2(xy)ρ(2)(x,y)𝑑x𝑑y}.\displaystyle-\left\{\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\rho^{(2)}(x,y)dxdy\right\}\cdot\left\{\int_{B^{2}}h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\rho^{(2)}(x,y)dxdy\right\}.
Proof.

The proof is in Appendix F. ∎

This theorem derives the general expression for the covariance of the periodogram, and will help us determine how to do linear smoothing. The cross-correlation also relates to [8, Section V], and helps us understand when we can treat Fourier coefficients as uncorrelated (and, with asymptotic Gaussianity, as independent). Uncorrelatedness is important to the smoothing, as it ensures a variance reduction by averaging. Whilst the expression derived in Theorem 5.1 is general, requiring only homogeneity of the point process in space, it is hard to interpret, so let us study some special cases. Let us study this general correlation in the instance of the Poisson process, where we know that

ρ(n)(x1,,xn)=λn,\rho^{(n)}(x_{1},\dots,x_{n})=\lambda^{n}, (42)

and consider the case of k1=k2=k{k}_{1}={k}_{2}={k}.

For completeness we also define the spectral bandwidth bhb_{h} by

bh2\displaystyle b^{2}_{h} :=dk2|H(k)|2𝑑k.\displaystyle:=\int_{{\mathbb{R}}^{d}}\|{k}\|^{2}\left|H({k})\right|^{2}d{k}. (43)
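The bandwidth (43) can be computed in the spatial domain: with this Fourier convention [h](k)=2πikH(k)\mathcal{F}[h^{\prime}]({k})=2\pi i{k}H({k}), so Parseval's identity gives bh2=(2π)2|h(x)|2𝑑xb_{h}^{2}=(2\pi)^{-2}\int|h^{\prime}(x)|^{2}dx in 1D. A numerical sketch for the first sine taper, for which a direct calculation gives bh=1/(2l)b_{h}=1/(2l):

```python
import numpy as np

l = 10.0
x = np.linspace(-l / 2, l / 2, 200001)
h = np.sqrt(2.0 / l) * np.sin(np.pi * (x / l + 0.5))   # first 1D sine taper
hp = np.gradient(h, x)                                  # numerical h'(x)
# b_h^2 = int k^2 |H(k)|^2 dk = (2 pi)^{-2} int |h'(x)|^2 dx  (Parseval)
b2 = np.sum(hp ** 2) * (x[1] - x[0]) / (4 * np.pi ** 2)
print(np.sqrt(b2), 1 / (2 * l))   # both ~0.05, i.e. bandwidth shrinks as l grows
```

This makes explicit why the condition k>max{bhm,bhm}\|k\|>\max\{b_{h_{{m}}},b_{h_{{m^{\prime}}}}\} in the corollary below is mild for large domains: the bandwidth decays like 1/l1/l.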
Corollary 5.1 (Covariance of Spectral Estimates).

Let Imt(k)I^{t}_{{m}}({k}) denote the tapered periodogram given by  (38) for a Poisson process with intensity λ\lambda using a single taper m{{m}}, and assume that k>max{bhm,bhm}\|k\|>\max\{b_{h_{{{m}}}},b_{h_{{{m^{\prime}}}}}\}. We can determine the covariance between the periodogram using different tapers (m{{m}} and m{{m^{\prime}}}) or at different wavenumbers by the following:

𝐂𝐨𝐯\displaystyle{\mathrm{\bf Cov}} {Imt(k),Imt(k)}=λBhm2(x)hm2(x)𝑑x+λ2δmm+o(1).\displaystyle\{I^{t}_{{m}}({k}),I^{t}_{{m^{\prime}}}({k})\}=\lambda\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\,dx+\lambda^{2}\delta_{{{m}}{{m^{\prime}}}}+o(1).

Note that o(1)o(1) is in |B||B| diverging. The covariance between the (tapered) periodogram and itself at different wavenumbers (k1{k}_{1} and k2{k}_{2}) takes the form of

𝐂𝐨𝐯\displaystyle{\mathrm{\bf Cov}} {Imt(k1),Imt(k2)}=λhm44=o(1).\displaystyle\{I^{t}_{{m}}({k}_{1}),I^{t}_{{m}}({k}_{2})\}=\lambda\|h_{{m}}\|_{4}^{4}=o(1).
Proof.

The proof can be found in Appendices G and H. ∎

At this stage we have simplified our assumptions to the Poisson process, which is quite disappointing. The reason why we have chosen to do so is clear from Theorem 5.1: if we assume the process is Poisson then several terms cancel. However, if the Fourier transform is turning Gaussian as we have discussed in (25) in Section 4, with uniform integrability of Jh(k)J_{h}({k}) the following proposition holds.

Proposition 5.1.

Let Imt(k)I^{t}_{{m}}({k}) denote the tapered periodogram given by (38), and I~mt(k)\tilde{I}^{t}_{{m}}({k}) the tapered bias corrected periodogram. Assume XX satisfies the assumptions given for (25) and that f~(m)0<\|\widetilde{f}^{(m)}\|_{0}<\infty for m=2,3,4,5,6m=2,3,4,5,6. Then

𝐂𝐨𝐯{I~mt(k1),I~mt(k2)}=o(1)+|𝐄{J~m(k1)J~m(k2)}|2.\displaystyle{\mathrm{\bf Cov}}\{\widetilde{I}^{t}_{{m}}({k}_{1}),\widetilde{I}^{t}_{{m^{\prime}}}({k}_{2})\}=o(1)+\left|{\mathbf{E}}\{\widetilde{J}_{{m}}({k}_{1})\widetilde{J}^{\ast}_{{m^{\prime}}}({k}_{2})\}\right|^{2}.

To derive the form of the latter term we note

𝐄{J~m(k1)J~m(k2)}\displaystyle{\mathbf{E}}\{\widetilde{J}_{{m}}({k}_{1})\widetilde{J}^{\ast}_{{m^{\prime}}}({k}_{2})\} =𝐂𝐨𝐯{Jm(k1),Jm(k2)}=λδmm+λ2𝐂𝐨𝐯1{Jm(k1),Jm(k2)},\displaystyle={\mathrm{\bf Cov}}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\}=\lambda\delta_{{{m}}{{m^{\prime}}}}+\lambda^{2}{\mathrm{\bf Cov}}_{1}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\},

where we have f~(k)\tilde{f}({k}) given by (11) and so

𝐂𝐨𝐯1{Jm(k1),Jm(k2)}\displaystyle{\mathrm{\bf Cov}}_{1}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\} Rd×Rdρ~(xy)hm(x)e2πik1xhm(y)e2πik2y𝑑x𝑑y+o(1)\displaystyle\equiv\iint_{R^{d}\times R^{d}}\widetilde{\rho}(x-y)h_{{m}}(x)e^{-2\pi i{k}_{1}\cdot x}h^{\ast}_{{m^{\prime}}}(y)e^{2\pi i{k}_{2}\cdot y}\,dx\,dy+o(1)
=Rd×RdRdf~(k)e2πix(k1k)e2πiy(k2k)𝑑khm(x)hm(y)𝑑x𝑑y+o(1)\displaystyle=\iint_{R^{d}\times R^{d}}\int_{R^{d}}\widetilde{f}({k}^{\prime})e^{-2\pi ix\cdot({k}_{1}-{k}^{\prime})}e^{2\pi iy\cdot({k}_{2}-{k}^{\prime})}d{k}^{\prime}\cdot h_{{m}}(x)h^{\ast}_{{m^{\prime}}}(y)\,dx\,dy+o(1)
=Rdf~(k)Hm(k1k)Hm(k2k)𝑑k+o(1).\displaystyle=\int_{R^{d}}\widetilde{f}({k}^{\prime})H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})d{k}^{\prime}+o(1). (44)

We note that as Hm(k)H_{{m}}({k}) has concentrated support we will be able to apply similar arguments to those of Appendix D to determine nearly uncorrelated DFTs.

Proof.

See Appendix I. ∎

Definition 5.1.

The tapered Fourier wavenumber grid for a point process observed on a cuboid domain B(𝐥)B_{\square}(\bm{l}) corresponds to the points

𝒦M(B(𝒍))={kn:kn=(pn1τ(ϵ)/l1,,pndτ(ϵ)/ld),pnj},{\cal K}_{M}(B_{\square}(\bm{l}))=\{{k}_{n}\;:\;{k}_{n}=\begin{pmatrix}p_{n1}\tau(\epsilon)/l_{1},\dots,p_{nd}\tau(\epsilon)/l_{d}\end{pmatrix},\quad p_{nj}\in{\mathbb{Z}}\}, (45)

where τ(ϵ)\tau(\epsilon) is the smallest positive real number such that |Hm(τ(ϵ)k)Hm(k)𝑑k|<ϵ,\left|\int H_{{m}}(\tau(\epsilon)-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}^{\prime})\;d{k}^{\prime}\right|<\epsilon,\, for all m,m{1M}{{m}},{{m^{\prime}}}\in\{1\ldots{M}\}.

These results hint at a general methodology: for the Poisson process we can now see how to carry out non-parametric estimation. This is, however, not enough, and we need to determine the variance of the periodogram more generally. To go beyond Proposition 5.1 and avoid the Poisson assumption we refer back to (25). As $J_{h}(k)$ becomes Gaussian for larger sample sizes, its moments must also start to behave like those of the Gaussian $J_{h}(k)$.

Note that this result does not contradict Theorem 5.1 from a dimensional perspective, as that theorem concerns the modulus square of the Fourier transform. With these moments we can now consider the problem of estimation. We might feel an increasing degree of unease, as we have succumbed to the usual fallacy of only deriving the most general results for a Poisson process. However, as the observational domain becomes larger we expect $|H_{m}(k)|$ to concentrate, and so a non-constant spectrum will become less of an issue.

6 Linear Smoothing

Having already derived the first and second order properties of the periodogram, in this section we propose an accurate estimator of $f(k)$ formed by smoothing. If we fully characterise $f(k)$ then we have fully characterised $\gamma(z)$, as the Fourier transform is a bijection. Just as for a time series or random field, a key question concerns the correct method for quadratic characterisation and estimation.

We want to start from the random variables $\{I_{h}(k)\}$ or $\{\widetilde{I}_{h}(k)\}$ and directly from those quantities (non-parametrically) estimate the spectral density. As long as we seek to estimate $f(k)$ at points $k$ where the function $f(k)$ is continuous, averaging $\{I_{0}(k)\}$ (or $\{\widetilde{I}_{h}(k)\}$) locally seems like a sensible strategy. [23] proposed smoothing the 2D isotropic $\widetilde{I}_{D}(k)$ as defined in (16), and applied a weighted average smoother with user-tuned weights. This approach, however, does not determine how to sample wavenumbers, or how to remove correlation. We would additionally like to include the new form of bias removal before smoothing.

We have two clear options available to us: either smoothing the raw periodogram at the Fourier frequencies (de-biasing is not needed there, because we defined the grid so that the bias is zero), or using multitapering. We then have two possible estimators, namely the bias-corrected multitaper estimator of [76] and a multi-dimensional kernel density estimator [26]. Both are linear in the estimated spectrum. We shall use the periodogram rather than another estimator for the kernel density estimation, as it is easier to keep track of bias and correlation.

We therefore define the multitaper estimator [76] (and refer to [36, 51])

$$\overline{I}_{M}^{t}(k)=\frac{1}{M}\sum_{m=1}^{M}\widetilde{I}^{t}_{m}(k),\quad k\in{\mathbb{R}}^{d}, \tag{46}$$

where $\widetilde{I}^{t}_{m}(k)$ is defined in (38).
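A minimal numerical sketch of the multitaper estimator (46) might look as follows. We use sine tapers of the kind employed later in Section 8; the function names and the unit-energy normalisation are our own illustrative choices, and the debiasing term of the main text is omitted.

```python
import numpy as np

def sine_taper(m, x, l):
    """Orthogonal sine taper with index m = (m_1, ..., m_d) on the box
    prod_j [-l_j/2, l_j/2], normalised so that the integral of h_m^2 is 1."""
    m, x, l = np.atleast_1d(m), np.atleast_2d(x), np.atleast_1d(l)
    h = np.ones(x.shape[0])
    for j in range(x.shape[1]):
        h *= np.sqrt(2.0 / l[j]) * np.sin(np.pi * m[j] * (x[:, j] + l[j] / 2) / l[j])
    return h

def tapered_dft(x, k, h):
    """J_m(k) = sum over observed points x of h_m(x) exp(-2 pi i k . x)."""
    return np.sum(h * np.exp(-2j * np.pi * (x @ np.asarray(k))))

def multitaper_periodogram(x, k, l, M_tilde=3):
    """Average of M = M_tilde^2 tapered periodograms |J_m(k)|^2, as in eq. (46),
    without the debiasing correction discussed in the main text."""
    vals = []
    for m1 in range(1, M_tilde + 1):
        for m2 in range(1, M_tilde + 1):
            h = sine_taper((m1, m2), x, l)
            vals.append(np.abs(tapered_dft(x, k, h)) ** 2)
    return np.mean(vals)
```

As the text notes, averaging the $M$ nearly uncorrelated tapered periodograms is what reduces the variance by a factor of roughly $1/M$.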

Why is it advantageous to use the estimator in (46)? Because $\widetilde{I}^{t}_{m}(k)$ is uncorrelated across $m=1,\dots,M$, the variance of $\overline{I}^{t}_{M}(k)$ decreases like $1/M$. Indeed,

\begin{align}
{\mathrm{\bf Var}}\{\overline{I}_{M}^{t}(k)\}&=\frac{1}{M^{2}}\sum_{m=1}^{M}\sum_{m'=1}^{M}{\mathrm{\bf Cov}}\{\widetilde{I}^{t}_{m}(k),\widetilde{I}^{t}_{m'}(k)\}\nonumber\\
&=\frac{1}{M^{2}}\sum_{m=1}^{M}\sum_{m'=1}^{M}\left(\lambda\delta_{mm'}+\lambda^{2}(\tilde{f}(k)+o(1))\delta_{mm'}\right)^{2}\nonumber\\
&=\frac{\lambda^{2}}{M^{2}}\sum_{m=1}^{M}\left(1+\lambda\tilde{f}(k)\right)^{2}+o(1/M)=\frac{\lambda^{2}}{M}\left\{1+\lambda\tilde{f}(k)\right\}^{2}+o(1/M)\nonumber\\
&=\frac{1}{M}\{f(k)+o(1)\}^{2}+o(1/M), \tag{47}
\end{align}

where the last equality follows from Lemma J.1, which is a consequence of Proposition 5.1. We shall explore these results in practical scenarios in simulations in Section 8.

An additional useful modification is to consider a one-dimensional smoothing function $W(k)$ and a bandwidth $\bm{\Omega}$, implement multidimensional smoothing, see e.g. [26], and average over directions, thus reducing the statistic to the magnitude $t=\|k\|$ only,

$$\overline{I}_{h}(t;\Omega)=\sum_{k'\in{\cal K}}\Omega^{-1}W\left(\Omega^{-1}\left(\|k'\|-t\right)\right)\widetilde{I}_{h}(k'), \tag{48}$$

which we refer to as the “rotationally averaged periodogram”. [56] call a version of this function the $R$-spectrum. Alternatively, one could instead average over the magnitudes to arrive at a summary function of direction. In 2D the convenient argument would be the polar angle; [56] call this version the $\theta$-spectrum and use it to study anisotropy.
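The rotational averaging of (48) with a box kernel, as used in the simulations of Section 8, can be sketched as follows (function and argument names are our own hypothetical choices):

```python
import numpy as np

def rotational_average(k_grid, I_vals, t, bw):
    """Rotationally averaged periodogram in the spirit of eq. (48), using a
    box kernel: average I(k') over grid wavenumbers k' with | ||k'|| - t | <= bw."""
    mags = np.linalg.norm(k_grid, axis=1)
    mask = np.abs(mags - t) <= bw
    if not mask.any():
        return np.nan  # no wavenumbers in the annulus at this magnitude
    return I_vals[mask].mean()
```

Applied over a grid of magnitudes $t$, this reduces the $d$-dimensional periodogram to a one-dimensional summary in $\|k\|$.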

7 Isotropic spectral estimation

In this section we show how to form isotropic estimators and discuss their properties. Isotropic estimators are vitally important in two and higher dimensions, where full spectra become increasingly difficult to interpret. To simplify the representation, it is commonly assumed that we are analysing isotropic processes, and that only isotropic summaries need to be produced.

We therefore consider the special case of stationary and isotropic point processes, for which the complete covariance function depends only on distance, $\gamma(z)=\gamma_{I}(\|z\|)$ [20, p. 310–311]. Such radial functions [32] are special, as we do not require any orientational specificity in our spectral representation. The isotropy of $\gamma(z)$ transfers through the Fourier transform to the sdf, so in turn $f(k)=f_{I}(\|k\|)$. We shall refer to the treatment of isotropic random fields when implementing analysis [59]. An alternative is discussed by [27] and [25], for random fields.

The orientation invariance of the spectrum leads to a dimensional reduction of the multidimensional Fourier transform. Three basic concepts are useful for the isotropic analysis. Recall the Bessel function of order $\nu>-1/2$,

$$\mathcal{J}_{\nu}(t)=\frac{t^{\nu}}{2^{\nu}\pi^{1/2}\Gamma(\nu+1/2)}\int_{0}^{\pi}\sin^{2\nu}(u)\cos[t\cos(u)]\,du,\qquad t\geq 0.$$
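This integral representation is easy to verify numerically; the sketch below (our own illustration, not part of the paper's methodology) evaluates it by quadrature and can be checked against `scipy.special.jv`.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def bessel_poisson(nu, t):
    """J_nu(t) via the Poisson-type integral representation quoted in the text,
    valid for nu > -1/2 and t > 0."""
    integral, _ = quad(lambda u: np.sin(u) ** (2 * nu) * np.cos(t * np.cos(u)),
                       0.0, np.pi)
    return (t ** nu) / (2 ** nu * np.sqrt(np.pi) * gamma(nu + 0.5)) * integral
```

For $\nu=0$ the prefactor reduces to $1/\pi$, recovering the familiar integral for $\mathcal{J}_0$.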

It is connected to the $d$-dimensional Fourier transform via [74, Sec 3]

$$\int_{S^{d-1}}e^{-2\pi tiv\cdot u}\,du=2\pi\,t^{-(d/2-1)}\mathcal{J}_{d/2-1}(2\pi t)\quad\forall v\in S^{d-1},\;t\geq 0;$$

we can also note relationships between radial and Cartesian Fourier representations; see for example [32] or [28].

Recall also the (one-dimensional) Hankel transform of order $\nu$ of a function $g:[0,\infty)\to{\mathbb{R}}$,

$$\mathcal{H}_{\nu}[g](t)=\int_{0}^{\infty}g(r)\mathcal{J}_{\nu}(tr)\,r\,dr,\qquad t\geq 0.$$

Then if we assume $X$ is a stationary and isotropic process with a radial second order product density $\rho^{(2)}_{I}$, the sdf at wavenumber $k$ with $t=\|k\|>0$ takes the form

\begin{align}
f_{I}(t=\|k\|):=f(k)&=\lambda+\int_{{\mathbb{R}}^{d}}\left\{\rho^{(2)}(z)-\lambda^{2}\right\}e^{-2\pi ik\cdot z}\,dz\nonumber\\
&=\lambda+\int_{0}^{\infty}\left\{\rho_{I}^{(2)}(r)-\lambda^{2}\right\}r^{d-1}\int_{S^{d-1}}e^{-2\pi ri\,k\cdot u}\,du\,dr\nonumber\\
&=\lambda+2\pi\int_{0}^{\infty}\left\{\rho_{I}^{(2)}(r)-\lambda^{2}\right\}\left(\frac{r}{t}\right)^{d/2-1}\mathcal{J}_{d/2-1}(2\pi tr)\,r\,dr\nonumber\\
&\equiv\lambda+2\pi\,\mathcal{H}_{d/2-1}\left[\left\{\rho_{I}^{(2)}(r)-\lambda^{2}\right\}\left(\frac{r}{t}\right)^{d/2-1}\right](2\pi t), \tag{49}
\end{align}

i.e. the sdf is linearly related to the Hankel transform of the complete covariance function.
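As a numerical illustration of (49) in $d=2$, the sketch below computes the order-0 Hankel transform of a hypothetical Gaussian complete covariance by quadrature; for this choice the transform is available in closed form ($2\pi\sigma^{2}e^{-2\pi^{2}\sigma^{2}t^{2}}$), so the identity can be checked. All names are our own, and the Gaussian covariance is purely illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import j0

def sdf_isotropic_2d(g, lam, t):
    """Eq. (49) for d = 2: f_I(t) = lambda + 2 pi H_0[g](2 pi t), where
    g(r) = rho_I^(2)(r) - lambda^2 is the radial complete covariance."""
    integral, _ = quad(lambda r: g(r) * j0(2 * np.pi * t * r) * r, 0.0, np.inf)
    return lam + 2 * np.pi * integral

# hypothetical Gaussian complete covariance with scale sigma
sigma, lam, t = 2.0, 0.01, 0.05
g = lambda r: np.exp(-r ** 2 / (2 * sigma ** 2))
# closed-form 2D Fourier transform of g at magnitude t
closed = lam + 2 * np.pi * sigma ** 2 * np.exp(-2 * np.pi ** 2 * sigma ** 2 * t ** 2)
```

The quadrature value should agree with `closed` to high accuracy, confirming that the radial sdf is the stated Hankel transform plus $\lambda$.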

Finally we recall the radially averaged set covariance function,

$$\bar{\nu}_{B}(r):=\frac{1}{|S^{d-1}|}\int_{S^{d-1}}|B\cap B_{ru}|\,du,\quad r\geq 0,$$

where $\nu_{B}(z):=|B\cap B_{z}|$ is the set covariance function of the set $B$ [39, Appendix B.3].
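For a rectangular window $B$ the set covariance has the simple closed form $\nu_{B}(z)=\prod_{j}(l_{j}-|z_{j}|)_{+}$, and $\bar{\nu}_{B}$ can be approximated by averaging over directions. A sketch, with our own helper names:

```python
import numpy as np

def set_covariance_box(z, l):
    """nu_B(z) = |B intersect B_z| for a box B with side lengths l."""
    z, l = np.atleast_1d(z), np.atleast_1d(l)
    return float(np.prod(np.clip(l - np.abs(z), 0.0, None)))

def radial_set_covariance_box(r, l, n_angles=720):
    """Radially averaged set covariance bar-nu_B(r) for a 2D box, by averaging
    nu_B(r u) over directions u on the unit circle."""
    angles = np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False)
    vals = [set_covariance_box(r * np.array([np.cos(a), np.sin(a)]), l)
            for a in angles]
    return float(np.mean(vals))
```

At $r=0$ this returns $|B|$, and it decreases monotonically to zero as the shift grows beyond the window diameter.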

We start by noting the expectation of (18); by direct calculation it follows that this takes the form:

$${\mathbf{E}}\{\bar{I}_{h}(\|k\|)\}=\lambda+\int_{B}\int_{B\cap B_{-z}}h_{I}\left(\|z\|\right)\,\mathcal{J}_{d/2-1}\left(2\pi\|k\|\|z\|\right)\,\|z\|^{d/2-1}\,\tilde{\rho}^{(2)}(\|z\|)\,dz\,dx.$$

The change of variables to polar coordinates multiplies the integrand by $\|z\|^{d-1}$, and this corresponds to a Hankel transform of order $d/2-1$ of the taper $h(\cdot)$ multiplied by the product density. For more details on the Hankel transform see [32].

To gain insight into how we might estimate the radial sdf, besides numerically rotation-averaging the $d$-dimensional periodogram using Eq. (48), we can analytically rotation-average Bartlett’s periodogram (Eq. 14): with $t\geq 0$,

\begin{align}
\overline{I}_{0}(t)&:=\frac{1}{|S^{d-1}|}\int_{S^{d-1}}I_{0}(tu)\,du \tag{50}\\
&=\frac{1}{|B||S^{d-1}|}\sum_{x\in X\cap B}\sum_{y\in X\cap B}\int_{S^{d-1}}e^{-2\pi i(tu)\cdot(x-y)}\,du\nonumber\\
&=\hat{\lambda}+\frac{1}{|B||S^{d-1}|}\sum_{x,y\in X\cap B}^{\neq}\int_{S^{d-1}}e^{-2\pi ti(x-y)\cdot u}\,du\nonumber\\
&=\hat{\lambda}+\frac{2\pi}{|B||S^{d-1}|t^{d/2-1}}\sum_{x,y\in X\cap B}^{\neq}\mathcal{J}_{d/2-1}(2\pi t\|x-y\|)\,\|x-y\|^{-(d/2-1)}, \tag{51}
\end{align}

which in the planar case simplifies to

$$\stackrel{d=2}{=}\;\hat{\lambda}+\frac{1}{|B|}\sum_{x,y\in X\cap B}^{\neq}\mathcal{J}_{0}(2\pi t\|x-y\|).$$

This explains the motivation behind the isotropic estimator discussed by [6] and [23]. The aforementioned authors prefer to normalise by $N(B)$ rather than by $|B|$, corresponding to dividing (51) by $\hat{\lambda}$. Like the classical periodogram, the isotropic “shortcut” estimator of Eq. (51) is highly biased near 0, as has already been observed by [23]. The biases are described in the following proposition.
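In the planar case the shortcut estimator (51) reduces to a sum of Bessel functions over pairwise distances and can be sketched directly (the helper name is our own; the bias near 0 discussed above is left uncorrected here):

```python
import numpy as np
from scipy.special import j0

def isotropic_periodogram_2d(x, area, t):
    """Planar (d = 2) rotation-averaged Bartlett periodogram, eq. (51):
    lambda_hat + (1/|B|) * sum over ordered pairs x != y of J_0(2 pi t ||x-y||)."""
    n = len(x)
    lam_hat = n / area
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)  # pairwise distances
    off = ~np.eye(n, dtype=bool)                                 # drop x == y terms
    return lam_hat + j0(2 * np.pi * t * d[off]).sum() / area
```

For two points at distance $r$ in a window of area $|B|$ this gives $2/|B| + 2\mathcal{J}_0(2\pi t r)/|B|$, which is easy to verify by hand.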

Proposition 7.1.

The isotropic estimator given by (51) has expectation

$${\mathbf{E}}\left\{\bar{I}_{0}(t)\right\}=\lambda+2\pi\,\mathcal{H}_{d/2-1}\left[\left\{\rho_{I}^{(2)}(r)-\lambda^{2}\right\}\left(\frac{r}{t}\right)^{d/2-1}\frac{\bar{\nu}_{B}(r)}{|B|}\right](2\pi t)+2\pi\lambda^{2}\,\mathcal{H}_{d/2-1}\left[\left(\frac{r}{t}\right)^{d/2-1}\frac{\bar{\nu}_{B}(r)}{|B|}\right](2\pi t).$$
Proof.

The proof follows by applying Campbell’s theorem and adding and subtracting

$$2\pi\lambda^{2}\,\mathcal{H}_{d/2-1}\left[\left(\frac{r}{t}\right)^{d/2-1}\frac{\bar{\nu}_{B}(r)}{|B|}\right](2\pi t)=\frac{\lambda^{2}}{|S^{d-1}||B|}\int_{S^{d-1}}T(B,tu)\,du.$$ ∎

If we compare the formula for the expectation of the isotropic sdf in Eq. (7) with the expectation in Proposition 7.1, we recognise two sources of bias, just as for the periodogram. There is a convolution with the function $\bar{\nu}_{B}/|B|$, coming from the finite observation window as in the periodogram but this time radially averaged, and a centring bias term that is a radially averaged version of the $d$-dimensional bias. Diggle’s truncation-to-local-minimum construction, see (16), tries to correct for these biases but, among other things, it will not work well for a clustered process, as the spectrum near zero will be underestimated.

To work towards a tapered estimator, consider rotation-averaging the debiased, tapered estimator of (28) in expectation,

\begin{align}
&{\mathbf{E}}\,\frac{1}{|S^{d-1}|}\int_{S^{d-1}}\widetilde{I}_{h}(tu)\,du \tag{52}\\
=\;&\lambda+\frac{2\pi}{|S^{d-1}|t^{d/2-1}}{\mathbf{E}}\left[\sum_{x,y\in X\cap B}^{\neq}h(x)h(y)\mathcal{J}_{d/2-1}\left(2\pi t\|x-y\|\right)\|x-y\|^{-(d/2-1)}\right]-\frac{\lambda^{2}}{|S^{d-1}|}\int_{S^{d-1}}\left|H(tu)\right|^{2}\,du.\nonumber
\end{align}

The problem is that a radial function $h_{I}$ for which $h_{I}(\|x-y\|)=h(x)h(y)$ might not exist. But if we consider tapering the observed differences $\{x-y:x,y\in X\cap B\}$ instead of tapering in $B$, then we can devise a taper $h_{I}$ on $B\oplus B=\bigcup_{x\in B}\{B+x\}$. We thus propose the following isotropic estimator:

$$\bar{I}_{h}(t):=\hat{\lambda}+\frac{2\pi}{|S^{d-1}|t^{d/2-1}|B|}\sum_{x,y\in X\cap B}^{\neq}h_{I}(x-y)\mathcal{J}_{d/2-1}(2\pi t\|x-y\|)\,\|x-y\|^{-(d/2-1)}-\widehat{\lambda^{2}}\,Bias_{I}(t). \tag{53}$$

The bias correction term consists of the rotation average of both the taper and the set covariance of $B$,

\begin{align*}
Bias_{I}(t)&=2\pi\,\mathcal{H}_{d/2-1}\left[\left(\frac{r}{t}\right)^{d/2-1}\frac{\bar{p}_{B,h}}{|B|}\right](2\pi t),\quad\text{where}\\
\bar{p}_{B,h}(r)&=|S^{d-1}|^{-1}\int_{S^{d-1}}|B\cap B_{ru}|\,h_{I}(ru)\,du,
\end{align*}

and depends only on $t=\|k\|$. We warn the reader that the estimator of (53) is not necessarily positive but, like previous authors [50, 4], we sacrifice positivity in order to remove a large part of the bias. Additional issues arise from the estimation of the required $\lambda^{2}$, as the standard estimator $\hat{\lambda}^{2}$ is biased, with one term depending on $\lambda/|B|$ and another depending on the unknown second order product density of the underlying process. In our examples we use the slightly less biased estimator $\widehat{\lambda^{2}}=N(B)[N(B)-1]/|B|^{2}$, which removes the first order bias. On the positive side, note that the debiased isotropic estimator without particular tapering (i.e. $h_{I}\equiv 1$) can be formed without any tuning parameters.

We may now ask what radial taper to use. Tapers are normally designed for processes observed in discrete time or space. We have discussed using separable tapers, and now note that the choice of taper needs to match the observational domain. For example, it is difficult to match a square observational domain with a radial taper, unless one chooses a compact taper inscribed in the box. There are therefore three considerations: 1) demanding the taper be separable in space, 2) isotropic in space, or 3) exactly compact in space. Discrete and isotropic tapers have been determined numerically by [65]. This will not be an option for us, as we need to evaluate the tapers at random locations.

Slepian and co-authors have studied the design of tapers in arbitrary dimensions [66]. Often the solutions are in terms of prolate spheroidal wave functions. Their discrete analogue is the prolate spheroidal wave sequences, which are similar to the set of Hermite functions [78]. However, the isotropic estimator can be seen as an estimator of the Fourier transform of the (non-stationary) first moment of the difference process $Z=\{z=x-y:x,y\in X\cap B\}$, with the connection $\rho^{(2)}(z)=\lambda_{Z}(z)/|B\cap B_{z}|$.

Assuming the observation window is a cuboid $B=B_{\square}(\bm{l})$, the difference vector observation window is $B\oplus B=\prod_{j=1}^{d}[-l_{j},l_{j}]$, and we can create a $d$-dimensional taper related to the Hermite functions as the product of squared exponentials

$$h_{I}(z)=\prod_{j=1}^{d}e^{-az_{j}^{2}/(2l_{j})^{2}},\quad z\in{\mathbb{R}}^{d}. \tag{54}$$

We see that if $l_{j}=l$ for all $j$, then $h_{I}$ becomes radial. This is related to circular harmonic decompositions, see [41]. The authors of [78] recommend using a scaling factor which we have adjusted for a sampling region of $z_{j}\in(-l_{j},l_{j})$, but which they match to the spheroidal wave functions. We propose setting $a=25$, as numerically this neither down-weights too much data nor has too large a jump near the border of $B\oplus B$.
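The squared-exponential taper of (54) is a one-liner; the sketch below (our own naming) also makes it easy to confirm that the taper is radial when the side lengths are equal, since then it depends on $z$ only through $\|z\|$.

```python
import numpy as np

def sq_exp_taper(z, l, a=25.0):
    """Separable squared-exponential taper of eq. (54), defined on the
    difference window B (+) B = prod_j [-l_j, l_j]; a = 25 as in the text."""
    z, l = np.atleast_2d(z), np.atleast_1d(l)
    return np.exp(-a * np.sum(z ** 2 / (2 * l) ** 2, axis=1))
```

With $a=25$ the taper equals 1 at the origin and decays to roughly $e^{-25/4}\approx 0.002$ at the border point $(l,0,\dots,0)$, i.e. a small but non-zero jump, consistent with the discussion above.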

In a sense our construction above is a “fix”, as it only works for the 0th Hermite function; using more than one taper will not lead to isotropic functions, even if $l_{j}=l$ for all $j$. This is not surprising, as the sampling domain will leave an imprint, and capturing this set of information requires using more than the first taper; see the analogous discussion in 1D [58, Ch. 6]. Also, if the spatial sampling region is a highly elongated rectangle rather than a square, then another solution needs to be chosen. Finally, we may ask ourselves what happens if we have a point process $X$ which is anisotropic but we still evaluate (15) in 2D for the pattern. Because it incorporates angular averaging, we expect the right-hand side of the expression to depend only on $\|k\|$, as it does.

Proposition 7.2.

Expectation of Diggle’s 2D Estimator II. The large area expectation of Diggle’s estimator given in (15), when $B$ is a cuboidal box $B_{\square}(\bm{l})$ and the spectrum is not necessarily isotropic, takes the form, in terms of $\widetilde{f}(k)$ as defined in (11),

$${\mathbf{E}}\{I_{D}(k)\}=\lambda+\lambda^{2}\int_{{\mathbb{R}}^{d}}\widetilde{f}(k')\frac{\delta(\|k\|-\|k'\|)}{\|k\|}\,dk'+\int_{{\mathbb{R}}^{d}}\frac{\delta(\|k\|-\|k'\|)}{\|k\|}|B|^{-1}T(B,w-k')\,dk'+o(1), \tag{55}$$

where $T(B,k)$ is defined in Lemma 4.1.

Proof.

The proof is provided in Appendix K. ∎

This shows what happens when we calculate isotropic summaries of quantities that are not isotropic per se. Thus the first proposition determines the expectation of the analytically orientation-averaged periodogram, and the second shows how Diggle’s estimator mixes orientational information to produce an isotropic estimator.

8 Simulations

To study the behaviour of the estimators we have discussed, and additionally some of the debiasing results, we conducted an extensive simulation study.

8.1 Details for executing the simulation study

We simulated three stationary models in 2D to represent the archetypal spatial patterns of regularity, complete randomness and clustering. Throughout the simulations the intensity, i.e. the expected number of points per unit area, was kept constant at $\lambda\equiv 0.01$. The processes were simulated in square observation windows $B_{n}=[-l_{n}/2,l_{n}/2]^{2}$ of increasing area, such that on average we observed $\lambda|B_{n}|=\lambda l_{n}^{2}=n=25,50,100,200,400$ or $800$ points. For example, $l_{n=100}=100$ and $l_{n=400}=200$ spatial units. We chose this regime both to explore the often encountered small-data scenarios and to ascertain asymptotic behaviour. The models below are tuned so that they produce patterns with visually distinct structures at $l_{n=100}$.

Complete randomness, i.e. the Poisson process, has no adjustable parameters once $\lambda$ is fixed. Spatial regularity is represented by the Matérn type II process [39, Section 6.5.2] in two variants, one with hard-core radius 2 (variant “r2”) and one with radius 5 (variant “r5”), which correspond to about 40% and 90% of the maximum allowed radius for the process given $\lambda$, respectively. The clustering behaviour is represented by the Thomas process [39, Section 6.3.2] with cluster intensity $\kappa$ and Gaussian dispersal kernel standard deviation $\sigma$, again in two variants, one with many small or tight clusters ($\kappa=0.6\lambda$, $\sigma=2$, variant “MS”) and one with a few large clusters ($\kappa=0.3\lambda$, $\sigma=6$, variant “FL”). The per-cluster expected point count $\mu$ was then fixed via the property $\mu\kappa=\lambda$ to 1.67 and 3.33, respectively. The theoretical pair correlation functions and spectral density functions of the models are depicted in Figure 2. Example patterns of each model and observation window combination are given in Supplement Figure 8.
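For readers wishing to reproduce a similar study outside R/spatstat, a Thomas process can be simulated directly from its cluster construction. The sketch below is a minimal illustration of our own (with a simple edge-correction padding of $4\sigma$ for parents outside the window), not the spatstat implementation.

```python
import numpy as np

def simulate_thomas(kappa, sigma, mu, l, rng):
    """Thomas cluster process on the square [-l/2, l/2]^2: Poisson(kappa)
    parents per unit area, each with Poisson(mu) offspring displaced by an
    isotropic Gaussian with standard deviation sigma. Parents are simulated
    on a padded window to reduce edge effects."""
    pad = 4 * sigma
    L = l + 2 * pad
    n_parents = rng.poisson(kappa * L ** 2)
    parents = rng.uniform(-L / 2, L / 2, size=(n_parents, 2))
    pts = []
    for p in parents:
        n_off = rng.poisson(mu)
        pts.append(p + sigma * rng.standard_normal((n_off, 2)))
    pts = np.concatenate(pts) if pts else np.empty((0, 2))
    inside = np.all(np.abs(pts) <= l / 2, axis=1)
    return pts[inside]
```

With $\kappa=0.6\lambda=0.006$, $\sigma=2$ and $\mu=1.67$ (the “MS” variant) the realised intensity is approximately $\kappa\mu=\lambda=0.01$, i.e. about 100 points on the $l_{n=100}=100$ window on average.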

For the estimation of the 2D spectra we fixed the wavenumber grid to $k\in[-0.3,0.3]^{2}$ with 101 equidistant steps in each dimension, giving a step size of 0.006 in both dimensions. This scale was chosen to cover the interesting range of non-constant values for all models (cf. Figure 2). To reduce the effect of high wavenumber noise, only the values on the sub-grid $k\in[-0.2,0.2]^{2}$ were considered when integrating over $k$ for the quality metrics discussed below. When a rotationally averaged curve was to be computed using Eq. (48), the averaging was done over the magnitude grid $\|k\|=0.003,0.006,\dots,0.300$ using a box kernel with, if not otherwise stated, radius $1.25\cdot 0.006$. The radial Hermite taper parameter was fixed to $a=25$.

We summarised the quality of each estimator $I_{.}=I_{.}(k;\mathbf{x})$ under all combinations of a model $\mathcal{M}$ and observation window $B_{n}$, say $\mathcal{M}_{n}$, with an integrated summary. First, we estimated the per-wavenumber variance $V(k;\mathcal{M}_{n})=\mathrm{Var}\left[I_{.}(k;\mathbf{x})\,|\,\mathbf{x}\sim\mathcal{M}_{n}\right]$, bias $Bias(k;\mathcal{M}_{n})={\mathbf{E}}\left[I_{.}(k;\mathbf{x})-f_{\mathcal{M}}(k)\,|\,\mathbf{x}\sim\mathcal{M}_{n}\right]$ and mean square error $MSE(k;\mathcal{M}_{n})=V(k;\mathcal{M}_{n})+Bias^{2}(k;\mathcal{M}_{n})$, where $f_{\mathcal{M}}$ stands for the theoretical sdf of the model $\mathcal{M}$. These were then summarised further to the integrated variance $iVar(\mathcal{M}_{n})=\sum_{k}V(k;\mathcal{M}_{n})$, the integrated squared bias $iBias^{2}(\mathcal{M}_{n})=\sum_{k}Bias^{2}(k;\mathcal{M}_{n})$ and $iMSE(\mathcal{M}_{n})=iVar(\mathcal{M}_{n})+iBias^{2}(\mathcal{M}_{n})$. Smaller $iVar$, $iBias^{2}$ and $iMSE$ indicate better quality. The quantities were estimated from 1000 simulations of every $\mathcal{M}_{n}$.

The estimator for a periodogram is given in equation (28), with the debiasing term included. Bartlett’s periodogram (“periodogram”) has the taper $h_{0}(x)=|B_{n}|^{-1/2}\mathbf{1}_{B_{n}}(x)$ for window $B_{n}$. For multitapering (“mt $M$”), defined in equation (38) with $M=\tilde{M}^{2}$ tapers, each having a parameter $m=(\tilde{m}_{1},\tilde{m}_{2})\in\{1,\dots,\tilde{M}\}^{2}$, we used the orthogonal sine tapers $h_{m}(x)=\mathbf{1}_{B_{n}}(x)\prod_{j=1}^{2}\sin[\pi\tilde{m}_{j}(x_{j}+l_{j}/2)/l_{j}]$. For the kernel smoothed estimators (“smoothed $b$”), we first compute Bartlett’s periodogram and then, given the estimate $I_{0}$ on the wavenumber grid, convolve it with a $b\times b$ discrete template having approximately Gaussian weights.

All computations were done using the R software [62, v.3.6.3]. Simulations were done with the help of the R package spatstat [3, v.1.64-1]. Each of the discussed estimators was programmed into an R package ppspectral, available on request from the first author.

Table 1: The fraction of bias removed by the proposed bias correction. Maximum is 1.00.
Model        Estimator       n=25   50    100   200   400   800
MatérnII r5  periodogram     1.00  1.00  1.00  0.98  0.99  1.00
MatérnII r5  mt $M=3^{2}$    0.99  1.00  1.00  1.00  1.00  1.00
MatérnII r2  periodogram     1.00  1.00  1.00  0.99  0.99  1.00
MatérnII r2  mt $M=3^{2}$    1.00  1.00  1.00  1.00  1.00  1.00
Poisson      periodogram     1.00  1.00  1.00  0.98  0.99  1.00
Poisson      mt $M=3^{2}$    1.00  1.00  1.00  1.00  1.00  1.00
Thomas FL    periodogram     0.93  0.98  0.99  0.97  0.99  1.00
Thomas FL    mt $M=3^{2}$    0.35  0.86  0.98  1.00  1.00  1.00
Thomas MS    periodogram     0.98  0.99  1.00  0.97  0.99  1.00
Thomas MS    mt $M=3^{2}$    0.89  0.98  1.00  1.00  1.00  1.00

8.2 The effect of debiasing

As we saw in Figure 3, the debiasing term has a large effect on the quality of the estimates. Table 1 provides the fractions of bias removed by the debiasing term for Bartlett’s periodogram and the $M=3^{2}$ multitapered periodogram. Nearly all bias is removed, the only notable exception being the 65% of bias left in the multitapered periodogram when the data is an arguably very small sample of the Thomas FL variant.

We illustrate the form of the biases in Figure 4 for two model variants. The centring bias is always positive, as expected from equation (27). Bartlett’s periodogram bias (cf. Lemma 4.1) is concentrated along the axes, where one of the sinc functions is constantly 1, and near the origin, where the second sinc function grows to 1. The bias of the multitapered periodogram has a square shape around the origin.

Refer to caption
Figure 4: The effect of bias removal for two of the models described in the text. Theoretical sdf on the left, and then towards the right, estimated mean of: Bartlett’s periodogram, biased and debiased; multitapered $M=3^{2}$, biased and debiased. Estimated with sample size $n=800$.

8.3 Overall quality of the debiased estimators

Figure 5 summarises the estimated integrated variance, squared bias and MSE of the estimators with various tuning parameters. Only the debiased estimators are included, and the values are given on a $\log_{10}$ scale, relative to the baseline given by Bartlett’s periodogram.

Multitapering with $M=1^{2}$ (i.e. only one taper) does not differ in iMSE from the periodogram, as the benefit of averaging does not apply. With more than one taper the variance is reduced, e.g. with $M=3^{2}$ it goes down by 90%, while the bias stays at baseline; but adding more and more tapers ($M=6^{2}$) starts to introduce bias, especially for low sample sizes. The bias is more prominent for the clustered cases, and based on the earlier discussion this is due to oversmoothing at small wavenumbers. Both the variance reduction and the oversmoothing happen with the post-hoc smoothing as well. From the results we can confirm that some smoothing is beneficial, but again a balance must be struck to avoid oversmoothing (cf. Section 6). The regular process is more resilient to oversmoothing at reasonable sample sizes.

Refer to caption
Figure 5: Relative integrated variance (top row), integrated squared bias (middle), and integrated mean square error (bottom) for different models and estimators of the spectral density at various levels of sample size $n$. The values are relative to the debiased Bartlett’s periodogram, which has the value 0 on the presented $\log_{10}$ scale. For example, a value of -1 means the error is of order 0.1 relative to the periodogram.

8.4 Rotational averaging and isotropic estimation

Isotropic estimation, either by averaging radially or by the isotropic shortcuts discussed in Section 7, is an important dimension reduction step for easier visual inspection of the periodogram (particularly for 3D data). The kernel-based rotational averaging depends on a bandwidth parameter, and we studied how this affects the quality metrics. We set the bandwidth radius to $BF\cdot 0.006$ and varied the bandwidth factor $BF$ from 1 to 10. Figure 6 provides the $iBias^{2}$ and $iMSE$ for the debiased Bartlett’s periodogram when comparing the rotationally averaged estimates to the true 1D isotropic sdfs (cf. Figure 2).

The Poisson process has a constant sdf, so oversmoothing will not be penalised. For the clustered processes even a small bandwidth introduces bias, but this is typically more than offset by the reduction in variance. The error for the few large clusters variant is more sensitive to changes in the bandwidth than that for the many small clusters variant. This is because the effective range of wavenumbers where its sdf exhibits structure is more concentrated, and thus easier to smooth out (cf. Figure 2).

The strongly regular variant “r5” is similar to the clustered cases in that an optimal bandwidth is clearly present. For the less regular “r2” variant the error behaves like that of the Poisson process, possibly because the target sdf is a weak oscillation around a constant, which is hard to detect. If we look at the integrated squared bias, however, we do see that with large enough data oversmoothing is detectable.

We note that since multitapered periodograms and locally averaged periodograms are already smoothed, their rotationally averaged estimates exhibit smoothing bias at smaller bandwidth factors (results not shown here).

Refer to caption
Figure 6: Integrated squared bias (top) and integrated MSE of the rotationally averaged Bartlett’s periodogram when compared to the true 1D isotropic sdf. The rotational averaging bandwidth was $BF\cdot 0.006$, where the bandwidth factor $BF$ is on the x-axis. The models are described in the text.
Table 2: Scores $iMSE$, $iBias^{2}$ and $iVar$ for isotropic estimators, relative to the radially averaged debiased periodogram with optimised smoothing bandwidth.
No taper Squared-exponential taper
Model iMSE iBias2 iVar iMSE iBias2 iVar
MatérnII r5 6.12 0.21 8.57 3.51 0.29 4.90
MatérnII r2 22.40 3.52 23.07 15.59 1.09 16.12
Poisson 15.72 10.53 15.89 13.76 5.75 13.77
Thomas FL 1.22 0.04 1.81 1.32 0.14 1.97
Thomas MS 4.34 0.01 6.50 3.58 0.04 5.35

To compare the radial averaging to direct isotropic estimation, we estimated the debiased isotropic periodogram and the squared-exponential tapered isotropic periodogram with $a=25$. Table 2 provides the quality scores for the isotropic and the tapered isotropic periodograms relative to the radially averaged periodogram with the best bandwidth for each model and data size, cf. Figure 6. The values are medians over the sample sizes $n\geq 50$.

In the iMSE sense the radially averaged periodogram performs best, chiefly because iMSE is of the order of iVar and the radial averaging greatly reduces the variance. However, in terms of bias the isotropic estimator provides more accuracy, except for the Poisson process, for which no oversmoothing is possible. Adding a smooth taper to the isotropic estimator further reduces the bias, but can increase the variance, likely because some data near the edges is filtered out. The bias occurs near $\|k\|=0$, mostly when $\|k\|<1/(\sqrt{2}l_{n})$ (illustration in Supplement Figure 9).

9 Spectral properties of the Barro Colorado Island data

The Barro Colorado Island (BCI) rainforest data set [16] contains a census of tree and shrub species from a 1000m by 500m region on Barro Colorado Island, Panama. We illustrate the techniques developed in this paper on the point patterns corresponding to three species contained in the BCI data, using the 2015 census and only individuals that are alive.

Refer to caption
Figure 7: Top row: Point patterns of three different species from the BCI data. Middle row: Multitaper spectral estimates from the above point patterns, rescaled by the intensity and plotted on a decibel scale, i.e. $10\log_{10}(\bar{I}^{t}_{M}(k)/\hat{\lambda})$ with $M=3^{2}$. Bottom row: Estimated pair correlation function for the three species, i.e. $\widehat{\rho}^{(2)}(z)/\hat{\lambda}^{2}$.

Figure 7 shows an example analysis for three species from the BCI data, namely Annona acuminata, Herrania purpurea, and Psychotria marginata. The top row shows the point patterns for each of the three species, the middle row shows estimates of their spectral density functions computed using multitapering, and the bottom row shows estimates of the pair correlation function for each species. It is clear that there are substantial differences in the structures present in the spectral density functions of these species. Firstly, in the case of Annona acuminata (left), we see clearly anisotropic behaviour, similar to the anisotropic clustering shown in Figure 3. This is to be expected, as A. acuminata is usually associated with very moist soils and has previously been associated with the stream (a linear structure) running through the plot [37, 29]. Secondly, in the case of Herrania purpurea (middle) we see no structure at all, which is reassuring, as it suggests that the features seen in the other two species are signal and not just noise. Again, this makes ecological sense, as H. purpurea is a lower canopy tree whose seeds are widely dispersed by monkeys and other animals. Finally, for Psychotria marginata (right) we see more isotropic behaviour, with information present at low wavenumbers until $\|k\|\approx 0.005$, corresponding to correlation on length scales of around 200. Additionally, there is further information until $\|k\|\approx 1/15$, which corresponds to the peak seen in the pair correlation function at scales up to around 15. This clustering is likely caused by seeds being relatively poorly dispersed [69], but also by P. marginata being associated with moist soils, leading to a build-up of large clusters of trees in the low plateau and swamp regions in the upper left quadrant of the BCI plot [37, 29].

10 Conclusions

Spectral analysis and Fourier features are classical and important tools for characterising and understanding time series and random fields [58, 24], in, for example, estimation and detection problems. When first introduced, the fundamental theory of spectral analysis for random fields and time series corresponded to understanding what Fourier transform to use, and determining the first and second moments of that Fourier transform for stationary time series and random fields [61]. This led researchers to develop the by now well–established theory of spectral analysis (again for time series and random fields), from which a large and sophisticated set of methods was built. This set of methods has more recently been rediscovered in machine learning [48, 73]. The corresponding theory for spatial point processes has been neglected, apart from some notable and not very recent exceptions [5, 6, 56]. For point processes, machine learning researchers have only just started to discover the utility of Fourier-based methods [38, 44]. The current state of the art for the spectral analysis of point processes is such that we do not yet know what to compute, let alone how to address its digital implementation.

This article has addressed this first outstanding problem. We have calculated the expectation of natural method of moments estimators. We have imported state of the art ideas from signal processing as part of this process, and have addressed how to taper for variance reduction in this setting. Introducing tapering required us to determine how to taper: we chose continuous-space tapers rather than interpolating, as previous authors had done. We showed that in the setting of Gaussian convergence, multitaper estimates of the spectral density have a decreased variance. Our simulation studies verify our theoretical results, and demonstrate the general utility of tapering outside the most restrictive class of processes for which we can establish theoretical guarantees.

This manuscript does not claim to bring the spectral analysis of point processes fully up to the level of time series and random fields, reproducing every result available in those settings. Instead we have taken an important first step by adapting existing signal processing tools for time series and random fields, and shown how they can be modified to estimate the spectral content of a spatial point process. This paves the way to develop more sophisticated theory and algorithms that can be applied to complex spatial point pattern datasets.

Acknowledgment

This work was supported in part by the UK Engineering and Physical Sciences Research Council under Mathematical Sciences Leadership Fellowship EP/I005250/1, Developing Leaders Award EP/L001519/1, and Award EP/N007336/1; the European Research Council under Grant CoG 2015-682172NETS within the Seventh European Union Framework Program; and the Academy of Finland (project number 327211).

Appendix A Proof of Lemma 4.1

Proof.

We start the proof by noting that (using that BB is centred)

𝐄I0(k)\displaystyle{\mathbf{E}}I_{0}({k}) =λ+|B|1BBxe2πikzρ(2)(z)𝑑z𝑑x\displaystyle=\lambda+|B|^{-1}\int_{B}\int_{B_{-x}}e^{-2\pi i{k}\cdot z}\rho^{(2)}(z)dzdx (56)
=λ+|B|1BBxe2πikz[ρ(2)(z)λ2]𝑑z𝑑x+|B|1λ2BBxe2πikz𝑑z𝑑x\displaystyle=\lambda+|B|^{-1}\int_{B}\int_{B_{-x}}e^{-2\pi i{k}\cdot z}[\rho^{(2)}(z)-\lambda^{2}]dzdx+|B|^{-1}\lambda^{2}\int_{B}\int_{B_{-x}}e^{-2\pi i{k}\cdot z}dzdx
=λ+|B|1d|BBz|e2πikz[ρ(2)(z)λ2]𝑑z+|B|1λ2T(B,k)\displaystyle=\lambda+|B|^{-1}\int_{{\mathbb{R}}^{d}}|B\cap B_{-z}|e^{-2\pi i{k}\cdot z}[\rho^{(2)}(z)-\lambda^{2}]dz+|B|^{-1}\lambda^{2}T(B,{k})
=λ+dG(1)(k)G~(2)(kk)𝑑k+|B|1λ2T(B,k),\displaystyle=\lambda+\int_{{\mathbb{R}}^{d}}G^{(1)\ast}({k}^{\prime})\widetilde{G}^{(2)}({k}^{\prime}-{k})\,d{k}^{\prime}+|B|^{-1}\lambda^{2}T(B,{k}),

by the convolution theorem, after we have defined the functions:

G(1)(k)\displaystyle G^{(1)}({k}) =|B|1Rd|BBz|e2πikz𝑑z=|B|1T(B,k)\displaystyle=|B|^{-1}\int_{R^{d}}|B\cap B_{-z}|e^{-2\pi i{k}\cdot z}\,dz=|B|^{-1}T(B,{k})
G~(2)(k)\displaystyle\widetilde{G}^{(2)}({k}) =de2πikz[ρ(2)(z)λ2]𝑑z=λ2f~(k).\displaystyle=\int_{{\mathbb{R}}^{d}}e^{-2\pi i{k}\cdot z}[\rho^{(2)}(z)-\lambda^{2}]dz=\lambda^{2}\widetilde{f}({k}).

Note from Gradshteyn and Ryzhik, Eq. 3.741, with the change of variables xj=πkjljx_{j}=\pi{k}_{j}l_{j}, that we get (recalling |B|=jlj|B|=\prod_{j}l_{j}):

dT(B,k)𝑑k\displaystyle\int_{{\mathbb{R}}^{d}}T(B,{k})\;d{k} =djsin2(πkjlj)(πkj)2dkj\displaystyle=\int_{{\mathbb{R}}^{d}}\prod_{j}\frac{\sin^{2}(\pi{k}_{j}l_{j})}{(\pi{k}_{j})^{2}}d{k}_{j}
=djlj2sin2(xj)xj2(dxj/(πlj))\displaystyle=\int_{{\mathbb{R}}^{d}}\prod_{j}l_{j}^{2}\frac{\sin^{2}(x_{j})}{x_{j}^{2}}(dx_{j}/(\pi l_{j}))
=|B|j(π/π)=|B|.\displaystyle=|B|\prod_{j}(\pi/\pi)=|B|. (57)

We have defined

T(B,k)=j=1dsin2(πkjlj)(πkj)2,kd.T(B,{k})=\prod_{j=1}^{d}\frac{\sin^{2}(\pi{k}_{j}l_{j})}{(\pi{k}_{j})^{2}},\quad{k}\in{\mathbb{R}}^{d}. (58)
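As a numerical sanity check of the normalisation (57), the following one-dimensional sketch (notation and tolerances ours) integrates T(B,k) on a fine grid, using the identity sin²(πkl)/(πk)² = l² sinc²(kl) to avoid the removable singularity at k = 0.

```python
import numpy as np

l = 5.0  # in d = 1 the box is B = [0, l], so |B| = l

# T(B, k) = sin^2(pi k l) / (pi k)^2 = l^2 sinc(k l)^2, where numpy's
# sinc is the normalised sinc(x) = sin(pi x) / (pi x).
k = np.linspace(-400.0, 400.0, 2_000_001)
T = l**2 * np.sinc(k * l) ** 2

# Riemann sum over [-400, 400]; the neglected tails cost O(1/(pi^2 * 400)).
integral = np.sum(T) * (k[1] - k[0])   # should be close to |B| = l
```

The same check in dimension d factorises over the coordinates, giving ∏_j l_j = |B|.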

We now go back to our definition of 𝐄I0(2)(k){\mathbf{E}}I_{0}^{(2)}({k}), and explore this more carefully. We also fix 0<β<10<\beta<1, and expand the expectation as follows (realising that care needs to be taken around the pole):

𝐄I0(2)(k)\displaystyle{\mathbf{E}}I_{0}^{(2)}({k}) =λ2RdG(1)(kk)G~(2)(k)𝑑k\displaystyle=\lambda^{2}\int_{R^{d}}G^{(1)\ast}({k}-{k}^{\prime})\widetilde{G}^{(2)}({k}^{\prime})d{k}^{\prime}
=λ2j=1d(kj1lj1β+kj1lj1βkj+1lj1β+kj+1lj1β)G(1)(kk)G~(2)(k)dk\displaystyle=\lambda^{2}\prod_{j=1}^{d}\left(\int_{-\infty}^{{k}_{j}-\frac{1}{l_{j}^{1-\beta}}}+\int_{{k}_{j}-\frac{1}{l_{j}^{1-\beta}}}^{{k}_{j}+\frac{1}{l_{j}^{1-\beta}}}+\int_{{k}_{j}+\frac{1}{l_{j}^{1-\beta}}}^{\infty}\right)G^{(1)\ast}({k}-{k}^{\prime})\widetilde{G}^{(2)}({k}^{\prime})d{k}^{\prime}
=𝐄I11(2)(k)+𝐄I12(2)(k)++𝐄Idd(2)(k).\displaystyle={\mathbf{E}}I_{1\dots 1}^{(2)}({k})+{\mathbf{E}}I_{12\dots}^{(2)}({k})+\dots+{\mathbf{E}}I_{d\cdots d}^{(2)}({k}). (59)

We can note directly, by taking a supremum over G(1)(kk)G^{(1)\ast}({k}-{k}^{\prime}) in the range of integration, and assuming the integral of G~(2)(k)\widetilde{G}^{(2)}({k}) is finite (which is true from the Parseval–Rayleigh relationship), that

𝐄I1…1(2)(k)=𝒪(|B|1j=1dlj22β)=𝒪(j=1dlj12β),{\mathbf{E}}I_{1\dots 1}^{(2)}({k})={\cal O}(|B|^{-1}\prod_{j=1}^{d}l_{j}^{2-2\beta})={\cal O}(\prod_{j=1}^{d}l_{j}^{1-2\beta}), (60)

which requires β>1/2\beta>1/2 to be o(1)o(1). The same holds for the other elements apart from 𝐄I22(2)(k){\mathbf{E}}I_{2\dots 2}^{(2)}({k}). We therefore have

𝐄I0(2)(k)\displaystyle{\mathbf{E}}I_{0}^{(2)}({k}) =λ2j=1dkj1lj1βkj+1lj1βG(1)(kk)G~(2)(k)𝑑k{1+o(1)}\displaystyle=\lambda^{2}\prod_{j=1}^{d}\int_{{k}_{j}-\frac{1}{l_{j}^{1-\beta}}}^{{k}_{j}+\frac{1}{l_{j}^{1-\beta}}}G^{(1)\ast}({k}-{k}^{\prime})\widetilde{G}^{(2)}({k}^{\prime})d{k}^{\prime}\left\{1+o(1)\right\} (61)
0(2)(k){1+o(1)},\displaystyle\equiv{\cal I}_{0}^{(2)}({k})\left\{1+o(1)\right\},

so that

0(2)(k)λ2j=1dkj1lj1βkj+1lj1βG(1)(kk)G~(2)(k)𝑑k.{\cal I}_{0}^{(2)}({k})\equiv\lambda^{2}\prod_{j=1}^{d}\int_{{k}_{j}-\frac{1}{l_{j}^{1-\beta}}}^{{k}_{j}+\frac{1}{l_{j}^{1-\beta}}}G^{(1)\ast}({k}-{k}^{\prime})\widetilde{G}^{(2)}({k}^{\prime})d{k}^{\prime}.

We assume that f~(k)\tilde{f}({k}) is twice differentiable at k{k} (possibly everywhere except k=0{k}=0), so that for all k0{k}\neq 0 and for kk′′<kk\|{k}-{k}^{\prime\prime}\|<\|{k}-{k}^{\prime}\| we get

f~(k)=f~(k)+f~(k)T(kk)+12(kk)T𝐇~f(k′′)(kk),\widetilde{f}({k}^{\prime})=\widetilde{f}({k})+\nabla\widetilde{f}({k})^{T}\left({k}^{\prime}-{k}\right)+\frac{1}{2}\left({k}^{\prime}-{k}\right)^{T}\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\left({k}^{\prime}-{k}\right), (62)

defining f~(k)\nabla\widetilde{f}({k}) as the gradient and 𝐇~f(k)\tilde{\mathbf{H}}_{f}({k}) as the Hessian matrix.

Substituting this expansion back into (61) we get with a change of variables

0(2)(k)\displaystyle{\cal I}_{0}^{(2)}({k}) =λ2k11l11βk1+1l11βkp1lp1βkp+1lp1βG(1)(kk){f~(k)+f~(k)T(kk)+12(kk)T\displaystyle=\lambda^{2}\int_{{k}_{1}-\frac{1}{l_{1}^{1-\beta}}}^{{k}_{1}+\frac{1}{l_{1}^{1-\beta}}}\dots\int_{{k}_{p}-\frac{1}{l_{p}^{1-\beta}}}^{{k}_{p}+\frac{1}{l_{p}^{1-\beta}}}G^{(1)\ast}({k}-{k}^{\prime})\left\{\widetilde{f}({k})+\nabla\widetilde{f}({k})^{T}\left({k}^{\prime}-{k}\right)+\frac{1}{2}\left({k}^{\prime}-{k}\right)^{T}\right.
𝐇~f(k′′)(kk)}dk\displaystyle\left.\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\left({k}^{\prime}-{k}\right)\right\}\;d{k}^{\prime}
=λ2f~(k)k11l11βk1+1l11βkp1lp1βkp+1lp1βG(1)(kk)𝑑k\displaystyle=\lambda^{2}\widetilde{f}({k})\int_{{k}_{1}-\frac{1}{l_{1}^{1-\beta}}}^{{k}_{1}+\frac{1}{l_{1}^{1-\beta}}}\dots\int_{{k}_{p}-\frac{1}{l_{p}^{1-\beta}}}^{{k}_{p}+\frac{1}{l_{p}^{1-\beta}}}G^{(1)\ast}({k}-{k}^{\prime})\;d{k}^{\prime}
+12trace{𝐇~f(k′′)k11l1βk1+1l1βkp1l1βkp+1l1β|B|1T(B,kk)(kk)(kk)T𝑑k}.\displaystyle+\frac{1}{2}{\mathrm{trace}}\left\{\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\int_{{k}_{1}-\frac{1}{l^{1-\beta}}}^{{k}_{1}+\frac{1}{l^{1-\beta}}}\dots\int_{{k}_{p}-\frac{1}{l^{1-\beta}}}^{{k}_{p}+\frac{1}{l^{1-\beta}}}|B|^{-1}T(B,{k}-{k}^{\prime})\cdot({k}^{\prime}-{k})\left({k}^{\prime}-{k}\right)^{T}\;d{k}^{\prime}\right\}. (63)

We can then note that

k11l1βk1+1l1βkp1l1βkp+1l1βG(1)(kk)𝑑k=1+o(1).\int_{{k}_{1}-\frac{1}{l^{1-\beta}}}^{{k}_{1}+\frac{1}{l^{1-\beta}}}\dots\int_{{k}_{p}-\frac{1}{l^{1-\beta}}}^{{k}_{p}+\frac{1}{l^{1-\beta}}}G^{(1)}({k}-{k}^{\prime})\;d{k}^{\prime}=1+o(1).

By direct calculation we then find

RdG(1)(k)𝑑k\displaystyle\int_{R^{d}}G^{(1)}({k})d{k} =|B|1RdRd|BBz|ei2πkz𝑑z𝑑k\displaystyle=|B|^{-1}\int_{R^{d}}\int_{R^{d}}|B\cap B_{-z}|e^{-i2\pi{k}\cdot z}\,dzd{k}
=|B|1Rd|BBz|δ(z)𝑑z\displaystyle=|B|^{-1}\int_{R^{d}}|B\cap B_{-z}|\delta(z)\,dz
=1.\displaystyle=1.

Note that this is now consistent with (57), and achieves a value of 11. Second,

k11l1βk1+1l1βkp1l1βkp+1l1β(kk)TG(1)(kk)𝑑k=0,\int_{{k}_{1}-\frac{1}{l^{1-\beta}}}^{{k}_{1}+\frac{1}{l^{1-\beta}}}\dots\int_{{k}_{p}-\frac{1}{l^{1-\beta}}}^{{k}_{p}+\frac{1}{l^{1-\beta}}}\left({k}^{\prime}-{k}\right)^{T}G^{(1)\ast}({k}-{k}^{\prime})\;d{k}^{\prime}=0,

as

|BBz|=|BBz|,|B\cap B_{-z}|=|B\cap B_{z}|,

implies that this function |BBz||B\cap B_{-z}| is symmetric (or even) in zz, and real-valued, which implies that its Fourier transform is also real-valued. The integral over a symmetric range of an even function times an odd function is always zero.

We now also determine

𝐁=k11l11βk1+1l11βkp1l1βkp+1l1βG(1)(kk)(kk)(kk)T𝑑k.{\mathbf{B}}=\int_{{k}_{1}-\frac{1}{l_{1}^{1-\beta}}}^{{k}_{1}+\frac{1}{l_{1}^{1-\beta}}}\dots\int_{{k}_{p}-\frac{1}{l^{1-\beta}}}^{{k}_{p}+\frac{1}{l^{1-\beta}}}G^{(1)\ast}({k}-{k}^{\prime})\cdot({k}-{k}^{\prime})\left({k}-{k}^{\prime}\right)^{T}\;d{k}^{\prime}. (64)

Let us start by a change of variables. Let us first define the matrix

𝑳=diag(l1,,ld).\bm{L}={\mathrm{diag}}\left(l_{1},\dots,l_{d}\right).

Thus the change of variables is

ν=π𝑳(kk)k=k𝑳1ν/π.\nu=\pi\bm{L}\left({k}-{k}^{\prime}\right)\Longleftrightarrow{k}^{\prime}={k}-\bm{L}^{-1}\nu/\pi. (65)

With this change of variables we get that

𝐁\displaystyle{\mathbf{B}} =l1β/2l1β/2lpβ/2lpβ/2G(1)(kk)(kk)(kk)T𝑑k\displaystyle=\int_{-l_{1}^{\beta}/2}^{l_{1}^{\beta}/2}\dots\int_{-l_{p}^{\beta}/2}^{l_{p}^{\beta}/2}G^{(1)\ast}({k}-{k}^{\prime})\cdot({k}-{k}^{\prime})\left({k}-{k}^{\prime}\right)^{T}\;d{k}^{\prime}
=l1β/2l1β/2lpβ/2lpβ/2j=1dπ2lj2sin2(νj)|B|νj2𝑳1πdννT𝑳1πddνj(1/lj)\displaystyle=\int_{-l_{1}^{\beta}/2}^{l_{1}^{\beta}/2}\dots\int_{-l_{p}^{\beta}/2}^{l_{p}^{\beta}/2}\prod_{j=1}^{d}\frac{\pi^{2}l_{j}^{2}\sin^{2}(\nu_{j})}{|B|\nu_{j}^{2}}\bm{L}^{-1}\pi^{-d}\nu\nu^{T}\bm{L}^{-1}\pi^{-d}\;d\nu_{j}(1/l_{j})
=l1β/2l1β/2lpβ/2lpβ/2j=1dljsin2(νj)|B|νj2𝑳1ννT𝑳1dνj.\displaystyle=\int_{-l_{1}^{\beta}/2}^{l_{1}^{\beta}/2}\dots\int_{-l_{p}^{\beta}/2}^{l_{p}^{\beta}/2}\prod_{j=1}^{d}\frac{l_{j}\sin^{2}(\nu_{j})}{|B|\nu_{j}^{2}}\bm{L}^{-1}\nu\nu^{T}\bm{L}^{-1}\;d\nu_{j}. (66)

We then find that if all the lj=ll_{j}=l and 0<β<10<\beta<1, then

trace{𝐁}\displaystyle{\mathrm{trace}}\left\{{\mathbf{B}}\right\} =lβ/2lβ/2lβ/2lβ/2j=1dsin2(νj)νj2q=1d(νq/l)2dνj\displaystyle=\int_{-l^{\beta}/2}^{l^{\beta}/2}\dots\int_{-l^{\beta}/2}^{l^{\beta}/2}\prod_{j=1}^{d}\frac{\sin^{2}(\nu_{j})}{\nu_{j}^{2}}\sum_{q=1}^{d}(\nu_{q}/l)^{2}\;d\nu_{j}
=d1l2(sin2(x)x2𝑑x)d1lβ/2lβ/2sin2(x)𝑑x\displaystyle=d\cdot\frac{1}{l^{2}}\left(\int_{-\infty}^{\infty}\frac{\sin^{2}(x)}{x^{2}}\,dx\right)^{d-1}\int_{-l^{\beta}/2}^{l^{\beta}/2}\sin^{2}(x)\,dx
=dl2πd1lβ{1+o(1)}.\displaystyle=\frac{d}{l^{2}}\pi^{d-1}l^{\beta}\left\{1+o(1)\right\}. (67)

This decreases with increasing ll as long as β<2\beta<2 (which follows as we have assumed β<1\beta<1), and as the next error term is O(lβ)O(l^{-\beta}). We therefore want to choose β1\beta\rightarrow 1, or for some fixed ϵ>0\epsilon>0 take β=1ϵ\beta=1-\epsilon. Putting all of these components together we get that

𝐄I0(k)\displaystyle{\mathbf{E}}I_{0}({k}) =λ+|B|1d|BBz|e2πikz[ρ(2)(z)λ2]𝑑z+|B|1λ2T(B,k)\displaystyle=\lambda+|B|^{-1}\int_{{\mathbb{R}}^{d}}|B\cap B_{-z}|e^{-2\pi i{k}\cdot z}[\rho^{(2)}(z)-\lambda^{2}]dz+|B|^{-1}\lambda^{2}T(B,{k})
=λ+λ2f~(k)+12trace{𝐇~f(k′′)dl2βπd1{1+o(1)}}+|B|1λ2T(B,k)\displaystyle=\lambda+\lambda^{2}\widetilde{f}({k})+\frac{1}{2}{\mathrm{trace}}\left\{\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\frac{d}{l^{2-\beta}}\pi^{d-1}\left\{1+o(1)\right\}\right\}+|B|^{-1}\lambda^{2}T(B,{k})
=λ+(λ+f(k)){1+o(1)}+12trace{𝐇~f(k′′)dl2βπd1{1+o(1)}}+|B|1λ2T(B,k)\displaystyle=\lambda+(-\lambda+f({k}))\left\{1+o(1)\right\}+\frac{1}{2}{\mathrm{trace}}\left\{\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\frac{d}{l^{2-\beta}}\pi^{d-1}\left\{1+o(1)\right\}\right\}+|B|^{-1}\lambda^{2}T(B,{k})
=f(k){1+o(1)}+o(1)+|B|1λ2T(B,k).\displaystyle=f({k})\left\{1+o(1)\right\}+o(1)+|B|^{-1}\lambda^{2}T(B,{k}). ∎

Appendix B Proof of Proposition 4.1

Proof.

We start from using Eqn. (9) and determine that

𝐄{Jh(k)}\displaystyle{\mathbf{E}}\left\{J_{h}({k})\right\} =𝐄xXh(x)e2πikx=λRdh(x)e2πikx𝑑x=λH(k).\displaystyle={\mathbf{E}}\sum_{x\in X}h(x)e^{-2\pi i{k}\cdot x}=\lambda\int_{R^{d}}h(x)e^{-2\pi i{k}\cdot x}\,dx=\lambda H({k}). (68)

Subsequently we find using Eqn. (9)

𝐄{|Jh(k)|2}\displaystyle{\mathbf{E}}\left\{\left|J_{h}({k})\right|^{2}\right\} =𝐄{xXyXh(x)e2πikxh(y)e2πiky}\displaystyle={\mathbf{E}}\left\{\sum_{x\in X}\sum_{y\in X}h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\right\}
=𝐄xX|h(x)|2+𝐄{xX,yXh(x)e2πikxh(y)e2πiky}\displaystyle={\mathbf{E}}\sum_{x\in X}\left|h(x)\right|^{2}+{\mathbf{E}}\left\{\sum_{x\in X,y\in X}^{\neq}h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\right\}
=λRd|h(x)|2𝑑x+Rd×Rdρ(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y\displaystyle=\lambda\int_{R^{d}}\left|h(x)\right|^{2}dx+\iint_{R^{d}\times R^{d}}{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy
=λ+Rd×Rdρ(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y.\displaystyle=\lambda+\iint_{R^{d}\times R^{d}}{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy. (69)

We now calculate the variance as

𝐕𝐚𝐫{Jh(k)}\displaystyle{\mathrm{\bf Var}}\left\{J_{h}({k})\right\} =𝐄{|Jh(k)|2}|𝐄{Jh(k)}|2\displaystyle={\mathbf{E}}\left\{\left|J_{h}({k})\right|^{2}\right\}-\left|{\mathbf{E}}\left\{J_{h}({k})\right\}\right|^{2}
=λ+Rd×Rdρ(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y|λH(k)|2.\displaystyle=\lambda+\iint_{R^{d}\times R^{d}}{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy-\left|\lambda H({k})\right|^{2}. (70)

We now define the renormalised quantity

ρ~(2)(z)=ρ(2)(z)λ2λ2,zd.\widetilde{\rho}^{(2)}(z)=\frac{{\rho}^{(2)}(z)-\lambda^{2}}{\lambda^{2}},\quad z\in{\mathbb{R}}^{d}. (71)

The expression in (70) then can be simplified to

𝐕𝐚𝐫{Jh(k)}\displaystyle{\mathrm{\bf Var}}\left\{J_{h}({k})\right\} =λ+Rd×Rdρ(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y|λH(k)|2\displaystyle=\lambda+\iint_{R^{d}\times R^{d}}{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy-\left|\lambda H({k})\right|^{2}
=λ+λ2Rd×Rd(ρ~(2)(xy)+1)h(x)e2πikxh(y)e2πiky𝑑x𝑑y|λH(k)|2\displaystyle=\lambda+\lambda^{2}\iint_{R^{d}\times R^{d}}\left(\widetilde{\rho}^{(2)}(x-y)+1\right)h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy-\left|\lambda H({k})\right|^{2}
=λ+λ2Rd×Rdρ~(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y\displaystyle=\lambda+\lambda^{2}\iint_{R^{d}\times R^{d}}\widetilde{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy
=λ+λ2𝐕𝐚𝐫1{Jh(k)},\displaystyle=\lambda+\lambda^{2}{\mathrm{\bf Var}}_{1}\left\{J_{h}({k})\right\}, (72)

where we define

𝐕𝐚𝐫1{Jh(k)}Rd×Rdρ~(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y.{\mathrm{\bf Var}}_{1}\left\{J_{h}({k})\right\}\equiv\iint_{R^{d}\times R^{d}}\widetilde{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy. (73)

To simplify this expression we note that

𝐕𝐚𝐫1{Jh(k)}\displaystyle{\mathrm{\bf Var}}_{1}\left\{J_{h}({k})\right\} =Rd×Rdρ~(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y\displaystyle=\int_{R^{d}\times R^{d}}\widetilde{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy
=Rd×Rd[Rdf~(2)(k)e2πi(xy)k𝑑k]h(x)e2πikxh(y)e2πiky𝑑x𝑑y\displaystyle=\int_{R^{d}\times R^{d}}\left[\int_{R^{d}}\widetilde{f}^{(2)}({k}^{\prime})e^{2\pi i(x-y)\cdot{k}^{\prime}}d{k}^{\prime}\right]\cdot h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy
=Rd×RdRdf~(2)(k)e2πi(xy)(kk)𝑑kh(x)h(y)𝑑x𝑑y\displaystyle=\int_{R^{d}\times R^{d}}\int_{R^{d}}\widetilde{f}^{(2)}({k}^{\prime})e^{-2\pi i(x-y)\cdot({k}-{k}^{\prime})}d{k}^{\prime}\cdot h(x)h^{\ast}(y)\,dx\,dy
=Rdf~(2)(k)|H(kk)|2𝑑k\displaystyle=\int_{R^{d}}\widetilde{f}^{(2)}({k}^{\prime})\left|H({k}-{k}^{\prime})\right|^{2}d{k}^{\prime}. (74)

This yields the desired expression for the variance. We return to the expression of the relation of the DFT. We aim to show that

Rel{Jh(k)}=λdH(k2k)H(k)𝑑k+dU(k,z)e2πikz{ρ(2)(z)λ2}𝑑z.{\mathrm{Rel}}\left\{J_{h}({k})\right\}=\lambda\int_{\mathbb{R}^{d}}H({k}^{\prime}-2{k})H({k}^{\prime})d{k}^{\prime}+\int_{\mathbb{R}^{d}}U({k},z)e^{-2\pi i{k}\cdot z}\{\rho^{(2)}(z)-\lambda^{2}\}dz.

We have by the definition of the relation [64]

Rel{Jh(k)}\displaystyle{\mathrm{Rel}}\left\{J_{h}({k})\right\} =𝐄{(Jh(k))2}(𝐄{Jh(k)})2.\displaystyle={\mathbf{E}}\left\{\left(J_{h}({k})\right)^{2}\right\}-\left({\mathbf{E}}\left\{J_{h}({k})\right\}\right)^{2}.

We then note that

𝐄{(Jh(k))2}\displaystyle{\mathbf{E}}\left\{\left(J_{h}({k})\right)^{2}\right\} =𝐄{xXyXh(x)e2πikxh(y)e2πiky}\displaystyle={\mathbf{E}}\left\{\sum_{x\in X}\sum_{y\in X}h(x)e^{-2\pi i{k}\cdot x}h(y)e^{-2\pi i{k}\cdot y}\right\}
=𝐄xX{h(x)}2e4πikx+𝐄{xX,yXh(x)e2πikxh(y)e2πiky}\displaystyle={\mathbf{E}}\sum_{x\in X}\left\{h(x)\right\}^{2}e^{-4\pi i{k}\cdot x}+{\mathbf{E}}\left\{\sum_{x\in X,y\in X}^{\neq}h(x)e^{-2\pi i{k}\cdot x}h(y)e^{-2\pi i{k}\cdot y}\right\}
=λRd{h(x)}2e4πikx𝑑x+Rd×Rdρ(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y\displaystyle=\lambda\int_{R^{d}}\left\{h(x)\right\}^{2}e^{-4\pi i{k}\cdot x}dx+\iint_{R^{d}\times R^{d}}{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h(y)e^{-2\pi i{k}\cdot y}\,dx\,dy
=λRd{h(x)}2e4πikx𝑑x+Rel1{Jh(k)}.\displaystyle=\lambda\int_{R^{d}}\left\{h(x)\right\}^{2}e^{-4\pi i{k}\cdot x}dx+{\mathrm{Rel}}_{1}\left\{J_{h}({k})\right\}. (75)

We now seek to simplify this and write

Rel1{Jh(k)}=λ2Rd×Rd(ρ~(2)(xy)+1)h(x)e2πikxh(y)e2πiky𝑑x𝑑y.{\mathrm{Rel}}_{1}\left\{J_{h}({k})\right\}=\lambda^{2}\iint_{R^{d}\times R^{d}}\left(\widetilde{\rho}^{(2)}(x-y)+1\right)h(x)e^{-2\pi i{k}\cdot x}h(y)e^{-2\pi i{k}\cdot y}\,dx\,dy. (76)

We additionally have from (68):

𝐄2{Jh(k)}=λ2H2(k).{\mathbf{E}}^{2}\left\{J_{h}({k})\right\}=\lambda^{2}H^{2}({k}).

Putting all of this together we get that

Rel{Jh(k)}\displaystyle{\mathrm{Rel}}\left\{J_{h}({k})\right\} =λRd{h(x)}2e4πikx𝑑x+λ2Rd×Rd(ρ~(2)(xy)+1)h(x)e2πikxh(y)e2πiky𝑑x𝑑y\displaystyle=\lambda\int_{R^{d}}\left\{h(x)\right\}^{2}e^{-4\pi i{k}\cdot x}dx+\lambda^{2}\iint_{R^{d}\times R^{d}}\left(\widetilde{\rho}^{(2)}(x-y)+1\right)h(x)e^{-2\pi i{k}\cdot x}h(y)e^{-2\pi i{k}\cdot y}\,dx\,dy
λ2H2(k)\displaystyle-\lambda^{2}H^{2}({k})
=λRd{h(x)}2e4πikx𝑑x+λ2Rd×Rdρ~(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y.\displaystyle=\lambda\int_{R^{d}}\left\{h(x)\right\}^{2}e^{-4\pi i{k}\cdot x}dx+\lambda^{2}\iint_{R^{d}\times R^{d}}\widetilde{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h(y)e^{-2\pi i{k}\cdot y}\,dx\,dy.

We now implement a change of variables, and set z=xyz=x-y. We then have

Rd×Rdρ~(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y=Rd×Rdρ~(2)(z)h(x)e2πikxh(xz)e2πik(xz)𝑑x𝑑z,\iint_{R^{d}\times R^{d}}\!\!\!\!\widetilde{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h(y)e^{-2\pi i{k}\cdot y}\,dx\,dy=\iint_{R^{d}\times R^{d}}\!\!\!\!\widetilde{\rho}^{(2)}(z)h(x)e^{-2\pi i{k}\cdot x}h(x-z)e^{-2\pi i{k}\cdot(x-z)}\,dx\,dz,

and so with the definition of

U(k,z)h(x)h(z+x)e2πik(2x)𝑑x,U({k},z)\equiv\int h(x)h(z+x)e^{-2\pi i{k}\cdot(2x)}dx, (77)

the expression is a consequence of the above expressions. ∎

Appendix C Proof of Lemma 4.2

Proof.

We start from

I~h(k)=|Jh(k)λH(k)|2,\widetilde{I}_{h}({k})=\left|J_{h}({k})-\lambda H({k})\right|^{2},

and so take

𝐄{I~h(k)}=𝐄{|Jh(k)|2}λ𝐄{Jh(k)H(k)}λ𝐄{Jh(k)H(k)}+λ2𝐄{|H(k)|2}.{\mathbf{E}}\left\{\widetilde{I}_{h}({k})\right\}={\mathbf{E}}\{\left|J_{h}({k})\right|^{2}\}-\lambda{\mathbf{E}}\{J_{h}({k})H^{\ast}({k})\}-\lambda{\mathbf{E}}\{J_{h}^{\ast}({k})H({k})\}+\lambda^{2}{\mathbf{E}}\{\left|H({k})\right|^{2}\}.

We now use Campbell’s theorem to evaluate these expectations. We already have from Proposition 4.1 that

𝐄{|Jh(k)|2}\displaystyle{\mathbf{E}}\left\{\left|J_{h}({k})\right|^{2}\right\} =λ+Rd×Rdρ(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y.\displaystyle=\lambda+\iint_{R^{d}\times R^{d}}{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy.

Also we note that

H(k)\displaystyle H({k}) =Rdh(x)e2πikx𝑑x.\displaystyle=\int_{R^{d}}h(x)e^{-2\pi i{k}\cdot x}\,dx. (78)

We also note

𝐄{Jh(k)}\displaystyle{\mathbf{E}}\left\{J_{h}({k})\right\} =λH(k).\displaystyle=\lambda H({k}).

Therefore we have

𝐄{I~h(k)}\displaystyle\!{\mathbf{E}}\left\{\widetilde{I}_{h}({k})\right\}\! =𝐄{|Jh(k)|2}λ𝐄{Jh(k)H(k)}λ𝐄{Jh(k)H(k)}+λ2𝐄{|H(k)|2}\displaystyle=\!{\mathbf{E}}\{\left|J_{h}({k})\right|^{2}\}-\lambda{\mathbf{E}}\{J_{h}({k})H^{\ast}({k})\}-\lambda{\mathbf{E}}\{J_{h}^{\ast}({k})H({k})\}+\lambda^{2}{\mathbf{E}}\{\left|H({k})\right|^{2}\}
=𝐄{|Jh(k)|2}λ2|H(k)|2\displaystyle={\mathbf{E}}\{\left|J_{h}({k})\right|^{2}\}-\lambda^{2}\left|H({k})\right|^{2}
=λ+Rd×Rd{ρ(2)(xy)λ2}h(x)e2πikxh(y)e2πiky𝑑x𝑑y.\displaystyle=\lambda+\iint_{R^{d}\times R^{d}}\{{\rho}^{(2)}(x-y)-\lambda^{2}\}h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy. (79)

We note that the multiplication in (79) leads to a convolution in the wavenumber domain. If we rewrite

ρ(2)(z)λ2\displaystyle{\rho}^{(2)}(z)-\lambda^{2} =Rd(f(k)λ)e2πikz𝑑k,\displaystyle=\int_{R^{d}}(f({k})-\lambda)e^{2\pi i{k}\cdot z}\,d{k}, (80)

then rewriting (79) we get

𝐄{I~h(k)}\displaystyle\!{\mathbf{E}}\left\{\widetilde{I}_{h}({k})\right\}\! =λ+Rd×RdRd(f(u)λ)exp(2πiu(xy)+2πik(xy))𝑑uh(x)h(y)𝑑x𝑑y\displaystyle=\lambda+\iint_{R^{d}\times R^{d}}\int_{R^{d}}(f(u)-\lambda)\exp(-2\pi iu\cdot(x-y)+2\pi i{k}\cdot(x-y))\,duh(x)h^{\ast}(y)\,dx\,dy
=λ+Rd(f(u)λ)|H(ku)|2𝑑u=Rdf(u)|H(ku)|2𝑑u,\displaystyle=\lambda+\int_{R^{d}}(f(u)-\lambda)\left|H({k}-u)\right|^{2}\,du=\int_{R^{d}}f(u)\left|H({k}-u)\right|^{2}\,du,

as given, where the final equality uses that Rd|H(ku)|2𝑑u=Rdh2(x)𝑑x=1\int_{R^{d}}\left|H({k}-u)\right|^{2}\,du=\int_{R^{d}}h^{2}(x)\,dx=1 by the Parseval–Rayleigh relation. ∎
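For a homogeneous Poisson process f(k) ≡ λ, so Lemma 4.2 predicts that the debiased quantity Ĩ_h(k) = |J_h(k) − λH(k)|² has expectation λ at every wavenumber once the taper has unit norm. A Monte Carlo sketch of this prediction (our illustrative check, not from the paper), using the box-car taper h ≡ 1 on the unit square, for which H(k) has the closed form ∏_j e^{−πik_j} sinc(k_j):

```python
import numpy as np

rng = np.random.default_rng(42)
lam, reps = 50.0, 4000
k = np.array([1.5, 2.5])   # non-integer, so H(k) != 0 and the debiasing matters

def H_boxcar(k):
    # Fourier transform of the box-car taper h(x) = 1 on [0, 1]^2 (unit norm
    # as |B| = 1): a product over dimensions of exp(-i pi k_j) sinc(k_j).
    return np.prod(np.exp(-1j * np.pi * k) * np.sinc(k))

vals = np.empty(reps)
for r in range(reps):
    n = rng.poisson(lam)
    x = rng.uniform(0.0, 1.0, size=(n, 2))
    J = np.sum(np.exp(-2j * np.pi * x @ k))     # J_h(k) with h = 1 on B
    vals[r] = abs(J - lam * H_boxcar(k)) ** 2   # debiased periodogram

est = vals.mean()   # Lemma 4.2 predicts this is close to lam, since f(k) = lam
```

Replacing the known λ by an estimate λ̂ gives the practical debiased periodogram discussed in the main text.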

Appendix D Proof of Corollary 4.2

Proof.

Starting from Lemma 4.2 we can note that

𝐄{I~h(k)}=d|H(kk)|2f(k)𝑑k.{\mathbf{E}}\{\widetilde{I}_{h}({k})\}=\int_{{\mathbb{R}}^{d}}\left|H({k}^{\prime}-{k})\right|^{2}f({k}^{\prime})\,d{k}^{\prime}. (81)

Now we need to assume properties of |H(k)||H({k})| in order to simplify this expression. Assume the cuboid domain has side length ll, and that we have selected a data taper so that it has unit norm:

d|h(x)|2𝑑x=1.\int_{{\mathbb{R}}^{d}}|h(x)|^{2}\,dx=1.

We assume the data taper h(x)h(x) is compactly supported in space and well–concentrated in wavenumber to region 𝒲d{\mathcal{W}}\subset{\mathbb{R}}^{d} so that

𝒲|H(k)|2𝑑k=1εl,\int_{\mathcal{W}}|H({k})|^{2}\,d{k}=1-\varepsilon_{l}, (82)

where l>0l>0 is the minimum length scale in any dimension of 𝒲{\mathcal{W}}, and we assume εl0\varepsilon_{l}\rightarrow 0 as ll\rightarrow\infty. We now repeat the arguments posed in Appendix A, for a window h(x)h(x) different from the (spatial) box–car on the region.

We now return to (81) and present a similar argument to Appendix A. We note that

𝐄{I~h(k)}\displaystyle{\mathbf{E}}\{\widetilde{I}_{h}({k})\} =d|H(kk)|2f(k)𝑑k\displaystyle=\int_{{\mathbb{R}}^{d}}\left|H({k}^{\prime}-{k})\right|^{2}f({k}^{\prime})\,d{k}^{\prime}
=d|H(kk)|2(λ+λ2f~(k))𝑑k\displaystyle=\int_{{\mathbb{R}}^{d}}\left|H({k}^{\prime}-{k})\right|^{2}\left(\lambda+\lambda^{2}\widetilde{f}({k}^{\prime})\right)\,d{k}^{\prime}
=λ+λ2d|H(kk)|2f~(k)𝑑k\displaystyle=\lambda+\lambda^{2}\int_{{\mathbb{R}}^{d}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\,d{k}^{\prime}
=λ+λ2𝒲|H(kk)|2f~(k)𝑑k+λ2d\𝒲|H(kk)|2f~(k)𝑑k\displaystyle=\lambda+\lambda^{2}\int_{\mathcal{W}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\,d{k}^{\prime}+\lambda^{2}\int_{{\mathbb{R}}^{d}\backslash{\mathcal{W}}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\,d{k}^{\prime}
=λ+𝐄1{I~h(k)}+𝐄2{I~h(k)},\displaystyle=\lambda+{\mathbf{E}}_{1}\{\widetilde{I}_{h}({k})\}+{\mathbf{E}}_{2}\{\widetilde{I}_{h}({k})\}, (83)

where f[n]f^{[n]} denotes the nnth derivative of the function ff and we use this equation to define the latter two terms. We note that

|𝐄2{I~h(k)}|\displaystyle\left|{\mathbf{E}}_{2}\{\widetilde{I}_{h}({k})\}\right| λ2εlf~0.\displaystyle\leq\lambda^{2}\varepsilon_{l}\|\widetilde{f}\|_{0}. (84)

Thus we only need to understand the remaining term in the expression. We then obtain with a Lagrange form of the remainder

𝐄1{I~h(k)}\displaystyle{\mathbf{E}}_{1}\{\widetilde{I}_{h}({k})\} =λ2𝒲|H(kk)|2f~(k)𝑑k\displaystyle=\lambda^{2}\int_{\mathcal{W}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\,d{k}^{\prime}
=λ2𝒲|H(kk)|2[f~(k)+f~[1](k)(kk)+12f~[2](k)(kk)2+R[3](k,k)]𝑑k\displaystyle=\lambda^{2}\int_{\mathcal{W}}\left|H({k}^{\prime}-{k})\right|^{2}\left[\widetilde{f}({k})+\widetilde{f}^{[1]}({k})({k}^{\prime}-{k})+\frac{1}{2}\widetilde{f}^{[2]}({k})({k}^{\prime}-{k})^{2}+R^{[3]}({k},{k}^{\prime})\right]\,d{k}^{\prime}
=λ2(1εl)f~(k)+12λ2f~[2](k)𝒲|H(kk)|2(kk)2𝑑k+R~[3](k).\displaystyle=\lambda^{2}(1-\varepsilon_{l})\widetilde{f}({k})+\frac{1}{2}\lambda^{2}\widetilde{f}^{[2]}({k})\int_{\mathcal{W}}\left|H({k}^{\prime}-{k})\right|^{2}({k}^{\prime}-{k})^{2}\,d{k}^{\prime}+\widetilde{R}^{[3]}({k}). (85)

We can note that

|R[3](k,k)|13!supk′′|f~[3](k′′)||kk|3|H(kk)|2𝑑k.|{R}^{[3]}({k},{k}^{\prime})|\leq\frac{1}{3!}\sup_{{k}^{\prime\prime}}|\widetilde{f}^{[3]}({k}^{\prime\prime})|\int|{k}^{\prime}-{k}|^{3}\left|H({k}^{\prime}-{k})\right|^{2}\,d{k}^{\prime}. (86)

Putting these terms together we obtain that

𝐄{I~h(k)}\displaystyle{\mathbf{E}}\{\widetilde{I}_{h}({k})\} =λ+λ2(1εl)f~(k)+12λ2f~[2](k)𝒲|H(kk)|2(kk)2𝑑k+R~[3](k)\displaystyle=\lambda+\lambda^{2}(1-\varepsilon_{l})\widetilde{f}({k})+\frac{1}{2}\lambda^{2}\widetilde{f}^{[2]}({k})\int_{\mathcal{W}}\left|H({k}^{\prime}-{k})\right|^{2}({k}^{\prime}-{k})^{2}\,d{k}^{\prime}+\widetilde{R}^{[3]}({k}) (87)
=λ+λ2f~(k)+o(1),\displaystyle=\lambda+\lambda^{2}\widetilde{f}({k})+o(1), (88)

which completes the proof. ∎

Appendix E Proof of Theorem 4.1

Proof.

The expectation of the tapered Fourier transform is (see (9))

𝐄It(k)\displaystyle{\mathbf{E}}I^{t}({k}) =2dh(x)h(y)e2πik(xy)ρ(2)(x,y)𝑑x𝑑y+dh2(x)ρ(1)(x)𝑑x\displaystyle=\int_{{\mathbb{R}}^{2d}}h(x)h(y)e^{-2\pi i{k}\cdot(x-y)}\rho^{(2)}(x,y)\,dx\,dy+\int_{{\mathbb{R}}^{d}}h^{2}(x)\rho^{(1)}(x)\,dx
=λdh2(x)𝑑x+B2h(x)h(y)e2πik(xy)ρ(2)(xy)𝑑x𝑑y\displaystyle=\lambda\int_{{\mathbb{R}}^{d}}h^{2}(x)dx+\int_{B^{2}}h(x)h(y)e^{-2\pi i{k}\cdot(x-y)}\rho^{(2)}(x-y)dxdy (89)
=λ1+B2h(x)h(y)e2πik(xy)[ρ(2)(xy)λ2]𝑑x𝑑y\displaystyle=\lambda\cdot 1+\int_{B^{2}}h(x)h(y)e^{-2\pi i{k}\cdot(x-y)}[\rho^{(2)}(x-y)-\lambda^{2}]dxdy
+λ2B2h(x)h(y)ei2πk(xy)𝑑x𝑑y\displaystyle+\lambda^{2}\int_{B^{2}}h(x)h(y)e^{-i2\pi{k}\cdot(x-y)}dxdy
=λ+ddh(x)h(y)e2πik(xy)[ρ(2)(xy)λ2]𝑑x𝑑y+λ2h(B,k)h(B,k)\displaystyle=\lambda+\int_{{\mathbb{R}}^{d}}\int_{{\mathbb{R}}^{d}}h(x)h(y)e^{-2\pi i{k}\cdot(x-y)}[\rho^{(2)}(x-y)-\lambda^{2}]dxdy+\lambda^{2}h(B,{k})h(B,-{k})
=λ+ddh(x)h(y)e2πik(xy)[ρ(2)(xy)λ2]𝑑x𝑑y+λ2|H(k)|2\displaystyle=\lambda+\int_{{\mathbb{R}}^{d}}\int_{{\mathbb{R}}^{d}}h(x)h(y)e^{-2\pi i{k}\cdot(x-y)}[\rho^{(2)}(x-y)-\lambda^{2}]dxdy+\lambda^{2}\left|H({k})\right|^{2}
=λ+𝐄It,(2)(k)+λ2|H(k)|2,\displaystyle=\lambda+{\mathbf{E}}I^{t,(2)}({k})+\lambda^{2}\left|H({k})\right|^{2}, (90)

this defining the quantity 𝐄It,(2)(k).{\mathbf{E}}I^{t,(2)}({k}).

The expression 𝐄It,(2)(k){\mathbf{E}}I^{t,(2)}({k}) mirrors what we can see in (56). We see directly the benefits of tapering: |H(k)||H({k})| decays rapidly away from zero, so the third term has essentially no effect, and there is less blurring in the second term except very close to k=0{k}=0.

We can re-express this understanding in the Fourier domain, again using the convolution theorem. We define

𝐄It,(2)(k)=ddh(x)h(y)e2πik(xy)[ρ(2)(xy)λ2]𝑑x𝑑y.{\mathbf{E}}I^{t,(2)}({k})=\int_{{\mathbb{R}}^{d}}\int_{{\mathbb{R}}^{d}}h(x)h(y)e^{-2\pi i{k}\cdot(x-y)}[\rho^{(2)}(x-y)-\lambda^{2}]dxdy.

To explore this further we again do a change of variables:

𝐄It,(2)(k)=dBzh(x)h(xz)ei2πkz[ρ(2)(z)λ2]𝑑x𝑑z.{\mathbf{E}}I^{t,(2)}({k})=\int_{\mathbb{R}^{d}}\int_{B_{z}}h(x)h(x-z)e^{-i2\pi{k}\cdot z}[\rho^{(2)}(z)-\lambda^{2}]dxdz.

We define the inner product window using that h(x)h(x) is compactly supported to get

Wh(z)=Bzh(x)h(xz)𝑑x=dh(x)h(xz)𝑑x=d|H(k)|2e2πikz𝑑k.W_{h}(z)=\int_{B_{z}}h(x)h(x-z)dx=\int_{{\mathbb{R}}^{d}}h(x)h(x-z)dx=\int_{{\mathbb{R}}^{d}}\left|H({k})\right|^{2}e^{2\pi i{k}\cdot z}\,d{k}.
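As a small numerical illustration of the inner product window (one dimension, a unit-norm sine taper on [0,l]; all names and tolerances ours), W_h(0) recovers the unit norm ∫h² = 1, and W_h(z) decays as the offset grows:

```python
import numpy as np

l = 10.0
x = np.linspace(0.0, l, 200_001)
dx = x[1] - x[0]
h = np.sqrt(2.0 / l) * np.sin(np.pi * x / l)   # unit-norm sine taper on [0, l]

def W(z):
    """W_h(z) = int h(x) h(x - z) dx, extending h by zero outside [0, l]."""
    shifted = np.sqrt(2.0 / l) * np.sin(np.pi * (x - z) / l)
    shifted[(x - z < 0.0) | (x - z > l)] = 0.0
    return np.sum(h * shifted) * dx

w0 = W(0.0)        # equals int h^2 = 1, up to discretisation error
w_half = W(l / 2)  # strictly smaller: the window tapers off in z
```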

We could also divide by |BBz|h(x)h(xz)|B\cap B_{-z}|h(x)h(x-z) as long as we are inside the domain. Outside BB we cannot, and so this is why we cannot remove all bias in the spectrum even if we bias–correct as suggested by [34].

With this window we can note that

𝐄It,(2)(k)=λ2d|H(kk)|2f~(k)𝑑k.{\mathbf{E}}I^{t,(2)}({k})=\lambda^{2}\int_{\mathbb{R}^{d}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\;d{k}^{\prime}. (91)

Ideally we would like |H(k)|2=δ(k)\left|H({k})\right|^{2}=\delta({k}), but this is not possible as |B||B| is finite. If we assume f~(k)\widetilde{f}({k}) is smooth and possesses two derivatives, then as BB grows we can obtain near unbiasedness. We now assume that |H(k)|2\left|H({k})\right|^{2} is concentrated to a region in wavenumber 𝒲{\cal W} and again use a Taylor expansion. There are two approaches to this, either [60] or [71].

The systematic bias term λ2h(B,k)h(B,k)=λ2|H(k)|2\lambda^{2}h(B,{k})h(B,-{k})=\lambda^{2}\left|H({k})\right|^{2} replaces the product of sinc functions obtained for the square box. As we have chosen the function hh to be well–concentrated [71, 60, 76], the effect of this term is limited.

We assume that

𝒲|H(k)|2𝑑k=1εl,whereεl=o(1).\int_{{\cal W}}\left|H({k})\right|^{2}\;d{k}=1-\varepsilon_{l},\quad{\mathrm{where}}\quad\varepsilon_{l}=o(1). (92)

We also assume that f~(k)\tilde{f}({k}) is upper bounded by f~0\|\widetilde{f}\|_{0}. We then have

𝐄It,(2)(k)\displaystyle{\mathbf{E}}I^{t,(2)}({k}) =λ2d\𝒲|H(kk)|2f~(k)𝑑k+λ2𝒲|H(kk)|2f~(k)𝑑k.\displaystyle=\lambda^{2}\int_{\mathbb{R}^{d}\backslash{\cal W}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\;d{k}^{\prime}+\lambda^{2}\int_{{\cal W}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\;d{k}^{\prime}. (93)

We note that

d\𝒲|H(kk)|2f~(k)𝑑kf~0{1{1εl}}.\int_{\mathbb{R}^{d}\backslash{\cal W}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\;d{k}^{\prime}\leq\|\widetilde{f}\|_{0}\{1-\{1-\varepsilon_{l}\}\}.

We can yet again utilise the Taylor expansion of (62) inside 𝒲{\cal W}. We find

𝐄It,(2)(k)\displaystyle{\mathbf{E}}I^{t,(2)}({k}) =λ2𝒲|H(kk)|2f~(k)𝑑k{1+o(1)}\displaystyle=\lambda^{2}\int_{{\cal W}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\;d{k}^{\prime}\left\{1+o(1)\right\}
=λ2𝒲|H(kk)|2{f~(k)+f~(k)T(kk)+12(kk)T𝐇~f(k′′)(kk)}𝑑k{1+o(1)}\displaystyle=\lambda^{2}\int_{{\cal W}}\left|H({k}^{\prime}-{k})\right|^{2}\left\{\widetilde{f}({k})+\nabla\widetilde{f}({k})^{T}\left({k}^{\prime}-{k}\right)+\frac{1}{2}\left({k}^{\prime}-{k}\right)^{T}\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\left({k}^{\prime}-{k}\right)\right\}\;d{k}^{\prime}\left\{1+o(1)\right\}
=λ2f~(k)+0+λ2𝒲|H(kk)|212(kk)T𝐇~f(k′′)(kk)𝑑k.\displaystyle=\lambda^{2}\widetilde{f}({k})+0+\lambda^{2}\int_{{\cal W}}\left|H({k}^{\prime}-{k})\right|^{2}\frac{1}{2}\left({k}^{\prime}-{k}\right)^{T}\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\left({k}^{\prime}-{k}\right)\;d{k}^{\prime}. (94)

We note that

𝒲|H(kk)|212(kk)T𝐇~f(k′′)(kk)𝑑k2\displaystyle\|\int_{{\cal W}}\left|H({k}^{\prime}-{k})\right|^{2}\frac{1}{2}\left({k}^{\prime}-{k}\right)^{T}\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\left({k}^{\prime}-{k}\right)\;d{k}^{\prime}\|^{2}
|𝐇~f(k′′)|𝒲|H(kk)|212(kk)T(kk)𝑑k\displaystyle\leq|\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})|\int_{{\cal W}}\left|H({k}^{\prime}-{k})\right|^{2}\frac{1}{2}\left({k}^{\prime}-{k}\right)^{T}\left({k}^{\prime}-{k}\right)\;d{k}^{\prime}
=|𝐇~f(k′′)|𝒲|H(k)|2w2𝑑k=𝒪(1/l).\displaystyle=|\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})|\int_{{\cal W}}\left|H({k})\right|^{2}w^{2}d{k}={\cal O}(1/l).

Then

𝐄It(k)\displaystyle{\mathbf{E}}I^{t}({k}) =λ+𝐄It,(2)(k)+λ2|H(k)|2\displaystyle=\lambda+{\mathbf{E}}I^{t,(2)}({k})+\lambda^{2}\left|H({k})\right|^{2}
=f(k)+0+𝒪(1/l)+λ2|H(k)|2.\displaystyle=f({k})+0+{\cal O}(1/l)+\lambda^{2}\left|H({k})\right|^{2}.

We have that |H(k)|2\left|H({k})\right|^{2} decays in k\|{k}\|. If k𝒲{k}\notin{\mathcal{W}} then the influence of this term becomes negligible. ∎
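As a sanity check of this expectation result (an illustrative Monte Carlo sketch, not the paper's code): for a homogeneous Poisson process the spectral density is flat, f({k})=\lambda, so at wavenumbers away from the origin the tapered periodogram should be nearly unbiased for \lambda. The separable first sine taper below is our own choice of h:

```python
import numpy as np

# Monte Carlo sanity check (illustrative, not the paper's code): for a
# homogeneous Poisson process f(k) = lambda, so away from the origin the
# tapered periodogram should be nearly unbiased for lambda.  The separable
# first sine taper below is our own choice of h.
rng = np.random.default_rng(0)
lam, l = 200.0, 1.0                       # intensity; B = [0, l]^2
k = np.array([8.0, 0.0])                  # wavenumber well away from 0

def taper(pts):
    # h(x, y) = product of 1-d sine tapers, normalised so int_B h^2 = 1
    return np.prod(np.sqrt(2.0 / l) * np.sin(np.pi * pts / l), axis=-1)

reps = 600
I = np.empty(reps)
for r in range(reps):
    npts = rng.poisson(lam * l ** 2)
    pts = rng.uniform(0.0, l, size=(npts, 2))
    J = np.sum(taper(pts) * np.exp(-2j * np.pi * pts @ k))  # tapered DFT
    I[r] = np.abs(J) ** 2                 # tapered periodogram I^t(k)

print(I.mean())                           # close to lam when k is away from 0
```

At {k} near the origin the \lambda^{2}\left|H({k})\right|^{2} term would dominate instead, consistent with the bias statement above.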

Appendix F Proof of Theorem 5.1

Proof.

For brevity we define the notation \phi_{k}(x)=e^{-2\pi i{k}\cdot x}. From first principles, using Campbell's theorem, we may deduce that the uncentred cross-moment of I^{t}_{{m}}({k}_{1}) and I^{t}_{{m^{\prime}}}({k}_{2}) takes the form

𝐄{Imt(k1)Imt(k2)}=\displaystyle{\mathbf{E}}\left\{I^{t}_{{m}}({k}_{1})I^{t}_{{m^{\prime}}}({k}_{2})\right\}= 𝐄[x,yXhm(x)hm(y)ϕk1(xy)+xXhm2(x)][z,vXhm(z)hm(v)ϕk2(zv)+qXhm2(q)]\displaystyle{\mathbf{E}}\left[\sum^{\neq}_{x,y\in X}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)+\sum_{x\in X}h_{{m}}^{2}(x)\right]\left[\sum_{z,v\in X}^{\neq}h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{2}}(z-v)+\sum_{q\in X}h_{{m^{\prime}}}^{2}(q)\right]
=\displaystyle= 𝐄{x,yXhm(x)hm(y)ϕk1(xy)z,vXhm(z)hm(v)ϕk2(zv)}\displaystyle{\mathbf{E}}\left\{\sum^{\neq}_{x,y\in X}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\sum_{z,v\in X}^{\neq}h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{2}}(z-v)\right\}
+𝐄{xXhm2(x)y,zXhm(y)hm(z)ϕk2(yz)+xXhm2(x)y,zXhm(y)hm(z)ϕk1(yz)}\displaystyle+{\mathbf{E}}\left\{\sum_{x\in X}h_{{m}}^{2}(x)\sum^{\neq}_{y,z\in X}h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(z)\phi_{{k}_{2}}(y-z)+\sum_{x\in X}h_{{m^{\prime}}}^{2}(x)\sum^{\neq}_{y,z\in X}h_{{m}}(y)h_{{m}}(z)\phi_{{k}_{1}}(y-z)\right\}
+𝐄[xXhm2(x)yXhm2(y)]\displaystyle+{\mathbf{E}}\left[\sum_{x\in X}h_{{m}}^{2}(x)\sum_{y\in X}h_{{m^{\prime}}}^{2}(y)\right]
=\displaystyle= V1(k1,k2)+V2(k1,k2)+V3.\displaystyle V_{1}({k}_{1},{k}_{2})+V_{2}({k}_{1},{k}_{2})+V_{3}. (95)

This latter equation defines the three terms V_{1}({k}_{1},{k}_{2}), V_{2}({k}_{1},{k}_{2}) and V_{3}; the last does not depend on {k}_{1} or {k}_{2}, while V_{2} is the sum of two terms, one depending only on {k}_{1} and the other only on {k}_{2}. We now split these terms further in order to account for the cases in which locations coincide. We start with V_{1}. For convenience we suppress the "\in X" notation. First, we note:

V1(k1,k2)\displaystyle V_{1}({k}_{1},{k}_{2}) =𝐄{x,yhm(x)hm(y)ϕk1(xy)z,vhm(z)hm(v)ϕk2(zv)}\displaystyle={\mathbf{E}}\left\{\sum^{\neq}_{x,y}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\sum_{z,v}^{\neq}h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{2}}(z-v)\right\}
=𝐄x,yz,v𝟏(xz,xv)𝟏(yz,yv)hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(zv)\displaystyle={\mathbf{E}}\sum^{\neq}_{x,y}\sum_{z,v}^{\neq}\mathbf{1}\left(x\neq z,x\neq v\right)\mathbf{1}\left(y\neq z,y\neq v\right)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v)
+𝐄x,yz,v𝟏(x=z,xv)𝟏(yz,yv)hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(xv)\displaystyle+{\mathbf{E}}\sum^{\neq}_{x,y}\sum_{z,v}^{\neq}\mathbf{1}\left(x=z,x\neq v\right)\mathbf{1}\left(y\neq z,y\neq v\right)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(x-v)
+𝐄x,yz,v𝟏(xz,x=v)𝟏(yz,yv)hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(zv)\displaystyle+{\mathbf{E}}\sum^{\neq}_{x,y}\sum_{z,v}^{\neq}\mathbf{1}\left(x\neq z,x=v\right)\mathbf{1}\left(y\neq z,y\neq v\right)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v)
+𝐄x,yz,v𝟏(xz,xv)𝟏(y=z,yv)hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(zv)\displaystyle+{\mathbf{E}}\sum^{\neq}_{x,y}\sum_{z,v}^{\neq}\mathbf{1}\left(x\neq z,x\neq v\right)\mathbf{1}\left(y=z,y\neq v\right)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v)
+𝐄x,yz,v𝟏(xz,xv)𝟏(yz,y=v)hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(zv)\displaystyle+{\mathbf{E}}\sum^{\neq}_{x,y}\sum_{z,v}^{\neq}\mathbf{1}\left(x\neq z,x\neq v\right)\mathbf{1}\left(y\neq z,y=v\right)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v)
+𝐄x,yz,v𝟏(x=z,xv)𝟏(y=v,yz)hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(zv)\displaystyle+{\mathbf{E}}\sum^{\neq}_{x,y}\sum_{z,v}^{\neq}\mathbf{1}\left(x=z,x\neq v\right)\mathbf{1}\left(y=v,y\neq z\right)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v)
+𝐄x,yz,v𝟏(x=v,xz)𝟏(y=z,yv)hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(zv).\displaystyle+{\mathbf{E}}\sum^{\neq}_{x,y}\sum_{z,v}^{\neq}\mathbf{1}\left(x=v,x\neq z\right)\mathbf{1}\left(y=z,y\neq v\right)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v).

We have here assumed that h_{{m}}(x) is compactly supported on B for all values of {m} used. Using the simplification implied by (9) we obtain the expression

V1(k1,k2)\displaystyle V_{1}({k}_{1},{k}_{2}) =𝐄{x,yhm(x)hm(y)ϕk1(xy)z,vhm(z)hm(v)ϕk2(zv)}\displaystyle={\mathbf{E}}\left\{\sum^{\neq}_{x,y}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\sum_{z,v}^{\neq}h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{2}}(z-v)\right\}
=B4ρ(4)(x,y,z,v)hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(zv)𝑑x𝑑y𝑑v𝑑z\displaystyle=\int_{B^{4}}\rho^{(4)}(x,y,z,v)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v)dxdydvdz
+B3ρ(3)(x,y,v)hm(x)hm(y)hm(x)hm(v)ϕk1(xy)ϕk2(xv)𝑑x𝑑y𝑑v\displaystyle+\int_{B^{3}}\rho^{(3)}(x,y,v)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(x-v)dxdydv
+B3ρ(3)(x,y,z)hm(x)hm(y)hm(z)hm(x)ϕk1(xy)ϕk2(zx)𝑑x𝑑y𝑑z\displaystyle+\int_{B^{3}}\rho^{(3)}(x,y,z)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(x)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-x)dxdydz
+B3ρ(3)(x,y,v)hm(x)hm(y)hm(y)hm(v)ϕk1(xy)ϕk2(yv)𝑑x𝑑y𝑑v\displaystyle+\int_{B^{3}}\rho^{(3)}(x,y,v)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(y-v)dxdydv
+B3ρ(3)(x,y,z)hm(x)hm(y)hm(z)hm(y)ϕk1(xy)ϕk2(zy)𝑑x𝑑y𝑑z\displaystyle+\int_{B^{3}}\rho^{(3)}(x,y,z)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(y)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-y)dxdydz
+B2ρ(2)(x,y)hm(x)hm(y)hm(x)hm(y)ϕk1(xy)ϕk2(xy)𝑑x𝑑y\displaystyle+\int_{B^{2}}\rho^{(2)}(x,y)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(x-y)dxdy
+B2ρ(2)(x,y)hm(x)hm(y)hm(y)hm(x)ϕk1(xy)ϕk2(yx)𝑑x𝑑y.\displaystyle+\int_{B^{2}}\rho^{(2)}(x,y)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(x)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(y-x)dxdy.

We will be able to use the orthogonality of the data tapers \{h_{{m}}(x)\}, but this is easier in the Fourier domain, where we can assume local smoothness of the Fourier transform of \rho^{(n)}(\cdot) and carry out the usual Taylor expansion. We now start by implementing the calculations for the Poisson case, to see the mechanics.

The next term in the expansion is then

V_{2}({k}_{1},{k}_{2})={\mathbf{E}}\left\{\sum_{v}h_{{m}}^{2}(v)\sum^{\neq}_{x,y}h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)+\sum_{v}h_{{m^{\prime}}}^{2}(v)\sum^{\neq}_{x,y}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\right\}
=B3ρ(3)(x,y,v)hm2(v)hm(x)hm(y)ϕk2(xy)𝑑x𝑑y𝑑v+B2ρ(2)(x,y)hm(x)hm(y)hm2(x)ϕk2(xy)𝑑x𝑑y\displaystyle=\int_{B^{3}}\rho^{(3)}(x,y,v)h_{{m}}^{2}(v)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\;dxdydv+\iint_{B^{2}}\rho^{(2)}(x,y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)h_{{m}}^{2}(x)\phi_{{k}_{2}}(x-y)\;dxdy
+B2ρ(2)(x,y)hm2(y)hm(x)hm(y)ϕk2(xy)𝑑x𝑑y+B3ρ(3)(x,y,v)hm2(v)hm(x)hm(y)ϕk1(xy)𝑑x𝑑y𝑑v\displaystyle+\iint_{B^{2}}\rho^{(2)}(x,y)h_{{m}}^{2}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\;dxdy+\int_{B^{3}}\rho^{(3)}(x,y,v)h_{{m^{\prime}}}^{2}(v)h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\;dxdydv
+B2ρ(2)(x,y)hm2(x)hm(x)hm(y)ϕk1(xy)𝑑x𝑑y+B2ρ(2)(x,y)hm2(y)hm(x)hm(y)ϕk1(xy)𝑑x𝑑y.\displaystyle+\iint_{B^{2}}\rho^{(2)}(x,y)h_{{m^{\prime}}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\;dxdy+\iint_{B^{2}}\rho^{(2)}(x,y)h_{{m^{\prime}}}^{2}(y)h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\;dxdy.

The final term, in turn using (9), is

V3\displaystyle V_{3} =𝐄[xhm2(x)yhm2(y)]=𝐄xyhm2(x)hm2(y)+𝐄xhm2(x)hm2(x)\displaystyle={\mathbf{E}}\left[\sum_{x}h_{{m}}^{2}(x)\sum_{y}h_{{m^{\prime}}}^{2}(y)\right]={\mathbf{E}}\sum_{x\neq y}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(y)+{\mathbf{E}}\sum_{x}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)
=B2ρ(2)(x,y)hm2(x)hm2(y)𝑑x𝑑y+λBhm2(x)hm2(x)𝑑x.\displaystyle=\int_{B^{2}}\rho^{(2)}(x,y)h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(y)\,dx\,dy+\lambda\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\,dx. (96)

We can put these terms together to determine the covariance between the periodogram at two different wavenumbers and using two different tapers as follows:

{\mathrm{\bf Cov}}\{I^{t}_{{m}}({k}_{1}),I^{t}_{{m^{\prime}}}({k}_{2})\}=\int_{B^{4}}\rho^{(4)}(x,y,z,v)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v)\,dxdydvdz
+\int_{B^{3}}\rho^{(3)}(x,y,v)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(x-v)\,dxdydv
+\int_{B^{3}}\rho^{(3)}(x,y,z)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(x)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-x)\,dxdydz
+\int_{B^{3}}\rho^{(3)}(x,y,v)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(y-v)\,dxdydv
+\int_{B^{3}}\rho^{(3)}(x,y,z)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(y)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-y)\,dxdydz
+\int_{B^{2}}\rho^{(2)}(x,y)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(x-y)\,dxdy
+\int_{B^{2}}\rho^{(2)}(x,y)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(x)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(y-x)\,dxdy
+\int_{B^{3}}\rho^{(3)}(x,y,v)h_{{m}}^{2}(v)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\,dxdydv+\int_{B^{2}}\rho^{(2)}(x,y)h_{{m}}^{2}(x)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\,dxdy
+\int_{B^{2}}\rho^{(2)}(x,y)h_{{m}}^{2}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\,dxdy+\int_{B^{3}}\rho^{(3)}(x,y,v)h_{{m^{\prime}}}^{2}(v)h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\,dxdydv
+\int_{B^{2}}\rho^{(2)}(x,y)h_{{m^{\prime}}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\,dxdy+\int_{B^{2}}\rho^{(2)}(x,y)h_{{m^{\prime}}}^{2}(y)h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\,dxdy
+\int_{B^{2}}\rho^{(2)}(x,y)h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(y)\,dxdy+\lambda\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\,dx
-\left\{\lambda+\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\rho^{(2)}(x-y)\,dxdy\right\}\left\{\lambda+\int_{B^{2}}h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\rho^{(2)}(x-y)\,dxdy\right\}
=\int_{B^{4}}\rho^{(4)}(x,y,z,v)m^{(4)}(x,y,z,v)\,dxdydzdv+\int_{B^{3}}\rho^{(3)}(x,y,z)m^{(3)}(x,y,z)\,dxdydz
+\int_{B^{2}}\rho^{(2)}(x,y)m^{(2)}(x,y)\,dxdy
-\left\{\lambda+\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\rho^{(2)}(x-y)\,dxdy\right\}\left\{\lambda+\int_{B^{2}}h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\rho^{(2)}(x-y)\,dxdy\right\},

this defining m(4)(x,y,z,v)m^{(4)}(x,y,z,v), m(3)(x,y,z)m^{(3)}(x,y,z) and m(2)(x,y)m^{(2)}(x,y), respectively.

We start by noting that the multiplier of ρ(3)\rho^{(3)} takes the form of

m(3)(x,y,z)\displaystyle m^{(3)}(x,y,z) =hm(x)hm(y)hm(x)hm(z)ϕk1(xy)ϕk2(xz)+hm(x)hm(y)hm(z)hm(x)ϕk1(xy)ϕk2(xz)\displaystyle=h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(z)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(x-z)+h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(x)\phi_{{k}_{1}}(x-y)\phi_{-{k}_{2}}(x-z)
+hm(x)hm(y)hm(y)hm(z)ϕk1(xy)ϕk2(yz)+hm(x)hm(y)hm(z)hm(y)ϕk1(xy)ϕk2(yz)\displaystyle+h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(z)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(y-z)+h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(y)\phi_{{k}_{1}}(x-y)\phi_{-{{k}_{2}}}(y-z)
+hm2(z)hm(x)hm(y)ϕk2(xy)+hm2(z)hm(x)hm(y)ϕk1(xy)\displaystyle+h_{{m}}^{2}(z)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)+h_{{m^{\prime}}}^{2}(z)h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)
=hm(x)hm(y)hm(x)hm(z)ϕk1(xy){ϕk2(xz)+ϕk2(xz)}\displaystyle=h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(z)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-z)+\phi_{-{{k}_{2}}}(x-z)\right\}
+hm(x)hm(y)hm(y)hm(z)ϕk1(xy){ϕk2(yz)+ϕk2(yz)}\displaystyle+h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(z)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(y-z)+\phi_{-{{k}_{2}}}(y-z)\right\}
+ϕk2(xy)hm2(z)hm(x)hm(y)+ϕk1(xy)hm2(z)hm(x)hm(y)\displaystyle+\phi_{{k}_{2}}(x-y)h_{{m}}^{2}(z)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+\phi_{{k}_{1}}(x-y)h_{{m^{\prime}}}^{2}(z)h_{{m}}(x)h_{{m}}(y)

Looking at the positive term multiplying ρ(2)(x,y)\rho^{(2)}(x,y) we can simplify to

m(2)(x,y)\displaystyle m^{(2)}(x,y) =hm(x)hm(y)hm(x)hm(y)ϕk1(xy)ϕk2(xy)+hm(x)hm(y)hm(y)hm(x)ϕk1(xy)ϕk2(yx)\displaystyle=h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(x-y)+h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(x)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(y-x)
+hm2(x)hm(x)hm(y)ϕk2(xy)+hm2(y)hm(x)hm(y)ϕk1(xy)+hm2(x)hm(x)hm(y)ϕk1(xy)\displaystyle+h_{{m}}^{2}(x)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)+h_{{m}}^{2}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{1}}(x-y)+h_{{m^{\prime}}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)
+hm2(x)hm(x)hm(y)ϕk2(xy)+hm2(x)hm2(y)\displaystyle+h_{{m^{\prime}}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{2}}(x-y)+h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(y)
=hm(x)hm(y)hm(x)hm(y)ϕk1(xy){ϕk2(xy)+ϕk2(xy)}\displaystyle=h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-y)+\phi_{-{k_{2}}}(x-y)\right\}
+{hm2(x)hm(x)hm(y)+hm2(x)hm(x)hm(y)}{ϕk2(xy)+ϕk1(xy)}+hm2(x)hm2(y).\displaystyle+\left\{h_{{m}}^{2}(x)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+h_{{m^{\prime}}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\right\}\cdot\left\{\phi_{{k}_{2}}(x-y)+\phi_{{k}_{1}}(x-y)\right\}+h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(y).

Then it follows that we can simplify the final term:

\left\{\lambda+\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\rho^{(2)}(x-y)\,dxdy\right\}\left\{\lambda+\int_{B^{2}}h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\rho^{(2)}(x-y)\,dxdy\right\}
=λ2+λB2(hm(x)hm(y)ϕk1(xy)+hm(x)hm(y)ϕk2(xy))ρ(2)(xy)𝑑x𝑑y\displaystyle=\lambda^{2}+\lambda\int_{B^{2}}(h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)+h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y))\rho^{(2)}(x-y)dxdy
+{B2hm(x)hm(y)ϕk1(xy)ρ(2)(xy)𝑑x𝑑y}{B2hm(x)hm(y)ϕk2(xy)ρ(2)(xy)𝑑x𝑑y}\displaystyle+\left\{\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\rho^{(2)}(x-y)dxdy\right\}\left\{\int_{B^{2}}h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\rho^{(2)}(x-y)dxdy\right\}

We then get for the tapered periodogram:

𝐂𝐨𝐯\displaystyle{\mathrm{\bf Cov}} {Imt(k1),Imt(k2)}=B4ρ(4)(x,y,z,v)hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(zv)𝑑x𝑑y𝑑v𝑑z\displaystyle\{I^{t}_{{m}}({{k}_{1}}),I^{t}_{{m^{\prime}}}({{k}_{2}})\}=\int_{B^{4}}\rho^{(4)}(x,y,z,v)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v)\;dxdydvdz
+B3ρ(3)(x,y,v)[hm(x)hm(y)hm(x)hm(v)ϕk1(xy){ϕk2(xv)+ϕk2(xv)}\displaystyle+\int_{B^{3}}\rho^{(3)}(x,y,v)\left[h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-v)+\phi_{-{{k}_{2}}}(x-v)\right\}\right.
+hm(x)hm(y)hm(y)hm(v)ϕk1(xy){ϕk2(yv)+ϕk2(yv)}\displaystyle+h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(y-v)+\phi_{-{{k}_{2}}}(y-v)\right\}
+ϕk2(xy)hm2(v)hm(x)hm(y)+ϕk1(xy)hm2(v)hm(x)hm(y)]dxdydv\displaystyle\left.+\phi_{{k}_{2}}(x-y)h_{{m}}^{2}(v)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+\phi_{{k}_{1}}(x-y)h_{{m^{\prime}}}^{2}(v)h_{{m}}(x)h_{{m}}(y)\right]\;dxdydv
+B2ρ(2)(x,y)[hm(x)hm(y)hm(x)hm(y)ϕk1(xy){ϕk2(xy)+ϕk2(xy)}\displaystyle+\int_{B^{2}}\rho^{(2)}(x,y)\left[h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-y)+\phi_{-{{k}_{2}}}(x-y)\right\}\right.
+{hm2(x)hm(x)hm(y)+hm2(x)hm(x)hm(y)}{ϕk2(xy)+ϕk1(xy)}+hm2(x)hm2(y)]dxdy\displaystyle+\left\{h_{{m}}^{2}(x)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+h_{{m^{\prime}}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\right\}\left.\left\{\phi_{{k}_{2}}(x-y)+\phi_{{k}_{1}}(x-y)\right\}+h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(y)\right]\,dxdy
+λBhm2(x)hm2(x)𝑑xλ2λB2hm(x)hm(y)hm(x)hm(y)(ϕk1(xy)+ϕk2(xy))ρ(2)(x,y)𝑑x𝑑y\displaystyle+\lambda\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\,dx-\lambda^{2}-\lambda\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)(\phi_{{k}_{1}}(x-y)+\phi_{{k}_{2}}(x-y))\rho^{(2)}(x,y)dxdy
{B2hm(x)hm(y)ϕk1(xy)ρ(2)(x,y)𝑑x𝑑y}{B2hm(x)hm(y)ϕk2(xy)ρ(2)(x,y)𝑑x𝑑y}.\displaystyle-\left\{\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\rho^{(2)}(x,y)dxdy\right\}\cdot\left\{\int_{B^{2}}h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\rho^{(2)}(x,y)dxdy\right\}.

This gives the covariance between the tapered periodograms with different tapers, at different wavenumbers, and is quite a useful general expression, as the following corollaries will show.

Appendix G Proof of Corollary 5.1 Part I

Proof.

This corollary determines both the variance of a single-taper periodogram and the cross-covariance between tapers at a fixed wavenumber {k} for a Poisson process. If we start from the Poisson assumption (namely (42)) then we find

𝐂𝐨𝐯\displaystyle{\mathrm{\bf Cov}} {Imt(k),Imt(k)}=B4λ4hm(x)hm(y)hm(z)hm(v)ϕk(xy)ϕk(zv)𝑑x𝑑y𝑑v𝑑z\displaystyle\{I^{t}_{{m}}({{k}}),I^{t}_{{m^{\prime}}}({{k}})\}=\int_{B^{4}}\lambda^{4}h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}}(x-y)\phi_{{k}}(z-v)\;dxdydvdz
+B3λ3[hm(x)hm(y)hm(x)hm(v)ϕk(xy){ϕk(xv)+ϕk(xv)}\displaystyle+\int_{B^{3}}\lambda^{3}\left[h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(v)\phi_{{k}}(x-y)\left\{\phi_{{k}}(x-v)+\phi_{-{{k}}}(x-v)\right\}\right.
+hm(x)hm(y)hm(y)hm(v)ϕk(xy){ϕk(yv)+ϕk(yv)}\displaystyle+h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(v)\phi_{{k}}(x-y)\left\{\phi_{{k}}(y-v)+\phi_{-{{k}}}(y-v)\right\}
+ϕk(xy)hm2(v)hm(x)hm(y)+ϕk(xy)hm2(v)hm(x)hm(y)]dxdydv\displaystyle\left.+\phi_{{k}}(x-y)h_{{m}}^{2}(v)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+\phi_{{k}}(x-y)h_{{m^{\prime}}}^{2}(v)h_{{m}}(x)h_{{m}}(y)\right]\;dxdydv
+B2λ2[hm(x)hm(y)hm(x)hm(y)ϕk(xy){ϕk(xy)+ϕk(xy)}\displaystyle+\int_{B^{2}}\lambda^{2}\left[h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}}(x-y)\left\{\phi_{{k}}(x-y)+\phi_{-{{k}}}(x-y)\right\}\right.
+{hm2(x)hm(x)hm(y)+hm2(x)hm(x)hm(y)}{ϕk(xy)+ϕk(xy)}+hm2(x)hm2(y)]dxdy\displaystyle+\left\{h_{{m}}^{2}(x)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+h_{{m^{\prime}}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\right\}\left.\left\{\phi_{{k}}(x-y)+\phi_{{k}}(x-y)\right\}+h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(y)\right]\,dxdy
+λBhm2(x)hm2(x)𝑑xλ2λB2hm(x)hm(y)hm(x)hm(y)(ϕk(xy)+ϕk(xy))λ2𝑑x𝑑y\displaystyle+\lambda\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\,dx-\lambda^{2}-\lambda\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)(\phi_{{k}}(x-y)+\phi_{{k}}(x-y))\lambda^{2}dxdy
{B2hm(x)hm(y)ϕk(xy)λ2𝑑x𝑑y}{B2hm(x)hm(y)ϕk(xy)λ2𝑑x𝑑y}.\displaystyle-\left\{\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)\phi_{{k}}(x-y)\lambda^{2}dxdy\right\}\cdot\left\{\int_{B^{2}}h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}}(x-y)\lambda^{2}dxdy\right\}.

We now need to massage this expression in order to understand the correlation. We first note that the last and first terms cancel exactly, leaving us with

𝐂𝐨𝐯\displaystyle{\mathrm{\bf Cov}} {Imt(k),Imt(k)}=λ3B3[hm(x)hm(y)hm(x)hm(v)ϕk(xy){ϕk(xv)+ϕk(xv)}\displaystyle\{I^{t}_{{m}}({{k}}),I^{t}_{{m^{\prime}}}({{k}})\}=\lambda^{3}\int_{B^{3}}\left[h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(v)\phi_{{k}}(x-y)\left\{\phi_{{k}}(x-v)+\phi_{-{{k}}}(x-v)\right\}\right.
+hm(x)hm(y)hm(y)hm(v)ϕk(xy){ϕk(yv)+ϕk(yv)}]dxdydv\displaystyle\left.+h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(v)\phi_{{k}}(x-y)\left\{\phi_{{k}}(y-v)+\phi_{-{{k}}}(y-v)\right\}\right]\;dxdydv
+λ3B2ϕk(xy)hm(x)hm(y)+ϕk(xy)hm(x)hm(y)dxdy\displaystyle+\lambda^{3}\int_{B^{2}}\phi_{{k}}(x-y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+\phi_{{k}}(x-y)h_{{m}}(x)h_{{m}}(y)\;dxdy
+λ2B2[hm(x)hm(y)hm(x)hm(y)ϕk(xy){ϕk(xy)+ϕk(xy)}\displaystyle+\lambda^{2}\int_{B^{2}}\left[h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}}(x-y)\left\{\phi_{{k}}(x-y)+\phi_{-{{k}}}(x-y)\right\}\right.
+{hm2(x)hm(x)hm(y)+hm2(x)hm(x)hm(y)}2ϕk(xy)]dxdy\displaystyle+\left\{h_{{m}}^{2}(x)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+h_{{m^{\prime}}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\right\}\left.2\phi_{{k}}(x-y)\right]\,dxdy
+λBhm2(x)hm2(x)𝑑x2λ3B2hm(x)hm(y)hm(x)hm(y)ϕk(xy)𝑑x𝑑y\displaystyle+\lambda\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\,dx-2\lambda^{3}\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}}(x-y)\;dxdy
=j=17Tj(k),\displaystyle=\sum_{j=1}^{7}T_{j}({k}),

this defining the sequence {Tj(k)}\{T_{j}({k})\}. We need some extra relationships to simplify these expressions. We note that

Bhm(x)hm(x)𝑑x=δmm\displaystyle\int_{B}h_{{m}}(x)h_{{m^{\prime}}}(x)\;dx=\delta_{{{m}}{{m^{\prime}}}} (97)
\displaystyle\int_{B}h_{{m}}(x)\phi_{2{k}}(x)\;dx\approx 0,\quad\|{k}\|>b_{h}>0, (98)
\displaystyle\int H_{{m}}({k}^{\prime}-2{k})H_{{m^{\prime}}}({k}^{\prime})\;d{k}^{\prime}\approx 0,\quad\|{k}\|>b_{h}>0, (99)

where bhb_{h} is the bandwidth of the window defined in (43). We find
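The orthonormality relation (97) can be checked numerically for a concrete taper family. A minimal sketch, assuming the sine tapers h_{{m}}(x)=\sqrt{2/l}\sin(\pi m x/l) on [0,l] (our illustrative choice, not necessarily the family prescribed by (43)):

```python
import numpy as np

# Numerical check of the orthogonality relation (97) for the sine taper
# family h_m(x) = sqrt(2/l) sin(pi m x / l) on [0, l] -- an illustrative
# orthonormal family, not necessarily the one prescribed by (43).
l = 1.0
n = 20000
x = (np.arange(n) + 0.5) * l / n          # midpoint quadrature grid
dx = l / n

def h(m):
    return np.sqrt(2.0 / l) * np.sin(np.pi * m * x / l)

ms = (1, 2, 3)
gram = np.array([[np.sum(h(m) * h(mp)) * dx for mp in ms] for m in ms])
print(np.round(gram, 8))                  # approximately the 3 x 3 identity
```

The Gram matrix of the first three tapers is numerically the identity, which is exactly the Kronecker delta \delta_{{m}{m^{\prime}}} appearing in (97).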

T1(k)\displaystyle T_{1}({k}) =λ3B3hm(x)hm(y)hm(x)hm(v)ϕk(xy){ϕk(xv)+ϕk(xv)}𝑑x𝑑y𝑑v\displaystyle=\lambda^{3}\int_{B^{3}}h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(v)\phi_{{k}}(x-y)\left\{\phi_{{k}}(x-v)+\phi_{-{{k}}}(x-v)\right\}\;dxdydv
=λ3Hm(k)Hm(k)Hm(k2k)Hm(k)𝑑k+λ3δmmHm(k)Hm(k).\displaystyle=\lambda^{3}H_{{m}}(-{k})H_{{m^{\prime}}}(-{k})\int H_{{m}}({k}^{\prime}-2{k})H_{{m^{\prime}}}({k}^{\prime})\;d{k}^{\prime}+\lambda^{3}\delta_{{{m}}{{m^{\prime}}}}H_{{m}}({k})H_{{m^{\prime}}}(-{k}). (100)

The next term is

T2(k)\displaystyle T_{2}({k}) =λ3B3hm(x)hm(y)hm(y)hm(v)ϕk(xy){ϕk(yv)+ϕk(yv)}𝑑x𝑑y𝑑v\displaystyle=\lambda^{3}\int_{B^{3}}h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(v)\phi_{{k}}(x-y)\left\{\phi_{{k}}(y-v)+\phi_{-{{k}}}(y-v)\right\}\;dxdydv
=λ3δmmHm(k)Hm(k)+λ3Hm(k)Hm(k)Hm(k2k)Hm(k)𝑑k.\displaystyle=\lambda^{3}\delta_{{{m}}{{m^{\prime}}}}H_{{m}}({k})H_{{m^{\prime}}}(-{k})+\lambda^{3}H_{{m}}(-{k})H_{{m^{\prime}}}(-{k})\int H_{{m}}({k}^{\prime}-2{k})H_{{m^{\prime}}}({k}^{\prime})\;d{k}^{\prime}. (101)

The following term is

T3(k)\displaystyle T_{3}({k}) =λ3B2ϕk(xy)hm(x)hm(y)+ϕk(xy)hm(x)hm(y)dxdy\displaystyle=\lambda^{3}\int_{B^{2}}\phi_{{k}}(x-y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+\phi_{{k}}(x-y)h_{{m}}(x)h_{{m}}(y)\;dxdy
=λ3Hm(k)Hm(k)+λ3Hm(k)Hm(k).\displaystyle=\lambda^{3}H_{{m^{\prime}}}({k})H_{{m^{\prime}}}(-{k})+\lambda^{3}H_{{m}}({k})H_{{m}}(-{k}).

The next term is

T4(k)\displaystyle T_{4}({k}) =λ2B2[hm(x)hm(y)hm(x)hm(y)ϕk(xy){ϕk(xy)+ϕk(xy)}]𝑑x𝑑y\displaystyle=\lambda^{2}\int_{B^{2}}\left[h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}}(x-y)\left\{\phi_{{k}}(x-y)+\phi_{-{{k}}}(x-y)\right\}\right]dxdy
=λ2|Hm(k2k)Hm(k)𝑑k|2+λ2δmm.\displaystyle=\lambda^{2}\left|\int H_{{m}}({k}^{\prime}-2{k})H_{{m^{\prime}}}({k}^{\prime})\;d{k}^{\prime}\right|^{2}+\lambda^{2}\delta_{{{m}}{{m^{\prime}}}}.

The next term then takes the form

T5(k)=2λ2B2{hm2(x)hm(x)hm(y)+hm2(x)hm(x)hm(y)}ϕk(xy)𝑑x𝑑y\displaystyle T_{5}({k})=2\lambda^{2}\int_{B^{2}}\left\{h_{{m}}^{2}(x)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+h_{{m^{\prime}}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\right\}\phi_{{k}}(x-y)\,dxdy
=2λ2{Hm(k)2dHm(k′′)Hm(kk′′)Hm(kk)𝑑k𝑑k′′+Hm(k)2dHm(k′′)Hm(kk′′)Hm(kk)𝑑k𝑑k′′}.\displaystyle=2\lambda^{2}\left\{H_{{m^{\prime}}}(-{k})\int_{{\mathbb{R}}^{2d}}H_{{m}}({k}^{\prime\prime})H_{{m}}({k}^{\prime}-{k}^{\prime\prime})H_{{m^{\prime}}}({k}-{k}^{\prime})\,d{k}^{\prime}d{k}^{\prime\prime}+H_{{m}}(-{k})\int_{{\mathbb{R}}^{2d}}H_{{m^{\prime}}}({k}^{\prime\prime})H_{{m^{\prime}}}({k}^{\prime}-{k}^{\prime\prime})H_{{m}}({k}-{k}^{\prime})\,d{k}^{\prime}d{k}^{\prime\prime}\right\}.

The next term takes the form of

T6(k)\displaystyle T_{6}({k}) =λBhm2(x)hm2(x)𝑑xλ|B|.\displaystyle=\lambda\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\,dx\sim\frac{\lambda}{|B|}.

And the final term is negative:

T7(k)\displaystyle T_{7}({k}) =2λ3B2hm(x)hm(y)hm(x)hm(y)ϕk(xy)𝑑x𝑑y\displaystyle=-2\lambda^{3}\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}}(x-y)\;dxdy
=2λ3|Hm(kk)Hm(k)𝑑k|2.\displaystyle=-2\lambda^{3}\left|\int H_{{m}}({k}-{k}^{\prime})H_{{m^{\prime}}}({k}^{\prime})\;d{k}^{\prime}\right|^{2}.

As we have assumed the wavenumber {k} is sufficiently large, both |H_{{m}}({k})|\approx 0 and |H_{{m^{\prime}}}({k})|\approx 0. This means the only surviving contributions are

𝐂𝐨𝐯{Imt(k),Imt(k)}λBhm2(x)hm2(x)𝑑x+λ2δmm.{\mathrm{\bf Cov}}\{I^{t}_{{m}}({{k}}),I^{t}_{{m^{\prime}}}({{k}})\}\approx\lambda\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\,dx+\lambda^{2}\delta_{{{m}}{{m^{\prime}}}}.

This yields the stated result. We can derive the o(1)o(1) terms formally should we wish by quantifying the leakage in ll, the length of a side. ∎
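A Monte Carlo sketch of this statement (illustrative, not the paper's code; separable sine tapers are our own choice of orthogonal family): for a planar Poisson process, periodograms built from two orthogonal tapers should be nearly uncorrelated at a wavenumber away from the origin.

```python
import numpy as np

# Monte Carlo sketch (illustrative, not the paper's code): periodograms from
# two orthogonal tapers are nearly uncorrelated for a Poisson process at a
# wavenumber away from the origin, as the surviving covariance terms predict.
rng = np.random.default_rng(1)
lam, l = 300.0, 1.0                       # intensity; B = [0, l]^2
k = np.array([10.0, 0.0])

def taper(pts, m):
    # h_m(x, y) = product of 1-d sine tapers of order m, int_B h_m^2 = 1
    return np.prod(np.sqrt(2.0 / l) * np.sin(np.pi * m * pts / l), axis=-1)

reps = 600
I1, I2 = np.empty(reps), np.empty(reps)
for r in range(reps):
    npts = rng.poisson(lam * l ** 2)
    pts = rng.uniform(0.0, l, size=(npts, 2))
    phase = np.exp(-2j * np.pi * pts @ k)
    I1[r] = np.abs(np.sum(taper(pts, 1) * phase)) ** 2
    I2[r] = np.abs(np.sum(taper(pts, 2) * phase)) ** 2

corr = np.corrcoef(I1, I2)[0, 1]
print(I1.mean(), I2.mean(), corr)         # means near lam; correlation near 0
```

The residual cross-taper covariance \lambda\int h_{{m}}^{2}h_{{m^{\prime}}}^{2} is of order \lambda, while each variance is of order \lambda^{2}, so the correlation is O(1/\lambda), consistent with the small sample correlation observed.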

Appendix H Proof of Corollary 5.1 Part II

Proof.

If we start from the Poisson assumption (namely (42)) then

𝐂𝐨𝐯\displaystyle{\mathrm{\bf Cov}} {Imt(k1),Imt(k2)}=B4λ4hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(zv)𝑑x𝑑y𝑑v𝑑z\displaystyle\{I^{t}_{{m}}({{k}_{1}}),I^{t}_{{m}}({{k}_{2}})\}=\int_{B^{4}}\lambda^{4}h_{{m}}(x)h_{{m}}(y)h_{{m}}(z)h_{{m}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v)\;dxdydvdz
+B3λ3[hm2(x)hm(y)hm(v)ϕk1(xy){ϕk2(xv)+ϕk2(xv)}\displaystyle+\int_{B^{3}}\lambda^{3}\left[h_{{m}}^{2}(x)h_{{m}}(y)h_{{m}}(v)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-v)+\phi_{-{{k}_{2}}}(x-v)\right\}\right.
+hm(x)hm2(y)hm(v)ϕk1(xy){ϕk2(yv)+ϕk2(yv)}\displaystyle+h_{{m}}(x)h_{{m}}^{2}(y)h_{{m}}(v)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(y-v)+\phi_{-{{k}_{2}}}(y-v)\right\}
+ϕk2(xy)hm2(v)hm(x)hm(y)+ϕk1(xy)hm2(v)hm(x)hm(y)]dxdydv\displaystyle\left.+\phi_{{k}_{2}}(x-y)h_{{m}}^{2}(v)h_{{m}}(x)h_{{m}}(y)+\phi_{{k}_{1}}(x-y)h_{{m}}^{2}(v)h_{{m}}(x)h_{{m}}(y)\right]\;dxdydv
+B2λ2[hm(x)hm(y)hm(x)hm(y)ϕk1(xy){ϕk2(xy)+ϕk2(xy)}\displaystyle+\int_{B^{2}}\lambda^{2}\left[h_{{m}}(x)h_{{m}}(y)h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-y)+\phi_{-{{k}_{2}}}(x-y)\right\}\right.
+{hm2(x)hm(x)hm(y)+hm2(x)hm(x)hm(y)}{ϕk2(xy)+ϕk1(xy)}+hm2(x)hm2(y)]dxdy\displaystyle+\left\{h_{{m}}^{2}(x)h_{{m}}(x)h_{{m}}(y)+h_{{m}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\right\}\left.\left\{\phi_{{k}_{2}}(x-y)+\phi_{{k}_{1}}(x-y)\right\}+h_{{m}}^{2}(x)h_{{m}}^{2}(y)\right]\,dxdy
+λBhm4(x)𝑑xλ2λB2hm2(x)hm2(y)(ϕk1(xy)+ϕk2(xy))λ2𝑑x𝑑y\displaystyle+\lambda\int_{B}h_{{m}}^{4}(x)\,dx-\lambda^{2}-\lambda\int_{B^{2}}h_{{m}}^{2}(x)h_{{m}}^{2}(y)(\phi_{{k}_{1}}(x-y)+\phi_{{k}_{2}}(x-y))\lambda^{2}dxdy
{B2hm(x)hm(y)ϕk1(xy)λ2𝑑x𝑑y}{B2hm(x)hm(y)ϕk2(xy)λ2𝑑x𝑑y}\displaystyle-\left\{\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\lambda^{2}dxdy\right\}\cdot\left\{\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{2}}(x-y)\lambda^{2}dxdy\right\}

Continuing on with the calculations we find:

𝐂𝐨𝐯\displaystyle{\mathrm{\bf Cov}} {Imt(k1),Imt(k2)}=B3λ3[hm2(x)hm(y)hm(v)ϕk1(xy){ϕk2(xv)+ϕk2(xv)}\displaystyle\{I^{t}_{{m}}({{k}_{1}}),I^{t}_{{m}}({{k}_{2}})\}=\int_{B^{3}}\lambda^{3}\left[h_{{m}}^{2}(x)h_{{m}}(y)h_{{m}}(v)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-v)+\phi_{-{{k}_{2}}}(x-v)\right\}\right.
+hm(x)hm2(y)hm(v)ϕk1(xy){ϕk2(yv)+ϕk2(yv)}\displaystyle+h_{{m}}(x)h_{{m}}^{2}(y)h_{{m}}(v)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(y-v)+\phi_{-{{k}_{2}}}(y-v)\right\}
+ϕk2(xy)hm2(v)hm(x)hm(y)+ϕk1(xy)hm2(v)hm(x)hm(y)]dxdydv\displaystyle\left.+\phi_{{k}_{2}}(x-y)h_{{m}}^{2}(v)h_{{m}}(x)h_{{m}}(y)+\phi_{{k}_{1}}(x-y)h_{{m}}^{2}(v)h_{{m}}(x)h_{{m}}(y)\right]\;dxdydv
+B2λ2[hm2(x)hm2(y)ϕk1(xy){ϕk2(xy)+ϕk2(xy)}\displaystyle+\int_{B^{2}}\lambda^{2}\left[h_{{m}}^{2}(x)h_{{m}}^{2}(y)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-y)+\phi_{-{{k}_{2}}}(x-y)\right\}\right.
+{hm3(x)hm(y)+hm3(x)hm(y)}{ϕk2(xy)+ϕk1(xy)}]dxdy\displaystyle+\left\{h_{{m}}^{3}(x)h_{{m}}(y)+h_{{m}}^{3}(x)h_{{m}}(y)\right\}\left.\left\{\phi_{{k}_{2}}(x-y)+\phi_{{k}_{1}}(x-y)\right\}\right]\,dxdy
+λBhm4(x)𝑑xλ3B2hm2(x)hm2(y)(ϕk1(xy)+ϕk2(xy))𝑑x𝑑y.\displaystyle+\lambda\int_{B}h_{{m}}^{4}(x)\,dx-\lambda^{3}\int_{B^{2}}h_{{m}}^{2}(x)h_{{m}}^{2}(y)(\phi_{{k}_{1}}(x-y)+\phi_{{k}_{2}}(x-y))dxdy.

As in the previous case we split this into seven parts \{\widetilde{T}_{j}({k})\}. We find that

T~1(k)\displaystyle\widetilde{T}_{1}({k}) =λ3B3hm2(x)hm(y)hm(v)ϕk1(xy){ϕk2(xv)+ϕk2(xv)}𝑑x𝑑y𝑑v\displaystyle=\lambda^{3}\int_{B^{3}}h_{{m}}^{2}(x)h_{{m}}(y)h_{{m}}(v)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-v)+\phi_{-{{k}_{2}}}(x-v)\right\}\;dxdydv
=λ3Hm(k1)Hm(k2)dHm(k)Hm(k1+k2k)𝑑k\displaystyle=\lambda^{3}H_{{m}}(-{k}_{1})H_{{m}}(-{k}_{2})\int_{{\mathbb{R}}^{d}}H_{{m}}({k}^{\prime})H_{{m}}({k}_{1}+{k}_{2}-{k}^{\prime})\;d{k}^{\prime}
+λ3Hm(k1)Hm(k2)dHm(k)Hm(k1k2k)𝑑k.\displaystyle+\lambda^{3}H_{{m}}(-{k}_{1})H_{{m}}({k}_{2})\int_{{\mathbb{R}}^{d}}H_{{m}}({k}^{\prime})H_{{m}}({k}_{1}-{k}_{2}-{k}^{\prime})\;d{k}^{\prime}.

If {k}_{1}-{k}_{2} and {k}_{1}+{k}_{2} are bounded away from zero, then this term becomes negligible. The next term is

T~2(k)\displaystyle\widetilde{T}_{2}({k}) =λ3B3hm(x)hm2(y)hm(v)ϕk1(xy){ϕk2(yv)+ϕk2(yv)}𝑑x𝑑y𝑑v\displaystyle=\lambda^{3}\int_{B^{3}}h_{{m}}(x)h_{{m}}^{2}(y)h_{{m}}(v)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(y-v)+\phi_{-{{k}_{2}}}(y-v)\right\}dxdydv
=λ3Hm(k1)Hm(k2)dHm(k)Hm(k2k1k)𝑑k\displaystyle=\lambda^{3}H_{{m}}({k}_{1})H_{{m}}(-{k}_{2})\int_{{\mathbb{R}}^{d}}H_{{m}}({k}^{\prime})H_{{m}}({k}_{2}-{k}_{1}-{k}^{\prime})\;d{k}^{\prime}
+λ3Hm(k1)Hm(k2)dHm(k)Hm(k2+k1k)𝑑k.\displaystyle+\lambda^{3}H_{{m}}({k}_{1})H_{{m}}(-{k}_{2})\int_{{\mathbb{R}}^{d}}H_{{m}}({k}^{\prime})H_{{m}}({k}_{2}+{k}_{1}-{k}^{\prime})\;d{k}^{\prime}.

If ${k}_{1}-{k}_{2}$ and ${k}_{1}+{k}_{2}$ are both nonzero and fixed, then this term becomes negligible. The next term is

T~3(k)\displaystyle\widetilde{T}_{3}({k}) =λ3B3[ϕk2(xy)hm2(v)hm(x)hm(y)+ϕk1(xy)hm2(v)hm(x)hm(y)]𝑑x𝑑y𝑑v\displaystyle=\lambda^{3}\int_{B^{3}}\left[\phi_{{k}_{2}}(x-y)h_{{m}}^{2}(v)h_{{m}}(x)h_{{m}}(y)+\phi_{{k}_{1}}(x-y)h_{{m}}^{2}(v)h_{{m}}(x)h_{{m}}(y)\right]\;dxdydv
=λ3|Hm(k2)|2+λ3|Hm(k1)|2.\displaystyle=\lambda^{3}\left|H_{{{m}}}({k}_{2})\right|^{2}+\lambda^{3}\left|H_{{{m}}}({k}_{1})\right|^{2}.

As $|{k}_{1}|,|{k}_{2}|>b_{h}$, where $b_{h}$ is the bandwidth, this contribution becomes negligible. The next contribution in the expression takes the form

T~4(k)\displaystyle\widetilde{T}_{4}({k}) =B2λ2[hm2(x)hm2(y)ϕk1(xy){ϕk2(xy)+ϕk2(xy)}]𝑑x𝑑y\displaystyle=\int_{B^{2}}\lambda^{2}\left[h_{{m}}^{2}(x)h_{{m}}^{2}(y)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-y)+\phi_{-{{k}_{2}}}(x-y)\right\}\right]dxdy
=λ2|dHm(k)Hm(k1+k2k)𝑑k|2+λ2|dHm(k)Hm(k1k2k)𝑑k|2.\displaystyle=\lambda^{2}\left|\int_{{\mathbb{R}}^{d}}H_{{m}}({k}^{\prime})H_{{m}}({k}_{1}+{k}_{2}-{k}^{\prime})\;d{k}^{\prime}\right|^{2}+\lambda^{2}\left|\int_{{\mathbb{R}}^{d}}H_{{m}}({k}^{\prime})H_{{m}}({k}_{1}-{k}_{2}-{k}^{\prime})\;d{k}^{\prime}\right|^{2}.

Then the next term is

\widetilde{T}_{5}({k})=2\lambda^{2}\int_{B^{2}}h_{{m}}^{3}(x)h_{{m}}(y)\left\{\phi_{{k}_{2}}(x-y)+\phi_{{k}_{1}}(x-y)\right\}\,dxdy.

Determining the size of this contribution is slightly more involved. The integral over $y$ performs a Fourier transform of $h_{{m}}(y)$ at ${k}_{1}$ and ${k}_{2}$, together with its conjugate. This Fourier transform in $y$ is supported when $|{k}_{1}|<b_{h}$ and $|{k}_{2}|<b_{h}$, but not when either magnitude grows too large. The second Fourier transform, in $x$, is somewhat more spread out; this is because we are Fourier transforming $h_{{m}}^{3}(x)$, which becomes a third-order convolution in the wavenumber domain. But as long as we can use the joint concentration in $|{k}_{1}|<b_{h}$ and $|{k}_{2}|<b_{h}$, this term will still be negligible.

The next term is given by

T~6(k)\displaystyle\widetilde{T}_{6}({k}) =λBhm4(x)𝑑x=λhm44=o(1).\displaystyle=\lambda\int_{B}h_{{m}}^{4}(x)\,dx=\lambda\|h_{{m}}\|_{4}^{4}=o(1).

To see why this is true, note that if $h_{{m}}(x)$ is constant then we can easily calculate the higher order norms. Using the Cauchy–Schwarz inequality, we find that if $m\neq m^{\prime}$ then

T_{6}^{2}({k})=\left[\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\;dx\right]^{2}\leq\int_{B}h_{{m}}^{4}(x)\;dx\cdot\int_{B}h_{{m^{\prime}}}^{4}(x)\;dx
\Rightarrow 0\leq\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\;dx\leq\max\left\{\int_{B}h_{{m}}^{4}(x)\;dx,\int_{B}h_{{m^{\prime}}}^{4}(x)\;dx\right\}. (102)

We also note that

12\displaystyle 1^{2} =[B1h2(x)𝑑x]2B12𝑑xBh4(x)𝑑x\displaystyle=\left[\int_{B}1\cdot h^{2}(x)\;dx\right]^{2}\leq\int_{B}1^{2}\;dx\cdot\int_{B}h^{4}(x)\;dx (103)
\displaystyle\Rightarrow 1|B|Bh4(x)𝑑x.\displaystyle\frac{1}{|B|}\leq\int_{B}h^{4}(x)\;dx. (104)

We now need to determine an upper bound. To this end, consider

[Bh4(x)𝑑x]2\displaystyle\left[\int_{B}h^{4}(x)\;dx\right]^{2} B12𝑑xBh8(x)𝑑x=|B|Bh8(x)𝑑x.\displaystyle\leq\int_{B}1^{2}\;dx\cdot\int_{B}h^{8}(x)\;dx=|B|\cdot\int_{B}h^{8}(x)\;dx. (105)

Then we have

1|B|Bh4(x)𝑑x|B|Bh8(x)𝑑x.\frac{1}{|B|}\leq\int_{B}h^{4}(x)\;dx\leq\sqrt{|B|\cdot\int_{B}h^{8}(x)\;dx}. (106)

For any taper $h(x)$ we can calculate $\int_{B}h^{4}(x)\;dx$ explicitly. For the constant taper we get $\frac{1}{|B|}$. For well-concentrated tapers we would expect a similar rate of decrease, and for any particular choice of taper the value of the fourth norm can be calculated explicitly.
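These bounds are easy to verify numerically. The sketch below is a minimal illustration, assuming a one-dimensional window $B=[0,l]$ and a unit-normalised sine-arch taper (both hypothetical choices of ours, not taken from the paper); it computes $\int_{B}h^{4}(x)\,dx$ for the constant and cosine tapers and checks the bounds (104) and (106).

```python
import numpy as np

l, n = 10.0, 200_000
x = (np.arange(n) + 0.5) * (l / n)      # midpoint grid on B = [0, l]
dx = l / n

# Constant taper, normalised so that ∫ h^2 dx = 1
h_const = np.full(n, 1.0 / np.sqrt(l))

# Sine-arch (cosine-type) taper, normalised to unit L2 norm on B
h_cos = np.sin(np.pi * x / l)
h_cos /= np.sqrt(np.sum(h_cos**2) * dx)

for h in (h_const, h_cos):
    n4 = np.sum(h**4) * dx              # ∫ h^4 dx
    n8 = np.sum(h**8) * dx              # ∫ h^8 dx
    assert abs(np.sum(h**2) * dx - 1.0) < 1e-9   # unit L2 normalisation
    assert 1.0 / l <= n4 + 1e-9                  # lower bound (104)
    assert n4 <= np.sqrt(l * n8) + 1e-9          # upper bound (106)
```

For the constant taper the fourth-power integral equals $1/|B|$ exactly, while for the sine-arch taper it works out to $3/(2|B|)$, consistent with the expected $\mathcal{O}(1/|B|)$ decrease.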

Finally, the last term is

T~7(k)\displaystyle\widetilde{T}_{7}({k}) =λ3B2hm2(x)hm2(y)(ϕk1(xy)+ϕk2(xy))𝑑x𝑑y\displaystyle=-\lambda^{3}\int_{B^{2}}h_{{m}}^{2}(x)h_{{m}}^{2}(y)(\phi_{{k}_{1}}(x-y)+\phi_{{k}_{2}}(x-y))dxdy
=λ3|dHm(k)Hm(k1k)𝑑k|2λ3|dHm(k)Hm(k2k)𝑑k|2.\displaystyle=-\lambda^{3}\left|\int_{{\mathbb{R}}^{d}}H_{{m}}({k}^{\prime})H_{{m}}({k}_{1}-{k}^{\prime})\;d{k}^{\prime}\right|^{2}-\lambda^{3}\left|\int_{{\mathbb{R}}^{d}}H_{{m}}({k}^{\prime})H_{{m}}({k}_{2}-{k}^{\prime})\;d{k}^{\prime}\right|^{2}.

This bounds each individual contribution as long as $|{k}_{1}-{k}_{2}|,|{k}_{1}+{k}_{2}|>b_{h}$. This completes the proof. ∎

Appendix I Proof of Proposition 5.1

Proof.

Assume XX satisfies the assumptions given for (25) and (31). Note from Proposition 4.1 that

0\leq{\mathrm{\bf Var}}\left\{J_{h}({k})\right\}=\lambda+\lambda^{2}\int_{{\mathbb{R}}^{d}}\widetilde{f}^{(2)}({k}^{\prime})\left|H({k}-{k}^{\prime})\right|^{2}d{k}^{\prime}
\leq\lambda+\lambda^{2}\|\widetilde{f}^{(2)}\|_{0}\int_{{\mathbb{R}}^{d}}\left|H({k}-{k}^{\prime})\right|^{2}d{k}^{\prime}
=\lambda+\lambda^{2}\|\widetilde{f}^{(2)}\|_{0}<\infty. (107)

We can then deduce from [22, Theorem 6.1] that $\tilde{J}_{h}({k})$ is uniformly integrable. However, to compute the covariance of the periodogram from the convergence in law to the Gaussian, we need to show that $|\tilde{J}_{{m}}({k})|^{4}$, and indeed $|\tilde{J}_{{m}}({k}_{1})|^{2}|\tilde{J}_{{m^{\prime}}}({k}_{2})|^{2}$, are uniformly integrable. We now apply [22, Theorem 6.2] and assume that $\|\widetilde{f}^{(2)}\|_{0}$, $\|\widetilde{f}^{(3)}\|_{0}$, $\|\widetilde{f}^{(4)}\|_{0}$, $\|\widetilde{f}^{(5)}\|_{0}$ and $\|\widetilde{f}^{(6)}\|_{0}$ are all finite, which ensures that $|\tilde{J}_{{m}}({k})|^{4}$ and $|\tilde{J}_{{m}}({k}_{1})|^{2}|\tilde{J}_{{m^{\prime}}}({k}_{2})|^{2}$ are uniformly integrable. Since $\tilde{J}_{{m}}({k})$ converges in law to a Gaussian random variable, the moments of $\tilde{J}_{{m}}({k})$ can therefore be computed from the Gaussian law.

It follows that Isserlis’ [40] theorem can be applied by using [22, Theorem 6.2] and so

𝐂𝐨𝐯\displaystyle{\mathrm{\bf Cov}} {I~mt(k1),I~mt(k2)}=𝐄{J~m(k1)J~m(k1)J~m(k2)J~m(k2)}+o(1)𝐄{J~m(k1)J~m(k1)}𝐄{J~m(k2)J~m(k2)}\displaystyle\{\widetilde{I}^{t}_{{m}}({k}_{1}),\widetilde{I}^{t}_{{m^{\prime}}}({k}_{2})\}={\mathbf{E}}\{\widetilde{J}_{{m}}({k}_{1})\widetilde{J}^{\ast}_{{m}}({k}_{1})\widetilde{J}_{{m^{\prime}}}({k}_{2})\widetilde{J}^{\ast}_{{m^{\prime}}}({k}_{2})\}+o(1)-{\mathbf{E}}\{\widetilde{J}_{{m}}({k}_{1})\widetilde{J}^{\ast}_{{m}}({k}_{1})\}{\mathbf{E}}\{\widetilde{J}_{{m^{\prime}}}({k}_{2})\widetilde{J}^{\ast}_{{m^{\prime}}}({k}_{2})\}
=𝐄{J~m(k1)J~m(k2)}𝐄{J~m(k1)J~m(k2)}+𝐄{J~m(k1)J~m(k2)}𝐄{J~m(k1)J~m(k2)}+o(1)\displaystyle={\mathbf{E}}\{\widetilde{J}_{{m}}({k}_{1})\widetilde{J}_{{m^{\prime}}}({k}_{2})\}{\mathbf{E}}\{\widetilde{J}^{\ast}_{{m}}({k}_{1})\widetilde{J}^{\ast}_{{m^{\prime}}}({k}_{2})\}+{\mathbf{E}}\{\widetilde{J}_{{m}}({k}_{1})\widetilde{J}^{\ast}_{{m^{\prime}}}({k}_{2})\}{\mathbf{E}}\{\widetilde{J}^{\ast}_{{m}}({k}_{1})\widetilde{J}_{{m^{\prime}}}({k}_{2})\}+o(1)
=o(1)+|𝐄{J~m(k1)J~m(k2)}|2.\displaystyle=o(1)+\left|{\mathbf{E}}\{\widetilde{J}_{{m}}({k}_{1})\widetilde{J}^{\ast}_{{m^{\prime}}}({k}_{2})\}\right|^{2}.
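This complex-Gaussian moment factorisation is easy to check by simulation. The sketch below is an illustrative Monte Carlo check, with circularly-symmetric complex Gaussian variables standing in for the limiting law of the tapered Fourier transforms (the correlation value $\rho$ is an arbitrary choice of ours); it verifies that $\mathrm{Cov}\{|Z_1|^2,|Z_2|^2\}=|\mathbf{E}\{Z_1 Z_2^{\ast}\}|^2$ when $\mathbf{E}\{Z_1 Z_2\}=0$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000
rho = 0.6  # illustrative cross-correlation E{Z1 Z2*}

def ccg(size):
    # Circularly-symmetric complex Gaussian, unit variance, E{Z^2} = 0
    return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)

z1 = ccg(n)
z2 = rho * z1 + np.sqrt(1.0 - rho**2) * ccg(n)

i1, i2 = np.abs(z1)**2, np.abs(z2)**2
cov_emp = np.mean(i1 * i2) - np.mean(i1) * np.mean(i2)       # Cov{|Z1|^2, |Z2|^2}
cov_isserlis = abs(np.mean(z1 * np.conj(z2)))**2             # |E{Z1 Z2*}|^2

assert abs(cov_emp - cov_isserlis) < 0.05  # agreement up to Monte Carlo error
```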

We note that the same calculations as in Proposition 4.1 can be applied, and so for ${k}_{1}\neq{k}_{2}$

{\mathbf{E}}\{\widetilde{J}_{{m}}({k}_{1})\widetilde{J}^{\ast}_{{m^{\prime}}}({k}_{2})\}={\mathbf{E}}\{({J}_{{m}}({k}_{1})-\lambda H_{{m}}({k}_{1}))({J}^{\ast}_{{m^{\prime}}}({k}_{2})-\lambda H^{\ast}_{{m^{\prime}}}({k}_{2}))\}
=𝐄{Jm(k1)Jm(k2)}λ2Hm(k1)Hm(k2).\displaystyle={\mathbf{E}}\{{J}_{{m}}({k}_{1}){J}^{\ast}_{{m^{\prime}}}({k}_{2})\}-\lambda^{2}H_{{m}}({k}_{1})H^{\ast}_{{m^{\prime}}}({k}_{2}).

We now calculate the covariance as

𝐂𝐨𝐯{Jm(k1),Jm(k2)}=\displaystyle{\mathrm{\bf Cov}}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\}= 𝐄{Jm(k1)Jm(k2)}λ2Hm(k1)Hm(k2)\displaystyle{\mathbf{E}}\{{J}_{{m}}({k}_{1}){J}^{\ast}_{{m^{\prime}}}({k}_{2})\}-\lambda^{2}H_{{m}}({k}_{1})H^{\ast}_{{m^{\prime}}}({k}_{2})
=\displaystyle= λδmmδk1k2+Rd×Rdρ(2)(xy)hm(x)e2πik1xhm(y)e2πik2y𝑑x𝑑y\displaystyle\lambda\delta_{{{m}}{{m^{\prime}}}}\delta_{{k}_{1}{k}_{2}}+\iint_{R^{d}\times R^{d}}{\rho}^{(2)}(x-y)h_{{m}}(x)e^{-2\pi i{k}_{1}\cdot x}h^{\ast}_{{m^{\prime}}}(y)e^{2\pi i{k}_{2}\cdot y}\,dx\,dy
λ2Hm(k1)Hm(k2).\displaystyle-\lambda^{2}H_{{m}}({k}_{1})H^{\ast}_{{m^{\prime}}}({k}_{2}). (108)

We now define the renormalised quantity

ρ~(2)(z)=ρ(2)(z)λ2λ2,zd.\widetilde{\rho}^{(2)}(z)=\frac{{\rho}^{(2)}(z)-\lambda^{2}}{\lambda^{2}},\quad z\in{\mathbb{R}}^{d}. (109)

The expression in (108) then can be simplified to

𝐂𝐨𝐯\displaystyle{\mathrm{\bf Cov}} {Jm(k1),Jm(k2)}\displaystyle\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\}
=λδmmδk1k2+λ2Rd×Rd(ρ~(2)(xy)+1)hm(x)e2πik1xhm(y)e2πik2y𝑑x𝑑yλ2Hm(k1)Hm(k2)\displaystyle=\lambda\delta_{{{m}}{{m^{\prime}}}}\delta_{{k}_{1}{k}_{2}}+\lambda^{2}\iint_{R^{d}\times R^{d}}\left(\widetilde{\rho}^{(2)}(x-y)+1\right)h_{{m}}(x)e^{-2\pi i{k}_{1}\cdot x}h^{\ast}_{{m^{\prime}}}(y)e^{2\pi i{k}_{2}\cdot y}\,dx\,dy-\lambda^{2}H_{{m}}({k}_{1})H^{\ast}_{{m^{\prime}}}({k}_{2})
=λδmmδk1k2+λ2Rd×Rdρ~(2)(xy)hm(x)e2πik1xhm(y)e2πik2y𝑑x𝑑y\displaystyle=\lambda\delta_{{{m}}{{m^{\prime}}}}\delta_{{k}_{1}{k}_{2}}+\lambda^{2}\iint_{R^{d}\times R^{d}}\widetilde{\rho}^{(2)}(x-y)h_{{m}}(x)e^{-2\pi i{k}_{1}\cdot x}h^{\ast}_{{m^{\prime}}}(y)e^{2\pi i{k}_{2}\cdot y}\,dx\,dy
=λδmmδk1k2+λ2𝐂𝐨𝐯1{Jm(k1),Jm(k2)},\displaystyle=\lambda\delta_{{{m}}{{m^{\prime}}}}\delta_{{k}_{1}{k}_{2}}+\lambda^{2}{\mathrm{\bf Cov}}_{1}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\}, (110)

where we define

𝐂𝐨𝐯1{Jm(k1),Jm(k2)}Rd×Rdρ~(2)(xy)hm(x)e2πik1xhm(y)e2πik2y𝑑x𝑑y.{\mathrm{\bf Cov}}_{1}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\}\equiv\iint_{R^{d}\times R^{d}}\widetilde{\rho}^{(2)}(x-y)h_{{m}}(x)e^{-2\pi i{k}_{1}\cdot x}h^{\ast}_{{m^{\prime}}}(y)e^{2\pi i{k}_{2}\cdot y}\,dx\,dy. (111)

To simplify this expression we note that

𝐂𝐨𝐯1{Jm(k1),Jm(k2)}\displaystyle{\mathrm{\bf Cov}}_{1}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\} =Rd×Rdρ~(2)(xy)hm(x)e2πik1xhm(y)e2πik2y𝑑x𝑑y\displaystyle=\iint_{R^{d}\times R^{d}}\widetilde{\rho}^{(2)}(x-y)h_{{m}}(x)e^{-2\pi i{k}_{1}\cdot x}h^{\ast}_{{m^{\prime}}}(y)e^{2\pi i{k}_{2}\cdot y}\,dx\,dy
=Rd×RdRdf~(2)(k)e2πi(xy)k𝑑khm(x)e2πik1xhm(y)e2πik2y𝑑x𝑑y\displaystyle=\iint_{R^{d}\times R^{d}}\int_{R^{d}}\widetilde{f}^{(2)}({k}^{\prime})e^{2\pi i(x-y)\cdot{k}^{\prime}}d{k}^{\prime}\cdot h_{{m}}(x)e^{-2\pi i{k}_{1}\cdot x}h^{\ast}_{{m^{\prime}}}(y)e^{2\pi i{k}_{2}\cdot y}\,dx\,dy
=Rd×RdRdf~(2)(k)e2πix(k1k)e2πiy(k2k)𝑑khm(x)hm(y)𝑑x𝑑y\displaystyle=\iint_{R^{d}\times R^{d}}\int_{R^{d}}\widetilde{f}^{(2)}({k}^{\prime})e^{-2\pi ix\cdot({k}_{1}-{k}^{\prime})}e^{2\pi iy\cdot({k}_{2}-{k}^{\prime})}d{k}^{\prime}\cdot h_{{m}}(x)h^{\ast}_{{m^{\prime}}}(y)\,dx\,dy
=\int_{{\mathbb{R}}^{d}}\widetilde{f}^{(2)}({k}^{\prime})H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})\,d{k}^{\prime}. (112)

Note that this is not dimensionally contradictory to Theorem 5.1, as the periodogram is the modulus square of the Fourier transform. We note that as $H_{{m}}({k})$ has concentrated support, we can apply arguments similar to those of Appendix D, as follows. ∎

Appendix J Proof of Lemma J.1

Lemma J.1.

Assume XX satisfies the assumptions given for (25) and that f~(j)0<\|\widetilde{f}^{(j)}\|_{0}<\infty for j=2,3,4,5,6j=2,3,4,5,6. Assume the two multitapers hm(x)h_{{m}}(x) and hm(x)h_{{m^{\prime}}}(x) are orthogonal and are well concentrated on the compact set 𝒲d{\cal W}\subset{\mathbb{R}^{d}} with length scale ll so that for some chosen ϵl=o(1/l)\epsilon_{l}=o(1/l)

𝒲|Hm(k)|2𝑑k=1ϵl.\int_{\cal W}|H_{{m}}({k})|^{2}\;d{k}=1-\epsilon_{l}.

Then assuming k1,k2𝒦M(B(𝐥)){k}_{1},{k}_{2}\in{\cal K}_{M}(B_{\square}(\bm{l}))

𝐂𝐨𝐯1{Jm(k1),Jm(k2)}\displaystyle{\mathrm{\bf Cov}}_{1}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\} Rdf~(2)(k)Hm(k1k)Hm(k2k)𝑑k+o(1)\displaystyle\equiv\int_{R^{d}}\widetilde{f}^{(2)}({k}^{\prime})H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})d{k}^{\prime}+o(1)
=f~(2)(k1)δmmδk1k2+𝒪(1/l).\displaystyle=\widetilde{f}^{(2)}({k}_{1})\delta_{{{m^{\prime}}}{{m}}}\delta_{{k}_{1}{k}_{2}}+{\cal O}(1/l). (113)
Proof.

We assume that for a choice of εl\varepsilon_{l} we can define a wavenumber region 𝒲{\cal W}

𝒲|Hm(k)|2𝑑k=1εl,whereεl=o(1).\int_{{\cal W}}\left|H_{{m}}({k})\right|^{2}\;d{k}=1-\varepsilon_{l},\quad{\mathrm{where}}\quad\varepsilon_{l}=o(1). (114)

We also assume that f~(k)\tilde{f}({k}) is upper bounded by f~0\|\widetilde{f}\|_{0}. We then have

𝐂𝐨𝐯1{Jm(k1),Jm(k2)}\displaystyle{\mathrm{\bf Cov}}_{1}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\} =d\𝒲Hm(k1k)Hm(k2k)f~(k)𝑑k+𝒲Hm(k1k)Hm(k2k)f~(k)𝑑k.\displaystyle=\int_{\mathbb{R}^{d}\backslash{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})\widetilde{f}({k}^{\prime})\;d{k}^{\prime}+\int_{{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})\widetilde{f}({k}^{\prime})\;d{k}^{\prime}. (115)

We note that

|d\𝒲Hm(k1k)Hm(k2k)f~(k)𝑑k|f~0{1{1εl}}.\left|\int_{\mathbb{R}^{d}\backslash{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})\widetilde{f}({k}^{\prime})\;d{k}^{\prime}\right|\leq\|\widetilde{f}\|_{0}\{1-\{1-\varepsilon_{l}\}\}.

We can yet again utilise the Taylor expansion of (62) inside 𝒲{\cal W}. We find

𝐂𝐨𝐯1{Jm(k1),Jm(k2)}=𝒲Hm(k1k)Hm(k2k)f~(k)𝑑k{1+o(1)}\displaystyle{\mathrm{\bf Cov}}_{1}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\}=\int_{{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})\widetilde{f}({k}^{\prime})\;d{k}^{\prime}\left\{1+o(1)\right\}
=𝒲Hm(k1k)Hm(k2k){f~(k1)+f~(k1)T(kk1)+12(kk1)T𝐇~f(k′′)(kk1)}𝑑k{1+o(1)}\displaystyle=\int_{{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})\left\{\widetilde{f}({k}_{1})+\nabla\widetilde{f}({k}_{1})^{T}\left({k}^{\prime}-{k}_{1}\right)+\frac{1}{2}\left({k}^{\prime}-{k}_{1}\right)^{T}\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\left({k}^{\prime}-{k}_{1}\right)\right\}\;d{k}^{\prime}\left\{1+o(1)\right\}
=δk1k2𝒲Hm(k1k)Hm(k1k)f~(k1)𝑑k+𝒲Hm(k1k)Hm(k2k)\displaystyle=\delta_{{k}_{1}{k}_{2}}\int_{{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{1}-{k}^{\prime})\widetilde{f}({k}_{1})\;d{k}^{\prime}+\int_{{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})
×{f~(k1)T(kk1)+12(kk1)T𝐇~f(k′′)(kk1)}dk{1+o(1)}\displaystyle\times\left\{\nabla\widetilde{f}({k}_{1})^{T}\left({k}^{\prime}-{k}_{1}\right)+\frac{1}{2}\left({k}^{\prime}-{k}_{1}\right)^{T}\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\left({k}^{\prime}-{k}_{1}\right)\right\}\;d{k}^{\prime}\left\{1+o(1)\right\}
=f~(k1)δmmδk1k2+𝒪(1/l)+𝒲Hm(k1k)Hm(k2k)12(kk1)T𝐇~f(k′′)(kk1)𝑑k.\displaystyle=\widetilde{f}({k}_{1})\delta_{{{m^{\prime}}}{{m}}}\delta_{{k}_{1}{k}_{2}}+{\cal O}(1/\sqrt{l})+\int_{{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})\frac{1}{2}\left({k}^{\prime}-{k}_{1}\right)^{T}\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\left({k}^{\prime}-{k}_{1}\right)\;d{k}^{\prime}. (116)

We note that

𝒲Hm(k1k)Hm(k2k)12(kk1)T𝐇~f(k′′)(kk1)𝑑k2\displaystyle\left\|\int_{{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})\frac{1}{2}\left({k}^{\prime}-{k}_{1}\right)^{T}\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\left({k}^{\prime}-{k}_{1}\right)\;d{k}^{\prime}\right\|^{2}
|𝐇~f(k′′)|𝒲Hm(k1k)Hm(k2k)12(kk1)T(kk1)𝑑k\displaystyle\leq|\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})|\int_{{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})\frac{1}{2}\left({k}^{\prime}-{k}_{1}\right)^{T}\left({k}^{\prime}-{k}_{1}\right)\;d{k}^{\prime}
=|\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})|\int_{{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})\left|{k}^{\prime}-{k}_{1}\right|^{2}\,d{k}^{\prime}={\cal O}(1/l).

Then

{\mathrm{\bf Cov}}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\}=\lambda\delta_{{{m}}{{m^{\prime}}}}\delta_{{k}_{1}{k}_{2}}+\lambda^{2}{\mathrm{\bf Cov}}_{1}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\}
=\lambda\delta_{{{m}}{{m^{\prime}}}}\delta_{{k}_{1}{k}_{2}}+\lambda^{2}\widetilde{f}({k}_{1})\delta_{{{m^{\prime}}}{{m}}}\delta_{{k}_{1}{k}_{2}}+{\cal O}(1/\sqrt{l})+{\cal O}(1/l). ∎
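The leading $\delta_{mm^{\prime}}$ term relies on the tapers being orthonormal, so that $\int h_{m}(x)h_{m^{\prime}}(x)\,dx=\delta_{mm^{\prime}}$, which by Parseval's theorem equals $\int H_{m}({k})H_{m^{\prime}}^{\ast}({k})\,d{k}$. The sketch below is a quick numerical check with sine tapers, an illustrative orthonormal family on an interval chosen by us, not necessarily the multitapers used elsewhere in the paper.

```python
import numpy as np

l, n = 10.0, 4096
x = (np.arange(n) + 0.5) * (l / n)      # midpoint grid on [0, l]
dx = l / n

# Sine tapers: an orthonormal family on [0, l] (an illustrative choice)
tapers = np.array([np.sqrt(2.0 / l) * np.sin(np.pi * (m + 1) * x / l)
                   for m in range(4)])

# Gram matrix of ∫ h_m h_{m'} dx, which should be the identity
gram = tapers @ tapers.T * dx
assert np.allclose(gram, np.eye(4), atol=1e-6)
```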

Appendix K Proof of Proposition 7.2

Proof.

An interesting question is what happens if we use Diggle's estimator when the point process is not isotropic. We recall that in 2D the estimator takes the form

I¯0(|k|)=λ^+|B|1x,yXBJ0(2π|k|xy).\bar{I}_{0}(|{k}|)=\hat{\lambda}+|B|^{-1}\sum_{x,y\in X\cap B}^{\neq}J_{0}\left(2\pi|{k}|\cdot\|x-y\|\right). (117)
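A direct implementation of (117) is straightforward. The sketch below is our own minimal implementation for a two-dimensional pattern in a rectangular window (function and variable names are ours, and the Bessel function $J_0$ is evaluated by numerical quadrature of its integral representation rather than a library call).

```python
import numpy as np

def j0(x):
    # Bessel J0 via J0(x) = (1/pi) ∫_0^pi cos(x sin t) dt, midpoint rule
    t = (np.arange(1000) + 0.5) * (np.pi / 1000)
    return np.cos(np.outer(np.atleast_1d(x), np.sin(t))).mean(axis=1)

def diggle_isotropic_periodogram(points, area, k_abs):
    """Isotropic estimator bar{I}_0(|k|) of (117) for a 2-D pattern."""
    n = len(points)
    lam_hat = n / area
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    pair_d = d[~np.eye(n, dtype=bool)]          # all ordered pairs x != y
    return np.array([lam_hat + np.sum(j0(2.0 * np.pi * k * pair_d)) / area
                     for k in np.atleast_1d(k_abs)])

rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 10.0, size=(80, 2))      # a binomial (Poisson-like) pattern
I_bar = diggle_isotropic_periodogram(pts, area=100.0, k_abs=[0.5, 1.0, 2.0])
```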

We can still compute this estimator for any observed point process $X$, even if $X$ is not isotropic. The estimator $\bar{I}_{0}(|{k}|)$ has expectation

𝐄{I¯0(|k|)}\displaystyle{\mathbf{E}}\{\bar{I}_{0}(|{k}|)\} =λ+|B|1x,yXB𝐄J0(2π|k|xy)\displaystyle={\lambda}+|B|^{-1}\sum_{x,y\in X\cap B}^{\neq}{\mathbf{E}}J_{0}\left(2\pi|{k}|\cdot\|x-y\|\right) (118)
=λ+|B|1BBxJ0(2π|k||z|)ρ(2)(z)𝑑z𝑑x\displaystyle={\lambda}+|B|^{-1}\int_{B}\int_{B_{-x}}J_{0}\left(2\pi|{k}|\cdot|z|\right)\rho^{(2)}(z)\,dz\,dx
=λ+|B|1BBxJ0(2π|k||z|){ρ(2)(z)λ2}𝑑z𝑑x+λ2|B|1BBxJ0(2π|k||z|)𝑑z𝑑x\displaystyle={\lambda}+|B|^{-1}\int_{B}\int_{B_{-x}}J_{0}\left(2\pi|{k}|\cdot|z|\right)\left\{\rho^{(2)}(z)-\lambda^{2}\right\}\,dz\,dx+\lambda^{2}|B|^{-1}\int_{B}\int_{B_{-x}}J_{0}\left(2\pi|{k}|\cdot|z|\right)\,dz\,dx
=λ+|B|1d|B×Bz|J0(2π|k||z|){ρ(2)(z)λ2}𝑑z+λ2|B|1B|B×Bz|J0(2π|k||z|)𝑑z\displaystyle={\lambda}+|B|^{-1}\int_{{\mathbb{R}}^{d}}|B\times B_{-z}|\cdot J_{0}\left(2\pi|{k}|\cdot|z|\right)\left\{\rho^{(2)}(z)-\lambda^{2}\right\}\,dz+\lambda^{2}|B|^{-1}\int_{B}|B\times B_{-z}|\cdot J_{0}\left(2\pi|{k}|\cdot|z|\right)\,dz
=𝐄I¯0(1)(|k|)+𝐄I¯0(2)(|k|)+𝐄I¯0(3)(|k|),\displaystyle={\mathbf{E}}\bar{I}_{0}^{(1)}(|{k}|)+{\mathbf{E}}\bar{I}_{0}^{(2)}(|{k}|)+{\mathbf{E}}\bar{I}_{0}^{(3)}(|{k}|), (119)

the latter defining the form of these three contributions.

Just like before we shall explicitly demonstrate the effects of this convolution. We have that the Fourier transform of the Bessel function is

{\cal F}\left\{J_{0}\left(2\pi|{k}|\cdot|z|\right)\right\}(u)=\iint J_{0}\left(2\pi|{k}|\cdot|z|\right)e^{-2\pi iz\cdot u}dz
=\int_{0}^{\infty}\int_{0}^{2\pi}J_{0}\left(2\pi|{k}|r\right)e^{-2\pi ir|u|\cos(\phi-\phi_{u})}\,r\,dr\,d\phi
=\int_{0}^{\infty}J_{0}\left(2\pi|{k}|r\right)r\,dr\int_{0}^{2\pi}e^{-2\pi ir|u|\cos(\phi-\phi_{u})}\,d\phi. (120)

We now note that

J0(x)=12πππeixsint𝑑t.J_{0}(x)=\frac{1}{2\pi}\int_{-\pi}^{\pi}e^{-ix\sin t}dt.
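This integral representation can be checked numerically against the power series $J_{0}(x)=\sum_{m\geq 0}(-1)^{m}(x/2)^{2m}/(m!)^{2}$; a small sketch:

```python
import numpy as np
from math import factorial

def j0_integral(x, n=4096):
    # Periodic rectangle rule over [-pi, pi); spectrally accurate here
    t = np.arange(n) * (2.0 * np.pi / n) - np.pi
    return np.exp(-1j * x * np.sin(t)).mean().real

def j0_series(x, terms=30):
    # Power series J0(x) = sum_m (-1)^m (x/2)^{2m} / (m!)^2
    return sum((-1) ** m * (x / 2.0) ** (2 * m) / float(factorial(m)) ** 2
               for m in range(terms))

for x in (0.0, 1.0, 2.404825557695773, 5.0):
    assert abs(j0_integral(x) - j0_series(x)) < 1e-8
```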

Thus

{J0(2π|k||z|)}(u)=0J0(2π|k|r)(2π)2J0(2πr|u|)r𝑑r=(2π)2δ(|k||u|)|k|.\displaystyle{\cal F}\left\{J_{0}\left(2\pi|{k}|\cdot|z|\right)\right\}(u)=\int_{0}^{\infty}J_{0}\left(2\pi|{k}|r\right)(2\pi)^{2}J_{0}\left(2\pi r|u|\right)\;rdr=(2\pi)^{2}\frac{\delta(|{k}|-|u|)}{|{k}|}.

We now use the convolution theorem to deduce that:

𝐄I¯0(3)(|k|)\displaystyle{\mathbf{E}}\bar{I}_{0}^{(3)}(|{k}|) =|B|1B|B×Bz|J0(2π|k||z|)𝑑z\displaystyle=|B|^{-1}\int_{B}|B\times B_{-z}|\cdot J_{0}\left(2\pi|{k}||z|\right)\,dz
=dδ(|k||u|)|k||B|1T(B,u)𝑑u.\displaystyle=\int_{{\mathbb{R}}^{d}}\frac{\delta(|{k}|-|u|)}{|{k}|}|B|^{-1}T(B,u)\;du. (121)

Thus the low magnitude wavenumber bias is determined by this term. The reason this is a low wavenumber term is the form of $|T(B,u)|$: this is concentrated near $|u|=0$, and in addition the convolution aggregates over all wavenumbers with the same modulus. We have assumed a rectangular sampling domain.

The second term in the expectation of (119) takes the form of

𝐄I¯0(2)(|k|)=|B|1d|B×Bz|J0(2π|k||z|){ρ(2)(z)λ2}𝑑z.{\mathbf{E}}\bar{I}_{0}^{(2)}(|{k}|)=|B|^{-1}\int_{{\mathbb{R}}^{d}}|B\times B_{-z}|\cdot J_{0}\left(2\pi|{k}||z|\right)\left\{\rho^{(2)}(z)-\lambda^{2}\right\}\,dz.

As multiplications in space are convolutions in wavenumber, we need to compute

{|B|1|B×Bz|J0(2π|k||z|)}(u)\displaystyle{\cal F}\left\{|B|^{-1}\cdot|B\times B_{-z}|\cdot J_{0}\left(2\pi|{k}||z|\right)\right\}(u) =d|B|1|B×Bz|J0(2π|k||z|)e2πiuTz𝑑z\displaystyle=\int_{{\mathbb{R}}^{d}}|B|^{-1}\cdot|B\times B_{-z}|\cdot J_{0}\left(2\pi|{k}||z|\right)e^{-2\pi iu^{T}z}\;dz
=dδ(|k||u|)|k||B|1T(B,uu)𝑑u\displaystyle=\int_{{\mathbb{R}}^{d}}\frac{\delta(|{k}|-|u^{\prime}|)}{|{k}|}|B|^{-1}T(B,u^{\prime}-u)\;du^{\prime}
{ρ(2)(z)λ2}(u)\displaystyle{\cal F}\left\{\rho^{(2)}(z)-\lambda^{2}\right\}(u) =λ2f~(u).\displaystyle=\lambda^{2}\widetilde{f}(u). (122)

With these pieces we have that

𝐄I¯0(2)(|k|)=ddλ2f~(u)δ(|k||u′′|)|k||B|1T(B,u′′u)𝑑u′′𝑑u.{\mathbf{E}}\bar{I}_{0}^{(2)}(|{k}|)=\int_{{\mathbb{R}}^{d}}\int_{{\mathbb{R}}^{d}}\lambda^{2}\widetilde{f}(u^{\prime})\frac{\delta(|{k}|-|u^{\prime\prime}|)}{|{k}|}|B|^{-1}T(B,u^{\prime\prime}-u^{\prime})\;du^{\prime\prime}du^{\prime}. (123)

We now see that further blurring is present in (123) from averaging out the direction. In the standard non-isotropic case this was just a convolution of $\widetilde{f}({k})$ with $|B|^{-1}T(B,{k})$. To get a feeling for its behaviour, we note that as $|B|\rightarrow\infty$, $|B|^{-1}T(B,{k})\rightarrow\delta({k})/(2\pi)$. In this case we get for $d=2$

𝐄I¯0(2)(|k|)\displaystyle{\mathbf{E}}\bar{I}_{0}^{(2)}(|{k}|) 22λ2f~(u)δ(|k||u′′|)2π|k|δ(u′′u)𝑑u′′𝑑u.\displaystyle\rightarrow\int_{{\mathbb{R}}^{2}}\int_{{\mathbb{R}}^{2}}\lambda^{2}\widetilde{f}(u^{\prime})\frac{\delta(|{k}|-|u^{\prime\prime}|)}{2\pi|{k}|}\delta(u^{\prime\prime}-u^{\prime})\;du^{\prime\prime}du^{\prime}.

This expression is therefore an orientationally averaged spectral density. Now assume additionally that the spectrum is isotropic, namely $\widetilde{f}(u^{\prime})=\widetilde{f}_{0}(|u^{\prime}|)$; then as $|B|\rightarrow\infty$,

𝐄I¯0(2)(|k|)\displaystyle{\mathbf{E}}\bar{I}_{0}^{(2)}(|{k}|) 2λ2f~0(|u′′|)δ(|k||u′′|)2π|k|𝑑u′′\displaystyle\rightarrow\int_{{\mathbb{R}}^{2}}\lambda^{2}\widetilde{f}_{0}(|u^{\prime\prime}|)\frac{\delta(|{k}|-|u^{\prime\prime}|)}{2\pi|{k}|}\;du^{\prime\prime}
=λ2+f~0(u′′)δ(|k||u′′|)|k||u′′|d|u′′|\displaystyle=\lambda^{2}\int_{{\mathbb{R}}^{+}}\widetilde{f}_{0}(u^{\prime\prime})\frac{\delta(|{k}|-|u^{\prime\prime}|)}{|{k}|}|u^{\prime\prime}|\;d|u^{\prime\prime}|
=f~0(|k|).\displaystyle=\widetilde{f}_{0}(|{k}|). (124)

This shows that asymptotically we would recover the isotropic spectral density from this component. Finally we can write

{\mathbf{E}}\bar{I}_{0}(|{k}|)=\lambda+\widetilde{f}_{0}(|{k}|)+o(1)+\int_{{\mathbb{R}}^{d}}\frac{\delta(|{k}|-|u|)}{|{k}|}|B|^{-1}T(B,u)\;du. ∎

Appendix L Additional Figures

Figure 8: Examples of the simulated patterns used in the simulation trials for studying the quality of the estimators. Details of each model are given in Section 8. Note that the exact number of points in each realisation varies slightly.
Figure 9: Isotropic $\mathrm{sdf}/\lambda$, mean estimates versus true curve (thick gray line).

References

  • [1] L. D. Abreu and J. L. Romero, (2017). MSE estimates for multitaper spectral estimation and off-grid compressive sensing, IEEE Transactions on Information Theory, 63(12), 7770–7776.
  • [2] C. Andersson, T. Rajala, and A. Särkkä, (2018). Hierarchical models for epidermal nerve fiber data. Statistics in Medicine, 37(3), 357–374.
  • [3] A. Baddeley, E. Rubak, and R. Turner. Spatial Point Patterns: Methodology and Applications with R. Chapman and Hall/CRC Press, London, 2015.
  • [4] S. Bandyopadhyay and S. N. Lahiri, (2009). Asymptotic properties of discrete Fourier transforms for spatial data. Sankhya: The Indian Journal of Statistics, Series A, 221–259.
  • [5] M. S. Bartlett, (1963). The spectral analysis of point processes. Journal of the Royal Statistical Society: Series B (Methodological), 25(2), 264–281.
  • [6] M. S. Bartlett, (1964). The spectral analysis of two-dimensional point processes. Biometrika, 51(3/4), 299–311.
  • [7] C. A. N. Biscio and R. Waagepetersen, (2019). A general central limit theorem and a subsampling variance estimator for $\alpha$-mixing point processes. Scandinavian Journal of Statistics, 46(4), 1168–1190.
  • [8] D. R. Brillinger, (1974). Fourier analysis of stationary processes. Proceedings of the IEEE, 62(12), 1628–1643.
  • [9] T. P. Bronez, (1988). Spectral estimation of irregularly sampled multidimensional processes by generalised prolate spheroidal sequences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 36(12), 1862–1873.
  • [10] S.N. Chiu, D. Stoyan, W.S. Kendall, and J. Mecke. Stochastic Geometry and Its Applications. Wiley Series in Probability and Statistics. Wiley, 2013. ISBN 9781118658253.
  • [11] I.-S. Chou and K.-S. Lii, (1992). Spectral analysis of random fields with random sampling. In Probabilistic and Stochastic Methods in Analysis, with Applications, pages 343–368. Springer Verlag, Berlin.
  • [12] D. E. Clark, (2020). Local entropy statistics for point processes. IEEE Transactions on Information Theory, 66(2), 1155–1163.
  • [13] D. E. Clark, (2022). A Cramér–Rao bound for point processes. IEEE Transactions on Information Theory, 68(4), 2147–2155.
  • [14] L. Cohen, (1995). Time-frequency analysis, volume 778. Prentice Hall.
  • [15] E. A. K. Cohen, (2014). Multi-wavelet coherence for point processes on the real line. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2649–2653.
  • [16] R. Condit, R. Pérez, S. Aguilar, S. Lao, R. Foster, and S. Hubbell, (2019). Complete data from the Barro Colorado 50-ha plot: 423617 trees, 35 years. URL https://doi.org/10.15146/5xcp-0d46.
  • [17] A. Connes, and H. Moscovici, (2022). The UV prolate spectrum matches the zeros of zeta. Proceedings of the National Academy of Sciences 119(22), e2123174119.
  • [18] R. Dahlhaus and H. Künsch, (1987). Edge effects and efficient parameter estimation for stationary random fields. Biometrika, 74(4), 877–882.
  • [19] D. J. Daley, (1971). Weakly stationary point processes and random measures. Journal of the Royal Statistical Society: Series B (Methodological), 33(3), 406–428.
  • [20] D. J. Daley and D. Vere-Jones, (2003). An introduction to the theory of point processes. Vol. I: Elementary Theory and Methods. Springer Verlag, Berlin, Germany.
  • [21] P. Das and B. Babadi, (2020). Multitaper spectral analysis of neuronal spiking activity driven by latent stationary processes. Signal Processing, 170, 107429.
  • [22] A. Dasgupta, (2008). Asymptotic theory of statistics and probability. Springer Verlag, Berlin, Germany.
  • [23] P. J. Diggle, D. J. Gates and A. Stibbard, (1987). A nonparametric estimator for pairwise-interaction point processes. Biometrika, 74(4), 763–770.
  • [24] P. J. Diggle, (2013) Statistical analysis of spatial and spatio-temporal point patterns. CRC press.
  • [25] D. Durran, J. A. Weyn and M. Q. Menchaca, (2017). Practical considerations for computing dimensional spectra from gridded data. Monthly Weather Review, 145(9), 3901–3910.
  • [26] T. Duong and M. L. Hazelton, (2005). Cross-validation bandwidth matrices for multivariate kernel density estimation. Scandinavian Journal of Statistics, 32(3):485–506.
  • [27] R. M. Errico, 1985. Spectra computed from a limited area grid. Monthly weather review, 113(9), 1554–1562.
  • [28] R. Estrada, (2014). On radial functions and distributions and their Fourier transforms. Journal of Fourier Analysis and Applications, 20(2):301–320.
  • [29] A. J. Flügge, S. C. Olhede, and D. J. Murrell (2014). A method to detect subcommunities from multivariate spatial associations. Methods in Ecology and Evolution, 5(11), 1214-1224.
  • [30] P. F. Fougere, (1977). A solution to the problem of spontaneous line splitting in maximum entropy power spectrum analysis. Journal of geophysical research, 82(7), 1051–1054.
  • [31] A. C. Gatrell, T. C. Bailey, P. J. Diggle and B. S. Rowlingson, (1996). Spatial point pattern analysis and its application in geographical epidemiology. Transactions of the Institute of British geographers, 256–274.
  • [32] L. Grafakos and G. Teschl, (2013). On Fourier transforms of radial functions and distributions. Journal of Fourier Analysis and Applications, 19(1), 167–179.
  • [33] A. P. Guillaumin, A. M. Sykulski, S. C. Olhede and F. J. Simons, (2022). The Debiased Spatial Whittle likelihood. Journal of the Royal Statistical Society: Series B, 84(4), 1526–1557.
  • [34] X. Guyon, (1982). Parameter estimation for a stationary process on a d–dimensional lattice. Biometrika, 69(1), 95–105.
  • [35] J. Häbel, (2017). A three-dimensional anisotropic point process characterization for pharmaceutical coatings. Spatial Statistics, 22, 306–320.
  • [36] A. Hanssen, (1997). Multidimensional multitaper spectral estimation. Signal Processing, 58(3), 327–332.
  • [37] K. E. Harms, R. Condit, S. P. Hubbell, and R. B. Foster (2001). Habitat associations of trees and shrubs in a 50-ha neotropical forest plot. Journal of ecology, 89(6), 947-959.
  • [38] F. Ilhan and S. S. Kozat, (2020). Modeling of spatio-temporal hawkes processes with randomised kernels. IEEE Transactions on Signal Processing, 68, 4946–4958.
  • [39] J. Illian, A. Penttinen, H. Stoyan, H. and D. Stoyan, (2008). Statistical analysis and modelling of spatial point patterns, John Wiley & Sons Ltd.
  • [40] L. Isserlis, (1918). On a formula for the product-moment coefficient of any order of a normal frequency distribution in any number of variables. Biometrika, 12(1–2), 134–139.
  • [41] G. Jacovitti and A. Neri, (2000). Multiresolution circular harmonic decomposition. IEEE Transactions on Signal Processing, 48(11), 3242–3247.
  • [42] N. L. Johnson and S. Kotz, (1970). Continuous univariate distributions – 2, John Wiley & Sons Ltd.
  • [43] J. Jowett, D. Vere–Jones and P. A. W. Lewis, (1972). The prediction of stationary point processes. In Stochastic point processes, Wiley Interscience.
  • [44] S. T. John and J. Hensman, (2018). Large-scale Cox process inference using variational Fourier features. International Conference on Machine Learning, 2362–2370.
  • [45] S. Karnik, Z. Zhu, M. B. Waking, J. Romberg and M. A. Davenport, (2019). The fast Slepian transform. Applied and Computational Harmonic Analysis, 46, 624–652.
  • [46] S. Karnik, J. Romberg and M. A. Davenport, (2022). Thomson’s multitaper method revisited. IEEE Transactions on Information Theory, 68(7), 4864–4891.
  • [47] J. T. Kent and M. V. Mardia, (1996). Spectral and circulant approximations to the likelihood for stationary Gaussian random fields. Journal of Statistical Planning and Inference, 50(3), 379–394.
  • [48] M. Lázaro-Gredilla, J. Quinonero-Candela, C. E. Rasmussen and A. R. Figueiras-Vidal, (2010). Sparse spectrum Gaussian process regression. J. of Machine Learning Research, 11, 1865–1881.
  • [49] Y. Li, F. Baccelli, H. S. Dhillon and J. G. Andrews, (2015). Statistical modeling and probabilistic analysis of cellular networks with determinantal point processes, IEEE Transactions on Communications, 63(9), 3405–3422.
  • [50] K. S. Lii and E. Masry, (1994). Spectral estimation of continuous-time stationary processes from random sampling. Stochastic processes and their applications, 52(1), 39–64.
  • [51] T.-C. Liu and B. Van Veen, (1992). Multiple window based minimum variance spectrum estimation for multidimensional random fields. IEEE Transactions on Signal Processing, 40(3), 578–589.
  • [52] P. J. Loughlin, J.W. Pitton, and L. E. Atlas, (1993). Bilinear time-frequency representations: New insights and properties. IEEE Transactions on Signal Processing, 41(2), 750–767.
  • [53] K. V. Mardia and R. J. Marshall, (1984). Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika, 71(1), 135–146.
  • [54] K. S. Miller, (1974). Complex stochastic processes: an introduction to theory and application. Addison-Wesley, New York, USA.
  • [55] J. Møller, A. R. Syversveen and R. P. Waagepetersen, (1998). Log-Gaussian Cox processes. Scandinavian Journal of Statistics, 25(3), 451–482.
  • [56] M. A. Mugglestone and E. Renshaw, (1996). A practical guide to the spectral analysis of spatial point processes. Computational Statistics & Data Analysis, 21(1), 43–65.
  • [57] T. D. Novlan, H. S. Dhillon and J. G. Andrews, (2013). Analytical modeling of uplink cellular networks. IEEE Transactions on Wireless Communications, 12(6), 2669–2679.
  • [58] D. B. Percival and A. T.  Walden, (1993). Spectral analysis for physical applications. Cambridge University Press, Cambridge, UK.
  • [59] O. Ponomarenko and Y. Perun, (2006). Spectral analysis of some classes of multivariate random fields with isotropic property. Theory of Stochastic Processes, 12, 142–150.
  • [60] K. S. Riedel and A. Sidorenko, (1995). Minimum bias multiple taper spectral estimation. IEEE Transactions on Signal Processing, 43(1), 188–195.
  • [61] E. A. Robinson, (1982). A historical perspective of spectrum estimation. Proceedings of the IEEE, 70(9), 885–907.
  • [62] R Core Team, (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
  • [63] P. J. Schreier and L. L. Scharf, (2003). Second-order analysis of improper complex random vectors and processes. IEEE Transactions on Signal Processing, 51(3), 714–725.
  • [64] P. J. Schreier and L. L. Scharf, (2010). Statistical signal processing of complex-valued data: the theory of improper and noncircular signals. Cambridge University Press.
  • [65] F. J. Simons and D. V. Wang, (2011). Spatiospectral concentration in the Cartesian plane. GEM-International Journal on Geomathematics, 2(1), 1–36.
  • [66] D. Slepian, (1964). Prolate spheroidal wave functions, Fourier analysis and uncertainty – IV: extensions to many dimensions; generalised prolate spheroidal functions. Bell System Technical Journal, 43(6), 3009–3057.
  • [67] D. Slepian, (1983). Some comments on Fourier analysis, uncertainty and modeling. SIAM Review, 25(3), 379–393.
  • [68] M. L. Stein, (2012). Interpolation of Spatial Data: some theory for kriging. Springer-Verlag, Berlin, Germany.
  • [69] T. J. Theim, R. Y. Shirk and T. J. Givnish, (2014). Spatial genetic structure in four understory Psychotria species (Rubiaceae) and implications for tropical forest diversity. American Journal of Botany, 101(7), 1189–1199.
  • [70] D. J. Thomson, (1982). Spectrum estimation and harmonic analysis. Proceedings of the IEEE, 70(9), 1055–1096.
  • [71] D. J. Thomson, (1990). Quadratic–inverse spectrum estimates: applications to palaeoclimatology. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 332(1627), 539–597.
  • [72] G. E. Tita and S. M. Radil, (2010). Making space for theory: the challenges of theorizing space and place for spatial analysis in criminology. Journal of Quantitative Criminology, 26(4), 467–479.
  • [73] J.-F. Ton, S. Flaxman, D. Sejdinovic and S. Bhatt, (2018). Spatial mapping with Gaussian processes and nonstationary Fourier features. Spatial Statistics, 28, 59–78.
  • [74] S. Vembu, (1961). Fourier transformation of the n-dimensional radial delta function. The Quarterly Journal of Mathematics, 12(1), 165–168.
  • [75] D. Vere-Jones, (1974). An elementary approach to the spectral theory of stationary random measures. Stochastic Geometry, 65, 307.
  • [76] A. T. Walden, (2000). A unified view of multitaper multivariate spectral estimation. Biometrika, 87(4), 767–788.
  • [77] R. Waagepetersen, Y. Guan, A. Jalilian and J. Mateu, (2016). Analysis of multispecies point patterns by using multivariate log-Gaussian Cox processes. Journal of the Royal Statistical Society: Series C (Applied Statistics), 65, 77–96.
  • [78] Y. Xu, S. Haykin and R. J. Racine, (1999). Multiple window time-frequency distribution and coherence of EEG using Slepian sequences and Hermite functions. IEEE Transactions on Biomedical Engineering, 46(7), 861–866.