
What is the Fourier Transform of a Spatial Point Process?

Tuomas Rajala, Sofia C. Olhede, Jake P. Grainger and David J. Murrell. T. Rajala is with Natural Resources Institute Finland, 00790 Helsinki, Finland. S. C. Olhede and J. P. Grainger are with the Institute of Mathematics, EPFL, Lausanne, Switzerland. D. J. Murrell is with the Research Department of Genetics, Evolution and Environment, Centre for Biodiversity and Environment Research, University College London, UK.
Abstract

This paper determines how to define a discretely implemented Fourier transform when analysing an observed spatial point process. To develop this transform we answer four questions: first, what is the natural definition of a Fourier transform, and what are its spectral moments; second, we calculate fourth-order moments of the Fourier transform using Campbell's theorem; third, we determine how to implement tapering, an important component of the spectral analysis of other stochastic processes; fourth, we answer the question of how to produce an isotropic representation of the Fourier transform of the process. This determines the basic spectral properties of an observed spatial point process.

Keywords:

Spectral density function, Spatial point processes, Debiased periodogram, Tapering.

1 Introduction

Spatial point processes are an important form of observational data structure, for example in forest ecology [77], communications networks [57, 49], epidemiology [31], social science [72], pharmacology [35] and medicine [2] amongst many other application fields. Understanding the properties of a point process can be approached from many different perspectives [24], and the aim of this paper is to determine how to extract the frequency (or scale/wavenumber) behaviour of an observed point process, as well as connect that to its spectral representation.

The most common assumption used when analysing spatial processes is that of spatial homogeneity (or stationarity). This means that if you shift all of the observations by a fixed spatial shift, the distribution of those observations does not change from the distribution of the original sample; and if the observational window is changed, then understanding the full set of observations remains tractable. The consequence of this probabilistic invariance in distribution is the spectral representation of a stochastic process. The existence of the spectral representation of a stochastic process means the Fourier transform fully characterises the second order properties of that stochastic process. The Fourier transform also characterises the second order properties of a point process, see e.g. [20]. Yet unlike random fields and time series, spectral analysis of point processes is still in its infancy, see also [5, 6, 56], and critically, the digital processing of a point process remains fully outstanding. Recent interest in Fourier features in machine learning based approaches for point patterns, such as [38, 44], shows the potential of using Fourier based information as features for estimation and detection. The work in this manuscript both establishes what Fourier features to calculate for homogeneous processes from a sampling perspective, and their large but finite sampling area properties, just like [33] determined the large but finite properties of Fourier representations for random fields. We note that machine learning techniques [73, 48] utilised Fourier features in learning algorithms for parametric models before precise and general understanding was established.

Thus despite inspirational work by the aforementioned authors, several key aspects of discrete spectral analysis lie unresolved when applied to observed point processes. In particular, 1) given the Fast Fourier Transform is unavailable (unlike the case for regularly sampled time series or random fields), how do we efficiently implement the calculation of whichever discrete Fourier transform we choose to define? 2) How does that discrete transform relate to the underlying spectral measure of the process? 3) Can the discrete transform be improved by linear operations such as tapering, as is the case for other stochastic processes? 4) If yes, how do we then select a taper? 5) How do we define a radial or isotropic transformation to simplify the representation of the spatial point process? There are many, many more questions to answer before bringing the spectral analysis of spatial point processes to the sophistication of the analysis of random fields, but these currently unresolved questions are positioned as the first hurdles to overcome in the path of whoever wishes to develop more sophisticated spectral analysis techniques. Once the fundamental properties of the spectral representation of a stationary point process have been developed and understood, results could be extended to inhomogeneous point processes, but that is not the aim of this work.

To put this in the context of information theory: understanding how to compute the Fourier transform from a spatially compact signal had already sparked a lengthy debate from the first introduction of tapering [70], and selection of optimal tapers using localization operators [67, 66, 17]. While some aspects of tapering in the context of point patterns are reflected by topics in mature (multi)dimensional tapering and spectral estimation [1, 45, 46, 33], understanding how to adapt these ideas to spatial point processes remains fully outstanding, and must be answered using our understanding of irregularly sampled processes [11]. Understanding and characterising point patterns is naturally a problem of general interest, e.g. [12, 13].

Why then do we want to understand the spectral information of a point pattern? The spectral information of any stochastic process is the same as the spatial information, as there is a bijection between the two, but the former directly shows the scale of variation of the process. To illustrate, let us simulate a complex point pattern and compare its spectral content with its usual spatial representation. Log-Gaussian Cox processes [55], popularly used in applications, are marginally stationary processes whose patterns are more variable than models that do not depend on latent random variables. The pattern (a single realization), and its estimated spectrum (the subject of this paper), are shown in Figure 1. We have chosen a process which is isotropic, so that its usual spatial representation and spectrum are isotropic also, and thus easier to interpret. In the left hand subplot we see the pattern itself, in the middle subplot we show a common spatial summary, the pair correlation function (pcf), comparing both the estimate and truth, and in the right hand subplot we show the spectrum on a decibel scale ($10\log_{10}$), both estimated and true. While the raw pattern (left) looks mainly unremarkable, its periodicity is naturally present in the pcf (middle), but is more obvious in the spectrum (right). We can even estimate the periodicity as having frequency 1. This is immediately recoverable directly from the estimated spectrum.

Figure 1: A realisation of a log-Gaussian Cox process (left), the estimated against true product density (middle), and estimated against true spectral density function on a decibel scale (right).

To get to the point where we can estimate the spectrum of a point pattern, we need to answer the five questions posed earlier. We will therefore start by determining how to compute Fourier transforms of point processes, discussing the question of how to form direct spectral estimators [58] in this setting. This will be answered in terms of the mean and (co)-variance of the different discrete Fourier transforms that we define. The first surprising result is that the natural direct spectral estimator is biased unless it is mean corrected (this bias was noted by [23], but only corrected in an ad-hoc fashion). We show how to perfectly eliminate this bias.

Second, to understand the variance and covariance of a direct spectral estimator, we need to calculate a fourth order moment of our chosen discrete spectral transform. This is complicated since Campbell's formula is required to derive its form for a point process. This approach can be contrasted with using Isserlis' theorem to determine higher order moments for a Gaussian process [40]. Calculating covariances between direct spectral estimators gives us a way to determine on what grid of wavenumbers the spectral estimator is uncorrelated, and thus where to evaluate it.

Third, tapering [70] is required to define the direct spectral estimator and avoid leakage; but how to taper a point pattern is an open question. Most tapers are defined for regularly spaced stochastic processes; but there are some continuous tapers. We choose to use the continuous tapers of Riedel and Sidorenko [60] to construct the direct spectral estimator, and use multitapers to reduce the variance of such estimators. We will determine how to implement both.

Fourth, many spatial models are radially symmetric, or isotropic. This prompts us to describe how to construct an isotropic representation of the spectral content of a point process. There are two possible approaches, namely using the Bessel function [23] to do the transformation, or isotropizing a general spectral representation [25]. We describe how to define the appropriate tapering in this instance, inspired by other isotropic harmonic decompositions [41].

Calculating the Fourier transform of a point process is useful beyond the point pattern itself. If we sample a stochastic process with a point process, then the spectrum of the point process is just as important as that of the stochastic process in determining the spectrum of the observations, e.g. [50]. For this reason, the study of the frequency information of an observed process is of interest in its own right, and beyond, and we will discuss the implications that arise for point processes.

In Section 2 we define the basic concepts required by the second order representation of a point pattern. In Section 3 we discuss tapering of the DFT and the direct moments of the DFT. In Section 4 we determine how to form a spectral density estimator from our understanding of how to form the DFT and its mean and variance. The next step is to study the covariance structure of such spectral density estimators, see Section 5. We then use that understanding to define how to do linear smoothing in Section 6. For multidimensional spectral representations, the full anisotropic structure can be hard to interpret; we therefore propose 1d isotropic or isotropised summaries to characterise such processes, in Section 7. We then present a simulation study in Section 8, and demonstrate the potential of the proposed methodology in an example from forest ecology in Section 9. We conclude with a discussion in Section 10.

2 Notation and Definitions

In this Section we shall give the basic notation necessary for the spectral analysis of a spatially homogeneous point process. We assume that we observe a spatially homogeneous point process $X$ with intensity $\lambda=\rho^{(1)}>0$ and with $\rho^{(2)}(z)$ defined as the (second order) product density of $X$. Assuming $X$ is spatially homogeneous means that $\rho^{(2)}(z)$ only has one argument ($z\in{\mathbb{R}}^{d}$, namely the spatial shift) rather than depending on two local spatial variables (say $x$ and $y$ rather than just $z=x-y$). We assume that $X$ is a simple point process on ${\mathbb{R}}^{d}$, meaning no duplicate points are allowed. We take $d=2,3,\dots$, and so exclude the case of $d=1$, which is relatively well-studied, see for example the references in [19, 43, 75]. For a (Borel) set $A$ in ${\mathbb{R}}^{d}$, we write $|A|$ for the volume of $A$ and $N(A)=|X\cap A|$ for the number of points of $X\subset{\mathbb{R}}^{d}$ in $A$. For a set $A$ and $z\in\mathbb{R}^{d}$, denote the $z$-shifted set by $A_{z}=\{x+z:x\in A\}$. We denote by $B$ any subset of ${\mathbb{R}}^{d}$ where points are observed (equivalently, our observation domain). For any complex number $z$ we use the superscript $z^{\ast}$ to denote the conjugate of $z$.

In complete analogue with a random field we shall define the spectral density of $X$ as the Fourier transform of the complete covariance function of $X$. This is the standard approach, and was first proposed by [6]. The complete covariance function of a stationary point process is therefore

\gamma(z)\equiv\lambda\delta(z)+\rho^{(2)}(z)-\lambda^{2},\quad z\in{\mathbb{R}^{d}}, (1)

where $\delta(\cdot)$ is the Dirac delta function. The spectral density function (sdf) [6] of the point process $X$ is then defined as the Fourier transform of the complete covariance function:

f({k})\equiv\mathcal{F}[\gamma]({k})=\int_{{\mathbb{R}}^{d}}e^{-2\pi i{k}\cdot z}\gamma(z)\,dz=\lambda+\int_{{\mathbb{R}^{d}}}e^{-2\pi i{k}\cdot z}[\rho^{(2)}(z)-\lambda^{2}]\,dz,\qquad{k}\in{\mathbb{R}}^{d}. (2)

This representation is in direct analogy with the corresponding spectral decomposition of a random field or a time series. The symbol $\mathcal{F}$ denotes the Fourier transform, with $i$ as the imaginary unit. We refer to the argument $k$ as the "wavenumber" rather than frequency to acknowledge its multi-dimensionality. This is a natural choice of Fourier transform in analogy to the analysis of random fields [68]. To avoid additional constants in the inverse Fourier transforms, we parameterise the Fourier transform with wavenumber instead of the customary angular frequency $\omega=2\pi k$ used by [6] and some others.

For a time series or random field there are a number of spectral results, ranging from the spectral representation theorem (I) [58, p130] to Bochner's (Herglotz) theorem (II) [58]. Neither result is exactly available in the point process setting, but for a time series $X_{t}$ we can note them:

X_{t}\overset{(I)}{=}\mu+\int_{-\frac{1}{2}}^{\frac{1}{2}}e^{2\pi i{k}t}\,dZ({k}),\quad{\mathrm{\bf Var}}\{dZ({k})\}\overset{(II)}{=}f({k})\,d{k}, (3)

\gamma(\tau)\overset{(III)}{=}\int_{-\frac{1}{2}}^{\frac{1}{2}}f({k})e^{2\pi i{k}\tau}\,d{k},\quad f({k})\overset{(IV)}{=}\sum_{\tau}\gamma(\tau)e^{-2\pi i{k}\tau}. (4)

Equation (3) decomposes $X_{t}$ into random contributions associated with each frequency $k$. The covariance in Eqn (4) is also decomposed into weighted frequency contributions. Both of these observations are important for the interpretation of the Fourier representation of $X_{t}$. "Important" contributions correspond to $f(k)$ being considerably larger relative to other $f(k')$, as in that scenario ${\mathrm{\bf Var}}\{dZ(k)\}$ is bigger than ${\mathrm{\bf Var}}\{dZ(k')\}$, and therefore $|dZ(k)|$ is likely to be larger than $|dZ(k')|$. Spectral representations of point processes are also discussed in [20, Ch. 8]. We can yet again decompose the (complete) covariance in terms of a spectral representation as in (II), and it takes the form, in terms of a $d$-dimensional Fourier transform, of

f({k})=\int e^{-2\pi i{k}\cdot z}\gamma(z)\,dz=\lambda+\int e^{-2\pi i{k}\cdot z}\left\{\rho^{(2)}(z)-\lambda^{2}\right\}\,dz. (5)

We refer to $f(k)$ as the spectrum of $X$. We note that the Fourier transform can be inverted to yield the equality of (an analogy of III):

\rho^{(2)}(z)=\lambda^{2}+\int_{{\mathbb{R}}^{d}}f({k})e^{2\pi i{k}\cdot z}\,d{k}. (6)

We assume the spectrum is the primary object of interest in our study of point patterns, as the covariance $\gamma(z)$ can be fully determined from it, and the covariance characterises the point process. We see directly from (5) that $\rho^{(2)}(z)-\lambda^{2}$, and not the complete covariance $\gamma(z)$, is playing the role of a time series covariance. From a nonparametric perspective the spectrum characterises which wavenumbers are more notable (distinct) in the process. As (5) assigns a constant level $\lambda$ to all wavenumbers, the notable (distinct) wavenumbers are determined from the Fourier transform of $\rho^{(2)}(z)-\lambda^{2}$.

We have to exercise some caution when interpreting the Fourier transform as a bijective transform. Yes, it should contain the same information about scale, but the meaning of the word "scale" will be different from a simple spatial understanding. In time or space the notion is associated with the support of $\gamma(z)$ or $\rho^{(2)}(z)$. Being supported over a wavenumber $k$ means variation over scales $1/\|k\|$ is present in the correlation function, or to represent the variability in $\gamma(z)$ we need wavenumbers $\|k\|$ present in the spectral representation. The function $\rho^{(2)}(z)-\lambda^{2}$ is often approached in terms of what scales it is non-zero at, but the variation in the function can also be associated with long or short scales. So for a covariance function, we now have two notions of what it means to possess scales $\|k\|$: either $\gamma(\mathbf{u}/\|k\|)$ is non-zero for a unit vector $\mathbf{u}$, or $\gamma(z)$ is variable (changing) and the scale of the change is $\|k\|$. Imagine observing a sinusoid with period $k$. The function will hit unity at a regular set of $z$ values, and so it will be supported at those $z$ values, but also its Fourier transform will be supported at $1/k$.

This is clear from Figure 2, where covariance is present smoothly at small and medium scales for the clustered Thomas processes, and is repelled for the Matérn hard-core processes. As the Matérn process's complete covariance function is discontinuous, its Fourier transform is supported over all scales (due to the Heisenberg-Gabor uncertainty principle [14]). This can be deduced from (6): to reproduce the discontinuity we need all scales in the Fourier representation.

Figure 2: Pair correlation function $\rho^{(2)}/\lambda^{2}$ (pcf, left) and corresponding scaled spectral density function $f/\lambda$ (sdf/$\lambda$, right) for two stationary and isotropic non-Poisson point process models, two variants each, with known pcfs. For the Thomas processes the sdf is given in the text. For the regular Matérn II process the sdf was numerically approximated using the Hankel transform (cf. Section 7). See details of the processes and the variants in Section 8.

We see from (5) that not all wavenumbers are given equal weighting. First all wavenumbers are given an equal weighting $\lambda$, and then the Fourier transform of $\rho^{(2)}(z)-\lambda^{2}$ determines the wavenumbers that are up-weighted (added) or down-weighted (subtracted) relative to the overall level of $\lambda$. We then observe which wavenumbers are important to the representation of the point process, which gives more information, conveniently decomposed in a scale-by-scale manner.

For a random field or a stochastic process in $d$ dimensions a few cartoon characteristics are important. For a discretely sampled process in $\mathbb{Z}^{d}$ the simplest random process, white noise, is constant across wavenumbers, yielding a spectrum that takes the form of $\sigma^{2}$ on $[-\frac{\pi}{\Delta},\frac{\pi}{\Delta}]^{d}$, where $\Delta>0$ is the sampling period, and zero otherwise. For a point process the choice of definition of the spectral density does not imply a decay, because of the inclusion of the term $\lambda\delta(z)$ in space. However once we remove this term, we expect a decay of the remaining spectrum $f(k)-\lambda$ as $\|k\|\rightarrow\infty$; otherwise (5) would contain additional singularities. Furthermore at $k=0$ we retain

f(0)=\lambda+\int\left\{\rho^{(2)}(z)-\lambda^{2}\right\}\,dz. (7)

The second term in this expansion is not required to be zero; for the Thomas process, for example, it is not, as the sdf takes the form [56, p 55]:

f({k})=\lambda+\lambda\mu e^{-4\pi^{2}\|{k}\|^{2}\sigma^{2}}, (8)

where $\mu$ is the per-cluster expected point count and $\sigma$ is the standard deviation of the Gaussian dispersal kernel. It is clear from this expression that, due to the exponential, the second term never vanishes.
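As a sanity check on (8): for the Thomas process, $\rho^{(2)}(z)-\lambda^{2}$ is $\lambda\mu$ times a bivariate Gaussian density with covariance $2\sigma^{2}I$, so its Fourier integral should reproduce the exponential term. The following sketch (parameter values and all names are our illustrative choices, not from the paper) verifies this in 2D by brute-force quadrature:

```python
import numpy as np

# Numerical check of the Thomas-process sdf (8): the 2D Fourier transform of
# rho2(z) - lam^2 = lam*mu * N(0, 2 sigma^2 I) density should give
# lam*mu*exp(-4 pi^2 ||k||^2 sigma^2). Parameters are illustrative.
lam, mu, sigma = 5.0, 4.0, 0.3

def f_thomas(knorm):
    """Closed-form sdf of Eqn (8)."""
    return lam + lam * mu * np.exp(-4 * np.pi**2 * knorm**2 * sigma**2)

# brute-force Fourier integral of rho2(z) - lam^2 on a grid over [-L, L]^2
L, n = 4.0, 801
g = np.linspace(-L, L, n)
dz = g[1] - g[0]
zx, zy = np.meshgrid(g, g)
dens = lam * mu * np.exp(-(zx**2 + zy**2) / (4 * sigma**2)) / (4 * np.pi * sigma**2)
for kx in (0.0, 0.5, 1.0):
    # the imaginary part vanishes by symmetry, so only the cosine term remains
    num = lam + (dens * np.cos(2 * np.pi * kx * zx)).sum() * dz * dz
    print(kx, num, f_thomas(kx))
```

The two printed columns should agree to quadrature accuracy, confirming that the Gaussian cluster kernel leaves a strictly positive contribution at every wavenumber.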

As we shall be studying the moments of a point process, it is convenient to restate Campbell's theorem [10, Sec. 4.3.3], which we shall use multiple times. The theorem applies to any measurable function $h:{\mathbb{R}}^{nd}\mapsto{\mathbb{R}}^{+}$ with $n=1,2,\dots$, and states that (assuming the product densities $\rho^{(n)}$ of order $n$ are well defined for any given $n$)

{\mathbf{E}}\sum_{x_{1},\dots,x_{n}\in X}^{\neq}h(x_{1},\dots,x_{n})=\int_{{\mathbb{R}}^{nd}}h(x_{1},\dots,x_{n})\rho^{(n)}(x_{1},\dots,x_{n})\,dx_{1}\cdots dx_{n}, (9)

where the summation is over distinct point tuples. Note that if the point pattern $X$ is stationary then $\rho^{(1)}(x)=\lambda$ is constant for any $x$, and $\rho^{(n)}(x_{1},\dots,x_{n})=\rho^{(n)}(x_{1}-x_{n},\dots,x_{n-1}-x_{n},0)$. With some abuse of notation, we shall use the same symbol for non-stationary as well as stationary product densities, where the latter function has $n-1$ arguments for an $n$th order product density, e.g. we write $\rho^{(n)}(z_{1},\dots,z_{n-1})=\rho^{(n)}(z_{1},\dots,z_{n-1},0)$.
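Campbell's theorem (9) is easy to check by simulation in its simplest case $n=1$: for a homogeneous Poisson process on $B$ with intensity $\lambda$, $\mathbf{E}\sum_{x\in X}h(x)=\lambda\int_{B}h(x)\,dx$. The sketch below (intensity, window, replicate count and the Gaussian test function are all illustrative choices of ours) compares the two sides by Monte Carlo:

```python
import numpy as np

# Monte Carlo check of Campbell's theorem (Eqn 9) for n = 1 on B = [0,1]^2:
# E sum_{x in X} h(x) = lam * int_B h(x) dx for a Poisson process X.
rng = np.random.default_rng(1)
lam, side, n_rep = 50.0, 1.0, 4000      # intensity, window side, replicates

def h(pts):
    """A smooth test function on B (a Gaussian bump centred in the window)."""
    return np.exp(-np.sum((pts - 0.5) ** 2, axis=-1))

totals = []
for _ in range(n_rep):
    n = rng.poisson(lam * side ** 2)     # Poisson-distributed point count
    pts = rng.uniform(0.0, side, size=(n, 2))
    totals.append(h(pts).sum())
mc = np.mean(totals)                     # left-hand side of (9), estimated

# Right-hand side: lam * int_B h, via a fine midpoint grid (|B| = 1 here).
g = (np.arange(400) + 0.5) / 400
xx, yy = np.meshgrid(g, g)
rhs = lam * np.mean(np.exp(-((xx - 0.5) ** 2 + (yy - 0.5) ** 2)))
print(mc, rhs)                           # should agree to Monte Carlo error
```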

We define a new parameterisation to capture how the process's $f(k)$ differs from that of a Poisson process. We define the $n$th deviation of the $n$th order product density as

\widetilde{\rho}^{(n)}(z_{1},\dots,z_{n-1})=\frac{{\rho}^{(n)}(z_{1},\dots,z_{n-1})-\lambda^{n}}{\lambda^{n}},\quad n=2,3,\dots, (10)

where each argument of the product density satisfies $z_{l}\in{\mathbb{R}}^{d}$ for $l=1,\dots,n-1$. For a Poisson process these deviations from Poissonianity, $n=2,3,\dots$, are all identically zero. By understanding deviations from Poisson behaviour, we gain greater insight into the underlying process of interest. For completeness we also define the Fourier transform of the deviations:

\widetilde{f}^{(n)}({k})={\cal F}\left\{\widetilde{\rho}^{(n)}(z)\right\},

where if $n=2$ we suppress the superscript. With this definition we find that

\widetilde{f}({k})=\frac{f({k})-\lambda}{\lambda^{2}},\quad{k}\in{\mathbb{R}}^{d},\qquad f({k})=\lambda+\lambda^{2}\widetilde{f}({k}). (11)

Thus in a sense, $\widetilde{f}(k)$ encapsulates the deviation of the spectral density from the constant level $\lambda$ via the term $\lambda^{2}\widetilde{f}(k)$.

Already [20, p. 337] discusses some differences in the spectral representation of time series versus that of point processes. We would argue that decay of the spectrum is still reasonable to assume once $\lambda$ has been subtracted (so decay of $\widetilde{f}(k)$ at large magnitude wavenumbers is reasonable to assume). Having established the theoretical spectral description of point processes, we now turn to their sampling properties.

3 Direct Spectral Summaries of Point Patterns

In this section, we shall revisit the possible definitions of a spectral estimator for a point pattern. We shall start by defining a direct Fourier transform, and taper this definition to ameliorate edge effects, resulting in families of spectral estimators. We also discuss other choices of spectral estimators already used in the literature. To see the spectral characteristics of $X$ we start from what is known as Bartlett's periodogram estimator [23], based on observing a point pattern in region $B\subset{\mathbb{R}}^{d}$, written as

I_{0}({k}):=\hat{\lambda}+|B|^{-1}\sum_{x,y\in X\cap B}^{\neq}e^{-2\pi i{k}\cdot(x-y)},\quad{k}\in{\mathbb{R}^{d}}, (12)

where we have set $\hat{\lambda}=|B|^{-1}|X\cap B|$. This definition uses all the data (or points) available to us, $\{x:\;x\in X\cap B\}$, but is simply one possible choice amongst all direct spectral estimators. Normally a direct spectral estimator is one which is bilinear in the DFT of the data, see for example the discussion in [58, Ch 5–6]. We recall that a bilinear form is a function $D({\mathbf{x}},{\mathbf{y}})$ satisfying, in the first argument, $D({\mathbf{x}}_{1}+{\mathbf{x}}_{2},{\mathbf{y}})=D({\mathbf{x}}_{1},{\mathbf{y}})+D({\mathbf{x}}_{2},{\mathbf{y}})$, and equivalently in the second argument. Bilinear forms have been thoroughly discussed in signal processing for the analysis of time series [52]. As a point process consists of locations, the best we can hope for in terms of bilinearity is sesquilinearity in the Fourier transform, as the point locations appear in the argument of the complex exponential. Sesquilinearity of $D({\mathbf{x}},{\mathbf{y}})$ simply generalises a bilinear form to Hermitian symmetry, additionally requiring $D({\mathbf{x}},{\mathbf{y}})=D({\mathbf{y}}^{\ast},{\mathbf{x}})$ as well as $D({\mathbf{x}},{\mathbf{y}})$ satisfying $D({\mathbf{x}}_{1}+{\mathbf{x}}_{2},{\mathbf{y}})=D({\mathbf{x}}_{1},{\mathbf{y}})+D({\mathbf{x}}_{2},{\mathbf{y}})$.
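Since $\sum_{x\neq y}e^{-2\pi i k\cdot(x-y)}=|\sum_{x}e^{-2\pi i k\cdot x}|^{2}-N(B)$, Bartlett's periodogram (12) collapses to $|B|^{-1}|\sum_{x}e^{-2\pi i k\cdot x}|^{2}$, which is how one would compute it in practice. A minimal sketch (the function name and the Poisson demo pattern are our own, not the paper's):

```python
import numpy as np

# Bartlett's periodogram (Eqn 12) on B = [0, side]^2, computed via the
# identity I0(k) = |B|^{-1} |sum_x e^{-2 pi i k.x}|^2.
def bartlett_I0(pts, k, area):
    """pts: (N, d) point locations; k: (m, d) wavenumbers; area: |B|."""
    phase = np.exp(-2j * np.pi * (k @ pts.T))   # (m, N) complex exponentials
    J = phase.sum(axis=1)                       # DFT of the point pattern
    return np.abs(J) ** 2 / area

rng = np.random.default_rng(7)
side, lam = 10.0, 5.0
pts = rng.uniform(0, side, size=(rng.poisson(lam * side ** 2), 2))
k = np.array([[0.3, 0.0], [1.0, 1.0]])
print(bartlett_I0(pts, k, side ** 2))
```

For a Poisson pattern the values should fluctuate around $\lambda$ away from the origin, consistent with the flat sdf of a Poisson process.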

In time series the bilinear form has been chosen to ensure that spectral estimators are real-valued, and often non-negative, though some bilinear estimators are not, see e.g. Guyon's spectral estimator [34]. We define the tapered DFT of a point process $X$, for a specified general (square) integrable function $h(x)$ (referred to as a 'data taper' by [58]) with Fourier transform $H(k)$ and unit norm (i.e. $\|h\|_{2}=1$), to be

J_{h}({k}):=\sum_{x\in X\cap B}h(x)e^{-2\pi i{k}\cdot x},\quad{k}\in{\mathbb{R}}^{d}. (13)

Spectral estimators in time series are bilinear in the observed real-valued process $X_{t}$ and sesquilinear in its Fourier transform $J_{h}(k)$. We shall still require that the form of the estimator is sesquilinear in the DFT of the point process, but the estimators will not be bilinear in $X$, the point pattern, as this will not be possible to achieve. We shall also define

\widetilde{J}_{h}({k})=J_{h}({k})-\lambda H({k}).

Because $x$ appears in the argument of the complex exponential, we cannot define an estimator that is bilinear directly in the data. In practice one also has to decide what taper function to use. For 1D point processes tapering has been used [15], and multitapering has been used on interpolated data [21]. Here, we do not interpolate, do not implement localised analysis, and do not work in 1D. Interpolation is more complex to implement in higher dimensions, as point locations are not ordered, unlike the time argument of a time series.

Based on $J_{h}(k)$, the natural spectral estimator becomes its modulus square, or

I_{h}({k}):=\left|J_{h}({k})\right|^{2}=\sum_{x\in X\cap B}h(x)h^{*}(x)+\sum_{x,y\in X\cap B}^{\neq}h(x)h^{*}(y)e^{-2\pi i{k}\cdot(x-y)}. (14)

If we take $h(z)=h_{0}(z)=|B|^{-\frac{1}{2}}\mathbf{1}(z\in B)$ then we recover Bartlett's periodogram (12) from (14); with any other taper Bartlett's periodogram is not exactly recovered. The point of using a general function $h(z)$ is that abruptly ending the inclusion of points when we leave the region $B$ causes ripples in wavenumber, and therefore worse estimation of the spectrum, as is also the case for time series when not using a taper [58]. We study the whole family of estimators $I_{h}(k)$, where a new member of the family is defined for each choice of $h$. We also note that (14) is a direct spectral estimator [58, p. 207]. Multidimensional tapers are available to us [36, 65], and can be pre-computed. Our choice of tapering corresponds to using continuous tapers evaluated at random locations. We have deduced an asymptotic distribution for $J_{h}(k)$ using [7], but this result cannot be applied to $I_{h}(k)$, even if it superficially may seem to be of the correct form.

Note that the sum in (14) cannot be evaluated in a computationally efficient manner, unlike the DFT, as the locations $\{x\}$ are not regularly spaced; this is unfortunate, but unavoidable. Finally, (14) is bilinear in $J_{h}(k)$ but it is not bilinear in $X$, unlike the commensurate expressions for the DFT of time series and random fields.
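The tapered DFT (13) and periodogram (14) can nevertheless be evaluated directly as an $O(N)$ sum per wavenumber. A minimal sketch on $B=[0,s]^{2}$, using a separable sine taper of unit $L_{2}$ norm as a simple stand-in for the continuous tapers discussed above (the taper choice, names and parameters are ours):

```python
import numpy as np

# Tapered periodogram I_h(k) = |J_h(k)|^2 (Eqn 14) on B = [0, s]^2, with a
# separable sine taper; this is an illustrative taper choice, not the paper's.
def sine_taper(x, s, m=1):
    """m-th 1D sine taper on [0, s]; unit L2 norm by construction."""
    return np.sqrt(2.0 / s) * np.sin(np.pi * m * x / s)

def tapered_periodogram(pts, k, s):
    h = sine_taper(pts[:, 0], s) * sine_taper(pts[:, 1], s)   # h(x), (N,)
    J = (h * np.exp(-2j * np.pi * (k @ pts.T))).sum(axis=1)   # Eqn (13)
    return np.abs(J) ** 2                                     # Eqn (14)

rng = np.random.default_rng(3)
s, lam = 10.0, 5.0
pts = rng.uniform(0, s, size=(rng.poisson(lam * s * s), 2))
k = np.array([[0.5, 0.25], [2.0, 0.0]])
print(tapered_periodogram(pts, k, s))
```

The taper goes smoothly to zero at the boundary of $B$, which is exactly the property that suppresses the wavenumber ripples described above.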

For completeness we also note the isotropic estimator of [23, Eqn. 3.3] in 2D, namely

\bar{I}_{0}(\|{k}\|):=\hat{\lambda}+\frac{1}{|B|}\sum_{x,y\in X\cap B}^{\neq}\mathcal{J}_{0}\left(2\pi\|{k}\|\,\|x-y\|\right), (15)

where $\mathcal{J}_{0}(x)$ denotes the Bessel function of order 0, specified in Section 7. The $d$-dimensional extension is given later in equation (18). $\bar{I}_{0}$ is less clearly bilinear in the data, but if we start from the estimator $I_{0}$ of (12) and average it over orientations analytically, then we arrive at this form. With a slight abuse of our terminology we therefore also refer to it as a bilinear estimator. There is one additional correction made by [23]. As we shall see, for low wavenumbers there is a bias inherent in (15). To address this problem, [23, Eqn. 3.4] suggests taking, for some choice of lower bound $t_{0}>0$,

I_{D}(\|{k}\|)=\left\{\begin{array}{lcr}\bar{I}_{0}(t_{0})&{\mathrm{if}}&\|{k}\|\leq t_{0}\\ \bar{I}_{0}(\|{k}\|)&{\mathrm{if}}&\|{k}\|>t_{0}\end{array}\right. (16)

The authors of [23] suggest taking $t_{0}$ so that $\bar{I}_{0}(t_{0})$ is at a minimum, and also suggest smoothing the estimated $I_{D}$, which we clarify in Section 6. The authors also discuss iterated methods of bias correction in their Section 5, and correcting biased estimators of the correlation.

Note also that we have modified the estimator in (15) to divide by $|B|$ rather than the observed number of points in the region, $N(B)$, as for example [23] do, as it is much preferable not to have a random denominator. We shall discuss the usage of $I_{0}(\cdot)$ versus $I_{D}(\cdot)$ further in Section 7, together with the implications of whether the process is truly isotropic or not.
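A sketch of the isotropic estimator (15) with the low-wavenumber fix (16), with the $|B|$ normalisation just described. To keep the example dependency-free, $\mathcal{J}_{0}$ is evaluated by quadrature via $\mathcal{J}_{0}(x)=\frac{1}{\pi}\int_{0}^{\pi}\cos(x\sin t)\,dt$; all names, parameters, and the wavenumber grid are our illustrative choices:

```python
import numpy as np

# Isotropic Bartlett estimator (Eqn 15) plus the t0 correction of Eqn (16).
def j0(x, n=200):
    """Bessel J0 by midpoint quadrature of (1/pi) int_0^pi cos(x sin t) dt."""
    t = (np.arange(n) + 0.5) * np.pi / n
    return np.cos(np.outer(np.atleast_1d(x), np.sin(t))).mean(axis=1)

def bar_I0(pts, knorms, area):
    iu = np.triu_indices(len(pts), 1)
    r = np.hypot(pts[iu[0], 0] - pts[iu[1], 0],
                 pts[iu[0], 1] - pts[iu[1], 1])   # distinct pair distances
    lamhat = len(pts) / area
    # factor 2: the sum in (15) runs over ordered pairs
    return lamhat + np.array([2.0 * j0(2 * np.pi * kk * r).sum() / area
                              for kk in knorms])

rng = np.random.default_rng(11)
s, lam = 5.0, 4.0
pts = rng.uniform(0, s, size=(rng.poisson(lam * s * s), 2))
ks = np.linspace(0.05, 3.0, 25)
I = bar_I0(pts, ks, s * s)
t0 = ks[np.argmin(I)]                 # t0 at the minimum, as suggested by [23]
I_D = np.where(ks <= t0, I.min(), I)  # Eqn (16)
print(t0, I_D[:3])
```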

As can be surmised from (14), the expectation of the general estimator ${\mathbf{E}}\{I_{h}({k})\}$ is a convolution between the Fourier transform of the observation window $B$ and the true spectrum, and if $h(z)$ does not go to zero nicely at the boundary of $B$ then the periodogram becomes quite complicated in terms of its expectation. Inspection of (14) also raises a second problem, namely that there is a constant contribution $|B|^{-1}|X\cap B|$ which does not give any wavenumber-specific information, and is correlated between wavenumbers. This term is new relative to time series and random fields, if not new to the expectation of the periodogram of randomly sampled stochastic processes, see [50, 11].

Should we choose to taper isotropically in 2D, then we instead use an isotropic taper $h_{I}(\cdot)$:

\bar{I}_{h}(\|{k}\|):=\hat{\lambda}+\sum_{x,y\in X\cap B}^{\neq}h_{I}\left(\|x-y\|\right)\mathcal{J}_{0}\left(2\pi\|{k}\|\cdot\|x-y\|\right). (17)

Note that such a taper cannot necessarily be constructed. Say $J_{h}(\cdot)$ with a compactly supported isotropic taper were constructed on the largest spherical domain; what would that imply for $I_{h}(\cdot)$? Radial tapers can be determined as described in [65], whether in 2D continuous space or discrete space. Equation (17) can of course be extended into higher dimensions. In general, in dimension $d=1,2,3,4,\dots$ we would have

I¯h(k):=λ^+x,yXBhI(xy)𝒥d/21(2πkxy)xy(d/21).\bar{I}_{h}(\|{k}\|):=\hat{\lambda}+\sum_{x,y\in X\cap B}^{\neq}h_{I}\left(\|x-y\|\right)\mathcal{J}_{d/2-1}\left(2\pi\|{k}\|\|x-y\|\right)\|x-y\|^{-(d/2-1)}. (18)
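As a concrete illustration, the d=2d=2 case (17) can be sketched numerically. In the sketch below the triangular radial taper and its normalisation are placeholder assumptions (the text leaves the choice of radial taper to [65]), and the Bessel function 𝒥0\mathcal{J}_{0} is evaluated by simple quadrature of its integral representation so that only numpy is needed:

```python
import numpy as np

def bessel_j0(z, n=400):
    """J_0(z) via its integral representation, by midpoint quadrature."""
    theta = (np.arange(n) + 0.5) * np.pi / n
    return np.mean(np.cos(np.asarray(z, float)[..., None] * np.sin(theta)), axis=-1)

def iso_periodogram(pts, area, knorm, h_iso):
    """Sketch of (17): lambda_hat + sum over ordered pairs x != y of
    h_I(||x - y||) J_0(2 pi ||k|| ||x - y||)."""
    lam_hat = len(pts) / area
    iu = np.triu_indices(len(pts), 1)
    r = np.linalg.norm(pts[iu[0]] - pts[iu[1]], axis=1)  # pairwise distances
    # each unordered pair appears twice in the sum over x != y:
    return lam_hat + 2.0 * np.sum(h_iso(r) * bessel_j0(2 * np.pi * knorm * r))

# demo on uniform points with a placeholder triangular radial taper
rng = np.random.default_rng(0)
pts = rng.uniform(-5, 5, size=(60, 2))
h_iso = lambda r: np.clip(1 - r / 5.0, 0, 1) / 100.0
print(iso_periodogram(pts, 100.0, 0.1, h_iso))
```

The quadratic cost in the number of points is apparent here: the estimator is a sum over all ordered pairs of observed points.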

We can also use more than one taper: we take a sequence of orthonormal functions {hm}\{h_{{m}}\} [58] and use them to form a new estimator. We assume the tapers are all of unit energy and pairwise orthogonal, that is hm2=1||h_{{m}}||_{2}=1 and

dhm(z)hm(z)𝑑z=δmm.\int_{{\mathbb{R}}^{d}}h_{{m}}(z)h_{{m^{\prime}}}(z)\,dz=\delta_{{{m}}{{m^{\prime}}}}. (19)

We shall study the properties of direct spectral estimators of the form of Eqn (14); this will help us to characterise the second order properties of the process XX. Most of time series analysis is based on regularly spaced discrete sampling, and so most tapers are designed for that scenario. We choose to use continuous tapers, rather than interpolating the points to a regular grid. This leaves as possible tapers the spheroidal wavefunctions (continuous but hard to compute), as well as the cosine tapers of [60]. These correspond to separable taper choices; non-separable choices will be discussed in Section 7.
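As a concrete example of a continuous taper family satisfying (19), the sketch below uses the sine tapers (the family behind the "sine-tapers" of Figure 3; choosing them here is our assumption), and verifies unit energy and orthogonality by a Riemann sum:

```python
import numpy as np

def sine_taper(m, l):
    """m-th continuous sine taper on [-l/2, l/2], a unit-energy family."""
    return lambda x: np.sqrt(2.0 / l) * np.sin(np.pi * m * (x / l + 0.5))

l = 10.0
x = np.linspace(-l / 2, l / 2, 20001)
dx = x[1] - x[0]
h1, h2 = sine_taper(1, l)(x), sine_taper(2, l)(x)
print(np.sum(h1 * h1) * dx)  # ~1 : unit energy ||h_m||_2 = 1
print(np.sum(h1 * h2) * dx)  # ~0 : orthogonality as in (19)
```

No gridding of the point pattern is needed: each taper is a function that can be evaluated at the observed (continuous) point locations.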

4 Distributional Properties of Bilinear Spectral Estimators

In this Section we derive the asymptotic marginal distribution of the tapered periodogram, and its asymptotic expectation. We shall start from the simplest spectral estimator, namely the Bartlett periodogram as specified by (12) and calculate its properties. We also define the transfer function corresponding to a taper hh to be

H=[h],H0=[h0].H=\mathcal{F}[h],\quad H_{0}=\mathcal{F}[h_{0}]. (20)

We shall now be quite concrete, considering some common cases of spatial data and their sampling, and choosing Cartesian product domains as our first special cases. We shall focus on the cuboid sampling domain:

B(𝒍)=[l1/2,l1/2]××[ld/2,ld/2].{B}_{\square}(\bm{l})=[-{l_{1}}/{2},{l_{1}}/{2}]\times\dots\times[-{l_{d}}/{2},{l_{d}}/{2}].
Lemma 4.1.

Assume that XX is a homogeneous point process with intensity λ\lambda and twice differentiable spectrum f(k)f({k}) at all values of k0{k}\neq 0. Assume XX is observed in a cuboid domain B(𝐥)B_{\square}(\bm{l}) with centroid at 0, which is growing in every dimension, that is minjlj\min_{j}l_{j}\rightarrow\infty. Then the expected value of the periodogram I0(k)I_{0}({k}) satisfies the equation

𝐄I0(k)=f(k)+|B|1λ2T(B,k)+o(1),k0,{\mathbf{E}}I_{0}({k})={f}({k})+|B|^{-1}\lambda^{2}T(B,{k})+o(1),\quad{k}\neq 0, (21)

for

T(B,k)=j=1dsin2(πkjlj)(πkj)2.T(B,{k})=\prod_{j=1}^{d}\frac{\sin^{2}(\pi{k}_{j}l_{j})}{(\pi{k}_{j})^{2}}.
Proof.

See Appendix A. ∎
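The bias kernel T(B,k)T(B,{k}) of Lemma 4.1 is easy to evaluate directly. The sketch below (function name is ours) confirms that it vanishes at wavenumbers of the form kj=pj/ljk_{j}=p_{j}/l_{j} with integer pj0p_{j}\neq 0, which anticipates the Fourier grid of Definition 4.1:

```python
import numpy as np

def T_box(k, l):
    """T(B, k) = prod_j sin^2(pi k_j l_j) / (pi k_j)^2, taking the
    limiting value l_j^2 of each factor at k_j = 0."""
    k = np.atleast_1d(np.asarray(k, float))
    l = np.atleast_1d(np.asarray(l, float))
    with np.errstate(divide="ignore", invalid="ignore"):
        vals = np.where(k == 0.0, l ** 2,
                        np.sin(np.pi * k * l) ** 2 / (np.pi * k) ** 2)
    return float(np.prod(vals))

l = [10.0, 20.0]
print(T_box([1 / 10, 3 / 20], l))   # ~0: vanishes when k_j = p_j / l_j, p_j != 0
print(T_box([0.12, 0.07], l))       # strictly positive off that grid
```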

This is not what we would expect as a large sample expectation of the spectrum, given our experience with time series and random fields: the term |B|1λ2T(B,k)|B|^{-1}\lambda^{2}T(B,{k}) has been added. To understand where this term comes from, we need to understand the DFT Jh(k)J_{h}({k}) better.

We note that in general h(x)e2πikxh(x)e^{-2\pi i{k}\cdot x} is identically distributed but not independent across points for many choices of point process XX, making the choice of a Central Limit Theorem (CLT) a bit more complex. For a large class of processes one such CLT is given by [7]. If we compare the quantity in (13) with [7], then we see that q=1q=1 and p=1p=1 in their notation; we do not have to worry about e2πikxe^{-2\pi i{k}\cdot x} being a function of several of the points, and this is why q=1q=1. Appealing to Theorem 1 of [7], we can deduce that as |B||B| diverges Jh(k)J_{h}({k}) is asymptotically Gaussian. We recover the second moments by direct calculation:

Proposition 4.1.

Assume that XX is a stationary point process with intensity λ\lambda and with spectrum f(k)f({k}), and that hh is a unit energy taper supported only in the domain B(𝐥)B_{\square}(\bm{l}). Then the first two moments of the tapered DFT Jh(k)J_{h}({k}) are given by:

𝐄{Jh(k)}\displaystyle{\mathbf{E}}\left\{J_{h}({k})\right\} =𝐄xXBh(x)e2πikx\displaystyle={\mathbf{E}}\sum_{x\in X\cap B}h(x)e^{-2\pi i{k}\cdot x} (22)
=λdh(x)e2πikx𝑑x=λH(k),kd\displaystyle=\lambda\int_{\mathbb{R}^{d}}h(x)e^{-2\pi i{k}\cdot x}\;dx=\lambda H({k}),\quad{k}\in{\mathbb{R}^{d}}
𝐕𝐚𝐫{Jh(k)}\displaystyle{\mathrm{\bf Var}}\left\{J_{h}({k})\right\} =λ+λ2d|H(kk)|2f~(k)𝑑k+λ2|H(k)|2λ2|H(k)|2\displaystyle=\lambda+\lambda^{2}\int_{\mathbb{R}^{d}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\;d{k}^{\prime}+\lambda^{2}\left|H({k})\right|^{2}-\lambda^{2}|H({k})|^{2}
=λ+λ2d|H(kk)|2f~(k)𝑑k\displaystyle=\lambda+\lambda^{2}\int_{\mathbb{R}^{d}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\;d{k}^{\prime} (23)
Rel{Jh(k)}\displaystyle{\mathrm{Rel}}\left\{J_{h}({k})\right\} =λdH(k2k)H(k)𝑑k+dU(k,z)e2πikz{ρ(z)λ2}𝑑z,\displaystyle=\lambda\int_{\mathbb{R}^{d}}H({k}^{\prime}-2{k})H({k}^{\prime})d{k}^{\prime}+\int_{\mathbb{R}^{d}}U({k},z)e^{-2\pi i{k}\cdot z}\{\rho(z)-\lambda^{2}\}dz, (24)

with U(k,z)=h(x)h(z+x)e2πik(2x)𝑑xU({k},z)=\int h(x)h(z+x)e^{-2\pi i{k}\cdot(2x)}\;dx.

Proof.

See Appendix B and notice the definition of (20). ∎

Why do we bother with determining the first two moments of the DFT? The spectrum characterises any stationary/homogeneous process and so is a key summary to estimate [8]. Estimation will be linear in the periodogram, and once Jh(k)J_{h}({k}) is approximately Gaussian, the distribution of the periodogram is determined from these moments, using theory developed for quadratic forms in normal variates [42].

In the above Proposition the term Rel{}{\mathrm{Rel}}\left\{\cdot\right\} denotes “relation”, also known as the “complementary covariance”, see e.g. [54]. With these moments, and the result of [7], we deduce the following corollary.

Corollary 4.1.

Assume XX satisfies the constraints of [7], then the DFT satisfies:

Jh(k)\displaystyle J_{h}({k}) 𝐿NC(λH(k),f(k),r(k)),\displaystyle\overset{L}{\rightarrow}N_{C}\left(\lambda H({k}),f({k}),r({k})\right), (25)
r(k)\displaystyle r({k}) =λdH(k2k)H(k)𝑑k+dU(k,z)e2πikz{ρ(z)λ2}𝑑z,U(k,z)=h(x)h(z+x)e2πik(2x)𝑑x.\displaystyle=\lambda\int_{\mathbb{R}^{d}}H({k}^{\prime}-2{k})H({k}^{\prime})d{k}^{\prime}+\int_{\mathbb{R}^{d}}U({k},z)e^{-2\pi i{k}\cdot z}\{\rho(z)-\lambda^{2}\}dz,\;U({k},z)=\int h(x)h(z+x)e^{-2\pi i{k}\cdot(2x)}\;dx.

We can view Jh(k)J_{h}({k}) as a complex-valued scalar or a real-valued two-vector. NC(μ,Σ,C)N_{C}(\mu,\Sigma,C) is the general complex normal and its arguments are its mean μ\mu, its covariance Σ\Sigma, and its relation or complementary covariance CC. It should be contrasted with the proper complex normal NC(μ,Σ),N_{C}(\mu,\Sigma), which has zero relation.

What can we then say about Ih(k)I_{h}({k})? Using the continuous mapping theorem [22] we can deduce from (25) that if we consider only arguments k{k} so that the complementary covariance is negligible then

|Jh(k)|2𝐿f(k)2χ22(λ2|H(k)|2f(k)),|J_{h}({k})|^{2}\overset{L}{\rightarrow}\frac{f({k})}{2}\chi^{2}_{2}\left(\frac{\lambda^{2}|H({k})|^{2}}{f({k})}\right), (26)

where χν2(μ2)\chi^{2}_{\nu}(\mu^{2}) denotes the non-central chi-squared distribution with ν\nu degrees of freedom and non-centrality parameter μ2\mu^{2}. We give the form of H(k)H({k}) in (20). We can also use the uniform integrability of Jh(k)J_{h}({k}) to get the asymptotic moments of I0(k)I_{0}({k}) from these limits. In fact, the expectation of any member of this family of tapered estimates takes the form:

𝐄{Ih(k)}=d|H(kk)|2f(k)𝑑k+λ2|H(k)|2.{\mathbf{E}}\left\{I_{h}({k})\right\}=\int_{{\mathbb{R}^{d}}}|H({k}^{\prime}-{k})|^{2}f({k}^{\prime})d{k}^{\prime}+\lambda^{2}|H({k})|^{2}. (27)

This can be derived straightforwardly by direct computation from (14) by taking expectations with Campbell’s formula and using the convolution theorem. We see immediately the bias of this estimator, namely the λ2|H(k)|2\lambda^{2}|H({k})|^{2} term. To remove the non-centrality bias, we define the bias-corrected periodogram from the de-biased discrete Fourier transform

I~h(k)=|J~h(k)|2.\widetilde{I}_{h}({k})=\left|\widetilde{J}_{h}({k})\right|^{2}. (28)

Note also the discussion in [20, p. 292], where the theory of signed measures is used to make the equivalent definition, albeit with additional mathematical sophistication. Therefore Jh(k)λH(k)=J~h(k)J_{h}({k})-\lambda H({k})=\widetilde{J}_{h}({k}) can be called the signed measure. The quantity J~h(k)\widetilde{J}_{h}({k}) is the DFT of the process X0X^{0} dual to the mean-corrected random measure N0(dx)=N(dx)λdxN^{0}(dx)=N(dx)-\lambda dx, where NN is the dual counting measure of the point process XX. As we have subtracted λH(k)\lambda H({k}) from the discrete Fourier transform, the underlying measure is no longer positive, but once we take the modulus square we are guaranteed to arrive at a real-valued non-negative quantity.
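Expression (27) can be checked by simulation in the simplest case. For a homogeneous Poisson process the spectrum is flat, f(k)=λf({k})=\lambda, and at a nonzero wavenumber of the form kj=pj/ljk_{j}=p_{j}/l_{j} the untapered transfer function vanishes, so (27) predicts 𝐄I0(k)=λ{\mathbf{E}}I_{0}({k})=\lambda. A minimal Monte Carlo sketch (intensity, domain size and replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
l, lam, reps = 10.0, 5.0, 400
area = l * l
k = np.array([2 / l, 3 / l])        # nonzero wavenumber on the Fourier grid
vals = []
for _ in range(reps):
    n = rng.poisson(lam * area)     # homogeneous Poisson on the box
    pts = rng.uniform(-l / 2, l / 2, size=(n, 2))
    J = np.sum(np.exp(-2j * np.pi * pts @ k)) / np.sqrt(area)   # J_0(k) of (29)
    vals.append(abs(J) ** 2)
# f(k) = lam for Poisson and H_0(k) = 0 at this k, so (27) predicts E I_0(k) = lam:
print(np.mean(vals))                # close to lam = 5
```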

There is one wavenumber which is problematic, namely k=0{k}=0. For the periodogram we have h0(x)=𝟏B(x)/|B|h_{0}(x)=\mathbf{1}_{B}(x)/\sqrt{|B|}. Thus we have

J0(k)=1|B|xXBe2πikx,J0(0)=1|B|N(B)=|B|λ^.J_{0}({k})=\frac{1}{\sqrt{|B|}}\sum_{x\in X\cap B}e^{-2\pi i{k}\cdot x},\quad J_{0}(0)=\frac{1}{\sqrt{|B|}}N(B)=\sqrt{|B|}\widehat{\lambda}. (29)

Subtracting this contribution removes the problem of a non-zero mean of the DFT, as long as we assume that we know the intensity λ\lambda, and so are in a position to remove this effect. Assuming knowledge of this quantity is not a major hurdle, as it can be estimated consistently for large regions (|B||B|\rightarrow\infty) by counting the number of points and dividing by the area. For completeness we also define the signed measure DFT as

J~h(k)=Jh(k)λ^H(k).\widetilde{J}_{h}({k})=J_{h}({k})-\hat{\lambda}H({k}). (30)

We note directly from (25), that

J~h(k)NC{0,f(k),r(k)}.\widetilde{J}_{h}({k})\rightarrow N_{C}\{0,f({k}),r({k})\}. (31)

Also it follows

J~0(0)=|B|λ^λ^H0(0)=|B|λ^|B|λ^=0,\widetilde{J}_{0}(0)=\sqrt{|B|}\widehat{\lambda}-\widehat{\lambda}H_{0}(0)=\sqrt{|B|}\widehat{\lambda}-\sqrt{|B|}\widehat{\lambda}=0, (32)

trivially. Thus we cannot estimate the DFT at wavenumber zero. For a time series, when calculating DFTs we subtract off the sample mean, and then the mean-corrected DFT is likewise zero at wavenumber zero; in that setting the periodogram at wavenumber zero is often not plotted.
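The identity (32) can be verified numerically for the untapered case; a sketch (function names are ours), with H0H_{0} the transfer function of h0=𝟏B/|B|h_{0}=\mathbf{1}_{B}/\sqrt{|B|}:

```python
import numpy as np

rng = np.random.default_rng(1)
l = np.array([10.0, 10.0])
area = float(np.prod(l))
pts = rng.uniform(-l / 2, l / 2, size=(rng.poisson(2.0 * area), 2))

def J0_dft(k):
    """Untapered DFT of (29)."""
    return np.sum(np.exp(-2j * np.pi * pts @ k)) / np.sqrt(area)

def H0(k):
    """Transfer function of h_0(x) = 1_B(x) / sqrt(|B|) on the box."""
    with np.errstate(divide="ignore", invalid="ignore"):
        f = np.where(k == 0.0, l, np.sin(np.pi * k * l) / (np.pi * k))
    return float(np.prod(f)) / np.sqrt(area)

lam_hat = len(pts) / area
k0 = np.zeros(2)
# (32): the signed measure DFT (30) vanishes identically at k = 0
print(abs(J0_dft(k0) - lam_hat * H0(k0)))   # ~0 (machine precision)
```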

We note that a second way to do bias correction is by means of

I˘(k;h)=x,yXh(x)h(y)e2πik(xy)λ2^|H(k)|2.\breve{I}({k};h)=\sum_{x,y\in X}h(x)h(y)e^{-2\pi i{k}\cdot(x-y)}-\widehat{\lambda^{2}}\left|H({k})\right|^{2}. (33)

The advantage of the first estimator in (28) is that it is naturally non-negative and removes the error before we take the modulus square. This should also have a positive impact on the variance. In addition, whereas λ^\hat{\lambda} is an unbiased estimator of λ\lambda, λ2^\widehat{\lambda^{2}} can be a biased estimator of λ2\lambda^{2}.

For additional generality, we could consider a more sophisticated estimator than (33) or (28) see e.g. [14, Ch. 6], and define

Ig(k)=xXByXBg(x,y)e2πik(xy)λ2^G(k,k),I_{g}({k})=\sum_{x\in X\cap B}\sum_{y\in X\cap B}g(x,y)e^{-2\pi i{k}\cdot\left(x-y\right)}-\widehat{\lambda^{2}}G({k},{k}), (34)

where g(x,y)g(x,y) is a kernel, possibly of specified support, that has to be selected, and G(k,k)=xyg(x,y)e2πik(xy)G({k},{k})=\sum_{x}\sum_{y}g(x,y)e^{-2\pi i{k}\cdot\left(x-y\right)}. This estimator is not positive by design. For example we could define g(x,y)=h(x)h(y)g(x,y)=h(x)h^{\ast}(y) with h(x)h(x) a multi-dimensional data taper [36], or we could make the non-separable choice g(x,y)=h(xy)g(x,y)=h(\|x-y\|). Depending on the choice of g(x,y)g(x,y), the quantity Ig(k)I_{g}({k}) may satisfy a number of desirable characteristics such as positivity, asymptotic unbiasedness and computational efficiency.

Looking at the debiased periodogram, we can now determine further properties of Ih(k)I_{h}({k}), or properties of I~h(k)\widetilde{I}_{h}({k}). To produce an estimator that is smoothed we need to study the covariance and variance of I~h(k1)\widetilde{I}_{h}({k}_{1}) and I~h(k2)\widetilde{I}_{h}({k}_{2}). We first look at 𝐄{I~h(k)}{\mathbf{E}}\left\{\widetilde{I}_{h}({k})\right\}. We find that its form for large spatial regions is specified by the following lemma.

Lemma 4.2.

Assume that XX is a homogeneous point process with a spectral density f(k)f({k}). Then the bias-corrected tapered periodogram has a first moment given by:

𝐄{I~h(k)}\displaystyle{\mathbf{E}}\left\{\widetilde{I}_{h}({k})\right\} =d|H(kk)|2f(k)𝑑k.\displaystyle=\int_{{\mathbb{R}^{d}}}|H({k}^{\prime}-{k})|^{2}f({k}^{\prime})d{k}^{\prime}.
Proof.

See Appendix C. ∎

Thus as long as H(k)H({k}) is getting more concentrated in wavenumber (e.g. H(k)δ(k)H({k})\rightarrow\delta({k})), this is asymptotically in |B||B| an unbiased estimator of f(k)f({k}). Using the signed measure DFT of (30) to study XX is more convenient, as this lets us avoid the contribution that affects the low wavenumbers.

Let us write down the grid on which we obtain large sample unbiasedness, customarily called the Fourier grid; this is useful when the observational domain BB is a box.

Definition 4.1.

The Fourier wavenumber grid for a point process observed on a cuboid domain B(𝐥)B_{\square}(\bm{l}) corresponds to the points

𝒦(𝒍):=𝒦(B(𝒍))={kn:kn=(pn1/l1,,pnd/ld),pnj}.{\cal K}(\bm{l}):={\cal K}(B_{\square}(\bm{l}))=\{{k}_{n}\;:\;{k}_{n}=\begin{pmatrix}p_{n1}/l_{1},\dots,p_{nd}/l_{d}\end{pmatrix},\quad p_{nj}\in{\mathbb{Z}}\}. (35)

Note that the physical units of the wavenumber elements are inverse length, such as m1m^{-1}. Referring to Lemma 4.1 we see that the taper-h0h_{0} related bias term T(B,k)T(B,{k}) vanishes on this grid, not only removing any asymptotic bias at low wavenumbers, but also giving us a sampling of the wavenumbers that approximately leads to independent periodogram ordinates, rather like in the random field case, see Corollary 5.1.
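The grid of Definition 4.1 is straightforward to generate; a sketch for a truncated version, where the cutoff |pj|pmax|p_{j}|\leq p_{\max} is a practical choice not part of the definition:

```python
import numpy as np
from itertools import product

def fourier_grid(l, pmax):
    """Truncated Fourier wavenumber grid (35): k_n = (p_1/l_1, ..., p_d/l_d),
    with each integer p_j ranging over {-pmax, ..., pmax}."""
    axes = [np.arange(-pmax, pmax + 1) / lj for lj in l]
    return np.array([k for k in product(*axes)])

K = fourier_grid([10.0, 20.0], 2)
print(K.shape)            # (25, 2): (2 * pmax + 1)^d wavenumbers
print(K.min(), K.max())   # -0.2 to 0.2 per unit length
```

Note that the grid spacing 1/lj1/l_{j} shrinks as the observation window grows, so larger windows resolve finer wavenumber detail.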

These results establish the wavenumber grid at which we should evaluate a standard spectral estimator, in analogy to the Fourier frequencies [58, p. 197–198] of time series. Their basic importance follows because we can expect the Fourier transform to be asymptotically Gaussian, so that uncorrelatedness implies independence.

A second feature of time series is the notion of the Nyquist wavenumber [58, p. 98]. This does not exist for point processes. It may seem counterintuitive that there is no upper limit to the wavenumbers we can estimate. When analysing a process that has been sampled on a grid, such as a random field, we expect to see aliasing: variation that is happening very rapidly is confused with slower cycles, because rapid variation cannot be resolved from sparse samples. For the random locations of a point process the pairwise distances can take any real value, so the Nyquist wavenumber does not (in a sense) exist.

Theorem 4.1.

Large–Domain Expectation of the tapered periodogram. Assume that XX is a stationary point process with intensity λ\lambda and twice differentiable spectrum f(k)f({k}) at all values of k0{k}\neq 0, and that hh is a unit energy taper (i.e. h2=1\|h\|_{2}=1) supported only in the cuboid domain B(𝐥)B_{\square}(\bm{l}). Assume XX is observed in B(𝐥)B_{\square}(\bm{l}), which is growing in every dimension, that is minlj\min l_{j}\rightarrow\infty. Then the expected value of the tapered periodogram Ih(k)I_{h}({k}) satisfies the equation

𝐄Ih(k)=f(k)+λ2|H(k)|2+o(1),k0,{\mathbf{E}}I_{h}({k})=f({k})+\lambda^{2}|H({k})|^{2}+o(1),\quad{k}\neq 0, (36)

where H(k)H({k}) is the Fourier transform of h(x)h(x), as defined by (20).

Proof.

See Appendix E. ∎

This theorem establishes that the tapered periodogram is a biased estimator of the spectrum of the point pattern. Previous authors proposed ad hoc corrections [23] to remove the bias. Our result will suggest a method to arrive at a positive spectral density estimator that is not biased. This is a basic and important result, equivalent to [8, Section V]. In fact, such asymptotics are well understood both for time series and random fields; our previous understanding of asymptotics for spectral estimation of point processes hearkens back to [20] or [23]. In the latter reference the authors informally refer back to the chemistry literature of Hansen and McDonald, discussing the properties of gases.

Corollary 4.2.

Bias–corrected periodogram. Assume that XX is a stationary point process with intensity λ\lambda and twice differentiable spectrum f(k)f({k}) at all values of k0{k}\neq 0, and that hh is a unit energy taper supported only in the cuboid domain B(𝐥)B_{\square}(\bm{l}). Assume XX is observed in B(𝐥)B_{\square}(\bm{l}), which is growing in every dimension. Then the expected value of the bias-corrected tapered periodogram I~h(k)\widetilde{I}_{h}({k}) satisfies the equation

𝐄I~h(k)=f(k)+o(1),k0.{\mathbf{E}}\widetilde{I}_{h}({k})=f({k})+o(1),\quad{k}\neq 0. (37)
Proof.

See Appendix D. ∎

This corollary then informs us how to do estimation, by removing the bias. It shows the importance of removing the spectral bias before squaring – otherwise it becomes hard to isolate and remove the effect at zero wavenumber, which is exacerbated at higher intensities (growing λ\lambda), as is clear from (36). Note that the estimator I~h(k)\widetilde{I}_{h}({k}) is debiased relative to 𝐄{Ih(k)}{\mathbf{E}}\left\{I_{h}({k})\right\}, but is still guaranteed to be non-negative. This can be compared to removing μ^\widehat{\mu} from a time series before calculating the periodogram.

Finally, to employ smoothing for purposes of variance reduction, we will extend the tapered estimator to include multiple tapers. Define M1{M}\geq 1 estimates of the spectrum via

Imt(k)=x,yXBhm(x)ei2πkxhm(y)ei2πky,m=1,,M,I_{{{{m}}}}^{t}({k})=\sum_{x,y\in X\cap B}h_{{{m}}}(x)e^{-i2\pi{k}\cdot x}h^{\ast}_{{{m}}}(y)e^{i2\pi{k}\cdot y},\quad{{m}}=1,\dots,{M}, (38)

which is not bias-corrected but where we assume

dhm(x)hm(x)𝑑x=δmm,\int_{{\mathbb{R}^{d}}}h_{{{m}}}(x)h^{\ast}_{{m^{\prime}}}(x)dx=\delta_{{{m}}{{m^{\prime}}}},

where δmm=1\delta_{{{m}}{{m^{\prime}}}}=1 if m=m{{m}}={{m^{\prime}}} and 0 otherwise. Of course (38) still suffers from low-wavenumber bias. To define a debiased estimator we take

I~mt(k)=(xXBhm(x)ei2πkxλHm(k))(yXBhm(y)ei2πkyλHm(k)),m=1,,M.\widetilde{I}_{{{{m}}}}^{t}({k})=\left(\sum_{x\in X\cap B}h_{{m}}(x)e^{-i2\pi{k}\cdot x}-\lambda H_{{m}}({k})\right)\left(\sum_{y\in X\cap B}h^{\ast}_{{m}}(y)e^{i2\pi{k}\cdot y}-\lambda H^{\ast}_{{m}}({k})\right),\quad{{m}}=1,\dots,{M}. (39)

We subsequently average these estimates over m{{m}} to reduce variance.

In (38) we have a product of hm(x)hm(y)h_{{m}}(x)h^{\ast}_{{m}}(y). This is an inevitable consequence of the bilinear form of the periodogram from the Fourier transform. The most natural way of making multidimensional and separable tapers would be for x=(x1xd)Tx=\begin{pmatrix}x_{1}&\dots&x_{d}\end{pmatrix}^{T} to define the component-wise product taper hm(x)=h~m1(x1)h~md(xd)h_{{m}}(x)=\tilde{h}_{{{m}}_{1}}(x_{1})\dots\tilde{h}_{{{m}}_{d}}(x_{d}). We will then need to re-enumerate m1,,md{{m}}_{1},\dots,{{m}}_{d} into a single index. For instance in 2D if we have three tapers in 1D, then we will end up with nine tapers in 2D, and m{{m}} will range from one to nine. These estimates can also be arrived at starting from (13) and Jh(k)J_{h}({k}). As we shall use M{M} tapers linearly, and M2{M}^{2} for the quadratic estimators, we shall use the subscript ‘0’ to refer to the untapered periodogram, and m=1,,M{{m}}=1,\dots,{M} to refer to the subsequent tapers.
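The product construction and re-enumeration described above can be sketched directly. Here the 1D family is again the sine tapers (as used for Figure 3; the function names are ours), and the multitaper estimate averages the MM tapered periodograms of (38):

```python
import numpy as np

def sine1d(m, l):
    """1D unit-energy sine taper on [-l/2, l/2]."""
    return lambda t: np.sqrt(2.0 / l) * np.sin(np.pi * m * (t / l + 0.5))

def product_taper(m1, m2, l1, l2):
    """Separable 2D taper h_m(x) = h_{m1}(x_1) h_{m2}(x_2)."""
    return lambda x: sine1d(m1, l1)(x[:, 0]) * sine1d(m2, l2)(x[:, 1])

l1 = l2 = 10.0
# three 1D tapers give nine 2D tapers, re-enumerated by a single index m
tapers = [product_taper(m1, m2, l1, l2) for m1 in (1, 2, 3) for m2 in (1, 2, 3)]
print(len(tapers))   # 9

def multitaper_periodogram(pts, k, tapers):
    """Average of the M tapered periodograms (38), not bias-corrected."""
    ests = [abs(np.sum(h(pts) * np.exp(-2j * np.pi * pts @ k))) ** 2
            for h in tapers]
    return np.mean(ests)

rng = np.random.default_rng(3)
pts = rng.uniform(-5, 5, size=(200, 2))
print(multitaper_periodogram(pts, np.array([0.2, 0.1]), tapers))
```

The debiased version (39) would subtract λHm(k)\lambda H_{{m}}({k}) from each tapered DFT before taking the modulus square.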

Tapering is very common for stochastic processes. Initially the idea was hotly contested, but its utility is now firmly established both for time series and random fields [18, 76]. The idea of tapering is to ameliorate edge effects that leads to the asymptotically leading bias term. This is important in 1D [70, 76] but of greater importance in 2D and higher [18]. In 2D and higher the edge effects become more pronounced, and indeed asymptotically dominant [47, 53]. We also use multitapering to stabilise the variance of any estimator, as described for time series in [76]. We shall describe the necessary steps to perform linear estimation of the spectrum of a 2D and higher dimension point process in Section 6.

Let us set theory aside for a moment and consider some spectral estimates for archetypal point patterns. Figure 3 demonstrates the debiased periodogram, the debiased multitapered periodogram with sine-tapers, and a rotationally averaged 1D summary of the periodogram (defined in Sections 6&7) for four point patterns exhibiting different structural behaviour. The aspect ratio of the figure panels is kept the same only to have an orderly figure: due to debiasing we can estimate the periodogram on wavenumbers of our own choosing, and not just on the Fourier grid, which in these examples would be different for different tapers. The wavenumber grid in this illustration is a regular grid with a step size of 0.005 in both dimensions.

We see that information is present at wavenumbers up to k0.2\|{k}\|\approx 0.2. The tapering smooths the periodogram, as expected. There is a dark well near the origin for the regular pattern, and a bright hump for the clustered pattern, and both features transfer to the rotationally averaged curves (compare with Figure 2). The anisotropy of the fourth pattern is hinted at by the anisotropy visible in the 2D periodograms as an elongation of the “hump” in the second dimension. The elongation is perpendicular to the elongation of the clusters in the data because wavenumbers relate inversely to spatial units. The rotationally averaged periodogram was also computed from the (not shown) non-debiased periodogram for comparison. There is a very prominent bias near k=0\|{k}\|=0 for the non-debiased version, which illustrates why the proposed debiasing step is relevant when applying the method in practice.

Figure 3: Top row: Four example data patterns in a rectangle of area 2e4 units, exhibiting regularity, complete randomness, small scale clustering, and clustering with a left-right axis directionality. Second row: Their debiased periodograms excluding k=0{k}=0. Third row: Their multitapered debiased periodograms, 3×33\times 3 sine-tapers. Bottom row: Their rotationally averaged (un-tapered) periodograms, including the case without debiasing. The 2D summaries use a k{k}-grid with a step size 0.0050.005 in both directions. The rotational averaging was estimated at k=0.0050,0.0075,,0.2500\|{k}\|=0.0050,0.0075,...,0.2500 using a box kernel having a radius 1.5 times the k{k}-grid step size 0.005.

5 Covariance of bilinear spectral estimators

In this Section we determine the covariance of the periodogram, in anticipation of smoothing it. Comparing the developments of the previous sections to classical spectral analysis, it would seem that we are in a good position to estimate the spectrum, especially as we can assume the spectral deviation function f~(k)\widetilde{f}({k}) is smooth. The smoothness of f~(k)\widetilde{f}({k}) follows from the decay of ρ~(z)\widetilde{\rho}(z): simple Fourier theory stipulates that the decay of ρ~(z)\widetilde{\rho}(z) yields the smoothness of f~(k)\widetilde{f}({k}). From a mean-square error perspective, to understand how to smooth the periodogram away from zero wavenumber we need to further study the variance and covariance of direct spectral estimators.

Our main concerns are: 1) Is the variance finite? 2) Can we find a grid of wavenumbers so that the estimated spectrum is uncorrelated at these points? To answer such questions we must study the variance and covariance of the DFT, and so determine the second order properties of spectral estimators. Core to our understanding of smoothing will be the variance and covariance of the tapered estimators, established in the following theorem. For brevity define

ϕk(x)=e2πikx,k,xd.\phi_{k}(x)=e^{-2\pi i{k}\cdot x},\quad{k},x\in{\mathbb{R}}^{d}. (40)

With this notation we have

Imt(k)=x,yXBhm(x)hm(y)ϕk(xy),m=1,,M.I^{t}_{{m}}({k})=\sum_{x,y\in X\cap B}h_{{{m}}}(x)h_{{{m}}}^{\ast}(y)\phi_{k}(x-y),\quad{{m}}=1,\dots,M. (41)

Above we note that we have the same value of m{{m}} across xx and yy in (41) as this corresponds to a modulus square.

Theorem 5.1.

Let Imt(k)I^{t}_{{m}}({k}) denote the (tapered) periodogram given by  (38). The covariance between the (tapered) periodogram and itself across two wavenumbers k1,k2d{k}_{1},{k}_{2}\in{\mathbb{R}}^{d} then takes the form of

𝐂𝐨𝐯\displaystyle{\mathrm{\bf Cov}} {Imt(k1),Imt(k2)}=B4ρ(4)(x,y,z,v)hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(zv)𝑑x𝑑y𝑑v𝑑z\displaystyle\{I^{t}_{{m}}({k}_{1}),I^{t}_{{m^{\prime}}}({k}_{2})\}=\int_{B^{4}}\rho^{(4)}(x,y,z,v)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v)\;dxdydvdz
+B3ρ(3)(x,y,v)[hm(x)hm(y)hm(x)hm(v)ϕk1(xy){ϕk2(xv)+ϕk2(xv)}\displaystyle+\int_{B^{3}}\rho^{(3)}(x,y,v)\left[h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-v)+\phi_{-{k}_{2}}(x-v)\right\}\right.
+hm(x)hm(y)hm(y)hm(v)ϕk1(xy){ϕk2(yv)+ϕk2(yv)}\displaystyle\quad+h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(y-v)+\phi_{-{k}_{2}}(y-v)\right\}
+ϕk2(xy)hm2(v)hm(x)hm(y)+ϕk1(xy)hm2(v)hm(x)hm(y)]dxdydv\displaystyle\quad\left.+\phi_{{k}_{2}}(x-y)h_{{m}}^{2}(v)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+\phi_{{k}_{1}}(x-y)h_{{m^{\prime}}}^{2}(v)h_{{m}}(x)h_{{m}}(y)\right]\;dxdydv
+B2ρ(2)(x,y)[hm(x)hm(y)hm(x)hm(y)ϕk1(xy){ϕk2(xy)+ϕk2(xy)}\displaystyle+\int_{B^{2}}\rho^{(2)}(x,y)\left[h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-y)+\phi_{-{k}_{2}}(x-y)\right\}\right.
+{hm2(x)hm(x)hm(y)+hm2(x)hm(x)hm(y)}{ϕk1(xy)+ϕk2(xy)}+hm2(x)hm2(y)]dxdy\displaystyle\quad+\left\{h_{{m}}^{2}(x)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+h_{{m^{\prime}}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\right\}\left.\left\{\phi_{{k}_{1}}(x-y)+\phi_{{k}_{2}}(x-y)\right\}+h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(y)\right]\,dxdy
+λBhm2(x)hm2(x)𝑑xλ2λB2hm(x)hm(y)hm(x)hm(y)[ϕk1(xy)+ϕk2(xy)]ρ(2)(x,y)𝑑x𝑑y\displaystyle+\lambda\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\,dx-\lambda^{2}-\lambda\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\left[\phi_{{k}_{1}}(x-y)+\phi_{{k}_{2}}(x-y)\right]\rho^{(2)}(x,y)dxdy
{B2hm(x)hm(y)ϕk1(xy)ρ(2)(x,y)𝑑x𝑑y}{B2hm(x)hm(y)ϕk2(xy)ρ(2)(x,y)𝑑x𝑑y}.\displaystyle-\left\{\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\rho^{(2)}(x,y)dxdy\right\}\cdot\left\{\int_{B^{2}}h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\rho^{(2)}(x,y)dxdy\right\}.
Proof.

The proof is in Appendix F. ∎

This theorem derives the general expression for the covariance of the periodogram, and will help us determine how to do linear smoothing. The cross-correlation also relates to [8, Section V], and helps us understand when we can treat Fourier coefficients as uncorrelated (and, with asymptotic Gaussianity, as independent). Uncorrelatedness is important to the smoothing, as it ensures a variance reduction by averaging. Whilst the expression derived in Theorem 5.1 is general, requiring only homogeneity of the point process in space, it is hard to interpret, so let us study some special cases. Let us study this general correlation in the instance of the Poisson process, where we know that

ρ(n)(x1,,xn)=λn,\rho^{(n)}(x_{1},\dots,x_{n})=\lambda^{n}, (42)

and consider the case of k1=k2=k{k}_{1}={k}_{2}={k}.

For completeness we also define the spectral bandwidth bhb_{h} by

bh2\displaystyle b^{2}_{h} :=dk2|H(k)|2𝑑k.\displaystyle:=\int_{{\mathbb{R}}^{d}}\|{k}\|^{2}\left|H({k})\right|^{2}d{k}. (43)
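The bandwidth (43) can be computed in the spatial domain: with this Fourier convention [h](k)=2πikH(k)\mathcal{F}[h^{\prime}]({k})=2\pi i{k}H({k}), so Parseval's identity gives bh2=(2π)2|h(x)|2𝑑xb_{h}^{2}=(2\pi)^{-2}\int|h^{\prime}(x)|^{2}dx in 1D. A numerical sketch for the first sine taper, for which a direct calculation gives bh=1/(2l)b_{h}=1/(2l):

```python
import numpy as np

l = 10.0
x = np.linspace(-l / 2, l / 2, 200001)
h = np.sqrt(2.0 / l) * np.sin(np.pi * (x / l + 0.5))   # first 1D sine taper
hp = np.gradient(h, x)                                  # numerical h'(x)
# b_h^2 = int k^2 |H(k)|^2 dk = (2 pi)^{-2} int |h'(x)|^2 dx  (Parseval)
b2 = np.sum(hp ** 2) * (x[1] - x[0]) / (4 * np.pi ** 2)
print(np.sqrt(b2), 1 / (2 * l))   # both ~0.05, i.e. bandwidth shrinks as l grows
```

This makes explicit why the condition k>max{bhm,bhm}\|k\|>\max\{b_{h_{{m}}},b_{h_{{m^{\prime}}}}\} in the corollary below is mild for large domains: the bandwidth decays like 1/l1/l.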
Corollary 5.1 (Covariance of Spectral Estimates).

Let Imt(k)I^{t}_{{m}}({k}) denote the tapered periodogram given by  (38) for a Poisson process with intensity λ\lambda using a single taper m{{m}}, and assume that k>max{bhm,bhm}\|k\|>\max\{b_{h_{{{m}}}},b_{h_{{{m^{\prime}}}}}\}. We can determine the covariance between the periodogram using different tapers (m{{m}} and m{{m^{\prime}}}) or at different wavenumbers by the following:

𝐂𝐨𝐯\displaystyle{\mathrm{\bf Cov}} {Imt(k),Imt(k)}=λBhm2(x)hm2(x)𝑑x+λ2δmm+o(1).\displaystyle\{I^{t}_{{m}}({k}),I^{t}_{{m^{\prime}}}({k})\}=\lambda\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\,dx+\lambda^{2}\delta_{{{m}}{{m^{\prime}}}}+o(1).

Note that o(1)o(1) is in |B||B| diverging. The covariance between the (tapered) periodogram and itself at different wavenumbers (k1{k}_{1} and k2{k}_{2}) takes the form of

𝐂𝐨𝐯\displaystyle{\mathrm{\bf Cov}} {Imt(k1),Imt(k2)}=λhm44=o(1).\displaystyle\{I^{t}_{{m}}({k}_{1}),I^{t}_{{m}}({k}_{2})\}=\lambda\|h_{{m}}\|_{4}^{4}=o(1).
Proof.

The proof can be found in Appendices G and H. ∎

At this stage we have simplified our assumptions to the Poisson process, which is quite disappointing. The reason why we have chosen to do so is clear from Theorem 5.1: if we assume the process is Poisson then several terms cancel. However, if the Fourier transform is turning Gaussian as we have discussed in (25) in Section 4, with uniform integrability of Jh(k)J_{h}({k}) the following proposition holds.

Proposition 5.1.

Let Imt(k)I^{t}_{{m}}({k}) denote the tapered periodogram given by (38), and I~mt(k)\tilde{I}^{t}_{{m}}({k}) the tapered bias corrected periodogram. Assume XX satisfies the assumptions given for (25) and that f~(m)0<\|\widetilde{f}^{(m)}\|_{0}<\infty for m=2,3,4,5,6m=2,3,4,5,6. Then

𝐂𝐨𝐯{I~mt(k1),I~mt(k2)}=o(1)+|𝐄{J~m(k1)J~m(k2)}|2.\displaystyle{\mathrm{\bf Cov}}\{\widetilde{I}^{t}_{{m}}({k}_{1}),\widetilde{I}^{t}_{{m^{\prime}}}({k}_{2})\}=o(1)+\left|{\mathbf{E}}\{\widetilde{J}_{{m}}({k}_{1})\widetilde{J}^{\ast}_{{m^{\prime}}}({k}_{2})\}\right|^{2}.

To derive the form of the latter term we note

𝐄{J~m(k1)J~m(k2)}\displaystyle{\mathbf{E}}\{\widetilde{J}_{{m}}({k}_{1})\widetilde{J}^{\ast}_{{m^{\prime}}}({k}_{2})\} =𝐂𝐨𝐯{Jm(k1),Jm(k2)}=λδmm+λ2𝐂𝐨𝐯1{Jm(k1),Jm(k2)},\displaystyle={\mathrm{\bf Cov}}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\}=\lambda\delta_{{{m}}{{m^{\prime}}}}+\lambda^{2}{\mathrm{\bf Cov}}_{1}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\},

where we have f~(k)\tilde{f}({k}) given by (11) and so

𝐂𝐨𝐯1{Jm(k1),Jm(k2)}\displaystyle{\mathrm{\bf Cov}}_{1}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\} Rd×Rdρ~(xy)hm(x)e2πik1xhm(y)e2πik2y𝑑x𝑑y+o(1)\displaystyle\equiv\iint_{R^{d}\times R^{d}}\widetilde{\rho}(x-y)h_{{m}}(x)e^{-2\pi i{k}_{1}\cdot x}h^{\ast}_{{m^{\prime}}}(y)e^{2\pi i{k}_{2}\cdot y}\,dx\,dy+o(1)
=Rd×RdRdf~(k)e2πix(k1k)e2πiy(k2k)𝑑khm(x)hm(y)𝑑x𝑑y+o(1)\displaystyle=\iint_{R^{d}\times R^{d}}\int_{R^{d}}\widetilde{f}({k}^{\prime})e^{-2\pi ix\cdot({k}_{1}-{k}^{\prime})}e^{2\pi iy\cdot({k}_{2}-{k}^{\prime})}d{k}^{\prime}\cdot h_{{m}}(x)h^{\ast}_{{m^{\prime}}}(y)\,dx\,dy+o(1)
=Rdf~(k)Hm(k1k)Hm(k2k)𝑑k+o(1).\displaystyle=\int_{R^{d}}\widetilde{f}({k}^{\prime})H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})d{k}^{\prime}+o(1). (44)

We note that as Hm(k)H_{{m}}({k}) has concentrated support we will be able to apply similar arguments to those of Appendix D to determine nearly uncorrelated DFTs.

Proof.

See Appendix I. ∎

Definition 5.1.

The tapered Fourier wavenumber grid for a point process observed on a cuboid domain B(𝐥)B_{\square}(\bm{l}) corresponds to the points

𝒦M(B(𝒍))={kn:kn=(pn1τ(ϵ)/l1,,pndτ(ϵ)/ld),pnj},{\cal K}_{M}(B_{\square}(\bm{l}))=\{{k}_{n}\;:\;{k}_{n}=\begin{pmatrix}p_{n1}\tau(\epsilon)/l_{1},\dots,p_{nd}\tau(\epsilon)/l_{d}\end{pmatrix},\quad p_{nj}\in{\mathbb{Z}}\}, (45)

where τ(ϵ)\tau(\epsilon) is the smallest positive real number such that |Hm(τ(ϵ)k)Hm(k)𝑑k|<ϵ,\left|\int H_{{m}}(\tau(\epsilon)-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}^{\prime})\;d{k}^{\prime}\right|<\epsilon,\, for all m,m{1M}{{m}},{{m^{\prime}}}\in\{1\ldots{M}\}.

These results hint at a general methodology: for the Poisson process we can now see how to carry out non-parametric estimation. This is, however, not enough, and we need to determine the variance of the periodogram more generally. To go beyond Proposition 5.1 and avoid the Poisson assumption we refer back to (25). As $J_{h}(k)$ becomes Gaussian for larger sample sizes, its moments must also start to behave like those of the Gaussian $J_{h}(k)$.

Note that this result does not contradict Theorem 5.1 from a dimensional perspective, as that theorem concerns the modulus square of the Fourier transform. With these moments we can now consider the problem of estimation. We might feel an increasing degree of unease, as we have succumbed to the usual fallacy of only deriving the most general results for a Poisson process. However, as the observational domain becomes larger we expect $|H_{m}(k)|$ to concentrate, and so a non-constant spectrum will become less of an issue.

6 Linear Smoothing

Having already derived the first and second order properties of the periodogram, in this section we propose an accurate estimator of $f(k)$ formed by smoothing. If we fully characterise $f(k)$ then we have fully characterised $\gamma(z)$, as the Fourier transform is a bijection. Just as for a time series or random field, a key question concerns the correct method for quadratic characterisation and estimation.

We want to start from the random variables $\{I_{h}(k)\}$ or $\{\widetilde{I}_{h}(k)\}$ and directly from those quantities (non-parametrically) estimate the spectral density. As long as we seek to estimate $f(k)$ at points $k$ where the function $f(k)$ is continuous, averaging $\{I_{0}(k)\}$ (or $\{\widetilde{I}_{h}(k)\}$) locally seems like a sensible strategy. [23] proposed smoothing the 2D isotropic $\widetilde{I}_{D}(k)$ as defined in (16), and applied a weighted average smoother with user-tuned weights. This approach, however, does not determine how to sample wavenumbers, or how to remove correlation. We would additionally like to include the new form of bias removal before smoothing.

We have two clear options available to us: either smoothing the raw periodogram at the Fourier frequencies (de-biasing is not needed there, because we defined the grid so that the bias is zero), or using multitapering. We then have two possible estimators, namely the bias-corrected multitaper estimator of [76] and a multi-dimensional kernel density estimator [26]. Both are linear in the estimated spectrum. We shall use the periodogram rather than another estimator for the kernel density estimation, as it is easier to keep track of bias and correlation.

We therefore define the multitaper estimator [76] (and refer to [36, 51])

$$\overline{I}_{M}^{t}(k)=\frac{1}{M}\sum_{m=1}^{M}\widetilde{I}^{t}_{m}(k),\quad k\in{\mathbb{R}}^{d}, \tag{46}$$

where $\widetilde{I}^{t}_{m}(k)$ is defined in (38).
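A minimal numerical sketch of the multitaper estimator (46) might look as follows. We use sine tapers of the kind employed later in Section 8; the function names and the unit-energy normalisation are our own illustrative choices, and the debiasing term of the main text is omitted.

```python
import numpy as np

def sine_taper(m, x, l):
    """Orthogonal sine taper with index m = (m_1, ..., m_d) on the box
    prod_j [-l_j/2, l_j/2], normalised so that the integral of h_m^2 is 1."""
    m, x, l = np.atleast_1d(m), np.atleast_2d(x), np.atleast_1d(l)
    h = np.ones(x.shape[0])
    for j in range(x.shape[1]):
        h *= np.sqrt(2.0 / l[j]) * np.sin(np.pi * m[j] * (x[:, j] + l[j] / 2) / l[j])
    return h

def tapered_dft(x, k, h):
    """J_m(k) = sum over observed points x of h_m(x) exp(-2 pi i k . x)."""
    return np.sum(h * np.exp(-2j * np.pi * (x @ np.asarray(k))))

def multitaper_periodogram(x, k, l, M_tilde=3):
    """Average of M = M_tilde^2 tapered periodograms |J_m(k)|^2, as in eq. (46),
    without the debiasing correction discussed in the main text."""
    vals = []
    for m1 in range(1, M_tilde + 1):
        for m2 in range(1, M_tilde + 1):
            h = sine_taper((m1, m2), x, l)
            vals.append(np.abs(tapered_dft(x, k, h)) ** 2)
    return np.mean(vals)
```

As the text notes, averaging the $M$ nearly uncorrelated tapered periodograms is what reduces the variance by a factor of roughly $1/M$.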

Why is it advantageous to use the estimator in (46)? Because $\widetilde{I}^{t}_{m}(k)$ is uncorrelated across $m=1,\dots,M$, the variance of $\overline{I}^{t}_{M}(k)$ decreases like $1/M$. Indeed,

\begin{align}
{\mathrm{\bf Var}}\{\overline{I}_{M}^{t}(k)\}&=\frac{1}{M^{2}}\sum_{m=1}^{M}\sum_{m'=1}^{M}{\mathrm{\bf Cov}}\{\widetilde{I}^{t}_{m}(k),\widetilde{I}^{t}_{m'}(k)\}\nonumber\\
&=\frac{1}{M^{2}}\sum_{m=1}^{M}\sum_{m'=1}^{M}\left(\lambda\delta_{mm'}+\lambda^{2}(\tilde{f}(k)+o(1))\delta_{mm'}\right)^{2}\nonumber\\
&=\frac{\lambda^{2}}{M^{2}}\sum_{m=1}^{M}\left(1+\lambda\tilde{f}(k)\right)^{2}+o(1/M)=\frac{\lambda^{2}}{M}\left\{1+\lambda\tilde{f}(k)\right\}^{2}+o(1/M)\nonumber\\
&=\frac{1}{M}\{f(k)+o(1)\}^{2}+o(1/M), \tag{47}
\end{align}

where the last equality follows from Lemma J.1, which is a consequence of Proposition 5.1. We shall explore these results in practical scenarios in simulations in Section 8.

An additional useful modification is to consider a one-dimensional smoothing function $W(k)$ and a bandwidth $\bm{\Omega}$, implement multidimensional smoothing, see e.g. [26], and average over directions, thus reducing the statistic to the magnitude $t=\|k\|$ only,

$$\overline{I}_{h}(t;\Omega)=\sum_{k'\in{\cal K}}\Omega^{-1}W\left(\Omega^{-1}\left(\|k'\|-t\right)\right)\widetilde{I}_{h}(k'), \tag{48}$$

which we refer to as the “rotationally averaged periodogram”. [56] call a version of this function the $R$-spectrum. Alternatively, one could instead average over the magnitudes to arrive at a summary function of direction. In 2D the convenient argument would be the polar angle; [56] call this version the $\theta$-spectrum and use it to study anisotropy.
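The rotational averaging of (48) with a box kernel, as used in the simulations of Section 8, can be sketched as follows (function and argument names are our own hypothetical choices):

```python
import numpy as np

def rotational_average(k_grid, I_vals, t, bw):
    """Rotationally averaged periodogram in the spirit of eq. (48), using a
    box kernel: average I(k') over grid wavenumbers k' with | ||k'|| - t | <= bw."""
    mags = np.linalg.norm(k_grid, axis=1)
    mask = np.abs(mags - t) <= bw
    if not mask.any():
        return np.nan  # no wavenumbers in the annulus at this magnitude
    return I_vals[mask].mean()
```

Applied over a grid of magnitudes $t$, this reduces the $d$-dimensional periodogram to a one-dimensional summary in $\|k\|$.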

7 Isotropic spectral estimation

In this section we show how to form isotropic estimators and discuss their properties. Isotropic estimators are vitally important in two and higher dimensions, where full spectra become increasingly difficult to interpret. To simplify the representation, it is commonly assumed that we are analysing isotropic processes, and that only isotropic summaries need to be produced.

We therefore consider the special case of stationary and isotropic point processes, for which the complete covariance function depends only on distance, $\gamma(z)=\gamma_{I}(\|z\|)$ [20, p. 310–311]. Such radial functions [32] are special, as we do not require any orientational specificity in our spectral representation. The isotropy of $\gamma(z)$ transfers through the Fourier transform to the sdf, so in turn $f(k)=f_{I}(\|k\|)$. We shall refer to the treatment of isotropic random fields when implementing analysis [59]. An alternative is discussed by [27] and [25], for random fields.

The orientation invariance of the spectrum leads to a dimensional reduction of the multidimensional Fourier transform. Three basic concepts are useful for the isotropic analysis. Recall the Bessel function of order $\nu>-1/2$,

$$\mathcal{J}_{\nu}(t)=\frac{t^{\nu}}{2^{\nu}\pi^{1/2}\Gamma(\nu+1/2)}\int_{0}^{\pi}\sin^{2\nu}(u)\cos[t\cos(u)]\,du,\qquad t\geq 0.$$
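This integral representation is easy to verify numerically; the sketch below (our own illustration, not part of the paper's methodology) evaluates it by quadrature and can be checked against `scipy.special.jv`.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def bessel_poisson(nu, t):
    """J_nu(t) via the Poisson-type integral representation quoted in the text,
    valid for nu > -1/2 and t > 0."""
    integral, _ = quad(lambda u: np.sin(u) ** (2 * nu) * np.cos(t * np.cos(u)),
                       0.0, np.pi)
    return (t ** nu) / (2 ** nu * np.sqrt(np.pi) * gamma(nu + 0.5)) * integral
```

For $\nu=0$ the prefactor reduces to $1/\pi$, recovering the familiar integral for $\mathcal{J}_0$.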

It is connected to the $d$-dimensional Fourier transform via [74, Sec 3]

$$\int_{S^{d-1}}e^{-2\pi tiv\cdot u}\,du=2\pi\,t^{-(d/2-1)}\mathcal{J}_{d/2-1}(2\pi t)\quad\forall v\in S^{d-1},\;t\geq 0;$$

we can also note relationships between radial and Cartesian Fourier representations; see for example [32] or [28].

Recall also the (one-dimensional) Hankel transform of order $\nu$ of a function $g:[0,\infty)\to{\mathbb{R}}$,

$$\mathcal{H}_{\nu}[g](t)=\int_{0}^{\infty}g(r)\mathcal{J}_{\nu}(tr)\,r\,dr,\qquad t\geq 0.$$

Then if we assume $X$ is a stationary and isotropic process with a radial second order product density $\rho^{(2)}_{I}$, the sdf at wavenumber $k$ with $t=\|k\|>0$ takes the form

\begin{align}
f_{I}(t=\|k\|):=f(k)&=\lambda+\int_{{\mathbb{R}}^{d}}\left\{\rho^{(2)}(z)-\lambda^{2}\right\}e^{-2\pi ik\cdot z}\,dz\nonumber\\
&=\lambda+\int_{0}^{\infty}\left\{\rho_{I}^{(2)}(r)-\lambda^{2}\right\}r^{d-1}\int_{S^{d-1}}e^{-2\pi ri\,k\cdot u}\,du\,dr\nonumber\\
&=\lambda+2\pi\int_{0}^{\infty}\left\{\rho_{I}^{(2)}(r)-\lambda^{2}\right\}\left(\frac{r}{t}\right)^{d/2-1}\mathcal{J}_{d/2-1}(2\pi tr)\,r\,dr\nonumber\\
&\equiv\lambda+2\pi\,\mathcal{H}_{d/2-1}\left[\left\{\rho_{I}^{(2)}(r)-\lambda^{2}\right\}\left(\frac{r}{t}\right)^{d/2-1}\right](2\pi t), \tag{49}
\end{align}

i.e. the sdf is linearly related to the Hankel transform of the complete covariance function.
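As a numerical illustration of (49) in $d=2$, the sketch below computes the order-0 Hankel transform of a hypothetical Gaussian complete covariance by quadrature; for this choice the transform is available in closed form ($2\pi\sigma^{2}e^{-2\pi^{2}\sigma^{2}t^{2}}$), so the identity can be checked. All names are our own, and the Gaussian covariance is purely illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import j0

def sdf_isotropic_2d(g, lam, t):
    """Eq. (49) for d = 2: f_I(t) = lambda + 2 pi H_0[g](2 pi t), where
    g(r) = rho_I^(2)(r) - lambda^2 is the radial complete covariance."""
    integral, _ = quad(lambda r: g(r) * j0(2 * np.pi * t * r) * r, 0.0, np.inf)
    return lam + 2 * np.pi * integral

# hypothetical Gaussian complete covariance with scale sigma
sigma, lam, t = 2.0, 0.01, 0.05
g = lambda r: np.exp(-r ** 2 / (2 * sigma ** 2))
# closed-form 2D Fourier transform of g at magnitude t
closed = lam + 2 * np.pi * sigma ** 2 * np.exp(-2 * np.pi ** 2 * sigma ** 2 * t ** 2)
```

The quadrature value should agree with `closed` to high accuracy, confirming that the radial sdf is the stated Hankel transform plus $\lambda$.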

Finally we recall the radially averaged set covariance function,

$$\bar{\nu}_{B}(r):=\frac{1}{|S^{d-1}|}\int_{S^{d-1}}|B\cap B_{ru}|\,du,\quad r\geq 0,$$

where $\nu_{B}(z):=|B\cap B_{z}|$ is the set covariance function of the set $B$ [39, Appendix B.3].
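For a rectangular window $B$ the set covariance has the simple closed form $\nu_{B}(z)=\prod_{j}(l_{j}-|z_{j}|)_{+}$, and $\bar{\nu}_{B}$ can be approximated by averaging over directions. A sketch, with our own helper names:

```python
import numpy as np

def set_covariance_box(z, l):
    """nu_B(z) = |B intersect B_z| for a box B with side lengths l."""
    z, l = np.atleast_1d(z), np.atleast_1d(l)
    return float(np.prod(np.clip(l - np.abs(z), 0.0, None)))

def radial_set_covariance_box(r, l, n_angles=720):
    """Radially averaged set covariance bar-nu_B(r) for a 2D box, by averaging
    nu_B(r u) over directions u on the unit circle."""
    angles = np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False)
    vals = [set_covariance_box(r * np.array([np.cos(a), np.sin(a)]), l)
            for a in angles]
    return float(np.mean(vals))
```

At $r=0$ this returns $|B|$, and it decreases monotonically to zero as the shift grows beyond the window diameter.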

We start by noting the expectation of (18); by direct calculation it follows that this takes the form:

$${\mathbf{E}}\{\bar{I}_{h}(\|k\|)\}=\lambda+\int_{B}\int_{B\cap B_{-z}}h_{I}\left(\|z\|\right)\,\mathcal{J}_{d/2-1}\left(2\pi\|k\|\|z\|\right)\,\|z\|^{d/2-1}\,\tilde{\rho}^{(2)}(\|z\|)\,dz\,dx.$$

The change of variables to polar coordinates multiplies the integrand by $\|z\|^{d-1}$, and this corresponds to a Hankel transform of order $d/2-1$ of the taper $h(\cdot)$ multiplied by the product density. For more details on the Hankel transform see [32].

To gain insight into how we might estimate the radial sdf, besides numerically rotation-averaging the $d$-dimensional periodogram using Eq. (48), we can analytically rotation-average Bartlett’s periodogram (Eq. 14): with $t\geq 0$,

\begin{align}
\overline{I}_{0}(t)&:=\frac{1}{|S^{d-1}|}\int_{S^{d-1}}I_{0}(tu)\,du \tag{50}\\
&=\frac{1}{|B||S^{d-1}|}\sum_{x\in X\cap B}\sum_{y\in X\cap B}\int_{S^{d-1}}e^{-2\pi i(tu)\cdot(x-y)}\,du\nonumber\\
&=\hat{\lambda}+\frac{1}{|B||S^{d-1}|}\sum_{x,y\in X\cap B}^{\neq}\int_{S^{d-1}}e^{-2\pi ti(x-y)\cdot u}\,du\nonumber\\
&=\hat{\lambda}+\frac{2\pi}{|B||S^{d-1}|t^{d/2-1}}\sum_{x,y\in X\cap B}^{\neq}\mathcal{J}_{d/2-1}(2\pi t\|x-y\|)\,\|x-y\|^{-(d/2-1)}, \tag{51}
\end{align}

which in the planar case simplifies to

$$\stackrel{d=2}{=}\;\hat{\lambda}+\frac{1}{|B|}\sum_{x,y\in X\cap B}^{\neq}\mathcal{J}_{0}(2\pi t\|x-y\|).$$

This explains the motivation behind the isotropic estimator discussed by [6] and [23]. The aforementioned authors prefer to normalise by $N(B)$ rather than by $|B|$, corresponding to dividing (51) by $\hat{\lambda}$. Like the classical periodogram, the isotropic “shortcut” estimator of Eq. (51) is highly biased near 0, as has already been observed by [23]. The biases are described in the following proposition.
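In the planar case the shortcut estimator (51) reduces to a sum of Bessel functions over pairwise distances and can be sketched directly (the helper name is our own; the bias near 0 discussed above is left uncorrected here):

```python
import numpy as np
from scipy.special import j0

def isotropic_periodogram_2d(x, area, t):
    """Planar (d = 2) rotation-averaged Bartlett periodogram, eq. (51):
    lambda_hat + (1/|B|) * sum over ordered pairs x != y of J_0(2 pi t ||x-y||)."""
    n = len(x)
    lam_hat = n / area
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)  # pairwise distances
    off = ~np.eye(n, dtype=bool)                                 # drop x == y terms
    return lam_hat + j0(2 * np.pi * t * d[off]).sum() / area
```

For two points at distance $r$ in a window of area $|B|$ this gives $2/|B| + 2\mathcal{J}_0(2\pi t r)/|B|$, which is easy to verify by hand.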

Proposition 7.1.

The isotropic estimator given by (51) has expectation

$${\mathbf{E}}\left\{\bar{I}_{0}(t)\right\}=\lambda+2\pi\,\mathcal{H}_{d/2-1}\left[\left\{\rho_{I}^{(2)}(r)-\lambda^{2}\right\}\left(\frac{r}{t}\right)^{d/2-1}\frac{\bar{\nu}_{B}(r)}{|B|}\right](2\pi t)+2\pi\lambda^{2}\,\mathcal{H}_{d/2-1}\left[\left(\frac{r}{t}\right)^{d/2-1}\frac{\bar{\nu}_{B}(r)}{|B|}\right](2\pi t).$$
Proof.

The proof follows by applying Campbell’s theorem and adding and subtracting

$$2\pi\lambda^{2}\,\mathcal{H}_{d/2-1}\left[\left(\frac{r}{t}\right)^{d/2-1}\frac{\bar{\nu}_{B}(r)}{|B|}\right](2\pi t)=\frac{\lambda^{2}}{|S^{d-1}||B|}\int_{S^{d-1}}T(B,tu)\,du.$$ ∎

If we compare the formula for the expectation of the isotropic sdf in Eq. (7) with the expectation in Proposition 7.1, we recognise two sources of bias, just as for the periodogram. There is a convolution with the function $\bar{\nu}_{B}/|B|$, coming from the finite observation window as in the periodogram but this time radially averaged, and a centring bias term that is a radially averaged version of the $d$-dimensional bias. Diggle’s truncation-to-local-minimum construction, see (16), tries to correct for these biases but, among other things, it will not work well for a clustered process, as the spectrum near zero will be underestimated.

To work towards a tapered estimator, consider rotation-averaging the debiased, tapered estimator of (28) in expectation,

\begin{align}
&{\mathbf{E}}\,\frac{1}{|S^{d-1}|}\int_{S^{d-1}}\widetilde{I}_{h}(tu)\,du \tag{52}\\
=\;&\lambda+\frac{2\pi}{|S^{d-1}|t^{d/2-1}}{\mathbf{E}}\left[\sum_{x,y\in X\cap B}^{\neq}h(x)h(y)\mathcal{J}_{d/2-1}\left(2\pi t\|x-y\|\right)\|x-y\|^{-(d/2-1)}\right]-\frac{\lambda^{2}}{|S^{d-1}|}\int_{S^{d-1}}\left|H(tu)\right|^{2}\,du.\nonumber
\end{align}

The problem is that a radial function $h_{I}$ for which $h_{I}(\|x-y\|)=h(x)h(y)$ might not exist. But if we consider tapering the observed differences $\{x-y:x,y\in X\cap B\}$ instead of tapering in $B$, then we can devise a taper $h_{I}$ on $B\oplus B=\bigcup_{x\in B}\{B+x\}$. We thus propose the following isotropic estimator:

$$\bar{I}_{h}(t):=\hat{\lambda}+\frac{2\pi}{|S^{d-1}|t^{d/2-1}|B|}\sum_{x,y\in X\cap B}^{\neq}h_{I}(x-y)\mathcal{J}_{d/2-1}(2\pi t\|x-y\|)\,\|x-y\|^{-(d/2-1)}-\widehat{\lambda^{2}}\,Bias_{I}(t). \tag{53}$$

The bias correction term consists of the rotation average of both the taper and the set covariance of $B$,

\begin{align*}
Bias_{I}(t)&=2\pi\,\mathcal{H}_{d/2-1}\left[\left(\frac{r}{t}\right)^{d/2-1}\frac{\bar{p}_{B,h}}{|B|}\right](2\pi t),\quad\text{where}\\
\bar{p}_{B,h}(r)&=|S^{d-1}|^{-1}\int_{S^{d-1}}|B\cap B_{ru}|\,h_{I}(ru)\,du,
\end{align*}

and depends only on $t=\|k\|$. We warn the reader that the estimator of (53) is not necessarily positive but, like previous authors [50, 4], we sacrifice positivity in order to remove a large part of the bias. Additional issues arise from the estimation of the required $\lambda^{2}$, as the standard estimator $\hat{\lambda}^{2}$ is biased, with one term depending on $\lambda/|B|$ and another depending on the unknown second order product density of the underlying process. In our examples we use the slightly less biased estimator $\widehat{\lambda^{2}}=N(B)[N(B)-1]/|B|^{2}$, which removes the first order bias. On the positive side, note that the debiased isotropic estimator without particular tapering (i.e. $h_{I}\equiv 1$) can be formed without any tuning parameters.

We may now ask what radial taper to use. Tapers are normally designed for processes observed in discrete time or space. We have discussed using separable tapers, and now note that the choice of taper needs to match the observational domain. For example, it is difficult to match a square observational domain with a radial taper, unless one chooses a compact taper inscribed in the box. There are therefore three considerations: 1) demanding the taper be separable in space, 2) isotropic in space, or 3) exactly compact in space. Discrete and isotropic tapers have been determined numerically by [65]. This will not be an option for us, as we need to evaluate the tapers at random locations.

Slepian and co-authors have studied the design of tapers in arbitrary dimensions [66]. Often the solutions are in terms of prolate spheroidal wave functions. Their discrete analogue is the prolate spheroidal wave sequences, which are similar to the set of Hermite functions [78]. However, the isotropic estimator can be seen as an estimator of the Fourier transform of the (non-stationary) first moment of the difference process $Z=\{z=x-y:x,y\in X\cap B\}$, with the connection $\rho^{(2)}(z)=\lambda_{Z}(z)/|B\cap B_{z}|$.

Assuming the observation window is a cuboid $B=B_{\square}(\bm{l})$, the difference vector observation window is $B\oplus B=\prod_{j=1}^{d}[-l_{j},l_{j}]$, and we can create a $d$-dimensional taper related to the Hermite functions as the product of squared exponentials

$$h_{I}(z)=\prod_{j=1}^{d}e^{-az_{j}^{2}/(2l_{j})^{2}},\quad z\in{\mathbb{R}}^{d}. \tag{54}$$

We see that if $l_{j}=l$ for all $j$, then $h_{I}$ becomes radial. This is related to circular harmonic decompositions, see [41]. The authors of [78] recommend using a scaling factor which we have adjusted for a sampling region of $z_{j}\in(-l_{j},l_{j})$, but which they match to the spheroidal wave functions. We propose setting $a=25$, as numerically this neither down-weights too much data nor has too large a jump near the border of $B\oplus B$.
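The squared-exponential taper of (54) is a one-liner; the sketch below (our own naming) also makes it easy to confirm that the taper is radial when the side lengths are equal, since then it depends on $z$ only through $\|z\|$.

```python
import numpy as np

def sq_exp_taper(z, l, a=25.0):
    """Separable squared-exponential taper of eq. (54), defined on the
    difference window B (+) B = prod_j [-l_j, l_j]; a = 25 as in the text."""
    z, l = np.atleast_2d(z), np.atleast_1d(l)
    return np.exp(-a * np.sum(z ** 2 / (2 * l) ** 2, axis=1))
```

With $a=25$ the taper equals 1 at the origin and decays to roughly $e^{-25/4}\approx 0.002$ at the border point $(l,0,\dots,0)$, i.e. a small but non-zero jump, consistent with the discussion above.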

In a sense our construction above is a “fix”, as it only works for the 0th Hermite function; using more than one taper will not lead to isotropic functions, even if $l_{j}=l$ for all $j$. This is not surprising, as the sampling domain will leave an imprint, and capturing this set of information requires using more than the first taper; see the analogous discussion in 1D [58, Ch. 6]. Also, if the spatial sampling region is a highly elongated rectangle rather than a square, then another solution needs to be chosen. Finally, we may ask ourselves what happens if we have a point process $X$ which is anisotropic but we still evaluate (15) in 2D for the pattern. Because it incorporates angular averaging, we expect the right-hand side of the expression to depend only on $\|k\|$, as it does.

Proposition 7.2.

Expectation of Diggle’s 2D Estimator II. The large area expectation of Diggle’s estimator given in (15), when $B$ is a cuboidal box $B_{\square}(\bm{l})$ and the spectrum is not necessarily isotropic, takes the form, in terms of $\widetilde{f}(k)$ as defined in (11),

$${\mathbf{E}}\{I_{D}(k)\}=\lambda+\lambda^{2}\int_{{\mathbb{R}}^{d}}\widetilde{f}(k')\frac{\delta(\|k\|-\|k'\|)}{\|k\|}\,dk'+\int_{{\mathbb{R}}^{d}}\frac{\delta(\|k\|-\|k'\|)}{\|k\|}|B|^{-1}T(B,w-k')\,dk'+o(1), \tag{55}$$

where $T(B,k)$ is defined in Lemma 4.1.

Proof.

The proof is provided in Appendix K. ∎

This shows what happens when we calculate isotropic summaries of quantities that are not isotropic per se. Thus the first proposition determines the expectation of the analytically orientation-averaged periodogram, and the second shows how Diggle’s estimator mixes orientational information to produce an isotropic estimator.

8 Simulations

To study the behaviour of the estimators we have discussed, and additionally some of the debiasing results, we conducted an extensive simulation study.

8.1 Details for executing the simulation study

We simulated three stationary models in 2D to represent the archetypal spatial patterns of regularity, complete randomness and clustering. Throughout the simulations the intensity, i.e. the expected number of points per unit area, was kept constant at $\lambda\equiv 0.01$. The processes were simulated in square observation windows $B_{n}=[-l_{n}/2,l_{n}/2]^{2}$ of increasing area, such that on average we observed $\lambda|B_{n}|=\lambda l_{n}^{2}=n=25,50,100,200,400$ or $800$ points. For example, $l_{n=100}=100$ and $l_{n=400}=200$ spatial units. We chose this regime both to explore the often encountered small-data scenarios and to ascertain asymptotic behaviour. The models below are tuned so that they produce patterns with visually distinct structures at $l_{n=100}$.

Complete randomness, i.e. the Poisson process, has no adjustable parameters once $\lambda$ is fixed. Spatial regularity is represented by the Matérn type II process [39, Section 6.5.2] in two variants, one with hard-core radius 2 (variant “r2”) and one with radius 5 (variant “r5”), which correspond to about 40% and 90% of the maximum allowed radius for the process given $\lambda$, respectively. The clustering behaviour is represented by the Thomas process [39, Section 6.3.2] with cluster intensity $\kappa$ and Gaussian dispersal kernel standard deviation $\sigma$, again in two variants, one with many small or tight clusters ($\kappa=0.6\lambda$, $\sigma=2$, variant “MS”) and one with a few large clusters ($\kappa=0.3\lambda$, $\sigma=6$, variant “FL”). The per-cluster expected point count $\mu$ was then fixed via the property $\mu\kappa=\lambda$ to 1.67 and 3.33, respectively. The theoretical pair correlation functions and spectral density functions of the models are depicted in Figure 2. Example patterns of each model and observation window combination are given in Supplement Figure 8.
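For readers wishing to reproduce a similar study outside R/spatstat, a Thomas process can be simulated directly from its cluster construction. The sketch below is a minimal illustration of our own (with a simple edge-correction padding of $4\sigma$ for parents outside the window), not the spatstat implementation.

```python
import numpy as np

def simulate_thomas(kappa, sigma, mu, l, rng):
    """Thomas cluster process on the square [-l/2, l/2]^2: Poisson(kappa)
    parents per unit area, each with Poisson(mu) offspring displaced by an
    isotropic Gaussian with standard deviation sigma. Parents are simulated
    on a padded window to reduce edge effects."""
    pad = 4 * sigma
    L = l + 2 * pad
    n_parents = rng.poisson(kappa * L ** 2)
    parents = rng.uniform(-L / 2, L / 2, size=(n_parents, 2))
    pts = []
    for p in parents:
        n_off = rng.poisson(mu)
        pts.append(p + sigma * rng.standard_normal((n_off, 2)))
    pts = np.concatenate(pts) if pts else np.empty((0, 2))
    inside = np.all(np.abs(pts) <= l / 2, axis=1)
    return pts[inside]
```

With $\kappa=0.6\lambda=0.006$, $\sigma=2$ and $\mu=1.67$ (the “MS” variant) the realised intensity is approximately $\kappa\mu=\lambda=0.01$, i.e. about 100 points on the $l_{n=100}=100$ window on average.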

For the estimation of the 2D spectra we fixed the wavenumber grid to $k\in[-0.3,0.3]^{2}$ with 101 equidistant steps in each dimension, giving a step size of 0.006 in both dimensions. This scale was chosen to cover the interesting range of non-constant values for all models (cf. Figure 2). To reduce the effect of high wavenumber noise, only the values on the sub-grid $k\in[-0.2,0.2]^{2}$ were considered when integrating over $k$ for the quality metrics discussed below. When a rotationally averaged curve was to be computed using Eq. (48), the averaging was done over the magnitude grid $\|k\|=0.003,0.006,\dots,0.300$ using a box kernel with, if not otherwise stated, radius $1.25\cdot 0.006$. The radial Hermite taper parameter was fixed to $a=25$.

We summarised the quality of each estimator $I_{.}=I_{.}(k;\mathbf{x})$ under all combinations of a model $\mathcal{M}$ and observation window $B_{n}$, say $\mathcal{M}_{n}$, with an integrated summary. First, we estimated the per-wavenumber variance $V(k;\mathcal{M}_{n})=\mathrm{Var}\left[I_{.}(k;\mathbf{x})\,|\,\mathbf{x}\sim\mathcal{M}_{n}\right]$, bias $Bias(k;\mathcal{M}_{n})={\mathbf{E}}\left[I_{.}(k;\mathbf{x})-f_{\mathcal{M}}(k)\,|\,\mathbf{x}\sim\mathcal{M}_{n}\right]$ and mean square error $MSE(k;\mathcal{M}_{n})=V(k;\mathcal{M}_{n})+Bias^{2}(k;\mathcal{M}_{n})$, where $f_{\mathcal{M}}$ stands for the theoretical sdf of the model $\mathcal{M}$. These were then summarised further to the integrated variance $iVar(\mathcal{M}_{n})=\sum_{k}V(k;\mathcal{M}_{n})$, the integrated squared bias $iBias^{2}(\mathcal{M}_{n})=\sum_{k}Bias^{2}(k;\mathcal{M}_{n})$ and $iMSE(\mathcal{M}_{n})=iVar(\mathcal{M}_{n})+iBias^{2}(\mathcal{M}_{n})$. Smaller $iVar$, $iBias^{2}$ and $iMSE$ indicate better quality. The quantities were estimated from 1000 simulations of every $\mathcal{M}_{n}$.

The estimator for a periodogram is given in equation (28), with the debiasing term included. Bartlett’s periodogram (“periodogram”) has the taper $h_{0}(x)=|B_{n}|^{-1/2}\mathbf{1}_{B_{n}}(x)$ for window $B_{n}$. For multitapering (“mt $M$”), defined in equation (38) with $M=\tilde{M}^{2}$ tapers, each having a parameter $m=(\tilde{m}_{1},\tilde{m}_{2})\in\{1,\dots,\tilde{M}\}^{2}$, we used the orthogonal sine tapers $h_{m}(x)=\mathbf{1}_{B_{n}}(x)\prod_{j=1}^{2}\sin[\pi\tilde{m}_{j}(x_{j}+l_{j}/2)/l_{j}]$. For the kernel smoothed estimators (“smoothed $b$”), we first compute Bartlett’s periodogram and then, given the estimate $I_{0}$ on the wavenumber grid, convolve it with a $b\times b$ discrete template having approximately Gaussian weights.

All computations were done using the R software [62, v.3.6.3]. Simulations were done with the help of the R package spatstat [3, v.1.64-1]. Each of the discussed estimators was programmed into an R package ppspectral, available on request from the first author.

Table 1: The fraction of bias removed by the proposed bias correction. Maximum is 1.00.
Model        Estimator       n=25   50    100   200   400   800
MatérnII r5  periodogram     1.00  1.00  1.00  0.98  0.99  1.00
MatérnII r5  mt $M=3^{2}$    0.99  1.00  1.00  1.00  1.00  1.00
MatérnII r2  periodogram     1.00  1.00  1.00  0.99  0.99  1.00
MatérnII r2  mt $M=3^{2}$    1.00  1.00  1.00  1.00  1.00  1.00
Poisson      periodogram     1.00  1.00  1.00  0.98  0.99  1.00
Poisson      mt $M=3^{2}$    1.00  1.00  1.00  1.00  1.00  1.00
Thomas FL    periodogram     0.93  0.98  0.99  0.97  0.99  1.00
Thomas FL    mt $M=3^{2}$    0.35  0.86  0.98  1.00  1.00  1.00
Thomas MS    periodogram     0.98  0.99  1.00  0.97  0.99  1.00
Thomas MS    mt $M=3^{2}$    0.89  0.98  1.00  1.00  1.00  1.00

8.2 The effect of debiasing

As we saw in Figure 3, the debiasing term has a large effect on the quality of the estimates. Table 1 provides the fractions of bias removed by the debiasing term for Bartlett’s periodogram and the $M=3^{2}$ multitapered periodogram. Nearly all bias is removed, the only notable exception being the 65% of bias left in the multitapered periodogram when the data is an arguably very small sample of the Thomas FL variant.

We illustrate the form of the biases in Figure 4 for two model variants. The centring bias is always positive, as expected from equation (27). Bartlett’s periodogram bias (cf. Lemma 4.1) is concentrated along the axes, where one of the sinc functions is constantly 1, and near the origin, where the second sinc function grows to 1. The bias of the multitapered periodogram has a square shape around the origin.

Refer to caption
Figure 4: The effect of bias removal for two of the models described in the text. Theoretical sdf on the left, and then towards the right, estimated mean of: Bartlett’s periodogram, biased and debiased; multitapered $M=3^{2}$, biased and debiased. Estimated with sample size $n=800$.

8.3 Overall quality of the debiased estimators

Figure 5 summarises the estimated integrated variance, squared bias and MSE of the estimators with various tuning parameters. Only the debiased estimators are included, and the values are given on a $\log_{10}$ scale, relative to the baseline given by Bartlett’s periodogram.

Multitapering with $M=1^{2}$ (i.e. only one taper) does not differ in iMSE from the periodogram, as the benefit of averaging does not apply. With more than one taper the variance is reduced, e.g. with $M=3^{2}$ it goes down by 90%, while the bias stays at baseline; but adding more and more tapers ($M=6^{2}$) starts to introduce bias, especially for low sample sizes. The bias is more prominent for the clustered cases, and based on the earlier discussion this is due to oversmoothing at small wavenumbers. Both the variance reduction and the oversmoothing happen with the post-hoc smoothing as well. From the results we can confirm that some smoothing is beneficial, but again a balance must be struck to avoid oversmoothing (cf. Section 6). The regular process is more resilient to oversmoothing at reasonable sample sizes.

Refer to caption
Figure 5: Relative integrated variance (top row), integrated squared bias (middle), and integrated mean square error (bottom) for different models and estimators of the spectral density at various levels of sample size $n$. The values are relative to the debiased Bartlett’s periodogram, which has the value 0 on the presented $\log_{10}$ scale. For example, a value of -1 means the error is of order 0.1 relative to the periodogram.

8.4 Rotational averaging and isotropic estimation

Isotropic estimation, either by averaging radially or by the isotropic shortcuts discussed in Section 7, is an important dimension reduction step for easier visual inspection of the periodogram (particularly for 3D data). The kernel-based rotational averaging depends on a bandwidth parameter, and we studied how this affects the quality metrics. We set the bandwidth radius to $BF\cdot 0.006$ and varied the bandwidth factor $BF$ from 1 to 10. Figure 6 provides the $iBias^{2}$ and $iMSE$ for the debiased Bartlett’s periodogram when comparing the rotationally averaged estimates to the true 1D isotropic sdfs (cf. Figure 2).

The Poisson process has a constant sdf, so oversmoothing will not be penalised. For the clustered processes even a small bandwidth introduces bias, but this is typically more than offset by the reduction in variance. The error for the few large clusters variant is more sensitive to changes in the bandwidth than that for the many small clusters variant. This is because the effective range of wavenumbers where its sdf exhibits structure is more concentrated, and thus easier to smooth out (cf. Figure 2).

The strongly regular variant “r5” is similar to the clustered cases in that an optimal bandwidth is clearly present. For the less regular “r2” variant the error behaves like that of the Poisson process, possibly because the target sdf is a weak oscillation around a constant, which is hard to detect. If we look at the integrated squared bias, however, we do see that with large enough data oversmoothing is detectable.

We note that since multitapered periodograms and locally averaged periodograms are already smoothed, their rotationally averaged estimates exhibit smoothing bias at smaller bandwidth factors (results not shown here).

Refer to caption
Figure 6: Integrated squared bias (top) and integrated MSE of the rotationally averaged Bartlett’s periodogram when compared to the true 1D isotropic sdf. The rotational averaging bandwidth was $BF\cdot 0.006$, where the bandwidth factor $BF$ is on the x-axis. The models are described in the text.
Table 2: Scores $iMSE$, $iBias^{2}$ and $iVar$ for isotropic estimators, relative to the radially averaged debiased periodogram with optimised smoothing bandwidth.
No taper Squared-exponential taper
Model iMSE iBias2 iVar iMSE iBias2 iVar
MatérnII r5 6.12 0.21 8.57 3.51 0.29 4.90
MatérnII r2 22.40 3.52 23.07 15.59 1.09 16.12
Poisson 15.72 10.53 15.89 13.76 5.75 13.77
Thomas FL 1.22 0.04 1.81 1.32 0.14 1.97
Thomas MS 4.34 0.01 6.50 3.58 0.04 5.35

To compare the radial averaging to direct isotropic estimation, we estimated the debiased isotropic periodogram and the squared-exponential tapered isotropic periodogram with $a=25$. Table 2 provides the quality scores for the isotropic and the tapered isotropic periodograms relative to the radially averaged periodogram with the best bandwidth for each model and data size, cf. Figure 6. The values are medians over the sample sizes $n\geq 50$.

In the iMSE sense the radially averaged periodogram performs best, chiefly because iMSE is of the order of iVar and the radial averaging greatly reduces the variance. However, in terms of bias the isotropic estimator provides more accuracy, except for the Poisson process, for which no oversmoothing is possible. Adding a smooth taper to the isotropic estimator further reduces the bias, but can increase the variance, likely because some data near the edges is filtered out. The bias occurs near $\|k\|=0$, mostly when $\|k\|<1/(\sqrt{2}l_{n})$ (illustration in Supplement Figure 9).

9 Spectral properties of the Barro Colorado Island data

The Barro Colorado Island (BCI) rainforest data set [16] contains a census of tree and shrub species from a 1000m by 500m region on Barro Colorado Island, Panama. We illustrate the techniques developed in this paper on the point patterns corresponding to three species contained in the BCI data, using the 2015 census and only individuals that are alive.

Refer to caption
Figure 7: Top row: Point patterns of three different species from the BCI data. Middle row: Multitaper spectral estimates from the above point patterns, rescaled by the intensity and plotted on a decibel scale, i.e. $10\log_{10}(\bar{I}^{t}_{M}(k)/\hat{\lambda})$ with $M=3^{2}$. Bottom row: Estimated pair correlation function for the three species, i.e. $\widehat{\rho}^{(2)}(z)/\hat{\lambda}^{2}$.

Figure 7 shows an example analysis for three species from the BCI data, namely Annona acuminata, Herrania purpurea, and Psychotria marginata. The top row shows the point patterns for each of the three species, the middle row shows estimates of their spectral density functions computed using multitapering, and the bottom row shows estimates of the pair correlation function for each species. It is clear that there are substantial differences in the structures present in the spectral density functions of these species. Firstly, in the case of Annona acuminata (left), we see clearly anisotropic behaviour, similar to the anisotropic clustering shown in Figure 3. This is to be expected, as A. acuminata is usually associated with very moist soils and has previously been associated with the stream (a linear structure) running through the plot [37, 29]. Secondly, in the case of Herrania purpurea (middle) we see no structure at all, which is reassuring, as it suggests that the features seen in the other two species are signal and not just noise. Again, this makes ecological sense, as H. purpurea is a lower canopy tree whose seeds are widely dispersed by monkeys and other animals. Finally, for Psychotria marginata (right) we see more isotropic behaviour, with information present at low wavenumbers until $\|k\|\approx 0.005$, corresponding to correlation on length scales of around 200. Additionally, there is further information until $\|k\|\approx 1/15$, which corresponds to the peak seen in the pair correlation function at scales up to around 15. This clustering is likely caused by seeds being relatively poorly dispersed [69], but also by P. marginata being associated with moist soils, leading to a build-up of large clusters of trees in the low plateau and swamp regions in the upper left quadrant of the BCI plot [37, 29].

10 Conclusions

Spectral analysis and Fourier features are classical and important tools for characterising and understanding time series and random fields [58, 24], in, for example, estimation and detection problems. When first introduced, the fundamental theory of spectral analysis for random fields and time series corresponded to understanding what Fourier transform to use, and determining the first and second moments of that Fourier transform for stationary time series and random fields [61]. This led researchers to develop the by now well–established theory of spectral analysis (again for time series and random fields), from which a large and sophisticated set of methods was built. This set of methods has more recently been rediscovered in machine learning [48, 73]. The corresponding theory for spatial point processes has been neglected, apart from some notable and not very recent exceptions [5, 6, 56]. For point processes, machine learning researchers have only just started to discover the utility of Fourier-based methods [38, 44]. The current state of the art for the spectral analysis of point processes is such that we do not yet know what to compute, let alone how to address its digital implementation.

This article has addressed this first outstanding problem. We have calculated the expectation of natural method of moments estimators. We have imported state of the art ideas from signal processing as part of this process, and have addressed how to taper for variance reduction in this setting. Introducing tapering required us to determine how to taper: we chose continuous-space tapers rather than interpolating, as previous authors had done. We showed that in the setting of Gaussian convergence, multitaper estimates of the spectral density have a decreased variance. Our simulation studies verify our theoretical results, and demonstrate the general utility of tapering outside the most restrictive class of processes for which we can establish theoretical guarantees.

This manuscript does not claim to bring the spectral analysis of point processes fully up to the level of time series and random fields, reproducing every result available in those settings. Instead we have taken an important first step by adapting existing signal processing tools for time series and random fields, and shown how they can be modified to estimate the spectral content of a spatial point process. This paves the way to develop more sophisticated theory and algorithms that can be applied to complex spatial point pattern datasets.

Acknowledgment

This work was supported in part by the UK Engineering and Physical Sciences Research Council under Mathematical Sciences Leadership Fellowship EP/I005250/1, Developing Leaders Award EP/L001519/1, and Award EP/N007336/1; the European Research Council under Grant CoG 2015-682172NETS within the Seventh European Union Framework Program; and the Academy of Finland (project number 327211).

Appendix A Proof of Lemma 4.1

Proof.

We start the proof by noting that (using that BB is centred)

𝐄I0(k)\displaystyle{\mathbf{E}}I_{0}({k}) =λ+|B|1BBxe2πikzρ(2)(z)𝑑z𝑑x\displaystyle=\lambda+|B|^{-1}\int_{B}\int_{B_{-x}}e^{-2\pi i{k}\cdot z}\rho^{(2)}(z)dzdx (56)
=λ+|B|1BBxe2πikz[ρ(2)(z)λ2]𝑑z𝑑x+|B|1λ2BBxe2πikz𝑑z𝑑x\displaystyle=\lambda+|B|^{-1}\int_{B}\int_{B_{-x}}e^{-2\pi i{k}\cdot z}[\rho^{(2)}(z)-\lambda^{2}]dzdx+|B|^{-1}\lambda^{2}\int_{B}\int_{B_{-x}}e^{-2\pi i{k}\cdot z}dzdx
=λ+|B|1d|BBz|e2πikz[ρ(2)(z)λ2]𝑑z+|B|1λ2T(B,k)\displaystyle=\lambda+|B|^{-1}\int_{{\mathbb{R}}^{d}}|B\cap B_{-z}|e^{-2\pi i{k}\cdot z}[\rho^{(2)}(z)-\lambda^{2}]dz+|B|^{-1}\lambda^{2}T(B,{k})
=λ+dG(1)(k)G~(2)(kk)𝑑k+|B|1λ2T(B,k),\displaystyle=\lambda+\int_{{\mathbb{R}}^{d}}G^{(1)\ast}({k}^{\prime})\widetilde{G}^{(2)}({k}^{\prime}-{k})\,d{k}^{\prime}+|B|^{-1}\lambda^{2}T(B,{k}),

by the convolution theorem, after we have defined the functions:

G(1)(k)\displaystyle G^{(1)}({k}) =|B|1Rd|BBz|e2πikz𝑑z=|B|1T(B,k)\displaystyle=|B|^{-1}\int_{R^{d}}|B\cap B_{-z}|e^{-2\pi i{k}\cdot z}\,dz=|B|^{-1}T(B,{k})
G~(2)(k)\displaystyle\widetilde{G}^{(2)}({k}) =de2πikz[ρ(2)(z)λ2]𝑑z=λ2f~(k).\displaystyle=\int_{{\mathbb{R}}^{d}}e^{-2\pi i{k}\cdot z}[\rho^{(2)}(z)-\lambda^{2}]dz=\lambda^{2}\widetilde{f}({k}).

Note from Gradshteyn and Ryzhik, Eq. 3.741, with the change of variables xj=πkjljx_{j}=\pi{k}_{j}l_{j}, that we get (recalling |B|=jlj|B|=\prod_{j}l_{j}):

dT(B,k)𝑑k\displaystyle\int_{{\mathbb{R}}^{d}}T(B,{k})\;d{k} =djsin2(πkjlj)(πkj)2dkj\displaystyle=\int_{{\mathbb{R}}^{d}}\prod_{j}\frac{\sin^{2}(\pi{k}_{j}l_{j})}{(\pi{k}_{j})^{2}}d{k}_{j}
=djlj2sin2(xj)xj2(dxj/(πlj))\displaystyle=\int_{{\mathbb{R}}^{d}}\prod_{j}l_{j}^{2}\frac{\sin^{2}(x_{j})}{x_{j}^{2}}(dx_{j}/(\pi l_{j}))
=|B|j(π/π)=|B|.\displaystyle=|B|\prod_{j}(\pi/\pi)=|B|. (57)

We have defined

T(B,k)=j=1dsin2(πkjlj)(πkj)2,kd.T(B,{k})=\prod_{j=1}^{d}\frac{\sin^{2}(\pi{k}_{j}l_{j})}{(\pi{k}_{j})^{2}},\quad{k}\in{\mathbb{R}}^{d}. (58)
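As a numerical sanity check of the normalisation (57), the following one-dimensional sketch (notation and tolerances ours) integrates T(B,k) on a fine grid, using the identity sin²(πkl)/(πk)² = l² sinc²(kl) to avoid the removable singularity at k = 0.

```python
import numpy as np

l = 5.0  # in d = 1 the box is B = [0, l], so |B| = l

# T(B, k) = sin^2(pi k l) / (pi k)^2 = l^2 sinc(k l)^2, where numpy's
# sinc is the normalised sinc(x) = sin(pi x) / (pi x).
k = np.linspace(-400.0, 400.0, 2_000_001)
T = l**2 * np.sinc(k * l) ** 2

# Riemann sum over [-400, 400]; the neglected tails cost O(1/(pi^2 * 400)).
integral = np.sum(T) * (k[1] - k[0])   # should be close to |B| = l
```

The same check in dimension d factorises over the coordinates, giving ∏_j l_j = |B|.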

We now go back to our definition of 𝐄I0(2)(k){\mathbf{E}}I_{0}^{(2)}({k}), and explore this more carefully. We also fix 0<β<10<\beta<1, and expand the expectation as follows (realising that care needs to be taken around the pole):

𝐄I0(2)(k)\displaystyle{\mathbf{E}}I_{0}^{(2)}({k}) =λ2RdG(1)(kk)G~(2)(k)𝑑k\displaystyle=\lambda^{2}\int_{R^{d}}G^{(1)\ast}({k}-{k}^{\prime})\widetilde{G}^{(2)}({k}^{\prime})d{k}^{\prime}
=λ2j=1d(kj1lj1β+kj1lj1βkj+1lj1β+kj+1lj1β)G(1)(kk)G~(2)(k)dk\displaystyle=\lambda^{2}\prod_{j=1}^{d}\left(\int_{-\infty}^{{k}_{j}-\frac{1}{l_{j}^{1-\beta}}}+\int_{{k}_{j}-\frac{1}{l_{j}^{1-\beta}}}^{{k}_{j}+\frac{1}{l_{j}^{1-\beta}}}+\int_{{k}_{j}+\frac{1}{l_{j}^{1-\beta}}}^{\infty}\right)G^{(1)\ast}({k}-{k}^{\prime})\widetilde{G}^{(2)}({k}^{\prime})d{k}^{\prime}
=𝐄I11(2)(k)+𝐄I12(2)(k)++𝐄Idd(2)(k).\displaystyle={\mathbf{E}}I_{1\dots 1}^{(2)}({k})+{\mathbf{E}}I_{12\dots}^{(2)}({k})+\dots+{\mathbf{E}}I_{d\cdots d}^{(2)}({k}). (59)

We can note directly, by taking a supremum over G(1)(kk)G^{(1)\ast}({k}-{k}^{\prime}) in the range of integration, and assuming the integral of G~(2)(k)\widetilde{G}^{(2)}({k}) is finite (which is true from the Parseval–Rayleigh relationship), that

𝐄I1…1(2)(k)=𝒪(|B|1j=1dlj22β)=𝒪(j=1dlj12β),{\mathbf{E}}I_{1\dots 1}^{(2)}({k})={\cal O}(|B|^{-1}\prod_{j=1}^{d}l_{j}^{2-2\beta})={\cal O}(\prod_{j=1}^{d}l_{j}^{1-2\beta}), (60)

which requires β>1/2\beta>1/2 to be o(1)o(1). The same holds for the other elements apart from 𝐄I22(2)(k){\mathbf{E}}I_{2\dots 2}^{(2)}({k}). We therefore have

𝐄I0(2)(k)\displaystyle{\mathbf{E}}I_{0}^{(2)}({k}) =λ2j=1dkj1lj1βkj+1lj1βG(1)(kk)G~(2)(k)𝑑k{1+o(1)}\displaystyle=\lambda^{2}\prod_{j=1}^{d}\int_{{k}_{j}-\frac{1}{l_{j}^{1-\beta}}}^{{k}_{j}+\frac{1}{l_{j}^{1-\beta}}}G^{(1)\ast}({k}-{k}^{\prime})\widetilde{G}^{(2)}({k}^{\prime})d{k}^{\prime}\left\{1+o(1)\right\} (61)
0(2)(k){1+o(1)},\displaystyle\equiv{\cal I}_{0}^{(2)}({k})\left\{1+o(1)\right\},

so that

0(2)(k)λ2j=1dkj1lj1βkj+1lj1βG(1)(kk)G~(2)(k)𝑑k.{\cal I}_{0}^{(2)}({k})\equiv\lambda^{2}\prod_{j=1}^{d}\int_{{k}_{j}-\frac{1}{l_{j}^{1-\beta}}}^{{k}_{j}+\frac{1}{l_{j}^{1-\beta}}}G^{(1)\ast}({k}-{k}^{\prime})\widetilde{G}^{(2)}({k}^{\prime})d{k}^{\prime}.

We assume that f~(k)\tilde{f}({k}) is twice differentiable at k{k} (possibly everywhere except k=0{k}=0), so that for all k0{k}\neq 0 and for kk′′<kk\|{k}-{k}^{\prime\prime}\|<\|{k}-{k}^{\prime}\| we get

f~(k)=f~(k)+f~(k)T(kk)+12(kk)T𝐇~f(k′′)(kk),\widetilde{f}({k}^{\prime})=\widetilde{f}({k})+\nabla\widetilde{f}({k})^{T}\left({k}^{\prime}-{k}\right)+\frac{1}{2}\left({k}^{\prime}-{k}\right)^{T}\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\left({k}^{\prime}-{k}\right), (62)

defining f~(k)\nabla\widetilde{f}({k}) as the gradient and 𝐇~f(k)\tilde{\mathbf{H}}_{f}({k}) as the Hessian matrix.

Substituting this expansion back into (61) we get with a change of variables

0(2)(k)\displaystyle{\cal I}_{0}^{(2)}({k}) =λ2k11l11βk1+1l11βkp1lp1βkp+1lp1βG(1)(kk){f~(k)+f~(k)T(kk)+12(kk)T\displaystyle=\lambda^{2}\int_{{k}_{1}-\frac{1}{l_{1}^{1-\beta}}}^{{k}_{1}+\frac{1}{l_{1}^{1-\beta}}}\dots\int_{{k}_{p}-\frac{1}{l_{p}^{1-\beta}}}^{{k}_{p}+\frac{1}{l_{p}^{1-\beta}}}G^{(1)\ast}({k}-{k}^{\prime})\left\{\widetilde{f}({k})+\nabla\widetilde{f}({k})^{T}\left({k}^{\prime}-{k}\right)+\frac{1}{2}\left({k}^{\prime}-{k}\right)^{T}\right.
𝐇~f(k′′)(kk)}dk\displaystyle\left.\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\left({k}^{\prime}-{k}\right)\right\}\;d{k}^{\prime}
=λ2f~(k)k11l11βk1+1l11βkp1lp1βkp+1lp1βG(1)(kk)𝑑k\displaystyle=\lambda^{2}\widetilde{f}({k})\int_{{k}_{1}-\frac{1}{l_{1}^{1-\beta}}}^{{k}_{1}+\frac{1}{l_{1}^{1-\beta}}}\dots\int_{{k}_{p}-\frac{1}{l_{p}^{1-\beta}}}^{{k}_{p}+\frac{1}{l_{p}^{1-\beta}}}G^{(1)\ast}({k}-{k}^{\prime})\;d{k}^{\prime}
+12trace{𝐇~f(k′′)k11l1βk1+1l1βkp1l1βkp+1l1β|B|1T(B,kk)(kk)(kk)T𝑑k}.\displaystyle+\frac{1}{2}{\mathrm{trace}}\left\{\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\int_{{k}_{1}-\frac{1}{l^{1-\beta}}}^{{k}_{1}+\frac{1}{l^{1-\beta}}}\dots\int_{{k}_{p}-\frac{1}{l^{1-\beta}}}^{{k}_{p}+\frac{1}{l^{1-\beta}}}|B|^{-1}T(B,{k}-{k}^{\prime})\cdot({k}^{\prime}-{k})\left({k}^{\prime}-{k}\right)^{T}\;d{k}^{\prime}\right\}. (63)

We can then note that

k11l1βk1+1l1βkp1l1βkp+1l1βG(1)(kk)𝑑k=1+o(1).\int_{{k}_{1}-\frac{1}{l^{1-\beta}}}^{{k}_{1}+\frac{1}{l^{1-\beta}}}\dots\int_{{k}_{p}-\frac{1}{l^{1-\beta}}}^{{k}_{p}+\frac{1}{l^{1-\beta}}}G^{(1)}({k}-{k}^{\prime})\;d{k}^{\prime}=1+o(1).

By direct calculation we then find

RdG(1)(k)𝑑k\displaystyle\int_{R^{d}}G^{(1)}({k})d{k} =|B|1RdRd|BBz|ei2πkz𝑑z𝑑k\displaystyle=|B|^{-1}\int_{R^{d}}\int_{R^{d}}|B\cap B_{-z}|e^{-i2\pi{k}\cdot z}\,dzd{k}
=|B|1Rd|BBz|δ(z)𝑑z\displaystyle=|B|^{-1}\int_{R^{d}}|B\cap B_{-z}|\delta(z)\,dz
=1.\displaystyle=1.

Note that this is now consistent with (57), and achieves a value of 11. Second,

k11l1βk1+1l1βkp1l1βkp+1l1β(kk)TG(1)(kk)𝑑k=0,\int_{{k}_{1}-\frac{1}{l^{1-\beta}}}^{{k}_{1}+\frac{1}{l^{1-\beta}}}\dots\int_{{k}_{p}-\frac{1}{l^{1-\beta}}}^{{k}_{p}+\frac{1}{l^{1-\beta}}}\left({k}^{\prime}-{k}\right)^{T}G^{(1)\ast}({k}-{k}^{\prime})\;d{k}^{\prime}=0,

as

|BBz|=|BBz|,|B\cap B_{-z}|=|B\cap B_{z}|,

implies that this function |BBz||B\cap B_{-z}| is symmetric (or even) in zz, and real-valued, which implies that its Fourier transform is also real-valued. The integral over a symmetric range of an even function times an odd function is always zero.

We now also determine

𝐁=k11l11βk1+1l11βkp1l1βkp+1l1βG(1)(kk)(kk)(kk)T𝑑k.{\mathbf{B}}=\int_{{k}_{1}-\frac{1}{l_{1}^{1-\beta}}}^{{k}_{1}+\frac{1}{l_{1}^{1-\beta}}}\dots\int_{{k}_{p}-\frac{1}{l^{1-\beta}}}^{{k}_{p}+\frac{1}{l^{1-\beta}}}G^{(1)\ast}({k}-{k}^{\prime})\cdot({k}-{k}^{\prime})\left({k}-{k}^{\prime}\right)^{T}\;d{k}^{\prime}. (64)

Let us start by a change of variables. Let us first define the matrix

𝑳=diag(l1,,ld).\bm{L}={\mathrm{diag}}\left(l_{1},\dots,l_{d}\right).

Thus the change of variables is

ν=π𝑳(kk)k=k𝑳1ν/π.\nu=\pi\bm{L}\left({k}-{k}^{\prime}\right)\Longleftrightarrow{k}^{\prime}={k}-\bm{L}^{-1}\nu/\pi. (65)

With this change of variables we get that

𝐁\displaystyle{\mathbf{B}} =l1β/2l1β/2lpβ/2lpβ/2G(1)(kk)(kk)(kk)T𝑑k\displaystyle=\int_{-l_{1}^{\beta}/2}^{l_{1}^{\beta}/2}\dots\int_{-l_{p}^{\beta}/2}^{l_{p}^{\beta}/2}G^{(1)\ast}({k}-{k}^{\prime})\cdot({k}-{k}^{\prime})\left({k}-{k}^{\prime}\right)^{T}\;d{k}^{\prime}
=l1β/2l1β/2lpβ/2lpβ/2j=1dπ2lj2sin2(νj)|B|νj2𝑳1πdννT𝑳1πddνj(1/lj)\displaystyle=\int_{-l_{1}^{\beta}/2}^{l_{1}^{\beta}/2}\dots\int_{-l_{p}^{\beta}/2}^{l_{p}^{\beta}/2}\prod_{j=1}^{d}\frac{\pi^{2}l_{j}^{2}\sin^{2}(\nu_{j})}{|B|\nu_{j}^{2}}\bm{L}^{-1}\pi^{-d}\nu\nu^{T}\bm{L}^{-1}\pi^{-d}\;d\nu_{j}(1/l_{j})
=l1β/2l1β/2lpβ/2lpβ/2j=1dljsin2(νj)|B|νj2𝑳1ννT𝑳1dνj.\displaystyle=\int_{-l_{1}^{\beta}/2}^{l_{1}^{\beta}/2}\dots\int_{-l_{p}^{\beta}/2}^{l_{p}^{\beta}/2}\prod_{j=1}^{d}\frac{l_{j}\sin^{2}(\nu_{j})}{|B|\nu_{j}^{2}}\bm{L}^{-1}\nu\nu^{T}\bm{L}^{-1}\;d\nu_{j}. (66)

We then find that if all the lj=ll_{j}=l and 0<β<10<\beta<1, then

trace{𝐁}\displaystyle{\mathrm{trace}}\left\{{\mathbf{B}}\right\} =lβ/2lβ/2lβ/2lβ/2j=1dsin2(νj)νj2q=1d(νq/l)2dνj\displaystyle=\int_{-l^{\beta}/2}^{l^{\beta}/2}\dots\int_{-l^{\beta}/2}^{l^{\beta}/2}\prod_{j=1}^{d}\frac{\sin^{2}(\nu_{j})}{\nu_{j}^{2}}\sum_{q=1}^{d}(\nu_{q}/l)^{2}\;d\nu_{j}
=d1l2(sin2(x)x2𝑑x)d1lβ/2lβ/2sin2(x)𝑑x\displaystyle=d\cdot\frac{1}{l^{2}}\left(\int_{-\infty}^{\infty}\frac{\sin^{2}(x)}{x^{2}}\,dx\right)^{d-1}\int_{-l^{\beta}/2}^{l^{\beta}/2}\sin^{2}(x)\,dx
=dl2πd1lβ{1+o(1)}.\displaystyle=\frac{d}{l^{2}}\pi^{d-1}l^{\beta}\left\{1+o(1)\right\}. (67)

This decreases with increasing ll as long as β<2\beta<2 (which follows as we have assumed β<1\beta<1), and as the next error term is O(lβ)O(l^{-\beta}). We therefore want to choose β1\beta\rightarrow 1, or for some fixed ϵ>0\epsilon>0 take β=1ϵ\beta=1-\epsilon. Putting all of these components together we get that

𝐄I0(k)\displaystyle{\mathbf{E}}I_{0}({k}) =λ+|B|1d|BBz|e2πikz[ρ(2)(z)λ2]𝑑z+|B|1λ2T(B,k)\displaystyle=\lambda+|B|^{-1}\int_{{\mathbb{R}}^{d}}|B\cap B_{-z}|e^{-2\pi i{k}\cdot z}[\rho^{(2)}(z)-\lambda^{2}]dz+|B|^{-1}\lambda^{2}T(B,{k})
=λ+λ2f~(k)+12trace{𝐇~f(k′′)dl2βπd1{1+o(1)}}+|B|1λ2T(B,k)\displaystyle=\lambda+\lambda^{2}\widetilde{f}({k})+\frac{1}{2}{\mathrm{trace}}\left\{\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\frac{d}{l^{2-\beta}}\pi^{d-1}\left\{1+o(1)\right\}\right\}+|B|^{-1}\lambda^{2}T(B,{k})
=λ+(λ+f(k)){1+o(1)}+12trace{𝐇~f(k′′)dl2βπd1{1+o(1)}}+|B|1λ2T(B,k)\displaystyle=\lambda+(-\lambda+f({k}))\left\{1+o(1)\right\}+\frac{1}{2}{\mathrm{trace}}\left\{\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\frac{d}{l^{2-\beta}}\pi^{d-1}\left\{1+o(1)\right\}\right\}+|B|^{-1}\lambda^{2}T(B,{k})
=f(k){1+o(1)}+o(1)+|B|1λ2T(B,k).\displaystyle=f({k})\left\{1+o(1)\right\}+o(1)+|B|^{-1}\lambda^{2}T(B,{k}). ∎

Appendix B Proof of Proposition 4.1

Proof.

We start from using Eqn. (9) and determine that

𝐄{Jh(k)}\displaystyle{\mathbf{E}}\left\{J_{h}({k})\right\} =𝐄xXh(x)e2πikx=λRdh(x)e2πikx𝑑x=λH(k).\displaystyle={\mathbf{E}}\sum_{x\in X}h(x)e^{-2\pi i{k}\cdot x}=\lambda\int_{R^{d}}h(x)e^{-2\pi i{k}\cdot x}\,dx=\lambda H({k}). (68)

Subsequently we find using Eqn. (9)

𝐄{|Jh(k)|2}\displaystyle{\mathbf{E}}\left\{\left|J_{h}({k})\right|^{2}\right\} =𝐄{xXyXh(x)e2πikxh(y)e2πiky}\displaystyle={\mathbf{E}}\left\{\sum_{x\in X}\sum_{y\in X}h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\right\}
=𝐄xX|h(x)|2+𝐄{xX,yXh(x)e2πikxh(y)e2πiky}\displaystyle={\mathbf{E}}\sum_{x\in X}\left|h(x)\right|^{2}+{\mathbf{E}}\left\{\sum_{x\in X,y\in X}^{\neq}h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\right\}
=λRd|h(x)|2𝑑x+Rd×Rdρ(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y\displaystyle=\lambda\int_{R^{d}}\left|h(x)\right|^{2}dx+\iint_{R^{d}\times R^{d}}{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy
=λ+Rd×Rdρ(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y.\displaystyle=\lambda+\iint_{R^{d}\times R^{d}}{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy. (69)

We now calculate the variance as

𝐕𝐚𝐫{Jh(k)}\displaystyle{\mathrm{\bf Var}}\left\{J_{h}({k})\right\} =𝐄{|Jh(k)|2}|𝐄{Jh(k)}|2\displaystyle={\mathbf{E}}\left\{\left|J_{h}({k})\right|^{2}\right\}-\left|{\mathbf{E}}\left\{J_{h}({k})\right\}\right|^{2}
=λ+Rd×Rdρ(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y|λH(k)|2.\displaystyle=\lambda+\iint_{R^{d}\times R^{d}}{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy-\left|\lambda H({k})\right|^{2}. (70)

We now define the renormalised quantity

ρ~(2)(z)=ρ(2)(z)λ2λ2,zd.\widetilde{\rho}^{(2)}(z)=\frac{{\rho}^{(2)}(z)-\lambda^{2}}{\lambda^{2}},\quad z\in{\mathbb{R}}^{d}. (71)

The expression in (70) then can be simplified to

𝐕𝐚𝐫{Jh(k)}\displaystyle{\mathrm{\bf Var}}\left\{J_{h}({k})\right\} =λ+Rd×Rdρ(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y|λH(k)|2\displaystyle=\lambda+\iint_{R^{d}\times R^{d}}{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy-\left|\lambda H({k})\right|^{2}
=λ+λ2Rd×Rd(ρ~(2)(xy)+1)h(x)e2πikxh(y)e2πiky𝑑x𝑑y|λH(k)|2\displaystyle=\lambda+\lambda^{2}\iint_{R^{d}\times R^{d}}\left(\widetilde{\rho}^{(2)}(x-y)+1\right)h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy-\left|\lambda H({k})\right|^{2}
=λ+λ2Rd×Rdρ~(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y\displaystyle=\lambda+\lambda^{2}\iint_{R^{d}\times R^{d}}\widetilde{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy
=λ+λ2𝐕𝐚𝐫1{Jh(k)},\displaystyle=\lambda+\lambda^{2}{\mathrm{\bf Var}}_{1}\left\{J_{h}({k})\right\}, (72)

where we define

𝐕𝐚𝐫1{Jh(k)}Rd×Rdρ~(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y.{\mathrm{\bf Var}}_{1}\left\{J_{h}({k})\right\}\equiv\iint_{R^{d}\times R^{d}}\widetilde{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy. (73)

To simplify this expression we note that

𝐕𝐚𝐫1{Jh(k)}\displaystyle{\mathrm{\bf Var}}_{1}\left\{J_{h}({k})\right\} =Rd×Rdρ~(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y\displaystyle=\int_{R^{d}\times R^{d}}\widetilde{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy
=Rd×Rd[Rdf~(2)(k)e2πi(xy)k𝑑k]h(x)e2πikxh(y)e2πiky𝑑x𝑑y\displaystyle=\int_{R^{d}\times R^{d}}\left[\int_{R^{d}}\widetilde{f}^{(2)}({k}^{\prime})e^{2\pi i(x-y)\cdot{k}^{\prime}}d{k}^{\prime}\right]\cdot h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy
=Rd×RdRdf~(2)(k)e2πi(xy)(kk)𝑑kh(x)h(y)𝑑x𝑑y\displaystyle=\int_{R^{d}\times R^{d}}\int_{R^{d}}\widetilde{f}^{(2)}({k}^{\prime})e^{-2\pi i(x-y)\cdot({k}-{k}^{\prime})}d{k}^{\prime}\cdot h(x)h^{\ast}(y)\,dx\,dy
=Rdf~(2)(k)|H(kk)|2𝑑k\displaystyle=\int_{R^{d}}\widetilde{f}^{(2)}({k}^{\prime})\left|H({k}-{k}^{\prime})\right|^{2}d{k}^{\prime}. (74)

This yields the desired expression for the variance. We return to the expression of the relation of the DFT. We aim to show that

Rel{Jh(k)}=λdH(k2k)H(k)𝑑k+dU(k,z)e2πikz{ρ(2)(z)λ2}𝑑z.{\mathrm{Rel}}\left\{J_{h}({k})\right\}=\lambda\int_{\mathbb{R}^{d}}H({k}^{\prime}-2{k})H({k}^{\prime})d{k}^{\prime}+\int_{\mathbb{R}^{d}}U({k},z)e^{-2\pi i{k}\cdot z}\{\rho^{(2)}(z)-\lambda^{2}\}dz.

We have by the definition of the relation [64]

Rel{Jh(k)}\displaystyle{\mathrm{Rel}}\left\{J_{h}({k})\right\} =𝐄{(Jh(k))2}(𝐄{Jh(k)})2.\displaystyle={\mathbf{E}}\left\{\left(J_{h}({k})\right)^{2}\right\}-\left({\mathbf{E}}\left\{J_{h}({k})\right\}\right)^{2}.

We then note that

𝐄{(Jh(k))2}\displaystyle{\mathbf{E}}\left\{\left(J_{h}({k})\right)^{2}\right\} =𝐄{xXyXh(x)e2πikxh(y)e2πiky}\displaystyle={\mathbf{E}}\left\{\sum_{x\in X}\sum_{y\in X}h(x)e^{-2\pi i{k}\cdot x}h(y)e^{-2\pi i{k}\cdot y}\right\}
=𝐄xX{h(x)}2e4πikx+𝐄{xX,yXh(x)e2πikxh(y)e2πiky}\displaystyle={\mathbf{E}}\sum_{x\in X}\left\{h(x)\right\}^{2}e^{-4\pi i{k}\cdot x}+{\mathbf{E}}\left\{\sum_{x\in X,y\in X}^{\neq}h(x)e^{-2\pi i{k}\cdot x}h(y)e^{-2\pi i{k}\cdot y}\right\}
=λRd{h(x)}2e4πikx𝑑x+Rd×Rdρ(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y\displaystyle=\lambda\int_{R^{d}}\left\{h(x)\right\}^{2}e^{-4\pi i{k}\cdot x}dx+\iint_{R^{d}\times R^{d}}{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h(y)e^{-2\pi i{k}\cdot y}\,dx\,dy
=λRd{h(x)}2e4πikx𝑑x+Rel1{Jh(k)}.\displaystyle=\lambda\int_{R^{d}}\left\{h(x)\right\}^{2}e^{-4\pi i{k}\cdot x}dx+{\mathrm{Rel}}_{1}\left\{J_{h}({k})\right\}. (75)

We now seek to simplify this and write

Rel1{Jh(k)}=λ2Rd×Rd(ρ~(2)(xy)+1)h(x)e2πikxh(y)e2πiky𝑑x𝑑y.{\mathrm{Rel}}_{1}\left\{J_{h}({k})\right\}=\lambda^{2}\iint_{R^{d}\times R^{d}}\left(\widetilde{\rho}^{(2)}(x-y)+1\right)h(x)e^{-2\pi i{k}\cdot x}h(y)e^{-2\pi i{k}\cdot y}\,dx\,dy. (76)

We additionally have from (68):

𝐄2{Jh(k)}=λ2H2(k).{\mathbf{E}}^{2}\left\{J_{h}({k})\right\}=\lambda^{2}H^{2}({k}).

Putting all of this together we get that

Rel{Jh(k)}\displaystyle{\mathrm{Rel}}\left\{J_{h}({k})\right\} =λRd{h(x)}2e4πikx𝑑x+λ2Rd×Rd(ρ~(2)(xy)+1)h(x)e2πikxh(y)e2πiky𝑑x𝑑y\displaystyle=\lambda\int_{R^{d}}\left\{h(x)\right\}^{2}e^{-4\pi i{k}\cdot x}dx+\lambda^{2}\iint_{R^{d}\times R^{d}}\left(\widetilde{\rho}^{(2)}(x-y)+1\right)h(x)e^{-2\pi i{k}\cdot x}h(y)e^{-2\pi i{k}\cdot y}\,dx\,dy
λ2H2(k)\displaystyle-\lambda^{2}H^{2}({k})
=λRd{h(x)}2e4πikx𝑑x+λ2Rd×Rdρ~(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y.\displaystyle=\lambda\int_{R^{d}}\left\{h(x)\right\}^{2}e^{-4\pi i{k}\cdot x}dx+\lambda^{2}\iint_{R^{d}\times R^{d}}\widetilde{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h(y)e^{-2\pi i{k}\cdot y}\,dx\,dy.

We now implement a change of variables, and set z=xyz=x-y. We then have

Rd×Rdρ~(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y=Rd×Rdρ~(2)(z)h(x)e2πikxh(xz)e2πik(xz)𝑑x𝑑z,\iint_{R^{d}\times R^{d}}\!\!\!\!\widetilde{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h(y)e^{-2\pi i{k}\cdot y}\,dx\,dy=\iint_{R^{d}\times R^{d}}\!\!\!\!\widetilde{\rho}^{(2)}(z)h(x)e^{-2\pi i{k}\cdot x}h(x-z)e^{-2\pi i{k}\cdot(x-z)}\,dx\,dz,

and so with the definition of

U(k,z)h(x)h(z+x)e2πik(2x)𝑑x,U({k},z)\equiv\int h(x)h(z+x)e^{-2\pi i{k}\cdot(2x)}dx, (77)

the expression is a consequence of the above expressions. ∎

Appendix C Proof of Lemma 4.2

Proof.

We start from

I~h(k)=|Jh(k)λH(k)|2,\widetilde{I}_{h}({k})=\left|J_{h}({k})-\lambda H({k})\right|^{2},

and so take

𝐄{I~h(k)}=𝐄{|Jh(k)|2}λ𝐄{Jh(k)H(k)}λ𝐄{Jh(k)H(k)}+λ2𝐄{|H(k)|2}.{\mathbf{E}}\left\{\widetilde{I}_{h}({k})\right\}={\mathbf{E}}\{\left|J_{h}({k})\right|^{2}\}-\lambda{\mathbf{E}}\{J_{h}({k})H^{\ast}({k})\}-\lambda{\mathbf{E}}\{J_{h}^{\ast}({k})H({k})\}+\lambda^{2}{\mathbf{E}}\{\left|H({k})\right|^{2}\}.

We now use Campbell’s theorem to evaluate these expectations. We already have from Proposition 4.1 that

𝐄{|Jh(k)|2}\displaystyle{\mathbf{E}}\left\{\left|J_{h}({k})\right|^{2}\right\} =λ+Rd×Rdρ(2)(xy)h(x)e2πikxh(y)e2πiky𝑑x𝑑y.\displaystyle=\lambda+\iint_{R^{d}\times R^{d}}{\rho}^{(2)}(x-y)h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy.

Also we note that

H(k)\displaystyle H({k}) =Rdh(x)e2πikx𝑑x.\displaystyle=\int_{R^{d}}h(x)e^{-2\pi i{k}\cdot x}\,dx. (78)

We also note

𝐄{Jh(k)}\displaystyle{\mathbf{E}}\left\{J_{h}({k})\right\} =λH(k).\displaystyle=\lambda H({k}).

Therefore we have

𝐄{I~h(k)}\displaystyle\!{\mathbf{E}}\left\{\widetilde{I}_{h}({k})\right\}\! =𝐄{|Jh(k)|2}λ𝐄{Jh(k)H(k)}λ𝐄{Jh(k)H(k)}+λ2𝐄{|H(k)|2}\displaystyle=\!{\mathbf{E}}\{\left|J_{h}({k})\right|^{2}\}-\lambda{\mathbf{E}}\{J_{h}({k})H^{\ast}({k})\}-\lambda{\mathbf{E}}\{J_{h}^{\ast}({k})H({k})\}+\lambda^{2}{\mathbf{E}}\{\left|H({k})\right|^{2}\}
=𝐄{|Jh(k)|2}λ2|H(k)|2\displaystyle={\mathbf{E}}\{\left|J_{h}({k})\right|^{2}\}-\lambda^{2}\left|H({k})\right|^{2}
=λ+Rd×Rd{ρ(2)(xy)λ2}h(x)e2πikxh(y)e2πiky𝑑x𝑑y.\displaystyle=\lambda+\iint_{R^{d}\times R^{d}}\{{\rho}^{(2)}(x-y)-\lambda^{2}\}h(x)e^{-2\pi i{k}\cdot x}h^{\ast}(y)e^{2\pi i{k}\cdot y}\,dx\,dy. (79)

We note that the multiplication in (79) leads to a convolution in the wavenumber domain. If we rewrite

ρ(2)(z)λ2\displaystyle{\rho}^{(2)}(z)-\lambda^{2} =Rd(f(k)λ)e2πikz𝑑k,\displaystyle=\int_{R^{d}}(f({k})-\lambda)e^{2\pi i{k}\cdot z}\,d{k}, (80)

then rewriting (79) we get

𝐄{I~h(k)}\displaystyle\!{\mathbf{E}}\left\{\widetilde{I}_{h}({k})\right\}\! =λ+Rd×RdRd(f(u)λ)exp(2πiu(xy)+2πik(xy))𝑑uh(x)h(y)𝑑x𝑑y\displaystyle=\lambda+\iint_{R^{d}\times R^{d}}\int_{R^{d}}(f(u)-\lambda)\exp(-2\pi iu\cdot(x-y)+2\pi i{k}\cdot(x-y))\,duh(x)h^{\ast}(y)\,dx\,dy
=λ+Rd(f(u)λ)|H(ku)|2𝑑u=Rdf(u)|H(ku)|2𝑑u,\displaystyle=\lambda+\int_{R^{d}}(f(u)-\lambda)\left|H({k}-u)\right|^{2}\,du=\int_{R^{d}}f(u)\left|H({k}-u)\right|^{2}\,du,

as given, where the final equality uses that Rd|H(ku)|2𝑑u=Rdh2(x)𝑑x=1\int_{R^{d}}\left|H({k}-u)\right|^{2}\,du=\int_{R^{d}}h^{2}(x)\,dx=1 by the Parseval–Rayleigh relation. ∎
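For a homogeneous Poisson process f(k) ≡ λ, so Lemma 4.2 predicts that the debiased quantity Ĩ_h(k) = |J_h(k) − λH(k)|² has expectation λ at every wavenumber once the taper has unit norm. A Monte Carlo sketch of this prediction (our illustrative check, not from the paper), using the box-car taper h ≡ 1 on the unit square, for which H(k) has the closed form ∏_j e^{−πik_j} sinc(k_j):

```python
import numpy as np

rng = np.random.default_rng(42)
lam, reps = 50.0, 4000
k = np.array([1.5, 2.5])   # non-integer, so H(k) != 0 and the debiasing matters

def H_boxcar(k):
    # Fourier transform of the box-car taper h(x) = 1 on [0, 1]^2 (unit norm
    # as |B| = 1): a product over dimensions of exp(-i pi k_j) sinc(k_j).
    return np.prod(np.exp(-1j * np.pi * k) * np.sinc(k))

vals = np.empty(reps)
for r in range(reps):
    n = rng.poisson(lam)
    x = rng.uniform(0.0, 1.0, size=(n, 2))
    J = np.sum(np.exp(-2j * np.pi * x @ k))     # J_h(k) with h = 1 on B
    vals[r] = abs(J - lam * H_boxcar(k)) ** 2   # debiased periodogram

est = vals.mean()   # Lemma 4.2 predicts this is close to lam, since f(k) = lam
```

Replacing the known λ by an estimate λ̂ gives the practical debiased periodogram discussed in the main text.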

Appendix D Proof of Corollary 4.2

Proof.

Starting from Lemma 4.2 we can note that

𝐄{I~h(k)}=d|H(kk)|2f(k)𝑑k.{\mathbf{E}}\{\widetilde{I}_{h}({k})\}=\int_{{\mathbb{R}}^{d}}\left|H({k}^{\prime}-{k})\right|^{2}f({k}^{\prime})\,d{k}^{\prime}. (81)

Now we need to assume properties of |H(k)||H({k})| in order to simplify this expression. Assume the cuboid domain has side length ll, and that we have selected a data taper so that it has unit norm:

d|h(x)|2𝑑x=1.\int_{{\mathbb{R}}^{d}}|h(x)|^{2}\,dx=1.

We assume the data taper h(x)h(x) is compactly supported in space and well–concentrated in wavenumber to region 𝒲d{\mathcal{W}}\subset{\mathbb{R}}^{d} so that

𝒲|H(k)|2𝑑k=1εl,\int_{\mathcal{W}}|H({k})|^{2}\,d{k}=1-\varepsilon_{l}, (82)

where l>0l>0 is the minimum length scale in any dimension of 𝒲{\mathcal{W}}, and we assume εl0\varepsilon_{l}\rightarrow 0 as ll\rightarrow\infty. We now repeat the arguments posed in Appendix A, for a window h(x)h(x) different from the (spatial) box–car on the region.

We now return to (81) and present a similar argument to Appendix A. We note that

𝐄{I~h(k)}\displaystyle{\mathbf{E}}\{\widetilde{I}_{h}({k})\} =d|H(kk)|2f(k)𝑑k\displaystyle=\int_{{\mathbb{R}}^{d}}\left|H({k}^{\prime}-{k})\right|^{2}f({k}^{\prime})\,d{k}^{\prime}
=d|H(kk)|2(λ+λ2f~(k))𝑑k\displaystyle=\int_{{\mathbb{R}}^{d}}\left|H({k}^{\prime}-{k})\right|^{2}\left(\lambda+\lambda^{2}\widetilde{f}({k}^{\prime})\right)\,d{k}^{\prime}
=λ+λ2d|H(kk)|2f~(k)𝑑k\displaystyle=\lambda+\lambda^{2}\int_{{\mathbb{R}}^{d}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\,d{k}^{\prime}
=λ+λ2𝒲|H(kk)|2f~(k)𝑑k+λ2d\𝒲|H(kk)|2f~(k)𝑑k\displaystyle=\lambda+\lambda^{2}\int_{\mathcal{W}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\,d{k}^{\prime}+\lambda^{2}\int_{{\mathbb{R}}^{d}\backslash{\mathcal{W}}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\,d{k}^{\prime}
=λ+𝐄1{I~h(k)}+𝐄2{I~h(k)},\displaystyle=\lambda+{\mathbf{E}}_{1}\{\widetilde{I}_{h}({k})\}+{\mathbf{E}}_{2}\{\widetilde{I}_{h}({k})\}, (83)

where f[n]f^{[n]} denotes the nnth derivative of the function ff and we use this equation to define the latter two terms. We note that

|𝐄2{I~h(k)}|\displaystyle\left|{\mathbf{E}}_{2}\{\widetilde{I}_{h}({k})\}\right| λ2εlf~0.\displaystyle\leq\lambda^{2}\varepsilon_{l}\|\widetilde{f}\|_{0}. (84)

Thus we only need to understand the remaining term in the expression. We then obtain with a Lagrange form of the remainder

𝐄1{I~h(k)}\displaystyle{\mathbf{E}}_{1}\{\widetilde{I}_{h}({k})\} =λ2𝒲|H(kk)|2f~(k)𝑑k\displaystyle=\lambda^{2}\int_{\mathcal{W}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\,d{k}^{\prime}
=λ2𝒲|H(kk)|2[f~(k)+f~[1](k)(kk)+12f~[2](k)(kk)2+R[3](k,k)]𝑑k\displaystyle=\lambda^{2}\int_{\mathcal{W}}\left|H({k}^{\prime}-{k})\right|^{2}\left[\widetilde{f}({k})+\widetilde{f}^{[1]}({k})({k}^{\prime}-{k})+\frac{1}{2}\widetilde{f}^{[2]}({k})({k}^{\prime}-{k})^{2}+R^{[3]}({k},{k}^{\prime})\right]\,d{k}^{\prime}
=λ2(1εl)f~(k)+12λ2f~[2](k)𝒲|H(kk)|2(kk)2𝑑k+R~[3](k).\displaystyle=\lambda^{2}(1-\varepsilon_{l})\widetilde{f}({k})+\frac{1}{2}\lambda^{2}\widetilde{f}^{[2]}({k})\int_{\mathcal{W}}\left|H({k}^{\prime}-{k})\right|^{2}({k}^{\prime}-{k})^{2}\,d{k}^{\prime}+\widetilde{R}^{[3]}({k}). (85)

We can note that

|R[3](k,k)|13!supk′′|f~[3](k′′)||kk|3|H(kk)|2𝑑k.|{R}^{[3]}({k},{k}^{\prime})|\leq\frac{1}{3!}\sup_{{k}^{\prime\prime}}|\widetilde{f}^{[3]}({k}^{\prime\prime})|\int|{k}^{\prime}-{k}|^{3}\left|H({k}^{\prime}-{k})\right|^{2}\,d{k}^{\prime}. (86)

Putting these terms together we obtain that

𝐄{I~h(k)}\displaystyle{\mathbf{E}}\{\widetilde{I}_{h}({k})\} =λ+λ2(1εl)f~(k)+12λ2f~[2](k)𝒲|H(kk)|2(kk)2𝑑k+R~[3](k)\displaystyle=\lambda+\lambda^{2}(1-\varepsilon_{l})\widetilde{f}({k})+\frac{1}{2}\lambda^{2}\widetilde{f}^{[2]}({k})\int_{\mathcal{W}}\left|H({k}^{\prime}-{k})\right|^{2}({k}^{\prime}-{k})^{2}\,d{k}^{\prime}+\widetilde{R}^{[3]}({k}) (87)
=λ+λ2f~(k)+o(1),\displaystyle=\lambda+\lambda^{2}\widetilde{f}({k})+o(1), (88)

which completes the proof. ∎

Appendix E Proof of Theorem 4.1

Proof.

The expectation of the tapered Fourier transform is (see (9))

𝐄It(k)\displaystyle{\mathbf{E}}I^{t}({k}) =2dh(x)h(y)e2πik(xy)ρ(2)(x,y)𝑑x𝑑y+dh2(x)ρ(1)(x)𝑑x\displaystyle=\int_{{\mathbb{R}}^{2d}}h(x)h(y)e^{-2\pi i{k}\cdot(x-y)}\rho^{(2)}(x,y)\,dx\,dy+\int_{{\mathbb{R}}^{d}}h^{2}(x)\rho^{(1)}(x)\,dx
=λdh2(x)𝑑x+B2h(x)h(y)e2πik(xy)ρ(2)(xy)𝑑x𝑑y\displaystyle=\lambda\int_{{\mathbb{R}}^{d}}h^{2}(x)dx+\int_{B^{2}}h(x)h(y)e^{-2\pi i{k}\cdot(x-y)}\rho^{(2)}(x-y)dxdy (89)
=λ1+B2h(x)h(y)e2πik(xy)[ρ(2)(xy)λ2]𝑑x𝑑y\displaystyle=\lambda\cdot 1+\int_{B^{2}}h(x)h(y)e^{-2\pi i{k}\cdot(x-y)}[\rho^{(2)}(x-y)-\lambda^{2}]dxdy
+λ2B2h(x)h(y)ei2πk(xy)𝑑x𝑑y\displaystyle+\lambda^{2}\int_{B^{2}}h(x)h(y)e^{-i2\pi{k}\cdot(x-y)}dxdy
=λ+ddh(x)h(y)e2πik(xy)[ρ(2)(xy)λ2]𝑑x𝑑y+λ2h(B,k)h(B,k)\displaystyle=\lambda+\int_{{\mathbb{R}}^{d}}\int_{{\mathbb{R}}^{d}}h(x)h(y)e^{-2\pi i{k}\cdot(x-y)}[\rho^{(2)}(x-y)-\lambda^{2}]dxdy+\lambda^{2}h(B,{k})h(B,-{k})
=λ+ddh(x)h(y)e2πik(xy)[ρ(2)(xy)λ2]𝑑x𝑑y+λ2|H(k)|2\displaystyle=\lambda+\int_{{\mathbb{R}}^{d}}\int_{{\mathbb{R}}^{d}}h(x)h(y)e^{-2\pi i{k}\cdot(x-y)}[\rho^{(2)}(x-y)-\lambda^{2}]dxdy+\lambda^{2}\left|H({k})\right|^{2}
=λ+𝐄It,(2)(k)+λ2|H(k)|2,\displaystyle=\lambda+{\mathbf{E}}I^{t,(2)}({k})+\lambda^{2}\left|H({k})\right|^{2}, (90)

this defining the quantity 𝐄It,(2)(k).{\mathbf{E}}I^{t,(2)}({k}).

The expression 𝐄It,(2)(k){\mathbf{E}}I^{t,(2)}({k}) mirrors what we can see in (56). We see directly the benefits of tapering: |H(k)||H({k})| decays rapidly away from zero, so the third term has essentially no effect, and there is less blurring in the second term except very close to k=0{k}=0.

We can re-express this understanding in the Fourier domain, again using the convolution theorem. We define

𝐄It,(2)(k)=ddh(x)h(y)e2πik(xy)[ρ(2)(xy)λ2]𝑑x𝑑y.{\mathbf{E}}I^{t,(2)}({k})=\int_{{\mathbb{R}}^{d}}\int_{{\mathbb{R}}^{d}}h(x)h(y)e^{-2\pi i{k}\cdot(x-y)}[\rho^{(2)}(x-y)-\lambda^{2}]dxdy.

To explore this further we again do a change of variables:

𝐄It,(2)(k)=dBzh(x)h(xz)ei2πkz[ρ(2)(z)λ2]𝑑x𝑑z.{\mathbf{E}}I^{t,(2)}({k})=\int_{\mathbb{R}^{d}}\int_{B_{z}}h(x)h(x-z)e^{-i2\pi{k}\cdot z}[\rho^{(2)}(z)-\lambda^{2}]dxdz.

We define the inner product window using that h(x)h(x) is compactly supported to get

Wh(z)=Bzh(x)h(xz)𝑑x=dh(x)h(xz)𝑑x=d|H(k)|2e2πikz𝑑k.W_{h}(z)=\int_{B_{z}}h(x)h(x-z)dx=\int_{{\mathbb{R}}^{d}}h(x)h(x-z)dx=\int_{{\mathbb{R}}^{d}}\left|H({k})\right|^{2}e^{2\pi i{k}\cdot z}\,d{k}.
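As a small numerical illustration of the inner product window (one dimension, a unit-norm sine taper on [0,l]; all names and tolerances ours), W_h(0) recovers the unit norm ∫h² = 1, and W_h(z) decays as the offset grows:

```python
import numpy as np

l = 10.0
x = np.linspace(0.0, l, 200_001)
dx = x[1] - x[0]
h = np.sqrt(2.0 / l) * np.sin(np.pi * x / l)   # unit-norm sine taper on [0, l]

def W(z):
    """W_h(z) = int h(x) h(x - z) dx, extending h by zero outside [0, l]."""
    shifted = np.sqrt(2.0 / l) * np.sin(np.pi * (x - z) / l)
    shifted[(x - z < 0.0) | (x - z > l)] = 0.0
    return np.sum(h * shifted) * dx

w0 = W(0.0)        # equals int h^2 = 1, up to discretisation error
w_half = W(l / 2)  # strictly smaller: the window tapers off in z
```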

We could also divide by |BBz|h(x)h(xz)|B\cap B_{-z}|h(x)h(x-z) as long as we are inside the domain. Outside BB we cannot, and so this is why we cannot remove all bias in the spectrum even if we bias–correct as suggested by [34].

With this window we can note that

𝐄It,(2)(k)=λ2d|H(kk)|2f~(k)𝑑k.{\mathbf{E}}I^{t,(2)}({k})=\lambda^{2}\int_{\mathbb{R}^{d}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\;d{k}^{\prime}. (91)

Ideally we would like |H(k)|2=δ(k)\left|H({k})\right|^{2}=\delta({k}), but this is not possible as |B||B| is finite. If we assume f~(k)\widetilde{f}({k}) is smooth and possesses two derivatives, then as BB grows we can obtain near unbiasedness. We now assume that |H(k)|2\left|H({k})\right|^{2} is concentrated to a region in wavenumber 𝒲{\cal W} and again use a Taylor expansion. There are two approaches to this, either [60] or [71].

The systematic bias term λ2h(B,k)h(B,k)=λ2|H(k)|2\lambda^{2}h(B,{k})h(B,-{k})=\lambda^{2}\left|H({k})\right|^{2} replaces the product of sinc functions obtained for the square box. As we have chosen the function hh to be well–concentrated [71, 60, 76], the effect of this term is limited.

We assume that

𝒲|H(k)|2𝑑k=1εl,whereεl=o(1).\int_{{\cal W}}\left|H({k})\right|^{2}\;d{k}=1-\varepsilon_{l},\quad{\mathrm{where}}\quad\varepsilon_{l}=o(1). (92)

We also assume that f~(k)\tilde{f}({k}) is upper bounded by f~0\|\widetilde{f}\|_{0}. We then have

𝐄It,(2)(k)\displaystyle{\mathbf{E}}I^{t,(2)}({k}) =λ2d\𝒲|H(kk)|2f~(k)𝑑k+λ2𝒲|H(kk)|2f~(k)𝑑k.\displaystyle=\lambda^{2}\int_{\mathbb{R}^{d}\backslash{\cal W}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\;d{k}^{\prime}+\lambda^{2}\int_{{\cal W}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\;d{k}^{\prime}. (93)

We note that

d\𝒲|H(kk)|2f~(k)𝑑kf~0{1{1εl}}.\int_{\mathbb{R}^{d}\backslash{\cal W}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\;d{k}^{\prime}\leq\|\widetilde{f}\|_{0}\{1-\{1-\varepsilon_{l}\}\}.

We can yet again utilise the Taylor expansion of (62) inside 𝒲{\cal W}. We find

𝐄It,(2)(k)\displaystyle{\mathbf{E}}I^{t,(2)}({k}) =λ2𝒲|H(kk)|2f~(k)𝑑k{1+o(1)}\displaystyle=\lambda^{2}\int_{{\cal W}}\left|H({k}^{\prime}-{k})\right|^{2}\widetilde{f}({k}^{\prime})\;d{k}^{\prime}\left\{1+o(1)\right\}
=λ2𝒲|H(kk)|2{f~(k)+f~(k)T(kk)+12(kk)T𝐇~f(k′′)(kk)}𝑑k{1+o(1)}\displaystyle=\lambda^{2}\int_{{\cal W}}\left|H({k}^{\prime}-{k})\right|^{2}\left\{\widetilde{f}({k})+\nabla\widetilde{f}({k})^{T}\left({k}^{\prime}-{k}\right)+\frac{1}{2}\left({k}^{\prime}-{k}\right)^{T}\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\left({k}^{\prime}-{k}\right)\right\}\;d{k}^{\prime}\left\{1+o(1)\right\}
=λ2f~(k)+0+λ2𝒲|H(kk)|212(kk)T𝐇~f(k′′)(kk)𝑑k.\displaystyle=\lambda^{2}\widetilde{f}({k})+0+\lambda^{2}\int_{{\cal W}}\left|H({k}^{\prime}-{k})\right|^{2}\frac{1}{2}\left({k}^{\prime}-{k}\right)^{T}\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\left({k}^{\prime}-{k}\right)\;d{k}^{\prime}. (94)

We note that

𝒲|H(kk)|212(kk)T𝐇~f(k′′)(kk)𝑑k2\displaystyle\|\int_{{\cal W}}\left|H({k}^{\prime}-{k})\right|^{2}\frac{1}{2}\left({k}^{\prime}-{k}\right)^{T}\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\left({k}^{\prime}-{k}\right)\;d{k}^{\prime}\|^{2}
|𝐇~f(k′′)|𝒲|H(kk)|212(kk)T(kk)𝑑k\displaystyle\leq|\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})|\int_{{\cal W}}\left|H({k}^{\prime}-{k})\right|^{2}\frac{1}{2}\left({k}^{\prime}-{k}\right)^{T}\left({k}^{\prime}-{k}\right)\;d{k}^{\prime}
=|𝐇~f(k′′)|𝒲|H(k)|2w2𝑑k=𝒪(1/l).\displaystyle=|\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})|\int_{{\cal W}}\left|H({k})\right|^{2}w^{2}d{k}={\cal O}(1/l).

Then

𝐄It(k)\displaystyle{\mathbf{E}}I^{t}({k}) =λ+𝐄It,(2)(k)+λ2|H(k)|2\displaystyle=\lambda+{\mathbf{E}}I^{t,(2)}({k})+\lambda^{2}\left|H({k})\right|^{2}
=f(k)+0+𝒪(1/l)+λ2|H(k)|2.\displaystyle=f({k})+0+{\cal O}(1/l)+\lambda^{2}\left|H({k})\right|^{2}.

We have that |H(k)|2\left|H({k})\right|^{2} decays in k\|{k}\|. If k𝒲{k}\notin{\mathcal{W}} then the influence of this term becomes negligible. ∎
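As a sanity check of this expectation result (an illustrative Monte Carlo sketch, not the paper's code): for a homogeneous Poisson process the spectral density is flat, f({k})=\lambda, so at wavenumbers away from the origin the tapered periodogram should be nearly unbiased for \lambda. The separable first sine taper below is our own choice of h:

```python
import numpy as np

# Monte Carlo sanity check (illustrative, not the paper's code): for a
# homogeneous Poisson process f(k) = lambda, so away from the origin the
# tapered periodogram should be nearly unbiased for lambda.  The separable
# first sine taper below is our own choice of h.
rng = np.random.default_rng(0)
lam, l = 200.0, 1.0                       # intensity; B = [0, l]^2
k = np.array([8.0, 0.0])                  # wavenumber well away from 0

def taper(pts):
    # h(x, y) = product of 1-d sine tapers, normalised so int_B h^2 = 1
    return np.prod(np.sqrt(2.0 / l) * np.sin(np.pi * pts / l), axis=-1)

reps = 600
I = np.empty(reps)
for r in range(reps):
    npts = rng.poisson(lam * l ** 2)
    pts = rng.uniform(0.0, l, size=(npts, 2))
    J = np.sum(taper(pts) * np.exp(-2j * np.pi * pts @ k))  # tapered DFT
    I[r] = np.abs(J) ** 2                 # tapered periodogram I^t(k)

print(I.mean())                           # close to lam when k is away from 0
```

At {k} near the origin the \lambda^{2}\left|H({k})\right|^{2} term would dominate instead, consistent with the bias statement above.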

Appendix F Proof of Theorem 5.1

Proof.

For brevity we define the notation \phi_{k}(x)=e^{-2\pi i{k}\cdot x}. From first principles, using Campbell's theorem, we may deduce that the uncentred cross-moment of I^{t}_{{m}}({k}_{1}) and I^{t}_{{m^{\prime}}}({k}_{2}) takes the form

𝐄{Imt(k1)Imt(k2)}=\displaystyle{\mathbf{E}}\left\{I^{t}_{{m}}({k}_{1})I^{t}_{{m^{\prime}}}({k}_{2})\right\}= 𝐄[x,yXhm(x)hm(y)ϕk1(xy)+xXhm2(x)][z,vXhm(z)hm(v)ϕk2(zv)+qXhm2(q)]\displaystyle{\mathbf{E}}\left[\sum^{\neq}_{x,y\in X}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)+\sum_{x\in X}h_{{m}}^{2}(x)\right]\left[\sum_{z,v\in X}^{\neq}h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{2}}(z-v)+\sum_{q\in X}h_{{m^{\prime}}}^{2}(q)\right]
=\displaystyle= 𝐄{x,yXhm(x)hm(y)ϕk1(xy)z,vXhm(z)hm(v)ϕk2(zv)}\displaystyle{\mathbf{E}}\left\{\sum^{\neq}_{x,y\in X}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\sum_{z,v\in X}^{\neq}h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{2}}(z-v)\right\}
+𝐄{xXhm2(x)y,zXhm(y)hm(z)ϕk2(yz)+xXhm2(x)y,zXhm(y)hm(z)ϕk1(yz)}\displaystyle+{\mathbf{E}}\left\{\sum_{x\in X}h_{{m}}^{2}(x)\sum^{\neq}_{y,z\in X}h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(z)\phi_{{k}_{2}}(y-z)+\sum_{x\in X}h_{{m^{\prime}}}^{2}(x)\sum^{\neq}_{y,z\in X}h_{{m}}(y)h_{{m}}(z)\phi_{{k}_{1}}(y-z)\right\}
+𝐄[xXhm2(x)yXhm2(y)]\displaystyle+{\mathbf{E}}\left[\sum_{x\in X}h_{{m}}^{2}(x)\sum_{y\in X}h_{{m^{\prime}}}^{2}(y)\right]
=\displaystyle= V1(k1,k2)+V2(k1,k2)+V3.\displaystyle V_{1}({k}_{1},{k}_{2})+V_{2}({k}_{1},{k}_{2})+V_{3}. (95)

This latter equation defines the three terms V_{1}({k}_{1},{k}_{2}), V_{2}({k}_{1},{k}_{2}) and V_{3}; the last does not depend on {k}_{1} or {k}_{2}, while V_{2} is the sum of two terms, one depending only on {k}_{1} and the other only on {k}_{2}. We now split these terms further in order to account for the cases in which locations coincide. We start with V_{1}. For convenience we suppress the "\in X" notation. First, we note:

V1(k1,k2)\displaystyle V_{1}({k}_{1},{k}_{2}) =𝐄{x,yhm(x)hm(y)ϕk1(xy)z,vhm(z)hm(v)ϕk2(zv)}\displaystyle={\mathbf{E}}\left\{\sum^{\neq}_{x,y}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\sum_{z,v}^{\neq}h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{2}}(z-v)\right\}
=𝐄x,yz,v𝟏(xz,xv)𝟏(yz,yv)hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(zv)\displaystyle={\mathbf{E}}\sum^{\neq}_{x,y}\sum_{z,v}^{\neq}\mathbf{1}\left(x\neq z,x\neq v\right)\mathbf{1}\left(y\neq z,y\neq v\right)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v)
+𝐄x,yz,v𝟏(x=z,xv)𝟏(yz,yv)hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(xv)\displaystyle+{\mathbf{E}}\sum^{\neq}_{x,y}\sum_{z,v}^{\neq}\mathbf{1}\left(x=z,x\neq v\right)\mathbf{1}\left(y\neq z,y\neq v\right)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(x-v)
+𝐄x,yz,v𝟏(xz,x=v)𝟏(yz,yv)hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(zv)\displaystyle+{\mathbf{E}}\sum^{\neq}_{x,y}\sum_{z,v}^{\neq}\mathbf{1}\left(x\neq z,x=v\right)\mathbf{1}\left(y\neq z,y\neq v\right)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v)
+𝐄x,yz,v𝟏(xz,xv)𝟏(y=z,yv)hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(zv)\displaystyle+{\mathbf{E}}\sum^{\neq}_{x,y}\sum_{z,v}^{\neq}\mathbf{1}\left(x\neq z,x\neq v\right)\mathbf{1}\left(y=z,y\neq v\right)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v)
+𝐄x,yz,v𝟏(xz,xv)𝟏(yz,y=v)hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(zv)\displaystyle+{\mathbf{E}}\sum^{\neq}_{x,y}\sum_{z,v}^{\neq}\mathbf{1}\left(x\neq z,x\neq v\right)\mathbf{1}\left(y\neq z,y=v\right)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v)
+𝐄x,yz,v𝟏(x=z,xv)𝟏(y=v,yz)hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(zv)\displaystyle+{\mathbf{E}}\sum^{\neq}_{x,y}\sum_{z,v}^{\neq}\mathbf{1}\left(x=z,x\neq v\right)\mathbf{1}\left(y=v,y\neq z\right)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v)
+𝐄x,yz,v𝟏(x=v,xz)𝟏(y=z,yv)hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(zv).\displaystyle+{\mathbf{E}}\sum^{\neq}_{x,y}\sum_{z,v}^{\neq}\mathbf{1}\left(x=v,x\neq z\right)\mathbf{1}\left(y=z,y\neq v\right)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v).

We have here assumed that h_{{m}}(x) is compactly supported on B for all values of {m} used. Using the simplification implied by (9) we obtain the expression

V1(k1,k2)\displaystyle V_{1}({k}_{1},{k}_{2}) =𝐄{x,yhm(x)hm(y)ϕk1(xy)z,vhm(z)hm(v)ϕk2(zv)}\displaystyle={\mathbf{E}}\left\{\sum^{\neq}_{x,y}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\sum_{z,v}^{\neq}h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{2}}(z-v)\right\}
=B4ρ(4)(x,y,z,v)hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(zv)𝑑x𝑑y𝑑v𝑑z\displaystyle=\int_{B^{4}}\rho^{(4)}(x,y,z,v)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v)dxdydvdz
+B3ρ(3)(x,y,v)hm(x)hm(y)hm(x)hm(v)ϕk1(xy)ϕk2(xv)𝑑x𝑑y𝑑v\displaystyle+\int_{B^{3}}\rho^{(3)}(x,y,v)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(x-v)dxdydv
+B3ρ(3)(x,y,z)hm(x)hm(y)hm(z)hm(x)ϕk1(xy)ϕk2(zx)𝑑x𝑑y𝑑z\displaystyle+\int_{B^{3}}\rho^{(3)}(x,y,z)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(x)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-x)dxdydz
+B3ρ(3)(x,y,v)hm(x)hm(y)hm(y)hm(v)ϕk1(xy)ϕk2(yv)𝑑x𝑑y𝑑v\displaystyle+\int_{B^{3}}\rho^{(3)}(x,y,v)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(y-v)dxdydv
+B3ρ(3)(x,y,z)hm(x)hm(y)hm(z)hm(y)ϕk1(xy)ϕk2(zy)𝑑x𝑑y𝑑z\displaystyle+\int_{B^{3}}\rho^{(3)}(x,y,z)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(y)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-y)dxdydz
+B2ρ(2)(x,y)hm(x)hm(y)hm(x)hm(y)ϕk1(xy)ϕk2(xy)𝑑x𝑑y\displaystyle+\int_{B^{2}}\rho^{(2)}(x,y)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(x-y)dxdy
+B2ρ(2)(x,y)hm(x)hm(y)hm(y)hm(x)ϕk1(xy)ϕk2(yx)𝑑x𝑑y.\displaystyle+\int_{B^{2}}\rho^{(2)}(x,y)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(x)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(y-x)dxdy.

We will be able to use the orthogonality of the data tapers \{h_{{m}}(x)\}, but this is easier in the Fourier domain, where we can assume local smoothness of the Fourier transform of \rho^{(n)}(\cdot) and carry out the usual Taylor expansion. We now start by implementing the calculations for the Poisson case, to see the mechanics.

The next term in the expansion is then

V_{2}({k}_{1},{k}_{2})={\mathbf{E}}\left\{\sum_{v}h_{{m}}^{2}(v)\sum^{\neq}_{x,y}h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)+\sum_{v}h_{{m^{\prime}}}^{2}(v)\sum^{\neq}_{x,y}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\right\}
=B3ρ(3)(x,y,v)hm2(v)hm(x)hm(y)ϕk2(xy)𝑑x𝑑y𝑑v+B2ρ(2)(x,y)hm(x)hm(y)hm2(x)ϕk2(xy)𝑑x𝑑y\displaystyle=\int_{B^{3}}\rho^{(3)}(x,y,v)h_{{m}}^{2}(v)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\;dxdydv+\iint_{B^{2}}\rho^{(2)}(x,y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)h_{{m}}^{2}(x)\phi_{{k}_{2}}(x-y)\;dxdy
+B2ρ(2)(x,y)hm2(y)hm(x)hm(y)ϕk2(xy)𝑑x𝑑y+B3ρ(3)(x,y,v)hm2(v)hm(x)hm(y)ϕk1(xy)𝑑x𝑑y𝑑v\displaystyle+\iint_{B^{2}}\rho^{(2)}(x,y)h_{{m}}^{2}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\;dxdy+\int_{B^{3}}\rho^{(3)}(x,y,v)h_{{m^{\prime}}}^{2}(v)h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\;dxdydv
+B2ρ(2)(x,y)hm2(x)hm(x)hm(y)ϕk1(xy)𝑑x𝑑y+B2ρ(2)(x,y)hm2(y)hm(x)hm(y)ϕk1(xy)𝑑x𝑑y.\displaystyle+\iint_{B^{2}}\rho^{(2)}(x,y)h_{{m^{\prime}}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\;dxdy+\iint_{B^{2}}\rho^{(2)}(x,y)h_{{m^{\prime}}}^{2}(y)h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\;dxdy.

The final term, in turn using (9), is

V3\displaystyle V_{3} =𝐄[xhm2(x)yhm2(y)]=𝐄xyhm2(x)hm2(y)+𝐄xhm2(x)hm2(x)\displaystyle={\mathbf{E}}\left[\sum_{x}h_{{m}}^{2}(x)\sum_{y}h_{{m^{\prime}}}^{2}(y)\right]={\mathbf{E}}\sum_{x\neq y}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(y)+{\mathbf{E}}\sum_{x}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)
=B2ρ(2)(x,y)hm2(x)hm2(y)𝑑x𝑑y+λBhm2(x)hm2(x)𝑑x.\displaystyle=\int_{B^{2}}\rho^{(2)}(x,y)h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(y)\,dx\,dy+\lambda\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\,dx. (96)

We can put these terms together to determine the covariance between the periodogram at two different wavenumbers and using two different tapers as follows:

{\mathrm{\bf Cov}}\{I^{t}_{{m}}({k}_{1}),I^{t}_{{m^{\prime}}}({k}_{2})\}=\int_{B^{4}}\rho^{(4)}(x,y,z,v)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v)\,dxdydvdz
+\int_{B^{3}}\rho^{(3)}(x,y,v)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(x-v)\,dxdydv
+\int_{B^{3}}\rho^{(3)}(x,y,z)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(x)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-x)\,dxdydz
+\int_{B^{3}}\rho^{(3)}(x,y,v)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(y-v)\,dxdydv
+\int_{B^{3}}\rho^{(3)}(x,y,z)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(y)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-y)\,dxdydz
+\int_{B^{2}}\rho^{(2)}(x,y)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(x-y)\,dxdy
+\int_{B^{2}}\rho^{(2)}(x,y)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(x)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(y-x)\,dxdy
+\int_{B^{3}}\rho^{(3)}(x,y,v)h_{{m}}^{2}(v)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\,dxdydv+\int_{B^{2}}\rho^{(2)}(x,y)h_{{m}}^{2}(x)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\,dxdy
+\int_{B^{2}}\rho^{(2)}(x,y)h_{{m}}^{2}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\,dxdy+\int_{B^{3}}\rho^{(3)}(x,y,v)h_{{m^{\prime}}}^{2}(v)h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\,dxdydv
+\int_{B^{2}}\rho^{(2)}(x,y)h_{{m^{\prime}}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\,dxdy+\int_{B^{2}}\rho^{(2)}(x,y)h_{{m^{\prime}}}^{2}(y)h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\,dxdy
+\int_{B^{2}}\rho^{(2)}(x,y)h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(y)\,dxdy+\lambda\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\,dx
-\left\{\lambda+\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\rho^{(2)}(x-y)\,dxdy\right\}\left\{\lambda+\int_{B^{2}}h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\rho^{(2)}(x-y)\,dxdy\right\}
=\int_{B^{4}}\rho^{(4)}(x,y,z,v)m^{(4)}(x,y,z,v)\,dxdydzdv+\int_{B^{3}}\rho^{(3)}(x,y,z)m^{(3)}(x,y,z)\,dxdydz
+\int_{B^{2}}\rho^{(2)}(x,y)m^{(2)}(x,y)\,dxdy
-\left\{\lambda+\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\rho^{(2)}(x-y)\,dxdy\right\}\left\{\lambda+\int_{B^{2}}h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\rho^{(2)}(x-y)\,dxdy\right\},

this defining m(4)(x,y,z,v)m^{(4)}(x,y,z,v), m(3)(x,y,z)m^{(3)}(x,y,z) and m(2)(x,y)m^{(2)}(x,y), respectively.

We start by noting that the multiplier of ρ(3)\rho^{(3)} takes the form of

m(3)(x,y,z)\displaystyle m^{(3)}(x,y,z) =hm(x)hm(y)hm(x)hm(z)ϕk1(xy)ϕk2(xz)+hm(x)hm(y)hm(z)hm(x)ϕk1(xy)ϕk2(xz)\displaystyle=h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(z)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(x-z)+h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(x)\phi_{{k}_{1}}(x-y)\phi_{-{k}_{2}}(x-z)
+hm(x)hm(y)hm(y)hm(z)ϕk1(xy)ϕk2(yz)+hm(x)hm(y)hm(z)hm(y)ϕk1(xy)ϕk2(yz)\displaystyle+h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(z)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(y-z)+h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(y)\phi_{{k}_{1}}(x-y)\phi_{-{{k}_{2}}}(y-z)
+hm2(z)hm(x)hm(y)ϕk2(xy)+hm2(z)hm(x)hm(y)ϕk1(xy)\displaystyle+h_{{m}}^{2}(z)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)+h_{{m^{\prime}}}^{2}(z)h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)
=hm(x)hm(y)hm(x)hm(z)ϕk1(xy){ϕk2(xz)+ϕk2(xz)}\displaystyle=h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(z)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-z)+\phi_{-{{k}_{2}}}(x-z)\right\}
+hm(x)hm(y)hm(y)hm(z)ϕk1(xy){ϕk2(yz)+ϕk2(yz)}\displaystyle+h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(z)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(y-z)+\phi_{-{{k}_{2}}}(y-z)\right\}
+ϕk2(xy)hm2(z)hm(x)hm(y)+ϕk1(xy)hm2(z)hm(x)hm(y)\displaystyle+\phi_{{k}_{2}}(x-y)h_{{m}}^{2}(z)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+\phi_{{k}_{1}}(x-y)h_{{m^{\prime}}}^{2}(z)h_{{m}}(x)h_{{m}}(y)

Looking at the positive term multiplying ρ(2)(x,y)\rho^{(2)}(x,y) we can simplify to

m(2)(x,y)\displaystyle m^{(2)}(x,y) =hm(x)hm(y)hm(x)hm(y)ϕk1(xy)ϕk2(xy)+hm(x)hm(y)hm(y)hm(x)ϕk1(xy)ϕk2(yx)\displaystyle=h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(x-y)+h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(x)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(y-x)
+hm2(x)hm(x)hm(y)ϕk2(xy)+hm2(y)hm(x)hm(y)ϕk1(xy)+hm2(x)hm(x)hm(y)ϕk1(xy)\displaystyle+h_{{m}}^{2}(x)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)+h_{{m}}^{2}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{1}}(x-y)+h_{{m^{\prime}}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)
+hm2(x)hm(x)hm(y)ϕk2(xy)+hm2(x)hm2(y)\displaystyle+h_{{m^{\prime}}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{2}}(x-y)+h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(y)
=hm(x)hm(y)hm(x)hm(y)ϕk1(xy){ϕk2(xy)+ϕk2(xy)}\displaystyle=h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-y)+\phi_{-{k_{2}}}(x-y)\right\}
+{hm2(x)hm(x)hm(y)+hm2(x)hm(x)hm(y)}{ϕk2(xy)+ϕk1(xy)}+hm2(x)hm2(y).\displaystyle+\left\{h_{{m}}^{2}(x)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+h_{{m^{\prime}}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\right\}\cdot\left\{\phi_{{k}_{2}}(x-y)+\phi_{{k}_{1}}(x-y)\right\}+h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(y).

Then it follows that we can simplify the final term:

\left\{\lambda+\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\rho^{(2)}(x-y)\,dxdy\right\}\left\{\lambda+\int_{B^{2}}h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\rho^{(2)}(x-y)\,dxdy\right\}
=λ2+λB2(hm(x)hm(y)ϕk1(xy)+hm(x)hm(y)ϕk2(xy))ρ(2)(xy)𝑑x𝑑y\displaystyle=\lambda^{2}+\lambda\int_{B^{2}}(h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)+h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y))\rho^{(2)}(x-y)dxdy
+{B2hm(x)hm(y)ϕk1(xy)ρ(2)(xy)𝑑x𝑑y}{B2hm(x)hm(y)ϕk2(xy)ρ(2)(xy)𝑑x𝑑y}\displaystyle+\left\{\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\rho^{(2)}(x-y)dxdy\right\}\left\{\int_{B^{2}}h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\rho^{(2)}(x-y)dxdy\right\}

We then get for the tapered periodogram:

𝐂𝐨𝐯\displaystyle{\mathrm{\bf Cov}} {Imt(k1),Imt(k2)}=B4ρ(4)(x,y,z,v)hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(zv)𝑑x𝑑y𝑑v𝑑z\displaystyle\{I^{t}_{{m}}({{k}_{1}}),I^{t}_{{m^{\prime}}}({{k}_{2}})\}=\int_{B^{4}}\rho^{(4)}(x,y,z,v)h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v)\;dxdydvdz
+B3ρ(3)(x,y,v)[hm(x)hm(y)hm(x)hm(v)ϕk1(xy){ϕk2(xv)+ϕk2(xv)}\displaystyle+\int_{B^{3}}\rho^{(3)}(x,y,v)\left[h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-v)+\phi_{-{{k}_{2}}}(x-v)\right\}\right.
+hm(x)hm(y)hm(y)hm(v)ϕk1(xy){ϕk2(yv)+ϕk2(yv)}\displaystyle+h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(v)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(y-v)+\phi_{-{{k}_{2}}}(y-v)\right\}
+ϕk2(xy)hm2(v)hm(x)hm(y)+ϕk1(xy)hm2(v)hm(x)hm(y)]dxdydv\displaystyle\left.+\phi_{{k}_{2}}(x-y)h_{{m}}^{2}(v)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+\phi_{{k}_{1}}(x-y)h_{{m^{\prime}}}^{2}(v)h_{{m}}(x)h_{{m}}(y)\right]\;dxdydv
+B2ρ(2)(x,y)[hm(x)hm(y)hm(x)hm(y)ϕk1(xy){ϕk2(xy)+ϕk2(xy)}\displaystyle+\int_{B^{2}}\rho^{(2)}(x,y)\left[h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-y)+\phi_{-{{k}_{2}}}(x-y)\right\}\right.
+{hm2(x)hm(x)hm(y)+hm2(x)hm(x)hm(y)}{ϕk2(xy)+ϕk1(xy)}+hm2(x)hm2(y)]dxdy\displaystyle+\left\{h_{{m}}^{2}(x)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+h_{{m^{\prime}}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\right\}\left.\left\{\phi_{{k}_{2}}(x-y)+\phi_{{k}_{1}}(x-y)\right\}+h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(y)\right]\,dxdy
+λBhm2(x)hm2(x)𝑑xλ2λB2hm(x)hm(y)hm(x)hm(y)(ϕk1(xy)+ϕk2(xy))ρ(2)(x,y)𝑑x𝑑y\displaystyle+\lambda\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\,dx-\lambda^{2}-\lambda\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)(\phi_{{k}_{1}}(x-y)+\phi_{{k}_{2}}(x-y))\rho^{(2)}(x,y)dxdy
{B2hm(x)hm(y)ϕk1(xy)ρ(2)(x,y)𝑑x𝑑y}{B2hm(x)hm(y)ϕk2(xy)ρ(2)(x,y)𝑑x𝑑y}.\displaystyle-\left\{\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\rho^{(2)}(x,y)dxdy\right\}\cdot\left\{\int_{B^{2}}h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}_{2}}(x-y)\rho^{(2)}(x,y)dxdy\right\}.

This gives the covariance between the tapered periodograms with different tapers, at different wavenumbers, and is quite a useful general expression, as the following corollaries will show.

Appendix G Proof of Corollary 5.1 Part I

Proof.

This corollary determines both the variance of a single-taper periodogram and the cross-covariance between tapers at a fixed wavenumber {k} for a Poisson process. If we start from the Poisson assumption (namely (42)) then we find

𝐂𝐨𝐯\displaystyle{\mathrm{\bf Cov}} {Imt(k),Imt(k)}=B4λ4hm(x)hm(y)hm(z)hm(v)ϕk(xy)ϕk(zv)𝑑x𝑑y𝑑v𝑑z\displaystyle\{I^{t}_{{m}}({{k}}),I^{t}_{{m^{\prime}}}({{k}})\}=\int_{B^{4}}\lambda^{4}h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(z)h_{{m^{\prime}}}(v)\phi_{{k}}(x-y)\phi_{{k}}(z-v)\;dxdydvdz
+B3λ3[hm(x)hm(y)hm(x)hm(v)ϕk(xy){ϕk(xv)+ϕk(xv)}\displaystyle+\int_{B^{3}}\lambda^{3}\left[h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(v)\phi_{{k}}(x-y)\left\{\phi_{{k}}(x-v)+\phi_{-{{k}}}(x-v)\right\}\right.
+hm(x)hm(y)hm(y)hm(v)ϕk(xy){ϕk(yv)+ϕk(yv)}\displaystyle+h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(v)\phi_{{k}}(x-y)\left\{\phi_{{k}}(y-v)+\phi_{-{{k}}}(y-v)\right\}
+ϕk(xy)hm2(v)hm(x)hm(y)+ϕk(xy)hm2(v)hm(x)hm(y)]dxdydv\displaystyle\left.+\phi_{{k}}(x-y)h_{{m}}^{2}(v)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+\phi_{{k}}(x-y)h_{{m^{\prime}}}^{2}(v)h_{{m}}(x)h_{{m}}(y)\right]\;dxdydv
+B2λ2[hm(x)hm(y)hm(x)hm(y)ϕk(xy){ϕk(xy)+ϕk(xy)}\displaystyle+\int_{B^{2}}\lambda^{2}\left[h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}}(x-y)\left\{\phi_{{k}}(x-y)+\phi_{-{{k}}}(x-y)\right\}\right.
+{hm2(x)hm(x)hm(y)+hm2(x)hm(x)hm(y)}{ϕk(xy)+ϕk(xy)}+hm2(x)hm2(y)]dxdy\displaystyle+\left\{h_{{m}}^{2}(x)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+h_{{m^{\prime}}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\right\}\left.\left\{\phi_{{k}}(x-y)+\phi_{{k}}(x-y)\right\}+h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(y)\right]\,dxdy
+λBhm2(x)hm2(x)𝑑xλ2λB2hm(x)hm(y)hm(x)hm(y)(ϕk(xy)+ϕk(xy))λ2𝑑x𝑑y\displaystyle+\lambda\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\,dx-\lambda^{2}-\lambda\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)(\phi_{{k}}(x-y)+\phi_{{k}}(x-y))\lambda^{2}dxdy
{B2hm(x)hm(y)ϕk(xy)λ2𝑑x𝑑y}{B2hm(x)hm(y)ϕk(xy)λ2𝑑x𝑑y}.\displaystyle-\left\{\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)\phi_{{k}}(x-y)\lambda^{2}dxdy\right\}\cdot\left\{\int_{B^{2}}h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}}(x-y)\lambda^{2}dxdy\right\}.

We now need to massage this expression in order to understand the correlation. We first note that the last and first terms cancel exactly, leaving us with

𝐂𝐨𝐯\displaystyle{\mathrm{\bf Cov}} {Imt(k),Imt(k)}=λ3B3[hm(x)hm(y)hm(x)hm(v)ϕk(xy){ϕk(xv)+ϕk(xv)}\displaystyle\{I^{t}_{{m}}({{k}}),I^{t}_{{m^{\prime}}}({{k}})\}=\lambda^{3}\int_{B^{3}}\left[h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(v)\phi_{{k}}(x-y)\left\{\phi_{{k}}(x-v)+\phi_{-{{k}}}(x-v)\right\}\right.
+hm(x)hm(y)hm(y)hm(v)ϕk(xy){ϕk(yv)+ϕk(yv)}]dxdydv\displaystyle\left.+h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(v)\phi_{{k}}(x-y)\left\{\phi_{{k}}(y-v)+\phi_{-{{k}}}(y-v)\right\}\right]\;dxdydv
+λ3B2ϕk(xy)hm(x)hm(y)+ϕk(xy)hm(x)hm(y)dxdy\displaystyle+\lambda^{3}\int_{B^{2}}\phi_{{k}}(x-y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+\phi_{{k}}(x-y)h_{{m}}(x)h_{{m}}(y)\;dxdy
+λ2B2[hm(x)hm(y)hm(x)hm(y)ϕk(xy){ϕk(xy)+ϕk(xy)}\displaystyle+\lambda^{2}\int_{B^{2}}\left[h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}}(x-y)\left\{\phi_{{k}}(x-y)+\phi_{-{{k}}}(x-y)\right\}\right.
+{hm2(x)hm(x)hm(y)+hm2(x)hm(x)hm(y)}2ϕk(xy)]dxdy\displaystyle+\left\{h_{{m}}^{2}(x)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+h_{{m^{\prime}}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\right\}\left.2\phi_{{k}}(x-y)\right]\,dxdy
+λBhm2(x)hm2(x)𝑑x2λ3B2hm(x)hm(y)hm(x)hm(y)ϕk(xy)𝑑x𝑑y\displaystyle+\lambda\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\,dx-2\lambda^{3}\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}}(x-y)\;dxdy
=j=17Tj(k),\displaystyle=\sum_{j=1}^{7}T_{j}({k}),

this defining the sequence {Tj(k)}\{T_{j}({k})\}. We need some extra relationships to simplify these expressions. We note that

Bhm(x)hm(x)𝑑x=δmm\displaystyle\int_{B}h_{{m}}(x)h_{{m^{\prime}}}(x)\;dx=\delta_{{{m}}{{m^{\prime}}}} (97)
\displaystyle\int_{B}h_{{m}}(x)\phi_{2{k}}(x)\;dx\approx 0,\quad\|{k}\|>b_{h}>0, (98)
\displaystyle\int H_{{m}}({k}^{\prime}-2{k})H_{{m^{\prime}}}({k}^{\prime})\;d{k}^{\prime}\approx 0,\quad\|{k}\|>b_{h}>0, (99)

where bhb_{h} is the bandwidth of the window defined in (43). We find
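The orthonormality relation (97) can be checked numerically for a concrete taper family. A minimal sketch, assuming the sine tapers h_{{m}}(x)=\sqrt{2/l}\sin(\pi m x/l) on [0,l] (our illustrative choice, not necessarily the family prescribed by (43)):

```python
import numpy as np

# Numerical check of the orthogonality relation (97) for the sine taper
# family h_m(x) = sqrt(2/l) sin(pi m x / l) on [0, l] -- an illustrative
# orthonormal family, not necessarily the one prescribed by (43).
l = 1.0
n = 20000
x = (np.arange(n) + 0.5) * l / n          # midpoint quadrature grid
dx = l / n

def h(m):
    return np.sqrt(2.0 / l) * np.sin(np.pi * m * x / l)

ms = (1, 2, 3)
gram = np.array([[np.sum(h(m) * h(mp)) * dx for mp in ms] for m in ms])
print(np.round(gram, 8))                  # approximately the 3 x 3 identity
```

The Gram matrix of the first three tapers is numerically the identity, which is exactly the Kronecker delta \delta_{{m}{m^{\prime}}} appearing in (97).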

T1(k)\displaystyle T_{1}({k}) =λ3B3hm(x)hm(y)hm(x)hm(v)ϕk(xy){ϕk(xv)+ϕk(xv)}𝑑x𝑑y𝑑v\displaystyle=\lambda^{3}\int_{B^{3}}h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(v)\phi_{{k}}(x-y)\left\{\phi_{{k}}(x-v)+\phi_{-{{k}}}(x-v)\right\}\;dxdydv
=λ3Hm(k)Hm(k)Hm(k2k)Hm(k)𝑑k+λ3δmmHm(k)Hm(k).\displaystyle=\lambda^{3}H_{{m}}(-{k})H_{{m^{\prime}}}(-{k})\int H_{{m}}({k}^{\prime}-2{k})H_{{m^{\prime}}}({k}^{\prime})\;d{k}^{\prime}+\lambda^{3}\delta_{{{m}}{{m^{\prime}}}}H_{{m}}({k})H_{{m^{\prime}}}(-{k}). (100)

The next term is

T2(k)\displaystyle T_{2}({k}) =λ3B3hm(x)hm(y)hm(y)hm(v)ϕk(xy){ϕk(yv)+ϕk(yv)}𝑑x𝑑y𝑑v\displaystyle=\lambda^{3}\int_{B^{3}}h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(y)h_{{m^{\prime}}}(v)\phi_{{k}}(x-y)\left\{\phi_{{k}}(y-v)+\phi_{-{{k}}}(y-v)\right\}\;dxdydv
=λ3δmmHm(k)Hm(k)+λ3Hm(k)Hm(k)Hm(k2k)Hm(k)𝑑k.\displaystyle=\lambda^{3}\delta_{{{m}}{{m^{\prime}}}}H_{{m}}({k})H_{{m^{\prime}}}(-{k})+\lambda^{3}H_{{m}}(-{k})H_{{m^{\prime}}}(-{k})\int H_{{m}}({k}^{\prime}-2{k})H_{{m^{\prime}}}({k}^{\prime})\;d{k}^{\prime}. (101)

The following term is

T3(k)\displaystyle T_{3}({k}) =λ3B2ϕk(xy)hm(x)hm(y)+ϕk(xy)hm(x)hm(y)dxdy\displaystyle=\lambda^{3}\int_{B^{2}}\phi_{{k}}(x-y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+\phi_{{k}}(x-y)h_{{m}}(x)h_{{m}}(y)\;dxdy
=λ3Hm(k)Hm(k)+λ3Hm(k)Hm(k).\displaystyle=\lambda^{3}H_{{m^{\prime}}}({k})H_{{m^{\prime}}}(-{k})+\lambda^{3}H_{{m}}({k})H_{{m}}(-{k}).

The next term is

T4(k)\displaystyle T_{4}({k}) =λ2B2[hm(x)hm(y)hm(x)hm(y)ϕk(xy){ϕk(xy)+ϕk(xy)}]𝑑x𝑑y\displaystyle=\lambda^{2}\int_{B^{2}}\left[h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}}(x-y)\left\{\phi_{{k}}(x-y)+\phi_{-{{k}}}(x-y)\right\}\right]dxdy
=λ2|Hm(k2k)Hm(k)𝑑k|2+λ2δmm.\displaystyle=\lambda^{2}\left|\int H_{{m}}({k}^{\prime}-2{k})H_{{m^{\prime}}}({k}^{\prime})\;d{k}^{\prime}\right|^{2}+\lambda^{2}\delta_{{{m}}{{m^{\prime}}}}.

The next term then takes the form

T5(k)=2λ2B2{hm2(x)hm(x)hm(y)+hm2(x)hm(x)hm(y)}ϕk(xy)𝑑x𝑑y\displaystyle T_{5}({k})=2\lambda^{2}\int_{B^{2}}\left\{h_{{m}}^{2}(x)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)+h_{{m^{\prime}}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\right\}\phi_{{k}}(x-y)\,dxdy
=2λ2{Hm(k)2dHm(k′′)Hm(kk′′)Hm(kk)𝑑k𝑑k′′+Hm(k)2dHm(k′′)Hm(kk′′)Hm(kk)𝑑k𝑑k′′}.\displaystyle=2\lambda^{2}\left\{H_{{m^{\prime}}}(-{k})\int_{{\mathbb{R}}^{2d}}H_{{m}}({k}^{\prime\prime})H_{{m}}({k}^{\prime}-{k}^{\prime\prime})H_{{m^{\prime}}}({k}-{k}^{\prime})\,d{k}^{\prime}d{k}^{\prime\prime}+H_{{m}}(-{k})\int_{{\mathbb{R}}^{2d}}H_{{m^{\prime}}}({k}^{\prime\prime})H_{{m^{\prime}}}({k}^{\prime}-{k}^{\prime\prime})H_{{m}}({k}-{k}^{\prime})\,d{k}^{\prime}d{k}^{\prime\prime}\right\}.

The next term takes the form of

T6(k)\displaystyle T_{6}({k}) =λBhm2(x)hm2(x)𝑑xλ|B|.\displaystyle=\lambda\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\,dx\sim\frac{\lambda}{|B|}.

And the final term is negative:

T7(k)\displaystyle T_{7}({k}) =2λ3B2hm(x)hm(y)hm(x)hm(y)ϕk(xy)𝑑x𝑑y\displaystyle=-2\lambda^{3}\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)h_{{m^{\prime}}}(x)h_{{m^{\prime}}}(y)\phi_{{k}}(x-y)\;dxdy
=2λ3|Hm(kk)Hm(k)𝑑k|2.\displaystyle=-2\lambda^{3}\left|\int H_{{m}}({k}-{k}^{\prime})H_{{m^{\prime}}}({k}^{\prime})\;d{k}^{\prime}\right|^{2}.

As we have assumed the wavenumber {k} is sufficiently large, both |H_{{m}}({k})|\approx 0 and |H_{{m^{\prime}}}({k})|\approx 0. This means the only surviving contributions are

𝐂𝐨𝐯{Imt(k),Imt(k)}λBhm2(x)hm2(x)𝑑x+λ2δmm.{\mathrm{\bf Cov}}\{I^{t}_{{m}}({{k}}),I^{t}_{{m^{\prime}}}({{k}})\}\approx\lambda\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\,dx+\lambda^{2}\delta_{{{m}}{{m^{\prime}}}}.

This yields the stated result. We can derive the o(1)o(1) terms formally should we wish by quantifying the leakage in ll, the length of a side. ∎
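A Monte Carlo sketch of this statement (illustrative, not the paper's code; separable sine tapers are our own choice of orthogonal family): for a planar Poisson process, periodograms built from two orthogonal tapers should be nearly uncorrelated at a wavenumber away from the origin.

```python
import numpy as np

# Monte Carlo sketch (illustrative, not the paper's code): periodograms from
# two orthogonal tapers are nearly uncorrelated for a Poisson process at a
# wavenumber away from the origin, as the surviving covariance terms predict.
rng = np.random.default_rng(1)
lam, l = 300.0, 1.0                       # intensity; B = [0, l]^2
k = np.array([10.0, 0.0])

def taper(pts, m):
    # h_m(x, y) = product of 1-d sine tapers of order m, int_B h_m^2 = 1
    return np.prod(np.sqrt(2.0 / l) * np.sin(np.pi * m * pts / l), axis=-1)

reps = 600
I1, I2 = np.empty(reps), np.empty(reps)
for r in range(reps):
    npts = rng.poisson(lam * l ** 2)
    pts = rng.uniform(0.0, l, size=(npts, 2))
    phase = np.exp(-2j * np.pi * pts @ k)
    I1[r] = np.abs(np.sum(taper(pts, 1) * phase)) ** 2
    I2[r] = np.abs(np.sum(taper(pts, 2) * phase)) ** 2

corr = np.corrcoef(I1, I2)[0, 1]
print(I1.mean(), I2.mean(), corr)         # means near lam; correlation near 0
```

The residual cross-taper covariance \lambda\int h_{{m}}^{2}h_{{m^{\prime}}}^{2} is of order \lambda, while each variance is of order \lambda^{2}, so the correlation is O(1/\lambda), consistent with the small sample correlation observed.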

Appendix H Proof of Corollary 5.1 Part II

Proof.

If we start from the Poisson assumption (namely (42)) then

𝐂𝐨𝐯\displaystyle{\mathrm{\bf Cov}} {Imt(k1),Imt(k2)}=B4λ4hm(x)hm(y)hm(z)hm(v)ϕk1(xy)ϕk2(zv)𝑑x𝑑y𝑑v𝑑z\displaystyle\{I^{t}_{{m}}({{k}_{1}}),I^{t}_{{m}}({{k}_{2}})\}=\int_{B^{4}}\lambda^{4}h_{{m}}(x)h_{{m}}(y)h_{{m}}(z)h_{{m}}(v)\phi_{{k}_{1}}(x-y)\phi_{{k}_{2}}(z-v)\;dxdydvdz
+B3λ3[hm2(x)hm(y)hm(v)ϕk1(xy){ϕk2(xv)+ϕk2(xv)}\displaystyle+\int_{B^{3}}\lambda^{3}\left[h_{{m}}^{2}(x)h_{{m}}(y)h_{{m}}(v)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-v)+\phi_{-{{k}_{2}}}(x-v)\right\}\right.
+hm(x)hm2(y)hm(v)ϕk1(xy){ϕk2(yv)+ϕk2(yv)}\displaystyle+h_{{m}}(x)h_{{m}}^{2}(y)h_{{m}}(v)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(y-v)+\phi_{-{{k}_{2}}}(y-v)\right\}
+ϕk2(xy)hm2(v)hm(x)hm(y)+ϕk1(xy)hm2(v)hm(x)hm(y)]dxdydv\displaystyle\left.+\phi_{{k}_{2}}(x-y)h_{{m}}^{2}(v)h_{{m}}(x)h_{{m}}(y)+\phi_{{k}_{1}}(x-y)h_{{m}}^{2}(v)h_{{m}}(x)h_{{m}}(y)\right]\;dxdydv
+B2λ2[hm(x)hm(y)hm(x)hm(y)ϕk1(xy){ϕk2(xy)+ϕk2(xy)}\displaystyle+\int_{B^{2}}\lambda^{2}\left[h_{{m}}(x)h_{{m}}(y)h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-y)+\phi_{-{{k}_{2}}}(x-y)\right\}\right.
+{hm2(x)hm(x)hm(y)+hm2(x)hm(x)hm(y)}{ϕk2(xy)+ϕk1(xy)}+hm2(x)hm2(y)]dxdy\displaystyle+\left\{h_{{m}}^{2}(x)h_{{m}}(x)h_{{m}}(y)+h_{{m}}^{2}(x)h_{{m}}(x)h_{{m}}(y)\right\}\left.\left\{\phi_{{k}_{2}}(x-y)+\phi_{{k}_{1}}(x-y)\right\}+h_{{m}}^{2}(x)h_{{m}}^{2}(y)\right]\,dxdy
+λBhm4(x)𝑑xλ2λB2hm2(x)hm2(y)(ϕk1(xy)+ϕk2(xy))λ2𝑑x𝑑y\displaystyle+\lambda\int_{B}h_{{m}}^{4}(x)\,dx-\lambda^{2}-\lambda\int_{B^{2}}h_{{m}}^{2}(x)h_{{m}}^{2}(y)(\phi_{{k}_{1}}(x-y)+\phi_{{k}_{2}}(x-y))\lambda^{2}dxdy
{B2hm(x)hm(y)ϕk1(xy)λ2𝑑x𝑑y}{B2hm(x)hm(y)ϕk2(xy)λ2𝑑x𝑑y}\displaystyle-\left\{\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{1}}(x-y)\lambda^{2}dxdy\right\}\cdot\left\{\int_{B^{2}}h_{{m}}(x)h_{{m}}(y)\phi_{{k}_{2}}(x-y)\lambda^{2}dxdy\right\}

Continuing on with the calculations we find:

𝐂𝐨𝐯\displaystyle{\mathrm{\bf Cov}} {Imt(k1),Imt(k2)}=B3λ3[hm2(x)hm(y)hm(v)ϕk1(xy){ϕk2(xv)+ϕk2(xv)}\displaystyle\{I^{t}_{{m}}({{k}_{1}}),I^{t}_{{m}}({{k}_{2}})\}=\int_{B^{3}}\lambda^{3}\left[h_{{m}}^{2}(x)h_{{m}}(y)h_{{m}}(v)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-v)+\phi_{-{{k}_{2}}}(x-v)\right\}\right.
+hm(x)hm2(y)hm(v)ϕk1(xy){ϕk2(yv)+ϕk2(yv)}\displaystyle+h_{{m}}(x)h_{{m}}^{2}(y)h_{{m}}(v)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(y-v)+\phi_{-{{k}_{2}}}(y-v)\right\}
+ϕk2(xy)hm2(v)hm(x)hm(y)+ϕk1(xy)hm2(v)hm(x)hm(y)]dxdydv\displaystyle\left.+\phi_{{k}_{2}}(x-y)h_{{m}}^{2}(v)h_{{m}}(x)h_{{m}}(y)+\phi_{{k}_{1}}(x-y)h_{{m}}^{2}(v)h_{{m}}(x)h_{{m}}(y)\right]\;dxdydv
+B2λ2[hm2(x)hm2(y)ϕk1(xy){ϕk2(xy)+ϕk2(xy)}\displaystyle+\int_{B^{2}}\lambda^{2}\left[h_{{m}}^{2}(x)h_{{m}}^{2}(y)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-y)+\phi_{-{{k}_{2}}}(x-y)\right\}\right.
+{hm3(x)hm(y)+hm3(x)hm(y)}{ϕk2(xy)+ϕk1(xy)}]dxdy\displaystyle+\left\{h_{{m}}^{3}(x)h_{{m}}(y)+h_{{m}}^{3}(x)h_{{m}}(y)\right\}\left.\left\{\phi_{{k}_{2}}(x-y)+\phi_{{k}_{1}}(x-y)\right\}\right]\,dxdy
+λBhm4(x)𝑑xλ3B2hm2(x)hm2(y)(ϕk1(xy)+ϕk2(xy))𝑑x𝑑y.\displaystyle+\lambda\int_{B}h_{{m}}^{4}(x)\,dx-\lambda^{3}\int_{B^{2}}h_{{m}}^{2}(x)h_{{m}}^{2}(y)(\phi_{{k}_{1}}(x-y)+\phi_{{k}_{2}}(x-y))dxdy.

As in the previous case we split this into seven parts \{\widetilde{T}_{j}({k})\}. We find that

T~1(k)\displaystyle\widetilde{T}_{1}({k}) =λ3B3hm2(x)hm(y)hm(v)ϕk1(xy){ϕk2(xv)+ϕk2(xv)}𝑑x𝑑y𝑑v\displaystyle=\lambda^{3}\int_{B^{3}}h_{{m}}^{2}(x)h_{{m}}(y)h_{{m}}(v)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-v)+\phi_{-{{k}_{2}}}(x-v)\right\}\;dxdydv
=λ3Hm(k1)Hm(k2)dHm(k)Hm(k1+k2k)𝑑k\displaystyle=\lambda^{3}H_{{m}}(-{k}_{1})H_{{m}}(-{k}_{2})\int_{{\mathbb{R}}^{d}}H_{{m}}({k}^{\prime})H_{{m}}({k}_{1}+{k}_{2}-{k}^{\prime})\;d{k}^{\prime}
+λ3Hm(k1)Hm(k2)dHm(k)Hm(k1k2k)𝑑k.\displaystyle+\lambda^{3}H_{{m}}(-{k}_{1})H_{{m}}({k}_{2})\int_{{\mathbb{R}}^{d}}H_{{m}}({k}^{\prime})H_{{m}}({k}_{1}-{k}_{2}-{k}^{\prime})\;d{k}^{\prime}.

If {k}_{1}-{k}_{2} and {k}_{1}+{k}_{2} are bounded away from zero, then this term becomes negligible. The next term is

T~2(k)\displaystyle\widetilde{T}_{2}({k}) =λ3B3hm(x)hm2(y)hm(v)ϕk1(xy){ϕk2(yv)+ϕk2(yv)}𝑑x𝑑y𝑑v\displaystyle=\lambda^{3}\int_{B^{3}}h_{{m}}(x)h_{{m}}^{2}(y)h_{{m}}(v)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(y-v)+\phi_{-{{k}_{2}}}(y-v)\right\}dxdydv
=λ3Hm(k1)Hm(k2)dHm(k)Hm(k2k1k)𝑑k\displaystyle=\lambda^{3}H_{{m}}({k}_{1})H_{{m}}(-{k}_{2})\int_{{\mathbb{R}}^{d}}H_{{m}}({k}^{\prime})H_{{m}}({k}_{2}-{k}_{1}-{k}^{\prime})\;d{k}^{\prime}
+λ3Hm(k1)Hm(k2)dHm(k)Hm(k2+k1k)𝑑k.\displaystyle+\lambda^{3}H_{{m}}({k}_{1})H_{{m}}(-{k}_{2})\int_{{\mathbb{R}}^{d}}H_{{m}}({k}^{\prime})H_{{m}}({k}_{2}+{k}_{1}-{k}^{\prime})\;d{k}^{\prime}.

If ${k}_{1}-{k}_{2}$ and ${k}_{1}+{k}_{2}$ are both nonzero and fixed, then this term becomes negligible. The next term is

T~3(k)\displaystyle\widetilde{T}_{3}({k}) =λ3B3[ϕk2(xy)hm2(v)hm(x)hm(y)+ϕk1(xy)hm2(v)hm(x)hm(y)]𝑑x𝑑y𝑑v\displaystyle=\lambda^{3}\int_{B^{3}}\left[\phi_{{k}_{2}}(x-y)h_{{m}}^{2}(v)h_{{m}}(x)h_{{m}}(y)+\phi_{{k}_{1}}(x-y)h_{{m}}^{2}(v)h_{{m}}(x)h_{{m}}(y)\right]\;dxdydv
=λ3|Hm(k2)|2+λ3|Hm(k1)|2.\displaystyle=\lambda^{3}\left|H_{{{m}}}({k}_{2})\right|^{2}+\lambda^{3}\left|H_{{{m}}}({k}_{1})\right|^{2}.

As $|{k}_{1}|,|{k}_{2}|>b_{h}$, where $b_{h}$ is the bandwidth, this contribution becomes negligible. The next contribution in the expression takes the form

T~4(k)\displaystyle\widetilde{T}_{4}({k}) =B2λ2[hm2(x)hm2(y)ϕk1(xy){ϕk2(xy)+ϕk2(xy)}]𝑑x𝑑y\displaystyle=\int_{B^{2}}\lambda^{2}\left[h_{{m}}^{2}(x)h_{{m}}^{2}(y)\phi_{{k}_{1}}(x-y)\left\{\phi_{{k}_{2}}(x-y)+\phi_{-{{k}_{2}}}(x-y)\right\}\right]dxdy
=λ2|dHm(k)Hm(k1+k2k)𝑑k|2+λ2|dHm(k)Hm(k1k2k)𝑑k|2.\displaystyle=\lambda^{2}\left|\int_{{\mathbb{R}}^{d}}H_{{m}}({k}^{\prime})H_{{m}}({k}_{1}+{k}_{2}-{k}^{\prime})\;d{k}^{\prime}\right|^{2}+\lambda^{2}\left|\int_{{\mathbb{R}}^{d}}H_{{m}}({k}^{\prime})H_{{m}}({k}_{1}-{k}_{2}-{k}^{\prime})\;d{k}^{\prime}\right|^{2}.

Then the next term is

\widetilde{T}_{5}({k})=2\lambda^{2}\int_{B^{2}}h_{{m}}^{3}(x)h_{{m}}(y)\left\{\phi_{{k}_{2}}(x-y)+\phi_{{k}_{1}}(x-y)\right\}\,dxdy.

Determining the size of this contribution is slightly more involved. The integral over $y$ performs a Fourier transform of $h_{{m}}(y)$ at ${k}_{1}$ and ${k}_{2}$, together with its conjugate. This Fourier transform in $y$ is supported when $|{k}_{1}|<b_{h}$ and $|{k}_{2}|<b_{h}$, but not when either magnitude grows too large. The second Fourier transform, in $x$, is somewhat more spread out; this is because we are Fourier transforming $h_{{m}}^{3}(x)$, which becomes a third-order convolution in the wavenumber domain. But as long as we can use the joint concentration in $|{k}_{1}|<b_{h}$ and $|{k}_{2}|<b_{h}$, this term will still be negligible.

The next term is given by

T~6(k)\displaystyle\widetilde{T}_{6}({k}) =λBhm4(x)𝑑x=λhm44=o(1).\displaystyle=\lambda\int_{B}h_{{m}}^{4}(x)\,dx=\lambda\|h_{{m}}\|_{4}^{4}=o(1).

To see why this is true, note that if $h_{{m}}(x)$ is constant then we can easily calculate the higher order norms. Using the Cauchy–Schwarz inequality, we find that if $m\neq m^{\prime}$ then

T_{6}^{2}({k})=\left[\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\;dx\right]^{2}\leq\int_{B}h_{{m}}^{4}(x)\;dx\cdot\int_{B}h_{{m^{\prime}}}^{4}(x)\;dx
\Rightarrow 0\leq\int_{B}h_{{m}}^{2}(x)h_{{m^{\prime}}}^{2}(x)\;dx\leq\max\left\{\int_{B}h_{{m}}^{4}(x)\;dx,\int_{B}h_{{m^{\prime}}}^{4}(x)\;dx\right\}. (102)

We also note that

12\displaystyle 1^{2} =[B1h2(x)𝑑x]2B12𝑑xBh4(x)𝑑x\displaystyle=\left[\int_{B}1\cdot h^{2}(x)\;dx\right]^{2}\leq\int_{B}1^{2}\;dx\cdot\int_{B}h^{4}(x)\;dx (103)
\displaystyle\Rightarrow 1|B|Bh4(x)𝑑x.\displaystyle\frac{1}{|B|}\leq\int_{B}h^{4}(x)\;dx. (104)

We now need to determine an upper bound. To this end, consider

[Bh4(x)𝑑x]2\displaystyle\left[\int_{B}h^{4}(x)\;dx\right]^{2} B12𝑑xBh8(x)𝑑x=|B|Bh8(x)𝑑x.\displaystyle\leq\int_{B}1^{2}\;dx\cdot\int_{B}h^{8}(x)\;dx=|B|\cdot\int_{B}h^{8}(x)\;dx. (105)

Then we have

1|B|Bh4(x)𝑑x|B|Bh8(x)𝑑x.\frac{1}{|B|}\leq\int_{B}h^{4}(x)\;dx\leq\sqrt{|B|\cdot\int_{B}h^{8}(x)\;dx}. (106)

For any taper $h(x)$ we can calculate $\int_{B}h^{4}(x)\;dx$ explicitly. For the constant taper we get $\frac{1}{|B|}$. For well-concentrated tapers we would expect a similar rate of decrease, and for any particular choice of taper the value of the fourth norm can be calculated explicitly.
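These bounds are easy to verify numerically. The sketch below is a minimal illustration, assuming a one-dimensional window $B=[0,l]$ and a unit-normalised sine-arch taper (both hypothetical choices of ours, not taken from the paper); it computes $\int_{B}h^{4}(x)\,dx$ for the constant and cosine tapers and checks the bounds (104) and (106).

```python
import numpy as np

l, n = 10.0, 200_000
x = (np.arange(n) + 0.5) * (l / n)      # midpoint grid on B = [0, l]
dx = l / n

# Constant taper, normalised so that ∫ h^2 dx = 1
h_const = np.full(n, 1.0 / np.sqrt(l))

# Sine-arch (cosine-type) taper, normalised to unit L2 norm on B
h_cos = np.sin(np.pi * x / l)
h_cos /= np.sqrt(np.sum(h_cos**2) * dx)

for h in (h_const, h_cos):
    n4 = np.sum(h**4) * dx              # ∫ h^4 dx
    n8 = np.sum(h**8) * dx              # ∫ h^8 dx
    assert abs(np.sum(h**2) * dx - 1.0) < 1e-9   # unit L2 normalisation
    assert 1.0 / l <= n4 + 1e-9                  # lower bound (104)
    assert n4 <= np.sqrt(l * n8) + 1e-9          # upper bound (106)
```

For the constant taper the fourth-power integral equals $1/|B|$ exactly, while for the sine-arch taper it works out to $3/(2|B|)$, consistent with the expected $\mathcal{O}(1/|B|)$ decrease.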

Finally, the last term is

T~7(k)\displaystyle\widetilde{T}_{7}({k}) =λ3B2hm2(x)hm2(y)(ϕk1(xy)+ϕk2(xy))𝑑x𝑑y\displaystyle=-\lambda^{3}\int_{B^{2}}h_{{m}}^{2}(x)h_{{m}}^{2}(y)(\phi_{{k}_{1}}(x-y)+\phi_{{k}_{2}}(x-y))dxdy
=λ3|dHm(k)Hm(k1k)𝑑k|2λ3|dHm(k)Hm(k2k)𝑑k|2.\displaystyle=-\lambda^{3}\left|\int_{{\mathbb{R}}^{d}}H_{{m}}({k}^{\prime})H_{{m}}({k}_{1}-{k}^{\prime})\;d{k}^{\prime}\right|^{2}-\lambda^{3}\left|\int_{{\mathbb{R}}^{d}}H_{{m}}({k}^{\prime})H_{{m}}({k}_{2}-{k}^{\prime})\;d{k}^{\prime}\right|^{2}.

This bounds each individual contribution as long as $|{k}_{1}-{k}_{2}|,|{k}_{1}+{k}_{2}|>b_{h}$. This completes the proof. ∎

Appendix I Proof of Proposition 5.1

Proof.

Assume XX satisfies the assumptions given for (25) and (31). Note from Proposition 4.1 that

0\leq{\mathrm{\bf Var}}\left\{J_{h}({k})\right\}=\lambda+\lambda^{2}\int_{{\mathbb{R}}^{d}}\widetilde{f}^{(2)}({k}^{\prime})\left|H({k}-{k}^{\prime})\right|^{2}d{k}^{\prime}
\leq\lambda+\lambda^{2}\|\widetilde{f}^{(2)}\|_{0}\int_{{\mathbb{R}}^{d}}\left|H({k}-{k}^{\prime})\right|^{2}d{k}^{\prime}
=\lambda+\lambda^{2}\|\widetilde{f}^{(2)}\|_{0}<\infty. (107)

We can then deduce from [22, Theorem 6.1] that $\tilde{J}_{h}({k})$ is uniformly integrable. However, to compute the covariance of the periodogram from the convergence in law to the Gaussian, we need to show that $|\tilde{J}_{{m}}({k})|^{4}$, and indeed $|\tilde{J}_{{m}}({k}_{1})|^{2}|\tilde{J}_{{m^{\prime}}}({k}_{2})|^{2}$, are uniformly integrable. We now apply [22, Theorem 6.2] and assume that $\|\widetilde{f}^{(2)}\|_{0}$, $\|\widetilde{f}^{(3)}\|_{0}$, $\|\widetilde{f}^{(4)}\|_{0}$, $\|\widetilde{f}^{(5)}\|_{0}$ and $\|\widetilde{f}^{(6)}\|_{0}$ are all finite, which ensures that $|\tilde{J}_{{m}}({k})|^{4}$ and $|\tilde{J}_{{m}}({k}_{1})|^{2}|\tilde{J}_{{m^{\prime}}}({k}_{2})|^{2}$ are uniformly integrable. Since $\tilde{J}_{{m}}({k})$ converges in law to a Gaussian random variable, the moments of $\tilde{J}_{{m}}({k})$ can therefore be computed from the Gaussian law.

It follows that Isserlis’ [40] theorem can be applied by using [22, Theorem 6.2] and so

𝐂𝐨𝐯\displaystyle{\mathrm{\bf Cov}} {I~mt(k1),I~mt(k2)}=𝐄{J~m(k1)J~m(k1)J~m(k2)J~m(k2)}+o(1)𝐄{J~m(k1)J~m(k1)}𝐄{J~m(k2)J~m(k2)}\displaystyle\{\widetilde{I}^{t}_{{m}}({k}_{1}),\widetilde{I}^{t}_{{m^{\prime}}}({k}_{2})\}={\mathbf{E}}\{\widetilde{J}_{{m}}({k}_{1})\widetilde{J}^{\ast}_{{m}}({k}_{1})\widetilde{J}_{{m^{\prime}}}({k}_{2})\widetilde{J}^{\ast}_{{m^{\prime}}}({k}_{2})\}+o(1)-{\mathbf{E}}\{\widetilde{J}_{{m}}({k}_{1})\widetilde{J}^{\ast}_{{m}}({k}_{1})\}{\mathbf{E}}\{\widetilde{J}_{{m^{\prime}}}({k}_{2})\widetilde{J}^{\ast}_{{m^{\prime}}}({k}_{2})\}
=𝐄{J~m(k1)J~m(k2)}𝐄{J~m(k1)J~m(k2)}+𝐄{J~m(k1)J~m(k2)}𝐄{J~m(k1)J~m(k2)}+o(1)\displaystyle={\mathbf{E}}\{\widetilde{J}_{{m}}({k}_{1})\widetilde{J}_{{m^{\prime}}}({k}_{2})\}{\mathbf{E}}\{\widetilde{J}^{\ast}_{{m}}({k}_{1})\widetilde{J}^{\ast}_{{m^{\prime}}}({k}_{2})\}+{\mathbf{E}}\{\widetilde{J}_{{m}}({k}_{1})\widetilde{J}^{\ast}_{{m^{\prime}}}({k}_{2})\}{\mathbf{E}}\{\widetilde{J}^{\ast}_{{m}}({k}_{1})\widetilde{J}_{{m^{\prime}}}({k}_{2})\}+o(1)
=o(1)+|𝐄{J~m(k1)J~m(k2)}|2.\displaystyle=o(1)+\left|{\mathbf{E}}\{\widetilde{J}_{{m}}({k}_{1})\widetilde{J}^{\ast}_{{m^{\prime}}}({k}_{2})\}\right|^{2}.
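This complex-Gaussian moment factorisation is easy to check by simulation. The sketch below is an illustrative Monte Carlo check, with circularly-symmetric complex Gaussian variables standing in for the limiting law of the tapered Fourier transforms (the correlation value $\rho$ is an arbitrary choice of ours); it verifies that $\mathrm{Cov}\{|Z_1|^2,|Z_2|^2\}=|\mathbf{E}\{Z_1 Z_2^{\ast}\}|^2$ when $\mathbf{E}\{Z_1 Z_2\}=0$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000
rho = 0.6  # illustrative cross-correlation E{Z1 Z2*}

def ccg(size):
    # Circularly-symmetric complex Gaussian, unit variance, E{Z^2} = 0
    return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)

z1 = ccg(n)
z2 = rho * z1 + np.sqrt(1.0 - rho**2) * ccg(n)

i1, i2 = np.abs(z1)**2, np.abs(z2)**2
cov_emp = np.mean(i1 * i2) - np.mean(i1) * np.mean(i2)       # Cov{|Z1|^2, |Z2|^2}
cov_isserlis = abs(np.mean(z1 * np.conj(z2)))**2             # |E{Z1 Z2*}|^2

assert abs(cov_emp - cov_isserlis) < 0.05  # agreement up to Monte Carlo error
```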

We note that the same calculations as in Proposition 4.1 can be applied, and so for ${k}_{1}\neq{k}_{2}$

{\mathbf{E}}\{\widetilde{J}_{{m}}({k}_{1})\widetilde{J}^{\ast}_{{m^{\prime}}}({k}_{2})\}={\mathbf{E}}\{({J}_{{m}}({k}_{1})-\lambda H_{{m}}({k}_{1}))({J}^{\ast}_{{m^{\prime}}}({k}_{2})-\lambda H^{\ast}_{{m^{\prime}}}({k}_{2}))\}
=𝐄{Jm(k1)Jm(k2)}λ2Hm(k1)Hm(k2).\displaystyle={\mathbf{E}}\{{J}_{{m}}({k}_{1}){J}^{\ast}_{{m^{\prime}}}({k}_{2})\}-\lambda^{2}H_{{m}}({k}_{1})H^{\ast}_{{m^{\prime}}}({k}_{2}).

We now calculate the covariance as

𝐂𝐨𝐯{Jm(k1),Jm(k2)}=\displaystyle{\mathrm{\bf Cov}}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\}= 𝐄{Jm(k1)Jm(k2)}λ2Hm(k1)Hm(k2)\displaystyle{\mathbf{E}}\{{J}_{{m}}({k}_{1}){J}^{\ast}_{{m^{\prime}}}({k}_{2})\}-\lambda^{2}H_{{m}}({k}_{1})H^{\ast}_{{m^{\prime}}}({k}_{2})
=\displaystyle= λδmmδk1k2+Rd×Rdρ(2)(xy)hm(x)e2πik1xhm(y)e2πik2y𝑑x𝑑y\displaystyle\lambda\delta_{{{m}}{{m^{\prime}}}}\delta_{{k}_{1}{k}_{2}}+\iint_{R^{d}\times R^{d}}{\rho}^{(2)}(x-y)h_{{m}}(x)e^{-2\pi i{k}_{1}\cdot x}h^{\ast}_{{m^{\prime}}}(y)e^{2\pi i{k}_{2}\cdot y}\,dx\,dy
λ2Hm(k1)Hm(k2).\displaystyle-\lambda^{2}H_{{m}}({k}_{1})H^{\ast}_{{m^{\prime}}}({k}_{2}). (108)

We now define the renormalised quantity

ρ~(2)(z)=ρ(2)(z)λ2λ2,zd.\widetilde{\rho}^{(2)}(z)=\frac{{\rho}^{(2)}(z)-\lambda^{2}}{\lambda^{2}},\quad z\in{\mathbb{R}}^{d}. (109)

The expression in (108) then can be simplified to

𝐂𝐨𝐯\displaystyle{\mathrm{\bf Cov}} {Jm(k1),Jm(k2)}\displaystyle\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\}
=λδmmδk1k2+λ2Rd×Rd(ρ~(2)(xy)+1)hm(x)e2πik1xhm(y)e2πik2y𝑑x𝑑yλ2Hm(k1)Hm(k2)\displaystyle=\lambda\delta_{{{m}}{{m^{\prime}}}}\delta_{{k}_{1}{k}_{2}}+\lambda^{2}\iint_{R^{d}\times R^{d}}\left(\widetilde{\rho}^{(2)}(x-y)+1\right)h_{{m}}(x)e^{-2\pi i{k}_{1}\cdot x}h^{\ast}_{{m^{\prime}}}(y)e^{2\pi i{k}_{2}\cdot y}\,dx\,dy-\lambda^{2}H_{{m}}({k}_{1})H^{\ast}_{{m^{\prime}}}({k}_{2})
=λδmmδk1k2+λ2Rd×Rdρ~(2)(xy)hm(x)e2πik1xhm(y)e2πik2y𝑑x𝑑y\displaystyle=\lambda\delta_{{{m}}{{m^{\prime}}}}\delta_{{k}_{1}{k}_{2}}+\lambda^{2}\iint_{R^{d}\times R^{d}}\widetilde{\rho}^{(2)}(x-y)h_{{m}}(x)e^{-2\pi i{k}_{1}\cdot x}h^{\ast}_{{m^{\prime}}}(y)e^{2\pi i{k}_{2}\cdot y}\,dx\,dy
=λδmmδk1k2+λ2𝐂𝐨𝐯1{Jm(k1),Jm(k2)},\displaystyle=\lambda\delta_{{{m}}{{m^{\prime}}}}\delta_{{k}_{1}{k}_{2}}+\lambda^{2}{\mathrm{\bf Cov}}_{1}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\}, (110)

where we define

𝐂𝐨𝐯1{Jm(k1),Jm(k2)}Rd×Rdρ~(2)(xy)hm(x)e2πik1xhm(y)e2πik2y𝑑x𝑑y.{\mathrm{\bf Cov}}_{1}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\}\equiv\iint_{R^{d}\times R^{d}}\widetilde{\rho}^{(2)}(x-y)h_{{m}}(x)e^{-2\pi i{k}_{1}\cdot x}h^{\ast}_{{m^{\prime}}}(y)e^{2\pi i{k}_{2}\cdot y}\,dx\,dy. (111)

To simplify this expression we note that

𝐂𝐨𝐯1{Jm(k1),Jm(k2)}\displaystyle{\mathrm{\bf Cov}}_{1}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\} =Rd×Rdρ~(2)(xy)hm(x)e2πik1xhm(y)e2πik2y𝑑x𝑑y\displaystyle=\iint_{R^{d}\times R^{d}}\widetilde{\rho}^{(2)}(x-y)h_{{m}}(x)e^{-2\pi i{k}_{1}\cdot x}h^{\ast}_{{m^{\prime}}}(y)e^{2\pi i{k}_{2}\cdot y}\,dx\,dy
=Rd×RdRdf~(2)(k)e2πi(xy)k𝑑khm(x)e2πik1xhm(y)e2πik2y𝑑x𝑑y\displaystyle=\iint_{R^{d}\times R^{d}}\int_{R^{d}}\widetilde{f}^{(2)}({k}^{\prime})e^{2\pi i(x-y)\cdot{k}^{\prime}}d{k}^{\prime}\cdot h_{{m}}(x)e^{-2\pi i{k}_{1}\cdot x}h^{\ast}_{{m^{\prime}}}(y)e^{2\pi i{k}_{2}\cdot y}\,dx\,dy
=Rd×RdRdf~(2)(k)e2πix(k1k)e2πiy(k2k)𝑑khm(x)hm(y)𝑑x𝑑y\displaystyle=\iint_{R^{d}\times R^{d}}\int_{R^{d}}\widetilde{f}^{(2)}({k}^{\prime})e^{-2\pi ix\cdot({k}_{1}-{k}^{\prime})}e^{2\pi iy\cdot({k}_{2}-{k}^{\prime})}d{k}^{\prime}\cdot h_{{m}}(x)h^{\ast}_{{m^{\prime}}}(y)\,dx\,dy
=\int_{{\mathbb{R}}^{d}}\widetilde{f}^{(2)}({k}^{\prime})H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})\,d{k}^{\prime}. (112)

Note that this is not dimensionally contradictory to Theorem 5.1, as the periodogram is the modulus square of the Fourier transform. We note that as $H_{{m}}({k})$ has concentrated support, we can apply arguments similar to those of Appendix D, as follows. ∎

Appendix J Proof of Lemma J.1

Lemma J.1.

Assume XX satisfies the assumptions given for (25) and that f~(j)0<\|\widetilde{f}^{(j)}\|_{0}<\infty for j=2,3,4,5,6j=2,3,4,5,6. Assume the two multitapers hm(x)h_{{m}}(x) and hm(x)h_{{m^{\prime}}}(x) are orthogonal and are well concentrated on the compact set 𝒲d{\cal W}\subset{\mathbb{R}^{d}} with length scale ll so that for some chosen ϵl=o(1/l)\epsilon_{l}=o(1/l)

𝒲|Hm(k)|2𝑑k=1ϵl.\int_{\cal W}|H_{{m}}({k})|^{2}\;d{k}=1-\epsilon_{l}.

Then assuming k1,k2𝒦M(B(𝐥)){k}_{1},{k}_{2}\in{\cal K}_{M}(B_{\square}(\bm{l}))

𝐂𝐨𝐯1{Jm(k1),Jm(k2)}\displaystyle{\mathrm{\bf Cov}}_{1}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\} Rdf~(2)(k)Hm(k1k)Hm(k2k)𝑑k+o(1)\displaystyle\equiv\int_{R^{d}}\widetilde{f}^{(2)}({k}^{\prime})H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})d{k}^{\prime}+o(1)
=f~(2)(k1)δmmδk1k2+𝒪(1/l).\displaystyle=\widetilde{f}^{(2)}({k}_{1})\delta_{{{m^{\prime}}}{{m}}}\delta_{{k}_{1}{k}_{2}}+{\cal O}(1/l). (113)
Proof.

We assume that for a choice of εl\varepsilon_{l} we can define a wavenumber region 𝒲{\cal W}

𝒲|Hm(k)|2𝑑k=1εl,whereεl=o(1).\int_{{\cal W}}\left|H_{{m}}({k})\right|^{2}\;d{k}=1-\varepsilon_{l},\quad{\mathrm{where}}\quad\varepsilon_{l}=o(1). (114)

We also assume that f~(k)\tilde{f}({k}) is upper bounded by f~0\|\widetilde{f}\|_{0}. We then have

𝐂𝐨𝐯1{Jm(k1),Jm(k2)}\displaystyle{\mathrm{\bf Cov}}_{1}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\} =d\𝒲Hm(k1k)Hm(k2k)f~(k)𝑑k+𝒲Hm(k1k)Hm(k2k)f~(k)𝑑k.\displaystyle=\int_{\mathbb{R}^{d}\backslash{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})\widetilde{f}({k}^{\prime})\;d{k}^{\prime}+\int_{{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})\widetilde{f}({k}^{\prime})\;d{k}^{\prime}. (115)

We note that

|d\𝒲Hm(k1k)Hm(k2k)f~(k)𝑑k|f~0{1{1εl}}.\left|\int_{\mathbb{R}^{d}\backslash{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})\widetilde{f}({k}^{\prime})\;d{k}^{\prime}\right|\leq\|\widetilde{f}\|_{0}\{1-\{1-\varepsilon_{l}\}\}.

We can yet again utilise the Taylor expansion of (62) inside 𝒲{\cal W}. We find

𝐂𝐨𝐯1{Jm(k1),Jm(k2)}=𝒲Hm(k1k)Hm(k2k)f~(k)𝑑k{1+o(1)}\displaystyle{\mathrm{\bf Cov}}_{1}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\}=\int_{{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})\widetilde{f}({k}^{\prime})\;d{k}^{\prime}\left\{1+o(1)\right\}
=𝒲Hm(k1k)Hm(k2k){f~(k1)+f~(k1)T(kk1)+12(kk1)T𝐇~f(k′′)(kk1)}𝑑k{1+o(1)}\displaystyle=\int_{{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})\left\{\widetilde{f}({k}_{1})+\nabla\widetilde{f}({k}_{1})^{T}\left({k}^{\prime}-{k}_{1}\right)+\frac{1}{2}\left({k}^{\prime}-{k}_{1}\right)^{T}\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\left({k}^{\prime}-{k}_{1}\right)\right\}\;d{k}^{\prime}\left\{1+o(1)\right\}
=δk1k2𝒲Hm(k1k)Hm(k1k)f~(k1)𝑑k+𝒲Hm(k1k)Hm(k2k)\displaystyle=\delta_{{k}_{1}{k}_{2}}\int_{{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{1}-{k}^{\prime})\widetilde{f}({k}_{1})\;d{k}^{\prime}+\int_{{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})
×{f~(k1)T(kk1)+12(kk1)T𝐇~f(k′′)(kk1)}dk{1+o(1)}\displaystyle\times\left\{\nabla\widetilde{f}({k}_{1})^{T}\left({k}^{\prime}-{k}_{1}\right)+\frac{1}{2}\left({k}^{\prime}-{k}_{1}\right)^{T}\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\left({k}^{\prime}-{k}_{1}\right)\right\}\;d{k}^{\prime}\left\{1+o(1)\right\}
=f~(k1)δmmδk1k2+𝒪(1/l)+𝒲Hm(k1k)Hm(k2k)12(kk1)T𝐇~f(k′′)(kk1)𝑑k.\displaystyle=\widetilde{f}({k}_{1})\delta_{{{m^{\prime}}}{{m}}}\delta_{{k}_{1}{k}_{2}}+{\cal O}(1/\sqrt{l})+\int_{{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})\frac{1}{2}\left({k}^{\prime}-{k}_{1}\right)^{T}\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\left({k}^{\prime}-{k}_{1}\right)\;d{k}^{\prime}. (116)

We note that

𝒲Hm(k1k)Hm(k2k)12(kk1)T𝐇~f(k′′)(kk1)𝑑k2\displaystyle\left\|\int_{{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})\frac{1}{2}\left({k}^{\prime}-{k}_{1}\right)^{T}\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})\left({k}^{\prime}-{k}_{1}\right)\;d{k}^{\prime}\right\|^{2}
|𝐇~f(k′′)|𝒲Hm(k1k)Hm(k2k)12(kk1)T(kk1)𝑑k\displaystyle\leq|\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})|\int_{{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})\frac{1}{2}\left({k}^{\prime}-{k}_{1}\right)^{T}\left({k}^{\prime}-{k}_{1}\right)\;d{k}^{\prime}
=|\tilde{\mathbf{H}}_{f}({k}^{\prime\prime})|\int_{{\cal W}}H_{{m}}({k}_{1}-{k}^{\prime})H^{\ast}_{{m^{\prime}}}({k}_{2}-{k}^{\prime})\left|{k}^{\prime}-{k}_{1}\right|^{2}\,d{k}^{\prime}={\cal O}(1/l).

Then

{\mathrm{\bf Cov}}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\}=\lambda\delta_{{{m}}{{m^{\prime}}}}\delta_{{k}_{1}{k}_{2}}+\lambda^{2}{\mathrm{\bf Cov}}_{1}\left\{{J}_{{m}}({k}_{1}),{J}_{{m^{\prime}}}({k}_{2})\right\}
=\lambda\delta_{{{m}}{{m^{\prime}}}}\delta_{{k}_{1}{k}_{2}}+\lambda^{2}\widetilde{f}({k}_{1})\delta_{{{m^{\prime}}}{{m}}}\delta_{{k}_{1}{k}_{2}}+{\cal O}(1/\sqrt{l})+{\cal O}(1/l). ∎
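The leading $\delta_{mm^{\prime}}$ term relies on the tapers being orthonormal, so that $\int h_{m}(x)h_{m^{\prime}}(x)\,dx=\delta_{mm^{\prime}}$, which by Parseval's theorem equals $\int H_{m}({k})H_{m^{\prime}}^{\ast}({k})\,d{k}$. The sketch below is a quick numerical check with sine tapers, an illustrative orthonormal family on an interval chosen by us, not necessarily the multitapers used elsewhere in the paper.

```python
import numpy as np

l, n = 10.0, 4096
x = (np.arange(n) + 0.5) * (l / n)      # midpoint grid on [0, l]
dx = l / n

# Sine tapers: an orthonormal family on [0, l] (an illustrative choice)
tapers = np.array([np.sqrt(2.0 / l) * np.sin(np.pi * (m + 1) * x / l)
                   for m in range(4)])

# Gram matrix of ∫ h_m h_{m'} dx, which should be the identity
gram = tapers @ tapers.T * dx
assert np.allclose(gram, np.eye(4), atol=1e-6)
```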

Appendix K Proof of Proposition 7.2

Proof.

An interesting question is what happens if we use Diggle's estimator when the point process is not isotropic. We recall that in 2D the estimator takes the form

I¯0(|k|)=λ^+|B|1x,yXBJ0(2π|k|xy).\bar{I}_{0}(|{k}|)=\hat{\lambda}+|B|^{-1}\sum_{x,y\in X\cap B}^{\neq}J_{0}\left(2\pi|{k}|\cdot\|x-y\|\right). (117)
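A direct implementation of (117) is straightforward. The sketch below is our own minimal implementation for a two-dimensional pattern in a rectangular window (function and variable names are ours, and the Bessel function $J_0$ is evaluated by numerical quadrature of its integral representation rather than a library call).

```python
import numpy as np

def j0(x):
    # Bessel J0 via J0(x) = (1/pi) ∫_0^pi cos(x sin t) dt, midpoint rule
    t = (np.arange(1000) + 0.5) * (np.pi / 1000)
    return np.cos(np.outer(np.atleast_1d(x), np.sin(t))).mean(axis=1)

def diggle_isotropic_periodogram(points, area, k_abs):
    """Isotropic estimator bar{I}_0(|k|) of (117) for a 2-D pattern."""
    n = len(points)
    lam_hat = n / area
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    pair_d = d[~np.eye(n, dtype=bool)]          # all ordered pairs x != y
    return np.array([lam_hat + np.sum(j0(2.0 * np.pi * k * pair_d)) / area
                     for k in np.atleast_1d(k_abs)])

rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 10.0, size=(80, 2))      # a binomial (Poisson-like) pattern
I_bar = diggle_isotropic_periodogram(pts, area=100.0, k_abs=[0.5, 1.0, 2.0])
```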

We can still compute this estimator for any observed point process $X$, even if $X$ is not isotropic. The estimator $\bar{I}_{0}(|{k}|)$ has expectation

𝐄{I¯0(|k|)}\displaystyle{\mathbf{E}}\{\bar{I}_{0}(|{k}|)\} =λ+|B|1x,yXB𝐄J0(2π|k|xy)\displaystyle={\lambda}+|B|^{-1}\sum_{x,y\in X\cap B}^{\neq}{\mathbf{E}}J_{0}\left(2\pi|{k}|\cdot\|x-y\|\right) (118)
=λ+|B|1BBxJ0(2π|k||z|)ρ(2)(z)𝑑z𝑑x\displaystyle={\lambda}+|B|^{-1}\int_{B}\int_{B_{-x}}J_{0}\left(2\pi|{k}|\cdot|z|\right)\rho^{(2)}(z)\,dz\,dx
=λ+|B|1BBxJ0(2π|k||z|){ρ(2)(z)λ2}𝑑z𝑑x+λ2|B|1BBxJ0(2π|k||z|)𝑑z𝑑x\displaystyle={\lambda}+|B|^{-1}\int_{B}\int_{B_{-x}}J_{0}\left(2\pi|{k}|\cdot|z|\right)\left\{\rho^{(2)}(z)-\lambda^{2}\right\}\,dz\,dx+\lambda^{2}|B|^{-1}\int_{B}\int_{B_{-x}}J_{0}\left(2\pi|{k}|\cdot|z|\right)\,dz\,dx
=λ+|B|1d|B×Bz|J0(2π|k||z|){ρ(2)(z)λ2}𝑑z+λ2|B|1B|B×Bz|J0(2π|k||z|)𝑑z\displaystyle={\lambda}+|B|^{-1}\int_{{\mathbb{R}}^{d}}|B\times B_{-z}|\cdot J_{0}\left(2\pi|{k}|\cdot|z|\right)\left\{\rho^{(2)}(z)-\lambda^{2}\right\}\,dz+\lambda^{2}|B|^{-1}\int_{B}|B\times B_{-z}|\cdot J_{0}\left(2\pi|{k}|\cdot|z|\right)\,dz
=𝐄I¯0(1)(|k|)+𝐄I¯0(2)(|k|)+𝐄I¯0(3)(|k|),\displaystyle={\mathbf{E}}\bar{I}_{0}^{(1)}(|{k}|)+{\mathbf{E}}\bar{I}_{0}^{(2)}(|{k}|)+{\mathbf{E}}\bar{I}_{0}^{(3)}(|{k}|), (119)

the latter defining the form of these three contributions.

Just like before we shall explicitly demonstrate the effects of this convolution. We have that the Fourier transform of the Bessel function is

{\cal F}\left\{J_{0}\left(2\pi|{k}|\cdot|z|\right)\right\}(u)=\iint J_{0}\left(2\pi|{k}|\cdot|z|\right)e^{-2\pi iz\cdot u}dz
=\int_{0}^{\infty}\int_{0}^{2\pi}J_{0}\left(2\pi|{k}|r\right)e^{-2\pi ir|u|\cos(\phi-\phi_{u})}\,r\,dr\,d\phi
=\int_{0}^{\infty}J_{0}\left(2\pi|{k}|r\right)r\,dr\int_{0}^{2\pi}e^{-2\pi ir|u|\cos(\phi-\phi_{u})}\,d\phi. (120)

We now note that

J0(x)=12πππeixsint𝑑t.J_{0}(x)=\frac{1}{2\pi}\int_{-\pi}^{\pi}e^{-ix\sin t}dt.
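This integral representation can be checked numerically against the power series $J_{0}(x)=\sum_{m\geq 0}(-1)^{m}(x/2)^{2m}/(m!)^{2}$; a small sketch:

```python
import numpy as np
from math import factorial

def j0_integral(x, n=4096):
    # Periodic rectangle rule over [-pi, pi); spectrally accurate here
    t = np.arange(n) * (2.0 * np.pi / n) - np.pi
    return np.exp(-1j * x * np.sin(t)).mean().real

def j0_series(x, terms=30):
    # Power series J0(x) = sum_m (-1)^m (x/2)^{2m} / (m!)^2
    return sum((-1) ** m * (x / 2.0) ** (2 * m) / float(factorial(m)) ** 2
               for m in range(terms))

for x in (0.0, 1.0, 2.404825557695773, 5.0):
    assert abs(j0_integral(x) - j0_series(x)) < 1e-8
```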

Thus

{J0(2π|k||z|)}(u)=0J0(2π|k|r)(2π)2J0(2πr|u|)r𝑑r=(2π)2δ(|k||u|)|k|.\displaystyle{\cal F}\left\{J_{0}\left(2\pi|{k}|\cdot|z|\right)\right\}(u)=\int_{0}^{\infty}J_{0}\left(2\pi|{k}|r\right)(2\pi)^{2}J_{0}\left(2\pi r|u|\right)\;rdr=(2\pi)^{2}\frac{\delta(|{k}|-|u|)}{|{k}|}.

We now use the convolution theorem to deduce that:

𝐄I¯0(3)(|k|)\displaystyle{\mathbf{E}}\bar{I}_{0}^{(3)}(|{k}|) =|B|1B|B×Bz|J0(2π|k||z|)𝑑z\displaystyle=|B|^{-1}\int_{B}|B\times B_{-z}|\cdot J_{0}\left(2\pi|{k}||z|\right)\,dz
=dδ(|k||u|)|k||B|1T(B,u)𝑑u.\displaystyle=\int_{{\mathbb{R}}^{d}}\frac{\delta(|{k}|-|u|)}{|{k}|}|B|^{-1}T(B,u)\;du. (121)

Thus the low magnitude wavenumber bias is determined by this term. The reason this is a low wavenumber term is the form of $|T(B,u)|$: this is concentrated near $|u|=0$, and in addition the convolution aggregates over all wavenumbers with the same modulus. We have assumed a rectangular sampling domain.

The second term in the expectation of (119) takes the form of

𝐄I¯0(2)(|k|)=|B|1d|B×Bz|J0(2π|k||z|){ρ(2)(z)λ2}𝑑z.{\mathbf{E}}\bar{I}_{0}^{(2)}(|{k}|)=|B|^{-1}\int_{{\mathbb{R}}^{d}}|B\times B_{-z}|\cdot J_{0}\left(2\pi|{k}||z|\right)\left\{\rho^{(2)}(z)-\lambda^{2}\right\}\,dz.

As multiplications in space are convolutions in wavenumber, we need to compute

{|B|1|B×Bz|J0(2π|k||z|)}(u)\displaystyle{\cal F}\left\{|B|^{-1}\cdot|B\times B_{-z}|\cdot J_{0}\left(2\pi|{k}||z|\right)\right\}(u) =d|B|1|B×Bz|J0(2π|k||z|)e2πiuTz𝑑z\displaystyle=\int_{{\mathbb{R}}^{d}}|B|^{-1}\cdot|B\times B_{-z}|\cdot J_{0}\left(2\pi|{k}||z|\right)e^{-2\pi iu^{T}z}\;dz
=dδ(|k||u|)|k||B|1T(B,uu)𝑑u\displaystyle=\int_{{\mathbb{R}}^{d}}\frac{\delta(|{k}|-|u^{\prime}|)}{|{k}|}|B|^{-1}T(B,u^{\prime}-u)\;du^{\prime}
{ρ(2)(z)λ2}(u)\displaystyle{\cal F}\left\{\rho^{(2)}(z)-\lambda^{2}\right\}(u) =λ2f~(u).\displaystyle=\lambda^{2}\widetilde{f}(u). (122)

With these pieces we have that

𝐄I¯0(2)(|k|)=ddλ2f~(u)δ(|k||u′′|)|k||B|1T(B,u′′u)𝑑u′′𝑑u.{\mathbf{E}}\bar{I}_{0}^{(2)}(|{k}|)=\int_{{\mathbb{R}}^{d}}\int_{{\mathbb{R}}^{d}}\lambda^{2}\widetilde{f}(u^{\prime})\frac{\delta(|{k}|-|u^{\prime\prime}|)}{|{k}|}|B|^{-1}T(B,u^{\prime\prime}-u^{\prime})\;du^{\prime\prime}du^{\prime}. (123)

We now see that further blurring is present in (123) from averaging out the direction. In the standard non-isotropic case this was just a convolution of $\widetilde{f}({k})$ with $|B|^{-1}T(B,{k})$. To get a feeling for its behaviour, we note that as $|B|\rightarrow\infty$, $|B|^{-1}T(B,{k})\rightarrow\delta({k})/(2\pi)$. In this case we get for $d=2$

𝐄I¯0(2)(|k|)\displaystyle{\mathbf{E}}\bar{I}_{0}^{(2)}(|{k}|) 22λ2f~(u)δ(|k||u′′|)2π|k|δ(u′′u)𝑑u′′𝑑u.\displaystyle\rightarrow\int_{{\mathbb{R}}^{2}}\int_{{\mathbb{R}}^{2}}\lambda^{2}\widetilde{f}(u^{\prime})\frac{\delta(|{k}|-|u^{\prime\prime}|)}{2\pi|{k}|}\delta(u^{\prime\prime}-u^{\prime})\;du^{\prime\prime}du^{\prime}.

This expression is therefore an orientationally averaged spectral density. Now assume additionally that the spectrum is isotropic, namely $\widetilde{f}(u^{\prime})=\widetilde{f}_{0}(|u^{\prime}|)$; then as $|B|\rightarrow\infty$,

𝐄I¯0(2)(|k|)\displaystyle{\mathbf{E}}\bar{I}_{0}^{(2)}(|{k}|) 2λ2f~0(|u′′|)δ(|k||u′′|)2π|k|𝑑u′′\displaystyle\rightarrow\int_{{\mathbb{R}}^{2}}\lambda^{2}\widetilde{f}_{0}(|u^{\prime\prime}|)\frac{\delta(|{k}|-|u^{\prime\prime}|)}{2\pi|{k}|}\;du^{\prime\prime}
=λ2+f~0(u′′)δ(|k||u′′|)|k||u′′|d|u′′|\displaystyle=\lambda^{2}\int_{{\mathbb{R}}^{+}}\widetilde{f}_{0}(u^{\prime\prime})\frac{\delta(|{k}|-|u^{\prime\prime}|)}{|{k}|}|u^{\prime\prime}|\;d|u^{\prime\prime}|
=f~0(|k|).\displaystyle=\widetilde{f}_{0}(|{k}|). (124)

This shows that asymptotically we would recover the isotropic spectral density from this component. Finally we can write

{\mathbf{E}}\bar{I}_{0}(|{k}|)=\lambda+\widetilde{f}_{0}(|{k}|)+o(1)+\int_{{\mathbb{R}}^{d}}\frac{\delta(|{k}|-|u|)}{|{k}|}|B|^{-1}T(B,u)\;du. ∎

Appendix L Additional Figures

Figure 8: Examples of the simulated patterns used in the simulation trials for studying the quality of the estimators. Details of each model are given in Section 8. Note that the exact number of points in each realisation varies slightly.
Figure 9: Isotropic $\mathrm{sdf}/\lambda$, mean estimates versus true curve (thick gray line).

References

  • [1] L. D. Abreu and J. L. Romero, (2017). MSE estimates for multitaper spectral estimation and off-grid compressive sensing, IEEE Transactions on Information Theory, 63(12), 7770–7776.
  • [2] C. Andersson, T. Rajala, and A. Särkkä, (2018). Hierarchical models for epidermal nerve fiber data. Statistics in Medicine, 37(3), 357–374.
  • [3] A. Baddeley, E. Rubak, and R. Turner. Spatial Point Patterns: Methodology and Applications with R. Chapman and Hall/CRC Press, London, 2015.
  • [4] S. Bandyopadhyay and S. N. Lahiri, (2009). Asymptotic properties of discrete Fourier transforms for spatial data. Sankhya: The Indian Journal of Statistics, Series A, 221–259.
  • [5] M. S. Bartlett, (1963). The spectral analysis of point processes. Journal of the Royal Statistical Society: Series B (Methodological), 25(2), 264–281.
  • [6] M. S. Bartlett, (1964). The spectral analysis of two-dimensional point processes. Biometrika, 51(3/4), 299–311.
  • [7] C. A. N. Biscio and R. Waagepetersen, (2019). A general central limit theorem and a subsampling variance estimator for $\alpha$-mixing point processes. Scandinavian Journal of Statistics, 46(4), 1168–1190.
  • [8] D. R. Brillinger, (1974). Fourier analysis of stationary processes. Proceedings of the IEEE, 62(12), 1628–1643.
  • [9] T. P. Bronez, (1988). Spectral estimation of irregularly sampled multidimensional processes by generalised prolate spheroidal sequences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 36(12), 1862–1873.
  • [10] S.N. Chiu, D. Stoyan, W.S. Kendall, and J. Mecke. Stochastic Geometry and Its Applications. Wiley Series in Probability and Statistics. Wiley, 2013. ISBN 9781118658253.
  • [11] I.-S. Chou and K.-S. Lii, (1992). Spectral analysis of random fields with random sampling. In Probabilistic and Stochastic Methods in Analysis, with Applications, pages 343–368. Springer Verlag, Berlin.
  • [12] D. E. Clark, (2020). Local entropy statistics for point processes. IEEE Transactions on Information Theory, 66(2), 1155–1163.
  • [13] D. E. Clark, (2022). A Cramér–Rao bound for point processes. IEEE Transactions on Information Theory, 68(4), 2147–2155.
  • [14] L. Cohen, (1995). Time-frequency analysis, volume 778. Prentice Hall.
  • [15] E. A. K. Cohen, (2014). Multi-wavelet coherence for point processes on the real line. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2649–2653.
  • [16] R. Condit, R. Pérez, S. Aguilar, S. Lao, R. Foster, and S. Hubbell, (2019). Complete data from the Barro Colorado 50-ha plot: 423617 trees, 35 years. URL https://doi.org/10.15146/5xcp-0d46.
  • [17] A. Connes, and H. Moscovici, (2022). The UV prolate spectrum matches the zeros of zeta. Proceedings of the National Academy of Sciences 119(22), e2123174119.
  • [18] R. Dahlhaus and H. Künsch, (1987). Edge effects and efficient parameter estimation for stationary random fields. Biometrika, 74(4), 877–882.
  • [19] D. J. Daley, (1971). Weakly stationary point processes and random measures. Journal of the Royal Statistical Society: Series B (Methodological), 33(3), 406–428.
  • [20] D. J. Daley and D. Vere-Jones, (2003). An introduction to the theory of point processes. Vol. I: Elementary Theory and Methods. Springer Verlag, Berlin, Germany.
  • [21] P. Das and B. Babadi, (2020). Multitaper spectral analysis of neuronal spiking activity driven by latent stationary processes. Signal Processing, 170, 107429.
  • [22] A. Dasgupta, (2008). Asymptotic theory of statistics and probability. Springer Verlag, Berlin, Germany.
  • [23] P. J. Diggle, D. J. Gates and A. Stibbard, (1987). A nonparametric estimator for pairwise-interaction point processes. Biometrika, 74(4), 763–770.
  • [24] P. J. Diggle, (2013) Statistical analysis of spatial and spatio-temporal point patterns. CRC press.
  • [25] D. Durran, J. A. Weyn and M. Q. Menchaca, (2017). Practical considerations for computing dimensional spectra from gridded data. Monthly Weather Review, 145(9), 3901–3910.
  • [26] T. Duong and M. L. Hazelton, (2005). Cross-validation bandwidth matrices for multivariate kernel density estimation. Scandinavian Journal of Statistics, 32(3):485–506.
  • [27] R. M. Errico, 1985. Spectra computed from a limited area grid. Monthly weather review, 113(9), 1554–1562.
  • [28] R. Estrada, (2014). On radial functions and distributions and their Fourier transforms. Journal of Fourier Analysis and Applications, 20(2):301–320.
  • [29] A. J. Flügge, S. C. Olhede, and D. J. Murrell (2014). A method to detect subcommunities from multivariate spatial associations. Methods in Ecology and Evolution, 5(11), 1214-1224.
  • [30] P. F. Fougere, (1977). A solution to the problem of spontaneous line splitting in maximum entropy power spectrum analysis. Journal of geophysical research, 82(7), 1051–1054.
  • [31] A. C. Gatrell, T. C. Bailey, P. J. Diggle and B. S. Rowlingson, (1996). Spatial point pattern analysis and its application in geographical epidemiology. Transactions of the Institute of British geographers, 256–274.
  • [32] L. Grafakos and G. Teschl, (2013). On Fourier transforms of radial functions and distributions. Journal of Fourier Analysis and Applications, 19(1), 167–179.
  • [33] A. P. Guillaumin, A. M. Sykulski, S. C. Olhede and F. J. Simons, (2022). The Debiased Spatial Whittle likelihood. Journal of the Royal Statistical Society: Series B, 84(4), 1526–1557.
  • [34] X. Guyon, (1982). Parameter estimation for a stationary process on a d–dimensional lattice. Biometrika, 69(1), 95–105.
  • [35] J. Häbel, (2017). A three-dimensional anisotropic point process characterization for pharmaceutical coatings. Spatial Statistics, 22, 306–320.
  • [36] A. Hanssen, (1997). Multidimensional multitaper spectral estimation. Signal Processing, 58(3), 327–332.
  • [37] K. E. Harms, R. Condit, S. P. Hubbell, and R. B. Foster (2001). Habitat associations of trees and shrubs in a 50-ha neotropical forest plot. Journal of ecology, 89(6), 947-959.
  • [38] F. Ilhan and S. S. Kozat, (2020). Modeling of spatio-temporal hawkes processes with randomised kernels. IEEE Transactions on Signal Processing, 68, 4946–4958.
  • [39] J. Illian, A. Penttinen, H. Stoyan, H. and D. Stoyan, (2008). Statistical analysis and modelling of spatial point patterns, John Wiley & Sons Ltd.
  • [40] L. Isserlis, (1918). On a formula for the product-moment coefficient of any order of a normal frequency distribution in any number of variables. Biometrika, 12(1–2), 134–139.
  • [41] G. Jacovitti and A. Neri, (2000). Multiresolution circular harmonic decomposition. IEEE Transactions on Signal Processing, 48(11), 3242–3247.
  • [42] N. L. Johnson and S. Kotz, (1970). Continuous univariate distributions – 2, John Wiley & Sons Ltd.
  • [43] J. Jowett, D. Vere–Jones and P. A. W. Lewis, (1972). The prediction of stationary point processes. In Stochastic point processes, Wiley Interscience.
  • [44] S. T. John and J. Hensman, (2018). Large-scale Cox process inference using variational Fourier features. International Conference on Machine Learning, 2362–2370.
  • [45] S. Karnik, Z. Zhu, M. B. Waking, J. Romberg and M. A. Davenport, (2019). The fast Slepian transform. Applied and Computational Harmonic Analysis, 46, 624–652.
  • [46] S. Karnik, J. Romberg and M. A. Davenport, (2022). Thomson’s multitaper method revisited. IEEE Transactions on Information Theory, 68(7), 4864–4891.
  • [47] J. T. Kent and M. V. Mardia, (1996). Spectral and circulant approximations to the likelihood for stationary Gaussian random fields. Journal of Statistical Planning and Inference, 50(3), 379–394.
  • [48] M. Lázaro-Gredilla, J. Quinonero-Candela, C. E. Rasmussen and A. R. Figueiras-Vidal, (2010). Sparse spectrum Gaussian process regression. J. of Machine Learning Research, 11, 1865–1881.
  • [49] Y. Li, F. Baccelli, H. S. Dhillon and J. G. Andrews, (2015). Statistical modeling and probabilistic analysis of cellular networks with determinantal point processes, IEEE Transactions on Communications, 63(9), 3405–3422.
  • [50] K. S. Lii and E. Masry, (1994). Spectral estimation of continuous-time stationary processes from random sampling. Stochastic processes and their applications, 52(1), 39–64.
  • [51] T.-C. Liu and B. Van Veen, (1992). Multiple window based minimum variance spectrum estimation for multidimensional random fields. IEEE Transactions on Signal Processing, 40(3), 578–589.
  • [52] P. J. Loughlin, J.W. Pitton, and L. E. Atlas, (1993). Bilinear time-frequency representations: New insights and properties. IEEE Transactions on Signal Processing, 41(2), 750–767.
  • [53] K. V. Mardia and R. J. Marshall, (1984). Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika, 71(1), 135–146.
  • [54] K. S. Miller, (1974). Complex stochastic processes: an introduction to theory and application. Addison-Wesley, New York, USA.
  • [55] J. Møller, A. R. Syversveen and R. P. Waagepetersen, (1998). Log-Gaussian Cox processes. Scandinavian Journal of Statistics, 25(3), 451–482.
  • [56] M. A. Mugglestone and E. Renshaw, (1996). A practical guide to the spectral analysis of spatial point processes. Computational Statistics & Data Analysis, 21(1), 43–65.
  • [57] T. D. Novlan, H. S. Dhillon and J. G. Andrews, (2013). Analytical modeling of uplink cellular networks. IEEE Transactions on Wireless Communications, 12(6), 2669–2679.
  • [58] D. B. Percival and A. T.  Walden, (1993). Spectral analysis for physical applications. Cambridge University Press, Cambridge, UK.
  • [59] O. Ponomarenko and Y. Perun, (2006). Spectral analysis of some classes of multivariate random fields with isotropic property. Theory of Stochastic Processes, 12, 142–150.
  • [60] K. S. Riedel and A. Sidorenko, (1995). Minimum bias multiple taper spectral estimation. IEEE Transactions on Signal Processing, 43(1), 188–195.
  • [61] E. A. Robinson, (1982). A historical perspective of spectrum estimation. Proceedings of the IEEE, 70(9), 885–907.
  • [62] R Core Team, (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
  • [63] P. J. Schreier and L. L. Scharf, (2003). Second-order analysis of improper complex random vectors and processes. IEEE Transactions on Signal Processing, 51(3), 714–725.
  • [64] P. J. Schreier and L. L. Scharf, (2010). Statistical signal processing of complex-valued data: the theory of improper and noncircular signals. Cambridge University Press.
  • [65] F. J. Simons and D. V. Wang, (2011). Spatiospectral concentration in the Cartesian plane. GEM-International Journal on Geomathematics, 2(1), 1–36.
  • [66] D. Slepian, (1964). Prolate spheroidal wave functions, Fourier analysis and uncertainty – IV: extensions to many dimensions; generalised prolate spheroidal functions. Bell System Technical Journal, 43(6), 3009–3057.
  • [67] D. Slepian, (1983). Some comments on Fourier analysis, uncertainty and modeling. SIAM Review, 25(3), 379–393.
  • [68] M. L. Stein, (2012). Interpolation of Spatial Data: some theory for kriging. Springer-Verlag, Berlin, Germany.
  • [69] T. J. Theim, R. Y. Shirk and T. J. Givnish, (2014). Spatial genetic structure in four understory Psychotria species (Rubiaceae) and implications for tropical forest diversity. American Journal of Botany, 101(7), 1189–1199.
  • [70] D. J. Thomson, (1982). Spectrum estimation and harmonic analysis. Proceedings of the IEEE, 70(9), 1055–1096.
  • [71] D. J. Thomson, (1990). Quadratic–inverse spectrum estimates: applications to palaeoclimatology. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 332(1627), 539–597.
  • [72] G. E. Tita and S. M. Radil, (2010). Making space for theory: the challenges of theorizing space and place for spatial analysis in criminology. Journal of Quantitative Criminology, 26(4), 467–479.
  • [73] J.-F. Ton, S. Flaxman, D. Sejdinovic and S. Bhatt, (2018). Spatial mapping with Gaussian processes and nonstationary Fourier features. Spatial Statistics, 28, 59–78.
  • [74] S. Vembu, (1961). Fourier transformation of the n-dimensional radial delta function. The Quarterly Journal of Mathematics, 12(1), 165–168.
  • [75] D. Vere-Jones, (1974). An elementary approach to the spectral theory of stationary random measures. Stochastic Geometry, 65, 307.
  • [76] A. T. Walden, (2000). A unified view of multitaper multivariate spectral estimation. Biometrika, 87(4), 767–788.
  • [77] R. Waagepetersen, Y. Guan, A. Jalilian and J. Mateu, (2016). Analysis of multispecies point patterns by using multivariate log-Gaussian Cox processes. Journal of the Royal Statistical Society: Series C (Applied Statistics), 65, 77–96.
  • [78] Y. Xu, S. Haykin and R. J. Racine, (1999). Multiple window time-frequency distribution and coherence of EEG using Slepian sequences and Hermite functions. IEEE Transactions on Biomedical Engineering, 46(7), 861–866.