
Thomson’s Multitaper Method Revisited

Santhosh Karnik, Justin Romberg, Mark A. Davenport
Abstract

Thomson’s multitaper method estimates the power spectrum of a signal from $N$ equally spaced samples by averaging $K$ tapered periodograms. Discrete prolate spheroidal sequences (DPSS) are used as tapers since they provide excellent protection against spectral leakage. Thomson’s multitaper method is widely used in applications, but most of the existing theory is qualitative or asymptotic. Furthermore, many practitioners use a DPSS bandwidth $W$ and number of tapers that are smaller than what the theory suggests is optimal because the computational requirements increase with the number of tapers. We revisit Thomson’s multitaper method from a linear algebra perspective involving subspace projections. This provides additional insight and helps us establish nonasymptotic bounds on some statistical properties of the multitaper spectral estimate, which are similar to existing asymptotic results. We show that using $K = 2NW - O(\log(NW))$ tapers instead of the traditional $2NW - O(1)$ tapers better protects against spectral leakage, especially when the power spectrum has a high dynamic range. Our perspective also allows us to derive an $\epsilon$-approximation to the multitaper spectral estimate which can be evaluated on a grid of frequencies using $O(\log(NW)\log\tfrac{1}{\epsilon})$ FFTs instead of $K = O(NW)$ FFTs. This is useful in problems where many samples are taken, and thus, using many tapers is desirable.

S. Karnik, J. Romberg, and M. A. Davenport are with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, 30332 USA (e-mail: [email protected], [email protected], [email protected]). This work was partially supported by NSF grant CCF-1409406, a grant from Lockheed Martin, and a gift from the Alfred P. Sloan Foundation. A preliminary version of this paper highlighting some of the key results also appeared in [1].

1 Introduction

Perhaps one of the most fundamental problems in digital signal processing is spectral estimation, i.e., estimating the power spectrum of a signal from a window of $N$ evenly spaced samples. The simplest solution is the periodogram, which simply takes the squared-magnitude of the discrete time Fourier transform (DTFT) of the samples. Obtaining only a finite number of samples is equivalent to multiplying the signal by a rectangular function before sampling. As a result, the DTFT of the samples is the DTFT of the signal convolved with the DTFT of the rectangular function, which is a slowly-decaying sinc function. Hence, narrow frequency components in the true signal appear more spread out in the periodogram. This phenomenon is known as “spectral leakage”.

The most common approach to mitigating the spectral leakage phenomenon is to multiply the samples by a taper before computing the periodogram. Since multiplying the signal by the taper is equivalent to convolving the DTFT of the signal with the DTFT of the taper, using a taper whose DTFT is highly concentrated around $f = 0$ helps mitigate spectral leakage. Numerous kinds of tapers have been proposed [2], all of which have DTFTs that are highly concentrated around $f = 0$. The Slepian basis vectors, also known as the discrete prolate spheroidal sequences (DPSSs), are designed such that their DTFTs have a maximal concentration of energy in the frequency band $[-W, W]$ subject to being orthonormal [3]. The first $\approx 2NW$ of these Slepian basis vectors have DTFTs which are highly concentrated in the frequency band $[-W, W]$. Thus, any of the first $\approx 2NW$ Slepian basis vectors provides a good choice of taper.

In 1982, David Thomson [4] proposed a multitaper method which computes a tapered periodogram for each of the first $K \approx 2NW$ Slepian tapers, and then averages these periodograms. Due to the spectral concentration properties of the Slepian tapers, Thomson’s multitaper method also does an excellent job mitigating spectral leakage. Furthermore, by averaging $K$ tapered periodograms, Thomson’s multitaper method is more robust than a single tapered periodogram. As such, Thomson’s multitaper method has been used in a wide variety of applications, such as cognitive radio [5, 6, 7, 8, 9], digital audio coding [10, 11], as well as to analyze EEG [12, 13] and other neurological signals [14, 15, 16, 17, 18], climate data [19, 20, 21, 22, 23, 24, 25], breeding calls of Adélie penguins [26] and songs of other birds [27, 28, 29, 30], topography of terrestrial planets [31], solar waves [32], and gravitational waves [33].

The existing theoretical results regarding Thomson’s multitaper method are either qualitative or asymptotic. Here, we provide a brief summary of these results. See [4, 34, 35, 8, 36, 37] for additional details. Suppose that the power spectral density $S(f)$ of the signal is “slowly varying”. Let $\widehat{S}^{\text{mt}}_{K}(f)$ denote the multitaper spectral estimate. Then, the following results are known.

  • The estimate is approximately unbiased, i.e., $\mathbb{E}\widehat{S}^{\text{mt}}_{K}(f) \approx S(f)$.

  • The variance is roughly $\operatorname{Var}[\widehat{S}^{\text{mt}}_{K}(f)] \approx \tfrac{1}{K}S(f)^{2}$.

  • For any two frequencies $f_{1}, f_{2}$ that are at least $2W$ apart, $\widehat{S}^{\text{mt}}_{K}(f_{1})$ and $\widehat{S}^{\text{mt}}_{K}(f_{2})$ are approximately uncorrelated.

  • The multitaper spectral estimate $\widehat{S}^{\text{mt}}_{K}(f)$ has a concentration behavior about its mean which is similar to a scaled chi-squared random variable with $2K$ degrees of freedom.

  • If the power spectral density $S(f)$ is twice differentiable, then choosing a bandwidth of $W = O(N^{-1/5})$ and using $K \approx 2NW = O(N^{4/5})$ tapers minimizes the mean squared error of the multitaper spectral estimate $\mathbb{E}|\widehat{S}^{\text{mt}}_{K}(f) - S(f)|^{2}$.

These asymptotic results demonstrate that using more than a small constant number of tapers improves the quality of the multitaper spectral estimate. However, using more tapers increases the computational requirements. As a result, many practitioners often use significantly fewer tapers than what is optimal.

The main contributions of this work are as follows:

  • We revisit Thomson’s multitaper method from a linear algebra based perspective. Specifically, for each frequency $f$, the multitaper spectral estimate computes the $2$-norm of the projection of the vector of samples onto a $K$-dimensional subspace. The subspace chosen can be viewed as the result of performing principal component analysis on a continuum of sinusoids whose frequency is between $f-W$ and $f+W$.

  • Using this perspective, we establish non-asymptotic bounds on the bias, variance, covariance, and probability tails of the multitaper spectral estimate. These non-asymptotic bounds are comparable to the known asymptotic results which assume that the spectrum is slowly varying. Also, these bounds show that using $K = 2NW - O(\log(NW))$ tapers (instead of the traditional choice of $K = \left\lfloor 2NW\right\rfloor - 1$ or $\left\lfloor 2NW\right\rfloor - 2$ tapers) provides better protection against spectral leakage, especially in scenarios where the spectrum has a large dynamic range.

  • We also use this perspective to demonstrate a fast algorithm for evaluating an $\epsilon$-approximation of the multitaper spectral estimate on a grid of $L$ equally spaced frequencies. The complexity of this algorithm is $O(L\log L\log(NW)\log\tfrac{1}{\epsilon})$ while the complexity of evaluating the exact multitaper spectral estimate on a grid of $L$ equally spaced frequencies is $O(KL\log L)$. Computing the $\epsilon$-approximation is faster than the exact multitaper estimate provided $K \gtrsim \log(NW)\log\tfrac{1}{\epsilon}$ tapers are used.

The rest of this work is organized as follows. In Section 2 we formally introduce Thomson’s multitaper spectral estimate, both from the traditional view involving tapered periodograms as well as from a linear algebra view involving projections onto subspaces. In Section 3, we state non-asymptotic bounds regarding the bias, variance, and probability concentration of the multitaper spectral estimate. These bounds are proved in Appendix A. In Section 4, we state our fast algorithm for evaluating the multitaper spectral estimate at a grid of frequencies. The proofs of the approximation error and computational requirements are in Appendix B. In Section 5, we show numerical experiments which demonstrate that using $K = 2NW - O(\log(NW))$ tapers minimizes the impact of spectral leakage on the multitaper spectral estimate, and that our $\epsilon$-approximation to the multitaper spectral estimate can be evaluated at a grid of evenly spaced frequencies significantly faster than the exact multitaper spectral estimate. We finally conclude the paper in Section 6.

2 Thomson’s Multitaper Method

2.1 Traditional view

Let $x(n)$, $n\in\mathbb{Z}$, be a stationary, ergodic, zero-mean, Gaussian process. The autocorrelation and power spectral density of $x(n)$ are defined by

$$R_{n} = \mathbb{E}\left[x(m)\overline{x(m+n)}\right]\quad\text{for}\quad m,n\in\mathbb{Z},$$

and

$$S(f) = \sum_{n=-\infty}^{\infty}R_{n}e^{-j2\pi fn}\quad\text{for}\quad f\in\mathbb{R}$$

respectively. The goal of spectral estimation is to estimate $S(f)$ from the vector $\boldsymbol{x}\in\mathbb{C}^{N}$ of equispaced samples $\boldsymbol{x}[n] = x(n)$ for $n\in[N]$. (We use the notation $[N]$ to denote the set $\{0,\ldots,N-1\}$, and will use $[K]$ and $[L]$ in a similar manner.)

One of the earliest, and perhaps the simplest, estimators of $S(f)$ is the periodogram [38, 39]

$$\widehat{S}(f)=\dfrac{1}{N}\left|\sum_{n=0}^{N-1}\boldsymbol{x}[n]e^{-j2\pi fn}\right|^{2}.$$

This estimator can be efficiently evaluated at a grid of evenly spaced frequencies via the FFT. However, the periodogram has high variance and suffers from the spectral leakage phenomenon [40].
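For concreteness, here is a minimal sketch of this FFT-based evaluation in Python with NumPy (the zero-padding length $L\geq N$ sets the density of the frequency grid $f\in[L]/L$):

```python
import numpy as np

def periodogram(x, L):
    """Periodogram of the samples x on the frequency grid f = j/L, j = 0, ..., L-1."""
    return np.abs(np.fft.fft(x, n=L)) ** 2 / len(x)
```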

A modification to the periodogram is to pick a data taper $\boldsymbol{w}\in\mathbb{R}^{N}$ with $\|\boldsymbol{w}\|_{2}=1$, and then weight the samples by the taper as follows

$$\widehat{S}_{\boldsymbol{w}}(f)=\left|\sum_{n=0}^{N-1}\boldsymbol{w}[n]\boldsymbol{x}[n]e^{-j2\pi fn}\right|^{2}.$$

If $\boldsymbol{w}[n]$ is small near $n=0$ and $n=N-1$, then this “smoothes” the “edges” of the sample window. Note that the expectation of the tapered periodogram is given by a convolution of the true spectrum and the spectral window of the taper,

$$\mathbb{E}[\widehat{S}_{\boldsymbol{w}}(f)]=S(f)\circledast\left|\widetilde{\boldsymbol{w}}(f)\right|^{2}$$

where

$$\widetilde{\boldsymbol{w}}(f)=\sum_{n=0}^{N-1}\boldsymbol{w}[n]e^{-j2\pi fn}.$$

Hence, a good taper will have its spectral window $\left|\widetilde{\boldsymbol{w}}(f)\right|^{2}$ concentrated around $f=0$ so that $\mathbb{E}\left[\widehat{S}_{\boldsymbol{w}}(f)\right]=S(f)\circledast\left|\widetilde{\boldsymbol{w}}(f)\right|^{2}\approx S(f)$, i.e., the tapered periodogram will be approximately unbiased.

For a given bandwidth parameter $W>0$, we can ask “Which taper maximizes the concentration of its spectral window, $\left|\widetilde{\boldsymbol{w}}(f)\right|^{2}$, in the interval $[-W,W]$?” Note that we can write

$$\int_{-W}^{W}\left|\widetilde{\boldsymbol{w}}(f)\right|^{2}\,df=\boldsymbol{w}^{*}\boldsymbol{B}\boldsymbol{w}$$

where $\boldsymbol{B}$ is the $N\times N$ prolate matrix [41, 42] whose entries are given by

$$\boldsymbol{B}[m,n]=\begin{cases}\dfrac{\sin[2\pi W(m-n)]}{\pi(m-n)}&\text{if}\ m\neq n,\\ 2W&\text{if}\ m=n.\end{cases}$$

Hence, the taper whose spectral window, $\left|\widetilde{\boldsymbol{w}}(f)\right|^{2}$, is maximally concentrated in $[-W,W]$ is the eigenvector of $\boldsymbol{B}$ corresponding to the largest eigenvalue.

The prolate matrix $\boldsymbol{B}$ was first studied extensively by David Slepian [3]. As such, we will refer to the orthonormal eigenvectors $\boldsymbol{s}_{0},\boldsymbol{s}_{1},\ldots,\boldsymbol{s}_{N-1}\in\mathbb{R}^{N}$ of $\boldsymbol{B}$ as the Slepian basis vectors, where the corresponding eigenvalues $\lambda_{0}\geq\lambda_{1}\geq\cdots\geq\lambda_{N-1}$ are sorted in descending order. Slepian showed that all the eigenvalues of $\boldsymbol{B}$ are distinct and strictly between $0$ and $1$. Furthermore, the eigenvalues of $\boldsymbol{B}$ exhibit a particular clustering behavior. Specifically, the first slightly less than $2NW$ eigenvalues are very close to $1$, and the last slightly less than $N-2NW$ eigenvalues are very close to $0$.
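For illustration, the prolate matrix and its eigendecomposition can be computed directly. This is an $O(N^{3})$ sketch that is workable for moderate $N$; practical codes (e.g., scipy.signal.windows.dpss) instead exploit a tridiagonal matrix that commutes with $\boldsymbol{B}$:

```python
import numpy as np

def prolate_matrix(N, W):
    """The N x N prolate matrix B."""
    k = np.arange(N)[:, None] - np.arange(N)[None, :]   # m - n
    with np.errstate(divide='ignore', invalid='ignore'):
        B = np.sin(2 * np.pi * W * k) / (np.pi * k)
    B[k == 0] = 2 * W                                   # diagonal entries
    return B

# Slepian basis vectors and eigenvalues, sorted in descending order
lam, S = np.linalg.eigh(prolate_matrix(512, 0.02))
lam, S = lam[::-1], S[:, ::-1]
```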

While $\boldsymbol{s}_{0}$ is the taper whose spectral window is maximally concentrated in $[-W,W]$, any of the Slepian basis vectors $\boldsymbol{s}_{k}$ for which $\boldsymbol{s}_{k}^{*}\boldsymbol{B}\boldsymbol{s}_{k}=\lambda_{k}\approx 1$ will also have a high energy concentration in $[-W,W]$, and thus, make a good taper. Thomson [4] proposed a multitaper spectral estimate which uses each of the first $K\approx 2NW$ Slepian basis vectors as a taper, and takes an average of the resulting tapered periodograms, i.e.,

$$\widehat{S}^{\text{mt}}_{K}(f)=\dfrac{1}{K}\sum_{k=0}^{K-1}\widehat{S}_{k}(f)\quad\text{where}\quad\widehat{S}_{k}(f)=\left|\sum_{n=0}^{N-1}\boldsymbol{s}_{k}[n]\boldsymbol{x}[n]e^{-j2\pi fn}\right|^{2}.$$
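A minimal sketch of this estimator using SciPy's DPSS routine (note that scipy.signal.windows.dpss is parameterized by the time half-bandwidth product $NW$ rather than by $W$ itself):

```python
import numpy as np
from scipy.signal import windows

def multitaper(x, W, K, L=None):
    """Thomson's multitaper estimate on the frequency grid f = j/L."""
    N = len(x)
    L = L if L is not None else N
    tapers = windows.dpss(N, N * W, Kmax=K, norm=2)   # K x N, unit 2-norm rows
    # K tapered periodograms via K length-L FFTs, then average
    return np.mean(np.abs(np.fft.fft(tapers * x, n=L, axis=1)) ** 2, axis=0)
```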

The expectation of the multitaper spectral estimate satisfies

$$\mathbb{E}\left[\widehat{S}^{\text{mt}}_{K}(f)\right]=S(f)\circledast\psi(f)$$

where

$$\psi(f)=\dfrac{1}{K}\sum_{k=0}^{K-1}\left|\sum_{n=0}^{N-1}\boldsymbol{s}_{k}[n]e^{-j2\pi fn}\right|^{2}$$

is known as the spectral window of the multitaper spectral estimate. It can be shown that when $K\approx 2NW$, the spectral window $\psi(f)$ approximates $\tfrac{1}{2W}\mathbbm{1}_{[-W,W]}(f)$ on $f\in[-\tfrac{1}{2},\tfrac{1}{2}]$. Thus, the multitaper spectral estimate behaves in expectation like a smoothed version of the true spectrum $S(f)$.
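The spectral window itself is straightforward to evaluate numerically; a sketch mirroring the estimator above:

```python
import numpy as np
from scipy.signal import windows

def spectral_window(N, W, K, L):
    """The spectral window psi(f) on the grid f = j/L for the first K Slepian tapers."""
    tapers = windows.dpss(N, N * W, Kmax=K, norm=2)
    return np.mean(np.abs(np.fft.fft(tapers, n=L, axis=1)) ** 2, axis=0)
```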

It can be shown that if the spectrum $S(f)$ is slowly varying around a frequency $f$, then the tapered spectral estimates $\widehat{S}_{k}(f)$ are approximately uncorrelated, and $\operatorname{Var}[\widehat{S}_{k}(f)]\approx S(f)^{2}$. Hence, $\operatorname{Var}[\widehat{S}^{\text{mt}}_{K}(f)]\approx\tfrac{1}{K}S(f)^{2}$. Thus, Thomson’s multitaper method produces a spectral estimate whose variance is a factor of $K\approx 2NW$ smaller than the variance of a single tapered periodogram.

As we increase $W$, the width of the spectral window $\psi(f)$ increases, which causes the expectation of the multitaper spectral estimate to be further smoothed. However, increasing $W$ also allows us to increase the number of tapers $K\approx 2NW$, which reduces the variance of the multitaper spectral estimate. Intuitively, Thomson’s multitaper method introduces a tradeoff between resolution and robustness.

2.2 Linear algebra view

Here we provide an alternate perspective on Thomson’s multitaper method which is based on linear algebra and subspace projections. Suppose that for each frequency $f\in\mathbb{R}$, we choose a low-dimensional subspace $\mathcal{S}_{f}\subset\mathbb{C}^{N}$, and form a spectral estimate by computing $\|\operatorname{proj}_{\mathcal{S}_{f}}(\boldsymbol{x})\|_{2}^{2}$, i.e., the energy in the projection of $\boldsymbol{x}$ onto the subspace $\mathcal{S}_{f}$. One simple choice is the one-dimensional subspace $\mathcal{S}_{f}=\operatorname{span}\{\boldsymbol{e}_{f}\}$ where

$$\boldsymbol{e}_{f}=\begin{bmatrix}1&e^{j2\pi f\cdot 1}&e^{j2\pi f\cdot 2}&\cdots&e^{j2\pi f\cdot(N-1)}\end{bmatrix}^{T}$$

is a vector of equispaced samples from a complex sinusoid with frequency $f$. For this choice of $\mathcal{S}_{f}$, we have

$$\left\|\operatorname{proj}_{\mathcal{S}_{f}}(\boldsymbol{x})\right\|_{2}^{2}=\dfrac{\left|\left<\boldsymbol{e}_{f},\boldsymbol{x}\right>\right|^{2}}{\|\boldsymbol{e}_{f}\|_{2}^{2}}=\dfrac{1}{N}\left|\sum_{n=0}^{N-1}\boldsymbol{x}[n]e^{-j2\pi fn}\right|^{2},$$

which is exactly the classic periodogram.

We can also choose a low-dimensional subspace $\mathcal{S}_{f}$ which minimizes the average representation error of sinusoids $\boldsymbol{e}_{f^{\prime}}$ with frequency $f^{\prime}\in[f-W,f+W]$ for some small $W>0$, i.e.,

$$\operatorname*{minimize}_{\substack{\mathcal{S}_{f}\subset\mathbb{C}^{N}\\ \dim(\mathcal{S}_{f})=K}}\ \int_{f-W}^{f+W}\left\|\boldsymbol{e}_{f^{\prime}}-\operatorname{proj}_{\mathcal{S}_{f}}(\boldsymbol{e}_{f^{\prime}})\right\|_{2}^{2}\,df^{\prime},$$

where the dimension $K$ of the subspace is fixed. Using ideas from the Karhunen-Loève (KL) transform [43], it can be shown that the optimal $K$-dimensional subspace is the span of the top $K$ eigenvectors of the covariance matrix

$$\boldsymbol{C}_{f}:=\dfrac{1}{2W}\int_{f-W}^{f+W}\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}.$$

The entries of this covariance matrix are

$$\begin{aligned}\boldsymbol{C}_{f}[m,n]&=\dfrac{1}{2W}\int_{f-W}^{f+W}\boldsymbol{e}_{f^{\prime}}[m]\overline{\boldsymbol{e}_{f^{\prime}}[n]}\,df^{\prime}\\&=\dfrac{1}{2W}\int_{f-W}^{f+W}e^{j2\pi f^{\prime}(m-n)}\,df^{\prime}\\&=\dfrac{\sin[2\pi W(m-n)]}{2\pi W(m-n)}e^{j2\pi f(m-n)}\\&=\dfrac{1}{2W}\boldsymbol{e}_{f}[m]\boldsymbol{B}[m,n]\overline{\boldsymbol{e}_{f}[n]},\end{aligned}$$

where again $\boldsymbol{B}$ is the $N\times N$ prolate matrix. Hence, we can write

$$\boldsymbol{C}_{f}=\dfrac{1}{2W}\boldsymbol{E}_{f}\boldsymbol{B}\boldsymbol{E}_{f}^{*},$$

where $\boldsymbol{E}_{f}=\operatorname{diag}(\boldsymbol{e}_{f})\in\mathbb{C}^{N\times N}$ is a unitary matrix which modulates vectors by pointwise multiplying them by the sinusoid $\boldsymbol{e}_{f}$. Therefore, the eigenvectors of $\boldsymbol{C}_{f}$ are the modulated Slepian basis vectors $\boldsymbol{E}_{f}\boldsymbol{s}_{k}$ for $k\in[N]$, and the corresponding eigenvalues are $\tfrac{\lambda_{k}}{2W}$. Hence, we can choose $\mathcal{S}_{f}=\operatorname{span}\{\boldsymbol{E}_{f}\boldsymbol{s}_{0},\ldots,\boldsymbol{E}_{f}\boldsymbol{s}_{K-1}\}$, i.e., the span of the first $K$ Slepian basis vectors modulated to the frequency $f$. Since $\boldsymbol{s}_{0},\ldots,\boldsymbol{s}_{K-1}$ are orthonormal vectors, and $\boldsymbol{E}_{f}$ is a unitary matrix, $\boldsymbol{E}_{f}\boldsymbol{s}_{0},\ldots,\boldsymbol{E}_{f}\boldsymbol{s}_{K-1}$ are orthonormal vectors. Hence, $\operatorname{proj}_{\mathcal{S}_{f}}(\boldsymbol{x})=\boldsymbol{E}_{f}\boldsymbol{S}_{K}\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x}$ where $\boldsymbol{S}_{K}=\begin{bmatrix}\boldsymbol{s}_{0}&\cdots&\boldsymbol{s}_{K-1}\end{bmatrix}$, and thus,

$$\begin{aligned}\left\|\operatorname{proj}_{\mathcal{S}_{f}}(\boldsymbol{x})\right\|_{2}^{2}&=\left\|\boldsymbol{E}_{f}\boldsymbol{S}_{K}\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x}\right\|_{2}^{2}\\&=\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x}\right\|_{2}^{2}\\&=\sum_{k=0}^{K-1}\left|(\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x})[k]\right|^{2}\\&=\sum_{k=0}^{K-1}\left|\boldsymbol{s}_{k}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x}\right|^{2}\\&=\sum_{k=0}^{K-1}\left|\sum_{n=0}^{N-1}\boldsymbol{s}_{k}[n]\boldsymbol{x}[n]e^{-j2\pi fn}\right|^{2}.\end{aligned}$$

Up to a constant scale factor, this is precisely the multitaper spectral estimate. Hence, we can view the multitaper spectral estimate $\widehat{S}^{\text{mt}}_{K}(f)=\tfrac{1}{K}\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x}\|_{2}^{2}$ as the energy in $\boldsymbol{x}$ after it is projected onto the $K$-dimensional subspace which best represents the collection of sinusoids $\{\boldsymbol{e}_{f^{\prime}}:f^{\prime}\in[f-W,f+W]\}$.
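As a numerical sanity check of this identity, the following sketch (reusing the illustrative prolate_matrix helper from Section 2.1) verifies that the projection energy equals the sum of the $K$ tapered periodograms at a frequency $f$:

```python
import numpy as np

N, W, K, f = 256, 0.02, 8, 0.123
lam, S = np.linalg.eigh(prolate_matrix(N, W))        # from the earlier sketch
S_K = S[:, ::-1][:, :K]                              # top-K Slepian basis vectors
x = np.random.randn(N) + 1j * np.random.randn(N)
n = np.arange(N)
e_f = np.exp(2j * np.pi * f * n)                     # sinusoid at frequency f
proj_energy = np.linalg.norm(S_K.conj().T @ (e_f.conj() * x)) ** 2
tapered_sum = sum(np.abs(np.sum(S_K[:, k] * x * np.exp(-2j * np.pi * f * n))) ** 2
                  for k in range(K))
assert np.isclose(proj_energy, tapered_sum)          # both equal K * S_K^mt(f)
```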

2.3 Slepian basis eigenvalues

Before we proceed to our main results, we elaborate on the clustering behavior of the Slepian basis eigenvalues, as they are critical to our analysis in the rest of this paper. For any $\epsilon\in(0,\tfrac{1}{2})$, slightly fewer than $2NW$ eigenvalues lie in $[1-\epsilon,1)$, slightly fewer than $N-2NW$ eigenvalues lie in $(0,\epsilon]$, and very few eigenvalues lie in the so-called “transition region” $(\epsilon,1-\epsilon)$. In Figure 1, we demonstrate this phenomenon by plotting the first $1000$ Slepian basis eigenvalues for $N=10000$ and $W=\tfrac{1}{100}$ (so $2NW=200$). The first $194$ eigenvalues lie in $[0.999,1)$ and the last $9794$ eigenvalues lie in $(0,0.001]$. Only $12$ eigenvalues lie in $(0.001,0.999)$.

Figure 1: A plot of the first $1000$ Slepian basis eigenvalues for $N=10000$ and $W=\tfrac{1}{100}$. These eigenvalues satisfy $\lambda_{193}\approx 0.9997$ and $\lambda_{206}\approx 0.0003$. Only $12$ of the $10000$ Slepian basis eigenvalues lie in $(0.001,0.999)$.

Slepian [3] showed that for any fixed $W\in(0,\tfrac{1}{2})$ and $b\in\mathbb{R}$,

$$\lambda_{\left\lfloor 2NW+(b/\pi)\log N\right\rfloor}\to\dfrac{1}{1+e^{b\pi}}\quad\text{as}\quad N\to\infty.$$

From this result, it is easy to show that for any fixed $W\in(0,\tfrac{1}{2})$ and $\epsilon\in(0,\tfrac{1}{2})$,

$$\#\{k:\epsilon<\lambda_{k}<1-\epsilon\}\sim\dfrac{2}{\pi^{2}}\log N\log\left(\dfrac{1}{\epsilon}-1\right)\quad\text{as}\quad N\to\infty.$$

In [44], the authors of this paper derive a non-asymptotic bound on the number of Slepian basis eigenvalues in the transition region. For any $N\in\mathbb{N}$, $W\in(0,\tfrac{1}{2})$, and $\epsilon\in(0,\tfrac{1}{2})$,

$$\#\{k:\epsilon<\lambda_{k}<1-\epsilon\}\leq\dfrac{2}{\pi^{2}}\log(100NW+25)\log\left(\dfrac{5}{\epsilon(1-\epsilon)}\right)+7.$$

This shows that the width of the transition region grows logarithmically with respect to both the time-bandwidth product $2NW$ and the tolerance parameter $\epsilon$. Note that when $\epsilon$ and $W$ are small, this result is a significant improvement over the prior non-asymptotic bounds in [45], [46], and [47], which scale like $O(\tfrac{\log N}{\epsilon})$, $O(\tfrac{\log(NW)}{\epsilon})$, and $O(\log N\log\tfrac{1}{\epsilon})$ respectively. In Section 4, we will exploit the fact that the number of Slepian basis eigenvalues in the transition region grows like $O(\log(NW)\log\tfrac{1}{\epsilon})$ to derive a fast algorithm for evaluating $\widehat{S}^{\text{mt}}_{K}(f)$ at a grid of evenly spaced frequencies.
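For illustration, the transition-region count can be checked numerically against this bound; in the sketch below, the headroom added past $\lfloor 2NW\rfloor$ is a heuristic chosen to comfortably cover the transition region:

```python
import numpy as np
from scipy.signal import windows

N, W, eps = 10000, 1/100, 1e-3
Kmax = int(2 * N * W) + 50     # heuristic: the transition region ends well before this
_, lam = windows.dpss(N, N * W, Kmax=Kmax, return_ratios=True)
count = int(((lam > eps) & (lam < 1 - eps)).sum())
bound = 2 / np.pi**2 * np.log(100 * N * W + 25) * np.log(5 / (eps * (1 - eps))) + 7
print(count, bound)            # e.g. 12 (cf. Figure 1) vs. a bound of about 23
```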

Furthermore, in [44], the authors of this paper combine the above bound on the width of the transition region with the fact from [45] that $\lambda_{\left\lfloor 2NW\right\rfloor-1}\geq\tfrac{1}{2}\geq\lambda_{\left\lceil 2NW\right\rceil}$ to obtain the following lower bound on the first $\approx 2NW$ eigenvalues

$$\lambda_{k}\geq 1-10\exp\left(-\dfrac{\left\lfloor 2NW\right\rfloor-k-7}{\tfrac{2}{\pi^{2}}\log(100NW+25)}\right)\quad\text{for}\quad 0\leq k\leq\left\lfloor 2NW\right\rfloor-1.$$

This shows that as $k$ decreases from $\approx 2NW$, the quantity $1-\lambda_{k}$ decays exponentially to $0$. In particular, this means that the first $2NW-O(\log(NW))$ Slepian basis tapers $\boldsymbol{s}_{k}$ have spectral windows $\widetilde{\boldsymbol{s}}_{k}(f)$ which have very little energy outside $[-W,W]$. However, the spectral windows of the Slepian basis tapers for $k$ near $2NW$ will have a significant amount of energy outside $[-W,W]$. In Section 3, we show that the statistical properties of the multitaper spectral estimate are significantly improved when using $K=2NW-O(\log(NW))$ tapers instead of the traditional $K=\left\lfloor 2NW\right\rfloor-1$ tapers.
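In practice, one can select the number of tapers directly from the eigenvalues. The sketch below (an illustrative helper, with the same heuristic headroom as above) returns the largest $K$ such that $\lambda_{K-1}\geq 1-\delta$; for $N=2000$, $W=\tfrac{1}{100}$, and $\delta=10^{-9}$ it should recover $K=29$, matching Section 5.1:

```python
import numpy as np
from scipy.signal import windows

def choose_num_tapers(N, W, delta):
    """Largest K such that lambda_{K-1} >= 1 - delta."""
    Kmax = min(N, int(2 * N * W) + 50)            # heuristic headroom past 2NW
    _, lam = windows.dpss(N, N * W, Kmax=Kmax, return_ratios=True)
    return int(np.sum(lam >= 1 - delta))          # eigenvalues are sorted descending

K = choose_num_tapers(2000, 1/100, 1e-9)          # K = 29, as in Section 5.1
```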

3 Statistical Properties and Spectral Leakage

For a given vector of signal samples $\boldsymbol{x}\in\mathbb{C}^{N}$, using Thomson’s multitaper method for spectral estimation requires selecting two parameters: the half-bandwidth $W$ of the Slepian basis tapers and the number of tapers $K$ which are used in the multitaper spectral estimate. The selection of these parameters can greatly impact the accuracy of the multitaper spectral estimate.

In some applications, a relatively small number of samples $N$ is taken, and the desired frequency resolution for the spectral estimate is $O(\tfrac{1}{N})$, i.e., a small multiple of the fundamental Rayleigh resolution limit. In such cases, many practitioners [14, 17, 19, 20, 22, 25, 28, 29] choose the half-bandwidth parameter $W$ such that $2NW$ is between $3$ and $10$, and then choose the number of Slepian basis tapers to be between $K=\left\lfloor 2NW\right\rfloor$ and $K=\left\lfloor 2NW\right\rfloor-2$. However, in applications where a large number of samples $N$ is taken, and some loss of resolution is acceptable, choosing a larger half-bandwidth parameter $W$ can result in a more accurate spectral estimate. Furthermore, if the power spectral density $S(f)$ has a high dynamic range (that is, $\max_{f}S(f)\gg\min_{f}S(f)$), we aim to show that choosing $K=2NW-O(\log(NW)\log\tfrac{1}{\delta})$ tapers for some small $\delta>0$ (instead of $K=2NW-O(1)$ tapers) can provide significantly better protection against spectral leakage.

For all the theorems in this section, we assume that $\boldsymbol{x}\in\mathbb{C}^{N}$ is a vector of samples from a complex Gaussian process whose power spectral density $S(f)$ is bounded and integrable. Note that the analogous results for a real Gaussian process would be similar, but slightly more complicated to state. To state our results, define

$$M=\max_{f\in\mathbb{R}}S(f),$$

i.e., the global maximum of the power spectral density, and for each frequency $f\in\mathbb{R}$ we define:

$$m_{f}=\min_{f^{\prime}\in[f-W,f+W]}S(f^{\prime}),\qquad M_{f}=\max_{f^{\prime}\in[f-W,f+W]}S(f^{\prime}),$$
$$A_{f}=\dfrac{1}{2W}\int_{f-W}^{f+W}S(f^{\prime})\,df^{\prime},\qquad R_{f}=\sqrt{\dfrac{1}{2W}\int_{f-W}^{f+W}S(f^{\prime})^{2}\,df^{\prime}},$$

i.e., the minimum, maximum, average, and root-mean-squared values of the power spectral density over the interval $[f-W,f+W]$. We also define the quantities

$$\Sigma_{K}^{(1)}=\dfrac{1}{K}\sum_{k=0}^{K-1}(1-\lambda_{k})\quad\text{and}\quad\Sigma_{K}^{(2)}=\sqrt{\dfrac{1}{K}\sum_{k=0}^{K-1}(1-\lambda_{k})^{2}}.$$
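Given the eigenvalues (e.g., the concentration ratios returned by scipy.signal.windows.dpss), these quantities are immediate to compute; a minimal sketch assuming lam holds $\lambda_{0}\geq\cdots\geq\lambda_{K-1}$:

```python
import numpy as np

# lam: array of the first K Slepian basis eigenvalues, sorted descending
Sigma1 = np.mean(1 - lam)                   # Sigma_K^(1)
Sigma2 = np.sqrt(np.mean((1 - lam) ** 2))   # Sigma_K^(2)
```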

Before we proceed to our results, we make note of the fact that $m_{f}$, $M_{f}$, $A_{f}$, and $R_{f}$ are all “local” properties of the power spectral density, i.e., they depend only on values of $S(f^{\prime})$ for $f^{\prime}\in[f-W,f+W]$, whereas $M$ is a “global” property. Note that if the power spectral density is “slowly varying” over the interval $[f-W,f+W]$, then $m_{f}\approx M_{f}\approx A_{f}\approx R_{f}\approx S(f)$. However, $M$ could be several orders of magnitude larger than $m_{f}$, $M_{f}$, $A_{f}$, and $R_{f}$ if the power spectral density has a high dynamic range.

By using the bound on the Slepian basis eigenvalues from Section 2.3, we can guarantee $\lambda_{K-1}\geq 1-\delta$ for some suitably small $\delta>0$ by choosing the number of tapers to be $K=2NW-O(\log(NW)\log\tfrac{1}{\delta})$. This choice of $K$ guarantees that $0\leq\Sigma^{(1)}_{K}\leq\Sigma^{(2)}_{K}\leq 1-\lambda_{K-1}\leq\delta$, i.e., $\Sigma^{(1)}_{K}$, $\Sigma^{(2)}_{K}$, and $1-\lambda_{K-1}$ are all small, and thus, the global property $M=\max_{f}S(f)$ of the power spectral density will have a minimal impact on the non-asymptotic results below. In other words, using $K=2NW-O(\log(NW)\log\tfrac{1}{\delta})$ tapers mitigates the ability of values of the power spectral density $S(f^{\prime})$ at frequencies $f^{\prime}\not\in[f-W,f+W]$ to impact the estimate $\widehat{S}^{\text{mt}}_{K}(f)$. However, if $K=2NW-O(1)$ tapers are used, then the quantities $\Sigma^{(1)}_{K}$, $\Sigma^{(2)}_{K}$, and $1-\lambda_{K-1}$ could be large enough for the global property $M=\max_{f}S(f)$ of the power spectral density to significantly weaken the non-asymptotic results below. In other words, energy in the power spectral density $S(f^{\prime})$ at frequencies $f^{\prime}\not\in[f-W,f+W]$ can “leak” into the estimate $\widehat{S}^{\text{mt}}_{K}(f)$.

We begin with a bound on the bias of the multitaper spectral estimate under the additional assumption that the power spectral density is twice differentiable. Note this assumption is only used in Theorem 1.

Theorem 1.

For any frequency $f\in\mathbb{R}$, if $S(f^{\prime})$ is twice continuously differentiable in $[f-W,f+W]$, then the bias of the multitaper spectral estimate is bounded by

$$\operatorname{Bias}\left[\widehat{S}^{\text{mt}}_{K}(f)\right]=\left|\mathbb{E}\widehat{S}^{\text{mt}}_{K}(f)-S(f)\right|\leq\dfrac{M^{\prime\prime}_{f}NW^{3}}{3K}+(M+M_{f})\Sigma_{K}^{(1)},$$

where

$$M^{\prime\prime}_{f}=\max_{f^{\prime}\in[f-W,f+W]}|S^{\prime\prime}(f^{\prime})|.$$

If $K=2NW-O(\log(NW)\log\tfrac{1}{\delta})$ tapers are used for some small $\delta>0$, then this upper bound is only slightly larger than $\tfrac{1}{6}M^{\prime\prime}_{f}W^{2}$, which is similar to the asymptotic results in [4, 35, 48, 36, 37] which state that the bias is roughly $\tfrac{1}{6}S^{\prime\prime}(f)W^{2}$. However, if $K=2NW-O(1)$ tapers are used, the term $(M+M_{f})\Sigma_{K}^{(1)}$ could dominate this bound, and the bias could be much larger than the asymptotic result.

If the power spectral density is not twice-differentiable, we can still obtain the following bound on the bias of the multitaper spectral estimate.

Theorem 2.

For any frequency $f\in\mathbb{R}$, the bias of the multitaper spectral estimate is bounded by

$$\operatorname{Bias}\left[\widehat{S}^{\text{mt}}_{K}(f)\right]=\left|\mathbb{E}\widehat{S}^{\text{mt}}_{K}(f)-S(f)\right|\leq(M_{f}-m_{f})(1-\Sigma_{K}^{(1)})+M\Sigma_{K}^{(1)}.$$

If $K=2NW-O(\log(NW)\log\tfrac{1}{\delta})$ tapers are used for some small $\delta>0$, then this upper bound is only slightly larger than $M_{f}-m_{f}$. This guarantees the bias is small when the power spectral density is “slowly varying” over $[f-W,f+W]$. However, if $K=2NW-O(1)$ tapers are used, the term $M\Sigma_{K}^{(1)}$ could dominate this bound, and the bias could be much larger than the asymptotic result.

Next, we state our bound on the variance of the multitaper spectral estimate.

Theorem 3.

For any frequency $f\in\mathbb{R}$, the variance of the multitaper spectral estimate is bounded by

$$\operatorname{Var}\left[\widehat{S}^{\text{mt}}_{K}(f)\right]\leq\dfrac{1}{K}\left(R_{f}\sqrt{\dfrac{2NW}{K}}+M\Sigma^{(2)}_{K}\right)^{2}.$$

If $K=2NW-O(\log(NW)\log\tfrac{1}{\delta})$ tapers are used for some small $\delta>0$, then this upper bound is only slightly larger than $\tfrac{1}{K}R_{f}^{2}$, which is similar to the asymptotic results in [4, 35, 48, 36, 37] which state that the variance is roughly $\tfrac{1}{K}S(f)^{2}$. However, if $K=2NW-O(1)$ tapers are used, the term $M\Sigma_{K}^{(2)}$ could dominate this bound, and the variance could be much larger than the asymptotic result.

We also note that if two frequencies $f_{1},f_{2}$ are more than $2W$ apart, then the multitaper spectral estimates at those frequencies have a very low covariance.

Theorem 4.

For any frequencies $f_{1},f_{2}\in\mathbb{R}$ such that $2W<|f_{1}-f_{2}|<1-2W$, the covariance of the multitaper spectral estimates at those frequencies is bounded by

$$0\leq\operatorname{Cov}\left[\widehat{S}^{\text{mt}}_{K}(f_{1}),\widehat{S}^{\text{mt}}_{K}(f_{2})\right]\leq\left((R_{f_{1}}+R_{f_{2}})\sqrt{\dfrac{2NW}{K}\Sigma^{(1)}_{K}}+M\Sigma^{(1)}_{K}\right)^{2}.$$

If $K=2NW-O(\log(NW)\log\tfrac{1}{\delta})$ tapers are used for some small $\delta>0$, then the covariance is guaranteed to be small. However, if $K=2NW-O(1)$ tapers are used, the upper bound is no longer guaranteed to be small, and the covariance could be large.

Finally, we also provide a concentration result for the multitaper spectral estimate.

Theorem 5.

For any frequency $f\in\mathbb{R}$, the multitaper spectral estimate satisfies the concentration inequalities

$$\mathbb{P}\left\{\widehat{S}^{\text{mt}}_{K}(f)\geq\beta\,\mathbb{E}\widehat{S}^{\text{mt}}_{K}(f)\right\}\leq\beta^{-1}\mathrm{e}^{-\kappa_{f}(\beta-1-\ln\beta)}\quad\text{for}\quad\beta>1,$$

and

$$\mathbb{P}\left\{\widehat{S}^{\text{mt}}_{K}(f)\leq\beta\,\mathbb{E}\widehat{S}^{\text{mt}}_{K}(f)\right\}\leq\mathrm{e}^{-\kappa_{f}(\beta-1-\ln\beta)}\quad\text{for}\quad 0<\beta<1,$$

where the frequency-dependent constant $\kappa_{f}$ satisfies

$$\kappa_{f}\geq\dfrac{K\left(1-\Sigma^{(1)}_{K}\right)M_{f}-2NW(M_{f}-A_{f})}{M_{f}+(M-M_{f})(1-\lambda_{K-1})}.$$

We note that these are identical to the concentration bounds for a chi-squared random variable with $2\kappa_{f}$ degrees of freedom. If $K=2NW-O(\log(NW)\log\tfrac{1}{\delta})$ tapers are used for some small $\delta>0$ and the power spectral density is “slowly varying” over $[f-W,f+W]$, then this lower bound on $\kappa_{f}$ is only slightly less than $K$. Hence, $\widehat{S}^{\text{mt}}_{K}(f)$ has a concentration behavior that is similar to a chi-squared random variable with $2K$ degrees of freedom, as the asymptotic results in [4, 34] suggest. However, if $K=2NW-O(1)$ tapers are used, then $\kappa_{f}$ could be much smaller, and thus, the multitaper spectral estimate would have significantly worse concentration about its mean.

The proofs of Theorems 1-5 are given in Appendix A. In Section 5, we perform simulations demonstrating that using $K=2NW-O(1)$ tapers results in a multitaper spectral estimate that is vulnerable to spectral leakage, whereas using $K=2NW-O(\log(NW)\log\tfrac{1}{\delta})$ tapers for a suitably small $\delta>0$ significantly reduces the impact of spectral leakage on the multitaper spectral estimate.

4 Fast Algorithms

Given a vector of $N$ samples $\boldsymbol{x}\in\mathbb{C}^{N}$, evaluating the multitaper spectral estimate $\widehat{S}^{\text{mt}}_{K}(f)$ at a grid of $L$ evenly spaced frequencies $f\in[L]/L$ (where we assume $L\geq N$) can be done in $O(KL\log L)$ operations and using $O(KL)$ memory via $K$ length-$L$ fast Fourier transforms (FFTs). In applications where the number of samples $N$ is small, the number of tapers $K$ used is usually a small constant, and so, the computational requirements are a small constant factor more than that of an FFT. However, in many applications, using a large number of tapers is desirable, but the computational requirements make this impractical. As mentioned in Section 1, if the power spectrum $S(f)$ is twice-differentiable, then the MSE of the multitaper spectral estimate is minimized when the bandwidth parameter is $W=O(N^{-1/5})$ and $K=O(N^{4/5})$ tapers are used [36]. For medium to large scale problems, precomputing and storing $O(N^{4/5})$ tapers and/or performing $O(N^{4/5})$ FFTs may be impractical.

In this section, we present an $\epsilon$-approximation $\widetilde{S}^{\text{mt}}_{K}(f)$ to the multitaper spectral estimate $\widehat{S}^{\text{mt}}_{K}(f)$ which requires $O(L\log L\log(NW)\log\tfrac{1}{\epsilon})$ operations and $O(L\log(NW)\log\tfrac{1}{\epsilon})$ memory. This is faster than exact multitaper spectral estimation provided the number of tapers satisfies $K\gtrsim\log(NW)\log\tfrac{1}{\epsilon}$.

To construct this approximation, we first fix a tolerance parameter $\epsilon\in(0,\tfrac{1}{2})$, and suppose that the number of tapers, $K$, is chosen such that $\lambda_{K-1}\geq\tfrac{1}{2}$ and $\lambda_{K}\leq 1-\epsilon$. Note that this is a very mild assumption as it only forces $K$ to be slightly less than $2NW$. Next, we partition the indices $[N]$ into four sets:

$$\begin{aligned}\mathcal{I}_{1}&=\{k\in[K]:\lambda_{k}\geq 1-\epsilon\}\\ \mathcal{I}_{2}&=\{k\in[K]:\epsilon<\lambda_{k}<1-\epsilon\}\\ \mathcal{I}_{3}&=\{k\in[N]\setminus[K]:\epsilon<\lambda_{k}<1-\epsilon\}\\ \mathcal{I}_{4}&=\{k\in[N]\setminus[K]:\lambda_{k}\leq\epsilon\}\end{aligned}$$

and define the approximate estimator

$$\widetilde{S}^{\text{mt}}_{K}(f):=\dfrac{1}{K}\Psi(f)+\dfrac{1}{K}\sum_{k\in\mathcal{I}_{2}}(1-\lambda_{k})\widehat{S}_{k}(f)-\dfrac{1}{K}\sum_{k\in\mathcal{I}_{3}}\lambda_{k}\widehat{S}_{k}(f),$$

where

$$\Psi(f):=\sum_{k=0}^{N-1}\lambda_{k}\widehat{S}_{k}(f).$$

Both $\widehat{S}^{\text{mt}}_{K}(f)$ and $\widetilde{S}^{\text{mt}}_{K}(f)$ are weighted sums of the single taper estimates $\widehat{S}_{k}(f)$ for $k\in[N]$. Additionally, it can be shown that the weights are similar, i.e., the first $K$ weights are exactly or approximately $\tfrac{1}{K}$, and the last $N-K$ weights are exactly or approximately $0$. Hence, it is reasonable to expect that $\widetilde{S}^{\text{mt}}_{K}(f)\approx\widehat{S}^{\text{mt}}_{K}(f)$. The following theorem shows that this is indeed the case.

Theorem 6.

The approximate multitaper spectral estimate $\widetilde{S}^{\text{mt}}_{K}(f)$ defined above satisfies

$$\left|\widetilde{S}^{\text{mt}}_{K}(f)-\widehat{S}^{\text{mt}}_{K}(f)\right|\leq\dfrac{\epsilon}{K}\|\boldsymbol{x}\|_{2}^{2}\quad\text{for all}\quad f\in\mathbb{R}.$$

Furthermore, we show in Lemma 8 that $\Psi(f)=\boldsymbol{x}^{*}\boldsymbol{E}_{f}\boldsymbol{B}\boldsymbol{E}_{f}^{*}\boldsymbol{x}$. This formula does not involve any of the Slepian tapers. By exploiting the fact that the prolate matrix $\boldsymbol{B}$ is Toeplitz, we also show in Lemma 8 that if $L\geq 2N$, then

$$\begin{bmatrix}\Psi(\tfrac{0}{L})&\Psi(\tfrac{1}{L})&\cdots&\Psi(\tfrac{L-2}{L})&\Psi(\tfrac{L-1}{L})\end{bmatrix}^{T}=\boldsymbol{F}^{-1}\left(\boldsymbol{b}\circ\boldsymbol{F}\left|\boldsymbol{F}\boldsymbol{Z}\boldsymbol{x}\right|^{2}\right),$$

where $\boldsymbol{Z}\in\mathbb{R}^{L\times N}$ is a matrix which zero-pads length-$N$ vectors to length-$L$, $\boldsymbol{F}\in\mathbb{C}^{L\times L}$ is a length-$L$ FFT matrix, $\boldsymbol{b}\in\mathbb{R}^{L}$ is the first column of the matrix formed by extending the prolate matrix $\boldsymbol{B}$ to an $L\times L$ circulant matrix, $|\cdot|^{2}$ denotes the pointwise magnitude-squared, and $\circ$ denotes a pointwise multiplication. Hence, $\Psi(f)$ can be evaluated at a grid of $L$ evenly spaced frequencies $f\in[L]/L$ in $O(L\log L)$ operations via three length-$L$ FFTs/inverse FFTs. Evaluating the other $\#(\mathcal{I}_{2}\cup\mathcal{I}_{3})=O(\log(NW)\log\tfrac{1}{\epsilon})$ terms in the expression for $\widetilde{S}^{\text{mt}}_{K}(f)$ at the $L$ grid frequencies can be done in $O(L\log L\log(NW)\log\tfrac{1}{\epsilon})$ operations via $\#(\mathcal{I}_{2}\cup\mathcal{I}_{3})$ length-$L$ FFTs. Using these results, we establish the following theorem which states how quickly $\widetilde{S}^{\text{mt}}_{K}(f)$ can be evaluated at the grid frequencies.

Theorem 7.

For any vector of samples $\boldsymbol{x}\in\mathbb{C}^{N}$ and any number of grid frequencies $L\geq N$, the approximate multitaper spectral estimate $\widetilde{S}^{\text{mt}}_{K}(f)$ can be evaluated at the $L$ grid frequencies $f\in[L]/L$ in $O(L\log L\log(NW)\log\tfrac{1}{\epsilon})$ operations and using $O(L\log(NW)\log\tfrac{1}{\epsilon})$ memory.

Note that if $N\leq L<2N$, we can apply the method briefly described above to evaluate $\Psi(f)$ at $f\in[2L]/2L$, and then downsample the result. The proofs of Theorems 6 and 7 are given in Appendix B. In Section 5, we perform simulations comparing the time needed to evaluate $\widehat{S}^{\text{mt}}_{K}(f)$ and $\widetilde{S}^{\text{mt}}_{K}(f)$ at a grid of frequencies.
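To make the procedure concrete, here is a minimal sketch of the full $\epsilon$-approximation on the grid $f\in[2N]/2N$, following the displayed identity for $\Psi(f)$ and the definition of $\widetilde{S}^{\text{mt}}_{K}(f)$. The headroom added past $\lfloor 2NW\rfloor$ when precomputing tapers is a heuristic stand-in for the $O(\log(NW)\log\tfrac{1}{\epsilon})$ transition width; a real implementation would compute only the tapers indexed by $\mathcal{I}_{2}\cup\mathcal{I}_{3}$:

```python
import numpy as np
from scipy.signal import windows

def approx_multitaper(x, W, K, eps):
    """eps-approximation of the multitaper estimate on the grid f = j/L, L = 2N."""
    N = len(x)
    L = 2 * N
    # Psi(f) via three length-L FFTs: b is the first column of the L x L
    # circulant extension of the (symmetric Toeplitz) prolate matrix B
    b = np.zeros(L)
    k = np.arange(1, N)
    b[0] = 2 * W
    b[1:N] = np.sin(2 * np.pi * W * k) / (np.pi * k)
    b[L - N + 1:] = b[1:N][::-1]
    Psi = np.fft.ifft(b * np.fft.fft(np.abs(np.fft.fft(x, L)) ** 2)).real
    # single-taper estimates for the transition-region correction terms
    Kmax = min(N, int(2 * N * W) + 50)                 # heuristic headroom
    tapers, lam = windows.dpss(N, N * W, Kmax=Kmax, norm=2, return_ratios=True)
    S_hat = np.abs(np.fft.fft(tapers * x, n=L, axis=1)) ** 2
    est = Psi.copy()
    for j in range(K):                                 # I_2: eigenvalue below 1 - eps
        if lam[j] < 1 - eps:
            est += (1 - lam[j]) * S_hat[j]
    for j in range(K, Kmax):                           # I_3: eigenvalue above eps
        if lam[j] > eps:
            est -= lam[j] * S_hat[j]
    return est / K
```

(The first loop does not also check $\lambda_{j}>\epsilon$ because, under the assumption $\lambda_{K-1}\geq\tfrac{1}{2}$, that condition holds automatically for $j<K$.)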

5 Simulations

In this section, we show simulations to demonstrate three observations. (1) Using $K=2NW-O(\log(NW))$ tapers instead of the traditional choice of $K=\left\lfloor 2NW\right\rfloor-1$ tapers significantly reduces the effects of spectral leakage. (2) Using a larger bandwidth $W$, and thus more tapers, can produce a more robust spectral estimate. (3) As the number of samples $N$ and the number of tapers $K$ grow, our approximation $\widetilde{S}^{\text{mt}}_{K}(f)$ becomes significantly faster to use than the exact multitaper spectral estimate $\widehat{S}^{\text{mt}}_{K}(f)$.

5.1 Spectral leakage

First, we demonstrate that choosing $K=2NW-O(\log(NW))$ tapers instead of the traditional choice of $K=\left\lfloor 2NW\right\rfloor-1$ tapers significantly reduces the effects of spectral leakage. We fix a signal length of $N=2000$ and a bandwidth parameter of $W=\tfrac{1}{100}$ (so $2NW=40$), and consider four choices for the number of tapers: $K=39$, $36$, $32$, and $29$. Note that $K=39=\left\lfloor 2NW\right\rfloor-1$ is the traditional choice as to how many tapers to use, while $36$, $32$, and $29$ are the largest values of $K$ such that $\lambda_{K-1}$ is at least $1-10^{-3}$, $1-10^{-6}$, and $1-10^{-9}$ respectively.

In Figure 2, we show three plots of the spectral window

$$\psi(f)=\dfrac{1}{K}\sum_{k=0}^{K-1}\left|\sum_{n=0}^{N-1}\boldsymbol{s}_{k}[n]e^{-j2\pi fn}\right|^{2}$$

of the multitaper spectral estimate for each of those values of $K$. At the top of Figure 2, we plot $\psi(f)$ over the entire range $[-\tfrac{1}{2},\tfrac{1}{2}]$ using a logarithmic scale. The curves outside $[-W,W]$ appear thick due to the highly oscillatory behavior of $\psi(f)$. This can be better seen in the middle of Figure 2, where we plot $\psi(f)$ over $[-2W,2W]$ using a logarithmic scale. The behavior of $\psi(f)$ inside $[-W,W]$ can be better seen at the bottom of Figure 2, where we plot $\psi(f)$ over $[-2W,2W]$ using a linear scale.

All four spectral windows have similar behavior in that $\psi(f)$ is small outside $[-W,W]$ and large near $0$. However, outside of $[-W,W]$ the spectral windows using $K=\left\lfloor 2NW\right\rfloor-O(\log(NW))$ tapers are multiple orders of magnitude smaller than the spectral window using $K=\left\lfloor 2NW\right\rfloor-1$ tapers. Hence, the amount of spectral leakage can be reduced by multiple orders of magnitude by trimming the number of tapers used from $K=\left\lfloor 2NW\right\rfloor-1$ to $K=2NW-O(\log(NW))$.

Figure 2: Plots of the spectral windows $\psi(f)$ for $N=2000$, $W=\tfrac{1}{100}$, and $K=39$, $36$, $32$, and $29$ tapers. (Top) A logarithmic scale plot over $f\in[-\tfrac{1}{2},\tfrac{1}{2}]$. (Middle) A logarithmic scale plot over $f\in[-2W,2W]$. (Bottom) A linear scale plot over $f\in[-2W,2W]$.

We further demonstrate the importance of using $K=2NW-O(\log(NW))$ tapers to reduce spectral leakage by showing a signal detection example. We generate a vector $\boldsymbol{x}\in\mathbb{C}^{N}$ of $N=2000$ samples of a Gaussian random process with a power spectral density function of

$$S(f)=\begin{cases}10^{3}&\text{if}\ f\in[0.18,0.22]\\ 10^{9}&\text{if}\ f\in[0.28,0.32]\\ 10^{2}&\text{if}\ f\in[0.38,0.42]\\ 10^{1}&\text{if}\ f\in[0.78,0.82]\\ 10^{0}&\text{else}.\end{cases}$$
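One standard way to synthesize such samples is frequency-domain synthesis: draw i.i.d. complex Gaussians on a fine frequency grid, shape them by $\sqrt{S(f)}$, and apply an FFT. The sketch below is illustrative only (the synthesized process is approximately stationary, with the approximation improving as the grid grows, and the oversampling factor is a heuristic); it is not necessarily the exact procedure used to produce the figures:

```python
import numpy as np

def synthesize(N, S, oversample=4):
    """Approximate samples of a stationary complex Gaussian process with PSD S(f)."""
    L = oversample * N
    f = np.arange(L) / L
    g = (np.random.randn(L) + 1j * np.random.randn(L)) / np.sqrt(2)  # i.i.d. CN(0,1)
    return np.fft.fft(np.sqrt(S(f)) * g)[:N] / np.sqrt(L)

def S_example(f):
    """The piecewise-constant PSD displayed above."""
    S = np.ones_like(f)
    for (lo, hi), p in zip([(0.18, 0.22), (0.28, 0.32), (0.38, 0.42), (0.78, 0.82)],
                           [3, 9, 2, 1]):
        S[(f >= lo) & (f <= hi)] = 10.0 ** p
    return S

x = synthesize(2000, S_example)
```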

This power spectral density simulates an antenna receiving signals from four narrowband sources with some background noise. Note that one source is significantly stronger than the other three sources. In Figure 3, we plot:

  • the periodogram of $\boldsymbol{x}$,

  • the multitaper spectral estimate of $\boldsymbol{x}$ with $W=\tfrac{1}{100}$ and $K=\left\lfloor 2NW\right\rfloor-1=39$ tapers,

  • the multitaper spectral estimate of $\boldsymbol{x}$ with $W=\tfrac{1}{100}$ and $K=29$ tapers (chosen so that $\lambda_{K-1}\geq 1-10^{-9}$).

Figure 3: Plots of the true power spectral density, the periodogram, the multitaper spectral estimate with $W=\tfrac{1}{100}$ and $K=\left\lfloor 2NW\right\rfloor-1=39$ tapers, and the multitaper spectral estimate with $W=\tfrac{1}{100}$ and $K=29$ tapers (chosen so that $\lambda_{K-1}\geq 1-10^{-9}$).

We note that all three spectral estimates yield large values in the frequency band $[0.28,0.32]$. However, in the periodogram and the multitaper spectral estimate with $K=39$ tapers, the energy in the frequency band $[0.28,0.32]$ “leaks” into the frequency bands occupied by the smaller three sources. As a result, the smaller three sources are hard to detect using the periodogram or the multitaper spectral estimate with $K=39$ tapers. However, all four sources are clearly visible when looking at the multitaper spectral estimate with $K=29$ tapers. For frequencies $f$ not within $W$ of the edges of the frequency bands, the multitaper spectral estimate is within a small constant factor of the true power spectral density.

5.2 Comparison of spectral estimation methods

Next, we demonstrate a few key points about selecting the bandwidth parameter $W$ and the number of tapers $K$ used in the multitaper spectral estimate. We compare the following eight methods to estimate the power spectrum of a Gaussian random process from a vector $\boldsymbol{x}\in\mathbb{C}^{N}$ of $N=2^{18}=262144$ samples:

  1. The classic periodogram

  2. A tapered periodogram using a single DPSS taper $\boldsymbol{s}_{0}$ with $2NW=8$ (chosen such that $\lambda_{0}\geq 1-10^{-9}$)

  3. The exact multitaper spectral estimate $\widehat{S}^{\text{mt}}_{K}(f)$ with a small bandwidth parameter of $W=1.25\times 10^{-4}$ and $K=\left\lfloor 2NW\right\rfloor-1=64$ tapers

  4. The exact multitaper spectral estimate $\widehat{S}^{\text{mt}}_{K}(f)$ with a small bandwidth parameter of $W=1.25\times 10^{-4}$ and $K=53$ tapers (chosen such that $\lambda_{K-1}\geq 1-10^{-9}>\lambda_{K}$)

  5. The approximate multitaper spectral estimate $\widetilde{S}^{\text{mt}}_{K}(f)$ with a larger bandwidth parameter of $W=2.0\times 10^{-3}$, $K=\left\lfloor 2NW\right\rfloor-1=1047$ tapers, and a tolerance parameter of $\epsilon=10^{-9}$

  6. The approximate multitaper spectral estimate $\widetilde{S}^{\text{mt}}_{K}(f)$ with a larger bandwidth parameter of $W=2.0\times 10^{-3}$, $K=1031$ tapers (chosen such that $\lambda_{K-1}\geq 1-10^{-9}>\lambda_{K}$), and a tolerance parameter of $\epsilon=10^{-9}$

  7. The exact multitaper spectral estimate with the adaptive weighting scheme suggested by Thomson [4] with a small bandwidth parameter of $W=1.25\times 10^{-4}$ and $K=\left\lfloor 2NW\right\rfloor-1=64$ tapers

  8. The exact multitaper spectral estimate with the adaptive weighting scheme suggested by Thomson [4] with a larger bandwidth parameter of $W=2.0\times 10^{-3}$ and $K=\left\lfloor 2NW\right\rfloor-1=1047$ tapers
The adaptive weighting scheme computes the single taper periodograms $\widehat{S}_{k}(f)$ for $k\in[K]$, and then forms a weighted estimate

$$\widehat{S}^{\text{ad}}_{K}(f)=\dfrac{\sum_{k=0}^{K-1}\alpha_{k}(f)\widehat{S}_{k}(f)}{\sum_{k=0}^{K-1}\alpha_{k}(f)}$$

where the frequency-dependent weights $\alpha_{k}(f)$ satisfy

$$\alpha_{k}(f)=\dfrac{\lambda_{k}\widehat{S}^{\text{ad}}_{K}(f)^{2}}{\left(\lambda_{k}\widehat{S}^{\text{ad}}_{K}(f)+(1-\lambda_{k})\sigma^{2}\right)^{2}}$$

with $\sigma^{2}=\tfrac{1}{N}\|\boldsymbol{x}\|_{2}^{2}$. Of course, solving for the weights directly is difficult, so this method requires initializing the weights and alternating between updating the estimate $\widehat{S}^{\text{ad}}_{K}(f)$ and updating the weights $\alpha_{k}(f)$. This weighting procedure is designed to keep all the weights large at frequencies where $S(f)$ is large and to reduce the weights of the last few tapers at frequencies where $S(f)$ is small. Effectively, this allows the spectrum to be estimated with more tapers at frequencies where $S(f)$ is large while simultaneously reducing the spectral leakage from the last few tapers at frequencies where $S(f)$ is small. The cost is the increased computation time due to setting the weights iteratively. For more details regarding this adaptive scheme, see [8].
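A sketch of this fixed-point iteration is given below; the initialization from the first two tapered periodograms and the fixed iteration count are common conventions rather than prescriptions from [4]:

```python
import numpy as np

def adaptive_multitaper(S_k, lam, sigma2, n_iter=50):
    """Thomson's adaptive weighting via fixed-point iteration.
    S_k: (K, L) single-taper estimates on the frequency grid,
    lam: the first K eigenvalues, sigma2: ||x||_2^2 / N."""
    lam = lam[:, None]                       # broadcast over frequencies
    S_ad = S_k[:2].mean(axis=0)              # common initialization: first two tapers
    for _ in range(n_iter):
        alpha = lam * S_ad**2 / (lam * S_ad + (1 - lam) * sigma2) ** 2
        S_ad = (alpha * S_k).sum(axis=0) / alpha.sum(axis=0)
    return S_ad
```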

In Figure 4, we plot the power spectrum and the eight estimates for a single realization $\boldsymbol{x}\in\mathbb{C}^{N}$ of the Gaussian random process. Additionally, for $1000$ realizations $\boldsymbol{x}_{i}\in\mathbb{C}^{N}$, $i=1,\ldots,1000$, of the Gaussian random process, we compute a spectral estimate using each of the above eight methods. In Figure 5, we plot the empirical mean logarithmic deviation in dB, i.e.,

$$\dfrac{1}{1000}\sum_{i=1}^{1000}\left|10\log_{10}\dfrac{\widehat{S}[\boldsymbol{x}_{i}](f)}{S(f)}\right|$$

for each of the eight methods. In Table 1, we list the average time needed to precompute the DPSS tapers, the average time to compute the spectral estimate after the tapers are computed, and the average of the empirical mean logarithmic deviation in dB across the frequency spectrum.

We make the following observations:

  • The periodogram and the single taper periodogram (methods 1 and 2) are too noisy to be useful spectral estimates.

  • Methods 3, 4, and 7 yield a noticeably noisier spectral estimate than methods 5, 6, and 8. This is due to the fact that methods 5, 6, and 8 use a larger bandwidth parameter and more tapers.

  • The spectral estimates obtained with methods 1, 3 and 5 suffer from spectral leakage, i.e., the error is large at frequencies $f$ where $S(f)$ is small, as can be seen in Figure 5. This is due to the fact that they use $K=\left\lfloor 2NW\right\rfloor-1$ tapers, and thus, include tapered periodograms $\widehat{S}_{k}(f)$ for which $\lambda_{k}$ is not very close to $1$.

  • Methods 4 and 6 avoid using tapered periodograms $\widehat{S}_{k}(f)$ for which $\lambda_{k}<1-10^{-9}$, and methods 7 and 8 use these tapered periodograms but assign a low weight to them at frequencies where $S(f)$ is small. Hence, methods 4, 6, 7, and 8 are able to mitigate the spectral leakage phenomenon.

  • Methods 5 and 6 are slightly faster than methods 3 and 4 due to the fact that our approximate multitaper spectral estimate only needs to compute $\#\{k:\epsilon<\lambda_{k}<1-\epsilon\}=36$ tapers and $36$ tapered periodograms.

  • Method 7 takes noticeably longer than methods 3 and 4, and method 8 takes considerably longer than methods 5 and 6. This is because the iterative method for computing the adaptive weights requires many iterations to converge when the underlying power spectral density has a high dynamic range.

  • Methods 6 and 8 exhibit very similar performance. This is to be expected, as using a weighted average of $1047$ tapered periodograms is similar to using the unweighted average of the first $1031$ tapered periodograms. The empirical mean logarithmic deviation is larger at frequencies where $S(f)$ is rapidly varying and smaller at frequencies where $S(f)$ is slowly varying. This is to be expected as the local bias (caused by the smoothing effect of the tapers) dominates the variance at these frequencies.

Figure 4: Plots of the true spectrum (blue) and the spectral estimates (red) using each of the eight methods.
Figure 5: Plots of the empirical mean logarithmic deviation using each of the eight methods.
Method   Avg MLD (dB)   Avg Precomputation Time (sec)   Avg Run Time (sec)
1        5.83030        N/A                             0.0111
2        4.41177        0.1144                          0.0111
3        1.48562        4.3893                          0.3771
4        0.47836        3.6274                          0.3093
5        1.56164        3.3303                          0.2561
6        0.12534        3.3594                          0.2533
7        0.45080        4.3721                          1.4566
8        0.12532        78.9168                         17.2709
Table 1: Mean logarithmic deviations (averaged across the entire frequency spectrum), precomputation times, and computation times for each of the eight spectral estimation methods.

From this, we can draw several conclusions. First, by slightly trimming the number of tapers from $K=\left\lfloor 2NW\right\rfloor-1$ to $2NW-O(\log(NW))$, one can significantly mitigate the amount of spectral leakage in the spectral estimate. Second, using a larger bandwidth parameter and averaging over more tapered periodograms will result in a less noisy spectral estimate. Third, our fast approximate multitaper spectral estimate can yield a more accurate estimate in the same amount of time as an exact multitaper spectral estimate because our fast approximation needs to compute significantly fewer tapers and tapered periodograms. Fourth, a multitaper spectral estimate with $\left\lfloor 2NW\right\rfloor-1$ tapers and adaptive weights can yield a slightly lower error than a multitaper spectral estimate with $2NW-O(\log(NW))$ tapers and fixed weights, but as $2NW$ increases, this effect becomes minimal and not worth the increased computational cost.

5.3 Fast multitaper spectral estimation

Finally, we demonstrate that the runtime for computing the approximate multitaper spectral estimate $\widetilde{S}^{\text{mt}}_{K}(f)$ scales roughly linearly with the number of samples. We vary the signal length $N$ over several values between $2^{10}$ and $2^{20}$. For each value of $N$, we set the bandwidth parameter to be $W=0.08\cdot N^{-1/5}$, as this is similar to what is asymptotically optimal for a twice differentiable spectrum [36]. We then choose a number of tapers $K$ such that $\lambda_{K-1}\geq 1-10^{-3}>\lambda_{K}$, and then compute the approximate multitaper spectral estimate $\widetilde{S}^{\text{mt}}_{K}(f)$ at $f\in[N]/N$ for each of the tolerances $\epsilon=10^{-4}$, $10^{-8}$, and $10^{-12}$. If $N\leq 2^{17}$, we also compute the exact multitaper spectral estimate $\widehat{S}^{\text{mt}}_{K}(f)$ at $f\in[N]/N$. For $N>2^{17}$, we did not evaluate the exact multitaper spectral estimate due to the excessive computational requirements.

We split the work needed to produce the spectral estimate into a precomputation stage and a computation stage. The precomputation stage involves computing the Slepian tapers which are required for the spectral estimate. In applications where a multitaper spectral estimate needs to be computed for several signals (using the same parameters $N,W,K$), computing the tapers only needs to be done once. The exact multitaper spectral estimate $\widehat{S}^{\text{mt}}_{K}(f)$ requires computing $\boldsymbol{s}_{k}$ for $k\in[K]$, while the approximate multitaper spectral estimate $\widetilde{S}^{\text{mt}}_{K}(f)$ requires computing $\boldsymbol{s}_{k}$ for $k\in\mathcal{I}_{2}\cup\mathcal{I}_{3}=\{k:\epsilon<\lambda_{k}<1-\epsilon\}$. The computation stage involves evaluating $\widehat{S}^{\text{mt}}_{K}(f)$ or $\widetilde{S}^{\text{mt}}_{K}(f)$ at $f\in[N]/N$ with the required tapers $\boldsymbol{s}_{k}$ already computed.

In Figures 6 and 7, we plot the precomputation and computation time respectively versus the signal length NN for the exact multitaper spectral estimate S^Kmt(f)\widehat{S}^{\text{mt}}_{K}(f) as well as the approximate multitaper spectral estimate S~Kmt(f)\widetilde{S}^{\text{mt}}_{K}(f) for ϵ=104\epsilon=10^{-4}, 10810^{-8}, and 101210^{-12}. The times were averaged over 100100 trials of the procedure described above. The plots are on a log-log scale. We note that the precomputation and computation times for the approximate multitaper spectral estimate grow roughly linearly with NN while the precomputation and computation times for the exact multitaper spectral estimate grow significantly faster. Also, the precomputation and computation times for the approximate multitaper spectral estimate with the very small tolerance ϵ=1012\epsilon=10^{-12} are only two to three times more than those for the larger tolerance ϵ=104\epsilon=10^{-4}.

Figure 6: Plot of the average precomputation time vs. signal length NN for the exact multitaper spectral estimate and our fast approximation for ϵ=104\epsilon=10^{-4}, 10810^{-8}, and 101210^{-12}.
Figure 7: Plot of the average computation time vs. signal length NN for the exact multitaper spectral estimate and our fast approximation for ϵ=104\epsilon=10^{-4}, 10810^{-8}, and 101210^{-12}.

6 Conclusions

Thomson’s multitaper method is a widely used tool for spectral estimation. In this paper, we have presented a linear-algebra-based perspective on Thomson’s multitaper method. Specifically, the multitaper spectral estimate S^Kmt(f)\widehat{S}^{\text{mt}}_{K}(f) can be viewed as the 22-norm energy of the projection of the signal 𝒙N\boldsymbol{x}\in\mathbb{C}^{N} onto a KK-dimensional subspace 𝒮fN\mathcal{S}_{f}\subset\mathbb{C}^{N}. The subspace 𝒮f\mathcal{S}_{f} is chosen to be the optimal KK-dimensional subspace for representing sinusoids with frequencies between fWf-W and f+Wf+W.

We have also provided non-asymptotic bounds on the bias, variance, covariance, and probability tails of the multitaper spectral estimate. In particular, if the multitaper spectral estimate uses K=2NWO(log(NW))K=2NW-O(\log(NW)) tapers, the effects of spectral leakage are mitigated, and our non-asymptotic bounds are very similar to existing asymptotic results. However, when the traditional choice of K=2NW1K=\left\lfloor 2NW\right\rfloor-1 tapers is used, the non-asymptotic bounds are significantly worse, and our simulations show the multitaper spectral estimate suffers severely from spectral leakage.

Finally, we have derived an ϵ\epsilon-approximation to the multitaper spectral estimate which can be evaluated at a grid of LL evenly spaced frequencies in O(LlogLlog(NW)log1ϵ)O(L\log L\log(NW)\log\tfrac{1}{\epsilon}) operations. Since evaluating the exact multitaper spectral estimate at a grid of LL evenly spaced frequencies requires O(KLlogL)O(KL\log L) operations, our ϵ\epsilon-approximation is faster when Klog(NW)log1ϵK\gtrsim\log(NW)\log\tfrac{1}{\epsilon} tapers are required. In problems involving a large number of samples, minimizing the mean-squared error of the multitaper spectral estimate requires using a large number of tapers, and thus, our ϵ\epsilon-approximation can drastically reduce the required computation.

References

  • [1] S. Karnik, J. Romberg, and M. A. Davenport. Fast multitaper spectral estimation. In 2019 13th International conference on Sampling Theory and Applications (SampTA), pages 1–4, Bordeaux, France, July 2019. IEEE.
  • [2] J. H. McClellan, R. W. Schafer, and M. A. Yoder. Signal Processing First. Pearson, Upper Saddle River, NJ, 2003.
  • [3] D. Slepian. Prolate spheroidal wave functions, Fourier analysis, and uncertainty. V – The discrete case. Bell Sys. Tech. J., 57(5):1371–1430, 1978.
  • [4] D. J. Thomson. Spectrum estimation and harmonic analysis. Proc. IEEE, 70(9):1055–1096, September 1982.
  • [5] S. Haykin. Cognitive radio: brain-empowered wireless communications. IEEE J. Sel. Areas Commun., 23(2):201–220, February 2005.
  • [6] B. Farhang-Boroujeny. Filter Bank Spectrum Sensing for Cognitive Radios. IEEE Trans. Signal Process., 56(5):1801–1811, May 2008.
  • [7] B. Farhang-Boroujeny and R. Kempter. Multicarrier communication techniques for spectrum sensing and communication in cognitive radios. IEEE Communications Magazine, 46(4):80–85, April 2008.
  • [8] S. Haykin, D. J. Thomson, and J. H. Reed. Spectrum sensing for cognitive radio. Proc. IEEE, 97(5):849–877, May 2009.
  • [9] E. Axell, G. Leus, E. Larsson, and H. Poor. Spectrum sensing for cognitive radio: State-of-the-art and recent advances. IEEE Signal Processing Magazine, 29(3):101–116, May 2012.
  • [10] K. N. Hamdy, M. Ali, and A. H. Tewfik. Low bit rate high quality audio coding with combined harmonic and wavelet representations. In 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) Conference Proceedings, volume 2, pages 1045–1048, Atlanta, GA, USA, 1996. IEEE.
  • [11] T. Painter and A. Spanias. Perceptual coding of digital audio. Proc. IEEE, 88(4):451–515, April 2000.
  • [12] A. Delorme and S. Makeig. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neuroscience Methods, 134(1):9–21, March 2004.
  • [13] A. Delorme, T. Sejnowski, and S. Makeig. Enhanced detection of artifacts in EEG data using higher-order statistics and independent component analysis. NeuroImage, 34(4):1443–1449, February 2007.
  • [14] R. R. Llinas, U. Ribary, D. Jeanmonod, E. Kronberg, and P. P. Mitra. Thalamocortical dysrhythmia: A neurological and neuropsychiatric syndrome characterized by magnetoencephalography. Proc. Nat. Acad. Sci., 96(26):15222–15227, December 1999.
  • [15] B. Pesaran, J. S. Pezaris, M. Sahani, P. P. Mitra, and R. A. Andersen. Temporal structure in neuronal activity during working memory in macaque parietal cortex. Nature Neuroscience, 5(8):805–811, August 2002.
  • [16] J. Csicsvari, B. Jamieson, K. D. Wise, and G. Buzsáki. Mechanisms of gamma oscillations in the hippocampus of the behaving rat. Neuron, 37(2):311–322, January 2003.
  • [17] P. P. Mitra and B. Pesaran. Analysis of dynamic brain imaging data. Biophysical Journal, 76(2):691–708, February 1999.
  • [18] M. W. Jones and M. A. Wilson. Theta rhythms coordinate hippocampal–prefrontal interactions in a spatial memory task. PLoS Biology, 3(12):e402, November 2005.
  • [19] G. Bond, W. Showers, M. Cheseby, R. Lotti, P. Almasi, P. de Menocal, P. Priore, H. Cullen, I. Hajdas, and G. Bonani. A pervasive millennial-scale cycle in North Atlantic Holocene and glacial climates. Science, 278(5341):1257–1266, November 1997.
  • [20] M. Ghil, M. R. Allen, M. D. Dettinger, K. Ide, D. Kondrashov, M. E. Mann, A. W. Robertson, A. Saunders, Y. Tian, F. Varadi, and P. Yiou. Advanced spectral methods for climatic time series. Rev. Geophys., 40(1):3–1–3–41, 2002.
  • [21] R. Vautard and M. Ghil. Singular spectrum analysis in nonlinear dynamics, with applications to paleoclimatic time series. Physica D: Nonlinear Phenomena, 35(3):395–424, May 1989.
  • [22] M. E. Mann and J. M. Lees. Robust estimation of background noise and signal detection in climatic time series. Climatic Change, 33(3):409–445, July 1996.
  • [23] S. Minobe. A 50-70 year climatic oscillation over the North Pacific and North America. Geophys. Res. Lett., 24(6):683–686, March 1997.
  • [24] J. Jouzel, N. I. Barkov, J. M. Barnola, M. Bender, J. Chappellaz, C. Genthon, V. M. Kotlyakov, V. Lipenkov, C. Lorius, J. R. Petit, D. Raynaud, G. Raisbeck, C. Ritz, T. Sowers, M. Stievenard, F. Yiou, and P. Yiou. Extending the Vostok ice-core record of palaeoclimate to the penultimate glacial period. Nature, 364(6436):407–412, July 1993.
  • [25] D. J. Thomson. Time series analysis of Holocene climate data. Philos. Trans. Roy. Soc. London. Ser. A, Math. and Physical Sci., 330(1615):601–616, 1990.
  • [26] D. Brunton, A. Rodrigo, and E. Marks. Ecstatic display calls of the Adélie penguin honestly predict male condition and breeding success. Behaviour, 147(2):165–184, 2010.
  • [27] S. Saar and P. P. Mitra. A technique for characterizing the development of rhythms in bird song. PLoS ONE, 3(1):e1461, January 2008.
  • [28] M. Hansson-Sandsten, M. Tarka, J. Caissy-Martineau, B. Hansson, and D. Hasselquist. SVD-based classification of bird singing in different time-frequency domains using multitapers. In 2011 19th European Signal Processing Conference, pages 966–970, August 2011. ISSN: 2076-1465.
  • [29] A. Leonardo and M. Konishi. Decrystallization of adult birdsong by perturbation of auditory feedback. Nature, 399(6735):466–470, June 1999.
  • [30] O. Tchernichovski, F. Nottebohm, C. E. Ho, B. Pesaran, and P. P. Mitra. A procedure for an automated measurement of song similarity. Animal Behaviour, 59(6):1167–1176, June 2000.
  • [31] M. A. Wieczorek. Gravity and topography of the terrestrial planets. In G. Schubert, editor, Treatise on Geophysics, pages 165–206. Elsevier, Amsterdam, 2nd edition, January 2007.
  • [32] S. G. Claudepierre, S. R. Elkington, and M. Wiltberger. Solar wind driving of magnetospheric ULF waves: Pulsations driven by velocity shear at the magnetopause. J. Geophys. Res.: Space Phys., 113(A5), May 2008.
  • [33] B. Allen and J. D. Romano. Detecting a stochastic background of gravitational radiation: Signal processing strategies and sensitivities. Phys. Rev. D, 59(10):102001, March 1999.
  • [34] D. B. Percival and A. T. Walden. Spectral analysis for physical applications: multitaper and conventional univariate techniques. Cambridge University Press, Cambridge ; New York, NY, USA, 1993.
  • [35] A. T. Walden. A unified view of multitaper multivariate spectral estimation. Biometrika, 87(4):767–788, December 2000.
  • [36] L. D. Abreu and J. L. Romero. MSE estimates for multitaper spectral estimation and off-grid compressive sensing. IEEE Trans. Inf. Theory, 63(12):7770–7776, December 2017.
  • [37] C. L. Haley and M. Anitescu. Optimal bandwidth for multitaper spectrum estimation. IEEE Sig. Proc. Lett., 24(11):1696–1700, November 2017.
  • [38] G. G. Stokes. On a method of detecting inequalities of unknown periods in a series of observations. Proc. Roy. Soc. London, 29:122–123, 1879.
  • [39] A. Schuster. On the investigation of hidden periodicities with application to a supposed 26 day period of meteorological phenomena. Terr. Magnetism, 3(1):13–41, 1898.
  • [40] F. J. Harris. On the use of windows for harmonic analysis with the discrete Fourier transform. Proc. IEEE, 66(1):51–83, 1978.
  • [41] J. M. Varah. The prolate matrix. Linear Algebra and its Applications, 187:269–278, July 1993.
  • [42] A. W. Bojanczyk, R. P. Brent, F. R. De Hoog, and D. R. Sweet. On the stability of the Bareiss and related Toeplitz factorization algorithms. SIAM J. Matrix Anal. Appl., 16(1):40–57, January 1995.
  • [43] H. Stark and J. W. Woods. Probability, random processes, and estimation theory for engineers. Prentice-Hall, Englewood Cliffs, N.J, 1986.
  • [44] S. Karnik, J. Romberg, and M. A. Davenport. Improved bounds for the eigenvalues of prolate spheroidal wave functions and discrete prolate spheroidal sequences. arXiv:2006.00427 [math], September 2020. arXiv: 2006.00427.
  • [45] Z. Zhu and M. B. Wakin. Approximating sampled sinusoids and multiband signals using multiband modulated DPSS dictionaries. J. Fourier Anal. Appl., 23(6):1263–1310, December 2017.
  • [46] M. Boulsane, N. Bourguiba, and A. Karoui. Discrete prolate spheroidal wave functions: Further spectral analysis and some related applications. Journal of Scientific Computing, 82(3):1–19, February 2020.
  • [47] S. Karnik, Z. Zhu, M. B. Wakin, J. Romberg, and M. A. Davenport. The fast Slepian transform. Applied and Computational Harmonic Analysis, 46(3):624–652, May 2019.
  • [48] K. S. Lii and M. Rosenblatt. Prolate spheroidal spectral estimates. Statist. & Probability. Lett., 78(11):1339–1348, August 2008.
  • [49] P. H. M. Janssen and P. Stoica. On the expectation of the product of four matrix-valued Gaussian random variables. IEEE Trans. Autom. Control, 33(9):867–870, September 1988.
  • [50] S. Janson. Tail bounds for sums of geometric and exponential variables. Statist. & Probability. Lett., 135:1–6, April 2018.
  • [51] D. L. Hanson and F. T. Wright. A bound on tail probabilities for quadratic forms in independent random variables. Ann. Math. Statist., 42(3):1079–1083, June 1971.
  • [52] F. T. Wright. A bound on tail probabilities for quadratic forms in independent random variables whose distributions are not necessarily symmetric. Ann. Probability, 1(6):1068–1070, December 1973.
  • [53] D. M. Gruenbacher and D. R. Hummels. A simple algorithm for generating discrete prolate spheroidal sequences. IEEE Trans. Signal Process., 42(11):3276–3278, November 1994.

Appendix A Proof of Results in Section 3

A.1 Norms of Gaussian random variables

In this subsection, we develop results on the 22-norm of linear combinations of Gaussian random variables, which will be critical to our proofs of the theorems in Section 3.

First, we state a result from [49] regarding the product of four jointly complex Gaussian vectors.

Lemma 1.

[49] Suppose 𝐚,𝐛,𝐜,𝐝K\boldsymbol{a},\boldsymbol{b},\boldsymbol{c},\boldsymbol{d}\in\mathbb{C}^{K} have a joint complex Gaussian distribution. Then

𝔼[𝒂𝒃𝒄𝒅]=𝔼[𝒂𝒃]𝔼[𝒄𝒅]+𝔼[𝒄𝒂]𝔼[𝒅𝒃]+𝔼[𝒂𝔼[𝒃𝒄]𝒅]+2𝔼[𝒂]𝔼[𝒃]𝔼[𝒄]𝔼[𝒅],\mathbb{E}[\boldsymbol{a}^{*}\boldsymbol{b}\boldsymbol{c}^{*}\boldsymbol{d}]=\mathbb{E}[\boldsymbol{a}^{*}\boldsymbol{b}]\mathbb{E}[\boldsymbol{c}^{*}\boldsymbol{d}]+\mathbb{E}[\boldsymbol{c}^{*}\otimes\boldsymbol{a}^{*}]\mathbb{E}[\boldsymbol{d}\otimes\boldsymbol{b}]+\mathbb{E}[\boldsymbol{a}^{*}\mathbb{E}[\boldsymbol{b}\boldsymbol{c}^{*}]\boldsymbol{d}]+2\mathbb{E}[\boldsymbol{a}^{*}]\mathbb{E}[\boldsymbol{b}]\mathbb{E}[\boldsymbol{c}^{*}]\mathbb{E}[\boldsymbol{d}],

where \otimes denotes the Kronecker product.

It should be noted that [49] proves a more general result for products of four jointly complex Gaussian matrices, but that result is more cumbersome to state, so we give only the vector case here.

We now prove a result regarding the norm-squared of linear combinations of complex Gaussians.

Lemma 2.

Let 𝐱𝒞𝒩(𝟎,𝐑)\boldsymbol{x}\sim\mathcal{CN}(\boldsymbol{0},\boldsymbol{R}) for some positive semidefinite 𝐑N×N\boldsymbol{R}\in\mathbb{C}^{N\times N} and let 𝐔,𝐕K×N\boldsymbol{U},\boldsymbol{V}\in\mathbb{C}^{K\times N}. Then, we have:

𝔼[𝑼𝒙22]=tr[𝑼𝑹𝑼]andCov[𝑼𝒙22,𝑽𝒙22]=𝑼𝑹𝑽F2.\mathbb{E}\left[\left\|\boldsymbol{U}\boldsymbol{x}\right\|_{2}^{2}\right]=\operatorname{tr}[\boldsymbol{U}\boldsymbol{R}\boldsymbol{U}^{*}]\quad\text{and}\quad\operatorname{Cov}\left[\left\|\boldsymbol{U}\boldsymbol{x}\right\|_{2}^{2},\left\|\boldsymbol{V}\boldsymbol{x}\right\|_{2}^{2}\right]=\|\boldsymbol{U}\boldsymbol{R}\boldsymbol{V}^{*}\|_{F}^{2}.
Proof.

The expectation of 𝑼𝒙22\|\boldsymbol{U}\boldsymbol{x}\|_{2}^{2} can be computed as follows

𝔼[𝑼𝒙22]=𝔼[tr[𝑼𝒙𝒙𝑼]]=tr[𝔼[𝑼𝒙𝒙𝑼]]=tr[𝑼𝔼[𝒙𝒙]𝑼]=tr[𝑼𝑹𝑼].\mathbb{E}\left[\left\|\boldsymbol{U}\boldsymbol{x}\right\|_{2}^{2}\right]=\mathbb{E}\left[\operatorname{tr}\left[\boldsymbol{U}\boldsymbol{x}\boldsymbol{x}^{*}\boldsymbol{U}^{*}\right]\right]=\operatorname{tr}\left[\mathbb{E}\left[\boldsymbol{U}\boldsymbol{x}\boldsymbol{x}^{*}\boldsymbol{U}^{*}\right]\right]=\operatorname{tr}\left[\boldsymbol{U}\mathbb{E}\left[\boldsymbol{x}\boldsymbol{x}^{*}\right]\boldsymbol{U}^{*}\right]=\operatorname{tr}\left[\boldsymbol{U}\boldsymbol{R}\boldsymbol{U}^{*}\right].

Since the entries of 𝒙\boldsymbol{x} are jointly Gaussian with mean 0, the entries of 𝑼𝒙\boldsymbol{U}\boldsymbol{x} and 𝑽𝒙\boldsymbol{V}\boldsymbol{x} are also jointly Gaussian with mean 0. Then, by applying Lemma 1 (with 𝒂=𝒃=𝑼𝒙\boldsymbol{a}=\boldsymbol{b}=\boldsymbol{U}\boldsymbol{x} and 𝒄=𝒅=𝑽𝒙\boldsymbol{c}=\boldsymbol{d}=\boldsymbol{V}\boldsymbol{x}), we find that

Cov[𝑼𝒙22,𝑽𝒙22]\displaystyle\operatorname{Cov}\left[\left\|\boldsymbol{U}\boldsymbol{x}\right\|_{2}^{2},\left\|\boldsymbol{V}\boldsymbol{x}\right\|_{2}^{2}\right] =𝔼[𝑼𝒙22𝑽𝒙22]𝔼[𝑼𝒙22]𝔼[𝑽𝒙22]\displaystyle=\mathbb{E}\left[\left\|\boldsymbol{U}\boldsymbol{x}\right\|_{2}^{2}\cdot\left\|\boldsymbol{V}\boldsymbol{x}\right\|_{2}^{2}\right]-\mathbb{E}\left[\left\|\boldsymbol{U}\boldsymbol{x}\right\|_{2}^{2}\right]\mathbb{E}\left[\left\|\boldsymbol{V}\boldsymbol{x}\right\|_{2}^{2}\right]
=𝔼[𝒙𝑼𝑼𝒙𝒙𝑽𝑽𝒙]𝔼[𝒙𝑼𝑼𝒙]𝔼[𝒙𝑽𝑽𝒙]\displaystyle=\mathbb{E}\left[\boldsymbol{x}^{*}\boldsymbol{U}^{*}\boldsymbol{U}\boldsymbol{x}\boldsymbol{x}^{*}\boldsymbol{V}^{*}\boldsymbol{V}\boldsymbol{x}\right]-\mathbb{E}\left[\boldsymbol{x}^{*}\boldsymbol{U}^{*}\boldsymbol{U}\boldsymbol{x}\right]\mathbb{E}\left[\boldsymbol{x}^{*}\boldsymbol{V}^{*}\boldsymbol{V}\boldsymbol{x}\right]
=𝔼[𝒙𝑽𝒙𝑼]𝔼[𝑽𝒙𝑼𝒙]+𝔼[𝒙𝑼𝔼[𝑼𝒙𝒙𝑽]𝑽𝒙]\displaystyle=\mathbb{E}\left[\boldsymbol{x}^{*}\boldsymbol{V}^{*}\otimes\boldsymbol{x}^{*}\boldsymbol{U}^{*}\right]\mathbb{E}\left[\boldsymbol{V}\boldsymbol{x}\otimes\boldsymbol{U}\boldsymbol{x}\right]+\mathbb{E}\left[\boldsymbol{x}^{*}\boldsymbol{U}^{*}\mathbb{E}\left[\boldsymbol{U}\boldsymbol{x}\boldsymbol{x}^{*}\boldsymbol{V}^{*}\right]\boldsymbol{V}\boldsymbol{x}\right]
+2𝔼[𝒙𝑼]𝔼[𝑼𝒙]𝔼[𝒙𝑽]𝔼[𝑽𝒙].\displaystyle\ \ \ \ \ +2\mathbb{E}[\boldsymbol{x}^{*}\boldsymbol{U}^{*}]\mathbb{E}[\boldsymbol{U}\boldsymbol{x}]\mathbb{E}[\boldsymbol{x}^{*}\boldsymbol{V}^{*}]\mathbb{E}[\boldsymbol{V}\boldsymbol{x}].

We proceed to evaluate each of these three terms.

Since 𝒙𝒞𝒩(𝟎,𝑹)\boldsymbol{x}\sim\mathcal{CN}(\boldsymbol{0},\boldsymbol{R}), we can write 𝒙=𝑹1/2𝒚\boldsymbol{x}=\boldsymbol{R}^{1/2}\boldsymbol{y} where 𝒚𝒞𝒩(𝟎,𝐈)\boldsymbol{y}\sim\mathcal{CN}(\boldsymbol{0},{\bf I}) and 𝑹1/2\boldsymbol{R}^{1/2} is the unique positive semidefinite square root of 𝑹\boldsymbol{R}. Then, by using the identity 𝑿1𝑿2𝑿3𝑿4=(𝑿1𝑿3)(𝑿2𝑿4)\boldsymbol{X}_{1}\boldsymbol{X}_{2}\otimes\boldsymbol{X}_{3}\boldsymbol{X}_{4}=(\boldsymbol{X}_{1}\otimes\boldsymbol{X}_{3})(\boldsymbol{X}_{2}\otimes\boldsymbol{X}_{4}) for appropriately sized matrices 𝑿1,𝑿2,𝑿3,𝑿4\boldsymbol{X}_{1},\boldsymbol{X}_{2},\boldsymbol{X}_{3},\boldsymbol{X}_{4}, we get

𝔼[𝑽𝒙𝑼𝒙]=𝔼[𝑽𝑹1/2𝒚𝑼𝑹1/2𝒚]=𝔼[(𝑽𝑹1/2𝑼𝑹1/2)(𝒚𝒚)]=(𝑽𝑹1/2𝑼𝑹1/2)𝔼[𝒚𝒚]\mathbb{E}[\boldsymbol{V}\boldsymbol{x}\otimes\boldsymbol{U}\boldsymbol{x}]=\mathbb{E}\left[\boldsymbol{V}\boldsymbol{R}^{1/2}\boldsymbol{y}\otimes\boldsymbol{U}\boldsymbol{R}^{1/2}\boldsymbol{y}\right]=\mathbb{E}\left[\left(\boldsymbol{V}\boldsymbol{R}^{1/2}\otimes\boldsymbol{U}\boldsymbol{R}^{1/2}\right)(\boldsymbol{y}\otimes\boldsymbol{y})\right]=\left(\boldsymbol{V}\boldsymbol{R}^{1/2}\otimes\boldsymbol{U}\boldsymbol{R}^{1/2}\right)\mathbb{E}[\boldsymbol{y}\otimes\boldsymbol{y}]

Since the entries of 𝒚\boldsymbol{y} are i.i.d. 𝒞𝒩(0,1)\mathcal{CN}(0,1), 𝔼[𝒚[n]𝒚[n]]=0\mathbb{E}\left[\boldsymbol{y}[n]\boldsymbol{y}[n^{\prime}]\right]=0 for all indices n,n=0,,N1n,n^{\prime}=0,\ldots,N-1. (Note that this relies on circularity: 𝔼[𝒚[n]2]\mathbb{E}[\boldsymbol{y}[n]^{2}] would not vanish if the entries of 𝒚\boldsymbol{y} were i.i.d. 𝒩(0,1)\mathcal{N}(0,1) instead of 𝒞𝒩(0,1)\mathcal{CN}(0,1).) Hence, 𝔼[𝒚𝒚]=𝟎\mathbb{E}[\boldsymbol{y}\otimes\boldsymbol{y}]=\boldsymbol{0}, and thus, 𝔼[𝑽𝒙𝑼𝒙]=(𝑽𝑹1/2𝑼𝑹1/2)𝔼[𝒚𝒚]=𝟎\mathbb{E}[\boldsymbol{V}\boldsymbol{x}\otimes\boldsymbol{U}\boldsymbol{x}]=(\boldsymbol{V}\boldsymbol{R}^{1/2}\otimes\boldsymbol{U}\boldsymbol{R}^{1/2})\mathbb{E}[\boldsymbol{y}\otimes\boldsymbol{y}]=\boldsymbol{0}. Similarly, 𝔼[𝒙𝑽𝒙𝑼]=𝟎\mathbb{E}[\boldsymbol{x}^{*}\boldsymbol{V}^{*}\otimes\boldsymbol{x}^{*}\boldsymbol{U}^{*}]=\boldsymbol{0}^{*}, and so, 𝔼[𝒙𝑽𝒙𝑼]𝔼[𝑽𝒙𝑼𝒙]=0\mathbb{E}\left[\boldsymbol{x}^{*}\boldsymbol{V}^{*}\otimes\boldsymbol{x}^{*}\boldsymbol{U}^{*}\right]\mathbb{E}\left[\boldsymbol{V}\boldsymbol{x}\otimes\boldsymbol{U}\boldsymbol{x}\right]=0.

Using the cyclic property of the trace operator, linearity of the trace and expectation operators, and the fact that 𝔼[𝒙𝒙]=𝑹\mathbb{E}[\boldsymbol{x}\boldsymbol{x}^{*}]=\boldsymbol{R}, we find that

𝔼[𝒙𝑼𝔼[𝑼𝒙𝒙𝑽]𝑽𝒙]\displaystyle\mathbb{E}\left[\boldsymbol{x}^{*}\boldsymbol{U}^{*}\mathbb{E}\left[\boldsymbol{U}\boldsymbol{x}\boldsymbol{x}^{*}\boldsymbol{V}^{*}\right]\boldsymbol{V}\boldsymbol{x}\right] =𝔼[tr[𝔼[𝑼𝒙𝒙𝑽]𝑽𝒙𝒙𝑼]]\displaystyle=\mathbb{E}\left[\operatorname{tr}\left[\mathbb{E}\left[\boldsymbol{U}\boldsymbol{x}\boldsymbol{x}^{*}\boldsymbol{V}^{*}\right]\boldsymbol{V}\boldsymbol{x}\boldsymbol{x}^{*}\boldsymbol{U}^{*}\right]\right]
=tr[𝔼[𝔼[𝑼𝒙𝒙𝑽]𝑽𝒙𝒙𝑼]]\displaystyle=\operatorname{tr}\left[\mathbb{E}\left[\mathbb{E}\left[\boldsymbol{U}\boldsymbol{x}\boldsymbol{x}^{*}\boldsymbol{V}^{*}\right]\boldsymbol{V}\boldsymbol{x}\boldsymbol{x}^{*}\boldsymbol{U}^{*}\right]\right]
=tr[𝑼𝔼[𝒙𝒙]𝑽𝑽𝔼[𝒙𝒙]𝑼]\displaystyle=\operatorname{tr}\left[\boldsymbol{U}\mathbb{E}\left[\boldsymbol{x}\boldsymbol{x}^{*}\right]\boldsymbol{V}^{*}\boldsymbol{V}\mathbb{E}\left[\boldsymbol{x}\boldsymbol{x}^{*}\right]\boldsymbol{U}^{*}\right]
=tr[𝑼𝑹𝑽𝑽𝑹𝑼]\displaystyle=\operatorname{tr}\left[\boldsymbol{U}\boldsymbol{R}\boldsymbol{V}^{*}\boldsymbol{V}\boldsymbol{R}\boldsymbol{U}^{*}\right]
=𝑼𝑹𝑽F2.\displaystyle=\left\|\boldsymbol{U}\boldsymbol{R}\boldsymbol{V}^{*}\right\|_{F}^{2}.

Finally, since 𝔼[𝒙]=𝟎\mathbb{E}[\boldsymbol{x}]=\boldsymbol{0}, we have 𝔼[𝑼𝒙]=𝔼[𝑽𝒙]=𝟎\mathbb{E}[\boldsymbol{U}\boldsymbol{x}]=\mathbb{E}[\boldsymbol{V}\boldsymbol{x}]=\boldsymbol{0} and 𝔼[𝒙𝑼]=𝔼[𝒙𝑽]=𝟎\mathbb{E}[\boldsymbol{x}^{*}\boldsymbol{U}^{*}]=\mathbb{E}[\boldsymbol{x}^{*}\boldsymbol{V}^{*}]=\boldsymbol{0}^{*}. Hence,

2𝔼[𝒙𝑼]𝔼[𝑼𝒙]𝔼[𝒙𝑽]𝔼[𝑽𝒙]=0.2\mathbb{E}[\boldsymbol{x}^{*}\boldsymbol{U}^{*}]\mathbb{E}[\boldsymbol{U}\boldsymbol{x}]\mathbb{E}[\boldsymbol{x}^{*}\boldsymbol{V}^{*}]\mathbb{E}[\boldsymbol{V}\boldsymbol{x}]=0.

Adding these three terms yields

Cov[𝑼𝒙22,𝑽𝒙22]\displaystyle\operatorname{Cov}\left[\left\|\boldsymbol{U}\boldsymbol{x}\right\|_{2}^{2},\left\|\boldsymbol{V}\boldsymbol{x}\right\|_{2}^{2}\right] =𝔼[𝒙𝑽𝒙𝑼]𝔼[𝑽𝒙𝑼𝒙]+𝔼[𝒙𝑼𝔼[𝑼𝒙𝒙𝑽]𝑽𝒙]\displaystyle=\mathbb{E}\left[\boldsymbol{x}^{*}\boldsymbol{V}^{*}\otimes\boldsymbol{x}^{*}\boldsymbol{U}^{*}\right]\mathbb{E}\left[\boldsymbol{V}\boldsymbol{x}\otimes\boldsymbol{U}\boldsymbol{x}\right]+\mathbb{E}\left[\boldsymbol{x}^{*}\boldsymbol{U}^{*}\mathbb{E}\left[\boldsymbol{U}\boldsymbol{x}\boldsymbol{x}^{*}\boldsymbol{V}^{*}\right]\boldsymbol{V}\boldsymbol{x}\right]
+2𝔼[𝒙𝑼]𝔼[𝑼𝒙]𝔼[𝒙𝑽]𝔼[𝑽𝒙]\displaystyle\ \ \ \ \ +2\mathbb{E}[\boldsymbol{x}^{*}\boldsymbol{U}^{*}]\mathbb{E}[\boldsymbol{U}\boldsymbol{x}]\mathbb{E}[\boldsymbol{x}^{*}\boldsymbol{V}^{*}]\mathbb{E}[\boldsymbol{V}\boldsymbol{x}]
=𝑼𝑹𝑽F2.\displaystyle=\left\|\boldsymbol{U}\boldsymbol{R}\boldsymbol{V}^{*}\right\|_{F}^{2}. ∎
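
As a quick sanity check of Lemma 2, the following Monte Carlo sketch (our own, assuming NumPy) draws circular complex Gaussian vectors with a prescribed covariance and compares the empirical mean and covariance of the quadratic forms against tr[𝑼𝑹𝑼*] and ‖𝑼𝑹𝑽*‖_F²; all sizes and seeds are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, T = 6, 4, 200_000
G = rng.standard_normal((N, N))
R = G @ G.T                                    # positive semidefinite covariance
U = rng.standard_normal((K, N))
V = rng.standard_normal((K, N))

# Draw T samples of x ~ CN(0, R): take y ~ CN(0, I) and set x = L y with L L^T = R.
L = np.linalg.cholesky(R)
y = (rng.standard_normal((T, N)) + 1j * rng.standard_normal((T, N))) / np.sqrt(2)
x = y @ L.T                                    # each row is one draw of x

qU = np.sum(np.abs(x @ U.T)**2, axis=1)        # ||U x||_2^2 for each draw
qV = np.sum(np.abs(x @ V.T)**2, axis=1)        # ||V x||_2^2 for each draw

print(qU.mean(), np.trace(U @ R @ U.T))        # both should approximate tr[U R U*]
print(np.cov(qU, qV)[0, 1],                    # empirical Cov[||Ux||^2, ||Vx||^2] ...
      np.linalg.norm(U @ R @ V.T, 'fro')**2)   # ... versus ||U R V*||_F^2
```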

A.2 Concentration of norms of Gaussian random variables

Next, we state a result from [50] regarding concentration bounds for sums of independent exponential random variables.

Lemma 3.

[50] Let Z0,,ZN1Z_{0},\ldots,Z_{N-1} be independent exponential random variables with 𝔼[Zn]=μn\mathbb{E}[Z_{n}]=\mu_{n}. Then, the sum

Z:=n=0N1ZnZ:=\displaystyle\sum_{n=0}^{N-1}Z_{n}

satisfies

{Zβ𝔼[Z]}β1eκ(β1lnβ)forβ>1,\displaystyle\mathbb{P}\left\{Z\geq\beta\mathbb{E}[Z]\right\}\leq\beta^{-1}e^{-\kappa(\beta-1-\ln\beta)}\quad\text{for}\quad\beta>1,

and

{Zβ𝔼[Z]}eκ(β1lnβ)for0<β<1,\displaystyle\mathbb{P}\left\{Z\leq\beta\mathbb{E}[Z]\right\}\leq e^{-\kappa(\beta-1-\ln\beta)}\quad\text{for}\quad 0<\beta<1,

where

κ=n=0N1μnmaxn=0,,N1μn.\kappa=\dfrac{\sum\limits_{n=0}^{N-1}\mu_{n}}{\max\limits_{n=0,\ldots,N-1}\mu_{n}}.

We now apply this lemma to derive concentration bounds for 𝑨𝒙22\|\boldsymbol{A}\boldsymbol{x}\|_{2}^{2}, where 𝒙\boldsymbol{x} is a vector of Gaussian random variables and 𝑨\boldsymbol{A} is a matrix.

Lemma 4.

Let 𝐱𝒞𝒩(𝟎,𝐑)\boldsymbol{x}\sim\mathcal{CN}(\boldsymbol{0},\boldsymbol{R}) for some positive semidefinite 𝐑N×N\boldsymbol{R}\in\mathbb{C}^{N\times N}. Also, let 𝐀K×N\boldsymbol{A}\in\mathbb{C}^{K\times N}. Then, the random variable 𝐀𝐱22\left\|\boldsymbol{A}\boldsymbol{x}\right\|_{2}^{2} satisfies

{𝑨𝒙22β𝔼[𝑨𝒙22]}β1eκ(β1lnβ)forβ>1,\mathbb{P}\left\{\left\|\boldsymbol{A}\boldsymbol{x}\right\|_{2}^{2}\geq\beta\mathbb{E}\left[\left\|\boldsymbol{A}\boldsymbol{x}\right\|_{2}^{2}\right]\right\}\leq\beta^{-1}e^{-\kappa(\beta-1-\ln\beta)}\quad\text{for}\quad\beta>1,

and

{𝑨𝒙22β𝔼[𝑨𝒙22]}eκ(β1lnβ)for0<β<1,\mathbb{P}\left\{\left\|\boldsymbol{A}\boldsymbol{x}\right\|_{2}^{2}\leq\beta\mathbb{E}\left[\left\|\boldsymbol{A}\boldsymbol{x}\right\|_{2}^{2}\right]\right\}\leq e^{-\kappa(\beta-1-\ln\beta)}\quad\text{for}\quad 0<\beta<1,

where

κ=tr[𝑨𝑹𝑨]𝑨𝑹𝑨.\kappa=\dfrac{\operatorname{tr}\left[\boldsymbol{A}\boldsymbol{R}\boldsymbol{A}^{*}\right]}{\left\|\boldsymbol{A}\boldsymbol{R}\boldsymbol{A}^{*}\right\|}.
Proof.

Since 𝒙𝒞𝒩(𝟎,𝑹)\boldsymbol{x}\sim\mathcal{CN}(\boldsymbol{0},\boldsymbol{R}), we can write 𝒙=𝑹1/2𝒚\boldsymbol{x}=\boldsymbol{R}^{1/2}\boldsymbol{y} where 𝒚𝒞𝒩(𝟎,𝐈)\boldsymbol{y}\sim\mathcal{CN}(\boldsymbol{0},{\bf I}) and 𝑹1/2\boldsymbol{R}^{1/2} is the unique positive semidefinite square root of 𝑹\boldsymbol{R}. Then, using an eigendecomposition, we can write 𝑹1/2𝑨𝑨𝑹1/2=𝑾𝑫𝑾\boldsymbol{R}^{1/2}\boldsymbol{A}^{*}\boldsymbol{A}\boldsymbol{R}^{1/2}=\boldsymbol{W}\boldsymbol{D}\boldsymbol{W}^{*} where 𝑾\boldsymbol{W} is unitary and 𝑫=diag(d0,,dN1)\boldsymbol{D}=\operatorname{diag}(d_{0},\ldots,d_{N-1}). Since 𝑾\boldsymbol{W} is unitary, 𝒛:=𝑾𝒚𝒞𝒩(𝟎,𝐈N)\boldsymbol{z}:=\boldsymbol{W}^{*}\boldsymbol{y}\sim\mathcal{CN}(\boldsymbol{0},{\bf I}_{N}). Then, we have:

𝑨𝒙22=𝒙𝑨𝑨𝒙=𝒚𝑹1/2𝑨𝑨𝑹1/2𝒚=𝒚𝑾𝑫𝑾𝒚=𝒛𝑫𝒛=n=0N1dn|𝒛[n]|2.\left\|\boldsymbol{A}\boldsymbol{x}\right\|_{2}^{2}=\boldsymbol{x}^{*}\boldsymbol{A}^{*}\boldsymbol{A}\boldsymbol{x}=\boldsymbol{y}^{*}\boldsymbol{R}^{1/2}\boldsymbol{A}^{*}\boldsymbol{A}\boldsymbol{R}^{1/2}\boldsymbol{y}=\boldsymbol{y}^{*}\boldsymbol{W}\boldsymbol{D}\boldsymbol{W}^{*}\boldsymbol{y}=\boldsymbol{z}^{*}\boldsymbol{D}\boldsymbol{z}=\sum_{n=0}^{N-1}d_{n}\left|\boldsymbol{z}[n]\right|^{2}.

Since 𝒛𝒞𝒩(𝟎,𝐈N)\boldsymbol{z}\sim\mathcal{CN}(\boldsymbol{0},{\bf I}_{N}), we have that 𝒛[0],,𝒛[N1]\boldsymbol{z}[0],\ldots,\boldsymbol{z}[N-1] are i.i.d. 𝒞𝒩(0,1)\mathcal{CN}(0,1). Hence, the terms dn|𝒛[n]|2d_{n}|\boldsymbol{z}[n]|^{2} are independent exponential random variables with mean dnd_{n}. If we apply Lemma 3, the fact that the trace and operator norm are invariant under unitary similarity transforms, and the matrix identities tr[𝑿𝑿]=tr[𝑿𝑿]\operatorname{tr}[\boldsymbol{X}\boldsymbol{X}^{*}]=\operatorname{tr}[\boldsymbol{X}^{*}\boldsymbol{X}] and 𝑿𝑿=𝑿𝑿\|\boldsymbol{X}\boldsymbol{X}^{*}\|=\|\boldsymbol{X}^{*}\boldsymbol{X}\|, we obtain

{𝑨𝒙22β𝔼[𝑨𝒙22]}β1eκ(β1lnβ)forβ>1,\displaystyle\mathbb{P}\left\{\left\|\boldsymbol{A}\boldsymbol{x}\right\|_{2}^{2}\geq\beta\mathbb{E}\left[\left\|\boldsymbol{A}\boldsymbol{x}\right\|_{2}^{2}\right]\right\}\leq\beta^{-1}e^{-\kappa(\beta-1-\ln\beta)}\quad\text{for}\quad\beta>1,

and

{𝑨𝒙22β𝔼[𝑨𝒙22]}eκ(β1lnβ)for0<β<1,\displaystyle\mathbb{P}\left\{\left\|\boldsymbol{A}\boldsymbol{x}\right\|_{2}^{2}\leq\beta\mathbb{E}\left[\left\|\boldsymbol{A}\boldsymbol{x}\right\|_{2}^{2}\right]\right\}\leq e^{-\kappa(\beta-1-\ln\beta)}\quad\text{for}\quad 0<\beta<1,

where

κ=n=0N1dnmaxn=0,,N1dn=tr[𝑫]𝑫=tr[𝑾𝑫𝑾]𝑾𝑫𝑾=tr[𝑹1/2𝑨𝑨𝑹1/2]𝑹1/2𝑨𝑨𝑹1/2=tr[𝑨𝑹𝑨]𝑨𝑹𝑨.\kappa=\dfrac{\sum\limits_{n=0}^{N-1}d_{n}}{\max\limits_{n=0,\ldots,N-1}d_{n}}=\dfrac{\operatorname{tr}[\boldsymbol{D}]}{\|\boldsymbol{D}\|}=\dfrac{\operatorname{tr}[\boldsymbol{W}\boldsymbol{D}\boldsymbol{W}^{*}]}{\|\boldsymbol{W}\boldsymbol{D}\boldsymbol{W}^{*}\|}=\dfrac{\operatorname{tr}\left[\boldsymbol{R}^{1/2}\boldsymbol{A}^{*}\boldsymbol{A}\boldsymbol{R}^{1/2}\right]}{\left\|\boldsymbol{R}^{1/2}\boldsymbol{A}^{*}\boldsymbol{A}\boldsymbol{R}^{1/2}\right\|}=\dfrac{\operatorname{tr}\left[\boldsymbol{A}\boldsymbol{R}\boldsymbol{A}^{*}\right]}{\left\|\boldsymbol{A}\boldsymbol{R}\boldsymbol{A}^{*}\right\|}.

We note that similar bounds can be obtained by applying the Hanson-Wright inequality [51, 52].
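
To illustrate the role of κ, the following small Monte Carlo sketch (ours, assuming NumPy, with 𝑹 = 𝐈 for simplicity) compares the empirical upper tail of ‖𝑨𝒙‖₂² against the Lemma 4 bound β^{−1}e^{−κ(β−1−ln β)}; the sizes, seed, and β are arbitrary example values.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, T, beta = 8, 5, 500_000, 1.5
A = rng.standard_normal((K, N))
R = np.eye(N)                                   # take x ~ CN(0, I) for simplicity

M = A @ R @ A.T
kappa = np.trace(M) / np.linalg.norm(M, 2)      # kappa = tr[A R A*] / ||A R A*||

x = (rng.standard_normal((T, N)) + 1j * rng.standard_normal((T, N))) / np.sqrt(2)
q = np.sum(np.abs(x @ A.T)**2, axis=1)          # ||A x||_2^2 for each draw

emp_tail = np.mean(q >= beta * q.mean())        # q.mean() estimates E[||Ax||^2]
bound = np.exp(-kappa * (beta - 1 - np.log(beta))) / beta
print(emp_tail, bound)                          # empirical tail should not exceed the bound
```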

A.3 Intermediate results

We continue by presenting a lemma showing that certain matrices can be represented as integrals of frequency-dependent rank-one matrices.

Lemma 5.

For any frequency ff\in\mathbb{R}, define a complex sinusoid 𝐞fN\boldsymbol{e}_{f}\in\mathbb{C}^{N} by 𝐞f[n]=ej2πfn\boldsymbol{e}_{f}[n]=e^{j2\pi fn} for n=0,,N1n=0,\ldots,N-1. Then, we have:

𝑩=WW𝒆f𝒆f𝑑f,𝐈=1/21/2𝒆f𝒆f𝑑f,andΩ𝒆f𝒆f𝑑f=𝐈𝑩,\boldsymbol{B}=\int_{-W}^{W}\boldsymbol{e}_{f}\boldsymbol{e}_{f}^{*}\,df,\quad{\bf I}=\int_{-1/2}^{1/2}\boldsymbol{e}_{f}\boldsymbol{e}_{f}^{*}\,df,\quad\text{and}\quad\int_{\Omega}\boldsymbol{e}_{f}\boldsymbol{e}_{f}^{*}\,df={\bf I}-\boldsymbol{B},

where Ω=[12,12][W,W]\Omega=[-\tfrac{1}{2},\tfrac{1}{2}]\setminus[-W,W]. Furthermore, if x(t)x(t) is a stationary, ergodic, zero-mean, Gaussian stochastic process with power spectral density S(f)S(f), and 𝒙N\boldsymbol{x}\in\mathbb{C}^{N} is a vector of equispaced samples 𝒙[n]=x(n)\boldsymbol{x}[n]=x(n) for n=0,,N1n=0,\ldots,N-1, then the covariance matrix of 𝒙\boldsymbol{x} can be written as

𝑹:=𝔼[𝒙𝒙]=1/21/2S(f)𝒆f𝒆f𝑑f.\boldsymbol{R}:=\mathbb{E}[\boldsymbol{x}\boldsymbol{x}^{*}]=\int_{-1/2}^{1/2}S(f)\boldsymbol{e}_{f}\boldsymbol{e}_{f}^{*}\,df.
Proof.

For any m,n=0,,N1m,n=0,\ldots,N-1, we have

WW𝒆f[m]𝒆f[n]¯𝑑f=WWej2πf(mn)𝑑f=sin[2πW(mn)]π(mn)=𝑩[m,n],\int_{-W}^{W}\boldsymbol{e}_{f}[m]\overline{\boldsymbol{e}_{f}[n]}\,df=\int_{-W}^{W}e^{j2\pi f(m-n)}\,df=\dfrac{\sin[2\pi W(m-n)]}{\pi(m-n)}=\boldsymbol{B}[m,n],

and

1/21/2𝒆f[m]𝒆f[n]¯𝑑f=1/21/2ej2πf(mn)𝑑f={1ifm=n0ifmn=𝐈[m,n].\int_{-1/2}^{1/2}\boldsymbol{e}_{f}[m]\overline{\boldsymbol{e}_{f}[n]}\,df=\int_{-1/2}^{1/2}e^{j2\pi f(m-n)}\,df=\begin{cases}1&\text{if}\ m=n\\ 0&\text{if}\ m\neq n\end{cases}={\bf I}[m,n].

We can put these into matrix form as

WW𝒆f𝒆f𝑑f=𝑩and1/21/2𝒆f𝒆f𝑑f=𝐈.\int_{-W}^{W}\boldsymbol{e}_{f}\boldsymbol{e}_{f}^{*}\,df=\boldsymbol{B}\quad\text{and}\quad\int_{-1/2}^{1/2}\boldsymbol{e}_{f}\boldsymbol{e}_{f}^{*}\,df={\bf I}.

From this, it follows that

Ω𝒆f𝒆f𝑑f=1/21/2𝒆f𝒆f𝑑fWW𝒆f𝒆f𝑑f=𝐈𝑩.\int_{\Omega}\boldsymbol{e}_{f}\boldsymbol{e}_{f}^{*}\,df=\int_{-1/2}^{1/2}\boldsymbol{e}_{f}\boldsymbol{e}_{f}^{*}\,df-\int_{-W}^{W}\boldsymbol{e}_{f}\boldsymbol{e}_{f}^{*}\,df={\bf I}-\boldsymbol{B}.

Finally, using the definition of the power spectral density, we have

𝑹[m,n]=𝔼[𝒙[m]𝒙[n]¯]=1/21/2S(f)ej2πf(mn)𝑑f=1/21/2S(f)𝒆f[m]𝒆f[n]¯𝑑f\boldsymbol{R}[m,n]=\mathbb{E}[\boldsymbol{x}[m]\overline{\boldsymbol{x}[n]}]=\int_{-1/2}^{1/2}S(f)e^{j2\pi f(m-n)}\,df=\int_{-1/2}^{1/2}S(f)\boldsymbol{e}_{f}[m]\overline{\boldsymbol{e}_{f}[n]}\,df

for m,n=0,,N1m,n=0,\ldots,N-1. Again, we can put this into matrix form as

𝑹:=𝔼[𝒙𝒙]=1/21/2S(f)𝒆f𝒆f𝑑f.\boldsymbol{R}:=\mathbb{E}[\boldsymbol{x}\boldsymbol{x}^{*}]=\int_{-1/2}^{1/2}S(f)\boldsymbol{e}_{f}\boldsymbol{e}_{f}^{*}\,df.
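
These identities are straightforward to verify numerically. The following quadrature sketch (ours, assuming NumPy) checks the first identity, 𝑩 = ∫_{−W}^{W} 𝒆_f 𝒆_f* df, against the closed form for the prolate matrix; N, W, and the number of quadrature points are arbitrary example values.

```python
import numpy as np

N, W = 8, 0.1
n = np.arange(N)
d = n[:, None] - n[None, :]
# Closed form for the prolate matrix: B[m, l] = sin(2 pi W (m - l)) / (pi (m - l)),
# with 2W on the diagonal; np.sinc(t) = sin(pi t)/(pi t) handles both cases.
B = 2 * W * np.sinc(2 * W * d)

# Midpoint-rule quadrature of the rank-one integral over [-W, W].
F = 20_000
f = -W + (np.arange(F) + 0.5) * (2 * W / F)
E = np.exp(2j * np.pi * np.outer(n, f))        # column k is the complex sinusoid e_{f_k}
Bq = (E @ E.conj().T) * (2 * W / F)            # approximates the integral of e_f e_f^*
print(np.max(np.abs(B - Bq)))                  # tiny (quadrature error only)
```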

Next, we show that the expectation of the multitaper spectral estimate is the convolution of the power spectral density S(f)S(f) with the spectral window ψ(f)\psi(f).

Lemma 6.

The expectation of the multitaper spectral estimate can be written as

𝔼[S^Kmt(f)]=1/21/2S(ff)ψ(f)𝑑f\mathbb{E}\left[\widehat{S}^{\text{mt}}_{K}(f)\right]=\int_{-1/2}^{1/2}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}

where

ψ(f):=1Ktr[𝑺K𝒆f𝒆f𝑺K]=1K𝑺K𝒆f22=1Kk=0K1|n=0N1𝒔k[n]ej2πfn|2\psi(f):=\displaystyle\dfrac{1}{K}\operatorname{tr}\left[\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{-f}\boldsymbol{e}_{-f}^{*}\boldsymbol{S}_{K}\right]=\dfrac{1}{K}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{-f}\right\|_{2}^{2}=\dfrac{1}{K}\sum_{k=0}^{K-1}\left|\sum_{n=0}^{N-1}\boldsymbol{s}_{k}[n]e^{-j2\pi fn}\right|^{2}

is the spectral window of the multitaper estimate.

Proof.

Since S^Kmt(f)=1K𝑺K𝑬f𝒙22\widehat{S}^{\text{mt}}_{K}(f)=\dfrac{1}{K}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x}\right\|_{2}^{2} where 𝒙𝒞𝒩(𝟎,𝑹)\boldsymbol{x}\sim\mathcal{CN}(\boldsymbol{0},\boldsymbol{R}), by Lemma 2, we have

𝔼[S^Kmt(f)]=1Ktr[𝑺K𝑬f𝑹𝑬f𝑺K].\mathbb{E}\left[\widehat{S}^{\text{mt}}_{K}(f)\right]=\dfrac{1}{K}\operatorname{tr}\left[\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{R}\boldsymbol{E}_{f}\boldsymbol{S}_{K}\right].

We can rewrite this expression as follows:

𝔼[S^Kmt(f)]\displaystyle\mathbb{E}\left[\widehat{S}^{\text{mt}}_{K}(f)\right] =1Ktr[𝑺K𝑬f𝑹𝑬f𝑺K]\displaystyle=\dfrac{1}{K}\operatorname{tr}\left[\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{R}\boldsymbol{E}_{f}\boldsymbol{S}_{K}\right]
=1Ktr[𝑺K𝑬f(1/21/2S(f)𝒆f𝒆f𝑑f)𝑬f𝑺K]\displaystyle=\dfrac{1}{K}\operatorname{tr}\left[\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\left(\int_{-1/2}^{1/2}S(f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\right)\boldsymbol{E}_{f}\boldsymbol{S}_{K}\right]
=1K1/21/2S(f)tr[𝑺K𝑬f𝒆f𝒆f𝑬f𝑺K]𝑑f\displaystyle=\dfrac{1}{K}\int_{-1/2}^{1/2}S(f^{\prime})\operatorname{tr}\left[\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\boldsymbol{E}_{f}\boldsymbol{S}_{K}\right]\,df^{\prime}
=1K1/21/2S(f)tr[𝑺K𝒆ff𝒆ff𝑺K]𝑑f\displaystyle=\dfrac{1}{K}\int_{-1/2}^{1/2}S(f^{\prime})\operatorname{tr}\left[\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f^{\prime}-f}\boldsymbol{e}_{f^{\prime}-f}^{*}\boldsymbol{S}_{K}\right]\,df^{\prime}
=1/21/2S(f)ψ(ff)𝑑f,\displaystyle=\int_{-1/2}^{1/2}S(f^{\prime})\psi(f-f^{\prime})\,df^{\prime},
=f1/2f+1/2S(ff)ψ(f)𝑑f\displaystyle=\int_{f-1/2}^{f+1/2}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}
=1/21/2S(ff)ψ(f)𝑑f,\displaystyle=\int_{-1/2}^{1/2}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime},

where the last line follows since S(f)S(f) is 11-periodic (by definition) and ψ(f)=1K𝑺K𝒆f22\psi(f)=\dfrac{1}{K}\|\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{-f}\|_{2}^{2} is 11-periodic (because 𝒆f[n]=ej2πfn\boldsymbol{e}_{-f}[n]=e^{-j2\pi fn} is 11-periodic in ff for every nn).

Finally, it is straightforward to check that ψ(f)\psi(f) can be written in the following alternate forms:

ψ(f)=1Ktr[𝑺K𝒆f𝒆f𝑺K]=1K𝒆f𝑺K𝑺K𝒆f=1K𝑺K𝒆f22=1Kk=0K1|n=0N1𝒔k[n]ej2πfn|2.\psi(f)=\dfrac{1}{K}\operatorname{tr}\left[\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{-f}\boldsymbol{e}_{-f}^{*}\boldsymbol{S}_{K}\right]=\dfrac{1}{K}\boldsymbol{e}_{-f}^{*}\boldsymbol{S}_{K}\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{-f}=\dfrac{1}{K}\|\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{-f}\|_{2}^{2}=\dfrac{1}{K}\sum_{k=0}^{K-1}\left|\sum_{n=0}^{N-1}\boldsymbol{s}_{k}[n]e^{-j2\pi fn}\right|^{2}.
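
In practice, the spectral window is easy to evaluate on a dense frequency grid. The following minimal NumPy sketch (ours) zero-pads each taper and averages the squared DFT magnitudes, which approximates ψ(f) at f = j/nfft; the grid size is an arbitrary choice.

```python
import numpy as np

def spectral_window(tapers, nfft=1 << 14):
    """Approximate psi(f) at f = j/nfft, given a (K, N) array of unit-norm tapers."""
    # Zero-padded DFTs of the tapers sample their DTFTs on a dense grid;
    # psi is the average of the K squared magnitudes.
    dfts = np.fft.fft(tapers, n=nfft, axis=1)
    return np.mean(np.abs(dfts)**2, axis=0)
```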

Lemma 7.

The spectral window ψ(f)\psi(f) defined in Lemma 6 satisfies the following properties:

  • ψ(f)=ψ(f)\psi(f)=\psi(-f) for all ff\in\mathbb{R}

  • WWψ(f)𝑑f=1ΣK(1)\displaystyle\int_{-W}^{W}\psi(f)\,df=1-\Sigma^{(1)}_{K} and Ωψ(f)𝑑f=ΣK(1)\displaystyle\int_{\Omega}\psi(f)\,df=\Sigma^{(1)}_{K} where Ω=[12,12][W,W]\Omega=[-\tfrac{1}{2},\tfrac{1}{2}]\setminus[-W,W]

  • 0ψ(f)NK0\leq\psi(f)\leq\dfrac{N}{K}

Proof.

First, the spectral window is an even function: since each Slepian taper 𝒔k\boldsymbol{s}_{k} is real-valued, the two inner sums below are complex conjugates of each other, and hence

ψ(f)=1Kk=0K1|n=0N1𝒔k[n]ej2πfn|2=1Kk=0K1|n=0N1𝒔k[n]ej2πfn|2=ψ(f)\psi(-f)=\dfrac{1}{K}\sum_{k=0}^{K-1}\left|\sum_{n=0}^{N-1}\boldsymbol{s}_{k}[n]e^{j2\pi fn}\right|^{2}=\dfrac{1}{K}\sum_{k=0}^{K-1}\left|\sum_{n=0}^{N-1}\boldsymbol{s}_{k}[n]e^{-j2\pi fn}\right|^{2}=\psi(f)

for all ff\in\mathbb{R}.

As a consequence, we can set 𝑺K=[𝒔0𝒔1𝒔K1]N×K\boldsymbol{S}_{K}=\begin{bmatrix}\boldsymbol{s}_{0}&\boldsymbol{s}_{1}&\cdots&\boldsymbol{s}_{K-1}\end{bmatrix}\in\mathbb{R}^{N\times K} and write ψ(f)=ψ(f)=1Ktr[𝑺K𝒆f𝒆f𝑺K]=1K𝑺K𝒆f22\psi(f)=\psi(-f)=\dfrac{1}{K}\operatorname{tr}\left[\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f}\boldsymbol{e}_{f}^{*}\boldsymbol{S}_{K}\right]=\dfrac{1}{K}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f}\right\|_{2}^{2}. Then, the integral of the spectral window ψ(f)\psi(f) over [W,W][-W,W] is

WWψ(f)𝑑f\displaystyle\int_{-W}^{W}\psi(f)\,df =WW1Ktr[𝑺K𝒆f𝒆f𝑺K]𝑑f\displaystyle=\int_{-W}^{W}\dfrac{1}{K}\operatorname{tr}\left[\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f}\boldsymbol{e}_{f}^{*}\boldsymbol{S}_{K}\right]\,df
=1Ktr[𝑺K(WW𝒆f𝒆f𝑑f)𝑺K]\displaystyle=\dfrac{1}{K}\operatorname{tr}\left[\boldsymbol{S}_{K}^{*}\left(\int_{-W}^{W}\boldsymbol{e}_{f}\boldsymbol{e}_{f}^{*}\,df\right)\boldsymbol{S}_{K}\right]
=1Ktr[𝑺K𝑩𝑺K]\displaystyle=\dfrac{1}{K}\operatorname{tr}\left[\boldsymbol{S}_{K}^{*}\boldsymbol{B}\boldsymbol{S}_{K}\right]
=1Ktr[𝚲K]\displaystyle=\dfrac{1}{K}\operatorname{tr}\left[\boldsymbol{\Lambda}_{K}\right]
=1Kk=0K1λk\displaystyle=\dfrac{1}{K}\sum_{k=0}^{K-1}\lambda_{k}
=11Kk=0K1(1λk)\displaystyle=1-\dfrac{1}{K}\sum_{k=0}^{K-1}(1-\lambda_{k})
=1ΣK(1),\displaystyle=1-\Sigma_{K}^{(1)},

where we have used the notation 𝚲K=diag(λ0,,λK1)\boldsymbol{\Lambda}_{K}=\operatorname{diag}(\lambda_{0},\ldots,\lambda_{K-1}). Similarly, the integral of the spectral window ψ(f)\psi(f) over Ω=[12,W)(W,12]\Omega=[-\tfrac{1}{2},-W)\cup(W,\tfrac{1}{2}] is

Ωψ(f)𝑑f\displaystyle\int_{\Omega}\psi(f)\,df =Ω1Ktr[𝑺K𝒆f𝒆f𝑺K]𝑑f\displaystyle=\int_{\Omega}\dfrac{1}{K}\operatorname{tr}\left[\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f}\boldsymbol{e}_{f}^{*}\boldsymbol{S}_{K}\right]\,df
=1Ktr[𝑺K(Ω𝒆f𝒆f𝑑f)𝑺K]\displaystyle=\dfrac{1}{K}\operatorname{tr}\left[\boldsymbol{S}_{K}^{*}\left(\int_{\Omega}\boldsymbol{e}_{f}\boldsymbol{e}_{f}^{*}\,df\right)\boldsymbol{S}_{K}\right]
=1Ktr[𝑺K(𝐈𝑩)𝑺K]\displaystyle=\dfrac{1}{K}\operatorname{tr}\left[\boldsymbol{S}_{K}^{*}({\bf I}-\boldsymbol{B})\boldsymbol{S}_{K}\right]
=1Ktr[𝐈𝚲K]\displaystyle=\dfrac{1}{K}\operatorname{tr}\left[{\bf I}-\boldsymbol{\Lambda}_{K}\right]
=1Kk=0K1(1λk)\displaystyle=\dfrac{1}{K}\sum_{k=0}^{K-1}(1-\lambda_{k})
=ΣK(1).\displaystyle=\Sigma_{K}^{(1)}.

Finally, the spectral window is bounded by

0ψ(f)=1K𝑺K𝒆f221K𝑺K2𝒆f22=NK,0\leq\psi(f)=\dfrac{1}{K}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f}\right\|_{2}^{2}\leq\dfrac{1}{K}\left\|\boldsymbol{S}_{K}^{*}\right\|^{2}\left\|\boldsymbol{e}_{f}\right\|_{2}^{2}=\dfrac{N}{K},

where we have used the facts that 𝑺K=1\|\boldsymbol{S}_{K}^{*}\|=1 (because 𝑺K\boldsymbol{S}_{K} has orthonormal columns) and 𝒆f22=N\left\|\boldsymbol{e}_{f}\right\|_{2}^{2}=N. ∎
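
The integral identities and the bound in Lemma 7 can also be checked numerically. The following sketch (ours, assuming SciPy's dpss; all parameter values are arbitrary) compares a Riemann sum of ψ(f) over [−W,W] with (1/K)Σλ_k = 1−Σ_K^{(1)}, and the maximum of ψ(f) with N/K.

```python
import numpy as np
from scipy.signal.windows import dpss

N, NW, K = 1024, 16.0, 28
W = NW / N
tapers, lam = dpss(N, NW, Kmax=K, return_ratios=True)
# Normalize defensively so the rows have unit l2 norm, as the lemma assumes.
tapers /= np.linalg.norm(tapers, axis=1, keepdims=True)

nfft = 1 << 16
psi = np.mean(np.abs(np.fft.fft(tapers, n=nfft, axis=1))**2, axis=0)
f = np.fft.fftfreq(nfft)                       # frequency grid covering [-1/2, 1/2)

in_band = np.abs(f) <= W
print(psi[in_band].sum() / nfft, lam.mean())   # both ~ (1/K) sum_k lambda_k = 1 - Sigma_K^(1)
print(psi.max(), N / K)                        # psi(f) <= N/K everywhere
```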

A.4 Proof of Theorem 1

By Lemma 6, we have

𝔼[S^Kmt(f)]=1/21/2S(ff)ψ(f)𝑑f.\mathbb{E}\left[\widehat{S}^{\text{mt}}_{K}(f)\right]=\int_{-1/2}^{1/2}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}.

We now split the expression for the bias into two pieces as follows:

Bias[S^Kmt(f)]\displaystyle\operatorname{Bias}\left[\widehat{S}^{\text{mt}}_{K}(f)\right] =|𝔼S^Kmt(f)S(f)|\displaystyle=\left|\mathbb{E}\widehat{S}^{\text{mt}}_{K}(f)-S(f)\right|
=|1/21/2S(ff)ψ(f)𝑑fS(f)|\displaystyle=\left|\int_{-1/2}^{1/2}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}-S(f)\right|
=|WWS(ff)ψ(f)𝑑f+ΩS(ff)ψ(f)𝑑fS(f)|\displaystyle=\left|\int_{-W}^{W}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}+\int_{\Omega}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}-S(f)\right|
|WWS(ff)ψ(f)𝑑fS(f)|local bias+|ΩS(ff)ψ(f)𝑑f|broadband bias.\displaystyle\leq\underbrace{\left|\int_{-W}^{W}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}-S(f)\right|}_{\text{local bias}}+\underbrace{\left|\int_{\Omega}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}\right|}_{\text{broadband bias}}.

Since S(f)S(f) is twice continuously differentiable, for any f[W,W]f^{\prime}\in[-W,W], there exists a ξf\xi_{f^{\prime}} between fff-f^{\prime} and ff such that

S(ff)=S(f)S(f)f+12S′′(ξf)f2.S(f-f^{\prime})=S(f)-S^{\prime}(f)f^{\prime}+\dfrac{1}{2}S^{\prime\prime}(\xi_{f^{\prime}})f^{\prime 2}.

Then, since |S′′(ξf)|maxξ[fW,f+W]|S′′(ξ)|=Mf′′\left|S^{\prime\prime}(\xi_{f^{\prime}})\right|\leq\displaystyle\max_{\xi\in[f-W,f+W]}|S^{\prime\prime}(\xi)|=M^{\prime\prime}_{f}, WWψ(f)𝑑f=1ΣK(1)\int_{-W}^{W}\psi(f)\,df=1-\Sigma^{(1)}_{K}, and 0ψ(f)NK0\leq\psi(f^{\prime})\leq\tfrac{N}{K} for all ff^{\prime}\in\mathbb{R}, we can bound the local bias as follows:

|WWS(ff)ψ(f)𝑑fS(f)|\displaystyle\left|\int_{-W}^{W}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}-S(f)\right| =|WW(S(f)S(f)f+12S′′(ξf)f2)ψ(f)𝑑fS(f)|\displaystyle=\left|\int_{-W}^{W}\left(S(f)-S^{\prime}(f)f^{\prime}+\dfrac{1}{2}S^{\prime\prime}(\xi_{f^{\prime}})f^{\prime 2}\right)\psi(f^{\prime})\,df^{\prime}-S(f)\right|
=|WWS(f)ψ(f)𝑑fS(f)WWS(f)fψ(f)odd w.r.t.f𝑑f+12WWS′′(ξf)f2ψ(f)𝑑f|\displaystyle=\left|\int_{-W}^{W}S(f)\psi(f^{\prime})\,df^{\prime}-S(f)-\int_{-W}^{W}\underbrace{S^{\prime}(f)f^{\prime}\psi(f^{\prime})}_{\text{odd w.r.t.}\ f^{\prime}}\,df^{\prime}+\dfrac{1}{2}\int_{-W}^{W}S^{\prime\prime}(\xi_{f^{\prime}})f^{\prime 2}\psi(f^{\prime})\,df^{\prime}\right|
=|WWS(f)ψ(f)𝑑fS(f)0+12WWS′′(ξf)f2ψ(f)𝑑f|\displaystyle=\left|\int_{-W}^{W}S(f)\psi(f^{\prime})\,df^{\prime}-S(f)-0+\dfrac{1}{2}\int_{-W}^{W}S^{\prime\prime}(\xi_{f^{\prime}})f^{\prime 2}\psi(f^{\prime})\,df^{\prime}\right|
|WWS(f)ψ(f)𝑑fS(f)|+|12WWS′′(ξf)f2ψ(f)𝑑f|\displaystyle\leq\left|\int_{-W}^{W}S(f)\psi(f^{\prime})\,df^{\prime}-S(f)\right|+\left|\dfrac{1}{2}\int_{-W}^{W}S^{\prime\prime}(\xi_{f^{\prime}})f^{\prime 2}\psi(f^{\prime})\,df^{\prime}\right|
=|S(f)(1ΣK(1))S(f)|+|12WWS′′(ξf)f2ψ(f)𝑑f|\displaystyle=\left|S(f)(1-\Sigma_{K}^{(1)})-S(f)\right|+\left|\dfrac{1}{2}\int_{-W}^{W}S^{\prime\prime}(\xi_{f^{\prime}})f^{\prime 2}\psi(f^{\prime})\,df^{\prime}\right|
S(f)ΣK(1)+12WW|S′′(ξf)||f|2ψ(f)𝑑f\displaystyle\leq S(f)\Sigma_{K}^{(1)}+\dfrac{1}{2}\int_{-W}^{W}|S^{\prime\prime}(\xi_{f^{\prime}})||f^{\prime}|^{2}\psi(f^{\prime})\,df^{\prime}
S(f)ΣK(1)+12WWMf′′|f|2NK𝑑f\displaystyle\leq S(f)\Sigma_{K}^{(1)}+\dfrac{1}{2}\int_{-W}^{W}M^{\prime\prime}_{f}|f^{\prime}|^{2}\dfrac{N}{K}\,df^{\prime}
=S(f)ΣK(1)+Mf′′NW33K\displaystyle=S(f)\Sigma_{K}^{(1)}+\dfrac{M^{\prime\prime}_{f}NW^{3}}{3K}
MfΣK(1)+Mf′′NW33K.\displaystyle\leq M_{f}\Sigma_{K}^{(1)}+\dfrac{M^{\prime\prime}_{f}NW^{3}}{3K}.

Since S(f)maxfS(f)=MS(f^{\prime})\leq\displaystyle\max_{f^{\prime}\in\mathbb{R}}S(f^{\prime})=M, we can bound the broadband bias as follows:

|ΩS(ff)ψ(f)𝑑f|\displaystyle\left|\int_{\Omega}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}\right| =ΩS(ff)ψ(f)𝑑f\displaystyle=\int_{\Omega}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}
ΩMψ(f)𝑑f\displaystyle\leq\int_{\Omega}M\psi(f^{\prime})\,df^{\prime}
=MΣK(1).\displaystyle=M\Sigma_{K}^{(1)}.

Combining the bounds on the local bias and broadband bias yields

Bias[S^Kmt(f)]\displaystyle\operatorname{Bias}\left[\widehat{S}^{\text{mt}}_{K}(f)\right] |WWS(ff)ψ(f)𝑑fS(f)|local bias+|ΩS(ff)ψ(f)𝑑f|broadband bias\displaystyle\leq\underbrace{\left|\int_{-W}^{W}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}-S(f)\right|}_{\text{local bias}}+\underbrace{\left|\int_{\Omega}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}\right|}_{\text{broadband bias}}
MfΣK(1)+Mf′′NW33K+MΣK(1)\displaystyle\leq M_{f}\Sigma_{K}^{(1)}+\dfrac{M^{\prime\prime}_{f}NW^{3}}{3K}+M\Sigma_{K}^{(1)}
=Mf′′NW33K+(M+Mf)ΣK(1),\displaystyle=\dfrac{M^{\prime\prime}_{f}NW^{3}}{3K}+(M+M_{f})\Sigma_{K}^{(1)},

which establishes Theorem 1.
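
Once the eigenvalues λ_k are available, this bound is cheap to evaluate. The following sketch (ours, with hypothetical placeholder values for M, M_f, and M″_f, which depend on the unknown spectrum) computes Σ_K^{(1)} from SciPy's dpss concentration ratios and evaluates the Theorem 1 bound.

```python
import numpy as np
from scipy.signal.windows import dpss

N, NW, K = 1024, 16.0, 28
W = NW / N
_, lam = dpss(N, NW, Kmax=K, return_ratios=True)

Sigma1 = np.mean(1.0 - lam)             # Sigma_K^(1) = (1/K) sum_k (1 - lambda_k)
M, M_f, Mpp_f = 10.0, 2.0, 50.0         # hypothetical sup S, local sup S, local sup |S''|
bias_bound = Mpp_f * N * W**3 / (3 * K) + (M + M_f) * Sigma1
print(bias_bound)
```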

A.5 Proof of Theorem 2

Without the assumption that S(f)S(f) is twice differentiable, we can still obtain a bound on the bias. Using the bounds mf=minξ[fW,f+W]S(ξ)S(ff)maxξ[fW,f+W]S(ξ)=Mfm_{f}=\displaystyle\min_{\xi\in[f-W,f+W]}S(\xi)\leq S(f-f^{\prime})\leq\max_{\xi\in[f-W,f+W]}S(\xi)=M_{f} for f[W,W]f^{\prime}\in[-W,W] and 0S(f)maxξS(ξ)=M0\leq S(f^{\prime})\leq\max_{\xi\in\mathbb{R}}S(\xi)=M along with the integrals WWψ(f)𝑑f=1ΣK(1)\int_{-W}^{W}\psi(f)\,df=1-\Sigma^{(1)}_{K} and Ωψ(f)𝑑f=ΣK(1)\int_{\Omega}\psi(f)\,df=\Sigma^{(1)}_{K}, we can obtain the following upper bound on 𝔼S^Kmt(f)S(f)\mathbb{E}\widehat{S}^{\text{mt}}_{K}(f)-S(f):

𝔼S^Kmt(f)S(f)\displaystyle\mathbb{E}\widehat{S}^{\text{mt}}_{K}(f)-S(f) =1/21/2S(ff)ψ(f)𝑑fS(f)\displaystyle=\int_{-1/2}^{1/2}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}-S(f)
=WWS(ff)ψ(f)𝑑f+ΩS(ff)ψ(f)𝑑fS(f)\displaystyle=\int_{-W}^{W}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}+\int_{\Omega}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}-S(f)
WWMfψ(f)𝑑f+ΩMψ(f)𝑑fmf\displaystyle\leq\int_{-W}^{W}M_{f}\psi(f^{\prime})\,df^{\prime}+\int_{\Omega}M\psi(f^{\prime})\,df^{\prime}-m_{f}
=Mf(1ΣK(1))+MΣK(1)mf\displaystyle=M_{f}(1-\Sigma^{(1)}_{K})+M\Sigma^{(1)}_{K}-m_{f}
=(Mfmf)(1ΣK(1))+(Mmf)ΣK(1)\displaystyle=(M_{f}-m_{f})(1-\Sigma^{(1)}_{K})+(M-m_{f})\Sigma^{(1)}_{K}
(Mfmf)(1ΣK(1))+MΣK(1).\displaystyle\leq(M_{f}-m_{f})(1-\Sigma^{(1)}_{K})+M\Sigma^{(1)}_{K}.

Similarly, we can obtain the following lower bound on 𝔼S^Kmt(f)S(f)\mathbb{E}\widehat{S}^{\text{mt}}_{K}(f)-S(f):

𝔼S^Kmt(f)S(f)\displaystyle\mathbb{E}\widehat{S}^{\text{mt}}_{K}(f)-S(f) =1/21/2S(ff)ψ(f)𝑑fS(f)\displaystyle=\int_{-1/2}^{1/2}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}-S(f)
=WWS(ff)ψ(f)𝑑f+ΩS(ff)ψ(f)𝑑fS(f)\displaystyle=\int_{-W}^{W}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}+\int_{\Omega}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}-S(f)
WWmfψ(f)𝑑f+Ω0ψ(f)𝑑fMf\displaystyle\geq\int_{-W}^{W}m_{f}\psi(f^{\prime})\,df^{\prime}+\int_{\Omega}0\psi(f^{\prime})\,df^{\prime}-M_{f}
=mf(1ΣK(1))+0Mf\displaystyle=m_{f}(1-\Sigma^{(1)}_{K})+0-M_{f}
=(Mfmf)(1ΣK(1))MfΣK(1)\displaystyle=-(M_{f}-m_{f})(1-\Sigma^{(1)}_{K})-M_{f}\Sigma^{(1)}_{K}
(Mfmf)(1ΣK(1))MΣK(1).\displaystyle\geq-(M_{f}-m_{f})(1-\Sigma^{(1)}_{K})-M\Sigma^{(1)}_{K}.

From the above two bounds, we have

Bias[S^Kmt(f)]=|𝔼S^Kmt(f)S(f)|(Mfmf)(1ΣK(1))+MΣK(1),\operatorname{Bias}\left[\widehat{S}^{\text{mt}}_{K}(f)\right]=\left|\mathbb{E}\widehat{S}^{\text{mt}}_{K}(f)-S(f)\right|\leq(M_{f}-m_{f})(1-\Sigma_{K}^{(1)})+M\Sigma_{K}^{(1)},

which establishes Theorem 2.

A.6 Proof of Theorem 3

Since S^Kmt(f)=1K𝑺K𝑬f𝒙22\widehat{S}^{\text{mt}}_{K}(f)=\dfrac{1}{K}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x}\right\|_{2}^{2} where 𝒙𝒞𝒩(𝟎,𝑹)\boldsymbol{x}\sim\mathcal{CN}(\boldsymbol{0},\boldsymbol{R}), by Lemma 2, we have

Var[S^Kmt(f)]=Var[1K𝑺K𝑬f𝒙22]=1K2Cov[𝑺K𝑬f𝒙22,𝑺K𝑬f𝒙22]=1K2𝑺K𝑬f𝑹𝑬f𝑺KF2.\operatorname{Var}\left[\widehat{S}^{\text{mt}}_{K}(f)\right]=\operatorname{Var}\left[\dfrac{1}{K}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x}\right\|_{2}^{2}\right]=\dfrac{1}{K^{2}}\operatorname{Cov}\left[\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x}\right\|_{2}^{2},\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x}\right\|_{2}^{2}\right]=\dfrac{1}{K^{2}}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{R}\boldsymbol{E}_{f}\boldsymbol{S}_{K}\right\|_{F}^{2}.

We focus on bounding the Frobenius norm of 𝑺K𝑬f𝑹𝑬f𝑺K\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{R}\boldsymbol{E}_{f}\boldsymbol{S}_{K}. To do this, we first split it into two pieces: an integral over [W,W][-W,W] and an integral over Ω=[12,12][W,W]\Omega=[-\tfrac{1}{2},\tfrac{1}{2}]\setminus[-W,W]:

𝑺K𝑬f𝑹𝑬f𝑺K\displaystyle\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{R}\boldsymbol{E}_{f}\boldsymbol{S}_{K} =𝑺K𝑬f(1/21/2S(f)𝒆f𝒆f𝑑f)𝑬f𝑺K\displaystyle=\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\left(\int_{-1/2}^{1/2}S(f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\right)\boldsymbol{E}_{f}\boldsymbol{S}_{K}
=𝑺K(1/21/2S(f)𝑬f𝒆f𝒆f𝑬f𝑑f)𝑺K\displaystyle=\boldsymbol{S}_{K}^{*}\left(\int_{-1/2}^{1/2}S(f^{\prime})\boldsymbol{E}_{f}^{*}\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\boldsymbol{E}_{f}\,df^{\prime}\right)\boldsymbol{S}_{K}
=𝑺K(1/21/2S(f)𝒆ff𝒆ff𝑑f)𝑺K\displaystyle=\boldsymbol{S}_{K}^{*}\left(\int_{-1/2}^{1/2}S(f^{\prime})\boldsymbol{e}_{f^{\prime}-f}\boldsymbol{e}_{f^{\prime}-f}^{*}\,df^{\prime}\right)\boldsymbol{S}_{K}
=𝑺K(f1/2f+1/2S(f+f)𝒆f𝒆f𝑑f)𝑺K\displaystyle=\boldsymbol{S}_{K}^{*}\left(\int_{f-1/2}^{f+1/2}S(f+f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\right)\boldsymbol{S}_{K}
=𝑺K(1/21/2S(f+f)𝒆f𝒆f𝑑f)𝑺K\displaystyle=\boldsymbol{S}_{K}^{*}\left(\int_{-1/2}^{1/2}S(f+f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\right)\boldsymbol{S}_{K}
=𝑺K(WWS(f+f)𝒆f𝒆f𝑑f+ΩS(f+f)𝒆f𝒆f𝑑f)𝑺K\displaystyle=\boldsymbol{S}_{K}^{*}\left(\int_{-W}^{W}S(f+f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}+\int_{\Omega}S(f+f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\right)\boldsymbol{S}_{K}
=𝑺K(WWS(f+f)𝒆f𝒆f𝑑f)𝑺K+𝑺K(ΩS(f+f)𝒆f𝒆f𝑑f)𝑺K.\displaystyle=\boldsymbol{S}_{K}^{*}\left(\int_{-W}^{W}S(f+f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\right)\boldsymbol{S}_{K}+\boldsymbol{S}_{K}^{*}\left(\int_{\Omega}S(f+f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\right)\boldsymbol{S}_{K}.

We will proceed by bounding the Frobenius norm of the two pieces above, and then applying the triangle inequality. Since S(f)maxfS(f)=MS(f)\leq\displaystyle\max_{f\in\mathbb{R}}S(f)=M for all ff\in\mathbb{R}, we trivially have

𝑺K(ΩS(f+f)𝒆f𝒆f𝑑f)𝑺K𝑺K(ΩM𝒆f𝒆f𝑑f)𝑺K=𝑺K[M(𝐈𝑩)]𝑺K=M(𝐈𝚲K).\boldsymbol{S}_{K}^{*}\left(\int_{\Omega}S(f+f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\right)\boldsymbol{S}_{K}\preceq\boldsymbol{S}_{K}^{*}\left(\int_{\Omega}M\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\right)\boldsymbol{S}_{K}=\boldsymbol{S}_{K}^{*}[M({\bf I}-\boldsymbol{B})]\boldsymbol{S}_{K}=M({\bf I}-\boldsymbol{\Lambda}_{K}).

Then, since 𝟎𝑷𝑸\boldsymbol{0}\preceq\boldsymbol{P}\preceq\boldsymbol{Q} implies 𝑷F𝑸F\|\boldsymbol{P}\|_{F}\leq\|\boldsymbol{Q}\|_{F}, we have

𝑺K(ΩS(f+f)𝒆f𝒆f𝑑f)𝑺KFM(𝐈𝚲K)F=k=0K1M2(1λk)2=MKΣK(2).\left\|\boldsymbol{S}_{K}^{*}\left(\int_{\Omega}S(f+f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\right)\boldsymbol{S}_{K}\right\|_{F}\leq\left\|M({\bf I}-\boldsymbol{\Lambda}_{K})\right\|_{F}=\sqrt{\sum_{k=0}^{K-1}M^{2}(1-\lambda_{k})^{2}}=M\sqrt{K}\Sigma^{(2)}_{K}.

Obtaining a good bound on the first piece requires a more intricate argument. We define 𝟙W(f)=1\mathbbm{1}_{W}(f)=1 if f[W,W]f\in[-W,W] and 𝟙W(f)=0\mathbbm{1}_{W}(f)=0 if f[12,12][W,W]f\in[-\tfrac{1}{2},\tfrac{1}{2}]\setminus[-W,W]. For convenience, we also extend 𝟙W(f)\mathbbm{1}_{W}(f) to ff\in\mathbb{R} such that 𝟙W(f)\mathbbm{1}_{W}(f) is 11-periodic. With this definition, we can write

WWS(f+f)𝒆f𝒆f𝑑f\displaystyle\int_{-W}^{W}S(f+f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime} =1/21/2S(f+f)𝟙W(f)𝒆f𝒆f𝑑f\displaystyle=\int_{-1/2}^{1/2}S(f+f^{\prime})\mathbbm{1}_{W}(f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}
=01S(f+f)𝟙W(f)𝒆f𝒆f𝑑f\displaystyle=\int_{0}^{1}S(f+f^{\prime})\mathbbm{1}_{W}(f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}
==0N1N+1NS(f+f)𝟙W(f)𝒆f𝒆f𝑑f\displaystyle=\sum_{\ell=0}^{N-1}\int_{\tfrac{\ell}{N}}^{\tfrac{\ell+1}{N}}S(f+f^{\prime})\mathbbm{1}_{W}(f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}
==0N101NS(f+f+N)𝟙W(f+N)𝒆f+N𝒆f+N𝑑f\displaystyle=\sum_{\ell=0}^{N-1}\int_{0}^{\tfrac{1}{N}}S(f+f^{\prime}+\tfrac{\ell}{N})\mathbbm{1}_{W}(f^{\prime}+\tfrac{\ell}{N})\boldsymbol{e}_{f^{\prime}+\tfrac{\ell}{N}}\boldsymbol{e}_{f^{\prime}+\tfrac{\ell}{N}}^{*}\,df^{\prime}
=01N=0N1S(f+f+N)𝟙W(f+N)𝒆f+N𝒆f+Ndf,\displaystyle=\int_{0}^{\tfrac{1}{N}}\sum_{\ell=0}^{N-1}S(f+f^{\prime}+\tfrac{\ell}{N})\mathbbm{1}_{W}(f^{\prime}+\tfrac{\ell}{N})\boldsymbol{e}_{f^{\prime}+\tfrac{\ell}{N}}\boldsymbol{e}_{f^{\prime}+\tfrac{\ell}{N}}^{*}\,df^{\prime},

where the second line holds since S(f)S(f), 𝟙W(f)\mathbbm{1}_{W}(f), and 𝒆f\boldsymbol{e}_{f} are 11-periodic. Now, for any ff^{\prime}\in\mathbb{R}, the vectors {1N𝒆f+N}=0N1\left\{\tfrac{1}{\sqrt{N}}\boldsymbol{e}_{f^{\prime}+\tfrac{\ell}{N}}\right\}_{\ell=0}^{N-1} form an orthonormal basis of N\mathbb{C}^{N}. Hence, we have

=0N1a𝒆f+N𝒆f+NF2=N2=0N1|a|2\left\|\sum_{\ell=0}^{N-1}a_{\ell}\boldsymbol{e}_{f^{\prime}+\tfrac{\ell}{N}}\boldsymbol{e}_{f^{\prime}+\tfrac{\ell}{N}}^{*}\right\|_{F}^{2}=N^{2}\sum_{\ell=0}^{N-1}|a_{\ell}|^{2}

for any choice of coefficients {a}=0N1\{a_{\ell}\}_{\ell=0}^{N-1} and offset frequency ff^{\prime}\in\mathbb{R}. By applying this formula, along with the triangle inequality and the Cauchy-Schwarz integral inequality, we obtain

WWS(f+f)𝒆f𝒆f𝑑fF2\displaystyle\left\|\int_{-W}^{W}S(f+f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\right\|_{F}^{2} =01N=0N1S(f+f+N)𝟙W(f+N)𝒆f+N𝒆f+NdfF2\displaystyle=\left\|\int_{0}^{\tfrac{1}{N}}\sum_{\ell=0}^{N-1}S(f+f^{\prime}+\tfrac{\ell}{N})\mathbbm{1}_{W}(f^{\prime}+\tfrac{\ell}{N})\boldsymbol{e}_{f^{\prime}+\tfrac{\ell}{N}}\boldsymbol{e}_{f^{\prime}+\tfrac{\ell}{N}}^{*}\,df^{\prime}\right\|_{F}^{2}
(01N=0N1S(f+f+N)𝟙W(f+N)𝒆f+N𝒆f+NF𝑑f)2\displaystyle\leq\left(\int_{0}^{\tfrac{1}{N}}\left\|\sum_{\ell=0}^{N-1}S(f+f^{\prime}+\tfrac{\ell}{N})\mathbbm{1}_{W}(f^{\prime}+\tfrac{\ell}{N})\boldsymbol{e}_{f^{\prime}+\tfrac{\ell}{N}}\boldsymbol{e}_{f^{\prime}+\tfrac{\ell}{N}}^{*}\right\|_{F}\,df^{\prime}\right)^{2}
=(01NN(=0N1S(f+f+N)2𝟙W(f+N)2)1/2𝑑f)2\displaystyle=\left(\int_{0}^{\tfrac{1}{N}}N\left(\sum_{\ell=0}^{N-1}S(f+f^{\prime}+\tfrac{\ell}{N})^{2}\mathbbm{1}_{W}(f^{\prime}+\tfrac{\ell}{N})^{2}\right)^{1/2}\,df^{\prime}\right)^{2}
(01NN2𝑑f)(01N=0N1S(f+f+N)2𝟙W(f+N)2df)\displaystyle\leq\left(\int_{0}^{\tfrac{1}{N}}N^{2}\,df^{\prime}\right)\left(\int_{0}^{\tfrac{1}{N}}\sum_{\ell=0}^{N-1}S(f+f^{\prime}+\tfrac{\ell}{N})^{2}\mathbbm{1}_{W}(f^{\prime}+\tfrac{\ell}{N})^{2}\,df^{\prime}\right)
=N=0N101NS(f+f+N)2𝟙W(f+N)2𝑑f\displaystyle=N\sum_{\ell=0}^{N-1}\int_{0}^{\tfrac{1}{N}}S(f+f^{\prime}+\tfrac{\ell}{N})^{2}\mathbbm{1}_{W}(f^{\prime}+\tfrac{\ell}{N})^{2}\,df^{\prime}
=N=0N1N+1NS(f+f)2𝟙W(f)2𝑑f\displaystyle=N\sum_{\ell=0}^{N-1}\int_{\tfrac{\ell}{N}}^{\tfrac{\ell+1}{N}}S(f+f^{\prime})^{2}\mathbbm{1}_{W}(f^{\prime})^{2}\,df^{\prime}
=N01S(f+f)2𝟙W(f)2𝑑f\displaystyle=N\int_{0}^{1}S(f+f^{\prime})^{2}\mathbbm{1}_{W}(f^{\prime})^{2}\,df^{\prime}
=N1/21/2S(f+f)2𝟙W(f)2𝑑f\displaystyle=N\int_{-1/2}^{1/2}S(f+f^{\prime})^{2}\mathbbm{1}_{W}(f^{\prime})^{2}\,df^{\prime}
=NWWS(f+f)2𝑑f\displaystyle=N\int_{-W}^{W}S(f+f^{\prime})^{2}\,df^{\prime}
=2NWRf2.\displaystyle=2NWR_{f}^{2}.

Since 𝑺KN×K\boldsymbol{S}_{K}\in\mathbb{R}^{N\times K} has orthonormal columns, 𝑺K𝑿𝑺KF𝑿F\|\boldsymbol{S}_{K}^{*}\boldsymbol{X}\boldsymbol{S}_{K}\|_{F}\leq\|\boldsymbol{X}\|_{F} for any Hermitian matrix 𝑿N×N\boldsymbol{X}\in\mathbb{C}^{N\times N}. Hence,

𝑺KWWS(f+f)𝒆f𝒆f𝑑f𝑺KFWWS(f+f)𝒆f𝒆f𝑑fFRf2NW.\left\|\boldsymbol{S}_{K}^{*}\int_{-W}^{W}S(f+f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\boldsymbol{S}_{K}\right\|_{F}\leq\left\|\int_{-W}^{W}S(f+f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\right\|_{F}\leq R_{f}\sqrt{2NW}.

Finally, by applying the two bounds we’ve derived, we obtain

𝑺K𝑬f𝑹𝑬f𝑺KF\displaystyle\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{R}\boldsymbol{E}_{f}\boldsymbol{S}_{K}\right\|_{F} =𝑺K(WWS(f+f)𝒆f𝒆f𝑑f)𝑺K+𝑺K(ΩS(f+f)𝒆f𝒆f𝑑f)𝑺KF\displaystyle=\left\|\boldsymbol{S}_{K}^{*}\left(\int_{-W}^{W}S(f+f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\right)\boldsymbol{S}_{K}+\boldsymbol{S}_{K}^{*}\left(\int_{\Omega}S(f+f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\right)\boldsymbol{S}_{K}\right\|_{F}
𝑺K(WWS(f+f)𝒆f𝒆f𝑑f)𝑺KF+𝑺K(ΩS(f+f)𝒆f𝒆f𝑑f)𝑺KF\displaystyle\leq\left\|\boldsymbol{S}_{K}^{*}\left(\int_{-W}^{W}S(f+f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\right)\boldsymbol{S}_{K}\right\|_{F}+\left\|\boldsymbol{S}_{K}^{*}\left(\int_{\Omega}S(f+f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\right)\boldsymbol{S}_{K}\right\|_{F}
Rf2NW+MKΣK(2),\displaystyle\leq R_{f}\sqrt{2NW}+M\sqrt{K}\Sigma^{(2)}_{K},

and thus,

Var[S^Kmt(f)]=1K2𝑺K𝑬f𝑹𝑬f𝑺KF21K(Rf2NWK+MΣK(2))2.\operatorname{Var}\left[\widehat{S}^{\text{mt}}_{K}(f)\right]=\dfrac{1}{K^{2}}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{R}\boldsymbol{E}_{f}\boldsymbol{S}_{K}\right\|_{F}^{2}\leq\dfrac{1}{K}\left(R_{f}\sqrt{\dfrac{2NW}{K}}+M\Sigma^{(2)}_{K}\right)^{2}.

A.7 Proof of Theorem 4

Since S^Kmt(f)=1K𝑺K𝑬f𝒙22\widehat{S}^{\text{mt}}_{K}(f)=\dfrac{1}{K}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x}\right\|_{2}^{2} where 𝒙𝒞𝒩(𝟎,𝑹)\boldsymbol{x}\sim\mathcal{CN}(\boldsymbol{0},\boldsymbol{R}), by Lemma 2, we have

Cov[S^Kmt(f1),S^Kmt(f2)]=Cov[1K𝑺K𝑬f1𝒙22,1K𝑺K𝑬f2𝒙22]=1K2𝑺K𝑬f1𝑹𝑬f2𝑺KF2.\operatorname{Cov}\left[\widehat{S}^{\text{mt}}_{K}(f_{1}),\widehat{S}^{\text{mt}}_{K}(f_{2})\right]=\operatorname{Cov}\left[\dfrac{1}{K}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f_{1}}^{*}\boldsymbol{x}\right\|_{2}^{2},\dfrac{1}{K}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f_{2}}^{*}\boldsymbol{x}\right\|_{2}^{2}\right]=\dfrac{1}{K^{2}}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f_{1}}^{*}\boldsymbol{R}\boldsymbol{E}_{f_{2}}\boldsymbol{S}_{K}\right\|_{F}^{2}.

We focus on bounding the Frobenius norm of 𝑺K𝑬f1𝑹𝑬f2𝑺K\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f_{1}}^{*}\boldsymbol{R}\boldsymbol{E}_{f_{2}}\boldsymbol{S}_{K}. To do this, we first split it into three pieces: an integral over [f1W,f1+W][f_{1}-W,f_{1}+W], an integral over [f2W,f2+W][f_{2}-W,f_{2}+W], and an integral over Ω=[12,12]([f1W,f1+W][f2W,f2+W])\Omega^{\prime}=[-\tfrac{1}{2},\tfrac{1}{2}]\setminus([f_{1}-W,f_{1}+W]\cup[f_{2}-W,f_{2}+W]):

𝑺K𝑬f1𝑹𝑬f2𝑺K\displaystyle\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f_{1}}^{*}\boldsymbol{R}\boldsymbol{E}_{f_{2}}\boldsymbol{S}_{K} =𝑺K𝑬f1(1/21/2S(f)𝒆f𝒆f𝑑f)𝑬f2𝑺K\displaystyle=\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f_{1}}^{*}\left(\int_{-1/2}^{1/2}S(f)\boldsymbol{e}_{f}\boldsymbol{e}_{f}^{*}\,df\right)\boldsymbol{E}_{f_{2}}\boldsymbol{S}_{K}
=1/21/2S(f)𝑺K𝑬f1𝒆f𝒆f𝑬f2𝑺K𝑑f\displaystyle=\int_{-1/2}^{1/2}S(f)\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f_{1}}^{*}\boldsymbol{e}_{f}\boldsymbol{e}_{f}^{*}\boldsymbol{E}_{f_{2}}\boldsymbol{S}_{K}\,df
=1/21/2S(f)𝑺K𝒆ff1𝒆ff2𝑺K𝑑f\displaystyle=\int_{-1/2}^{1/2}S(f)\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{1}}\boldsymbol{e}_{f-f_{2}}^{*}\boldsymbol{S}_{K}\,df
=f1Wf1+WS(f)𝑺K𝒆ff1𝒆ff2𝑺K𝑑f+f2Wf2+WS(f)𝑺K𝒆ff1𝒆ff2𝑺K𝑑f\displaystyle=\int_{f_{1}-W}^{f_{1}+W}S(f)\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{1}}\boldsymbol{e}_{f-f_{2}}^{*}\boldsymbol{S}_{K}\,df+\int_{f_{2}-W}^{f_{2}+W}S(f)\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{1}}\boldsymbol{e}_{f-f_{2}}^{*}\boldsymbol{S}_{K}\,df
+ΩS(f)𝑺K𝒆ff1𝒆ff2𝑺K𝑑f.\displaystyle\ \ \ \ \ +\int_{\Omega^{\prime}}S(f)\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{1}}\boldsymbol{e}_{f-f_{2}}^{*}\boldsymbol{S}_{K}\,df.

By using the triangle inequality, the identity 𝒙𝒚F=𝒙2𝒚2\|\boldsymbol{x}\boldsymbol{y}^{*}\|_{F}=\|\boldsymbol{x}\|_{2}\|\boldsymbol{y}\|_{2} for vectors 𝒙,𝒚\boldsymbol{x},\boldsymbol{y}, the Cauchy-Schwarz inequality, and the facts that ψ(f)NK\psi(f)\leq\tfrac{N}{K} and Ωψ(f)𝑑f=ΣK(1)\int_{\Omega}\psi(f)\,df=\Sigma^{(1)}_{K}, we can bound the Frobenius norm of the first piece by

f1Wf1+WS(f)𝑺K𝒆ff1𝒆ff2𝑺K𝑑fF2\displaystyle\left\|\int_{f_{1}-W}^{f_{1}+W}S(f)\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{1}}\boldsymbol{e}_{f-f_{2}}^{*}\boldsymbol{S}_{K}\,df\right\|_{F}^{2} (f1Wf1+WS(f)𝑺K𝒆ff1𝒆ff2𝑺KF𝑑f)2\displaystyle\leq\left(\int_{f_{1}-W}^{f_{1}+W}\left\|S(f)\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{1}}\boldsymbol{e}_{f-f_{2}}^{*}\boldsymbol{S}_{K}\right\|_{F}\,df\right)^{2}
(f1Wf1+WS(f)𝑺K𝒆ff12𝑺K𝒆ff22𝑑f)2\displaystyle\leq\left(\int_{f_{1}-W}^{f_{1}+W}S(f)\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{1}}\right\|_{2}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{2}}\right\|_{2}\,df\right)^{2}
(f1Wf1+WS(f)2𝑺K𝒆ff122𝑑f)(f1Wf1+W𝑺K𝒆ff222𝑑f)\displaystyle\leq\left(\int_{f_{1}-W}^{f_{1}+W}S(f)^{2}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{1}}\right\|_{2}^{2}\,df\right)\left(\int_{f_{1}-W}^{f_{1}+W}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{2}}\right\|_{2}^{2}\,df\right)
=(WWS(f+f1)2𝑺K𝒆f22𝑑f)(f1f2Wf1f2+W𝑺K𝒆f22𝑑f)\displaystyle=\left(\int_{-W}^{W}S(f+f_{1})^{2}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f}\right\|_{2}^{2}\,df\right)\left(\int_{f_{1}-f_{2}-W}^{f_{1}-f_{2}+W}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f}\right\|_{2}^{2}\,df\right)
=(WWS(f+f1)2Kψ(f)𝑑f)(f1f2Wf1f2+WKψ(f)𝑑f)\displaystyle=\left(\int_{-W}^{W}S(f+f_{1})^{2}K\psi(f)\,df\right)\left(\int_{f_{1}-f_{2}-W}^{f_{1}-f_{2}+W}K\psi(f)\,df\right)
(WWS(f+f1)2N𝑑f)(ΩKψ(f)𝑑f)\displaystyle\leq\left(\int_{-W}^{W}S(f+f_{1})^{2}\cdot N\,df\right)\left(\int_{\Omega}K\psi(f)\,df\right)
=Rf122NWKΣK(1).\displaystyle=R_{f_{1}}^{2}\cdot 2NWK\Sigma^{(1)}_{K}.

In a nearly identical manner, we can bound the second piece by

f2Wf2+WS(f)𝑺K𝒆ff1𝒆ff2𝑺K𝑑fF2Rf222NWKΣK(1).\left\|\int_{f_{2}-W}^{f_{2}+W}S(f)\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{1}}\boldsymbol{e}_{f-f_{2}}^{*}\boldsymbol{S}_{K}\,df\right\|_{F}^{2}\leq R_{f_{2}}^{2}\cdot 2NWK\Sigma^{(1)}_{K}.

The third piece can also be bounded in a similar manner, but the details are noticeably different, so we show the derivation:

ΩS(f)𝑺K𝒆ff1𝒆ff2𝑺K𝑑fF2\displaystyle\left\|\int_{\Omega^{\prime}}S(f)\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{1}}\boldsymbol{e}_{f-f_{2}}^{*}\boldsymbol{S}_{K}\,df\right\|_{F}^{2} (ΩS(f)𝑺K𝒆ff1𝒆ff2𝑺KF𝑑f)2\displaystyle\leq\left(\int_{\Omega^{\prime}}\left\|S(f)\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{1}}\boldsymbol{e}_{f-f_{2}}^{*}\boldsymbol{S}_{K}\right\|_{F}\,df\right)^{2}
(ΩS(f)𝑺K𝒆ff12𝑺K𝒆ff22𝑑f)2\displaystyle\leq\left(\int_{\Omega^{\prime}}S(f)\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{1}}\right\|_{2}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{2}}\right\|_{2}\,df\right)^{2}
(ΩM𝑺K𝒆ff12𝑺K𝒆ff22𝑑f)2\displaystyle\leq\left(\int_{\Omega^{\prime}}M\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{1}}\right\|_{2}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{2}}\right\|_{2}\,df\right)^{2}
M2(Ω𝑺K𝒆ff122𝑑f)(Ω𝑺K𝒆ff222𝑑f)\displaystyle\leq M^{2}\left(\int_{\Omega^{\prime}}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{1}}\right\|_{2}^{2}\,df\right)\left(\int_{\Omega^{\prime}}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{2}}\right\|_{2}^{2}\,df\right)
=M2(ΩKψ(ff1)𝑑f)(ΩKψ(ff2)𝑑f)\displaystyle=M^{2}\left(\int_{\Omega^{\prime}}K\psi(f-f_{1})\,df\right)\left(\int_{\Omega^{\prime}}K\psi(f-f_{2})\,df\right)
M2(Ω1Kψ(ff1)𝑑f)(Ω2Kψ(ff2)𝑑f)\displaystyle\leq M^{2}\left(\int_{\Omega^{\prime}_{1}}K\psi(f-f_{1})\,df\right)\left(\int_{\Omega^{\prime}_{2}}K\psi(f-f_{2})\,df\right)
=M2(ΩKψ(f)𝑑f)(ΩKψ(f)𝑑f)\displaystyle=M^{2}\left(\int_{\Omega}K\psi(f)\,df\right)\left(\int_{\Omega}K\psi(f)\,df\right)
=M2(KΣK(1))2,\displaystyle=M^{2}\left(K\Sigma^{(1)}_{K}\right)^{2},

where Ω1=[12,12][f1W,f1+W]\Omega^{\prime}_{1}=[-\tfrac{1}{2},\tfrac{1}{2}]\setminus[f_{1}-W,f_{1}+W] and Ω2=[12,12][f2W,f2+W]\Omega^{\prime}_{2}=[-\tfrac{1}{2},\tfrac{1}{2}]\setminus[f_{2}-W,f_{2}+W].

Finally, by applying the three bounds we’ve derived, we obtain

𝑺K𝑬f1𝑹𝑬f2𝑺KF\displaystyle\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f_{1}}^{*}\boldsymbol{R}\boldsymbol{E}_{f_{2}}\boldsymbol{S}_{K}\right\|_{F} f1Wf1+WS(f)𝑺K𝒆ff1𝒆ff2𝑺K𝑑fF+f2Wf2+WS(f)𝑺K𝒆ff1𝒆ff2𝑺K𝑑fF\displaystyle\leq\left\|\int_{f_{1}-W}^{f_{1}+W}S(f)\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{1}}\boldsymbol{e}_{f-f_{2}}^{*}\boldsymbol{S}_{K}\,df\right\|_{F}+\left\|\int_{f_{2}-W}^{f_{2}+W}S(f)\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{1}}\boldsymbol{e}_{f-f_{2}}^{*}\boldsymbol{S}_{K}\,df\right\|_{F}
+ΩS(f)𝑺K𝒆ff1𝒆ff2𝑺K𝑑fF\displaystyle\ \ \ \ \ +\left\|\int_{\Omega^{\prime}}S(f)\boldsymbol{S}_{K}^{*}\boldsymbol{e}_{f-f_{1}}\boldsymbol{e}_{f-f_{2}}^{*}\boldsymbol{S}_{K}\,df\right\|_{F}
Rf12NWKΣK(1)+Rf22NWKΣK(1)+MKΣK(1),\displaystyle\leq R_{f_{1}}\sqrt{2NWK\Sigma^{(1)}_{K}}+R_{f_{2}}\sqrt{2NWK\Sigma^{(1)}_{K}}+MK\Sigma^{(1)}_{K},

and thus,

0Cov[S^Kmt(f1),S^Kmt(f2)]=1K2𝑺K𝑬f1𝑹𝑬f2𝑺KF2((Rf1+Rf2)2NWKΣK(1)+MΣK(1))2.0\leq\operatorname{Cov}\left[\widehat{S}^{\text{mt}}_{K}(f_{1}),\widehat{S}^{\text{mt}}_{K}(f_{2})\right]=\dfrac{1}{K^{2}}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f_{1}}^{*}\boldsymbol{R}\boldsymbol{E}_{f_{2}}\boldsymbol{S}_{K}\right\|_{F}^{2}\leq\left((R_{f_{1}}+R_{f_{2}})\sqrt{\dfrac{2NW}{K}\Sigma^{(1)}_{K}}+M\Sigma^{(1)}_{K}\right)^{2}.

A.8 Proof of Theorem 5

Since S^Kmt(f)=1K𝑺K𝑬f𝒙22\widehat{S}^{\text{mt}}_{K}(f)=\dfrac{1}{K}\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x}\right\|_{2}^{2} where 𝒙𝒞𝒩(𝟎,𝑹)\boldsymbol{x}\sim\mathcal{CN}(\boldsymbol{0},\boldsymbol{R}), by Lemma 4, we have

{S^Kmt(f)β𝔼S^Kmt(f)}β1eκf(β1lnβ)forβ>1,\mathbb{P}\left\{\widehat{S}^{\text{mt}}_{K}(f)\geq\beta\mathbb{E}\widehat{S}^{\text{mt}}_{K}(f)\right\}\leq\beta^{-1}\mathrm{e}^{-\kappa_{f}(\beta-1-\ln\beta)}\quad\text{for}\quad\beta>1,

and

{S^Kmt(f)β𝔼S^Kmt(f)}eκf(β1lnβ)for0<β<1,\mathbb{P}\left\{\widehat{S}^{\text{mt}}_{K}(f)\leq\beta\mathbb{E}\widehat{S}^{\text{mt}}_{K}(f)\right\}\leq\mathrm{e}^{-\kappa_{f}(\beta-1-\ln\beta)}\quad\text{for}\quad 0<\beta<1,

where

κf=tr[𝑺K𝑬f𝑹𝑬f𝑺K]𝑺K𝑬f𝑹𝑬f𝑺K.\kappa_{f}=\dfrac{\operatorname{tr}\left[\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{R}\boldsymbol{E}_{f}\boldsymbol{S}_{K}\right]}{\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{R}\boldsymbol{E}_{f}\boldsymbol{S}_{K}\right\|}.

We can get an upper bound on 𝑺K𝑬f𝑹𝑬f𝑺K\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{R}\boldsymbol{E}_{f}\boldsymbol{S}_{K} in the Loewner ordering by splitting it into two pieces as done in the proof of Theorem 3, and then bounding each piece:

𝑺K𝑬f𝑹𝑬f𝑺K\displaystyle\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{R}\boldsymbol{E}_{f}\boldsymbol{S}_{K} =𝑺K(WWS(f+f)𝒆f𝒆f𝑑f)𝑺K+𝑺K(ΩS(f+f)𝒆f𝒆f𝑑f)𝑺K\displaystyle=\boldsymbol{S}_{K}^{*}\left(\int_{-W}^{W}S(f+f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\right)\boldsymbol{S}_{K}+\boldsymbol{S}_{K}^{*}\left(\int_{\Omega}S(f+f^{\prime})\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\right)\boldsymbol{S}_{K}
𝑺K(WWMf𝒆f𝒆f𝑑f)𝑺K+𝑺K(ΩM𝒆f𝒆f𝑑f)𝑺K\displaystyle\preceq\boldsymbol{S}_{K}^{*}\left(\int_{-W}^{W}M_{f}\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\right)\boldsymbol{S}_{K}+\boldsymbol{S}_{K}^{*}\left(\int_{\Omega}M\boldsymbol{e}_{f^{\prime}}\boldsymbol{e}_{f^{\prime}}^{*}\,df^{\prime}\right)\boldsymbol{S}_{K}
=𝑺K(Mf𝑩)𝑺K+𝑺K(M(𝐈𝑩))𝑺K\displaystyle=\boldsymbol{S}_{K}^{*}\left(M_{f}\boldsymbol{B}\right)\boldsymbol{S}_{K}+\boldsymbol{S}_{K}^{*}\left(M({\bf I}-\boldsymbol{B})\right)\boldsymbol{S}_{K}
=Mf𝚲K+M(𝐈𝚲K)\displaystyle=M_{f}\boldsymbol{\Lambda}_{K}+M({\bf I}-\boldsymbol{\Lambda}_{K})
=Mf𝐈+(MMf)(𝐈𝚲K).\displaystyle=M_{f}{\bf I}+(M-M_{f})({\bf I}-\boldsymbol{\Lambda}_{K}).

Then, by using the fact that 𝑷𝑸𝑷𝑸\boldsymbol{P}\preceq\boldsymbol{Q}\implies\|\boldsymbol{P}\|\leq\|\boldsymbol{Q}\| for PSD matrices 𝑷\boldsymbol{P} and 𝑸\boldsymbol{Q}, we can bound

𝑺K𝑬f𝑹𝑬f𝑺KMf𝐈+(MMf)(𝐈𝚲K)=Mf+(MMf)(1λK1).\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{R}\boldsymbol{E}_{f}\boldsymbol{S}_{K}\right\|\leq\left\|M_{f}{\bf I}+(M-M_{f})({\bf I}-\boldsymbol{\Lambda}_{K})\right\|=M_{f}+(M-M_{f})(1-\lambda_{K-1}).

We can also get a lower bound on tr[𝑺K𝑬f𝑹𝑬f𝑺K]=K𝔼[S^Kmt(f)]\operatorname{tr}[\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{R}\boldsymbol{E}_{f}\boldsymbol{S}_{K}]=K\mathbb{E}\left[\widehat{S}^{\text{mt}}_{K}(f)\right] by using the formula for 𝔼[S^Kmt(f)]\mathbb{E}\left[\widehat{S}^{\text{mt}}_{K}(f)\right] from Lemma 6 along with the properties of the spectral window derived in Lemma 7 as follows:

tr[𝑺K𝑬f𝑹𝑬f𝑺K]\displaystyle\operatorname{tr}[\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{R}\boldsymbol{E}_{f}\boldsymbol{S}_{K}] =K𝔼[S^Kmt(f)]\displaystyle=K\mathbb{E}\left[\widehat{S}^{\text{mt}}_{K}(f)\right]
=K1/21/2S(ff)ψ(f)𝑑f\displaystyle=K\int_{-1/2}^{1/2}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}
KWWS(ff)ψ(f)𝑑f\displaystyle\geq K\int_{-W}^{W}S(f-f^{\prime})\psi(f^{\prime})\,df^{\prime}
=KWWS(ff)[NK(NKψ(f))]𝑑f\displaystyle=K\int_{-W}^{W}S(f-f^{\prime})\left[\dfrac{N}{K}-\left(\dfrac{N}{K}-\psi(f^{\prime})\right)\right]\,df^{\prime}
=KWWS(ff)NK𝑑fKWWS(ff)(NKψ(f))𝑑f\displaystyle=K\int_{-W}^{W}S(f-f^{\prime})\dfrac{N}{K}\,df^{\prime}-K\int_{-W}^{W}S(f-f^{\prime})\left(\dfrac{N}{K}-\psi(f^{\prime})\right)\,df^{\prime}
=NfWf+WS(f)𝑑fWWS(ff)(NKψ(f))𝑑f\displaystyle=N\int_{f-W}^{f+W}S(f^{\prime})\,df^{\prime}-\int_{-W}^{W}S(f-f^{\prime})(N-K\psi(f^{\prime}))\,df^{\prime}
NfWf+WS(f)𝑑fWWMf(NKψ(f))𝑑f\displaystyle\geq N\int_{f-W}^{f+W}S(f^{\prime})\,df^{\prime}-\int_{-W}^{W}M_{f}(N-K\psi(f^{\prime}))\,df^{\prime}
=2NWAf(2NWK(1ΣK(1)))Mf\displaystyle=2NWA_{f}-\left(2NW-K\left(1-\Sigma^{(1)}_{K}\right)\right)M_{f}
=K(1ΣK(1))Mf2NW(MfAf).\displaystyle=K\left(1-\Sigma^{(1)}_{K}\right)M_{f}-2NW(M_{f}-A_{f}).

Combining the upper bound on 𝑺K𝑬f𝑹𝑬f𝑺K\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{R}\boldsymbol{E}_{f}\boldsymbol{S}_{K}\right\| with the lower bound on tr[𝑺K𝑬f𝑹𝑬f𝑺K]\operatorname{tr}[\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{R}\boldsymbol{E}_{f}\boldsymbol{S}_{K}] yields

κf=tr[𝑺K𝑬f𝑹𝑬f𝑺K]𝑺K𝑬f𝑹𝑬f𝑺KK(1ΣK(1))Mf2NW(MfAf)Mf+(MMf)(1λK1).\kappa_{f}=\dfrac{\operatorname{tr}\left[\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{R}\boldsymbol{E}_{f}\boldsymbol{S}_{K}\right]}{\left\|\boldsymbol{S}_{K}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{R}\boldsymbol{E}_{f}\boldsymbol{S}_{K}\right\|}\geq\dfrac{K\left(1-\Sigma^{(1)}_{K}\right)M_{f}-2NW(M_{f}-A_{f})}{M_{f}+\left(M-M_{f}\right)(1-\lambda_{K-1})}.

Appendix B Proof of Results in Section 4

B.1 Fast algorithm for computing Ψ(f)\Psi(f) at grid frequencies

To begin developing our fast approximations for S^Kmt(f)\widehat{S}^{\text{mt}}_{K}(f), we first show that an eigenvalue-weighted sum of NN tapered spectral estimates can be evaluated at a grid of frequencies f[L]/Lf\in[L]/L, where L2NL\geq 2N, in O(LlogL)O(L\log L) operations using O(L)O(L) memory.

Lemma 8.

For any vector 𝐱N\boldsymbol{x}\in\mathbb{C}^{N} and any integer L2NL\geq 2N, the quantity

Ψ(f):=k=0N1λkS^k(f)whereS^k(f)=|n=0N1𝒔k[n]𝒙[n]ej2πfn|2\Psi(f):=\sum_{k=0}^{N-1}\lambda_{k}\widehat{S}_{k}(f)\quad\text{where}\quad\widehat{S}_{k}(f)=\left|\sum_{n=0}^{N-1}\boldsymbol{s}_{k}[n]\boldsymbol{x}[n]e^{-j2\pi fn}\right|^{2}

can be evaluated at the grid frequencies f[L]/Lf\in[L]/L in O(LlogL)O(L\log L) operations and using O(L)O(L) memory.

Proof.

Using the eigendecomposition of 𝑩\boldsymbol{B}, we can write 𝑩=𝑺𝚲𝑺\boldsymbol{B}=\boldsymbol{S}\boldsymbol{\Lambda}\boldsymbol{S}^{*}, where

𝑺=[𝒔0𝒔1𝒔N1]\boldsymbol{S}=\begin{bmatrix}\boldsymbol{s}_{0}&\boldsymbol{s}_{1}&\cdots&\boldsymbol{s}_{N-1}\end{bmatrix}

and

𝚲=diag(λ0,λ1,,λN1).\boldsymbol{\Lambda}=\operatorname{diag}(\lambda_{0},\lambda_{1},\ldots,\lambda_{N-1}).

For any ff\in\mathbb{R}, we let 𝑬fN×N\boldsymbol{E}_{f}\in\mathbb{C}^{N\times N} be a diagonal matrix with diagonal entries

𝑬f[n,n]=ej2πfnforn[N].\boldsymbol{E}_{f}[n,n]=e^{j2\pi fn}\quad\text{for}\quad n\in[N].

With this definition, Ψ(f)\Psi(f) can be written as

Ψ(f)\displaystyle\Psi(f) =k=0N1λkS^k(f)\displaystyle=\sum_{k=0}^{N-1}\lambda_{k}\widehat{S}_{k}(f)
=k=0N1λk|n=0N1𝒔k[n]𝒙[n]ej2πfn|2\displaystyle=\sum_{k=0}^{N-1}\lambda_{k}\left|\sum_{n=0}^{N-1}\boldsymbol{s}_{k}[n]\boldsymbol{x}[n]e^{-j2\pi fn}\right|^{2}
=k=0N1λk|𝒔k𝑬f𝒙|2\displaystyle=\sum_{k=0}^{N-1}\lambda_{k}\left|\boldsymbol{s}_{k}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x}\right|^{2}
=k=0N1𝚲[k,k]|(𝑺𝑬f𝒙)[k]|2\displaystyle=\sum_{k=0}^{N-1}\boldsymbol{\Lambda}[k,k]\left|(\boldsymbol{S}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x})[k]\right|^{2}
=𝒙𝑬f𝑺𝚲𝑺𝑬f𝒙\displaystyle=\boldsymbol{x}^{*}\boldsymbol{E}_{f}\boldsymbol{S}\boldsymbol{\Lambda}\boldsymbol{S}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x}
=𝒙𝑬f𝑩𝑬f𝒙.\displaystyle=\boldsymbol{x}^{*}\boldsymbol{E}_{f}\boldsymbol{B}\boldsymbol{E}_{f}^{*}\boldsymbol{x}.

This gives us a formula for Ψ(f)=k=0N1λkS^k(f)\Psi(f)=\sum_{k=0}^{N-1}\lambda_{k}\widehat{S}_{k}(f) which does not require computing any of the Slepian tapers. We will now use the fact that 𝑩\boldsymbol{B} is a Toeplitz matrix to efficiently compute Ψ(L)\Psi(\tfrac{\ell}{L}) for all [L]\ell\in[L].

First, note that we can “extend” 𝑩\boldsymbol{B} to a larger circulant matrix, which is diagonalized by a Fourier transform matrix. Specifically, define a vector of sinc samples 𝒃L\boldsymbol{b}\in\mathbb{R}^{L} (with the =0\ell=0 entry interpreted in the limiting sense as 2W2W) by

𝒃[]={sin[2πW]πif{0,,N1}0if{N,,LN}sin[2πW(L)]π(L)if{LN+1,,L1},\boldsymbol{b}[\ell]=\begin{cases}\dfrac{\sin[2\pi W\ell]}{\pi\ell}&\text{if}\ \ell\in\{0,\ldots,N-1\}\\ 0&\text{if}\ \ell\in\{N,\ldots,L-N\}\\ \dfrac{\sin[2\pi W(L-\ell)]}{\pi(L-\ell)}&\text{if}\ \ell\in\{L-N+1,\ldots,L-1\}\end{cases},

and let 𝑩extL×L\boldsymbol{B}_{\text{ext}}\in\mathbb{R}^{L\times L} be defined by

𝑩ext[m,n]=𝒃[mn(modL)]form,n[L].\boldsymbol{B}_{\text{ext}}[m,n]=\boldsymbol{b}[m-n\pmod{L}]\ \text{for}\ m,n\in[L].

It is easy to check that 𝑩ext[m,n]=𝑩[m,n]for allm,n[N]\boldsymbol{B}_{\text{ext}}[m,n]=\boldsymbol{B}[m,n]\ \text{for all}\ m,n\in[N], i.e., the upper-left N×NN\times N submatrix of 𝑩ext\boldsymbol{B}_{\text{ext}} is 𝑩\boldsymbol{B}. Hence, we can write

𝑩=𝒁𝑩ext𝒁,\boldsymbol{B}=\boldsymbol{Z}^{*}\boldsymbol{B}_{\text{ext}}\boldsymbol{Z},

where

𝒁=[𝐈N×N𝟎(LN)×N]L×N\boldsymbol{Z}=\begin{bmatrix}{\bf I}_{N\times N}\\ \boldsymbol{0}_{(L-N)\times N}\end{bmatrix}\in\mathbb{R}^{L\times N}

is a zero-padding matrix. Since 𝑩ext\boldsymbol{B}_{\text{ext}} is a circulant matrix whose first column is 𝒃\boldsymbol{b}, we can write

𝑩ext=𝑭1diag(𝑭𝒃)𝑭\boldsymbol{B}_{\text{ext}}=\boldsymbol{F}^{-1}\operatorname{diag}(\boldsymbol{F}\boldsymbol{b})\boldsymbol{F}

where 𝑭L×L\boldsymbol{F}\in\mathbb{C}^{L\times L} is an FFT matrix, i.e.,

𝑭[m,n]=ej2πmn/Lform,n[L].\boldsymbol{F}[m,n]=e^{-j2\pi mn/L}\ \text{for}\ m,n\in[L].

Note that with this normalization, the inverse FFT satisfies

𝑭1=1L𝑭,\boldsymbol{F}^{-1}=\dfrac{1}{L}\boldsymbol{F}^{*},

as well as the conjugation identity

𝑭1𝒚=1L𝑭𝒚¯¯for all𝒚L.\boldsymbol{F}^{-1}\boldsymbol{y}=\dfrac{1}{L}\overline{\boldsymbol{F}\overline{\boldsymbol{y}}}\quad\text{for all}\quad\boldsymbol{y}\in\mathbb{C}^{L}.

Next, for any ff\in\mathbb{R}, let 𝑫fL×L\boldsymbol{D}_{f}\in\mathbb{C}^{L\times L} be a diagonal matrix with diagonal entries

𝑫f[m,m]=ej2πfmform[L],\boldsymbol{D}_{f}[m,m]=e^{j2\pi fm}\quad\text{for}\quad m\in[L],

and for each [L]\ell\in[L], let 𝑪L×L\boldsymbol{C}_{\ell}\in\mathbb{C}^{L\times L} be a cyclic shift matrix, i.e.,

𝑪[m,n]={1ifnm(modL)0otherwise.\boldsymbol{C}_{\ell}[m,n]=\begin{cases}1&\text{if}\ n-m\equiv\ell\pmod{L}\\ 0&\text{otherwise}\end{cases}.

Since 𝑬f\boldsymbol{E}_{f} and 𝑫f\boldsymbol{D}_{f} are both diagonal matrices and 𝑬f[n,n]=𝑫f[n,n]\boldsymbol{E}_{f}[n,n]=\boldsymbol{D}_{f}[n,n] for n[N]n\in[N], we have

𝒁𝑬f=𝑫f𝒁and𝑬f𝒁=𝒁𝑫f\boldsymbol{Z}\boldsymbol{E}_{f}^{*}=\boldsymbol{D}_{f}^{*}\boldsymbol{Z}\quad\text{and}\quad\boldsymbol{E}_{f}\boldsymbol{Z}^{*}=\boldsymbol{Z}^{*}\boldsymbol{D}_{f}

for all ff\in\mathbb{R}. Also, it is easy to check that cyclically shifting each column of 𝑭\boldsymbol{F} by \ell indices is equivalent to modulating the rows of 𝑭\boldsymbol{F}, or more specifically

𝑭𝑫/L=𝑪𝑭and𝑫/L𝑭=𝑭𝑪\boldsymbol{F}\boldsymbol{D}_{\ell/L}^{*}=\boldsymbol{C}_{\ell}\boldsymbol{F}\quad\text{and}\quad\boldsymbol{D}_{\ell/L}\boldsymbol{F}^{*}=\boldsymbol{F}^{*}\boldsymbol{C}_{\ell}^{*}

for all [L]\ell\in[L]. Additionally, for any vectors 𝒑,𝒒L\boldsymbol{p},\boldsymbol{q}\in\mathbb{C}^{L}, we will denote 𝒑𝒒L\boldsymbol{p}\circ\boldsymbol{q}\in\mathbb{C}^{L} to be the pointwise multiplication of 𝒑\boldsymbol{p} and 𝒒\boldsymbol{q}, i.e.,

(𝒑𝒒)[]=𝒑[]𝒒[]for[L],(\boldsymbol{p}\circ\boldsymbol{q})[\ell]=\boldsymbol{p}[\ell]\boldsymbol{q}[\ell]\quad\text{for}\quad\ell\in[L],

and 𝒑𝒒L\boldsymbol{p}\circledast\boldsymbol{q}\in\mathbb{C}^{L} for the circular cross-correlation of 𝒑\boldsymbol{p} and 𝒒\boldsymbol{q}, i.e.,

(𝒑𝒒)[]==0L1𝒑[]¯𝒒[+(modL)]for[L].(\boldsymbol{p}\circledast\boldsymbol{q})[\ell]=\sum_{\ell^{\prime}=0}^{L-1}\overline{\boldsymbol{p}[\ell^{\prime}]}\boldsymbol{q}[\ell^{\prime}+\ell\pmod{L}]\quad\text{for}\quad\ell\in[L].

Note that the circular cross-correlation of 𝒑\boldsymbol{p} and 𝒒\boldsymbol{q} can be computed using FFTs via the formula

𝒑𝒒=𝑭1(𝑭𝒑¯𝑭𝒒).\boldsymbol{p}\circledast\boldsymbol{q}=\boldsymbol{F}^{-1}(\overline{\boldsymbol{F}\boldsymbol{p}}\circ\boldsymbol{F}\boldsymbol{q}).

We will also use the notation |𝒑|2=𝒑¯𝒑|\boldsymbol{p}|^{2}=\overline{\boldsymbol{p}}\circ\boldsymbol{p} for convenience.
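Both the conjugation identity and the FFT formula for the circular cross-correlation are easy to verify numerically. The short sketch below (with arbitrary random test vectors, chosen only for illustration) does so using numpy's FFT, which follows the same unnormalized DFT convention as 𝑭.

```python
import numpy as np

L = 8
rng = np.random.default_rng(0)
p = rng.standard_normal(L) + 1j * rng.standard_normal(L)
q = rng.standard_normal(L) + 1j * rng.standard_normal(L)

# Conjugation identity: F^{-1} y = (1/L) conj(F conj(y)).
assert np.allclose(np.fft.ifft(p), np.conj(np.fft.fft(np.conj(p))) / L)

# Circular cross-correlation, computed directly and via FFTs.
direct = np.array([np.sum(np.conj(p) * np.roll(q, -l)) for l in range(L)])
assert np.allclose(direct, np.fft.ifft(np.conj(np.fft.fft(p)) * np.fft.fft(q)))
```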

We can now manipulate our formula for Ψ(L)\Psi(\tfrac{\ell}{L}) as follows

Ψ(L)\displaystyle\Psi(\tfrac{\ell}{L}) :=𝒙𝑬/L𝑩𝑬/L𝒙\displaystyle:=\boldsymbol{x}^{*}\boldsymbol{E}_{\ell/L}\boldsymbol{B}\boldsymbol{E}_{\ell/L}^{*}\boldsymbol{x}
=𝒙𝑬/L𝒁𝑩ext𝒁𝑬/L𝒙\displaystyle=\boldsymbol{x}^{*}\boldsymbol{E}_{\ell/L}\boldsymbol{Z}^{*}\boldsymbol{B}_{\text{ext}}\boldsymbol{Z}\boldsymbol{E}_{\ell/L}^{*}\boldsymbol{x}
=𝒙𝑬/L𝒁𝑭1diag(𝑭𝒃)𝑭𝒁𝑬/L𝒙\displaystyle=\boldsymbol{x}^{*}\boldsymbol{E}_{\ell/L}\boldsymbol{Z}^{*}\boldsymbol{F}^{-1}\operatorname{diag}(\boldsymbol{F}\boldsymbol{b})\boldsymbol{F}\boldsymbol{Z}\boldsymbol{E}_{\ell/L}^{*}\boldsymbol{x}
=1L𝒙𝑬/L𝒁𝑭diag(𝑭𝒃)𝑭𝒁𝑬/L𝒙\displaystyle=\dfrac{1}{L}\boldsymbol{x}^{*}\boldsymbol{E}_{\ell/L}\boldsymbol{Z}^{*}\boldsymbol{F}^{*}\operatorname{diag}(\boldsymbol{F}\boldsymbol{b})\boldsymbol{F}\boldsymbol{Z}\boldsymbol{E}_{\ell/L}^{*}\boldsymbol{x}
=1L𝒙𝒁𝑫/L𝑭diag(𝑭𝒃)𝑭𝑫/L𝒁𝒙\displaystyle=\dfrac{1}{L}\boldsymbol{x}^{*}\boldsymbol{Z}^{*}\boldsymbol{D}_{\ell/L}\boldsymbol{F}^{*}\operatorname{diag}(\boldsymbol{F}\boldsymbol{b})\boldsymbol{F}\boldsymbol{D}_{\ell/L}^{*}\boldsymbol{Z}\boldsymbol{x}
=1L𝒙𝒁𝑭𝑪diag(𝑭𝒃)𝑪𝑭𝒁𝒙\displaystyle=\dfrac{1}{L}\boldsymbol{x}^{*}\boldsymbol{Z}^{*}\boldsymbol{F}^{*}\boldsymbol{C}_{\ell}^{*}\operatorname{diag}(\boldsymbol{F}\boldsymbol{b})\boldsymbol{C}_{\ell}\boldsymbol{F}\boldsymbol{Z}\boldsymbol{x}
=1L=0L1(𝑭𝒃)[]|(𝑪𝑭𝒁𝒙)[]|2\displaystyle=\dfrac{1}{L}\sum_{\ell^{\prime}=0}^{L-1}(\boldsymbol{F}\boldsymbol{b})[\ell^{\prime}]\cdot\left|\left(\boldsymbol{C}_{\ell}\boldsymbol{F}\boldsymbol{Z}\boldsymbol{x}\right)[\ell^{\prime}]\right|^{2}
=1L=0L1(𝑭𝒃)[]|(𝑭𝒁𝒙)[+(modL)]|2\displaystyle=\dfrac{1}{L}\sum_{\ell^{\prime}=0}^{L-1}(\boldsymbol{F}\boldsymbol{b})[\ell^{\prime}]\cdot\left|\left(\boldsymbol{F}\boldsymbol{Z}\boldsymbol{x}\right)[\ell^{\prime}+\ell\pmod{L}]\right|^{2}
=(1L𝑭𝒃¯|𝑭𝒁𝒙|2)[]\displaystyle=\left(\dfrac{1}{L}\overline{\boldsymbol{F}\boldsymbol{b}}\circledast\left|\boldsymbol{F}\boldsymbol{Z}\boldsymbol{x}\right|^{2}\right)[\ell]
=(𝑭1(1L𝑭𝑭𝒃¯¯𝑭|𝑭𝒁𝒙|2))[]\displaystyle=\left(\boldsymbol{F}^{-1}\left(\dfrac{1}{L}\overline{\boldsymbol{F}\overline{\boldsymbol{F}\boldsymbol{b}}}\circ\boldsymbol{F}\left|\boldsymbol{F}\boldsymbol{Z}\boldsymbol{x}\right|^{2}\right)\right)[\ell]
=(𝑭1(𝑭1𝑭𝒃𝑭|𝑭𝒁𝒙|2))[]\displaystyle=\left(\boldsymbol{F}^{-1}\left(\boldsymbol{F}^{-1}\boldsymbol{F}\boldsymbol{b}\circ\boldsymbol{F}\left|\boldsymbol{F}\boldsymbol{Z}\boldsymbol{x}\right|^{2}\right)\right)[\ell]
=(𝑭1(𝒃𝑭|𝑭𝒁𝒙|2))[].\displaystyle=\left(\boldsymbol{F}^{-1}\left(\boldsymbol{b}\circ\boldsymbol{F}\left|\boldsymbol{F}\boldsymbol{Z}\boldsymbol{x}\right|^{2}\right)\right)[\ell].

Therefore,

[Ψ(0L)Ψ(1L)Ψ(L2L)Ψ(L1L)]T=𝑭1(𝒃𝑭|𝑭𝒁𝒙|2).\begin{bmatrix}\Psi(\tfrac{0}{L})&\Psi(\tfrac{1}{L})&\cdots&\Psi(\tfrac{L-2}{L})&\Psi(\tfrac{L-1}{L})\end{bmatrix}^{T}=\boldsymbol{F}^{-1}\left(\boldsymbol{b}\circ\boldsymbol{F}\left|\boldsymbol{F}\boldsymbol{Z}\boldsymbol{x}\right|^{2}\right).

So to compute Ψ(f)\Psi(f) at all grid frequencies f[L]/Lf\in[L]/L, we only need to compute 𝑭1(𝒃𝑭|𝑭𝒁𝒙|2)\boldsymbol{F}^{-1}\left(\boldsymbol{b}\circ\boldsymbol{F}\left|\boldsymbol{F}\boldsymbol{Z}\boldsymbol{x}\right|^{2}\right), which can be done in O(LlogL)O(L\log L) operations using O(L)O(L) memory via three length-LL FFTs/IFFTs and a few pointwise operations on vectors of length LL. ∎
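To make the computation at the end of the proof concrete, here is a brief Python sketch (the values of N, W, and L are assumptions for illustration). It forms the vector 𝒃, evaluates Ψ at all L grid frequencies with three length-L FFTs/IFFTs, and spot-checks one grid point against the quadratic form 𝒙*𝑬f𝑩𝑬f*𝒙.

```python
import numpy as np

# Sketch of Lemma 8 with illustrative parameters (requires L >= 2N).
N, W, L = 64, 0.1, 256
rng = np.random.default_rng(1)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# First column b of the circulant extension B_ext; np.sinc gives the
# limiting value 2W at l = 0 automatically.
l = np.arange(L)
d = np.minimum(l, L - l)                        # distance |m - n| mod L
b = np.where(d < N, 2 * W * np.sinc(2 * W * d), 0.0)

# Psi at all grid frequencies: F^{-1}( b o F |F Z x|^2 ).
FZx = np.fft.fft(x, n=L)                        # zero-pad to length L, then FFT
Psi = np.fft.ifft(b * np.fft.fft(np.abs(FZx) ** 2)).real

# Spot check against the quadratic form x^* E_f B E_f^* x at f = l0 / L.
n = np.arange(N)
B = 2 * W * np.sinc(2 * W * (n[:, None] - n[None, :]))  # prolate matrix
l0 = 17
y = x * np.exp(-1j * 2 * np.pi * l0 / L * n)            # E_f^* x
assert np.allclose(Psi[l0], np.real(np.vdot(y, B @ y)))
```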

B.2 Approximations for general multitaper spectral estimates

Next, we present a lemma regarding approximations to spectral estimates which use orthonormal tapers.

Lemma 9.

Let 𝐱N\boldsymbol{x}\in\mathbb{C}^{N} be a vector of NN equispaced samples, and let {𝐯k}k=0N1\{\boldsymbol{v}_{k}\}_{k=0}^{N-1} be any orthonormal set of tapers in N\mathbb{C}^{N}. For each k[N]k\in[N], define a tapered spectral estimate

Vk(f)=|n=0N1𝒗k[n]𝒙[n]ej2πfn|2.V_{k}(f)=\left|\sum_{n=0}^{N-1}\boldsymbol{v}_{k}[n]\boldsymbol{x}[n]e^{-j2\pi fn}\right|^{2}.

Also, let {γk}k=0N1\left\{\gamma_{k}\right\}_{k=0}^{N-1} and {γ~k}k=0N1\left\{\widetilde{\gamma}_{k}\right\}_{k=0}^{N-1} be real coefficients, and then define a multitaper spectral estimate V^(f)\widehat{V}(f) and an approximation V~(f)\widetilde{V}(f) by

V^(f)=k=0N1γkVk(f)andV~(f)=k=0N1γ~kVk(f).\widehat{V}(f)=\sum_{k=0}^{N-1}\gamma_{k}V_{k}(f)\quad\text{and}\quad\widetilde{V}(f)=\sum_{k=0}^{N-1}\widetilde{\gamma}_{k}V_{k}(f).

Then, for any frequency ff\in\mathbb{R}, we have

|V^(f)V~(f)|(maxk|γkγ~k|)𝒙22.\left|\widehat{V}(f)-\widetilde{V}(f)\right|\leq\left(\max_{k}\left|\gamma_{k}-\widetilde{\gamma}_{k}\right|\right)\|\boldsymbol{x}\|_{2}^{2}.
Proof.

Let 𝑽=[𝒗0𝒗N1]\boldsymbol{V}=\begin{bmatrix}\boldsymbol{v}_{0}&\cdots&\boldsymbol{v}_{N-1}\end{bmatrix}, and let 𝚪\boldsymbol{\Gamma}, 𝚪~N×N\widetilde{\boldsymbol{\Gamma}}\in\mathbb{R}^{N\times N}, and 𝑬fN×N\boldsymbol{E}_{f}\in\mathbb{C}^{N\times N} be diagonal matrices whose diagonal entries are 𝚪[n,n]=γn\boldsymbol{\Gamma}[n,n]=\gamma_{n}, 𝚪~[n,n]=γ~n\widetilde{\boldsymbol{\Gamma}}[n,n]=\widetilde{\gamma}_{n}, and 𝑬f[n,n]=ej2πfn\boldsymbol{E}_{f}[n,n]=e^{j2\pi fn} for n[N]n\in[N]. Then,

V^(f)\displaystyle\widehat{V}(f) =k=0N1γkVk(f)\displaystyle=\sum_{k=0}^{N-1}\gamma_{k}V_{k}(f)
=k=0N1γk|n=0N1𝒗k[n]𝒙[n]ej2πfn|2\displaystyle=\sum_{k=0}^{N-1}\gamma_{k}\left|\sum_{n=0}^{N-1}\boldsymbol{v}_{k}[n]\boldsymbol{x}[n]e^{-j2\pi fn}\right|^{2}
=k=0N1γk|𝒗k𝑬f𝒙|2\displaystyle=\sum_{k=0}^{N-1}\gamma_{k}\left|\boldsymbol{v}_{k}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x}\right|^{2}
=k=0N1𝚪[k,k]|(𝑽𝑬f𝒙)[k]|2\displaystyle=\sum_{k=0}^{N-1}\boldsymbol{\Gamma}[k,k]\left|(\boldsymbol{V}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x})[k]\right|^{2}
=𝒙𝑬f𝑽𝚪𝑽𝑬f𝒙.\displaystyle=\boldsymbol{x}^{*}\boldsymbol{E}_{f}\boldsymbol{V}\boldsymbol{\Gamma}\boldsymbol{V}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x}.

In a nearly identical manner,

V~(f)=𝒙𝑬f𝑽𝚪~𝑽𝑬f𝒙.\widetilde{V}(f)=\boldsymbol{x}^{*}\boldsymbol{E}_{f}\boldsymbol{V}\widetilde{\boldsymbol{\Gamma}}\boldsymbol{V}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x}.

Since 𝑽\boldsymbol{V} is orthonormal, 𝑽=𝑽=1\|\boldsymbol{V}\|=\|\boldsymbol{V}^{*}\|=1. Since 𝑬f\boldsymbol{E}_{f} is diagonal, and all the diagonal entries have modulus 11, 𝑬f=𝑬f=1\|\boldsymbol{E}_{f}\|=\|\boldsymbol{E}_{f}^{*}\|=1. Hence, for any ff\in\mathbb{R}, we can bound

|V^(f)V~(f)|\displaystyle\left|\widehat{V}(f)-\widetilde{V}(f)\right| =|𝒙𝑬f𝑽(𝚪𝚪~)𝑽𝑬f𝒙|\displaystyle=\left|\boldsymbol{x}^{*}\boldsymbol{E}_{f}\boldsymbol{V}\left(\boldsymbol{\Gamma}-\widetilde{\boldsymbol{\Gamma}}\right)\boldsymbol{V}^{*}\boldsymbol{E}_{f}^{*}\boldsymbol{x}\right|
𝒙2𝑬f𝑽𝚪𝚪~𝑽𝑬f𝒙2\displaystyle\leq\|\boldsymbol{x}\|_{2}\|\boldsymbol{E}_{f}\|\|\boldsymbol{V}\|\|\boldsymbol{\Gamma}-\widetilde{\boldsymbol{\Gamma}}\|\|\boldsymbol{V}^{*}\|\|\boldsymbol{E}_{f}^{*}\|\|\boldsymbol{x}\|_{2}
=𝚪𝚪~𝒙22\displaystyle=\|\boldsymbol{\Gamma}-\widetilde{\boldsymbol{\Gamma}}\|\|\boldsymbol{x}\|_{2}^{2}
=(maxk|γkγ~k|)𝒙22,\displaystyle=\left(\max_{k}|\gamma_{k}-\widetilde{\gamma}_{k}|\right)\|\boldsymbol{x}\|_{2}^{2},

as desired. ∎
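The bound in Lemma 9 is easy to exercise numerically. In the sketch below, the orthonormal tapers are the columns of a random orthogonal matrix obtained from a QR factorization, and the coefficient perturbation is arbitrary; both are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, f = 32, 0.123
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Random real orthonormal tapers v_0, ..., v_{N-1} (columns of V).
V, _ = np.linalg.qr(rng.standard_normal((N, N)))

gamma = rng.random(N)
gamma_t = gamma + 1e-3 * rng.standard_normal(N)    # perturbed coefficients

# Tapered estimates V_k(f) = |sum_n v_k[n] x[n] e^{-j 2 pi f n}|^2.
e = np.exp(-1j * 2 * np.pi * f * np.arange(N))
Vk = np.abs(V.T @ (e * x)) ** 2

lhs = abs(gamma @ Vk - gamma_t @ Vk)               # |V_hat(f) - V_tilde(f)|
rhs = np.max(np.abs(gamma - gamma_t)) * np.linalg.norm(x) ** 2
assert lhs <= rhs + 1e-12
```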

B.3 Proof of Theorem 6

Recall that the indices [N][N] are partitioned as follows:

1\displaystyle\mathcal{I}_{1} ={k[K]:λk1ϵ}\displaystyle=\{k\in[K]:\lambda_{k}\geq 1-\epsilon\}
2\displaystyle\mathcal{I}_{2} ={k[K]:ϵ<λk<1ϵ}\displaystyle=\{k\in[K]:\epsilon<\lambda_{k}<1-\epsilon\}
3\displaystyle\mathcal{I}_{3} ={k[N][K]:ϵ<λk<1ϵ}\displaystyle=\{k\in[N]\setminus[K]:\epsilon<\lambda_{k}<1-\epsilon\}
4\displaystyle\mathcal{I}_{4} ={k[N][K]:λkϵ}.\displaystyle=\{k\in[N]\setminus[K]:\lambda_{k}\leq\epsilon\}.

Using the partitioning above, the unweighted multitaper spectral estimate S^Kmt(f)\widehat{S}^{\text{mt}}_{K}(f) can be written as

S^Kmt(f)\displaystyle\widehat{S}^{\text{mt}}_{K}(f) =1Kk=0K1S^k(f)\displaystyle=\dfrac{1}{K}\sum_{k=0}^{K-1}\widehat{S}_{k}(f)
=k121KS^k(f),\displaystyle=\sum_{k\in\mathcal{I}_{1}\cup\mathcal{I}_{2}}\dfrac{1}{K}\widehat{S}_{k}(f),

and the approximate estimator S~Kmt(f)\widetilde{S}^{\text{mt}}_{K}(f) can be written as

S~Kmt(f)\displaystyle\widetilde{S}^{\text{mt}}_{K}(f) =1KΨ(f)+1Kk2(1λk)S^k(f)1Kk3λkS^k(f)\displaystyle=\dfrac{1}{K}\Psi(f)+\dfrac{1}{K}\sum_{k\in\mathcal{I}_{2}}(1-\lambda_{k})\widehat{S}_{k}(f)-\dfrac{1}{K}\sum_{k\in\mathcal{I}_{3}}\lambda_{k}\widehat{S}_{k}(f)
=k=0N1λkKS^k(f)+k21λkKS^k(f)k3λkKS^k(f)\displaystyle=\sum_{k=0}^{N-1}\dfrac{\lambda_{k}}{K}\widehat{S}_{k}(f)+\sum_{k\in\mathcal{I}_{2}}\dfrac{1-\lambda_{k}}{K}\widehat{S}_{k}(f)-\sum_{k\in\mathcal{I}_{3}}\dfrac{\lambda_{k}}{K}\widehat{S}_{k}(f)
=k14λkKS^k(f)+k21KS^k(f).\displaystyle=\sum_{k\in\mathcal{I}_{1}\cup\mathcal{I}_{4}}\dfrac{\lambda_{k}}{K}\widehat{S}_{k}(f)+\sum_{k\in\mathcal{I}_{2}}\dfrac{1}{K}\widehat{S}_{k}(f).

Thus, S^Kmt(f)\widehat{S}^{\text{mt}}_{K}(f) and S~Kmt(f)\widetilde{S}^{\text{mt}}_{K}(f) can be written as

S^Kmt(f)=k=0N1γkS^k(f)andS~Kmt(f)=k=0N1γ~kS^k(f)\widehat{S}^{\text{mt}}_{K}(f)=\sum_{k=0}^{N-1}\gamma_{k}\widehat{S}_{k}(f)\quad\text{and}\quad\widetilde{S}^{\text{mt}}_{K}(f)=\sum_{k=0}^{N-1}\widetilde{\gamma}_{k}\widehat{S}_{k}(f)

where

γk={1/Kk12,0k34,andγ~k={λk/Kk14,1/Kk2,0k3.\gamma_{k}=\begin{cases}1/K&k\in\mathcal{I}_{1}\cup\mathcal{I}_{2},\\ 0&k\in\mathcal{I}_{3}\cup\mathcal{I}_{4},\end{cases}\quad\text{and}\quad\widetilde{\gamma}_{k}=\begin{cases}\lambda_{k}/K&k\in\mathcal{I}_{1}\cup\mathcal{I}_{4},\\ 1/K&k\in\mathcal{I}_{2},\\ 0&k\in\mathcal{I}_{3}.\end{cases}

We now bound |γkγ~k|\left|\gamma_{k}-\widetilde{\gamma}_{k}\right| for all k[N]k\in[N]. For k1k\in\mathcal{I}_{1}, we have λk1ϵ\lambda_{k}\geq 1-\epsilon, and thus,

|γkγ~k|=|1KλkK|=1λkKϵK.\left|\gamma_{k}-\widetilde{\gamma}_{k}\right|=\left|\dfrac{1}{K}-\dfrac{\lambda_{k}}{K}\right|=\dfrac{1-\lambda_{k}}{K}\leq\dfrac{\epsilon}{K}.

For k23k\in\mathcal{I}_{2}\cup\mathcal{I}_{3} we have γk=γ~k\gamma_{k}=\widetilde{\gamma}_{k}, i.e., |γkγ~k|=0\left|\gamma_{k}-\widetilde{\gamma}_{k}\right|=0. For k4k\in\mathcal{I}_{4}, we have λkϵ\lambda_{k}\leq\epsilon, and thus,

|γkγ~k|=|0λkK|=λkKϵK.\left|\gamma_{k}-\widetilde{\gamma}_{k}\right|=\left|0-\dfrac{\lambda_{k}}{K}\right|=\dfrac{\lambda_{k}}{K}\leq\dfrac{\epsilon}{K}.

Hence, |γkγ~k|ϵK\left|\gamma_{k}-\widetilde{\gamma}_{k}\right|\leq\frac{\epsilon}{K} for all k[N]k\in[N], and thus by Lemma 9,

|S^Kmt(f)S~Kmt(f)|ϵK𝒙22\left|\widehat{S}^{\text{mt}}_{K}(f)-\widetilde{S}^{\text{mt}}_{K}(f)\right|\leq\dfrac{\epsilon}{K}\|\boldsymbol{x}\|_{2}^{2}

for all frequencies ff\in\mathbb{R}.

B.4 Proof of Theorem 7

To evaluate the approximate multitaper estimate

S~Kmt(f)=1KΨ(f)+1Kk2(1λk)S^k(f)1Kk3λkS^k(f)\widetilde{S}^{\text{mt}}_{K}(f)=\dfrac{1}{K}\Psi(f)+\dfrac{1}{K}\sum_{k\in\mathcal{I}_{2}}(1-\lambda_{k})\widehat{S}_{k}(f)-\dfrac{1}{K}\sum_{k\in\mathcal{I}_{3}}\lambda_{k}\widehat{S}_{k}(f)

at the LL grid frequencies f[L]/Lf\in[L]/L, one needs to:

  • For each k23k\in\mathcal{I}_{2}\cup\mathcal{I}_{3}, precompute the Slepian basis vectors 𝒔k\boldsymbol{s}_{k} and eigenvalues λk\lambda_{k}.

    Computing the Slepian basis vector 𝒔k\boldsymbol{s}_{k} and the corresponding eigenvalue λk\lambda_{k} for a single index kk can be done in O(NlogN)O(N\log N) operations and O(N)O(N) memory via the method described in [53]. This needs to be done for #(23)=#{k:ϵ<λk<1ϵ}=O(log(NW)log1ϵ)\#(\mathcal{I}_{2}\cup\mathcal{I}_{3})=\#\{k:\epsilon<\lambda_{k}<1-\epsilon\}=O(\log(NW)\log\tfrac{1}{\epsilon}) values of kk, so the total cost of this step is O(Nlog(N)log(NW)log1ϵ)O(N\log(N)\log(NW)\log\tfrac{1}{\epsilon}) operations and O(Nlog(NW)log1ϵ)O(N\log(NW)\log\tfrac{1}{\epsilon}) memory.

  • For [L]\ell\in[L], evaluate Ψ(L)\Psi(\tfrac{\ell}{L}).

    If L2NL\geq 2N, then evaluating Ψ(L)\Psi(\tfrac{\ell}{L}) for [L]\ell\in[L] can be done in O(LlogL)O(L\log L) operations and O(L)O(L) memory as shown in Lemma 8. If NL<2NN\leq L<2N, then 2L2N2L\geq 2N, so by Lemma 8, we can evaluate Ψ(2L)\Psi(\tfrac{\ell}{2L}) for [2L]\ell\in[2L] in O(2Llog2L)=O(LlogL)O(2L\log 2L)=O(L\log L) operations and O(2L)=O(L)O(2L)=O(L) memory and then simply downsample the result to obtain Ψ(L)\Psi(\tfrac{\ell}{L}) for [L]\ell\in[L].

  • For each k23k\in\mathcal{I}_{2}\cup\mathcal{I}_{3} and each [L]\ell\in[L], evaluate S^k(L)\widehat{S}_{k}(\tfrac{\ell}{L}).

    Evaluating S^k(L)=|n=0N1𝒔k[n]𝒙[n]ej2πn/L|2\widehat{S}_{k}(\tfrac{\ell}{L})=\left|\sum_{n=0}^{N-1}\boldsymbol{s}_{k}[n]\boldsymbol{x}[n]e^{-j2\pi n\ell/L}\right|^{2} for all [L]\ell\in[L] can be done by pointwise multiplying 𝒔k\boldsymbol{s}_{k} and 𝒙\boldsymbol{x}, zero-padding this vector to length LL, computing a length-LL FFT, and then computing the squared magnitude of each FFT coefficient. This takes O(LlogL)O(L\log L) operations and O(L)O(L) memory. This needs to be done for #(23)=O(log(NW)log1ϵ)\#(\mathcal{I}_{2}\cup\mathcal{I}_{3})=O(\log(NW)\log\tfrac{1}{\epsilon}) values of kk, so the total cost of this step is O(LlogLlog(NW)log1ϵ)O(L\log L\log(NW)\log\tfrac{1}{\epsilon}) operations and O(Llog(NW)log1ϵ)O(L\log(NW)\log\tfrac{1}{\epsilon}) memory.

  • For each [L]\ell\in[L], evaluate the weighted sum above for S~Kmt(L)\widetilde{S}^{\text{mt}}_{K}(\tfrac{\ell}{L}).

    Once Ψ(L)\Psi(\tfrac{\ell}{L}) and S^k(L)\widehat{S}_{k}(\tfrac{\ell}{L}) for k23k\in\mathcal{I}_{2}\cup\mathcal{I}_{3} are computed, evaluating S~Kmt(L)\widetilde{S}^{\text{mt}}_{K}(\tfrac{\ell}{L}) requires O(#(23))=O(log(NW)log1ϵ)O(\#(\mathcal{I}_{2}\cup\mathcal{I}_{3}))=O(\log(NW)\log\tfrac{1}{\epsilon}) multiplications and additions. This has to be done for each [L]\ell\in[L], so the total cost is O(Llog(NW)log1ϵ)O(L\log(NW)\log\tfrac{1}{\epsilon}) operations.

Since LNL\geq N, the total cost of evaluating the approximate multitaper estimate S~Kmt(f)\widetilde{S}^{\text{mt}}_{K}(f) at the LL grid frequencies f[L]/Lf\in[L]/L is O(LlogLlog(NW)log1ϵ)O(L\log L\log(NW)\log\tfrac{1}{\epsilon}) operations and O(Llog(NW)log1ϵ)O(L\log(NW)\log\tfrac{1}{\epsilon}) memory.
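The steps above condense into a short Python sketch of the approximate estimator. For brevity it computes all N Slepian tapers and eigenvalues with scipy, whereas the procedure analyzed above computes only the O(log(NW)log(1/ε)) tapers with intermediate eigenvalues; the sketch therefore illustrates the formula for S~_K^mt rather than the stated operation count. By Theorem 6, its output differs from S^_K^mt at every grid frequency by at most (ε/K)‖𝒙‖₂².

```python
import numpy as np
from scipy.signal.windows import dpss

def approx_multitaper(x, NW, K, L, eps=1e-8):
    """Sketch of the approximation S~_K^mt(l/L) for l = 0, ..., L-1.

    Assumes L >= 2*len(x). For brevity, all N tapers and eigenvalues
    are computed with scipy; the fast algorithm would compute only
    those k with eps < lambda_k < 1 - eps.
    """
    N = len(x)
    W = NW / N

    # Psi(l/L) at all grid frequencies via three length-L FFTs (Lemma 8).
    l = np.arange(L)
    d = np.minimum(l, L - l)
    b = np.where(d < N, 2 * W * np.sinc(2 * W * d), 0.0)
    P = np.abs(np.fft.fft(x, n=L)) ** 2
    S_tilde = np.fft.ifft(b * np.fft.fft(P)).real / K

    # Correction terms over I_2 (k < K) and I_3 (k >= K).
    tapers, lam = dpss(N, NW, Kmax=N, return_ratios=True)
    for k in np.where((lam > eps) & (lam < 1 - eps))[0]:
        Sk = np.abs(np.fft.fft(tapers[k] * x, n=L)) ** 2
        S_tilde += ((1 - lam[k]) if k < K else -lam[k]) * Sk / K
    return S_tilde
```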