This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Formulating and critically examining the assumptions of global 21-cm signal analyses: How to avoid the false troughs that can appear in single spectrum fits

Keith Tauscher Center for Astrophysics and Space Astronomy, Department of Astrophysical and Planetary Science, University of Colorado, Boulder, CO 80309, USA Department of Physics, University of Colorado, Boulder, CO 80309, USA David Rapetti NASA Ames Research Center, Moffett Field, CA 94035, USA Universities Space Research Association, Mountain View, CA 94043, USA Center for Astrophysics and Space Astronomy, Department of Astrophysical and Planetary Science, University of Colorado, Boulder, CO 80309, USA Jack O. Burns Center for Astrophysics and Space Astronomy, Department of Astrophysical and Planetary Science, University of Colorado, Boulder, CO 80309, USA [email protected]
Abstract

The assumptions inherent to global 21-cm signal analyses are rarely delineated. In this paper, we formulate a general list of suppositions underlying a given claimed detection of the global 21-cm signal. Then, we specify the form of these assumptions for two different analyses: 1) the one performed by the EDGES team showing an absorption trough in brightness temperature that they modeled separately from the sky foreground and 2) a new, so-called Minimum Assumption Analysis (MAA), that makes the most conservative assumptions possible for the signal. We show fits using the EDGES analysis on various beam-weighted foreground simulations from the EDGES latitude with no signal added. Depending on the beam used, these simulations produce large false troughs due to the invalidity of the foreground model to describe the combination of beam chromaticity and the shape of the Galactic plane in the sky, the residuals of which are captured by the ad hoc flattened Gaussian signal model. On the other hand, the MAA provides robust fits by including many spectra at different time bins and allowing any possible 21-cm spectrum to be modeled exactly. We present uncertainty levels and example signal reconstructions found with the MAA for different numbers of time bins. With enough time bins, one can determine the true 21-cm signal with the MAA to <10<10 times the noise level.

cosmology: dark ages, reionization, first stars — cosmology: observations

1 Introduction

The highly redshifted 21-cm signal generated by the hyperfine, spin-flip transition of neutral hydrogen tracks the history of the Universe after recombination and before reionization (Pritchard & Loeb, 2012), from the Dark Ages, before there were any compact sources, through Cosmic Dawn, when the first stars were born. This signal comes in two forms, the power spectrum, which has both angular and spectral dependence, and the sky-averaged global signal, which is a single spectrum. While the former is being focused on by experiments like the Hydrogen Epoch of Reionization Array (HERA, DeBoer et al., 2017), the former, which is the focus of this paper, is actively being searched for by experiments such as the Large-Aperture Experiment to Detect the Dark Age (LEDA, Bernardi, 2018), the Shaped Antenna measurement of the background RAdio Spectrum (SARAS, Singh et al., 2018), the Cosmic Twilight Polarimeter (CTP, Nhan et al., 2019), the Radio Experiment for the Analysis of Cosmic Hydrogen (REACH, de Lera Acedo, 2019), and the Experiment to Detect the Global Epoch-of-Reionization Signature (EDGES, Monsalve et al., 2017a, 2018). In particular, in Bowman et al. (2018) the EDGES team reported a trough in the sky-averaged radio spectrum at 78 MHz using an analysis technique that we will examine in detail in this paper. Space-based mission concepts, such as the Dark Ages Polarimeter PathfindER (DAPPER; Burns et al., 2019; Burns, 2020) and its precursor the Dark Ages Radio Explorer (DARE, Burns et al., 2017), are also being developed to measure the global 21-cm signal from the vicinity of the Moon, where experiments do not have to deal with systematic effects such as human-generated Radio Frequency Interference (RFI) or attenuation and emission from the ionosphere or interactions of the antenna beam with the environment.

While all of these experiments must be designed carefully to have the sensitivity to measure the global signal, which is expected to be a few hundred mK deep, this paper will focus on the data analysis techniques necessary to extract the signal from the beam-weighted galactic foreground emission, which, at the relevant frequencies, is generally at a level of a few thousand K (for the magnitude of the beam-weighted foreground seen by EDGES, see Figure 1a of Bowman et al., 2018). The shape of the beam-weighted foreground in the absence of the global signal is not known a priori, so simple subtraction is not an option. Therefore, one must devise a model which is known (or at least reasonably assumed) to encapsulate the possible values of the beam-weighted foreground. This task would be easy if the beam would be achromatic, but such beams do not exist. Real antennas have beams that change with frequency over the wide bandwidth that these experiments need to measure. This beam chromaticity distorts the spectral shapes of the intrinsic foreground as measured by an achromatic beam (and which would be well fit by models such as the ones EDGES uses) in ways particular to the antenna being used. Therefore, to fit the beam-weighted foreground, a model that is particular to the given antenna and experiment location must be created. Such a model can be created via Singular Value Decomposition (SVD) of a training set as it is in the pipeline laid out in Tauscher et al. (2018), Rapetti et al. (2019), and Tauscher et al. (2020). The method presented for making this model is similar in spirit to that defined in Switzer & Liu (2014), except the foreground basis is derived from an a priori training set instead of the data itself. It is also very similar to the method performed by Vedantham et al. (2014), although in that paper, SVD is performed on spectra at different time bins to define modes that span frequencies, while in this paper, different simulations of spectra defined at multiple time bins are used to define modes that span both frequencies and time bins. This difference is crucial because the constraining power of our method comes from the fact that its foreground model encodes correlations between time bins, which is not true of the purely spectral modes arising from the methods of Vedantham et al. (2014). Our method, a new variant of which will be described in Section 4, connects with the idea from Liu et al. (2013) that angular information can effectively supplement spectral information in signal estimation. However, instead of requiring instruments with angular resolution, our method gleans spatial information from time-dependent drift-scan measurements from single antenna instruments with no angular resolution, where the spectra weight the entire spatial sky into a single pixel.

This paper is especially relevant in light of the large volume of literature written by the community in their attempts to explain the trough presented by Bowman et al. (2018), especially its depth. Some have explored the possibility that it could represent a larger radio background than expected (Feng & Holder, 2018; Ewall-Wice et al., 2018, 2019; Fialkov & Barkana, 2019; Mebane et al., 2019), while others have hypothesized that Rutherford-like scattering of dark matter could cause the hydrogen gas to cool faster than adiabatic cooling would imply (Muñoz et al., 2018; Barkana, 2018; Barkana et al., 2018; Fialkov et al., 2018; Loeb & Muñoz, 2018; Berlin et al., 2018). While none of these ideas perfectly describe the detection (see, e.g. Creque-Sarbinowski et al., 2019, for a criticism of dark matter explanations), they illustrate the temptation to use the results of Bowman et al. (2018) to explore possible exotic physics. In addition to these physical explanations, some have explored possible unmodeled systematic effects present in the data (Hills et al., 2018; Bradley et al., 2019; Draine & Miralda-Escudé, 2018; Singh & Subrahmanyan, 2019; Spinelli et al., 2019; Sims & Pober, 2020). In this paper, we layout yet another possible explanation of the EDGES results—that the modeling performed could be mistaking beam cromaticity distortion of the foreground for signal.

In Section 2, we lay out the general presumptions necessary to perform any global 21-cm signal experiment. In Section 3, we generate and fit multiple EDGES-like simulations of beam-weighted foregrounds with the analysis method of Bowman et al. (2018). In Section 4, we propose a new, minimum assumption analysis that allows for any possible 21-cm global signal and demonstrate it on simulated spectra with a signal and beam-weighted foreground. In Section 5, we discuss and compare the results of the two analyses. Finally, we conclude in Section 6.

2 Enumerating assumptions

The likelihood used when doing a foreground-only fit (like the one done to generate the residuals found in Figure 1b of Bowman et al., 2018) is

fg-only(𝒂)e(𝒚𝑭𝒂)T𝑪1(𝒚𝑭𝒂)/2,{\mathcal{L}}_{\text{fg-only}}({\boldsymbol{a}})\propto e^{-({\boldsymbol{y}}-{\boldsymbol{F}}{\boldsymbol{a}})^{T}{\boldsymbol{C}}^{-1}({\boldsymbol{y}}-{\boldsymbol{F}}{\boldsymbol{a}})/2}, (1)

where 𝒚{\boldsymbol{y}} is the spectrum written as a column vector, 𝑪{\boldsymbol{C}} the covariance matrix of the data’s noise distribution, and 𝑭{\boldsymbol{F}} a matrix with the basis vectors used to fit the foreground as columns. The value of 𝒂{\boldsymbol{a}} that maximizes fg-only(𝒂){\mathcal{L}}_{\text{fg-only}}({\boldsymbol{a}}) is

𝒂ML,fg-only=(𝑭T𝑪1𝑭)1𝑭T𝑪1𝒚{\boldsymbol{a}}_{{\text{ML}},{\text{fg-only}}}=({\boldsymbol{F}}^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{F}})^{-1}{\boldsymbol{F}}^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{y}} (2)

and the maximum likelihood reconstruction of the foreground, 𝜸fg{\boldsymbol{\gamma}}_{\text{fg}}, is given by 𝜸fg=𝑭𝒂ML,fg-only{\boldsymbol{\gamma}}_{\text{fg}}={\boldsymbol{F}}{\boldsymbol{a}}_{{\text{ML}},{\text{fg-only}}}, or 𝜸fg=𝚽𝒚{\boldsymbol{\gamma}}_{\text{fg}}={\boldsymbol{\Phi}}{\boldsymbol{y}}, where

𝚽=𝑭(𝑭T𝑪1𝑭)1𝑭T𝑪1{\boldsymbol{\Phi}}={\boldsymbol{F}}({\boldsymbol{F}}^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{F}})^{-1}{\boldsymbol{F}}^{T}{\boldsymbol{C}}^{-1} (3)

is the matrix that projects vectors into the column space of 𝑭{\boldsymbol{F}} (i.e. the space of spectra formable by the linear foreground model). The residual, 𝒓{\boldsymbol{r}}, unfit by this procedure is given by 𝒓=𝒚𝜸fg=(𝑰𝚽)𝒚{\boldsymbol{r}}={\boldsymbol{y}}-{\boldsymbol{\gamma}}_{\text{fg}}=({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{y}}. The data are given by a sum of a foreground term, 𝒚(fg){\boldsymbol{y}}^{({\text{fg}})}, a signal term 𝒚(21){\boldsymbol{y}}^{(21)}, and random noise 𝒏{\boldsymbol{n}}.

We can write 𝒚(fg)=𝑭𝜶+𝜹{\boldsymbol{y}}^{({\text{fg}})}={\boldsymbol{F}}{\boldsymbol{\alpha}}+{\boldsymbol{\delta}} where 𝜶{\boldsymbol{\alpha}} is the coefficient vector that best describes 𝒚(fg){\boldsymbol{y}}^{({\text{fg}})} given the basis matrix 𝑭{\boldsymbol{F}}111If the signal, 𝒚(21){\boldsymbol{y}}^{(21)}, was zero and there was no random noise, then 𝜶{\boldsymbol{\alpha}} would be equal to 𝒂ML,fg-only{\boldsymbol{a}}_{{\text{ML}},{\text{fg-only}}}. However, in the general case where there is a nonzero signal, 𝒂ML,fg-only{\boldsymbol{a}}_{{\text{ML}},{\text{fg-only}}} absorbs some of the signal spectrum. In addition, 𝒂ML,fg-only{\boldsymbol{a}}_{{\text{ML}},{\text{fg-only}}} is subject to small biases caused by random noise, whereas in this formulation 𝜶{\boldsymbol{\alpha}} is the best fitting coefficient vector of the ideal, noiseless foreground (i.e. the one which would be measured given infinite integration time with a stable instrument). and 𝜹{\boldsymbol{\delta}} is the part of 𝒚(fg){\boldsymbol{y}}^{({\text{fg}})} that cannot be fit by the foreground model, i.e. the part of 𝒚(fg){\boldsymbol{y}}^{({\text{fg}})} that is not in the column space of 𝑭{\boldsymbol{F}}, which satisfies 𝚽𝜹=𝟎{\boldsymbol{\Phi}}{\boldsymbol{\delta}}={\boldsymbol{0}}. This means that (𝑰𝚽)𝒚(fg)=𝜹({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{y}}^{({\text{fg}})}={\boldsymbol{\delta}} and the foreground-only residual 𝒓{\boldsymbol{r}} is given by

𝒓=(𝑰𝚽)(𝒚(21)+𝜹+𝒏).{\boldsymbol{r}}=({\boldsymbol{I}}-{\boldsymbol{\Phi}})({\boldsymbol{y}}^{(21)}+{\boldsymbol{\delta}}+{\boldsymbol{n}}). (4)

Therefore, to solve for the signal plus foreground inaccuracy and noise, which we define as 𝒔=𝒚(21)+𝜹+𝒏{\boldsymbol{s}}={\boldsymbol{y}}^{(21)}+{\boldsymbol{\delta}}+{\boldsymbol{n}}, we must find the general solution of (𝑰𝚽)𝒔=𝒓({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{s}}={\boldsymbol{r}}, which is the sum of any particular solution, 𝒔p{\boldsymbol{s}}_{p}, and the general solution, 𝒔h{\boldsymbol{s}}_{h}, to the homogeneous equation, (𝑰𝚽)𝒔h=𝟎({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{s}}_{h}={\boldsymbol{0}}. We guess that 𝒓{\boldsymbol{r}} is a particular solution and verify by noting that (𝑰𝚽)𝒓=(𝑰𝚽)2𝒚=(𝑰2𝚽+𝚽2)𝒚=(𝑰𝚽)𝒚=𝒓({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{r}}=({\boldsymbol{I}}-{\boldsymbol{\Phi}})^{2}{\boldsymbol{y}}=({\boldsymbol{I}}-2{\boldsymbol{\Phi}}+{\boldsymbol{\Phi}}^{2}){\boldsymbol{y}}=({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{y}}={\boldsymbol{r}} because 𝚽2=𝚽{\boldsymbol{\Phi}}^{2}={\boldsymbol{\Phi}}. The homogeneous solution is any vector in the null space of 𝑰𝚽{\boldsymbol{I}}-{\boldsymbol{\Phi}}, which is equivalent to any vector that remains unchanged when multiplied on the left by 𝚽{\boldsymbol{\Phi}}. Since 𝚽{\boldsymbol{\Phi}} is a projection matrix onto the columns of 𝑭{\boldsymbol{F}}, any vector in the column space of 𝑭{\boldsymbol{F}} remains unchanged when being multiplied by 𝚽{\boldsymbol{\Phi}}. Therefore, 𝒔h=𝑭𝒙{\boldsymbol{s}}_{h}={\boldsymbol{F}}{\boldsymbol{x}} for any 𝒙N{\boldsymbol{x}}\in\mathbb{R}^{N}. This means that the signal satisfies

𝒚(21)=𝒓+𝑭𝒙𝜹𝒏{\boldsymbol{y}}^{(21)}={\boldsymbol{r}}+{\boldsymbol{F}}{\boldsymbol{x}}-{\boldsymbol{\delta}}-{\boldsymbol{n}} (5)

for some unknown vector 𝒙{\boldsymbol{x}}. Equation 5 implies that, due to the filtering out of the foregrounds when fitting, the signal can have any foreground modes added to it without changing the quality of the fit. This shows that with only one spectrum and no more information, even if δ=𝟎{\boldsymbol{\delta}}={\boldsymbol{0}} (i.e. the foreground model is perfect), the 21-cm signal cannot be uniquely determined. Thus, an extra assumption about the form of the signal is required. The full set of assumptions needed to extract the global 21-cm signal with useful, rigorous uncertainties is as follows:

  1. 1.

    Sky-averaged radio data contains a sum of beam-weighted foreground emission and the global signal. This assumes a well-calibrated instrument.

  2. 2.

    The noise of the data follows a known or estimated distribution.222Usually, this distribution is a zero-mean Gaussian specified entirely by its covariance matrix, 𝑪{\boldsymbol{C}}, which must be known up to a constant of proportionality. The proportionality constant must be sufficiently close to 1 if we also wish to measure goodness-of-fit.

  3. 3.

    The true beam-weighted foreground can be fit with the given foreground model, described by the basis matrix 𝑭{\boldsymbol{F}}, to well below the noise level of the data. This is equivalent to 𝜹T𝑪1𝜹Nc{\boldsymbol{\delta}}^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{\delta}}\ll N_{c}, where NcN_{c} is the number of channels in 𝒚{\boldsymbol{y}} and 𝜹{\boldsymbol{\delta}} is the unmodeled component of the foreground as above, given by 𝜹=(𝑰𝚽)𝒚(fg){\boldsymbol{\delta}}=({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{y}}^{({\text{fg}})}.

  4. 4.

    The signal follows a specific form.

Assumption 4 can take many forms, some much stronger and more unjustified than others. In Bowman et al. (2018), the signal is assumed to follow the form of a flattened Gaussian profile. Note that this is an assumption for analyzing the data and cannot be justified by examining the data itself, especially in light of the Bayesian evidence-based analysis of Sims & Pober (2020), which showed that many possible assumed signal models lead to essentially the same Bayesian evidence. Crucially, the errors obtained via fitting a given model of the signal do not account for the unknown likelihood that the true signal can be fit by that model.

The specific flattened Gaussian profile in Bowman et al. (2018) merely represents the value of 𝒚(21){\boldsymbol{y}}^{(21)} from Equation 5 that best matches the assumed flattened Gaussian model. Different values of 𝒙{\boldsymbol{x}} along with different assumed signal models lead to equally valid interpretations of the data (see, e.g. Hills et al., 2018; Bradley et al., 2019; Singh & Subrahmanyan, 2019; Sims & Pober, 2020).

3 EDGES-like analysis

3.1 Assumptions

3.1.1 Assumption 1: calibration

The EDGES team has worked extensively on the calibration of their receiver (Monsalve et al., 2017a, b). Assuming solar, weather, RFI, and other transient events are removed, due to the rigor of their lab measurements and calibration strategy, it is reasonable to assume that, to the mK level, their data consists solely of beam-weighted foreground and global 21-cm signal.

3.1.2 Assumption 2: noise level

Although the covariance distribution of the noise is not known, the residuals presented in Bowman et al. (2018) seem to have a relatively flat magnitude of 20 mK when many smooth modes are removed. So, for our simulations we use 𝑪=σ2𝑰{\boldsymbol{C}}=\sigma^{2}{\boldsymbol{I}} where σ=20\sigma=20 mK and 𝑰{\boldsymbol{I}} is the identity matrix.

3.1.3 Assumption 3: foreground model

The EDGES analysis involves multiple linear foreground models, but a common model in the literature is a polynomial in ln(ν/ν0)\ln{(\nu/\nu_{0})} multiplied by (ν/ν0)2.5(\nu/\nu_{0})^{-2.5} where ν\nu is the observed frequency and ν0\nu_{0} is a reference frequency,

fg(a1,a2,,aNt)=(νν0)2.5k=1Ntak[ln(νν0)]k1.{\mathcal{M}}_{\text{fg}}(a_{1},a_{2},\ldots,a_{N_{t}})=\left(\frac{\nu}{\nu_{0}}\right)^{-2.5}\sum_{k=1}^{N_{t}}a_{k}\left[\ln{\left(\frac{\nu}{\nu_{0}}\right)}\right]^{k-1}. (6)

This is equivalent to assuming that 𝑭{\boldsymbol{F}} is a matrix with the terms (ν/ν0)2.5[ln(ν/ν0)]k1(\nu/\nu_{0})^{-2.5}[\ln{(\nu/\nu_{0})}]^{k-1} as its columns and the foreground coefficient vector 𝒂{\boldsymbol{a}} is given by 𝒂T=[a1a2aNt]{\boldsymbol{a}}^{T}=\begin{bmatrix}a_{1}&a_{2}&\cdots&a_{N_{t}}\end{bmatrix}. This model is based on the Taylor series approximation of T0(ν/ν0)βT_{0}(\nu/\nu_{0})^{\beta} around β=2.5\beta=-2.5, which should adequately describe the intrinsic synchrotron foreground in each sky direction (see also Hibbard et al., in preparation).

3.1.4 Assumption 4: signal model

In Bowman et al. (2018), the EDGES team uses a flattened Gaussian signal model of the form

𝓜21(A,μ,w,τ)=A(1eτeD1eτ),{\boldsymbol{\mathcal{M}}}_{21}(A,\mu,w,\tau)=A\left(\frac{1-e^{-\tau e^{D}}}{1-e^{-\tau}}\right), (7)

where

D=[2(νμ)w]2ln{1τln[1+eτ2]}.D=\left[\frac{2(\nu-\mu)}{w}\right]^{2}\ \ln{\left\{-\frac{1}{\tau}\ln{\left[\frac{1+e^{-\tau}}{2}\right]}\right\}}. (8)

In this section, we adopt the same model.

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 1: The Haslam map scaled to 80 MHz with a spectral index of -2.5 shown with the zenith from the EDGES latitude (26.726.7^{\circ} S) as the uppermost point at different LSTs. The top left, top right, and lower left panels show the cases at LST hours of 0, 3, and 6. The lower right panel shows the case where the map has been smoothly smeared between hours 0 and 6 with directions below the horizon (grey region) masked out, which is the foreground used in the simulations in Sections 3 and 4. In all panels, the celestial poles are marked by white dots.

3.2 Simulations

To illustrate the potential problems with the EDGES analysis in Bowman et al. (2018), we have created simple simulations of the beam-weighted foreground expected to be seen from the Murchison Radio-astronomy Observatory (MRO) where the experiment is located.33326.726.7^{\circ} S, 116.6116.6^{\circ} E This section will lay out those simulations as well as apply the EDGES analysis to them.

The foreground brightness temperature, y(fg)y^{({\text{fg}})}, in the simulations presented throughout Sections 3 and 4 is given by

y(fg)(ν)=B(ν,θ,ϕ)T(ν,θ,ϕ)𝑑Ω,y^{({\text{fg}})}(\nu)=\int B(\nu,\theta,\phi)\ T(\nu,\theta,\phi)\ d\Omega, (9)

where B(ν,θ,ϕ)B(\nu,\theta,\phi) is the antenna beam, T(ν,θ,ϕ)T(\nu,\theta,\phi) is the foreground emission, and θ\theta and ϕ\phi are the spherical coordinate angles.

Refer to caption
Figure 2: Summary of EDGES-like simulations and analyses. Throughout the figure, curves with the same color correspond to the same beam FWHM curvature. Upper left: FWHM curves of the four simulated antenna beams as a function of frequency. Upper right: Residuals when the foregrounds generated with the beams from LST hours 0-6 at the EDGES site are fit with the foreground model of Equation 6 with Nt=6N_{t}=6. The Root-Mean-Square (RMS) value of each residual is shown in the legend (to be compared to noise level of 20 mK). Lower left: Flattened Gaussian best fits for each beam case when the foreground and absorption trough models are fit simultaneously. Large troughs appear as the beam curvature grows. Lower right: Residuals of the combined foreground and absorption trough fits that produced the signals in the lower left panel. The green dashed lines in the upper right and lower panels show results found when the Haslam map is replaced by a toy model of the galaxy that accounts only for the shape of the galactic plane (see Section 5.1.3 for details on the model). Note that the same noise realization was added to all five simulated spectra for comparison purposes.

3.2.1 Antenna beam

The antenna beam used in our simulations is a Gaussian beam with a frequency dependent Full Width at Half Maximum (FWHM), α(ν)\alpha(\nu), i.e.

B(ν,θ,ϕ)exp{4ln2[θα(ν)]2}.B(\nu,\theta,\phi)\propto\exp{\left\{-4\ln{2}\left[\frac{\theta}{\alpha(\nu)}\right]^{2}\right\}}. (10)

The FWHMs we use are parabolas with fixed endpoints at (50 MHz,83)(50\text{ MHz},83^{\circ}) and (100 MHz,104)(100\text{ MHz},104^{\circ})444These endpoints are similar to those of the beam of the blade antenna used by EDGES (see extended data figure 4a of Bowman et al., 2018). and varying curvature, cc. This means, they have the form

α(ν)=αl(ν)+cαc(ν),\alpha(\nu)=\alpha_{l}(\nu)+c\alpha_{c}(\nu), (11)

where αl(ν)\alpha_{l}(\nu) is the line connecting the two endpoints and αc(ν)\alpha_{c}(\nu) is the parabola with unit curvature and zeros at the endpoint frequencies. αl(ν)\alpha_{l}(\nu) and αc(ν)\alpha_{c}(\nu) are given by

αl(ν)\displaystyle\alpha_{l}(\nu) =0.42(ν1 MHz)+62,\displaystyle=0.42^{\circ}\left(\frac{\nu}{1\text{ MHz}}\right)+62^{\circ}, (12a)
αc(ν)\displaystyle\alpha_{c}(\nu) =12ν2(75 MHz)ν+(50 MHz)2.\displaystyle=\frac{1}{2}\nu^{2}-(75\text{ MHz})\nu+(50\text{ MHz})^{2}. (12b)

For illustration purposes, we use c/(1/MHz2)c/(1^{\circ}/\text{MHz}^{2}) values of 8.0×1038.0\times 10^{-3}, 1.6×1021.6\times 10^{-2}, 2.4×1022.4\times 10^{-2}, and 3.4×1023.4\times 10^{-2}. The FWHM curves of the four beams used in our simulations are shown in the upper left panel of Figure 2. To ensure that results are not tainted by any fraction of the beam below the horizon, the beams are normalized such that

B(ν,θ,ϕ)𝑑Ω=1,\iint_{\mathcal{H}}B(\nu,\theta,\phi)\ d\Omega=1, (13)

where \mathcal{H} represents the half of the sphere above the horizon, i.e. 0θπ/20\leq\theta\leq\pi/2 and 0ϕ<2π0\leq\phi<2\pi.

3.2.2 Foreground map

For the simulations used in this paper, we use the 408 MHz map from Haslam et al. (1982), scaled to lower frequencies via multiplication by a power law with spectral index -2.5, i.e.

T(ν,θ,ϕ,t)=THaslam(θ,ϕ,t)×(ν408 MHz)2.5.T(\nu,\theta,\phi,t)=T_{\text{Haslam}}(\theta,\phi,t)\times\left(\frac{\nu}{408\text{ MHz}}\right)^{-2.5}. (14)

The time dependence indicated in both TT and THaslamT_{\text{Haslam}} is determined by the rotation of the Earth from the EDGES latitude as a function of sidereal time. The Haslam maps, rotated to match the zenith direction at 0, 3, and 6 hr LST (local sidereal time) at 26.726.7^{\circ} S latitude, are shown in the top left, top right, and bottom left panels of Figure 1.

3.2.3 Observation strategy

The simulations were performed by averaging together equally all beam-weighted foregrounds between LST hours 0-6 from the EDGES latitude. As the central bulge of the Milky Way is closest to the zenith between LST hours 17 and 18, this strategy amounts to averaging during low foreground times. This observation strategy leads to an effective foreground given by

Teff(ν,θ,ϕ,06 hr LST)=0 hr LST6 hr LSTT(ν,θ,ϕ,t)𝑑tT_{\text{eff}}(\nu,\theta,\phi,0\rightarrow 6\text{ hr LST})=\int_{0\text{ hr LST}}^{6\text{ hr LST}}T(\nu,\theta,\phi,t)\ dt (15)

and shown at 80 MHz in the bottom right panel of Figure 1.

3.2.4 No 21-cm signal added

No 21-cm global signal was included in the simulations. All fits shown in Figure 2 are performed on the beam-weighted foreground alone with noise.

3.2.5 Random noise

To each simulated measurement of the beam-weighted foreground we add the same realization of noise at the 20 mK level (see Section 3.1.2) to control for the effect of noise without ignoring it.

3.3 Likelihood maximization techniques

For foreground-only fits, we maximize the likelihood through the analytical computation of the coefficient vector 𝒂ML,fg-only{\boldsymbol{a}}_{{\text{ML}},{\text{fg-only}}} as in Equation 2. Then, the maximum likelihood foreground reconstruction is simply 𝑭𝒂ML,fg-only{\boldsymbol{F}}{\boldsymbol{a}}_{{\text{ML}},{\text{fg-only}}}.

For fits with both the linear foreground model and the flattened Gaussian absorption trough model, we numerically explore the space of the trough parameters {A,μ,w,τ}\{A,\mu,w,\tau\} to find the maximum likelihood value.555We use the minimize function from scipy.optimize to minimize the negative log-likelihood. At the kthk^{\text{th}} iteration of the optimization algorithm, the exploration is at (A(k),μ(k),w(k),τ(k))(A^{(k)},\mu^{(k)},w^{(k)},\tau^{(k)}). To evaluate the full likelihood at that point, we perform a foreground-only fit, as above, on the modified data spectrum given by 𝒚𝓜21(A(k),ν(k),w(k),τ(k)){\boldsymbol{y}}-{\boldsymbol{\mathcal{M}}}_{21}(A^{(k)},\nu^{(k)},w^{(k)},\tau^{(k)}). This allows for exploration of only the nonlinear absorption trough parameters instead of including the foreground parameters, which would slow down the analysis. This technique is similar to that used in Rapetti et al. (2019), except instead of exploring the entire posterior distribution of the trough parameters with a Markov Chain Monte Carlo simulation, here we merely wish to find its maximum through gradient ascent.

3.4 EDGES-like analysis results

The beam-weighted foregrounds with noise from the simulations described in Section 3.2 were fit with the model given by Equation 6, which is meant to describe the sky foreground completely. The residuals from these foreground-only fits are shown in the upper right panel of Figure 2. While the beam-weighted foreground from the lowest curvature FWHM beam case is fit to the noise level, it is clear that the residuals grow as the curvature increases, with the largest curvature beam case yielding residuals two times higher than the noise level. These foreground-only residuals correspond to the 𝒓{\boldsymbol{r}} curves from Equation 4 with 𝒚(21)=𝟎{\boldsymbol{y}}^{(21)}={\boldsymbol{0}}.

As in Bowman et al. (2018), we also performed fits that include both the linear foreground model and a flattened Gaussian absorption trough model. The resulting troughs are shown in the lower left panel of Figure 2. As the beam FWHM curvature increases, the size of the fit trough increases up to \sim800 mK, showing that inaccuracies in the foreground model (i.e. violations of assumption 3) can lead to false troughs appearing in fits. The residuals to these foreground and trough fits are shown in the lower right panel of Figure 2. In all four cases, these residuals are at the 20 mK noise level, showing that flattened Gaussian troughs can effectively complement the smooth foreground model from Equation 6 to fit the foreground down to the noise level.

The simulations presented here show that some frequency dependencies, such as curvature in the FWHM of an antenna beam (which there is some sign of in extended data Figure 4a of Bowman et al., 2018, although only three frequencies are shown), can induce structure in the beam-weighted foreground that cannot be fit out by a 6th6^{\text{th}} order polynomial. Not accounting for this structure can lead to artificial troughs being found when a signal model is fit simultaneously with the beam-weighted foreground. It is important to note that the simulations proposed here use simple beams, fully characterized by the FWHM function α(ν)\alpha(\nu), whereas real antenna beams have many independent modes of variation corresponding to the geometry and electrical properties of antenna components. One complication in the specific case of the EDGES beam is the lack of azimuthal symmetry, leading to different E- and H-plane beam patterns.

4 Minimum assumption analysis

4.1 A more robust assumption 4

For a robust analysis, instead of assumption 4 as laid out for the EDGES-like analysis in Section 3.1.4, which is not properly motivated, we can utilize exclusively the fact—not even used in the previous analysis—that, by definition, the global 21-cm signal must be spatially uniform. If the data 𝒚{\boldsymbol{y}} is a concatenation of NsN_{s} spectra taken by the same instrument instead of a single spectrum, i.e.

𝒚=[𝒚1T𝒚2T𝒚NsT]T,{\boldsymbol{y}}=\begin{bmatrix}{\boldsymbol{y}}_{1}^{T}&{\boldsymbol{y}}_{2}^{T}&\cdots&{\boldsymbol{y}}_{N_{s}}^{T}\end{bmatrix}^{T}, (16)

then the spatial uniformity of the signal implies that the signal contribution to each spectrum is identical, i.e. 𝒚k(21)=𝒚(21){\boldsymbol{y}}^{(21)}_{k}={\boldsymbol{y}}^{(21)} for all kk. On the other hand, the foregrounds of each spectrum will be different (unless the same sky is overhead in two or more of the spectra).

The analysis with the fewest possible assumptions about the signal would involve assuming nothing about the spectral behavior of the signal, i.e. using a signal model that can fit all values of 𝒚(21){\boldsymbol{y}}^{(21)} in Nν\mathbb{R}^{N_{\nu}} where NνN_{\nu} is the number of frequencies. This can be achieved by computing a so-called “expansion matrix” 𝚿{\boldsymbol{\Psi}} (see Tauscher et al., 2018, for the initial definition) that encodes how the signal appears in the data, i.e. the signal term in the full data, 𝒚{\boldsymbol{y}}, is 𝚿𝒚(21){\boldsymbol{\Psi}}{\boldsymbol{y}}^{(21)}. To encode the defining information aforementioned that all spectra have the same signal in them, we use 𝚿=[𝑰𝑰𝑰]T{\boldsymbol{\Psi}}=\begin{bmatrix}{\boldsymbol{I}}&{\boldsymbol{I}}&\cdots&{\boldsymbol{I}}\end{bmatrix}^{T}. We term an analysis with this form of the signal the minimum assumption analysis (MAA).

𝚿{\boldsymbol{\Psi}} can also be adapted to fit different experimental designs, such as full Stokes measurements (e.g., from CTP and DAPPER) or data for which a different amount of sky is blocked in each spectrum (such as it can be for DAPPER due to the shifting position of the moon).

4.2 Assumption 3: choosing foreground basis

4.2.1 Polynomial bases

Using identical and independent basis sets for each spectrum leads to a degeneracy between the foreground and signal models, causing infinite uncertainties. To see this, suppose that each spectrum has its own polynomial basis, represented by the matrix 𝑭0{\boldsymbol{F}}_{0}. In this case, the full foreground basis matrix is

𝑭=[𝑭0𝟎𝟎𝟎𝑭0𝟎𝟎𝟎𝑭0].{\boldsymbol{F}}=\begin{bmatrix}{\boldsymbol{F}}_{0}&{\boldsymbol{0}}&\cdots&{\boldsymbol{0}}\\ {\boldsymbol{0}}&{\boldsymbol{F}}_{0}&\cdots&{\boldsymbol{0}}\\ \vdots&\vdots&\ddots&\vdots\\ {\boldsymbol{0}}&{\boldsymbol{0}}&\cdots&{\boldsymbol{F}}_{0}\end{bmatrix}. (17)

Multiplying this basis by a coefficient vector of the form 𝒙(fg)=[𝜻T𝜻T𝜻T]T{\boldsymbol{x}}^{({\text{fg}})}=\begin{bmatrix}{\boldsymbol{\zeta}}^{T}&{\boldsymbol{\zeta}}^{T}&\cdots&{\boldsymbol{\zeta}}^{T}\end{bmatrix}^{T}, leads to a foreground vector of the form 𝑭𝒙(fg)=𝚿𝑭0𝜻{\boldsymbol{F}}{\boldsymbol{x}}^{({\text{fg}})}={\boldsymbol{\Psi}}{\boldsymbol{F}}_{0}{\boldsymbol{\zeta}}. Since, in this case, the foreground basis can generate vectors in the column space of 𝚿{\boldsymbol{\Psi}}, the foreground and signal bases are degenerate and the uncertainties diverge to infinity. This is true even if the 𝑭0{\boldsymbol{F}}_{0} basis can exactly fit the foregrounds in each spectrum. Note that this explanation holds no matter what the true beam-weighted foreground is because it only depends on the form of the beam-weighted foreground model.

4.2.2 General choice of basis

We wish to find a basis that satisfies 𝑭T𝑪1𝑭=𝑰{\boldsymbol{F}}^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{F}}={\boldsymbol{I}} and can accurately fit the beam-weighted foreground expected for each spectrum. We suggest generating a simulated training set of foregrounds of the form

𝑩=[𝒃1(1)𝒃1(2)𝒃1(Nt)𝒃2(1)𝒃2(2)𝒃2(Nt)𝒃Ns(1)𝒃Ns(2)𝒃Ns(Nt)],{\boldsymbol{B}}=\begin{bmatrix}{\boldsymbol{b}}^{(1)}_{1}&{\boldsymbol{b}}^{(2)}_{1}&\cdots&{\boldsymbol{b}}^{(N_{t})}_{1}\\ {\boldsymbol{b}}^{(1)}_{2}&{\boldsymbol{b}}^{(2)}_{2}&\cdots&{\boldsymbol{b}}^{(N_{t})}_{2}\\ \vdots&\vdots&\ddots&\vdots\\ {\boldsymbol{b}}^{(1)}_{N_{s}}&{\boldsymbol{b}}^{(2)}_{N_{s}}&\cdots&{\boldsymbol{b}}^{(N_{t})}_{N_{s}}\end{bmatrix}, (18)

where 𝒃n(m){\boldsymbol{b}}^{(m)}_{n} is the nthn^{\text{th}} spectrum of the mthm^{\text{th}} simulation and there are NtN_{t} total simulations. For a given number of basis vectors NFN_{F}, weighted Singular Value Decomposition (SVD) of 𝑩{\boldsymbol{B}} as in Tauscher et al. (2018) yields a basis 𝑭{\boldsymbol{F}} satisfying the normalization conditions.666Weighted SVD refers to a decomposition of the form 𝑩=𝑼𝚺𝑽T{\boldsymbol{B}}={\boldsymbol{U}}{\boldsymbol{\Sigma}}{\boldsymbol{V}}^{T} where 𝑼T𝑪1𝑼=𝑰{\boldsymbol{U}}^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{U}}={\boldsymbol{I}}, 𝑽T𝑽=𝑰{\boldsymbol{V}}^{T}{\boldsymbol{V}}={\boldsymbol{I}}, and 𝚺{\boldsymbol{\Sigma}} is a rectangular diagonal matrix with decreasing non-negative elements on the diagonal. The basis matrix 𝑭{\boldsymbol{F}} is made from the first NFN_{F} columns of 𝑼{\boldsymbol{U}}. This basis will encode expected correlations between the different spectra in the foreground model directly, which is a crucial aspect of the analysis necessary to obtain finite errors (see also Tauscher et al., 2020) when allowing for any signal to be fit as discussed in Section 4.1.

4.3 Maximum likelihood calculations

With a data vector 𝒚{\boldsymbol{y}}, a signal expansion matrix 𝚿{\boldsymbol{\Psi}}, a noise covariance matrix 𝑪{\boldsymbol{C}}, and a foreground basis matrix 𝑭{\boldsymbol{F}}, the channel covariance 𝚫(21){\boldsymbol{\Delta}}^{(21)} of the signal and the corresponding mean 𝜸(21){\boldsymbol{\gamma}}^{(21)} are given by

𝚫(21)\displaystyle{\boldsymbol{\Delta}}^{(21)} =[𝚿T𝑪1(𝑰𝚽)𝚿]1,\displaystyle=\left[{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}\right]^{-1}, (19a)
𝜸(21)\displaystyle{\boldsymbol{\gamma}}^{(21)} =𝚫(21)𝚿T𝑪1(𝑰𝚽)𝒚,\displaystyle={\boldsymbol{\Delta}}^{(21)}{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{y}}, (19b)

where, under the normalization condition 𝑭T𝑪1𝑭=𝑰{\boldsymbol{F}}^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{F}}={\boldsymbol{I}}, 𝚽{\boldsymbol{\Phi}} is the foreground projection matrix described in Section 2, given by 𝚽=𝑭𝑭T𝑪1{\boldsymbol{\Phi}}={\boldsymbol{F}}{\boldsymbol{F}}^{T}{\boldsymbol{C}}^{-1}. The calculations leading to Equations 19a and 19b are presented in Appendix A. Appendices B and C give the forms of 𝚫(21){\boldsymbol{\Delta}}^{(21)} and 𝜸(21){\boldsymbol{\gamma}}^{(21)} for the specific cases of NsN_{s} total power spectra, where the signal expansion matrix is 𝚿=[𝑰𝑰𝑰]T{\boldsymbol{\Psi}}=\begin{bmatrix}{\boldsymbol{I}}&{\boldsymbol{I}}&\cdots&{\boldsymbol{I}}\end{bmatrix}^{T}, and 4Ns4N_{s} spectra of all 4 Stokes parameters, where the expansion matrix is 𝚿=[𝑰𝟎𝟎𝟎𝑰𝟎𝟎𝟎]{\boldsymbol{\Psi}}=\begin{bmatrix}{\boldsymbol{I}}&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}&\cdots&{\boldsymbol{I}}&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}\end{bmatrix}.777For the sake of simplicity, results using the MAA with polarization are not shown in this paper, but the equations of Appendix C will prove useful in future work.

Refer to caption
Figure 3: 1σ\sigma uncertainty levels under the minimum assumption analysis (MAA) with varying number of time bins. The dashed line represents the noise level on the signal. We assume an integration time of 100 hours split evenly across all time bins.
Refer to caption
Refer to caption
Figure 4: Top: Example signal reconstruction for the MAA using 6 time bins. While the case shown is a Gaussian in frequency, note that the MAA can be used in the same manner obtaining equal errors to measure any possible signal spectrum. The black line shows the input signal. The blue line shows the channel mean of the reconstruction, given by 𝜸(21){\boldsymbol{\gamma}}^{(21)} in Equation 19b, and the intervals show 1 and 2 times the square root of the diagonal elements of 𝚫(21){\boldsymbol{\Delta}}^{(21)} from Equation 19a. Because the beam-weighted foreground model (derived from the training set) fits the given beam-weighted foreground case (which was taken from the training set), these intervals correspond to 68% and 95% confidence levels. Bottom: Frequency correlation matrix defined in Equation 20. Clearly, the foreground filter generated by this training set induces positive correlations between all frequencies of the 21-cm signal estimate. As in Figure 3, both panels assume an integration time of 100 hours split evenly across all time bins.

4.4 Simulation details

To test the MAA, we apply it using a foreground training set that is made identically to that used in Tauscher et al. (2020), which we briefly review here. Like the simulations from Section 3, the foreground map used in this training set was the Haslam map scaled down in frequency using a spatially constant spectral index of -2.5. The pointing of the antenna is set by the EDGES latitude and the sources and beam below the horizon are set to zero as in Section 3. The beams used have FWHMs specified by a distribution of quadratic functions given by α(ν)=k=02bkLk(ν(80 MHz)40 MHz)\alpha(\nu)=\sum_{k=0}^{2}b_{k}L_{k}\left(\frac{\nu-(80\text{ MHz})}{40\text{ MHz}}\right) where LkL_{k} are the kthk^{\text{th}} order Legendre polynomials and the bkb_{k}’s are normal with means μ0=70\mu_{0}=70^{\circ}, μ1=20\mu_{1}=-20^{\circ}, and μ2=0\mu_{2}=0^{\circ} and standard deviations σ0=10\sigma_{0}=10^{\circ}, σ1=5\sigma_{1}=5^{\circ}, and σ2=5\sigma_{2}=5^{\circ}.

The day is split into 100 LST chunks,888In each of these \sim15 minute chunks, the foreground is smeared as shown in Figure 1. which are then averaged to the number of bins used for the given analysis. For example, in the 5 time bins case, LST chunks 1-20, 21-40, 41-60, 61-80, and 81-100 are used to generate the 5 time bins. If the number of time bins used does not divide 100, then the 100 chunks are reduced to the largest multiple of the number of bins less than 100 before binning.

4.5 MAA results

The results of applying MAA using the training set described in Section 4.4 are shown in Figures 3 and 4. Figure 3 shows the 1σ1\sigma uncertainty levels at each frequency using different numbers of total power spectra in the training set. These uncertainties are very large for small numbers of bins (indeed, they diverge when only one time bin is used), but approach the noise level for large numbers of bins. Note that different training sets will lead to different results and that confidence levels of 68%68\%, 95%95\%, and 99.7%99.7\% only correspond to 1σ1\sigma, 2σ2\sigma, and 3σ3\sigma when the training set adequately describes the beam-weighted foreground. If the training set is not adequate, any given percentage confidence interval will be wider than expected based on the σ\sigma-level. Appendix D derives the so-called signal bias statistic, ε\varepsilon, from Tauscher et al. (2018) in the MAA case when the foreground training set is imperfect. This statistic connects percentage confidence to the σ\sigma-level at which the signal is inside the interval in a root-mean-square sense.

The top panel of Figure 4 shows an example reconstruction found when applying the MAA with 6 time bins to the sum of a beam-weighted foreground curve from the training set and an additional signal component that is Gaussian in frequency. The two intervals show the 1σ1\sigma and 2σ2\sigma confidence levels at each individual frequency, which are 1 and 2 times the square roots of the diagonal elements of 𝚫(21){\boldsymbol{\Delta}}^{(21)}, respectively. Note, however, that even though a Gaussian signal is used in this example, as shown in the figure, the method would work equally well, providing the same errors, with any conceivable global 21-cm signal.

The bottom panel of Figure 4 shows the frequency correlation matrix, 𝚲(21){\boldsymbol{\Lambda}}^{(21)}, which is a normalized form of the covariance matrix, 𝚫(21){\boldsymbol{\Delta}}^{(21)}. The correlation matrix is designed to contain only values between 1 and -1, which occur at perfect correlation and anti-correlation, respectively. It is defined through Λij(21)=Corr[γi,γj]=Cov[γi,γj]/Var[γi]Var[γj]\Lambda^{(21)}_{ij}=\text{Corr}[\gamma_{i},\gamma_{j}]=\text{Cov}[\gamma_{i},\gamma_{j}]/\sqrt{\text{Var}[\gamma_{i}]\,\text{Var}[\gamma_{j}]}, which can also be written as

𝚲(21)=(𝚪(21))1/2𝚫(21)(𝚪(21))1/2,{\boldsymbol{\Lambda}}^{(21)}=\left({\boldsymbol{\Gamma}}^{(21)}\right)^{-1/2}{\boldsymbol{\Delta}}^{(21)}\left({\boldsymbol{\Gamma}}^{(21)}\right)^{-1/2}, (20)

where 𝚪(21){\boldsymbol{\Gamma}}^{(21)} has the same diagonal elements as 𝚫(21){\boldsymbol{\Delta}}^{(21)} but has no nonzero off-diagonal elements. By construction, all diagonal elements of 𝚲(21){\boldsymbol{\Lambda}}^{(21)} are equal to 1. It is clear from Figure 4 that all of the frequency channels are positively correlated with each other in this particular case.

To completely utilize the power of the MAA, the full covariance matrix, which would include such correlations, should be used when defining a likelihood function to extend MAA results to constrain any given signal model.

5 Discussion

5.1 EDGES-like analysis

5.1.1 Beam chromaticity distortions

The fits in Section 3 (Figure 2) show that false troughs can appear when fitting a single spectrum. Some may argue that troughs cannot be created by the foreground because the spectral dependence of the foreground does not allow it. However, one needs to keep in mind that what is measured is not the intrinsic spectral structure of the foreground radiation, but rather the spectral structure of the beam-weighted foreground. Since the angular structure of the beam weighting changes from frequency to frequency the foreground component of the measured spectrum is composed of different ratios of each direction’s radiation at each frequency.

This means that a linear model that fits the intrinsic foreground spectrum of each pixel, like the model from Equation 6, will not necessarily fit the beam-weighted foreground. To illustrate this, suppose that every sky direction’s intrinsic foreground spectrum can be fit by a model using a basis matrix 𝑭0{\boldsymbol{F}}_{0}, i.e. 𝑻(θ,ϕ)=𝑭0𝒙(θ,ϕ){\boldsymbol{T}}(\theta,\phi)={\boldsymbol{F}}_{0}{\boldsymbol{x}}(\theta,\phi) where the kthk^{\text{th}} element of 𝑻(θ,ϕ){\boldsymbol{T}}(\theta,\phi) is the foreground spectrum at the kthk^{\text{th}} frequency. If the beam is achromatic, then the beam-weighted foreground is given by

𝒚achromatic(fg)\displaystyle{\boldsymbol{y}}^{({\text{fg}})}_{\text{achromatic}} =B(θ,ϕ)𝑻(θ,ϕ)𝑑Ω,\displaystyle=\int B(\theta,\phi)\ {\boldsymbol{T}}(\theta,\phi)\ d\Omega, (21a)
=𝑭0B(θ,ϕ)𝒙(θ,ϕ)𝑑Ω,\displaystyle={\boldsymbol{F}}_{0}\int B(\theta,\phi)\ {\boldsymbol{x}}(\theta,\phi)\ d\Omega, (21b)

where B(θ,ϕ)B(\theta,\phi) is the beam pattern at every frequency that satisfies B(θ,ϕ)𝑑Ω=1\int B(\theta,\phi)\ d\Omega=1. This means that a linear model that fits 𝑻(θ,ϕ){\boldsymbol{T}}(\theta,\phi) for every angle also fits the beam-weighted foreground 𝒚achromatic(fg){\boldsymbol{y}}^{({\text{fg}})}_{\text{achromatic}}, i.e. the foreground model residual is 𝜹achromatic=(𝑰𝚽)𝒚achromatic(fg)=𝟎{\boldsymbol{\delta}}_{\text{achromatic}}=({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{y}}^{({\text{fg}})}_{\text{achromatic}}={\boldsymbol{0}}. In the case of beam chromaticity, the beam is a diagonal matrix with the diagonal entries being the beam patterns at each frequency, satisfying 𝑩(θ,ϕ)𝑑Ω=𝑰\int{\boldsymbol{B}}(\theta,\phi)\ d\Omega={\boldsymbol{I}}. In this case, the beam-weighted foreground is

𝒚chromatic(fg)\displaystyle{\boldsymbol{y}}^{({\text{fg}})}_{\text{chromatic}} =𝑩(θ,ϕ)𝑻(θ,ϕ)𝑑Ω\displaystyle=\int{\boldsymbol{B}}(\theta,\phi)\ {\boldsymbol{T}}(\theta,\phi)\ d\Omega (22a)
=𝑩(θ,ϕ)𝑭0𝒙(θ,ϕ)𝑑Ω\displaystyle=\int{\boldsymbol{B}}(\theta,\phi)\ {\boldsymbol{F}}_{0}{\boldsymbol{x}}(\theta,\phi)\ d\Omega (22b)

and the residual

𝜹chromatic=(𝑰𝚽)𝑩(θ,ϕ)𝑭0𝒙(θ,ϕ)𝑑Ω{\boldsymbol{\delta}}_{\text{chromatic}}=\int({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{B}}(\theta,\phi)\ {\boldsymbol{F}}_{0}{\boldsymbol{x}}(\theta,\phi)\ d\Omega (23)

is not necessarily zero,999The only situation in which 𝜹chromatic{\boldsymbol{\delta}}^{\text{chromatic}} is necessarily zero for all beams, 𝑩(θ,ϕ){\boldsymbol{B}}(\theta,\phi), occurs when 𝑰=𝚽{\boldsymbol{I}}={\boldsymbol{\Phi}}. But, this implies that projecting into the span of the foreground basis has no effect, which can only happen if the foreground basis is complete. In this case, the signal model will be degenerate with the foreground model and therefore cannot be extracted. leading to a possible failure of the assumption that the foreground model is adequate (Assumption 3). Therefore, while troughs are unlikely to appear in intrinsic foreground spectra, they may well appear when beam-weighted foregrounds are fit with a model of a trough and an intrinsic foreground spectrum model simultaneously.

5.1.2 EDGES beam chromaticity factor

In several of their works (see, e.g., Bowman et al., 2018; Monsalve et al., 2018; Mozdzen et al., 2019), the EDGES team uses what they term the “beam chromaticity factor” (see Equation 14 of Monsalve et al., 2017a), defined at each LST through

C(ν)=Bref(ν,θ,ϕ)Tref(ν,θ,ϕ)𝑑ΩBref(νref,θ,ϕ)Tref(ν,θ,ϕ)𝑑Ω,C(\nu)=\frac{\int B_{\text{ref}}(\nu,\theta,\phi)\ T_{\text{ref}}(\nu,\theta,\phi)\ d\Omega}{\int B_{\text{ref}}(\nu_{\text{ref}},\theta,\phi)\ T_{\text{ref}}(\nu,\theta,\phi)\ d\Omega}, (24)

where the LST dependence comes from how the temperature map is rotated with respect to the beam and Bref(ν,θ,ϕ)B_{\text{ref}}(\nu,\theta,\phi) and Tref(ν,θ,ϕ)T_{\text{ref}}(\nu,\theta,\phi) are the assumed forms of the beam and foreground. The goal of this factor is to make the spectrum appear as if it was measured with an achromatic beam (specifically, the beam at ν=νref\nu=\nu_{\text{ref}}) so that, as discussed in Section 5.1.1, basis vectors that fit the intrinsic foreground spectra can be used to fit the beam-weighted foreground spectrum. If we denote a spectrum measured at a single LST by y(ν)y(\nu), then the EDGES beam calibration forms a corrected spectrum, y(ν)=y(ν)/C(ν)y^{\prime}(\nu)=y(\nu)/C(\nu). Neglecting noise and possible receiver errors for the sake of clarity, we can express the measured beam-weighted foreground spectrum through

y(ν)=B(ν,θ,ϕ)T(ν,θ,ϕ)𝑑Ω,y(\nu)=\int B(\nu,\theta,\phi)\ T(\nu,\theta,\phi)\ d\Omega, (25)

where B(ν,θ,ϕ)B(\nu,\theta,\phi) and T(ν,θ,ϕ)T(\nu,\theta,\phi) are the unknown forms of the true beam and foreground. With these definitions, the corrected spectrum is given by

y(ν)=Bref(νref,θ,ϕ)Tref(ν,θ,ϕ)𝑑Ω×B(ν,θ,ϕ)T(ν,θ,ϕ)𝑑ΩBref(ν,θ,ϕ)Tref(ν,θ,ϕ)𝑑Ω.y^{\prime}(\nu)=\int B_{\text{ref}}(\nu_{\text{ref}},\theta,\phi)\ T_{\text{ref}}(\nu,\theta,\phi)\ d\Omega\\ \times\frac{\int B(\nu,\theta,\phi)\ T(\nu,\theta,\phi)\ d\Omega}{\int B_{\text{ref}}(\nu,\theta,\phi)\ T_{\text{ref}}(\nu,\theta,\phi)\ d\Omega}. (26)

The intention is for the fraction on the second line of Equation 26 to be 1 so that y(ν)y^{\prime}(\nu) is simply the reference foreground weighted by the reference beam at the reference frequency. However, in order to assume that the fraction is 1, one must assume that Bref(ν,θ,ϕ)=B(ν,θ,ϕ)B_{\text{ref}}(\nu,\theta,\phi)=B(\nu,\theta,\phi) and Tref(ν,θ,ϕ)=T(ν,θ,ϕ)T_{\text{ref}}(\nu,\theta,\phi)=T(\nu,\theta,\phi); but, this is not a reasonable assumption as foreground and beam models are likely imperfect.101010If the reference beam and foreground were equal to the true beam and foreground, then one would know the exact beam-weighted foreground and should simply subtract it from the measured spectrum, leaving only noise and the 21-cm signal. The beam chromaticity factor calibration method is therefore prone to introduce significant biases. On the contrary, our method of deriving modes describing the beam-weighted foreground from a large sample of simulations and observations allows uncertainties in both the beam and foreground to be included in the analysis. However, as will be elaborated on in Section 5.2, it is important to note that our method is contingent on the ability of the simulations and observations to accurately describe the data being analyzed. SVD can only provide eigenmodes in the space spanned by the training set curves.

5.1.3 Toy galaxy model

In Section 3, in addition to the Haslam 408 MHz map, we also tested a toy model of galactic emission described by

Ttoy(ν,θ,ϕ)=(25 K)(ν408 MHz)2.5×{1+[9+2cosϕ]2[(θπ2)/(π45)]2}.T_{\text{toy}}(\nu,\theta,\phi)=\left(25\text{ K}\right)\ \left(\frac{\nu}{408\text{ MHz}}\right)^{-2.5}\\ \times\left\{1+\left[9+2\cos{\phi}\right]2^{-\left[\left(\theta-\frac{\pi}{2}\right)/\left(\frac{\pi}{45}\right)\right]^{2}}\right\}. (27)

This is a model that treats the Galactic plane as a 88^{\circ} FWHM Gaussian in colatitude, θ\theta, with a maximum given by 300 K at the Galactic center and 200 K at the anticenter and asymptotes to 25 K away from the plane. This map is then scaled by [ν/(408 MHz)]2.5[\nu/(408\text{ MHz})]^{-2.5}. Due to the fact that the false troughs remain after this change (see e.g. the dashed green line in the lower left panel of Figure 2), we infer that the chromaticity of the quadratic beams used in Section 3 interacts with the galactic plane as seen from the EDGES latitude to generate residuals that are better fit with a flattened Gaussian plus foreground model than the foreground model alone, which could lead to analyses falsely concluding that there are troughs in the sky-averaged radio spectrum.

Since the false troughs found in Section 3 are fits to aspects of the beam-weighted foreground, we urge EDGES to perform the same experiment and analysis from the northern hemisphere where the galaxy behaves very differently in the sky than from the southern hemisphere. Because of the different orientation of the galactic plane, the distortions caused by chromatic beams in our simulations are more easily fit by foreground models like that in Equation 6 at northern latitudes than southern latitudes.

5.1.4 Exploring parameter distributions

The natural endpoint of the EDGES-like analysis is not a maximum likelihood fit like the one shown in Figure 1 of Bowman et al. (2018), on which Figure 2 was based. Instead, one wishes to explore the parameter distribution using a sampling method like Markov Chain Monte Carlo (MCMC) exploration. However, the resulting distribution is only meaningful if the pieces of the model (i.e. foreground and signal models) accurately describe their respective components. In fact, parameter distributions from global 21-cm signal fits to observations are generated by a complex combination of the following four factors: 1) the foreground model parameterization; 2) the bias of the foreground model; 3) the signal model parameterization; 4) the bias of the signal model.

Ideally, factors 2 and 4 are unimportant because the foreground and signal models can fit the true foreground and signal to within the noise level. However, it is impossible to fully verify that this is the case in practice; so, their effects must be considered. Section 3 is not meant to explain the exact form and parameter distribution of the trough observed by EDGES, but instead to show that generic polynomial foreground models like those used by EDGES do not account for beam chromaticity and may lead to untrustable results. Essentially, we are pointing out that factor 2 is likely affecting the EDGES analysis.

In addition to generating bias in fits, residuals from fits with inaccurate foreground models may have led EDGES to choose the unjustified flattened Gaussian signal model, possibly furthering the signal bias of factor 4 (with respect to a more conventional signal model) silently while significantly reducing overall residuals.111111If true, this would be a case of factor 2 affecting factor 3. For robustness reasons, it is important to justify the signal model in a way external to the observations themselves.

Due to the complex interactions between the four factors in generating a posterior distribution, MCMC exploration of distributions is outside the scope of this paper and is left for an extended examination in future work.

5.2 Minimum assumption analysis

5.2.1 Advantages and limitations

While moving the experiment to a different location could provide evidence that the reported trough is a product of an inadequate foreground model, this does not solve the underlying problem—that the foreground model does not account for beam chromaticity.

The MAA proposed in Section 4 avoids this problem by making a basis specific to the given antenna by performing SVD on a training set of simulations made with that antenna’s beam (see Section 4.2.2).

In addition, the MAA allows any possible global signal, avoiding unwanted bias from generating a signal model that interacts with the foreground model bias to produce false results. In fact, under the assumption that the foreground model generated by SVD is adequate, the MAA signal mean 𝜸(21){\boldsymbol{\gamma}}^{(21)} and covariance 𝚫(21){\boldsymbol{\Delta}}^{(21)} constitute a direct measurement not of a signal model, but of the signal itself. In fact, since all degrees of freedom are retained, a physical model can be fit directly to the constraints implied by 𝜸(21){\boldsymbol{\gamma}}^{(21)} and 𝚫(21){\boldsymbol{\Delta}}^{(21)} (such as those shown in the bottom panel of Figure 3).

Note that, for simplicity, the cosmic microwave background (CMB) is not included in the training set used in Section 4, which merely uses the Haslam map scaled with a spectral index of -2.5. If it was included (as it should be when analyzing data from a real experiment),121212More precisely, the CMB should be subtracted from the foreground model used to generate the training set so that it can be found by the signal fitting and subtracted after the fact. then the signal mean 𝜸(21){\boldsymbol{\gamma}}^{(21)} would include a 2.725 K flat spectrum component which would need to be subtracted out.

Another important note about the MAA is that it is predicated upon the beam-weighted foreground training set provided to it. The results may change if the training set is changed, even if the data being analyzed remain the same. The training set used in Section 4 contained a wide variety of beam FWHMs but only one intrinsic foreground map. A different training set built using many intrinsic foreground maps and one beam might produce errors that differ significantly from the ones presented in Figure 3. However, even as different training sets would lead to different levels of precision, the accuracy of the MAA should remain steady as the training set changes as long as the true beam-weighted foreground is encompassed by the SVD eigenmodes of the training sets.131313Here, we use precision to refer to the size of uncertainties, which is known even when the true 21-cm signal is unknown and accuracy to refer to the size of the difference between the signal reconstruction and the true signal with respect to the size of the uncertainties. One aspect of this effect is that some training sets will require more modes to be included in the foreground model than others, forcing one to degrade precision in order to retain accuracy.

5.2.2 Extension to motion induced dipole

Slosar (2017) pointed out that, much like with the Cosmic Microwave Background (CMB), Doppler shifting of the 21-cm background induces a dipole component that is related to the monopole. In particular, the dipole component, δTb(dip)\delta T_{b}^{(\text{dip})}, of the 21-cm signal is related to the monopole spectrum, δTb(mon)\delta T_{b}^{(\text{mon})}, through141414Note that this assumes that there is no intrinsic (i.e. non-motion-induced) dipole of the 21-cm signal.

δTb(dip)(ν,ψ)=βcosψ(1νddν)δTb(mon)(ν),\delta T_{b}^{(\text{dip})}(\nu,\psi)=\beta\cos{\psi}\left(1-\nu\frac{d}{d\nu}\right)\delta T_{b}^{(\text{mon})}(\nu), (28)

where β\beta is the magnitude of our velocity with respect to the CMB as a fraction of the speed of light and ψ\psi is the angle between that velocity and the line of sight. Based on CMB observations, β1.2×103\beta\approx 1.2\times 10^{-3}. Deshpande (2018) suggested that Equation 28 should be considered an essential qualifying test of any measurement of the global signal. Due to the fact that (by Equation 28), δTb(dip)\delta T_{b}^{(\text{dip})} is linear in δTb(mon)\delta T_{b}^{(\text{mon})}, it could conceivably be included in the MAA in order for any fit to pass this test by default, essentially extending the minimal signal assumption introduced in Section 4.1. To do so, we note that the sum of the monopole and dipole components is

δTb(mon+dip)(ν,ψ)=[1+βcosψ(1νddν)]s(ν),\delta T_{b}^{(\text{mon}+\text{dip})}(\nu,\psi)=\left[1+\beta\cos{\psi}\left(1-\nu\frac{d}{d\nu}\right)\right]s(\nu), (29)

where s(ν)=δTb(mon)(ν)s(\nu)=\delta T_{b}^{(\text{mon})}(\nu). To determine how this would appear to a single antenna experiment, we must multiply by the beam and integrate over all angles. Defining tk(ν)Bk(ν,θ,ϕ)δTb(mon+dip)(ν,ψ)𝑑Ωt_{k}(\nu)\equiv\int B_{k}(\nu,\theta,\phi)\ \delta T_{b}^{(\text{mon}+\text{dip})}(\nu,\psi)\ d\Omega as the signature of the 21-cm signal in the spectrum measured at the kthk^{\text{th}} time period (or, equivalently, pointing angle), it is clear that

tk(ν)=[1+βak(ν)(1νddν)]s(ν),t_{k}(\nu)=\left[1+\beta a_{k}(\nu)\left(1-\nu\frac{d}{d\nu}\right)\right]s(\nu), (30)

where ak(ν)=Bk(ν,θ,ϕ)cosψdΩa_{k}(\nu)=\int B_{k}(\nu,\theta,\phi)\ \cos{\psi}\ d\Omega.151515Note that ak(ν)a_{k}(\nu) will not be known exactly because the beam will not be known exactly. However, since the dipole component will be on the order of 10 mK (see Slosar, 2017) and ak(ν)a_{k}(\nu) will be of order unity, the expected imprecision levels in the beam, which should be at the sub-percent level, should not impact the results significantly. This can be written in matrix notation where frequency channels correspond to the elements of vectors. In this way, the signature of the 21-cm signal in the kthk^{\text{th}} spectrum is

𝒕k=[𝑰+β𝑨k(𝑰𝑵𝑫)]𝒔,{\boldsymbol{t}}_{k}=\left[{\boldsymbol{I}}+\beta{\boldsymbol{A}}_{k}({\boldsymbol{I}}-{\boldsymbol{N}}{\boldsymbol{D}})\right]{\boldsymbol{s}}, (31)

where 𝑨k{\boldsymbol{A}}_{k} is the diagonal matrix with nonzero elements given by 𝒂k{\boldsymbol{a}}_{k}, 𝑵{\boldsymbol{N}} is the diagonal matrix with the frequencies of the data channels on the diagonal, and 𝑫{\boldsymbol{D}} is a derivative matrix (see Appendix E for subtleties involved in taking derivatives with a matrix). Now, combining all NsN_{s} spectra into one vector leaves 𝒕=𝚿𝒔{\boldsymbol{t}}={\boldsymbol{\Psi}}{\boldsymbol{s}} where

𝚿=[𝑰𝑰𝑰]+β[𝑨1𝑨2𝑨Ns](𝑰𝑵𝑫).{\boldsymbol{\Psi}}=\begin{bmatrix}{\boldsymbol{I}}\\ {\boldsymbol{I}}\\ \vdots\\ {\boldsymbol{I}}\end{bmatrix}+\beta\begin{bmatrix}{\boldsymbol{A}}_{1}\\ {\boldsymbol{A}}_{2}\\ \vdots\\ {\boldsymbol{A}}_{N_{s}}\end{bmatrix}({\boldsymbol{I}}-{\boldsymbol{N}}{\boldsymbol{D}}). (32)

To extend the MAA to include the motion-induced dipole, one must merely plug this 𝚿{\boldsymbol{\Psi}} into Equations 19a and 19b. The effect of including the motion-induced dipole on the results of the MAA is left for future work.

5.3 Between the two extremes

In past work (Tauscher et al., 2018; Rapetti et al., 2019; Tauscher et al., 2020), we laid out an SVD-based pipeline for extracting the 21-cm global signal from the large beam-weighted foregrounds. That analysis is identical in its foreground basis generation to the MAA from Section 4. However, it differs in that the signal is restricted to a specific model, either a physically motivated one or one created from performing SVD on a signal training set. So, in a sense, the pipeline described in those works is between the two extremes discussed in this paper of strong assumptions (EDGES-like analysis) and robust assumptions (MAA).

6 Conclusions

In this paper, we formulated a list of general assumptions for global 21-cm signal analyses. These include assumptions about the calibration of the instrument (Assumption 1), the distribution of the noise (Assumption 2), the adequacy of the foreground model (Assumption 3), and the form of the signal (Assumption 4). We then contrasted two different specific forms of these assumptions, an EDGES-like analysis and a new, so-called minimum assumption analysis (MAA).

The EDGES-like analysis is performed on single spectra with a polynomial-based foreground model and a flattened Gaussian signal model. We presented fits of simulations using zero signal and the Haslam map (as seen from the EDGES latitude) scaled with a constant spectral index and weighted by four different Gaussian beams with quadratic FWHMs. Two of these fits resulted in large flattened Gaussian troughs near 78 MHz, like the one reported by the EDGES team. These troughs also appeared when the Haslam map was replaced by a toy model of galactic emission that uses a 25 K background temperature (at 408 MHz) and models the galactic plane as a Gaussian in latitude with a peak of 300 K at the galactic center and 200 K at the anticenter, showing that the interaction of the beam with the Galactic plane introduces troughs.

This vulnerability to false troughs is due to the inadequacy of the foreground model, which does not account for beam chromaticity. While the only real solution to this problem is to modify the analysis technique, steps such as moving the experiment to the northern hemisphere, where the galaxy appears very differently, should at least modify (and potentially decrease) any false troughs that are found.

The MAA, on the other hand, is performed on many time-binned spectra (see also Tauscher et al., 2020) instead of one spectrum. Also, instead of assuming a specific model for the signal, it allows for any possible spectrum that is the same across each time bin. In addition, the foreground model accounts for beam chromaticity because the basis vectors are taken from applying Singular Value Decomposition (SVD) to a training set of foregrounds weighted by the beam of the specific antenna being used (Tauscher et al., 2018). Given the beam-weighted foreground training set employed, we found that, under these assumptions, any signal can be measured with uncertainties within an order of magnitude of the noise level.

While the MAA could be considered the most robust, conservative form of the assumptions specified in Section 2, if there are well motivated theoretical models for the signal, the pipeline we laid out in Tauscher et al. (2018), Rapetti et al. (2019), and Tauscher et al. (2020) can be applied. That method uses training sets for both the beam-weighted foreground and the global signal to create models for each of them. Ultimately, experimenters can decide if a theoretical model of the signal (not just based on residuals from the data with respect to the intrinsic foreground model, as in the case of the flattened Gaussian) is rigorous enough to be explored using our full pipeline, for stronger constraints. However, given the current theoretical uncertainties, if the MAA is possible for the selected beam-weighted foreground model, we recommend to start with this analysis before selecting a physical model of the signal because it allows for any signal—even those which are unexpected—and may guide the model selection procedure.

We thank Neil Bassett and Joshua Hibbard for useful discussions during the development of this study. We also appreciate detailed discussions with Judd Bowman and Raul Monsolve about the EDGES data analysis approach. This work is directly supported by the NASA Solar System Exploration Virtual Institute cooperative agreement 80ARC017M0006.

Appendix A Minimum assumption analysis general calculation

In this appendix, we will compute the minimum assumption maximum likelihood for signal reconstruction and the uncertainties on that reconstruction when the signal expansion matrix is 𝚿{\boldsymbol{\Psi}} (i.e. the signal 𝒚(21){\boldsymbol{y}}^{(21)} appears in the data as 𝚿𝒚(21){\boldsymbol{\Psi}}{\boldsymbol{y}}^{(21)}), the noise covariance is 𝑪{\boldsymbol{C}}, and the foreground basis matrix is 𝑭{\boldsymbol{F}}.

The signal assumption may be implemented by assuming 𝒚(21)=𝑨𝒙(21){\boldsymbol{y}}^{(21)}={\boldsymbol{A}}{\boldsymbol{x}}^{(21)} for an invertible161616Note that invertible matrices are square, meaning that there are as many parameters in 𝒙(21){\boldsymbol{x}}^{(21)} as there are frequencies in 𝒚(21){\boldsymbol{y}}^{(21)}. basis matrix 𝑨{\boldsymbol{A}}.171717𝑨{\boldsymbol{A}} could be assumed to be the identity matrix, but the analytical calculations can be completed more easily if the signal basis matrix is normalized. The model for the full data is

𝓜(𝒙(fg),𝒙(21))\displaystyle{\boldsymbol{\mathcal{M}}}({\boldsymbol{x}}^{({\text{fg}})},{\boldsymbol{x}}^{(21)}) =𝓜(fg)(𝒙(fg))+𝓜(21)(𝒙(21))\displaystyle={\boldsymbol{\mathcal{M}}}^{({\text{fg}})}({\boldsymbol{x}}^{({\text{fg}})})+{\boldsymbol{\mathcal{M}}}^{(21)}({\boldsymbol{x}}^{(21)}) (A1a)
=𝑭𝒙(fg)+𝚿𝑨𝒙(21),\displaystyle={\boldsymbol{F}}{\boldsymbol{x}}^{({\text{fg}})}+{\boldsymbol{\Psi}}{\boldsymbol{A}}{\boldsymbol{x}}^{(21)}, (A1b)

This can also be written 𝓜(𝒙)=𝑮𝒙{\boldsymbol{\mathcal{M}}}({\boldsymbol{x}})={\boldsymbol{G}}{\boldsymbol{x}}, where 𝑮=[𝑭𝚿𝑨]{\boldsymbol{G}}=\begin{bmatrix}{\boldsymbol{F}}&{\boldsymbol{\Psi}}{\boldsymbol{A}}\end{bmatrix}, 𝒙=[𝒙(fg)𝒙(21)]{\boldsymbol{x}}=\begin{bmatrix}{\boldsymbol{x}}^{({\text{fg}})}\\ {\boldsymbol{x}}^{(21)}\end{bmatrix}, and 𝑭{\boldsymbol{F}} is the foreground basis that applies to all of the spectra simultaneously, as opposed to the single spectrum basis laid out in Section 2. The likelihood function is therefore

(𝒙)e(𝒚𝑮𝒙)T𝑪1(𝒚𝑮𝒙)/2.{\mathcal{L}}({\boldsymbol{x}})\propto e^{-({\boldsymbol{y}}-{\boldsymbol{G}}{\boldsymbol{x}})^{T}{\boldsymbol{C}}^{-1}({\boldsymbol{y}}-{\boldsymbol{G}}{\boldsymbol{x}})/2}. (A2)

The maximum likelihood value of 𝒙{\boldsymbol{x}}, 𝝃{\boldsymbol{\xi}}, and its covariance, 𝑺{\boldsymbol{S}}, are given by

𝝃=𝑺𝑮T𝑪1𝒚 and 𝑺=(𝑮T𝑪1𝑮)1.{\boldsymbol{\xi}}={\boldsymbol{S}}{\boldsymbol{G}}^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{y}}\ \ \text{ and }\ \ {\boldsymbol{S}}=({\boldsymbol{G}}^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{G}})^{-1}. (A3)

These are easiest to calculate when 𝑭{\boldsymbol{F}} and 𝚿𝑨{\boldsymbol{\Psi}}{\boldsymbol{A}} are normalized through 𝑭T𝑪1𝑭=𝑰{\boldsymbol{F}}^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{F}}={\boldsymbol{I}} and 𝑨T𝚿T𝑪1𝚿𝑨=𝑰{\boldsymbol{A}}^{T}{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{\Psi}}{\boldsymbol{A}}={\boldsymbol{I}}. Under these conditions, the covariance of the signal parameters is

𝑺(21)=(𝑰𝑨T𝚿T𝑪1𝚽𝚿𝑨)1,{\boldsymbol{S}}^{(21)}=({\boldsymbol{I}}-{\boldsymbol{A}}^{T}{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{\Phi}}{\boldsymbol{\Psi}}{\boldsymbol{A}})^{-1}, (A4)

where 𝚽{\boldsymbol{\Phi}} is the projection matrix described in Section 2, which is given by 𝚽=𝑭𝑭T𝑪1{\boldsymbol{\Phi}}={\boldsymbol{F}}{\boldsymbol{F}}^{T}{\boldsymbol{C}}^{-1} under the current normalization conditions. This means that the channel covariance of the signal distribution is 𝚫(21)=𝑨𝑺(21)𝑨T{\boldsymbol{\Delta}}^{(21)}={\boldsymbol{A}}{\boldsymbol{S}}^{(21)}{\boldsymbol{A}}^{T}, which can be written as

𝚫(21)=[(𝑨𝑨T)1𝚿T𝑪1𝚽𝚿]1.{\boldsymbol{\Delta}}^{(21)}=\left[\left({\boldsymbol{A}}{\boldsymbol{A}}^{T}\right)^{-1}-{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{\Phi}}{\boldsymbol{\Psi}}\right]^{-1}. (A5)

The mean of the signal parameters is

𝝃(21)=𝑺(21)𝑨T𝚿T𝑪1(𝑰𝚽)𝒚{\boldsymbol{\xi}}^{(21)}={\boldsymbol{S}}^{(21)}{\boldsymbol{A}}^{T}{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}\left({\boldsymbol{I}}-{\boldsymbol{\Phi}}\right){\boldsymbol{y}} (A6)

and the channel mean of the signal distribution is 𝜸(21)=𝑨𝝃(21){\boldsymbol{\gamma}}^{(21)}={\boldsymbol{A}}{\boldsymbol{\xi}}^{(21)}, or

𝜸(21)=𝚫(21)𝚿T𝑪1(𝑰𝚽)𝒚.{\boldsymbol{\gamma}}^{(21)}={\boldsymbol{\Delta}}^{(21)}{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{y}}. (A7)

To satisfy our normalization condition, the signal basis matrix 𝑨{\boldsymbol{A}} must satisfy 𝑨T𝚿T𝑪1𝚿𝑨=𝑰{\boldsymbol{A}}^{T}{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{\Psi}}{\boldsymbol{A}}={\boldsymbol{I}}. Multiplying on the left by (𝑨𝑨T)1𝑨({\boldsymbol{A}}{\boldsymbol{A}}^{T})^{-1}{\boldsymbol{A}} and on the right by 𝑨1{\boldsymbol{A}}^{-1}, we find that (𝑨𝑨T)1=𝚿T𝑪1𝚿({\boldsymbol{A}}{\boldsymbol{A}}^{T})^{-1}={\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{\Psi}}. Therefore, Equation A5 becomes

𝚫(21)=[𝚿T𝑪1(𝑰𝚽)𝚿]1.{\boldsymbol{\Delta}}^{(21)}=\left[{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}\right]^{-1}. (A8)

This means that if any vector in the column space of 𝚿{\boldsymbol{\Psi}}, i.e. any vector of the form 𝚿𝒛=[𝒛T𝒛T𝒛T]T{\boldsymbol{\Psi}}{\boldsymbol{z}}=\begin{bmatrix}{\boldsymbol{z}}^{T}&{\boldsymbol{z}}^{T}&\cdots&{\boldsymbol{z}}^{T}\end{bmatrix}^{T}, can be represented by the foreground vectors, then the uncertainties are infinite. Plugging Equation A8 into Equation A7, we find that the signal channel mean is

𝜸(21)=[𝚿T𝑪1(𝑰𝚽)𝚿]1𝚿T𝑪1(𝑰𝚽)𝒚.{\boldsymbol{\gamma}}^{(21)}=\left[{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}\right]^{-1}{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{y}}. (A9)

Appendix B Minimum assumption analysis with only total power spectra

As mentioned in Section 4.1, when there are NsN_{s} spectra concatenated together in the data curve being fit, i.e. 𝒚=[𝒚1T𝒚2T𝒚NsT]T{\boldsymbol{y}}=\begin{bmatrix}{\boldsymbol{y}}_{1}^{T}&{\boldsymbol{y}}_{2}^{T}&\cdots&{\boldsymbol{y}}_{N_{s}}^{T}\end{bmatrix}^{T}, the signal expansion matrix 𝚿{\boldsymbol{\Psi}} is given by 𝚿=[𝑰𝑰𝑰]T{\boldsymbol{\Psi}}=\begin{bmatrix}{\boldsymbol{I}}&{\boldsymbol{I}}&\cdots&{\boldsymbol{I}}\end{bmatrix}^{T}. Assuming the different spectra have independent noise, we can write the covariance 𝑪{\boldsymbol{C}} in this case as

𝑪=[𝑪1𝟎𝟎𝟎𝑪2𝟎𝟎𝟎𝑪Ns].{\boldsymbol{C}}=\begin{bmatrix}{\boldsymbol{C}}_{1}&{\boldsymbol{0}}&\cdots&{\boldsymbol{0}}\\ {\boldsymbol{0}}&{\boldsymbol{C}}_{2}&\cdots&{\boldsymbol{0}}\\ \vdots&\vdots&\ddots&\vdots\\ {\boldsymbol{0}}&{\boldsymbol{0}}&\cdots&{\boldsymbol{C}}_{N_{s}}\end{bmatrix}. (B1)

We split 𝑭{\boldsymbol{F}} into NsN_{s} different components through 𝑭=[𝑭1T𝑭2T𝑭NsT]T{\boldsymbol{F}}=\begin{bmatrix}{\boldsymbol{F}}_{1}^{T}&{\boldsymbol{F}}_{2}^{T}&\cdots&{\boldsymbol{F}}_{N_{s}}^{T}\end{bmatrix}^{T}. This means that the normalization condition of the foreground basis matrix, 𝑭T𝑪1𝑭=𝑰{\boldsymbol{F}}^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{F}}={\boldsymbol{I}}, becomes k=1Ns𝑭kT𝑪k1𝑭k=𝑰\sum_{k=1}^{N_{s}}{\boldsymbol{F}}_{k}^{T}{\boldsymbol{C}}_{k}^{-1}{\boldsymbol{F}}_{k}={\boldsymbol{I}}, the foreground projection matrix is

𝚽=[𝑭1𝑭1T𝑪11𝑭1𝑭2T𝑪21𝑭1𝑭NsT𝑪Ns1𝑭2𝑭1T𝑪11𝑭2𝑭2T𝑪21𝑭2𝑭NsT𝑪Ns1𝑭Ns𝑭1T𝑪11𝑭Ns𝑭2T𝑪21𝑭Ns𝑭NsT𝑪Ns1],{\boldsymbol{\Phi}}=\begin{bmatrix}{\boldsymbol{F}}_{1}{\boldsymbol{F}}_{1}^{T}{\boldsymbol{C}}_{1}^{-1}&{\boldsymbol{F}}_{1}{\boldsymbol{F}}_{2}^{T}{\boldsymbol{C}}_{2}^{-1}&\cdots&{\boldsymbol{F}}_{1}{\boldsymbol{F}}_{N_{s}}^{T}{\boldsymbol{C}}_{N_{s}}^{-1}\\ {\boldsymbol{F}}_{2}{\boldsymbol{F}}_{1}^{T}{\boldsymbol{C}}_{1}^{-1}&{\boldsymbol{F}}_{2}{\boldsymbol{F}}_{2}^{T}{\boldsymbol{C}}_{2}^{-1}&\cdots&{\boldsymbol{F}}_{2}{\boldsymbol{F}}_{N_{s}}^{T}{\boldsymbol{C}}_{N_{s}}^{-1}\\ \vdots&\vdots&\ddots&\vdots\\ {\boldsymbol{F}}_{N_{s}}{\boldsymbol{F}}_{1}^{T}{\boldsymbol{C}}_{1}^{-1}&{\boldsymbol{F}}_{N_{s}}{\boldsymbol{F}}_{2}^{T}{\boldsymbol{C}}_{2}^{-1}&\cdots&{\boldsymbol{F}}_{N_{s}}{\boldsymbol{F}}_{N_{s}}^{T}{\boldsymbol{C}}_{N_{s}}^{-1}\end{bmatrix}, (B2)

the signal channel covariance matrix is

𝚫(21)=[n=1Ns(𝑰m=1Ns𝑪m1𝑭m𝑭nT)𝑪n1]1,{\boldsymbol{\Delta}}^{(21)}=\left[\sum_{n=1}^{N_{s}}\left({\boldsymbol{I}}-\sum_{m=1}^{N_{s}}{\boldsymbol{C}}_{m}^{-1}{\boldsymbol{F}}_{m}{\boldsymbol{F}}_{n}^{T}\right){\boldsymbol{C}}_{n}^{-1}\right]^{-1}, (B3)

and the signal channel mean is

𝜸(21)=𝚫(21)[n=1Ns(𝑰m=1Ns𝑪m1𝑭m𝑭nT)𝑪n1𝒚n].{\boldsymbol{\gamma}}^{(21)}={\boldsymbol{\Delta}}^{(21)}\left[\sum_{n=1}^{N_{s}}\left({\boldsymbol{I}}-\sum_{m=1}^{N_{s}}{\boldsymbol{C}}_{m}^{-1}{\boldsymbol{F}}_{m}{\boldsymbol{F}}_{n}^{T}\right){\boldsymbol{C}}_{n}^{-1}{\boldsymbol{y}}_{n}\right]. (B4)

Appendix C Minimum assumption analysis with polarization

In this appendix, we will layout the form of the MAA when the data vector is the concatenation of 4Nb4N_{b} spectra containing all four Stokes parameters at NbN_{b} time bins, i.e.

𝒚=[𝒚1,IT𝒚1,QT𝒚1,UT𝒚1,VT𝒚Nb,IT𝒚Nb,QT𝒚Nb,UT𝒚Nb,VT]T.{\boldsymbol{y}}=\begin{bmatrix}{\boldsymbol{y}}_{1,I}^{T}&{\boldsymbol{y}}_{1,Q}^{T}&{\boldsymbol{y}}_{1,U}^{T}&{\boldsymbol{y}}_{1,V}^{T}&\cdots&{\boldsymbol{y}}_{N_{b},I}^{T}&{\boldsymbol{y}}_{N_{b},Q}^{T}&{\boldsymbol{y}}_{N_{b},U}^{T}&{\boldsymbol{y}}_{N_{b},V}^{T}\end{bmatrix}^{T}. (C1)

The signal appears only in the Stokes I spectra, so the signal expansion matrix is

𝚿=[𝑰𝟎𝟎𝟎𝑰𝟎𝟎𝟎]T.{\boldsymbol{\Psi}}=\begin{bmatrix}{\boldsymbol{I}}&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}&\cdots&{\boldsymbol{I}}&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}\end{bmatrix}^{T}. (C2)

Since the four Stokes parameters of the kthk^{\text{th}} time bin have roughly the same noise covariance,181818If Q/IQ/I, U/IU/I, and V/IV/I have magnitudes of order xx, then the fractional difference between the noise levels on II, QQ, UU, and VV is of order x2x^{2}. See Tauscher et al. (2020) for exact calculations of Stokes parameters noise levels. For the purposes here, it is best just to assume all four Stokes parameters have the same noise level. 𝑪k{\boldsymbol{C}}_{k}, we can write the full noise covariance matrix as

𝑪=[𝑪1𝟎𝟎𝟎𝟎𝟎𝟎𝟎𝟎𝑪1𝟎𝟎𝟎𝟎𝟎𝟎𝟎𝟎𝑪1𝟎𝟎𝟎𝟎𝟎𝟎𝟎𝟎𝑪1𝟎𝟎𝟎𝟎𝟎𝟎𝟎𝟎𝑪Nb𝟎𝟎𝟎𝟎𝟎𝟎𝟎𝟎𝑪Nb𝟎𝟎𝟎𝟎𝟎𝟎𝟎𝟎𝑪Nb𝟎𝟎𝟎𝟎𝟎𝟎𝟎𝟎𝑪Nb].{\boldsymbol{C}}=\begin{bmatrix}{\boldsymbol{C}}_{1}&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}&\cdots&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}\\ {\boldsymbol{0}}&{\boldsymbol{C}}_{1}&{\boldsymbol{0}}&{\boldsymbol{0}}&\cdots&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}\\ {\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{C}}_{1}&{\boldsymbol{0}}&\cdots&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}\\ {\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{C}}_{1}&\cdots&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}\\ \vdots&\vdots&\vdots&\vdots&\ddots&\vdots&\vdots&\vdots&\vdots\\ {\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}&\cdots&{\boldsymbol{C}}_{N_{b}}&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}\\ {\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}&\cdots&{\boldsymbol{0}}&{\boldsymbol{C}}_{N_{b}}&{\boldsymbol{0}}&{\boldsymbol{0}}\\ {\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}&\cdots&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{C}}_{N_{b}}&{\boldsymbol{0}}\\ {\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}&\cdots&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{0}}&{\boldsymbol{C}}_{N_{b}}\end{bmatrix}. (C3)

In this case, the foreground basis returned by the SVD algorithm is given by

𝑭=[𝑭1,IT𝑭1,QT𝑭1,UT𝑭1,VT𝑭Nb,IT𝑭Nb,QT𝑭Nb,UT𝑭Nb,VT]T,{\boldsymbol{F}}=\begin{bmatrix}{\boldsymbol{F}}_{1,I}^{T}&{\boldsymbol{F}}_{1,Q}^{T}&{\boldsymbol{F}}_{1,U}^{T}&{\boldsymbol{F}}_{1,V}^{T}&\cdots&{\boldsymbol{F}}_{N_{b},I}^{T}&{\boldsymbol{F}}_{N_{b},Q}^{T}&{\boldsymbol{F}}_{N_{b},U}^{T}&{\boldsymbol{F}}_{N_{b},V}^{T}\end{bmatrix}^{T}, (C4)

which implies that the normalization convention for the foreground basis matrix is k=1NsS{I,Q,U,V}𝑭k,ST𝑪k1𝑭k,S=𝑰\sum_{k=1}^{N_{s}}\sum_{S\in\{I,Q,U,V\}}{\boldsymbol{F}}_{k,S}^{T}{\boldsymbol{C}}_{k}^{-1}{\boldsymbol{F}}_{k,S}={\boldsymbol{I}}, and the foreground projection matrix is

𝚽=[𝑭1,I𝑭1,IT𝑪11𝑭1,I𝑭1,VT𝑪11𝑭1,I𝑭Nb,IT𝑪Nb1𝑭1,I𝑭Nb,VT𝑪Nb1𝑭1,V𝑭1,IT𝑪11𝑭1,V𝑭1,VT𝑪11𝑭1,V𝑭Nb,IT𝑪Nb1𝑭1,V𝑭Nb,VT𝑪Nb1𝑭Nb,I𝑭1,IT𝑪11𝑭Nb,I𝑭1,VT𝑪11𝑭Nb,I𝑭Nb,IT𝑪Nb1𝑭Nb,I𝑭Nb,VT𝑪Nb1𝑭Nb,V𝑭1,IT𝑪11𝑭Nb,V𝑭1,VT𝑪11𝑭Nb,V𝑭Nb,IT𝑪Nb1𝑭Nb,V𝑭Nb,VT𝑪Nb1].{\boldsymbol{\Phi}}=\begin{bmatrix}{\boldsymbol{F}}_{1,I}{\boldsymbol{F}}_{1,I}^{T}{\boldsymbol{C}}_{1}^{-1}&\cdots&{\boldsymbol{F}}_{1,I}{\boldsymbol{F}}_{1,V}^{T}{\boldsymbol{C}}_{1}^{-1}&\cdots&{\boldsymbol{F}}_{1,I}{\boldsymbol{F}}_{N_{b},I}^{T}{\boldsymbol{C}}_{N_{b}}^{-1}&\cdots&{\boldsymbol{F}}_{1,I}{\boldsymbol{F}}_{N_{b},V}^{T}{\boldsymbol{C}}_{N_{b}}^{-1}\\ \vdots&\ddots&\vdots&\ddots&\vdots&\ddots&\vdots\\ {\boldsymbol{F}}_{1,V}{\boldsymbol{F}}_{1,I}^{T}{\boldsymbol{C}}_{1}^{-1}&\cdots&{\boldsymbol{F}}_{1,V}{\boldsymbol{F}}_{1,V}^{T}{\boldsymbol{C}}_{1}^{-1}&\cdots&{\boldsymbol{F}}_{1,V}{\boldsymbol{F}}_{N_{b},I}^{T}{\boldsymbol{C}}_{N_{b}}^{-1}&\cdots&{\boldsymbol{F}}_{1,V}{\boldsymbol{F}}_{N_{b},V}^{T}{\boldsymbol{C}}_{N_{b}}^{-1}\\ \vdots&\ddots&\vdots&\ddots&\vdots&\ddots&\vdots\\ {\boldsymbol{F}}_{N_{b},I}{\boldsymbol{F}}_{1,I}^{T}{\boldsymbol{C}}_{1}^{-1}&\cdots&{\boldsymbol{F}}_{N_{b},I}{\boldsymbol{F}}_{1,V}^{T}{\boldsymbol{C}}_{1}^{-1}&\cdots&{\boldsymbol{F}}_{N_{b},I}{\boldsymbol{F}}_{N_{b},I}^{T}{\boldsymbol{C}}_{N_{b}}^{-1}&\cdots&{\boldsymbol{F}}_{N_{b},I}{\boldsymbol{F}}_{N_{b},V}^{T}{\boldsymbol{C}}_{N_{b}}^{-1}\\ \vdots&\ddots&\vdots&\ddots&\vdots&\ddots&\vdots\\ {\boldsymbol{F}}_{N_{b},V}{\boldsymbol{F}}_{1,I}^{T}{\boldsymbol{C}}_{1}^{-1}&\cdots&{\boldsymbol{F}}_{N_{b},V}{\boldsymbol{F}}_{1,V}^{T}{\boldsymbol{C}}_{1}^{-1}&\cdots&{\boldsymbol{F}}_{N_{b},V}{\boldsymbol{F}}_{N_{b},I}^{T}{\boldsymbol{C}}_{N_{b}}^{-1}&\cdots&{\boldsymbol{F}}_{N_{b},V}{\boldsymbol{F}}_{N_{b},V}^{T}{\boldsymbol{C}}_{N_{b}}^{-1}\end{bmatrix}. (C5)

Very similarly to Equation B3, the signal channel covariance given by Equation 19a is

𝚫(21)={n=1Nb[𝑰(m=1Nb𝑪m1𝑭m,I)𝑭n,IT]𝑪n1}1.{\boldsymbol{\Delta}}^{(21)}=\left\{\sum_{n=1}^{N_{b}}\left[{\boldsymbol{I}}-\left(\sum_{m=1}^{N_{b}}{\boldsymbol{C}}_{m}^{-1}{\boldsymbol{F}}_{m,I}\right){\boldsymbol{F}}_{n,I}^{T}\right]{\boldsymbol{C}}_{n}^{-1}\right\}^{-1}. (C6)

The signal channel mean is

𝜸(21)=𝚫(21){n=1NbS={I,Q,U,V}[δSI𝑰(m=1Nb𝑪m1𝑭m,I)𝑭n,ST]𝑪n1𝒚n,S}.{\boldsymbol{\gamma}}^{(21)}={\boldsymbol{\Delta}}^{(21)}\left\{\sum_{n=1}^{N_{b}}\sum_{S=\{I,Q,U,V\}}\left[\delta_{SI}{\boldsymbol{I}}-\left(\sum_{m=1}^{N_{b}}{\boldsymbol{C}}_{m}^{-1}{\boldsymbol{F}}_{m,I}\right){\boldsymbol{F}}_{n,S}^{T}\right]{\boldsymbol{C}}_{n}^{-1}{\boldsymbol{y}}_{n,S}\right\}. (C7)

For computation purposes, we define the total signal noise covariance, 𝑪T{\boldsymbol{C}}_{T}, the weighted total power basis, 𝑯{\boldsymbol{H}}, the weighted total power data, 𝒘{\boldsymbol{w}}, and the overlap vector of the data, 𝒅{\boldsymbol{d}}, through

𝑪T1=k=1Nb𝑪k1𝑯=k=1Nb𝑪k1𝑭k,I𝒘=k=1Nb𝑪k1𝒚k,I, and𝒅=k=1NbS{I,Q,U,V}𝑭k,ST𝑪k1𝒚k,S.{\boldsymbol{C}}_{T}^{-1}=\sum_{k=1}^{N_{b}}{\boldsymbol{C}}_{k}^{-1}\text{, }\ \ {\boldsymbol{H}}=\sum_{k=1}^{N_{b}}{\boldsymbol{C}}_{k}^{-1}{\boldsymbol{F}}_{k,I}\text{, }\ \ {\boldsymbol{w}}=\sum_{k=1}^{N_{b}}{\boldsymbol{C}}_{k}^{-1}{\boldsymbol{y}}_{k,I}\text{, and}\ \ {\boldsymbol{d}}=\sum_{k=1}^{N_{b}}\sum_{S\in\{I,Q,U,V\}}{\boldsymbol{F}}_{k,S}^{T}{\boldsymbol{C}}_{k}^{-1}{\boldsymbol{y}}_{k,S}. (C8)

With these definitions, 𝚫(21){\boldsymbol{\Delta}}^{(21)} and 𝜸(21){\boldsymbol{\gamma}}^{(21)} are given by

𝚫(21)=(𝑪T1𝑯𝑯T)1 and 𝜸(21)=𝚫(21)(𝒘𝑯𝒅).{\boldsymbol{\Delta}}^{(21)}=\left({\boldsymbol{C}}_{T}^{-1}-{\boldsymbol{H}}{\boldsymbol{H}}^{T}\right)^{-1}\ \ \text{ and }\ \ {\boldsymbol{\gamma}}^{(21)}={\boldsymbol{\Delta}}^{(21)}({\boldsymbol{w}}-{\boldsymbol{H}}{\boldsymbol{d}}). (C9)

These equations also work when calculating the channel mean and covariance with total power spectra only as long as the sum over SS in the definition of 𝒅{\boldsymbol{d}} includes only II.

Appendix D Signal bias statistic under minimum assumption analysis

In this appendix, we examine the effect of biases in the foreground model, i.e. the effect of non-zero 𝜹{\boldsymbol{\delta}} on the uncertainties of minimum assumption analyses. To do this, we write the data as 𝒚=𝑭𝒙+𝜹+𝚿𝒛+𝒏{\boldsymbol{y}}={\boldsymbol{F}}{\boldsymbol{x}}+{\boldsymbol{\delta}}+{\boldsymbol{\Psi}}{\boldsymbol{z}}+{\boldsymbol{n}}, where 𝜹{\boldsymbol{\delta}} is the foreground bias (i.e. unmodeled foreground) that satisfies 𝑭T𝑪1𝜹=𝟎{\boldsymbol{F}}^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{\delta}}={\boldsymbol{0}}, 𝑭𝒙{\boldsymbol{F}}{\boldsymbol{x}} is the modeled foreground, 𝒛{\boldsymbol{z}} is the true signal, and 𝒏{\boldsymbol{n}} is the Gaussian noise realization with covariance 𝑪{\boldsymbol{C}}. With these definitions, the signal channel mean 𝜸(21){\boldsymbol{\gamma}}^{(21)} is given by

𝜸(21)=[𝚿T𝑪1(𝑰𝚽)𝚿]1𝚿T𝑪1(𝑰𝚽)(𝑭𝒙+𝜹+𝚿𝒛+𝒏).{\boldsymbol{\gamma}}^{(21)}=\left[{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}\right]^{-1}{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}({\boldsymbol{I}}-{\boldsymbol{\Phi}})({\boldsymbol{F}}{\boldsymbol{x}}+{\boldsymbol{\delta}}+{\boldsymbol{\Psi}}{\boldsymbol{z}}+{\boldsymbol{n}}). (D1)

Since 𝚽𝑭𝒙=𝑭𝒙{\boldsymbol{\Phi}}{\boldsymbol{F}}{\boldsymbol{x}}={\boldsymbol{F}}{\boldsymbol{x}} and 𝚽𝜹=𝟎{\boldsymbol{\Phi}}{\boldsymbol{\delta}}={\boldsymbol{0}}, this means

𝜸(21)\displaystyle{\boldsymbol{\gamma}}^{(21)} =[𝚿T𝑪1(𝑰𝚽)𝚿]1[𝚿T𝑪1𝜹+𝚿T𝑪1(𝑰𝚽)𝚿𝒛+𝚿T𝑪1(𝑰𝚽)𝒏],\displaystyle=\left[{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}\right]^{-1}\left[{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{\delta}}+{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}{\boldsymbol{z}}+{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{n}}\right], (D2a)
=[𝚿T𝑪1(𝑰𝚽)𝚿]1𝚿T𝑪1𝜹+𝒛+[𝚿T𝑪1(𝑰𝚽)𝚿]1𝚿T𝑪1(𝑰𝚽)𝒏,\displaystyle=\left[{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}\right]^{-1}{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{\delta}}+{\boldsymbol{z}}+\left[{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}\right]^{-1}{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{n}}, (D2b)
=𝚫(21)𝚿T𝑪1𝜹+𝒛+𝚫(21)𝚿T𝑪1(𝑰𝚽)𝒏.\displaystyle={\boldsymbol{\Delta}}^{(21)}{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{\delta}}+{\boldsymbol{z}}+{\boldsymbol{\Delta}}^{(21)}{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{n}}. (D2c)

This means that the signal channel bias is

𝜸(21)𝒛=𝚫(21)𝚿T𝑪1[𝜹+(𝑰𝚽)𝒏].{\boldsymbol{\gamma}}^{(21)}-{\boldsymbol{z}}={\boldsymbol{\Delta}}^{(21)}{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}\left[{\boldsymbol{\delta}}+({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{n}}\right]. (D3)

The so-called signal bias statistic, ε2\varepsilon^{2}, defined through ε2=1Nν(𝜸(21)𝒛)T[𝚫(21)]1(𝜸(21)𝒛)\varepsilon^{2}=\frac{1}{N_{\nu}}({\boldsymbol{\gamma}}^{(21)}-{\boldsymbol{z}})^{T}\left[{\boldsymbol{\Delta}}^{(21)}\right]^{-1}({\boldsymbol{\gamma}}^{(21)}-{\boldsymbol{z}}), which yields the squared number of sigma at which the signal is measured in an averaged sense across the band, satisfies

ε2=1Nν[𝜹+(𝑰𝚽)𝒏]T𝑪1𝚿𝚫(21)𝚿T𝑪1[𝜹+(𝑰𝚽)𝒏],\varepsilon^{2}=\frac{1}{N_{\nu}}\left[{\boldsymbol{\delta}}+({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{n}}\right]^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{\Psi}}{\boldsymbol{\Delta}}^{(21)}{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}\left[{\boldsymbol{\delta}}+({\boldsymbol{I}}-{\boldsymbol{\Phi}}){\boldsymbol{n}}\right], (D4)

where NνN_{\nu} is the number of frequency channels. Assuming that 𝜹{\boldsymbol{\delta}} follows a normal distribution (with a singular covariance matrix), the expectation value and variance of ε2\varepsilon^{2} are

E[ε2]\displaystyle{\text{E}}[\varepsilon^{2}] =1+𝝁T𝑨𝝁+Tr(𝑨𝚺),\displaystyle=1+{\boldsymbol{\mu}}^{T}{\boldsymbol{A}}{\boldsymbol{\mu}}+{\text{Tr}}({\boldsymbol{A}}{\boldsymbol{\Sigma}})\,, (D5a)
Var[ε2]\displaystyle{\text{Var}}[\varepsilon^{2}] =2Nν+4𝝁T𝑨𝚺𝑨𝝁+2Tr(𝑨𝚺𝑨𝚺)+4Nν𝝁T𝑨𝝁+4NνTr(𝑨𝚺),\displaystyle=\frac{2}{N_{\nu}}+4{\boldsymbol{\mu}}^{T}{\boldsymbol{A}}{\boldsymbol{\Sigma}}{\boldsymbol{A}}{\boldsymbol{\mu}}+2{\text{Tr}}({\boldsymbol{A}}{\boldsymbol{\Sigma}}{\boldsymbol{A}}{\boldsymbol{\Sigma}})+\frac{4}{N_{\nu}}{\boldsymbol{\mu}}^{T}{\boldsymbol{A}}{\boldsymbol{\mu}}+\frac{4}{N_{\nu}}{\text{Tr}}({\boldsymbol{A}}{\boldsymbol{\Sigma}})\,, (D5b)

where 𝑨=1Nν(𝑰𝚽)T𝑪1𝚿𝚫(21)𝚿T𝑪1(𝑰𝚽){\boldsymbol{A}}=\frac{1}{N_{\nu}}({\boldsymbol{I}}-{\boldsymbol{\Phi}})^{T}{\boldsymbol{C}}^{-1}{\boldsymbol{\Psi}}{\boldsymbol{\Delta}}^{(21)}{\boldsymbol{\Psi}}^{T}{\boldsymbol{C}}^{-1}({\boldsymbol{I}}-{\boldsymbol{\Phi}}), 𝝁=E[𝜹]{\boldsymbol{\mu}}={\text{E}}[{\boldsymbol{\delta}}], and 𝚺=Cov[𝜹]{\boldsymbol{\Sigma}}={\text{Cov}}[{\boldsymbol{\delta}}]. Assuming that ε2\varepsilon^{2} approximately follows a normal distribution, this means that, at a confidence level of pp,

ε<1+𝝁T𝑨𝝁+Tr(𝑨𝚺)+21Nν+2𝝁T𝑨𝚺𝑨𝝁+Tr(𝑨𝚺𝑨𝚺)+2Nν𝝁T𝑨𝝁+2NνTr(𝑨𝚺)erf1(2p1).\varepsilon<\sqrt{1+{\boldsymbol{\mu}}^{T}{\boldsymbol{A}}{\boldsymbol{\mu}}+{\text{Tr}}({\boldsymbol{A}}{\boldsymbol{\Sigma}})+2\sqrt{\frac{1}{N_{\nu}}+2{\boldsymbol{\mu}}^{T}{\boldsymbol{A}}{\boldsymbol{\Sigma}}{\boldsymbol{A}}{\boldsymbol{\mu}}+{\text{Tr}}({\boldsymbol{A}}{\boldsymbol{\Sigma}}{\boldsymbol{A}}{\boldsymbol{\Sigma}})+\frac{2}{N_{\nu}}{\boldsymbol{\mu}}^{T}{\boldsymbol{A}}{\boldsymbol{\mu}}+\frac{2}{N_{\nu}}{\text{Tr}}({\boldsymbol{A}}{\boldsymbol{\Sigma}})}\ \text{erf}^{-1}(2p-1)}. (D6)

This equation connects the RMS number of sigma, ε\varepsilon, to confidence levels, pp, in the general case where the foreground model is imperfect.

Appendix E Taking derivative of finite-dimensional signal with matrix

In Section 5.2.2, the frequency derivative of a signal 𝒔=[s(ν1)s(ν2)s(νNν)]T{\boldsymbol{s}}=\begin{bmatrix}s(\nu_{1})&s(\nu_{2})&\cdots&s(\nu_{N_{\nu}})\end{bmatrix}^{T} is written as 𝑫𝒔{\boldsymbol{D}}{\boldsymbol{s}}. There are some subtleties to this representation. First, the discrete nature of 𝒔{\boldsymbol{s}} means that any derivative taken will be a finite difference approximation. Second, note that the derivative of a vector with NνN_{\nu} elements is only strictly defined on the Nν1N_{\nu}-1 midpoints. The derivative matrix defined in this way is (Nν1)×Nν(N_{\nu}-1)\times N_{\nu} and in the case of equally spaced frequency channels with resolution Δν\Delta\nu is given by

𝑫=1Δν[110000011000001000000110000011].{\boldsymbol{D}}=\frac{1}{\Delta\nu}\begin{bmatrix}-1&1&0&\cdots&0&0&0\\ 0&-1&1&\cdots&0&0&0\\ 0&0&-1&\cdots&0&0&0\\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots&\vdots\\ 0&0&0&\cdots&-1&1&0\\ 0&0&0&\cdots&0&-1&1\end{bmatrix}. (E1)

To define 𝑫𝒔{\boldsymbol{D}}{\boldsymbol{s}} in the same space as 𝒔{\boldsymbol{s}}, the derivative must be interpolated in some way (i.e. 𝑫{\boldsymbol{D}} must be made square in some way). A natural interpolation scheme is to take the derivative at the first (last) endpoint to be the derivative at the midpoint of the first (last) pair of elements and to take the derivative at each interior point to be the average of the derivatives at the two adjacent midpoints. Defined in this way, the derivative matrix from Equation E1 is modified to

𝑫\displaystyle{\boldsymbol{D}} =(12[200001100001100000100001100002])(1Δν[110000011000001000000110000011]),\displaystyle=\left(\frac{1}{2}\begin{bmatrix}2&0&0&\cdots&0&0\\ 1&1&0&\cdots&0&0\\ 0&1&1&\cdots&0&0\\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots\\ 0&0&0&\cdots&1&0\\ 0&0&0&\cdots&1&1\\ 0&0&0&\cdots&0&2\end{bmatrix}\right)\left(\frac{1}{\Delta\nu}\begin{bmatrix}-1&1&0&\cdots&0&0&0\\ 0&-1&1&\cdots&0&0&0\\ 0&0&-1&\cdots&0&0&0\\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots&\vdots\\ 0&0&0&\cdots&-1&1&0\\ 0&0&0&\cdots&0&-1&1\end{bmatrix}\right), (E2a)
=12Δν[220000101000010000000010000101000022].\displaystyle=\frac{1}{2\Delta\nu}\begin{bmatrix}-2&2&0&\cdots&0&0&0\\ -1&0&1&\cdots&0&0&0\\ 0&-1&0&\cdots&0&0&0\\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots&\vdots\\ 0&0&0&\cdots&0&1&0\\ 0&0&0&\cdots&-1&0&1\\ 0&0&0&\cdots&0&-2&2\end{bmatrix}. (E2b)

This is the derivative matrix referred to in Section 5.2.2.

References

  • Barkana (2018) Barkana, R. 2018, Nature, 555, 71, doi: 10.1038/nature25791
  • Barkana et al. (2018) Barkana, R., Outmezguine, N. J., Redigol, D., & Volansky, T. 2018, Phys. Rev. D, 98, 103005, doi: 10.1103/PhysRevD.98.103005
  • Berlin et al. (2018) Berlin, A., Hooper, D., Krnjaic, G., & McDermott, S. D. 2018, Physical Review Letters, 121, 011102, doi: 10.1103/PhysRevLett.121.011102
  • Bernardi (2018) Bernardi, G. 2018, in IAU Symposium, Vol. 333, Peering towards Cosmic Dawn, ed. V. Jelić & T. van der Hulst, 98–101, doi: 10.1017/S1743921318000674
  • Bowman et al. (2018) Bowman, J. D., Rogers, A. E. E., Monsalve, R. A., Mozdzen, T. J., & Mahesh, N. 2018, Nature, 555, 67, doi: 10.1038/nature25792
  • Bradley et al. (2019) Bradley, R. F., Tauscher, K., Rapetti, D., & Burns, J. O. 2019, ApJ, 874, 153, doi: 10.3847/1538-4357/ab0d8b
  • Burns et al. (2019) Burns, J., Bale, S., Bassett, N., et al. 2019, BAAS, 51, 6. https://arxiv.org/abs/1902.06147
  • Burns (2020) Burns, J. O. 2020, arXiv e-prints, arXiv:2003.06881. https://arxiv.org/abs/2003.06881
  • Burns et al. (2017) Burns, J. O., Bradley, R., Tauscher, K., et al. 2017, ApJ, 844, 33, doi: 10.3847/1538-4357/aa77f4
  • Creque-Sarbinowski et al. (2019) Creque-Sarbinowski, C., Ji, L., Kovetz, E. D., & Kamionkowski, M. 2019, Phys. Rev. D, 100, 023528, doi: 10.1103/PhysRevD.100.023528
  • de Lera Acedo (2019) de Lera Acedo, E. 2019, International Conference on Electromagnetics in Advanced Applications, 0626, doi: 10.1109/ICEAA.2019.8879199
  • DeBoer et al. (2017) DeBoer, D. R., Parsons, A. R., Aguirre, J. E., et al. 2017, PASP, 129, 045001, doi: 10.1088/1538-3873/129/974/045001
  • Deshpande (2018) Deshpande, A. A. 2018, ApJ, 866, L7, doi: 10.3847/2041-8213/aae318
  • Draine & Miralda-Escudé (2018) Draine, B. T., & Miralda-Escudé, J. 2018, ApJ, 858, L10, doi: 10.3847/2041-8213/aac08a
  • Ewall-Wice et al. (2018) Ewall-Wice, A., Chang, T.-C., Lazio, J., et al. 2018, ApJ, 868, 63, doi: 10.3847/1538-4357/aae51d
  • Ewall-Wice et al. (2019) Ewall-Wice, A., Chang, T.-C., & Lazio, T. J. W. 2019, ArXiv e-prints, arXiv:1903.06788. https://arxiv.org/abs/1903.06788
  • Feng & Holder (2018) Feng, C., & Holder, G. 2018, ApJ, 858, L17, doi: 10.3847/2041-8213/aac0fe
  • Fialkov & Barkana (2019) Fialkov, A., & Barkana, R. 2019, MNRAS, 486, 1763, doi: 10.1093/mnras/stz873
  • Fialkov et al. (2018) Fialkov, A., Barkana, R., & Cohen, A. 2018, Physical Review Letters, 121, 011101, doi: 10.1103/PhysRevLett.121.011101
  • Haslam et al. (1982) Haslam, C. G. T., Salter, C. J., Stoffel, H., & Wilson, W. E. 1982, A&AS, 47, 1
  • Hills et al. (2018) Hills, R., Kulkarni, G., Meerburg, P. D., & Puchwein, E. 2018, Nature, 564, E32, doi: 10.1038/s41586-018-0796-5
  • Liu et al. (2013) Liu, A., Pritchard, J. R., Tegmark, M., & Loeb, A. 2013, Phys. Rev. D, 87, 043002, doi: 10.1103/PhysRevD.87.043002
  • Loeb & Muñoz (2018) Loeb, A., & Muñoz, J. B. 2018, Physics Online Journal, 11, 69, doi: 10.1103/Physics.11.69
  • Mebane et al. (2019) Mebane, R. H., Mirocha, J., & Furlanetto, S. R. 2019, ArXiv e-prints, arXiv:1910.10171. https://arxiv.org/abs/1910.10171
  • Monsalve et al. (2018) Monsalve, R. A., Greig, B., Bowman, J. D., et al. 2018, ApJ, 863, 11, doi: 10.3847/1538-4357/aace54
  • Monsalve et al. (2017a) Monsalve, R. A., Rogers, A. E. E., Bowman, J. D., & Mozdzen, T. J. 2017a, ApJ, 847, 64, doi: 10.3847/1538-4357/aa88d1
  • Monsalve et al. (2017b) —. 2017b, ApJ, 835, 49, doi: 10.3847/1538-4357/835/1/49
  • Mozdzen et al. (2019) Mozdzen, T. J., Mahesh, N., Monsalve, R. A., Rogers, A. E. E., & Bowman, J. D. 2019, MNRAS, 483, 4411, doi: 10.1093/mnras/sty3410
  • Muñoz et al. (2018) Muñoz, J. B., Dvorkin, C., & Loeb, A. 2018, Phys. Rev. Lett., 121, 121301, doi: 10.1103/PhysRevLett.121.121301
  • Nhan et al. (2019) Nhan, B. D., Bordenave, D. D., Bradley, R. F., et al. 2019, ApJ, 883, 126, doi: 10.3847/1538-4357/ab391b
  • Pritchard & Loeb (2012) Pritchard, J. R., & Loeb, A. 2012, Reports on Progress in Physics, 75, 086901, doi: 10.1088/0034-4885/75/8/086901
  • Rapetti et al. (2019) Rapetti, D., Tauscher, K., Mirocha, J., & Burns, J. O. 2019, arXiv e-prints, arXiv:1912.02205. https://arxiv.org/abs/1912.02205
  • Sims & Pober (2020) Sims, P. H., & Pober, J. C. 2020, MNRAS, 492, 22, doi: 10.1093/mnras/stz3388
  • Singh & Subrahmanyan (2019) Singh, S., & Subrahmanyan, R. 2019, ApJ, 880, 26, doi: 10.3847/1538-4357/ab2879
  • Singh et al. (2018) Singh, S., Subrahmanyan, R., Udaya Shankar, N., et al. 2018, ApJ, 858, 54, doi: 10.3847/1538-4357/aabae1
  • Slosar (2017) Slosar, A. 2017, Phys. Rev. Lett., 118, 151301, doi: 10.1103/PhysRevLett.118.151301
  • Spinelli et al. (2019) Spinelli, M., Bernardi, G., & Santos, M. G. 2019, MNRAS, 489, 4007, doi: 10.1093/mnras/stz2425
  • Switzer & Liu (2014) Switzer, E. R., & Liu, A. 2014, ApJ, 793, 102, doi: 10.1088/0004-637X/793/2/102
  • Tauscher et al. (2020) Tauscher, K., Rapetti, D., & Burns, J. O. 2020, arXiv e-prints, arXiv:2003.05452. https://arxiv.org/abs/2003.05452
  • Tauscher et al. (2018) Tauscher, K., Rapetti, D., Burns, J. O., & Switzer, E. 2018, ApJ, 853, 187, doi: 10.3847/1538-4357/aaa41f
  • Vedantham et al. (2014) Vedantham, H. K., Koopmans, L. V. E., de Bruyn, A. G., et al. 2014, MNRAS, 437, 1056, doi: 10.1093/mnras/stt1878