First measurement of the Mg II forest correlation function in the Epoch of Reionization
Abstract
In the process of producing the roughly three ionizing photons per atom required to reionize the IGM, the same massive stars explode and eject metals into their surroundings, enriching the Universe to . While the overly sensitive Ly transition makes Gunn-Peterson absorption of background quasar light an ineffective probe of reionization at , strong low-ionization transitions like the Mg ii Å doublet will give rise to a detectable ‘metal-line forest’, if metals pollute the neutral IGM. We measure the auto-correlation of the Mg ii forest transmission using a sample of ten ground based quasar spectra probing the redshift range (). The correlation function exhibits strong small-scale clustering and a pronounced peak at the doublet velocity () arising from strong absorbers in the CGM of galaxies. After these strong absorbers are identified and masked the signal is consistent with noise. Our measurements are compared to a suite of models generated by combining a large hydrodynamical simulation with a semi-numerical reionization topology, assuming a simple uniform enrichment model. We obtain a 95% credibility upper limit of at , assuming uninformative priors on [Mg/H] and the IGM neutral fraction . Splitting the data into low- (; ) and high- (; ) subsamples again yields null-detections and 95% upper limits of and , respectively. These first measurements set the stage for an approved JWST Cycle 2 program (GO 3526) targeting a similar number of quasars that will be an order of magnitude more sensitive, making the Mg ii forest an emerging powerful tool to deliver precision constraints on the reionization and enrichment history of the Universe.
keywords:
cosmology: observation – intergalactic medium – quasars: absorption lines – methods: observation.1 Introduction
One of the standard probes to study the intergalatic medium (IGM) is neutral hydrogen manifested as a blended collection of absoption lines known as the Ly forest. However, due to the large scattering cross section of the Ly transition, the absorption lines quickly saturate in the presence of residual neutral hydrogen as low as as 0.01%, thus rendering it challenging to utilize the Ly forest as a probe of the IGM beyond . An alternative probe of the high-redshift IGM is required.
Heavy elements, or metals, are expected to be present in the IGM around the Epoch of Reionization due to their ejection by massive stars exploding as supernovae. Low-ionization metals lines (e.g. O i, Mg II, and Si ii) are good tracers of neutral hydrogen in the pre-reionized IGM due to their similar ionization energy, and their low number densities mean that they do not saturate as easily as the Ly line. Before reionization, forests of these low-ionization lines are expected to arise from predominantly-neutral and overdense regions (Oh, 2002). During and after reionization, the hardening of the UV background causes the disappearance of the low-ionization forests (e.g. O i, Si ii, and Mg ii) and gives rise instead to forests of high-ionization lines (e.g. C iv, N v, and O vi). This phase transition signalling reionization has been tentatively observed in the circumgalactic medium (CGM) of galaxies (e.g. Becker et al., 2009, 2019; Cooper et al., 2019; D’Odorico et al., 2022). Absorption lines by metal ions therefore provide us with an additional tool with which to study the IGM and the reionization history.
Despite the expected presence of metals at early times, their abundance and distribution are unknown, and models do not agree on the typical metallicity of the IGM (e.g. Oppenheimer & Davé, 2006; Oppenheimer et al., 2009; Jaacks et al., 2018; Kirihara et al., 2020; Liu & Bromm, 2020). However, assuming reionization is powered by ionizing photons from massive stars, cosmic reionization and enrichment are then intimately linked, since the same massive stars that reionize the Universe would enrich the IGM to when they explode as supernovae (e.g., Miralda-Escudé & Rees, 1998; Ferrara, 2016). Existing observations of metal absorbers concentrate around , and they suggest a typical IGM metallicity of as traced by C iv (e.g. Songaila & Cowie, 1996; Songaila, 1997; Schaye et al., 2003; Simcoe, 2011) and exhibit a lack of evolution in the metallicity from (Schaye et al., 2003). Higher-redshift observations of metal absorbers (e.g. Simcoe et al., 2011; Ryan-Weber et al., 2006; Bosman et al., 2017) are generally probing the CGM of galaxies rather than the diffuse IGM. This is because current observations are limited by the use of standard methods that rely on detecting discrete absorbing lines, which necessitates high signal-to-noise ratio (SNR) and high resolution measurements, without which one is likely probing overdense gas around galaxies. Additionally, observing in the near-IR, where metal absorbers redshift into at higher redshifts, faces myriad challenges arising from the higher sky background and worsening detector sensivity and spectral resolution.
Hennawi et al. (2021) (hereafter H2021) introduced using the auto-correlation of the metal-line forest, which is based on clustering analyses of the Ly forest, to detect IGM absorbers. The proposed method treats the metal-line forest as a continuous random field and utilizes all spectral pixels to statistically average down the noise, resulting in far higher sensitivity to the expected weak absorption signals. By combining a large hydrodynamical simulation with a semi-numerical reionization topology, H2021 simulated the Mg ii forest in the pre-reionized IGM at and showed that its auto-correlation is sensitive to the neutral hydrogen fraction, , and the IGM metallicity, [Mg/H]. The characteristic signal is a peak in the auto-correlation function at the velocity separation of the Mg ii doublet (768 km/s), where the peak amplitude increases with [Mg/H] and the large-scale power decreases with increasing . They showed that and [Mg/H] can be constrained to within 5% and 0.02 dex, respectively, assuming a mock JWST NIRSpec dataset of 10 quasars. Tie et al. (2022) later extended this method to the C iv forest at focusing on the enrichment topology and using a more realistic model for the metal distribution, and showed that the model parameters can also be well-constrained (see Xin et al. in prep for a cross-correlation analysis between metal-line and Ly forests). Most recently, Karaçaylı et al. (2023) introduced an analytical framework for metal absorption based on the halo model and attempted to measure their 1D power spectra in high- and low-resolution data.
In this work, we aim to jointly constrain the neutral fraction and metallicity of the IGM at for the first time with the auto-correlation of the Mg ii forest in a sample of ten high-redshift quasars. We describe our quasar dataset in §2 and the reduction of their spectra in §3. In §4, we address contamination from CGM absorbers that can bias our signal and describe steps to mask them out. We present our results in §5, where we measure the correlation function of the Mg ii forest and describe our procedure to constrain the metallicity and neutral fraction of the IGM using forward models and Bayesian inference. We conclude in §6. Throughout this work, we adopt a CDM cosmology with the following parameters: = 0.3192, = 0.6808, = 0.04964, and = 0.67038, which agree with the cosmological constrains from the CMB (Planck Collaboration et al., 2020) within one sigma. All distances in this work are comoving, denoted as cMpc or ckpc, unless explicitly indicated otherwise. We define metallicity as , where is the ratio of the number of magnesium atoms to the number of hydrogen atoms and is the corresponding Solar ratio, where we used (Asplund et al., 2009).
2 Observation
Our dataset consists of ten quasars at 6.80: five 7 quasars observed with Keck/MOSFIRE (McLean et al., 2010, 2012), four 6.8 quasars observed with Keck/NIRES (Wilson et al., 2004), and one ultra-deep VLT/XSHOOTER (Vernet et al., 2011) observation of a quasar (Bosman et al., 2017). Our sample of quasars and their properties are listed in table 1. J03131806 (Wang et al., 2021) is the current record holder for the highest-redshift quasar discovered, followed by J13420928 (Bañados et al., 2018), while J00381527 (Yang et al., 2019) is a broad-absorption line quasar (as is J03131806). We also include two new unpublished high- quasars, denoted as ‘newqso1’ (Bañados et al. in prep) and ‘newqso2’ (Yang et al. in prep) for the rest of this paper. We note that there are other ground-based observations of 6.8 quasars, e.g. observed with VLT/XSHOOTER and GEMINI/GNIRS, but these are either too low in signal-to-noise ratio (SNR 3) or have too low spectral resolution in the case of GNIRS.
Our MOSFIRE observations were executed following an ABAB dither sequence in the -band covering from m and through a 0.7” slit, giving a nominal resolution of or FWHM = 83 km/s, where the spectral sampling is 2.78 pixels per resolution element111https://www2.keck.hawaii.edu/inst/mosfire/grating.html, resulting in a pixel size of km/s. The NIRES observations were acquired with an ABBA dither sequence and provide full wavelength coverage of the YJHK bands ( m) through a fixed 0.55” slit, where the mean resolution is or FWHM = 111 km/s and the sampling is 2.7 pixels per resolution element222https://www2.keck.hawaii.edu/inst/nires/spectrometer.html, resulting in a pixel size of km/s. The XSHOOTER observation of J11200641 was taken by Bosman et al. (2017) over 30 hours, observed through a 0.9” slit in the VIS and NIR arms. Bosman et al. (2017) measured the resolution to be or FWHM = 43 km/s, which we adopt here, and we will use a sampling of 3.7 pixels333https://www.eso.org/sci/facilities/paranal/instruments/xshooter/inst.html, which gives a pixel size of km/s.
No. | Name | mag | mag | SNR444median values calculated from the unmasked regions on the bottom panels of Figures 1 6 | SNRcorr555We discover that the noise vectors produced by the PypeIt pipeline could be slightly over- and under-estimated. We correct for this by applying correction factors to the noise vectors of the quasars as described in the text. | Instr. | 666Effective weights of quasars towards the correlation function in each redshift bin (see definition following Eqn 4) | |||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
(AB) | (AB) | (sec) | (All-) | (High-) | (Low-) | |||||||
1 | J03131806777Wang et al. (2021) | 7.642 | 20.94 | 19.96 | 25920 | 18.04 | 18.56 | 1.45 | MOSFIRE | 0.332 | 0.360 | 0.314 |
2 | J13420928888Bañados et al. (2018) | 7.541 | 20.30 | 20.03 | 12960 | 14.38 | 15.04 | 1.35 | MOSFIRE | 0.139 | 0.187 | 0.101 |
3 | J10072115999Yang et al. (2020) | 7.515 | 20.22 | 19.75 | 7920 | 9.82 | 10.82 | 1.33 | NIRES | 0.036 | 0.046 | 0.032 |
4 | J11200641101010Mortlock et al. (2011) | 7.085 | 20.35 | — | 114000 | 14.7 | 13.88 | 0.91 | XSHOOTER | 0.089 | 0.035 | 0.130 |
5 | J02520503111111Yang et al. (2019) | 7.001 | 20.19 | 19.92 | 13680 | 20.63 | 19.0 | 0.82 | MOSFIRE | 0.117 | 0.072 | 0.151 |
6 | J00381527121212Wang et al. (2018) | 7.034 | 19.69 | 19.33 | 7200 | 22.86 | 21.67 | 0.83 | MOSFIRE | 0.224 | 0.242 | 0.207 |
7 | J04110907131313Pons et al. (2019); Wang et al. (2019) | 6.826 | 20.02 | 19.56 | 5760 | 12.58 | 13.53 | 0.66 | NIRES | 0.028 | 0.017 | 0.036 |
8 | J03191008141414Yang et al. (2019) | 6.827 | 20.98 | — | 18720 | 7.61 | 8.48 | 0.66 | NIRES | 0.004 | 0.003 | 0.005 |
9 | newqso1151515Bañados et al. in prep | 7.0 | — | 20.39161616Priv. comm. | 9360 | 5.36 | 6.09 | 0.83 | NIRES | 0.001 | 0.001 | 0.001 |
10 | newqso2171717Yang et al. in prep | 7.1 | — | 19.61181818Priv. comm. | 5760 | 13.29 | 12.65 | 0.90 | MOSFIRE | 0.028 | 0.036 | 0.024 |
3 Data Reduction
We reduce our spectra with PypeIt191919https://pypeit.readthedocs.io/en/latest/, a Python package for semi-automated reduction of astronomical slit-based spectroscopy (Prochaska et al., 2020b, a). After performing standard calibration steps and flux calibration on each exposure, we coadd all exposures of the object and apply telluric correction on the combined spectrum. Since the spectra have different resolutions, wavelength grids, and pixel sizes, we coadd them all onto a common grid with a pixel size of 40 km/s. Finally, we restrict our analyses to a minimum wavelength of 1.95 m, corresponding to .
We fit for the quasar continuum using a B-spline with a breakpoint at every-60th pixel (amounting to a breakpoint at every 2400 km/s spacing) and we mask strong absorbers identified by-eye before fitting the continuum. We use the same breakpoint spacing for all the quasars. Figures 1 to 3 show the spectra before and after continuum normalization for representative members of our dataset. For each quasar, the gray shaded regions in the top panel indicate regions that are masked before continuum fitting due to visually-detected strong absorbers and masks provided by PypeIt from the data reduction. In the bottom panels, we mask regions due to proximity to quasars and where (gray), low SNR due to telluric absorption (blue), and detected CGM absorbers (red; see §4); these are excluded from the correlation function analyses.
We also mask pixels that fall within a quasar proximity zone, as these pixels will be affected by the quasar own ionizing radiation and produce biased results on the IGM neutral fraction. For the proximity zone (PZ) mask, we remove pixels within rest-frame km/s blueward from the quasar Mg ii emission line, where we define the Mg ii emission line to be the midpoint of the Mg ii Å and Å doublet at 2800 . Our choice of km/s follows Bosman et al. (2021), who consider pixels at 1185 for their Ly optical depth study, which results in a proximity zone size of 7643 km/s, or 8.64 proper Mpc at the mean redshift of our quasars, which is = 7.16. We use a fixed PZ size for all quasars as there are indications that quasar PZ sizes do not evolve strongly with redshift (Eilers et al., 2017; Satyavolu et al., 2023), and even in the presence of evolution, our adopted value of 8.64 proper Mpc is rather conservative given that the typical PZ size for quasars ranges from proper Mpc (Eilers et al., 2017; Eilers et al., 2020; Satyavolu et al., 2023). The unshaded regions are used for the correlation function measurements, and they range from to , with , where is the pixel redshift and is the median value. Table 1 includes the pathlength of each quasar that contributes towards the correlation function measurement, where the total pathlength of our dataset is or . is the absorption pathlength, where (Bahcall & Peebles, 1969). We compute for each quasar and take the sum as the total absorption pathlength quoted above.
We discover that the noise in the quasar spectra produced by PypeIt are slightly over- and under-estimated and not normally distributed, and we apply correction factors to correct for this such that the noise distribution is Gaussian. To determine the correction factors, we compute the residual after applying all masks (including CGM masks described in §4) and obtain the median absolute deviation (MAD) of the residual distribution for each quasar as the correction factor to the noise. We obtain correction factors of 0.972, 0.956, 0.908, 1.059, 1.086, 1.055, 0.93, 0.898, 0.88, 1.051 to the noise vectors of quasars no. 1 to no. 10 (see Table 1), respectively. After applying these corrections, the median SNR of our dataset is 13.7. Figure 4 zooms in on certain regions of the Mg ii forest from the normalized spectra, where the typical flux fluctuations given our data quality is 10%, or higher towards the redder part of the spectrum. Typical IGM absorbers are expected to produce percent-level fluctuations (see Figure 2 of H2021), so attempting to detect them at this SNR level will be challenging.








.


4 Masking Mg II absorbers of the CGM
As we are interested in measuring properties of the IGM, Mg ii absorbers within the virial radii of galaxies, namely those in their circumgalactic media (CGM), can bias our measurement (Figure 15 of H2021). While the strongest absorbers can be identified by eye, weaker but more abundant absorbers are difficult to detect visually. We briefly describe our procedures to mask out the CGM absorbers here, but see H2021 or Tie et al. (2022) for more details.
The first masking procedure uses the flux decrement () probability distribution function to filter out CGM absorbers. The rationale behind this procedure is that the PDF of the Mg ii forest will exhibit a tail towards large values (strong absorptions), which is sourced by rare strong CGM absorbers, and a peak at smaller value due to IGM absorbers (Figure 14 of H2021). Figure 5 (left panel) shows the flux PDF for our data, where we mask all pixels with (shaded region). The top axis shows the the pixel equivalent width, , where is the spectral pixel width at rest-frame Mg ii 2796 , which depends on the spectral sampling. Using our fixed common grid with km/s, .
While a flux cut helps remove strongly-absorbed pixels, it is usually insufficient for removing all affected pixels due to the broad absorption wings of CGM absorbers. As such, we supplement the flux cut with a second masking procedure, which identifies absorbers using the transmission field convolved with the profile of the Mg ii doublet. The convolved transmission field is known as the significance or field,
(1) |
where is the flux variance (noise vector) of the data and is a matched filter with being the optical depth of a Mg ii doublet with a Gaussian velocity distribution, assuming cm-2 and Doppler parameter , in which we use the FWHM of the respective quasar spectrum, except for the XSHOOTER spectrum of J11200641. For J11200641, we use a larger FWHM of 150 km/s, considering that we rebin all spectra to a fixed 40 km/s grid. Note that is the continuum-normalized flux and is the continuum-normalized variance. Given the spectrum, absorbers are identified by their extreme value. We mask out pixels above as well as pixels km/s around each masked pixel to account for their absorption wings. Figure 5 (right panel) shows the distribution of the data. In both panels, in addition to the distribution of each quasar (colored lines), we also plot the distribution for all quasars (black) and pure Gaussian noise sampled from the noise vectors of all quasars (blue). While the PDF from all quasars is similar to that of noise, the PDF shows evidence for fluctuations that are not pure noise at , since the noise PDF drops off at . Both PDFs also peak at some characteristic values as expected (see Figure 13 of Hennawi et al., 2021), although the PDF does not flatten off at large values and the plateau at large is weak in the PDF. In summary, we mask 1) pixels with and 2) pixels with and those km/s around each masked pixel.
Figure 6 shows the masked absorbers for the three highest- quasars in our dataset (we show the rest in Figure 18 and 19 in Appendix B for brevity). The green and magenta horizontal lines indicate our flux and cutoffs, respectively. We discover new strong absorber candidates in J03131806, J00381527, and newqso1, and recover known absorbers at (Simcoe et al., 2020) and in the spectrum of J13420928. We recover two of the three weak absorbers identified by Bosman et al. (2017) in J11200641, and for the one that we do not detect, its chi values are and it is masked manually. These absorbers are indicated by the blue ticks in Figure 18 in Appendix B.





5 Results
5.1 Auto-correlation measurement of the Mg II forest
We perform the correlation function measurement over three redshift bins:
-
1.
(“low-” bin, where )
-
2.
(“high-” bin, where )
-
3.
(“all-” bin, with )
Recall that is the redshift of the pixels that contributed to the correlation function and is the median value. We first compute the relative flux fluctuation for each continuum-normalized quasar spectrum () as
(2) |
where is the mean flux of the entire dataset in each redshift bin. We compute as the weighted average of the normalized flux for the entire dataset using the inverse variance of the corrected measurement noise (see below). We use the sub-optimal unbiased estimator for the correlation function as the weighted averaged of all pixel pairs (Slosar et al., 2011)
(3) |
where means summing over pixel pairs separated by the velocity lag . Rewriting the sum as a sum over all pixel pairs from each quasar , the correlation function can also be computed as the weighted average over the individual quasar correlation functions
(4) |
where , is the result of normalizing (i.e. ), and is the weighted estimate of the correlation function of each quasar .
Along similar lines, is computed as a weighted average via
(5) |
where is the continuum normalized flux of the th spectral pixel in the th quasar. The values of are 1.0015, 1.001, 1.0018 for the all-, high-, and low- bins, respectively, after masking out CGM absorbers.
For the weights, We use the inverse variance of the corrected measurement noise, , where is the error of the continuum normalized flux. While one should technically also include the variance due to the intrinsic fluctuations in the Mg ii forest, we neglect that term here as we are noise-dominated — based on H2021, for an IGM model with [Mg/H] = (), the typical fluctuation in the Mg ii forest is (2%), which requires a SNR of at least 20 (50), whereas the median SNR of our dataset is 13.7 (considering only the top five highest SNR quasars, the mean SNR is still 18).
We measure the correlation function of the Mg ii forest from km/s to km/s. Given that the correlation function is expected to rise at small scales and peak at the doublet separation (768 km/s), and flatten at large velocity separations, a fine bin size is needed to capture the small-scale rise and the correlation function peak, whereas a coarse bin size is sufficient at large scales. Therefore, to save computation time during the forward modeling step (§5.2), we adopt a heterogenous binning where we use a bin size of 80 km/s at km/s and a bin size of 200 km/s at km/s. Since we are using the weighted average over all quasars as the measurement for the dataset, given the heteregeneous SNR and pathlength of the quasars, they do not contribute equally to the final measurement. Table 1 shows the effective weights, , which quantifies the contribution of each quasar to the total correlation function for each redshift bin. For the all- bin, J03131806 has the highest weight (33%), followed by J00381527 (22%), J1342+0928 (14%), and J02520503 (12%). Similarly, the high- bin measurement is dominated by J03131806, J00381527, and J1342+0928, while the low- bin measurement is dominated by J03131806, J00381527, and J02520503. Given the unequal contributions of individual quasars to the correlation function measurement, we can compute a so-called effective sample size (Kish, 1995), which is an estimate of the sample size required for a random process where all data points carry uniform weight, to achieve the same level of variance as a random process with non-uniform weights. The effective sample size is given by , where is the weight of the -th data point. We find values of 4.9, 4.3, and 5.1 for the all-, high-, and low- bins, respectively, which is significantly less than the actual number of quasars in the dataset (ten).
Figures 7 to 9 show the correlation functions before and after we filter out CGM absorbers (see §4) in the three redshift bins. The colored points are the measurements for each quasar and the black points are the average from all quasars (Eqn 4). In the left panels of the figures, we detect a strong peak in the correlation function at the Mg ii doublet separation (768 km/s) that is due to strong CGM absorbers, as well as a rise at small velocity lags due to the small-scale velocity structure of the absorbers. The detection of the Mg ii peak at the correct location provides a sanity check of our methodology. We roughly estimate the error bars on the correlation function in the left panels as the dispersion between the individual correlation functions divided by , with being the total number of quasars in our dataset. We note that this error estimation is technically incorrect as our final correlation function measurement is a weighted one, but since we are only using these errors for visual purposes and given that bootstrapping a weighted measurement with a very small effective sample size () is challenging, we deem this to be sufficient. In the right panels, after masking out these strong absorbers, we detect a null signal consistent with noise in all redshift bins. Here, the error bars are obtained by taking the square root of the diagonal elements of the best-fit covariance matrix computed in §5.2.
We qualitatively compare our measurements with several IGM models from H2021, with varying [Mg/H] but fixed = 0.5, assuming FWHM = 120 km/s and sampling of 3 to simulate our data grid but noiseless skewers otherwise, as shown in Figure 10, where by-eye we can rule out models with high [Mg/H] . We perform a more rigorous inference in the next section.




5.2 Constraints on and [Mg/H] from Bayesian statistical inference
We forward model our quasar dataset and perform Bayesian inference to place constraints (upper limits) on the IGM metallicity and neutral fraction. We create 1000 realizations of each quasar in our dataset using the IGM models from H2021 spanning from = 0 to = 1.0 in bins of 0.02 and from to also in bins of 0.02. These IGM models are produced by combining a Nyx hydrodynamical simulation (Almgren et al., 2013; Lukić et al., 2015) of the pre-reionized IGM at , assuming the IGM is uniformly enriched with metals, with a modified version (Davies & Furlanetto, 2022) of the 21cmFAST semi-numerical simulations (Mesinger et al., 2011) of the reionization topology. The Nyx simulation was performed with a box size of 40 cMpc on a 20483 grid and we analyze the output at a single redshit snapshot at . The neutral fraction from 21cmFAST was computed on a 2563 grid over the Nyx simulation domain and evaluated at . Our 21cmFAST simulations adopt a fixed ionizing mean free path of 20 cMpc and adjust the ionizing efficiency to obtain the full range of models.
We forward model regions of the spectra outside of the quasar proximity zones. After convolving the mock Mg ii forest skewers with the FWHM and sampling of the respective instrument that the quasar is observed with, we interpolate them onto the observed coarse wavelength grid with a pixel size of 40 km/s Finally, we randomly sample the corrected noise vectors assuming a Gaussian distribution and these nois realizations to the simulated spectra. As the pathlength of a quasar in our dataset is longer than the pathlength of a mock skewer ( 60,000 km/s vs. 6567 km/s, see H2021), each forward-modeled spectrum is made up of randomly-drawn skewers such that the total pathlength of the two matches, where remaining pixels in the last skewer are not used in the analyses.
We generate forward-modeled datasets in total, where a mock dataset consists of randomly drawing one mock spectrum (i.e. multiple mock skewers per spectrum) from the 1000 realizations of each quasar. Finally, we apply the same masks and weights as the data to the forward-modeled data when computing the mock correlation function. We repeat the forward modeling in all three redshift bins (low-, high-, and all-). Figure 11 shows a comparison of real and forward-modeled spectra for the top three highest- quasars in our dataset (the rest are shown in Figure 20 and 21 in Appendix C), assuming an IGM model with (.



We perform Markov Chain Monte Carlo (MCMC) analysis using EMCEE (Foreman-Mackey et al., 2013), assuming a multivariate Gaussian likelihood as follows,
(6) |
where C is the covariance matrix, ) is the determinant of the covariance matrix, is the number of velocity bins over which we compute the correlation functions, and d = is the difference between the correlation function measured from the mock dataset and the model correlation function (henceforth abbreviated as ). As a forward-modeled spectrum consists of random skewers, we first compute the correlation function of each skewer individually and take the weighted average over these skewers as the correlation function for each spectrum. The correlation function for the mock dataset, , is computed as the weighted average over the individual quasar correlation function according to Eqn 4. We compute the model correlation function as the average over the individual noiseless correlation functions computed from the noiseless mock spectra. As discussed in §5.1, we measure the correlation function from a separation of 10 km/s to 3500 km/s in bins of 80 km/s at km/s and in bins of 200 km/s at km/s.
While the covariance matrix can in principle be calculated by bootstrapping from the actual data, given our small dataset where the effective sample size is 5, the covariance matrix estimated from the data will be too noisy. As such, we compute the elements of the covariance matrix from the forward models via
(7) |
where and refers to different bins of and the average is taken over the realizations of mock datasets.
For our MCMC sampling, we assume a flat linear prior on from 0 to 1 and a flat log prior on [Mg/H] from to . To speed up the MCMC sampling, we interpolate the covariance matrix, its determinant, and the model correlation function computed on our coarse model grid onto a finer grid with bins of 0.001 in and bins of 0.003 in [Mg/H].
As we describe in detail in Appendix A, a challenge that we encountered was that the MCMC samples appeared to be visually inconsistent with the data points and error bars. We determined that this issue arises from the presence of correlated negative correlation function values at some velocity lags in the data, which do not occur in the forward models. These strong correlated data points are improbable given the covariance matrix and thus yield MCMC samples and best-fit models that are visually inconsistent with the measurements. These spurious correlations arise from some unknown systematic in the data, which is most likely to be correlated noise resulting from sky-subtraction systematics. We developed a procedure that allows us to mask specific problematic velocity lags, to ensure sensible inference, as we describe in detail in Appendix A.
Figure 12 shows the resulting parameter constraints in the all- bin. Unsurprisingly, due to our null detection, we are unable to place any strong constraints on either parameters. To infer an upper limit on [Mg/H], we repeat the MCMC sampling with a flat linear prior on [Mg/H] and obtain an upper limit of at 95% C.L. The reason to adopt a linear prior for the upper limit is because the resulting upper limit is strongly affected by the prior range for a log prior, but not so for a linear prior. Figure 13 and 14 show the constraints on the high- and low- redshift bins, respectively. The corresponding 95% C.L. upper limits are [Mg/H] and [Mg/H] for the high- and low- bins, respectively.






6 Conclusions
The auto-correlation of the Mg ii forest, when treated as a continuous random field, has been shown to probe the neutral fraction and metallicity of the IGM (H2021). We measure this statistic for the first time using a sample of ten quasars observed with the Keck/MOSFIRE, Keck/NIRES, and VLT/X-SHOOTER spectrographs, where the median redshift of our measurement is . We also measure the correlation function over three redshift bins: “low-” (, ), “high-” (, ), and “all-” (). Our masking procedures to remove CGM absorbers successfully recover known strong absorbers as well as weaker absorbers.
The correlation function of the Mg ii forest without the CGM absorbers masked validates our measurement technique. We detect strong peaks in the correlation functions at the characteristic velocity separation of the Mg ii doublet (768 km/s) in all redshift bins as well as a rise towards small velocity lags owing to the velocity structure of CGM absorbers. After masking out the identified CGM absorbers, our correlation function measurements yield a null result that is consistent with noise in all redshift bins. We performed Bayesian inference where forward-modeled spectra were used to syntehsize a model-dependent covariance matrix of the data. Our MCMC analysis yields an upper limit of [Mg/H] at 95% confidence at a median redshift , while we are unable to place any strong constraints on the hydrogen neutral fraction. We place our upper limits in context with other literature measurements of the IGM metallicity in Figure 15. Simcoe (2011) (red points in figure) obtained an estimate of [C/H] via Voigt profile fitting of a sample of discrete H i absorbers with log, finding a median abundance [C/H] = at , which is times lower than similar measurements at using both C iv and O vi. On the other hand, Schaye et al. (2003) measured the distribution of C iv pixel optical depth as a function of the corresponding H i optical depth and converted this relation into an estimate of [C/H] as a function of density (the so-called pixel optical method). They measured a median [C/H] at in the low density IGM gas with log and found little evolution in [C/H] from to , as indicated by their best-fit model in the gray shaded region. To establish the existence or lack thereof of an evolution in the IGM metallicity, it is crucial to perform more measurements at and fill the redshift gap at and our results provide an important step in this direction.

Despite a null detection in this first study, a statistical method like the correlation function remains one of the best ways to measure the IGM metallicity at high redshift, which has never been attempted before due to observational difficulties and challenges posed by the traditional method of discrete line identification. Looking ahead to the future with spectra from JWST (James Webb Space Telescope; Gardner et al., 2006), we expect to place stronger constraints on primordial cosmic enrichment and reionization. For instance, using a sample of eight quasar spectra with a SNR/pixel of 76 from JWST/NIRSpec202020https://www.stsci.edu/jwst/science-execution/program-information?id=3526, we expect to jointly constrain [Mg/H] within 1 precision of 0.04 dex and within 1 precision of 10%, as well as place an upper limit of [Mg/H] on the IGM enrichment at 95% credibility. Finally, by combining very high- measurements from JWST that is probing the neutral IGM with ground-based measurements of, e.g. the C iv forest, at that is probing the ionized IGM, we may be able to detect the transition from forests of low-ionization metal absorbers to forests of high-ionization lines, which is the smoking gun of reionization as seen through metal absorption lines.
Acknowledgment
We acknowledge helpful conversations with the ENIGMA group at UC Santa Barbara and Leiden University. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No 885301). JFH acknowledges support from the National Science Foundation under Grant No. 1816006. JO acknowledges support from grants PID2022-138855NB-C32 and CNS2022-135878 from the Spanish Ministerio de Ciencia y Tecnologia. This research made use of Astropy212121http://www.astropy.org, a community-developed core Python package for Astronomy (Astropy Collaboration et al., 2013, 2018) and SciPy222222https://scipy.org (Virtanen et al., 2020).
Data availability
The data underlying this article will be shared on reasonable request to the corresponding author.
References
- Almgren et al. (2013) Almgren A. S., Bell J. B., Lijewski M. J., Lukić Z., Van Andel E., 2013, ApJ, 765, 39
- Asplund et al. (2009) Asplund M., Grevesse N., Sauval A. J., Scott P., 2009, ARA&A, 47, 481
- Astropy Collaboration et al. (2013) Astropy Collaboration et al., 2013, AAP, 558, A33
- Astropy Collaboration et al. (2018) Astropy Collaboration et al., 2018, AJ, 156, 123
- Bañados et al. (2018) Bañados E., et al., 2018, Nature, 553, 473
- Bahcall & Peebles (1969) Bahcall J. N., Peebles P. J. E., 1969, ApJL, 156, L7
- Becker et al. (2009) Becker G. D., Rauch M., Sargent W. L. W., 2009, ApJ, 698, 1010
- Becker et al. (2019) Becker G. D., et al., 2019, ApJ, 883, 163
- Bosman et al. (2017) Bosman S. E. I., Becker G. D., Haehnelt M. G., Hewett P. C., McMahon R. G., Mortlock D. J., Simpson C., Venemans B. P., 2017, MNRAS, 470, 1919
- Bosman et al. (2021) Bosman S. E. I., et al., 2021, arXiv e-prints, p. arXiv:2108.03699
- Cooper et al. (2019) Cooper T. J., Simcoe R. A., Cooksey K. L., Bordoloi R., Miller D. R., Furesz G., Turner M. L., Bañados E., 2019, ApJ, 882, 77
- D’Odorico et al. (2022) D’Odorico V., et al., 2022, MNRAS, 512, 2389
- Davies & Furlanetto (2022) Davies F. B., Furlanetto S. R., 2022, MNRAS, 514, 1302
- Eilers et al. (2017) Eilers A.-C., Davies F. B., Hennawi J. F., Prochaska J. X., Lukić Z., Mazzucchelli C., 2017, ApJ, 840, 24
- Eilers et al. (2020) Eilers A.-C., et al., 2020, ApJ, 900, 37
- Ferrara (2016) Ferrara A., 2016, in Mesinger A., ed., Astrophysics and Space Science Library Vol. 423, Understanding the Epoch of Cosmic Reionization: Challenges and Progress, Springer International Publishing, Switzerland p. 163
- Foreman-Mackey et al. (2013) Foreman-Mackey D., Hogg D. W., Lang D., Goodman J., 2013, PASP, 125, 306
- Gardner et al. (2006) Gardner J. P., et al., 2006, Space Sci. Rev., 123, 485
- Hennawi et al. (2021) Hennawi J. F., Davies F. B., Wang F., Oñorbe J., 2021, MNRAS, 506, 2963
- Jaacks et al. (2018) Jaacks J., Thompson R., Finkelstein S. L., Bromm V., 2018, MNRAS, 475, 4396
- Karaçaylı et al. (2023) Karaçaylı N. G., et al., 2023, arXiv e-prints, p. arXiv:2302.06936
- Kirihara et al. (2020) Kirihara T., Hasegawa K., Umemura M., Mori M., Ishiyama T., 2020, MNRAS, 491, 4387
- Kish (1995) Kish L., 1995, Survey Sampling Wiley
- Liu & Bromm (2020) Liu B., Bromm V., 2020, MNRAS, 497, 2839
- Lukić et al. (2015) Lukić Z., Stark C. W., Nugent P., White M., Meiksin A. A., Almgren A., 2015, MNRAS, 446, 3697
- McLean et al. (2010) McLean I. S., et al., 2010, in McLean I. S., Ramsay S. K., Takami H., eds, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series Vol. 7735, Ground-based and Airborne Instrumentation for Astronomy III. p. 77351E
- McLean et al. (2012) McLean I. S., et al., 2012, in McLean I. S., Ramsay S. K., Takami H., eds, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series Vol. 8446, Ground-based and Airborne Instrumentation for Astronomy IV. p. 84460J
- Mesinger et al. (2011) Mesinger A., Furlanetto S., Cen R., 2011, MNRAS, 411, 955
- Miralda-Escudé & Rees (1998) Miralda-Escudé J., Rees M. J., 1998, ApJ, 497, 21
- Mortlock et al. (2011) Mortlock D. J., et al., 2011, Nature, 474, 616
- Oh (2002) Oh S. P., 2002, MNRAS, 336, 1021
- Oppenheimer & Davé (2006) Oppenheimer B. D., Davé R., 2006, MNRAS, 373, 1265
- Oppenheimer et al. (2009) Oppenheimer B. D., Davé R., Finlator K., 2009, MNRAS, 396, 729
- Planck Collaboration et al. (2020) Planck Collaboration et al., 2020, AAP, 641, A6
- Pons et al. (2019) Pons E., McMahon R. G., Simcoe R. A., Banerji M., Hewett P. C., Reed S. L., 2019, MNRAS, 484, 5142
- Prochaska et al. (2020a) Prochaska J. X., et al., 2020a, pypeit/PypeIt: Release 1.0.0, Zenodo, 10.5281/zenodo.3743493
- Prochaska et al. (2020b) Prochaska J. X., et al., 2020b, Journal of Open Source Software, 10.21105/joss.02308, 5, 2308
- Ryan-Weber et al. (2006) Ryan-Weber E. V., Pettini M., Madau P., 2006, MNRAS, 371, L78
- Satyavolu et al. (2023) Satyavolu S., et al., 2023, MNRAS, 522, 4918
- Schaye et al. (2003) Schaye J., Aguirre A., Kim T.-S., Theuns T., Rauch M., Sargent W. L. W., 2003, ApJ, 596, 768
- Simcoe (2011) Simcoe R. A., 2011, ApJ, 738, 159
- Simcoe et al. (2011) Simcoe R. A., et al., 2011, ApJ, 743, 21
- Simcoe et al. (2020) Simcoe R. A., Onoue M., Eilers A.-C., Banados E., Cooper T. J., Furesz G., Hennawi J. F., Venemans B., 2020, arXiv e-prints, p. arXiv:2011.10582
- Slosar et al. (2011) Slosar A., et al., 2011, JCAP, 2011, 001
- Songaila (1997) Songaila A., 1997, ApJL, 490, L1
- Songaila & Cowie (1996) Songaila A., Cowie L. L., 1996, AJ, 112, 335
- Tie et al. (2022) Tie S. S., Hennawi J. F., Kakiichi K., Bosman S. E. I., 2022, arXiv e-prints, p. arXiv:2201.10571
- Vernet et al. (2011) Vernet J., et al., 2011, AAP, 536, A105
- Virtanen et al. (2020) Virtanen P., et al., 2020, Nature Methods, 17, 261
- Wang et al. (2018) Wang F., et al., 2018, ApJL, 869, L9
- Wang et al. (2019) Wang F., et al., 2019, ApJ, 884, 30
- Wang et al. (2021) Wang F., et al., 2021, ApJL, 907, L1
- Wilson et al. (2004) Wilson J. C., et al., 2004, in Moorwood A. F. M., Iye M., eds, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series Vol. 5492, Ground-based Instrumentation for Astronomy. pp 1295-1305
- Yang et al. (2019) Yang J., et al., 2019, AJ, 157, 236
- Yang et al. (2020) Yang J., et al., 2020, ApJL, 897, L14
Appendix A
During our investigations, we discovered that our likelihood yields MCMC samples and best-fit models that appear visually inconsistent with the actual measurements and error bars. The source of this discrepancy is correlated velocity bins with negative values in the data that are not supposed to occur given the correlation structure of the covariance matrix. To rectify this, we mask the outlying velocity bins before performing the MCMC. To determine which bins should be masked, we compare values of the correlation function measured from the data with values measured from the forward models in which the IGM model has no detectable signal. This is because which velocity bins are outliers depend on the structure of the covariance matrix, which is model-dependent. Therefore, we choose a model with no signal to remove sensitivity towards the signal.
Figures 16 and 17 show the correlation functions in one velocity lag bin plotted against the values of all other velocity bins for the all- bin. The black points are measured from forward models with = 0.50 and [Mg/H] = , where the ellipses denote their 1, 2, and 3 contours. Measurements from the real data that fall outside of the 3 ellipse are marked as red while measurements that fall within the 3 ellipse are marked as green. We mask the velocity bin where 20 or more of the real measurements fall outside the 3 ellipse; Figure 16 shows two examples of bad bins that are masked, while Figure 17 shows two examples of good bins that we retain. In conclusion, we mask = 370, 450, 1170, and 1490 km/s in the all- bin, = 50 and 1090 km/s in the high- bin, and = 50, 130, 290, 370, 450, 770, 930, and 1490 km/s in the low- bin (see Figures 12 - 14).
We generate the forward-modeled datasets and compute their covariance matrices for all bins and only apply the bin masking before the MCMC sampling, in which we mask the bad bins in both the correlation function and the covariance matrix. The best-fit models and upper limits are therefore obtained from the masked data and masked covariance matrices.




Appendix B
Here we show the masked absorbers for the rest of the quasars in our dataset that are not shown in §4, which are Figure 18 and 19.





Appendix C
Here we show the forward-modeled spectra for the rest of the quasars in our dataset that are not shown in §5, assuming an IGM model with (. These are shown in Figure 20 and 21.




