This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Mitigating radio frequency interference in CHIME/FRB real-time intensity data

Masoud Rafiei-Ravandi Department of Physics, McGill University, 3600 rue University, Montréal, QC H3A 2T8, Canada McGill Space Institute, McGill University, 3550 rue University, Montréal, QC H3A 2A7, Canada Perimeter Institute for Theoretical Physics, 31 Caroline Street N, Waterloo, ON N25 2YL, Canada Department of Physics and Astronomy, University of Waterloo, Waterloo, ON N2L 3G1, Canada Kendrick M. Smith Perimeter Institute for Theoretical Physics, 31 Caroline Street N, Waterloo, ON N25 2YL, Canada Masoud Rafiei-Ravandi [email protected]
Abstract

Extragalactic fast radio bursts (FRBs) are a new class of astrophysical transient with unknown origins that have become a main focus of radio observatories worldwide. FRBs are highly energetic (1036\sim 10^{36}104210^{42} erg) flashes that last for about a millisecond. Thanks to its broad bandwidth (400–800 MHz), large field of view (\sim200 sq. deg.), and massive data rate (1500 TB of coherently beamformed data per day), the Canadian Hydrogen Intensity Mapping Experiment / Fast Radio Burst (CHIME/FRB) project has increased the total number of discovered FRBs by over a factor 10 in 3 yr of operation. CHIME/FRB observations are hampered by the constant exposure to radio frequency interference (RFI) from artificial devices (e.g., cellular phones, aircraft), resulting in \sim20% loss of bandwidth. In this work, we describe our novel technique for mitigating RFI in CHIME/FRB real-time intensity data. We mitigate RFI through a sequence of iterative operations, which mask out statistical outliers from frequency-channelized intensity data that have been effectively high-pass filtered. Keeping false-positive and false-negative rates at very low levels, our approach is useful for any high-performance surveys of radio transients in the future.

High energy astrophysics (739), Radio telescopes (1360), Radio transient sources (2008)

1 introduction

The search for extragalactic fast radio bursts (FRBs) has become a central undertaking for radio telescopes worldwide. FRBs are brief (\simmillisecond) flashes of radio signal with unknown origins. To date, over 750 FRBs have been discovered (see, e.g., CHIME/FRB Catalog 1 in CHIME/FRB Collaboration, 2021).111Catalogs of known FRBs are available on https://www.herta-experiment.org/frbstats (Spanakis-Misirlis, 2021) and https://www.wis-tns.org (Petroff & Yaron, 2020). This includes 25 repeating sources that emit multiple bursts sporadically over long (\simdays to months) time intervals, and about 20 (repeating and nonrepeating) sources that are localized to host galaxies (for a recent review, see Petroff et al., 2022).

The FRB arrival time tt at the observing frequency ν\nu is delayed proportional to ν2\nu^{-2}, owing to the dispersion of radio waves in the intervening cold plasma along the line of sight. The delay is measured by the dispersion measure (DM), which sets the proportionality constant in the dispersion relation. FRBs are found by integrating the total intensity I(t,ν)I(t,\nu) across frequency along different dispersion profiles and searching for sharp peaks in the time profiles in the (t,DMt,\mbox{DM}) space.

Most FRBs have been discovered by the Canadian Hydrogen Intensity Mapping Experiment / Fast Radio Burst (CHIME/FRB) instrument (CHIME/FRB Collaboration, 2018), which is equipped with an incoherent dedispersion search engine (out to a maximum DM of 13,000 pc cm-3) operating constantly between 400 and 800 MHz. CHIME/FRB is one of the four digital backends that are connected to the CHIME telescope, which has a large field of view (\sim200 deg2). The CHIME telescope and its digital backends (CHIME Collaboration, 2022) are located at the Dominion Radio Astrophysical Observatory (DRAO), which is in a radio-quiet area in the Okanagan Valley, British Columbia, Canada. Despite being regulated by local and federal governments, the CHIME radio bandwidth is exposed to a broad range of anthropogenic signals from various sources nearby (e.g., cellular phones) and far away (e.g., aircraft and satellites) that interfere with daily observations of astrophysical phenomena. If not mitigated properly, even a very short (\sim10 ms) blip of radio frequency interference (RFI) could trigger thousands of false positives in the CHIME/FRB instrument.

RFI can be very unpredictable in the (t,ν)(t,\nu) space. Several RFI mitigation techniques have been effective in the past. Most notably, impulsive RFI could be masked through an iterative threshold-based scheme (see, e.g., the SumThreshold algorithm in Offringa et al., 2010a, b) or a statistical estimator such as the spectral kurtosis (see, e.g., Nita & Gary, 2010a, b; Taylor et al., 2019). Periodic RFI could be detected more robustly in the harmonic space of intensity (see, e.g., Maan et al., 2021). In addition, large-scale RFI variations could be removed by fitting a baseline (see, e.g., Eatough et al., 2009; Zhang et al., 2022). Furthermore, general RFI patterns could be flagged through machine-learning algorithms (see, e.g., Akeret et al., 2017; Czech et al., 2018; Yang et al., 2020). In this work, we describe our strategy for mitigating RFI in the CHIME/FRB data. CHIME/FRB science requirements combined with its data rate (1500 TB/day) introduced computational challenges that led to the design and development of extremely optimized software from scratch.

As a brief overview, CHIME/FRB masks RFI through a long sequence of iterative operations in a real-time pipeline. These operations have been fine-tuned in order to account for the computational cost while simultaneously minimizing the false-positive and false-negative rates. The pipeline computes a time-varying RFI mask for the frequency-channelized intensity data, so that RFI can be suppressed prior to the dedispersion transform. This technique is the first step in our RFI mitigation and is the only one that works directly with the intensity data to lower the number of false positives from 105\sim 10^{5} to a few per day, while detecting FRBs at an unprecedented rate (CHIME/FRB Collaboration, 2021). Despite being designed for a specific instrument, our algorithms could in principle be applied to other real-time searches of radio transients such as pulsar surveys.

The paper is organized as follows. We present an overview of the CHIME/FRB software pipeline in §2. In §3, we present a detailed account of our RFI mitigation strategy. We discuss results, unsolved problems, and potential opportunities for improvement in §4. Throughout, we adopt the CHIME/FRB data as a running example.

2 CHIME/FRB pipeline

The CHIME/FRB instrument (CHIME/FRB Collaboration, 2018) has multiple processing levels (denoted by L1, L2, etc.) that have been running constantly except during upgrade cycles and emergency shutdowns since mid-2018 (for a full discussion of telescope up time, see CHIME Collaboration, 2022). In L0 (the CHIME FX correlator, containing 256 processing nodes; also see CHIME Collaboration, 2022), frequency-channelized data are coherently arranged in an array of 4×2564\times 256 digitally formed beams. In L1 (the real-time FRB search engine, running on 128 processing nodes, each with two 10-core Intel E5-2630v4 CPUs and 128 GB of RAM), beamformed packets are assembled into a time series of total intensity samples. Then, RFI is mitigated by generating a mask (zeros and ones), which is subsequently used for detrending (or effectively high-pass filtering) and dedispersing the intensity (see Figure 1). Once the intensity is dedispersed, L1b groups and ranks astrophysical vs. RFI-triggered events based on their signal-to-noise ratio (S/N), arrival time, DM, and sky location. Next, L2/L3 labels sources “Galactic” or “extragalactic” and depending on this and on, e.g., S/N, decides what action to take (e.g., whether to store buffered intensity data). Finally, metadata are stored in a database (L4).

Refer to caption

Refer to caption

Refer to caption

Figure 1: CHIME/FRB L1 data for a single beam, containing a pulsar transit along with RFI. Top panel: raw intensity with persistent RFI between 600 and 650 MHz. The relatively low-intensity region centered at \sim750 MHz corresponds to the long-term evolution (LTE) band, which is usually active. Multiple scatterings of radio waves off the feed structure cause the systematic “ripple” effect with a period of \approx30 MHz across the bandwidth (CHIME Collaboration, 2022). Black regions indicate missing data. Middle panel: detrended intensity, including the missing data and RFI mask (black regions) prior to the dedispersion transform (see Figure 2). Dispersed sweeps (ν2\propto\nu^{-2}) across the bandwidth, particularly visible in the 600–400 MHz range at \approx2.2 second intervals, are signals from a nearby pulsar. Bottom panel: dedispersion space of events, including the pulsar (green and yellow dots at DM 100\lesssim 100 pc cm-3). The RFI event at (t128.8t\approx 128.8 s, DM 6500\approx 6500 pc cm-3) has a low signal-to-noise ratio (7S/N<107\leq{\rm S/N}<10) and is below the detection threshold.

We define a “false positive” as an event with nonastrophysical origin (e.g., RFI, thermal noise, and faulty calibration) and S/N \geq 10 that passes through L1–L4 and is classified as an astrophysical candidate. We emphasize that the main CHIME/FRB search is carried out over the dedispersion space. Resembling FRBs by randomly lining up as ν2\nu^{-2} in the intensity space, RFI has been by far the largest contributing factor to the CHIME/FRB false positives, which could easily overwhelm L2–L4 pipelines. In addition, unmasked RFI could reduce the true-positive rate. The L1 system computes a running estimate of the variance of intensity, and unmasked RFI could contaminate this estimate, lowering the overall system sensitivity, severely impacting the true-positive rate for various selection criteria. As such, we mitigate false positives in all levels of the intensity pipeline: L1 masks raw intensity data without incorporating beam information, while L1b–L3 employ machine-learning classifiers to discriminate between true and false positives after sky localization utilizing ancillary information about the system performance and environment. This paper focuses on the RFI mitigation prior to the dedispersion transform in L1.

The L1 software package contains multiple object-oriented code libraries with distinct purposes. For instance, “ch_frb_io222ch_frb_io: https://github.com/CHIMEFRB/ch_frb_io assembles L0 data packets into a stream of 1 s chunks of masked intensity data,333Intensity values are floating-point numbers between 0 and 256. Initial masks contain zeros and ones that correspond to bad and good intensity data, respectively. which are then fed into “rf_pipelines444rf_pipelines: https://github.com/kmsmith137/rf_pipelines for RFI mitigation. Figure 2 shows the high-level logic of the RFI mitigation pipeline as configured and implemented for CHIME/FRB.

Each box in the diagram represents an operation (or “transform”) whose input and output arrays are represented as arrows. All arrays are 2D arrays indexed by (t,νt,\nu), which are processed in 4 s chunks (see §3). The “intensity stream” continuously emits arrays of the intensity ItνI_{t\nu} and mask MtνM_{t\nu} that are downsampled from 16k to 1k frequency channels as follows:

Itν\displaystyle I_{t\nu} =\displaystyle= n=015Mt,16ν+nIt,16ν+nn=015Mt,16ν+n\displaystyle\frac{\mathop{\textstyle\sum}\limits_{n=0}^{15}{M_{t,16\nu+n}\,I_{t,16\nu+n}}}{\mathop{\textstyle\sum}\limits_{n=0}^{15}{M_{t,16\nu+n}}} (1)
Mtν\displaystyle M_{t\nu} =\displaystyle= 116n=015Mt,16ν+n,\displaystyle\frac{1}{16}\mathop{\textstyle\sum}\limits_{n=0}^{15}{M_{t,16\nu+n}}\,, (2)

where the masks are treated as nonbinary weights. Next, downsampled data are processed inside the “subpipeline” in order to update the mask, which is then upsampled from 1k to 16k frequency channels for mitigating RFI in the raw intensity data. We upsample arrays by copying and tiling elements along the downsampled axis (e.g., upsampling the 1D array [0,0.5,1][0,0.5,1] by a factor 2 yields [0,0,0.5,0.5,1,1][0,0,0.5,0.5,1,1]). This down/upsampling logic is essential for minimizing the computational cost while maintaining a robust yet coarse-grained estimate of time-varying RFI patterns across the CHIME/FRB bandwidth at 1 ms time resolution. In §3, we take a deep dive into the core of our RFI mitigation algorithm inside the “subpipeline.”

Using the upsampled mask, we detrend and dedisperse the raw intensity with 16k frequency channels. Thus, the dedispersion transform continuously receives a set of (detrended intensity, RFI mask) arrays with 16k frequency channels as input. Then, the dedispersion transform replaces masked regions555The dedispersion transform can be configured to expand the boundaries of masks in order to suppress potentially unmasked residuals from highly active RFI in neighboring frequency and time samples. This is the last line of defense against RFI-triggered false positives in L1. of intensity arrays by an estimate of the running variance based on a white noise model for the input intensity values and a slowly varying function for the variance of unmasked intensity along the time axis. In addition, the running variance is adopted for normalizing dedispersed events to S/N. Finally, the dedispersion transform outputs a 4D array (DM, scattering measure, spectral index, time) of coarse-grained trigger statistics, where the coarse-graining is done by taking the maximum over time and DM.

The entire chain of L1 operations is fixed in the CHIME/FRB instrument; a slight modification in the algorithm requires a system-wide reconfiguration and restart.666CHIME/FRB L1 configuration files are available in ch_frb_rfi: https://github.com/mrafieir/ch_frb_rfi Although very rare, such reconfigurations are necessary when RFI starts to show up in a newly allocated radio band, e.g., owing to the 5G network. To that end, we alert system operators777The instrument is monitored daily by members of the CHIME/FRB collaboration. to save several hours of intensity data during active RFI. Using the saved data, we attempt to reproduce the culprit for a close visual inspection. We can often find a simple hotfix that will suppress the new RFI effectively (see §3.3).

Refer to caption

Figure 2: CHIME/FRB L1 pipeline for processing frequency-channelized intensity data in real time. L1 mitigates RFI for 16k frequency channels (Nν=16kN_{\nu}={\rm 16k}) by constantly generating a downsampled mask through a long sequence of operations inside the “subpipeline,” which reduces the overall computational cost by a factor \approx16 (see Figure 3). The RFI mask is upsampled back to 16k frequency channels for detrending and dedispersing the raw intensity.

3 RFI transform chains

The “subpipeline” (see Figure 2) contains a sequence of over 100 transforms for masking RFI. This long chain of operations is constructed mainly by iterating over the clipping and detrending transforms, which are described in §3.1 and §3.2, respectively. We took the iterative approach based on the observation that RFI contaminates the probability distribution function (PDF) of intensity (and its higher moments) through statistical outliers (e.g., 5σ\sigma deviations from the mean of intensity in a specific frequency channel) that can be masked incrementally.

While our clipping and detrending transforms can generally recognize and suppress RFI through the upsampled mask, they can also leave some residuals that induce false positives after the final round of detrenders. For example, unmasked narrowband RFI could cause a series of ripples in the detrended intensity along the frequency axis. Under the right circumstances, these ripples could induce false positives downstream. Therefore, we need to account for such residual-induced false positives as well as RFI-triggered false positives throughout the L1 pipeline (see, e.g., Figure 4). In addition, we require an overall latency of \sim10 s for reporting events in real time. This is to facilitate detection of counterparts using other telescopes, achieved through automated real-time alerts (Zwaniga, 2021). Also, our baseband (voltage) recording system (Michilli et al., 2021) imposes an upper bound (4 s) on the size of intensity chunks in the L1 ring buffers.

Figure 3 illustrates the CHIME/FRB RFI transform chain nested inside the subpipeline.

Refer to caption

Refer to caption

Refer to caption

Figure 3: CHIME/FRB L1 subpipeline for generating dynamic RFI masks over intensity chunks with 1k frequency channels. Top panel: high-level diagram of the subpipeline, including the initial static mask, clipping and detrending transform chains in a specific order as described in §3. Middle panel: clipping transform chain that is iterated six times (circular arrow). Bottom panel: detrending transform chain, which is also executed for raw intensity data with 16k frequency channels (see Figure 2).

The simplest form of RFI transforms is a static mask, which constantly eliminates a set of known frequency channels exhibiting intense RFI. We place the static mask at the very beginning of our chain in order to minimize residuals, which can easily distort the intensity PDF.

Following the static mask, clipping transforms or “clippers” take the streaming chunk of intensity data and mask out statistical outliers dynamically. These outliers could be deviations from the mean or standard deviation (stdv) of the intensity PDF along an arbitrary axis (i.e., time and/or frequency). Then, a sequence of detrending transforms or “detrenders” remove large-scale intensity variations (along time and frequency) from the data. Next, a second round of clipping transforms is applied to the modified intensity PDF. We note that both the mask and intensity data stream are updated throughout this process inside the subpipeline. However, only the mask is upsampled and passed to the last round of detrenders prior to the dedispersion over 16k frequency channels. The last round of detrenders is required for the CHIME/FRB dedispersion transform, which assumes a zero baseline for the incoming intensity data stream. Finally, the upsampled mask (\sim20% loss of bandwidth on average) and detrended intensity data are passed to the dedispersion transform.

Ideally, the specific configuration of RFI transform chains would be determined through a global minimization problem, which would aim at minimizing the computational cost along with the false-positive and false-negative rates. In actuality, we configured the CHIME/FRB RFI transform chain empirically; although not necessarily optimal, we configured transforms through trial and error using hundreds of hours of intensity acquisitions.888Other than with the real-time pipeline, L1 can be run in an offline mode with saved intensity data. The remainder of this section contains a detailed account of the RFI transform chains.

3.1 Clipping transform chain

CHIME/FRB beamformed intensity data are constructed by summing squared voltages Va(ν)Vb(ν)\langle V_{a}(\nu)V_{b}^{*}(\nu^{\prime})\rangle from the CHIME antennas aa and bb (see, e.g., Masui et al., 2019). In the absence of RFI, we expect the raw voltages to be Gaussian distributed (Fridman & Baan, 2001) and hence the intensity to be χ2\chi^{2} distributed. In CHIME/FRB, we developed two different types of transforms for masking statistical outliers in the intensity data: intensity and standard deviation (stdv) clipping transforms.

Let ItνI_{t\nu} represent the 2D array of possibly downsampled masked intensity, which is a function of time (tt) and frequency (ν\nu). Then, the intensity clipping transform updates the mask MtνM_{t\nu} by zeroing out elements that satisfy

|ItνI¯tν|(σcVar(I)tν1/2),\left|I_{t\nu}-\bar{I}_{t\nu}\right|\geq\left(\sigma_{c}\mbox{Var}(I)_{t\nu}^{1/2}\right), (3)

where the threshold level σc\sigma_{c} is a configurable constant, and the mean intensity array I¯tν\bar{I}_{t\nu} and variance array Var(I)tν\mbox{Var}(I)_{t\nu} are computed based on the initial intensity ItνI_{t\nu} along a specific axis in time and/or frequency in two stages as follows:

Itνinternallyclip[I¯tν,Var(I)tν]reshape[I¯tν,Var(I)tν],I_{t\nu}\xrightarrow[\rm internally]{\rm clip}\left[\bar{I}_{t^{\prime}\nu^{\prime}},\mbox{Var}(I)_{t^{\prime}\nu^{\prime}}\right]\xrightarrow{{\rm reshape}}\left[\bar{I}_{t\nu},\mbox{Var}(I)_{t\nu}\right], (4)

where the indices tt^{\prime} and ν\nu^{\prime} refer to arrays with a reduced length, e.g., computing the mean along the time axis of an array with shape (Nt,NνN_{t},N_{\nu})=(4k, 16k) results in a new array with shape (Nt,NνN_{t^{\prime}},N_{\nu^{\prime}})=(1, 16k). In Equation (4), the first stage is the computation of the masked mean and variance along an axis:

I¯tν\displaystyle\bar{I}_{t^{\prime}\nu^{\prime}} =\displaystyle= m=0Dt1n=0Dν1MxyIxym=0Dt1n=0Dν1Mxy\displaystyle\frac{\mathop{\textstyle\sum}\limits_{m=0}^{D_{t}-1}\,\mathop{\textstyle\sum}\limits_{n=0}^{D_{\nu}-1}{M_{xy}\,I_{xy}}}{\mathop{\textstyle\sum}\limits_{m=0}^{D_{t}-1}\,\mathop{\textstyle\sum}\limits_{n=0}^{D_{\nu}-1}{M_{xy}}} (5)
Var(I)tν\displaystyle\mbox{Var}(I)_{t^{\prime}\nu^{\prime}} =\displaystyle= m=0Dt1n=0Dν1Mxy(IxyI¯xy)2m=0Dt1n=0Dν1Mxy,\displaystyle\frac{\mathop{\textstyle\sum}\limits_{m=0}^{D_{t}-1}\,\mathop{\textstyle\sum}\limits_{n=0}^{D_{\nu}-1}M_{xy}{\left(I_{xy}-\bar{I}_{xy}\right)^{2}}}{\mathop{\textstyle\sum}\limits_{m=0}^{D_{t}-1}\,\mathop{\textstyle\sum}\limits_{n=0}^{D_{\nu}-1}{M_{xy}}}, (6)

where the indices xtDt+mx\equiv tD_{t}+m and yνDν+ny\equiv\nu D_{\nu}+n are defined with the downsampling factors DtNt/NtD_{t}\equiv N_{t^{\prime}}/N_{t} and DνNν/NνD_{\nu}\equiv N_{\nu^{\prime}}/N_{\nu} along the time and frequency axis, respectively. Here, the possibly downsampled mask MxyM_{xy} is treated as nonbinary weights. In addition, an intensity clipping transform can have an internal iteration that temporarily masks out intensity outliers nc,inn_{c,{\rm in}} times prior to the final computation of mean and variance. We configure the internal iteration by the threshold parameter σc,in\sigma_{c,{\rm in}}. Whether internal or external, every clipping iteration helps improve the RFI mask by recognizing more statistical outliers and hence reshaping the masked intensity PDF to a robust χ2\chi^{2} distribution in real time. In the second stage, we reshape these masked arrays by copying and tiling elements along the reduced axis. This recovers the initial shape of input intensity data, enabling the final element-wise operation in Equation (3).

Likewise, the stdv clipping transform masks RFI along the time or frequency axis using the following criterion:

|JtνJtν¯|(σcVar(Jtν)1/2),\left|J_{t^{\prime}\nu^{\prime}}-\overline{J_{t^{\prime}\nu^{\prime}}}\right|\geq\left(\sigma_{c}\mbox{Var}(J_{t^{\prime}\nu^{\prime}})^{1/2}\right), (7)

where the masked array JtνVar(I)tνJ_{t^{\prime}\nu^{\prime}}\equiv\mbox{Var}(I)_{t^{\prime}\nu^{\prime}} with no internal clipping, and the mean of variance Jtν¯\overline{J_{t^{\prime}\nu^{\prime}}} and variance of variance Var(Jtν)\mbox{Var}(J_{t^{\prime}\nu^{\prime}}) are scalar. Unlike the intensity clippers, the stdv clippers “reshape” the running mask to match its shape (Nt,NνN_{t^{\prime}},N_{\nu^{\prime}}) with the initial intensity ItνI_{t\nu} only after executing Equation (7).

Table 1 lists the CHIME/FRB clipping transform chain, which is iterated ncn_{c} times before and after the intensity detrenders, i.e., the clipping transform chain is a nested loop inside an outer loop over detrenders. The first three stdv clippers are along the time axis, and have a 3σ\sigma threshold with no down/upsampling factors. Then, two stdv clippers are applied along the frequency axis, with a 3σ\sigma threshold and no down/upsampling factors. Next, two intensity clippers are applied at a 5σ\sigma threshold. Finally, two intensity clippers are applied with downsampling factors of 16 and 2 along the time and frequency axis, respectively. The downsampling logic helps expand the mask and hence prevent intensity leakage over RFI-contaminated regions.

ii type σc\sigma_{c} σc,in\sigma_{c,{\rm in}} nc,inn_{c,{\rm in}} axis NtN_{t} DtD_{t} DνD_{\nu}
1 stdv 3 - - tt 4096 1 1
2 stdv 3 - - tt 4096 1 1
3 stdv 3 - - tt 4096 1 1
4 stdv 3 - - ν\nu 4096 1 1
5 stdv 3 - - ν\nu 4096 1 1
6 intensity 5 5 9 ν\nu 4096 1 1
7 intensity 5 5 9 tt 4096 1 1
8 intensity 5 3 9 (t,ν)(t,\nu) 4096 16 2
9 intensity 5 3 9 ν\nu 4096 16 2
Table 1: Clipping transform chain for dynamically masking RFI in the CHIME/FRB intensity stream. The index ii specifies the position of each transform in the chain. The intensity and standard deviation (stdv) transforms clip statistical outliers at the threshold level σc\sigma_{c} from the mean (see Equations 3 and 7). In addition, the intensity clipping transform has an internal iteration which allows for temporarily masking outliers at the σc,in\sigma_{c,{\rm in}} level prior to the final clipping operation. The number of internal iterations is set by the parameter nc,inn_{c,{\rm in}}. NtN_{t} is the total number of 1 ms time samples in an intensity chunk that each transform processes in real time. In each transform, the incoming chunks are downsampled (and consequently final coarse-grained masks are upsampled) by the two factors DtD_{t} and DνD_{\nu} along the time and frequency axis, respectively (see, e.g., Equations 5 and 6). Operations along the axis (t,ν)(t,\nu) are carried out over the 2D array of intensity values.

Since RFI is a non-Gaussian process (Fridman & Baan, 2001; Mirhosseini, 2020), we expect RFI-triggered false positives to appear more distinctly at higher statistical moments of the intensity. Therefore, we place the stdv transforms at the very beginning of the clipping transform chain. Additionally, we refrain from downsampling the intensity in the beginning, since coarse-grained masks lack precision in containing only regions with RFI. In the case of intensity clippers, internal iterations are effective in decontaminating PDFs at a low computational cost. On the other hand, RFI residuals can hide behind internal masks and hence contaminate the stream if σc,in\sigma_{c,{\rm in}} and nc,inn_{c,{\rm in}} are not set properly. Finally, we mask typically along the time axis first, since RFI tends to arrive in constant frequency bands. Furthermore, we maximize the chunk size NtN_{t}, which is constrained mainly by the latency requirement for the CHIME/FRB baseband recording system. We find empirically that by iterating the clipping transform chain (Table 1) six times before and after a chain of detrenders (see §3.2), we are able to sufficiently suppress RFI while detecting real pulsars in CHIME/FRB intensity acquisitions.999We initially designed the RFI transform chain through the analysis of incoherent beam data (400–800 MHz, NνN_{\nu}=1k) from the CHIME Pathfinder (Bandura et al., 2014) and single-beam data (400–800 MHz, NνN_{\nu}=16k) from the Galt 26 m telescope. These instruments share the same RFI environment with CHIME/FRB at DRAO. The current configuration is based on seven different production-level transform chains that were designed to either mitigate new sources of RFI or adapt to other pipeline requirements (e.g., constraining the buffer size NtN_{t} for the baseband recording system) sporadically since the first CHIME/FRB commissioning in mid-2018.

3.2 Detrending transform chain

The CHIME/FRB intensity stream exhibits large-scale variations from RFI, forward gains, and digital beamforming (Ng et al., 2017) as functions of the time, frequency, and sky location. Such baseline variations distort the intensity PDF, causing the clipping and dedispersion transforms to fail. We mitigate this effect by introducing a chain of detrending transforms in the pipeline. Detrending transforms remove a best-fit model with the polynomial degree α\alpha from the masked intensity array ItνI_{t\nu} along an axis (see, e.g., Eatough et al., 2009). This real operation is a computationally low-cost alternative to a high-pass filter in the harmonic space of intensity.

Table 2 lists the CHIME/FRB detrending transform chain, which contains two types of transforms: the polynomial (based on Legendre polynomials) and spline detrenders, both of which make use of the Cholesky decomposition for efficient computation of the best-fit coefficients. We designed the main RFI transform chain in the subpipeline such that detrenders are sandwiched between clipping transforms (see Figure 3). In other words, the detrending transform chain is an outer loop with ndn_{d} iterations, each of which contains ncn_{c} inner iterations over the clipping transform chain. We emphasize that subpipeline detrenders do not affect the intensity stream, which is fed into the dedispersion transform (see Figure 2). Subpipeline detrenders work in tandem with clipping transforms in order to generate a mask that mitigates RFI contamination in the detrended downsampled intensity chunks.

jj type α\alpha axis NtN_{t}
1 polynomial 4 tt 4096
2 spline 12 ν\nu 4096
Table 2: Detrending transform chain for the CHIME/FRB intensity stream. The index jj specifies the position of each transform in the chain. Each transform fits a model with the polynomial degree α\alpha to intensity arrays of width NtN_{t} (number of 1 ms time samples) along one “axis”. Then, the model is subtracted from data in order to account for large-scale variations due to RFI and instrumental systematics.

The CHIME/FRB subpipeline currently has only a single iteration (nd=1n_{d}=1) of the detrending transform chain. We detrend the raw intensity after generating the RFI mask; using the upsampled mask with 16k frequency channels, we apply the same detrending transform chain to the raw intensity (Nν=16N_{\nu}=16k) prior to the dedispersion transform, which assumes input data with a zero mean. In addition to modifying the intensity values, detrenders can modify the running mask (e.g., by masking an entire chunk) in corner cases where numerical instabilities, e.g., owing to highly active RFI, cause a failure in converging on a robust solution. Therefore, the upsampled RFI mask is subject to further modification in the last round of detrenders between the subpipeline and dedispersion transform (see Figure 2).

3.3 Auxiliary transforms

Thus far, we have described our RFI mitigation technique, which is tailored to recognize statistical outliers within a distribution of intensity values. Assuming unlimited computational resources, we find empirically that any changes in our standard transform chain parameters (e.g., Table 1) could result in either masking excessively or RFI residuals downstream. However, the CHIME/FRB RFI environment is dynamic; unknown RFI appears suddenly at DRAO from time to time. In particular, we have been experiencing a series of “RFI storms” since late-2020 (see, e.g., Figures 4 and 5). Such unusually intense events could escape the entire clipping transform chain, consequently triggering \sim1 million false positives in a single day of operation. Although new radio transmitters and reflective aircraft101010Aircraft-triggered RFI typically escapes the L1 pipeline, causing a few thousand subthreshold (7S/N<107\leq{\rm S/N}<10) false positives in a day (see, e.g., Figures 4 and 5). We eliminate these events in the downstream pipelines. could cause such high numbers of false positives, the exact source of RFI storms and their spectral properties are still an unsolved puzzle.

Refer to captionRefer to captionRefer to caption

Refer to captionRefer to captionRefer to caption

Refer to caption Refer to caption Refer to caption

Figure 4: CHIME/FRB L1 data for three single beams, containing impulsive RFI that potentially triggers false positives in the downstream pipelines. Top row: raw intensity with persistent features as described in Figure 1. Black regions indicate missing data. Middle row: detrended intensity, including the missing data and RFI mask (black regions) prior to the dedispersion transform. Bottom row: dedispersion space of events, including residual-induced false positives (left and middle panels) and subthreshold (7S/N<107\leq{\rm S/N}<10) high-DM false positives (right panel), which could result in, e.g., RFI storms (see Figure 5).

Refer to caption

Figure 5: CHIME/FRB L4 events over a 24 hr period on 2020 September 12–13 (Figure courtesy of the CHIME/FRB collaboration). Repeating and nonrepeating FRB candidates are marked by black circles and green diamonds, respectively. “Galactic” and “known” events mostly refer to the same set of nearby sources, such as pulsars in the Milky Way galaxy. “Ambiguous” events often require human intervention in order to be classified as Galactic or extragalactic sources. In this example, three “RFI” storms (gray points) appear at t11618t_{1}\approx 16-18 hr (9/12), t22022t_{2}\approx 20-22 hr (9/12), and t327t_{3}\approx 2-7 hr (9/13). Top panel: observed dispersion measure up to 13000 pc cm-3. For some unknown reason, most RFI storms tend to appear at relatively high DMs. In addition, aircraft transits appear as vertical lines of gray points with DM 103\lesssim 10^{3} pc cm-3 at t1721t\approx 17-21 hr (9/12). Middle panel: S/N with a minimum 8.5σ\sigma threshold. Bottom panel: north-south sky location based on 256 rows of digitally formed beams (each row contains four east-west beams, which are not shown here). The instrument has nonzero sensitivity in its sidelobes, which can continuously detect, e.g., the bright pulsar PSR B0329+54 for a few hours (see the arc centered at beam145{\rm beam}\approx 145 and t12t\approx 12 hr (9/13)).

If highly active RFI persists for a few weeks, then we start saving intensity data to disk for offline inspection. If the culprit is static in frequency, e.g., newly allocated radio bands, then we update our static mask. If the culprit is dynamic, e.g., abrupt changes in the running variance across irregular time intervals, then we might need to increase the number of clipping iterations (ncn_{c}) in order to account for a period of highly active RFI.

Another low-budget solution would be to append a few “auxiliary” transforms to the main RFI chain. Table 3 lists an auxiliary transform chain that we have designed for the CHIME/FRB pipeline. The chain contains two identical transforms that operate along the frequency axis with a relatively large downsampling factor along the time axis (Dt=256D_{t}=256), potentially resulting in expanding the mask widely over the time axis. Such mask expanding transforms can suppress RFI residuals around edges of RFI-contaminated regions that are localized in time. On the other hand, modifying the running L1 configuration with such coarse-grained masks could introduce a time-varying bias into the instrumental selection function. Therefore, we generally avoid using the auxiliary transform chain (Table 3) as a hotfix in the CHIME/FRB pipeline.

kk type σc\sigma_{c} σc,in\sigma_{c,{\rm in}} nc,inn_{c,{\rm in}} axis NtN_{t} DtD_{t} DνD_{\nu}
1 stdv 3 - - ν\nu 4096 256 2
2 stdv 3 - - ν\nu 4096 256 2
Table 3: Auxiliary transform chain for enhancing the CHIME/FRB RFI mask under severe RFI conditions. The index kk specifies the position of each transform in the chain. The other parameters are described in Table 1.

4 Discussion

The CHIME/FRB L1 pipeline is currently the leading search engine for detecting FRBs, which are usually immersed in an influx of RFI-triggered false positives. We mitigate RFI in a sequence of iterative operations that are fine-tuned empirically based on statistics of intensity values in real time. In this paper, we focus on false positives that arise in the 2D dedispersion space (t,DMt,\mbox{DM}). However, the L1 dedispersion transform generates a 4D trigger array indexed by (DM, scattering measure, spectral index, tt), presenting an opportunity for masking RFI in other dimensions.

Our RFI mitigation strategy is an attempt to minimize the overall computational cost, false-positive and false-negative rates for the CHIME/FRB instrument. This problem could be generalized to the continuous domain, where binary rates are replaced by ranges of parameters in the selection function of our instrument. For instance, we could lower our false-positive rate significantly by adding more clipping transforms with σc<3\sigma_{c}<3 to the RFI transform chain. This would introduce a significant bias in our sensitivity for detecting bright FRBs. In other words, any solution to the RFI problem places a set of constraints on the instrumental selection function in the continuous domain of FRB parameters (see, e.g., §3.2). On the other hand, our injection system, which injects synthetic FRB signals in real time into the pipeline allows us to characterize such biases in principle (Merryfield et al., 2022).

In this work, we have presented a customized set of operations that probably does not correspond to an optimal solution for the CHIME/FRB RFI environment. We could in principle replace our trial-and-error strategy by an automated pipeline, e.g., based on machine-learning techniques, that would solve the minimization problem more accurately. Such a systematic approach would enable us to impose user-defined constraints on the instrumental selection function for varying choices of RFI environment. In addition, the L1 pipeline could learn from downstream pipelines through an automated feedback mechanism; assuming that L1 could undergo a soft restart, we could modify the transform chain on-the-fly in order to account for sudden changes in the RFI environment. On the other hand, a time-varying transform chain might not lead to a well-characterized instrumental selection function. We defer the study and development of these features to future work.

Our algorithms have proven to be successful in maintaining a low false-positive rate, while allowing for the detection of astrophysical events. Despite being effective, our current detrenders operate independently along a single axis (time or frequency) that may not fully capture 2D variations of RFI across the intensity plane (t,νt,\nu). In addition, detrending along the frequency axis can remove significant signal from intrinsically wide bursts at low DMs, severely impacting the true-negative rate and completeness (see Figures 16 and 18 in CHIME/FRB Collaboration, 2021). Furthermore, we note that CHIME/FRB RFI mitigation and incoherent dedispersion routines were designed based on the assumption that FRBs are isolated events in time. However, this assumption might not be valid for repeating and multicomponent FRBs with 1\lesssim 1 s separation between events (CHIME/FRB Collaboration, 2022). Additionally, FRBs could be immersed in transits of bright pulses from Galactic sources such as PSR B0329+54, which could impact the running variance. Given the complexity of these multivariate problems, we characterize the CHIME/FRB selection function through systematic FRB injections into real data (CHIME/FRB Collaboration, 2021; Merryfield et al., 2022).

CHIME/FRB intensity data have provided us with a unique opportunity to investigate a multitude of RFI-related problems that might be of interest to other experiments such as the upcoming Hydrogen Intensity and Real-time Analysis Experiment (Crichton et al., 2022) and the Canadian Hydrogen Observatory and Radio-transient Detector (Vanderlinde et al., 2019). Here, we list a few unsolved problems for future exploration:

  • What is the source of RFI storms? How can we suppress them without masking excessively?

  • Most RFI storms tend to trigger false positives at high DMs (see, e.g., Figure 5). Given the CHIME/FRB L1 pipeline, would this be expected statistically? Could this be due to a design flaw in our RFI transform chain?

  • \sim10% of CHIME/FRB multibeam-detected FRB candidates seem to exhibit a false negative in at least one beam. Given that neighboring beams are expected to share the same RFI environment, what factors could contribute to such nondetections?

  • Technically, we could have multiple subpipelines with different downsampling factors in series. Would this be useful for enhancing the mask over narrowband RFI?

We hope to build on this work in the future to answer some of these problems as the CHIME/FRB instrument continues to observe the radio sky.

We acknowledge that CHIME is located on the traditional, ancestral, and unceded territory of the Syilx/Okanagan people. We are grateful to the staff of the Dominion Radio Astrophysical Observatory, which is operated by the National Research Council of Canada. CHIME is funded by a grant from the Canada Foundation for Innovation (CFI) 2012 Leading Edge Fund (Project 31170) and by contributions from the provinces of British Columbia, Québec, and Ontario. The CHIME/FRB Project is funded by a grant from the CFI 2015 Innovation Fund (Project 33213) and by contributions from the provinces of British Columbia and Québec, and by the Dunlap Institute for Astronomy and Astrophysics at the University of Toronto. Additional support was provided by the Canadian Institute for Advanced Research (CIFAR), McGill University and the McGill Space Institute thanks to the Trottier Family Foundation, and the University of British Columbia. Research at Perimeter Institute is supported by the Government of Canada through Industry Canada and by the Province of Ontario through the Ministry of Research & Innovation. K.M.S. was supported by an NSERC Discovery Grant and a CIFAR fellowship. We thank Sabrina Berger, Alice Curtin, Vicky Kaspi, Dustin Lang, Emily Petroff, Ziggy Pleunis, and Shriharsh Tendulkar for comments on the manuscript.

References

  • Akeret et al. (2017) Akeret, J., Chang, C., Lucchi, A., & Refregier, A. 2017, Astronomy and Computing, 18, 35, doi: 10.1016/j.ascom.2017.01.002
  • Bandura et al. (2014) Bandura, K., Addison, G. E., Amiri, M., et al. 2014, Ground-based and Airborne Telescopes V, 9145, 914522, doi: 10.1117/12.2054950
  • CHIME Collaboration (2022) CHIME Collaboration. 2022, The Astrophysical Journal Supplement Series, 261, 29, doi: 10.3847/1538-4365/ac6fd9
  • CHIME/FRB Collaboration (2018) CHIME/FRB Collaboration. 2018, The Astrophysical Journal, 863, 48, doi: 10.3847/1538-4357/aad188
  • CHIME/FRB Collaboration (2021) —. 2021, The Astrophysical Journal Supplement Series, 257, 59, doi: 10.3847/1538-4365/ac33ab
  • CHIME/FRB Collaboration (2022) —. 2022, Nature, 607, 256, doi: 10.1038/s41586-022-04841-8
  • Crichton et al. (2022) Crichton, D., Aich, M., Amara, A., et al. 2022, Journal of Astronomical Telescopes, Instruments, and Systems, 8, 011019, doi: 10.1117/1.JATIS.8.1.011019
  • Czech et al. (2018) Czech, D., Mishra, A., & Inggs, M. 2018, Astronomy and Computing, 25, 52, doi: 10.1016/j.ascom.2018.07.002
  • Eatough et al. (2009) Eatough, R. P., Keane, E. F., & Lyne, A. G. 2009, Monthly Notices of the Royal Astronomical Society, 395, 410, doi: 10.1111/j.1365-2966.2009.14524.x
  • Fridman & Baan (2001) Fridman, P. A., & Baan, W. A. 2001, Astronomy and Astrophysics, 378, 327, doi: 10.1051/0004-6361:20011166
  • Maan et al. (2021) Maan, Y., van Leeuwen, J., & Vohl, D. 2021, A&A, 650, A80, doi: 10.1051/0004-6361/202040164
  • Masui et al. (2019) Masui, K. W., Shaw, J. R., Ng, C., et al. 2019, The Astrophysical Journal, 879, 16, doi: 10.3847/1538-4357/ab229e
  • Merryfield et al. (2022) Merryfield, M., Tendulkar, S. P., Shin, K., et al. 2022, arXiv e-prints, arXiv:2206.14079, doi: 10.48550/arXiv.2206.14079
  • Michilli et al. (2021) Michilli, D., Masui, K. W., Mckinven, R., et al. 2021, The Astrophysical Journal, 910, 147, doi: 10.3847/1538-4357/abe626
  • Mirhosseini (2020) Mirhosseini, A. 2020, Master’s thesis, University of British Columbia. https://open.library.ubc.ca/collections/24/items/1.0394838
  • Ng et al. (2017) Ng, C., Vanderlinde, K., Paradise, A., et al. 2017, XXXII International Union of Radio Science General Assembly & Scientific Symposium (URSI GASS) 2017, 4, doi: 10.23919/URSIGASS.2017.8105318
  • Nita & Gary (2010a) Nita, G. M., & Gary, D. E. 2010a, PASP, 122, 595, doi: 10.1086/652409
  • Nita & Gary (2010b) —. 2010b, MNRAS, 406, L60, doi: 10.1111/j.1745-3933.2010.00882.x
  • Offringa et al. (2010a) Offringa, A. R., de Bruyn, A. G., Biehl, M., et al. 2010a, MNRAS, 405, 155, doi: 10.1111/j.1365-2966.2010.16471.x
  • Offringa et al. (2010b) Offringa, A. R., de Bruyn, A. G., Zaroubi, S., & Biehl, M. 2010b, arXiv e-prints, arXiv:1007.2089, doi: 10.48550/arXiv.1007.2089
  • Petroff et al. (2022) Petroff, E., Hessels, J. W. T., & Lorimer, D. R. 2022, The Astronomy and Astrophysics Review, 30, 2, doi: 10.1007/s00159-022-00139-w
  • Petroff & Yaron (2020) Petroff, E., & Yaron, O. 2020, Transient Name Server AstroNote, 160, 1
  • Spanakis-Misirlis (2021) Spanakis-Misirlis, A. 2021, Astrophysics Source Code Library, ascl:2106.028. https://ui.adsabs.harvard.edu/abs/2021ascl.soft06028S
  • Taylor et al. (2019) Taylor, J., Denman, N., Bandura, K., et al. 2019, Journal of Astronomical Instrumentation, 8, 1940004, doi: 10.1142/S225117171940004X
  • Vanderlinde et al. (2019) Vanderlinde, K., Liu, A., Gaensler, B., et al. 2019, Canadian Long Range Plan for Astronomy and Astrophysics White Papers, 2020, 28, doi: 10.5281/zenodo.3765414
  • Yang et al. (2020) Yang, Z., Yu, C., Xiao, J., & Zhang, B. 2020, MNRAS, 492, 1421, doi: 10.1093/mnras/stz3521
  • Zhang et al. (2022) Zhang, C.-P., Xu, J.-L., Wang, J., et al. 2022, Research in Astronomy and Astrophysics, 22, 025015, doi: 10.1088/1674-4527/ac3f2d
  • Zwaniga (2021) Zwaniga, A. 2021, Master’s thesis, McGill University. https://escholarship.mcgill.ca/concern/theses/p8418s91c