A novel stacked hybrid autoencoder for imputing LISA data gaps

Ruiting Mao Department of Statistics, University of Auckland, Auckland 1010, New Zealand Jeong Eun Lee Department of Statistics, University of Auckland, Auckland 1010, New Zealand Matthew C. Edwards Department of Statistics, University of Auckland, Auckland 1010, New Zealand

Abstract

The Laser Interferometer Space Antenna (LISA) data stream will contain gaps with missing or unusable data due to antenna repointing, orbital corrections, instrument malfunctions, and unknown random processes. We introduce a new deep learning model to impute data gaps in the LISA data stream. The stacked hybrid autoencoder combines a denoising convolutional autoencoder (DCAE) with a bi-directional gated recurrent unit (BiGRU). The DCAE is used to extract relevant features in the corrupted data, while the BiGRU captures the temporal dynamics of the gravitational-wave signals. We show for a massive black hole binary signal, corrupted by data gaps of various numbers and duration, that we yield an overlap of greater than 99.97% when the gaps do not occur in the merging phase and greater than 99% when the gaps do occur in the merging phase. However, if data gaps occur during merger time, we show that we get biased astrophysical parameter estimates, highlighting the need for “protected periods”, where antenna repointing does not occur during the predicted merger time.

Laser Interferometer Space Antenna, massive black hole binaries, gaps imputation, autoencoder, recurrent neural network.

I Introduction

The Laser Interferometer Space Antenna (LISA) lisa:2017 ; baker2019laser is a space-borne gravitational-wave observatory under development by the European Space Agency (ESA) and the National Aeronautics and Space Administration (NASA) bender1998pre . It could observe a wealth of possible gravitational-wave (GW) sources in the range 0.1 mHz to 1 Hz colpi2024lisadefinitionstudyreport , such as those from massive black hole binaries (MBHBs) berti2006gravitational ; hughes2002untangling ; sesana2005gravitational ; vecchio2004lisa , extreme-mass ratio inspirals (EMRIs) Gair_2017 ; Babak_2017 , and galactic binaries (GBs) Cornish_2017 ; Willems_2008 ; Littenberg_2019 . A fundamental distinction between LISA and the ground-based detectors (Advanced LIGO, Advanced Virgo, and KAGRA) lies in the fact that LISA will be capable of detecting sources that will persist within its frequency range for extended durations, spanning from hours to years.

The observatory will be operational for a nominal duration of four years, with potential extensions. However, due to factors such as antenna repointing, orbital corrections, instrument malfunctions, and other stochastic processes, the LISA data stream is expected to contain missing or unusable data Amaro_Seoane_2021 . To mitigate the impact of these data disturbances, it may be necessary to exclude the affected segments, which will result in gaps within the usable data streams. These data gaps will induce nonstationarity in the underlying noise process of the LISA instrument, consequently, the covariance matrix commonly employed in our statistical models will cease to be diagonal wang2024windowinpaintingdealingdata . Therefore, the estimation of astrophysical parameters will be biased unless these data gaps are adequately addressed. Furthermore, the discrete Fourier transform (DFT) of data containing gaps is susceptible to spectral leakage, thereby affecting both the gravitational wave signal and the stochastic noise carré2010effect ; PhysRevD.102.084062 . This degradation becomes increasingly significant as the frequency of the source decreases Baghi_2019 .

The operational data from LISA Pathfinder (LPF), launched in December 2015, underscores the necessity to consider the data gaps. It shows that there may indeed be disruptions in the data stream. Simple patterns for gaps in LISA were discussed in carré2010effect , which conducted a large-scale Monte Carlo parameter estimation simulation for GBs. Studies on data gaps in LISA have highlighted its impact on detectability and parameter estimation Baghi_2019 . Dey et al. Dey_2021 investigated the two types of gaps in MBHBs, showing that unscheduled gaps have a greater impact compared to scheduled ones.

To address this challenge, classic techniques from other domains for handling missing data, such as linear interpolation or mean imputation, often prove inadequate when dealing with the intricacies of GW signals Blackman_2014 . The phase-coherent signals from ultra-compact binaries (UCBs) can be used as calibration sources to measure the duration of data gaps in LISA independently. However, these benchmarks are heavily dependent on the realization of the galactic population of the UCB PhysRevD.98.043008 . The application of apodization, i.e., implementing a window function to the signal before taking the Discrete Fourier Transform (DFT), is widely used in signal processing and spectral analysis to mitigate the effects of spectral leakage carré2010effect ; Dey_2021 ; wang2024windowinpaintingdealingdata . Indeed, treating the remaining data segments as independent measurements may lead to modeling errors, and there is no ideal smoothing parameter for the window function. A Bayesian augmentation method was applied to treat missing data as auxiliary variables and samples them along with the parameters of interest, providing a statistically consistent way to handle gaps in GBs while improving sampling efficiency and mitigating spectral leakage effects Baghi_2019 . However, the high reliance on the parametric model assumption on time series will be challenging when extending it to other LISA sources, such as MBHBs. Longer duration gaps were considered in 10.1093/mnras/stab3314 ; Blelly_2020 , where a nonparametric inpainting algorithm grounded in sparse representation was implemented to attenuate the effects of gaps on the galactic binary signal in the frequency domain. This model-independent approach is heavily reliant on noise modeling, resulting in potential inaccuracies with the intricate patterns and dependencies inherent in gravitational wave data zhao2023dawningneweragravitational .

Deep learning has made considerable strides in processing data from GW observations, achieving significant success in this area. Various types of neural networks have been applied to detect and characterize GW signals, such as the convolutional neural network (CNN) in PhysRevD.101.104003 ; Krastev_2021 ; george2018deep ; PhysRevD.103.024025 , conditional variational autoencoders (VAEs) in Gabbard2019BayesianPE , the transformer-based extraction network in Zhao_2023 , and also the generative adversarial network (GAN) to generate simulated GW signals Jadhav_2023 ; eccleston_2024 ; lopez:2022 ; powell:2023 . Furthermore, deep generative models make it possible to accelerate the generation of GW waveforms chua:2019 ; katz:2021 ; Liao_2021 . Likelihood-free methods can be applied to approximate posteriors through deep learning techniques as well, such as VAE Gabbard_2018 and normalizing flows green:2020 ; Dax_2021 ; Langendorff_2023 . Deep learning techniques and algorithms have also been applied to various other fascinating challenges in gravitational wave research, like glitch classification, glitch cancelation, and gravitational wave bursts AI_2023 .

Data gaps can be considered as noisy data within the data stream, making it reasonable to develop an autoencoder (AE) to address missing data in the data stream wang2024deep as well as to manage high-dimensional signals with a small training set PhysRevD.109.083002 . Denoising autoencoders (DAE) are extensively utilized in analyzing GW observed data to identify and eliminate noise, thereby producing a denoised or reconstructed signal for subsequent analysis bacon2022denoising ; PhysRevD.108.043024 ; morawski2021anomaly . Given the time series nature of data streams from GW detectors, recurrent neural networks (RNNs) are frequently integrated with AE to perform detection and denoising tasks. Based on a convolutional autoencoder (CAE), long-short-term memory networks (LSTMs) hochreiter1997long were incorporated as layers in both the encoder and decoder of AEs, demonstrating superior performance in reconstructing samples affected by various anomalies moreno2021sourceagnosticgravitationalwavedetectionrecurrent . Identical methodologies are evident in Shen_2019 , yielding promising outcomes in signal extraction against highly noisy backgrounds. A comparable structure is illustrated in raikman2024gwak , where the input of the autoencoder is concentrated by the output of LSTMs and the signal itself. The asymmetric addition of RNN layers in AE is also observed in chatterjee2021extraction ; Xu:2024jbo , where LSTM layers are incorporated into the decoder component. Nevertheless, the substantial memory requirements and training latency associated with long sequence GW signals necessitate the segmentation of data streams into smaller fragments when incorporating RNN layers within AE, which poses a potential risk of losing information pertaining to the overall structure of signals.

The prevalence of missing data in time series analysis is a common issue across various disciplines, and encoder-decoder RNNs techniques have achieved substantial success in the reconstruction of time series data. By adding LSTM layers in the encoder of DAE, higher imputation performance in multivariate time series can be seen compared with multi-directional recurrent neural networks 8982996 . A spatio-temporal LSTM convolutional autoencoder method was proposed to fill gaps in satellite retrieval 9884482 ; Jia et al.8217773 proposed a stacked autoencoder-based imputation method that employs two loss functions: one exclusive for the internal LSTM layers and the other for the overall DAE with the same bottleneck layer. Similar to LSTMs, gated recurrent units, also known as GRUs, are utilized extensively in the process of gap imputation to avoid vanishing or exploding gradient issues cho2014learning ; gupta2017instability ; che2016recurrentneuralnetworksmultivariate ; 9374359 . Moreover, in 9221727 , empirical results demonstrated that GRU exhibits a $29.29\%$ increase in processing speed compared to LSTM when applied to an identical dataset, and it also yields superior performance with smaller training data. Alonso et al. alonso2024gap conducted empirical investigations with different unidirectional (simple RNN, GRU, LSTM) and bi-directional (BiSRNN, BiGRU, BiLSTM) RNN layers in DAE, and showed that BiGRU layers have the best performance with a low reconstruction error in industry data. A comprehensive comparative analysis of various hybrid models incorporating DAEs and RNN layers for short-term market forecasting has been conducted in abu2024comparative , which indicates that BiGRUs or GRUs generally exhibit superior performance relative to other hybrid models. DAEs incorporating GRU layers are prevalent in numerous imputation tasks CHEN2021120451 ; IKHLASSE202211565 ; s23249697 ; 10498920 ; these models typically integrate GRU layers within the DAE and utilize a single loss function for imputing sequences significantly shorter than GW signals.

The concept of stacked denoising autoencoders was introduced in vincent2010stacked , demonstrating that a locally applied unsupervised criterion yields a more effective representation of the preceding layer. Sequence to sequence AE and GRU based hybrid model were developed in Rai2021ARA , which gave a more robust result in short-term solar power forecasting. Drawing inspiration from these pioneering methodologies, we propose an innovative stacked hybrid autoencoder architecture featuring a locally trained denoising convolutional autoencoder (DCAE) as the encoder, complemented by bi-directional gated recurrent unit (BiGRU) layers in the decoder. This model employs two loss functions to effectively execute imputation tasks. This framework is particularly advantageous for GW signal analysis within a LISA data stream compromised by intermittent disruptions. We will introduce our proposed model, the Bi-directional Gated Recurrent Unit Convolutional Autoencoder (BiGRU-CAE), in Section III.1 following the methodology background in Section II. Then we demonstrate our method using a simple toy example in Section IV.1 and finally show its general applicability on an MBHB source within the LISA framework in Section IV.2. We give some concluding remarks and future directions in Section V.

In summary, the novelty of this work is in several directions:

•

We propose a novel imputation method for long-time series that is scalable enough for GW data analysis. It stacks a DCAE and BiGRU. In the LISA data stream, signals tend to be longer and can not be cut into pieces if we want to consider gaps at some specific periods, such as at merger time for MBHB signals. Instead of using the bottleneck layer of the denoising autoencoder as the input of the BiGRU, we apply two stacked hybrid models, where a DCAE is trained in the encoder to make the computation stable and efficient.
•

Our model can tackle both scheduled gaps and unscheduled gaps. Long-duration unscheduled gaps have also been considered in our analysis. Instead of using frequent daily random gaps (not longer than 1 hour), we investigate the impact of long-duration unscheduled gaps, which last 6 hours on average, and apply deep learning imputation to fill these gaps.
•

It is a common practice to employ overlap and Signal-to-Noise Ratio (SNR) loss as metrics for assessing performance in denoising applications. We further elaborate on the results of parameter estimation to demonstrate the precision of the parameter estimation in the reconstructed signal.
•

Addressing gaps that occur during the merger time of MBHB signals presents a significant challenge. To the best of the author’s knowledge, this is the first time this problem has been considered. We illustrate the discrepancies in the parameter estimation of the reconstructed signal when such gaps occur during versus outside the merger time.

II Methodology

II.1 Denoising convolutional autoencoder

Autoencoders autoencoderidea are a class of neural network architectures that aim to learn efficient representations (encodings) of input data, typically for the purpose of dimension reduction or feature learning. Unlike other types of neural network, the target of training is the input itself. This self-supervised learning approach allows autoencoders to be trained without annotated data, which can be particularly advantageous in scenarios where labeled data are scarce or expensive to obtain.

The fundamental architecture of an autoencoder consists of two main parts: an encoder and a decoder. The encoder compresses the input into a lower-dimensional latent space (also called the bottleneck layer), capturing the most salient features of the data. The decoder then reconstructs the input data from this compressed representation, aiming to minimize the difference between the original input and its reconstruction. The performance of an autoencoder is typically measured by the reconstruction error (usually mean squared error), which quantifies how well the decoder can reconstruct the input from the reduced encoding.

Particularly notable among autoencoder variants are denoising autoencoders (DAEs), seen in Fig. 1, designed to improve data quality by correcting input corrupted with some form of noise. This approach not only helps reduce the noise, but also forces the autoencoder to learn more robust and essential features of the data 8616075 , and prevents the model from learning the identity function. DAEs achieve this by first intentionally corrupting clean input data $X$ to $\hat{X}$ , training the model to recover the original uncorrupted input 10.1145/1390156.1390294 . Convolutional layers are computationally efficient in handling large-scale inputs and allow neural networks to learn features that are spatially invariant. Therefore, denoising convolutional autoencoders (DCAEs) are proposed JMLR:v11:vincent10a , adding convolutional layers have shown better performance than nonconvolutional neural networks in image processing DCAEP .

Refer to caption — Figure 1: Denoising Autoencoders (DAE) structure. $h_{W,b}$ represents the trained model, which has learned to reconstruct the original input $X$ from its corrupted version $\hat{X}$ .

II.2 Gated recurrent unit

As part of our network architecture, we make use of the gated recurrent unit (GRU) cho:2014 ; chung2014empiricalevaluationgatedrecurrent . The GRU is a type of recurrent neural network (RNN) that uses a gating mechanism to decide what information passes to the output, thus filtering out irrelevant information. Standard RNNs suffer from the vanishing gradient problem, where the gradients of the loss function become close to zero and are backpropagated through the neural network basodi:2020 . GRUs can avoid this issue altogether using update and reset gates to regulate the flow of information, allowing them to learn long-term time dependencies. GRUs are faster than LSTMs in low-complexity sequences Cahuantzi_2023 .

As can be seen in Fig. 2, the reset gate determines how much information from the past hidden state should be forgotten. It outputs a value between 0 and 1, where 0 means forget everything and 1 means remember everything. The output is used to determine a candidate hidden state, or the new information. The update gate determines how much weight to put on the past hidden state and how much weight to put on the candidate hidden state. It outputs a number between 0 and 1, where 0 means put all the weight on the previous hidden state and 1 means put all of the weight on the candidate hidden state. The standard GRU $GRU(x_{t},h_{t-1})$ is formulated:

$\displaystyle r_{t}$	$\displaystyle=\text{sigmoid}(W_{r}x_{t}+U_{r}h_{t-1}+b_{r})$	(Reset gate)
$\displaystyle z_{t}$	$\displaystyle=\text{sigmoid}(W_{z}x_{t}+U_{z}h_{t-1}+b_{z})$	(Update gate)
$\displaystyle\hat{h}_{t}$	$\displaystyle=\text{tanh}(W_{h}x_{t}+U_{h}(r_{t}\odot h_{t-1})+b_{h})$	(Candidate hidden state)
$\displaystyle h_{t}$	$\displaystyle=(1-z_{t})\odot h_{t-1}+z_{t}\odot\hat{h}_{t}$	(Updated hidden state)

The trainable parameters for this neural network are $(W_{z},U_{z},b_{z},W_{r},U_{r},b_{r},W_{h},U_{h},b_{h})$ . In addition to the standard GRU, the bi-directional GRU (BiGRU) enhances the model’s ability to capture context from both past and future states. In a BiGRU, two separate GRU layers are used: one processes the input sequence in a forward direction, while the other processes it in reverse.

$\displaystyle\overrightarrow{h}_{t}$	$\displaystyle=GRU_{f}(x_{t},\overrightarrow{h}_{t-1})$	(Forward)
$\displaystyle\overleftarrow{h}_{t}$	$\displaystyle=GRU_{b}(x_{t},\overleftarrow{h}_{t+1})$	(Backward)
$\displaystyle h_{t}^{bi}$	$\displaystyle=[\overrightarrow{h}_{t};\overleftarrow{h}_{t}]$	(Combined BiGRU output)

This structure allows the network to utilize information from both directions, improving performance on tasks that benefit from understanding the time dependence surrounding each input.

III Bi-directional Gated Recurrent Unit Convolutional Autoencoder (BiGRU-CAE)

Our BiGRU-CAE hierarchical neural network contains a denoising convolutional autoencoder (DCAE) and bi-directional gated recurrent unit (BiGRU) layers to deal with the complicated and long-duration LISA data. The denoising autoencoder model leverages the representation learning capabilities of CNNs to extract relevant features for the whole structure from the input data, while the BiGRU component captures the temporal dynamics of the GW signals. By training this end-to-end hybrid model to reconstruct clean GW signals from noisy detector data, it is able to effectively denoise and extract the underlying GW signals, simplifying the following signal processing procedure. Similar hybrid deep learning architectures have been shown to demonstrate superior performance compared to other techniques to recover GW signals from noisy detector data, such as CNN-LSTM in LIGO-Virgo data analysis Chatterjee_2021 and DENSE-LSTM model in Taiji data analysis Xu:2024jbo .

III.1 Model structure

The proposed hybrid model has two components. The DCAE component consists of a three-layer 1D convolutional encoder, followed by a fully-connected layer and a three-layer 1D transposed convolutional decoder, simplifying the computation and reducing the input signal’s high dimension. To further enhance the autoencoder’s efficiency, we employ larger strides in the convolutional layers, thereby eliminating the need for pooling operations 7780459 and the complexity of learning the overall structure of the sequence. This design choice enables the autoencoder to acquire a resilient representation of the input data, essential for further processing. The DCAE functions as the encoder component of the entire hybrid model, with its output regarded as a refined bottleneck.

In the decoder section of the model, bi-directional Gated Recurrent Unit (BiGRU) layers are considered. The purpose of these layers is to infer and capture the temporal patterns embedded within the sequential segments of the signal, thereby refining the partially denoised output generated by the DCAE. The BiGRU part consists of two BiGRU layers, followed by a fully connected layer, enabling the model to effectively learn and retain the temporal dependencies inherent in the input sequence.

The structure of our model is depicted in Fig. 3. The data stream undergoes normalization before being used to train the DCAE. The left dashed box shows the training of the DCAE process. Complete normalized signals $d_{i}$ are corrupted with varying gap patterns for each iteration to ensure the robustness of the DCAE 10.1145/1390156.1390294 . The detailed corruption process will be discussed in Section III.2. To address the issue of dying neurons associated with negative values, we employ the Leaky Rectified Linear Unit (LReLU) maas2013rectifier as the activation function following each convolutional layer in the encoder and the transposed convolutional layer in the decoder. This choice facilitates an improved gradient flow and enhances the model’s ability to learn robust feature representations. Due to the high dimension of the input, we did not apply the classic convolutional encoder-decoder architectures such as U-Net ronneberger2015unet , SegNet 7803544 , and DeepLabV3 chen2018encoderdecoder for semantic image segmentation. In the output of the DCAE, minor isolated artifacts can be seen within the reconstructed signal $h_{C}(\hat{d_{i}})$ of corrupted signal $\hat{d_{i}}$ ; see the orange line in Fig. 4.

To address these small corrupted fractions, we consider applying BiGRU layers to recover the continuity of the reconstructed signal. The training process can be seen with the right dashed box in Fig. 3. The input is the reconstructed signal after DCAE $h_{C}(\hat{d_{i}})$ with the newly corrupted signal. To make training computationally feasible, samples of these outputs are partitioned into several multiple subsequences of length $L$ , where $L$ depends on the complexity of the input signal. The hidden states of size $K$ in the BiGRU are fed into the fully connected layer to predict the corrupted data. The observed signal with gaps $\tilde{d_{i}}$ will follow the black line to produce the reconstructed signal after recombining the output of the BiGRU decoder $h_{G}(\hat{s}_{ij})$ and back normalization, where $\hat{s}_{ij}$ is the $j$ -th segment of the DCAE output $h_{C}(\tilde{d}_{i})$ .

III.2 Data preparation

In this subsection, we discuss the simulation of gaps in the data sequence. A binary mask function $w(t)$ is applied so that the simulated observed data with gaps is

\displaystyle\hat{d_{i}}=w(t)d_{i}

(1)

where

w(t)=\begin{cases}1,&\text{if data at $t$ is available}\\ 0,&\text{if data at $t$ is unavailable}.\end{cases}

(2)

The gaps are categorized into scheduled gaps and unscheduled gaps. Scheduled gaps result from periodic maintenance of the LISA spacecraft and its onboard instruments, such as antenna repointing, which typically causes predictable downtime of about 3.5 hours each week or longer interruptions of up to 7 hours every two weeks. In contrast, unscheduled gaps arise from unexpected hardware malfunctions or unforeseen physical occurrences, leading to unpredictable durations that can range from several hours to several days, significantly affecting the reliability of the collected data Dey_2021 ; Amaro_Seoane_2021 . Here, we will consider both scheduled and unscheduled gaps. The intervals between successive gaps are sampled from an exponential distribution, with its rate parameter defined according to the duty cycle 75% requirements Dey_2021 ; 10.1093/mnras/stab3314 ; Amaro_Seoane_2021 . The duration for the gaps is a uniform distribution between 4 and 8 hours per day and the total data sequence in our example is about 3 days. More flexible durations will be needed when considering longer signals.

III.3 Training strategy

Data normalization is a critical preprocessing step when training neural networks. The input data must be on a similar numeric scale to learn meaningful representations. Normalization transforms the input data stream to have zero mean and unit variance, ensuring all samples contribute equally to the learning process. This is typically achieved by subtracting the mean and dividing by the signal’s standard deviation. Normalization helps prevent certain samples from dominating the reconstruction loss during training, which could lead to the autoencoder learning trivial or biased representations.

Mean Squared Error (MSE) is commonly used as the loss function in a denoising autoencoder. The MSE measures the average squared difference between the autoencoder’s predictions and the true, uncorrupted input values. This loss metric aims to minimize the discrepancy between the network’s output and the original, unperturbed data, thereby enabling effective impainting. To mitigate overfitting for certain time periods or samples, L2 regularization is applied to discourage the autoencoder from learning large weight values. By adding a penalty proportional to the L2 norm of the weights, the Adam optimizer Kingma:2014 is encouraged to find a set of parameters with smaller magnitudes, resulting in a more generalized model.

Our BiGRU-CAE has two stacked autoencoders; a DCAE as the encoder, and an AE with BiGRU layers as the decoder, both of which have different loss functions. For DCAE, L2 regularization is added MSE, as discussed before to yield the following loss function:

\displaystyle L_{C}=\frac{1}{n}\sum_{i=1}^{n}(d_{i}-h_{C}(\hat{d}_{i}))^{2}+\lambda\sum_{l=1}^{p}|\varphi_{l}|^{2}

(3)

where $h_{C}(\hat{d}_{i})$ is an output of the DCAE with the corrupted samples; $\lambda$ is the L2 regularization hyperparameter, which is default to 0.001; $p$ is the total number of weight parameters in the autoencoder model; and $|\varphi_{l}|^{2}$ represents the L2 norm (Euclidean norm) of the $l$ -th weight parameter $\varphi_{l}$ . Data segmentation occurs before training of BiGRU layers. The loss function for the AE with a BiGRU is defined as:

	$\displaystyle L_{G}=\frac{1}{nJ}\sum_{i=1}^{n}\sum_{j=1}^{J}(s_{ij}-h_{G}(\hat{s}_{ij}))^{2}$		(4)
	$\displaystyle h_{C}(\hat{d}_{i})=\{\hat{s}_{i1},\hat{s}_{i2},...,\hat{s}_{iJ}\}$		(5)

where $s_{ij}$ denote the $j$ -th segment of length $L$ from the $i$ -th complete sampled signal and, $J$ is the number of segments. After injecting simulated gaps that differ from those in the DCAE training, the DCAE output is partitioned into $J$ -segments, denoted as $\hat{s}_{ij}$ , $j=1,...,J$ . The output of the hybrid model for each segment $h_{G}(\hat{s}_{ij})$ is stacked to form $h_{G}(h_{C}(\hat{d}_{i}))$ . Then the final reconstructed signal is obtained by denormalizing $h_{G}(h_{C}(\hat{d}_{i}))$ . Considering the overall pattern of the time sequence, the fractal Tanimoto similarity coefficient Diakogiannis_2021 can also be considered when dealing with complex signals Chatterjee_2021 .

III.4 Bayesian theory

Bayesian inference is the standard procedure used in GW astronomy to estimate parameters $\boldsymbol{\theta}$ given observations of a set of data streams $d_{o}$ . At the heart of Bayesian theory lies Bayes’ theorem:

	$\displaystyle p(\boldsymbol{\theta}\|d_{o})$	$\displaystyle=$	$\displaystyle\frac{p(d_{o}\|\boldsymbol{\theta})p(\boldsymbol{\theta})}{p(d_{o})}$		(6)
		$\displaystyle\propto$	$\displaystyle p(d_{o}\|\boldsymbol{\theta})p(\boldsymbol{\theta}),$		(7)

where $p(\boldsymbol{\theta}|d_{o})$ is the posterior density of unknown parameters $\boldsymbol{\theta}$ given the observation of a data stream $d_{o}$ , $p(d_{o}|\boldsymbol{\theta})$ the likelihood function and $p(\boldsymbol{\theta})$ is the prior distribution, representing our knowledge about the parameters $\boldsymbol{\theta}$ before observing the data. The marginal likelihood $p(d_{o})=\int_{\boldsymbol{\theta}\in\boldsymbol{\Theta}}p(d_{0}|\boldsymbol{\theta})p(\boldsymbol{\theta})\,\text{d}\boldsymbol{\theta}$ is a constant over the parameter space and is unnecessary in the parameter estimation of our experiment.

Stochastic sampling algorithms, such as Markov chain Monte Carlo (MCMC), are used to obtain random samples $\boldsymbol{\theta}$ from the posterior density $p(\boldsymbol{\theta}|d_{o})$ by constructing a Markov chain whose steady-state distribution is the posterior distribution of target parameter. In our work, we use an advanced MCMC sampler Eryn Karnesis_2023 , which harnesses an ensamble affine-invariant sampler affine:2010 with parallel tempering to obtain samples from $p(\boldsymbol{\theta}|d_{o})$ in the MBHB case.

The typical time-domain data stream observed by the LISA instrument will be a combination of TDI variables $X=\{A,E,T\}$ , representing the response of the LISA instrument to the plus and cross polarisations of the incoming GW source in the transverse-traceless gauge tinto:2021 ; tinto:2023 :

d_{o}^{(X)}(t)=h_{\text{e}}^{(X)}(t;\boldsymbol{\theta}_{\text{0}})+n^{(X)}(t),\quad X=\{A,E,T\}.

(8)

Here $d_{o}$ is the observed data stream, $\boldsymbol{\theta}_{0}$ are the true parameters of the true gravitational wave $h^{(X)}_{\text{e}}$ , and $n^{(X)}(t)$ are noise fluctuations arising from perturbations to the LISA instrument from unresolvable GW sources and non-GW instrumental perturbations. In this paper, to explain the whole imputation procedure, we take the data stream on channel A. Under the assumption that the noise is stationary and follows a Gaussian distribution in the parameter estimation process, the log-likelihood with inner product ¹¹1 $(a|b)^{(X)}=4\text{Re}\int_{0}^{\infty}\text{d}f\frac{\hat{a}^{(X)}(f)(\hat{b}^{(X)}(f^{\prime}))^{\star}}{S^{(X)}_{n}(f^{\prime})}.$ (9) where $S_{n}^{(X)}$ is the power spectral density (PSD) of the noise process within a channel $X$ . Hatted quantities refer to the Fourier transform with convention, $\hat{h}(f)=\int_{0}^{\infty}\text{d}t\,h(t)\exp(-2\pi\text{i}ft).$ (10) finn1992detection ; Flanagan:1997kp is

p(d|\boldsymbol{\theta})=-\frac{1}{2}\sum_{A}(d-h_{m}|d-h_{m})^{(A)},

(11)

where $h_{m}$ are model templates, generating the likelihood when inferring parameters $\boldsymbol{\theta}$ with MCMC. Window functions are employed to reduce the effects of spectral leakage in Fourier transform analysis, which are then utilized in the following section on the toy model case as a smoothing taper to formulate the likelihood of the signal with gaps.

IV Application

In this section, we demonstrate how to impute the gaps with our proposed model in a simple toy example in Section IV.1 and a MBHB signal in Section IV.2. Noise is not included here to examine the performance of the proposed model for imputation only. This analysis assumes a denoising procedure has been applied earlier in the pipeline, and this will be the focus of forthcoming research. The simulated data are partitioned into training and validation sets, with 80% of the samples allocated for training and 20% reserved for validation.

IV.1 Toy model case

Consider a data stream of the form

d(t;a,f,\dot{f})=a\sin\left(2\pi t\left[f+\frac{1}{2}\dot{f}t\right]\right).

(12)

Assume an observed test signal $d_{o}$ with parameters $\boldsymbol{\theta}_{0}=\{a_{0}=5\cdot 10^{-21},f_{0}=10^{-3}\,\text{Hz},\dot{f}_{0}=10^{-8}\,\text{Hz}/\text{s}\}$ . The training set contains 800 generated signals using prior samples.

	$\displaystyle a$	$\displaystyle\sim\text{U}[a_{0}-10^{-21},a_{0}+10^{-21}]$
	$\displaystyle f$	$\displaystyle\sim\text{U}[f_{0}-10^{-6},f_{0}+10^{-6}]\,\text{Hz}$
	$\displaystyle\dot{f}$	$\displaystyle\sim\text{U}[\dot{f}_{0}-10^{-12},\dot{f}_{0}+10^{-12}]\,\text{Hz}/\text{s}.$

Here, we simulated the signal with an observation period of 3 days sampled with cadence $\Delta t=5$ seconds, yielding a signal length of 51480. SNRs determined by equation (8) of PhysRevD.109.083002 are between 80 and 200.

The structure of DCAE in the proposed model is in Table 1. Due to memory limitations, 160 outputs of DCAE will be sampled and cut into pieces with a length of 48 due to a simple signal structure. In the training of the BiGRU component, complete signal samples will first pass through the DCAE and then pass through a two-layer bi-directional gated recurring unit with a hidden state size $K=12$ followed by a fully-connected layer and a Tanh activation function. The training time is about $0.5$ hours for DCAE with 100 epochs and $2.5$ hours for AE with BiGRU layers with 50 epochs. The validation loss can be seen in Fig.12 in Appendix A.

Table 1: DCAE Model Structure.

Layer Type	Input Channels	Output Channels	Kernel Size	Stride
Conv1d	1	16	7	1
Conv1d	16	32	5	4
Conv1d	32	64	3	8
Flatten	64	103680	-	-
Linear	103680	4096	-	-
Linear	4096	1024	-	-
Linear	1024	4096	-	-
Linear	4096	103680	-	-
Unflatten	103680	64	-	-
ConvTranspose1d	64	32	3	8
ConvTranspose1d	32	16	5	4
ConvTranspose1d	16	1	7	1

Parameter estimation via MCMC will be considered on the back-transformed output of the proposed model. We tested the signal with different gaps defined in Section III.2 and compared their posterior distributions $p(\boldsymbol{\theta}|h_{G}(h_{C}(\hat{d}_{i}))$ with the original one $p(\boldsymbol{\theta}|d_{i})$ . The process of the Bayesian parameter estimation is similar to the toy model case in PhysRevD.109.083002 . A test case characterized by an on-duty percentage of 87.5% was utilized as a representative example. The proposed model exhibits a superior level of accuracy in estimating the missing data within the gaps, as demonstrated by the output orange line entirely overlapping the target blue line in Fig. 5.

We consider 85 different scenarios where the test signal is corrupted with gaps. The results of parameter estimation are demonstrated in Fig. 6, which provides a visual comparison of the Kullback-Leibler (KL) divergence ²²2The Kullback-Leibler divergence of $p_{1}(x)$ from $p_{2}(x)$ is $D_{KL}(p_{1}(x)\|p_{2}(x))=\mathbb{E}_{p_{1}(x)}[\log p_{1}(x)-\log p_{2}(x)]\approx\frac{1}{m}\sum^{m}_{i=1}\log p_{1}(x_{i})-\log p_{2}(x_{i})$ where $x_{i}\sim p_{1}$ . and absolute relative error for reconstructed signals and corrupted signals with gaps, to assess the performance of parameter estimation based on reconstructed signals relative to the true values and the posterior distribution of the signal without gaps. Since the exact waveform is known, the window function is applied in the Bayesian inference process for the signal with gaps, called “corrupted” signals. Let $p(\boldsymbol{\theta}|h_{G}(h_{C}(\hat{d}_{o})))$ be the posterior of the imputed signal and $p(\boldsymbol{\theta}|d_{o})$ be the posterior of original signal. The Gaussian kernel density estimation with the bandwidth selected by Scott’s rule scott2015multivariate is denoted by $\hat{p}$ . The summary of KL divergence $D_{KL}(\hat{p}(\boldsymbol{\theta}|d_{o})||\hat{p}(\boldsymbol{\theta}|h_{G}(h_{C}(\hat{d}_{o}))))$ and $D_{KL}(\hat{p}(\boldsymbol{\theta}|d_{o})||\hat{p}(\boldsymbol{\theta}|\hat{d}_{o}))$ from 85 replicates are shown the top-left plot in Figure 6.

In general, a higher median and wider spread of KL divergences for signals with gaps are observed, even though 4 cases of infinite values were ignored in the plot. The boxplots with log scale comparing the absolute relative error ratio show our proposed model’s ability to mitigate large biases when doing Bayesian inference with corrupted signals. One example of this scenario is displayed in Fig.14 in Appendix B. This result shows that, on average, the reconstruction signal was more useful in parameter estimation than using the corrupted signal.

IV.2 Massive black hole binary case

In this section, we tested our proposed model on a realistic massive black hole binary signal with unscheduled gaps defined in section III.2. Using LISA Analysis Tools michael_katz_2024_10930980 , we generate and then analyze complete inspiral-merger-ringdown frequency domain spin-aligned MBHBs in the solar system barycenter frame with the LISA response applied. The training set contains 4,000 samples generated by IMRPhenomHM waveforms and BBHx Katz_2020 ; Katz_2022 ; Khan_2016 ; London_2018 ; Husa_2016 with a uniform distribution on the primary mass $m_{1}\sim\text{U}[1.5\times 10^{6},2.5\times 10^{6}]$ . The data stream of channel A is transformed into the time domain to implement the proposed model. The structure of the DCAE component is similar to that of the toy model in Table 1, while we cut the sequence into pieces with a length of 1024 before training the BiGRU component, which is longer than in the case of the toy model with a simpler signal structure. The training time is about $4.3$ hours when training DCAE with 100 epochs and $3$ hours when training BiGRU with 50 epochs since the number of sampled outputs of DCAE is still 160. The validation loss can be seen in Fig.13 in Appendix A.

To test the performance of our model, we run MCMC to do parameter estimation focusing on a subset of parameters $\boldsymbol{\theta}=\{m_{T},q,\phi_{\text{ref}}\}$ . The Heterodyned likelihood is used to speed up the Bayesian inference process zackay2018relative . The parameters for the test signal are defined as follows: the total mass $m_{T}=m_{1}+m_{2}=2.7\cdot 10^{6}M_{\odot}$ ; mass ratio $q=m_{1}/m_{2}=0.35$ ; the two effective spin parameters of the two-component masses $\chi_{1}=0.5$ and $\chi_{2}=0.7$ ; the reference time $t_{c}=10^{6}\,$ seconds; luminosity distance $15\,$ Gpc; the reference phase $\phi_{\text{ref}}=0.6$ ; sky position $(\beta=0.7,\lambda=3.4)$ in ecliptic coordinates; and polarisation angle $\psi=\pi/4$ . The observation time will be $\sim 3$ days, sampled with cadence $\Delta t=4$ seconds. Therefore, the length of the data sets is $N=65536$ with an SNR of 3233.49. The basic LISA sensitivity is applied. Fig. 7 illustrates the imputation results for the corrupted signal. Despite the substantial loss of information, especially during the merger period, the proposed model effectively captures the principal characteristics of the sequence, as evidenced by the high degree of similarity between the orange line representing the recovery signal and the blue line denoting the target original signal.

We consider 86 different cases where the test signal is corrupted with gaps. We then calculated the SNR and overlap of the recovered signals ³³3The overlap is defined as the inner product of the complete signal and the reconstructed signal: $\displaystyle O(d_{o},h_{G}(h_{c}(\hat{d}_{o})))=<d_{o}|h_{G}(h_{c}(\hat{d}_{o}))>$ (13) . Results can be seen in Fig. 8.

All of the recovered signals achieve an overlap larger than 99%, and their SNRs are close to the original SNR. However, the results of the recovered signals with merging occurring in gaps differ significantly from those with gaps that do not contain the merging point. Two outliers in the overlap metric are observed for signals where the merging does not occur in gaps. One outlier corresponds to a substantial unduty cycle of 35.7%, while the other arises due to a gap in the signal occurring near the merging point, see Fig. 9, which also contributes to the outlier in SNR. This observation aligns with the findings reported in Dey_2021 , which demonstrate a greater impact as the gaps approach the merger point. Nevertheless, this effect is attenuated by our model during parameter estimation, as evidenced by the insignificant bias depicted in Fig. 15 in Appendix B.

Likewise, we conducted an analysis of the performance of the reconstructed signal in parameter estimation. Given the unavailability of the precise waveform in the time domain and the inherent computational challenges when performing Bayesian inference on the “corrupted” signals, our investigation is confined to reconstructed signals. The parameter estimation for signals with no gaps at the merging point performs better than those with gaps at the merging point. In Fig. 10, the absolute relative error ratios are less than 1.3% for all three parameters, and the bigger relative error ratio for unscheduled gaps at the merger shows some limitations of our signal reconstruction method. The boxplot for KL divergence shows even a bigger disparity in uncertainty estimation for parameters, notwithstanding the exclusion of seven infinite cases and one case exceeding 150 for gaps occurring during the merger. When unscheduled gaps happened at merger time, the gaps were poorly reconstructed, and KL divergences were 1000 times bigger on average. This is observed in the posterior comparison for the two cases in Fig. 16 and Fig. 17 in Appendix B. This underscores the necessity to investigate scenarios in which scheduled gaps occur during the merging phase.

To investigate this issue, a total of 60 distinct instances of scheduled gaps were analyzed, comprising 30 instances of 3.5-hour gaps and 30 instances of 7-hour gaps, occurring during the merging time with various injection times. It is not surprising to observe that the parameter estimates of these reconstructed signals exhibit some deviations from the true posterior distribution, as illustrated in Fig. 18 and Fig. 19 in Appendix B. In comparison to unscheduled gaps occurring during mergers, reconstructed signals with scheduled gaps exhibit notably fewer distortions as measured by the KL divergence, less than 0.5. This observation is also consistent with the findings reported in Dey_2021 . Furthermore, the analysis indicates that the reconstructed signals associated with 7-hour scheduled gaps exhibit a slightly greater impact for parameter estimation than 3.5-hour scheduled gaps, as it is shown in Fig. 11. This highlights the need for protected periods within the LISA data stream, particularly when adopting a biweekly maintenance schedule, to prevent scheduled gaps from coinciding with the merger phase of a signal. Therefore, to mitigate possible biases in parameter estimates, we recommend not scheduling antenna repointing during merger time.

V Discussion

In this paper, we have proposed an innovative BiGRU-CAE hybrid model that leverages the strengths of both convolutional autoencoders and gated recurrent units to address the challenges posed by data gaps in LISA gravitational wave observations. The DCAE component is well-suited for extracting relevant features from the high-dimensional input signals, while the BiGRU component can effectively capture the temporal dynamics of the gravitational wave signals. With the limitation of computation, our research focuses on gaps in data streams without noise on one channel. The toy model study demonstrates improvements in parameter estimation when applying our model. It is noteworthy to examine the influence of the timing of data gaps in the context of the realistic massive black hole case study. This underscores the imperative of optimizing the maintenance schedule of the interferometer to mitigate potential biases in gravitational wave (GW) analysis that could result from data gaps.

One key advantage of this hybrid approach is its ability to handle long-duration and interrupted LISA data streams. The convolutional layers within the DCAE are adept at efficiently processing the input signals, while the GRU component is proficient in modeling temporal correlations to ameliorate discontinuities present in the output of the DCAE. This is a significant improvement over previous methods, which struggled with the computational costs and modelling errors associated with handling long gaps. The end-to-end training of the BiGRU-CAE model simplifies subsequent analysis steps compared to traditional Bayesian augmentation methods, which treat the missing data as auxiliary variables Baghi_2019 . Instead of chopping the data stream into pieces before the training of the autoencoder Xu:2024jbo , our model trains the autoencoder with the whole sequence, which gives more robustness when imputing the gaps.

Separating a complex deep-learning model into two stacked components allows for greater modularity and flexibility. Each component can be developed, trained, and optimized independently, making the overall model more adaptable and easier to iterate. It also reduces the overall computational requirements. Furthermore, each component can be designed and trained to specialize in a specific task or learn a particular set of features, which improves performance compared to a single, monolithic model that has to learn all the necessary capabilities.

This study constitutes the preliminary investigation of the gaps at merger in signals from MBHB. Although our model exhibits competence in recovering signals with gaps not located at the merger point, minor discrepancies from the true values in the reconstructed signal with gaps at merger suggest that further research is needed. While we opted for a simplified scenario for the pure signal to illustrate the applicability of our method, subsequent research should incorporate noise into the data stream, thereby enabling the architecture to perform denoising and imputation tasks simultaneously. This approach will align more closely with the realistic conditions encountered in current LISA data processing. In addition, certain computational constraints still exist. We plan to retrain our proposed model utilizing GPU with larger training dataset to process long data streams on the A and E channels, employing a 2D convolutional autoencoder. As our model is predominantly signal-focused, our plan is to develop a more efficient denoising and inpainting pipeline capable of processing signals from diverse sources in the future.

The Python code for the toy model case in Section IV.1 is provided at https://github.com/bpandamao/BiGRU_CAE.

Acknowledgements

We thank Ollie Burke for his helpful discussions. All computations are performed on a virtual machine with 32GB RAM, 16 VCPUs, and an Ubuntu Linux operating system. The autoencoder and bi-directional gated recurrent unit were implemented using Python package PyTorch. We thank the Center for eResearch (CeR) at the University of Auckland for providing access to and assistance with the Nectar Research Cloud. Ruiting Mao would like to thank the University of Auckland for a UoA Doctoral scholarship. MCE and JEL acknowledge support by the Marsden grant MFP-UOA2131 from New Zealand Government funding, administered by the Royal Society Te Aparangi.

Appendix A Loss during Training in toy model and MBHB case

Appendix B Examples of the parameter estimation in application

References

[1] Pau Amaro-Seoane, Heather Audley, Stanislav Babak, John Baker, Enrico Barausse, Peter Bender, Emanuele Berti, Pierre Binetruy, Michael Born, Daniele Bortoluzzi, Jordan Camp, Chiara Caprini, Vitor Cardoso, Monica Colpi, John Conklin, Neil Cornish, Curt Cutler, Karsten Danzmann, Rita Dolesi, Luigi Ferraioli, Valerio Ferroni, Ewan Fitzsimons, Jonathan Gair, Lluis Gesa Bote, Domenico Giardini, Ferran Gibert, Catia Grimani, Hubert Halloin, Gerhard Heinzel, Thomas Hertog, Martin Hewitson, Kelly Holley-Bockelmann, Daniel Hollington, Mauro Hueller, Henri Inchauspe, Philippe Jetzer, Nikos Karnesis, Christian Killow, Antoine Klein, Bill Klipstein, Natalia Korsakova, Shane L Larson, Jeffrey Livas, Ivan Lloro, Nary Man, Davor Mance, Joseph Martino, Ignacio Mateos, Kirk McKenzie, Sean T McWilliams, Cole Miller, Guido Mueller, Germano Nardini, Gijs Nelemans, Miquel Nofrarias, Antoine Petiteau, Paolo Pivato, Eric Plagnol, Ed Porter, Jens Reiche, David Robertson, Norna Robertson, Elena Rossi, Giuliana Russano, Bernard Schutz, Alberto Sesana, David Shoemaker, Jacob Slutsky, Carlos F. Sopuerta, Tim Sumner, Nicola Tamanini, Ira Thorpe, Michael Troebs, Michele Vallisneri, Alberto Vecchio, Daniele Vetrugno, Stefano Vitale, Marta Volonteri, Gudrun Wanner, Harry Ward, Peter Wass, William Weber, John Ziemer, and Peter Zweifel. Laser interferometer space antenna, 2017.
[2] John Baker, Jillian Bellovary, Peter L Bender, Emanuele Berti, Robert Caldwell, Jordan Camp, John W Conklin, Neil Cornish, Curt Cutler, Ryan DeRosa, et al. The laser interferometer space antenna: unveiling the millihertz gravitational wave sky. arXiv preprint arXiv:1907.06482, 2019.
[3] P Bender, A Brillet, I Ciufoloni, AM Cruise, et al. Pre-phase a report. Study Report MPQ, 233:2, 1998.
[4] Monica Colpi, Karsten Danzmann, Martin Hewitson, Kelly Holley-Bockelmann, Philippe Jetzer, Gijs Nelemans, Antoine Petiteau, David Shoemaker, Carlos Sopuerta, Robin Stebbins, Nial Tanvir, Henry Ward, William Joseph Weber, Ira Thorpe, Anna Daurskikh, Atul Deep, Ignacio Fernández Núñez, César García Marirrodriga, Martin Gehler, Jean-Philippe Halain, Oliver Jennrich, Uwe Lammers, Jonan Larrañaga, Maike Lieser, Nora Lützgendorf, Waldemar Martens, Linda Mondin, Ana Piris Niño, Pau Amaro-Seoane, Manuel Arca Sedda, Pierre Auclair, Stanislav Babak, Quentin Baghi, Vishal Baibhav, Tessa Baker, Jean-Baptiste Bayle, Christopher Berry, Emanuele Berti, Guillaume Boileau, Matteo Bonetti, Richard Brito, Riccardo Buscicchio, Gianluca Calcagni, Pedro R. Capelo, Chiara Caprini, Andrea Caputo, Eleonora Castelli, Hsin-Yu Chen, Xian Chen, Alvin Chua, Gareth Davies, Andrea Derdzinski, Valerie Fiona Domcke, Daniela Doneva, Irna Dvorkin, Jose María Ezquiaga, Jonathan Gair, Zoltan Haiman, Ian Harry, Olaf Hartwig, Aurelien Hees, Anna Heffernan, Sascha Husa, David Izquierdo, Nikolaos Karnesis, Antoine Klein, Valeriya Korol, Natalia Korsakova, Thomas Kupfer, Danny Laghi, Astrid Lamberts, Shane Larson, Maude Le Jeune, Marek Lewicki, Tyson Littenberg, Eric Madge, Alberto Mangiagli, Sylvain Marsat, Ivan Martin Vilchez, Andrea Maselli, Josh Mathews, Maarten van de Meent, Martina Muratore, Germano Nardini, Paolo Pani, Marco Peloso, Mauro Pieroni, Adam Pound, Hippolyte Quelquejay-Leclere, Angelo Ricciardone, Elena Maria Rossi, Andrea Sartirana, Etienne Savalle, Laura Sberna, Alberto Sesana, Deirdre Shoemaker, Jacob Slutsky, Thomas Sotiriou, Lorenzo Speri, Martin Staab, Danièle Steer, Nicola Tamanini, Gianmassimo Tasinato, Jesus Torrado, Alejandro Torres-Orjuela, Alexandre Toubiana, Michele Vallisneri, Alberto Vecchio, Marta Volonteri, Kent Yagi, and Lorenz Zwick. Lisa definition study report, 2024.
[5] Emanuele Berti, Vitor Cardoso, and Clifford M Will. Gravitational-wave spectroscopy of massive black holes with the space interferometer lisa. Physical Review D, 73(6):064030, 2006.
[6] Scott A Hughes. Untangling the merger history of massive black holes with lisa. Monthly Notices of the Royal Astronomical Society, 331(3):805–816, 2002.
[7] Alberto Sesana, Francesco Haardt, Piero Madau, and Marta Volonteri. The gravitational wave signal from massive black hole binaries and its contribution to the lisa data stream. The Astrophysical Journal, 623(1):23, 2005.
[8] Alberto Vecchio. Lisa observations of rapidly spinning massive black hole binary systems. Physical Review D, 70(4):042001, 2004.
[9] Jonathan R Gair, Stanislav Babak, Alberto Sesana, Pau Amaro-Seoane, Enrico Barausse, Christopher P L Berry, Emanuele Berti, and Carlos Sopuerta. Prospects for observing extreme-mass-ratio inspirals with lisa. Journal of Physics: Conference Series, 840:012021, May 2017.
[10] Stanislav Babak, Jonathan Gair, Alberto Sesana, Enrico Barausse, Carlos F. Sopuerta, Christopher P. L. Berry, Emanuele Berti, Pau Amaro-Seoane, Antoine Petiteau, and Antoine Klein. Science with the space-based interferometer lisa. v. extreme mass-ratio inspirals. Physical Review D, 95(10), May 2017.
[11] Neil Cornish and Travis Robson. Galactic binary science with the new lisa design. Journal of Physics: Conference Series, 840:012024, May 2017.
[12] B. Willems, A. Vecchio, and V. Kalogera. Probing white dwarf interiors with lisa: Periastron precession in eccentric double white dwarfs. Physical Review Letters, 100(4), January 2008.
[13] Tyson B Littenberg and Nicolás Yunes. Binary white dwarfs as laboratories for extreme gravity with lisa. Classical and Quantum Gravity, 36(9):095017, April 2019.
[14] Pau Amaro Seoane, Manuel Arca Sedda, Stanislav Babak, Christopher P. L. Berry, Emanuele Berti, Gianfranco Bertone, Diego Blas, Tamara Bogdanović, Matteo Bonetti, Katelyn Breivik, Richard Brito, Robert Caldwell, Pedro R. Capelo, Chiara Caprini, Vitor Cardoso, Zack Carson, Hsin-Yu Chen, Alvin J. K. Chua, Irina Dvorkin, Zoltan Haiman, Lavinia Heisenberg, Maximiliano Isi, Nikolaos Karnesis, Bradley J. Kavanagh, Tyson B. Littenberg, Alberto Mangiagli, Paolo Marcoccia, Andrea Maselli, Germano Nardini, Paolo Pani, Marco Peloso, Mauro Pieroni, Angelo Ricciardone, Alberto Sesana, Nicola Tamanini, Alexandre Toubiana, Rosa Valiante, Stamatis Vretinaris, David J. Weir, Kent Yagi, and Aaron Zimmerman. The effect of mission duration on lisa science objectives. General Relativity and Gravitation, 54(1), December 2021.
[15] Lu Wang, Hong-Yu Chen, Xiangyu Lyu, En-Kun Li, and Yi-Ming Hu. Window and inpainting: dealing with data gaps for tianqin, 2024.
[16] Jérôme Carré and Edward K. Porter. The effect of data gaps on lisa galactic binary parameter estimation, 2010.
[17] Matthew C. Edwards, Patricio Maturana-Russel, Renate Meyer, Jonathan Gair, Natalia Korsakova, and Nelson Christensen. Identifying and addressing nonstationary lisa noise. Phys. Rev. D, 102:084062, Oct 2020.
[18] Quentin Baghi, James Ira Thorpe, Jacob Slutsky, John Baker, Tito Dal Canton, Natalia Korsakova, and Nikos Karnesis. Gravitational-wave parameter estimation with gaps in lisa: A bayesian data augmentation method. Physical Review D, 100(2), July 2019.
[19] Kallol Dey, Nikolaos Karnesis, Alexandre Toubiana, Enrico Barausse, Natalia Korsakova, Quentin Baghi, and Soumen Basak. Effect of data gaps on the detectability and parameter estimation of massive black hole binaries with lisa. Physical Review D, 104(4), August 2021.
[20] Jonathan Blackman, Bela Szilagyi, Chad R. Galley, and Manuel Tiglio. Sparse representations of gravitational waves from precessing compact binaries. Physical Review Letters, 113(2), July 2014.
[21] Tyson B. Littenberg. Gravitational wave sources as timing references for lisa data. Phys. Rev. D, 98:043008, Aug 2018.
[22] Aurore Blelly, Jérôme Bobin, and Hervé Moutarde. Sparse data inpainting for the recovery of Galactic-binary gravitational wave signals from gapped data. Monthly Notices of the Royal Astronomical Society, 509(4):5902–5917, 11 2021.
[23] A. Blelly, H. Moutarde, and J. Bobin. Sparsity-based recovery of galactic-binary gravitational waves. Physical Review D, 102(10), November 2020.
[24] Tianyu Zhao, Ruijun Shi, Yue Zhou, Zhoujian Cao, and Zhixiang Ren. Dawning of a new era in gravitational wave data analysis: Unveiling cosmic mysteries via artificial intelligence – a systematic review, 2023.
[25] He Wang, Shichao Wu, Zhoujian Cao, Xiaolin Liu, and Jian-Yang Zhu. Gravitational-wave signal recognition of ligo data by deep learning. Phys. Rev. D, 101:104003, May 2020.
[26] Plamen G. Krastev, Kiranjyot Gill, V. Ashley Villar, and Edo Berger. Detection and parameter estimation of gravitational waves from binary neutron-star mergers in real ligo data using deep learning. Physics Letters B, 815:136161, April 2021.
[27] Daniel George and Eliu Antonio Huerta. Deep learning for real-time gravitational wave detection and parameter estimation: Results with advanced ligo data. Physics Letters B, 778:64–70, 2018.
[28] Matthew C. Edwards. Classifying the equation of state from rotating core collapse gravitational waves with deep learning. Phys. Rev. D, 103:024025, Jan 2021.
[29] Hunter Gabbard, Chris Messenger, Ik Siong Heng, Francesco Tonolini, and Roderick Murray-Smith. Bayesian parameter estimation using conditional variational autoencoders for gravitational-wave astronomy. Nature Physics, 18:112 – 117, 2019.
[30] Tianyu Zhao, Ruoxi Lyu, He Wang, Zhoujian Cao, and Zhixiang Ren. Space-based gravitational wave signal detection and extraction with deep neural network. Communications Physics, 6(1), August 2023.
[31] Shreejit Jadhav, Mihir Shrivastava, and Sanjit Mitra. Towards a robust and reliable deep learning approach for detection of compact binary mergers in gravitational wave data. Machine Learning: Science and Technology, 4(4):045028, nov 2023.
[32] Tarin Eccleston and Matthew C. Edwards. A generative adversarial network for stellar core-collapse gravitational-waves, 2024.
[33] Melissa Lopez, Vincent Boudart, Kerwin Buijsman, Amit Reza, and Sarah Caudill. Simulating transient noise bursts in LIGO with generative adversarial networks. Phys. Rev. D, 106:023027, Jul 2022.
[34] Jade Powell, Ling Sun, Katinka Gereb, Paul D Lasky, and Markus Dollmann. Generating transient noise artefacts in gravitational-wave detector data with generative adversarial networks. Classical and Quantum Gravity, 40(3):035006, jan 2023.
[35] Alvin J. K. Chua, Chad R. Galley, and Michele Vallisneri. Reduced-order modeling with artificial neurons for gravitational-wave inference. Phys. Rev. Lett., 122:211101, May 2019.
[36] Michael L. Katz, Alvin J. K. Chua, Lorenzo Speri, Niels Warburton, and Scott A. Hughes. Fast extreme-mass-ratio-inspiral waveforms: New tools for millihertz gravitational-wave data analysis. Phys. Rev. D, 104:064047, Sep 2021.
[37] Chung-Hao Liao and Feng-Li Lin. Deep generative models of gravitational waveforms via conditional autoencoder. Physical Review D, 103(12), June 2021.
[38] Hunter Gabbard, Michael Williams, Fergus Hayes, and Chris Messenger. Matching matched filtering with deep networks for gravitational-wave astronomy. Physical Review Letters, 120(14), April 2018.
[39] Stephen R. Green, Christine Simpson, and Jonathan Gair. Gravitational-wave parameter estimation with autoregressive neural network flows. Phys. Rev. D, 102:104057, Nov 2020.
[40] Maximilian Dax, Stephen R. Green, Jonathan Gair, Jakob H. Macke, Alessandra Buonanno, and Bernhard Schölkopf. Real-time gravitational wave science with neural posterior estimation. Physical Review Letters, 127(24), December 2021.
[41] Jurriaan Langendorff, Alex Kolmus, Justin Janquart, and Chris Van Den Broeck. Normalizing flows as an avenue to studying overlapping gravitational wave signals. Physical Review Letters, 130(17), April 2023.
[42] Vincenzo Benedetto, Francesco Gissi, Gioele Ciaparrone, and Luigi Troiano. Ai in gravitational wave analysis, an overview. Applied Sciences, 13(17), 2023.
[43] Jun Wang, Wenjie Du, Wei Cao, Keli Zhang, Wenjia Wang, Yuxuan Liang, and Qingsong Wen. Deep learning for multivariate time series imputation: A survey, 2024.
[44] Ruiting Mao, Jeong Eun Lee, Ollie Burke, Alvin J. K. Chua, Matthew C. Edwards, and Renate Meyer. Calibrating approximate bayesian credible intervals of gravitational-wave parameters. Phys. Rev. D, 109:083002, Apr 2024.
[45] P. Bacon, A. Trovato, and M. Bejger. Denoising gravitational-wave signals from binary black holes with dilated convolutional autoencoder, 2022.
[46] Chinthak Murali and David Lumley. Detecting and denoising gravitational wave signals from binary black holes using deep learning. Phys. Rev. D, 108:043024, Aug 2023.
[47] Filip Morawski, Michał Bejger, Elena Cuoco, and Luigia Petre. Anomaly detection in gravitational waves data using convolutional autoencoders. Machine Learning: Science and Technology, 2(4):045014, 2021.
[48] S Hochreiter. Long short-term memory. Neural Computation MIT-Press, 1997.
[49] Eric A. Moreno, Jean-Roch Vlimant, Maria Spiropulu, Bartlomiej Borzyszkowski, and Maurizio Pierini. Source-agnostic gravitational-wave detection with recurrent autoencoders, 2021.
[50] Hongyu Shen, Daniel George, Eliu. A. Huerta, and Zhizhen Zhao. Denoising gravitational waves with enhanced deep recurrent denoising auto-encoders. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, May 2019.
[51] Ryan Raikman, Eric A Moreno, Ekaterina Govorkova, Ethan J Marx, Alec Gunny, William Benoit, Deep Chatterjee, Rafia Omer, Muhammed Saleem, Dylan S Rankin, et al. Gwak: gravitational-wave anomalous knowledge with recurrent autoencoders. Machine Learning: Science and Technology, 5(2):025020, 2024.
[52] Chayan Chatterjee, Linqing Wen, Foivos Diakogiannis, and Kevin Vinsen. Extraction of binary black hole gravitational wave signals from detector data using deep learning. Physical Review D, 104(6):064046, 2021.
[53] Yuxiang Xu, Minghui Du, Peng Xu, Bo Liang, and He Wang. Gravitational Wave Signal Extraction Against Non-Stationary Instrumental Noises with Deep Neural Network. 2 2024.
[54] Jianye Zhang and Peng Yin. Multivariate time series missing data imputation using recurrent denoising autoencoder. In 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 760–764, 2019.
[55] Jacob Daniels, Colleen P. Bailey, and Lu Liang. Filling cloud gaps in satellite aod retrievals using an lstm cnn-autoencoder model. In IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium, pages 2758–2761, 2022.
[56] Yao Jia, Chongyu Zhou, and Mehul Motani. Spatio-temporal autoencoder for feature learning in patient data with missing observations. In 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 886–890, 2017.
[57] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using rnn encoder-decoder for statistical machine translation, 2014.
[58] Ankita Gupta, Gurunath Gurrala, and Pidaparthy S Sastry. Instability prediction in power systems using recurrent neural networks. In IJCAI international joint conference on artificial intelligence, pages 1795–1801. International Joint Conferences on Artificial Intelligence, 2017.
[59] Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, and Yan Liu. Recurrent neural networks for multivariate time series with missing values, 2016.
[60] Yijun Zhao, Matias Berretta, Tong Wang, and Tanuja Chitnis. Gru-df: A temporal model with dynamic imputation for missing target values in longitudinal patient data. In 2020 IEEE International Conference on Healthcare Informatics (ICHI), pages 1–7, 2020.
[61] Shudong Yang, Xueying Yu, and Ying Zhou. Lstm and gru neural network performance comparison study: Taking yelp review dataset as an example. In 2020 International Workshop on Electronic Communication and Artificial Intelligence (IWECAI), pages 98–101, 2020.
[62] Serafín Alonso, Antonio Morán, Daniel Pérez, Miguel A Prada, Juan J Fuertes, and Manuel Domínguez. Gap imputation in related multivariate time series through recurrent neural network-based denoising autoencoder1. Integrated Computer-Aided Engineering, (Preprint):1–16, 2024.
[63] Fayez Abu-Ajamieh. A comparative study of machine learning and neural network models in short-term market prediction. 2024.
[64] Junxiong Chen, Xiong Feng, Lin Jiang, and Qiao Zhu. State of charge estimation of lithium-ion battery using denoising autoencoder and gated recurrent unit recurrent neural network. Energy, 227:120451, 2021.
[65] Hamzaoui Ikhlasse, Duthil Benjamin, Courboulay Vincent, and Medromi Hicham. Multimodal cloud resources utilization forecasting using a bidirectional gated recurrent unit predictor based on a power efficient stacked denoising autoencoders. Alexandria Engineering Journal, 61(12):11565–11577, 2022.
[66] Ling Jiang, Juping Gu, Xinsong Zhang, Liang Hua, and Yueming Cai. Multi-type missing imputation of time-series power equipment monitoring data based on moving average filter–asymmetric denoising autoencoder. Sensors, 23(24), 2023.
[67] Xiaochen Lai, Yachen Yao, Jichong Mu, and Liyong Zhang. Tracking-removed gru with denoising autoencoder for multivariate time series imputation. In 2024 4th International Conference on Neural Networks, Information and Communication Engineering (NNICE), pages 1734–1738, 2024.
[68] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-Antoine Manzagol, and Léon Bottou. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of machine learning research, 11(12), 2010.
[69] Amit Rai, Ashish Shrivastava, and Kartick Chandra Jana. A robust auto encoder-gated recurrent unit (ae-gru) based deep learning approach for short term solar power forecasting. Optik, 2021.
[70] Yann Lecun. PhD thesis: Modeles connexionnistes de l’apprentissage (connectionist learning models). Universite P. et M. Curie (Paris 6), June 1987.
[71] Junhai Zhai, Sufang Zhang, Junfen Chen, and Qiang He. Autoencoder and its various variants. In 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pages 415–419, 2018.
[72] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, ICML ’08, page 1096–1103, New York, NY, USA, 2008. Association for Computing Machinery.
[73] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre-Antoine Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(110):3371–3408, 2010.
[74] Donghoon Lee, Sunghoon Choi, and Hee-Joung Kim. Performance evaluation of image denoising developed using convolutional denoising autoencoders in chest radiography. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 884, 12 2017.
[75] Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. On the properties of neural machine translation: Encoder-decoder approaches, 2014.
[76] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling, 2014.
[77] Sunitha Basodi, Chunyan Ji, Haiping Zhang, and Yi Pan. Gradient amplification: An efficient way to train deep neural networks, 2020.
[78] Roberto Cahuantzi, Xinye Chen, and Stefan Güttel. A Comparison of LSTM and GRU Networks for Learning Symbolic Sequences, page 771–785. Springer Nature Switzerland, 2023.
[79] Chayan Chatterjee, Linqing Wen, Foivos Diakogiannis, and Kevin Vinsen. Extraction of binary black hole gravitational wave signals from detector data using deep learning. Physical Review D, 104(6), September 2021.
[80] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
[81] Andrew L Maas, Awni Y Hannun, Andrew Y Ng, et al. Rectifier nonlinearities improve neural network acoustic models. In Proc. icml, volume 30, page 3. Atlanta, GA, 2013.
[82] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation, 2015.
[83] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12):2481–2495, 2017.
[84] Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation, 2018.
[85] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. 2014.
[86] Foivos I. Diakogiannis, François Waldner, and Peter Caccetta. Looking for change? roll the dice and demand attention. Remote Sensing, 13(18):3707, September 2021.
[87] Nikolaos Karnesis, Michael L Katz, Natalia Korsakova, Jonathan R Gair, and Nikolaos Stergioulas. Eryn: a multipurpose sampler for bayesian inference. Monthly Notices of the Royal Astronomical Society, 526(4):4814–4830, September 2023.
[88] Jonathan Goodman and Jonathan Weare. Ensemble samplers with affine invariance. Communications in Applied Mathematics and Computational Science, 5(1):65 – 80, 2010.
[89] Massimo Tinto and Sanjeev Dhurandhar. Time-delay interferometry. Living Reviews in Relativity, 24, 2021.
[90] Massimo Tinto, Sanjeev Dhurandhar, and Dishari Malakar. Second-generation time-delay interferometry. Phys. Rev. D, 107:082001, Apr 2023.
[91] Lee S Finn. Detection, measurement, and gravitational radiation. Physical Review D, 46(12):5236, 1992.
[92] Eanna E. Flanagan and Scott A. Hughes. Measuring gravitational waves from binary black hole coalescences: 2. The Waves’ information and its extraction, with and without templates. Phys. Rev. D, 57:4566–4587, 1998.
[93] David W Scott. Multivariate density estimation: theory, practice, and visualization. John Wiley & Sons, 2015.
[94] Michael Katz, CChapmanbird, Lorenzo Speri, Nikolaos Karnesis, and Natalia Korsakova. mikekatz04/lisaanalysistools: First main release., April 2024.
[95] Michael L. Katz, Sylvain Marsat, Alvin J.K. Chua, Stanislav Babak, and Shane L. Larson. Gpu-accelerated massive black hole binary parameter estimation with lisa. Physical Review D, 102(2), July 2020.
[96] Michael L. Katz. Fully automated end-to-end pipeline for massive black hole binary signal extraction from lisa data. Physical Review D, 105(4), February 2022.
[97] Sebastian Khan, Sascha Husa, Mark Hannam, Frank Ohme, Michael Pürrer, Xisco Jiménez Forteza, and Alejandro Bohé. Frequency-domain gravitational waves from nonprecessing black-hole binaries. ii. a phenomenological model for the advanced detector era. Physical Review D, 93(4), February 2016.
[98] Lionel London, Sebastian Khan, Edward Fauchon-Jones, Cecilio García, Mark Hannam, Sascha Husa, Xisco Jiménez-Forteza, Chinmay Kalaghatgi, Frank Ohme, and Francesco Pannarale. First higher-multipole model of gravitational waves from spinning and coalescing black-hole binaries. Physical Review Letters, 120(16), April 2018.
[99] Sascha Husa, Sebastian Khan, Mark Hannam, Michael Pürrer, Frank Ohme, Xisco Jiménez Forteza, and Alejandro Bohé. Frequency-domain gravitational waves from nonprecessing black-hole binaries. i. new numerical waveforms and anatomy of the signal. Physical Review D, 93(4), February 2016.
[100] Barak Zackay, Liang Dai, and Tejaswi Venumadhav. Relative binning and fast likelihood evaluation for gravitational wave parameter estimation, 2018.