
SimPSI: A Simple Strategy to Preserve Spectral Information
in Time Series Data Augmentation

Hyun Ryu, Sunjae Yoon, Hee Suk Yoon, Eunseop Yoon, and Chang D. Yoo
Abstract

Data augmentation is a crucial component in training neural networks to overcome the limitation imposed by data size, and several techniques have been studied for time series. Although these techniques are effective in certain tasks, they have yet to be generalized to time series benchmarks. We find that current data augmentation techniques ruin the core information contained within the frequency domain. To address this issue, we propose a simple strategy to preserve spectral information (SimPSI) in time series data augmentation. SimPSI preserves the spectral information by mixing the original and augmented input spectrum weighted by a preservation map, which indicates the importance score of each frequency. Specifically, we build three distinct preservation maps: magnitude spectrum, saliency map, and spectrum-preservative map. We apply SimPSI to various time series data augmentations and evaluate its effectiveness across a wide range of time series benchmarks. Our experimental results show that SimPSI considerably enhances the performance of time series data augmentations by preserving core spectral information. The source code used in the paper is available at https://github.com/Hyun-Ryu/simpsi.

Introduction

Time series data, whether univariate or multivariate, plays a crucial role in various domains such as medicine (Lipton et al. 2016), physiology (Jia et al. 2020), and sensory devices (Yao et al. 2017). Unfortunately, collecting data samples of diverse types is often limited, constraining the performance and capabilities of neural networks that learn from them. To address this issue, data augmentation (Iwana and Uchida 2021; Um et al. 2017) is employed as a simple yet effective solution that artificially increases the number of samples through slight variations or perturbations of the original samples.

Data augmentation techniques have been extensively studied for time series, incorporating methods such as Jittering, Scaling, Magnitude warping, Time warping, Permutation (Um et al. 2017), Shifting (Woo et al. 2022), and Dropout (Yang and Hong 2022). These perturbations have been popular choices in the time domain. Data augmentation is also performed in the frequency domain by applying the Fourier transform to time series data. The spectrum is then randomly perturbed before being converted back into the time domain through the inverse Fourier transform. Notable techniques in this category include Frequency masking, Frequency mixing (Chen et al. 2023), and Frequency adding (Zhang et al. 2022).

Figure 1: Dependency on data domain of time series data augmentation techniques. The plot shows the increment of classification accuracy of a baseline model after applying each data augmentation technique, which is evaluated on signal demodulation (Simulation), human activity recognition (HAR), and sleep stage detection (SleepEDF) tasks.

We have discovered that while the aforementioned data augmentation techniques show effectiveness in certain specific tasks (Um et al. 2017), they do not generalize well to time series classification benchmarks. Our experimental evidence in Fig. 1 shows that the effectiveness of data augmentation techniques varies with the dataset, across signal demodulation, human activity recognition, and sleep stage detection. (Detailed information about the tasks and our experimental setup can be found in the Experiments section.) Those techniques, though reliant on randomness, operate under the assumption that the core information within the data is preserved. However, the result suggests that perturbing the original time series data is heuristic and domain-dependent, which leads to losing essential information necessary to solve the tasks.

The observed reduction in performance is attributed to an implicit bias in the frequency domain introduced by each data augmentation technique. This bias alters the original data distribution. For example, Jittering adds a consistent amount of random noise across all frequencies, often obscuring subtle high-frequency components. Permutation, meanwhile, introduces abrupt changes at the boundaries of each fragment, consistently enhancing high-frequency components. Time warping globally distorts the temporal density of the original data, introducing even more spectral bias than Permutation. Fig. 2 provides illustrative examples of these data augmentation techniques.

Figure 2: Visualization of a representative example from the HAR dataset in the time and frequency domain with various time series data augmentation techniques. Each color denotes a channel, and three channels are shown.

In this paper, we introduce a simple strategy for preserving spectral information during time series data augmentation, which we refer to as SimPSI. Our strategy involves mixing the original spectral data and its augmented form, weighted by a preservation map. After applying any time series data augmentation technique, SimPSI converts the original and augmented time series to the frequency domain. It then combines the original spectrum with the augmented version using the weights given by the preservation map, which indicates the importance score of each frequency component. The combined spectrum is subsequently transformed back to the time domain, resulting in the final output of our framework. The remaining effort concentrates on defining a well-structured preservation map. We propose three types of preservation maps: magnitude spectrum, saliency map, and spectrum-preservative map. The first two types use the given data's magnitude spectrum and saliency map (Simonyan, Vedaldi, and Zisserman 2013) as the preservation map. For the spectrum-preservative map, we develop a preservation map generator that takes an input spectrum and returns the preservation map. This map is learned through a preservation contrastive loss that differentiates the model's output scores according to preservation quality. We also propose a training strategy for improved optimization. To demonstrate the efficacy of SimPSI, we apply it to various time series data augmentation techniques and compare performance across different benchmarks. We also create a simulation to assess whether the proposed method correctly identifies the spectral regions to preserve during data augmentation. Our experimental results demonstrate that SimPSI significantly enhances the effectiveness of time series data augmentation techniques by preserving essential spectral information, thereby preventing unintentional loss of core spectral details.

Related Works

Figure 3: A SimPSI diagram. The original data is augmented randomly in the time domain. Then, the original and augmented data are both Fourier-transformed. The original spectrum is weighted by its preservation map, while the augmented spectrum is weighted by the negated preservation map, and those two are added. It is inverse-Fourier-transformed, which generates an information-preserved augmented view of the original time series data. We use a single-channel time series for better understanding, in which we visualize the real parts of the time series and magnitudes of spectra and omit channel-wise broadcasting.

Data Augmentation for Time Series

Various data augmentation techniques have been proposed for time series. One prevalent and intuitive strategy involves slightly altering the magnitude. For instance, Jittering (Um et al. 2017) introduces additive white Gaussian noise, Scaling (Um et al. 2017) multiplies by a random scalar value, Shifting (Woo et al. 2022) adds a random scalar value, Magnitude warping (Um et al. 2017) multiplies by a random polynomial curve, and Dropout (Yang and Hong 2022) masks random time indices. An alternative approach involves modifying the time scale rather than the magnitude. Time warping (Um et al. 2017), for instance, interpolates the time scale with a random polynomial curve, while Permutation (Um et al. 2017) rearranges the time order. An additional method involves perturbing the spectrum. Techniques such as Frequency masking, Frequency mixing (Chen et al. 2023), and Frequency adding (Zhang et al. 2022) serve as simple strategies that appropriately perturb global dependencies in the time domain.

Data Augmentation for Information Preservation

Data augmentation inherently introduces perturbations into the original data. If not appropriately managed, these perturbations could lead to significant information loss or introduce unnecessary noise and ambiguity. To mitigate this, studies have focused on information preservation. In the vision domain, KeepAugment (Gong et al. 2021) employs a saliency map (Simonyan, Vedaldi, and Zisserman 2013) of each image to identify and preserve informative regions during augmentation. AugMix (Hendrycks et al. 2020) generates a composite of various augmented views of the data and mixes it with the original data, weighted by a random scalar. This ensures the final image is not overly distanced from the original one. In natural language processing, SSMix (Yoon, Kim, and Park 2021) and SMSMix (Yoon et al. 2022) leverage saliency maps to retain certain word sequences, ensuring that crucial information remains intact during the data augmentation process. For time series data, Input smoothing (Liu et al. 2022) scales high-frequency entries in the frequency domain by a random scalar, thereby reducing the impact of data noise. However, its application is limited to noise reduction, and the degree of reduction is randomly determined.

Method

Mixing for Information Preservation

We transform an input time series $x_t \in \mathbb{C}^{C\times L}$, where $C$ and $L$ denote the number of channels and the length of the input, to a spectrum $x_f \in \mathbb{C}^{C\times L}$ by the fast Fourier transform (FFT). Then, we apply data augmentation to $x_t$, which gives an augmented time series $x'_t \in \mathbb{C}^{C\times L}$. We transform $x'_t$ to an augmented spectrum $x'_f \in \mathbb{C}^{C\times L}$ by the FFT. Then, we define a preservation map $P \in \mathbb{R}^L$ with the same length as the spectrum $x_f$, which indicates the importance score of each frequency component between 0 and 1. We mix the spectrum $x_f$ and its augmented view $x'_f$ with the preservation map $P$ to produce an information-preserved spectrum $\tilde{x}_f \in \mathbb{C}^{C\times L}$ as follows:

$\tilde{x}_f = (\mathbf{1}_C \cdot P^T) \odot x_f + (\mathbf{1}_C \cdot (\mathbf{1}_L - P)^T) \odot x'_f.$  (1)

Since the preservation map $P$ applies uniformly to different channels of the spectra, we broadcast it to the channel dimension to enable elementwise multiplication with the spectra. Frequencies with a high importance score have a spectrum value closer to the data $x_f$ than to its augmentation $x'_f$, and those with a low importance score have a spectrum value closer to $x'_f$ than to $x_f$. This enables us to retain important spectral regions and distort non-informative regions during augmentation. We transform $\tilde{x}_f$ back to an information-preserved time series $\tilde{x}_t \in \mathbb{C}^{C\times L}$ by applying the inverse fast Fourier transform (IFFT), which is the final output of the proposed SimPSI. For classifier training, given a classifier $\hat{p}$, the classification loss $\mathcal{L}_{cl}$ is calculated using the cross-entropy loss of the prediction score of $\tilde{x}_t$ and its label $y$ as follows:

$\mathcal{L}_{cl} = \mathcal{L}_{ce}(\hat{p}(y|\tilde{x}_t), y).$  (2)

The following sections focus on defining the preservation map $P$, and we propose three methods: magnitude spectrum, saliency map, and spectrum-preservative map.
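
Before that, as a concrete illustration, the mixing step of Eq. (1) can be sketched in a few lines of PyTorch. This is a minimal sketch under our own naming; it is not the official implementation.

```python
import torch

def simpsi_mix(x_t, x_aug_t, P):
    """Mix an original series and its augmented view in the frequency
    domain, as in Eq. (1).

    x_t, x_aug_t: tensors of shape (C, L).
    P: preservation map of shape (L,) with values in [0, 1].
    """
    x_f = torch.fft.fft(x_t, dim=-1)          # original spectrum
    x_aug_f = torch.fft.fft(x_aug_t, dim=-1)  # augmented spectrum
    # P broadcasts over the channel dimension (the 1_C . P^T term in Eq. (1)).
    x_mix_f = P * x_f + (1.0 - P) * x_aug_f
    return torch.fft.ifft(x_mix_f, dim=-1)    # information-preserved series
```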

Efficient Implementation for Real-Valued Time Series.

Most real-world time series data consists of real values. Using the conjugate symmetry of the Fourier transform for real-valued time series, we take the first half of the spectrum, so the dimensions of the spectra reduce to $\{x_f, x'_f, \tilde{x}_f\} \in \mathbb{C}^{C\times(\lfloor L/2\rfloor + 1)}$ while the time series become $\{x_t, x'_t, \tilde{x}_t\} \in \mathbb{R}^{C\times L}$. The dimension of the preservation map also reduces to $P \in \mathbb{R}^{\lfloor L/2\rfloor + 1}$.
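
Under this real-valued setting, the same mixing step can be sketched with the half spectrum; the function and argument names are again our own illustrative assumptions.

```python
import torch

def simpsi_mix_real(x_t, x_aug_t, P):
    """Real-valued variant of Eq. (1) on the half spectrum.

    x_t, x_aug_t: real tensors of shape (C, L).
    P: preservation map of shape (L // 2 + 1,) with values in [0, 1].
    """
    L = x_t.shape[-1]
    x_f = torch.fft.rfft(x_t, dim=-1)          # (C, L // 2 + 1), complex
    x_aug_f = torch.fft.rfft(x_aug_t, dim=-1)
    x_mix_f = P * x_f + (1.0 - P) * x_aug_f    # mix on the half spectrum
    return torch.fft.irfft(x_mix_f, n=L, dim=-1)  # back to a real (C, L) series
```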

Magnitude Spectrum

We introduce a magnitude spectrum $P_{mag}$ for preserving spectral information, assuming frequencies with large magnitudes are informative while those with small magnitudes are mainly non-informative noise. Given an input spectrum $x_f \in \mathbb{C}^{C\times L}$, we calculate the magnitude spectrum $|x_f| \in \mathbb{R}^{C\times L}$ and take the channel-wise maximum $|x_f|_{max} \in \mathbb{R}^L$ to aggregate the channel information as follows:

$P_{mag} = \mathrm{Norm}(|x_f|_{max})$  (3)

where $\mathrm{Norm}$ is a min-max normalization so that the values of the magnitude spectrum $P_{mag} \in \mathbb{R}^L$ lie between 0 and 1. Preserving frequencies with large magnitudes makes the original and augmented data look alike, but the core information for solving the task might disappear. For instance, detecting abnormalities in Electrocardiogram (ECG) signals relies on capturing the pattern of small high-frequency components (Tragardh and Schlegel 2006), whereas the magnitude spectrum $P_{mag}$ eliminates these core frequencies during data augmentation simply because they have small magnitudes.
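
A minimal sketch of Eq. (3), assuming the spectrum is given as a complex tensor; the small eps term is our addition for numerical safety:

```python
import torch

def magnitude_preservation_map(x_f, eps=1e-8):
    """P_mag of Eq. (3): channel-wise maximum magnitude, min-max normalized.

    x_f: complex spectrum of shape (C, F); returns a map of shape (F,).
    """
    mag = x_f.abs().amax(dim=0)                               # channel-wise max |x_f|
    return (mag - mag.min()) / (mag.max() - mag.min() + eps)  # Norm(.) into [0, 1]
```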

Saliency Map

We present a saliency map for time series, $P_{slc}$, to find informative spectral regions regardless of their magnitudes. Given an input spectrum $x_f \in \mathbb{C}^{C\times L}$, we transform it to a time series $x_t \in \mathbb{C}^{C\times L}$ by the IFFT and feed $x_t$ into the classifier $\hat{p}$ to obtain the corresponding label logit value $\hat{f}(y|x_t)$. Then, we calculate the absolute value of the gradient of the logit value $\hat{f}(y|x_t)$ with respect to the input spectrum $x_f$ and take the channel-wise maximum $|\nabla_{x_f}\hat{f}(y|x_t)|_{max} \in \mathbb{R}^L$ to aggregate the channel information as follows:

$P_{slc} = \mathrm{Norm}(|\nabla_{x_f}\hat{f}(y|\mathcal{F}^{-1}(x_f))|_{max})$  (4)

where $\mathrm{Norm}$ is a min-max normalization that maps the values of the saliency map $P_{slc} \in \mathbb{R}^L$ to between 0 and 1. However, it has a practical problem: the preservation quality depends solely on the training dynamics of the classifier, which could lead to unstable performance. In addition, calculating the saliency map takes a significant amount of time backpropagating gradients, which incurs a computational burden.
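
A sketch of Eq. (4) using autograd is given below. It assumes PyTorch's complex autograd support and a classifier that returns per-class logits for a batched input; the helper name and eps term are ours.

```python
import torch

def saliency_preservation_map(x_f, y, classifier, eps=1e-8):
    """P_slc of Eq. (4): gradient of the label logit w.r.t. the spectrum.

    x_f: complex spectrum of shape (C, F) from a full FFT; y: integer label.
    """
    x_f = x_f.detach().requires_grad_(True)
    x_t = torch.fft.ifft(x_f, dim=-1).real        # back to the time domain
    logit = classifier(x_t.unsqueeze(0))[0, y]    # label logit f(y | x_t)
    (grad,) = torch.autograd.grad(logit, x_f)     # backpropagate to the spectrum
    sal = grad.abs().amax(dim=0)                  # channel-wise maximum
    return (sal - sal.min()) / (sal.max() - sal.min() + eps)
```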

Algorithm 1 SimPSI (Spectrum-Preservative Map) Pseudocode
  Input: input time series $x_t$, label $y$, preservation map generator $G(\cdot)$, classifier $\hat{p}$, and data augmentation $\mathcal{T}$
  function AugmentAndPreserve($x_t$, $x_f$, $P$)
    Sample operation $T \sim \mathcal{T}$
    $x'_t = T(x_t)$ ▷ Apply data augmentation
    $x'_f = \mathrm{FFT}(x'_t)$
    $\tilde{x}_f = (\mathbf{1}_C \cdot P^T) \odot x_f + (\mathbf{1}_C \cdot (\mathbf{1}_L - P)^T) \odot x'_f$
    $\tilde{x}_t = \mathrm{IFFT}(\tilde{x}_f)$
    return $\tilde{x}_t$
  end function
  $x_f = \mathrm{FFT}(x_t)$
  $\tilde{x}_t$ = AugmentAndPreserve($x_t$, $x_f$, $G(x_f)$)
  Compute classification loss $\mathcal{L}_{cl} = \mathcal{L}_{ce}(\hat{p}(y|\tilde{x}_t), y)$
  Sample random preservation map $n_f \sim U(0,1)$
  $\tilde{x}^{rnd}_t$ = AugmentAndPreserve($x_t$, $x_f$, $n_f$)
  $\tilde{x}^{+}_t$ = AugmentAndPreserve($x_t$, $x_f$, $G(x_f)$) ▷ $\tilde{x}_t \neq \tilde{x}^{+}_t$
  $\tilde{x}^{-}_t$ = AugmentAndPreserve($x_t$, $x_f$, $1 - G(x_f)$)
  Compute classification losses $\mathcal{L}_{cl}^{rnd}$, $\mathcal{L}_{cl}^{+}$, and $\mathcal{L}_{cl}^{-}$ for $\tilde{x}^{rnd}_t$, $\tilde{x}^{+}_t$, and $\tilde{x}^{-}_t$, respectively
  Compute preservation contrastive loss $\mathcal{L}_{pc} = \max(\mathcal{L}_{cl}^{+} - \mathcal{L}_{cl}^{rnd} + \beta_1, 0) + \max(\mathcal{L}_{cl}^{+} - \mathcal{L}_{cl}^{-} + \beta_2, 0)$
  Loss output: $\mathcal{L}_{cl}$, $\mathcal{L}_{pc}$

Spectrum-Preservative Map

We introduce a spectrum-preservative map $P_{sp}$, incorporating a preservation map generator $G$ on top of the classifier $\hat{p}$, which alleviates the unstable training dynamics of the saliency map. The generator is a feedforward network that does not require any additional backpropagation through the classifier $\hat{p}$ when estimating the preservation map, resolving the computational burden. The following describes how we design the preservation map generator $G$, what objective functions are used, and how to train it with the classifier $\hat{p}$.

Preservation Map Generator.

Given an input spectrum $x_f \in \mathbb{C}^{C\times L}$, we concatenate the real and imaginary parts of $x_f$ along the channel dimension, changing its shape to $x_f \in \mathbb{R}^{2C\times L}$. Then, $x_f$ is fed into a two-layer transformer encoder to capture the underlying context of the spectral representation. The output of the last layer is averaged over the channel dimension to aggregate the channel information and passed through the sigmoid function to map the values between 0 and 1 as follows:

$P_{sp} = G(x_f) = \mathrm{Sigmoid}(\mathrm{Enc}(x_f)_{mean}).$  (5)
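
A minimal sketch of such a generator is shown below; the projection width, head count, and pooling details are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class PreservationMapGenerator(nn.Module):
    """Sketch of G in Eq. (5): a 2-layer Transformer encoder over the spectrum."""

    def __init__(self, n_channels, d_model=64, n_heads=2):
        super().__init__()
        self.proj = nn.Linear(2 * n_channels, d_model)  # real/imag parts stacked
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=256, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x_f):                          # x_f: (B, C, F), complex
        z = torch.cat([x_f.real, x_f.imag], dim=1)   # (B, 2C, F)
        z = self.enc(self.proj(z.transpose(1, 2)))   # frequencies as tokens
        return torch.sigmoid(z.mean(dim=-1))         # (B, F), values in (0, 1)
```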

Preservation Contrastive Loss.

To train the preservation map generator $G$, we introduce a preservation contrastive loss $\mathcal{L}_{pc}$. Assume that an input spectrum $x_f \in \mathbb{C}^{C\times L}$, an augmented spectrum $x'_f \in \mathbb{C}^{C\times L}$, and the corresponding spectrum-preservative map $P_{sp}$ are given. We define an information-preserved spectrum $\tilde{x}^{+}_f = (\mathbf{1}_C \cdot P_{sp}^T) \odot x_f + (\mathbf{1}_C \cdot (\mathbf{1}_L - P_{sp})^T) \odot x'_f$ and a spectrum that preserves the inverted preservation map $\tilde{x}^{-}_f = (\mathbf{1}_C \cdot (\mathbf{1}_L - P_{sp})^T) \odot x_f + (\mathbf{1}_C \cdot P_{sp}^T) \odot x'_f$. Then, the classifier $\hat{p}$ should predict $\tilde{x}^{+}_t$ better than $\tilde{x}^{-}_t$. Furthermore, if we define a randomly-preserved spectrum $\tilde{x}^{rnd}_f = (\mathbf{1}_C \cdot n_f^T) \odot x_f + (\mathbf{1}_C \cdot (\mathbf{1}_L - n_f)^T) \odot x'_f$, where $n_f \in \mathbb{R}^L$ is random noise sampled from $U(0,1)$, then the prediction score of $\tilde{x}^{rnd}_t$ by the classifier $\hat{p}$ should lie between those of $\tilde{x}^{+}_t$ and $\tilde{x}^{-}_t$. We denote $\tilde{x}^{\{+,-,rnd\}}_t$ and $\tilde{x}^{\{+,-,rnd\}}_f$ as Fourier transform pairs of a time series and its corresponding spectrum. These constraints can be formulated as follows:

$\hat{p}(y|\tilde{x}^{+}_t) > \hat{p}(y|\tilde{x}^{rnd}_t) > \hat{p}(y|\tilde{x}^{-}_t).$  (6)

We then define the corresponding classification losses using the cross-entropy loss for these three predictions, $\mathcal{L}_{cl}^{+} = \mathcal{L}_{ce}(\hat{p}(y|\tilde{x}^{+}_t), y)$, $\mathcal{L}_{cl}^{rnd} = \mathcal{L}_{ce}(\hat{p}(y|\tilde{x}^{rnd}_t), y)$, and $\mathcal{L}_{cl}^{-} = \mathcal{L}_{ce}(\hat{p}(y|\tilde{x}^{-}_t), y)$, where $y$ is the label of the input time series $x_t$, and translate the constraints into an objective function as follows:

$\mathcal{L}_{pc} = \max(\mathcal{L}_{cl}^{+} - \mathcal{L}_{cl}^{rnd} + \beta_1, 0) + \max(\mathcal{L}_{cl}^{+} - \mathcal{L}_{cl}^{-} + \beta_2, 0)$  (7)

where $\beta_1$ and $\beta_2$ are hyperparameters satisfying $\beta_1 < \beta_2$.
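
In code, Eq. (7) reduces to two hinge terms; a sketch, with placeholder default margins ($\beta_1 < \beta_2$):

```python
import torch

def preservation_contrastive_loss(loss_pos, loss_rnd, loss_neg,
                                  beta1=0.1, beta2=0.5):
    """Eq. (7): encourages L_cl+ to be smaller than L_cl^rnd by margin beta1
    and smaller than L_cl- by margin beta2. Default margins are placeholders."""
    return (torch.clamp(loss_pos - loss_rnd + beta1, min=0)
            + torch.clamp(loss_pos - loss_neg + beta2, min=0))
```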

Model Training and Inference.

We use two objective functions for model training: the classification loss $\mathcal{L}_{cl}$ for classifier training and the preservation contrastive loss $\mathcal{L}_{pc}$ for preservation map generator training. We separate the training procedure, updating the classifier $\hat{p}$ by $\mathcal{L}_{cl}$ with the preservation map generator $G$ frozen, and then updating the preservation map generator $G$ by $\mathcal{L}_{pc}$ with the classifier $\hat{p}$ frozen. This can be formulated as follows:

$\hat{\theta}_p = \operatorname{argmin}_{\theta_p} \mathcal{L}_{cl}(x_t \mid \theta_G, \theta_p)$  (8)
$\hat{\theta}_G = \operatorname{argmin}_{\theta_G} \mathcal{L}_{pc}(x_t \mid \theta_G, \theta_p)$

where $\theta_G$ and $\theta_p$ are the parameters of the preservation map generator $G$ and the classifier $\hat{p}$, respectively. Note that $\theta_G$ does not descend along the gradient of $\mathcal{L}_{cl}$. This prevents $G$ from learning undesirable local minima, such as returning a uniform scalar value across different frequencies or the same map across different samples, in which case the preservation map is not adaptive to the input time series but acts as a uniform band-pass filter. Also, since the preservation map generator $G$ is removed during inference, updating $G$ by the classification loss might interfere with classifier training.
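
Putting Eqs. (2), (7), and (8) together, one alternating training step could be sketched as follows, reusing the `preservation_contrastive_loss` helper above. Optimizer handling and tensor shapes are our assumptions, not the authors' exact code.

```python
import torch
import torch.nn.functional as F

def train_step(x_t, y, gen, clf, opt_clf, opt_gen, augment):
    """One alternating update of Eq. (8). x_t: (B, C, L); y: (B,) labels."""
    x_f = torch.fft.rfft(x_t, dim=-1)

    def preserved_view(P):
        # Freshly augment, then mix with preservation map P (Eq. (1)).
        x_aug_f = torch.fft.rfft(augment(x_t), dim=-1)
        P = P.unsqueeze(1)                       # broadcast over channels
        return torch.fft.irfft(P * x_f + (1 - P) * x_aug_f,
                               n=x_t.shape[-1], dim=-1)

    # (i) Update the classifier with the generator frozen.
    with torch.no_grad():
        P = gen(x_f)
    loss_cl = F.cross_entropy(clf(preserved_view(P)), y)
    opt_clf.zero_grad(); loss_cl.backward(); opt_clf.step()

    # (ii) Update the generator with the classifier frozen. Classifier
    # gradients from this pass are discarded by the next opt_clf.zero_grad().
    P = gen(x_f)
    loss_pos = F.cross_entropy(clf(preserved_view(P)), y)
    loss_neg = F.cross_entropy(clf(preserved_view(1 - P)), y)
    loss_rnd = F.cross_entropy(clf(preserved_view(torch.rand_like(P))), y)
    loss_pc = preservation_contrastive_loss(loss_pos, loss_rnd, loss_neg)
    opt_gen.zero_grad(); loss_pc.backward(); opt_gen.step()
    return loss_cl.item(), loss_pc.item()
```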

Experiments

Signal Demodulation (Simulation)

Figure 4: Finding a set of frequencies to preserve using SimPSI (Spectrum-preservative map) during Frequency masking. The top row shows representative input magnitude spectra from the FSK8 test set. The bottom row shows the corresponding learned preservation map where the ten largest values are marked as diamonds.

Experimental Setting.

We verified whether the proposed method improves the performance of existing data augmentation techniques by capturing important spectral regions and preserving them. To do so, we devised a simulation where information is carried on a set of known frequencies. Inspired by the wireless communication domain (Ryu and Choi 2023), we constructed a synthetic dataset by modulating a sequence of random bits into the corresponding frequencies of a signal, called frequency shift keying (FSK); the task is to demodulate it. We used 8 and 32 different frequencies for modulation (FSK8 and FSK32), and each dataset consists of 2,304 training signals, 288 validation signals, and 288 test signals. We chose ResNet1D (Ramjee et al. 2020) as a baseline network, which was given a 128-length modulated signal and returned a 32-length M-ary (M = 8, 32) sequence. We used the Adam optimizer (Kingma and Ba 2015) with a learning rate of $10^{-3}$, and the networks were trained for 50 epochs. The training was performed on a single NVIDIA RTX A6000 GPU. Appendix A provides more details about our experimental setup.

Performance Enhancement through SimPSI.

The performance improvement of random augmentations by the proposed method on the FSK32 dataset is described in Table 1. The accuracies of Jittering, Scale-Shift-Jittering, and Frequency masking were increased by 1.5%, 1.4%, and 1.5%, respectively, using the spectrum-preservative map.

Learned Preservation Map.

We then verified whether the learned preservation map genuinely preserves the informative frequency components during augmentation. We displayed learned preservation maps of representative samples from the FSK8 test set in Fig. 4. We could observe eight equally-spaced frequencies that were preserved the most during Frequency masking. This perfectly matches the data generation process, since we used those eight frequencies for signal modulation. The other frequencies did not contain information and showed preservation values of around 0.5, meaning those components barely contributed to achieving Eq. (6).

Human Activity Recognition (HAR)

Model                                   | Simulation Accuracy (Δ) | HAR Accuracy (Δ)   | SleepEDF Accuracy (Δ)
None                                    | 94.8 ± 0.1 (N/A)        | 94.0 ± 0.8 (N/A)   | 80.7 ± 0.1 (N/A)
Jittering (Um et al. 2017)              | 94.6 ± 0.1 (−0.2)       | 93.8 ± 0.8 (−0.2)  | 81.2 ± 0.4 (+0.5)
+ Random preservation map               | 96.2 ± 0.3 (+1.4)       | 93.6 ± 0.5 (−0.4)  | 81.2 ± 0.5 (+0.5)
+ SimPSI (Magnitude spectrum)           | 94.9 ± 0.3 (+0.1)       | 94.2 ± 0.5 (+0.2)  | 81.0 ± 0.3 (+0.3)
+ SimPSI (Saliency map)                 | 96.2 ± 0.2 (+1.4)       | 93.7 ± 0.6 (−0.3)  | 81.4 ± 0.4 (+0.7)
+ SimPSI (Spectrum-preservative map)    | 96.3 ± 0.1 (+1.5)       | 94.1 ± 0.3 (+0.1)  | 81.3 ± 0.4 (+0.6)
Scale-Shift-Jittering (Woo et al. 2022) | 94.6 ± 0.2 (−0.2)       | 92.8 ± 0.5 (−1.2)  | 80.7 ± 0.2 (0)
+ Random preservation map               | 95.3 ± 0.1 (+0.5)       | 94.0 ± 1.3 (0)     | 80.4 ± 1.5 (−0.3)
+ SimPSI (Magnitude spectrum)           | 94.9 ± 0.1 (+0.1)       | 94.9 ± 1.0 (+0.9)  | 81.4 ± 0.7 (+0.7)
+ SimPSI (Saliency map)                 | 95.6 ± 0.3 (+0.8)       | 94.8 ± 0.2 (+0.8)  | 81.2 ± 0.5 (+0.5)
+ SimPSI (Spectrum-preservative map)    | 96.2 ± 0.1 (+1.4)       | 94.8 ± 0.6 (+0.8)  | 80.9 ± 1.0 (+0.2)
Frequency masking (Chen et al. 2023)    | 94.2 ± 0.3 (−0.6)       | 93.9 ± 1.7 (−0.1)  | 81.0 ± 0.4 (+0.3)
+ Random preservation map               | 96.2 ± 0.3 (+1.4)       | 95.0 ± 0.6 (+1.0)  | 81.5 ± 0.5 (+0.8)
+ SimPSI (Magnitude spectrum)           | 94.9 ± 0.1 (+0.1)       | 95.0 ± 0.4 (+1.0)  | 80.5 ± 0.4 (−0.2)
+ SimPSI (Saliency map)                 | 96.3 ± 0.1 (+1.5)       | 95.0 ± 0.5 (+1.0)  | 80.2 ± 1.4 (−0.5)
+ SimPSI (Spectrum-preservative map)    | 96.3 ± 0.2 (+1.5)       | 95.0 ± 0.5 (+1.0)  | 81.7 ± 0.2 (+1.0)
Table 1: Performance on Signal Demodulation (Simulation test set), Human Activity Recognition (HAR test set), and Sleep Stage Detection (SleepEDF test set) using different random augmentations with and without SimPSI. Accuracy and its increment from not using augmentation are reported with three different seeds.

Model                      | 3-layer CNN | 2-layer LSTM | 2-layer Transformer
Jittering (Um et al. 2017) | 95.1        | 92.2         | 94.2
+ SimPSI ($P_{sp}$)        | 95.2        | 93.7         | 96.4

Table 2: Performance on Human Activity Recognition using various model architectures with and without SimPSI (Spectrum-preservative map). AUPRC scores are averaged over three different seeds.

Experimental Setting.

In the HAR dataset (Reyes-Ortiz et al. 2012), data is collected by the accelerometer and gyroscope of a smartphone mounted on the waist and sampled at 50 Hz, and the goal is to classify human activities. Following the data preprocessing in (Eldele et al. 2021), an input time series has a length of 128 and nine channels. The dataset consists of 7,352 training samples and 2,947 test samples labeled with six classes. We chose a 3-layer CNN model for classification, which was used in (Wang, Yan, and Oates 2017; Eldele et al. 2021; Zhang et al. 2022), and additionally included a 2-layer LSTM model and a 2-layer Transformer model for further verification. We used the Adam optimizer with a learning rate of $10^{-3}$, and the networks were trained for 100 epochs. We adhered to the configurations in (Eldele et al. 2021), and the training was performed on a single NVIDIA RTX A6000 GPU. Appendices B and C provide more details about our experimental setup.

Figure 5: Testing accuracy of a 3-layer CNN model trained on the HAR dataset using Permutation with and without SimPSI (Spectrum-preservative map) while varying the maximum number of segments.

Performance Enhancement through SimPSI.

We compared the performance of the model with and without SimPSI to evaluate the impact of SimPSI on recognition accuracy. To inspect its impact thoroughly, we performed experiments on three perspectives: different random augmentations, model architectures, and distortion magnitudes.

The performance increase of data augmentations by SimPSI on the HAR dataset is described in Table 1. We also tested an intuitive method that mixes the original data and its augmented view with a random preservation map sampled from $U(0,1)$. The accuracy of Jittering is enhanced by 0.2% using the magnitude spectrum, while the random preservation map decreases it by 0.4%. The accuracy of Scale-Shift-Jittering is improved by 0.9% using the magnitude spectrum, and Frequency masking is improved by 1.0% using all the proposed preservation maps. Appendix D provides more experimental results, and Appendix E provides the training time cost of the preservation maps.

Using Jittering, we compared three types of networks for time series classification: CNN, LSTM, and Transformer (Table 2). SimPSI increased the area under the precision-recall curve (AUPRC) of a 3-layer CNN by 0.1, while it increased the AUPRC of a 2-layer LSTM by 1.5 and a 2-layer Transformer by 2.2.

Using Permutation, we also tested SimPSI while varying the distortion magnitude of the data augmentation. We changed the maximum number of segments in Permutation and compared the accuracy with and without SimPSI (Fig. 5). SimPSI consistently improved the performance of Permutation regardless of its distortion strength, alleviating the performance drop as the number of segments increased. Specifically, comparing the performance at 10 and 12 segments, Permutation alone dropped the accuracy by 0.6, while Permutation with SimPSI dropped it by only 0.3.

Sleep Stage Detection (SleepEDF)

Experimental Setting.

We used the SleepEDF dataset (Goldberger et al. 2000) for classifying sleep stages from Electroencephalogram (EEG) signals sampled at 100 Hz. We followed the data preprocessing in (Eldele et al. 2021), where the input has a length of 3,000 and a single channel. The dataset comprises 35,503 training samples and 6,805 test samples labeled with five classes. We chose a 3-layer CNN model for classification, also used in the human activity recognition experiments. We used the Adam optimizer with a learning rate of $10^{-3}$, and the networks were trained for 40 epochs. We adhered to the configurations in (Eldele et al. 2021) for other details. The training was performed on a single NVIDIA RTX A6000 GPU.

Performance Enhancement through SimPSI.

The performance improvement of data augmentations by SimPSI on the SleepEDF dataset is summarized in Table 1. SimPSI increased the detection accuracy of Jittering by 0.7% using the saliency map, Scale-Shift-Jittering by 0.7% using the magnitude spectrum, and Frequency masking by 1.0% using the spectrum-preservative map. We note that the spectrum-preservative map outperformed the random preservation map regardless of the baseline augmentation techniques, supporting the effectiveness of the information-preserving approach.

Atrial Fibrillation Classification (Waveform)

We used the Waveform dataset (Moody 1983) for classifying rhythm types from ECG recordings of human subjects with atrial fibrillation. It was sampled at 250 Hz, and we followed the data preprocessing steps in (Tonekaboni, Eytan, and Goldenberg 2021). Every input has a length of 2,500 and two channels. The dataset comprises 59,922 training samples and 16,645 test samples labeled with four classes. We chose a 1-dimensional strided CNN with six convolutional layers and a total down-sampling factor of 16, proposed in (Tonekaboni, Eytan, and Goldenberg 2021). We used the Adam optimizer with a learning rate of $10^{-4}$, and the networks were trained for 8 epochs. We adhered to the configurations in (Tonekaboni, Eytan, and Goldenberg 2021) for other details. The training was performed on a single NVIDIA RTX A6000 GPU. Performance enhancement through SimPSI is described in Appendix D.

Model                                      | Accuracy   | AUPRC
Scale-Shift-Jittering                      | 92.0 ± 1.9 | 62.6 ± 2.3
+ SimPSI ($P_{sp}$)                        | 95.2 ± 0.3 | 64.7 ± 2.0
+ SimPSI ($P_{sp}$) w/o $\mathcal{L}_{pc}$ | 94.9 ± 0.1 | 63.7 ± 2.1
+ SimPSI ($P_{sp}$) w/ joint training      | 94.5 ± 0.3 | 62.9 ± 1.9

Table 3: Ablation of SimPSI (Spectrum-preservative map) on Atrial Fibrillation Classification. Accuracy and AUPRC scores are reported with three different seeds.

Ablations

We ablated the proposed method from two perspectives, verifying the impact of the preservation contrastive loss and of the separate training strategy. We report the performance of a 6-layer CNN model on the Waveform dataset with a composition of Scaling, Shifting, and Jittering (Woo et al. 2022) applied (Table 3). Removing the preservation contrastive loss resulted in a 0.3 decrease in accuracy and a 1.0 decrease in AUPRC. Jointly training with the cross-entropy loss and the preservation contrastive loss resulted in a 0.7 decrease in accuracy and a 1.8 decrease in AUPRC.

Discussions

Figure 6: SimPSI's dependency on data domain. The plot shows the increment of classification accuracy of a baseline model after applying each data augmentation technique with SimPSI (Spectrum-preservative map), which is evaluated on Simulation, HAR, and SleepEDF datasets.

Figure 7: Comparison of spectrum-preservative maps and saliency maps. All the maps are averaged on the HAR, SleepEDF, and Waveform test sets. We used Jittering during training.

SimPSI’s Dependency on Data Domain.

We observed that the data augmentation techniques did not generalize well across time series benchmarks (Fig. 1). Specifically, no augmentation increased the accuracy on the Simulation dataset. SimPSI resolved this issue: the information-preserving approaches consistently improved the performance regardless of the task (Fig. 6). As a result, SimPSI made augmentation independent of the data domain by preserving core spectral information. Appendix F provides more results using different preservation maps.

Comparison of Preservation Maps.

The averaged spectrum-preservative map for the HAR dataset showed that the few lowest frequencies were preserved better than the higher frequencies. The averaged saliency map showed a similar tendency, where the saliency value was high at the few lowest frequencies and fell to zero as the frequency increased. For the SleepEDF dataset, there were four distinct frequency clusters in the spectrum-preservative map, and we could find corresponding clusters in the saliency map. For the Waveform dataset, unlike the previous two datasets, high-frequency components were preserved more than the lower ones in both preservation maps. These maps are displayed in Fig. 7.

Conclusion

We presented a simple strategy to preserve spectral information (SimPSI) in time series data augmentation. Our investigation into the simulation task proved that the proposed method preserves the informative frequency components during augmentation. Our experimental results on various time series tasks with different data augmentation techniques illustrated the effectiveness of SimPSI in enhancing the model performance. We believe that SimPSI is a powerful tool to mitigate the data domain dependency of time series data augmentation techniques and improve the model performance in various time series tasks.

Acknowledgments

This work was supported by Institute for Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-01381, Development of Causal AI through Video Understanding and Reinforcement Learning, and Its Applications to Real Environments) and partly supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2022-0-00184, Development and Study of AI Technologies to Inexpensively Conform to Evolving Policy on Ethics).

References

  • Chen et al. (2023) Chen, M.; Xu, Z.; Zeng, A.; and Xu, Q. 2023. FrAug: Frequency Domain Augmentation for Time Series Forecasting. arXiv preprint arXiv:2302.09292.
  • Eldele et al. (2021) Eldele, E.; Ragab, M.; Chen, Z.; Wu, M.; Kwoh, C. K.; Li, X.; and Guan, C. 2021. Time-Series Representation Learning via Temporal and Contextual Contrasting. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, 2352–2359.
  • Goldberger et al. (2000) Goldberger, A. L.; Amaral, L. A.; Glass, L.; Hausdorff, J. M.; Ivanov, P. C.; Mark, R. G.; Mietus, J. E.; Moody, G. B.; Peng, C.-K.; and Stanley, H. E. 2000. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation, 101(23): e215–e220.
  • Gong et al. (2021) Gong, C.; Wang, D.; Li, M.; Chandra, V.; and Liu, Q. 2021. KeepAugment: A simple information-preserving data augmentation approach. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 1055–1064.
  • Hendrycks et al. (2020) Hendrycks, D.; Mu, N.; Cubuk, E. D.; Zoph, B.; Gilmer, J.; and Lakshminarayanan, B. 2020. AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty. Proceedings of the International Conference on Learning Representations (ICLR).
  • Iwana and Uchida (2021) Iwana, B. K.; and Uchida, S. 2021. An empirical survey of data augmentation for time series classification with neural networks. Plos one, 16(7): e0254841.
  • Jia et al. (2020) Jia, Z.; Cai, X.; Zheng, G.; Wang, J.; and Lin, Y. 2020. SleepPrintNet: A multivariate multimodal neural network based on physiological time-series for automatic sleep staging. IEEE Transactions on Artificial Intelligence, 1(3): 248–257.
  • Kingma and Ba (2015) Kingma, D.; and Ba, J. 2015. Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations (ICLR). San Diego, CA, USA.
  • Lipton et al. (2016) Lipton, Z. C.; Kale, D. C.; Elkan, C.; and Wetzel, R. C. 2016. Learning to Diagnose with LSTM Recurrent Neural Networks. In Bengio, Y.; and LeCun, Y., eds., 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings.
  • Liu et al. (2022) Liu, X.; Liang, Y.; Huang, C.; Zheng, Y.; Hooi, B.; and Zimmermann, R. 2022. When do contrastive learning signals help spatio-temporal graph forecasting? In Proceedings of the 30th International Conference on Advances in Geographic Information Systems, 1–12.
  • Moody (1983) Moody, G. 1983. A new method for detecting atrial fibrillation using RR intervals. Proc. Comput. Cardiol., 10: 227–230.
  • Ramjee et al. (2020) Ramjee, S.; Ju, S.; Yang, D.; Liu, X.; El Gamal, A.; and Eldar, Y. C. 2020. Fast Deep Learning for Automatic Modulation Classification. IEEE Transactions on Cognitive Communications and Networking.
  • Reyes-Ortiz et al. (2012) Reyes-Ortiz, J.; Anguita, D.; Ghio, A.; Oneto, L.; and Parra, X. 2012. Human Activity Recognition Using Smartphones. UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C54S4K.
  • Ryu and Choi (2023) Ryu, H.; and Choi, J. 2023. EMC2-Net: Joint Equalization and Modulation Classification Based on Constellation Network. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–5. IEEE.
  • Simonyan, Vedaldi, and Zisserman (2013) Simonyan, K.; Vedaldi, A.; and Zisserman, A. 2013. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034.
  • Tonekaboni, Eytan, and Goldenberg (2021) Tonekaboni, S.; Eytan, D.; and Goldenberg, A. 2021. Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding. In International Conference on Learning Representations.
  • Tragardh and Schlegel (2006) Tragardh, E.; and Schlegel, T. T. 2006. High-frequency ECG.
  • Um et al. (2017) Um, T. T.; Pfister, F. M.; Pichler, D.; Endo, S.; Lang, M.; Hirche, S.; Fietzek, U.; and Kulić, D. 2017. Data augmentation of wearable sensor data for parkinson’s disease monitoring using convolutional neural networks. In Proceedings of the 19th ACM international conference on multimodal interaction, 216–220.
  • Wang, Yan, and Oates (2017) Wang, Z.; Yan, W.; and Oates, T. 2017. Time series classification from scratch with deep neural networks: A strong baseline. In 2017 International joint conference on neural networks (IJCNN), 1578–1585. IEEE.
  • Woo et al. (2022) Woo, G.; Liu, C.; Sahoo, D.; Kumar, A.; and Hoi, S. 2022. CoST: Contrastive Learning of Disentangled Seasonal-Trend Representations for Time Series Forecasting. In International Conference on Learning Representations.
  • Yang and Hong (2022) Yang, L.; and Hong, S. 2022. Unsupervised time-series representation learning with iterative bilinear temporal-spectral fusion. In International Conference on Machine Learning, 25038–25054. PMLR.
  • Yao et al. (2017) Yao, S.; Hu, S.; Zhao, Y.; Zhang, A.; and Abdelzaher, T. 2017. Deepsense: A unified deep learning framework for time-series mobile sensing data processing. In Proceedings of the 26th international conference on world wide web, 351–360.
  • Yoon et al. (2022) Yoon, H. S.; Yoon, E.; Harvill, J.; Yoon, S.; Hasegawa-Johnson, M.; and Yoo, C. 2022. SMSMix: Sense-Maintained Sentence Mixup for Word Sense Disambiguation. In Findings of the Association for Computational Linguistics: EMNLP 2022, 1493–1502. Abu Dhabi, United Arab Emirates: Association for Computational Linguistics.
  • Yoon, Kim, and Park (2021) Yoon, S.; Kim, G.; and Park, K. 2021. SSMix: Saliency-Based Span Mixup for Text Classification. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 3225–3234. Online: Association for Computational Linguistics.
  • Zhang et al. (2022) Zhang, X.; Zhao, Z.; Tsiligkaridis, T.; and Zitnik, M. 2022. Self-Supervised Contrastive Pre-Training For Time Series via Time-Frequency Consistency. In Oh, A. H.; Agarwal, A.; Belgrave, D.; and Cho, K., eds., Advances in Neural Information Processing Systems.

Appendix A. Details of Simulation Dataset

This section gives a detailed explanation of how the Simulation dataset for signal demodulation was constructed. We modulated the signal using frequency shift keying (FSK), which assigns a sequence of bits to a sequence of frequencies of a signal. We used 8 and 32 different frequencies for signal modulation (i.e., FSK8, FSK32), where the frequencies were separated by 16 Hz for FSK8 and 4 Hz for FSK32, with a sample rate of 128 Hz. The M-ary (M = 8, 32) sequence of random bits had a length of 32, and the number of samples per symbol was 4, which made the signal length 128. After signal modulation, we included an additive white Gaussian noise (AWGN) channel, in which the signal-to-noise ratio (SNR) varied from 10 to 28 dB. The signal was then normalized to unit power. Data was generated in MATLAB, following the instructions given by (Ryu and Choi 2023).
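
For reference, the generation process can be sketched in Python as follows. This is an illustrative re-implementation under our own function and parameter names, not the MATLAB code actually used.

```python
import numpy as np

def make_fsk_signal(M=8, n_symbols=32, sps=4, fs=128):
    """Sketch of the FSK generation described above.

    Tones are separated by 16 Hz for M = 8 and 4 Hz for M = 32,
    at a sample rate of fs = 128 Hz with sps samples per symbol.
    """
    f_sep = 16 if M == 8 else 4
    symbols = np.random.randint(0, M, n_symbols)    # random M-ary sequence
    t = np.arange(sps) / fs
    sig = np.concatenate([np.exp(2j * np.pi * s * f_sep * t) for s in symbols])
    # AWGN channel with SNR drawn from 10-28 dB, then unit-power normalization.
    snr_db = np.random.uniform(10, 28)
    noise = np.sqrt(10 ** (-snr_db / 10) / 2) * (
        np.random.randn(sig.size) + 1j * np.random.randn(sig.size))
    sig = sig + noise
    return sig / np.sqrt(np.mean(np.abs(sig) ** 2)), symbols
```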

Appendix B. Data Augmentations

We evaluated the effectiveness of the proposed SimPSI on seven random data augmentation techniques: Jittering (Um et al. 2017), Magnitude warping (Um et al. 2017), Dropout (Yang and Hong 2022), Time warping (Um et al. 2017), Permutation (Um et al. 2017), Frequency masking (Chen et al. 2023), and a composition of Scaling, Shifting, and Jittering (Woo et al. 2022). We designed each technique to be applied with probability $p = 0.5$ so that original data is also incorporated in the training set. The following defines each technique and specifies parameter values; two of the definitions are sketched in code after the list. We denote an input time series as $x_t \in \mathbb{C}^{C\times L}$ and an augmented one as $x'_t \in \mathbb{C}^{C\times L}$, where $C$ and $L$ denote the number of channels and the length of a time series. Figs. 8 to 16 visualize the original data and the corresponding augmented data under the various data augmentation techniques.

Scaling. An input time series is scaled by a random scalar $\epsilon$ sampled from $N(1, 0.5)$, giving the augmented time series $x'_t = \epsilon x_t$.

Shifting. An input time series is shifted by a random scalar $\epsilon$ sampled from $N(0, 0.5)$, giving the augmented time series $x'_t = x_t + \epsilon$.

Jittering. Gaussian noise $n_t \in \mathbb{R}^{C\times L}$, sampled from $N(0, 0.5)$, is added at each time index, giving the augmented time series $x'_t = x_t + n_t$.

Magnitude warping. A random cubic polynomial $p_t \in \mathbb{R}^{C\times L}$ is elementwise multiplied with an input time series, giving the augmented time series $x'_t = x_t \odot p_t$.

Dropout. An input time series is masked randomly with probability $p = 0.2$.

Time warping. A random cubic polynomial $p_t \in \mathbb{R}^{C\times L}$ distorts the time interval of an input time series.

Permutation. An input time series is randomly partitioned and scrambled, where the maximum number of segments is 10.

Frequency masking. An input time series is first transformed into the frequency domain. The input spectrum is masked randomly with probability $p = 0.2$, then transformed back to the time domain.
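
As an illustration, Jittering and Frequency masking from the list above can be sketched in PyTorch as follows; defaults match the parameter values stated above, and the noise scale is interpreted as a standard deviation.

```python
import torch

def jittering(x_t, sigma=0.5):
    """Jittering: add Gaussian noise at every time index of x_t (C, L)."""
    return x_t + sigma * torch.randn_like(x_t)

def frequency_masking(x_t, p=0.2):
    """Frequency masking: zero random frequency bins, then invert the FFT."""
    x_f = torch.fft.rfft(x_t, dim=-1)
    keep = (torch.rand(x_f.shape[-1]) > p).to(x_f.dtype)  # mask with prob. p
    return torch.fft.irfft(x_f * keep, n=x_t.shape[-1], dim=-1)
```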

Appendix C. Model Architectures

We chose three baseline models, a 3-layer CNN, a 2-layer LSTM, and a 2-layer Transformer, to assess the effectiveness of SimPSI on different model architectures for the Human Activity Recognition task. The design of the CNN model adhered to (Eldele et al. 2021), so we provide a detailed configuration of only the LSTM and Transformer models. The LSTM model has two layers with a hidden dimension of 100, followed by a single linear layer for classification. The Transformer model has two transformer encoder layers, each with 2 attention heads, a feedforward dimension of 256, and dropout with probability $p = 0.1$; it is followed by a single linear layer for classification.
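
For concreteness, the 2-layer LSTM baseline might be sketched as follows; pooling the last hidden state for classification is our assumption.

```python
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Sketch of the 2-layer LSTM baseline (hidden dimension 100)."""

    def __init__(self, n_channels=9, n_classes=6, hidden=100):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)  # single linear classifier

    def forward(self, x):                      # x: (B, C, L)
        out, _ = self.lstm(x.transpose(1, 2))  # time steps as the sequence axis
        return self.head(out[:, -1])           # classify from the last step
```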

Figure 8: Visualization of eight representative examples from the HAR dataset in (a) the time and (b) frequency domain, without any data augmentation. Each color denotes a channel, and three channels are shown.
Figure 9: Visualization of eight representative examples from the HAR dataset in (a) the time and (b) frequency domain, augmented by Scaling. Each color denotes a channel, and three channels are shown.
Figure 10: Visualization of eight representative examples from the HAR dataset in (a) the time and (b) frequency domain, augmented by Shifting. Each color denotes a channel, and three channels are shown.
Figure 11: Visualization of eight representative examples from the HAR dataset in (a) the time and (b) frequency domain, augmented by Jittering. Each color denotes a channel, and three channels are shown.
Figure 12: Visualization of eight representative examples from the HAR dataset in (a) the time and (b) frequency domain, augmented by Magnitude warping. Each color denotes a channel, and three channels are shown.
Figure 13: Visualization of eight representative examples from the HAR dataset in (a) the time and (b) frequency domain, augmented by Dropout. Each color denotes a channel, and three channels are shown.
Figure 14: Visualization of eight representative examples from the HAR dataset in (a) the time and (b) frequency domain, augmented by Time warping. Each color denotes a channel, and three channels are shown.
Figure 15: Visualization of eight representative examples from the HAR dataset in (a) the time and (b) frequency domain, augmented by Permutation. Each color denotes a channel, and three channels are shown.
Figure 16: Visualization of eight representative examples from the HAR dataset in (a) the time and (b) frequency domain, augmented by Frequency masking. Each color denotes a channel, and three channels are shown.

Appendix D. Additional Results

D.1 Signal Demodulation

We provide additional results for performance enhancement through SimPSI on the Simulation dataset (Table 4). The accuracy of Dropout is enhanced by 0.9% using the spectrum-preservative map, while the random preservation map enhances it by 0.3%. The accuracy of Time warping is increased by 0.8% using the spectrum-preservative map, while the random preservation map decreases it by 0.7%. On the other hand, Permutation is not an appropriate augmentation technique for the signal demodulation task, which requires sequential decoding of the modulated signal; it significantly degrades the accuracy by 12.1%. Surprisingly, the accuracy of Permutation is improved by 1.4% using the spectrum-preservative map, while the random preservation map decreases it by 1.9%. This highlights the potential of SimPSI as a simple yet strong strategy to improve the efficacy of any time series data augmentation technique. The magnitude spectrum and saliency map show either marginal improvement or a performance drop.

Table 4: Performance on Signal Demodulation (Simulation test set) using different random augmentations with and without SimPSI. Accuracy and its increment from not using augmentation are reported with three different seeds.

Model                                 | Accuracy   | Δ
None                                  | 94.8 ± 0.1 | N/A
Dropout (Yang and Hong 2022)          | 93.4 ± 0.4 | −1.4
+ Random preservation map             | 95.1 ± 0.5 | +0.3
+ SimPSI (Magnitude spectrum)         | 94.9 ± 0.1 | +0.1
+ SimPSI (Saliency map)               | 93.0 ± 0.4 | −1.8
+ SimPSI (Spectrum-preservative map)  | 95.7 ± 0.2 | +0.9
Time warping (Um et al. 2017)         | 90.9 ± 1.0 | −3.9
+ Random preservation map             | 94.1 ± 0.3 | −0.7
+ SimPSI (Magnitude spectrum)         | 94.9 ± 0.2 | +0.1
+ SimPSI (Saliency map)               | 94.0 ± 0.1 | −0.8
+ SimPSI (Spectrum-preservative map)  | 95.6 ± 0.1 | +0.8
Permutation (Um et al. 2017)          | 82.7 ± 2.9 | −12.1
+ Random preservation map             | 92.9 ± 0.5 | −1.9
+ SimPSI (Magnitude spectrum)         | 94.8 ± 0.2 | 0
+ SimPSI (Saliency map)               | 93.5 ± 0.6 | −1.3
+ SimPSI (Spectrum-preservative map)  | 96.2 ± 0.2 | +1.4

D.2 Human Activity Recognition

The performance improvement of data augmentations by SimPSI on the HAR dataset is summarized in Table 5. The accuracy of Time warping is enhanced by 1.0% using the spectrum-preservative map, increased by 0.7% using the magnitude spectrum and saliency map, while the random preservation map enhances it by 0.2%. The accuracy of Permutation is improved by 0.2% using the spectrum-preservative map, while the random preservation map decreases it by 0.8%.

Table 5: Performance on Human Activity Recognition (HAR test set) using different random augmentations with and without SimPSI. Accuracy and its increment from not using augmentation are reported with three different seeds.

Model                                 | Accuracy   | Δ
None                                  | 94.0 ± 0.8 | N/A
Time warping (Um et al. 2017)         | 93.5 ± 0.5 | −0.5
+ Random preservation map             | 94.2 ± 0.2 | +0.2
+ SimPSI (Magnitude spectrum)         | 94.7 ± 0.9 | +0.7
+ SimPSI (Saliency map)               | 94.7 ± 1.1 | +0.7
+ SimPSI (Spectrum-preservative map)  | 95.0 ± 0.4 | +1.0
Permutation (Um et al. 2017)          | 93.7 ± 0.6 | −0.3
+ Random preservation map             | 93.2 ± 0.4 | −0.8
+ SimPSI (Magnitude spectrum)         | 93.6 ± 1.7 | −0.4
+ SimPSI (Saliency map)               | 93.4 ± 0.3 | −0.6
+ SimPSI (Spectrum-preservative map)  | 94.2 ± 0.4 | +0.2

D.3 Sleep Stage Detection

The performance increment of Dropout by SimPSI on the SleepEDF dataset is summarized in Table 6. The accuracy of Dropout is enhanced by 0.9% using the spectrum-preservative map, while the random preservation map enhances it by 0.4%.

Table 6: Performance on Sleep Stage Detection (SleepEDF test set) with and without SimPSI. Accuracy and its increment from not using augmentation are reported with three different seeds.

Model                                 | Accuracy   | Δ
None                                  | 80.7 ± 0.1 | N/A
Dropout (Yang and Hong 2022)          | 80.8 ± 0.8 | +0.1
+ Random preservation map             | 81.1 ± 0.5 | +0.4
+ SimPSI (Magnitude spectrum)         | 80.4 ± 0.9 | −0.3
+ SimPSI (Saliency map)               | 81.1 ± 0.8 | +0.4
+ SimPSI (Spectrum-preservative map)  | 81.6 ± 0.2 | +0.9

D.4 Atrial Fibrillation Classification

The performance enhancement of data augmentations by SimPSI on the Waveform dataset is summarized in Table 7. The accuracy of Jittering is improved by 0.4% using the saliency map, while the random preservation map shows the same accuracy as not using augmentation. For Magnitude warping, none of the preservation maps improve the classification accuracy. We leave resolving this deficiency as future work; it might stem from a suboptimal choice of SimPSI hyperparameters, since we did not focus on tuning them carefully. The accuracy of a composition of Scaling, Shifting, and Jittering is enhanced by 0.6% using the saliency map and 0.5% using the spectrum-preservative map, while the random preservation map decreases it by 0.2%.

Table 7: Performance on Atrial Fibrillation Classification (Waveform test set) using different random augmentations with and without SimPSI. Accuracy and its increment from not using augmentation are reported with three different seeds.

Model                                   | Accuracy   | Δ
None                                    | 94.7 ± 0.2 | N/A
Jittering (Um et al. 2017)              | 94.2 ± 0.1 | −0.5
+ Random preservation map               | 94.7 ± 0.2 | 0
+ SimPSI (Magnitude spectrum)           | 94.6 ± 0.4 | −0.1
+ SimPSI (Saliency map)                 | 95.1 ± 0.1 | +0.4
+ SimPSI (Spectrum-preservative map)    | 94.7 ± 0.1 | 0
Magnitude warping (Um et al. 2017)      | 94.4 ± 0.3 | −0.3
+ Random preservation map               | 94.3 ± 0.3 | −0.4
+ SimPSI (Magnitude spectrum)           | 94.2 ± 0.3 | −0.5
+ SimPSI (Saliency map)                 | 94.4 ± 0.5 | −0.3
+ SimPSI (Spectrum-preservative map)    | 94.7 ± 0.4 | 0
Scale-Shift-Jittering (Woo et al. 2022) | 92.0 ± 1.9 | −2.7
+ Random preservation map               | 94.5 ± 0.4 | −0.2
+ SimPSI (Magnitude spectrum)           | 94.9 ± 0.6 | +0.2
+ SimPSI (Saliency map)                 | 95.3 ± 0.1 | +0.6
+ SimPSI (Spectrum-preservative map)    | 95.2 ± 0.3 | +0.5

Appendix E. Training Time Cost

In this section, we compare the training time costs of different preservation maps applied to a composition of Scaling, Shifting, and Jittering on the HAR and SleepEDF datasets (Table 8). The magnitude spectrum requires a training time similar to that of the random preservation map, while the saliency map requires more than twice the training time of the magnitude spectrum. The spectrum-preservative map partially alleviates this computational burden but is still costly compared to the magnitude spectrum. We leave further reducing the training time as future work.

Table 8: Training time on Human Activity Recognition (HAR train set) and Sleep Stage Detection (SleepEDF train set) with and without SimPSI. The total training times (second) are reported with three different seeds.

Model                                   | HAR | SleepEDF
None                                    | 57  | 85
Scale-Shift-Jittering (Woo et al. 2022) | 59  | 89
+ Random preservation map               | 70  | 159
+ SimPSI (Magnitude spectrum)           | 74  | 134
+ SimPSI (Saliency map)                 | 245 | 391
+ SimPSI (Spectrum-preservative map)    | 162 | 364

Appendix F. Data Domain Dependency

This section additionally provides the data domain dependency of the magnitude spectrum and saliency map of SimPSI, as well as that of the random preservation map. The magnitude spectrum shows an overall improvement on the Simulation dataset, but the increment is marginal and does not carry over to the HAR and SleepEDF datasets. The saliency map enhances performance by a large amount for some data augmentation techniques but depends heavily on the data domain. The random preservation map also shows high data domain dependency. The results are summarized in Fig. 17.

Figure 17: SimPSI's dependency on data domain. Each plot shows the increment of classification accuracy of a baseline model after applying each data augmentation technique with different preservation maps. The top one corresponds to a random preservation map, the middle one corresponds to SimPSI (Magnitude spectrum), and the bottom one corresponds to SimPSI (Saliency map).