Robust Learning of Deep Time Series Anomaly Detection Models with Contaminated Training Data

Wenkai Li Tsinghua UniversityBeijingChina [email protected] , Cheng Feng Siemens AGBeijingChina [email protected] , Ting Chen Tsinghua UniversityBeijingChina [email protected] and Jun Zhu Tsinghua UniversityBeijingChina [email protected]

Abstract.

Time series anomaly detection (TSAD) is an important data mining task with numerous applications in the IoT era. In recent years, a large number of deep neural network-based methods have been proposed, demonstrating significantly better performance than conventional methods on addressing challenging TSAD problems in a variety of areas. Nevertheless, these deep TSAD methods typically rely on a clean training dataset that is not polluted by anomalies to learn the ”normal profile” of the underlying dynamics. This requirement is nontrivial since a clean dataset can hardly be provided in practice. Moreover, without the awareness of their robustness, blindly applying deep TSAD methods with potentially contaminated training data can possibly incur significant performance degradation in the detection phase. In this work, to tackle this important challenge, we firstly investigate the robustness of commonly used deep TSAD methods with contaminated training data which provides a guideline for applying these methods when the provided training data are not guaranteed to be anomaly-free. Furthermore, we propose a model-agnostic method which can effectively improve the robustness of learning mainstream deep TSAD models with potentially contaminated data. Experiment results show that our method can consistently prevent or mitigate performance degradation of mainstream deep TSAD models on widely used benchmark datasets.

1. Introduction

Recent advances in sensor technology allow us to collect a large amount of time series data in a variety of areas. With numerous applications in intrusion detection, system health monitoring and predictive maintenance, time series anomaly detection (TSAD) which aims to find temporal signals that deviate significantly from other observations becomes an important data mining task. With growing complexity of collected data, conventional TSAD models such as dynamical state space models (Ding et al., 2020), autoregressive models (Günnemann et al., 2014; Melnyk et al., 2016), rule-based models (Feng et al., 2019; Dhaou et al., 2021) generally suffer from sub-optimal performance due to their limited capacity to capture nonlinear system dynamics in high dimensional space.

In recent years, deep neural networks (DNNs) are widely used, demonstrating significantly better performance than conventional anomaly detection models on addressing challenging TSAD problems in a variety of real-world applications. To date, the mainstream deep TSAD models can be categorized into two types: prediction-based and reconstruction-based. Specifically, prediction-based TSAD first learns a predictive model such as recurrent neural networks (Hundman et al., 2018; Tariq et al., 2019; Feng et al., 2017; Wu et al., 2020) and convolutional neural networks (Wen and Keyes, 2019; He and Zhao, 2019) to predict signals for future time steps. Then the predicted signals are compared with observed ones to generate a residual error. An anomaly is detected if the residual error exceeds a threshold. Reconstruction-based TSAD takes a similar approach except that it learns a reconstruction model such as autoencoders (Audibert et al., 2020; Kieu et al., 2019; Malhotra et al., 2016; Park et al., 2018; Zhang et al., 2019) and transformers (Tuli et al., 2022; Meng et al., 2019) to compress temporal signals to lower dimensional embeddings and reconstruct them afterwards. The reconstruction error is used to detect anomalies. There are also other deep anomaly detection models such as density-based (An and Cho, 2015; Su et al., 2019; Zong et al., 2018; Xu et al., 2018; Feng and Tian, 2021; Li et al., 2021), generative adversarial network-based (Schlegl et al., 2017, 2019; Li et al., 2018, 2019) and one class-based (Ruff et al., 2018; Wu et al., 2019). However, they are either not specifically designed for time series data or closely related to the mainstream models since their underlying backbone is mostly a reconstruction or prediction model.

Due to scarcity of labelled anomalies in general, most deep TSAD models are semi-supervised, meaning that they require a clean training dataset that is not polluted by any anomalies to learn the normal profile of the temporal dynamics within the data. However, such a clean dataset rarely exists in practice. In real applications, the modellers are often provided with datasets that are likely to be polluted with unknown anomalies. Meanwhile, DNNs are often trained with an over-parameterized regime, meaning that the number of their parameters can exceed the size of the training data. As a result, deep TSAD models have the capacity to overfit to any given training data regardless of the unknown ratio of anomalies, leading to serious performance degradation in the detection phase. This problem is clearly demonstrated in our experiments using two representative deep TSAD models (a prediction-based and a reconstruction-based) on various benchmark datasets.

Furthermore, we propose a simple yet effective model-agnostic method which can significantly improve the robustness of deep TSAD models learned from contaminated training data. Specifically, our method is to discard samples with consistently large training losses or strong oscillation on loss updates during the early phase of model training. Our sample filtering method is justified by the ”memorization effect” (Arpit et al., 2017) of DNNs, which means that DNNs tend to prioritize learning simple patterns that are shared by multiple samples first and then gradually memorize noisy samples during training. In our experiments, we show that our method can consistently prevent or mitigate performance degradation of mainstream deep TSAD models on various benchmark datasets.

2. Methodology

Our model-agnostic sample filtering method for learning robust deep TSAD models is based on the observation that DNNs start to learn from common patterns in initial phases and gradually adapt to noisy samples during training (Arpit et al., 2017). Thus when trained on contaminated data, DNNs will prioritize learning from normal samples before over-fitting to the whole dataset. Intuitively, when DNNs update their parameters using backpropagation, the gradient is averaged over all training data. As a result, the overall gradient is more likely to be dominated by the normal samples during the initial phase. For this reason, the training losses on normal samples are more likely to decrease steadily, but stay large or oscillate strongly on abnormal samples over the training epochs in the initial phase.

2.1. Filtering Plausible Abnormal Samples

Based on the above observation, we propose two metrics by which we filter out anomalous training samples using a few trial epochs. Specifically, let $N$ be the total number of trial epochs and $\mathbf{x}$ be a training datum, we use $L_{\mathbf{x}}^{i}$ to denote the loss of ${\mathbf{x}}$ at the end of $i$ th epoch and $\Delta_{\mathbf{x}}^{i}=L_{\mathbf{x}}^{i}-L_{\mathbf{x}}^{i-1}$ to denote the update of loss at the $i$ th epoch. The following two metrics are calculated for each datum at the end of the $N$ th trial epoch:

(1)		$\displaystyle m_{\mathbf{x}}$	$\displaystyle=$	$\displaystyle\mu(L_{\mathbf{x}})\quad\text{where}\ L_{\mathbf{x}}=[L_{\mathbf{x}}^{1},\dots,L_{\mathbf{x}}^{N}]$
(2)		$\displaystyle v_{\mathbf{x}}$	$\displaystyle=$	$\displaystyle\sigma(\Delta_{\mathbf{x}})\quad\text{where}\ \Delta_{\mathbf{x}}=[\Delta_{\mathbf{x}}^{1},\dots,\Delta_{\mathbf{x}}^{N}]$

where $m_{\mathbf{x}}$ is the mean of losses for $\mathbf{x}$ during the trial epochs, $v_{\mathbf{x}}$ is the standard deviation of loss updates for $\mathbf{x}$ during the trial epochs. Concretely, $m_{\mathbf{x}}$ is used to filter out samples with larger training losses; $v_{\mathbf{x}}$ is used to filter out samples with stronger oscillations on per-epoch loss updates indicating that the gradient direction of the datum is inconsistent with normal samples. A higher value of either metric indicates the sample is likely to be an anomaly.

2.2. Discarding Plausible Abnormal Samples

Let $\tau$ to be the user-defined upper bound of anomaly ratio in the potentially contaminated dataset, we let $S_{m}=\{\mathbf{x}\mid m_{\mathbf{x}}>Q_{m}(1-\tau)\}$ and $S_{v}=\{\mathbf{x}\mid v_{\mathbf{x}}>Q_{v}(1-\tau)\}$ where $Q_{m}(1-\tau)$ and $Q_{v}(1-\tau)$ are the $1-\tau$ quantile of $m_{\mathbf{x}}$ and $v_{\mathbf{x}}$ in the dataset, respectively. We discard all samples in $S_{m}\cup S_{v}$ . After discarding the plausible abnormal samples, we retrain the model on the remaining data from scratch. Our sample discarding strategy seems rather aggressive, however, as the anomaly ratio in most cases is a small value and time series data oftentimes enjoy a high degree of redundancy due to repeated patterns caused by seasonality, discarding a small portion of normal data is unlikely to cause performance degradation of the trained model.

3. Experiments

The goal of our experiments is two-fold. Firstly, we investigate the robustness of various deep TSAD models with contaminated training data. Secondly, we show whether our proposed method can effectively improve the robustness of those deep TSAD models.

3.1. Benchmark Deep TSAD Models

We first define the models that we will analyze. Since it is impossible to cover all deep TSAD models, we first select one model in the reconstruction-based class and another in the prediction-based class as the representative mainstream deep TSAD models. To make our experiments more comprehensive, we also select one density-based model (DAGMM (Zong et al., 2018)) and one one-class classification deep anomaly detection model (DeepSVDD (Ruff et al., 2018)). Although these two models are not designed for time series data, they are frequently used as baselines in the deep TSAD literature. More details of selected models are given as follows:

•

LSTMAE: LSTMAE is a recurrent autoencoder implemented by Long Short Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) networks. It is the ”vanilla” version of reconstruction-based deep TSAD model that has been the backbone of many more complicated models (Kieu et al., 2019; Malhotra et al., 2016; Park et al., 2018) in this class.
•

Seq2SeqPred: Seq2SeqPred is a prediction-based model implemented as a sequence to sequence LSTM (Sutskever et al., 2014). It is a commonly used deep time series prediction model for anomaly detection.
•

DAGMM: DAGMM is a density-based anomaly detection model using a deep generative model that assumes the Gaussian mixture prior in the latent space to estimate the likelihood of input samples. It is commonly used as a baseline model in the deep TSAD literature.
•

DeepSVDD: DeepSVDD is an one-class classification deep anomaly detection model that is also commonly used as a baseline in the deep TSAD literature .

3.2. Benchmark Datasets

We consider the following four commonly used benchmark datasets for TSAD:

•

SWAT (Mathur and Tippenhauer, 2016): This dataset is collected from a testbed which is a scaled down version of a real-world industrial water treatment plant producing filtered water. It consists of operation records collected per second for 11 days with the first 7 days running normally.
•

WADI (Ahmed et al., 2017): This dataset is collected from a testbed which represents a scaled down version of an urban water distribution system. The dataset contains records of each second in 16 days with the former 14 days running normally. The data in the last day is ignored as they have different distributions with previous 15 days due to change of operational mode.
•

PUMP (Feng and Tian, 2021): This dataset is collected from a water pump system of a small town. It contains data points of every minute in 5 months.
•

PSM (Abdulaal et al., 2021): This dataset, collected by eBay, consists of recorded server machine metrics per minute for 21 weeks, in which 13 weeks are for training and 8 weeks are for testing.

All the above datasets consist of a training set which is anomaly-free and a test set which contains timely distributed anomalies. To conduct our experiments with contaminated training data, we pollute these training sets by randomly injecting anomalous windows sampled from test data. Specifically, we pollute training sets at different anomaly ratios from 0% to 20%. The performance at the ratio 0% tells the performance of models when using clean training data. As for the range upper bound, it is sufficiently high since anomaly ratios of real-world datasets seldomly exceed 15%. Concretely, we select $[0\%,1\%,2\%,3\%,4\%,6\%,8\%,10\%,13\%,16\%,20\%]$ , 11 anomaly ratios in total.

3.3. Evaluation Metrics

To date, AUC-ROC score, the best F1 score and the best point-adjusted F1 score are the most commonly adopted evaluation metrics for TSAD. However, (Kim et al., 2021) reveals that the best point-adjusted F1 score has a great possibility of overestimating the detection performance and even a random anomaly score can easily turn into a state-of-the-art TSAD method. Thus we use AUC-ROC score and the best F1 score as our evaluation metrics. Specifically, for the best F1 score, it is reported as the best F1 score by searching all possible anomaly thresholds.

3.4. Baseline Training Methods

To demonstrate the benefits of our proposed sample filtering method in improving the robustness of trained deep TSAD models with contaminated data, we compare our method with the following three baseline training methods:

•

Vanilla method: We just train the models with the contaminated data without any adjustment. The objective of comparing with this baseline is two-fold: investigating the robustness of the benchmark models and show the superiority of our proposed method.
•

$m_{\mathbf{x}}$ filtering method: We only discard samples with larger mean training losses in the trial epochs. That is to say, we only discard samples in $S_{m}$ as defined in Section 2.2.
•

$v_{\mathbf{x}}$ filtering method: We only discard samples with larger oscillations on per-epoch loss updates in the trial epochs. That is to say, we only discard samples in $S_{v}$ as defined in Section 2.2.

3.5. Experiment Results

We have 4 benchmark deep TSAD models, 4 benchmark datasets and 4 training methods for each model. To produce convincing results, each benchmark model under a certain training method is trained on each training set over 11 selected contamination ratios with 5 times’ repetition, and then we report the average anomaly detection performance on the test sets with standard deviation. For all experiments except the vanilla method, the value of $\tau$ (the upper bound of anomaly ratio) is fixed at $0.2$ , $N$ (the number of trial epochs) is fixed at $10$ .

Regarding model hyperparameters, the length of reconstruction and prediction window for LSTMAE and Seq2SeqPred is fixed at 12 for SWAT and WADI, 5 for PUMP and 20 for PSM respectively. After discarding plausible abnormal samples at the end of the trial epochs, we split the filtered training data according to the ratio 4:1, among which 1 out of 5 data is used for validation. Randomised grid search is used to tune other model-specific hyperparameters for best validation loss.

3.5.1. Results for Reconstruction-based and Prediction-based Models

Table 1. Coverage of injected anomalies by the discarded samples under 6%, 13% and 20% contamination ratios

Model

Filter

SWAT

WADI

PUMP

PSM

13%

20%

13%

20%

13%

20%

13%

20%

LSTMAE

m_{\mathbf{x}}

filtering

96.33

(0.46)

61.83

(7.83)

43.38

(1.30)

41.57

(2.15)

27.38

(3.21)

20.88

(1.13)

100.00

(0.00)

96.79

(3.12)

71.13

(1.46)

90.41

(5.16)

81.09

(4.52)

69.22

(5.22)

v_{\mathbf{x}}

filtering

89.12

(2.44)

88.39

(1.21)

84.65

(5.74)

78.86

(7.48)

84.10

(2.85)

86.59

(2.10)

99.84

(0.17)

99.83

(0.13)

99.68

(0.20)

68.84

(5.35)

73.10

(4.13)

68.99

(7.53)

ours

97.90

(0.92)

95.88

(1.09)

94.16

(5.57)

82.11

(4.70)

84.49

(2.84)

87.80

(1.79)

100.00

(0.00)

100.00

(0.00)

100.00

(0.00)

96.25

(2.67)

90.03

(4.01)

83.95

(1.91)

Seq2Seq- Pred

m_{\mathbf{x}}

filtering

82.80

(4.00)

53.38

(2.26)

43.47

(1.82)

44.22

(1.99)

27.79

(1.18)

20.44

(0.72)

100.00

(0.00)

88.64

(9.05)

62.42

(3.85)

89.43

(5.70)

71.10

(3.15)

66.78

(2.56)

v_{\mathbf{x}}

filtering

92.06

(1.86)

91.00

(1.93)

92.32

(2.39)

75.78

(10.34)

76.54

(4.13)

77.96

(3.09)

99.98

(0.03)

99.99

(0.02)

99.95

(0.04)

67.31

(6.83)

77.67

(3.10)

72.95

(3.19)

ours

98.20

(1.11)

97.79

(1.13)

97.34

(0.95)

81.81

(9.05)

77.57

(4.18)

81.25

(4.17)

100.00

(0.00)

100.00

(0.00)

100.00

(0.00)

95.43

(2.64)

87.80

(2.98)

82.55

(1.23)

Refer to caption — Figure 1. Performance of LSTMAE with contaminated training data under different anomaly ratios

The performance of LSTMAE and Seq2SeqPred with contaminated training data under different anomaly ratios are given in Fig. 1 and Fig. 2 respectively. Notably, the AUC and best-F1 curves for the vanilla method decrease sharply as the contamination ratio increasing from 0%, indicating the poor robustness of the original LSTMAE and Seq2SeqLSTM models with contaminated training data. As a result, blindly applying these deep TSAD models with potentially contaminated training data has a great possibility of encountering unexpected serious performance degradation in the detection phase.

Furthermore, after applying the filtering methods, we find that the robustness of both models can be significantly improved. Specifically, the AUC and best-F1 curves of $m_{\mathbf{x}}$ filtering, $v_{\mathbf{x}}$ filtering and our method all lie above the one from the vanilla method by a clear margin. The comparison between $m_{\mathbf{x}}$ filtering and $v_{\mathbf{x}}$ filtering empirically justify our motivation when designing these two metrics. $v_{\mathbf{x}}$ filtering achieves better performance on SWAT and PUMP while $m_{\mathbf{x}}$ filtering performs better in PSM. In Tab. 1, we show the coverage (avg. $\pm$ standard deviation) of injected anomalies by the discarded samples of different filtering methods under 6%, 13% and 20% contamination ratios. It can be seen that using $v_{\mathbf{x}}$ can filter more abnormal samples than $m_{\mathbf{x}}$ in SWAT, WADI and PUMP, whilst $m_{\mathbf{x}}$ in many cases filter out more abnormal samples in PSM. By leveraging the advantages of both metrics, our method constantly achieves the best performance in all scenarios.

It is worthy to note that even discarding more than $20\%$ of data, our method achieves a similar performance with the vanilla method under zero contamination ratio. This means that dropping a proportion of data by our method will be unlikely to damage the performance of original reconstruction-based or prediction-based deep TSAD model with a much lower contamination ratio.

3.5.2. Results for Density-based and One-Class Classification Models

As illustrated in Fig. 3, compared with LSTMAE and Seq2SeqPred, DAGMM is not so sensitive to contaminated training data. This is because DAGMM assumes that lower embeddings of training samples are governed by a mixture of Gaussian distributions, which is a strong prior that limits the capacity of the model and further prevents over-fitting to abnormal samples. DAGMM tends to underfit the samples because of its limited model capacity. Thus, we believe that when the model underfits, the loss traces are no longer reliable to distinguish the abnormal samples and our sample filtering method does not work in these cases.

Instead of learning the distribution of normal samples, DeepSVDD aims to squeeze all training samples into a hypersphere whose radius is as small as possible. Since all samples, no matter normal ones or abnormal ones, are pushed to the center point, their loss traces smoothly approach 0 which are uninformative. Thus our sample filtering method also does not work for one-class classification models as illustrated in Fig. 4.

4. Discussion and Conclusion

We study the robustness of deep TSAD models with contaminated training data, a problem of significant practical meaning. We show that the mainstream deep TSAD models have poor robustness to contaminated training data. Furthermore, we show that our proposed sample filtering method can effectively prevent or mitigate performance degradation of mainstream deep TSAD models under contaminated training data.

Robustness of anomaly detection models with contaminated training data have been studied before. For example, in (Zhang et al., 2021), a method called ELITE is proposed which uses a small number of labeled anomalies to infer the anomalies hidden in the training samples. Unlike ELITE, our method does not require any labeled samples to filter out anomalies. More similar work to ours are (Xia et al., 2015) and (Du et al., 2021) where samples with larger losses are filtered out during model training to enhance the robustness of anomaly detection models. In this work, we show that filtering out plausible abnormal samples with stronger oscillations on per-epoch loss updates in the initial phase of model training can further enhance the robustness of deep TSAD models significantly.

References

(1)
Abdulaal et al. (2021) Ahmed Abdulaal, Zhuanghua Liu, and Tomer Lancewicki. 2021. Practical Approach to Asynchronous Multivariate Time Series Anomaly Detection and Localization. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery; Data Mining (Virtual Event, Singapore) (KDD ’21). Association for Computing Machinery, New York, NY, USA, 2485–2494. https://doi.org/10.1145/3447548.3467174
Ahmed et al. (2017) Chuadhry Mujeeb Ahmed, Venkata Reddy Palleti, and Aditya P. Mathur. 2017. WADI: A Water Distribution Testbed for Research in the Design of Secure Cyber Physical Systems. In Proceedings of the 3rd International Workshop on Cyber-Physical Systems for Smart Water Networks (Pittsburgh, Pennsylvania) (CySWATER ’17). Association for Computing Machinery, New York, NY, USA, 25–28. https://doi.org/10.1145/3055366.3055375
An and Cho (2015) Jinwon An and Sungzoon Cho. 2015. Variational autoencoder based anomaly detection using reconstruction probability. Special Lecture on IE 2, 1 (2015), 1–18.
Arpit et al. (2017) Devansh Arpit, Stanisław Jastrz\kebski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, et al. 2017. A closer look at memorization in deep networks. In International conference on machine learning. PMLR, 233–242.
Audibert et al. (2020) Julien Audibert, Pietro Michiardi, Frédéric Guyard, Sébastien Marti, and Maria A Zuluaga. 2020. USAD: unsupervised anomaly detection on multivariate time series. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 3395–3404.
Dhaou et al. (2021) Amin Dhaou, Antoine Bertoncello, Sébastien Gourvénec, Josselin Garnier, and Erwan Le Pennec. 2021. Causal and Interpretable Rules for Time Series Analysis. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2764–2772.
Ding et al. (2020) Derui Ding, Qing-Long Han, Xiaohua Ge, and Jun Wang. 2020. Secure state estimation and control of cyber-physical systems: A survey. IEEE Transactions on Systems, Man, and Cybernetics: Systems 51, 1 (2020), 176–190.
Du et al. (2021) Bowen Du, Xuanxuan Sun, Junchen Ye, Ke Cheng, Jingyuan Wang, and Leilei Sun. 2021. GAN-Based Anomaly Detection for Multivariate Time Series Using Polluted Training Set. IEEE Transactions on Knowledge and Data Engineering (2021).
Feng et al. (2017) Cheng Feng, Tingting Li, and Deeph Chana. 2017. Multi-level anomaly detection in industrial control systems via package signatures and LSTM networks. In 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 261–272.
Feng et al. (2019) Cheng Feng, Venkata Reddy Palleti, Aditya Mathur, and Deeph Chana. 2019. A Systematic Framework to Generate Invariants for Anomaly Detection in Industrial Control Systems. In 26th Annual Network and Distributed System Security Symposium, NDSS 2019, San Diego, California, USA, February 24-27, 2019. The Internet Society.
Feng and Tian (2021) Cheng Feng and Pengwei Tian. 2021. Time series anomaly detection for cyber-physical systems via neural system identification and bayesian filtering. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2858–2867.
Günnemann et al. (2014) Nikou Günnemann, Stephan Günnemann, and Christos Faloutsos. 2014. Robust multivariate autoregression for anomaly detection in dynamic product ratings. In Proceedings of the 23rd international conference on World wide web. 361–372.
He and Zhao (2019) Yangdong He and Jiabao Zhao. 2019. Temporal convolutional networks for anomaly detection in time series. In Journal of Physics: Conference Series, Vol. 1213. IOP Publishing, 042050.
Hochreiter and Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735–1780.
Hundman et al. (2018) Kyle Hundman, Valentino Constantinou, Christopher Laporte, Ian Colwell, and Tom Soderstrom. 2018. Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 387–395.
Kieu et al. (2019) Tung Kieu, Bin Yang, Chenjuan Guo, and Christian S Jensen. 2019. Outlier Detection for Time Series with Recurrent Autoencoder Ensembles.. In IJCAI. 2725–2732.
Kim et al. (2021) Siwon Kim, Kukjin Choi, Hyun-Soo Choi, Byunghan Lee, and Sungroh Yoon. 2021. Towards a Rigorous Evaluation of Time-series Anomaly Detection. arXiv preprint arXiv:2109.05257 (2021).
Li et al. (2018) Dan Li, Dacheng Chen, Jonathan Goh, and See-kiong Ng. 2018. Anomaly detection with generative adversarial networks for multivariate time series. arXiv preprint arXiv:1809.04758 (2018).
Li et al. (2019) Dan Li, Dacheng Chen, Baihong Jin, Lei Shi, Jonathan Goh, and See-Kiong Ng. 2019. MAD-GAN: Multivariate anomaly detection for time series data with generative adversarial networks. In International Conference on Artificial Neural Networks. Springer, 703–716.
Li et al. (2021) Zhihan Li, Youjian Zhao, Jiaqi Han, Ya Su, Rui Jiao, Xidao Wen, and Dan Pei. 2021. Multivariate Time Series Anomaly Detection and Interpretation using Hierarchical Inter-Metric and Temporal Embedding. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 3220–3230.
Malhotra et al. (2016) Pankaj Malhotra, Anusha Ramakrishnan, Gaurangi Anand, Lovekesh Vig, Puneet Agarwal, and Gautam Shroff. 2016. LSTM-based encoder-decoder for multi-sensor anomaly detection. arXiv preprint arXiv:1607.00148 (2016).
Mathur and Tippenhauer (2016) Aditya P. Mathur and Nils Ole Tippenhauer. 2016. SWaT: a water treatment testbed for research and training on ICS security. In 2016 International Workshop on Cyber-physical Systems for Smart Water Networks (CySWater). 31–36. https://doi.org/10.1109/CySWater.2016.7469060
Melnyk et al. (2016) Igor Melnyk, Bryan Matthews, Hamed Valizadegan, Arindam Banerjee, and Nikunj Oza. 2016. Vector autoregressive model-based anomaly detection in aviation systems. Journal of Aerospace Information Systems 13, 4 (2016), 161–173.
Meng et al. (2019) Hengyu Meng, Yuxuan Zhang, Yuanxiang Li, and Honghua Zhao. 2019. Spacecraft Anomaly Detection via Transformer Reconstruction Error. In International Conference on Aerospace System Science and Engineering. Springer, 351–362.
Park et al. (2018) Daehyung Park, Yuuna Hoshi, and Charles C Kemp. 2018. A multimodal anomaly detector for robot-assisted feeding using an lstm-based variational autoencoder. IEEE Robotics and Automation Letters 3, 3 (2018), 1544–1551.
Ruff et al. (2018) Lukas Ruff, Robert Vandermeulen, Nico Goernitz, Lucas Deecke, Shoaib Ahmed Siddiqui, Alexander Binder, Emmanuel Müller, and Marius Kloft. 2018. Deep one-class classification. In International conference on machine learning. PMLR, 4393–4402.
Schlegl et al. (2019) Thomas Schlegl, Philipp Seeböck, Sebastian M Waldstein, Georg Langs, and Ursula Schmidt-Erfurth. 2019. f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks. Medical image analysis 54 (2019), 30–44.
Schlegl et al. (2017) Thomas Schlegl, Philipp Seeböck, Sebastian M Waldstein, Ursula Schmidt-Erfurth, and Georg Langs. 2017. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In International conference on information processing in medical imaging. Springer, 146–157.
Su et al. (2019) Ya Su, Youjian Zhao, Chenhao Niu, Rong Liu, Wei Sun, and Dan Pei. 2019. Robust anomaly detection for multivariate time series through stochastic recurrent neural network. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 2828–2837.
Sutskever et al. (2014) Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. Advances in neural information processing systems 27 (2014), 3104–3112.
Tariq et al. (2019) Shahroz Tariq, Sangyup Lee, Youjin Shin, Myeong Shin Lee, Okchul Jung, Daewon Chung, and Simon S Woo. 2019. Detecting anomalies in space using multivariate convolutional LSTM with mixtures of probabilistic PCA. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 2123–2133.
Tuli et al. (2022) Shreshth Tuli, Giuliano Casale, and Nicholas R Jennings. 2022. TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data. arXiv preprint arXiv:2201.07284 (2022).
Wen and Keyes (2019) Tailai Wen and Roy Keyes. 2019. Time series anomaly detection using convolutional neural networks and transfer learning. arXiv preprint arXiv:1905.13628 (2019).
Wu et al. (2019) Peng Wu, Jing Liu, and Fang Shen. 2019. A deep one-class neural network for anomalous event detection in complex scenes. IEEE transactions on neural networks and learning systems 31, 7 (2019), 2609–2622.
Wu et al. (2020) Wentai Wu, Ligang He, Weiwei Lin, Yi Su, Yuhua Cui, Carsten Maple, and Stephen A Jarvis. 2020. Developing an unsupervised real-time anomaly detection scheme for time series with multi-seasonality. IEEE Transactions on Knowledge and Data Engineering (2020).
Xia et al. (2015) Yan Xia, Xudong Cao, Fang Wen, Gang Hua, and Jian Sun. 2015. Learning discriminative reconstructions for unsupervised outlier removal. In Proceedings of the IEEE International Conference on Computer Vision. 1511–1519.
Xu et al. (2018) Haowen Xu, Wenxiao Chen, Nengwen Zhao, Zeyan Li, Jiahao Bu, Zhihan Li, Ying Liu, Youjian Zhao, Dan Pei, Yang Feng, et al. 2018. Unsupervised anomaly detection via variational auto-encoder for seasonal kpis in web applications. In Proceedings of the 2018 world wide web conference. 187–196.
Zhang et al. (2019) Chuxu Zhang, Dongjin Song, Yuncong Chen, Xinyang Feng, Cristian Lumezanu, Wei Cheng, Jingchao Ni, Bo Zong, Haifeng Chen, and Nitesh V Chawla. 2019. A deep neural network for unsupervised anomaly detection and diagnosis in multivariate time series data. In Proceedings of the AAAI conference on artificial intelligence, Vol. 33. 1409–1416.
Zhang et al. (2021) Huayi Zhang, Lei Cao, Peter VanNostrand, Samuel Madden, and Elke A Rundensteiner. 2021. ELITE: Robust Deep Anomaly Detection with Meta Gradient. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2174–2182.
Zong et al. (2018) Bo Zong, Qi Song, Martin Renqiang Min, Wei Cheng, Cristian Lumezanu, Daeki Cho, and Haifeng Chen. 2018. Deep autoencoding gaussian mixture model for unsupervised anomaly detection. In International conference on learning representations.