
1 Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
  yanghaitian@iie.ac.cn
2 School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China

ASGNet: Adaptive Semantic Gate Networks for Log-Based Anomaly Diagnosis

Haitian Yang(✉)1,2, Degang Sun(✉)2, Wen Liu1, Yanshu Li1,2, Yan Wang1,2, Weiqing Huang1,2
Abstract

Logs are widely used in the development and maintenance of software systems. Logs can help engineers understand the runtime behavior of systems and diagnose system failures. For anomaly diagnosis, existing methods generally use log event data extracted from historical logs to build diagnostic models. However, we find that existing methods do not make full use of two types of features. (1) Statistical features: inherent statistical features in log data, such as word frequency and anomaly label distribution, are not well exploited. Compared with raw log data, statistical features are deterministic and naturally compatible with the corresponding tasks. (2) Semantic features: logs record the execution logic behind software systems, so log statements share deep semantic relationships. How to effectively combine statistical features and semantic features in log data to improve the performance of log anomaly diagnosis is the key problem addressed in this paper. We propose Adaptive Semantic Gate Networks (ASGNet), which combine statistical and semantic features and selectively use statistical features to consolidate the semantic representation of log text. Specifically, ASGNet encodes statistical features via a variational encoding module and fuses useful information through a well-designed adaptive semantic threshold mechanism. The threshold mechanism routes the information flow into the classifier based on the confidence of the semantic features in the decision, which is conducive to training a robust classifier and alleviates the overfitting caused by using statistical features indiscriminately. Experimental results on real-world datasets show that our proposed method outperforms all baseline methods in terms of various performance metrics.

Keywords:
Anomaly Diagnosis · Semantic Features · Statistical Features · Diagnosing System Failures.

1 Introduction

With the rapid development of information technology over the past few years, large-scale distributed systems and cloud computing systems have gradually become critical technical infrastructure of the IT industry [1]. Anomaly detection and diagnosis play an important role in the incident management of large-scale systems [2, 3], aiming to detect abnormal system behavior in time. Timely anomaly detection enables system developers (or engineers) to pinpoint problems immediately and resolve them at once, thereby reducing system downtime [4, 5, 6]. However, as modern software systems grow larger and more complex, traditional log anomaly detection and diagnosis approaches based on specialized domain knowledge or manually constructed and maintained rules have become increasingly inefficient [7, 8, 9, 10]. Benefiting from advances in deep learning, a number of effective log anomaly detection and diagnosis methods have emerged in recent years, but these methods overlook two issues [11, 12]. (1) Statistical features: inherent statistical features in log data, such as word frequency and anomaly label distribution, are not well utilized by deep learning based methods. Statistical features [13, 14] are deterministic compared with raw log data and naturally compatible with the corresponding task. (2) Semantic features: logs record the execution logic behind the software system, so log statements share deep semantic relationships. Hence, this paper focuses on how to effectively combine statistical features and semantic features in log data to improve the performance of log anomaly diagnosis. To illustrate these two observations, we present examples in Table 1 and Figure 1.

Table 1: Statistics on the number of occurrences of words in the seven log datasets

Dataset                        BGL                 BGP                 Tbird                Spirit               HDFS                Liberty              Zookeeper
Dataset size                   1.207GB             1.04GB              27.367GB             30.289GB             1.58GB              29.5GB               10.4MB
Total distinct words           5,632,912           4,491,076           23,330,854           12,793,353           3,585,666           14,682,493           53,094
Appear only once               5,173,492 (91.8%)   4,330,501 (96.4%)   7,789,598 (33.4%)    3,861,462 (30.2%)    2,576,220 (71.8%)   2,020,068 (13.8%)    28,954 (54.5%)
Appear less than 5 times       5,298,053 (94.05%)  4,424,922 (98.5%)   16,863,767 (72.3%)   9,914,926 (77.5%)    2,853,030 (79.6%)   6,656,253 (45.3%)    51,443 (96.9%)
Appear less than 10 times      5,403,556 (95.9%)   4,463,325 (99.4%)   19,948,834 (85.5%)   10,815,055 (84.5%)   2,885,414 (80.5%)   9,882,010 (67.3%)    52,686 (99.23%)
Appear less than 20 times      5,465,876 (97.03%)  4,472,566 (99.6%)   21,112,093 (90.5%)   11,311,509 (88.4%)   3,353,119 (93.5%)   11,425,515 (77.8%)   52,797 (99.4%)
At least once per 10000 lines  3,825 (0.068%)      1,299 (0.029%)      79,063 (0.34%)       137,224 (1.07%)      1,998 (0.056%)      10,052 (0.068%)      564 (1.06%)
At least once per 1000 lines   573 (0.01%)         316 (0.007%)        6,243 (0.027%)       2,663 (0.021%)       258 (0.007%)        5,365 (0.037%)       172 (0.32%)

From Table 1, we can see that most words are infrequent, and most of these infrequent words appear only once. For example, in the BGP dataset, 96.4% of the words appear only once. In the Liberty dataset, 77.8% of the words occur fewer than 20 times. In the BGP and Zookeeper datasets, words occurring fewer than 20 times account for more than 99%. At the same time, only a small fraction of words are frequent, i.e., they occur at least once in every 10000 or 1000 lines of logs. We also observe that most logs generated during system operation are normal, and abnormal logs are limited. Based on this, we infer that anomaly-related words are infrequent, and we conclude that the statistical features of logs are essential for anomaly diagnosis.
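As a side note, the frequency profile in Table 1 is straightforward to reproduce. The following minimal sketch (our illustration, not the paper's code; the whitespace tokenization is an assumption) counts word occurrences in a raw log file and reports the thresholds used above:

```python
from collections import Counter

def frequency_profile(log_path):
    # Count every whitespace-delimited token in the raw log file.
    counts = Counter()
    num_lines = 0
    with open(log_path, errors="ignore") as f:
        for line in f:
            counts.update(line.split())
            num_lines += 1
    total = len(counts)
    print(f"total distinct words: {total}")
    once = sum(1 for c in counts.values() if c == 1)
    print(f"appear only once: {once} ({100 * once / total:.1f}%)")
    for t in (5, 10, 20):
        n = sum(1 for c in counts.values() if c < t)
        print(f"appear less than {t} times: {n} ({100 * n / total:.1f}%)")
    # "Frequent" words: at least one occurrence per `period` lines on average.
    for period in (10000, 1000):
        n = sum(1 for c in counts.values() if c >= num_lines / period)
        print(f"at least once per {period} lines: {n} ({100 * n / total:.3f}%)")
```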

Figure 1: The process from initialization to sending data.

From Figure 1, we can see that the first through sixth log messages show the complete process from initialization to sending data, reflecting the program's execution logic. Therefore, a log sequence contains the execution logic behind the program and carries rich semantic information. As Table 1 and Figure 1 show, the statistical features and textual semantic features used in this paper are widely present in log data.

To deal with the two above-mentioned issues, this paper designs Adaptive Semantic Gate Networks (ASGNet), which consist of three parts: a log statistics information representation module (V-Net), a log deep semantic representation module (S-Net), and an adaptive semantic threshold mechanism (G-Net). Specifically, V-Net leverages an unsupervised autoencoder [15] to learn a global representation of each statistical feature vector; we note that employing variational inference further improves model performance compared to vanilla autoencoders. S-Net extracts latent semantic representations from the text input with pre-trained RoBERTa [16]. G-Net aligns the information from the two sources and then adjusts the information flow.

The main contributions of this paper are summarized as follows:

(1) We explicitly leverage the statistical features of system log data for anomaly diagnosis in a deep learning architecture, and we demonstrate through extensive experiments that the proposed approach is highly effective.

(2) To fuse statistical features into low-confidence semantic features, we propose a novel adaptive semantic threshold mechanism to retrieve necessary and useful global information. The experimental results prove that the threshold mechanism is very effective for the log-based anomaly diagnosis task.

(3) We conduct extensive experiments on 7 datasets of different scales and subjects. Results show that our proposed model yields a significant improvement over the baseline models.

2 Related Work

The main purpose of anomaly diagnosis is to help O&M engineers analyse the cause of anomalies and understand the operational status of the system or network. A number of excellent methods have emerged in recent years.

Yu et al. [17] utilized log templates to build workflows offline. Such models can provide the contextual log messages of a problem and diagnose problems buried deep within log sequences; they can supply the correct log sequence and tell engineers what is happening. Jia et al. [18] proposed TCFG (Time-weighted Control Flow Graph), a black-box diagnostic method based on control flow graphs with temporal weights, which requires no prior domain knowledge or assumptions and achieves good performance on relevant datasets.

Fu et al. [19] proposed using a finite state automaton (FSA) to simulate the execution behaviour of a log-based system model. The model first clusters logs to generate log templates, removing all parameters with regular expressions. Each log is then labelled with its log template type to construct sequences, from which the FSA is learned to capture the system's normal workflow; this achieves good performance on relevant datasets. Beschastnikh et al. [20] generated finite state machines from the execution traces of a concurrent system to infer a concise and accurate model of that system's behaviour. Engineers can use the inferred finite state machine model of communication to understand complex behaviour and detect anomalies. Lou et al. [21] proposed an automaton model and a corresponding mining algorithm for reconstructing concurrent workflows. The algorithm can automatically discover program workflows and can construct concurrent workflows from traces of interleaved events.

Although these methods achieve good performance, none of them explores the statistical and semantic features of log sequences. Our study breaks with the conventional view of logs as generic time series and introduces ideas from natural language processing into anomaly detection and diagnosis, investigating log sequences as text sequences with semantic information.

3 The Proposed Model

In this section, we present our proposed log-based anomaly diagnosis model, ASGNet. The model structure is depicted in Figure 2.

Figure 2: Overview of our proposed ASGNet model.

3.1 Task description

In this research, the log-based anomaly diagnosis task is described as a tuple of three elements $(S, I, y)$, where $S=[s^{1},s^{2},\dots,s^{g}]$ is a log sequence of length $g$, $I$ denotes the task ID of the current log message, and $y\in Y$ is the anomaly diagnosis label for the logged exception; currently diagnosable exceptions include Stream exception, Connection broken, Redundant request, Unexpected error, etc.
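For concreteness, a single training instance can be represented as a small record type; the following sketch is our illustration (the field names are ours, not the paper's):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DiagnosisSample:
    """One (S, I, y) instance of the log-based anomaly diagnosis task."""
    S: List[str]  # log sequence [s^1, ..., s^g] of length g
    I: str        # task ID of the current log message
    y: str        # anomaly label, e.g. "Stream exception" or "Connection broken"
```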

3.2 Definition of terms

In this section, we first give the formal definition of the log statistics vector. Given a word $w$ in a log message and a set of log anomaly labels $C=\{c_{1},c_{2},\cdots,c_{n}\}$, the statistical information vector of $w$ is:

$E^{w}=[e_{1},e_{2},\cdots,e_{n}]$ (1)

where $e_{i}$ is the count of word $w$ under label $c_{i}$. Given a log message $L=\{w_{i}\}_{i=1}^{m}$, the word statistics matrix of the log message is:

$E^{L}=[E^{w_{1}},E^{w_{2}},\dots,E^{w_{m}}]$ (2)

The statistical information vector captures the global distribution of log anomaly labels as a feature of the words in the log.

These statistical features are primitive but can be used for feature selection by measuring word-relatedness. Intuitively, if a word $w$ has a very high or very low frequency across all labels, we can assume that $w$ contributes little to the anomaly diagnosis task. In contrast, if a word appears markedly more often under a specific label, we assume the word is discriminative. Note that the statistical label vector dictionary $E$ is built from the training set only.
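A minimal sketch of how Eqs. (1) and (2) can be computed (our illustration; whitespace tokenization and the dictionary layout are assumptions):

```python
import numpy as np
from collections import defaultdict

def build_label_vectors(messages, labels, label_set):
    """Eq. (1): for every word w, count its occurrences under each label c_i."""
    idx = {c: i for i, c in enumerate(label_set)}
    E = defaultdict(lambda: np.zeros(len(label_set), dtype=np.int64))
    for msg, y in zip(messages, labels):
        for w in msg.split():
            E[w][idx[y]] += 1
    return E  # E[w] is the statistical information vector E^w

def message_matrix(msg, E, n_labels):
    """Eq. (2): stack the per-word vectors into the message-level matrix E^L."""
    zero = np.zeros(n_labels, dtype=np.int64)
    return np.stack([E.get(w, zero) for w in msg.split()])
```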

3.3 Log Statistics Information Representation

Inspired by work on text classification [13], the Log Statistics Information Representation module (V-Net) converts statistical features into an efficient representation. Log statistics vectors contain integer word counts and are thus initially incompatible with semantic features in both dimension and scale. Therefore, V-Net employs an autoencoder that maps the discrete log statistics vectors into a latent continuous space, yielding a global representation of the statistics. In this work, we employ a variational autoencoder (VAE) [15] to encode the log statistics vectors.

We generate log statistics vectors for all log messages in the dataset, obtaining $S=\{E_{(i)}^{L}\}_{i=1}^{N}$, a collection of $N$ discrete log statistics vectors. We assume that all log statistics vectors are generated by a stochastic process $p_{\theta}(E|S)$ involving a latent variable $S$ sampled from a prior distribution $p_{\theta}(S)$. Since the posterior $p_{\theta}(S|E)$ is intractable, we cannot directly learn the generative model parameters $\theta$. Instead, we employ a variational approximation $q_{\phi}(S|E)$ and jointly learn the variational parameters $\phi$ and $\theta$. We can then optimize the model by maximizing the marginal likelihood, which is a sum over the marginal likelihoods of the individual $E$:

$\log p_{\theta}(E)=D_{KL}(q_{\phi}(S|E)\,\|\,p_{\theta}(S|E))+\mathcal{L}(\theta,\phi;E)$ (3)

Since the KL divergence term is non-negative, the term $\mathcal{L}(\theta,\phi;E)$ is a variational lower bound on the marginal likelihood, i.e.:

$\mathcal{L}(\theta,\phi;E)=-D_{KL}(q_{\phi}(S|E)\,\|\,p_{\theta}(S))+\mathbb{E}_{q_{\phi}(S|E)}[\log p_{\theta}(E|S)]$ (4)

where the KL term has a closed-form solution and the expectation term is the reconstruction error. We employ the reparameterization trick to fit the variational framework into an autoencoder. We use two encoder heads to generate $\mu$ and $\sigma$, the mean and standard deviation of the approximate posterior. Since the prior is a multivariate Gaussian, we represent the variational posterior with a diagonal covariance structure:

$\log q_{\phi}(S|E)=\log\mathcal{N}(S;\mu,\sigma^{2}I)$ (5)

By training this unsupervised VAE model, we obtain the latent variable $E^{s}$ through the probabilistic encoder, which serves as a global representation of the statistical features. V-Net is trained independently of the main classifier; the representation $E^{s}$ is generated in the preprocessing stage and fed to the classifier through the adaptive semantic threshold mechanism.
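A minimal PyTorch sketch of such a VAE over statistics vectors (our illustration; the layer sizes and the Gaussian reconstruction loss are assumptions, not the paper's configuration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StatVAE(nn.Module):
    def __init__(self, in_dim, latent_dim=128):
        super().__init__()
        self.enc = nn.Linear(in_dim, 256)
        self.mu = nn.Linear(256, latent_dim)       # mean of q_phi(S|E)
        self.logvar = nn.Linear(256, latent_dim)   # log variance of q_phi(S|E)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, in_dim))

    def forward(self, e):
        h = F.relu(self.enc(e))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: S = mu + sigma * noise.
        s = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(s), mu, logvar

def neg_elbo(e, recon, mu, logvar):
    # Negative of Eq. (4): reconstruction error plus KL(q_phi(S|E) || p_theta(S)).
    rec = F.mse_loss(recon, e, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl

# After training, the encoder mean mu can serve as the global representation E^s.
```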

3.4 Log Deep Semantic Representation

The Log Deep Semantic Representation module (S-Net) extracts semantic features from the log message input and projects them into the information space for confidence evaluation. The input of S-Net is a log message $W=[w_{1},w_{2},\cdots,w_{m}]$ of fixed length $m$.

In this paper, we use a pre-trained model to obtain the semantic representation of the log. Specifically, we use pre-trained RoBERTa [16] to extract the feature map of the input log text:

$C=\mathrm{RoBERTa}(W)$ (6)

Then, we map the semantic feature map $C$ into the information space through dense layers and apply the sigmoid function to the values of the resulting representation:

$H^{C}=W^{C}\cdot C+b^{C}$ (7)

where $H'^{C}=\sigma(H^{C})$ and $\sigma(\cdot)$ is the sigmoid function; the activated values are used to evaluate the confidence of the corresponding semantic features in the decision-making process.
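A minimal sketch of S-Net with the HuggingFace transformers library (our illustration; the projection size info_dim is an assumption):

```python
import torch
import torch.nn as nn
from transformers import RobertaModel

class SNet(nn.Module):
    def __init__(self, info_dim=300):
        super().__init__()
        self.roberta = RobertaModel.from_pretrained("roberta-base")
        self.proj = nn.Linear(self.roberta.config.hidden_size, info_dim)

    def forward(self, input_ids, attention_mask):
        # Eq. (6): token-level feature map C from pre-trained RoBERTa.
        C = self.roberta(input_ids, attention_mask=attention_mask).last_hidden_state
        # Eq. (7): project into the shared information space.
        H_C = self.proj(C)
        H_C_conf = torch.sigmoid(H_C)  # H'^C, per-entry confidence scores
        return C, H_C, H_C_conf
```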

3.5 Adaptive Semantic Threshold Mechanism

As described in Section 3.3, the representation $E^{s}$ of the log statistics features is obtained offline. To flexibly utilize the statistical features, we apply dense layers to project $E^{s}$ into the information space shared with the semantic features:

$H^{E}=W^{E}\cdot E^{s}+b^{E}$ (8)

The gating component fuses $H^{C}$, $H'^{C}$, and $H^{E}$, and outputs through the AdaSemGate function the semantic feature map $H^{O}$ enhanced by statistical information:

$H^{O}=\mathrm{AdaSemGate}(H^{C},H'^{C},H^{E},\epsilon)=\mathrm{ReLU}(H^{C})+\mathrm{Gate}(H'^{C},\epsilon)\odot H^{E}$ (9)

where $\mathrm{ReLU}(\cdot)$ is the activation function and $\odot$ denotes element-wise multiplication. The values in $H'^{C}$ are in probabilistic form, and the Gate function is designed to retain the less confident entries (probabilities close to 0.5) so that the corresponding elements of $H^{E}$ are admitted. Specifically, for each unit $\alpha\in H'^{C}$:

$\mathrm{Gate}(\alpha,\epsilon)=\begin{cases}\alpha,&\text{if }0.5-\epsilon\leq\alpha\leq 0.5+\epsilon\\0,&\text{otherwise}\end{cases}$ (10)

where $\epsilon$ is a tunable hyperparameter that controls the confidence threshold. In particular, if $\epsilon=0$, all statistical information is rejected, and if $\epsilon=0.5$, all statistical information is accepted. The $\mathrm{Gate}(\alpha,\epsilon)$ function thus acts, through element-wise multiplication, as a filter that admits only the necessary information.
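A one-function sketch of Eq. (10) on PyTorch tensors (our illustration):

```python
import torch

def gate(alpha, eps):
    # Keep entries whose confidence is within eps of 0.5 (uncertain positions);
    # zero out confident entries so no statistical information flows there.
    keep = (alpha - 0.5).abs() <= eps
    return torch.where(keep, alpha, torch.zeros_like(alpha))
```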

We employ global attention to combine the consolidated semantic representation $H^{O}$ with the original feature map $C$:

$\mathrm{GlobalAttention}(\mathbf{H}^{O},\mathbf{C})=\mathrm{softmax}(\mathbf{H}^{O}\mathbf{C}^{\mathsf{T}})\,\mathbf{C}$ (11)

Note that if we reject all statistical information (i.e., $\epsilon=0$), Eq. (11) reduces to self-attention [22], since $H^{O}=C$.
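Putting Eqs. (9)-(11) together, the fusion step might look as follows (our illustration; it inlines the gate() sketch above and assumes $H^{O}$ and $C$ share the same feature dimension):

```python
import torch
import torch.nn.functional as F

def ada_sem_gate(H_C, H_C_conf, H_E, eps=0.2):
    # Eq. (9): consolidate semantics with gated statistical information.
    keep = (H_C_conf - 0.5).abs() <= eps
    gated = torch.where(keep, H_C_conf, torch.zeros_like(H_C_conf))
    return F.relu(H_C) + gated * H_E

def global_attention(H_O, C):
    # Eq. (11): attend over the original feature map C with the fused map H^O.
    return torch.softmax(H_O @ C.transpose(-2, -1), dim=-1) @ C
```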

After passing through fully-connected layers and a softmax layer, the feature vectors are mapped to the label space for label prediction and loss calculation. To maximize the probability of the correct label $Y_{\mathrm{True}}$, we deploy an optimizer to minimize the cross-entropy loss $L$:

$L=\mathrm{CrossEntropy}(Y_{\mathrm{True}},Y_{\mathrm{Pred}})$ (12)

4 Experimental Setup

4.1 Datasets

We evaluate our model ASGNet on seven public datasets. The statistics of these seven datasets are listed in Table 2.

Table 2: The statistics of the seven datasets

Datasets    Size       # of logs      # of anomalies   # of anomaly types
BGL         1.207GB    4,747,963      949,024          5
BGP         1.04GB     11,428,282     1,276,742        11
Tbird       27.367GB   211,212,192    43,087,287       7
Spirit      30.289GB   272,298,969    78,360,273       8
HDFS        1.58GB     11,175,629     362,793          6
Liberty     29.5GB     266,991,013    191,839,098      17
Zookeeper   10.4MB     74,380         49,124           10

4.2 Training and Hyper-parameters

We fix all the hyper-parameters of our model. Specifically, we use the base version of RoBERTa for the pre-trained embeddings in our experiments. The threshold $\epsilon$ is set to 0.2, which empirically yields the best performance. We optimize with the Adam optimizer, using a first momentum coefficient $\beta_{1}=0.9$ and a second momentum coefficient $\beta_{2}=0.999$. We select the best parameters on the development sets and evaluate the model on the test sets.
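A minimal training-loop sketch with this configuration (our illustration; the model and data loader interfaces are assumptions, not the paper's code):

```python
import torch
import torch.nn as nn

def train(model, train_loader, num_epochs=5):
    optimizer = torch.optim.Adam(model.parameters(), betas=(0.9, 0.999))
    criterion = nn.CrossEntropyLoss()  # the cross-entropy loss of Eq. (12)
    model.train()
    for _ in range(num_epochs):
        for inputs, stats, labels in train_loader:
            optimizer.zero_grad()
            logits = model(inputs, stats)  # semantic input plus statistics E^s
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()
```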

5 Experimental Results

In this section, we report and analyze the experimental results, aiming to answer the following research questions:

RQ1: Can ASGNet achieve better log-based anomaly diagnosis performance than the state-of-the-art methods for log-based anomaly diagnosis task?

RQ2: How do the key model components and information types used in ASGNet contribute to the overall performance?

RQ3: How do key hyperparameters, specifically the hidden state dimension of the global attention and the gate threshold $\epsilon$, affect the performance of the entire model?

5.1 Model Comparisons (RQ1)

To analyze the effectiveness of our model, we take several state-of-the-art methods as baselines on the above-mentioned seven datasets to validate the performance of our ASGNet model. The results are presented below.

Figure 3: Overall performance of our proposed ASGNet model.

As shown in Figure 3, NeuralLog [23], LogRobust [9], and LogAnomaly [24] are the strongest baseline models for the log-based anomaly detection task. The experimental results show that our model ASGNet achieves the best performance on all seven datasets. For instance, on the BGL dataset, our model outperforms NeuralLog [23] by 2.17%, LogRobust [9] by 1.3%, and LogAnomaly [24] by 4.37% in F1-score (p < 0.05, Student's t-test). It also compares favourably with these methods on the other six datasets.

The reason is that our proposed model takes into account not only the semantic features behind the log execution logic but also the statistical features of the log text in the log anomaly diagnosis task. To better fuse the two kinds of features, we propose a well-designed threshold mechanism that effectively selects statistical features to consolidate the semantic features, instead of using all statistical features. The threshold mechanism introduces the information flow into the classifier based on the confidence of the semantic features in the decision, which is conducive to training a robust classifier and mitigates the overfitting caused by using statistical features indiscriminately. The final experimental results therefore demonstrate that our method achieves remarkable performance on the log-based anomaly diagnosis task.

5.2 Ablation Study (RQ2)

To thoroughly assess the effect of each key model component, we carry out a series of ablation studies that decompose the whole model into three derived models.

Figure 4: Ablation study on the seven datasets.

Model (1): only use log statistics information representation – The entire model only uses the statistics information representation of the log sequence.

Model (2): only use log deep semantic representation – The entire model only uses the log deep semantic representation of the log sequence.

Model (3): without adaptive semantic gate networks – The entire model excludes the adaptive semantic gate networks.

As shown in Figure 4, the log deep semantic representation module contributes more to the log-based anomaly diagnosis task than the log statistics information representation module. This is mainly because the textual semantics of logs express the deep logical information behind them, whereas the statistical features only capture the frequency distribution of words, so the log deep semantic representation module shows a greater advantage. In addition, the adaptive semantic gate module is indispensable: removing it degrades the overall performance of the proposed model.

5.3 Parameter Sensitivity (RQ3)

Figure 5: Performance of ASGNet under different hidden state dimensions of the global attention and different gate thresholds $\epsilon$.

As shown in Figure 5, on both datasets the F1-score of our model trends upward while the dimension size is below 300 and peaks at exactly 300, which indicates that a larger dimension can contribute to model performance. However, when the dimension size exceeds 300, the F1-score drops on both HDFS and BGL, possibly due to insufficient training data.

Secondly, we compare the performance of our model under different gate thresholds $\epsilon$ on the two datasets. As illustrated in Figure 5, our model achieves the best F1-score at $\epsilon=0.2$, demonstrating that not all statistical features are useful to the model.

6 Conclusion

Anomaly detection and diagnosis play an important role in the incident management of large-scale systems, aiming to detect abnormal system behavior in time. Timely anomaly detection enables system developers (or engineers) to pinpoint problems immediately and resolve them at once, thereby reducing system downtime. In this paper we design Adaptive Semantic Gate Networks (ASGNet), which consist of three parts: a log statistics information representation module (V-Net), a log deep semantic representation module (S-Net), and an adaptive semantic threshold mechanism (G-Net). Specifically, V-Net leverages an unsupervised autoencoder to learn a global representation of each statistical feature vector, S-Net extracts latent semantic representations from the text input with pre-trained RoBERTa, and G-Net aligns the information from the two sources and adjusts the information flow. The final experimental results demonstrate that our method achieves remarkable performance on the log-based anomaly diagnosis task.

7 Acknowledgement

This work is partially supported by E0HG043104.

References

  • [1] Junjie Chen, Xiaoting He, and et al. An empirical investigation of incident triage for online service systems. In ICSE-SEIP. IEEE, 2019.
  • [2] Rui Chen, Shenglin Zhang, and et al. Logtransfer: Cross-system log anomaly detection for software systems with transfer learning. In ISSRE. IEEE, 2020.
  • [3] Shengming Zhang, Yanchi Liu, and et al. Cat: Beyond efficient transformer for content-aware anomaly detection in event sequences. In SIGKDD, 2022.
  • [4] Junjie Chen, Shu Zhang, and et al. How incidental are the incidents? characterizing and prioritizing incidents for large-scale online service systems. In ASE, 2020.
  • [5] Nengwen Zhao, Junjie Chen, and et al. Real-time incident prediction for online service systems. In ESEC/FSE, 2020.
  • [6] Nengwen Zhao, Junjie Chen, and et al. Understanding and handling alert storm for online service systems. In ICSE-SEIP. IEEE, 2020.
  • [7] Chen Zhi, Jianwei Yin, and et al. An exploratory study of logging configuration practice in java. In ICSME. IEEE, 2019.
  • [8] Lingzhi Wang, Nengwen Zhao, and et al. Root-cause metric location for microservice systems via log anomaly detection. In ICWS. IEEE, 2020.
  • [9] Xu Zhang, Yong Xu, and et al. Robust log-based anomaly detection on unstable log data. In ESEC/FSE, 2019.
  • [10] Leticia Decker, Daniel Leite, and et al. Real-time anomaly detection in data centers for log-based predictive maintenance using an evolving fuzzy-rule-based approach. In FUZZ-IEEE. IEEE, 2020.
  • [11] Shaohan Huang, Yi Liu, and et al. Hitanomaly: Hierarchical transformers for anomaly detection in system log. IEEE Transactions on Network and Service Management, 2020.
  • [12] Yongzheng Xie, Hongyu Zhang, and et al. Logdp: Combining dependency and proximity for log-based anomaly detection. In ICSC. Springer, 2021.
  • [13] Xianming Li, Zongxi Li, and et al. Merging statistical feature via adaptive gate for improved text classification. In AAAI, 2021.
  • [14] Han Zhang, Zongxi Li, and et al. Leveraging statistical information in fine-grained financial sentiment analysis. World Wide Web, 2022.
  • [15] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. stat, 2014.
  • [16] Yinhan Liu, Myle Ott, and et al. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
  • [17] Xiao Yu, Pallavi Joshi, and et al. Cloudseer: Workflow monitoring of cloud infrastructures via interleaved logs. SIGARCH, 2016.
  • [18] Tong Jia, Lin Yang, and et al. Logsed: Anomaly diagnosis through mining time-weighted control flow graph in logs. In CLOUD. IEEE, 2017.
  • [19] Qiang Fu, Jian-Guang Lou, and et al. Execution anomaly detection in distributed systems through unstructured log analysis. In ICDM. IEEE, 2009.
  • [20] Ivan Beschastnikh, Yuriy Brun, and et al. Inferring models of concurrent systems from logs of their behavior with csight. In ICSE, pages 468–479, 2014.
  • [21] Jian-Guang Lou, Qiang Fu, and et al. Mining program workflow from interleaved traces. In SIGKDD, 2010.
  • [22] Ashish Vaswani, Noam Shazeer, and et al. Attention is all you need. NIPS, 2017.
  • [23] Van-Hoang Le and Hongyu Zhang. Log-based anomaly detection without log parsing. In ASE, pages 492–504. IEEE, 2021.
  • [24] Weibin Meng, Ying Liu, and et al. Loganomaly: unsupervised detection of sequential and quantitative anomalies in unstructured logs. In IJCAI, 2019.