Are GNNs Effective for Multimodal Fault Diagnosis in Microservice Systems?
Abstract
Fault diagnosis in microservice systems has increasingly embraced multimodal observation data for a holistic and multifaceted view of the system, with Graph Neural Networks (GNNs) commonly employed to model complex service dependencies. However, despite the intuitive appeal, there remains a lack of compelling justification for the adoption of GNNs, as no direct evidence supports their necessity or effectiveness. To critically evaluate the current use of GNNs, we propose DiagMLP, a simple topology-agnostic baseline as a substitute for GNNs in fault diagnosis frameworks. Through experiments on five public datasets, we surprisingly find that DiagMLP performs competitively with and even outperforms GNN-based methods in fault diagnosis tasks, indicating that the current paradigm of using GNNs to model service dependencies has not yet demonstrated a tangible contribution. We further discuss potential reasons for this observation and advocate shifting the focus from solely pursuing novel model designs to developing challenging datasets, standardizing preprocessing protocols, and critically evaluating the utility of advanced deep learning modules.
Index Terms:
MicroSS, fault diagnosis, multimodal data, multi-layer perceptron, graph neural networks.

I Introduction
Microservice systems (MicroSS) underpin modern distributed applications in cloud environments, providing scalability, flexibility, and modular development. These systems are decomposed into interdependent services and monitored through multimodal data, including logs, metrics, and traces. Faults in MicroSS may manifest in single or multiple modalities (as illustrated in Fig. 1) and propagate through intricate service dependencies [1], making fault diagnosis in MicroSS a significant challenge. Recent studies [2, 3, 4, 5, 6, 7, 8] employ Graph Neural Networks (GNNs) to integrate local multimodal observation data with global service dependencies for fault diagnosis, aiming to address this challenge. Despite growing interest in this area [9], evidence remains limited regarding the necessity and effectiveness of GNNs in capturing service dependency information, raising a critical question about whether the current paradigm of using GNNs to model these dependencies is genuinely effective.
This paper critically examines the current research paradigm in multimodal fault diagnosis for MicroSS, specifically questioning the effectiveness of using GNNs for these tasks. To address this, we introduce DiagMLP, a simple yet robust baseline model based on a topology-agnostic Multi-Layer Perceptron (MLP) framework, serving as a substitute for the GNN module. DiagMLP is evaluated against state-of-the-art GNN-based methods, and experimental results across five public datasets yield a surprising insight: the minimalist design of DiagMLP not only achieves competitive performance but also outperforms several sophisticated GNN-based models in fault detection, localization, and classification tasks. The contributions of this work are threefold:
1. We present a unified GNN-based fault diagnosis framework, synthesizing existing research and questioning the necessity of GNNs given that multimodal preprocessing already extracts rich information.
2. We propose DiagMLP, a simple yet effective baseline model, challenging the prevailing paradigm of using GNNs to model service dependencies.
3. Through experiments on five datasets, we demonstrate the limited utility of GNNs in current fault diagnosis tasks and provide actionable insights to inform future research.
To the best of our knowledge, this is the first work to critically examine the effectiveness of GNNs in fault diagnosis for MicroSS. Our results underscore the importance of adopting non-trivial datasets and standardized preprocessing to properly assess the need for complex models. We advocate for a shift in focus within the research community toward practical, interpretable, and effective solutions, and we caution against the use of complex deep learning models that provide only marginal improvements.
II GNN-Based Fault Diagnosis Methods
Existing research typically follows a common paradigm: dependency graphs are constructed to represent the relationships among service instances (nodes) based on trace data and system deployment configuration; node features are extracted through multimodal data fusion; and graph-based representations are subsequently employed for fault diagnosis. GNNs, known for capturing complex interactions and relationships between nodes, have become a prevalent choice for fault diagnosis in MicroSS [10, 7]. We summarize a general framework of existing research, comprising several key design elements, as illustrated in Fig. 2.
Preprocessing
Multimodal raw data, including metrics, logs, and traces, is typically unprocessed and lacks standardization, making preprocessing both crucial and diverse. The first step in preprocessing is aligning the multimodal data by timestamps. For log data, log parsing tools, such as Drain3 [11], are commonly used to extract log templates. Through heuristic filtering [4], logs are transformed into time series of template occurrences [7, 12, 6]. Although this reduces semantic richness, it effectively mitigates redundancy in raw log text. Metrics are processed using standard techniques for multivariate time series, with dimensionality reduction achieved through clustering [2]. Traces are used to extract latency time series, which serve as edge [7] or node features. Additionally, combined with deployment configurations [3, 5], traces are employed to construct dependency graphs of service instances. Due to the rarity of anomalies, processed time series may still contain redundant information. Lightweight anomaly detection techniques, such as 3-sigma [3, 4], can be applied to generate alert event sequences, offering a unified representation that balances informativeness and efficiency across modalities [10, 13].
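To make this concrete, the following is a minimal sketch of two of the preprocessing steps described above: building per-window log-template occurrence time series and deriving 3-sigma alert events. The column names, window size, and threshold are illustrative assumptions, not any specific paper's pipeline.

```python
# A minimal preprocessing sketch (assumed column names: "timestamp", "template_id").
import numpy as np
import pandas as pd

def template_occurrence_series(log_df: pd.DataFrame, window: str = "30s") -> pd.DataFrame:
    """Count occurrences of each parsed log template per time window."""
    return (
        log_df.set_index("timestamp")          # requires a datetime "timestamp" column
        .groupby("template_id")
        .resample(window)
        .size()
        .unstack(level="template_id", fill_value=0)
    )

def three_sigma_alerts(series: pd.Series) -> pd.Series:
    """Flag points deviating more than 3 standard deviations from the mean."""
    mu, sigma = series.mean(), series.std()
    return (series - mu).abs() > 3 * sigma
```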
Embedding
To preserve both temporal and semantic information, carefully selected models are employed to embed preprocessed data. For time-series data, advanced sequence models such as Temporal Convolutional Networks (TCN) [14] and Transformers [15] are commonly used to capture complex temporal dependencies. Event sequences are embedded using language models like FastText [16] and GloVe [17], while raw log text is encoded into contextual embeddings through pretrained language models such as BERT [18]. Embedding modules influence the model’s training process, which may involve either a pretraining strategy [3, 4, 5] or end-to-end integration within the fault diagnosis pipeline.
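As an illustration of the embedding stage, the sketch below encodes a preprocessed per-node metric time series with a small Transformer encoder followed by mean pooling over time; the dimensions, pooling choice, and class name are assumptions rather than the setup of any cited method.

```python
# A hedged sketch of a sequence-embedding module for per-node metric time series.
import torch
import torch.nn as nn

class MetricEncoder(nn.Module):
    def __init__(self, n_channels: int, d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.proj = nn.Linear(n_channels, d_model)  # lift raw channels to model width
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           dim_feedforward=2 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_channels) -> (batch, d_model) via mean pooling over time
        h = self.encoder(self.proj(x))
        return h.mean(dim=1)
```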
GNN-based Modeling
Faults in MicroSS propagate through service dependencies, often leading to cascading failures. Integrating multimodal observations with dependency graphs using GNNs is therefore considered a promising approach for diagnosis. The most commonly used GNN architectures are basic models such as GCN, GAT, and GraphSAGE. More advanced and specialized GNN variants have also been proposed. For example, DGERCL [19] employs a stream-based dynamic GNN to model temporal sequences of service invocations, while DeepHunt [5] introduces a graph autoencoder framework for self-supervised learning that addresses the challenge of limited labeled data.
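For concreteness, the following is a minimal sketch of this modeling stage: a single GraphSAGE-style layer with mean aggregation over the instance dependency graph, written in plain PyTorch. Real frameworks typically stack such layers from a GNN library; the shapes and names here are assumptions.

```python
# A minimal GraphSAGE-style layer: [self || neighbor mean] -> linear -> ReLU.
import torch
import torch.nn as nn

class MeanSAGELayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) node features; adj: (N, N) dependency adjacency matrix
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # avoid division by zero
        neigh = adj @ x / deg                            # mean of neighbor features
        return torch.relu(self.lin(torch.cat([x, neigh], dim=-1)))
```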
The review of existing literature reveals two key observations that underpin our research. First, while preprocessing and embedding methodologies vary significantly, the modeling stage predominantly relies on GNNs, whose effectiveness has not been rigorously benchmarked against minimal baselines. Second, preprocessing—particularly for trace data—often incorporates system dependency topology into feature representations, which may overshadow the unique contributions of GNNs. These observations call for a thorough re-evaluation of GNNs’ efficacy in fault diagnosis for MicroSS, raising the question of whether their complexity is justified given the extensive preprocessing that already captures critical topological information.
III A Topology-Agnostic Baseline
State-of-the-art GNN-based models usually optimize the entire pipeline of fault diagnosis tasks, making it unclear whether performance gains result from the GNNs or other components like multimodal data and preprocessing. We hypothesize that the improvements largely come from non-GNN elements, considering that preprocessing of multimodal data already extracts rich information. To test the hypothesis, we propose a topology-agnostic baseline to isolate and evaluate the true contributions of GNN-based models.
III-A Problem Formulation
Given a MicroSS represented as a set of $N$ nodes (service instances), each node $i$ is associated with multimodal features:

$$\mathbf{x}_i = \big(\mathbf{x}_i^{m},\ \mathbf{x}_i^{l},\ \mathbf{x}_i^{t}\big), \quad i = 1, \dots, N, \qquad (1)$$

where $\mathbf{x}_i^{m}$, $\mathbf{x}_i^{l}$, and $\mathbf{x}_i^{t}$ are embeddings derived from metric, log, and trace data, respectively. These embeddings can be generated using any preprocessing and embedding pipeline, without imposing restrictions on the specific methods employed.

The goal is to learn a mapping $f: \{\mathbf{x}_i\}_{i=1}^{N} \rightarrow y$ that determines whether the system is anomalous ($y \in \{0, 1\}$), identifies the root-cause node ($y \in \{1, \dots, N\}$), or classifies the type of failure ($y \in \{1, \dots, C\}$, where $C$ is the number of failure types).
III-B DiagMLP
We propose a minimal fault diagnosis model based on a topology-agnostic MLP, named DiagMLP, which serves as a null model for evaluating GNN-based methods. The architecture of DiagMLP is defined in Eq. 2:
$$
\begin{aligned}
\mathbf{h}_i &= \mathrm{FusionMLP}_{\mathrm{modal}}\big(\big[\,\mathbf{x}_i^{m} \,\|\, \mathbf{x}_i^{l} \,\|\, \mathbf{x}_i^{t} \,\|\, \mathbf{p}_i\,\big]\big), \quad i = 1, \dots, N, \\
\mathbf{h}_{\mathrm{sys}} &= \mathrm{FusionMLP}_{\mathrm{node}}\big(\big[\,\mathbf{h}_1 \,\|\, \mathbf{h}_2 \,\|\, \cdots \,\|\, \mathbf{h}_N\,\big]\big),
\end{aligned}
\qquad (2)
$$

where $\|$ denotes the concatenation operator, $\mathbf{p}_i$ is the learnable position embedding [20] for node $i$, and $\mathbf{h}_{\mathrm{sys}}$ is the system-wise representation. The two Fusion MLP modules share the same simple architecture, defined in Eq. 3:

$$\mathrm{FusionMLP}(\mathbf{z}) = \mathrm{Dropout}\big(\mathrm{ReLU}\big(\mathrm{LN}(\mathbf{W}\mathbf{z} + \mathbf{b})\big)\big), \qquad (3)$$

where LN refers to layer normalization, ReLU introduces non-linearity, and Dropout serves as regularization to prevent overfitting.
As illustrated in Fig. 3, DiagMLP (➁ and ➂) maps node-level embeddings into task-oriented system-level representations, imposing no constraints on the embedding module or downstream tasks. This adaptability positions DiagMLP as a topology-agnostic replacement for the GNN modules in existing frameworks.
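For reference, the following is a minimal PyTorch sketch of DiagMLP consistent with the reconstruction in Eqs. 2 and 3; the hidden sizes, position-embedding dimension, task head, and tensor shapes are illustrative assumptions rather than the exact configuration used in our experiments.

```python
# A minimal DiagMLP sketch: modal-wise fusion, node-wise fusion, and a task head.
import torch
import torch.nn as nn

def fusion_mlp(in_dim: int, out_dim: int, p: float = 0.1) -> nn.Sequential:
    # Eq. (3): single linear layer followed by LayerNorm, ReLU, and Dropout.
    return nn.Sequential(nn.Linear(in_dim, out_dim), nn.LayerNorm(out_dim),
                         nn.ReLU(), nn.Dropout(p))

class DiagMLP(nn.Module):
    def __init__(self, n_nodes: int, modal_dims: list, hidden: int, n_out: int, pos_dim: int = 16):
        super().__init__()
        self.pos = nn.Embedding(n_nodes, pos_dim)                 # learnable position embeddings
        self.modal_fusion = fusion_mlp(sum(modal_dims) + pos_dim, hidden)
        self.node_fusion = fusion_mlp(n_nodes * hidden, hidden)   # parameters scale with n_nodes
        self.head = nn.Linear(hidden, n_out)                      # detection / localization / classification

    def forward(self, feats: list) -> torch.Tensor:
        # feats: list of (n_nodes, d_m) tensors, one per modality (metric, log, trace)
        n = feats[0].shape[0]
        pos = self.pos(torch.arange(n, device=feats[0].device))
        x = torch.cat(feats + [pos], dim=-1)          # Eq. (2): modal-wise concatenation
        h = self.modal_fusion(x)                      # node-level representations
        h_sys = self.node_fusion(h.reshape(1, -1))    # node-wise concatenation -> system representation
        return self.head(h_sys)
```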
III-C Design Choices of DiagMLP
The design of DiagMLP prioritizes simplicity and effectiveness by incorporating several key decisions:
Learnable Position Embeddings
DiagMLP includes a learnable position embedding for each node, which empirically improves performance. This is likely due to improved node discrimination, which allows the model to capture node-specific patterns that are critical to fault diagnosis.
Single-Layer MLP
A single-layer MLP is used instead of a deeper architecture because of the empirical effectiveness of linear transformations of raw features in fault diagnosis. This choice minimizes complexity while maintaining strong performance, reducing overfitting, and emphasizing the impact of preprocessing and embedding steps.
Fusion by Concatenation
Modal-wise and node-wise features are integrated through straightforward concatenation to retain the full breadth of input information. While this method proves effective, it results in the parameter space of the Node Fusion MLP scaling linearly with the number of nodes $N$. Although this complexity is manageable for publicly available multimodal datasets, it poses significant challenges to scalability in larger systems. To overcome these limitations, future research could investigate more sophisticated and scalable alternatives, such as DeepSets [21] or Set Transformers [22].
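For example, a DeepSets-style node fusion replaces concatenation with permutation-invariant sum pooling, so the parameter count no longer grows with the number of nodes. The sketch below is one such assumed variant for illustration only; it is not part of DiagMLP as evaluated in this paper.

```python
# A hedged DeepSets-style alternative to concatenation-based node fusion [21].
import torch
import torch.nn as nn

class DeepSetNodeFusion(nn.Module):
    def __init__(self, in_dim: int, hidden: int):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())  # per-node transform
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())  # post-pooling transform

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (n_nodes, in_dim) -> (hidden,) system representation via sum pooling
        return self.rho(self.phi(h).sum(dim=0))
```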
TABLE I: Statistics of the datasets.

| Dataset | #instances | #train | #valid | #test | Task* |
|---|---|---|---|---|---|
| SN | 12 | 316 | 78 | 169 | Det. & Loc. |
| TT | 27 | 3256 | 813 | 1744 | Det. & Loc. |
| GAIA | 10 | 128 | 32 | 939 | Cla. & Loc. |
| D1 | 46 | 63 | - | 147 | Loc. |
| D2 | 18 | 40 | - | 93 | Loc. |

* Fault Detection (Det.), Localization (Loc.), and Classification (Cla.).
IV Experiments
To validate our hypothesis, we integrate DiagMLP as a substitute for the GNN components in state-of-the-art fault diagnosis frameworks. Crucially, we maintain consistency across all other pipeline components, including the raw dataset, preprocessing methods, embedding techniques, objectives, and training strategies. This controlled experimental design ensures that any observed performance differences are attributable solely to the modeling stage, thereby minimizing the influence of confounding variables.
IV-A Experimental Settings
Baselines
We use three recently proposed methods—Eadro [2], TVDiag [4], and DeepHunt [5]—as backbone frameworks, replacing their GNN components with our topology-agnostic DiagMLP to validate the effectiveness of our model. Additionally, we include two other methods, DiagFusion [3] and CHASE [12], as they follow the same evaluation protocols.
Datasets
To ensure maximum control over experimental variables, we adopt the datasets used in the corresponding papers: the Social Network (SN) and Train Tickets (TT) datasets from Eadro (https://zenodo.org/doi/10.5281/zenodo.7615393), the GAIA dataset from TVDiag (https://github.com/WHU-AISE/TVDiag), and the D1 and D2 datasets from DeepHunt (https://github.com/bbyldebb/DeepHunt). Specifically, we reimplement the preprocessing code for the SN and TT datasets (https://github.com/FeiGSSS/Eadro), as the original preprocessing scripts were not provided. For GAIA, D1, and D2, we directly use the preprocessed multimodal data publicly provided by the authors. This ensures consistency and comparability across all experiments. Table I presents the statistics of the datasets used.
Evaluation Metrics
For fault detection and classification tasks, we evaluate model performance using precision, recall, and F1 score. In fault localization, we assess performance through Top-$k$ accuracy, which is the proportion of cases in which the faulty node is correctly identified within the top $k$ predictions.
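Concretely, Top-$k$ accuracy can be computed as in the following sketch, assuming per-node fault scores and ground-truth faulty-node indices; the argument names and shapes are assumptions.

```python
# Fraction of fault cases whose true faulty node is among the k highest-scored nodes.
import torch

def top_k_accuracy(scores: torch.Tensor, labels: torch.Tensor, k: int) -> float:
    # scores: (n_cases, n_nodes) per-node fault scores; labels: (n_cases,) true faulty node index
    topk = scores.topk(k, dim=1).indices             # (n_cases, k)
    hits = (topk == labels.unsqueeze(1)).any(dim=1)  # did the true node make the top k?
    return hits.float().mean().item()
```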
Notes
(1) We do not evaluate every model on every dataset, as doing so is unnecessary for the purpose of our experiments. (2) We identified and addressed flaws in existing frameworks, such as the window-splitting strategy in Eadro that could result in data leakage. Furthermore, we introduce a validation set to align with standard best practices in machine learning evaluation.
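As an illustration of the kind of leakage-free protocol we follow, the sketch below splits fault windows chronologically into contiguous train, validation, and test segments; it is a simplified stand-in for the actual splitting code, with illustrative split ratios.

```python
# A hedged sketch of a chronological split that prevents temporal data leakage.
def chronological_split(windows, train_ratio=0.6, valid_ratio=0.15):
    # windows: list of (start_time, features, label) tuples, sortable by start_time
    windows = sorted(windows, key=lambda w: w[0])
    n = len(windows)
    n_train = int(n * train_ratio)
    n_valid = int(n * valid_ratio)
    return (windows[:n_train],                   # train
            windows[n_train:n_train + n_valid],  # validation
            windows[n_train + n_valid:])         # test
```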
IV-B Results
TABLE II: Fault detection results (%) on the SN and TT datasets.

| Method | SN Pre | SN Rec | SN F1 | TT Pre | TT Rec | TT F1 |
|---|---|---|---|---|---|---|
| Eadro [2] | 92.1 | 92.0 | 92.1 | 83.3 | 99.5 | 90.7 |
| DiagMLP | 93.8 | 99.7 | 96.7 | 83.4 | 99.8 | 90.8 |
TABLE III: Fault classification results (%) on the GAIA dataset.

| Method | Pre | Rec | F1 |
|---|---|---|---|
| TVDiag [4] | 93.4 ± 0.2 | 95.5 ± 0.0 | 94.5 ± 0.1 |
| DiagMLP | 92.1 ± 0.2 | 94.7 ± 0.3 | 93.4 ± 0.2 |
TABLE IV: Fault localization results (%).*

| Dataset | Metric | Eadro [2] | DiagFusion [3] | TVDiag [4] | CHASE [12] | DeepHunt [5] | DiagMLP (Ours) | Backbone |
|---|---|---|---|---|---|---|---|---|
| SN | Top1 | 41.8 ± 14.5 | - | - | - | - | 80.2 ± 3.1 | Eadro |
| | Top3 | 61.1 ± 16.0 | - | - | - | - | 88.6 ± 2.8 | |
| | Top5 | 71.7 ± 10.8 | - | - | - | - | 89.4 ± 3.2 | |
| TT | Top1 | 91.1 ± 2.2 | - | - | - | - | 98.5 ± 0.8 | Eadro |
| | Top3 | 96.5 ± 0.7 | - | - | - | - | 99.7 ± 0.6 | |
| | Top5 | 97.7 ± 0.6 | - | - | - | - | 99.2 ± 0.6 | |
| GAIA | Top1 | 30.4 | 41.9 | 59.8 ± 0.8 | 61.4 | - | 61.0 ± 1.0 | TVDiag |
| | Top3 | 59.2 | 81.3 | 80.4 ± 0.5 | 88.2 | - | 80.7 ± 0.8 | |
| | Top5 | - | 91.4 | 88.5 ± 0.2 | - | - | 87.4 ± 0.8 | |
| D1 | Top1 | 31.0 | 33.3 | - | - | 79.8 ± 0.4 | 79.8 ± 1.0 | DeepHunt |
| | Top3 | 44.6 | 50.5 | - | - | 90.6 ± 0.6 | 90.6 ± 0.5 | |
| | Top5 | 48.4 | 64.8 | - | - | 96.7 ± 0.3 | 95.8 ± 1.0 | |
| D2 | Top1 | 21.4 | 39.8 | - | - | 78.9 ± 0.7 | 79.0 ± 0.5 | DeepHunt |
| | Top3 | 38.6 | 55.2 | - | - | 93.4 ± 0.3 | 93.0 ± 0.7 | |
| | Top5 | 45.4 | 75.0 | - | - | 94.6 ± 0.0 | 94.6 ± 0.0 | |

* Results without standard deviations are taken directly from the cited papers; results with standard deviations are averaged over 10 independent runs conducted under the same settings as the cited works.
We present the fault detection results for the SN and TT datasets in Table II, the fault localization results across all datasets in Table IV, and the fault classification results on the GAIA dataset in Table III.
Surprisingly, DiagMLP demonstrates competitive or superior performance across all three tasks compared to state-of-the-art models incorporating GNN modules. Notably, DiagMLP significantly outperforms the backbone Eadro in fault detection and localization accuracy on both the SN and TT datasets. This improvement can likely be attributed to Eadro's tendency to overfit with its more complex GAT [23] module, as evidenced by the large standard deviations reported for Eadro in Table IV. Furthermore, DiagMLP achieves performance nearly equivalent to that of TVDiag and DeepHunt, underscoring that the GNN modules (both use GraphSAGE [24]) in these models fail to provide any tangible performance gains.
IV-C Visualization
The embeddings for different datasets are further visualized using UMAP [25] to analyze their distinguishing capabilities.
For the SN and TT datasets, we concatenate all preprocessed multimodal node-level information corresponding to the same fault to construct fault window representations. These are visualized using UMAP in Fig. 4. The results reveal that preprocessed fault window features alone are sufficient to effectively distinguish different fault root causes.
For the GAIA dataset, we use three fault window representations: (1) concatenated features from the embedding module (Original Features), (2) embeddings from DiagMLP's node fusion (MLP Embedding), and (3) embeddings from TVDiag's graph pooling (GNN Embedding), as shown in Fig. 5. Neither MLP nor GNN modeling significantly improves class separation compared to the original features, with GNNs offering no clear advantage despite leveraging topological information.
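The projections in Figs. 4 and 5 can be produced with a short script along the following lines, assuming fault-window representations stacked into an array with integer root-cause or class labels; the variable names and plotting details are assumptions.

```python
# A hedged sketch of the UMAP projection used for the visualizations.
import umap
import matplotlib.pyplot as plt

def plot_fault_windows(features, labels):
    # features: (n_windows, d) array; labels: (n_windows,) root-cause / class ids
    coords = umap.UMAP(n_components=2, random_state=0).fit_transform(features)
    plt.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="tab10", s=10)
    plt.title("UMAP projection of fault window representations")
    plt.show()
```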
V Analysis and Discussion
The competitive performance of the topology-agnostic DiagMLP suggests that current state-of-the-art methods fail to fully exploit the assumed advantages of modeling system dependencies with GNNs. In fact, the use of GNN modules may even degrade model accuracy. We analyze two possible reasons: (1) topological dependencies may be either redundant or not fully utilized, and (2) the preprocessing steps may have already embedded essential topological information into node embeddings. As for the first reason, prior studies have demonstrated the effectiveness of leveraging topological dependencies in related domains [26, 27]. Therefore, we argue that existing fault diagnosis methods fail to effectively leverage topological dependency information. The second hypothesis suggests that the datasets used in existing studies may be too small in scale or simplistic in structure, rendering global topological information unnecessary for achieving competitive diagnostic performance.
Based on these observations, we urge the MicroSS research community to exercise greater caution when introducing complex models like GNNs for multimodal fault diagnosis. It is crucial to clearly distinguish the innovations that genuinely advance the field. Furthermore, we emphasize the need for larger-scale, more complex public datasets, alongside standardized preprocessing protocols. We also advocate for adherence to best practices in deep learning, including rigorous validation and testing protocols, to ensure the reproducibility and generalizability of experimental results.
VI Conclusion
This paper critically evaluates the effectiveness of GNNs in fault diagnosis for MicroSS by proposing a topology-agnostic baseline model, DiagMLP. Experiments on five datasets reveal that the GNN modules used in state-of-the-art methods provide no substantial performance gains. In contrast, DiagMLP achieves comparable, and in some cases superior, results in fault detection, localization, and classification tasks. These findings suggest that improvements are largely driven by multimodal data preprocessing and embedding techniques, rather than GNN-based dependency modeling. The study also highlights the limitations of existing datasets and experimental designs, calling for larger, more complex datasets and standardized preprocessing protocols. This work not only challenges the prevailing assumptions about the utility of GNNs in this domain but also provides a robust methodological foundation and clear directions for future research in microservice fault diagnosis.
References
- [1] R. Xin, P. Chen, and Z. Zhao, “Causalrca: Causal inference based precise fine-grained root cause localization for microservice applications,” Journal of Systems and Software, vol. 203, p. 111724, 2023.
- [2] C. Lee, T. Yang, Z. Chen, Y. Su, and M. R. Lyu, “Eadro: An end-to-end troubleshooting framework for microservices on multi-source data,” in Proceedings of the 45th International Conference on Software Engineering, ser. ICSE ’23. IEEE Press, 2023, p. 1750–1762.
- [3] S. Zhang, P. Jin, Z. Lin, Y. Sun, B. Zhang, S. Xia, Z. Li, Z. Zhong, M. Ma, W. Jin, D. Zhang, Z. Zhu, and D. Pei, “Robust failure diagnosis of microservice system through multimodal data,” IEEE Transactions on Services Computing, vol. 16, no. 6, pp. 3851–3864, 2023.
- [4] S. Xie, J. Wang, H. He, Z. Wang, Y. Zhao, N. Zhang, and B. Li, “Tvdiag: A task-oriented and view-invariant failure diagnosis framework with multimodal data,” ArXiv, vol. abs/2407.19711, 2024.
- [5] Y. Sun, Z. Lin, B. Shi, S. Zhang, S. Ma, P. Jin, Z. Zhong, L. Pan, Y. Guo, and D. Pei, “Interpretable failure localization for microservice systems based on graph autoencoder,” ACM Trans. Softw. Eng. Methodol., Sep. 2024, Just Accepted.
- [6] L. Zheng, Z. Chen, J. He, and H. Chen, “Mulan: Multi-modal causal structure learning and root cause analysis for microservice systems,” in Proceedings of the ACM Web Conference 2024, ser. WWW ’24. New York, NY, USA: Association for Computing Machinery, 2024, p. 4107–4116.
- [7] J. Huang, Y. Yang, H. Yu, J. Li, and X. Zheng, “Twin graph-based anomaly detection via attentive multi-modal learning for microservice system,” in 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). Los Alamitos, CA, USA: IEEE Computer Society, Sep. 2023, pp. 66–78.
- [8] Y. Zhang, Y. Zhao, H. Shi, F. Gao, G. Xu, X. Chen, and X. Li, “Fault-aware service scheduling optimization framework in edge data center,” in 2024 IEEE 21st International Conference on Mobile Ad-Hoc and Smart Systems (MASS), 2024, pp. 514–519.
- [9] S. Zhang, S. Xia, W. Fan, B. Shi, X. Xiong, Z. Zhong, M. Ma, Y. Sun, and D. Pei, “Failure diagnosis in microservice systems: A comprehensive survey and analysis,” arXiv preprint arXiv:2407.01710, 2024.
- [10] C. Zhang, X. Peng, C. Sha, K. Zhang, Z. Fu, X. Wu, Q. Lin, and D. Zhang, “Deeptralog: Trace-log combined microservice anomaly detection through graph-based deep learning,” in 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE), 2022, pp. 623–634.
- [11] P. He, J. Zhu, Z. Zheng, and M. R. Lyu, “Drain: An online log parsing approach with fixed depth tree,” in 2017 IEEE International Conference on Web Services (ICWS), 2017, pp. 33–40.
- [12] Z. Zhao, T. Zhang, Z. Shen, H. Dong, X. Ma, X. Liu, and Y. Yang, “Chase: A causal heterogeneous graph based framework for root cause analysis in multimodal microservice systems,” ArXiv, vol. abs/2406.19711, 2024.
- [13] G. Yu, P. Chen, Y. Li, H. Chen, X. Li, and Z. Zheng, “Nezha: Interpretable fine-grained root causes analysis for microservices on multi-modal observability data,” in Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2023. New York, NY, USA: Association for Computing Machinery, 2023, p. 553–565.
- [14] S. Bai, J. Z. Kolter, and V. Koltun, “An empirical evaluation of generic convolutional and recurrent networks for sequence modeling,” arXiv preprint arXiv:1803.01271, 2018.
- [15] A. Vaswani, N. M. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Neural Information Processing Systems, 2017.
- [16] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching word vectors with subword information,” Transactions of the Association for Computational Linguistics, vol. 5, pp. 135–146, 06 2017.
- [17] J. Pennington, R. Socher, and C. D. Manning, “Glove: Global vectors for word representation,” in Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 2014, pp. 1532–1543.
- [18] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), J. Burstein, C. Doran, and T. Solorio, Eds. Minneapolis, Minnesota: Association for Computational Linguistics, Jun. 2019, pp. 4171–4186.
- [19] H. Cheng, Q. Li, B. Liu, S. Liu, and L. Pan, “Dgercl: A dynamic graph embedding approach for root cause localization in microservice systems,” IEEE Transactions on Services Computing, pp. 1–12, 2024.
- [20] V. P. Dwivedi, A. T. Luu, T. Laurent, Y. Bengio, and X. Bresson, “Graph neural networks with learnable structural and positional representations,” in International Conference on Learning Representations, 2022.
- [21] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola, “Deep sets,” Advances in neural information processing systems, vol. 30, 2017.
- [22] J. Lee, Y. Lee, J. Kim, A. Kosiorek, S. Choi, and Y. W. Teh, “Set transformer: A framework for attention-based permutation-invariant neural networks,” in International conference on machine learning. PMLR, 2019, pp. 3744–3753.
- [23] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, “Graph attention networks,” in International Conference on Learning Representations, 2018.
- [24] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc., 2017.
- [25] L. McInnes, J. Healy, N. Saul, and L. Großberger, “Umap: Uniform manifold approximation and projection,” Journal of Open Source Software, vol. 3, no. 29, p. 861, 2018. [Online]. Available: https://doi.org/10.21105/joss.00861
- [26] S. Luo, H. Xu, C. Lu, K. Ye, G. Xu, L. Zhang, Y. Ding, J. He, and C. Xu, “Characterizing microservice dependency and performance: Alibaba trace analysis,” in Proceedings of the ACM Symposium on Cloud Computing, ser. SoCC ’21. New York, NY, USA: Association for Computing Machinery, 2021, p. 412–426.
- [27] Y. Zhang, Z. Guan, H. Qian, L. Xu, H. Liu, Q. Wen, L. Sun, J. Jiang, L. Fan, and M. Ke, “Cloudrca: A root cause analysis framework for cloud computing platforms,” in Proceedings of the 30th ACM International Conference on Information & Knowledge Management, ser. CIKM ’21. New York, NY, USA: Association for Computing Machinery, 2021, p. 4373–4382.