Machine Learning Approaches for Diagnostics and Prognostics of Industrial Systems Using Open Source Data from PHM Data Challenges: A Review

Hanqi Su, Jay Lee
Center for Industrial Artificial Intelligence, Department of Mechanical Engineering
University of Maryland, College Park, MD, USA
{hanqisu,leejay}@umd.edu
Corresponding author

Abstract

In the field of Prognostics and Health Management (PHM), recent years have witnessed a significant surge in the application of machine learning (ML). Despite this growth, the field lacks unified guidelines and systematic approaches for implementing these ML techniques effectively, as well as comprehensive analyses of industrial open-source data across varied scenarios. To address these gaps, this paper provides a comprehensive review of ML approaches for diagnostics and prognostics of industrial systems using open-source datasets from PHM Data Challenge Competitions held between 2018 and 2023 by the PHM Society and the IEEE Reliability Society, and summarizes a unified ML framework. This review systematically categorizes and scrutinizes the problems, challenges, methodologies, and advancements demonstrated in these competitions, highlighting the evolving role of both conventional machine learning and deep learning in tackling complex industrial tasks related to detection, diagnosis, assessment, and prognosis. Moreover, this paper delves into the common challenges in PHM data challenge competitions by emphasizing data-related and model-related issues and evaluating the limitations of these competitions. The potential solutions to address these challenges are also summarized. Finally, we identify key themes and potential directions for future research, providing opportunities and prospects for next-generation ML-PHM development in the PHM domain.

1 Introduction

In the era of Industry 4.0, the emphasis on the reliability, efficiency, and longevity of industrial systems has become crucial [57]. Prognostics and Health Management (PHM) integrates the detection, diagnosis, assessment, and prognosis of system failures to address the growing need for proactive system health management [132]. This integrative approach can enhance the reliability and safety of industrial systems, with minimal unplanned downtimes and reduced maintenance costs.

The rise of the Internet of Things [100], big data analytics [116], cyber-physical systems [59], machine learning (ML) [41], deep learning (DL) [93], and industrial artificial intelligence [87, 61] has paved the way for a transformative shift in PHM. Historically, PHM methods largely relied on physical-based methods, which derived their strength from profound insights into system physics, material properties, and failure mechanisms. These approaches, however, often struggled with scalability, adaptability, and the ability to handle the vast variability and uncertainties inherent in real-world operations. In contrast, ML techniques present the capability to model complex systems comprehensively, uncover intricate patterns, and accurately diagnose and predict failures. Yet, employing ML in PHM poses several challenges, including the critical need for usable, useful, high-quality, and extensive data sets, extensive computing resources, a solid infrastructure for data collection and processing, as well as the necessity for assembly of a team of professionals proficient in artificial intelligence (AI) and ML [60, 5]. Despite these challenges, the momentum for implementing ML in industrial settings is evident. With the gradual move towards digital transformation within the industry, incorporating ML into PHM aligns with the principles of Industry 4.0 [88]. It underscores the capability of data-driven approaches to provide precise and adaptable diagnostic and prognostic solutions and represents a strategic shift towards leveraging ML techniques for enhanced predictive maintenance.

1.1 A Survey of Machine Learning Based PHM Reviews

A multitude of comprehensive reviews on AI/ML/DL methods applied in the PHM domain have been published [96, 89, 56, 85, 128, 84, 88, 93, 37]. These reviews evaluate the existing literature both qualitatively and quantitatively, presenting their distinct perspectives and pinpointing the trends and new concepts of AI/ML/DL methods for PHM across various scenarios. [96] reviewed DL architectures such as the one-class neural network (OCNN), the self-organizing map (SOM), and generative techniques aligned with industrial needs from a predictive maintenance perspective. [89] highlighted seven DL architectures, including emerging DL methods such as graph neural networks, transformers, and generative adversarial networks, addressing four different challenges (imbalanced data, multimodal data fusion, compound fault types, and edge device implementation). Additionally, [56] provided a comprehensive examination of PHM methods in the context of smart factories, spanning from traditional ML approaches to DL-based approaches. An extensive review of modeling techniques supporting PHM of industrial equipment, specifically within the onshore wind energy and civil aviation sectors, was given by [128], who discuss how modeling approaches are shaped by industry-specific factors (maintenance strategies, implementation aspects, and supporting technologies). Furthermore, [84] proposed a general guideline under AI-based PHM for selecting appropriate techniques to solve specific PHM problems.

1.2 Motivation

The aforementioned review papers in section 1.1 serve as the groundwork for further PHM development. Currently, a significant portion of the literature primarily examines algorithms based on their capabilities and functionalities, with numerous reviews covering various ML or DL methods like CNN, RNN, GAN, GNN, Transformer, etc. However, many researchers tend to rely on artificial datasets for algorithm testing in their studies, rather than using real, industry-specific datasets. Additionally, the focus of discussed data sets predominantly lies on mechanical components commonly used in algorithm development, such as gears and bearings. Furthermore, a considerable portion of the data sets mentioned in these studies are not publicly accessible, and among the available ones, most date back more than six years (before 2018). In the PHM domain, organizations such as the PHM Society (https://phmsociety.org/) and the IEEE Reliability Society (https://rs.ieee.org/) have played pivotal roles in fostering innovation, research, and collaboration over the last fifteen years. It is worth noting that these organizations provide participants with different real open-source industrial datasets and pose real-world challenges by holding PHM data challenge competitions. These competitions seek to accelerate the development and validation of cutting-edge PHM methodologies, bridging the gap between academia and industry. Therefore, conducting in-depth analysis and review of industrial open source datasets is necessary to propel the evolution of data-centric techniques, ML, and AI within the PHM domain.

1.3 Contributions

This study aims to conduct a comprehensive problem-challenge-solution-application-oriented review using the industrial open source data made available over the last six years through PHM Data Challenges. In pursuit of this objective, 59 research papers were reviewed, encompassing both competition-winning contributions and subsequent exploratory research undertaken after the competitions. The detailed paper selection and investigation are discussed in section 2.1. Our contributions include the following:

1. This study summarizes the problems and solutions presented in nine PHM data challenge competitions, elucidating the tasks, challenges, ML or DL methods, and analytical strategies employed to tackle these competitions.

2. We propose a unified ML framework for the PHM domain based on this review study, serving as a general guideline for the development of future ML models.

3. We discuss common challenges associated with industrial open source data, underscoring specific data issues (missing data, data imbalance, and domain shift) and model issues (model selection, machine learning model interpretability, model robustness and generalization) in data-driven approaches. Possible solutions are also provided. Moreover, we evaluate the limitations of these competitions and suggest future directions.

4. We identify five further research directions for the application of ML in PHM. These include: (1) a need for open-source multi-modal datasets, (2) development of multi-modal machine learning approaches, (3) further exploration in machine learning model interpretability, (4) novel transfer learning and domain adaptation techniques development for model robustness and generalization, and (5) potential utilization of large language models and industrial large knowledge models.

Figure 1: Research Paper Selection Process Using PRISMA Method

The rest of this paper is organized as follows: Section 2 introduces the methodology used to select research papers, gives an overview of PHM data challenge competitions, and outlines the major research tasks within PHM. Section 3 introduces the prevalent challenges in two parts, detection & diagnosis and assessment & prognosis, details the solutions presented in each PHM data competition, and proposes a unified ML framework for the PHM domain. Section 4 critically summarizes and examines common challenges in the PHM domain from the perspective of ML methodologies, addressing data issues and model issues; the limitations of these competitions are also discussed. Section 5 provides five research directions for future PHM development. Section 6 concludes the paper, highlighting its findings and contributions.

2 Background

In this section, we first introduce the procedure for selecting research papers using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) method. Next, we provide an overview of the PHM data challenge competitions. Then, we outline the major research tasks in the PHM field and explain each of them.

2.1 Identification, Screening, and Inclusion of Studies

For paper selection and investigation, this review adheres to the guidelines outlined in the PRISMA statement [75]. As shown in Figure 1, the PRISMA flowchart illustrates a systematic way of selecting papers. Initially, the search keywords were structured around three key aspects: data sources, industrial systems, and PHM/AI-related issues, with a focus on PHM related research using industrial open-source data from recent PHM data challenges. The search scope was restricted to articles published between 2018 and 2023 across Google Scholar, IEEE Xplore, Web of Science, Scopus, and ScienceDirect. Then, the literature search was conducted on December 18, 2023, using the predefined keywords. Subsequently, the identified records from five databases were consolidated, and duplicates were removed. After that, based on the exclusion criteria provided in Table 1, reviewers further reviewed and evaluated the remaining articles and finally identified 59 representative papers for analysis, as presented in this survey.

Table 1: Exclusion Criteria for Screening Stage

Exclusion criteria
Full text is not available
The method of the paper is not based on machine learning techniques
The paper is not written in English
The research does not utilize industrial open source data for analysis

2.2 Overview of PHM Data Challenge Competitions

Table 2: Overview of PHM Data Challenge Competitions from 2018 to 2023

Competition	Industrial Systems	Research Task	No. of Papers
2018 PHM NA	Ion Mill Etching System	Detection & Diagnosis & Assessment & Prognosis	12
2019 PHM NA	Fatigue Crack	Assessment & Prognosis	4
2020 PHM EU	Filtration System	Prognosis	8
2021 PHM EU	Manufacturing Production Line	Detection & Diagnosis	7
2021 PHM NA	Turbofan Engine	Prognosis	8
2022 PHM EU	Printed Circuit Board	Detection & Diagnosis	6
2022 PHM NA	Rock Drill	Detection & Diagnosis	5
2023 PHM AP	Spacecraft Propulsion System	Detection & Diagnosis	5
2023 IEEE	Gearbox	Detection & Diagnosis	4

From 2018 to 2023, the PHM Society and the IEEE Reliability Society initiated nine PHM data challenge competitions, which showcase the challenges of analyzing industrial data across diverse industrial sectors. The topics of these competitions are extensive, encompassing different industrial systems such as Ion Mill Etching Tools, Filtration Systems, Manufacturing Production Lines, Turbofan Engines, Printed Circuit Boards, Rock Drills, Spacecraft Propulsion Systems, and Gearboxes, among others. Furthermore, the range of problems posed covers the main tasks in the PHM domain, including detection, diagnosis, assessment, and prognosis. These tasks align with PHM’s ultimate goal: to accurately evaluate and predict system health, degradation, and eventual failure, thereby improving system reliability, safety, and operational efficiency.

For ease of discussing different competitions, we have adopted a condensed naming convention for the PHM data challenge competitions: “YEAR ORGANIZATION” indicates the respective year and organizer of the data challenge. Among the challenges discussed, one is organized by the IEEE Reliability Society, while the remaining eight are organized by the PHM Society. The PHM Society conducts an annual conference in North America, an Asia-Pacific conference in odd years, and a European conference in even years. We use the abbreviations “PHM NA”, “PHM AP”, and “PHM EU” to represent the competitions held in North America, Asia-Pacific, and Europe, respectively. For instance, “2018 PHM NA” refers to the challenge held in North America by the PHM Society in 2018, while “2023 IEEE” represents the competition hosted by the IEEE Reliability Society in 2023. Table 2 presents the systems, research tasks, and the number of research papers discussed for the nine data competitions. For an in-depth overview of the PHM data challenge competitions and their associated datasets, please refer to the Appendix and Section 3.

2.3 Major Research Tasks within PHM

By reviewing 9 PHM data challenge competitions, we summarized four major research tasks: detection, diagnosis, assessment, and prognosis.

Detection: It refers to identifying the presence of a fault, anomaly, or abnormal condition in a system or component. This is typically the first step in PHM, where sensors and monitoring systems are used to detect deviations from normal operations that might indicate a problem.

Diagnosis: Upon detecting anomalies, the diagnostic phase delves deeper to determine failure types or failure modes and find out the root causes of the problem. In diagnostic scenarios, detected failures often need to be classified into specific failure types.

Assessment: In the assessment phase, the current operational status and performance of the system is evaluated. Leveraging either historical data or recent machinery behavior, this phase evaluates potential risks or assesses the health status of the system in its present condition.

Prognosis: Prognosis leverages both current and historical data to forecast the future health of a system or the residual life of a machine. Commonly, this is referred to as predicting the Remaining Useful Life (RUL). This phase provides estimates on potential system or machinery failure timelines, thereby facilitating proactive maintenance planning.
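To make the RUL target concrete, the following is a minimal sketch, assuming a run-to-failure record whose failure time is known after the fact. The function name `rul_labels` and the example timestamps are illustrative assumptions and do not come from any competition toolkit.

```python
def rul_labels(timestamps, failure_time):
    """Return the Remaining Useful Life at each timestamp.

    RUL is taken here as the time left until the recorded failure,
    clipped at zero for any timestamp at or after the failure.
    """
    return [max(failure_time - t, 0) for t in timestamps]


# Hypothetical run-to-failure record: the asset fails at t = 40.
timestamps = [0, 10, 20, 30, 40]
print(rul_labels(timestamps, failure_time=40))
```

Labels of this kind are what a prognosis model is trained to predict from the sensor data observed up to each timestamp.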

Table 3: Overview of Detection and Diagnosis Problems in PHM Data Challenge Competitions

	2021 PHM EU	2022 PHM EU	2022 PHM NA	2023 PHM AP	2023 IEEE
System	Manufacturing Production Line	Printed Circuit Boards	Rock Drill	Spacecraft Propulsion System	Gearbox
Failure Mode	8	1+1+2	10	3	4
Sensor Number	50	NA	3	7	1
Asset	NA	NA	8	4	NA
Operating Condition	2	1+1+1	1	1	2
Data Type	Time Series	Tabular	Pressure Signals	Pressure Signals	Vibration Signals
Volume of Data	Limited	Medium	Large	Limited	Large
Volume of Data	(70 Train, 29 Test)	(SPI:1924, AOI:1924)	(37229 Train, 16396 Test)	(177 Train, 46 Test)	(Total 50000)
Sampling Rate	0.1 Hz	NA	50 kHz	1 kHz	10 kHz
Sampling Interval	1-3 hours	NA	NA	1.2 s	5 minutes

3 Methodology & Analytics

In Section 3, we categorize the competitions based on the “Detection & Diagnosis” task and the “Assessment & Prognosis” task and delve deeper into problems, challenges, and ML method analysis in Section 3.1 and Section 3.2, respectively. In Section 3.3, we summarize a unified ML framework for the PHM domain.

3.1 Detection & Diagnosis

In this subsection, we sequentially introduce the fundamental competition information, including background, objectives, challenges, and datasets, highlighting innovative approaches for fault detection and diagnosis problems. A summarization of the competition problems and their datasets is encapsulated in Table 3. Furthermore, we provided a comprehensive summary of the methodologies utilized for detection and diagnosis problems, as outlined in Table 4.

3.1.1 2021 PHM EU (Manufacturing Production Line)

In a collaborative effort with the Swiss Centre for Electronics and Microtechnology (CSEM), 2021 PHM EU offered a dataset derived from a real-world industrial manufacturing line dedicated to testing electrical fuses. The objective of this competition is to perform fault identification and classification, root cause analysis, and system operation parameter identification. The dataset showed eight unique system failure modes under two distinct operating conditions. The primary challenge with the training data was its class imbalance, as the majority of samples represented healthy conditions. Additionally, the dataset was quite small, with only 70 training samples and 29 testing samples. In the fuse test bench dataset, about 10% of the data was missing, and it was not evenly distributed across variables.

Against this background, the winning solution was a combination of decision tree algorithms and a propagation system [23]. While the decision trees focused on the diagnosis task, the propagation system tackled chronology by incorporating a Kalman-style filter. To address data imbalance, the SMOTE (Synthetic Minority Oversampling Technique) method was utilized. Additionally, [45] incorporated the Leave One Feature Out Importance (LOFO-Importance) package for capturing essential features. Subsequently, they applied linear discriminant analysis (LDA) for dimensionality reduction. During the modeling phase, different ML methods were employed: gradient boosting algorithms such as Extreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM), an LDA classifier, and a Gaussian process classifier. For the gradient boosting algorithms in particular, they used Genetic Algorithms to optimize hyperparameters. A key observation was XGBoost’s superior performance over the other algorithms. Differing from their approach, [3] showcased a rule-based diagnostic technique, comparing its performance with that of decision trees and random forest methods. Additionally, [7] proposed a regularized LSTM for sifting through vital features and then leveraged an ensemble of binary LSTM classifiers for fault detection and classification.
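The core idea of SMOTE mentioned above can be sketched in a few lines. This is a simplified illustration of the interpolation step only, not the implementation used by any competition entry; the function name `smote`, the parameter defaults, and the toy data are assumptions.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features per sample.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6)
print(len(new_points))
```

Because every synthetic sample is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class, which is why the technique is usually applied to the training split only.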

Moreover, the XGBoost algorithm found favor not just in the aforementioned studies but also in three other research papers [4, 91, 112] outside the competition. These studies differed in emphasis: [112] relied on a feature importance ranking (FIR) method, targeting enhanced performance and simplification in complex industrial classification scenarios, whereas [91] focused on the challenge of missing value imputation, harnessing Partial Least Squares (PLS-MV). A significant contribution of Ramezani’s framework was its explainability, achieved by pinpointing key performance indicators for each fault family using the SHAP method [74].

3.1.2 2022 PHM EU (Printed Circuit Boards)

The 2022 European PHME Data Challenge, hosted in collaboration with Bitron Spa, focused on a classification problem within an actual industrial Printed Circuit Board (PCB) production line. The challenge comprised three tasks: (1) Task 1: predicting Automatic Optical Inspection (AOI) defect detection based on Solder Paste Inspection (SPI) data; (2) Task 2: predicting human-made visual inspection labels (OperatorLabel); (3) Task 3: predicting the human-assigned repair label (RepairLabel). Task 1 and Task 2 are binary classification problems, whereas Task 3 is a multi-class classification problem. This explains why, in Table 3, the “Failure Mode” is listed as “1+1+2”. Likewise, the “Operating Condition” is listed as “1+1+1”, which means each task has its own operating condition. Meanwhile, the SPI dataset (2022 PHM EU) had missing information in specific fields, and 95% of the SPI data was classified as healthy, highlighting a significant imbalance issue.

In 2022 PHM EU, Gaffet’s team took first place using an XGBoost method based on encoding and feature engineering [32]. Notably, they harnessed the SHAP method for model interpretation. Similarly, [108] leveraged two tree-based algorithms (LightGBM and XGBoost). Their approach centered on solving classification problems by extracting significant statistical data during the feature engineering phase. The importance of feature engineering was further emphasized by [110]. They introduced a novel statistical feature extraction method coupled with a PinNumber-based technique. This method aimed to compress pin-level data into component-level information. When predicting automatic inspection defects, they integrated a neural network model, factoring in feeding imbalance control to navigate data imbalance challenges. Additionally, a random forest model was developed for both human inspection and repair predictions. Apart from feature engineering, [95] applied a multi-layer perceptron (MLP) neural network for defect predictions in automated inspections. For human inspection outcomes, a random forest algorithm was their choice, while decision trees were favored for predicting human repair labels.
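Compressing pin-level data into component-level information, as described above, can be illustrated with a simple group-and-summarize step. The sketch below is a generic aggregation and not the method of [110]; the field names `component` and `volume` and the chosen statistics are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def component_features(pin_rows):
    """Aggregate pin-level measurements into one feature dict per component."""
    grouped = defaultdict(list)
    for row in pin_rows:
        grouped[row["component"]].append(row["volume"])
    return {
        comp: {
            "n_pins": len(values),
            "mean": mean(values),
            "std": pstdev(values),  # population standard deviation
            "min": min(values),
            "max": max(values),
        }
        for comp, values in grouped.items()
    }


# Hypothetical pin-level rows: one measurement per solder pad.
pin_rows = [
    {"component": "C1", "volume": 90.0},
    {"component": "C1", "volume": 110.0},
    {"component": "R7", "volume": 100.0},
]
features = component_features(pin_rows)
print(features["C1"]["mean"], features["C1"]["std"])
```

Each component then contributes a single fixed-length row to the classifier, regardless of how many pins it has.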

Outside of competition, Mirzaei’s team delved deep into the challenge of imbalanced data. They proposed a data-level technique that integrated recursive feature elimination (RFE) for feature selection and oversampling methods. This ensured balanced representation for minority classes, enhancing the performance of multiple ML algorithms, from decision trees and random forests to SVMs and 1dCNNs [80]. Additionally, [62] introduced a new quality management paradigm termed Stream-of-Quality (SoQ) tailored for multi-stage manufacturing processes. By leveraging this dataset, they showcased the effectiveness of their methodology, offering promising avenues for refining industrial AI algorithms and methodologies in a systematic manner.

3.1.3 2022 PHM NA (Rock Drill)

Rock drills play a pivotal role in sectors like mining, tunneling, and construction. Due to the potential economic and human costs from work interruptions caused by rock drill faults, ensuring accurate fault diagnosis is necessary. 2022 PHM NA aimed to address the fault classification problem under various product configurations. The main challenge is the domain-shift problem, caused by data being collected from different rock drill machines. The dataset encompassed training data from five rock drill machines, validation data from another machine, and test data from two additional machines, highlighting potential domain shift issues between training, validation, and test data. This leads to heterogeneous signal distributions, which can negatively impact classification accuracy. The dataset [48] encompasses ten distinct fault modes and one health mode, with training, validation, and testing data sizes of 34,045, 3,184, and 16,396, respectively.

In the competition, [86] won first place. They deployed a hybrid strategy, combining data-driven techniques with various signal-processing methods. Their approach harnessed domain adaptation, metric learning, and pseudo-label-based deep learning to construct an ensemble DL model for comprehensive fault classification. For samples that posed challenges for DL, they deployed signal processing methods like Dynamic Time Warping (DTW) and cross-correlation, and used SVM for supervised learning. The runner-up solution introduced a data-cropping technique, employing a convolutional neural network (CNN) as a feature extractor to bridge data length discrepancies. [53] addressed the domain-shift issue through a domain-adaptation-based scheme that used a domain adversarial learning neural network to extract domain-invariant features, maximum mean discrepancy (MMD) minimization to bridge the distribution discrepancy, and a soft voting ensemble to reduce model uncertainty. Moreover, Minami’s team proposed an ensemble approach, appending specialized models onto a baseline model. This ensemble incorporated domain adaptation strategies to accommodate domain fluctuations. The baseline model employed conventional ML algorithms like SVM, Random Forest, and XGBoost for the full multi-class classification, whereas the specialized sub-models, leveraging CNN for feature extraction and classification, concentrated on binary classifications targeting specific classes on which the baseline model performed poorly [79].
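The MMD term used for bridging distribution discrepancy can be made concrete with a small numerical sketch. The code below computes a biased estimate of the squared MMD with a Gaussian kernel between two sets of feature vectors; the kernel choice, the bandwidth, and the toy "machine" features are assumptions for illustration, not details taken from [53]. In a domain adaptation model, a quantity of this kind would typically be added to the training loss and minimized.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-math.dist(a, b) ** 2 / (2.0 * bandwidth ** 2))


def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sample sets."""
    def mean_kernel(xs, ys):
        total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (
        mean_kernel(source, source)
        + mean_kernel(target, target)
        - 2.0 * mean_kernel(source, target)
    )


# Hypothetical features extracted from two different machines.
machine_a = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]]
machine_b = [[2.0, 2.1], [2.2, 2.0], [2.1, 2.2]]
print(mmd_squared(machine_a, machine_a))  # identical sets
print(mmd_squared(machine_a, machine_b))  # shifted distribution
```

A value near zero indicates that the two feature distributions are hard to tell apart under the chosen kernel, while a large value signals a domain shift.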

Table 4: Overview of Detection and Diagnosis Methodologies in PHM Data Challenge Competitions

Methodology	Deep Learning	Conventional Machine Learning	Feature Engineering
2021 PHM EU	Regularized LSTM; Ensemble LSTM	XGBoost; Decision Trees; Random Forest; LDA; Rule-based Method; Gaussian Process	FIR; PLS-MV; FCM; SMOTE
2022 PHM EU	1D-CNN	Decision Trees; Random Forest; XGBoost; LightGBM; SVM; MLP	RFE; Statistical Feature Extraction
2022 PHM NA	Domain Adaption; DANN; X-Vectors; Metric Learning; Pseudo Label Technique; Ensemble Learning	XGBoost; SVM; Deep Forest Algorithm	MMD; RFE; DTW
2023 PHM AP		Similarity-based Method; K-means clustering; KNN; Decision Trees; XGBoost; Rule-based method; Ensemble Learning	Physical Feature Extraction; DTW
2023 IEEE	ROCKET; LSTM-FCN; 1D-CNN with ResNet; Deep Residual Network; Residual-based CNN; Ensemble Learning		Data Augmentation; Data Regularization; STFT

Outside the competition, [69] deployed an end-to-end fault classification framework derived from X-Vectors [101], achieving integrated optimization of both feature extraction and classification phases. [109] proposed a novel deep forest algorithm that fused multi-grained scanning for feature extraction and a cascade forest for layered predictions to perform failure mode classification.

3.1.4 2023 PHM AP (Spacecraft Propulsion System)

The Japan Aerospace Exploration Agency (JAXA) initiated a competition centered on the advancement of PHM technology for spacecraft propulsion systems. The primary objective was to accurately diagnose various states ranging from normal conditions to bubble anomalies, solenoid valve faults, and unknown faults. Simulation data was sourced from four distinct spacecraft, resulting in a dataset comprising 177 training samples and 46 test samples [114]. The paucity of data posed a challenge for data-driven approaches.

In the competition, the champions [78] introduced a novel two-step approach. Initially, a similarity-based model was proposed for the categorization of data into four distinct states. Subsequently, for data corresponding to the solenoid valve fault, a model incorporating physics-inspired features was employed to pinpoint the fault location and estimate the valve opening ratio. They also deployed DTW [11] on the training dataset, which was instrumental in quantifying the variability across various segments of the sensor data. In second place, [65] devised a hybrid approach combining an XGBoost-based method and a rule-based method. While the XGBoost-based approach primarily addressed comprehensive fault classification, the rule-based method was employed to formulate the solenoid valve opening ratio equation. This was accomplished through polynomial fitting, rooted in intrinsic physical characteristics, enabling a precise estimation of the solenoid valve opening ratio.

Additionally, [51] utilized the K-NN algorithm to classify faults and pinpoint the location of anomalies. Prior to the classification, normal and anomalous data were separated using a similarity-based approach. Their estimate of the valve opening ratio hinged upon the similarity of time series waveforms. Moreover, [2] employed an ensemble framework, integrating K-means clustering and decision trees. This model, enriched by domain-specific expertise, exhibited good precision in both anomaly detection and fault diagnosis. The various approaches in 2023 PHM AP underscore the importance of similarity-based methods and the extraction of physical features when the dataset size is small.
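A similarity-based classifier of the kind highlighted above can be sketched by pairing DTW with a nearest-neighbour rule. The following is a minimal illustration under stated assumptions: one-dimensional waveforms, an absolute-difference local cost, a single nearest neighbour, and made-up reference waveforms and labels. It is not the pipeline of any of the cited entries.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences.

    Uses the classic dynamic-programming recurrence with an
    absolute-difference local cost and no warping-window constraint.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # repeat element of b
                cost[i][j - 1],      # repeat element of a
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]


def nearest_label(query, references):
    """Return the label of the reference waveform closest to `query` under DTW."""
    return min(references, key=lambda item: dtw_distance(query, item[0]))[1]


# Hypothetical labelled reference waveforms.
references = [
    ([0.0, 0.0, 1.0, 1.0, 0.0], "normal"),
    ([0.0, 2.0, 2.0, 2.0, 0.0], "valve_fault"),
]
# A time-stretched copy of the first waveform.
query = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0]
print(nearest_label(query, references))
```

Because DTW aligns sequences of different lengths and timing, this kind of rule needs no training and remains usable when only a few hundred labelled samples are available.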

Table 5: Overview of Assessment and Prognosis Problems in PHM Data Challenge Competitions

	2018 PHM NA	2019 PHM NA	2020 PHM EU	2021 PHM NA
System	Ion Mill Etching System	Fatigue Crack	Filtration System	Turbofan Engine
Failure Mode	3	1	1	7
Sensor Number	5	2	3	14
Asset	20	8	NA	NA
Operating Condition	Multiple	Variable Loading Conditions	12	4
Data Type	Time Series	Time Series	Time Series	Time Series
Volume of Data	Limited Samples	Limited Samples	Limited Samples	Limited Samples
Volume of Data	(20 Train, 5 Test)	(74 Train, 36 Test)	(24 Train, 8 Validation, 16 Test)	(90 Train Units, 38 Test Units)
Sampling Rate	0.25 Hz	5 Hz	10 Hz	1 Hz
Sampling Interval	Above 70 million seconds	14000-100774 cycles	200-350 s	1-3 hours, 3-5 hours, Above 5 hours

3.1.5 2023 IEEE (Gearbox)

The objective of 2023 IEEE is to develop ML-based models that can efficiently detect faults in the planetary gearboxes of industrial machinery using vibration signals. The dataset covers four prevalent sun gear faults: surface wear, chipped teeth, cracks, and missing teeth. Vibration signals have been recorded for a duration of five minutes each, with a sampling rate of 10 kHz, under two distinct operational conditions. Given the substantial dataset, comprising 50,000 samples, the deployment of DL techniques is feasible.

In 2023 IEEE, [66] utilized ensemble-based CNN techniques to address fault detection via time-series vibration data. Their findings underscored the robust performance of integrating three convolution kernel-based methods, namely the ROCKET (RandOm Convolutional KErnel Transform) method [24], a one-dimensional convolutional neural network (1dCNN) integrated with ResNet, and a fusion of LSTM and Fully Convolutional Network (FCN), in delivering classification results for multivariate time-series data. Similarly, both [97] and [55] designed residual-based CNN models to address fault classification challenges. On the one hand, [97] incorporated the Short-Time Fourier Transform (STFT) [44], transforming time domain signals into time-frequency domain signals using Fourier analysis, a useful tool for interpreting time-evolving signals. On the other hand, [55] leveraged data augmentation and regularization techniques, enabling model construction with fewer parameters without sacrificing performance quality. Additionally, [102] harnessed a tree classifier to select valuable features from raw vibration signals, subsequently developing a sequential neural network model tailored for the concurrent detection of multiple gear faults.
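The STFT slides a window along the signal and takes a Fourier transform of each windowed segment, producing one spectrum per time frame. The sketch below is a deliberately naive, dependency-free version for illustration; the Hann window, window size, hop length, and the synthetic 8 Hz tone sampled at 64 Hz are all assumptions and are unrelated to the settings used in [97].

```python
import cmath
import math


def stft_magnitude(signal, window_size, hop):
    """Return STFT magnitudes as a list of frames.

    Each frame holds the DFT magnitudes for bins 0..window_size // 2
    of one Hann-windowed segment of the signal.
    """
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / window_size)
        for n in range(window_size)
    ]
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(window_size)]
        spectrum = []
        for k in range(window_size // 2 + 1):
            coeff = sum(
                segment[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                for n in range(window_size)
            )
            spectrum.append(abs(coeff))
        frames.append(spectrum)
    return frames


fs = 64  # hypothetical sampling rate in Hz
signal = [math.sin(2 * math.pi * 8 * t / fs) for t in range(128)]
frames = stft_magnitude(signal, window_size=32, hop=16)
peak_bin = max(range(len(frames[0])), key=frames[0].__getitem__)
print(peak_bin * fs / 32)  # dominant frequency of the first frame, in Hz
```

The resulting frame-by-bin matrix can be treated as an image-like input, which is one reason time-frequency representations pair naturally with CNN-based classifiers. In practice an FFT-based routine would replace the nested loops.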

3.2 Assessment & Prognosis

Similar to the structure of subsection 3.1, we detail specifics of each competition, including the problems, challenges, and data-driven approaches applied. We also provide a consolidated summary of both the problems with their associated datasets and the proposed solutions in Table 5 and Table 6, respectively.

3.2.1 2018 PHM NA (Ion Mill Etching System)

The 2018 PHM NA emphasized the analysis of fault behavior within the ion mill etch tool, tasking participants to develop a model from the sensor-derived time series data capable of accurately detecting, diagnosing, and prognosticating the time-to-failure for three principal failure modes—Flowcool leak (F1), Flowcool Pressure Too High Check Flowcool Pump (F2), and Flowcool Pressure Dropped Below Limit (F3). Additionally, the Ion Mill Etching (IME) dataset faced an imbalance issue, where the faulty data is much less than normal operation data.

For this dataset, [125, 42, 129, 38] each proposed strategies based on random forest algorithms for early degradation mode detection and diagnosis. Extending the exploration of ML techniques, [99] evaluated an assortment of models, including Generalized Linear Models, MLP, Multivariate Adaptive Regression Splines, Support Vector Regression (SVR), and random forest. Among these, the random forest model delivered the best performance. Meanwhile, [131] applied a knowledge distillation approach to fault detection across various modes, aiming to improve the detection of infrequent but critical faults.

Recent advancements in RUL prediction [52] have predominantly hinged on the application of DL algorithms to analyze complex multivariate time-series data. [125, 42, 38] all utilized Long Short-Term Memory (LSTM) networks and related variants, such as Gated Recurrent Units (GRU) and LSTM-based Metric Regression (LSTM-MR), to capture important features from the raw data. [39] expanded upon this approach by integrating a Temporal Convolutional Network (TCN) with LSTM and attention mechanisms, which facilitated refined feature extraction from sensor data for accurate RUL prediction. Moreover, Liu et al. proposed a two-stage deep transfer learning framework for accurate RUL prediction [70]. In the first stage, the model leveraged a TCN for initial temporal feature learning, followed by domain adversarial learning for data alignment based on one fault mode. In the second stage, the first-stage model was fine-tuned on data from the other fault modes to handle multiple fault modes and enhance RUL prediction performance. Distinct from these methods, [129] explored transformer networks, focusing solely on data from abnormal operation phases, while [72] provided a comprehensive comparison of state-of-the-art methods, including TCN, LSTM, attention-based mechanisms, CNN, and Transformer. Finally, [99] introduced an approach using Dynamic Time Warping (DTW) for RUL estimation, leveraging a library of truncated degradation curves and health score models to refine the final RUL predictions.
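To make the DTW-based idea concrete, the sketch below implements the classic dynamic-programming DTW distance and uses it to pick the closest curve from a small library of degradation curves. The matching rule in `rul_from_library` (take the remaining length of the best-matching curve) and the toy curves are hypothetical simplifications for illustration only; they are not the procedure of [99], whose details are not reproduced here.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]

def rul_from_library(query, library):
    """Hypothetical rule: RUL = remaining length of the best-matching curve."""
    best = min(library,
               key=lambda curve: dtw_distance(query, curve[:len(query)]))
    return max(len(best) - len(query), 0)

# Toy library of run-to-failure health curves (assumed values).
library = [[0, 1, 2, 3, 4, 5], [0, 2, 4, 6]]
estimated_rul = rul_from_library([0, 2, 4], library)
```

Because DTW tolerates local stretching of the time axis, two units that degrade along the same path at slightly different speeds can still be matched.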

3.2.2 2019 PHM NA (Fatigue Crack)

2019 PHM NA focused on the task of fatigue crack length estimation and prediction within aluminum structures at different points. The challenge harnessed wave signal data collected by piezoelectric sensors subjected to both static and dynamic tensile stresses. Given that the test dataset was limited to wave signals from several initial loading cycles, participants were challenged to estimate crack lengths where signal data were present and predict future crack growth for specified cycles lacking signal data. The complexity of the problem is that data-driven methodologies were viable when signals existed, whereas scenarios lacking wave data required exploration into physics-based strategies. Therefore, solutions derived from this challenge have predominantly employed a hybrid approach of data-driven and physics-based techniques.

Regarding the estimation of fatigue crack length with accessible wave signals, a variety of data-driven models and feature engineering strategies emerged, both within and beyond the competition’s scope. [50] proposed a neural network architecture that relied on features manually derived from raw signals, such as the Pearson correlation coefficient, phase shifts, energy, and information entropy, to train models for accurate crack length estimation. Similarly, [54] began with signal preprocessing, utilizing techniques like band-pass filtering and phase alignment to mitigate noise and uncertainty before applying physically insightful feature extraction methods. Thereafter, they harnessed a random forest algorithm, optimized through feature selection and grid search for hyperparameter tuning, to estimate crack lengths. [127] likewise leveraged a band-pass filter for feature extraction from raw wave signals, subsequently constructing an SVR model with hyperparameters optimized via grid search. Rao and collaborators, meanwhile, designed an ensemble learning regression model to improve estimation performance with four extracted features (root mean square value, correlation coefficient, first peak value, and the logarithm of kurtosis) [92].

Predictive modeling for scenarios lacking wave signal data required a pivot towards physics-based techniques, often in conjunction with data-driven insights, to forecast crack progression. [50] devised a Particle Filter (PF) strategy, integrating the Paris Law and outputs from the previously developed neural network as observational inputs to refine and update the crack propagation pathway. [54] suggested an ensemble prognostics framework under consistent loading conditions, constructing a probability density function (PDF) for each instance and then computing weights from each PDF to output the final crack length prediction. When different loading conditions prevailed, Walker’s equation was utilized to forecast crack lengths. [127] introduced a trans-fitting approach, aimed at extracting the crack growth trend from training data and extrapolating it to the test data predictions. Additionally, Rao’s group advanced a variant of Paris’ Law, aiming to elucidate the correlation between crack progression and the number of loading cycles [92].
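For readers unfamiliar with the physics-based component, the sketch below integrates the standard Paris law, da/dN = C (ΔK)^m with ΔK = Y Δσ sqrt(π a), by forward Euler and compares it with the closed-form solution that holds for m ≠ 2 and constant geometry factor Y. All numerical values (C, m, Δσ, initial crack length) are illustrative assumptions, not parameters from the 2019 PHM NA dataset, and the sketch does not include the particle filter or the Paris law variant used by the cited teams.

```python
import math

def paris_growth(a0, n_cycles, C, m, delta_sigma, Y=1.0, step=100):
    """Forward-Euler integration of da/dN = C * dK**m, dK = Y*dS*sqrt(pi*a)."""
    a = a0
    done = 0
    while done < n_cycles:
        dn = min(step, n_cycles - done)           # cycles in this Euler block
        delta_k = Y * delta_sigma * math.sqrt(math.pi * a)
        a += C * delta_k ** m * dn
        done += dn
    return a

def paris_closed_form(a0, n_cycles, C, m, delta_sigma, Y=1.0):
    """Closed-form crack length, valid for m != 2 and constant Y."""
    p = 1.0 - m / 2.0
    k = C * (Y * delta_sigma * math.sqrt(math.pi)) ** m
    return (a0 ** p + p * k * n_cycles) ** (1.0 / p)

# Illustrative values only: a in metres, stress range in MPa.
a_num = paris_growth(1e-3, 100_000, C=1e-11, m=3, delta_sigma=100.0)
a_ref = paris_closed_form(1e-3, 100_000, C=1e-11, m=3, delta_sigma=100.0)
```

In a hybrid scheme of the kind described above, a data-driven crack length estimate would supply the starting value `a0` (or observations for a filter), and the physics model would extrapolate to cycles without wave signals.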

3.2.3 2020 PHM EU (Filtration System)

Table 6: Overview of Assessment and Prognosis Methodologies in PHM Data Challenge Competitions

| Competition | Deep Learning | Conventional Machine Learning | Feature Engineering |
|---|---|---|---|
| 2018 PHM NA | Transfer Learning; LSTM, LSTM-MR, GRU; Transformer; CNN; TCN, TCN-LSTM, TCN-DANN | Random Forest, Gradient Boosting; Logistic Regression; Generalized Linear Model; SVR, MLP; MASR | DTW; SVR-RFE; Degradation Curve Library |
| 2019 PHM NA | | Random Forest, Gaussian Process; SVR; Neural Network; Ensemble Learning; Linear Regression | Grid Search; Genetic Algorithm; Paris’s Law; Walker’s Equation; Particle Filter |
| 2020 PHM EU | Neural Turing Machine; Transfer Ensemble Learning; LSTM, Bi-LSTM, TEL-Bi-LSTM; Autoencoder-Regression Network; Deep CNN | Random Forest, Gradient Boosting; Gaussian Process; Kernel Regression; SVR; Ensemble Learning | Simple Statistics Method; Linear SVM Coefficient; Correlation Metric; RFE, Health Index; Monotonicity Test |
| 2021 PHM NA | Deep CNN, FCN, VGG; ResNet (Residual Block); GoogLeNet (Inception Module) | Random Forest, XGBoost; Extreme Random Forest; ANN | PCA; XAI, SHAP, LIME |

In the domain of industrial maintenance, filtration systems are crucial for processing pollutants from industrial equipment and ensuring seamless system operation. A predominant challenge encountered within these systems is filter clogging, a phenomenon where accumulated pollutants impede flow rates and thereby disrupt standard industrial workflows. To address this issue, 2020 PHM EU concentrated on predicting the RUL of filtration systems. RUL, in this context, is defined as the time until the pressure differential across the filter breaches a threshold of 20 psi. The challenge provided the PHME20 dataset, collected from a controlled experimental setup designed to simulate filter clogging at various contamination levels. Twelve distinct conditions were established from two operational parameters: solid ratio (%) and particle size (µm). Additionally, the 2020 PHM EU dataset exhibited a domain shift problem, with training data featuring small and large particle sizes, while the test data included medium particle sizes.
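The RUL definition above translates directly into a labeling rule for run-to-failure traces. The following sketch assumes a trace given as parallel lists of timestamps and differential pressures; the function name `rul_labels`, the strict `>` comparison used for "breaches", and the sample values are assumptions rather than details of the PHME20 dataset.

```python
def rul_labels(times, diff_pressure, threshold=20.0):
    """RUL (in the unit of `times`) for each sample up to the first crossing.

    Returns None when the trace never exceeds the threshold, i.e. the run
    was truncated before failure and the true RUL is unknown.
    """
    fail_time = None
    for t, p in zip(times, diff_pressure):
        if p > threshold:          # first sample above 20 psi
            fail_time = t
            break
    if fail_time is None:
        return None
    return [fail_time - t for t in times if t <= fail_time]

# Toy trace (assumed values): time in seconds, pressure drop in psi.
labels = rul_labels([0, 10, 20, 30, 40], [2.0, 8.0, 15.0, 21.0, 30.0])
```

Samples recorded after the crossing are dropped here, since the filter is already considered failed at that point.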

The champion solution [71] employed a hybrid approach, combining kernel regression with fundamental statistical methodologies. Meanwhile, the runner-up [10] adopted a different combination of feature engineering and ML methods. Their process began with the extraction of features through a rolling window technique and proceeded with feature selection informed by the coefficients of a linear-kernel Support Vector Machine (SVM), recursive feature elimination (RFE), the correlation matrix, and monotonicity testing. A four-layer sequential neural network was their model of choice, supported by K-fold cross-validation throughout the training and validation stages. Capturing the third position, [47] conducted a comparative study of tree-based algorithms and a Bayesian approach. Specifically, random forest, gradient boosting, and Gaussian process regression were utilized to estimate the RUL of the filtration system, supplemented by a fault-based RUL assignment that integrated “Piecewise RUL Assignment” and “Linear RUL Assignment”.

In contrast to these feature engineering and conventional ML strategies, several researchers explored DL methods. [120] introduced a CNN-based DL methodology for RUL prediction, marked by two architectural innovations: a Parameterized Fully Connected Layer that adjusts network weights in response to shifts in operational parameters, and a multi-head predictor tailored to distinct stages of the degradation process. Additionally, [113] presented a transfer ensemble learning (TEL) framework, leveraging metric learning alongside a domain dissimilarity metric and Kullback–Leibler (KL) divergence to enhance model generalization from source to target domains. This TEL framework, combined with a bidirectional long short-term memory (Bi-LSTM) algorithm and termed TEL-Bi-LSTM, was proposed for RUL estimation under different operating conditions. In another approach, [46] proposed a joint autoencoder-regression network, a deep neural architecture that fused a CNN autoencoder with an LSTM network regressor in an end-to-end training paradigm, with genetic algorithms used to optimize its hyperparameters. Lee’s team developed a distinct strategy by first establishing a health assessment criterion [64]. They defined a Health Index (HI) for the filter system and utilized K-means clustering to categorize the system’s health stages. Subsequent HI predictions were made with the Bi-LSTM algorithm, from which the system’s RUL was determined. Lastly, Falcon et al. introduced a sequence modeling technique based on the Neural Turing Machine (NTM) [30]. Conceptualized as a computational architecture, the NTM leverages available data to interact with an external memory component, an approach that facilitates more accurate predictions. This model stands out for producing more precise predictions when benchmarked against the prevalent LSTM-based solutions.

3.2.4 2021 PHM NA (Turbofan Engine)

2021 PHM NA was primarily focused on the prediction of RUL for turbofan engines [13], specifically under four distinct flight conditions and seven failure modes. Participants were required to create predictive models leveraging the N-CMAPSS dataset, aiming to accurately forecast RUL using complex condition monitoring data. The dataset, a collection consisting of 90 synthetic run-to-failure trajectories for training and an additional 38 truncated datasets for testing, served as a comprehensive foundation for developing robust predictive algorithms to predict RUL accurately.

During the competition, the winning team, led by Lovberg, put forward an approach leveraging a deep convolutional neural network (DCNN) [73]. This network was distinguished by its use of dilated convolutions complemented by gated linear unit activations and residual skip connections. These techniques were designed to expand the network’s receptive field and enhance flexibility, reducing the complexity of the architecture by using fewer parameters while maintaining comparable performance. Additionally, they adopted a strategic sequence sampling method, discarding less informative samples while retaining enough degradation signals for the network’s input. Solis-Martin et al. and DeVol’s team pursued advancements in DCNNs as well. The former developed a two-level DCNN system in which the first-level DCNN extracted important features from raw data and the second-level DCNN leveraged the output of the first level to estimate the RUL [103]. The latter drew upon established DL architectures, deploying the basic blocks or modules from the VGG, GoogLeNet, and ResNet designs in their DCNN framework. This enabled a comparative analysis of model performance across a variety of well-known architectural features [25, 26].
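The effect of dilation on the receptive field can be checked with simple arithmetic. For a stack of stride-1 one-dimensional convolutions with kernel size k and dilations d_1, ..., d_L, the receptive field is 1 + (k - 1) * (d_1 + ... + d_L). The sketch below evaluates this formula; the kernel size and dilation schedule are illustrative assumptions, not the configuration reported in [73].

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (time steps) of stacked stride-1 dilated 1-D convs."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d   # each layer widens the field by (k-1)*d
    return rf

# Four layers with kernel size 3: same parameter count, different coverage.
dilated = receptive_field(3, [1, 2, 4, 8])
plain = receptive_field(3, [1, 1, 1, 1])
```

With these assumed settings, the dilated stack covers 31 time steps against 9 for the undilated stack, which is why dilation can enlarge temporal context without adding parameters.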

Outside of the competition, [19] addressed concerns regarding the uncertainty and poor interpretability of deep learning models by integrating principal component analysis (PCA) to refine time-domain feature sets and subsequently applying four supervised learning techniques (artificial neural network, random forest, extreme random forest, and XGBoost) to estimate RUL. Their use of a custom loss function in conjunction with traditional ANN models achieved the best results in both the Area Under the Receiver Operating Characteristic curve (AUROC) and the Area Under the Precision-Recall curve (AUPR). Moreover, the PHM domain has witnessed an upsurge in the application of XAI techniques to enhance the interpretability and trustworthiness of ML models. Various methods, including LIME, SHAP, Layer-wise Relevance Propagation (LRP), Image-Specific Class Saliency Maps, and Gradient-weighted Class Activation Mapping (Grad-CAM), have been reported in recent literature [18, 20, 104]. These techniques are useful in elucidating the decision-making processes of complex models.

3.3 Comprehensive Summarization of Data-Driven Approaches in Recent PHM Data Challenge Competitions

Following an in-depth review of data competitions from the last six years, Figure 2 presents the publication count and the frequency of particular ML approaches within each competition. Furthermore, Figure 3 illustrates the density of specific ML or DL approaches discussed in this paper. Note that we count the occurrences of distinct ML methods mentioned in each research paper. Because some articles employ multiple ML methods, the aggregate count of methods exceeds the total number of publications.

Moreover, we have summarized a unified ML framework that consolidates the ML approaches used in PHM data competitions during this period. As shown in Figure 4, it encompasses five primary components: Data Collection, Data Processing, Data Visualization, Conventional Machine Learning & Deep Learning, and Model Interpretability.

Data Collection: Data collection is the foundation of these open-source data challenge competitions (PHM Society and IEEE Reliability Society). Companies across various industries contribute datasets to encourage the PHM community to develop innovative solutions to real-world challenges. Typically, these industrial datasets encompass a diverse range of data types, including but not limited to time-series data, tabular data, images, sensor readings, and simulation data.

Data Processing: An important stage before ML model development is data processing, which is crucial for enhancing data quality and ensuring effective model training [17, 111, 21, 35]. This stage can be further subdivided into two aspects: data preprocessing and feature engineering. Data preprocessing addresses raw data challenges, typically including missing data, noise, outliers, data imbalance, and scaling issues. Techniques like imputation [29], denoising [23], outlier detection [77], resampling [16], and normalization & standardization [58, 34] are commonly applied. Feature engineering follows, refining data representation post-preprocessing. This step often uncovers hidden insights and deepens understanding of data, therefore significantly enhancing model performance and predictive capabilities in the later stage [98].
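As one small example of the preprocessing step, the sketch below standardizes a feature using statistics estimated on the training split only and then applies the same transform to the test split, which avoids leaking test information into the scaler. The helper names and sample values are assumptions for illustration; libraries such as scikit-learn provide equivalent transformers.

```python
import statistics

def fit_standardizer(train_values):
    """Estimate mean and (population) standard deviation on training data."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    if std == 0:
        std = 1.0   # constant feature: avoid division by zero
    return mean, std

def standardize(values, mean, std):
    """Apply z-score scaling with previously fitted statistics."""
    return [(v - mean) / std for v in values]

# Toy sensor feature (assumed values).
train = [10.0, 12.0, 14.0, 16.0, 18.0]
test = [14.0, 20.0]
mean, std = fit_standardizer(train)
train_z = standardize(train, mean, std)
test_z = standardize(test, mean, std)   # test reuses the training statistics
```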

Data Visualization: Data visualization is another important aspect of the ML process. It involves transforming data into intuitive graphical representations, such as graphs or charts. Researchers can use these charts to observe trends, patterns, or outliers in the data, which in turn supports a better understanding of the data and yields useful insights. In the PHM domain, effective visualization supports initial data exploration and analysis and accelerates data processing and ML model development [12, 15].

Conventional Machine Learning & Deep Learning Techniques: After data processing, various conventional ML and DL methods are developed to solve the competition problems using useful and usable data, as shown in Figure 4. This segment includes “model training and testing” and “model prediction and classification”. During the training and testing phase, the ML/DL techniques applied are diverse, including Tree-Based Methods, CNNs, RNNs and their variants, Transformers, DANN, SVMs, Unsupervised Learning, Ensemble Learning, Similarity-Based Methods, Rule-Based Methods, Transfer Learning, and Domain Adaptation [96, 89]. Once the ML models are well trained and tested, they can be deployed in various PHM applications, which include detecting abnormal performance or faults, classifying different faults, assessing the current health state of systems, or predicting the RUL of components.

Model Interpretability: Beyond the pursuit of accuracy in prediction and classification problems, model interpretability is becoming increasingly important. Model interpretability refers to interpreting ML model outputs [81], which helps humans understand the “why” behind a model’s predictions, facilitating collaboration and more informed decision making. Explainable artificial intelligence (XAI) methods such as Local Interpretable Model-Agnostic Explanations (LIME) [94] and SHapley Additive exPlanations (SHAP) [74] have been utilized in the PHM data challenge competitions to explain model outputs [104].

4 Challenges and Possible Solutions

In this section, we summarize common challenges regarding data-related issues and model-related issues and analyze relevant solutions applied to solve these challenges, shown in Table 7. Data-related challenges encompass issues such as missing data, data imbalance, and domain shift while model-related challenges include the critical aspects of model selection, interpretability of ML models, and their robustness and generalization capabilities. These challenges highlight the complexities of developing effective ML methods to solve various PHM problems.

Moreover, the limitations of current PHM competitions reveal a gap in adopting systematic approaches for building effective PHM systems, and a need for multi-modal machine learning analysis. We also suggest further steps, aiming to accelerate the development of next-generation ML-driven PHM solutions.

Table 7: Summary of Common Challenges Regarding Data-related and Model-related Issues

| Issue Type | Common Challenges | Potential Solutions | Related Competitions |
|---|---|---|---|
| Data Issue | Missing Data | (1) Listwise or Pairwise Deletion; (2) Imputation (LOCF, PLS-MV); (3) Other | 2021 PHM EU, 2022 PHM EU |
| Data Issue | Data Imbalance | (1) Resampling (Oversampling or Downsampling); (2) Synthetic Data Generation (SMOTE and Variants); (3) Transfer Learning Techniques | 2018 PHM NA, 2021 PHM EU, 2022 PHM EU |
| Data Issue | Domain Shift | (1) Transfer Learning Techniques; (2) Domain Adaptation Techniques; (3) DANN | 2020 PHM EU, 2022 PHM NA |
| Model Issue | Model Selection | (1) Need to Consider Volume and Quality of Data; (2) A Trade-off Between Computational Cost and Performance | All Competitions |
| Model Issue | Model Interpretability | (1) Explainable Artificial Intelligence (XAI) methods like LIME, SHAP, etc. | No Competition Required |
| Model Issue | Model Robustness & Generalization | (1) Data Augmentation; (2) Regularization Techniques; (3) Ensemble Methods; (4) Transfer Learning & Domain Adaptation | All Competitions |

4.1 Data-related Issues

4.1.1 Missing Data

Handling missing data is necessary in ML, especially in the context of PHM, as incomplete data can significantly degrade model performance and prediction accuracy. The strategies to address missing data in PHM have evolved, encompassing a range of techniques from basic deletion to advanced imputation methods. A straightforward strategy is the deletion of data points with missing values, such as listwise or pairwise deletion. For example, in the 2022 SPI dataset, instances with missing values in crucial identifiers like “Panel_ID”, “Figure_ID”, and “Component_ID” were eliminated [108]. However, this approach can lead to the loss of valuable information. Regarding imputation, several methods were used in 2021 PHM EU. [23] applied the Last Observation Carried Forward (LOCF) method, coupled with backward filling, to address the gaps, while [91] introduced PLS-MV, a partial least squares-based method for imputing missing values. Interpolation was another technique used to estimate missing values, ensuring that cases with absent data did not skew the results. In addition to the imputation methods highlighted in the competitions, there remains scope for exploring others in future research, such as mean, median, and multiple imputation, K-Nearest Neighbors (KNN), regression imputation, maximum likelihood estimation, and ML-based approaches [43].
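The combination of LOCF with backward filling can be sketched in a few lines. The example below is a plain-Python illustration on a single sensor channel, with `None` marking missing readings; the function name and sample values are assumptions, and the code is not taken from [23]. With pandas, the same effect is typically obtained by chaining `ffill()` and `bfill()` on a Series or DataFrame.

```python
def locf_then_backfill(values):
    """Fill gaps with the last observation, then back-fill leading gaps."""
    filled = list(values)
    last = None
    for i, v in enumerate(filled):             # forward pass (LOCF)
        if v is None:
            filled[i] = last
        else:
            last = v
    nxt = None
    for i in range(len(filled) - 1, -1, -1):   # backward pass for the head
        if filled[i] is None:
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled

# Toy channel with gaps at the start, middle, and end (assumed values).
channel = [None, None, 1.0, None, 4.0, None]
imputed = locf_then_backfill(channel)
```

The backward pass only affects readings that precede the first valid observation, since LOCF has nothing to carry forward there.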

4.1.2 Data Imbalance

Data imbalance arises when the distribution of classes in a dataset is uneven. This issue is evident in PHM because failure events are rare in comparison to data representing normal conditions. Data imbalance can significantly undermine the performance of ML models, particularly in classification tasks. When trained on imbalanced data, models may become biased towards the majority class (normal operation) and may not effectively recognize the minority class (failure). This leads to poor performance in predicting failures, which can result in ineffective maintenance planning and unexpected downtime.

To counteract data imbalance in PHM, various strategies have been adopted. Resampling techniques are commonly used to balance the class distribution, either by oversampling the minority class or by downsampling the majority class. Synthetic data generation is another approach, as demonstrated by a team in 2021 PHM EU that used the Synthetic Minority Over-sampling Technique (SMOTE) to create synthetic samples of the minority class [23]. Advanced algorithms also play a crucial role in addressing data imbalance. For instance, Tang’s team in 2022 PHM EU used the FIR to control the imbalance ratio in each mini-batch during neural network training [110]. Liu et al. employed transfer learning and domain adaptation techniques in 2018 PHM NA to facilitate knowledge transfer across different fault modes, addressing the issue of insufficient data for specific faults [70]. Moreover, during the 2018 PHM NA competition, many teams utilized Random Forest, an ensemble learning method that constructs a forest of decision trees and is known for its proficiency in handling imbalanced data.
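The core step of SMOTE is interpolation between a minority sample and one of its nearest minority neighbours. The sketch below shows that step in plain Python under simplifying assumptions: the function name `smote_like`, the neighbourhood size, the fixed random seed, and the toy points are all illustrative, and the code is not the implementation used in [23]. Maintained implementations are available in dedicated packages such as imbalanced-learn.

```python
import math
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        others = sorted((p for p in minority if p is not base),
                        key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()   # position on the segment base -> neighbour
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic

# Toy minority class (assumed): four faulty samples in a 2-D feature space.
faulty = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(faulty, n_new=10)
```

Every synthetic point lies on a line segment between two existing minority samples, so the method densifies the minority region rather than duplicating records.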

Beyond the techniques showcased in the competitions, new DL approaches are proposed, such as the semi-supervised information maximizing generative adversarial network [124], the integration of deep residual networks with auxiliary classifier generative adversarial networks [14], the combination of DL with SMOTE [22], etc. Furthermore, a standardized experimental framework is proposed by Aguiar’s team in order to evaluate 24 state-of-the-art data stream algorithms across 515 imbalanced data streams [1]. Going forward, there is a need for the exploration of additional DL-based strategies to enhance the handling of data imbalance in the PHM domain.

4.1.3 Domain Shift

Domain shift refers to changes in the data distribution between the training phase (source domain) and the real-world application phase (target domain) of ML models. This phenomenon frequently occurs when models, initially trained on data from a specific set of machines or under certain conditions, are subsequently applied to different machines or varied operating conditions. Such a shift can markedly affect the performance and reliability of ML models in PHM, so tackling the domain shift problem is essential for maintaining the robustness and effectiveness of PHM systems. Transfer learning has emerged as a primary solution to domain shift challenges, as it adapts models trained in one domain to be effective in another through fine-tuning and domain adaptation techniques. For instance, Kim et al. utilized domain adversarial neural networks (DANN) along with the minimization of maximum mean discrepancy (MMD) to tackle domain discrepancies in 2022 PHM NA [53]. Meanwhile, Oh et al. employed the deep CORAL method, calculating the CORAL loss to diminish domain discrepancy effects by aligning the covariance of the six training domains with that of the test domain [86]. Moreover, Tian’s team developed a TEL framework, facilitating knowledge transfer from the source to the target domain in 2020 PHM EU [113]. Additionally, robust modeling approaches and the ability to continuously update models with new data are also important for mitigating the impact of domain shift.
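The covariance alignment idea behind CORAL can be written down compactly. In its commonly cited form, the CORAL loss between a source and a target feature batch with d features is the squared Frobenius norm of the difference of their covariance matrices divided by 4d². The sketch below computes this quantity for two small batches in plain Python; it covers a single source-target pair only, and the toy features are assumptions, so it should not be read as the multi-domain setup of [86].

```python
def covariance(rows):
    """Sample covariance matrix (d x d) of a list of d-dimensional rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]

def coral_loss(source, target):
    """Squared Frobenius distance between covariances, scaled by 1/(4 d^2)."""
    d = len(source[0])
    cs, ct = covariance(source), covariance(target)
    frob_sq = sum((cs[i][j] - ct[i][j]) ** 2
                  for i in range(d) for j in range(d))
    return frob_sq / (4 * d * d)

# Toy feature batches (assumed values).
source = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [3.0, 1.0]]
shifted = [[x + 5.0, y + 5.0] for x, y in source]   # mean shift only
scaled = [[2.0 * x, 2.0 * y] for x, y in source]    # second-order change
loss_shifted = coral_loss(source, shifted)
loss_scaled = coral_loss(source, scaled)
```

A pure mean shift leaves the loss at zero because covariances ignore the mean, whereas rescaling the features changes the second-order statistics and is penalized. In a deep model this term would be added to the task loss on intermediate features.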

4.2 Model-related Issues

4.2.1 Model Selection: Conventional Machine Learning or Deep Learning?

In PHM data challenge competitions, participants often grapple with the difficult decision of whether to use conventional ML algorithms or advanced DL models. This choice is influenced by various factors: the volume and quality of available data, and a trade-off between computational cost and performance. An analysis of research papers from the past six years reveals insightful trends and preferences in model selection:

(1) The volume and quality of training data play a significant role in determining the choice of modeling approach. Research teams often choose DL for building data-driven models when training data are abundant. Conversely, in scenarios with limited training data, conventional ML methods, augmented by feature engineering, are more commonly employed.

This trend is clear when addressing classification problems within PHM competitions. As depicted in Table 4 and Figure 2, the 2022 PHM NA and 2023 IEEE competitions, because of their large datasets, facilitated the application of DL techniques. In 2023 IEEE, various deep CNN models were proposed, including ensemble-based CNNs [66] and residual-based CNNs [97, 55]. Additionally, in 2022 PHM NA, many teams integrated diverse DL approaches with transfer learning, such as metric learning + pseudo label-based DL [86], CNN + DANN [53], and X-Vectors [69]. However, the scenario differed for the 2022 PHM EU competition, which provided approximately 2000 samples and encountered data imbalance challenges. Given these constraints, many teams leaned towards tree-based methods known for their robustness and ability to address data imbalance, deploying algorithms like XGBoost [32, 108], LightGBM [108], decision trees [80], and random forest [110, 80]. In competitions with significantly fewer data samples, such as 2021 PHM EU and 2023 PHM AP, where the dataset sizes were around 100-200 samples, the deployment of DL was impractical due to its requirement for a large volume of data. Instead, conventional ML methods proved more effective. In 2021 PHM EU, tree-based methods were predominantly used [23, 45, 4], while in 2023 PHM AP, similarity-based [78, 51], rule-based [65], KNN/K-means [51, 2], and tree-based methods [65, 2] were proposed, emphasizing the adaptability of traditional methods to limited data scenarios.

In RUL prediction tasks, the long time-series nature of the datasets allows for different approaches to data utilization and data augmentation, even when the datasets are small (fewer than 100 samples). By treating each time point or sequence of time points (data window) as an independent training sample, the effective size of the dataset can be substantially increased. This is evident from Table 6 and Figure 2, which show both DL and conventional ML methods applied to RUL prediction across various competitions, including CNN-based [120, 103, 25], RNN-based [125, 113], transfer learning [70, 113], autoencoder-based [46], and transformer-based [129, 72] methods, as well as conventional ML combined with physics-based approaches [54, 127].
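The windowing idea described above can be sketched as follows. One run-to-failure series is sliced into overlapping windows, and each window is labeled with the RUL at its last time step. The function name, window length, stride, and the toy series are assumptions for illustration.

```python
def make_windows(series, rul_at_end=0, window=4, stride=1):
    """Slice one run into (window, RUL) pairs.

    `rul_at_end` is the RUL at the final sample of `series` (0 for a
    complete run-to-failure trajectory).
    """
    samples = []
    for end in range(window, len(series) + 1, stride):
        rul = rul_at_end + (len(series) - end)   # steps left after the window
        samples.append((series[end - window:end], rul))
    return samples

# One toy run of 10 time steps yields 7 training samples.
run = [0.1, 0.2, 0.2, 0.3, 0.5, 0.6, 0.8, 1.1, 1.5, 2.0]
pairs = make_windows(run)
```

Windows from the same run overlap and are therefore correlated, so splitting training and validation data by run rather than by window is usually the safer choice.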

(2) When selecting DL models, there is often a trade-off between computational cost and performance. In the competitions, participants may choose highly complex DL models that require substantial computational resources to achieve even a small improvement in accuracy. While this approach might secure a higher position on the leaderboard, it may not be the most practical choice for real-world implementation. In practice, the value of such a small accuracy improvement needs to be weighed against the increased computational cost and potential scalability issues.

(3) Once datasets become publicly accessible, there is a clear shift over time towards the adoption of more sophisticated DL approaches. Taking the 2018 PHM NA competition as a case study, our review of 12 published papers utilizing this dataset reveals that the methodologies initially favored by the competition teams largely comprised tree-based methods, SVMs, and basic LSTM algorithms [42, 99]. Over time, however, there has been a discernible trend towards the development of more complex DL algorithms. These include, but are not limited to, GRU [125], knowledge distillation [131], a two-stage deep transfer learning framework utilizing TCN and DANN [70], TCN combined with LSTM [39], attention mechanisms [39, 72], and transformers [129, 72]. This trend underscores a growing need for continuous innovation and refinement of DL methods in the PHM domain.

4.2.2 Machine Learning Model Interpretability

In PHM data challenge competitions, accuracy in prediction and classification is still the primary goal. However, with the advance of ML and DL, model interpretability is becoming as crucial as accuracy, because it helps to reveal the model’s decision-making process and plays an important role in error analysis and further model refinement in PHM.

While only a few teams in competitions have used model interpretability methods to explain the output of their models, various techniques have been developed and applied outside of competitions to enhance ML model interpretability [68, 6]. Because of the complexity of deep neural network models, which are often referred to as “black boxes”, some previous research utilized interpretable methods like Layer-wise Relevance Propagation (LRP), Gradient-weighted Class Activation Mapping in CNNs, and attention mechanisms in sequential models to shed light on model decision-making processes [104]. Additionally, model-agnostic methods like SHAP [74, 107] and LIME [94] have been instrumental in offering insights into model behavior. For instance, Baptista et al. applied SHAP to evaluate the outcomes of three different algorithms (Linear Regression, MLP, and Echo State Network) using the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset (jet engines) [9]. Moreover, Moradi et al. introduced an interpretable artificial neural network designed for the automatic selection and fusion of features to develop optimal health indicators from data gathered through structural health monitoring (SHM) [82]. Furthermore, in an era increasingly focused on ethical and responsible AI, transparent and interpretable models are key not only to enhancing the technical aspects of AI solutions in PHM but also to ensuring their successful integration and acceptance in real-world applications [119].

4.2.3 Model Robustness and Generalization

Model robustness is defined as the ability of a model to maintain its performance when facing diverse challenges like noise, outliers, and adversarial examples. Techniques such as data augmentation, where training data are expanded by creating modified versions of existing data or synthesizing new data, have become commonplace. For instance, in the 2023 IEEE competition, Kreuzer’s team employed methods like additive white Gaussian noise, circular shift, and random amplitude scaling to increase the volume of training data [55]. Moreover, regularization techniques like L1 and L2 regularization have been instrumental in preventing overfitting and enhancing stability, making the model less sensitive to small fluctuations in input data. Furthermore, ensemble methods have been increasingly recognized for their contribution to model robustness in PHM. Our analysis of recent studies shows that out of 59 research papers, nine utilized various ensemble techniques, including tree-based methods, ensemble LSTM [7], soft voting ensemble [53], transfer ensemble learning [113], ensemble regression [92], and ensemble CNN-based [66] approaches, across six different PHM data challenge competitions. Additionally, adversarial training, which involves training models on both regular and adversarial data, has been recognized for its potential to fortify models against adversarial attacks [124, 53, 89]. Finally, it is important to consider probabilistic machine learning techniques, such as Bayesian networks, Gaussian processes, and probabilistic graphical models, which incorporate probability theory into the modeling process. They are crucial for dealing with uncertainty and improving the robustness of PHM models [33, 83, 36].
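The three augmentations named above (additive white Gaussian noise, circular shift, and random amplitude scaling) are simple to express on a one-dimensional signal. The sketch below is a plain-Python illustration; the function names, noise level, shift amount, scaling range, and seed are assumptions and do not reproduce the settings of [55].

```python
import random

def add_gaussian_noise(x, sigma, rng):
    """Add zero-mean white Gaussian noise with standard deviation sigma."""
    return [v + rng.gauss(0.0, sigma) for v in x]

def circular_shift(x, k):
    """Rotate the signal right by k samples, wrapping around the end."""
    k %= len(x)
    return x[-k:] + x[:-k] if k else list(x)

def amplitude_scale(x, low, high, rng):
    """Multiply the whole signal by one random factor drawn from [low, high]."""
    s = rng.uniform(low, high)
    return [v * s for v in x]

rng = random.Random(0)               # fixed seed for reproducibility
signal = [1.0, 2.0, 3.0, 4.0]
augmented = [
    add_gaussian_noise(signal, 0.05, rng),
    circular_shift(signal, 1),
    amplitude_scale(signal, 0.8, 1.2, rng),
]
```

Each transform keeps the signal length and the class label unchanged, which is what allows the augmented copies to be appended to the training set directly.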

Alongside robustness, model generalization refers to developing models that perform reliably on new, unseen data. One technique for improving generalization is cross-validation, which divides the training data into several subsets and, in each round, uses one subset as the validation data and the remaining subsets as the training data. In the 2021 PHM NA competition, researchers opted for k-fold repeated random subsampling validation to address a limitation of standard k-fold cross-validation, wherein the size of the validation set diminishes as the number of folds (k) increases [103]. In addition to cross-validation, transfer learning and domain adaptation methods have been crucial in maintaining model effectiveness and generalization when data from the target domain differ from the source domain. The specific implementations in PHM competitions are detailed in the earlier subsubsection titled "Domain Shift". Outside of PHM data challenges, several novel approaches have been proposed to deal with model generalization issues. Russell and Wang adopted a domain adversarial transfer learning method inspired by generative adversarial networks, utilizing a 1D CNN architecture to predict tool wear on unseen domains using NASA’s milling dataset [122]. Ding et al. developed a multi-source domain generalization learning approach (GRU + Transformer) that learns useful degradation feature representations from various run-to-failure datasets of internal combustion engine journal bearings across different conditions and predicts well under unseen working conditions [27]. Furthermore, Ding et al. proposed an adversarial out-domain augmentation (AOA) framework for predicting the RUL of bearings under unseen conditions. The effectiveness of this AOA-based RUL prediction was validated using the IEEE PHM Challenge 2012 and XJTU-SY run-to-failure datasets, illustrating its robustness in domain generalization for predictive maintenance [28].
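The validation-set-size limitation mentioned above can be illustrated by comparing index splits from standard k-fold cross-validation with those from repeated random subsampling. This is a generic sketch rather than the procedure of [103]; the function names, the 100-sample dataset, and the 20% validation fraction are assumptions.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_idx, val_idx); every sample is validated exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def repeated_random_subsampling(n_samples, n_repeats, val_fraction, seed=0):
    """Yield (train_idx, val_idx); validation size does not depend on n_repeats."""
    rng = random.Random(seed)
    n_val = max(1, round(n_samples * val_fraction))
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[n_val:], idx[:n_val]


# Validation-set size of the first split for a growing number of rounds
kfold_val_sizes = {
    k: len(next(k_fold_splits(100, k))[1]) for k in (5, 10, 20)
}
subsample_val_sizes = {
    r: len(next(repeated_random_subsampling(100, r, 0.2))[1]) for r in (5, 10, 20)
}
```

With 100 samples, the k-fold validation set shrinks from 20 to 10 to 5 samples as k goes from 5 to 10 to 20, whereas random subsampling keeps 20 validation samples regardless of the number of rounds. The trade-off is that random subsampling no longer guarantees that every sample is validated exactly once.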

4.3 Limitations of Current PHM Competitions and Opportunities

4.3.1 A Lack of Multi-Modal Machine Learning Analysis

Our review of PHM data challenge competitions reveals a reliance on single-modality data, such as pressure signals, currents, vibrations, or images, without multi-modal datasets for fault diagnosis and prognosis. Multi-modal machine learning (MMML) in PHM refers to capturing complementary information from multiple data sources of different types to achieve a more comprehensive and precise evaluation of PHM tasks [90, 115]. Despite its potential, MMML is still underexplored in the PHM domain.

Recently, Jiang et al. leveraged data from two modalities (vibration and current signals) to develop deep belief networks (DBNs) for diagnosing wind turbine gearbox faults [49]. Su et al. proposed an MMML model using parametric specifications, text descriptions, and images of vehicles to predict five vehicle rating scores [106]. Additionally, Fan et al. evaluated several MMML strategies to create a comprehensive PHM system for coolant pumps in commercial heavy-duty vehicles, utilizing data from onboard signals, multi-dimensional histograms, and categorical variables [31]. Wang et al. proposed a novel method for feature fusion in multimodal data (vibration and torque signals) and applied it to diagnose bearing faults [121]. For future PHM data challenges, the provision of open-source multi-modal datasets would empower researchers to investigate and apply more advanced MMML techniques, potentially leading to advancements in PHM.

4.3.2 A Lack of Adopting Systematic Approaches for Effective PHM Systems Construction

Our analysis of ML and DL methods across nine open-source industrial datasets has revealed the advantages of ML methods applied in PHM. Nevertheless, the nature of the competition tends to prioritize solutions that chiefly enhance accuracy, potentially at the expense of a systematic approach, reusability, and methodological inheritance. It is therefore vital to pursue systematic methodologies for constructing effective PHM systems that go beyond the competitive framework. This should involve thorough research and analysis of open-source datasets to advance ML and DL strategies, aiming not just for competition success but also for benchmarking and comparative analysis.

Souza et al. devised an ML-based, data-oriented pipeline for constructing a Prognosis and Health Management System (PHMS) focused on RUL prediction, utilizing semi-supervised ML with an autoencoder, XGBoost, and the SHAP method [105]. As shown in Figure 5, Hu et al. offered a new perspective on reviewing PHM efforts by dividing the PHM lifecycle into DEsign, DEvelopment, and DEcision ($DE^{3}$) phases and showcasing the important activities and challenges within these stages [40]. Additionally, Lee et al. introduced a novel SoQ methodology for multi-stage manufacturing processes, which helps to analyze multi-parameter influences on product quality and to model inter-process relationships in multi-stage manufacturing systems [62]. Moving forward, the development of novel, systematic approaches for PHM systems should be encouraged in future PHM data challenge competitions. These efforts can augment the systematization and applicability of ML and DL approaches, thereby expanding their utility beyond the confines of the competition-centric paradigm. Such advancements promise to narrow the divide between academic research and industrial practice, facilitating the broader adoption of data-driven ML across various industrial contexts.

5 Prospects

There are still some research directions that deserve deeper investigation and exploration by the research community going forward.

(1) A Need for Open-Source Multi-Modal Datasets. In the PHM domain, the availability of multi-modal datasets is notably limited. While prior research has leveraged diverse modal information, including vibration, current, or torque signals, for diagnosing issues in wind turbine gearboxes, bearings, or other industrial products [49, 121, 31], these datasets are all private. This restriction hampers the broad-based development of MMML approaches. To overcome this challenge, the PHM community needs to collaboratively establish and maintain multi-modal industrial datasets enriched with high-quality data and labels. This would involve the collection, alignment, and annotation of multi-modal data with PHM-centric attributes. Moreover, providing latent representations or pre-trained embeddings, where possible, can accelerate the training of new MMML models and facilitate knowledge transfer across various PHM tasks, ultimately benefiting the whole PHM community.

(2) Development of Multi-Modal Machine Learning Approaches. Furthermore, the investigation of a broader range of MMML techniques is highly encouraged. On the one hand, for time series data of a single modality, it may be feasible to extract features representative of different modalities from the time series itself, such as the time domain (original signal), the frequency domain (FFT), PSD, and STFT. This approach could lead to the preliminary training of unimodal ML models on each single modality, followed by the exploration of MMML strategies. On the other hand, MMML methodologies necessitate more effective representation learning and information alignment techniques: the former concerns the efficient encoding of single-modality data, and the latter focuses on the enhanced analysis and fusion of multimodal information for effective PHM prediction or classification tasks. Although simple concatenation is a common method for data fusion in MMML, emerging fusion techniques, such as attention-based or transformer-based mechanisms [117], deserve further exploration. These advanced methods have the potential to effectively capture implicit feature alignments across modalities and facilitate cross-modal synthesis [76, 126]. Yet MMML remains underexplored within the PHM field, which presents a significant research opportunity.
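As a rough illustration of deriving several views from one time series and fusing them by simple concatenation, consider the sketch below. It relies on assumptions throughout: the function names, the chosen summary features, the rectangular (unwindowed) frames used in place of a full STFT, and the periodogram-style power scaling are illustrative choices, and the naive DFT stands in for an FFT only to keep the example dependency-free.

```python
import cmath
import math


def dft_magnitude(x):
    """One-sided magnitude spectrum via a naive O(n^2) DFT."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def time_features(x):
    """Time-domain view: mean, RMS, and peak amplitude."""
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    return [mean, rms, peak]


def frequency_features(x, fs):
    """Frequency-domain view: dominant frequency (Hz) and non-DC power."""
    mag = dft_magnitude(x)
    n = len(x)
    k_peak = max(range(1, len(mag)), key=lambda k: mag[k])  # skip the DC bin
    # Periodogram-style power per bin; the scaling convention is an assumption
    power = [m * m / (fs * n) for m in mag]
    return [k_peak * fs / n, sum(power[1:])]


def stft_features(x, fs, win, hop):
    """Time-frequency view: dominant frequency of each rectangular frame."""
    return [
        frequency_features(x[s:s + win], fs)[0]
        for s in range(0, len(x) - win + 1, hop)
    ]


def fuse_by_concatenation(*views):
    """Simple fusion baseline: concatenate the per-view feature vectors."""
    return [v for view in views for v in view]


fs = 400.0  # assumed sampling rate in Hz
signal = [math.sin(2 * math.pi * 50.0 * t / fs) for t in range(128)]
fused = fuse_by_concatenation(
    time_features(signal),
    frequency_features(signal, fs),
    stft_features(signal, fs, win=32, hop=32),
)
```

In line with the text, each view could first feed its own unimodal model, with concatenation serving as the baseline against which attention-based or transformer-based fusion is compared.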

(3) Further Exploration in Machine Learning Model Interpretability. The adoption of DL in PHM has underscored the need for models that are not only high-performing but also interpretable. Techniques such as advanced data visualization and XAI methods are emerging as key tools in explaining the outputs of ML models [68, 6]. These methods are anticipated to provide industries and academia with clearer insights into the decision-making processes of PHM models, thereby building trust and facilitating more informed decision-making [104]. However, current XAI methods predominantly address the interpretability of models using tabular data, text, and images as inputs, leaving a gap in methods tailored for time series data. Moreover, most of the current interpretability methods are applied to unimodal ML models, and the interpretability of MMML models has not been explored. Addressing these gaps can help to balance the performance and interpretability of ML models.

(4) Development of Novel Transfer Learning and Domain Adaptation Techniques for Model Robustness and Generalization. Alongside interpretability, the robustness and generalization of ML models are also important. Novel approaches in transfer learning and domain adaptation can be further developed to ensure that models are resilient to data variability and operational uncertainties and capable of adapting to new, unseen scenarios [8]. Currently, research on transfer learning in the PHM domain predominantly addresses fault diagnosis, with only a few studies exploring prognosis. Looking forward, how to harness transfer learning for prediction problems remains a critical open issue. Additionally, cross-modal transfer learning (CMTL) is emerging as a critical area of interest in PHM, aiming to improve knowledge transfer between distinct domains. Moreover, the difficulty of collecting a sufficiently large labeled dataset is a significant barrier in practical applications. The development of unsupervised and semi-supervised transfer learning techniques may help to address this issue.

(5) Potential Utilization of Large Language Models (LLMs) and Industrial Large Knowledge Models (ILKMs). Recent advancements in large language model technologies have shown remarkable abilities in natural language processing and related tasks, hinting at the potential for general artificial intelligence applications [130]. Leveraging these cutting-edge technologies could bring significant changes to the PHM domain. Yang et al. introduced a novel benchmark dataset focused on Question Answering (QA) in the industrial domain and proposed a new model interaction paradigm aimed at enhancing the performance of LLMs in domain-specific QA tasks [123]. This approach signifies a substantial stride in customizing LLMs for more specialized, industry-oriented applications. Meanwhile, Li’s team systematically reviewed the current progress and key components of ChatGPT-like large-scale foundation (LSF) models and provided a comprehensive guide on adapting these models to meet the specific needs of PHM, underscoring the challenges and opportunities for future development [67]. Moreover, as shown in Figure 6, Lee’s team proposed an Industrial Large Knowledge Model (ILKM) framework that aims to solve complex challenges in intelligent manufacturing by combining LLMs and domain-specific knowledge [63]. Therefore, integrating specialized domain knowledge with LLM technology presents a good opportunity to develop more effective ML models, potentially leading to better solutions for challenges in PHM.

6 Conclusion

In summary, ML is gradually becoming a cornerstone of PHM, reflecting its potential for innovative advancements in future PHM development. This paper serves as a resource for both academic and industry professionals in the PHM domain, offering a unified ML framework for PHM and a comprehensive overview of current state-of-the-art ML approaches for diagnostics and prognostics of industrial systems using industrial open-source data from recent PHM data challenges. Based on two primary research task categories, "Detection & Diagnosis" and "Assessment & Prognosis", we provide a detailed explanation of the problems, tasks, challenges, and relevant ML approaches for each competition. Furthermore, we summarize common challenges, including data-related and model-related issues, and analyze the solutions to address these challenges. Moreover, we evaluate the limitations of these PHM data challenge competitions and suggest directions that future competitions could focus on. Finally, we outline five potential research directions in the application of data-driven ML within PHM: a need for open-source multi-modal datasets, development of MMML approaches, further exploration of ML model interpretability, improvement of the robustness and generalization of ML models, and utilization of LLMs and ILKMs.

Nomenclature

$AI$	Artificial Intelligence
$Bi-LSTM$	Bidirectional Long Short-Term Memory
$CNN$	Convolutional Neural Network
$DANN$	Domain Adversarial Neural Networks
$DL$	Deep Learning
$DTW$	Dynamic Time Warping
$FCM$	Fuzzy C-Means
$FCN$	Fully Convolutional Network
$FIR$	Feature Importance Ranking
$GAN$	Generative Adversarial Networks
$GRU$	Gated Recurrent Units
$ILKM$	Industrial Large Knowledge Model
$KNN$	K-Nearest Neighbors
$LDA$	Linear Discriminant Analysis
$LightGBM$	Light Gradient Boosting Machine
$LIME$	Local Interpretable Model-Agnostic Explanations
$LKL$	Large Knowledge Library
$LLM$	Large Language Model
$LSTM$	Long Short-Term Memory
$ML$	Machine Learning
$MLP$	Multi-Layer Perceptron
$MMD$	Maximum Mean Discrepancy
$PCA$	Principal Component Analysis
$PDF$	Probability Density Function
$PHM$	Prognostics and Health Management
$PLS$	Partial Least Squares
$PSD$	Power Spectral Density
$QA$	Question Answering
$RNN$	Recurrent Neural Network
$RUL$	Remaining Useful Life
$SHAP$	SHapley Additive exPlanations
$SMOTE$	Synthetic Minority Oversampling TEchnique
$SoQ$	Stream-of-Quality
$STFT$	Short-Time Fourier Transform
$SVM$	Support Vector Machine
$SVR$	Support Vector Regression
$TCN$	Temporal Convolutional Network
$XAI$	Explainable Artificial Intelligence
$XGBoost$	Extreme Gradient Boosting

References

[1] G. Aguiar, B. Krawczyk, and A. Cano. A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework. Machine learning, pages 1–79, 2023.
[2] O. K. Aimiyekagbon, A. Lowen, A. Bender, L. Muth, and W. Sextro. Expert-informed hierarchical diagnostics of multiple fault modes of a spacecraft propulsion system. In PHM Society Asia-Pacific Conference, volume 4, 2023.
[3] O. K. Aimiyekagbon, L. Muth, M. Wohlleben, A. Bender, and W. Sextro. Rule-based diagnostics of a production line. In PHM Society European Conference, volume 6, pages 10–10, 2021.
[4] M. G. Alfarizi, J. Vatn, and S. Yin. An extreme gradient boosting aided fault diagnosis approach: A case study of fuse test bench. IEEE Transactions on Artificial Intelligence, 2022.
[5] S. Amershi, A. Begel, C. Bird, R. DeLine, H. Gall, E. Kamar, N. Nagappan, B. Nushi, and T. Zimmermann. Software engineering for machine learning: A case study. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), pages 291–300. IEEE, 2019.
[6] A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-López, D. Molina, R. Benjamins, et al. Explainable artificial intelligence (xai): Concepts, taxonomies, opportunities and challenges toward responsible ai. Information fusion, 58:82–115, 2020.
[7] G. Aydemir, A. Avcı, M. Kocakulak, and T. Bekiryazıcı. Ensemble of lstm networks for fault detection, classification, and root cause identification in quality control line. In PHM Society European Conference, volume 6, pages 6–6, 2021.
[8] M. S. Azari, F. Flammini, S. Santini, and M. Caporuscio. A systematic literature review on transfer learning for predictive maintenance in industry 4.0. IEEE access, 11:12887–12910, 2023.
[9] M. L. Baptista, K. Goebel, and E. M. Henriques. Relation between prognostics predictor evaluation metrics and local interpretability shap values. Artificial Intelligence, 306:103667, 2022.
[10] H. Beirami, D. Calzà, A. Cimatti, M. Islam, M. Roveri, and P. Svaizer. A data-driven approach for rul prediction of an experimental filtration system. In PHM Society European Conference, volume 5, 2020.
[11] D. J. Berndt and J. Clifford. Using dynamic time warping to find patterns in time series. In Proceedings of the 3rd international conference on knowledge discovery and data mining, pages 359–370, 1994.
[12] R. Carley, S. Fuller, W. Bond, P. Jones, D. Allen, A. Jordan, and T. Falls. Data analytics and visualization application for asset health monitoring. In Annual Conference of the PHM Society, volume 14, 2022.
[13] M. A. Chao, C. Kulkarni, K. Goebel, and O. Fink. Phm society data challenge 2021, 2021.
[14] J. Chen, C. Lin, J. Cui, and H. Ge. An fault diagnostic method based on drn-acgan for data imbalance. In 2022 Prognostics and Health Management Conference (PHM-2022 London), pages 97–102. IEEE, 2022.
[15] X. Cheng, J. K. Chaw, K. M. Goh, T. T. Ting, S. Sahrani, M. N. Ahmad, R. Abdul Kadir, and M. C. Ang. Systematic literature review on visual analytics of predictive maintenance in the manufacturing industry. Sensors, 22(17):6321, 2022.
[16] S. Cicak and U. Avci. Handling imbalanced data in predictive maintenance: A resampling-based approach. In 2023 5th International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), pages 1–6. IEEE, 2023.
[17] S. Cofre-Martel, E. Lopez Droguett, and M. Modarres. Big machinery data preprocessing methodology for data-driven models in prognostics and health management. Sensors, 21(20):6841, 2021.
[18] J. Cohen, E. Byon, and X. Huan. To trust or not: Towards efficient uncertainty quantification for stochastic shapley explanations. In PHM Society Asia-Pacific Conference, volume 4, 2023.
[19] J. Cohen, X. Huan, and J. Ni. Fault prognosis of turbofan engines: Eventual failure prediction and remaining useful life estimation. arXiv preprint arXiv:2303.12982, 2023.
[20] J. Cohen, X. Huan, and J. Ni. Shapley-based explainable ai for clustering applications in fault diagnosis and prognosis. arXiv preprint arXiv:2303.14581, 2023.
[21] D. Corrêa, A. Polpo, M. Small, S. Srikanth, K. Hollins, and M. Hodkiewicz. Data-driven approach for labelling process plant event data. International Journal of Prognostics and Health Management, 13(1), 2022.
[22] D. Dablain, B. Krawczyk, and N. V. Chawla. Deepsmote: Fusing deep learning and smote for imbalanced data. IEEE Transactions on Neural Networks and Learning Systems, 2022.
[23] K. L. de Calle-Etxabe, M. Gómez-Omella, and E. Garate-Perez. Divide, propagate and conquer: Splitting a complex diagnosis problem for early detection of faults in a manufacturing production line. In PHM Society European Conference, volume 6, pages 9–9, 2021.
[24] A. Dempster, F. Petitjean, and G. I. Webb. Rocket: exceptionally fast and accurate time series classification using random convolutional kernels. Data Mining and Knowledge Discovery, 34(5):1454–1495, 2020.
[25] N. DeVol, C. Saldana, and K. Fu. Inception based deep convolutional neural network for remaining useful life estimation of turbofan engines. In Annual Conference of the PHM Society, volume 13, 2021.
[26] N. DeVol, C. Saldana, and K. Fu. Evaluating image classification deep convolutional neural network architectures for remaining useful life estimation of turbofan engines. International Journal of Prognostics and Health Management, 13(2), 2022.
[27] N. Ding, H. Li, Q. Xin, B. Wu, and D. Jiang. Multi-source domain generalization for degradation monitoring of journal bearings under unseen conditions. Reliability Engineering & System Safety, 230:108966, 2023.
[28] Y. Ding, M. Jia, Y. Cao, P. Ding, X. Zhao, and C.-G. Lee. Domain generalization via adversarial out-domain augmentation for remaining useful life prediction of bearings under unseen conditions. Knowledge-Based Systems, 261:110199, 2023.
[29] I. Eekhout, R. M. de Boer, J. W. Twisk, H. C. de Vet, and M. W. Heymans. Missing data: a systematic review of how they are reported and handled. Epidemiology, 23(5):729–732, 2012.
[30] A. Falcon, G. D’Agostino, O. Lanz, G. Brajnik, C. Tasso, and G. Serra. Neural turing machines for the remaining useful life estimation problem. Computers in Industry, 143:103762, 2022.
[31] Y. Fan, A. Atoui, S. Nowaczyk, and T. Rognvaldsson. Evaluation of multi-modal learning for predicting coolant pump failures in heavy duty vehicles. In PHM Society Asia-Pacific Conference, volume 4, 2023.
[32] A. Gaffet, N. B. Roa, P. Ribot, E. Chanthery, and C. Merle. A hierarchical xgboost early detection method for quality and productivity improvement of electronics manufacturing systems. In 7th European Conference of the Prognostics and Health Management Society 2022, 2022.
[33] Z. Ghahramani. Probabilistic machine learning and artificial intelligence. Nature, 521(7553):452–459, 2015.
[34] I. Goodfellow, Y. Bengio, and A. Courville. Deep learning. MIT press, 2016.
[35] T. Griffiths, D. Corrêa, M. Hodkiewicz, and A. Polpo. Managing streamed sensor data for mobile equipment prognostics. Data-Centric Engineering, 3:e11, 2022.
[36] I. Hazra, A. Chatterjee, J. Southgate, M. J. Weiner, K. M. Groth, and S. Azarm. A reliability-based optimization framework for planning operational profiles for unmanned systems. Journal of Mechanical Design, 146(5):051704, 2024.
[37] I. Hazra, M. J. Weiner, R. Yang, A. Chatterjee, J. Southgate, K. M. Groth, and S. Azarm. Prognostics and health management of unmanned surface vessels: Past, present, and future. Journal of Computing and Information Science in Engineering, 24(8), 2024.
[38] A. He and X. Jin. Failure detection and remaining life estimation for ion mill etching process through deep-learning based multimodal data fusion. Journal of Manufacturing Science and Engineering, 141(10):101008, 2019.
[39] C.-Y. Hsu, Y.-W. Lu, and J.-H. Yan. Temporal convolution-based long-short term memory network with attention mechanism for remaining useful life prediction. IEEE Transactions on Semiconductor Manufacturing, 35(2):220–228, 2022.
[40] Y. Hu, X. Miao, Y. Si, E. Pan, and E. Zio. Prognostics and health management: A review from the perspectives of design, development and decision. Reliability Engineering & System Safety, 217:108063, 2022.
[41] B. Huang, Y. Di, C. Jin, and J. Lee. Review of data-driven prognostics and health management techniques: lessions learned from phm data challenge competitions. Machine Failure Prevention Technology, 2017:1–17, 2017.
[42] W. Huang, H. Khorasgani, C. Gupta, A. Farahat, and S. Zheng. Remaining useful life estimation for systems with abrupt failures. In Annual conference of the PHM society. September, pages 24–27, 2018.
[43] Y. Huang, Y. Tang, J. VanZwieten, and J. Liu. Reliable machine prognostic health management in the presence of missing data. Concurrency and Computation: Practice and Experience, 34(12):e5762, 2022.
[44] Z. Huang, J. Zhu, J. Lei, X. Li, and F. Tian. Tool wear monitoring with vibration signals based on short-time fourier transform and deep convolutional neural network in milling. Mathematical Problems in Engineering, 2021:1–14, 2021.
[45] K. İnce, U. Ceylan, N. N. Erdoğmuş, E. Sirkeci, and Y. Genc. Fault detection and classification for robotic test-bench. In PHM Society European Conference, volume 6, pages 7–7, 2021.
[46] K. İnce and Y. Genc. Joint autoencoder-regressor deep neural network for remaining useful life prediction. Engineering Science and Technology, an International Journal, 41:101409, 2023.
[47] K. Ince, E. Sirkeci, and Y. Genç. Remaining useful life prediction for experimental filtration system: A data challenge. In PHM Society European Conference, volume 5, 2020.
[48] E. Jakobsson, E. Frisk, M. Krysander, and R. Pettersson. A dataset for fault classification in rock drills, a fast oscillating hydraulic system. In Annual Conference of the PHM Society, volume 14, 2022.
[49] G. Jiang, J. Zhao, C. Jia, Q. He, P. Xie, and Z. Meng. Intelligent fault diagnosis of gearbox based on vibration and current signals: a multimodal deep learning approach. In 2019 Prognostics and System Health Management Conference (PHM-Qingdao), pages 1–6. IEEE, 2019.
[50] S. F. Karimian, R. Moradi, S. Cofre-Martel, K. M. Groth, and M. Modarres. Neural network and particle filtering: a hybrid framework for crack propagation prediction. arXiv preprint arXiv:2004.13556, 2020.
[51] Y. Kato, T. Kato, and T. Tanaka. Anomaly detection in spacecraft propulsion system using time series classification based on k-nn. In PHM Society Asia-Pacific Conference, volume 4, 2023.
[52] S. Kim, J.-H. Choi, and N. H. Kim. Challenges and opportunities of system-level prognostics. Sensors, 21(22):7655, 2021.
[53] Y. C. Kim, T. Kim, J. U. Ko, J. Lee, and K. Kim. Domain adaptation based fault diagnosis under variable operating conditions of a rock drill. International Journal of Prognostics and Health Management, 14(2), 2023.
[54] H. B. Kong, S.-H. Jo, J. H. Jung, J. M. Ha, Y. C. Shin, H. Yoon, K. H. Sun, Y.-H. Seo, and B. C. Jeon. A hybrid approach of data-driven and physics-based methods for estimation and prediction of fatigue crack growth. International Journal of Prognostics and Health Management, 11(1), 2020.
[55] M. Kreuzer and W. Kellermann. 1-d residual convolutional neural network coupled with data augmentation and regularization for the icphm 2023 data challenge. In 2023 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 186–191. IEEE, 2023.
[56] P. Kumar, I. Raouf, and H. S. Kim. Review on prognostics and health management in smart factory: From conventional to deep learning perspectives. Engineering Applications of Artificial Intelligence, 126:107126, 2023.
[57] H. Lasi, P. Fettke, H.-G. Kemper, T. Feld, and M. Hoffmann. Industry 4.0. Business & information systems engineering, 6:239–242, 2014.
[58] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. nature, 521(7553):436–444, 2015.
[59] J. Lee, B. Bagheri, and H.-A. Kao. A cyber-physical systems architecture for industry 4.0-based manufacturing systems. Manufacturing letters, 3:18–23, 2015.
[60] J. Lee, H. Davari, J. Singh, and V. Pandhare. Industrial artificial intelligence for industry 4.0-based manufacturing systems. Manufacturing letters, 18:20–23, 2018.
[61] J. Lee et al. Industrial ai. Applications with sustainable performance, 2020.
[62] J. Lee, P. Gore, X. Jia, S. Siahpour, P. Kundu, and K. Sun. Stream-of-quality methodology for industrial internet-based manufacturing system. Manufacturing Letters, 34:58–61, 2022.
[63] J. Lee and H. Su. A unified industrial large knowledge model framework in industry 4.0 and smart manufacturing. International Journal of AI for Materials and Design, page 3681, 2024.
[64] S. Lee, S. Lee, K. Lee, S. Lee, J. Chung, C.-W. Kim, and J. Yoon. Data-driven health condition and rul prognosis for liquid filtration systems. Journal of Mechanical Science and Technology, 35:1597–1607, 2021.
[65] S. K. Lee, J. Lee, S. Lee, B. Kim, Y. C. Kim, J. Lee, and B. D. Youn. Hybrid approach of xgboost and rule-based model for fault detection and severity estimation in spacecraft propulsion system. In PHM Society Asia-Pacific Conference, volume 4, 2023.
[66] X. Y. Lee, A. Kumar, L. Vidyaratne, A. R. Rao, A. Farahat, and C. Gupta. An ensemble of convolution-based methods for fault detection using vibration signals. arXiv preprint arXiv:2305.05532, 2023.
[67] Y.-F. Li, H. Wang, and M. Sun. Chatgpt-like large-scale foundation models for prognostics and health management: a survey and roadmaps. Reliability Engineering & System Safety, page 109850, 2023.
[68] P. Linardatos, V. Papastefanopoulos, and S. Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2020.
[69] H. Ling, T. Gao, T. Gong, J. Wu, and L. Zou. Hydraulic rock drill fault classification using x-vectors. Mathematics, 11(7):1724, 2023.
[70] C. Liu, L. Zhang, J. Li, J. Zheng, and C. Wu. Two-stage transfer learning for fault prognosis of ion mill etching process. IEEE Transactions on Semiconductor Manufacturing, 34(2):185–193, 2021.
[71] R. Łomowski and S. Hummel. A method to estimate the remaining useful life of a filter using a hybrid approach based on kernel regression and simple statistics. In PHM Society European Conference, volume 5, 2020.
[72] L. Lorenti, D. Dalle Pezze, J. Andreoli, C. Masiero, N. Gentner, Y. Yang, and G. A. Susto. Predictive maintenance in the industry: A comparative study on deep learning-based remaining useful life estimation. In 2023 IEEE 21st International Conference on Industrial Informatics (INDIN), pages 1–9. IEEE, 2023.
[73] A. Lövberg. Remaining useful life prediction of aircraft engines with variable length input sequences. In Annual Conference of the PHM Society, volume 13, 2021.
[74] S. M. Lundberg and S.-I. Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017.
[75] R. Mallett, J. Hagen-Zanker, R. Slater, and M. Duvendack. The benefits and challenges of using systematic reviews in international development research. Journal of development effectiveness, 4(3):445–455, 2012.
[76] E. Mansimov, E. Parisotto, J. L. Ba, and R. Salakhutdinov. Generating images from captions with attention. arXiv preprint arXiv:1511.02793, 2015.
[77] P. Marti-Puig, A. Blanco-M, J. J. Cárdenas, J. Cusidó, and J. Solé-Casals. Effects of the pre-processing algorithms in fault diagnosis of wind turbines. Environmental modelling & software, 110:119–128, 2018.
[78] T. Minami and J. Lee. Phm for spacecraft propulsion systems: Similarity-based model and physics-inspired features. In PHM Society Asia-Pacific Conference, volume 4, 2023.
[79] T. Minami, A. Suer, P. Kundu, S. Siahpour, and J. Lee. Novel ensemble domain adaptation methodology for enhanced multi-class fault diagnosis of highly-connected fleet of assets. In PHM Society Asia-Pacific Conference, volume 4, 2023.
[80] M. Mirzaei, M. H. Sadat, and F. Naderkhani. Application of machine learning for anomaly detection in printed circuit boards imbalance date set. In 2023 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 128–133. IEEE, 2023.
[81] C. Molnar. Interpretable machine learning. Lulu.com, 2020.
[82] M. Moradi, P. Komninos, R. Benedictus, and D. Zarouchas. Interpretable neural network with limited weights for constructing simple and explainable hi using shm data. In Annual Conference of the PHM Society, volume 14, 2022.
[83] K. P. Murphy. Probabilistic machine learning: an introduction. MIT press, 2022.
[84] K. T. Nguyen, K. Medjaher, and D. T. Tran. A review of artificial intelligence methods for engineering prognostics and health management with implementation guidelines. Artificial Intelligence Review, 56(4):3659–3709, 2023.
[85] S. Ochella, M. Shafiee, and F. Dinmohammadi. Artificial intelligence in prognostics and health management of engineering systems. Engineering Applications of Artificial Intelligence, 108:104552, 2022.
[86] H. J. Oh, J. Yoo, S. Lee, M. Chae, J. Park, and B. D. Youn. A hybrid approach combining data-driven and signal-processing-based methods for fault diagnosis of a hydraulic rock drill. International Journal of Prognostics and Health Management, 14(1), 2023.
[87] R. S. Peres, X. Jia, J. Lee, K. Sun, A. W. Colombo, and J. Barata. Industrial artificial intelligence in industry 4.0-systematic review, challenges and outlook. IEEE Access, 8:220121–220139, 2020.
[88] L. Polverino, R. Abbate, P. Manco, D. Perfetto, F. Caputo, R. Macchiaroli, and M. Caterino. Machine learning for prognostics and health management of industrial mechanical systems and equipment: A systematic literature review. International Journal of Engineering Business Management, 15:18479790231186848, 2023.
[89] S. Qiu, X. Cui, Z. Ping, N. Shan, Z. Li, X. Bao, and X. Xu. Deep learning techniques in intelligent fault diagnosis and prognosis for industrial systems: A review. Sensors, 23(3):1305, 2023.
[90] D. Ramachandram and G. W. Taylor. Deep multimodal learning: A survey on recent advances and trends. IEEE signal processing magazine, 34(6):96–108, 2017.
[91] S. B. Ramezani, A. Amirlatifi, T. Kirby, M. Seale, and S. Rahimi. Explainable machinery faults prediction using ensemble tree classifiers: Bagging or boosting? In Annual Conference of the PHM Society, volume 13, 2021.
[92] M. Rao, X. Yang, D. Wei, Y. Chen, L. Meng, and M. J. Zuo. Structure fatigue crack length estimation and prediction using ultrasonic wave data based on ensemble linear regression and paris’s law. International Journal of Prognostics and Health Management, 11(2), 2020.
[93] B. Rezaeianjouybari and Y. Shang. Deep learning for prognostics and health management: State of the art, challenges, and opportunities. Measurement, 163:107929, 2020.
[94] M. T. Ribeiro, S. Singh, and C. Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016.
[95] I. Schmidt, L. Dingeldein, D. Hünemohr, H. Simon, and M. Weigert. Application of machine learning methods to predict the quality of electric circuit boards of a production line. In PHM Society European Conference, volume 7, pages 550–555, 2022.
[96] O. Serradilla, E. Zugasti, J. Rodriguez, and U. Zurutuza. Deep learning models for predictive maintenance: a survey, comparison, challenges and prospects. Applied Intelligence, 52(10):10934–10964, 2022.
[97] H. Shen, X. Wang, L. Fu, and J. Xiong. Gear fault diagnosis based on short-time fourier transform and deep residual network under multiple operation conditions. In 2023 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 166–171. IEEE, 2023.
[98] J. Sim, S. Kim, H. J. Park, and J.-H. Choi. A tutorial for feature engineering in the prognostics and health management of gears and bearings. Applied Sciences, 10(16):5639, 2020.
[99] K. Singh, B. Selvanathan, K. Zope, S. H. Nistala, and V. Runkana. Concurrent estimation of remaining useful life for multiple faults in an ion etch mill: a data-driven approach. In Annual conference of the PHM society, volume 10, 2018.
[100] E. Sisinni, A. Saifullah, S. Han, U. Jennehag, and M. Gidlund. Industrial internet of things: Challenges, opportunities, and directions. IEEE transactions on industrial informatics, 14(11):4724–4734, 2018.
[101] D. Snyder, D. Garcia-Romero, G. Sell, D. Povey, and S. Khudanpur. X-vectors: Robust dnn embeddings for speaker recognition. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP), pages 5329–5333. IEEE, 2018.
[102] P. Sobha, M. Xavier, and P. Chandran. A comprehensive approach for gearbox fault detection and diagnosis using sequential neural networks. In 2023 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 180–185. IEEE, 2023.
[103] D. Solís-Martín, J. Galán-Páez, and J. Borrego-Díaz. A stacked deep convolutional neural network to predict the remaining useful life of a turbofan engine. arXiv preprint arXiv:2111.12689, 2021.
[104] D. Solís-Martín, J. Galán-Páez, and J. Borrego-Díaz. On the soundness of xai in prognostics and health management (phm). Information, 14(5):256, 2023.
[105] M. L. H. Souza, C. A. da Costa, and G. de Oliveira Ramos. A machine-learning based data-oriented pipeline for prognosis and health management systems. Computers in Industry, 148:103903, 2023.
[106] H. Su, B. Song, and F. Ahmed. Multi-modal machine learning for vehicle rating predictions using image, text, and parametric data. In International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, volume 87295, page V002T02A089. American Society of Mechanical Engineers, 2023.
[107] M. Sundararajan and A. Najmi. The many shapley values for model explanation. In International conference on machine learning, pages 9269–9278. PMLR, 2020.
[108] J. Taco, P. Gore, T. Minami, P. Kundu, A. Suer, and J. Lee. A novel methodology for health assessment in printed circuit boards. In PHM Society European Conference, volume 7, pages 556–562, 2022.
[109] J. Taco, P. Kundu, and J. Lee. A novel technique for multiple failure modes classification based on deep forest algorithm. Journal of Intelligent Manufacturing, pages 1–15, 2023.
[110] H. Tang, Y. Tian, J. Dai, Y. Wang, J. Cong, Q. Liu, X. Zhao, and Y. Fu. Prediction of production line status for printed circuit boards. In PHM Society European Conference, volume 7, pages 563–570, 2022.
[111] S. Tang, S. Yuan, and Y. Zhu. Data preprocessing techniques in convolutional neural network based on fault diagnosis towards rotating machinery. IEEE Access, 8:149487–149496, 2020.
[112] J. Tian, Y. Jiang, J. Zhang, Z. Wang, J. J. Rodríguez-Andina, and H. Luo. High-performance fault classification based on feature importance ranking-xgboost approach with feature selection of redundant sensor data. Current Chinese Science, 2(3):243–251, 2022.
[113] J. Tian, Y. Jiang, J. Zhang, S. Wu, and H. Luo. A novel transfer ensemble learning framework for remaining useful life prediction under multiple working conditions. IEEE Transactions on Instrumentation and Measurement, 2023.
[114] K. Tominaga, Y. Daimon, M. Toyama, K. Adachi, S. Tsutsumi, N. Omata, and T. Nagata. Dataset generation based on 1d-cae modeling for fault diagnostics in a spacecraft propulsion system. In PHM Society Asia-Pacific Conference, 2023.
[115] A. Tsanousa, E. Bektsis, C. Kyriakopoulos, A. G. González, U. Leturiondo, I. Gialampoukidis, A. Karakostas, S. Vrochidis, and I. Kompatsiaris. A review of multisensor data fusion solutions in smart manufacturing: Systems and trends. Sensors, 22(5):1734, 2022.
[116] K. L. Tsui, Y. Zhao, and D. Wang. Big data opportunities: System health monitoring and management. IEEE Access, 7:68853–68867, 2019.
[117] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.
[118] T. Vishnu, P. Gupta, P. Malhotra, L. Vig, and G. Shroff. Recurrent neural networks for online remaining useful life estimation in ion mill etching system. In Proceedings of the Annual Conference of the PHM Society, Philadelphia, PA, USA, volume 22, 2018.
[119] S. Vollert, M. Atzmueller, and A. Theissler. Interpretable machine learning: A brief survey from the predictive maintenance perspective. In 2021 26th IEEE international conference on emerging technologies and factory automation (ETFA), pages 1–8. IEEE, 2021.
[120] C. T. Vu, A. Chandra-Sekaran, and W. Stork. A deep learning first approach to remaining useful lifetime prediction of filtration system with improved response to changing operational parameters using parameterized fully-connected layer. In PHM Society European Conference, volume 6, pages 9–9, 2021.
[121] D. Wang, Y. Li, L. Jia, Y. Song, and Y. Liu. Novel three-stage feature fusion method of multimodal data for bearing fault diagnosis. IEEE Transactions on Instrumentation and Measurement, 70:1–10, 2021.
[122] P. E. Wang and M. Russell. Domain adversarial transfer learning for generalized tool wear prediction. In Annual conference of the PHM Society, volume 12, pages 8–8, 2020.
[123] Z. Wang, F. Yang, P. Zhao, L. Wang, J. Zhang, M. Garg, Q. Lin, and D. Zhang. Empower large language model to perform better on industrial domain-specific question answering. arXiv preprint arXiv:2305.11541, 2023.
[124] J. Wu, Z. Zhao, C. Sun, R. Yan, and X. Chen. Ss-infogan for class-imbalance classification of bearing faults. Procedia Manufacturing, 49:99–104, 2020.
[125] S. Wu, Y. Jiang, H. Luo, and S. Yin. Remaining useful life prediction for ion etching machine cooling system using deep recurrent neural network-based approaches. Control Engineering Practice, 109:104748, 2021.
[126] T. Xu, P. Zhang, Q. Huang, H. Zhang, Z. Gan, X. Huang, and X. He. Attngan: Fine-grained text to image generation with attentional generative adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1316–1324, 2018.
[127] M. Youn, Y. Kim, D. Lee, M. Cho, and B. D. Youn. Fatigue crack length estimation and prediction using trans-fitting with support vector regression. International Journal of Prognostics and Health Management, 11(1), 2020.
[128] Y. A. Yucesan, A. Dourado, and F. A. Viana. A survey of modeling for prognosis and health management of industrial equipment. Advanced Engineering Informatics, 50:101404, 2021.
[129] L. Zhao, Y. Zhu, and T. Zhao. Deep learning-based remaining useful life prediction method with transformer module and random forest. Mathematics, 10(16):2921, 2022.
[130] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong, et al. A survey of large language models. arXiv preprint arXiv:2303.18223, 2023.
[131] J. Zheng, C. Liu, and L. Zhang. Cross-modal knowledge distillation for fault detection under multiple failure modes. In 2021 China Automation Congress (CAC), pages 7714–7719. IEEE, 2021.
[132] E. Zio. Prognostics and health management (phm): Where are we and where do we (need to) go in theory and practice. Reliability Engineering & System Safety, 218:108119, 2022.