Prediction of rare events in the operation of household equipment using co-evolving time series
Abstract
In this study, we propose an approach for predicting rare events by exploiting time series in coevolution. Our approach relies on a weighted autologistic regression model that leverages the temporal behavior of the data to enhance predictive capabilities. To address the issue of imbalanced datasets, we establish constraints that lead to weight estimation and improved performance. Evaluation on synthetic and real-world datasets confirms that our approach outperforms state-of-the-art methods for predicting home equipment failure.
Keywords
Predictive analysis, time series, rare events, autologistic regression.
1 Introduction
Predicting rare events using time series in coevolution presents a significant challenge with a wide range of real-world applications, including equipment failures, disease outbreaks, and financial anomalies [1] [8] [7]. We consider an event rare if it occurs with low frequency; it remains predictable insofar as observations of the event have been documented. Home equipment failures, which are the focus of this work, are an example of such rare events. Home equipment comprises the tools people use in their home lives. Predicting equipment failure is crucial for enhancing safety and reducing inconvenience and financial burdens on homeowners. Accurate prediction enables proactive measures, optimizes maintenance, and aligns with IoT trends for smarter living environments [10]. Time series data are invaluable in this context, as they capture temporal patterns and dependencies influencing event occurrences. Some studies dealing with the same problem have applied several predictive methods, such as logistic regression and artificial neural networks to predict gearbox failures in modern wind turbines [5], and random forests to predict hard drive failures [11]. However, those studies, especially the ones relying on statistics-based methods, did not fully leverage the temporal behavior of time series data, leading to low scores. For example, the logistic regression model used in [5] achieved a low accuracy of 59%. In contrast, two-class neural networks demonstrated a higher prediction accuracy of 72.5%.
Furthermore, the statistics literature reports that predicting rare events from imbalanced datasets is challenging. For example, standard methods like logistic regression often underestimate rare event probabilities, leading to low recall and inadequate event prediction [4]. To address this, many studies propose giving more importance to the rare class by applying data sampling and class weighting strategies, such as Bayesian-based modifications [2] [3]. In [12], weights are assigned based on the distribution of failure samples in the time series data, offering a different approach to this issue. In [4], two algorithms are introduced to calculate weights during the training step, enhancing the prediction of the rare class without relying on the distribution of failure samples in the data. Instead, these algorithms are based on the prediction errors of each class in the last training iteration.
Moreover, studies have shown that fusing additional external factors and phenomena that coevolve with the sensors' time series data can improve prediction performance. In [6], several phenomena are taken into account to predict electricity consumption, such as population growth, technological developments, economic conditions, weather conditions, and calendar effects.
This study highlights the significance of leveraging the temporal behavior of time series in coevolution and of addressing imbalanced datasets to enhance the precision of rare event prediction. Compared to existing approaches, our incremental work presents several improvements: 1) Exploiting the coevolution of time series data with an autologistic regression model: this model is designed to incorporate the historical context of the time series' coevolution. It captures autocorrelation and dependencies in the data, leading to a significant enhancement in predictive precision when compared to other models. 2) Incorporating weighted methods for precision optimization: our weighted approach considers the trade-off between minimizing prediction errors and maximizing precision, unlike the referenced works, which primarily focus on the error of the rare class and disregard the error of predicting the main class. 3) Exploring external factors and phenomena: our investigation incorporates external factors and phenomena into our time series fusion. This exploration provides valuable insights for refining predictive models, contributing to proactive decision-making in real-world scenarios. 4) Emphasizing time continuity: in alignment with the observations in [9], we underscore the critical role of time continuity. Events do not unfold randomly in isolation, but often follow temporal patterns influenced by the passage of time.
2 Methodology
2.1 Autologistic Regression for Time Series
Throughout this paper, we consider a time series in coevolution $X = \{x_t\}_{t=1}^{T}$, where $x_t$ represents the vector of observations recorded at time $t$ from different sensors describing the functioning of a piece of equipment. Among the observations are those relating to physical features, such as temperature, speed, and noise, and others relating to the operating state of the equipment, such as the time since the last failure and the past failure frequency at each moment $t$. Discussing the redundancy of these measurements is crucial in our analysis. Some of these measurements may exhibit correlations, such as the temperature and noise levels in specific equipment, or the time since the last failure and the past failure frequency at each time point. Redundancy can provide further insights into the phenomena under study, but it can also affect the design of our prediction model. For instance, a strong correlation between two variables can pose challenges when using them in a regression model. Therefore, it is essential to assume that the time series we are working with are not highly correlated. To validate this assumption, we perform experimental assessments by measuring the pairwise correlations between the time series. Each observation $x_t$ is associated with a binary variable $y_t$, which takes the value $1$ when a failure occurs at time $t$, and $0$ otherwise. The binary variables also form a time series $Y = \{y_t\}_{t=1}^{T}$. Our objective is to predict the occurrence of a failure at time $t + \Delta$ using $X$ and $Y$; that is, to estimate $P(y_{t+\Delta} = 1 \mid X, Y)$.
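The pairwise correlation check described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' procedure: the sensor names, the toy readings, and the cut-off `limit=0.9` are assumptions chosen only for the example.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant series carries no linear relationship
    return cov / math.sqrt(var_a * var_b)


def highly_correlated_pairs(series, limit=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= limit.

    `series` maps a sensor name to its list of readings. `limit` is an
    illustrative cut-off, not a value taken from the paper.
    """
    flagged = []
    for (name_a, a), (name_b, b) in combinations(series.items(), 2):
        r = pearson(a, b)
        if abs(r) >= limit:
            flagged.append((name_a, name_b, r))
    return flagged


# Toy readings (hypothetical): noise tracks temperature, speed does not.
sensors = {
    "temperature": [20.0, 21.0, 23.0, 26.0, 30.0],
    "noise": [40.0, 42.0, 46.0, 52.0, 60.0],
    "speed": [5.0, 3.0, 6.0, 2.0, 5.0],
}
print(highly_correlated_pairs(sensors))
```

Pairs flagged this way are candidates for dropping one of the two series before fitting the regression.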
Autologistic regression is particularly well suited to modeling the probability of a failure, as it specifies this probability from various quantitative variables, which can be binary, categorical, or real. This method has demonstrated considerable success in predicting failure. Notably, when compared to complex models like neural networks, our approach offers a balance between frugality and explainability. It is frugal in the sense that it requires less data during the learning phase, making it a resource-efficient choice. In addition, it is explainable because it makes it possible to evaluate the role of each variable in estimating the probability of failure. In our specific case, we argue that incorporating the past values of $y_t$ as regressors introduces a temporal correlation in the predictions, effectively reducing randomness in the outcomes. This logistic regression model, enhanced with the $y_t$ regressor, is referred to as autologistic. The approach offers a valuable perspective for improving prediction accuracy while maintaining model interpretability. To specify the prediction model, we define $y_{t+\Delta}$ as a Bernoulli variable: $y_{t+\Delta} = 1$ if a failure occurs in the interval $[t, t+\Delta]$, with probability $p_t$, and $y_{t+\Delta} = 0$ otherwise. The conditional probability of occurrence of this failure is given by:

$$p_t = P\left(y_{t+\Delta} = 1 \mid x_{t-q}, \dots, x_t, y_{t-q}, \dots, y_t; \theta\right) = f\Big(\theta_0 + \sum_{j=0}^{q} \big(\alpha_j^{\top} x_{t-j} + \beta_j\, y_{t-j}\big)\Big) \tag{1}$$

where $f$ is the following logistic function:

$$f(z) = \frac{1}{1 + e^{-z}} \tag{2}$$

The vector $\theta = (\theta_0, \alpha_0, \dots, \alpha_q, \beta_0, \dots, \beta_q)$ is the set of the autologistic regression parameters. In addition, $y_{t+\Delta}$ takes the value 1 if a failure occurs between times $t$ and $t + \Delta$, and 0 otherwise. Furthermore, $t - q$ represents the start date of the memory for each time series. Using the same start date for all time series simplifies the model, as it is a compromise between the different coevolving time series. Selecting the value of $q$ involves identifying the time lags that lead to an acceptable prediction error. In this work, we used the Forward Feature Selection (FFS) method to select this value.
The occurrence of a failure is determined by comparing the probability derived from Equation 7 with a predefined threshold. If the calculated probability exceeds this threshold, the model decides that a failure is present. Multiple threshold values were examined experimentally to tune the decision process to the model's effectiveness.
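To make the model specification concrete, here is a minimal sketch of how lagged sensor readings and past failure labels can be combined into a failure probability and compared with a threshold. The function names, the layout of the parameter vector, the hand-picked (not fitted) parameter values, and the toy data are all assumptions for illustration; for brevity the raw probability is thresholded here, without the smoothing step introduced later in the paper.

```python
import math


def logistic(z):
    """Logistic function f(z) = 1 / (1 + exp(-z)), computed stably."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lagged_features(x, y, t, q):
    """Flatten x[t-q..t] and y[t-q..t] into one regressor vector.

    x is a list of per-time sensor vectors, y a list of 0/1 failure labels.
    """
    feats = []
    for j in range(q + 1):
        feats.extend(x[t - j])  # sensor readings at lag j
        feats.append(y[t - j])  # autoregressive term: past failure label
    return feats


def failure_probability(theta0, theta, feats):
    """Probability of a failure given the lagged regressors."""
    z = theta0 + sum(w * v for w, v in zip(theta, feats))
    return logistic(z)


# Toy example: 2 sensors, memory q = 1, hypothetical parameters.
x = [[0.1, 0.2], [0.4, 0.1], [0.9, 0.8]]
y = [0, 0, 1]
feats = lagged_features(x, y, t=2, q=1)
theta = [1.0, 1.0, 2.0, 0.5, 0.5, 1.0]
p = failure_probability(-1.0, theta, feats)
alarm = p > 0.9  # example decision threshold
print(feats, round(p, 4), alarm)
```

Because each coefficient multiplies one identifiable lagged variable, its sign and magnitude can be read directly, which is the explainability property discussed above.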
2.2 Parameter estimation
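There are several methods to estimate the parameter vector $\theta$ in autologistic regression, such as Maximum Likelihood Estimation (MLE) [13] and Bayesian estimation [14]. In this paper, we employ MLE, which fits the model in a principled way by maximizing the likelihood. To that end, we define the likelihood as the probability of $Y$ given $X$ and $\theta$:

$$\mathcal{L}(\theta) = P(Y \mid X; \theta) \tag{3}$$

Applying the rule of conditional probabilities, and since $y_{t+\Delta}$ depends conditionally on the previous observations $x_{t-q}, \dots, x_t$ and $y_{t-q}, \dots, y_t$, as well as on the parameter set $\theta$, we can write:

$$\mathcal{L}(\theta) = \prod_{t} P\left(y_{t+\Delta} \mid x_{t-q}, \dots, x_t, y_{t-q}, \dots, y_t; \theta\right) \tag{4}$$

For computational convenience, we use the logarithm of the previous equation:

$$\log \mathcal{L}(\theta) = \sum_{t} \log P\left(y_{t+\Delta} \mid x_{t-q}, \dots, x_t, y_{t-q}, \dots, y_t; \theta\right) \tag{5}$$

From Equation 1, we can deduce that the conditional probability of observing $y_{t+\Delta}$ can be modeled using a Bernoulli distribution as follows:

$$P\left(y_{t+\Delta} \mid x_{t-q}, \dots, x_t, y_{t-q}, \dots, y_t; \theta\right) = f(z_t)^{\,y_{t+\Delta}} \big(1 - f(z_t)\big)^{1 - y_{t+\Delta}} \tag{6}$$

where $f$ is the logistic function of Equation 2 and $z_t$ denotes its argument in Equation 1.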
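The Bernoulli log-likelihood above leads to a simple fitting loop. The sketch below, in plain Python, evaluates the negative log-likelihood of a logistic model and reduces it with fixed-step gradient descent. The learning rate, the number of steps, the small `eps` guard, and the toy rows (intercept, one sensor value, previous label) are assumptions for illustration and do not reproduce the authors' training setup.

```python
import math


def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neg_log_likelihood(theta, rows, labels):
    """Negative Bernoulli log-likelihood of a logistic model.

    rows[i] is a regressor vector with a leading 1.0 for the intercept.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for feats, y in zip(rows, labels):
        p = logistic(sum(w * v for w, v in zip(theta, feats)))
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total


def gradient(theta, rows, labels):
    """Gradient of the negative log-likelihood: sum of (p - y) * features."""
    grad = [0.0] * len(theta)
    for feats, y in zip(rows, labels):
        p = logistic(sum(w * v for w, v in zip(theta, feats)))
        for k, v in enumerate(feats):
            grad[k] += (p - y) * v
    return grad


def fit(rows, labels, lr=0.1, steps=200):
    """Plain gradient descent from a zero start (hypothetical settings)."""
    theta = [0.0] * len(rows[0])
    for _ in range(steps):
        g = gradient(theta, rows, labels)
        theta = [w - lr * gk for w, gk in zip(theta, g)]
    return theta


# Toy rows: [intercept, sensor reading, previous failure label].
rows = [[1.0, 0.1, 0], [1.0, 0.3, 0], [1.0, 0.8, 0],
        [1.0, 0.9, 1], [1.0, 0.2, 1], [1.0, 0.7, 0]]
labels = [0, 0, 1, 1, 0, 1]
theta_hat = fit(rows, labels)
print(neg_log_likelihood([0.0] * 3, rows, labels),
      neg_log_likelihood(theta_hat, rows, labels))
```

The second printed value is smaller than the first, showing that the descent steps lower the objective on this toy sample.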
In this paper, we used the gradient descent algorithm [16] to minimize the negative log-likelihood $-\log \mathcal{L}(\theta)$ with respect to $\theta$. Note that the values of the probability in Equation 1 can vary abruptly in time. These fluctuations can cause temporal discontinuities in our predictions, leading to false alarms. To overcome this challenge, we added a probability smoothing constraint. Supposing that a failure lasts for a minimum duration $L$, we express this constraint by averaging the last $L$ probabilities with a moving average. This average probability is then used to make the final decision on the presence or absence of a failure, as given in Equation 7:

$$\bar{p}_t = \frac{1}{L} \sum_{k=0}^{L-1} p_{t-k} \tag{7}$$

where $p_t$ denotes the probability given by Equation 1 at time $t$.

By averaging the $L$ probabilities, we mitigate the impact of outliers, i.e., abnormally high or low probabilities, which are often associated with noise. If a probability has been abnormally influenced by an outlier, this influence is reduced when it is averaged with the $L - 1$ other values, which helps stabilize our predictions.
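The smoothing step can be illustrated with a trailing moving average. In this sketch, the window length, the threshold, and the toy probabilities are assumptions; the handling of the first few points (averaging over the values available so far) is also an assumption, since the text does not specify it.

```python
def smooth_probabilities(probs, window):
    """Trailing moving average over the last `window` probabilities.

    For the first window-1 points fewer values exist, so the average is
    taken over what is available (an assumption of this sketch).
    """
    smoothed = []
    running = 0.0
    for t, p in enumerate(probs):
        running += p
        if t >= window:
            running -= probs[t - window]  # drop the value leaving the window
        smoothed.append(running / min(t + 1, window))
    return smoothed


# Toy probabilities: one isolated spike at t=2, a sustained rise from t=5.
raw = [0.1, 0.1, 0.95, 0.1, 0.1, 0.92, 0.95, 0.97, 0.96]
avg = smooth_probabilities(raw, window=3)
threshold = 0.9
raw_alarms = [t for t, p in enumerate(raw) if p > threshold]
smooth_alarms = [t for t, p in enumerate(avg) if p > threshold]
print(raw_alarms, smooth_alarms)
```

The isolated spike no longer triggers an alarm after smoothing, while the sustained rise still does, at the cost of a delay of a couple of time steps. This mirrors the trade-off between false alarms and recall reported in the experiments.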
2.3 Class weighting
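Recall that the binary variable $y_t$ is the label associated with $x_t$. Because failures are rare, few observations carry the label $y_t = 1$. This imbalance leads to biased estimates and failure prediction errors [4]. To counter this, we incorporate class weighting into the likelihood. Assigning a higher weight to the minority class and a lower weight to the majority class rebalances their influence. In this case, Equation 6 is rewritten as follows:

$$P\left(y_{t+\Delta} \mid x_{t-q}, \dots, x_t, y_{t-q}, \dots, y_t; \theta\right) = f(z_t)^{\,w_1 y_{t+\Delta}} \big(1 - f(z_t)\big)^{w_0 (1 - y_{t+\Delta})} \tag{8}$$

where $w_1$ and $w_0$ are the weights of the minority (failure) class and the majority (normal) class, respectively, and $z_t$ denotes the argument of the logistic function $f$ in Equation 1.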
Two methods are often used for estimating the weights $w_0$ and $w_1$: simple weighting and adaptive weighting [4]. The simple weighting method assigns weights to classes based on their frequencies in the dataset. For the majority class (class 0), $w_0$ is calculated as $N_1 / N$, where $N_1$ is the number of minority class observations and $N$ is the total number of observations. Conversely, for the minority class (class 1), $w_1$ is computed as $N_0 / N$, with $N_0$ representing the number of majority class observations. This approach has limitations, notably its ineffectiveness in the presence of extreme class imbalance and outliers. This can give too much emphasis to the majority class, which hurts model performance for underrepresented classes. Our choice, the Adaptive Class Weights approach, addresses these limitations. Initially, we can use the weights calculated by the simple method instead of considering them equal, as done in [4]. Then, at each training iteration, the weights are updated based on prediction errors, as given in Equation 9. The algorithm adaptively calculates the weights, giving priority to classes with higher prediction errors. This adaptability readjusts the model's focus and enhances its predictive capabilities for underrepresented classes.
(9) |
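As an illustration of the simple weighting scheme and of how class weights enter the objective, the sketch below computes frequency-based weights and a class-weighted negative log-likelihood. It covers only the simple method used for initialization; the adaptive update of Equation 9 is deliberately left out. The function names, the `eps` guard, and the toy labels and probabilities are assumptions.

```python
import math


def simple_class_weights(labels):
    """Frequency-based weights: w0 = N1 / N for class 0, w1 = N0 / N for class 1."""
    n = len(labels)
    n1 = sum(labels)
    n0 = n - n1
    return n1 / n, n0 / n


def weighted_neg_log_likelihood(probs, labels, w0, w1):
    """Class-weighted negative log-likelihood for predicted probabilities."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        total -= (w1 * y * math.log(p + eps)
                  + w0 * (1 - y) * math.log(1 - p + eps))
    return total


# Toy sample: 9 normal observations and 1 failure, constant prediction 0.2.
labels = [0] * 9 + [1]
probs = [0.2] * 10
w0, w1 = simple_class_weights(labels)
unweighted = weighted_neg_log_likelihood(probs, labels, 1.0, 1.0)
weighted = weighted_neg_log_likelihood(probs, labels, w0, w1)
print(w0, w1, round(unweighted, 4), round(weighted, 4))
```

With these weights the single failure contributes most of the weighted loss, whereas in the unweighted loss the nine normal observations dominate.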
3 Experimental results
3.1 Used data
In our analysis, we concentrated on various home equipment failure data sources to evaluate our model’s performance. We began by generating synthetic data, creating a controlled environment to evaluate the model’s efficacy. This synthetic dataset, consisting solely of sensor readings, served as a benchmark for comparison. Subsequently, we introduced simulated data from HVAC (Heating, Ventilation, and Air Conditioning) systems, featuring 4 failures. Additionally, we tested our model’s proficiency using real-world data from a water pump. This pump logs its sensor readings and operational status every minute and recorded a total of 7 failures.
Table 1 summarizes these datasets. It gives an overview of the duration, the granularity of data collection, and, most importantly, the number of failures experienced by each piece of equipment. Note that each sensor in these datasets provides a time series, which is integral to our modeling approach.
3.2 Performance assessment
In this section, we measure the performance of the proposed approach by evaluating our autologistic regression model across various types of equipment. For this evaluation, we consider five metrics, chosen to facilitate direct comparison with state-of-the-art methods. The first is Accuracy, which indicates the proportion of correct predictions relative to all predictions. The second is Recall, representing the proportion of failures that the model correctly identified. Specificity measures the proportion of normal operation identified correctly, while the F1-score is the harmonic mean of precision and recall. Additionally, we evaluate the Number of false alarms, counting failure predictions made while the equipment is operating normally. Beyond these metrics, we also investigate the effects of varying time intervals, the influence of weighting, and the impact of the other phenomena considered.
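For reference, the metrics above can be computed from the four confusion counts as in the following sketch. Counting one false alarm per mispredicted time step is an assumption of this example (the paper may count alarm episodes instead), and the toy label sequences are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, 1 meaning failure."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def ratio(a, b):
    """a / b, or 0.0 when the denominator is zero."""
    return a / b if b else 0.0


def scores(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * recall, precision + recall),
        "false_alarms": fp,  # counted per time step in this sketch
    }


y_true = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
y_pred = [0, 1, 0, 0, 0, 1, 1, 0, 0, 0]
result = scores(y_true, y_pred)
print(result)
```

On heavily imbalanced data, accuracy and specificity stay close to 1 even when recall is poor, which is why recall and the F1-score carry most of the information in the tables that follow.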
Table 2 summarizes the scores. For the "pump" equipment type over a 10-day interval, the model exhibits an accuracy of 0.9997, a recall of 0.7255, and only 3 false alarms. When the interval is shortened to 5 days, the accuracy remains high at 0.9750, but the recall dips to 0.5056, accompanied by 154 false alarms, while retaining the same decision threshold of 0.9. On a 1-day interval, the model's recall drops significantly to 0.2185, although its accuracy remains at 0.896. Synthetic data demonstrates exceptional performance without any imbalance.
In this approach, the data division for training and testing the model is based on the proportion of failures. Specifically, 80% of the failures are used for training the model, and the remaining 20% are used for testing. For instance, with synthetic data that contains a total of 30 failures, the split is as follows: 24 failures (which is 80% of the total) are utilized for training the model, and the remaining 6 failures (20% of the total) are used for testing the model’s predictions at different time intervals. This method ensures a balanced approach, where the majority of data is used for learning, and a significant portion is reserved for evaluating the model’s predictive accuracy.
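A failure-based split of this kind can be sketched as follows. The code assumes a chronological cut placed right after the last training failure, and it counts a failure as a run of consecutive 1 labels; both choices are assumptions, since the text only fixes the 80/20 proportion of failures.

```python
def failure_events(labels):
    """List (start, end) index pairs of consecutive runs of 1s."""
    events, start = [], None
    for t, y in enumerate(labels):
        if y == 1 and start is None:
            start = t
        elif y == 0 and start is not None:
            events.append((start, t - 1))
            start = None
    if start is not None:
        events.append((start, len(labels) - 1))
    return events


def split_by_failures(labels, train_share=0.8):
    """Index that cuts the series so the first part holds `train_share`
    of the failure events (chronological cut, an assumption)."""
    events = failure_events(labels)
    n_train = round(train_share * len(events))
    if n_train == 0:
        return 0
    return events[n_train - 1][1] + 1  # just after the last training failure


# Toy label series with 30 failures, one every fifth time step.
labels = ([0] * 4 + [1]) * 30
cut = split_by_failures(labels)
train, test = labels[:cut], labels[cut:]
print(cut, len(failure_events(train)), len(failure_events(test)))
```

With 30 failures the split yields 24 failures for training and 6 for testing, matching the synthetic-data example in the text.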
| Equipment | Δ | q | | Accuracy | Recall | Specificity | F1-score | False alarms |
|---|---|---|---|---|---|---|---|---|
| Synthetic | 1 hr | 10 min | 0.4100 | 1.000 | 0.8832 | 1.000 | 0.9380 | 0 |
| Synthetic | 2 hrs | 13 min | 1.3500 | 1.000 | 0.8464 | 1.000 | 0.9168 | 0 |
| Pump | 10 days | 22 min | 0.5800 | 0.9997 | 0.7255 | 0.9997 | 0.8408 | 3 |
| Pump | 5 days | 20 min | 0.3000 | 0.9750 | 0.5056 | 0.9914 | 0.6659 | 154 |
| Pump | 1 day | 15 min | 0.0500 | 0.896 | 0.2185 | 0.9973 | 0.3514 | 73 |
| HVAC | 10 days | 10 min | 0.7200 | 0.9938 | 0.4537 | 0.9994 | 0.6239 | 8 |
| HVAC | 20 days | 12 min | 3.1200 | 0.9989 | 0.7739 | 0.9974 | 0.8721 | 16 |
| HVAC | 15 days | 10 min | 1.4300 | 0.9931 | 0.4767 | 0.9951 | 0.6442 | 50 |
3.2.1 Changing memory effect
From this point forward, our focus will be on the 10-day prediction model, which has demonstrated superior performance with water pump data. This model will be the subject of our detailed analysis in the subsequent sections. An important aspect of this model is the role of memory. As shown in Figure 1, there is a positive correlation between memory size and the F1-score, indicating enhancements in both precision and recall. Notably, a memory duration of 22 minutes is identified as yielding the optimal F1-score.
The choice of the F1-score as a metric is deliberate, as it provides a balanced measure between precision and recall. This balance is crucial in our context, since both false positives (predicting a failure when there is none) and false negatives (overlooking an impending failure) have significant consequences.

3.2.2 Weighting effect
In evaluating the impact of different weighting methods on model performance, as presented in Table 3, significant variations are observed. The Adaptive Weighting method notably outperforms the other approaches, especially when considering precision, recall, and F1-score. Specifically, it increases precision by 0.46% compared to the unweighted approach, while significantly enhancing recall by 258.27%. This method also achieves an impressive 149.85% increase in the F1-score, underlining its effectiveness in balancing precision and recall. Moreover, it dramatically reduces the number of false alarms by 84.21%, a crucial factor in practical applications.
In contrast, the Simple Weighting method, while increasing recall by 46.77%, does so at the expense of precision, which decreases by 1.34%. Furthermore, this method results in a substantial increase in false alarms, rising by 452.63% compared to the unweighted approach. These figures highlight the trade-offs inherent in different weighting strategies, with Adaptive Weighting emerging as a more balanced and effective solution in this context.
| | Unweighted | Adaptive weighting | Simple weighting |
|---|---|---|---|
| W0 | 1 | 1.23 | 0.369 |
| W1 | 1 | 1.36 | 0.631 |
| Precision | 0.9951 | 0.9997 | 0.9818 |
| Recall | 0.2025 | 0.7255 | 0.2972 |
| Specificity | 0.9982 | 0.9997 | 0.9903 |
| F1-score | 0.3365 | 0.8408 | 0.4563 |
| False Alarms | 19 | 3 | 105 |
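The percentage changes quoted in this subsection are relative changes with respect to the unweighted model. The short sketch below recomputes several of them from the rounded table entries; the dictionary layout and function name are illustrative choices.

```python
def relative_change(before, after):
    """Relative change in percent, measured against `before`."""
    return 100.0 * (after - before) / before


# Rounded scores copied from the weighting comparison table.
unweighted = {"precision": 0.9951, "recall": 0.2025, "false_alarms": 19}
adaptive = {"precision": 0.9997, "recall": 0.7255, "false_alarms": 3}
simple = {"precision": 0.9818, "recall": 0.2972, "false_alarms": 105}

for name, method in (("adaptive", adaptive), ("simple", simple)):
    for metric in ("precision", "recall", "false_alarms"):
        change = relative_change(unweighted[metric], method[metric])
        print(f"{name:8s} {metric:12s} {change:+.2f}%")
```

For example, the recall gain of adaptive weighting is (0.7255 - 0.2025) / 0.2025, about 258.27%, and its reduction in false alarms is (3 - 19) / 19, about -84.21%.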
3.2.3 Added phenomena effect
The integration of the variables "elapsed functioning time since last failure (G)" and "past failures count (C)" significantly enhances the performance of our predictive model. As detailed in Table 4, these variables are not only relevant but also crucial for accurately predicting the likelihood of future failures. Their inclusion has notably improved the model's performance, underscoring their value in effective failure prediction.
Further analysis reveals that a specific combination of these variables, with a time interval ($\Delta$) of 10 days and a memory window ($q$) of 22 minutes, consistently yields the best results in our tests.
| | Count (C), before | Count (C), after | G, before | G, after |
|---|---|---|---|---|
| Precision | 0.9785 | 0.9997 | 0.9995 | 0.9997 |
| Recall | 0.3147 | 0.7255 | 0.6715 | 0.7255 |
| Specificity | 0.9870 | 0.9997 | 0.9994 | 0.9997 |
| F1-score | 0.4763 | 0.8408 | 0.8033 | 0.8408 |
| False Alarms | 132 | 3 | 6 | 3 |
3.2.4 Smoothing probabilities effect
We explored the impact of introducing smoothing to our model. Smoothing, in this context, involves taking the average of the past 10 probabilities rather than relying solely on the current probability. As shown in Table 5, this method led to a significant reduction in the number of false alarms, from 71 to just 3. Additionally, both precision and specificity reached 0.9997. However, this improvement came with a trade-off in recall, which decreased from 0.8415 to 0.7255. This drop suggests a compromise in the model's ability to detect all real positive cases, despite the substantial reduction in false alarms. Notably, a configuration with a time interval ($\Delta$) of 10 days and a memory window ($q$) of 22 minutes emerged as the most effective, consistently yielding the best performance across various tests.
| Smoothing probabilities | Before | After |
|---|---|---|
| Precision | 0.9956 | 0.9997 |
| Recall | 0.8415 | 0.7255 |
| Specificity | 0.9934 | 0.9997 |
| F1-score | 0.9121 | 0.8408 |
| False Alarms | 71 | 3 |
4 Conclusion
In this paper, we have presented a model of autologistic regression for predicting rare events, particularly focusing on home equipment failures, by exploiting time series data in coevolution. Our model’s core innovation lies in its adept handling of the coevolution of multiple time series, a feature that is critical in capturing the complex dynamics in many real-world systems.
The efficiency of our model was further enhanced by integrating two essential phenomena: the time elapsed since the last failure and the total number of failures that have already occurred. These additions led to a significant improvement in the model’s ability to predict failures accurately. Furthermore, we introduced a probability smoothing technique to mitigate the issue of false alarms, making our model more reliable.
One of the standout advantages of this model is its versatility. Although our study focused on failure prediction, this model could easily be adapted to predict other types of rare events using time series data, making it potentially useful across various application domains and equipment.
Our research contributes to the growing field of predictive analysis, offering a new perspective on handling rare events in time series data. The insights gained from this study can be valuable for predictive maintenance in smart home systems, potentially leading to more efficient and timely interventions.
References
- [1] Panda, C., & Singh, T. R. (2023). ML-based vehicle downtime reduction: A case of air compressor failure detection. Engineering Applications of Artificial Intelligence, 122, 106031. https://doi.org/10.1016/j.engappai.2023.106031.
- [2] Li, Q., & Mao, Y. (2014). A review of boosting methods for imbalanced data classification. Pattern Analysis and Applications, 17, 679-693.
- [3] Tanha, J., Abdi, Y., Samadi, N., Razzaghi, N., & Asadpour, M. (2020). Boosting methods for multi-class imbalanced data classification: an experimental review. Journal of Big Data, 7, 1-47.
- [4] He, J., & Cheng, M. X. (2021). Weighting methods for rare event identification from imbalanced datasets. Frontiers in Big Data, 4, 715320.
- [5] Carroll, J., Koukoura, S., McDonald, A., Charalambous, A., Weiss, S., McArthur, S. (2019). Wind turbine gearbox failure and remaining useful life prediction using machine learning techniques. Wind Energy, 22(3), 360-375.
- [6] Hyndman, R. J., & Fan, S. (2009). Density forecasting for long-term peak electricity demand. IEEE Transactions on Power Systems, 25(2), 1142-1153.
- [7] Li, P., Li, S., Bi, T., & Liu, Y. (2014). Telecom customer churn prediction method based on cluster stratified sampling logistic regression.
- [8] Liu, T. (2020). US Pandemic prediction using regression and neural network models. In 2020 international conference on intelligent computing and human-computer interaction (ICHCI) (pp. 351-354).
- [9] Fass, F., Ziou, D., & Kadri, N. (2022). Route Planning for a Tractor in an Agriculture Field with Obstacles. In 2022 International Conference of Advanced Technology in Electronic and Electrical Engineering (ICATEEE) (pp. 1-6). IEEE.
- [10] Eckerson, W. W. (2007). Predictive analytics. Extending the Value of Your Data Warehousing Investment. TDWI Best Practices Report, 1, 1-36.
- [11] Aussel, N., Jaulin, S., Gandon, G., Petetin, Y., Fazli, E., & Chabridon, S. (2017). Predictive models of hard drive failures based on operational data. In 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA) (pp. 619-625).
- [12] Padmanabh, K., Al-Rubaie, A., Davies, J., Clarke, S. S., & Aljasmi, A. A. A. A. (2021). Fault Prediction in HVAC Chillers by Analysis of Internal System Dynamics. In 2021 International Conference on Smart Applications, Communications and Networking (SmartNets) (pp. 1-6).
- [13] Albert, A., & Anderson, J. A. (1984). On the existence of maximum likelihood estimates in logistic regression models. Biometrika, 71(1), 1-10.
- [14] Anderson, J. A., & Blair, V. (1982). Penalized maximum likelihood estimation in logistic regression and discrimination. Biometrika, 69(1), 123-136.
- [15] Pump Sensor Data (2018). https://www.kaggle.com/datasets/nphantawee/pump-sensor-data.
- [16] Curry, H. B. (1944). The method of steepest descent for non-linear minimization problems. Quarterly of Applied Mathematics, 2(3), 258-261.