
Exploring the Effect of Sequence Smoothness on Machine Learning Accuracy

Cangqing Wang1* and Hoc T Quach2
1*Cangqing Wang is with Boston University, MA 02215, USA; email: kriswang@bu.edu
2Hoc T Quach is an independent researcher. Correspondence to Hoc T Quach via email: tomquach22@gmail.com
Abstract

Time series data, collected at regular intervals over time, has become abundant with the rapid evolution of science and technology. Analyzing these time series unveils inherent dynamics, facilitating a nuanced understanding of diverse social phenomena. Moreover, such analyses empower future predictions, establishing a scientific foundation for informed decision-making and regulatory measures. Consequently, the importance of enhancing the accuracy of time series predictions cannot be overstated.

Addressing the challenges posed by large-scale, non-linear, and non-smooth time-series data, numerous machine learning-based time-series prediction methods have emerged and found extensive application across various industries. Long Short-Term Memory (LSTM) networks, characterized by their specialized architecture facilitating long-term memory retention, are particularly well-suited for modeling and predicting real-world time-series data. While significant research efforts have concentrated on optimizing the LSTM model structure to enhance prediction accuracy, limited attention has been given to the preprocessing inadequacies in machine learning for time-series data.

This paper adopts a perspective grounded in statistical theory and machine learning methods to explore the impact of sequence smoothness on machine learning accuracy through data preprocessing. Utilizing Python software, the study simulates eight non-stationary series, each of length 300: a random walk + stationary AR(1) sequence, a random walk + MA(2) sequence, a random walk + stationary ARMA(1, 1) sequence, a random walk + stationary SARMA(1, 10, 1) sequence, a linear trend stationary sequence, a linear trend + stationary AR(2) sequence, a quadratic trend stationary sequence, and a quadratic trend + stationary ARMA(1, 1) sequence.

Subsequently, employing the time series split cross-validation method, the original data and differenced data are segmented into multiple training and testing sets for model training and prediction. The study calculates the prediction accuracy of the original series and the inverse prediction accuracy of the differenced series. Comparisons of LSTM learning effects before and after differential smoothing reveal that preprocessing the original series with differential smoothing significantly improves the learning accuracy of LSTM when modeling non-stationary time series. Lastly, to validate the generality of the findings, daily closing price data of SSE and SZSE indices from December 31, 2012, to December 31, 2022, are utilized. To optimize the original LSTM from a data preprocessing perspective, a novel differential-LSTM algorithm, combining differential smoothing with LSTM, is proposed. This approach contributes to advancing the understanding and application of machine learning in optimizing time-series prediction accuracy.

Index Terms:
Sequence smoothing; machine learning accuracy; LSTM; time series prediction

I Introduction

Time series analysis is crucial in industry and everyday life, providing insights into trends in historical data. It helps us understand phenomena and make predictions about the future, guiding decision-making processes. The accuracy of these predictions greatly influences how decisions are made, leading experts across fields to focus on improving the precision of time series forecasting.

With advancements in computer technology, there are now machine learning-based techniques for time series forecasting. These methods address the limitations of traditional forecasting approaches when dealing with real-world problems. Machine learning algorithms can handle large volumes of data and nonlinear patterns by adjusting weights based on desired outcomes. Unlike traditional methods, which require stationary data, machine learning can predict changes in sequences that are not stationary. Stationarity refers to the consistency of a sequence's statistical properties over time: regardless of the starting and ending points, these properties are determined solely by the time interval, so the sequence maintains the same pattern of change at any given moment.

Our exploration introduces a method that significantly improves performance. By studying how sequence stationarity affects machine learning accuracy, we have developed a smoothing technique that not only stabilizes non-stationary sequences but also enhances the learning precision of the Long Short-Term Memory (LSTM) neural network model. This differential LSTM algorithm fills gaps in current machine learning research by providing a refined predictive framework for time series data.

The key contribution of this research is the integration of a smoothing technique into LSTM model training. This technique not only stabilizes non-stationary sequences but also boosts the learning accuracy of the LSTM neural network model. By combining smoothing with LSTM, we address existing shortcomings in machine learning research on time series data preprocessing, offering a more accurate predictive framework. Traditional methods for forecasting time series data rest on well-established theories and work well for predicting linear trends; however, they struggle with nonlinear trends and large-scale data. In the age of big data, machine learning-based approaches are gaining importance for handling time series data. Among these methods, the LSTM neural network model is known for its high prediction accuracy.

It is important to note that improving machine learning accuracy in time series predictions can benefit society significantly. Better forecasting can be applied in areas such as optimizing production processes and aiding policymakers in making decisions. This research aims to contribute to enhancing machine learning accuracy for applications across different societal sectors.

Henceforth, to scrutinize the ramifications of sequence stationarity on machine learning precision and enhance the prognostic acuity of machine learning prediction methodologies, this inquiry delves into the influence of varied non-stationary sequences on LSTM learning accuracy both pre- and post-stabilization. The empirical validation of these postulations employs time series data derived from the closing prices of financial market indices. The methodological framework is as follows:

(1) Python software is employed to simulate diverse non-stationary time series;

(2) The Time Series split cross-validation methodology partitions the simulated non-stationary time series, facilitating the training of the LSTM model. Subsequently, the average learning accuracy of the LSTM on the unaltered non-stationary time series is computed;

(3) Differential smoothing and white noise testing are applied to the simulated non-stationary time series;

(4) Utilizing the time series split cross-validation technique, the original data and the differenced data are segregated into numerous sets for LSTM model training and prediction. Post-de-stabilization, the average learning accuracy of the LSTM on the time series following differential stabilization is computed;

(5) Evaluation metrics, specifically MSE, MAE, and MAPE, are employed for a comprehensive juxtaposition of LSTM learning accuracy on various non-stationary time series before and after differencing, and conclusive inferences are drawn from these analyses;

(6) To substantiate the derived conclusions, the daily closing prices of the Shanghai Composite Index and the Shenzhen Composite Component Index are systematically acquired through web scraping from December 31, 2012, to December 31, 2022.

II Neural network related theories

II-A Recurrent Neural Network

Diverging from conventional artificial neural networks where inputs and outputs operate independently, the Recurrent Neural Network (RNN) exhibits a distinctive characteristic. Within the hidden layer of the RNN, connections establish a temporal relationship, linking the output at any given moment to the input from the preceding moment, thereby imbuing the network with a discernible short-term memory. Consequently, RNN manifests as particularly adept in processing sequentially ordered data, exemplified by its applicability to time series data.

II-A1 Network structure of RNN

Given an input sequence \{X_{t},\ t=1,2,\ldots,N\} and a corresponding output sequence \{Y_{t},\ t=1,2,\ldots,N\}, the learning process within the RNN unfolds through the network configuration illustrated in Figure 1. Here, the input layer X signifies the vector comprising the input sequence, h denotes the hidden layer value, and Y represents the vector comprising the output sequence. The matrices U, W, and V delineate the weight matrices governing the transitions from the input layer to the hidden layer, the structure of the hidden layer itself, and the connectivity from the hidden layer to the output layer, respectively.

Figure 1: Illustration of Recurrent Neural Network (RNN) Architecture
Figure 2: Diagrammatic Elaboration of Recurrent Neural Network (RNN) Structure Expansion

Expanding the network through time facilitates a more explicit representation of the memory capabilities inherent in RNNs. As illustrated in Figure 2, the output of the hidden layer at time t, denoted h_{t}, is intricately linked to the current input X_{t} and the output h_{t-1} from the preceding time step. Mathematically, h_{t}=f(U\cdot X_{t}+W\cdot h_{t-1}) and Y_{t}=g(V\cdot h_{t}), where f(\cdot) and g(\cdot) denote the activation functions of the hidden layer and the output layer, respectively. Consequently, RNN establishes a continuum between input and output across consecutive temporal instances through the interconnection of hidden layers.
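For concreteness, the recurrence above can be sketched in a few lines of Python; the dimensions, the random weight matrices, and the choice of tanh for f(·) and the identity for g(·) are illustrative assumptions rather than values from this paper.

    import numpy as np

    rng = np.random.default_rng(0)
    input_dim, hidden_dim, output_dim, N = 1, 4, 1, 10

    U = rng.normal(size=(hidden_dim, input_dim))    # input-to-hidden weights
    W = rng.normal(size=(hidden_dim, hidden_dim))   # hidden-to-hidden weights
    V = rng.normal(size=(output_dim, hidden_dim))   # hidden-to-output weights

    X = rng.normal(size=(N, input_dim))             # input sequence {X_t}
    h = np.zeros(hidden_dim)                        # initial hidden state

    outputs = []
    for t in range(N):
        h = np.tanh(U @ X[t] + W @ h)               # h_t = f(U·X_t + W·h_{t-1})
        outputs.append(V @ h)                       # Y_t = g(V·h_t), with g taken as the identity here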

II-A2 Advantages and Disadvantages of RNN

In contrast to conventional neural networks, recurrent neural networks exhibit superior attributes in parameter determination and suitability for time series data. Primarily, traditional neural network models necessitate the determination of connection weights for each neuron, leading to computationally intensive calculations, particularly when dealing with an abundance of hidden layers or input layer units. In RNN, a set of weight matrices is shared among hidden layers, substantially mitigating computational complexity and augmenting the model’s training speed. Moreover, owing to the cyclic interconnections within RNN’s hidden layers, it inherently possesses a short-term memory, wherein data proximal in time exerts a more pronounced influence on future data output. Consequently, RNN emerges as a neural network model well-suited for the nuanced intricacies of time series data.

However, practical applications of RNN often grapple with challenges in parameter training. During the computation of the gradient of the loss function concerning weights, an extended time step may give rise to gradient explosion or gradient disappearance issues. This phenomenon destabilizes the network structure or results in the cessation of weight updates, rendering the learning of information from a more extensive historical context unattainable.

II-B LSTM neural network

In contrast to conventional RNN, the LSTM neural network structure exhibits a distinctive capability to maintain prolonged memory and execute selective memory operations. An extension of the original RNN architecture, the LSTM introduces novel unit states within the hidden layer to facilitate the storage of long-term memory, coupled with the incorporation of a gating mechanism designed for memory selection. This unique structural augmentation effectively addresses challenges related to the gradient explosion and vanishing gradient issues inherent in RNN.

II-B1 Network structure of LSTM

As depicted in Figures 3 and 4, the LSTM network introduces a dynamic cell state C_{t} evolving over time, augmenting the original two inputs (X_{t}, h_{t-1}) to the hidden layer at any given time t to three inputs (X_{t}, h_{t-1}, C_{t-1}). Similarly, the outputs of the hidden layer, originally comprising two elements (h_{t}, Y_{t}), are transformed into three outputs (C_{t}, h_{t}, Y_{t}).

Figure 3: Comparison diagram of RNN and LSTM structures
Figure 4: LSTM input and output diagram

Moreover, as delineated in Figure 5, the LSTM architecture incorporates a forget gate, an input gate, and an output gate. These three pivotal "switches" play a crucial role in governing the erasure, storage, and retrieval of long-term memory within the cell state C, respectively. Within the forget gate mechanism, the hidden state h_{t-1} from the preceding time step and the current moment's input X_{t} undergo the transformation F_{t}=\sigma(W_{f}\cdot[h_{t-1},X_{t}]+b_{f}), where \sigma(\cdot) represents the sigmoid function, W_{f} signifies the weight matrix of the forget gate, and b_{f} represents its bias vector, yielding a forgetting weight within the range of 0 to 1. A value of 1 signifies the retention of all information, while a value of 0 signifies complete memory erasure.

Figure 5: Schematic diagram of LSTM network structure

The input gate serves as the orchestrator for the information that undergoes updating within the cell. Its determination is contingent upon the function I_{t}=\sigma(W_{i}\cdot[h_{t-1},X_{t}]+b_{i}), where \sigma(\cdot) represents the sigmoid function, W_{i} denotes the weight matrix associated with the input gate, and b_{i} is the bias vector of the input gate; the resulting input weight enables the discernment of which information to update. Subsequently, a novel candidate memory unit is conceived through the function \widetilde{C_{t}}=\tanh(W_{c}\cdot[h_{t-1},X_{t}]+b_{c}). Ultimately, the extant memory unit undergoes modification to C_{t}=F_{t}\cdot C_{t-1}+I_{t}\cdot\widetilde{C_{t}}, thereby encapsulating both the information elected for retention by the forget gate and the information selected for updating by the input gate.

Concomitantly, the output gate assumes authority over the information dissemination from the cell. Its determination adheres to the function O_{t}=\sigma(W_{o}\cdot[h_{t-1},X_{t}]+b_{o}), featuring the sigmoid function \sigma(\cdot), the weight matrix W_{o} for the output gate, and the bias vector b_{o}. The output weight of the output gate selects the pertinent information for output, and the hidden-layer output is then disseminated through the function h_{t}=O_{t}\cdot\tanh(C_{t}).
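A minimal NumPy sketch of a single LSTM step, written directly from the gate equations above, is given below; the weight shapes, the random initialization, and the toy input are assumptions made only for illustration.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(X_t, h_prev, C_prev, Wf, bf, Wi, bi, Wc, bc, Wo, bo):
        """One LSTM time step following the forget/input/output gate equations."""
        z = np.concatenate([h_prev, X_t])         # [h_{t-1}, X_t]
        F_t = sigmoid(Wf @ z + bf)                # forget gate
        I_t = sigmoid(Wi @ z + bi)                # input gate
        C_tilde = np.tanh(Wc @ z + bc)            # candidate memory unit
        C_t = F_t * C_prev + I_t * C_tilde        # updated cell state
        O_t = sigmoid(Wo @ z + bo)                # output gate
        h_t = O_t * np.tanh(C_t)                  # hidden-layer output
        return C_t, h_t

    rng = np.random.default_rng(0)
    input_dim, hidden_dim = 1, 4
    weights = [rng.normal(size=(hidden_dim, hidden_dim + input_dim)) for _ in range(4)]
    biases = [np.zeros(hidden_dim) for _ in range(4)]
    params = [p for pair in zip(weights, biases) for p in pair]   # Wf, bf, Wi, bi, Wc, bc, Wo, bo

    C, h = np.zeros(hidden_dim), np.zeros(hidden_dim)
    C, h = lstm_step(np.array([0.5]), h, C, *params)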

The LSTM network structure, as delineated above, achieves the nuanced management of memory through selective storage and updating mechanisms. Additionally, it mitigates challenges such as gradient explosion and vanishing gradients that may arise during the training process. This renders LSTM particularly well-suited for the modeling and prediction of time series data encountered in practical applications.

II-B2 A measure of learning accuracy

For the observed sequence \{x_{t}\}, let \{\hat{x}_{t}\} denote the predicted values, and define the error e_{t}=x_{t}-\hat{x}_{t} (t=1,\ldots,n). The subsequent error measures are frequently employed for assessment:

Mean square error: \text{MSE}=n^{-1}\sum_{t=1}^{n}(e_{t})^{2}

Root mean square error: \text{RMSE}=\sqrt{n^{-1}\sum_{t=1}^{n}(e_{t})^{2}}

Mean absolute error: \text{MAE}=n^{-1}\sum_{t=1}^{n}|e_{t}|

Mean absolute percentage error: \text{MAPE}=n^{-1}\sum_{t=1}^{n}100\left|\frac{e_{t}}{x_{t}}\right|
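These measures are straightforward to compute; the sketch below evaluates all four with NumPy on a small illustrative pair of observed and predicted arrays (the numbers are arbitrary), with the usual caveat that MAPE assumes the observed values are non-zero.

    import numpy as np

    def accuracy_measures(actual, predicted):
        actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
        e = actual - predicted                            # e_t = x_t - x_hat_t
        mse = np.mean(e ** 2)
        return {
            "MSE": mse,
            "RMSE": np.sqrt(mse),
            "MAE": np.mean(np.abs(e)),
            "MAPE": np.mean(100 * np.abs(e / actual)),    # requires x_t != 0
        }

    print(accuracy_measures([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))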

III Model training and impact exploration

The exploration of the influence of sequence stationarity on the accuracy of LSTM learning is comprehensively detailed in Figure 6, encompassing distinct stages such as the data preparation phase, the model training and prediction phase, and the subsequent application of sequence difference smoothing and post-prediction inverse difference operations.

Figure 6: Schematic diagram of the exploration process

III-A Data preparation

In the data preparation phase, Python 3.7 was utilized to simulate diverse non-stationary sequences for investigating alterations in the learning accuracy of LSTM networks resulting from the differentiation of distinct non-stationary sequence types. A total of eight non-stationary sequences, each with a length of 300 and characterized by diverse trends, were meticulously simulated (refer to Table 1 for a comprehensive delineation of sequence types and their corresponding expressions).

TABLE I: Simulation sequence list
Sequence Type Sequence Expression
Random walk + stationary AR(1) sequence Y_{t}=1.5Y_{t-1}+2\omega_{t}
Random walk + MA(2) sequence Y_{t}=Y_{t-1}+2\omega_{t}+0.4\omega_{t-1}+0.2\omega_{t-2}
Random walk + stationary ARMA(1, 1) sequence Y_{t}=0.6Y_{t-1}+2\omega_{t}+0.2\omega_{t-1}
Random walk + stationary SARMA(1, 10, 1) sequence Y_{t}=Y_{t-1}+0.2Y_{t-10}+2\omega_{t}+0.3\omega_{t-10}
Linear trend stationary series Y_{t}=0.5+0.4t+\omega_{t}
Linear trend + stationary AR(2) sequence Y_{t}=0.8+0.5t+Y_{t-1}-0.5Y_{t-2}+\omega_{t}
Quadratic trend stationary series Y_{t}=0.6+0.5t+0.2t^{2}+\omega_{t}
Quadratic trend + stationary ARMA(1, 1) sequence Y_{t}=0.6+0.8t+0.25t^{2}+0.5Y_{t-1}+\omega_{t}+3\omega_{t-1}
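As a concrete illustration of the data preparation step, the sketch below simulates two of the sequences from Table 1, the random walk + MA(2) sequence and the linear trend stationary series, each of length 300; the random seed and the use of standard normal white noise for ω_t are assumptions.

    import numpy as np

    rng = np.random.default_rng(42)
    n = 300
    omega = rng.standard_normal(n + 2)          # white noise ω_t (two extra values for the MA lags)

    # Random walk + MA(2): Y_t = Y_{t-1} + 2ω_t + 0.4ω_{t-1} + 0.2ω_{t-2}
    y_rw_ma2 = np.zeros(n)
    for t in range(1, n):
        y_rw_ma2[t] = (y_rw_ma2[t - 1] + 2 * omega[t + 2]
                       + 0.4 * omega[t + 1] + 0.2 * omega[t])

    # Linear trend stationary series: Y_t = 0.5 + 0.4t + ω_t
    t_idx = np.arange(n)
    y_linear = 0.5 + 0.4 * t_idx + rng.standard_normal(n)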

The novelty in our approach becomes apparent with the introduction of differential smoothing as part of the data preparation process. This technique, as explained later, significantly enhances the learning accuracy of the LSTM model, providing a performance advantage over traditional methods. Following confirmation of non-stationarity through ADF and KPSS tests, the sequences underwent transformation into stationary sequences via first-order differentiation (second-order differentiation for quadratic trend sequences). Subsequently, the LB test was executed, and the resulting p-values for each category of sequence, both pre- and post-differentiation, are delineated in Table 2. The outcomes ascertain that subsequent to differentiation, all sequences exhibit stationarity and do not conform to a purely random pattern, rendering them suitable for LSTM model training.

TABLE II: Simulated sequence stationarity and white noise test results (p value)
Sequence Type ADF KPSS LB
x_t Δx_t x_t Δx_t Δx_t
Random walk + stationary AR(1) sequence 0.30 0.00 0.01 0.10 0.00
Random walk + MA(2) sequence 0.87 0.00 0.01 0.10 0.00
Random walk + stationary ARMA(1, 1) sequence 0.95 0.00 0.01 0.10 0.00
Random walk + stationary SARMA(1, 10, 1) sequence 0.27 0.00 0.01 0.10 0.00
Linear trend stationary series 0.97 0.00 0.01 0.10 0.00
Linear trend + stationary AR(2) sequence 0.97 0.00 0.01 0.10 0.00
Quadratic trend stationary series 0.89 0.00 0.01 0.10 0.00
Quadratic trend + stationary ARMA(1, 1) sequence 0.97 0.00 0.01 0.10 0.00
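The test-and-difference step summarized in Table 2 can be reproduced with statsmodels, as sketched below; adfuller, kpss, and acorr_ljungbox are the standard statsmodels routines for the ADF, KPSS, and LB tests, while the toy integrated AR(1) series, the KPSS settings, and the choice of lag 10 for the LB test are assumptions made for illustration.

    import numpy as np
    from statsmodels.tsa.stattools import adfuller, kpss
    from statsmodels.stats.diagnostic import acorr_ljungbox

    # Stand-in non-stationary sequence: an integrated AR(1), whose first
    # difference is stationary but not white noise.
    rng = np.random.default_rng(0)
    eps = rng.standard_normal(300)
    ar1 = np.zeros(300)
    for t in range(1, 300):
        ar1[t] = 0.6 * ar1[t - 1] + eps[t]
    series = np.cumsum(ar1)

    def stationarity_pvalues(x, label):
        adf_p = adfuller(x)[1]                               # H0: unit root (non-stationary)
        kpss_p = kpss(x, regression="c", nlags="auto")[1]    # H0: stationary
        print(f"{label}: ADF p = {adf_p:.2f}, KPSS p = {kpss_p:.2f}")

    stationarity_pvalues(series, "original series")
    diff1 = np.diff(series)                                  # first-order differencing
    stationarity_pvalues(diff1, "differenced series")

    # LB test on the differenced series (H0: white noise); a small p-value
    # indicates the series is not purely random and is worth modeling.
    lb_p = acorr_ljungbox(diff1, lags=[10], return_df=True)["lb_pvalue"].iloc[0]
    print(f"LB p-value at lag 10: {lb_p:.2f}")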

III-B LSTM model training

(1) Data set division: Given the stochastic nature of the algorithm, the Time Series Split cross-validation method is employed to partition the dataset, including 8 non-stationary sequences and their difference-stabilized counterparts. This strategy enables the comparison of average outcomes across multiple training models, with specific lengths of the test set and training set detailed in Table 3.

TABLE III: Division of simulated sequence data sets
Group Training Set Length Test Set Length
First group 76 73
Second group 149 73
Third group 222 73
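A sketch of this splitting with scikit-learn's TimeSeriesSplit is shown below; with three splits on a length-300 series it produces expanding training windows and fixed-length test windows close to, though not exactly matching, the lengths in Table 3, since the exact division also depends on the supervised-sample construction described next.

    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    data = np.arange(300)                                 # stand-in for a simulated sequence of length 300
    tscv = TimeSeriesSplit(n_splits=3)
    for fold, (train_idx, test_idx) in enumerate(tscv.split(data), start=1):
        print(f"Group {fold}: training length = {len(train_idx)}, test length = {len(test_idx)}")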

(2) Data conversion: Following the normalization process, the data undergoes conversion into supervised learning data. Specifically, data at time t-1 and time t serve as input fields, while data at time t+1 functions as the output field, facilitating the construction of samples. The array is then reshaped into the three-dimensional input format (input length, time step, data dimension) compatible with the LSTM model. Here, the input length corresponds to the length of the training set and test set employed for each round of model training. The time step is set to 2, signifying the input of data from the initial two steps and the output of the subsequent step, with the data dimension constrained to be one-dimensional.
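A sketch of this conversion is given below: the series is scaled, samples are built from the values at times t-1 and t to predict the value at time t+1, and the inputs are reshaped to the three-dimensional LSTM format; the min-max scaling range and the helper name are assumptions.

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    def to_supervised(series, time_step=2):
        """Scale a 1-D series and build (time_step -> next value) samples for the LSTM."""
        scaler = MinMaxScaler(feature_range=(-1, 1))
        scaled = scaler.fit_transform(np.asarray(series, float).reshape(-1, 1)).ravel()
        X, y = [], []
        for i in range(len(scaled) - time_step):
            X.append(scaled[i:i + time_step])          # inputs at times t-1 and t
            y.append(scaled[i + time_step])            # output at time t+1
        X = np.array(X).reshape(-1, time_step, 1)      # (input length, time step, data dimension)
        return X, np.array(y), scaler

    series = np.cumsum(np.random.default_rng(1).standard_normal(300))
    X, y, scaler = to_supervised(series)
    print(X.shape, y.shape)                            # (298, 2, 1) (298,)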

(3) Model training: The establishment and training of the LSTM network are executed within the deep learning framework tensorflow.keras. For all experimental sequences, the number of hidden layer neuron nodes is standardized at 12. Sequences are processed one sample at a time (batch_size = 1), employing the Adam optimization algorithm (optimizer = 'adam'). A rigorous training regimen of 100 iterations (epochs = 100) is implemented, culminating in the visualization of the loss decline curve. The innovation in our approach lies in the systematic training of the LSTM model on differentially stabilized sequences, resulting in a considerable improvement in predictive performance compared to traditional training methods, showcasing a clear performance advantage.
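A hedged sketch of this training setup with tensorflow.keras is shown below, using 12 hidden units, batch_size=1, the Adam optimizer, and 100 epochs as stated above; the mean-squared-error loss and the random stand-in training arrays are assumptions (in practice X and y come from the supervised-learning conversion of each training window).

    import numpy as np
    from tensorflow import keras

    # Stand-in supervised samples shaped (input length, time step, data dimension).
    rng = np.random.default_rng(0)
    X = rng.standard_normal((298, 2, 1))
    y = rng.standard_normal(298)

    model = keras.Sequential([
        keras.layers.Input(shape=(2, 1)),       # time step = 2, one-dimensional data
        keras.layers.LSTM(12),                  # 12 hidden-layer neuron nodes
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

    history = model.fit(X, y, epochs=100, batch_size=1, verbose=0)
    # history.history["loss"] can be plotted to visualize the loss decline curve.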

III-C Comparative analysis of prediction outcomes pre- and post-stabilization

(1) Model prediction and result visualization

Utilizing the trained LSTM models for all sequences, predictions were generated and systematically compared with the original sequence data. The primary objective was to visually assess the learning efficacy of the LSTM model both before and after the application of differencing. Due to spatial constraints, only select sequences’ prediction outcomes were showcased in the analysis. Random Walk + Stationary AR (1) Sequence Differencing (Figure 7): The left panel illustrates the learning outcome pre-differencing, while the right panel illustrates the learning outcome post-preprocessing with difference smoothing. The differential smoothing imparts stability, and post-transformation, the LSTM model retains its capacity to capture the sequence trend, demonstrating an improved fitting effect compared to direct utilization of the original data.

Figure 7: Comparison of predictive outcomes pre and post the introduction of a random walk + stationary AR(1) sequence difference.

Random Walk + Stationary ARMA (1,1) Sequence Differencing (Figure 8): The left panel depicts the learning outcomes pre-differential processing, and the right panel portrays the outcomes after differential stabilization preprocessing. Evidently, subsequent to stabilization, the LSTM model exhibits superior capacity to discern and learn the inherent trends within the sequence, showcasing a notable advantage in learning accuracy.

Figure 8: Comparison of prediction results pre and post random walk + stationary ARMA (1,1) sequence difference

Linear Trend Stationary Sequence Differentiation (Figure 9): The left panel illustrates the learning outcomes before differentiation, while the right panel portrays the learning results after preprocessing for difference stabilization. Following the stabilization of the differentiation process, the LSTM model adeptly assimilates inherent sequence trends, exhibiting a more pronounced fitting efficacy compared to direct utilization of unprocessed raw data.

Figure 9: Comparison of prediction results before and after linear trend stationary sequence difference

Linear Trend + Stationary AR (2) Sequence Differencing (Figure 10): The left panel shows the predictive outcome prior to differencing, juxtaposed with the right panel showcasing the result after preprocessing for difference smoothing. Visually discernible, the variance is mitigated, and the LSTM model exhibits proficiency in assimilating the sequence trend even after differencing, yielding a more robust fitting effect compared to direct utilization of the unprocessed original data.

Figure 10: Comparison of prediction results pre and post linear trend + stationary AR(2) sequence difference

The comparative outcomes for alternative sequences consistently demonstrate that LSTM manifests superior fitting efficacy when applied to differentially stabilized sequences. This observation leads to the plausible inference that, in contrast to directly modeling non-stationary sequences, the preprocessing of data through differential stabilization serves to enhance the learning accuracy of the LSTM network.

(2) Calculate the average learning accuracy

To quantitatively assess the impact of differential stabilization on learning outcomes, the study computes the mean learning accuracy and alteration percentage of LSTM on the test set for each category of non-stationary sequences, as detailed in Table 4. The tabulated data clearly indicates a notable elevation in prediction accuracy in LSTM training subsequent to the application of differential stabilization techniques.

TABLE IV: Average learning accuracy and change rate before and after sequence difference stabilization
Serial No. MSE RMSE MAE MAPE (rate of change measured relative to the value before differencing)
1 Before differencing: 2.48 1.57 1.30 0.78
  After differencing: 1.76 1.33 1.05 0.62
  Rate of change: -29% -16% -19% -20%
2 Before differencing: 6.27 2.37 2.00 0.42
  After differencing: 2.80 1.67 1.31 0.47
  Rate of change: -55% -30% -34% +11%
3 Before differencing: 5.20 2.27 1.83 0.96
  After differencing: 3.88 1.96 1.60 1.01
  Rate of change: -25% -14% -12% +5%
4 Before differencing: 4.11 2.02 1.64 0.06
  After differencing: 3.05 1.74 1.40 0.06
  Rate of change: -26% -14% -14% -12%
5 Before differencing: 9.50 3.06 2.73 0.04
  After differencing: 1.40 1.18 0.94 0.01
  Rate of change: -85% -62% -65% -64%
6 Before differencing: 6.66 2.48 2.18 0.02
  After differencing: 1.18 1.09 0.89 0.01
  Rate of change: -82% -56% -59% -54%
7 Before differencing: 21399.73 138.76 120.58 0.02
  After differencing: 2.59 1.59 1.32 0.00
  Rate of change: -100% -99% -99% -98%
8 Before differencing: 145943.57 292.56 247.37 0.02
  After differencing: 16.05 3.97 3.22 0.00
  Rate of change: -100% -99% -99% -97%
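For reference, the sketch below shows how the after-differencing entries of such a comparison are obtained: predictions made on the differenced scale are inverse-differenced back to the original scale before the error measures are computed, and the rate of change is taken relative to the value before differencing; all numbers here are illustrative and are not taken from Table 4.

    import numpy as np

    def invert_difference(first_observed, diff_predictions):
        """Map predicted Δx values back to the original scale by cumulative summation."""
        return first_observed + np.cumsum(diff_predictions)

    actual = np.array([10.0, 10.5, 11.2, 11.0])
    diff_pred = np.array([0.4, 0.6, -0.1])                  # predicted Δx for t = 1, 2, 3
    restored = invert_difference(actual[0], diff_pred)      # original-scale predictions for t = 1, 2, 3

    mse_after = np.mean((actual[1:] - restored) ** 2)       # error after difference stabilization
    mse_before = 0.03                                       # hypothetical error of the model on the raw series
    rate_of_change = (mse_after - mse_before) / mse_before * 100
    print(f"MSE after differencing: {mse_after:.4f}, rate of change: {rate_of_change:+.0f}%")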

Furthermore, the preprocessing involving differential stabilization manifests a marked enhancement in the predictive performance of LSTM models, particularly in the context of non-stationary sequences characterized by deterministic trends such as linear and quadratic trends. This discernible improvement underscores the efficacy of employing differential stabilization methodologies for augmenting the accuracy of LSTM predictions in the context of non-stationary sequences.

IV Example verification

IV-A Data sources and preprocessing

To assess the generalizability of our findings, this study acquired daily closing price data for the Shanghai Composite Index and the Shenzhen Composite Component Index spanning from December 31, 2012, to December 31, 2022, for the purpose of example verification. The overall sequence length encompasses 2430 data points. Initially, both sequences underwent ADF and KPSS stationarity tests to confirm their non-stationary nature. Subsequently, first-order differencing rendered the sequences stationary, and the LB test was then applied to determine whether the differenced sequences were purely random. The p-values obtained from the tests conducted before and after differencing are detailed in Table 5. The outcomes affirm that the original sequences exhibit non-stationarity, which is rectified after first-order differencing. The differenced sequences are not white noise, that is, not purely random, and are therefore appropriate for training the LSTM model.

TABLE V: Stability and white noise test results of market index data (p value)
Sequence ADF KPSS LB
Before differencing After differencing Before differencing After differencing After differencing
Shanghai Composite Index daily closing price 0.17 0.00 0.01 0.10 0.00
Shenzhen Component Index daily closing price 0.19 0.00 0.01 0.10 0.00
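The same preprocessing can be applied to the scraped index data; in the hedged sketch below, the CSV file name and column name are hypothetical stand-ins for however the scraped closing prices are stored locally, while adfuller, kpss, and acorr_ljungbox are the same statsmodels routines used for the simulated sequences.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.stattools import adfuller, kpss
    from statsmodels.stats.diagnostic import acorr_ljungbox

    # Hypothetical file holding one daily closing price per row for 2012-12-31 to 2022-12-31.
    close = pd.read_csv("sse_daily_close.csv")["close"].to_numpy(dtype=float)

    print("ADF p (original):", round(adfuller(close)[1], 2))
    print("KPSS p (original):", round(kpss(close, regression="c", nlags="auto")[1], 2))

    diff1 = np.diff(close)                                  # first-order differencing
    print("ADF p (differenced):", round(adfuller(diff1)[1], 2))
    print("KPSS p (differenced):", round(kpss(diff1, regression="c", nlags="auto")[1], 2))
    print("LB p (differenced):",
          round(acorr_ljungbox(diff1, lags=[10], return_df=True)["lb_pvalue"].iloc[0], 2))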

IV-B Model training and result verification

Initially, the Time Series Split cross-validation methodology is employed to partition both the original sequences and their difference-stabilized counterparts into training and test sets. This division serves as the foundation for comparing the average outcomes of the various training models. The specific lengths of the training and test sets are delineated in Table 6.

TABLE VI: Division of market index data set
Group Training Set Length Test Set Length
First group 609 607
Second group 1216 607
Third group 1823 607

The training procedure of the LSTM model conforms to the methodology outlined in Section III and is not reiterated herein. Examination of Figures 11 and 12 reveals that, across all test datasets, the predictions obtained after difference smoothing of the daily closing price data of the Shanghai Composite Index and the Shenzhen Composite Component Index, followed by LSTM model training, exhibit a markedly higher degree of concordance with the authentic data.

Figure 11: Comparison of prediction results pre and post the daily closing price difference of the Shanghai Composite Index
Figure 12: Comparison of prediction results pre and post the daily closing price difference of the Shenzhen Component Index

The mean learning accuracy and rate of change of the LSTM on the test set are computed for both sequences, before and after difference stabilization. The outcomes are elucidated in Table 7.

TABLE VII: Average learning accuracy and change rate before and after market index sequence difference stabilization
Sequence MSE RMSE MAE MAPE (rate of change measured relative to the value before differencing)
Shanghai Composite Index
  Before differencing: 2458.82 49.33 36.73 0.01
  After differencing: 1644.30 40.14 27.03 0.01
  Rate of change: -33.1% -18.6% -26.4% -26.6%
Shenzhen Component Index
  Before differencing: 46652.17 210.56 158.23 0.01
  After differencing: 32465.38 179.09 127.17 0.01
  Rate of change: -30.4% -14.9% -19.6% -17.4%

The data reveal that, after difference stabilization of the Shanghai Composite Index daily closing price series, the MSE of the LSTM predictions on the test set falls by approximately 33%, while for the Shenzhen Component Index it falls by roughly 30%; RMSE, MAE, and MAPE decrease correspondingly, indicating that differencing also improves LSTM prediction accuracy on real market data.

The real-world application of our approach is exemplified through the verification process using daily closing price data for the Shanghai Composite Index and the Shenzhen Composite Component Index. The discernible reduction in prediction error post-differential stabilization provides real-world evidence of the efficacy of our methodology. This practical verification solidifies the novelty of our approach in enhancing LSTM learning accuracy for diverse, non-stationary time series.

V Summary

In conclusion, this investigation into the effect of sequence smoothness on machine learning accuracy, specifically focusing on the application of the differential Long Short-Term Memory (LSTM) algorithm, yields significant findings with broad implications. The study demonstrates that LSTM training achieves heightened prediction accuracy after the stabilization of non-stationary sequences through differential techniques. The proposed differential-LSTM algorithm emerges as an effective means of optimizing the original LSTM model, addressing gaps in machine learning research related to time series data preprocessing.

Beyond the academic realm, the practical applications of these findings are substantial. By enhancing machine learning models, especially through the innovative differential-LSTM algorithm, the research contributes to real-world scenarios. The improved accuracy has the potential to revolutionize industries, support evidence-based decision-making in governance, and ultimately benefit society at large.

While the study focuses on univariate time series stationarity, it acknowledges the absence of exploration into the impact of multivariate time series stationarity on machine learning accuracy. Future research directions are proposed, encouraging the expansion of variables to enable a more comprehensive understanding of the impact on machine learning accuracy. Exploring interactions among different variables in a multivariate context could offer valuable insights for practical applications where multiple factors contribute to time series dynamics.

The suggested avenues for future research also include tailoring the proposed differential-LSTM algorithm for specific industry applications, such as finance, healthcare, or environmental monitoring. This customization could enhance its applicability and contribute to advancements in domain-specific contexts. Additionally, the integration of advanced technologies like explainable AI and interpretability in machine learning models is recommended to foster trust and acceptance in decision-making processes.

In summary, this investigation pioneers a novel approach to time series analysis by emphasizing the role of sequence smoothness in optimizing machine learning accuracy. The application of the proposed differential-LSTM algorithm not only addresses existing research gaps but also sets the stage for positive societal impacts through advancements in machine learning accuracy.
