Adaptive Law-Based Transformation (ALT): A Lightweight Feature Representation for Time Series Classification

Marcell T. Kurbucz, Balázs Hajós, Balázs P. Halmos, Vince Á. Molnár, Antal Jakovác

Department of Computational Sciences, Wigner Research Centre for Physics, 29-33 Konkoly-Thege Miklós Street, Budapest, 1121, Hungary
Department of Statistics, Corvinus University of Budapest, 8 Fővám Square, Budapest, 1093, Hungary
Faculty of Science, Eötvös Loránd University, 1/A Pázmány Péter Walkway, Budapest, 1117, Hungary
Faculty of Engineering and Natural Sciences, Tampere University, Kalevantie 4, Tampere, 33100, Finland

Abstract

Time series classification (TSC) is fundamental in numerous domains, including finance, healthcare, and environmental monitoring. However, traditional TSC methods often struggle with the inherent complexity and variability of time series data. Building on our previous work with the linear law-based transformation (LLT)—which improved classification accuracy by transforming the feature space based on key data patterns—we introduce adaptive law-based transformation (ALT). ALT enhances LLT by incorporating variable-length shifted time windows, enabling it to capture distinguishing patterns of various lengths and thereby handle complex time series more effectively. By mapping features into a linearly separable space, ALT provides a fast, robust, and transparent solution that achieves state-of-the-art performance with only a few hyperparameters.

Keywords: Time series classification, Representation learning, Feature engineering, Artificial intelligence
Journal: arXiv

1 Introduction

Time series classification (TSC) is essential in various domains such as finance, healthcare, and environmental monitoring, where the goal is to categorize temporal data into predefined classes [12, 13]. Traditional TSC approaches often rely on feature extraction methods designed to capture the temporal dynamics and structural patterns inherent in time series data [1, 4, 14]. However, these methods may struggle with the complexities and variability of time series data.

Our previous work introduced the linear law-based transformation (LLT) method, which performs uni- and multivariate TSC tasks by transforming the feature space based on identified governing patterns in the data [20]. LLT uses time-delay embedding and spectral decomposition to extract linear laws from training data and applies these laws to transform test data, resulting in improved classification accuracy with low computational cost.

In this paper, we build upon the LLT method by introducing an enhanced approach called adaptive law-based transformation (ALT) that utilizes variable-length shifted time windows. Unlike LLT, which operates on fixed-length windows, ALT explores patterns of varying lengths and shifts, making it more effective in capturing distinguishable patterns within time series data. This flexibility allows the method to identify local patterns of different scales, enhancing its ability to classify complex time series.

Similar to LLT, our method aims to transform features into a linearly separable feature space, offering a fast, robust, and transparent solution that achieves state-of-the-art performance. By reducing the need for extensive hyperparameter tuning and incorporating variable-length patterns, ALT simplifies the modeling process and enhances interpretability, setting it apart from mainstream neural networks and other deep learning techniques.

We evaluated ALT on eleven benchmark time series datasets, demonstrating its effectiveness compared to existing TSC techniques, including the original LLT method. The results show that the proposed approach not only achieves higher accuracy but also offers advantages in speed and transparency.

The remainder of this paper is organized as follows: Section 2 reviews related work, including the LLT method. Section 3 describes the datasets used and details our proposed method. Section 4 presents and discusses the experimental results. Finally, Section 5 concludes the paper and suggests directions for future research.

2 Related Work

Time series classification (TSC) methods can generally be grouped into three main categories: feature-based, distance-based, and deep learning-based approaches. Each category offers distinct advantages and faces specific challenges.

Feature-based approaches extract meaningful representations from time series data before applying classification algorithms. These representations may capture statistical descriptors [14], spectral transformations such as the discrete Fourier transform (DFT) or discrete wavelet transform (DWT) [1], or model-based features derived from techniques like autoregressive integrated moving average (ARIMA) [4]. Shapelet-based methods, which identify short, discriminative subsequences (shapelets) within the data [29], can be considered a subset of feature-based methods [18]. Shapelet-based approaches focus on local features that are highly interpretable and often effective for capturing localized variations, though they may struggle with multi-scale patterns and can be computationally intensive for long time series. Feature-based representations are typically classified using conventional methods such as logistic regression, random forests, or support vector machines (SVM).

Distance-based methods measure the similarity or dissimilarity between entire time series without explicitly transforming them into feature vectors. A well-known example is dynamic time warping (DTW) [26], which is robust to local temporal distortions and useful for aligning time series. However, these methods can become computationally expensive as the dataset size grows, and they lack an interpretable intermediate representation of the data.
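
For context, the DTW idea mentioned above can be sketched in a few lines. This is a textbook dynamic-programming formulation with an absolute-difference local cost, not code from any of the cited works and not part of ALT; the function name `dtw_distance` is chosen here for illustration only.

```python
import numpy as np

def dtw_distance(a, b):
    """Textbook DTW with absolute-difference local cost, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    # D[i, j] is the cheapest cost of aligning a[:i] with b[:j].
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the best of: a step in a only, in b only, or in both.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

# A repeated sample is absorbed by the warping path at no cost.
same_shape = dtw_distance([0, 1, 2], [0, 1, 1, 2])
# A constant level offset cannot be warped away.
shifted_level = dtw_distance([0, 0], [1, 1])
```

The quadratic table per pair of series is what makes such methods expensive on large datasets, and the alignment path itself is not a feature vector that a conventional classifier can consume.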

Deep learning-based methods automatically learn hierarchical feature representations directly from raw time series. Convolutional neural networks (CNNs) are adept at identifying local temporal correlations, while recurrent neural networks (RNNs) excel at capturing sequential patterns, including long-term dependencies [13, 19, 30]. While deep learning methods often achieve strong empirical performance, they typically require large labeled datasets, involve extensive hyperparameter tuning, and may lack transparency in their learned representations.

The linear law-based transformation (LLT) [20] integrates elements of feature-based and distance-based methods. By using time-delay embedding and spectral decomposition, LLT extracts governing patterns from training data and applies these patterns to unseen instances, transforming the feature space to improve classification accuracy. Despite its low computational cost, LLT relies on fixed-length windows, which can limit its ability to capture patterns of variable lengths.

Building on LLT, this work introduces the adaptive law-based transformation (ALT). ALT incorporates variable-length shifted time windows to capture local patterns across multiple temporal scales while maintaining interpretability and computational efficiency. This adaptive design enables ALT to effectively handle complex time series, bridging the gaps between diverse TSC approaches.

3 Data and Methodology

3.1 Employed Data

This study utilizes eleven real-world datasets sourced from the UCR Time Series Classification Archive [8, 9], available at https://www.timeseriesclassification.com (retrieved: January 15, 2025). The datasets are detailed in Table 1.

Table 1: Overview of the datasets employed in this study
| Dataset | Type | Classes | Features | Train Size | Test Size | Length | Balanced | Description |
|---|---|---|---|---|---|---|---|---|
| BasicMotions | Multivariate | 4 | 6 | 40 | 40 | 100 | Yes | Contains motion sensor data from four different activities performed by participants. |
| Coffee | Univariate | 2 | 1 | 28 | 28 | 286 | Yes | Spectrographs of two types of coffee beans, with the task of differentiating between them. |
| Epilepsy | Multivariate | 4 | 3 | 137 | 138 | 207 | Yes | Data collected from a tri-axial accelerometer while participants performed four tasks, including mimicking a seizure. |
| Epilepsy2 | Univariate | 2 | 1 | 80 | 11420 | 178 | Train only | Single-channel EEG measurements aimed at determining whether a participant is experiencing a seizure. |
| FordA | Univariate | 2 | 1 | 1320 | 3601 | 500 | Yes | Measurements of engine noise in automotive production, used to detect specific symptoms. |
| FordB | Univariate | 2 | 1 | 810 | 3636 | 500 | Yes | Similar to FordA, but focuses on detecting different symptoms in engine noise measurements. |
| GunPoint1 | Univariate | 2 | 1 | 50 | 150 | 150 | Yes | This dataset records X-axis hand motions for “Gun-Draw” and “Point” actions by two actors. |
| GunPoint2 | Univariate | 2 | 1 | 135 | 316 | 150 | Yes | Variation of the GunPoint dataset focusing on distinguishing participants from different age groups. |
| GunPoint3 | Univariate | 2 | 1 | 135 | 316 | 150 | Yes | Variation of the GunPoint dataset focusing on distinguishing male and female participants. |
| GunPoint4 | Univariate | 2 | 1 | 135 | 316 | 150 | Yes | Variation of the GunPoint dataset focusing on distinguishing old and young participants. |
| PowerCons | Univariate | 2 | 1 | 180 | 180 | 144 | Yes | Device power consumption data, with the task of determining the operational status. |
Note: The original names of the GunPoint datasets, marked by numbers, are as follows: 1. GunPoint; 2. GunPointAgeSpan; 3. GunPointMaleVersusFemale; 4. GunPointOldVersusYoung.

3.2 Feature Representation and Classification

A general TSC task can be formalized as follows. The input data is represented as $x_{t}^{i,j}$, where $t\in\{1,2,\dots,h\}$ denotes the observation times, $i\in\{1,2,\dots,\tau\}$ identifies the instances, and $j\in\{1,2,\dots,m\}$ indexes the different input series belonging to a given instance. The output $y^{i}\in\{1,2,\dots,c\}$ identifies the class of instance $i$. The task is to predict the classes from the input data. To address this task, we use the following algorithm:

  1. [A1]

    Data Splitting. Divide the instances into learning ($Lr$), training ($Tr$), and test ($Te$) subsets using random selection stratified by class representation.

  2. [A2]

    Sequence Extraction. (For each $Lr$, $j$, and $(r,l,k)$): Extract $r$-length sequences using shifted time windows (shifted by $k$), and keep $2l-1$ evenly spaced points from each sequence. The triplets $(r,l,k)$ are pre-defined parameters, where $r\leq h$ and $(2l-2)\mid(r-1)$. For a given $Lr$, $j$, and $(r,l,k)$, $\left\lfloor\frac{h-r+1}{k}\right\rfloor$ sequences are generated.

  3. [A3]

    Shapelet Vectors. (For each sequence): Perform $l$-dimensional time-delay embedding [27] ($S$), where $2l-1$ denotes the length of the given sequence and $S$ is a symmetric matrix. Perform spectral decomposition of $S$. The eigenvector belonging to the eigenvalue with the smallest absolute value ($\in\mathbb{R}^{+}\cup\{0\}$) is called the shapelet vector $v$, and $Sv\approx 0$. (Note that this step relates to principal component analysis (PCA) [15], which extracts informative directions using the eigenvectors of the largest eigenvalues. In contrast, ALT focuses on the dimension where $S$ shows the least variability, using the corresponding vector $v$ to compare shapelets.)

  4. [A4]

    Shapelet Matrices. (For each $j$ and $(r,l,k)$): Use the shapelet vectors related to the same pair of $j$ and $(r,l,k)$ as the column vectors of the shapelet matrix $P$. Group the patterns within $P$ by their related class ($c$ classes result in $c$ partitions within the matrix $P$).

  5. [A5]

    Transformation. (For each $Tr$, $j$, and $(r,l,k)$): Let $s=\frac{r-1}{2l-2}$ and $o=\left\lfloor\frac{h-sl+1}{k}\right\rfloor$. Embed the instance into an $o\times l$ matrix $A$ as follows:

    $$A=\begin{pmatrix}x_{1}^{i,j}&x_{s+1}^{i,j}&\dots&x_{(l-1)s+1}^{i,j}\\ x_{k+1}^{i,j}&x_{k+s+1}^{i,j}&\dots&x_{k+(l-1)s+1}^{i,j}\\ \vdots&\vdots&\ddots&\vdots\\ x_{(o-1)k+1}^{i,j}&x_{(o-1)k+s+1}^{i,j}&\dots&x_{(o-1)k+(l-1)s+1}^{i,j}\end{pmatrix}. \tag{1}$$

    Right-multiply this matrix by the shapelet matrix $P$ related to the same pair of $j$ and $(r,l,k)$, that is, $O=AP$. Shapelets from each class in $P$ “compete” to transform the matrix $A$ close to null vectors.

  6. [A6]

    Feature Generation. (For each transformed matrix): Square the values of the resulting $O$ and partition it by the class from which the shapelets originate. Different methods can be used to extract features from the resulting partitions. For example, identify a specific percentile in each row, then calculate different statistical indicators from these percentiles. Alternatively, calculate a statistical indicator from all the values in a partition. After this step, the original $m$ signals of an instance are represented in an $m\times c\times n\times g$ dimensional feature space, where $n$ is the number of extraction methods used and $g$ is the number of $(r,l,k)$ triplets used.

  7. [B1]

    Classifier Tuning and Evaluation. Utilize the new features to tune advanced classifiers (e.g., $K$-nearest neighbors) via Bayesian hyperparameter optimization and cross-validation. Evaluate the classifiers’ accuracy, tuning time, and classification time on the training set.

  8. [B2]

    Test and Benchmark. Similar to steps [A5–A6], transform the test set ($Te$), generate new features, and apply the tuned classifiers. Measure out-of-sample classification speed and accuracy. Benchmark the results against state-of-the-art methods.
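
Steps [A2–A5] can be sketched for a single input series ($m=1$) and a single $(r,l,k)$ triplet as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embedding is assumed to be the $l\times l$ matrix with entries $S_{ab}=y_{a+b}$ (symmetric by construction and consistent with a sequence of $2l-1$ points), indices are 0-based, and all function names and the sinusoidal toy data are hypothetical.

```python
import numpy as np

def extract_sequences(x, r, l, k):
    """[A2] Windows of length r shifted by k, thinned to 2l-1 evenly spaced points."""
    if l < 2 or (r - 1) % (2 * l - 2) != 0:
        raise ValueError("need l >= 2 and (2l-2) dividing (r-1)")
    s = (r - 1) // (2 * l - 2)       # spacing between the kept points
    count = (len(x) - r + 1) // k    # floor((h - r + 1) / k) sequences
    return [x[i * k : i * k + r : s] for i in range(count)]

def shapelet_vector(seq):
    """[A3] Eigenvector of S belonging to the smallest absolute eigenvalue."""
    l = (len(seq) + 1) // 2
    # Assumed embedding: S[a, b] = seq[a + b], an l x l symmetric matrix.
    S = np.array([seq[a : a + l] for a in range(l)])
    eigvals, eigvecs = np.linalg.eigh(S)
    return eigvecs[:, np.argmin(np.abs(eigvals))]

def shapelet_matrix(learn_by_class, r, l, k):
    """[A4] Shapelet vectors as columns of P, plus the class of each column."""
    cols, col_class = [], []
    for label, series_list in learn_by_class.items():
        for x in series_list:
            for seq in extract_sequences(x, r, l, k):
                cols.append(shapelet_vector(seq))
                col_class.append(label)
    return np.column_stack(cols), np.array(col_class)

def embed(x, r, l, k):
    """[A5] The o x l matrix A of Eq. (1), written with 0-based indices."""
    s = (r - 1) // (2 * l - 2)
    o = (len(x) - s * l + 1) // k
    idx = k * np.arange(o)[:, None] + s * np.arange(l)[None, :]
    return x[idx]

# Toy learning set: each class is a sinusoid with its own frequency.
t = np.arange(40)
learn = {0: [np.sin(0.3 * t + 0.1)], 1: [np.sin(1.1 * t + 0.7)]}
P, col_class = shapelet_matrix(learn, r=5, l=3, k=1)

# A new series that follows the class-0 law with a different phase.
x_new = np.sin(0.3 * t + 2.0)
O = embed(x_new, r=5, l=3, k=1) @ P

# Mean squared entry of O within each class partition.
energy = {c: float(np.mean(O[:, col_class == c] ** 2)) for c in (0, 1)}
```

In this toy setting a sinusoid obeys the linear recurrence $y_{t-1}-2\cos(\omega)\,y_{t}+y_{t+1}=0$, so the class-0 shapelet vectors map the class-0 test series to (numerically) zero while the class-1 vectors do not. This is the "competition" between class partitions that step [A6] turns into features.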

Figure 1 illustrates the complete feature representation and classification procedure, including a law selection step [C1] that is planned for implementation in a future study—see Section 5 for more details.

Figure 1: Applied ML framework

3.3 Software and Settings

We implemented steps [A1–A6] in Python to transform the original feature spaces. The transformed features were then used to train KNN [2] and SVM [7] classifiers in the MATLAB Classification Learner App to perform steps [B1–B2]. (More information can be found at https://www.mathworks.com/help/stats/classificationlearner-app.html; retrieved: January 15, 2025.) During the classification procedure, a 30-step Bayesian hyperparameter optimization with 5-fold cross-validation was applied. (As a benchmark, we also used optimizable neural networks on the raw time-series data with a 500-step Bayesian hyperparameter optimization and 5-fold cross-validation. These benchmark results are presented in Table A.1 in the Appendix.)
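
For readers without access to MATLAB, the tuning loop of step [B1] can be approximated in plain NumPy. The sketch below is an illustrative stand-in, not the Classification Learner workflow: it replaces Bayesian optimization with a small grid search over the number of neighbors, uses unstratified 5-fold cross-validation, and runs on synthetic two-cluster features; every name in it is hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, X, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dist = np.linalg.norm(X[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in y_train[nearest]])

def cv_accuracy(X, y, k, folds=5, seed=0):
    """Mean validation accuracy over a shuffled k-fold split."""
    order = np.random.default_rng(seed).permutation(len(y))
    scores = []
    for held_out in np.array_split(order, folds):
        keep = np.ones(len(y), dtype=bool)
        keep[held_out] = False
        pred = knn_predict(X[keep], y[keep], X[held_out], k)
        scores.append(np.mean(pred == y[held_out]))
    return float(np.mean(scores))

# Synthetic stand-in for transformed features: two well-separated clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
y = np.repeat([0, 1], 20)

# Grid search swapped in for Bayesian hyperparameter optimization.
grid = (1, 3, 5)
best_k = max(grid, key=lambda k: cv_accuracy(X, y, k))
```

When the transformed features are close to linearly separable, as ALT aims for, even this simple search saturates quickly, which is in line with the short 30-step optimization budget reported above.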

We also optimized the hyperparameters $(r,l,k)$ of the proposed method to achieve the highest classification accuracy. Furthermore, during feature extraction [A6], we selected the pairs of statistical indicators that yielded the best performance: from each row of the matrix $O$, either the mean or the 5th percentile was computed, and these row-level values were then summarized by the mean, the variance, or the third and fourth moments. The exact parameter settings applied to each dataset are detailed in Table A.2 in the Appendix.
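
The indicator pairs described above can be sketched as follows. The sketch assumes that the row-level statistic is taken across the columns of one class partition of the squared $O$ matrix and that "moments" means central moments; both are interpretations rather than details confirmed by the text, and the names `ROW_STATS`, `SUMMARY_STATS`, and `extract_features` are hypothetical.

```python
import numpy as np

def central_moment(v, order):
    """Central moment of the given order (an assumed reading of 'moment')."""
    return float(np.mean((v - np.mean(v)) ** order))

# First element of a pair: one value per row of a class partition.
ROW_STATS = {
    "mean": lambda part: part.mean(axis=1),
    "p5": lambda part: np.percentile(part, 5, axis=1),
}
# Second element of a pair: one number summarizing the row-level values.
SUMMARY_STATS = {
    "mean": lambda v: float(np.mean(v)),
    "variance": lambda v: float(np.var(v)),
    "moment3": lambda v: central_moment(v, 3),
    "moment4": lambda v: central_moment(v, 4),
}

def extract_features(O, col_class, pairs):
    """[A6] Square O, split its columns by class, apply each indicator pair."""
    squared = np.asarray(O, dtype=float) ** 2
    col_class = np.asarray(col_class)
    features = []
    for label in np.unique(col_class):
        part = squared[:, col_class == label]
        for row_stat, summary in pairs:
            per_row = ROW_STATS[row_stat](part)
            features.append(SUMMARY_STATS[summary](per_row))
    return np.array(features)

# Tiny example: two shapelet columns per class, two indicator pairs.
O_toy = np.array([[1.0, 2.0, 3.0, 4.0],
                  [0.0, 2.0, 1.0, 3.0]])
feats = extract_features(O_toy, [0, 0, 1, 1],
                         [("mean", "mean"), ("p5", "variance")])
```

For one input series and one triplet this yields $c\times n$ values (here $2\times 2=4$); repeating it over all $m$ series and $g$ triplets gives the feature count stated in step [A6].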

4 Results and Discussion

The classification outcomes obtained with ALT are summarized in Table 2.

Table 2: Classification results
| Dataset | Validation Accuracy | Test Accuracy | Classification Method | Transformation Time (s) | Classification Time (s) | Benchmark |
|---|---|---|---|---|---|---|
| BasicMotions | 100.0% | 100.0% | KNN | 1.90 | 9.00 | 95.3–100.0% |
| Coffee | 100.0% | 100.0% | KNN | 1.22 | 6.41 | 78.6–100.0% |
| Epilepsy | 96.1% | 97.8% | SVM | 84.58 | 12.87 | 85.0–100.0% |
| Epilepsy2 | 95.0% | 93.8% | KNN | 48.09 | 7.43 | 89.4–100.0% |
| FordA | 97.5% | 97.5% | SVM | 915.80 | 28.95 | 49.0–100.0% |
| FordB | 84.9% | 94.4% | KNN | 3069.00 | 14.54 | 50.9–100.0% |
| GunPoint1 | 100.0% | 96.7% | SVM | 7.49 | 7.90 | 68.0–100.0% |
| GunPoint2 | 98.5% | 93.0% | SVM | 16.25 | 13.96 | 57.0–100.0% |
| GunPoint3 | 100.0% | 99.4% | KNN | 6.99 | 6.85 | 68.0–100.0% |
| GunPoint4 | 100.0% | 100.0% | KNN | 2.32 | 6.62 | 88.0–100.0% |
| PowerCons | 92.4% | 93.3% | SVM | 3.45 | 9.07 | 73.0–100.0% |
Note: The original names of the GunPoint datasets, marked by numbers, are as follows: 1. GunPoint; 2. GunPointAgeSpan; 3. GunPointMaleVersusFemale; 4. GunPointOldVersusYoung. Results were obtained using 30 iterations of Bayesian hyperparameter optimization in the MATLAB Classification Learner App. Benchmarks were derived from the test accuracies reported by the studies summarized in Table A.3 in the Appendix.

As Table 2 shows, ALT consistently achieves high validation and test accuracies across all eleven datasets, including perfect scores (100%) on BasicMotions, Coffee, and GunPoint4. Transformations typically complete within a practical time frame; however, for larger datasets (e.g., FordB), the transformation step can be more time-consuming. This overhead arises primarily from shapelet vector generation and spectral decomposition steps. Once the transformed features are computed, classification (via KNN or SVM) is relatively fast.

Table A.2 details the hyperparameter and feature-extraction settings employed for each experiment, including the ratio of data used for shapelet generation versus classifier training. Notably, only a small subset of the data is typically required for learning shapelets, highlighting ALT’s efficiency in deriving class-relevant patterns.
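
The split behind this ratio can be sketched as a class-stratified random partition. The sketch assumes that the "Learn-train ratio" in Table A.2 is the share of the non-test instances set aside for learning shapelets, with the remainder used for classifier training; this reading, the function name, and the toy labels are assumptions made for illustration.

```python
import numpy as np

def stratified_learn_train_split(y, learn_ratio, seed=0):
    """Split instance indices into learning and training sets, class by class."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    learn_idx, train_idx = [], []
    for label in np.unique(y):
        members = rng.permutation(np.flatnonzero(y == label))
        # Keep at least one learning instance per class.
        n_learn = max(1, int(learn_ratio * len(members)))
        learn_idx.extend(members[:n_learn])
        train_idx.extend(members[n_learn:])
    return np.sort(np.array(learn_idx)), np.sort(np.array(train_idx))

# Toy labels: 20 instances in each of two classes, learn ratio 0.25.
labels = np.repeat([0, 1], 20)
learn_idx, train_idx = stratified_learn_train_split(labels, learn_ratio=0.25)
```

With a ratio of 0.25, five instances per class are used for shapelet learning and fifteen per class remain for tuning the classifier.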

For additional context, Table A.1 compares ALT’s accuracy to that of a neural network benchmark using an optimizable feed-forward architecture (MLP) implemented in MATLAB on the raw time-series data. (Neural networks were tuned using 500-step Bayesian hyperparameter optimization and 5-fold cross-validation.) On most datasets, regardless of their length, ALT outperforms or closely matches the neural network solution despite having far fewer hyperparameters and a shorter optimization process. Furthermore, the benchmark compilation in Table 2 demonstrates that ALT is highly competitive against a wide range of state-of-the-art approaches, including shapelet-based methods and advanced neural and kernel techniques.

Across tasks, ALT’s ability to capture subsequence patterns of varying lengths proves advantageous, particularly for datasets with subtle class-distinguishing events (e.g., Epilepsy, GunPoint2). This adaptability is reflected in large improvements over the baseline neural network on more complex sensor signals (test accuracies of 97.5% versus 72.0% on FordA and 94.4% versus 66.0% on FordB; see Tables 2 and A.1), although the neural network attains higher test accuracy on GunPoint2, GunPoint3, and PowerCons. Certain tasks (e.g., Coffee, GunPoint4) are relatively straightforward for most algorithms; ALT also reaches 100% test accuracy on them while retaining interpretability by design.

5 Conclusion and Future Work

In this paper, we introduced ALT, a novel method for time series classification that generalizes our previous LLT approach. By incorporating variable-length shifted windows, it captures local subsequence patterns of different scales and embeds them in a linearly separable feature space. Extensive experiments across eleven diverse datasets confirm ALT’s capacity to deliver competitive or state-of-the-art results, as evidenced by Tables 2, A.1, and A.3.

In future work, we plan to integrate data-driven mechanisms for automatically tuning $(r,l,k)$, thus further reducing manual hyperparameter exploration. Additionally, we aim to investigate shapelet pruning techniques (see step [C1] in Figure 1) to lower computational overhead, making ALT scalable to very large time series with minimal performance loss. The method’s interpretability could also be enriched by qualitative visualization of extracted shapelet vectors, potentially illuminating latent domain structures. Finally, exploring ALT’s capabilities in specialized domains like multi-channel EEG monitoring or IoT anomaly detection may reveal further performance gains and highlight the role of domain-specific knowledge in shaping the transformation pipeline.

Acknowledgments

The research was supported by the Hungarian Government and the European Union in the framework of a Grant Agreement No. MILAB RRF-2.3.1-21-2022-00004. Project no. PD142593 was implemented with the support provided by the Ministry of Culture and Innovation of Hungary from the National Research, Development, and Innovation Fund, financed under the PD_22 “OTKA” funding scheme.

References

  • Agrawal et al. [1993] Agrawal, R., Faloutsos, C., & Swami, A. N. (1993). Efficient similarity search in sequence databases. In Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms (FODO) (pp. 69–84). Springer.
  • Altman [1992] Altman, N. S. (1992). An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, 46, 175–185.
  • Bostrom [2018] Bostrom, A. (2018). Shapelet transforms for univariate and multivariate time series classification. Ph.D. thesis University of East Anglia.
  • Box et al. [2015] Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time series analysis: forecasting and control. John Wiley & Sons.
  • Cai et al. [2024] Cai, R., Peng, L., Lu, Z., Zhang, K., & Liu, Y. (2024). Dcs: Debiased contrastive learning with weak supervision for time series classification. In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5625–5629). doi:10.1109/ICASSP48485.2024.10446381.
  • Ceni & Gallicchio [2023] Ceni, A., & Gallicchio, C. (2023). Residual reservoir computing neural networks for time-series classification. ESANN.
  • Cortes & Vapnik [1995] Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297.
  • Dau et al. [2019] Dau, H. A., Bagnall, A., Kamgar, K., Yeh, C.-C. M., Zhu, Y., Gharghabi, S., Ratanamahatana, C. A., & Keogh, E. (2019). The ucr time series archive. IEEE/CAA Journal of Automatica Sinica, 6, 1293–1305.
  • Dau et al. [2018] Dau, H. A., Keogh, E., Kamgar, K., Yeh, C.-C. M., Zhu, Y., Gharghabi, S., Ratanamahatana, C. A., Yanping, Hu, B., Begum, N., Bagnall, A., Mueen, A., Batista, G., & Hexagon-ML (2018). The ucr time series classification archive. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/.
  • Dhariyal et al. [2023] Dhariyal, B., Le Nguyen, T., & Ifrim, G. (2023). Back to basics: A sanity check on modern time series classification algorithms. In International Workshop on Advanced Analytics and Learning on Temporal Data (pp. 205–229). Springer.
  • Eldele et al. [2023] Eldele, E., Ragab, M., Chen, Z., Wu, M., Kwoh, C.-K., Li, X., & Guan, C. (2023). Self-supervised contrastive representation learning for semi-supervised time-series classification. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Esling & Agon [2012] Esling, P., & Agon, C. (2012). Time-series data mining. ACM Computing Surveys (CSUR), 45, 1–34.
  • Fawaz et al. [2019] Fawaz, H. I., Forestier, G., Weber, J., Idoumghar, L., & Muller, P.-A. (2019). Deep learning for time series classification: a review. Data Mining and Knowledge Discovery, 33, 917–963.
  • Fulcher & Jones [2014] Fulcher, B. D., & Jones, N. S. (2014). Highly comparative feature-based time-series classification. IEEE Transactions on Knowledge and Data Engineering, 26, 3026–3037.
  • Gao et al. [2021] Gao, F., Tian, T., Yao, T., & Zhang, Q. (2021). Human gait recognition based on multiple feature combination and parameter optimization algorithms. Computational Intelligence and Neuroscience, 2021, 6693206.
  • Hussein et al. [2024] Hussein, D., Nelson, L., & Bhat, G. (2024). Sensor-aware classifiers for energy-efficient time series applications on iot devices. doi:10.48550/arXiv.2407.08715.
  • Ito & Chakraborty [2020] Ito, H., & Chakraborty, B. (2020). Fast and interpretable transformation for time series classification: A comparative study. International Journal of Applied Science and Engineering, 17, 269–280. doi:10.6703/IJASE.202009_17(3).269.
  • Ji et al. [2019] Ji, C., Zhao, C., Pan, L., Liu, S., Yang, C., & Meng, X. (2019). A just-in-time shapelet selection service for online time series classification. Computer Networks, 157, 89–98.
  • Karim et al. [2019] Karim, F., Majumdar, S., Darabi, H., & Chen, S. (2019). Multivariate lstm-fcns for time series classification. Neural Networks, 116, 237–245.
  • Kurbucz et al. [2022] Kurbucz, M. T., Pósfay, P., & Jakovác, A. (2022). Facilitating time series classification by linear law-based feature space transformation. Scientific Reports, 12, 18026.
  • Lin et al. [2023] Lin, C., Wen, X., Cao, W., Huang, C., Bian, J., Lin, S., & Wu, Z. (2023). Nutime: Numerically multi-scaled embedding for large-scale time series pretraining. arXiv preprint arXiv:2310.07402.
  • Mukhopadhyay et al. [2024] Mukhopadhyay, S., Dey, S., Mukherjee, A., Pal, A., & Ashwin, S. (2024). Time series classification on edge with lightweight attention networks. In 2024 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops) (pp. 487–492). IEEE.
  • Pasos Ruiz et al. [2021] Pasos Ruiz, A., Flynn, M., Large, J., Middlehurst, M., & Bagnall, A. (2021). The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery, 35, 1–49. doi:10.1007/s10618-020-00727-3.
  • Saini et al. [2024] Saini, U. S., Zhuang, Z., Yeh, C.-C. M., Zhang, W., & Papalexakis, E. E. (2024). Analysis of causal and non-causal convolution networks for time series classification. In Proceedings of the 2024 SIAM International Conference on Data Mining (SDM) (pp. 797–805). doi:10.1137/1.9781611978032.91.
  • Schlegel & Keim [2023] Schlegel, U., & Keim, D. A. (2023). A deep dive into perturbations as evaluation technique for time series xai. In L. Longo (Ed.), Explainable Artificial Intelligence (pp. 165–180). Cham: Springer Nature Switzerland.
  • Senin [2008] Senin, P. (2008). Dynamic time warping algorithm review. Information and Computer Science Department University of Hawaii at Manoa Honolulu, USA, 855, 40.
  • Takens [1981] Takens, F. (1981). Dynamical systems and turbulence. Warwick, 1980 (pp. 366–381).
  • Xi et al. [2023] Xi, W., Jain, A., Zhang, L., & Lin, J. (2023). Lb-simtsc: An efficient similarity-aware graph neural network for semi-supervised time series classification. arXiv preprint arXiv:2301.04838.
  • Ye & Keogh [2009] Ye, L., & Keogh, E. (2009). Time series shapelets: a new primitive for data mining. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 947–956). ACM.
  • Zheng et al. [2014] Zheng, Y., Liu, Q., Chen, E., Ge, Y., & Zhao, J. (2014). Time series classification using multi-channels deep convolutional neural networks. In International Conference on Web-Age Information Management (pp. 298–310). Springer.

Appendix

Table A.1: Classification of raw datasets with neural networks
| Dataset | Validation accuracy | Test accuracy | Training time (s) |
|---|---|---|---|
| BasicMotions | 72.5% | 87.5% | 1941.7 |
| Coffee | 100.0% | 100.0% | 1105.7 |
| Epilepsy | 65.7% | 67.4% | 2745.3 |
| Epilepsy2 | 80.0% | 89.9% | 1248.9 |
| FordA | 72.7% | 72.0% | 7347.9 |
| FordB | 63.0% | 66.0% | 5211.7 |
| GunPoint1 | 98.0% | 94.0% | 1380.7 |
| GunPoint2 | 96.3% | 98.1% | 2023.2 |
| GunPoint3 | 99.3% | 99.7% | 1513.2 |
| GunPoint4 | 100.0% | 100.0% | 866.5 |
| PowerCons | 100.0% | 98.9% | 1300.9 |
Note: The original names of the GunPoint datasets, marked by numbers, are as follows: 1. GunPoint; 2. GunPointAgeSpan; 3. GunPointMaleVersusFemale; 4. GunPointOldVersusYoung. Results were obtained using 500 iterations of Bayesian hyperparameter optimization and 5-fold cross-validation in the MATLAB Classification Learner App.
Table A.2: Applied parameters
| Dataset | Learn-train ratio | Method | Used $(r,l,k)$ values |
|---|---|---|---|
| BasicMotions | 0.25 | mean - mean, 5th percentile - 4th moment | (53, 27, 1) |
| Coffee | 0.25 | 5th percentile - mean | (3, 2, 1) |
| Epilepsy | 0.25 | mean - mean | (29, 15, 1), (69, 35, 1), (89, 45, 1), (149, 75, 1), (169, 85, 1), (189, 95, 1) |
| Epilepsy2 | 0.25 | 5th percentile - mean, 5th percentile - variance | (19, 10, 1), (29, 15, 1) |
| FordA | 0.20 | 5th percentile - mean | (23, 12, 1), (29, 15, 1), (85, 43, 1), (95, 48, 1), (205, 103, 1) |
| FordB | 0.50 | 5th percentile - mean | (19, 10, 1), (39, 20, 1), (129, 65, 1), (139, 70, 1), (159, 80, 1), (169, 85, 1), (179, 90, 1), (199, 100, 1), (209, 105, 1), (275, 138, 1) |
| GunPoint1 | 0.20 | mean - mean | (7, 4, 1), (31, 2, 1), (51, 6, 1), (81, 6, 1), (121, 11, 1), (121, 31, 1), (121, 61, 1), (121, 5, 1) |
| GunPoint2 | 0.50 | mean - mean, 5th percentile - excess kurtosis | (49, 25, 1), (59, 30, 1), (69, 35, 1), (89, 45, 1) |
| GunPoint3 | 0.20 | mean - mean, 5th percentile - mean | (3, 2, 1), (19, 10, 1), (39, 20, 1), (109, 55, 1) |
| GunPoint4 | 0.50 | mean - mean | (3, 2, 1) |
| PowerCons | 0.20 | mean - mean | (3, 2, 1), (99, 50, 1) |
Note: The original names of the GunPoint datasets, marked by numbers, are as follows: 1. GunPoint; 2. GunPointAgeSpan; 3. GunPointMaleVersusFemale; 4. GunPointOldVersusYoung. Results were obtained using 30 iterations of Bayesian hyperparameter optimization in the MATLAB Classification Learner App.
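
The triplets in Table A.2 can be checked against the constraints of step [A2], and the resulting feature counts follow from the dimension stated in step [A6]. The short sketch below does this for GunPoint1 and BasicMotions; it reads $m\times c\times n\times g$ as a product and counts each indicator pair in the Method column as one extraction method, both of which are interpretations. The helper names are hypothetical.

```python
def spacing(r, l):
    """Step s = (r-1)/(2l-2) between kept points; requires (2l-2) | (r-1)."""
    if l < 2 or (r - 1) % (2 * l - 2) != 0:
        raise ValueError(f"invalid pair r={r}, l={l}")
    return (r - 1) // (2 * l - 2)

def sequence_count(h, r, k):
    """Number of r-length windows per series: floor((h - r + 1) / k)."""
    return (h - r + 1) // k

def feature_count(m, c, n, g):
    """Size of the transformed feature space, read as the product m*c*n*g."""
    return m * c * n * g

# GunPoint1 triplets from Table A.2 (series length h = 150 from Table 1).
gunpoint1 = [(7, 4, 1), (31, 2, 1), (51, 6, 1), (81, 6, 1),
             (121, 11, 1), (121, 31, 1), (121, 61, 1), (121, 5, 1)]
spacings = [spacing(r, l) for r, l, _ in gunpoint1]
windows_first = sequence_count(150, *gunpoint1[0][::2])

# GunPoint1: m = 1 series, c = 2 classes, n = 1 method pair, g = 8 triplets.
gunpoint1_features = feature_count(1, 2, 1, len(gunpoint1))
# BasicMotions: m = 6 series, c = 4 classes, n = 2 method pairs, g = 1 triplet.
basicmotions_features = feature_count(6, 4, 2, 1)
```

Under these readings, all GunPoint1 triplets satisfy the divisibility constraint, and the transformed representations stay small (16 features for GunPoint1 and 48 for BasicMotions).
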
Table A.3: Literature benchmarks
| Dataset | Test accuracy (%) | Reference | Method |
|---|---|---|---|
| BasicMotions | 95.3–100.0 | [23] | DTWD, ROCKET, CIF, HIVE-COTE |
| Coffee | 96.0–100.0 | [10] | RandomForest, Rocket, Minirocket, Multirocket |
| Coffee | 78.6–100.0 | [17] | Raw-ResNet, FoldCount-1NN, TimeAxisArea-1NN, DWT-1NN |
| Epilepsy | 96.3–100.0 | [23] | DTWD, ROCKET, CIF, HIVE-COTE |
| Epilepsy | 95.7–97.1 | [5] | Debiased Contrastive Learning with Weak Supervision |
| Epilepsy | 85.0–99.0 | [16] | CNN |
| Epilepsy2 | 89.4–100.0 | [21] | Multi-Scaled Embedding for Large-Scale Time-Series Pretraining |
| FordA | 96.8–100.0 | [22] | Lightweight Attention Networks |
| FordA | 79.3–86.4 | [28] | LB-SimTSC (Similarity-Aware Graph Neural Network) |
| FordA | 49.0–95.0 | [10] | RandomForest, Rocket, Minirocket, Multirocket |
| FordA | 74.54–95.6 | [24] | LSRSC (Centered Kernel Alignment) |
| FordA | 56.7–93.6 | [17] | Raw-ResNet, FoldCount-1NN, TimeAxisArea-1NN, DWT-1NN |
| FordA | 53.4–71.3 | [6] | Residual Reservoir Computing Neural Networks |
| FordA | 89.0 | [25] | Convolutional Neural Networks |
| FordA | 96.5 | [3] | Shapelet Transform |
| FordA | 50.6–90.9 | [11] | Time-Series/Class-Aware Temporal and Contextual Contrasting |
| FordB | 92.9–100.0 | [22] | Lightweight Attention Networks |
| FordB | 49.0–83.0 | [10] | RandomForest, Rocket, Minirocket, Multirocket |
| FordB | 63.8–83.1 | [24] | LSRSC (Centered Kernel Alignment) |
| FordB | 53.1–81.7 | [17] | Raw-ResNet, FoldCount-1NN, TimeAxisArea-1NN, DWT-1NN |
| FordB | 51.9–56.4 | [6] | Residual Reservoir Computing Neural Networks |
| FordB | 70.0 | [25] | Convolutional Neural Networks |
| FordB | 91.5 | [3] | Shapelet Transform |
| FordB | 50.9–88.2 | [11] | Time-Series/Class-Aware Temporal and Contextual Contrasting |
| GunPoint1 | 85.0–100.0 | [10] | RandomForest, Rocket, Minirocket, Multirocket |
| GunPoint1 | 68.0–99.0 | [17] | Raw-ResNet, FoldCount-1NN, TimeAxisArea-1NN, DWT-1NN |
| GunPoint2 | 57.0–100.0 | [10] | RandomForest, Rocket, Minirocket, Multirocket |
| GunPoint3 | 68.0–100.0 | [10] | RandomForest, Rocket, Minirocket, Multirocket |
| GunPoint4 | 88.0–100.0 | [10] | RandomForest, Rocket, Minirocket, Multirocket |
| PowerCons | 73.0–100.0 | [10] | RandomForest, Rocket, Minirocket, Multirocket |
Note: The original names of the GunPoint datasets, marked by numbers, are as follows: 1. GunPoint; 2. GunPointAgeSpan; 3. GunPointMaleVersusFemale; 4. GunPointOldVersusYoung.