Adaptive Law-Based Transformation (ALT): A Lightweight Feature Representation for Time Series Classification

Marcell T. Kurbucz ([email protected]), Balázs Hajós, Balázs P. Halmos, Vince Á. Molnár, Antal Jakovác

Department of Computational Sciences, Wigner Research Centre for Physics, 29-33 Konkoly-Thege Miklós Street, Budapest, 1121, Hungary
Department of Statistics, Corvinus University of Budapest, 8 Fővám Square, Budapest, 1093, Hungary
Faculty of Science, Eötvös Loránd University, 1/A Pázmány Péter Walkway, Budapest, 1117, Hungary
Faculty of Engineering and Natural Sciences, Tampere University, Kalevantie 4, Tampere, 33100, Finland

Abstract

Time series classification (TSC) is fundamental in numerous domains, including finance, healthcare, and environmental monitoring. However, traditional TSC methods often struggle with the inherent complexity and variability of time series data. Building on our previous work with the linear law-based transformation (LLT)—which improved classification accuracy by transforming the feature space based on key data patterns—we introduce adaptive law-based transformation (ALT). ALT enhances LLT by incorporating variable-length shifted time windows, enabling it to capture distinguishing patterns of various lengths and thereby handle complex time series more effectively. By mapping features into a linearly separable space, ALT provides a fast, robust, and transparent solution that achieves state-of-the-art performance with only a few hyperparameters.

Keywords: Time series classification, Representation learning, Feature engineering, Artificial intelligence

Journal: arXiv

1 Introduction

Time series classification (TSC) is essential in various domains such as finance, healthcare, and environmental monitoring, where the goal is to categorize temporal data into predefined classes [12, 13]. Traditional TSC approaches often rely on feature extraction methods designed to capture the temporal dynamics and structural patterns inherent in time series data [1, 4, 14]. However, these methods may struggle with the complexities and variability of time series data.

Our previous work introduced the linear law-based transformation (LLT) method, which performs uni- and multivariate TSC tasks by transforming the feature space based on identified governing patterns in the data [20]. LLT uses time-delay embedding and spectral decomposition to extract linear laws from training data and applies these laws to transform test data, resulting in improved classification accuracy with low computational cost.

In this paper, we build upon the LLT method by introducing an enhanced approach called adaptive law-based transformation (ALT) that utilizes variable-length shifted time windows. Unlike LLT, which operates on fixed-length windows, ALT explores patterns of varying lengths and shifts, making it more effective in capturing distinguishable patterns within time series data. This flexibility allows the method to identify local patterns of different scales, enhancing its ability to classify complex time series.

Similar to LLT, our method aims to transform features into a linearly separable feature space, offering a fast, robust, and transparent solution that achieves state-of-the-art performance. By reducing the need for extensive hyperparameter tuning and incorporating variable-length patterns, ALT simplifies the modeling process and enhances interpretability, setting it apart from mainstream neural networks and other deep learning techniques.

We evaluated ALT on eleven benchmark time series datasets, demonstrating its effectiveness compared to existing TSC techniques, including the original LLT method. The results show that the proposed approach not only achieves higher accuracy but also offers advantages in speed and transparency.

The remainder of this paper is organized as follows: Section 2 reviews related work, including the LLT method. Section 3 describes the datasets used and details our proposed method. Section 4 presents and discusses the experimental results. Finally, Section 5 concludes the paper and suggests directions for future research.

2 Related Work

Time series classification (TSC) methods can generally be grouped into three main categories: feature-based, distance-based, and deep learning-based approaches. Each category offers distinct advantages and faces specific challenges.

Feature-based approaches extract meaningful representations from time series data before applying classification algorithms. These representations may capture statistical descriptors [14], spectral transformations such as the discrete Fourier transform (DFT) or discrete wavelet transform (DWT) [1], or model-based features derived from techniques like autoregressive integrated moving average (ARIMA) [4]. Shapelet-based methods, which identify short, discriminative subsequences (shapelets) within the data [29], can be considered a subset of feature-based methods [18]. Shapelet-based approaches focus on local features that are highly interpretable and often effective for capturing localized variations, though they may struggle with multi-scale patterns and can be computationally intensive for long time series. Feature-based representations are typically classified using conventional methods such as logistic regression, random forests, or support vector machines (SVM).

Distance-based methods measure the similarity or dissimilarity between entire time series without explicitly transforming them into feature vectors. A well-known example is dynamic time warping (DTW) [26], which is robust to local temporal distortions and useful for aligning time series. However, these methods can become computationally expensive as the dataset size grows, and they lack an interpretable intermediate representation of the data.

Deep learning-based methods automatically learn hierarchical feature representations directly from raw time series. Convolutional neural networks (CNNs) are adept at identifying local temporal correlations, while recurrent neural networks (RNNs) excel at capturing sequential patterns, including long-term dependencies [13, 19, 30]. While deep learning methods often achieve strong empirical performance, they typically require large labeled datasets, involve extensive hyperparameter tuning, and may lack transparency in their learned representations.

The linear law-based transformation (LLT) [20] integrates elements of feature-based and distance-based methods. By using time-delay embedding and spectral decomposition, LLT extracts governing patterns from training data and applies these patterns to unseen instances, transforming the feature space to improve classification accuracy. Despite its low computational cost, LLT relies on fixed-length windows, which can limit its ability to capture patterns of variable lengths.

Building on LLT, this work introduces the adaptive law-based transformation (ALT). ALT incorporates variable-length shifted time windows to capture local patterns across multiple temporal scales while maintaining interpretability and computational efficiency. This adaptive design enables ALT to effectively handle complex time series, bridging the gaps between diverse TSC approaches.

3 Data and Methodology

3.1 Employed Data

This study utilizes eleven real-world datasets sourced from the UCR Time Series Classification Archive [8, 9] (available at https://www.timeseriesclassification.com; retrieved: January 15, 2025). The datasets are detailed in Table 1.

Table 1: Overview of the datasets employed in this study

Dataset	Type	Classes	Features	Train Size	Test Size	Length	Balanced	Description
BasicMotions	Multivariate	4	6	40	40	100	Yes	Contains motion sensor data from four different activities performed by participants.
Coffee	Univariate	2	1	28	28	286	Yes	Spectrographs of two types of coffee beans, with the task of differentiating between them.
Epilepsy	Multivariate	4	3	137	138	207	Yes	Data collected from a tri-axial accelerometer while participants performed four tasks, including mimicking a seizure.
Epilepsy2	Univariate	2	1	80	11420	178	Train only	Single-channel EEG measurements aimed at determining whether a participant is experiencing a seizure.
FordA	Univariate	2	1	1320	3601	500	Yes	Measurements of engine noise in automotive production, used to detect specific symptoms.
FordB	Univariate	2	1	810	3636	500	Yes	Similar to FordA, but focuses on detecting different symptoms in engine noise measurements.
GunPoint1	Univariate	2	1	50	150	150	Yes	This dataset records X-axis hand motions for “Gun-Draw” and “Point” actions by two actors.
GunPoint2	Univariate	2	1	135	316	150	Yes	Variation of the GunPoint dataset focusing on distinguishing participants from different age groups.
GunPoint3	Univariate	2	1	135	316	150	Yes	Variation of the GunPoint dataset focusing on distinguishing male and female participants.
GunPoint4	Univariate	2	1	135	316	150	Yes	Variation of the GunPoint dataset focusing on distinguishing old and young participants.
PowerCons	Univariate	2	1	180	180	144	Yes	Device power consumption data, with the task of determining the operational status.
Note: The original names of the GunPoint datasets, marked by numbers, are as follows: 1. GunPoint; 2. GunPointAgeSpan; 3. GunPointMaleVersusFemale; 4. GunPointOldVersusYoung.

3.2 Feature Representation and Classification

A general TSC task can be formalized as follows. The input data is represented as $x_{t}^{i,j}$, where $t\in\{1,2,\dots,h\}$ denotes the observation times, $i\in\{1,2,\dots,\tau\}$ identifies the instances, and $j\in\{1,2,\dots,m\}$ indexes the different input series belonging to a given instance. The output $y^{i}\in\{1,2,\dots,c\}$ identifies the class of instance $i$. The task is to predict the classes from the input data. To address this task, we use the following algorithm:

[A1] Data Splitting. Divide the instances into learning ($Lr$), training ($Tr$), and test ($Te$) subsets using random selection stratified by class representation.

[A2] Sequence Extraction. (For each $Lr$, $j$, and $(r,l,k)$): Extract $r$-length sequences using shifted time windows (shifted by $k$), and take out $2l-1$ evenly spaced points from each. The triplets $(r,l,k)$ are pre-defined parameters, where $r\leq h$ and $(2l-2)\mid(r-1)$. For a given $Lr$, $j$, and $(r,l,k)$, $\left\lfloor\frac{h-r+1}{k}\right\rfloor$ sequences are generated.

[A3] Shapelet Vectors. (For each sequence): Perform $l$-dimensional time-delay embedding [27] of the sequence (of length $2l-1$) to obtain the symmetric matrix $S$. Perform spectral decomposition of $S$. The eigenvector belonging to the eigenvalue with the smallest absolute value ($\in\mathbb{R}^{+}\cup\{0\}$) is called the shapelet vector $v$, and $Sv\approx 0$. Note that this step relates to principal component analysis (PCA) [15], which extracts informative directions using the eigenvectors of the largest eigenvalues. In contrast, ALT focuses on the dimension where $S$ shows the least variability, using the corresponding vector $v$ to compare shapelets.

[A4] Shapelet Matrices. (For each $j$ and $(r,l,k)$): Use the shapelet vectors related to the same pair of $j$ and $(r,l,k)$ as the column vectors of the shapelet matrix $P$. Group patterns based on the related class within $P$ ($c$ classes result in $c$ partitions within the $P$ matrix).

[A5] Transformation. (For each $Tr$, $j$, and $(r,l,k)$): Let $s=\frac{r-1}{2l-2}$ and $o=\left\lfloor\frac{h-sl+1}{k}\right\rfloor$. Embed the instance into an $o\times l$ matrix $A$ as follows:

$$A=\begin{pmatrix}x_{1}^{i,j}&x_{s+1}^{i,j}&\dots&x_{(l-1)s+1}^{i,j}\\ x_{k+1}^{i,j}&x_{k+s+1}^{i,j}&\dots&x_{k+(l-1)s+1}^{i,j}\\ \vdots&\vdots&\ddots&\vdots\\ x_{(o-1)k+1}^{i,j}&x_{(o-1)k+s+1}^{i,j}&\dots&x_{(o-1)k+(l-1)s+1}^{i,j}\\ \end{pmatrix}. \tag{1}$$

Right-multiply this matrix with the shapelet matrix $P$ related to the same pair of $j$ and $(r,l,k)$, that is, $O=AP$. Shapelets from each class in $P$ “compete” to transform the $A$ matrix close to null vectors.

[A6] Feature Generation. (For each transformed matrix): Square the values of the resulting $O$ and partition it by the class from which the shapelets originate. Different methods can be used to extract features from the resulting partitions. For example, identify a specific percentile in each row, then calculate different statistical indicators from these percentiles. Alternatively, calculate a statistical indicator from all the values in the partitions. After this step, the original $m$ signals of an instance are represented in an $m\times c\times n\times g$ dimensional feature space, where $n$ is the number of extraction methods used, and $g$ is the number of $(r,l,k)$ triplets used.

[B1] Classifier Tuning and Evaluation. Utilize the new features to tune advanced classifiers (e.g., $K$-nearest neighbors) via Bayesian hyperparameter optimization and cross-validation. Evaluate the classifiers’ accuracy, tuning time, and classification time on the training set.

[B2] Test and Benchmark. Similar to steps [A5–A6], transform the test set ($Te$), generate new features, and apply the tuned classifiers. Measure out-of-sample classification speed and accuracy. Benchmark the results against state-of-the-art methods.
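Steps [A2] to [A5] can be sketched in a few lines of NumPy. The following is a minimal illustration under stated assumptions, not the implementation used in this study: the function names (`extract_sequences`, `shapelet_vector`, `shapelet_matrix`, `embed`) are hypothetical, the time-delay embedding is assumed to produce the $l\times l$ matrix whose $(a,b)$ entry is the $(a+b-1)$-th of the $2l-1$ retained points, indices are 0-based, and the toy sinusoids are illustrative data only.

```python
import numpy as np

def extract_sequences(x, r, l, k):
    """[A2] r-length windows shifted by k, each thinned to 2l-1 evenly spaced points."""
    h = len(x)
    assert r <= h and (r - 1) % (2 * l - 2) == 0
    s = (r - 1) // (2 * l - 2)      # spacing between retained points
    n_seq = (h - r + 1) // k        # sequence count as stated in step [A2]
    return [x[n * k : n * k + r : s] for n in range(n_seq)]

def shapelet_vector(seq, l):
    """[A3] Eigenvector of S with the smallest absolute eigenvalue."""
    # Assumption: S[a, b] = seq[a + b], an l x l symmetric time-delay matrix.
    S = np.array([seq[a : a + l] for a in range(l)])
    w, V = np.linalg.eigh(S)
    return V[:, np.argmin(np.abs(w))]

def shapelet_matrix(series_by_class, r, l, k):
    """[A4] Shapelet vectors as columns of P, grouped by class, plus column labels."""
    cols, labels = [], []
    for cls, series_list in sorted(series_by_class.items()):
        for x in series_list:
            for seq in extract_sequences(np.asarray(x, float), r, l, k):
                cols.append(shapelet_vector(seq, l))
                labels.append(cls)
    return np.column_stack(cols), np.array(labels)

def embed(x, r, l, k):
    """[A5] The o x l matrix A of Eq. (1), 0-based: A[n, q] = x[n*k + q*s]."""
    h = len(x)
    s = (r - 1) // (2 * l - 2)
    o = (h - s * l + 1) // k
    rows = np.arange(o)[:, None] * k
    cols = np.arange(l)[None, :] * s
    return x[rows + cols]

# Toy learning set: one sinusoid per class (illustrative data, not from the paper).
t = np.arange(100)
learn = {0: [np.sin(0.3 * t)], 1: [np.sin(1.1 * t + 0.5)]}
P, labels = shapelet_matrix(learn, r=5, l=3, k=1)

# New instance with the class-0 frequency but a different phase.
x_new = np.sin(0.3 * t + 1.0)
O = embed(x_new, 5, 3, 1) @ P

# Mean squared output per class partition; class-0 shapelets should map
# x_new much closer to zero than class-1 shapelets do.
err = {c: float(np.mean(O[:, labels == c] ** 2)) for c in (0, 1)}
```

In this toy setting every window of a pure sinusoid obeys the same three-point linear relation, so the class-0 partition of $O$ is numerically close to zero while the class-1 partition is not. This is the sense in which the shapelets of each class "compete" in step [A5].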

Figure 1 illustrates the complete feature representation and classification procedure, including a law selection step [C1] that is planned for implementation in a future study—see Section 5 for more details.

Figure 1: Applied ML framework

3.3 Software and Settings

We implemented steps [A1–A6] in Python to transform the original feature spaces. The transformed features were then used to train KNN [2] and SVM [7] classifiers in the MATLAB Classification Learner App (see https://www.mathworks.com/help/stats/classificationlearner-app.html; retrieved: January 15, 2025) to perform steps [B1–B2]. During the classification procedure, a 30-step Bayesian hyperparameter optimization with 5-fold cross-validation was applied. As a benchmark, we also used optimizable neural networks on the raw time-series data with a 500-step Bayesian hyperparameter optimization and 5-fold cross-validation. These benchmark results are presented in Table A.1 in the Appendix.
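Step [B1] combines a classifier with a cross-validated hyperparameter search. As a rough, self-contained illustration only, the sketch below replaces the Bayesian optimization of the MATLAB Classification Learner App with a plain grid search over the number of neighbors of a KNN classifier, scored by stratified 5-fold cross-validation. The helper names (`knn_predict`, `stratified_folds`, `tune_k`), the candidate grid, and the synthetic features are all assumptions made for this example.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, n_neighbors):
    """Majority vote among the n_neighbors closest training points (Euclidean)."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :n_neighbors]
    return np.array([np.bincount(votes).argmax() for votes in y_train[nearest]])

def stratified_folds(y, n_folds, rng):
    """Split indices into n_folds parts with roughly equal class proportions."""
    folds = [[] for _ in range(n_folds)]
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        for i, j in enumerate(idx):
            folds[i % n_folds].append(j)
    return [np.array(f) for f in folds]

def tune_k(X, y, candidates=(1, 3, 5), n_folds=5, seed=0):
    """Grid search (not Bayesian optimization) over K with cross-validation."""
    rng = np.random.default_rng(seed)
    folds = stratified_folds(y, n_folds, rng)
    scores = {}
    for k in candidates:
        accs = []
        for f in folds:
            mask = np.ones(len(y), dtype=bool)
            mask[f] = False                      # hold out the current fold
            pred = knn_predict(X[mask], y[mask], X[f], k)
            accs.append(np.mean(pred == y[f]))
        scores[k] = float(np.mean(accs))
    best = max(scores, key=scores.get)
    return best, scores

# Synthetic, well-separated features standing in for ALT-transformed data.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 4)), rng.normal(10.0, 0.1, (20, 4))])
y = np.repeat([0, 1], 20)
best_k, cv_scores = tune_k(X, y)
```

The grid search is used here only because it keeps the example dependency-free; it is not meant to reproduce the optimizer or the search space of the reported experiments.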

We also optimized the hyperparameters $(r,l,k)$ of the proposed method to achieve the highest classification accuracy. Furthermore, during feature extraction [A6], we used the pairs of statistical indicators that yielded the best performance. First, the mean or the $5^{\text{th}}$ percentile was computed from the rows of the matrix $O$; then the mean, variance, or the third or fourth moment of these row-level values was calculated. The exact parameter settings applied to each dataset are detailed in Table A.2 in the Appendix.
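The two-stage feature extraction of step [A6] can be sketched as follows. This is a minimal illustration under assumptions rather than the code used for the experiments: the function name `alt_features` and the dictionary keys are hypothetical, the third and fourth moments are taken as central moments, the variance is the population variance, and the small matrix `O` is made-up data.

```python
import numpy as np

# Stage 1: one value per row of a class partition of the squared O matrix.
ROW_STATS = {
    "mean": lambda M: M.mean(axis=1),
    "p5": lambda M: np.percentile(M, 5, axis=1),
}

# Stage 2: one statistical indicator computed over the row-level values.
# Assumption: moments are central moments; variance uses ddof=0.
INDICATORS = {
    "mean": np.mean,
    "variance": np.var,
    "moment3": lambda v: np.mean((v - v.mean()) ** 3),
    "moment4": lambda v: np.mean((v - v.mean()) ** 4),
}

def alt_features(O, labels, pairs):
    """Return c * n features for one series and one (r, l, k) triplet.

    O      : transformed matrix (rows: windows, columns: shapelet vectors)
    labels : class of the shapelet behind each column of O
    pairs  : list of (row statistic, indicator) names, e.g. ("p5", "mean")
    """
    squared = O ** 2
    feats = []
    for cls in np.unique(labels):
        part = squared[:, labels == cls]         # partition by shapelet class
        for row_stat, indicator in pairs:
            row_values = ROW_STATS[row_stat](part)
            feats.append(INDICATORS[indicator](row_values))
    return np.array(feats)

# Made-up example: two windows, two shapelets per class.
O = np.array([[1.0, 2.0, 3.0, 4.0],
              [0.0, 0.0, 2.0, 2.0]])
labels = np.array([0, 0, 1, 1])
feats = alt_features(O, labels, [("mean", "mean"), ("mean", "variance")])
```

Repeating this for each of the $m$ input series and each of the $g$ triplets, and concatenating the results, would give the $m\times c\times n\times g$ features described in step [A6].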

4 Results and Discussion

The classification outcomes obtained with ALT are summarized in Table 2.

Table 2: Classification results

Dataset	Validation Accuracy	Test Accuracy	Classification Method	Transform. Time (s)	Classification Time (s)	Benchmark
BasicMotions	100.0%	100.0%	KNN	1.90	9.00	95.3–100.0%
Coffee	100.0%	100.0%	KNN	1.22	6.41	78.6–100.0%
Epilepsy	96.1%	97.8%	SVM	84.58	12.87	85.0–100.0%
Epilepsy2	95.0%	93.8%	KNN	48.09	7.43	89.4–100.0%
FordA	97.5%	97.5%	SVM	915.80	28.95	49.0–100.0%
FordB	84.9%	94.4%	KNN	3069.00	14.54	50.9–100.0%
GunPoint1	100.0%	96.7%	SVM	7.49	7.90	68.0–100.0%
GunPoint2	98.5%	93.0%	SVM	16.25	13.96	57.0–100.0%
GunPoint3	100.0%	99.4%	KNN	6.99	6.85	68.0–100.0%
GunPoint4	100.0%	100.0%	KNN	2.32	6.62	88.0–100.0%
PowerCons	92.4%	93.3%	SVM	3.45	9.07	73.0–100.0%
Note: The original names of the GunPoint datasets, marked by numbers, are as follows: 1. GunPoint; 2. GunPointAgeSpan; 3. GunPointMaleVersusFemale; 4. GunPointOldVersusYoung. Results were obtained using 30 iterations of Bayesian hyperparameter optimization in the MATLAB Classification Learner App. Benchmarks were derived from the test accuracies reported by the studies summarized in Table A.3 in the Appendix.

As Table 2 shows, ALT achieves validation accuracies of at least 84.9% and test accuracies of at least 93.0% across all eleven datasets, including perfect scores (100%) on BasicMotions, Coffee, and GunPoint4. The transformation completes in under 90 seconds on nine of the eleven datasets; for FordA and FordB, however, it takes 915.80 and 3069.00 seconds, respectively. This overhead arises primarily from the shapelet vector generation and spectral decomposition steps. Once the transformed features are computed, classification (via KNN or SVM) is relatively fast, taking between 6.41 and 28.95 seconds.

Table A.2 details the hyperparameter and feature-extraction settings employed for each experiment, including the ratio of data used for shapelet generation versus classifier training. Notably, only a small subset of the data is typically required for learning shapelets, highlighting ALT’s efficiency in deriving class-relevant patterns.

For additional context, Table A.1 compares ALT’s accuracy to that of a neural network benchmark using an optimizable feed-forward architecture (MLP) implemented in MATLAB on the raw time-series data (the neural networks were tuned using $500$-step Bayesian hyperparameter optimization and $5$-fold cross-validation). On most datasets, regardless of their length, ALT outperforms or closely matches the neural network solution despite having far fewer hyperparameters and a shorter optimization process. Furthermore, the benchmark compilation in Table 2 demonstrates that ALT is highly competitive against a wide range of state-of-the-art approaches, including shapelet-based methods and advanced neural and kernel techniques.

Across tasks, ALT’s ability to capture subsequence patterns of varying lengths proves advantageous, particularly for datasets with subtle class-distinguishing events (e.g., Epilepsy). This adaptability is reflected in improvements over the baseline neural network on most datasets, with the largest gains on more complex sensor signals (e.g., FordA, FordB), where the baseline struggles; on GunPoint2 and PowerCons, by contrast, the neural network attains higher test accuracy (Table A.1). Although certain tasks (e.g., Coffee, GunPoint4) are relatively straightforward for most algorithms, ALT maintains robust reliability while retaining interpretability by design.

5 Conclusion and Future Work

In this paper, we introduced ALT, a novel method for time series classification that generalizes our previous LLT approach. By incorporating variable-length shifted windows, it captures local subsequence patterns of different scales and embeds them in a linearly separable feature space. Extensive experiments across eleven diverse datasets confirm ALT’s capacity to deliver competitive or state-of-the-art results, as evidenced by Tables 2, A.1, and A.3.

In future work, we plan to integrate data-driven mechanisms for automatically tuning $(r,l,k)$, thus further reducing manual hyperparameter exploration. Additionally, we aim to investigate shapelet pruning techniques (see step [C1] in Figure 1) to lower computational overhead, making ALT scalable to very large time series with minimal performance loss. The method’s interpretability could also be enriched by qualitative visualization of extracted shapelet vectors, potentially illuminating latent domain structures. Finally, exploring ALT’s capabilities in specialized domains like multi-channel EEG monitoring or IoT anomaly detection may reveal further performance gains and highlight the role of domain-specific knowledge in shaping the transformation pipeline.

Acknowledgments

The research was supported by the Hungarian Government and the European Union in the framework of Grant Agreement No. MILAB RRF-2.3.1-21-2022-00004. Project no. PD142593 was implemented with the support provided by the Ministry of Culture and Innovation of Hungary from the National Research, Development, and Innovation Fund, financed under the PD_22 “OTKA” funding scheme.

References

[1] Agrawal, R., Faloutsos, C., & Swami, A. N. (1993). Efficient similarity search in sequence databases. In Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms (FODO) (pp. 69–84). Springer.
[2] Altman, N. S. (1992). An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, 46, 175–185.
[3] Bostrom, A. (2018). Shapelet transforms for univariate and multivariate time series classification. Ph.D. thesis, University of East Anglia.
[4] Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time series analysis: forecasting and control. John Wiley & Sons.
[5] Cai, R., Peng, L., Lu, Z., Zhang, K., & Liu, Y. (2024). DCS: Debiased contrastive learning with weak supervision for time series classification. In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5625–5629). doi:10.1109/ICASSP48485.2024.10446381.
[6] Ceni, A., & Gallicchio, C. (2023). Residual reservoir computing neural networks for time-series classification. ESANN.
[7] Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297.
[8] Dau, H. A., Bagnall, A., Kamgar, K., Yeh, C.-C. M., Zhu, Y., Gharghabi, S., Ratanamahatana, C. A., & Keogh, E. (2019). The UCR time series archive. IEEE/CAA Journal of Automatica Sinica, 6, 1293–1305.
[9] Dau, H. A., Keogh, E., Kamgar, K., Yeh, C.-C. M., Zhu, Y., Gharghabi, S., Ratanamahatana, C. A., Yanping, Hu, B., Begum, N., Bagnall, A., Mueen, A., Batista, G., & Hexagon-ML (2018). The UCR time series classification archive. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/.
[10] Dhariyal, B., Le Nguyen, T., & Ifrim, G. (2023). Back to basics: A sanity check on modern time series classification algorithms. In International Workshop on Advanced Analytics and Learning on Temporal Data (pp. 205–229). Springer.
[11] Eldele, E., Ragab, M., Chen, Z., Wu, M., Kwoh, C.-K., Li, X., & Guan, C. (2023). Self-supervised contrastive representation learning for semi-supervised time-series classification. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[12] Esling, P., & Agon, C. (2012). Time-series data mining. ACM Computing Surveys (CSUR), 45, 1–34.
[13] Fawaz, H. I., Forestier, G., Weber, J., Idoumghar, L., & Muller, P.-A. (2019). Deep learning for time series classification: a review. Data Mining and Knowledge Discovery, 33, 917–963.
[14] Fulcher, B. D., & Jones, N. S. (2014). Highly comparative feature-based time-series classification. IEEE Transactions on Knowledge and Data Engineering, 26, 3026–3037.
[15] Gao, F., Tian, T., Yao, T., & Zhang, Q. (2021). Human gait recognition based on multiple feature combination and parameter optimization algorithms. Computational Intelligence and Neuroscience, 2021, 6693206.
[16] Hussein, D., Nelson, L., & Bhat, G. (2024). Sensor-aware classifiers for energy-efficient time series applications on IoT devices. doi:10.48550/arXiv.2407.08715.
[17] Ito, H., & Chakraborty, B. (2020). Fast and interpretable transformation for time series classification: A comparative study. International Journal of Applied Science and Engineering, 17, 269–280. doi:10.6703/IJASE.202009_17(3).269.
[18] Ji, C., Zhao, C., Pan, L., Liu, S., Yang, C., & Meng, X. (2019). A just-in-time shapelet selection service for online time series classification. Computer Networks, 157, 89–98.
[19] Karim, F., Majumdar, S., Darabi, H., & Chen, S. (2019). Multivariate LSTM-FCNs for time series classification. Neural Networks, 116, 237–245.
[20] Kurbucz, M. T., Pósfay, P., & Jakovác, A. (2022). Facilitating time series classification by linear law-based feature space transformation. Scientific Reports, 12, 18026.
[21] Lin, C., Wen, X., Cao, W., Huang, C., Bian, J., Lin, S., & Wu, Z. (2023). NuTime: Numerically multi-scaled embedding for large-scale time series pretraining. arXiv preprint arXiv:2310.07402.
[22] Mukhopadhyay, S., Dey, S., Mukherjee, A., Pal, A., & Ashwin, S. (2024). Time series classification on edge with lightweight attention networks. In 2024 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops) (pp. 487–492). IEEE.
[23] Pasos Ruiz, A., Flynn, M., Large, J., Middlehurst, M., & Bagnall, A. (2021). The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery, 35, 1–49. doi:10.1007/s10618-020-00727-3.
[24] Saini, U. S., Zhuang, Z., Yeh, C.-C. M., Zhang, W., & Papalexakis, E. E. (2024). Analysis of causal and non-causal convolution networks for time series classification. In Proceedings of the 2024 SIAM International Conference on Data Mining (SDM) (pp. 797–805). doi:10.1137/1.9781611978032.91.
[25] Schlegel, U., & Keim, D. A. (2023). A deep dive into perturbations as evaluation technique for time series XAI. In L. Longo (Ed.), Explainable Artificial Intelligence (pp. 165–180). Cham: Springer Nature Switzerland.
[26] Senin, P. (2008). Dynamic time warping algorithm review. Information and Computer Science Department, University of Hawaii at Manoa, Honolulu, USA, 855, 40.
[27] Takens, F. (1981). Dynamical systems and turbulence. Warwick, 1980 (pp. 366–381).
[28] Xi, W., Jain, A., Zhang, L., & Lin, J. (2023). LB-SimTSC: An efficient similarity-aware graph neural network for semi-supervised time series classification. arXiv preprint arXiv:2301.04838.
[29] Ye, L., & Keogh, E. (2009). Time series shapelets: a new primitive for data mining. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 947–956). ACM.
[30] Zheng, Y., Liu, Q., Chen, E., Ge, Y., & Zhao, J. (2014). Time series classification using multi-channels deep convolutional neural networks. In International Conference on Web-Age Information Management (pp. 298–310). Springer.

Appendix

Table A.1: Classification of raw datasets with neural networks

Dataset	Validation accuracy	Test accuracy	Training time (s)
BasicMotions	72.5%	87.5%	1941.7
Coffee	100.0%	100.0%	1105.7
Epilepsy	65.7%	67.4%	2745.3
Epilepsy2	80.0%	89.9%	1248.9
FordA	72.7%	72.0%	7347.9
FordB	63.0%	66.0%	5211.7
GunPoint1	98.0%	94.0%	1380.7
GunPoint2	96.3%	98.1%	2023.2
GunPoint3	99.3%	99.7%	1513.2
GunPoint4	100.0%	100.0%	866.5
PowerCons	100.0%	98.9%	1300.9
Note: The original names of the GunPoint datasets, marked by numbers, are as follows: 1. GunPoint; 2. GunPointAgeSpan; 3. GunPointMaleVersusFemale; 4. GunPointOldVersusYoung. Results were obtained using 500 iterations of Bayesian hyperparameter optimization and 5-fold cross-validation in the MATLAB Classification Learner App.

Table A.2: Applied parameters

Dataset	Learn-train ratio	Method	Used $(r,l,k)$ values
BasicMotions	0.25	mean - mean, $5^{\text{th}}$ percentile - $4^{\text{th}}$ moment	(53, 27, 1)
Coffee	0.25	$5^{\text{th}}$ percentile - mean	(3, 2, 1)
Epilepsy	0.25	mean - mean	(29, 15, 1), (69, 35, 1), (89, 45, 1), (149, 75, 1), (169, 85, 1), (189, 95, 1)
Epilepsy2	0.25	$5^{\text{th}}$ percentile - mean, $5^{\text{th}}$ percentile - variance	(19, 10, 1), (29, 15, 1)
FordA	0.20	$5^{\text{th}}$ percentile - mean	(23, 12, 1), (29, 15, 1), (85, 43, 1), (95, 48, 1), (205, 103, 1)
FordB	0.50	$5^{\text{th}}$ percentile - mean	(19, 10, 1), (39, 20, 1), (129, 65, 1), (139, 70, 1), (159, 80, 1), (169, 85, 1), (179, 90, 1), (199, 100, 1), (209, 105, 1), (275, 138, 1)
GunPoint1	0.20	mean - mean	(7, 4, 1), (31, 2, 1), (51, 6, 1), (81, 6, 1), (121, 11, 1), (121, 31, 1), (121, 61, 1), (121, 5, 1)
GunPoint2	0.50	mean - mean, $5^{\text{th}}$ percentile - excess kurtosis	(49, 25, 1), (59, 30, 1), (69, 35, 1), (89, 45, 1)
GunPoint3	0.20	mean - mean, $5^{\text{th}}$ percentile - mean	(3, 2, 1), (19, 10, 1), (39, 20, 1), (109, 55, 1)
GunPoint4	0.50	mean - mean	(3, 2, 1)
PowerCons	0.20	mean - mean	(3, 2, 1), (99, 50, 1)
Note: The original names of the GunPoint datasets, marked by numbers, are as follows: 1. GunPoint; 2. GunPointAgeSpan; 3. GunPointMaleVersusFemale; 4. GunPointOldVersusYoung. Results were obtained using 30 iterations of Bayesian hyperparameter optimization in the MATLAB Classification Learner App.

Table A.3: Literature benchmarks

Dataset	Test accuracy (%)	Reference	Method
BasicMotions	95.3–100.0	[23]	DTWD, ROCKET, CIF, HIVE-COTE
Coffee	96.0–100.0	[10]	RandomForest, Rocket, Minirocket, Multirocket
Coffee	78.6–100.0	[17]	Raw-ResNet, FoldCount-1NN, TimeAxisArea-1NN, DWT-1NN
Epilepsy	96.3–100.0	[23]	DTWD, ROCKET, CIF, HIVE-COTE
Epilepsy	95.7–97.1	[5]	Debiased Contrastive Learning with Weak Supervision
Epilepsy	85.0–99.0	[16]	CNN
Epilepsy2	89.4–100.0	[21]	Multi-Scaled Embedding for Large-Scale Time-Series Pretraining
FordA	96.8–100.0	[22]	Lightweight Attention Networks
FordA	79.3–86.4	[28]	LB-SimTSC (Similarity-Aware Graph Neural Network)
FordA	49.0–95.0	[10]	RandomForest, Rocket, Minirocket, Multirocket
FordA	74.54–95.6	[24]	LSRSC (Centered Kernel Alignment)
FordA	56.7–93.6	[17]	Raw-ResNet, FoldCount-1NN, TimeAxisArea-1NN, DWT-1NN
FordA	53.4–71.3	[6]	Residual Reservoir Computing Neural Networks
FordA	89.0	[25]	Convolutional Neural Networks
FordA	96.5	[3]	Shapelet Transform
FordA	50.6–90.9	[11]	Time-Series/Class-Aware Temporal and Contextual Contrasting
FordB	92.9–100.0	[22]	Lightweight Attention Networks
FordB	49.0–83.0	[10]	RandomForest, Rocket, Minirocket, Multirocket
FordB	63.8–83.1	[24]	LSRSC (Centered Kernel Alignment)
FordB	53.1–81.7	[17]	Raw-ResNet, FoldCount-1NN, TimeAxisArea-1NN, DWT-1NN
FordB	51.9–56.4	[6]	Residual Reservoir Computing Neural Networks
FordB	70.0	[25]	Convolutional Neural Networks
FordB	91.5	[3]	Shapelet Transform
FordB	50.9–88.2	[11]	Time-Series/Class-Aware Temporal and Contextual Contrasting
GunPoint1	85.0–100.0	[10]	RandomForest, Rocket, Minirocket, Multirocket
GunPoint1	68.0–99.0	[17]	Raw-ResNet, FoldCount-1NN, TimeAxisArea-1NN, DWT-1NN
GunPoint2	57.0–100.0	[10]	RandomForest, Rocket, Minirocket, Multirocket
GunPoint3	68.0–100.0	[10]	RandomForest, Rocket, Minirocket, Multirocket
GunPoint4	88.0–100.0	[10]	RandomForest, Rocket, Minirocket, Multirocket
PowerCons	73.0–100.0	[10]	RandomForest, Rocket, Minirocket, Multirocket
Note: The original names of the GunPoint datasets, marked by numbers, are as follows: 1. GunPoint; 2. GunPointAgeSpan; 3. GunPointMaleVersusFemale; 4. GunPointOldVersusYoung.