UniTS: A Universal Time Series Analysis Framework Powered by Self-Supervised Representation Learning
Abstract.
Machine learning has emerged as a powerful tool for time series analysis. Existing methods are usually customized for individual analysis tasks and face challenges in tackling practical problems such as partial labeling and domain shift. To improve performance and address these practical problems universally, we develop UniTS, a novel framework that incorporates self-supervised representation learning (or pre-training). The components of UniTS are designed with sklearn-like APIs to allow flexible extension. We demonstrate how users can easily perform an analysis task through the user-friendly GUIs, and show the superior performance of UniTS over traditional task-specific methods without self-supervised pre-training on five mainstream tasks and two practical settings.
1. Introduction
Machine learning methods have achieved high performance in many time series analysis tasks, such as classification, forecasting, and anomaly detection (Löning et al., 2019; Tavenard et al., 2020; Schmidl et al., 2022). However, existing learning-based time series analysis algorithms still face challenges in real-world scenarios. First, real-world time series data are usually only partially labeled due to the high cost of, or the lack of knowledge for, labeling, whereas supervised machine learning techniques require adequate labels to perform well. Second, a common problem in practical applications is domain shift, i.e., the distribution of the training data differs from that of the data encountered when the models are deployed, which makes it difficult for the models to generalize. Last but not least, while there are many task- and domain-specific approaches, determining an appropriate method for a given application (i.e., an analysis task and a dataset) remains an open question.
Contributions. To deal with the above issues, in this paper we propose UniTS, a novel universal framework for time series analysis. Our idea is to first perform self-supervised pre-training on the unlabeled data to obtain unified time series representations that are largely independent of the tasks (e.g., classification or anomaly detection) and domains (i.e., data distributions). Next, a model for an arbitrary analysis task is obtained by appending an output module on top of the representations and then fine-tuning the model parameters. Compared to traditional machine learning pipelines that learn end-to-end models from scratch for specific tasks and domains, the UniTS framework has several advantages.
First, the pre-training module can leverage the inherent structure of the unlabeled data to learn class-distinguishing information, so that only a few labels are needed for fine-tuning. The additional information learned via self-supervision can also improve the performance of different tasks. Second, benefiting from self-supervised representation learning, the pre-training module can learn features transferable across domains by disentangling the domain and class information (Shen et al., 2022). Third, as UniTS produces unified representations for different pre-training models and downstream analysis tasks, we design a feature fusion module that automatically combines the features of diverse models to jointly facilitate the tasks, thereby avoiding per-application method selection. In addition, the UniTS pipeline can be more efficient when performing several tasks on one dataset, because pre-training is needed only once, while fine-tuning usually requires far fewer iterations than training from scratch.
Table 1. Examples of five mainstream time series analysis tasks and their targets.
| Task | Target |
| Classification | The class label of the input series. |
| Clustering | The cluster assignment of the input series. |
| Forecasting | The values at the subsequent time steps. |
| Anomaly detection | The boolean values indicating whether the observation at each time step is anomalous. |
| Missing value imputation | The missing values, whose positions are indicated by an index set $\Omega$. |

While there exist several machine learning tools for time series, such as tslearn (Tavenard et al., 2020) and sktime (Löning et al., 2019), to the best of our knowledge UniTS is the first to incorporate self-supervised representation learning for universal time series analysis to achieve the aforementioned benefits. We integrate user-friendly Web interfaces and flexible modes of hyper-parameter configuration for usability. UniTS is designed as a framework that is agnostic to the model architecture, pre-training algorithm, feature fusion method, and analysis task. The model architecture is specified via hyper-parameters, and the latter three components are designed as templates using the popular sklearn-like APIs, which allows easy extension to support different models, algorithms, and tasks.
In this paper, we demonstrate the usage of UniTS for five mainstream time series analysis tasks: classification, clustering, forecasting, anomaly detection, and missing value imputation. We also examine the performance of UniTS on these tasks and under the practical settings of partial labeling and domain shift. We have made the source code and the supplementary materials publicly available at https://github.com/LceOmlet/UniTS to enable the community to use and extend the system.
2. System Overview
2.1. Problem Formulation
We first introduce the unified formulation of time series analysis, which serves as the basis of our framework.
Definition 2.1 (Time Series Analysis).
Given a time series sample $X = (x_1, \ldots, x_T) \in \mathbb{R}^{T \times D}$, where $x_t \in \mathbb{R}^{D}$ is the observation at time $t$, $D$ is the number of dimensions, and $T$ is the length of the series, a time series analysis task aims to build a function $f$ that maps $X$ to a task-dependent prediction $\hat{y} = f(X)$ (or $\hat{Y} = f(X)$), such that the prediction is close to the (usually unknown) target $y$ (or $Y$). Table 1 shows examples of five important time series tasks.
In machine learning methods, the mapping function $f$ is learned from a training dataset $\mathcal{X} = \{X_1, \ldots, X_N\}$ and an optional label set $\mathcal{Y}$. Unlike the traditional approaches that learn $f$ from scratch for each task, UniTS first pre-trains one or more task-independent encoders $g_1, \ldots, g_K$ using only $\mathcal{X}$ to map a time series $X$ to unified representations $z_k = g_k(X)$, $k = 1, \ldots, K$. Then, given a task, it fuses these features into one vector $z$ and builds a task-specific model on top of the encoders, with $f(X) = h(\mathrm{fuse}(g_1(X), \ldots, g_K(X)))$, where $h$ is an output model. The design goal is to take advantage of various self-supervised pre-training methods to achieve universal performance improvements across analysis tasks and to address the challenges of partial labeling and domain shift. For ease of description, we denote the set of pre-trained encoders by $g = \{g_1, \ldots, g_K\}$.
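As a concrete illustration of this formulation, the following minimal sketch composes hypothetical encoders, a fusion function, and an output model in the above order; the random projections and dimensions are placeholders, not the actual UniTS components.

```python
import numpy as np

def analysis_model(X, encoders, fuse, output_model):
    """Compute f(X) = h(fuse(g_1(X), ..., g_K(X)))."""
    zs = [g(X) for g in encoders]   # task-independent representations z_k
    z = fuse(zs)                    # one unified embedding per sample
    return output_model(z)          # task-dependent prediction

# Toy instantiation: two random-projection "encoders", concatenation fusion,
# and a dummy output model scoring two classes (all stand-ins).
T, D = 100, 3
X = np.random.randn(T, D)
encoders = [lambda X, W=np.random.randn(T * D, 16): X.reshape(-1) @ W
            for _ in range(2)]
fuse = lambda zs: np.concatenate(zs)
output_model = lambda z, W=np.random.randn(32, 2): z @ W
print(analysis_model(X, encoders, fuse, output_model).shape)  # (2,)
```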
2.2. UniTS Framework
To achieve universal time series analysis via self-supervised pre-training, UniTS is designed with three main modules, including the Pre-training Module, the Feature Fusion Module, and the Analysis Task Module, as illustrated in Figure 1.
First of all, UniTS creates one or more instances of the pre-training templates with their hyper-parameters, where each template is a self-supervised learning method. UniTS implements various types of templates, as discussed in Section 3.1. During pre-training, each instance separately learns its encoder $g_k$ ($k = 1, \ldots, K$), while all encoders are jointly used for the analysis tasks. The diverse pre-trained representations can complement each other to achieve better performance.
After pre-training, the feature fusion module combines all learned representations of each sample into one embedding, denoted as $z$. The goal of this module is to automatically fuse the information from different pre-training instances to better facilitate the tasks. The details of this module are shown in Section 3.2.
Any analysis task can be performed on top of $z$ by using a task-specific output model $h$ to map $z$ to the prediction $\hat{y}$ (or $\hat{Y}$) of the target, and then fine-tuning the models (including the encoders, the learnable feature fusion model, and the task-specific layers) by minimizing a loss function of the task. Presently, UniTS supports the five mainstream tasks listed in Table 1, while other tasks can be seamlessly integrated using the sklearn-style APIs. The analysis task module is discussed in detail in Section 3.3.
Discussion. From the above description, any time series analysis task can be carried out with UniTS in a unified way. At the training stage, the model is built following the above pipeline. During the inference stage, the learned $f$ is used to map the input series to predictions. To deal with partial labeling, only a small set of labeled data is required for fine-tuning, while the pre-training stage does not rely on any labels (Figure 2(a)). Similarly, when facing domain shift, the user can pre-train the encoders using data from an available source domain to obtain transferable representations, and then fine-tune using a small dataset from the target domain (Figure 2(a)). In either case, the pre-trained encoders can be reused without re-training, and the data size and the number of iterations for fine-tuning can be much smaller than those for training from scratch (Liang et al., 2023) while maintaining competitive performance (Figure 3).
3. System Internals
This section introduces the technical details of the three main modules in UniTS. All the techniques mentioned below have been implemented in the current version.
3.1. Pre-training Module
UniTS provides various self-supervised representation learning methods that can be used as pre-training templates. Below, we explain the currently implemented pre-training templates, and discuss how to easily add new templates via the sklearn-like APIs.
In general, UniTS has integrated the following pre-training methods, which are divided into three types based on different pre-training objective functions (see the references for more details).
Contrastive Learning. UniTS currently supports time series contrastive learning at three levels, i.e., the whole-series level (Liang et al., 2023), the sub-sequence level (Franceschi et al., 2019), and the timestamp level (Yue et al., 2022). These methods encourage the representations generated from similar inputs to be closer than those of dissimilar inputs, which has proven effective for extracting informative features.
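For intuition, the snippet below sketches a generic InfoNCE-style contrastive objective at the whole-series level, where two augmented views of the same series form a positive pair and the other series in the batch act as negatives; it is a simplified illustration, not the exact loss of the cited methods.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z_view1, z_view2, temperature=0.1):
    """Simplified whole-series contrastive loss: representations of two views
    of the same series should be more similar than those of different series."""
    z1 = F.normalize(z_view1, dim=-1)
    z2 = F.normalize(z_view2, dim=-1)
    logits = z1 @ z2.t() / temperature      # (N, N) pairwise similarities
    targets = torch.arange(z1.size(0))      # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Representations of two augmented views of the same batch of 16 series.
loss = info_nce_loss(torch.randn(16, 64), torch.randn(16, 64))
```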
Autoregression. The autoregression-based pre-training algorithm (Zerveas et al., 2021) learns the representations by masking some observations of the input series (e.g., setting them to 0) and then predicting the masked values from the unmasked data. It is inspired by the masked language model in natural language processing, since both time series and sentences exhibit sequential dependencies.
Hybrid. This line of approaches (Eldele et al., 2021) optimizes a hybrid objective combining the two types above, which may outperform the individual objectives in some cases, but not always.
The pre-training template is designed with a unified sklearn-like interface to allow flexible extension. A new algorithm can be seamlessly integrated as long as it is wrapped in a Python class with two methods: fit, which takes the unlabeled input $\mathcal{X}$ for pre-training, and transform, which maps a sample $X$ to its representation $z$.
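The sketch below shows what such a wrapper can look like; the class name and the trivial mean-pooling "encoder" are hypothetical and only illustrate the expected fit/transform contract.

```python
import numpy as np

class MeanPoolingTemplate:
    """Toy pre-training template following the fit/transform convention.

    A real template (e.g., contrastive or autoregressive pre-training) would
    train a neural encoder in fit(); here we only standardize the data and
    mean-pool over time to keep the example self-contained."""

    def fit(self, X):
        # X: unlabeled training series of shape (N, T, D).
        self.mean_ = X.mean(axis=(0, 1), keepdims=True)
        self.std_ = X.std(axis=(0, 1), keepdims=True) + 1e-8
        return self

    def transform(self, X):
        # Map each series to a representation vector, shape (N, D).
        return ((X - self.mean_) / self.std_).mean(axis=1)

# Pre-train on unlabeled data, then produce representations.
X_unlabeled = np.random.randn(128, 100, 3)
Z = MeanPoolingTemplate().fit(X_unlabeled).transform(X_unlabeled)  # (128, 3)
```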
3.2. Feature Fusion Module
This module fuses the representations produced by the different pre-training instances. The module is also designed using sklearn-style interfaces. A feature fusion class contains a transform function that maps a set of representations $\{z_1, \ldots, z_K\}$ into one unified embedding $z$, which is used for the analysis tasks. Note that the parameters of a feature fusion instance, if learnable, can be optimized at the fine-tuning stage, which allows automatically extracting information from all pre-trained representations. UniTS now supports the two basic feature fusion methods described below.
Concatenation. The simplest and most common way of feature fusion is to directly concatenate the features of each sample, i.e., $z = z_1 \oplus z_2 \oplus \cdots \oplus z_K$, where $\oplus$ denotes the concatenation operation.
Projection. One can also use a learnable model to project the concatenated features into another latent space. This is especially effective in some cases such as clustering where dimension reduction is usually required. Formally, we have $z = \mathrm{proj}(z_1 \oplus z_2 \oplus \cdots \oplus z_K)$, where $\mathrm{proj}$ is a learnable projection model.
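A possible implementation of the two fusion methods is sketched below, with the projection realized as a single linear layer; the class name and dimensions are illustrative assumptions rather than the exact UniTS code.

```python
import torch
import torch.nn as nn

class ProjectionFusion(nn.Module):
    """Concatenate per-encoder features, then apply a learnable linear
    projection into a lower-dimensional latent space."""

    def __init__(self, in_dims, out_dim):
        super().__init__()
        self.proj = nn.Linear(sum(in_dims), out_dim)

    def transform(self, zs):
        # zs: list of (N, d_k) tensors from the pre-training instances.
        z_cat = torch.cat(zs, dim=-1)   # concatenation fusion
        return self.proj(z_cat)         # learnable projection fusion

# Fuse 320- and 64-dimensional features into a single 64-dimensional embedding.
fusion = ProjectionFusion(in_dims=[320, 64], out_dim=64)
z = fusion.transform([torch.randn(8, 320), torch.randn(8, 64)])  # (8, 64)
```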
3.3. Analysis Task Module
The analysis task module is designed as a template wrapped with the sklearn-like APIs, where each task is an instance. The template contains two major components: a fit method which performs the task-specific fine-tuning, and a predict method which outputs the final prediction. UniTS currently supports the five important tasks described in Table 1. Below, we briefly explain the technical details.
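The skeleton below illustrates one way such a task template could be organized around the pre-trained encoders and the fusion instance; the class and method bodies are hypothetical and only convey the fit/predict convention.

```python
class TaskTemplate:
    """Illustrative skeleton of an analysis-task template."""

    def __init__(self, encoders, fusion, output_model):
        self.encoders = encoders          # pre-trained g_1, ..., g_K
        self.fusion = fusion              # feature fusion instance
        self.output_model = output_model  # task-specific head/decoder

    def _embed(self, X):
        # Fused embedding z of the input series X.
        return self.fusion.transform([g.transform(X) for g in self.encoders])

    def fit(self, X, y=None):
        # Task-specific fine-tuning of encoders, fusion, and output model.
        raise NotImplementedError

    def predict(self, X):
        # Map the fused embedding to the task-dependent prediction.
        return self.output_model(self._embed(X))
```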
Classification. This task is performed in a standard manner. The output model projects the representation into the classes, and the softmax function is used to generate the class distribution. The standard cross-entropy loss is used for fine-tuning, and the class with the maximum probability is taken as the prediction.
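In code, the classification head and fine-tuning objective may look as follows; the dimensions and batch size are illustrative, and the fused representations are assumed to come from the encoders and fusion module, which are optimized jointly.

```python
import torch
import torch.nn as nn

num_classes, emb_dim = 5, 64
head = nn.Linear(emb_dim, num_classes)           # output model
criterion = nn.CrossEntropyLoss()

z = torch.randn(8, emb_dim, requires_grad=True)  # fused representations
y = torch.randint(0, num_classes, (8,))          # available class labels
loss = criterion(head(z), y)                     # fine-tuning loss
loss.backward()

pred = head(z).softmax(dim=-1).argmax(dim=-1)    # class with max probability
```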
Clustering. The clustering task can be directly performed by running a classical clustering algorithm (e.g., $k$-Means) on top of the representations. Moreover, one can also fine-tune the models for better performance. The fine-tuning is designed based on the $k$-Means loss. At each epoch, the $k$-Means algorithm is run on the representations to obtain the centroids. Then, the sum of the L2 distances between each representation vector and its centroid is added to the pre-training loss as a regularization term, which encourages a clustering-friendly structure in the representations. Note that we do not directly minimize the $k$-Means loss, to avoid the trivial solution in which all representations collapse to their centroids.
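A possible realization of this regularization term is sketched below, using scikit-learn's KMeans to obtain centroids without gradients and returning the distance term to be added to the pre-training loss; the helper name and hyper-parameters are assumptions.

```python
import torch
from sklearn.cluster import KMeans

def kmeans_regularizer(z, n_clusters=3):
    """Sum of L2 distances between each representation and its k-Means centroid."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(z.detach().cpu().numpy())
    centroids = torch.as_tensor(km.cluster_centers_, dtype=z.dtype, device=z.device)
    assigned = centroids[torch.as_tensor(km.labels_, dtype=torch.long)]
    return (z - assigned).norm(dim=-1).sum()

z = torch.randn(32, 64, requires_grad=True)   # fused representations
reg = kmeans_regularizer(z)                   # added to the pre-training loss
reg.backward()
```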
Forecasting. Forecasting is performed in the standard way. A decoder is used to transform the representations into predictions. Then, a forecasting loss such as the mean squared error (MSE) or the mean absolute error (MAE) is minimized for fine-tuning. At the inference stage, the decoder outputs are the forecasted values.
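For instance, the forecasting head could be a small MLP decoder trained with MSE as sketched below; the horizon, dimensions, and decoder width are placeholders.

```python
import torch
import torch.nn as nn

emb_dim, horizon, D = 64, 24, 3
decoder = nn.Sequential(nn.Linear(emb_dim, 128), nn.ReLU(),
                        nn.Linear(128, horizon * D))

z = torch.randn(8, emb_dim)                # fused representations
target = torch.randn(8, horizon, D)        # subsequent observations
forecast = decoder(z).view(8, horizon, D)  # predicted future values
loss = nn.functional.mse_loss(forecast, target)
```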
Anomaly detection. We employ a popular reconstruction-based framework (Schmidl et al., 2022) for anomaly detection. A decoder is added to project the representation $z$ to a reconstruction $\hat{X}$ of the input. The objective is to minimize the reconstruction loss $\lVert X - \hat{X} \rVert_2^2$. For detection, an anomaly score is computed at each time step $t$ as $s_t = \lVert x_t - \hat{x}_t \rVert_2$. An observation whose score exceeds a threshold $\delta$ is determined to be an anomaly, i.e., $\hat{y}_t = 1$ if $s_t > \delta$ and $\hat{y}_t = 0$ otherwise.
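The scoring and thresholding step can be expressed as follows; the threshold value and the artificially corrupted reconstruction are chosen only so that the toy example flags one time step.

```python
import torch

def detect_anomalies(X, X_hat, threshold):
    """Per-timestep L2 reconstruction error as the anomaly score; time steps
    whose score exceeds the threshold are flagged as anomalies."""
    scores = (X - X_hat).norm(dim=-1)            # s_t for each time step
    return scores, (scores > threshold).long()   # 1 = anomaly, 0 = normal

X = torch.randn(100, 3)                   # observed series
X_hat = X + 0.05 * torch.randn(100, 3)    # decoder reconstruction
X_hat[40] += 5.0                          # simulate a badly reconstructed step
scores, labels = detect_anomalies(X, X_hat, threshold=1.0)
print(labels.nonzero().flatten())         # -> tensor([40])
```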
Missing value imputation. This task is performed using the denoising autoencoder (DAE) structure (Vincent et al., 2008). During fine-tuning, we generate a random binary mask $M$ for each sample $X$. The masked sample $X \odot M$ is input to the pre-trained encoders, where $\odot$ denotes element-wise multiplication. Then, a decoder is used on top of the representations to reconstruct the entire input as $\hat{X}$. The DAE is learned by minimizing the reconstruction loss $\lVert X - \hat{X} \rVert_2^2$. At the inference stage, the missing values of an input $X$ are first replaced by 0, i.e., $x_{(t,d)} = 0$ for every missing position $(t,d) \in \Omega$, where $\Omega$ is the index set indicating the positions of the missing values (Table 1). Next, $X$ is fed into the fine-tuned model to predict $\hat{X}$. Finally, the missing value at each position $(t,d) \in \Omega$ is imputed with $\hat{x}_{(t,d)}$.
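The masking-based fine-tuning step and the imputation procedure can be summarized as below, where model stands for the composition of encoders, fusion, and decoder, and the mask ratio is a placeholder value.

```python
import torch

def dae_step(model, X, mask_ratio=0.15):
    """One DAE fine-tuning step: mask random entries, reconstruct the full input."""
    mask = (torch.rand_like(X) > mask_ratio).float()   # 1 = keep, 0 = masked
    X_hat = model(X * mask)                            # reconstruct entire input
    return ((X - X_hat) ** 2).mean()                   # reconstruction loss

def impute(model, X, missing):
    """Impute values at the positions marked True in the boolean tensor `missing`."""
    X_in = X.clone()
    X_in[missing] = 0.0                 # replace missing values with 0
    X_hat = model(X_in)                 # predict the full series
    X_out = X.clone()
    X_out[missing] = X_hat[missing]     # fill only the missing positions
    return X_out
```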


4. Demonstration
In our demonstration, we intend to show how UniTS can help the user to perform different time series analysis tasks, and how it can tackle the practical problems of partial labeling and domain shift.
Unified pipeline for time series analysis. We prepare the UEA datasets (Bagnall et al., 2018) for the audience to interact with UniTS. They can also analyze their own data using our system. Overall, the user takes the following two steps to perform an analysis task with UniTS.
Pre-training. In this step, the user adopts the UniTS interfaces to load the dataset and configure the pre-training methods with the templates to generate the instances, and then clicks the “Pre-training” button to start the self-supervised learning. UniTS visualizes the loss curves to help the user monitor the progress. The user can select and save the encoders for the analysis tasks. Note that this step is needed only once. All the following processes can be repeatedly performed using the pre-trained encoders without re-training.
Fine-tuning. In this stage, the user loads the pre-trained encoders and the training data of the task. The user can configure the feature fusion and analysis task modules through the GUIs. Once the “Fine-tuning” button is clicked, UniTS starts to learn the model for the selected task. The loss curves are plotted as in the pre-training step. UniTS also provides visualization and evaluation of the results for the different tasks to validate the models. After fine-tuning, the user can save the model as a standard JSON file which can be employed by any machine learning tool for inference.
Addressing the practical problems. Benefiting from the self-supervised pre-training which learns informative and transferable features, the user can tackle the problems of partial labeling and domain shift using UniTS. The process is illustrated in Figure 2(a).
Partial labeling. In this scenario, the user only needs to load the available labeled data for fine-tuning. UniTS can achieve significantly better performance compared to traditional training from scratch using only the labeled data (Figure 3).
Domain shift. To achieve cross-domain analysis using UniTS, the user can fine-tune the models with only a small amount of data from the target domain based on the pre-trained encoders. The models learned with UniTS can be more generalizable than the models trained from scratch using the same data from the target domain (i.e., TFS-Target) or from both domains (TFS-Both) (Figure 3).

Superior performance. We also evaluate UniTS on real-world tasks and settings. Our application scenarios include human action recognition (HAR), fault detection (FD) for two machines under different working conditions (i.e., domains), and server monitoring (SM). For benchmarking, we create one instance of every pre-training template described in Sec. 3.1 using its default encoder, and use a linear layer to fuse all representations into 64 dimensions. For task-specific output models, we use a standard linear layer for classification and a simple multilayer perceptron (MLP) with one hidden layer as the decoder for forecasting, anomaly detection, and imputation. We compare UniTS with the traditional solution of directly training the entire model (encoders + fusion layer + output model) from scratch using the task-specific loss function without self-supervised pre-training, with the same model architectures as UniTS. The results in Figure 3 indicate the superiority of UniTS powered by self-supervised representation learning.
Acknowledgements.
This paper was supported by the NSFC grant (62232005, 62202126), the National Key Research and Development Program of China (2021YFB3300502), and the Postdoctoral Fellowship Program of CPSF (GZC20233457).

References
- Bagnall et al. (2018) Anthony J. Bagnall, Hoang Anh Dau, Jason Lines, Michael Flynn, James Large, Aaron Bostrom, Paul Southam, and Eamonn J. Keogh. 2018. The UEA multivariate time series classification archive, 2018. CoRR (2018).
- Eldele et al. (2021) Emadeldeen Eldele, Mohamed Ragab, Zhenghua Chen, Min Wu, Chee Keong Kwoh, Xiaoli Li, and Cuntai Guan. 2021. Time-Series Representation Learning via Temporal and Contextual Contrasting. In IJCAI. 2352–2359.
- Franceschi et al. (2019) Jean-Yves Franceschi, Aymeric Dieuleveut, and Martin Jaggi. 2019. Unsupervised scalable representation learning for multivariate time series. NeurIPS (2019).
- Liang et al. (2023) Zhiyu Liang, Jianfeng Zhang, Chen Liang, Hongzhi Wang, Zheng Liang, and Lujia Pan. 2023. A Shapelet-Based Framework for Unsupervised Multivariate Time Series Representation Learning. PVLDB 17, 3 (2023), 386–399.
- Löning et al. (2019) Markus Löning, Anthony Bagnall, Sajaysurya Ganesh, Viktor Kazakov, Jason Lines, and Franz J Király. 2019. sktime: A unified interface for machine learning with time series. arXiv preprint arXiv:1909.07872 (2019).
- Schmidl et al. (2022) Sebastian Schmidl, Phillip Wenig, and Thorsten Papenbrock. 2022. Anomaly detection in time series: a comprehensive evaluation. PVLDB 15, 9 (2022), 1779–1797.
- Shen et al. (2022) Kendrick Shen, Robbie M Jones, Ananya Kumar, Sang Michael Xie, Jeff Z HaoChen, Tengyu Ma, and Percy Liang. 2022. Connect, not collapse: Explaining contrastive learning for unsupervised domain adaptation. In ICML. 19847–19878.
- Tavenard et al. (2020) Romain Tavenard, Johann Faouzi, Gilles Vandewiele, Felix Divo, Guillaume Androz, Chester Holtz, Marie Payne, Roman Yurchak, Marc Rußwurm, Kushal Kolar, and Eli Woods. 2020. Tslearn, A Machine Learning Toolkit for Time Series Data. JMLR 21, 118 (2020), 1–6.
- Vincent et al. (2008) Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. 2008. Extracting and Composing Robust Features with Denoising Autoencoders. In ICML. 1096–1103.
- Yue et al. (2022) Zhihan Yue, Yujing Wang, Juanyong Duan, Tianmeng Yang, Congrui Huang, Yunhai Tong, and Bixiong Xu. 2022. Ts2vec: Towards universal representation of time series. In AAAI, Vol. 36. 8980–8987.
- Zerveas et al. (2021) George Zerveas, Srideepika Jayaraman, Dhaval Patel, Anuradha Bhamidipaty, and Carsten Eickhoff. 2021. A transformer-based framework for multivariate time series representation learning. In SIGKDD. 2114–2124.