
Guaranteed Multidimensional Time Series Prediction via Deterministic Tensor Completion Theory

Hao Shu, Jicheng Li, Yu Jin, Hailin Wang. The work was supported by the National Natural Science Foundation of China (12171384). (Corresponding author: Jicheng Li.) H. Shu, J. Li, Y. Jin, and H. Wang are with the School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China (email: [email protected], [email protected], [email protected], [email protected]).
Abstract

In recent years, the prediction of multidimensional time series data has become increasingly important due to its wide-ranging applications. Tensor-based prediction methods have gained attention for their ability to preserve the inherent structure of such data. However, existing approaches, such as tensor autoregression and tensor decomposition, have consistently failed to provide clear assertions regarding the number of samples that can be exactly predicted. While matrix-based methods using nuclear norms address this limitation, their reliance on matrices limits accuracy and increases computational costs when handling multidimensional data. To overcome these challenges, we reformulate multidimensional time series prediction as a deterministic tensor completion problem and propose a novel theoretical framework. Specifically, we develop a deterministic tensor completion theory and introduce the Temporal Convolution Tensor Nuclear Norm (TCTNN) model. By convolving the multidimensional time series along the temporal dimension and applying the tensor nuclear norm, our approach identifies the maximum forecast horizon for exact predictions. Additionally, TCTNN achieves superior performance in prediction accuracy and computational efficiency compared to existing methods across diverse real-world datasets, including climate temperature, network flow, and traffic ride data. Our implementation is publicly available at https://github.com/HaoShu2000/TCTNN.

Index Terms:
multidimensional time series, prediction, deterministic tensor completion, temporal convolution low-rankness, exact prediction theory

I Introduction

Multidimensional time series, characterized by data with two or more dimensions at each time point, are often generated in a wide range of real-world scenarios, such as regional climate data [1], network flow data [2], video data [3], international relations data [4], and social network data [5]. Accurate forecasting of these temporal datasets is critical for numerous applications [6] [7] [8], such as optimizing route planning and traffic signal management in intelligent transportation systems [9] [10]. Affected by the acquisition equipment and other practical factors, the length of the observation data may be limited. For instance, high-throughput biological data, such as gene expression datasets, typically contain numerous features but only a few time samples [11]. Therefore, developing reliable few-sample multidimensional time series prediction methods has been a long-standing basic research challenge.

Traditional time series models, such as vector autoregression [12] and its variants [13] [14] [15], are not well-suited for handling multidimensional data. Recent studies represent multidimensional time series as tensors [5], where the first dimension corresponds to time and the remaining dimensions capture features or locations. For example, temperature data across different geographical locations can be structured as a third-order tensor (time × longitude × latitude), inherently preserving interactions between dimensions [16]. This tensor representation has enabled the development of methods such as tensor factor models [5] [17] and tensor autoregressive models [18] [19] [20] [21].

Another promising approach is to frame multidimensional time series prediction as a tensor completion task, where the regions of interest are treated as missing values. As illustrated in Fig.1, this process involves three steps: concatenating observed and predicted time series into an incomplete tensor, completing the tensor, and extracting the forecasted values. Within this framework, the low-rankness of multidimensional time series has been extensively utilized, leading to decomposition-based forecasting methods [1] [22] [23]. Nevertheless, existing tensor factorization, autoregressive, and decomposition models fail to offer a definitive conclusion about the number of samples that can be predicted exactly.

Figure 1: Illustration of tensor completion for multidimensional time series prediction, where black cubes represent unsampled entries, and other cubes represent sampled entries.

Low-rank tensor completion methods based on nuclear norm minimization, supported by recovery theory, offer a promising alternative. Some researchers have converted multidimensional time series into matrices and applied deterministic matrix nuclear norm minimization theories for predictive modeling [24] [25] [26] [27]. While these approaches provide theoretical support, they suffer from two key limitations: (1) the conversion from tensors to matrices loses critical structural information, reducing prediction accuracy; and (2) the computation of large-scale matrix nuclear norms imposes significant computational and memory costs. Despite advances in deterministic matrix recovery theory, an equivalent theory for tensor nuclear norm minimization remains absent, necessitating such matrix-based approaches [28, 29, 30, 31, 32, 33, 34, 35, 36, 37].

To address this gap, we propose a deterministic tensor completion theory based on tensor nuclear norm minimization for non-random missing data. Unlike the direct application of tensor nuclear norm minimization to forecasting, our method introduces a Temporal Convolution Tensor Nuclear Norm (TCTNN) model, which transforms the multidimensional time series via temporal convolution and applies tensor nuclear norm minimization to the transformed data. This approach establishes an exact prediction theory while achieving higher accuracy and lower computational costs compared to existing methods. Our main contributions can be summarized as follows:

  • Within the t-SVD framework, we establish a tensor completion theory suitable for arbitrary deterministic missing patterns by introducing the minimum slice sampling ratio as a new sampling metric.

  • The TCTNN model is proposed by encoding the low-rankness of the temporal convolution tensor, and its corresponding exact prediction theory is also provided. Furthermore, the impact of the smoothness and periodicity of the time dimension on the temporal convolution low-rankness is analyzed.

  • The proposed TCTNN model is efficiently solved within the ADMM framework and is evaluated on several real-world multidimensional time series datasets. The results indicate that the proposed method achieves better prediction accuracy and lower computation time than many state-of-the-art algorithms, especially in the few-sample scenario.

The remainder of this paper is organized as follows. In Section II, we review a series of related works. In Section III, a deterministic tensor completion theory is established. Section IV presents the proposed TCTNN method and its corresponding prediction theory, and Section V introduces the optimization algorithm. In Section VI, we conduct extensive experiments on several multidimensional time series datasets. Finally, the conclusion is given in Section VII. It should be noted that all proof details are given in the supplementary materials.

II Related Works

II-A Deterministic completion theory

Recovery theory for matrix/tensor completion in the random sampling case has been extensively studied, yielding significant advancements in many fields [38] [39] [40] [41]. In contrast, deterministic observation patterns have garnered considerably less attention. However, in practice the pattern of missing entries depends on the specific problem context, which can lead to highly non-random occurrences. A case in point is the presence of missing entries in images, which is often attributed to occlusions or shadows present in the scene. To investigate the recovery theory for matrix/tensor completion under non-random missing conditions, previous researchers have conducted extensive exploration using various tools [28, 29, 30, 31, 32, 33, 34, 35, 36, 37]. For instance, Singer et al. [29] apply the basic ideas and tools of rigidity theory to determine the uniqueness of low-rank matrix completion. Based on Plücker coordinates, Tsakiris [31] provides three families of patterns, for arbitrary rank and arbitrary matrix size, that allow unique or finitely many completions. Besides, Liu et al. [36] propose the isomeric condition and relative well-conditionedness to ensure that any matrix can be recovered from a sampling of its entries. Despite these advancements, most current theoretical work focuses on deterministic matrix completion, while the tensor case remains underexplored. Therefore, the theoretical framework for deterministic tensor completion still requires significant enhancement.

II-B Time series prediction via matrix/tensor completion

Matrix and tensor completion methods have garnered significant attention for time series forecasting [23] [26] [42]. These approaches generally fall into two categories: decomposition-based methods and nuclear norm-based methods. Matrix/tensor decomposition, widely used for collaborative filtering, provides a natural solution for handling missing data in time series forecasting [42] [43] [44]. These models approximate the incomplete time series using bilinear or multilinear decompositions with predefined rank parameters. To capture temporal dynamics, recent studies have introduced temporal constraint functions, such as autoregressive regularization terms, to regularize the temporal factor matrix [45] [23] [44]. Building on this, Takeuchi et al. [22] model spatiotemporal tensor data by introducing a spatial autoregressive regularizer, providing additional predictive power in the spatial dimension.

As opposed to decomposition-based methods, nuclear norm-based approaches do not require predefined rank parameters [38] [40] [41]. However, directly applying nuclear norms to time series often yields suboptimal predictions [6]. A practical enhancement involves applying structural transformations to time series data to create structured matrices, enabling the application of nuclear norms for forecasting. For example, Gillard et al. [26] and Butcher et al. [27] used the Hankel transform to convert univariate time series into Hankel matrices and applied nuclear norm minimization for prediction. Similarly, Liu et al. [24] [25] utilized convolution matrices for encoding nuclear norms, providing insights into the predictive domain that can be exactly forecasted. Although these methods offer predictive insights, they involve converting time series into large matrices, resulting in the loss of the inherent tensor structure and imposing high computational costs. To address these limitations, we propose the Temporal Convolution Tensor Nuclear Norm (TCTNN) model, which preserves the tensor structure, offers theoretical guarantees for predictions, and achieves superior performance in terms of both accuracy and computational efficiency.

III Deterministic Tensor Completion

In contrast to the typical scenario of tensor completion with random missing values, the prediction of multidimensional time series corresponds to a deterministic tensor completion problem. To address this, we first introduce a new theory of deterministic tensor completion in this section.

III-A T-SVD Framework

We begin by providing a brief overview of the key concepts within the tensor Singular Value Decomposition (t-SVD) framework [46] [47] [48] [49] [50]. This includes the tensor-tensor product, tensor nuclear norm, and the conditions for tensor incoherence. Following this, we propose a novel deterministic sampling metric designed to establish an exact recovery theory for deterministic tensor completion.

For consistency, we use lowercase, boldface lowercase, capital, and Euler script letters to represent scalars, vectors, matrices, and tensors, respectively, e.g., $x\in\mathbb{R}$, $\bm{x}\in\mathbb{R}^{m}$, $X\in\mathbb{R}^{m_{1}\times m_{2}}$, and $\mathcal{X}\in\mathbb{R}^{m_{1}\times m_{2}\times\cdots\times m_{d}}$. For an order-$d$ tensor $\mathcal{X}$ of dimensions $m_{1}\times m_{2}\times\cdots\times m_{d}$, its elements, slices, and sub-tensors are denoted by subscripted indices; for example, $\mathcal{X}_{(i_{1},i_{2},\cdots,i_{d})}$ represents the $(i_{1},i_{2},\cdots,i_{d})$-th element and $\mathcal{X}_{(:,:,i_{3},\cdots,i_{d})}$ represents the $(i_{3},\cdots,i_{d})$-th face slice. To construct the t-SVD framework, some operators on $\mathcal{X}\in\mathbb{R}^{m_{1}\times m_{2}\times\cdots\times m_{d}}$ are summarized in Table I.

TABLE I: Summary of operators for t-SVD
Notations | Descriptions
$\operatorname{circ}(\mathcal{X})$ | $\left[\begin{array}{cccc}\mathcal{X}_{(:,\cdots,:,1)}&\mathcal{X}_{(:,\cdots,:,m_{d})}&\cdots&\mathcal{X}_{(:,\cdots,:,2)}\\ \mathcal{X}_{(:,\cdots,:,2)}&\mathcal{X}_{(:,\cdots,:,1)}&\cdots&\mathcal{X}_{(:,\cdots,:,3)}\\ \vdots&\vdots&\ddots&\vdots\\ \mathcal{X}_{(:,\cdots,:,m_{d})}&\mathcal{X}_{(:,\cdots,:,m_{d}-1)}&\cdots&\mathcal{X}_{(:,\cdots,:,1)}\end{array}\right]$, of size $m_{1}m_{d}\times m_{2}m_{d}\times m_{3}\times\cdots\times m_{d-1}$
$\operatorname{bcirc}(\mathcal{X})$ | $\operatorname{bcirc}(\mathcal{X})=\operatorname{circ}^{d-2}(\mathcal{X})$, of size $m_{1}\prod_{j=3}^{d}m_{j}\times m_{2}\prod_{j=3}^{d}m_{j}$
$\operatorname{unfold}(\mathcal{X})$ | $\left[\begin{array}{c}\mathcal{X}_{(:,\cdots,:,1)}\\ \mathcal{X}_{(:,\cdots,:,2)}\\ \vdots\\ \mathcal{X}_{(:,\cdots,:,m_{d})}\end{array}\right]$, of size $m_{1}m_{d}\times m_{2}\times\cdots\times m_{d-1}$
$\operatorname{bunfold}(\mathcal{X})$ | $\operatorname{bunfold}(\mathcal{X})=\operatorname{unfold}^{d-2}(\mathcal{X})$, of size $m_{1}\prod_{j=3}^{d}m_{j}\times m_{2}$
$\operatorname{bfold}(\mathcal{X})$ | the inverse operation of $\operatorname{bunfold}(\mathcal{X})$
$\operatorname{bdiag}(\mathcal{X})$ | $\left[\begin{array}{cccc}\mathcal{X}_{(:,:,1,\cdots,1)}&&&\\ &\mathcal{X}_{(:,:,2,\cdots,1)}&&\\ &&\ddots&\\ &&&\mathcal{X}_{(:,:,m_{3},\cdots,m_{d})}\end{array}\right]$, of size $m_{1}\prod_{j=3}^{d}m_{j}\times m_{2}\prod_{j=3}^{d}m_{j}$
Definition III.1 (T-product [49]).

Let order-$d$ tensors $\mathcal{A}\in\mathbb{R}^{m_{1}\times a\times m_{3}\times\cdots\times m_{d}}$ and $\mathcal{B}\in\mathbb{R}^{a\times m_{2}\times m_{3}\times\cdots\times m_{d}}$; then the t-product $\mathcal{A}*\mathcal{B}$ is defined to be a tensor of size $m_{1}\times m_{2}\times m_{3}\times\cdots\times m_{d}$,

$\mathcal{A}*\mathcal{B}=\operatorname{bfold}(\operatorname{bcirc}(\mathcal{A})\cdot\operatorname{bunfold}(\mathcal{B})).$ (1)
Definition III.2 (Circular convolution [51]).

For any tensors $\mathcal{X}\in\mathbb{R}^{m_{1}\times m_{2}\times\cdots\times m_{d}}$ and $\mathcal{K}\in\mathbb{R}^{\nu_{1}\times\nu_{2}\times\cdots\times\nu_{d}}$ with $\nu_{s}\leq m_{s}$, $s=1,2,\cdots,d$, the circular convolution of the two tensors is

$\mathcal{Z}=\mathcal{X}\star\mathcal{K}\in\mathbb{R}^{m_{1}\times m_{2}\times\cdots\times m_{d}},$ (2)

or, element-wise,

$\mathcal{Z}_{(i_{1},\cdots,i_{d})}=\sum_{j_{1}=1}^{\nu_{1}}\cdots\sum_{j_{d}=1}^{\nu_{d}}\mathcal{X}_{(i_{1}-j_{1}+1,\cdots,i_{d}-j_{d}+1)}\mathcal{K}_{(j_{1},\cdots,j_{d})},$ (3)

where $\star$ denotes the circular convolution operator. The circulant boundary condition is satisfied by assuming $\mathcal{X}_{(i_{1}-j_{1}+1+\delta_{1}m_{1},\cdots,i_{d}-j_{d}+1+\delta_{d}m_{d})}=\mathcal{X}_{(i_{1}-j_{1}+1,\cdots,i_{d}-j_{d}+1)}$ with $\delta_{s}=1$ if $i_{s}<j_{s}$ and $\delta_{s}=0$ otherwise.

Since the multiplication of $\operatorname{bcirc}(\mathcal{A})$ and $\operatorname{bunfold}(\mathcal{B})$ takes the form of circular convolutions, the t-product can also be defined using circular convolution:

$[\mathcal{A}*\mathcal{B}]_{(i_{1},i_{2},:,\cdots,:)}=\sum_{j=1}^{a}\mathcal{A}_{(i_{1},j,:,\cdots,:)}\star\mathcal{B}_{(j,i_{2},:,\cdots,:)},$ (4)

where $\star$ denotes the circular convolution operator. Evidently, the t-product of tensors is an operation analogous to matrix multiplication. As a result, many properties of matrix multiplication can be extended to the t-product [48] [50].

In addition, the discrete Fourier transform (DFT) converts the circular convolution operation into an element-wise product. Therefore, the t-product $\mathcal{A}*\mathcal{B}$ can be computed through the DFT. For a given tensor $\mathcal{X}$, its Fourier transform along the last $d-2$ modes is expressed as

$\bar{\mathcal{X}}:=\mathcal{F}_{d-2}(\mathcal{X})=\mathcal{X}\times_{3}\mathrm{F}_{m_{3}}\times_{4}\cdots\times_{d}\mathrm{F}_{m_{d}},$ (5)

where $\times_{j}$ denotes the $j$-mode product [52] and $\mathrm{F}_{m_{j}}$ represents the DFT matrix of size $m_{j}\times m_{j}$ for $j=3,\cdots,d$. Then

$\mathcal{A}*\mathcal{B}=\mathcal{F}_{d-2}^{-1}(\mathcal{F}_{d-2}(\mathcal{A})\,\Delta\,\mathcal{F}_{d-2}(\mathcal{B})),$ (6)

where $\Delta$ denotes the face-wise product ($\mathcal{Z}=\mathcal{X}\Delta\mathcal{Y}\Leftrightarrow\mathcal{Z}_{(:,:,i_{3},\cdots,i_{d})}=\mathcal{X}_{(:,:,i_{3},\cdots,i_{d})}\mathcal{Y}_{(:,:,i_{3},\cdots,i_{d})}$ for all face slices).
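To make the computation in (6) concrete, the following NumPy sketch (our own illustration, not part of the original paper) implements the t-product of two third-order tensors via the DFT; the general order-$d$ case would simply apply the FFT along every mode from 3 to $d$.

```python
import numpy as np

def t_product(A, B):
    """t-product of two order-3 tensors via the DFT, as in Eq. (6).

    A: (m1, a, m3), B: (a, m2, m3) -> result of shape (m1, m2, m3).
    """
    A_hat = np.fft.fft(A, axis=2)              # Fourier transform along mode 3
    B_hat = np.fft.fft(B, axis=2)
    m1, _, m3 = A.shape
    m2 = B.shape[1]
    C_hat = np.empty((m1, m2, m3), dtype=complex)
    for i in range(m3):                        # face-wise matrix products
        C_hat[:, :, i] = A_hat[:, :, i] @ B_hat[:, :, i]
    return np.real(np.fft.ifft(C_hat, axis=2))

# Quick check against Definition III.3: A * I = A for the identity tensor.
A = np.random.randn(4, 3, 5)
I = np.zeros((3, 3, 5)); I[:, :, 0] = np.eye(3)
assert np.allclose(t_product(A, I), A)
```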

Definition III.3 (Identity tensor [49]).

The identity tensor $\mathcal{I}\in\mathbb{R}^{m\times m\times m_{3}\times\cdots\times m_{d}}$ is the tensor whose first face slice $\mathcal{I}_{(:,:,1,\cdots,1)}$ is the $m\times m$ identity matrix and whose other face slices are all zeros.

Definition III.4 (Transpose [49]).

For an order-$d$ tensor $\mathcal{X}\in\mathbb{R}^{m_{1}\times m_{2}\times\cdots\times m_{d}}$, its transpose $\mathcal{X}^{\mathrm{T}}\in\mathbb{R}^{m_{2}\times m_{1}\times m_{3}\times\cdots\times m_{d}}$ is obtained by recursively transposing each sub-tensor $\mathcal{X}_{(:,\cdots,:,i)}$ for $i=1,\ldots,m_{d}$ and then reversing the order of the transposed sub-tensors $2$ through $m_{d}$.

Definition III.5 (Orthogonal tensor [49]).

An order-$d$ tensor $\mathcal{U}\in\mathbb{R}^{m\times m\times m_{3}\times\cdots\times m_{d}}$ is orthogonal if $\mathcal{U}^{\mathrm{T}}*\mathcal{U}=\mathcal{U}*\mathcal{U}^{\mathrm{T}}=\mathcal{I}$, where $\mathcal{I}\in\mathbb{R}^{m\times m\times m_{3}\times\cdots\times m_{d}}$ is the order-$d$ identity tensor.

Definition III.6 (F-diagonal tensor [49]).

A tensor is said to be f-diagonal if all of its face slices are diagonal.

Lemma III.1 (T-SVD [49]).

Let $\mathcal{X}\in\mathbb{R}^{m_{1}\times m_{2}\times\cdots\times m_{d}}$; then it can be factorized as

$\mathcal{X}=\mathcal{U}*\mathcal{S}*\mathcal{V}^{\mathrm{T}},$ (7)

where $\mathcal{U}\in\mathbb{R}^{m_{1}\times m_{1}\times m_{3}\times\cdots\times m_{d}}$ and $\mathcal{V}\in\mathbb{R}^{m_{2}\times m_{2}\times m_{3}\times\cdots\times m_{d}}$ are orthogonal tensors, and $\mathcal{S}\in\mathbb{R}^{m_{1}\times m_{2}\times\cdots\times m_{d}}$ is an f-diagonal tensor.

Definition III.7 (Tensor tubal rank [48] [50]).

For $\mathcal{X}\in\mathbb{R}^{m_{1}\times m_{2}\times\cdots\times m_{d}}$ with t-SVD $\mathcal{X}=\mathcal{U}*\mathcal{S}*\mathcal{V}^{\mathrm{T}}$, its tubal rank is defined as

$\operatorname{rank}_{t}(\mathcal{X}):=\sharp\{i:\mathcal{S}_{(i,i,:,\cdots,:)}\neq\mathbf{0}\},$

where $\sharp$ denotes the cardinality of a set.

Definition III.8 (Tensor multi-rank [48] [50]).

The multi-rank of a tensor $\mathcal{X}\in\mathbb{R}^{m_{1}\times m_{2}\times\cdots\times m_{d}}$ is a vector $\mathbf{r}\in\mathbb{R}^{m_{3}m_{4}\cdots m_{d}}$ whose $i$-th element is the rank of the $i$-th block of $\operatorname{bdiag}(\bar{\mathcal{X}})$. We denote the multi-rank sum by $r_{s}$, i.e., $r_{s}=\sum_{i=1}^{m_{3}m_{4}\cdots m_{d}}\mathbf{r}^{(i)}$.
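As an illustration (not from the paper), both quantities can be computed from the Fourier-domain face slices; a minimal NumPy sketch for the third-order case, assuming a numerical tolerance for rank decisions:

```python
import numpy as np

def tubal_and_multi_rank(X, tol=1e-10):
    """Tubal rank and multi-rank sum (Definitions III.7-III.8) of an order-3 tensor.

    The tubal rank equals the largest Fourier-slice rank; the multi-rank collects
    the rank of every block of bdiag(X_bar), and r_s is their sum.
    """
    X_hat = np.fft.fft(X, axis=2)
    ranks = [int(np.sum(np.linalg.svd(X_hat[:, :, i], compute_uv=False) > tol))
             for i in range(X.shape[2])]
    return max(ranks), sum(ranks)   # rank_t(X), r_s
```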

Definition III.9 (Tensor spectral norm [48] [50]).

For an order-$d$ tensor $\mathcal{X}\in\mathbb{R}^{m_{1}\times m_{2}\times\cdots\times m_{d}}$, its tensor spectral norm is defined as

$\|\mathcal{X}\|:=\|\operatorname{bdiag}(\bar{\mathcal{X}})\|,$

where the norm on the right-hand side denotes the spectral norm of a matrix.

Definition III.10 (Tensor nuclear norm [48] [50]).

For an order-$d$ tensor $\mathcal{X}\in\mathbb{R}^{m_{1}\times m_{2}\times\cdots\times m_{d}}$ under the t-SVD framework, its tensor nuclear norm (TNN) is defined as

$\|\mathcal{X}\|_{\circledast}:=\frac{1}{m}\lVert\operatorname{bdiag}(\bar{\mathcal{X}})\rVert_{*},$

where $m=m_{3}m_{4}\cdots m_{d}$ and $\|\cdot\|_{*}$ denotes the nuclear norm of a matrix. Note that the tensor nuclear norm is the dual norm of the tensor spectral norm, which is consistent with the matrix case [38] [48].

Lemma III.2 (T-SVT [48] [50]).

Given $\mathcal{X}\in\mathbb{R}^{m_{1}\times m_{2}\times\cdots\times m_{d}}$ with t-SVD $\mathcal{X}=\mathcal{U}*\mathcal{S}*\mathcal{V}^{\mathrm{T}}$, its tensor singular value thresholding (t-SVT) is defined by $\operatorname{t-SVT}_{\tau}(\mathcal{X}):=\mathcal{U}*\mathcal{S}_{\tau}*\mathcal{V}^{\mathrm{T}}$, where $\mathcal{S}_{\tau}=\mathcal{F}_{d-2}^{-1}((\bar{\mathcal{S}}-\tau)_{+})$ and $a_{+}=\max(0,a)$. It obeys

$\operatorname{t-SVT}_{\tau}(\mathcal{X})=\arg\min_{\mathcal{Y}}\ \tau\|\mathcal{Y}\|_{\circledast}+\frac{1}{2}\|\mathcal{Y}-\mathcal{X}\|_{\mathrm{F}}^{2}.$ (8)
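For reference, a minimal NumPy sketch (third-order case, our own illustration) of the tensor nuclear norm and the t-SVT operator: each Fourier-domain face slice is decomposed by a matrix SVD and its singular values are soft-thresholded by $\tau$.

```python
import numpy as np

def tnn(X):
    """Tensor nuclear norm of Definition III.10 for an order-3 tensor."""
    X_hat = np.fft.fft(X, axis=2)
    return sum(np.linalg.svd(X_hat[:, :, i], compute_uv=False).sum()
               for i in range(X.shape[2])) / X.shape[2]

def t_svt(X, tau):
    """Tensor singular value thresholding (Lemma III.2), order-3 case."""
    X_hat = np.fft.fft(X, axis=2)
    Y_hat = np.zeros_like(X_hat)
    for i in range(X.shape[2]):
        U, s, Vh = np.linalg.svd(X_hat[:, :, i], full_matrices=False)
        Y_hat[:, :, i] = (U * np.maximum(s - tau, 0.0)) @ Vh   # (s - tau)_+
    return np.real(np.fft.ifft(Y_hat, axis=2))
```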

III-B Deterministic Tensor Completion

Now we consider the deterministic tensor completion problem, a broader research challenge. Let $\mathcal{M}\in\mathbb{R}^{m_{1}\times\cdots\times m_{d}}$ represent an unknown target tensor with tubal rank $r$ and skinny t-SVD $\mathcal{M}=\mathcal{U}*\mathcal{S}*\mathcal{V}^{\mathrm{T}}$. Suppose that we observe only the entries of $\mathcal{M}$ over a deterministic sampling set $\Omega\subseteq[m_{1}]\otimes[m_{2}]\otimes\cdots\otimes[m_{d}]$. The corresponding mask tensor is denoted by $\bar{\Omega}$, with entries

$[\bar{\Omega}]_{(i_{1},\cdots,i_{d})}=\begin{cases}1&\text{if }(i_{1},\cdots,i_{d})\in\Omega,\\ 0&\text{otherwise.}\end{cases}$

The sampling operator $\mathcal{P}_{\Omega}$ is then defined as $\mathcal{P}_{\Omega}(\mathcal{M})=\bar{\Omega}\circ\mathcal{M}$, where $\circ$ denotes the Hadamard product.

Figure 2: Illustrations of the horizontal/lateral mask sub-tensor sampling number in the three-dimensional case. (a): Arrangement of the mask tensor $\bar{\Omega}$, where white cubes represent the value 1 (sampled entries) and black cubes represent the value 0 (unsampled entries); (b): $|\bar{\Omega}_{3}|=10$ indicates that there are 10 sampled entries in the third horizontal mask sub-tensor; (c): $|\bar{\Omega}^{2}|=9$ indicates that there are 9 sampled entries in the second lateral mask sub-tensor.

To address the deterministic tensor completion problem, we employ a standard constrained Tensor Nuclear Norm (TNN) model:

$\min_{\mathcal{X}\in\mathbb{R}^{m_{1}\times m_{2}\times\cdots\times m_{d}}}\ \lVert\mathcal{X}\rVert_{\circledast},\quad\text{s.t.}\ \mathcal{P}_{\Omega}(\mathcal{X})=\mathcal{P}_{\Omega}(\mathcal{M}),$ (9)

where $\Omega$ is a deterministic sampling set. In the context of tensor completion under random sampling, it is often assumed that the sampling follows a specific distribution, such as the Bernoulli distribution, so the sampling probability $p$ can naturally serve as a sampling metric [48]. However, this approach is clearly ineffective for deterministic sampling. Inspired by the minimum row/column sampling ratio for deterministic matrix sampling [36], we introduce a tensor deterministic sampling metric $\rho$, termed the minimum horizontal/lateral sub-tensor sampling ratio. Here, the horizontal/lateral sub-tensors of a tensor are analogous to the rows/columns of a matrix, respectively. As mentioned above, the tensor t-product and spectral/nuclear norms within the t-SVD framework are defined in a matrix-like manner, providing a solid basis for defining a tensor deterministic sampling metric by following the matrix case.

Definition III.11 (Horizontal/lateral mask sub-tensor sampling number).

For any fixed sampling set $\Omega\subseteq[m_{1}]\otimes[m_{2}]\otimes\cdots\otimes[m_{d}]$ with corresponding mask tensor $\bar{\Omega}\in\mathbb{R}^{m_{1}\times\cdots\times m_{d}}$, the $i_{1}$-th horizontal mask sub-tensor, i.e., the $i_{1}$-th horizontal sub-tensor of the mask tensor, is given by $\bar{\Omega}_{i_{1}}:=[\bar{\Omega}]_{(i_{1},:,\cdots,:)}$. The $i_{1}$-th horizontal mask sub-tensor sampling number is then defined as

$\left|\bar{\Omega}_{i_{1}}\right|:=\sharp\{(i_{2},i_{3},\cdots,i_{d})\,|\,[\bar{\Omega}_{i_{1}}]_{(i_{2},i_{3},\cdots,i_{d})}=1\}.$

Similarly, with $\bar{\Omega}^{i_{2}}:=[\bar{\Omega}]_{(:,i_{2},\cdots,:)}$, the $i_{2}$-th lateral mask sub-tensor sampling number is defined as

$\left|\bar{\Omega}^{i_{2}}\right|:=\sharp\{(i_{1},i_{3},\cdots,i_{d})\,|\,[\bar{\Omega}^{i_{2}}]_{(i_{1},i_{3},\cdots,i_{d})}=1\},$

where $\sharp$ denotes the cardinality of a set.

For an intuitive understanding, we provide a schematic diagram illustrating horizontal/lateral mask sub-tensor sampling number in the three-dimensional case, as shown in Fig.2.

Definition III.12 (Minimum horizontal/lateral sub-tensor sampling ratio).

For any fixed sampling set $\Omega\subseteq[m_{1}]\otimes[m_{2}]\otimes\cdots\otimes[m_{d}]$, its minimum horizontal/lateral sub-tensor sampling ratio is defined as the smallest fraction of sampled entries among all horizontal and lateral mask sub-tensors, namely,

$\rho(\Omega)=\min\left(\min_{1\leq i_{1}\leq m_{1}}\frac{\left|\bar{\Omega}_{i_{1}}\right|}{m_{2}\times m},\ \min_{1\leq i_{2}\leq m_{2}}\frac{\left|\bar{\Omega}^{i_{2}}\right|}{m_{1}\times m}\right),$ (10)

where $m=m_{3}m_{4}\cdots m_{d}$.
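A small NumPy sketch (our own illustration) of Definition III.12, computing $\rho(\Omega)$ directly from a binary mask tensor:

```python
import numpy as np

def min_sampling_ratio(mask):
    """Minimum horizontal/lateral sub-tensor sampling ratio rho(Omega)
    for a binary mask tensor of shape (m1, m2, m3, ..., md)."""
    m1, m2 = mask.shape[0], mask.shape[1]
    m = int(np.prod(mask.shape[2:])) if mask.ndim > 2 else 1
    horizontal = mask.reshape(m1, -1).sum(axis=1) / (m2 * m)           # |Omega_{i1}| / (m2 m)
    lateral = np.moveaxis(mask, 1, 0).reshape(m2, -1).sum(axis=1) / (m1 * m)
    return float(min(horizontal.min(), lateral.min()))

# Example: a 4 x 3 x 2 mask with a few unobserved entries
# (an arbitrary pattern, not the exact one of Fig. 2).
mask = np.ones((4, 3, 2))
mask[3, :, 0] = 0
mask[:, 1, 1] = 0
print(min_sampling_ratio(mask))
```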

Tensor incoherence is a crucial theoretical tool in low-rank tensor recovery [38] [39] [50] [53]. It enforces or constrains the low-rank structure of the underlying tensor to avoid ill-posed problems, such as when most elements of $\mathcal{M}$ are zero, yet it is still low-rank.

Definition III.13 (Tensor incoherence conditions [50]).

For $\mathcal{M}\in\mathbb{R}^{m_{1}\times\cdots\times m_{d}}$ with tubal rank $r$ and skinny t-SVD $\mathcal{M}=\mathcal{U}*\mathcal{S}*\mathcal{V}^{\mathrm{T}}$, $\mathcal{M}$ is said to satisfy the tensor incoherence conditions with parameter $\mu>0$ if

$\max_{i_{1}=1,\cdots,m_{1}}\|\mathcal{U}^{\mathrm{T}}*\mathring{\mathfrak{e}}_{1}^{(i_{1})}\|_{\mathrm{F}}\leq\sqrt{\frac{\mu r}{m_{1}m}},$ (11)

$\max_{i_{2}=1,\cdots,m_{2}}\|\mathcal{V}^{\mathrm{T}}*\mathring{\mathfrak{e}}_{2}^{(i_{2})}\|_{\mathrm{F}}\leq\sqrt{\frac{\mu r}{m_{2}m}},$ (12)

where $m=m_{3}m_{4}\cdots m_{d}$, $\mathring{\mathfrak{e}}_{1}^{(i_{1})}$ is the order-$d$ mode-1 tensor basis of size $m_{1}\times 1\times m_{3}\times\cdots\times m_{d}$, whose $(i_{1},1,i_{3},\cdots,i_{d})$-th entry equals 1 and whose remaining entries equal 0, and $\mathring{\mathfrak{e}}_{2}^{(i_{2})}:=(\mathring{\mathfrak{e}}_{1}^{(i_{2})})^{\mathrm{T}}$ is the mode-2 basis.

III-C Exact Recovery Guarantee

We now briefly present the exact recovery guarantee for the deterministic low-rank tensor completion problem.

Theorem III.1.

Suppose that $\mathcal{M}\in\mathbb{R}^{m_{1}\times\cdots\times m_{d}}$ obeys the standard tensor incoherence conditions (11)-(12), $\Omega\subseteq[m_{1}]\otimes[m_{2}]\otimes\cdots\otimes[m_{d}]$, and $\rho(\Omega)$ is its minimum horizontal/lateral sub-tensor sampling ratio. If

$\rho(\Omega)>1-\frac{1}{2\mu r(r_{s}+1)},$ (13)

where $r$ is the tubal rank of the tensor $\mathcal{M}$, $r_{s}$ is its multi-rank sum, and $\mu$ is the parameter of the tensor incoherence conditions, then $\mathcal{M}$ is the unique solution to the TNN model (9).

The above result demonstrates that minimizing the tensor nuclear norm can achieve exact deterministic tensor completion, provided that condition (13) is satisfied. The feasibility of exact recovery hinges upon the interplay between the sampling set and the tensor rank. As the low-rank structure of the tensor strengthens, the minimum slice sampling ratio required for exact deterministic tensor completion decreases correspondingly. Moreover, the proposed deterministic exact recovery theory can be extended to the random sampling case. The following proposition reveals that when the sampling set satisfies $\Omega\sim\operatorname{Ber}(p)$ and $p\geq 1-1/(2\mu r(r_{s}+1))$, exact recovery holds with high probability.

Proposition III.1.

Suppose that $\mathcal{M}\in\mathbb{R}^{m_{1}\times\cdots\times m_{d}}$ obeys the standard tensor incoherence conditions (11)-(12) and $\Omega\sim\operatorname{Ber}(p)$. If $p\geq 1-1/(2\mu r(r_{s}+1))$, then $\mathcal{M}$ is the unique solution to the TNN model (9) with probability at least $1-e^{-4a^{2}m_{0}}$, where $a=p-1+1/(2\mu r(r_{s}+1))$ and $m_{0}=m_{1}m_{2}\cdots m_{d}$.

IV Multidimensional Time Series Prediction

Let us denote the multidimensional time series as $\{\mathcal{M}_{i}\}_{i=1}^{t}$, where $\mathcal{M}_{i}\in\mathbb{R}^{n_{1}\times\cdots\times n_{p}}$. The objective is to predict the next $h$ unseen samples $\{\mathcal{M}_{i}\}_{i=t-h+1}^{t}$ given the historical portion $\{\mathcal{M}_{i}\}_{i=1}^{t-h}$, where $t$ represents the number of samples along the time dimension and $h$ denotes the number of samples to be predicted, referred to as the forecast horizon.

IV-A Time series prediction via TNN

Multidimensional time series prediction, as a specific instance of deterministic tensor completion, can be addressed using the aforementioned tensor nuclear norm minimization, which results in the following model:

$\min_{\mathcal{X}\in\mathbb{R}^{t\times n_{1}\times\cdots\times n_{p}}}\ \lVert\mathcal{X}\rVert_{\circledast},\quad\text{s.t.}\ \mathcal{P}_{\Omega}(\mathcal{X})=\mathcal{P}_{\Omega}(\mathcal{M}),$ (14)

where $\Omega=[t-h]\otimes[n_{1}]\otimes\cdots\otimes[n_{p}]$ represents the deterministic sampling set corresponding to the historical samples, $\mathcal{P}_{\Omega}(\cdot)$ is the projection operator, and $\mathcal{M}\in\mathbb{R}^{t\times n_{1}\times n_{2}\times\cdots\times n_{p}}$ denotes the multidimensional time series to be completed.

Deterministic tensor completion theory can be applied to prediction tasks by choosing the sampling set $\Omega$ as the historical data region of the prediction problem. However, in this case the minimum horizontal/lateral sub-tensor sampling ratio is

$\rho(\Omega)=0,$ (15)

which fails to meet the recovery condition (13). This implies that tensor nuclear norm minimization cannot be directly applied to prediction tasks, as it would result in prediction outcomes that are entirely zero. The core issue lies in the concentrated nature of missing data in time series forecasting, which leads to insufficient sampling coverage. A natural solution is to apply a structural transformation that disperses the missing data, thereby raising the minimum horizontal/lateral sub-tensor sampling ratio above zero. To this end, we introduce the temporal convolution tensor for multidimensional time series, which ensures that each horizontal/lateral sub-tensor contains a sufficient number of sampled elements, as illustrated in Fig. 3.

IV-B Temporal Convolution Tensor

In this study, circular convolution (Definition III.2) is applied only along the time dimension of the multidimensional time series, a process referred to as temporal circular convolution. This yields the temporal convolution tensor of the multidimensional time series.

Definition IV.1 (Temporal circular convolution operator).

The temporal circular convolution that converts $\mathcal{M}\in\mathbb{R}^{t\times n_{1}\times\cdots\times n_{p}}$ and $\mathbf{k}\in\mathbb{R}^{k}$ ($k\leq t$) into $\mathcal{M}\star_{t}\mathbf{k}\in\mathbb{R}^{t\times n_{1}\times\cdots\times n_{p}}$ is expressed as follows:

$[\mathcal{M}\star_{t}\mathbf{k}]_{(i,i_{1},\cdots,i_{p})}=\sum_{j=1}^{k}\mathcal{M}_{(i-j+1,i_{1},\cdots,i_{p})}\mathbf{k}_{(j)},$ (16)

where $\star_{t}$ denotes the temporal circular convolution operator, $\mathbf{k}$ is the kernel vector, and it is assumed that $\mathcal{M}_{(i-j+1,i_{1},\cdots,i_{p})}=\mathcal{M}_{(i-j+1+t,i_{1},\cdots,i_{p})}$ for $i+1\leq j$, which is the temporal circulant boundary condition.

Definition IV.2 (Temporal Convolution Tensor).

The temporal convolution tensor $\mathcal{T}_{k}(\mathcal{M})\in\mathbb{R}^{t\times k\times n_{1}\times\cdots\times n_{p}}$ of a multidimensional time series $\mathcal{M}\in\mathbb{R}^{t\times n_{1}\times\cdots\times n_{p}}$ is defined through the temporal circular convolution:

$[\mathcal{M}\star_{t}\mathbf{k}]_{(:,i_{1},\cdots,i_{p})}=[\mathcal{T}_{k}(\mathcal{M})]_{(:,:,i_{1},\cdots,i_{p})}\mathbf{k},$ (17)

where $\star_{t}$ denotes the temporal circular convolution operator and $\mathcal{T}_{k}(\cdot)$ is the temporal convolution transform. The temporal convolution tensor $\mathcal{T}_{k}(\mathcal{M})$ is given by

$\mathcal{T}_{k}(\mathcal{M})=\left[\begin{array}{cccc}\mathcal{M}^{1}&\mathcal{M}^{t}&\cdots&\mathcal{M}^{t-k+2}\\ \mathcal{M}^{2}&\mathcal{M}^{1}&\cdots&\mathcal{M}^{t-k+3}\\ \vdots&\vdots&\ddots&\vdots\\ \mathcal{M}^{t}&\mathcal{M}^{t-1}&\cdots&\mathcal{M}^{t-k+1}\end{array}\right],$

where $\mathcal{M}^{i}\in\mathbb{R}^{1\times 1\times n_{1}\times\cdots\times n_{p}}$ represents the information of the multidimensional time series $\mathcal{M}$ at the $i$-th time sampling point; in other words, $\mathcal{M}^{i}=\operatorname{reshape}(\mathcal{M}_{i},1,1,n_{1},\cdots,n_{p})$.
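The construction of $\mathcal{T}_{k}(\mathcal{M})$ amounts to stacking circular time-shifts of $\mathcal{M}$. The sketch below (our own NumPy illustration, not the paper's code) builds it and checks the defining relation (17) against a direct evaluation of (16).

```python
import numpy as np

def temporal_conv_tensor(M, k):
    """Temporal convolution tensor T_k(M) of Definition IV.2.

    M: shape (t, n1, ..., np) -> result of shape (t, k, n1, ..., np); the j-th
    lateral slice is M circularly shifted by j steps along the time mode.
    """
    return np.stack([np.roll(M, j, axis=0) for j in range(k)], axis=1)

def temporal_circ_conv(M, kernel):
    """Temporal circular convolution M *_t kernel of Definition IV.1."""
    out = np.zeros(M.shape)
    for j, kj in enumerate(kernel):
        out += kj * np.roll(M, j, axis=0)
    return out

# Check Eq. (17): contracting T_k(M) with the kernel reproduces M *_t kernel.
t, n1, n2, k = 6, 3, 2, 4
M = np.random.randn(t, n1, n2)
kernel = np.random.randn(k)
T = temporal_conv_tensor(M, k)                    # shape (t, k, n1, n2)
assert np.allclose(temporal_circ_conv(M, kernel),
                   np.tensordot(T, kernel, axes=([1], [0])))
# Note that the first lateral slice recovers M itself: T[:, 0] == M.
```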

Definition IV.3 (Temporal convolution sampling set).

For the sampling set of a prediction problem, $\Omega=[t-h]\otimes[n_{1}]\otimes\cdots\otimes[n_{p}]$, its temporal convolution sampling set is denoted by $\Omega_{\mathcal{T}}$ and given by

$\bar{\Omega}_{\mathcal{T}}=\mathcal{T}_{k}(\bar{\Omega})\ \text{ and }\ \Omega_{\mathcal{T}}=\operatorname{supp}(\bar{\Omega}_{\mathcal{T}}),$ (18)

where $\bar{\Omega}\in\mathbb{R}^{t\times n_{1}\times\cdots\times n_{p}}$ is the mask tensor of $\Omega$ and $\bar{\Omega}_{\mathcal{T}}\in\mathbb{R}^{t\times k\times n_{1}\times\cdots\times n_{p}}$ is the mask tensor of $\Omega_{\mathcal{T}}$. Note the consistency between sampling after transformation and transformation after sampling: $\mathcal{T}_{k}(\mathcal{P}_{\Omega}(\mathcal{M}))=\mathcal{P}_{\Omega_{\mathcal{T}}}(\mathcal{T}_{k}(\mathcal{M}))$, where $\mathcal{P}_{\Omega_{\mathcal{T}}}(\mathcal{Y})=\bar{\Omega}_{\mathcal{T}}\circ\mathcal{Y}$ for all $\mathcal{Y}\in\mathbb{R}^{t\times k\times n_{1}\times\cdots\times n_{p}}$.

To provide an intuitive understanding, we present a schematic diagram illustrating the conversion from a multidimensional time series (incomplete tensor) of size $4\times 3\times 2$ to the corresponding temporal convolution tensor of size $4\times 4\times 3\times 2$, as shown in Fig. 3(a,b). Notably, the last horizontal sub-tensor of the incomplete tensor $\mathcal{P}_{\Omega}(\mathcal{M})$ consists entirely of zeros. However, after applying the temporal convolution transform $\mathcal{T}_{k}(\cdot)$, each horizontal and lateral sub-tensor of the incomplete temporal convolution tensor $\mathcal{T}_{k}(\mathcal{P}_{\Omega}(\mathcal{M}))$ contains a sufficient number of sampled entries. As depicted in Fig. 3(b), the temporal convolution sampling set achieves a minimum horizontal/lateral sub-tensor sampling ratio

$\rho(\Omega_{\mathcal{T}})=3/4,$ (19)

which significantly disperses the unobserved regions and enhances the potential for exact prediction. A numerical check of this example is sketched below.
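Using the two earlier sketches (min_sampling_ratio and temporal_conv_tensor), the $4\times 3\times 2$ example can be verified numerically; the values 0 and 3/4 below correspond to (15) and (19).

```python
import numpy as np

# Forecasting mask of the 4 x 3 x 2 example with forecast horizon h = 1 and k = 4.
t, h, k = 4, 1, 4
omega_bar = np.zeros((t, 3, 2))
omega_bar[:t - h] = 1                                  # historical part observed

rho_direct = min_sampling_ratio(omega_bar)             # 0.0: condition (13) can never hold
omega_bar_T = temporal_conv_tensor(omega_bar, k)       # mask of Omega_T, shape (4, 4, 3, 2)
rho_conv = min_sampling_ratio(omega_bar_T)             # 0.75, matching Eq. (19)
print(rho_direct, rho_conv)
```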

Figure 3: Illustrations of the temporal convolution tensor nuclear norm for multidimensional time series prediction. (a): Incomplete tensor, where black squares represent 0 values (not observed) and other color blocks represent non-zero values (observed); (b): incomplete temporal convolution tensor; (c): complete temporal convolution tensor; (d): complete tensor.

IV-C Temporal Convolution Tensor Low-rankness from Smoothness and Periodicity

Figure 4: Illustrations of temporal convolution low-rankness from smoothness and periodicity using simulated data.

By applying the transform $\mathcal{T}_{k}(\cdot)$ to the incomplete multidimensional time series $\mathcal{P}_{\Omega}(\mathcal{M})$, we obtain the incomplete temporal convolution tensor $\mathcal{P}_{\Omega_{\mathcal{T}}}(\mathcal{T}_{k}(\mathcal{M}))$ with a relatively dispersed distribution of missing values. In many cases, the completion of a tensor relies on its low-rankness, and this holds for the temporal convolution tensor $\mathcal{T}_{k}(\mathcal{M})$ as well. Fortunately, the low-rankness of $\mathcal{T}_{k}(\mathcal{M})$ can often be satisfied in various scenarios, as it can be derived from the smoothness and periodicity of the time series $\mathcal{M}$. To reach a rigorous conclusion, we consider the rank-$r$ approximation error of the temporal convolution tensor $\mathcal{T}_{k}(\mathcal{M})\in\mathbb{R}^{t\times k\times n_{1}\times\cdots\times n_{p}}$, denoted by $\varepsilon_{r}(\mathcal{T}_{k}(\mathcal{M}))$ and defined as

$\varepsilon_{r}(\mathcal{T}_{k}(\mathcal{M}))=\min_{\mathcal{Y}}\|\mathcal{T}_{k}(\mathcal{M})-\mathcal{Y}\|_{F},\ \text{ s.t. }\operatorname{rank}_{t}(\mathcal{Y})\leq r,$

where $\mathcal{Y}\in\mathbb{R}^{t\times k\times n_{1}\times\cdots\times n_{p}}$.

Lemma IV.1.

We can find an indicator $\eta(\mathcal{M})$ to represent the strength of the smoothness of the tensor $\mathcal{M}$ along the temporal dimension, defined as

$\eta(\mathcal{M})=\left(\sum_{i=1}^{t-1}\lVert\mathcal{M}_{i+1}-\mathcal{M}_{i}\rVert_{F}^{2}\right)^{\frac{1}{2}};$

then

$\varepsilon_{r}\left(\mathcal{T}_{k}(\mathcal{M})\right)\leq\sqrt{\frac{t(k+r)}{3}}\left\lceil\frac{k}{r}\right\rceil\eta(\mathcal{M}).$
Lemma IV.2.

If the approximate period is $\tau$, we can find an indicator $\beta_{\tau}(\mathcal{M})$ to represent the strength of the periodicity of the tensor $\mathcal{M}$ along the time dimension, defined as

$\beta_{\tau}(\mathcal{M})=\max_{i\in\{1,\cdots,t\}}\lVert\mathcal{M}_{i}-\mathcal{M}_{i+\tau}\rVert_{F},$

where $\mathcal{M}_{i}=\mathcal{M}_{i-t}$ for $i>t$; then

$\varepsilon_{\tau}\left(\mathcal{T}_{k}(\mathcal{M})\right)\leq\tau t\left(\left\lceil\frac{k}{\tau}\right\rceil-1\right)\beta_{\tau}(\mathcal{M}).$

The notation $\lceil\cdot\rceil$ in the above lemmas denotes the ceiling operation. Lemmas IV.1 and IV.2 indicate that as the smoothness and periodicity of the original data tensor $\mathcal{M}$ along the time dimension increase, the low-rankness of its temporal convolution tensor $\mathcal{T}_{k}(\mathcal{M})$ becomes more pronounced. To illustrate this concept intuitively, we generated two multivariate time series: one exhibiting strong periodicity and the other characterized by temporal smoothness, as shown in Fig. 4(a-1,a-2). The temporal convolution transform was applied to these time series matrices, and the resulting temporal convolution tensors are depicted in Fig. 4(b-1,b-2). As shown in Fig. 4(c-1,c-2), the tensor singular values of the temporal convolution tensors decrease rapidly. This trend highlights the enhanced low-rankness of the temporal convolution tensors, further substantiating the theoretical findings.
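This effect can be reproduced with a few lines of NumPy on synthetic data (a sketch in the spirit of Fig. 4, not the paper's exact setup): a smooth, periodic sinusoidal series yields a temporal convolution tensor whose Fourier-domain singular values decay to essentially zero after a handful of terms.

```python
import numpy as np

# Synthetic smooth and periodic multivariate series (t time steps, n variables).
t, n, k = 60, 8, 30
time = np.arange(t)
M = np.stack([np.sin(2 * np.pi * time / 20 + phase)
              for phase in np.linspace(0.0, 1.0, n)], axis=1)     # shape (t, n)

# Temporal convolution tensor (t, k, n) and its Fourier-domain singular values.
T = np.stack([np.roll(M, j, axis=0) for j in range(k)], axis=1)
T_hat = np.fft.fft(T, axis=2)
svals = np.concatenate([np.linalg.svd(T_hat[:, :, i], compute_uv=False)
                        for i in range(n)])
svals = np.sort(svals)[::-1]
print((svals > 1e-8 * svals[0]).sum(), "of", svals.size, "singular values are non-negligible")
```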

IV-D Temporal Convolution Tensor Nuclear Norm

Given that the smoothness and periodicity of the original time series tensor $\mathcal{M}$ ensure the low-rank property of the temporal convolution tensor $\mathcal{T}_{k}(\mathcal{M})$, it is natural to leverage the convex relaxation of the tensor rank, the tensor nuclear norm, to encode this low-rankness. This leads to the following completion model:

$\min_{\mathcal{Y}\in\mathbb{R}^{t\times k\times n_{1}\times\cdots\times n_{p}}}\ \lVert\mathcal{Y}\rVert_{\circledast},\ \text{ s.t. }\ \mathcal{P}_{\Omega_{\mathcal{T}}}(\mathcal{Y})=\mathcal{P}_{\Omega_{\mathcal{T}}}(\mathcal{T}_{k}(\mathcal{M})),$ (20)

whose solution $\hat{\mathcal{Y}}$ serves as the estimator of $\mathcal{T}_{k}(\mathcal{M})$. After that, we perform the inverse temporal convolution transform on $\hat{\mathcal{Y}}$, thereby obtaining $\mathcal{T}_{k}^{-1}(\hat{\mathcal{Y}})$ as the final predictor of $\mathcal{M}$.

This logical flow is depicted in Fig. 3, outlining the steps to obtain the complete tensor from the incomplete tensor $\mathcal{P}_{\Omega}(\mathcal{M})$ in the prediction problem. The process consists of three main steps. 1. Transformation: apply the temporal convolution transform $\mathcal{T}_{k}(\cdot)$ to the incomplete tensor $\mathcal{P}_{\Omega}(\mathcal{M})$, resulting in the incomplete temporal convolution tensor $\mathcal{P}_{\Omega_{\mathcal{T}}}(\mathcal{T}_{k}(\mathcal{M}))$; 2. Completion: solve model (20) to derive the complete temporal convolution tensor $\mathcal{T}_{k}(\mathcal{M})$ from the incomplete tensor $\mathcal{P}_{\Omega_{\mathcal{T}}}(\mathcal{T}_{k}(\mathcal{M}))$; 3. Inverse transformation: retrieve the complete tensor $\mathcal{M}$ by applying the inverse transform $\mathcal{T}_{k}^{-1}(\cdot)$. By consolidating these steps into an equivalent representation, we formulate the Temporal Convolution Tensor Nuclear Norm (TCTNN) model for multidimensional time series prediction, i.e.,

$\min_{\mathcal{X}\in\mathbb{R}^{t\times n_{1}\times\cdots\times n_{p}}}\ \lVert\mathcal{T}_{k}(\mathcal{X})\rVert_{\circledast},\ \text{ s.t. }\ \mathcal{P}_{\Omega}(\mathcal{X})=\mathcal{P}_{\Omega}(\mathcal{M}),$ (21)

where $\Omega=[t-h]\otimes[n_{1}]\otimes\cdots\otimes[n_{p}]$ and $h$ is the forecast horizon.

IV-E Exact Prediction Guarantee

We now develop the exact prediction guarantee for time series prediction by applying the deterministic tensor completion theory established in the preceding section. We first define the temporal convolution tensor incoherence conditions as follows:

Definition IV.4 (Temporal convolution tensor incoherence).

For $\mathcal{M}\in\mathbb{R}^{t\times n_{1}\times\cdots\times n_{p}}$, assume that the temporal convolution tensor $\mathcal{T}_{k}(\mathcal{M})\in\mathbb{R}^{t\times k\times n_{1}\times\cdots\times n_{p}}$ has tubal rank $r_{\mathcal{T}}$ and skinny t-SVD $\mathcal{T}_{k}(\mathcal{M})=\mathcal{U}*\mathcal{S}*\mathcal{V}^{\mathrm{T}}$. Then $\mathcal{M}$ is said to satisfy the temporal convolution tensor incoherence conditions with kernel size $k$ and parameter $\mu_{\mathcal{T}}>0$ if

$\max_{i_{t}=1,\cdots,t}\|\mathcal{U}^{\mathrm{T}}*\mathring{\mathfrak{e}}_{t}^{(i_{t})}\|_{\mathrm{F}}\leq\sqrt{\frac{\mu_{\mathcal{T}}r_{\mathcal{T}}}{tn}},$ (22)

$\max_{i_{k}=1,\cdots,k}\|\mathcal{V}^{\mathrm{T}}*\mathring{\mathfrak{e}}_{k}^{(i_{k})}\|_{\mathrm{F}}\leq\sqrt{\frac{\mu_{\mathcal{T}}r_{\mathcal{T}}}{kn}},$ (23)

where $n=n_{1}\times\cdots\times n_{p}$, $\mathring{\mathfrak{e}}_{t}^{(i_{t})}$ is the order-$(p+2)$ tensor of size $t\times 1\times n_{1}\times\cdots\times n_{p}$ whose $(i_{t},1,i_{1},\cdots,i_{p})$-th entry equals 1 and whose remaining entries equal 0, and $\mathring{\mathfrak{e}}_{k}^{(i_{k})}:=(\mathring{\mathfrak{e}}_{t}^{(i_{t})})^{\mathrm{T}}$.

Based on the temporal convolution tensor incoherence, we derive the exact prediction theory for the TCTNN model (21) from the deterministic tensor completion recovery theory.

Theorem IV.1.

Suppose that $\mathcal{M}\in\mathbb{R}^{t\times n_{1}\times\cdots\times n_{p}}$ obeys the temporal convolution tensor incoherence conditions (22)-(23) with kernel size $k$. If

$h<\frac{k}{2\mu_{\mathcal{T}}r_{\mathcal{T}}((r_{s})_{\mathcal{T}}+1)},$ (24)

where $r_{\mathcal{T}}$ is the tubal rank of the tensor $\mathcal{T}_{k}(\mathcal{M})$, $(r_{s})_{\mathcal{T}}$ is the multi-rank sum of $\mathcal{T}_{k}(\mathcal{M})$, and $\mu_{\mathcal{T}}$ is the parameter of the temporal convolution tensor incoherence conditions, then $\mathcal{M}$ is the unique solution to the TCTNN model (21).

Theorem IV.1 states that exact prediction is achievable when the temporal convolution low-rankness is satisfied. As the low-rankness of the temporal convolution tensor increases and the forecast horizon $h$ decreases, achieving exact predictions becomes increasingly feasible. According to Lemmas IV.1 and IV.2, stronger smoothness and periodicity in the temporal dimension contribute to greater low-rankness in the temporal convolution tensor, thereby improving the likelihood of exact prediction.

IV-F Comparison with the CNNM model

Very recently, Liu et al. [25] proposed the Convolution Nuclear Norm Minimization (CNNM) model, which can be formulated as

$\min_{\mathcal{X}\in\mathbb{R}^{t\times n_{1}\times\cdots\times n_{p}}}\ \lVert\mathcal{C}_{\mathbf{s}}(\mathcal{X})\rVert_{*},\ \text{ s.t. }\ \mathcal{P}_{\Omega}(\mathcal{X})=\mathcal{P}_{\Omega}(\mathcal{M}),$ (25)

where $\mathbf{s}=[s_{t},s_{1},\cdots,s_{p}]$ is the kernel size, $\mathcal{M}$ represents the time series to be completed, $\|\cdot\|_{*}$ denotes the matrix nuclear norm, and $\mathcal{C}_{\mathbf{s}}(\cdot)$ denotes the transformation that converts the time series into the convolution matrix.

In contrast to the CNNM model, the proposed TCTNN model does not convert the original data tensor into a convolution matrix along every dimension. Instead, we perform tensor convolution only along the temporal dimension of the time series data. Our approach offers advantages in three aspects. 1. Tensor structure information: it preserves the tensor structure of multidimensional time series and avoids the information loss caused by converting tensor data into matrices. As demonstrated by the fact that $[\mathcal{T}_{k}(\mathcal{X})]_{(:,1,:,\cdots,:)}=\mathcal{X}$, the temporal convolution tensor retains the structural information of the multidimensional time series, unlike the convolution matrix, which satisfies $[\mathcal{C}_{\mathbf{s}}(\mathcal{X})]_{(:,1)}=\operatorname{vec}(\mathcal{X})$. 2. Computational complexity: it avoids the computationally expensive task of processing the nuclear norm of very large matrices, significantly reducing the computational cost. For example, to predict a $(p+1)$-order multidimensional time series $\mathcal{M}\in\mathbb{R}^{n\times\cdots\times n}$, the per-iteration computational complexity of the CNNM model is $O(n^{3p+3})$, while the TCTNN model requires only $O(n^{p+3})$ per iteration. Clearly, the computational complexity of the former is nearly prohibitive. 3. Feature utilization: by performing convolution exclusively along the time dimension, our method is better suited for time series data. Many time series exhibit strong characteristics such as periodicity and smoothness only along the time dimension; convolution along all dimensions could capture features improperly and thereby reduce prediction accuracy.

V Optimization Algorithm

This section gives the optimization algorithm of the proposed TCTNN model (21) for the multidimensional time series prediction task.

V-A Optimization to TCTNN

We adopt the well-known alternating direction method of multipliers (ADMM) [54, 55, 56] to solve model (21). First, we introduce the auxiliary variable $\mathcal{Y}=\mathcal{T}_{k}(\mathcal{X})$ to separate the temporal convolution transform $\mathcal{T}_{k}(\cdot)$; then the TCTNN model (21) can be rewritten as

$\min_{\mathcal{X},\mathcal{Y}}\ \lVert\mathcal{Y}\rVert_{\circledast},\ \text{ s.t. }\ \mathcal{Y}=\mathcal{T}_{k}(\mathcal{X}),\ \mathcal{P}_{\Omega}(\mathcal{X})=\mathcal{P}_{\Omega}(\mathcal{M}).$ (26)

For model (26), its partial augmented Lagrangian function is

$L(\mathcal{X},\mathcal{Y},\mathcal{N})=\lVert\mathcal{Y}\rVert_{\circledast}+\langle\mathcal{N},\mathcal{Y}-\mathcal{T}_{k}(\mathcal{X})\rangle+\frac{\mu_{\ell}}{2}\left\lVert\mathcal{Y}-\mathcal{T}_{k}(\mathcal{X})\right\rVert_{F}^{2}=\lVert\mathcal{Y}\rVert_{\circledast}+\frac{\mu_{\ell}}{2}\left\lVert\mathcal{Y}-\mathcal{T}_{k}(\mathcal{X})+{\mathcal{N}}/{\mu_{\ell}}\right\rVert_{F}^{2}+\mathcal{C},$ (27)

where $\mathcal{N}$ is the Lagrange multiplier, $\mu_{\ell}>0$ is a positive scalar, and $\mathcal{C}$ collects the terms that depend only on the multiplier. According to the ADMM optimization framework, the minimization of (27) can be split into the following subproblems, solved for each variable in an iterative manner, i.e.,

$\mathcal{Y}^{\ell+1}=\arg\min_{\mathcal{Y}}\ L(\mathcal{X}^{\ell},\mathcal{Y},\mathcal{N}^{\ell}),$ (28)

$\mathcal{X}^{\ell+1}=\arg\min_{\mathcal{X}}\ L(\mathcal{X},\mathcal{Y}^{\ell+1},\mathcal{N}^{\ell}),\ \text{ s.t. }\ \mathcal{P}_{\Omega}(\mathcal{X}^{\ell+1})=\mathcal{P}_{\Omega}(\mathcal{M}),$ (29)

and the multiplier is updated by

$\mathcal{N}^{\ell+1}=\mathcal{N}^{\ell}+\mu_{\ell}(\mathcal{Y}^{\ell+1}-\mathcal{T}_{k}(\mathcal{X}^{\ell+1})),$ (30)

where $\ell$ denotes the iteration count of the ADMM. In the following, we deduce the solutions of (28) and (29), each of which has a closed-form solution.

1) Updating $\mathcal{Y}^{\ell+1}$: with the other variables fixed, (28) degenerates into the following tensor nuclear norm minimization with respect to $\mathcal{Y}$, i.e.,

$\mathcal{Y}^{\ell+1}=\arg\min_{\mathcal{Y}}\ \frac{1}{\mu_{\ell}}\lVert\mathcal{Y}\rVert_{\circledast}+\frac{1}{2}\left\lVert\mathcal{Y}-(\mathcal{T}_{k}(\mathcal{X}^{\ell})-{\mathcal{N}^{\ell}}/{\mu_{\ell}})\right\rVert_{F}^{2},$ (31)

whose closed-form solution is given as

$\mathcal{Y}^{\ell+1}=\operatorname{t-SVT}_{1/\mu_{\ell}}(\mathcal{T}_{k}(\mathcal{X}^{\ell})-{\mathcal{N}^{\ell}}/{\mu_{\ell}})$ (32)

via the order-$(p+2)$ t-SVT stated in Lemma III.2.

2) Updating $\mathcal{X}^{\ell+1}$: by deriving the KKT conditions, the closed-form solution of (29) is given by

$\mathcal{X}^{\ell+1}=\mathcal{P}_{\Omega}(\mathcal{M})+\mathcal{P}_{\Omega^{\perp}}(\mathcal{T}_{k}^{-1}(\mathcal{Y}^{\ell+1}+\mathcal{N}^{\ell}/\mu_{\ell})),$ (33)

where $\Omega^{\perp}$ is the complement of $\Omega$ and $\mathcal{T}_{k}^{-1}(\cdot)$ is the inverse operator of $\mathcal{T}_{k}(\cdot)$. The whole optimization procedure for solving the proposed TCTNN model (21) is summarized in Algorithm 1.

Algorithm 1 ADMM for solving the TCTNN model (21)
Input: observed tensor $\mathcal{P}_{\Omega}(\mathcal{M})$, kernel size $k$.
1: Initialize $\mathcal{X}^{0}=\mathcal{P}_{\Omega}(\mathcal{M})$, $\mathcal{Y}^{0}=\mathcal{T}_{k}(\mathcal{X}^{0})$, $\mathcal{N}^{0}=\mathcal{O}$, $\mu_{0}=10^{-5}$, $\ell=0$.
2: while not converged do
3:  Update $\mathcal{Y}^{\ell+1}$ by (32);
4:  Update $\mathcal{X}^{\ell+1}$ by (33);
5:  Update the multiplier $\mathcal{N}^{\ell+1}$ by (30);
6:  Let $\mu_{\ell+1}=1.1\mu_{\ell}$; $\ell=\ell+1$.
7: end while
Output: imputed tensor $\mathcal{X}^{*}=\mathcal{X}^{\ell+1}$.
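A compact NumPy sketch of Algorithm 1 for the special case of a matrix-valued series ($p=1$, so $\mathcal{T}_{k}(\mathcal{X})$ is third-order) is given below. It is our own illustration; in particular it realizes $\mathcal{T}_{k}^{-1}(\cdot)$ as the least-squares inverse that averages the $k$ circular shifts, an assumption the paper leaves implicit.

```python
import numpy as np

def tctnn_predict(M_obs, mask, k, iters=200, mu=1e-5, rho=1.1):
    """ADMM sketch of Algorithm 1 for a matrix-valued time series (p = 1).

    M_obs: (t, n) array whose unobserved rows may hold arbitrary values.
    mask:  (t, n) binary array with 1 on the observed (historical) entries.
    """
    def Tk(X):                                      # temporal convolution tensor (t, k, n)
        return np.stack([np.roll(X, j, axis=0) for j in range(k)], axis=1)

    def Tk_inv(Y):                                  # least-squares inverse: average the k shifts
        return np.mean(np.stack([np.roll(Y[:, j], -j, axis=0) for j in range(k)]), axis=0)

    def t_svt(Z, tau):                              # order-3 t-SVT of Lemma III.2
        Z_hat = np.fft.fft(Z, axis=2)
        out = np.zeros_like(Z_hat)
        for i in range(Z.shape[2]):
            U, s, Vh = np.linalg.svd(Z_hat[:, :, i], full_matrices=False)
            out[:, :, i] = (U * np.maximum(s - tau, 0.0)) @ Vh
        return np.real(np.fft.ifft(out, axis=2))

    X = M_obs * mask                                # X^0 = P_Omega(M)
    N = np.zeros((M_obs.shape[0], k, M_obs.shape[1]))   # Lagrange multiplier
    for _ in range(iters):
        Y = t_svt(Tk(X) - N / mu, 1.0 / mu)                          # update (32)
        X = mask * M_obs + (1 - mask) * Tk_inv(Y + N / mu)           # update (33)
        N = N + mu * (Y - Tk(X))                                     # update (30)
        mu *= rho                                                    # mu_{l+1} = 1.1 mu_l
    return X
```

For a forecast horizon $h$, setting mask[:t-h] = 1 and mask[t-h:] = 0 returns an estimate whose last $h$ rows contain the forecasts.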

V-B Computational Complexity Analysis

For Algorithm 1, the computational complexity of each iteration consists of three parts, i.e., steps 3-5. First, the time complexity of the order-$(p+2)$ t-SVT in step 3 is $O(tk(n_{1}\cdots n_{p})(n_{1}+\cdots+n_{p})+tk^{2}n_{1}\cdots n_{p})$, corresponding to the linear transform and the matrix SVDs, respectively [50]. Steps 4 and 5 share the complexity $O(tkn_{1}\cdots n_{p})$, involving only element-wise computation. In all, the per-iteration computational complexity of Algorithm 1 is $O(tk(n_{1}\cdots n_{p})(n_{1}+\cdots+n_{p})+tk^{2}n_{1}\cdots n_{p})$.

V-C Convergence Analysis

Using optimality conditions and the saddle point theory of the Lagrangian function, we derive the following convergence theory for Algorithm 1.

Theorem V.1.

The sequence $(\mathcal{X}^{\ell},\mathcal{Y}^{\ell},\mathcal{N}^{\ell})$ generated by Algorithm 1 converges to a feasible solution of the TCTNN model (21), and the corresponding objective function converges to the optimal value $p^{*}=\lVert\mathcal{Y}^{*}\rVert_{\circledast}$, where $(\mathcal{X}^{*},\mathcal{Y}^{*},\mathcal{N}^{*})$ is an optimal solution of model (21).

VI Experimental results

In this section, we present a comprehensive experimental evaluation of the proposed Temporal Convolution Tensor Nuclear Norm (TCTNN) approach using several real-world multidimensional time series datasets.

VI-A Multidimensional Time Series Datasets

Throughout the experiments, we adopt the following three real-world multidimensional time series datasets, which are widely used in related works.

  1. Pacific: The Pacific surface temperature dataset (http://iridl.ldeo.columbia.edu/SOURCES/.CAC/). This dataset is sourced from sea surface temperatures over the Pacific for 50 consecutive months from January 1970 to February 1974. The spatial locations are represented by a grid system with a resolution of 2 × 2 latitude-longitude. The grid comprises 30 × 84 cells, resulting in a three-dimensional temperature tensor with dimensions of 50 × 30 × 84, where 50 corresponds to the temporal dimension, while 30 and 84 represent the latitudinal and longitudinal grid counts, respectively.

  2. Abilene: The Abilene network flow dataset (http://abilene.internet2.edu/observatory/data-collections.html). The Abilene dataset encompasses 12 routers. Network traffic data for each origin-destination (OD) pair is recorded from 12:00 AM to 5:00 AM on March 1, 2004, with a temporal resolution of 5 minutes. We structure these data into a tensor with dimensions 60 × 12 × 12, where the first mode represents 60 time intervals, the second mode denotes 12 source routers, and the third mode signifies 12 destination routers.

  3. NYC taxi: The New York taxi ride information dataset (https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page). For the experiments, we select trip data collected from 12:00 AM on May 1st to 2:00 AM on May 3rd, 2018. The raw data are organized into a third-order tensor structured as (time slot × pick-up zone × drop-off zone). We delineate 30 distinct pick-up/drop-off zones and adopt a temporal resolution of 1 hour for trip aggregation, so the resulting spatiotemporal tensor has dimensions 50 × 30 × 30.
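As referenced above, the following is a minimal sketch of the folding step for the Abilene case; the flat record layout, the synthetic values, and the variable names are our assumptions, not the dataset's actual file format. The other two datasets are organized analogously.

```python
import numpy as np

# Synthetic stand-in for the Abilene OD-flow records: one traffic value per
# (time interval, source router, destination router) triple, flattened with the
# destination index varying fastest (an assumed ordering).
rng = np.random.default_rng(0)
flat_records = rng.random(60 * 12 * 12)

# Fold into the (time x source router x destination router) tensor used above.
abilene = flat_records.reshape(60, 12, 12)
print(abilene.shape)   # (60, 12, 12)
```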

VI-B Baseline Methods

To evaluate the proposed TCTNN method, we compare it with the following baselines:

  1. SNN: The Sum of Nuclear Norms [3]. This low-rank tensor completion (LRTC) method minimizes the sum of the nuclear norms of the tensor's unfolding matrices to estimate unobserved or missing entries in tensor data.

  2. TNN: Tensor Nuclear Norm [53]. This method achieves low-rank tensor completion by minimizing the tensor nuclear norm, thereby facilitating the reconstruction of missing data.

  3. BTTF: Bayesian Temporal Tensor Factorization [1]. This method employs tensor factorization and incorporates a Gaussian vector autoregressive (VAR) process to characterize the temporal factor matrix.

  4. BTRTF: Bayesian Temporal Regularized Tensor Factorization [23]. The temporal regularized matrix factorization (TRMF) method in the literature applies only to multivariate time series forecasting and is not suitable for multidimensional time series, so we employ its tensor variant.

  5. CNNM: Convolution Nuclear Norm Minimization [25]. This method convolves the multidimensional time series tensor along every dimension into a convolution matrix and minimizes its nuclear norm to achieve temporal data prediction.

VI-C Experiment Setup

All parameters of the aforementioned baselines are set according to their released code and further fine-tuned to achieve optimal performance. For the proposed TCTNN method, we set the kernel size $k=t/2$ on all datasets. For evaluation, we use the mean absolute error (MAE) and the root mean square error (RMSE), where lower values indicate more precise predictions.
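Concretely, the prediction task can be cast as completing the last $h$ temporal slices of the tensor, with MAE and RMSE computed on those slices only. The following is a minimal sketch of this protocol; it reuses the hypothetical `tctnn_admm` helper from the sketch after Algorithm 1, and the function name `evaluate_forecast` is ours.

```python
import numpy as np

def evaluate_forecast(M, h, k):
    """Hide the last h temporal slices of the full tensor M, predict them with the
    TCTNN sketch, and report MAE / RMSE on the hidden slices only."""
    t = M.shape[0]
    mask = np.ones(M.shape, dtype=bool)
    mask[t - h:] = False                                  # forecast horizon: last h slices
    X_hat = tctnn_admm(np.where(mask, M, 0.0), mask, k)   # sketch solver from Algorithm 1
    err = X_hat[t - h:] - M[t - h:]
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    return mae, rmse
```

For example, `evaluate_forecast(pacific, h=6, k=25)` would correspond to one Pacific cell of Table II under these assumptions.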

VI-D Experimental Results

TABLE II: Performance comparison (in MAE/RMSE) of TCTNN and other baseline models for multidimensional time series prediction across various scenarios. The forecast horizon (FH) column indicates the respective forecast horizons.
Dataset FH SNN TNN BTTF BTRTF CNNM TCTNN
Pacific h=2 25.55/25.69 25.55/25.69 1.52/1.89 1.57/1.95 1.67/2.10 0.63/0.83
h=4 25.55/25.69 25.55/25.69 1.88/2.39 2.51/2.94 1.70/2.21 0.84/1.13
h=6 25.46/25.65 25.46/25.65 2.70/3.25 2.55/2.99 1.95/2.46 1.07/1.39
h=8 25.39/25.59 25.39/25.59 2.84/3.69 4.45/5.04 2.23/2.79 1.28/1.65
h=10 25.43/25.62 25.43/25.62 3.43/4.21 5.55/6.82 2.29/2.90 1.34/1.70
Abilene h=2 2.77/4.78 2.77/4.78 0.49/0.78 0.48/0.80 0.40/0.71 0.40/0.65
h=4 2.78/4.87 2.78/4.87 0.56/1.01 0.53/0.88 0.48/0.91 0.48/0.83
h=6 2.77/4.85 2.77/4.85 0.58/1.03 0.59/1.06 0.49/1.00 0.52/0.96
h=8 2.79/4.85 2.79/4.85 0.62/1.07 0.73/1.31 0.52/1.01 0.55/0.98
h=10 2.78/4.84 2.78/4.84 0.60/1.02 0.65/1.12 0.51/1.00 0.57/0.99
NYC taxi h=2 4.49/7.88 4.49/7.88 2.89/3.98 3.30/4.69 2.73/4.04 2.54/3.48
h=4 7.37/12.62 7.37/12.62 3.40/4.99 3.93/6.54 3.23/5.09 3.05/4.43
h=6 8.89/15.08 8.89/15.08 3.38/5.21 4.44/ 6.83 3.96/6.59 3.24/4.78
h=8 10.01/17.42 10.01/17.42 3.47/5.31 6.28/10.11 5.30/9.28 3.55/5.38
h=10 10.17/17.96 10.17/17.96 3.67/5.72 7.73/13.39 5.89/10.85 3.57/5.55
The best results are highlighted in bold and the second best are underlined.

In assessing the proposed TCTNN model under various conditions, the forecast horizons are set to 2, 4, 6, 8, and 10 for all multidimensional time series datasets. We summarize the predictive results of the TCTNN model alongside the baseline methods in Table II; the first two models are low-rank tensor completion models, while the latter four are tensor-based prediction models. From Table II, we observe that the proposed TCTNN model outperforms the other baseline methods in most testing scenarios, with the CNNM and BTTF models being the next best performers. Notably, the poor prediction results of the SNN and TNN models confirm that low-rank tensor completion models are not suitable for prediction problems. The running times of the different methods across the datasets, with the forecast horizon set to 10, are summarized in Table III. From Table III, it is evident that the proposed TCTNN model is more time-efficient than the CNNM model, which is consistent with the preceding complexity analysis.

TABLE III: Summary of the running times (in seconds) of different methods for prediction tasks across various datasets. Times below 1 second are reported as 1 second.
Method Abilene NYC taxi Pacific
SNN 1 1 2
TNN 1 1 2
BTTF 17 72 114
BTRTF 9 67 119
CNNM 28 418 18192
TCTNN 7 25 104

Figure 5: The predicted results of the TCTNN model and the corresponding true values under forecast horizon 4 on the NYC taxi dataset. Ground truth 1 and Ground truth 2 represent the true values at two distinct time points (May 2nd, 11:00 PM and May 3rd, 2:00 AM), while TCTNN 1 and TCTNN 2 denote the predicted values at the corresponding time points.
Figure 6: The gap between the values predicted by the TCTNN model and the true values at 5:00 AM on March 1 under different forecast horizons (h = 4, 6, 8, 10) on the Abilene data.
Figure 7: The prediction results for the Pacific surface temperature in September 1973 using the TCTNN model and the other tensor-based models (panels: Ground truth, SNN & TNN, BTTF, BTRTF, CNNM, TCTNN).
Figure 8: The predictive results of the TCTNN model and the CNNM model on the Pacific surface temperature dataset from September 1973 to February 1974 (rows: Ground truth, CNNM, TCTNN).

To intuitively evaluate the predictions produced by the proposed TCTNN method, we use the NYC taxi dataset with a forecast horizon of 4 and plot the comparison between the predicted and true values in Fig. 5. The prediction results are highly consistent with the actual situation. In addition, we calculate the difference between the predicted and true values on the Abilene data under different forecast horizons; the results are shown in Fig. 6. The figure illustrates that the differences across all forecast horizons are minimal, with most of them approaching zero. To further demonstrate the advantages of the proposed TCTNN method over the baselines, we test these models on the Pacific temperature dataset with a forecast horizon of 6 and show the prediction results of each method in Fig. 7. We observe that the predictions of SNN and TNN are all zero, while the excellent performance of CNNM and TCTNN highlights the advantage of convolution nuclear norm type methods in few-sample time series prediction. To further show that the proposed TCTNN method captures temporal patterns in multidimensional time series better than CNNM, we apply both models to the Pacific temperature dataset at h = 6 and display the predictions alongside the ground truth over time in Fig. 8. While CNNM's predictions closely align with the ground truth during the first three months, its performance deteriorates significantly in the following three months compared to TCTNN. This demonstrates that TCTNN possesses a stronger capability for capturing the characteristics of temporal data.

VI-E Discussions

VI-E1 Convergence testing

To verify the convergence of the proposed Algorithm 1, we employ a randomly generated tensor of size 50×50×50 for testing. We calculate the relative error at each iteration using the formula $\lVert\mathcal{X}^{\ell+1}-\mathcal{X}^{\ell}\rVert_{F}/\lVert\mathcal{X}^{\ell}\rVert_{F}$ and present the corresponding error curves in Figure 9. From the error curves, it is clear that the iterates change minimally after 100 iterations, demonstrating the convergence of the algorithm.
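The monitored quantity can be wrapped as a small helper; this is a minimal sketch, and the $10^{-12}$ guard against division by zero is our addition.

```python
import numpy as np

def relative_change(X_new, X_old):
    """Relative change ||X^{l+1} - X^{l}||_F / ||X^{l}||_F between successive iterates."""
    return np.linalg.norm(X_new - X_old) / (np.linalg.norm(X_old) + 1e-12)
```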

Figure 9: The convergence curves of Algorithm 1.

VI-E2 Kernel size selection

The selection of the convolution kernel size is a crucial issue for the TCTNN model, as different kernel sizes lead to different prediction results. After conducting tests on multiple types of data, we find that setting the kernel size $k$ to half the length of the temporal dimension $t$, i.e., $k=t/2$, consistently produces favorable prediction results, which aligns with the findings of a previous study [25]. As illustrated in Figure 10, the test results on the Pacific dataset also demonstrate the advantage of using $k=t/2$.
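A kernel-size sweep of the kind summarized in Figure 10 could be scripted as follows; the synthetic stand-in tensor, the candidate grid, and the reuse of the hypothetical `evaluate_forecast` helper from Section VI-C are all our assumptions.

```python
import numpy as np

# Synthetic stand-in with the Pacific tensor's shape (50 x 30 x 84);
# reuses the evaluate_forecast sketch from Section VI-C.
data = np.random.default_rng(0).random((50, 30, 84))
t = data.shape[0]

for k in [t // 8, t // 4, t // 2, 3 * t // 4]:            # assumed candidate kernel sizes
    mae, rmse = evaluate_forecast(data, h=6, k=k)
    print(f"k = {k:2d}: MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```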

Figure 10: Histogram of the predicted RMSE for various kernel sizes under the FFT transform matrix on the Pacific dataset.

VI-E3 Ablation study

To further highlight the benefits of the proposed TCTNN method in preserving the tensor structure, we convolve the time series along the temporal dimension to obtain a convolution matrix, on which the matrix nuclear norm is imposed, yielding the following Temporal Convolution Matrix Nuclear Norm (TCMNN) model:

$$\min_{\mathcal{X}\in\mathbb{R}^{t\times n_{1}\times\cdots\times n_{p}}}\ \lVert\mathcal{V}_{k}(\mathcal{X})\rVert_{*},\quad \text{s.t.}\ \mathcal{P}_{\Omega}(\mathcal{X})=\mathcal{P}_{\Omega}(\mathcal{M}),$$

where $\lVert\cdot\rVert_{*}$ denotes the nuclear norm of a matrix, $Y=\mathrm{reshape}(\mathcal{X},[n_{1}\cdots n_{p},t])=[\mathbf{y}_{1}^{\prime},\cdots,\mathbf{y}_{n_{1}\cdots n_{p}}^{\prime}]^{\prime}$, $\mathcal{V}_{k}(\mathcal{X})=[\mathcal{A}_{k}(\mathbf{y}_{1})^{\prime},\cdots,\mathcal{A}_{k}(\mathbf{y}_{n_{1}\cdots n_{p}})^{\prime}]^{\prime}$, and $\mathcal{A}_{k}(\cdot)$ is a vector convolution transform of the following form:

$$\mathcal{A}_{k}(\mathbf{a})=\left[\begin{array}{cccc}a_{1}&a_{t}&\cdots&a_{t-k+2}\\ a_{2}&a_{1}&\cdots&a_{t-k+3}\\ \vdots&\vdots&\ddots&\vdots\\ a_{t}&a_{t-1}&\cdots&a_{t-k+1}\end{array}\right],\quad \mathbf{a}=[a_{1},\cdots,a_{t}]^{\prime}.$$
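For concreteness, a minimal NumPy sketch of $\mathcal{A}_{k}$ and of assembling $\mathcal{V}_{k}(\mathcal{X})$ from the temporal fibers, following the definitions above, might read as follows (the fiber ordering produced by NumPy's reshape is an implementation detail that only permutes block rows and does not affect the nuclear norm):

```python
import numpy as np

def A_k(a, k):
    """Vector convolution transform A_k(a): a t x k matrix whose s-th column is
    the vector a cyclically shifted downwards by s positions."""
    return np.stack([np.roll(a, s) for s in range(k)], axis=1)

def V_k(X, k):
    """V_k(X): extract the temporal fibers y_1, ..., y_{n1...np} of the tensor and
    stack the blocks A_k(y_i) vertically into a (t * n1...np) x k matrix."""
    t = X.shape[0]
    fibers = X.reshape(t, -1).T          # each row is one temporal fiber y_i'
    return np.vstack([A_k(y, k) for y in fibers])

# Example: a 6 x 2 x 2 tensor with kernel size k = 3 gives a 24 x 3 convolution matrix.
X = np.arange(24, dtype=float).reshape(6, 2, 2)
print(V_k(X, 3).shape)                   # (24, 3)
```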

Obviously, the TCMNN model is a special case of CNNM with a convolution kernel of size $k\times 1\times\cdots\times 1$. For both the TCTNN and TCMNN models, we use $k=t/2$ and test on the Pacific dataset. We select the forecast horizons $h=3,5,7,9$ and summarize the prediction errors (MAE and RMSE) of TCTNN and TCMNN in Table IV. Table IV shows that the prediction accuracy of TCTNN is much higher than that of TCMNN, which reflects the importance of preserving the tensor structure of multidimensional time series in the prediction task. For further experimental discussions, such as multi-sample time series analysis and applications to multivariate time series, please refer to the supplementary material.

TABLE IV: Performance comparison (in MAE/RMSE) of TCTNN and TCMNN for time series prediction on the Pacific dataset.
Method h=3 h=5 h=7 h=9
TCMNN 1.77/2.18 2.32/2.75 2.79/3.29 2.93/3.49
TCTNN 0.75/1.01 0.94/1.25 1.20/1.56 1.29/1.68

VII Conclusion and future directions

This study introduces an efficient method for multidimensional time series prediction by leveraging a novel deterministic tensor completion theory. Initially, we illustrate the limitations of applying tensor nuclear norm minimization directly to the prediction problem within the deterministic tensor completion framework. To address this challenge, we propose structural modifications to the original multidimensional time series and employ tensor nuclear norm minimization on the temporal convolution tensor, resulting in the TCTNN model. This model achieves exceptional performance in terms of accuracy and computational efficiency across various real-world scenarios.

However, as discussed, the assumption of low-rankness in the temporal convolution tensor may not hold when the multidimensional time series lack periodicity or smoothness along the time dimension. This observation suggests the need for further investigation into learning-based low-rank temporal convolution methods, similar to learning-based convolution norm minimization approaches [24]. Additionally, the deterministic tensor completion theory developed in this study has broader applicability to structured missing data completion problems, presenting exciting opportunities for extending its use to other structured data recovery tasks.

References

  • Chen and Sun [2021] X. Chen and L. Sun, “Bayesian temporal factorization for multidimensional time series prediction,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 9, pp. 4659–4673, 2021.
  • Ling et al. [2021] C. Ling, G. Yu, L. Qi, and Y. Xu, “T-product factorization method for internet traffic data completion with spatio-temporal regularization,” Comput. Optim. Appl., vol. 80, pp. 883–913, 2021.
  • Liu et al. [2012] J. Liu, P. Musialski, P. Wonka, and J. Ye, “Tensor completion for estimating missing values in visual data,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 208–220, 2012.
  • Schein et al. [2016] A. Schein, M. Zhou, D. Blei, and H. Wallach, “Bayesian poisson tucker decomposition for learning the structure of international relations,” in Int. Conf. Mach. Learn.   PMLR, 2016, pp. 2810–2819.
  • Chen et al. [2022] R. Chen, D. Yang, and C.-H. Zhang, “Factor models for high-dimensional tensor time series,” J. Amer. Stat. Assoc., vol. 117, no. 537, pp. 94–116, 2022.
  • Liu et al. [2020] T. Liu, S. Chen, S. Liang, S. Gan, and C. J. Harris, “Fast adaptive gradient rbf networks for online learning of nonstationary time series,” IEEE Trans. Signal Process., vol. 68, pp. 2015–2030, 2020.
  • Meynard and Wu [2021] A. Meynard and H.-T. Wu, “An efficient forecasting approach to reduce boundary effects in real-time time-frequency analysis,” IEEE Trans. Signal Process., vol. 69, pp. 1653–1663, 2021.
  • Isufi et al. [2019] E. Isufi, A. Loukas, N. Perraudin, and G. Leus, “Forecasting time series with varma recursions on graphs,” IEEE Trans. Signal Process., vol. 67, no. 18, pp. 4870–4885, 2019.
  • Li and Shahabi [2018] Y. Li and C. Shahabi, “A brief overview of machine learning methods for short-term traffic forecasting and future directions,” Sigspatial Special, vol. 10, no. 1, pp. 3–9, 2018.
  • Shu et al. [2024] H. Shu, H. Wang, J. Peng, and D. Meng, “Low-rank tensor completion with 3-d spatiotemporal transform for traffic data imputation,” IEEE Trans. Intell. Transp. Syst., vol. 25, no. 11, pp. 18 673–18 687, 2024.
  • Ma et al. [2018] H. Ma, S. Leng, K. Aihara, W. Lin, and L. Chen, “Randomly distributed embedding making short-term high-dimensional data predictable,” Proc. Nat. Acad. Sci., vol. 115, no. 43, pp. E9994–E10002, 2018.
  • Hyndman [2018] R. Hyndman, Forecasting: principles and practice.   OTexts, 2018.
  • Athanasopoulos and Vahid [2008] G. Athanasopoulos and F. Vahid, “Varma versus var for macroeconomic forecasting,” J. Bus. Econ. Stat., vol. 26, no. 2, pp. 237–252, 2008.
  • Wang et al. [2022] D. Wang, Y. Zheng, H. Lian, and G. Li, “High-dimensional vector autoregressive time series modeling via tensor decomposition,” J. Amer. Stat. Assoc., vol. 117, no. 539, pp. 1338–1356, 2022.
  • Basu and Michailidis [2015] S. Basu and G. Michailidis, “Regularized estimation in sparse high-dimensional time series models,” Annal. Stat., vol. 43, pp. 1535–1567, 2015.
  • Chen et al. [2023] X. Chen, C. Zhang, X. Chen, N. Saunier, and L. Sun, “Discovering dynamic patterns from spatiotemporal data with time-varying low-rank autoregression,” IEEE Trans. Knowl. Data Eng., 2023.
  • Lam and Yao [2012] C. Lam and Q. Yao, “Factor modeling for high-dimensional time series: inference for the number of factors,” Annal. Stat., pp. 694–726, 2012.
  • Chen et al. [2021] R. Chen, H. Xiao, and D. Yang, “Autoregressive models for matrix-valued time series,” J. Econom., vol. 222, no. 1, pp. 539–560, 2021.
  • Wang et al. [2024] D. Wang, Y. Zheng, and G. Li, “High-dimensional low-rank tensor autoregressive time series modeling,” J. Econom., vol. 238, no. 1, p. 105544, 2024.
  • Shi et al. [2020] Q. Shi, J. Yin, J. Cai, A. Cichocki, T. Yokota, L. Chen, M. Yuan, and J. Zeng, “Block hankel tensor arima for multiple short time series forecasting,” in Proc. AAAI Conf. Artif. Intell., vol. 34, no. 04, 2020, pp. 5758–5766.
  • Jing et al. [2018] P. Jing, Y. Su, X. Jin, and C. Zhang, “High-order temporal correlation model learning for time-series prediction,” IEEE Trans. Cybern., vol. 49, no. 6, pp. 2385–2397, 2018.
  • Takeuchi et al. [2017] K. Takeuchi, H. Kashima, and N. Ueda, “Autoregressive tensor factorization for spatio-temporal predictions,” in Proc. IEEE Int. Conf. Data Mining.   IEEE, 2017, pp. 1105–1110.
  • Yu et al. [2016] H.-F. Yu, N. Rao, and I. S. Dhillon, “Temporal regularized matrix factorization for high-dimensional time series prediction,” in Proc. Advances Neural Inf. Process. Syst., vol. 29, 2016.
  • Liu [2022] G. Liu, “Time series forecasting via learning convolutionally low-rank models,” IEEE Trans. Inf. Theory, vol. 68, no. 5, pp. 3362–3380, 2022.
  • Liu and Zhang [2022] G. Liu and W. Zhang, “Recovery of future data via convolution nuclear norm minimization,” IEEE Trans. Inf. Theory, vol. 69, no. 1, pp. 650–665, 2022.
  • Gillard and Usevich [2018] J. Gillard and K. Usevich, “Structured low-rank matrix completion for forecasting in time series analysis,” Int. J. Forecasting, vol. 34, no. 4, pp. 582–597, 2018.
  • Butcher and Gillard [2017] H. Butcher and J. Gillard, “Simple nuclear norm based algorithms for imputing missing data and forecasting in time series,” Stat. Interface, vol. 10, no. 1, pp. 19–25, 2017.
  • Király and Tomioka [2012] F. Király and R. Tomioka, “A combinatorial algebraic approach for the identifiability of low-rank matrix completion,” Proc. Int. Conf. Mach. Learn., pp. 755–762, 2012.
  • Singer and Cucuringu [2010] A. Singer and M. Cucuringu, “Uniqueness of low-rank matrix completion by rigidity theory,” SIAM J. Matrix Anal. Appl., vol. 31, no. 4, pp. 1621–1641, 2010.
  • Harris and Zhu [2021] K. D. Harris and Y. Zhu, “Deterministic tensor completion with hypergraph expanders,” SIAM J. Math. Data Sci., vol. 3, no. 4, pp. 1117–1140, 2021.
  • Tsakiris [2023] M. C. Tsakiris, “Low-rank matrix completion theory via plücker coordinates,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 8, pp. 10 084–10 099, 2023.
  • Foucart et al. [2020] S. Foucart, D. Needell, R. Pathak, Y. Plan, and M. Wootters, “Weighted matrix completion from non-random, non-uniform sampling patterns,” IEEE Trans. Inf. Theory, vol. 67, no. 2, pp. 1264–1290, 2020.
  • Shapiro et al. [2018] A. Shapiro, Y. Xie, and R. Zhang, “Matrix completion with deterministic pattern: A geometric perspective,” IEEE Trans. Signal Process., vol. 67, no. 4, pp. 1088–1103, 2018.
  • Chatterjee [2020] S. Chatterjee, “A deterministic theory of low rank matrix completion,” IEEE Trans. Inf. Theory, vol. 66, no. 12, pp. 8046–8055, 2020.
  • Chen et al. [2015] Y. Chen, S. Bhojanapalli, S. Sanghavi, and R. Ward, “Completing any low-rank matrix, provably,” J. Mach. Learn. Res., vol. 16, no. 1, pp. 2999–3034, 2015.
  • Liu et al. [2019] G. Liu, Q. Liu, X.-T. Yuan, and M. Wang, “Matrix completion with deterministic sampling: Theories and methods,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 2, pp. 549–566, 2019.
  • Lee and Shraibman [2013] T. Lee and A. Shraibman, “Matrix completion from any given set of observations,” Proc. Advances Neural Inf. Process. Syst., vol. 26, 2013.
  • Candès and Recht [2009] E. J. Candès and B. Recht, “Exact matrix completion via convex optimization,” Found. Comput. Math., vol. 9, no. 6, pp. 717–772, 2009.
  • Candès and Tao [2010] E. J. Candès and T. Tao, “The power of convex relaxation: Near-optimal matrix completion,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2053–2080, 2010.
  • Wang et al. [2023] H. Wang, J. Peng, W. Qin, J. Wang, and D. Meng, “Guaranteed tensor recovery fused low-rankness and smoothness,” IEEE Trans. Pattern Anal. Mach. Intell., 2023.
  • Peng et al. [2022] J. Peng, Y. Wang, H. Zhang, J. Wang, and D. Meng, “Exact decomposition of joint low rankness and local smoothness plus sparse matrices,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 5, pp. 5766–5781, 2022.
  • Tan et al. [2016] H. Tan, Y. Wu, B. Shen, P. J. Jin, and B. Ran, “Short-term traffic prediction based on dynamic tensor completion,” IEEE Trans. Intell. Transp. Syst., vol. 17, no. 8, pp. 2123–2133, 2016.
  • Dunlavy et al. [2011] D. M. Dunlavy, T. G. Kolda, and E. Acar, “Temporal link prediction using matrix and tensor factorizations,” ACM Trans. Knowl. Discovery Data, vol. 5, no. 2, pp. 1–27, 2011.
  • Chen et al. [2019] X. Chen, Z. He, and L. Sun, “A bayesian tensor decomposition approach for spatiotemporal traffic data imputation,” Transp. Res. Pt. C-Emerg. Technol., vol. 98, pp. 73–84, 2019.
  • Yang et al. [2021] J.-M. Yang, Z.-R. Peng, and L. Lin, “Real-time spatiotemporal prediction and imputation of traffic status based on lstm and graph laplacian regularized matrix factorization,” Transp. Res. Pt. C-Emerg. Technol., vol. 129, p. 103228, 2021.
  • Kilmer and Martin [2011] M. E. Kilmer and C. D. Martin, “Factorization strategies for third-order tensors,” Linear Alg. Appl., vol. 435, no. 3, pp. 641–658, 2011.
  • Kilmer et al. [2021] M. E. Kilmer, L. Horesh, H. Avron, and E. Newman, “Tensor-tensor algebra for optimal representation and compression of multiway data,” Proc. Nat. Acad. Sci., vol. 118, no. 28, 2021.
  • Lu et al. [2019] C. Lu, J. Feng, Y. Chen, W. Liu, Z. Lin, and S. Yan, “Tensor robust principal component analysis with a new tensor nuclear norm,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 4, pp. 925–938, 2019.
  • Martin et al. [2013] C. D. Martin, R. Shafer, and B. LaRue, “An order-p tensor factorization with applications in imaging,” SIAM J. Sci. Comput., vol. 35, no. 1, pp. A474–A490, 2013.
  • Qin et al. [2022] W. Qin, H. Wang, F. Zhang, J. Wang, X. Luo, and T. Huang, “Low-rank high-order tensor completion with applications in visual data,” IEEE Trans. Image Process., vol. 31, pp. 2433–2448, 2022.
  • Fahmy et al. [2012] M. F. Fahmy, G. M. A. Raheem, U. S. Mohamed, O. F. Fahmy et al., “A new fast iterative blind deconvolution algorithm,” J. Signal Inf. Process, vol. 3, no. 01, p. 98, 2012.
  • Kolda and Bader [2009] T. G. Kolda and B. W. Bader, “Tensor decompositions and applications,” SIAM Rev., vol. 51, no. 3, pp. 455–500, 2009.
  • Zhang and Aeron [2016] Z. Zhang and S. Aeron, “Exact tensor completion using t-svd,” IEEE Trans. Signal Process., vol. 65, no. 6, pp. 1511–1526, 2016.
  • Boyd et al. [2011] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein et al., “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, 2011.
  • Bai et al. [2018] J. Bai, J. Li, F. Xu, and H. Zhang, “Generalized symmetric admm for separable convex optimization,” Comput. Optim. Appl., vol. 70, no. 1, pp. 129–170, 2018.
  • Bai et al. [2022] J. Bai, W. W. Hager, and H. Zhang, “An inexact accelerated stochastic admm for separable convex optimization,” Comput. Optim. Appl., pp. 1–40, 2022.