SliceNStitch: Continuous CP Decomposition of Sparse Tensor Streams
Abstract
Consider traffic data (i.e., triplets in the form of source-destination-timestamp) that grow over time. Tensors (i.e., multi-dimensional arrays) with a time mode are widely used for modeling and analyzing such multi-aspect data streams. In such tensors, however, new entries are added only once per period, which is often an hour, a day, or even a year. This discreteness of tensors has limited their usage for real-time applications, where new data should be analyzed instantly as it arrives.
How can we analyze time-evolving multi-aspect sparse data ‘continuously’ using tensors in which time is ‘discrete’? We propose SliceNStitch for continuous CANDECOMP/PARAFAC (CP) decomposition, which has numerous time-critical applications, including anomaly detection, recommender systems, and stock market prediction. SliceNStitch changes the starting point of each period adaptively, based on the current time, and updates factor matrices (i.e., outputs of CP decomposition) instantly as new data arrives. We show, theoretically and experimentally, that SliceNStitch is (1) ‘Any time’: updating factor matrices immediately without having to wait until the current time period ends, (2) Fast: with constant-time updates that are substantially faster than those of online methods, and (3) Accurate: with fitness comparable to that of offline methods.
I Introduction
Tensors (i.e., multidimensional arrays) are simple but powerful tools widely used for representing time-evolving multi-aspect data from various fields, including bioinformatics [1], data mining [2, 3], text mining [4], and cybersecurity [5, 6]. For example, consider a traffic dataset given as triplets in the form of (source, destination, timestamp). The dataset is naturally represented as a 3-mode tensor whose three modes correspond to sources, destinations, and time ranges, respectively (see Fig. 1a and 1b). Each (i, j, k)-th entry of the tensor represents the amount of traffic from the i-th source to the j-th destination during the k-th time range.
| | Conventional CPD (Coarse-grained) | Conventional CPD (Fine-grained) | SliceNStitch (Proposed) |
|---|---|---|---|
| Update Interval | Long (👎) | Short (👍) | Short (👍) |
| Parameters | Few (👍) | Many (👎) | Few (👍) |
| Fitness | High (👍) | Low (👎) | High (👍) |
Once we represent data as a tensor, many tools [7, 8, 9, 10] are available for tensor analysis, and CANDECOMP/PARAFAC decomposition (CPD) [7] is one of the most widely used. Given an N-mode tensor, CPD gives a low-rank approximation, specifically a sum of a few outer products of vectors, which form factor matrices. CPD has been used for various applications, many of which, including anomaly detection [2], recommendation [11, 12], stock market prediction [13], and weather forecasting [14], are time-critical.
While tensors and CPD are powerful tools, they are not suitable for real-time applications since time in them advances in a discrete manner, specifically once per period. For example, in the tensor in Fig. 1a, each slice represents the amounts of traffic for one hour, and thus the tensor grows by a new slice only once per hour. That is, it may take up to one hour for new traffic to be applied to the tensor: traffic occurring at 2:00:01 is applied to the tensor only at 3:00:00. Due to this discreteness, the outputs of CPD (i.e., factor matrices) are also updated only once per period, even if incremental algorithms [15, 16, 17] are used.
How can we perform CPD ‘continuously’ for real-time applications? A potential solution is to make the granularity of the time mode extremely fine. According to our preliminary studies, however, it causes the following problems:
- Degradation of Fitness (Fig. 1c): An extremely fine-grained time mode results in an extremely sparse tensor, which is known to be of high rank [18], and thus it degrades the fitness of low-rank approximations such as conventional CPD. As shown in Fig. 1c, the finer the time mode is, the lower the fitness of CPD is.
- Increase in the Number of Parameters (Fig. 1d): The parameters of CPD are the entries of the factor matrices, as explained in Section II, and the size of each factor matrix is proportional to the length of the corresponding mode of the input tensor. An extremely fine-grained time mode is an extremely long time mode and thus yields extremely many parameters, which require huge computational and storage resources. As shown in Fig. 1d, the finer the time mode is, the more parameters CPD requires.
In this work, we propose SliceNStitch for continuous CPD without increasing the number of parameters. It consists of a data model and online algorithms for CPD. On the data-model side, we propose the continuous tensor model for time-evolving tensors. In this model, the starting point of each period changes adaptively, based on the current time, so that newly arrived data are applied to the input tensor instantly. On the algorithmic side, we propose a family of online algorithms for CPD of sparse tensors in the continuous tensor model. They update factor matrices instantly in response to each change in an entry of the input tensor. To the best of our knowledge, they are the first algorithms for this purpose; existing online CPD algorithms [15, 16, 17] update factor matrices only once per period. We summarize our contributions as follows:
- New data model: We propose the continuous tensor model, which allows for processing time-evolving tensors continuously in real time for time-critical applications.
- Fast online algorithms: We propose the first online algorithms that update the outputs of CPD instantly in response to changes in an entry. Their fitness is comparable even to that of offline competitors, and their updates are substantially faster than those of online competitors.
- Extensive experiments: We extensively evaluate our algorithms on 4 real-world sparse tensors and, based on the results, provide practitioner’s guides to algorithm selection and hyperparameter tuning.
Reproducibility: The code and datasets used in the paper are available at https://github.com/DMLab-Tensor/SliceNStitch.
Remarks: CPD may not be the best decomposition model for tensors with time modes, and there exist a number of alternatives, such as Tucker, INDSCAL, and DEDICOM (see [19]). Nevertheless, as a first step, we focus on making CPD ‘continuous’ due to its prevalence and simplicity. We leave extending our approach to more models as future work.
In Section II, we introduce some notations and preliminaries. In Section III, we provide a formal problem definition. In Sections IV and V, we present the data model and the optimization algorithms of SliceNStitch, respectively. In Section VI, we review our experiments. After discussing related work in Section VII, we conclude in Section VIII.
II Preliminaries
In this section, we introduce some notations and preliminaries. Some frequently-used symbols are listed in Table I.
Table I: Frequently-used symbols.

| Symbol | Definition |
|---|---|
| A | a matrix |
| A(i, :), A(:, j) | i-th row of A, j-th column of A |
| A^T, A^† | transpose of A, pseudoinverse of A |
| ⊙, ∗ | Khatri-Rao product, Hadamard product |
| 𝒳 | a tensor |
| N | order of 𝒳 |
| I_n | number of indices in the n-th mode of 𝒳 |
| 𝒳(i_1, …, i_N) | (i_1, …, i_N)-th entry of 𝒳 |
| nnz(𝒳) | number of non-zeros of 𝒳 |
| ‖𝒳‖ | Frobenius norm of 𝒳 |
| X_(n) | mode-n matricization of 𝒳 |
| R | rank of CPD |
| A^(n) | factor matrix of the n-th mode |
| 𝒳̂ | approximation of 𝒳 by CPD |
| 𝒲(t) | tensor window at time t |
| Δ𝒲 | a change in 𝒲(t) |
| W | number of indices in the time mode |
| m_n(i) | number of non-zeros whose n-th mode index is i |
| a^(n)(i, r) | (i, r)-th entry of A^(n) |
Basic Notations: Consider a matrix A. We denote its i-th row by A(i, :) and its j-th column by A(:, j). We denote the transpose of A by A^T and the Moore-Penrose pseudoinverse of A by A^†. We denote the Khatri-Rao and Hadamard products by ⊙ and ∗, respectively. See Section I of [20] for the definitions of the Moore-Penrose pseudoinverse and both products.
Consider an N-mode sparse tensor 𝒳 ∈ ℝ^(I_1 × ⋯ × I_N), where I_n denotes the length of the n-th mode. We denote each (i_1, …, i_N)-th entry of 𝒳 by 𝒳(i_1, …, i_N). We let nnz(𝒳) be the number of non-zero entries in 𝒳, and we let ‖𝒳‖ be the Frobenius norm of 𝒳. We let X_(n) be the matrix obtained by matricizing 𝒳 along the n-th mode. See Section I of [20] for the definitions of the Frobenius norm and matricization.
CANDECOMP/PARAFAC Decomposition (CPD): Given an N-mode tensor 𝒳 and a rank R, CANDECOMP/PARAFAC decomposition (CPD) [7] gives a rank-R approximation 𝒳̂ of 𝒳, expressed as the sum of R rank-1 tensors (i.e., outer products of vectors) as follows:

𝒳 ≈ 𝒳̂ = Σ_{r=1}^{R} A^(1)(:, r) ∘ A^(2)(:, r) ∘ ⋯ ∘ A^(N)(:, r)    (1)

where A^(n) ∈ ℝ^(I_n × R) for all n ∈ {1, …, N}, and ∘ denotes the outer product (see Section I of [20] for the definition). Each A^(n) is called the factor matrix of the n-th mode.
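For concreteness, the following sketch (our own, using NumPy; it is not part of the paper's implementation) reconstructs the approximation in Eq. (1) from given factor matrices:

```python
import numpy as np

def cp_reconstruct(factors):
    """factors[n] is the I_n x R factor matrix of the n-th mode."""
    R = factors[0].shape[1]
    X_hat = np.zeros(tuple(A.shape[0] for A in factors))
    for r in range(R):
        # Outer product of the r-th columns of all factor matrices (Eq. (1)).
        rank1 = factors[0][:, r]
        for A in factors[1:]:
            rank1 = np.multiply.outer(rank1, A[:, r])
        X_hat += rank1
    return X_hat

# Example: a random rank-3 CPD of a 3-mode tensor.
factors = [np.random.rand(I, 3) for I in (4, 5, 6)]
X_hat = cp_reconstruct(factors)   # shape (4, 5, 6)
```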
CP decomposition aims to find factor matrices that minimize the difference between the input tensor 𝒳 and its approximation 𝒳̂. That is, it aims to solve Eq. (2):

min_{A^(1), …, A^(N)} ‖𝒳 − 𝒳̂‖²    (2)

where ‖·‖ is the Frobenius norm (see Section I of [20] for the definition).
Alternating Least Squares (ALS): Alternating least squares (ALS) [8, 21] is a standard algorithm for computing the CPD of a static tensor. For each n-th mode, X̂_(n) = A^(n) (A^(N) ⊙ ⋯ ⊙ A^(n+1) ⊙ A^(n−1) ⊙ ⋯ ⊙ A^(1))^T, and thus the mode-n matricization of Eq. (2) becomes

min_{A^(n)} ‖ X_(n) − A^(n) (A^(N) ⊙ ⋯ ⊙ A^(n+1) ⊙ A^(n−1) ⊙ ⋯ ⊙ A^(1))^T ‖²    (3)

While the objective function in Eq. (3) is non-convex in all factor matrices jointly, solving Eq. (3) only for A^(n) while fixing all the other factor matrices is a least-squares problem. Setting the derivative of the objective function in Eq. (3) with respect to A^(n) to zero leads to the following update rule for A^(n):

A^(n) ← X_(n) (A^(N) ⊙ ⋯ ⊙ A^(n+1) ⊙ A^(n−1) ⊙ ⋯ ⊙ A^(1)) ( ∗_{k≠n} (A^(k)T A^(k)) )^†    (4)

ALS performs CPD by alternately updating each factor matrix using Eq. (4) until convergence.
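A minimal dense-tensor ALS sweep implementing Eqs. (3)-(4) might look as follows (our own sketch; the paper targets sparse tensors, and NumPy's row-major unfolding permutes the Khatri-Rao column ordering relative to Eq. (3) without affecting the result):

```python
import numpy as np

def khatri_rao(mats):
    """Column-wise Khatri-Rao product; all matrices must have R columns."""
    R = mats[0].shape[1]
    out = mats[0]
    for A in mats[1:]:
        out = (out[:, None, :] * A[None, :, :]).reshape(-1, R)
    return out

def als_update(X, factors, n):
    """Update the n-th factor matrix via Eq. (4), fixing the others."""
    # Mode-n unfolding; with NumPy's row-major order, the matching Khatri-Rao
    # list is the remaining factors taken in increasing mode order.
    X_n = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)
    others = [factors[k] for k in range(X.ndim) if k != n]
    gram = np.ones((factors[n].shape[1],) * 2)
    for A in others:
        gram *= A.T @ A                     # Hadamard product of Gram matrices
    factors[n] = X_n @ khatri_rao(others) @ np.linalg.pinv(gram)

# One full ALS sweep over a random 3-mode tensor.
rng = np.random.default_rng(0)
X = rng.random((4, 5, 6))
factors = [rng.random((I, 3)) for I in X.shape]
for n in range(X.ndim):
    als_update(X, factors, n)
```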
III Problem Definition
In this section, we first define multi-aspect data streams. Then, we describe how they are typically modeled as tensors and discuss the limitations of the common tensor modeling method. Lastly, we introduce the problem considered in this work.
Multi-aspect Data Stream and Examples: We define a multi-aspect data stream, which we aim to analyze, as follows:
Definition 1 (Multi-aspect Data Stream).
A multi-aspect data stream is a sequence of timestamped N-tuples, where each tuple (i_1, …, i_{N−1}, v, t) consists of categorical variables i_1, …, i_{N−1}, a numerical variable v, and the time t (e.g., Unix timestamp) at which the tuple occurs. We assume that the sequence is chronological, i.e., tuples arrive in non-decreasing order of t.
For simplicity, we also assume that the categorical variables are (assigned to) integers, i.e., i_n ∈ {1, …, I_n} for each n ∈ {1, …, N−1}.
Time-evolving data from various domains are naturally represented as multi-aspect data streams, as follows:
- Traffic History: Each tuple (source, destination, 1, t) indicates a trip from a source to a destination that started at time t.
- Crime History: Each tuple (location, type, 1, t) indicates an incidence of crime of the given type at the given location at time t.
- Purchase History: Each tuple (user, product, color, quantity, t) indicates a purchase of the given quantity at time t.
Common Tensor Modeling Method and its Limitations: Multi-aspect data streams are commonly modeled as tensors to benefit from powerful tensor-analysis tools (e.g., CP decomposition) [22, 23, 24, 25, 26, 27, 16, 17]. Specifically, a multi-aspect data stream is modeled as an N-mode tensor 𝒳 whose N-th mode is the time mode. For each time-mode index k, let 𝒳_k be the (N−1)-mode tensor obtained from 𝒳 by fixing the N-th mode index to k; that is, 𝒳 = 𝒳_1 ⊕ 𝒳_2 ⊕ ⋯, where ⊕ denotes concatenation along the time mode. Each tensor 𝒳_k aggregates the tuples over one time unit of length T (i.e., T is the period) ending at time kT. That is, each (i_1, …, i_{N−1})-th entry of 𝒳_k is the sum of v over all tuples (i_1, …, i_{N−1}, v, t) with (k−1)T < t ≤ kT. See Figs. 1a and 1b for examples where T is an hour and a second, respectively. As new tuples in the multi-aspect data stream arrive, a new (N−1)-mode tensor is appended to 𝒳 once per period T.
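For concreteness, the following toy sketch (our own; the names and the dict-of-counts representation are assumptions) shows this conventional discretization, where every tuple is aggregated into the time-mode index determined by its period:

```python
from collections import defaultdict

T = 3600.0  # period: one hour, in seconds (example value)

def discretize(stream):
    """stream: iterable of (i_1, ..., i_{N-1}, value, t) tuples."""
    tensor = defaultdict(float)      # sparse tensor stored as a dict
    for *idx, value, t in stream:
        k = int(t // T)              # time-mode index of the tuple
        tensor[tuple(idx) + (k,)] += value
    return tensor

stream = [(0, 2, 1.0, 7205.4), (1, 2, 2.0, 7210.0)]  # (src, dst, v, t)
print(dict(discretize(stream)))      # both tuples land in hour index 2
```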
Additionally, in many previous studies on window-based tensor analysis [22, 23, 24, 26], the oldest (N−1)-mode tensor is removed from 𝒳 once per period so that the number of indices in the time mode stays fixed at W. That is, at each time kT with k ≥ W, the modeled tensor is 𝒳_{k−W+1} ⊕ ⋯ ⊕ 𝒳_k.
A serious limitation of this widely-used tensor model is that the tensor changes only once per period while the input multi-aspect data stream grows continuously. Thus, it is impossible to analyze multi-aspect data streams continuously in real time in response to the arrival of each new tuple.
Problem Definition: How can we continuously analyze multi-aspect data streams using tensor-analysis tools, specifically, CPD? We aim to answer this question, as stated in Problem 1.
Problem 1 (Continuous CP Decomposition).
(1) Given: a multi-aspect data stream, (2) to update: its CP decomposition instantly in response to each new tuple in the stream, (3) without having to wait for the current period to end.
IV Proposed Data Model and Implementation
In this section, we propose the continuous tensor model and its efficient implementation. This data model is one component of SliceNStitch. See Section V for the other component.
IV-A Proposed Data Model: Continuous Tensor Model
We first define several terms on which our model is based.
Definition 2 (Tensor Slice).
Given a multi-aspect data stream (Definition 1), for each timestamped N-tuple (i_1, …, i_{N−1}, v, t), we define its tensor slice as the (N−1)-mode tensor whose (i_1, …, i_{N−1})-th entry is v and whose other entries are all zero.
Definition 3 (Tensor Unit).
Given a multi-aspect data stream, a time t, and the period T, we define the tensor unit 𝒰(t) as the sum of the tensor slices of all tuples whose timestamps lie in (t − T, t].
That is, 𝒰(t) is an aggregation of the tuples that occurred within the half-open interval (t − T, t].
Definition 4 (Tensor Window).
Given a multi-aspect data stream, a time t, the period T, and the number of time-mode indices W, we define the tensor window 𝒲(t) as

𝒲(t) = 𝒰(t − (W−1)T) ⊕ ⋯ ⊕ 𝒰(t − T) ⊕ 𝒰(t)

where ⊕ denotes concatenation along the time mode.
That is, 𝒲(t) concatenates the W latest tensor units.
The main idea of the continuous tensor model is to adaptively adjust the starting point (or equivalently, the end point) of each tensor unit based on the current time, as described in Definition 5 and Fig. 2.
Definition 5 (Continuous Tensor Model).
In the continuous tensor model, given a multi-aspect data stream, the period T, and the number of time-mode indices W, the modeled tensor evolves from 𝒲(t) to 𝒲(t + dt) at each time t + dt, where dt represents the infinitesimal (i.e., minimum representable) change in time.
Note that in the continuous tensor model, the modeled tensor changes ‘continuously’, i.e., once per minimum time unit of the input multi-aspect data stream (e.g., a millisecond), while in the typical models discussed in Section III, the modeled tensor changes only once per period.
IV-B Event-driven Implementation of Continuous Tensor Model
In the continuous tensor model, it is crucial to efficiently update the modeled tensor, or in other words, to efficiently compute the change from 𝒲(t) to 𝒲(t + dt). This is because repeatedly rebuilding the modeled tensor from scratch at every time step is computationally prohibitive.
We propose an efficient event-driven implementation of the continuous tensor model. Consider a timestamped N-tuple (i_1, …, i_{N−1}, v, t), and index the time mode so that the newest tensor unit has index W. Our implementation, described in Algorithm 1, is based on the observation that each tuple in the input multi-aspect data stream causes the following sequence of events (a minimal sketch follows the list):
- S.1: At time t, the value v is added to the (i_1, …, i_{N−1}, W)-th entry of the tensor window.
- S.2: At time t + kT, for each k ∈ {1, …, W−1}, the value v is subtracted from the (i_1, …, i_{N−1}, W−k+1)-th entry and then added to the (i_1, …, i_{N−1}, W−k)-th entry.
- S.3: At time t + WT, the value v is subtracted from the (i_1, …, i_{N−1}, 1)-th entry.
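Below is a minimal Python sketch of this event-driven scheme (the names, the hash-map representation, and the one-pending-event-per-tuple scheduling are our assumptions; the official implementation is in C++). Each tuple schedules only its next event, and applying an event schedules the following one, which keeps at most one pending event per active tuple, as used in the proof of Theorem 2:

```python
import heapq, itertools
from collections import defaultdict

T, W = 3600.0, 5              # period and window length (example values)
window = defaultdict(float)   # sparse window, keyed by (i_1, ..., i_{N-1}, k)
events = []                   # min-heap of (time, seq, idx, value, new_k)
_seq = itertools.count()      # tie-breaker for events with equal times

def process_tuple(idx, value, t):
    """S.1: idx = (i_1, ..., i_{N-1}); the newest unit has time-mode index W."""
    window[idx + (W,)] += value
    heapq.heappush(events, (t + T, next(_seq), idx, value, W - 1))

def advance_to(now):
    """Apply every scheduled event (S.2 or S.3) whose time has passed."""
    while events and events[0][0] <= now:
        when, _, idx, value, new_k = heapq.heappop(events)
        window[idx + (new_k + 1,)] -= value   # leave the current position
        if new_k >= 1:                        # S.2: move one unit older
            window[idx + (new_k,)] += value
            heapq.heappush(events, (when + T, next(_seq), idx, value, new_k - 1))
        # S.3: when new_k == 0, the tuple has left the window entirely
```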
As formalized in Theorem 1, our implementation keeps the modeled tensor up to date at each time by performing O(NW) operations per tuple in the input stream. Note that N and W are usually small numbers, and if we regard them as constants, the time complexity becomes O(1), i.e., processing each tuple takes constant time.
Theorem 1 (Time Complexity of the Continuous Tensor Model).
In Algorithm 1, the time complexity of processing each timestamped N-tuple is O(NW).
Proof.
Each tuple triggers W + 1 events (S.1–S.3), and each event updates a constant number of entries of the tensor window, each of which is accessed in O(N) time (e.g., by hashing its N-dimensional index). Thus, processing each tuple takes O(NW) time. ∎
Theorem 2 (Space Complexity of the Continuous Tensor Model).
In Algorithm 1, the space complexity is O(N · |𝒜(t)|), where 𝒜(t) is the set of tuples that occurred within (t − WT, t].
Proof.
We call 𝒜(t) the set of active tuples at time t. The number of non-zeros in 𝒲(t) is upper bounded by |𝒜(t)|, and thus the space required for 𝒲(t) is O(N · |𝒜(t)|). Since at most one event is scheduled for each active tuple, the space required for storing all scheduled events is also O(N · |𝒜(t)|). ∎
V Proposed Optimization Algorithms
In this section, we present the other part of SliceNStitch: a family of online optimization algorithms for CPD of sparse tensors in the continuous tensor model. As stated in Problem 2, they aim to update the factor matrices quickly and accurately in response to each change in the tensor window.
Problem 2 (Online CP Decomposition of Sparse Tensors in the Continuous Tensor Model).
(1) Given:
- the current tensor window 𝒲(t),
- its factor matrices A^(1), …, A^(N),
- an event (S.1, S.2, or S.3 in Section IV-B) occurring at time t,
(2) Update: the factor matrices in response to the event,
(3) To Solve: the minimization problem in Eq. (2).
Below, we use Δ𝒲 to denote the change in 𝒲(t) due to the given event. From S.1–S.3 of Section IV-B, Definition 6 follows.
Definition 6 (Input Change).
The input change Δ𝒲 is defined as the change in 𝒲(t) due to an event for a tuple (i_1, …, i_{N−1}, v, t):
- [If the event is S.1] The (i_1, …, i_{N−1}, W)-th entry of Δ𝒲 is v, and the other entries are zero.
- [If the event is S.2 at time t + kT for some k ∈ {1, …, W−1}] The (i_1, …, i_{N−1}, W−k+1)-th and (i_1, …, i_{N−1}, W−k)-th entries of Δ𝒲 are −v and v, respectively, and the other entries are zero.
- [If the event is S.3] The (i_1, …, i_{N−1}, 1)-th entry of Δ𝒲 is −v, and the other entries are zero.
We first introduce SliceNStitch-Matrix (SNSmat), which naively applies ALS to Problem 2. Then, we present SliceNStitch-Vector (SNSvec) and SliceNStitch-Random (SNSrnd). Lastly, we propose our main methods, SNS+vec and SNS+rnd.
V-A SliceNStitch-Matrix (SNSmat)
When we apply ALS to Problem 2, the factor matrices maintained for the current window are strong initial points. Thus, a single iteration of ALS is enough to achieve high fitness. The detailed procedure of SNSmat is given in Algorithm 2, where we normalize¹ the columns of each updated factor matrix to balance the scales of the factor matrices.
¹ Let a_r^(n) be the r-th column of A^(n). We set λ_r to Π_{n=1}^{N} ‖a_r^(n)‖ and set a_r^(n) to a_r^(n) / ‖a_r^(n)‖ for each n. Then, 𝒳 is approximated as Σ_{r=1}^{R} λ_r · a_r^(1) ∘ ⋯ ∘ a_r^(N).
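A small sketch of this normalization (our own; storing the scales in a separate vector λ follows footnote 1):

```python
import numpy as np

def normalize_columns(factors):
    """Rescale every factor column to unit norm; return the scales lambda."""
    lam = np.ones(factors[0].shape[1])
    for A in factors:
        norms = np.linalg.norm(A, axis=0)
        norms[norms == 0] = 1.0        # guard against all-zero columns
        A /= norms
        lam *= norms
    return lam  # X-hat = sum_r lam[r] * (outer product of the r-th columns)

factors = [np.random.rand(I, 3) for I in (4, 5, 6)]
lam = normalize_columns(factors)       # every column now has unit norm
```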
Pros and Cons: For each event, SNSmat accesses every entry of the current tensor window and updates every row of the factor matrices. Thus, it suffers from high computational cost, as formalized in Theorem 3, while it maintains a high-quality solution (i.e., factor matrices).
Theorem 3 (Time complexity of SNSmat).
Let nnz(𝒲(t)) be the number of non-zeros of the current tensor window. Then, the time complexity of SNSmat is

O( N² · R · nnz(𝒲(t)) + R² · Σ_{n=1}^{N} I_n + N · R³ )    (5)
Proof.
See Section II.A of the online appendix [20]. ∎
V-B SliceNStitch-Vector (SNSvec)
We propose SNSvec, a fast algorithm for Problem 2. The outline of SNSvec is given in Algorithm 3, and its update rules are given in Algorithm 4, with a running example in Fig. 3. The key idea of SNSvec is to update only the rows of the factor matrices that approximate changed entries of the tensor window. Starting from the maintained factor matrices, SNSvec updates such rows of the time-mode factor matrix and then such rows of the non-time-mode factor matrices (see Algorithm 3). Below, we describe the update rules used.
Time Mode: Real-world tensors are typically modeled so that the time mode of 𝒲(t) has fewer indices than the other modes. Thus, each tensor unit (see Definition 3) in 𝒲(t) is likely to contain many non-zeros, and even updating only a few rows of the time-mode factor matrix A^(N) is likely to incur a considerable computational cost. To avoid this cost, SNSvec employs an approximate update rule.
From Eq. (4), the following update rule for the time-mode factor matrix follows:

A^(N) ← ( W_(N) + ΔW_(N) ) · K_N · G_N^†    (6)

where K_N = A^(N−1) ⊙ ⋯ ⊙ A^(1) and G_N = ∗_{n=1}^{N−1} ( A^(n)T A^(n) ). If we assume that the approximation 𝒲̂(t) in Eq. (1) approximates 𝒲(t) well, then Eq. (6) is approximated by Eq. (7):

A^(N) ← ( Ŵ_(N) + ΔW_(N) ) · K_N · G_N^†    (7)

By a property of the Khatri-Rao product [19], Eq. (8) holds:

K_N^T K_N = ∗_{n=1}^{N−1} ( A^(n)T A^(n) ) = G_N    (8)

Since Ŵ_(N) = A^(N) K_N^T, Eq. (7) reduces by Eq. (8) to Eq. (9):

A^(N) ← A^(N) + ΔW_(N) · K_N · G_N^†    (9)

Computing Eq. (9) is much cheaper than computing Eq. (6). Since Δ𝒲 contains at most two non-zeros (see Problem 2), computing ΔW_(N) · K_N in Eq. (9) takes O(NR) time. For the same reason, at most two rows of A^(N) are updated.
Non-time Modes: When updating A^(n) for n ≠ N while fixing the other factor matrices, the objective in Eq. (3) becomes Eq. (10):

min_{A^(n)} ‖ ( W_(n) + ΔW_(n) ) − A^(n) K_n^T ‖²    (10)

where K_n = A^(N) ⊙ ⋯ ⊙ A^(n+1) ⊙ A^(n−1) ⊙ ⋯ ⊙ A^(1). Note that Δ𝒲 contains up to two non-zeros, which are the entries changed in 𝒲(t), and their n-th mode indices equal i_n (see Problem 2). By Eq. (3), only the i_n-th row of A^(n) is used to approximate the changed entries, and thus SNSvec updates only that row.

If we fix all the other variables except the row A^(n)(i_n, :), the problem in Eq. (10) becomes the problem in Eq. (11):

min_{A^(n)(i_n, :)} ‖ ( W_(n) + ΔW_(n) )(i_n, :) − A^(n)(i_n, :) · K_n^T ‖²    (11)

The problem in Eq. (11) is a least-squares problem, and its analytical solution in Eq. (12) is available:

A^(n)(i_n, :) ← ( W_(n) + ΔW_(n) )(i_n, :) · K_n · G_n^†    (12)

where G_n = ∗_{k≠n} ( A^(k)T A^(k) ).

After updating the i_n-th row of each n-th mode factor matrix by either Eq. (9) or Eq. (12), SNSvec incrementally maintains the Gram matrix A^(n)T A^(n) up to date by Eq. (13):

A^(n)T A^(n) ← A^(n)T A^(n) − a_old^T a_old + a_new^T a_new    (13)

where a_old is the i_n-th row of A^(n) before the update, and a_new is the same row after the update.
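The idea behind Eq. (13) can be sketched as follows (our own illustration): after a single row of a factor matrix changes, its Gram matrix needs only a rank-one correction rather than a full recomputation.

```python
import numpy as np

def update_gram(G, old_row, new_row):
    """Rank-one correction of G = A^T A after one row of A is replaced."""
    return G - np.outer(old_row, old_row) + np.outer(new_row, new_row)

A = np.random.rand(100, 10)
G = A.T @ A
new_row = np.random.rand(10)
G = update_gram(G, A[3], new_row)   # O(R^2) instead of O(I_n * R^2)
A[3] = new_row
assert np.allclose(G, A.T @ A)      # matches recomputation from scratch
```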
Pros and Cons: By updating only a few rows of each factor matrix, SNSvec significantly reduces the computational cost of SNSmat, as formalized in Theorem 4, without much loss in the quality of the solution. However, SNSvec slows down if many non-zeros share the same index (see Eq. (14)), and it is often unstable due to numerical errors, as discussed later.
Theorem 4 (Time complexity of SNSvec).
Let m_n(i) be the number of non-zeros of 𝒲(t) whose n-th mode index is i. Then, the time complexity of SNSvec is

O( N · R · Σ_{n=1}^{N−1} m_n(i_n) + N · R³ )    (14)
Proof.
See Section II.B of the online appendix [20]. ∎
V-C SliceNStitch-Random (SNSrnd)
We introduce SNSrnd, which is even faster than SNSvec. The outline of SNSrnd (see Algorithm 3) is the same as that of SNSvec. That is, SNSrnd also updates only the rows of the factor matrices that approximate the changed entries in the current tensor window 𝒲(t). However, when updating such a row, the number of entries accessed by SNSrnd is upper bounded by a user-specific constant θ, while SNSvec accesses m_n(i_n) entries (see Theorem 4), which can be as many as all the non-zeros in 𝒲(t). Below, we present its update rule.
Assume SNSrnd updates the i_n-th row of A^(n), i.e., consider the problem in Eq. (11). As described in the procedure updateRowRan in Algorithm 4, SNSrnd takes different approaches depending on a user-specific threshold θ and on m_n(i_n), i.e., the number of non-zeros of 𝒲(t) whose n-th mode index is i_n. If m_n(i_n) is smaller than or equal to θ, then SNSrnd uses Eq. (12) (see Algorithm 4), which is also used in SNSvec.
However, if m_n(i_n) is greater than θ, SNSrnd speeds up the update through approximation. First, it samples θ indices whose n-th mode index is fixed to i_n, without replacement (see Algorithm 4).² Let S be the set of sampled indices, and let 𝒮 be the tensor whose entries are all 0 except those with the sampled indices S; each sampled entry of 𝒮 is a rescaled copy of the corresponding entry of 𝒲(t) such that, for any index whose n-th mode index is i_n, the corresponding entry of 𝒮 equals that of 𝒲(t) in expectation, and it is 0 otherwise. Thus, the more samples SNSrnd draws with a larger θ, the closer 𝒮 is to 𝒲(t). SNSrnd uses 𝒮 to approximate 𝒲(t) in the update. Specifically, it replaces W_(n) in the objective function of Eq. (11) with S_(n). Then, as in Eq. (12), the update rule in Eq. (15) follows:

A^(n)(i_n, :) ← ( S_(n) + ΔW_(n) )(i_n, :) · K_n · G_n^†    (15)

where K_n and G_n are as defined in Eq. (12). Let A_old^(n) be the n-th mode factor matrix before the update, and let Δ𝒮 be the difference between 𝒮 (with the input change Δ𝒲 applied) and the current approximation 𝒲̂(t), restricted to the sampled and changed indices. By Eq. (8), Eq. (15) is equivalent to Eq. (16):

A^(n)(i_n, :) ← A_old^(n)(i_n, :) + ΔS_(n)(i_n, :) · K_n · G_n^†    (16)

Noteworthy, Δ𝒮 has at most θ + 2 non-zeros. SNSrnd uses Eq. (16) to update the i_n-th row of A^(n) (see Algorithm 4). It incrementally maintains A^(n)T A^(n) up to date by Eq. (13), as SNSvec does, together with the analogous correction for the sampled quantities in Eq. (17).
² We ignore the indices of the non-zeros in Δ𝒲 even if they are sampled.
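The sampling step can be sketched as follows (a loose reading on our part; the exact sampling space and scaling used in Algorithm 4 may differ): draw θ of the m_n(i_n) non-zeros of the fiber uniformly without replacement and rescale them by m_n(i_n)/θ, so that the sampled tensor matches the original in expectation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_fiber(nonzeros, theta):
    """nonzeros: list of (index, value) pairs sharing the n-th mode index i_n."""
    m = len(nonzeros)
    if m <= theta:
        return nonzeros                      # exact update, as in Eq. (12)
    picks = rng.choice(m, size=theta, replace=False)
    # Rescale so that each original entry is matched in expectation.
    return [(nonzeros[p][0], nonzeros[p][1] * m / theta) for p in picks]
```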
Pros and Cons: Through approximation, SNSrnd upper bounds the number of non-zeros of Δ𝒮 in Eq. (16) by θ + 2. As a result, the time complexity of SNSrnd, given in Theorem 5, is lower than that of SNSvec. Specifically, the term Σ_n m_n(i_n) in Eq. (14), which can be as large as the number of non-zeros of 𝒲(t), is replaced by the user-specific constant θ in Eq. (18). Noteworthy, if we regard N, R, and θ in Eq. (18) as constants, the time complexity of SNSrnd becomes constant. This change makes SNSrnd significantly faster than SNSvec, at the expense of a slight reduction in the quality of the solution.
Theorem 5 (Time complexity of SNSrnd).
The time complexity of SNSrnd is

O( N² · R · θ + N · R³ )    (18)

Proof.
See Section II.C of the online appendix [20]. ∎
Unlike SNSmat, SNSvec and SNSrnd do not normalize the columns of the factor matrices during the update process. This is because normalization takes time proportional to the number of entries in all factor matrices and would thus significantly increase the time complexity of SNSvec and SNSrnd. However, without normalization, the entries of the factor matrices may have extremely large or extremely small absolute values, making SNSvec and SNSrnd vulnerable to numerical errors. In our experiments (see Fig. 4 in Section VI-C), the accuracy of SNSvec and SNSrnd suddenly drops on some datasets due to numerical errors.
V-D SliceNStitch-Stable (SNS+vec and SNS+rnd)
In this section, we propose SNS+vec and SNS+rnd, which successfully address the aforementioned instability of SNSvec and SNSrnd. The main idea is to clip each entry (i.e., ensure that each entry lies within a predefined range) while at the same time ensuring that the objective function does not increase. To this end, SNS+vec and SNS+rnd employ coordinate descent, where the entries of the factor matrices are updated one by one. The outline of SNS+vec and SNS+rnd (see Algorithm 3) is the same as that of SNSvec and SNSrnd. Below, we present their update rules, which are used in Algorithm 5.
Coordinate descent updates one variable (i.e., one entry of a factor matrix) at a time while fixing all the other variables. Assume that the (i_n, r)-th entry a^(n)(i_n, r) of A^(n) is updated. Solving the problem in Eq. (2) with respect to a^(n)(i_n, r) while fixing the other variables is equivalent to solving the problem in Eq. (19):

min_{a^(n)(i_n, r)} Σ_{(i_1, …, i_N) ∈ F_n(i_n)} ( 𝒲(i_1, …, i_N) − 𝒲̂(i_1, …, i_N) )²    (19)

where 𝒲 denotes the tensor window with the input change applied, and F_n(i_n) is the set of indices of 𝒲 whose n-th mode index is i_n. To describe its solution, we first define the following terms:

p_r = ( W_(n)(i_n, :) · K_n )(r),    g(s, r) = ( ∗_{k≠n} A^(k)T A^(k) )(s, r)    (20)

where K_n = A^(N) ⊙ ⋯ ⊙ A^(n+1) ⊙ A^(n−1) ⊙ ⋯ ⊙ A^(1), and each A^(k) is the factor matrix before any update.
Solving the Problem in Eq. (19): The problem in Eq. (19) is a least-squares problem in the single variable a^(n)(i_n, r), and thus it has a closed-form solution, which is used as the update in Eq. (21):

a^(n)(i_n, r) ← ( p_r − Σ_{s≠r} a^(n)(i_n, s) · g(s, r) ) / g(r, r)    (21)
Eq. (21) is used, instead of Eq. (12), when updating the non-time-mode factor matrices (i.e., when n ≠ N) in SNS+vec (see Algorithm 5). It is also used, instead of Eq. (12), in SNS+rnd when m_n(i_n) ≤ θ. As in SNSvec, when updating the time-mode factor matrix, SNS+vec approximates 𝒲(t) by 𝒲̂(t), and thus it uses Eq. (22):

a^(N)(i_N, r) ← ( p̂_r − Σ_{s≠r} a^(N)(i_N, s) · g(s, r) ) / g(r, r),  where  p̂_r = Σ_s a_old^(N)(i_N, s) · g(s, r) + ( ΔW_(N)(i_N, :) · K_N )(r)    (22)

Similarly, as in SNSrnd, when m_n(i_n) > θ, SNS+rnd approximates 𝒲(t) by the sampled tensor 𝒮, and thus it uses Eq. (23):

a^(n)(i_n, r) ← ( p̃_r − Σ_{s≠r} a^(n)(i_n, s) · g(s, r) ) / g(r, r),  where  p̃_r = Σ_s a_old^(n)(i_n, s) · g(s, r) + ( ΔS_(n)(i_n, :) · K_n )(r)    (23)
Note that Eqs. (21), (22), and (23) are all based on the terms defined in Eq. (20). For their rapid computation, SNS+vec and SNS+rnd incrementally maintain the entries g(r, r) and g(s, r) of the Hadamard-product Gram matrix by Eqs. (24) and (25), which apply a rank-one correction analogous to Eq. (13) using the updated row before and after the update (see Algorithm 5). SNS+rnd also incrementally maintains the corresponding entries of the sampled counterpart of the Gram matrix by Eq. (26). Proofs of Eqs. (21)–(26) can be found in the online appendix [20].
Clipping: To prevent the entries of the factor matrices from having extremely large or small absolute values, SNS+vec and SNS+rnd ensure that each updated entry's absolute value is at most η, a user-specific threshold. Specifically, if the updated entry is greater than η, it is set to η; and if it is smaller than −η, it is set to −η (see Algorithm 5). Eq. (21) followed by clipping never increases the objective function in Eq. (19).³ ⁴
³ Let a₀, a*, and a_c be the entry before the update, after being updated by Eq. (21), and after being clipped, respectively. The objective function in Eq. (19) is convex in the entry, minimized at a*, and symmetric around a*. Since clipping moves the entry from a₀ toward a*, |a_c − a*| ≤ |a₀ − a*| holds, and thus the objective does not increase.
⁴ For Eq. (22) and Eq. (23), this holds only when 𝒲(t) is well approximated.
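A toy sketch of one clipped coordinate-descent step (the explicit 1-D least-squares form is ours, abstracted from Eqs. (19)–(21)):

```python
import numpy as np

# The 1-D objective f(x) = sum_j (b_j - c_j * x)^2 is convex and symmetric
# around its minimizer, so clipping the minimizer into [-eta, eta] moves x
# toward the minimizer and never increases f.
def cd_update(b, c, eta):
    x_star = (c @ b) / (c @ c)        # unconstrained minimizer, as in Eq. (21)
    return float(np.clip(x_star, -eta, eta))

b, c = np.array([1.0, 2.0]), np.array([0.5, 1.5])
print(cd_update(b, c, eta=1.0))       # minimizer 1.4, clipped to 1.0
```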
Pros and Cons: SNS+vec and SNS+rnd do not suffer from the instability due to numerical errors that SNSvec and SNSrnd suffer from. Moreover, as shown in Theorems 6 and 7, the time complexities of SNS+vec and SNS+rnd are lower than those of SNSvec and SNSrnd, respectively. Empirically, however, SNS+vec and SNS+rnd are slightly slower and less accurate than SNSvec and SNSrnd, respectively (see Section VI-C).
Theorem 6 (Time complexity of SNS+vec).
The time complexity of SNS+vec is

O( N · R · Σ_{n=1}^{N−1} m_n(i_n) + N · R² )    (27)
Proof.
See Section II.E of the online appendix [20]. ∎
Table II: Summary of real-world datasets.

| Name | Description | Size | # Non-zeros | Density |
|---|---|---|---|---|
| Divvy Bikes | sources × destinations × timestamps [minutes] | | | |
| Chicago Crime | communities × crime types × timestamps [hours] | | | |
| New York Taxi | sources × destinations × timestamps [seconds] | | | |
| Ride Austin | sources × destinations × colors × timestamps [minutes] | | | |
Theorem 7 (Time complexity of SNS+rnd).
With the sampling threshold θ, the time complexity of SNS+rnd is

O( N² · R · θ + N · R² )    (28)

If N, R, and θ are regarded as constants, Eq. (28) is O(1).
Proof.
See Section II.F of the online appendix [20]. ∎
VI Experiments
In this section, we design and review experiments to answer the following questions:
- Q1. Advantages of Continuous CP Decomposition: What are the advantages of continuous CP decomposition over conventional CP decomposition?
- Q2. Speed and Fitness: How rapidly and precisely does SliceNStitch fit the input tensor, compared to the baselines?
- Q3. Data Scalability: How does SliceNStitch scale with regard to the number of events?
- Q4. Effect of Parameters: How do the user-specific thresholds θ and η affect the performance of SliceNStitch?
- Q5. Practitioner’s Guide: Which version of SliceNStitch should be used?
- Q6. Application to Anomaly Detection: Can SliceNStitch spot abnormal events rapidly and accurately?
VI-A Experiment Specifications
Machine: We ran all experiments on a machine with a 3.7GHz Intel i5-9600K CPU and 64GB memory.
Datasets: We used four real-world sparse tensor datasets, summarized in Table II. They are sparse tensors with a time mode, and their densities vary by orders of magnitude.
Evaluation Metrics: We evaluated SliceNStitch and baselines using the following metrics:
- Elapsed Time per Update: The average elapsed time for updating the factor matrices in response to each event.
- Fitness (the higher, the better): Fitness is a widely-used metric for evaluating the accuracy of tensor decomposition algorithms. It is defined as 1 − (‖𝒳 − 𝒳̂‖ / ‖𝒳‖), where 𝒳 is the input tensor and 𝒳̂ (see Eq. (1)) is its approximation.
- Relative Fitness [16] (the higher, the better): The ratio between the fitness of the target algorithm and the fitness of ALS, i.e., Fitness(target) / Fitness(ALS).
Recall that ALS (see Section II) is the standard batch algorithm for tensor decomposition.
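For reference, the two metrics can be computed as follows (a small helper of our own, not part of the evaluation code):

```python
import numpy as np

def fitness(X, X_hat):
    """1 - ||X - X_hat|| / ||X||; np.linalg.norm flattens to the Frobenius norm."""
    return 1.0 - np.linalg.norm(X - X_hat) / np.linalg.norm(X)

def relative_fitness(fit_method, fit_als):
    """Ratio between a method's fitness and that of ALS."""
    return fit_method / fit_als
```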
Baselines: Since there is no previous algorithm for continuous CPD, we compared SliceNStitch with ALS, OnlineSCP [16], CP-stream [15], and NeCPD [28], all of which are for conventional CPD (see Section VII). All baselines except ALS update factor matrices once per period (instead of whenever an event occurs).⁵ We implemented SliceNStitch and ALS in C++. We used the official implementation of OnlineSCP in MATLAB and that of CP-stream in C++.⁶ We implemented NeCPD in MATLAB.
⁵ We modified the baselines, which are designed to decompose the entire tensor, to decompose the tensor window (see Definition 4), as SliceNStitch does.
⁶ https://shuozhou.github.io, https://github.com/ShadenSmith/splatt-stream
Experimental Setup: We set the hyperparameters as listed in Table III unless otherwise stated. We set the sampling threshold θ to be smaller than half of the average degree of indices (i.e., the average number of non-zeros when fixing an index) in the initial tensor window. In each experiment, we initialized the factor matrices by running ALS on the initial tensor window, and then we processed the subsequent events. We measured the relative fitness at regular intervals.
VI-B Q1. Advantages of Continuous CP Decomposition
We compared continuous CPD and conventional CPD, in terms of the update interval (i.e., the minimum interval between two consecutive updates), fitness, and the number of parameters, using the New York Taxi dataset. We used SNSrnd with the period fixed to an hour for continuous CPD; and we used CP-stream, OnlineSCP, and ALS, while varying the period (i.e., the granularity of the time mode) from a second to an hour, for conventional CPD. Fig. 1 shows the result,⁷ and we found Observation 1.
⁷ Before measuring the fitness of the baselines, we merged the rows of fine-grained time-mode factor matrices sequentially by adding entries so that one row corresponds to an hour. Without this postprocessing step, the fitness of the baselines was even lower than reported in Fig. 1c.
Observation 1 (Advantages of Continuous CPD).
Continuous CPD achieved (a) near-instant updates, (b) high fitness, and (c) a small number of parameters at the same time, while conventional CPD could not. When the update interval was the same, continuous CPD achieved higher fitness with fewer parameters than conventional CPD. When they showed similar fitness, the update interval of continuous CPD was shorter than that of conventional CPD.
Table III: Default hyperparameter settings.

| Name | T (Period) | W | R | θ | η |
|---|---|---|---|---|---|
| Divvy Bikes | | | | | |
| Chicago Crime | | | | | |
| New York Taxi | | | | | |
| Ride Austin | | | | | |
VI-C Q2. Speed and Fitness
We compared the speed and fitness of all versions of SliceNStitch and the baseline methods. Fig. 4 shows how the relative fitness (i.e., fitness relative to ALS) changed over time, and Fig. 5 shows the average relative fitness and the average elapsed time for processing an event. We found Observations 2, 3, and 4.
Observation 2 (Significant Speed-ups).
All versions of SliceNStitch updated factor matrices significantly faster than the fastest baseline. For example, SNS+rnd and SNSmat were both substantially faster than CP-stream.
Observation 3 (Effect of Clipping).
SNSvec and SNSrnd failed in some datasets due to numerical errors, as discussed in the last paragraph of Section V-C. SNS+vec and SNS+rnd, where clipping is used, successfully addressed this problem.
Observation 4 (Comparable Fitness).
All stable versions of SliceNStitch (i.e., SNS+vec, SNS+rnd, and SNSmat) achieved fitness comparable to that of the most accurate baseline.
VI-D Q3. Data Scalability
We measured how rapidly the total running time of the different versions of SliceNStitch increases with respect to the number of events (see Fig. 6). We found Observation 5.
Observation 5 (Linear Scalability).
The total runtime of all SliceNStitch versions was linear in the number of events.
[Fig. 6: Total runtime vs. number of events on (a) Divvy Bikes, (b) Chicago Crime, (c) New York Taxi, and (d) Ride Austin]
VI-E Q4. Effect of Parameters
To investigate the effect of the sampling threshold θ on the performance of SNSrnd and SNS+rnd, we measured their relative fitness and update time while varying θ around its default value in Table III.⁸ The results are reported in Fig. 7, and we found Observation 6.
⁸ In the Chicago Crime dataset, we used a smaller θ, since the default setting led to unstable results.
[Fig. 7: Effect of θ on relative fitness (top: (a)–(d)) and on speed, i.e., elapsed time per update (bottom: (e)–(h)), on Divvy Bikes, Chicago Crime, New York Taxi, and Ride Austin]
Observation 6 (Effect of θ).
As θ increases (i.e., as more indices are sampled), the fitness of SNSrnd and SNS+rnd increases with diminishing returns, while their runtime grows linearly.
In Fig. 8, we measured the effect of the clipping threshold η on the relative fitness of SNS+vec and SNS+rnd, while varying η over a wide range. Note that η does not affect their speed. We found Observation 7.
Observation 7 (Effect of η).
The fitness of SNS+vec and SNS+rnd is insensitive to η as long as η is small enough.
[Fig. 8: Effect of η on the relative fitness of SNS+vec and SNS+rnd on (a) Divvy Bikes, (b) Chicago Crime, (c) New York Taxi, and (d) Ride Austin]
VI-F Q5. Practitioner’s Guide
Based on the above theoretical and empirical results, we provide a practitioner’s guide for SliceNStitch’s users.
- We do not recommend SNSvec and SNSrnd. They are prone to numerical errors and thus unstable.
- Among SNSmat, SNS+vec, and SNS+rnd, we recommend using the version that fits the input tensor best within your runtime budget. There exists a clear trade-off between their speed and fitness: in terms of speed, SNS+rnd is the best, followed by SNS+vec, and SNSmat is the slowest; in terms of fitness, SNSmat is the best, followed by SNS+vec, and SNS+rnd is the least accurate.
- If SNS+rnd is chosen, we recommend increasing θ as much as possible within your runtime budget.
VI-G Q6. Application to Anomaly Detection
We applied SNS+rnd, OnlineSCP, and CP-stream to an anomaly detection task. In the New York Taxi dataset, we injected abnormally large changes (many times the maximum change in one second in the data stream) into randomly chosen entries. Then, as each method proceeded, we measured the Z-scores of the errors of all entries in the latest tensor unit, where new changes arrive. After that, we investigated the top-20 Z-scores in each method. As summarized in Fig. 9, the precision, which equals the recall in our setting, was highest for SNS+rnd and OnlineSCP. More importantly, the average time gap between the occurrence and the detection of the injected anomalies was about 0.0015 seconds for SNS+rnd, while it exceeded 1,400 seconds for the others, which have to wait until the current period ends to update the CPD.
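A sketch of this detection rule (our own simplification; the sizes and injected magnitude are toy values):

```python
import numpy as np

rng = np.random.default_rng(0)
unit = rng.poisson(2.0, size=(50, 50)).astype(float)   # latest tensor unit
approx = unit + rng.normal(0.0, 0.1, size=unit.shape)  # CPD reconstruction
unit[7, 3] += 200.0                                    # an injected anomaly

err = np.abs(unit - approx)                   # per-entry reconstruction error
z = (err - err.mean()) / err.std()            # Z-scores over the latest unit
top = np.unravel_index(np.argsort(z, axis=None)[::-1][:20], z.shape)
print(list(zip(*top))[0])                     # (7, 3) ranks first
```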
VII Related Work
In this section, we review related work on online CP decomposition (CPD) and window-based tensor analysis. Then, we briefly discuss the relation between CPD and machine learning. See [19, 29] for more models, solvers, and applications.
VII-A Online Tensor Decomposition
Nion et al. [25] proposed Simultaneous Diagonalization Tracking (SDT) and Recursive Least Squares Tracking (RLST) for incremental CP decomposition (CPD) of three-mode dense tensors. Specifically, SDT incrementally tracks the SVD of the unfolded tensor, while RLST recursively updates the factor matrices to minimize the weighted squared error. A limitation of these algorithms is that they are applicable only to three-mode dense tensors. Gujral et al. [17] proposed a sampling-based method called SamBaTen for incremental CPD of three-mode dense and sparse tensors. Zhou et al. proposed onlineCP [27] and onlineSCP [16] for incremental CPD of higher-order dense tensors and sparse tensors, respectively. Smith et al. [15] proposed an online CPD algorithm that can be extended with non-negativity and sparsity constraints; it is suitable for both sparse and dense tensors. SGD-based methods [28, 30] have also been developed for online CPD. Specifically, Ye et al. [30] proposed one for Poisson-distributed count data with missing values, and Anaissi et al. [28] incorporated Nesterov's Accelerated Gradient method into SGD. Sobral et al. [31] developed an online framework for subtracting background pixels from multispectral video data. However, all these algorithms process every new entry with the same time-mode index (e.g., a slice in Fig. 1a) at once. They are not applicable to continuous CPD (Problem 2), where changes in entries need to be processed instantly.
BICP [32] efficiently updates block-based CPD [33, 34] when the size of the input tensor is fixed but some existing entries change per update. It requires partitioned subtensors and their CPDs rather than the CPD of the entire input tensor.
Moreover, several algorithms have been developed for incrementally updating the outputs of matrix factorization [35, 36], Bayesian probabilistic CP factorization [37], and generalized CPD [38], when previously unknown entries are revealed afterward. They are also not applicable to continuous CPD (Problem 2), where even increments and decrements of revealed entries need to be handled (see Definition 6).
| Method | Precision @ Top-20 | Time Gap between Occurrence and Detection |
|---|---|---|
| SNS+rnd | 0.80 | 0.0015 seconds |
| OnlineSCP | 0.80 | 1601.00 seconds |
| CP-stream | 0.70 | 1424.57 seconds |
Lastly, it is worth mentioning that there have been several studies on the approximation properties of some offline CPD algorithms. Haupt et al. [39] proved a sufficient condition for a sparse random projection technique to solve the low-rank tensor regression problem efficiently with approximation guarantees. Song et al. [40] showed that an importance-sampling-based orthogonal tensor decomposition algorithm achieves sublinear time complexity with provable guarantees. To the best of our knowledge, however, there has been limited work on the theoretical properties of online CPD of tensor streams.
VII-B Window-based Tensor Analysis
Sun et al. [22, 23] first suggested the concept of window-based tensor analysis (WTA). Instead of analyzing the entire tensor at once, they proposed to analyze a temporally adjacent subtensor within a time window at a time, while sliding the window. Based on the sliding window model, they devised an incremental Tucker decomposition algorithm for tensors growing over time. Xu et al. [24] also suggested a Tucker decomposition algorithm for sliding window tensors and used it to detect anomalies in road networks. Zhang et al. [26] used the sliding window model with exponential weighting for robust Bayesian probabilistic CP factorization and completion. Note that all these studies assume a time window moves ‘discretely’, while in our continuous tensor model, a time window moves ‘continuously’, as explained in Section IV.
VII-C Relation to Machine Learning
CP decomposition (CPD) has been a core building block of numerous machine learning (ML) algorithms, designed for classification [41], weather forecasting [14], recommendation [11], and stock-price prediction [13], to name a few. Moreover, CPD has proven useful for outlier removal [42, 43], imputation [12, 43], and dimensionality reduction [19], and thus it can be used as a preprocessing step for ML algorithms, many of which are known to be vulnerable to outliers, missing values, and the curse of dimensionality. We refer the reader to [44] for more roles of tensor decomposition in ML. By making this core building block “real-time”, our work represents a step towards real-time ML. Moreover, SliceNStitch can directly be used as a preprocessing step for existing streaming ML algorithms.
VIII Conclusion
In this work, we propose SliceNStitch, aiming to make tensor analysis “real-time” and applicable to time-critical applications. We summarize our contributions as follows:
- New data model: We propose the continuous tensor model, which allows for processing time-evolving tensors continuously in real time (Section IV).
- Fast online algorithms: We propose a family of online algorithms for CPD in the continuous tensor model (Section V). They update the factor matrices in response to changes in an entry substantially faster than online competitors, with fitness comparable even to that of offline competitors (Fig. 5). We analyze their complexities (Theorems 3–7).
- Extensive experiments: We evaluate the speed, fitness, and scalability of our algorithms on real-world sparse tensors. We analyze the effects of the hyperparameters. The results indicate a clear trade-off between speed and fitness, based on which we provide practitioner’s guides (Section VI).
Reproducibility: The code and datasets used in the paper are available at https://github.com/DMLab-Tensor/SliceNStitch.
Acknowledgement
This work was supported by Samsung Electronics Co., Ltd., Disaster-Safety Platform Technology Development Program of the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (No. 2019M3D7A1094364), and Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00075, Artificial Intelligence Graduate School Program (KAIST)).
References
- [1] L. Zhao and M. J. Zaki, “Tricluster: an effective algorithm for mining coherent clusters in 3d microarray data,” in SIGMOD, 2005.
- [2] D. Koutra, E. E. Papalexakis, and C. Faloutsos, “Tensorsplat: Spotting latent anomalies in time,” in PCI, 2012.
- [3] Y. Cai, H. Tong, W. Fan, P. Ji, and Q. He, “Facets: Fast comprehensive mining of coevolving high-order time series,” in KDD, 2015.
- [4] B. W. Bader, M. W. Berry, and M. Browne, “Discussion tracking in enron email using parafac,” in Survey of Text Mining II. Springer, 2008, pp. 147–163.
- [5] D. Bruns-Smith, M. M. Baskaran, J. Ezick, T. Henretty, and R. Lethin, “Cyber security through multidimensional data decompositions,” in CYBERSEC, 2016.
- [6] H. Fanaee-T and J. Gama, “Tensor-based anomaly detection: An interdisciplinary survey,” Knowledge-Based Systems, vol. 98, pp. 130–147, 2016.
- [7] F. L. Hitchcock, “The expression of a tensor or a polyadic as a sum of products,” Journal of Mathematics and Physics, vol. 6, no. 1-4, pp. 164–189, 1927.
- [8] J. D. Carroll and J.-J. Chang, “Analysis of individual differences in multidimensional scaling via an n-way generalization of “eckart-young” decomposition,” Psychometrika, vol. 35, no. 3, pp. 283–319, 1970.
- [9] R. A. Harshman, “Parafac2: Mathematical and technical notes,” UCLA working papers in phonetics, vol. 22, no. 3044, p. 122215, 1972.
- [10] L. R. Tucker, “Some mathematical notes on three-mode factor analysis,” Psychometrika, vol. 31, no. 3, pp. 279–311, 1966.
- [11] L. Yao, Q. Z. Sheng, Y. Qin, X. Wang, A. Shemshadi, and Q. He, “Context-aware point-of-interest recommendation using tensor factorization with social regularization,” in SIGIR, 2015.
- [12] K. Shin, L. Sael, and U. Kang, “Fully scalable methods for distributed tensor factorization,” TKDE, vol. 29, no. 1, pp. 100–113, 2016.
- [13] A. Spelta, “Financial market predictability with tensor decomposition and links forecast,” Applied network science, vol. 2, no. 1, p. 7, 2017.
- [14] J. Xu, X. Liu, T. Wilson, P.-N. Tan, P. Hatami, and L. Luo, “Muscat: Multi-scale spatio-temporal learning with application to climate modeling.” in IJCAI, 2018.
- [15] S. Smith, K. Huang, N. D. Sidiropoulos, and G. Karypis, “Streaming tensor factorization for infinite data sources,” in SDM, 2018.
- [16] S. Zhou, S. Erfani, and J. Bailey, “Online cp decomposition for sparse tensors,” in ICDM, 2018.
- [17] E. Gujral, R. Pasricha, and E. E. Papalexakis, “Sambaten: Sampling-based batch incremental tensor decomposition,” in SDM, 2018.
- [18] R. Pasricha, E. Gujral, and E. E. Papalexakis, “Adaptive granularity in tensors: A quest for interpretable structure,” arXiv preprint arXiv:1912.09009, 2019.
- [19] T. G. Kolda and B. W. Bader, “Tensor decompositions and applications,” SIAM review, vol. 51, no. 3, pp. 455–500, 2009.
- [20] (2020) Supplementary document. [Online]. Available: https://github.com/DMLab-Tensor/SliceNStitch/blob/master/doc/supplementary.pdf
- [21] R. A. Harshman et al., “Foundations of the parafac procedure: Models and conditions for an “explanatory” multimodal factor analysis,” 1970.
- [22] J. Sun, S. Papadimitriou, and S. Y. Philip, “Window-based tensor analysis on high-dimensional and multi-aspect streams,” in ICDM, 2006.
- [23] J. Sun, D. Tao, S. Papadimitriou, P. S. Yu, and C. Faloutsos, “Incremental tensor analysis: Theory and applications,” ACM Transactions on Knowledge Discovery from Data, vol. 2, no. 3, pp. 1–37, 2008.
- [24] M. Xu, J. Wu, H. Wang, and M. Cao, “Anomaly detection in road networks using sliding-window tensor factorization,” T-ITS, vol. 20, no. 12, pp. 4704–4713, 2019.
- [25] D. Nion and N. D. Sidiropoulos, “Adaptive algorithms to track the parafac decomposition of a third-order tensor,” TSP, vol. 57, no. 6, pp. 2299–2310, 2009.
- [26] Z. Zhang and C. Hawkins, “Variational bayesian inference for robust streaming tensor factorization and completion,” in ICDM, 2018.
- [27] S. Zhou, N. X. Vinh, J. Bailey, Y. Jia, and I. Davidson, “Accelerating online cp decompositions for higher order tensors,” in KDD, 2016.
- [28] A. Anaissi, B. Suleiman, and S. M. Zandavi, “Necpd: An online tensor decomposition with optimal stochastic gradient descent,” arXiv preprint arXiv:2003.08844, 2020.
- [29] E. E. Papalexakis, C. Faloutsos, and N. D. Sidiropoulos, “Tensors for data mining and data fusion: Models, applications, and scalable algorithms,” ACM Transactions on Intelligent Systems and Technology, vol. 8, no. 2, pp. 1–44, 2016.
- [30] C. Ye and G. Mateos, “Online tensor decomposition and imputation for count data.” in DSW, 2019.
- [31] A. Sobral, S. Javed, S. Ki Jung, T. Bouwmans, and E.-h. Zahzah, “Online stochastic tensor decomposition for background subtraction in multispectral video sequences,” in ICCVW, 2015.
- [32] S. Huang, K. S. Candan, and M. L. Sapino, “Bicp: block-incremental cp decomposition with update sensitive refinement,” in CIKM, 2016.
- [33] A. H. Phan and A. Cichocki, “Parafac algorithms for large-scale problems,” Neurocomputing, vol. 74, no. 11, pp. 1970–1984, 2011.
- [34] X. Li, S. Huang, K. S. Candan, and M. L. Sapino, “2pcp: Two-phase cp decomposition for billion-scale dense tensors,” in ICDE, 2016.
- [35] X. He, H. Zhang, M.-Y. Kan, and T.-S. Chua, “Fast matrix factorization for online recommendation with implicit feedback,” in SIGIR, 2016.
- [36] R. Devooght, N. Kourtellis, and A. Mantrach, “Dynamic matrix factorization with priors on unknown values,” in KDD, 2015.
- [37] Y. Du, Y. Zheng, K.-c. Lee, and S. Zhe, “Probabilistic streaming tensor decomposition,” in ICDM, 2018.
- [38] S. Zhou, S. M. Erfani, and J. Bailey, “Sced: A general framework for sparse tensor decomposition with constraints and elementwise dynamic learning,” in ICDM, 2017.
- [39] J. Haupt, X. Li, and D. P. Woodruff, “Near optimal sketching of low-rank tensor regression,” in NIPS, 2017.
- [40] Z. Song, D. P. Woodruff, and H. Zhang, “Sublinear time orthogonal tensor decomposition,” in NIPS, 2016.
- [41] S. Rendle, “Factorization machines,” in ICDM, 2010.
- [42] M. Najafi, L. He, and S. Y. Philip, “Outlier-robust multi-aspect streaming tensor completion and factorization.” in IJCAI, 2019.
- [43] D. Lee and K. Shin, “Robust factorization of real-world tensor streams with patterns, missing values, and outliers,” in ICDE, 2021.
- [44] N. D. Sidiropoulos, L. De Lathauwer, X. Fu, K. Huang, E. E. Papalexakis, and C. Faloutsos, “Tensor decomposition for signal processing and machine learning,” TSP, vol. 65, no. 13, pp. 3551–3582, 2017.