Robust Projection based Anomaly Extraction (RPE) in Univariate Time-Series

Abstract

This paper presents a novel, closed-form, and data/computation efficient online anomaly detection algorithm for time-series data. The proposed method, dubbed RPE, is a window based method and in sharp contrast to the existing window based methods, it is robust to the presence of anomalies in its window and it can distinguish the anomalies in time-stamp level. RPE leverages the linear structure of the trajectory matrix of the time-series and employs a robust projection step which makes the algorithm able to handle the presence of multiple arbitrarily large anomalies in its window. A closed-form/non-iterative algorithm for the robust projection step is provided and it is proved that it can identify the corrupted time-stamps in the presence of arbitrary large anomalies and noise. The proposed method is a great candidate for the applications where a large training data is not available which is the common scenario in the era of time-series. An extensive set of numerical experiments show that RPE can outperform the existing approaches with a notable margin.

I Introduction

Anomaly/outlier detection is one of the main research problems in unsupervised learning where the main task is to identify the elements/points/samples in the given data which do not follow the common structure of the data. Outliers mostly correspond to important rare events whose detection is of paramount importance. For instance, in the medical imaging, the outlying patches could correspond to malignant tissues and in computer networks, an anomaly can imply an intrusion attack.

In many applications and businesses, the given data is a time-series of real values where each value represents the value of an important metric/sensor/measurement at a time-stamp and the values are mostly sampled with a fixed frequency. In this applications, it is important to identify if (based on the past observation) a given time-stamp value is an anomaly (or if it is a part of an anomalous pattern) and the problem is referred to as online anomaly detection. An anomaly detection algorithm for time-series data is preferred to have two desirable properties: (a) The algorithm should be able to declare an anomaly without any delay. In other word, it should only rely on the part observations and the algorithm should not wait for the future samples. (b) If an anomaly happens in time-stamp $t$ , the algorithm should only label time-stamp $t$ as anomaly and the anomaly label should not be diffused to the neighboring time-stamps. We will discuss this feature further in the next sections since the window based time-series anomaly detection methods can keep declaring a single anomaly for a long period of time (as long as their window contains the anomaly).

This paper presents a new window based algorithm which in sharp contrast to the current window based methods, it is provably robust to the presence of anomalies in its window and the algorithm, dubbed RPE, can distinguish outlying behaviors in time-stamp level. The main contributions of this work can be summarized as follows. $\bullet$ We present an accurate and data/computationally efficient algorithm which is based on transforming the problem of anomaly detection in univariate time series into robust projection into a linear manifold. $\bullet$ To the best of our knowldge, RPE is the only window based method which is provably robust to the presence of outlying time-stamps in its window. $\bullet$ A closed form algorithm for the robust projection step is presented which saves the algorithm from running an iterative solver for each time-stamp. $\bullet$ Several theoretical results are established which guarantee the performance of the robust projection step. $\bullet$ It is shown via an extensive set of numerical experiments that RPE can notably outperform the former approaches, specially when the time-series contains a underlying pattern.

I-A Definitions and notation

Bold-face upper-case and lower-case letters are used to denote matrices and vectors, respectively. For a vector $\ba$ , $\|\ba\|_{p}$ denotes its $\ell_{p}$ -norm, $\ba(i)$ denotes its $i^{\text{th}}$ element, and $|\ba|$ is a vector whose values are equal to the absolute value of the corresponding elements in $\ba$ . Similarly, the elements of $\bY=|\bX|$ are equal to the absolute value of the elements of matrix $\bX$ . For a matrix $\bA$ , $\bA^{T}$ is the transpose of $\bA$ and $\ba_{i}$ indicates the $i^{th}$ row of $\bA$ . $\mathbb{S}^{M_{1}-1}$ indicates the unit $\ell_{2}$ -norm sphere in $\mathbb{R}^{M_{1}}$ . Subspace $\calU^{\perp}$ is called the complement of $\calU$ when the dimension of $\calU^{\perp}\oplus\calU$ is equal to the dimension of the ambient space and $\calU^{\perp}$ is orthogonal to $\calU$ where $\oplus$ denotes the direct sum operator. For a set $\calI$ , $|\calI|$ indicates the cardinality of $\calI$ .

The coherency of a subspace $\calU$ is defined as a measure of the sparsity of the vectors which lie in $\calU$ [chandrasekaran2011rank]. The following definition provides three metrics which can represent the coherency of a linear subspace. {definition} Suppose $\bU\in\mathbb{R}^{M_{1}\times r}$ is an orthonormal matrix. Then we define

			(1)
			(2)
			(3)

where\mathbb

S^r-1 $istheunitsphereinthe$ r $-dimensionalspaceand$ \be_i $isthe$ i^th $rowoftheidentitymatrix$ \bI∈\mathbbR^M_1 ×M_1 $.\par\noindent Theparameter$ μ^2(\bU) $showshowcloseis$ \calU $tothestandardbasisand$ γ(\bU) $measuresthe$ ℓ_1 $-normofallthevectorsontheintersectionof$ \mathbbS^r-1 $and$ \calU $.Clearly,themoresparsearethevectorsin$ \calU $,thelargeris$ μ^2(\bU) $and$ γ(\bU) $.Theparameter$ κ(\bU) $issimplydefinedastheirmultiplication.\par Supposewehaveasetofsamplesfromarandomvariablesavedinset$ \calM $.TheempiricalCDFvaluecorrespondingtoasample$ \bx $isdefinedas$ —\calM¡ x——\calM— $where$ — \calM¡ x— $isthenumberofelementsin$ \calM $whicharesmallerthan$ x $and$ —\calM— $isthesizeof$ \calM $.Thetrajectorymatrix$ \bX∈\mathbbR^M_1 ×(n - M_1 + 1) $correspondingtotime-series$ \bt∈\mathbbR^n $isdefinedas$ $\bX=\bmatrix\bt(1$ &\bt(2)\text…\bt(n-M1+1) \bt(2)\bt(3)\text…\bt(n-M1+2) ⋮⋮⋮ \bt(M1)\bt(M1+1)\text…\bt(n)

$and$ M_1 $isthesizeoftherunningwindow.\par\par\par\par\par\algorithm\@@generic@caption{RPEforanomalydetectioninunivariatetime-series}$ 1.1Ifthelengthof\bttrislargerthantmax,discardthefirstn-tmaxsamples. 1.2Buildtrajectorymatrix\bX∈\mathbbRM1×M2using\bttr. 1.3Applytherobustsubspacerecoveryalgorithm(e.g.,Algorithm3)to\bXtoestimate\calUanddefine\bU∈\mathbbRM1×rasanorthonormalbasisfor\calU. Remark.Therankof\bXisestimatedbythesubspacerecoveryalgorithm. 1.4Set\calMasanemptymemoryandpopulateitwiththeresidualvaluesofthesamplesin\bttrusingthesameprocedureusedinStep2.2toStep2.4.

1.5Setcounterc=0.

2. Online Inference.Foranynewtime-stampvaluevdo: 2.1Appendvto\bttrandupdatec=c+1. 2.2Build\bx∈\mathbbRM1