
Fast and Accurate Anomaly Detection in Dynamic Graphs
with a Two-Pronged Approach

Minji Yoon¹, Bryan Hooi², Kijung Shin³, Christos Faloutsos¹
¹School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
²School of Computing, National University of Singapore, Singapore
³School of Electrical Engineering, KAIST, South Korea
[email protected], [email protected], [email protected], [email protected]
(2019)
Abstract.

Given a dynamic graph stream, how can we detect the sudden appearance of anomalous patterns, such as link spam, follower boosting, or denial of service attacks? Additionally, can we categorize the types of anomalies that occur in practice, and theoretically analyze the anomalous signs arising from each type?

In this work, we propose AnomRank, an online algorithm for anomaly detection in dynamic graphs. AnomRank uses a two-pronged approach, defining two novel metrics for anomalousness. Each metric tracks the derivatives of its own version of a 'node score' (or node importance) function. This allows us to detect sudden changes in the importance of any node. We show theoretically and experimentally that the two-pronged approach successfully detects two common types of anomalies: sudden weight changes along an edge, and sudden structural changes to the graph. AnomRank is (a) Fast and Accurate: up to 49.5× faster or 35% more accurate than state-of-the-art methods, (b) Scalable: linear in the number of edges in the input graph, processing millions of edges within 2 seconds on a stock laptop/desktop, and (c) Theoretically Sound: providing theoretical guarantees for the two-pronged approach.

Journal year: 2019. Conference: The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '19), August 4–8, 2019, Anchorage, AK, USA. DOI: 10.1145/3292500.3330946. ISBN: 978-1-4503-6201-6/19/08.

1. Introduction

Network-based systems, including computer networks and social network services, have been the focus of various attacks. In computer networks, distributed denial of service (DDoS) attacks use many machines to make connections to a target machine and block its availability. In social networks, users pay spammers to "Like" or "Follow" their page to manipulate their public trust. By abstracting these networks as graphs, we can detect such attacks by finding suddenly emerging anomalous signs in the graph.

Various approaches have been proposed to detect anomalies in graphs, the majority of which focus on static graphs (Akoglu et al., 2010; Chakrabarti, 2004; Hooi et al., 2017; Jiang et al., 2016; Kleinberg, 1999; Shin et al., 2018a; Tong and Lin, 2011). However, many real-world graphs are dynamic, with timestamps indicating the time when each edge was inserted/deleted. Static anomaly detection methods, which are solely based on static connections, miss useful temporal signals of anomalies.

Several approaches (Eswaran et al., 2018; Shin et al., 2017; Eswaran and Faloutsos, 2018) have been proposed to detect anomalies in dynamic graphs (we review these in greater detail in Section 2 and Table 1). However, they are not satisfactory in terms of accuracy and speed. Accurately detecting anomalies in near real-time is important in order to reduce the impact of malicious activities and start recovery processes in a timely manner.

Figure 1. AnomRank is accurate, fast, and scalable. (a) Speed and precision of AnomRank: precision at top-100 and running time on the DARPA dataset; AnomRank is significantly faster than state-of-the-art methods while achieving higher accuracy. (b) Scalability of AnomRank: AnomRank scales linearly with the number of edges in the input dynamic graph. (c) Effectiveness of AnomRank: AnomRank computes anomaly scores (green line) on the DARPA dataset; red crosses indicate ground-truth anomalies, and the blue line is an anomalousness threshold. 77% of the green spikes above the blue line are true positives; see Section 5 for details.

In this paper, we propose AnomRank, a fast and accurate online algorithm for detecting anomalies in dynamic graphs with a two-pronged approach. We classify anomalies in dynamic graphs into two types: AnomalyS and AnomalyW. AnomalyS denotes suspicious changes to the structure of the graph, such as through the addition of edges between previously unrelated nodes in spam attacks. AnomalyW indicates anomalous changes in the weight (i.e. number of edges) between connected nodes, such as suspiciously frequent connections in port scan attacks.

Various node score functions have been proposed to map each node of a graph to an importance score: PageRank (Page et al., 1999), HITS (Kleinberg, 1999) and its derivatives such as SALSA (Lempel and Moran, 2001), the Fiedler vector (Chung, 1994), etc. Our intuition is that anomalies induce sudden changes in node scores. Based on this intuition, AnomRank focuses on the 1st and 2nd order derivatives of node scores to detect anomalies causing large changes in node scores within a short time. To detect AnomalyS and AnomalyW effectively, we design two versions of node score functions based on the characteristics of these two types of anomalies. Then, we define two novel metrics, AnomRankS and AnomRankW, which measure the 1st and 2nd order derivatives of our two versions of node score functions, as a barometer of anomalousness. We theoretically analyze the effectiveness of each metric on the corresponding anomaly type, and provide rigorous guarantees on the accuracy of AnomRank. Through extensive experiments with real-world and synthetic graphs, we demonstrate the superior performance of AnomRank over existing methods. Our main contributions are:

  • Online, two-pronged approach: We introduce AnomRank, an online detection method, for two types of anomalies (AnomalyS, AnomalyW) in dynamic graphs.

  • Theoretical guarantees: We prove the effectiveness of our proposed metrics, AnomRankS and AnomRankW, theoretically (Theorems 1 and 2).

  • Practicality: Experiments on public benchmarks show that AnomRank outperforms state-of-the-art competitors, being up to 49.5× faster or 35% more accurate (Figure 1). Moreover, thanks to its two-pronged approach, it spots anomalies that the competitors miss (Figure 4).

Reproducibility: our code and data are publicly available at https://github.com/minjiyoon/anomrank. The paper is organized in the usual way (related work, preliminaries, proposed method, experiments, and conclusions).

2. Related Work

We discuss previous work on detecting anomalous entities (nodes, edges, events, etc.) on static and dynamic graphs. See (Akoglu et al., 2015) for an extensive survey on graph-based anomaly detection.

Anomaly detection in static graphs can be described under the following categories:

  • Anomalous Node Detection: (Akoglu et al., 2010) extracts egonet-based features and finds empirical patterns with respect to the features. Then, it identifies nodes whose egonets deviate from the patterns. (Xu et al., 2007) groups nodes that share many neighbors and spots nodes that cannot be assigned to any community.

  • Anomalous Edge Detection: (Chakrabarti, 2004) encodes the input graph based on similar connectivity between nodes, then spots edges whose removal significantly reduces the total encoding cost. (Tong and Lin, 2011) factorizes the adjacency matrix and flags edges which introduce high reconstruction error as outliers.

  • Anomalous Subgraph Detection: (Hooi et al., 2017) and (Shin et al., 2018a) measure the anomalousness of nodes and edges, then find a dense subgraph consisting of many anomalous nodes and edges.

Anomaly detection in dynamic graphs can also be described under the following categories:

  • Anomalous Node Detection: (Sun et al., 2006) approximates the adjacency matrix of the current snapshot based on incremental matrix factorization. Then, it spots nodes corresponding to rows with high reconstruction error. (Wang et al., 2015) computes nodes features (degree, closeness centrality, etc) in each graph snapshot. Then, it identifies nodes whose features are notably different from their previous values and the features of nodes in the same community.

  • Anomalous Edge Detection: (Eswaran and Faloutsos, 2018) detects edges that connect sparsely-connected parts of a graph. (Ranshous et al., 2016) spots edge anomalies based on their occurrence, preferential attachment and mutual neighbors.

  • Anomalous Subgraph Detection: (Beutel et al., 2013) spots near-bipartite cores where each node is densely connected to others in the same core within a short time. (Jiang et al., 2016) and (Shin et al., 2018b) detect groups of nodes that form dense subgraphs in a temporally synchronized manner. (Shin et al., 2017) identifies dense subtensors created within a short time.

  • Event Detection: (Eswaran et al., 2018; Aggarwal et al., 2011; Koutra et al., 2016; Henderson et al., 2010) detect the following events: sudden appearance of many unexpected edges (Aggarwal et al., 2011), sudden appearance of a dense graph (Eswaran et al., 2018), sudden drop in the similarity between two consecutive snapshots (Koutra et al., 2016), and sudden prolonged spikes and lightweight stars (Henderson et al., 2010).

Our proposed AnomRank is a fast and accurate anomalous event detection method. It can easily be extended to localize the culprits of anomalies to nodes and substructures (Section 4.4), and it detects various types of anomalies in dynamic graphs in real time. Table 1 compares AnomRank to existing methods.

Table 1. AnomRank out-features competitors: comparison of our proposed AnomRank and existing methods for anomaly detection in dynamic graphs. The methods compared are Oddball (Akoglu et al., 2010), MetricForensics (Henderson et al., 2010), CC and CS (Beutel et al., 2013; Jiang et al., 2016), DenseAlert (Shin et al., 2017), SpotLight (Eswaran et al., 2018), SedanSpot (Eswaran and Faloutsos, 2018), and AnomRank. The properties compared are: real-time detection (*), support for edge deletions, detection of structural anomalies, and detection of edge weight anomalies; AnomRank provides all four.

(*) Real-time detection: computes 1M edges within 5 seconds.

3. Preliminaries

Table 2 gives a list of symbols and definitions.

Various node score functions have been designed to estimate the importance (centrality, etc.) of nodes in a graph: PageRank (Page et al., 1999); HITS (Kleinberg, 1999) and its derivatives such as SALSA (Lempel and Moran, 2001); the Fiedler vector (Chung, 1994); and the centrality measures from social network analysis (eigenvector, degree, and betweenness centrality (Wasserman and Faust, 1994)). Among them, we extend PageRank to design our node score functions in Section 4.2 because (a) it is fast to compute, (b) it led to the ultra-successful ranking method of Google, and (c) it is intuitive ('your importance depends on the importance of your neighbors'). Next, we briefly review PageRank and its incremental version in dynamic graphs.

PageRank

As shown in (Yoon et al., 2018b), the PageRank scores of all nodes are represented as a PageRank score vector $\mathbf{p}$ defined by the following equation:

\mathbf{p}=(1-c)\sum_{i=0}^{\infty}(c\mathbf{\tilde{A}}^{\top})^{i}\mathbf{b}

where $c$ is a damping factor, $\mathbf{\tilde{A}}$ is the row-normalized adjacency matrix, and $\mathbf{b}$ is the starting vector. This equation is interpreted as a propagation of scores across a graph: the initial scores in the starting vector $\mathbf{b}$ are propagated across the graph through repeated multiplication by $c\mathbf{\tilde{A}}^{\top}$; since the damping factor $c$ is smaller than $1$, the propagated scores converge, resulting in the PageRank scores. As shown in (Yoon et al., 2018a), PageRank computation time is proportional to the L1 length of the starting vector $\mathbf{b}$, since a small L1 length of $\mathbf{b}$ leads to faster convergence of the series $\sum_{i=0}^{\infty}(c\mathbf{\tilde{A}}^{\top})^{i}$ with damping factor $c$. Here, the L1 length of a vector is defined as the sum of the absolute values of its entries, and the L1 length of a matrix is defined as the maximum L1 length of its columns.
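The series above can be evaluated term by term, stopping once the L1 length of the current term falls below a tolerance, which is exactly why a small-L1 starting vector converges faster. The following NumPy sketch is an illustration of this computation (our own, not the paper's released code); it assumes every node has at least one out-edge:

```python
import numpy as np

def pagerank(A, c=0.85, b=None, tol=1e-9, max_iter=1000):
    """PageRank via the series p = (1-c) * sum_i (c A~^T)^i b.

    A: (n x n) adjacency matrix; rows are normalized internally.
    Assumes every node has at least one out-edge (no dangling nodes).
    """
    n = A.shape[0]
    A_tilde = A / A.sum(axis=1, keepdims=True)  # row-normalize
    if b is None:
        b = np.full(n, 1.0 / n)                 # uniform starting vector
    p = np.zeros(n)
    term = (1 - c) * b                          # i = 0 term of the series
    for _ in range(max_iter):
        p += term
        term = c * A_tilde.T @ term             # next term of the series
        if np.abs(term).sum() < tol:            # L1 length drives convergence
            break
    return p
```

Since $\mathbf{\tilde{A}}^{\top}$ is column-stochastic, each term's L1 length shrinks by a factor of at most $c$ per iteration, so the loop terminates after $O(\log(1/\text{tol}))$ iterations.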

Incremental PageRank

When edges are inserted or deleted, PageRank scores can be updated incrementally from the previous PageRank scores. Let $\mathbf{\tilde{A}}$ be the row-normalized adjacency matrix of a graph $G$, and let $\mathbf{\tilde{B}}$ be the row-normalized adjacency matrix after a change $\Delta G$ that happened during $\Delta t$. From now on, denote by $\Delta\mathbf{A}=\mathbf{\tilde{B}}^{\top}-\mathbf{\tilde{A}}^{\top}$ the difference between the transposed normalized matrices $\mathbf{\tilde{A}}^{\top}$ and $\mathbf{\tilde{B}}^{\top}$.

Lemma 1 (Dynamic PageRank, Theorem 3.2 in (Yoon et al., 2018a)).

Given updates $\Delta\mathbf{A}$ in a graph during $\Delta t$, the updated PageRank vector $\mathbf{p}(t+\Delta t)$ is computed incrementally from the previous PageRank vector $\mathbf{p}(t)$ as follows:

\mathbf{p}(t+\Delta t)=\mathbf{p}(t)+\sum_{k=0}^{\infty}(c(\mathbf{\tilde{A}}^{\top}+\Delta\mathbf{A}))^{k}c\Delta\mathbf{A}\,\mathbf{p}(t)

Note that, for small changes in a graph, the L1 length of the starting vector $c\Delta\mathbf{A}\,\mathbf{p}(t)$ is much smaller than $1$, the L1 length of the starting vector $\mathbf{b}$ in the static PageRank equation, resulting in much faster convergence.
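Lemma 1 translates directly into code: the update reuses the same power-series loop as static PageRank, but seeded with the small-L1 vector $c\Delta\mathbf{A}\,\mathbf{p}(t)$. The sketch below is illustrative only; the names `update_pagerank` and `A_new_t` (the transposed row-normalized matrix after the change, i.e., $\mathbf{\tilde{A}}^{\top}+\Delta\mathbf{A}$) are our own:

```python
import numpy as np

def update_pagerank(p_old, A_new_t, dA, c=0.85, tol=1e-9, max_iter=1000):
    """Incremental PageRank update (Lemma 1):
    p(t+dt) = p(t) + sum_k (c * A_new_t)^k (c * dA @ p(t)).

    p_old:   previous PageRank vector p(t)
    A_new_t: transposed row-normalized matrix after the change (A~^T + dA)
    dA:      difference of transposed normalized matrices
    """
    term = c * dA @ p_old          # small-L1 starting vector -> fast convergence
    p = p_old.copy()
    for _ in range(max_iter):
        p += term
        term = c * A_new_t @ term  # propagate the correction
        if np.abs(term).sum() < tol:
            break
    return p
```

Because the loop starts from $c\Delta\mathbf{A}\,\mathbf{p}(t)$ rather than $\mathbf{b}$, its cost is proportional to the size of the change, not the size of the graph's score mass.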

Table 2. Table of symbols.
Symbol: Definition
$G$: (un)directed and (un)weighted input graph
$\Delta G$: update in the graph
$n, m$: numbers of nodes and edges in $G$
$\mathbf{\tilde{A}}$: $(n\times n)$ row-normalized adjacency matrix of $G$
$\mathbf{\tilde{B}}$: $(n\times n)$ row-normalized adjacency matrix of $G+\Delta G$
$\Delta\mathbf{A}$: $(n\times n)$ difference between $\mathbf{\tilde{A}}^{\top}$ and $\mathbf{\tilde{B}}^{\top}$ $(=\mathbf{\tilde{B}}^{\top}-\mathbf{\tilde{A}}^{\top})$
$c$: damping factor of PageRank
$\mathbf{b}_{s}$: $(n\times 1)$ uniform starting vector
$\mathbf{b}_{w}$: $(n\times 1)$ out-edge proportional starting vector
$\mathbf{\tilde{A}}_{s}$: $(n\times n)$ row-normalized unweighted adjacency matrix
$\mathbf{\tilde{A}}_{w}$: $(n\times n)$ row-normalized weighted adjacency matrix

4. Proposed Method

Node scores represent the importance of nodes in a given graph. Thus, as the graph evolves under normal behavior, with the insertion and deletion of edges, node scores evolve smoothly. In contrast, anomalies such as network attacks or rating manipulation often aim to complete their goals in a short time, e.g., to satisfy their clients, inducing large and abrupt changes in node scores. Our key intuition is that such abrupt gains or losses are reflected in the 1st and 2nd derivatives of node scores: a large 1st derivative identifies large changes, while a large 2nd derivative identifies abrupt changes in the trend of the data, thereby distinguishing anomalies from normal users who evolve according to smooth trends. Thus, tracking 1st and 2nd order derivatives helps detect anomalies in dynamic graphs.

Changes in a dynamic graph are classified into two types: structure changes and edge weight changes. Since these two types of changes affect node scores of the graph differently (more details in Section 4.2), we need to handle them separately. Thus, first, we classify anomalies in dynamic graphs into two types: AnomalyS and AnomalyW (Section 4.1). Then we design two node score functions based on characteristics of these two types of anomalies, respectively (Section 4.2). Next, we define two novel metrics for anomalousness using 1st and 2nd order derivatives of our node scores, and verify the effectiveness of each metric on the respective type of anomalies theoretically (Section 4.3). Based on these analyses, we introduce our method AnomRank, a fast and accurate anomaly detection algorithm in dynamic graphs (Section 4.4).

Figure 2. Two-pronged approach: changes in dynamic graphs are classified into two types, (a) structure change and (b) edge weight change.

4.1. Anomalies in Dynamic Graphs

We classify anomalies in dynamic graphs into two types: AnomalyS and AnomalyW.

4.1.1. AnomalyS

It is suspicious if many edges are inserted between previously unrelated nodes, or deleted between previously related nodes. Hence, AnomalyS denotes a massive change in the graph structure. One example of AnomalyS is spam in mail network graphs: a spammer sends mail to many unknown individuals, generating out-edges toward previously unrelated nodes. Data exfiltration attacks in computer network graphs are another example: attackers transfer a target's data stealthily, generating unseen edges around a target machine to steal information. As illustrated in Figure 2, we define a structure change as follows:

Definition 1 (Structure Change).

If a node $u$ changes the destination of $\Delta m$ of its out-edges from previous neighbors $v_{1},\dots,v_{\Delta m}$ to new neighbors $v_{1}^{\prime},\dots,v_{\Delta m}^{\prime}$, we call the change a structure change of size $\Delta m$.

With abnormally large $\Delta m$, a structure change becomes an AnomalyS. To detect AnomalyS, we need to focus on the existence of edges between two nodes, rather than the number of occurrences of edges between two nodes.

4.1.2. AnomalyW

In dynamic graphs, an edge between two nodes may occur several times, and the edge weight is proportional to the number of edge occurrences. AnomalyW denotes a massive change of edge weights in a graph. One example of AnomalyW is port scan attacks in computer network graphs: to scan the ports of a target IP address, attackers repeatedly connect to that IP address, increasing the weight of the edges to the target node. On Twitter, high edge density in a user-keyword graph could indicate bot-like behavior, e.g., bots posting about the same content repeatedly. As illustrated in Figure 2, we define an edge weight change as follows:

Definition 2 (Edge Weight Change).

If a node $u$ adds/subtracts $\Delta m$ out-edges to a neighbor node $v$, we call the change an edge weight change of size $\Delta m$.

With abnormally large $\Delta m$, an edge weight change becomes an AnomalyW. In contrast to AnomalyS, here we focus on the number of occurrences of each edge, rather than only the presence or absence of an edge.

4.2. Node Score Functions for Detecting AnomalyS and AnomalyW

To detect AnomalyS and AnomalyW, we first define two node score functions, ScoreS and ScoreW, which we use to define our anomalousness metrics in Section 4.3.

4.2.1. ScoreS

We introduce the node score ScoreS, which we use to catch AnomalyS. Define the row-normalized unweighted adjacency matrix $\mathbf{\tilde{A}}_{s}$, the starting vector $\mathbf{b}_{s}$, which is an all-$\frac{1}{n}$ vector of length $n$ (the number of nodes), and the damping factor $c$.

Definition 3 (ScoreS).

The ScoreS node score vector $\mathbf{p}_{s}$ is defined by the following iterative equation:

\mathbf{p}_{s}=c\mathbf{\tilde{A}}_{s}^{\top}\mathbf{p}_{s}+(1-c)\mathbf{b}_{s}

For this (unweighted) case, ScoreS is the same as PageRank, but we refer to it as ScoreS for consistency with our later definitions. Note that the number of edge occurrences between nodes is not considered in ScoreS. Using Lemma 1, we can compute ScoreS incrementally, at fast speed, in dynamic graphs.

4.2.2. ScoreW

Next, we introduce the second node score, ScoreW, which we use to catch AnomalyW. To incorporate edge weights, we use the weighted adjacency matrix $\mathbf{A}_{w}$ instead of $\mathbf{A}_{s}$. However, this is not enough on its own: imagine an attacker node who adds a massive number of edges, all toward a single target node, where the attacker has no other neighbors. Since $\mathbf{\tilde{A}}_{w}$ is row-normalized, this attacker appears no different in $\mathbf{\tilde{A}}_{w}$ than if they had added only a single edge toward the same target. Hence, to catch such attackers, we also introduce an out-degree proportional starting vector $\mathbf{b}_{w}$, i.e., we set the initial score of each node proportional to its out-degree.

Definition 4 (ScoreW).

The ScoreW node score vector $\mathbf{p}_{w}$ is defined by the following iterative equation:

\mathbf{p}_{w}=c\mathbf{\tilde{A}}_{w}^{\top}\mathbf{p}_{w}+(1-c)\mathbf{b}_{w}

$\mathbf{A}_{w}(i,j)$ is the edge weight from node $i$ to node $j$. $\mathbf{b}_{w}(i)$ is $\frac{m_{i}}{m}$, where $m_{i}$ denotes the total edge weight of the out-edges of node $i$, and $m$ denotes the total edge weight of the graph.
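As an illustration of Definition 4, the following sketch (our own helper `score_w`, not the paper's implementation) builds $\mathbf{\tilde{A}}_{w}$ and the out-degree proportional starting vector $\mathbf{b}_{w}$, then iterates to the fixed point:

```python
import numpy as np

def score_w(A_w, c=0.85, n_iter=500):
    """ScoreW (Definition 4): PageRank-style scores on the weighted graph
    with an out-degree proportional starting vector b_w.

    A_w[i, j] = total edge weight from node i to node j.
    Assumes every node has positive out-edge weight.
    """
    m_i = A_w.sum(axis=1)               # total out-edge weight per node
    b_w = m_i / m_i.sum()               # b_w(i) = m_i / m
    A_t = (A_w / m_i[:, None]).T        # transposed row-normalized matrix
    p = b_w.copy()
    for _ in range(n_iter):
        p = c * A_t @ p + (1 - c) * b_w  # iterate to the fixed point
    return p
```

Note how this handles the attacker described above: inflating the weight of a node's only out-edge leaves its row of $\mathbf{\tilde{A}}_{w}$ unchanged, but enlarges its entry in $\mathbf{b}_{w}$, which is exactly why the out-degree proportional starting vector is needed.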

Next, we show how ScoreW is computed incrementally in a dynamic graph. Assume that a change $\Delta G$ happens in graph $G$ during the time interval $\Delta t$, inducing changes $\Delta\mathbf{A}_{w}$ and $\Delta\mathbf{b}_{w}$ in the adjacency matrix and the starting vector, respectively.

Lemma 2 (Dynamic ScoreW).

Given updates $\Delta\mathbf{A}_{w}$ and $\Delta\mathbf{b}_{w}$ in a graph during $\Delta t$, the updated score vector $\mathbf{p}_{w}(t+\Delta t)$ is computed incrementally from the previous score vector $\mathbf{p}_{w}(t)$ as follows:

\mathbf{p}_{w}(t+\Delta t)=\mathbf{p}_{w}(t)+\sum_{k=0}^{\infty}(c(\mathbf{\tilde{A}}_{w}^{\top}+\Delta\mathbf{A}_{w}))^{k}c\Delta\mathbf{A}_{w}\mathbf{p}_{w}(t)+(1-c)\sum_{k=0}^{\infty}(c(\mathbf{\tilde{A}}_{w}^{\top}+\Delta\mathbf{A}_{w}))^{k}\Delta\mathbf{b}_{w}

Proof.

For brevity, let $\mathbf{p}_{w}^{n}\leftarrow\mathbf{p}_{w}(t+\Delta t)$ and $\mathbf{p}_{w}^{o}\leftarrow\mathbf{p}_{w}(t)$.

\mathbf{p}_{w}^{n}=(1-c)\sum_{k=0}^{\infty}c^{k}(\mathbf{\tilde{A}}_{w}^{\top}+\Delta\mathbf{A}_{w})^{k}(\mathbf{b}_{w}+\Delta\mathbf{b}_{w})
=(1-c)\sum_{k=0}^{\infty}c^{k}(\mathbf{\tilde{A}}_{w}^{\top}+\Delta\mathbf{A}_{w})^{k}\mathbf{b}_{w}+(1-c)\sum_{k=0}^{\infty}c^{k}(\mathbf{\tilde{A}}_{w}^{\top}+\Delta\mathbf{A}_{w})^{k}\Delta\mathbf{b}_{w}
=\mathbf{p}_{w}^{o}+\sum_{k=0}^{\infty}(c(\mathbf{\tilde{A}}_{w}^{\top}+\Delta\mathbf{A}_{w}))^{k}c\Delta\mathbf{A}_{w}\mathbf{p}_{w}^{o}+(1-c)\sum_{k=0}^{\infty}(c(\mathbf{\tilde{A}}_{w}^{\top}+\Delta\mathbf{A}_{w}))^{k}\Delta\mathbf{b}_{w}

In the third line, we apply Lemma 1 to the first term. $\blacksquare$

Note that, for small changes in a graph, the starting vectors of the last two terms, $c\Delta\mathbf{A}_{w}\mathbf{p}_{w}(t)$ and $\Delta\mathbf{b}_{w}$, have much smaller L1 lengths than the original starting vector $\mathbf{b}_{w}$, so they can be computed at fast speed.

4.2.3. Suitability

We estimate changes in ScoreS induced by a structure change (Definition 1) and compare the changes with those in ScoreW to prove the suitability of ScoreS for detecting AnomalyS.

Lemma 3 (Upper Bound for Structure Change in ScoreS).

When a structure change of size $\Delta m$ happens around a node $u$ with $k$ out-neighbors, $\lVert\Delta\mathbf{A}_{s}\rVert_{1}$ is upper-bounded by $\frac{2\Delta m}{k}$.

Proof.

In $\Delta\mathbf{A}_{s}$, only the $u$-th column has nonzeros. Thus, $\lVert\Delta\mathbf{A}_{s}\rVert_{1}=\lVert\Delta\mathbf{A}_{s}(u)\rVert_{1}$. $\Delta\mathbf{A}_{s}(u)$ is normalized by $k$, the total number of out-neighbors of node $u$. For the out-neighbors $v_{1},\dots,v_{\Delta m}$ that lose edges, $\Delta\mathbf{A}_{s}(v_{i},u)=-\frac{1}{k}$. For the out-neighbors $v_{1}^{\prime},\dots,v_{\Delta m}^{\prime}$ that gain edges, $\Delta\mathbf{A}_{s}(v_{i}^{\prime},u)=\frac{1}{k}$. Then $\lVert\Delta\mathbf{A}_{s}\rVert_{1}=\lVert\Delta\mathbf{A}_{s}(u)\rVert_{1}=\frac{\Delta m}{k}+\frac{\Delta m}{k}=\frac{2\Delta m}{k}$. $\blacksquare$

When a structure change is presented in ScoreW, $\lVert\Delta\mathbf{b}_{w}\rVert_{1}=0$ since there is no change in the number of edges. Moreover, $\lVert\Delta\mathbf{A}_{w}\rVert_{1}=\frac{2\Delta m}{m_{u}}$ since each row in $\mathbf{A}_{w}$ is normalized by the total out-edge weight $m_{u}$, which is at least the number of out-neighbors $k$. In other words, a structure change generates larger changes in ScoreS ($\frac{2\Delta m}{k}$) than in ScoreW ($\frac{2\Delta m}{m_{u}}$). Thus ScoreS is more suitable than ScoreW for detecting AnomalyS.
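Lemma 3 can be checked numerically on a toy example (of our own construction): a node with $k=4$ out-neighbors redirects $\Delta m=2$ of its out-edges, and the resulting matrix difference has L1 length exactly $\frac{2\Delta m}{k}$.

```python
import numpy as np

def rownorm(A):
    return A / A.sum(axis=1, keepdims=True)

# Node u = 0 has k = 4 out-neighbors {1,2,3,4}. Redirect dm = 2 of its
# out-edges (to nodes 1 and 2) toward new neighbors {5,6}: a structure
# change of size 2 (Definition 1). Every other node keeps one edge to
# node 0 in both snapshots so all rows can be normalized.
n, k, dm = 7, 4, 2
A = np.zeros((n, n)); A[0, [1, 2, 3, 4]] = 1; A[1:, 0] = 1
B = np.zeros((n, n)); B[0, [3, 4, 5, 6]] = 1; B[1:, 0] = 1

dA = rownorm(B).T - rownorm(A).T
# L1 length of a matrix = maximum column L1 length; only column u is nonzero
l1 = np.abs(dA).sum(axis=0).max()
print(l1, 2 * dm / k)  # both equal 1.0
```

The lost neighbors contribute $\Delta m$ entries of $-\frac{1}{k}$ and the gained neighbors $\Delta m$ entries of $+\frac{1}{k}$, matching the proof term by term.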

Similarly, we estimate changes in ScoreW induced by an edge weight change (Definition 2) and compare the changes with those in ScoreS to prove the suitability of ScoreW for detecting AnomalyW.

Lemma 4 (Upper Bound for Edge Weight Change in ScoreW).

When an edge weight change of size $\Delta m$ happens around a node $u$ with total out-edge weight $m_{u}$ in a graph with total edge weight $m$, $\lVert\Delta\mathbf{A}_{w}\rVert_{1}$ and $\lVert\Delta\mathbf{b}_{w}\rVert_{1}$ are upper-bounded by $\frac{2\Delta m}{m_{u}}$ and $\frac{2\Delta m}{m}$, respectively.

Proof.

In $\Delta\mathbf{A}_{w}$, only the $u$-th column has nonzeros. Then $\lVert\Delta\mathbf{A}_{w}\rVert_{1}=\lVert\Delta\mathbf{A}_{w}(u)\rVert_{1}$. Node $u$ has $m_{v_{i}}$ edges toward each out-neighbor $v_{i}$ $(i=1,\dots,k)$. Thus the total out-edge weight $m_{u}$ is $\sum_{i=1}^{k}m_{v_{i}}$. Since the weighted adjacency matrix is normalized by the total out-edge weight, $\mathbf{\tilde{A}}_{w}^{\top}(v_{i},u)=\frac{m_{v_{i}}}{m_{u}}$. After $\Delta m$ edges are added from node $u$ to node $v_{k}$, $\Delta\mathbf{A}_{w}(v_{i},u)=\frac{m_{v_{i}}}{m_{u}+\Delta m}-\frac{m_{v_{i}}}{m_{u}}$ for $i\neq k$, and $\Delta\mathbf{A}_{w}(v_{k},u)=\frac{m_{v_{k}}+\Delta m}{m_{u}+\Delta m}-\frac{m_{v_{k}}}{m_{u}}$. Then, by the triangle inequality, $\lVert\Delta\mathbf{A}_{w}\rVert_{1}=\lVert\Delta\mathbf{A}_{w}(u)\rVert_{1}$ is bounded as follows:

\lVert\Delta\mathbf{A}_{w}\rVert_{1}\leq\sum_{i=1}^{k}m_{v_{i}}\left(\frac{1}{m_{u}}-\frac{1}{m_{u}+\Delta m}\right)+\frac{\Delta m}{m_{u}+\Delta m}=\frac{2\Delta m}{m_{u}+\Delta m}\leq\frac{2\Delta m}{m_{u}}

$\mathbf{b}_{w}(i)=\frac{m_{i}}{m}$, where $m_{i}$ is the total out-edge weight of node $i$. After $\Delta m$ edges are added from node $u$ to node $v_{k}$, $\Delta\mathbf{b}_{w}(i)=\frac{m_{i}}{m+\Delta m}-\frac{m_{i}}{m}$ for $i\neq u$, and $\Delta\mathbf{b}_{w}(u)=\frac{m_{u}+\Delta m}{m+\Delta m}-\frac{m_{u}}{m}$. Then $\lVert\Delta\mathbf{b}_{w}\rVert_{1}$ is bounded as follows:

\lVert\Delta\mathbf{b}_{w}\rVert_{1}\leq\sum_{i=1}^{n}m_{i}\left(\frac{1}{m}-\frac{1}{m+\Delta m}\right)+\frac{\Delta m}{m+\Delta m}=\frac{2\Delta m}{m+\Delta m}\leq\frac{2\Delta m}{m}

$\blacksquare$

In contrast, when an edge weight change is presented in ScoreS, $\lVert\Delta\mathbf{A}_{s}\rVert_{1}=0$ since the set of out-neighbors is unchanged, and $\lVert\Delta\mathbf{b}_{s}\rVert_{1}=0$ since $\mathbf{b}_{s}$ is fixed in ScoreS. In other words, an AnomalyW does not induce any change in ScoreS.
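The same style of toy check (again of our own construction) illustrates both Lemma 4 and the contrast with ScoreS: an edge weight change perturbs the weighted matrix within the stated bound, while the unweighted matrix used by ScoreS does not move at all.

```python
import numpy as np

def rownorm(A):
    return A / A.sum(axis=1, keepdims=True)

# Node u = 0 has out-edges to nodes 1 and 2 with weights 3 and 2 (m_u = 5);
# add dm = 5 more occurrences of the edge 0 -> 2 (Definition 2).
A_w = np.array([[0., 3., 2.],
                [1., 0., 0.],
                [0., 1., 0.]])
B_w = A_w.copy(); B_w[0, 2] += 5.0
m_u, dm = 5.0, 5.0

dA_w = rownorm(B_w).T - rownorm(A_w).T
l1_w = np.abs(dA_w).sum(axis=0).max()   # within 2*dm/(m_u+dm) <= 2*dm/m_u

# The unweighted (0/1) adjacency matrix is unchanged, so dA_s = 0:
dA_s = rownorm((B_w > 0).astype(float)).T - rownorm((A_w > 0).astype(float)).T
```

Here `l1_w` evaluates to $0.6$, below the Lemma 4 bound $\frac{2\Delta m}{m_{u}+\Delta m}=1.0$, while every entry of `dA_s` is zero, matching the discussion above.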

4.3. Metrics for AnomalyS and AnomalyW

Next, we define our two novel metrics for evaluating anomalousness at each time step in dynamic graphs.

4.3.1. AnomRankS

First, we discretize the first-order derivative of the ScoreS vector $\mathbf{p}_{s}$ as follows:

\mathbf{p}_{s}^{\prime}=\frac{\mathbf{p}_{s}(t+\Delta t)-\mathbf{p}_{s}(t)}{\Delta t}

Similarly, the second-order derivative of $\mathbf{p}_{s}$ is discretized as follows:

\mathbf{p}_{s}^{\prime\prime}=\frac{(\mathbf{p}_{s}(t+\Delta t)-\mathbf{p}_{s}(t))-(\mathbf{p}_{s}(t)-\mathbf{p}_{s}(t-\Delta t))}{\Delta t^{2}}

Next, we define a novel metric, AnomRankS, designed to detect AnomalyS effectively.

Definition 5 (AnomRankS).

Given the ScoreS vector $\mathbf{p}_{s}$, AnomRankS $\mathbf{a}_{s}$ is the $(n\times 2)$ matrix $[\mathbf{p}_{s}^{\prime}~\mathbf{p}_{s}^{\prime\prime}]$, concatenating the 1st and 2nd derivatives of $\mathbf{p}_{s}$. The AnomRankS score is $\lVert\mathbf{a}_{s}\rVert_{1}$.
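Definition 5 amounts to a few vector operations per snapshot. The sketch below (illustrative, with $\Delta t=1$ by default, and our own function name) computes an AnomRankS-style score from three consecutive score vectors, taking the matrix L1 norm as the maximum of the two column L1 lengths:

```python
import numpy as np

def anomrank_score(p_prev, p_curr, p_next, dt=1.0):
    """Anomaly score from three consecutive node score vectors
    p(t - dt), p(t), p(t + dt): the L1 length of [p', p''], i.e., the
    maximum column L1 length of the discretized 1st and 2nd derivatives."""
    p1 = (p_next - p_curr) / dt                             # 1st derivative
    p2 = ((p_next - p_curr) - (p_curr - p_prev)) / dt ** 2  # 2nd derivative
    return max(np.abs(p1).sum(), np.abs(p2).sum())
```

A smooth stream (small, steady drift in scores) yields a small score, while a sudden jump in any node's score inflates both derivatives at once, which is the behavior the upper bounds below formalize.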

Next, we study how AnomRankS scores behave under a normal stream and under an anomaly, thus explaining how the metric distinguishes anomalies from normal behavior. First, we model a normal graph stream based on Lipschitz continuity to capture smoothness:

Assumption 1 ($\lVert\mathbf{p}_{s}(t)\rVert_{1}$ in a Normal Stream).

In a normal graph stream, $\lVert\mathbf{p}_{s}(t)\rVert_{1}$ is Lipschitz continuous with positive real constants $K_{1}$ and $K_{2}$ such that

\lVert\mathbf{p}_{s}^{\prime}\rVert_{1}\leq K_{1}\quad\text{and}\quad\lVert\mathbf{p}_{s}^{\prime\prime}\rVert_{1}\leq K_{2}

In Lemma 5, we support Assumption 1 by upper-bounding $\lVert\mathbf{p}_{s}^{\prime}\rVert_{1}$ and $\lVert\mathbf{p}_{s}^{\prime\prime}\rVert_{1}$. For brevity, all proofs of this subsection are given in Supplement A.3.

Lemma 5 (Upper bound of $\lVert\mathbf{p}_{s}^{\prime}\rVert_{1}$).

Given the damping factor $c$ and updates $\Delta\mathbf{A}_{s}$ in the adjacency matrix during $\Delta t$, $\lVert\mathbf{p}_{s}^{\prime}\rVert_{1}$ is upper-bounded by $\frac{c}{1-c}\lVert\frac{\Delta\mathbf{A}_{s}}{\Delta t}\rVert_{1}$.

Proof.

The proof is given in Supplement A.3. $\blacksquare$

We bound the L1 length of $\mathbf{p}_{s}^{\prime\prime}$ in terms of the L1 lengths of $\Delta\mathbf{A}_{s_{o}}$ and $\Delta\mathbf{A}_{s_{n}}$, where $\Delta\mathbf{A}_{s_{o}}$ denotes the change in $\mathbf{A}_{s}$ from time $(t-\Delta t)$ to time $t$, and $\Delta\mathbf{A}_{s_{n}}$ denotes the change in $\mathbf{A}_{s}$ from $t$ to $(t+\Delta t)$.

Lemma 6 (Upper bound of $\lVert\mathbf{p}_{s}^{\prime\prime}\rVert_{1}$).

Given the damping factor $c$ and successive updates $\Delta\mathbf{A}_{s_{o}}$ and $\Delta\mathbf{A}_{s_{n}}$, $\lVert\mathbf{p}_{s}^{\prime\prime}\rVert_{1}$ is upper-bounded by $\frac{1}{\Delta t^{2}}\left(\frac{c}{1-c}\lVert\Delta\mathbf{A}_{s_{n}}-\Delta\mathbf{A}_{s_{o}}\rVert_{1}+(\frac{c}{1-c})^{2}\lVert\Delta\mathbf{A}_{s_{n}}\rVert_{1}^{2}+(\frac{c}{1-c})^{2}\lVert\Delta\mathbf{A}_{s_{o}}\rVert_{1}^{2}\right)$.

Proof.

The proof is given in Supplement A.3. $\blacksquare$

Normal graphs have small changes, and thus small $\lVert\Delta\mathbf{A}_{s}\rVert_{1}$; this results in small values of $\lVert\mathbf{p}_{s}^{\prime}\rVert_{1}$. In addition, normal graphs change gradually, and thus have small $\lVert\Delta\mathbf{A}_{s_{n}}-\Delta\mathbf{A}_{s_{o}}\rVert_{1}$; this leads to small values of $\lVert\mathbf{p}_{s}^{\prime\prime}\rVert_{1}$. Then, under these small upper bounds, the AnomRankS score $\lVert\mathbf{a}_{s}\rVert_{1}=\max(\lVert\mathbf{p}_{s}^{\prime}\rVert_{1},\lVert\mathbf{p}_{s}^{\prime\prime}\rVert_{1})$ has small values in normal graph streams.

Observation 1 (AnomalyS in AnomRankS).

AnomalyS involves sudden structure changes, inducing large AnomRankS scores.

An AnomalyS happens with massive changes ($\frac{\Delta m}{\Delta t}$) occurring abruptly ($\frac{\Delta^{2}m}{\Delta t^{2}}$). In the following Theorem 1, we explain Observation 1 based on the large values of $\frac{\Delta m}{\Delta t}$ and $\frac{\Delta^{2}m}{\Delta t^{2}}$ in AnomalyS.

Theorem 1 (Upper bounds of $\lVert\mathbf{p}_{s}^{\prime}\rVert_{1}$ and $\lVert\mathbf{p}_{s}^{\prime\prime}\rVert_{1}$ under AnomalyS).

When an AnomalyS occurs with large $\frac{\Delta m}{\Delta t}$ and $\frac{\Delta^{2}m}{\Delta t^{2}}$, the L1 lengths of $\mathbf{p}_{s}^{\prime}$ and $\mathbf{p}_{s}^{\prime\prime}$ are upper-bounded as follows:

\lVert\mathbf{p}_{s}^{\prime}\rVert_{1}\leq\frac{c}{1-c}\frac{2}{k}\frac{\Delta m}{\Delta t}
\lVert\mathbf{p}_{s}^{\prime\prime}\rVert_{1}\leq\frac{c}{1-c}\frac{2}{k}\frac{\Delta^{2}m}{\Delta t^{2}}+2\left(\frac{c}{1-c}\right)^{2}\left(\frac{2}{k}\right)^{2}\left(\frac{\Delta m}{\Delta t}\right)^{2}
Proof.

The proof is given in Supplement A.3. $\blacksquare$

Based on Theorem 1, AnomalyS has higher upper bounds of $\lVert\mathbf{p}_{s}'\rVert_{1}$ and $\lVert\mathbf{p}_{s}''\rVert_{1}$ than normal streams. This gives an intuition for why AnomalyS results in high AnomRankS scores (Figure 1(c)). We detect AnomalyS successfully based on AnomRankS scores in real-world graphs (Figure 3).

4.3.2. AnomRankW

We discretize the first and second order derivatives $\mathbf{p}_{w}'$ and $\mathbf{p}_{w}''$ of $\mathbf{p}_{w}$ as follows:

$$\mathbf{p}_{w}' = \frac{\mathbf{p}_{w}(t+\Delta t)-\mathbf{p}_{w}(t)}{\Delta t}$$
$$\mathbf{p}_{w}'' = \frac{(\mathbf{p}_{w}(t+\Delta t)-\mathbf{p}_{w}(t))-(\mathbf{p}_{w}(t)-\mathbf{p}_{w}(t-\Delta t))}{\Delta t^{2}}$$
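The discretization above is simple to compute once three consecutive score vectors are available. A minimal sketch in Python; the toy vectors and the snapshot interval dt = 1 are our own illustrative choices, not values from the paper:

```python
import numpy as np

def derivatives(p_prev, p_now, p_next, dt=1.0):
    """L1 lengths of the discretized 1st and 2nd derivatives of a
    node score vector (dt = 1 snapshot interval is illustrative)."""
    d1 = (p_next - p_now) / dt                          # first derivative
    d2 = ((p_next - p_now) - (p_now - p_prev)) / dt**2  # second derivative
    return np.abs(d1).sum(), np.abs(d2).sum()

# A gradual drift keeps both derivatives small; an abrupt shift at the
# last snapshot inflates them (toy vectors, not from the paper).
p0 = np.array([0.25, 0.25, 0.25, 0.25])
p1 = np.array([0.26, 0.24, 0.25, 0.25])
p2 = np.array([0.60, 0.10, 0.15, 0.15])
d1, d2 = derivatives(p0, p1, p2)
```

The abrupt jump at the third snapshot dominates both L1 lengths, which is exactly the signal the metrics track.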

Then we define the second metric, AnomRankW, which is designed to detect AnomalyW effectively.

Definition 6 (AnomRankW).

Given the ScoreW vector $\mathbf{p}_{w}$, AnomRankW $\mathbf{a}_{w}$ is an $(n\times 2)$ matrix $[\mathbf{p}_{w}'~\mathbf{p}_{w}'']$ concatenating the 1st and 2nd derivatives of $\mathbf{p}_{w}$. The AnomRankW score is $\lVert\mathbf{a}_{w}\rVert_{1}$.

We model the smoothness of $\lVert\mathbf{p}_{w}(t)\rVert_{1}$ in a normal graph stream using Lipschitz continuity in Assumption 2. Then, as in Section 4.3.1, we show upper bounds on $\lVert\mathbf{p}_{w}'\rVert_{1}$ and $\lVert\mathbf{p}_{w}''\rVert_{1}$ in Lemmas 7 and 8 to justify Assumption 2.

Assumption 2 ($\lVert\mathbf{p}_{w}(t)\rVert_{1}$ in Normal Stream).

In a normal graph stream, $\lVert\mathbf{p}_{w}(t)\rVert_{1}$ is Lipschitz continuous with positive real constants $C_{1}$ and $C_{2}$ such that:

$$\lVert\mathbf{p}_{w}'\rVert_{1}\leq C_{1}\quad\text{and}\quad\lVert\mathbf{p}_{w}''\rVert_{1}\leq C_{2}$$
Lemma 7 (Upper bound of $\lVert\mathbf{p}_{w}'\rVert_{1}$).

Given damping factor $c$, updates $\Delta\mathbf{A}_{w}$ in the adjacency matrix, and updates $\Delta\mathbf{b}_{w}$ in the starting vector during $\Delta t$, $\lVert\mathbf{p}_{w}'\rVert_{1}$ is upper-bounded by $\frac{1}{\Delta t}(\frac{c}{1-c}\lVert\Delta\mathbf{A}_{w}\rVert_{1}+\lVert\Delta\mathbf{b}_{w}\rVert_{1})$.

Proof.

The proof is given in Supplement A.3. $\blacksquare$

In the following lemma, $\lVert\mathbf{p}_{s}''\rVert_{max}$ denotes the upper bound of $\lVert\mathbf{p}_{s}''\rVert_{1}$ shown in Lemma 6. $\Delta\mathbf{A}_{w_{o}}$ denotes the changes in $\mathbf{A}_{w}$ from time $(t-\Delta t)$ to time $t$, and $\Delta\mathbf{A}_{w_{n}}$ the changes in $\mathbf{A}_{w}$ from $t$ to $(t+\Delta t)$. Likewise, $\Delta\mathbf{b}_{w_{o}}$ denotes the changes in $\mathbf{b}_{w}$ from time $(t-\Delta t)$ to time $t$, and $\Delta\mathbf{b}_{w_{n}}$ the changes in $\mathbf{b}_{w}$ from $t$ to $(t+\Delta t)$.

Lemma 8 (Upper bound of $\lVert\mathbf{p}_{w}''\rVert_{1}$).

Given damping factor $c$, successive updates $\Delta\mathbf{A}_{w_{o}}$ and $\Delta\mathbf{A}_{w_{n}}$, and successive updates $\Delta\mathbf{b}_{w_{o}}$ and $\Delta\mathbf{b}_{w_{n}}$, $\lVert\mathbf{p}_{w}''\rVert_{1}$ is upper-bounded by:

$$\lVert\mathbf{p}_{s}''\rVert_{max}+\frac{1}{\Delta t^{2}}\left(\lVert\Delta\mathbf{b}_{w_{n}}-\Delta\mathbf{b}_{w_{o}}\rVert_{1}+\frac{c}{1-c}\lVert\Delta\mathbf{A}_{w_{n}}\rVert_{1}\lVert\Delta\mathbf{b}_{w_{n}}\rVert_{1}\right)$$
Proof.

The proof is given in Supplement A.3. $\blacksquare$

Normal graph streams have small changes (small $\lVert\Delta\mathbf{A}_{w}\rVert_{1}$ and small $\lVert\Delta\mathbf{b}_{w}\rVert_{1}$) and evolve gradually (small $\lVert\Delta\mathbf{b}_{w_{n}}-\Delta\mathbf{b}_{w_{o}}\rVert_{1}$). Hence, normal graph streams have small AnomRankW scores under small upper bounds of $\lVert\mathbf{p}_{w}'\rVert_{1}$ and $\lVert\mathbf{p}_{w}''\rVert_{1}$.

Observation 2 (AnomalyW in AnomRankW).

AnomalyW involves sudden edge weight changes, inducing large AnomRankW scores.

We explain Observation 2 by showing large upper bounds of $\lVert\mathbf{p}_{w}'\rVert_{1}$ and $\lVert\mathbf{p}_{w}''\rVert_{1}$ induced by the large values of $\frac{\Delta m}{\Delta t}$ and $\frac{\Delta^{2}m}{\Delta t^{2}}$ under AnomalyW.

Theorem 2 (Upper bounds of $\lVert\mathbf{p}_{w}'\rVert_{1}$ and $\lVert\mathbf{p}_{w}''\rVert_{1}$ with AnomalyW).

When AnomalyW occurs with large $\frac{\Delta m}{\Delta t}$ and $\frac{\Delta^{2}m}{\Delta t^{2}}$, the $L1$ lengths of $\mathbf{p}_{w}'$ and $\mathbf{p}_{w}''$ are upper-bounded as follows:

$$\lVert\mathbf{p}_{w}'\rVert_{1} \leq \frac{c}{1-c}\frac{2}{m_{u}}\frac{\Delta m}{\Delta t}+\frac{2}{m}\frac{\Delta m}{\Delta t}$$
$$\lVert\mathbf{p}_{w}''\rVert_{1} \leq \frac{c}{1-c}\frac{2}{k}\frac{\Delta^{2}m}{\Delta t^{2}}+\frac{2}{m}\frac{\Delta^{2}m}{\Delta t^{2}}+2\left(\frac{c}{1-c}\right)^{2}\left(\frac{2}{k}\right)^{2}\left(\frac{\Delta m}{\Delta t}\right)^{2}+\frac{c}{1-c}\frac{2}{m_{u}}\frac{2}{m}\left(\frac{\Delta m}{\Delta t}\right)^{2}$$
Proof.

The proof is given in Supplement A.3. $\blacksquare$

With the high upper bounds of $\lVert\mathbf{p}_{w}'\rVert_{1}$ and $\lVert\mathbf{p}_{w}''\rVert_{1}$ shown in Theorem 2, AnomalyW induces high AnomRankW scores (Figure 1(c)). We detect AnomalyW successfully based on AnomRankW scores in real-world graphs (Figure 3).

Algorithm 1 AnomRank
Input: updates in a graph $\Delta G$; previous ScoreS/W $\mathbf{p}_{s}^{old},\mathbf{p}_{w}^{old}$
Output: anomaly score $s_{\text{anomaly}}$; updated ScoreS/W $\mathbf{p}_{s}^{new},\mathbf{p}_{w}^{new}$
1: compute updates $\Delta\mathbf{A}_{s},\Delta\mathbf{A}_{w}$ and $\Delta\mathbf{b}_{w}$
2: compute $\mathbf{p}_{s}^{new}$ and $\mathbf{p}_{w}^{new}$ incrementally from $\mathbf{p}_{s}^{old}$ and $\mathbf{p}_{w}^{old}$ using $\Delta\mathbf{A}_{s},\Delta\mathbf{A}_{w}$ and $\Delta\mathbf{b}_{w}$
3: $s_{\text{anomaly}}=\textit{ComputeAnomalyScore}(\mathbf{p}_{s}^{new},\mathbf{p}_{w}^{new})$
4: return $s_{\text{anomaly}}$

Algorithm 2 ComputeAnomalyScore
Input: ScoreS and ScoreW vectors $\mathbf{p}_{s},\mathbf{p}_{w}$
Output: anomaly score $s_{\text{anomaly}}$
1: compute AnomRankS $\mathbf{a}_{s}=[\mathbf{p}_{s}'~\mathbf{p}_{s}'']$
2: compute AnomRankW $\mathbf{a}_{w}=[\mathbf{p}_{w}'~\mathbf{p}_{w}'']$
3: $s_{\text{anomaly}}=\lVert\mathbf{a}\rVert_{1}=\max(\lVert\mathbf{a}_{s}\rVert_{1},\lVert\mathbf{a}_{w}\rVert_{1})$
4: return $s_{\text{anomaly}}$

4.4. Algorithm

Algorithm 1 describes how we detect anomalies in a dynamic graph. We first calculate the updates $\Delta\mathbf{A}_{s}$ in the unweighted adjacency matrix, the updates $\Delta\mathbf{A}_{w}$ in the weighted adjacency matrix, and the updates $\Delta\mathbf{b}_{w}$ in the out-edge proportional starting vector (Line 1). These computations are proportional to the number of edge changes, taking a few milliseconds for small changes. AnomRank then updates the ScoreS and ScoreW vectors using the update rules in Lemmas 1 and 2 (Line 2), and calculates an anomaly score from ScoreS and ScoreW using Algorithm 2, which computes AnomRankS and AnomRankW and returns the maximum L1 length between them as the anomaly score.
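Algorithm 2 follows directly from the definitions: compute the discrete derivatives of each score vector and take the maximum L1 length. The sketch below assumes each history holds the last three score vectors, and takes the L1 length of $[\mathbf{p}'~\mathbf{p}'']$ as the maximum of the two column L1 norms, matching the score $\max(\lVert\mathbf{p}'\rVert_{1},\lVert\mathbf{p}''\rVert_{1})$ used earlier; names and the history layout are our own illustrative choices:

```python
import numpy as np

def compute_anomaly_score(ps_hist, pw_hist, dt=1.0):
    """Sketch of ComputeAnomalyScore: each *_hist is a list of the last
    three score vectors [p(t-dt), p(t), p(t+dt)] (assumed layout)."""
    def l1_of_derivatives(h):
        d1 = (h[2] - h[1]) / dt                        # first derivative
        d2 = ((h[2] - h[1]) - (h[1] - h[0])) / dt**2   # second derivative
        return max(np.abs(d1).sum(), np.abs(d2).sum())
    return max(l1_of_derivatives(ps_hist), l1_of_derivatives(pw_hist))

# Toy histories: ScoreS is flat, ScoreW shifts at the newest snapshot.
score = compute_anomaly_score(
    [np.zeros(3)] * 3,
    [np.array([0.5, 0.5, 0.0]),
     np.array([0.5, 0.5, 0.0]),
     np.array([0.4, 0.4, 0.2])])
```

Here the ScoreW shift dominates, so the returned anomaly score comes from the AnomRankW branch.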

Normalization: As shown in Theorems 1 and 2, the upper bounds of AnomRankS and AnomRankW depend on the number of out-neighbors $k$ and the out-edge weight $m_{u}$. This skews the anomalousness score distribution, since many real-world graphs have skewed degree distributions. Thus, we normalize each node's AnomRankS and AnomRankW scores by subtracting its mean and dividing by its standard deviation, both of which we maintain along the stream.
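The per-node running mean and standard deviation can be maintained online, e.g. with Welford's algorithm; the class below is our own sketch of such a normalizer, not the authors' implementation:

```python
class StreamNormalizer:
    """Maintain per-node running mean/variance (Welford's online
    algorithm) and z-normalize score vectors along the stream."""
    def __init__(self, num_nodes):
        self.count = 0
        self.mean = [0.0] * num_nodes
        self.m2 = [0.0] * num_nodes   # sum of squared deviations

    def update(self, scores):
        self.count += 1
        for i, x in enumerate(scores):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, scores):
        out = []
        for i, x in enumerate(scores):
            var = self.m2[i] / self.count if self.count > 1 else 0.0
            std = var ** 0.5
            out.append((x - self.mean[i]) / std if std > 0 else 0.0)
        return out

norm = StreamNormalizer(2)
norm.update([1.0, 2.0])
norm.update([3.0, 4.0])
z = norm.normalize([3.0, 4.0])
```

Population variance (dividing by the count) is used here; the exact variance convention is a design choice the paper does not specify.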

Explainability and Attribution: AnomRank explains the type of anomaly by comparing AnomRankS and AnomRankW: a higher AnomRankS score suggests that AnomalyS has happened, while a higher AnomRankW score suggests AnomalyW. High scores on both metrics indicate a large edge weight change that also alters the graph structure. Furthermore, we can localize the culprits of anomalies by ranking the AnomRank scores of each node in the score vectors computed in Lines 1 and 2 of Algorithm 2. We show this localization experimentally in Section 5.5.

$\Delta t$ selection: Our analysis and proofs hold for any value of $\Delta t$. The choice of $\Delta t$ is outside the scope of this paper and is probably best decided by a domain expert: a large $\Delta t$ is suitable for slow ('low temperature') attacks, while a small $\Delta t$ spots fast and abrupt attacks. In our experiments, we chose $\Delta t=1$ hour for a computer-network intrusion setting and $\Delta t=1$ day for a who-emails-whom network.

5. Experiments

In this section, we evaluate the performance of AnomRank compared to state-of-the-art anomaly detection methods on dynamic graphs. We aim to answer the following questions:

  • Q1. Practicality. How fast, accurate, and scalable is AnomRank compared to its competitors? (Section 5.2)

  • Q2. Effectiveness of two-pronged approach. How do our two metrics, AnomRankS and AnomRankW, complement each other in real-world and synthetic graphs? (Section 5.3)

  • Q3. Effectiveness of two-derivatives approach. How do the 1st and 2nd order derivatives of ScoreS and ScoreW complement each other? (Section 5.4)

  • Q4. Attribution. How accurately does AnomRank localize culprits of anomalous events? (Section 5.5)

5.1. Setup

We use SedanSpot (Eswaran and Faloutsos, 2018) and DenseAlert (Shin et al., 2017), state-of-the-art anomaly detection methods on dynamic graphs, as our baselines. We use two real-world dynamic graphs, DARPA and ENRON, and one synthetic dynamic graph, RTM, with two anomaly injection scenarios. Anomalies are verified by comparing to manual annotations or by correlating with real-world events. More details of the experimental settings and datasets are described in Supplements A.1 and A.2.

Figure 3. AnomRank is fastest and most accurate: (a) Precision@50 vs. time taken, (b) Precision@150 vs. time taken, (c) Precision@200 vs. time taken, (d) Precision vs. Recall. (a-c) AnomRank outperforms the baselines in terms of both accuracy and speed on the DARPA dataset. (d) AnomRank achieves better precision and recall on high ranks (top-$50,\dots,250$) than its competitors.

5.2. Practicality

We examine the practicality of AnomRank on the DARPA dataset, a public benchmark for Network Intrusion Detection Systems. In this network intrusion setting, our focus is on detecting high-volume (i.e. high edge-weight) intrusions such as denial of service (DOS) attacks, which are typically the focus of graph mining-based detection approaches. Hence, we use only the AnomRankW metric.

Precision and Recall: Using each method, we first compute anomaly scores for each of the 1139 graph snapshots, then select the top-k most anomalous snapshots ($k=50,100,\dots,800$). Then we compute precision and recall for each method's output. In Figure 3(d), AnomRank shows higher precision and recall than DenseAlert and SedanSpot on high ranks (top-$50,\dots,250$). Considering that anomaly detection in real-world settings generally focuses on the most anomalous instances, high accuracy on high ranks is more meaningful than high accuracy on low ranks. Moreover, since the number of ground truth anomalies is 288, precision and recall up to the top-250 better reflect practicality.
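The top-k evaluation protocol can be sketched in a few lines; the function name and toy data are illustrative assumptions:

```python
def precision_recall_at_k(scores, labels, k):
    """Precision/recall of the k snapshots with the highest anomaly
    scores, mirroring the top-k protocol described above (sketch)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    true_positives = sum(labels[i] for i in order[:k])
    return true_positives / k, true_positives / sum(labels)

# Toy stream: 4 snapshot scores with ground-truth attack labels.
p_at_2, r_at_2 = precision_recall_at_k([0.9, 0.1, 0.8, 0.2],
                                       [1, 0, 0, 1], k=2)
```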

Accuracy vs. Running Time: In Figures 1(a) and 3(a-c), AnomRank is the most accurate and fastest. Compared to SedanSpot, AnomRank achieves up to 18% higher precision on top-k ranks with 49.5× faster speed. Compared to DenseAlert, AnomRank achieves up to 35% higher precision with 4.8× faster speed. DenseAlert and SedanSpot require several subprocesses (hashing, random walking, reordering, sampling, etc.), resulting in large computation time.

Scalability: Figure 1(b) shows the scalability of AnomRank with the number of edges. We plot the wall-clock time needed to run on the (chronologically) first $2,2^{2},\dots,2^{22}$ edges of the DARPA dataset. This confirms the linear scalability of AnomRank with respect to the number of edges in the input dynamic graph. Note that AnomRank processes 1M edges within 1 second, allowing real-time anomaly detection.

Effectiveness: Figure 1(c) shows changes of AnomRank scores in the DARPA dataset, with a time window of $\Delta T=1$ hour. Consistent with the large upper bounds shown in Theorems 1 and 2, ground truth attacks (red crosses) have large AnomRank scores in Figure 1(c). Given the mean and standard deviation of the anomaly scores of all snapshots, setting an anomalousness threshold of $(\text{mean}+\frac{1}{2}\text{std})$, 77% of spikes above the threshold are true positives. This shows the effectiveness of AnomRank as a barometer of anomalousness.
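The threshold check described above amounts to flagging score spikes above mean plus half a standard deviation and measuring how many are true attacks. A minimal sketch with toy data (function name and values are ours):

```python
def spike_precision(scores, labels, num_std=0.5):
    """Fraction of snapshots above (mean + num_std * std) that are
    labeled attacks -- a sketch of the 'barometer' check above."""
    n = len(scores)
    mean = sum(scores) / n
    std = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5
    threshold = mean + num_std * std
    flagged = [labels[i] for i, s in enumerate(scores) if s > threshold]
    return sum(flagged) / len(flagged) if flagged else 0.0

# Toy example: one clear spike, and it is a labeled attack.
precision = spike_precision([1.0, 1.0, 1.0, 10.0], [0, 0, 0, 1])
```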

5.3. Effectiveness of Two-Pronged Approach

In this experiment, we show the effectiveness of our two-pronged approach using real-world and synthetic graphs.

5.3.1. Real-World Graph

We measure anomaly scores based on four metrics: AnomRankS, AnomRankW, SedanSpot, and DenseAlert, on the ENRON dataset. In Figure 4, AnomRankW and SedanSpot show similar trends, while AnomRankS detects different events as anomalies on the same dataset. DenseAlert shows trends similar to the sum of AnomRankS and AnomRankW, while missing several anomalous events; this is also reflected in the low accuracy of DenseAlert on the DARPA dataset in Figure 3. The anomalies detected by AnomRankS and AnomRankW coincide with major events in the ENRON timeline (http://www.agsm.edu.au/bobm/teaching/BE/Enron/timeline.html) as follows:

  1. June 12, 2000: Skilling makes a joke at a Las Vegas conference, comparing California to the Titanic.

  2. August 23, 2000: FERC orders an investigation into Timothy Belden's strategies designed to drive electricity prices up in California.

  3. Oct. 3, 2000: Enron attorney Richard Sanders travels to Portland to discuss Timothy Belden's strategies.

  4. Dec. 13, 2000: Enron announces that Jeffrey Skilling will take over as chief executive.

  5. Mar. 2001: Enron transfers large portions of EES business into wholesale to hide EES losses.

  6. July 13, 2001: Skilling announces his desire to resign to Lay. Lay asks Skilling to take the weekend and think it over.

  7. Oct. 17, 2001: The Wall Street Journal reveals the precarious nature of Enron's business.

  8. Nov. 19, 2001: Enron discloses it is trying to restructure a $690 million obligation.

  9. Jan. 23-25, 2002: Lay resigns as chairman and CEO of Enron. Cliff Baxter, former Enron vice chairman, commits suicide.

  10. Feb. 2, 2002: The Powers Report, a summary of an internal investigation into Enron's collapse, spreads out.

Figure 4. Two-pronged approach pays off: AnomRankW and SedanSpot show similar trends on the ENRON dataset, while AnomRankS detects different events as anomalies on the same dataset. DenseAlert shows trends similar to the sum of AnomRankS and AnomRankW while missing several anomalous events.

The high anomaly scores of AnomRankS coincide with timeline events in which Enron conspired to drive up the California electricity price (Events 1, 2, 3) and office politics played out inside Enron (Events 4, 5, 6). Meanwhile, AnomRankW shows high anomaly scores as the biggest scandals of the company continued to unfold (Events 7, 8, 9, 10). Note that AnomRankS and AnomRankW are designed to detect different properties of anomalies on dynamic graphs: AnomRankS is effective at detecting AnomalyS, such as unusual email communications for a conspiracy, while AnomRankW is effective at detecting AnomalyW, such as massive email communications about big scandals that swept the whole company. The non-overlapping trends of AnomRankS and AnomRankW show that the two metrics complement each other successfully on real-world data. We summarize two observations:

  • Observation 1. AnomRankS and AnomRankW spot different types of anomalous events.

  • Observation 2. DenseAlert and SedanSpot detect a subset of the anomalies detected by AnomRank.

5.3.2. Synthetic Graph

In our synthetic graph generated by RTM method, we inject two types of anomalies to examine the effectiveness of our two metrics. Details of the injections are as follows:

  • InjectionS: We choose 50 timestamps uniformly at random; at each chosen timestamp, we select 8 nodes uniformly at random and introduce all edges between these nodes in both directions.

  • InjectionW: We choose 50 timestamps uniformly at random; at each chosen timestamp, we select two nodes uniformly at random and add 70 edges from the first to the second.

A clique is an example of AnomalyS with an unusual structural pattern, while high edge weights are an example of AnomalyW. Hence, InjectionS and InjectionW correspond to AnomalyS and AnomalyW, respectively.
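The two injection scenarios can be sketched as follows; the function names and the (src, dst, weight) edge-list representation are illustrative assumptions, not the paper's code:

```python
import random

def inject_s(snapshot_edges, nodes, size=8):
    """InjectionS sketch: add a bidirectional clique on `size` nodes
    chosen uniformly at random."""
    chosen = random.sample(nodes, size)
    for u in chosen:
        for v in chosen:
            if u != v:
                snapshot_edges.append((u, v, 1))
    return chosen

def inject_w(snapshot_edges, nodes, weight=70):
    """InjectionW sketch: add `weight` parallel edges from one random
    node to another."""
    u, v = random.sample(nodes, 2)
    snapshot_edges.extend([(u, v, 1)] * weight)
    return u, v

random.seed(0)
edges_s, edges_w = [], []
clique_nodes = inject_s(edges_s, list(range(100)))   # 8*7 directed edges
src, dst = inject_w(edges_w, list(range(100)))       # 70 parallel edges
```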

Then we evaluate the precision of the top-50 highest anomaly scores output by the AnomRankS and AnomRankW metrics. We also evaluate each metric on the DARPA dataset based on its top-250 anomaly scores. In Table 3, AnomRankS shows higher precision on InjectionS than AnomRankW, while AnomRankW has higher precision on InjectionW and DARPA. In Section 4.2.3, we showed theoretically that AnomalyS induces larger changes in AnomRankS than in AnomRankW, explaining the higher precision of AnomRankS on InjectionS. We also showed that adding extra edge weight has no effect on AnomRankS, explaining why AnomRankS fails on InjectionW. For the DARPA dataset, AnomRankW shows higher accuracy than AnomRankS: DARPA contains 2.7M attacks, and 90% of them (2.4M attacks) are DOS attacks generated from only 2-3 source IP addresses toward 2-3 target IP addresses. These attacks are of AnomalyW type with high edge weights; thus AnomRankW shows higher precision on DARPA than AnomRankS.

Table 3. AnomRankS and AnomRankW complement each other: we measure the precision of the two metrics on the synthetic graph with two injection scenarios (InjectionS, InjectionW) and on the real-world graph DARPA, using the top-50 and top-250 ranks, respectively.

Metric \ Dataset    InjectionS  InjectionW  DARPA
AnomRankS only      .96         .00         .42
AnomRankW only      .82         .79         .69
Table 4. 1st and 2nd order derivatives complement each other: AnomRankW-1st and AnomRankW-2nd are the 1st and 2nd derivatives of ScoreW. Combining AnomRankW-1st and AnomRankW-2nd results in the highest precision.

Metric \ Dataset    InjectionS  InjectionW  DARPA
AnomRankW-1st       .06         .11         .65
AnomRankW-2nd       .80         .78         .61
AnomRankW           .82         .79         .69

5.4. Effectiveness of Two-Derivatives Approach

In this experiment, we show the effectiveness of the 1st and 2nd order derivatives of ScoreS and ScoreW in detecting anomalies in dynamic graphs. For brevity, we show the result on ScoreW; the result on ScoreS is similar. Recall that the AnomRankW score is defined as the L1 length of $\mathbf{a}_{w}=[\mathbf{p}_{w}'~\mathbf{p}_{w}'']$, where $\mathbf{p}_{w}'$ and $\mathbf{p}_{w}''$ are the 1st and 2nd order derivatives of ScoreW, respectively. We define two metrics, AnomRankW-1st and AnomRankW-2nd, which denote the L1 lengths of $\mathbf{p}_{w}'$ and $\mathbf{p}_{w}''$, respectively. By estimating precision using AnomRankW-1st and AnomRankW-2nd individually, we examine the effectiveness of each derivative using the same injection scenarios and evaluation approach as in Section 5.3.2.

In Table 4, AnomRankW-1st shows higher precision on the DARPA dataset, while AnomRankW-2nd has higher precision on the injection scenarios. AnomRankW-1st detects suspiciously large anomalies based on the L1 length of the 1st order derivative, while AnomRankW-2nd detects the abruptness of anomalies based on the L1 length of the 2nd order derivative. Note that combining the 1st and 2nd order derivatives leads to better precision, showing that the two derivatives complement each other.

5.5. Attribution

In this experiment, we show that AnomRank successfully localizes the culprits of anomalous events, as explained in the last paragraph of Section 4.4. In Figure 5, given a graph snapshot detected as an anomaly in the DARPA dataset, nodes (IP addresses) are sorted in order of their AnomRank scores. Outliers with significantly large scores correspond to IP addresses likely to be engaged in network intrusion attacks. At the 15th snapshot (T=15), when Back DOS attacks occur, the attacker IP (135.008.060.182) and victim IP (172.016.114.050) have the largest AnomRank scores. In the 133rd snapshot (T=133), where Nmap probing attacks take place, the victim IP (172.016.112.050) has the largest score.
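Localizing culprits reduces to sorting nodes by their per-node AnomRank scores; a minimal sketch with hypothetical IPs and scores:

```python
def top_culprits(ids, scores, k=2):
    """Rank nodes (e.g. IP addresses) by per-node AnomRank score to
    localize likely culprits in a flagged snapshot (sketch)."""
    ranked = sorted(zip(ids, scores), key=lambda pair: -pair[1])
    return [node for node, _ in ranked[:k]]

# Toy per-node scores for three IPs; the two outliers surface first.
culprits = top_culprits(["10.0.0.1", "10.0.0.2", "10.0.0.3"],
                        [0.1, 0.9, 0.5], k=2)
```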

6. Conclusion

In this paper, we proposed a two-pronged approach for detecting anomalous events in a dynamic graph.

Our main contributions are:

  • Online, Two-Pronged Approach: We introduced AnomRank, a novel and simple anomaly detection method for dynamic graphs.

  • Theoretical Guarantees: We presented theoretical analysis (Theorems 1 and 2) of the effectiveness of AnomRank.

  • Practicality: In Section 5, we showed that AnomRank outperforms state-of-the-art baselines, with up to 49.5× faster speed or 35% higher accuracy. AnomRank is fast, taking about 2 seconds on a graph with 4.5 million edges.

Our code and data are publicly available at https://github.com/minjiyoon/anomrank.

Figure 5. Attribution: AnomRank localizes the culprits of anomalous events in the DARPA dataset: in a Back DOS attack, the attacker and victim IPs have the top-2 largest scores; in an Nmap probing attack, the victim IP has the largest AnomRank score.

Acknowledgment

This material is based upon work supported by the National Science Foundation under Grants No. CNS-1314632 and IIS-1408924. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, or other funding parties. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on.

References

  • Aggarwal et al. (2011) Charu C Aggarwal, Yuchen Zhao, and Philip S Yu. 2011. Outlier detection in graph streams. In ICDE.
  • Akoglu et al. (2008) Leman Akoglu, Mary McGlohon, and Christos Faloutsos. 2008. RTM: Laws and a recursive generator for weighted time-evolving graphs. In ICDM.
  • Akoglu et al. (2010) Leman Akoglu, Mary McGlohon, and Christos Faloutsos. 2010. Oddball: Spotting anomalies in weighted graphs. In PAKDD.
  • Akoglu et al. (2015) Leman Akoglu, Hanghang Tong, and Danai Koutra. 2015. Graph based anomaly detection and description: a survey. DMKD 29, 3 (2015), 626–688.
  • Beutel et al. (2013) Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos Faloutsos. 2013. Copycatch: stopping group attacks by spotting lockstep behavior in social networks. In WWW.
  • Chakrabarti (2004) Deepayan Chakrabarti. 2004. Autopart: Parameter-free graph partitioning and outlier detection. In PKDD.
  • Chung (1994) F.R.K. Chung. 1994. Spectral Graph Theory. Number no. 92 in CBMS Regional Conference Series. Conference Board of the Mathematical Sciences.
  • Eswaran and Faloutsos (2018) Dhivya Eswaran and Christos Faloutsos. 2018. SEDANSPOT: Detecting Anomalies in Edge Streams. In ICDM.
  • Eswaran et al. (2018) Dhivya Eswaran, Christos Faloutsos, Sudipto Guha, and Nina Mishra. 2018. SpotLight: Detecting Anomalies in Streaming Graphs. In KDD.
  • Henderson et al. (2010) Keith Henderson, Tina Eliassi-Rad, Christos Faloutsos, Leman Akoglu, Lei Li, Koji Maruhashi, B Aditya Prakash, and Hanghang Tong. 2010. Metric forensics: a multi-level approach for mining volatile graphs. In KDD.
  • Hooi et al. (2017) Bryan Hooi, Kijung Shin, Hyun Ah Song, Alex Beutel, Neil Shah, and Christos Faloutsos. 2017. Graph-based fraud detection in the face of camouflage. TKDD 11, 4 (2017), 44.
  • Jiang et al. (2016) Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2016. Catching synchronized behaviors in large networks: A graph mining approach. TKDD 10, 4 (2016), 35.
  • Kleinberg (1999) Jon M Kleinberg. 1999. Authoritative sources in a hyperlinked environment. JACM 46, 5 (1999), 604–632.
  • Koutra et al. (2016) Danai Koutra, Neil Shah, Joshua T Vogelstein, Brian Gallagher, and Christos Faloutsos. 2016. DeltaCon: principled massive-graph similarity function with attribution. TKDD 10, 3 (2016), 28.
  • Lempel and Moran (2001) Ronny Lempel and Shlomo Moran. 2001. SALSA: the stochastic approach for link-structure analysis. ACM Trans. Inf. Syst. 19, 2 (2001), 131–160.
  • Lippmann et al. (1999) Richard Lippmann, Robert K Cunningham, David J Fried, Isaac Graf, Kris R Kendall, Seth E Webster, and Marc A Zissman. 1999. Results of the DARPA 1998 Offline Intrusion Detection Evaluation.. In Recent advances in intrusion detection, Vol. 99. 829–835.
  • Page et al. (1999) Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank citation ranking: Bringing order to the web. Technical Report. Stanford InfoLab.
  • Ranshous et al. (2016) Stephen Ranshous, Steve Harenberg, Kshitij Sharma, and Nagiza F Samatova. 2016. A scalable approach for outlier detection in edge streams using sketch-based approximations. In SDM.
  • Shetty and Adibi (2004) Jitesh Shetty and Jafar Adibi. 2004. The Enron email dataset database schema and brief statistical report. Information sciences institute technical report, University of Southern California 4, 1 (2004), 120–128.
  • Shin et al. (2018a) Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos. 2018a. Patterns and anomalies in k-cores of real-world graphs with applications. KAIS 54, 3 (2018), 677–710.
  • Shin et al. (2018b) Kijung Shin, Bryan Hooi, and Christos Faloutsos. 2018b. Fast, Accurate, and Flexible Algorithms for Dense Subtensor Mining. TKDD 12, 3 (2018), 28.
  • Shin et al. (2017) Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos. 2017. DenseAlert: Incremental dense-subtensor detection in tensor streams. In KDD.
  • Sun et al. (2006) Jimeng Sun, Dacheng Tao, and Christos Faloutsos. 2006. Beyond streams and graphs: dynamic tensor analysis. In KDD.
  • Tong and Lin (2011) Hanghang Tong and Ching-Yung Lin. 2011. Non-negative residual matrix factorization with application to graph anomaly detection. In SDM.
  • Wang et al. (2015) Teng Wang, Chunsheng Fang, Derek Lin, and S Felix Wu. 2015. Localizing temporal anomalies in large evolving graphs. In SDM.
  • Wasserman and Faust (1994) Stanley Wasserman and Katherine Faust. 1994. Social network analysis: Methods and applications. Cambridge university press.
  • Xu et al. (2007) Xiaowei Xu, Nurcan Yuruk, Zhidan Feng, and Thomas AJ Schweiger. 2007. Scan: a structural clustering algorithm for networks. In KDD.
  • Yoon et al. (2018a) Minji Yoon, Woojeong Jin, and U Kang. 2018a. Fast and Accurate Random Walk with Restart on Dynamic Graphs with Guarantees. In WWW.
  • Yoon et al. (2018b) Minji Yoon, Jinhong Jung, and U Kang. 2018b. TPA: Fast, Scalable, and Accurate method for Approximate Random Walk with Restart on Billion Scale Graphs. In ICDE.

Appendix A Supplement

A.1. Experimental Setting

All experiments are carried out on a 3 GHz Intel Core i5 iMac with 16 GB RAM, running OS X 10.13.6. We implemented AnomRank and SedanSpot in C++, and we used the open-source implementation of DenseAlert (https://github.com/kijungs/densealert) provided by the authors of (Shin et al., 2017). To show the best trade-off between speed and accuracy, we set the sample size to 50 for SedanSpot and follow the other parameter settings suggested in the original paper (Eswaran and Faloutsos, 2018). For AnomRank, we set the damping factor $c$ to 0.5, and stop the iterations for computing node score vectors when the $L1$ change of the node score vector is less than $10^{-3}$.
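The iteration scheme described above can be sketched as a standard power iteration with damping factor c = 0.5 and an L1 stopping tolerance of 10⁻³. We assume the common PageRank-style recurrence p = c·Ãᵀp + (1−c)·b, which matches the formulas used in the proofs; the toy matrix and starting vector are illustrative:

```python
import numpy as np

def node_score(A_norm, b, c=0.5, tol=1e-3):
    """Power iteration for p = c * A_norm @ p + (1 - c) * b, stopped
    when the L1 change of p drops below tol (sketch of the setting
    above; A_norm must be column-stochastic)."""
    p = b.copy()
    while True:
        p_next = c * (A_norm @ p) + (1 - c) * b
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next

A = np.array([[0., 1.],
              [1., 0.]])   # column-normalized 2-node cycle (toy)
b = np.array([0.7, 0.3])   # starting vector (toy)
p = node_score(A, b)
```

Since A is column-stochastic, each iteration preserves the L1 mass of p, and the node favored by the starting vector keeps the larger score.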

A.2. Dataset

DARPA (Lippmann et al., 1999) has 4.5M IP-IP communications between 9.4K source IPs and 2.3K destination IPs over 87.7K minutes. Each communication is a directed edge (srcIP, dstIP, timestamp, attack), where the attack label indicates whether the communication is an attack. We aggregate the edges occurring in each hour, resulting in a stream of 1463 graphs. We annotate a graph snapshot as anomalous if it contains at least 50 attack edges; this yields 288 ground truth anomalies (23.8% of the total). We use the first 256 graphs for initializing the means and variances needed during normalization (as described in Section 4.4).

ENRON (Shetty and Adibi, 2004) contains 50K emails among 151 employees over 3 years at the Enron Corporation. Each email is a directed edge (sender, receiver, timestamp). We aggregate the edges occurring in each day, resulting in a stream of 1139 graphs. We use the first 256 graphs for initializing means and variances.

The RTM method (Akoglu et al., 2008) generates time-evolving graphs with repeated Kronecker products. We use the publicly available code (www.alexbeutel.com/l/www2013). The generated graph is a directed graph with 1K nodes and 8.1K edges over 2.7K timestamps. We use the first 300 timestamps for initializing means and variances.

A.3. Proofs

We prove upper bounds on the 1st and 2nd derivatives of ScoreS and ScoreW, showing their effectiveness in detecting AnomalyS and AnomalyW.

Proof of Lemma 5 (Upper bound of 𝐩s1\lVert\mathbf{p}_{s}^{\prime}\rVert_{1}).

For brevity, 𝐩sn𝐩s(t+Δt),𝐩so𝐩s(t)\mathbf{p}_{s}^{n}\leftarrow\mathbf{p}_{s}(t+\Delta t),\mathbf{p}_{s}^{o}\leftarrow\mathbf{p}_{s}(t). By Lemma 1, 𝐩sn𝐩so1\lVert\mathbf{p}_{s}^{n}-\mathbf{p}_{s}^{o}\rVert_{1} is presented as follows:

𝐩sn𝐩so1\displaystyle\lVert\mathbf{p}_{s}^{n}-\mathbf{p}_{s}^{o}\rVert_{1} =k=0ck(𝐀~s+Δ𝐀s)kc(Δ𝐀s𝐩so)1\displaystyle=\lVert\sum_{k=0}^{\infty}c^{k}(\mathbf{\tilde{A}}_{s}^{\top}+\Delta\mathbf{A}_{s})^{k}c(\Delta\mathbf{A}_{s}\mathbf{p}_{s}^{o})\rVert_{1}
ck=0ck(𝐀~s+Δ𝐀s)k1Δ𝐀s𝐩so1\displaystyle\leq c\sum_{k=0}^{\infty}\lVert c^{k}(\mathbf{\tilde{A}}_{s}^{\top}+\Delta\mathbf{A}_{s})^{k}\rVert_{1}\lVert\Delta\mathbf{A}_{s}\mathbf{p}_{s}^{o}\rVert_{1}
c1cΔ𝐀s𝐩so1c1cΔ𝐀s1\displaystyle\leq\frac{c}{1-c}\lVert\Delta\mathbf{A}_{s}\mathbf{p}_{s}^{o}\rVert_{1}\leq\frac{c}{1-c}\lVert\Delta\mathbf{A}_{s}\rVert_{1}
𝐩s1\displaystyle\lVert\mathbf{p}_{s}^{\prime}\rVert_{1} =𝐩sn𝐩soΔt1c1cΔ𝐀sΔt1\displaystyle=\lVert\frac{\mathbf{p}_{s}^{n}-\mathbf{p}_{s}^{o}}{\Delta t}\rVert_{1}\leq\frac{c}{1-c}\lVert\frac{\Delta\mathbf{A}_{s}}{\Delta t}\rVert_{1}

Note that $\lVert(\mathbf{\tilde{A}}_{s}^{\top}+\Delta\mathbf{A}_{s})^{k}\rVert_{1}=\lVert\mathbf{p}_{s}^{o}\rVert_{1}=1$, since $(\mathbf{\tilde{A}}_{s}^{\top}+\Delta\mathbf{A}_{s})$ is a column-normalized stochastic matrix and $\mathbf{p}_{s}^{o}$ is a PageRank vector. \hfill\blacksquare
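As a sanity check, this bound can be verified numerically on small random graphs. The sketch below is not part of the paper; the matrix size and damping factor $c$ are arbitrary choices. It computes PageRank by a direct linear solve, $\mathbf{p}=(1-c)(\mathbf{I}-c\mathbf{A})^{-1}\mathbf{b}$, and checks $\lVert\mathbf{p}_{s}^{n}-\mathbf{p}_{s}^{o}\rVert_{1}\leq\frac{c}{1-c}\lVert\Delta\mathbf{A}_{s}\rVert_{1}$, where the matrix norm is the induced 1-norm (maximum absolute column sum):

```python
import numpy as np

rng = np.random.default_rng(0)
n, c = 50, 0.5

def column_stochastic(M):
    # Normalize each column to sum to 1.
    return M / M.sum(axis=0, keepdims=True)

def pagerank(A, c, b):
    # Solve p = c A p + (1 - c) b, i.e. p = (1 - c)(I - cA)^{-1} b.
    return (1 - c) * np.linalg.solve(np.eye(len(b)) - c * A, b)

A_old = column_stochastic(rng.random((n, n)))
A_new = column_stochastic(rng.random((n, n)))
dA = A_new - A_old
b = np.ones(n) / n

p_old = pagerank(A_old, c, b)
p_new = pagerank(A_new, c, b)

lhs = np.abs(p_new - p_old).sum()                  # ||p^n - p^o||_1
rhs = c / (1 - c) * np.abs(dA).sum(axis=0).max()   # (c/(1-c)) ||dA||_1 (induced 1-norm)
assert lhs <= rhs + 1e-12
```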

Proof of Lemma 6 (Upper bound of $\lVert\mathbf{p}_{s}^{\prime\prime}\rVert_{1}$).

For brevity, let $\mathbf{p}_{0}\leftarrow\mathbf{p}_{s}(t-\Delta t)$, $\mathbf{p}_{1}\leftarrow\mathbf{p}_{s}(t)$, $\mathbf{p}_{2}\leftarrow\mathbf{p}_{s}(t+\Delta t)$, $\Delta\mathbf{p}^{o}\leftarrow\mathbf{p}_{1}-\mathbf{p}_{0}$, $\Delta\mathbf{p}^{n}\leftarrow\mathbf{p}_{2}-\mathbf{p}_{1}$, $\mathbf{A}\leftarrow\mathbf{\tilde{A}}_{s}^{\top}$, $\Delta\mathbf{A}_{1}\leftarrow\Delta\mathbf{A}_{s_{o}}$, and $\Delta\mathbf{A}_{2}\leftarrow\Delta\mathbf{A}_{s_{n}}$. In addition, we omit the factor $c$ by substituting $\mathbf{A}\leftarrow c\mathbf{A}$ and $\Delta\mathbf{A}\leftarrow c\Delta\mathbf{A}$ during this proof. By Lemma 1, $\Delta\mathbf{p}^{n}$ is:

\[
\Delta\mathbf{p}^{n}=\sum_{k=0}^{\infty}(\mathbf{A}+\Delta\mathbf{A}_{1}+\Delta\mathbf{A}_{2})^{k}(\Delta\mathbf{A}_{2}\mathbf{p}_{1})
\]

$\Delta\mathbf{p}^{n}$ can be viewed as an updated ScoreS, obtained by applying the update $\Delta\mathbf{A}_{2}$ to the original adjacency matrix $Y_{1}=(\mathbf{A}+\Delta\mathbf{A}_{1})$ with starting vector $(\Delta\mathbf{A}_{2}\mathbf{p}_{1})$, whose original vector is $\mathbf{p}_{temp}=\sum_{k=0}^{\infty}Y_{1}^{k}(\Delta\mathbf{A}_{2}\mathbf{p}_{1})$. Then, by Lemma 1, $\Delta\mathbf{p}^{n}$ can be written as follows:

\begin{align*}
\Delta\mathbf{p}^{n} &= \mathbf{p}_{temp}+\sum_{k=0}^{\infty}(Y_{1}+\Delta\mathbf{A}_{2})^{k}\Delta\mathbf{A}_{2}\mathbf{p}_{temp}\\
&= \sum_{k=0}^{\infty}Y_{1}^{k}(\Delta\mathbf{A}_{2}\mathbf{p}_{1})+\sum_{k=0}^{\infty}(Y_{1}+\Delta\mathbf{A}_{2})^{k}\Delta\mathbf{A}_{2}\sum_{i=0}^{\infty}Y_{1}^{i}(\Delta\mathbf{A}_{2}\mathbf{p}_{1})
\end{align*}

Writing $\Delta\mathbf{p}^{o}$ in the same form, $\Delta\mathbf{p}^{n}-\Delta\mathbf{p}^{o}$ becomes:

\begin{align*}
\Delta\mathbf{p}^{o} &= \sum_{k=0}^{\infty}(\mathbf{A}+\Delta\mathbf{A}_{1})^{k}(\Delta\mathbf{A}_{1}\mathbf{p}_{0})=\sum_{k=0}^{\infty}Y_{1}^{k}(\Delta\mathbf{A}_{1}\mathbf{p}_{0})\\
\Delta\mathbf{p}^{n}-\Delta\mathbf{p}^{o} &= \sum_{k=0}^{\infty}Y_{1}^{k}(\Delta\mathbf{A}_{2}-\Delta\mathbf{A}_{1})\mathbf{p}_{1}+\sum_{k=0}^{\infty}Y_{1}^{k}\Delta\mathbf{A}_{1}(\mathbf{p}_{1}-\mathbf{p}_{0})\\
&\quad+\sum_{k=0}^{\infty}(Y_{1}+\Delta\mathbf{A}_{2})^{k}\Delta\mathbf{A}_{2}\sum_{i=0}^{\infty}Y_{1}^{i}(\Delta\mathbf{A}_{2}\mathbf{p}_{1})
\end{align*}

Since $\mathbf{p}_{1}-\mathbf{p}_{0}=\Delta\mathbf{p}^{o}=\sum_{k=0}^{\infty}Y_{1}^{k}(\Delta\mathbf{A}_{1}\mathbf{p}_{0})$, the second term $\sum_{k=0}^{\infty}Y_{1}^{k}\Delta\mathbf{A}_{1}(\mathbf{p}_{1}-\mathbf{p}_{0})$ in the equation above is:

\[
\sum_{k=0}^{\infty}Y_{1}^{k}\Delta\mathbf{A}_{1}(\mathbf{p}_{1}-\mathbf{p}_{0})=\sum_{k=0}^{\infty}Y_{1}^{k}\Delta\mathbf{A}_{1}\sum_{i=0}^{\infty}Y_{1}^{i}(\Delta\mathbf{A}_{1}\mathbf{p}_{0})
\]

Note that $\lVert\mathbf{p}_{0}\rVert_{1}=\lVert\mathbf{p}_{1}\rVert_{1}=1$ since $\mathbf{p}_{0}$ and $\mathbf{p}_{1}$ are PageRank vectors. Recovering $c$ in $\mathbf{A}$ and $\Delta\mathbf{A}$, $\lVert\sum_{k=0}^{\infty}Y_{1}^{k}\rVert_{1}$ and $\lVert\sum_{k=0}^{\infty}(Y_{1}+\Delta\mathbf{A}_{2})^{k}\rVert_{1}$ become as follows:

\begin{align*}
\Big\lVert\sum_{k=0}^{\infty}Y_{1}^{k}\Big\rVert_{1} &= \Big\lVert\sum_{k=0}^{\infty}c^{k}(\mathbf{A}+\Delta\mathbf{A}_{1})^{k}\Big\rVert_{1}=\frac{1}{1-c}\\
\Big\lVert\sum_{k=0}^{\infty}(Y_{1}+\Delta\mathbf{A}_{2})^{k}\Big\rVert_{1} &= \Big\lVert\sum_{k=0}^{\infty}c^{k}(\mathbf{A}+\Delta\mathbf{A}_{1}+\Delta\mathbf{A}_{2})^{k}\Big\rVert_{1}=\frac{1}{1-c}
\end{align*}

Note that $\mathbf{A}+\Delta\mathbf{A}_{1}$ and $\mathbf{A}+\Delta\mathbf{A}_{1}+\Delta\mathbf{A}_{2}$ are column-normalized stochastic matrices. Then $\lVert\Delta\mathbf{p}^{n}-\Delta\mathbf{p}^{o}\rVert_{1}$ is bounded as follows:

\[
\lVert\Delta\mathbf{p}^{n}-\Delta\mathbf{p}^{o}\rVert_{1}\leq\frac{c}{1-c}\lVert\Delta\mathbf{A}_{2}-\Delta\mathbf{A}_{1}\rVert_{1}+\Big(\frac{c}{1-c}\Big)^{2}\big(\lVert\Delta\mathbf{A}_{1}\rVert_{1}^{2}+\lVert\Delta\mathbf{A}_{2}\rVert_{1}^{2}\big)
\]

Then, recovering $c$ in all terms, $\lVert\mathbf{p}_{s}^{\prime\prime}\rVert_{1}$ is bounded as follows:

\begin{align*}
\lVert\mathbf{p}_{s}^{\prime\prime}\rVert_{1} &= \frac{\lVert\Delta\mathbf{p}^{n}-\Delta\mathbf{p}^{o}\rVert_{1}}{\Delta t^{2}}\\
&\leq \frac{\frac{c}{1-c}\lVert\Delta\mathbf{A}_{2}-\Delta\mathbf{A}_{1}\rVert_{1}+(\frac{c}{1-c})^{2}(\lVert\Delta\mathbf{A}_{1}\rVert_{1}^{2}+\lVert\Delta\mathbf{A}_{2}\rVert_{1}^{2})}{\Delta t^{2}}
\end{align*}

\hfill\blacksquare
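The second-derivative bound above can also be checked numerically, analogously to Lemma 5. The sketch below is an illustration under assumed parameters (matrix size, damping factor $c$), not part of the paper: it generates three successive column-stochastic matrices, computes the discrete second difference of the PageRank vectors, and checks it against the bound with $\Delta t=1$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, c = 40, 0.5

def col_stochastic(M):
    # Normalize each column to sum to 1.
    return M / M.sum(axis=0, keepdims=True)

def pagerank(A, c, b):
    # Solve p = c A p + (1 - c) b directly.
    return (1 - c) * np.linalg.solve(np.eye(len(b)) - c * A, b)

def op_norm_1(M):
    # Induced 1-norm: maximum absolute column sum.
    return np.abs(M).sum(axis=0).max()

b = np.ones(n) / n
A0 = col_stochastic(rng.random((n, n)))
A1 = col_stochastic(rng.random((n, n)))
A2 = col_stochastic(rng.random((n, n)))
dA1, dA2 = A1 - A0, A2 - A1

p0, p1, p2 = (pagerank(A, c, b) for A in (A0, A1, A2))
lhs = np.abs((p2 - p1) - (p1 - p0)).sum()  # ||dp^n - dp^o||_1
r = c / (1 - c)
rhs = r * op_norm_1(dA2 - dA1) + r**2 * (op_norm_1(dA1)**2 + op_norm_1(dA2)**2)
assert lhs <= rhs + 1e-12
```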

Proof of Theorem 1.

(Upper bounds of $\lVert\mathbf{p}_{s}^{\prime}\rVert_{1}$ and $\lVert\mathbf{p}_{s}^{\prime\prime}\rVert_{1}$ with AnomalyS) Use Lemmas 3 and 5. \hfill\blacksquare

Proof of Lemma 7 (Upper bound of $\lVert\mathbf{p}_{w}^{\prime}\rVert_{1}$).

For brevity, denote $\mathbf{p}_{w}^{o}\leftarrow\mathbf{p}_{w}(t)$ and $\mathbf{p}_{w}^{n}\leftarrow\mathbf{p}_{w}(t+\Delta t)$. By Lemma 2, $\lVert\mathbf{p}_{w}^{n}-\mathbf{p}_{w}^{o}\rVert_{1}$ can be written as follows:

\begin{align*}
\mathbf{p}_{w}^{n}-\mathbf{p}_{w}^{o} &= \sum_{k=0}^{\infty}c^{k}(\mathbf{\tilde{A}}_{w}^{\top}+\Delta\mathbf{A}_{w})^{k}c\Delta\mathbf{A}_{w}\mathbf{p}_{w}^{o}\\
&\quad+(1-c)\sum_{k=0}^{\infty}c^{k}(\mathbf{\tilde{A}}_{w}^{\top}+\Delta\mathbf{A}_{w})^{k}\Delta\mathbf{b}_{w}\\
\lVert\mathbf{p}_{w}^{n}-\mathbf{p}_{w}^{o}\rVert_{1} &\leq \frac{c}{1-c}\lVert\Delta\mathbf{A}_{w}\rVert_{1}+\lVert\Delta\mathbf{b}_{w}\rVert_{1}\\
\lVert\mathbf{p}_{w}^{\prime}\rVert_{1} &= \frac{\lVert\mathbf{p}_{w}^{n}-\mathbf{p}_{w}^{o}\rVert_{1}}{\Delta t}\leq\frac{1}{\Delta t}\Big(\frac{c}{1-c}\lVert\Delta\mathbf{A}_{w}\rVert_{1}+\lVert\Delta\mathbf{b}_{w}\rVert_{1}\Big)
\end{align*}

Note that $\lVert(\mathbf{\tilde{A}}_{w}^{\top}+\Delta\mathbf{A}_{w})^{k}\rVert_{1}=\lVert\mathbf{p}_{w}^{o}\rVert_{1}=1$, since $\mathbf{\tilde{A}}_{w}^{\top}+\Delta\mathbf{A}_{w}$ is a column-normalized stochastic matrix and $\mathbf{p}_{w}^{o}$ is a PageRank vector. \hfill\blacksquare
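Lemma 7 differs from Lemma 5 in that the restart vector changes as well. A numerical sanity check of the bound, again an illustration under assumed parameters rather than the paper's code, perturbs both the column-stochastic matrix and the restart vector and verifies $\lVert\mathbf{p}_{w}^{n}-\mathbf{p}_{w}^{o}\rVert_{1}\leq\frac{c}{1-c}\lVert\Delta\mathbf{A}_{w}\rVert_{1}+\lVert\Delta\mathbf{b}_{w}\rVert_{1}$ with $\Delta t=1$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, c = 40, 0.5

def col_stochastic(M):
    # Normalize each column to sum to 1.
    return M / M.sum(axis=0, keepdims=True)

def pagerank(A, c, b):
    # Solve p = c A p + (1 - c) b directly.
    return (1 - c) * np.linalg.solve(np.eye(len(b)) - c * A, b)

A_old = col_stochastic(rng.random((n, n)))
A_new = col_stochastic(rng.random((n, n)))
b_old = rng.random(n); b_old /= b_old.sum()   # old restart distribution
b_new = rng.random(n); b_new /= b_new.sum()   # new restart distribution

p_old = pagerank(A_old, c, b_old)
p_new = pagerank(A_new, c, b_new)

lhs = np.abs(p_new - p_old).sum()             # ||p_w^n - p_w^o||_1
rhs = (c / (1 - c) * np.abs(A_new - A_old).sum(axis=0).max()
       + np.abs(b_new - b_old).sum())
assert lhs <= rhs + 1e-12
```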

Proof of Lemma 8 (Upper bound of $\lVert\mathbf{p}_{w}^{\prime\prime}\rVert_{1}$).

For brevity, denote $\mathbf{p}_{0}\leftarrow\mathbf{p}_{w}(t-\Delta t)$, $\mathbf{p}_{1}\leftarrow\mathbf{p}_{w}(t)$, $\mathbf{p}_{2}\leftarrow\mathbf{p}_{w}(t+\Delta t)$, $\Delta\mathbf{p}^{o}\leftarrow\mathbf{p}_{1}-\mathbf{p}_{0}$, $\Delta\mathbf{p}^{n}\leftarrow\mathbf{p}_{2}-\mathbf{p}_{1}$, $\mathbf{A}\leftarrow\mathbf{\tilde{A}}_{w}^{\top}$, $\Delta\mathbf{A}_{1}\leftarrow\Delta\mathbf{A}_{w_{o}}$, $\Delta\mathbf{A}_{2}\leftarrow\Delta\mathbf{A}_{w_{n}}$, $\Delta\mathbf{b}_{1}\leftarrow\Delta\mathbf{b}_{w_{o}}$, and $\Delta\mathbf{b}_{2}\leftarrow\Delta\mathbf{b}_{w_{n}}$. In addition, we omit the factor $c$ by substituting $\mathbf{A}\leftarrow c\mathbf{A}$, $\Delta\mathbf{A}\leftarrow c\Delta\mathbf{A}$, and $\Delta\mathbf{b}\leftarrow(1-c)\Delta\mathbf{b}$ during this proof. By Lemma 2, $\Delta\mathbf{p}^{o}$ and $\Delta\mathbf{p}^{n}$ are written as follows:

\begin{align*}
\Delta\mathbf{p}^{o} &= \sum_{k=0}^{\infty}(\mathbf{A}+\Delta\mathbf{A}_{1})^{k}\Delta\mathbf{A}_{1}\mathbf{p}_{0}+\sum_{k=0}^{\infty}(\mathbf{A}+\Delta\mathbf{A}_{1})^{k}\Delta\mathbf{b}_{1}\\
\Delta\mathbf{p}^{n} &= \sum_{k=0}^{\infty}(\mathbf{A}+\Delta\mathbf{A}_{1}+\Delta\mathbf{A}_{2})^{k}\Delta\mathbf{A}_{2}\mathbf{p}_{1}+\sum_{k=0}^{\infty}(\mathbf{A}+\Delta\mathbf{A}_{1}+\Delta\mathbf{A}_{2})^{k}\Delta\mathbf{b}_{2}
\end{align*}

Subtracting the first term of $\Delta\mathbf{p}^{o}$ from the first term of $\Delta\mathbf{p}^{n}$ gives $\mathbf{p}_{s}^{\prime\prime}$, as shown in Lemma 6. Then $\Delta\mathbf{p}^{n}-\Delta\mathbf{p}^{o}$ is:

\[
\Delta\mathbf{p}^{n}-\Delta\mathbf{p}^{o}=\mathbf{p}_{s}^{\prime\prime}+\sum_{k=0}^{\infty}(\mathbf{A}+\Delta\mathbf{A}_{1}+\Delta\mathbf{A}_{2})^{k}\Delta\mathbf{b}_{2}-\sum_{k=0}^{\infty}(\mathbf{A}+\Delta\mathbf{A}_{1})^{k}\Delta\mathbf{b}_{1}
\]

Substituting $Y_{2}$ for $\mathbf{A}+\Delta\mathbf{A}_{1}$, the last two terms in the equation above can be written as follows:

\begin{align*}
&\sum_{k=0}^{\infty}(Y_{2}+\Delta\mathbf{A}_{2})^{k}\Delta\mathbf{b}_{2}-\sum_{k=0}^{\infty}Y_{2}^{k}\Delta\mathbf{b}_{1}\\
&=\sum_{k=0}^{\infty}Y_{2}^{k}\Delta\mathbf{b}_{2}+\sum_{k=0}^{\infty}(Y_{2}+\Delta\mathbf{A}_{2})^{k}\Delta\mathbf{A}_{2}\sum_{i=0}^{\infty}Y_{2}^{i}\Delta\mathbf{b}_{2}-\sum_{k=0}^{\infty}Y_{2}^{k}\Delta\mathbf{b}_{1}\\
&=\sum_{k=0}^{\infty}Y_{2}^{k}(\Delta\mathbf{b}_{2}-\Delta\mathbf{b}_{1})+\sum_{k=0}^{\infty}(Y_{2}+\Delta\mathbf{A}_{2})^{k}\Delta\mathbf{A}_{2}\sum_{i=0}^{\infty}Y_{2}^{i}\Delta\mathbf{b}_{2}
\end{align*}

In the first equality, we treat $\sum_{k=0}^{\infty}(Y_{2}+\Delta\mathbf{A}_{2})^{k}\Delta\mathbf{b}_{2}$ as an updated PageRank vector, with the update $\Delta\mathbf{A}_{2}$ applied to an original PageRank vector $\sum_{k=0}^{\infty}Y_{2}^{k}\Delta\mathbf{b}_{2}$, and then apply Lemma 1. Note that both $\lVert\sum_{k=0}^{\infty}Y_{2}^{k}\rVert_{1}$ and $\lVert\sum_{k=0}^{\infty}(Y_{2}+\Delta\mathbf{A}_{2})^{k}\rVert_{1}$ have value $\frac{1}{1-c}$, since the original expressions with the $c$ factors are as follows:

\begin{align*}
\sum_{k=0}^{\infty}Y_{2}^{k} &= \sum_{k=0}^{\infty}c^{k}(\mathbf{\tilde{A}}_{w}^{\top}+\Delta\mathbf{A}_{w_{o}})^{k}\\
\sum_{k=0}^{\infty}(Y_{2}+\Delta\mathbf{A}_{2})^{k} &= \sum_{k=0}^{\infty}c^{k}(\mathbf{\tilde{A}}_{w}^{\top}+\Delta\mathbf{A}_{w_{o}}+\Delta\mathbf{A}_{w_{n}})^{k}
\end{align*}

$(\mathbf{\tilde{A}}_{w}^{\top}+\Delta\mathbf{A}_{w_{o}})$ and $(\mathbf{\tilde{A}}_{w}^{\top}+\Delta\mathbf{A}_{w_{o}}+\Delta\mathbf{A}_{w_{n}})$ are column-normalized stochastic matrices. Then, recovering $c$ in all terms, $\lVert\mathbf{p}_{w}^{\prime\prime}\rVert_{1}$ is bounded as follows:

\begin{align*}
\frac{\lVert\Delta\mathbf{p}^{n}-\Delta\mathbf{p}^{o}\rVert_{1}}{\Delta t^{2}} \leq{} & \lVert\mathbf{p}_{s}^{\prime\prime}\rVert_{max}\\
&+\frac{1}{\Delta t^{2}}\Big(\lVert\Delta\mathbf{b}_{2}-\Delta\mathbf{b}_{1}\rVert_{1}+\frac{c}{1-c}\lVert\Delta\mathbf{A}_{2}\rVert_{1}\lVert\Delta\mathbf{b}_{2}\rVert_{1}\Big)
\end{align*}

\hfill\blacksquare

Proof of Theorem 2.

(Upper bounds of $\lVert\mathbf{p}_{w}^{\prime}\rVert_{1}$ and $\lVert\mathbf{p}_{w}^{\prime\prime}\rVert_{1}$ with AnomalyW) Use Lemmas 4, 6, and 8. \hfill\blacksquare