Fast and Accurate Anomaly Detection in Dynamic Graphs
with a Two-Pronged Approach
Abstract.
Given a dynamic graph stream, how can we detect the sudden appearance of anomalous patterns, such as link spam, follower boosting, or denial of service attacks? Additionally, can we categorize the types of anomalies that occur in practice, and theoretically analyze the anomalous signs arising from each type?
In this work, we propose AnomRank, an online algorithm for anomaly detection in dynamic graphs. AnomRank uses a two-pronged approach defining two novel metrics for anomalousness. Each metric tracks the derivatives of its own version of a ‘node score’ (or node importance) function. This allows us to detect sudden changes in the importance of any node. We show theoretically and experimentally that the two-pronged approach successfully detects two common types of anomalies: sudden weight changes along an edge, and sudden structural changes to the graph. AnomRank is (a) Fast and Accurate: up to 49.5× faster or 35% more accurate than state-of-the-art methods, (b) Scalable: linear in the number of edges in the input graph, processing millions of edges within 2 seconds on a stock laptop/desktop, and (c) Theoretically Sound: providing theoretical guarantees of the two-pronged approach.
1. Introduction
Network-based systems, including computer networks and social network services, have been a focus of various attacks. In computer networks, distributed denial of service (DDoS) attacks use a number of machines to make connections to a target machine to block its availability. In social networks, users pay spammers to "Like" or "Follow" their page to manipulate their public trust. By abstracting those networks to a graph, we can detect those attacks by finding suddenly emerging anomalous signs in the graph.
Various approaches have been proposed to detect anomalies in graphs, the majority of which focus on static graphs (Akoglu et al., 2010; Chakrabarti, 2004; Hooi et al., 2017; Jiang et al., 2016; Kleinberg, 1999; Shin et al., 2018a; Tong and Lin, 2011). However, many real-world graphs are dynamic, with timestamps indicating the time when each edge was inserted/deleted. Static anomaly detection methods, which are solely based on static connections, miss useful temporal signals of anomalies.
Several approaches (Eswaran et al., 2018; Shin et al., 2017; Eswaran and Faloutsos, 2018) have been proposed to detect anomalies in dynamic graphs (we review these in greater detail in Section 2 and Table 1). However, they are not satisfactory in terms of accuracy and speed. Accurately detecting anomalies in near real-time is important in order to reduce the impact of malicious activities and start recovery processes in a timely manner.



In this paper, we propose AnomRank, a fast and accurate online algorithm for detecting anomalies in dynamic graphs with a two-pronged approach. We classify anomalies in dynamic graphs into two types: AnomalyS and AnomalyW. AnomalyS denotes suspicious changes to the structure of the graph, such as through the addition of edges between previously unrelated nodes in spam attacks. AnomalyW indicates anomalous changes in the weight (i.e. number of edges) between connected nodes, such as suspiciously frequent connections in port scan attacks.
Various node score functions have been proposed to map each node of a graph to an importance score: PageRank (Page et al., 1999), HITS and its derivatives (SALSA) (Kleinberg, 1999; Lempel and Moran, 2001), Fiedler vector (Chung, 1994), etc. Our intuition is that anomalies induce sudden changes in node scores. Based on this intuition, AnomRank focuses on the 1st and 2nd order derivatives of node scores to detect anomalies with large changes in node scores within a short time. To detect AnomalyS and AnomalyW effectively, we design two versions of node score functions based on characteristics of these two types of anomalies. Then, we define two novel metrics, AnomRankS and AnomRankW, which measure the 1st and 2nd order derivatives of our two versions of node score functions, as a barometer of anomalousness. We theoretically analyze the effectiveness of each metric on the corresponding anomaly type, and provide rigorous guarantees on the accuracy of AnomRank. Through extensive experiments with real-world and synthetic graphs, we demonstrate the superior performance of AnomRank over existing methods. Our main contributions are:
• Online, two-pronged approach: We introduce AnomRank, an online detection method, for two types of anomalies (AnomalyS, AnomalyW) in dynamic graphs.
• Reproducibility: Our code and data are publicly available at https://github.com/minjiyoon/anomrank.

The paper is organized in the usual way (related work, preliminaries, proposed method, experiments, and conclusions).
2. Related Work
We discuss previous work on detecting anomalous entities (nodes, edges, events, etc.) on static and dynamic graphs. See (Akoglu et al., 2015) for an extensive survey on graph-based anomaly detection.
Anomaly detection in static graphs can be described under the following categories:
• Anomalous Node Detection: (Akoglu et al., 2010) extracts egonet-based features and finds empirical patterns with respect to the features. Then, it identifies nodes whose egonets deviate from the patterns. (Xu et al., 2007) groups nodes that share many neighbors and spots nodes that cannot be assigned to any community.
• Anomalous Edge Detection: (Chakrabarti, 2004) encodes the input graph based on similar connectivity between nodes, then spots edges whose removal significantly reduces the total encoding cost. (Tong and Lin, 2011) factorizes the adjacency matrix and flags edges which introduce high reconstruction error as outliers.
Anomaly detection in dynamic graphs can also be described under the following categories:
• Anomalous Node Detection: (Sun et al., 2006) approximates the adjacency matrix of the current snapshot based on incremental matrix factorization. Then, it spots nodes corresponding to rows with high reconstruction error. (Wang et al., 2015) computes node features (degree, closeness centrality, etc.) in each graph snapshot. Then, it identifies nodes whose features are notably different from their previous values and the features of nodes in the same community.
• Anomalous Subgraph Detection: (Beutel et al., 2013) spots near-bipartite cores where each node is densely connected to others in the same core within a short time. (Jiang et al., 2016) and (Shin et al., 2018b) detect groups of nodes who form dense subgraphs in a temporally synchronized manner. (Shin et al., 2017) identifies dense subtensors created within a short time.
• Event Detection: (Eswaran et al., 2018; Aggarwal et al., 2011; Koutra et al., 2016; Henderson et al., 2010) detect the following events: sudden appearance of many unexpected edges (Aggarwal et al., 2011), sudden appearance of a dense graph (Eswaran et al., 2018), sudden drop in the similarity between two consecutive snapshots (Koutra et al., 2016), and sudden prolonged spikes and lightweight stars (Henderson et al., 2010).
Our proposed AnomRank is an anomalous event detection method with fast speed and high accuracy. It can be easily extended to localize the culprits of anomalies into nodes and substructures (Section 4.4), and it detects various types of anomalies in dynamic graphs in real time. Table 1 compares AnomRank to existing methods.
| | Oddball (Akoglu et al., 2010) | MetricFor. (Henderson et al., 2010) | DenseAlert (Shin et al., 2017) | SpotLight (Eswaran et al., 2018) | SedanSpot (Eswaran and Faloutsos, 2018) | AnomRank |
| Real-time detection∗ | | | | | ✓ | ✓ |
| Allow edge deletions | | | ✓ | | ✓ | ✓ |
| Structural anomalies | ✓ | | | ✓ | | ✓ |
| Edge weight anomalies | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

∗ Processes millions of edges within seconds.
3. Preliminaries
Table 2 gives a list of symbols and definitions.
Various node score functions have been designed to estimate the importance (centrality, etc.) of nodes in a graph: PageRank (Page et al., 1999); HITS (Kleinberg, 1999) and its derivatives (SALSA) (Lempel and Moran, 2001); Fiedler vector (Chung, 1994); all the centrality measures from social network analysis (eigenvector-, degree-, betweenness-centrality (Wasserman and Faust, 1994)). Among them, we extend PageRank to design our node score functions in Section 4.2 because (a) it is fast to compute, (b) it led to the ultra-successful ranking method of Google, and (c) it is intuitive (‘your importance depends on the importance of your neighbors’). Next, we briefly review PageRank and its incremental version in dynamic graphs.
PageRank
As shown in (Yoon et al., 2018b), PageRank scores for all nodes are represented as a PageRank score vector p, which is defined by the following equation:

p = c Ã⊤ p + (1 − c) b

where c is a damping factor (0 < c < 1), Ã is the row-normalized adjacency matrix, and b is the starting vector. This equation is interpreted as a propagation of scores across a graph: initial scores in the starting vector b are propagated across the graph by repeated multiplication with cÃ⊤; since the damping factor c is smaller than 1, the propagated scores converge, resulting in PageRank scores. As shown in (Yoon et al., 2018a), PageRank computation time is proportional to the L1 length of the starting vector, since a small L1 length of the starting vector leads to faster convergence of the iteration p ← cÃ⊤p + (1 − c)b with damping factor c. Here the L1 length of a vector is defined as the sum of the absolute values of its entries. The L1 length of a matrix is defined as the maximum L1 length of its columns.
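To make the iteration above concrete, here is a minimal power-iteration sketch; the function name `pagerank` and the dense-matrix representation are our own choices for illustration, not from the paper.

```python
import numpy as np

def pagerank(A, c=0.85, tol=1e-9, max_iter=1000):
    """Power iteration for p = c * Arownorm^T p + (1 - c) * b,
    where b is the uniform starting vector (all entries 1/n)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)
    # row-normalize, leaving all-zero rows as zeros
    A_norm = np.divide(A, out, out=np.zeros_like(A), where=out > 0)
    b = np.full(n, 1.0 / n)
    p = b.copy()
    for _ in range(max_iter):
        p_next = c * A_norm.T @ p + (1 - c) * b
        if np.abs(p_next - p).sum() < tol:  # L1 convergence check
            return p_next
        p = p_next
    return p
```

Because c < 1, each iteration contracts the error by a factor of c, so the loop converges geometrically.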
Incremental PageRank
When edges are inserted or deleted, PageRank scores can be updated incrementally from the previous PageRank scores. Let Ã be the row-normalized adjacency matrix of a graph and Ã′ be the row-normalized adjacency matrix after a change that happened during the time interval Δt. From now on, denote by Δ = Ã′⊤ − Ã⊤ the difference between the transposes of the normalized matrices.
Lemma 1 (Dynamic PageRank, Theorem 3.2 in (Yoon et al., 2018a)).
Given updates in a graph during Δt, the updated PageRank vector p(t + Δt) is computed incrementally from the previous PageRank vector p(t) as follows:

p(t + Δt) = p(t) + q, where q is the solution of q = c Ã′⊤ q + c Δ p(t)
Note that, for small changes in a graph, the L1 length of the starting vector cΔp(t) is much smaller than 1 − c, the L1 length of the starting vector (1 − c)b in the static PageRank equation, resulting in much faster convergence.
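A sketch of this incremental scheme, which propagates only the small starting vector cΔp(t) instead of recomputing from scratch (helper names are ours; a dense matrix stands in for the sparse graph):

```python
import numpy as np

def row_normalize(A):
    A = np.asarray(A, dtype=float)
    out = A.sum(axis=1, keepdims=True)
    return np.divide(A, out, out=np.zeros_like(A), where=out > 0)

def incremental_pagerank(p_old, A_old, A_new, c=0.85, tol=1e-9, max_iter=1000):
    """Sketch of Lemma 1: solve q = c * Anew^T q + c * Delta p_old,
    then return p_old + q. Delta is the change in the transposed
    row-normalized matrix, so the starting vector is tiny for small edits."""
    M_new = row_normalize(A_new).T
    delta = M_new - row_normalize(A_old).T
    s = c * delta @ p_old          # small starting vector
    q = s.copy()
    for _ in range(max_iter):
        q_next = c * M_new @ q + s
        if np.abs(q_next - q).sum() < tol:
            q = q_next
            break
        q = q_next
    return p_old + q
```

Since the iteration error shrinks proportionally to the L1 length of s, a one-edge change converges in far fewer effective operations than a full recomputation.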
Symbol | Definition |
G | (un)directed and (un)weighted input graph |
ΔG | update in graph G |
n, m | numbers of nodes and edges in G |
Ã (n × n) | row-normalized adjacency matrix of G |
Ã′ (n × n) | row-normalized adjacency matrix of G after the update ΔG |
Δ (n × n) | difference between Ã′⊤ and Ã⊤ |
c | damping factor of PageRank |
b (n × 1) | uniform starting vector |
b_W (n × 1) | out-edge proportional starting vector |
Ã_S (n × n) | row-normalized unweighted adjacency matrix |
Ã_W (n × n) | row-normalized weighted adjacency matrix |
4. Proposed Method
Node scores represent the importance of nodes across a given graph. Thus, as the graph evolves under normal behavior with the insertion and deletion of edges, node scores evolve smoothly. In contrast, anomalies such as network attacks or rating manipulation often aim to complete their goals in a short time, e.g. to satisfy their clients, inducing large and abrupt changes in node scores. Our key intuition is that such abrupt gains or losses are reflected in the 1st and 2nd derivatives of node scores: a large 1st derivative identifies large changes, while a large 2nd derivative identifies abrupt changes in the trend of the data, thereby distinguishing anomalies from normal users who evolve according to smooth trends. Thus, tracking 1st and 2nd order derivatives helps detect anomalies in dynamic graphs.
Changes in a dynamic graph are classified into two types: structure changes and edge weight changes. Since these two types of changes affect node scores of the graph differently (more details in Section 4.2), we need to handle them separately. Thus, first, we classify anomalies in dynamic graphs into two types: AnomalyS and AnomalyW (Section 4.1). Then we design two node score functions based on characteristics of these two types of anomalies, respectively (Section 4.2). Next, we define two novel metrics for anomalousness using 1st and 2nd order derivatives of our node scores, and verify the effectiveness of each metric on the respective type of anomalies theoretically (Section 4.3). Based on these analyses, we introduce our method AnomRank, a fast and accurate anomaly detection algorithm in dynamic graphs (Section 4.4).


4.1. Anomalies in Dynamic Graphs
We classify anomalies in dynamic graphs into two types: AnomalyS and AnomalyW.
4.1.1. AnomalyS
It is suspicious if a number of edges are inserted/deleted among nodes which are previously unrelated/related. Hence, AnomalyS denotes a massive change with regard to the graph structure. One example of AnomalyS is spam in mail network graphs: a spammer sends mail to many unknown individuals, generating out-edges toward previously unrelated nodes. Data exfiltration attacks in computer network graphs are another example: attackers transfer a target’s data stealthily, generating unseen edges around a target machine to steal information. As illustrated in Figure 2, we define a structure change as follows:
Definition 1 (Structure Change).
If a node u changes the destinations of Δm of its out-edges from previous neighbors to new neighbors, we call the change a structure change of size Δm.

With abnormally large Δm, a structure change becomes an AnomalyS. To detect AnomalyS, we need to focus on the existence of edges between two nodes, rather than the number of occurrences of edges between two nodes.
4.1.2. AnomalyW
In dynamic graphs, an edge between two nodes could occur several times. Edge weight is proportional to the number of edge occurrences. AnomalyW denotes a massive change of edge weights in a graph. One example of AnomalyW is port scan attacks in computer network graphs: to scan ports in a target IP address, attackers repeatedly connect to the IP address, thus increasing the number of edge occurrences to the target node. On Twitter, high edge density on a user-keyword graph could indicate bot-like behavior, e.g. bots posting about the same content repeatedly. As illustrated in Figure 2, we define an edge weight change as follows:
Definition 2 (Edge Weight Change).
If a node u adds/subtracts Δw out-edges to a neighbor node v, we call the change an edge weight change of size Δw.

With abnormally large Δw, an edge weight change becomes an AnomalyW. In contrast to AnomalyS, here we focus on the number of occurrences of each edge, rather than only the presence or absence of an edge.
4.2. Node Score Functions for Detecting AnomalyS and AnomalyW
To detect AnomalyS and AnomalyW, we first define two node score functions, ScoreS and ScoreW, which we use to define our anomalousness metrics in Section 4.3.
4.2.1. ScoreS
We introduce the node score ScoreS, which we use to catch AnomalyS. Define the row-normalized unweighted adjacency matrix Ã_S, a starting vector b whose entries are all 1/n, where n is the number of nodes, and the damping factor c.
Definition 3 (ScoreS).
The ScoreS node score vector p_S is defined by the following iterative equation:

p_S = c Ã_S⊤ p_S + (1 − c) b

For this (unweighted) case, ScoreS is the same as PageRank, but we refer to it as ScoreS for consistency with our later definitions. Note that the number of edge occurrences between nodes is not considered in ScoreS. Using Lemma 1, we can compute ScoreS incrementally, and thus quickly, in dynamic graphs.
4.2.2. ScoreW
Next, we introduce the second node score, ScoreW, which we use to catch AnomalyW. To incorporate edge weights, we use the row-normalized weighted adjacency matrix Ã_W instead of Ã_S. However, this is not enough on its own: imagine an attacker node who adds a massive number of edges, all toward a single target node, where the attacker has no other neighbors. Since Ã_W is row-normalized, this attacker appears no different in Ã_W than if they had added only a single edge toward the same target. Hence, to catch such attackers, we also introduce an out-degree proportional starting vector b_W, i.e. we set the initial score of each node proportional to its out-degree.
Definition 4 (ScoreW).
The ScoreW node score vector p_W is defined by the following iterative equation:

p_W = c Ã_W⊤ p_W + (1 − c) b_W

(Ã_W)_uv = w_uv / W_u, where w_uv is the edge weight from node u to node v. (b_W)_u = W_u / W, where W_u denotes the total edge weight of the out-edges of node u, and W denotes the total edge weight of the graph.
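The definition can be sketched as follows; `score_w` is a hypothetical name, and a dense weight matrix stands in for the edge stream.

```python
import numpy as np

def score_w(W, c=0.85, tol=1e-9, max_iter=1000):
    """ScoreW sketch: weighted row-normalized matrix plus an
    out-edge proportional starting vector b_W with b_W[u] = W_u / W."""
    W = np.asarray(W, dtype=float)
    out_w = W.sum(axis=1)          # W_u: total out-edge weight of node u
    total = out_w.sum()            # W: total edge weight of the graph
    b_w = out_w / total
    M = np.divide(W, out_w[:, None],
                  out=np.zeros_like(W), where=out_w[:, None] > 0).T
    p = b_w.copy()
    for _ in range(max_iter):
        p_next = c * M @ p + (1 - c) * b_w
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p
```

A node that pours heavy weight toward one target both shifts its row of Ã_W and inflates its own entry of b_W, so the target's score rises, which is exactly the behavior the uniform starting vector would miss.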
Next, we show how ScoreW is computed incrementally in a dynamic graph. Assume that a change happens in the graph during the time interval Δt, inducing changes Δ_W = Ã_W′⊤ − Ã_W⊤ and Δb_W in the (transposed) adjacency matrix and the starting vector, respectively.
Lemma 2 (Dynamic ScoreW).
Given updates Δ_W and Δb_W in a graph during Δt, the updated score vector p_W(t + Δt) is computed incrementally from the previous score vector p_W(t) as follows:

p_W(t + Δt) = p_W(t) + q, where q is the solution of q = c Ã_W′⊤ q + c Δ_W p_W(t) + (1 − c) Δb_W
Proof.
Subtracting the defining equation of p_W(t) from that of p_W(t + Δt) shows that the difference q satisfies the stated equation. ∎
Note that, for small changes in a graph, the starting vectors of the last two terms, cΔ_W p_W(t) and (1 − c)Δb_W, have much smaller L1 lengths than the original starting vector (1 − c)b_W, so q can be computed quickly.
4.2.3. Suitability
We estimate changes in ScoreS induced by a structure change (Definition 1) and compare the changes with those in ScoreW to prove the suitability of ScoreS for detecting AnomalyS.
Lemma 3 (Upper Bound for Structure Change in ScoreS).
When a structure change of size Δm happens around a node u with d out-neighbors, ‖Δp_S‖₁ is upper-bounded by 2cΔm / ((1 − c)d).
Proof.
In Δ_S = Ã_S′⊤ − Ã_S⊤, only the u-th column has nonzeros. Thus ‖Δ_S‖₁ is the L1 length of that column. Each entry is normalized by d, as the total number of out-neighbors of node u is d. For the Δm out-neighbors who lose edges, the corresponding entries change by −1/d; for the Δm out-neighbors who gain edges, by +1/d. Then ‖Δ_S‖₁ = 2Δm/d, and since Δp_S = c Ã_S′⊤ Δp_S + c Δ_S p_S with ‖p_S‖₁ = 1, we obtain ‖Δp_S‖₁ ≤ 2cΔm / ((1 − c)d). ∎
When a structure change is presented in ScoreW, Δb_W = 0 since there is no change in the number of edges. Moreover, since each row in Ã_W is normalized by the total out-edge weight W_u, which is at least the total number of out-neighbors d, we have ‖Δ_W‖₁ = 2Δm/W_u ≤ 2Δm/d = ‖Δ_S‖₁. In other words, a structure change generates larger changes in ScoreS than in ScoreW. Thus ScoreS is more suitable than ScoreW for detecting AnomalyS.
Similarly, we estimate changes in ScoreW induced by an edge weight change (Definition 2) and compare the changes with those in ScoreS to prove the suitability of ScoreW for detecting AnomalyW.
Lemma 4 (Upper Bound for Edge Weight Change in ScoreW).
When an edge weight change of size Δw happens around a node u with total out-edge weight W_u in a graph with total edge weight W, ‖Δ_W‖₁ and ‖Δb_W‖₁ are upper-bounded by 2Δw/(W_u + Δw) and 2Δw/(W + Δw), respectively.
Proof.
In Δ_W, only the u-th column has nonzeros. Then ‖Δ_W‖₁ is the L1 length of that column. Node u has w_ux edges toward each out-neighbor x; thus its total out-edge weight is W_u = Σ_x w_ux. Since the weighted adjacency matrix is row-normalized by the total out-edge weight, (Ã_W)_ux = w_ux/W_u. After Δw edges are added from node u to node v, (Ã_W′)_uv = (w_uv + Δw)/(W_u + Δw), and (Ã_W′)_ux = w_ux/(W_u + Δw) for x ≠ v. Then ‖Δ_W‖₁ is bounded as follows:

‖Δ_W‖₁ = ((w_uv + Δw)/(W_u + Δw) − w_uv/W_u) + Σ_{x≠v} (w_ux/W_u − w_ux/(W_u + Δw)) = 2Δw(W_u − w_uv) / (W_u(W_u + Δw)) ≤ 2Δw/(W_u + Δw)

For the starting vector, (b_W)_x = W_x/W, where W = Σ_x W_x is the total edge weight of the graph. After Δw edges are added from node u to node v, (b_W′)_u = (W_u + Δw)/(W + Δw), and (b_W′)_x = W_x/(W + Δw) for x ≠ u. Then ‖Δb_W‖₁ is bounded as follows:

‖Δb_W‖₁ = 2Δw(W − W_u) / (W(W + Δw)) ≤ 2Δw/(W + Δw) ∎

In contrast, when an edge weight change is presented in ScoreS, Δ_S = 0 since the set of out-neighbors is unchanged. Note that Δb = 0 as well, since the uniform starting vector b is fixed in ScoreS. In other words, AnomalyW does not induce any change in ScoreS.
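As a quick numeric sanity check of this claim, the sketch below computes ScoreS from the binarized adjacency matrix; adding weight to an existing edge leaves it exactly unchanged (the function name and the closed-form linear solve are our choices for illustration).

```python
import numpy as np

def score_s(W, c=0.85):
    """ScoreS ignores edge multiplicities: it binarizes the weight
    matrix, row-normalizes it, and solves the fixed point of
    p_S = c * A_S^T p_S + (1 - c) * b with uniform b."""
    S = (np.asarray(W, dtype=float) > 0).astype(float)
    out = S.sum(axis=1, keepdims=True)
    M = np.divide(S, out, out=np.zeros_like(S), where=out > 0).T
    n = S.shape[0]
    b = np.full(n, 1.0 / n)
    return np.linalg.solve(np.eye(n) - c * M, (1 - c) * b)
```

Repeating an existing edge 100 more times does not alter the binarized matrix, so ScoreS is invariant; adding a previously absent edge does change it.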
4.3. Metrics for AnomalyS and AnomalyW
Next, we define our two novel metrics for evaluating the anomalousness at each time in dynamic graphs.
4.3.1. AnomRankS
First, we discretize the first order derivative of the ScoreS vector p_S as follows:

∂p_S/∂t ≈ (p_S(t) − p_S(t − Δt)) / Δt

Similarly, the second order derivative of p_S is discretized as follows:

∂²p_S/∂t² ≈ (∂p_S(t)/∂t − ∂p_S(t − Δt)/∂t) / Δt
Next, we define a novel metric AnomRankS which is designed to detect AnomalyS effectively.
Definition 5 (AnomRankS).
Given the ScoreS vector p_S, AnomRankS is an n × 2 matrix [∂p_S/∂t, ∂²p_S/∂t²], concatenating the 1st and 2nd derivatives of p_S. The AnomRankS score is the L1 length of this matrix, i.e. the maximum of the L1 lengths of its two columns.
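The discretized derivatives and the resulting score can be sketched as below; the three-snapshot function signature is our simplification, since in practice the score vectors are maintained incrementally along the stream.

```python
import numpy as np

def anomrank_score(p_prev2, p_prev, p_curr, dt=1.0):
    """Discretized 1st/2nd derivatives of a node score vector, and the
    AnomRank score as the L1 length of the n x 2 matrix [p', p'']
    (maximum column L1 norm)."""
    d1 = (p_curr - p_prev) / dt        # 1st derivative at time t
    d1_prev = (p_prev - p_prev2) / dt  # 1st derivative at time t - dt
    d2 = (d1 - d1_prev) / dt           # 2nd derivative at time t
    return max(np.abs(d1).sum(), np.abs(d2).sum())
```

A score drifting along a straight trend has a nonzero 1st derivative but a zero 2nd derivative; an abrupt jump of the same total size inflates both, which is why the pair separates smooth evolution from attacks.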
Next, we study how AnomRankS scores change under the assumption of a normal stream, or an anomaly, thus explaining how it distinguishes anomalies from normal behavior. First, we model a normal graph stream based on Lipschitz continuity to capture smoothness:
Assumption 1 (p_S in Normal Stream).
In a normal graph stream, p_S is Lipschitz continuous with positive real constants K₁ and K₂ such that

‖∂p_S/∂t‖₁ ≤ K₁ and ‖∂²p_S/∂t²‖₁ ≤ K₂.
In Lemmas 5 and 6, we back up Assumption 1 by upper-bounding ‖∂p_S/∂t‖₁ and ‖∂²p_S/∂t²‖₁. For brevity, all proofs of this subsection are given in Supplement A.3.
Lemma 5 (Upper bound of ∂p_S/∂t).
Given the damping factor c and updates Δ_S in the adjacency matrix during Δt, ‖∂p_S/∂t‖₁ is upper-bounded by c‖Δ_S‖₁ / ((1 − c)Δt).
Proof.
Proofs are given in Supplement A.3.
We bound the L1 length of ∂²p_S/∂t² in terms of the L1 lengths of Δ_S1 and Δ_S2, where Δ_S1 denotes the changes in Ã_S⊤ from time t − Δt to time t, and Δ_S2 denotes the changes in Ã_S⊤ from t to t + Δt.
Lemma 6 (Upper bound of ∂²p_S/∂t²).
Given the damping factor c and sequencing updates Δ_S1 and Δ_S2, ‖∂²p_S/∂t²‖₁ is upper-bounded by c(‖Δ_S1‖₁ + ‖Δ_S2‖₁) / ((1 − c)Δt²).
Proof.
Proofs are given in Supplement A.3.
Normal graphs have small changes, thus having small ‖Δ_S‖₁. This results in small values of ‖∂p_S/∂t‖₁ (Lemma 5). In addition, normal graphs change gradually, thus having small successive updates ‖Δ_S1‖₁ and ‖Δ_S2‖₁. This leads to small values of ‖∂²p_S/∂t²‖₁ (Lemma 6). Then, the AnomRankS score has small values in normal graph streams under these small upper bounds.
Observation 1 (AnomalyS in AnomRankS).
AnomalyS involves sudden structure changes, inducing large AnomRankS scores.
AnomalyS happens with massive changes (large Δm) occurring abruptly (within a single time interval Δt). In the following Theorem 1, we explain Observation 1 based on the large upper bounds of ‖∂p_S/∂t‖₁ and ‖∂²p_S/∂t²‖₁ under AnomalyS.
Theorem 1 (Upper bounds of ∂p_S/∂t and ∂²p_S/∂t² with AnomalyS).
When AnomalyS occurs as a structure change of size Δm around a node u with d out-neighbors, the L1 lengths of ∂p_S/∂t and ∂²p_S/∂t² are upper-bounded as follows:

‖∂p_S/∂t‖₁ ≤ 2cΔm / ((1 − c)dΔt),  ‖∂²p_S/∂t²‖₁ ≤ 2cΔm / ((1 − c)dΔt²) + K₁/Δt

where K₁ is the bound on the first derivative in the preceding (normal) interval from Assumption 1.
Proof.
Proofs are given in Supplement A.3.
4.3.2. AnomRankW
We discretize the first and second order derivatives ∂p_W/∂t and ∂²p_W/∂t² of p_W as follows:

∂p_W/∂t ≈ (p_W(t) − p_W(t − Δt)) / Δt,  ∂²p_W/∂t² ≈ (∂p_W(t)/∂t − ∂p_W(t − Δt)/∂t) / Δt
Then we define the second metric AnomRankW which is designed to find AnomalyW effectively.
Definition 6 (AnomRankW).
Given the ScoreW vector p_W, AnomRankW is an n × 2 matrix [∂p_W/∂t, ∂²p_W/∂t²], concatenating the 1st and 2nd derivatives of p_W. The AnomRankW score is the L1 length of this matrix.
We model the smoothness of p_W in a normal graph stream using Lipschitz continuity in Assumption 2. Then, similarly to Section 4.3.1, we show upper bounds of ‖∂p_W/∂t‖₁ and ‖∂²p_W/∂t²‖₁ in Lemmas 7 and 8 to back up Assumption 2.
Assumption 2 (p_W in Normal Stream).
In a normal graph stream, p_W is Lipschitz continuous with positive real constants K₃ and K₄ such that

‖∂p_W/∂t‖₁ ≤ K₃ and ‖∂²p_W/∂t²‖₁ ≤ K₄.
Lemma 7 (Upper bound of ∂p_W/∂t).
Given the damping factor c, updates Δ_W in the adjacency matrix, and updates Δb_W in the starting vector during Δt, ‖∂p_W/∂t‖₁ is upper-bounded by (c‖Δ_W‖₁/(1 − c) + ‖Δb_W‖₁) / Δt.
Proof.
Proofs are given in Supplement A.3.
In the following lemma, B denotes the upper bound of ‖∂²p_W/∂t²‖₁ contributed by the matrix updates, which we show in Lemma 6. Δ_W1 is the changes in Ã_W⊤ from time t − Δt to time t, and Δ_W2 is the changes in Ã_W⊤ from t to t + Δt. Δb_W1 is the changes in b_W from time t − Δt to time t, and Δb_W2 is the changes in b_W from t to t + Δt.
Lemma 8 (Upper bound of ∂²p_W/∂t²).
Given the damping factor c, sequencing updates Δ_W1 and Δ_W2, and sequencing updates Δb_W1 and Δb_W2,
the upper bound of ‖∂²p_W/∂t²‖₁ is presented as follows:

‖∂²p_W/∂t²‖₁ ≤ B + (‖Δb_W1‖₁ + ‖Δb_W2‖₁) / Δt²

where B = c(‖Δ_W1‖₁ + ‖Δ_W2‖₁) / ((1 − c)Δt²) is the bound of Lemma 6 applied to the weighted updates.
Proof.
Proofs are given in Supplement A.3.
Normal graph streams have small changes (small ‖Δ_W‖₁ and small ‖Δb_W‖₁) and evolve gradually (small successive updates). Then, normal graph streams have small AnomRankW scores under the small upper bounds of ‖∂p_W/∂t‖₁ and ‖∂²p_W/∂t²‖₁.
Observation 2 (AnomalyW in AnomRankW).
AnomalyW involves sudden edge weight changes, inducing large AnomRankW scores.
We explain Observation 2 by showing large upper bounds of ‖∂p_W/∂t‖₁ and ‖∂²p_W/∂t²‖₁ induced by large values of ‖Δ_W‖₁ and ‖Δb_W‖₁ in AnomalyW.
Theorem 2 (Upper bounds of ∂p_W/∂t and ∂²p_W/∂t² with AnomalyW).
When AnomalyW occurs as an edge weight change of size Δw around a node u with total out-edge weight W_u in a graph with total edge weight W, the L1 lengths of ∂p_W/∂t and ∂²p_W/∂t² are upper-bounded as follows:

‖∂p_W/∂t‖₁ ≤ (1/Δt)(2cΔw/((1 − c)(W_u + Δw)) + 2Δw/(W + Δw))
‖∂²p_W/∂t²‖₁ ≤ (1/Δt²)(2cΔw/((1 − c)(W_u + Δw)) + 2Δw/(W + Δw)) + K₃/Δt

where K₃ is the bound on the first derivative in the preceding (normal) interval from Assumption 2.
Proof.
Proofs are given in Supplement A.3.
With the high upper bounds of ‖∂p_W/∂t‖₁ and ‖∂²p_W/∂t²‖₁ shown in Theorem 2, AnomalyW induces high AnomRankW scores (Figure 1(c)). We detect AnomalyW successfully based on AnomRankW scores in real-world graphs (Figure 3).
4.4. Algorithm
Algorithm 1 describes how we detect anomalies in a dynamic graph. We first calculate updates in the unweighted adjacency matrix, updates in the weighted adjacency matrix, and updates in the out-edge proportional starting vector (Line 1). These computations are proportional to the number of edge changes, taking a few milliseconds for small changes. Then, AnomRank updates ScoreS and ScoreW vectors using the update rules in Lemmas 1 and 2 (Line 2). Then AnomRank calculates an anomaly score given ScoreS and ScoreW in Algorithm 2. AnomRank computes AnomRankS and AnomRankW, and returns the maximum L1 length between them as the anomaly score.
Normalization: As shown in Theorems 1 and 2, the upper bounds of AnomRankS and AnomRankW depend on the number of out-neighbors d and the total out-edge weight W_u. This leads to skew in the anomalousness score distributions, since many real-world graphs have skewed degree distributions. Thus, we normalize each node’s AnomRankS and AnomRankW scores by subtracting its mean and dividing by its standard deviation, both of which we maintain along the stream.
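This streaming normalization can be sketched with Welford’s online mean/variance update; the class name and API below are assumptions, not from the paper.

```python
import numpy as np

class StreamingZScore:
    """Per-node running mean/std (Welford's algorithm), used to normalize
    AnomRank scores along the stream and counter degree-distribution skew."""
    def __init__(self, n_nodes):
        self.count = 0
        self.mean = np.zeros(n_nodes)
        self.m2 = np.zeros(n_nodes)   # sum of squared deviations

    def update_and_normalize(self, scores):
        self.count += 1
        delta = scores - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (scores - self.mean)
        if self.count < 2:
            return np.zeros_like(scores)
        std = np.sqrt(self.m2 / (self.count - 1))
        return np.divide(scores - self.mean, std,
                         out=np.zeros_like(scores), where=std > 0)
```

Welford’s update needs only O(n) memory regardless of stream length, which keeps the normalization compatible with the linear-time guarantee.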
Explainability and Attribution: AnomRank explains the type of anomalies by comparing AnomRankS and AnomRankW: higher scores of AnomRankS suggest that AnomalyS has happened, and vice versa. High scores of both metrics indicate a large edge weight change that also alters the graph structure. Furthermore, we can localize culprits of anomalies by ranking AnomRank scores of each node in the score vector, as computed in Lines 1 and 2 of Algorithm 2. We show this localization experimentally in Section 5.5.
Δt selection: Our analysis and proofs hold for any value of Δt. The choice of Δt is outside the scope of this paper, and probably best decided by a domain expert: a large Δt is suitable for slow (‘low temperature’) attacks; a small Δt spots fast and abrupt attacks. In our experiments, we chose Δt = 1 hour and Δt = 1 day, respectively, for a computer-network intrusion setting and for a who-emails-whom network.
5. Experiments
In this section, we evaluate the performance of AnomRank compared to state-of-the-art anomaly detection methods on dynamic graphs. We aim to answer the following questions:
• Q1. Practicality. How fast, accurate, and scalable is AnomRank compared to its competitors? (Section 5.2)
• Q2. Effectiveness of two-pronged approach. How do our two metrics, AnomRankS and AnomRankW, complement each other in real-world and synthetic graphs? (Section 5.3)
• Q3. Effectiveness of two-derivatives approach. How do the 1st and 2nd order derivatives of ScoreS and ScoreW complement each other? (Section 5.4)
• Q4. Attribution. How accurately does AnomRank localize culprits of anomalous events? (Section 5.5)
5.1. Setup
We use SedanSpot (Eswaran and Faloutsos, 2018) and DenseAlert (Shin et al., 2017), state-of-the-art anomaly detection methods on dynamic graphs, as our baselines. We use two real-world dynamic graphs, DARPA and ENRON, and one synthetic dynamic graph, RTM, with two anomaly injection scenarios. Anomalies are verified by comparing to manual annotations or by correlating with real-world events. More details of the experimental settings and datasets are described in Supplements A.1 and A.2.





5.2. Practicality
We examine the practicality of AnomRank on the DARPA dataset, a public benchmark for Network Intrusion Detection Systems. In this network intrusion setting, our focus is on detecting high-volume (i.e. high edge-weight) intrusions such as denial of service (DOS) attacks, which are typically the focus of graph mining-based detection approaches. Hence, we use only the AnomRankW metric.
Precision and Recall: Using each method, we first compute anomaly scores for each of the graph snapshots, then select the top-k most anomalous snapshots. Then we compute precision and recall for each method’s output. In Figure 3(d), AnomRank shows higher precision and recall than DenseAlert and SedanSpot on high ranks. Considering that anomaly detection tasks in real-world settings generally focus on the most anomalous instances, high accuracy on high ranks is more meaningful than high accuracy on low ranks. Moreover, given the number of ground truth anomalies, precision and recall on the top ranks better reflect practicality.
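The evaluation protocol can be sketched as follows; this is a generic precision/recall-at-k helper, not the authors’ code.

```python
def precision_recall_at_k(scores, labels, k):
    """Rank snapshots by anomaly score (descending) and compute
    precision and recall over the top-k ranked snapshots.
    labels[i] is 1 if snapshot i contains a ground-truth anomaly."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    topk = order[:k]
    hits = sum(labels[i] for i in topk)
    return hits / k, hits / max(sum(labels), 1)
```

Sweeping k from small to large traces out the precision/recall curves compared in Figure 3(d).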
Accuracy vs. Running Time: In Figures 1(a) and 3(a-c), AnomRank is the most accurate and the fastest. Compared to SedanSpot, AnomRank achieves higher precision on top-k ranks at faster speed. Compared to DenseAlert, AnomRank likewise achieves higher precision at faster speed. DenseAlert and SedanSpot require several subprocesses (hashing, random walking, reordering, sampling, etc.), resulting in long computation times.
Scalability: Figure 1(b) shows the scalability of AnomRank with the number of edges. We plot the wall-clock time needed to run on the (chronologically) first edges of the DARPA dataset. This confirms the linear scalability of AnomRank with respect to the number of edges in the input dynamic graph. Note that AnomRank processes millions of edges within 2 seconds, allowing real-time anomaly detection.
Effectiveness: Figure 1(c) shows changes of AnomRank scores in the DARPA dataset, with a time window of 1 hour. Consistently with the large upper bounds shown in Theorems 1 and 2, ground truth attacks (red crosses) have large AnomRank scores in Figure 1(c). Given the mean and standard deviation of the anomaly scores of all snapshots, and setting an anomalousness threshold a few standard deviations above the mean, most spikes above the threshold are true positives. This shows the effectiveness of AnomRank as a barometer of anomalousness.
5.3. Effectiveness of Two-Pronged Approach
In this experiment, we show the effectiveness of our two-pronged approach using real-world and synthetic graphs.
5.3.1. Real-World Graph
We measure anomaly scores based on four metrics: AnomRankS, AnomRankW, SedanSpot, and DenseAlert, on the ENRON dataset. In Figure 4, AnomRankW and SedanSpot show similar trends, while AnomRankS detects different events as anomalies on the same dataset. DenseAlert shows trends similar to the sum of AnomRankS and AnomRankW, while missing several anomalous events; this is also reflected in the low accuracy of DenseAlert on the DARPA dataset in Figure 3. The anomalies detected by AnomRankS and AnomRankW coincide with major events in the ENRON timeline (http://www.agsm.edu.au/bobm/teaching/BE/Enron/timeline.html) as follows:
(1) June 12, 2000: Skilling makes a joke at a Las Vegas conference, comparing California to the Titanic.
(2) August 23, 2000: FERC orders an investigation into Timothy Belden’s strategies designed to drive electricity prices up in California.
(3) Oct. 3, 2000: Enron attorney Richard Sanders travels to Portland to discuss Timothy Belden’s strategies.
(4) Dec. 13, 2000: Enron announces that Jeffrey Skilling will take over as chief executive.
(5) Mar. 2001: Enron transfers large portions of EES business into wholesale to hide EES losses.
(6) July 13, 2001: Skilling announces his desire to resign to Lay. Lay asks Skilling to take the weekend and think it over.
(7) Oct. 17, 2001: The Wall Street Journal reveals the precarious nature of Enron’s business.
(8) Nov. 19, 2001: Enron discloses it is trying to restructure a $690 million obligation.
(9) Jan. 23-25, 2002: Lay resigns as chairman and CEO of Enron. Cliff Baxter, former Enron vice chairman, commits suicide.
(10) Feb. 2, 2002: The Powers Report, a summary of an internal investigation into Enron’s collapse, is released.

The high anomaly scores of AnomRankS coincide with the timeline events when Enron conspired to drive the California electricity price up (Events 1, 2, 3) and office politics played out in Enron (Events 4, 5, 6). Meanwhile, AnomRankW shows high anomaly scores when the biggest scandals of the company continued to unfold (Events 7, 8, 9, 10). Note that AnomRankS and AnomRankW are designed to detect different properties of anomalies on dynamic graphs. AnomRankS is effective at detecting AnomalyS, like unusual email communications for conspiracy, while AnomRankW is effective at detecting AnomalyW, like massive email communications about big scandals that swept the whole company. The non-overlapping trends of AnomRankS and AnomRankW show that the two metrics complement each other successfully in real-world data. Summarizing, we make two observations:
• Observation 1. AnomRankS and AnomRankW spot different types of anomalous events.
• Observation 2. DenseAlert and SedanSpot detect a subset of the anomalies detected by AnomRank.
5.3.2. Synthetic Graph
In our synthetic graph generated by the RTM method, we inject two types of anomalies to examine the effectiveness of our two metrics. Details of the injections are as follows:
• InjectionS: We choose timestamps uniformly at random; at each chosen timestamp, we select nodes uniformly at random and introduce all edges between these nodes in both directions.
• InjectionW: We choose timestamps uniformly at random; at each chosen timestamp, we select two nodes uniformly at random and add edges from the first to the second.
A clique is an example of AnomalyS with an unusual structural pattern, while high edge weights are an example of AnomalyW. Hence, InjectionS and InjectionW consist of AnomalyS and AnomalyW, respectively.
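As a concrete illustration, the two injection routines above can be sketched as follows. This is a minimal Python sketch: the function names, the clique size, and the number of repeated edges are our own illustrative choices, since the text does not fix them.

```python
import random

def inject_s(snapshots, n_inject, clique_size, rng):
    """InjectionS sketch: at randomly chosen timestamps, pick nodes at
    random and add all directed edges among them (a clique, AnomalyS)."""
    times = rng.sample(range(len(snapshots)), n_inject)
    for t in times:
        nodes_in_t = sorted({n for e in snapshots[t] for n in e})
        clique = rng.sample(nodes_in_t, clique_size)
        for u in clique:
            for v in clique:
                if u != v:
                    snapshots[t].append((u, v))  # both directions covered
    return times

def inject_w(snapshots, n_inject, weight, rng):
    """InjectionW sketch: at randomly chosen timestamps, pick two nodes
    at random and add many repeated edges between them (AnomalyW)."""
    times = rng.sample(range(len(snapshots)), n_inject)
    for t in times:
        nodes_in_t = sorted({n for e in snapshots[t] for n in e})
        u, v = rng.sample(nodes_in_t, 2)
        snapshots[t].extend([(u, v)] * weight)
    return times
```

Here each snapshot is a plain edge list, so repeated edges directly encode edge weight.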
We then evaluate the precision of the top-50 highest anomaly scores output by the AnomRankS and AnomRankW metrics. We also evaluate each metric on the DARPA dataset based on its top-250 anomaly scores. In Table 3, AnomRankS shows higher precision on InjectionS than AnomRankW, while AnomRankW has higher precision on InjectionW and DARPA. In Section 4.2.3, we showed theoretically that AnomalyS induces larger changes in AnomRankS than in AnomRankW, explaining the higher precision of AnomRankS on InjectionS. We also showed that adding extra edge weight has no effect on AnomRankS, explaining why AnomRankS fails on InjectionW. For the DARPA dataset, AnomRankW shows higher accuracy than AnomRankS: DARPA contains 2.7M attack edges, and the majority of them (2.4M) are DOS attacks generated from only 2-3 source IP addresses toward 2-3 target IP addresses. These attacks are of AnomalyW type, with high edge weights. Thus AnomRankW shows higher precision on DARPA than AnomRankS.
Table 3. Precision of the two metrics on the synthetic graph under the two injection scenarios (InjectionS, InjectionW) and on the real-world graph DARPA. We measure precision at top-50 ranks on the synthetic graph and top-250 ranks on DARPA.
| | InjectionS | InjectionW | DARPA |
| AnomRankS only | .96 | .00 | .42 |
| AnomRankW only | .82 | .79 | .69 |
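The precision-at-top-k numbers in these tables can be computed with a small helper like the following sketch (`precision_at_k` is our own name, not from the paper):

```python
def precision_at_k(scores, labels, k):
    """Precision of the top-k highest anomaly scores: the fraction of
    the k top-scored snapshots whose ground-truth label is anomalous.
    scores: anomaly score per snapshot; labels: 1 if anomalous, else 0."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(labels[i] for i in top) / k
```

For example, with k=50 on the synthetic graph and k=250 on DARPA, this reproduces the evaluation protocol described above.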
Table 4. AnomRankW-1st and AnomRankW-2nd are the L1 lengths of the 1st and 2nd order derivatives of ScoreW, respectively. Combining AnomRankW-1st and AnomRankW-2nd (AnomRankW) results in the highest precision.
| | InjectionS | InjectionW | DARPA |
| AnomRankW-1st | .06 | .11 | .65 |
| AnomRankW-2nd | .80 | .78 | .61 |
| AnomRankW | .82 | .79 | .69 |
5.4. Effectiveness of Two-Derivatives Approach
In this experiment, we show the effectiveness of the 1st and 2nd order derivatives of ScoreS and ScoreW in detecting anomalies in dynamic graphs. For brevity, we show the results on ScoreW; the results on ScoreS are similar. Recall that the AnomRankW score is defined as the L1 length of the concatenation of the 1st and 2nd order derivatives of ScoreW. We define two metrics, AnomRankW-1st and AnomRankW-2nd, which denote the L1 lengths of the 1st and 2nd order derivatives, respectively. By estimating precision using AnomRankW-1st and AnomRankW-2nd individually, we examine the effectiveness of each derivative using the same injection scenarios and evaluation approach as in Section 5.3.2.
In Table 4, AnomRankW-1st shows higher precision on the DARPA dataset, while AnomRankW-2nd has higher precision on the injection scenarios. AnomRankW-1st detects suspiciously large anomalies based on the L1 length of the 1st order derivative, while AnomRankW-2nd detects the abruptness of anomalies based on the L1 length of the 2nd order derivative. Note that combining the 1st and 2nd order derivatives leads to better precision, showing that the two derivatives complement each other.
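The two metrics and their combination can be sketched as follows. Note that the paper normalizes each derivative by running means and variances before combining (Section 4.4); that normalization is omitted here for brevity, and `anomrank_metrics` is our own illustrative name.

```python
import numpy as np

def anomrank_metrics(score_vectors):
    """Sketch of AnomRankW-1st, AnomRankW-2nd, and the combined metric.
    score_vectors: one node-score vector per graph snapshot.
    The 1st derivative is the difference of consecutive score vectors;
    the 2nd derivative is the difference of consecutive 1st derivatives.
    Each metric is the L1 length of the corresponding vector; the
    combined score is the L1 length of their concatenation."""
    first, second, combined = [], [], []
    prev_d1 = None
    for t in range(1, len(score_vectors)):
        d1 = score_vectors[t] - score_vectors[t - 1]
        first.append(np.abs(d1).sum())                    # AnomRankW-1st
        if prev_d1 is not None:
            d2 = d1 - prev_d1
            second.append(np.abs(d2).sum())               # AnomRankW-2nd
            combined.append(np.abs(np.concatenate([d1, d2])).sum())
        prev_d1 = d1
    return first, second, combined
```

A sudden jump in a node's score yields a spike in both derivatives, which is why combining them sharpens detection of abrupt anomalies.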
5.5. Attribution
In this experiment, we show that AnomRank successfully localizes the culprits of anomalous events, as explained in the last paragraph of Section 4.4. In Figure 5, given a graph snapshot detected as an anomaly in the DARPA dataset, nodes (IP addresses) are sorted in order of their AnomRank scores. Outliers with significantly large scores correspond to IP addresses that are likely to be engaged in network intrusion attacks. In the snapshot where Back DOS attacks occur, the attacker IP and the victim IP have the largest AnomRank scores. In the snapshot where Nmap probing attacks take place, the victim IP has the largest score.
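A minimal version of this attribution step can be sketched as follows, under the assumption that outliers are flagged by a mean-plus-k-standard-deviations rule; `localize_culprits` and the threshold k=3 are our own illustrative choices, not values from the paper.

```python
import numpy as np

def localize_culprits(node_scores, k=3.0):
    """Attribution sketch: flag nodes whose per-node AnomRank score is
    an outlier, i.e., more than k standard deviations above the mean
    score within the anomalous snapshot."""
    mean, std = node_scores.mean(), node_scores.std()
    return np.where(node_scores > mean + k * std)[0]  # indices of culprit nodes
```

On a snapshot with a DOS attack, the attacker and victim IPs would dominate the score distribution and be returned as the flagged indices.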
6. Conclusion
In this paper, we proposed a two-pronged approach for detecting anomalous events in a dynamic graph.
Our main contributions are:
• Online, Two-Pronged Approach: We introduced AnomRank, a novel and simple anomaly detection method for dynamic graphs.
• Practicality: In Section 5, we showed that AnomRank outperforms state-of-the-art baselines, with up to 49.5× faster speed or 35% higher accuracy. AnomRank is fast, taking about 2 seconds on a graph with 4.5 million edges.
Our code and data are publicly available at https://github.com/minjiyoon/anomrank.

Acknowledgment
This material is based upon work supported by the National Science Foundation under Grants No. CNS-1314632 and IIS-1408924. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, or other funding parties. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on.
References
- Aggarwal et al. (2011) Charu C Aggarwal, Yuchen Zhao, and Philip S Yu. 2011. Outlier detection in graph streams. In ICDE.
- Akoglu et al. (2008) Leman Akoglu, Mary McGlohon, and Christos Faloutsos. 2008. RTM: Laws and a recursive generator for weighted time-evolving graphs. In ICDM.
- Akoglu et al. (2010) Leman Akoglu, Mary McGlohon, and Christos Faloutsos. 2010. Oddball: Spotting anomalies in weighted graphs. In PAKDD.
- Akoglu et al. (2015) Leman Akoglu, Hanghang Tong, and Danai Koutra. 2015. Graph based anomaly detection and description: a survey. DMKD 29, 3 (2015), 626–688.
- Beutel et al. (2013) Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos Faloutsos. 2013. Copycatch: stopping group attacks by spotting lockstep behavior in social networks. In WWW.
- Chakrabarti (2004) Deepayan Chakrabarti. 2004. Autopart: Parameter-free graph partitioning and outlier detection. In PKDD.
- Chung (1994) F.R.K. Chung. 1994. Spectral Graph Theory. Number no. 92 in CBMS Regional Conference Series. Conference Board of the Mathematical Sciences.
- Eswaran and Faloutsos (2018) Dhivya Eswaran and Christos Faloutsos. 2018. SEDANSPOT: Detecting Anomalies in Edge Streams. In ICDM.
- Eswaran et al. (2018) Dhivya Eswaran, Christos Faloutsos, Sudipto Guha, and Nina Mishra. 2018. SpotLight: Detecting Anomalies in Streaming Graphs. In KDD.
- Henderson et al. (2010) Keith Henderson, Tina Eliassi-Rad, Christos Faloutsos, Leman Akoglu, Lei Li, Koji Maruhashi, B Aditya Prakash, and Hanghang Tong. 2010. Metric forensics: a multi-level approach for mining volatile graphs. In KDD.
- Hooi et al. (2017) Bryan Hooi, Kijung Shin, Hyun Ah Song, Alex Beutel, Neil Shah, and Christos Faloutsos. 2017. Graph-based fraud detection in the face of camouflage. TKDD 11, 4 (2017), 44.
- Jiang et al. (2016) Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2016. Catching synchronized behaviors in large networks: A graph mining approach. TKDD 10, 4 (2016), 35.
- Kleinberg (1999) Jon M Kleinberg. 1999. Authoritative sources in a hyperlinked environment. JACM 46, 5 (1999), 604–632.
- Koutra et al. (2016) Danai Koutra, Neil Shah, Joshua T Vogelstein, Brian Gallagher, and Christos Faloutsos. 2016. DeltaCon: principled massive-graph similarity function with attribution. TKDD 10, 3 (2016), 28.
- Lempel and Moran (2001) Ronny Lempel and Shlomo Moran. 2001. SALSA: the stochastic approach for link-structure analysis. ACM Trans. Inf. Syst. 19, 2 (2001), 131–160.
- Lippmann et al. (1999) Richard Lippmann, Robert K Cunningham, David J Fried, Isaac Graf, Kris R Kendall, Seth E Webster, and Marc A Zissman. 1999. Results of the DARPA 1998 Offline Intrusion Detection Evaluation.. In Recent advances in intrusion detection, Vol. 99. 829–835.
- Page et al. (1999) Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank citation ranking: Bringing order to the web. Technical Report. Stanford InfoLab.
- Ranshous et al. (2016) Stephen Ranshous, Steve Harenberg, Kshitij Sharma, and Nagiza F Samatova. 2016. A scalable approach for outlier detection in edge streams using sketch-based approximations. In SDM.
- Shetty and Adibi (2004) Jitesh Shetty and Jafar Adibi. 2004. The Enron email dataset database schema and brief statistical report. Information sciences institute technical report, University of Southern California 4, 1 (2004), 120–128.
- Shin et al. (2018a) Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos. 2018a. Patterns and anomalies in k-cores of real-world graphs with applications. KAIS 54, 3 (2018), 677–710.
- Shin et al. (2018b) Kijung Shin, Bryan Hooi, and Christos Faloutsos. 2018b. Fast, Accurate, and Flexible Algorithms for Dense Subtensor Mining. TKDD 12, 3 (2018), 28.
- Shin et al. (2017) Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos. 2017. DenseAlert: Incremental dense-subtensor detection in tensor streams. In KDD.
- Sun et al. (2006) Jimeng Sun, Dacheng Tao, and Christos Faloutsos. 2006. Beyond streams and graphs: dynamic tensor analysis. In KDD.
- Tong and Lin (2011) Hanghang Tong and Ching-Yung Lin. 2011. Non-negative residual matrix factorization with application to graph anomaly detection. In SDM.
- Wang et al. (2015) Teng Wang, Chunsheng Fang, Derek Lin, and S Felix Wu. 2015. Localizing temporal anomalies in large evolving graphs. In SDM.
- Wasserman and Faust (1994) Stanley Wasserman and Katherine Faust. 1994. Social network analysis: Methods and applications. Cambridge university press.
- Xu et al. (2007) Xiaowei Xu, Nurcan Yuruk, Zhidan Feng, and Thomas AJ Schweiger. 2007. Scan: a structural clustering algorithm for networks. In KDD.
- Yoon et al. (2018a) Minji Yoon, Woojeong Jin, and U Kang. 2018a. Fast and Accurate Random Walk with Restart on Dynamic Graphs with Guarantees. In WWW.
- Yoon et al. (2018b) Minji Yoon, Jinhong Jung, and U Kang. 2018b. TPA: Fast, Scalable, and Accurate method for Approximate Random Walk with Restart on Billion Scale Graphs. In ICDE.
Appendix A Supplement
A.1. Experimental Setting
All experiments are carried out on a 3 GHz Intel Core i5 iMac with 16 GB RAM, running OS X 10.13.6. We implemented AnomRank and SedanSpot in C++, and we used the open-source implementation of DenseAlert (https://github.com/kijungs/densealert) provided by the authors of (Shin et al., 2017). To obtain the best trade-off between speed and accuracy, we set the sample size to 50 for SedanSpot and follow the other parameter settings suggested in the original paper (Eswaran and Faloutsos, 2018). For AnomRank, we fix the damping factor and stop the iterations for computing node score vectors once the change in the node score vectors falls below a convergence threshold.
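The stopping rule described above can be sketched as a standard damped power iteration. The damping factor and convergence threshold used in the paper are not reproduced here, so `c=0.85` and `tol=1e-9` below are illustrative stand-ins, not the paper's settings.

```python
import numpy as np

def node_score(A, b, c=0.85, tol=1e-9, max_iter=1000):
    """Sketch of the iterative node-score computation: repeat
    p <- c * A @ p + (1 - c) * b until the L1 change of p drops
    below tol.  A is a column-normalized stochastic matrix and
    b is the starting (restart) vector."""
    p = b.copy()
    for _ in range(max_iter):
        new_p = c * (A @ p) + (1 - c) * b
        if np.abs(new_p - p).sum() < tol:  # convergence check on L1 change
            return new_p
        p = new_p
    return p
```

Since A is column-stochastic, each iteration preserves the total score mass, and the iteration is a contraction, so convergence is guaranteed.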
A.2. Dataset
DARPA (Lippmann et al., 1999) has 4.5M IP-IP communications between 9.4K source IPs and 2.3K destination IPs over 87.7K minutes. Each communication is a directed edge (srcIP, dstIP, timestamp, attack) where the attack label indicates whether the communication is an attack. We aggregate the edges occurring in each hour, resulting in a stream of 1463 graphs. We annotate a graph snapshot as anomalous if it contains at least 50 attack edges; this yields 288 ground-truth anomalies (about 20% of the total). We use the first 256 graphs to initialize the means and variances needed during normalization (as described in Section 4.4).
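The windowed aggregation described above can be sketched as a small helper (`aggregate_snapshots` is our own name; the same routine covers the hourly DARPA windows and the one-day ENRON windows):

```python
from collections import defaultdict

def aggregate_snapshots(edges, window):
    """Group a stream of timestamped edges (src, dst, t) into weighted
    graph snapshots, one per `window` time units.  Each snapshot maps
    an (src, dst) pair to its edge weight (occurrence count)."""
    snapshots = defaultdict(lambda: defaultdict(int))
    for src, dst, t in edges:
        snapshots[t // window][(src, dst)] += 1
    return [snapshots[i] for i in sorted(snapshots)]  # time-ordered list
```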
ENRON (Shetty and Adibi, 2004) contains 50K emails from 151 employees over 3 years in the ENRON Corporation. Each email is a directed edge (sender, receiver, timestamp). We aggregate the edges occurring in each one-day window, resulting in a stream of 1139 graphs. We use the first 256 graphs to initialize the means and variances.
RTM method (Akoglu et al., 2008) generates time-evolving graphs with repeated Kronecker products. We use the publicly available code (www.alexbeutel.com/l/www2013). The generated graph is a directed graph with 1K nodes and 8.1K edges over 2.7K timestamps. We use the first 300 timestamps to initialize the means and variances.
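The core idea of generation by repeated Kronecker products can be sketched as follows. This is the plain stochastic-Kronecker flavor, not RTM's full recursive weighted time-evolving scheme, and the seed matrix values are illustrative.

```python
import numpy as np

def kronecker_graph(seed, k, rng_seed=0):
    """Sketch of graph generation by repeated Kronecker products:
    build an edge-probability matrix P = seed (x) seed (x) ... (k times),
    then sample a directed adjacency matrix from P."""
    P = seed.copy()
    for _ in range(k - 1):
        P = np.kron(P, seed)  # each product multiplies the node count
    rng = np.random.default_rng(rng_seed)
    return (rng.random(P.shape) < P).astype(int)
```

A 2x2 seed with k products yields a 2^k-node graph; repeating the product makes the self-similar community structure typical of Kronecker generators.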
A.3. Proofs
We prove upper bounds on the 1st and 2nd derivatives of ScoreS and ScoreW, showing their effectiveness in detecting AnomalyS and AnomalyW.
Proof of Lemma 5 (Upper bound of the 1st derivative of ScoreS).
For brevity, . By Lemma 1, is presented as follows:
Note that since is a column-normalized stochastic matrix, and is a PageRank vector.
Proof of Lemma 6 (Upper bound of the 2nd derivative of ScoreS).
For brevity, and . In addition, we omit by substituting and during this proof. By Lemma 1, is:
can be viewed as an updated ScoreS with the original adjacency matrix , the update , and the starting vector from an original vector . Then, by Lemma 1, is presented as follows:
Then becomes as follows:
Since , the second term in the equation above is:
Note that since and are PageRank vectors. Recovering from and , and becomes as follows:
Note that and are column-normalized stochastic matrices. Then is bounded as follows:
Then, recovering from all terms, is bounded as follows:
Proof of Lemma 7 (Upper bound of the 1st derivative of ScoreW).
For brevity, denote and . By Lemma 2, is presented as follows:
since is a column-normalized stochastic matrix and is a PageRank vector.
Proof of Lemma 8 (Upper bound of the 2nd derivative of ScoreW).
For brevity, denote and . In addition, we omit the term by substituting and during this proof. By Lemma 2, and are presented as follows:
Subtracting the first term of from the first term of is equal to as shown in Lemma 6. Then is:
By substituting with , the last two terms in the above equation are presented as follows:
In the first equation, we treat as an updated PageRank with the update from an original PageRank , then apply Lemma 1. Note that both and have value since the original expressions with terms are as follows:
and are column-normalized stochastic matrices. Then, recovering from all terms, is bounded as follows: