This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Dynamic Graph Node Classification via Time Augmentation

Jiarui Sun1, Mengting Gu2, Chin-Chia Michael Yeh2, Yujie Fan2, Girish Chowdhary3, Wei Zhang2 {jsun57,girishc}@illinois.edu, {mengu,miyeh,yufan,wzhan}@visa.com 1Work done while at Visa Research. 1Dept. of Electrical and Computer Engineering,
University of Illinois at Urbana, Champaign, Urbana, USA
3Dept. of Computer Science and Dept. of Agricultural and Biological Engineering,
University of Illinois at Urbana, Champaign, Urbana, USA
2Visa Research, Palo Alto, USA
Abstract

Node classification for graph-structured data aims to classify nodes whose labels are unknown. While studies on static graphs are prevalent, few studies have focused on dynamic graph node classification. Node classification on dynamic graphs is challenging for two reasons. First, the model needs to capture both structural and temporal information, particularly on dynamic graphs with a long history and require large receptive fields. Second, model scalability becomes a significant concern as the size of the dynamic graph increases. To address these problems, we propose the Time Augmented Dynamic Graph Neural Network (TADGNN) framework. TADGNN consists of two modules: 1) a time augmentation module that captures the temporal evolution of nodes across time structurally, creating a time-augmented spatio-temporal graph, and 2) an information propagation module that learns the dynamic representations for each node across time using the constructed time-augmented graph. We perform node classification experiments on four dynamic graph benchmarks. Experimental results demonstrate that TADGNN framework outperforms several static and dynamic state-of-the-art (SOTA) GNN models while demonstrating superior scalability. We also conduct theoretical and empirical analyses to validate the efficiency of the proposed method. Our code is available here.

Index Terms:
graph neural network, node classification, dynamic graph

I Introduction

Graph is a ubiquitous data structure that represents relationships between entities. Aiming to classify graph nodes whose labels are unknown, the task of node classification for graph-structured data has recently received increasing attention in various domains, such as biology [1] and social sciences [2]. This is attainable by the recent advance of Graph Neural Networks (GNNs), which generate node representations for classification through a message passing mechanism where each node iteratively aggregates neighborhood information.

Existing GNNs mainly focus on static graphs [3, 4]. However, many real-world graphs are dynamic. Dynamic graphs can generally be categorized into discrete-time dynamic graphs and continuous-time dynamic graphs according to their representations. Discrete-time dynamic graphs use an ordered sequence of graph snapshots, where each snapshot represents aggregated dynamic information within a fixed time interval. Continuous-time dynamic graphs maintain detailed temporal information and are often more complex to model than the discrete case. For both cases, graph structure, node attributes and node labels evolve over time in a complex manner. Performing node classification on such dynamic graphs demands capturing this complicated evolving nature, which not only requires exploring the structural topology but also modeling the temporal aspect, which is absent in static GNN models.

In this work, we focus on node classification on discrete-time dynamic graphs. Because the existing dynamic graph learning methods are inefficient in both time and space as they either rely on recurrent structures[5] or the attention mechanism[6], these methods are not applicable for domains where dynamic graphs with a large number of time steps and nodes inhabit. To this end, we propose a novel GNN framework named Time Augmented Dynamic Graph Neural Network (TADGNN). TADGNN first employs a time augmentation module that constructs a time-augmented spatio-temporal graph based on the original graph snapshots. The time augmentation module efficiently realizes the temporal evolution of nodes across time in a structural sense. An information propagation module is then applied to effectively learn the spatio-temporal information by aggregating node features from time-augmented neighborhoods. Comparing with existing literature, TADGNN achieves a good efficiency in two folds. First, the time augmentation module models temporal graph dynamics without introducing significant computational demand since the time-augmented graph construction does not require learning. In addition, by learning different importance scores for time-augmented neighborhoods of each node once, the dynamic attention mechanism introduced in the information propagation module enhances the model capacity while introducing little to no computational overhead. These advantages make TADGNN both powerful and efficient. We summarize our key contributions as below:

  • A novel GNN framework named TADGNN that works efficiently for node classification problems on dynamic graphs.

  • A theoretical and empirical analysis that validates the training efficiency of TADGNN.

  • A comprehensive set of experiments that demonstrate the effectiveness of TADGNN over SOTA methods.

II Related Work

II-A Node Classification on Static Graphs

Many works address static graph node classification problem by exploiting random walk statistics to optimize a stochastic measure of node similarities [7, 8]. Another line of research relies on graph spectral properties, designing convolutional filters based on the spectral graph theory [9, 10]. One significant work in this domain is Graph Convolutional Network (GCN) [3], which restricts spectral filters to operate on each node’s immediate neighbors, and can be interpreted as spatial aggregation. It also inspires more spatial-based graph convolutional neural network models [11, 4, 12].

II-B Node Classification on Dynamic Graphs

II-B1 Discrete-time Dynamic Graphs

Many existing works utilize recurrent models for discrete-time dynamic graphs to capture the temporal dynamics into hidden states for classification. Some works use separate GNNs to model individual graph snapshot and use RNNs to learn temporal dynamics [5]; some other works integrate GNNs and RNNs together into one layer, aiming to learn the spatial and temporal information concurrently [13]. Sankar et al.  [6] use the self-attention mechanism along both the spatial and temporal dimensions of dynamic graphs, while Xu et al.  [1] combine the self-attention mechanism and RNNs together to learn the spatio-temporal contextual information jointly.

II-B2 Continuous-time Dynamic Graphs

Existing works on continuous-time dynamic graphs include RNN-based methods, temporal walk-based methods and temporal point process-based methods. RNN-based methods perform node updates through recurrent models at fine-grained timestamps [14], and the other two categories incorporate temporal dependencies through temporal random walks and parameterized temporal point processes [15, 16].

III Preliminaries

In this work, we focus on classifying nodes for discrete-time dynamic graphs 𝔾={𝒢1,𝒢2,,𝒢T}{\mathbb{G}}=\{{\mathcal{G}}^{1},{\mathcal{G}}^{2},\ldots,{\mathcal{G}}^{T}\}. Each snapshot is an undirected graph 𝒢t=(𝒱,t,𝒳t){\mathcal{G}}^{t}=({\mathcal{V}},{\mathcal{E}}^{t},{\mathcal{X}}^{t}) at time step tt. 𝒱{\mathcal{V}} denotes the set of all the nodes appeared in 𝔾{\mathbb{G}}, and |𝒱|=N|{\mathcal{V}}|=N is the number of all nodes. t𝒱t×𝒱t{\mathcal{E}}^{t}\subseteq{\mathcal{V}}^{t}\times{\mathcal{V}}^{t} denotes the edge set of the snapshot at time tt. The edge set can also be represented as an adjacency matrix AtN×NA^{t}\in{\mathbb{R}}^{N\times N} where Auvt=1A^{t}_{uv}=1 if (u,v)t(u,v)\in{\mathcal{E}}^{t} otherwise Auvt=0A^{t}_{uv}=0. 𝒳tN×d{\mathcal{X}}^{t}\in{\mathbb{R}}^{N\times d} denotes the node attribute matrix at time step tt where dd is the initial node feature dimension, and 𝒙vtd{\bm{x}}_{v}^{t}\in{\mathbb{R}}^{d} is the feature vector of node vv. 𝒴t{\mathcal{Y}}^{t} denotes the class label associated with each node at time step tt, and 𝒚vt{\bm{y}}_{v}^{t} is the class label of node vv. All nodes have their dynamic labels which evolve over time.

Problem Statement: Given the dynamic graph history 𝔾L={𝒢1,𝒢2,,𝒢t}{\mathbb{G}}_{L}=\{{\mathcal{G}}^{1},{\mathcal{G}}^{2},\ldots,{\mathcal{G}}^{t}\} and the corresponding node labels 𝕐L={𝒴1,{\mathbb{Y}}_{L}=\{{\mathcal{Y}}^{1}, 𝒴2,,𝒴t}{\mathcal{Y}}^{2},\ldots,{\mathcal{Y}}^{t}\} up to time step tt, we aim to classify all the nodes in future snapshots 𝔾U={𝒢t+1,𝒢t+2,,𝒢T}{\mathbb{G}}_{U}=\{{\mathcal{G}}^{t+1},{\mathcal{G}}^{t+2},\ldots,{\mathcal{G}}^{T}\} whose node labels 𝕐U={𝒴t+1,𝒴t+2,,𝒴T}{\mathbb{Y}}_{U}=\{{\mathcal{Y}}^{t+1},{\mathcal{Y}}^{t+2},\ldots,{\mathcal{Y}}^{T}\} are unknown.

Refer to caption
Figure 1: TADGNN model architecture.

IV TADGNN Architecture

In this section, we present our framework, namely Time Augmented Dynamic Graph Neural Network (TADGNN), which is depicted in Fig. 1. We first introduce the proposed time augmentation module. Then, we describe the information propagation module. Finally, we discuss the time complexity of TADGNN and compare it with several SOTA methods.

IV-A Time Augmentation Module

The time augmentation module is designed to model the temporal evolution of nodes across different snapshots in a structural sense. To help explain our method, we define the temporal walk under discrete-time dynamic graph setting.

Definition IV.1 (Temporal Walk).

Given a discrete-time dynamic graph sequence 𝔾={𝒢1,𝒢2,,𝒢T}{\mathbb{G}}=\{{\mathcal{G}}^{1},{\mathcal{G}}^{2},\ldots,{\mathcal{G}}^{T}\}, a temporal walk from v1v_{1} to vk+1v_{k+1} is defined as ((v1,v2,t1),(v2,v3,t2)((v_{1},v_{2},t_{1}),(v_{2},v_{3},t_{2}) ,,(vk,vk+1,tk)),\ldots,(v_{k},v_{k+1},t_{k})) such that (vi,vi+1)ti(v_{i},v_{i+1})\in{\mathcal{E}}^{t_{i}}, 𝒢ti=(𝒱,ti){\mathcal{G}}^{t_{i}}=({\mathcal{V}},{\mathcal{E}}^{t_{i}}) for 1ik1\leq i\leq k, and 1t1t2tkT1\leq t_{1}\leq t_{2}\leq\ldots\leq t_{k}\leq T.

A temporal walk is a time-respected walk that represents natural information flow through dynamic graph, but it is rarely explored in the literature. To model such time-respected behaviors, we design the time augmentation module, which is inspired by the temporal PageRank algorithm [17]. The time augmentation module aims to construct a time-augmented spatio-temporal graph, the walks on which can be considered as temporal walks simulated on the original dynamic graph. In this work, we consider three different realizations of such a time-augmented graph: the full time-augmentation realization, the self-evolution time-augmentation realization, and the disentangled time-augmentation realization. In the first two cases, the information propagation module is directly applied on the constructed time-augmented graph, propagating information structurally and temporally in a joint manner. In the last case, the time-augmented graph is disentangled into a structural graph and a temporal graph, where different information propagation modules are applied on these two separate graphs as a two-stage process.

IV-A1 Full Time-Augmentation Realization

The first realization aims to produce a bijection between all possible walks on the time-augmented graph and all possible temporal walks on the original temporal graph. Formally, we denote the adjacency matrix of the constructed time-augmented graph as 𝑨TN×TN{\bm{A}}\in{\mathbb{R}}^{TN\times TN}. Then, the adjacency matrix of the time-augmented graph can be represented as:

𝑨=[A~1A2A30A~2A300A~3000A~T],{\bm{A}}=\begin{bmatrix}\tilde{A}^{1}&A^{2}&A^{3}&\dots\\ 0&\tilde{A}^{2}&A^{3}&\dots\\ 0&0&\tilde{A}^{3}&\dots\\ \vdots&\vdots&\vdots&\ddots\\ 0&0&0&\tilde{A}^{T}\end{bmatrix}, (1)

where A~tN×N\tilde{A}^{t}\in{\mathbb{R}}^{N\times N} is the corresponding adjacency matrix at time step tt with self-loops added.

IV-A2 Self-Evolution Time-Augmentation Realization

In the self-evolution time-augmentation realization, we restrict the cross-time walks such that only temporally adjacent nodes can be attended by their historical versions. Formally, let vtv^{t} and vt+1v^{t+1} denote node vv at time tt and t+1t+1. We only construct edge (vt,vt+1)(v^{t},v^{t+1}). By doing so, all temporal walks can be realized by performing random walk on the time-augmented graph as it can visit a node’s neighbor at different time steps by visiting itself at different time steps first. With previous notation, the adjacency matrix of the time-augmented graph can be represented as:

𝑨=[A~1I00A~2I00A~3000A~T],{\bm{A}}=\begin{bmatrix}\tilde{A}^{1}&I&0&\dots\\ 0&\tilde{A}^{2}&I&\dots\\ 0&0&\tilde{A}^{3}&\dots\\ \vdots&\vdots&\vdots&\ddots\\ 0&0&0&\tilde{A}^{T}\end{bmatrix}, (2)

where IN×NI\in{\mathbb{R}}^{N\times N} is the identity matrix modeling node version updates across time. It is important to note that this realization of the time-augmented graph is relatively simplified and imposes trivial overhead in model space complexity.

IV-A3 Disentangled Time-Augmentation Realization

In the third case where we aim to completely disentangle structural and temporal modeling, we decouple the self-evolution time-augmentation case. Instead of considering one time-augmented graph 𝑨{\bm{A}}, we decouple it into 𝑨sTN×TN{\bm{A}}_{s}\in{\mathbb{R}}^{TN\times TN} and 𝑨tTN×TN{\bm{A}}_{t}\in{\mathbb{R}}^{TN\times TN} as below:

𝑨s=[A~1000A~2000A~3000A~T],𝑨t=[II00II00I000I].{\bm{A}}_{s}=\begin{bmatrix}\tilde{A}^{1}&0&0&\dots\\ 0&\tilde{A}^{2}&0&\dots\\ 0&0&\tilde{A}^{3}&\dots\\ \vdots&\vdots&\vdots&\ddots\\ 0&0&0&\tilde{A}^{T}\end{bmatrix},{\bm{A}}_{t}=\begin{bmatrix}I&I&0&\dots\\ 0&I&I&\dots\\ 0&0&I&\dots\\ \vdots&\vdots&\vdots&\ddots\\ 0&0&0&I\end{bmatrix}. (3)
TABLE I: Space and time complexity comparison between TADGNN and other baselines.
Model Type Space Complexity Time Complexity Sequential Ops.
TADGNN O(ET+LF2+LTNF)O(ET+LF^{2}+LTNF) O(LETF+LTNF2)O(LETF+LTNF^{2}) O(1)O(1)
DySAT O(LET+LF2+LTNF+LNT2)O(LET+LF^{2}+LTNF+LNT^{2}) O(LETF+LTNF2+LNT2F)O(LETF+LTNF^{2}+LNT^{2}F) O(1)O(1)
EvolveGCN O(ET+LTF2+LTNF)O(ET+LTF^{2}+LTNF) O(LETF+LTNF2)O(LETF+LTNF^{2}) O(T)O(T)

IV-B Information Propagation Module

The information propagation module aims to capture both the structural and temporal properties of the dynamic graph via aggregating information from each node’s time-augmented neighborhoods. We use 𝒱ta{\mathcal{V}}_{ta}, ta{\mathcal{E}}_{ta} to denote the node set and edge set of the time-augmented graph respectively, and |𝒱ta|=N×T|{\mathcal{V}}_{ta}|=N\times T. Thus, the input to the information propagation module is the initial node representations: {𝒙vtd,vt𝒱ta}\{{\bm{x}}_{v}^{t}\in{\mathbb{R}}^{d},\forall v^{t}\in{\mathcal{V}}_{ta}\}. For convenience, we drop the superscript on node feature vectors and nodes as 𝒙v{\bm{x}}_{v} and vv.

IV-B1 Adaptive Transition Module

First, inspired by [18], we employ the dynamic attention mechanism to compute the adaptive information transition matrix, which allows different importance of nodes to be learned. Formally, the adaptive graph transition matrix 𝑨^TN×TN\hat{{\bm{A}}}\in{\mathbb{R}}^{TN\times TN} is computed as:

euv=𝒂Tσatt(𝑨uv([𝚯L,𝚯R][𝒙u||𝒙v])),e_{uv}={{\bm{a}}}^{T}\sigma_{att}\left({\bm{A}}_{uv}\cdot\left(\left[{\bm{\Theta}}_{L},{\bm{\Theta}}_{R}\right]\left[{\bm{x}}_{u}||{\bm{x}}_{v}\right]\right)\right), (4)
αuv=exp(euv)u𝒩vexp(euv),\alpha_{uv}=\frac{\exp(e_{uv})}{\sum_{u\in{\mathcal{N}}_{v}}\exp(e_{uv})}, (5)
𝑨^uvT=αuv,\hat{{\bm{A}}}^{T}_{uv}=\alpha_{uv}, (6)

where 𝚯LF×d{\bm{\Theta}}_{L}\in{\mathbb{R}}^{F\times d} and 𝚯RF×d{\bm{\Theta}}_{R}\in{\mathbb{R}}^{F\times d} are the adaptive weight matrices applied on initial node features, [,][\cdot,\cdot] is the horizontal concatenation of two matrix, [||][\cdot||\cdot] is the vector concatenation operation, 𝒂F{\bm{a}}\in{\mathbb{R}}^{F} is a learnable weight vector for attention calculation, σatt()\sigma_{att}(\cdot) is the LeakyReLU activation, 𝒩v={u𝒱ta:(u,v)ta}{\mathcal{N}}_{v}=\{u\in{\mathcal{V}}_{ta}:(u,v)\in{\mathcal{E}}_{ta}\} denotes the immediate neighbor set of node vv on the time-augmented graph, and αuv\alpha_{uv} denotes the normalized attention coefficient of link (u,v)(u,v). For the disentangled time-augmentation case, the adaptive structural and temporal transition matrices 𝑨^sTN×TN\hat{{\bm{A}}}_{s}\in{\mathbb{R}}^{TN\times TN} and 𝑨^tTN×TN\hat{{\bm{A}}}_{t}\in{\mathbb{R}}^{TN\times TN} are computed separately following the exact same procedures but with different parameters. We would like to emphasize that the dynamic attention mechanism we introduced does not impose quadratic computational complexity with respect to the number of snapshots as previously mentioned attention-based models [6] since the attention scores are computed structurally for time-augmented neighborhoods.

IV-B2 Message Passing Module

In order to capture long range temporal and structural dependencies on the time-augmented graph, deep architectures are necessary. However, it is well-known that some GNN models such as GCN [3] and GAT [4] demonstrate significant performance degradation when the number of layers increases [19]. Thus, we adapt the message passing mechanism from GCNII [20] which relieves such over-smoothing issue through initial residual and identity mapping techniques. Formally, we denote the number of stacked information propagation layers as LL and the node representation output from the lthl^{th} layer for the time-augmented graph as {𝒉vlF,v𝒱}\{{\bm{h}}_{v}^{l}\in{\mathbb{R}}^{F},\forall v\in{\mathcal{V}}\} where FF is the node representation dimension. The input is either the initial encoded or intermediate node representations: {𝒉vl1F,v𝒱}\{{\bm{h}}_{v}^{l-1}\in{\mathbb{R}}^{F},\forall v\in{\mathcal{V}}\}. When l=1l=1, i.e., at the first layer, we have 𝒉v0=𝚯R𝒙v{\bm{h}}_{v}^{0}={\bm{\Theta}}_{R}{\bm{x}}_{v}. Compactly, we have 𝑯lTN×F{\bm{H}}^{l}\in{\mathbb{R}}^{TN\times F}. Then, we can define the message passing mechanism as:

𝑯l+1=σip(\displaystyle{\bm{H}}^{l+1}=\sigma_{ip}\Big{(} ((1αl)𝑨^𝑯l+αl𝑯0)\displaystyle\big{(}(1-\alpha_{l})\hat{{\bm{A}}}{\bm{H}}^{l}+\alpha_{l}{\bm{H}}^{0}\big{)}
((1βl)𝑰+βl𝑾l)),\displaystyle\big{(}(1-\beta_{l}){\bm{I}}+\beta_{l}{\bm{W}}^{l}\big{)}\Big{)}, (7)

where 𝑾lF×F{\bm{W}}^{l}\in{\mathbb{R}}^{F\times F} is the weight matrix, 𝑰F×F{\bm{I}}\in{\mathbb{R}}^{F\times F} is the identity matrix, σip\sigma_{ip} is the ReLU activation, αl\alpha_{l} and βl\beta_{l} are two hyperparameters adopting the same practice of GCNII [20]. We also adapt GCNII variant, whose message passing mechanism is defined as:

𝑯l+1=σip(\displaystyle{\bm{H}}^{l+1}=\sigma_{ip}\Big{(} (1αl)𝑨^𝑯l((1βl)𝑰+βl𝑾1l)+\displaystyle(1-\alpha_{l})\hat{{\bm{A}}}{\bm{H}}^{l}\big{(}(1-\beta_{l}){\bm{I}}+\beta_{l}{\bm{W}}_{1}^{l}\big{)}+
αl𝑯0((1βl)𝑰+βl𝑾2l)),\displaystyle\alpha_{l}{\bm{H}}^{0}\big{(}(1-\beta_{l}){\bm{I}}+\beta_{l}{\bm{W}}_{2}^{l}\big{)}\Big{)}, (8)

where different weights 𝑾1l,𝑾2l{\bm{W}}_{1}^{l},{\bm{W}}_{2}^{l} are used for aggregated node representations 𝑨^𝑯l\hat{{\bm{A}}}{\bm{H}}^{l} and initial residual representations 𝑯0{\bm{H}}^{0}. Again, for the disentangled time-augmentation case, we compute the node representations 𝑯{\bm{H}} by first utilizing structural transition matrix 𝑨^s\hat{{\bm{A}}}_{s} to capture structural properties and then using temporal transition matrix 𝑨^t\hat{{\bm{A}}}_{t} to learn temporal dynamics, with different weight matrices 𝑾sl{\bm{W}}_{s}^{l} and 𝑾tl{\bm{W}}_{t}^{l}. With 𝑨s,𝑨t{\bm{A}}_{s},{\bm{A}}_{t} as the structural and temporal graph respectively, we first use one information propagation module for the structural graph 𝑨s{\bm{A}}_{s}. After we obtain 𝑯s{\bm{H}}_{s} which summarizes structural information, we apply the second information propagation module guided by the temporal graph 𝑨t{\bm{A}}_{t}, with 𝑯s{\bm{H}}_{s} as the initial node embedding input. Finally, we use 𝑯L{\bm{H}}^{L} that summarizes both structural and temporal dynamics for the dynamic node classification task.

IV-C Learning Algorithm

The outputs of the information propagation module 𝑯L{\bm{H}}^{L} are then feed into a multilayer perceptron (MLP) decoder which converts node representations to class logits, and are optimized to classify nodes in 𝔾L{\mathbb{G}}_{L} correctly as following:

=v𝔾LJ(yv,y^v),y^v=MLP(𝒉vL),{\mathcal{L}}=\sum_{v\in{\mathbb{G}}_{L}}J(y_{v},\hat{y}_{v}),\hat{y}_{v}=\mathrm{MLP}({\bm{h}}_{v}^{L}), (9)

where J()J(\cdot) measures weighted cross-entropy loss between groundtruth yvy_{v} and predicted score y^v\hat{y}_{v}. The loss weights are calculated based on class distribution from training partition.

IV-D Complexity Analysis

In this section, we compare TADGNN’s space and time complexity with DySAT and EvolveGCN, which can be considered as the representatives of attention-based and RNN-based dynamic graph learning models. The space and time complexity for each method is shown in Table I. From the table, we can observe that comparing with both DySAT and EvolveGCN, TADGNN has the lowest space complexity since DySAT is dominated by O(LNT2)O(LNT^{2}) term and EvolveGCN is dominated by O(LTF2)O(LTF^{2}) term. In practice, memory space is a limiting factor for DySAT and EvolveGCN when NN and LL are necessarily large. From the temporal perspective, the overall time complexity of TADGNN is the lowest among the select baselines. DySAT’s time complexity includes a T2T^{2} term that makes it inefficient when modeling dynamic graphs with a large TT. As an RNN-incorporated GNN model, EvolveGCN has sequential operation dependence, which makes it infeasible to be processed in parallel and makes its practical training time significantly slower than purely convolution-based methods.

V Experiments

In this section, we evaluate the effectiveness of TADGNN for dynamic node classification task on four real-world datasets by comparing with four SOTA baselines.

V-A Datasets

TABLE II: Dataset statistics
Datasets Nodes Edges Timestamps Classes Split
Wiki 9,227 2,833 11 2 4/3/4
Reddit 10,984 18,928 11 2 4/3/4
ML-Rating 9,746 90,928 11 5 4/3/4
ML-Genre 9,704 90,334 11 15 4/3/4

We use four real-world dynamic graph datasets to conduct experiments, including two social networks, Wiki[21] and Reddit[22] and two rating networks, ML-Rating[23] and ML-Genre[23]. All datasets are sliced into graph snapshot sequences. Each snapshot contains information during fixed time intervals based on the timestamps provided in the raw data. We also make sure that each snapshot contains sufficient interactions/links between nodes. Note that though these datasets are not attributed due to the difficulty of acquiring public attributed dynamic graphs, TADGNN is designed for attributed dynamic graphs. As such, we use one-hot encoding of node IDs as node features in our experiments. Detailed dataset statistics are summarized in Table II.

V-B Experimental Setup

We select four SOTA node classification algorithms, including GAT [4], GCNII [20], EvolveGCN [13] and DySAT [6] to conduct model evaluation. We use PyTorch to implement TADGNN along with all other baselines. We set the output feature dimension of the information propagation module to 128128. Adam [24] is used as the optimizer with learning rate of 0.010.01 along with weight decay of 0.00050.0005 as regularization to train all models for 200200 epochs in all experiments. In particular, for TADGNN, we use validation set performance to tune the hyper-parameters and select the best time augmentation module type from three realizations. For baselines, we select hyper-parameters following the papers’ tuning guidelines. We report the averaged results along with corresponding standard deviations of five runs. All models share the same MLP decoder architecture with dropout ratio 0.30.3 to covert dynamic node representations to class logits.

V-C Node Classification Experiments

In this section, we describe the conducted experiments and report the results together with the observed insights.

V-C1 Task Description

Dynamic node classification task is used to evaluate TADGNN’s effectiveness compared with other baselines. All models are trained based on the training partition 𝔾L={𝒢1,𝒢2,,𝒢t}{\mathbb{G}}_{L}=\{{\mathcal{G}}^{1},{\mathcal{G}}^{2},\ldots,{\mathcal{G}}^{t}\} with label information. The task is to predict node labels in 𝔾U={𝒢t+1,𝒢t+2,,𝒢T}{\mathbb{G}}_{U}=\{{\mathcal{G}}^{t+1},{\mathcal{G}}^{t+2},\ldots,{\mathcal{G}}^{T}\} by using the trained model with the entire dynamic graph 𝔾={𝒢1,𝒢2,,𝒢T}{\mathbb{G}}=\{{\mathcal{G}}^{1},{\mathcal{G}}^{2},\ldots,{\mathcal{G}}^{T}\} as input. Note that TADGNN is inductive since it is agnostic to number of input graph snapshots.

V-C2 Experiment Setting

Each dataset is sliced into a discrete graph snapshot sequence where each snapshot corresponds to a fixed time interval that contains a sufficient number of links. In each set of experiments, the first tt snapshots are used for model training. After training, we use the next tt^{\prime} snapshots for validation, and the rest TttT-t-t^{\prime} snapshots for testing. For all train/validation/test stages, TADGNN processes input graph snapshots all at once. We show the train/validation/test split statistics in Table II. Weighted cross-entropy loss is used as the objective function, that class weights obtained from training partition are used to relieve data imbalance issue. The original DySAT model is temporally transductive as it cannot easily incorporate new graph snapshots for testing. Thus, we modify its position embeddings [25] to sine and cosine positional encodings as in Transformer [26]. For the static baselines, we employ one shared model across all snapshots. All models are trained end-to-end.

V-C3 Evaluation Metric

Since label information for all four datasets are highly imbalanced, we select Area Under the Receiver Operating Characteristic Curve (AUC) metric to measure performance of different models. We use macro-AUC scores for evaluation. Macro-AUC is computed by treating performances from all classes across all time steps (i.e., snapshots) equally, which is desirable for the case of dynamic node classification.

TABLE III: Dynamic node classification macro-AUC results with std.
Model Wiki Reddit ML-Rating ML-Genre
GAT 63.1±2.863.1\pm 2.8 62.9±1.062.9\pm 1.0 67.8±3.067.8\pm 3.0 81.2±2.6\mathbf{81.2\pm 2.6}
GCNII 67.5±1.6¯\underline{67.5\pm 1.6} 61.1±1.461.1\pm 1.4 71.5±3.7¯\underline{71.5\pm 3.7} 62.1±1.262.1\pm 1.2
DySAT 54.3±4.354.3\pm 4.3 66.5±0.3¯\underline{66.5\pm 0.3} 53.9±5.853.9\pm 5.8 51.9±1.851.9\pm 1.8
EvolveGCN 58.9±2.158.9\pm 2.1 62.2±1.662.2\pm 1.6 61.3±2.561.3\pm 2.5 51.1±2.551.1\pm 2.5
TADGNN 69.2±1.8\mathbf{69.2\pm 1.8} 66.7±1.6\mathbf{66.7\pm 1.6} 73.7±0.2\mathbf{73.7\pm 0.2} 75.0±1.5¯\underline{75.0\pm 1.5}
TABLE IV: Ablation study of TADGNN .
Components Perf.
Time Augmentation Adaptive Transition Reddit ML-Rating
61.4±2.861.4\pm 2.8 71.2±0.671.2\pm 0.6
63.8±0.863.8\pm 0.8 70.9±1.370.9\pm 1.3
66.7±1.666.7\pm 1.6 73.7±0.273.7\pm 0.2

V-C4 Results and Discussion

We show the macro-AUC results in Table III. Our observations include:

  • TADGNN achieves superior performance on most of the datasets. This indicates that TADGNN can better capture both structural and temporal graph dynamics comparing to other methods. On one dataset (ML-Genre) GAT performed well and has better result, but its performances have larger variance on different datasets. In addition,  TADGNN is more efficient in space and time complexity, and can be applied in more scenarios especially where the dynamic graphs are large.

  • Other dynamic baselines have inferior performance on certain datasets comparing to static methods. Besides, these dynamic baselines tend to keep larger variances for most datasets, which suggests that TADGNN is more robust to random weight initialization. The results from our hyper-parameter search and analysis further indicate that the performance of these methods can be sensitive to hyper-parameter values; however, large time cost and space cost limit their potentials. This further suggests the importance of using temporal information efficiently to guide the dynamic graph node classification problem.

V-D Efficiency Comparison

Refer to caption
Figure 2: Efficiency comparison among TADGNN, DySAT, and EvolveGCN.

In this section, we empirically demonstrate the efficiency advantage of TADGNN. Specifically, we compare our model to EvolveGCN and DySAT on the average training time per epoch using different number of training snapshots. These two methods are chosen since they represent widely used RNN-based and attention-based dynamic graph learning models. For all three models, while keeping all common settings (i.e., learning rate) the same, we employ the best performing hyper-parameter setups. In particular, we compute the training time used per epoch averaged across 200200 epochs from 22 to 88 snapshots as training partition across all four datasets.

The efficiency comparison is shown in Fig. 2. The results are expected, as the training time of TADGNN demonstrates a small increasing rate with respect to the number of training snapshots, while both DySAT and EvolveGCN depict much larger increasing rate as the number of training snapshots increases, due to the self-attention mechanism and sequential operation in RNN respectively. This empirical result confirms the efficiency advantage of TADGNN over other deep dynamic graph learning methods. More importantly, as the number of training snapshots and number of layers increase, both DySAT and EvolveGCN quickly fill up most of the GPU memories, thus hardly scalable to longer sequences or multi-layer setups, due to the extra memory requirements as discussed in Section IV-D. In contrast, TADGNN takes much less memory even when much more layers inhabit, which allows TADGNN to capture long range temporal and structural properties when necessary. This empirical result validates our theoretical complexity analysis, demonstrating better efficiency of TADGNN, that it is powerful in modeling large dynamic graph datasets.

V-E Ablation Study

We conduct an ablation study to investigate how the time augmentation and information propagation modules affect TADGNN’s modeling ability. Specifically, we prepare three TADGNN variants, i.e., 1) disable both the time augmentation and adaptive transition modules; 2) only disable the adaptive transition module; and 3) the full TADGNN setup, and observe how disabling of different components affect the model performance. We select two datasets (Reddit and ML-Rating) to cover different types of dynamic graphs. We summarize our observations as below:

  • Both the time augmentation and adaptive transition modules are vital in temporal dynamics modeling, as they help TADGNN achieve the best node classification results among different setups. In particular, we observe that our designed modules significantly boost TADGNN performance on Reddit. The results suggest that temporal information is imperative to address dynamic node classification task.

  • Coupling the time augmentation and adaptive transition modules together helps TADGNN reduce performance variance. We observe that enabling the time augmentation module only for ML-Rating increases performance variance significantly; on the contrary, though enabling time augmentation module only for Reddit reduces performance variance to the minimum, the performance can be further improved by introducing the adaptive learning module. This indicates the importance of distinguishing different levels of importance of node’s time-augmented neighborhoods.

VI Conclusion

Node classification on dynamic graphs has been gaining considerable attention. In this paper, we propose the Time Augmented Dynamic Graph Neural Network (TADGNN) framework for discrete-time dynamic graph node classification. TADGNN consists of two modules: 1) a time augmentation module for converting the dynamic graph to a time-augmented spatio-temporal graph and 2) an information propagation module to process the time-augmented spatio-temporal graph. Evaluations are performed on four real-world dynamic graph benchmarks. Experimental results illustrate that TADGNN outperforms SOTA GNN models in terms of both accuracy and efficiency.

References

  • [1] D. Xu, W. Cheng, D. Luo, X. Liu, and X. Zhang, “Spatio-temporal attentive RNN for node classification in temporal attributed graphs,” in IJCAI, 2019, pp. 3947–3953.
  • [2] E. Rossi, B. Chamberlain, F. Frasca, D. Eynard, F. Monti, and M. M. Bronstein, “Temporal graph networks for deep learning on dynamic graphs,” CoRR, vol. abs/2006.10637, 2020.
  • [3] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in ICLR, 2017.
  • [4] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, “Graph attention networks,” in ICLR, 2018.
  • [5] F. Manessi, A. Rozza, and M. Manzo, “Dynamic graph convolutional networks,” Pattern Recognition, vol. 97, 2020.
  • [6] A. Sankar, Y. Wu, L. Gou, W. Zhang, and H. Yang, “Dysat: Deep neural representation learning on dynamic graphs via self-attention networks,” in WSDM.   ACM, 2020, pp. 519–527.
  • [7] A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in KDD.   ACM, 2016, pp. 855–864.
  • [8] B. Perozzi, R. Al-Rfou, and S. Skiena, “Deepwalk: online learning of social representations,” in KDD.   ACM, 2014, pp. 701–710.
  • [9] R. Levie, F. Monti, X. Bresson, and M. M. Bronstein, “Cayleynets: Graph convolutional neural networks with complex rational spectral filters,” IEEE Trans. Signal Process., vol. 67, no. 1, pp. 97–109, 2019.
  • [10] F. M. Bianchi, D. Grattarola, L. Livi, and C. Alippi, “Graph neural networks with convolutional ARMA filters,” CoRR, vol. abs/1901.01343, 2019.
  • [11] W. L. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in NeurIPS, 2017, pp. 1024–1034.
  • [12] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks?” in ICLR, 2019.
  • [13] A. Pareja, G. Domeniconi, J. Chen, T. Ma, T. Suzumura, H. Kanezashi, T. Kaler, T. B. Schardl, and C. E. Leiserson, “Evolvegcn: Evolving graph convolutional networks for dynamic graphs,” in AAAI, 2020.
  • [14] S. Kumar, X. Zhang, and J. Leskovec, “Predicting dynamic embedding trajectory in temporal interaction networks,” in KDD.   ACM, 2019.
  • [15] G. H. Nguyen, J. B. Lee, R. A. Rossi, N. K. Ahmed, E. Koh, and S. Kim, “Continuous-time dynamic network embeddings,” in WWW.   ACM, 2018, pp. 969–976.
  • [16] R. Trivedi, M. Farajtabar, P. Biswal, and H. Zha, “Dyrep: Learning representations over dynamic graphs,” in ICLR, 2019.
  • [17] P. Rozenshtein and A. Gionis, “Temporal pagerank,” in ECML PKDD, vol. 9852.   Springer, 2016, pp. 674–689.
  • [18] S. Brody, U. Alon, and E. Yahav, “How attentive are graph attention networks?” CoRR, vol. abs/2105.14491, 2021.
  • [19] Q. Li, Z. Han, and X. Wu, “Deeper insights into graph convolutional networks for semi-supervised learning,” in AAAI, 2018, pp. 3538–3545.
  • [20] M. Chen, Z. Wei, Z. Huang, B. Ding, and Y. Li, “Simple and deep graph convolutional networks,” in ICML, 2020.
  • [21] “Wikipedia edit history dump,” https://meta.wikimedia.org/wiki/Data_dumps.
  • [22] “Reddit data dump,” http://files.pushshift.io/reddit/.
  • [23] F. M. Harper and J. A. Konstan, “The movielens datasets: History and context,” ACM Trans. Interact. Intell. Syst., vol. 5, no. 4, dec 2015.
  • [24] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in ICLR, 2015.
  • [25] J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin, “Convolutional sequence to sequence learning,” in ICML, vol. 70.   PMLR, 2017, pp. 1243–1252.
  • [26] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in NeurIPS, 2017, pp. 5998–6008.