
Z-GCNETs: Time Zigzags at Graph Convolutional Networks
for Time Series Forecasting

Yuzhou Chen    Ignacio Segovia-Dominguez    Yulia R. Gel
Abstract

There recently has been a surge of interest in developing a new class of deep learning (DL) architectures that integrate an explicit time dimension as a fundamental building block of learning and representation mechanisms. In turn, many recent results show that topological descriptors of the observed data, encoding information on the shape of the dataset in a topological space at different scales, that is, persistent homology of the data, may contain important complementary information, improving both performance and robustness of DL. As convergence of these two emerging ideas, we propose to enhance DL architectures with the most salient time-conditioned topological information of the data and introduce the concept of zigzag persistence into time-aware graph convolutional networks (GCNs). Zigzag persistence provides a systematic and mathematically rigorous framework to track the most important topological features of the observed data that tend to manifest themselves over time. To integrate the extracted time-conditioned topological descriptors into DL, we develop a new topological summary, zigzag persistence image, and derive its theoretical stability guarantees. We validate the new GCNs with a time-aware zigzag topological layer (Z-GCNETs), in application to traffic forecasting and Ethereum blockchain price prediction. Our results indicate that Z-GCNET outperforms 13 state-of-the-art methods on 4 time series datasets.


1 Introduction

Many real-world phenomena are intrinsically dynamic by nature, and ideally, neural networks encoding knowledge about the world should also be based on more explicit time-conditioned representation and learning mechanisms. However, most currently available deep learning (DL) architectures are inherently static and do not systematically integrate the time dimension into the learning process. As a result, such model architectures often cannot reliably, accurately, and in a timely manner learn many salient time-conditioned characteristics of complex interdependent systems, often resulting in outdated decisions and requiring frequent model updates.

In turn, in the last few years we have observed an increasing interest in integrating deep neural network architectures with persistent homology representations of the learned objects, typically in the form of a topological layer in DL (Hofer et al., 2019; Carrière et al., 2020; Carlsson & Gabrielsson, 2020). Such persistent homology representations allow us to extract and learn descriptors of the object's shape. (By shape here we broadly understand data characteristics that are invariant under continuous transformations such as bending, stretching, and compressing.) Such interest in combining persistent homology representations with DL is explained by the complementary multi-scale information that topological descriptors deliver about the underlying objects and by the higher robustness of these salient object characterizations to perturbations.

Here we take the first step toward merging the two directions. To enhance DL with the most salient time-conditioned topological information, we introduce the concept of zigzag persistence into time-aware DL. Building on fundamental results on quiver representations, zigzag persistence studies properties of topological spaces which are connected via inclusions going in both directions (Carlsson & Silva, 2010; Tausz & Carlsson, 2011; Carlsson, 2019). Such a generalization of ordinary persistent homology allows us to track topological properties of time-conditioned objects by extracting and tracking salient time-aware topological features through time-ordered inclusions. We propose to summarize the extracted time-aware zigzag persistence in the form of zigzag persistence images and then integrate the resulting information as a learnable time-aware zigzag layer into a GCN.

The key novelty of our paper can be summarized as follows:

  • This is the first approach bridging time-conditioned DL with time-aware persistent homology representations of the data.

  • We propose a new vectorized summary for time-aware persistence, namely, the zigzag persistence image, and discuss its theoretical stability guarantees.

  • We introduce the concepts of time-aware zigzag persistence into learning time-conditioned graph structures and develop a zigzag topological layer (Z-GCNET) for time-aware graph convolutional networks (GCNs).

  • Our experiments on applying Z-GCNETs to traffic forecasting and Ethereum blockchain price prediction show that Z-GCNETs surpass 13 state-of-the-art methods on 4 benchmark datasets, both in terms of accuracy and robustness.

2 Related Work

Zigzag Persistence is still an emerging tool in applied topological data analysis, but many recent studies have already shown its high utility in such diverse applications as brain science (Chowdhury et al., 2018), imagery classification (Adams et al., 2020), cyber-security of mobile sensor networks (Adams & Carlsson, 2015; Gamble et al., 2015), and the characterization of flocking and swarming behavior in the biological sciences (Corcoran & Jones, 2017; Kim et al., 2020). An alternative to zigzag persistence but a closely related approach to assessing properties of time-varying data with persistent homology, namely crocker stacks, has recently been suggested by Xian et al. (2020), though crocker stack representations are not learnable in DL models. While zigzag persistence has been studied in conjunction with dynamical systems (Tymochko et al., 2020) and time-evolving point clouds (Corcoran & Jones, 2017), until now the utility of zigzag persistence has remained untapped not only in conjunction with GCNs but with any other DL tools.

Time series forecasting From a deep learning perspective, Recurrent Neural Networks (RNNs) are natural methods to model time-dependent datasets (Yu et al., 2019). In particular, the stable architecture of Long Short-Term Memory (LSTM) networks, and their variant, the Gated Recurrent Unit (GRU), solves the gradient instability of their predecessors and adds extra flexibility thanks to memory storage and forget gates. The ability of LSTM and GRU to selectively learn historical patterns has awakened interest among researchers and major companies in solving a variety of time-dependent machine learning problems (Chae et al., 2018; Schmidhuber; Yuan et al., 2019; Shin & Kim, 2020). Although most variants of the LSTM architecture perform similarly well in large-scale studies (Greff et al., 2017), GRU models have fewer parameters and, in general, perform comparably to LSTM (Gao et al., 2020). Applications of RNNs are limited by the underlying structure of the input data: these methods are not designed to handle data from non-Euclidean spaces, such as graphs and manifolds.

Graph convolutional networks To overcome the limitations of traditional convolution on graph-structured data, graph convolution-based methods (Defferrard et al., 2016; Kipf & Welling, 2017; Veličković et al., 2018) have been proposed to explore both global and local structures. GCNs usually consist of graph convolution layers which extract edge characteristics between neighboring nodes and aggregate feature information from the neighborhood via graph filters. In addition to convolution, there has been a surge of interest in applying GCNs to time series forecasting tasks (Yu et al., 2018b; Yao et al., 2018; Yan et al., 2018; Guo et al., 2019; Weber et al., 2019; Pareja et al., 2020). Although these methods have achieved state-of-the-art performance in traffic flow forecasting, human action recognition, and anti-money laundering regulation, the design of spatial-temporal graph convolutional network frameworks is mostly based on modeling spatial-temporal correlations at the feature level and on a pre-defined graph structure.

3 Time-Aware Topological Signatures of Graphs

Spatio-temporal Data as Graph Structures Spatial-temporal networks can be represented as a sequence of discrete snapshots, $\{\mathcal{G}_{1},\mathcal{G}_{2},\dots,\mathcal{G}_{T}\}$, where $\mathcal{G}_{t}=\{\mathcal{V},\mathcal{E}_{t},W_{t}\}$ represents the graph structure at time step $t$, $t=1,\ldots,T$. In the spatial network $\mathcal{G}_{t}$, $\mathcal{V}$ is a node set with cardinality $|\mathcal{V}|=N$ and $\mathcal{E}_{t}\subseteq\mathcal{V}\times\mathcal{V}$ is an edge set. A nonnegative symmetric $N\times N$ matrix $W_{t}$ with entries $\{\omega^{t}_{ij}\}_{1\leq i,j\leq N}$ represents the adjacency matrix of $\mathcal{G}_{t}$, that is, $\omega^{t}_{ij}>0$ for any $e^{t}_{ij}\in\mathcal{E}_{t}$ and $\omega^{t}_{ij}=0$ otherwise. Let $F\in\mathbb{Z}_{>0}$ be the number of node features associated with each node $v\in\mathcal{V}$. Then, an $N\times F$ feature matrix $\boldsymbol{X}_{t}$ serves as an input to the time series modeling framework.

Background on Persistent Homology Persistent homology is a mathematical machinery to extract the intrinsic shape properties of $\mathcal{G}$ that are invariant under continuous transformations such as bending, stretching, and twisting. The key idea is, based on some appropriate scale parameter, to associate $\mathcal{G}$ with a graph filtration $\mathcal{G}^{1}\subseteq\ldots\subseteq\mathcal{G}^{n}=\mathcal{G}$ and then to equip each $\mathcal{G}^{i}$ with an abstract simplicial complex $\mathscr{C}(\mathcal{G}^{i})$, $1\leq i\leq n$, yielding a filtration of complexes $\mathscr{C}(\mathcal{G}^{1})\subseteq\ldots\subseteq\mathscr{C}(\mathcal{G}^{n})$. We can then systematically and efficiently track the evolution of various patterns such as connected components, cycles, and voids throughout this hierarchical sequence of complexes. Each topological feature, or $p$-hole (e.g., a connected component or void), $0\leq p\leq d$, is represented by a unique pair $(i_{b},j_{d})$, where the birth $i_{b}$ and death $j_{d}$ are the scale parameters at which the feature first appears and disappears, respectively. The lifespan of the feature is defined as $j_{d}-i_{b}$. The extracted topological information can then be summarized as a persistence diagram ${\rm{Dgm}}=\{(i_{b},j_{d})\in\mathbb{R}^{2}\,|\,i_{b}<j_{d}\}$. The multiplicity of a point $(i_{b},j_{d})\in{\rm{Dgm}}$ is the number of $p$-dimensional topological features ($p$-holes) that are born at $i_{b}$ and die at $j_{d}$. Points on the diagonal of ${\rm{Dgm}}$ are taken with infinite multiplicity. The idea is then to evaluate topological features that persist (i.e., have a longer lifespan) over the complex filtration and, hence, are likelier to contain important structural information on the graph.

Finally, a filtration of the weighted graph $\mathcal{G}$ can be constructed in multiple ways. For instance, (i) we can select the edge weight as a scale parameter and, as an abstract simplicial complex $\mathscr{C}$ on $\mathcal{G}$, consider a Vietoris–Rips (VR) complex $\mathscr{VR}^{\nu_{*}}(\mathcal{G})=\{\mathcal{G}^{\prime}\subseteq\mathcal{G}\,|\,diam(\mathcal{G}^{\prime})\leq\nu_{*}\}$, that is, $\mathscr{VR}^{\nu_{*}}(\mathcal{G})$ consists of subgraphs whose nodes are within a shortest weighted path of at most $\nu_{*}$ of each other. Hence, for a set of scale thresholds $\nu_{1}\leq\ldots\leq\nu_{n}$, we obtain a VR filtration $\mathscr{VR}^{1}\subseteq\ldots\subseteq\mathscr{VR}^{n}$. Alternatively, (ii) we can consider a sublevel filtration induced by a continuous function $f$ defined on the nodes of $\mathcal{G}$. Let $f:V\rightarrow\mathbb{R}$ and $\nu_{1}<\nu_{2}<\ldots<\nu_{n}$ be a sequence of sorted filtration values; then $\mathscr{C}^{i}=\{\sigma\in\mathscr{C}:\max_{v\in\sigma}f(v)\leq\nu_{i}\}$. Note that a VR filtration (i) is a subcase of a sublevel filtration (ii) with $f$ being the diameter function (Adams et al., 2017; Bauer, 2019).
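For concreteness, the following sketch (our own illustration using networkx, with a hypothetical helper name not taken from the paper's released code) builds a sublevel filtration of a weighted graph by thresholding edge weights; a clique or VR complex would then be built on each nested subgraph.

```python
import networkx as nx

def sublevel_edge_filtration(G, thresholds):
    """Sublevel filtration of a weighted graph: for each threshold nu, keep all
    edges whose weight is at most nu (all nodes are always present), yielding a
    nested sequence G^1 <= ... <= G^n of subgraphs."""
    filtration = []
    for nu in sorted(thresholds):
        H = nx.Graph()
        H.add_nodes_from(G.nodes())
        H.add_edges_from((u, v) for u, v, w in G.edges(data="weight") if w <= nu)
        filtration.append(H)
    return filtration
```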

Time-Aware Zigzag Persistence Since our primary aim is to assess interconnected evolution of multiple time-conditioned objects, the developed methodology for tracking topological and geometric properties of these objects shall ideally account for their intrinsically dynamic nature. We address this goal by introducing the concept of zigzag persistence into GNN. Zigzag persistence is a generalization of persistent homology proposed by (Carlsson & Silva, 2010) and provides a systematic and mathematically rigorous framework to track the most important topological features of the data persisting over time.

Let $\{\mathcal{G}_{t}\}_{1}^{T}$ be a sequence of networks observed over time. The key idea of zigzag persistence is to evaluate pairwise compatible topological features in this time-ordered sequence of networks. First, we define a set of network inclusions over time:

$$\mathcal{G}_{1}\hookrightarrow\mathcal{G}_{1}\cup\mathcal{G}_{2}\hookleftarrow\mathcal{G}_{2}\hookrightarrow\mathcal{G}_{2}\cup\mathcal{G}_{3}\hookleftarrow\mathcal{G}_{3}\hookrightarrow\ldots,$$

where $\mathcal{G}_{k}\cup\mathcal{G}_{k+1}$ is defined as a graph with node set $V_{k}\cup V_{k+1}$ and edge set $E_{k}\cup E_{k+1}$. Second, we fix a scale parameter $\nu_{*}$ and build a zigzag diagram of simplicial complexes for the given $\nu_{*}$ over the constructed set of network inclusions:

$$\mathscr{C}(\mathcal{G}_{1},\nu_{*})\hookrightarrow\mathscr{C}(\mathcal{G}_{1}\cup\mathcal{G}_{2},\nu_{*})\hookleftarrow\mathscr{C}(\mathcal{G}_{2},\nu_{*})\hookrightarrow\ldots$$

Using the zigzag filtration for the given $\nu_{*}$, we can track the birth and death of each topological feature over $\{\mathcal{G}_{t}\}_{1}^{T}$ as time points $t_{b}$ and $t_{d}$, $1\leq t_{b}\leq t_{d}\leq T$, respectively. Similarly to the non-dynamic case, we can extend the notion of a persistence diagram to the analysis of topological characteristics of time-varying data delivered by zigzag persistence.
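As a minimal sketch of how the zigzag sequence of graphs is assembled (our own illustration; building the simplicial complexes and running an actual zigzag persistence solver, e.g., Dionysus 2, on them is omitted):

```python
import networkx as nx

def zigzag_sequence(snapshots):
    """Interleave the snapshots G_1, ..., G_T with their pairwise unions,
    producing the zigzag sequence  G_1 -> G_1 u G_2 <- G_2 -> G_2 u G_3 <- ...
    Each union graph takes the union of the node and edge sets of its neighbours."""
    seq = []
    for k, G in enumerate(snapshots):
        seq.append(("G_%d" % (k + 1), G))
        if k + 1 < len(snapshots):
            U = nx.compose(G, snapshots[k + 1])   # union of nodes and edges
            seq.append(("G_%d u G_%d" % (k + 1, k + 2), U))
    return seq
```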

Definition 3.1 (Zigzag Persistence Diagram (ZPD)).

Let $t_{b}$ and $t_{d}$ be the time points when a topological feature first appears (i.e., is born) and disappears (i.e., dies), respectively, in the time period $[1,T]$ over the zigzag diagram of simplicial complexes for a fixed scale parameter $\nu_{*}$. If the topological feature first appears in $\mathscr{C}(\mathcal{G}_{k},\nu_{*})$, then $t_{b}=k$; if it first appears in $\mathscr{C}(\mathcal{G}_{k}\cup\mathcal{G}_{k+1},\nu_{*})$, then $t_{b}=k+1/2$. Similarly, if a topological feature last appears in $\mathscr{C}(\mathcal{G}_{k},\nu_{*})$, then $t_{d}=k$; and if it last appears in $\mathscr{C}(\mathcal{G}_{k}\cup\mathcal{G}_{k+1},\nu_{*})$, then $t_{d}=k+1/2$. The multi-set of points in $\mathbb{R}^{2}$, ${\rm{DgmZZ}}_{\nu_{*}}=\{(t_{b},t_{d})\in\mathbb{R}^{2}\,|\,t_{b}<t_{d}\}$, for a fixed $\nu_{*}$ is called a zigzag persistence diagram (ZPD).

Inspired by the notion of a persistence image as a summary of ordinary persistence (Adams et al., 2017), to input the topological information summarized by a ZPD into a GNN, we propose a representation of the ZPD as a zigzag persistence image (ZPI). A ZPI is a finite-dimensional vector representation of a ZPD and can be computed through the following steps:

  • Step 1: Map the zigzag persistence diagram ${\rm{DgmZZ}}_{\nu_{*}}$ to an integrable function $\rho_{{\rm{DgmZZ}}_{\nu_{*}}}:\mathbb{R}^{2}\rightarrow\mathbb{R}$, called a zigzag persistence surface. The zigzag persistence surface is given by a sum of weighted Gaussian functions centered at each point of ${\rm{DgmZZ}}_{\nu_{*}}$.

  • Step 2: Perform a discretization of a subdomain of the zigzag persistence surface $\rho_{{\rm{DgmZZ}}_{\nu_{*}}}$ on a grid.

  • Step 3: The ZPI, i.e., a matrix of pixel values, can be obtained by subsequent integration over each grid box.

Figure 1: Illustration of the 0- and 1-dimensional ZPD and the 0- and 1-dimensional ZPI for the PeMSD4 dataset using the sliding window size $\tau=12$, i.e., a dynamic network with 12 graphs. The upper part shows the 0-dimensional ZPD and ZPI whilst the lower part shows the 1-dimensional ZPD and ZPI.

The value of each pixel $z\in\mathbb{R}^{2}$ within a ZPI is defined as:

$${\rm{ZPI}}_{\nu_{*}}(z)=\iint\limits_{z}\sum_{\mu\in{\rm{DgmZZ}}^{\prime}_{\nu_{*}}}g\left(\mu\right)\exp{\left\{-\frac{||z-\mu||^{2}}{2\vartheta^{2}}\right\}}\,dz_{x}\,dz_{y},$$

where ${\rm{DgmZZ}}^{\prime}_{\nu_{*}}$ is the transformed multi-set of ${\rm{DgmZZ}}_{\nu_{*}}$, i.e., each point $(x,y)$ is mapped to $(x,y-x)$; $g(\mu)$ is a weighting function that depends on the distance of $\mu=(\mu_{x},\mu_{y})\in\mathbb{R}^{2}$ from the diagonal; and each Gaussian is centered at $\mu$ with variance $\vartheta^{2}$. Here the zigzag persistence surface is defined as $\rho_{{\rm{DgmZZ}}_{\nu_{*}}}(z)=\sum_{\mu\in{\rm{DgmZZ}}^{\prime}_{\nu_{*}}}g\left(\mu\right)\exp{\left\{-{||z-\mu||^{2}}/{(2\vartheta^{2})}\right\}}$.
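A minimal numpy sketch of this construction (our own illustration: a linear weighting $g$ vanishing on the diagonal and pointwise evaluation of the surface in place of the exact pixel integral):

```python
import numpy as np

def zigzag_persistence_image(diagram, resolution=100, sigma=0.1, t_max=None):
    """Turn a zigzag persistence diagram (array of (birth, death) time pairs)
    into a ZPI on a resolution x resolution grid, following Steps 1-3:
    transform to (birth, lifespan) coordinates, place a weighted Gaussian at
    each transformed point, and evaluate the resulting surface on the grid."""
    diagram = np.asarray(diagram, dtype=float)
    births, lifespans = diagram[:, 0], diagram[:, 1] - diagram[:, 0]
    if t_max is None:
        t_max = max(births.max(), lifespans.max())
    xs = np.linspace(0.0, t_max, resolution)
    ys = np.linspace(0.0, t_max, resolution)
    grid_x, grid_y = np.meshgrid(xs, ys)
    zpi = np.zeros((resolution, resolution))
    for b, l in zip(births, lifespans):
        weight = l / t_max   # linear weighting g, zero on the diagonal
        zpi += weight * np.exp(-((grid_x - b) ** 2 + (grid_y - l) ** 2)
                               / (2.0 * sigma ** 2))
    return zpi
```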

Proposition 3.1.

Let $g:\mathbb{R}^{2}\to\mathbb{R}$ be a non-negative, continuous, and piecewise differentiable function. Let ${\rm{DgmZZ}}_{\nu_{*}}$ be a zigzag persistence diagram for some fixed scale parameter $\nu_{*}$, and let ${\rm{ZPI}}_{\nu_{*}}$ be its corresponding zigzag persistence image. Then ${\rm{ZPI}}_{\nu_{*}}$ is stable with respect to the Wasserstein-1 distance between zigzag persistence diagrams.

Tracking evolution of topological patterns in these sequences of time-evolving graphs allows us to glean insights into which properties of the observed time-conditioned objects, e.g., traffic data or Ethereum transaction graphs, tend to persist over time and, hence, are likelier to play a more important role in predictive tasks.

4 Z-GCNETs

Given the graph $\mathcal{G}$ and graph signals $\boldsymbol{X}^{\tau}=\{\boldsymbol{X}_{t-\tau},\dots,\boldsymbol{X}_{t-1}\}\in\mathbb{R}^{\tau\times N\times F}$ over $\tau$ past time periods (i.e., window size $\tau$, where $\boldsymbol{X}_{i}\in\mathbb{R}^{N\times F}$ and $i\in\{t-\tau,\dots,t-1\}$), we employ a model targeted at multi-step time series forecasting. That is, given the window size $\tau$ of past graph signals and the ahead horizon size $h$, our goal is to learn a mapping function which maps the historical data $\{\boldsymbol{X}_{t-\tau},\dots,\boldsymbol{X}_{t-1}\}$ into the future data $\{\boldsymbol{X}_{t},\dots,\boldsymbol{X}_{t+h}\}$.

Figure 2: The architecture of Z-GCNETs. Given a sliding window, e.g., $(\mathcal{G}_{t-3},\dots,\mathcal{G}_{t})$, we extract a zigzag persistence image (ZPI) based on the zigzag filtration. For the ZPI of shape $p\times p$, Z-GCNETs first learn the topological features of the ZPI through a CNN-based framework, and then apply global max-pooling to obtain the maximum values among pooled activation maps. The output of zigzag persistence representation learning is decoded into the spatial graph convolution and the temporal graph convolution, where the input of the spatial graph convolution is the current timestamp, e.g., $\mathcal{G}_{t}$ in the red dashed box, and the input of the temporal graph convolution is the sliding window (i.e., $(\mathcal{G}_{t-3},\dots,\mathcal{G}_{t})$ in the yellow dashed box). After the graph convolution operations, features from the time-aware zigzag topological layer are combined and passed to a GRU to perform forecasting. The symbol $\otimes$ represents the dot product whilst $\oplus$ denotes combination.

Laplacianlink In the spatial-temporal domain, the topology of the graph may have a different structure at different points in time. In this paper, we use the self-adaptive adjacency matrix (Wu et al., 2019) as the normalized Laplacian built from trainable node embedding dictionaries $\boldsymbol{\phi}\in\mathbb{R}^{N\times c}$, i.e., $\boldsymbol{L}=\mathrm{softmax}(\mathrm{ReLU}(\boldsymbol{\phi}\boldsymbol{\phi}^{\top}))$, where the embedding dimension $c\geq 1$. Although introducing node embedding dictionaries allows capturing hidden spatial dependence information, it cannot sufficiently capture the global graph information and the similarity between nodes. To overcome these limits and explore neighborhoods of nodes at different depths, we define a novel polynomial representation for the Laplacian based on positive powers of the Laplacian matrix. In this work, the Laplacianlink $\tilde{\boldsymbol{L}}$ is formulated as:

$$\tilde{\boldsymbol{L}}=\left[\boldsymbol{I},\boldsymbol{L},\boldsymbol{L}^{2},\dots,\boldsymbol{L}^{K}\right]\in\mathbb{R}^{N\times N\times(K+1)}, \qquad (1)$$

where $K\geq 1$, $\boldsymbol{I}\in\mathbb{R}^{N\times N}$ represents the identity matrix, and $\boldsymbol{L}^{k}\in\mathbb{R}^{N\times N}$, with $0\leq k\leq K$, denotes the power series of the normalized Laplacian.

By linking (i.e., stacking) the power series of normalized Laplacian, we build a diffusion formalism to accumulate neighbors’ information of different power levels. Hence, each node will successfully exploit and propagate spatial-temporal correlations after spatial and temporal graph convolutional operations.
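A sketch of the self-adaptive Laplacian and its power stack in PyTorch (our own illustration; we stack the powers along the first dimension, whereas Eq. (1) writes the stack as $N\times N\times(K+1)$):

```python
import torch
import torch.nn.functional as F

def laplacian_link(phi, K):
    """Self-adaptive normalized Laplacian L = softmax(ReLU(phi phi^T)) and its
    power stack [I, L, L^2, ..., L^K] (Eq. 1).
    phi: trainable node embeddings of shape (N, c); returns (K + 1, N, N)."""
    N = phi.size(0)
    L = F.softmax(F.relu(phi @ phi.t()), dim=-1)
    powers = [torch.eye(N, device=phi.device)]
    for _ in range(K):
        powers.append(powers[-1] @ L)
    return torch.stack(powers)
```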

Spatial graph convolution To model the spatial network $\mathcal{G}_{t}$ at timestamp $t$ with its node feature matrix $\boldsymbol{X}_{t}$, we define the spatial graph convolution as multiplying the input of each layer with the Laplacianlink $\tilde{\boldsymbol{L}}$, which is then fed into the trainable projection matrix $\boldsymbol{\Theta}=\boldsymbol{\phi}\boldsymbol{V}$ (where $\boldsymbol{V}$ stands for the trainable weight). In spatial-temporal graph modeling, we prefer weight sharing via matrix factorization over directly assigning a trainable weight matrix, not only to avoid the risk of over-fitting but also to reduce the computational complexity. We compute the transformation in the spatial domain, in each layer, as follows:

$$\boldsymbol{H}^{(\ell)}_{i,\mathcal{S}}=(\tilde{\boldsymbol{L}}\boldsymbol{H}^{(\ell-1)}_{i,\mathcal{S}})^{\top}\boldsymbol{\phi}\boldsymbol{V}, \qquad (2)$$

where $\boldsymbol{\phi}\in\mathbb{R}^{N\times c}$ is the node embedding and $\boldsymbol{V}\in\mathbb{R}^{c\times(K+1)\times C_{in}\times(C_{out}/2)}$ is the trainable weight ($C_{in}$ and $C_{out}$ are the numbers of input and output channels, respectively). $\boldsymbol{H}^{(\ell-1)}_{i,\mathcal{S}}\in\mathbb{R}^{N\times(C_{in}/2)}$ is the matrix of activations of the spatial graph convolution fed to the $\ell$-th layer and $\boldsymbol{H}^{(0)}_{i,\mathcal{S}}=\boldsymbol{X}_{i}$. As a result, all information regarding the $\ell$-layered input at time $i$ is reflected in the latest state variable.
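A sketch of Eq. (2) with the factorized, embedding-shared weights, in the style of AGCRN-like implementations that the construction builds on (the einsum conventions and tensor layouts are our assumptions):

```python
import torch

def spatial_graph_conv(L_link, H, phi, V):
    """Spatial graph convolution with shared weights (Eq. 2), as a sketch.
    L_link: (K + 1, N, N) Laplacianlink;  H: (N, C_in) input features;
    phi: (N, c) node embeddings;  V: (c, K + 1, C_in, C_out) weight pool.
    The node-specific projection Theta = phi V is recovered from the embedding
    rather than learned per node directly."""
    HK = torch.einsum("knm,mc->nkc", L_link, H)      # diffuse features over each power of L
    theta = torch.einsum("nd,dkio->nkio", phi, V)    # node-specific weights from embeddings
    return torch.einsum("nkc,nkco->no", HK, theta)   # output of shape (N, C_out)
```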

Temporal graph convolution In addition to the spatial domain, the nature of spatial-temporal networks includes temporal relationships among consecutive spatial networks. To capture the temporal correlation patterns of nodes, we choose a longer window size (i.e., we use the entire sliding window as input) and apply temporal graph convolution to the graph signals in the sliding window $\boldsymbol{X}^{\tau}$. This mechanism has several appealing properties: (i) there is no need to select a particular size of a nested sliding window, i.e., it does not introduce additional computational complexity, (ii) temporal correlation patterns can be well captured and evaluated over (long) window sizes, whereas short window sizes (i.e., nested sliding windows) are likely to be biased and noisy, and (iii) the sliding window $\boldsymbol{X}^{\tau}$ maximizes the efficiency of estimating temporal correlations. The temporal graph convolution takes the form:

$$\boldsymbol{H}^{(\ell)}_{i,\mathcal{T}}=\left((\tilde{\boldsymbol{L}}\boldsymbol{H}^{(\ell-1)}_{i,\mathcal{T}})^{\top}\boldsymbol{\phi}\boldsymbol{U}^{(\ell-1)}\right)\boldsymbol{Q}, \qquad (3)$$

where $\boldsymbol{U}^{(\ell)}\in\mathbb{R}^{c\times C_{in}\times(C_{out}/2)}$ is a trainable weight with $\boldsymbol{U}^{(0)}\in\mathbb{R}^{c\times F\times(C_{out}/2)}$, $\boldsymbol{Q}\in\mathbb{R}^{\tau\times 1}$ is the trainable projection vector in the temporal graph convolutional layer, and $\boldsymbol{H}^{(\ell-1)}_{i,\mathcal{T}}\in\mathbb{R}^{\tau\times N\times(C_{in}/2)}$ is the hidden matrix fed to the $\ell$-th layer with $\boldsymbol{H}^{(0)}_{i,\mathcal{T}}=\boldsymbol{X}^{\tau}\in\mathbb{R}^{\tau\times N\times F}$.
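A corresponding sketch of Eq. (3), under one plausible reading of the dimensions (the $K+1$ Laplacian powers are summed before the node-specific projection, and $\boldsymbol{Q}$ collapses the time axis); again, the einsum layout is our assumption:

```python
import torch

def temporal_graph_conv(L_link, H_tau, phi, U, Q):
    """Temporal graph convolution over the whole sliding window (Eq. 3), as a sketch.
    L_link: (K + 1, N, N);  H_tau: (tau, N, C_in) window of hidden states;
    phi: (N, c) node embeddings;  U: (c, C_in, C_out) weight pool;
    Q: (tau,) trainable projection over the time axis."""
    HK = torch.einsum("knm,tmc->tnc", L_link, H_tau)  # diffuse each snapshot, summing the powers
    theta = torch.einsum("nd,dio->nio", phi, U)       # node-specific weights from embeddings
    out = torch.einsum("tnc,nco->tno", HK, theta)     # (tau, N, C_out)
    return torch.einsum("tno,t->no", out, Q)          # collapse the time axis with Q
```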

Time-aware zigzag topological layer To learn topological features across a range of spatial and temporal scales, we extend a CNN model to operate on the ZPI. In this work, we present a framework to aggregate the persistent topological features into the feature representation learned by the GCN. Let $\text{ZPI}^{\tau}$ denote the ZPI based on the sliding window $\boldsymbol{X}^{\tau}$. (Here for brevity we suppress the dependence of the ZPI on the scale parameter $\nu_{*}$.) We design the time-aware zigzag topological layer to (i) extract and learn the spatial-temporal topological features contained in the ZPI, (ii) aggregate the transformed information from the (spatial or temporal) graph convolution and the spatial-temporal topological information from the zigzag persistence module, and (iii) mix the spatial-temporal and the spatial-temporal topological information. The extraction, aggregation, and combination of information are expressed as:

$$\begin{split}\boldsymbol{Z}^{(\ell)}&=\xi_{\text{max}}\left(f^{(\ell)}_{\text{cnn}}(\text{ZPI}^{\tau})\right),\\ \boldsymbol{S}_{i}^{(\ell)}&=\boldsymbol{H}^{(\ell)}_{i,\mathcal{S}}\boldsymbol{Z}^{(\ell)},\\ \boldsymbol{T}_{i}^{(\ell)}&=\boldsymbol{H}^{(\ell)}_{i,\mathcal{T}}\boldsymbol{Z}^{(\ell)},\\ \boldsymbol{H}^{(\ell)}_{i,out}&=\text{COMBINE}^{(\ell)}(\boldsymbol{S}_{i}^{(\ell)},\boldsymbol{T}_{i}^{(\ell)}),\end{split} \qquad (4)$$

where $f_{\text{cnn}}^{(\ell)}$ represents the convolutional neural network (CNN) in the $\ell$-th layer, $\xi_{\text{max}}(\cdot)$ denotes the global max-pooling operation, $\boldsymbol{Z}^{(\ell)}\in\mathbb{R}^{(C_{out}/2)}$ is the learned zigzag persistence representation from the CNN, $\boldsymbol{S}^{(\ell)}_{i}\in\mathbb{R}^{N\times(C_{out}/2)}$ is the aggregated $\text{spatial}^{2}$-temporal representation, $\boldsymbol{T}^{(\ell)}_{i}\in\mathbb{R}^{N\times(C_{out}/2)}$ is the aggregated spatial-$\text{temporal}^{2}$ representation, and the output of the time-aware zigzag topological layer $\boldsymbol{H}^{(\ell)}_{i,out}\in\mathbb{R}^{N\times C_{out}}$ combines the hidden states $\boldsymbol{S}^{(\ell)}_{i}$ and $\boldsymbol{T}^{(\ell)}_{i}$ at time $i$.
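A PyTorch sketch of Eq. (4) (our own illustration: the exact CNN shape, the reading of the product $\otimes$ in Fig. 2 as channel-wise re-weighting, and COMBINE as concatenation are assumptions):

```python
import torch
import torch.nn as nn

class ZigzagTopoLayer(nn.Module):
    """Time-aware zigzag topological layer (Eq. 4), as a sketch.
    A small CNN encodes the window's ZPI into a vector Z of length C_out / 2,
    which re-weights the spatial (H_S) and temporal (H_T) graph-convolution
    outputs channel-wise before they are concatenated."""
    def __init__(self, c_out):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(8, c_out // 2, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveMaxPool2d(1),                  # global max-pooling
        )

    def forward(self, zpi, H_S, H_T):
        # zpi: (1, 1, p, p);  H_S, H_T: (N, C_out / 2)
        Z = self.cnn(zpi).flatten()                   # (C_out / 2,)
        S = H_S * Z                                   # spatial^2-temporal representation
        T = H_T * Z                                   # spatial-temporal^2 representation
        return torch.cat([S, T], dim=-1)              # combined output of shape (N, C_out)
```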

GRU with time-aware zigzag topological layer The GRU is a variant of the LSTM network. Compared with LSTM, the GRU has a simpler structure, fewer training parameters, and more easily overcomes vanishing and exploding gradient problems. The feed-forward propagation of the GRU with the time-aware zigzag topological layer is recursively conducted as:

$$\begin{split}\boldsymbol{z}_{i}&=\varphi\left(\boldsymbol{W}_{z}\left[\boldsymbol{O}_{i-1},\boldsymbol{H}_{i,out}\right]+\boldsymbol{b}_{z}\right),\\ \boldsymbol{r}_{i}&=\varphi\left(\boldsymbol{W}_{r}\left[\boldsymbol{O}_{i-1},\boldsymbol{H}_{i,out}\right]+\boldsymbol{b}_{r}\right),\\ \tilde{\boldsymbol{O}}_{i}&=\tanh{\left(\boldsymbol{W}_{o}\left[\boldsymbol{r}_{i}\odot\boldsymbol{O}_{i-1},\boldsymbol{H}_{i,out}\right]+\boldsymbol{b}_{o}\right)},\\ \boldsymbol{O}_{i}&=\boldsymbol{z}_{i}\odot\boldsymbol{O}_{i-1}+(1-\boldsymbol{z}_{i})\odot\tilde{\boldsymbol{O}}_{i},\end{split} \qquad (5)$$

where $\varphi(\cdot)$ is a non-linear function, i.e., the ReLU function; $\odot$ is the elementwise product; $\boldsymbol{z}_{i}$ and $\boldsymbol{r}_{i}$ are the update gate and reset gate, respectively; $\boldsymbol{b}_{z}$, $\boldsymbol{b}_{r}$, $\boldsymbol{b}_{o}$, $\boldsymbol{W}_{z}$, $\boldsymbol{W}_{r}$, and $\boldsymbol{W}_{o}$ are trainable parameters; and $[\boldsymbol{O}_{i-1},\boldsymbol{H}_{i,out}]$ and $\boldsymbol{O}_{i}$ are the input and output of the GRU model, respectively. In this way, Z-GCNETs combine structural, temporal, and topological information.
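A sketch of one recursion of Eq. (5) (our own illustration; the paper states the gate nonlinearity $\varphi$ is ReLU, while a sigmoid is the more usual GRU choice, so the activation is left as a parameter):

```python
import torch

def gru_step(O_prev, H_out, W_z, W_r, W_o, b_z, b_r, b_o, gate_act=torch.relu):
    """One step of the GRU driven by the zigzag topological layer output H_out
    and the previous GRU output O_prev (Eq. 5), as a sketch."""
    x = torch.cat([O_prev, H_out], dim=-1)
    z = gate_act(x @ W_z + b_z)                      # update gate z_i
    r = gate_act(x @ W_r + b_r)                      # reset gate r_i
    x_r = torch.cat([r * O_prev, H_out], dim=-1)
    O_tilde = torch.tanh(x_r @ W_o + b_o)            # candidate state
    return z * O_prev + (1.0 - z) * O_tilde          # new output O_i
```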

5 Experiments

5.1 Datasets

We consider two types of networks: (i) traffic networks and (ii) Ethereum token networks. A statistical overview of all datasets is given in Table 1. We now describe the detailed construction of the traffic and Ethereum transaction networks as follows.

Table 1: Summary of datasets used in time series forecasting tasks. $[\dagger]$ denotes the average number of edges in the transportation networks under threshold $\nu_{*}$.
Dataset # Nodes Avg # edges Time range
Bytom 100 9.98 27/07/2017 - 07/05/2018
Decentraland 100 16.94 14/10/2017 - 07/05/2018
PeMSD4 307 316.10 01/01/2018 - 28/02/2018
PeMSD8 170 193.53 01/07/2016 - 31/08/2016

(i) The freeway Performance Measurement System (PeMS) data sources (i.e., PeMSD4 and PeMSD8) (Chen et al., 2001) collect real-time traffic data in California. Both the PeMSD4 and PeMSD8 datasets are aggregated to 5-minute intervals; therefore, there are overall 16,992 and 17,856 data points in PeMSD4 and PeMSD8, respectively. In a traffic network, a node represents a loop detector, which records real-time measurements of traffic conditions, and an edge is a freeway segment between two nearest nodes. Thus, the node feature matrix of the traffic network $X_{t}\in\mathbb{R}^{N\times 3}$ indicates that each node has 3 features (i.e., flow rate, speed, and occupancy) at time $t$. To capture both spatial and temporal dependencies, we reconstruct the traffic graph structure $\mathcal{G}_{t}=\{\mathcal{V},\mathcal{E},W^{\nu_{*}}_{t}\}$ at time $t$. Here, we define the right-censored weight $W^{\nu_{*}}_{t}$ as

$$\omega^{\nu_{*}}_{t,uv}=\begin{cases}w_{t,uv}&(u,v)\in\mathcal{E}\text{ and }w_{t,uv}\leq\nu_{*},\\ 0&(u,v)\in\mathcal{E}\text{ and }w_{t,uv}>\nu_{*},\\ 0&(u,v)\notin\mathcal{E},\end{cases} \qquad (6)$$

where $w_{t,uv}=e^{-||x_{t,u}-x_{t,v}||^{2}/\gamma}$ is based on the Radial Basis Function (RBF). To investigate the topology of the weighted graph, the traffic graph structure $\mathcal{G}_{t}$ is obtained via sublevel sets of the weight function, that is, the final graph keeps all edges whose weights $\omega^{\nu_{*}}_{t,uv}$ are below or equal to the threshold $\nu_{*}$; therefore, the threshold $\nu_{*}$ changes the topology of the resulting graphs at different observation points. In the experiments, we set the RBF parameter $\gamma=1.0$ and set the thresholds in PeMSD4 and PeMSD8 to $\nu_{*}=0.5$ and $\nu_{*}=0.3$, respectively.
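A small numpy sketch of this right-censored RBF weighting (our own illustration; the function name and inputs are hypothetical):

```python
import numpy as np

def censored_rbf_weights(X_t, edges, nu, gamma=1.0):
    """Right-censored RBF edge weights for a traffic snapshot (Eq. 6).
    X_t: (N, F) node feature matrix at time t;  edges: iterable of (u, v) pairs;
    weights above the threshold nu, and non-edges, are set to zero."""
    N = X_t.shape[0]
    W = np.zeros((N, N))
    for u, v in edges:
        w = np.exp(-np.linalg.norm(X_t[u] - X_t[v]) ** 2 / gamma)
        if w <= nu:
            W[u, v] = W[v, u] = w
    return W
```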

Table 2: Forecasting performance comparison of different approaches on PeMSD4 and PeMSD8 datasets.
Model PeMSD4 PeMSD8
MAE RMSE MAPE MAE RMSE MAPE
HA 38.03 59.24 27.88% 34.86 52.04 24.07%
VAR (Hamilton, 2020) 24.54 38.61 17.24% 19.19 29.81 13.10%
FC-LSTM (Sutskever et al., 2014) 26.77 40.65 18.23% 23.09 35.17 14.99%
GRU-ED (Cho et al., 2014) 23.68 39.27 16.44% 22.00 36.23 13.33%
DSANet (Huang et al., 2019) 22.79 35.77 16.03% 17.14 26.96 11.32%
DCRNN (Li et al., 2018) 21.22 37.23 14.17% 16.82 26.36 10.92%
STGCN (Yu et al., 2018a) 21.16 35.69 13.83% 17.50 27.09 11.29%
GraphWaveNet (Wu et al., 2019) 28.15 39.88 18.52% 20.30 30.82 13.84%
ASTGCN (Guo et al., 2019) 22.93 34.33 16.56% 18.25 28.06 11.64%
MSTGCN (Guo et al., 2019) 23.96 37.21 14.33% 19.00 29.15 12.38%
STSGCN (Song et al., 2020) 21.19 33.69 13.90% 17.13 26.86 10.96%
AGCRN (Bai et al., 2020) 19.83 32.30 12.97% 15.95 25.22 10.09%
LSGCN (Huang et al., 2020) 21.53 33.86 13.18% 17.73 26.76 11.20%
Z-GCNETs (ours) 19.50 31.61 12.78% 15.76 25.11 10.01%

(ii) The Ethereum blockchain was developed in 2014 to implement Smart Contracts, which are used to create and sell digital assets on the network (Ethereum.org). In particular, token assets are especially valuable because each token naturally represents a network layer with the same nodes, i.e., addresses of users, appearing in the networks, i.e., layers, of multiple tokens (di Angelo & Salzer, 2020). For our experiments, we extract two token networks with more than $100M in market value (EtherScan.io), the Bytom and Decentraland tokens, from the publicly available Ethereum blockchain. We focus our analysis on the dynamic network generated by the daily transactions on each token network and on the historical daily closing prices (EtherScan.io). Since each token has a different creation date (the end date is May 7, 2018), the Bytom dynamic network contains 285 nets whilst the Decentraland dynamic network has 206 nets. Ethereum's token networks have an average of 442,788 nodes and 1,192,722 edges. To maintain a reasonable computation time, we obtain a subgraph via the maximum weight subgraph approximation method of Vassilevska et al. (2006), which allows us to reduce the dynamic network size by considering only the $M$ most active edges and their corresponding nodes. In these experiments, we use dynamic networks with $N=100$ nodes. Let $\mathcal{G}_{t}=\{\mathcal{V}_{t},\mathcal{E}_{t},\tilde{W}_{t}\}$ denote the reduced Ethereum blockchain network on day $t$ and $X_{t}\in\mathbb{R}^{N\times 1}$ be the node feature matrix; we assume a single node feature: the node degree. Each node in $\mathcal{V}_{t}$ is a buyer/seller and edges in $\mathcal{E}_{t}$ represent transactions in the network. To construct the similarity matrix $\tilde{W}_{t}$, the normalized number of transactions between a node pair $(u,v)$ serves as the edge weight $w_{t,uv}\in\tilde{W}_{t}$.

5.2 Experiment Settings

For multi-step time series forecasting, we evaluate the performance of Z-GCNETs on 4 time series datasets against 13 state-of-the-art baselines (SOAs). Among them, Historical Average (HA) and Vector Auto-Regression (VAR) (Hamilton, 2020) are statistical time series models. FC-LSTM (Sutskever et al., 2014) and GRU-ED (Cho et al., 2014) are RNN-based neural networks. DSANet (Huang et al., 2019) is a self-attention network. DCRNN (Li et al., 2018), STGCN (Yu et al., 2018a), GraphWaveNet (Wu et al., 2019), ASTGCN (Guo et al., 2019), MSTGCN (Guo et al., 2019), and STSGCN (Song et al., 2020) are spatial-temporal GCNs. AGCRN (Bai et al., 2020) and LSGCN (Huang et al., 2020) are GRU-based GCNs. We conduct our experiments on an NVIDIA GeForce RTX 3090 GPU card with 24GB memory. PeMSD4 and PeMSD8 are split in chronological order with 60% for training, 20% for validation, and 20% for testing. For PeMSD4 and PeMSD8, Z-GCNETs contains 2 layers, each with 64 hidden units. We consider the window size $\tau=12$ and horizon $h=12$ for Z-GCNETs on both the PeMSD4 and PeMSD8 datasets. Moreover, the inputs of PeMSD4 and PeMSD8 are normalized by the min-max normalization approach.

Table 3: The computation cost for the generation of zigzag persistence image (ZPI) and a single training epoch of Z-GCNETs.
Dataset Window Size Average Time Taken (sec)
ZPI Z-GCNETs (epoch)
Decentraland 7 0.03 2.09
Bytom 7 0.03 2.05
PeMSD4 12 0.86 30.12
PeMSD8 12 0.65 36.76

We split Bytom and Decentraland with 80% for training and 20% for testing. For the token networks, Z-GCNETs contains 2 layers, each with 16 hidden units. We use one week of historical data to predict the next week's data, i.e., window size $\tau=7$ and horizon $h=7$, for the Bytom and Decentraland datasets. All reported results are based on the weight rank clique filtration. A more detailed description of the experimental settings can be found in Appendix A, while the analysis of sensitivity with respect to the choice of filtration is in Appendix B. The code is available at https://github.com/Z-GCNETs/Z-GCNETs.git.

Table 3 reports the average running time of ZPI generation and training time per epoch of our Z-GCNETs model on all datasets.

5.3 Comparison with the Baseline Methods

Table 2 shows the comparison of our proposed Z-GCNETs and the SOAs on traffic flow forecasting tasks. We assess model performance with Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE) on PeMSD4 and PeMSD8. From Table 2, we find that our proposed Z-GCNETs consistently outperforms the SOAs on PeMSD4 and PeMSD8. The improvement gain of Z-GCNETs over the next most accurate methods ranges from 0.44% to 2.06% in RMSE for PeMSD4 and PeMSD8. Table 4 shows the performance results on Bytom and Decentraland using MAPE. We find that Z-GCNETs with the weight rank clique filtration outperforms AGCRN by margins of 3.42% and 2.94% (see the results in Appendix B for other types of filtrations). In contrast to the SOAs, Z-GCNETs fully leverages the topological information by incorporating zigzag topological features via a CNN on the topological space. Given the continuous nature of time series data, our analysis and experiments show that establishing a connection between the time zigzag pairs is a natural fit for topological spaces and continuous maps.

Table 4: Forecasting results (MAPE) on Ethereum token networks.
Model Bytom Decentraland
FC-LSTM (Sutskever et al., 2014) 40.72% 33.46%
DCRNN (Li et al., 2018) 35.36% 27.69%
STGCN (Yu et al., 2018a) 37.33% 28.22%
GraphWaveNet (Wu et al., 2019) 39.18% 37.67%
ASTGCN (Guo et al., 2019) 34.49% 27.43%
AGCRN (Bai et al., 2020) 34.46% 26.75%
LSGCN (Huang et al., 2020) 34.91% 28.37%
Z-GCNETs 31.04% 23.81%

5.4 Ablation Study

To better understand the importance of the different components of Z-GCNETs, we conduct ablation studies on PeMSD4 and PeMSD8; the results are presented in Table 5. The results show that Z-GCNETs performs better than Z-GCNETs without zigzag persistence representation learning (zigzag learning), spatial graph convolution (GCNSpatial), or temporal graph convolution (GCNTemporal). Specifically, we observe that removing GCNTemporal affects multi-step forecasting significantly, i.e., Z-GCNETs outperforms Z-GCNETs without temporal graph convolution with a relative gain of 6.46% in MAE on PeMSD4. The comparison results on PeMSD8 for the variants without zigzag learning and without GCNSpatial show the necessity of encoding topological information and modeling spatial structural information for multi-step forecasting on spatial-temporal time series datasets. Additional results for the ablation study on Ethereum tokens are presented in Appendix C.

Table 5: Ablation study of the network architecture. [*] means the GCNSpatial is only applied to the most recent time point in the sliding window.
Architecture MAE RMSE MAPE
PeMSD4 Z-GCNETs 19.50 31.61 12.78%
W/o Zigzag learning 19.65 31.94 13.01%
W/o GCNSpatial 19.86 31.96 13.19%
W/o GCNTemporal 20.76 33.18 13.60%
PeMSD8 Z-GCNETs 15.76 25.11 10.01%
W/o Zigzag learning 17.16 27.06 10.77%
W/o GCNSpatial 16.92 26.86 10.33%
W/o GCNTemporal 16.66 26.44 10.39%

5.5 How does time-aware zigzag persistence help?

To track the importance of $p$-dimensional topological features in Z-GCNETs (i.e., 0-dimensional and 1-dimensional holes), we evaluate the performance of Z-GCNETs in two different respects: (i) the sensitivity of Z-GCNETs to topological features of different dimensions and (ii) the effect of the threshold $\nu_{*}$ used to construct the input networks along with zigzag persistence. Table 6 summarizes the results using topological features of different dimensions and different thresholds on PeMSD4 and PeMSD8. Under the same scale parameter $\nu_{*}$, we find that 1-dimensional topological features consistently outperform their 0-dimensional counterparts on both datasets. Furthermore, the forecasting results on PeMSD4 are not significantly affected by varying $\nu_{*}$. However, on PeMSD8, 1-dimensional topological features constructed under $\nu_{*}=0.3$ yield better results than those constructed under $\nu_{*}=0.5$.

Table 6: Results of zigzag persistence on the dynamic network with different dimensional features and threshold values (ν)(\nu_{*}).
Zigzag module PeMSD4
MAE RMSE MAPE
Z-GCNETs + 0-th ZPI ($\nu_{*}=0.3$) 19.73 32.04 12.93%
1-st ZPI ($\nu_{*}=0.3$) 19.47 31.66 12.75%
0-th ZPI ($\nu_{*}=0.5$) 19.78 32.20 12.98%
1-st ZPI ($\nu_{*}=0.5$) 19.50 31.61 12.78%
Zigzag module PeMSD8
MAE RMSE MAPE
Z-GCNETs + 0-th ZPI ($\nu_{*}=0.3$) 17.14 27.24 10.66%
1-st ZPI ($\nu_{*}=0.3$) 15.76 25.11 10.01%
0-th ZPI ($\nu_{*}=0.5$) 17.22 27.41 10.77%
1-st ZPI ($\nu_{*}=0.5$) 16.77 26.62 10.39%

5.6 Robustness Study

To assess the robustness of Z-GCNETs under noisy conditions, we add Gaussian noise to 30% of the training sets. The added noise follows a zero-mean i.i.d. Gaussian distribution with fixed variance $\varsigma^{2}$, i.e., $\mathcal{N}(0,\varsigma^{2})$, where $\varsigma\in\{2,4\}$. In Table 7 we report comparisons with two competitive baselines (AGCRN and LSGCN) on Decentraland and PeMSD4 using two different noise levels. Table 7 shows the performance of Z-GCNETs and the two SOAs under the described noisy conditions. We can see that the performance of all methods decays slowly with respect to the Gaussian noise. Nevertheless, Z-GCNETs remains consistently more robust than the SOAs on both Decentraland and PeMSD4. Moreover, as Table 7 shows, all methods retain relatively low RMSE, suggesting that Gaussian noise tends to be less harmful to graph convolution-based models.
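A small numpy sketch of this perturbation protocol (our own illustration; sample selection and seeding are assumptions):

```python
import numpy as np

def add_noise(X_train, frac=0.3, sigma=2.0, seed=0):
    """Perturb a random fraction of training samples with zero-mean Gaussian
    noise of standard deviation sigma, mirroring the robustness protocol above."""
    rng = np.random.default_rng(seed)
    X_noisy = X_train.copy()
    idx = rng.choice(len(X_train), size=int(frac * len(X_train)), replace=False)
    X_noisy[idx] += rng.normal(0.0, sigma, size=X_noisy[idx].shape)
    return X_noisy
```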

Table 7: Robustness study on Decentraland and PeMSD4 (RMSE).
Noise AGCRN LSGCN Z-GCNETs (ours)
Decentraland $\mathcal{N}(0,2)$ 27.69 36.10 24.12
$\mathcal{N}(0,4)$ 28.12 36.79 25.03
PeMSD4 $\mathcal{N}(0,2)$ 32.24 34.16 31.95
$\mathcal{N}(0,4)$ 32.67 34.75 32.18

6 Conclusion

Inspired by the recent call for developing time-aware deep learning mechanisms by the US Defense Advanced Research Projects Agency (DARPA), we have proposed a new time-aware zigzag topological layer (Z-GCNETs) for time-conditioned GCNs. Our idea is based on the concepts of zigzag persistence, whose utility remains unexplored not only in conjunction with time-aware GCNs but in DL in general. The new Z-GCNETs layer allows us to track the salient time-aware topological characterizations of the data that persist over time. Our results on spatio-temporal graph-structured data indicate that integrating the new time-aware zigzag topological layer into GCNs yields both enhanced forecasting performance and substantial robustness gains.

7 Acknowledgements

The project has been supported in part by the grants NSF DMS 1925346, NSF ECCS 2039701, and NSF ECCS 1824716.

References

  • Adams & Carlsson (2015) Adams, H. and Carlsson, G. Evasion paths in mobile sensor networks. The International Journal of Robotics Research, 34(1):90–104, 2015.
  • Adams et al. (2017) Adams, H., Emerson, T., Kirby, M., Neville, R., Peterson, C., Shipman, P., Chepushtanova, S., Hanson, E., Motta, F., and Ziegelmeier, L. Persistence images: A stable vector representation of persistent homology. Journal of Machine Learning Research, 18, 2017.
  • Adams et al. (2020) Adams, H., Bush, J., Carr, B., Kassab, L., and Mirth, J. A torus model for optical flow. Pattern Recognition Letters, 129:304–310, 2020.
  • Bai et al. (2020) Bai, L., Yao, L., Li, C., Wang, X., and Wang, C. Adaptive graph convolutional recurrent network for traffic forecasting. Advances in Neural Information Processing Systems, 33, 2020.
  • Bauer (2019) Bauer, U. Ripser: efficient computation of vietoris-rips persistence barcodes. arXiv:1908.02518, 2019.
  • Carlsson (2019) Carlsson, G. Persistent homology and applied homotopy theory. Handbook of Homotopy Theory, 2019.
  • Carlsson & Gabrielsson (2020) Carlsson, G. and Gabrielsson, R. B. Topological approaches to deep learning. In Topological Data Analysis, pp.  119–146. Springer, 2020.
  • Carlsson & Silva (2010) Carlsson, G. and Silva, V. Zigzag persistence. Found. Comput. Math., 10(4):367–405, August 2010. ISSN 1615-3375.
  • Carrière et al. (2020) Carrière, M., Chazal, F., Ike, Y., Lacombe, T., Royer, M., and Umeda, Y. Perslay: A neural network layer for persistence diagrams and new graph topological signatures. In AISTATS, pp.  2786–2796, 2020.
  • Chae et al. (2018) Chae, S., Kwon, S., and Lee, D. Predicting infectious disease using deep learning and big data. International Journal of Environmental Research and Public Health, 15:1596, 07 2018. doi: 10.3390/ijerph15081596.
  • Chen et al. (2001) Chen, C., Petty, K., Skabardonis, A., Varaiya, P., and Jia, Z. Freeway performance measurement system: mining loop detector data. Transportation Research Record, 1748(1):96–102, 2001.
  • Cho et al. (2014) Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
  • Chowdhury et al. (2018) Chowdhury, S., Dai, B., and Mémoli, F. The importance of forgetting: Limiting memory improves recovery of topological characteristics from neural data. PloS one, 13(9):e0202561, 2018.
  • Corcoran & Jones (2017) Corcoran, P. and Jones, C. B. Modelling topological features of swarm behaviour in space and time with persistence landscapes. IEEE Access, 5:18534–18544, 2017.
  • Defferrard et al. (2016) Defferrard, M., Bresson, X., and Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 29, pp.  3844–3852. Curran Associates, Inc., 2016.
  • di Angelo & Salzer (2020) di Angelo, M. and Salzer, G. Tokens, types, and standards: Identification and utilization in ethereum. In 2020 IEEE International Conference on Decentralized Applications and Infrastructures (DAPPS), pp.  1–10, 2020. doi: 10.1109/DAPPS49028.2020.00001.
  • Gamble et al. (2015) Gamble, J., Chintakunta, H., and Krim, H. Coordinate-free quantification of coverage in dynamic sensor networks. Signal Processing, 114:1–18, 2015.
  • Gao et al. (2020) Gao, S., Huang, Y., Zhang, S., Han, J., Wang, G., Zhang, M., and Lin, Q. Short-term runoff prediction with gru and lstm networks without requiring time step optimization during sample generation. Journal of Hydrology, 589:125188, 2020. ISSN 0022-1694. doi: https://doi.org/10.1016/j.jhydrol.2020.125188. URL https://www.sciencedirect.com/science/article/pii/S002216942030648X.
  • Greff et al. (2017) Greff, K., Srivastava, R. K., Koutník, J., Steunebrink, B. R., and Schmidhuber, J. Lstm: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10):2222–2232, 2017. doi: 10.1109/TNNLS.2016.2582924.
  • Guo et al. (2019) Guo, S., Lin, Y., Feng, N., Song, C., and Wan, H. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pp.  922–929, 2019.
  • Hamilton (2020) Hamilton, J. D. Time series analysis. Princeton university press, 2020.
  • Hofer et al. (2019) Hofer, C. D., Kwitt, R., and Niethammer, M. Learning representations of persistence barcodes. JMLR, 20(126):1–45, 2019.
  • Huang et al. (2020) Huang, R., Huang, C., Liu, Y., Dai, G., and Kong, W. Lsgcn: Long short-term traffic prediction with graph convolutional networks. In Bessiere, C. (ed.), Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, pp. 2355–2361, 2020.
  • Huang et al. (2019) Huang, S., Wang, D., Wu, X., and Tang, A. Dsanet: Dual self-attention network for multivariate time series forecasting. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp.  2129–2132, 2019.
  • Kim et al. (2020) Kim, W., Mémoli, F., and Smith, Z. Analysis of dynamic graphs and dynamic metric spaces via zigzag persistence. In Topological Data Analysis, pp.  371–389. Springer, 2020.
  • Kipf & Welling (2017) Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. ICLR, 2017.
  • Li et al. (2018) Li, Y., Yu, R., Shahabi, C., and Liu, Y. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. International Conference on Learning Representations, 2018.
  • Pareja et al. (2020) Pareja, A., Domeniconi, G., Chen, J., Ma, T., Suzumura, T., Kanezashi, H., Kaler, T., Schardl, T., and Leiserson, C. Evolvegcn: Evolving graph convolutional networks for dynamic graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp.  5363–5370, 2020.
  • Schmidhuber Schmidhuber, J. LSTM: Impact on the world’s most valuable public companies. http://people.idsia.ch/~juergen/impact-on-most-valuable-companies.html. Accessed: 2020-03-19.
  • Shin & Kim (2020) Shin, S. and Kim, W. Skeleton-based dynamic hand gesture recognition using a part-based gru-rnn for gesture-based interface. IEEE Access, 8:50236–50243, 2020. doi: 10.1109/ACCESS.2020.2980128.
  • Song et al. (2020) Song, C., Lin, Y., Guo, S., and Wan, H. Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp.  914–921, 2020.
  • Sutskever et al. (2014) Sutskever, I., Vinyals, O., and Le, Q. V. Sequence to sequence learning with neural networks. Advances in neural information processing systems, 27:3104–3112, 2014.
  • Tausz & Carlsson (2011) Tausz, A. and Carlsson, G. Applications of zigzag persistence to topological data analysis. arXiv:1108.3545, 2011.
  • Tymochko et al. (2020) Tymochko, S., Munch, E., and Khasawneh, F. A. Hopf bifurcation analysis using zigzag persistence. Algorithms, 13(11):278, 2020.
  • Vassilevska et al. (2006) Vassilevska, V., Williams, R., and Yuster, R. Finding the smallest h-subgraph in real weighted graphs and related problems. In Automata, Languages and Programming, pp.  262–273. Springer Berlin Heidelberg, 2006.
  • Veličković et al. (2018) Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. Graph attention networks. ICLR, 2018.
  • Weber et al. (2019) Weber, M., Domeniconi, G., Chen, J., Weidele, D. K. I., Bellei, C., Robinson, T., and Leiserson, C. E. Anti-money laundering in bitcoin: Experimenting with graph convolutional networks for financial forensics. arXiv preprint arXiv:1908.02591, 2019.
  • Wu et al. (2019) Wu, Z., Pan, S., Long, G., Jiang, J., and Zhang, C. Graph wavenet for deep spatial-temporal graph modeling. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pp.  1907–1913, 2019.
  • Xian et al. (2020) Xian, L., Adams, H., Topaz, C. M., and Ziegelmeier, L. Capturing dynamics of time-varying data via topology. arXiv:2010.05780, 2020.
  • Yan et al. (2018) Yan, S., Xiong, Y., and Lin, D. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.
  • Yao et al. (2018) Yao, H., Wu, F., Ke, J., Tang, X., Jia, Y., Lu, S., Gong, P., Ye, J., and Li, Z. Deep multi-view spatial-temporal network for taxi demand prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
  • Yu et al. (2018a) Yu, B., Yin, H., and Zhu, Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pp.  3634–3640, 2018a.
  • Yu et al. (2018b) Yu, B., Yin, H., and Zhu, Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pp.  3634–3640. International Joint Conferences on Artificial Intelligence Organization, 7 2018b. doi: 10.24963/ijcai.2018/505. URL https://doi.org/10.24963/ijcai.2018/505.
  • Yu et al. (2019) Yu, Y., Si, X., Hu, C., and Zhang, J. A review of recurrent neural networks: Lstm cells and network architectures. Neural Comput., 31(7):1235–1270, 2019.
  • Yuan et al. (2019) Yuan, J., Wang, H., Lin, C., Liu, D., and Yu, D. A novel gru-rnn network model for dynamic path planning of mobile robot. IEEE Access, 7:15140–15151, 2019. doi: 10.1109/ACCESS.2019.2894626.

Appendix A Additional Experimental Settings

On PeMSD4 and PeMSD8, we train our model using the Adam optimizer with an initial learning rate $lr=0.003$ and decay rate $\rho=0.3$, whilst we set the learning rate $lr$ to 0.001 and the decay rate $\rho$ to 0.1 on the Bytom and Decentraland datasets. The length of the Laplacianlink is set to 2 and 3 for the transportation networks and the token networks, respectively. Our Z-GCNETs is trained with batch sizes of 64 and 8 on PeMSD4 and PeMSD8, respectively. On the Ethereum token networks, we set the batch size to 8. We run the experiments for 300 epochs and 100 epochs on the transportation networks and the Ethereum token networks, respectively. In all experiments, we set the grid size of the ZPI to $100\times 100$ and use a CNN model to learn the zigzag persistence representation. The CNN model consists of 2 CNN layers with the number of filters set to 8, kernel size 3, stride 2, and global max-pooling with a pool size of $5\times 5$.

Appendix B The Choice of Filtration

We have also run experiments on the impact of the choice of filtration. In addition to the weight rank clique filtration, we consider power and weighted-degree sublevel filtrations. Table 8 shows a subset of illustrative results for the Ethereum token networks. For sparser graphs such as Bytom, all filtrations tend to yield similar results. For more heterogeneous dynamic graphs with a richer topological structure, e.g., Decentraland, the power filtration is the winner, as it better captures the evolution of the underlying graph organization. The proposed methodology is compatible with any filtration.

Table 8: Z-GCNETs (MAPE) for different zigzag filtrations.
Filtration Weighted-degree sublevel set Power
Dataset\\backslashScale Transaction Volume Volume
Bytom 30.56 30.80 30.79
Decentraland 25.18 24.93 22.15

Appendix C Ablation study on Ethereum token networks

To make sure that all the components of Z-GCNETs perform well, we also conduct an ablation study on the Ethereum token networks. Table 9 summarizes the results obtained on Bytom and Decentraland. The results demonstrate that Z-GCNETs outperforms Z-GCNETs without zigzag persistence representation learning (zigzag learning), without spatial graph convolution (GCNSpatial), and without temporal graph convolution (GCNTemporal).

Table 9: Ablation study (MAPE) of Ethereum token networks.
Architecture Dataset
Bytom Decentraland
Z-GCNETs 31.04% 23.81%
W/o Zigzag learning 33.19% 24.24%
W/o GCNSpatial 34.32% 25.22%
W/o GCNTemporal 31.25% 24.62%