Autoregressive GNN-ODE GRU Model for Network Dynamics

Bo Liang Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, P. R. China Lin Wang Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, P. R. China Xiaofan Wang Corresponding author: [email protected] Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, P. R. China Shanghai Key Laboratory of Power Station Automation Technology, School of Mechatronic Engineering and Automation, Shanghai University, Shanghai, China

Abstract

Revealing the continuous dynamics on the networks is essential for understanding, predicting, and even controlling complex systems, but it is hard to learn and model the continuous network dynamics because of complex and unknown governing equations, high dimensions of complex systems, and unsatisfactory observations. Moreover, in real cases, observed time-series data are usually non-uniform and sparse, which also causes serious challenges. In this paper, we propose an Autoregressive GNN-ODE GRU Model (AGOG) to learn and capture the continuous network dynamics and realize predictions of node states at an arbitrary time in a data-driven manner. The GNN module is used to model complicated and nonlinear network dynamics. The hidden state of node states is specified by the ODE system, and the augmented ODE system is utilized to map the GNN into the continuous time domain. The hidden state is updated through GRUCell by observations. As prior knowledge, the true observations at the same timestamp are combined with the hidden states for the next prediction. We use the autoregressive model to make a one-step ahead prediction based on observation history. The prediction is achieved by solving an initial-value problem for ODE. To verify the performance of our model, we visualize the learned dynamics and test them in three tasks: interpolation reconstruction, extrapolation prediction, and regular sequences prediction. The results demonstrate that our model can capture the continuous dynamic process of complex systems accurately and make precise predictions of node states with minimal error. Our model can consistently outperform other baselines or achieve comparable performance.

^†^†preprint: APS/123-QED

I Introduction

Complex networks, as the modeling of complex systems, are ubiquitous in real life, involving different disciplines and domains, such as physics [1], biology [2], sociology [3], computer science [4] and control science [5] . Complex networks are essential carriers of information and have attracted much attention from researchers in recent years [6]. Researchers attempt to utilize various tools to exploit and make good use of the information hidden behind the network. Besides, what interests researchers more is the dynamic process on the complex networks [7], that is, the evolution of the complex systems. Each unit in the network interacts with other units along the edges according to its specific dynamic rules, and the state of each unit evolves over time. Usually, the dynamic rules of each unit can be represented by the ordinary differential equation(ODE) or partial differential equation(PDE). For instance, in the gene regulatory network, the expression level of one gene is affected or restricted by the expression level of other genes, and the expression levels of genes change dynamically [8]. In the mutualistic interaction network, the number of a species is not merely affected by the ecological environment and external factors but also by other partners with mutualistic interactions [7]. The fundamental and essential part of the dynamic process on the network is the governing equation which controls and models the dynamic laws of all the units. Discovering governing equations helps to understand the dynamics of the physical process, which also provides underlying guidance to the advancement of technology in return [9].

However, in many current dynamic systems on networks, governing equations are relatively unknown or partially known, which causes great difficulties in understanding and quantitative descriptions of the system. With the development of technology, states of the dynamic systems are measurable and traceable with instruments and sensors, and abundant time series data are observed, which gives rise to a new mode of data-driven discovery in governing equations or system identification [10]. Earlier researches include classic linear approaches [11], symbolic regression [12], dynamic modeling decomposition [13], nonlinear regression [14], and nonlinear Laplacian spectral analysis [15]. Sparse regression also makes good advancement in this field [16]. Recent efforts are devoted to leveraging artificial neural network (ANN) and deep neural network(DNN) to tackle the time series data and identify the system model [16]. Champion et al. [17] present a deep auto-encoder neural network to model the coordinate transformation and utilize the sparse identification of nonlinear dynamics algorithm for parsimonious modeling. Nevertheless, these methods are susceptible to the system size and are not applicable to the complex dynamic process and large networks composed of thousands of units. In practice, node states of the network can not be observed ideally at regular intervals, and the time series data is usually irregular and sparse in most cases, which also causes problems for these methods. Nowadays, instead of identifying the dynamic model in mathematical functions, researchers try to simulate, fit and learn the network dynamics in the form of a black box without need of identifying or knowing the specific model formulas. NDCN is proposed to use the Ordinary Differential Equation system combined with the Graph Neural Network block to learn the continuous network dynamics process [18]. The GNN block is used to model the complicated and nonlinear interactions between node states, and the real-time dynamic changes of node states are caught or tracked by the ODE system. The real-time states of nodes can be achieved similarly by solving an ODE initial-value problem. The depth of GNN blocks is analogous to the length in the time domain. NDCN uses a single ODE system to model the network dynamics, and the predicted states of nodes are obtained by solving the same ODE initial-value problem. However, using a single ODESolver, NDCN can not handle the long-term network dynamics well. The deviation between the predicted values and real states of nodes may get larger and larger over time if the predicted states of nodes in the early period do not fit well with the true observations. In addition, NDCN does not make full use of all the observations effectively. The observations of node states other than the initial values are not engaged in the training process as prior knowledge. This is probably one of the main reasons for this bad performance in the predictions of the long-term network dynamics. And this situation will become even worse when the observations are very sparse. It is difficult for a single ODE system to learn or fit the real dynamics of node states with sparse observations. To mitigate or overcome the effect of the sparsity of observations on NDCN and improve the ability of the long-term prediction, instead of directly using ODE systems to model the entire network dynamics process like NDCN, we model the network dynamics in segments using the ODE equations in an autoregressive way, which can reduce the difficulty of learning the dynamic process. In this way, the true observations can be fully involved in the training process as prior knowledge and effectively correct the prediction bias at each timestamp to ensure that the initial value for the next prediction is as accurate as possible. By this means, we impose a strong constraint on both ends of the ODESolver to reduce the prediction error and force the ODE system to easily capture or fit the real dynamics.

In this paper, we propose an Autoregressive GNN-ODE GRU Model(AGOG) to learn and capture the continuous network dynamics in a data-driven manner. Considering the sparsity of the observations of node states, we use the Autoregressive model to make a one-step ahead prediction based on the observation history to guarantee the continuity and accuracy of node states. The GNN module is utilized to model the complicated and nonlinear network dynamics, and this module includes two parts that imply the interactions between nodes and the dependence of dynamic evolution, respectively. The dynamics of hidden states of node states is formulated by the ODE system. The augmented ODE system [19] is used to map the hidden state of node states into the continuous time domain, and the hidden state is solved in the expanded space, which could improve generalization and stability and achieve lower losses. And the hidden states are updated through observations by GRUCell. The objective function is defined to minimize the reconstruction error and continuity error which describe the accuracy of prediction and the continuity of dynamic processes, respectively. To evaluate the performance of our model, in consideration of these three tasks: interpolation reconstruction, extrapolation prediction, and regular sequences prediction, we visualize the learned network dynamics and test our model and other baseline methods in different network dynamics with various underlying networks. The results demonstrate that our model can learn and capture the continuous network dynamics precisely and stably. Our model consistently outperforms other baseline methods in most cases and can still achieve comparable performance in other cases.

The rest of this paper is organized as follows. The following Section is the Related Work. Our model is described in Section $3$ . The experiments are executed in Sections $4$ and $5$ . This paper is summarized in the last Section.

II Related work

The proposal of Neural ODE [20] causes a sensation and ignites the enthusiasm of researchers. Different from discrete hidden layers in the neural networks, Neural ODE uses the ODE system as a fundamental component, and the output is calculated through the ODESolver. The continuous neural dynamics of Neural ODE determines that it can handle or incorporate time-series data, especially with non-uniform intervals, which inspires further research greatly [21, 22, 23]. Rubanova et al. [24] define two models: autoregressive ODE-RNN and Latent ODE based on variational auto-encode, to handle the time series with irregular intervals. Latent ODE uses ODE-RNN as the recognition network to identify the approximate posterior of the initial state, and the ODESolver is then utilized to achieve the hidden state at any time point. Lechner et al. [25] theoretically prove that ODE-RNN is plagued by gradient vanishing and gradient explosion and introduce ODE-LSTM to address this problem and capture the long-term dependency in irregularly sampled time-series data. Inspired by the Neural ODE, Brouwer et al. [26] modify the Gated Recurrent Unit in the continuous time domain and combine it with the Bayesian update network to model the sporadic observations. Herrera et al. [27] consider the lack of theoretical guarantee for the predictive capabilities of methods mentioned above and then propose Neural Jump ODE to model the conditional expectation for stochastic processes in continuous time.

The powerful capabilities of Graph Neural Networks(GNN) have been proved in various tasks, such as node classification, link prediction, and graph classification [28, 29]. In recent years, multiple variants of GNN spring up such as GCN [30], GAE [31], GAT [32] and GraphSage [33]. GCN [30] can be divided into two classes: spectral-based and spatial-based approaches. Spectral-based approaches define the convolutional operator in the spectral domain, and the convolutional operator is deemed as the noise filter in the graph signal processing [34]. Researchers consider designing different convolutional operators to capture different structural properties of graphs. Spatial-based approaches define message passing rules or convolutional operators based on graph topology, and the node representation or node message is updated by its neighbors’ and its own [35, 36]. To improve the performance of GNN, some tricks or modules are proposed. Skip connection is tried to alleviate the over-smoothing and make GNN deeper [37]. Sampling methods consider reducing the size of node neighbors by conducting sampling for each node or each convolution layer [38]. Pooling methods focus on designing pooling layers to learn high order or graph level hierarchy representation of graphs [39]. Besides, GNN combined with gate mechanisms is also proposed to use LSTM or GRU in the message passing to improve the long-term passing of node information [40].

The work of combining GNN and Neural ODE or PDE has also been proposed. NDCN [18] uses the GNN-based ODE system to learn the nonlinear and complex continuous dynamics on the network and achieve predictions of node states at an arbitrary time. The GNN module is utilized to model the network dynamics equation, and the ODE system is used to map the discrete node states into the continuous time domain. Besides, NDCN is also suitable for the node classification task. The continuous dynamics of ODE corresponds to the discrete layers of GNN, and the continuous time can be interpreted as the depth of discrete layers. Similarly, CGNNs [41] generalize GNN with discrete layers into a continuous physical process. Inspired by the idea of Pagerank and graph diffusion methods, CGNNs use the ODE system to define the continuous dynamics of node embedding, and the derivative of node embedding is derived from the embedding of its neighbors and itself and its initial values. CGNNs can effectively improve over-smoothing, help make GNN deeper, and learn long-term dependency. Grand [42] is also motivated by the continuous diffusion process and model graph learning as a discretization of a basic PDE. And Grand uses different spatial and temporal discretization formalism and diffusivity functions to model the diffusion on the graph. However, CGNNs and Grand are both derived from the graph diffusion mechanisms, designed for the node classification task, and not particularly targeted for modeling the dynamic processes between nodes, and the terminal states or the final representation of nodes are achieved in a short time. Moreover, STGODE [43] applies CGNN to the Traﬀic Flow Forecasting. STGODE uses the continuous GNN-ODE to capture the spatial-temporal dependency and alleviate over-smoothing.

III Autoregressive GNN-ODE GRU Model

Consider that a continuous dynamic process on a network is governed by:

\frac{dx(t)}{dt}=f(x,G,t),

(1)

where $x(t)$ are the states of nodes and $G=(V,E)$ represents the underlying network structure that describes the interaction relationships between nodes. And $f$ is the governing function which dominates the dynamic law of the evolution of node states. The number of nodes of $G$ is $n$ , and $x(t)\in\mathbb{R}^{n\times k}$ where $k$ denotes the dimension of node states. And $k$ can be any integer greater than zero.

For this continuous dynamic process on the network, we can have a series of observations of node states, $\mathbf{x}=\{(t_{0},x_{0}),(t_{1},x_{1}),...,(t_{i},x_{i}),...,(t_{T},x_{T})\}$ , where each $t_{i}\in\mathbb{R}$ denotes the timestamp of the observation $x_{i}\in\mathbb{R}^{n\times k}$ and $t_{0}<...<t_{i}<...<t_{T}$ . We want to realize predictions of the node states on $G$ at any given time which may be larger or smaller than $t_{T}$ . To tackle this problem, we propose an Autoregressive GNN-ODE GRU model to provide a novel solution. Autoregressive models make further predictions based on the observation history and specify the conditional probability $p(x_{i}|x_{i}-1,...,x_{0})$ . One-step ahead predictions are achieved by solving an ODE initial-value problem. And the new observations as prior knowledge are combined with the hidden states for the predictions at the next timestamp. The diagram of our model is shown in Fig. 1, and the framework of our model is specified as follows.

Refer to caption — Figure 1: Diagram of Autoregressive GNN-ODE GRU Model. The process of GRUCell[44] is presented inside the black box. Our model provides one-step ahead predictions based on observations history in the form of an autoregressive model. The observation is first mapped into the hidden state. And the dynamic of hidden states is formulated by the Augmented ODE system. The hidden state at the next moment is achieved by solving an ODE initial value problem through the ODESolver. Predictions of node states are obtained from the hidden state. Then the hidden states are updated through GRUCell with new observations at the same time as the prior knowledge. The updated hidden states are used to predict the node states at the next moment. Procedures are repeated over and over to achieve predictions of node states at the corresponding time.

Suppose that the initial observation of node states is $x_{0}\in\mathbb{R}^{n\times k}$ , and the corresponding observation timestamp is $t_{0}$ . We first map $x_{0}$ into a hidden state $h(t_{0})\in\mathbb{R}^{n\times d}$ :

h(t_{0})=x_{0}W_{e}+b_{e},

(2)

in which $W_{e}\in\mathbb{R}^{k\times d}$ and $b_{e}\in\mathbb{R}^{n\times d}$ .

The dynamic of hidden states is specified by the following ordinary differential equation system:

\frac{dh(t)}{dt}=g_{\theta}(h(t),t),

(3)

in which the function $g_{\theta}(h(t),t)$ formulates the latent dynamic of hidden states with the parameter $\theta$ . Consider that the dynamic process is modeled or relies on the network structure. The function $g_{\theta}$ is designated expressly by a GNN module:

g_{\theta}=\Phi h(t)\theta_{1}+(h(t)\theta_{0}+b_{0}),

(4)

where $\Phi$ is the normalized Laplacian matrix, $\Phi=D^{-\frac{1}{2}}(D-A)D^{-\frac{1}{2}}$ in which $A$ denotes the adjacency matrix of $G$ and $D$ denotes the degree matrix of the network. On the right side of Eq. 4, the left part models the interactions of hidden states between nodes, while the right part specifies the dependence of the dynamic evolution of the hidden state on the current hidden state of nodes. The left part simulates the influence of dynamic interactions between nodes on the state of nodes, and the right part represents the self-dynamic of nodes which implies the evolution rule of the node states influenced by their own states.

So, the next hidden state at $t_{1}$ , $h^{\prime}(t_{1})$ , can be achieved by the ODESolver:

	$\displaystyle h^{\prime}(t_{1})=h(t_{0})+\int_{t_{0}}^{t_{1}}g_{\theta}(h(t),t)\,dt,$		(5)
	$\displaystyle h^{\prime}(t_{1})=ODESolver(g_{\theta}(h(t),t),h(t_{1}),(t_{0},t_{1})).$		(6)

This initial-value problem is solved by numerical methods, e.g., the fourth-order Runge-Kutta method, the high-order Dormand-Prince DOPRI5 method, and the Euler method. We use the ODE solvers from the torchdiffeq package (https://github.com/rtqichen/torchdiffeq), and details can be found in [20]. To make the ODESolver more powerful and expressive, we adopt the Augmented Neural ODE [19] to offer a novel solution to the problems above. The space of the hidden state is expanded, and the ODE is solved from $\mathbb{R}^{n\times d}$ to $\mathbb{R}^{n\times(d+p)}$ where $p$ represents the augmented dimension. The augmented space is defined as $a(t)\in\mathbb{R}^{n\times p}$ , and Eq. 3 is updated to:

\frac{d[h(t),a(t)]}{dt}=g_{\theta}([h(t),a(t)],t),[h(0),a(0)]=[h(t_{0}),\mathbf{0}].

(7)

For simplicity, the hidden state with the augmented space is denoted as $h_{a}(t)$ , and $h_{a}(t_{0})=[h(t_{0}),\mathbf{0}]$ . Intuitively, the hidden state $h(t)$ is merged with the augmented zero space $a(t)$ , and the ODE is solved in the expanded space. With the additional space, the ODE flow can easily flow to the augmented space, and the learned hidden state is represented in a higher dimension and has a better expressivity and precision. Simultaneously, the trained function $G_{\theta}$ becomes more smooth and achieves better generalization and lower time cost. Then the predicted hidden state $h^{\prime}(t_{1})\in\mathbb{R}^{n\times(d+p)}$ at $t_{1}$ is achieved by:

h^{\prime}(t_{1})=ODESolver(g_{\theta},h_{a}(t_{0}),(t_{0},t_{1})).

(8)

And the prediction of node states at $t_{1}$ , $x^{\prime}_{1}$ , is achieved through a linear layer:

x^{\prime}_{1}=h^{\prime}(t_{1})W_{o}+b_{o},

(9)

where $W_{o}\in\mathbb{R}^{(d+p)\times k}$ and $b_{o}\in\mathbb{R}^{n\times k}$ .

The real node states at $t_{1}$ are denoted as $x_{1}$ . We use real observations as the prior knowledge to revise and update the obtained hidden state, $h^{\prime}(t_{1})\in\mathbb{R}^{n\times(d+p)}$ . We also map $x_{1}$ into the hidden state as $h(t_{1})\in\mathbb{R}^{n\times d}$ and utilize the GRUCell to update hidden state $h^{\prime}(t_{1})$ at $t_{1}$ given current true observation $x_{1}$ :

	$\displaystyle h(t_{1})=x_{1}W_{e}+b_{e},$		(10)
	$\displaystyle\hat{h}(t_{1})=GRUCell(h^{\prime}(t_{1}),h(t_{1})).$		(11)

In detail, Eq. 11 is formulated as:

\begin{split}r_{1}=\sigma(h^{\prime}(t_{1})W_{r}+b_{wr}+h(t_{1})U_{r}+b_{ur}),\\ z_{1}=\sigma(h^{\prime}(t_{1})W_{z}+b_{wz}+h(t_{1})U_{z}+b_{uz}),\\ n_{1}=\tanh(h^{\prime}(t_{1})W_{h}+b_{wh}+r*(h(t_{1})U_{h}+b_{uh})),\\ \hat{h}(t_{1})=(1-z_{1})*n_{1}+z_{1}*h(t_{1}),\end{split}

(12)

in which $W_{h}\in\mathbb{R}^{(d+p)\times d}$ , $U_{h}\in\mathbb{R}^{d\times d}$ , $b_{wh},b_{uh}\in\mathbb{R}^{n\times d}$ . The updated hidden state at $t_{1}$ is denoted as $\hat{h}(t_{1})\in\mathbb{R}^{n\times d}$ . The augmented hidden state $h_{a}(t_{1})$ at $t_{1}$ is expressed as $[\hat{h}(t_{1}),\mathbf{0}]$ , and the updated prediction of node states at $t_{1}$ from the augmented updated hidden state $h_{a}(t_{1})$ , can be calculated from:

\hat{x}_{1}=h_{a}(t_{1})W_{o}+b_{o}.

(13)

Then $h_{a}(t_{1})$ can be the initial value of the ODESolver to get the hidden state at the next moment $t_{2}$ , $h^{\prime}(t_{2})$ :

h^{\prime}(t_{2})=ODESolve(g_{\theta},h_{a}(t_{1}),(t_{1},t_{2})).

(14)

And the prediction $x^{\prime}_{2}$ at $t_{2}$ can be achieved through $h^{\prime}(t_{2})$ . Repeating these procedures above, we can obtain the prediction of node states at the corresponding time. There are two types of outputs, $x^{\prime}_{i}$ and $\hat{x}_{i}$ , defined as the prediction and the updated prediction to show the distinction, respectively. $x^{\prime}_{i}$ is obtained from the solution of the ODESolver $h^{\prime}(t_{i})$ , through a linear layer based on Eq. 9, while $\hat{x}_{i}$ is achieved from the augmented updated hidden state $h_{a}(t_{i})$ , through a linear layer based on Eq. 13. The augmented updated hidden state $h_{a}(t_{i})$ is obtained in two steps. The predicted hidden state of the ODESolver solution $h^{\prime}(t_{i})$ is updated by the GRUCell through the real observations $x_{i}$ at the same time to get the updated hidden states $\hat{h}(t_{i})$ , then $h_{a}(t_{i})$ is achieved after the augmentation of the $\hat{h}(t_{i})$ . And $x^{\prime}_{i}$ is deemed as the final result that we need.

The loss function is defined as below:

L=\frac{1}{T}\sum_{i=0}^{T}|x^{\prime}_{i}-x_{i}|+\frac{1}{T}\sum_{i=0}^{T}|\hat{x}_{i}-x^{\prime}_{i}|,

(15)

where $|\cdot|$ denotes the $l_{1}$ -normal loss, that is the mean element-wise absolute value difference. Our aim is to minimize this objective function, Eq. 15. The above function has two terms: the left part is the reconstruction error, while the right part is the continuity error. It is easy to understand the reconstruction error which requires the predicted value to be closer to the true states. As for the continuity error, in the dynamic of the hidden states, we expect that the updated hidden states should be consistent with the predicted hidden states from the ODESolver at the same timestamp. The updated hidden states are then used as the initial value for the predictions at the next timestamp. We use the true observations to correct the hidden state and hope that the updated hidden state will not deviate too much from the original hidden state. By introducing the true observations, we impose a strong constraint on both ends of the ODESolver and force the ODE system to capture the real dynamics easily. Moreover, the continuity error implies the constraints on the smoothness and continuity of the trajectory of the predicted states. $W_{e},W_{o},\theta_{0},\theta_{1},b_{e},b_{o},b_{0}$ , and GRUCell are all trainable parameters, and they are updated through backpropagation of the gradient of the loss. The loss function is optimized by the Adam method, and the pseudocode of our Autoregressive GNN-ODE GRU model is presented in Algorithm 1. For clarity, our model is abbreviated as AGOG(Autoregressive GNN-ODE GRU Model).

Algorithm 1 Autoregressive GNN-ODE GRU Model

1:augmented dimension:

p

; number of nodes:

n

; node states and their timestamps:

\mathbf{x}=\{(t_{i},x_{i})\}_{i=0...T}

;

2:predicted node states:

\{x^{\prime}_{i}\}_{i=0...T}

and

\{\hat{x}_{i}\}_{i=0...T}

;

3:for

i=0:T

h(t_{i})=x_{i}W_{e}+b_{e}

5:end for

h_{a}(t_{0})=[h(t_{0}),\mathbf{0}\in\mathbb{R}^{n\times p}]

h^{\prime}(t_{1})=ODESolve(g_{\theta},h_{a}(t_{0}),(t_{0},t_{1}))

\hat{h}(t_{1})=GRUCell(h^{\prime}(t_{1}),h(t_{1}))

9:for

i=2:T

10:

h_{a}(t_{i-1})=[\hat{h}(t_{i-1}),\mathbf{0}\in\mathbb{R}^{n\times p}]

11:

h^{\prime}(t_{i})=ODESolve(g_{\theta},h_{a}(t_{i-1}),(t_{i-1},t_{i}))

12:

\hat{h}(t_{i})=GRUCell(h^{\prime}(t_{i}),h(t_{i}))

13:end for

14:for

i=0:T

15:

x^{\prime}_{i}=h^{\prime}(t_{i})W_{o}+b_{o}

16:

\hat{x}_{i}=h_{a}(t_{i})W_{o}+b_{o}

17:end for

IV Interpolation reconstruction and Extrapolation prediction

We evaluate our model on different structures of networks with various network dynamics. Complex network models include: 1) Community network [45], 2) Grid network, 3) Erdós and Rényi random network [46], 4) Power-law network [47], and 5) WS small-world network [48]. For each type of network models, we investigate three kinds of network dynamics processes. And the real benchmark dynamic of node states is generated according to its underlying network structure and rules of dynamic evolution by the Dormand-Prince method. Regarding the experimental setup, we generally follow the settings in NDCN [18], except for setting a larger time interval for observations, because we consider the sparsity of observations in real life. In the simulation, for each network dynamics on each network structure, we randomly sample 120 snapshots of node states with irregular time interval, $\mathbf{x}=\{(t_{0},x_{0}),(t_{1},x_{1}),...,(t_{119},x_{119}),t_{0}<t_{1}...<t_{119}\}$ , where sampling intervals between adjacent datapoints are not equal. The last $20$ datapoints $\{(t_{i},x_{i})\}_{i=100,...,119}$ are used for the extrapolation prediction task. As for the first 100 datapoints, we randomly choose $P\%$ datapoints for training, and the residual datapoints are targeted for the interpolation reconstruction task. A smaller $P$ means that the observed data is more sparsity. To illustrate the superiority of our model, we use the MAE(Mean Absolute Error), and Normalized L1 loss metrics to measure the performance of our model on these two tasks. The observed node states may have zero, so the MAPE(Mean Absolute Percentage Error) is not applicable here. Instead, the Normalized L1 loss is selected, which is normalized by the mean value of all observations.

Experimental configuration. The number of nodes of the generated networks is set as $400$ . The initial node state $x_{0}$ is set as the same for each case. The training percentage of observed snapshots $P\%$ is set as $10\%$ , and $30\%$ , respectively. With regard to hyperparameters in our model, the hidden dimension $d$ of hidden state $h(t_{i})$ is set as $20$ , $h(t_{i})\in\mathbb{R}^{n\times 20}$ , and the augmented dimension $p$ is set as $5$ . And our model is trained for $800$ epochs with the learning rate $0.01$ in all cases. The Euler method is used for the ODESolver.

We use NDCN method [18] as the baseline to compare with our model. NDCN uses a simple GNN module to model the interactions between node states, and the discrete node states are mapped into the continuous time domain by the ODE system. The main goal of NDCN is state prediction on the network. There are also three variants of NDCN: No-Encode, No-Graph, and No-Control. For succinctness, we only consider NDCN in this paper because of its best performance between it and its variants. Moreover, there are also some GNN-ODE methods that are not designed for predictions of node states, but we consider them as baselines here as well. These methods define the evolution function of node embedding as the ODE function, and the final embedding is achieved by the ODESolver in a finite time. Here we regard the node embedding as the hidden state of nodes states. We use the ODE functions of these GNN-ODE methods to model the network dynamics, respectively. CGNNs [41] expand the discrete GNN layers into a continuous physical process inspired by the diffusion process on graphs and define an evolution equation of node embedding that the derivative of node embedding is the combination of itself, the embedding of its neighbors, and its initial value. CGNNs have two different variants, which consider whether each dimension of node embedding evolves independently or not, and we denote them as CGNN and WCGNN, respectively, which are all considered here. Similarly, Grand [42] is also motivated by the continuous diffusion process and model GNN as a discretization of a basic PDE. And Grand uses different spatial or temporal discretization formalisms and diffusivity functions to model the diffusion on the graph. The diffusivity is modeled by the scaled dot product attention mechanism. The more general non-linear form of GRAND [42] is tested here, denoted as GRAND. We use the default parameters for NDCN as stated in [18]. The hidden size is set as $20$ for CGNN, WCGNN, and GRAND. And the dimension of the query and key matrices is both set as $20$ for GRAND. The loss function of CGNN, WCGNN, and GRAND is the same as that of NDCN.

IV.1 Network dynamics model

We consider several following network dynamics from different domains in this paper. Define the adjacent matrix of the underlying network as $A$ and the state of node $i$ at timestamp $t$ as $z_{i}(t)\in\mathbb{R}^{1\times k}$ in which $k$ denotes the dimension of the node state. In the experiment, $k$ is set as $1$ for all network dynamics. The descriptions of these network dynamic models and specific governing equations of dynamics are listed in turn.

•

Gene regulatory network dynamics model [7]. This model simulates the interactions between genes which are governed by:

$\frac{dz_{i}(t)}{dt}=-b_{i}z_{i}+\sum_{j=1}^{N}A_{ij}\frac{z_{j}^{h}}{z_{j}^{h}+1}.$ (16)
•

Kuramoto network dynamics model [49]. This model describes the synchronizations of phase-coupled oscillators and is defined as:

$\frac{dz_{i}(t)}{dt}=\omega_{i}+k_{i}\sum_{j=1}^{N}A_{ij}\sin(z_{i}-z_{j}).$ (17)

•

Mutualistic interaction network dynamics model [7]. This model describes the mutualistic interactions between species in the ecotope, and the specific dynamics equation is shown as:

\begin{split}\frac{dz_{i}(t)}{dt}=-b_{i}+z_{i}(1-\frac{z_{i}}{k_{i}})(\frac{z_{i}}{c_{i}}-1)+\\ \sum_{j=1}^{N}A_{ij}\frac{z_{i}z_{j}}{d_{i}+e_{i}z_{i}+h_{j}z_{j}}.\end{split}

(18)

IV.2 Result

We visualize the snapshots of node states of each network dynamics with various underlying networks learned by different methods in Figs. 2, 3, and 4, where the training percentage of observed snapshots $P\%$ is set as $10\%$ . Due to space limitations, we just present the dynamic process on three networks for each network dynamics. In each case, we select five snapshots of node states, and node states evolve along the direction of time from top to bottom. In each snapshot, $400$ nodes are mapped into a $20\times 20$ matrix, and each square corresponds to a node and reflects its state value. In each row of each case, the five photos show the real dynamic of node states, the dynamic captured by our model AGOG, and the dynamic learned by NDCN and the other two selected methods in turn, from left to right. The number at the bottom of each row of images represents the difference between the true value and prediction of the node state by different methods, which is quantified by the MAE metric. (In addition, we also show the quantitative results over time of the interpolation reconstruction and extrapolation prediction tasks in the Figs. 5, 6 and 7)

As shown in these figures, each type of network dynamics with different structures of networks performs various evolution processes of node states except the Kuramoto network dynamics. This may be because the Kuramoto network dynamic is a long-term dynamic process. The result demonstrates that our model can learn and capture the network dynamics accurately in most instances. In rare cases, our model exhibits a slight time-lag but can still catch the real dynamics in a very short time. Moreover, the quantitative results also show that our method can maintain an accurate prediction of node states with a small error at any arbitrary time. But it is hard for these baselines to capture the real dynamics. It is clear that these methods either do not learn the network dynamics well from the beginning or their predictions deviate more and more from the true states as time goes on. For example, in some cases, NDCN can learn the pattern of the real dynamics initially, but the value of predicted node states quickly contrasts sharply with the real one, which is more obvious when the dynamic evolves for quite a long time. It may be a challenge of robustness or the learning ability of the long-term for NDCN. And NDCN exhibits severe time delay in most cases. At the same time, the sparsity of the observations also causes great difficulties for NDCN. As for other GNN-ODE based methods, it is very tough for them to capture the real evolution of network dynamics, especially for GRAND. WCGNN has a bright performance in some scenes, but at the same time, it shows a strong time lag.

The experiments are conducted over 20 independent realizations, and the average performance in all cases is presented in Tabs. 1 and 2, corresponding to the interpolation reconstruction task and extrapolation prediction task, respectively(owing to space constraints, results with the percentage of observed node states as $30\%$ are shown in the APPENDICES). In each row, the results of our model AGOG and other baseline methods under these two metrics in a certain case are shown from left to right, respectively. We also present the results of our method without considering the continuity loss in the objective function, which is abbreviated as AGOG*. Each result is reported after multiplying by $10^{-2}$ , and the best result in each case and each metric is in bold. The results show that our model outperforms other baseline methods in both interpolation reconstruction and extrapolation prediction tasks by a huge margin, especially when the proportion of the observed node states is very small. NDCN cannot handle very sparse data well and learn the network dynamics from that. Our model presents considerable robustness and can learn and simulate the continuous network dynamics precisely. The accuracy of our model is one order of magnitude higher than other baseline methods in most cases. In the gene regulation dynamics, the Normalized L1 metric of our model can be $9.5\%$ of that of NDCN and $10.3\%$ of that of WCGNN in the extrapolation prediction task, and $20.4\%$ of that of NDCN and $18.9\%$ of that of WCGNN in the interpolation reconstruction on average. And in the Mutualistic interaction network dynamics, the average MAE metric accuracy of our model can achieve $14.1\times 10^{-2}$ in the extrapolation prediction task, which is $9.7\%$ of that of WCGNN. These two metrics reach the same conclusion, and all demonstrate the superiority and unparalleled ability of our model. Moreover, with the introduction of the continuity loss, the performance of AGOG is significantly improved in comparison with AGOG*, in both extrapolation and interpolation prediction tasks. The continuity of the trajectory of the predicted nodes states contributes to the accuracy of predictions. In rare cases, AGOG performs slightly worse than AGOG*. Under these conditions, the introduction of continuity loss may cause over-smoothing, making the true observations less useful. The balance between the reconstruction error and the continuity error may be a good choice to alleviate the over-smoothing.

Table 1: Results(

\times 10^{-2}

) of interpolation reconstruction of network dynamics with the percentage of observed node states as

10\%

		MAE						Normalized L1
		AGOG	AGOG*	NDCN	CGNN	WCGNN	GRAND	AGOG	AGOG*	NDCN	CGNN	WCGNN	GRAND
Gene Regulation	Community	83.36	85.21	422.7	569.22	459.95	624.1	3.38	3.45	17.14	23.06	18.63	25.35
	Grid	12.05	20.4	104.81	205.01	121.81	165.95	2.65	4.5	23.06	45.18	26.81	36.49
	Power Law	33.33	37.24	114.34	310.83	226.75	332.87	4.27	4.78	14.66	39.87	29.06	42.67
	Random	100.97	134.49	541.98	582.36	357.37	744.93	3.32	4.44	17.7	18.99	11.63	24.34
	Small World	17.19	29.16	75.05	156.04	73.77	127.79	4.07	6.92	17.69	36.81	17.38	30.16
Kuramoto Model	Community	30.33	32.24	148.65	152.8	106.95	175.32	6.28	6.67	30.77	31.64	22.14	36.29
	Grid	27.86	27.07	145.01	223.28	135.24	157.22	5.74	5.57	29.86	46	27.86	32.38
	Power Law	25.02	26.11	119.98	133.18	88.06	176.38	5.12	5.35	24.56	27.28	18.03	36.11
	Random	32.51	32.02	154.12	139.67	103.79	174.01	6.75	6.64	32.01	29.02	21.56	36.15
	Small World	25.84	26.32	140.57	195.82	120.43	156.25	5.3	5.39	28.8	40.14	24.67	32.01
Mutualistic Interaction	Community	92.9	105.15	263.16	345.62	235.26	344.86	8.34	9.42	23.54	30.78	20.98	30.34
	Grid	21.86	23.07	51.33	88.23	56.98	36.92	10.27	10.83	23.94	41.18	26.55	17.24
	Power Law	38.23	62.31	181.83	188.48	164.55	208.03	7.23	11.84	34.54	35.84	31.18	39.62
	Random	130.9	110.63	303.67	321.14	237.04	301.54	10.15	8.58	23.51	24.83	18.33	23.3
	Small World	14.85	17.85	38.29	53	29.49	25.15	8.79	10.54	22.56	31.17	17.3	14.76

Table 2: Results(

\times 10^{-2}

) of extrapolation reconstruction of network dynamics with the percentage of observed node states as

10\%

		MAE						Normalized L1
		AGOG	AGOG*	NDCN	CGNN	WCGNN	GRAND	AGOG	AGOG*	NDCN	CGNN	WCGNN	GRAND
Gene Regulation	Community	65.53	79.45	784.97	932.9	661.16	651.7	2.09	2.54	25.01	29.79	21.1	20.8
	Grid	8.83	12.39	224.64	213.69	152.79	255.68	1.29	1.81	32.85	31.25	22.35	37.4
	Power Law	26.96	36.8	152.95	426.55	288.29	428.87	2.8	3.82	15.89	44.31	29.95	44.55
	Random	72.87	86.45	853.83	901.31	516.93	592.88	1.85	2.19	21.63	22.86	13.12	15.04
	Small World	11.1	12.74	124.01	189.88	90.18	124.71	1.96	2.24	21.78	33.43	15.89	21.98
Kuramoto Model	Community	13.73	17.86	210.39	166.95	114.41	218.28	2.26	2.94	34.72	27.5	18.86	35.97
	Grid	14.94	15.58	200.52	290.94	158.56	212.69	2.46	2.57	33.09	48.03	26.18	35.11
	Power Law	13.27	15.66	190.7	156.64	108.09	244.54	2.18	2.57	31.31	25.68	17.71	40.09
	Random	13.71	15.96	185.17	135.35	104.81	214.2	2.26	2.64	30.7	22.5	17.38	35.51
	Small World	13.64	14.98	194.18	245.64	141.81	214.89	2.23	2.45	31.81	40.27	23.24	35.21
Mutualistic Interaction	Community	16.2	23.77	194.74	316.53	179.83	205.35	1.17	1.71	14.01	22.78	12.95	14.81
	Grid	19.94	21.04	153.61	248.75	212	76.6	5.16	5.44	39.63	64.13	54.58	19.74
	Power Law	14.9	42.73	269.03	237.97	196.13	132.21	1.69	4.84	30.49	26.98	22.23	14.99
	Random	18.95	17.09	240.46	253.33	168.53	142.82	1.23	1.11	15.61	16.44	10.93	9.27
	Small World	15.9	26.46	187.46	162.67	148.34	89.77	5	8.26	58.92	50.82	46.35	28.08

The results of both the snapshots visualizations and quantitative tasks indicate that the sparsity of datapoints poses large difficulties for these baseline methods. From the results of snapshot visualizations, we can clearly find that these algorithms either fail to learn the network dynamics well from the beginning or their predictions deviate more and more from the true state over time. Consider that these baseline methods all use a single ODE function to directly model the overall continuous network dynamics process. And the long-term predictions of methods are based on the answers of the short-term. At the same time, the sparsity of the observations poses great challenges and difficulties for these methods to fit and model these time-series data. Under this condition, once the method fails to predict the short-term data, the prediction bias for the subsequent data will become so large that the network dynamics cannot be well learned. However, we use the autoregressive model to deal with the sparsity of data. We use the ODE function for segmental modeling of the network dynamics to mitigate the problems caused by the sparsity of the data. And the observations are also introduced into the training process at the same time to correct the predicted values at this moment, so that the subsequent predictions are less likely to be affected by the deviations of the previous predictions. In this way, we impose a strong constraint on both ends of the ODESolver and force the ODE system to capture or fit the real dynamics easily. The results show the validity of our method in the form of the autoregressive model.

V Regular Sequences Prediction

Under ideal conditions, the node states can be observed at equal time intervals. In this section, we try to use our model to capture the network dynamics based on regular sequences of node states. In the experiment, we also consider these continuous network dynamics with different underlying networks above. Regarding the experimental setup, we generally follow the settings in NDCN [18], except for setting a larger time interval for observations, because we consider the sparsity of observations in real life. For each case, we sample 80 snapshots of node states with equal intervals, $\mathbf{x}=\{(t_{0},x_{0}),(t_{1},x_{1}),...,(t_{79},x_{79}),$ $\{t_{0}<...<t_{79}\}$ , where time intervals between adjacent observations are equal. The first $80\%$ observations are used for training, while the last $20\%$ are considered as the test set for the regular sequences prediction task. For evaluation, we also employ the MAE(Mean Absolute Error) and Normalized L1 loss metrics to validate the performance of methods.

Experimental configuration. The basic experimental settings are set as the same as that in the previous Section. Consider the time interval between observations is equal. Beyond the baseline methods mentioned above, we also choose temporal-GNN algorithms as the baseline, which are also considered in [18]. Temporal-GNN algorithms are the combinations of GCN and RNN blocks, and they can not be applied to the interpolation reconstruction task. Only the extrapolation prediction task is considered here. The hidden size of the GCN and RNN blocks are set as $10$ and $5$ , respectively. The default parameters are used for NDCN, and the parameters of our model and other GNN-ODE based methods remain the same as in Section $4$ .

V.1 Baseline

The temporal-GNN algorithms are composed of different RNN cells and GCN blocks. The GCN block simulates and learns network dynamics, while the RNN cell models the sequential connections of node states. Here we just consider GRU, LSTM, and RNN cells. The objective function of the temporal-GNN algorithms is the same as that of NDCN. The details are listed as follows:

•

GRU-GNN(GRU-G). This temporal-GNN is consist of GCN block and GRUCell:

\begin{split}z_{t}=\operatorname{ReLU}(\Phi x_{t}W_{e}+b_{e}),\\ h_{t}=GRUCell(z_{t},h_{t-1}),\\ x^{\prime}_{t+1}=h_{t}W_{d}+b_{d}.\end{split}

(19)

•

LSTM-GNN(LSTM-G). This temporal-GNN is composed of GCN block and LSTMCell:

\begin{split}z_{t}=\operatorname{ReLU}(\Phi x_{t}W_{e}+b_{e}),\\ h_{t}=LSTMCell(z_{t},h_{t-1}),\\ x^{\prime}_{t+1}=h_{t}W_{d}+b_{d}.\end{split}

(20)

•

RNN-GNN(RNN-G). This temporal-GNN is consist of GCN block and RNNCell:

\begin{split}z_{t}=\operatorname{ReLU}(\Phi x_{t}W_{e}+b_{e}),\\ h_{t}=RNNCell(z_{t},h_{t-1}),\\ x^{\prime}_{t+1}=h_{t}W_{d}+b_{d}.\end{split}

(21)

Table 3: MAE results(

\times 10^{-2}

) in the regular sequences prediction of network dynamics.

		MAE
		AGOG	NDCN	CGNN	WCGNN	GRAND	LSTM-GNN	GRU-GNN	RNN-GNN
Gene Regulation	Community	127.11	90.4	1032.35	893.49	606.87	1773.02	1406.7	1469.39
	Grid	33.06	106.98	227.58	216.4	270.63	187.3	181.38	187.03
	Power Law	42.37	33.13	453.64	286.78	440.6	310.13	335.62	322.85
	Random	175.41	85.32	1063.32	961.52	689.47	2768.31	2088.75	2219.91
	Small World	24.49	35.87	214.8	134.03	115.09	75.11	70.14	87.24
Kuramoto Model	Community	22.86	239.33	210.13	136.8	217.47	141.08	141.1	141.37
	Grid	25.14	252.09	335.53	169.64	210.84	146.38	134.32	140.95
	Power Law	24.32	201.19	191.53	122.71	244.15	138.6	136.13	142.79
	Random	23.09	258.95	176.92	134.99	213.76	145.94	142.93	144.89
	Small World	24.03	276.11	293.77	156.23	209.02	143.97	134.7	142.84
Mutualistic Interaction	Community	26.8	45.8	348.58	373.87	168.56	177.17	41.23	94.67
	Grid	39.78	112.34	246.89	206.6	85.98	196.6	205.37	216.79
	Power Law	21.4	81.73	331.29	317.76	150.37	179.97	84.77	110.69
	Random	34.47	70.18	241.21	363.66	84.28	579.91	56.17	116.97
	Small World	47.68	54.38	160.39	144.35	92.7	167.19	153.69	167.48

Table 4: Normalized L1 Loss results(

\times 10^{-2}

) in the regular sequences prediction of network dynamics.

		Normalized L1
		AGOG	NDCN	CGNN	WCGNN	GRAND	LSTM-GNN	GRU-GNN	RNN-GNN
Gene Regulation	Community	4.07	2.9	33.09	28.64	19.45	56.86	45.07	47.11
	Grid	4.86	15.74	33.48	31.83	39.81	27.55	26.68	27.51
	Power Law	4.4	3.44	47.14	29.8	45.79	32.23	34.88	33.55
	Random	4.46	2.17	27.05	24.48	17.54	70.47	53.14	56.48
	Small World	4.34	6.35	38.01	23.73	20.38	15.3	12.41	15.42
Kuramoto Model	Community	3.82	40	35.13	22.87	36.34	23.55	23.55	23.59
	Grid	4.11	41.27	54.93	27.77	34.51	23.94	21.99	23.07
	Power Law	4	33.12	31.53	20.2	40.21	22.79	22.39	23.48
	Random	3.81	42.82	29.25	22.32	35.29	24.08	23.6	23.91
	Small World	3.93	45.23	48.11	25.59	34.23	23.56	22.06	23.37
Mutualistic Interaction	Community	1.93	3.3	25.1	26.91	12.13	12.78	2.97	6.82
	Grid	10.4	29.37	64.54	54.01	22.48	51.39	53.69	56.67
	Power Law	2.43	9.27	37.56	36.02	17.05	20.4	9.61	12.55
	Random	2.24	4.55	15.65	23.61	5.47	37.64	3.63	7.59
	Small World	15.25	17.42	51.37	46.19	29.69	53.51	49.17	53.61

V.2 Result

The results are averaged over 20 independent realizations and shown in Tabs. 3 and 4. The best results in each case are denoted in bold. The results show that these two metrics come to the same conclusion. In the Gene Regulation network dynamics, NDCN performs the best in the community network, grid network, and random network, while our model is slightly inferior to NDCN but still comparable. In other scenes of network dynamics, our method demonstrates extraordinary competitiveness and has unparalleled dominance and the best performance. The performance of these temporal-GNN algorithms is unstable and not very good in these circumstances. Besides, one of the key flaws of these temporal-GNN algorithms is that there are too many trainable parameters. WCGNN performs relatively well in Kuramoto model network dynamics than NDCN, while the performance of GRAND is just fine in Mutualistic interaction network dynamics. The results of these methods in other scenarios are very poor. Overall, the performance of other algorithms is uneven and unstable. In the Kuramoto model dynamics, the Normalized L1 loss metric of our model can achieve $16\%$ of that of the second best, GRU-GNN. In the Mutualistic interaction network dynamics, our model can achieve $51.4\%$ of the second best, NDCN, in the MAE metric. To sum up, the results show that our model can have the best or comparable performance and good robustness in almost all cases. Our model can still learn and capture the network dynamics very well in the regular sequences prediction task.

VI Conclusion

Revealing the continuous dynamics on complex networks is essential for understanding complex systems. But this task is very hard due to the relatively unknown and partially known equations and high dimensions of complex systems. In addition, the observations of node states are generally non-uniform and very sparse in real life, which also poses a great challenge. In this paper, we propose an Autoregressive GNN-ODE GRU Model(AGOG) to capture the continuous network dynamics based on the observed node states in a data-driven manner. The further predictions are based on the observation history. With the predictions achieved, the true observations at the same time as the prior information are used to update the hidden states for the predictions at the next step. We impose a strong constraint to make the observations into the training process and make the ODE system learn the real continuous network dynamics more easily and accurately. We use a GNN module to simulate and fit the network dynamics, and this module has two parts that describe the interactions between nodes and the dependence of the dynamic evolution, respectively. The hidden state of nodes is specified by the ODE system, and we utilize the augmented ODE to map the hidden state into continuous time. The augmented ODE solves the hidden states in the augmented space, which could achieve lower losses and improve generalization and stability. Then the hidden states are updated through observations by GRUCell. To evaluate the validity of our model, we visualize the learned network dynamics by our model and test it in three tasks: interpolation reconstruction, extrapolation prediction, and regular sequences prediction. The results show that our model can consistently outperform other baseline methods or achieve comparable performance in these dynamics with different underlying networks. Our model can accurately learn the continuous network dynamics with a small error. In future work, we consider introducing the attention mechanism into the learning of network dynamics.

Acknowledgements.

This work was supported by the Project of Science and Technology Commission of Shanghai Municipality, China, under Grant 22JC1401401, the National Natural Science Foundation of China under Grant Nos 61873167, and the Strategic Priority Research Program of Chinese Academy of Sciences under Grant No. XDA27000000(in part).

Appendix A Appendix

We show the quantitative results over time of the interpolation reconstruction and extrapolation prediction tasks with $P$ as $10\%$ in the Figs. 5, 6 and 7. Due to space limitations, we just present the results in the community network. In each figure, each point corresponds to the average deviation of the prediction for the node state at a given moment. Here we ignore the specific time of the predicted node states and only consider the temporal order of the predictions. The results over time demonstrate that our method is superior to other methods at any time. The results of interpolation reconstruction and extrapolation prediction tasks with the observed proportion of node states as $30\%$ are shown in Tabs. 5, and 6, respectively. These two metrics all reach the same conclusion. Similar results are found in Tabs. 1 and 2. The results show that our model exhibits tremendous power ability and can capture the continuous network dynamics accurately and with a small error.

Table 5: Results(

\times 10^{-2}

) of interpolation reconstruction of network dynamics with the percentage of observed node states as

30\%

		MAE					Normalized L1
		AGOG	NDCN	CGNN	WCGNN	GRAND	AGOG	NDCN	CGNN	WCGNN	GRAND
Gene Regulation	Community	44.02	167.88	565.59	466.48	598.34	1.81	6.89	23.13	19.06	24.44
	Grid	8.08	46.42	203.7	118.86	163.07	1.78	10.23	45.12	26.27	36.02
	Power Law	21.31	44.84	304.94	222.21	333.06	2.73	5.78	39.24	28.55	42.8
	Random	59.58	187.32	506.02	355.29	702.61	1.95	6.11	16.5	11.59	23.03
	Small World	8.27	26.32	141.5	65.91	113.44	1.98	6.3	33.93	15.8	27.19
Kuramoto Model	Community	16.84	142.33	150.44	107.57	194.64	3.5	29.69	31.39	22.44	40.58
	Grid	15.31	141.85	219.09	133.96	156.36	3.15	29.21	45.13	27.59	32.2
	Power Law	13.33	115.04	127.48	85.31	187.11	2.72	23.53	26.08	17.45	38.26
	Random	17.96	151.52	135.19	102.82	180.39	3.72	31.43	28.06	21.34	37.38
	Small World	14.4	135.99	191.48	119.61	155.56	2.95	27.85	39.23	24.49	31.85
Mutualistic Interaction	Community	34.71	166.05	343.73	222	308.52	3.17	15.14	31.26	20.15	28.09
	Grid	9.88	25.63	84.11	55.69	35.29	4.6	11.93	39.19	25.89	16.44
	Power Law	11.55	75.35	183.26	143.99	211.69	2.27	14.68	35.83	28.13	41.41
	Random	40.64	190.7	310.74	218.81	308.77	3.24	14.94	24.42	17.2	24.31
	Small World	7.08	13.65	50.2	28.2	25.09	4.18	8.03	29.52	16.54	14.74

Table 6: Results(

\times 10^{-2}

) of extrapolation prediction of network dynamics with the percentage of observed node states as

30\%

		MAE					Normalized L1
		AGOG	NDCN	CGNN	WCGNN	GRAND	AGOG	NDCN	CGNN	WCGNN	GRAND
Gene Regulation	Community	79.33	216.3	898.11	708.72	635.88	2.53	6.89	28.61	22.59	20.25
	Grid	9.09	163.71	213.6	157.98	248.28	1.33	23.91	31.23	23.11	36.32
	Power Law	28.16	50.56	431.04	282.54	435.58	2.92	5.25	44.75	29.33	45.22
	Random	81.69	221.77	919.1	640.56	615.32	2.07	5.62	23.3	16.24	15.61
	Small World	10.69	36.64	192.26	89.6	107.77	1.89	6.49	34.05	15.89	19.1
Kuramoto Model	Community	13.7	182.87	173.01	115.78	244.22	2.27	30.39	28.77	19.23	40.59
	Grid	14.66	176.91	304.4	157.57	206.63	2.42	29.22	50.25	26.01	34.1
	Power Law	12.59	170.19	156.69	105.95	258.42	2.05	27.84	25.62	17.32	42.23
	Random	13.91	178.5	138.29	107.81	225.84	2.29	29.5	22.87	17.83	37.3
	Small World	14.1	181.54	250.1	141.32	208.99	2.31	29.76	40.98	23.15	34.23
Mutualistic Interaction	Community	19.58	125.3	332.21	209.61	147.17	1.41	9.01	23.9	15.09	10.59
	Grid	25.69	170.63	246.87	211.63	78.17	6.64	44.04	63.57	54.49	20.13
	Power Law	22.83	91.07	278.12	220.83	130.34	2.59	10.32	31.52	25.03	14.77
	Random	15.71	146.11	229.37	222.86	102.63	1.02	9.46	14.88	14.43	6.65
	Small World	16.25	74.67	159.9	148.57	94.78	5.09	23.28	50.01	46.4	29.64

References

Newman [2003] M. E. Newman, SIAM Rev. 45, 167 (2003).
Liu et al. [2020] C. Liu, Y. Ma, J. Zhao, R. Nussinov, Y.-C. Zhang, F. Cheng, and Z.-K. Zhang, Phys. Rep. 846, 1 (2020).
Kossinets and Watts [2006] G. Kossinets and D. J. Watts, Science 311, 88 (2006).
Perozzi et al. [2014] B. Perozzi, R. Al-Rfou, and S. Skiena, in Proc. 20th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining (2014) pp. 701–710.
Yuan et al. [2022] X. Yuan, G. Ren, Y. Yu, and W. Sun, J FRANKLIN I 359, 2663 (2022).
Benson et al. [2016] A. R. Benson, D. F. Gleich, and J. Leskovec, Science 353, 163 (2016).
Gao et al. [2016] J. Gao, B. Barzel, and A.-L. Barabási, Nature 530, 307 (2016).
Alon [2019] U. Alon, An introduction to systems biology: design principles of biological circuits (CRC press, 2019).
Jiahao et al. [2021] T. Z. Jiahao, M. A. Hsieh, and E. Forgoston, Chaos: An Interdisciplinary Journal of Nonlinear Science 31, 111101 (2021).
Kaheman et al. [2020] K. Kaheman, J. N. Kutz, and S. L. Brunton, P. Roy. Soc. A-Math Phy. 476, 20200279 (2020).
Ljung [2010] L. Ljung, Annu. Rev. Control 34, 1 (2010).
Schmidt and Lipson [2009] M. Schmidt and H. Lipson, Science 324, 81 (2009).
Kutz et al. [2016] J. N. Kutz, S. L. Brunton, B. W. Brunton, and J. L. Proctor, Dynamic mode decomposition: data-driven modeling of complex systems (SIAM, 2016).
Voss et al. [1999] H. U. Voss, P. Kolodner, M. Abel, and J. Kurths, Phys. Rev. Lett. 83, 3422 (1999).
Yair et al. [2017] O. Yair, R. Talmon, R. R. Coifman, and I. G. Kevrekidis, Proc. Natl. Acad. Sci. U. S. A. 114, E7865 (2017).
Vlachas et al. [2018] P. R. Vlachas, W. Byeon, Z. Y. Wan, T. P. Sapsis, and P. Koumoutsakos, P. Roy. Soc. A-Math Phy. 474, 20170844 (2018).
Champion et al. [2019] K. Champion, B. Lusch, J. N. Kutz, and S. L. Brunton, Proc. Natl. Acad. Sci. U. S. A. 116, 22445 (2019).
Zang and Wang [2020] C. Zang and F. Wang, in Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining (2020) pp. 892–902.
Dupont et al. [2019] E. Dupont, A. Doucet, and Y. W. Teh, in Proc. 33rd Int. Conf. Neural Inf. Process, Vol. 32 (2019) pp. 3134–3144.
Chen et al. [2018] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud, in Proc. 32nd Int. Conf. Neural Inf. Process, Vol. 31 (2018) pp. 6572–6583.
Morrill et al. [2020] J. Morrill, P. Kidger, C. Salvi, J. Foster, and T. Lyons, arXiv e-prints , arXiv (2020).
Jhin et al. [2021] S. Y. Jhin, M. Jo, T. Kong, J. Jeon, and N. Park, arXiv preprint arXiv:2105.14953 (2021).
Liang et al. [2021] Y. Liang, K. Ouyang, H. Yan, Y. Wang, Z. Tong, and R. Zimmermann, in Proc. 30th Int. Joint Conf. Artif. Intell. (2021) pp. 1498–1504.
Rubanova et al. [2019] Y. Rubanova, R. T. Chen, and D. Duvenaud, in Proc. 33rd Int. Conf. Neural Inf. Process (2019) pp. 5320–5330.
Lechner and Hasani [2020] M. Lechner and R. Hasani, Proc. 34th Int. Conf. Neural Inf. Process 33 (2020).
Brouwer et al. [2019] E. D. Brouwer, J. Simm, A. Arany, and Y. Moreau, in Proc. 33rd Int. Conf. Neural Inf. Process (2019) pp. 7379–7390.
Herrera et al. [2020] C. Herrera, F. Krach, and J. Teichmann, in Int. Conf. Learn. Representations (2020).
Zhang et al. [2020] Z. Zhang, P. Cui, and W. Zhu, IEEE Trans. Knowl. Data Eng. (2020).
Wu et al. [2020] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, IEEE Trans. Neural Netw. Learn. Syst. 32, 4 (2020).
Kipf and Welling [2016] T. N. Kipf and M. Welling, in Int. Conf. Learn. Representations (2016).
Kipf and Welling [2016] T. N. Kipf and M. Welling, arXiv preprint arXiv:1611.07308 (2016).
Veličković et al. [2018] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, in Int. Conf. Learn. Representations (2018).
Hamilton et al. [2017] W. L. Hamilton, Z. Ying, and J. Leskovec, in Proc. 31st Int. Conf. Neural Inf. Process, Vol. 30 (2017) pp. 1024–1034.
Li et al. [2018] R. Li, S. Wang, F. Zhu, and J. Huang, in Proc. 40th AAAI Conf. Artif. Intell.,, Vol. 32 (2018).
Zhuang and Ma [2018] C. Zhuang and Q. Ma, in Proc. 27th Int. Conf. World Wide Web (2018) pp. 499–508.
Gilmer et al. [2017] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, in Int. Conf. Mach. Learn. (2017) pp. 1263–1272.
Li et al. [2019] G. Li, M. Muller, A. Thabet, and B. Ghanem, in Proc. IEEE/CVF International Conf. Comput. Vis. (2019) pp. 9267–9276.
Chen et al. [2018] J. Chen, T. Ma, and C. Xiao, in Int. Conf. Learn. Representations (2018).
Ying et al. [2018] R. Ying, J. You, C. Morris, X. Ren, W. L. Hamilton, and J. Leskovec, in Proc. 32nd Int. Conf. Neural Inf. Process (2018) pp. 4805–4815.
Li et al. [2016] Y. Li, D. Tarlow, M. Brockschmidt, and R. S. Zemel, in Int. Conf. Learn. Representations (2016).
Xhonneux et al. [2020] L.-P. Xhonneux, M. Qu, and J. Tang, in Int. Conf. Mach. Learn. (2020) pp. 10432–10441.
Chamberlain et al. [2021] B. Chamberlain, J. Rowbottom, M. I. Gorinova, M. Bronstein, S. Webb, and E. Rossi, in International Conference on Machine Learning (PMLR, 2021) pp. 1407–1418.
Fang et al. [2021] Z. Fang, Q. Long, G. Song, and K. Xie, in Proc. 27th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining (2021) pp. 364–373.
Daoulas et al. [2021] V. Daoulas, N. Tampouratzis, P. Mousouliotis, and I. Papaefstathiou, in 2021 IEEE/ACM 25th International Symposium on Distributed Simulation and Real Time Applications (DS-RT) (IEEE, 2021) pp. 1–8.
Fortunato [2010] S. Fortunato, Phys. Rep. 486, 75 (2010).
Erd6s and Renyi [1959] P. Erd6s and A. Renyi, Publ. Math. Debrecen 6, 290 (1959).
Barabási and Albert [1999] A.-L. Barabási and R. Albert, Science 286, 509 (1999).
Watts and Strogatz [1998] D. J. Watts and S. H. Strogatz, Nature 393, 440 (1998).
Kuramoto [1975] Y. Kuramoto, in Int. Symp. Math. Probl. Theor. Phys. (1975) pp. 420–422.