STGC-GNNs: A GNN-based traffic prediction framework with a spatial-temporal Granger causality graph
Abstract
The key to traffic prediction is to accurately depict the temporal dynamics of traffic flow traveling in a road network, so it is important to model the spatial dependence of the road network. The essence of spatial dependence is to accurately describe how traffic information transmission is affected by other nodes in the road network. GNN-based traffic prediction models, the benchmark for traffic prediction, have become the most common method because they model spatial dependence by transmitting traffic information with the message passing mechanism. However, existing methods model a local and static spatial dependence, which cannot transmit the global-dynamic traffic information (GDTi) required for long-term prediction. The challenge is that the precise transmission of GDTi is difficult to detect due to the uncertainty of individual transport, especially for long-term transmission. In this paper, we propose a new hypothesis: GDTi behaves macroscopically as a transmitting causal relationship (TCR) underlying traffic flow, which remains stable under dynamically changing traffic flow. We further propose spatial-temporal Granger causality (STGC) to express TCR, which models global and dynamic spatial dependence. To model global transmission, we model the causal order and causal lag of TCR’s global transmission by a spatial-temporal alignment algorithm. To capture dynamic spatial dependence, we approximate the stable TCR underlying dynamic traffic flow by a Granger causality test. The experimental results on three backbone models show that using STGC to model the spatial dependence yields better results than the original models for 45-min and 1-h long-term prediction.
Index Terms:
traffic prediction, spatial dependence, Granger causality.
I Introduction
With the growing development of Intelligent Transportation Systems (ITS), traffic prediction, which is an important function of ITS, has received increasing attention. Traffic prediction is considered a time-series prediction task that requires the prediction of traffic data in the future based on historical traffic data recorded by traffic sensors and includes traffic flow prediction, flow velocity prediction, and peak hour prediction. These can support decision-making for city management, traffic planning, and route optimization.
The basic assumption of traffic prediction is that a stable pattern is implied behind the traffic data. Such patterns can be discovered from historical data and used for future forecasting. A large number of traffic forecasting studies have emerged in recent years. Statistics-based methods were first applied, including historical averaging (HA)[1, 2], vector autoregression (VAR)[3], the autoregressive integrated moving average (ARIMA) model[4, 5, 6, 7, 8, 9], and the Kalman filtering model[10]. With the development of deep learning, regression models such as feedforward neural networks (FFNs)[11, 12, 13, 14] and deep belief networks (DBNs) are used for traffic prediction. A recurrent neural network (RNN)[15, 16] and its variants long short-term memory (LSTM) and gated recurrent units (GRUs)[17] have also been used to model the temporal dependence of traffic data. However, the above methods only model the temporal dependence and ignore the spatial dependence among different road network nodes. This is mentioned in Observation 1. Existing views generally agree that traffic prediction is different from the ordinary time-series prediction task in that the spatial dependence constrained to the road network structure is more important in addition to capturing temporal patterns.
To capture spatial dependence, models based on convolutional neural networks (CNNs) have been used[18]. However, such models are only applicable to Euclidean data and not to non-Euclidean road networks. Therefore, graph neural networks (GNNs), which can represent discrete and irregular data, are used to model complex road networks and have become the benchmark models for traffic prediction. These methods extract local connectivity between nodes from the static road network topology, model it as a spatial graph that is input into the GNN structure, and then transmit the traffic information between nodes through a message-passing mechanism; examples include the temporal graph convolutional network (T-GCN)[19], Graph WaveNet[20], and the spatiotemporal graph convolutional network (STGCN)[21]. Other methods such as STSGCN[22], STFGNN[23], and STGODE[24] also consider the effect of spatial dependence on the time axis and input a combination of spatial graphs and temporal graphs into the GNN structure.
The essence of spatial dependence is to accurately describe how traffic information is transmitted between road network nodes, in other words, how the transmission of traffic information is influenced by other nodes. However, the large uncertainty of microtransport behavior makes it inherently difficult to track and predict the traffic information transmitted in the road network, especially in long-term prediction, where the transmission of traffic information is a global and dynamic process. The local and static spatial dependence described by the spatial graph cannot express the global-dynamic traffic information (GDTi) transmitted among the road network nodes. Therefore, how to model the transmission of GDTi in the road network and predict the traffic information in the future with the power of GNN is the key problem in long-term prediction.
In this paper, we propose a transmitting causal relationship (TCR) hypothesis: the transmission of GDTi is very uncertain at the microlevel but behaves as a stable causal relationship underlying traffic flow transmission at the macrolevel. Over the long-term transmission of GDTi, TCR can be approximated by the spatial-temporal Granger causality test. We further propose a GNN-based traffic prediction framework with a spatial-temporal Granger causality graph (STGC-GNNs) for prediction. Our contributions are as follows.
1. We propose a spatial-temporal Granger causality to model TCR, which can express global and dynamic spatial dependence and capture the transmission of GDTi to perform long-term prediction tasks. Spatial-temporal Granger causality can be detected by a spatial-temporal alignment algorithm followed by a Granger causality test.
2. We propose a GNN-based traffic prediction framework with a spatial-temporal Granger causality graph (STGC-GNNs) to improve long-term prediction. This framework is compatible with all GNN-based traffic prediction models that use only a spatial graph to capture spatial dependence.
3. We conduct comparative performance experiments on a real-world dataset, and the results show that our method can outperform three backbone models on horizons of 45 min and 60 min. The visualization of the results shows that, on all horizons, our method can improve the prediction of nodes with high prediction difficulty, such as intersectional, boundary, and distant nodes. This further verifies the validity of the spatial dependence we captured.
The rest of the paper is organized as follows. In Section II, existing traffic prediction methods are summarized and analyzed. Our hypothesis and the observations deducing it are described in Section III. Section IV depicts our overall framework and detailed methods. We evaluate our work in Section V and conclude this paper in Section VII. Several potential problems to solve in the future are discussed in Section VI.
II Related work
The traffic prediction task aims to forecast future traffic data using historical traffic data and includes traffic flow prediction, flow velocity prediction, and peak hour prediction. It is an important part of Intelligent Transportation Systems (ITS), and existing traffic prediction methods can be classified into model-driven and data-driven types. Model-driven methods consider traffic prediction as a time-series forecasting task, aiming to model the temporal patterns inherent in historical data and use them for prediction. These can be classified into two categories: statistical methods and machine learning methods. Statistical methods were the first to be used in traffic flow forecasting; they fit parametric models such as linear regression to historical data and include historical averaging (HA)[1, 2], vector autoregression (VAR)[3], autoregressive integrated moving average (ARIMA) models[4, 5, 6, 7, 8, 9], and Kalman filtering models[10]. These models are based on the assumption of stationary and linear time series and can make simple and fast forecasts but are not sufficient to model the uncertainty in complex and dynamic traffic data. Machine learning methods fill this gap by automatically learning the patterns inherent in the data from sufficient historical data based on nonlinear assumptions. They include feature-based methods, Gaussian process-based methods, and state space-based methods. Feature-based methods regress traffic data using human-engineered important traffic features[25, 26, 27], Gaussian process-based methods model the intrinsic features of traffic data for prediction by different kernel functions[28, 29, 30], and state space-based methods consider the traffic data generation process as a hidden Markov process and thus use the state space model to model the traffic system[31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45]. Machine learning methods are able to handle high-dimensional data and capture the complex nonlinearity inherent in the data.
With the development of deep learning, deep neural networks, as powerful approximators, can learn implicit patterns from large amounts of data. These are called data-driven methods and have achieved good performance in traffic prediction tasks. Data-driven traffic prediction methods can be classified into two categories: those considering only temporal dependence and those considering both temporal and spatial dependence. The temporal-dependence-only approaches use CNN[46], RNN[15, 16], and its variants[17] to model the temporal correlation in traffic data. These methods can capture temporal characteristics such as the periodicity and trend of traffic flow data but ignore the spatial dependence. The methods that capture both temporal and spatial dependence consider the interactions between road network nodes and can be divided into CNN-based methods and GNN-based methods based on the representation of the relationships between nodes. CNN-based methods divide road networks into regular grids and ignore the naturally discrete, irregular structure of road networks[18]; GNN-based methods take this into account, are capable of characterizing non-Euclidean structures, and have become the benchmark models of traffic prediction. According to the GNN framework used, these approaches can be classified as GCN-based methods, GAT-based methods, and GAE-based methods. GCN-based methods model spatial dependencies using convolutional operations in the spatial or spectral domain[19, 20, 21, 22, 23, 24, 47, 48, 49, 50], GAT-based methods use attention mechanisms to learn the weights of other traffic nodes[51, 52, 53, 54, 55], and GAE-based methods use an autoencoder structure to encode node representations[56].
GNN-based methods take a predefined graph as input to model the spatial dependence of complex road networks. The existing methods extract local connectivity between nodes from the static road network topology and express it as a spatial graph such as a traffic network topology graph or a spatial distance weighting graph. The adjacency matrix of the former sets an entry to 1 if the two corresponding road sections are topologically adjacent, while the latter preserves the connections between pairs of nodes that are closer by using a Gaussian filter, so that an entry is closer to 1 if the two corresponding road sections are spatially closer. Most methods input spatial graphs directly into GNN structures such as temporal graph convolutional networks (T-GCNs)[19], Graph WaveNet[20], and spatiotemporal graph convolutional networks (STGCNs)[21]. Another class of methods considers the similarity of time series between nodes on the basis of a spatial graph and combines a temporal graph with it as input. STSGCN[22] considers spatial dependencies to be invariant on the time axis and thus models spatial-temporal correlation by connecting individual spatial graphs of adjacent time steps into one graph as input. STFGNN[23] and STGODE[24], on the other hand, assume that spatial dependencies can lead to correlations between time series and therefore generate temporal graphs by using dynamic time warping (DTW) to compute correlations between time series, with the former merging the temporal graph with the spatial graph and the latter inputting the temporal graph and spatial graph into the prediction module separately.
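As a rough illustration of the spatial distance weighting graph described above, the following sketch turns a matrix of pairwise road distances into Gaussian-kernel weights. The function name, the bandwidth choice (standard deviation of the distances), and the sparsity threshold of 0.1 are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def gaussian_kernel_adjacency(dist, threshold=0.1):
    """Convert pairwise road distances into edge weights in [0, 1]."""
    dist = np.asarray(dist, dtype=float)
    finite = dist[np.isfinite(dist)]
    sigma = finite.std()  # assumed bandwidth: std of the finite distances
    weights = np.exp(-np.square(dist / sigma))  # closer pairs -> weight nearer 1
    weights[weights < threshold] = 0.0  # assumed cutoff: drop weak, distant links
    return weights

# Hypothetical distances between three sensors (arbitrary units).
dist = np.array([[0.0, 1.0, 4.0],
                 [1.0, 0.0, 2.0],
                 [4.0, 2.0, 0.0]])
adj = gaussian_kernel_adjacency(dist)
```

In this toy example the pair at distance 4 falls below the threshold and is cut, which is the local filtering behavior that the later sections argue removes long-range dependence.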
Based on the above, the modeling of spatial dependence is particularly important for GNN-based methods. The essence of spatial dependence is to accurately describe the transmission of traffic information on road networks, which is very difficult in traffic prediction on complex road networks, especially in long-term prediction. The local and static spatial dependence modeled by existing methods cannot describe the state of global diffusion of traffic information after long-range and dynamic transmission, i.e., a spatial graph cannot model the transmission of GDTi in the road network. Modeling the transmission of GDTi in the road network and predicting the traffic information with GNN is the key problem in long-term prediction.
To capture the transmission of GDTi, we propose a TCR hypothesis and further express the TCR using spatial-temporal Granger causality to achieve the modeling of global and dynamic spatial dependence.
III Transmitting Causal Relationship (TCR) Hypothesis
The modeling of spatial dependence essentially captures the transmission of traffic information between different nodes in the road network. For the long-term traffic prediction task, the traffic information is difficult to capture accurately because it is global and dynamic after long-term travel. The global property describes that the distance spanned by this transmission is long-range from a spatial perspective, while the dynamic property describes its uncertainty and complexity from a temporal perspective. To model the spatial transmission of GDTi as accurately as possible, we propose a hypothesis in this section, the transmitting causal relationship (TCR) hypothesis, which argues that the key to long-term prediction lies in modeling the global and dynamic transmission of traffic information in the road network, which shows high uncertainty and complexity at the micro level but behaves as a stable causal relationship at the macro level.
Since the observed traffic data in the real world are a consequence of the combination of dynamic changes in traffic information and complex topological constraints of the road network and the transmission of traffic information is not independent of the time and space, three dependencies can be derived, which makes traffic prediction a challenging task. In this section, to make our hypothesis and the subsequent expression of our method clearer, we first redefine the three dependencies and lags derived. We also describe three observations on real-world transportation systems, which can support the motivations and hypothesis. Note that when defining spatial dependence, since we want to capture a transmitted spatial dependence that is directed, the definitions of the target-node and source-node are derived, which may be novel for the traffic prediction domain.
III-A Definition of different dependencies
(1)Temporal dependence



Definition 1 (Temporal dependence). Temporal dependence is the effect that the traffic information at the current timestep has on the traffic information at future timesteps for the same node.
Definition 2 (Temporal lag). Temporal lag is the interval of time to transmit traffic information between two timesteps, which are temporally dependent.
(2)Spatial dependence
Definition 3 (Spatial dependence). Spatial dependence is the effect that the traffic information of one node has on the traffic information of another node at the current time step.
Definition 4 (Spatial lag). Spatial lag is the distance to transmit traffic information between two nodes, which are spatially dependent at one timestep.
Definition 5 (Target-node and Source-node). For a pair of nodes that are spatially dependent, we call the node that releases the traffic information the source-node and the node that receives the traffic information the target-node. As illustrated in Figure 3, node no. 4 is the source-node, and nodes no. 1, 2, 3, 5, and 7 are target-nodes.
(3)Spatial-temporal dependence
Definition 6 (Spatial-temporal dependence). Spatial-temporal dependence is the effect that the traffic information of one node at the current time step has on the traffic information of another node at the future time step.
Definition 7 (Spatial-temporal lag). Spatial-temporal lag is the time consumed by transmitting traffic information between two different nodes that are spatial-temporally dependent.
III-B Three observations on a real-world traffic system
Similar to what was mentioned in Section I, existing methods ignore some important observations in real-world traffic systems. In this section, we will depict those observations that can help understand our methods.
(1)Observation 1: Power of spatial dependence




The nodes in the road network are not completely independent but interact with others with the movement of transport, which is manifested as the transmission of traffic information and will lead to spatial dependence between interacting nodes. In the GNN-based model, the predictions of target-nodes can be improved if the spatial dependence can be modeled correctly because the information of other nodes is correctly introduced.
We observe that for the GNN-based model, since the spatial dependence is modeled by the input graph, it is possible to determine whether the input graph contains a valid spatial dependence by comparing its prediction result with the result when only self-connected inputs are used. Figure 4 shows the prediction results of the model on the METR_LA dataset when the input graph is a spatial graph (which is formulated as a spatial weighting matrix) and a self-connected graph (which is formulated as an identity matrix).
Figure 4 shows that in the majority of cases, the prediction results of the spatial weighting matrix input outperform those of the identity matrix input, which indicates that it is necessary and effective to consider the spatial dependence. However, it can also be seen that on the horizon of 45 min, the prediction results of the STGCN with spatial graph input are instead worse than those with self-connected graph input, which indicates that when the captured spatial dependence is wrong or redundant, it may lead to worse performance compared to using the traffic information only from itself. We call the reason behind this phenomenon the power of spatial dependence.
Therefore, for the GNN-based traffic prediction model, the construction of input graphs modeling spatial dependence directly affects the prediction effect of the model even if all other settings of the model are kept consistent. This is exactly the power of spatial dependence, which must be taken into consideration when using GNN-based traffic prediction models.
(2)Observation 2: Spatial-temporal entanglement of traffic information transmission
In real-world transportation systems, the flow of traffic can be considered as a process of traffic information transmitting across space and time. We call this the spatial-temporal entanglement of traffic information transmission: the distance traveled and the corresponding time consumed are not independent. We depict this observation in Figure 5.

The observation of the spatial-temporal entanglement of traffic information transmission indicates that if we want to capture the spatial dependence of road networks from traffic information under global transmission, we must consider the spatial-temporal lag of traffic information transmission across space. We considered this in Algorithm 1.
(3)Observation 3: Capability boundary of spatial distance


The existing methods generally consider the spatial distribution of road network nodes as an important indicator to measure spatial dependence, so they often use traffic network topology graphs or spatial distance weighting graphs as spatial graphs to input into GNN structures. These graphs essentially model a local and static spatial dependence, i.e., the spatial dependency obtained from a fixed distribution of nodes in a certain road network. However, when traffic information is transmitted over the long term, it exhibits the property of being global in space and dynamic in time. We refer to this kind of traffic information as global-dynamic traffic information (GDTi). The key to long-term prediction is to model the transmission of GDTi, which cannot be described by a local and static spatial dependence.
The simulation process of traffic flow for the same road network on horizons of 15 min and 30 min is illustrated in Figure 6. Assume that the time required for traffic information to be transmitted between two adjacent intersections in this traffic system is 15 min. Then, when we predict the traffic characteristics (e.g., flow rate and flow speed) of the target-node after 15 minutes, we only need to trace the traffic characteristics of the 1st-order neighbors at the current moment because this part of the traffic information will consume 15 minutes to transmit to the target-node. Similarly, when predicting the short-term traffic characteristics after 30 minutes, we need to trace the traffic characteristics of the 2nd-order neighbors.
It can be inferred that the nodes to be tracked will be farther away from the target-node on the prediction horizon of 45 minutes and longer, and therefore the transmission of GDTi with a traveling time of 45 minutes needs to be captured. This is difficult to capture in the existing spatial graph.
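The reasoning above can be sketched as a breadth-first search that returns the neighbors whose order matches the prediction horizon. This is only an illustration of the idealized setting in Figure 6: the 15-minute transmission time per hop comes from the text, while the function names, the undirected adjacency lists, and the toy network are assumptions.

```python
from collections import deque

def neighbors_at_hop(adjacency, target, hops):
    """Return the nodes whose shortest hop distance to `target` equals `hops`."""
    depth = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if depth[node] == hops:
            continue  # do not expand beyond the requested order
        for nxt in adjacency[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return sorted(n for n, d in depth.items() if d == hops)

def nodes_to_trace(adjacency, target, horizon_min, hop_time_min=15):
    """Neighbor order to trace = horizon divided by the per-hop travel time."""
    return neighbors_at_hop(adjacency, target, horizon_min // hop_time_min)

# Hypothetical road network: chain 0 - 1 - 2 - 3 with a branch 1 - 4.
road = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
trace_15 = nodes_to_trace(road, 0, 15)
trace_30 = nodes_to_trace(road, 0, 30)
trace_45 = nodes_to_trace(road, 0, 45)
```

For target node 0, the nodes to trace move from node 1 (15 min) to nodes 2 and 4 (30 min) to node 3 (45 min), so a graph that only keeps 1st-order neighbors no longer contains the relevant sources at longer horizons.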
III-C TCR Hypothesis
To capture the long-term transmission of GDTi in the road network as accurately as possible, we propose the TCR hypothesis and further hypothesize the properties of TCR based on the above observations, which can derive our method. According to Observation 1 and Observation 3, we suggest that the spatial dependence for long-term prediction can be expressed by the transmission of GDTi, specifically as a kind of transmitting causal relationship (TCR), so we try to capture this TCR and model it directly as a graph input to the GNN structure. According to Observation 2, we consider the spatial-temporal entanglement of the transmission of GDTi and design the spatial-temporal alignment algorithm in the process of capturing the TCR.
TCR Hypothesis. The transmission of Global-Dynamic Traffic information (GDTi) appears as a stable causal relationship between nodes underlying transmitting traffic information, which is referred to as Transmitting Causal Relationship (TCR).
IV Methodology
IV-A Overview
As illustrated in Figure 7, GNN-based traffic prediction models can always be formalized as a component that captures temporal dependence and a GNN that captures spatial dependence. However, existing spatial graphs describe a local and static spatial dependence that may fail to make long-term predictions. We propose a spatial-temporal Granger causality test method that captures global and dynamic spatial dependence and models it as a spatial-temporal Granger causality graph (STGC graph) input to the framework.

IV-B Spatial-temporal Granger causality graph (STGC graph)
In this section, we further propose a spatial-temporal Granger causality test to approximate the causal relationship underlying traffic road networks based on the TCR hypothesis to capture global and dynamic spatial dependence. The spatial-temporal Granger causality test is divided into two modules: the spatial-temporal alignment module and the Granger causality test module. The former aligns the release and receive times of TCR’s global transmission so that the cause-node and effect-node contain the same part of GDTi at the same timestep, making transmission easier to detect; the latter tests, from the aligned time series, between which pairs of nodes the global transmission occurs, thus capturing the stable TCR under dynamic traffic flow.
(1)Spatial-temporal alignment
According to the TCR hypothesis, the transmission of GDTi can be regarded as the flow of causal effect from the cause-node to the effect-node, i.e., the mechanism of TCR. According to Observation 1 and Observation 2, TCRs can be derived as having two significant properties.
• Causal lag. The effect produced by the cause-node does not have an immediate impact on the effect-node. The occurrence of the cause and the effect has a time lag, which is expressed as a spatial-temporal lag between the cause-node and effect-node in the traffic road network.
• Causal order. The effect always occurs or is observed after the cause, which is manifested as the effect-node’s receiving GDTi always occurring after the cause-node’s releasing GDTi.
Therefore, we designed a spatial-temporal alignment algorithm, which can be divided into two steps.
1. According to the causal lag, the time consumed by the TCR to transfer from the cause-node to the effect-node is calculated, i.e., the spatial-temporal lag between them.
2. According to the causal order, we shift the time series of the cause-node along the time axis by the number of timesteps given by the spatial-temporal lag to align the time when the TCR is generated on the cause-node with the time it impacts the effect-node, thus ensuring that the causal mechanism can be inferred directly from the observed time series, which is the outcome of this causal mechanism.
According to the definitions in Section III, for TCR, the source-node is the cause-node, and the target-node is the effect-node. The spatial lag between nodes is the spatial distance between them, and the spatial-temporal lag is the time it takes for GDTi to travel from the cause-node to the effect-node, i.e., the time required for TCR transmission. Since the spatial-temporal lag between nodes is related not only to the spatial distance between nodes but also to the flow velocity of traffic information, which cannot be measured precisely, we use the average velocity of the cause-node as an approximation of the traffic information flow velocity. The algorithm is shown schematically in Figure 8.
Algorithm 1 Spatial-temporal alignment. Spatial-temporal alignment is the operation that slides the time series of the source-node along the time axis, with a sliding stride equal to the spatial-temporal lag between the source-node and target-node. The spatial-temporal lag is the ratio of the spatial lag between the source-node and target-node to the average traffic velocity of the source-node.

As illustrated in Figure 8, B is the source-node, and A is the target-node.
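A minimal sketch of the alignment step, under stated assumptions, is given below. The lag is the spatial lag divided by the mean velocity of the source-node, as described in the text; the units (miles, mph), the 5-minute sampling step, the rounding to whole timesteps, and all function names are assumptions made for this example.

```python
def spatial_temporal_lag(distance_miles, source_speeds_mph, step_minutes=5):
    """Lag in whole timesteps: spatial lag divided by the source's mean speed."""
    mean_speed = sum(source_speeds_mph) / len(source_speeds_mph)
    travel_minutes = distance_miles / mean_speed * 60.0
    return round(travel_minutes / step_minutes)  # assumed: round to nearest step

def align(source, target, lag):
    """Pair source[t - lag] with target[t], dropping the unmatched ends."""
    if lag == 0:
        return list(source), list(target)
    return list(source[:-lag]), list(target[lag:])

# Hypothetical speed readings: B is the source-node, A is the target-node.
series_b = [60, 58, 62, 60, 61, 59]
series_a = [55, 57, 60, 58, 62, 60]
lag = spatial_temporal_lag(10.0, series_b)  # 10 miles at a mean of 60 mph
aligned_b, aligned_a = align(series_b, series_a, lag)
```

Here the lag is two timesteps, and after the shift each value of B sits at the same index as the value of A that it is assumed to have influenced.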
(2)Granger causality test
After the spatial-temporal alignment of cause-node and effect-node, we can assume that cause-node and effect-node contain the same part of GDTi at the same moment, and we can infer whether there is a transmission of GDTi between them directly from the observed time series. Therefore, we need to answer another important question: how can TCR be detected from long-term observed and dynamically changing time series?
Based on the observation of real-world systems, with the traveling of traffic flow, the traffic information of downstream nodes will contain the traffic information of their upstream nodes, and thus the traffic information of upstream nodes can significantly improve the prediction of downstream nodes. The Granger causality test considers the time-series variables that can significantly improve the prediction after joining as cause variables, while the improved one is the effect variable, which is consistent with our observation. Therefore, we believe that the TCR in the traffic system may be manifested as Granger causality, and using the Granger causality test algorithm, we are able to detect the transmission of traffic information between two nodes, which can express a global and dynamic spatial dependence.
Granger causality analysis determines whether there is a causal relationship between different time-series variables. The basic idea is that if the prediction result using the historical information of both X and Y is better than that of only using the historical information of Y, that is, X helps to explain the future trend of Y, then X is the Granger cause of Y. There are two important hypotheses behind this:
• Adding information about the cause variable helps recover the information of the effect variable.
• The time lags need to be modeled when using past data to predict future data.
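The Granger causality test method establishes vector autoregressive models for each of the above two predictions as follows.
$$Y_t = \sum_{i=1}^{p} \alpha_i Y_{t-i} + \varepsilon_{1,t} \qquad (1)$$
$$Y_t = \sum_{i=1}^{p} \beta_i Y_{t-i} + \sum_{i=1}^{p} \gamma_i X_{t-i} + \varepsilon_{2,t} \qquad (2)$$
where $X_t$ and $Y_t$ represent the traffic values of the time series X and Y at time $t$, respectively; $\alpha_i$, $\beta_i$, and $\gamma_i$ represent the regression parameters of the autoregressive models; $\varepsilon_{1,t}$ and $\varepsilon_{2,t}$ represent the residuals of the two autoregressive models, respectively; and $p$ is the time lag of the autoregressive models. If the residual variance of model (2) is significantly smaller than that of model (1), i.e., $\operatorname{Var}(\varepsilon_{2,t}) < \operatorname{Var}(\varepsilon_{1,t})$, then it can be determined that X has a statistically significant Granger causality on Y, expressed as “X Granger-causes Y”.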
As shown in Figure 9, to perform a Granger causality test on the time series of nodes A and B, we input both time series to be tested. If the returned p-value is below the significance level, the null hypothesis, “B does not Granger-cause A”, is rejected, i.e., the conclusion obtained is that node B is a Granger cause of node A.
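To make the test concrete, the sketch below fits the two autoregressive models by least squares (lags of Y only, then lags of Y and X) and compares their residual sums of squares with an F statistic. It is a simplified illustration, not the authors' implementation: the lag order `p=2`, the intercept term, the function names, and the synthetic data are assumptions. In practice a p-value would be obtained from the F distribution, or from a library routine such as `grangercausalitytests` in statsmodels.

```python
import numpy as np

def lagged(series, p):
    """Rows t = p..T-1; columns series[t-1], ..., series[t-p]."""
    T = len(series)
    return np.column_stack([series[p - i:T - i] for i in range(1, p + 1)])

def rss(design, y):
    """Residual sum of squares of an ordinary least squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def granger_f_stat(x, y, p=2):
    """F statistic for the null 'x does not Granger-cause y' with lag order p."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[p:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lagged(y, p)])          # past of y only
    unrestricted = np.hstack([restricted, lagged(x, p)])  # past of y and x
    rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / dof)

# Synthetic example: y follows x with a one-step delay, so x should help predict y.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=499)
f_xy = granger_f_stat(x, y)  # evidence that x Granger-causes y
f_yx = granger_f_stat(y, x)  # evidence for the reverse direction
```

A large `f_xy` together with a small `f_yx` indicates a one-directional relationship, which is the asymmetry the STGC graph later encodes as a directed edge.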

(3)STGC graph
We input the aligned time series of the two nodes into the Granger causality test module, i.e., we use the spatial-temporal Granger causality test method to test the TCR between the two nodes, as shown in Figure 10.

The spatial-temporal Granger causality test is able to detect a Granger causality relationship between two traffic nodes, which can imply global and dynamic spatial dependence. To model this spatial dependence into a graph to input to the GNN for prediction, the following settings are made based on our observations:
• The reservation of long-range spatial dependence. To prevent filtering out the long-range spatial dependence, instead of obtaining the input graph by a distance filter or topological adjacency, we model the distance between nodes as a spatial-temporal lag and use the spatial-temporal Granger causality test to detect the spatial dependence under global transmission. It is important to note that we do not use the “cost” offered in the dataset (which implies the spatial distance between different nodes) directly; instead, we calculate the shortest path in the road network between two nodes and obtain the minimum “cost”. The reason behind this operation is to reserve as much potential spatial dependence as possible to be detected because the raw dataset does not provide the “cost” between every pair of nodes, even for nodes that are very close. This is also the way to obtain the spatial-temporal lag for long-range spatial dependence because, with high probability, the raw dataset does not provide the “cost” for two nodes that are far away from each other.
• The direction of cause-and-effect. Spatial-temporal Granger causality, as a manifestation of spatial dependency between nodes, describes a causal relationship underlying traffic information transmission, which can be understood as the tracking of traffic information in the real world. The cause-node corresponds to the source-node of traffic flow, while the effect-node corresponds to the target-node of traffic flow. Therefore, unlike the spatial graph where the dependence is mostly undirected, the relationships of nodes in the STGC graph are directed, and the direction is from the cause-node to the effect-node. This is consistent not only with the real-world situation that traffic information always flows from source-node to target-node but also with the message passing mechanism of GNN, that is, the message of the cause-node can pass to the effect-node and update it, thus helping the effect-node to make better predictions.
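The two settings above can be sketched as follows: a shortest-path search fills in the minimum “cost” between node pairs that the raw records do not list, and a directed edge is kept from cause-node to effect-node wherever the test rejects the null hypothesis. The record format `(u, v, cost)`, the significance level of 0.05, the function names, and the example values are assumptions for illustration only.

```python
import heapq

def min_costs_from(edges, source):
    """Dijkstra over directed (u, v, cost) records; returns {node: minimum cost}."""
    graph = {}
    for u, v, cost in edges:
        graph.setdefault(u, []).append((v, cost))
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry, a cheaper path was already found
        for nxt, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return best

def stgc_edges(p_values, alpha=0.05):
    """Keep a directed edge (cause, effect) wherever the p-value is below alpha."""
    return sorted(pair for pair, p in p_values.items() if p < alpha)

# Hypothetical raw records: the direct A -> C cost is larger than the path via B.
records = [("A", "B", 2.0), ("B", "C", 3.0), ("A", "C", 10.0)]
costs = min_costs_from(records, "A")

# Hypothetical p-values keyed by (cause-node, effect-node).
p_values = {("A", "C"): 0.01, ("C", "A"): 0.40, ("A", "B"): 0.03}
edges = stgc_edges(p_values)
```

Because the p-values are keyed by ordered pairs, the resulting edge list is asymmetric: A to C is kept while C to A is not.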
IV-C GNN-based traffic prediction
Eventually, we input the STGC graph into the GNN-based traffic prediction model, which can capture both the temporal dependence and spatial dependence of the traffic road network. The GNN-based traffic prediction model generally combines two modules, one for modeling temporal dependence such as RNN and its variants, and the other for modeling spatial dependence using GNN. Among them, the GNN structure can propagate and aggregate the traffic information between nodes that are spatially dependent, which is a good way to depict the transmission of GDTi. We can evaluate the ability to predict traffic directly by the output result of the GNN-based traffic prediction model.
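To illustrate how a directed input graph constrains message passing, the sketch below performs one mean-aggregation step in which information flows only from cause-nodes to effect-nodes. It is a deliberately simplified stand-in: the backbone models use learned weights and their own normalization, and the function name `propagate` and the toy values are assumptions made for illustration.

```python
def propagate(features, cause_to_effect):
    """One mean-aggregation step over a directed graph.

    features: dict node -> float (e.g., a current speed reading)
    cause_to_effect: list of (cause, effect) directed edges
    Each node's new value is the mean of its own value and its causes' values.
    """
    incoming = {node: [value] for node, value in features.items()}
    for cause, effect in cause_to_effect:
        incoming[effect].append(features[cause])
    return {node: sum(vals) / len(vals) for node, vals in incoming.items()}

# Hypothetical two-node example: only the downstream node receives a message.
feats = {"upstream": 30.0, "downstream": 60.0}
updated = propagate(feats, [("upstream", "downstream")])
```

Because the edge is directed, the upstream node keeps its own value while the downstream node is updated with upstream information.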
V Experiments and Results
V-A Dataset
We conducted experiments on the real-world dataset METR_LA, which is a Los Angeles highway dataset in the United States and one of the benchmark datasets for traffic prediction. For this dataset, the spatial graph is modeled in the form of a spatial weighting graph, which is later referred to as the spatial distance graph (SD graph), i.e., the spatial distance between nodes is converted into weights using Gaussian filtering.
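As an illustration of how distances can be converted into weights, the sketch below applies a thresholded Gaussian kernel, a form commonly used to build the adjacency matrix for this dataset. The exact kernel, the value of `sigma`, and the threshold are assumptions here, since they are not stated in this section.

```python
import math

def gaussian_weights(dist, sigma, threshold=0.1):
    """Convert pairwise distances to edge weights with a thresholded Gaussian kernel.

    dist: dict (i, j) -> road-network distance; pairs missing from dist get no edge.
    Weights below `threshold` are dropped, which removes long-range pairs.
    """
    weights = {}
    for pair, d in dist.items():
        w = math.exp(-(d ** 2) / (sigma ** 2))
        if w >= threshold:
            weights[pair] = w
    return weights

# Hypothetical distances between three sensors.
toy_dist = {(0, 1): 0.0, (0, 2): 1000.0, (1, 2): 5000.0}
sd_edges = gaussian_weights(toy_dist, sigma=1000.0)
```

In this toy case the distant pair (1, 2) receives a weight far below the threshold and is discarded, which shows how a distance filter of this kind removes long-range connections.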
(1)Selection of dataset
Based on Observation 1, we compared the prediction performance of the STGCN on two U.S. highway datasets using self-connected graph (identity matrix as adjacency matrix) and SD graph (spatial weighting matrix as adjacency matrix) as input, the former representing the case of prediction using only the temporal dependence and the latter representing the case of introducing spatial dependence, as shown in Table I.
Table I
Dataset | Horizon | Spatial weighting matrix (MAE/MAPE/RMSE) | Identity matrix (MAE/MAPE/RMSE)
METR_LA | 15 min | 3.07/7.16%/6.50 | 3.13/7.64%/6.69
METR_LA | 30 min | 3.55/8.79%/7.95 | 4.12/9.86%/8.34
METR_LA | 60 min | 4.62/11.37%/10.05 | 4.70/11.41%/10.31
PEMS_BAY | 15 min | 1.14/2.30%/2.28 | 1.15/2.36%/2.32
PEMS_BAY | 30 min | 1.41/3.03%/3.06 | 1.41/3.00%/3.07
PEMS_BAY | 60 min | 1.81/4.04%/3.99 | 1.79/4.05%/4.01
On the PEMS_BAY dataset, the prediction performance of the GNN-based model using a self-connected graph as input is very close to that using the SD graph, while the difference between them on METR_LA is noticeably larger (for example, an MAE of 3.55 versus 4.12 at 30 min). Additionally, the overall prediction error on PEMS_BAY is smaller. This indicates that the METR_LA dataset has higher complexity and prediction difficulty and can better discriminate between the effectiveness of different input graphs, so this dataset is chosen for experiments and analysis in this work.
(2)Basic information of dataset
The basic information of the METR_LA dataset is listed in Table II. The study area and distribution of traffic sensors in the METR_LA dataset are shown in Figure 11.

Table II
Dataset | Number of sensors | Duration | Sampling interval | Sampling time steps
METR_LA | 207 | 2012/3/1 00:00:00 to 2012/6/27 23:55:00 | 5 min | 34272
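As a quick consistency check, the number of sampling time steps follows from the duration and the sampling interval listed above. The short sketch below reproduces the count; the variable names are illustrative.

```python
from datetime import datetime, timedelta

start = datetime(2012, 3, 1, 0, 0, 0)
end = datetime(2012, 6, 27, 23, 55, 0)
interval = timedelta(minutes=5)

# Both the first and the last timestamp are sampled, hence the +1.
num_steps = (end - start) // interval + 1
print(num_steps)  # 34272
```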
V-B Experiments Setting
To more directly compare the effectiveness of the input graph itself for capturing spatial dependence, we selected three GNN-based models that use the input graph directly instead of performing transformation operations such as the attention mechanism. These models can use the GNN mechanism to aggregate and update messages based on the dependence expressed in the input graph and can directly reflect the prediction improved by neighboring nodes.
1. T-GCN [19]: A temporal graph convolutional network for traffic prediction that combines a graph convolutional network (GCN) and a gated recurrent unit (GRU) to capture spatial dependence and temporal dependence.
2. STGCN [21]: Spatio-temporal graph convolutional networks for traffic forecasting, which build a model with complete convolutional structures for temporal and spatial prediction.
3. Graph WaveNet [20]: A deep spatial-temporal graph model for traffic prediction that develops a novel adaptive dependency matrix to work with a spatial distance matrix. To compare our STGC graph with the spatial distance graph, we remove the adaptive dependency matrix and use the resulting model as the backbone.
We use these three models as backbone models, i.e., GNN-based traffic prediction models in Figure 7. Apart from this operation, the partitioning of the dataset, the structure of the model and the training settings are kept the same as those of the original model to make a fair and effective evaluation of the effects produced by the method proposed in this paper.
V-C Metrics
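To evaluate the effectiveness of this method for traffic prediction tasks, the following metrics are used to measure the difference between the real traffic data $y_i$ and the predicted data $\hat{y}_i$, where $n$ is the number of evaluated samples.
• Mean Absolute Error (MAE)
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{3}$$
• Mean Absolute Percentage Error (MAPE)
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \tag{4}$$
• Root Mean Squared Error (RMSE)
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{5}$$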
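The three metrics listed above can be computed as in the following minimal sketch, which uses their standard definitions. The sample values are hypothetical, and the sketch assumes that no real value is zero; zero or missing readings would have to be masked before computing MAPE.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; assumes no zeros in y_true."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical speed readings and predictions for three samples.
speeds_true = [50.0, 60.0, 40.0]
speeds_pred = [45.0, 63.0, 44.0]
print(mae(speeds_true, speeds_pred))  # 4.0
```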
V-D Performance Analysis
We evaluated the performance on prediction horizons of 15 min, 30 min, 45 min, and 60 min. The results are listed in Table III, where the backbone using SD graph as input graph uses the original name, while “STGC-” indicates a model using STGC graph as input.
It can be seen that for the same backbone model, the SD graph gives better results on most metrics for short-term prediction of 15 min and 30 min (T-GCN is a partial exception, where the STGC graph yields a slightly lower RMSE), while for the long-term prediction of 45 min and 60 min, the STGC graph gives better results on all metrics. This suggests that spatial dependence constructed from local and static connectivity can capture most of the nodes that interact with the target-node within a short time but seems to fail in long-term prediction. Long-term prediction relies more on long-range spatial dependence, which has to be detected in the global transmission of traffic information.
The comparison experiments between the STGC graph and the SD graph under the control variable settings have been able to show that the STGC graph is more effective in long-term prediction than the SD graph. To further illustrate that this effectiveness of the STGC graph is self-induced and not just due to the invalidation of the SD graph, we add the results where the input graph is a self-connected graph only (the adjacency matrix is the identity matrix) in comparison. The results are shown in Table IV, where (-) represents that the input graph is a self-connected graph.


Table III
Horizon | Model | MAE | MAPE (%) | RMSE
15 min | STGCN | 3.07 | 7.16 | 6.50
15 min | STGC-STGCN | 3.09 | 7.54 | 6.58
15 min | T-GCN | 3.59 | 9.65 | 6.54
15 min | STGC-T-GCN | 3.62 | 9.50 | 6.52
15 min | GWNET | 2.53 | 6.16 | 4.69
15 min | STGC-GWNET | 2.56 | 6.39 | 4.82
30 min | STGCN | 3.55 | 8.79 | 7.95
30 min | STGC-STGCN | 3.65 | 9.02 | 8.13
30 min | T-GCN | 3.86 | 10.49 | 7.15
30 min | STGC-T-GCN | 3.91 | 10.49 | 7.07
30 min | GWNET | 2.83 | 7.29 | 5.50
30 min | STGC-GWNET | 2.83 | 7.46 | 5.53
45 min | STGCN | 4.69 | 10.99 | 9.41
45 min | STGC-STGCN | 4.10 | 9.99 | 9.13
45 min | T-GCN | 4.10 | 11.36 | 7.62
45 min | STGC-T-GCN | 4.02 | 11.21 | 7.45
45 min | GWNET | 3.06 | 8.20 | 6.08
45 min | STGC-GWNET | 3.03 | 8.19 | 6.01
60 min | STGCN | 4.62 | 11.37 | 10.05
60 min | STGC-STGCN | 4.46 | 11.24 | 10.00
60 min | T-GCN | 4.28 | 12.26 | 8.08
60 min | STGC-T-GCN | 4.17 | 11.85 | 7.82
60 min | GWNET | 3.25 | 8.78 | 6.49
60 min | STGC-GWNET | 3.17 | 8.77 | 6.40
Table IV
Horizon | Model | MAE | MAPE (%) | RMSE
45 min | STGCN(-) | 4.21 | 10.41 | 9.27
45 min | STGC-STGCN | 4.10 | 9.99 | 9.13
45 min | T-GCN(-) | 8.55 | 30.89 | 14.86
45 min | STGC-T-GCN | 4.02 | 11.21 | 7.45
45 min | GWNET(-) | 3.47 | 9.56 | 7.07
45 min | STGC-GWNET | 3.03 | 8.19 | 6.01
60 min | STGCN(-) | 4.70 | 11.41 | 10.31
60 min | STGC-STGCN | 4.46 | 11.24 | 10.00
60 min | T-GCN(-) | 8.55 | 30.91 | 14.87
60 min | STGC-T-GCN | 4.17 | 11.85 | 7.82
60 min | GWNET(-) | 3.77 | 10.67 | 7.70
60 min | STGC-GWNET | 3.17 | 8.77 | 6.40
In addition, to argue that the validity of such long-term predictions is not due to randomness, we add random groups for comparison. For each prediction horizon, we generated 10 random graphs with the following settings: 1) the sparsity of the random graph is consistent with the STGC graph; 2) for each node, the number of neighboring nodes is consistent with that in the STGC graph; and 3) the IDs of neighboring nodes are randomly generated. The final accuracy is averaged by the evaluation results of 10 random graphs, labeled as STGCN (random) in Figure 12.
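The three settings above can be sketched as follows. Keeping the per-node neighbor count fixed automatically preserves the overall sparsity. Excluding self-loops and drawing neighbors without replacement are assumptions made in this sketch, and the function name `random_graph_like` and the toy graph are hypothetical.

```python
import random

def random_graph_like(neighbors, seed=None):
    """Random directed graph with the same per-node neighbor counts as `neighbors`.

    neighbors: dict node -> list of neighbor IDs (e.g., as in the STGC graph).
    Neighbor IDs are redrawn uniformly, without replacement, from the other nodes.
    """
    rng = random.Random(seed)
    nodes = list(neighbors)
    randomized = {}
    for node, nbrs in neighbors.items():
        candidates = [n for n in nodes if n != node]  # assumption: no self-loops
        randomized[node] = rng.sample(candidates, len(nbrs))
    return randomized

# Hypothetical 4-node graph; ten random counterparts, one per seed.
stgc_like = {0: [1, 2], 1: [3], 2: [], 3: [0, 1, 2]}
random_graphs = [random_graph_like(stgc_like, seed=s) for s in range(10)]
```

Each random graph would then be used as the input graph of the backbone, and the ten evaluation results averaged.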
Taking the MAE metric as an example, it shows that the predictive power of both STGC-STGCN and STGCN (random) remains relatively steady as the prediction horizon increases, where STGC-STGCN achieves the best results in long-term prediction and STGCN (random) keeps performing poorly, which is not surprising because it randomly introduces information from other nodes. The MAE metric also indicates that the STGC graph’s capability of long-term prediction is not due to random factors such as sparsity but due to the correct capture of spatial dependence. It is noteworthy that the random matrix achieves better results than the identity matrix at a prediction horizon of 30 min and is comparable to the identity matrix at a prediction horizon of 60 min. This illustrates, on the one hand, the limitations of ignoring spatial dependence and, on the other hand, the possibility of achieving better results by chance than by information from itself, even if such luck is not always reliable.
As illustrated in Figure 12, the original model (STGCN) using the SD graph showed a significant decrease in predictive capability between 30 min and 45 min, while the predictive capability using the STGC graph was much more stable, which also supports our Observation 2.
V-E Visualization
Although our method outperforms the original model only on the 45-min and 60-min long-term predictions, for the 207 nodes in the METR_LA dataset, there are nodes that are densely distributed with simple upstream and downstream and thus low prediction difficulty, while there are also nodes located at intersections with higher prediction difficulty. Therefore, the prediction accuracy of individual nodes was calculated and visualized on OpenStreetMap separately. We take the STGCN as an example for analysis in this section.
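A per-node comparison of this kind can be sketched as follows, here with MAE as the accuracy measure. The function name `per_node_mae` and the toy readings for two nodes are hypothetical; only the idea of computing the error separately for each node is taken from the text.

```python
def per_node_mae(y_true, y_pred):
    """MAE per node; inputs are lists of time steps, each with one value per node."""
    num_steps, num_nodes = len(y_true), len(y_true[0])
    return [
        sum(abs(y_true[t][n] - y_pred[t][n]) for t in range(num_steps)) / num_steps
        for n in range(num_nodes)
    ]

# Hypothetical readings for two nodes over two time steps.
truth = [[60.0, 30.0], [62.0, 28.0]]
pred_sd = [[58.0, 35.0], [61.0, 20.0]]
pred_stgc = [[59.0, 32.0], [63.0, 25.0]]

mae_sd = per_node_mae(truth, pred_sd)
mae_stgc = per_node_mae(truth, pred_stgc)
# Nodes where the STGC-based prediction has the lower error.
improved_nodes = [n for n, (a, b) in enumerate(zip(mae_stgc, mae_sd)) if a < b]
```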


(1) Visualization of the prediction results for the STGC-STGCN
For the long-term predictions of 45 min and 60 min, our method can achieve better results than the SD graph for most of the nodes (e.g., Figure 13a). For some nodes, our method can achieve better results than the SD graph on all prediction horizons (see Figure 13b), which are mostly located at the boundary and intersection of the road network. We also have better prediction for remote nodes (circle in Figure 13b), which indicates that our method has better results for nodes that are more difficult to predict on all horizons.
(2) Visualization of STGC-STGCN interpretability
To interpret the spatial-temporal Granger causality, we detect it and compare it with the spatial connectivity described by the SD graph. We visualize the road network and its spatial dependence: the nodes to be predicted are marked with a gray marker, and other nodes are marked with a light gray marker. To visualize the spatial dependence, we use markers with car icons: the cause-node has a red marker, the effect-node has a blue marker, and the neighboring nodes in the SD graph have a green marker.
Direction of STGC. The STGC graph describes a kind of spatial dependence caused by global traffic information transmission, which corresponds to the upstream and downstream relationship of traffic flow in the real world. When the detected causal relationship is checked against the directions marked on the highway by OpenStreetMap, the effect-node should be in the downstream direction of the cause-node. As shown in Figure 14, the cause (upstream) and effect (downstream) that we detect are consistent with the direction of the highway marked on the map.

The local connectivity constructed in the SD graph is a static relationship generated from the invariant spatial structure of the road network (node location distribution, road network topology, etc.), which mostly has no direction and is unable to portray the interaction caused by dynamic traffic flow. However, driving on a highway is constrained by traffic rules, and it is difficult to have direct interactions between nodes on roads in different driving directions even if they are close to each other. Spatial dependence is likely to be wrong when determined only by spatial distance but without considering the possibility of actual interaction.
As shown in Figure 15, two nodes that are spatially dependent in the SD graph cannot actually reach each other, so no traffic flow interaction can arise between them. Ignoring the possibility of actual interaction is therefore likely to introduce incorrect spatial dependence.

Spatial-temporal Granger causality can, to some extent, avoid introducing incorrect information by detecting the global transmission of traffic information between two nodes. As shown in Figure 16, the effect-node of the node to be predicted always appears downstream, while the spatially connected node may appear on the road in the opposite direction, where it can be neither upstream nor downstream.

STGC captures long-range spatial dependence. The STGC graph can model long-range spatial dependence, while the SD graph can only model locally connected nodes, which interact with the node to be predicted within a short time. As shown in Figure 17, the spatial dependence between the gray marker (with an icon of a car) in the upper picture and the red marker (with an icon of a car) in the lower picture can be detected by the STGC graph.

VI Discussion
In this section, we analyze the possible reasons why spatial-temporal Granger causality can outperform the SD graph at 45 min and longer horizons. We also summarize the shortcomings of our method and suggest possible future solutions.
Spatial distance does not mean everything. In many traffic prediction benchmark datasets, such as METR_LA used in this paper, the spatial distance between nodes is not the actual geographical distance between them but the distance traveling from one node to another along the road network, and this distance needs to be actually measured. Therefore, in the raw dataset, there may be missing data, i.e., the distance values of two neighboring nodes are missing. However, the SD graph is constructed considering only the distances recorded in the dataset, so the situation in Figure 18 may occur, where the closer nodes are not connected in the SD graph, but instead the distant nodes are connected even if the closer nodes are reachable to each other.
In addition, in a traffic system with obvious physical constraints between roads with different directions such as a highway, two nodes may take a high cost to connect even though they are very close in space, and if these two nodes are determined to be spatially dependent, then they may bring poor results for prediction. Similar to the scenario in Figure 19, in the SD graph generated by spatial distance filtering, two nodes on roads with opposite directions are judged as neighbors. Our method further determines this spatial dependence by detecting the transmission of traffic information, which can avoid this negative influence to a certain extent.
Time to travel across the road network. We obtained statistics on the spatial-temporal lag of each pair of nodes in the road network and found an interesting phenomenon. In the ideal case (i.e., the interaction between nodes is planned by the shortest path and kept at the average flow rate of the source-node for smooth transmission), most nodes can complete the interaction within 30 min (spatial-temporal lag ≤ 6), and the nodes farthest apart can complete the interaction within 45 min (spatial-temporal lag ≤ 9), which is exactly the capability boundary of the SD graph. That is, when the prediction horizon > 45 min, a large amount of traffic flow may have already left the system, and the new flow entering the system is unknown; therefore, the overall flow in the system is not guaranteed to be constant. In this case, the uncertainty of traffic information transmission increases further, and the spatial dependence to be captured by the prediction is still long-range dependence, but it can no longer be inferred from the spatial location alone, and a stable association must be mined from the observed data. Thus, we hypothesize that local and static connectivity is insufficient to cope with this uncertainty when the prediction duration exceeds the maximum traveling time in a traffic system. The spatial-temporal Granger causality test method we use essentially mines a stable spatial dependence from time series that are observed over a long term and change dynamically, and it is able to address the problem of long-term prediction to some extent. How exactly this uncertainty is modeled and how the transmission of traffic flow is tracked are questions that can be considered and solved in the future.
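The correspondence between travel time and spatial-temporal lag quoted above follows from the 5-min sampling interval of the dataset. The sketch below makes this conversion explicit under the ideal-case assumption of travel along the shortest path at a constant average speed; the rounding rule, the units, and the function names are assumptions for illustration.

```python
import math

SAMPLING_INTERVAL_MIN = 5  # METR_LA sampling interval

def travel_time_to_lag(travel_time_min, interval_min=SAMPLING_INTERVAL_MIN):
    """Number of sampling steps needed to cover a travel time (rounded up)."""
    return math.ceil(travel_time_min / interval_min)

def lag_from_path(path_length_km, avg_speed_kmh):
    """Lag for a shortest path traveled at a constant average speed (assumed units)."""
    travel_time_min = path_length_km / avg_speed_kmh * 60.0
    return travel_time_to_lag(travel_time_min)

print(travel_time_to_lag(30))     # 6
print(travel_time_to_lag(45))     # 9
print(lag_from_path(30.0, 60.0))  # 6
```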
Is spatial dependence dynamically changing or invariant? Traffic flow observed at the macro level is generated by a large number of micro-level vehicles traveling in the road network; their individual trajectories are highly uncertain, yet they exhibit patterns at the macro level. The spatial pattern is expressed by spatial dependence. The construction of an SD graph essentially establishes spatial dependence from a static road network with a constant spatial structure, and the assumption behind it is that the closer two nodes are, the stronger their spatial dependence, so this spatial dependence is static and local. However, as we observe, traffic time series are dynamic and complex over time and space and are potentially influenced by environmental and social factors. Static spatial dependence is no longer sufficient to describe the changing spatial dependence of the road network. We assume that although the spatial dependence of the road network is dynamically changing, there is an equilibrium, i.e., a stable spatial dependence that the changing spatial dependence always approaches. Thus, this steady state can explain the spatial dependence of the road network to the greatest extent and help to predict it. Our method attempts to approximate this stable spatial dependence. Therefore, in some time windows or prediction horizons, our prediction result may deviate from the real situation. In the future, we can divide the input into time periods and detect an STGC graph for each time period to realize the dynamic modeling of spatial dependence, which may work better for complex traffic prediction problems.
What is the role of temporal graphs? There is another class of GNN-based traffic prediction models that combine temporal graphs with spatial graphs as input. In this type of model, the temporal graph is often chosen to connect nodes with higher similarity in time series, which can compensate to some extent for the spatial graph that only uses static and local connectivity. However, this type of model eventually ensures the effect in decision-making by operations such as cropping or max pooling and still relies mainly on spatial graphs. In this paper, we do not compare with or improve such models because our approach also detects relationships from observed time series, which may create redundancy with the information contained in the temporal graph. We believe that the temporal graph models the effect of spatial dependence on the time axis and can help model spatial dependence more accurately, but how to make use of it is an open question to be discussed.
VII Conclusions
In this paper, we proposed a spatial-temporal Granger causality to model spatial dependence in the road network. This causality can capture the transmission of GDTi. Unlike the local and static connectivity generated from the spatial distribution of the road network, the spatial-temporal Granger causality we proposed can capture the long-range dependence and approximate the stable causal relationship of dynamic traffic flow transmission from the long-term observed traffic flow and thus performs better in 45-min and 60-min long-term predictions. Specifically, we proposed a TCR hypothesis and a spatial-temporal Granger causality test method to detect causal relationships.
The causal relationship we eventually obtained is directional and matches well with the source-node and target-node of a real-world traffic system. Experiments on the METR_LA dataset showed that our method can outperform the original model on three GNN-based traffic prediction models at 45 min and 60 min. At the prediction level of individual nodes, our model can improve the prediction at all horizons for intersectional, boundary, and distant nodes. The upstream and downstream traffic information transferred at intersectional nodes is complex, and boundary nodes tend to depend only on upstream or downstream nodes, while remote nodes rely more on the capture of long-range dependence. This also illustrates the effectiveness of our method for capturing GDTi transmission compared to a spatial graph.
References
- [1] Y. Sun, G. Zhang, and H. Yin, “Passenger flow prediction of subway transfer stations based on nonparametric regression model,” Discrete Dynamics in Nature and Society, vol. 2014, p. 397154, 2014. [Online]. Available: https://doi.org/10.1155/2014/397154
- [2] B. Pan, U. Demiryurek, and C. Shahabi, “Utilizing real-world transportation data for accurate traffic prediction,” in 2012 IEEE 12th International Conference on Data Mining, 2012, Conference Proceedings, pp. 595–604.
- [3] E. Zivot and J. Wang, “Vector autoregressive models for multivariate time series,” Modeling financial time series with S-PLUS®, pp. 385–429, 2006.
- [4] B. M. Williams and L. A. Hoel, “Modeling and forecasting vehicular traffic flow as a seasonal arima process: Theoretical basis and empirical results,” Journal of transportation engineering, vol. 129, no. 6, pp. 664–672, 2003.
- [5] S. Shekhar and B. M. Williams, “Adaptive seasonal time series models for forecasting short-term traffic flow,” Transportation Research Record, vol. 2024, no. 1, pp. 116–125, 2007.
- [6] X. Li, G. Pan, Z. Wu, G. Qi, S. Li, D. Zhang, W. Zhang, and Z. Wang, “Prediction of urban human mobility using large-scale taxi traces and its applications,” Frontiers of Computer Science, vol. 6, no. 1, pp. 111–121, 2012.
- [7] L. Moreira-Matias, J. Gama, M. Ferreira, J. Mendes-Moreira, and L. Damas, “Predicting taxi–passenger demand using streaming data,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1393–1402, 2013.
- [8] M. Lippi, M. Bertini, and P. Frasconi, “Short-term traffic flow forecasting: An experimental comparison of time-series analysis and supervised learning,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 2, pp. 871–882, 2013.
- [9] I. M. Wagner-Muns, I. G. Guardiola, V. Samaranayke, and W. I. Kayani, “A functional data analysis approach to traffic volume forecasting,” IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 3, pp. 878–888, 2017.
- [10] Y. Xie, Y. Zhang, and Z. Ye, “Short‐term traffic volume forecasting using kalman filter with discrete wavelet decomposition,” Computer‐Aided Civil and Infrastructure Engineering, vol. 22, no. 5, pp. 326–334, 2007.
- [11] T. Pamuła, “Impact of data loss for prediction of traffic flow on an urban road using neural networks,” IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 3, pp. 1000–1009, 2018.
- [12] G. M. Soliman and T. H. Abou-El-Enien, “Terrorism prediction using artificial neural network,” Rev. d’Intelligence Artif., vol. 33, no. 2, pp. 81–87, 2019.
- [13] M. S. Dougherty and M. R. Cobbett, “Short-term inter-urban traffic forecasts using neural networks,” International journal of forecasting, vol. 13, no. 1, pp. 21–31, 1997.
- [14] M. Raeesi, M. S. Mesgari, and P. Mahmoudi, “Traffic time series forecasting by feedforward neural network: a case study based on traffic data of monroe,” ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XL2, pp. 219–223, 2014. [Online]. Available: https://ui.adsabs.harvard.edu/abs/2014ISPAr.XL2..219R
- [15] Z. Lv, J. Xu, K. Zheng, H. Yin, P. Zhao, and X. Zhou, “Lc-rnn: A deep learning model for traffic speed prediction,” in IJCAI, vol. 2018, 2018, Conference Proceedings, p. 27th.
- [16] J. W. C. Lint, S. Hoogendoorn, and H. Zuvlen, “Freeway travel time prediction with state-space neural networks: Modeling state-space dynamics with recurrent neural networks,” Transportation Research Record, vol. 1811, 2002.
- [17] R. Fu, Z. Zhang, and L. Li, “Using lstm and gru neural network methods for traffic flow prediction,” in 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC). IEEE, 2016, Conference Proceedings, pp. 324–328.
- [18] J. Zhang, Y. Zheng, D. Qi, R. Li, and X. Yi, “Dnn-based prediction model for spatio-temporal data,” in Proceedings of the 24th ACM SIGSPATIAL international conference on advances in geographic information systems, 2016, Conference Proceedings, pp. 1–4.
- [19] L. Zhao, Y. Song, C. Zhang, Y. Liu, P. Wang, T. Lin, M. Deng, and H. Li, “T-gcn: A temporal graph convolutional network for traffic prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 9, pp. 3848–3858, 2019.
- [20] Z. Wu, S. Pan, G. Long, J. Jiang, and C. Zhang, “Graph wavenet for deep spatial-temporal graph modeling,” p. 1907–1913, 2019.
- [21] B. Yu, H. Yin, and Z. Zhu, “Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting,” p. 3634–3640, 2018.
- [22] C. Song, Y. Lin, S. Guo, and H. Wan, “Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 01, pp. 914–921, 2020. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/5438
- [23] M. Li and Z. Zhu, “Spatial-temporal fusion graph neural networks for traffic flow forecasting,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 5, pp. 4189–4196, 2021. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/16542
- [24] Z. Fang, Q. Long, G. Song, and K. Xie, “Spatial-temporal graph ode networks for traffic flow forecasting,” in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, Conference Proceedings, pp. 364–373.
- [25] L. Tang, Y. Zhao, J. Cabrera, J. Ma, and K. L. Tsui, “Forecasting short-term passenger flow: An empirical study on shenzhen metro,” IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 10, pp. 3613–3622, 2019.
- [26] W. Li, J. Cao, J. Guan, S. Zhou, G. Liang, W. K. Y. So, and M. Szczecinski, “A general framework for unmet demand prediction in on-demand transport services,” IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 8, pp. 2820–2830, 2019.
- [27] J. Guan, W. Wang, W. Li, and S. Zhou, “A unified framework for predicting kpis of on-demand transport services,” IEEE Access, vol. 6, pp. 32 005–32 014, 2018.
- [28] L. Lin, J. Li, F. Chen, J. Ye, and J. Huai, “Road traffic speed prediction: A probabilistic model fusing multi-source data,” IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 7, pp. 1310–1323, 2018.
- [29] Z. Diao, D. Zhang, X. Wang, K. Xie, S. He, X. Lu, and Y. Li, “A hybrid model for short-term traffic volume prediction in massive transportation systems,” IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 3, pp. 935–946, 2019.
- [30] D. Salinas, M. Bohlke-Schneider, L. Callot, R. Medico, and J. Gasthaus, “High-dimensional multivariate forecasting with low-rank gaussian copula processes,” Advances in neural information processing systems, vol. 32, 2019.
- [31] H. Tan, Y. Wu, B. Shen, P. J. Jin, and B. Ran, “Short-term traffic prediction based on dynamic tensor completion,” IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 8, pp. 2123–2133, 2016.
- [32] J. Shin and M. Sunwoo, “Vehicle speed prediction using a markov chain with speed constraints,” IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 9, pp. 3201–3211, 2019.
- [33] P. Duan, G. Mao, W. Liang, and D. Zhang, “A unified spatio-temporal model for short-term traffic flow prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 9, pp. 3212–3223, 2019.
- [34] Z. Li, N. D. Sergin, H. Yan, C. Zhang, and F. Tsung, “Tensor completion for weakly-dependent data on graph for metro passenger flow prediction,” in proceedings of the AAAI conference on artificial intelligence, vol. 34, 2020, Conference Proceedings, pp. 4804–4810.
- [35] Y. Gong, Z. Li, J. Zhang, W. Liu, and J. Yi, “Potential passenger flow prediction: A novel study for urban transportation development,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, 2019, Conference Proceedings, pp. 4020–4027.
- [36] J.-M. Chiou, “Dynamical functional prediction and classification, with application to traffic flow prediction,” The Annals of Applied Statistics, vol. 6, no. 4, pp. 1588–1614, 2012.
- [37] V. I. Shvetsov, “Mathematical modeling of traffic flows,” Automation and remote control, vol. 64, no. 11, pp. 1651–1689, 2003.
- [38] A. Kinoshita, A. Takasu, and J. Adachi, “Latent variable model for weather-aware traffic state analysis,” in International Workshop on Information Search, Integration, and Personalization. Springer, 2016, Conference Proceedings, pp. 51–65.
- [39] D. Deng, C. Shahabi, U. Demiryurek, and L. Zhu, “Situation aware multi-task learning for traffic prediction,” in 2017 IEEE International Conference on Data Mining (ICDM). IEEE, 2017, Conference Proceedings, pp. 81–90.
- [40] D. Deng, C. Shahabi, U. Demiryurek, L. Zhu, R. Yu, and Y. Liu, “Latent space model for road networks to predict time-varying traffic,” in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 2016, Conference Proceedings, pp. 1525–1534.
- [41] H.-F. Yu, N. Rao, and I. S. Dhillon, “Temporal regularized matrix factorization for high-dimensional time series prediction,” Advances in neural information processing systems, vol. 29, 2016.
- [42] H. Hong, X. Zhou, W. Huang, X. Xing, F. Chen, Y. Lei, K. Bian, and K. Xie, “Learning common metrics for homogenous tasks in traffic flow prediction,” in 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA). IEEE, 2015, Conference Proceedings, pp. 1007–1012.
- [43] N. Polson and V. Sokolov, “Bayesian particle tracking of traffic flows,” IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 2, pp. 345–356, 2017.
- [44] Y. Gong, Z. Li, J. Zhang, W. Liu, Y. Zheng, and C. Kirsch, “Network-wide crowd flow prediction of sydney trains via customized online non-negative matrix factorization,” in Proceedings of the 27th ACM international conference on information and knowledge management, 2018, Conference Proceedings, pp. 1243–1252.
- [45] K. Ishibashi, S. Harada, and R. Kawahara, “Inferring latent traffic demand offered to an overloaded link with modeling qos-degradation effect,” IEICE Transactions on Communications, 2018.
- [46] X. Ma, Z. Dai, Z. He, J. Ma, Y. Wang, and Y. Wang, “Learning traffic as images: a deep convolutional neural network for large-scale transportation network speed prediction,” Sensors, vol. 17, no. 4, p. 818, 2017.
- [47] M. Lv, Z. Hong, L. Chen, T. Chen, T. Zhu, and S. Ji, “Temporal multi-graph convolutional network for traffic flow prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 6, pp. 3337–3348, 2021.
- [48] Y. Li, R. Yu, C. Shahabi, and Y. Liu, “Diffusion convolutional recurrent neural network: Data-driven traffic forecasting,” arXiv preprint arXiv:1707.01926, 2017.
- [49] Z. Cui, R. Ke, Z. Pu, X. Ma, and Y. Wang, “Learning traffic as a graph: A gated graph wavelet recurrent neural network for network-scale traffic prediction,” Transportation Research Part C: Emerging Technologies, vol. 115, p. 102620, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0968090X19306448
- [50] Z. Diao, X. Wang, D. Zhang, Y. Liu, K. Xie, and S. He, “Dynamic spatial-temporal graph convolutional neural networks for traffic forecasting,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, pp. 890–897, 2019. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/3877
- [51] W. Zhao, S. Zhang, B. Zhou, and B. Wang, “STCGAT: Spatial-temporal causal networks for complex urban road traffic flow prediction,” arXiv preprint arXiv:2203.10749, 2022.
- [52] C. Zheng, X. Fan, C. Wang, and J. Qi, “GMAN: A graph multi-attention network for traffic prediction,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 1234–1241, 2020. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/5477
- [53] W. Chen, L. Chen, Y. Xie, W. Cao, Y. Gao, and X. Feng, “Multi-range attentive bicomponent graph convolutional network for traffic forecasting,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 3529–3536, 2020. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/5758
- [54] G. Guo and W. Yuan, “Short-term traffic speed forecasting based on graph attention temporal convolutional networks,” Neurocomputing, vol. 410, pp. 387–393, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0925231220309504
- [55] X. Shi, H. Qi, Y. Shen, G. Wu, and B. Yin, “A spatial–temporal attention approach for traffic prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 8, pp. 4909–4918, 2021.
- [56] F. Zhou, Q. Yang, T. Zhong, D. Chen, and N. Zhang, “Variational graph neural networks for road traffic prediction in intelligent transportation systems,” IEEE Transactions on Industrial Informatics, vol. 17, no. 4, pp. 2802–2812, 2021.