Multi-task Learning for Sparse Traffic Forecasting
Abstract
Accurate traffic prediction is crucial for improving the performance of intelligent transportation systems. Previous traffic prediction tasks mainly focus on small and non-isolated traffic subsystems, while the Traffic4cast 2022 competition is dedicated to exploring the traffic state dynamics of entire cities. Given only one hour of sparse loop count data, the task is to predict the congestion classes for all road segments and the expected times of arrival along super-segments 15 minutes into the future. The sparsity of loop counter data and highly uncertain real-time traffic conditions make the competition challenging. For this reason, we propose a multi-task learning network that can simultaneously predict the congestion class and the speed of each road segment. Specifically, we use clustering and neural network methods to learn the dynamic features of loop counter data. Then, we construct a graph with road segments as nodes and capture the spatial dependence between road segments with a Graph Neural Network. Finally, we learn three measures, namely the congestion class, the speed value, and the volume class, simultaneously through a multi-task learning module. For the extended competition, we use the predicted speeds to calculate the expected times of arrival along super-segments. Our method achieved excellent results on the dataset provided by the Traffic4cast 2022 competition; the source code is available at https://github.com/OctopusLi/NeurIPS2022-traffic4cast.
Keywords: multi-task learning · volume clustering · deep neural network · traffic prediction
1 Introduction
Intelligent transportation systems play an increasingly important role in modern cities. As an important task within such systems, traffic prediction aims to predict future traffic states from historical traffic data and real-time traffic conditions.
In the past decade, deep learning methods have made breakthroughs in the field of traffic prediction. Some methods [1, 2, 3, 4, 5, 6] used Convolutional Neural Networks and Recurrent Neural Networks [7] to capture the temporal dependence of traffic data, but they do not consider the topological relationships in traffic data. Other works [8, 9, 10, 11] introduced Graph Neural Networks [12] to model road graphs and learn the spatial dependence between road segments. However, most previous methods are built upon small subsystems and require dense traffic data acquisition, which makes them inapplicable to practical large-scale traffic systems with sparse measurement data.
During the previous Traffic4cast competitions in 2019-2021 [13, 14, 15], many methods were proposed to predict traffic conditions from large-scale data, contributing both methodological and practical insights that advance the application of AI to forecasting traffic and other spatial processes. The Traffic4cast 2022 challenge goes further, exploring the ability to generalize from loosely related temporal vertex data on just a few nodes to dynamic future traffic states on the edges of an entire road graph. Specifically, the core challenge of Traffic4cast 2022 is to predict the congestion classes of all road segments 15 minutes into the future, and the extended challenge is to predict the expected times of arrival along super-segments over the same horizon. Note that the input of the competition is car count data from spatially sparse vehicle counters in three cities, in 15-minute aggregated time bins, for one hour before the prediction time slot.
2 Methods
Our model is composed of several modules, which are presented one-by-one in this section. The overview of the model is shown in Figure 1.
2.1 Volume Clustering
First, the data records are clustered into 10 groups following the simple median clustering method provided by the organizer (code: https://github.com/iarai/NeurIPS2022-traffic4cast/, competition: https://www.iarai.ac.at/traffic4cast/). Specifically, for each data record, we sum the values of all loop counter volumes to obtain a volumeSum value. We then sort the dataset by volumeSum and partition the records into equal-frequency bins; the bin number gives the cluster index of each data record.
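As a minimal sketch of this step (assuming each record's counter volumes are available as a NumPy array, with vacant counters already filled with 0), the equal-frequency binning can be written as:

```python
import numpy as np

def volume_clusters(records, n_clusters=10):
    """Cluster data records into equal-frequency bins by total counter volume.
    `records` is a hypothetical list of per-record counter-volume arrays."""
    volume_sum = np.array([r.sum() for r in records])  # volumeSum per record
    order = np.argsort(volume_sum)                     # sort by volumeSum
    cluster = np.empty(len(records), dtype=int)
    # np.array_split yields near-equal-frequency bins; bin index = cluster id.
    for k, idx in enumerate(np.array_split(order, n_clusters)):
        cluster[idx] = k
    return cluster
```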
Afterwards, for each road segment, we extract a feature matrix $F$ as follows. Suppose there are $n_k$ historical records within the $k$-th cluster, and for this road segment, the numbers of congestion labels for the undefined/green, yellow, and red states among these records are $n_k^g$, $n_k^y$, and $n_k^r$, respectively. Then the $k$-th row of $F$ is calculated as $[n_k^g/n_k,\ n_k^y/n_k,\ n_k^r/n_k]$. In this way, we obtain the statistical distribution of the congestion level of each road segment w.r.t. different global traffic volumes, which is taken as prior knowledge and passed into the graph neural network for further learning.
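The following sketch shows how such a prior matrix could be computed for one road segment; the raw-label-to-state mapping and the array shapes are our assumptions:

```python
import numpy as np

# Assumed mapping of raw congestion labels to the three states:
# 0/1 (undefined/green) -> 0, 2 (yellow) -> 1, 3 (red) -> 2.
STATE = {0: 0, 1: 0, 2: 1, 3: 2}

def congestion_prior(labels, cluster, n_clusters=10):
    """labels[t]: raw congestion label of one segment in record t;
    cluster[t]: volume-cluster index of record t.
    Returns the (n_clusters x 3) frequency matrix F described above."""
    F = np.zeros((n_clusters, 3))
    for lab, k in zip(labels, cluster):
        F[k, STATE[lab]] += 1
    rows = F.sum(axis=1, keepdims=True)   # number of records per cluster
    return F / np.maximum(rows, 1)        # avoid division by zero
```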
2.2 Volume Feature Learning
In our road graph, loop counters are very sparse. If the counter volumes were directly introduced into the network as node features, many nodes would have vacant values. We therefore use a multi-layer perceptron to learn the relationship between counter volumes and road segments, and then feed the resulting road segment features to the graph neural network for further learning.
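A sketch of this idea follows; the hidden size, output dimension, and the use of a single shared MLP are illustrative assumptions, not the trained configuration:

```python
import torch
import torch.nn as nn

class VolumeEncoder(nn.Module):
    """Map the flattened counter volumes (num_counters x 4 time bins, with
    vacant values filled by 0) to a feature vector per road segment."""
    def __init__(self, num_counters, num_segments, hidden=256, out_dim=16):
        super().__init__()
        self.num_segments, self.out_dim = num_segments, out_dim
        self.mlp = nn.Sequential(
            nn.Linear(num_counters * 4, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_segments * out_dim),
        )

    def forward(self, volumes):              # volumes: (batch, num_counters, 4)
        x = self.mlp(volumes.flatten(1))
        return x.view(-1, self.num_segments, self.out_dim)
```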
2.3 Static Feature Learning
The attribute features of road segments are essential. We embed the categorical attributes importance, oneway, tunnel, and lanes into learnable vectors. For continuous features, namely parsed max speed, flow speed, length meters, counter distance, and limit speed, we simply concatenate the values. In this way, we obtain a unified representation for each road segment.
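A minimal sketch of this encoder is given below; the category cardinalities and the embedding size are placeholders standing in for the elided dimensions:

```python
import torch
import torch.nn as nn

class StaticEncoder(nn.Module):
    """Embed categorical segment attributes and concatenate continuous ones.
    Cardinalities and the embedding size `emb` are placeholder values."""
    def __init__(self, n_importance=9, n_lanes=10, emb=8):
        super().__init__()
        self.importance = nn.Embedding(n_importance, emb)
        self.oneway = nn.Embedding(2, emb)
        self.tunnel = nn.Embedding(2, emb)
        self.lanes = nn.Embedding(n_lanes, emb)

    def forward(self, cat, cont):
        # cat: (num_segments, 4) integer attributes in the order
        # [importance, oneway, tunnel, lanes]; cont: (num_segments, 5)
        # continuous features (parsed max speed, flow speed, length meters,
        # counter distance, limit speed), normalized beforehand.
        return torch.cat([self.importance(cat[:, 0]), self.oneway(cat[:, 1]),
                          self.tunnel(cat[:, 2]), self.lanes(cat[:, 3]),
                          cont], dim=-1)      # (num_segments, 4*emb + 5)
```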
2.4 Graph Neural Network
GNN has achieved good results in many fields; it aggregates information from other nodes and edges through a message passing mechanism. However, the loop counter volume data is sparse, which makes it hard for nodes to learn the features of their neighbors with a standard graph neural network. Moreover, our tasks aim to learn features of road segments, while a GNN aggregates information over nodes. It is therefore more conducive to representation learning to take road segments as the nodes of the graph. For this reason, we construct a new road graph whose nodes are the road segments, which lets us better learn the dependencies between road segments. We feed the features learned by the previous modules into a multi-layer graph neural network to learn the spatial dependencies between road segments, and pass the output to the multi-task learning network.
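One simple way to realize this segment-as-node construction is the line graph of the original road graph; the sketch below is our reading of the construction, using networkx:

```python
import networkx as nx

def segment_graph(edges):
    """Build a graph whose nodes are road segments (edges of the original
    road graph); two segments are connected when one ends where the other
    begins. `edges` is an iterable of (u, v) node pairs."""
    g = nx.DiGraph()
    g.add_edges_from(edges)
    # Node (u, v) of the line graph corresponds to road segment u -> v.
    return nx.line_graph(g)
```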
2.5 Multi-task Learning
We propose a multi-task learning component to learn the congestion class, the speed value, and the volume class for each segment in the road graph concurrently. We feed the output of the GNN module into residual blocks [16] to obtain the predicted congestion class $\hat{y}_c$, speed $\hat{y}_s$, and volume class $\hat{y}_v$. The loss function is defined as follows:
$\mathcal{L}_c = \mathrm{CE}(\hat{y}_c, y_c)$  (1)

$\mathcal{L}_s = \lVert \hat{y}_s - y_s \rVert_1$  (2)

$\mathcal{L}_v = \mathrm{CE}(\hat{y}_v, y_v)$  (3)
where $y_c$ denotes the label of the congestion class (0 = undefined; 1 = green/uncongested; 2 = yellow/slowed-down; 3 = red/congested), $y_s$ denotes the label of the speed, $y_v$ denotes the label of the volume class (1 for volumes 1 and 2; 3 for volumes 3 and 4; 5 for volumes 5 and above, according to the competition), and $\mathrm{CE}$ denotes the cross-entropy loss.
Finally, we combine the loss terms of these tasks as the overall loss function:
$\mathcal{L} = \lambda_c \mathcal{L}_c + \lambda_s \mathcal{L}_s + \lambda_v \mathcal{L}_v$  (4)
where $\lambda_c$, $\lambda_s$, and $\lambda_v$ are hyper-parameters; we set them to 0.03, 1, and 1, respectively, in training.
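Put together, a sketch of the training loss (Eq. 4) could look as follows; the choice of cross-entropy for the two classification heads, L1 for speed, the masking of missing speed labels, and the task-to-weight assignment are our assumptions:

```python
import torch
import torch.nn.functional as F

def multitask_loss(pred_c, pred_s, pred_v, y_c, y_s, y_v,
                   w_c=0.03, w_s=1.0, w_v=1.0):
    """Weighted sum of the three task losses (Eq. 4)."""
    loss_c = F.cross_entropy(pred_c, y_c)          # congestion class, Eq. (1)
    mask = ~torch.isnan(y_s)                       # assumed: skip missing speeds
    loss_s = F.l1_loss(pred_s[mask], y_s[mask])    # speed regression, Eq. (2)
    loss_v = F.cross_entropy(pred_v, y_v)          # volume class, Eq. (3)
    return w_c * loss_c + w_s * loss_s + w_v * loss_v
```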
2.6 ETA Prediction
Since we have the predicted speed and the length of every road segment, we simply compute the travel time of each segment within a super-segment and sum these to produce the super-segment ETA.
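For illustration (assuming speeds in km/h and lengths in meters; the units in the actual pipeline may differ):

```python
def super_segment_eta(speed_kmh, length_m, segment_ids):
    """ETA (seconds) of one super-segment: sum of per-segment travel times,
    time = length / speed. A small floor on speed avoids division by zero."""
    return sum(length_m[s] / max(speed_kmh[s] / 3.6, 0.1)   # speed in m/s
               for s in segment_ids)
```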
3 Experiments
In this section, we examine the performance of our method on three real-world large-scale datasets from the Traffic4cast 2022 competition, collected by HERE Technologies in the years 2019 to 2021. There are three city datasets: London, Madrid, and Melbourne. For each city, the road graph contains tens of thousands of nodes and edges, together with a few thousand sparse loop counters; the London road graph, for example, has 132,414 edges, 59,110 nodes, and 3,751 counters. Loop counter volumes are aggregated in 15-minute bins, yielding a 3,751×4 volume input for London in which vacant values are filled with 0. For training, we only use daytime data, because both the loop counter data and the label data are sparse around midnight.
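A sketch of this preprocessing follows; the 6am-10pm daytime window matches Table 2, while the array layout is an assumption:

```python
import numpy as np

def prepare_volumes(volumes):
    """volumes: (num_days, 96, num_counters) 15-minute counts with NaN for
    vacant counter readings. Keep the 64 daytime bins (6:00-22:00) and fill
    vacant values with 0."""
    day_bins = slice(6 * 4, 22 * 4)        # bins 24..87, i.e. 64 per day
    return np.nan_to_num(volumes[:, day_bins, :], nan=0.0)
```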
Table 1: Road graph statistics and data period for each city.

| City | Nodes | Edges | Counters | Period |
|---|---|---|---|---|
| London | 59,110 | 132,414 | 3,751 | 2019-07-01 to 2020-01-31 |
| Madrid | 63,397 | 121,902 | 3,875 | 2021-06-01 to 2021-12-31 |
| Melbourne | 49,510 | 94,871 | 3,982 | 2020-06-01 to 2020-12-30 |
Table 2: Training configuration of each city model.

| Model (checkpoint) | Sampling | Training set size | Iterations | Batch size | Trainable parameters |
|---|---|---|---|---|---|
| model<London> | 110 days at 6am-10pm | 110×64 | 20 epochs | 2 | 525.03M |
| model<Madrid> | 109 days at 6am-10pm | 109×64 | 20 epochs | 2 | 502.60M |
| model<Melbourne> | 106 days at 6am-10pm | 106×64 | 20 epochs | 2 | 409.66M |
Table 3: Final leaderboard of the core challenge.

| Team | Core challenge score |
|---|---|
| ustc-gobbler | 0.8431079388 |
| Bolt | 0.8496679068 |
| oahciy | 0.8504153291 |
| GongLab | 0.8560309211 |
| TSE | 0.8736550411 |
| discovery | 0.8759126862 |
| ywei | 0.8778917591 |
Table 4: Final leaderboard of the extended challenge.

| Team | Extended challenge score |
|---|---|
| ustc-gobbler | 58.49972153 |
| TSE | 59.78244781 |
| oahciy | 61.22274017 |
| Bolt | 61.2546107 |
| discovery | 62.29674403 |
| GongLab | 64.74489975 |
3.1 Leaderboard Results
For each city dataset, we trained 9 models, each for 20 epochs, and selected the best-performing checkpoint on the validation set. We then average the predictions of the 9 models to produce the final submission. Our team GongLab won fourth place in the core competition and sixth place in the extended competition.
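The submission averaging can be sketched as follows; averaging softmax probabilities of the congestion head (rather than logits) is an assumption:

```python
import torch

def ensemble_predict(models, inputs):
    """Average the congestion-class probabilities of the selected checkpoints."""
    with torch.no_grad():
        probs = [m(inputs).softmax(dim=-1) for m in models]
    return torch.stack(probs).mean(dim=0)
```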
3.2 Methods for Comparison
We compared our model against several baselines on the validation set:
- Naive Count: For the core challenge, we count the congestion classes of all road segments in the training set and use the resulting class probabilities as the prediction; for the extended challenge, we use the median ETA of all super-segments in the training set (see the sketch after this list).
- Volume Cluster: This baseline groups the data by the sum of counter volumes. It uses the per-group statistical probability of congestion for the core challenge and the per-group median ETA for the extended challenge.
- GNN: The GNN model provided by the organizers. It fills the loop counter volumes into the node features as input and uses a message passing mechanism to learn the dependencies between nodes.
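For concreteness, the core-challenge Naive Count baseline amounts to the empirical class distribution, as in this sketch:

```python
import numpy as np

def naive_count_probs(train_labels):
    """Empirical distribution of congestion classes over all segments and
    time slots of the training set, used as the prediction everywhere."""
    classes, counts = np.unique(train_labels, return_counts=True)
    return dict(zip(classes.tolist(), (counts / counts.sum()).tolist()))
```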
Table 5: Core challenge results on the validation set (lower is better).

| City | Naive Count | Volume Cluster | GNN | Our model |
|---|---|---|---|---|
| London | 1.0013 | 0.9872 | 0.9702 | 0.82705 |
| Madrid | 1.0023 | 0.9914 | 0.9735 | 0.82652 |
| Melbourne | 1.0216 | 1.0066 | 0.9845 | 0.84560 |
Table 6: Extended challenge (ETA) results on the validation set (lower is better; the GNN baseline does not produce ETAs).

| City | Naive Count | Volume Cluster | GNN | Our model |
|---|---|---|---|---|
| London | 108.0988 | 99.9642 | - | 91.0117 |
| Madrid | 68.1231 | 61.2911 | - | 59.8746 |
| Melbourne | 41.2249 | 37.8772 | - | 39.1628 |
As can be seen from Table 5 and Table 6, our model outperforms the baselines on the core challenge in all three cities, and on the extended challenge in London and Madrid. The Naive Count model does not take the dynamic changes of the input into account, and the Volume Cluster model cannot learn the dependencies between road segments. Since the loop counter volumes are very sparse, the GNN model performs poorly. Our model combines the Volume Cluster and GNN approaches and can capture both the dynamics of the volume data and the dependencies between road segments. Moreover, we adopt multi-task learning to predict the congestion class, the speed value, and the volume class simultaneously, which further boosts performance.
Table 7: Ablation results on the core challenge (validation set).

| City | Our model | No cluster | No static feature | No GNN |
|---|---|---|---|---|
| London | 0.82705 | 0.84949 | 0.83534 | 0.82910 |
| Madrid | 0.82652 | 0.84946 | 0.83442 | 0.82825 |
| Melbourne | 0.84560 | 0.89053 | 0.84874 | 0.84803 |
We perform ablation experiments to verify the effectiveness of each module. As Table 7 shows, performance degrades after removing any of the proposed modules, which indicates that all of them are meaningful. For example, removing the clustering component increases the loss on London from 0.82705 to 0.84949, indicating that the volume clustering method is important for capturing the dynamic changes of the loop counter data.
4 Discussion
We propose a multi-task learning framework for sparse traffic prediction, and experimental results show that our model is significantly better than the baselines. During the experiments, we also found that with particularly sparse node input, it is difficult for a standard GNN to learn the spatial dependence between nodes. Besides, using clustering to extract patterns from historical data as additional prior knowledge can effectively improve model performance.
5 Acknowledgement
This work was supported in part by the National Natural Science Foundation of China under Grant 62276100, in part by the Guangdong Natural Science Funds for Distinguished Young Scholars under Grant 2022B1515020049, and in part by the Guangdong Regional Joint Funds for Basic and Applied Research under Grant 2021B1515120078.
References
- [1] Xiaolei Ma, Zhuang Dai, Zhengbing He, Jihui Ma, Yong Wang, and Yunpeng Wang. Learning traffic as images: a deep convolutional neural network for large-scale transportation network speed prediction. Sensors, 17(4):818, 2017.
- [2] Nicholas G Polson and Vadim O Sokolov. Deep learning for short-term traffic flow prediction. Transportation Research Part C: Emerging Technologies, 79:1–17, 2017.
- [3] Zou Zhene, Peng Hao, Liu Lin, Xiong Guixi, Bowen Du, Md Zakirul Alam Bhuiyan, Yuntao Long, and Da Li. Deep convolutional mesh rnn for urban traffic passenger flows prediction. In 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pages 1305–1310. IEEE, 2018.
- [4] Shanghang Zhang, Guanhang Wu, Joao P Costeira, and José MF Moura. Fcn-rlstm: Deep spatio-temporal neural networks for vehicle counting in city cameras. In Proceedings of the IEEE international conference on computer vision, pages 3667–3676, 2017.
- [5] Jintao Ke, Hongyu Zheng, Hai Yang, and Xiqun Michael Chen. Short-term forecasting of passenger demand under on-demand ride services: A spatio-temporal deep learning approach. Transportation research part C: Emerging technologies, 85:591–608, 2017.
- [6] Wangyang Wei, Honghai Wu, and Huadong Ma. An autoencoder and lstm-based traffic flow prediction method. Sensors, 19(13):2946, 2019.
- [7] Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Cernockỳ, and Sanjeev Khudanpur. Recurrent neural network based language model. In Interspeech, volume 2, pages 1045–1048. Makuhari, 2010.
- [8] Chuanpan Zheng, Xiaoliang Fan, Cheng Wang, and Jianzhong Qi. Gman: A graph multi-attention network for traffic prediction. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 1234–1241, 2020.
- [9] Rongzhou Huang, Chuyin Huang, Yubao Liu, Genan Dai, and Weiyang Kong. Lsgcn: Long short-term traffic prediction with graph convolutional networks. In IJCAI, pages 2355–2361, 2020.
- [10] Shengnan Guo, Youfang Lin, Huaiyu Wan, Xiucheng Li, and Gao Cong. Learning dynamics and heterogeneity of spatial-temporal graph data for traffic forecasting. IEEE Transactions on Knowledge and Data Engineering, 2021.
- [11] Weiqi Chen, Ling Chen, Yu Xie, Wei Cao, Yusong Gao, and Xiaojie Feng. Multi-range attentive bicomponent graph convolutional network for traffic forecasting. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 3529–3536, 2020.
- [12] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE transactions on neural networks, 20(1):61–80, 2008.
- [13] David P Kreil, Michael K Kopp, David Jonietz, Moritz Neun, Aleksandra Gruca, Pedro Herruzo, Henry Martin, Ali Soleymani, and Sepp Hochreiter. The surprising efficiency of framing geo-spatial time series forecasting as a video prediction task – insights from the iarai traffic4cast competition at neurips 2019. In Hugo Jair Escalante and Raia Hadsell, editors, Proceedings of the NeurIPS 2019 Competition and Demonstration Track, volume 123 of Proceedings of Machine Learning Research, pages 232–241. PMLR, 08–14 Dec 2020.
- [14] Michael Kopp, David Kreil, Moritz Neun, David Jonietz, Henry Martin, Pedro Herruzo, Aleksandra Gruca, Ali Soleymani, Fanyou Wu, Yang Liu, Jingwei Xu, Jianjin Zhang, Jay Santokhi, Alabi Bojesomo, Hasan Al Marzouqi, Panos Liatsis, Pak Hay Kwok, Qi Qi, and Sepp Hochreiter. Traffic4cast at neurips 2020 - yet more on the unreasonable effectiveness of gridded geo-spatial processes. In Hugo Jair Escalante and Katja Hofmann, editors, Proceedings of the NeurIPS 2020 Competition and Demonstration Track, volume 133 of Proceedings of Machine Learning Research, pages 325–343. PMLR, 06–12 Dec 2021.
- [15] Christian Eichenberger, Moritz Neun, Henry Martin, Pedro Herruzo, Markus Spanring, Yichao Lu, Sungbin Choi, Vsevolod Konyakhin, Nina Lukashina, Aleksei Shpilman, Nina Wiedemann, Martin Raubal, Bo Wang, Hai L. Vu, Reza Mohajerpoor, Chen Cai, Inhi Kim, Luca Hermes, Andrew Melnik, Riza Velioglu, Markus Vieth, Malte Schilling, Alabi Bojesomo, Hasan Al Marzouqi, Panos Liatsis, Jay Santokhi, Dylan Hillier, Yiming Yang, Joned Sarwar, Anna Jordan, Emil Hewage, David Jonietz, Fei Tang, Aleksandra Gruca, Michael Kopp, David Kreil, and Sepp Hochreiter. Traffic4cast at neurips 2021 - temporal and spatial few-shot transfer learning in gridded geo-spatial processes. In Douwe Kiela, Marco Ciccone, and Barbara Caputo, editors, Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, volume 176 of Proceedings of Machine Learning Research, pages 97–112. PMLR, 06–14 Dec 2022.
- [16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.