
Online Spatio-Temporal Correlation-Based Federated Learning for Traffic Flow Forecasting

Qingxiang Liu, Sheng Sun, Min Liu, Yuwei Wang, and Bo Gao. This work was supported by the National Key Research and Development Program of China (2021YFB2900102) and the National Natural Science Foundation of China (No. 62072436 and No. 62202449). Qingxiang Liu, Sheng Sun, Min Liu and Yuwei Wang are with the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. Qingxiang Liu is also with the University of Chinese Academy of Sciences, Beijing, China. Min Liu is also with the Zhongguancun Laboratory (e-mail: [email protected], [email protected], [email protected] and [email protected]). Bo Gao is with the School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China (e-mail: [email protected]). Min Liu is the corresponding author.
Abstract

Traffic flow forecasting (TFF) is of great importance to the construction of Intelligent Transportation Systems (ITS). To mitigate the communication burden and tackle the problem of privacy leakage arising from centralized forecasting methods, Federated Learning (FL) has been applied to TFF. However, existing FL-based approaches employ a batch learning manner, which makes the pre-trained models inapplicable to subsequent traffic data and thus yields subpar prediction performance. In this paper, we perform the first study of forecasting traffic flow in an Online Learning (OL) manner within the FL framework, and propose a novel prediction method named Online Spatio-Temporal Correlation-based Federated Learning (FedOSTC), aiming to guarantee performance gains regardless of traffic fluctuation. Specifically, clients employ Gated Recurrent Unit (GRU)-based encoders to obtain the internal temporal patterns inside traffic data sequences. Then, the central server evaluates spatial correlation among clients via a Graph Attention Network (GAT), catering to the dynamic changes of spatial closeness caused by traffic fluctuation. Furthermore, to improve the generalization of the global model for upcoming traffic data, a period-aware aggregation mechanism is proposed to aggregate the local models, which are optimized using the Online Gradient Descent (OGD) algorithm at clients. We perform comprehensive experiments on two real-world datasets to validate the efficiency and effectiveness of our proposed method, and the numerical results demonstrate the superiority of FedOSTC.

Index Terms:
Federated learning, online learning, spatio-temporal correlation, traffic flow forecasting.

I Introduction

The development of the Internet of Things (IoT) has quickened the construction of Intelligent Transportation Systems (ITS), which has aroused great interest in industry and academia [1, 2]. In recent years, an increasing number of traffic nodes, e.g., traffic sensors and loop detectors, have been mounted along roads, generating massive data for enhanced traffic services, e.g., traffic flow forecasting [3, 4]. To date, plenty of deep learning-based methods have been proposed to improve performance in traffic flow forecasting [5, 6].

Recurrent Neural Networks (RNNs) and their variants, e.g., Long Short-Term Memory (LSTM) networks [7] and Gated Recurrent Unit (GRU) networks [8, 9], are widely used in forecasting traffic flow, due to their effectiveness in capturing temporal patterns inside traffic data sequences. Besides, these methods combined with graph-based approaches, such as Graph Neural Networks (GNNs) [10] or Graph Attention Networks (GATs) [11], can extract spatial correlation among traffic nodes, thus further increasing forecasting accuracy. Despite prominent performance improvements, these methods are based on a centralized learning strategy, which involves transmitting large quantities of privately-owned traffic data from decentralized traffic nodes to a central server for training prediction models. This strategy yields considerable communication overhead and privacy concerns, since private information (e.g., plate numbers) contained in traffic data may be leaked during data exchange.

As a result, it is crucial to forecast traffic flow while keeping traffic data decentralized, so as to mitigate the communication burden and preserve privacy. Federated Learning (FL), a novel distributed computing paradigm, can resolve the problem above: traffic nodes (called clients) collaboratively train a prediction model based on their private data, and only exchange intermediate model parameters with the server for model aggregation [12, 13]. Owing to the merit of guaranteeing competitive forecasting performance without direct data transmission, there have been a few efforts in forecasting traffic flow by depicting spatio-temporal correlation within the FL framework [14, 15, 16].

However, all of these approaches employ a batch learning manner, where prediction models are trained in advance on batches of historical traffic data. This coarse-grained manner of optimizing prediction models prevents the optimized models from extensively capturing the hidden temporal patterns in traffic flows. Furthermore, due to traffic fluctuation, these optimized models easily suffer from poor prediction performance when directly applied to upcoming traffic data. Therefore, these pre-trained models need to be retrained repeatedly, which incurs extra computation consumption, leads to delayed prediction, and fails to fulfill real-time prediction.

The Online Learning (OL) manner can break through the above-mentioned limitations: traffic nodes consistently update prediction models and perform predictions as soon as they observe new traffic data. Naturally, the OL manner has the inherent advantages of real-time prediction and non-redundant deployment, and hence suits the task of traffic flow forecasting. Traffic flows fluctuate severely at some time stamps, which poses great challenges to prediction models updated in a batch learning manner. In contrast, in the OL manner, the fine-grained optimization of prediction models can adapt to the dynamic changes of temporal patterns hidden in traffic flows, thus guaranteeing prediction performance. To the best of our knowledge, no research has focused on traffic flow forecasting in OL manner within the FL framework, and this article fills this research gap.

However, two main challenges remain to be tackled. First, we should dynamically evaluate the spatial correlation among clients at each round so as to guarantee prediction performance, which remarkably increases the difficulty of prediction and model update. Second, existing FL approaches adopt the averaging mechanism to yield fresh global models, which perform poorly on subsequent traffic data due to traffic fluctuation. Therefore, it is necessary to exploit the fluctuation patterns of traffic flow in the process of model aggregation, so as to increase the generalization ability of global models and eventually improve performance gains.

To this end, we propose a novel prediction method named Online Spatio-Temporal Correlation-based Federated Learning (FedOSTC) for traffic flow forecasting, which possesses excellent capabilities of dynamically capturing spatio-temporal dependence and boosting model adaptation regardless of traffic fluctuation. Specifically, a GRU with an encoder-decoder architecture is regarded as the prediction model and trained collaboratively by multiple traffic nodes. The temporal patterns inside traffic flows are captured independently via encoders at clients (traffic nodes). Then, the central server adaptively evaluates spatial closeness from temporal patterns using GAT, which can mitigate the effect of traffic fluctuation on spatial correlation. Relying on the instantly-captured spatio-temporal correlation, clients perform prediction and then incrementally optimize model parameters using Online Gradient Descent (OGD). Finally, based on the observation of periodic patterns in traffic flows, we generate fresh global models using a newly-developed period-aware aggregation mechanism rather than the averaging mechanism, which increases the generalization of global models for subsequent traffic flows. The main contributions of this paper are as follows:

  • We propose a novel prediction method named FedOSTC for forecasting traffic flow in the FL framework, to tackle the problems of high communication burden and privacy leakage in centralized methods. Specifically, FedOSTC includes online evaluation of spatio-temporal correlation and incremental optimization of prediction models.

  • We obtain temporal patterns inside traffic data at clients using GRU-based encoders, and the central server adopts GAT to dynamically evaluate spatial closeness based on temporal patterns transmitted from clients, aiming to adapt to the dynamic changes of spatial correlation aroused by traffic fluctuation.

  • We incrementally update local models using OGD and further propose a period-aware aggregation mechanism, which can increase the generalization of global models and eventually improve performance gains.

  • We conduct comprehensive experiments to validate that our proposed FedOSTC outperforms the related state-of-the-art methods from the perspectives of prediction performance and model generalization ability.

The remainder of this paper is organized as follows. Section II reviews the related literature. The problem of traffic flow forecasting adopting OL manner in the FL framework is formulated in Section III. Then in Section IV, detailed information of our proposed method FedOSTC is elaborated. Furthermore, extensive experiments are conducted in Section V. Finally, the paper is concluded in Section VI.

II Related Work

II-A Traffic Flow Forecasting

A great variety of methods have been proposed to increase the accuracy of traffic flow forecasting. Overall, existing methods can be divided into two categories: parametric and non-parametric methods. AutoRegressive Integrated Moving Average (ARIMA) and its variants are classical parametric methods and have been researched extensively [17, 18]. However, this series of methods has limitations in regressing traffic flows with high fluctuation, because they assume that traffic flows have a stationary distribution, which is not the case in reality. Non-parametric methods are based on machine learning models, e.g., Graph Networks (GNs) [19] and RNNs [20]. GN-based methods are devoted to evaluating spatial correlation among multiple nodes based on a constructed topology network, while RNNs can capture temporal dependence inside traffic sequences. Therefore, many studies combine GN approaches and RNNs to measure spatio-temporal correlation and have achieved satisfactory performance [21, 22].

In [21], Guo et al. proposed an Attention based Spatio-Temporal Graph Convolutional Network (ASTGCN), which contains a spatio-temporal attention mechanism for dynamically capturing the spatial patterns and temporal features. In [22], Zhu et al. proposed a traffic prediction method based on Graph Convolutional Network (GCN) and RNN, where the topology graph of road network is utilized for depicting the spatial correlation, aiming to increase forecasting accuracy.

However, these studies all adopt a centralized training strategy, where the central server updates prediction models directly on traffic data transmitted from decentralized traffic nodes, resulting in huge communication overhead and privacy leakage.

II-B Federated Learning for Traffic Flow Forecasting

Some recent studies pay attention to forecasting traffic flow using FL [23, 14, 15, 16]. In [23], Qi et al. treated vehicles as clients and introduced blockchain to avoid a single point of failure; this scenario differs from ours. In [14, 15], Liu et al. and Zhang et al. investigated a similar scenario where traffic sensors mounted along roads belong to certain organizations. The traffic data detected by sensors are directly transmitted to the organizational servers for training, which also incurs an excessive communication burden. To preserve privacy, multiple organizational servers adopt the FL manner to collaboratively train the prediction model. However, the two proposed methods are far from ideal, because only the spatial correlation among intra-organization nodes is evaluated while that among inter-organization ones is ignored, which leads to low forecasting accuracy.

Different from the above studies, Meng et al. proposed a model named Cross-Node Federated Graph Neural Network (CNFGNN) [16]. In this work, each traffic sensor serves as a client, and a GNN is utilized to capture the spatial relation at the central server. However, in the backpropagation process of the GNN, the central server has to communicate with clients to update parameters, which incurs considerable communication overhead. Besides, since the transportation graph composed of clients remains unchanged at each round, the captured spatial correlation among clients barely changes, thus failing to cater for traffic fluctuation.

Given the dynamic changes of traffic flow, our work mainly focuses on designing an online method for dynamically capturing spatio-temporal correlation among traffic nodes in the FL framework, which is quite different from the above studies.

III Problem Formulation and Preliminary Knowledge

In this section, we first formulate the traffic flow forecasting problem in OL manner within the FL framework, and then present the preliminary knowledge.

III-A Problem Formulation

Suppose there is a central server and $N$ traffic nodes. All traffic nodes compose a transportation network, denoted as a directed graph $\mathcal{G}=(\mathcal{S},\mathcal{E})$. $\mathcal{S}=\{s_{1},s_{2},\cdots,s_{N}\}$ represents the node set and $s_{n}$ denotes the $n$-th traffic node. $\mathcal{E}=\left(e_{m,n}\right)\in\mathbb{R}^{N\times N}$ denotes the adjacency matrix of these nodes. If $s_{m}$ is adjacent to $s_{n}$, $e_{m,n}=1$ (otherwise $e_{m,n}=0$). Let $\mathcal{A}_{n}$ represent the set of traffic nodes adjacent to $s_{n}$, and we define that $s_{n}\in\mathcal{A}_{n}$. Let $x_{t,n}$ denote the traffic speed observed by $s_{n}$ at the $t$-th time stamp. Primary notations are listed in Table I.

TABLE I: Primary Notations and Definitions in Section III
Notation | Definition
$\mathcal{G}$ | The transportation network composed of traffic nodes.
$s_{n}$ | The $n$-th traffic node.
$e_{m,n}$ | The adjacency between $s_{n}$ and $s_{m}$.
$\mathcal{A}_{n}$ | Adjacent node set of $s_{n}$.
$x_{t,n}$ | Traffic speed observed by $s_{n}$ at the $t$-th time stamp.
$f(\cdot)$ | Prediction model.
$T$ | Historical step.
$F$ | Forecasting step.
$N$ | Client number.
$X_{t,n}^{T}$ | Training speed sequence of $s_{n}$ at the $t$-th round.
$\hat{X}_{t,n}^{F}$ | Predicted values generated by $s_{n}$ at the $t$-th round.
$l$ | Loss function.
$\mathcal{R}$ | Maximum global round.
$w_{t}$ | Global model parameters of the $t$-th round.
$REG$ | Total regret over all clients.
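To make the notation concrete, the following minimal Python sketch (assuming NumPy; the four-node directed ring is a hypothetical toy topology) builds the adjacency matrix $\mathcal{E}$ and the neighbor sets $\mathcal{A}_{n}$, each including the node itself as defined above:

```python
import numpy as np

N = 4  # hypothetical number of traffic nodes
# E[m, n] = 1 if s_m is adjacent to s_n (toy directed edges)
E = np.zeros((N, N), dtype=int)
for m, n in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    E[m, n] = 1

# A_n: nodes adjacent to s_n, defined to include s_n itself
A = {n: {n} | {m for m in range(N) if E[m, n] == 1} for n in range(N)}
print(A)  # {0: {0, 3}, 1: {0, 1}, 2: {1, 2}, 3: {2, 3}}
```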

In this scenario, traffic nodes are regarded as clients to perform traffic flow forecasting using global model parameters from the central server and then incrementally update local models based on newly-observed speed data. The central server aggregates local models uploaded by clients to yield the fresh global model.

At the $t$-th round, $s_{n}$ receives the global model parameters $w_{t}$ from the central server and forecasts traffic speeds of the future $F$ time stamps based on traffic speeds of the $T$ historical time stamps, which can be formulated as

$\hat{X}_{t,n}^{F}=f(X_{t,n}^{T};w_{t})$, (1)

where $X_{t,n}^{T}=\left(x_{t-T+1,n},x_{t-T+2,n},\dots,x_{t,n}\right)$ denotes the sequence of traffic speeds from $t-T+1$ to $t$ observed by $s_{n}$, and $\hat{X}_{t,n}^{F}=\left(\hat{x}_{t+1,n},\hat{x}_{t+2,n},\dots,\hat{x}_{t+F,n}\right)$ represents the sequence of predicted traffic speeds from $t+1$ to $t+F$. $f(\cdot)$ denotes the chosen prediction model.
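As a concrete illustration of this sliding-window setup, the short sketch below (a minimal example assuming NumPy; the speed series is synthetic) extracts the historical window $X_{t,n}^{T}$ and the target window $X_{t,n}^{F}$ at a given round:

```python
import numpy as np

T, F = 12, 6                     # historical and forecasting steps
speeds = np.random.rand(1000)    # synthetic speed series of one node

def windows(t):
    """Return (X_t^T, X_t^F) for round t (0-indexed series)."""
    x_hist = speeds[t - T + 1 : t + 1]    # x_{t-T+1}, ..., x_t
    x_future = speeds[t + 1 : t + F + 1]  # x_{t+1}, ..., x_{t+F}
    return x_hist, x_future

x_hist, x_future = windows(t=100)
assert x_hist.shape == (T,) and x_future.shape == (F,)
```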

Let $X_{t,n}^{F}=\left(x_{t+1,n},x_{t+2,n},\dots,x_{t+F,n}\right)$ denote the true traffic speeds of the future time stamps that $s_{n}$ observes. Then $s_{n}$ computes the prediction error as follows:

$l(X_{t,n}^{F},f(X_{t,n}^{T};w_{t}))=l(X_{t,n}^{F},\hat{X}_{t,n}^{F};w_{t})$, (2)

where $l$ denotes the loss function, which is adopted to evaluate the bias between the predicted and true values.

Let $\mathcal{R}$ represent the maximum number of global rounds. We denote $w^{*}$ as the optimal model parameters that could be obtained in hindsight after the server sees the data sequences of all $N$ clients over $\mathcal{R}$ rounds [24]. The total prediction regret of $N$ clients is denoted by $REG$, which evaluates the difference in prediction loss between using the actual model parameters and using $w^{*}$ [25]. $REG$ is formulated as

$REG=\frac{1}{N}\sum_{t=1}^{\mathcal{R}}\sum_{n=1}^{N}\left(l(X_{t,n}^{F},\hat{X}_{t,n}^{F};w_{t})-l(X_{t,n}^{F},\hat{X}_{t,n}^{F};w^{*})\right)$. (3)

The objective of traffic flow forecasting adopting OL manner in FL is to minimize the global regret over all $N$ clients, i.e., $\min REG$. Since $w^{*}$ remains constant, we have to iteratively optimize the model parameters $w_{t}$ $(1\leq t\leq\mathcal{R})$ so as to attain the minimum regret.

III-B Preliminary Knowledge of Federated Learning with Online Learning Manner

We integrate the OL manner into the widely used Federated Averaging (FedAvg) method [26], which is elaborated in Algorithm 1. Specifically, at the $t$-th round, each client concurrently optimizes its locally-owned prediction model via Online Gradient Descent (OGD) for $E$ epochs using the newly arriving traffic data (ClientExecute in Algorithm 1). This update manner differs from the traditional offline learning mechanism, where the prediction model is pre-trained on historical traffic data obtained in advance. After accomplishing the local training procedure, clients upload their updated local models to the central server. Then, the central server performs model aggregation and yields the fresh global model based on the averaging mechanism (ServerExecute in Algorithm 1); a minimal code sketch of one round follows Algorithm 1.

Input: Initialized model $w_{1}$, learning rate $\eta$.
Output: The global model $w_{\mathcal{R}+1}$.

ServerExecute:
  for $t=1,2,\cdots,\mathcal{R}$ do
    for $s_{n}$ in $\mathcal{S}$ do
      $w_{t+1,n}\leftarrow$ ClientExecute($w_{t}$, $n$, $t$)
    $w_{t+1}\leftarrow\frac{1}{N}\sum_{s_{n}\in\mathcal{S}}w_{t+1,n}$
  return $w_{\mathcal{R}+1}$

Function ClientExecute($w$, $n$, $t$):
  Make prediction via (1).
  for $e=1,2,\cdots,E$ do
    $w\leftarrow w-\eta\nabla l(X_{t,n}^{F},\hat{X}_{t,n}^{F};w)$
  return $w$

Algorithm 1 FedAvg with OL manner
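The round structure above can be mirrored in a few lines of PyTorch; the sketch below is a simplified stand-in (the linear predictor, synthetic per-client streams, and full client participation are all hypothetical choices, not details fixed by the paper):

```python
import copy
import torch

N, E, R, T, F, lr = 4, 5, 10, 12, 6, 1e-3
global_model = torch.nn.Linear(T, F)                 # toy stand-in for f(.)
streams = [torch.randn(R, T + F) for _ in range(N)]  # synthetic speed streams

def client_execute(w, n, t):
    """Local OGD for E epochs on the newly observed sample of round t."""
    model = copy.deepcopy(w)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    x_hist, x_future = streams[n][t, :T], streams[n][t, T:]
    for _ in range(E):
        loss = torch.nn.functional.mse_loss(model(x_hist), x_future)
        opt.zero_grad(); loss.backward(); opt.step()
    return model

for t in range(R):
    locals_ = [client_execute(global_model, n, t) for n in range(N)]
    with torch.no_grad():                            # averaging aggregation
        for name, p in global_model.named_parameters():
            p.copy_(torch.stack([dict(m.named_parameters())[name]
                                 for m in locals_]).mean(0))
```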

IV Methodology

In this section, we provide a detailed introduction to FedOSTC, whose execution process is illustrated in Fig. 1. Specifically, we propose a novel spatio-temporal correlation evaluation mechanism, which includes obtaining the hidden temporal patterns inside traffic speed sequences using GRU-based encoders at clients and assessing spatial correlation among clients at the central server. After that, we update local prediction models using the OGD algorithm at clients. Based on the periodic patterns inside traffic flows, we further propose a period-aware aggregation mechanism to generate the fresh global model. Then, we elaborate the process of the proposed FedOSTC in Algorithm 2. Finally, we analyze the convergence of FedOSTC, which theoretically validates its performance.

Refer to caption
Figure 1: The procedure of FedOSTC. (1) The server distributes the global model. (2) The client uploads the temporal pattern. (3) The server returns the updated temporal pattern. (4) The client uploads local model parameters.

IV-A Spatio-Temporal Correlation Evaluation

An efficient way of assessing spatio-temporal correlation is fundamental to satisfactory prediction performance. Therefore, in this subsection, we elaborate how to evaluate spatial and temporal correlation among clients. First, clients capture the hidden temporal patterns inside speed sequences respectively with GRU-based encoders. Then the central server measures the spatial correlation among clients using GAT and returns the updated hidden temporal patterns to clients. Finally, clients input the updated hidden patterns into the decoders to generate the predicted results.

IV-A1 Obtaining Temporal Patterns from Speed Sequences at Clients

Considering the effectiveness of GRU in processing temporal sequences, we adopt GRU to capture the temporal patterns inside traffic speed sequences.

At the $t$-th round, given $x_{d,n}\in X_{t,n}^{T}$ $(t-T+1\leq d\leq t)$, $s_{n}$ executes the encoding process as follows:

$z_{t,n}^{d}=\mathrm{sigmoid}(W_{t,n}^{z}x_{d,n}+U_{t,n}^{z}\hat{h}_{t,n}^{d-1})$, (4)
$r_{t,n}^{d}=\mathrm{sigmoid}(W_{t,n}^{r}x_{d,n}+U_{t,n}^{r}\hat{h}_{t,n}^{d-1})$, (5)
$\tilde{h}_{t,n}^{d}=\tanh(W_{t,n}^{h}x_{d,n}+U_{t,n}^{h}(r_{t,n}^{d}\odot\hat{h}_{t,n}^{d-1}))$, (6)
$\hat{h}_{t,n}^{d}=(1-z_{t,n}^{d})\odot\hat{h}_{t,n}^{d-1}+z_{t,n}^{d}\odot\tilde{h}_{t,n}^{d}$, (7)

where $z_{t,n}^{d}$, $r_{t,n}^{d}$, $\tilde{h}_{t,n}^{d}$, and $\hat{h}_{t,n}^{d}$ represent the update gate, reset gate, candidate hidden state, and hidden state, respectively. $W_{t,n}^{z}$, $U_{t,n}^{z}$, $W_{t,n}^{r}$, $U_{t,n}^{r}$, $W_{t,n}^{h}$, and $U_{t,n}^{h}$ denote weight matrices and are contained in $w_{t}^{(e)}$, the encoder parameters included in $w_{t}$.

Let $h_{t,n}=\hat{h}_{t,n}^{t}$ denote the hidden temporal pattern learned by the encoder from $X_{t,n}^{T}$. Then, $h_{t,n}$ is uploaded to the central server for evaluating spatial correlation among clients.
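A minimal sketch of this encoding step (assuming PyTorch; the dimensions follow the experimental setting of a 64-cell GRU encoder, and `torch.nn.GRU` realizes the gate equations (4)-(7) internally):

```python
import torch

T, hidden = 12, 64                   # historical step and encoder size
encoder = torch.nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)

x = torch.randn(1, T, 1)             # one sequence X_{t,n}^T: (batch, T, 1)
_, h_last = encoder(x)               # final hidden state: (1, 1, hidden)
h_tn = h_last.squeeze(0).squeeze(0)  # temporal pattern h_{t,n} sent to server
```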

IV-A2 Evaluating Spatial Correlation Among Clients at the Server

In the scenario of traffic flow forecasting, due to the closeness of clients' locations, each client is adjacent to some others in the transportation network $\mathcal{G}$. Based on this, traffic speed observations at different clients are inherently correlated with each other. When a certain client makes a prediction, historical traffic speed observations of the adjacent clients should also be taken into consideration. However, since raw traffic data stored at clients cannot be shared for privacy preservation, it is infeasible to capture spatial correlation among clients using raw data as many centralized methods do.

Inspired by [16], we observe that raw traffic sequences and temporal patterns have similar distributions. The distributions of temporal patterns and traffic sequences after dimensionality reduction by Principal Component Analysis (PCA) are depicted in Fig. 2. We mark different clients in different colors and can observe that the "circles" and "triangles" share the same distribution. Based on this, we utilize the temporal patterns output by encoders as an alternative to raw traffic speeds for assessing spatial correlation among clients.

Refer to caption
Figure 2: The distributions of raw speed sequences and temporal patterns after dimensional reduction by PCA.
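This kind of distribution check can be reproduced with a short script; the sketch below (assuming scikit-learn, with synthetic stand-ins for the raw windows and encoder outputs) projects both sets of vectors into two dimensions for visual comparison, as in Fig. 2:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 12))       # synthetic raw speed windows X^T
patterns = rng.normal(size=(200, 64))  # synthetic temporal patterns h

# Project each set onto its own first two principal components,
# then compare the resulting 2-D scatters per client.
raw_2d = PCA(n_components=2).fit_transform(raw)
pat_2d = PCA(n_components=2).fit_transform(patterns)
print(raw_2d.shape, pat_2d.shape)      # (200, 2) (200, 2)
```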

In this subsection, we evaluate spatial correlation among clients using GAT, where the closeness of spatial relation is assessed by directly calculating attention coefficients over temporal patterns via the attention mechanism. The advantage is that the attention coefficients ultimately express the relation among raw data, thus adapting to the dynamic changes of spatial correlation aroused by traffic fluctuation.

Specifically, after receiving temporal patterns from clients, the central server first calculates the attention scores between two adjacent clients. The attention score between $s_{n}$ and $s_{m}$ at the $t$-th round is denoted as $\xi_{m,n}^{t}$ $(\xi_{m,n}^{t}\in\mathbb{R})$, which can be calculated by

$\xi_{m,n}^{t}=a([h_{t,n}\,||\,h_{t,m}]),\ \forall s_{m}\in\mathcal{A}_{n}$, (8)

where $[\cdot||\cdot]$ represents the concatenation operation and $a(\cdot)$ is a projection function.

Furthermore, we denote the attention coefficient between $s_{m}$ and $s_{n}$ at the $t$-th round as $\alpha_{m,n}^{t}$, calculated by

$\alpha_{m,n}^{t}=\frac{\exp(\mathrm{LeakyReLU}(\xi_{m,n}^{t}))}{\sum_{s_{k}\in\mathcal{A}_{n}}\exp(\mathrm{LeakyReLU}(\xi_{k,n}^{t}))}$. (9)

It is remarkable that $\alpha_{m,n}^{t}$ indicates the correlation strength between $s_{m}$ and $s_{n}$: the larger $\alpha_{m,n}^{t}$ is, the more related $s_{m}$ is to $s_{n}$.

Finally, the central server updates $h_{t,n}$ according to

$h^{\prime}_{t,n}=\mathrm{sigmoid}\left(\sum_{s_{m}\in\mathcal{A}_{n}}\alpha_{m,n}^{t}h_{t,m}\right)$, (10)

where $h^{\prime}_{t,n}$ denotes the updated version of $h_{t,n}$. It is worth noting that $h^{\prime}_{t,n}$ contains not only the temporal pattern inside $X_{t,n}^{T}$, but also the temporal patterns inside speed sequences from adjacent clients. Then, the central server transmits $h^{\prime}_{t,n}$ back to $s_{n}$ for generating predicted values.
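The server-side attention step of (8)-(10) can be sketched as follows (assuming PyTorch; a single attention head, with the projection $a(\cdot)$ realized as a learnable linear layer, which is one common GAT choice rather than a detail fixed by the paper):

```python
import torch

N, hidden = 4, 64
h = torch.randn(N, hidden)                  # temporal patterns h_{t,n}
adj = torch.eye(N, dtype=torch.bool)        # A_n includes s_n itself
adj[0, 1] = adj[1, 2] = adj[2, 3] = adj[3, 0] = True  # toy edges e_{m,n}

a = torch.nn.Linear(2 * hidden, 1)          # projection a(.) over [h_n || h_m]

h_new = torch.empty_like(h)
for n in range(N):
    nbrs = adj[:, n].nonzero(as_tuple=True)[0]    # all s_m in A_n
    pair = torch.cat([h[n].expand(len(nbrs), -1), h[nbrs]], dim=1)
    xi = a(pair).squeeze(1)                                           # (8)
    alpha = torch.softmax(torch.nn.functional.leaky_relu(xi), dim=0)  # (9)
    h_new[n] = torch.sigmoid((alpha.unsqueeze(1) * h[nbrs]).sum(0))   # (10)
```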

The decoder of $s_{n}$ takes $h^{\prime}_{t,n}$ as input and outputs the predicted results. $w_{t}^{(d)}$ denotes the decoder parameters and is also included in $w_{t}$. Each decoder is also a GRU whose execution follows (4)-(7), with an appended fully-connected layer that produces the final prediction $\hat{X}_{t,n}^{F}$.

IV-B Prediction Model Update

In this subsection, we illustrate the process of prediction model update, which includes the incremental optimization of local models at clients and the period-aware model aggregation mechanism at the server.

IV-B1 Incremental Local Optimization at Clients

At the $t$-th round, $s_{n}$ outputs the predicted result $\hat{X}_{t,n}^{F}$, followed by calculating the prediction loss based on (2). Then, $s_{n}$ executes the backpropagation process to compute the current gradients of the encoder and decoder. At the $e$-th local epoch, the model parameters of $s_{n}$ are optimized using OGD as follows [25]:

$g_{t,n}^{(e)}=\nabla l(X_{t,n}^{F},\hat{X}_{t,n}^{F};w_{t,n}^{(e-1)})$, (11)
$w_{t,n}^{(e)}=w_{t,n}^{(e-1)}-\eta g_{t,n}^{(e)}$, (12)

where $g_{t,n}^{(e)}$ and $w_{t,n}^{(e)}$ denote the gradient and the local model parameters of $s_{n}$ at the $e$-th epoch in the $t$-th global round, respectively. $\eta$ represents the learning rate and $w_{t,n}^{(0)}=w_{t}$. $s_{n}$ executes the above process for $E$ epochs, and then transmits the up-to-date local model parameters $w_{t+1,n}$ ($w_{t+1,n}=w_{t,n}^{(E)}$) to the central server for aggregation.
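Per (11)-(12), each local epoch is one gradient step on the newest sample; a minimal manual-OGD sketch (assuming PyTorch, with a toy model standing in for the encoder-decoder):

```python
import torch

model = torch.nn.Linear(12, 6)  # hypothetical stand-in for f(.)
eta, E = 1e-3, 5
x_hist, x_future = torch.randn(12), torch.randn(6)

for e in range(E):
    loss = torch.nn.functional.mse_loss(model(x_hist), x_future)
    model.zero_grad()
    loss.backward()             # g_{t,n}^{(e)} per (11)
    with torch.no_grad():
        for p in model.parameters():
            p -= eta * p.grad   # OGD step per (12)
```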

IV-B2 Period-Aware Aggregation Mechanism at the Server

In traditional FL methods, such as FedAvg [26], the server aggregates local model parameters and generates the fresh global model $w_{t+1}$ at the $t$-th round as

$w_{t+1}=\frac{1}{N}\sum_{n=1}^{N}w_{t+1,n}$. (13)

It is plain that $w_{t+1}$ is updated based on the traffic data of the $t$-th round. Since traffic sequences at different rounds have different distributions, this aggregation mechanism makes $w_{t+1}$ more applicable to the traffic data of the $t$-th round than $w_{t}$ is. Therefore, if the server could use $w_{t+1}$ as the initial model parameters at the beginning of the $t$-th round, the prediction accuracy would increase. However, this is infeasible, since $w_{t+1}$ is only generated at the end of the $t$-th round.

Refer to caption
Figure 3: Raw traffic flows of three clients in 5 days.

Furthermore, we discover that each traffic flow has the same built-in period. As shown in Fig. 3, each client exhibits a similar distribution every 288 time stamps (clients observe traffic speeds every 5 minutes, so one day is split into 288 time stamps), with a few exceptions. Let $\mathcal{T}$ denote the period of traffic flows. It is reasonable to deem that the local model parameters $w_{t,n}$ and the global model parameters $w_{t+1}$ are similar to $w_{t+\mathcal{T},n}$ and $w_{t+\mathcal{T}+1}$, respectively.
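The daily period can also be checked numerically; the sketch below (assuming NumPy, on a synthetic series with an injected daily cycle) recovers the dominant lag via autocorrelation:

```python
import numpy as np

steps_per_day = 288             # 5-minute samples per day
t = np.arange(5 * steps_per_day)
rng = np.random.default_rng(0)
speed = 60 + 10 * np.sin(2 * np.pi * t / steps_per_day) + rng.normal(size=t.size)

x = speed - speed.mean()
acf = np.correlate(x, x, mode="full")[x.size - 1:]  # autocorrelation, lag >= 0
lag = 144 + int(np.argmax(acf[144:432]))            # search around one day
print(lag)                                          # expected near 288
```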

Inspired by this, at the $t$-th round, we calculate the correlation coefficients between the previous local models $w_{t,n}$ $(\forall s_{n}\in\mathcal{S})$ and the fresh global model $w_{t+1}$ as

$\rho_{t,n}=\frac{\exp(-||w_{t,n}-w_{t+1}||)}{\sum_{m=1}^{N}\exp(-||w_{t,m}-w_{t+1}||)},\ \forall s_{n}\in\mathcal{S}$, (14)

where $\rho_{t,n}$ denotes the correlation coefficient between $w_{t,n}$ and $w_{t+1}$. If the server had performed weighted aggregation based on $\rho_{t,n}$ at the end of the $(t-1)$-th round as $\sum_{s_{n}\in\mathcal{S}}\rho_{t,n}w_{t,n}$, the resulting global model $w_{t}$ would fit the data sequences of the $t$-th round better than the original $w_{t}$ yielded by the averaging mechanism in (13). Furthermore, $\rho_{t,n}$ also approximates the similarity between $w_{t+\mathcal{T},n}$ and $w_{t+\mathcal{T}+1}$, and thus it is directly adopted as the aggregation weight of $s_{n}$ at the $(t+\mathcal{T}-1)$-th round.

Specifically, the execution process of the proposed period-aware aggregation mechanism is as follows.

  • If $t\leq\mathcal{T}$, the central server first generates the fresh global model $w_{t+1}$ as (13) and then calculates the correlation coefficients as (14).

  • If $t>\mathcal{T}$, the central server calculates the correlation coefficients as (14), after yielding $w_{t+1}$ as

    $w_{t+1}=\sum_{n=1}^{N}\rho_{t+1-\mathcal{T},n}\cdot w_{t+1,n}$. (15)

It is reasonable to deem that the averaging aggregation mechanism formulated in (13) is a special case of the weighted aggregation in (15), with $\rho_{t+1-\mathcal{T},n}=\frac{1}{N},\ \forall s_{n}\in\mathcal{S}$.

By this means, the generated fresh global model can better adapt to the subsequent traffic data, which yields more satisfactory prediction performance.
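A compact sketch of the weighting in (14)-(15) over flattened parameter vectors (assuming NumPy; representing each model as a plain vector is purely for illustration):

```python
import numpy as np

def corr_coeffs(local_ws, w_global):
    """rho_{t,n} per (14): softmax of negative parameter distances."""
    d = np.array([np.linalg.norm(w - w_global) for w in local_ws])
    e = np.exp(-d)
    return e / e.sum()

def aggregate(local_ws, rho=None):
    """Averaging per (13) if rho is None, else weighted aggregation per (15)."""
    if rho is None:
        return np.mean(local_ws, axis=0)
    return np.sum([r * w for r, w in zip(rho, local_ws)], axis=0)

local_ws = [np.random.rand(10) for _ in range(4)]
w_next = aggregate(local_ws)           # t <= T_period: plain averaging
rho = corr_coeffs(local_ws, w_next)    # stored, reused T_period rounds later
w_weighted = aggregate(local_ws, rho)  # t > T_period: period-aware weighting
```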

Input: initialized global model $w_{1}$, traffic speed sequences $[X_{t,n}^{T}, X_{t,n}^{F}]$, maximum global round $\mathcal{R}$, maximum local epoch $E$.
Output: The global model $w_{\mathcal{R}+1}$.

ServerExecute:
  for $t=1,2,\cdots,\mathcal{R}$ do
    for $s_{n}\in\mathcal{S}$ do
      $w_{t+1,n}\leftarrow$ ClientExecute($w_{t}$, $n$, $t$)
    if $t\leq\mathcal{T}$ then
      Generate $w_{t+1}$ via (13).
    else
      Generate $w_{t+1}$ via (15).
    Calculate $\rho_{t,n}$ via (14) for each $s_{n}\in\mathcal{S}$.
  return $w_{\mathcal{R}+1}$.

Function ClientExecute($w_{t}$, $n$, $t$):
  $[w_{t,n,e}^{(0)}, w_{t,n,d}^{(0)}]=w_{t,n}^{(0)}=w_{t}$.
  for $e=1,2,\cdots,E$ do
    $h_{t,n}\leftarrow$ ClientEncode($X_{t,n}^{T}$, $w_{t,n,e}^{(e)}$, $n$).
    if $e=1$ then
      Get $h^{\prime}_{t,n}$ via (8)-(10).
    $\hat{X}_{t,n}^{F}\leftarrow$ ClientDecode($h_{t,n}$, $h^{\prime}_{t,n}$, $w_{t,n,d}^{(e)}$, $n$).
    $w_{t,n}^{(e)}=[w_{t,n,e}^{(e)}, w_{t,n,d}^{(e)}]$.
    $loss\leftarrow l(X_{t,n}^{F},\hat{X}_{t,n}^{F};w_{t,n}^{(e)})$.
    Update $w_{t,n}^{(e)}$ via (11)-(12).
  $w_{t+1,n}=w_{t,n}^{(E)}=[w_{t,n,e}^{(E)}, w_{t,n,d}^{(E)}]$.
  return $w_{t+1,n}$.

Function ClientEncode($X_{t,n}^{T}$, $w_{t,n,e}^{(e)}$, $n$):
  With $X_{t,n}^{T}$ as input, get $h_{t,n}$ through (4)-(7).
  return $h_{t,n}$.

Function ClientDecode($h_{t,n}$, $h^{\prime}_{t,n}$, $w_{t,n,d}^{(e)}$, $n$):
  Get $\hat{X}_{t,n}^{F}$ from the GRU decoder.
  return $\hat{X}_{t,n}^{F}$.

Algorithm 2 FedOSTC

IV-C Execution Process of FedOSTC

Algorithm 2 elaborates the process of FedOSTC, where $w_{t,n,e}^{(e)}$ and $w_{t,n,d}^{(e)}$ denote the parameters of the encoder and decoder on $s_{n}$ at the $e$-th epoch in the $t$-th round, respectively. At each global round, the server transmits the current global model to clients. Encoders at clients take traffic sequences as input and output the temporal patterns of these sequences (ClientEncode). The server evaluates the spatial correlation among clients using GAT, and then updates the hidden states from clients via (8)-(10). When clients receive the updated hidden states from the server, they make predictions, calculate the prediction loss, and optimize the parameters of encoders and decoders via OGD, which is iterated for $E$ epochs (ClientExecute). Then clients transmit their updated model parameters to the central server for aggregation using the period-aware weighted aggregation mechanism (ServerExecute). The process continues until the maximum round $\mathcal{R}$ arrives.

IV-D Convergence Analysis of FedOSTC

In this subsection, the convergence of FedOSTC is analyzed theoretically. Based on the theory of Online Convex Optimization (OCO) [24], we obtain an upper bound on $REG$.

For simplicity, we rewrite $l(X_{t,n}^{F},\hat{X}_{t,n}^{F};w_{t})$ as $l(w_{t})$, since the loss and prediction results depend on the adopted model parameters. First, some assumptions are introduced.

Assumption 1.

The loss function $l(\cdot)$ is $L$-smooth, i.e., $l(x)-l(y)\leq\left\langle\nabla l(y),x-y\right\rangle+\frac{L}{2}||x-y||^{2},\ \forall x,y,\ \exists L>0$.

Assumption 2.

The gradient of the loss function $l(\cdot)$ is uniformly bounded, i.e., $\mathbb{E}(||\nabla l(\cdot)||^{2})\leq G^{2}$.

Assumption 3.

$REG$ is bounded below, i.e., $REG_{\min}=REG^{*}>-\infty$.

To facilitate the analysis of the convergence rate of FedOSTC, the period-aware weighted aggregation mechanism described in (15) is transformed.

Lemma 1.

The aggregation mechanism described in (15) can be reformulated as

$w_{t+1}=w_{t}-\eta\sum_{n=1}^{N}\sum_{e=1}^{E}\rho_{t,n}\cdot g_{t,n}^{(e)}$. (16)

Therefore, the dynamic weighting aggregation mechanism can be considered a relational expression between $w_{t}$ and $w_{t+1}$. The proof of Lemma 1 is elaborated as follows.

Proof.

With (11) and (12), we can get

$w_{t,n}^{(E)}=w_{t,n}^{(E-1)}-\eta g_{t,n}^{(E)}=w_{t,n}^{(E-2)}-\eta g_{t,n}^{(E-1)}-\eta g_{t,n}^{(E)}=\cdots=w_{t,n}^{(0)}-\eta\sum_{e=1}^{E}g_{t,n}^{(e)}=w_{t}-\eta\sum_{e=1}^{E}g_{t,n}^{(e)}$. (17)

Hence, we can complete the proof as

$w_{t+1}=\sum_{n=1}^{N}\rho_{t,n}w_{t+1,n}=\sum_{n=1}^{N}\rho_{t,n}\left(w_{t}-\eta\sum_{e=1}^{E}g_{t,n}^{(e)}\right)=\sum_{n=1}^{N}\rho_{t,n}w_{t}-\eta\sum_{n=1}^{N}\sum_{e=1}^{E}\rho_{t,n}g_{t,n}^{(e)}=w_{t}-\eta\sum_{n=1}^{N}\sum_{e=1}^{E}\rho_{t,n}g_{t,n}^{(e)}$, (18)

where the last equality uses the fact that the weights sum to one, i.e., $\sum_{n=1}^{N}\rho_{t,n}=1$.

Theorem 1.

With Assumptions 1-3 and Lemma 1, after $\mathcal{R}$ rounds, $REG$ in FedOSTC has an upper bound:

$\mathbb{E}[REG_{\min}]\leq\left(1+EN+\frac{L}{2}EN\right)\eta\mathcal{R}G^{2}$. (19)

Theorem 1 indicates that the minimum regret of FedOSTC has an upper bound. Furthermore, $REG_{\min}=o(\mathcal{R})$ indicates that FedOSTC performs as well as the optimal parameters $w^{*}$. Therefore, FedOSTC is efficient in tackling the issue of traffic flow forecasting. The proof of Theorem 1 is detailed below.

Proof.

Since $l$ is $L$-smooth, we can transform Assumption 1 as

$l(w_{t})-l(w_{t+1})\leq\left\langle\nabla l(w_{t+1}),w_{t}-w_{t+1}\right\rangle+\frac{L}{2}||w_{t}-w_{t+1}||^{2}$
$=\left\langle\nabla l(w_{t+1}),\eta\sum_{e=1}^{E}\sum_{n=1}^{N}\rho_{t,n}g_{t,n}^{(e)}\right\rangle+\frac{L}{2}\left|\left|\eta\sum_{e=1}^{E}\sum_{n=1}^{N}\rho_{t,n}g_{t,n}^{(e)}\right|\right|^{2}$
$\leq\eta\left|\left|\nabla l(w_{t+1})-\sum_{e=1}^{E}\sum_{n=1}^{N}\rho_{t,n}g_{t,n}^{(e)}\right|\right|^{2}+\frac{L}{2}\left|\left|\eta\sum_{e=1}^{E}\sum_{n=1}^{N}\rho_{t,n}g_{t,n}^{(e)}\right|\right|^{2}$
$\leq\eta||\nabla l(w_{t+1})||^{2}+\left(1+\frac{L}{2}\right)\eta\sum_{e=1}^{E}\sum_{n=1}^{N}||\rho_{t,n}g_{t,n}^{(e)}||^{2}$. (20)

With Assumption 2, the inequality above can be reformulated as

$\mathbb{E}[l(w_{t})-l(w_{t+1})]\leq\eta\mathbb{E}||\nabla l(w_{t+1})||^{2}+\left(1+\frac{L}{2}\right)\eta\sum_{e=1}^{E}\sum_{n=1}^{N}\mathbb{E}||\rho_{t,n}g_{t,n}^{(e)}||^{2}\leq\eta G^{2}+\left(1+\frac{L}{2}\right)\eta ENG^{2}$. (21)

With Assumption 3, the minimum regret can be bounded as

$\mathbb{E}[REG_{\min}]=\mathbb{E}\left\{\frac{1}{N}\sum_{n=1}^{N}\sum_{t=1}^{\mathcal{R}}[l(w_{t})-l(w^{*})]\right\}\leq\mathbb{E}\left\{\frac{1}{N}\sum_{n=1}^{N}\sum_{t=1}^{\mathcal{R}}[l(w_{t})-l(w_{t+1})]\right\}\leq\frac{1}{N}\sum_{n=1}^{N}\sum_{t=1}^{\mathcal{R}}\left(\eta G^{2}+\eta ENG^{2}+\frac{L}{2}\eta ENG^{2}\right)=\left(1+EN+\frac{L}{2}EN\right)\eta\mathcal{R}G^{2}$. (22)

V Experiments

TABLE II: Comparison in Prediction Performance of Six Methods on Two Datasets with Different Forecasting Steps

PEMS-BAY
Methods | 5min ($F=1$) RMSE / MAE | 30min ($F=6$) RMSE / MAE | 1h ($F=12$) RMSE / MAE
CenterOff | 1.898 / 1.019 | 3.602 / 1.736 | 4.693 / 2.255
CenterOn | 1.614 / 1.007 | 2.243 / 1.353 | 2.327 / 1.371
FedAvgOff | 1.802 / 1.023 | 3.717 / 1.784 | 5.253 / 2.406
FedAvgOn | 0.996 / 0.996 | 1.952 / 1.712 | 2.661 / 2.306
CNFGNN | 1.691 / 0.983 | 3.758 / 1.882 | 5.265 / 2.523
FedOSTC | 1.014 / 1.014 | 1.450 / 1.209 | 1.601 / 1.254
↑ | -0.108 / -0.018 | +0.502 / +0.503 | +0.726 / +0.117

METR-LA
Methods | 5min ($F=1$) RMSE / MAE | 30min ($F=6$) RMSE / MAE | 1h ($F=12$) RMSE / MAE
CenterOff | 5.796 / 3.248 | 7.975 / 4.392 | 9.346 / 5.184
CenterOn | 4.949 / 3.261 | 5.889 / 3.639 | 6.137 / 3.673
FedAvgOff | 5.843 / 3.321 | 8.163 / 4.445 | 9.522 / 5.223
FedAvgOn | 2.945 / 2.945 | 4.588 / 3.877 | 5.714 / 4.744
CNFGNN | 5.705 / 3.157 | 8.160 / 4.536 | 9.523 / 5.242
FedOSTC | 3.006 / 3.006 | 4.129 / 3.353 | 4.598 / 3.484
↑ | -0.061 / -0.061 | +0.459 / +0.286 | +1.116 / +0.189

In this section, comprehensive experiments are conducted to validate the effectiveness of our proposed method FedOSTC. Firstly, the system configuration is introduced. Then, the prediction performance of FedOSTC is compared with that of five baselines. Furthermore, we compare the generalization ability of the global models in the FL-based methods. Finally, additional experiments are conducted to explore the effect of local epoch on prediction performance.

V-A System Configuration

V-A1 Datasets and Metrics

The experiments are conducted on two real-world datasets, PEMS-BAY and METR-LA [8]. PEMS-BAY contains traffic speeds from 01/01/2017 to 31/05/2017 collected by 325 sensors. METR-LA contains speed information from 207 sensors ranging from 01/03/2012 to 30/06/2012. In both datasets, sensors observe traffic speed every 5 minutes. We randomly select 50 sensors as clients and utilize the speed data from Sunday to Thursday in May for experiments. The adjacency matrices of clients for the two datasets are constructed as in [16]. Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) are adopted to evaluate the prediction error.
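For reference, the two metrics over an $F$-step horizon can be computed as below (a minimal NumPy sketch with synthetic predictions):

```python
import numpy as np

y_true = np.random.rand(100, 12)  # ground-truth speeds, (samples, F)
y_pred = np.random.rand(100, 12)  # predicted speeds

rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
mae = np.mean(np.abs(y_pred - y_true))
print(f"RMSE={rmse:.3f}, MAE={mae:.3f}")
```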

V-A2 Experiment Setting

FedOSTC is implemented with PyTorch and all experiments are conducted on a server with an Intel(R) Xeon(R) Gold 6230 CPU and eight Nvidia Tesla V100S-PCIe GPUs. In all experiments, the encoder and decoder are GRU layers with 64 and 128 cells respectively. $T$, $E$, and $\eta$ are set to 12, 5, and 0.001 respectively. The settings of the other methods are the same as in the respective literature, unless mentioned otherwise.

V-B Comparisons of Forecasting Performance

Refer to caption
Figure 4: Prediction RMSEs of clients.

V-B1 Baseline Methods

We compare FedOSTC with five baseline methods described as follows.

  • CenterOff [27]: The central server trains the prediction model on traffic data uploaded from traffic nodes in a batch learning manner. Spatial correlation among traffic nodes is not evaluated.

  • CenterOn: The execution process is the same as CenterOff, except that the central server updates the prediction model online.

  • FedAvgOff [26]: In the FedAvg method, clients update the prediction model in a batch learning manner and the server aggregates the updated local models via the averaging mechanism. Spatial correlation among clients is not evaluated.

  • FedAvgOn: It is the same as FedAvgOff, except that clients update the prediction model in OL manner.

  • CNFGNN [16]: The execution process is like that of FedAvgOff, except that the server evaluates spatial dependence using GNN.

The prediction errors of FedOSTC and the baselines on the two datasets with different forecasting steps are presented in Table II. Therein, '↑' denotes the performance gain of FedOSTC over the best-performing baseline.

If the forecasting step is shorter ($F$ is smaller), the speed sequence to be forecast ($X_{t,n}^{F}$) is more likely to have a distribution similar to that of the historical speed sequence ($X_{t,n}^{T}$). This is why prediction errors increase with longer forecasting steps. When $F=1$, the performance gains of FedOSTC are negative, but their absolute values are small enough to be ignored. However, when $F=6$ and $F=12$, FedOSTC performs best and generates considerable performance gains.

In terms of both RMSE and MAE, CenterOn and FedAvgOn outperform CenterOff and FedAvgOff respectively, which indicates the high efficiency of OL manner in the task of traffic flow forecasting. This is because the prediction model is incrementally optimized on fine-grained speed data, which sufficiently captures the dynamic changes in speed data.

Among the three methods adopting batch learning manner, CenterOff achieves the best prediction performance on both datasets. This shows that the centralized method (CenterOff) performs better in evaluating spatio-temporal correlation among clients than the FL methods (FedAvgOff and CNFGNN). However, our proposed FedOSTC performs best among the three methods adopting OL manner, which indicates the effectiveness of the spatio-temporal correlation evaluation mechanism and the period-aware weighted aggregation mechanism.

To further evaluate the high efficiency of FedOSTC in traffic flow forecasting, ground-truth values and prediction results of FedOSTC and CNFGNN are illustrated in Fig. 5. Clearly, FedOSTC yields a much smaller deviation than CNFGNN, the best existing FL method. Furthermore, the distribution of prediction values from FedOSTC is similar to that of the ground-truth values.

Refer to caption
Figure 5: Ground truth and forecasting values of CNFGNN and FedOSTC.
Refer to caption
Figure 6: Local epoch versus prediction errors.

V-C Comparisons of Generalization Ability

In the scenario of traffic flow forecasting, the traffic speeds observed by different clients have different distributions, while in the FL framework all clients cooperatively train the same prediction model. We hope the global prediction model from the central server has good generalization ability so that all clients can achieve satisfactory performance. Therefore, in this subsection, we compare the model generalization ability of the four FL-based methods. The prediction RMSEs of clients on the two datasets with different forecasting steps are illustrated in Fig. 4. By comparing FedAvgOff and CNFGNN with FedAvgOn and FedOSTC respectively, we can observe that the RMSE variance of OL manner is much smaller than that of batch learning manner, which indicates that OL manner can yield a global model with better generalization ability in FL. Furthermore, the prediction RMSE of FedOSTC is lower than that of FedAvgOn for almost all clients, and the RMSE variance over all clients in FedOSTC is clearly lower than that in FedAvgOn. Therefore, by adopting the proposed spatio-temporal correlation evaluation mechanism and period-aware weighted aggregation mechanism, FedOSTC increases the generalization ability of the global model.

V-D Effect of E on Prediction Performance

In the scenario of traffic flow forecasting, clients make prediction and perform incremental optimization based on only a single speed sequence. This is quite different from methods adopting batch learning manner, where the local batch size is set to 128. In this subsection, we explore the effect of the local epoch on prediction performance and further consider whether 5 epochs are redundant for incremental optimization. The prediction RMSEs and MAEs on the two datasets with different settings of $E$ are illustrated in Fig. 6.

We observe that different settings of $E$ have different effects on prediction errors. On both datasets, prediction RMSEs and MAEs decrease with more local epochs, which indicates that the internal patterns of speed sequences can be captured better with more local optimization steps. On the other hand, a larger local epoch means clients have to devote more computation resources to local optimization, which imposes a heavy burden on resource-limited clients. Therefore, in future research, we will explore the balance between local epoch and prediction performance.

VI Conclusion

In this paper, we proposed a novel method named FedOSTC for traffic flow forecasting. To boost prediction performance, we dynamically assessed spatio-temporal correlation among clients. Specifically, the temporal patterns inside traffic data sequences are obtained at clients. Since spatial correlation among clients varies with traffic fluctuation, the spatial dependence of the observed traffic data is assessed dynamically via GAT. Given the periodic changes of traffic flows, we proposed a period-aware aggregation mechanism to aggregate the online-updated local models, aiming to improve the generalization of the fresh global model for the subsequent traffic data. Last but not least, we conducted extensive experiments on METR-LA and PEMS-BAY datasets to verify the effectiveness of OL manner in traffic flow forecasting and the superiority of FedOSTC, compared with state-of-the-art methods.

Given the observation that larger local epochs yield higher prediction accuracy but consume more computation resources, we will explore the balance between computation overhead and prediction performance in future research, and refine FedOSTC in terms of reducing communication overhead, decreasing forecasting time, etc.

References

  • [1] C. Lin, G. Han, J. Du, T. Xu, L. Shu, and Z. Lv, “Spatiotemporal congestion-aware path planning toward intelligent transportation systems in software-defined smart city iot,” IEEE Internet of Things Journal, vol. 7, no. 9, pp. 8012–8024, 2020.
  • [2] F. Zhu, Y. Lv, Y. Chen, X. Wang, G. Xiong, and F.-Y. Wang, “Parallel transportation systems: Toward iot-enabled smart urban traffic control and management,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 10, pp. 4063–4071, 2019.
  • [3] L. Liu, J. Zhen, G. Li, G. Zhan, Z. He, B. Du, and L. Lin, “Dynamic spatial-temporal representation learning for traffic flow prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 11, pp. 7169–7183, 2020.
  • [4] C. Chen, B. Liu, S. Wan, P. Qiao, and Q. Pei, “An edge traffic flow detection scheme based on deep learning in an intelligent transportation system,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 3, pp. 1840–1852, 2020.
  • [5] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, “Traffic flow prediction with big data: a deep learning approach,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 865–873, 2014.
  • [6] X. Tan, Y. Xie, H. Ma, S. Yu, and J. Hu, “Recognizing the content types of network traffic based on a hybrid dnn-hmm model,” Journal of Network and Computer Applications, vol. 142, pp. 51–62, 2019.
  • [7] Z. Zhao, W. Chen, X. Wu, P. C. Chen, and J. Liu, “Lstm network: a deep learning approach for short-term traffic forecast,” IET Intelligent Transport Systems, vol. 11, no. 2, pp. 68–75, 2017.
  • [8] Y. Li, R. Yu, C. Shahabi, and Y. Liu, “Diffusion convolutional recurrent neural network: Data-driven traffic forecasting,” arXiv preprint arXiv:1707.01926, 2017.
  • [9] D. Zhang and M. R. Kabuka, “Combining weather condition data to predict traffic flow: a gru-based deep learning approach,” IET Intelligent Transport Systems, vol. 12, no. 7, pp. 578–585, 2018.
  • [10] Z. Li, C. Lu, Y. Yi, and J. Gong, “A hierarchical framework for interactive behaviour prediction of heterogeneous traffic participants based on graph neural network,” IEEE Transactions on Intelligent Transportation Systems, 2021.
  • [11] T. Wu, F. Chen, and Y. Wan, “Graph attention lstm network: A new model for traffic flow forecasting,” in 2018 5th International Conference on Information Science and Control Engineering (ICISCE).   IEEE, 2018, pp. 241–245.
  • [12] Q. Yang, Y. Liu, T. Chen, and Y. Tong, “Federated machine learning: Concept and applications,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 10, no. 2, pp. 1–19, 2019.
  • [13] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated learning: Challenges, methods, and future directions,” IEEE Signal Processing Magazine, vol. 37, no. 3, pp. 50–60, 2020.
  • [14] Y. Liu, J. James, J. Kang, D. Niyato, and S. Zhang, “Privacy-preserving traffic flow prediction: A federated learning approach,” IEEE Internet of Things Journal, vol. 7, no. 8, pp. 7751–7763, 2020.
  • [15] C. Zhang, S. Zhang, J. James, and S. Yu, “Fastgnn: A topological information protected federated learning approach for traffic speed forecasting,” IEEE Transactions on Industrial Informatics, vol. 17, no. 12, pp. 8464–8474, 2021.
  • [16] C. Meng, S. Rambhatla, and Y. Liu, “Cross-node federated graph neural network for spatio-temporal data modeling,” in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 1202–1211.
  • [17] S. Lee and D. B. Fambro, “Application of subset autoregressive integrated moving average model for short-term freeway traffic volume forecasting,” Transportation research record, vol. 1678, no. 1, pp. 179–188, 1999.
  • [18] B. M. Williams and L. A. Hoel, “Modeling and forecasting vehicular traffic flow as a seasonal arima process: Theoretical basis and empirical results,” Journal of transportation engineering, vol. 129, no. 6, pp. 664–672, 2003.
  • [19] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, “A comprehensive survey on graph neural networks,” IEEE transactions on neural networks and learning systems, vol. 32, no. 1, pp. 4–24, 2020.
  • [20] D. Ma, X. Song, and P. Li, “Daily traffic flow forecasting through a contextual convolutional recurrent neural network modeling inter-and intra-day traffic patterns,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 5, pp. 2627–2636, 2020.
  • [21] S. Guo, Y. Lin, N. Feng, C. Song, and H. Wan, “Attention based spatial-temporal graph convolutional networks for traffic flow forecasting,” in Proceedings of the AAAI conference on artificial intelligence, vol. 33, no. 01, 2019, pp. 922–929.
  • [22] H. Zhu, Y. Xie, W. He, C. Sun, K. Zhu, G. Zhou, and N. Ma, “A novel traffic flow forecasting method based on rnn-gcn and brb,” Journal of Advanced Transportation, vol. 2020, 2020.
  • [23] Y. Qi, M. S. Hossain, J. Nie, and X. Li, “Privacy-preserving blockchain-based federated learning for traffic flow prediction,” Future Generation Computer Systems, vol. 117, pp. 328–337, 2021.
  • [24] S. Shalev-Shwartz et al., “Online learning and online convex optimization,” Foundations and trends in Machine Learning, vol. 4, no. 2, pp. 107–194, 2011.
  • [25] S. C. Hoi, D. Sahoo, J. Lu, and P. Zhao, “Online learning: A comprehensive survey,” Neurocomputing, vol. 459, pp. 249–289, 2021.
  • [26] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial intelligence and statistics.   PMLR, 2017, pp. 1273–1282.
  • [27] K. Cho, B. van Merrienboer, Ç. Gülçehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using rnn encoder-decoder for statistical machine translation,” in EMNLP, 2014.
Qingxiang Liu is currently pursuing the Ph.D degree in University of Chinese Academy of Sciences, Beijing, China, with the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. He has authored or coauthored several papers at IEEE Transactions on Computational Social Systems, Future Generation Computer Systems and so on. His current research interests include federated learning and edge computing.
Sheng Sun received her B.S. and Ph.D degrees in computer science from Beihang University, China, and the University of Chinese Academy of Sciences, China, respectively. She is currently an assistant professor at the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. Her current research interests include federated learning, mobile computing and edge intelligence.
Min Liu (Senior Member, IEEE) received her B.S. and M.S. degrees in computer science from Xi'an Jiaotong University, China, in 1999 and 2002, respectively. She got her Ph.D in computer science from the Graduate University of the Chinese Academy of Sciences in 2008. She is currently a professor at the Networking Technology Research Centre, Institute of Computing Technology, Chinese Academy of Sciences. Her current research interests include mobile computing and edge intelligence.
Yuwei Wang received his Ph.D. degree in computer science from the University of Chinese Academy of Sciences, Beijing, China. He is currently a Senior Engineer (equivalent to Associate Professor) at the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. His current research interests include federated learning, mobile edge computing, and next-generation network architecture.
Bo Gao received the Ph.D. degree in computer engineering from Virginia Tech, Blacksburg, VA, USA, in 2014. He was an Assistant Professor with the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, from 2014 to 2017. He was a Visiting Researcher with the School of Computing and Communications, Lancaster University, Lancaster, UK, from 2018 to 2019. He is currently an Associate Professor with the School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China. His research interests include wireless networking, dynamic spectrum sharing, mobile edge computing and multi-agent systems.