
Mobility-Aware Cooperative Caching in Vehicular Edge Computing Based on Asynchronous Federated and Deep Reinforcement Learning

Qiong Wu, Yu Zhao, Qiang Fan, Pingyi Fan,
Jiangzhou Wang, Cui Zhang
This work was supported in part by the National Natural Science Foundation of China under Grant No. 61701197, in part by the open research fund of State Key Laboratory of Integrated Services Networks under Grant No. ISN23-11, in part by the 111 Project under Grant No. B12018. (Corresponding author: Qiong Wu) Qiong Wu and Yu Zhao are with School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China, and also with the State Key Laboratory of Integrated Services Networks (Xidian University), Xi’an 710071, China (Email: [email protected], [email protected]). Qiang Fan is with Qualcomm, San Jose CA 95110 USA (Email: [email protected]). Pingyi Fan is with the Department of Electronic Engineering, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China (Email: [email protected]). Jiangzhou Wang is with the School of Engineering, University of Kent, CT2 7NT Canterbury, U.K. (Email: [email protected]). Cui Zhang is with Banma Network Technology Co., Ltd., Shanghai 200000, China (Email: [email protected]).
Abstract

Vehicular edge computing (VEC) can cache contents in different RSUs at the network edge to support real-time vehicular applications. In VEC, owing to the high mobility of vehicles, it is necessary to cache the user data in advance and learn the most popular and interesting contents for vehicular users. Since user data usually contains private information, users are reluctant to share their data with others. To address this problem, traditional federated learning (FL) protects users’ privacy by updating the global model synchronously through aggregating all users’ local models. However, vehicles may frequently drive out of the coverage area of the VEC before they complete their local model training, so the local models cannot be uploaded as expected, which reduces the accuracy of the global model. In addition, the caching capacity of the local RSU is limited and the popular contents are diverse, so the size of the predicted popular contents usually exceeds the cache capacity of the local RSU. Hence, the VEC should cache the predicted popular contents in different RSUs while considering the content transmission delay. In this paper, we consider the mobility of vehicles and propose a cooperative Caching scheme in the VEC based on Asynchronous Federated and deep Reinforcement learning (CAFR). We first propose a mobility-aware asynchronous FL algorithm to obtain an accurate global model, and then propose an algorithm to predict the popular contents based on the global model. In addition, we propose a mobility-aware deep reinforcement learning algorithm to obtain the optimal cooperative caching location for the predicted popular contents in order to optimize the content transmission delay. Extensive experimental results demonstrate that the CAFR scheme outperforms other baseline caching schemes.

Index Terms:
cooperative caching, VEC, asynchronous federated learning, deep reinforcement learning

I Introduction

With the development of the internet of vehicles (IoV) and cloud computing, caching technology facilitates various real-time vehicular applications for vehicular users (VUs), such as automatic navigation, pattern recognition and multimedia entertainment [1] [2]. In the standard caching approach, the cloud caches various contents such as data, video and web pages. In this scheme, vehicles send requests for the required contents to a macro base station (MBS) connected to a cloud server and fetch the contents from the MBS, which causes a high content transmission delay from the MBS to vehicles due to the communication congestion caused by frequent content requests from vehicles [3]. The content transmission delay can be effectively reduced by vehicular edge computing (VEC), which caches contents in the road side units (RSUs) deployed at the edge of vehicular networks (VNs) [4]. Thus, vehicles can fetch contents directly from the local RSU, which reduces the content transmission delay. In the VEC, since the caching capacity of the local RSU is limited, if some vehicles cannot fetch their required contents from it, a neighboring RSU that has the required contents can forward them to the local RSU. In the worst case, vehicles need to fetch contents from the MBS because neither the local nor the neighboring RSU has cached the requested contents.

In the VEC, it is critical to design a caching scheme to cache the popular contents. Traditional caching schemes cache contents based on the previously requested contents [5]. However, owing to the high mobility of vehicles in VEC, the previously requested contents may become outdated quickly, so traditional caching schemes may not satisfy all the VUs’ requirements. Therefore, it is necessary to predict the most popular contents in the VEC and cache them in suitable RSUs in advance. Machine learning (ML), as a new tool, can extract hidden features by training on user data to efficiently predict popular contents [6]. However, user data usually contains private information and users are reluctant to share their data directly with others, which makes it difficult to collect users’ data and train on it. Federated learning (FL) can protect the privacy of users by sharing their local models instead of their data [7]. In traditional FL, the global model is periodically updated by aggregating all vehicles’ local models [8]-[10]. However, vehicles may frequently drive out of the coverage area of the VEC before they update their local models, so the local models cannot be uploaded in the same area, which reduces the accuracy of the global model as well as the probability of obtaining the predicted popular contents. This motivates us to consider the mobility of vehicles and propose an asynchronous FL algorithm to accurately predict popular contents in VEC.

Generally, the predicted popular contents should be cached in the local RSU of the vehicles to guarantee a low content transmission delay. However, the caching capacity of each local RSU is limited and the popular contents may be diverse, so the size of the predicted popular contents usually exceeds the cache capacity of the local RSU. Hence, the VEC has to determine where the predicted popular contents are cached and updated. The content transmission delay is an important metric for supporting real-time vehicular applications. Which popular contents are cached in the local and neighboring RSUs affects the way vehicles fetch contents, and thus the content transmission delay. In addition, the content transmission delay of each vehicle is impacted by its channel condition, which is affected by vehicle mobility. Therefore, it is necessary to consider the mobility of vehicles when designing a cooperative caching scheme in which the predicted popular contents are cached among RSUs to optimize the content transmission delay. In contrast to some conventional decision algorithms, deep reinforcement learning (DRL) is a favorable tool for constructing the decision-making framework and optimizing the cooperative caching of contents in complex vehicular environments [11]. Therefore, we employ DRL to determine the optimal cooperative caching to reduce the content transmission delay of vehicles.

In this paper, we consider the vehicle mobility and propose a cooperative Caching scheme in VEC based on Asynchronous Federated and deep Reinforcement learning (CAFR). The main contributions of this paper are summarized as follows.

  • 1)

    By considering the mobility characteristics of vehicles including the positions and velocities, we propose an asynchronous FL algorithm to improve the accuracy of the global model.

  • 2)

    We propose an algorithm to predict the popular contents based on the global model, where each vehicle adopts the autoencoder (AE) to predict the contents it is interested in based on the global model, while the local RSU collects the interested contents of all vehicles within the coverage area to capture the popular contents.

  • 3)

    We design a DRL framework (dueling deep Q-network (DQN)) to formulate the cooperative caching problem, in which the state, action and reward function are defined. Then the local RSU can determine the optimal cooperative caching to minimize the content transmission delay based on the dueling DQN algorithm.

The rest of the paper is organized as follows. Section II reviews the related works on content caching in VNs. Section III briefly describes the system model. Section IV proposes a mobility-aware cooperative caching scheme in the VEC based on asynchronous federated and deep reinforcement learning. We present some simulation results in Section V, and then conclude the paper in Section VI.

II Related Work

In this section, we first review the existing works related to content caching in vehicular networks (VNs), and then survey the current state of the art of cooperative content caching schemes in VEC.

In [12], Dai et al. proposed a distributed content caching framework with empowering blockchain to achieve security and protect privacy, and considered the mobility of vehicles to design an intelligent content caching scheme based on DRL framework. In [13], Yu et al. proposed a mobility-aware proactive edge caching scheme in VNs that allows multiple vehicles with private data to collaboratively train a global model for predicting content popularity, in order to meet the requirements for computationally intensive and latency-sensitive vehicular applications. In [14], Zhao et al. optimized the edge caching and computation management for service caching, and adopted Lyapunov optimization to deal with the dynamic and unpredictable challenges in VNs. In [15], Jiang et al. constructed a two-tier secure access control structure for providing content caching in VNs with the assistance of edge devices, and proposed the group signature-based scheme for the purpose of anonymous authentication. In [16], Tang et al. proposed a new optimization method to reduce the average response time of caching in VNs, and then adopted Lyapunov optimization technology to constrain the long-term energy consumption to guarantee the stability of response time. In [17], Dai et al. proposed a VN with digital twin to cache contents for adaptive network management and policy arrangement, and designed an offloading scheme based on the DRL framework to minimize the total offloading delay. However, the above content caching schemes in VNs did not take into account the cooperative caching in the VEC environment.

There are some works considering cooperative content caching schemes in VEC. In [18], Qiao et al. proposed a cooperative edge caching scheme in VEC and constructed the double time-scale markov decision process to minimize the content access cost, and employed the deep deterministic policy gradient (DDPG) method to solve the long-term mixed-integer linear programming problems. In [19], Chen et al. proposed a cooperative edge caching scheme in VEC which considered the location-based contents and the popular contents, while designing an optimal scheme for cooperative content placement based on an ant colony algorithm to minimize the total transmission delay and cost. In [20], Yao et al. designed a cooperative edge caching scheme with consistent hash and mobility prediction in VEC to predict the path of each vehicle, and also proposed a cache replacement policy based on the content popularity to decide the priorities of collaborative contents. In [21], Wang et al. proposed a cooperative edge caching scheme in VEC based on the long short-term memory (LSTM) networks, which caches the predicted contents in RSUs or other vehicles and thus reduces the content transmission delay. In [22], Gupta et al. proposed a cooperative caching scheme that jointly considers cache location, content popularity and predicted rating of contents to make caching decision based on the non-negative matrix factorization, where it employs a legitimate user authorization to ensure the secure delivery of cached contents. In [23], Yao et al. proposed a cooperative caching scheme based on the mobility prediction and drivers’ social similarities in VEC, where the regularity of vehicles’ movement behaviors are predicted based on the hidden markov model to improve the caching performance. In [24], Wu et al. 
proposed a hybrid service provisioning framework and cooperative caching scheme in VEC to solve the profit allocation problem among the content providers (CPs), and proposed an optimization model to improve the caching performance in managing the caching resources. In [25], Yao et al. proposed a cooperative caching scheme based on mobility prediction, where the popular contents may be cached in the mobile vehicles within the coverage area of hot spot. They also designed a cache replacement scheme according to the content popularity to solve the limited caching capacity problem for each edge cache device. In [26], Zhang et al. proposed a cooperative edge caching architecture that focuses on the mobility-aware caching, where the vehicles cache the contents with base stations collaboratively. They also introduced a vehicle-aided edge caching scheme to improve the capability of edge caching. In [27], Liu et al. designed a cooperative caching scheme that allows vehicles to search the unrequested contents. This scheme facilitates the content sharing among vehicles and improves the service performance. In [28], Wang et al. proposed a VEC caching scheme to reduce the total transmission delay. This scheme extends the capability of the data center from the core network to the edge nodes by cooperatively caching popular contents in different CPs. It minimizes the VUs’ average delay according to an iterative ascending price method. In [29], Liu et al. proposed a real-time caching scheme in which edge devices cooperate to improve the caching resource utilization. In addition, they adopted the DRL framework to optimize the problem of searching requests and utility models to guarantee the search efficiency. In [30], Ko et al. proposed an adaptive scheduling scheme consisting of the centralized scheduling mechanism, ad hoc scheduling mechanism and cluster management mechanism to exploit the ad hoc data sharing among different RSUs. In [31], Cui et al. 
proposed a privacy-preserving data downloading method in VEC, where the RSUs can find popular contents by analyzing encrypted requests of nearby vehicles to improve the downloading efficiency of the network. In [32], Luo et al. designed a communication, computation and cooperative caching framework, where computing-enabled RSUs provide computation and bandwidth resource to the VUs to minimize the data processing cost in VEC.

As mentioned above, no existing work has considered the vehicle mobility and the privacy of VUs simultaneously to design cooperative caching schemes in VEC, which motivates us to propose a mobility-aware cooperative caching scheme in VEC based on asynchronous FL and DRL.

Figure 1: VEC scenario

III System Model

III-A System Scenario

As shown in Fig. 1, we consider a three-tier VEC in an urban scenario that consists of a local RSU, a neighboring RSU, an MBS attached to a cloud and some vehicles moving in the coverage area of the local RSU. The top tier is the MBS deployed at the center of the VEC, while the middle tier is the RSUs deployed in the coverage area of the MBS. The RSUs are placed on one side of the road. The bottom tier is the vehicles driving within the coverage area of the RSUs.

Each vehicle stores a large amount of VUs’ historical data, i.e., local data. Each data record is a vector reflecting different information about a VU, including the VU’s personal information such as identity (ID) number, gender, age and postcode, the contents that the VU may request, and the VU’s ratings for the contents, where a larger rating for a content indicates that the VU is more interested in the content. In particular, the rating for a content may be 0, which means that the content is not popular or has not been requested by VUs. Each vehicle randomly chooses a part of the local data to form a training set, while the rest is used as a testing set. The time duration of vehicles within the coverage area of the MBS is divided into rounds. At the beginning of each round, each vehicle randomly selects contents from its testing set as the requested contents and sends the request information to the local RSU to fetch the contents. We assume that the MBS has abundant storage capacity and caches all available contents, while the limited storage capacity of each RSU can accommodate only part of the contents. Therefore, the vehicle fetches each requested content from the local RSU, the neighboring RSU or the MBS under different conditions. Specifically,

III-A1 Local RSU

If a requested content is cached in the local RSU, the local RSU sends back the requested content to the vehicle. In this case the vehicle fetches the content from the local RSU.

III-A2 Neighboring RSU

If a requested content is not cached in the local RSU, the local RSU transfers the request to the neighboring RSU, and the neighboring RSU sends the content to the local RSU if it caches the requested content. Afterward, the local RSU sends back the content to the vehicle. In this case the vehicle fetches the content from the neighboring RSU.

III-A3 MBS

If a content is neither cached in the local RSU nor the neighboring RSU, the vehicle sends the request to the MBS that directly sends back the requested content to the vehicle. In this case, the VU fetches the content from the MBS.
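The three fetching cases above amount to a simple lookup order. The following is a minimal sketch of that order, assuming each RSU cache can be represented as a set of content IDs; the names `Source` and `fetch_source` are hypothetical and are not taken from the paper.

```python
from enum import Enum


class Source(Enum):
    """Where a requested content is finally fetched from (hypothetical labels)."""
    LOCAL_RSU = "local RSU"
    NEIGHBORING_RSU = "neighboring RSU"
    MBS = "MBS"


def fetch_source(content_id, local_cache, neighbor_cache):
    """Return the tier that serves the request.

    Assumption: caches are modeled as sets of content IDs, and the MBS
    caches every available content, so it is the fallback.
    """
    if content_id in local_cache:
        return Source.LOCAL_RSU
    if content_id in neighbor_cache:
        # The neighboring RSU forwards the content via the local RSU.
        return Source.NEIGHBORING_RSU
    return Source.MBS
```

For example, with `local_cache = {1, 2}` and `neighbor_cache = {3}`, a request for content 3 is served by the neighboring RSU, while a request for content 4 falls back to the MBS.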

III-B Mobility Model of Vehicles

The model assumes that all vehicles drive in the same direction and that vehicles arrive at the local RSU following a Poisson distribution with arrival rate $\lambda_{v}$. Once a vehicle enters the coverage of the local RSU, it sends request information to the local RSU. Each vehicle keeps the same mobility characteristics, including position and velocity, within a round and may change its mobility characteristics at the beginning of each round. The velocities of different vehicles are independent and identically distributed. The velocity of each vehicle is generated by a truncated Gaussian distribution, which is flexible and consistent with the real dynamic vehicular environment. For round $r$, the number of vehicles driving in the coverage area of the local RSU is $N^{r}$. The set of $N^{r}$ vehicles is denoted as $\mathbb{V}^{r}=\left\{V_{1}^{r},V_{2}^{r},\ldots,V_{i}^{r},\ldots,V_{N^{r}}^{r}\right\}$, where $V_{i}^{r}$ is vehicle $i$ driving in the local RSU $(1\leq i\leq N^{r})$. Let $\left\{U_{1}^{r},U_{2}^{r},\ldots,U_{i}^{r},\ldots,U_{N^{r}}^{r}\right\}$ be the velocities of all vehicles driving in the local RSU, where $U_{i}^{r}$ is the velocity of $V_{i}^{r}$. According to [33], the probability density function of $U_{i}^{r}$ is expressed as

$$f(U_{i}^{r})=\begin{cases}\dfrac{e^{-\frac{1}{2\sigma^{2}}(U_{i}^{r}-\mu)^{2}}}{\sqrt{2\pi\sigma^{2}}\left(\operatorname{erf}\left(\frac{U_{\max}-\mu}{\sigma\sqrt{2}}\right)-\operatorname{erf}\left(\frac{U_{\min}-\mu}{\sigma\sqrt{2}}\right)\right)}, & U_{\min}\leq U_{i}^{r}\leq U_{\max},\\ 0, & \text{otherwise},\end{cases} \tag{1}$$

where $U_{\max}$ and $U_{\min}$ are the maximum and minimum velocity thresholds of each vehicle, respectively, and $\operatorname{erf}\left(\frac{U_{i}^{r}-\mu}{\sigma\sqrt{2}}\right)$ is the Gauss error function of $U_{i}^{r}$ under the mean $\mu$ and variance $\sigma^{2}$.
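To illustrate how per-round velocities could be drawn from a truncated Gaussian, the following minimal sketch uses rejection sampling: it draws from an ordinary Gaussian and keeps only values inside the velocity thresholds. The function name `sample_velocity` and the numeric values in the usage note are assumptions for illustration, and rejection sampling is one possible sampling technique, not necessarily the one used by the authors.

```python
import random


def sample_velocity(mu, sigma, u_min, u_max, rng=random):
    """Draw one velocity from a Gaussian(mu, sigma) truncated to [u_min, u_max].

    Rejection sampling: redraw until the value falls inside the thresholds.
    This is efficient when [u_min, u_max] covers most of the Gaussian mass.
    """
    if not u_min < u_max:
        raise ValueError("u_min must be smaller than u_max")
    while True:
        u = rng.gauss(mu, sigma)
        if u_min <= u <= u_max:
            return u
```

For instance, `sample_velocity(20.0, 5.0, 10.0, 30.0)` returns a velocity between 10 and 30 (in whatever unit the thresholds use); these parameter values are hypothetical.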

III-C Communication Model

The local RSU and the neighboring RSU communicate over a wired link. Each vehicle keeps the same communication model during a round and changes its communication model for different rounds. In round $r$, the channel gain of $V_{i}^{r}$ is modeled as [34]

$$h_{i}^{r}(dis(x,V_{i}^{r}))=\alpha_{i}^{r}(dis(x,V_{i}^{r}))\,g_{i}^{r}(dis(x,V_{i}^{r})),\quad x=S,M, \tag{2}$$

where $x=S$ denotes the local RSU and $x=M$ denotes the MBS, $dis(x,V_{i}^{r})$ is the distance between the local RSU/MBS and $V_{i}^{r}$, $\alpha_{i}^{r}(dis(x,V_{i}^{r}))$ is the path loss between the local RSU/MBS and $V_{i}^{r}$, and $g_{i}^{r}(dis(x,V_{i}^{r}))$ is the shadowing channel fading between the local RSU/MBS and $V_{i}^{r}$, which follows a log-normal distribution.

Each RSU communicates with the vehicles in its coverage area through a vehicle-to-RSU (V2R) link, while the MBS communicates with vehicles through a vehicle-to-base-station (V2B) link. Since the distances between the local RSU/MBS and $V_{i}^{r}$ differ from round to round, the V2R/V2B links suffer from different channel impairments and thus transmit at different rates in different rounds. The transmission rates under the V2R and V2B links are calculated as follows.

According to the Shannon theorem, the transmission rate between the local RSU and $V_{i}^{r}$ is calculated as [35]

$$R_{R,i}^{r}=B\log_{2}\left(1+\frac{p_{B}h_{i}^{r}(dis(S,V_{i}^{r}))}{\sigma_{c}^{2}}\right), \tag{3}$$

where $B$ is the available bandwidth, $p_{B}$ is the transmit power level used by the local RSU and $\sigma_{c}^{2}$ is the noise power.

Similarly, the transmission rate between the MBS and $V_{i}^{r}$ is calculated as

$$R_{B,i}^{r}=B\log_{2}\left(1+\frac{p_{M}h_{i}^{r}(dis(M,V_{i}^{r}))}{\sigma_{c}^{2}}\right), \tag{4}$$

where $p_{M}$ is the transmit power level used by the MBS.
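Eqs. (2)-(4) can be evaluated directly once the path loss and shadowing are known. The sketch below is a minimal illustration under the assumption that the path loss and shadowing values are supplied as linear-scale numbers; the helper names (`channel_gain`, `transmission_rate`, `transmission_delay`) are hypothetical, and the delay helper simply divides a content size by the rate.

```python
import math


def channel_gain(path_loss, shadowing):
    """Eq. (2): channel gain as the product of path loss and shadowing fading."""
    return path_loss * shadowing


def transmission_rate(bandwidth_hz, tx_power, gain, noise_power):
    """Eqs. (3)/(4): Shannon rate B * log2(1 + p * h / sigma_c^2).

    Use the RSU transmit power for the V2R link and the MBS transmit
    power for the V2B link.
    """
    return bandwidth_hz * math.log2(1.0 + tx_power * gain / noise_power)


def transmission_delay(content_size_bits, rate_bps):
    """Time needed to deliver one content of the given size at the given rate."""
    return content_size_bits / rate_bps
```

As a sanity check with made-up numbers, a 1 MHz link with a signal-to-noise ratio of 3 yields a rate of 2 Mbit/s, so an 8 Mbit content takes 4 s to deliver.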

Figure 2: Asynchronous FL

IV Cooperative Caching Scheme

In this section, we propose a cooperative caching scheme to optimize the content transmission delay in each round $r$. We first propose an asynchronous FL algorithm to protect VUs’ information and obtain an accurate model. Then we propose an algorithm to predict the popular contents based on the obtained model. Finally, we present a DRL-based algorithm to determine the optimal cooperative caching according to the predicted popular contents. Next, we introduce the asynchronous FL algorithm, the popular content prediction algorithm and the DRL-based algorithm, respectively.

IV-A Asynchronous Federated Learning

As shown in Fig. 2, the asynchronous FL algorithm consists of 5 steps as follows.

IV-A1 Select Vehicles


The main goal of this step is to select the vehicles whose staying time in the local RSU is long enough to ensure they can participate in the asynchronous FL and complete the training process.

Each vehicle first sends its mobility characteristics, including its velocity and position (i.e., the distance to the local RSU and the distance it has traversed within the coverage of the local RSU), then the local RSU selects vehicles according to the staying time, which is calculated based on the vehicle’s mobility characteristics. The staying time of $V_{i}^{r}$ in the local RSU is calculated as

$$T_{r,i}^{staying}=\left(L_{s}-P_{i}^{r}\right)/U_{i}^{r}, \tag{5}$$

where $L_{s}$ is the coverage range of the local RSU and $P_{i}^{r}$ is the distance that $V_{i}^{r}$ has traversed within the coverage of the local RSU.

The staying time of $V_{i}^{r}$ should be larger than the sum of the average training time $T_{training}$ and inference time $T_{inference}$ to guarantee that $V_{i}^{r}$ can complete the training process. Therefore, if $T_{r,i}^{staying}>T_{training}+T_{inference}$, the local RSU selects $V_{i}^{r}$ to participate in asynchronous FL training. Otherwise, $V_{i}^{r}$ is ignored.
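The selection rule built on Eq. (5) can be sketched as follows. This is a minimal illustration, assuming each vehicle is described by a tuple of (ID, traversed distance, velocity); the function names and the tuple layout are hypothetical.

```python
def staying_time(coverage_range, traversed, velocity):
    """Eq. (5): remaining time inside the local RSU coverage."""
    return (coverage_range - traversed) / velocity


def select_vehicles(vehicles, coverage_range, t_training, t_inference):
    """Return the IDs of vehicles that stay long enough to train and infer.

    `vehicles` is an iterable of (vehicle_id, traversed, velocity) tuples;
    this layout is an assumption made for the sketch.
    """
    threshold = t_training + t_inference
    return [
        vid
        for vid, traversed, velocity in vehicles
        if staying_time(coverage_range, traversed, velocity) > threshold
    ]
```

With a coverage range of 1000 m, a vehicle that has traversed 100 m at 20 m/s stays for 45 s, whereas one that has traversed 900 m at 25 m/s stays for only 4 s; with $T_{training}+T_{inference}=6$ s (illustrative values), only the first vehicle is selected.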

IV-A2 Download Model


In this step, the local RSU generates the global model $\omega^{r}$. For the first round, the local RSU initializes a global model based on the AE, which can extract the hidden features used for popular content prediction. In each round, the local RSU updates the global model and finally transfers the global model $\omega^{r}$ to all the selected vehicles.

IV-A3 Local Training


In this step, each vehicle in the local RSU sets the downloaded global model $\omega^{r}$ as the initial local model and updates the local model iteratively through training. Afterward, the updated local model is fed back to the local RSU. For each iteration $k$, $V_{i}^{r}$ randomly samples some training data $n_{i,k}^{r}$ from the training set. Then, it uses $n_{i,k}^{r}$ to train the local model based on the AE, which consists of an encoder and a decoder. Let $W_{i,k}^{r,e}$ and $b_{i,k}^{r,e}$ be the weight matrix and bias vector of the encoder for iteration $k$, respectively, and let $W_{i,k}^{r,d}$ and $b_{i,k}^{r,d}$ be the weight matrix and bias vector of the decoder for iteration $k$, respectively. Thus the local model of $V_{i}^{r}$ for iteration $k$ is expressed as $\omega_{i,k}^{r}=\{W_{i,k}^{r,e},b_{i,k}^{r,e},W_{i,k}^{r,d},b_{i,k}^{r,d}\}$. For each training data $x$ in $n_{i,k}^{r}$, the encoder first maps the original training data $x$ to a hidden layer to obtain the hidden feature of $x$, i.e., $z(x)=f\left(W_{i,k}^{r,e}x+b_{i,k}^{r,e}\right)$. Then the decoder calculates the reconstructed input $\hat{x}$, i.e., $\hat{x}=g\left(W_{i,k}^{r,d}z(x)+b_{i,k}^{r,d}\right)$, where $f(\cdot)$ and $g(\cdot)$ are the nonlinear and logical activation functions, respectively [36]. Afterward, the loss function of data $x$ under the local model $\omega_{i,k}^{r}$ is calculated as

$$l\left(\omega_{i,k}^{r};x\right)=(x-\hat{x})^{2}, \tag{6}$$

where $\omega_{i,1}^{r}=\omega^{r}$.

After the loss functions of all the data in $n_{i,k}^{r}$ are calculated, the local loss function for iteration $k$ is calculated as

$$f(\omega_{i,k}^{r})=\frac{1}{\left|n_{i,k}^{r}\right|}\sum_{x\in n_{i,k}^{r}}l\left(\omega_{i,k}^{r};x\right), \tag{7}$$

where $\left|n_{i,k}^{r}\right|$ is the number of data in $n_{i,k}^{r}$.

Then the regularized local loss function is calculated to reduce the deviation between the local model $\omega_{i,k}^{r}$ and the global model $\omega^{r}$ and thereby improve the convergence of the algorithm, i.e.,

$$g\left(\omega_{i,k}^{r}\right)=f\left(\omega_{i,k}^{r}\right)+\frac{\rho}{2}\left\|\omega^{r}-\omega_{i,k}^{r}\right\|^{2}, \tag{8}$$

where $\rho$ is the regularization parameter.
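The chain from the AE reconstruction to the regularized local loss in Eqs. (6)-(8) can be sketched in plain Python. This is only an illustration under stated assumptions: both activation functions are taken to be the sigmoid, the squared error in Eq. (6) is summed over the components of $x$, and the model is stored as a dictionary with hypothetical keys `We`, `be`, `Wd`, `bd`.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, b, x):
    """Return W x + b, with W given as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def reconstruct(model, x):
    """Encoder then decoder; sigmoid for both layers is an assumption."""
    z = [sigmoid(v) for v in affine(model["We"], model["be"], x)]
    return [sigmoid(v) for v in affine(model["Wd"], model["bd"], z)]


def sample_loss(model, x):
    """Eq. (6): squared reconstruction error, summed over components."""
    return sum((a - b) ** 2 for a, b in zip(x, reconstruct(model, x)))


def local_loss(model, batch):
    """Eq. (7): mean per-sample loss over the sampled batch."""
    return sum(sample_loss(model, x) for x in batch) / len(batch)


def flatten(model):
    """Concatenate all parameters into one flat list for the norm in Eq. (8)."""
    flat = [w for row in model["We"] for w in row] + list(model["be"])
    flat += [w for row in model["Wd"] for w in row] + list(model["bd"])
    return flat


def regularized_loss(model, global_model, batch, rho):
    """Eq. (8): local loss plus (rho / 2) * ||global - local||^2."""
    diff_sq = sum((g - w) ** 2 for g, w in zip(flatten(global_model), flatten(model)))
    return local_loss(model, batch) + 0.5 * rho * diff_sq
```

The proximal term grows as the local parameters drift away from the downloaded global model, which is the mechanism the text credits with improving convergence.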

Let $\nabla g(\omega_{i,k}^{r})$ be the gradient of $g\left(\omega_{i,k}^{r}\right)$, which is referred to as the local gradient. In the previous round, some vehicles may have failed to upload their updated local models due to delayed training, which adversely affects the convergence of the global model [37][38][39]. Here, these vehicles are called stragglers, and the local gradient of a straggler in the previous round is referred to as the delayed local gradient. To solve this problem, the delayed local gradient is aggregated into the local gradient of the current round $r$. Thus, the aggregated local gradient can be calculated as

$$\nabla\zeta_{i,k}^{r}=\nabla g(\omega_{i,k}^{r})+\beta\nabla g_{i}^{d}, \tag{9}$$

where $\beta$ is the decay coefficient and $\nabla g_{i}^{d}$ is the delayed local gradient. Note that $\nabla g_{i}^{d}=0$ if $V_{i}^{r}$ uploaded successfully in the previous round.

Then the local model for the next iteration is updated as

$$\omega_{i,k+1}^{r}=\omega^{r}-\eta_{l}^{r}\nabla\zeta_{i,k}^{r}, \tag{10}$$

where $\eta_{l}^{r}$ is the local learning rate in round $r$, which is calculated as

$$\eta_{l}^{r}=\eta_{l}\max\{1,\log(r)\}, \tag{11}$$

where $\eta_{l}$ is the initial value of the local learning rate.

Then iteration $k$ is finished and $V_{i}^{r}$ randomly samples some training data again to start the next iteration. When the number of iterations reaches the threshold $e$, $V_{i}^{r}$ completes the local training and uploads the updated local model $\omega_{i}^{r}$ to the local RSU.
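Eqs. (9)-(11) describe one local update step. The sketch below mirrors them on flat parameter lists; it assumes the logarithm in Eq. (11) is the natural logarithm and that gradients are already available as lists of floats. The function names are hypothetical.

```python
import math


def local_learning_rate(eta_l, r):
    """Eq. (11): eta_l * max(1, log r), for round index r >= 1.

    The natural logarithm is an assumption; the text does not state the base.
    """
    return eta_l * max(1.0, math.log(r))


def aggregated_gradient(local_grad, delayed_grad, beta):
    """Eq. (9): local gradient plus the decayed delayed gradient.

    Pass a list of zeros as `delayed_grad` when the previous upload succeeded.
    """
    return [g + beta * d for g, d in zip(local_grad, delayed_grad)]


def local_step(base_model, agg_grad, lr):
    """Eq. (10): subtract the scaled aggregated gradient from the base model.

    As printed, Eq. (10) steps from the downloaded global model, so
    `base_model` is that model's flat parameter list in this sketch.
    """
    return [w - lr * g for w, g in zip(base_model, agg_grad)]
```

For example, with $\beta=0.5$, a local gradient of `[1.0, 2.0]` and a delayed gradient of `[0.5, 0.0]` combine into `[1.25, 2.0]`.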

IV-A4 Upload Model


Each vehicle uploads its updated local model to the local RSU after it completes local training.

IV-A5 Asynchronous Aggregation


If the local model of $V_{i}^{r}$, i.e., $\omega_{i}^{r}$, is the first model received by the local RSU, the upload is successful and the local RSU updates the global model. Otherwise, the local RSU drops $\omega_{i}^{r}$ and thus the upload is not successful.

Algorithm 1: The Asynchronous Federated Learning Algorithm

    Set the global model $\omega^{r}$;
    for each round $r$ from $1$ to $R^{max}$ do
        for each vehicle $V_{i}^{r}\in\mathbb{V}^{r}$ in parallel do
            $T_{r,i}^{staying}=\left(L_{s}-P_{i}^{r}\right)/U_{i}^{r}$;
            if $T_{r,i}^{staying}>T_{training}+T_{inference}$ then
                $V_{i}^{r}$ is selected to participate in asynchronous FL training;
        for each selected vehicle $V_{i}^{r}$ do
            $\omega_{i}^{r}\leftarrow$ Vehicle Update($\omega^{r},i$);
            Upload the local model $\omega_{i}^{r}$ to the local RSU;
        Receive the updated model $\omega_{i}^{r}$;
        Calculate the weight of the asynchronous aggregation $\chi_{i}$ based on Eq. (14);
        Update the global model based on Eq. (12);
        return $\omega^{r+1}$

    Vehicle Update($\omega,i$):
    Input: $\omega^{r}$
    Calculate the local learning rate $\eta_{l}^{r}$ based on Eq. (11);
    for each local epoch $k$ from $1$ to $e$ do
        Randomly sample some data $n_{i,k}^{r}$ from the training set;
        for each data $x\in n_{i,k}^{r}$ do
            Calculate the loss function of data $x$ based on Eq. (6);
        Calculate the local loss function for iteration $k$ based on Eq. (7);
        Calculate the regularized local loss function $g\left(\omega_{i,k}^{r}\right)$ based on Eq. (8);
        Aggregate the local gradient $\nabla\zeta_{i,k}^{r}$ based on Eq. (9);
        Update the local model $\omega_{i,k}^{r}$ based on Eq. (10);
    Set $\omega_{i}^{r}=\omega_{i,e}^{r}$;
    return $\omega_{i}^{r}$

When the upload is successful, the local RSU updates the global model $\omega^{r}$ by weighted averaging as follows:

$$\omega^{r}=\omega^{r-1}+\frac{d_{i}^{r}}{d^{r}}\chi_{i}\omega_{i}^{r}, \tag{12}$$

where $d_{i}^{r}$ is the size of the local data in $V_{i}^{r}$, $d^{r}$ is the total local data size of the selected vehicles and $\chi_{i}$ is the weight of the asynchronous aggregation for $V_{i}^{r}$. The weight $\chi_{i}$ is calculated by considering the traversed distance of $V_{i}^{r}$ in the coverage area of the local RSU and the content transmission delay from the local RSU to $V_{i}^{r}$, in order to improve the accuracy of the global model and reduce the content transmission delay. Specifically, if the traversed distance of $V_{i}^{r}$ is large, it may have a long available time to participate in the training, so its local model should receive a large weight in the aggregation to improve the accuracy of the global model. In addition, the content transmission delay from the local RSU to $V_{i}^{r}$ is important because $V_{i}^{r}$ finally downloads the content from the local RSU when the content is cached in either the local or the neighboring RSU. Thus, if the content transmission delay from the local RSU to $V_{i}^{r}$ is small, its local model should also receive a large weight in the aggregation to reduce the content transmission delay. The weight of the asynchronous aggregation $\chi_{i}$ is calculated as

$$\chi_{i}=\mu_{1}(L_{s}-P_{i}^{r})+\mu_{2}\frac{s}{R_{R,i}^{r}}, \tag{13}$$

where $\mu_{1}$ and $\mu_{2}$ are the coefficients of the position weight and transmission weight, respectively (i.e., $\mu_{1}+\mu_{2}=1$), and $s$ is the size of each content. Thus, the content transmission delay from the local RSU to $V_{i}^{r}$ is affected by the transmission rate between the local RSU and $V_{i}^{r}$, i.e., $R_{R,i}^{r}$. We can further calculate $\chi_{i}$ based on the normalized $L_{s}-P_{i}^{r}$ and $R_{R,i}^{r}$, i.e.,

$$\chi_{i}=\mu_{1}\frac{(L_{s}-P_{i}^{r})}{L_{s}}+\mu_{2}\frac{R_{R,i}^{r}}{\max_{k\in N^{r}}\left(R_{R,k}^{r}\right)}. \tag{14}$$

Since the local RSU knows $dis(S,V_{i}^{r})$ and $P_{i}^{r}$ for each vehicle $i$ at the beginning of the asynchronous FL, the local RSU can calculate $R_{R,i}^{r}$ according to Eqs. (2) and (3), and further calculate $\chi_{i}$ according to Eq. (13).
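As an illustrative sketch only (not the authors' implementation), the normalized weight of Eq. (14) and the aggregation step of Eq. (12) could be computed as follows. The function names `aggregation_weight` and `async_update` are hypothetical, and the model is assumed to be a flat NumPy parameter vector.

```python
import numpy as np

def aggregation_weight(L_s, P_i, R_i, R_max, mu1=0.5, mu2=0.5):
    """Normalized asynchronous aggregation weight chi_i, following Eq. (14)."""
    position_term = (L_s - P_i) / L_s   # normalized (L_s - P_i^r) term
    rate_term = R_i / R_max             # rate normalized by the largest rate among vehicles
    return mu1 * position_term + mu2 * rate_term

def async_update(global_model, local_model, d_i, d_total, chi_i):
    """One asynchronous aggregation step, following Eq. (12)."""
    return global_model + (d_i / d_total) * chi_i * local_model

# Example with mu1 = mu2 = 0.5 as in Table I (all other numbers are made up).
chi = aggregation_weight(L_s=1000.0, P_i=250.0, R_i=5.0, R_max=10.0)
new_global = async_update(np.array([1.0, 2.0]), np.array([0.5, -1.0]),
                          d_i=10, d_total=40, chi_i=chi)
```

Here a vehicle with a larger $L_{s}-P_{i}^{r}$ or a higher transmission rate contributes more to the global model.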

Up to now, the asynchronous FL in round $r$ is finished and the updated global model $\omega^{r}$ is obtained. The process of the asynchronous FL algorithm is summarized in Algorithm 1, where $R^{max}$ is the maximum number of rounds and $e$ is the maximum number of local epochs. Then, the local RSU sends the obtained model to each vehicle to predict popular contents.

IV-B Popular Content Prediction

Figure 3: Popular content prediction process

In this subsection, we propose an algorithm to predict the popular contents. As shown in Fig. 3, the popular content prediction algorithm consists of the following four steps.

IV-B1 Data Preprocessing


The VU’s rating for a content is 0 when the VU is uninterested in the content or has not requested it. Thus, a rating of 0 alone cannot tell whether a content is interesting to the VU, and marking all contents with rating 0 as uninteresting would bias the prediction. Therefore, in the first step we adopt the obtained model to reconstruct the rating for each content, as described below.

Each vehicle extracts a rating matrix from the data in its testing set, where the first dimension of the matrix is the VUs’ IDs and the second dimension is the VUs’ ratings for all contents. Denote the rating matrix of $V_{i}^{r}$ as $\bm{R}_{i}^{r}$. The AE with the obtained model then takes $\bm{R}_{i}^{r}$ as input and outputs the reconstructed rating matrix $\hat{\bm{R}}_{i}^{r}$. Since $\hat{\bm{R}}_{i}^{r}$ is reconstructed based on the obtained model, which reflects the hidden features of the data, $\hat{\bm{R}}_{i}^{r}$ can be used to approximate the rating matrix $\bm{R}_{i}^{r}$. Similarly, each vehicle also extracts a personal information matrix from the data of the testing set, where the first dimension of the matrix is the VUs’ IDs and the second dimension is the VUs’ personal information.

IV-B2 Cosine Similarity


$V_{i}^{r}$ counts the number of nonzero ratings for each VU in $\bm{R}_{i}^{r}$ and marks the VUs with the $1/m$ largest numbers as active VUs. Then, each vehicle combines $\hat{\bm{R}}_{i}^{r}$ and the personal information matrix into a combined matrix, denoted as $\bm{H}_{i}^{r}$, to calculate the similarity between each active VU and the other VUs. The similarity between an active VU $a$ and another VU $b$ is calculated according to the cosine similarity [40]

$$\begin{aligned}\operatorname{sim}_{a,b}^{r,i}&=\cos\left(\bm{H}_{i}^{r}(a,:),\bm{H}_{i}^{r}(b,:)\right)\\&=\frac{\bm{H}_{i}^{r}(a,:)\cdot\bm{H}_{i}^{r}(b,:)^{T}}{\left\|\bm{H}_{i}^{r}(a,:)\right\|_{2}\times\left\|\bm{H}_{i}^{r}(b,:)\right\|_{2}},\end{aligned}$$ (15)

where $\bm{H}_{i}^{r}(a,:)$ and $\bm{H}_{i}^{r}(b,:)$ are the vectors corresponding to the active VU $a$ and VU $b$ in the combined matrix, respectively, and $\left\|\bm{H}_{i}^{r}(a,:)\right\|_{2}$ and $\left\|\bm{H}_{i}^{r}(b,:)\right\|_{2}$ are the 2-norms of $\bm{H}_{i}^{r}(a,:)$ and $\bm{H}_{i}^{r}(b,:)$, respectively. Then, for each active VU $a$, $V_{i}^{r}$ selects the VUs with the $K$ largest similarities as the $K$ neighboring VUs of VU $a$. The ratings of the $K$ neighboring VUs also reflect the preferences of VU $a$ to a certain extent.
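The neighbor selection based on Eq. (15) can be sketched as below. This is a minimal example under stated assumptions: the combined matrix is a dense NumPy array with one row per VU, the helper name `top_k_neighbors` is hypothetical, and rows with zero norm are given similarity 0 (a choice the text does not specify).

```python
import numpy as np

def top_k_neighbors(H, a, K):
    """Return the K rows of H most cosine-similar to row a, plus all similarities."""
    H = np.asarray(H, dtype=float)
    norms = np.linalg.norm(H, axis=1)
    denom = norms * norms[a]
    # Cosine similarity of every row with row a; zero-norm rows get similarity 0.
    sims = np.divide(H @ H[a], denom, out=np.zeros(len(H)), where=denom > 0)
    sims[a] = -np.inf  # the active VU is not its own neighbor
    order = np.argsort(-sims, kind="stable")
    return order[:K], sims

H = np.array([[1, 0], [2, 0], [0, 1], [1, 1]])
neighbors, sims = top_k_neighbors(H, a=0, K=2)
```

In this toy matrix, row 1 is parallel to row 0 and row 3 is at 45 degrees, so they are the two nearest neighbors of VU 0.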

IV-B3 Interested Contents


After determining the neighboring VUs of the active VUs, the vectors of the neighboring VUs of each active VU are extracted from $\bm{R}_{i}^{r}$ to construct a matrix $\bm{H}_{K}$, where the first dimension of $\bm{H}_{K}$ is the IDs of the neighboring VUs of the active VUs and the second dimension is the ratings of the contents from the neighboring VUs. In $\bm{H}_{K}$, a content with a nonzero rating from a VU is regarded as an interested content of that VU. Then the interested contents are counted over the VUs, where the counted number of a content is referred to as the content popularity of the content. $V_{i}^{r}$ selects the contents with the $F_{c}$ largest content popularity as the predicted interested contents.
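A possible reading of this counting step is sketched below, assuming $\bm{H}_{K}$ is a NumPy array with one row per neighboring VU and one column per content, and that the popularity of a content is the number of nonzero ratings in its column. The function name `predict_interested_contents` is hypothetical.

```python
import numpy as np

def predict_interested_contents(H_K, F_c):
    """Count nonzero ratings per content and return the F_c most popular content indices."""
    popularity = np.count_nonzero(H_K, axis=0)       # one count per content (column)
    ranked = np.argsort(-popularity, kind="stable")  # descending popularity, stable on ties
    return ranked[:F_c], popularity

H_K = np.array([[5, 0, 3, 0],
                [4, 0, 0, 0],
                [1, 2, 3, 0]])
top, popularity = predict_interested_contents(H_K, F_c=2)
```

With this toy matrix, content 0 is rated by three VUs and content 2 by two, so they form the predicted interested contents.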

IV-B4 Popular Contents


After the vehicles in the coverage area of the local RSU upload their predicted interested contents, the local RSU collects and compares the predicted interested contents uploaded by all vehicles and selects the contents with the $F_{c}$ largest content popularity as the popular contents. The proposed popular content prediction algorithm is illustrated in Algorithm 2, where $\mathbb{C}^{r}$ is the set of the popular contents and $\mathbb{C}_{i}^{r}$ is the set of interested contents of $V^{r}_{i}$.

1 Input: $\omega^{r}$
2 for each vehicle $V^{r}_{i}\in\mathbb{V}^{r}$ do
3       Construct the rating matrix $\bm{R}_{i}^{r}$ and the personal information matrix;
4       $\hat{\bm{R}}_{i}^{r}\leftarrow AE(\omega^{r},\bm{R}_{i}^{r})$;
5       Combine $\hat{\bm{R}}_{i}^{r}$ and the personal information matrix as $\bm{H}_{i}^{r}$;
6       $\mathbb{C}_{i}^{r}\leftarrow\textbf{Vehicle Predicts}(\bm{H}_{i}^{r},i)$;
7       Upload $\mathbb{C}_{i}^{r}$ to the local RSU;
8      
9 Compare the received contents and select the $F_{c}$ most interested contents into $\mathbb{C}^{r}$.
10 return $\mathbb{C}^{r}$
11 Vehicle Predicts$(\bm{H}_{i}^{r},i)$:
12 Input: $\bm{H}_{i}^{r},i\in\{1,2,\ldots,N^{r}\}$
13 Calculate the similarity between $V_{i}^{r}$ and other vehicles based on Eq. (15);
14 Select the first $K$ vehicles with the largest similarity as neighboring vehicles of $V_{i}^{r}$;
15 Construct reconstructed rating matrices of $K$ neighboring vehicles as $\bm{H}_{K}$;
16 Select the $F_{c}$ most interested contents as $\mathbb{C}_{i}^{r}$;
17 return $\mathbb{C}_{i}^{r}$
Algorithm 2 The Popular Content Prediction Algorithm
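The final selection at the local RSU (line 9 of Algorithm 2) could look like the following sketch. The text does not state how the RSU combines the uploaded sets, so this example assumes that the popularity of a content is the number of vehicles that uploaded it. The function name `select_popular_contents` is hypothetical.

```python
from collections import Counter

def select_popular_contents(uploaded, F_c):
    """uploaded: one list of predicted interested content IDs per vehicle."""
    # Each vehicle votes at most once per content.
    votes = Counter(f for contents in uploaded for f in set(contents))
    # Sort by vote count (descending), then by content ID for a deterministic tie-break.
    ranked = sorted(votes, key=lambda f: (-votes[f], f))
    return ranked[:F_c]

popular = select_popular_contents([[1, 2, 3], [2, 3, 4], [3, 5]], F_c=3)
```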

The cache capacity of each RSU, $c$, i.e., the largest number of contents that each RSU can accommodate, is usually smaller than $F_{c}$. Next, we propose a cooperative caching scheme to determine where the predicted popular contents should be cached.

IV-C Cooperative Caching Based on DRL

We assume that the computation capability of each RSU is powerful enough that the cooperative caching can be determined within a short time. The main goal is to find an optimal cooperative caching based on DRL that minimizes the content transmission delay. Next, we formulate the DRL framework and then introduce the DRL algorithm.

IV-C1 DRL Framework


The DRL framework includes the state, action and reward. The training process is divided into slots. In the current slot $t$, the local RSU observes the current state $s(t)$ and decides the current action $a(t)$ according to a policy $\pi$, which generates the action based on the state in each slot. Then the local RSU obtains the current reward $r(t)$ and observes the next state $s(t+1)$ to which the current state $s(t)$ transits. We next design $s(t)$, $a(t)$ and $r(t)$ for this DRL framework.

State

We consider the contents cached by the local RSU as the current state $s(t)$. In order to focus on the contents with high popularity, the contents in $s(t)$ are sorted in descending order of the predicted content popularity of the $F_{c}$ popular contents. The current state can thus be expressed as $s(t)=\left(s_{1},s_{2},\ldots,s_{c}\right)$, where $s_{i}$ is the $i$th most popular content.

Action

Action $a(t)$ represents whether the contents cached in the local RSU need to be relocated or not. Among the $F_{c}$ predicted popular contents, those that are not cached in the local RSU form a set $\mathbb{N}$. If $a(t)=1$, the local RSU randomly selects $n$ $(n<c)$ contents from $\mathbb{N}$, exchanges them with the $n$ least popular contents cached in the local RSU, and then sorts the contents in descending order of content popularity to obtain $s(t+1)$. The neighboring RSU then randomly samples $c$ contents from the $F_{c}$ popular contents that do not belong to $s(t+1)$ as its cached contents for the next slot $t+1$. We denote the contents cached by the neighboring RSU as $s_{n}(t+1)$. If $a(t)=0$, the contents cached in the local RSU are not relocated, and the neighboring RSU determines its cached contents in the same way as when $a(t)=1$.
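The state transition described above can be sketched as follows. This is a simplified illustration under stated assumptions: contents are identified by integer IDs, `popularity` is a hypothetical dictionary from content ID to predicted popularity, and the neighboring RSU samples fewer than $c$ contents if fewer remain.

```python
import random

def step_cache(state, popular, popularity, action, n, c, rng=random):
    """Return (s(t+1), s_n(t+1)) given the cached contents s(t) and the action a(t)."""
    local = list(state)
    if action == 1:
        not_cached = [f for f in popular if f not in local]   # the set N
        incoming = rng.sample(not_cached, n)
        # Keep the most popular cached contents and replace the n least popular ones.
        kept = sorted(local, key=lambda f: popularity[f], reverse=True)[:len(local) - n]
        local = kept + incoming
    local.sort(key=lambda f: popularity[f], reverse=True)
    # The neighboring RSU samples from popular contents that are not in s(t+1).
    remaining = [f for f in popular if f not in local]
    neighbor = rng.sample(remaining, min(c, len(remaining)))
    return local, neighbor

popular = list(range(10))                      # F_c = 10 predicted popular contents
popularity = {f: 10 - f for f in popular}      # content 0 is the most popular
local, neighbor = step_cache([0, 1, 2], popular, popularity,
                             action=1, n=1, c=3, rng=random.Random(0))
```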

Reward

The reward function $r(t)$ is designed to minimize the total content transmission delay for fetching the contents requested by vehicles. Note that the local RSU has recorded all the contents requested by the vehicles. The content transmission delay to fetch a requested content $f$ depends on where the content is cached.

If content $f$ is cached in the local RSU, i.e., $f\in s(t)$, the local RSU transmits content $f$ to $V_{i}^{r}$, thus the content transmission delay is calculated as

$$d_{R,i,f}^{r}=\frac{s}{R_{R,i}^{r}},$$ (16)

where $R_{R,i}^{r}$ is the transmission rate between the local RSU and $V_{i}^{r}$, which has been calculated by Eq. (3).

If content $f$ is cached in the neighboring RSU, i.e., $f\in s_{n}(t)$, the neighboring RSU sends the content to the local RSU, which forwards it to $V_{i}^{r}$, thus the transmission delay is calculated as

$$\bar{d}_{R,i,f}^{r}=\frac{s}{R_{R,i}^{r}}+\frac{s}{R_{R-R}},$$ (17)

where $R_{R-R}$ is the transmission rate between the local RSU and the neighboring RSU, which is constant over the wired link.

If content $f$ is cached neither in the local RSU nor in the neighboring RSU, i.e., $f\notin s(t)$ and $f\notin s_{n}(t)$, the MBS transmits content $f$ to $V_{i}^{r}$, thus the content transmission delay is expressed as

$$d_{B,i,f}^{r}=\frac{s}{R_{B,i}^{r}},$$ (18)

where $R_{B,i}^{r}$ is the transmission rate between the MBS and $V_{i}^{r}$, which is calculated according to Eq. (4).

In order to clearly distinguish the content transmission delays under different conditions, we set the reward of $V_{i}^{r}$ fetching content $f$ at slot $t$ as

$$r_{i,f}^{r}(t)=\begin{cases}e^{-\lambda_{1}d_{R,i,f}^{r}}&f\in s(t)\\ e^{-\left(\lambda_{1}d_{R,i,f}^{r}+\lambda_{2}\bar{d}_{R,i,f}^{r}\right)}&f\in s_{n}(t)\\ e^{-\lambda_{3}d_{B,i,f}^{r}}&f\notin s(t)\text{ and }f\notin s_{n}(t)\end{cases},$$ (19)

where $\lambda_{1}+\lambda_{2}+\lambda_{3}=1$ and $\lambda_{1}<\lambda_{2}\ll\lambda_{3}$.

Thus the reward function $r(t)$ is calculated as

$$r(t)=\sum_{i=1}^{N^{r}}\sum_{f=1}^{F_{i}^{r}}r_{i,f}^{r}(t),$$ (20)

where $F_{i}^{r}$ is the number of requested contents from $V_{i}^{r}$.
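To make Eqs. (16) - (20) concrete, the reward could be evaluated as in the sketch below. It assumes that the content size and the rates are given in consistent units and that the MBS delay of Eq. (18) is the one used in the third case of Eq. (19). The function names and the dictionary-based inputs are hypothetical.

```python
import math

def content_reward(f, local, neighbor, s, R_local, R_rr, R_mbs,
                   lam=(0.0001, 0.4, 0.5999)):
    """Reward of one vehicle fetching content f, following Eq. (19)."""
    lam1, lam2, lam3 = lam
    d_local = s / R_local                       # Eq. (16)
    if f in local:
        return math.exp(-lam1 * d_local)
    if f in neighbor:
        d_neighbor = s / R_local + s / R_rr     # Eq. (17)
        return math.exp(-(lam1 * d_local + lam2 * d_neighbor))
    d_mbs = s / R_mbs                           # Eq. (18)
    return math.exp(-lam3 * d_mbs)

def total_reward(requests, local, neighbor, s, rates_local, R_rr, rates_mbs):
    """Sum of rewards over all vehicles and their requested contents, Eq. (20)."""
    return sum(
        content_reward(f, local, neighbor, s, rates_local[i], R_rr, rates_mbs[i])
        for i, contents in requests.items()
        for f in contents
    )

# Toy example: vehicle 0 requests one content from each of the three locations.
r = total_reward({0: [1, 5, 9]}, local={1}, neighbor={5}, s=1.0,
                 rates_local={0: 2.0}, R_rr=4.0, rates_mbs={0: 0.5})
```

With the default $\lambda$ values taken from Table I, a local hit yields a reward close to 1, while a fetch from the MBS is penalized most heavily.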

IV-C2 DRL Algorithm


As mentioned above, the next state changes when the action is $1$, whereas the contents cached in the local RSU remain unchanged when the action is $0$. The dueling DQN algorithm is particularly suited to cases where some actions have no relevant effect on subsequent states [41]. Specifically, the dueling DQN decomposes the Q-value into two functions $V$ and $A$. Function $V$ is the state value function, which is unrelated to the action, while $A$ is the action advantage function, which is related to the action. Therefore, we adopt the dueling DQN algorithm to solve this problem.

1 Initialize replay buffer $\mathcal{D}$, the parameters of the prediction network $\theta$, the parameters of the target network $\theta^{\prime}$;
2 Input: requested contents from all vehicles in the local RSU for round $r$
3 for episode from $1$ to $T_{s}$ do
4       Local RSU randomly caches $c$ contents from $F_{c}$ popular contents;
5       Neighboring RSU randomly caches $c$ contents from $F_{c}$ popular contents that are not cached in the local RSU;
6       for slot from $1$ to $N_{s}$ do
7             Observe the state $s(t)$;
8             Calculate the Q-value of prediction network $Q(s(t),a;\theta)$ based on Eq. (21);
9             Calculate the action $a(t)$ based on Eq. (22);
10             Obtain state $s(t+1)$ after executing action $a(t)$;
11             Obtain reward $r(t)$ based on Eqs. (16) - (20);
12             Store tuple $(s(t),a(t),r(t),s(t+1))$ in $\mathcal{D}$;
13             if number of tuples in $\mathcal{D}$ is larger than $I$ then
14                   Randomly sample a minibatch of $I$ tuples from $\mathcal{D}$;
15                   for tuple $i$ from $1$ to $I$ do
16                         Calculate the Q-value function of target network $Q^{\prime}(s^{i},a;\theta^{\prime})$ based on Eq. (23);
17                         Calculate the target Q-value of the target network $y^{i}$ based on Eq. (24);
18                         Calculate the loss function $L(\theta)$ based on Eq. (25);
19                        
20                  Calculate the gradient of loss function $\nabla_{\theta}L(\theta)$ based on Eq. (26);
21                   Update parameters of the prediction network $\theta$ based on Eq. (27);
22                  
23            if number of slots is $M$ then
24                  $\theta^{\prime}=\theta$.
25            
26      
Algorithm 3 Cooperative Caching Based on Dueling DQN Algorithm

The dueling DQN includes a prediction network, a target network and a replay buffer. The prediction network evaluates the current state-action value (Q-value) function, while the target network generates the optimal Q-value function. Each of them consists of three layers, i.e., the feature layer, the state-value layer, and the advantage layer. The replay buffer $\mathcal{D}$ is adopted to cache the transitions for each slot. The dueling DQN algorithm is illustrated in Algorithm 3 and is described in detail as follows.

Figure 4: The flow diagram of the dueling DQN

Firstly, the parameters of the prediction network $\theta$ and the parameters of the target network $\theta^{\prime}$ are initialized randomly, and the requested contents from all vehicles in the local RSU for round $r$ are taken as input (lines 1-2).

Then the algorithm is executed for $T_{s}$ episodes. At the beginning of each episode, the local RSU randomly selects $c$ contents from the $F_{c}$ popular contents, and the neighboring RSU randomly selects $c$ contents from the $F_{c}$ popular contents that are not cached in the local RSU. Then the algorithm is executed iteratively from slot $1$ to slot $N_{s}$. In each slot $t$, the local RSU first observes state $s(t)$ and then inputs $s(t)$ to the prediction network, where it passes through the feature layer, the state-value layer and the advantage layer. The prediction network finally outputs the state value function $V(s(t);\theta)$ and the action advantage function under each action $a$, i.e., $A(s(t),a;\theta)$, where $a\in\{0,1\}$. Furthermore, the Q-value function of the prediction network under each action $a$ is calculated as

$$Q(s(t),a;\theta)=V(s(t);\theta)+\{A(s(t),a;\theta)-\mathbb{E}[A(s(t),a;\theta)]\}.$$ (21)

In Eq. (21), subtracting the average of the action advantage functions over all actions, i.e., $\mathbb{E}[A(s(t),a;\theta)]$, from the action advantage function $A(s(t),a;\theta)$ narrows the range of Q-values and removes redundant degrees of freedom, which improves the stability of the algorithm.
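The aggregation in Eq. (21) can be illustrated with a few lines of NumPy. This sketch only covers the combination of the two heads, assuming the state-value and advantage outputs are already available as arrays. The function name `dueling_q` is hypothetical.

```python
import numpy as np

def dueling_q(V, A):
    """Combine state values V (batch,) and advantages A (batch, actions) as in Eq. (21)."""
    V = np.asarray(V, dtype=float).reshape(-1, 1)
    A = np.asarray(A, dtype=float)
    # Subtract the mean advantage over actions so that V and A are identifiable.
    return V + (A - A.mean(axis=1, keepdims=True))

Q = dueling_q(V=[1.0, 2.0], A=[[0.5, 1.5], [2.0, 0.0]])
```

Because the mean advantage is removed, the average of the resulting Q-values over the two actions equals the state value in each row.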

Then action $a(t)$ is chosen by the $\varepsilon$-greedy method, which is calculated as

$$a(t)=\underset{a\in\{0,1\}}{\operatorname{argmax}}\left(Q(s(t),a;\theta)\right).$$ (22)

In particular, action $a(1)$ is initialized as $1$ at slot $1$.
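A generic $\varepsilon$-greedy selection over the two actions might look like the sketch below. The exploration probability `epsilon` is a hypothetical parameter, since its value for the DRL agent is not given in the text. With `epsilon = 0` the rule reduces to the argmax of Eq. (22).

```python
import random

def select_action(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise the greedy one (Eq. (22))."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit

greedy_action = select_action([0.2, 0.7], epsilon=0.0)
```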

The local RSU calculates the reward $r(t)$ according to Eqs. (16) - (20), state $s(t)$ transits to the next state $s(t+1)$, and the local RSU observes $s(t+1)$. Next, the neighboring RSU randomly samples $c$ popular contents that are not cached in $s(t+1)$ as its cached contents, denoted as $s_{n}(t+1)$. The transition from $s(t)$ to $s(t+1)$ is recorded as the tuple $(s(t),a(t),r(t),s(t+1))$, which is stored in the replay buffer $\mathcal{D}$. When the number of tuples stored in $\mathcal{D}$ is larger than $I$, the local RSU randomly samples $I$ tuples from $\mathcal{D}$ to form a minibatch. Let $(s^{i},a^{i},r^{i},s^{\prime i})$, $i=1,2,3,\ldots,I$, be the $i$-th tuple in the minibatch. Then the local RSU inputs each tuple into the prediction network and the target network (lines 3-12).

Next, we introduce how the parameters of the prediction network $\theta$ are updated. For tuple $i$, the local RSU inputs $s^{i}$ into the target network, where it goes through the feature layer and outputs its feature. The feature is then input to the state-value layer and the advantage layer, which output the state value function $V^{\prime}(s^{i};\theta^{\prime})$ and the action advantage function $A^{\prime}(s^{i},a;\theta^{\prime})$ under each action $a\in\{0,1\}$, respectively. Thus, the Q-value function of the target network for tuple $i$ under each action $a$ is calculated as

$$Q^{\prime}(s^{i},a;\theta^{\prime})=V^{\prime}(s^{i};\theta^{\prime})+\left\{A^{\prime}(s^{i},a;\theta^{\prime})-\mathbb{E}\left[A^{\prime}\left(s^{i},a;\theta^{\prime}\right)\right]\right\}.$$ (23)

Then the target Q-value of the target network of tuple $i$ is calculated as

$$y^{i}=r^{i}+\gamma_{D}\max_{a\in\{0,1\}}Q^{\prime}(s^{i},a;\theta^{\prime}),$$ (24)

where $\gamma_{D}$ is the discount factor. The loss function is calculated as follows

$$L(\theta)=\frac{1}{I}\sum_{i=1}^{I}\left[(y^{i}-Q(s^{i},a^{i},\theta))^{2}\right].$$ (25)

The gradient of the loss function $\nabla_{\theta}L(\theta)$ for all sampled tuples is calculated as

$$\nabla_{\theta}L(\theta)=\frac{1}{I}\sum_{i=1}^{I}\left[\left(y^{i}-Q(s^{i},a^{i},\theta)\right)\nabla_{\theta^{i}}Q(s^{i},a^{i},\theta)\right].$$ (26)

At the end of slot $t$, the parameters of the prediction network $\theta$ are updated as

$$\theta\leftarrow\theta-\eta_{\theta}\nabla_{\theta}L(\theta),$$ (27)

where $\eta_{\theta}$ is the learning rate of the prediction network.
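The target and loss computations of Eqs. (24) and (25) can be sketched on a small minibatch as follows. The Q-values of both networks are assumed to be already computed and passed in as arrays, so no network or gradient code is shown. The function names are hypothetical.

```python
import numpy as np

def td_targets(rewards, q_target, gamma=0.99):
    """Target values y^i = r^i + gamma * max_a Q'(., a), following Eq. (24)."""
    return np.asarray(rewards, dtype=float) + gamma * np.max(q_target, axis=1)

def mse_loss(targets, q_pred, actions):
    """Mean squared error between targets and Q(s^i, a^i), following Eq. (25)."""
    q_pred = np.asarray(q_pred, dtype=float)
    q_taken = q_pred[np.arange(len(actions)), actions]  # Q-value of the action actually taken
    return float(np.mean((targets - q_taken) ** 2))

y = td_targets(rewards=[1.0, 0.0], q_target=[[0.5, 1.0], [2.0, 1.0]], gamma=0.5)
loss = mse_loss(y, q_pred=[[1.0, 0.5], [0.0, 3.0]], actions=[0, 1])
```

A deep learning framework would then differentiate this loss with respect to $\theta$ and apply the update of Eq. (27).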

Up to now, the iteration in slot $t$ is completed, and the procedure is repeated in the following slots. During the iterations, the parameters of the target network $\theta^{\prime}$ are updated to the parameters of the prediction network $\theta$ after a certain number of slots ($M$). When the number of slots reaches $N_{s}$, this episode is finished and the local RSU randomly caches $c$ contents from the $F_{c}$ popular contents to start the next episode. When the number of episodes reaches $T_{s}$, the algorithm is terminated (lines 13-22). The flow diagram of the dueling DQN algorithm is shown in Fig. 4.

Finally, the local RSU and neighboring RSU cache popular contents according to the optimal cooperative caching, and then each vehicle fetches contents from the VEC. This round is finished after each vehicle has fetched contents and then the next round is started.

V Simulation and Analytical Results

TABLE I: Values of the parameters in the experiments.
Parameters of System Model
Parameter | Value | Parameter | Value
$B$ | 540 kHz | $K$ | 10
$m$ | 3 | $p_{B}$ | 30 dBm
$p_{M}$ | 43 dBm | $R_{R,R}$ | 15 Mbps
$s$ | 100 bytes | $T_{training}$ | 2 s
$T_{inference}$ | 0.5 s | $U_{\max}$ | 60 km/h
$U_{\min}$ | 50 km/h | $\mu$ | 55 km/h
$\sigma$ | 2.5 km/h | $\sigma_{c}^{2}$ | -114 dBm
Parameters of Asynchronous FL
Parameter | Value | Parameter | Value
$L_{s}$ | 1000 m | $\beta$ | 0.001
$\eta_{l}$ | 0.01 | $\mu_{1}$ | 0.5
$\mu_{2}$ | 0.5 | $\rho$ | 0.0001
Parameters of DRL
Parameter | Value | Parameter | Value
$I$ | 32 | $\gamma_{D}$ | 0.99
$\eta_{\theta}$ | 0.01 | $\lambda_{1}$ | 0.0001
$\lambda_{2}$ | 0.4 | $\lambda_{3}$ | 0.5999

We have evaluated the performance of the proposed CAFR scheme in this section.

V-A Settings and Dataset

We simulate a VEC environment on the urban road as shown in Fig. 1, and the simulation tool is Python 3.8. The communications between vehicles and the RSU/MBS employ the 3rd Generation Partnership Project (3GPP) cellular V2X (C-V2X) architecture, where the parameters are set according to the 3GPP standard [34]. The simulation parameters are listed in Table I. A real-world dataset from the MovieLens website, i.e., MovieLens 1M, is used in the experiments. MovieLens 1M contains 1,000,209 rating values for 3,883 movies from 6,040 anonymous VUs with movie ratings ranging from 0 to 1, where each VU rates at least 20 movies [42]. MovieLens 1M also provides personal information about VUs including ID number, gender, age and postcode. We randomly divide the MovieLens 1M dataset among the vehicles as their local data. Each vehicle randomly chooses 99.8% of its local data as its training set and the remaining 0.2% as its testing set. For each round, each vehicle randomly samples a part of the movies from its testing set as its requested contents.

V-B Performance Evaluation

We use the cache hit ratio and the content transmission delay as performance metrics to evaluate the CAFR scheme. The cache hit ratio is defined as the probability of fetching requested contents from the local RSU [43]. If a requested content is cached in the local RSU, it can be fetched directly from the local RSU, which is referred to as a cache hit; otherwise, it is referred to as a cache miss. Thus, the cache hit ratio is calculated as

$$\text{cache hit ratio}=\frac{\text{cache hits}}{\text{cache hits}+\text{cache misses}}\times 100\%.$$ (28)

The content transmission delay indicates the average delay for all vehicles to fetch contents, which is calculated as

$$\text{content transmission delay}=\frac{D^{\text{total}}}{\text{the number of vehicles}},$$ (29)

where $D^{\text{total}}$ is the delay for all vehicles to fetch contents, calculated by aggregating the content transmission delay for every vehicle to fetch contents.
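Both metrics are straightforward to compute from simulation logs. The sketch below follows Eqs. (28) and (29) and assumes the requests are given as a list of content IDs and the delays as one aggregate value per vehicle. The function names are hypothetical.

```python
def cache_hit_ratio(requests, local_cache):
    """Percentage of requested contents found in the local RSU cache, Eq. (28)."""
    if not requests:
        return 0.0
    hits = sum(1 for f in requests if f in local_cache)
    return 100.0 * hits / len(requests)

def content_transmission_delay(per_vehicle_delays):
    """Average delay over vehicles, i.e., D_total / number of vehicles, Eq. (29)."""
    return sum(per_vehicle_delays) / len(per_vehicle_delays)

ratio = cache_hit_ratio([1, 2, 3, 4], local_cache={1, 3})
delay = content_transmission_delay([0.2, 0.4, 0.6])
```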

We compare the CAFR scheme with the following baseline schemes:

  • Random: Randomly selecting $c$ contents from all contents to cache in the local and neighboring RSU.

  • c-$\epsilon$-greedy: Selecting the contents with the $c$ largest numbers of requests with probability $1-\epsilon$ and selecting $c$ contents randomly with probability $\epsilon$ to cache in the local RSU. In our simulation, $\epsilon=0.1$.

  • Thompson sampling: For each round, the contents cached in the local RSU are updated based on the number of cache hits and cache misses in the previous round [9], and the $c$ contents with the highest value are selected to cache in the local RSU.

  • FedAVG: Federated averaging (FedAVG) is a typical synchronous FL scheme where the local RSU needs to wait for all the local model updates before updating its global model according to the weighted average method:

    $$\omega^{r}=\sum_{i=1}^{N^{r}}\frac{d^{r}_{i}}{d^{r}}\omega^{r}_{i}.$$ (30)
  • CAFR without DRL: Compared with the CAFR scheme, this scheme does not adopt the DRL algorithm to optimize the caching scheme. Specifically, after predicting the popular contents, $c$ contents are randomly selected from the predicted popular contents to cache in the local RSU and neighboring RSU, respectively.
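For reference, the FedAVG aggregation of Eq. (30) amounts to a data-size-weighted average of the local models. A minimal sketch, assuming each local model is a flat parameter vector and using the hypothetical function name `fedavg`:

```python
import numpy as np

def fedavg(local_models, data_sizes):
    """Weighted average of local models with weights d_i / d, following Eq. (30)."""
    sizes = np.asarray(data_sizes, dtype=float)
    weights = sizes / sizes.sum()
    return sum(w * np.asarray(m, dtype=float) for w, m in zip(weights, local_models))

global_model = fedavg([[1.0, 1.0], [3.0, 5.0]], data_sizes=[1, 3])
```

Unlike the asynchronous update of CAFR, this average can only be formed once every selected vehicle has uploaded its local model.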

Figure 5: Cache hit ratio under different cache capacities

Now, we will evaluate the performance of the CAFR scheme through simulation experiments. In the following performance evaluation, each result is the average value of five experiments.

Fig. 5 shows the cache hit ratio of different schemes under different cache capacities of each RSU, where the result of CAFR is obtained when the vehicle density is 15 vehicles/km, and the results of the other schemes are independent of the vehicle density. It can be seen that the cache hit ratio of all schemes increases with the cache capacity. This is because the local RSU caches more contents with a larger capacity, so the requested contents of vehicles are more likely to be fetched from the local RSU. Moreover, the random scheme provides the worst cache hit ratio, because it selects contents randomly without considering the content popularity. In addition, CAFR and c-$\epsilon$-greedy outperform the random scheme and Thompson sampling. This is because the random and Thompson sampling schemes do not predict the caching contents through learning, whereas CAFR and c-$\epsilon$-greedy decide the caching contents by observing the historical requested contents. Furthermore, CAFR outperforms c-$\epsilon$-greedy, because CAFR captures useful hidden features from the data to predict the popular contents accurately.

Figure 6: Content transmission delay under different cache capacities

Fig. 6 shows the content transmission delay of different schemes under different cache capacities of each RSU, where the vehicle density is 15 vehicles/km. It is seen that the content transmission delays of all schemes decrease as the cache capacity increases. This is because each RSU caches more contents as the cache capacity increases, so each vehicle fetches contents from the local RSU and neighboring RSU with a higher probability, which reduces the content transmission delay. Moreover, the content transmission delay of CAFR is smaller than that of the other schemes. This is because the cache hit ratio of CAFR is higher than those of the other schemes, so more vehicles can fetch contents from the local RSU directly, which reduces the content transmission delay.

Figure 7: Cache hit ratio and content transmission delay under different vehicle densities

Fig. 7 shows the cache hit ratio and the content transmission delay of the CAFR scheme under different vehicle densities when the cache capacity of each RSU is 100 contents. As shown in this figure, the cache hit ratio increases as the vehicle density increases. This is because when more vehicles enter the coverage area of the RSU, the global model of the local RSU is trained on more data and can thus predict more accurately. In addition, the content transmission delay decreases as the vehicle density increases. This is because the cache hit ratio increases with the vehicle density, which enables more vehicles to fetch contents directly from the local RSU.

Figure 8: Cache hit ratio of CAFR and FedAVG

Fig. 8 compares the cache hit ratio of the CAFR scheme and the FedAVG scheme over different rounds when the vehicle density is 15 vehicles/km and the cache capacity of each RSU is 100 contents. It can be seen that the cache hit ratio of CAFR fluctuates between 22.5% and 24% within 30 rounds, while the cache hit ratio of the FedAVG scheme fluctuates between 22% and 23.5% within 30 rounds. This indicates that the CAFR scheme is slightly better than the FedAVG scheme. This is because the CAFR scheme considers the vehicles’ mobility characteristics, including the positions and velocities, to select vehicles and aggregate the local models, thus improving the accuracy of the global model.

Figure 9: Training time of CAFR and FedAVG

Fig. 9 shows the training time of the CAFR and FedAVG schemes for each round when the vehicle density is 15 vehicles/km and the cache capacity of each RSU is 100 contents. It can be seen that the training time of the CAFR scheme for each round is between 1 s and 2 s, while the training time of the FedAVG scheme for each round is between 22 s and 24 s. This indicates that the CAFR scheme has a much smaller training time than the FedAVG scheme. This is because the FedAVG scheme needs to aggregate all vehicles’ local models for the global model update in each round, while the CAFR scheme aggregates as soon as a vehicle’s local model is received in each round.

Figure 10: Cache hit ratio and content transmission delay of each episode in the DRL

Fig. 10 shows the cache hit ratio and content transmission delay of each episode in the DRL of the CAFR scheme when the vehicle density is 15 vehicles/km and the cache capacity of each RSU is 100 contents. In the first ten episodes, the cache hit ratio gradually increases and the content transmission delay gradually decreases as the episode index increases. This is because the local RSU and neighboring RSU gradually cache appropriate popular contents in the first ten episodes. In addition, the cache hit ratio and content transmission delay converge at around episode 10. This is because the local RSU has learned the policy to perform optimal cooperative caching after around 10 episodes.

Figure 11: Cache hit ratio with and without cache replacement

Fig. 11 compares the cache hit ratio of the CAFR scheme with that of the CAFR scheme without DRL under different cache capacities of each RSU when the vehicle density is 15 vehicles/km. As shown in Fig. 11, the cache hit ratio of CAFR is higher than that of CAFR without DRL. This is because DRL can determine the optimal cooperative caching according to the predicted popular contents, and thus more suitable popular contents can be cached in the local RSU.

Figure 12: Content transmission delay of CAFR and CAFR without DRL under different cache capacities

Fig. 12 compares the content transmission delay of the CAFR scheme with that of the CAFR scheme without DRL under different cache capacities of each RSU when the vehicle density is 15 vehicles/km. As shown in Fig. 12, the content transmission delay of CAFR is less than that of CAFR without DRL. This is because the cache hit ratio of CAFR is higher than that of CAFR without DRL, so more vehicles can fetch contents from the local RSU directly.

VI Conclusions

In this paper, we considered the vehicle mobility and proposed a cooperative caching scheme CAFR to reduce the content transmission delay and improve the cache hit ratio. We first proposed an asynchronous FL algorithm to obtain an accurate global model, and then proposed an algorithm to predict the popular contents based on the global model. Afterwards, we proposed a cooperative caching scheme to minimize the content transmission delay based on the dueling DQN algorithm. Simulation results have demonstrated that the CAFR scheme outperforms other baseline caching schemes. According to the theoretical analysis and simulation results, the conclusions can be summarized as follows:

  • CAFR scheme can learn from the local data of vehicles to capture useful hidden features and predict the accurate popular contents.

  • CAFR greatly reduces the training time for each round by aggregating the local model of a single vehicle in each round. In addition, CAFR considers vehicles’ mobility characteristics including the positions and velocities to select vehicles and aggregate the local model, which can improve the accuracy of the training model.

  • The DRL in the CAFR scheme determines the optimal cooperative caching policy according to the predicted popular contents, and thus more suitable popular contents are cached in the local RSU and neighboring RSU to reduce the content transmission delay.
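
Two of the mechanisms summarized above can be sketched in a few lines. The first function folds a single vehicle's local model into the global model in one asynchronous round, using a generic mixing weight in the style of asynchronous federated optimization [38]. In CAFR this weight would depend on the vehicle's position and velocity, but here it is a plain argument (mix) and its form is an assumption. The second function shows the standard dueling aggregation from [41], which combines a state value with mean-centered action advantages. Function and parameter names are hypothetical.

```python
from typing import List


def async_aggregate(global_w: List[float], local_w: List[float], mix: float) -> List[float]:
    """One asynchronous round: blend a single local model into the global model.

    `mix` in [0, 1] is an assumed stand-in for a mobility-aware weight.
    """
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    if len(global_w) != len(local_w):
        raise ValueError("models must have the same number of parameters")
    # Element-wise convex combination of the old global and the new local model.
    return [(1.0 - mix) * g + mix * l for g, l in zip(global_w, local_w)]


def dueling_q(value: float, advantages: List[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    if not advantages:
        raise ValueError("at least one action is required")
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


if __name__ == "__main__":
    print(async_aggregate([0.0, 2.0], [1.0, 0.0], 0.5))
    print(dueling_q(1.0, [0.0, 2.0, 4.0]))
```

Because only one local model is merged per round, the update does not have to wait for slow or departing vehicles, and a smaller mix limits the influence of a model that is likely to be stale.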

References

  • [1] L. Liu, C. Chen, Q. Pei, S. Maharjan and Y. Zhang, "Vehicular Edge Computing and Networking: A Survey," Mob. Networks Appl., vol. 26, pp. 1145-1168, 2021.
  • [2] Q. Wu, Y. Zhao and Q. Fan, "Time-Dependent Performance Modeling for Platooning Communications at Intersection," IEEE Internet Things J., doi: 10.1109/JIOT.2022.3161028.
  • [3] Y. Dai, D. Xu, S. Maharjan, G. Qiao and Y. Zhang, "Artificial Intelligence Empowered Edge Computing and Caching for Internet of Vehicles," IEEE Wireless Commun. Mag., vol. 26, no. 3, pp. 12-18, Jun. 2019.
  • [4] M. A. Javed and S. Zeadally, "AI-Empowered Content Caching in Vehicular Edge Computing: Opportunities and Challenges," IEEE Network, vol. 35, no. 3, pp. 109-115, May/June 2021.
  • [5] A. Narayanan, S. Verma, E. Ramadan, P. Babaie, and Z.-L. Zhang, "DeepCache: A deep learning based framework for content caching," Proc. Workshop Netw. Meets AI ML NetAI, 2018, pp. 48–53.
  • [6] M. Yan, C. A. Chan, W. Li, L. Lei, A. F. Gygax and C. -L. I, "Assessing the Energy Consumption of Proactive Mobile Edge Caching in Wireless Networks," IEEE Access, vol. 7, pp. 104394-104404, 2019, doi: 10.1109/ACCESS.2019.2931449.
  • [7] M. Chen, Z. Yang, W. Saad, C. Yin, H. V. Poor and S. Cui, "A Joint Learning and Communications Framework for Federated Learning Over Wireless Networks," IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 269-283, Jan. 2021, doi: 10.1109/TWC.2020.3024629.
  • [8] X. Wang, C. Wang, X. Li, V. C. M. Leung and T. Taleb, "Federated Deep Reinforcement Learning for Internet of Things With Decentralized Cooperative Edge Caching," IEEE Internet Things J., vol. 7, no. 10, pp. 9441-9455, Oct. 2020, doi: 10.1109/JIOT.2020.2986803.
  • [9] L. Cui, X. Su, Z. Ming, Z. Chen, S. Yang, Y. Zhou and W. Xiao, "CREAT: Blockchain-assisted Compression Algorithm of Federated Learning for Content Caching in Edge Computing," IEEE Internet Things J., doi: 10.1109/JIOT.2020.3014370.
  • [10] R. Cheng, Y. Sun, Y. Liu, L. Xia, D. Feng and M. Imran, "Blockchain-empowered Federated Learning Approach for An Intelligent and Reliable D2D Caching Scheme," IEEE Internet Things J., doi: 10.1109/JIOT.2021.3103107.
  • [11] H. Zhu, Q. Wu, X. -J. Wu, Q. Fan, P. Fan and J. Wang, "Decentralized Power Allocation for MIMO-NOMA Vehicular Edge Computing Based on Deep Reinforcement Learning," IEEE Internet Things J., doi: 10.1109/JIOT.2021.3138434.
  • [12] Y. Dai, D. Xu, K. Zhang, S. Maharjan and Y. Zhang, "Deep Reinforcement Learning and Permissioned Blockchain for Content Caching in Vehicular Edge Computing and Networks," IEEE Trans. Veh. Technol., vol. 69, no. 4, pp. 4312-4324, April 2020, doi: 10.1109/TVT.2020.2973705.
  • [13] Z. Yu, J. Hu, G. Min, Z. Zhao, W. Miao and M. S. Hossain, "Mobility-Aware Proactive Edge Caching for Connected Vehicles Using Federated Learning," IEEE Trans. Intell. Transp. Syst., vol. 22, no. 8, pp. 5341-5351, Aug. 2021, doi: 10.1109/TITS.2020.3017474.
  • [14] J. Zhao, X. Sun, Q. Li and X. Ma, "Edge Caching and Computation Management for Real-Time Internet of Vehicles: An Online and Distributed Approach," IEEE Trans. Intell. Transp. Syst., vol. 22, no. 4, pp. 2183-2197, April 2021, doi: 10.1109/TITS.2020.3012966.
  • [15] S. Jiang, J. Liu, L. Huang, H. Wu and Y. Zhou, "Vehicular Edge Computing Meets Cache: An Access Control Scheme for Content Delivery," ICC 2020 - 2020 IEEE International Conference on Communications (ICC), 2020, pp. 1-6, doi: 10.1109/ICC40277.2020.9148755.
  • [16] C. Tang, C. Zhu, H. Wu, Q. Li and J. J. P. C. Rodrigues, "Toward Response Time Minimization Considering Energy Consumption in Caching-Assisted Vehicular Edge Computing," IEEE Internet Things J., vol. 9, no. 7, pp. 5051-5064, April 2022, doi: 10.1109/JIOT.2021.3108902.
  • [17] Y. Dai and Y. Zhang, "Adaptive Digital Twin for Vehicular Edge Computing and Networks," Journal of Communications and Information Networks (JCIN), vol. 7, no. 1, pp. 48-59, March 2022.
  • [18] G. Qiao, S. Leng, S. Maharjan, Y. Zhang and N. Ansari, "Deep Reinforcement Learning for Cooperative Content Caching in Vehicular Edge Computing and Networks," IEEE Internet Things J., vol. 7, no. 1, pp. 247-257, Jan. 2020, doi: 10.1109/JIOT.2019.2945640.
  • [19] J. Chen, H. Wu, P. Yang, F. Lyu and X. Shen, "Cooperative Edge Caching With Location-Based and Popular Contents for Vehicular Networks," IEEE Trans. Veh. Technol., vol. 69, no. 9, pp. 10291-10305, Sept. 2020, doi: 10.1109/TVT.2020.3004720.
  • [20] L. Yao, X. Xu, J. Deng, G. Wu and Z. Li, "A Cooperative Caching Scheme for VCCN With Mobility Prediction and Consistent Hashing," IEEE Trans. Intell. Transp. Syst., doi: 10.1109/TITS.2022.3171071.
  • [21] R. Wang, Z. Kan, Y. Cui, D. Wu and Y. Zhen, "Cooperative Caching Strategy With Content Request Prediction in Internet of Vehicles," IEEE Internet Things J., vol. 8, no. 11, pp. 8964-8975, June 2021, doi: 10.1109/JIOT.2021.3056084.
  • [22] D. Gupta, S. Rani, S. H. Ahmed, S. Garg, M. Jalil Piran and M. Alrashoud, "ICN-Based Enhanced Cooperative Caching for Multimedia Streaming in Resource Constrained Vehicular Environment," IEEE Trans. Intell. Transp. Syst., vol. 22, no. 7, pp. 4588-4600, July 2021, doi: 10.1109/TITS.2020.3043593.
  • [23] L. Yao, Y. Wang, X. Wang and G. Wu, "Cooperative Caching in Vehicular Content Centric Network Based on Social Attributes and Mobility," IEEE Trans. Mob. Comput., vol. 20, no. 2, pp. 391-402, Feb. 2021, doi: 10.1109/TMC.2019.2944829.
  • [24] R. Wu, G. Tang, T. Chen, D. Guo, L. Luo and W. Kang, "A Profit-Aware Coalition Game for Cooperative Content Caching at the Network Edge," IEEE Internet Things J., vol. 9, no. 2, pp. 1361-1373, Jan. 2022, doi: 10.1109/JIOT.2021.3087719.
  • [25] L. Yao, A. Chen, J. Deng, J. Wang and G. Wu, "A Cooperative Caching Scheme Based on Mobility Prediction in Vehicular Content Centric Networks," IEEE Trans. Veh. Technol., vol. 67, no. 6, pp. 5435-5444, June 2018, doi: 10.1109/TVT.2017.2784562.
  • [26] K. Zhang, S. Leng, Y. He, S. Maharjan and Y. Zhang, "Cooperative Content Caching in 5G Networks with Mobile Edge Computing," IEEE Wirel. Commun., vol. 25, no. 3, pp. 80-87, June 2018, doi: 10.1109/MWC.2018.1700303.
  • [27] K. Liu, J. K. -Y. Ng, J. Wang, V. C. S. Lee, W. Wu and S. H. Son, "Network-Coding-Assisted Data Dissemination via Cooperative Vehicle-to-Vehicle/-Infrastructure Communications," IEEE Trans. Intell. Transp. Syst., vol. 17, no. 6, pp. 1509-1520, June 2016, doi: 10.1109/TITS.2015.2495269.
  • [28] S. Wang, Z. Zhang, R. Yu and Y. Zhang, "Low-latency caching with auction game in vehicular edge computing," 2017 IEEE/CIC International Conference on Communications in China (ICCC), 2017, pp. 1-6, doi: 10.1109/ICCChina.2017.8330526.
  • [29] M. Liu, D. Li, H. Wu, F. Lyu and X. S. Shen, "Real-Time Search-Driven Caching for Sensing Data in Vehicular Networks," IEEE Internet Things J., doi: 10.1109/JIOT.2021.3134964.
  • [30] B. Ko, K. Liu, S. H. Son and K. -J. Park, "RSU-Assisted Adaptive Scheduling for Vehicle-to-Vehicle Data Sharing in Bidirectional Road Scenarios," IEEE Trans. Intell. Transp. Syst., vol. 22, no. 2, pp. 977-989, Feb. 2021, doi: 10.1109/TITS.2019.2961705.
  • [31] J. Cui, L. Wei, H. Zhong, J. Zhang, Y. Xu and L. Liu, "Edge Computing in VANETs-An Efficient and Privacy-Preserving Cooperative Downloading Scheme," IEEE J. Sel. Areas Commun., vol. 38, no. 6, pp. 1191-1204, June 2020, doi: 10.1109/JSAC.2020.2986617.
  • [32] Q. Luo, C. Li, T. H. Luan and W. Shi, "Collaborative Data Scheduling for Vehicular Edge Computing via Deep Reinforcement Learning," IEEE Internet Things J., vol. 7, no. 10, pp. 9637-9650, Oct. 2020, doi: 10.1109/JIOT.2020.2983660.
  • [33] Y. AlNagar, S. Hosny, and A. A. El-Sherif, "Towards mobility-aware proactive caching for vehicular ad hoc networks," Proc. IEEE Wireless Commun. Netw. Conf. Workshop (WCNCW), Apr. 2019, pp. 1–6.
  • [34] "Study on LTE-based V2X Services," 3rd Generation Partnership Project (3GPP), Technical Report (TR) 36.885, June 2016, version 14.0.0.
  • [35] J. Chen, H. Wu, P. Yang, F. Lyu and X. Shen, "Cooperative Edge Caching With Location-Based and Popular Contents for Vehicular Networks," IEEE Trans. Veh. Technol., vol. 69, no. 9, pp. 10291-10305, Sept. 2020, doi: 10.1109/TVT.2020.3004720.
  • [36] A. Ng, "Sparse autoencoder," CS294A Lecture notes, vol. 72, no. 2011, pp. 1–19, 2011.
  • [37] Y. Chen, Y. Ning, M. Slawski and H. Rangwala, "Asynchronous online federated learning for edge devices with non-IID data," 2020 IEEE International Conference on Big Data (Big Data), pp. 15-24, 2020.
  • [38] C. Xie, S. Koyejo, and I. Gupta, "Asynchronous federated optimization," arXiv preprint arXiv:1903.03934, 2019.
  • [39] H. -S. Lee and J. -W. Lee, "Adaptive Transmission Scheduling in Wireless Networks for Asynchronous Federated Learning," IEEE J. Sel. Areas Commun., vol. 39, no. 12, pp. 3673-3687, Dec. 2021, doi: 10.1109/JSAC.2021.3118353.
  • [40] Z. Yu, J. Hu, G. Min, H. Lu, Z. Zhao, H. Wang and N. Georgalas, "Federated Learning Based Proactive Content Caching in Edge Computing," 2018 IEEE Global Communications Conference (GLOBECOM), 2018, pp. 1-6, doi: 10.1109/GLOCOM.2018.8647616.
  • [41] Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot and N. de Freitas, "Dueling Network Architectures for Deep Reinforcement Learning," arXiv preprint arXiv:1511.06581, 2016.
  • [42] F. Harper and J. Konstan, "The movielens datasets: History and context," ACM Trans. Interact. Intell. Syst., vol. 5, no. 4, p. 19, 2016.
  • [43] S. Müller, O. Atan, M. van der Schaar and A. Klein, "Context-Aware Proactive Content Caching With Service Differentiation in Wireless Networks," IEEE Trans. Wireless Commun., vol. 16, no. 2, pp. 1024-1036, Feb. 2017.