
Content Caching-Assisted Vehicular Edge Computing Using Multi-Agent Graph Attention Reinforcement Learning

Jinjin Shen, Yan Lin, Yijin Zhang, Weibin Zhang, Feng Shu, and Jun Li
Copyright (c) 20xx IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]. This work was supported in part by the National Natural Science Foundation of China under Grants 62001225, 62071236, 62471204, and U22A2002, in part by Hainan Province Science and Technology Special Fund under Grant ZDYF2024GXJS292, in part by the Scientific Research Fund Project of Hainan University under Grant KYQD(ZR)-21008, in part by the Collaborative Innovation Center of Information Technology, Hainan University, under Grant XTCX2022XXC, in part by the Key Technologies R&D Program of Jiangsu (Prospective and Key Technologies for Industry) under Grants BE2023022 and BE2023022-2, in part by Natural Science Foundation of Jiangsu Province under Grant BK2021022532, and in part by Major Natural Science Foundation of the Higher Education Institutions of Jiangsu Province under Grant 24KJA510003. (Corresponding author: Yan Lin) J. Shen, Y. Lin, W. Zhang and Y. Zhang are with the School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China (e-mail:{jinjin.shen, yanlin, weibin.zhang}@njust.edu.cn; [email protected]). F. Shu is with the School of Information and Communication Engineering, Hainan University, Haikou 570228, China (email: [email protected]). J. Li is with the School of Information Science and Engineering, Southeast University, Nanjing 210096, China (email: [email protected]).
Abstract

In order to avoid repeated task offloading and to reuse popular task computing results, we construct a novel content caching-assisted vehicular edge computing (VEC) framework. To cope with the irregular network topology and the unknown environmental dynamics, we further propose a multi-agent graph attention reinforcement learning (MGARL) based edge caching scheme, which utilizes graph attention convolution kernels to integrate each agent's neighboring nodes' features and thereby enhance the cooperation among agents. Our simulation results show that the proposed scheme is capable of improving the utilization of caching resources while reducing the long-term task computing latency compared to the baselines.

Index Terms:
Vehicular Edge Computing, Edge Caching, Multi-Agent, Graph Attention Reinforcement Learning

I Introduction

Recently, mobile edge computing (MEC) has emerged as a promising technology for ultra-reliable and low-latency communication (URLLC) applications, such as virtual reality and autonomous driving [1]. Instead of sending tasks to the far-end cloud, MEC sinks computing and caching resources to edge nodes close to the users [2]. In vehicular networks, vehicles have limited computing capability but can offload requested tasks to edge servers (ESs) for execution [3, 4, 5], thus relieving the computing pressure on vehicles [6].

Nevertheless, in practical scenarios where vehicular users (VUs) drive during rush hour or in congested urban areas, the data they offload exhibit high spatial, temporal, and social correlation, so that different VUs request the same services simultaneously, such as navigation assistance, real-time traffic updates, or recommendations for nearby amenities [7]. As such, multiple VUs may have similar computing tasks associated with the same computing results when accessing these shared services. In this case, instead of duplicating the task uploading and computing, the previous task computing results can be cached by ESs, e.g., the surrounding roadside units (RSUs) or VUs, and further utilized by other VUs to reduce the latency of subsequent tasks [8]. Therefore, designing an optimal content caching policy for edge computing is of great significance for vehicular edge computing (VEC) networks.

Considering the unknown dynamics of the time-varying network topology and channel conditions, the edge computing content caching problem is essentially a model-free sequential decision-making problem [9]. By combining the advantages of deep learning in identifying data features and of reinforcement learning in dynamic programming, deep reinforcement learning (DRL) has been widely employed for solving such problems [10]. For example, H. Tian et al. [11] used the deep deterministic policy gradient (DDPG) approach to orchestrate the joint offloading, caching, and resource allocation decision-making for VEC, taking the time-varying content popularity into account. However, their solution relies on a centralized learning framework without cooperative learning. Additionally, X. Ye et al. [12] designed a blockchain-based collaborative computing and caching framework for VEC using the multi-agent asynchronous advantage actor-critic (A3C) approach, which verified the efficiency of cooperative learning compared to non-cooperative learning.

TABLE I: Related contributions

     | Network Topology    | Content    | Graphical   | Cooperative
     | Dynamic | Irregular | Popularity | Information | Learning
[10] |    ✓    |           |            |             |
[11] |    ✓    |           |     ✓      |             |
[12] |    ✓    |           |            |             |     ✓
[16] |         |           |            |      ✓      |
Ours |    ✓    |     ✓     |     ✓      |      ✓      |     ✓

However, the randomness of VU mobility leads to dynamics and temporal-spatial irregularities of the network topology, which cause environmental uncertainty and make environmental knowledge more difficult to learn. The aforementioned commonly used DRL solutions, relying on convolutional neural networks, are weak at extracting the temporal and spatial relationships between nodes in such irregular topologies. To solve this issue, graph convolutional neural networks have been employed in DRL; they have the benefit of acting directly on graphs and fully utilizing structural information such as the relationships among nodes [13]. Consequently, they have been used for solving optimization problems by jointly considering the information of the agents close to a given agent [14], which further enhances the cooperation of agents [15]. Although D. Wang et al. [16] explored a deep graph reinforcement learning approach for solving the joint caching, communication, and resource allocation problem, they did not carefully consider the dynamic graphical topology characteristics resulting from high vehicular mobility. In this context, this paper investigates the content caching-assisted VEC problem, taking into account the irregular and dynamic network topology as well as the cooperative learning among multiple vehicles. To the best of our knowledge, at the time of writing, this is the first attempt in the related literature to study the content caching problem in VEC networks relying on the multi-agent graph attention reinforcement learning (MGARL) framework. Compared to the existing literature, the MGARL method has the promising potential to capture the dynamics and irregularities of time-varying vehicular networks, as well as to make better use of the neighboring nodes' information to facilitate cooperative content caching decision-making.

Our main contributions are boldly and explicitly contrasted to the literature in Table I and are detailed as follows:

  • A content caching-assisted VEC framework is established where each VU needs to decide whether to cache the task computing result, with the aim of reducing computing latency and reusing the popular content.

  • The edge caching decision-making problem is constructed as a decentralized partially observable Markov Decision Process (DEC-POMDP), where each VU agent observes its local information and takes action individually based on its learned policy.

  • An MGARL-based edge caching scheme is proposed, where graph attention convolution kernels are utilized to capture relationships among agents by taking into account the neighboring agents’ features when making their own edge caching decisions.

  • Simulation results show that the proposed scheme outperforms other baselines in terms of both improving caching resource utilization and reducing long-term task computing latency.

II System Model and Problem Formulation

II-A Network Model

Figure 1: Illustration of the content caching-assisted VEC system.

As shown in Fig. 1, we consider a Manhattan vehicular network model which consists of multiple horizontal and vertical two-lane two-way roads. $L$ RSUs are deployed uniformly, and $M$ VUs drive along the roads with their velocities independently obeying a Markov-Gaussian stochastic process; a VU may change its direction at an intersection with a given probability $\eta$. Let $\mathcal{L}=\{1,2,\cdots,L\}$ and $\mathcal{M}=\{1,2,\cdots,M\}$ denote the index sets of RSUs and VUs, respectively. Both RSUs and VUs are capable of computing tasks and caching contents, and are hence referred to as service nodes (SNs). Let $\mathcal{N}=\mathcal{L}\cup\mathcal{M}=\{1,2,\cdots,N\}$ represent the index set of the $N$ SNs.

Without loss of generality, we assume that the system operates in a time-slot (TS) mode and that environmental parameters (e.g., transmit power and channel gain) remain unchanged during each TS. Additionally, new tasks with different popularity characteristics arrive every certain number of TSs, and each task can be completed within one TS. Let $\mathcal{K}=\{1,2,\cdots,K\}$ represent the index set of tasks, where task $k$ is represented by a 3-tuple $\left\langle d_{k},s_{k},b_{k}\right\rangle$ with the task size $d_{k}$, the number of CPU cycles $s_{k}$ required to process the task, and the size $b_{k}$ of the processed task content. Note that, if the content of the task result has been cached on an SN within the communication range of the VU, the content can be fetched directly from that SN. Otherwise, the VU has to compute the task itself or offload it for execution. In the latter case, after completing the task, the VU needs to decide whether to cache the content and on which SN.

II-B Communication Model

Let $P_{m}$ and $g_{m,n}^{\text{up}}(t)$ denote the transmit power of VU $m$ and the channel gain from VU $m$ to SN $n$ in TS $t$, respectively. In this framework, we consider the small-scale fading and the path loss of vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication, neglecting the effects of shadow fading [4]. We assume that inter-user interference is eliminated by allocating orthogonal resource blocks. Thus, the uplink transmission rate from VU $m$ to SN $n$ in TS $t$ is given by

\zeta_{m,n}^{\text{up}}(t)=B\log_{2}\left(1+{P_{m}g_{m,n}^{\text{up}}(t)}/{\sigma^{2}}\right), (1)

where $B$ is the bandwidth of each channel and $\sigma^{2}$ is the power of the additive white Gaussian noise, both of which are assumed to be the same in every TS. Similarly, let $P_{n}$ and $g_{n,m}^{\text{down}}(t)$ denote the transmit power of SN $n$ and the channel gain from SN $n$ to VU $m$ in TS $t$, respectively. Thus, the downlink transmission rate from SN $n$ to VU $m$ in TS $t$ is given by

\zeta_{n,m}^{\text{down}}(t)=B\log_{2}\left(1+{P_{n}g_{n,m}^{\text{down}}(t)}/{\sigma^{2}}\right). (2)
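For concreteness, a minimal Python sketch of the rate model in Eqs. (1) and (2) is given below; the dBm-to-watt conversion and the example channel gain value are our own illustrative assumptions, not values from this paper.

```python
import math

def shannon_rate_bps(bandwidth_hz, tx_power_dbm, channel_gain, noise_dbm_per_hz):
    """Achievable rate B * log2(1 + P * g / sigma^2) of Eqs. (1)-(2)."""
    p_w = 10 ** ((tx_power_dbm - 30) / 10)                          # dBm -> watts
    sigma2_w = 10 ** ((noise_dbm_per_hz - 30) / 10) * bandwidth_hz  # noise power over B
    return bandwidth_hz * math.log2(1 + p_w * channel_gain / sigma2_w)

# Example: B = 10 MHz, 23 dBm VU transmit power, -174 dBm/Hz noise,
# and an illustrative channel gain of 1e-9 (assumed for this example)
rate_up = shannon_rate_bps(10e6, 23.0, 1e-9, -174.0)
```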

II-C Computing Model

Let $k_{m}(t)$ denote the task requested by VU $m$ in TS $t$, and let $w_{m}(t)\in\{0,1\}$ denote whether VU $m$ offloads $k_{m}(t)$. If $w_{m}(t)=0$, VU $m$ computes $k_{m}(t)$ itself; otherwise, VU $m$ offloads $k_{m}(t)$ to the nearest SN $n_{m}(t)$.

  • Local Computing: Let $f_{m}^{\text{l}}$ be the computation capability of VU $m$. The latency of computing task $k_{m}(t)$ locally is then given by $T_{m}^{\text{l}}(t)=(1-w_{m}(t)){s_{k_{m}(t)}}/{f_{m}^{\text{l}}}$.

  • Task Offloading: Let $f_{n_{m}(t)}^{\text{off}}$ be the computation capability of SN $n_{m}(t)$. The latency of completing task $k_{m}(t)$ via offloading is expressed as $T_{m}^{\text{off}}(t)=w_{m}(t)(T_{m}^{\text{up}}(t)+T_{m}^{\text{exe}}(t)+T_{m}^{\text{down}}(t))$, where $T_{m}^{\text{up}}(t)={d_{k_{m}(t)}}/{\zeta_{m,n_{m}(t)}^{\text{up}}(t)}$, $T_{m}^{\text{exe}}(t)={s_{k_{m}(t)}}/{f_{n_{m}(t)}^{\text{off}}}$, and $T_{m}^{\text{down}}(t)={b_{k_{m}(t)}}/{\zeta_{n_{m}(t),m}^{\text{down}}(t)}$ denote the task uploading latency, the task computing latency, and the result content fetching latency, respectively.
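The following sketch combines the two branches of this latency model in one function; the function name, argument layout, and the example numbers are our own choices for illustration.

```python
def task_latency(w_m, d_k, s_k, b_k, f_local, f_off, rate_up, rate_down):
    """Completion latency of task k_m(t).

    w_m == 0: local computing, T_m^l = s_k / f_local.
    w_m == 1: offloading, T_m^off = uploading + edge execution + result fetching.
    """
    if w_m == 0:
        return s_k / f_local
    return d_k / rate_up + s_k / f_off + b_k / rate_down

# Illustrative comparison: 50 Mbit task, 5e7 CPU cycles, 16 Mbit result,
# 0.8 GHz local CPU vs. 1 GHz edge CPU, assumed 20/30 Mbps links
t_local = task_latency(0, 50e6, 5e7, 16e6, 0.8e9, 1e9, 20e6, 30e6)
t_off = task_latency(1, 50e6, 5e7, 16e6, 0.8e9, 1e9, 20e6, 30e6)
```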

II-D Caching Model

Let $\textbf{H}_{\text{ca}}(t)=[h_{n,k}^{\text{ca}}(t)]_{\forall n,\forall k}$ indicate whether computing result contents are cached by the SNs in TS $t$. Specifically, $h_{n,k}^{\text{ca}}(t)=1$ if the content of task $k$ is cached on SN $n$ in TS $t$, and $h_{n,k}^{\text{ca}}(t)=0$ otherwise. Note that each SN $n$ has a limited caching capacity $g_{n}(t)$, so it is impossible to cache all task contents. Let us further denote $a_{m}^{\text{ca}}(t)\in\{0,1,\cdots,N\}$ as the caching decision variable of VU $m$ in TS $t$. Specifically, if $a_{m}^{\text{ca}}(t)=n~(n\neq 0)$, VU $m$ caches the computing result content on SN $n$ and $h_{n,k}^{\text{ca}}(t)=1$; otherwise, VU $m$ does not cache the content on any SN. The caching decisions of all VUs in TS $t$ are then denoted by $\textbf{a}_{\text{ca}}(t)=[a_{1}^{\text{ca}}(t),a_{2}^{\text{ca}}(t),\cdots,a_{M}^{\text{ca}}(t)]$.

When the SNs cache the contents of $Z_{t}$ task computing results in TS $t$, we have $\mathcal{Z}=\{1,2,\cdots,Z_{t}\}$. We assume that the task content popularity follows the Zipf distribution, so the probability of task content $z\in\mathcal{Z}$ being requested is expressed as $q_{z}(t)={I_{z}(t)^{-\delta}}/{\sum_{z^{\prime}=1}^{Z_{t}}I_{z^{\prime}}(t)^{-\delta}}$, where $I_{z}(t)$ denotes the popularity rank of task content $z$ in descending order in TS $t$, and $\delta$ is the Zipf distribution parameter.
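A short sketch of this popularity model, assuming numpy and a simple rank assignment (both our own choices):

```python
import numpy as np

def zipf_request_probs(ranks: np.ndarray, delta: float) -> np.ndarray:
    """Request probability q_z = I_z^-delta / sum_z' I_z'^-delta for each
    cached content z, where ranks holds the descending popularity ranks I_z."""
    weights = ranks.astype(float) ** (-delta)
    return weights / weights.sum()

# Example: 10 cached contents ranked 1..10, delta = 0.5 as in Sec. V
probs = zipf_request_probs(np.arange(1, 11), delta=0.5)
requested = np.random.choice(np.arange(10), p=probs)  # index of the requested content
```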

II-E Problem Formulation

Our aim is to design an edge caching scheme for the VEC system that minimizes the long-term task computing latency by utilizing the limited edge caching resources. To formulate the problem, we introduce an auxiliary variable $e_{m}(t)$, where $e_{m}(t)=1$ if VU $m$ can fetch the required content directly from an SN within its communication range, and $e_{m}(t)=0$ otherwise. Thus, the total computation latency is the local computing latency or the task offloading latency when $e_{m}(t)=0$, and is the latency of fetching the cached task result when $e_{m}(t)=1$. The task computing latency of VU $m$ in TS $t$ can hence be expressed as $T_{m}(t)=(1-e_{m}(t))(T_{m}^{\text{l}}(t)+T_{m}^{\text{off}}(t))+e_{m}(t)T_{m}^{\text{down}}(t)$. Mathematically, the optimization problem is formulated as

\min_{\left\{\textbf{a}_{\text{ca}}(t)\right\}}~E\left[\sum_{t=1}^{T}\sum_{m=1}^{M}T_{m}(t)\right] (3a)
\text{s.t.}~~a_{m}^{\text{ca}}(t)\in\left\{0,1,\cdots,N\right\},~\forall m,\forall t, (3b)
\sum_{k\in\mathcal{K}}h_{n,k}^{\text{ca}}(t)b_{k}\leq g_{n}(t),~\forall n, (3c)

where constraint (3c) indicates that the occupied caching capacity at SN $n$ cannot exceed its maximum capacity $g_{n}(t)$.
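Constraint (3c) can be checked per SN with a single matrix-vector product; the sketch below assumes numpy arrays for the caching indicators, content sizes, and capacities (names are our own).

```python
import numpy as np

def caching_feasible(H_ca: np.ndarray, b: np.ndarray, g: np.ndarray) -> bool:
    """Constraint (3c): for every SN n, sum_k h_{n,k} * b_k <= g_n.

    H_ca: (N, K) binary caching indicators, b: (K,) content sizes,
    g: (N,) caching capacities.
    """
    return bool(np.all(H_ca @ b <= g))
```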

Figure 2: Illustration of the proposed MGARL-based content caching-assisted VEC scheme.

III The Design of DEC-POMDP

Considering that the vehicular environment is dynamically changing and that each VU is unable to obtain global environmental knowledge, we formulate the above edge caching decision-making problem as a DEC-POMDP. Explicitly, each VU agent obtains a local observation and takes an action individually. By interacting with the environment iteratively, the agents learn their policies so as to maximize the system reward. Next, we elaborate on the elements of the DEC-POMDP.

III-A Observation

Due to limited sensing and positioning capabilities, we assume that each VU agent can only observe its own location, its own caching state, and the current remaining caching capacities of all SNs. Accordingly, the observation of VU agent $m$ is defined as

\boldsymbol{o}_{m}(t)=[x_{m}(t),y_{m}(t),h_{m,1}^{\text{ca}}(t),h_{m,2}^{\text{ca}}(t),\cdots,h_{m,K}^{\text{ca}}(t),g_{1}(t),g_{2}(t),\cdots,g_{N}(t)], (4)

where $x_{m}(t)$ and $y_{m}(t)$ are the current horizontal and vertical coordinates of VU agent $m$, respectively.
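A minimal sketch of assembling Eq. (4) into a flat vector, assuming numpy; the function name and example dimensions are ours.

```python
import numpy as np

def build_observation(x_m, y_m, h_ca_m, g_remaining):
    """Local observation o_m(t) of Eq. (4): own coordinates, own caching
    indicators over the K tasks, and the remaining capacities of all N SNs.

    h_ca_m: (K,) binary vector, g_remaining: (N,) remaining capacities in MB.
    """
    return np.concatenate(([x_m, y_m], h_ca_m, g_remaining))

# Example with K = 10 tasks and N = 36 SNs (20 VUs + 16 RSUs, as in Sec. V)
obs = build_observation(120.0, 340.0, np.zeros(10), np.full(36, 50.0))
```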

III-B Action

Based on its learned policy and its observation in TS $t$, VU agent $m$ selects an action $a_{m}^{\text{ca}}(t)$ to decide whether to cache the content and on which SN.

III-C Reward Function

In order to minimize the long-term task computing latency, the local reward of VU agent mm can be designed as

r_{m}(t)=e_{m}(t)re_{m}-T_{m}(t)+pl(t), (5)

where $re_{m}$ is a reward that encourages caching computing results. Moreover, $pl(t)$ is negative if the caching decision made by VU agent $m$ exceeds the caching capacity, and $pl(t)=0$ otherwise. Accordingly, the system reward, defined as the sum of all local rewards, is given by $r(t)=\sum_{m=1}^{M}r_{m}(t)$.
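A direct transcription of Eq. (5) and the system reward, with the capacity penalty passed in by the caller (function names are our own):

```python
def local_reward(e_m: int, re_m: float, latency: float, penalty: float) -> float:
    """r_m(t) = e_m * re_m - T_m(t) + pl(t), Eq. (5).

    penalty is negative when the agent's caching decision overflows the
    chosen SN's capacity, and 0 otherwise.
    """
    return e_m * re_m - latency + penalty

def system_reward(local_rewards) -> float:
    """System reward r(t): the sum of all M local rewards."""
    return sum(local_rewards)
```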

IV MGARL-based Scheme

To adapt to the dynamically changing irregular topology and capture the relationships among agents, we resort to MGARL, which further exploits the cooperation of multiple agents, for solving the edge caching decision-making problem. As illustrated in Fig. 2, MGARL consists of three modules: graph modeling, latent feature generating, and parameter updating.

IV-A Graph Modeling

The vehicular network topology can be modeled as a graph $\mathcal{G}=\{\mathcal{V},\mathcal{E}\}$, where the node set $\mathcal{V}$ is the set of $M$ VUs and the edge set $\mathcal{E}$ is determined by the distances among vehicles. Specifically, an edge exists between two nodes when their distance does not exceed the maximum communication range, in which case the two vehicles are neighbors of each other. Let $\mathcal{A}_{i}^{t}$ denote the set of neighbors of node $i$ in TS $t$. Each node $i$ in the graph has node features $h_{i}^{t}$ in TS $t$, which are obtained from the observation $o_{i}^{t}$ through a fully connected network.
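The distance-based neighbor rule can be sketched as follows; the vectorized numpy form and the function name are our own choices.

```python
import numpy as np

def neighbor_sets(positions: np.ndarray, comm_range: float):
    """Neighbor sets A_i^t of the graph G = {V, E}: node j is a neighbor
    of node i when their Euclidean distance is within comm_range.

    positions: (M, 2) VU coordinates in the current TS.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist <= comm_range) & ~np.eye(len(positions), dtype=bool)
    return [np.flatnonzero(adj[i]) for i in range(len(positions))]

# Example: 4 VUs on one road, 300 m V2V range as in Sec. V
neigh = neighbor_sets(np.array([[0, 0], [100, 0], [500, 0], [550, 0]]), 300.0)
```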

Algorithm 1 Training Process for MGARL-Based Content Caching-Assisted VEC Scheme  

1:  for each agent $i\in\mathcal{V}$ do
2:     Initialize the $Q$ network parameter $\theta_{i}$ of agent $i$;
3:  end for
4:  for each episode $q\in\{1,2,\cdots,Q\}$ do
5:     Reset simulation parameters;
6:     for each TS $t\in\{1,2,\cdots,T\}$ do
7:        for each agent $i\in\mathcal{V}$ do
8:           VU $i$ completes one task;
9:           Get local observation $o_{i}^{t}$ and construct the adjacency matrix $\mathbf{C}_{i}^{t}$;
10:          Select $a_{i}^{t}$ based on the learned policy;
11:          Cache the content and calculate $r_{i}^{t}$;
12:          Get $o_{i}^{t+1}$ and $\mathbf{C}_{i}^{t+1}$;
13:          Store the tuple $\left\langle o_{i}^{t},a_{i}^{t},r_{i}^{t},o_{i}^{t+1},\mathbf{C}_{i}^{t},\mathbf{C}_{i}^{t+1}\right\rangle$ in the shared replay buffer;
14:          Encode $o_{i}^{t}$ into $h_{i}^{t}$ through a fully connected network;
15:          Integrate the features of $\mathcal{A}_{+i}^{t}$ according to $\mathbf{C}_{i}^{t}$ to generate $h_{i}^{\prime t}$, adopting multi-head attention as the convolution kernel;
16:          Sample a random minibatch of $S$ tuples from the shared replay buffer;
17:          Update $\theta_{i}$ to minimize the loss function;
18:        end for
19:     end for
20:  end for

IV-B Latent Feature Generating

The latent feature generating stage utilizes the convolutional layer to integrate the node features in node $i$'s local region, which includes $i$ and $\mathcal{A}_{i}^{t}$, so as to generate the latent feature $h_{i}^{\prime t}$ in TS $t$. Firstly, in TS $t$, all nodes' feature vectors are merged into a feature matrix $\mathbf{F}^{t}$ of size $M\times D$ in the order of index, where $D$ is the length of the feature vector. Then, let $\mathcal{A}_{+i}^{t}$ represent the set comprising node $i$ and $\mathcal{A}_{i}^{t}$, and introduce an adjacency matrix $\mathbf{C}_{i}^{t}$ of size $\lvert\mathcal{A}_{+i}^{t}\rvert\times M$ that stacks the one-hot representations of $\mathcal{A}_{+i}^{t}$. Specifically, the first row of $\mathbf{C}_{i}^{t}$ is the one-hot representation of node $i$, and the $j$th row, $j\in\{2,3,\cdots,\lvert\mathcal{A}_{+i}^{t}\rvert\}$, is the representation of the $(j-1)$th neighbor of $i$. The features in the local region of node $i$ are then obtained by $\mathbf{C}_{i}^{t}\times\mathbf{F}^{t}$.

To further capture the relationships among nodes, multi-head attention is adopted as the convolution kernel to integrate the feature vectors in the local region of node $i$ and generate the latent feature vector $h_{i}^{\prime t}$. Let $W_{Q}^{u}$, $W_{K}^{u}$, and $W_{V}^{u}$ denote the parameter matrices corresponding to the query, key, and value representations of attention head $u$, respectively. For attention head $u$, the relationship between node $i$ and its neighbor $j\in\mathcal{A}_{+i}^{t}$ in TS $t$ can be formulated as

\alpha_{i,j,t}^{u}=\frac{\exp\left(\tau\cdot W_{Q}^{u}h_{i}^{t}\cdot(W_{K}^{u}h_{j}^{t})^{\mathrm{T}}\right)}{\sum_{k\in\mathcal{A}_{+i}^{t}}\exp\left(\tau\cdot W_{Q}^{u}h_{i}^{t}\cdot(W_{K}^{u}h_{k}^{t})^{\mathrm{T}}\right)}, (6)

where $\tau$ is a scaling factor. Next, the outputs of the $U$ attention heads for node $i$ are concatenated and fed into a function $\bm{\sigma}$ to produce the output of the convolutional layer, expressed as $h_{i}^{\prime t}=\bm{\sigma}(\textbf{con}[\sum_{j\in\mathcal{A}_{+i}^{t}}\alpha_{i,j,t}^{u}W_{V}^{u}h_{j}^{t},\forall u])$, where con denotes the concatenation of the outputs of the $U$ attention heads. Note that more graphical information can be extracted by increasing the number of convolutional layers.
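A minimal PyTorch sketch of one such graph attention convolution for a single node $i$, assuming ReLU as the output function $\bm{\sigma}$ (the paper leaves $\bm{\sigma}$ unspecified) and per-head weight lists as inputs:

```python
import torch
import torch.nn.functional as F

def graph_attention_conv(H, C_i, W_Q, W_K, W_V, tau=1.0):
    """One graph attention convolution for node i, following Eq. (6) and
    the subsequent concatenation step.

    H   : (M, D) feature matrix F^t of all nodes in TS t
    C_i : (|A_+i|, M) adjacency matrix of one-hot rows; row 0 is node i
    W_Q, W_K, W_V : length-U lists of per-head matrices, each (D', D)
    Returns h_i'^t: the concatenated outputs of the U attention heads.
    """
    local = C_i @ H                                   # features of i's local region
    h_i = local[0]                                    # node i's own feature
    heads = []
    for Wq, Wk, Wv in zip(W_Q, W_K, W_V):
        scores = tau * (Wq @ h_i) @ (Wk @ local.T)    # scaled query-key products
        alpha = F.softmax(scores, dim=-1)             # attention weights alpha_{i,j,t}^u
        heads.append(alpha @ (local @ Wv.T))          # weighted sum of value vectors
    return torch.relu(torch.cat(heads))               # sigma(con[...]), sigma = ReLU here
```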

IV-C Parameter Updating

Similar to the traditional deep Q network (DQN) algorithm, which uses Q values to estimate the long-term cumulative reward of taking a certain action at the current state, the MGARL algorithm maintains both a training network and a target network, where the training network parameters are updated by minimizing a loss function built on the target network's outputs. By extracting graphical information to facilitate cooperative learning, the MGARL method can capture the dynamics and irregularities better than traditional DRL methods. In order to utilize historical data for training and improve sample utilization, the tuple $\left\langle o_{i}^{t},a_{i}^{t},r_{i}^{t},o_{i}^{t+1},\mathbf{C}_{i}^{t},\mathbf{C}_{i}^{t+1}\right\rangle$ of vehicular agent $i$ is stored in the shared replay buffer in each TS. During training, $S$ sets of samples are randomly selected to update the parameters of the training network. In particular, in order to achieve stable interactions between agents, MGARL regularizes the attention weights of the last convolutional layer by minimizing the Kullback-Leibler (KL) divergence between the attention weight distribution in the current TS and that in the next TS. The loss function is then expressed as

\mathcal{L}(\theta)=\frac{1}{S}\sum_{S}\frac{1}{M}\sum_{i=1}^{M}\left(y_{i}^{t}-Q_{i}(\mathcal{O}_{i,\mathbf{C}_{i}^{t}},a_{i}^{t};\theta)\right)^{2}+\lambda\frac{1}{U}\sum_{u=1}^{U}D_{\mathrm{KL}}\left(\mathcal{G}_{u,t}^{k}(\mathcal{O}_{i,\mathbf{C}_{i}^{t}};\theta)\,\big\|\,\mathcal{G}_{u,t+1}^{k}(\mathcal{O}_{i,\mathbf{C}_{i}^{t+1}};\theta)\right), (7)

where $y_{i}^{t}=r_{i}^{t}+\gamma\max_{a^{\prime}}Q_{i}^{\prime}(\mathcal{O}_{i,\mathbf{C}_{i}^{t+1}},a_{i}^{\prime};\theta^{\prime})$ is the target Q value generated by the target network with parameters $\theta^{\prime}$, and $\gamma$ is the discount factor. $\mathcal{O}_{i,\mathbf{C}_{i}^{t}}$ denotes the set of observations of the nodes in agent $i$'s local region determined by the adjacency matrix $\mathbf{C}_{i}^{t}$. The Q function parameterized by $\theta$ takes $\mathcal{O}_{i,\mathbf{C}_{i}^{t}}$ as input and outputs the $Q$ value of agent $i$. $\lambda$ is the coefficient of the regularization term, and $\mathcal{G}_{u,t}^{k}(\mathcal{O}_{i,\mathbf{C}_{i}^{t}};\theta)$ denotes the attention weight distribution over the relation representations of attention head $u$ in convolutional layer $k$ for agent $i$. The details of the training process of the proposed MGARL-based algorithm are listed in Algorithm 1.
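A hedged PyTorch sketch of Eq. (7) over one minibatch; the tensor shapes, the kl_div usage, and the value of $\lambda$ are our own assumptions (the paper does not report $\lambda$).

```python
import torch
import torch.nn.functional as F

def mgarl_loss(q_train, q_target, actions, rewards, attn_t, attn_t1,
               gamma=0.9, lam=0.1):
    """Minibatch loss of Eq. (7): mean TD error plus a KL regularizer that
    keeps the last layer's attention distribution stable across TSs.

    q_train : (S, M, A) Q values of the training network at TS t
    q_target: (S, M, A) Q values of the target network at TS t+1
    actions : (S, M) long tensor of taken actions; rewards: (S, M)
    attn_t, attn_t1: attention weight distributions at TS t and t+1
    """
    y = rewards + gamma * q_target.max(dim=-1).values            # target y_i^t
    q_sa = q_train.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    td = ((y.detach() - q_sa) ** 2).mean()                       # TD term
    kl = F.kl_div(attn_t1.log(), attn_t, reduction='batchmean')  # D_KL(G_t || G_t+1)
    return td + lam * kl
```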

IV-D Complexity Analysis

Let us now discuss the complexity of the graph attention convolutional layer in TS $t$, which includes the feature mapping of nodes and the calculation of attention weights. We assume that the graph constructed in TS $t$ consists of $\lvert\mathcal{E}_{t}\rvert$ edges and that each node feature is mapped from dimension $F$ to dimension $F^{\prime}$. Then, the computational complexity of mapping the node features is $\mathcal{O}\left(MFF^{\prime}\right)$, while the computational complexity of calculating the attention weights is related to the number of edges and is expressed as $\mathcal{O}\left(\lvert\mathcal{E}_{t}\rvert F^{\prime}\right)$. Therefore, the total computational complexity with $U$ attention heads is determined by the number of VUs, the number of edges, and the node feature dimensions, given by $\mathcal{O}\left(U(MFF^{\prime}+\lvert\mathcal{E}_{t}\rvert F^{\prime})\right)$.

V Simulation Results

V-A Simulation Settings

We consider a Manhattan vehicular network model, where the length and the width of the road network are both $1$ km. By default, we set $M=20$, $L=16$, $d_{k}\in\left[50,80\right]$ MB, $b_{k}\in\left[6,16\right]$ MB, $s_{k}=50$ Megacycles $(\forall k)$, $B=10$ MHz, $\sigma^{2}=-174$ dBm/Hz, $\delta=0.5$, $re_{m}=2~(\forall m)$, and $pl\in\{0,-0.5\}$. The transmit powers of the RSUs and VUs are set to $40$ dBm and $23$ dBm, respectively. The caching capacities of the RSUs and VUs are $100$ MB and $50$ MB, respectively. The computation capabilities of the RSUs and VUs are $1$ GHz and $0.8$ GHz, respectively. The communication ranges of the RSUs and of V2V communication are set to $400$ m and $300$ m, respectively. We adopt the channel model of [4]. Regarding the VU mobility model, we set $\eta=0.4$ and the mean range of the VUs' velocities to $[36,54]$ km/h; for the other mobility parameters, please refer to [17]. With regard to the learning configuration, $\gamma=0.9$, $S=128$, the learning rate is $0.001$, and the replay buffer capacity is $2000$. For comparison, we consider the proposed scheme without attention (w/o attention), the multi-agent independent double deep Q network (IDDQN) based scheme, and the random content caching scheme as baselines.
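For convenience, the default settings above can be collected into a single configuration; the dictionary layout and key names below are our own, not part of the paper.

```python
# Default simulation settings of Sec. V-A (key names are our own choice)
CONFIG = {
    "num_vus": 20, "num_rsus": 16,                  # M, L
    "task_size_mb": (50, 80),                       # d_k range
    "result_size_mb": (6, 16),                      # b_k range
    "cpu_megacycles": 50,                           # s_k
    "bandwidth_mhz": 10, "noise_dbm_per_hz": -174,  # B, sigma^2
    "zipf_delta": 0.5, "cache_reward": 2.0,         # delta, re_m
    "penalty": (0, -0.5),                           # pl(t)
    "tx_power_dbm": {"rsu": 40, "vu": 23},
    "cache_capacity_mb": {"rsu": 100, "vu": 50},
    "cpu_freq_ghz": {"rsu": 1.0, "vu": 0.8},
    "comm_range_m": {"rsu": 400, "v2v": 300},
    "turn_prob": 0.4,                               # eta
    "gamma": 0.9, "batch_size": 128,                # gamma, S
    "learning_rate": 1e-3, "replay_capacity": 2000,
}
```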

V-B Performance Evaluation

Figure 3: Comparison of the convergence.

Fig. 3 presents the convergence performance of all schemes. It can be observed that the cumulative reward of the proposed scheme is higher than those of the baselines. This is attributed to the fact that the proposed scheme, like the w/o attention and IDDQN-based baselines, can learn its policy by continuously interacting with the environment. Moreover, the proposed scheme utilizes graph attention convolution kernels to capture the relationships among agents and further enhance their cooperation, which makes it more adaptable to irregular network topologies and consequently yields the highest cumulative reward.

Figure 4: The converged content hit ratio versus the number of VUs.
Figure 5: The converged total system latency versus the number of RSUs.

The converged content hit ratio versus the number of VUs is investigated in Fig. 4 for the cases of 10 and 20 content types (CTs). The first observation is that the converged content hit ratio increases with the number of VUs. This is because more users generate more content requests, which increases the probability of reusing the same cached content. Secondly, the content hit ratio drops when the number of CTs is increased from 10 to 20. The underlying reason is that, due to the limited caching capacity of the SNs, not all contents can be cached, which decreases the hit ratio of each content. Moreover, our proposed scheme outperforms the other schemes in terms of the content hit ratio regardless of the number of CTs and the number of VUs, which verifies its effectiveness in improving caching resource utilization.

Fig. 5 compares the converged total system latency versus the number of RSUs. Firstly, as the number of RSUs increases, the converged total system latency exhibits a downward trend. The underlying reason for this phenomenon is that with more RSUs having computation and caching capability, VUs are more likely to fetch the cached result content from the surrounding RSUs, thus avoiding the repeated uploading and computing process. Secondly, it can be seen that the proposed scheme outperforms other schemes under different numbers of RSUs, since multiple graph attention convolutional layers can extract more hidden structural information to facilitate cooperative learning. This phenomenon implies that the proposed scheme can make better use of densely deployed cache resources to further reduce the total system latency.

VI Conclusion

In this paper, we proposed a content caching-assisted VEC framework that accounts for the reuse of popular task computing results. To adapt to the irregular network topology and the environmental uncertainty, we developed an MGARL-based edge caching scheme for VEC networks, which exploits the cooperation among agents by integrating neighboring nodes' information into decision-making. Compared to the baselines, the proposed scheme better learns the irregular topology dynamics, thus significantly reducing the task computing latency and improving the utilization of densely deployed caching resources.

References

  • [1] K. Jiang, H. Zhou, X. Chen, and H. Zhang, “Mobile edge computing for ultra-reliable and low-latency communications,” IEEE Commun. Mag., vol. 5, no. 2, pp. 68–75, 2021.
  • [2] Z. Yao, S. Xia, Y. Li, and G. Wu, “Cooperative task offloading and service caching for digital twin edge networks: A graph attention multi-agent reinforcement learning approach,” IEEE J. Sel. Areas Commun., vol. 41, no. 11, pp. 3401–3413, 2023.
  • [3] H. Zhou, K. Jiang, S. He, G. Min, and J. Wu, “Distributed deep multi-agent reinforcement learning for cooperative edge caching in Internet of Vehicles,” IEEE Trans. Wireless Commun., vol. 22, no. 12, pp. 9595–9609, 2023.
  • [4] Y. Lin, Y. Zhang, J. Li, F. Shu, and C. Li, “Popularity-aware online task offloading for heterogeneous vehicular edge computing using contextual clustering of bandits,” IEEE Internet of Things J., vol. 9, no. 7, pp. 5422–5433, 2022.
  • [5] Y. Lin, J. Bao, Y. Zhang, J. Li, F. Shu, and L. Hanzo, “Privacy-preserving joint edge association and power optimization for the Internet of Vehicles via federated multi-agent reinforcement learning,” IEEE Trans. Veh. Technol., vol. 72, no. 6, pp. 8256–8261, 2023.
  • [6] L. Liu, X. Yuan, N. Zhang, D. Chen, K. Yu, and A. Taherkordi, “Joint computation offloading and data caching in multi-access edge computing enabled Internet of Vehicles,” IEEE Trans. Veh. Technol., vol. 72, no. 11, pp. 14939–14954, 2023.
  • [7] M. W. Al Azad and S. Mastorakis, “The promise and challenges of computation deduplication and reuse at the network edge,” IEEE Wireless Commun., vol. 29, no. 6, pp. 112–118, 2022.
  • [8] F. Zeng, K. Zhang, L. Wu, and J. Wu, “Efficient caching in vehicular edge computing based on edge-cloud collaboration,” IEEE Trans. Veh. Technol., vol. 72, no. 2, pp. 2468–2481, 2023.
  • [9] G. Qiao, S. Leng, S. Maharjan, Y. Zhang, and N. Ansari, “Deep reinforcement learning for cooperative content caching in vehicular edge computing and networks,” IEEE Internet of Things J., vol. 7, no. 1, pp. 247–257, 2020.
  • [10] L. Geng, H. Zhao, J. Wang, A. Kaushik, S. Yuan, and W. Feng, “Deep-reinforcement-learning-based distributed computation offloading in vehicular edge computing networks,” IEEE Internet of Things J., vol. 10, no. 14, pp. 12416–12433, 2023.
  • [11] H. Tian, X. Xu, L. Qi, X. Zhang, W. Dou, S. Yu, and Q. Ni, “Copace: Edge computation offloading and caching for self-driving with deep reinforcement learning,” IEEE Trans. Veh. Technol., vol. 70, no. 12, pp. 13281–13293, 2021.
  • [12] X. Ye, M. Li, P. Si, R. Yang, Z. Wang, and Y. Zhang, “Collaborative and intelligent resource optimization for computing and caching in IoV with blockchain and MEC using A3C approach,” IEEE Trans. Veh. Technol., vol. 72, no. 2, pp. 1449–1463, 2023.
  • [13] R. Akmam Dziyauddin, D. Niyato, N. Cong Luong, M. A. M. Izhar, M. Hadhari, and S. Daud, "Computation offloading and content caching delivery in vehicular edge computing: A survey," arXiv e-prints, 2019.
  • [14] J. Jiang, C. Dun, T. Huang, and Z. Lu, “Graph convolutional reinforcement learning,” arXiv preprint arXiv:1810.09202, 2018.
  • [15] Z. Yao, Y. Li, S. Xia, and G. Wu, “Attention cooperative task offloading and service caching in edge computing,” in GLOBECOM 2022 - 2022 IEEE Global Communications Conference, pp. 5189–5194, 2022.
  • [16] D. Wang, Y. Bai, G. Huang, B. Song, and F. R. Yu, “Cache-aided MEC for IoT: Resource allocation using deep graph reinforcement learning,” IEEE Internet of Things J., vol. 10, no. 13, pp. 11486–11496, 2023.
  • [17] Y. Lin, Z. Zhang, Y. Huang, J. Li, F. Shu, and L. Hanzo, “Heterogeneous user-centric cluster migration improves the connectivity-handover trade-off in vehicular networks,” IEEE Trans. Veh. Technol., vol. 69, no. 12, pp. 16027–16043, 2020.