
Mobility-Aware Routing and Caching in Small Cell Networks using Federated Learning

Yuwen Cao, Setareh Maghsudi, and Tomoaki Ohtsuki. This paper was presented in part at the IEEE Conference on Communications (ICC) 2021 [1]. Y. Cao and T. Ohtsuki are with the Graduate School of Science and Technology, Keio University, Yokohama 223-8522, Japan (e-mail: [email protected]; [email protected]). S. Maghsudi is with the Department of Computer Science, University of Tübingen, 72074 Tübingen, Germany, and with the Fraunhofer Heinrich-Hertz-Institute, 10587 Berlin, Germany (e-mail: [email protected]). The work of Y. Cao and T. Ohtsuki was supported by JSPS KAKENHI Grant Number JP20J12528. The work of S. Maghsudi was supported by Grant 16KIS1165 from the German Federal Ministry of Education and Research.
Abstract

We consider a service cost minimization problem for resource-constrained small-cell networks with caching, where the challenge mainly stems from (i) the insufficient backhaul capacity and limited network bandwidth and (ii) the limited storage capacity of small-cell base stations (SBSs). Besides, the optimization problem is NP-hard, and both the users’ mobility patterns and content preferences are unknown. In this paper, we develop a novel mobility-aware joint routing and caching strategy to address these challenges. The designed framework divides the entire geographical area into small sections, each containing one SBS and several mobile users (MUs). Based on the concept of the one-stop-shop (OSS), we propose a federated routing and popularity learning (FRPL) approach in which the SBSs cooperatively learn the routing and preferences of their respective MUs and make caching decisions. The FRPL method completes multiple tasks in one shot, thus reducing the average processing time per global aggregation of learning. By exploiting the outcomes of FRPL together with the estimated serving edge of the SBSs, the proposed cache placement solution greedily approximates the minimizer of the challenging service cost optimization problem. Theoretical and numerical analyses show the effectiveness of our proposed approaches.

Index Terms: Caching, federated learning, routing, mobility patterns, small cell network, one-stop-shop, multiple tasks.

I Introduction

Wireless caching is a promising concept to reduce the peak traffic and the backhaul load, particularly for video content delivery [2, 3, 4]. The caching concept originates from the content reuse property of video streaming, i.e., users are likely to request the same video content. Thus, saving popular content at local small-cell base stations (SBSs) during off-peak times, or pushing it to user devices directly through broadcasting, improves the network’s throughput and coverage performance. Besides, it enhances the users’ quality of experience (QoE) [5, 6].

Realizing the potential of wireless caching necessitates optimizing several parameters. In the absence of sufficient prior knowledge, one can take advantage of a recently-emerged concept, namely federated learning (FL). FL enables several partners to jointly learn the parameters of a specific model in a distributed manner, i.e., without requiring any data exchange [7]. Thus, using FL strongly reduces the amount of data uploaded via the wireless uplink channel. Besides, FL retains the benefits of a cognitive and swift reaction to mobility and other characteristics of cellular networks, while preserving personal data privacy [8]. Moreover, the FL concept is a widespread tool for wireless edge network optimization [8, 9, 10, 11, 12], due to the following three reasons: (i) The ever-increasing number of devices in the internet of things (IoT) generates an enormous amount of data at the network edge. That entails an enabling technology and efficient approaches for sensitive data management and utilization in wireless edge networks; (ii) Under the computation and storage constraints of increasingly complex edge networks, conventional network optimization approaches built on static models perform poorly in modeling dynamic networks; (iii) Conventional machine learning-based approaches can optimize decision-making through interactions with the dynamic environment. However, such methods require user data as input, which might be sensitive or inaccessible due to regulatory constraints. Thus, FL is an enabling technology for resource-constrained network optimization (concerning computation and storage resources), and for bringing intelligence to the wireless edge network.

Reference [13] formulates a joint link scheduling and power allocation problem for device-to-device (D2D)-assisted wireless caching networks. The authors then decompose the original problem into two subproblems and solve them alternately using the convex optimization method [14]. In [15], Wang et al. investigate the NP-hard cache placement problem for D2D-assisted networks. The authors propose a dynamic programming algorithm that greedily assigns the contents to the empty caches. In [16], the authors investigate an energy-efficient transmission mode selection problem for D2D-assisted small-cell networks. By relaxing the hard problem and using the bandit theory [17], they achieve the maximum network utility. Furthermore, reference [18] studies a joint transmission and caching policy optimization problem for D2D-assisted small-cell networks, given that the users’ demands are known a priori. Wireless caching in small-cell networks is also studied in [19, 20, 21, 22]. Reference [19] develops a context-aware proactive caching approach for small-cell networks. Such an approach and its variant in [20] learn the context-specific content popularity online. In [21], the authors introduce a learning-based approach to caching in heterogeneous small-cell networks. Besides, [22] investigates a joint femto-caching and power control problem in small-cell networks without any prior information on file popularity.

References [23, 24, 25] study caching based on network coding (coded-multicast). [23] investigates the caching policy optimization based on multicast transmissions. The results reveal that properly combining caching and multicast transmission reduces energy costs. By jointly optimizing the cache placement and multicast transmission, [24] proposes a novel coded caching approach that can guarantee a global caching gain. In [25], the authors develop an optimal femto-cache placement strategy that maximizes the number of network coded file downloads from femto-caches, thereby significantly reducing the macro-cell base station (MBS) bandwidth usage.

TABLE I:
The State-of-the-Art Research on Caching in Small-Cell Networks
Ref. | Optimization Problem | Prior Information | Resource-Constrained | Merit
[15] | Transmission, caching policy | Popularity | None | Low energy and economical cost
[16] | Cache placement | User context | Yes | High cache hit ratio
[18] | Cache placement | User demand | None | A lower bound on the training time
[22] | Transmission, caching policy | Popularity | None | Low energy cost
[25] | Cache placement | Popularity | None | Low computation complexity
[26] | Energy efficiency, cache placement | Popularity, perfect CSI | None | High cache hit ratio
[27] | Caching utility | Popularity, mobility data | None | High caching utility
Our work | Routing, cache placement | None | Yes | See Section I-A

Mobility-aware caching exploits users’ mobility statistics for allocating caching resources [26, 27, 28, 29]. Reference [26], e.g., proposes a mobility-aware cache placement strategy that maximizes the data offloading ratio in D2D-assisted networks, given precise prior information on file popularity. Similarly, [27] assumes that each SBS has global channel state information (CSI) and file popularity. Based on this assumption, the authors develop a green mobility-aware caching model to jointly optimize the cache placement and power allocation among SBSs and mobile devices. Besides, [28] explores the mobility-aware content caching problem for small-cell networks to maximize the caching gain, given the mobile users’ (MUs’) preferences and mobility patterns. Using prior knowledge about the MUs’ locations at a central processor, [29] proposes a cooperative content caching approach that maximizes the sum mean opinion score (MOS) of the MUs. Table I summarizes the state-of-the-art research in caching for resource-constrained small-cell networks.

I-A Summary of Contributions

In the majority of the references mentioned above, the popularity profile of content files is either known a priori or modeled as a Zipf-like distribution [30], [31]. However, such an assumption is unrealistic in practice, particularly for dense small-cell networks: (i) The file popularity is often non-stationary due to the dynamic nature of contents; (ii) Perfect knowledge of file popularity is technically difficult to guarantee, especially under intensive instantaneous demands; (iii) File popularity is an outcome of the decisions of a dense population, and it is very costly to predict [4]. In such a situation, developing a low-cost and effective learning-based approach for estimating the files’ popularity becomes imperative. Besides, to the best of our knowledge, no research has so far considered joint routing and cache placement optimization for resource-constrained small-cell networks.

Against this background, we focus on developing mobility-aware routing and caching strategies for dynamic small-cell networks under uncertainty. To jointly optimize the routing and cache placement, we first formulate the network cost minimization problem, which is an NP-hard integer programming (NIP) problem. We propose a federated routing and popularity learning (FRPL) approach by which the SBSs learn the popularity of files as a function of the non-stationary pedestrian- and request densities. To avoid unnecessary data retransmission via the backhaul, we develop a novel content transmission protocol that improves the cache-efficiency (CE) performance of SBSs. Besides, we optimize the cache placement using an algorithm that greedily approximates the minimizer of the NIP problem. The contributions of this paper are as follows:

  • Motivated by the notion of the one-stop-shop (OSS), we propose an FRPL approach that enables SBSs to complete multiple tasks simultaneously, thus reducing the processing time. That is crucial as the time slot within one global aggregation is limited.

  • The proposed FRPL method jointly learns the routing and content popularity, considering the influence of both the users’ mobility and the SBSs’ geographical locations. Therefore, it remains robust even in a highly dynamic and non-stationary network.

  • We design a cache placement approach that exploits the outcome of the FRPL method, combined with a novel transmission protocol, to minimize the network’s content delivery cost by optimizing the cached content and the delivery strategy.

  • Numerical results show that our cache placement policy guarantees a higher cache hit rate for SBSs compared to the existing schemes, although the MUs’ demand density is a priori unknown. Besides, the proposed greedy cache placement approach reduces the network cost compared to the state-of-the-art research. We also show the effectiveness of the proposed FRPL approach.

In Section II, we present the network model, FL model, and content request model. We then formulate the associated resource-constrained network optimization problem. In Section III, we propose an FRPL approach for multi-task learning. The analysis of the proposed model and solution appear in Section IV. Section V includes the numerical analysis. Section VI concludes the paper.

TABLE II:
List of Most Important Variables
Symbol | Definition/Explanation
$Q_{i}$ | The number of samples collected by participant $i$.
$\boldsymbol{\rho}_{i}$ | The local FL model hyperparameters.
$\boldsymbol{\beta}=\frac{1}{Q}\sum_{i=1}^{I}Q_{i}\boldsymbol{\rho}_{i}$ | The update of the global FL model.
$p_{f}$ | The probability that file $f\in\{1,\cdots,M\}$ is requested at time $t$; $p_{f}\in[0,1]$.
$c_{k,f}$ | The binary caching decision for file $f$ at SBS $k\in\{1,\cdots,K\}$ at time $t$; $c_{k,f}\in\{0,1\}$.
$\psi_{k,t}$ | The pedestrian density at time $t$ at the site of SBS $k$.
$\lambda_{f,t}$ | The expected request density of a specific file $f$ at time $t$.
$C_{k}^{S}$ | The cache capacity of SBS $k$.
$\alpha_{C_{f}}$ | The cost incurred by caching file $f$ at each SBS.
$\beta_{M}$; $\beta_{M_{k}}$; $\beta_{S_{k}}$ | The costs incurred by retrieving file $f$ from the backhaul to the MBS, from the MBS to SBS $k$, or directly from SBS $k$.
$d_{k,f}$ | The normalized cost incurred by retrieving file $f$ from the MBS to SBS $k$.
$L_{1}(\boldsymbol{\rho}_{1,i})$; $L_{2}(\boldsymbol{\rho}_{2,i})$ | The loss functions for training the local FL model hyperparameters $\boldsymbol{\rho}_{1,i}$ ($\boldsymbol{\rho}_{2,i}$) w.r.t. the pedestrian (request)-density prediction task.
$(x_{k}(t),y_{k}(t))$ | The final location of the centroid of cluster $k$.
$N_{k,t}^{+}(j)$; $N_{k,t}^{-}(j)$ | The statistical density functions of the pedestrians that might enter (leave) cell $k$ from (to) neighboring cell $j$.
$\nabla f(\boldsymbol{\beta}_{q,t},\mathbf{x}_{q,i,j},y_{q,i,j})$ | The gradient of $f(\boldsymbol{\beta}_{q,t},\mathbf{x}_{q,i,j},y_{q,i,j})$ w.r.t. the global FL model $\boldsymbol{\beta}_{q,t}$.
$\nabla F(\boldsymbol{\beta}_{q,t},\mathbf{x}_{q,i,j},y_{q,i,j})$ | The gradient of $F(\boldsymbol{\beta}_{q,t},\mathbf{x}_{q,i,j},y_{q,i,j})$ w.r.t. the global FL model $\boldsymbol{\beta}_{q,t}$.
$\mathcal{L}_{k,t}$ | The serving edge: the set of MUs that SBS $k$ will serve at time $t$.

Notations: Bold lower-case letters represent vectors, whereas bold upper-case letters denote matrices. $[\cdot]^{T}$, $\|\cdot\|_{0}$, and $\|\cdot\|$ denote the transpose, $l_{0}$-norm, and $l_{2}$-norm operations, respectively. Besides, $\mathbb{E}[\cdot]$ denotes the statistical expectation operation, and $\nabla$ represents the gradient of a function. Table II summarizes the frequently used variables in the order of appearance in the paper.

II System Model and Problem Formulation

II-A Network Model

We consider a network consisting of KK SBSs and one MBS. The network serves UU MUs simultaneously. Due to the dense deployment, the coverage areas of the SBSs and the MBS overlap; hence, at each time, an MU might be in the communication range of multiple entities that can contribute to learning and service delivery (contributors or participants). Before proceeding, we present a definition.

Definition 1 (Global Aggregation):

A global aggregation at period TT refers to the required time to perform the cascaded FRPL and cache placement approaches until convergence, i.e., completing the involved tasks.

Figure 1: The learning procedure with $I$ participants.

Throughout the paper, we use two time scales: (i) $t=\{1,2,\cdots,\boldsymbol{\theta}\}$, which refers to a learning round (period) in the FRPL method, where $\boldsymbol{\theta}$ is the required time until the global loss function converges; and (ii) $T(\boldsymbol{\theta})=\{1,2,\cdots\}$, which corresponds to the instances of global aggregation. Thus, each global aggregation consists of several learning rounds. The proposed cache placement approach takes several iterations; however, at any one time instant (learning period), both the routing strategy and the cache placement remain fixed.

FL is the core of the proposed FRPL methodology. The MBS and SBSs collaboratively learn a shared model while each entity retains its training data at its side. The FL model trained at the participant’s side is the local model. The MBS integrates the local models and generates the global model, which improves the local model of each participant. Fig. 1 shows the workflow and details follow.

Consider $I$ participants labeled $\{1,\cdots,I\}$. We denote the input data of participant $i$, $i\in\{1,\cdots,I\}$, by $\mathbf{X}_{i}=[\mathbf{x}_{i,1},\cdots,\mathbf{x}_{i,Q_{i}}]$, where $Q_{i}$ indicates the number of samples. Thus, $\mathbf{X}_{i}$ is the input to train the local FL model. The output vector is $\mathbf{y}_{i}=[y_{i,1},\cdots,y_{i,Q_{i}}]^{T}$. The vector $\boldsymbol{\rho}_{i}$ captures the parameters of the local FL model. Under a linear model assumption, we have $\mathbf{y}_{i}=\mathbf{X}_{i}\boldsymbol{\rho}_{i}+\mathbf{n}$, where $\mathbf{n}$ is measurement noise, typically approximated as i.i.d. Gaussian samples. In the standard gradient descent (SGD) methods [32], [33], the training and update procedure of the local FL model aims at finding the optimal parameters $\boldsymbol{\rho}_{i}$ that minimize the squared-error cost function $f(\boldsymbol{\rho}_{i},\mathbf{X}_{i},\mathbf{y}_{i})=\|\mathbf{X}_{i}\boldsymbol{\rho}_{i}-\mathbf{y}_{i}\|^{2}$. We define a vector $\boldsymbol{\beta}$ to capture the parameters related to the global FL model. The update of the global model is therefore given by $\boldsymbol{\beta}=\frac{1}{Q}\sum_{i=1}^{I}Q_{i}\boldsymbol{\rho}_{i}$. Section III provides more details of the FRPL framework.
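The local least-squares fit and the weighted global aggregation described above can be sketched as follows. This is a minimal illustration in Python; the variable names, dimensions, sample counts, and noise level are our own assumptions, not the paper's:

```python
import numpy as np

def local_fit(X, y):
    """Least-squares fit of the local linear model y = X @ rho (+ noise)."""
    rho, *_ = np.linalg.lstsq(X, y, rcond=None)
    return rho

def global_aggregate(local_params, sample_counts):
    """Global update: beta = (1/Q) * sum_i Q_i * rho_i, with Q = sum_i Q_i."""
    Q = sum(sample_counts)
    return sum(q * r for q, r in zip(sample_counts, local_params)) / Q

rng = np.random.default_rng(0)
true_rho = np.array([2.0, -1.0])     # ground-truth model (illustrative)
datasets = []
for Q_i in (50, 80, 120):            # heterogeneous sample counts Q_i
    X = rng.normal(size=(Q_i, 2))
    y = X @ true_rho + 0.01 * rng.normal(size=Q_i)   # y_i = X_i rho_i + n
    datasets.append((X, y))

local_models = [local_fit(X, y) for X, y in datasets]
beta = global_aggregate(local_models, [len(y) for _, y in datasets])
```

Note that each local model is weighted by its own sample count, so participants with more data contribute proportionally more to the global model.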

II-B Content Request Model

We consider a content library with $M$ contents. The size of each file $f\in\{1,\cdots,M\}$ is $g_{f}$. The MUs request file $f$ randomly and independently with probability $p_{f}\in[0,1]$, so that $\mathbf{p}=[p_{1},\cdots,p_{M}]^{T}$ is the request probability vector. Traditionally, upon requesting a file within the time deadline $\tau$, an MU obtains it through random caching, local SBS caching, or MBS caching [2]. Nevertheless, such naive caching mechanisms barely guarantee a high cache efficiency for the following reasons: (i) they largely neglect the limited storage capacity and the finite bandwidth; (ii) they often retransmit data unnecessarily. Therefore, we decompose the cache domain of an MU’s required contents into three categories:

  • Local Caching: The MU first checks the local SBS. If the required content is cached, then the MU obtains it directly from there within the time deadline τ\tau. Otherwise, the MU receives it from one of the following sources:

  • Intra-Cell Caching: If the local SBS does not have the required content, it can fetch it from other SBSs in the intra-cell domain upon availability so that the MU is served within the deadline τ\tau.

  • Inter-Cell Caching: If no SBS in the intra-cell domain has the required content, the local SBS fetches it via the backhaul link, from an external SBS deployed in the other overlapped cells or, in the worst case, from the MBS.
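The three-tier lookup order above can be summarized in a short sketch. The function and variable names are hypothetical, and a real SBS would also enforce the time deadline $\tau$, which we omit here:

```python
def resolve_request(file_id, local_sbs, intra_cell_sbss):
    """Return the serving tier for a requested file, following the
    local -> intra-cell -> inter-cell lookup order described above."""
    if file_id in local_sbs:                       # Local Caching
        return "local"
    if any(file_id in sbs for sbs in intra_cell_sbss):  # Intra-Cell Caching
        return "intra-cell"
    # Worst case: fetch over the backhaul from an external SBS or the MBS.
    return "inter-cell"

# Toy caches, represented as sets of file ids.
local_cache = {1, 4}
intra_caches = [{2, 5}, {7}]
tier = resolve_request(7, local_cache, intra_caches)  # served by a neighboring SBS
```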

II-C Problem Statement

We focus on the following challenges: (i) When MUs migrate into one area, efficiently retrieving several requested files over the backhaul at minimum cost becomes complicated; (ii) The MUs’ preferences affect the cache efficiency of the SBSs. Moreover, distributed caching might result in duplicate files in a small area; (iii) Joint learning, cache placement, and service provisioning can yield long delays.

Let $\mathbf{C}\in\{0,1\}^{K\times M}$ be the binary caching strategy matrix, where $c_{k,f}=1$ indicates that the $k$th SBS stores file $f$, and $c_{k,f}=0$ indicates otherwise. Let $\sum_{k=1}^{K}\sum_{f=1}^{M}c_{k,f}\cdot\alpha_{C_{f}}$ denote the SBSs’ aggregated caching cost, where $\alpha_{C_{f}}$ is the cost of caching content $f$, which mainly depends on the file size $g_{f}$. The expected cost of retrieving content $f$ from SBS $k\in\{1,\cdots,K\}$ within a global aggregation period can be written as

J^{t}_{k,f}(\psi_{k,t},p_{f})=\psi_{k,t}\cdot p_{f}\cdot\left(c_{k,f}\cdot\beta_{S_{k}}+(1-c_{k,f})\cdot d_{k,f}\right), \quad (1)

where $\psi_{k,t}$ represents the pedestrian density at time slot $t$ at the site of SBS $k$. Besides, $\lambda_{f,t}=\psi_{k,t}\cdot p_{f}$ indicates the expected request density (i.e., the number of requests per time slot) of a specific file $f$ at time $t$. Finally, $d_{k,f}=\prod_{l\neq k}^{K}(1-c_{l,f})(\beta_{M}+\beta_{M_{k}}+\beta_{S_{k}})$ corresponds to the worst-case cost, i.e., fetching file $f$ over the MBS backhaul. Here, $\beta_{M}$ is a constant cost for retrieving $f$ via the MBS backhaul, $\beta_{M_{k}}$ denotes the cost of retrieving $f$ for SBS $k$ from the MBS backhaul, and $\beta_{S_{k}}$ refers to the cost incurred by retrieving $f$ for the MUs from SBS $k$. Besides, $\lambda_{f,t}\cdot c_{k,f}\cdot\beta_{S_{k}}$ is the expected cost of retrieving $f$ from SBS $k$, if available. Minimizing the aggregated cost $J^{t}_{k,f}(\psi_{k,t},p_{f})$ in (1) over the $K$ SBSs, i.e., $\text{minimize}_{\{\mathbf{C}\}}\sum_{k=1}^{K}J^{t}_{k,f}(\psi_{k,t},p_{f})$, yields the optimal routing strategy for each file. Such a content routing strategy stipulates that retrieving file $f$ through the SBS $k^{*}=\operatorname{argmin}_{k\in\{1,\cdots,K\}}J^{t}_{k,f}(\psi_{k,t},p_{f})$ leads to the least aggregated cost.
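As a toy illustration of the cost in (1) and of routing a file through the cheapest SBS, consider the following sketch; all numbers and function names are illustrative assumptions:

```python
def retrieval_cost(psi_kt, p_f, c_kf, beta_S, d_kf):
    """Expected cost J^t_{k,f} from Eq. (1): lambda_{f,t} = psi_{k,t} * p_f
    requests per slot, each served at cost beta_{S_k} if cached (c=1),
    else at the worst-case cost d_{k,f}."""
    return psi_kt * p_f * (c_kf * beta_S + (1 - c_kf) * d_kf)

def route_file(psi, p_f, cache_col, beta_S, d):
    """Pick k* minimizing J^t_{k,f}, i.e., the cheapest SBS for file f."""
    costs = [retrieval_cost(psi[k], p_f, cache_col[k], beta_S[k], d[k])
             for k in range(len(psi))]
    return min(range(len(costs)), key=costs.__getitem__), costs

# Two SBSs: SBS 0 caches file f (c=1), SBS 1 does not (c=0).
psi = [10.0, 5.0]                     # pedestrian densities psi_{k,t}
p_f = 0.5                             # request probability of file f
k_star, costs = route_file(psi, p_f, cache_col=[1, 0],
                           beta_S=[1.0, 1.0], d=[4.0, 4.0])
```

Here SBS 0 wins despite its larger pedestrian density, because serving from its local cache avoids the backhaul cost $d_{k,f}$.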

The expected cost $J_{k,f}^{t}(\psi_{k,t},p_{f})$ in (1) is valid for every file request, regardless of whether the file is cached; i.e., it can serve as the general transmission cost function for small-cell networks. Besides, it is an upper bound on the transmission cost incurred by retrieving file $f$ from SBS $k$ in a global aggregation period. Note that, in our mobility-aware scenario, a multicast-based caching policy is complex to implement. Indeed, most of the papers in that direction model the requests by a Poisson point process (PPP) across SBSs [23]; however, such an assumption is not valid in our case, as the mobility pattern, and thus the user-SBS association, is dynamic and unknown.

Finding the optimal mobility-aware routing strategy and cache placement is equivalent to minimizing the network cost per global aggregation. Formally,

\text{minimize}_{\{\mathbf{C}\}}\;\sum_{k=1}^{K}\sum_{f=1}^{M}c_{k,f}\cdot\alpha_{C_{f}}+\sum_{k=1}^{K}\sum_{f=1}^{M}J^{t}_{k,f}(\psi_{k,t},p_{f}) \quad (2a)
\text{s.t.}\;\sum_{f=1}^{M}c_{k,f}\,g_{f}\leq C_{k}^{S},\;\forall\,k\in\{1,\cdots,K\},\;t=\{1,2,\cdots\}, \quad (2b)
\;\;\;\;\;c_{k,f}\in\{0,1\},\;\forall\,k,f, \quad (2c)

where Constraint (2b) means that the total size of cached files cannot exceed the cache capacity of SBS kk.

Problem (2) is an integer programming problem that is NP-hard. Moreover, the objective function is not available, since it involves the unknown popularity $p_{f}$ and pedestrian density $\psi_{k,t}$. In particular, there exist $2^{K\times M}$ possible caching strategy matrices $\{\mathbf{C}\}$, implying an exponential growth of the search space in $M$ and $K$. Therefore, it is essential to develop an efficient approach that solves problem (2) while maintaining a low delay.
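To make the difficulty concrete: exhaustively enumerating all caching matrices is infeasible, whereas a greedy, capacity-constrained placement runs in near-linear time. The sketch below is a generic knapsack-style greedy heuristic under an assumed per-file cost-saving score; it is not the paper's algorithm, which appears in Section IV:

```python
def greedy_cache_placement(savings, sizes, capacity):
    """Greedy sketch for one SBS: cache files in decreasing order of
    cost saving per unit size until the cache capacity C_k^S (2b) is
    exhausted. `savings[f]` is an assumed-known cost reduction from
    caching file f; in the paper it must be learned via FRPL."""
    order = sorted(range(len(sizes)),
                   key=lambda f: savings[f] / sizes[f], reverse=True)
    cached, used = set(), 0
    for f in order:
        if used + sizes[f] <= capacity:   # respect constraint (2b)
            cached.add(f)
            used += sizes[f]
    return cached
```

With savings `[10, 6, 3]` and sizes `[5, 2, 1]`, the two small, high-density files are cached first, and the large file is skipped once the capacity is reached.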

III Federated Routing and Popularity Learning

Based on the OSS concept, we propose an FRPL approach to learn the pedestrian- and request density while ensuring a fast model aggregation. It consists of three major steps:

Geographical Location Division: We divide the entire area uniformly into $K$ small parts. Each area includes an SBS and a set of connected MUs at time slot $t$, denoted by $\mathcal{U}_{k,t}$, $k\in\{1,\cdots,K\}$. Usually, the MUs in $\mathcal{U}_{k,t}$ share the same network cell ID at that moment.

Dual-Task Execution: The SBS of each sub-area aims at learning the pedestrian- and files’ request density by exploiting the location- and request information. Details follow.

  • TASK 1 (Pedestrian Density Prediction): To predict the pedestrian density of a cell $k\in\mathcal{K}$, the corresponding SBS derives the following statistics for the set of MUs in the $K$ areas: (i) the number of pedestrian clusters in the transition region of the $\kappa$ neighboring cells that have $k$ as the predicted next cell [34]; (ii) the number of pedestrians that have already transited to $k$; (iii) the number of pedestrians that will probably leave cell $k$.
    To search the $\kappa$ neighboring areas for candidate pedestrian clusters, the SBS uses the $K$-means clustering algorithm [35] with the loss function given by $L_{1}(\boldsymbol{\rho}_{1,i}):=\sum_{j\in\mathcal{U}_{i,t}}\|\mathbf{x}_{1,i,j}-f(\boldsymbol{\rho}_{1,i},\mathbf{x}_{1,i,j})\|^{2}$, where $f(\boldsymbol{\rho}_{1,i},\mathbf{x}_{1,i,j})$ is the centroid of all objects assigned to $\mathbf{x}_{1,i,j}$’s class.
    In words, the pedestrian clustering minimizes the sum of squared errors between data points and their respective centroids until reaching a stationary centroid.
    To obtain the number of pedestrian clusters approaching the desired cell $k$, the SBS uses the following detection criterion:

    \frac{\sqrt{(x_{0}-x_{k}(t))^{2}+(y_{0}-y_{k}(t))^{2}}}{\sqrt{(x_{0}-x_{k}(t-1))^{2}+(y_{0}-y_{k}(t-1))^{2}}}<1, \quad (3)

    where $(x_{0},y_{0})$ is the location of SBS $k$, and $(x_{k}(t),y_{k}(t))$ is the final location of the centroid of cluster $k$, with $k=\{1,\cdots,\kappa\}$. We use $N_{k,t}^{+}(j)$ to denote the statistical density function of the pedestrians that might transit to cell $k$ from neighboring cell $j$. Besides, $N_{k,t}^{-}(j)$ is the statistical density function of the pedestrians that might leave cell $k$ to the neighboring cell $j$. Thus, the pedestrian density contributed to cell $k$ by the neighboring cell $j$ yields $N_{k,j,t}=\max(N_{k,t}^{+}(j)-N_{k,t}^{-}(j),0)$. Moreover, $N_{k,0,t}$ refers to the density function of the pedestrians that have already moved to cell $k$. Then the pedestrian density of the desired cell yields

    \psi^{*}_{k,t}=N_{k,0,t}+\sum_{i=1}^{\kappa}\max\left(N_{k,t}^{+}(i)-N_{k,t}^{-}(i),0\right)=N_{k,0,t}+N_{k,1,t}+\cdots+N_{k,\kappa,t}=\sum_{i=0}^{\kappa}N_{k,i,t}, \quad (4)

    given that the respective centroids of clusters $1,\cdots,\kappa$ satisfy the detection criterion in (3). Fig. 2 illustrates the procedure described above.

    Figure 2: Prediction of $\psi_{k,t}^{*}$ in a dense network at time slot $\tau$, where moving pedestrians (purple dots) that might move to the desired cell (crosses) are clustered, whereas moving pedestrians (blue dots) that might leave the current cell are clustered separately.
  • TASK 2 (Request Density Prediction): The SBS first searches the $\kappa$ neighboring areas to find the MUs falling into $\mathcal{U}_{k,t}$, $k\in\{1,\cdots,K\}$, and observes the demographic information about those associated users and their ratings. Naturally, the files with high ratings have a higher request density.

    Using the files’ ratings together with the users’ features, the SBS learns the request density of every file $f=\{1,\cdots,M\}$ by minimizing the least-squared error between the estimated request density and the actual one. Exploiting a linear regression model to predict $\lambda_{f,t}$, the SBS uses the following loss function

    L_{2}(\boldsymbol{\rho}_{2,i}):=\frac{1}{2}\|y_{2,i,j}-f(\boldsymbol{\rho}_{2,i},\mathbf{x}_{2,i,j})\|^{2}+\alpha\|\boldsymbol{\rho}_{2,i}\|^{2}, \quad (5)

    where $\alpha$ is a regularization hyperparameter. The popularity of file $f$ is $p_{f}=\frac{\lambda_{f,t}}{\sum_{f=1}^{M}\lambda_{f,t}}$. Thus, the expected request density $\lambda_{f,t}^{*}$ is given by

    \lambda_{f,t}^{*}=\psi^{*}_{k,t}\cdot p_{f}=\frac{\left(N_{k,0,t}+N_{k,1,t}+\cdots+N_{k,\kappa,t}\right)\cdot\lambda_{f,t}}{\sum_{f=1}^{M}\lambda_{f,t}}. \quad (6)

    Later in Section IV, the SBS uses this metric for cache placement.
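A toy rendering of TASK 1's building blocks, the detection criterion (3) and the density estimate (4), could look as follows; the function names and numbers are our own:

```python
import math

def approaching(centroid_t, centroid_prev, sbs_xy):
    """Detection criterion (3): a cluster is approaching SBS k if its
    centroid moved closer to (x0, y0) between slots t-1 and t, i.e., the
    ratio of the two Euclidean distances is below 1."""
    return math.dist(sbs_xy, centroid_t) < math.dist(sbs_xy, centroid_prev)

def pedestrian_density(N0, N_in, N_out):
    """Eq. (4): psi*_{k,t} = N_{k,0,t} + sum_j max(N+_{k,t}(j) - N-_{k,t}(j), 0),
    i.e., pedestrians already in cell k plus the non-negative net inflow
    from each of the kappa neighboring cells."""
    return N0 + sum(max(n_in - n_out, 0) for n_in, n_out in zip(N_in, N_out))

# A cluster at (1, 1) that was at (2, 2) is approaching an SBS at the origin.
assert approaching((1.0, 1.0), (2.0, 2.0), (0.0, 0.0))
```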

Fast Model Aggregation: In the standard SGD methods [32], [33], the unknown model is iteratively estimated by computing $\boldsymbol{\rho}_{q,i}$ at each epoch and evaluating the gradient of the loss function

L_{q}(\boldsymbol{\rho}_{q,i})=\begin{cases}\sum_{j\in\mathcal{U}_{i,t}}\|\mathbf{x}_{q,i,j}-f(\boldsymbol{\rho}_{q,i},\mathbf{x}_{q,i,j})\|^{2},&q=1,\\ \frac{1}{2}\|y_{q,i,j}-f(\boldsymbol{\rho}_{q,i},\mathbf{x}_{q,i,j})\|^{2}+\alpha\|\boldsymbol{\rho}_{q,i}\|^{2},&q=2,\end{cases} \quad (7)

with $q=\{1,2\}$ being the indices of the prediction tasks. Thus, the training procedure of the FRPL finds the optimal parameters $\{\boldsymbol{\rho}_{q,1},\cdots,\boldsymbol{\rho}_{q,K}\}$ that minimize the sum of squared errors, i.e., $L_{q}(\boldsymbol{\rho}_{q,i})$ in (7). For the linear-regression FRPL in the request density prediction task, the model averaging constraint guarantees that the $K$ SBSs and the MBS share the same FL model after convergence.
In the FRPL framework, the $K$ SBSs send the local models $\{\boldsymbol{\rho}_{q,1},\cdots,\boldsymbol{\rho}_{q,K}\}$ to the MBS (for each global aggregation, all participants update their parameters $\boldsymbol{\rho}_{q,1},\cdots,\boldsymbol{\rho}_{q,K}$ to facilitate the highest running efficiency of FRPL [9]). At time slot $t$, the global model, denoted by $\boldsymbol{\beta}_{q,t}$, is given by (we attach the time-scale index $t$ to the local and global model parameters to elaborate the procedure of aggregating the local and global FL models at each epoch)

\boldsymbol{\beta}_{q,t}=\frac{1}{Q_{q}}\sum_{i=1}^{K}Q_{q,i}\cdot\boldsymbol{\rho}_{q,i,t},\quad q\in\{1,2\}, \quad (8)

where $Q_{q,i}$ indicates the number of samples collected by local model $i$ w.r.t. task $q$. Besides, $Q_{q}=\sum_{i=1}^{K}Q_{q,i}$ is the total number of samples collected by all local models w.r.t. task $q$; hence, in (8), each local model is weighted by its share of the training data. The MBS then sends $\boldsymbol{\beta}_{q,t}$ back to the SBSs.
After receiving $\boldsymbol{\beta}_{q,t}$ from the MBS, the SBSs use the SGD method to update the local models $\{\boldsymbol{\rho}_{q,1},\cdots,\boldsymbol{\rho}_{q,K}\}$. The update of the local model $\boldsymbol{\rho}_{q,i,t}$ follows as [32]

\boldsymbol{\rho}_{q,i,t+1}=\boldsymbol{\beta}_{q,t}-\frac{\eta}{Q_{q,i}}\sum_{j=1}^{Q_{q,i}}\nabla f(\boldsymbol{\beta}_{q,t},\mathbf{x}_{q,i,j},y_{q,i,j}) \quad (9)

w.r.t. task $q$, where $\eta$ denotes the learning rate. Moreover, $\nabla f(\boldsymbol{\beta}_{q,t},\mathbf{x}_{q,i,j},y_{q,i,j})$ corresponds to the gradient of $f(\boldsymbol{\beta}_{q,t},\mathbf{x}_{q,i,j},y_{q,i,j})$ w.r.t. $\boldsymbol{\beta}_{q,t}$. For simplicity, we hereby define $F(\boldsymbol{\beta}_{q,t},\mathbf{x}_{q,i,j},y_{q,i,j})=\frac{1}{Q_{q}}\sum_{i=1}^{K}\sum_{j=1}^{Q_{q,i}}f(\boldsymbol{\beta}_{q,t},\mathbf{x}_{q,i,j},y_{q,i,j})$ as the squared-error cost function of the global model.
After receiving the updated local models, the MBS also updates the global model $\boldsymbol{\beta}_{q,t}$ as [36]

\boldsymbol{\beta}_{q,t+1}=\boldsymbol{\beta}_{q,t}-\eta\Big(\nabla F(\boldsymbol{\beta}_{q,t},\mathbf{x}_{q,i,j},y_{q,i,j})-\boldsymbol{\Theta}_{q}\Big),\quad q\in\{1,2\}, \qquad (10)

where $\nabla F(\boldsymbol{\beta}_{q,t},\mathbf{x}_{q,i,j},y_{q,i,j})$ is the gradient of $F(\boldsymbol{\beta}_{q,t},\mathbf{x}_{q,i,j},y_{q,i,j})$ w.r.t. $\boldsymbol{\beta}_{q,t}$. Besides, $\boldsymbol{\Theta}_{q}$ is given by

\boldsymbol{\Theta}_{q}=\nabla F(\boldsymbol{\beta}_{q,t},\mathbf{x}_{q,i,j},y_{q,i,j})-\frac{1}{Q_{q}}\sum_{i=1}^{K}Q_{q,i}\cdot\boldsymbol{\rho}_{q,i,t},\quad q\in\{1,2\}. \qquad (11)

Algorithm 1 summarizes the described stages.

Algorithm 1 Federated Routing and Popularity Learning (FRPL)
1:Initialize $T\leftarrow 0$, $t\leftarrow 0$, $\kappa\leftarrow 0$, and the learning rate $\eta>0$.
2:Output: the pedestrian density $\psi^{*}_{1:K,\,t}$ and the expected request density $\lambda^{*}_{1:M,\,t}$.
3:for $T=1,2,\cdots$ do
4:     for $t=1,\cdots,\theta$ do ▷ Pedestrian Density Prediction
5:         For each local FL model, the respective SBS searches the $\kappa$ neighboring areas to find the candidate pedestrian clusters.
6:         Calculate the gradient of the loss function $L_{1}(\boldsymbol{\rho}_{1,i,t})$.
7:         Monitor the statistics of the clusters and estimate the number of clusters following the criterion in (3).
8:         Update the local FL model $\boldsymbol{\rho}_{1,i,t}$ and the centroids of the $\kappa$ clusters.
9:         Estimate the pedestrian density $\psi^{*}_{k,t}\leftarrow N_{k,0,t}+N_{k,1,t}+\cdots+N_{k,\kappa,t}$.
10:     end for
11:     for $t=1,\cdots,\theta$ do ▷ Request Density Prediction
12:         Search the $\kappa$ neighboring areas and observe the statistical information in terms of the users' demographic information as well as their ratings.
13:         Pre-process the data by merging the users' demographic and rating information.
14:         Learn the request density $\lambda_{f,t}$ of each file $f\in\{1,\cdots,M\}$ by minimizing the least-squared error $L_{2}(\boldsymbol{\rho}_{2,i,t})$ between the estimated and the actual request density.
15:         Update the local FL model $\boldsymbol{\rho}_{2,i,t}$.
16:         Estimate the expected request density $\lambda^{*}_{f,t}\leftarrow\frac{(N_{k,0,t}+N_{k,1,t}+\cdots+N_{k,\kappa,t})\cdot\lambda_{f,t}}{\sum_{f=1}^{M}\lambda_{f,t}}$.
17:     end for
18:     The $K$ distributed SBSs invoke the SGD method to update the local FL models $\{\boldsymbol{\rho}_{q,1,t},\cdots,\boldsymbol{\rho}_{q,K,t}\}$, $q\in\{1,2\}$, and send them to the MBS, which aggregates them.
19:     Update the global FL model $\boldsymbol{\beta}_{q,t}$ following (10).
20:     Update the gradient of the global FL model by (11).
21:end for
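The two estimates in lines 9 and 16 of Algorithm 1 reduce to simple arithmetic once the cluster sizes and the raw per-file request densities are known. A minimal sketch with hypothetical cluster sizes and densities (the numbers are ours, not from the paper):

```python
def pedestrian_density(cluster_sizes):
    """Line 9 of Algorithm 1: psi*_k = N_{k,0} + N_{k,1} + ... + N_{k,kappa}."""
    return sum(cluster_sizes)

def expected_request_density(cluster_sizes, raw_densities):
    """Line 16 of Algorithm 1: lambda*_f = psi*_k * lambda_f / sum_f lambda_f."""
    psi = pedestrian_density(cluster_sizes)
    total = sum(raw_densities)
    return [psi * lam / total for lam in raw_densities]

# Hypothetical cell with 3 pedestrian clusters and a library of 4 files
sizes = [5, 8, 7]                 # cluster sizes N_{k,0}, N_{k,1}, N_{k,2}
lam = [0.4, 0.3, 0.2, 0.1]        # raw request densities per file
lam_star = expected_request_density(sizes, lam)
```

Note that the expected densities sum back to the pedestrian density of the cell, i.e., the normalization in line 16 merely distributes the predicted crowd over the files in proportion to their popularity.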

Finally, we note that the dimension of the raw pedestrian data (including geographic location, date, time, and sojourn time) affects the outcome of the $K$-means clustering algorithm [35]. We therefore state the following proposition.

Proposition 1:

The expected request density $\lambda^{*}_{f,t}$ has the following properties:

  • When the datasets utilized for user clustering in the transition region of neighboring cells are insufficient (empty or containing very few data points), the associated $\lambda^{*}_{f,t}$ can be lower-bounded as

    \lambda^{*}_{f,t}\geq\frac{N_{k,0,t}\cdot\lambda_{f,t}}{\sum_{f=1}^{M}\lambda_{f,t}}. \qquad (12)

  • If the datasets' dimension is larger than $10$, the associated $\lambda^{*}_{f,t}$ can be upper-bounded as

    \lambda^{*}_{f,t}\leq\frac{(N_{k,0,t}+N_{k,1,t}+\cdots+N_{k,\kappa^{*},t})\cdot\lambda_{f,t}}{\sum_{f=1}^{M}\lambda_{f,t}}, \qquad (13)

    where $\kappa^{*}$ indicates the maximum number of non-empty clusters.

Proof:

See Appendix A. ∎
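The two bounds of Proposition 1 differ only in which clusters enter the pedestrian-density estimate: the lower bound (12) keeps only cluster $0$, while the upper bound (13) keeps all $\kappa^{*}+1$ non-empty clusters. A toy numeric illustration (all numbers hypothetical):

```python
def request_density_bounds(N, lam_f, lam_total):
    """Proposition 1: the lower bound uses only N[0] (cluster 0); the upper
    bound uses the sum over all non-empty clusters."""
    lower = N[0] * lam_f / lam_total
    upper = sum(N) * lam_f / lam_total
    return lower, upper

N = [4, 6, 10]            # hypothetical N_{k,0,t}, N_{k,1,t}, N_{k,2,t}
lo, hi = request_density_bounds(N, lam_f=0.5, lam_total=2.0)   # -> (1.0, 5.0)
```

Any estimate $\lambda^{*}_{f,t}$ produced by line 16 of Algorithm 1 falls between these two values, since its cluster-size sum lies between $N_{k,0,t}$ and $\sum_{j}N_{k,j,t}$.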

IV Cache Placement Policy

For each SBS $k\in\{1,\cdots,K\}$, let $\mathcal{L}_{k,t}$ denote the serving edge, i.e., the set of MUs that it serves at time $t$ in both the intra- and inter-cell domains, as shown in Fig. 3. Obviously, $\mathcal{L}_{k,t}$ is time-varying and depends on the mobility of the users.
For each global aggregation, every SBS $k$ attempts to serve as many MUs as possible across its service edge to reduce the aggregated cost of retrieving the $M$ contents across the $K$ SBSs, i.e., $\sum_{k=1}^{K}\sum_{f=1}^{M}J^{t}_{k,f}(\psi_{k,t},p_{f})$ in (2). Thus, for simplicity, we assume that at each global aggregation, the maximum number of MUs that SBS $k$ serves in its service edge equals the maximum pedestrian density $\psi_{k,t}$ predicted by the FRPL method; indeed, by Proposition 1, at time $t$ the pedestrian density of each cell $k$ is upper-bounded by $N_{k,0,t}+N_{k,1,t}+\cdots+N_{k,\kappa^{*},t}$, where $\kappa^{*}$ corresponds to the maximum number of non-empty clusters, so the maximum number of MUs that SBS $k$ can serve is equivalent to the upper bound of the pedestrian density of this cell.

Figure 3: Selecting the serving edge at time slot $t$, with $(k,\mathcal{L}_{k,t})$ being the candidate edge.

IV-A The Algorithm

After learning $\psi^{*}_{k,t}$ and $p_{f}$ using the FRPL method, each SBS $k\in\{1,\cdots,K\}$ also knows $\mathcal{L}^{*}_{k,t}$. The optimal cache placement problem is then equivalent to minimizing the network cost function $\mathcal{D}(\{c_{k,f}\})$ given by

\mathcal{D}(\{c_{k,f}\})=\sum_{k=1}^{K}\sum_{f=1}^{M}c_{k,f}\cdot\alpha_{C_{f}}+\sum_{k=1}^{K}\sum_{f=1}^{M}\psi_{k,t}^{*}\cdot p_{f}\cdot\Big(c_{k,f}\cdot\beta_{S_{k}}+(1-c_{k,f})\cdot d_{k,f}\Big). \qquad (14)

Thus the optimization problem follows as

\underset{\{c_{k,f}\}}{\text{minimize}}\quad \mathcal{D}(\{c_{k,f}\}) \qquad (15a)
\text{s.t.}\quad \sum_{f=1}^{M}c_{k,f}\,g_{f}\leq C_{k}^{S},\;\forall\,k\in\{1,\cdots,K\},\;t\in\{1,2,\cdots\}, \qquad (15b)
\phantom{\text{s.t.}}\quad c_{k,f}\in\{0,1\},\;\forall\,k,f. \qquad (15c)
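For concreteness, the objective (14) and the storage constraint (15b) can be evaluated directly for a given 0/1 placement. A minimal numpy sketch with illustrative toy parameters (not the values of Table III):

```python
import numpy as np

def network_cost(c, alpha_C, psi_star, p, beta_S, d):
    """Evaluate (14) for a 0/1 placement matrix c of shape (K, M).
    alpha_C, p: per-file caching costs and popularities (length M);
    psi_star, beta_S: per-SBS pedestrian densities and retrieval costs (length K);
    d: (K, M) costs of retrieving a file from the MBS backhaul."""
    caching = (c * alpha_C).sum()
    serving = (psi_star[:, None] * p[None, :] *
               (c * beta_S[:, None] + (1 - c) * d)).sum()
    return caching + serving

def feasible(c, g, C_S):
    """Constraint (15b): cached file sizes must fit each SBS cache."""
    return np.all(c @ g <= C_S)

# Toy instance: K = 2 SBSs, M = 3 files (all numbers illustrative)
c = np.array([[1, 0, 1], [0, 1, 0]])
alpha_C = np.array([1.5, 1.5, 1.5])       # caching cost per file
psi = np.array([10.0, 20.0])              # predicted pedestrian densities
p = np.array([0.5, 0.3, 0.2])             # file popularities
beta_S = np.array([2.0, 2.0])             # retrieval cost via the local SBS
d = np.full((2, 3), 13.0)                 # retrieval cost via the MBS backhaul
g = np.array([1.0, 1.0, 1.0])             # unit file sizes
cost = network_cost(c, alpha_C, psi, p, beta_S, d)
ok = feasible(c, g, C_S=np.array([2.0, 2.0]))
```

Because $d_{k,f}\gg\beta_{S_{k}}$ in this toy instance, every cache miss is heavily penalized, which is the regime in which greedy placement pays off.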

Problem (15) is an NP-hard integer program. Given a cache placement policy $\{c_{k,f}\}$, the content retrieval policy is determined by our content transmission protocol introduced in Section II-C. To solve the problem with low complexity, we develop Algorithm 2, which greedily places files into the caches and provides an approximate solution.
Let $I_{k}$ be the total size of the files already cached at SBS $k$ during each iteration of our algorithm. Besides, $\mathcal{L}_{k,f}$ denotes the set of pairs $\{k,f\}$ such that SBS $k$ does not yet store file $f$ despite having sufficient cache capacity. At iteration $\Theta$, the algorithm conducts the cache placement by picking the pair $\{k^{*},f^{*}\}\in\mathcal{L}^{\Theta}_{k,f}$ with the lowest $\mathcal{D}(\{c_{k,f}\})$, provided that this cost is lower than the one in the previous iteration. Note that if the algorithm starts with the pair $\{k,f^{*}\}$ that results in the lowest cost $\mathcal{D}(\{c_{k,f^{*}}\})$, then the relative network cost caused by SBS $k$ remains fixed at each iteration. Formally,

\{k^{*},f^{*}\}=\operatorname*{argmin}_{\{k,f\}\in\mathcal{L}^{\Theta}_{k,f}}\mathcal{D}(\{c_{k,f}\}). \qquad (16)

Then, if $I^{\Theta}_{k^{*}}<C^{S}_{k^{*}}$, the algorithm updates $I^{\Theta+1}_{k^{*}}=I^{\Theta}_{k^{*}}+c_{k^{*},f^{*}}\cdot g_{f^{*}}$ and $\mathcal{L}^{\Theta+1}_{k,f}=\mathcal{L}^{\Theta}_{k,f}\backslash\{k^{*},f^{*}\}$. Otherwise, $I^{\Theta}_{k^{*}}=C^{S}_{k^{*}}$ implies that the cache of SBS $k^{*}$ is full; the algorithm then excludes all pairs $\{k^{*},f\}$ from $\mathcal{L}^{\Theta}_{k,f}$ and stops the cache placement for SBS $k^{*}$. The algorithm terminates when all the caches are full. Algorithm 2 summarizes the described greedy cache placement.

Algorithm 2 Cache Placement Policy
1:Initialize the occupied cache sizes $I_{k}$ with zero. Let $K_{ite}$ be the required number of iterations.
2:Output: the near-optimal cache placement $\{c_{k,f}\}$.
3:for $\Theta=1,2,\cdots,K_{ite}$ do
4:     Pick the pair $\{k^{*},f^{*}\}\in\mathcal{L}_{k,f}$ with the lowest network cost $\mathcal{D}(\{c_{k,f}\})$ according to (16).
5:     if $I^{\Theta}_{k^{*}}<C^{S}_{k^{*}}$ then
6:         Let $I^{\Theta+1}_{k^{*}}\leftarrow I^{\Theta}_{k^{*}}+c_{k^{*},f^{*}}\cdot g_{f^{*}}$.
7:         Let $\mathcal{L}^{\Theta+1}_{k,f}\leftarrow\mathcal{L}^{\Theta}_{k,f}\backslash\{k^{*},f^{*}\}$.
8:     else Exclude all the pairs $\{k^{*},f\}$, $\forall f$, from $\mathcal{L}^{\Theta}_{k,f}$.
9:     end if
10:end for
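The greedy loop of Algorithm 2 can be sketched compactly as follows. For simplicity we assume the per-pair cost reduction (`gain`) is precomputed from (14), e.g. $\psi^{*}_{k}p_{f}(d_{k,f}-\beta_{S_{k}})-\alpha_{C_{f}}$ for unit-size files; the function and the toy instance are our illustration, not the paper's exact implementation:

```python
def greedy_cache_placement(K, M, gain, g, C_S):
    """Greedy sketch of Algorithm 2: repeatedly cache the (SBS, file) pair
    that lowers the network cost the most, until no placement reduces the
    cost or all candidate caches are full.
    gain[k][f]: assumed precomputed cost reduction of caching file f at SBS k;
    g[f]: file sizes; C_S[k]: cache capacities."""
    cached = set()
    used = [0.0] * K                                  # I_k: occupied cache size
    candidates = {(k, f) for k in range(K) for f in range(M)}
    while candidates:
        k, f = max(candidates, key=lambda kf: gain[kf[0]][kf[1]])
        if gain[k][f] <= 0:                           # no further cost reduction
            break
        if used[k] + g[f] <= C_S[k]:                  # capacity check, as in (15b)
            cached.add((k, f))
            used[k] += g[f]
            candidates.remove((k, f))
        else:                                         # cache of SBS k is full:
            candidates = {kf for kf in candidates if kf[0] != k}
    return cached

# Toy instance: 2 SBSs, 3 unit-size files, room for one file per cache
gain = [[5.0, 3.0, 1.0], [4.0, 6.0, 2.0]]
placement = greedy_cache_placement(2, 3, gain, g=[1.0] * 3, C_S=[1.0, 1.0])
```

With these toy gains, the sketch caches file 1 at SBS 1 and file 0 at SBS 0, i.e., each SBS receives the single file that reduces its cost the most.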

IV-B Discussion

Let $B_{(k)_{u},f}$ be a binary variable indicating whether SBS $k$ caches a file $f$ requested by MU $u\in\mathcal{L}^{*}_{k,t}$. We observe that, when neglecting the normalized cost for retrieving files from the MBS, the approximation ratio of our proposed cache policy is proportional to the number of MUs that an SBS serves at each global aggregation [37]. (Numerical analyses show that the normalized cost parameter for retrieving files from the MBS backhaul has only a limited influence on the network cost minimization.) To support this conclusion, we present the following results.

Proposition 2:

For each global aggregation period, the normalized cost caused by caching at the SBSs equals the aggregated cost for caching the requested contents over the serving edges. That is,

\sum_{\{k,f\}\in\mathcal{L}_{k,f}}c_{k,f}\cdot\alpha_{C_{f}}=\sum_{\{k,f\}\in\mathcal{L}_{k,f}}\min\Big(1,\sum_{u\in\mathcal{L}^{*}_{k,t}}B_{(k)_{u},f}\Big)\cdot\alpha_{C_{f}}, \qquad (17)

where $\mathcal{L}^{*}_{k,t}$ is the estimated service edge of SBS $k\in\{1,\cdots,K\}$ at time $t$.

Proof:

See Appendix B. ∎

Theorem 1:

When the normalized cost for retrieving files from the MBS backhaul is neglected, i.e., $d_{k,f}=0$, our proposed cache placement policy in Algorithm 2 is a polynomial-time $L$-approximation algorithm, where $L$ is the maximum number of MUs that an SBS serves at its service edge at each time slot.

Proof:

See Appendix C. ∎

IV-C Complexity Analysis

Based on the optimization criterion in (16), the computational complexity of the cache placement using Algorithm 2 at SBS $k$ is polynomial, namely $\mathcal{O}(n\cdot\|\mathcal{L}_{k,f}\|_{0})$, with $n$ being the number of iterations required to solve problem (16). As $\mathcal{L}_{k,f}$ is the set of all possible (SBS, file) pairs that impose the lowest $\mathcal{D}(\{c_{k,f}\})$ to fill the cache of SBS $k$, the total computational complexity of the cache placement policy is $\mathcal{O}\big(\sum_{k=1}^{K}n\cdot\|\mathcal{L}_{k,f}\|_{0}\big)$. Hence the proposed policy has linear complexity in the number of (SBS, file) pairs.

TABLE III: SYSTEM PARAMETER SETUPS
Parameter | Value
Global aggregation periods (rounds) $T$ | $3.4\times 10^{3}$
Volume of the content library $M$ | 3952
Total number of MUs $U$ | 6040
Learning rate $\eta$ | $3\times 10^{-3}$
Regularization hyperparameter $\alpha$ | [1, 10]
Size of each content $f$, i.e., $g_{f}$ | 1 MB
Cache size of SBS $k$, i.e., $C^{S}_{k}$ | [50 MB : 500 MB]
Probability of randomly selecting $m$ contents, i.e., $\epsilon$ | [0.01, 0.1, 0.8]
Power cost of caching content $f$, i.e., $\alpha_{C_{f}}$ | 1.5 mW
Power cost of retrieving $f$ via the MBS, i.e., $\beta_{M}$ | 13 mW
Power cost of retrieving $f$ for SBS $k$ from the MBS, i.e., $\beta_{M_{k}}$ | 370 mW
Power cost of retrieving $f$ via SBS $k$, i.e., $\beta_{S_{k}}$ | 180 mW

V Simulation Results and Analysis

V-A Datasets and System Parameter Setups

We use a real-world dataset, namely MovieLens 1M [38], to evaluate our proposed learning and caching strategies. Similar to [19], we treat the movie rating process in the dataset as a stream of content requests. The MovieLens 1M dataset consists of (i) the users' demographic information, given in the format <user ID*, age (7 categories), gender (2 categories), occupation (21 categories), Zip-code>; and (ii) the rating information, given in the format <user ID*, movie ID, rating (5 categories), timestamp>.
To predict the request density, in the experiments we merge the users' demographic and rating information based on the user ID* label. From this joined dataset, we pick the attributes <movie ID, user ID*, gender, age, Zip-code, occupation> as the context dimensions. Table III gathers the most important parameters of our experiments.

V-B Performance Evaluation

We evaluate the average caching efficiency (the average ratio of cache hits to the total number of requests within one global aggregation period) of our proposals in comparison with the following cache placement approaches:

Figure 4: The average cache efficiency (CE) of different cache placement approaches.
Figure 5: Cumulative request density of different cache placement approaches.
  • Optimal with Full Information: This scheme has full information about the users' demands; thus, it has the potential to provide the best caching performance;

  • $\epsilon$-Greedy: With probability $\epsilon\ll 1$, the policy selects a set of $m$ files uniformly at random; with probability $1-\epsilon$, it selects the $m$ files with the highest estimated popularity so far;

  • Least-Recently-Used (LRU): Upon a cache miss, it fetches the requested file from the potential contributor (i.e., the MBS backhaul) and caches it. If an item exceeds the cache capacity of the SBS, it evicts the file in the cache that has been least recently used;

  • Random: This scheme selects a random set of files to cache in each time slot.
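The LRU baseline above can be sketched with an ordered dictionary. Unit-size files are assumed for simplicity; this is our illustrative implementation, not the paper's:

```python
from collections import OrderedDict

class LRUCache:
    """Baseline LRU cache: on a miss, fetch from the MBS backhaul and cache
    the file; evict the least recently used file when capacity is exceeded."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()        # insertion order = recency order
        self.hits = 0
        self.requests = 0

    def request(self, f):
        self.requests += 1
        if f in self.store:
            self.hits += 1
            self.store.move_to_end(f)     # mark as most recently used
        else:
            self.store[f] = True          # cache miss: fetch and cache
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)   # evict the LRU file

cache = LRUCache(capacity=2)
for f in ["a", "b", "a", "c", "b"]:
    cache.request(f)
hit_ratio = cache.hits / cache.requests   # 1 hit out of 5 requests -> 0.2
```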

In Fig. 4, we compare the performance of the different caching approaches using the same dataset, so that the file set remains constant. The figure shows that our proposed algorithm achieves a significantly higher gain than the random and $\epsilon$-greedy approaches for different values of $\epsilon$. The gain grows as the cache size increases. The reason is that the proposed algorithm greedily fills the SBSs' caches after learning the expected request densities of the files by Algorithm 1.
Besides, our proposed approach exhibits superior performance compared to the LRU approach for large storage space. Also, comparing the proposed and the optimal approaches, we deduce that the proposed algorithm provides a near-optimal solution while pursuing a minimum network cost.

Figure 6: Network cost impacted by (a) distinct cache placement methods; and (b) the proposed FRPL approach using distinct average cache sizes (% of the file library), with a cache content update duration of 4 hours.

Fig. 5 shows the cumulative request density of the cached files up to time slot $T$ with $C_{k}^{S}=200$ MB. The cumulative request density is the sum of the request densities of the desired files at each round. The figure shows that the proposed algorithm achieves a larger cumulative request density than the $\epsilon$-greedy and random caching methods. In particular, at time slot $T=2500$, the cumulative request density achieved by the proposed greedy algorithm is $1.65$, $1.87$, $1.93$, $3.76$, and $10.31$ times that achieved by LRU, $\epsilon$-greedy (for $\epsilon=0.01,0.1,0.8$), and random placement, respectively.

In Fig. 6, we evaluate the network cost of the proposed FRPL and local caching methods. It can be seen from Fig. 6(a) that the cost of our cache placement solution is less than that of local caching. The figure also shows that, compared to local caching, the FRPL approach requires fewer iterations to reach a given level of network cost. This is because the FRPL approach accurately learns the expected request density from the context of the observed files' ratings and users' features. Fig. 6(b) depicts the network cost of the proposed FRPL approach ($\alpha=10$) for distinct cache sizes. We assume that the caching entity (i.e., the SBS) updates its cache content every 4 hours. Fig. 6(b) then shows that expanding the cache from 5% to 32.5% reduces the network cost of the FRPL approach. The reason is that enlarging the cache size expands the solution space of problem (15).

Figure 7: Network cost of different placement methods with (a) distinct average cache sizes (% of the file library); and (b) different transmission cost ratios $\beta_{M_{k}}/\beta_{S_{k}}$.
Figure 8: The normalized network cost versus the average cache efficiency with (a) distinct transmission cost ratios $\beta_{M_{k}}/\beta_{S_{k}}$ and costs of caching content $f$, with a cache update duration of 15 hours; (b) different transmission cost ratios $\beta_{M_{k}}/\beta_{S_{k}}$ and costs of caching content $f$, with a cache update duration of 20 hours; and (c) different cache placement methods, with a cache update duration of 20 hours.

Fig. 7(a) and Fig. 7(b) show the network cost of the different placement methods as a function of the cache size (% of the file library) and the transmission cost ratio $\beta_{M_{k}}/\beta_{S_{k}}$, respectively. Fig. 7(a) shows that the proposed algorithm reduces the network cost compared to the $\epsilon$-greedy ($\epsilon=0.1,0.8$) and local caching approaches for cache sizes varying from 10% to 50%. The result in Fig. 7(a) is consistent with that in Fig. 6(a). Besides, Fig. 7(b) shows that the transmission cost ratio $\beta_{M_{k}}/\beta_{S_{k}}$ dramatically influences the cost performance of the different methods. In particular, when increasing $\beta_{M_{k}}/\beta_{S_{k}}$ from 1 to 5, the cost of the different placement approaches falls abruptly. Indeed, the normalized cost of file retrieval from the SBSs affects the performance more than that of retrieval from the MBS over the backhaul.

In Figs. 8(a)-8(c), we evaluate the normalized network cost versus the average caching efficiency as a function of the transmission cost ratio $\beta_{M_{k}}/\beta_{S_{k}}$ and the cost of caching content $f$ for different placement approaches. We utilize movie rating data collected from MovieLens 1M [38] with distinct cache content update intervals. To model the user demand process, we synthesize two datasets with cache content update intervals of 15 hours (Fig. 8(a)) and 20 hours (Figs. 8(b) and 8(c)). Fig. 8(a) shows that: (i) as the cost of caching content $f$, i.e., $\alpha_{C_{f}}$, decreases, the proposed placement approach reduces the normalized network cost; (ii) as the transmission cost ratio $\beta_{M_{k}}/\beta_{S_{k}}$ increases for a given $\alpha_{C_{f}}$, our approach improves the cost performance. Besides, Fig. 8(b) shows that, when facing ultra-high request traffic, our approach reduces the normalized network cost. Finally, Fig. 8(c) reveals that the proposed placement approach reduces the network cost compared to the $\epsilon$-greedy ($\epsilon=0.1,0.8$) and local caching methods. The reason is that our proposal accurately predicts the expected request density of each file, particularly for high-dimensional datasets.

VI Concluding Remarks and Future Directions

We developed mobility-aware routing and caching strategies for resource-constrained small-cell networks in an FL framework. To jointly optimize the routing and cache placement, we first formulated the NP-hard network cost minimization problem. To address the complexity issue, we then proposed an approach by which the SBSs learn the pedestrian density and the request density. Afterward, based on the predictions, a greedy algorithm optimizes the cache placement and minimizes the network cost. Numerical results established that our proposed method yields a near-optimal solution.

Potential future works include the extension of our proposal in the following directions:
(i) Integrating energy-efficient real-time routing architectures using machine learning methods to reduce the routing overhead [39] and discovery latency [40] in mobile edge networks.
(ii) Allowing for time-variant popularity (i.e., dynamic content library), which renders learning, prediction, and energy-efficient network performance challenging.
(iii) Finally, it is essential to investigate a proper offloading of the learning tasks to potential participants, under constraints such as spectrum scarcity, transmission latency, and the like.

Appendix A Proof of Proposition 1.

$K$-means clustering performs poorly on datasets in an $n$-dimensional real space $\mathbb{R}^{n}$ with $n\geq 10$. As a result, $K$-means clustering often converges with at least one cluster that is either empty or contains very few data points. In this case, to facilitate fast model aggregation, we discard the empty clusters, or summarize those with very few data points, when performing the pedestrian density prediction. That results in $\kappa^{*}$ non-empty clusters. Cluster filtration is essential, as a modest value of $\kappa^{*}$ enables solutions that summarize the underlying data better [35].
Thus, when the datasets for user clustering in the transition regions of neighboring cells are scarce, the estimated pedestrian density $\psi^{*}_{k,t}$ reaches its trough $N_{k,0,t}$. That yields the lower bound $\lambda^{*}_{f,t}\geq\frac{N_{k,0,t}\cdot\lambda_{f,t}}{\sum_{f=1}^{M}\lambda_{f,t}}$. In contrast, when the number of dimensions of the datasets is larger than $10$, the estimated pedestrian density $\psi^{*}_{k,t}$ reaches its peak value $(N_{k,0,t}+N_{k,1,t}+\cdots+N_{k,\kappa^{*},t})$, resulting in the upper bound $\lambda^{*}_{f,t}\leq\frac{(N_{k,0,t}+N_{k,1,t}+\cdots+N_{k,\kappa^{*},t})\cdot\lambda_{f,t}}{\sum_{f=1}^{M}\lambda_{f,t}}$, which completes the proof. $\blacksquare$

Appendix B Proof of Proposition 2.

The normalized caching cost at the $K$ SBSs is given by

\sum_{k=1}^{K}\sum_{f=1}^{M}c_{k,f}\cdot\alpha_{C_{f}}. \qquad (18)

Based on the definition of $\mathcal{L}_{k,f}$, (18) can be rewritten as

\sum_{\{k,f\}\in\mathcal{L}_{k,f}}c_{k,f}\cdot\alpha_{C_{f}}. \qquad (19)

Then, by the definition of $B_{(k)_{u},f}$, the aggregated cost for caching the requested contents at the serving edge equals

\sum_{\{k,f\}\in\mathcal{L}_{k,f}}\min\Big(1,\sum_{u\in\mathcal{L}^{*}_{k,t}}B_{(k)_{u},f}\Big)\cdot\alpha_{C_{f}}. \qquad (20)

When no user requests a content $f$ that is in the cache of SBS $k$, both the normalized cost $c_{k,f}\cdot\alpha_{C_{f}}$ and the aggregated cost at the serving edge, i.e., $\min\big(1,\sum_{u\in\mathcal{L}^{*}_{k,t}}B_{(k)_{u},f}\big)\cdot\alpha_{C_{f}}$, become zero. If that event occurs, one concludes that the normalized cost $\sum_{\{k,f\}\in\mathcal{L}_{k,f}}c_{k,f}\cdot\alpha_{C_{f}}$ in (19) equals the aggregated cost of the MUs that fall into the edge $\mathcal{L}^{*}_{k,t}$, i.e., $\sum_{\{k,f\}\in\mathcal{L}_{k,f}}\min\big(1,\sum_{u\in\mathcal{L}^{*}_{k,t}}B_{(k)_{u},f}\big)\cdot\alpha_{C_{f}}$ in (20).
When some MUs request a content $f$ that is not in the cache, the proposed placement policy selects a unique $(k^{*},f)$ pair from $\mathcal{L}_{k,f}$ associated with the lowest $\mathcal{D}(\{c_{k^{*},f}\})$, which means that $c_{k^{*},f}=1$ and hence $c_{k^{*},f}\cdot\alpha_{C_{f}}=\alpha_{C_{f}}$. Meanwhile, $\min\big(1,\sum_{u\in\mathcal{L}^{*}_{k^{*},t}}B_{(k^{*})_{u},f}\big)=1$, because a requested content $f$ that falls into the set $\mathcal{L}^{*}_{k^{*},t}$ will be cached at an SBS deployed in either the intra-cell or the inter-cell domain, so that $\min\big(1,\sum_{u\in\mathcal{L}^{*}_{k^{*},t}}B_{(k^{*})_{u},f}\big)\cdot\alpha_{C_{f}}=\alpha_{C_{f}}$. Therefore, when an uncached content $f$ is requested, the normalized cost in (19), obtained by traversing all the candidate $\{k,f\}$ pairs of the set $\mathcal{L}_{k,f}$, again equals the aggregated cost in (20).
Based on the discussion above we have

\sum_{\{k,f\}\in\mathcal{L}_{k,f}}c_{k,f}\cdot\alpha_{C_{f}}=\sum_{\{k,f\}\in\mathcal{L}_{k,f}}\min\Big(1,\sum_{u\in\mathcal{L}^{*}_{k,t}}B_{(k)_{u},f}\Big)\cdot\alpha_{C_{f}}. \qquad (21)

This completes the proof of Proposition 2. \hfill\blacksquare

Appendix C Proof of Theorem 1.

The network cost minimization problem is given by

\begin{split}
\mathcal{D}(\{c_{k,f}\})=&\sum_{k=1}^{K}\sum_{f=1}^{M}c_{k,f}\cdot\alpha_{C_{f}}+\sum_{k=1}^{K}\sum_{f=1}^{M}\psi_{k,t}^{*}\cdot p_{f}\cdot c_{k,f}\cdot\beta_{S_{k}}\\
&+\sum_{k=1}^{K}\sum_{f=1}^{M}\psi_{k,t}^{*}\cdot p_{f}\cdot(1-c_{k,f})\cdot d_{k,f}.
\end{split} \qquad (22)

Let 𝒜\mathcal{A} and \mathcal{B} be two distinct subsets of 𝒜t\mathcal{A}_{t}, i.e., 𝒜𝒜t\mathcal{A}\subseteq\mathcal{A}_{t}, 𝒜t\mathcal{B}\subseteq\mathcal{A}_{t}, and 𝒜\mathcal{A}\neq\mathcal{B}. When keeping in the summation only a subset of the terms and all the terms are positive [23], i.e., ck,f0c_{k,f}\geq 0, αCf0\alpha_{C_{f}}\geq 0, λf,t=ψk,tpf0\lambda_{f,t}^{*}=\psi_{k,t}^{*}\cdot p_{f}\geq 0, βSk0\beta_{S_{k}}\geq 0, (1ck,f)0(1-c_{k,f})\geq 0, and dk,f0d_{k,f}\geq 0, the following inequalities hold, i.e., k=1Kf=1Mck,fαCf{k,f}𝒜ck,fαCf\sum\nolimits_{k=1}^{K}\sum\nolimits_{f=1}^{M}c_{k,f}\cdot\alpha_{C_{f}}\geq\sum\nolimits_{\{k,f\}\in\mathcal{A}}c_{k,f}\cdot\alpha_{C_{f}}, k=1Kf=1Mλf,tck,fβSk{k,f}𝒜λf,tck,fβSk\sum\nolimits_{k=1}^{K}\sum\nolimits_{f=1}^{M}\lambda_{f,t}^{*}\cdot c_{k,f}\cdot\beta_{S_{k}}\geq\sum\nolimits_{\{k,f\}\in\mathcal{A}}\lambda_{f,t}^{*}\cdot c_{k,f}\cdot\beta_{S_{k}}, and k=1Kf=1Mλf,t(1ck,f)dk,f{k,f}λf,t(1ck,f)dk,f\sum\nolimits_{k=1}^{K}\sum\nolimits_{f=1}^{M}\lambda_{f,t}^{*}\cdot(1-c_{k,f})\cdot d_{k,f}\geq\sum\nolimits_{\{k,f\}\in\mathcal{B}}\lambda_{f,t}^{*}\cdot(1-c_{k,f})\cdot d_{k,f}. Thus, the network cost 𝒟({ck,f})\mathcal{D}(\{c_{k,f}\}) in (22) can be rewritten by

\begin{split}
\mathcal{D}(\{c_{k,f}\})&=\sum_{k=1}^{K}\sum_{f=1}^{M}c_{k,f}\cdot\alpha_{C_{f}}+\sum_{k=1}^{K}\sum_{f=1}^{M}\lambda_{f,t}^{*}\cdot c_{k,f}\cdot\beta_{S_{k}}\\
&\quad+\sum_{k=1}^{K}\sum_{f=1}^{M}\lambda_{f,t}^{*}\cdot(1-c_{k,f})\cdot d_{k,f}\\
&\geq\sum_{\{k,f\}\in\mathcal{A}}c_{k,f}\cdot\alpha_{C_{f}}+\sum_{\{k,f\}\in\mathcal{A}}\lambda_{f,t}^{*}\cdot c_{k,f}\cdot\beta_{S_{k}}\\
&\quad+\sum_{\{k,f\}\in\mathcal{B}}\lambda_{f,t}^{*}\cdot(1-c_{k,f})\cdot d_{k,f}\\
&\overset{(a)}{\geq}\sum_{\{k,f\}\in\mathcal{A}}c_{k,f}\cdot\alpha_{C_{f}}+\sum_{\{k,f\}\in\mathcal{A}}\lambda_{f,t}^{*}\cdot c_{k,f}\cdot\beta_{S_{k}},
\end{split} \qquad (23)

where inequality $(a)$ holds if one neglects the normalized cost for retrieving files from the MBS, i.e., $d_{k,f}=0$.

Based on Proposition 2, we arrive at the following equivalent transformations of the normalized cost for caching contents in (18) and the normalized cost for retrieving contents in (22):

\sum_{\{k,f\}\in\mathcal{A}}c_{k,f}\cdot\alpha_{C_{f}}=\sum_{\{k,f\}\in\mathcal{A}}\Big\{\min\Big(1,\sum_{u\in\mathcal{L}^{*}_{k,t}}B_{(k)_{u},f}\Big)\cdot\alpha_{C_{f}}\Big\}, \qquad (24)
\sum_{\{k,f\}\in\mathcal{A}}\lambda_{f,t}^{*}\cdot c_{k,f}\cdot\beta_{S_{k}}=\sum_{\{k,f\}\in\mathcal{A}}\Big\{\lambda_{f,t}^{*}\cdot\min\Big(1,\sum_{u\in\mathcal{L}^{*}_{k,t}}B_{(k)_{u},f}\Big)\cdot\beta_{S_{k}}\Big\}. \qquad (25)

Substituting (24) and (25) back into (23) yields (26).

Based on the fact that $\min\big(1,\sum_{i=1}^{I}x_{i}\big)\geq\frac{1}{I}\sum_{i=1}^{I}\min(1,x_{i})$ for nonnegative $x_{i}$, we have $\min\big(1,\sum_{u\in\mathcal{L}^{*}_{k,t}}B_{(k)_{u},f}\big)\geq\frac{1}{\|\mathcal{L}^{*}_{k,t}\|_{0}}\sum_{u\in\mathcal{L}^{*}_{k,t}}\min\big(1,B_{(k)_{u},f}\big)$. Substituting this lower bound into (26) results in (27).

\begin{split}
\mathcal{D}(\{c_{k,f}\})&=\sum_{k=1}^{K}\sum_{f=1}^{M}c_{k,f}\cdot\alpha_{C_{f}}+\sum_{k=1}^{K}\sum_{f=1}^{M}\lambda_{f,t}^{*}\cdot c_{k,f}\cdot\beta_{S_{k}}+\sum_{k=1}^{K}\sum_{f=1}^{M}\lambda_{f,t}^{*}\cdot(1-c_{k,f})\cdot d_{k,f}\\
&\geq\sum_{\{k,f\}\in\mathcal{A}}c_{k,f}\cdot\alpha_{C_{f}}+\sum_{\{k,f\}\in\mathcal{A}}\lambda_{f,t}^{*}\cdot c_{k,f}\cdot\beta_{S_{k}}+\sum_{\{k,f\}\in\mathcal{B}}\lambda_{f,t}^{*}\cdot(1-c_{k,f})\cdot d_{k,f}\\
&\geq\sum_{\{k,f\}\in\mathcal{A}}\Big\{\min\Big(1,\sum_{u\in\mathcal{L}^{*}_{k,t}}B_{(k)_{u},f}\Big)\cdot\alpha_{C_{f}}\Big\}+\sum_{\{k,f\}\in\mathcal{A}}\Big\{\lambda_{f,t}^{*}\cdot\min\Big(1,\sum_{u\in\mathcal{L}^{*}_{k,t}}B_{(k)_{u},f}\Big)\cdot\beta_{S_{k}}\Big\},
\end{split} \qquad (26)
\begin{split}
\mathcal{D}(\{c_{k,f}\})&\geq\sum_{\{k,f\}\in\mathcal{A}}\Big\{\min\Big(1,\sum_{u\in\mathcal{L}^{*}_{k,t}}B_{(k)_{u},f}\Big)\cdot\alpha_{C_{f}}\Big\}+\sum_{\{k,f\}\in\mathcal{A}}\Big\{\lambda_{f,t}^{*}\cdot\min\Big(1,\sum_{u\in\mathcal{L}^{*}_{k,t}}B_{(k)_{u},f}\Big)\cdot\beta_{S_{k}}\Big\}\\
&\geq\sum_{\{k,f\}\in\mathcal{A}}\Big\{\frac{1}{\|\mathcal{L}^{*}_{k,t}\|_{0}}\sum_{u\in\mathcal{L}^{*}_{k,t}}\min\Big(1,B_{(k)_{u},f}\Big)\cdot\alpha_{C_{f}}\Big\}+\sum_{\{k,f\}\in\mathcal{A}}\Big\{\lambda_{f,t}^{*}\cdot\beta_{S_{k}}\cdot\frac{1}{\|\mathcal{L}^{*}_{k,t}\|_{0}}\sum_{u\in\mathcal{L}^{*}_{k,t}}\min\Big(1,B_{(k)_{u},f}\Big)\Big\}\\
&=\frac{1}{\|\mathcal{L}^{*}_{k,t}\|_{0}}\sum_{\{k,f\}\in\mathcal{A}}\Big\{\sum_{u\in\mathcal{L}^{*}_{k,t}}\min\Big(1,B_{(k)_{u},f}\Big)\cdot\alpha_{C_{f}}\Big\}+\frac{1}{\|\mathcal{L}^{*}_{k,t}\|_{0}}\sum_{\{k,f\}\in\mathcal{A}}\Big\{\lambda_{f,t}^{*}\cdot\beta_{S_{k}}\cdot\sum_{u\in\mathcal{L}^{*}_{k,t}}\min\Big(1,B_{(k)_{u},f}\Big)\Big\}.
\end{split} \qquad (27)
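The min-splitting inequality used above, $\min\big(1,\sum_{i}x_{i}\big)\geq\frac{1}{I}\sum_{i}\min(1,x_{i})$ for nonnegative $x_{i}$, can be spot-checked numerically (in the proof, the $x_{i}$ are the binary variables $B_{(k)_{u},f}$, but the inequality holds for any nonnegative values):

```python
def lhs(xs):
    # min(1, sum_i x_i)
    return min(1.0, sum(xs))

def rhs(xs):
    # (1/I) * sum_i min(1, x_i)
    return sum(min(1.0, x) for x in xs) / len(xs)

# A few nonnegative test vectors, including 0/1 vectors as in the proof
cases = [[0.0, 0.0, 1.0], [1.0, 1.0], [0.2, 0.3], [0.0] * 4, [2.0, 0.5, 0.1]]
checks = [lhs(xs) >= rhs(xs) for xs in cases]   # every entry is True
```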

Next, let $c^{*}_{k,f}$ be the optimal cache placement solution to the optimization problem (16). In addition, define $B^{*}_{(k)_{u},f}$ as the optimal binary variable deciding whether file $f$, requested by MU $u\in\mathcal{L}^{*}_{k,t}$, is selected to be cached. Thus we have $\sum\nolimits_{\{k,f\}\in\mathcal{A}}\big\{\sum\nolimits_{u\in\mathcal{L}^{*}_{k,t}}\min\big(1,B_{(k)_{u},f}\big)\cdot\alpha_{C_{f}}\big\}\geq\sum\nolimits_{\{k,f\}\in\mathcal{A}}\big\{\sum\nolimits_{u\in\mathcal{L}^{*}_{k,t}}\min\big(1,B^{*}_{(k)_{u},f}\big)\cdot\alpha_{C_{f}}\big\}$ and $\sum\nolimits_{\{k,f\}\in\mathcal{A}}\big\{\lambda_{f,t}^{*}\cdot\sum\nolimits_{u\in\mathcal{L}^{*}_{k,t}}\min\big(1,B_{(k)_{u},f}\big)\cdot\beta_{S_{k}}\big\}\geq\sum\nolimits_{\{k,f\}\in\mathcal{A}}\big\{\lambda_{f,t}^{*}\cdot\sum\nolimits_{u\in\mathcal{L}^{*}_{k,t}}\min\big(1,B^{*}_{(k)_{u},f}\big)\cdot\beta_{S_{k}}\big\}$. Besides, by the properties in Proposition 2, we have $\sum\nolimits_{\{k,f\}\in\mathcal{L}_{k,f}^{K_{ite}}} c_{k,f}^{*}=\sum\nolimits_{\{k,f\}\in\mathcal{L}_{k,f}^{K_{ite}}}\min\big(1,\sum\nolimits_{u\in\mathcal{L}^{*}_{k,t}} B^{*}_{(k)_{u},f}\big)$, where $K_{ite}$ is the number of iterations required by Algorithm 2. In addition, let $\mathcal{D}\big(\{c_{k,f}\}\,|\,d_{k,f}=0\big)$ denote the network cost of the small-cell network given that $d_{k,f}=0$. Eventually, we arrive at

\begin{split}
\mathcal{D}\left(\{c_{k,f}\}\,|\,d_{k,f}=0\right) &\geq \frac{1}{\|\mathcal{L}^{*}_{k,t}\|_{0}}\sum\limits_{\{k,f\}\in\mathcal{A}}\left\{\sum\limits_{u\in\mathcal{L}^{*}_{k,t}}\min\big(1, B^{*}_{(k)_{u},f}\big)\cdot\alpha_{C_{f}}\right\}\\
&\quad+ \frac{1}{\|\mathcal{L}^{*}_{k,t}\|_{0}}\sum\limits_{\{k,f\}\in\mathcal{A}}\left\{\lambda_{f,t}^{*}\cdot\beta_{S_{k}}\cdot\sum\limits_{u\in\mathcal{L}^{*}_{k,t}}\min\big(1, B^{*}_{(k)_{u},f}\big)\right\}\\
&= \frac{1}{\|\mathcal{L}^{*}_{k,t}\|_{0}}\,\mathcal{D}\left(\{c^{*}_{k,f}\}\,|\,d_{k,f}=0\right).
\end{split} \quad (28)

From (28), we conclude that $\mathcal{D}\big(\{c_{k,f}\}\,|\,d_{k,f}=0\big)\geq\frac{1}{L}\mathcal{D}\big(\{c^{*}_{k,f}\}\,|\,d_{k,f}=0\big)$, where $L=\|\mathcal{L}^{*}_{k,t}\|_{0}$. Based on the above observations, we deduce that when $d_{k,f}=0$, the proposed cache placement policy in Algorithm 2 is indeed a polynomial-time $L$-approximation algorithm. This completes the proof of Theorem 1. $\hfill\blacksquare$
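The pivotal relaxation in (27) replaces the coverage indicator $\min\big(1,\sum_{u}B_{(k)_{u},f}\big)$ with its per-user average $\frac{1}{L}\sum_{u}\min\big(1,B_{(k)_{u},f}\big)$, which is what ultimately yields the factor $1/L$. The following sketch (not part of the original proof; the helper names `covered` and `per_user_avg` are purely illustrative) exhaustively verifies this bound for binary caching variables over small user sets:

```python
import itertools

def covered(b):
    # min(1, sum_u B_u): 1 iff at least one user's copy of the file is cached
    return min(1, sum(b))

def per_user_avg(b):
    # (1/L) * sum_u min(1, B_u): fraction of users whose copy is cached
    return sum(min(1, x) for x in b) / len(b)

# Exhaustively check covered(b) >= per_user_avg(b) over all binary
# vectors b of length L = 1..6 (2^1 + ... + 2^6 = 126 cases)
for L in range(1, 7):
    for b in itertools.product([0, 1], repeat=L):
        assert covered(b) >= per_user_avg(b)

print("bound min(1, sum B) >= (1/L) * sum min(1, B) holds in all cases")
```

The bound is tight in both extremes: if no user's copy is cached, both sides are 0; if every user's copy is cached, both sides are 1. The gap is largest when exactly one of $L$ copies is cached, which is precisely the case that forces the $L$-approximation factor.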

References

  • [1] Y. Cao, S. Maghsudi, and T. Ohtsuki, “Mobility-aware routing and caching: A federated learning assisted approach,” in ICC 2021 - IEEE International Conference on Communications, 2021, pp. 1–6.
  • [2] H. Ahlehagh and S. Dey, “Video-aware scheduling and caching in the radio access network,” IEEE/ACM Trans. Netw., vol. 22, no. 5, pp. 1444–1462, Oct. 2014.
  • [3] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, “Femtocaching: Wireless content delivery through distributed caching helpers,” IEEE Trans. Inf. Theory, vol. 59, no. 12, pp. 8402–8413, Dec. 2013.
  • [4] S. Maghsudi and M. van der Schaar, “A non-stationary bandit-learning approach to energy-efficient femto-caching with rateless-coded transmission,” IEEE Trans. Wireless Commun., vol. 19, no. 7, pp. 5040–5056, Jul. 2020.
  • [5] S. Maghsudi and E. Hossain, “Distributed user association in energy harvesting small cell networks: A probabilistic bandit model,” IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1549–1563, Mar. 2017.
  • [6] S. Li, J. Xu, M. van der Schaar, and W. Li, “Trend-aware video caching through online learning,” IEEE Trans. Multimedia, vol. 18, no. 12, pp. 2503–2516, Dec. 2016.
  • [7] J. Konečný, H. McMahan, F. Yu, P. Richtárik, A. Suresh, and D. Bacon, “Federated learning: Strategies for improving communication efficiency,” arXiv preprint arXiv:1610.05492, Oct. 2017.
  • [8] W. Y. B. Lim, N. C. Luong, D. T. Hoang, Y. Jiao, Y.-C. Liang, Q. Yang, D. Niyato, and C. Miao, “Federated learning in mobile edge networks: A comprehensive survey,” arXiv preprint arXiv:1909.11875, 2019.
  • [9] H. H. Yang, Z. Liu, T. Q. S. Quek, and H. V. Poor, “Scheduling policies for federated learning in wireless networks,” IEEE Trans. Commun., vol. 68, no. 1, pp. 317–333, Jan. 2020.
  • [10] G. Zhu, Y. Wang, and K. Huang, “Low-latency broadband analog aggregation for federated edge learning,” arXiv preprint arXiv:1812.11494, 2018.
  • [11] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated learning: Challenges, methods, and future directions,” IEEE Signal Process. Mag., vol. 37, no. 3, pp. 50–60, May 2020.
  • [12] M. Chen, H. V. Poor, W. Saad, and S. Cui, “Convergence time optimization for federated learning over wireless networks,” IEEE Trans. Wireless Commun., vol. 20, no. 4, pp. 2457–2471, Apr. 2021.
  • [13] L. Zhang, M. Xiao, G. Wu, and S. Li, “Efficient scheduling and power allocation for D2D-assisted wireless caching networks,” IEEE Trans. Commun., vol. 64, no. 6, pp. 2438–2452, Jun. 2016.
  • [14] S. Boyd and L. Vandenberghe, Convex Optimization.   Cambridge, UK: Cambridge University Press, 2004.
  • [15] R. Wang, J. Zhang, S. H. Song, and K. B. Letaief, “Exploiting mobility in cache-assisted D2D networks: Performance analysis and optimization,” IEEE Trans. Wireless Commun., vol. 17, no. 8, pp. 5592–5605, Aug. 2018.
  • [16] S. Maghsudi and D. Niyato, “On transmission mode selection in D2D-enhanced small cell networks,” IEEE Wireless Commun. Lett., vol. 6, no. 5, pp. 618–621, Oct. 2017.
  • [17] S. Maghsudi and S. Stańczak, “Transmission mode selection for network-assisted device to device communication: A Lévy-bandit approach,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jul. 2014, pp. 7009–7013.
  • [18] M. Gregori, J. Gómez-Vilardebó, J. Matamoros, and D. Gündüz, “Wireless content caching for small cell and D2D networks,” IEEE J. Sel. Areas Commun., vol. 34, no. 5, pp. 1222–1234, May 2016.
  • [19] S. Müller, O. Atan, M. van der Schaar, and A. Klein, “Context-aware proactive content caching with service differentiation in wireless networks,” IEEE Trans. Wireless Commun., vol. 16, no. 2, pp. 1024–1036, Feb. 2017.
  • [20] S. Müller, O. Atan, M. van der Schaar, and A. Klein, “Smart caching in wireless small cell networks via contextual multi-armed bandits,” in 2016 IEEE International Conference on Communications (ICC), Jul. 2016, pp. 1–7.
  • [21] B. N. Bharath, K. G. Nagananda, and H. V. Poor, “A learning-based approach to caching in heterogenous small cell networks,” IEEE Trans. Commun., vol. 64, no. 4, pp. 1674–1686, Apr. 2016.
  • [22] S. Maghsudi and M. van der Schaar, “A bandit learning approach to energy-efficient femto-caching under uncertainty,” in 2019 IEEE Global Communications Conference (GLOBECOM), Dec. 2019, pp. 1–6.
  • [23] K. Poularakis, G. Iosifidis, V. Sourlas, and L. Tassiulas, “Exploiting caching and multicast for 5G wireless networks,” IEEE Trans. Wireless Commun., vol. 15, no. 4, pp. 2995–3007, Apr. 2016.
  • [24] M. A. Maddah-Ali and U. Niesen, “Fundamental limits of caching,” IEEE Trans. Inf. Theory, vol. 60, no. 5, pp. 2856–2867, May 2014.
  • [25] Y. N. Shnaiwer, S. Sorour, N. Aboutorab, P. Sadeghi, and T. Y. Al-Naffouri, “Network-coded content delivery in femtocaching-assisted cellular networks,” in 2015 IEEE Global Communications Conference (GLOBECOM), Dec. 2015, pp. 1–6.
  • [26] R. Wang, J. Zhang, S. H. Song, and K. B. Letaief, “Mobility-aware caching in D2D networks,” IEEE Trans. Wireless Commun., vol. 16, no. 8, pp. 5001–5015, Aug. 2017.
  • [27] M. Chen, Y. Hao, L. Hu, K. Huang, and V. K. N. Lau, “Green and mobility-aware caching in 5G networks,” IEEE Trans. Wireless Commun., vol. 16, no. 12, pp. 8347–8361, Dec. 2017.
  • [28] Y. Guan, Y. Xiao, H. Feng, C. Shen, and L. J. Cimini, “Mobicacher: Mobility-aware content caching in small-cell networks,” in 2014 IEEE Global Communications Conference, Dec. 2014, pp. 4537–4542.
  • [29] Z. Yang, Y. Liu, Y. Chen, and L. Jiao, “Learning automata based Q-learning for content placement in cooperative caching,” IEEE Trans. Commun., vol. 68, no. 6, pp. 3667–3680, Jun. 2020.
  • [30] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, “Web caching and Zipf-like distributions: Evidence and implications,” in IEEE INFOCOM’99, vol. 1, Mar. 1999, pp. 126–134.
  • [31] M. Hefeeda and O. Saleh, “Traffic modeling and proportional partial caching for peer-to-peer systems,” IEEE/ACM Trans. Netw., vol. 16, no. 6, pp. 1447–1460, Dec. 2008.
  • [32] J. Konečný, H. McMahan, D. Ramage, and P. Richtárik, “Federated optimization: Distributed machine learning for on-device intelligence,” arXiv preprint arXiv:1610.02527, Oct. 2016.
  • [33] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial Intelligence and Statistics (AISTATS), 2017, pp. 1273–1282.
  • [34] N. P. Kuruvatti, A. Klein, and H. D. Schotten, “Prediction of dynamic crowd formation in cellular networks for activating small cells,” in 2015 IEEE 81st Vehicular Technology Conference, Jul. 2015, pp. 1–5.
  • [35] K. Bennett, P. Bradley, and A. Demiriz, “Constrained K-means clustering,” Microsoft Research, Tech. Rep. MSR-TR-2000-65, May 2000, pp. 1–9.
  • [36] M. Chen, Z. Yang, W. Saad, C. Yin, H. V. Poor, and S. Cui, “Performance optimization of federated learning over wireless networks,” in 2019 IEEE Global Communications Conference (GLOBECOM), Dec. 2019, pp. 1–6.
  • [37] H. Kellerer, U. Pferschy, and D. Pisinger, Knapsack Problems.   Berlin, Germany: Springer-Verlag, 2004.
  • [38] F. M. Harper and J. A. Konstan, “The MovieLens datasets: History and context,” ACM Trans. Interact. Intell. Syst., vol. 5, no. 4, Dec. 2015.
  • [39] B. Tavli and W. Heinzelman, “Energy-efficient real-time multicast routing in mobile ad hoc networks,” IEEE Trans. Comput., vol. 60, no. 5, pp. 707–722, May 2011.
  • [40] T. Imielinski and H. F. Korth, Mobile computing.   Springer Science & Business Media, 1996, vol. 353.