PPVF: An Efficient Privacy-Preserving Online Video Fetching Framework with Correlated Differential Privacy

Xianzhi Zhang, Yipeng Zhou, Di Wu, Quan Z. Sheng, Miao Hu, and Linchang Xiao

Xianzhi Zhang, Di Wu, Miao Hu, and Linchang Xiao are with the School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, China, and the Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University), Ministry of Education, China. E-mail: {zhangxzh9, xiaolch3}@mail2.sysu.edu.cn; {wudi27, humiao5}@mail.sysu.edu.cn. (Di Wu is the corresponding author.)

Yipeng Zhou and Quan Z. Sheng are with the School of Computing, Macquarie University, NSW 2109, Australia. E-mail: {yipeng.zhou, michael.sheng}@mq.edu.au.

Abstract

Online video streaming has evolved into an integral component of the contemporary Internet landscape. Yet, the disclosure of user requests presents formidable privacy challenges. As users stream their preferred online videos, their requests are automatically recorded by video content providers, potentially leaking users’ privacy. Unfortunately, current protection methods are not well-suited to preserving user request privacy from content providers while maintaining high-quality online video services. To tackle this challenge, we introduce a novel Privacy-Preserving Video Fetching (PPVF) framework, which utilizes trusted edge devices to pre-fetch and cache videos, ensuring the privacy of users’ requests while optimizing the efficiency of edge caching. More specifically, we design PPVF with three core components: (1) Online privacy budget scheduler, which employs a theoretically guaranteed online algorithm to select non-requested videos as candidates with assigned privacy budgets, considering both video utilities and available privacy budgets. (2) Noisy video request generator, which generates redundant video requests (in addition to original ones) utilizing correlated differential privacy to obfuscate request privacy. (3) Online video utility predictor, which leverages federated learning to collaboratively evaluate video utility in an online fashion, aiding in video selection in (1) and noise generation in (2). Finally, we conduct extensive experiments using real-world video request traces from Tencent Video. The results demonstrate that PPVF effectively safeguards user request privacy while upholding high video caching performance.

Index Terms:

Request Privacy, Video Pre-Fetching, Edge Caching, Online Algorithm, Correlated Differential Privacy.

I Introduction

Online video streaming has become an indispensable service in our daily lives, serving billions of Internet users by streaming diverse videos, including movies, news, and TV episodes. In 2023, YouTube alone provided over 5 billion online videos daily to more than 122 million Internet users, with a total daily playback time exceeding 1 billion hours [1]. To serve such a huge number of users, it is common to leverage edge devices (EDs) to pre-fetch video content for users. Such EDs, including user devices [2, 3, 4], mobile vehicles [5, 6, 7], access points (APs) [4, 8], and edge nodes of the content delivery network (CDN) [9], can significantly reduce communications between users and video content providers (CPs). Meanwhile, serving users with content cached on EDs can achieve a high quality of service (QoS) and a low playback latency for video streaming.

However, with the proliferation of the online video market, the request privacy leakage concern has risen [7, 10, 11]. When users request videos from online CPs, their request traces are automatically recorded by CPs. Analyzing these traces can potentially reveal sensitive privacy information such as gender [9], age [12], location [2, 6], and hobbies [13, 14]. The leakage of request privacy poses significant risks to users with the spread of spam and scams [15]. Consequently, it is urgent and essential to develop effective video request strategies with preserved user privacy [16].

Various privacy-preserving methodologies, such as encryption, federated learning (FL) and differential privacy (DP), have emerged, but applying them to preserve video request privacy is non-trivial for the following reasons. Encryption-based methods, e.g., Hypertext Transfer Protocol Secure (HTTPS) [17], can only shield against privacy threats from external attackers when delivering videos. FL is a generic framework to preserve privacy by only exposing parameters when training machine learning models for edge devices [18, 19]. Yet, neither approach conceals request traces from CPs because user requests must remain visible to CPs for video streaming services to function properly. DP is a widely utilized approach to safeguarding user privacy in machine learning systems [12, 20, 21]. However, directly distorting user requests with DP noise can severely impair video streaming efficiency, because many of the requested videos would fall outside users’ interests.

To mitigate video request privacy leakage, we propose a novel Privacy-Preserving Video Fetching (PPVF) framework that jointly utilizes video pre-fetching, FL and correlated differential privacy (CDP) so that they supplement each other and overcome the deficiency of each single methodology. For deployment in practice, our framework can be implemented on trusted EDs, e.g., user devices [2, 3, 4], vehicles [5, 6, 7], and access points (APs) [4, 8], that are owned or trusted by users. By utilizing the PPVF framework, trusted EDs can deliver videos efficiently while safeguarding the privacy of viewing records. In summary, our main contributions are listed below:

• To the best of our knowledge, we are among the first to propose an online privacy budget scheduler for reconciling privacy and efficiency in online video systems. Candidate videos with assigned privacy budgets are selected by a threshold-based online algorithm and then passed to the subsequent distortion process. Additionally, the performance of the allocation algorithm is theoretically guaranteed by competitive analysis.

• We leverage Correlated Differential Privacy (CDP) to design the noisy video request generator for generating video requests (including redundant noisy ones) in edge video caching systems. By taking correlated video request patterns into account, CDP can more accurately calibrate the noise scale when distorting pre-fetching requests and hence avoid injecting excessive noise.

• To predict video utility (serving as the pivotal prior knowledge for video selection and noise generation), we further construct an FL-based online video utility predictor, which only exposes non-critical parameters to collaboratively evaluate video utility in an online and privacy-preserving mode.

• We conduct extensive experiments by using real-world request traces collected from Tencent Video [22] to validate the superiority of PPVF. The experimental results demonstrate that, compared with the state-of-the-art baselines, PPVF preserves privacy best without significantly compromising caching performance.

The remainder of the paper is organized as follows. The PPVF system architecture, the threat model, and the problem formulation are presented in Section II. In Section III, we introduce novel algorithms for allocating the privacy budget and determining the pre-fetching strategy in edge caching systems. Section IV presents the video utility predictor obtained via federated learning and online parameter estimation methods. The experimental results are reported in Section V, followed by a discussion of related work in Section VI. Finally, we conclude our paper in Section VII.

II System Model and Preliminary

In this section, we introduce the system model of our PPVF framework and the main entities in PPVF. To facilitate readability, we have summarized notations in Table I.

II-A System Architecture

There are three types of entities in the system, which are briefly introduced as follows:

• Content Provider (CP): The CP is an online video service provider possessing a comprehensive set of I videos denoted by $\mathcal{I}=\{i_{1},i_{2},\cdots,i_{\text{I}}\}$ . The CP also collects users’ request traces to enhance its services, e.g., recommendation [23, 13] and advertisement [12, 14], and is therefore regarded as the main risk entity in our privacy model.

• Trusted Edge Devices (EDs): In PPVF, we denote $\mathcal{E}=\{e_{1},e_{2},\cdots,e_{\text{E}}\}$ as the set of all EDs which can be trusted by users [24, 10, 11]. Each ED $e\in\mathcal{E}$ has a storage limitation $c_{e}$ in our system model. EDs play two critical roles: i) caching videos fetched from the CP to serve users’ requests with a higher quality of service (QoS), and ii) preserving video request privacy to conceal users’ real preferences.¹

• Users: Users are final consumers of videos. Rather than directly exposing video requests to the CP, users in PPVF only submit their requests to corresponding trusted EDs, which act as agents to fetch videos from the CP for users.

¹ Our main focus is to design a privacy framework for trusted edge devices, which may deploy Trusted Execution Environments (TEEs), such as Intel SGX or TrustZone for Cortex-M, to convince users and execute instructions reliably.

II-A1 Interactions Between EDs and Users

When a user needs to watch video content $i^{\prime}\in\mathcal{I}$ at time $\tau\in[0,T)$ , the user submits a request (denoted by a view content vector $\bm{v}_{e}=[v_{e,i}]^{\text{I}}$ where $v_{e,i^{\prime}}=1$ and $v_{e,i}=0,\forall i\neq i^{\prime},i\in\mathcal{I}$ ) to the corresponding ED $e$ . Here, time interval $[0,T)$ is the observed window in our problem. If the requested video is cached at ED $e$ , the video can be streamed directly from the ED to the user. Otherwise, ED $e$ needs to fetch the missing video $i^{\prime}$ from the CP, together with some redundant pre-fetched videos. We denote $k\in\mathbb{N}^{+}\,,\,1\leq k\leq\text{K}_{e}$ as the index of the cache missing request that needs to be fetched from the remote CP by ED $e$ , where $\text{K}_{e}$ represents the total number of cache missing requests at ED $e$ in $[0,T)$ .

Note that all EDs can authentically record all video requests from users to evaluate the utility for video pre-fetching and caching [24, 4, 12]. For any ED $e$ , let $\mathcal{V}_{e}^{k}=\{\,(i,\tau)\,\mid\,i\in\mathcal{I},\tau\in[0,t^{k}),\tau\in\mathbb{R},1\leq k\leq\text{K}_{e}\}$ denote the set of all historical viewing requests up to the time $t^{k}$ , where $t^{k}$ is the time of the $k$ -th cache missing request. Besides, we denote $\mathcal{T}_{e,i}^{t_{1},t_{2}}$ as the timestamp set of the viewing requests for video $i$ within time window $[t_{1},t_{2})$ at ED $e$ .

TABLE I: Main notations used in the paper.

Notation	Description
$\mathcal{E}$ / $\mathcal{I}$	The space set of all edge devices (EDs) / videos.
$i$ / $e$	The index of any video / ED.
$k$	The index of request that is missed at a specific ED and needs to be fetched from the CP.
$t^{k}$	The timestamp of the $k$ -th cache missing request.
$\mathcal{V}_{e}$ / $\mathcal{V}_{e}^{k}$	The set of all viewing requests at ED $e$ in time $[0,T)$ / $[0,t^{k})$ .
$\mathcal{T}_{e,i}$ / $\mathcal{T}_{e,i}^{t_{1},t_{2}}$	The timestamp set of the viewing requests of video $i$ arriving at ED $e$ in time window $[0,T)$ / $[t_{1},t_{2})$ .
$\bm{x}_{e}^{k}$ / $\bm{v}_{e}^{k}$	The pre-fetching / viewing vector for the $k$ -th cache missing request at ED $e$ .
$c_{e}$ / $f_{e}$	The caching / pre-fetching capability of ED $e$ .
$\bm{a}_{e}^{k}$	The privacy budget allocation vector for the $k$ -th pre-fetching at ED $e$ .
$\mathcal{A}_{e}^{k}$	The candidate video set for generating redundant requests for the $k$ -th pre-fetching at ED $e$ .
$\xi_{e,i}$ / $\epsilon_{e,i}$	The total privacy budget / once privacy cost of ED $e$ with respect to video $i$ .
$\lambda_{e,i}^{k}$	The utility of pre-fetching and caching video $i$ in time $t^{k}$ at ED $e$ .
$\bm{\theta}=\{\bm{\beta},\bm{p},\bm{q}\}$	The parameters of MEP model.
$t_{\theta}$	The update time point of online parameter estimation based on FL-framework.
$\bm{\Psi}^{k}_{e}$	The correlated degree matrix among different videos in time $t^{k}$ at ED $e$ .
$\bm{\psi}_{e}^{k}$ / $\bm{\alpha}_{e}^{k}$ / $\bm{\sigma}_{e}^{k}$	The historical matrices to calculate the correlated degree matrix $\bm{\Psi}^{k}$ in time $t^{k}$ at ED $e$ .
$\Delta\lambda_{e,i}^{k}$ / $\Delta\lambda^{k}_{e,gc}$	The correlated sensitivity for each video $i$ / global of the $k$ -th pre-fetching at ED $e$ .

II-A2 Interactions between the CP and EDs

In PPVF, EDs, in lieu of end users, interact with the CP. On the one hand, the CP delivers requested videos to EDs. On the other hand, since each ED has a limited number of local viewing requests, the CP needs to assist EDs in evaluating video utility with federated learning to enhance the quality of service.

The vector of pre-fetching requests is denoted by $\bm{x}_{e}^{k}=[x_{e,i}^{k}]^{\text{I}}$ , where $x_{e,i}^{k}\in\{0,1\},\forall e,i,k$ . To preserve privacy, ED $e$ utilizes $\bm{x}_{e}^{k}$ to obfuscate the original view request vector $\bm{v}_{e}^{k}$ for the $k$ -th cache missing request, which needs to be fetched from the CP. The videos finally fetched by ED $e$ from the CP are represented by $\bm{r}_{e}^{k}=[r_{e,i}^{k}]^{\text{I}}$ , where $r_{e,i}^{k}=v_{e,i}^{k}|x_{e,i}^{k}$ , i.e., the fetching vector sent by ED $e$ for the $k$ -th cache missing request. Here, the symbol ‘ $|$ ’ represents the ‘OR’ operator, indicating that ED $e$ fetches video $i$ if either $x_{e,i}^{k}$ or $v_{e,i}^{k}$ equals 1. Note that the generation of $\bm{v}_{e}^{k}$ is purely based on users’ view interests and is not affected by our strategies. Our study focuses on generating $\bm{x}_{e}^{k}$ for privacy protection.
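The relation between the three vectors can be illustrated with a minimal sketch. The function name `fetch_vector` and the toy vectors are illustrative assumptions, not part of PPVF itself; the sketch only shows the element-wise OR that combines the viewing vector with the pre-fetching vector.

```python
def fetch_vector(view, prefetch):
    """Element-wise OR of the viewing vector v and the pre-fetching vector x."""
    if len(view) != len(prefetch):
        raise ValueError("view and prefetch must have the same length")
    return [v | x for v, x in zip(view, prefetch)]


# Toy catalogue of I = 5 videos; the user asks for the video at index 2.
v = [0, 0, 1, 0, 0]
# Hypothetical pre-fetching decision that adds two redundant videos.
x = [1, 0, 0, 0, 1]
# The CP only observes r, in which the real request is mixed with the noise.
r = fetch_vector(v, x)
```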

Furthermore, with the assistance of the CP, we assume that EDs can periodically update the parameters of their local model for video utility prediction without disclosing their private data. The set of time points to execute the online parameter estimation is denoted by $t_{\theta}\in\mathcal{Q}$ , where $\mathcal{Q}=\{t_{\theta}|t_{\theta}\in[0,T),t_{\theta}\in\mathbb{N}\}$ . Note that $t_{\theta}$ represents the time point to update model parameters $\bm{\theta}$ , not the time point for requesting videos. Due to the limited caching space, ED $e$ updates its cached videos by fetching videos according to predicted video utility when its cache space is full. The video utility will be further specified in Section IV.

II-B Threat Model

In traditional online video systems, privacy threats related to video fetching primarily arise from the exposure of users’ video-request patterns and preferences. As users interact with the CP to access the online video services, their historical video requests and pre-fetching activities will be inadvertently exposed to the CP, which can accordingly infer sensitive information, such as age [12], gender [9], locations [2, 6], and favorites [13, 14], of users. Such threats are driven by the goal of enhancing services through caching or recommendation algorithms. CPs can exploit inferred sensitive information to gain insights into individual user preferences. Therefore, unauthorized access to user-specific information without protection poses a significant privacy threat, enabling CPs to infer personal preferences, potentially compromising users’ privacy.

II-C Problem Formulation

Let us first consider the global video caching problem without considering privacy leakage. When requesting videos missed by the edge cache from the CP, ED $e$ also makes requests for redundant videos based on pre-fetching decisions $\bm{x}_{e}\ =[x^{k}_{e,i}]^{\text{K}_{e}\times\text{I}}$ . Let $\bm{\lambda}_{e}\ =[\lambda^{k}_{e,i}]^{\text{K}_{e}\times\text{I}}$ denote all video utility values, e.g., the predicted rate to request videos by users, for any ED $e$ . The problem of maximizing pre-fetching and caching utility can be formulated by:


$$
\begin{aligned}
\mathbb{P}_{g}:\quad & \max_{\bm{x}_{e},\forall e}\,\sum_{e\in\mathcal{E}}\sum_{k=1}^{\text{K}_{e}}\sum_{i\in\mathcal{I}}\lambda_{e,i}^{k}\cdot x^{k}_{e,i} && && \text{(1a)}\\
\mathrm{s.t.}\quad & \sum_{i\in\mathcal{I}}x_{e,i}^{k}\leq f_{e}, && \forall e\in\mathcal{E},\,1\leq k\leq\text{K}_{e}, && \text{(1b)}\\
& x_{e,i}^{k}\in\{0,1\}, && \forall e\in\mathcal{E},\,\forall i\in\mathcal{I},\,1\leq k\leq\text{K}_{e}, && \text{(1c)}\\
& \lambda_{e,i}^{k}=h_{e}(i,t^{k}\mid\mathcal{V}_{e}^{k},\bm{\theta}), && \forall e\in\mathcal{E},\,\forall i\in\mathcal{I},\,1\leq k\leq\text{K}_{e}, && \text{(1d)}
\end{aligned}
$$

where $h_{e}:\mathcal{I}\times\mathbb{R}^{+}\rightarrow\mathbb{R}^{+}$ can represent any prediction function for utility with model parameters $\bm{\theta}$ and historical viewing records $\mathcal{V}_{e}^{k}$ up to time $t^{k}$ at ED $e$ . Besides, Eq. (1b) restricts the maximum pre-fetching capacity $f_{e}$ of ED $e$ .
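Because objective (1a) is a sum over pairs $(e,k)$ and constraints (1b) and (1c) apply to each pair separately, the privacy-agnostic problem can be solved independently for every cache miss by keeping the $f_{e}$ videos with the highest utility. The following minimal sketch illustrates this for a single pair, assuming that the utilities are already known and non-negative; the function name `best_prefetch` and the utility values are hypothetical.

```python
import heapq


def best_prefetch(utilities, capacity):
    """Return a 0/1 vector selecting the `capacity` highest-utility videos."""
    top = heapq.nlargest(capacity, range(len(utilities)),
                         key=utilities.__getitem__)
    chosen = set(top)
    return [1 if i in chosen else 0 for i in range(len(utilities))]


# Hypothetical utilities lambda_{e,i}^k for one ED e and one cache miss k.
lam = [0.10, 0.70, 0.05, 0.40, 0.25]
x = best_prefetch(lam, capacity=2)
# Value of the inner sum of objective (1a) for this (e, k) pair.
objective = sum(l * xi for l, xi in zip(lam, x))
```

This exact solution is what makes the unprotected setting risky: the pre-fetching vector is a deterministic function of the utilities, so it directly reveals which videos the ED considers most valuable.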

For traditional video streaming, the CP can collect historical video request records to infer $\bm{\lambda}_{e},\,\forall e$ , which can be further used to derive optimal solution $\bm{x}_{e}^{*}$ , for all EDs. In this process, the CP can exactly infer preferences exposed by EDs. To prevent privacy leakage, EDs can apply differential privacy (DP) noises to distort pre-fetching decisions $\bm{x}_{e}$ , to hide both users’ original video requests and video utility. It is difficult for the CP to infer user privacy from public fetching actions, and hence user privacy is preserved. In the rest of the subsection, we extend $\mathbb{P}_{g}$ to present the privacy-preserving video pre-fetching problem.

We begin by succinctly introducing DP, avoiding any unnecessary notation. In problem $\mathbb{P}_{g}$ , the pre-fetching decision variables $\bm{x}$ are primarily determined by the utility $\bm{\lambda}$ , which is evaluated by the function $h$ with the parameter $\bm{\theta}$ , and the set $\mathcal{V}$ of real request records. To preserve privacy, DP can be applied to distort the output of utility function $h$ to protect privacy in dataset $\mathcal{V}$ .

Definition 1.

( $\epsilon$ -Differential Privacy) A randomized mechanism $\mathcal{M}$ satisfies $\epsilon$ -DP if, for any pair of adjacent datasets $\mathcal{V}\simeq\mathcal{V}^{\prime}$ , any input tuple $(i,t)\in\mathcal{I}\times\mathbb{R}^{+}$ , and any prediction function $h$ with parameters $\bm{\theta}$ , it holds that:

$$
\frac{\Pr\{\mathcal{M}(h(i,t\mid\mathcal{V},\bm{\theta}))\in\mathcal{O}\}}{\Pr\{\mathcal{M}(h(i,t\mid\mathcal{V}^{\prime},\bm{\theta}))\in\mathcal{O}\}}\leq\exp(\epsilon). \tag{2}
$$

Here, $\epsilon$ is the privacy budget and $\mathcal{O}$ represents the outcome range of mechanism $\mathcal{M}$ .

Figure 1: The workflow of privacy-preserving video fetching (PPVF) for online video service at EDs.

However, in practical online video systems, cardinality I for $\mathcal{I}$ is a huge number, and users’ view preferences can be very skewed, implying that there exists a large number of cold videos with very few user requests [25, 4]. Thereby, directly applying DP noise to distort utilities for video pre-fetching confronts the following two challenges:

(1) More privacy budget will be consumed to protect privacy if there are more videos in $\mathcal{I}$ . If cardinality I is a large number, the noises will be excessively large such that the video utility predicted by the function $h$ is valueless.

(2) Considering that the CP can implement collaborative filtering algorithms to infer user privacy, requesting cold videos (with scarce requests) as noises is not effective in preserving privacy since collaborative filtering algorithms can easily remove the noisy influence of cold video requests [26].

To tackle these two challenges, we consider adding DP noises to protect pre-fetching privacy by the Exponential Mechanism (EM) [27] with a candidate video set selected in an online manner. We start with a brief introduction to the EM. The EM is a classical DP mechanism satisfying $\epsilon$ -DP, which can be applied to distort the output of utility function $h$ , defined as follows.

Definition 2.

(Exponential Mechanism) The exponential mechanism (EM) satisfies $\epsilon$ -differential privacy through the following steps: (1) specify a global sensitivity, denoted as $\Delta\lambda$ , for a video utility prediction function $h:\mathcal{I}\times\mathbb{R}^{+}\rightarrow\mathbb{R}^{+}$ ; (2) select video $i\in\mathcal{I}$ to request with the probability

$$
P\{i\}\propto\exp\left(\frac{\epsilon\cdot h(i,t\mid\mathcal{V},\bm{\theta})}{2\Delta\lambda}\right).
$$

Here, $\epsilon$ represents the privacy budget, $\mathcal{V}$ is the set of request records privately owned by an ED, and $\bm{\theta}$ represents parameters in function $h$ .
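A minimal sketch of the sampling step in Definition 2 is given below. It assumes that the utilities $h(i,t\mid\mathcal{V},\bm{\theta})$ have already been evaluated and are passed in as a plain list; the function names `em_probabilities` and `em_sample` are illustrative only. Subtracting the maximum score before exponentiation is a standard numerical-stability trick and does not change the resulting distribution.

```python
import math
import random


def em_probabilities(utilities, epsilon, sensitivity):
    """Probabilities proportional to exp(epsilon * h / (2 * sensitivity))."""
    scores = [epsilon * u / (2.0 * sensitivity) for u in utilities]
    shift = max(scores)  # subtract the max score to avoid overflow in exp
    weights = [math.exp(s - shift) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def em_sample(utilities, epsilon, sensitivity, rng=random):
    """Draw one video index according to the exponential mechanism."""
    probs = em_probabilities(utilities, epsilon, sensitivity)
    return rng.choices(range(len(utilities)), weights=probs, k=1)[0]


# Hypothetical utilities for three videos.
probs = em_probabilities([1.0, 2.0, 3.0], epsilon=1.0, sensitivity=1.0)
picked = em_sample([1.0, 2.0, 3.0], epsilon=1.0, sensitivity=1.0,
                   rng=random.Random(7))
```

A smaller $\epsilon$ or a larger sensitivity flattens the distribution toward uniform selection (stronger privacy, lower utility), whereas a larger $\epsilon$ concentrates the probability mass on the highest-utility videos.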

Instead of applying the EM to the whole video set $\mathcal{I}$ , we identify a candidate video set $\mathcal{A}_{e}^{k}\subseteq\mathcal{I}$ for ED $e$ to generate redundant requests. Here, $k$ is the pre-fetching index for the request missed by the edge cache. The whole candidate video set $\mathcal{A}_{e}=\{\mathcal{A}_{e}^{k}\ |\ 1\leq k\leq\text{K}_{e}\}$ is obtained by solving problem $\mathbb{P}_{e}$ subject to both the privacy budget and pre-fetching capacity constraints. In problem $\mathbb{P}_{e}$ , constraint (3b) ensures that the privacy budget assigned to each video cannot exceed the total privacy budget $\xi_{e}$ , and constraint (3c) limits the number of candidates per pre-fetching to the pre-fetching capability $f_{e}$ of ED $e$ .


$$
\begin{aligned}
\mathbb{P}_{e}:\quad & \mathcal{J}_{e}^{*}=\max_{\bm{a}_{e}}\,\sum_{k=1}^{\text{K}_{e}}\sum_{i\in\mathcal{I}}\lambda_{e,i}^{k}\cdot a^{k}_{e,i} && && \text{(3a)}\\
\mathrm{s.t.}\quad & \sum_{k=1}^{\text{K}_{e}}\epsilon_{e,i}\cdot a_{e,i}^{k}\leq\xi_{e}, && \forall i\in\mathcal{I}, && \text{(3b)}\\
& \sum_{i\in\mathcal{I}}a_{e,i}^{k}\leq f_{e}, && 1\leq k\leq\text{K}_{e}, && \text{(3c)}\\
& a_{e,i}^{k}\in\{0,1\}, && \forall i\in\mathcal{I},\,1\leq k\leq\text{K}_{e}, && \text{(3d)}\\
& \lambda_{e,i}^{k}=h_{e}(i,t^{k}\mid\mathcal{V}_{e}^{k},\bm{\theta}), && \forall i\in\mathcal{I},\,1\leq k\leq\text{K}_{e}, && \text{(3e)}\\
& \mathcal{A}_{e}^{k}=\{i\mid a_{e,i}^{k}=1,\forall i\in\mathcal{I}\}, && 1\leq k\leq\text{K}_{e}. && \text{(3f)}
\end{aligned}
$$

Based on problem $\mathbb{P}_{e}$ , we can illustrate the holistic optimization process of PPVF:

• Federated Learning: EDs can jointly evaluate video utility, i.e., $\lambda_{e,i}^{k}=h_{e}(i,t^{k}\ |\ \mathcal{V}_{e}^{k},\bm{\theta})$ , $\forall\,e,\,i,\,k$ , via the federated learning framework. This approach allows EDs to obtain significantly more accurate video utility estimates than those obtained solely from local traces.

• Video Selection and Budget Allocation: With evaluated $\bm{\lambda}_{e}^{k}$ , EDs can solve $\mathbb{P}_{e}$ in an online manner to obtain $\mathcal{A}_{e}^{k}$ , which is the candidate set of videos to be requested from the remote CP.

• Pre-fetching Request Generation: With $\mathcal{A}_{e}^{k}$ derived from $\mathbb{P}_{e}$ and privacy budget allocation decisions, EDs can apply the EM to distort their fetching requests with correlated differential privacy (CDP). Then, EDs contact the CP to fetch both viewing videos and redundant videos.

III PPVF Framework Design

III-A PPVF Overview

To better understand how users, EDs and the CP interact with each other, we present the workflow of PPVF, as shown in Fig. 1. Briefly speaking, the life cycle of each request involves five steps. ① Upon receiving a video viewing request from a user, EDs first search for that video in their edge cache. If the video is cached, EDs directly stream it to users with a shorter response latency without any privacy leakage. Otherwise, EDs assemble pre-fetching video requests, along with the viewing video, to fetch redundant videos from the CP. ② The utility predictor evaluates the videos’ utility based on the federated learning framework, which ensures that only model parameters $\bm{\theta}$ are exchanged between EDs and the CP. The process is detailed in Section IV. ③ Subsequently, the budget scheduler assesses video utility in conjunction with the privacy budget to curate a video candidate set for the subsequent pre-fetching decision, elaborated in Section III-B. ④ The request generator distorts the original utility to generate a pre-fetching decision, which navigates the trade-off between privacy and caching performance by leveraging the EM with the CDP method. After combining the pre-fetching decision with the real view video, the final fetching vector is sent to the CP. This process will be discussed in Section III-C. ⑤ When the CP returns videos, EDs perform cache replacement based on video utility and forward the viewing video to users.

Note that our main contribution is represented by three modules in green color (long dashed box) in Fig. 1, which will be introduced in the following sections in detail.

III-B Threshold-based Online Algorithm

Directly solving problem $\mathbb{P}_{e}$ confronts two challenges: (1) the problem is inherently online due to the dynamic nature of request patterns and video utility, and (2) the problem $\mathbb{P}_{e}$ can be categorized as an online multiple knapsack problem, which is challenging to solve immediately and irrevocably, even if the utility $\bm{\lambda}^{k}_{e}$ is known.

To solve the challenging online problem $\mathbb{P}_{e}$ , we propose a filtering mechanism [28, 29] that selects a video into $\mathcal{A}_{e}^{k}$ if the ratio of this video's utility to its expected privacy budget exceeds a threshold. Intuitively speaking, a video of a higher utility will be played by users in the future with a higher probability. Hence, the utility of EDs in caching such videos will be higher. Meanwhile, considering the limited privacy budget, PPVF only selects videos with the ratio $\lambda_{e,i}^{k}/\epsilon_{e,i}$ exceeding a threshold. This threshold is set in accordance with the fraction of the consumed privacy budget.

We can set the threshold for selecting videos as follows. Let $U_{e}$ and $L_{e}$ denote the upper and lower bound of the ratio for any video $i$ at ED $e$ , which means that $L_{e}<\lambda_{e,i}^{k}/\epsilon_{e,i}<U_{e},\ \forall i\in\mathcal{I},1\leq k\leq\text{K}_{e}$ . With the values of $L_{e}$ and $U_{e}$ , the threshold function is defined as:

$$
\Theta_{e}(\gamma)=\begin{cases}L_{e}, & 0\leq\gamma\leq\Gamma_{e},\\[4pt] \left(\dfrac{U_{e}\cdot\exp(1)}{L_{e}}\right)^{\gamma}\dfrac{L_{e}}{\mathrm{e}}, & \Gamma_{e}<\gamma\leq 1.\end{cases} \tag{4}
$$

Here, $\gamma\in[0,1]$ denotes the fraction of the privacy budget that has been allocated to a video until the current time, and $\Gamma_{e}=\frac{1}{1+\ln(U_{e}/L_{e})}$ is the budget fraction below which the threshold stays at its minimum value $L_{e}$ . The intuition of our design is that the selection of a video is more conservative if the consumed privacy budget fraction $\gamma$ of that video is larger.
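The shape of the threshold function can be checked numerically. The sketch below is a direct transcription of Eq. (4) under the assumption $0<L_{e}<U_{e}$ ; the function name `threshold` and the example bounds are hypothetical. The threshold equals $L_{e}$ up to $\gamma=\Gamma_{e}$ , is continuous at $\Gamma_{e}$ , and grows exponentially until it reaches $U_{e}$ at $\gamma=1$ .

```python
import math


def threshold(gamma, lower, upper):
    """Threshold Theta_e(gamma) of Eq. (4) for bounds L_e = lower, U_e = upper."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if not 0.0 < lower < upper:
        raise ValueError("bounds must satisfy 0 < lower < upper")
    cutoff = 1.0 / (1.0 + math.log(upper / lower))  # Gamma_e
    if gamma <= cutoff:
        return lower
    return (upper * math.e / lower) ** gamma * lower / math.e


# Hypothetical bounds L_e = 1 and U_e = 10 on the utility-to-cost ratio.
values = [threshold(g / 10.0, 1.0, 10.0) for g in range(11)]
```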

The detailed algorithm is presented in Alg. 1. Specifically, using the threshold function $\Theta_{e}(\cdot)$ , PPVF randomly selects a video from the set $\mathcal{I}$ and checks whether its ratio $\lambda_{e,i}^{k}/\epsilon_{e,i}$ exceeds the threshold $\Theta_{e}(\gamma_{e,i}^{k})$ . If so, and if the remaining privacy budget of that video covers its privacy cost, the video is incorporated into candidate set $\mathcal{A}_{e}^{k}$ ; if not, the video remains unchosen. This stochastic selection process continues until $\mathcal{A}_{e}^{k}$ contains $f_{e}$ videos or every video has been examined. Although Alg. 1 is heuristic-based, we can theoretically prove that it achieves an optimal competitive ratio (CR) of $\left(1+\ln(U_{e}/L_{e})\right)$ for any ED $e$ .

Theorem 1.

Alg. 1 has a competitive ratio of $\left(1+\ln(U_{e}/L_{e})\right)$ under the reasonable Assumption 1 for any ED $e$ when allocating the privacy budget.

Assumption 1.

Each privacy cost of a pre-fetching video $i$ has a weight much smaller than the total budget of the content, i.e., $\epsilon_{e,i}\ll\xi_{e}$ with respect to any ED $e$ .

Proof.

We prove Theorem 1 with Assumption 1 in Appendix A. ∎

Considering $U_{e}=\max_{\forall i,k}\lambda_{e,i}^{k}/\epsilon_{e,i}$ and $L_{e}=\min_{\forall i,k}\lambda_{e,i}^{k}/\epsilon_{e,i}$ , we observe that CR is solely dependent on $\bm{\lambda}$ and $\bm{\epsilon}$ and is independent of the request quantity (i.e., $K_{e}$ ). As the privacy budgets $\bm{\epsilon}$ are specified by EDs for each video, while utility $\bm{\lambda}$ is often generated by upstream utility prediction algorithms, they are both within a specific range. This characteristic ensures a constant-level CR of our algorithm, independent of the total request quantity (i.e., $K_{e}$ ), which is a highly appealing property. Through differentiation of CR with respect to $U_{e}$ and $L_{e}$ , it becomes evident that a decrease in $U_{e}$ leads to a reduction in CR, indicating an approach to the optimal offline solution in the worst-case scenario. Similarly, a larger $L_{e}$ results in a smaller value of CR. Moreover, when $U_{e}/L_{e}$ approaches 1, CR approaches 1, indicating that the performance is close to the offline optimal solution.
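The monotonicity claims above follow directly from the closed form of the CR. As a short worked derivation (assuming $0<L_{e}<U_{e}$ , as required by the definition of the bounds):

```latex
\begin{aligned}
\mathrm{CR}(U_{e},L_{e}) &= 1+\ln\frac{U_{e}}{L_{e}} = 1+\ln U_{e}-\ln L_{e},\\
\frac{\partial\,\mathrm{CR}}{\partial U_{e}} &= \frac{1}{U_{e}} > 0,
\qquad
\frac{\partial\,\mathrm{CR}}{\partial L_{e}} = -\frac{1}{L_{e}} < 0,\\
\lim_{U_{e}/L_{e}\to 1}\mathrm{CR} &= 1+\ln 1 = 1.
\end{aligned}
```

For instance, a hypothetical ratio $U_{e}/L_{e}=100$ gives $\mathrm{CR}=1+\ln 100\approx 5.61$ , regardless of how many cache missing requests arrive.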

Algorithm 1: Online privacy budget allocation algorithm for ED $e$

Input: The space of all videos $\mathcal{I}$ ; the total privacy budget for all videos $\xi_{e}$ ; the pre-fetching capacity $f_{e}$ ; the privacy cost $\bm{\epsilon}_{e}$ .
Output: The candidate set $\mathcal{A}_{e}$ for video pre-fetching.

Initialize $k\leftarrow 1$ , $\bm{\gamma}_{e}^{k}\leftarrow[0]^{\text{I}}$ ;
for $k\leq\text{K}_{e}$ do
    Initialize $\bm{a}_{e}^{k}\leftarrow[0]^{\text{I}}$ , $f^{k}\leftarrow 0$ , $\mathcal{I}^{k}\leftarrow\mathcal{I}$ ;
    Obtain the evaluated video utility $\bm{\lambda}^{k}_{e}$ ;
    while $f^{k}<f_{e}$ and $\mathcal{I}^{k}\neq\emptyset$ do
        Select content $i$ randomly from $\mathcal{I}^{k}$ ;
        $\mathcal{I}^{k}\leftarrow\mathcal{I}^{k}-\{i\}$ ;
        if $\frac{\lambda_{e,i}^{k}}{\epsilon_{e,i}}>\Theta_{e}(\gamma_{e,i}^{k})$ and $\epsilon_{e,i}<(1-\gamma_{e,i}^{k})\cdot\xi_{e}$ then
            $a_{e,i}^{k}\leftarrow 1$ ;
            $\gamma_{e,i}^{k+1}\leftarrow\gamma_{e,i}^{k}+\frac{\epsilon_{e,i}}{\xi_{e}}$ ;
            $f^{k}\leftarrow f^{k}+1$ ;
        else
            $a_{e,i}^{k}\leftarrow 0$ ;
            $\gamma_{e,i}^{k+1}\leftarrow\gamma_{e,i}^{k}$ ;
            $f^{k}\leftarrow f^{k}$ ;
        end if
    end while
    Generate the candidate set $\mathcal{A}_{e}^{k}$ with $\bm{a}_{e}^{k}$ following Eq. (3f);
    $k\leftarrow k+1$ ;
end for
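As an illustration, one outer iteration of Alg. 1 (the handling of a single cache missing request) can be sketched in Python. The sketch assumes that the utilities, privacy costs, and the bounds $L_{e}$ , $U_{e}$ are supplied by the caller; the names `allocate_step` and `threshold` and all numeric values are hypothetical. Visiting a shuffled list is used as an equivalent of repeatedly drawing a random video without replacement.

```python
import math
import random


def threshold(gamma, lower, upper):
    """Threshold function of Eq. (4)."""
    cutoff = 1.0 / (1.0 + math.log(upper / lower))
    if gamma <= cutoff:
        return lower
    return (upper * math.e / lower) ** gamma * lower / math.e


def allocate_step(utilities, costs, gamma, total_budget, capacity,
                  lower, upper, rng):
    """Build the candidate set for one cache miss.

    `gamma[i]` is the consumed budget fraction of video i; it is updated
    in place whenever video i is admitted into the candidate set.
    """
    order = list(range(len(utilities)))
    rng.shuffle(order)  # random visiting order without replacement
    candidates = []
    for i in order:
        if len(candidates) >= capacity:
            break
        ratio_ok = utilities[i] / costs[i] > threshold(gamma[i], lower, upper)
        budget_ok = costs[i] < (1.0 - gamma[i]) * total_budget
        if ratio_ok and budget_ok:
            candidates.append(i)
            gamma[i] += costs[i] / total_budget
    return candidates


# Hypothetical instance: four videos, unit total budget per video, f_e = 2.
lam = [0.9, 0.2, 0.6, 0.05]
eps = [0.1, 0.1, 0.1, 0.1]
gamma = [0.0] * 4
cands = allocate_step(lam, eps, gamma, total_budget=1.0, capacity=2,
                      lower=0.4, upper=10.0, rng=random.Random(0))
```

When little budget has been consumed, the threshold stays at $L_{e}$ and almost any video is admitted; as $\gamma$ grows, only videos with a high utility-to-cost ratio pass, which is the conservative behavior described above.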

III-C CDP-based Video Pre-fetching

Directly requesting videos in $\mathcal{A}_{e}^{k}=\{i\ |\ a_{e,i}^{k}=1,\forall i\in\mathcal{I}\}$ according to video utility can expose the video utility knowledge to the CP. To preserve privacy, PPVF adopts the EM to randomly select videos based on probability shown in Eq. (5) and generate the final pre-fetching decision $\bm{x}_{e}^{k}$ . Specifically, if video $i$ is selected by the EM, PPVF will set $x_{e,i}^{k}=1$ to pre-fetch that video from the CP. Otherwise, it will be set to $0$ . The probability is given by

$$
P\{\text{video $i$ is chosen from }\mathcal{A}_{e}^{k}\mid\bm{\lambda}_{e}^{k}\}\propto\exp\left(\frac{\epsilon_{e}^{k}\cdot\lambda_{e,i}^{k}}{2\cdot\Delta\lambda_{e,gc}^{k}}\right). \tag{5}
$$

Here, $\epsilon_{e}^{k}=\frac{1}{f_{e}}\sum_{i\in\mathcal{A}^{k}_{e}}\epsilon_{e,i}$ is the average privacy budget for pre-fetching one redundant video, where $\sum_{i\in\mathcal{A}^{k}_{e}}\epsilon_{e,i}$ denotes the total privacy budget assigned by Alg. 1. Besides, $\Delta\lambda_{e,gc}^{k}$ is the global sensitivity at the time of the $k$ -th pre-fetching.

In our problem, the calculation of $\Delta\lambda_{e,gc}^{k}$ is complicated because of the correlation between videos. Collaborative filtering algorithms can exploit such correlation information for inferring users’ personal interests. To factor in the influence of video correlation, we employ the correlated differential privacy (CDP) for computing sensitivity. For ED $e$ , we can calculate the correlation between videos $i$ and $j$ for the $k$ -th pre-fetching with $\lambda_{e,i(j)}^{k}$ predicted by utility function $h_{e}$ . Suppose that EDs cache three history matrices $\bm{\psi}_{e}^{k-1}=[\psi_{e,i,j}^{k-1}]^{\text{I}\times\text{I}}$ , $\bm{\alpha}_{e}^{k-1}=[\alpha_{e,i}^{k-1}]^{\text{I}}$ , $\bm{\sigma}_{e}^{k-1}=[\sigma_{e,i}^{k-1}]^{\text{I}}$ , where the items can be incrementally updated by $\psi_{e,i,j}^{k}=\psi_{e,i,j}^{k-1}+\lambda_{e,i}^{k}\cdot\lambda_{e,j}^{k},\ \alpha_{e,i}^{k}=\alpha_{e,i}^{k-1}+\lambda_{e,i}^{k},\ \sigma_{e,i}^{k}=\sigma_{e,i}^{k-1}+(\lambda_{e,i}^{k})^{2},$ respectively. Based on Pearson’s correlation [27], the correlation degree between videos $i$ and $j$ can be calculated with these three history matrices as follows:

\Psi_{e,i,j}^{k}=\frac{k\cdot\psi_{e,i,j}^{k}-\alpha_{e,i}^{k}\cdot\alpha_{e,j}^{k}}{\sqrt{k\cdot\sigma_{e,i}^{k}-(\alpha_{e,i}^{k})^{2}}\cdot\sqrt{k\cdot\sigma_{e,j}^{k}-(\alpha_{e,j}^{k})^{2}}}.

(6)
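The incremental bookkeeping behind Eq. (6) can be sketched as follows. The class name and the zero-variance fallback are assumptions made for illustration; the running sums correspond to $\bm{\psi}_{e}^{k}$, $\bm{\alpha}_{e}^{k}$, and $\bm{\sigma}_{e}^{k}$.

```python
import math


class IncrementalPearson:
    """Running sums psi, alpha, sigma for Pearson correlation between videos."""

    def __init__(self, num_videos):
        n = num_videos
        self.k = 0  # number of pre-fetching rounds folded in so far
        self.psi = [[0.0] * n for _ in range(n)]
        self.alpha = [0.0] * n
        self.sigma = [0.0] * n

    def update(self, utilities):
        """Fold in the utility vector of one pre-fetching round."""
        self.k += 1
        for i, ui in enumerate(utilities):
            self.alpha[i] += ui
            self.sigma[i] += ui * ui
            for j, uj in enumerate(utilities):
                self.psi[i][j] += ui * uj

    def correlation(self, i, j):
        """Eq. (6); returns 0.0 if either video has zero variance (assumption)."""
        k = self.k
        num = k * self.psi[i][j] - self.alpha[i] * self.alpha[j]
        var_i = k * self.sigma[i] - self.alpha[i] ** 2
        var_j = k * self.sigma[j] - self.alpha[j] ** 2
        if var_i <= 0.0 or var_j <= 0.0:
            return 0.0
        return num / (math.sqrt(var_i) * math.sqrt(var_j))
```

Each call to `correlation` touches only a constant number of stored sums, so a single correlation degree is obtained without revisiting past utility vectors.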

Let $\bm{\Psi}_{e}^{k}=[\Psi_{e,i,j}^{k}]^{\text{I}\times\text{I}}$ denote the correlation matrix. The sensitivity of our problem can be calculated using the following two definitions.

Definition 3.

(Correlated Video Sensitivity) For any given ED $e\in\mathcal{E}$, missing request index $1\leq k\leq\text{K}_{e}$, and videos $i,j\in\mathcal{A}_{e}^{k}$, the correlated video sensitivity $\Delta\lambda_{e,i}^{k}$ is defined as

\Delta\lambda_{e,i}^{k}=\sum_{j\in\mathcal{A}^{k}_{e}}(\Psi^{k}_{e,i,j}||h_{e}(i,t^{k}|\mathcal{V}^{k}_{e},\bm{\theta})-h_{e}(i,t^{k}|\mathcal{V}^{k}_{e,-j},\bm{\theta})||_{1}),

(7)

where $\Psi_{e,i,j}^{k}$ is the correlation degree parameter, $h_{e}:\mathcal{I}\times\mathbb{R}^{+}\rightarrow\mathbb{R}^{+}$ is the utility function, $\mathcal{A}^{k}_{e}$ is the candidate for video selection, and $\mathcal{V}^{k}_{e}$ , $\mathcal{V}^{k}_{e,-j}$ are two adjacent datasets different in video $j$ .

Here, the L1 distance measures the effect on utility when deleting records related to video $j$ from $\mathcal{V}_{e}^{k}$ . Parameter $\Psi_{e,i,j}^{k}$ estimates the correlated degree between videos $i$ and $j$ . Correlated Video Sensitivity combines the effect of correlated records and the correlated degree together.

Definition 4.

(Correlated Sensitivity [27]) Given all the video sensitivities $\Delta\lambda_{e,i}^{k},\,\forall i\in\mathcal{A}_{e}^{k}$ , the global sensitivity $\Delta\lambda_{e,gc}^{k}$ for the correlated videos is determined by

\Delta\lambda_{e,gc}^{k}=\max_{i\in\mathcal{A}_{e}^{k}}\Delta\lambda_{e,i}^{k}.

(8)

The correlated sensitivity enumerates the video sensitivities of all videos in the candidate set $\mathcal{A}_{e}^{k}$ and takes the maximum as the correlated sensitivity. When videos are independent or weakly correlated, the global sensitivity will only slightly increase. In particular, if all videos are independent, the correlated sensitivity reduces to the conventional global sensitivity, i.e., $\max_{i\in\mathcal{A}^{k}_{e}}||h_{e}(i,t^{k}|\mathcal{V}^{k}_{e},\bm{\theta})-h_{e}(i,t^{k}|\mathcal{V}^{k}_{e,-i},\bm{\theta})||_{1}$. Additionally, given the historical matrices $\bm{\psi}_{e}^{k}$, $\bm{\alpha}_{e}^{k}$, and $\bm{\sigma}_{e}^{k}$, which can be incrementally updated prior to the $k$-th pre-fetching, the correlated degree $\Psi_{e,i,j}^{k}$ can be determined in $\text{O}(1)$ time, as per Eq. (6). Consequently, the computational complexity of obtaining the sensitivity $\Delta\lambda^{k}_{e,i}$ of video $i$ is $\text{O}(f_{e})$, given the utility vector $\bm{\lambda}^{k}_{e}$ evaluated by function $h_{e}$ and the correlation matrix $\bm{\Psi}_{e}^{k}$. Finally, the total computational complexity of deriving the correlated sensitivity $\Delta\lambda^{k}_{e,gc}$ is $\text{O}(f_{e}^{2})$, given the utility vector $\bm{\lambda}^{k}_{e}$ and the correlation matrix $\bm{\Psi}_{e}^{k}$.
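Definitions 3 and 4 can be sketched in a few lines. The sketch assumes that the per-pair utility changes $||h_{e}(i,t^{k}|\mathcal{V}^{k}_{e},\bm{\theta})-h_{e}(i,t^{k}|\mathcal{V}^{k}_{e,-j},\bm{\theta})||_{1}$ have already been evaluated and are passed in as a hypothetical `utility_shift` table; the function name is likewise illustrative.

```python
def correlated_sensitivity(candidates, corr, utility_shift):
    """Return (global sensitivity, per-video sensitivities) over the candidates.

    candidates: list of video ids in A_e^k
    corr: corr[i][j] is the correlation degree Psi_{e,i,j}^k
    utility_shift: utility_shift[i][j] is the absolute change in the utility
        of video i when the records of video j are removed (assumed given)
    """
    per_video = {}
    for i in candidates:
        # Eq. (7): one weighted term per candidate video j.
        per_video[i] = sum(corr[i][j] * utility_shift[i][j] for j in candidates)
    # Eq. (8): the global sensitivity is the largest per-video sensitivity.
    return max(per_video.values()), per_video
```

The nested iteration over the candidate set mirrors the quadratic cost stated above: one linear pass per video, repeated for every candidate.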

The detailed algorithm to generate redundant pre-fetching video requests is presented in Alg. 2. It can be proved that Alg. 2 guarantees $\sum_{i\in\mathcal{A}_{e}^{k}}\epsilon_{e,i}$-DP for the $k$-th pre-fetching decision at ED $e$, where $\sum_{i\in\mathcal{A}^{k}_{e}}\epsilon_{e,i}$ denotes the total privacy budget assigned by Alg. 1. The proof follows directly from the properties of the EM and the Composition Theorem [30]. The EM selects one video from the candidate set for pre-fetching while ensuring $\epsilon_{e}^{k}$-DP. Based on the Composition Theorem [30], Alg. 2 makes at most $f_{e}$ selections and thus satisfies $f_{e}\cdot\epsilon_{e}^{k}$-DP, which is equivalent to $\sum_{i\in\mathcal{A}^{k}_{e}}\epsilon_{e,i}$-DP.

Remark 1.

In a nutshell, the superiority of PPVF for optimally balancing privacy preservation and efficiency is attributed to the following two advantages. Firstly, rather than blindly distorting requests for all videos, PPVF can reduce the consumption of the privacy budget by only distorting requests for a subset of selected candidate videos. Secondly, by considering correlation in video requests, PPVF can more accurately calibrate the noise scale by using CDP, which can avoid setting over-large noise scales for videos.

Algorithm 2 Online privacy-preserving videos pre-fetching algorithm for ED $e$

Input: Pre-fetching index $k$; Video utility $\bm{\lambda}^{k}_{e}$; Allocated privacy budget $\bm{\epsilon}_{e}$; Pre-fetching capacity $f_{e}$
Output: The pre-fetching decision $\bm{x}_{e}$
Obtain candidate set $\mathcal{A}_{e}^{k}$ for the $k$-th pre-fetching by Alg. 1;
Incrementally update $\bm{\psi}_{e}^{k}$, $\bm{\alpha}_{e}^{k}$, and $\bm{\sigma}_{e}^{k}$ with video utility $\bm{\lambda}^{k}_{e}$;
Calculate $\bm{\Psi}_{e}^{k}$ with matrices $\bm{\psi}_{e}^{k}$, $\bm{\alpha}_{e}^{k}$, and $\bm{\sigma}_{e}^{k}$ by Eq. (6);
Obtain $\Delta\lambda_{e,gc}^{k}$ with $\bm{\lambda}^{k}_{e}$, $\bm{\Psi}_{e}^{k}$ and $\mathcal{A}_{e}^{k}$ following Eqs. (7)-(8);
$\bm{x}_{e}^{k}\leftarrow[0]^{\text{I}}$, $f^{k}\leftarrow 0$;
while $f^{k}<f_{e}$ do
    Select pre-fetching video $i$ from $\mathcal{A}_{e}^{k}$ based on the probability shown in Eq. (5);
    $x_{e,i}^{k}=1$, $f^{k}\leftarrow f^{k}+1$;
end while
return Pre-fetching decision $\bm{x}_{e}^{k}$
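The selection loop of Alg. 2 might be sketched as below. This is an illustration under stated assumptions rather than the released implementation: the candidate set and its budgets are taken as given from Alg. 1, the global sensitivity is precomputed, the decision is stored as a dictionary over candidates instead of a length-I vector, and the function name is hypothetical.

```python
import math
import random


def prefetch_decision(candidates, utilities, budgets, f_e, sensitivity, rng=None):
    """Draw f_e videos from the candidate set with the exponential mechanism.

    candidates: list of video ids (A_e^k, assumed produced by Alg. 1)
    utilities: dict video id -> utility lambda_{e,i}^k
    budgets: dict video id -> budget epsilon_{e,i} assigned by Alg. 1
    f_e: pre-fetching capacity, i.e. the number of EM draws
    sensitivity: precomputed global sensitivity Delta lambda_{e,gc}^k (> 0)
    """
    rng = rng or random.Random()
    # Average budget per draw: epsilon_e^k = (1 / f_e) * sum of assigned budgets.
    eps_k = sum(budgets[i] for i in candidates) / f_e
    top = max(utilities[i] for i in candidates)
    # Unnormalized EM weights of Eq. (5), shifted by the max for stability.
    weights = [math.exp(eps_k * (utilities[i] - top) / (2.0 * sensitivity))
               for i in candidates]
    decision = {i: 0 for i in candidates}
    for _ in range(f_e):
        # Draws are independent, so the same video may be drawn more than once.
        chosen = rng.choices(candidates, weights=weights, k=1)[0]
        decision[chosen] = 1
    return decision, eps_k
```

Since each draw consumes $\epsilon_{e}^{k}$, the $f_{e}$ draws together consume $f_{e}\cdot\epsilon_{e}^{k}$, which equals the total budget assigned to the candidate set.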

IV Online Video Utility Prediction

In this section, we shift our focus to the discussion on evaluating video utility, i.e., $\lambda_{e,i}^{k}=h_{e}(i,t^{k}\ |\ \mathcal{V}_{e}^{k},\bm{\theta}),\forall e,\,i,\,k$ , by federated learning. Note that FL is a privacy-preserving framework for training machine learning models. We resort to point process-based models, i.e., Mutual-Exciting Process (MEP) [31, 32], to illustrate how PPVF works. Note that MEP is employed here because it has been widely used in [33, 22, 34] for predicting video utility in traditional online video caching problems. It is not difficult to replace MEP with a new model for video utility prediction in PPVF. In this section, we focus on how to modify it to fit in PPVF.

IV-A Intensity and Likelihood Function

The core of a point process lies in its intensity function, which denotes the occurrence probability of an event within a tiny time window $[t,t+\mathrm{d}t)$ [34]. With a slight abuse of notation, an intensity function can be defined by $h(\iota,t)\text{d}t=P\{\Omega\,|\,\mathcal{V}(t)\}=\text{E}(\text{d}N(\iota,t)|\mathcal{V})$, where $N(\iota,t)$ is the count function and $\text{E}(\text{d}N(\iota,t)\,|\,\mathcal{V})$ represents the expected number of occurrences of event $\Omega$ with type $\iota$ in the time window $[t,t+\text{d}t)$ based on the historical event set $\mathcal{V}$ [34].

In PPVF, the historical event set $\mathcal{V}$ corresponds to the historical record set of viewing requests. We can create an intensity function for a particular video $i$ (i.e., event type $\iota$ ) indicating the expected request rate from users for that video, which is regarded as the utility of video $i$ for caching. Recall that the set $\mathcal{T}_{e,i}^{0,t^{k}}$ represents the timestamps corresponding to requests of video $i$ in local viewing records $\mathcal{V}_{e}^{k}$ at ED $e$ within the time interval $[0,t^{k})$ . We can create the intensity function for video $i$ and time $t^{k}$ at ED $e$ as:

h_{e}(i,t^{k}\,|\,\mathcal{V}_{e}^{k},\bm{\beta},\bm{\omega})=\beta_{i}+\sum_{j\in\mathcal{I}}\omega_{i,j}\sum_{\tau\in\mathcal{T}_{e,j}^{0,t^{k}}}\phi(t^{k}-\tau),

(9)

where $\bm{\omega}=[\omega_{i,j}]^{\text{I}\times\text{I}},\omega_{i,j}\in\mathbb{R}^{+}$ denotes the influencing parameter matrix among all videos, while $\bm{\beta}=[\beta_{i}]^{\text{I}},\beta_{i}\in\mathbb{R}^{+}$ is the bias parameter vector of the intensity functions. Specifically, $\phi(\cdot)$ is defined as $\phi(t)=\exp(-\delta\cdot t)$, i.e., an exponentially decaying kernel is adopted to gauge the influence of historical events, as is common in point process models [34, 32, 31]. Here, $\delta>0$ is a hyper-parameter that serves as the influence attenuation coefficient.

Remark 2.

The intuition of Eq. (9) is that users’ video requests at different time points are correlated, where a more recent historical video request would contribute a higher request rate to its relevant video. The extent can be captured by influencing parameters and kernel functions in the point process model to predict future request rates.
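To make Eq. (9) concrete, the sketch below evaluates the intensity of one video from per-video timestamp lists. The function name and the list-based data layout are assumptions for illustration only.

```python
import math


def intensity(i, t, beta, omega, history, delta):
    """Eq. (9): request intensity of video i at time t.

    beta: list of base rates, one per video
    omega: omega[i][j] is the influence of video j on video i
    history: history[j] is the list of request timestamps of video j
    delta: decay rate of the exponential kernel phi(t) = exp(-delta * t)
    """
    value = beta[i]
    for j, timestamps in enumerate(history):
        # Only requests strictly before t belong to the window [0, t).
        excitation = sum(math.exp(-delta * (t - tau))
                         for tau in timestamps if tau < t)
        value += omega[i][j] * excitation
    return value
```

With the same influence weight, a request made shortly before $t$ raises the intensity more than an older one, which is the recency effect described in Remark 2.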

To keep the presentation concise, $h_{e}(i,t^{k})$ is used to represent $h_{e}(i,t^{k}\,|\,\mathcal{V}_{e}^{k},\bm{\beta},\bm{\omega})$ hereafter when the meaning is clear. The parameter space of $\bm{\omega}$ in Eq. (9) is $\text{O}(\text{I}^{2})$, which is prohibitive to estimate directly. The parameter space can be reduced by Singular Value Decomposition (SVD) [33]. Given that $\omega_{i,j}$ represents how much video $j$ influences video $i$, it can be decomposed as the product $\omega_{i,j}=\bm{p}_{i}\cdot\bm{q}_{j}^{\intercal}$. Here, $\bm{p}_{i}$ and $\bm{q}_{j}$ are latent vectors with dimension $\text{D}\ll\text{I}$. Hence, the number of parameters in $\bm{\omega}$ is condensed from $\text{I}\times\text{I}$ to $2\times\text{I}\times\text{D}$, which also helps to avoid overfitting. Consequently, the utility $\lambda_{e,i}^{k},\forall e,i,k$ can be obtained by the revised form in Eq. (10).

\lambda_{e,i}^{k}=\hat{h}_{e}(i,t^{k})=\beta_{i}+\sum_{j\in\mathcal{I}}\bm{p}_{i}\cdot\bm{q}_{j}^{\intercal}\sum_{\tau\in\mathcal{T}_{e,j}^{0,t^{k}}}\phi(t^{k}-\tau).

(10)
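One way to evaluate Eq. (10) for all videos is sketched below. It relies on the fact that the sum over $j$ of $\bm{q}_{j}$ weighted by the kernel sums does not depend on $i$ and can therefore be computed once; this reordering is an implementation choice assumed here, and the function names are hypothetical.

```python
import math


def utilities_low_rank(t, beta, p, q, history, delta):
    """Eq. (10) for all videos at once, using omega_{i,j} = p_i . q_j.

    p, q: lists of I latent vectors, each of length D
    history: history[j] is the list of request timestamps of video j
    """
    dim = len(p[0])
    # s[j]: kernel-weighted count of past requests of video j.
    s = [sum(math.exp(-delta * (t - tau)) for tau in ts if tau < t)
         for ts in history]
    # z = sum_j q_j * s[j] is shared by every video i, so compute it once.
    z = [sum(q[j][d] * s[j] for j in range(len(q))) for d in range(dim)]
    return [beta[i] + sum(p[i][d] * z[d] for d in range(dim))
            for i in range(len(p))]


def parameter_counts(num_videos, dim):
    """Sizes of the full influence matrix and of its two latent factors."""
    return num_videos * num_videos, 2 * num_videos * dim
```

For example, with the 10,373 videos and $D=10$ used in the experiments of Section V, `parameter_counts(10373, 10)` gives 107,599,129 entries for the full matrix versus 207,460 for the two factors.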

Next, we can use the maximum likelihood estimation (MLE) [22] to optimize all parameters denoted by $\bm{\theta}\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\{\bm{\beta},\bm{p},\bm{q}\}$ in Eq. (10). With local timestamp set $\mathcal{T}_{e,i}$ for video $i$ in time $[0,T)$ at ED $e$ , the local log-likelihood function is derived as:

\begin{split}ll_{e}\left(\,\bm{\theta}\,|\,\mathcal{V}_{e}\,\right)=\sum_{i\in\mathcal{I}}\left(\sum_{\tau\in\mathcal{T}_{e,i}}\log\hat{h}_{e}(i,\tau)-\int_{0}^{T}\hat{h}_{e}(i,t)\mathrm{d}t\right).\end{split}

(11)

Here, $\mathcal{V}_{e}$ represents the whole private dataset at ED $e$, from which the timestamp set $\mathcal{T}_{e,i}$ is generated for any $i\in\mathcal{I}$. The detailed derivation can be found in Appendix B. To preserve privacy, each ED $e$ should locally maximize Eq. (11). However, the estimation accuracy will be inferior because the request records owned by each ED can be very scarce. Moreover, given the dynamic nature of video popularity, parameter estimation cannot be solved by one-time training. Continuous online learning is necessary to closely track the changes in video request patterns. To address these challenges, we propose an FL-based online parameter estimation algorithm, in which the CP coordinates the training process by collecting, aggregating, and distributing model parameters.

IV-B FL-based Online Parameter Estimation

IV-B1 Local Online Log-Likelihood Function for EDs

In practical video systems, user requests are generated online, which can make the computation of $\lambda_{e,i}^{k}$ very expensive. To alleviate the computation overhead, we simplify Eq. (10) by removing distant historical events without compromising the accuracy of utility prediction. In Eq. (10), the kernel $\phi(t-\tau)$ represents the influence of the request at time $\tau$ on video utility, where $t$ is the current time. If $t-\tau\gg 1$, then $\phi(t-\tau)\approx 0$ and the influence of the request at the past time $\tau$ on video utility prediction is negligible. Thus, we set a threshold $\phi_{th}$ to eliminate these distant records from computation. At time $t$, all records before time $t+\frac{\ln\phi_{th}}{\delta}$ will be ignored. In this way, we can significantly reduce the computation overhead.

Let $\Delta t=-\frac{\ln{\phi_{th}}}{\delta}$ . At time $t_{\theta}$ , where $t_{\theta}\in\mathcal{Q}$ is the timestamp to update model parameters, we only consider records in the period $[t_{\theta}-\Delta t,t_{\theta})$ in an online manner.

\begin{split}\hat{ll}_{e}\left(\,\bm{\theta}\,|\,\mathcal{V}_{e}\right)&=\sum_{i\in\mathcal{I}}\left(\sum_{\tau\in\mathcal{T}_{e,i}^{t_{\theta}-\Delta t,t_{\theta}}}\log\hat{h}_{e}(i,\tau)\right.\\ &\hskip 80.0pt-\left.\int_{t_{\theta}-\Delta t}^{t_{\theta}}\hat{h}_{e}(i,t)\mathrm{d}t\right).\end{split}

(12)

Notably, unlike the approach in [22], we only truncate the online update interval for the log-likelihood calculation while preserving the influence of all historical events in the intensity function. This choice is feasible because the impact of historical records in Eqs. (10) and (12) can be computed incrementally, which avoids excessive computational complexity. For a detailed discussion on the incremental computation of the point process, please consult [33].
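The window length $\Delta t=-\frac{\ln\phi_{th}}{\delta}$ can be checked numerically. In the sketch below the helper names are hypothetical; the example values $\delta=0.01$ and $\phi_{th}=e^{-0.48}$ are the ones reported in Section V-A, for which the window is 48 hours.

```python
import math


def truncation_window(phi_th, delta):
    """Window length dt such that exp(-delta * dt) == phi_th."""
    if not 0.0 < phi_th < 1.0 or delta <= 0.0:
        raise ValueError("need 0 < phi_th < 1 and delta > 0")
    return -math.log(phi_th) / delta


def recent_timestamps(timestamps, t_theta, dt):
    """Timestamps falling in the half-open window [t_theta - dt, t_theta)."""
    return [tau for tau in timestamps if t_theta - dt <= tau < t_theta]


# Settings from Section V-A: delta = 0.01 per hour, phi_th = exp(-0.48).
window_hours = truncation_window(math.exp(-0.48), 0.01)
```

Only the events returned by `recent_timestamps` enter the outer sum of Eq. (12); the intensity evaluated at those events still depends on the full history.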

By direct calculation, the partial derivatives of Eq. (12) with respect to each parameter can be derived as follows:

\frac{\partial\hat{ll}_{e}(\bm{\theta}\,|\,\mathcal{V}_{e})}{\partial\beta_{i}}=\sum_{\tau\in\mathcal{T}_{e,i}^{t_{\theta}-\Delta t,t_{\theta}}}\frac{1}{\hat{h}_{e}(i,\tau)}-\Delta t,

(13)

\begin{split}\frac{\partial\hat{ll}_{e}(\bm{\theta}\,|\,\mathcal{V}_{e})}{\partial\bm{p}_{i}}&=\sum_{\tau\in\mathcal{T}_{e,i}^{t_{\theta}-\Delta t,t_{\theta}}}\frac{\sum_{j\in\mathcal{I}}\bm{q}_{j}^{\intercal}\sum_{\tau^{\prime}\in\mathcal{T}_{e,j}^{0,\tau}}\phi(\tau-\tau^{\prime})}{\hat{h}_{e}(i,\tau)}\\ &-\sum_{j^{\prime}\in\mathcal{I}}\bm{q}_{j^{\prime}}^{\intercal}\left(\sum_{\tau^{\prime\prime}\in\mathcal{T}_{e,j^{\prime}}^{0,t_{\theta}-\Delta t}}\int_{t_{\theta}-\Delta t-\tau^{\prime\prime}}^{t_{\theta}-\tau^{\prime\prime}}\phi(t)\mathrm{d}t\right.\\ &+\left.\sum_{\tau^{\prime\prime\prime}\in\mathcal{T}_{e,j^{\prime}}^{t_{\theta}-\Delta t,t_{\theta}}}\int_{0}^{t_{\theta}-\tau^{\prime\prime\prime}}\phi(t)\mathrm{d}t\right),\end{split}

(14)

\begin{split}\frac{\partial\hat{ll}_{e}(\bm{\theta}\,|\,\mathcal{V}_{e})}{\partial\bm{q}_{j}}&=\sum_{i\in\mathcal{I}}\bm{p}_{i}\left(\sum_{\tau\in\mathcal{T}_{e,i}^{t_{\theta}-\Delta t,t_{\theta}}}\frac{\sum_{\tau^{\prime}\in\mathcal{T}_{e,j}^{0,\tau}}\phi(\tau-\tau^{\prime})}{\hat{h}_{e}(i,\tau)}\right.\\ &-\sum_{\tau^{\prime\prime}\in\mathcal{T}_{e,j}^{0,t_{\theta}-\Delta t}}\int_{t_{\theta}-\Delta t-\tau^{\prime\prime}}^{t_{\theta}-\tau^{\prime\prime}}\phi(t)\mathrm{d}t\\ &-\left.\sum_{\tau^{\prime\prime\prime}\in\mathcal{T}_{e,j}^{t_{\theta}-\Delta t,t_{\theta}}}\int_{0}^{t_{\theta}-\tau^{\prime\prime\prime}}\phi(t)\mathrm{d}t\right).\end{split}

(15)
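Because the kernel is $\phi(t)=\exp(-\delta\cdot t)$, the kernel integrals appearing in Eqs. (14) and (15) admit closed forms. The identities below follow from elementary calculus and are stated here for convenience; they are not taken from the original derivation.

```latex
\int_{a}^{b}\phi(t)\,\mathrm{d}t=\int_{a}^{b}e^{-\delta t}\,\mathrm{d}t=\frac{e^{-\delta a}-e^{-\delta b}}{\delta},
\qquad
\int_{t_{\theta}-\Delta t-\tau}^{t_{\theta}-\tau}\phi(t)\,\mathrm{d}t=\frac{e^{-\delta(t_{\theta}-\tau)}\left(e^{\delta\Delta t}-1\right)}{\delta},
\qquad
\int_{0}^{t_{\theta}-\tau}\phi(t)\,\mathrm{d}t=\frac{1-e^{-\delta(t_{\theta}-\tau)}}{\delta}.
```

The second identity covers requests that occurred before the window $[t_{\theta}-\Delta t,t_{\theta})$, and the third covers requests inside it.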

Upon computing the local likelihood value $\hat{ll}_{e}(\bm{\theta}\,|\,\mathcal{V}_{e})$ and gradient values $\frac{\partial\hat{ll}_{e}(\bm{\theta}\,|\,\mathcal{V}_{e})}{\partial\beta_{i}}$, $\frac{\partial\hat{ll}_{e}(\bm{\theta}\,|\,\mathcal{V}_{e})}{\partial\bm{p}_{i}}$, and $\frac{\partial\hat{ll}_{e}(\bm{\theta}\,|\,\mathcal{V}_{e})}{\partial\bm{q}_{j}}$ using local historical viewing records from time $[t_{\theta}-\Delta t,t_{\theta})$, the EDs transmit only these non-sensitive gradients and likelihood values to the CP for global estimation.

IV-B2 Global Log-Likelihood Function for CP

For a more precise understanding of the global estimation, we present the global likelihood function as follows:

\begin{split}\min_{\bm{\theta}}\ L=-\sum_{e\in\mathcal{E}}\hat{ll}_{e}&\left(\,\bm{\theta}\,|\,\mathcal{V}_{e}\right)+\frac{\rho_{\beta}}{2}\lVert\bm{\beta}\rVert_{2}^{2}+\frac{\rho_{p}}{2}\lVert\bm{p}\rVert_{2}^{2}+\frac{\rho_{q}}{2}\lVert\bm{q}\rVert_{2}^{2},\\ &\text{ s.t. }\quad\bm{\beta},\bm{q},\bm{p}\in\mathbb{R}^{+},\end{split}

(16)

where $\rho_{\beta},\rho_{q},\rho_{p}>0$ denote regularization parameters. Here, instead of the server computing the objective directly from private records, all EDs upload their local likelihood function values $\hat{ll}_{e}\left(\bm{\theta}\,|\,\mathcal{V}_{e}\right)$. Furthermore, for any $\theta\in\bm{\theta}$, the CP can aggregate the gradient of $\theta$ separately, drawing upon the gradients $\frac{\partial\hat{ll}_{e}(\bm{\theta}\,|\,\mathcal{V}_{e})}{\partial\theta}$ provided by the EDs. Since the local log-likelihoods enter Eq. (16) with a negative sign, the partial derivative of Eq. (16) aggregated at the CP is

\frac{\partial L}{\partial\theta}=\rho_{\theta}\theta-\sum_{e\in\mathcal{E}}\frac{\partial\hat{ll}_{e}(\bm{\theta}\,|\,\mathcal{V}_{e})}{\partial\theta}.

(17)

Let $\theta^{(n)}$ denote the parameters trained for $n$ iterations. Subsequently, the update rule for $\theta$ is:

\theta^{(n+1)}\leftarrow\theta^{(n)}+\eta\left(-\frac{\partial\,L}{\partial\,\theta^{(n)}}+\rho_{\theta}\theta^{(n)}\right).

(18)

Here, $\rho_{\theta}$ represents the regularization parameter specific to $\theta$, while $\eta$ is the learning rate determined by the selected gradient descent algorithm. Upon completing a round of parameter updates, the updated parameters $\bm{\theta}^{(n+1)}$ are disseminated to each ED for the subsequent iteration.

IV-B3 FL-based Execution

In our FL framework, EDs compute local likelihood and gradient values with respect to the model parameters $\bm{\theta}$ and then upload these values to the parameter server (possibly maintained by the CP) for aggregation. By interacting with EDs, the CP is responsible for collecting, aggregating, updating, and then disseminating model parameters to EDs. This approach optimizes the model without sharing raw historical viewing records from EDs, and thus preserves privacy.

It can be completed by iteratively conducting the following two operations on EDs and the CP:

• EDs’ role in FL: EDs calculate the local likelihood function and gradients using the recent timestamp sets $\cup_{i\in\mathcal{I}}\mathcal{T}^{t_{\theta}-\Delta t,t_{\theta}}_{e,i}$ derived from the historical record set, in conjunction with the model parameters $\bm{\theta}^{(n)}$ from the $n$-th iteration. Then, EDs transmit these gradients and likelihood values to the parameter server. User privacy is thus preserved since the original data are never exposed.
• CP’s role in FL: Once the CP receives the computations from EDs, it aggregates the likelihood functions and gradient values of all EDs. This consolidated result underpins the global gradient update. Following this update, the CP distributes the revised model parameters $\bm{\theta}^{(n+1)}$ to EDs.

The details can be found in Alg. 3 in Appendix C.
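The interaction between EDs and the CP can be illustrated with a toy round. The sketch below performs plain projected gradient descent on the objective in Eq. (16), which is an illustrative choice and is not claimed to reproduce Alg. 3; the function names, the positivity floor, and the single-parameter Poisson model used as the local likelihood are all assumptions.

```python
def fl_round(theta, local_gradients, rho, eta, floor=1e-8):
    """One CP-side update from gradients of the local log-likelihoods.

    theta: current parameter values (list of floats)
    local_gradients: one gradient list per ED, each the gradient of that ED's
        local log-likelihood with respect to theta
    rho: L2 regularization weight; eta: learning rate
    floor: small positive value that keeps parameters positive (assumption)
    """
    updated = []
    for idx, value in enumerate(theta):
        # Gradient of the objective in Eq. (16): regularizer minus the summed
        # log-likelihood gradients (the log-likelihood enters with a minus sign).
        grad = rho * value - sum(g[idx] for g in local_gradients)
        updated.append(max(value - eta * grad, floor))
    return updated


def poisson_gradient(count, rate, window):
    """Toy ED-side gradient: d/d(rate) of count * log(rate) - rate * window."""
    return [count / rate - window]


# Three hypothetical EDs observe 96, 120, and 72 requests in a 48-hour window.
theta = [1.0]
counts = [96, 120, 72]
for _ in range(500):
    grads = [poisson_gradient(n, theta[0], 48.0) for n in counts]
    theta = fl_round(theta, grads, rho=0.0, eta=1e-3)
```

Only the per-ED gradients cross the network in this sketch; the request counts stay local. With regularization switched off, the estimate approaches the pooled rate of 288 requests over 144 ED-hours, i.e. 2 requests per hour.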

V Performance Evaluation

In this section, we conduct trace-driven experiments to evaluate PPVF using a real viewing dataset in Tencent Video. We seek to answer the following three questions: (1) How well do PPVF’s components of budget scheduler and request generator perform in distorting the private information in user profiles (Section V-D1)? (2) How well does PPVF work to protect users’ interests against the powerful recommendation system (Section V-D2)? (3) How well can the PPVF framework adapt to traffic changes and improve edge caching performance compared to fixed experts and SOTA learning-based approaches (Section V-E)?

To facilitate the peer review, we also release the source code of our system PPVF (https://github.com/zhangxzh9/PPVF-MAINCODE) and the dataset (https://github.com/zhangxzh9/PPVF-DATASET). We now discuss the methodology and setup of our evaluations.

V-A Dataset and Settings

Given the large scale of the Tencent Video dataset, we randomly sample a small subset of $10,000$ users drawn from the most active users (i.e., the $20\%$ of users with the largest number of interactive viewing records) in the original public dataset [22]. The new dataset consists of $933,541$ video viewing requests for $10,373$ unique videos, all within a specific city over 30 days. Following [33], we evenly divide these users into $25$ fixed groups to replicate a real online video system aided by $25$ EDs during the whole 30 days. Note that the number of EDs is a simulation setting, which can be tuned to suit different scenarios and to meet different user privacy requirements. The experimental results also demonstrate the robustness of our framework at different levels of EDs. As such, each request record in the dataset is described by the metadata tuple $(e,u,i,\tau)$.

Similar to [22], the time interval in our experiments is quantized at $1$ hour for all requests, i.e., requests arriving within the same hour have the same timestamp $\tau$. It is important to emphasize that the timing of video pre-fetching is not tied to a specific time slot. If a request cannot be served at the edge, EDs must promptly retrieve the video from the CP using the pre-fetching algorithm. Additionally, we split the dataset into two date-based subsets. The initial subset, containing requests with time $0\leq\tau<240$ hours, is used to initialize the system. The subsequent subset with requests over the next $20$ days (i.e., $240\leq\tau<720$ hours) serves as the test period.
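The quantization and the date-based split can be sketched as follows. The raw timestamp unit (seconds since the start of the trace) and the function names are assumptions for illustration; the released dataset may already store hourly timestamps.

```python
def quantize_hour(seconds_since_start):
    """Map a raw offset in seconds to an integer hour index tau."""
    return int(seconds_since_start // 3600)


def split_by_hour(records, init_hours=240, total_hours=720):
    """Split (e, u, i, tau) records into initialization and test subsets."""
    init = [r for r in records if 0 <= r[3] < init_hours]
    test = [r for r in records if init_hours <= r[3] < total_hours]
    return init, test
```

With the default arguments, the first 10 days (240 hours) initialize the system and the remaining 20 days (480 hours) form the test period.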

Other experimental setups are described according to different tasks: (1) Point process and FL: Following [33], the decay parameter and the dimension of the latent vector are set to $\delta=0.01$ and $D=10$, respectively. In alignment with [22], all model parameters (i.e., $\bm{\beta}$, $\bm{p}$, and $\bm{q}$) are initialized at $1.0$. For online FL-based parameter estimation, the maximum iteration count is $20$ and the truncated threshold is set to $\phi_{th}=e^{-0.48}$. Therefore, since $-\ln\phi_{th}/\delta=0.48/0.01=48$, the interval $t_{\theta}$ to update $\bm{\beta}$, $\bm{p}$, and $\bm{q}$ is $2$ days ($48$ hours), which is the same as the setting in [22]. Additional experiments are conducted to study the influence of the online update interval of the FL framework. (2) Edge pre-fetching and caching: For experimental consistency, the privacy cost ($\epsilon_{e,i},\forall e,\,i$) is uniformly set to $1$ for each video’s pre-fetching requests [35]. Additionally, we standardize the allocation of the privacy budget, pre-fetching capacity, and caching capacity across all EDs with values set at $\xi_{e}=15$, $f_{e}=4$, and $c_{e}=1\%$, respectively. In specific experiments, one of these parameters may be varied to assess its impact, with the other two held constant.

V-B Baselines

We compare PPVF with three types of baselines. The first type includes privacy-preserving video fetching algorithms, while the second and third types are video caching algorithms that do not consider privacy leakage. Privacy-preserving caching algorithms include: (1) SAGE [36], which pre-fetches videos with randomly assigned privacy budgets until the privacy budgets reach the maximum constraints; (2) BESTFIT, which allocates the privacy budget to pre-fetch videos with the highest utility until the privacy budgets reach the maximum constraints. Note that these two baselines are only designed for privacy allocation without designing a video utility predictor. For a fair comparison, we implement our video utility predictor in SAGE and BESTFIT for caching.

To further demonstrate the superiority of our utility predictor, we replicate two advanced caching utility prediction methods at the edge for comparison. These algorithms are introduced as follows: (3) MAV [37], which caches the videos at the edge nodes considering the strength of user requests in the future round within the dynamic Stackelberg game. The caching utility is calculated by the moving average value (MAV) method, and the weight of MAV is set as $0.9$ based on [37]; (4) HRS [22], which serves as a video popularity prediction model designed for the edge server, employing a fusion of three distinct point process models. All parameter configurations within this baseline align with the defaults specified in [22]. It is worth mentioning that these two baselines are mainly designed to improve edge caching efficiency with utility (e.g., popularity) prediction. Both of them overlook the privacy of users exposed by pre-fetching requests. Therefore, we only replace our utility predictor module with MAV and HRS and keep the other system components unchanged.

We also compare PPVF with the following two eviction-based caching algorithms, in which EDs only fetch videos watched by users on a cache miss. These algorithms include: (5) LRU (Least Recently Used), which replaces the video that has not received any request for the longest time with a newly requested video; (6) LFU (Least Frequently Used), which replaces the video that has been requested the fewest times with a newly requested video. These two conventional caching algorithms are extensively employed in both industry and academia, making them suitable benchmarks for comparing caching performance.
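For reference, a minimal LRU baseline with hit counting could look as follows. This is a generic textbook sketch with hypothetical class and function names, not the baseline code used in the experiments.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently requested video."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def request(self, video):
        """Return True on a hit; on a miss, fetch the video and evict if full."""
        if video in self.store:
            self.store.move_to_end(video)  # mark as most recently used
            return True
        if self.capacity > 0:
            if len(self.store) >= self.capacity:
                self.store.popitem(last=False)  # drop the least recently used
            self.store[video] = True
        return False


def hit_ratio(cache, requests):
    """Fraction of requests served from the cache."""
    hits = sum(1 for v in requests if cache.request(v))
    return hits / len(requests) if requests else 0.0
```

Such a cache issues requests only for videos that users actually watch, so it adds no redundant requests.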

V-C Metrics

To evaluate PPVF, we employ three metrics to evaluate both privacy protection and system efficiency. More specifically, we adopt the following metrics in our experiments:

1. JS (Jaccard Similarity), which measures the average similarity between users’ real profiles and the profiles exposed by their ED for video fetching. Each profile is represented by a vector of dimension I, where each element indicates whether a video has been requested by a user or ED during the entire testing period. A lower similarity is more desirable, implying stronger privacy protection.
2. RHR (Recommendation Hit Rate) Degradation, which calculates the average degradation of RHR among all users when a recommendation algorithm recommends videos for users based on the noisy profiles exposed by EDs instead of their original profiles. A larger degradation of RHR implies stronger privacy protection. NCF [26], a popular collaborative filtering recommendation algorithm, is implemented with the same settings as in [26] and serves as the adversary in our experiments.
3. CHR (Cache Hit Ratio), which is defined as the number of video hits at all EDs divided by the total number of original video requests from users over the entire test period. CHR is employed to evaluate the caching system efficiency.
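The JS metric can be sketched with sets of requested video ids, which are equivalent to the binary profile vectors described above. The function names, the mapping from users to EDs, and the convention for two empty profiles are assumptions for illustration.

```python
def jaccard_similarity(real_profile, exposed_profile):
    """Jaccard similarity between two sets of requested video ids."""
    real, exposed = set(real_profile), set(exposed_profile)
    union = real | exposed
    if not union:
        return 0.0  # convention for two empty profiles (assumption)
    return len(real & exposed) / len(union)


def average_js(user_profiles, user_to_ed, ed_profiles):
    """Mean similarity between each user's profile and the profile of its ED."""
    scores = [jaccard_similarity(videos, ed_profiles[user_to_ed[u]])
              for u, videos in user_profiles.items()]
    return sum(scores) / len(scores)
```

Redundant requests enlarge the ED profile and hence the union, which lowers the similarity to any single user's real profile.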

V-D Effectiveness of Privacy Protection

We first evaluate the performance of privacy protection using two metrics, i.e., the average JS and the degradation of RHR. We then further investigate the final status of the remaining privacy budget of all content.

V-D1 Effectiveness of Distorting Private Information

For the experiments presented in Fig. 2(a) and Fig. 2(b), we compare the average JS between users’ original profiles and the profiles exposed by EDs after the test period. In Fig. 2(a), we fix $c_{e}=1\%$ and $\xi_{e}=15$, but vary $f_{e}$ to study the average JS under different numbers of redundant requests. In Fig. 2(b), by contrast, we vary $\xi_{e}$ but fix $c_{e}=1\%$ and $f_{e}=4$ to study privacy protection under different privacy budgets of EDs. As presented in Fig. 2(a) and Fig. 2(b), PPVF steadily outperforms the other baselines. Its average JS is $17.54\%$ ($22.38\%$) lower than that of the second-best privacy-preserving baseline when varying the pre-fetching capacity (the total privacy budget) for each ED.

These experimental results show that PPVF can significantly distort exposed user profiles so that users’ video request privacy can be preserved. Compared with SAGE and BESTFIT, PPVF achieves the lowest average JS because PPVF considers both the limited privacy budget and video utility when pre-fetching videos. Recall that we tune a threshold to select videos for generating redundant requests in Alg. 1. When the privacy budget is plentiful, PPVF selects videos of high utility with higher priority. However, if the privacy budget of a video is tight, we tune the threshold so that PPVF selects that video more conservatively. As a result, more diversified videos will be selected to conceal user privacy. Note that the average JS of the classical caching algorithms, i.e., LRU and LFU, is also compared with ours. Although these algorithms do not protect privacy with redundant video fetching, they provide a benchmark for the degree of protection offered by privacy-preserving algorithms at edge devices.

V-D2 Privacy Protection against Recommendation Systems

We further employ the degradation of RHR to evaluate privacy protection by implementing the algorithm in [26] to recommend videos based on the request records exposed by EDs after the test period. The configurations in Fig. 2(c) and Fig. 2(d) mirror those in Fig. 2(a) and Fig. 2(b), respectively. By using original user profiles for recommendation, the algorithm in [26] can achieve $99.42\%$ RHR, indicating the effectiveness of the recommendation. The RHR degradation is then the gap between the accuracy achieved by utilizing the request records exposed by EDs and the original accuracy of $99.42\%$. The experimental results in Fig. 2(c) and Fig. 2(d) indicate that:

• PPVF achieves the highest RHR degradation in all scenarios. In particular, the performance of PPVF is better when the pre-fetching capacity is limited or the total privacy budget is sufficient.
• SAGE and BESTFIT are better than LRU/LFU in privacy preservation. However, their performance is inferior to PPVF under the same constraints, e.g., pre-fetching capacity and privacy budget.
• LRU/LFU can degrade RHR performance because they are implemented on EDs, which only expose consolidated request records of multiple users, making it difficult for recommenders to identify personalized interests.

V-D3 Remaining Privacy Budgets of all Content

In Fig. 3, we plot the cumulative distribution function (CDF) of the remaining privacy budgets of all videos after the test period to visualize the redundant request decisions made by different caching algorithms. Here, we set $\xi_{e}=15$ by default. From Fig. 3, we can observe that PPVF uses up the privacy budgets of videos worth caching; consequently, nearly $70\%$ of videos still have $60\%$ residual privacy budget at the end of the test. In contrast, the budget consumption of BESTFIT and SAGE is scattered among different videos. Their redundant requests for the hottest or coldest videos are not effective in preserving privacy, which is why their protection is weaker than ours.

V-E Effective Caching Performance

To comprehensively demonstrate the superiority of PPVF, we also compare CHR performance between different algorithms. In this experiment, we tune the caching capacity of each ED from $0.1\%$ to $10\%$ of the total video number, i.e., the caching capacity ranges from $10$ to $1037$ videos. For simplicity, we ignore the size difference of videos [22]. We calculate the CHR of the entire system over the test period. The results are plotted in Fig. 4, in which the y-axis represents the average CHR and the x-axis represents the caching capacity of each ED. To better show the difference between PPVF, SAGE, and BESTFIT, we present numerical results in Table III in Appendix D. From the experimental results in Fig. 4 and Table III, we can observe that:

•

PPVF is slightly better than SAGE and BESTFIT in terms of the CHR performance among the small capacities, while BESTFIT achieves the highest CHR when the caching capacity is large. Note that it is fair to compare the CHR performance between PPVF, SAGE, and BESTFIT because all algorithms cache redundant video requests based on the same predicted utility.
•

Except for SAGE and BESTFIT, our PPVF consistently attains the highest CHR in comparison to other caching baselines. This translates to an average enhancement of $18.15\%$ over the second-best caching algorithm, HRS, within all capacity settings. The presented results demonstrate the robustness of PPVF, indicating its potential applicability for implementation across heterogeneous edge devices with varying caching capacities.
•

On the one hand, PPVF outperforms HRS / MAV in CHR by leveraging a more effective utility prediction method to reliably aggregate the private information at all EDs based on the FL framework. On the other hand, the CHR performance of PPVF surpasses that of LFU/LRU because these eviction-based algorithms do not make any redundant video requests to improve their caching efficiency. The superior performance of PPVF over these SOTA baselines can be attributed to its ability to request and cache high-utility redundant videos, thereby elevating the CHR.
•

Moreover, PPVF utilizes caching capacity more efficiently. For example, with a limited caching capacity of $0.1\%$, PPVF surpasses the second-best caching solution by more than $24.70\%$. This gain is especially notable under constrained caching resources and underscores PPVF’s capability to predict the most popular videos, even when using distorted information in the online FL framework. With more accurately estimated video utility, PPVF can pre-fetch videos accordingly to attain superior CHR performance.
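The relative gains quoted in the observations above can be recomputed from the numerical CHR values reported in Table III. The following minimal sketch assumes that the "average enhancement" is the arithmetic mean of the per-capacity relative gains of PPVF over HRS; the function and variable names are illustrative only and are not part of our implementation.

```python
# CHR (%) values copied from Table III; one entry per caching capacity.
capacities = ["0.1%", "0.25%", "0.5%", "0.75%", "1%", "2.5%", "5%", "7.5%", "10%"]
chr_ppvf = [3.782, 7.270, 11.728, 15.764, 19.492, 35.487, 47.936, 55.505, 60.677]
chr_hrs = [3.033, 6.082, 9.974, 13.336, 16.327, 29.632, 41.700, 48.845, 52.483]


def relative_gain(ours, baseline):
    """Relative CHR improvement of `ours` over `baseline`, in percent."""
    return [(a - b) / b * 100.0 for a, b in zip(ours, baseline)]


# Assumption: the average is taken over the per-capacity relative gains.
gains = relative_gain(chr_ppvf, chr_hrs)
average_gain = sum(gains) / len(gains)

for cap, gain in zip(capacities, gains):
    print(f"capacity {cap:>5}: PPVF vs HRS = +{gain:.2f}%")
print(f"average over all capacities: +{average_gain:.2f}%")
```

Under this assumption, the mean gain over HRS is about $18.15\%$ and the gain at the $0.1\%$ capacity is about $24.7\%$, in line with the figures stated above.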

Lastly, we study the sensitivity of the online parameter update interval $t_{\theta}$ in Table II to see how this hyper-parameter affects the video caching performance. All other hyper-parameters are kept unchanged as we vary $t_{\theta}$. As shown in Table II, the CHR generally improves as $t_{\theta}$ decreases from 240 hours, with little or no further gain below 48 hours. A smaller $t_{\theta}$ means more frequent online parameter updates in the FL framework, and more frequent updates help sustain an accurate model for predicting video utility. Nevertheless, this improvement comes at the expense of an increased computational burden. Considering this trade-off, the choice of a 2-day (48-hour) interval for parameter updates in previous work [22] and in our study is reasonable.

TABLE II: The average CHR (%) results under different $t_{\theta}$ (hours). All settings are the defaults described in Sec. V-A except $t_{\theta}$.

$c_{e}$ \ $t_{\theta}$ (hours)	12	24	48	72	120	240
0.1%	3.944	3.975	3.782	3.471	3.251	2.472
1%	19.01	19.40	19.49	19.37	18.74	17.48
10%	58.07	60.37	60.68	60.36	60.53	59.06
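To make the trade-off in Table II easier to read, the short sketch below finds, for each caching capacity, the update interval with the highest CHR and how far the default 48-hour setting is from it. The values are copied from Table II; the helper names are illustrative assumptions.

```python
# CHR (%) values copied from Table II, indexed by update interval (hours).
t_theta_hours = [12, 24, 48, 72, 120, 240]
chr_by_capacity = {
    "0.1%": [3.944, 3.975, 3.782, 3.471, 3.251, 2.472],
    "1%": [19.01, 19.40, 19.49, 19.37, 18.74, 17.48],
    "10%": [58.07, 60.37, 60.68, 60.36, 60.53, 59.06],
}


def best_interval(chr_values):
    """Return (interval in hours, CHR) for the best-performing interval."""
    idx = max(range(len(chr_values)), key=chr_values.__getitem__)
    return t_theta_hours[idx], chr_values[idx]


summary = {cap: best_interval(vals) for cap, vals in chr_by_capacity.items()}
default_idx = t_theta_hours.index(48)

for cap, (hours, best_chr) in summary.items():
    gap = best_chr - chr_by_capacity[cap][default_idx]
    print(f"capacity {cap:>4}: best t_theta = {hours} h, "
          f"CHR = {best_chr:.3f}, gap of the 48 h default = {gap:.3f}")
```

In this table, the 48-hour interval is the best choice at the $1\%$ and $10\%$ capacities, and the 24-hour interval is the best at $0.1\%$.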

VI RELATED WORK

User privacy, including historical records, location, and other personal information, is an important concern in online video systems and has prompted significant research efforts to safeguard it. Various approaches have been introduced to address this concern. For instance, noise-based methods such as DP [38] and anonymization [2, 6] have been proposed to shield location information, while blockchain-based techniques [39, 40] have been employed to safeguard users’ personal information. Beyond these efforts, request privacy, deemed the most critical aspect of user privacy [4], must also be considered when designing privacy-preserving video systems. In this section, we briefly review existing relevant works from two perspectives: privacy leakage in online video services and its protection.

VI-A Request Privacy Leakage

Privacy concerns in online video services span multiple dimensions: request traces [11, 10], personal details [41, 8], location data [6, 3, 2], and specific content data [17, 7], to name a few. Among these, request traces have emerged as particularly pivotal within online video services, as they may inadvertently expose user preferences to potential adversaries [4]. Such traces frequently encapsulate sensitive user information, capturing browsing patterns, preferences, and interests of users [10, 7]. Commercial motivations propel content providers to amass and scrutinize users’ private data [16]. This collected data is multifaceted, comprising geographical locations [3], behavioral tendencies [42], personal specifics [12], among other aspects. Leveraging this data can substantially refine service quality for content providers across domains, including content caching [33], recommendation engines [23], and video distribution [43, 44].

VI-B Request Privacy Protection at the Network Edge

Privacy protection in online video services at the network edge can be broadly classified into three categories. The first focuses on cryptography-based techniques, including encrypted transmission [17, 45] and blockchain-enabled methods [5, 46]. While these techniques are potent, they impose significant computational demands on edge devices and cannot entirely prevent CPs from potentially misusing user data. The second category encompasses trusted distributed computing (TDC) techniques, exemplified by federated learning [9, 20, 47]. Although these methods bolster user privacy by obviating the need for direct data transfer, their suitability for online video platforms is debatable, given their limited capability to prevent content providers from tracking user viewing habits. The third category is grounded in noise-based techniques. These methods protect request privacy within edge networks by obfuscating actual user preferences [24, 48]. A prevalent approach within this category is to pre-fetch redundant and unrelated videos to create ambiguity. Such pre-fetched content can also be cached at the network edge to serve future requests, thereby curtailing direct interactions with CPs [49, 50], which in turn mitigates the risk of data exposure. Nevertheless, balancing the quality of edge services against the need to protect user data remains an intricate endeavor.

One viable solution to ensure request privacy in online video systems is to incorporate DP noise, which delivers robust information protection guarantees [13, 4, 12]. Lécuyer et al. [36] pioneered the use of block composition to tackle privacy concerns arising from growing private datasets. This method provides theoretical guarantees for the efficient utilization of individual dataset segments. Moreover, to protect non-IID datasets, correlated differential privacy was introduced in [51, 27], which takes into account the interdependence among records. Yet, the challenge of allocating limited privacy budgets for online video requests on edge devices persists. Certain allocation frameworks, such as Sage [36] and DP-FLames [52], may be overly simplistic or rely on improbable assumptions, which restricts their flexibility in diverse scenarios. In light of these methodological limitations, we present a novel privacy protection strategy that enhances request privacy by generating redundant requests while preserving the operational efficacy of edge caching.

VII Conclusion

With the proliferation of online video services, preserving request privacy remains an open problem. The challenge lies in the fact that online video providers can automatically capture video requests from users. As a consequence, user requests cannot be trivially distorted by injecting noise or protected with encryption. In this work, we are among the first to address this challenge by proposing the PPVF framework, which jointly utilizes trusted edge caching, correlated differential privacy, and federated learning. In other words, edge devices conceal user request privacy by sending noisy requests (with the noise scale calibrated according to video correlation) to the video provider. To maintain system efficiency, edge devices collaboratively predict video utility via FL so that they can balance video utility against the amount of privacy leakage when requesting videos. With the advancement of the online video market, the privacy-preserving techniques presented in this work offer insights and solutions for bolstering user privacy when consuming video content. Future work can build upon these foundations to further advance privacy-preserving online video services.

References

[1] Global Media Insight, “Youtube statistics 2024 (demographics, users by country & more ),” p. 1, 2024. [Online]. Available: https://www.globalmediainsight.com/blog/youtube-users-statistics/
[2] N. Nisha, I. Natgunanathan, S. Gao, and Y. Xiang, “A novel privacy protection scheme for location-based services using collaborative caching,” Computer Networks, vol. 213, p. 109107, Aug 2022.
[3] S. Amini, J. Lindqvist, J. Hong, J. Lin, E. Toch, and N. Sadeh, “Caché: Caching location-enhanced content to improve user privacy,” in Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services (MobiSys), no. 11. ACM, 2011, pp. 197–209.
[4] M. Wang, C. Xu, X. Chen, H. Hao, L. Zhong, and S. Yu, “Differential privacy oriented distributed online learning for mobile social video prefetching,” IEEE Transactions on Multimedia, vol. 21, no. 3, pp. 636–651, Jan 2019.
[5] Y. Qian, Y. Jiang, L. Hu, M. S. Hossain, M. Alrashoud, and M. Al-Hammadi, “Blockchain-based privacy-aware content caching in cognitive internet of vehicles,” IEEE Network, vol. 34, no. 2, pp. 46–51, 2020.
[6] L. Hu, Y. Qian, M. Chen, M. S. Hossain, and G. Muhammad, “Proactive Cache-Based Location Privacy Preserving for Vehicle Networks,” IEEE Wireless Communications, vol. 25, no. 6, pp. 77–83, Dec 2018.
[7] J. Cui, L. Wei, H. Zhong, J. Zhang, Y. Xu, and L. Liu, “Edge computing in VANETs-An efficient and privacy-preserving cooperative downloading scheme,” IEEE Journal on Selected Areas in Communications, vol. 38, no. 6, pp. 1191–1204, Apr 2020.
[8] X. Zhang, H. Zhong, C. Fan, I. Bolodurina, and J. Cui, “CBACS: A Privacy-Preserving and Efficient Cache-Based Access Control Scheme for Software Defined Vehicular Networks,” IEEE Transactions on Information Forensics and Security, vol. 17, pp. 1930–1945, May 2022.
[9] D. Qiao, S. Guo, D. Liu, S. Long, P. Zhou, and Z. Li, “Adaptive Federated Deep Reinforcement Learning for Proactive Content Caching in Edge Computing,” IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 12, pp. 4767–4782, Dec 2022.
[10] V. Sivaraman and B. Sikdar, “A Defense Mechanism against Timing Attacks on User Privacy in ICN,” IEEE/ACM Transactions on Networking, vol. 29, no. 6, pp. 2709–2722, Dec 2021.
[11] W. Tong, W. Chen, B. Jiang, F. Xu, Q. Li, and S. Zhong, “Privacy-Preserving Data Integrity Verification for Secure Mobile Edge Storage,” IEEE Transactions on Mobile Computing, vol. Early Acce, pp. 1–1, Mar 2022.
[12] P. Zhou, K. Wang, J. Xu, and D. Wu, “Differentially-private and trustworthy online social multimedia big data retrieval in edge computing,” IEEE Transactions on Multimedia, vol. 21, no. 3, pp. 539–554, Mar 2019.
[13] Q. Cai, Z. Xue, C. Zhang, W. Xue, S. Liu, R. Zhan, X. Wang, T. Zuo, W. Xie, D. Zheng, P. Jiang, and K. Gai, “Two-Stage Constrained Actor-Critic for Short Video Recommendation,” in Proceedings of the ACM Web Conference 2023 (WWW ’23). ACM, Apr 2023, pp. 865–875.
[14] Q. Yang and P. Kong, “RuleCache: A mobility pattern based multi-level cache approach for location privacy protection,” in 2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS). IEEE, 2016, pp. 448–455.
[15] L. Xiao, X. Wan, C. Dai, X. Du, X. Chen, and M. Guizani, “Security in Mobile Edge Caching with Reinforcement Learning,” IEEE Wireless Communications, vol. 25, no. 3, pp. 116–122, Jun 2018.
[16] J. Ni, K. Zhang, and A. V. Vasilakos, “Security and Privacy for Mobile Edge Caching: Challenges and Solutions,” IEEE Wireless Communications, vol. 28, no. 3, pp. 77–83, Jun 2021.
[17] A. Araldo, G. Dan, and D. Rossi, “Caching Encrypted Content Via Stochastic Cache Partitioning,” IEEE/ACM Transactions on Networking, vol. 26, no. 1, pp. 548–561, Jan 2018.
[18] J. Liu, J. Liu, H. Xu, Y. Liao, Z. Wang, and Q. Ma, “Yoga: Adaptive layer-wise model aggregation for decentralized federated learning,” IEEE/ACM Transactions on Networking, vol. 32, no. 2, pp. 1768–1780, 2024.
[19] T. Nguyen and M. T. Thai, “Preserving privacy and security in federated learning,” IEEE/ACM Transactions on Networking, vol. 32, no. 1, pp. 833–843, 2024.
[20] X. Liu, Z. Yan, Y. Zhou, D. Wu, X. Chen, and J. H. Wang, “Optimizing parameter mixing under constrained communications in parallel federated learning,” IEEE/ACM Transactions on Networking, vol. 31, no. 6, pp. 2640–2652, 2023.
[21] J. Hu, Z. Wang, Y. Shen, B. Lin, P. Sun, X. Pang, J. Liu, and K. Ren, “Shield against gradient leakage attacks: Adaptive privacy-preserving federated learning,” IEEE/ACM Transactions on Networking, vol. 32, no. 2, pp. 1407–1422, 2024.
[22] X. Zhang, Y. Zhou, D. Wu, M. Hu, X. Zheng, M. Chen, and S. Guo, “Optimizing Video Caching at the Edge: A Hybrid Multi-Point Process Approach,” IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 10, pp. 2597–2611, Oct 2022.
[23] R. Guerraoui, A. M. Kermarrec, and M. Taziki, “The utility and privacy effects of a click,” in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’17). ACM, Aug 2017, pp. 665–674.
[24] A. A. Sen, F. B. Eassa, M. Yamin, and K. Jambi, “Double Cache Approach with Wireless Technology for Preserving User Privacy,” Wireless Communications and Mobile Computing, vol. 2018, pp. 1–11, 2018.
[25] Y. Zhou, L. Chen, C. Yang, and D. M. Chiu, “Video Popularity Dynamics and Its Implication for Replication,” IEEE Transactions on Multimedia, vol. 17, no. 8, pp. 1273–1285, Aug 2015.
[26] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T. S. Chua, “Neural collaborative filtering,” in Proceedings of the 26th International Conference on World Wide Web (WWW’17). ACM, 2017, pp. 173–182.
[27] T. Zhu, P. Xiong, G. Li, and W. Zhou, “Correlated differential privacy: Hiding information in Non-IID data set,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 2, pp. 229–242, Feb 2015.
[28] W. Li, L. Xiang, B. Guo, Z. Li, and X. Wang, “DPlanner: A Privacy Budgeting System for Utility,” IEEE Transactions on Information Forensics and Security, vol. 18, pp. 1196–1210, Dec 2022.
[29] W. Li, L. Xiang, Z. Zhou, and F. Peng, “Privacy budgeting for growing machine learning datasets,” in Proceedings - IEEE INFOCOM 2021. IEEE, May 2021, pp. 1–10.
[30] F. McSherry, “Privacy integrated queries: An extensible platform for privacy-preserving data analysis,” in Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems (SIGMOD-PODS’09). ACM, 2009, pp. 19–30.
[31] D. J. Daley and D. Vere-Jones, An introduction to the theory of point processes: volume II: general theory and structure, ser. Probability and Its Applications. Springer, 2008, vol. 6(13).
[32] A. G. Hawkes, “Spectra of Some Self-Exciting and Mutually Exciting Point Processes,” Biometrika, vol. 58, no. 1, p. 83, Apr 1971.
[33] Z. Shi, Y. Zhou, D. Wu, and C. Wang, “PPVC: Online Learning Toward Optimized Video Content Caching,” IEEE/ACM Transactions on Networking, vol. 30, no. 3, pp. 1029–1044, Jun 2022.
[34] M.-A. Rizoiu, Y. Lee, S. Mishra, and L. Xie, Hawkes processes for events in social media. Association for Computing Machinery and Morgan & Claypool, 2017, p. 191–218. [Online]. Available: https://doi.org/10.1145/3122865.3122874
[35] K. Pan and K. Feng, “Differential privacy-enabled multi-party learning with dynamic privacy budget allocating strategy,” Electronics, vol. 12, no. 3, 2023.
[36] M. Lécuyer, R. Spahn, K. Vodrahalli, R. Geambasu, and D. Hsu, “Privacy accounting and quality control in the sage differentially private ML platform,” in Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP’19). ACM, Oct 2019, pp. 181–195.
[37] X. Zhang, L. Xiao, Y. Zhou, M. Hu, D. Wu, and G. Liu, “crvr: A stackelberg game approach for joint privacy-aware video requesting and edge caching,” arXiv preprint arXiv:2310.12622, 2023. [Online]. Available: http://arxiv.org/abs/2310.12622
[38] Z. Zhang, T. Cao, X. Wang, H. Xiao, and J. Guan, “VC-PPQ: Privacy-preserving Q-learning Based Video Caching Optimization in Mobile Edge Networks,” IEEE Transactions on Network Science and Engineering, vol. 9, no. 6, pp. 4129–4144, Aug 2022.
[39] Y. Dai, D. Xu, K. Zhang, S. Maharjan, and Y. Zhang, “Deep Reinforcement Learning and Permissioned Blockchain for Content Caching in Vehicular Edge Computing and Networks,” IEEE Transactions on Vehicular Technology, vol. 69, no. 4, pp. 4312–4324, Apr 2020.
[40] L. Cui, X. Su, Z. Ming, Z. Chen, S. Yang, Y. Zhou, and W. Xiao, “CREAT: Blockchain-Assisted Compression Algorithm of Federated Learning for Content Caching in Edge Computing,” IEEE Internet of Things Journal, vol. 9, no. 16, pp. 14 151–14 161, Aug 2022.
[41] K. Xue, P. He, X. Zhang, Q. Xia, D. S. Wei, H. Yue, and F. Wu, “A Secure, Efficient, and Accountable Edge-Based Access Control Framework for Information Centric Networks,” IEEE/ACM Transactions on Networking, vol. 27, no. 3, pp. 1220–1233, Jun 2019.
[42] Y. Zhang, P. Zhao, K. Bian, Y. Liu, L. Song, and X. Li, “DRL360: 360-degree Video Streaming with Deep Reinforcement Learning,” in Proceedings - IEEE INFOCOM 2019. IEEE, Apr 2019, pp. 1252–1260.
[43] H. Gupta, J. Chen, B. Li, and R. Srikant, “Online Learning-Based Rate Selection for Wireless Interactive Panoramic Scene Delivery,” in Proceedings - IEEE INFOCOM 2022. IEEE, Jun 2022, pp. 1799–1808.
[44] V. Kirilin, A. Sundarrajan, S. Gorinsky, and R. K. Sitaraman, “RL-Cache: Learning-Based Cache Admission for Content Delivery,” IEEE Journal on Selected Areas in Communications, vol. 38, no. 10, pp. 2372–2385, Oct 2020.
[45] Q. Xu, Z. Su, Q. Zheng, M. Luo, B. Dong, and K. Zhang, “Game theoretical secure caching scheme in multihoming edge computing-enabled heterogeneous networks,” IEEE Internet of Things Journal, vol. 6, no. 3, pp. 4536–4546, Jun 2019.
[46] Y. Jiang, Y. Zhong, and X. Ge, “IIoT Data Sharing Based on Blockchain: A Multileader Multifollower Stackelberg Game Approach,” IEEE Internet of Things Journal, vol. 9, no. 6, pp. 4396–4410, Mar 2022.
[47] X. Liu, Z. Zhong, Y. Zhou, D. Wu, X. Chen, M. Chen, and Q. Z. Sheng, “Accelerating federated learning via parallel servers: A theoretically guaranteed approach,” IEEE/ACM Transactions on Networking, vol. 30, no. 5, pp. 2201–2215, 2022.
[48] K. Wang and N. Deng, “A Privacy-Protected Popularity Prediction Scheme for Content Caching Based on Federated Learning,” IEEE Transactions on Vehicular Technology, vol. 71, no. 9, pp. 10 191–10 196, Jun 2022.
[49] Q. Wu, Z. Li, G. Tyson, S. Uhlig, M. A. Kaafar, and G. Xie, “Privacy-Aware Multipath Video Caching for Content-Centric Networks,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 8, pp. 2219–2230, Aug 2016.
[50] S. Nikolaou, R. Van Renesse, and N. Schiper, “Proactive Cache Placement on Cooperative Client Caches for Online Social Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 4, pp. 1174–1186, Apr 2016.
[51] J. Chen, H. Ma, D. Zhao, and L. Liu, “Correlated Differential Privacy Protection for Mobile Crowdsensing,” IEEE Transactions on Big Data, vol. 7, no. 4, pp. 784–795, Dec 2017.
[52] Y. Hu, W. Liang, R. Wu, K. Xiao, W. Wang, X. Li, J. Liu, and Z. Qin, “Quantifying and Defending against Privacy Threats on Federated Knowledge Graph Embedding,” in Proceedings of the ACM Web Conference 2023 (WWW ’23). ACM, 2023, pp. 2306–2317.

Appendix A Proof of Theorem 1

Before the proof, the detailed algorithm for online video selection and privacy budget allocation is shown in Alg. 1.

Proof.

We begin with a single-video scenario and subsequently extend the proof to the multi-video case. Besides, since we focus on a particular ED, we simplify our notation by omitting the subscript $e$ whenever there is no risk of confusion.

For any given input sequence $\kappa$, we define $\text{PPVF}_{i}(\kappa)=\sum_{k\in\mathcal{S}_{i}}\lambda_{i}^{k}$ and $\text{OPT}_{i}(\kappa)=\sum_{k\in\mathcal{S}^{\star}_{i}}\lambda_{i}^{k}$ as the total utilities accrued by the PPVF algorithm (as outlined in Alg. 1) and the offline optimum (denoted as OPT), respectively. Here, $\mathcal{S}_{i}$ and $\mathcal{S}^{\star}_{i}$ represent the sets of requests selected for video $i$ from the input sequence by these two methods. Let $\Gamma_{i}$ represent the fraction of video $i$’s budget consumed by PPVF.

Furthermore, we define

\Lambda_{i}=\sum_{k\in(\mathcal{S}\cap\mathcal{S}^{\star})}\lambda_{i}^{k},\quad\Lambda^{\prime}_{i}=\sum_{k\in(\mathcal{S}\backslash\mathcal{S}^{\star})}\lambda_{i}^{k},

and

\Upsilon_{i}=\sum_{k\in(\mathcal{S}\cap\mathcal{S}^{\star})}\Theta(\gamma_{i}^{k})\cdot\epsilon_{i},\quad\Upsilon_{i}^{\prime}=\sum_{k\in(\mathcal{S}\backslash\mathcal{S}^{\star})}\Theta(\gamma_{i}^{k})\cdot\epsilon_{i}.

First, for each request $k\in\mathcal{S}_{i}$, the efficiency $\lambda_{i}^{k}/\epsilon_{i}$ is at least $\Theta(\gamma_{i}^{k})$, i.e., $\lambda_{i}^{k}\geq\Theta(\gamma_{i}^{k})\cdot\epsilon_{i}$, where $\gamma_{i}^{k}$ denotes the fraction of the privacy budget of video $i$ already consumed at that specific time. Rounding down the utility $\lambda_{i}^{k}$ of each request $k$ chosen by PPVF to $\Theta(\gamma_{i}^{k})\cdot\epsilon_{i}$, we obtain


	$\displaystyle\Upsilon_{i}\leq\Lambda_{i},$		(19a)
	$\displaystyle\Upsilon_{i}^{\prime}\leq\Lambda^{\prime}_{i}.$		(19b)

Since $\Theta(\gamma)$ is a monotonically increasing function of $\gamma$, we also have

\Upsilon_{i}\leq\Theta(\Gamma_{i})\cdot E_{i},

(20)

where $E_{i}=\sum_{k\in(\mathcal{S}\cap\mathcal{S}^{\star})}\epsilon_{i}$ .

Continuing our analysis, for each request $k\in\mathcal{S}_{i}^{\star}-(\mathcal{S}_{i}\cap\mathcal{S}_{i}^{\star})$ , which represents requests selected by OPT but not by our PPVF algorithm, we have:

\lambda_{i}^{k}\leq\Theta(\gamma^{k}_{i})\cdot\epsilon_{i}\leq\Theta(\Gamma_{i})\cdot\epsilon_{i}.

(21)

Note that $\xi-E_{i}$ is the budget remaining under PPVF after selecting the requests in the set $\mathcal{S}\cap\mathcal{S}^{\star}$, which is the largest budget that OPT could spend on the requests in the set $\mathcal{S}^{\star}-\mathcal{S}\cap\mathcal{S}^{\star}$. Given that the threshold function $\Theta(\gamma)$ is monotonically increasing with respect to $\gamma$, we can derive:

\text{OPT}_{i}(\kappa)-\Lambda_{i}\leq\Theta(\Gamma_{i})\cdot(\xi-E_{i}).

(22)

Since $\text{PPVF}_{i}(\kappa)=\Lambda_{i}+\Lambda^{\prime}_{i}$ , the above inequality implies that

\frac{\text{OPT}_{i}(\kappa)}{\text{PPVF}_{i}(\kappa)}\leq\frac{\Lambda_{i}+\Theta(\Gamma_{i})\cdot(\xi-E_{i})}{\Lambda_{i}+\Lambda^{\prime}_{i}}.

(23)

Additionally, since $\text{OPT}_{i}(\kappa)\geq\text{PPVF}_{i}(\kappa)$, we always have $\Theta(\Gamma_{i})\cdot(\xi-E_{i})\geq\Lambda^{\prime}_{i}$. Thus, if we reduce $\Lambda_{i}$ to $\Upsilon_{i}$ in both the numerator and the denominator of Eq. (23), the bound on the ratio $\frac{\text{OPT}_{i}(\kappa)}{\text{PPVF}_{i}(\kappa)}$ does not decrease. In conclusion, combining the inequalities in Eqs. (19)-(23), we have:

$\displaystyle\frac{\text{OPT}_{i}(\kappa)}{\text{PPVF}_{i}(\kappa)}$	$\displaystyle\leq\frac{\Upsilon_{i}+\Theta(\Gamma_{i})\cdot(\xi-E_{i})}{\Upsilon_{i}+\Lambda^{\prime}_{i}}$	(24)
	$\displaystyle\leq\frac{\Theta(\Gamma_{i})\cdot E_{i}+\Theta(\Gamma_{i})\cdot(\xi-E_{i})}{\Upsilon_{i}+\Lambda^{\prime}_{i}}$
	$\displaystyle\leq\frac{\Theta(\Gamma_{i})\cdot\xi}{\Upsilon_{i}+\Upsilon_{i}^{\prime}}=\frac{\Theta(\Gamma_{i})}{\sum_{k\in\mathcal{S}}\Theta(\gamma_{i}^{k})\Delta\gamma_{i}^{k}}.$

Recall that $\Gamma_{e}=\frac{1}{1+\ln(U_{e}/L_{e})}$ is the lowest threshold in function $\Theta_{e}(\cdot)$ at any ED $e$. Based on Assumption 1, which indicates that $\Delta\gamma_{e,i}^{k}\rightarrow 0$, we have:

	$\displaystyle\sum_{k\in\mathcal{S}_{e,i}}\Theta_{e}(\gamma_{e,i}^{k})\Delta\gamma_{e,i}^{k}\approx\int_{0}^{\Gamma_{i}}\Theta_{e}(\gamma)\ \mathrm{d}\gamma$	(25)
$\displaystyle=$	$\displaystyle\int_{0}^{\Gamma_{e}}L_{e}\cdot\mathrm{d}\gamma+\int_{\Gamma_{e}}^{\Gamma_{i}}\frac{L_{e}}{\exp(1)}\ln\left(\frac{U_{e}\cdot\exp(1)}{L_{e}}\right)^{\gamma}\cdot\mathrm{d}\gamma$
$\displaystyle=$	$\displaystyle\Gamma_{e}\cdot\left(L_{e}+\frac{L_{e}}{\exp(1)}\left(\ln\left(\frac{U_{e}\cdot\exp(1)}{L_{e}}\right)^{\Gamma_{i}}\right.\right.$
	$\displaystyle\hskip 100.0pt-\left.\left.\ln\left(\frac{U_{e}\cdot\exp(1)}{L_{e}}\right)^{\Gamma_{e}}\right)\right)$
$\displaystyle=$	$\displaystyle\Gamma_{e}\cdot\frac{L_{e}}{\exp(1)}\ln\left(\frac{U_{e}\cdot\exp(1)}{L_{e}}\right)^{\Gamma_{i}}$
$\displaystyle=$	$\displaystyle\Gamma_{e}\cdot\Theta_{e}(\Gamma_{i}).$
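The identity in Eq. (25) can be checked numerically. The sketch below is only a sanity check under a stated assumption: it takes the threshold to have the standard online-knapsack form, i.e., $\Theta_{e}(\gamma)=L_{e}$ for $\gamma\leq\Gamma_{e}$ and $\Theta_{e}(\gamma)=\frac{L_{e}}{\exp(1)}\left(\frac{U_{e}\cdot\exp(1)}{L_{e}}\right)^{\gamma}$ otherwise. The values of $L_{e}$ and $U_{e}$ and all function names are hypothetical.

```python
import math


def theta(gamma, L, U):
    """Assumed threshold: flat at L up to gamma_min, then exponential, reaching U at 1."""
    gamma_min = 1.0 / (1.0 + math.log(U / L))
    if gamma <= gamma_min:
        return L
    return (L / math.e) * (U * math.e / L) ** gamma


def integral_of_theta(upper, L, U, steps=100_000):
    """Midpoint-rule approximation of the integral of theta over [0, upper]."""
    h = upper / steps
    return h * sum(theta((k + 0.5) * h, L, U) for k in range(steps))


L, U = 1.0, 20.0  # hypothetical lower and upper bounds on request efficiency
gamma_min = 1.0 / (1.0 + math.log(U / L))

for big_gamma in (0.5, 0.8, 1.0):  # consumed budget fractions above gamma_min
    lhs = integral_of_theta(big_gamma, L, U)
    rhs = gamma_min * theta(big_gamma, L, U)
    print(f"Gamma_i = {big_gamma}: integral = {lhs:.6f}, Gamma_e * Theta = {rhs:.6f}")
```

For every consumed fraction $\Gamma_{i}\geq\Gamma_{e}$ tested here, the numerical integral agrees with $\Gamma_{e}\cdot\Theta_{e}(\Gamma_{i})$ up to discretization error.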

Therefore, the ratio $\frac{\text{OPT}_{i}(\kappa)}{\text{PPVF}_{i}(\kappa)}$ at ED $e$ can be bounded as

\displaystyle\frac{\text{OPT}_{i}(\kappa)}{\text{PPVF}_{i}(\kappa)}\leq\frac{\Theta(\Gamma_{i})}{\Gamma_{e}\cdot\Theta(\Gamma_{i})}=1+\ln(U_{e}/L_{e}).

(26)

Denote by $\mathcal{J}_{e}^{*}$ and $\mathcal{J}_{e}^{\text{PPVF}}$ the total utility obtained by the offline optimum and by PPVF, respectively. Following the proof in [28, 29], the CR of our online algorithm at any ED $e$ can be obtained by summing over all videos $i$:

$\displaystyle\mathcal{J}_{e}^{*}$	$\displaystyle=\sum_{i}\text{OPT}_{i}(\kappa)$	(27)
	$\displaystyle\leq\sum_{i}\left(1+\ln(U_{e}/L_{e})\right)\cdot\text{PPVF}_{i}(\kappa)$
	$\displaystyle=\left(1+\ln(U_{e}/L_{e})\right)\cdot\mathcal{J}_{e}^{\text{PPVF}}.$

To sum up, we can similarly prove that Alg. 1 achieves the best competitive ratio (CR) among all online solutions under Assumption 1:

CR=\max_{\kappa}\frac{\mathcal{J}_{e}^{*}}{\mathcal{J}_{e}^{\text{PPVF}}}=\left(1+\ln(U_{e}/L_{e})\right).

(28)

This completes the proof. ∎
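As an empirical illustration of the bound in Eq. (28), the following sketch simulates the threshold rule used in the proof for a single video: a request is selected when its efficiency $\lambda_{i}^{k}/\epsilon_{i}$ reaches $\Theta(\gamma_{i}^{k})$ and budget remains. It is not Alg. 1 itself. It assumes the standard online-knapsack form of $\Theta$, an equal budget cost for every request (so the budget is counted in integer slots and the offline optimum simply keeps the largest efficiencies), and uniformly random efficiencies in $[L,U]$; all names and parameter values are hypothetical.

```python
import math
import random


def theta(gamma, L, U):
    """Assumed threshold: flat at L up to gamma_min, then exponential, reaching U at 1."""
    gamma_min = 1.0 / (1.0 + math.log(U / L))
    if gamma <= gamma_min:
        return L
    return (L / math.e) * (U * math.e / L) ** gamma


def online_select(efficiencies, slots, L, U):
    """Select a request if budget remains and its efficiency reaches the threshold."""
    used, total = 0, 0.0
    for value in efficiencies:
        if used < slots and value >= theta(used / slots, L, U):
            used += 1
            total += value
    return total


def offline_optimum(efficiencies, slots):
    """With equal per-request cost, the optimum keeps the `slots` largest values."""
    return sum(sorted(efficiencies, reverse=True)[:slots])


L, U, slots = 1.0, 20.0, 200
bound = 1.0 + math.log(U / L)
rng = random.Random(0)

worst_ratio = 0.0
for _ in range(50):  # 50 random request sequences
    sequence = [rng.uniform(L, U) for _ in range(2000)]
    ratio = offline_optimum(sequence, slots) / online_select(sequence, slots, L, U)
    worst_ratio = max(worst_ratio, ratio)

print(f"worst observed OPT/online ratio: {worst_ratio:.3f} (bound: {bound:.3f})")
```

Random sequences are far from adversarial, so the observed ratio stays well below $1+\ln(U/L)$; the bound is only approached by worst-case input sequences.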

TABLE III: CHR (%) under different caching capacity $c_{e}$. SAGE and BESTFIT are implemented with our video utility predictor.

Caching Capacity	0.1%	0.25%	0.5%	0.75%	1%	2.5%	5%	7.5%	10%
PPVF	3.782	7.270	11.728	15.764	19.492	35.487	47.936	55.505	60.677
SAGE	3.768	7.266	11.719	15.761	19.491	35.431	47.905	55.451	60.569
BESTFIT	3.779	7.256	11.710	15.688	19.431	35.762	48.394	55.971	61.528
HRS	3.033	6.082	9.974	13.336	16.327	29.632	41.700	48.845	52.483
MAV	3.029	5.740	9.079	11.737	14.083	25.209	38.351	47.365	53.862
LRU	1.293	2.965	5.539	7.989	10.257	21.739	35.366	45.367	52.436
LFU	1.256	2.878	5.847	8.506	10.925	23.542	38.312	48.171	54.896

Algorithm 3: Online and distributed parameters learning algorithm for MEP.

Input: Online update time set $\mathcal{Q}$, update interval $\Delta t$
Output: $\bm{\theta}$

for $t_{\theta}\in\mathcal{Q}$ do
    $n\leftarrow 0$ and initialize $\bm{\theta}^{(n)}$ with the outdated parameters;
    while the termination condition is not satisfied do
        $l_{G}\leftarrow 0$, $\bm{\nabla}_{G}\leftarrow[0]^{(2\times\text{D}+1)\times\text{I}}$;
        for $\forall e\in\mathcal{E}$ do
            $l_{e}$, $\bm{\nabla}_{e}\leftarrow$ EDLocalTraining($t_{\theta}$, $\Delta t$, $\bm{\theta}^{(n)}$);
            $l_{G}\leftarrow l_{G}+l_{e}$, $\bm{\nabla}_{G}\leftarrow\bm{\nabla}_{G}+\bm{\nabla}_{e}$;
        end for
        Update $\bm{\theta}^{(n+1)}\leftarrow\bm{\theta}^{(n)}$ with $l_{G}$, $\bm{\nabla}_{G}$ and the penalty term $\bm{\rho}$;
        $n\leftarrow n+1$;
    end while
end for
return $\bm{\theta}$

Function EDLocalTraining($t_{\theta}$, $\Delta t$, $\bm{\theta}^{(n)}$):
    $l_{e}\leftarrow 0$, $\bm{\nabla}_{\beta}\leftarrow[0]^{\text{I}}$, $\bm{\nabla}_{p}\leftarrow[0]^{\text{I}\times\text{D}}$, $\bm{\nabla}_{q}\leftarrow[0]^{\text{I}\times\text{D}}$;
    for $\forall i\in\mathcal{I}$ do
        Collect the timestamp set of historical viewing requests $\mathcal{T}^{t_{\theta}-\Delta t,t_{\theta}}_{e,i}$ from the local dataset;
        for $\forall\tau\in\mathcal{T}^{t_{\theta}-\Delta t,t_{\theta}}_{e,i}$ do
            Obtain the intensity $\hat{h}_{e}(i,\tau)$ by Eq. (10);
            $l_{e}\leftarrow l_{e}+\log\hat{h}_{e}(i,\tau)$;
        end for
        Calculate the term of integration $l^{\prime}=\int_{t_{\theta}-\Delta t}^{t_{\theta}}\hat{h}_{e}(i,t)\mathrm{d}t$ in Eq. (12);
        $l_{e}\leftarrow l_{e}-l^{\prime}$;
        Calculate the gradients $\nabla_{\beta}[i]$ and $\bm{\nabla}_{p}[i]$ by Eqs. (14) and (13), respectively;
        for $\forall j\in\mathcal{I}$ do
            Calculate the gradient $\bm{\nabla}_{q}[j]$ by Eq. (15);
        end for
    end for
    Combine $\bm{\nabla}_{\beta}$, $\bm{\nabla}_{p}[i]$, $\bm{\nabla}_{q}[j]$ to gain the whole $\bm{\nabla}_{e}$;
    return $l_{e}$, $\bm{\nabla}_{e}$
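The aggregation pattern of Alg. 3 (each ED returns a local log-likelihood and gradient, and the server sums them before updating the shared parameters) can be illustrated with a deliberately simplified toy model. The sketch below is not the MEP model: it replaces the intensity of Eq. (10) with a single constant request rate shared by all EDs, omits the penalty term $\bm{\rho}$, and assumes plain gradient ascent as the update rule. All names, the learning rate, and the data are hypothetical.

```python
import math


def ed_local_training(event_count, window, rate):
    """Local log-likelihood and gradient for a constant-rate (Poisson) toy model."""
    loglik = event_count * math.log(rate) - rate * window
    grad = event_count / rate - window
    return loglik, grad


def federated_fit(event_counts, window, rate=1.0, lr=0.01, tol=1e-9, max_rounds=10_000):
    """Sum local log-likelihoods and gradients over EDs, then take an ascent step."""
    global_loglik = 0.0
    for _ in range(max_rounds):
        global_loglik, global_grad = 0.0, 0.0
        for count in event_counts:  # one entry per ED; raw timestamps stay local
            loglik, grad = ed_local_training(count, window, rate)
            global_loglik += loglik
            global_grad += grad
        if abs(global_grad) < tol:  # assumed termination condition
            break
        rate += lr * global_grad  # assumed update rule: plain gradient ascent
    return rate, global_loglik


# Three hypothetical EDs observing 30, 50, and 40 requests in a window of 10 time units.
fitted_rate, fitted_loglik = federated_fit([30, 50, 40], window=10.0)
print(f"fitted rate: {fitted_rate:.6f}, global log-likelihood: {fitted_loglik:.4f}")
```

In this toy setting the summed gradient vanishes at the pooled estimate (total requests divided by total observation time, here $120/30=4$), so the server recovers the same rate it would obtain with centralized data while only exchanging aggregated statistics.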

Appendix B Detailed Derivation for log-likelihood function

Let the occurrence time $t^{\nu}$ be the time of the last viewing request in the historical viewing request set $\mathcal{V}_{e}^{\nu}$. Given the overall intensity function $\hat{h}^{\prime}_{e}(t)=\sum_{\forall i}\hat{h}_{e}(i,t)$ for any ED $e$, the probability that no request occurs in the period $[\,t^{\nu},t\,)$ with $t<t^{\nu+1}$ is

P\left\{\text{no request occurs in }[\,t^{\nu},t\,)\ \big{|}\ \mathcal{V}_{e}^{\nu}\right\}=\exp\left[-\int_{t^{\nu}}^{t}\hat{h}^{\prime}_{e}(t^{\prime})\mathrm{d}t^{\prime}\right].

Thus, the probability density that the next request occurs at time $t$ and targets video $i$ is given by

P\left\{i,t\ \big{|}\ \mathcal{V}_{e}^{\nu}\,\right\}=\hat{h}_{e}(i,t)\exp\left[-\int_{t^{\nu}}^{t}\hat{h}^{\prime}_{e}(t^{\prime})\mathrm{d}t^{\prime}\right].

For convenience, we let $t^{0}=0$ and align the time series of all videos $i$ to the same initial point. Recall that $\mathcal{V}_{e}$ denotes the historical dataset of all viewing requests at ED $e$ in the period $(\,0,T\,]$. It is then easy to derive the likelihood function for all the parameters $\bm{\theta}\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\{\bm{p},\bm{q},\bm{\beta}\ \}$, as shown in Eq. (29).

\begin{split}l_{e}\left(\,\bm{\theta}\,\big{|}\,\mathcal{V}_{e}\,\right)&=\prod_{\nu=1}^{|\mathcal{V}_{e}|}\hat{h}_{e}(i,t^{\nu})\exp\left[-\int_{t^{\nu-1}}^{t^{\nu}}\hat{h}^{\prime}_{e}(t)\mathrm{d}t\right]\\ &\cdot\exp\left[-\int_{t^{|\mathcal{V}_{e}|}}^{T}\hat{h}^{\prime}_{e}(t)\mathrm{d}t\right]\\ &=\prod_{i\in\mathcal{I}}\prod_{\tau\in\mathcal{T}_{e,i}}\hat{h}_{e}(i,\tau)\cdot\exp\left[-\int_{0}^{T}\hat{h}^{\prime}_{e}(t)\mathrm{d}t\right].\end{split}

(29)

where $\mathcal{T}_{e,i}$ denotes the timestamp set of the viewing requests to video $i$ arriving at ED $e$ in $[0,T)$. To facilitate calculation, we further derive the log-likelihood function used to optimize all the parameters $\bm{\theta}$. Taking the logarithm of Eq. (29) and using $\hat{h}^{\prime}_{e}(t)=\sum_{\forall i}\hat{h}_{e}(i,t)$, it is defined as:

\begin{split}ll_{e}\left(\,\bm{\theta}\,\big{|}\,\mathcal{V}_{e}\,\right)=\sum_{i\in\mathcal{I}}\left[\sum_{\tau\in\mathcal{T}_{e,i}}\log\hat{h}_{e}(i,\tau)-\int_{0}^{T}\hat{h}_{e}(i,t^{\prime})\mathrm{d}t^{\prime}\right].\end{split}
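The structure of the log-likelihood (a sum of log-intensities at the observed request times minus the integrated intensity over the observation window) can be evaluated for any intensity function. The following minimal sketch uses a hypothetical constant intensity per video as a stand-in for $\hat{h}_{e}(i,t)$ and a midpoint rule for the integral; the function names, the toy request timestamps, and the rates are assumptions made for illustration only.

```python
import math


def log_likelihood(timestamps_by_video, intensity, horizon, steps=10_000):
    """Sum of log-intensities at request times minus the integrated intensity."""
    h = horizon / steps
    total = 0.0
    for video, timestamps in timestamps_by_video.items():
        # Event term: log-intensity at each observed request time.
        total += sum(math.log(intensity(video, tau)) for tau in timestamps)
        # Integral term over [0, horizon], approximated by the midpoint rule.
        total -= h * sum(intensity(video, (k + 0.5) * h) for k in range(steps))
    return total


# Hypothetical toy data: request timestamps per video in the window [0, 3).
requests = {"a": [0.4, 1.1, 2.7], "b": [1.9]}
rates = {"a": 2.0, "b": 0.5}  # assumed constant intensities, not the MEP model


def constant_intensity(video, t):
    """Stand-in intensity that ignores time and history."""
    return rates[video]


value = log_likelihood(requests, constant_intensity, horizon=3.0)
print(f"log-likelihood: {value:.6f}")
```

For constant intensities the result reduces to $\sum_{i}\left(n_{i}\log r_{i}-r_{i}T\right)$, where $n_{i}$ is the number of requests for video $i$ and $r_{i}$ its rate, which gives a simple way to check an implementation before plugging in a history-dependent intensity.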

(30)

Appendix C FL-based online parameters estimation algorithm

The detailed algorithm for online FL-based parameter estimation for the MEP model is presented in Alg. 3.

Appendix D Supplement of CHR Results

To better show the caching performance difference among the baselines, we further present numerical CHR results in Table III. Table III demonstrates that PPVF, SAGE, and BESTFIT consistently attain higher CHR than the other caching baselines due to our efficient utility prediction algorithm. PPVF is slightly better than SAGE and BESTFIT in terms of CHR at small caching capacities, while BESTFIT achieves the highest CHR when the caching capacity is large.