
On the Regret of Online Edge Service Hosting

R Sri Prakash [email protected];[email protected] Nikhil Karamchandani [email protected] Sharayu Moharir [email protected]
Abstract

We consider the problem of service hosting where a service provider can dynamically rent edge resources via short-term contracts to ensure better quality of service to its customers. The service can also be partially hosted at the edge, in which case customers' requests can be partially served at the edge. The total cost incurred by the system is modeled as a combination of the rent cost, the service cost incurred due to latency in serving customers, and the fetch cost incurred as a result of the bandwidth used to fetch the code/databases of the service from the cloud servers to host the service at the edge. In this paper, we compare multiple hosting policies with regret as a metric, defined as the difference in the cost incurred by the policy and the optimal policy over some time horizon $T$. In particular, we consider the Retro Renting (RR) and Follow The Perturbed Leader (FTPL) policies proposed in the literature and provide performance guarantees on the regret of these policies. We show that under i.i.d. stochastic arrivals, the RR policy has linear regret while the FTPL policy has constant regret. Next, we propose a variant of FTPL, namely Wait then FTPL (W-FTPL), which also has constant regret while demonstrating much better dependence on the fetch cost. We also show that under adversarial arrivals, the RR policy has linear regret while both FTPL and W-FTPL have regret $O(\sqrt{T})$, which is order-optimal.

Affiliation: Indian Institute of Technology Bombay, Powai, Mumbai 400076, India

1 Introduction

Software as a Service (SaaS) instances like online navigation platforms, Video-on-Demand services, etc., have stringent latency constraints in order to provide a good quality of experience to their customers. While most SaaSs use cloud resources, low latency necessitates the use of storage/computational resources at the edge, i.e., close to the end-user. A service is said to be hosted at the edge if the code and databases needed to serve user queries are stored on the edge servers and requests can be served at the edge. In this work, the service can also be partially hosted at the edge, in which case, customers’ requests can be partially served at the edge [1, 2]. For instance, consider a SaaS application which provides access to news articles on demand. A typical news article includes some text, some images and possibly some embedded videos. One way to partially host such a service is to store the text corresponding to the news articles at the edge and store the images/videos in the cloud. Each user request will then be served partially by the edge.

We consider the setting where third-party resources can be rented via short-term contracts to host the service, and the edge hosting status of the SaaS can be dynamically changed over time. If the service is not hosted at the edge, it can be fetched from the cloud servers by incurring a fetch cost. The performance of a hosting policy is a function of the rent cost, the fetch cost, and the quality of experience of the users. We refer to the algorithmic challenge of determining what fraction of the service to host at the edge over time as the service hosting problem. Note that we study the service hosting problem from the perspective of a specific SaaS provider which can rent third-party edge resources to improve their customers’ quality of experience by reducing the latency of their service.

Novel online service hosting policies with provable performance guarantees have been proposed in [3, 4]. The metric of interest in [3, 4] is the competitive ratio, defined as the ratio of the cost incurred by an online policy to the cost incurred by an optimal offline policy for the same request arrival sequence. Since the competitive ratio is multiplicative by definition, the absolute value of the difference in the cost incurred by an online policy and the optimal policy can be large even though the competitive ratio of the online policy is close to one. This motivates studying the performance of candidate policies in terms of regret, defined as the difference in the cost incurred by the policy and the optimal policy. Regret is a widely used metric in online learning [5], including recently for the caching problem [6, 7], which is closely related to the service hosting problem. In this work, one of our goals is to design hosting policies with provable guarantees in terms of the regret. Another key dimension in the performance guarantees of online policies is the assumption made on the request arrival process. Commonly studied settings include the stochastic setting and the adversarial setting, and we provide performance guarantees for both settings.

1.1 Our Contributions

We propose a variant of the FTPL policy called Wait then Follow the Perturbed Leader (W-FTPL). W-FTPL is a randomized policy that takes into account the fetch cost in its decision-making. More specifically, W-FTPL does not fetch the service for an initial wait period which depends on the request arrivals and is an increasing function of the fetch cost. Following the wait period, W-FTPL mimics the FTPL policy.

For i.i.d. stochastic request arrivals and the regret metric, we show that RR is sub-optimal while FTPL and W-FTPL are both order-optimal with respect to the time horizon. While the regret of FTPL can increase linearly with the fetch cost, the regret of W-FTPL increases at most polylogarithmically. The improved performance of W-FTPL over FTPL is a consequence of the fact that W-FTPL avoids most of the fetches made by FTPL in the initial time-slots and by the end of the wait period, its estimate of the arrival rate is accurate enough to avoid multiple fetches. Further, we characterize the competitive ratio of FTPL and W-FTPL for the setting where a finite number of partial hosting levels are allowed. We compare these results with the competitive ratio of RR when a single partial hosting level is allowed [2] to conclude that FTPL and W-FTPL outperform RR with respect to the competitive ratio.

For the adversarial setting, we first characterize a fundamental lower bound on the regret of any online hosting policy. In terms of regret, we then show that RR is strictly suboptimal, while FTPL and W-FTPL have order-optimal performance with respect to time. We show that the competitive ratios of FTPL and W-FTPL are upper bounded by a constant (with respect to time) for any finite number of partial hosting levels. This is a more general setting than the one considered in [2], where the competitive ratio of a variant of RR is characterized when only one partial hosting level is permitted.

1.2 Related work

Several emerging applications such as Augmented / Virtual Reality (AR/VR), online gaming, the Internet of Things (IoT), and autonomous driving have very stringent latency requirements. In addition, many of these applications are also compute-and-storage heavy. This necessitates the migration of some of the storage and computation requirements of these services from remote cloud servers to the edge. To enable such offloading, several edge computing architectures have been proposed in the literature and their performance has been analyzed extensively, see for example [8, 9, 10] and references therein.

The service hosting problem has received a lot of attention recently, and various settings have been studied. One approach has been to pose the design problem of where to host which service as a one-shot large-scale optimization problem, see for example [11, 12, 13]. Our work differs from this line of work in that we consider an online setting where we adaptively decide when (and what fraction of) service to host at any time, depending on the varying number of requests seen thus far.

There have been recent works [4, 14] which consider the online service hosting problem and design adaptive schemes whose performance is characterized in terms of the competitive ratio with respect to an oracle which knows the entire request sequence a priori. Our work differs from these in a couple of ways. Firstly, the above-mentioned works study the problem from the perspective of an edge resource provider who has to decide which among a collection of services to host at each time on a storage-constrained edge server. On the other hand, similar to some other works [15, 3], we study the problem from the perspective of a service provider which needs to decide when to host its application at the edge so as to minimize the overall cost. Secondly, most of the works mentioned above do not consider partial hosting of the service, which is allowed in our framework. While [2] does study partial hosting and proposes the Retro-Renting (RR) policy, which is shown to have a constant competitive ratio, it only allows one level of partial hosting of the service. On the other hand, we consider a much more general setting where multiple partial service hosting levels are allowed and show that both FTPL and its variant W-FTPL achieve a constant competitive ratio.

While the works mentioned above study competitive ratio as a performance metric, another popular measure of performance for online algorithms is the regret which considers the difference in the cost incurred by an online policy and the optimal (static) policy. In particular, for the closely related caching problem, several policies inspired by the recent advances in online convex optimization have been proposed, including Online Gradient Ascent [6], Online Mirror Descent [16], and Follow the Perturbed Leader (FTPL) [17, 7]. In particular, FTPL has been shown to have order-optimal regret for the caching problem in the adversarial setting [17, 7]. In this work, we study regret for the RR and FTPL policies for the service hosting problem under both the stochastic and adversarial settings. Since RR is a deterministic policy, its regret performance in the adversarial setting is poor. While FTPL does achieve order-optimal regret (with respect to time), one limitation is that it makes hosting decisions agnostic of the fetch cost. As a result, in some cases, FTPL is prone to fetching and evicting the service multiple times in the initial time-slots when its estimate of the request arrival rate is noisy, thus leading to poor performance. This motivated our design and regret analysis of W-FTPL, a variant of FTPL which takes into account the fetch cost. One recent work which also considers regret as a performance metric is [18], which studied the problem of joint online service hosting and routing and proposed a two time-scale algorithm based on online gradient descent with provably order-optimal regret under adversarial requests.

Finally, there are several other approaches which have been used to address the online service hosting problem, including Markov Decision Processes [19, 20, 1], Reinforcement Learning [21], and Multi-Armed Bandits [22, 23].

2 System Setup

We consider a system consisting of a back-end server and an edge-server to serve customers of a service. The back-end server always hosts the service and can serve any requests that are routed to it. In this work, we allow the service to be partially hosted at the edge-server. When the service is partially hosted at the edge, requests are partially served at the edge and partially at the back-end server. Further, the fraction of service hosted at the edge can be changed over time. If the service is not hosted or partially hosted at the edge, parts of the service can be fetched from the back-end server to host at the edge.

Sequence of events in each time-slot: We consider a time-slotted system. In each time-slot, we first make the service hosting decision for that time-slot. Following this, requests may arrive and are served at the edge and/or the back-end server.

Request arrivals: We consider two types of arrival processes in this paper.

  • Stochastic arrivals: In this case, request arrivals are i.i.d. across time-slots with mean $\mu$.

  • Adversarial arrivals: In this case, the request arrival sequence is generated by an oblivious adversary, i.e., the entire request sequence is assumed to be fixed a priori.

Let $r_t$ denote the number of request arrivals in time-slot $t$. We consider bounded requests, specifically $0 \leq r_t \leq \kappa$, and let $R_t = \sum_{i=1}^{t} r_i$ denote the cumulative requests observed by the end of time-slot $t$. In both the stochastic and adversarial cases, we assume that there are at most $\kappa$ arrivals per slot.

Costs: We model three types of costs.

  1. Rent cost ($\mathcal{C}^{\mathcal{P}}_{R,t}$): The system incurs a cost of $c$ units per time-slot to host the entire service at the edge. For hosting a fraction $f$ of the service, the system incurs a cost of $cf$ units per time-slot.

  2. Service cost ($\mathcal{C}^{\mathcal{P}}_{S,t}$): This is the cost incurred to use the back-end server to serve (parts of) requests. If a fraction $f$ of the service is hosted at the edge, the system incurs a service cost of $g(f)$ units per request, where $g(f)$ is a decreasing function of $f$. We fix $g(0) = 1$, i.e., the service cost is one unit when the service is not hosted at the edge.

  3. Fetch cost ($\mathcal{C}^{\mathcal{P}}_{F,t}$): The system incurs a cost of $M > 1$ units for each fetch of the entire service from the back-end server to host on the edge-server. For fetching a fraction $f$ of the service, the system incurs a cost of $Mf$ units.

Let $\rho_t^{\mathcal{P}}$ denote the edge hosting status of the service in time-slot $t$ under policy $\mathcal{P}$. We consider the setting where $\rho_t^{\mathcal{P}} \in \{\alpha_1, \alpha_2, \ldots, \alpha_{K-1}, \alpha_K\}$, and $\rho_t^{\mathcal{P}} = \alpha_i$ implies that an $\alpha_i$ fraction of the service is hosted at the edge-server in time-slot $t$. Note that $\alpha_i \in [0,1]$; in particular, we consider $\alpha_1 = 0$, $\alpha_K = 1$, and $\alpha_i < \alpha_j$ for $i < j$. It follows that $\rho_t^{\mathcal{P}} = 1$ implies that the entire service is hosted at the edge in time-slot $t$, and $\rho_t^{\mathcal{P}} = 0$ implies that the service is not hosted at the edge in time-slot $t$.

We make some assumptions about the various system parameters.

Assumption 1.

$c \leq \kappa$.

This assumption is motivated by the fact that if $\kappa < c$, it is strictly sub-optimal to host the entire service at the edge, irrespective of the number of arrivals in a time-slot.

Assumption 2.

For each fraction of service $\alpha_i$, $1 \leq i \leq K$, $\alpha_i + g(\alpha_i) \leq 1$.

In [1, 2], it is shown that the benefits of partial hosting are limited to the setting where $\alpha_i + g(\alpha_i) \leq 1$. Motivated by this, we consider the case where $\alpha_i + g(\alpha_i) \leq 1$ for all candidate values of $\alpha_i$.

Assumption 3.

For any fraction of service $\alpha_i$, $1 \leq i \leq K$, $c\alpha_i + g(\alpha_i) r_t \leq \kappa$.

Assumption 3 follows from Assumptions 1 and 2, since $c\alpha_i + g(\alpha_i) r_t \leq \kappa\alpha_i + \kappa g(\alpha_i) = \kappa(\alpha_i + g(\alpha_i)) \leq \kappa$.

Assumption 4.

For stochastic arrivals, $\mu > 0$, and for adversarial arrivals, $R_T \geq 1$.

This assumption eliminates degenerate cases where there are no request arrivals in the time-horizon of interest.

The total cost incurred in time-slot $t$ by policy $\mathcal{P}$, denoted by $\mathcal{C}^{\mathcal{P}}_{t}(r_{t})$, is the sum of the rent, service, and fetch costs. It follows that

$$\mathcal{C}^{\mathcal{P}}_{t}(r_{t}) = c\rho_{t}^{\mathcal{P}} + g(\rho_{t}^{\mathcal{P}})\,r_{t} + M\left(\rho_{t}^{\mathcal{P}} - \rho_{t-1}^{\mathcal{P}}\right)^{+}. \quad (1)$$

Let $r = \{r_t\}_{t \geq 1}$ denote the request arrival sequence and $\mathcal{C}^{\mathcal{P}}(T,r)$ denote the cumulative cost incurred by policy $\mathcal{P}$ in time-slots 1 to $T$. It follows that

$$\mathcal{C}^{\mathcal{P}}(T,r) = \sum_{t=1}^{T} \mathcal{C}^{\mathcal{P}}_{t}(r_{t}).$$
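As a concrete illustration of the cost model, the per-slot cost in (1) and its cumulative sum can be sketched as follows. This is a minimal Python sketch; the function `g` and all parameter values in the usage note below are illustrative assumptions, not values from the paper.

```python
# Sketch of the cost model: per-slot cost (rent + service + fetch) and the
# cumulative cost C^P(T, r) for a hosting trajectory rho under a policy P.

def slot_cost(rho_t, rho_prev, r_t, c, M, g):
    """Cost in one slot: rent c*rho_t, service g(rho_t)*r_t, and a fetch
    cost M*(rho_t - rho_prev)^+ paid only when the hosted fraction grows."""
    fetch = max(rho_t - rho_prev, 0.0)
    return c * rho_t + g(rho_t) * r_t + M * fetch

def total_cost(rho, requests, c, M, g, rho_init=0.0):
    """Cumulative cost over slots 1..T; rho and requests are length-T lists."""
    cost, prev = 0.0, rho_init
    for rho_t, r_t in zip(rho, requests):
        cost += slot_cost(rho_t, prev, r_t, c, M, g)
        prev = rho_t
    return cost
```

For example, with the illustrative choices $g(f) = 1 - f$, $c = 1$, $M = 5$, hosting trajectory $[0, 1, 1]$, and arrivals $[2, 3, 4]$, the slot costs are $2$, $6$, and $1$ (the single fetch in the second slot contributes $M = 5$), for a total of $9$.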

Performance metrics: We consider two performance metrics, namely regret and competitive ratio.

For i.i.d. stochastic arrivals, the regret of a policy $\mathcal{P}$, denoted by $\mathcal{R}^{\mathcal{P}}_{S}(T)$, is defined as the difference in the total expected cost incurred by the policy and the oracle static hosting policy. The oracle static hosting policy makes a hosting decision at $t = 1$ using the knowledge of the statistics of the request arrival process, but not the entire sample-path. For ease of notation, we define $\mu = \mathbb{E}[r_t]$ and $\mu_i = c\alpha_i + g(\alpha_i)\mu$. It follows that the expected cost incurred by the optimal static hosting policy in time-slots 1 to $T$ is $\min_i\{c\alpha_i T + g(\alpha_i)\mu T + M\alpha_i\}$, and therefore,

\begin{align}
\mathcal{R}^{\mathcal{P}}_{S}(T) &= \mathbb{E}_{\mathcal{P},r}[\mathcal{C}^{\mathcal{P}}(T,r)] - \min_i\{c\alpha_i T + g(\alpha_i)\mu T + M\alpha_i\} \nonumber \\
&= \mathbb{E}_{\mathcal{P},r}[\mathcal{C}^{\mathcal{P}}(T,r)] - \min_i\{\mu_i T + M\alpha_i\} \nonumber \\
&\leq \mathbb{E}_{\mathcal{P},r}[\mathcal{C}^{\mathcal{P}}(T,r)] - \min_i\{\mu_i T\}, \tag{2}
\end{align}

where $\mathbb{E}_{\mathcal{P},r}$ represents the expectation over the randomization in policy $\mathcal{P}$ and the request arrival sequences $r$.

For i.i.d. stochastic arrivals, the competitive ratio of a policy $\mathcal{P}$, denoted by $\sigma^{\mathcal{P}}_{S}(T)$, is defined as the ratio of the expected total cost incurred by the policy and the expected total cost incurred by the optimal static hosting policy discussed above. It follows that

$$\sigma_S^{\mathcal{P}}(T) = \frac{\mathbb{E}_{\mathcal{P},r}[\mathcal{C}^{\mathcal{P}}(T,r)]}{\min_i\{c\alpha_i T + g(\alpha_i)\mu T + M\alpha_i\}}.$$

For adversarial arrivals, the regret of a policy $\mathcal{P}$, denoted by $\mathcal{R}^{\mathcal{P}}_{A}(T)$, is defined as the worst-case difference in the total expected cost incurred by the policy and the optimal static hosting policy, where the expectation is taken over the randomization in policy $\mathcal{P}$. For any request sequence $r$, the total cost incurred by the optimal static hosting policy in time-slots 1 to $T$ is $\min_i\{c\alpha_i T + g(\alpha_i)R_T + M\alpha_i\}$, and therefore,

\begin{align*}
\mathcal{R}^{\mathcal{P}}_{A}(T,r) &= \mathbb{E}_{\mathcal{P}}[\mathcal{C}^{\mathcal{P}}(T,r)] - \min_i\{c\alpha_i T + g(\alpha_i)R_T + M\alpha_i\}, \\
\mathcal{R}^{\mathcal{P}}_{A}(T) &= \sup_r \left(\mathcal{R}^{\mathcal{P}}_{A}(T,r)\right).
\end{align*}

We have the following relation between the regret incurred by a policy under i.i.d. stochastic arrivals and adversarial arrivals.

Lemma 1.

For any online policy $\mathcal{P}$ and stochastic arrivals, we have $\mathcal{R}_S^{\mathcal{P}}(T) \leq \mathcal{R}_A^{\mathcal{P}}(T)$.

The proof can be found in Section 7.

For adversarial arrivals, the competitive ratio of a policy $\mathcal{P}$, denoted by $\sigma^{\mathcal{P}}_{A}(T)$, is defined as the worst-case ratio of the expected total cost incurred by the policy and the cost incurred by the offline optimal policy. The optimal offline policy can change the hosting status over time using knowledge of the entire request process a priori. Let the cost incurred by the optimal offline hosting policy in time-slots 1 to $T$ be denoted by $\mathcal{C}^{\text{OFF-OPT}}(T,r)$. It follows that

$$\sigma^{\mathcal{P}}_{A}(T) = \sup_r \frac{\mathbb{E}_{\mathcal{P}}[\mathcal{C}^{\mathcal{P}}(T,r)]}{\mathcal{C}^{\text{OFF-OPT}}(T,r)}.$$

Goal: The goal is to design online hosting policies with provable performance guarantees with respect to regret and competitive ratio.

Notations: We define some notations which will be used in the rest of this paper. Let $\bm{s} = [0, \alpha_2, \alpha_3, \ldots, 1]'$, $\bm{f} = [1, g(\alpha_2), g(\alpha_3), \ldots, 0]'$, and $\bm{\rho}_t^{\mathcal{P}} \in \mathcal{X}$, where $\mathcal{X}$ is the collection of all possible one-hot vectors in $\{0,1\}^K$ and thus $|\mathcal{X}| = K$. The position of the 1 in the one-hot vector $\bm{\rho}_t^{\mathcal{P}}$ represents the level of service hosted at the edge in slot $t$, i.e., $\rho_t^{\mathcal{P}} = \langle \bm{\rho}_t^{\mathcal{P}}, \bm{s} \rangle$. The total cost in slot $t$ as given in (1) can be rewritten using the above notations as

\begin{align*}
\mathcal{C}_t^{\mathcal{P}}(r_t) &= c\langle \bm{\rho}_t^{\mathcal{P}}, \bm{s} \rangle + r_t \langle \bm{\rho}_t^{\mathcal{P}}, \bm{f} \rangle + M(\rho_t^{\mathcal{P}} - \rho_{t-1}^{\mathcal{P}})^{+} \\
&= \langle \bm{\rho}_t^{\mathcal{P}}, \bm{\theta}_t \rangle + M(\rho_t^{\mathcal{P}} - \rho_{t-1}^{\mathcal{P}})^{+},
\end{align*}

where $\bm{\theta}_t = c\bm{s} + r_t\bm{f}$.
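The equivalence between the scalar rent-plus-service cost and its inner-product form can be checked numerically. The hosting levels, cost values, and choice of $g$ below are illustrative assumptions:

```python
# Numerical check (illustrative values) that the vectorized cost
# <rho_t, theta_t> with theta_t = c*s + r_t*f reproduces the scalar
# rent-plus-service cost c*alpha_i + g(alpha_i)*r_t from (1).

alphas = [0.0, 0.5, 1.0]            # hosting levels: alpha_1 = 0, alpha_K = 1
g = lambda a: 1.0 - a               # an assumed decreasing g with g(0)=1, g(1)=0
c, r_t = 1.0, 3.0                   # illustrative rent cost and arrivals

s = alphas                          # s = [0, alpha_2, ..., 1]
f = [g(a) for a in alphas]          # f = [1, g(alpha_2), ..., 0]
theta = [c * si + r_t * fi for si, fi in zip(s, f)]

for i, a in enumerate(alphas):
    one_hot = [1.0 if j == i else 0.0 for j in range(len(alphas))]
    inner = sum(p * th for p, th in zip(one_hot, theta))
    assert inner == c * a + g(a) * r_t
```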

For ease of notation, for stochastic arrivals we define $\mu = \mathbb{E}[r_t]$, $\mu_i = c\alpha_i + g(\alpha_i)\mu$, $\Delta_{ij} = \mu_i - \mu_j$, $i^\star = \operatorname*{arg\,min}_i \mu_i$, $\Delta_i = \mu_i - \mu_{i^\star}$, $\Delta_{\text{min}} = \min_{i \neq i^\star} \Delta_i$, and $\Delta_{\text{max}} = \max_i \Delta_i$.

We summarize important notations used in this paper in Table 1.

Symbol : Description
$c$ : Rent cost for hosting the complete service in a slot
$M$ : Fetch cost
$K$ : Number of hosting levels allowed
$t$ : Time index
$\rho_t^{\mathcal{P}}$ : Fraction of service hosted in time-slot $t$ under policy $\mathcal{P}$
$\kappa$ : Maximum number of arrivals in a slot
$\alpha_i$ : Fraction of service
$g(f)$ : Fraction of a request forwarded if an $f$ fraction is hosted
$r_t$ : Number of requests in time-slot $t$
$R_t$ : Cumulative number of requests up to time-slot $t$
$\mu$ : $\mathbb{E}[r_t]$
$T$ : Time horizon
$\mathcal{C}^{\mathcal{P}}(T,r)$ : Cost incurred till $T$ under policy $\mathcal{P}$ for request sequence $r$
$\mathcal{R}_S^{\mathcal{P}}(T)$ : Regret under the stochastic arrival setting
$\sigma_S^{\mathcal{P}}(T)$ : Competitive ratio under the stochastic arrival setting
$\mathcal{R}_A^{\mathcal{P}}(T)$ : Regret under the adversarial arrival setting
$\sigma_A^{\mathcal{P}}(T)$ : Competitive ratio under the adversarial arrival setting
$\mu_i$ : $c\alpha_i + g(\alpha_i)\mu$
$\Delta_{ij}$ : $\mu_i - \mu_j$
$i^\star$ : $\operatorname*{arg\,min}_i \mu_i$
$\Delta_i$ : $\mu_i - \mu_{i^\star}$
$\Delta_{\text{min}}$ : $\min_{i \neq i^\star} \Delta_i$
$\Delta_{\text{max}}$ : $\max_i \Delta_i$
$\bm{s}$ : $[0, \alpha_2, \alpha_3, \ldots, 1]'$
$\bm{f}$ : $[1, g(\alpha_2), g(\alpha_3), \ldots, 0]'$
$\bm{\theta}_t$ : $c\bm{s} + r_t\bm{f}$
$\bm{\Theta}_t$ : $\sum_{i=1}^{t-1} \bm{\theta}_i$
$\eta_t$ : Learning rate
$\alpha$ : Hyperparameter in the FTPL and W-FTPL policies
$\beta$, $\delta$ : Hyperparameters in the W-FTPL policy
$\bm{\gamma}$ : i.i.d. Gaussian vector
$\mathcal{X}$ : Collection of one-hot vectors in $\{0,1\}^K$
$T_s$ : Waiting time for the W-FTPL policy
Table 1: Notations

3 Policies

Our Policy: Our policy, called Wait then Follow the Perturbed Leader (W-FTPL), is a variant of the popular FTPL policy. A formal definition of FTPL is given in Algorithm 1. FTPL is a randomized policy and is known to perform well for the traditional caching problem, in fact achieving order-wise optimal regret for adversarial arrivals [7]. In any time-slot $t$, for each $i$, FTPL considers the cumulative cost that would have been incurred had the system used hosting fraction $\alpha_i$ for the entire duration $[1, t-1]$, and then perturbs this cost by an appropriately scaled sample of a standard Gaussian random variable. FTPL then hosts the fraction of service for which the perturbed cost is minimum in that slot.

Input: $c$, $\{\eta_t\}_{t \geq 1}$, $\{r_t\}_{t \geq 1}$
1 Set $\bm{\Theta}_1 \leftarrow \bm{0}$
2 Sample $\bm{\gamma} \sim \mathcal{N}(\bm{0}, \bm{I})$
3 for $t = 1$ to $T$ do
4       Host $\bm{\rho}_t \in \operatorname*{arg\,min}_{\bm{\rho} \in \mathcal{X}} \langle \bm{\rho}, \bm{\Theta}_t + \eta_t \bm{\gamma} \rangle$
5       $\bm{\theta}_t = c\bm{s} + r_t\bm{f}$
6       Update $\bm{\Theta}_{t+1} = \bm{\Theta}_t + \bm{\theta}_t$
Algorithm 1: Follow The Perturbed Leader (FTPL)
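Algorithm 1 can be sketched in a few lines of Python. This is a minimal sketch under assumed illustrative inputs: the decreasing function `g` and the scale `alpha_lr` (playing the role of the hyperparameter $\alpha$ in $\eta_t = \alpha\sqrt{t}$) are not prescribed by the paper.

```python
import math
import random

# Sketch of Algorithm 1 (FTPL). Theta accumulates the per-level costs
# theta_t = c*s + r_t*f; in each slot, the level minimizing the perturbed
# cumulative cost Theta + eta_t*gamma is hosted.

def ftpl(requests, alphas, g, c, alpha_lr=1.0, rng=None):
    rng = rng or random.Random(0)
    K = len(alphas)
    s = alphas                                       # s = [0, alpha_2, ..., 1]
    f = [g(a) for a in alphas]                       # f = [1, g(alpha_2), ..., 0]
    gamma = [rng.gauss(0.0, 1.0) for _ in range(K)]  # perturbation, sampled once
    Theta = [0.0] * K
    hosted = []
    for t, r_t in enumerate(requests, start=1):
        eta_t = alpha_lr * math.sqrt(t)              # eta_t = alpha * sqrt(t)
        perturbed = [Theta[i] + eta_t * gamma[i] for i in range(K)]
        i_star = min(range(K), key=lambda i: perturbed[i])
        hosted.append(alphas[i_star])                # host the minimizing level
        Theta = [Theta[i] + c * s[i] + r_t * f[i] for i in range(K)]
    return hosted
```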

We propose a policy called Wait then Follow The Perturbed Leader (W-FTPL). The key idea behind W-FTPL is to not host the service for an initial wait period. This reduces the number of fetches made initially, when the estimate of the arrival rate is noisy. The duration of the wait period is not fixed a priori and is a function of the arrival pattern seen so far. Following the wait period, W-FTPL mimics the FTPL policy. Refer to Algorithm 2 for a formal definition of W-FTPL. Let $T_s$ be the time-slot after which W-FTPL starts acting as FTPL. Formally, we define $T_s = \min\left\{t : t < \frac{\left(\max_{i \neq j}(\Theta_{t+1,i} - \Theta_{t+1,j})\right)^2}{\kappa^2 \beta (\log M)^{1+\delta}}\right\}$, where $\beta > 1$ and $\delta > 0$ are constants.

Input: $c$, $M$, $\{\eta_t\}_{t \geq 1}$, $\{r_t\}_{t \geq 1}$, $\beta$, $\delta$, $\kappa$
1 Set $\bm{\Theta}_1 \leftarrow \bm{0}$, $\text{wait} = 1$
2 Sample $\bm{\gamma} \sim \mathcal{N}(\bm{0}, \bm{I})$
3 for $t = 1$ to $T$ do
4       if $\text{wait} = 1$ then
5             $\rho_t = 0$
6       else
7             Host $\bm{\rho}_t \in \operatorname*{arg\,min}_{\bm{\rho} \in \mathcal{X}} \langle \bm{\rho}, \bm{\Theta}_t + \eta_t \bm{\gamma} \rangle$
8       $\bm{\theta}_t = c\bm{s} + r_t\bm{f}$
9       Update $\bm{\Theta}_{t+1} = \bm{\Theta}_t + \bm{\theta}_t$
10      $\text{wait} = \min\left\{\text{wait}, \mathbbm{1}_{t \geq \frac{(\max_{i \neq j}(\Theta_{t+1,i} - \Theta_{t+1,j}))^2}{\kappa^2 \beta (\log M)^{1+\delta}}}\right\}$
Algorithm 2: Wait then Follow The Perturbed Leader (W-FTPL)
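The wait rule of Algorithm 2 can be sketched similarly; all concrete parameter values below are illustrative assumptions. Waiting ends permanently once the indicator in line 10 evaluates to zero, i.e., once the squared spread of the cumulative costs exceeds $t\,\kappa^2 \beta (\log M)^{1+\delta}$, after which the policy behaves exactly as FTPL.

```python
import math
import random

# Sketch of Algorithm 2 (W-FTPL): identical to FTPL except that nothing is
# hosted while wait = 1; the min-update makes the end of waiting permanent.

def w_ftpl(requests, alphas, g, c, M, kappa, beta=2.0, delta=0.1,
           alpha_lr=1.0, rng=None):
    rng = rng or random.Random(0)
    K = len(alphas)
    s, f = alphas, [g(a) for a in alphas]
    gamma = [rng.gauss(0.0, 1.0) for _ in range(K)]
    Theta = [0.0] * K
    wait, hosted = 1, []
    for t, r_t in enumerate(requests, start=1):
        if wait:
            hosted.append(0.0)                       # host nothing in wait period
        else:
            eta_t = alpha_lr * math.sqrt(t)
            perturbed = [Theta[i] + eta_t * gamma[i] for i in range(K)]
            hosted.append(alphas[min(range(K), key=lambda i: perturbed[i])])
        Theta = [Theta[i] + c * s[i] + r_t * f[i] for i in range(K)]
        spread = max(Theta) - min(Theta)             # max_{i != j}(Theta_i - Theta_j)
        threshold = spread ** 2 / (kappa ** 2 * beta * math.log(M) ** (1 + delta))
        wait = min(wait, 1 if t >= threshold else 0)
    return hosted
```

Note that early in the horizon the spread of `Theta` is small, so `threshold < t`, the indicator is 1, and waiting continues; once the spread grows enough that `threshold > t`, the indicator becomes 0 and `wait` drops to 0 for good.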

Retro-Renting (RR) [3]: The RR policy is a deterministic hosting policy, and it either hosts the complete service or hosts nothing in a slot. The key idea behind this policy is to use recent arrival patterns to make hosting decisions. We refer the reader to Algorithm 1 in [3] for a formal definition of the RR policy. The performance of RR with respect to the competitive ratio was analyzed in [3].

$\alpha$-RR [2] is a variant of RR in which one partial level of hosting is allowed. The formal definition of $\alpha$-RR, borrowed from [2, Algorithm 1], is given in Algorithm 3. Similar to RR, the key idea behind the $\alpha$-RR policy is to use recent arrival patterns to check whether the current hosting status is optimal in hindsight. $\alpha$-RR computes the optimal offline policy cost for the recent request arrivals and chooses the fraction of service with the minimum cost. For $\rho_t^{\mathcal{P}} \in \{0,1\}$, $\alpha$-RR behaves as RR [2]. Therefore, we refer to $\alpha$-RR as RR in this paper.

1 Input: fetch cost $M$, partial hosting level $\alpha_2$, latency cost under partial hosting $g(\alpha_2)$, rent cost $c$, request arrival sequence $\{x_l\}_{l=0}^{t}$
2 Output: service hosting strategy $r_{t+1}$, $t > 0$
3 Initialize: $r_1 = t_{\text{recent}} = 0$
4 for each time-slot $t$ do
5       $I_t = (M, g(\alpha_2), t, t_{\text{recent}}, c, \{x_l\}_{l = t_{\text{recent}}}^{t})$
6       $R_0^{(\tau_0)} = [\underbrace{r_t, r_t, \ldots, r_t}_{\tau_0 - t_{\text{recent}}}, \underbrace{0, 0, \ldots, 0}_{t - \tau_0}]$
7       $R_\alpha^{(\tau_\alpha)} = [\underbrace{r_t, r_t, \ldots, r_t}_{\tau_\alpha - t_{\text{recent}}}, \underbrace{\alpha_2, \alpha_2, \ldots, \alpha_2}_{t - \tau_\alpha}]$
8       $R_1^{(\tau_1)} = [\underbrace{r_t, r_t, \ldots, r_t}_{\tau_1 - t_{\text{recent}}}, \underbrace{1, 1, \ldots, 1}_{t - \tau_1}]$
9       $\text{minCost}(0) = \min_{\tau_0 \in (t_{\text{recent}}, t)} \text{totalCost}(R_0^{(\tau_0)}, I_t)$
10      $\text{minCost}(\alpha_2) = \min_{\tau_\alpha \in (t_{\text{recent}}, t)} \text{totalCost}(R_\alpha^{(\tau_\alpha)}, I_t)$
11      $\text{minCost}(1) = \min_{\tau_1 \in (t_{\text{recent}}, t)} \text{totalCost}(R_1^{(\tau_1)}, I_t)$
12      $r_{t+1} = \operatorname*{arg\,min}_{i \in \{0, \alpha_2, 1\}} \text{minCost}(i)$
13      if $r_{t+1} \neq r_t$ then
14            $t_{\text{recent}} = t$
15      end if
16 end for
17 Function totalCost($R$, $I_t$):
18      $g(0) = 1$, $g(1) = 0$
19      cost $= R(1) \times c + x_1 \times g(R(1))$
20      for $j \leftarrow 2$ to $t - t_{\text{recent}}$ do
21            cost $=$ cost $+\, R(j) \times c + x_j \times g(R(j)) + M \times |R(j) - R(j-1)|$
22      end for
23      return cost
24 end
Algorithm 3: $\alpha$-RetroRenting ($\alpha$-RR)
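The totalCost routine of Algorithm 3 translates directly into code. The sketch below is illustrative: it uses 0-indexed arrays (the pseudocode is 1-indexed), leaves `g` as a caller-supplied function, and charges $M\,|R(j) - R(j-1)|$ on every change of hosting level, as in the pseudocode.

```python
# Sketch of totalCost from Algorithm 3: evaluates the cost of a candidate
# hosting trajectory R over the window since t_recent, given arrivals x,
# rent cost c, fetch cost M, and service-cost function g.

def total_cost_window(R, x, c, M, g):
    """R[j] is the hosted fraction in window-slot j; x[j] the arrivals there."""
    cost = R[0] * c + x[0] * g(R[0])
    for j in range(1, len(R)):
        cost += R[j] * c + x[j] * g(R[j]) + M * abs(R[j] - R[j - 1])
    return cost
```

In each slot, $\alpha$-RR evaluates this cost for the candidate trajectories $R_0^{(\tau_0)}$, $R_\alpha^{(\tau_\alpha)}$, and $R_1^{(\tau_1)}$ over all switch times $\tau$ in the window, and hosts the level whose best trajectory is cheapest.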

4 Main Results and Discussion

In this section, we state and discuss our key results.

4.1 Regret Analysis

Our first result characterizes the regret performance of the policies discussed in Section 3 and the fundamental limit on the performance of any online policy for adversarial arrivals.

Theorem 1 (Adversarial Arrivals).

Let the arrivals be generated by an oblivious adversary under the constraint that at most $\kappa$ requests arrive in each time-slot. Then,

  1. (a) $\mathcal{R}^{\mathcal{P}}_{A}(T) = \Omega(\sqrt{T})$ for any online policy $\mathcal{P}$.

  2. (b) $\mathcal{R}^{\text{RR}}_{A}(T) = \Omega(T)$.

  3. (c) For $\eta_t = \alpha\sqrt{t}$, $\alpha > 0$,

    $$\mathcal{R}^{\text{FTPL}}_{A}(T) \leq \sqrt{2T\log K}\left(\alpha + \frac{4\kappa^2}{\alpha}\right) + \frac{K^2 M(c + 2\kappa)}{2\alpha\sqrt{\pi}}\sqrt{T+1}.$$

  4. (d) For $\eta_t = \alpha\sqrt{t}$, $\alpha > 0$,

    $$\mathcal{R}^{\text{FTPL}}_{A}(T) \geq M\alpha_2/K.$$

  5. (e) For $\eta_t = \alpha\sqrt{t}$, $\alpha > 0$, $\beta > 1$, $\delta > 0$,

    $$\mathcal{R}^{\text{W-FTPL}}_{A}(T) \leq \sqrt{\beta T(\log M)^{1+\delta}} + \sqrt{2T\log K}\left(\alpha + \frac{4\kappa^2}{\alpha}\right) + \frac{K^2 M(c + 2\kappa)}{2\alpha\sqrt{\pi}}\sqrt{T+1}.$$

The key take-away from Theorem 1 is that RR has linear regret, while FTPL and W-FTPL have order-optimal regret, i.e., $O(\sqrt{T})$ with respect to the time horizon, under adversarial arrivals. The proof of Theorem 1 is in Section 6.

Our second result characterizes the regret performance of the policies discussed in Section 3 for i.i.d. stochastic arrivals.

Theorem 2 (Stochastic Arrivals).

Let the arrivals in each time-slot be i.i.d. stochastic with mean $\mu$ and $\Delta_{\text{min}} \neq 0$. Then we have

  1. (a) $\mathcal{R}^{\text{RR}}_{S}(T) = \Omega(T)$.

  2. (b) For $\eta_t = \alpha\sqrt{t-1}$, $\alpha > 0$,

    $$\mathcal{R}^{\text{FTPL}}_{S}(T) \leq \left(\sqrt{2\log K} + \frac{2\sqrt{2h_1}\log K}{\Delta_{\text{min}}}\right)\left(\alpha + \frac{4\kappa^2}{\alpha}\right) + \frac{16\alpha^2 + 4\kappa^2}{\Delta_{\text{min}}} + (16\alpha^2 + 3\kappa^2)\frac{M}{\Delta_{\text{min}}^2},$$

    where $h_1 = 4\max\{8\alpha^2, \kappa^2\}$.

  3. (c) For $\eta_t = \alpha\sqrt{t-1}$, $\alpha > 0$, $\beta > 1$, $\delta > 0$, and $M$ large enough, we have

    $$\mathcal{R}^{\text{W-FTPL}}_{S}(T) \leq 1 + \beta\kappa^2(\log M)^{1+\delta}\left(\frac{4}{\Delta_{\text{min}}^2} + \frac{16\alpha^2 + 3\kappa^2}{\Delta_{\text{min}}^2\Delta_{\text{max}}^2}\right) + (16\alpha^2 + 4\kappa^2)\left(\frac{1}{\Delta_{\text{min}}} + \frac{1}{\Delta_{\text{min}}^2}\right) + \left(\sqrt{2\log K} + \frac{2\sqrt{2h_1}\log K}{\Delta_{\text{min}}}\right)\left(\alpha + \frac{4\kappa^2}{\alpha}\right),$$

    where $h_1 = 4\max\{8\alpha^2, \kappa^2\}$.

Corollary 1 (Stochastic Arrivals).

Let the arrivals in each time-slot be i.i.d. stochastic with mean μ\mu and Δmin0\Delta_{\text{min}}\neq 0. For the W-FTPL policy, if ρt𝒫{0,1}\rho_{t}^{\mathcal{P}}\in\{0,1\}, β=max{(1+4α/κ)2,(1+2/κ)2}\beta=\max\{(1+4\alpha/\kappa)^{2},(1+\sqrt{2}/\kappa)^{2}\}, then the regret scales at most logarithmically with MM.

Remark 1.

While the result in Theorem 2 (c)(c) is stated for δ>0\delta>0, it can be shown that for δ=0\delta=0 and β\beta chosen suitably large, the regret of W-FTPL scales logarithmically with MM.

The key take-aways from Theorem 2 are that RR has linear regret with respect to time, while the regret of FTPL and W-FTPL does not scale with time. Moreover, the upper bound on the regret of W-FTPL scales as (logM)1+δ(\log M)^{1+\delta} for δ0\delta\geq 0, whereas the upper bound on the regret of FTPL scales linearly with MM. W-FTPL thus outperforms FTPL with respect to the dependence of regret on the fetch cost MM. We validate these results via simulations in Section 5.
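The waiting phase of W-FTPL can be sketched as a simple threshold test (our reading of the threshold TsT_{s} defined in Section 6.5; the function name and the default values of β\beta and δ\delta are illustrative assumptions):

```python
import math

def wait_phase_over(theta, kappa, M, t, beta=6.0, delta=0.5):
    """Check whether slot t ends the W-FTPL wait phase.

    theta: cumulative cost vector over the hosting levels. The wait phase
    ends at the first t with t < gap^2 / (kappa^2 * beta * (log M)^(1+delta)),
    where gap is the largest pairwise difference of cumulative costs, i.e.,
    once the observed cost gap is large enough to justify paying the fetch
    cost M. Until then, W-FTPL hosts nothing and fetches nothing.
    """
    gap = max(theta) - min(theta)
    return t < gap ** 2 / (kappa ** 2 * beta * math.log(M) ** (1 + delta))
```

A large fetch cost MM lengthens the wait phase only logarithmically, which is the source of W-FTPL's (logM)1+δ(\log M)^{1+\delta} dependence on MM.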

4.2 Competitive Ratio Analysis

Our next result characterizes the competitive ratio of the policies discussed in Section 3. The competitive ratio of RR is characterized in [3, 2], where it is shown that RR has a constant competitive ratio which decreases as the fetch cost MM increases. We present the results for the FTPL and W-FTPL policies below.

Theorem 3 (Adversarial Arrivals).

Let the arrivals be generated by an oblivious adversary under the constraint that at most κ\kappa requests arrive in each time-slot. Then we have

  1. (a)

    For ηt=αt1\eta_{t}=\alpha\sqrt{t-1}, α>0\alpha>0,

    σAFTPL(T)\displaystyle\sigma^{\text{FTPL}}_{A}(T)\leq κ2(3+2M/c)mini(cαi+g(αi)κ)maxαi01g(αi)αi\displaystyle\frac{\kappa^{2}(3+2M/c)}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}\max_{\alpha_{i}\neq 0}\frac{1-g(\alpha_{i})}{\alpha_{i}}
    +κ2(M+c)mini(cαi+g(αi)κ)i=2K16α2c2αi2.\displaystyle+\frac{\kappa^{2}(M+c)}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}\sum_{i=2}^{K}\frac{16\alpha^{2}}{c^{2}\alpha_{i}^{2}}.
  2. (b)

    For ηt=αt1\eta_{t}=\alpha\sqrt{t-1}, α>0\alpha>0, β>1\beta>1, δ0\delta\geq 0,

    σAW-FTPL(T)\displaystyle\sigma^{\text{W-FTPL}}_{A}(T)\leq κ2(3+2M/c)mini(cαi+g(αi)κ)maxαi01g(αi)αi\displaystyle\frac{\kappa^{2}(3+2M/c)}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}\max_{\alpha_{i}\neq 0}\frac{1-g(\alpha_{i})}{\alpha_{i}}
    +κ2(M+c)mini(cαi+g(αi)κ)i=2K16α2c2αi2\displaystyle+\frac{\kappa^{2}(M+c)}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}\sum_{i=2}^{K}\frac{16\alpha^{2}}{c^{2}\alpha_{i}^{2}}
    +κ2mini(cαi+g(αi)κ).\displaystyle+\frac{\kappa^{2}}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}.

The key take-away from Theorem 3 is that the competitive ratios of both FTPL and W-FTPL are bounded by a constant. The competitive ratio of FTPL and W-FTPL deteriorates with MM, while the competitive ratio of RR improves with MM when only one partial hosting level is permitted [2]. Note that the competitive ratio of FTPL and W-FTPL is bounded by a constant even when any finite number of partial hosting levels is allowed, whereas the competitive ratio result for RR in [2] holds only for a single partial hosting level.

Theorem 4 (Stochastic Arrivals).

Let the arrivals in each time-slot be i.i.d. stochastic with mean μ\mu. Then we have

  1. (a)

    For TT large enough, σSRR(T)>1\sigma^{RR}_{S}(T)>1.

  2. (b)

    For ηt=αt1\eta_{t}=\alpha\sqrt{t-1}, σSFTPL(T)1+O(1/T)\sigma^{\text{FTPL}}_{S}(T)\leq 1+\mathrm{O}(1/T).

  3. (c)

    For ηt=αt1\eta_{t}=\alpha\sqrt{t-1}, α>0\alpha>0, β>1\beta>1, δ>0\delta>0, σSW-FTPL(T)1+O(1/T)\sigma^{\text{W-FTPL}}_{S}(T)\leq 1+\mathrm{O}(1/T).

The key take-away from Theorem 4 is that for TT large enough, FTPL and W-FTPL outperform RR in terms of competitive ratio for stochastic arrivals.

In Section 5 we supplement the results stated in this section through simulations.

5 Simulations

In this section, we compare RR, FTPL and W-FTPL policies via simulations using synthetic request sequences, as well as real trace data from [24]. In all our simulations, we consider ηt=0.1t\eta_{t}=0.1\sqrt{t} for FTPL and W-FTPL, and β=6\beta=6 for W-FTPL.

5.1 Synthetic arrivals

In Figure 1, we plot the regret of different policies as a function of the time horizon TT. The parameters considered are c=0.1c=0.1, M=50M=50, κ=5\kappa=5. The request arrival sequence is similar to the one used in the proof of Theorem 1(b), i.e., we divide time into frames, and each frame starts with Mκc\lceil\frac{M}{\kappa-c}\rceil slots of κ\kappa requests each, followed by Mc\lceil\frac{M}{c}\rceil slots with zero requests. We consider 100 such frames for this experiment. The regret is averaged over 50 experiments. We observe that RR has linear regret in this case, which agrees with Theorem 1.
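The frame-structured arrival sequence described above can be generated as follows (a sketch with the parameter values of this experiment as defaults; the function name is our own):

```python
import math

def adversarial_frames(M=50.0, c=0.1, kappa=5, frames=100):
    """Frame-structured arrival sequence of the Figure 1 experiment:
    each frame has ceil(M/(kappa - c)) slots of kappa requests each,
    followed by ceil(M/c) request-free slots."""
    busy = math.ceil(M / (kappa - c))
    idle = math.ceil(M / c)
    return ([kappa] * busy + [0] * idle) * frames
```

With M=50M=50, c=0.1c=0.1, κ=5\kappa=5, each frame has 11 busy slots followed by 500 idle slots.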

Refer to caption
Figure 1: Regret as a function of Time horizon (TT)

5.2 Stochastic arrivals

For the stochastic case, we consider arrivals in each slot to be Bernoulli with parameter μ\mu. We restrict attention to the case where there is only one partial level of hosting, i.e., ρt𝒫{0,α2,1}\rho_{t}^{\mathcal{P}}\in\{0,\alpha_{2},1\}. We consider α2=0.5\alpha_{2}=0.5 and g(α2)=0.45g(\alpha_{2})=0.45. In Figures 2 and 3, we consider μ=0.4\mu=0.4, c=0.45c=0.45. All the results plotted are averaged over 50 independent experiments. In Figure 2, we fix M=5M=5 and compare the regret performance of the policies as a function of time. We observe that W-FTPL performs better than the other policies and that RR has linear regret, which agrees with Theorem 2. In Figure 3, we compare the performance of the policies with respect to MM for T=104T=10^{4}. We observe that W-FTPL performs better than the other policies and that the main difference between the costs of the policies is due to the fetch cost. Also note that W-FTPL has a better dependence on MM than FTPL, which agrees with Theorem 2. From Theorem 2, we observe that for large values of MM, the regret of FTPL is proportional to M/Δmin2M/\Delta_{\text{min}}^{2}, whereas that of W-FTPL is proportional to logM/Δmin2\log M/\Delta_{\text{min}}^{2}. Therefore, in Figure 4, we compare the performance of the policies by varying the rent cost cc for M=500M=500, μ=0.4\mu=0.4, and T=104T=10^{4}. Again, we note that W-FTPL outperforms RR and FTPL. As cc approaches μ\mu, the FTPL policy performs multiple fetches, and as a result, its regret increases.
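The benchmark in these plots is the best static hosting level, whose total cost is cαiT+g(αi)RT+Mαic\alpha_{i}T+g(\alpha_{i})R_{T}+M\alpha_{i}. A minimal sketch of computing these static costs for the Bernoulli setup above (our own illustration; the function name and the dictionary encoding of g()g(\cdot) are assumptions):

```python
import numpy as np

def static_costs(requests, levels, g, c, M):
    """Total cost c*alpha*T + g(alpha)*R_T + M*alpha of each static
    hosting level alpha for a given request sequence."""
    T, R = len(requests), float(sum(requests))
    return [c * a * T + g[a] * R + M * a for a in levels]

rng = np.random.default_rng(1)
reqs = rng.binomial(1, 0.4, size=10_000)   # Bernoulli(mu = 0.4) arrivals
g = {0.0: 1.0, 0.5: 0.45, 1.0: 0.0}        # g(0) = 1, g(0.5) = 0.45, g(1) = 0
costs = static_costs(reqs, [0.0, 0.5, 1.0], g, c=0.45, M=5)
```

The regret of a policy is its cumulative cost minus the minimum of these static costs.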

Refer to caption
Figure 2: Regret as a function of time horizon (TT)
Refer to caption
Figure 3: Cost, Regret as a function of fetch cost (MM)
Refer to caption
Figure 4: Cost, Regret as a function of rent cost (cc)

5.3 Trace driven arrivals

For trace-driven simulations, we use the data collected by the Grid Workloads Archive [24], which consists of requests arriving at computational servers. Among the traces described in [24], we use the DAS-2 traces, provided by the Advanced School for Computing and Imaging (ASCI). We take the slot duration to be 1 hour and plot a snapshot of the trace in Figure 5. We consider κ=300\kappa=300 for all the experiments in this subsection. All the results plotted for trace-driven experiments are averaged over 50 independent experiments.

We consider partial service hosting with one partial level for our simulations. This dataset provides the number of processors required to serve each request, which lies between 1 and 128 for all requests. For this experiment, we consider hosting the entire service as equivalent to having 128 processors at the edge and hosting the partial service as having 16 processors at the edge. It follows that the hosting status ρt𝒫{0,16/128(=0.125),1}\rho_{t}^{\mathcal{P}}\in\{0,16/128(=0.125),1\}. To compute g(0.125)g(0.125), we use the dataset to compute the fraction of requests that require more than 16 processors for service and obtain g(0.125)=0.0328g(0.125)=0.0328.
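The estimate of g(0.125)g(0.125) described above amounts to a one-line computation over the trace (a sketch; the function name and the list-based interface are our own, not part of the dataset's tooling):

```python
def partial_miss_fraction(processors_required, edge_capacity=16):
    """Fraction of requests needing more processors than the partially
    hosted service provides at the edge; with 16 of 128 processors hosted,
    this is the empirical estimate of g(0.125)."""
    reqs = list(processors_required)
    return sum(1 for p in reqs if p > edge_capacity) / len(reqs)
```

Applied to the per-request processor counts of the DAS-2 trace, this yields the value 0.0328 reported above.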

Refer to caption
Figure 5: Arrivals in a slot for trace data considered
Refer to caption
Figure 6: Regret as a function of time horizon
Refer to caption
Figure 7: Cost, Regret as a function of fetch cost (MM)
Refer to caption
Figure 8: Cost, Regret as a function of rent cost (cc)

In Figure 6, we consider M=500M=500, c=100c=100, κ=300\kappa=300 and compare the regret performance of the policies as a function of time. We observe that RR performs better than the other policies for most values of the time horizon. This is because RR attempts to mimic the optimal offline policy, which is not a static policy for this arrival sequence. Note that FTPL and W-FTPL attempt to mimic the optimal static policy and therefore perform worse than RR.

In Figure 7, we compare the performance of the different policies with respect to MM, fixing c=100c=100, κ=300\kappa=300. We observe that RR performs better than the other policies for some values of MM for the same reason mentioned above. In Figure 8, we compare the performance of the different policies by varying the rent cost cc, fixing M=500M=500, κ=300\kappa=300. Again, we note that RR outperforms W-FTPL and FTPL in some cases.

Our simulation results thus indicate that FTPL and W-FTPL outperform RR for i.i.d. stochastic arrivals. In the case where the arrival process is such that the offline optimal policy is not static, RR can outperform FTPL and W-FTPL. Additionally, RR can have linear regret with respect to time, while FTPL and W-FTPL have sub-linear regret with respect to time for all arrival processes considered.

6 Proof of theorems

6.1 Proof of Theorem 1(a)

Proof of Theorem 1(a).

Recall that regret of any policy is given by

A𝒫(T,r)=\displaystyle\mathcal{R}^{\mathcal{P}}_{A}(T,r)= t=1Tcρt+g(ρt)rt+M(ρtρt1)+\displaystyle\sum_{t=1}^{T}c\rho_{t}+g(\rho_{t})r_{t}+M(\rho_{t}-\rho_{t-1})^{+}
mini{cαiT+g(αi)RT+Mαi}.\displaystyle-\min_{i}\left\{c\alpha_{i}T+g(\alpha_{i})R_{T}+M\alpha_{i}\right\}.

Note that suprA𝒫(T,r)𝔼r[A𝒫(T,r)]\sup_{r}\mathcal{R}^{\mathcal{P}}_{A}(T,r)\geq{\mathbb{E}}_{r}[\mathcal{R}^{\mathcal{P}}_{A}(T,r)] where the expectation is w.r.t some specified distribution over request sequences. We lower bound 𝔼r[A𝒫(T,r)]{\mathbb{E}}_{r}[\mathcal{R}^{\mathcal{P}}_{A}(T,r)] to get a lower bound on A𝒫(T)\mathcal{R}^{\mathcal{P}}_{A}(T).

𝔼r[A𝒫(T,r)]=\displaystyle{\mathbb{E}}_{r}[\mathcal{R}^{\mathcal{P}}_{A}(T,r)]= t=1Tcρt+g(ρt)𝔼r[rt]+M(ρtρt1)+\displaystyle\sum_{t=1}^{T}c\rho_{t}+g(\rho_{t}){\mathbb{E}}_{r}[r_{t}]+M(\rho_{t}-\rho_{t-1})^{+}
𝔼r[mini{cαiT+g(αi)RT+Mαi}]\displaystyle-{\mathbb{E}}_{r}[\min_{i}\left\{c\alpha_{i}T+g(\alpha_{i})R_{T}+M\alpha_{i}\right\}]
\displaystyle\geq t=1Tcρt+g(ρt)𝔼r[rt]\displaystyle\sum_{t=1}^{T}c\rho_{t}+g(\rho_{t}){\mathbb{E}}_{r}[r_{t}]
𝔼r[mini{cαiT+g(αi)RT}]M.\displaystyle-{\mathbb{E}}_{r}[\min_{i}\left\{c\alpha_{i}T+g(\alpha_{i})R_{T}\right\}]-M.

Let =argmini1αi1g(αi)\ell=\operatorname*{arg\,min}_{i\neq 1}\frac{\alpha_{i}}{1-g(\alpha_{i})}, XBer(cακ(1g(α)))X\sim\text{Ber}(\frac{c\alpha_{\ell}}{\kappa(1-g(\alpha_{\ell}))}), and let rt=κXr_{t}=\kappa X, drawn i.i.d. over time, so that 𝔼r[rt]=cα1g(α){\mathbb{E}}_{r}[r_{t}]=\frac{c\alpha_{\ell}}{1-g(\alpha_{\ell})}. From Assumption 2, we have αi+g(αi)rtκ\alpha_{i}+g(\alpha_{i})r_{t}\leq\kappa for all 1iK1\leq i\leq K and 0rtκ0\leq r_{t}\leq\kappa; therefore cακ(1g(α))1\frac{c\alpha_{\ell}}{\kappa(1-g(\alpha_{\ell}))}\leq 1 and XX is a valid Bernoulli random variable.

𝔼r[A𝒫(T,r)]\displaystyle{\mathbb{E}}_{r}[\mathcal{R}^{\mathcal{P}}_{A}(T,r)]\geq t=1Tcρt+g(ρt)cα1g(α)\displaystyle\sum_{t=1}^{T}c\rho_{t}+g(\rho_{t})\frac{c\alpha_{\ell}}{1-g(\alpha_{\ell})}
𝔼r[mini{cαiT+g(αi)RT}]M\displaystyle-{\mathbb{E}}_{r}[\min_{i}\left\{c\alpha_{i}T+g(\alpha_{i})R_{T}\right\}]-M
=\displaystyle= t=1T(1g(ρt))cρt(1g(ρt))𝟙ρt0+g(ρt)cα1g(α)\displaystyle\sum_{t=1}^{T}(1-g(\rho_{t}))\frac{c\rho_{t}}{(1-g(\rho_{t}))}\mathds{1}_{\rho_{t}\neq 0}+g(\rho_{t})\frac{c\alpha_{\ell}}{1-g(\alpha_{\ell})}
𝔼r[mini{cαiT+g(αi)RT}]M\displaystyle-{\mathbb{E}}_{r}[\min_{i}\left\{c\alpha_{i}T+g(\alpha_{i})R_{T}\right\}]-M
(a)\displaystyle\overset{(a)}{\geq} t=1T(1g(ρt))cα(1g(α))+g(ρt)cα1g(α)\displaystyle\sum_{t=1}^{T}(1-g(\rho_{t}))\frac{c\alpha_{\ell}}{(1-g(\alpha_{\ell}))}+g(\rho_{t})\frac{c\alpha_{\ell}}{1-g(\alpha_{\ell})}
𝔼r[mini{cαiT+g(αi)RT}]M\displaystyle-{\mathbb{E}}_{r}[\min_{i}\left\{c\alpha_{i}T+g(\alpha_{i})R_{T}\right\}]-M
(b)\displaystyle\overset{(b)}{\geq} cTα1g(α)𝔼r[min{RT,cTα+g(α)RT}]M\displaystyle\frac{cT\alpha_{\ell}}{1-g(\alpha_{\ell})}-{\mathbb{E}}_{r}[\min\{R_{T},cT\alpha_{\ell}+g(\alpha_{\ell})R_{T}\}]-M
=(c)\displaystyle\overset{(c)}{=} 12𝔼r[|RTcTαg(α)RT|]M\displaystyle\frac{1}{2}{\mathbb{E}}_{r}[|R_{T}-cT\alpha_{\ell}-g(\alpha_{\ell})R_{T}|]-M
=\displaystyle= κ(1g(α))2𝔼r[|RTκTcακ(1g(α))|]M\displaystyle\frac{\kappa(1-g(\alpha_{\ell}))}{2}{\mathbb{E}}_{r}\left[\left|\frac{R_{T}}{\kappa}-\frac{Tc\alpha_{\ell}}{\kappa(1-g(\alpha_{\ell}))}\right|\right]-M
=(d)\displaystyle\overset{(d)}{=} Ω(T),\displaystyle\Omega(\sqrt{T}),

where (a)(a) follows from definition of \ell, (b)(b) follows because of min1kKakmin{ai,aj}\min_{1\leq k\leq K}a_{k}\leq\min\{a_{i},a_{j}\} for 1i,jK1\leq i,j\leq K, (c)(c) follows from Lemma 15 in Section 7, and (d)(d) follows from Lemma 16 in Section 7. ∎

6.2 Proof of Theorem 1(b)

Note that the RR algorithm in [3] can have only one intermediate level of partial hosting, i.e., ρtRR{0,α2,1}\rho_{t}^{\text{RR}}\in\{0,\alpha_{2},1\}. So throughout this subsection we consider ρtRR{0,α2,1}\rho_{t}^{\text{RR}}\in\{0,\alpha_{2},1\}. We use the following lemmas to prove Theorem 1(b).

Lemma 2.

RR does not fetch any non-zero fraction of the service for at least Mαiκcαig(αi)κ\left\lceil\frac{M\alpha_{i^{\prime}}}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}\right\rceil slots after completely evicting the service, where
αi=argminαi0Mαiκcαiκg(αi)\alpha_{i^{\prime}}=\operatorname*{arg\,min}_{\alpha_{i}\neq 0}\frac{M\alpha_{i}}{\kappa-c\alpha_{i}-\kappa g(\alpha_{i})}.

Proof.

Without loss of generality, let t=0t=0 be the slot in which RR last evicted the service. If the RR policy fetches an αi0\alpha_{i}\neq 0 fraction of the service in slot t+1t+1, then cαit+g(αi)Rt+Mαi<Rtc\alpha_{i}t+g(\alpha_{i})R_{t}+M\alpha_{i}<R_{t}.

We now prove the lemma by contradiction. Let us assume that RR fetches αi\alpha_{i} fraction of the service in time slot tf+1t_{f}+1 and tf<Mαiκcαig(αi)κ.t_{f}<\left\lceil\frac{M\alpha_{i^{\prime}}}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}\right\rceil. Then by using the definition of αi\alpha_{i^{\prime}}, we get tf<Mαiκcαig(αi)κt_{f}<\frac{M\alpha_{i}}{\kappa-c\alpha_{i}-g(\alpha_{i})\kappa} and cαitf+g(αi)κtf+Mαi>κtfRtfc\alpha_{i}t_{f}+g(\alpha_{i})\kappa{t_{f}}+M\alpha_{i}>\kappa t_{f}\geq R_{t_{f}} which is a contradiction. ∎

Lemma 3.

Once RR fetches any non-zero fraction of the service, it hosts some positive fraction of the service for at least Mc\lceil\frac{M}{c}\rceil slots before evicting it completely.

Proof.

Without loss of generality, let t=0t=0 be the slot in which RR last fetched the service, and say it hosted an αi0\alpha_{i}\neq 0 fraction of the service. For the RR policy to evict the service in slot t+1t+1, we need cαit+g(αi)Rt>Mαi+Rtc\alpha_{i}t+g(\alpha_{i})R_{t}>M\alpha_{i}+R_{t}, or equivalently Rt<cαitMαi1g(αi)R_{t}<\frac{c\alpha_{i}t-M\alpha_{i}}{1-g(\alpha_{i})}.

We now prove the lemma by contradiction. Let us assume that RR evicts the service in time slot te+1t_{e}+1 and te<Mct_{e}<\lceil\frac{M}{c}\rceil. Then from the assumption above, we must have Rte<cαiteMαi1g(αi)<0R_{t_{e}}<\frac{c\alpha_{i}t_{e}-M\alpha_{i}}{1-g(\alpha_{i})}<0 which is a contradiction. ∎

Proof of Theorem 1(b).

We split the entire time duration TT into frames of size tf+tet_{f}+t_{e} and consider the arrival sequence in each frame as rt=κr_{t}=\kappa for 1ttf1\leq t\leq t_{f} and rt=0r_{t}=0 for tf+1ttf+tet_{f}+1\leq t\leq t_{f}+t_{e}, where αi=argminαi0Mαiκcαiκg(αi)\alpha_{i^{\prime}}=\operatorname*{arg\,min}_{\alpha_{i}\neq 0}\frac{M\alpha_{i}}{\kappa-c\alpha_{i}-\kappa g(\alpha_{i})}, tf=Mαiκcαig(αi)κt_{f}=\left\lceil\frac{M\alpha_{i^{\prime}}}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}\right\rceil, and te=Mct_{e}=\lceil\frac{M}{c}\rceil.

From Lemma 2 we know that RR does not fetch any fraction of service till time tft_{f}. At t=tf+1t=t_{f}+1 RR fetches some positive fraction of the service because cαitf+g(αi)Rtf+Mαi<Rtfc\alpha_{i^{\prime}}t_{f}+g(\alpha_{i^{\prime}})R_{t_{f}}+M\alpha_{i^{\prime}}<R_{t_{f}} holds by definition of tft_{f}.

From Lemma 3 we know that RR does not evict the service for t[tf+1,tf+te]t\in[t_{f}+1,t_{f}+t_{e}]. At t=tf+te+1t=t_{f}+t_{e}+1 RR evicts the service because for αi0\alpha_{i}\neq 0, cαite+g(αi)t=tf+1tf+tert<Mαi+t=tf+1tf+tertc\alpha_{i}t_{e}+g(\alpha_{i})\sum_{t=t_{f}+1}^{t_{f}+t_{e}}r_{t}<M\alpha_{i}+\sum_{t=t_{f}+1}^{t_{f}+t_{e}}r_{t} holds by definition of tet_{e}.

So amongst the first tf+tet_{f}+t_{e} time-slots, the RR policy does not host for the first tft_{f} slots and then hosts some positive fraction of the service for the following tet_{e} slots. Let i=argminicαiT+g(αi)RT+Mαii^{*}=\operatorname*{arg\,min}_{i}c\alpha_{i}T+g(\alpha_{i})R_{T}+M\alpha_{i} denote the fraction of service hosted by the optimal static policy. If the optimal static policy hosts an αi0\alpha_{i^{*}}\neq 0 fraction of the service, then we incur a regret of at least (κcαig(αi)κ)tfMαi(\kappa-c\alpha_{i^{*}}-g(\alpha_{i^{*}})\kappa)t_{f}-M\alpha_{i^{*}} in the initial frame and (κcαig(αi)κ)tf=(κcαig(αi)κ)Mαiκcαig(αi)κ>0(\kappa-c\alpha_{i^{*}}-g(\alpha_{i^{*}})\kappa)t_{f}=(\kappa-c\alpha_{i^{*}}-g(\alpha_{i^{*}})\kappa)\lceil\frac{M\alpha_{i^{\prime}}}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}\rceil>0 from the second frame onwards because of forwarding; this is a constant that does not depend on TT. If the optimal static policy is not to host the service, then we incur a regret of cαite>0c\alpha_{i}t_{e}>0 up to time tf+tet_{f}+t_{e} for some αi0\alpha_{i}\neq 0, which is again a constant that does not depend on TT. So in either case, a regret larger than some positive constant (say dd) is incurred in each frame of size tf+tet_{f}+t_{e}. Since we split the entire time duration TT into frames of size tf+te=Mαiκcαig(αi)κ+Mct_{f}+t_{e}=\lceil\frac{M\alpha_{i^{\prime}}}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}\rceil+\lceil\frac{M}{c}\rceil and the request sequence is repeated in each frame, we get linear regret.

ARR(T)\displaystyle\mathcal{R}^{\text{RR}}_{A}(T) f=1T/(tf+te)dMαi\displaystyle\geq\sum_{f=1}^{\lfloor T/(t_{f}+t_{e})\rfloor}d-M\alpha_{i^{*}}
T/(tf+te)dMαi\displaystyle\geq\lfloor T/(t_{f}+t_{e})\rfloor d-M\alpha_{i^{*}}
d(TMαiκcαig(αi)κ+Mc+21)Mαi\displaystyle\geq d\left(\frac{T}{\frac{M\alpha_{i^{\prime}}}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}+\frac{M}{c}+2}-1\right)-M\alpha_{i^{*}}
=Ω(T).\displaystyle=\Omega(T).

6.3 Proof of Theorem 1(c)

The optimal static policy fetches the optimal fraction of the service to be hosted only once. Therefore,

AFTPL(T,r)=\displaystyle\mathcal{R}^{\text{FTPL}}_{A}(T,r)= 𝔼[𝒞FTPL(T,r)]\displaystyle{\mathbb{E}}[\mathcal{C}^{\text{FTPL}}(T,r)]
mini{cαiT+g(αi)RT+Mαi}\displaystyle-\min_{i}\{c\alpha_{i}T+g(\alpha_{i})R_{T}+M\alpha_{i}\}
\displaystyle\leq 𝔼[𝒞FTPL(T,r)]mini{cαiT+g(αi)RT}.\displaystyle{\mathbb{E}}[\mathcal{C}^{\text{FTPL}}(T,r)]-\min_{i}\{c\alpha_{i}T+g(\alpha_{i})R_{T}\}. (3)

Since the FTPL policy does not consider the fetch cost MM while making decisions, we can decouple the fetch cost and the non-fetch cost incurred by the FTPL policy. Therefore, to bound (3), we can also decouple the expected regret incurred by FTPL into the expected regret without fetch cost (M=0M=0) and the fetch cost incurred by FTPL. We first bound the expected regret of the FTPL policy with fetch cost M=0M=0 and then add its expected fetch cost to get the final regret bound.

Lemma 4.

The regret for FTPL policy with non decreasing learning rate {ηt}t=1T\{\eta_{t}\}_{t=1}^{T} and M=0M=0 is given by

AFTPL(T)\displaystyle\mathcal{R}^{\text{FTPL}}_{A}(T) 2logK(ηT+κ2t=1T1ηt).\displaystyle\leq\sqrt{2\log K}\left(\eta_{T}+\kappa^{2}\sum_{t=1}^{T}\frac{1}{\eta_{t}}\right).
Proof.

The proof follows along the same lines as Theorem 1 of [25] and Proposition 4.1 of [17]. Recall that ρt𝒫\rho_{t}^{\mathcal{P}} represents the fraction of the service hosted in slot tt by policy 𝒫\mathcal{P}, and 𝝆t𝒫\bm{\rho}_{t}^{\mathcal{P}} is a one-hot vector in which the position of the one indicates the level of service hosted at the edge. In slot tt, FTPL hosts the fraction ρtFTPL=𝝆tFTPL,𝒔\rho_{t}^{\text{FTPL}}=\langle\bm{\rho}_{t}^{\text{FTPL}},\bm{s}\rangle of the service, where 𝝆tFTPL=argmin𝝆𝒳𝝆,𝚯t+ηt𝜸\bm{\rho}_{t}^{\text{FTPL}}=\operatorname*{arg\,min}_{\bm{\rho}\in\mathcal{X}}\langle\bm{\rho},\bm{\Theta}_{t}+\eta_{t}\bm{\gamma}\rangle. For ease of notation, we suppress the policy in the superscript and denote 𝝆t𝒫\bm{\rho}_{t}^{\mathcal{P}} by 𝝆t\bm{\rho}_{t} and ρt𝒫\rho_{t}^{\mathcal{P}} by ρt\rho_{t} in the remainder of this discussion. Define the time-varying potential function

𝚽t(𝜽)=𝔼𝜸[min𝝆𝒳𝝆,𝜽+ηt𝜸],\displaystyle\bm{\Phi}_{t}(\bm{\theta})={\mathbb{E}}_{\bm{\gamma}}\left[\min_{\bm{\rho}\in\mathcal{X}}\langle\bm{\rho},\bm{\theta}+\eta_{t}\bm{\gamma}\rangle\right], (4)

and observe that the gradient of 𝚽t\bm{\Phi}_{t} at 𝚯t\bm{\Theta}_{t} is 𝔼𝜸[𝝆t]{\mathbb{E}}_{\bm{\gamma}}[\bm{\rho}_{t}]. So 𝚽t(𝚯t),𝜽t=𝔼𝜸[𝝆t,𝜽t]\langle\nabla\bm{\Phi}_{t}(\bm{\Theta}_{t}),\bm{\theta}_{t}\rangle={\mathbb{E}}_{\bm{\gamma}}[\langle\bm{\rho}_{t},\bm{\theta}_{t}\rangle]. Consequently

𝔼𝜸[𝝆t,𝜽t]\displaystyle{\mathbb{E}}_{\bm{\gamma}}[\langle\bm{\rho}_{t},\bm{\theta}_{t}\rangle] =𝚽t(𝚯t),𝚯t+1𝚯t,\displaystyle=\langle\nabla\bm{\Phi}_{t}(\bm{\Theta}_{t}),\bm{\Theta}_{t+1}-\bm{\Theta}_{t}\rangle,
=𝚽t(𝚯t+1)𝚽t(𝚯t)12𝜽t,2Φt(𝜽~t)𝜽t,\displaystyle=\bm{\Phi}_{t}(\bm{\Theta}_{t+1})-\bm{\Phi}_{t}(\bm{\Theta}_{t})-\frac{1}{2}\langle\bm{\theta}_{t},\nabla^{2}\Phi_{t}(\tilde{\bm{\theta}}_{t})\bm{\theta}_{t}\rangle,

where 𝜽~t\tilde{\bm{\theta}}_{t} is a point on the line segment joining 𝚯𝒕+𝟏\bm{\Theta_{t+1}} and 𝚯𝒕\bm{\Theta_{t}}; this follows from Taylor's expansion of 𝚽t(𝚯t+1)\bm{\Phi}_{t}(\bm{\Theta}_{t+1}).

t=1T𝔼𝜸[𝝆t,𝜽t]\displaystyle\sum_{t=1}^{T}{\mathbb{E}}_{\bm{\gamma}}\left[\langle\bm{\rho}_{t},\bm{\theta}_{t}\rangle\right]
=t=1T𝚽t(𝚯t+1)𝚽t(𝚯t)12𝜽t,2Φt(𝜽~t)𝜽t,\displaystyle=\sum_{t=1}^{T}\bm{\Phi}_{t}(\bm{\Theta}_{t+1})-\bm{\Phi}_{t}(\bm{\Theta}_{t})-\frac{1}{2}\langle\bm{\theta}_{t},\nabla^{2}\Phi_{t}(\tilde{\bm{\theta}}_{t})\bm{\theta}_{t}\rangle,
=𝚽T(𝚯T+1)𝚽1(𝚯1)+t=2T𝚽t1(𝚯t)𝚽t(𝚯t)\displaystyle=\bm{\Phi}_{T}(\bm{\Theta}_{T+1})-\bm{\Phi}_{1}(\bm{\Theta}_{1})+\sum_{t=2}^{T}\bm{\Phi}_{t-1}(\bm{\Theta}_{t})-\bm{\Phi}_{t}(\bm{\Theta}_{t})
t=1T12𝜽t,2Φt(𝜽~t)𝜽t.\displaystyle-\sum_{t=1}^{T}\frac{1}{2}\langle\bm{\theta}_{t},\nabla^{2}\Phi_{t}(\tilde{\bm{\theta}}_{t})\bm{\theta}_{t}\rangle. (5)

By Jensen’s inequality

𝚽T(𝚯T+1)\displaystyle\bm{\Phi}_{T}(\bm{\Theta}_{T+1}) =𝔼[min𝝆𝒳𝝆,𝚯T+1+ηt𝜸]\displaystyle={\mathbb{E}}\left[\min_{\bm{\rho}\in\mathcal{X}}\langle\bm{\rho},\bm{\Theta}_{T+1}+\eta_{t}\bm{\gamma}\rangle\right]
min𝝆𝒳𝔼[𝝆,𝚯T+1+ηt𝜸]\displaystyle\leq\min_{\bm{\rho}\in\mathcal{X}}{\mathbb{E}}\left[\langle\bm{\rho},\bm{\Theta}_{T+1}+\eta_{t}\bm{\gamma}\rangle\right]
=(a)min𝝆𝒳𝝆,𝚯T+1,\displaystyle\overset{(a)}{=}\min_{\bm{\rho}\in\mathcal{X}}\langle\bm{\rho},\bm{\Theta}_{T+1}\rangle,

where (a)(a) follows because γi\gamma_{i}’s are standard Gaussian random variables. Recall that min𝝆𝒳𝝆,𝚯T+1\min_{\bm{\rho}\in\mathcal{X}}\langle\bm{\rho},\bm{\Theta}_{T+1}\rangle is the loss incurred by the offline static policy. Therefore by rearranging (5) and using Jensen’s inequality we get

AFTPL(T)\displaystyle\mathcal{R}^{\text{FTPL}}_{A}(T)\leq 𝚽1(𝚯𝟏)+t=2T𝚽t1(𝚯t)𝚽t(𝚯t)\displaystyle-\bm{\Phi}_{1}(\bm{\Theta_{1}})+\sum_{t=2}^{T}\bm{\Phi}_{t-1}(\bm{\Theta}_{t})-\bm{\Phi}_{t}(\bm{\Theta}_{t})
t=1T12𝜽t,2Φt(𝜽~t)𝜽t.\displaystyle-\sum_{t=1}^{T}\frac{1}{2}\langle\bm{\theta}_{t},\nabla^{2}\Phi_{t}(\tilde{\bm{\theta}}_{t})\bm{\theta}_{t}\rangle. (6)

We bound each term in the RHS separately to get the final result.

The first term is 𝚽1(𝚯1)=𝚽1(0)=η1𝔼𝜸[min𝝆𝝆,𝜸]-\bm{\Phi}_{1}(\bm{\Theta}_{1})=-\bm{\Phi}_{1}(0)=-\eta_{1}{\mathbb{E}}_{\bm{\gamma}}[\min_{\bm{\rho}}\langle\bm{\rho},\bm{\gamma}\rangle]
=η1𝔼𝜸[max𝝆𝒳𝝆,𝜸]=\eta_{1}{\mathbb{E}}_{\bm{\gamma}}[\max_{\bm{\rho}\in\mathcal{X}}\langle\bm{\rho},\bm{\gamma}\rangle] since Gaussian random variables are symmetric. By using the result of Lemma 9 in [25], we get

𝚽1(0)η12logK.-\bm{\Phi}_{1}(0)\leq\eta_{1}\sqrt{2\log K}. (7)

The second term can be analyzed by considering

𝚽t1(𝚯t)𝚽t(𝚯t)\displaystyle\bm{\Phi}_{t-1}(\bm{\Theta}_{t})-\bm{\Phi}_{t}(\bm{\Theta}_{t})
=𝔼𝜸[min𝝆𝒳𝝆,𝜽t+ηt1𝜸min𝝆𝒳𝝆,𝜽𝒕+ηt𝜸]\displaystyle={\mathbb{E}}_{\bm{\gamma}}[\min_{\bm{\rho}\in\mathcal{X}}\langle\bm{\rho},\bm{\theta}_{t}+\eta_{t-1}\bm{\gamma}\rangle-\min_{\bm{\rho}\in\mathcal{X}}\langle\bm{\rho},\bm{\theta_{t}}+\eta_{t}\bm{\gamma}\rangle]
(a)𝔼𝜸[max𝝆𝒳𝝆,(ηt1ηt)𝜸]\displaystyle\overset{(a)}{\leq}{\mathbb{E}}_{\bm{\gamma}}[\max_{\bm{\rho}\in\mathcal{X}}\langle\bm{\rho},(\eta_{t-1}-\eta_{t})\bm{\gamma}\rangle]
|ηtηt1|2logK,\displaystyle\leq|\eta_{t}-\eta_{t-1}|\sqrt{2\log K}, (8)

where (a)(a) follows because max𝒚𝒴(f(𝒚)g(𝒚))max𝒚𝒴f(𝒚)max𝒚𝒴g(𝒚)\max_{\bm{y}\in\mathcal{Y}}(f(\bm{y})-g(\bm{y}))\geq\max_{\bm{y}\in\mathcal{Y}}f(\bm{y})-\max_{\bm{y}\in\mathcal{Y}}g(\bm{y}) for any arbitrary functions ff, gg. We obtain (8) by using Lemma 9 in [25].

Now we bound the third term in (6). Let H=2Φt(𝜽~t)H=\nabla^{2}\Phi_{t}(\tilde{\bm{\theta}}_{t}). By using Lemma 7 of [26] we have,

Hi,j=1ηt𝔼[x^(𝜽t~+ηt𝜸)i𝜸j].\displaystyle H_{i,j}=\frac{1}{\eta_{t}}{\mathbb{E}}[\hat{x}(\tilde{\bm{\theta}_{t}}+\eta_{t}\bm{\gamma})_{i}\bm{\gamma}_{j}].

If we replace ηt\eta_{t} with ηt/κ\eta_{t}/\kappa and θt\theta_{t} with θt/κ\theta_{t}/\kappa and apply Lemma 2 of [25] we get

𝜽tκ,κH𝜽tκ2κηt2logK\displaystyle-\bigg{\langle}\frac{\bm{\theta}_{t}}{\kappa},\kappa H\frac{\bm{\theta}_{t}}{\kappa}\bigg{\rangle}\leq\frac{2\kappa}{\eta_{t}}\sqrt{2\log K}
𝜽t,2Φt(𝜽~t)𝜽t2κ2ηt2logK.\displaystyle\implies-\langle\bm{\theta}_{t},\nabla^{2}\Phi_{t}(\tilde{\bm{\theta}}_{t})\bm{\theta}_{t}\rangle\leq\frac{2\kappa^{2}}{\eta_{t}}\sqrt{2\log K}. (9)

By using (7), (8) and (9), we bound (6),

AFTPL(T)\displaystyle\mathcal{R}^{\text{FTPL}}_{A}(T)\leq η12logK+t=2T|ηtηt1|2logK\displaystyle\eta_{1}\sqrt{2\log K}+\sum_{t=2}^{T}|\eta_{t}-\eta_{t-1}|\sqrt{2\log K}
+t=1T2κ2ηt2logK\displaystyle+\sum_{t=1}^{T}\frac{2\kappa^{2}}{\eta_{t}}\sqrt{2\log K}
\displaystyle\leq ηT2logK+t=1T2κ2ηt2logK\displaystyle\eta_{T}\sqrt{2\log K}+\sum_{t=1}^{T}\frac{2\kappa^{2}}{\eta_{t}}\sqrt{2\log K}
\displaystyle\leq ηT2logK+2κ22logKt=1T1ηt.\displaystyle\eta_{T}\sqrt{2\log K}+2\kappa^{2}\sqrt{2\log K}\sum_{t=1}^{T}\frac{1}{\eta_{t}}.

Now we bound the expected fetch cost incurred by FTPL policy.

Lemma 5.

The expected fetch cost under FTPL policy upto time TT denoted by 𝔼[𝒞FFTPL(T)]{\mathbb{E}}[\mathcal{C}^{\text{FTPL}}_{F}(T)] with learning rate ηt=αt\eta_{t}=\alpha\sqrt{t} is bounded as follows

𝔼[𝒞FFTPL(T)]MK2(c+2κ)2απT+1.{\mathbb{E}}[\mathcal{C}^{\text{FTPL}}_{F}(T)]\leq\frac{MK^{2}(c+2\kappa)}{2\alpha\sqrt{\pi}}{\sqrt{T+1}}.
Proof.

The proof follows a similar line of argument as in [17] for the case with N=2N=2 and cache size 1. The fetch cost is incurred only when the service is fetched. The overall cost of fetching is t=1TM(ρt+1ρt)+\sum_{t=1}^{T}M(\rho_{t+1}-\rho_{t})^{+}. So, the expected fetch cost is given by

𝔼[𝒞FFTPL(T)]\displaystyle{\mathbb{E}}[\mathcal{C}^{\text{FTPL}}_{F}(T)] =𝔼[t=1TM(ρt+1ρt)𝟙{ρt+1>ρt}]\displaystyle={\mathbb{E}}\left[\sum_{t=1}^{T}M(\rho_{t+1}-\rho_{t})\mathds{1}_{\{\rho_{t+1}>\rho_{t}\}}\right]
Mt=1T(ρt+1>ρt)\displaystyle\leq M\sum_{t=1}^{T}{\mathbb{P}}(\rho_{t+1}>\rho_{t})
Mt=1Tj>i(ρt+1=αj,ρt=αi).\displaystyle\leq M\sum_{t=1}^{T}\sum_{j>i}{\mathbb{P}}(\rho_{t+1}=\alpha_{j},\rho_{t}=\alpha_{i}).

Let i,j(t)\mathcal{E}_{i,j}^{(t)} be the event that ρt=αi\rho_{t}=\alpha_{i}, ρt+1=αj\rho_{t+1}=\alpha_{j}. i,j(t)\mathcal{E}_{i,j}^{(t)} occurs only if Θt,i+ηtγi<Θt,j+ηtγj\Theta_{t,i}+\eta_{t}\gamma_{i}<\Theta_{t,j}+\eta_{t}\gamma_{j}, Θt+1,i+ηt+1γi>Θt+1,j+ηt+1γj\Theta_{t+1,i}+\eta_{t+1}\gamma_{i}>\Theta_{t+1,j}+\eta_{t+1}\gamma_{j}. Let δα=αjαi\delta_{\alpha}=\alpha_{j}-\alpha_{i}, δg=g(αi)g(αj)\delta_{g}=g(\alpha_{i})-g(\alpha_{j}), note that 0<δα10<\delta_{\alpha}\leq 1, 0<δg10<\delta_{g}\leq 1.

(i,j(t))\displaystyle{\mathbb{P}}(\mathcal{E}_{i,j}^{(t)})
(Θt,i+ηtγi<Θt,j+ηtγj,\displaystyle\leq{\mathbb{P}}\left(\Theta_{t,i}+\eta_{t}\gamma_{i}<\Theta_{t,j}+\eta_{t}\gamma_{j},\right.
Θt+1,i+ηt+1γi>Θt+1,j+ηt+1γj)\displaystyle\quad\qquad\left.\Theta_{t+1,i}+\eta_{t+1}\gamma_{i}>\Theta_{t+1,j}+\eta_{t+1}\gamma_{j}\right)
=(Θt+1,jΘt+1,i2ηt+1<(γiγj)2<Θt,jΘt,i2ηt)\displaystyle={\mathbb{P}}\left(\frac{\Theta_{t+1,j}-\Theta_{t+1,i}}{\sqrt{2}\eta_{t+1}}<\frac{(\gamma_{i}-\gamma_{j})}{\sqrt{2}}<\frac{\Theta_{t,j}-\Theta_{t,i}}{\sqrt{2}\eta_{t}}\right)
(a)Θt,jΘt,i2πηtΘt+1,jΘt+1,i2πηt+1\displaystyle\overset{(a)}{\leq}\frac{\Theta_{t,j}-\Theta_{t,i}}{2\sqrt{\pi}\eta_{t}}-\frac{\Theta_{t+1,j}-\Theta_{t+1,i}}{2\sqrt{\pi}\eta_{t+1}}
=12π((c(t1)δαδgRt1)(1ηt1ηt+1))\displaystyle=\frac{1}{2\sqrt{\pi}}\left((c(t-1)\delta_{\alpha}-\delta_{g}R_{t-1})\left(\frac{1}{\eta_{t}}-\frac{1}{\eta_{t+1}}\right)\right)
12π(cδαδgrtηt+1)\displaystyle\quad-\frac{1}{2\sqrt{\pi}}\left(\frac{c\delta_{\alpha}-\delta_{g}r_{t}}{\eta_{t+1}}\right)
12π(ct(1ηt1ηt+1)+δgrtcδαηt+1)\displaystyle\leq\frac{1}{2\sqrt{\pi}}\left(ct\left(\frac{1}{\eta_{t}}-\frac{1}{\eta_{t+1}}\right)+\frac{\delta_{g}r_{t}-c\delta_{\alpha}}{\eta_{t+1}}\right)
(b)12π(ct(1ηt1ηt+1)+κηt+1)\displaystyle\overset{(b)}{\leq}\frac{1}{2\sqrt{\pi}}\left(ct\left(\frac{1}{\eta_{t}}-\frac{1}{\eta_{t+1}}\right)+\frac{\kappa}{\eta_{t+1}}\right) (10)
=12απ(ct(1t1t+1)+κt+1)\displaystyle=\frac{1}{2\alpha\sqrt{\pi}}\left(ct\left(\frac{1}{\sqrt{t}}-\frac{1}{\sqrt{t+1}}\right)+\frac{\kappa}{\sqrt{t+1}}\right)
=12απ(ctt+11t+t+1+κt+1)\displaystyle=\frac{1}{2\alpha\sqrt{\pi}}\left(c\sqrt{\frac{t}{t+1}}\frac{1}{\sqrt{t}+\sqrt{t+1}}+\frac{\kappa}{\sqrt{t+1}}\right)
(c)(c+2κ)4απ(t+1).\displaystyle\overset{(c)}{\leq}\frac{(c+2\kappa)}{4\alpha\sqrt{\pi(t+1)}}.

Here (a)(a) follows from the fact that (a<Z<b)ba2π{\mathbb{P}}(a<Z<b)\leq\frac{b-a}{\sqrt{2\pi}}, where Z𝒩(0,1)Z\sim\mathcal{N}(0,1), (b)(b) follows because δg1\delta_{g}\leq 1 and rtκr_{t}\leq\kappa, and (c)(c) follows because tt+t+112\frac{\sqrt{t}}{\sqrt{t}+\sqrt{t+1}}\leq\frac{1}{2}. Therefore,

𝔼[𝒞fFTPL(T)]\displaystyle{\mathbb{E}}[\mathcal{C}_{f}^{\text{FTPL}}(T)] M(c+2κ)4απt=1Tj>i1t+1\displaystyle\leq\frac{M(c+2\kappa)}{4\alpha\sqrt{\pi}}\sum_{t=1}^{T}\sum_{j>i}\frac{1}{\sqrt{t+1}}
MK2(c+2κ)2απT+1.\displaystyle\leq\frac{MK^{2}(c+2\kappa)}{2\alpha\sqrt{\pi}}{\sqrt{T+1}}.
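The Gaussian bound used in step (a)(a) above holds because the standard normal density never exceeds 12π\frac{1}{\sqrt{2\pi}}. A quick numeric check (our own illustration; the function names are not part of the paper):

```python
import math

def normal_interval_prob(a, b):
    """P(a < Z < b) for standard normal Z, via the error function."""
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return cdf(b) - cdf(a)

def density_bound(a, b):
    """Upper bound (b - a)/sqrt(2*pi), since the N(0,1) density
    is at most 1/sqrt(2*pi) everywhere."""
    return (b - a) / math.sqrt(2.0 * math.pi)
```

For instance, on the interval (-0.5, 0.5) the exact probability is about 0.383 while the bound gives about 0.399.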

Proof of Theorem 1(c).

By using Lemmas 4 and 5, we get

AFTPL(T)\displaystyle\mathcal{R}^{\text{FTPL}}_{A}(T)\leq 2logK(αT+2κ2t=1T1αt)\displaystyle\sqrt{2\log K}\left(\alpha\sqrt{T}+2\kappa^{2}\sum_{t=1}^{T}\frac{1}{\alpha\sqrt{t}}\right)
+K2M(c+2κ)2απT+1\displaystyle+\frac{K^{2}M(c+2\kappa)}{2\alpha\sqrt{\pi}}\sqrt{T+1}
\displaystyle\leq 2TlogK(α+4κ2α)\displaystyle\sqrt{2T\log K}\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right)
+K2M(c+2κ)2απT+1.\displaystyle+\frac{K^{2}M(c+2\kappa)}{2\alpha\sqrt{\pi}}\sqrt{T+1}.

6.4 Proof of Theorem 1(d)

Proof of Theorem 1(d).

We characterize the fetch cost under FTPL to analyze the impact of MM on the total cost it incurs. A fetch cost is incurred if ρt1=0\rho_{t-1}=0 and ρt=1\rho_{t}=1. Consider the request sequence r1=1r_{1}=1 and rt=0r_{t}=0 for all t2t\geq 2. The optimal static policy is to not host any fraction of the service, i.e., to forward all requests to the back-end server.

The probability of hosting a fraction αi\alpha_{i}, 1iK1\leq i\leq K, of the service in the first slot is 1/K1/K. Hosting any non-zero fraction of the service incurs a fetch cost of at least Mα2M\alpha_{2}. Therefore, the expected fetch cost of the FTPL policy is at least Mα2/KM\alpha_{2}/K, which is a lower bound on the expected cost of FTPL for this request sequence. Since the fetch cost and the non-fetch cost under FTPL can be decoupled, and FTPL does not take MM into account while making decisions, Mα2/KM\alpha_{2}/K is also a lower bound on the regret of FTPL. This completes the proof.
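The uniform first-slot choice can be illustrated by simulation. The sketch below assumes that, with no request history, the perturbed costs of all K arms coincide up to the i.i.d. Gaussian perturbations, so the leader in the first slot is uniform over the K fractions; the values of K, M, and the fractions are illustrative:

```python
import random
from collections import Counter

random.seed(0)
K = 4                               # number of hosting fractions (illustrative)
M = 10.0                            # fetch cost per unit fraction (illustrative)
alphas = [0.0, 0.25, 0.5, 1.0]      # hosting fractions, alpha_1 = 0
trials = 200_000

counts = Counter()
total_fetch = 0.0
for _ in range(trials):
    # With no past requests all cumulative costs coincide, so the perturbed
    # leader is decided by the Gaussian perturbations alone.
    gammas = [random.gauss(0.0, 1.0) for _ in range(K)]
    i = min(range(K), key=lambda k: gammas[k])
    counts[i] += 1
    total_fetch += M * alphas[i]    # fetching a fraction alpha_i costs M * alpha_i

# Each fraction is hosted with probability ~ 1/K in the first slot, ...
for i in range(K):
    assert abs(counts[i] / trials - 1.0 / K) < 0.01
# ... so the expected first-slot fetch cost is at least M * alpha_2 / K.
assert total_fetch / trials >= M * alphas[1] / K
```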

6.5 Proof of Theorem 1(e)

Proof of Theorem 1(e).

Recall that Ts=min{t:t<(maxij(Θt+1,iΘt+1,j))2κ2β(logM)1+δ}T_{s}=\min\{t:t<\frac{(\max_{i\neq j}(\Theta_{t+1,i}-\Theta_{t+1,j}))^{2}}{\kappa^{2}\beta(\log M)^{1+\delta}}\}. For arrival sequences with requests arriving in time-slots 1 to TT, there are two possible cases, namely TsTT_{s}\geq T and Ts<TT_{s}<T. We bound the regret of the W-FTPL policy in each case.
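For concreteness, the waiting time T_s recalled above can be computed directly from a request sequence. A minimal sketch, assuming Θ_{t+1,i} = cα_i t + g(α_i)R_t as in the rest of the analysis; the rent function g and all parameter values below are illustrative:

```python
import math

def waiting_time(requests, alphas, c, g, kappa, beta, M, delta):
    """First slot t at which t < (max_{i != j} Theta_{t+1,i} - Theta_{t+1,j})^2
    / (kappa^2 * beta * (log M)^(1 + delta)), i.e. the end of the wait phase.
    Returns None if the horizon ends before the condition is met."""
    R = 0.0                                           # cumulative requests R_t
    threshold = kappa ** 2 * beta * math.log(M) ** (1 + delta)
    for t, r in enumerate(requests, start=1):
        R += r
        # Theta_{t+1,i}: cumulative cost of statically hosting fraction alpha_i
        theta = [c * a * t + g(a) * R for a in alphas]
        if t < (max(theta) - min(theta)) ** 2 / threshold:
            return t
    return None

# Illustrative instance: two fractions, g(a) = 1 - a, constant unit arrivals.
ts = waiting_time([1.0] * 1000, [0.0, 1.0], c=0.5, g=lambda a: 1.0 - a,
                  kappa=1.0, beta=4.0, M=20.0, delta=0.1)
```

On this two-arm instance the static cost gap grows linearly in t, so the wait phase ends after O((log M)^{1+δ}) slots, consistent with the wait-phase regret bound derived below.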

Case I (TsTT_{s}\geq T): Let r(1)r^{(1)} be a request sequence chosen by the adversary such that TsTT_{s}\geq T. Then

|maxij(ΘT+1,jΘT+1,i)|\displaystyle|\max_{i\neq j}(\Theta_{T+1,j}-\Theta_{T+1,i})| <κβT(logM)1+δ\displaystyle<\kappa\sqrt{\beta T(\log M)^{1+\delta}}
|maxi(ΘT+1,1ΘT+1,i)|\displaystyle\implies|\max_{i}(\Theta_{T+1,1}-\Theta_{T+1,i})| <κβT(logM)1+δ\displaystyle<\kappa\sqrt{\beta T(\log M)^{1+\delta}}
(a)AW-FTPL(T)\displaystyle\overset{(a)}{\implies}\mathcal{R}^{\text{W-FTPL}}_{A}(T) <κβT(logM)1+δ,\displaystyle<\kappa\sqrt{\beta T(\log M)^{1+\delta}},

Here, (a)(a) follows because ρt=0\rho_{t}=0 for all t<Tst<T_{s} under W-FTPL.

Case II (Ts<TT_{s}<T): Let r(2)r^{(2)} be a request sequence chosen by the adversary such that Ts<TT_{s}<T. Then, proceeding as in Case I, we can bound

|maxijΘTs,iΘTs,j|\displaystyle|\max_{i\neq j}\Theta_{T_{s},i}-\Theta_{T_{s},j}| <κβ(Ts1)(logM)1+δ\displaystyle<\kappa\sqrt{\beta(T_{s}-1)(\log M)^{1+\delta}}
AW-FTPL(Ts1)\displaystyle\implies\mathcal{R}^{\text{W-FTPL}}_{A}(T_{s}-1) <κβT(logM)1+δ.\displaystyle<\kappa\sqrt{\beta T(\log M)^{1+\delta}}.

Combining both cases, we obtain the following upper bound on the regret in the wait phase:

AW-FTPL(Ts1)\displaystyle\mathcal{R}^{\text{W-FTPL}}_{A}(T_{s}-1) <κβT(logM)1+δ.\displaystyle<\kappa\sqrt{\beta T(\log M)^{1+\delta}}. (11)

Note that the W-FTPL policy follows the FTPL policy after its waiting time, i.e., from time TsT_{s}. Therefore, W-FTPL and FTPL make the same decisions and incur the same cost from time TsT_{s} to TT, and hence the per-slot regret of the two policies also coincides over this interval. If TsTT_{s}\geq T, only the wait phase occurs and the regret in the FTPL phase is taken to be zero. We denote the regret from time t1t_{1} to t2t_{2} by A𝒫(t1:t2)\mathcal{R}_{A}^{\mathcal{P}}(t_{1}:t_{2}).

AW-FTPL(T)\displaystyle\mathcal{R}_{A}^{\text{W-FTPL}}(T) =AW-FTPL(Ts1)+AW-FTPL(Ts:T)\displaystyle=\mathcal{R}_{A}^{\text{W-FTPL}}(T_{s}-1)+\mathcal{R}_{A}^{\text{W-FTPL}}(T_{s}:T)
=AW-FTPL(Ts1)+AFTPL(Ts:T)\displaystyle=\mathcal{R}_{A}^{\text{W-FTPL}}(T_{s}-1)+\mathcal{R}_{A}^{\text{FTPL}}(T_{s}:T)
AW-FTPL(Ts1)+AFTPL(1:T).\displaystyle\leq\mathcal{R}_{A}^{\text{W-FTPL}}(T_{s}-1)+\mathcal{R}_{A}^{\text{FTPL}}(1:T). (12)

Combining (11), (12), and Theorem 1(c), we get

AW-FTPL(T)\displaystyle\mathcal{R}^{\text{W-FTPL}}_{A}(T)\leq κβT(logM)1+δ\displaystyle\kappa\sqrt{\beta T(\log M)^{1+\delta}}
+2TlogK(α+4κ2α)\displaystyle+\sqrt{2T\log K}\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right)
+K2M(c+2κ)2απT+1.\displaystyle+\frac{K^{2}M(c+2\kappa)}{2\alpha\sqrt{\pi}}\sqrt{T+1}.

6.6 Proof of Theorem 2(a)

Proof of Theorem 2(a).

From Lemma 3 we know that once RR fetches any non-zero fraction of the service, it hosts some positive fraction of the service for at least Mc\left\lceil\frac{M}{c}\right\rceil slots before evicting it completely. From Lemma 2, we know that once RR evicts the complete service from the edge, it does not fetch any non-zero fraction of the service for at least Mαiκcαig(αi)κ\left\lceil\frac{M\alpha_{i^{\prime}}}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}\right\rceil slots. We divide the time horizon TT into frames of fs=Mc+Mαiκcαig(αi)κf_{s}=\left\lceil\frac{M}{c}\right\rceil+\left\lceil\frac{M\alpha_{i^{\prime}}}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}\right\rceil slots each. Define \mathcal{F} as the event that a frame starts with Mαiκcαig(αi)κ\left\lceil\frac{M\alpha_{i^{\prime}}}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}\right\rceil slots with κ\kappa arrivals in each slot, followed by Mc\left\lceil\frac{M}{c}\right\rceil slots with zero arrivals, and let (){\mathbb{P}}(\mathcal{F}) denote its probability (independent of TT). Conditioned on event \mathcal{F}, if the optimal static policy hosts a non-zero fraction αi0\alpha_{i}\neq 0 of the service, the regret in the frame is at least (κcαig(αi)κ)Mαiκcαig(αi)κ>0(\kappa-c\alpha_{i}-g(\alpha_{i})\kappa)\left\lceil\frac{M\alpha_{i^{\prime}}}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}\right\rceil>0. If the optimal static policy is to not host any fraction of the service, the regret in the frame is at least cαiMc>0c\alpha_{i}\left\lceil\frac{M}{c}\right\rceil>0, where αi0\alpha_{i}\neq 0 is the fraction fetched by RR. Therefore, conditioned on event \mathcal{F}, RR always incurs a finite non-zero regret, say d(>0)d(>0), which is independent of TT. Therefore,

SRR(T)\displaystyle\mathcal{R}^{\text{RR}}_{S}(T) =𝔼r[f=1T/fsd𝟙(event  occurred)]\displaystyle={\mathbb{E}}_{r}\left[\sum_{f=1}^{\lfloor T/f_{s}\rfloor}d\mathds{1}(\text{event $\mathcal{F}$ occurred})\right]
(Tfs1)()d\displaystyle\geq\left(\frac{T}{f_{s}}-1\right){\mathbb{P}}(\mathcal{F})d
(TMκcαig(αi)κ+Mc+21)()d\displaystyle\geq\left(\frac{T}{\frac{M}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}+\frac{M}{c}+2}-1\right){\mathbb{P}}(\mathcal{F})d
=Ω(T).\displaystyle=\Omega(T).

So even in the stochastic case, RR incurs linear regret. ∎

6.7 Proof of Theorem 2(b)

Lemma 6.

The probability of hosting a sub-optimal fraction αjαi\alpha_{j}\neq\alpha_{i^{\star}} of the service under the FTPL policy in time slot t+1t+1 is bounded as follows:

(ρt+1=αj)exp(t2Δj216ηt+12)+exp(Δj2t2κ2).\displaystyle{\mathbb{P}}(\rho_{t+1}=\alpha_{j})\leq\exp\left(-\frac{t^{2}\Delta_{j}^{2}}{16\eta_{t+1}^{2}}\right)+\exp\left(\frac{-\Delta_{j}^{2}t}{2\kappa^{2}}\right).
Proof.

FTPL hosts the fraction αj\alpha_{j} of the service in time slot t+1t+1 if cαit+g(αi)Rt+ηt+1γi>cαjt+g(αj)Rt+ηt+1γjc\alpha_{i}t+g(\alpha_{i})R_{t}+\eta_{t+1}\gamma_{i}>c\alpha_{j}t+g(\alpha_{j})R_{t}+\eta_{t+1}\gamma_{j} for all iji\neq j. Let pt,j=(ρt+1=αj)p_{t,j}={\mathbb{P}}(\rho_{t+1}=\alpha_{j}).

pt,j(cαjt\displaystyle p_{t,j}\leq{\mathbb{P}}(c\alpha_{j}t +g(αj)Rt+ηt+1γj\displaystyle+g(\alpha_{j})R_{t}+\eta_{t+1}\gamma_{j}
<cαit+g(αi)Rt+ηt+1γi)\displaystyle<c\alpha_{i^{\star}}t+g(\alpha_{i^{\star}})R_{t}+\eta_{t+1}\gamma_{i^{\star}})
=(ηt+1\displaystyle={\mathbb{P}}(\eta_{t+1} (γiγj)\displaystyle(\gamma_{i^{\star}}-\gamma_{j})
c(αjαi)t+(g(αj)g(αi))Rt).\displaystyle\geq c(\alpha_{j}-\alpha_{i^{\star}})t+(g(\alpha_{j})-g(\alpha_{i^{\star}}))R_{t}).

There are two possibilities: αj>αi\alpha_{j}>\alpha_{i^{\star}} or αj<αi\alpha_{j}<\alpha_{i^{\star}}. We bound the probability pt,jp_{t,j} by considering each case separately.
Case 1 (αj>αi\alpha_{j}>\alpha_{i^{\star}}): Since g()g(.) is a decreasing function, g(αj)<g(αi)g(\alpha_{j})<g(\alpha_{i^{\star}}). Let j\mathcal{E}_{j} be the event that Rt<μt+tΔj/2R_{t}<\mu t+t\Delta_{j}/2.

pt,j\displaystyle p_{t,j}\leq (ηt+1(γiγj)c(αjαi)t\displaystyle{\mathbb{P}}(\eta_{t+1}(\gamma_{i^{\star}}-\gamma_{j})\geq c(\alpha_{j}-\alpha_{i^{\star}})t
+(g(αj)g(αi))Rt,j)+(jc)\displaystyle\qquad\qquad+(g(\alpha_{j})-g(\alpha_{i^{\star}}))R_{t},\mathcal{E}_{j})+{\mathbb{P}}(\mathcal{E}_{j}^{c})
\displaystyle\leq (ηt+1(γiγj)tΔj(g(αi)g(αj))tΔj/2)\displaystyle{\mathbb{P}}(\eta_{t+1}(\gamma_{i^{\star}}-\gamma_{j})\geq t\Delta_{j}-(g(\alpha_{i^{\star}})-g(\alpha_{j}))t\Delta_{j}/2)
+(jc)\displaystyle+{\mathbb{P}}(\mathcal{E}_{j}^{c})
\displaystyle\leq (ηt+1(γiγj)tΔj/2)+(jc)\displaystyle{\mathbb{P}}(\eta_{t+1}(\gamma_{i^{\star}}-\gamma_{j})\geq t\Delta_{j}/2)+{\mathbb{P}}(\mathcal{E}_{j}^{c})
(a)\displaystyle\overset{(a)}{\leq} exp(t2Δj216ηt+12)+exp(Δj2t2κ2),\displaystyle\exp\left(-\frac{t^{2}\Delta_{j}^{2}}{16\eta_{t+1}^{2}}\right)+\exp\left(\frac{-\Delta_{j}^{2}t}{2\kappa^{2}}\right),

where (a)(a) is obtained by using the fact that complementary CDF of standard Gaussian Q(x)ex2/2Q(x)\leq e^{-x^{2}/2} for x>0x>0 and Hoeffding’s inequality [27].

Case 2 (αj<αi\alpha_{j}<\alpha_{i^{\star}}): Let j\mathcal{E}_{j} be the event that Rt>μttΔj/2R_{t}>\mu t-t\Delta_{j}/2.

pt,j\displaystyle p_{t,j}\leq (ηt+1(γiγj)c(αjαi)t\displaystyle{\mathbb{P}}(\eta_{t+1}(\gamma_{i^{\star}}-\gamma_{j})\geq c(\alpha_{j}-\alpha_{i^{\star}})t
+(g(αj)g(αi))Rt,j)+(jc)\displaystyle\qquad\qquad+(g(\alpha_{j})-g(\alpha_{i^{\star}}))R_{t},\mathcal{E}_{j})+{\mathbb{P}}(\mathcal{E}_{j}^{c})
\displaystyle\leq (ηt+1(γiγj)tΔj(g(αj)g(αi))tΔj/2)\displaystyle{\mathbb{P}}(\eta_{t+1}(\gamma_{i^{\star}}-\gamma_{j})\geq t\Delta_{j}-(g(\alpha_{j})-g(\alpha_{i^{\star}}))t\Delta_{j}/2)
+(jc)\displaystyle+{\mathbb{P}}(\mathcal{E}_{j}^{c})
\displaystyle\leq (ηt+1(γiγj)tΔj/2)+(jc)\displaystyle{\mathbb{P}}(\eta_{t+1}(\gamma_{i^{\star}}-\gamma_{j})\geq t\Delta_{j}/2)+{\mathbb{P}}(\mathcal{E}_{j}^{c})
(a)\displaystyle\overset{(a)}{\leq} exp(t2Δj216ηt+12)+exp(Δj2t2κ2),\displaystyle\exp\left(-\frac{t^{2}\Delta_{j}^{2}}{16\eta_{t+1}^{2}}\right)+\exp\left(\frac{-\Delta_{j}^{2}t}{2\kappa^{2}}\right),

where (a)(a) is obtained by using the fact that Q(x)ex2/2Q(x)\leq e^{-x^{2}/2} for x>0x>0 and Hoeffding’s inequality [27].

Therefore, combining both cases, we get

(ρt+1=αj)exp(t2Δj216ηt+12)+exp(Δj2t2κ2).{\mathbb{P}}(\rho_{t+1}=\alpha_{j})\leq\exp\left(-\frac{t^{2}\Delta_{j}^{2}}{16\eta_{t+1}^{2}}\right)+\exp\left(\frac{-\Delta_{j}^{2}t}{2\kappa^{2}}\right). (13)
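The right-hand side of (13) is summable over t, which is what ultimately yields a horizon-independent regret contribution in the stochastic setting. A quick numerical check with η_{t+1} = α√t and illustrative values of α, κ, and Δ_j:

```python
import math

alpha, kappa, delta_j = 1.0, 1.0, 0.5   # illustrative parameters

def suboptimal_prob_bound(t: int) -> float:
    """RHS of (13) with eta_{t+1}^2 = alpha^2 * t."""
    return (math.exp(-t * delta_j ** 2 / (16.0 * alpha ** 2))
            + math.exp(-(delta_j ** 2) * t / (2.0 * kappa ** 2)))

partial_1k = sum(suboptimal_prob_bound(t) for t in range(1, 1001))
partial_10k = sum(suboptimal_prob_bound(t) for t in range(1, 10001))

# The partial sums have essentially converged by t = 1000: the tail
# contribution is negligible, so the series sums to a constant.
assert partial_10k - partial_1k < 1e-3
```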

Lemma 7.

For M=0M=0,

SFTPL(T)\displaystyle\mathcal{R}_{S}^{\text{FTPL}}(T)\leq (2logK+22h1logKΔmin)(α+4κ2α)\displaystyle\left(\sqrt{2\log K}+\frac{2\sqrt{2h_{1}}\log K}{\Delta_{\text{min}}}\right)\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right)
+16α2+4κ2Δmin,\displaystyle+\frac{16\alpha^{2}+4\kappa^{2}}{\Delta_{\text{min}}},

where h1=2max{16α2,2κ2}h_{1}=2\max\{16\alpha^{2},2\kappa^{2}\}.

Proof.

Let t0=h1logKΔmin2t_{0}=\lceil h_{1}\frac{\log K}{\Delta_{\text{min}}^{2}}\rceil where h1=2max{16α2,2κ2}h_{1}=2\max\{16\alpha^{2},2\kappa^{2}\}. From (2) we have

SFTPL(T)\displaystyle\mathcal{R}_{S}^{\text{FTPL}}(T)\leq 𝔼[t=1T(cρt+g(ρt)μμi)]\displaystyle{\mathbb{E}}\left[\sum_{t=1}^{T}\left(c\rho_{t}+g(\rho_{t})\mu-\mu_{i^{\star}}\right)\right]
=\displaystyle= t=1T(iμi(ρt=αi)μi)\sum_{t=1}^{T}\left(\sum_{i}\mu_{i}{\mathbb{P}}(\rho_{t}=\alpha_{i})-\mu_{i^{\star}}\right)
=\displaystyle= t=1TiiΔi(ρt=αi)\displaystyle\sum_{t=1}^{T}\sum_{i\neq i^{\star}}\Delta_{i}{\mathbb{P}}(\rho_{t}=\alpha_{i})
=\displaystyle= t=1t0iiΔi(ρt=αi)\displaystyle\sum_{t=1}^{t_{0}}\sum_{i\neq i^{\star}}\Delta_{i}{\mathbb{P}}(\rho_{t}=\alpha_{i})
+t=t0+1TiiΔi(ρt=αi)\displaystyle+\sum_{t=t_{0}+1}^{T}\sum_{i\neq i^{\star}}\Delta_{i}{\mathbb{P}}(\rho_{t}=\alpha_{i})
(a)\displaystyle\overset{(a)}{\leq} 2t0logK(α+4κ2α)\displaystyle\sqrt{2t_{0}\log K}\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right)
+t=t0+1TiiΔi(ρt=αi),\displaystyle+\sum_{t=t_{0}+1}^{T}\sum_{i\neq i^{\star}}\Delta_{i}{\mathbb{P}}(\rho_{t}=\alpha_{i}),

where (a)(a) is obtained by using the adversarial regret bound for M=0M=0. We now bound each term on the RHS separately.

2t0logK\displaystyle\sqrt{2t_{0}\log K}\leq 2logK(1+h1logKΔmin).\displaystyle\sqrt{2\log K}\left(1+\frac{\sqrt{h_{1}\log K}}{\Delta_{\text{min}}}\right).

By using Lemma 6 we have

Δj(ρt+1\displaystyle\Delta_{j}{\mathbb{P}}(\rho_{t+1} =αj)\displaystyle=\alpha_{j})
\displaystyle\leq Δj[exp(t2Δj216ηt+12)+exp(Δj2t2κ2)]\displaystyle\Delta_{j}\left[\exp\left(-\frac{t^{2}\Delta_{j}^{2}}{16\eta_{t+1}^{2}}\right)+\exp\left(\frac{-\Delta_{j}^{2}t}{2\kappa^{2}}\right)\right]
=\displaystyle= Δj[exp(tΔj216α2)+exp(Δj2t2κ2)]\displaystyle\Delta_{j}\left[\exp\left(-\frac{t\Delta_{j}^{2}}{16\alpha^{2}}\right)+\exp\left(\frac{-\Delta_{j}^{2}t}{2\kappa^{2}}\right)\right]
(b)\displaystyle\overset{(b)}{\leq} Δmin[exp(tΔmin216α2)+exp(Δmin2t2κ2)].\displaystyle\Delta_{\text{min}}\left[\exp\left(-\frac{t\Delta_{\text{min}}^{2}}{16\alpha^{2}}\right)+\exp\left(\frac{-\Delta_{\text{min}}^{2}t}{2\kappa^{2}}\right)\right].

Here (b)(b) is obtained using the fact that f(u)=ueu2f(u)=ue^{-u^{2}} is a decreasing function over u[1,)u\in[1,\infty), and for t>t0t>t_{0}, tΔj216α2>h1logKΔj216α2Δmin21\frac{t\Delta_{j}^{2}}{16\alpha^{2}}>\frac{h_{1}\log K\Delta_{j}^{2}}{16\alpha^{2}\Delta_{\text{min}}^{2}}\geq 1 and tΔj22κ2>h1logKΔj22κ2Δmin21\frac{t\Delta_{j}^{2}}{2\kappa^{2}}>\frac{h_{1}\log K\Delta_{j}^{2}}{2\kappa^{2}\Delta_{\text{min}}^{2}}\geq 1. Therefore,

SFTPL\displaystyle\mathcal{R}_{S}^{\text{FTPL}} (T)\displaystyle(T)
\displaystyle\leq (2logK+22h1logKΔmin)(α+4κ2α)\displaystyle\left(\sqrt{2\log K}+\frac{2\sqrt{2h_{1}}\log K}{\Delta_{\text{min}}}\right)\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right)
+t=t0TKΔmin[exp(tΔmin216α2)+exp(Δmin2t2κ2)]\displaystyle+\sum_{t=t_{0}}^{T}K\Delta_{\text{min}}\left[\exp\left(-\frac{t\Delta_{\text{min}}^{2}}{16\alpha^{2}}\right)+\exp\left(\frac{-\Delta_{\text{min}}^{2}t}{2\kappa^{2}}\right)\right]
(c)\displaystyle\overset{(c)}{\leq} (2logK+22h1logKΔmin)(α+4κ2α)\displaystyle\left(\sqrt{2\log K}+\frac{2\sqrt{2h_{1}}\log K}{\Delta_{\text{min}}}\right)\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right)
+16α2+2κ2Δmin+2Δmin\displaystyle+\frac{16\alpha^{2}+2\kappa^{2}}{\Delta_{\text{min}}}+2\Delta_{\text{min}}
\displaystyle\leq (2logK+22h1logKΔmin)(α+4κ2α)\displaystyle\left(\sqrt{2\log K}+\frac{2\sqrt{2h_{1}}\log K}{\Delta_{\text{min}}}\right)\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right)
+16α2+4κ2Δmin.\displaystyle+\frac{16\alpha^{2}+4\kappa^{2}}{\Delta_{\text{min}}}.

Here (c)(c) follows because for t>t0t>t_{0} we have Kexp(tΔmin2/16α2)<1K\exp(-t\Delta_{\text{min}}^{2}/16\alpha^{2})<1, Kexp(tΔmin2/2κ2)<1K\exp(-t\Delta_{\text{min}}^{2}/2\kappa^{2})<1 and t=1eat1/a\sum_{t=1}^{\infty}e^{-at}\leq 1/a for a>0a>0. ∎
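The series bound in step (c) follows from the closed form Σ_{t≥1} e^{−at} = 1/(e^a − 1) together with e^a − 1 ≥ a. A quick numerical check:

```python
import math

for a in [0.01, 0.1, 0.5, 1.0, 3.0]:
    closed_form = 1.0 / math.expm1(a)   # sum_{t >= 1} e^{-a t} = 1/(e^a - 1)
    partial = sum(math.exp(-a * t) for t in range(1, 20_000))
    assert abs(partial - closed_form) < 1e-6   # the partial sum has converged
    assert closed_form <= 1.0 / a              # since e^a - 1 >= a for a > 0
```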

Lemma 8.

Fetch cost under FTPL policy with learning rate ηt=αt1\eta_{t}=\alpha\sqrt{t-1} is bounded as follows

𝔼[𝒞fFTPL(T)]MK216α2+3κ2Δmin2.{\mathbb{E}}[\mathcal{C}_{f}^{\text{FTPL}}(T)]\leq MK^{2}\frac{16\alpha^{2}+3\kappa^{2}}{\Delta_{\text{min}}^{2}}.
Proof.

A fetch cost is incurred whenever an additional fraction of the service is fetched. Proceeding as in the proof of Lemma 5, we get

𝔼[𝒞fFTPL(T)]\displaystyle{\mathbb{E}}[\mathcal{C}^{\text{FTPL}}_{f}(T)] Mt=1T(ρt<ρt+1)\displaystyle\leq M\sum_{t=1}^{T}{\mathbb{P}}(\rho_{t}<\rho_{t+1})
=Mt=1Tj>i(ρt=αi,ρt+1=αj).\displaystyle=M\sum_{t=1}^{T}\sum_{j>i}{\mathbb{P}}(\rho_{t}=\alpha_{i},\rho_{t+1}=\alpha_{j}).

Let i,j(t)\mathcal{E}_{i,j}^{(t)} be the event that ρt=αi\rho_{t}=\alpha_{i} and ρt+1=αj\rho_{t+1}=\alpha_{j}. The event i,j(t)\mathcal{E}_{i,j}^{(t)} occurs if Θt,i+ηtγi<Θt,j+ηtγj\Theta_{t,i}+\eta_{t}\gamma_{i}<\Theta_{t,j}+\eta_{t}\gamma_{j} and Θt+1,i+ηt+1γi>Θt+1,j+ηt+1γj\Theta_{t+1,i}+\eta_{t+1}\gamma_{i}>\Theta_{t+1,j}+\eta_{t+1}\gamma_{j}.

(i,j(t))\displaystyle{\mathbb{P}}(\mathcal{E}_{i,j}^{(t)}) =(Θt+1,jΘt+1,i2ηt+1γiγj2Θt,jΘt,i2ηt).\displaystyle={\mathbb{P}}\left(\frac{\Theta_{t+1,j}-\Theta_{t+1,i}}{\sqrt{2}\eta_{t+1}}\leq\frac{\gamma_{i}-\gamma_{j}}{\sqrt{2}}\leq\frac{\Theta_{t,j}-\Theta_{t,i}}{\sqrt{2}\eta_{t}}\right).

We consider the two cases μi>μj\mu_{i}>\mu_{j} and μi<μj\mu_{i}<\mu_{j} separately and bound this probability.
Case 1 (μi>μj\mu_{i}>\mu_{j}): Let 1\mathcal{E}_{1} be the event that |Rt1μ(t1)||Δij|(t1)/2|R_{t-1}-\mu(t-1)|\leq|\Delta_{ij}|(t-1)/2,

(\displaystyle{\mathbb{P}}( i,j(t))\displaystyle\mathcal{E}_{i,j}^{(t)})
\displaystyle\leq (γiγj2Θt,jΘt,i2ηt)\displaystyle{\mathbb{P}}\left(\frac{\gamma_{i}-\gamma_{j}}{\sqrt{2}}\leq\frac{\Theta_{t,j}-\Theta_{t,i}}{\sqrt{2}\eta_{t}}\right)
\displaystyle\leq (γiγj2Θt,jΘt,i2ηt,1)+(1c)\displaystyle{\mathbb{P}}\left(\frac{\gamma_{i}-\gamma_{j}}{\sqrt{2}}\leq\frac{\Theta_{t,j}-\Theta_{t,i}}{\sqrt{2}\eta_{t}},\mathcal{E}_{1}\right)+{\mathbb{P}}(\mathcal{E}_{1}^{c})
=\displaystyle= (γiγj2c(αjαi)(t1)2ηt\displaystyle{\mathbb{P}}\left(\frac{\gamma_{i}-\gamma_{j}}{\sqrt{2}}\leq\frac{c(\alpha_{j}-\alpha_{i})(t-1)}{\sqrt{2}\eta_{t}}\right.
(g(αi)g(αj))Rt12ηt,1)+(1c)\displaystyle\hskip 76.82234pt\left.-\frac{(g(\alpha_{i})-g(\alpha_{j}))R_{t-1}}{\sqrt{2}\eta_{t}},\mathcal{E}_{1}\right)+{\mathbb{P}}(\mathcal{E}_{1}^{c})
\displaystyle\leq (γiγj2|Δij|(t1)22ηt)+(1c)\displaystyle{\mathbb{P}}\left(\frac{\gamma_{i}-\gamma_{j}}{\sqrt{2}}\leq\frac{-|\Delta_{ij}|(t-1)}{2\sqrt{2}\eta_{t}}\right)+{\mathbb{P}}(\mathcal{E}_{1}^{c})
(a)\displaystyle\overset{(a)}{\leq} exp((t1)2Δij216ηt2)+exp(Δij2(t1)2κ2)\displaystyle\exp\left(-\frac{(t-1)^{2}\Delta_{ij}^{2}}{16\eta_{t}^{2}}\right)+\exp\left(\frac{-\Delta_{ij}^{2}(t-1)}{2\kappa^{2}}\right)
\displaystyle\leq exp((t1)2Δmin216ηt2)+exp(Δmin2(t1)2κ2)\displaystyle\exp\left(-\frac{(t-1)^{2}\Delta_{\text{min}}^{2}}{16\eta_{t}^{2}}\right)+\exp\left(\frac{-\Delta_{\text{min}}^{2}(t-1)}{2\kappa^{2}}\right)
\displaystyle\leq exp((t1)Δmin216α2)+exp(Δmin2(t1)2κ2),\displaystyle\exp\left(-\frac{(t-1)\Delta_{\text{min}}^{2}}{16\alpha^{2}}\right)+\exp\left(\frac{-\Delta_{\text{min}}^{2}(t-1)}{2\kappa^{2}}\right),

where (a)(a) is obtained by using the fact that Q(x)ex2/2Q(x)\leq e^{-x^{2}/2} for x>0x>0 and Hoeffding’s inequality [27].

Case 2 (μi<μj\mu_{i}<\mu_{j}): Let 2\mathcal{E}_{2} be the event that |Rtμt||Δij|t/2|R_{t}-\mu t|\leq|\Delta_{ij}|t/2,

\displaystyle{\mathbb{P}} (i,j(t))\displaystyle(\mathcal{E}_{i,j}^{(t)})
\displaystyle\leq (Θt+1,jΘt+1,i2ηt+1γiγj2)\displaystyle{\mathbb{P}}\left(\frac{\Theta_{t+1,j}-\Theta_{t+1,i}}{\sqrt{2}\eta_{t+1}}\leq\frac{\gamma_{i}-\gamma_{j}}{\sqrt{2}}\right)
\displaystyle\leq (c(αjαi)t(g(αi)g(αj))Rt2ηtγiγj2)\displaystyle{\mathbb{P}}\left(\frac{c(\alpha_{j}-\alpha_{i})t-(g(\alpha_{i})-g(\alpha_{j}))R_{t}}{\sqrt{2}\eta_{t}}\leq\frac{\gamma_{i}-\gamma_{j}}{\sqrt{2}}\right)
\displaystyle\leq (γiγj2c(αjαi)t(g(αi)g(αj))Rt2ηt,2)\displaystyle{\mathbb{P}}\left(\frac{\gamma_{i}-\gamma_{j}}{\sqrt{2}}\geq\frac{c(\alpha_{j}-\alpha_{i})t-(g(\alpha_{i})-g(\alpha_{j}))R_{t}}{\sqrt{2}\eta_{t}},\mathcal{E}_{2}\right)
+(2c)\displaystyle+{\mathbb{P}}(\mathcal{E}_{2}^{c})
\displaystyle\leq (γiγj2|Δij|t22ηt)+(2c)\displaystyle{\mathbb{P}}\left(\frac{\gamma_{i}-\gamma_{j}}{\sqrt{2}}\geq\frac{|\Delta_{ij}|t}{2\sqrt{2}\eta_{t}}\right)+{\mathbb{P}}(\mathcal{E}_{2}^{c})
(b)\displaystyle\overset{(b)}{\leq} exp(t2Δij216ηt2)+exp(Δij2t2κ2)\displaystyle\exp\left(-\frac{t^{2}\Delta_{ij}^{2}}{16\eta_{t}^{2}}\right)+\exp\left(\frac{-\Delta_{ij}^{2}t}{2\kappa^{2}}\right)
\displaystyle\leq exp(t2Δmin216ηt2)+exp(Δmin2t2κ2)\displaystyle\exp\left(-\frac{t^{2}\Delta_{\text{min}}^{2}}{16\eta_{t}^{2}}\right)+\exp\left(\frac{-\Delta_{\text{min}}^{2}t}{2\kappa^{2}}\right)
\displaystyle\leq exp(tΔmin216α2)+exp(Δmin2t2κ2)\displaystyle\exp\left(-\frac{t\Delta_{\text{min}}^{2}}{16\alpha^{2}}\right)+\exp\left(\frac{-\Delta_{\text{min}}^{2}t}{2\kappa^{2}}\right)
\displaystyle\leq exp((t1)Δmin216α2)+exp(Δmin2(t1)2κ2),\displaystyle\exp\left(-\frac{(t-1)\Delta_{\text{min}}^{2}}{16\alpha^{2}}\right)+\exp\left(\frac{-\Delta_{\text{min}}^{2}(t-1)}{2\kappa^{2}}\right),

where (b)(b) is obtained by using the fact that Q(x)ex2/2Q(x)\leq e^{-x^{2}/2} for x>0x>0 and Hoeffding’s inequality [27].

Combining both cases, we get

(i,j(t))exp((t1)Δmin216α2)+exp(Δmin2(t1)2κ2).\displaystyle{\mathbb{P}}(\mathcal{E}_{i,j}^{(t)})\leq\exp\left(-\frac{(t-1)\Delta_{\text{min}}^{2}}{16\alpha^{2}}\right)+\exp\left(\frac{-\Delta_{\text{min}}^{2}(t-1)}{2\kappa^{2}}\right). (14)

Therefore,

𝔼[𝒞fFTPL(T)]Mt=1Tj>i(i,j(t))\displaystyle{\mathbb{E}}[\mathcal{C}_{f}^{\text{FTPL}}(T)]\leq M\sum_{t=1}^{T}\sum_{j>i}{\mathbb{P}}(\mathcal{E}_{i,j}^{(t)})
\displaystyle\leq MK2t=1T1exp(tΔmin216α2)+exp(Δmin2t2κ2)\displaystyle MK^{2}\sum_{t=1}^{T-1}\exp\left(-\frac{t\Delta_{\text{min}}^{2}}{16\alpha^{2}}\right)+\exp\left(\frac{-\Delta_{\text{min}}^{2}t}{2\kappa^{2}}\right)
+MK2\displaystyle+MK^{2}
\displaystyle\leq MK216α2+3κ2Δmin2.\displaystyle MK^{2}\frac{16\alpha^{2}+3\kappa^{2}}{\Delta_{\text{min}}^{2}}.

Proof of Theorem 2(b).

Note that the FTPL policy does not take the fetch cost MM into account while making decisions. The stated result follows by combining Lemmas 7 and 8. ∎

6.8 Proof of Theorem 2(c)

Lemma 9.

Under the W-FTPL policy, Ts<tT_{s}<t^{\prime} if and only if there exists t<tt<t^{\prime} such that the following condition holds:

(g(αj~t)g(αi~t))μ^t\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t} >c(αj~tαi~t)+κβ(logM)1+δt,\displaystyle>-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})+\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}},
or
(g(αj~t)g(αi~t))μ^t\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t} <c(αj~tαi~t)κβ(logM)1+δt,\displaystyle<-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})-\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}},

where (i~t,j~t)=argmaxijΘt+1,iΘt+1,j(\tilde{i}_{t},\tilde{j}_{t})=\operatorname*{arg\,max}_{i\neq j}\Theta_{t+1,i}-\Theta_{t+1,j}, μ^t=Rt/t\hat{\mu}_{t}=R_{t}/t.

Proof.

Recall that (i~t,j~t)=argmaxijΘt+1,iΘt+1,j(\tilde{i}_{t},\tilde{j}_{t})=\operatorname*{arg\,max}_{i\neq j}\Theta_{t+1,i}-\Theta_{t+1,j} and μ^t=Rt/t\hat{\mu}_{t}=R_{t}/t. By the definition of TsT_{s}, Ts<tT_{s}<t^{\prime} if and only if there exists t<tt<t^{\prime} such that

|maxij(Θt+1,iΘt+1,j)|t\frac{|\max_{i\neq j}(\Theta_{t+1,i}-\Theta_{t+1,j})|}{t} >κβ(logM)1+δt\displaystyle>\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}
|Θt+1,j~tΘt+1,i~t|t\displaystyle\iff\frac{|\Theta_{t+1,\tilde{j}_{t}}-\Theta_{t+1,\tilde{i}_{t}}|}{t} >κβ(logM)1+δt\displaystyle>\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}
\displaystyle\Updownarrow
|c(αj~tαi~t)+(g(αj~t)g(αi~t))μ^t|\displaystyle|c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})+(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t}| >κβ(logM)1+δt\displaystyle>\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}
\displaystyle\Updownarrow
(g(αj~t)g(αi~t))μ^t>c(αj~tαi~t)\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t}>-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}}) +κβ(logM)1+δt\displaystyle+\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}
or
(g(αj~t)g(αi~t))μ^t<c(αj~tαi~t)\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t}<-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}}) κβ(logM)1+δt.\displaystyle-\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}.

This completes the proof. ∎

Lemma 10.

Under the W-FTPL policy, Ts>(β1)2κ2(logM)1+δΔmax2T_{s}>\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}} with probability at least 1(β1)2κ2(logM)1+δM2(logM)δΔmax21-\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{M^{2(\log M)^{\delta}}\Delta_{\text{max}}^{2}}.

Proof.

Let t\mathcal{E}_{t} be the event that

|μμ^t|κ(logM)1+δt,|\mu-\hat{\mu}_{t}|\leq\kappa\sqrt{\frac{(\log M)^{1+\delta}}{t}},

where μ^t=Rt/t\hat{\mu}_{t}=R_{t}/t. By using Hoeffding’s inequality [27] we get

(t)11M2(logM)δ.{\mathbb{P}}(\mathcal{E}_{t})\geq 1-\frac{1}{M^{2(\log M)^{\delta}}}.

Let =t=1T0t\mathcal{E}=\cap_{t=1}^{T_{0}}\mathcal{E}_{t}, where T0=(β1)2κ2(logM)1+δΔmax2T_{0}=\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}}. Using the union bound, we get:

()1T0M2(logM)δ.{\mathbb{P}}(\mathcal{E})\geq 1-\frac{T_{0}}{M^{2(\log M)^{\delta}}}.
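The Hoeffding step above can be illustrated empirically. The sketch below draws bounded i.i.d. arrivals (uniform on [0, κ], an illustrative choice) and checks that the deviation frequency of the empirical mean stays below the two-sided Hoeffding bound 2exp(−2tε²/κ²):

```python
import math
import random

random.seed(1)
kappa = 1.0      # arrivals bounded in [0, kappa]
t = 100          # number of i.i.d. samples averaged
eps = 0.15       # deviation threshold
trials = 20_000

mu = kappa / 2.0                       # mean of Uniform[0, kappa]
violations = 0
for _ in range(trials):
    mean = sum(random.uniform(0.0, kappa) for _ in range(t)) / t
    if abs(mean - mu) > eps:
        violations += 1

# Two-sided Hoeffding bound for the mean of t i.i.d. variables in [0, kappa].
hoeffding = 2.0 * math.exp(-2.0 * t * eps ** 2 / kappa ** 2)
assert violations / trials <= hoeffding
```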

Let ϵt=κβ(logM)1+δt.\epsilon_{t}=\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}. We prove the lemma by contradiction. Suppose that

Ts<(β1)2κ2(logM)1+δΔmax2,T_{s}<\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}},

and that \mathcal{E} holds. Then, by Lemma 9, there exists t<(β1)2κ2(logM)1+δΔmax2t<\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}} such that

(g(αj~t)g(αi~t))μ^t\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t} >c(αj~tαi~t)+κβ(logM)1+δt,\displaystyle>-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})+\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}},
or
(g(αj~t)g(αi~t))μ^t\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t} <c(αj~tαi~t)κβ(logM)1+δt,\displaystyle<-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})-\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}},

holds. There are two possibilities, αj~t<αi~t\alpha_{\tilde{j}_{t}}<\alpha_{\tilde{i}_{t}} and αj~t>αi~t\alpha_{\tilde{j}_{t}}>\alpha_{\tilde{i}_{t}}; we consider them separately to prove the final result.
Case 1 (αj~t<αi~t\alpha_{\tilde{j}_{t}}<\alpha_{\tilde{i}_{t}}): It follows that

(g(αj~t)g(αi~t))μ^t>c(αj~tαi~t)+ϵt\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t}>-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})+\epsilon_{t}
μ^tμ>c(αi~tαj~t)+ϵtg(αj~t)g(αi~t)μ\displaystyle\implies\hat{\mu}_{t}-\mu>\frac{c(\alpha_{\tilde{i}_{t}}-\alpha_{\tilde{j}_{t}})+\epsilon_{t}}{g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}})}-\mu
κ(logM)1+δt>Δmax+ϵtg(αj~t)g(αi~t)\displaystyle\implies\kappa\sqrt{\frac{(\log M)^{1+\delta}}{t}}>\frac{-\Delta_{\text{max}}+\epsilon_{t}}{g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}})}
(βg(αj~t)+g(αi~t))κ(logM)1+δt<Δmax\displaystyle\implies(\sqrt{\beta}-g(\alpha_{\tilde{j}_{t}})+g(\alpha_{\tilde{i}_{t}}))\kappa\sqrt{\frac{(\log M)^{1+\delta}}{t}}<\Delta_{\text{max}}
(β1)κ(logM)1+δt<Δmax\displaystyle\implies(\sqrt{\beta}-1)\kappa\sqrt{\frac{(\log M)^{1+\delta}}{t}}<\Delta_{\text{max}}
t>(β1)2κ2(logM)1+δΔmax2,\displaystyle\implies t>\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}},

which is a contradiction. Alternatively,

(g(αj~t)g(αi~t))μ^t<c(αj~tαi~t)ϵt\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t}<-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})-\epsilon_{t}
μ^tμ<c(αi~tαj~t)ϵtg(αj~t)g(αi~t)μ\displaystyle\implies\hat{\mu}_{t}-\mu<\frac{c(\alpha_{\tilde{i}_{t}}-\alpha_{\tilde{j}_{t}})-\epsilon_{t}}{g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}})}-\mu
κ(logM)1+δt<Δmaxϵtg(αj~t)g(αi~t)\displaystyle\implies-\kappa\sqrt{\frac{(\log M)^{1+\delta}}{t}}<\frac{\Delta_{\text{max}}-\epsilon_{t}}{g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}})}
(βg(αj~t)+g(αi~t))κ(logM)1+δt<Δmax\displaystyle\implies(\sqrt{\beta}-g(\alpha_{\tilde{j}_{t}})+g(\alpha_{\tilde{i}_{t}}))\kappa\sqrt{\frac{(\log M)^{1+\delta}}{t}}<\Delta_{\text{max}}
(β1)κ(logM)1+δt<Δmax\displaystyle\implies(\sqrt{\beta}-1)\kappa\sqrt{\frac{(\log M)^{1+\delta}}{t}}<\Delta_{\text{max}}
t>(β1)2κ2(logM)1+δΔmax2,\displaystyle\implies t>\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}},

which is also a contradiction.
Case 2 (αj~t>αi~t\alpha_{\tilde{j}_{t}}>\alpha_{\tilde{i}_{t}}): It follows that:

(g(αj~t)g(αi~t))μ^t>c(αj~tαi~t)+ϵt\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t}>-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})+\epsilon_{t}
μ^tμ<c(αj~tαi~t)ϵtg(αi~t)g(αj~t)μ\displaystyle\implies\hat{\mu}_{t}-\mu<\frac{c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})-\epsilon_{t}}{g(\alpha_{\tilde{i}_{t}})-g(\alpha_{\tilde{j}_{t}})}-\mu
κ(logM)1+δt<Δmaxϵtg(αi~t)g(αj~t)\displaystyle\implies-\kappa\sqrt{\frac{(\log M)^{1+\delta}}{t}}<\frac{\Delta_{\text{max}}-\epsilon_{t}}{g(\alpha_{\tilde{i}_{t}})-g(\alpha_{\tilde{j}_{t}})}
t>(β1)2κ2(logM)1+δΔmax2,\displaystyle\implies t>\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}},

which is a contradiction. Alternatively,

(g(αj~t)g(αi~t))μ^t<c(αj~tαi~t)ϵt\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t}<-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})-\epsilon_{t}
μ^tμ>c(αj~tαi~t)+ϵtg(αi~t)g(αj~t)μ\displaystyle\implies\hat{\mu}_{t}-\mu>\frac{c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})+\epsilon_{t}}{g(\alpha_{\tilde{i}_{t}})-g(\alpha_{\tilde{j}_{t}})}-\mu
κ(logM)1+δt>Δmax+ϵtg(αi~t)g(αj~t)\displaystyle\implies\kappa\sqrt{\frac{(\log M)^{1+\delta}}{t}}>\frac{-\Delta_{\text{max}}+\epsilon_{t}}{g(\alpha_{\tilde{i}_{t}})-g(\alpha_{\tilde{j}_{t}})}
t>(β1)2κ2(logM)1+δΔmax2,\displaystyle\implies t>\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}},

which is also a contradiction.

Therefore Ts>(β1)2κ2(logM)1+δΔmax2T_{s}>\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}}, if \mathcal{E} occurs, which happens with probability at least 1(β1)2κ2(logM)1+δM2(logM)δΔmax21-\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{M^{2(\log M)^{\delta}}\Delta_{\text{max}}^{2}}. ∎

Lemma 11.

Under the W-FTPL policy, (Ts>t)exp(Δmin2t2κ2){\mathbb{P}}(T_{s}>t)\leq\exp(-\frac{\Delta_{\text{min}}^{2}t}{2\kappa^{2}}) for t>4βκ2(logM)1+δΔmin2t>\frac{4\beta\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{min}}^{2}}.

Proof.

For t>4βκ2(logM)1+δΔmin2t>\frac{4\beta\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{min}}^{2}}, we have Δmin2>κβ(logM)1+δt\frac{\Delta_{\text{min}}}{2}>\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}. For Ts>tT_{s}>t, we have

(g(αj~t)g(αi~t))μ^t\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t} <c(αj~tαi~t)+κβ(logM)1+δt,\displaystyle<-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})+\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}},
and
(g(αj~t)g(αi~t))μ^t\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t} >c(αj~tαi~t)κβ(logM)1+δt.\displaystyle>-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})-\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}.

We consider the following cases.
Case 1 (αj~t<αi~t\alpha_{\tilde{j}_{t}}<\alpha_{\tilde{i}_{t}}, Δi~tj~t>0\Delta_{\tilde{i}_{t}\tilde{j}_{t}}>0):

(Ts>t)\displaystyle{\mathbb{P}}(T_{s}>t) (μ^t>c(αj~tαi~t)κβ(logM)1+δt(g(αj~t)g(αi~t)))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}>\frac{-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})-\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}}{(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))}\right)
(μ^tμ>Δi~tj~tκβ(logM)1+δtg(αj~t)g(αi~t))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu>\frac{\Delta_{\tilde{i}_{t}\tilde{j}_{t}}-\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}}{g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}})}\right)
(μ^tμ>ΔminΔmin/2g(αj~t)g(αi~t))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu>\frac{\Delta_{\text{min}}-\Delta_{\text{min}}/2}{g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}})}\right)
(μ^tμ>Δmin2)\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu>\frac{\Delta_{\text{min}}}{2}\right)
(a)exp(Δmin2t2κ2),\displaystyle\overset{(a)}{\leq}\exp\left(\frac{-\Delta_{\text{min}}^{2}t}{2\kappa^{2}}\right),

where (a)(a) is obtained by using Hoeffding’s inequality [27].
Case 2 (αj~t<αi~t\alpha_{\tilde{j}_{t}}<\alpha_{\tilde{i}_{t}}, Δi~tj~t<0\Delta_{\tilde{i}_{t}\tilde{j}_{t}}<0):

(Ts>t)\displaystyle{\mathbb{P}}(T_{s}>t) (μ^t<c(αj~tαi~t)+κβ(logM)1+δt(g(αj~t)g(αi~t)))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}<\frac{-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})+\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}}{(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))}\right)
(μ^tμ<Δi~tj~t+κβ(logM)1+δtg(αj~t)g(αi~t))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu<\frac{\Delta_{\tilde{i}_{t}\tilde{j}_{t}}+\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}}{g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}})}\right)
(μ^tμ<Δmin+Δmin/2g(αj~t)g(αi~t))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu<\frac{-\Delta_{\text{min}}+\Delta_{\text{min}}/2}{g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}})}\right)
(μ^tμ<Δmin2)\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu<\frac{-\Delta_{\text{min}}}{2}\right)
(a)exp(Δmin2t2κ2),\displaystyle\overset{(a)}{\leq}\exp\left(\frac{-\Delta_{\text{min}}^{2}t}{2\kappa^{2}}\right),

where (a)(a) is obtained by using Hoeffding’s inequality [27].
Case 3 (αj~t>αi~t\alpha_{\tilde{j}_{t}}>\alpha_{\tilde{i}_{t}}, Δi~tj~t>0\Delta_{\tilde{i}_{t}\tilde{j}_{t}}>0):

(Ts>t)\displaystyle{\mathbb{P}}(T_{s}>t) (μ^t<c(αj~tαi~t)κβ(logM)1+δt(g(αj~t)g(αi~t)))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}<\frac{-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})-\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}}{(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))}\right)
(μ^tμ<Δi~tj~t+κβ(logM)1+δtg(αi~t)g(αj~t))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu<\frac{-\Delta_{\tilde{i}_{t}\tilde{j}_{t}}+\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}}{g(\alpha_{\tilde{i}_{t}})-g(\alpha_{\tilde{j}_{t}})}\right)
(μ^tμ<Δmin+κβ(logM)1+δtg(αi~t)g(αj~t))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu<\frac{-\Delta_{\text{min}}+\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}}{g(\alpha_{\tilde{i}_{t}})-g(\alpha_{\tilde{j}_{t}})}\right)
(μ^tμ<Δmin/2g(αi~t)g(αj~t))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu<\frac{-\Delta_{\text{min}}/2}{g(\alpha_{\tilde{i}_{t}})-g(\alpha_{\tilde{j}_{t}})}\right)
(μ^tμ<Δmin/2)\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu<-\Delta_{\text{min}}/2\right)
(a)exp(Δmin2t2κ2),\displaystyle\overset{(a)}{\leq}\exp\left(\frac{-\Delta_{\text{min}}^{2}t}{2\kappa^{2}}\right),

where (a)(a) is obtained by using Hoeffding’s inequality [27].
Case 4 (αj~t>αi~t\alpha_{\tilde{j}_{t}}>\alpha_{\tilde{i}_{t}}, Δi~tj~t<0\Delta_{\tilde{i}_{t}\tilde{j}_{t}}<0):

(Ts>t)\displaystyle{\mathbb{P}}(T_{s}>t) (μ^t>c(αj~tαi~t)+κβ(logM)1+δt(g(αj~t)g(αi~t)))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}>\frac{-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})+\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}}{(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))}\right)
(μ^tμ>Δi~tj~tκβ(logM)1+δtg(αi~t)g(αj~t))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu>\frac{-\Delta_{\tilde{i}_{t}\tilde{j}_{t}}-\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}}{g(\alpha_{\tilde{i}_{t}})-g(\alpha_{\tilde{j}_{t}})}\right)
(μ^tμ>ΔminΔmin/2g(αi~t)g(αj~t))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu>\frac{\Delta_{\text{min}}-\Delta_{\text{min}}/2}{g(\alpha_{\tilde{i}_{t}})-g(\alpha_{\tilde{j}_{t}})}\right)
(μ^tμ>Δmin2)\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu>\frac{\Delta_{\text{min}}}{2}\right)
(a)exp(Δmin2t2κ2),\displaystyle\overset{(a)}{\leq}\exp\left(\frac{-\Delta_{\text{min}}^{2}t}{2\kappa^{2}}\right),

where (a)(a) is obtained by using Hoeffding’s inequality [27]. The result follows by combining these cases. ∎
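All four cases reduce to the same deviation event for the empirical mean. As a sanity check, the sketch below compares the empirical probability of the event μ̂_t − μ > Δmin/2 for i.i.d. requests supported on [0, κ] against the Hoeffding bound exp(−Δmin²t/(2κ²)); the uniform request distribution and the values of κ, Δmin, and t are illustrative assumptions, not quantities fixed by the model.

```python
import math
import random

def hoeffding_check(kappa=4.0, delta_min=1.0, t=100, trials=5000, seed=0):
    """Compare the empirical probability of {mu_hat - mu > delta_min/2}
    for i.i.d. Uniform[0, kappa] samples with the Hoeffding bound
    exp(-delta_min^2 * t / (2 * kappa^2))."""
    rng = random.Random(seed)
    mu = kappa / 2.0  # mean of Uniform[0, kappa]
    exceed = 0
    for _ in range(trials):
        mu_hat = sum(rng.uniform(0, kappa) for _ in range(t)) / t
        if mu_hat - mu > delta_min / 2.0:
            exceed += 1
    empirical = exceed / trials
    bound = math.exp(-delta_min**2 * t / (2 * kappa**2))
    return empirical, bound
```

With these illustrative parameters the empirical frequency is far below the bound, as the sub-Gaussian concentration predicts.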

Lemma 12.

Under the W-FTPL policy

𝔼[Ts]1+4βκ2(logM)1+δΔmin2+1M2β(logM)δ2κ2Δmin2.{\mathbb{E}}[T_{s}]\leq 1+\frac{4\beta\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{min}}^{2}}+\frac{1}{M^{2\beta(\log M)^{\delta}}}\frac{2\kappa^{2}}{\Delta_{\text{min}}^{2}}.
Proof.

Let T1=4βκ2(logM)1+δΔmin2T_{1}=\lceil\frac{4\beta\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{min}}^{2}}\rceil. It follows that

𝔼[Ts]\displaystyle{\mathbb{E}}[T_{s}] =t=1T(Ts>t)\displaystyle=\sum_{t=1}^{T}{\mathbb{P}}(T_{s}>t)
=t=1T1(Ts>t)+t=T1+1T(Ts>t)\displaystyle=\sum_{t=1}^{T_{1}}{\mathbb{P}}(T_{s}>t)+\sum_{t=T_{1}+1}^{T}{\mathbb{P}}(T_{s}>t)
(a)T1+t=T1Texp(Δmin2t2κ2)\displaystyle\overset{(a)}{\leq}T_{1}+\sum_{t=T_{1}}^{T}\exp\left(-\frac{\Delta_{\text{min}}^{2}t}{2\kappa^{2}}\right)
1+4βκ2(logM)1+δΔmin2+1M2β(logM)δ2κ2Δmin2,\displaystyle\leq 1+\frac{4\beta\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{min}}^{2}}+\frac{1}{M^{2\beta(\log M)^{\delta}}}\frac{2\kappa^{2}}{\Delta_{\text{min}}^{2}},

where (a)(a) is obtained by using Lemma 11. ∎
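The last step sums the exponential tail from Lemma 11 as a geometric series. A small numerical sketch (Δmin, κ, and the starting index T₁ below are illustrative, not model values) checks the two bounds involved, with a = Δmin²/(2κ²): the finite tail sum is dominated by the geometric closed form, which in turn is dominated by the slightly looser (1 + 1/a)·exp(−aT₁) since 1 − e⁻ᵃ ≥ a/(1 + a).

```python
import math

def tail_sum_bound(delta_min=1.0, kappa=2.0, t1=50, horizon=100000):
    """Sum of exp(-a t) for t >= t1, with a = delta_min^2 / (2 kappa^2),
    against the geometric bound exp(-a t1) / (1 - exp(-a)) and the looser
    closed form (1 + 1/a) exp(-a t1)."""
    a = delta_min**2 / (2 * kappa**2)
    tail = sum(math.exp(-a * t) for t in range(t1, horizon))
    geometric = math.exp(-a * t1) / (1 - math.exp(-a))
    closed_form = (1 + 1 / a) * math.exp(-a * t1)
    return tail, geometric, closed_form
```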

Proof of Theorem 2(c).

Under W-FTPL,

𝔼r[𝒞W-FTPL(T,r)]=\displaystyle{\mathbb{E}}_{r}[\mathcal{C}^{\text{W-FTPL}}(T,r)]= 𝔼r[Ts]+𝔼r[t=Ts+1T(cρt+g(ρt)rt)]\displaystyle{\mathbb{E}}_{r}[T_{s}]+{\mathbb{E}}_{r}\left[\sum_{t=T_{s}+1}^{T}(c\rho_{t}+g(\rho_{t})r_{t})\right]
+M𝔼r[t=Ts+1T(ρt>ρt1)].\displaystyle+M{\mathbb{E}}_{r}\left[\sum_{t=T_{s}+1}^{T}{\mathbb{P}}(\rho_{t}>\rho_{t-1})\right].

Using the definition of regret, we get

SW-FTPL(T)\displaystyle\mathcal{R}^{\text{W-FTPL}}_{S}(T)\leq 𝔼[Ts]+t=1T(cρt+μg(ρt))μi\displaystyle{\mathbb{E}}[T_{s}]+\sum_{t=1}^{T}(c\rho_{t}+\mu g(\rho_{t}))-\mu_{i^{\star}}
+M𝔼r[t=Ts+1T(ρt>ρt1)].\displaystyle+M{\mathbb{E}}_{r}\left[\sum_{t=T_{s}+1}^{T}{\mathbb{P}}(\rho_{t}>\rho_{t-1})\right].

By Lemma 7, we get

SW-FTPL(T)\displaystyle\mathcal{R}^{\text{W-FTPL}}_{S}(T)\leq 𝔼[Ts]+16α2+4κ2Δmin\displaystyle{\mathbb{E}}[T_{s}]+\frac{16\alpha^{2}+4\kappa^{2}}{\Delta_{\text{min}}}
+(2logK+22h1logKΔmin)(α+4κ2α)\displaystyle+\left(\sqrt{2\log K}+\frac{2\sqrt{2h_{1}}\log K}{\Delta_{\text{min}}}\right)\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right)
+M𝔼r[t=Ts+1T(ρt>ρt1)].\displaystyle+M{\mathbb{E}}_{r}\left[\sum_{t=T_{s}+1}^{T}{\mathbb{P}}(\rho_{t}>\rho_{t-1})\right].

Let 𝔼r[𝒞f]=M𝔼r[t=Ts+1T(ρt>ρt1)]{\mathbb{E}}_{r}[\mathcal{C}_{f}]=M{\mathbb{E}}_{r}\left[\sum_{t=T_{s}+1}^{T}{\mathbb{P}}(\rho_{t}>\rho_{t-1})\right], then

𝔼r[𝒞f]=\displaystyle{\mathbb{E}}_{r}[\mathcal{C}_{f}]= 𝔼r[𝒞f|TsT0](TsT0)\displaystyle{\mathbb{E}}_{r}[\mathcal{C}_{f}|T_{s}\leq T_{0}]{\mathbb{P}}(T_{s}\leq T_{0})
+𝔼r[𝒞f|Ts>T0](Ts>T0)\displaystyle+{\mathbb{E}}_{r}[\mathcal{C}_{f}|T_{s}>T_{0}]{\mathbb{P}}(T_{s}>T_{0})
\displaystyle\leq 𝔼[𝒞f|Ts=1](TsT0)+𝔼[𝒞f|Ts=T0]\displaystyle{\mathbb{E}}[\mathcal{C}_{f}|T_{s}=1]{\mathbb{P}}(T_{s}\leq T_{0})+{\mathbb{E}}[\mathcal{C}_{f}|T_{s}=\lceil T_{0}\rceil]
(a)\displaystyle\overset{(a)}{\leq} M(TsT0)𝔼[𝒞FFTPL]\displaystyle M{\mathbb{P}}(T_{s}\leq T_{0}){\mathbb{E}}[\mathcal{C}_{F}^{\text{FTPL}}]
+Mt=T0T(exp(Δmin2t16α2)+eΔmin2t/2κ2)\displaystyle+M\sum_{t=\lceil T_{0}\rceil}^{T}\left(\exp\left(-\frac{\Delta_{\text{min}}^{2}t}{16\alpha^{2}}\right)+e^{-\Delta_{\text{min}}^{2}t/2\kappa^{2}}\right)
\displaystyle\leq M(TsT0)16α2+3κ2Δmin2\displaystyle M{\mathbb{P}}(T_{s}\leq T_{0})\frac{16\alpha^{2}+3\kappa^{2}}{\Delta_{\text{min}}^{2}}
+16α2Δmin2exp(Δmin2T016α2)\displaystyle+\frac{16\alpha^{2}}{\Delta_{\text{min}}^{2}}\exp\left(-\frac{\Delta_{\text{min}}^{2}T_{0}}{16\alpha^{2}}\right)
+2κ2Δmin2exp(Δmin2T02κ2).\displaystyle+\frac{2\kappa^{2}}{\Delta_{\text{min}}^{2}}\exp\left(-\frac{\Delta_{\text{min}}^{2}T_{0}}{2\kappa^{2}}\right).

Here, (a)(a) is obtained by using inequality (14). Setting T0=(β1)2κ2(logM)1+δΔmax2T_{0}=\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}} and using Lemmas 10 and 12, we get

SW-FTPL\displaystyle\mathcal{R}^{\text{W-FTPL}}_{S} (T)\displaystyle(T)
\displaystyle\leq 1+4βκ2(logM)1+δΔmin2+1M2β(logM)δ2κ2Δmin2\displaystyle 1+\frac{4\beta\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{min}}^{2}}+\frac{1}{M^{2\beta(\log M)^{\delta}}}\frac{2\kappa^{2}}{\Delta_{\text{min}}^{2}}
+(16α2+4κ2)Δmin\displaystyle+\frac{(16\alpha^{2}+4\kappa^{2})}{\Delta_{\text{min}}}
+(2logK+22h1logKΔmin)(α+4κ2α)\displaystyle+\left(\sqrt{2\log K}+\frac{2\sqrt{2h_{1}}\log K}{\Delta_{\text{min}}}\right)\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right)
+M(β1)2κ2(logM)1+δΔmax2M2(logM)δ16α2+3κ2Δmin2\displaystyle+M\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}M^{2(\log M)^{\delta}}}\frac{16\alpha^{2}+3\kappa^{2}}{\Delta_{\text{min}}^{2}}
+(16α2MM(β1)2κ2Δmin2(logM)δ16α2Δmax2+2Mκ2M(β1)2Δmin2(logM)δ2Δmax2)\displaystyle+\left(\frac{16\alpha^{2}M}{M^{\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}\Delta_{\text{min}}^{2}(\log M)^{\delta}}{16\alpha^{2}\Delta_{\text{max}}^{2}}}}+\frac{2M\kappa^{2}}{M^{\frac{(\sqrt{\beta}-1)^{2}\Delta_{\text{min}}^{2}(\log M)^{\delta}}{2\Delta_{\text{max}}^{2}}}}\right)
×1Δmin2.\displaystyle\times\frac{1}{\Delta_{\text{min}}^{2}}.

For large values of MM we have

SW-FTPL\displaystyle\mathcal{R}^{\text{W-FTPL}}_{S} (T)\displaystyle(T)
\displaystyle\leq 1+4βκ2(logM)1+δΔmin2+2κ2Δmin2\displaystyle 1+\frac{4\beta\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{min}}^{2}}+\frac{2\kappa^{2}}{\Delta_{\text{min}}^{2}}
+(16α2+4κ2)Δmin\displaystyle+\frac{(16\alpha^{2}+4\kappa^{2})}{\Delta_{\text{min}}}
+(2logK+22h1logKΔmin)(α+4κ2α)\displaystyle+\left(\sqrt{2\log K}+\frac{2\sqrt{2h_{1}}\log K}{\Delta_{\text{min}}}\right)\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right)
+(β1)2κ2(logM)1+δΔmax216α2+3κ2Δmin2\displaystyle+\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}}\frac{16\alpha^{2}+3\kappa^{2}}{\Delta_{\text{min}}^{2}}
+16α2+2κ2Δmin2\displaystyle+\frac{16\alpha^{2}+2\kappa^{2}}{\Delta_{\text{min}}^{2}}
\displaystyle\leq 1+βκ2(logM)1+δ(4Δmin2+16α2+3κ2Δmin2Δmax2)\displaystyle 1+\beta\kappa^{2}(\log M)^{1+\delta}\left(\frac{4}{\Delta_{\text{min}^{2}}}+\frac{16\alpha^{2}+3\kappa^{2}}{\Delta_{\text{min}^{2}}\Delta_{\text{max}}^{2}}\right)
+(16α2+4κ2)(1Δmin+1Δmin2)\displaystyle+(16\alpha^{2}+4\kappa^{2})\left(\frac{1}{\Delta_{\text{min}}}+\frac{1}{\Delta_{\text{min}^{2}}}\right)
+(2logK+22h1logKΔmin)(α+4κ2α).\displaystyle+\left(\sqrt{2\log K}+\frac{2\sqrt{2h_{1}}\log K}{\Delta_{\text{min}}}\right)\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right).

6.9 Proof of Theorem 3(a)

Lemma 13.

The cost of the offline optimal policy is lower bounded as follows

𝒞OPT-OFF(T,r)RTκ2(minicαi+g(αi)κ).\mathcal{C}^{\text{OPT-OFF}}(T,r)\geq\frac{R_{T}}{\kappa^{2}}(\min_{i}{c\alpha_{i}+g(\alpha_{i})\kappa}).
Proof.

The cost incurred by the optimal offline policy with M>0M>0 is lower bounded by its cost when M=0M=0, since a positive MM only adds fetch cost. Therefore,

𝒞OPT-OFF(T,r)\displaystyle\mathcal{C}^{\text{OPT-OFF}}(T,r) t=1Tmini(cαi+g(αi)rt)\displaystyle\geq\sum_{t=1}^{T}\min_{i}(c\alpha_{i}+g(\alpha_{i})r_{t})
t=1Tmini(cαi/κ+g(αi)rt)\displaystyle\geq\sum_{t=1}^{T}\min_{i}(c\alpha_{i}/\kappa+g(\alpha_{i})r_{t})
t=1Tmini(cαi/κ+g(αi))𝟙{rt1}\displaystyle\geq\sum_{t=1}^{T}\min_{i}(c\alpha_{i}/\kappa+g(\alpha_{i}))\mathds{1}_{\{r_{t}\geq 1\}}
t=1Tmini(cαi/κ+g(αi))rt/κ\displaystyle\geq\sum_{t=1}^{T}\min_{i}(c\alpha_{i}/\kappa+g(\alpha_{i}))r_{t}/\kappa
RTκ2(minicαi+g(αi)κ).\displaystyle\geq\frac{R_{T}}{\kappa^{2}}(\min_{i}{c\alpha_{i}+g(\alpha_{i})\kappa}). ∎
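The chain of inequalities above can be verified numerically on a random instance. In the sketch below, the fraction set {α_i}, the choice g(α) = 1 − α, the rent cost c, and the request distribution on {0, …, κ} are all illustrative assumptions; the script checks each inequality and the final identity Σ_t min_i(cα_i/κ + g(α_i))·r_t/κ = (R_T/κ²)·min_i(cα_i + g(α_i)κ).

```python
import random

def check_lower_bound_chain(T=100, c=1.0, kappa=5, seed=1):
    """Numerically verify each step of the lower bound on the offline
    optimal cost, for illustrative alphas and g(alpha) = 1 - alpha."""
    rng = random.Random(seed)
    alphas = [0.0, 0.25, 0.5, 1.0]
    g = lambda a: 1.0 - a  # decreasing, g(0) = 1, g(1) = 0 (assumption)
    r = [rng.randint(0, kappa) for _ in range(T)]  # r_t <= kappa
    R_T = sum(r)
    s1 = sum(min(c * a + g(a) * rt for a in alphas) for rt in r)
    s2 = sum(min(c * a / kappa + g(a) * rt for a in alphas) for rt in r)
    s3 = sum(min(c * a / kappa + g(a) for a in alphas) * (1 if rt >= 1 else 0)
             for rt in r)
    s4 = sum(min(c * a / kappa + g(a) for a in alphas) * rt / kappa for rt in r)
    s5 = (R_T / kappa**2) * min(c * a + g(a) * kappa for a in alphas)
    return s1, s2, s3, s4, s5
```

Each step relies on r_t ≤ κ: in particular, r_t/κ ≤ 1 whenever r_t ≥ 1, which justifies the indicator-to-r_t/κ step.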

Lemma 14.

The expected cost of the FTPL policy with ηt=αt1\eta_{t}=\alpha\sqrt{t-1}, α>0\alpha>0, for any request sequence rr can be upper bounded as follows:

𝔼[𝒞FTPL(T,r)]((3+2M/c)maxαi01g(αi)αi)RT\displaystyle{\mathbb{E}}[\mathcal{C}^{\text{FTPL}}(T,r)]\leq\left((3+2M/c)\max_{\alpha_{i}\neq 0}\frac{1-g(\alpha_{i})}{\alpha_{i}}\right)R_{T}
+(c+M)i=2K16α2c2αi2.\displaystyle+(c+M)\sum_{i=2}^{K}\frac{16\alpha^{2}}{c^{2}\alpha_{i}^{2}}.
Proof.

FTPL does not host any fraction of the service in time slot t+1t+1 if cαit+g(αi)Rt+ηt+1γi>Rt+ηt+1γ1c\alpha_{i}t+g(\alpha_{i})R_{t}+\eta_{t+1}\gamma_{i}>R_{t}+\eta_{t+1}\gamma_{1} for all i1i\neq 1. The probability ph,tp_{h,t} of hosting a nonzero fraction of the service in time slot t+1t+1 can therefore be bounded as

ph,t\displaystyle p_{h,t} i=2K(cαit+g(αi)Rt+ηt+1γi<Rt+ηt+1γ1)\displaystyle\leq\sum_{i=2}^{K}{\mathbb{P}}(c\alpha_{i}t+g(\alpha_{i})R_{t}+\eta_{t+1}\gamma_{i}<R_{t}+\eta_{t+1}\gamma_{1})
=i=2K(γ1γi2>cαit+g(αi)RtRt2ηt+1)\displaystyle=\sum_{i=2}^{K}{\mathbb{P}}\left(\frac{\gamma_{1}-\gamma_{i}}{\sqrt{2}}>\frac{c\alpha_{i}t+g(\alpha_{i})R_{t}-R_{t}}{\sqrt{2}\eta_{t+1}}\right)
=i=2K(γ1γi2>cαit(1g(αi))Rt2ηt+1).\displaystyle=\sum_{i=2}^{K}{\mathbb{P}}\left(\frac{\gamma_{1}-\gamma_{i}}{\sqrt{2}}>\frac{c\alpha_{i}t-(1-g(\alpha_{i}))R_{t}}{\sqrt{2}\eta_{t+1}}\right).

Let T=2RTcmaxαi01g(αi)αiT^{\prime}=\frac{2R_{T}}{c}\max_{\alpha_{i}\neq 0}\frac{1-g(\alpha_{i})}{\alpha_{i}}. If tTt\geq T^{\prime}, then RtRTct2minαi0αi1g(αi)R_{t}\leq R_{T}\leq\frac{ct}{2}\min_{\alpha_{i}\neq 0}\frac{\alpha_{i}}{1-g(\alpha_{i})}. Therefore, for tTt\geq T^{\prime}, we have Rtcαit2(1g(αi))R_{t}\leq\frac{c\alpha_{i}t}{2(1-g(\alpha_{i}))} for all αi0\alpha_{i}\neq 0, and we get

ph,t\displaystyle p_{h,t} i=2K(γ1γi2>cαitcαit/22ηt+1)\displaystyle\leq\sum_{i=2}^{K}{\mathbb{P}}\left(\frac{\gamma_{1}-\gamma_{i}}{\sqrt{2}}>\frac{c\alpha_{i}t-c\alpha_{i}t/2}{\sqrt{2}\eta_{t+1}}\right)
i=2Kexp(c2αi2t216ηt+12)\displaystyle\leq\sum_{i=2}^{K}\exp\left(-\frac{c^{2}\alpha_{i}^{2}t^{2}}{16\eta_{t+1}^{2}}\right)
=i=2Kexp(c2αi2t16α2).\displaystyle=\sum_{i=2}^{K}\exp\left(-\frac{c^{2}\alpha_{i}^{2}t}{16\alpha^{2}}\right). (15)

Therefore, the expected cost under FTPL can be bounded as

𝔼[𝒞FTPL\displaystyle{\mathbb{E}}[\mathcal{C}^{\text{FTPL}} (T,r)]\displaystyle(T,r)]
\displaystyle\leq t=1T(c+M)ph,t+rt\displaystyle\sum_{t=1}^{T}(c+M)p_{h,t}+r_{t}
=\displaystyle= RT+t=1T(c+M)ph,t\displaystyle R_{T}+\sum_{t=1}^{T}(c+M)p_{h,t}
\displaystyle\leq RT+(c+M)T+(c+M)t=TTph,t\displaystyle R_{T}+(c+M)T^{\prime}+(c+M)\sum_{t=\lceil T^{\prime}\rceil}^{T}p_{h,t}
(a)\displaystyle\overset{(a)}{\leq} RT+(c+M)T\displaystyle R_{T}+(c+M)T^{\prime}
+(c+M)t=TTi=2Kexp(c2αi2t16α2)\displaystyle+(c+M)\sum_{t=\lceil T^{\prime}\rceil}^{T}\sum_{i=2}^{K}\exp\left(-\frac{c^{2}\alpha_{i}^{2}t}{16\alpha^{2}}\right)
\displaystyle\leq ((3+2M/c)maxαi01g(αi)αi)RT\displaystyle\left((3+2M/c)\max_{\alpha_{i}\neq 0}\frac{1-g(\alpha_{i})}{\alpha_{i}}\right)R_{T}
+(c+M)i=2K16α2c2αi2.\displaystyle+(c+M)\sum_{i=2}^{K}\frac{16\alpha^{2}}{c^{2}\alpha_{i}^{2}}.

Here, (a)(a) follows from (15). If T<TT<T^{\prime}, the cost is at most RT+(c+M)TR_{T}+(c+M)T^{\prime}, so the bound still holds. Therefore, the bound holds for all TT. ∎
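The step from the perturbation probability to (15) uses the standard normal tail bound P(Z > x) ≤ exp(−x²/2), assuming (as the derivation suggests) that the perturbations γ_i are i.i.d. standard Gaussians, so that (γ₁ − γ_i)/√2 is standard normal. A quick check of that tail bound at a few points, via the complementary error function:

```python
import math

def gaussian_tail_vs_bound(xs=(0.5, 1.0, 2.0, 3.0)):
    """Standard normal tail P(Z > x) = erfc(x / sqrt(2)) / 2 versus the
    Chernoff-type bound exp(-x^2 / 2), valid for x >= 0."""
    out = []
    for x in xs:
        tail = 0.5 * math.erfc(x / math.sqrt(2))
        out.append((tail, math.exp(-x**2 / 2)))
    return out
```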

Proof of Theorem 3(a).

By using Lemma 13, we have

σAFTPL\displaystyle\sigma_{A}^{\text{FTPL}}\leq supr𝔼[𝒞FTPL(T,r)](minicαi+g(αi)κ)RT/κ2\displaystyle\sup_{r}\frac{{\mathbb{E}}[\mathcal{C}^{\text{FTPL}}(T,r)]}{(\min_{i}{c\alpha_{i}+g(\alpha_{i})\kappa})R_{T}/\kappa^{2}}
(a)\displaystyle\overset{(a)}{\leq} κ2(3+2M/c)mini(cαi+g(αi)κ)maxαi01g(αi)αi\displaystyle\frac{\kappa^{2}(3+2M/c)}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}\max_{\alpha_{i}\neq 0}\frac{1-g(\alpha_{i})}{\alpha_{i}}
+suprκ2(M+c)RTmini(cαi+g(αi)κ)i=2K16α2c2αi2\displaystyle+\sup_{r}\frac{\kappa^{2}(M+c)}{R_{T}\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}\sum_{i=2}^{K}\frac{16\alpha^{2}}{c^{2}\alpha_{i}^{2}}
\displaystyle\leq κ2(3+2M/c)mini(cαi+g(αi)κ)maxαi01g(αi)αi\displaystyle\frac{\kappa^{2}(3+2M/c)}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}\max_{\alpha_{i}\neq 0}\frac{1-g(\alpha_{i})}{\alpha_{i}}
+κ2(M+c)mini(cαi+g(αi)κ)i=2K16α2c2αi2.\displaystyle+\frac{\kappa^{2}(M+c)}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}\sum_{i=2}^{K}\frac{16\alpha^{2}}{c^{2}\alpha_{i}^{2}}.

Here (a)(a) is obtained by using Lemma 14. ∎

6.10 Proof of Theorem 3(b)

Proof of Theorem 3(b).

By using Lemma 13, we have

σAW-FTPL\displaystyle\sigma^{\text{W-FTPL}}_{A}\leq suprRTs+𝔼[𝒞AFTPL(T,r)]RT(mini{cαi+g(αi)κ})/κ2\displaystyle\sup_{r}\frac{R_{T_{s}}+{\mathbb{E}}[\mathcal{C}_{A}^{\text{FTPL}}(T,r)]}{R_{T}(\min_{i}\{c\alpha_{i}+g(\alpha_{i})\kappa\})/\kappa^{2}}
(a)\displaystyle\overset{(a)}{\leq} κ2(3+2M/c)mini(cαi+g(αi)κ)maxαi01g(αi)αi\displaystyle\frac{\kappa^{2}(3+2M/c)}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}\max_{\alpha_{i}\neq 0}\frac{1-g(\alpha_{i})}{\alpha_{i}}
+κ2(M+c)mini(cαi+g(αi)κ)i=2K16α2c2αi2\displaystyle+\frac{\kappa^{2}(M+c)}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}\sum_{i=2}^{K}\frac{16\alpha^{2}}{c^{2}\alpha_{i}^{2}}
+κ2mini(cαi+g(αi)κ).\displaystyle+\frac{\kappa^{2}}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}.

Here (a)(a) is obtained by using Theorem 3(a) and the fact that RTsRTR_{T_{s}}\leq R_{T}. ∎

6.11 Proof of Theorem 4(a)

Proof of Theorem 4(a).
σSRR(T)\displaystyle\sigma^{\text{RR}}_{S}(T) =𝔼[𝒞RR(T,r)]mini{cαiT+g(αi)μT+Mαi}\displaystyle=\frac{{\mathbb{E}}[\mathcal{C}^{\text{RR}}(T,r)]}{\min_{i}\{c\alpha_{i}T+g(\alpha_{i})\mu T+M\alpha_{i}\}}
=SRR(T)+mini{cαiT+g(αi)μT+Mαi}mini{cαiT+g(αi)μT+Mαi}\displaystyle=\frac{\mathcal{R}^{\text{RR}}_{S}(T)+\min_{i}\{c\alpha_{i}T+g(\alpha_{i})\mu T+M\alpha_{i}\}}{\min_{i}\{c\alpha_{i}T+g(\alpha_{i})\mu T+M\alpha_{i}\}}
1+RRS(T)μiT.\displaystyle\leq 1+\frac{\mathcal{R}^{\text{RR}}_{S}(T)}{\mu_{i^{\star}}T}.

By using Theorem 2 we have,

σSRR(T)1+Mμi(1Mκcαig(αi)κ+Mc+21T)pd.\displaystyle\sigma^{\text{RR}}_{S}(T)\geq 1+\frac{M}{\mu_{i^{\star}}}\left(\frac{1}{\frac{M}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}+\frac{M}{c}+2}-\frac{1}{T}\right)pd.

For large TT we get,

σSRR(T)>1.\displaystyle\sigma^{\text{RR}}_{S}(T)>1.

6.12 Proof of Theorem 4(b)

Proof of Theorem 4(b).

By using Theorem 2 we have

σSFTPL(T)\displaystyle\sigma^{\text{FTPL}}_{S}(T) 1+1μiT(16α2+1)(MΔmin2+ii1Δi)\displaystyle\leq 1+\frac{1}{\mu_{i^{\star}}T}(16\alpha^{2}+1)\left(\frac{M}{\Delta_{\text{min}}^{2}}+\sum_{i\neq i^{\star}}\frac{1}{\Delta_{i}}\right)
=1+O(1/T).\displaystyle=1+\mathrm{O}(1/T).

6.13 Proof of Theorem 4(c)

Proof of Theorem 4(c).

By Theorem 2, SW-FTPL(T)\mathcal{R}^{\text{W-FTPL}}_{S}(T) is bounded by a constant independent of TT, and

σSW-FTPL(T)1+SW-FTPL(T)Tμi.\displaystyle\sigma^{\text{W-FTPL}}_{S}(T)\leq 1+\frac{\mathcal{R}^{\text{W-FTPL}}_{S}(T)}{T\mu_{i^{\star}}}.

Therefore we get,

σSW-FTPL(T)1+O(1/T).\sigma^{\text{W-FTPL}}_{S}(T)\leq 1+\mathrm{O}(1/T).

References

  • [1] R. S. Prakash, N. Karamchandani, V. Kavitha, S. Moharir, Partial service caching at the edge, in: 2020 18th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOPT), IEEE, 2020, pp. 1–8.
  • [2] V. S. C. L. Narayana, M. Agarwala, R. S. Prakash, N. Karamchandani, S. Moharir, Online partial service hosting at the edge, CoRR abs/2103.00555 (2021). arXiv:2103.00555.
    URL https://arxiv.org/abs/2103.00555
  • [3] V. C. L. Narayana, S. Moharir, N. Karamchandani, On renting edge resources for service hosting, ACM Transactions on Modeling and Performance Evaluation of Computing Systems 6 (2) (2021) 1–30.
  • [4] T. Zhao, I.-H. Hou, S. Wang, K. Chan, Red/led: An asymptotically optimal and scalable online algorithm for service caching at the edge, IEEE Journal on Selected Areas in Communications 36 (8) (2018) 1857–1870.
  • [5] T. L. Lai, H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics 6 (1) (1985) 4–22.
  • [6] G. S. Paschos, A. Destounis, L. Vigneri, G. Iosifidis, Learning to cache with no regrets, in: IEEE INFOCOM 2019-IEEE Conference on Computer Communications, IEEE, 2019, pp. 235–243.
  • [7] R. Bhattacharjee, S. Banerjee, A. Sinha, Fundamental limits on the regret of online network-caching, Proceedings of the ACM on Measurement and Analysis of Computing Systems 4 (2) (2020) 1–31.
  • [8] C. Puliafito, E. Mingozzi, F. Longo, A. Puliafito, O. Rana, Fog computing for the internet of things: A survey, ACM Trans. Internet Technol. 19 (2) (2019) 18:1–18:41. doi:10.1145/3301443.
    URL http://doi.acm.org/10.1145/3301443
  • [9] Y. Mao, C. You, J. Zhang, K. Huang, K. B. Letaief, A survey on mobile edge computing: The communication perspective, IEEE Communications Surveys & Tutorials 19 (4) (2017) 2322–2358.
  • [10] P. Mach, Z. Becvar, Mobile edge computing: A survey on architecture and computation offloading, IEEE Communications Surveys & Tutorials 19 (3) (2017) 1628–1656.
  • [11] S. Pasteris, S. Wang, M. Herbster, T. He, Service placement with provable guarantees in heterogeneous edge computing systems, in: IEEE INFOCOM 2019-IEEE Conference on Computer Communications, IEEE, 2019, pp. 514–522.
  • [12] S. Bi, L. Huang, Y. A. Zhang, Joint optimization of service caching placement and computation offloading in mobile edge computing systems, IEEE Transactions on Wireless Communications 19 (7) (2020) 4947–4963. doi:10.1109/TWC.2020.2988386.
  • [13] O. Ascigil, A. Tasiopoulos, T. K. Phan, V. Sourlas, I. Psaras, G. Pavlou, Resource provisioning and allocation in function-as-a-service edge-clouds, IEEE Transactions on Services Computing (2021) 1–1. doi:10.1109/TSC.2021.3052139.
  • [14] H. Tan, S. H.-C. Jiang, Z. Han, M. Li, Asymptotically optimal online caching on multiple caches with relaying and bypassing, IEEE/ACM Transactions on Networking (2021).
  • [15] X. Xia, F. Chen, Q. He, J. Grundy, M. Abdelrazek, H. Jin, Online collaborative data caching in edge computing, IEEE Transactions on Parallel and Distributed Systems 32 (2) (2020) 281–294.
  • [16] T. S. Salem, G. Neglia, S. Ioannidis, No-regret caching via online mirror descent, in: ICC 2021-IEEE International Conference on Communications, IEEE, 2021, pp. 1–6.
  • [17] S. Mukhopadhyay, A. Sinha, Online caching with optimal switching regret, in: 2021 IEEE International Symposium on Information Theory (ISIT), IEEE, 2021, pp. 1546–1551.
  • [18] S. Fan, I.-H. Hou, V. S. Mai, L. Benmohamed, Online service caching and routing at the edge with unknown arrivals (2022).
  • [19] S. Wang, R. Urgaonkar, M. Zafer, T. He, K. Chan, K. K. Leung, Dynamic service migration in mobile edge-clouds, in: 2015 IFIP Networking Conference (IFIP Networking), IEEE, 2015, pp. 1–9.
  • [20] G. Xiong, R. Singh, J. Li, Learning augmented index policy for optimal service placement at the network edge, arXiv preprint arXiv:2101.03641 (2021).
  • [21] D. Zeng, L. Gu, S. Pan, J. Cai, S. Guo, Resource management at the network edge: A deep reinforcement learning approach, IEEE Network 33 (3) (2019) 26–33.
  • [22] L. Chen, J. Xu, Budget-constrained edge service provisioning with demand estimation via bandit learning, arXiv preprint arXiv:1903.09080 (2019).
  • [23] W. He, D. He, Y. Huang, Y. Zhang, Y. Xu, G. Yun-feng, W. Zhang, Bandit learning-based service placement and resource allocation for mobile edge computing, in: 2020 IEEE 31st Annual International Symposium on Personal, Indoor and Mobile Radio Communications, IEEE, 2020, pp. 1–6.
  • [24] A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, D. H. Epema, The grid workloads archive, Future Generation Computer Systems 24 (7) (2008) 672–686.
  • [25] A. Cohen, T. Hazan, Following the perturbed leader for online structured learning, in: International Conference on Machine Learning, PMLR, 2015, pp. 1034–1042.
  • [26] J. Abernethy, C. Lee, A. Sinha, A. Tewari, Online linear optimization via smoothing, in: Conference on Learning Theory, PMLR, 2014, pp. 807–823.
  • [27] W. Hoeffding, Probability inequalities for sums of bounded random variables, in: The collected works of Wassily Hoeffding, Springer, 1994, pp. 409–426.
  • [28] D. Berend, A. Kontorovich, A sharp estimate of the binomial mean absolute deviation with applications, Statistics & Probability Letters 83 (4) (2013) 1254–1259.
  • [29] H. Robbins, A remark on Stirling’s formula, The American Mathematical Monthly 62 (1) (1955) 26–29.

7 Appendix

Proof of Lemma 1.

For i.i.d. stochastic arrivals, we have

S𝒫\displaystyle\mathcal{R}^{\mathcal{P}}_{S} (T)\displaystyle(T)
=𝔼𝒫,r[𝒞𝒫(T,r)]mini{cαiT+g(αi)μT+Mαi}\displaystyle={\mathbb{E}}_{\mathcal{P},r}[\mathcal{C}^{\mathcal{P}}(T,r)]-\min_{i}\{c\alpha_{i}T+g(\alpha_{i})\mu T+M\alpha_{i}\}
=𝔼𝒫,r[𝒞𝒫(T,r)]mini{𝔼r[cαiT+g(αi)RT+Mαi]}\displaystyle={\mathbb{E}}_{\mathcal{P},r}[\mathcal{C}^{\mathcal{P}}(T,r)]-\min_{i}\{{\mathbb{E}}_{r}[c\alpha_{i}T+g(\alpha_{i})R_{T}+M\alpha_{i}]\}
(a)𝔼𝒫,r[𝒞𝒫(T,r)]𝔼r[mini{cαiT+g(αi)RT+Mαi}]\displaystyle\overset{(a)}{\leq}{\mathbb{E}}_{\mathcal{P},r}[\mathcal{C}^{\mathcal{P}}(T,r)]-{\mathbb{E}}_{r}[\min_{i}\{c\alpha_{i}T+g(\alpha_{i})R_{T}+M\alpha_{i}\}]
=𝔼r[A(T,r)]supr(A𝒫(T,r))=A𝒫(T),\displaystyle={\mathbb{E}}_{r}[\mathcal{R}_{A}(T,r)]\leq\sup_{r}\left(\mathcal{R}^{\mathcal{P}}_{A}(T,r)\right)=\mathcal{R}_{A}^{\mathcal{P}}(T),

where (a)(a) is obtained by Jensen’s inequality. ∎
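Step (a) exchanges min and expectation: the minimum of expectations dominates the expectation of the minimum, since the pointwise minimum is concave. A tiny numerical sketch with two hypothetical per-slot cost curves of the request count (illustrative choices, not the paper's cost functions):

```python
import random

def min_of_means_vs_mean_of_min(trials=10000, seed=2):
    """Check min_i E[f_i(X)] >= E[min_i f_i(X)] on an empirical sample,
    for two illustrative cost curves of a random request count."""
    rng = random.Random(seed)
    f1 = lambda x: 0.5 * x + 1.0  # hypothetical cost curve 1
    f2 = lambda x: 2.0 * x        # hypothetical cost curve 2
    xs = [rng.randint(0, 5) for _ in range(trials)]
    min_of_means = min(sum(f(x) for x in xs) / trials for f in (f1, f2))
    mean_of_min = sum(min(f1(x), f2(x)) for x in xs) / trials
    return min_of_means, mean_of_min
```

The inequality holds deterministically for empirical means as well, since the minimum of the two sums is at least the sum of pointwise minima.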

Lemma 15.

If XX is a random variable and 𝔼[f1(X)]=𝔼[f2(X)]=m{\mathbb{E}}[f_{1}(X)]={\mathbb{E}}[f_{2}(X)]=m, then

m𝔼[min{f1(X),f2(X)}]=12𝔼[|f1(X)f2(X)|].m-{\mathbb{E}}[\min\{f_{1}(X),f_{2}(X)\}]=\frac{1}{2}{\mathbb{E}}[|f_{1}(X)-f_{2}(X)|].
Proof.
min{X,Y}=12(X+Y)12(|XY|)\displaystyle\min\{X,Y\}=\frac{1}{2}(X+Y)-\frac{1}{2}(|X-Y|)
mmin{f1(X),f2(X)}=m12(f1(X)+f2(X))\displaystyle\implies m-\min\{f_{1}(X),f_{2}(X)\}=m-\frac{1}{2}(f_{1}(X)+f_{2}(X))
+12(|f1(X)f2(X)|)\displaystyle+\frac{1}{2}(|f_{1}(X)-f_{2}(X)|)
m𝔼[min{f1(X),f2(X)}]=12𝔼[|f1(X)f2(X)|].\displaystyle\implies m-{\mathbb{E}}[\min\{f_{1}(X),f_{2}(X)\}]=\frac{1}{2}{\mathbb{E}}[|f_{1}(X)-f_{2}(X)|]. ∎
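Both the pointwise identity min{x, y} = (x + y)/2 − |x − y|/2 and the resulting expectation identity can be checked numerically; the discrete example below (X uniform on {0, 1, 2} with f₁(x) = x and f₂(x) = 2 − x, both of mean m = 1) is an illustrative choice.

```python
import itertools

def check_min_identity():
    """min{x, y} = (x + y)/2 - |x - y|/2 on an integer grid."""
    for x, y in itertools.product(range(-3, 4), repeat=2):
        assert min(x, y) == (x + y) / 2 - abs(x - y) / 2
    return True

def lemma15_example():
    """m - E[min{f1(X), f2(X)}] versus (1/2) E[|f1(X) - f2(X)|] for X
    uniform on {0, 1, 2}, f1(x) = x, f2(x) = 2 - x (both have mean 1)."""
    xs = [0, 1, 2]
    f1 = lambda x: x
    f2 = lambda x: 2 - x
    m = sum(f1(x) for x in xs) / len(xs)
    lhs = m - sum(min(f1(x), f2(x)) for x in xs) / len(xs)
    rhs = 0.5 * sum(abs(f1(x) - f2(x)) for x in xs) / len(xs)
    return lhs, rhs
```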

Lemma 16.

If XBin(T,p)X\sim\text{Bin}(T,p) then

12𝔼[|XTp|]\displaystyle\frac{1}{2}{\mathbb{E}}[|X-Tp|]\geq Tp(1p)e2π[1+112T+1112Tp\displaystyle\frac{\sqrt{Tp(1-p)}}{e\sqrt{2\pi}}\left[1+\frac{1}{12T+1}-\frac{1}{12\lfloor Tp\rfloor}\right.
112(TTp1)].\displaystyle\left.-\frac{1}{12(T-\lfloor Tp\rfloor-1)}\right].
Proof.

By using equation (1) from [28] we get

𝔼[|X\displaystyle{\mathbb{E}}[|X Tp|]\displaystyle-Tp|]
=\displaystyle= 2(1p)TTppTp+1(Tp+1)(TTp+1)\displaystyle 2(1-p)^{T-\lfloor Tp\rfloor}p^{\lfloor Tp\rfloor+1}(\lfloor Tp\rfloor+1){T\choose\lfloor Tp+1\rfloor}
=\displaystyle= 2(1p)TTppTp+1T!Tp!(TTp1)!\displaystyle 2(1-p)^{T-\lfloor Tp\rfloor}p^{\lfloor Tp\rfloor+1}\frac{T!}{\lfloor Tp\rfloor!(T-\lfloor Tp\rfloor-1)!}
(a)\displaystyle\overset{(a)}{\geq} 2(1p)TTppTp+11e2πTT+12e112T+1TpTp+12e112Tp\displaystyle 2(1-p)^{T-\lfloor Tp\rfloor}p^{\lfloor Tp\rfloor+1}\frac{1}{e\sqrt{2\pi}}\frac{T^{T+\frac{1}{2}}e^{\frac{1}{12T+1}}}{\lfloor Tp\rfloor^{\lfloor Tp\rfloor+\frac{1}{2}}e^{\frac{1}{12\lfloor Tp\rfloor}}}
×1(TTp1)TTp1+12e112(TTp1)\displaystyle\times\frac{1}{(T-\lfloor Tp\rfloor-1)^{T-\lfloor Tp\rfloor-1+\frac{1}{2}}e^{\frac{1}{12(T-\lfloor Tp\rfloor-1)}}}
(b)\displaystyle\overset{(b)}{\geq} 2e2π(1p)TTppTp+1TT+12e112T+1(Tp)Tp+12e112Tp\displaystyle\frac{2}{e\sqrt{2\pi}}(1-p)^{T-\lfloor Tp\rfloor}p^{\lfloor Tp\rfloor+1}\frac{T^{T+\frac{1}{2}}e^{\frac{1}{12T+1}}}{(Tp)^{\lfloor Tp\rfloor+\frac{1}{2}}e^{\frac{1}{12\lfloor Tp\rfloor}}}
×1(TTp)TTp12e112(TTp1)\displaystyle\times\frac{1}{(T-Tp)^{T-\lfloor Tp\rfloor-\frac{1}{2}}e^{\frac{1}{12(T-\lfloor Tp\rfloor-1)}}}
=\displaystyle= 2Tp(1p)e2πe112T+1112Tp112(TTp1)\displaystyle\frac{2\sqrt{Tp(1-p)}}{e\sqrt{2\pi}}e^{\frac{1}{12T+1}-\frac{1}{12\lfloor Tp\rfloor}-\frac{1}{12(T-\lfloor Tp\rfloor-1)}}
(c)\displaystyle\overset{(c)}{\geq} 2Tp(1p)e2π[1+112T+1112Tp\displaystyle\frac{2\sqrt{Tp(1-p)}}{e\sqrt{2\pi}}\left[1+\frac{1}{12T+1}-\frac{1}{12\lfloor Tp\rfloor}\right.
112(TTp1)],\displaystyle\left.-\frac{1}{12(T-\lfloor Tp\rfloor-1)}\right],

where (a)(a) follows from the Stirling-type bounds of [29], namely 2πnn+12ene112n+1<n!<2πnn+12ene112n\sqrt{2\pi}n^{n+\frac{1}{2}}e^{-n}e^{\frac{1}{12n+1}}<n!<\sqrt{2\pi}n^{n+\frac{1}{2}}e^{-n}e^{\frac{1}{12n}}, (b)(b) follows from x1xxx-1\leq\lfloor x\rfloor\leq x, and (c)(c) follows from ex1+xe^{x}\geq 1+x. ∎
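As a numerical sanity check, the closed form from [28] used in the first step can be evaluated exactly with integer combinatorics and compared against the final lower bound; the values of T and p below are illustrative, and the correction terms require ⌊Tp⌋ ≥ 1 and T − ⌊Tp⌋ − 1 ≥ 1 to be defined.

```python
import math

def binomial_mad_vs_bound(T=200, p=0.25):
    """Exact E|X - Tp| for X ~ Bin(T, p) via the closed form of [28],
    compared with the lower bound of Lemma 16 (on E|X - Tp|, i.e. twice
    the stated bound on (1/2) E|X - Tp|)."""
    k = math.floor(T * p)
    # De Moivre / Berend-Kontorovich closed form for the mean absolute deviation
    mad = 2 * (1 - p)**(T - k) * p**(k + 1) * (k + 1) * math.comb(T, k + 1)
    leading = 2 * math.sqrt(T * p * (1 - p)) / (math.e * math.sqrt(2 * math.pi))
    correction = 1 + 1 / (12 * T + 1) - 1 / (12 * k) - 1 / (12 * (T - k - 1))
    return mad, leading * correction
```

The Stirling-based bound is loose by roughly a constant factor, which is all the regret lower bound argument needs.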