
On the Regret of Online Edge Service Hosting

R Sri Prakash [email protected];[email protected] Nikhil Karamchandani [email protected] Sharayu Moharir [email protected]
Abstract

We consider the problem of service hosting where a service provider can dynamically rent edge resources via short-term contracts to ensure better quality of service to its customers. The service can also be partially hosted at the edge, in which case customers' requests can be partially served at the edge. The total cost incurred by the system is modeled as a combination of the rent cost, the service cost incurred due to latency in serving customers, and the fetch cost incurred as a result of the bandwidth used to fetch the code/databases of the service from the cloud servers to host the service at the edge. In this paper, we compare multiple hosting policies with regret as a metric, defined as the difference in the cost incurred by the policy and the optimal policy over some time horizon $T$. In particular, we consider the Retro Renting (RR) and Follow The Perturbed Leader (FTPL) policies proposed in the literature and provide performance guarantees on the regret of these policies. We show that under i.i.d. stochastic arrivals, the RR policy has linear regret while the FTPL policy has constant regret. Next, we propose a variant of FTPL, namely Wait then FTPL (W-FTPL), which also has constant regret while demonstrating much better dependence on the fetch cost. We also show that under adversarial arrivals, the RR policy has linear regret while both FTPL and W-FTPL have regret $O(\sqrt{T})$, which is order-optimal.

Affiliation: Indian Institute of Technology Bombay, Powai, Mumbai 400076, India

1 Introduction

Software as a Service (SaaS) instances like online navigation platforms, Video-on-Demand services, etc., have stringent latency constraints in order to provide a good quality of experience to their customers. While most SaaSs use cloud resources, low latency necessitates the use of storage/computational resources at the edge, i.e., close to the end-user. A service is said to be hosted at the edge if the code and databases needed to serve user queries are stored on the edge servers and requests can be served at the edge. In this work, the service can also be partially hosted at the edge, in which case, customers’ requests can be partially served at the edge [1, 2]. For instance, consider a SaaS application which provides access to news articles on demand. A typical news article includes some text, some images and possibly some embedded videos. One way to partially host such a service is to store the text corresponding to the news articles at the edge and store the images/videos in the cloud. Each user request will then be served partially by the edge.

We consider the setting where third-party resources can be rented via short-term contracts to host the service, and the edge hosting status of the SaaS can be dynamically changed over time. If the service is not hosted at the edge, it can be fetched from the cloud servers by incurring a fetch cost. The performance of a hosting policy is a function of the rent cost, the fetch cost, and the quality of experience of the users. We refer to the algorithmic challenge of determining what fraction of the service to host at the edge over time as the service hosting problem. Note that we study the service hosting problem from the perspective of a specific SaaS provider which can rent third-party edge resources to improve their customers’ quality of experience by reducing the latency of their service.

Novel online service hosting policies with provable performance guarantees have been proposed in [3, 4]. The metric of interest in [3, 4] is the competitive ratio, defined as the ratio of the cost incurred by an online policy to the cost incurred by an optimal offline policy for the same request arrival sequence. Since the competitive ratio is multiplicative by definition, the absolute value of the difference in the cost incurred by an online policy and the optimal policy can be large even though the competitive ratio of the online policy is close to one. This motivates studying the performance of candidate policies in terms of regret, defined as the difference in the cost incurred by the policy and the optimal policy. Regret is a widely used metric in online learning [5], including recently for the caching problem [6, 7], which is closely related to the service hosting problem. In this work, one of our goals is to design hosting policies with provable guarantees in terms of the regret. Another key dimension in the performance guarantees of online policies is the assumption made on the request arrival process. Commonly studied settings include the stochastic setting and the adversarial setting, and we provide performance guarantees for both settings.

1.1 Our Contributions

We propose a variant of the FTPL policy called Wait then Follow the Perturbed Leader (W-FTPL). W-FTPL is a randomized policy that takes into account the fetch cost in its decision-making. More specifically, W-FTPL does not fetch the service for an initial wait period which depends on the request arrivals and is an increasing function of the fetch cost. Following the wait period, W-FTPL mimics the FTPL policy.

For i.i.d. stochastic request arrivals and the regret metric, we show that RR is sub-optimal while FTPL and W-FTPL are both order-optimal with respect to the time horizon. While the regret of FTPL can increase linearly with the fetch cost, the regret of W-FTPL increases at most polylogarithmically. The improved performance of W-FTPL over FTPL is a consequence of the fact that W-FTPL avoids most of the fetches made by FTPL in the initial time-slots and by the end of the wait period, its estimate of the arrival rate is accurate enough to avoid multiple fetches. Further, we characterize the competitive ratio of FTPL and W-FTPL for the setting where a finite number of partial hosting levels are allowed. We compare these results with the competitive ratio of RR when a single partial hosting level is allowed [2] to conclude that FTPL and W-FTPL outperform RR with respect to the competitive ratio.

For the adversarial setting, we first characterize a fundamental lower bound on the regret of any online hosting policy. In terms of regret, we then show that RR is strictly suboptimal, while FTPL and W-FTPL have order-optimal performance with respect to time. We show that the competitive ratios of FTPL and W-FTPL are upper bounded by a constant (with respect to time) for any finite number of partial hosting levels. This is a more general setting than the one considered in [2], where the competitive ratio of a variant of RR is characterized when only one partial hosting level is permitted.

1.2 Related work

Several emerging applications such as Augmented / Virtual Reality (AR/VR), online gaming, the Internet of Things (IoT), and autonomous driving have very stringent latency requirements. In addition, many of these applications are also compute-and-storage heavy. This necessitates the migration of some of the storage and computation requirements of these services from remote cloud servers to the edge. To enable such offloading, several edge computing architectures have been proposed in the literature and their performance has been analyzed extensively, see for example [8, 9, 10] and references therein.

The service hosting problem has received a lot of attention recently, and various settings have been studied. One approach has been to pose the design problem of where to host which service as a one-shot large-scale optimization problem, see for example [11, 12, 13]. Our work differs from this line of work in that we consider an online setting where we adaptively decide when (and what fraction of) service to host at any time, depending on the varying number of requests seen thus far.

There have been recent works [4, 14] which consider the online service hosting problem and design adaptive schemes whose performance is characterized in terms of the competitive ratio with respect to an oracle which knows the entire request sequence a priori. Our work differs from these in a couple of ways. Firstly, the above-mentioned works study the problem from the perspective of an edge resource provider who has to decide which among a collection of services to host at each time on a storage-constrained edge server. On the other hand, similar to some other works [15, 3], we study the problem from the perspective of a service provider which needs to decide when to host its application at the edge so as to minimize the overall cost. Secondly, most of the works mentioned above do not consider partial hosting of the service, which is allowed in our framework. While [2] does study partial hosting and proposes the Retro-Renting (RR) policy, which is shown to have a constant competitive ratio, it only allows one level of partial hosting of the service. On the other hand, we consider a much more general setting where multiple partial service hosting levels are allowed and show that both FTPL and its variant W-FTPL achieve a constant competitive ratio.

While the works mentioned above study competitive ratio as a performance metric, another popular measure of performance for online algorithms is the regret which considers the difference in the cost incurred by an online policy and the optimal (static) policy. In particular, for the closely related caching problem, several policies inspired by the recent advances in online convex optimization have been proposed, including Online Gradient Ascent [6], Online Mirror Descent [16], and Follow the Perturbed Leader (FTPL) [17, 7]. In particular, FTPL has been shown to have order-optimal regret for the caching problem in the adversarial setting [17, 7]. In this work, we study regret for the RR and FTPL policies for the service hosting problem under both the stochastic and adversarial settings. Since RR is a deterministic policy, its regret performance in the adversarial setting is poor. While FTPL does achieve order-optimal regret (with respect to time), one limitation is that it makes hosting decisions agnostic of the fetch cost. As a result, in some cases, FTPL is prone to fetching and evicting the service multiple times in the initial time-slots when its estimate of the request arrival rate is noisy, thus leading to poor performance. This motivated our design and regret analysis of W-FTPL, a variant of FTPL which takes into account the fetch cost. One recent work which also considers regret as a performance metric is [18], which studied the problem of joint online service hosting and routing and proposed a two time-scale algorithm based on online gradient descent with provably order-optimal regret under adversarial requests.

Finally, there are several other approaches which have been used to address the online service hosting problem, including Markov Decision Processes [19, 20, 1], Reinforcement Learning [21], and Multi-Armed Bandits [22, 23].

2 System Setup

We consider a system consisting of a back-end server and an edge-server to serve customers of a service. The back-end server always hosts the service and can serve any requests that are routed to it. In this work, we allow the service to be partially hosted at the edge-server. When the service is partially hosted at the edge, requests are partially served at the edge and partially at the back-end server. Further, the fraction of service hosted at the edge can be changed over time. If the service is not hosted or partially hosted at the edge, parts of the service can be fetched from the back-end server to host at the edge.

Sequence of events in each time-slot: We consider a time-slotted system. In each time-slot, we first make the service hosting decision for that time-slot. Following this, requests may arrive and are served at the edge and/or the back-end server.

Request arrivals: We consider two types of arrival processes in this paper.

  • Stochastic arrivals: In this case, request arrivals are i.i.d. across time-slots with mean $\mu$.

  • Adversarial arrivals: In this case, the request arrival sequence is generated by an oblivious adversary, i.e., the entire request sequence is assumed to be fixed a priori.

Let $r_t$ denote the number of request arrivals in time-slot $t$. We consider bounded requests, specifically $0 \leq r_t \leq \kappa$, and let $R_t = \sum_{i=1}^{t} r_i$ denote the cumulative requests observed by the end of time-slot $t$. In both the stochastic and adversarial cases, we assume that there are at most $\kappa$ arrivals per slot.

Costs: We model three types of costs.

  1. Rent cost ($\mathcal{C}^{\mathcal{P}}_{R,t}$): The system incurs a cost of $c$ units per time-slot to host the entire service at the edge. For hosting a fraction $f$ of the service, the system incurs a cost of $cf$ units per time-slot.

  2. Service cost ($\mathcal{C}^{\mathcal{P}}_{S,t}$): This is the cost incurred to use the back-end server to serve (parts of) requests. If a fraction $f$ of the service is hosted at the edge, the system incurs a service cost of $g(f)$ units per request, where $g(f)$ is a decreasing function of $f$. We fix $g(0) = 1$, i.e., the service cost is one unit when the service is not hosted at the edge.

  3. Fetch cost ($\mathcal{C}^{\mathcal{P}}_{F,t}$): The system incurs a cost of $M > 1$ units for each fetch of the entire service from the back-end server to host on the edge-server. For fetching a fraction $f$ of the service, the system incurs a cost of $Mf$ units.

Let $\rho_t^{\mathcal{P}}$ denote the edge hosting status of the service in time-slot $t$ under policy $\mathcal{P}$. We consider the setting where $\rho_t^{\mathcal{P}} \in \{\alpha_1, \alpha_2, \ldots, \alpha_{K-1}, \alpha_K\}$, and $\rho_t^{\mathcal{P}} = \alpha_i$ implies that an $\alpha_i$ fraction of the service is hosted at the edge-server in time-slot $t$. Note that $\alpha_i \in [0,1]$; in particular, we consider $\alpha_1 = 0$, $\alpha_K = 1$, and $\alpha_i < \alpha_j$ for $i < j$. It follows that $\rho_t^{\mathcal{P}} = 1$ implies that the entire service is hosted at the edge in time-slot $t$, and $\rho_t^{\mathcal{P}} = 0$ implies that the service is not hosted at the edge in time-slot $t$.

We make some assumptions about the various system parameters.

Assumption 1.

$c \leq \kappa$.

This assumption is motivated by the fact that if $\kappa < c$, it is strictly sub-optimal to host the entire service at the edge, irrespective of the number of arrivals in a time-slot.

Assumption 2.

For each fraction of service $\alpha_i$, $1 \leq i \leq K$, $\alpha_i + g(\alpha_i) \leq 1$.

In [1, 2], it is shown that the benefits of partial hosting are limited to the setting where $\alpha_i + g(\alpha_i) \leq 1$. Motivated by this, we consider the case where $\alpha_i + g(\alpha_i) \leq 1$ for all candidate values of $\alpha_i$.

Assumption 3.

For any fraction of service $\alpha_i$, $1 \leq i \leq K$, $c\alpha_i + g(\alpha_i) r_t \leq \kappa$.

Assumption 3 follows from Assumptions 1 and 2, since $c\alpha_i + g(\alpha_i) r_t \leq \kappa\alpha_i + \kappa g(\alpha_i) = \kappa(\alpha_i + g(\alpha_i)) \leq \kappa$.

Assumption 4.

For stochastic arrivals, $\mu > 0$, and for adversarial arrivals, $R_T \geq 1$.

This assumption eliminates degenerate cases where there are no request arrivals in the time-horizon of interest.

The total cost incurred in time-slot $t$ by policy $\mathcal{P}$, denoted by $\mathcal{C}^{\mathcal{P}}_{t}(r_{t})$, is the sum of the rent, service, and fetch costs. It follows that

$$\mathcal{C}^{\mathcal{P}}_{t}(r_{t}) = c\rho_{t}^{\mathcal{P}} + g(\rho_{t}^{\mathcal{P}})\,r_{t} + M\left(\rho_{t}^{\mathcal{P}} - \rho_{t-1}^{\mathcal{P}}\right)^{+}. \quad (1)$$

Let $r = \{r_t\}_{t \geq 1}$ denote the request arrival sequence and $\mathcal{C}^{\mathcal{P}}(T,r)$ denote the cumulative cost incurred by policy $\mathcal{P}$ in time-slots 1 to $T$. It follows that

$$\mathcal{C}^{\mathcal{P}}(T,r) = \sum_{t=1}^{T} \mathcal{C}^{\mathcal{P}}_{t}(r_{t}).$$
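As a concrete illustration of the cost model, the per-slot cost in (1) and its cumulative sum can be sketched as follows. This is a minimal Python sketch; the function `g` and all parameter values in the usage note below are illustrative assumptions, not values from the paper.

```python
# Sketch of the cost model: per-slot cost (rent + service + fetch) and the
# cumulative cost C^P(T, r) for a hosting trajectory rho under a policy P.

def slot_cost(rho_t, rho_prev, r_t, c, M, g):
    """Cost in one slot: rent c*rho_t, service g(rho_t)*r_t, and a fetch
    cost M*(rho_t - rho_prev)^+ paid only when the hosted fraction grows."""
    fetch = max(rho_t - rho_prev, 0.0)
    return c * rho_t + g(rho_t) * r_t + M * fetch

def total_cost(rho, requests, c, M, g, rho_init=0.0):
    """Cumulative cost over slots 1..T; rho and requests are length-T lists."""
    cost, prev = 0.0, rho_init
    for rho_t, r_t in zip(rho, requests):
        cost += slot_cost(rho_t, prev, r_t, c, M, g)
        prev = rho_t
    return cost
```

For example, with the illustrative choices $g(f) = 1 - f$, $c = 1$, $M = 5$, hosting trajectory $[0, 1, 1]$, and arrivals $[2, 3, 4]$, the slot costs are $2$, $6$, and $1$ (the single fetch in the second slot contributes $M = 5$), for a total of $9$.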

Performance metrics: We consider two performance metrics, namely regret and competitive ratio.

For i.i.d. stochastic arrivals, the regret of a policy $\mathcal{P}$, denoted by $\mathcal{R}^{\mathcal{P}}_{S}(T)$, is defined as the difference in the total expected cost incurred by the policy and the oracle static hosting policy. The oracle static hosting policy makes a hosting decision at $t = 1$ using the knowledge of the statistics of the request arrival process, but not the entire sample-path. For ease of notation, we define $\mu = \mathbb{E}[r_t]$ and $\mu_i = c\alpha_i + g(\alpha_i)\mu$. It follows that the expected cost incurred by the optimal static hosting policy in time-slots 1 to $T$ is $\min_i\{c\alpha_i T + g(\alpha_i)\mu T + M\alpha_i\}$, and therefore,

\begin{align}
\mathcal{R}^{\mathcal{P}}_{S}(T) &= \mathbb{E}_{\mathcal{P},r}[\mathcal{C}^{\mathcal{P}}(T,r)] - \min_i\{c\alpha_i T + g(\alpha_i)\mu T + M\alpha_i\} \nonumber \\
&= \mathbb{E}_{\mathcal{P},r}[\mathcal{C}^{\mathcal{P}}(T,r)] - \min_i\{\mu_i T + M\alpha_i\} \nonumber \\
&\leq \mathbb{E}_{\mathcal{P},r}[\mathcal{C}^{\mathcal{P}}(T,r)] - \min_i\{\mu_i T\}, \tag{2}
\end{align}

where $\mathbb{E}_{\mathcal{P},r}$ represents the expectation over the randomization in policy $\mathcal{P}$ and the request arrival sequences $r$.

For i.i.d. stochastic arrivals, the competitive ratio of a policy $\mathcal{P}$, denoted by $\sigma^{\mathcal{P}}_{S}(T)$, is defined as the ratio of the expected total cost incurred by the policy and the expected total cost incurred by the optimal static hosting policy discussed above. It follows that

$$\sigma_S^{\mathcal{P}}(T) = \frac{\mathbb{E}_{\mathcal{P},r}[\mathcal{C}^{\mathcal{P}}(T,r)]}{\min_i\{c\alpha_i T + g(\alpha_i)\mu T + M\alpha_i\}}.$$

For adversarial arrivals, the regret of a policy $\mathcal{P}$, denoted by $\mathcal{R}^{\mathcal{P}}_{A}(T)$, is defined as the worst-case difference in the total expected cost incurred by the policy and the optimal static hosting policy, where the expectation is taken over the randomization in policy $\mathcal{P}$. For any request sequence $r$, the total cost incurred by the optimal static hosting policy in time-slots 1 to $T$ is $\min_i\{c\alpha_i T + g(\alpha_i)R_T + M\alpha_i\}$, and therefore,

\begin{align*}
\mathcal{R}^{\mathcal{P}}_{A}(T,r) &= \mathbb{E}_{\mathcal{P}}[\mathcal{C}^{\mathcal{P}}(T,r)] - \min_i\{c\alpha_i T + g(\alpha_i)R_T + M\alpha_i\}, \\
\mathcal{R}^{\mathcal{P}}_{A}(T) &= \sup_r \left(\mathcal{R}^{\mathcal{P}}_{A}(T,r)\right).
\end{align*}

We have the following relation between the regret incurred by a policy under i.i.d. stochastic arrivals and adversarial arrivals.

Lemma 1.

For any online policy $\mathcal{P}$ and stochastic arrivals, we have $\mathcal{R}_S^{\mathcal{P}}(T) \leq \mathcal{R}_A^{\mathcal{P}}(T)$.

The proof can be found in Section 7.

For adversarial arrivals, the competitive ratio of a policy $\mathcal{P}$, denoted by $\sigma^{\mathcal{P}}_{A}(T)$, is defined as the worst-case ratio of the expected total cost incurred by the policy and the cost incurred by the offline optimal policy. The optimal offline policy can change the hosting status over time using knowledge of the entire request process a priori. Let the cost incurred by the optimal offline hosting policy in time-slots 1 to $T$ be denoted by $\mathcal{C}^{\text{OFF-OPT}}(T,r)$. It follows that

$$\sigma^{\mathcal{P}}_{A}(T) = \sup_r \frac{\mathbb{E}_{\mathcal{P}}[\mathcal{C}^{\mathcal{P}}(T,r)]}{\mathcal{C}^{\text{OFF-OPT}}(T,r)}.$$

Goal: The goal is to design online hosting policies with provable performance guarantees with respect to regret and competitive ratio.

Notations: We define some notations which will be used in the rest of this paper. Let $\bm{s} = [0, \alpha_2, \alpha_3, \ldots, 1]'$, $\bm{f} = [1, g(\alpha_2), g(\alpha_3), \ldots, 0]'$, and $\bm{\rho}_t^{\mathcal{P}} \in \mathcal{X}$, where $\mathcal{X}$ is the collection of all possible one-hot vectors in $\{0,1\}^K$ and thus $|\mathcal{X}| = K$. The position of the 1 in the one-hot vector $\bm{\rho}_t^{\mathcal{P}}$ represents the level of service hosted at the edge in slot $t$, i.e., $\rho_t^{\mathcal{P}} = \langle \bm{\rho}_t^{\mathcal{P}}, \bm{s} \rangle$. The total cost in slot $t$ as given in (1) can be rewritten using the above notations as

\begin{align*}
\mathcal{C}_t^{\mathcal{P}}(r_t) &= c\langle \bm{\rho}_t^{\mathcal{P}}, \bm{s} \rangle + r_t \langle \bm{\rho}_t^{\mathcal{P}}, \bm{f} \rangle + M(\rho_t^{\mathcal{P}} - \rho_{t-1}^{\mathcal{P}})^{+} \\
&= \langle \bm{\rho}_t^{\mathcal{P}}, \bm{\theta}_t \rangle + M(\rho_t^{\mathcal{P}} - \rho_{t-1}^{\mathcal{P}})^{+},
\end{align*}

where $\bm{\theta}_t = c\bm{s} + r_t\bm{f}$.
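The equivalence between the scalar rent-plus-service cost and its inner-product form can be checked numerically. The hosting levels, cost values, and choice of $g$ below are illustrative assumptions:

```python
# Numerical check (illustrative values) that the vectorized cost
# <rho_t, theta_t> with theta_t = c*s + r_t*f reproduces the scalar
# rent-plus-service cost c*alpha_i + g(alpha_i)*r_t from (1).

alphas = [0.0, 0.5, 1.0]            # hosting levels: alpha_1 = 0, alpha_K = 1
g = lambda a: 1.0 - a               # an assumed decreasing g with g(0)=1, g(1)=0
c, r_t = 1.0, 3.0                   # illustrative rent cost and arrivals

s = alphas                          # s = [0, alpha_2, ..., 1]
f = [g(a) for a in alphas]          # f = [1, g(alpha_2), ..., 0]
theta = [c * si + r_t * fi for si, fi in zip(s, f)]

for i, a in enumerate(alphas):
    one_hot = [1.0 if j == i else 0.0 for j in range(len(alphas))]
    inner = sum(p * th for p, th in zip(one_hot, theta))
    assert inner == c * a + g(a) * r_t
```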

For ease of notation, for stochastic arrivals we define $\mu = \mathbb{E}[r_t]$, $\mu_i = c\alpha_i + g(\alpha_i)\mu$, $\Delta_{ij} = \mu_i - \mu_j$, $i^\star = \operatorname*{arg\,min}_i \mu_i$, $\Delta_i = \mu_i - \mu_{i^\star}$, $\Delta_{\text{min}} = \min_{i \neq i^\star} \Delta_i$, and $\Delta_{\text{max}} = \max_i \Delta_i$.

We summarize important notations used in this paper in Table 1.

Symbol : Description
$c$ : Rent cost for hosting the complete service in a slot
$M$ : Fetch cost
$K$ : Number of hosting levels allowed
$t$ : Time index
$\rho_t^{\mathcal{P}}$ : Fraction of service hosted in time-slot $t$ under policy $\mathcal{P}$
$\kappa$ : Maximum number of arrivals in a slot
$\alpha_i$ : Fraction of service
$g(f)$ : Fraction of a request forwarded if an $f$ fraction is hosted
$r_t$ : Number of requests in time-slot $t$
$R_t$ : Cumulative number of requests up to time-slot $t$
$\mu$ : $\mathbb{E}[r_t]$
$T$ : Time horizon
$\mathcal{C}^{\mathcal{P}}(T,r)$ : Cost incurred till $T$ under policy $\mathcal{P}$ for request sequence $r$
$\mathcal{R}_S^{\mathcal{P}}(T)$ : Regret under the stochastic arrival setting
$\sigma_S^{\mathcal{P}}(T)$ : Competitive ratio under the stochastic arrival setting
$\mathcal{R}_A^{\mathcal{P}}(T)$ : Regret under the adversarial arrival setting
$\sigma_A^{\mathcal{P}}(T)$ : Competitive ratio under the adversarial arrival setting
$\mu_i$ : $c\alpha_i + g(\alpha_i)\mu$
$\Delta_{ij}$ : $\mu_i - \mu_j$
$i^\star$ : $\operatorname*{arg\,min}_i \mu_i$
$\Delta_i$ : $\mu_i - \mu_{i^\star}$
$\Delta_{\text{min}}$ : $\min_{i \neq i^\star} \Delta_i$
$\Delta_{\text{max}}$ : $\max_i \Delta_i$
$\bm{s}$ : $[0, \alpha_2, \alpha_3, \ldots, 1]'$
$\bm{f}$ : $[1, g(\alpha_2), g(\alpha_3), \ldots, 0]'$
$\bm{\theta}_t$ : $c\bm{s} + r_t\bm{f}$
$\bm{\Theta}_t$ : $\sum_{i=1}^{t-1} \bm{\theta}_i$
$\eta_t$ : Learning rate
$\alpha$ : Hyperparameter in the FTPL and W-FTPL policies
$\beta$, $\delta$ : Hyperparameters in the W-FTPL policy
$\bm{\gamma}$ : i.i.d. Gaussian vector
$\mathcal{X}$ : Collection of one-hot vectors in $\{0,1\}^K$
$T_s$ : Waiting time for the W-FTPL policy
Table 1: Notations

3 Policies

Our Policy: Our policy, called Wait then Follow the Perturbed Leader (W-FTPL), is a variant of the popular FTPL policy. A formal definition of FTPL is given in Algorithm 1. FTPL is a randomized policy and is known to perform well for the traditional caching problem, in fact achieving order-wise optimal regret for adversarial arrivals [7]. In any time-slot $t$, for each $i$, FTPL considers the cumulative cost that would have been incurred had the system used hosting fraction $\alpha_i$ for the entire duration $[1, t-1]$, and then perturbs this cost by an appropriately scaled sample of a standard Gaussian random variable. FTPL then hosts the fraction of service for which the perturbed cost is minimum in that slot.

Input: $c$, $\{\eta_t\}_{t \geq 1}$, $\{r_t\}_{t \geq 1}$
1 Set $\bm{\Theta}_1 \leftarrow \bm{0}$
2 Sample $\bm{\gamma} \sim \mathcal{N}(\bm{0}, \bm{I})$
3 for $t = 1$ to $T$ do
4       Host $\bm{\rho}_t \in \operatorname*{arg\,min}_{\bm{\rho} \in \mathcal{X}} \langle \bm{\rho}, \bm{\Theta}_t + \eta_t \bm{\gamma} \rangle$
5       $\bm{\theta}_t = c\bm{s} + r_t\bm{f}$
6       Update $\bm{\Theta}_{t+1} = \bm{\Theta}_t + \bm{\theta}_t$
Algorithm 1: Follow The Perturbed Leader (FTPL)
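Algorithm 1 can be sketched in a few lines of Python. This is a minimal sketch under assumed illustrative inputs: the decreasing function `g` and the scale `alpha_lr` (playing the role of the hyperparameter $\alpha$ in $\eta_t = \alpha\sqrt{t}$) are not prescribed by the paper.

```python
import math
import random

# Sketch of Algorithm 1 (FTPL). Theta accumulates the per-level costs
# theta_t = c*s + r_t*f; in each slot, the level minimizing the perturbed
# cumulative cost Theta + eta_t*gamma is hosted.

def ftpl(requests, alphas, g, c, alpha_lr=1.0, rng=None):
    rng = rng or random.Random(0)
    K = len(alphas)
    s = alphas                                       # s = [0, alpha_2, ..., 1]
    f = [g(a) for a in alphas]                       # f = [1, g(alpha_2), ..., 0]
    gamma = [rng.gauss(0.0, 1.0) for _ in range(K)]  # perturbation, sampled once
    Theta = [0.0] * K
    hosted = []
    for t, r_t in enumerate(requests, start=1):
        eta_t = alpha_lr * math.sqrt(t)              # eta_t = alpha * sqrt(t)
        perturbed = [Theta[i] + eta_t * gamma[i] for i in range(K)]
        i_star = min(range(K), key=lambda i: perturbed[i])
        hosted.append(alphas[i_star])                # host the minimizing level
        Theta = [Theta[i] + c * s[i] + r_t * f[i] for i in range(K)]
    return hosted
```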

We propose a policy called Wait then Follow The Perturbed Leader (W-FTPL). The key idea behind W-FTPL is to not host the service for an initial wait period. This reduces the number of fetches made initially, when the estimate of the arrival rate is noisy. The duration of the wait period is not fixed a priori and is a function of the arrival pattern seen so far. Following the wait period, W-FTPL mimics the FTPL policy. Refer to Algorithm 2 for a formal definition of W-FTPL. Let $T_s$ be the time-slot after which W-FTPL starts acting as FTPL. Formally, we define $T_s = \min\left\{t : t < \frac{\left(\max_{i \neq j}(\Theta_{t+1,i} - \Theta_{t+1,j})\right)^2}{\kappa^2 \beta (\log M)^{1+\delta}}\right\}$, where $\beta > 1$ and $\delta > 0$ are constants.

Input: $c$, $M$, $\{\eta_t\}_{t \geq 1}$, $\{r_t\}_{t \geq 1}$, $\beta$, $\delta$, $\kappa$
1 Set $\bm{\Theta}_1 \leftarrow \bm{0}$, $\text{wait} = 1$
2 Sample $\bm{\gamma} \sim \mathcal{N}(\bm{0}, \bm{I})$
3 for $t = 1$ to $T$ do
4       if $\text{wait} = 1$ then
5             $\rho_t = 0$
6       else
7             Host $\bm{\rho}_t \in \operatorname*{arg\,min}_{\bm{\rho} \in \mathcal{X}} \langle \bm{\rho}, \bm{\Theta}_t + \eta_t \bm{\gamma} \rangle$
8       $\bm{\theta}_t = c\bm{s} + r_t\bm{f}$
9       Update $\bm{\Theta}_{t+1} = \bm{\Theta}_t + \bm{\theta}_t$
10      $\text{wait} = \min\left\{\text{wait}, \mathbbm{1}_{t \geq \frac{(\max_{i \neq j}(\Theta_{t+1,i} - \Theta_{t+1,j}))^2}{\kappa^2 \beta (\log M)^{1+\delta}}}\right\}$
Algorithm 2: Wait then Follow The Perturbed Leader (W-FTPL)
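The wait rule of Algorithm 2 can be sketched similarly; all concrete parameter values below are illustrative assumptions. Waiting ends permanently once the indicator in line 10 evaluates to zero, i.e., once the squared spread of the cumulative costs exceeds $t\,\kappa^2 \beta (\log M)^{1+\delta}$, after which the policy behaves exactly as FTPL.

```python
import math
import random

# Sketch of Algorithm 2 (W-FTPL): identical to FTPL except that nothing is
# hosted while wait = 1; the min-update makes the end of waiting permanent.

def w_ftpl(requests, alphas, g, c, M, kappa, beta=2.0, delta=0.1,
           alpha_lr=1.0, rng=None):
    rng = rng or random.Random(0)
    K = len(alphas)
    s, f = alphas, [g(a) for a in alphas]
    gamma = [rng.gauss(0.0, 1.0) for _ in range(K)]
    Theta = [0.0] * K
    wait, hosted = 1, []
    for t, r_t in enumerate(requests, start=1):
        if wait:
            hosted.append(0.0)                       # host nothing in wait period
        else:
            eta_t = alpha_lr * math.sqrt(t)
            perturbed = [Theta[i] + eta_t * gamma[i] for i in range(K)]
            hosted.append(alphas[min(range(K), key=lambda i: perturbed[i])])
        Theta = [Theta[i] + c * s[i] + r_t * f[i] for i in range(K)]
        spread = max(Theta) - min(Theta)             # max_{i != j}(Theta_i - Theta_j)
        threshold = spread ** 2 / (kappa ** 2 * beta * math.log(M) ** (1 + delta))
        wait = min(wait, 1 if t >= threshold else 0)
    return hosted
```

Note that early in the horizon the spread of `Theta` is small, so `threshold < t`, the indicator is 1, and waiting continues; once the spread grows enough that `threshold > t`, the indicator becomes 0 and `wait` drops to 0 for good.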

Retro-Renting (RR) [3]: The RR policy is a deterministic hosting policy, and it either hosts the complete service or hosts nothing in a slot. The key idea behind this policy is to use recent arrival patterns to make hosting decisions. We refer the reader to Algorithm 1 in [3] for a formal definition of the RR policy. The performance of RR with respect to the competitive ratio was analyzed in [3].

$\alpha$-RR [2] is a variant of RR in which one partial level of hosting is allowed. The formal definition of $\alpha$-RR, borrowed from [2, Algorithm 1], is given in Algorithm 3. Similar to RR, the key idea behind the $\alpha$-RR policy is to use recent arrival patterns to check whether the current hosting status is optimal in hindsight. $\alpha$-RR computes the optimal offline policy cost for the recent request arrivals and chooses the fraction of service with the minimum cost. For $\rho_t^{\mathcal{P}} \in \{0,1\}$, $\alpha$-RR behaves as RR [2]. Therefore, we refer to $\alpha$-RR as RR in this paper.

1 Input: fetch cost $M$, partial hosting level $\alpha_2$, latency cost under partial hosting $g(\alpha_2)$, rent cost $c$, request arrival sequence $\{x_l\}_{l=0}^{t}$
2 Output: service hosting strategy $r_{t+1}$, $t > 0$
3 Initialize: $r_1 = t_{\text{recent}} = 0$
4 for each time-slot $t$ do
5       $I_t = (M, g(\alpha_2), t, t_{\text{recent}}, c, \{x_l\}_{l = t_{\text{recent}}}^{t})$
6       $R_0^{(\tau_0)} = [\underbrace{r_t, r_t, \ldots, r_t}_{\tau_0 - t_{\text{recent}}}, \underbrace{0, 0, \ldots, 0}_{t - \tau_0}]$
7       $R_\alpha^{(\tau_\alpha)} = [\underbrace{r_t, r_t, \ldots, r_t}_{\tau_\alpha - t_{\text{recent}}}, \underbrace{\alpha_2, \alpha_2, \ldots, \alpha_2}_{t - \tau_\alpha}]$
8       $R_1^{(\tau_1)} = [\underbrace{r_t, r_t, \ldots, r_t}_{\tau_1 - t_{\text{recent}}}, \underbrace{1, 1, \ldots, 1}_{t - \tau_1}]$
9       $\text{minCost}(0) = \min_{\tau_0 \in (t_{\text{recent}}, t)} \text{totalCost}(R_0^{(\tau_0)}, I_t)$
10      $\text{minCost}(\alpha_2) = \min_{\tau_\alpha \in (t_{\text{recent}}, t)} \text{totalCost}(R_\alpha^{(\tau_\alpha)}, I_t)$
11      $\text{minCost}(1) = \min_{\tau_1 \in (t_{\text{recent}}, t)} \text{totalCost}(R_1^{(\tau_1)}, I_t)$
12      $r_{t+1} = \operatorname*{arg\,min}_{i \in \{0, \alpha_2, 1\}} \text{minCost}(i)$
13      if $r_{t+1} \neq r_t$ then
14            $t_{\text{recent}} = t$
15      end if
16 end for
17 Function totalCost($R$, $I_t$):
18      $g(0) = 1$, $g(1) = 0$
19      cost $= R(1) \times c + x_1 \times g(R(1))$
20      for $j \leftarrow 2$ to $t - t_{\text{recent}}$ do
21            cost $=$ cost $+\, R(j) \times c + x_j \times g(R(j)) + M \times |R(j) - R(j-1)|$
22      end for
23      return cost
24 end
Algorithm 3: $\alpha$-RetroRenting ($\alpha$-RR)
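The totalCost routine of Algorithm 3 translates directly into code. The sketch below is illustrative: it uses 0-indexed arrays (the pseudocode is 1-indexed), leaves `g` as a caller-supplied function, and charges $M\,|R(j) - R(j-1)|$ on every change of hosting level, as in the pseudocode.

```python
# Sketch of totalCost from Algorithm 3: evaluates the cost of a candidate
# hosting trajectory R over the window since t_recent, given arrivals x,
# rent cost c, fetch cost M, and service-cost function g.

def total_cost_window(R, x, c, M, g):
    """R[j] is the hosted fraction in window-slot j; x[j] the arrivals there."""
    cost = R[0] * c + x[0] * g(R[0])
    for j in range(1, len(R)):
        cost += R[j] * c + x[j] * g(R[j]) + M * abs(R[j] - R[j - 1])
    return cost
```

In each slot, $\alpha$-RR evaluates this cost for the candidate trajectories $R_0^{(\tau_0)}$, $R_\alpha^{(\tau_\alpha)}$, and $R_1^{(\tau_1)}$ over all switch times $\tau$ in the window, and hosts the level whose best trajectory is cheapest.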

4 Main Results and Discussion

In this section, we state and discuss our key results.

4.1 Regret Analysis

Our first result characterizes the regret performance of the policies discussed in Section 3 and the fundamental limit on the performance of any online policy for adversarial arrivals.

Theorem 1 (Adversarial Arrivals).

Let the arrivals be generated by an oblivious adversary under the constraint that at most $\kappa$ requests arrive in each time-slot. Then,

  1. (a) $\mathcal{R}^{\mathcal{P}}_{A}(T) = \Omega(\sqrt{T})$ for any online policy $\mathcal{P}$.

  2. (b) $\mathcal{R}^{\text{RR}}_{A}(T) = \Omega(T)$.

  3. (c) For $\eta_t = \alpha\sqrt{t}$, $\alpha > 0$,

    $$\mathcal{R}^{\text{FTPL}}_{A}(T) \leq \sqrt{2T\log K}\left(\alpha + \frac{4\kappa^2}{\alpha}\right) + \frac{K^2 M(c + 2\kappa)}{2\alpha\sqrt{\pi}}\sqrt{T+1}.$$

  4. (d) For $\eta_t = \alpha\sqrt{t}$, $\alpha > 0$,

    $$\mathcal{R}^{\text{FTPL}}_{A}(T) \geq M\alpha_2/K.$$

  5. (e) For $\eta_t = \alpha\sqrt{t}$, $\alpha > 0$, $\beta > 1$, $\delta > 0$,

    $$\mathcal{R}^{\text{W-FTPL}}_{A}(T) \leq \sqrt{\beta T(\log M)^{1+\delta}} + \sqrt{2T\log K}\left(\alpha + \frac{4\kappa^2}{\alpha}\right) + \frac{K^2 M(c + 2\kappa)}{2\alpha\sqrt{\pi}}\sqrt{T+1}.$$

The key take-away from Theorem 1 is that RR has linear regret, while FTPL and W-FTPL have order-optimal regret, i.e., $O(\sqrt{T})$ with respect to the time horizon, under adversarial arrivals. The proof of Theorem 1 is in Section 6.

Our second result characterizes the regret performance of the policies discussed in Section 3 for i.i.d. stochastic arrivals.

Theorem 2 (Stochastic Arrivals).

Let the arrivals in each time-slot be i.i.d. stochastic with mean $\mu$ and $\Delta_{\text{min}} \neq 0$. Then we have

  1. (a) $\mathcal{R}^{\text{RR}}_{S}(T) = \Omega(T)$.

  2. (b) For $\eta_t = \alpha\sqrt{t-1}$, $\alpha > 0$,

    $$\mathcal{R}^{\text{FTPL}}_{S}(T) \leq \left(\sqrt{2\log K} + \frac{2\sqrt{2h_1}\log K}{\Delta_{\text{min}}}\right)\left(\alpha + \frac{4\kappa^2}{\alpha}\right) + \frac{16\alpha^2 + 4\kappa^2}{\Delta_{\text{min}}} + (16\alpha^2 + 3\kappa^2)\frac{M}{\Delta_{\text{min}}^2},$$

    where $h_1 = 4\max\{8\alpha^2, \kappa^2\}$.

  3. (c) For $\eta_t = \alpha\sqrt{t-1}$, $\alpha > 0$, $\beta > 1$, $\delta > 0$, and $M$ large enough, we have

    $$\mathcal{R}^{\text{W-FTPL}}_{S}(T) \leq 1 + \beta\kappa^2(\log M)^{1+\delta}\left(\frac{4}{\Delta_{\text{min}}^2} + \frac{16\alpha^2 + 3\kappa^2}{\Delta_{\text{min}}^2\Delta_{\text{max}}^2}\right) + (16\alpha^2 + 4\kappa^2)\left(\frac{1}{\Delta_{\text{min}}} + \frac{1}{\Delta_{\text{min}}^2}\right) + \left(\sqrt{2\log K} + \frac{2\sqrt{2h_1}\log K}{\Delta_{\text{min}}}\right)\left(\alpha + \frac{4\kappa^2}{\alpha}\right),$$

    where $h_1 = 4\max\{8\alpha^2, \kappa^2\}$.

Corollary 1 (Stochastic Arrivals).

Let the arrivals in each time-slot be i.i.d. stochastic with mean μ\mu and Δmin0\Delta_{\text{min}}\neq 0. For the W-FTPL policy, if ρt𝒫{0,1}\rho_{t}^{\mathcal{P}}\in\{0,1\}, β=max{(1+4α/κ)2,(1+2/κ)2}\beta=\max\{(1+4\alpha/\kappa)^{2},(1+\sqrt{2}/\kappa)^{2}\}, then the regret scales at most logarithmically with MM.

Remark 1.

While the result in Theorem 2 (c)(c) is stated for δ>0\delta>0, it can be shown that for δ=0\delta=0 and β\beta chosen suitably large, the regret of W-FTPL scales logarithmically with MM.

The key take-aways from Theorem 2 are that RR has linear regret with respect to time, while the regret of FTPL and W-FTPL does not scale with time. Moreover, the upper bound on the regret of W-FTPL scales as (logM)1+δ(\log M)^{1+\delta} for δ0\delta\geq 0, whereas the upper bound on the regret of FTPL scales linearly with MM. W-FTPL thus outperforms FTPL with respect to the dependence of regret on the fetch cost MM. We validate these results via simulations in Section 5.
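The waiting phase of W-FTPL can be sketched as a simple threshold test (our reading of the threshold TsT_{s} defined in Section 6.5; the function name and the default values of β\beta and δ\delta are illustrative assumptions):

```python
import math

def wait_phase_over(theta, kappa, M, t, beta=6.0, delta=0.5):
    """Check whether slot t ends the W-FTPL wait phase.

    theta: cumulative cost vector over the hosting levels. The wait phase
    ends at the first t with t < gap^2 / (kappa^2 * beta * (log M)^(1+delta)),
    where gap is the largest pairwise difference of cumulative costs, i.e.,
    once the observed cost gap is large enough to justify paying the fetch
    cost M. Until then, W-FTPL hosts nothing and fetches nothing.
    """
    gap = max(theta) - min(theta)
    return t < gap ** 2 / (kappa ** 2 * beta * math.log(M) ** (1 + delta))
```

A large fetch cost MM lengthens the wait phase only logarithmically, which is the source of W-FTPL's (logM)1+δ(\log M)^{1+\delta} dependence on MM.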

4.2 Competitive Ratio Analysis

Our next result characterizes the competitive ratio of the policies discussed in Section 3. The competitive ratio of RR is characterized in [3, 2], where it is shown that RR has a constant competitive ratio which decreases as the fetch cost MM increases. We present the results for the FTPL and W-FTPL policies below.

Theorem 3 (Adversarial Arrivals).

Let the arrivals be generated by an oblivious adversary under the constraint that at most κ\kappa requests arrive in each time-slot. Then we have

  1. (a)

    For ηt=αt1\eta_{t}=\alpha\sqrt{t-1}, α>0\alpha>0,

    σAFTPL(T)\displaystyle\sigma^{\text{FTPL}}_{A}(T)\leq κ2(3+2M/c)mini(cαi+g(αi)κ)maxαi01g(αi)αi\displaystyle\frac{\kappa^{2}(3+2M/c)}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}\max_{\alpha_{i}\neq 0}\frac{1-g(\alpha_{i})}{\alpha_{i}}
    +κ2(M+c)mini(cαi+g(αi)κ)i=2K16α2c2αi2.\displaystyle+\frac{\kappa^{2}(M+c)}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}\sum_{i=2}^{K}\frac{16\alpha^{2}}{c^{2}\alpha_{i}^{2}}.
  2. (b)

    For ηt=αt1\eta_{t}=\alpha\sqrt{t-1}, α>0\alpha>0, β>1\beta>1, δ0\delta\geq 0,

    σAW-FTPL(T)\displaystyle\sigma^{\text{W-FTPL}}_{A}(T)\leq κ2(3+2M/c)mini(cαi+g(αi)κ)maxαi01g(αi)αi\displaystyle\frac{\kappa^{2}(3+2M/c)}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}\max_{\alpha_{i}\neq 0}\frac{1-g(\alpha_{i})}{\alpha_{i}}
    +κ2(M+c)mini(cαi+g(αi)κ)i=2K16α2c2αi2\displaystyle+\frac{\kappa^{2}(M+c)}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}\sum_{i=2}^{K}\frac{16\alpha^{2}}{c^{2}\alpha_{i}^{2}}
    +κ2mini(cαi+g(αi)κ).\displaystyle+\frac{\kappa^{2}}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}.

The key take-away from Theorem 3 is that the competitive ratios of both FTPL and W-FTPL are bounded by a constant. The competitive ratio of FTPL and W-FTPL deteriorates with MM, while the competitive ratio of RR improves with MM when only one partial hosting level is permitted [2]. Note that the competitive ratio of FTPL and W-FTPL is bounded by a constant even when any finite number of partial hosting levels is allowed, whereas the competitive ratio result for RR in [2] holds only for a single partial hosting level.

Theorem 4 (Stochastic Arrivals).

Let the arrivals in each time-slot be i.i.d. stochastic with mean μ\mu. Then we have

  1. (a)

    For TT large enough, σSRR(T)>1\sigma^{RR}_{S}(T)>1.

  2. (b)

    For ηt=αt1\eta_{t}=\alpha\sqrt{t-1}, σSFTPL(T)1+O(1/T)\sigma^{\text{FTPL}}_{S}(T)\leq 1+\mathrm{O}(1/T).

  3. (c)

    For ηt=αt1\eta_{t}=\alpha\sqrt{t-1}, α>0\alpha>0, β>1\beta>1, δ>0\delta>0, σSW-FTPL(T)1+O(1/T)\sigma^{\text{W-FTPL}}_{S}(T)\leq 1+\mathrm{O}(1/T).

The key take-away from Theorem 4 is that for TT large enough, FTPL and W-FTPL outperform RR in terms of competitive ratio for stochastic arrivals.

In Section 5 we supplement the results stated in this section through simulations.

5 Simulations

In this section, we compare RR, FTPL and W-FTPL policies via simulations using synthetic request sequences, as well as real trace data from [24]. In all our simulations, we consider ηt=0.1t\eta_{t}=0.1\sqrt{t} for FTPL and W-FTPL, and β=6\beta=6 for W-FTPL.

5.1 Synthetic arrivals

In Figure 1, we plot the regret of different policies as a function of the time horizon TT. The parameters considered are c=0.1c=0.1, M=50M=50, κ=5\kappa=5. The request arrival sequence is similar to the one used in the proof of Theorem 1(b), i.e., we divide time into frames, and each frame starts with Mκc\lceil\frac{M}{\kappa-c}\rceil slots of κ\kappa requests each, followed by Mc\lceil\frac{M}{c}\rceil slots with zero requests. We consider 100 such frames for this experiment. The regret is averaged over 50 experiments. We observe that RR has linear regret in this case, which agrees with Theorem 1.
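The frame-structured arrival sequence described above can be generated as follows (a sketch with the parameter values of this experiment as defaults; the function name is our own):

```python
import math

def adversarial_frames(M=50.0, c=0.1, kappa=5, frames=100):
    """Frame-structured arrival sequence of the Figure 1 experiment:
    each frame has ceil(M/(kappa - c)) slots of kappa requests each,
    followed by ceil(M/c) request-free slots."""
    busy = math.ceil(M / (kappa - c))
    idle = math.ceil(M / c)
    return ([kappa] * busy + [0] * idle) * frames
```

With M=50M=50, c=0.1c=0.1, κ=5\kappa=5, each frame has 11 busy slots followed by 500 idle slots.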

Refer to caption
Figure 1: Regret as a function of Time horizon (TT)

5.2 Stochastic arrivals

For the stochastic case, we consider arrivals in each slot to be Bernoulli with parameter μ\mu. We restrict attention to the case where there is only one partial level of hosting, i.e., ρt𝒫{0,α2,1}\rho_{t}^{\mathcal{P}}\in\{0,\alpha_{2},1\}. We consider α2=0.5\alpha_{2}=0.5 and g(α2)=0.45g(\alpha_{2})=0.45. In Figures 2 and 3, we consider μ=0.4\mu=0.4, c=0.45c=0.45. All the results plotted are averaged over 50 independent experiments. In Figure 2, we fix M=5M=5 and compare the regret performance of the policies as a function of time. We observe that W-FTPL performs better than the other policies and that RR has linear regret, which agrees with Theorem 2. In Figure 3, we compare the performance of the policies with respect to MM for T=104T=10^{4}. We observe that W-FTPL performs better than the other policies and that the main difference between the costs of the policies is due to the fetch cost. Also note that W-FTPL has a better dependence on MM than FTPL, which agrees with Theorem 2. From Theorem 2, we observe that for large values of MM, the regret of FTPL is proportional to M/Δmin2M/\Delta_{\text{min}}^{2}, whereas that of W-FTPL is proportional to logM/Δmin2\log M/\Delta_{\text{min}}^{2}. Therefore, in Figure 4, we compare the performance of the policies by varying the rent cost cc for M=500M=500, μ=0.4\mu=0.4, and T=104T=10^{4}. Again, we note that W-FTPL outperforms RR and FTPL. As cc approaches μ\mu, the FTPL policy performs multiple fetches, and as a result, its regret increases.
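The benchmark in these plots is the best static hosting level, whose total cost is cαiT+g(αi)RT+Mαic\alpha_{i}T+g(\alpha_{i})R_{T}+M\alpha_{i}. A minimal sketch of computing these static costs for the Bernoulli setup above (our own illustration; the function name and the dictionary encoding of g()g(\cdot) are assumptions):

```python
import numpy as np

def static_costs(requests, levels, g, c, M):
    """Total cost c*alpha*T + g(alpha)*R_T + M*alpha of each static
    hosting level alpha for a given request sequence."""
    T, R = len(requests), float(sum(requests))
    return [c * a * T + g[a] * R + M * a for a in levels]

rng = np.random.default_rng(1)
reqs = rng.binomial(1, 0.4, size=10_000)   # Bernoulli(mu = 0.4) arrivals
g = {0.0: 1.0, 0.5: 0.45, 1.0: 0.0}        # g(0) = 1, g(0.5) = 0.45, g(1) = 0
costs = static_costs(reqs, [0.0, 0.5, 1.0], g, c=0.45, M=5)
```

The regret of a policy is its cumulative cost minus the minimum of these static costs.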

Refer to caption
Figure 2: Regret as a function of time horizon (TT)
Refer to caption
Figure 3: Cost, Regret as a function of fetch cost (MM)
Refer to caption
Figure 4: Cost, Regret as a function of rent cost (cc)

5.3 Trace driven arrivals

For trace-driven simulations, we use the data collected by the Grid Workloads Archive [24], which consists of requests arriving at computational servers. Among the traces described in [24], we use the DAS-2 traces, provided by the Advanced School for Computing and Imaging (ASCI). We take the slot duration to be 1 hour and plot a snapshot of the trace in Figure 5. We consider κ=300\kappa=300 for all the experiments in this subsection. All the results plotted for trace-driven experiments are averaged over 50 independent experiments.

We consider partial service hosting with one partial level for our simulations. This dataset provides the number of processors required to serve each request, which lies between 1 and 128 for all requests. For this experiment, we consider hosting the entire service as equivalent to having 128 processors at the edge and hosting the partial service as having 16 processors at the edge. It follows that the hosting status ρt𝒫{0,16/128(=0.125),1}\rho_{t}^{\mathcal{P}}\in\{0,16/128(=0.125),1\}. To compute g(0.125)g(0.125), we use the dataset to compute the fraction of requests that require more than 16 processors for service and obtain g(0.125)=0.0328g(0.125)=0.0328.
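The estimate of g(0.125)g(0.125) described above amounts to a one-line computation over the trace (a sketch; the function name and the list-based interface are our own, not part of the dataset's tooling):

```python
def partial_miss_fraction(processors_required, edge_capacity=16):
    """Fraction of requests needing more processors than the partially
    hosted service provides at the edge; with 16 of 128 processors hosted,
    this is the empirical estimate of g(0.125)."""
    reqs = list(processors_required)
    return sum(1 for p in reqs if p > edge_capacity) / len(reqs)
```

Applied to the per-request processor counts of the DAS-2 trace, this yields the value 0.0328 reported above.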

Refer to caption
Figure 5: Arrivals in a slot for trace data considered
Refer to caption
Figure 6: Regret as a function of time horizon
Refer to caption
Figure 7: Cost, Regret as a function of fetch cost (MM)
Refer to caption
Figure 8: Cost, Regret as a function of rent cost (cc)

In Figure 6, we consider M=500M=500, c=100c=100, κ=300\kappa=300 and compare the regret performance of the policies as a function of time. We observe that RR performs better than the other policies for most values of the time horizon. This is because RR attempts to mimic the optimal offline policy, which is not a static policy for this arrival sequence. Note that FTPL and W-FTPL attempt to mimic the optimal static policy and therefore perform worse than RR.

In Figure 7, we compare the performance of the different policies with respect to MM, fixing c=100c=100, κ=300\kappa=300. We observe that RR performs better than the other policies for some values of MM for the same reason mentioned above. In Figure 8, we compare the performance of the different policies by varying the rent cost cc, fixing M=500M=500, κ=300\kappa=300. Again, we note that RR outperforms W-FTPL and FTPL in some cases.

Our simulation results thus indicate that FTPL and W-FTPL outperform RR for i.i.d. stochastic arrivals. In the case where the arrival process is such that the offline optimal policy is not static, RR can outperform FTPL and W-FTPL. Additionally, RR can have linear regret with respect to time, while FTPL and W-FTPL have sub-linear regret with respect to time for all arrival processes considered.

6 Proof of theorems

6.1 Proof of Theorem 1(a)

Proof of Theorem 1(a).

Recall that regret of any policy is given by

A𝒫(T,r)=\displaystyle\mathcal{R}^{\mathcal{P}}_{A}(T,r)= t=1Tcρt+g(ρt)rt+M(ρtρt1)+\displaystyle\sum_{t=1}^{T}c\rho_{t}+g(\rho_{t})r_{t}+M(\rho_{t}-\rho_{t-1})^{+}
mini{cαiT+g(αi)RT+Mαi}.\displaystyle-\min_{i}\left\{c\alpha_{i}T+g(\alpha_{i})R_{T}+M\alpha_{i}\right\}.

Note that suprA𝒫(T,r)𝔼r[A𝒫(T,r)]\sup_{r}\mathcal{R}^{\mathcal{P}}_{A}(T,r)\geq{\mathbb{E}}_{r}[\mathcal{R}^{\mathcal{P}}_{A}(T,r)] where the expectation is w.r.t some specified distribution over request sequences. We lower bound 𝔼r[A𝒫(T,r)]{\mathbb{E}}_{r}[\mathcal{R}^{\mathcal{P}}_{A}(T,r)] to get a lower bound on A𝒫(T)\mathcal{R}^{\mathcal{P}}_{A}(T).

𝔼r[A𝒫(T,r)]=\displaystyle{\mathbb{E}}_{r}[\mathcal{R}^{\mathcal{P}}_{A}(T,r)]= t=1Tcρt+g(ρt)𝔼r[rt]+M(ρtρt1)+\displaystyle\sum_{t=1}^{T}c\rho_{t}+g(\rho_{t}){\mathbb{E}}_{r}[r_{t}]+M(\rho_{t}-\rho_{t-1})^{+}
𝔼r[mini{cαiT+g(αi)RT+Mαi}]\displaystyle-{\mathbb{E}}_{r}[\min_{i}\left\{c\alpha_{i}T+g(\alpha_{i})R_{T}+M\alpha_{i}\right\}]
\displaystyle\geq t=1Tcρt+g(ρt)𝔼r[rt]\displaystyle\sum_{t=1}^{T}c\rho_{t}+g(\rho_{t}){\mathbb{E}}_{r}[r_{t}]
𝔼r[mini{cαiT+g(αi)RT}]M.\displaystyle-{\mathbb{E}}_{r}[\min_{i}\left\{c\alpha_{i}T+g(\alpha_{i})R_{T}\right\}]-M.

Let =argmini1αi1g(αi)\ell=\operatorname*{arg\,min}_{i\neq 1}\frac{\alpha_{i}}{1-g(\alpha_{i})}, XBer(cακ(1g(α)))X\sim\text{Ber}(\frac{c\alpha_{\ell}}{\kappa(1-g(\alpha_{\ell}))}), and let rt=κXr_{t}=\kappa X, drawn i.i.d. over time, so that 𝔼r[rt]=cα1g(α){\mathbb{E}}_{r}[r_{t}]=\frac{c\alpha_{\ell}}{1-g(\alpha_{\ell})}. From Assumption 2, we have αi+g(αi)rtκ\alpha_{i}+g(\alpha_{i})r_{t}\leq\kappa for all 1iK1\leq i\leq K and 0rtκ0\leq r_{t}\leq\kappa; therefore cακ(1g(α))1\frac{c\alpha_{\ell}}{\kappa(1-g(\alpha_{\ell}))}\leq 1 and XX is a valid Bernoulli random variable.

𝔼r[A𝒫(T,r)]\displaystyle{\mathbb{E}}_{r}[\mathcal{R}^{\mathcal{P}}_{A}(T,r)]\geq t=1Tcρt+g(ρt)cα1g(α)\displaystyle\sum_{t=1}^{T}c\rho_{t}+g(\rho_{t})\frac{c\alpha_{\ell}}{1-g(\alpha_{\ell})}
𝔼r[mini{cαiT+g(αi)RT}]M\displaystyle-{\mathbb{E}}_{r}[\min_{i}\left\{c\alpha_{i}T+g(\alpha_{i})R_{T}\right\}]-M
=\displaystyle= t=1T(1g(ρt))cρt(1g(ρt))𝟙ρt0+g(ρt)cα1g(α)\displaystyle\sum_{t=1}^{T}(1-g(\rho_{t}))\frac{c\rho_{t}}{(1-g(\rho_{t}))}\mathds{1}_{\rho_{t}\neq 0}+g(\rho_{t})\frac{c\alpha_{\ell}}{1-g(\alpha_{\ell})}
𝔼r[mini{cαiT+g(αi)RT}]M\displaystyle-{\mathbb{E}}_{r}[\min_{i}\left\{c\alpha_{i}T+g(\alpha_{i})R_{T}\right\}]-M
(a)\displaystyle\overset{(a)}{\geq} t=1T(1g(ρt))cα(1g(α))+g(ρt)cα1g(α)\displaystyle\sum_{t=1}^{T}(1-g(\rho_{t}))\frac{c\alpha_{\ell}}{(1-g(\alpha_{\ell}))}+g(\rho_{t})\frac{c\alpha_{\ell}}{1-g(\alpha_{\ell})}
𝔼r[mini{cαiT+g(αi)RT}]M\displaystyle-{\mathbb{E}}_{r}[\min_{i}\left\{c\alpha_{i}T+g(\alpha_{i})R_{T}\right\}]-M
(b)\displaystyle\overset{(b)}{\geq} cTα1g(α)𝔼r[min{RT,cTα+g(α)RT}]M\displaystyle\frac{cT\alpha_{\ell}}{1-g(\alpha_{\ell})}-{\mathbb{E}}_{r}[\min\{R_{T},cT\alpha_{\ell}+g(\alpha_{\ell})R_{T}\}]-M
=(c)\displaystyle\overset{(c)}{=} 12𝔼r[|RTcTαg(α)RT|]M\displaystyle\frac{1}{2}{\mathbb{E}}_{r}[|R_{T}-cT\alpha_{\ell}-g(\alpha_{\ell})R_{T}|]-M
=\displaystyle= κ(1g(α))2𝔼r[|RTκTcακ(1g(α))|]M\displaystyle\frac{\kappa(1-g(\alpha_{\ell}))}{2}{\mathbb{E}}_{r}\left[\left|\frac{R_{T}}{\kappa}-\frac{Tc\alpha_{\ell}}{\kappa(1-g(\alpha_{\ell}))}\right|\right]-M
=(d)\displaystyle\overset{(d)}{=} Ω(T),\displaystyle\Omega(\sqrt{T}),

where (a)(a) follows from definition of \ell, (b)(b) follows because of min1kKakmin{ai,aj}\min_{1\leq k\leq K}a_{k}\leq\min\{a_{i},a_{j}\} for 1i,jK1\leq i,j\leq K, (c)(c) follows from Lemma 15 in Section 7, and (d)(d) follows from Lemma 16 in Section 7. ∎

6.2 Proof of Theorem 1(b)

Note that the RR algorithm in [3] can have only one intermediate level of partial hosting, i.e., ρtRR{0,α2,1}\rho_{t}^{\text{RR}}\in\{0,\alpha_{2},1\}. So throughout this subsection we consider ρtRR{0,α2,1}\rho_{t}^{\text{RR}}\in\{0,\alpha_{2},1\}. We use the following lemmas to prove Theorem 1(b).

Lemma 2.

RR does not fetch any non-zero fraction of the service for at least Mαiκcαig(αi)κ\left\lceil\frac{M\alpha_{i^{\prime}}}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}\right\rceil slots after completely evicting the service, where
αi=argminαi0Mαiκcαiκg(αi)\alpha_{i^{\prime}}=\operatorname*{arg\,min}_{\alpha_{i}\neq 0}\frac{M\alpha_{i}}{\kappa-c\alpha_{i}-\kappa g(\alpha_{i})}.

Proof.

Without loss of generality, let t=0t=0 be the slot in which RR last evicted the service. If the RR policy fetches an αi0\alpha_{i}\neq 0 fraction of the service in slot t+1t+1, then cαit+g(αi)Rt+Mαi<Rtc\alpha_{i}t+g(\alpha_{i})R_{t}+M\alpha_{i}<R_{t}.

We now prove the lemma by contradiction. Let us assume that RR fetches αi\alpha_{i} fraction of the service in time slot tf+1t_{f}+1 and tf<Mαiκcαig(αi)κ.t_{f}<\left\lceil\frac{M\alpha_{i^{\prime}}}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}\right\rceil. Then by using the definition of αi\alpha_{i^{\prime}}, we get tf<Mαiκcαig(αi)κt_{f}<\frac{M\alpha_{i}}{\kappa-c\alpha_{i}-g(\alpha_{i})\kappa} and cαitf+g(αi)κtf+Mαi>κtfRtfc\alpha_{i}t_{f}+g(\alpha_{i})\kappa{t_{f}}+M\alpha_{i}>\kappa t_{f}\geq R_{t_{f}} which is a contradiction. ∎

Lemma 3.

Once RR fetches any non-zero fraction of the service, it hosts some positive fraction of the service for at least Mc\lceil\frac{M}{c}\rceil slots before evicting it completely.

Proof.

Without loss of generality, let t=0t=0 be the slot in which RR last fetched the service, and say it hosted an αi0\alpha_{i}\neq 0 fraction of the service. For the RR policy to evict the service in slot t+1t+1, we need cαit+g(αi)Rt>Mαi+Rtc\alpha_{i}t+g(\alpha_{i})R_{t}>M\alpha_{i}+R_{t}, or equivalently Rt<cαitMαi1g(αi)R_{t}<\frac{c\alpha_{i}t-M\alpha_{i}}{1-g(\alpha_{i})}.

We now prove the lemma by contradiction. Let us assume that RR evicts the service in time slot te+1t_{e}+1 and te<Mct_{e}<\lceil\frac{M}{c}\rceil. Then from the assumption above, we must have Rte<cαiteMαi1g(αi)<0R_{t_{e}}<\frac{c\alpha_{i}t_{e}-M\alpha_{i}}{1-g(\alpha_{i})}<0 which is a contradiction. ∎

Proof of Theorem 1(b).

We split the entire time duration TT into frames of size tf+tet_{f}+t_{e} and consider the arrival sequence in each frame as rt=κr_{t}=\kappa for 1ttf1\leq t\leq t_{f} and rt=0r_{t}=0 for tf+1ttf+tet_{f}+1\leq t\leq t_{f}+t_{e}, where αi=argminαi0Mαiκcαiκg(αi)\alpha_{i^{\prime}}=\operatorname*{arg\,min}_{\alpha_{i}\neq 0}\frac{M\alpha_{i}}{\kappa-c\alpha_{i}-\kappa g(\alpha_{i})}, tf=Mαiκcαig(αi)κt_{f}=\left\lceil\frac{M\alpha_{i^{\prime}}}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}\right\rceil, and te=Mct_{e}=\lceil\frac{M}{c}\rceil.

From Lemma 2 we know that RR does not fetch any fraction of service till time tft_{f}. At t=tf+1t=t_{f}+1 RR fetches some positive fraction of the service because cαitf+g(αi)Rtf+Mαi<Rtfc\alpha_{i^{\prime}}t_{f}+g(\alpha_{i^{\prime}})R_{t_{f}}+M\alpha_{i^{\prime}}<R_{t_{f}} holds by definition of tft_{f}.

From Lemma 3 we know that RR does not evict the service for t[tf+1,tf+te]t\in[t_{f}+1,t_{f}+t_{e}]. At t=tf+te+1t=t_{f}+t_{e}+1 RR evicts the service because for αi0\alpha_{i}\neq 0, cαite+g(αi)t=tf+1tf+tert<Mαi+t=tf+1tf+tertc\alpha_{i}t_{e}+g(\alpha_{i})\sum_{t=t_{f}+1}^{t_{f}+t_{e}}r_{t}<M\alpha_{i}+\sum_{t=t_{f}+1}^{t_{f}+t_{e}}r_{t} holds by definition of tet_{e}.

So amongst the first tf+tet_{f}+t_{e} time-slots, the RR policy does not host for the first tft_{f} slots and then hosts some positive fraction of the service for the following tet_{e} slots. Let i=argminicαiT+g(αi)RT+Mαii^{*}=\operatorname*{arg\,min}_{i}c\alpha_{i}T+g(\alpha_{i})R_{T}+M\alpha_{i} denote the fraction of service hosted by the optimal static policy. If the optimal static policy hosts an αi0\alpha_{i^{*}}\neq 0 fraction of the service, then we incur a regret of at least (κcαig(αi)κ)tfMαi(\kappa-c\alpha_{i^{*}}-g(\alpha_{i^{*}})\kappa)t_{f}-M\alpha_{i^{*}} in the initial frame and (κcαig(αi)κ)tf=(κcαig(αi)κ)Mαiκcαig(αi)κ>0(\kappa-c\alpha_{i^{*}}-g(\alpha_{i^{*}})\kappa)t_{f}=(\kappa-c\alpha_{i^{*}}-g(\alpha_{i^{*}})\kappa)\lceil\frac{M\alpha_{i^{\prime}}}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}\rceil>0 from the second frame onwards because of forwarding; this is a constant that does not depend on TT. If the optimal static policy is not to host the service, then we incur a regret of cαite>0c\alpha_{i}t_{e}>0 up to time tf+tet_{f}+t_{e} for some αi0\alpha_{i}\neq 0, which is again a constant that does not depend on TT. So in either case, a regret larger than some positive constant (say dd) is incurred in each frame of size tf+tet_{f}+t_{e}. Since we split the entire time duration TT into frames of size tf+te=Mαiκcαig(αi)κ+Mct_{f}+t_{e}=\lceil\frac{M\alpha_{i^{\prime}}}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}\rceil+\lceil\frac{M}{c}\rceil and the request sequence is repeated in each frame, we get linear regret.

ARR(T)\displaystyle\mathcal{R}^{\text{RR}}_{A}(T) f=1T/(tf+te)dMαi\displaystyle\geq\sum_{f=1}^{\lfloor T/(t_{f}+t_{e})\rfloor}d-M\alpha_{i^{*}}
T/(tf+te)dMαi\displaystyle\geq\lfloor T/(t_{f}+t_{e})\rfloor d-M\alpha_{i^{*}}
d(TMαiκcαig(αi)κ+Mc+21)Mαi\displaystyle\geq d\left(\frac{T}{\frac{M\alpha_{i^{\prime}}}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}+\frac{M}{c}+2}-1\right)-M\alpha_{i^{*}}
=Ω(T).\displaystyle=\Omega(T).

6.3 Proof of Theorem 1(c)

The optimal static policy fetches the optimal fraction of the service to be hosted only once. Therefore,

AFTPL(T,r)=\displaystyle\mathcal{R}^{\text{FTPL}}_{A}(T,r)= 𝔼[𝒞FTPL(T,r)]\displaystyle{\mathbb{E}}[\mathcal{C}^{\text{FTPL}}(T,r)]
mini{cαiT+g(αi)RT+Mαi}\displaystyle-\min_{i}\{c\alpha_{i}T+g(\alpha_{i})R_{T}+M\alpha_{i}\}
\displaystyle\leq 𝔼[𝒞FTPL(T,r)]mini{cαiT+g(αi)RT}.\displaystyle{\mathbb{E}}[\mathcal{C}^{\text{FTPL}}(T,r)]-\min_{i}\{c\alpha_{i}T+g(\alpha_{i})R_{T}\}. (3)

Since the FTPL policy does not consider the fetch cost MM while making decisions, we can decouple the fetch cost and the non-fetch cost incurred by the FTPL policy. Therefore, to bound (3), we can also decouple the expected regret incurred by FTPL into the expected regret without fetch cost (M=0M=0) and the fetch cost incurred by FTPL. We first bound the expected regret of the FTPL policy with fetch cost M=0M=0 and then add its expected fetch cost to get the final regret bound.

Lemma 4.

The regret for FTPL policy with non decreasing learning rate {ηt}t=1T\{\eta_{t}\}_{t=1}^{T} and M=0M=0 is given by

AFTPL(T)\displaystyle\mathcal{R}^{\text{FTPL}}_{A}(T) 2logK(ηT+κ2t=1T1ηt).\displaystyle\leq\sqrt{2\log K}\left(\eta_{T}+\kappa^{2}\sum_{t=1}^{T}\frac{1}{\eta_{t}}\right).
Proof.

The proof follows along the same lines as Theorem 1 of [25] and Proposition 4.1 of [17]. Recall that ρt𝒫\rho_{t}^{\mathcal{P}} represents the fraction of the service hosted in slot tt by policy 𝒫\mathcal{P}, and 𝝆t𝒫\bm{\rho}_{t}^{\mathcal{P}} is a one-hot vector in which the position of the one indicates the level of service hosted at the edge. In slot tt, FTPL hosts the fraction ρtFTPL=𝝆tFTPL,𝒔\rho_{t}^{\text{FTPL}}=\langle\bm{\rho}_{t}^{\text{FTPL}},\bm{s}\rangle of the service, where 𝝆tFTPL=argmin𝝆𝒳𝝆,𝚯t+ηt𝜸\bm{\rho}_{t}^{\text{FTPL}}=\operatorname*{arg\,min}_{\bm{\rho}\in\mathcal{X}}\langle\bm{\rho},\bm{\Theta}_{t}+\eta_{t}\bm{\gamma}\rangle. For ease of notation, we suppress the policy in the superscript and denote 𝝆t𝒫\bm{\rho}_{t}^{\mathcal{P}} by 𝝆t\bm{\rho}_{t} and ρt𝒫\rho_{t}^{\mathcal{P}} by ρt\rho_{t} in the remainder of this discussion. Define the time-varying potential function

𝚽t(𝜽)=𝔼𝜸[min𝝆𝒳𝝆,𝜽+ηt𝜸],\displaystyle\bm{\Phi}_{t}(\bm{\theta})={\mathbb{E}}_{\bm{\gamma}}\left[\min_{\bm{\rho}\in\mathcal{X}}\langle\bm{\rho},\bm{\theta}+\eta_{t}\bm{\gamma}\rangle\right], (4)

and observe that the gradient of 𝚽t\bm{\Phi}_{t} at 𝚯t\bm{\Theta}_{t} is 𝔼𝜸[𝝆t]{\mathbb{E}}_{\bm{\gamma}}[\bm{\rho}_{t}]. So 𝚽t(𝚯t),𝜽t=𝔼𝜸[𝝆t,𝜽t]\langle\nabla\bm{\Phi}_{t}(\bm{\Theta}_{t}),\bm{\theta}_{t}\rangle={\mathbb{E}}_{\bm{\gamma}}[\langle\bm{\rho}_{t},\bm{\theta}_{t}\rangle]. Consequently

𝔼𝜸[𝝆t,𝜽t]\displaystyle{\mathbb{E}}_{\bm{\gamma}}[\langle\bm{\rho}_{t},\bm{\theta}_{t}\rangle] =𝚽t(𝚯t),𝚯t+1𝚯t,\displaystyle=\langle\nabla\bm{\Phi}_{t}(\bm{\Theta}_{t}),\bm{\Theta}_{t+1}-\bm{\Theta}_{t}\rangle,
=𝚽t(𝚯t+1)𝚽t(𝚯t)12𝜽t,2Φt(𝜽~t)𝜽t,\displaystyle=\bm{\Phi}_{t}(\bm{\Theta}_{t+1})-\bm{\Phi}_{t}(\bm{\Theta}_{t})-\frac{1}{2}\langle\bm{\theta}_{t},\nabla^{2}\Phi_{t}(\tilde{\bm{\theta}}_{t})\bm{\theta}_{t}\rangle,

where 𝜽~t\tilde{\bm{\theta}}_{t} is a point on the line segment joining 𝚯𝒕+𝟏\bm{\Theta_{t+1}} and 𝚯𝒕\bm{\Theta_{t}}; this follows from Taylor's expansion of 𝚽t(𝚯t+1)\bm{\Phi}_{t}(\bm{\Theta}_{t+1}).

t=1T𝔼𝜸[𝝆t,𝜽t]\displaystyle\sum_{t=1}^{T}{\mathbb{E}}_{\bm{\gamma}}\left[\langle\bm{\rho}_{t},\bm{\theta}_{t}\rangle\right]
=t=1T𝚽t(𝚯t+1)𝚽t(𝚯t)12𝜽t,2Φt(𝜽~t)𝜽t,\displaystyle=\sum_{t=1}^{T}\bm{\Phi}_{t}(\bm{\Theta}_{t+1})-\bm{\Phi}_{t}(\bm{\Theta}_{t})-\frac{1}{2}\langle\bm{\theta}_{t},\nabla^{2}\Phi_{t}(\tilde{\bm{\theta}}_{t})\bm{\theta}_{t}\rangle,
=𝚽T(𝚯T+1)𝚽1(𝚯1)+t=2T𝚽t1(𝚯t)𝚽t(𝚯t)\displaystyle=\bm{\Phi}_{T}(\bm{\Theta}_{T+1})-\bm{\Phi}_{1}(\bm{\Theta}_{1})+\sum_{t=2}^{T}\bm{\Phi}_{t-1}(\bm{\Theta}_{t})-\bm{\Phi}_{t}(\bm{\Theta}_{t})
t=1T12𝜽t,2Φt(𝜽~t)𝜽t.\displaystyle-\sum_{t=1}^{T}\frac{1}{2}\langle\bm{\theta}_{t},\nabla^{2}\Phi_{t}(\tilde{\bm{\theta}}_{t})\bm{\theta}_{t}\rangle. (5)

By Jensen’s inequality

𝚽T(𝚯T+1)\displaystyle\bm{\Phi}_{T}(\bm{\Theta}_{T+1}) =𝔼[min𝝆𝒳𝝆,𝚯T+1+ηt𝜸]\displaystyle={\mathbb{E}}\left[\min_{\bm{\rho}\in\mathcal{X}}\langle\bm{\rho},\bm{\Theta}_{T+1}+\eta_{t}\bm{\gamma}\rangle\right]
min𝝆𝒳𝔼[𝝆,𝚯T+1+ηt𝜸]\displaystyle\leq\min_{\bm{\rho}\in\mathcal{X}}{\mathbb{E}}\left[\langle\bm{\rho},\bm{\Theta}_{T+1}+\eta_{t}\bm{\gamma}\rangle\right]
=(a)min𝝆𝒳𝝆,𝚯T+1,\displaystyle\overset{(a)}{=}\min_{\bm{\rho}\in\mathcal{X}}\langle\bm{\rho},\bm{\Theta}_{T+1}\rangle,

where (a)(a) follows because γi\gamma_{i}’s are standard Gaussian random variables. Recall that min𝝆𝒳𝝆,𝚯T+1\min_{\bm{\rho}\in\mathcal{X}}\langle\bm{\rho},\bm{\Theta}_{T+1}\rangle is the loss incurred by the offline static policy. Therefore by rearranging (5) and using Jensen’s inequality we get

AFTPL(T)\displaystyle\mathcal{R}^{\text{FTPL}}_{A}(T)\leq 𝚽1(𝚯𝟏)+t=2T𝚽t1(𝚯t)𝚽t(𝚯t)\displaystyle-\bm{\Phi}_{1}(\bm{\Theta_{1}})+\sum_{t=2}^{T}\bm{\Phi}_{t-1}(\bm{\Theta}_{t})-\bm{\Phi}_{t}(\bm{\Theta}_{t})
t=1T12𝜽t,2Φt(𝜽~t)𝜽t.\displaystyle-\sum_{t=1}^{T}\frac{1}{2}\langle\bm{\theta}_{t},\nabla^{2}\Phi_{t}(\tilde{\bm{\theta}}_{t})\bm{\theta}_{t}\rangle. (6)

We bound each term in the RHS separately to get the final result.

The first term is 𝚽1(𝚯1)=𝚽1(0)=η1𝔼𝜸[min𝝆𝝆,𝜸]-\bm{\Phi}_{1}(\bm{\Theta}_{1})=-\bm{\Phi}_{1}(0)=-\eta_{1}{\mathbb{E}}_{\bm{\gamma}}[\min_{\bm{\rho}}\langle\bm{\rho},\bm{\gamma}\rangle]
=η1𝔼𝜸[max𝝆𝒳𝝆,𝜸]=\eta_{1}{\mathbb{E}}_{\bm{\gamma}}[\max_{\bm{\rho}\in\mathcal{X}}\langle\bm{\rho},\bm{\gamma}\rangle] since Gaussian random variables are symmetric. By using the result of Lemma 9 in [25], we get

𝚽1(0)η12logK.-\bm{\Phi}_{1}(0)\leq\eta_{1}\sqrt{2\log K}. (7)

The second term can be analyzed by considering

𝚽t1(𝚯t)𝚽t(𝚯t)\displaystyle\bm{\Phi}_{t-1}(\bm{\Theta}_{t})-\bm{\Phi}_{t}(\bm{\Theta}_{t})
=𝔼𝜸[min𝝆𝒳𝝆,𝜽t+ηt1𝜸min𝝆𝒳𝝆,𝜽𝒕+ηt𝜸]\displaystyle={\mathbb{E}}_{\bm{\gamma}}[\min_{\bm{\rho}\in\mathcal{X}}\langle\bm{\rho},\bm{\theta}_{t}+\eta_{t-1}\bm{\gamma}\rangle-\min_{\bm{\rho}\in\mathcal{X}}\langle\bm{\rho},\bm{\theta_{t}}+\eta_{t}\bm{\gamma}\rangle]
(a)𝔼𝜸[max𝝆𝒳𝝆,(ηt1ηt)𝜸]\displaystyle\overset{(a)}{\leq}{\mathbb{E}}_{\bm{\gamma}}[\max_{\bm{\rho}\in\mathcal{X}}\langle\bm{\rho},(\eta_{t-1}-\eta_{t})\bm{\gamma}\rangle]
|ηtηt1|2logK,\displaystyle\leq|\eta_{t}-\eta_{t-1}|\sqrt{2\log K}, (8)

where (a)(a) follows because max𝒚𝒴(f(𝒚)g(𝒚))max𝒚𝒴f(𝒚)max𝒚𝒴g(𝒚)\max_{\bm{y}\in\mathcal{Y}}(f(\bm{y})-g(\bm{y}))\geq\max_{\bm{y}\in\mathcal{Y}}f(\bm{y})-\max_{\bm{y}\in\mathcal{Y}}g(\bm{y}) for any arbitrary functions ff, gg. We obtain (8) by using Lemma 9 in [25].

Now we bound the third term in (6). Let H=2Φt(𝜽~t)H=\nabla^{2}\Phi_{t}(\tilde{\bm{\theta}}_{t}). By using Lemma 7 of [26] we have,

Hi,j=1ηt𝔼[x^(𝜽t~+ηt𝜸)i𝜸j].\displaystyle H_{i,j}=\frac{1}{\eta_{t}}{\mathbb{E}}[\hat{x}(\tilde{\bm{\theta}_{t}}+\eta_{t}\bm{\gamma})_{i}\bm{\gamma}_{j}].

If we replace ηt\eta_{t} with ηt/κ\eta_{t}/\kappa and θt\theta_{t} with θt/κ\theta_{t}/\kappa and apply Lemma 2 of [25] we get

𝜽tκ,κH𝜽tκ2κηt2logK\displaystyle-\bigg{\langle}\frac{\bm{\theta}_{t}}{\kappa},\kappa H\frac{\bm{\theta}_{t}}{\kappa}\bigg{\rangle}\leq\frac{2\kappa}{\eta_{t}}\sqrt{2\log K}
𝜽t,2Φt(𝜽~t)𝜽t2κ2ηt2logK.\displaystyle\implies-\langle\bm{\theta}_{t},\nabla^{2}\Phi_{t}(\tilde{\bm{\theta}}_{t})\bm{\theta}_{t}\rangle\leq\frac{2\kappa^{2}}{\eta_{t}}\sqrt{2\log K}. (9)

By using (7), (8) and (9), we bound (6),

AFTPL(T)\displaystyle\mathcal{R}^{\text{FTPL}}_{A}(T)\leq η12logK+t=2T|ηtηt1|2logK\displaystyle\eta_{1}\sqrt{2\log K}+\sum_{t=2}^{T}|\eta_{t}-\eta_{t-1}|\sqrt{2\log K}
+t=1T2κ2ηt2logK\displaystyle+\sum_{t=1}^{T}\frac{2\kappa^{2}}{\eta_{t}}\sqrt{2\log K}
\displaystyle\leq ηT2logK+t=1T2κ2ηt2logK\displaystyle\eta_{T}\sqrt{2\log K}+\sum_{t=1}^{T}\frac{2\kappa^{2}}{\eta_{t}}\sqrt{2\log K}
\displaystyle\leq ηT2logK+2κ22logKt=1T1ηt.\displaystyle\eta_{T}\sqrt{2\log K}+2\kappa^{2}\sqrt{2\log K}\sum_{t=1}^{T}\frac{1}{\eta_{t}}.

Now we bound the expected fetch cost incurred by FTPL policy.

Lemma 5.

The expected fetch cost under FTPL policy upto time TT denoted by 𝔼[𝒞FFTPL(T)]{\mathbb{E}}[\mathcal{C}^{\text{FTPL}}_{F}(T)] with learning rate ηt=αt\eta_{t}=\alpha\sqrt{t} is bounded as follows

𝔼[𝒞FFTPL(T)]MK2(c+2κ)2απT+1.{\mathbb{E}}[\mathcal{C}^{\text{FTPL}}_{F}(T)]\leq\frac{MK^{2}(c+2\kappa)}{2\alpha\sqrt{\pi}}{\sqrt{T+1}}.
Proof.

The proof follows a similar line of argument as in [17] for the case with N=2N=2 and cache size 1. The fetch cost is incurred only when the service is fetched. The overall cost of fetching is t=1TM(ρt+1ρt)+\sum_{t=1}^{T}M(\rho_{t+1}-\rho_{t})^{+}. So, the expected fetch cost is given by

𝔼[𝒞FFTPL(T)]\displaystyle{\mathbb{E}}[\mathcal{C}^{\text{FTPL}}_{F}(T)] =𝔼[t=1TM(ρt+1ρt)𝟙{ρt+1>ρt}]\displaystyle={\mathbb{E}}\left[\sum_{t=1}^{T}M(\rho_{t+1}-\rho_{t})\mathds{1}_{\{\rho_{t+1}>\rho_{t}\}}\right]
Mt=1T(ρt+1>ρt)\displaystyle\leq M\sum_{t=1}^{T}{\mathbb{P}}(\rho_{t+1}>\rho_{t})
Mt=1Tj>i(ρt+1=αj,ρt=αi).\displaystyle\leq M\sum_{t=1}^{T}\sum_{j>i}{\mathbb{P}}(\rho_{t+1}=\alpha_{j},\rho_{t}=\alpha_{i}).

Let i,j(t)\mathcal{E}_{i,j}^{(t)} be the event that ρt=αi\rho_{t}=\alpha_{i}, ρt+1=αj\rho_{t+1}=\alpha_{j}. i,j(t)\mathcal{E}_{i,j}^{(t)} occurs only if Θt,i+ηtγi<Θt,j+ηtγj\Theta_{t,i}+\eta_{t}\gamma_{i}<\Theta_{t,j}+\eta_{t}\gamma_{j}, Θt+1,i+ηt+1γi>Θt+1,j+ηt+1γj\Theta_{t+1,i}+\eta_{t+1}\gamma_{i}>\Theta_{t+1,j}+\eta_{t+1}\gamma_{j}. Let δα=αjαi\delta_{\alpha}=\alpha_{j}-\alpha_{i}, δg=g(αi)g(αj)\delta_{g}=g(\alpha_{i})-g(\alpha_{j}), note that 0<δα10<\delta_{\alpha}\leq 1, 0<δg10<\delta_{g}\leq 1.

(i,j(t))\displaystyle{\mathbb{P}}(\mathcal{E}_{i,j}^{(t)})
(Θt,i+ηtγi<Θt,j+ηtγj,\displaystyle\leq{\mathbb{P}}\left(\Theta_{t,i}+\eta_{t}\gamma_{i}<\Theta_{t,j}+\eta_{t}\gamma_{j},\right.
Θt+1,i+ηt+1γi>Θt+1,j+ηt+1γj)\displaystyle\quad\qquad\left.\Theta_{t+1,i}+\eta_{t+1}\gamma_{i}>\Theta_{t+1,j}+\eta_{t+1}\gamma_{j}\right)
=(Θt+1,jΘt+1,i2ηt+1<(γiγj)2<Θt,jΘt,i2ηt)\displaystyle={\mathbb{P}}\left(\frac{\Theta_{t+1,j}-\Theta_{t+1,i}}{\sqrt{2}\eta_{t+1}}<\frac{(\gamma_{i}-\gamma_{j})}{\sqrt{2}}<\frac{\Theta_{t,j}-\Theta_{t,i}}{\sqrt{2}\eta_{t}}\right)
(a)Θt,jΘt,i2πηtΘt+1,jΘt+1,i2πηt+1\displaystyle\overset{(a)}{\leq}\frac{\Theta_{t,j}-\Theta_{t,i}}{2\sqrt{\pi}\eta_{t}}-\frac{\Theta_{t+1,j}-\Theta_{t+1,i}}{2\sqrt{\pi}\eta_{t+1}}
=12π((c(t1)δαδgRt1)(1ηt1ηt+1))\displaystyle=\frac{1}{2\sqrt{\pi}}\left((c(t-1)\delta_{\alpha}-\delta_{g}R_{t-1})\left(\frac{1}{\eta_{t}}-\frac{1}{\eta_{t+1}}\right)\right)
12π(cδαδgrtηt+1)\displaystyle\quad-\frac{1}{2\sqrt{\pi}}\left(\frac{c\delta_{\alpha}-\delta_{g}r_{t}}{\eta_{t+1}}\right)
12π(ct(1ηt1ηt+1)+δgrtcδαηt+1)\displaystyle\leq\frac{1}{2\sqrt{\pi}}\left(ct\left(\frac{1}{\eta_{t}}-\frac{1}{\eta_{t+1}}\right)+\frac{\delta_{g}r_{t}-c\delta_{\alpha}}{\eta_{t+1}}\right)
(b)12π(ct(1ηt1ηt+1)+κηt+1)\displaystyle\overset{(b)}{\leq}\frac{1}{2\sqrt{\pi}}\left(ct\left(\frac{1}{\eta_{t}}-\frac{1}{\eta_{t+1}}\right)+\frac{\kappa}{\eta_{t+1}}\right) (10)
=12απ(ct(1t1t+1)+κt+1)\displaystyle=\frac{1}{2\alpha\sqrt{\pi}}\left(ct\left(\frac{1}{\sqrt{t}}-\frac{1}{\sqrt{t+1}}\right)+\frac{\kappa}{\sqrt{t+1}}\right)
=12απ(ctt+11t+t+1+κt+1)\displaystyle=\frac{1}{2\alpha\sqrt{\pi}}\left(c\sqrt{\frac{t}{t+1}}\frac{1}{\sqrt{t}+\sqrt{t+1}}+\frac{\kappa}{\sqrt{t+1}}\right)
(c)(c+2κ)4απ(t+1).\displaystyle\overset{(c)}{\leq}\frac{(c+2\kappa)}{4\alpha\sqrt{\pi(t+1)}}.

Here (a)(a) follows from the fact that (a<Z<b)ba2π{\mathbb{P}}(a<Z<b)\leq\frac{b-a}{\sqrt{2\pi}}, where Z𝒩(0,1)Z\sim\mathcal{N}(0,1), (b)(b) follows because δg1\delta_{g}\leq 1 and rtκr_{t}\leq\kappa, and (c)(c) follows because tt+t+112\frac{\sqrt{t}}{\sqrt{t}+\sqrt{t+1}}\leq\frac{1}{2}. Therefore,

𝔼[𝒞fFTPL(T)]\displaystyle{\mathbb{E}}[\mathcal{C}_{f}^{\text{FTPL}}(T)] M(c+2κ)4απt=1Tj>i1t+1\displaystyle\leq\frac{M(c+2\kappa)}{4\alpha\sqrt{\pi}}\sum_{t=1}^{T}\sum_{j>i}\frac{1}{\sqrt{t+1}}
MK2(c+2κ)2απT+1.\displaystyle\leq\frac{MK^{2}(c+2\kappa)}{2\alpha\sqrt{\pi}}{\sqrt{T+1}}.
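The Gaussian bound used in step (a)(a) above holds because the standard normal density never exceeds 12π\frac{1}{\sqrt{2\pi}}. A quick numeric check (our own illustration; the function names are not part of the paper):

```python
import math

def normal_interval_prob(a, b):
    """P(a < Z < b) for standard normal Z, via the error function."""
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return cdf(b) - cdf(a)

def density_bound(a, b):
    """Upper bound (b - a)/sqrt(2*pi), since the N(0,1) density
    is at most 1/sqrt(2*pi) everywhere."""
    return (b - a) / math.sqrt(2.0 * math.pi)
```

For instance, on the interval (-0.5, 0.5) the exact probability is about 0.383 while the bound gives about 0.399.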

Proof of Theorem 1(c).

By using Lemmas 4 and 5, we get

AFTPL(T)\displaystyle\mathcal{R}^{\text{FTPL}}_{A}(T)\leq 2logK(αT+2κ2t=1T1αt)\displaystyle\sqrt{2\log K}\left(\alpha\sqrt{T}+2\kappa^{2}\sum_{t=1}^{T}\frac{1}{\alpha\sqrt{t}}\right)
+K2M(c+2κ)2απT+1\displaystyle+\frac{K^{2}M(c+2\kappa)}{2\alpha\sqrt{\pi}}\sqrt{T+1}
\displaystyle\leq 2TlogK(α+4κ2α)\displaystyle\sqrt{2T\log K}\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right)
+K2M(c+2κ)2απT+1.\displaystyle+\frac{K^{2}M(c+2\kappa)}{2\alpha\sqrt{\pi}}\sqrt{T+1}.

6.4 Proof of Theorem 1(d)

Proof of Theorem 1(d).

We characterize the fetch cost under FTPL to analyze the impact of MM on the total cost it incurs. A fetch cost is incurred if ρt1=0\rho_{t-1}=0 and ρt=1\rho_{t}=1. Consider the request sequence r1=1r_{1}=1 and rt=0r_{t}=0 for all t2t\geq 2. The optimal static policy is to not host any fraction of the service, i.e., to forward all requests to the back-end server.

The probability of hosting a fraction αi\alpha_{i}, 1iK1\leq i\leq K, of the service in the first slot is 1/K1/K. Hosting any non-zero fraction of the service incurs a fetch cost of at least Mα2M\alpha_{2}. Therefore, the expected fetch cost of the FTPL policy is at least Mα2/KM\alpha_{2}/K, which is a lower bound on the expected cost of FTPL for this request sequence. Since the fetch cost and the non-fetch cost under FTPL can be decoupled, and FTPL does not take MM into account while making decisions, Mα2/KM\alpha_{2}/K is also a lower bound on the regret of FTPL. This completes the proof.
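The uniform first-slot choice can be illustrated by simulation. The sketch below assumes that, with no request history, the perturbed costs of all K arms coincide up to the i.i.d. Gaussian perturbations, so the leader in the first slot is uniform over the K fractions; the values of K, M, and the fractions are illustrative:

```python
import random
from collections import Counter

random.seed(0)
K = 4                               # number of hosting fractions (illustrative)
M = 10.0                            # fetch cost per unit fraction (illustrative)
alphas = [0.0, 0.25, 0.5, 1.0]      # hosting fractions, alpha_1 = 0
trials = 200_000

counts = Counter()
total_fetch = 0.0
for _ in range(trials):
    # With no past requests all cumulative costs coincide, so the perturbed
    # leader is decided by the Gaussian perturbations alone.
    gammas = [random.gauss(0.0, 1.0) for _ in range(K)]
    i = min(range(K), key=lambda k: gammas[k])
    counts[i] += 1
    total_fetch += M * alphas[i]    # fetching a fraction alpha_i costs M * alpha_i

# Each fraction is hosted with probability ~ 1/K in the first slot, ...
for i in range(K):
    assert abs(counts[i] / trials - 1.0 / K) < 0.01
# ... so the expected first-slot fetch cost is at least M * alpha_2 / K.
assert total_fetch / trials >= M * alphas[1] / K
```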

6.5 Proof of Theorem 1(e)

Proof of Theorem 1(e).

Recall that Ts=min{t:t<(maxij(Θt+1,iΘt+1,j))2κ2β(logM)1+δ}T_{s}=\min\{t:t<\frac{(\max_{i\neq j}(\Theta_{t+1,i}-\Theta_{t+1,j}))^{2}}{\kappa^{2}\beta(\log M)^{1+\delta}}\}. For arrival sequences with requests arriving in time-slots 1 to TT, there are two possible cases, namely TsTT_{s}\geq T and Ts<TT_{s}<T. We bound the regret of the W-FTPL policy in each case.
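For concreteness, the waiting time T_s recalled above can be computed directly from a request sequence. A minimal sketch, assuming Θ_{t+1,i} = cα_i t + g(α_i)R_t as in the rest of the analysis; the rent function g and all parameter values below are illustrative:

```python
import math

def waiting_time(requests, alphas, c, g, kappa, beta, M, delta):
    """First slot t at which t < (max_{i != j} Theta_{t+1,i} - Theta_{t+1,j})^2
    / (kappa^2 * beta * (log M)^(1 + delta)), i.e. the end of the wait phase.
    Returns None if the horizon ends before the condition is met."""
    R = 0.0                                           # cumulative requests R_t
    threshold = kappa ** 2 * beta * math.log(M) ** (1 + delta)
    for t, r in enumerate(requests, start=1):
        R += r
        # Theta_{t+1,i}: cumulative cost of statically hosting fraction alpha_i
        theta = [c * a * t + g(a) * R for a in alphas]
        if t < (max(theta) - min(theta)) ** 2 / threshold:
            return t
    return None

# Illustrative instance: two fractions, g(a) = 1 - a, constant unit arrivals.
ts = waiting_time([1.0] * 1000, [0.0, 1.0], c=0.5, g=lambda a: 1.0 - a,
                  kappa=1.0, beta=4.0, M=20.0, delta=0.1)
```

On this two-arm instance the static cost gap grows linearly in t, so the wait phase ends after O((log M)^{1+δ}) slots, consistent with the wait-phase regret bound derived below.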

Case I (TsTT_{s}\geq T): Let r(1)r^{(1)} be a request sequence chosen by the adversary such that TsTT_{s}\geq T. Then

|maxij(ΘT+1,jΘT+1,i)|\displaystyle|\max_{i\neq j}(\Theta_{T+1,j}-\Theta_{T+1,i})| <κβT(logM)1+δ\displaystyle<\kappa\sqrt{\beta T(\log M)^{1+\delta}}
|maxi(ΘT+1,1ΘT+1,i)|\displaystyle\implies|\max_{i}(\Theta_{T+1,1}-\Theta_{T+1,i})| <κβT(logM)1+δ\displaystyle<\kappa\sqrt{\beta T(\log M)^{1+\delta}}
(a)AW-FTPL(T)\displaystyle\overset{(a)}{\implies}\mathcal{R}^{\text{W-FTPL}}_{A}(T) <κβT(logM)1+δ,\displaystyle<\kappa\sqrt{\beta T(\log M)^{1+\delta}},

Here, (a)(a) follows because ρt=0\rho_{t}=0 for all t<Tst<T_{s} under W-FTPL.

Case II (Ts<TT_{s}<T): Let r(2)r^{(2)} be a request sequence chosen by the adversary such that Ts<TT_{s}<T. Then, proceeding as in Case I, we can bound

|maxijΘTs,iΘTs,j|\displaystyle|\max_{i\neq j}\Theta_{T_{s},i}-\Theta_{T_{s},j}| <κβ(Ts1)(logM)1+δ\displaystyle<\kappa\sqrt{\beta(T_{s}-1)(\log M)^{1+\delta}}
AW-FTPL(Ts1)\displaystyle\implies\mathcal{R}^{\text{W-FTPL}}_{A}(T_{s}-1) <κβT(logM)1+δ.\displaystyle<\kappa\sqrt{\beta T(\log M)^{1+\delta}}.

Combining both cases, we obtain the following upper bound on the regret in the wait phase:

AW-FTPL(Ts1)\displaystyle\mathcal{R}^{\text{W-FTPL}}_{A}(T_{s}-1) <κβT(logM)1+δ.\displaystyle<\kappa\sqrt{\beta T(\log M)^{1+\delta}}. (11)

Note that the W-FTPL policy follows the FTPL policy after its waiting time, i.e., from time TsT_{s}. Therefore, W-FTPL and FTPL make the same decisions and incur the same cost from time TsT_{s} to TT, and hence the per-slot regret of the two policies also coincides over this interval. If TsTT_{s}\geq T, only the wait phase occurs and the regret in the FTPL phase is taken to be zero. We denote the regret from time t1t_{1} to t2t_{2} by A𝒫(t1:t2)\mathcal{R}_{A}^{\mathcal{P}}(t_{1}:t_{2}).

AW-FTPL(T)\displaystyle\mathcal{R}_{A}^{\text{W-FTPL}}(T) =AW-FTPL(Ts1)+AW-FTPL(Ts:T)\displaystyle=\mathcal{R}_{A}^{\text{W-FTPL}}(T_{s}-1)+\mathcal{R}_{A}^{\text{W-FTPL}}(T_{s}:T)
=AW-FTPL(Ts1)+AFTPL(Ts:T)\displaystyle=\mathcal{R}_{A}^{\text{W-FTPL}}(T_{s}-1)+\mathcal{R}_{A}^{\text{FTPL}}(T_{s}:T)
AW-FTPL(Ts1)+AFTPL(1:T).\displaystyle\leq\mathcal{R}_{A}^{\text{W-FTPL}}(T_{s}-1)+\mathcal{R}_{A}^{\text{FTPL}}(1:T). (12)

Combining (11), (12), and Theorem 1(c), we get

AW-FTPL(T)\displaystyle\mathcal{R}^{\text{W-FTPL}}_{A}(T)\leq κβT(logM)1+δ\displaystyle\kappa\sqrt{\beta T(\log M)^{1+\delta}}
+2TlogK(α+4κ2α)\displaystyle+\sqrt{2T\log K}\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right)
+K2M(c+2κ)2απT+1.\displaystyle+\frac{K^{2}M(c+2\kappa)}{2\alpha\sqrt{\pi}}\sqrt{T+1}.

6.6 Proof of Theorem 2(a)

Proof of Theorem 2(a).

From Lemma 3 we know that once RR fetches any non-zero fraction of the service, it hosts some positive fraction of the service for at least Mc\left\lceil\frac{M}{c}\right\rceil slots before evicting it completely. From Lemma 2, we know that once RR evicts the complete service from the edge, it does not fetch any non-zero fraction of the service for at least Mαiκcαig(αi)κ\left\lceil\frac{M\alpha_{i^{\prime}}}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}\right\rceil slots. We divide the time horizon TT into frames of fs=Mc+Mαiκcαig(αi)κf_{s}=\left\lceil\frac{M}{c}\right\rceil+\left\lceil\frac{M\alpha_{i^{\prime}}}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}\right\rceil slots each. Define \mathcal{F} as the event that a frame starts with Mαiκcαig(αi)κ\left\lceil\frac{M\alpha_{i^{\prime}}}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}\right\rceil slots with κ\kappa arrivals in each slot, followed by Mc\left\lceil\frac{M}{c}\right\rceil slots with zero arrivals, and let (){\mathbb{P}}(\mathcal{F}) denote its probability (independent of TT). Conditioned on event \mathcal{F}, if the optimal static policy hosts a non-zero fraction αi0\alpha_{i}\neq 0 of the service, the regret in the frame is at least (κcαig(αi)κ)Mαiκcαig(αi)κ>0(\kappa-c\alpha_{i}-g(\alpha_{i})\kappa)\left\lceil\frac{M\alpha_{i^{\prime}}}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}\right\rceil>0. If the optimal static policy is to not host any fraction of the service, the regret in the frame is at least cαiMc>0c\alpha_{i}\left\lceil\frac{M}{c}\right\rceil>0, where αi0\alpha_{i}\neq 0 is the fraction fetched by RR. Therefore, conditioned on event \mathcal{F}, RR always incurs a finite non-zero regret, say d(>0)d(>0), which is independent of TT. Therefore,

SRR(T)\displaystyle\mathcal{R}^{\text{RR}}_{S}(T) =𝔼r[f=1T/fsd𝟙(event  occurred)]\displaystyle={\mathbb{E}}_{r}\left[\sum_{f=1}^{\lfloor T/f_{s}\rfloor}d\mathds{1}(\text{event $\mathcal{F}$ occurred})\right]
(Tfs1)()d\displaystyle\geq\left(\frac{T}{f_{s}}-1\right){\mathbb{P}}(\mathcal{F})d
(TMκcαig(αi)κ+Mc+21)()d\displaystyle\geq\left(\frac{T}{\frac{M}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}+\frac{M}{c}+2}-1\right){\mathbb{P}}(\mathcal{F})d
=Ω(T).\displaystyle=\Omega(T).

So even in the stochastic case, RR incurs linear regret. ∎

6.7 Proof of Theorem 2(b)

Lemma 6.

The probability of hosting a sub-optimal fraction αjαi\alpha_{j}\neq\alpha_{i^{\star}} of the service under the FTPL policy in time slot t+1t+1 is bounded as follows:

(ρt+1=αj)exp(t2Δj216ηt+12)+exp(Δj2t2κ2).\displaystyle{\mathbb{P}}(\rho_{t+1}=\alpha_{j})\leq\exp\left(-\frac{t^{2}\Delta_{j}^{2}}{16\eta_{t+1}^{2}}\right)+\exp\left(\frac{-\Delta_{j}^{2}t}{2\kappa^{2}}\right).
Proof.

FTPL hosts the fraction αj\alpha_{j} of the service in time slot t+1t+1 if cαit+g(αi)Rt+ηt+1γi>cαjt+g(αj)Rt+ηt+1γjc\alpha_{i}t+g(\alpha_{i})R_{t}+\eta_{t+1}\gamma_{i}>c\alpha_{j}t+g(\alpha_{j})R_{t}+\eta_{t+1}\gamma_{j} for all iji\neq j. Let pt,j=(ρt+1=αj)p_{t,j}={\mathbb{P}}(\rho_{t+1}=\alpha_{j}).

pt,j(cαjt\displaystyle p_{t,j}\leq{\mathbb{P}}(c\alpha_{j}t +g(αj)Rt+ηt+1γj\displaystyle+g(\alpha_{j})R_{t}+\eta_{t+1}\gamma_{j}
<cαit+g(αi)Rt+ηt+1γi)\displaystyle<c\alpha_{i^{\star}}t+g(\alpha_{i^{\star}})R_{t}+\eta_{t+1}\gamma_{i^{\star}})
=(ηt+1\displaystyle={\mathbb{P}}(\eta_{t+1} (γiγj)\displaystyle(\gamma_{i^{\star}}-\gamma_{j})
c(αjαi)t+(g(αj)g(αi))Rt).\displaystyle\geq c(\alpha_{j}-\alpha_{i^{\star}})t+(g(\alpha_{j})-g(\alpha_{i^{\star}}))R_{t}).

There are two possibilities: αj>αi\alpha_{j}>\alpha_{i^{\star}} or αj<αi\alpha_{j}<\alpha_{i^{\star}}. We bound the probability pt,jp_{t,j} by considering each case separately.
Case 1 (αj>αi\alpha_{j}>\alpha_{i^{\star}}): Since g()g(.) is a decreasing function, g(αj)<g(αi)g(\alpha_{j})<g(\alpha_{i^{\star}}). Let j\mathcal{E}_{j} be the event that Rt<μt+tΔj/2R_{t}<\mu t+t\Delta_{j}/2.

pt,j\displaystyle p_{t,j}\leq (ηt+1(γiγj)c(αjαi)t\displaystyle{\mathbb{P}}(\eta_{t+1}(\gamma_{i^{\star}}-\gamma_{j})\geq c(\alpha_{j}-\alpha_{i^{\star}})t
+(g(αj)g(αi))Rt,j)+(jc)\displaystyle\qquad\qquad+(g(\alpha_{j})-g(\alpha_{i^{\star}}))R_{t},\mathcal{E}_{j})+{\mathbb{P}}(\mathcal{E}_{j}^{c})
\displaystyle\leq (ηt+1(γiγj)tΔj(g(αi)g(αj))tΔj/2)\displaystyle{\mathbb{P}}(\eta_{t+1}(\gamma_{i^{\star}}-\gamma_{j})\geq t\Delta_{j}-(g(\alpha_{i^{\star}})-g(\alpha_{j}))t\Delta_{j}/2)
+(jc)\displaystyle+{\mathbb{P}}(\mathcal{E}_{j}^{c})
\displaystyle\leq (ηt+1(γiγj)tΔj/2)+(jc)\displaystyle{\mathbb{P}}(\eta_{t+1}(\gamma_{i^{\star}}-\gamma_{j})\geq t\Delta_{j}/2)+{\mathbb{P}}(\mathcal{E}_{j}^{c})
(a)\displaystyle\overset{(a)}{\leq} exp(t2Δj216ηt+12)+exp(Δj2t2κ2),\displaystyle\exp\left(-\frac{t^{2}\Delta_{j}^{2}}{16\eta_{t+1}^{2}}\right)+\exp\left(\frac{-\Delta_{j}^{2}t}{2\kappa^{2}}\right),

where (a)(a) is obtained by using the fact that complementary CDF of standard Gaussian Q(x)ex2/2Q(x)\leq e^{-x^{2}/2} for x>0x>0 and Hoeffding’s inequality [27].

Case 2 (αj<αi\alpha_{j}<\alpha_{i^{\star}}): Let j\mathcal{E}_{j} be the event that Rt>μttΔj/2R_{t}>\mu t-t\Delta_{j}/2.

pt,j\displaystyle p_{t,j}\leq (ηt+1(γiγj)c(αjαi)t\displaystyle{\mathbb{P}}(\eta_{t+1}(\gamma_{i^{\star}}-\gamma_{j})\geq c(\alpha_{j}-\alpha_{i^{\star}})t
+(g(αj)g(αi))Rt,j)+(jc)\displaystyle\qquad\qquad+(g(\alpha_{j})-g(\alpha_{i^{\star}}))R_{t},\mathcal{E}_{j})+{\mathbb{P}}(\mathcal{E}_{j}^{c})
\displaystyle\leq (ηt+1(γiγj)tΔj(g(αj)g(αi))tΔj/2)\displaystyle{\mathbb{P}}(\eta_{t+1}(\gamma_{i^{\star}}-\gamma_{j})\geq t\Delta_{j}-(g(\alpha_{j})-g(\alpha_{i^{\star}}))t\Delta_{j}/2)
+(jc)\displaystyle+{\mathbb{P}}(\mathcal{E}_{j}^{c})
\displaystyle\leq (ηt+1(γiγj)tΔj/2)+(jc)\displaystyle{\mathbb{P}}(\eta_{t+1}(\gamma_{i^{\star}}-\gamma_{j})\geq t\Delta_{j}/2)+{\mathbb{P}}(\mathcal{E}_{j}^{c})
(a)\displaystyle\overset{(a)}{\leq} exp(t2Δj216ηt+12)+exp(Δj2t2κ2),\displaystyle\exp\left(-\frac{t^{2}\Delta_{j}^{2}}{16\eta_{t+1}^{2}}\right)+\exp\left(\frac{-\Delta_{j}^{2}t}{2\kappa^{2}}\right),

where (a)(a) is obtained by using the fact that Q(x)ex2/2Q(x)\leq e^{-x^{2}/2} for x>0x>0 and Hoeffding’s inequality [27].

Therefore, combining both cases, we get

(ρt+1=αj)exp(t2Δj216ηt+12)+exp(Δj2t2κ2).{\mathbb{P}}(\rho_{t+1}=\alpha_{j})\leq\exp\left(-\frac{t^{2}\Delta_{j}^{2}}{16\eta_{t+1}^{2}}\right)+\exp\left(\frac{-\Delta_{j}^{2}t}{2\kappa^{2}}\right). (13)
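The right-hand side of (13) is summable over t, which is what ultimately yields a horizon-independent regret contribution in the stochastic setting. A quick numerical check with η_{t+1} = α√t and illustrative values of α, κ, and Δ_j:

```python
import math

alpha, kappa, delta_j = 1.0, 1.0, 0.5   # illustrative parameters

def suboptimal_prob_bound(t: int) -> float:
    """RHS of (13) with eta_{t+1}^2 = alpha^2 * t."""
    return (math.exp(-t * delta_j ** 2 / (16.0 * alpha ** 2))
            + math.exp(-(delta_j ** 2) * t / (2.0 * kappa ** 2)))

partial_1k = sum(suboptimal_prob_bound(t) for t in range(1, 1001))
partial_10k = sum(suboptimal_prob_bound(t) for t in range(1, 10001))

# The partial sums have essentially converged by t = 1000: the tail
# contribution is negligible, so the series sums to a constant.
assert partial_10k - partial_1k < 1e-3
```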

Lemma 7.

For M=0M=0,

SFTPL(T)\displaystyle\mathcal{R}_{S}^{\text{FTPL}}(T)\leq (2logK+22h1logKΔmin)(α+4κ2α)\displaystyle\left(\sqrt{2\log K}+\frac{2\sqrt{2h_{1}}\log K}{\Delta_{\text{min}}}\right)\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right)
+16α2+4κ2Δmin,\displaystyle+\frac{16\alpha^{2}+4\kappa^{2}}{\Delta_{\text{min}}},

where h1=2max{16α2,2κ2}h_{1}=2\max\{16\alpha^{2},2\kappa^{2}\}.

Proof.

Let t0=h1logKΔmin2t_{0}=\lceil h_{1}\frac{\log K}{\Delta_{\text{min}}^{2}}\rceil where h1=2max{16α2,2κ2}h_{1}=2\max\{16\alpha^{2},2\kappa^{2}\}. From (2) we have

SFTPL(T)\displaystyle\mathcal{R}_{S}^{\text{FTPL}}(T)\leq 𝔼[t=1T(cρt+g(ρt)μμi)]\displaystyle{\mathbb{E}}\left[\sum_{t=1}^{T}\left(c\rho_{t}+g(\rho_{t})\mu-\mu_{i^{\star}}\right)\right]
=\displaystyle= t=1T(iμi(ρt=αi)μi)\sum_{t=1}^{T}\left(\sum_{i}\mu_{i}{\mathbb{P}}(\rho_{t}=\alpha_{i})-\mu_{i^{\star}}\right)
=\displaystyle= t=1TiiΔi(ρt=αi)\displaystyle\sum_{t=1}^{T}\sum_{i\neq i^{\star}}\Delta_{i}{\mathbb{P}}(\rho_{t}=\alpha_{i})
=\displaystyle= t=1t0iiΔi(ρt=αi)\displaystyle\sum_{t=1}^{t_{0}}\sum_{i\neq i^{\star}}\Delta_{i}{\mathbb{P}}(\rho_{t}=\alpha_{i})
+t=t0+1TiiΔi(ρt=αi)\displaystyle+\sum_{t=t_{0}+1}^{T}\sum_{i\neq i^{\star}}\Delta_{i}{\mathbb{P}}(\rho_{t}=\alpha_{i})
(a)\displaystyle\overset{(a)}{\leq} 2t0logK(α+4κ2α)\displaystyle\sqrt{2t_{0}\log K}\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right)
+t=t0+1TiiΔi(ρt=αi),\displaystyle+\sum_{t=t_{0}+1}^{T}\sum_{i\neq i^{\star}}\Delta_{i}{\mathbb{P}}(\rho_{t}=\alpha_{i}),

where (a)(a) is obtained by using the adversarial regret bound for M=0M=0. We now bound each term on the RHS separately.

2t0logK\displaystyle\sqrt{2t_{0}\log K}\leq 2logK(1+h1logKΔmin).\displaystyle\sqrt{2\log K}\left(1+\frac{\sqrt{h_{1}\log K}}{\Delta_{\text{min}}}\right).

By using Lemma 6 we have

Δj(ρt+1\displaystyle\Delta_{j}{\mathbb{P}}(\rho_{t+1} =αj)\displaystyle=\alpha_{j})
\displaystyle\leq Δj[exp(t2Δj216ηt+12)+exp(Δj2t2κ2)]\displaystyle\Delta_{j}\left[\exp\left(-\frac{t^{2}\Delta_{j}^{2}}{16\eta_{t+1}^{2}}\right)+\exp\left(\frac{-\Delta_{j}^{2}t}{2\kappa^{2}}\right)\right]
=\displaystyle= Δj[exp(tΔj216α2)+exp(Δj2t2κ2)]\displaystyle\Delta_{j}\left[\exp\left(-\frac{t\Delta_{j}^{2}}{16\alpha^{2}}\right)+\exp\left(\frac{-\Delta_{j}^{2}t}{2\kappa^{2}}\right)\right]
(b)\displaystyle\overset{(b)}{\leq} Δmin[exp(tΔmin216α2)+exp(Δmin2t2κ2)].\displaystyle\Delta_{\text{min}}\left[\exp\left(-\frac{t\Delta_{\text{min}}^{2}}{16\alpha^{2}}\right)+\exp\left(\frac{-\Delta_{\text{min}}^{2}t}{2\kappa^{2}}\right)\right].

Here (b)(b) is obtained using the fact that f(u)=ueu2f(u)=ue^{-u^{2}} is a decreasing function over u[1,)u\in[1,\infty), and for t>t0t>t_{0}, tΔj216α2>h1logKΔj216α2Δmin21\frac{t\Delta_{j}^{2}}{16\alpha^{2}}>\frac{h_{1}\log K\Delta_{j}^{2}}{16\alpha^{2}\Delta_{\text{min}}^{2}}\geq 1 and tΔj22κ2>h1logKΔj22κ2Δmin21\frac{t\Delta_{j}^{2}}{2\kappa^{2}}>\frac{h_{1}\log K\Delta_{j}^{2}}{2\kappa^{2}\Delta_{\text{min}}^{2}}\geq 1. Therefore,

SFTPL\displaystyle\mathcal{R}_{S}^{\text{FTPL}} (T)\displaystyle(T)
\displaystyle\leq (2logK+22h1logKΔmin)(α+4κ2α)\displaystyle\left(\sqrt{2\log K}+\frac{2\sqrt{2h_{1}}\log K}{\Delta_{\text{min}}}\right)\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right)
+t=t0TKΔmin[exp(tΔmin216α2)+exp(Δmin2t2κ2)]\displaystyle+\sum_{t=t_{0}}^{T}K\Delta_{\text{min}}\left[\exp\left(-\frac{t\Delta_{\text{min}}^{2}}{16\alpha^{2}}\right)+\exp\left(\frac{-\Delta_{\text{min}}^{2}t}{2\kappa^{2}}\right)\right]
(c)\displaystyle\overset{(c)}{\leq} (2logK+22h1logKΔmin)(α+4κ2α)\displaystyle\left(\sqrt{2\log K}+\frac{2\sqrt{2h_{1}}\log K}{\Delta_{\text{min}}}\right)\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right)
+16α2+2κ2Δmin+2Δmin\displaystyle+\frac{16\alpha^{2}+2\kappa^{2}}{\Delta_{\text{min}}}+2\Delta_{\text{min}}
\displaystyle\leq (2logK+22h1logKΔmin)(α+4κ2α)\displaystyle\left(\sqrt{2\log K}+\frac{2\sqrt{2h_{1}}\log K}{\Delta_{\text{min}}}\right)\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right)
+16α2+4κ2Δmin.\displaystyle+\frac{16\alpha^{2}+4\kappa^{2}}{\Delta_{\text{min}}}.

Here (c)(c) follows because for t>t0t>t_{0} we have Kexp(tΔmin2/16α2)<1K\exp(-t\Delta_{\text{min}}^{2}/16\alpha^{2})<1, Kexp(tΔmin2/2κ2)<1K\exp(-t\Delta_{\text{min}}^{2}/2\kappa^{2})<1 and t=1eat1/a\sum_{t=1}^{\infty}e^{-at}\leq 1/a for a>0a>0. ∎
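The series bound in step (c) follows from the closed form Σ_{t≥1} e^{−at} = 1/(e^a − 1) together with e^a − 1 ≥ a. A quick numerical check:

```python
import math

for a in [0.01, 0.1, 0.5, 1.0, 3.0]:
    closed_form = 1.0 / math.expm1(a)   # sum_{t >= 1} e^{-a t} = 1/(e^a - 1)
    partial = sum(math.exp(-a * t) for t in range(1, 20_000))
    assert abs(partial - closed_form) < 1e-6   # the partial sum has converged
    assert closed_form <= 1.0 / a              # since e^a - 1 >= a for a > 0
```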

Lemma 8.

Fetch cost under FTPL policy with learning rate ηt=αt1\eta_{t}=\alpha\sqrt{t-1} is bounded as follows

𝔼[𝒞fFTPL(T)]MK216α2+3κ2Δmin2.{\mathbb{E}}[\mathcal{C}_{f}^{\text{FTPL}}(T)]\leq MK^{2}\frac{16\alpha^{2}+3\kappa^{2}}{\Delta_{\text{min}}^{2}}.
Proof.

A fetch cost is incurred whenever an additional fraction of the service is fetched. Proceeding as in the proof of Lemma 5, we get

𝔼[𝒞fFTPL(T)]\displaystyle{\mathbb{E}}[\mathcal{C}^{\text{FTPL}}_{f}(T)] Mt=1T(ρt<ρt+1)\displaystyle\leq M\sum_{t=1}^{T}{\mathbb{P}}(\rho_{t}<\rho_{t+1})
=Mt=1Tj>i(ρt=αi,ρt+1=αj).\displaystyle=M\sum_{t=1}^{T}\sum_{j>i}{\mathbb{P}}(\rho_{t}=\alpha_{i},\rho_{t+1}=\alpha_{j}).

Let i,j(t)\mathcal{E}_{i,j}^{(t)} be the event that ρt=αi\rho_{t}=\alpha_{i} and ρt+1=αj\rho_{t+1}=\alpha_{j}. The event i,j(t)\mathcal{E}_{i,j}^{(t)} occurs if Θt,i+ηtγi<Θt,j+ηtγj\Theta_{t,i}+\eta_{t}\gamma_{i}<\Theta_{t,j}+\eta_{t}\gamma_{j} and Θt+1,i+ηt+1γi>Θt+1,j+ηt+1γj\Theta_{t+1,i}+\eta_{t+1}\gamma_{i}>\Theta_{t+1,j}+\eta_{t+1}\gamma_{j}.

(i,j(t))\displaystyle{\mathbb{P}}(\mathcal{E}_{i,j}^{(t)}) =(Θt+1,jΘt+1,i2ηt+1γiγj2Θt,jΘt,i2ηt).\displaystyle={\mathbb{P}}\left(\frac{\Theta_{t+1,j}-\Theta_{t+1,i}}{\sqrt{2}\eta_{t+1}}\leq\frac{\gamma_{i}-\gamma_{j}}{\sqrt{2}}\leq\frac{\Theta_{t,j}-\Theta_{t,i}}{\sqrt{2}\eta_{t}}\right).

We consider the two cases μi>μj\mu_{i}>\mu_{j} and μi<μj\mu_{i}<\mu_{j} separately and bound this probability.
Case 1 (μi>μj\mu_{i}>\mu_{j}): Let 1\mathcal{E}_{1} be the event that |Rt1μ(t1)||Δij|(t1)/2|R_{t-1}-\mu(t-1)|\leq|\Delta_{ij}|(t-1)/2,

(\displaystyle{\mathbb{P}}( i,j(t))\displaystyle\mathcal{E}_{i,j}^{(t)})
\displaystyle\leq (γiγj2Θt,jΘt,i2ηt)\displaystyle{\mathbb{P}}\left(\frac{\gamma_{i}-\gamma_{j}}{\sqrt{2}}\leq\frac{\Theta_{t,j}-\Theta_{t,i}}{\sqrt{2}\eta_{t}}\right)
\displaystyle\leq (γiγj2Θt,jΘt,i2ηt,1)+(1c)\displaystyle{\mathbb{P}}\left(\frac{\gamma_{i}-\gamma_{j}}{\sqrt{2}}\leq\frac{\Theta_{t,j}-\Theta_{t,i}}{\sqrt{2}\eta_{t}},\mathcal{E}_{1}\right)+{\mathbb{P}}(\mathcal{E}_{1}^{c})
=\displaystyle= (γiγj2c(αjαi)(t1)2ηt\displaystyle{\mathbb{P}}\left(\frac{\gamma_{i}-\gamma_{j}}{\sqrt{2}}\leq\frac{c(\alpha_{j}-\alpha_{i})(t-1)}{\sqrt{2}\eta_{t}}\right.
(g(αi)g(αj))Rt12ηt,1)+(1c)\displaystyle\hskip 76.82234pt\left.-\frac{(g(\alpha_{i})-g(\alpha_{j}))R_{t-1}}{\sqrt{2}\eta_{t}},\mathcal{E}_{1}\right)+{\mathbb{P}}(\mathcal{E}_{1}^{c})
\displaystyle\leq (γiγj2|Δij|(t1)22ηt)+(1c)\displaystyle{\mathbb{P}}\left(\frac{\gamma_{i}-\gamma_{j}}{\sqrt{2}}\leq\frac{-|\Delta_{ij}|(t-1)}{2\sqrt{2}\eta_{t}}\right)+{\mathbb{P}}(\mathcal{E}_{1}^{c})
(a)\displaystyle\overset{(a)}{\leq} exp((t1)2Δij216ηt2)+exp(Δij2(t1)2κ2)\displaystyle\exp\left(-\frac{(t-1)^{2}\Delta_{ij}^{2}}{16\eta_{t}^{2}}\right)+\exp\left(\frac{-\Delta_{ij}^{2}(t-1)}{2\kappa^{2}}\right)
\displaystyle\leq exp((t1)2Δmin216ηt2)+exp(Δmin2(t1)2κ2)\displaystyle\exp\left(-\frac{(t-1)^{2}\Delta_{\text{min}}^{2}}{16\eta_{t}^{2}}\right)+\exp\left(\frac{-\Delta_{\text{min}}^{2}(t-1)}{2\kappa^{2}}\right)
\displaystyle\leq exp((t1)Δmin216α2)+exp(Δmin2(t1)2κ2),\displaystyle\exp\left(-\frac{(t-1)\Delta_{\text{min}}^{2}}{16\alpha^{2}}\right)+\exp\left(\frac{-\Delta_{\text{min}}^{2}(t-1)}{2\kappa^{2}}\right),

where (a)(a) is obtained by using the fact that Q(x)ex2/2Q(x)\leq e^{-x^{2}/2} for x>0x>0 and Hoeffding’s inequality [27].

Case 2 (μi<μj\mu_{i}<\mu_{j}): Let 2\mathcal{E}_{2} be the event that |Rtμt||Δij|t/2|R_{t}-\mu t|\leq|\Delta_{ij}|t/2,

\displaystyle{\mathbb{P}} (i,j(t))\displaystyle(\mathcal{E}_{i,j}^{(t)})
\displaystyle\leq (Θt+1,jΘt+1,i2ηt+1γiγj2)\displaystyle{\mathbb{P}}\left(\frac{\Theta_{t+1,j}-\Theta_{t+1,i}}{\sqrt{2}\eta_{t+1}}\leq\frac{\gamma_{i}-\gamma_{j}}{\sqrt{2}}\right)
\displaystyle\leq (c(αjαi)t(g(αi)g(αj))Rt2ηtγiγj2)\displaystyle{\mathbb{P}}\left(\frac{c(\alpha_{j}-\alpha_{i})t-(g(\alpha_{i})-g(\alpha_{j}))R_{t}}{\sqrt{2}\eta_{t}}\leq\frac{\gamma_{i}-\gamma_{j}}{\sqrt{2}}\right)
\displaystyle\leq (γiγj2c(αjαi)t(g(αi)g(αj))Rt2ηt,2)\displaystyle{\mathbb{P}}\left(\frac{\gamma_{i}-\gamma_{j}}{\sqrt{2}}\geq\frac{c(\alpha_{j}-\alpha_{i})t-(g(\alpha_{i})-g(\alpha_{j}))R_{t}}{\sqrt{2}\eta_{t}},\mathcal{E}_{2}\right)
+(2c)\displaystyle+{\mathbb{P}}(\mathcal{E}_{2}^{c})
\displaystyle\leq (γiγj2|Δij|t22ηt)+(2c)\displaystyle{\mathbb{P}}\left(\frac{\gamma_{i}-\gamma_{j}}{\sqrt{2}}\geq\frac{|\Delta_{ij}|t}{2\sqrt{2}\eta_{t}}\right)+{\mathbb{P}}(\mathcal{E}_{2}^{c})
(b)\displaystyle\overset{(b)}{\leq} exp(t2Δij216ηt2)+exp(Δij2t2κ2)\displaystyle\exp\left(-\frac{t^{2}\Delta_{ij}^{2}}{16\eta_{t}^{2}}\right)+\exp\left(\frac{-\Delta_{ij}^{2}t}{2\kappa^{2}}\right)
\displaystyle\leq exp(t2Δmin216ηt2)+exp(Δmin2t2κ2)\displaystyle\exp\left(-\frac{t^{2}\Delta_{\text{min}}^{2}}{16\eta_{t}^{2}}\right)+\exp\left(\frac{-\Delta_{\text{min}}^{2}t}{2\kappa^{2}}\right)
\displaystyle\leq exp(tΔmin216α2)+exp(Δmin2t2κ2)\displaystyle\exp\left(-\frac{t\Delta_{\text{min}}^{2}}{16\alpha^{2}}\right)+\exp\left(\frac{-\Delta_{\text{min}}^{2}t}{2\kappa^{2}}\right)
\displaystyle\leq exp((t1)Δmin216α2)+exp(Δmin2(t1)2κ2),\displaystyle\exp\left(-\frac{(t-1)\Delta_{\text{min}}^{2}}{16\alpha^{2}}\right)+\exp\left(\frac{-\Delta_{\text{min}}^{2}(t-1)}{2\kappa^{2}}\right),

where (b)(b) is obtained by using the fact that Q(x)ex2/2Q(x)\leq e^{-x^{2}/2} for x>0x>0 and Hoeffding’s inequality [27].

Combining both cases, we get

(i,j(t))exp((t1)Δmin216α2)+exp(Δmin2(t1)2κ2).\displaystyle{\mathbb{P}}(\mathcal{E}_{i,j}^{(t)})\leq\exp\left(-\frac{(t-1)\Delta_{\text{min}}^{2}}{16\alpha^{2}}\right)+\exp\left(\frac{-\Delta_{\text{min}}^{2}(t-1)}{2\kappa^{2}}\right). (14)

Therefore,

𝔼[𝒞fFTPL(T)]Mt=1Tj>i(i,j(t))\displaystyle{\mathbb{E}}[\mathcal{C}_{f}^{\text{FTPL}}(T)]\leq M\sum_{t=1}^{T}\sum_{j>i}{\mathbb{P}}(\mathcal{E}_{i,j}^{(t)})
\displaystyle\leq MK2t=1T1exp(tΔmin216α2)+exp(Δmin2t2κ2)\displaystyle MK^{2}\sum_{t=1}^{T-1}\exp\left(-\frac{t\Delta_{\text{min}}^{2}}{16\alpha^{2}}\right)+\exp\left(\frac{-\Delta_{\text{min}}^{2}t}{2\kappa^{2}}\right)
+MK2\displaystyle+MK^{2}
\displaystyle\leq MK216α2+3κ2Δmin2.\displaystyle MK^{2}\frac{16\alpha^{2}+3\kappa^{2}}{\Delta_{\text{min}}^{2}}.

Proof of Theorem 2(b).

Note that the FTPL policy does not take the fetch cost MM into account while making decisions. The stated result follows by combining Lemmas 7 and 8. ∎

6.8 Proof of Theorem 2(c)

Lemma 9.

Under the W-FTPL policy, Ts<tT_{s}<t^{\prime} if and only if there exists t<tt<t^{\prime} such that the following condition holds:

(g(αj~t)g(αi~t))μ^t\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t} >c(αj~tαi~t)+κβ(logM)1+δt,\displaystyle>-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})+\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}},
or
(g(αj~t)g(αi~t))μ^t\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t} <c(αj~tαi~t)κβ(logM)1+δt,\displaystyle<-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})-\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}},

where (i~t,j~t)=argmaxijΘt+1,iΘt+1,j(\tilde{i}_{t},\tilde{j}_{t})=\operatorname*{arg\,max}_{i\neq j}\Theta_{t+1,i}-\Theta_{t+1,j}, μ^t=Rt/t\hat{\mu}_{t}=R_{t}/t.

Proof.

Recall that (i~t,j~t)=argmaxijΘt+1,iΘt+1,j(\tilde{i}_{t},\tilde{j}_{t})=\operatorname*{arg\,max}_{i\neq j}\Theta_{t+1,i}-\Theta_{t+1,j} and μ^t=Rt/t\hat{\mu}_{t}=R_{t}/t. By the definition of TsT_{s}, Ts<tT_{s}<t^{\prime} if and only if there exists t<tt<t^{\prime} such that

|maxij(Θt+1,iΘt+1,j)|t\frac{|\max_{i\neq j}(\Theta_{t+1,i}-\Theta_{t+1,j})|}{t} >κβ(logM)1+δt\displaystyle>\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}
|Θt+1,j~tΘt+1,i~t|t\displaystyle\iff\frac{|\Theta_{t+1,\tilde{j}_{t}}-\Theta_{t+1,\tilde{i}_{t}}|}{t} >κβ(logM)1+δt\displaystyle>\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}
\displaystyle\Updownarrow
|c(αj~tαi~t)+(g(αj~t)g(αi~t))μ^t|\displaystyle|c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})+(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t}| >κβ(logM)1+δt\displaystyle>\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}
\displaystyle\Updownarrow
(g(αj~t)g(αi~t))μ^t>c(αj~tαi~t)\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t}>-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}}) +κβ(logM)1+δt\displaystyle+\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}
or
(g(αj~t)g(αi~t))μ^t<c(αj~tαi~t)\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t}<-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}}) κβ(logM)1+δt.\displaystyle-\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}.

This completes the proof. ∎

Lemma 10.

Under the W-FTPL policy, Ts>(β1)2κ2(logM)1+δΔmax2T_{s}>\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}} with probability at least 1(β1)2κ2(logM)1+δM2(logM)δΔmax21-\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{M^{2(\log M)^{\delta}}\Delta_{\text{max}}^{2}}.

Proof.

Let t\mathcal{E}_{t} be the event that

|μμ^t|κ(logM)1+δt,|\mu-\hat{\mu}_{t}|\leq\kappa\sqrt{\frac{(\log M)^{1+\delta}}{t}},

where μ^t=Rt/t\hat{\mu}_{t}=R_{t}/t. By using Hoeffding’s inequality [27] we get

(t)11M2(logM)δ.{\mathbb{P}}(\mathcal{E}_{t})\geq 1-\frac{1}{M^{2(\log M)^{\delta}}}.

Let =t=1T0t\mathcal{E}=\cap_{t=1}^{T_{0}}\mathcal{E}_{t}, where T0=(β1)2κ2(logM)1+δΔmax2T_{0}=\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}}. Using the union bound, we get:

()1T0M2(logM)δ.{\mathbb{P}}(\mathcal{E})\geq 1-\frac{T_{0}}{M^{2(\log M)^{\delta}}}.
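The Hoeffding step above can be illustrated empirically. The sketch below draws bounded i.i.d. arrivals (uniform on [0, κ], an illustrative choice) and checks that the deviation frequency of the empirical mean stays below the two-sided Hoeffding bound 2exp(−2tε²/κ²):

```python
import math
import random

random.seed(1)
kappa = 1.0      # arrivals bounded in [0, kappa]
t = 100          # number of i.i.d. samples averaged
eps = 0.15       # deviation threshold
trials = 20_000

mu = kappa / 2.0                       # mean of Uniform[0, kappa]
violations = 0
for _ in range(trials):
    mean = sum(random.uniform(0.0, kappa) for _ in range(t)) / t
    if abs(mean - mu) > eps:
        violations += 1

# Two-sided Hoeffding bound for the mean of t i.i.d. variables in [0, kappa].
hoeffding = 2.0 * math.exp(-2.0 * t * eps ** 2 / kappa ** 2)
assert violations / trials <= hoeffding
```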

Let ϵt=κβ(logM)1+δt.\epsilon_{t}=\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}. We prove the lemma by contradiction. Suppose that

Ts<(β1)2κ2(logM)1+δΔmax2,T_{s}<\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}},

and that \mathcal{E} holds. Then, by Lemma 9, there exists t<(β1)2κ2(logM)1+δΔmax2t<\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}} such that

(g(αj~t)g(αi~t))μ^t\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t} >c(αj~tαi~t)+κβ(logM)1+δt,\displaystyle>-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})+\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}},
or
(g(αj~t)g(αi~t))μ^t\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t} <c(αj~tαi~t)κβ(logM)1+δt,\displaystyle<-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})-\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}},

holds. There are two possibilities, αj~t<αi~t\alpha_{\tilde{j}_{t}}<\alpha_{\tilde{i}_{t}} and αj~t>αi~t\alpha_{\tilde{j}_{t}}>\alpha_{\tilde{i}_{t}}; we consider them separately to prove the final result.
Case 1 (αj~t<αi~t\alpha_{\tilde{j}_{t}}<\alpha_{\tilde{i}_{t}}): It follows that

(g(αj~t)g(αi~t))μ^t>c(αj~tαi~t)+ϵt\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t}>-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})+\epsilon_{t}
μ^tμ>c(αi~tαj~t)+ϵtg(αj~t)g(αi~t)μ\displaystyle\implies\hat{\mu}_{t}-\mu>\frac{c(\alpha_{\tilde{i}_{t}}-\alpha_{\tilde{j}_{t}})+\epsilon_{t}}{g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}})}-\mu
κ(logM)1+δt>Δmax+ϵtg(αj~t)g(αi~t)\displaystyle\implies\kappa\sqrt{\frac{(\log M)^{1+\delta}}{t}}>\frac{-\Delta_{\text{max}}+\epsilon_{t}}{g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}})}
(βg(αj~t)+g(αi~t))κ(logM)1+δt<Δmax\displaystyle\implies(\sqrt{\beta}-g(\alpha_{\tilde{j}_{t}})+g(\alpha_{\tilde{i}_{t}}))\kappa\sqrt{\frac{(\log M)^{1+\delta}}{t}}<\Delta_{\text{max}}
(β1)κ(logM)1+δt<Δmax\displaystyle\implies(\sqrt{\beta}-1)\kappa\sqrt{\frac{(\log M)^{1+\delta}}{t}}<\Delta_{\text{max}}
t>(β1)2κ2(logM)1+δΔmax2,\displaystyle\implies t>\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}},

which is a contradiction. Alternatively,

(g(αj~t)g(αi~t))μ^t<c(αj~tαi~t)ϵt\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t}<-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})-\epsilon_{t}
μ^tμ<c(αi~tαj~t)ϵtg(αj~t)g(αi~t)μ\displaystyle\implies\hat{\mu}_{t}-\mu<\frac{c(\alpha_{\tilde{i}_{t}}-\alpha_{\tilde{j}_{t}})-\epsilon_{t}}{g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}})}-\mu
κ(logM)1+δt<Δmaxϵtg(αj~t)g(αi~t)\displaystyle\implies-\kappa\sqrt{\frac{(\log M)^{1+\delta}}{t}}<\frac{\Delta_{\text{max}}-\epsilon_{t}}{g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}})}
(βg(αj~t)+g(αi~t))κ(logM)1+δt<Δmax\displaystyle\implies(\sqrt{\beta}-g(\alpha_{\tilde{j}_{t}})+g(\alpha_{\tilde{i}_{t}}))\kappa\sqrt{\frac{(\log M)^{1+\delta}}{t}}<\Delta_{\text{max}}
(β1)κ(logM)1+δt<Δmax\displaystyle\implies(\sqrt{\beta}-1)\kappa\sqrt{\frac{(\log M)^{1+\delta}}{t}}<\Delta_{\text{max}}
t>(β1)2κ2(logM)1+δΔmax2,\displaystyle\implies t>\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}},

which is also a contradiction.
Case 2 (αj~t>αi~t\alpha_{\tilde{j}_{t}}>\alpha_{\tilde{i}_{t}}): It follows that:

(g(αj~t)g(αi~t))μ^t>c(αj~tαi~t)+ϵt\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t}>-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})+\epsilon_{t}
μ^tμ<c(αj~tαi~t)ϵtg(αi~t)g(αj~t)μ\displaystyle\implies\hat{\mu}_{t}-\mu<\frac{c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})-\epsilon_{t}}{g(\alpha_{\tilde{i}_{t}})-g(\alpha_{\tilde{j}_{t}})}-\mu
κ(logM)1+δt<Δmaxϵtg(αi~t)g(αj~t)\displaystyle\implies-\kappa\sqrt{\frac{(\log M)^{1+\delta}}{t}}<\frac{\Delta_{\text{max}}-\epsilon_{t}}{g(\alpha_{\tilde{i}_{t}})-g(\alpha_{\tilde{j}_{t}})}
t>(β1)2κ2(logM)1+δΔmax2,\displaystyle\implies t>\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}},

which is a contradiction. Alternatively,

(g(αj~t)g(αi~t))μ^t<c(αj~tαi~t)ϵt\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t}<-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})-\epsilon_{t}
μ^tμ>c(αj~tαi~t)+ϵtg(αi~t)g(αj~t)μ\displaystyle\implies\hat{\mu}_{t}-\mu>\frac{c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})+\epsilon_{t}}{g(\alpha_{\tilde{i}_{t}})-g(\alpha_{\tilde{j}_{t}})}-\mu
κ(logM)1+δt>Δmax+ϵtg(αi~t)g(αj~t)\displaystyle\implies\kappa\sqrt{\frac{(\log M)^{1+\delta}}{t}}>\frac{-\Delta_{\text{max}}+\epsilon_{t}}{g(\alpha_{\tilde{i}_{t}})-g(\alpha_{\tilde{j}_{t}})}
t>(β1)2κ2(logM)1+δΔmax2,\displaystyle\implies t>\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}},

which is also a contradiction.

Therefore Ts>(β1)2κ2(logM)1+δΔmax2T_{s}>\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}}, if \mathcal{E} occurs, which happens with probability at least 1(β1)2κ2(logM)1+δM2(logM)δΔmax21-\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{M^{2(\log M)^{\delta}}\Delta_{\text{max}}^{2}}. ∎

Lemma 11.

Under the W-FTPL policy, (Ts>t)exp(Δmin2t2κ2){\mathbb{P}}(T_{s}>t)\leq\exp(-\frac{\Delta_{\text{min}}^{2}t}{2\kappa^{2}}) for t>4βκ2(logM)1+δΔmin2t>\frac{4\beta\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{min}}^{2}}.

Proof.

For t>4βκ2(logM)1+δΔmin2t>\frac{4\beta\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{min}}^{2}}, we have Δmin2>κβ(logM)1+δt\frac{\Delta_{\text{min}}}{2}>\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}. For Ts>tT_{s}>t, we have

(g(αj~t)g(αi~t))μ^t\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t} <c(αj~tαi~t)+κβ(logM)1+δt,\displaystyle<-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})+\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}},
and
(g(αj~t)g(αi~t))μ^t\displaystyle(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))\hat{\mu}_{t} >c(αj~tαi~t)κβ(logM)1+δt.\displaystyle>-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})-\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}.

We consider the following cases.
Case 1 (αj~t<αi~t\alpha_{\tilde{j}_{t}}<\alpha_{\tilde{i}_{t}}, Δi~tj~t>0\Delta_{\tilde{i}_{t}\tilde{j}_{t}}>0):

(Ts>t)\displaystyle{\mathbb{P}}(T_{s}>t) (μ^t>c(αj~tαi~t)κβ(logM)1+δt(g(αj~t)g(αi~t)))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}>\frac{-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})-\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}}{(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))}\right)
(μ^tμ>Δi~tj~tκβ(logM)1+δtg(αj~t)g(αi~t))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu>\frac{\Delta_{\tilde{i}_{t}\tilde{j}_{t}}-\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}}{g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}})}\right)
(μ^tμ>ΔminΔmin/2g(αj~t)g(αi~t))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu>\frac{\Delta_{\text{min}}-\Delta_{\text{min}}/2}{g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}})}\right)
(μ^tμ>Δmin2)\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu>\frac{\Delta_{\text{min}}}{2}\right)
(a)exp(Δmin2t2κ2),\displaystyle\overset{(a)}{\leq}\exp\left(\frac{-\Delta_{\text{min}}^{2}t}{2\kappa^{2}}\right),

where (a)(a) is obtained by using Hoeffding’s inequality [27].
Case 2 (αj~t<αi~t\alpha_{\tilde{j}_{t}}<\alpha_{\tilde{i}_{t}}, Δi~tj~t<0\Delta_{\tilde{i}_{t}\tilde{j}_{t}}<0):

(Ts>t)\displaystyle{\mathbb{P}}(T_{s}>t) (μ^t<c(αj~tαi~t)+κβ(logM)1+δt(g(αj~t)g(αi~t)))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}<\frac{-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})+\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}}{(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))}\right)
(μ^tμ<Δi~tj~t+κβ(logM)1+δtg(αj~t)g(αi~t))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu<\frac{\Delta_{\tilde{i}_{t}\tilde{j}_{t}}+\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}}{g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}})}\right)
(μ^tμ<Δmin+Δmin/2g(αj~t)g(αi~t))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu<\frac{-\Delta_{\text{min}}+\Delta_{\text{min}}/2}{g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}})}\right)
(μ^tμ<Δmin2)\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu<\frac{-\Delta_{\text{min}}}{2}\right)
(a)exp(Δmin2t2κ2),\displaystyle\overset{(a)}{\leq}\exp\left(\frac{-\Delta_{\text{min}}^{2}t}{2\kappa^{2}}\right),

where (a)(a) is obtained by using Hoeffding’s inequality [27].
Case 3 (αj~t>αi~t\alpha_{\tilde{j}_{t}}>\alpha_{\tilde{i}_{t}}, Δi~tj~t>0\Delta_{\tilde{i}_{t}\tilde{j}_{t}}>0):

(Ts>t)\displaystyle{\mathbb{P}}(T_{s}>t) (μ^t<c(αj~tαi~t)κβ(logM)1+δt(g(αj~t)g(αi~t)))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}<\frac{-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})-\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}}{(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))}\right)
(μ^tμ<Δi~tj~t+κβ(logM)1+δtg(αi~t)g(αj~t))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu<\frac{-\Delta_{\tilde{i}_{t}\tilde{j}_{t}}+\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}}{g(\alpha_{\tilde{i}_{t}})-g(\alpha_{\tilde{j}_{t}})}\right)
(μ^tμ<Δmin+κβ(logM)1+δtg(αi~t)g(αj~t))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu<\frac{-\Delta_{\text{min}}+\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}}{g(\alpha_{\tilde{i}_{t}})-g(\alpha_{\tilde{j}_{t}})}\right)
(μ^tμ<Δmin/2g(αi~t)g(αj~t))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu<\frac{-\Delta_{\text{min}}/2}{g(\alpha_{\tilde{i}_{t}})-g(\alpha_{\tilde{j}_{t}})}\right)
(μ^tμ<Δmin/2)\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu<-\Delta_{\text{min}}/2\right)
(a)exp(Δmin2t2κ2),\displaystyle\overset{(a)}{\leq}\exp\left(\frac{-\Delta_{\text{min}}^{2}t}{2\kappa^{2}}\right),

where (a)(a) is obtained by using Hoeffding’s inequality [27].
Case 4 (αj~t>αi~t\alpha_{\tilde{j}_{t}}>\alpha_{\tilde{i}_{t}}, Δi~tj~t<0\Delta_{\tilde{i}_{t}\tilde{j}_{t}}<0):

(Ts>t)\displaystyle{\mathbb{P}}(T_{s}>t) (μ^t>c(αj~tαi~t)+κβ(logM)1+δt(g(αj~t)g(αi~t)))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}>\frac{-c(\alpha_{\tilde{j}_{t}}-\alpha_{\tilde{i}_{t}})+\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}}{(g(\alpha_{\tilde{j}_{t}})-g(\alpha_{\tilde{i}_{t}}))}\right)
(μ^tμ>Δi~tj~tκβ(logM)1+δtg(αi~t)g(αj~t))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu>\frac{-\Delta_{\tilde{i}_{t}\tilde{j}_{t}}-\kappa\sqrt{\frac{\beta(\log M)^{1+\delta}}{t}}}{g(\alpha_{\tilde{i}_{t}})-g(\alpha_{\tilde{j}_{t}})}\right)
(μ^tμ>ΔminΔmin/2g(αi~t)g(αj~t))\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu>\frac{\Delta_{\text{min}}-\Delta_{\text{min}}/2}{g(\alpha_{\tilde{i}_{t}})-g(\alpha_{\tilde{j}_{t}})}\right)
(μ^tμ>Δmin2)\displaystyle\leq{\mathbb{P}}\left(\hat{\mu}_{t}-\mu>\frac{\Delta_{\text{min}}}{2}\right)
(a)exp(Δmin2t2κ2),\displaystyle\overset{(a)}{\leq}\exp\left(\frac{-\Delta_{\text{min}}^{2}t}{2\kappa^{2}}\right),

where (a)(a) is obtained by using Hoeffding’s inequality [27]. The result follows by combining these cases. ∎
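All four cases reduce to the same deviation event for the empirical mean. As a sanity check, the sketch below compares the empirical probability of the event μ̂_t − μ > Δmin/2 for i.i.d. requests supported on [0, κ] against the Hoeffding bound exp(−Δmin²t/(2κ²)); the uniform request distribution and the values of κ, Δmin, and t are illustrative assumptions, not quantities fixed by the model.

```python
import math
import random

def hoeffding_check(kappa=4.0, delta_min=1.0, t=100, trials=5000, seed=0):
    """Compare the empirical probability of {mu_hat - mu > delta_min/2}
    for i.i.d. Uniform[0, kappa] samples with the Hoeffding bound
    exp(-delta_min^2 * t / (2 * kappa^2))."""
    rng = random.Random(seed)
    mu = kappa / 2.0  # mean of Uniform[0, kappa]
    exceed = 0
    for _ in range(trials):
        mu_hat = sum(rng.uniform(0, kappa) for _ in range(t)) / t
        if mu_hat - mu > delta_min / 2.0:
            exceed += 1
    empirical = exceed / trials
    bound = math.exp(-delta_min**2 * t / (2 * kappa**2))
    return empirical, bound
```

With these illustrative parameters the empirical frequency is far below the bound, as the sub-Gaussian concentration predicts.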

Lemma 12.

Under the W-FTPL policy

𝔼[Ts]1+4βκ2(logM)1+δΔmin2+1M2β(logM)δ2κ2Δmin2.{\mathbb{E}}[T_{s}]\leq 1+\frac{4\beta\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{min}}^{2}}+\frac{1}{M^{2\beta(\log M)^{\delta}}}\frac{2\kappa^{2}}{\Delta_{\text{min}}^{2}}.
Proof.

Let T1=4βκ2(logM)1+δΔmin2T_{1}=\lceil\frac{4\beta\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{min}}^{2}}\rceil. It follows that

𝔼[Ts]\displaystyle{\mathbb{E}}[T_{s}] =t=1T(Ts>t)\displaystyle=\sum_{t=1}^{T}{\mathbb{P}}(T_{s}>t)
=t=1T1(Ts>t)+t=T1+1T(Ts>t)\displaystyle=\sum_{t=1}^{T_{1}}{\mathbb{P}}(T_{s}>t)+\sum_{t=T_{1}+1}^{T}{\mathbb{P}}(T_{s}>t)
(a)T1+t=T1Texp(Δmin2t2κ2)\displaystyle\overset{(a)}{\leq}T_{1}+\sum_{t=T_{1}}^{T}\exp\left(-\frac{\Delta_{\text{min}}^{2}t}{2\kappa^{2}}\right)
1+4βκ2(logM)1+δΔmin2+1M2β(logM)δ2κ2Δmin2,\displaystyle\leq 1+\frac{4\beta\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{min}}^{2}}+\frac{1}{M^{2\beta(\log M)^{\delta}}}\frac{2\kappa^{2}}{\Delta_{\text{min}}^{2}},

where (a)(a) is obtained by using Lemma 11. ∎
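The last step sums the exponential tail from Lemma 11 as a geometric series. A small numerical sketch (Δmin, κ, and the starting index T₁ below are illustrative, not model values) checks the two bounds involved, with a = Δmin²/(2κ²): the finite tail sum is dominated by the geometric closed form, which in turn is dominated by the slightly looser (1 + 1/a)·exp(−aT₁) since 1 − e⁻ᵃ ≥ a/(1 + a).

```python
import math

def tail_sum_bound(delta_min=1.0, kappa=2.0, t1=50, horizon=100000):
    """Sum of exp(-a t) for t >= t1, with a = delta_min^2 / (2 kappa^2),
    against the geometric bound exp(-a t1) / (1 - exp(-a)) and the looser
    closed form (1 + 1/a) exp(-a t1)."""
    a = delta_min**2 / (2 * kappa**2)
    tail = sum(math.exp(-a * t) for t in range(t1, horizon))
    geometric = math.exp(-a * t1) / (1 - math.exp(-a))
    closed_form = (1 + 1 / a) * math.exp(-a * t1)
    return tail, geometric, closed_form
```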

Proof of Theorem 2(c).

Under W-FTPL,

𝔼r[𝒞W-FTPL(T,r)]=\displaystyle{\mathbb{E}}_{r}[\mathcal{C}^{\text{W-FTPL}}(T,r)]= 𝔼r[Ts]+𝔼r[t=Ts+1T(cρt+g(ρt)rt)]\displaystyle{\mathbb{E}}_{r}[T_{s}]+{\mathbb{E}}_{r}\left[\sum_{t=T_{s}+1}^{T}(c\rho_{t}+g(\rho_{t})r_{t})\right]
+M𝔼r[t=Ts+1T(ρt>ρt1)].\displaystyle+M{\mathbb{E}}_{r}\left[\sum_{t=T_{s}+1}^{T}{\mathbb{P}}(\rho_{t}>\rho_{t-1})\right].

Using the definition of regret, we get

SW-FTPL(T)\displaystyle\mathcal{R}^{\text{W-FTPL}}_{S}(T)\leq 𝔼[Ts]+t=1T(cρt+μg(ρt))μi\displaystyle{\mathbb{E}}[T_{s}]+\sum_{t=1}^{T}(c\rho_{t}+\mu g(\rho_{t}))-\mu_{i^{\star}}
+M𝔼r[t=Ts+1T(ρt>ρt1)].\displaystyle+M{\mathbb{E}}_{r}\left[\sum_{t=T_{s}+1}^{T}{\mathbb{P}}(\rho_{t}>\rho_{t-1})\right].

By Lemma 7, we get

SW-FTPL(T)\displaystyle\mathcal{R}^{\text{W-FTPL}}_{S}(T)\leq 𝔼[Ts]+16α2+4κ2Δmin\displaystyle{\mathbb{E}}[T_{s}]+\frac{16\alpha^{2}+4\kappa^{2}}{\Delta_{\text{min}}}
+(2logK+22h1logKΔmin)(α+4κ2α)\displaystyle+\left(\sqrt{2\log K}+\frac{2\sqrt{2h_{1}}\log K}{\Delta_{\text{min}}}\right)\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right)
+M𝔼r[t=Ts+1T(ρt>ρt1)].\displaystyle+M{\mathbb{E}}_{r}\left[\sum_{t=T_{s}+1}^{T}{\mathbb{P}}(\rho_{t}>\rho_{t-1})\right].

Let 𝔼r[𝒞f]=M𝔼r[t=Ts+1T(ρt>ρt1)]{\mathbb{E}}_{r}[\mathcal{C}_{f}]=M{\mathbb{E}}_{r}\left[\sum_{t=T_{s}+1}^{T}{\mathbb{P}}(\rho_{t}>\rho_{t-1})\right], then

𝔼r[𝒞f]=\displaystyle{\mathbb{E}}_{r}[\mathcal{C}_{f}]= 𝔼r[𝒞f|TsT0](TsT0)\displaystyle{\mathbb{E}}_{r}[\mathcal{C}_{f}|T_{s}\leq T_{0}]{\mathbb{P}}(T_{s}\leq T_{0})
+𝔼r[𝒞f|Ts>T0](Ts>T0)\displaystyle+{\mathbb{E}}_{r}[\mathcal{C}_{f}|T_{s}>T_{0}]{\mathbb{P}}(T_{s}>T_{0})
\displaystyle\leq 𝔼[𝒞f|Ts=1](TsT0)+𝔼[𝒞f|Ts=T0]\displaystyle{\mathbb{E}}[\mathcal{C}_{f}|T_{s}=1]{\mathbb{P}}(T_{s}\leq T_{0})+{\mathbb{E}}[\mathcal{C}_{f}|T_{s}=\lceil T_{0}\rceil]
(a)\displaystyle\overset{(a)}{\leq} M(TsT0)𝔼[𝒞FFTPL]\displaystyle M{\mathbb{P}}(T_{s}\leq T_{0}){\mathbb{E}}[\mathcal{C}_{F}^{\text{FTPL}}]
+Mt=T0T(exp(Δmin2t16α2)+eΔmin2t/2κ2)\displaystyle+M\sum_{t=\lceil T_{0}\rceil}^{T}\left(\exp\left(-\frac{\Delta_{\text{min}}^{2}t}{16\alpha^{2}}\right)+e^{-\Delta_{\text{min}}^{2}t/2\kappa^{2}}\right)
\displaystyle\leq M(TsT0)16α2+3κ2Δmin2\displaystyle M{\mathbb{P}}(T_{s}\leq T_{0})\frac{16\alpha^{2}+3\kappa^{2}}{\Delta_{\text{min}}^{2}}
+16α2Δmin2exp(Δmin2T016α2)\displaystyle+\frac{16\alpha^{2}}{\Delta_{\text{min}}^{2}}\exp\left(-\frac{\Delta_{\text{min}}^{2}T_{0}}{16\alpha^{2}}\right)
+2κ2Δmin2exp(Δmin2T02κ2).\displaystyle+\frac{2\kappa^{2}}{\Delta_{\text{min}}^{2}}\exp\left(-\frac{\Delta_{\text{min}}^{2}T_{0}}{2\kappa^{2}}\right).

Here, (a)(a) is obtained by using inequality (14). Setting T0=(β1)2κ2(logM)1+δΔmax2T_{0}=\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}} and using Lemmas 10 and 12, we get

SW-FTPL\displaystyle\mathcal{R}^{\text{W-FTPL}}_{S} (T)\displaystyle(T)
\displaystyle\leq 1+4βκ2(logM)1+δΔmin2+1M2β(logM)δ2κ2Δmin2\displaystyle 1+\frac{4\beta\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{min}}^{2}}+\frac{1}{M^{2\beta(\log M)^{\delta}}}\frac{2\kappa^{2}}{\Delta_{\text{min}}^{2}}
+(16α2+4κ2)Δmin\displaystyle+\frac{(16\alpha^{2}+4\kappa^{2})}{\Delta_{\text{min}}}
+(2logK+22h1logKΔmin)(α+4κ2α)\displaystyle+\left(\sqrt{2\log K}+\frac{2\sqrt{2h_{1}}\log K}{\Delta_{\text{min}}}\right)\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right)
+M(β1)2κ2(logM)1+δΔmax2M2(logM)δ16α2+3κ2Δmin2\displaystyle+M\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}M^{2(\log M)^{\delta}}}\frac{16\alpha^{2}+3\kappa^{2}}{\Delta_{\text{min}}^{2}}
+(16α2MM(β1)2κ2Δmin2(logM)δ16α2Δmax2+2Mκ2M(β1)2Δmin2(logM)δ2Δmax2)\displaystyle+\left(\frac{16\alpha^{2}M}{M^{\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}\Delta_{\text{min}}^{2}(\log M)^{\delta}}{16\alpha^{2}\Delta_{\text{max}}^{2}}}}+\frac{2M\kappa^{2}}{M^{\frac{(\sqrt{\beta}-1)^{2}\Delta_{\text{min}}^{2}(\log M)^{\delta}}{2\Delta_{\text{max}}^{2}}}}\right)
×1Δmin2.\displaystyle\times\frac{1}{\Delta_{\text{min}}^{2}}.

For large values of MM we have

SW-FTPL\displaystyle\mathcal{R}^{\text{W-FTPL}}_{S} (T)\displaystyle(T)
\displaystyle\leq 1+4βκ2(logM)1+δΔmin2+2κ2Δmin2\displaystyle 1+\frac{4\beta\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{min}}^{2}}+\frac{2\kappa^{2}}{\Delta_{\text{min}}^{2}}
+(16α2+4κ2)Δmin\displaystyle+\frac{(16\alpha^{2}+4\kappa^{2})}{\Delta_{\text{min}}}
+(2logK+22h1logKΔmin)(α+4κ2α)\displaystyle+\left(\sqrt{2\log K}+\frac{2\sqrt{2h_{1}}\log K}{\Delta_{\text{min}}}\right)\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right)
+(β1)2κ2(logM)1+δΔmax216α2+3κ2Δmin2\displaystyle+\frac{(\sqrt{\beta}-1)^{2}\kappa^{2}(\log M)^{1+\delta}}{\Delta_{\text{max}}^{2}}\frac{16\alpha^{2}+3\kappa^{2}}{\Delta_{\text{min}}^{2}}
+16α2+2κ2Δmin2\displaystyle+\frac{16\alpha^{2}+2\kappa^{2}}{\Delta_{\text{min}}^{2}}
\displaystyle\leq 1+βκ2(logM)1+δ(4Δmin2+16α2+3κ2Δmin2Δmax2)\displaystyle 1+\beta\kappa^{2}(\log M)^{1+\delta}\left(\frac{4}{\Delta_{\text{min}^{2}}}+\frac{16\alpha^{2}+3\kappa^{2}}{\Delta_{\text{min}^{2}}\Delta_{\text{max}}^{2}}\right)
+(16α2+4κ2)(1Δmin+1Δmin2)\displaystyle+(16\alpha^{2}+4\kappa^{2})\left(\frac{1}{\Delta_{\text{min}}}+\frac{1}{\Delta_{\text{min}^{2}}}\right)
+(2logK+22h1logKΔmin)(α+4κ2α).\displaystyle+\left(\sqrt{2\log K}+\frac{2\sqrt{2h_{1}}\log K}{\Delta_{\text{min}}}\right)\left(\alpha+\frac{4\kappa^{2}}{\alpha}\right).

6.9 Proof of Theorem 3(a)

Lemma 13.

The cost of the offline optimal policy is lower bounded as follows

𝒞OPT-OFF(T,r)RTκ2(minicαi+g(αi)κ).\mathcal{C}^{\text{OPT-OFF}}(T,r)\geq\frac{R_{T}}{\kappa^{2}}(\min_{i}{c\alpha_{i}+g(\alpha_{i})\kappa}).
Proof.

The cost incurred by the optimal offline policy with M>0M>0 is lower bounded by its cost when M=0M=0, since a positive MM only adds fetch cost. Therefore,

𝒞OPT-OFF(T,r)\displaystyle\mathcal{C}^{\text{OPT-OFF}}(T,r) t=1Tmini(cαi+g(αi)rt)\displaystyle\geq\sum_{t=1}^{T}\min_{i}(c\alpha_{i}+g(\alpha_{i})r_{t})
t=1Tmini(cαi/κ+g(αi)rt)\displaystyle\geq\sum_{t=1}^{T}\min_{i}(c\alpha_{i}/\kappa+g(\alpha_{i})r_{t})
t=1Tmini(cαi/κ+g(αi))𝟙{rt1}\displaystyle\geq\sum_{t=1}^{T}\min_{i}(c\alpha_{i}/\kappa+g(\alpha_{i}))\mathds{1}_{\{r_{t}\geq 1\}}
t=1Tmini(cαi/κ+g(αi))rt/κ\displaystyle\geq\sum_{t=1}^{T}\min_{i}(c\alpha_{i}/\kappa+g(\alpha_{i}))r_{t}/\kappa
RTκ2(minicαi+g(αi)κ).\displaystyle\geq\frac{R_{T}}{\kappa^{2}}(\min_{i}{c\alpha_{i}+g(\alpha_{i})\kappa}). ∎
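The chain of inequalities above can be verified numerically on a random instance. In the sketch below, the fraction set {α_i}, the choice g(α) = 1 − α, the rent cost c, and the request distribution on {0, …, κ} are all illustrative assumptions; the script checks each inequality and the final identity Σ_t min_i(cα_i/κ + g(α_i))·r_t/κ = (R_T/κ²)·min_i(cα_i + g(α_i)κ).

```python
import random

def check_lower_bound_chain(T=100, c=1.0, kappa=5, seed=1):
    """Numerically verify each step of the lower bound on the offline
    optimal cost, for illustrative alphas and g(alpha) = 1 - alpha."""
    rng = random.Random(seed)
    alphas = [0.0, 0.25, 0.5, 1.0]
    g = lambda a: 1.0 - a  # decreasing, g(0) = 1, g(1) = 0 (assumption)
    r = [rng.randint(0, kappa) for _ in range(T)]  # r_t <= kappa
    R_T = sum(r)
    s1 = sum(min(c * a + g(a) * rt for a in alphas) for rt in r)
    s2 = sum(min(c * a / kappa + g(a) * rt for a in alphas) for rt in r)
    s3 = sum(min(c * a / kappa + g(a) for a in alphas) * (1 if rt >= 1 else 0)
             for rt in r)
    s4 = sum(min(c * a / kappa + g(a) for a in alphas) * rt / kappa for rt in r)
    s5 = (R_T / kappa**2) * min(c * a + g(a) * kappa for a in alphas)
    return s1, s2, s3, s4, s5
```

Each step relies on r_t ≤ κ: in particular, r_t/κ ≤ 1 whenever r_t ≥ 1, which justifies the indicator-to-r_t/κ step.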

Lemma 14.

The expected cost of the FTPL policy with ηt=αt1\eta_{t}=\alpha\sqrt{t-1}, α>0\alpha>0, for any request sequence rr can be upper bounded as follows:

𝔼[𝒞FTPL(T,r)]((3+2M/c)maxαi01g(αi)αi)RT\displaystyle{\mathbb{E}}[\mathcal{C}^{\text{FTPL}}(T,r)]\leq\left((3+2M/c)\max_{\alpha_{i}\neq 0}\frac{1-g(\alpha_{i})}{\alpha_{i}}\right)R_{T}
+(c+M)i=2K16α2c2αi2.\displaystyle+(c+M)\sum_{i=2}^{K}\frac{16\alpha^{2}}{c^{2}\alpha_{i}^{2}}.
Proof.

FTPL does not host any fraction of the service in time slot t+1t+1 if cαit+g(αi)Rt+ηt+1γi>Rt+ηt+1γ1c\alpha_{i}t+g(\alpha_{i})R_{t}+\eta_{t+1}\gamma_{i}>R_{t}+\eta_{t+1}\gamma_{1} for all i1i\neq 1. The probability ph,tp_{h,t} of hosting a nonzero fraction of the service in time slot t+1t+1 can therefore be bounded as

ph,t\displaystyle p_{h,t} i=2K(cαit+g(αi)Rt+ηt+1γi<Rt+ηt+1γ1)\displaystyle\leq\sum_{i=2}^{K}{\mathbb{P}}(c\alpha_{i}t+g(\alpha_{i})R_{t}+\eta_{t+1}\gamma_{i}<R_{t}+\eta_{t+1}\gamma_{1})
=i=2K(γ1γi2>cαit+g(αi)RtRt2ηt+1)\displaystyle=\sum_{i=2}^{K}{\mathbb{P}}\left(\frac{\gamma_{1}-\gamma_{i}}{\sqrt{2}}>\frac{c\alpha_{i}t+g(\alpha_{i})R_{t}-R_{t}}{\sqrt{2}\eta_{t+1}}\right)
=i=2K(γ1γi2>cαit(1g(αi))Rt2ηt+1).\displaystyle=\sum_{i=2}^{K}{\mathbb{P}}\left(\frac{\gamma_{1}-\gamma_{i}}{\sqrt{2}}>\frac{c\alpha_{i}t-(1-g(\alpha_{i}))R_{t}}{\sqrt{2}\eta_{t+1}}\right).

Let T=2RTcmaxαi01g(αi)αiT^{\prime}=\frac{2R_{T}}{c}\max_{\alpha_{i}\neq 0}\frac{1-g(\alpha_{i})}{\alpha_{i}}. If tTt\geq T^{\prime}, then RtRTct2minαi0αi1g(αi)R_{t}\leq R_{T}\leq\frac{ct}{2}\min_{\alpha_{i}\neq 0}\frac{\alpha_{i}}{1-g(\alpha_{i})}. Therefore, for tTt\geq T^{\prime}, we have Rtcαit2(1g(αi))R_{t}\leq\frac{c\alpha_{i}t}{2(1-g(\alpha_{i}))} for all αi0\alpha_{i}\neq 0, and we get

ph,t\displaystyle p_{h,t} i=2K(γ1γi2>cαitcαit/22ηt+1)\displaystyle\leq\sum_{i=2}^{K}{\mathbb{P}}\left(\frac{\gamma_{1}-\gamma_{i}}{\sqrt{2}}>\frac{c\alpha_{i}t-c\alpha_{i}t/2}{\sqrt{2}\eta_{t+1}}\right)
i=2Kexp(c2αi2t216ηt+12)\displaystyle\leq\sum_{i=2}^{K}\exp\left(-\frac{c^{2}\alpha_{i}^{2}t^{2}}{16\eta_{t+1}^{2}}\right)
=i=2Kexp(c2αi2t16α2).\displaystyle=\sum_{i=2}^{K}\exp\left(-\frac{c^{2}\alpha_{i}^{2}t}{16\alpha^{2}}\right). (15)

Therefore, the expected cost under FTPL can be bounded as

𝔼[𝒞FTPL\displaystyle{\mathbb{E}}[\mathcal{C}^{\text{FTPL}} (T,r)]\displaystyle(T,r)]
\displaystyle\leq t=1T(c+M)ph,t+rt\displaystyle\sum_{t=1}^{T}(c+M)p_{h,t}+r_{t}
=\displaystyle= RT+t=1T(c+M)ph,t\displaystyle R_{T}+\sum_{t=1}^{T}(c+M)p_{h,t}
\displaystyle\leq RT+(c+M)T+(c+M)t=TTph,t\displaystyle R_{T}+(c+M)T^{\prime}+(c+M)\sum_{t=\lceil T^{\prime}\rceil}^{T}p_{h,t}
(a)\displaystyle\overset{(a)}{\leq} RT+(c+M)T\displaystyle R_{T}+(c+M)T^{\prime}
+(c+M)t=TTi=2Kexp(c2αi2t16α2)\displaystyle+(c+M)\sum_{t=\lceil T^{\prime}\rceil}^{T}\sum_{i=2}^{K}\exp\left(-\frac{c^{2}\alpha_{i}^{2}t}{16\alpha^{2}}\right)
\displaystyle\leq ((3+2M/c)maxαi01g(αi)αi)RT\displaystyle\left((3+2M/c)\max_{\alpha_{i}\neq 0}\frac{1-g(\alpha_{i})}{\alpha_{i}}\right)R_{T}
+(c+M)i=2K16α2c2αi2.\displaystyle+(c+M)\sum_{i=2}^{K}\frac{16\alpha^{2}}{c^{2}\alpha_{i}^{2}}.

Here, (a)(a) follows from (15). If T<TT<T^{\prime}, the cost is at most RT+(c+M)TR_{T}+(c+M)T^{\prime}, so the bound still holds. Therefore, the bound holds for all TT. ∎
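The step from the perturbation probability to (15) uses the standard normal tail bound P(Z > x) ≤ exp(−x²/2), assuming (as the derivation suggests) that the perturbations γ_i are i.i.d. standard Gaussians, so that (γ₁ − γ_i)/√2 is standard normal. A quick check of that tail bound at a few points, via the complementary error function:

```python
import math

def gaussian_tail_vs_bound(xs=(0.5, 1.0, 2.0, 3.0)):
    """Standard normal tail P(Z > x) = erfc(x / sqrt(2)) / 2 versus the
    Chernoff-type bound exp(-x^2 / 2), valid for x >= 0."""
    out = []
    for x in xs:
        tail = 0.5 * math.erfc(x / math.sqrt(2))
        out.append((tail, math.exp(-x**2 / 2)))
    return out
```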

Proof of Theorem 3(a).

By using Lemma 13, we have

σAFTPL\displaystyle\sigma_{A}^{\text{FTPL}}\leq supr𝔼[𝒞FTPL(T,r)](minicαi+g(αi)κ)RT/κ2\displaystyle\sup_{r}\frac{{\mathbb{E}}[\mathcal{C}^{\text{FTPL}}(T,r)]}{(\min_{i}{c\alpha_{i}+g(\alpha_{i})\kappa})R_{T}/\kappa^{2}}
(a)\displaystyle\overset{(a)}{\leq} κ2(3+2M/c)mini(cαi+g(αi)κ)maxαi01g(αi)αi\displaystyle\frac{\kappa^{2}(3+2M/c)}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}\max_{\alpha_{i}\neq 0}\frac{1-g(\alpha_{i})}{\alpha_{i}}
+suprκ2(M+c)RTmini(cαi+g(αi)κ)i=2K16α2c2αi2\displaystyle+\sup_{r}\frac{\kappa^{2}(M+c)}{R_{T}\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}\sum_{i=2}^{K}\frac{16\alpha^{2}}{c^{2}\alpha_{i}^{2}}
\displaystyle\leq κ2(3+2M/c)mini(cαi+g(αi)κ)maxαi01g(αi)αi\displaystyle\frac{\kappa^{2}(3+2M/c)}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}\max_{\alpha_{i}\neq 0}\frac{1-g(\alpha_{i})}{\alpha_{i}}
+κ2(M+c)mini(cαi+g(αi)κ)i=2K16α2c2αi2.\displaystyle+\frac{\kappa^{2}(M+c)}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}\sum_{i=2}^{K}\frac{16\alpha^{2}}{c^{2}\alpha_{i}^{2}}.

Here (a)(a) is obtained by using Lemma 14. ∎

6.10 Proof of Theorem 3(b)

Proof of Theorem 3(b).

By using Lemma 13, we have

σAW-FTPL\displaystyle\sigma^{\text{W-FTPL}}_{A}\leq suprRTs+𝔼[𝒞AFTPL(T,r)]RT(mini{cαi+g(αi)κ})/κ2\displaystyle\sup_{r}\frac{R_{T_{s}}+{\mathbb{E}}[\mathcal{C}_{A}^{\text{FTPL}}(T,r)]}{R_{T}(\min_{i}\{c\alpha_{i}+g(\alpha_{i})\kappa\})/\kappa^{2}}
(a)\displaystyle\overset{(a)}{\leq} κ2(3+2M/c)mini(cαi+g(αi)κ)maxαi01g(αi)αi\displaystyle\frac{\kappa^{2}(3+2M/c)}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}\max_{\alpha_{i}\neq 0}\frac{1-g(\alpha_{i})}{\alpha_{i}}
+κ2(M+c)mini(cαi+g(αi)κ)i=2K16α2c2αi2\displaystyle+\frac{\kappa^{2}(M+c)}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}\sum_{i=2}^{K}\frac{16\alpha^{2}}{c^{2}\alpha_{i}^{2}}
+κ2mini(cαi+g(αi)κ).\displaystyle+\frac{\kappa^{2}}{\min_{i}(c\alpha_{i}+g(\alpha_{i})\kappa)}.

Here (a)(a) is obtained by using Theorem 3(a) and the fact that RTsRTR_{T_{s}}\leq R_{T}. ∎

6.11 Proof of Theorem 4(a)

Proof of Theorem 4(a).
σSRR(T)\displaystyle\sigma^{\text{RR}}_{S}(T) =𝔼[𝒞RR(T,r)]mini{cαiT+g(αi)μT+Mαi}\displaystyle=\frac{{\mathbb{E}}[\mathcal{C}^{\text{RR}}(T,r)]}{\min_{i}\{c\alpha_{i}T+g(\alpha_{i})\mu T+M\alpha_{i}\}}
=SRR(T)+mini{cαiT+g(αi)μT+Mαi}mini{cαiT+g(αi)μT+Mαi}\displaystyle=\frac{\mathcal{R}^{\text{RR}}_{S}(T)+\min_{i}\{c\alpha_{i}T+g(\alpha_{i})\mu T+M\alpha_{i}\}}{\min_{i}\{c\alpha_{i}T+g(\alpha_{i})\mu T+M\alpha_{i}\}}
1+RRS(T)μiT.\displaystyle\leq 1+\frac{\mathcal{R}^{\text{RR}}_{S}(T)}{\mu_{i^{\star}}T}.

By using Theorem 2 we have,

σSRR(T)1+Mμi(1Mκcαig(αi)κ+Mc+21T)pd.\displaystyle\sigma^{\text{RR}}_{S}(T)\geq 1+\frac{M}{\mu_{i^{\star}}}\left(\frac{1}{\frac{M}{\kappa-c\alpha_{i^{\prime}}-g(\alpha_{i^{\prime}})\kappa}+\frac{M}{c}+2}-\frac{1}{T}\right)pd.

For large TT we get,

σSRR(T)>1.\displaystyle\sigma^{\text{RR}}_{S}(T)>1.

6.12 Proof of Theorem 4(b)

Proof of Theorem 4(b).

By using Theorem 2 we have

σSFTPL(T)\displaystyle\sigma^{\text{FTPL}}_{S}(T) 1+1μiT(16α2+1)(MΔmin2+ii1Δi)\displaystyle\leq 1+\frac{1}{\mu_{i^{\star}}T}(16\alpha^{2}+1)\left(\frac{M}{\Delta_{\text{min}}^{2}}+\sum_{i\neq i^{\star}}\frac{1}{\Delta_{i}}\right)
=1+O(1/T).\displaystyle=1+\mathrm{O}(1/T).

6.13 Proof of Theorem 4(c)

Proof of Theorem 4(c).

By Theorem 2, SW-FTPL(T)\mathcal{R}^{\text{W-FTPL}}_{S}(T) is bounded by a constant independent of TT, and

σSW-FTPL(T)1+SW-FTPL(T)Tμi.\displaystyle\sigma^{\text{W-FTPL}}_{S}(T)\leq 1+\frac{\mathcal{R}^{\text{W-FTPL}}_{S}(T)}{T\mu_{i^{\star}}}.

Therefore we get,

σSW-FTPL(T)1+O(1/T).\sigma^{\text{W-FTPL}}_{S}(T)\leq 1+\mathrm{O}(1/T).

References

  • [1] R. S. Prakash, N. Karamchandani, V. Kavitha, S. Moharir, Partial service caching at the edge, in: 2020 18th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOPT), IEEE, 2020, pp. 1–8.
  • [2] V. S. C. L. Narayana, M. Agarwala, R. S. Prakash, N. Karamchandani, S. Moharir, Online partial service hosting at the edge, CoRR abs/2103.00555 (2021). arXiv:2103.00555.
    URL https://arxiv.org/abs/2103.00555
  • [3] V. C. L. Narayana, S. Moharir, N. Karamchandani, On renting edge resources for service hosting, ACM Transactions on Modeling and Performance Evaluation of Computing Systems 6 (2) (2021) 1–30.
  • [4] T. Zhao, I.-H. Hou, S. Wang, K. Chan, Red/led: An asymptotically optimal and scalable online algorithm for service caching at the edge, IEEE Journal on Selected Areas in Communications 36 (8) (2018) 1857–1870.
  • [5] T. L. Lai, H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics 6 (1) (1985) 4–22.
  • [6] G. S. Paschos, A. Destounis, L. Vigneri, G. Iosifidis, Learning to cache with no regrets, in: IEEE INFOCOM 2019-IEEE Conference on Computer Communications, IEEE, 2019, pp. 235–243.
  • [7] R. Bhattacharjee, S. Banerjee, A. Sinha, Fundamental limits on the regret of online network-caching, Proceedings of the ACM on Measurement and Analysis of Computing Systems 4 (2) (2020) 1–31.
  • [8] C. Puliafito, E. Mingozzi, F. Longo, A. Puliafito, O. Rana, Fog computing for the internet of things: A survey, ACM Trans. Internet Technol. 19 (2) (2019) 18:1–18:41. doi:10.1145/3301443.
    URL http://doi.acm.org/10.1145/3301443
  • [9] Y. Mao, C. You, J. Zhang, K. Huang, K. B. Letaief, A survey on mobile edge computing: The communication perspective, IEEE Communications Surveys & Tutorials 19 (4) (2017) 2322–2358.
  • [10] P. Mach, Z. Becvar, Mobile edge computing: A survey on architecture and computation offloading, IEEE Communications Surveys & Tutorials 19 (3) (2017) 1628–1656.
  • [11] S. Pasteris, S. Wang, M. Herbster, T. He, Service placement with provable guarantees in heterogeneous edge computing systems, in: IEEE INFOCOM 2019-IEEE Conference on Computer Communications, IEEE, 2019, pp. 514–522.
  • [12] S. Bi, L. Huang, Y. A. Zhang, Joint optimization of service caching placement and computation offloading in mobile edge computing systems, IEEE Transactions on Wireless Communications 19 (7) (2020) 4947–4963. doi:10.1109/TWC.2020.2988386.
  • [13] O. Ascigil, A. Tasiopoulos, T. K. Phan, V. Sourlas, I. Psaras, G. Pavlou, Resource provisioning and allocation in function-as-a-service edge-clouds, IEEE Transactions on Services Computing (2021) 1–1. doi:10.1109/TSC.2021.3052139.
  • [14] H. Tan, S. H.-C. Jiang, Z. Han, M. Li, Asymptotically optimal online caching on multiple caches with relaying and bypassing, IEEE/ACM Transactions on Networking (2021).
  • [15] X. Xia, F. Chen, Q. He, J. Grundy, M. Abdelrazek, H. Jin, Online collaborative data caching in edge computing, IEEE Transactions on Parallel and Distributed Systems 32 (2) (2020) 281–294.
  • [16] T. S. Salem, G. Neglia, S. Ioannidis, No-regret caching via online mirror descent, in: ICC 2021-IEEE International Conference on Communications, IEEE, 2021, pp. 1–6.
  • [17] S. Mukhopadhyay, A. Sinha, Online caching with optimal switching regret, in: 2021 IEEE International Symposium on Information Theory (ISIT), IEEE, 2021, pp. 1546–1551.
  • [18] S. Fan, I.-H. Hou, V. S. Mai, L. Benmohamed, Online service caching and routing at the edge with unknown arrivals (2022).
  • [19] S. Wang, R. Urgaonkar, M. Zafer, T. He, K. Chan, K. K. Leung, Dynamic service migration in mobile edge-clouds, in: 2015 IFIP Networking Conference (IFIP Networking), IEEE, 2015, pp. 1–9.
  • [20] G. Xiong, R. Singh, J. Li, Learning augmented index policy for optimal service placement at the network edge, arXiv preprint arXiv:2101.03641 (2021).
  • [21] D. Zeng, L. Gu, S. Pan, J. Cai, S. Guo, Resource management at the network edge: A deep reinforcement learning approach, IEEE Network 33 (3) (2019) 26–33.
  • [22] L. Chen, J. Xu, Budget-constrained edge service provisioning with demand estimation via bandit learning, arXiv preprint arXiv:1903.09080 (2019).
  • [23] W. He, D. He, Y. Huang, Y. Zhang, Y. Xu, G. Yun-feng, W. Zhang, Bandit learning-based service placement and resource allocation for mobile edge computing, in: 2020 IEEE 31st Annual International Symposium on Personal, Indoor and Mobile Radio Communications, IEEE, 2020, pp. 1–6.
  • [24] A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, D. H. Epema, The grid workloads archive, Future Generation Computer Systems 24 (7) (2008) 672–686.
  • [25] A. Cohen, T. Hazan, Following the perturbed leader for online structured learning, in: International Conference on Machine Learning, PMLR, 2015, pp. 1034–1042.
  • [26] J. Abernethy, C. Lee, A. Sinha, A. Tewari, Online linear optimization via smoothing, in: Conference on Learning Theory, PMLR, 2014, pp. 807–823.
  • [27] W. Hoeffding, Probability inequalities for sums of bounded random variables, in: The collected works of Wassily Hoeffding, Springer, 1994, pp. 409–426.
  • [28] D. Berend, A. Kontorovich, A sharp estimate of the binomial mean absolute deviation with applications, Statistics & Probability Letters 83 (4) (2013) 1254–1259.
  • [29] H. Robbins, A remark on Stirling’s formula, The American Mathematical Monthly 62 (1) (1955) 26–29.

7 Appendix

Proof of Lemma 1.

For i.i.d. stochastic arrivals, we have

S𝒫\displaystyle\mathcal{R}^{\mathcal{P}}_{S} (T)\displaystyle(T)
=𝔼𝒫,r[𝒞𝒫(T,r)]mini{cαiT+g(αi)μT+Mαi}\displaystyle={\mathbb{E}}_{\mathcal{P},r}[\mathcal{C}^{\mathcal{P}}(T,r)]-\min_{i}\{c\alpha_{i}T+g(\alpha_{i})\mu T+M\alpha_{i}\}
=𝔼𝒫,r[𝒞𝒫(T,r)]mini{𝔼r[cαiT+g(αi)RT+Mαi]}\displaystyle={\mathbb{E}}_{\mathcal{P},r}[\mathcal{C}^{\mathcal{P}}(T,r)]-\min_{i}\{{\mathbb{E}}_{r}[c\alpha_{i}T+g(\alpha_{i})R_{T}+M\alpha_{i}]\}
(a)𝔼𝒫,r[𝒞𝒫(T,r)]𝔼r[mini{cαiT+g(αi)RT+Mαi}]\displaystyle\overset{(a)}{\leq}{\mathbb{E}}_{\mathcal{P},r}[\mathcal{C}^{\mathcal{P}}(T,r)]-{\mathbb{E}}_{r}[\min_{i}\{c\alpha_{i}T+g(\alpha_{i})R_{T}+M\alpha_{i}\}]
=𝔼r[A(T,r)]supr(A𝒫(T,r))=A𝒫(T),\displaystyle={\mathbb{E}}_{r}[\mathcal{R}_{A}(T,r)]\leq\sup_{r}\left(\mathcal{R}^{\mathcal{P}}_{A}(T,r)\right)=\mathcal{R}_{A}^{\mathcal{P}}(T),

where (a)(a) is obtained by Jensen’s inequality. ∎
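Step (a) exchanges min and expectation: the minimum of expectations dominates the expectation of the minimum, since the pointwise minimum is concave. A tiny numerical sketch with two hypothetical per-slot cost curves of the request count (illustrative choices, not the paper's cost functions):

```python
import random

def min_of_means_vs_mean_of_min(trials=10000, seed=2):
    """Check min_i E[f_i(X)] >= E[min_i f_i(X)] on an empirical sample,
    for two illustrative cost curves of a random request count."""
    rng = random.Random(seed)
    f1 = lambda x: 0.5 * x + 1.0  # hypothetical cost curve 1
    f2 = lambda x: 2.0 * x        # hypothetical cost curve 2
    xs = [rng.randint(0, 5) for _ in range(trials)]
    min_of_means = min(sum(f(x) for x in xs) / trials for f in (f1, f2))
    mean_of_min = sum(min(f1(x), f2(x)) for x in xs) / trials
    return min_of_means, mean_of_min
```

The inequality holds deterministically for empirical means as well, since the minimum of the two sums is at least the sum of pointwise minima.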

Lemma 15.

If XX is a random variable and 𝔼[f1(X)]=𝔼[f2(X)]=m{\mathbb{E}}[f_{1}(X)]={\mathbb{E}}[f_{2}(X)]=m, then

m𝔼[min{f1(X),f2(X)}]=12𝔼[|f1(X)f2(X)|].m-{\mathbb{E}}[\min\{f_{1}(X),f_{2}(X)\}]=\frac{1}{2}{\mathbb{E}}[|f_{1}(X)-f_{2}(X)|].
Proof.
min{X,Y}=12(X+Y)12(|XY|)\displaystyle\min\{X,Y\}=\frac{1}{2}(X+Y)-\frac{1}{2}(|X-Y|)
mmin{f1(X),f2(X)}=m12(f1(X)+f2(X))\displaystyle\implies m-\min\{f_{1}(X),f_{2}(X)\}=m-\frac{1}{2}(f_{1}(X)+f_{2}(X))
+12(|f1(X)f2(X)|)\displaystyle+\frac{1}{2}(|f_{1}(X)-f_{2}(X)|)
m𝔼[min{f1(X),f2(X)}]=12𝔼[|f1(X)f2(X)|].\displaystyle\implies m-{\mathbb{E}}[\min\{f_{1}(X),f_{2}(X)\}]=\frac{1}{2}{\mathbb{E}}[|f_{1}(X)-f_{2}(X)|]. ∎
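Both the pointwise identity min{x, y} = (x + y)/2 − |x − y|/2 and the resulting expectation identity can be checked numerically; the discrete example below (X uniform on {0, 1, 2} with f₁(x) = x and f₂(x) = 2 − x, both of mean m = 1) is an illustrative choice.

```python
import itertools

def check_min_identity():
    """min{x, y} = (x + y)/2 - |x - y|/2 on an integer grid."""
    for x, y in itertools.product(range(-3, 4), repeat=2):
        assert min(x, y) == (x + y) / 2 - abs(x - y) / 2
    return True

def lemma15_example():
    """m - E[min{f1(X), f2(X)}] versus (1/2) E[|f1(X) - f2(X)|] for X
    uniform on {0, 1, 2}, f1(x) = x, f2(x) = 2 - x (both have mean 1)."""
    xs = [0, 1, 2]
    f1 = lambda x: x
    f2 = lambda x: 2 - x
    m = sum(f1(x) for x in xs) / len(xs)
    lhs = m - sum(min(f1(x), f2(x)) for x in xs) / len(xs)
    rhs = 0.5 * sum(abs(f1(x) - f2(x)) for x in xs) / len(xs)
    return lhs, rhs
```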

Lemma 16.

If XBin(T,p)X\sim\text{Bin}(T,p) then

12𝔼[|XTp|]\displaystyle\frac{1}{2}{\mathbb{E}}[|X-Tp|]\geq Tp(1p)e2π[1+112T+1112Tp\displaystyle\frac{\sqrt{Tp(1-p)}}{e\sqrt{2\pi}}\left[1+\frac{1}{12T+1}-\frac{1}{12\lfloor Tp\rfloor}\right.
112(TTp1)].\displaystyle\left.-\frac{1}{12(T-\lfloor Tp\rfloor-1)}\right].
Proof.

By using equation (1) from [28] we get

𝔼[|X\displaystyle{\mathbb{E}}[|X Tp|]\displaystyle-Tp|]
=\displaystyle= 2(1p)TTppTp+1(Tp+1)(TTp+1)\displaystyle 2(1-p)^{T-\lfloor Tp\rfloor}p^{\lfloor Tp\rfloor+1}(\lfloor Tp\rfloor+1){T\choose\lfloor Tp+1\rfloor}
=\displaystyle= 2(1p)TTppTp+1T!Tp!(TTp1)!\displaystyle 2(1-p)^{T-\lfloor Tp\rfloor}p^{\lfloor Tp\rfloor+1}\frac{T!}{\lfloor Tp\rfloor!(T-\lfloor Tp\rfloor-1)!}
(a)\displaystyle\overset{(a)}{\geq} 2(1p)TTppTp+11e2πTT+12e112T+1TpTp+12e112Tp\displaystyle 2(1-p)^{T-\lfloor Tp\rfloor}p^{\lfloor Tp\rfloor+1}\frac{1}{e\sqrt{2\pi}}\frac{T^{T+\frac{1}{2}}e^{\frac{1}{12T+1}}}{\lfloor Tp\rfloor^{\lfloor Tp\rfloor+\frac{1}{2}}e^{\frac{1}{12\lfloor Tp\rfloor}}}
×1(TTp1)TTp1+12e112(TTp1)\displaystyle\times\frac{1}{(T-\lfloor Tp\rfloor-1)^{T-\lfloor Tp\rfloor-1+\frac{1}{2}}e^{\frac{1}{12(T-\lfloor Tp\rfloor-1)}}}
(b)\displaystyle\overset{(b)}{\geq} 2e2π(1p)TTppTp+1TT+12e112T+1(Tp)Tp+12e112Tp\displaystyle\frac{2}{e\sqrt{2\pi}}(1-p)^{T-\lfloor Tp\rfloor}p^{\lfloor Tp\rfloor+1}\frac{T^{T+\frac{1}{2}}e^{\frac{1}{12T+1}}}{(Tp)^{\lfloor Tp\rfloor+\frac{1}{2}}e^{\frac{1}{12\lfloor Tp\rfloor}}}
×1(TTp)TTp12e112(TTp1)\displaystyle\times\frac{1}{(T-Tp)^{T-\lfloor Tp\rfloor-\frac{1}{2}}e^{\frac{1}{12(T-\lfloor Tp\rfloor-1)}}}
=\displaystyle= 2Tp(1p)e2πe112T+1112Tp112(TTp1)\displaystyle\frac{2\sqrt{Tp(1-p)}}{e\sqrt{2\pi}}e^{\frac{1}{12T+1}-\frac{1}{12\lfloor Tp\rfloor}-\frac{1}{12(T-\lfloor Tp\rfloor-1)}}
(c)\displaystyle\overset{(c)}{\geq} 2Tp(1p)e2π[1+112T+1112Tp\displaystyle\frac{2\sqrt{Tp(1-p)}}{e\sqrt{2\pi}}\left[1+\frac{1}{12T+1}-\frac{1}{12\lfloor Tp\rfloor}\right.
112(TTp1)],\displaystyle\left.-\frac{1}{12(T-\lfloor Tp\rfloor-1)}\right],

where (a)(a) follows from the Stirling-type bounds of [29], namely 2πnn+12ene112n+1<n!<2πnn+12ene112n\sqrt{2\pi}n^{n+\frac{1}{2}}e^{-n}e^{\frac{1}{12n+1}}<n!<\sqrt{2\pi}n^{n+\frac{1}{2}}e^{-n}e^{\frac{1}{12n}}, (b)(b) follows from x1xxx-1\leq\lfloor x\rfloor\leq x, and (c)(c) follows from ex1+xe^{x}\geq 1+x. ∎
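As a numerical sanity check, the closed form from [28] used in the first step can be evaluated exactly with integer combinatorics and compared against the final lower bound; the values of T and p below are illustrative, and the correction terms require ⌊Tp⌋ ≥ 1 and T − ⌊Tp⌋ − 1 ≥ 1 to be defined.

```python
import math

def binomial_mad_vs_bound(T=200, p=0.25):
    """Exact E|X - Tp| for X ~ Bin(T, p) via the closed form of [28],
    compared with the lower bound of Lemma 16 (on E|X - Tp|, i.e. twice
    the stated bound on (1/2) E|X - Tp|)."""
    k = math.floor(T * p)
    # De Moivre / Berend-Kontorovich closed form for the mean absolute deviation
    mad = 2 * (1 - p)**(T - k) * p**(k + 1) * (k + 1) * math.comb(T, k + 1)
    leading = 2 * math.sqrt(T * p * (1 - p)) / (math.e * math.sqrt(2 * math.pi))
    correction = 1 + 1 / (12 * T + 1) - 1 / (12 * k) - 1 / (12 * (T - k - 1))
    return mad, leading * correction
```

The Stirling-based bound is loose by roughly a constant factor, which is all the regret lower bound argument needs.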