Near-Optimal Stochastic Bin-Packing in Large Service Systems with Time-Varying Item Sizes
Abstract.
In modern computing systems, jobs’ resource requirements often vary over time. Accounting for this temporal variability during job scheduling is essential for meeting performance goals. However, theoretical understanding of how to schedule jobs with time-varying resource requirements is limited. Motivated by this gap, we propose a new setting of the stochastic bin-packing problem in service systems that allows for time-varying job resource requirements, also referred to as ‘item sizes’ in traditional bin-packing terms. In this setting, a job or ‘item’ must be dispatched to a server or ‘bin’ upon arrival. Its resource requirement may vary over time while in service, following a Markovian assumption. Once the job’s service is complete, it departs from the system. Our goal is to minimize the expected number of active servers, or ‘non-empty bins’, in steady state.
Under our problem formulation, we develop a job dispatch policy, named Join-Requesting-Server (JRS). Broadly, JRS lets each server independently evaluate its current job configuration and decide whether to accept additional jobs, balancing the competing objectives of maximizing throughput and minimizing the risk of resource capacity overruns. The JRS dispatcher then utilizes these individual evaluations to decide which server to dispatch each arriving job to. The theoretical performance guarantee of JRS is in the asymptotic regime where the job arrival rate scales up linearly with a scaling factor . We show that JRS achieves an additive optimality gap of in the objective value, where the optimal objective value is . When specialized to constant job resource requirements, our result improves upon the state-of-the-art optimality gap. Our technical approach highlights a novel policy conversion framework that reduces the policy design problem to a single-server problem.
1. Introduction
1.1. Background and Motivation
In modern computing systems, a job often takes the form of a virtual machine (VM) or a container (Cloud, 2023b; Foundation, 2023a). Such a job comes with a resource requirement, such as a certain number of CPUs and amount of memory, while in service. Each server in the system offers a limited amount of these resources. When a job arrives at the system, the job dispatch policy needs to decide which server the job should be assigned to, given the job’s resource requirement and servers’ current job configurations. This job scheduling problem can be approached as a Stochastic Bin-Packing (SBP) problem, where jobs are viewed as items, job resource requirements as item sizes, and servers as bins. A traditional SBP setting considers a finite set of jobs that arrive online but do not depart from the system. The objective is to minimize the number of servers that have jobs on them, or ‘non-empty bins’, subject to the resource capacities of the servers. SBP, with a rich history in operations research and theoretical computer science (Courcoubetis and Weber, 1986; Courcobetis and Weber, 1990; Csirik et al., 2006), is a field of continuous developments and advancements (Gupta and Radovanović, 2020; Freund and Banerjee, 2019; Ayyadevara et al., 2022).
To incorporate job departures into the problem formulation, a setting referred to as stochastic bin-packing in service systems has been proposed recently (Stolyar, 2013; Stolyar and Zhong, 2013, 2015; Stolyar, 2017; Stolyar and Zhong, 2021; Ghaderi et al., 2014). In this setting, jobs not only arrive but also depart over time. More specifically, jobs are assumed to arrive according to Poisson processes, and each job is assumed to stay in the system for an exponentially distributed service time. The service time of a job remains unknown until the job departs. Before delving further into SBP in service systems, it is worth mentioning that there is a parallel thread of research on the so-called dynamic bin-packing problem that also handles job departures (see, e.g., (Coffman et al., 1983; Li et al., 2014; Buchbinder et al., 2021), and references therein), but it is primarily from a worst-case analysis perspective. Additionally, the virtual machine scheduling problem with objectives different from minimizing the number of active servers has also been widely studied (see, e.g., (Maguluri et al., 2012; Maguluri and Srikant, 2013; Maguluri et al., 2014; Xie et al., 2015; Psychas and Ghaderi, 2018, 2019, 2021, 2022)).
For SBP in service systems, the goal is to design a job dispatch policy that minimizes the expected number of active servers in steady state, denoted as . The optimality gap of a policy is defined as , where is the optimal policy. Since SBP in service systems aims to model today’s large-scale computing systems, the optimality gap of a policy is usually studied in the regime where the total job arrival rate becomes large. As we scale up the total job arrival rate linearly with a scaling factor, the optimal value can be shown to be .¹ Therefore, we say a policy is asymptotically optimal if its optimality gap is .
¹We use the standard Bachmann–Landau notation. Consider two functions and , where is positive for large enough . Then if ; if ; if and .
The optimality gap for SBP in service systems has been characterized in the line of work (Stolyar, 2013; Stolyar and Zhong, 2013, 2015; Stolyar, 2017; Stolyar and Zhong, 2021). In particular, Stolyar (2013), Stolyar and Zhong (2013) propose greedy policies that are asymptotically optimal, but the scheduler that executes these policies needs to know detailed state information, which is in a high-dimensional space. Stolyar and Zhong (2015, 2021) later develop policies that use much less state information and achieve (with an arbitrarily small constant) and optimality gaps, respectively.
While prior work on SBP in service systems has provided substantial insights into scheduling virtual-machine-type jobs, it primarily focuses on job resource requirements that remain fixed over time. However, in modern computing systems, jobs’ resource requirements often vary over time (Reiss et al., 2012; Delimitrou and Kozyrakis, 2014; Lo et al., 2015; Tirmazi et al., 2020; Rzadca et al., 2020; Bashir et al., 2021). For example, when a job involves providing user-facing services, the instantaneous requirement on CPUs and memory depends on the service demand, which is subject to fluctuation over time (Delimitrou and Kozyrakis, 2014; Lo et al., 2015). Time-varying job resource requirements pose significant challenges in optimizing system efficiency, particularly when aiming to minimize the number of active servers, thereby improving server utilization. It is pertinent to note that low utilization has been recognized as a significant obstacle to the continued scaling of today’s computing systems.
Motivated by this gap, in this paper, we propose a new setting of stochastic bin-packing in service systems that allows job resource requirements, or ‘item sizes’, to vary over time.
1.2. Problem Formulation: A Simplified Version


We first describe our job model that features time-varying resource requirements. For ease of exposition, here we present a simplified setting where each job in service can be in one of two phases, and , associated with low and high resource requirements, respectively. Our full model, presented in Section 2, allows more than two phases and more than one type of resource. To model the temporal variation in the resource requirement, we assume that each job transitions between the two phases while in service until it is completed, following the continuous-time Markov chain illustrated in Figure 1(a). We use an absorbing state to denote that the job is completed. A job can start in either phase or phase , and it is referred to as a type or type job, respectively. Note that the setting where a job’s resource requirement does not vary over time is a special case of our job model in which the transition rates between phases are zero.
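To make the job model concrete, the short sketch below simulates one job's phase trajectory as a continuous-time Markov chain with an absorbing completion state. The phase labels and all rate values are illustrative placeholders of our own choosing, not parameters taken from the paper.

```python
import random

# Placeholder transition rates for the two-phase job model (Figure 1(a)).
# "done" is the absorbing state that marks job completion.
RATES = {
    "L": [("H", 0.5), ("done", 1.0)],   # from the low-requirement phase
    "H": [("L", 0.8), ("done", 0.7)],   # from the high-requirement phase
}

def simulate_job(initial_phase, rng):
    """Sample one job's phase trajectory until it completes."""
    t, phase, path = 0.0, initial_phase, []
    while phase != "done":
        transitions = RATES[phase]
        total_rate = sum(rate for _, rate in transitions)
        t += rng.expovariate(total_rate)             # exponential holding time
        u, acc = rng.random() * total_rate, 0.0      # pick the next state w.p. proportional to its rate
        for next_state, rate in transitions:
            acc += rate
            if u <= acc:
                phase = next_state
                break
        path.append((t, phase))
    return path

print(simulate_job("L", random.Random(0)))   # a type-L job starts in the low phase
```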
We consider a system with an infinite number of identical servers, illustrated in Figure 1(b). As in existing work on SBP in service systems, we assume that jobs arrive according to Poisson processes. In particular, we assume that the two types of jobs arrive at the system following two independent Poisson processes, with rates and , respectively; i.e., the interarrival times of type and type jobs are i.i.d. following exponential distributions with means and , respectively. Upon arrival, a job needs to be dispatched to a server according to a dispatch policy, and the job enters service immediately. The goal is to design a policy to minimize the expected number of active servers (servers currently serving a positive number of jobs) in steady state, denoted as .
As job resource requirements vary over time, situations can arise where the total job resource requirement on a server exceeds the server’s resource capacity, resulting in resource contention. Modern computing systems can tolerate temporary overruns of resource capacity, though they often incur performance degradation or other costs (Cloud, 2023a; Foundation, 2023b). In our model, we incorporate a rate at which cost accumulates due to resource contention. We first represent the state of a server by its configuration, a vector where and are the numbers of jobs in phase and phase , respectively. Then a cost rate function maps a server’s configuration to a rate of cost. For example, the cost rate can be proportional to how much the total resource requirement of the jobs on the server exceeds this server’s resource capacity. A more general definition of is given in Section 2. We assume that resource contention neither affects the transition rates in the job model nor prompts jobs to be terminated, which is suitable for application scenarios where the contention level is low and manageable. Let denote the average expected cost rate per server.
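As a concrete instance of one admissible cost rate function in the two-phase setting, the sketch below charges cost at a rate equal to the amount by which the total resource requirement on a server exceeds its capacity, as in the example above. The per-phase requirements and the capacity are made-up numbers for illustration only.

```python
# Hypothetical per-phase resource requirements (e.g., CPUs) and server capacity.
REQ_LOW, REQ_HIGH = 1.0, 4.0
CAPACITY = 16.0

def cost_rate(config):
    """config = (n_low, n_high): numbers of jobs in the low and high phases.
    Cost accrues at a rate proportional to the capacity overrun (zero if within capacity)."""
    n_low, n_high = config
    total_requirement = n_low * REQ_LOW + n_high * REQ_HIGH
    return max(0.0, total_requirement - CAPACITY)

print(cost_rate((4, 2)))   # 4*1 + 2*4 = 12 <= 16, so the cost rate is 0.0
print(cost_rate((4, 4)))   # 4*1 + 4*4 = 20 > 16, so the cost rate is 4.0
```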
Now our bin-packing problem can be formulated as follows:
(1)    minimize over dispatch policies:    the expected number of active servers in steady state
       subject to:    the average expected cost rate per server is within the budget
where is a budget for the cost rate of resource contention. We are interested in solving this problem in the asymptotic regime where the arrival rates scale to infinity (Stolyar and Zhong, 2013, 2015; Stolyar, 2017; Stolyar and Zhong, 2021), motivated by the ever-increasing computing demand that drives today’s computing systems to be large-scale. Specifically, we assume for some fixed coefficients and and a scaling factor , and we study the asymptotic regime where increases.
1.3. Main Result
Our main result is an asymptotically optimal policy, named Join-Requesting-Server (JRS), for this new setting of SBP in service systems with time-varying job resource requirements. The asymptotic optimality is in the sense that under our proposed policy JRS, the expected number of active servers is at most times the optimal objective value of the optimization problem in (1), while the cost rate incurred is at most (i.e., exceeding the budget by at most a diminishing fraction). This asymptotic optimality result translates into an additive optimality gap of in the objective value (expected number of active servers), since the optimal objective value can be shown to be . This main result is formally presented in Theorem 1.
Our model can be specialized to the traditional setting of SBP in service systems where jobs’ resource requirements remain fixed over time. For this specialization, we replace the constraint in the problem formulation (1) with a capacity constraint, which requires the total resource requirement by jobs on a server to be within the server’s resource capacity. Our proposed policy JRS can then be adapted into one that has an optimality gap in the objective value, which improves upon the state-of-the-art optimality gap. A discussion on the implementation complexity of JRS and how it compares with existing policies for the traditional setting of SBP in service systems is provided in Section 4.3. To be clear, this setting is not a strict special case of the formulation in (1) because is required there, but our approach and proof carry over.
From a technical approach perspective, our contribution is a novel approach that decomposes the policy design into two steps: defining a single-server sub-problem, and then converting the solution of the sub-problem into a policy in the original problem. This decomposition not only reduces the complexity of policy design but also makes the analysis tractable. We provide an overview of this approach in Section 1.4.
1.4. Approach Overview
To motivate our approach, we ask two questions:
How should we design a good dispatch policy for this system?
How can we prove that a dispatch policy is asymptotically optimal?
Before presenting our answers to these two questions, we first comment on why they are challenging to answer. On the one hand, solving this problem directly via dynamic programming is intractable due to the unbounded state space resulting from the infinitely many servers. Even if we restrict ourselves to the servers that are active, the state space is still prohibitively large. On the other hand, we can consider designing a heuristic policy. However, unlike traditional SBP problems where we can simply seek to pack the servers as compactly as possible, here the question of how many jobs should be put on a server is complicated. The complexity comes from the time-varying resource requirements of jobs, which affect future resource contention.
A policy-conversion framework.
We answer the two questions at the same time with a novel policy-conversion framework. The framework has two steps:
(1) Define the single-server problem, an easy-to-solve, low-dimensional subproblem;
(2) Convert the optimal policy of the single-server problem into a policy for the original problem.
This framework allows us to break down the complicated policy design problem into two components. In defining the single-server problem in Step 1, our goal is to quantify the throughput of each server under the resource contention constraint; in the policy conversion in Step 2, our goal is to dispatch jobs optimally based on each single server’s characteristics. As we will show, a careful construction of the single-server problem and the conversion procedure naturally leads to a policy for the original system and a proof of its asymptotic optimality. Below we give a quick overview of how we carry out these two steps and the motivation for the design choices.
Single-server problem.
To define the single-server problem, consider the following setting: suppose that our goal is to maximize the throughput of one specific server while keeping its expected cost rate of resource contention below ; how, then, should we send jobs to this server? Observe that even though we want to send jobs to the server as frequently as possible, the frequency is fundamentally limited by how fast the server is able to serve jobs and how many jobs can be packed on the server. This motivates us to consider the single-server system illustrated in Figure 2. The system has one server and an infinite supply of jobs of all types, so the server can start the service of any number of new jobs of any type at any time. We say the server requests a job from the infinite supply whenever it starts serving a new job. We assume the same job model and cost model as in the original infinite-server system. The single-server problem aims to find a job-requesting policy that maximizes the throughput (the number of jobs of each type served per unit time) along the direction of the arrival rate vector , while maintaining the steady-state expected cost rate of resource contention below .

How is the single-server problem related to the original problem?
Let be the number such that the total throughput of single-server systems under the optimal job-requesting policy is equal to (assuming is an integer for simplicity). Consider the following policy in the original system: let each of the first servers in the original system adopt the optimal job-requesting policy and send requests to the dispatcher based on its current configuration. If the requested jobs were to arrive as soon as the dispatcher received the requests, the dispatcher would be able to fulfill the requests immediately. In this case, the first servers in the original system would have the same dynamics as i.i.d. single-server systems, achieving the largest possible throughput and satisfying the constraint on resource contention. So the original system would have achieved the optimal number of active servers.
However, in the actual model, the dispatcher cannot immediately fulfill a job request because jobs arrive stochastically over time. Nevertheless, the dispatcher can still find a suitable way to match each job arrival with the requests. To see this, note that the time points when the dispatcher receives type requests result from the superposition of independent point processes, each with the average rate . As , the instantaneous rates of requesting type jobs concentrate around the arrival rate for each . As a result, most job requests can be fulfilled within a diminishing delay, so most servers in the original system can closely track the optimal single-server dynamics.
A meta-policy, Join-Requesting-Server (JRS), and its asymptotic optimality.
Based on the single-server problem and the idea of tracking the optimal single-server dynamics, we propose a meta-policy, Join-Requesting-Server (JRS), which converts a single-server policy to a dispatch policy in the original infinite-server system. We say that JRS takes as a subroutine. The full definition of JRS is given in Section 4, along with discussions on various practical considerations in its implementation.
We show that the asymptotic performance of JRS is related to its subroutine in the sense described in Theorem 3, which we refer to as the conversion theorem. In particular, JRS with the optimal single-server policy (which we refer to as Single-OPT) as the subroutine is asymptotically optimal for the original infinite-server problem as .
In order to track the optimal single-server dynamics, JRS uses a more sophisticated mechanism to control the long-term consequences of missing job requests or fulfilling requests with delays. The mechanism involves the auxiliary variables of tokens and virtual jobs, which regularize the process of generating requests and matching arrivals with requests. These auxiliary variables play a crucial role in the proof of Theorem 3 in Section 5, where a novel Stein’s method argument is carried out. We discuss the role of tokens and virtual jobs and their necessity at the end of Section 4 and in Section 5.5.
Finally, we comment that our policy conversion framework is potentially applicable to other systems with similar structures. Specifically, we can try to define a suitable single-server problem, solve for its optimal policy, and convert the optimal single-server policy into a policy for the original problem. A similar conversion theorem should hold as long as the servers are weakly coupled in some sense. See Section 6 for a discussion of such systems.
Relation to the mean-field approach.
We remark that the mean-field approach often studies the empirical distribution of configurations on all servers (Stolyar, 2013; Stolyar and Zhong, 2013, 2015; Stolyar, 2017; Stolyar and Zhong, 2021; Ghaderi et al., 2014), which can be viewed as a probability distribution of a single server’s configuration. However, the mean-field approach is typically used to analyze this empirical distribution under a given policy for the original system. In contrast, our approach solves the single-server problem to design a single-server policy, and then converts it to a policy in the original system with a performance guarantee.
1.5. Paper organization
In Section 2, we present the general problem formulation, which generalizes the simplified version in this section. In Section 3, we give a more detailed overview of our main result and approach, with a short proof of Theorem 1 (main result) based on Theorems 2–4 at the end of the section. Section 4 provides a detailed description of our meta-policy, Join-Requesting-Server (JRS), along with discussions on practical considerations in its implementation. In Section 5, we prove the performance guarantee of JRS (Theorem 3) under an irreducibility assumption. The proof for the general case and other proofs are deferred to the appendices. We conclude the paper and discuss some future directions in Section 6.
2. Problem Formulation
Job Model.
As described in Section 1, we consider a job model where each job in service can be in one of multiple phases, each phase associated with a different resource requirement. Here the resource requirement can be a multi-dimensional vector, with each coordinate specifying the requirement of one type of resource. To model the temporal variation in the resource requirement, we assume that each job transitions between phases while in service until it is completed. The phase transition process is described by a continuous-time Markov chain on the state space , where is the set of phases and is the absorbing state that denotes the completion of the job. We call a transition between two phases in an internal transition, and let denote the transition rate from phase to phase ; the departure of a job then corresponds to a transition from a phase to , whose transition rate is denoted as . The phase transitions of different jobs are assumed to be independent of each other.
We classify a job as a type job if it starts from phase when entering service. Jobs of each type arrive at the system according to an independent Poisson process with rate .
Server Model.
We consider an infinite-server system with identical servers. As soon as a job arrives at the system, the job needs to be dispatched to a server to start service immediately. Note that this is always feasible because there are infinitely many servers in the system. We assume that jobs cannot be preempted or migrated. To describe the state of a server, we define the configuration of a server as an -dimensional vector , whose -th entry is the number of jobs in phase on the server. Each server has a limit on the total number of jobs in service at the same time. This limit is denoted as and referred to as the service limit. Then the set of feasible server configurations is .
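For intuition about the size of the configuration space, here is a minimal sketch (with parameter names of our own choosing) that enumerates the feasible configurations for a given number of phases and a given service limit.

```python
from itertools import product

def feasible_configurations(num_phases, service_limit):
    """All vectors k with nonnegative integer entries, one per phase,
    whose entries sum to at most the service limit."""
    return [k for k in product(range(service_limit + 1), repeat=num_phases)
            if sum(k) <= service_limit]

# With 2 phases and a service limit of 3 there are 10 feasible configurations,
# including the empty configuration (0, 0).
print(len(feasible_configurations(2, 3)))   # 10
```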
System Dynamics.
The system state can be represented by the concatenation of the configurations of all servers. Specifically, we index the servers by positive integer numbers and denote the configuration of server at time as . Then the state of the entire system can be represented by the infinite vector .
Suppose that the system is in state . Let be an -dimensional vector whose -th entry is and all other entries are . Then the following state transitions can happen:
• , : a type job arrives and is dispatched to server ;
• , : a job on server transitions from phase to phase ;
• , : a job on server departs the system from phase .
The specifics of the system dynamics depend on the employed dispatch policy that decides which server to dispatch to when a job arrives.
Active Servers.
We are interested in the number of active servers, i.e., servers currently serving a positive number of jobs. Note that given the arrival rates of jobs, the smaller the number of active servers, the better the system is utilized. Let be the number of servers in configuration at time . Then the number of active servers can be written as , where is the zero vector.
Cost of Resource Contention.
Recall that the cost rate function maps a server’s configuration to a rate of cost. We assume that is any function that is -Lipschitz continuous with respect to the distance for some constant and satisfies .
Performance Goal.
Our high-level goal is to design dispatch policies that minimize the number of active servers while keeping the cost rate of resource contention within a certain budget. Specifically, we consider policies that are allowed to be randomized and non-Markovian (i.e., the policies can make history-dependent decisions). We further focus on policies that induce a unique stationary distribution on the configuration process , assuming that the configuration process is embedded in a Markov chain that has a unique stationary distribution. We are interested in such policies because the resulting time averages of quantities related to the configurations are equal to the corresponding expectations under the unique stationary distribution regardless of the initial state. Let be a policy of interest, be a random element that follows the stationary distribution of the system state induced by , and be the corresponding number of servers in configuration in steady state under . Then the expected number of active servers is given by
We define the expected cost rate per expected active server as
Note that if , we have . Now our goal can be formulated as the following optimization problem, referred to as problem :
(2)    minimize over dispatch policies:    the expected number of active servers in steady state
       subject to:    the expected cost rate per expected active server is within the budget
where is a budget for the cost rate of resource contention.
Asymptotic Optimality.
We focus on the asymptotic regime where for all , the arrival rate is given by for some constant coefficient and a positive scaling factor . To define asymptotic optimality, we first define the following notion of approximation to the optimization problem in (2): a policy is said to be -optimal if and , where is the optimal objective value in (2). Now consider a family of policies indexed by the scaling factor . We say that the policy is asymptotically optimal if it is -optimal to the optimization problem with as . We will suppress the superscript for simplicity when it is clear from the context.
We note that under any policy , . This can be proven using the renowned Little’s Law (Kleinrock, 1975) in the following way. The total job arrival rate is and the expected time that a job spends in the system is . So by Little’s Law, the expected number of jobs in the system in steady state is . Since each server can accommodate a constant number of jobs, the expected number of active servers is . Given this, -optimality implies an optimality gap of in the objective value.
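Written out in generic notation (we use $r$ for the scaling factor and $\kappa$ for the per-server service limit purely for illustration; these symbols are ours), the Little's Law argument reads:
\[
\mathbb{E}[\text{number of jobs in system}] \;=\; (\text{total arrival rate}) \times \mathbb{E}[\text{time a job spends in system}] \;=\; \Theta(r)\cdot\Theta(1) \;=\; \Theta(r),
\]
\[
\mathbb{E}[\text{number of active servers}] \;\ge\; \tfrac{1}{\kappa}\,\mathbb{E}[\text{number of jobs in system}] \;=\; \Omega(r).
\]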
3. Main Result and Our Approach
3.1. Main Result
Our main result, Theorem 1, is the asymptotic optimality of our proposed policy Join-Requesting-Server (JRS), with a subroutine we call Single-OPT, as briefly discussed in Section 1. This asymptotic optimality result implies an optimality gap in the expected number of active servers. We defer the detailed descriptions of JRS and Single-OPT to Section 4 and Appendix C. Theorem 1 follows immediately from Theorems 2–4 to be introduced in Section 3.2; a short proof is included at the end of Section 3.2 for clarity.
Theorem 1 (Asymptotic Optimality).
Consider a stochastic bin-packing problem in service systems with time-varying job resource requirements. Let the arrival rates be and the cost rate budget be . Then the policy Join-Requesting-Server (JRS) with the subroutine Single-OPT is -optimal. That is, the expected number of active servers under JRS with Single-OPT is at most times the optimal value of the problem , while the cost rate incurred is at most .
Specialization to Non-Time-Varying Resource Requirements.
As mentioned in Section 1, we can specialize this result to the setting where the resource requirement of a job does not vary over time. To do that, we remove the cost constraint in , and redefine the set of feasible server configurations, , to also incorporate hard capacity constraints for each type of resources. The rest of the analysis is almost identical to that of the analysis for time-varying resource requirements; we omit the details due to the space limit. This specialization results in a policy that is -optimal in the expected number of active servers.
3.2. Our Approach
In a nutshell, our approach is to reduce the original optimization problem in an infinite-server system to an optimization problem in a single-server system, which is defined below.
A Single-Server System.
Consider a single-server system serving jobs with time-varying resource requirements. The system has an infinite supply of jobs of all types. As a result, the server can request any number of new jobs of any type at any time. Once a job is requested, it immediately enters service.
We represent the server configuration at time using a vector , whose -th entry denotes the number of jobs in phase . We assume that the single-server system has the same service limit and cost rate function as a server in the original infinite-server system. Therefore, the server configuration is also in the set , and the cost rate at time is .
A single-server policy determines when and how many jobs of each type to request. We allow the single-server policy to be randomized and assume it is Markovian, i.e., it makes decisions only based on the current configuration. Note that allowing non-Markovian policies will not change the optimal value of the single-server problem that we will consider (see Appendix C). Let be a stationary distribution of the server configuration under the policy , and let be a random variable with the distribution . When we consider a policy and its stationary distribution , we assume that the system is initialized from . The policy together with defines the request rate of type jobs , which is the expected number of type jobs requested per unit time in steady state. Note that is the throughput of type jobs since the system has a finite state space.
We consider the following single-server problem, denoted as :
(3)    minimize over single-server policies and the number of copies:    the number of copies of the single-server system
       subject to:    the number of copies times the request rate of each job type matches the corresponding arrival rate;
                      the steady-state expected cost rate of resource contention is within the budget
The single-server problem can be interpreted as follows. We can think of as the number of copies of the single-server system under needed to support the arrival rates in the infinite-server system. Minimizing it is then equivalent to maximizing the throughput of each single-server system, while maintaining their proportions as .
We remark that for the problem , we only need to consider policies that do not depend on the scaling factor . To see this, we can replace the decision variable with and the optimization problem can be equivalently formulated as follows, which does not involve :
(4)    maximize over single-server policies:    the throughput along the direction of the arrival rate vector
       subject to:    the steady-state expected cost rate of resource contention is within the budget
Lower Bound.
The single-server problem gives a lower bound to the original problem in (2) as stated in the following theorem. The proof is given in Appendix A.
Theorem 2 (Lower Bound).
Consider a stochastic bin-packing problem in service systems with time-varying job resource requirements. Let the arrival rates be and the cost rate budget be . Let be the optimal value of the original infinite-server problem in (2), and let be the optimal value of the single-server problem , then .
Converting From the Single-Server System to the Infinite-Server System.
Having established a lower bound on the infinite-server problem in terms of the optimal value of the single-server problem , next we focus on finding an asymptotically optimal policy. We will characterize the performance guarantee of a class of policies and then show that the best policy within the class is asymptotically optimal. Specifically, we consider a meta-policy called Join-Requesting-Server (JRS), which converts a Markovian single-server policy into an infinite-server policy. We call the policy resulting from the conversion a JRS policy with a subroutine . Through analyzing the meta-policy JRS, we show that the performance of each JRS policy can be characterized by the performance of its subroutine, as stated in Theorem 3 below. The proof of Theorem 3 under an irreducibility assumption is given in Section 5, and the proof for the full version is given in Appendix B.3.
Theorem 3 (Conversion Theorem).
Consider a stochastic bin-packing problem in service systems with time-varying job resource requirements. Let the arrival rates be and the cost rate budget be . Let be a feasible solution to the single-server problem . In addition, we assume that the policy is Markovian. Let the infinite-server policy be JRS with a subroutine . Then under , we have
(5)
(6)
As a result,
(7)
(8)
Optimal Single-Server Policy.
Theorem 3 together with the lower bound in Theorem 2 reduces the infinite-server problem in (2) to the single-server problem in (3). We can obtain the optimal single-server policy, Single-OPT, by solving a linear program, as stated in the theorem below.
Theorem 4 (Optimality of Single-OPT, Informal).
There exists a linear program that is equivalent to the single-server problem . In particular, we can construct an optimal Markovian policy for from the optimal solution of .
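The paper's LP is given as (61) in Appendix C.1. As a rough sketch of what such a reduction typically looks like, a uniformized, discrete-time analogue of the single-server problem can be written as an occupation-measure linear program over configuration–action pairs. The notation below ($y$, $P$, $c$, $n_i$, $\rho_i$, $\beta$, $t$) is ours, and the details may well differ from the paper's construction:
\[
\begin{aligned}
\max_{y \ge 0,\; t} \quad & t \\
\text{s.t.} \quad & \sum_{a} y(k', a) \;=\; \sum_{k, a} y(k, a)\, P(k' \mid k, a) \quad \forall\, k' && \text{(stationarity of the occupation measure)}\\
& \sum_{k, a} y(k, a) \;=\; 1 && \text{(normalization)}\\
& \sum_{k, a} y(k, a)\, c(k) \;\le\; \beta && \text{(cost-rate budget)}\\
& \sum_{k, a} y(k, a)\, n_i(a) \;=\; t\, \rho_i \quad \forall\, i && \text{(throughput along the arrival-rate direction)}
\end{aligned}
\]
Here $y(k,a)$ is the long-run fraction of time spent in configuration $k$ while taking requesting action $a$, $n_i(a)$ is the number of type-$i$ jobs that action $a$ requests, and a randomized Markovian policy can be read off as $\Pr(a \mid k) = y(k,a) / \sum_{a'} y(k,a')$ wherever the denominator is positive.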
4. Proposed meta-policy: Join-Requesting-Server (JRS)
In this section, we describe our meta-policy, Join-Requesting-Server (JRS), in full detail. For ease of presentation, we focus on the case where the subroutine policy for JRS is -irreducible, i.e., under , there exists a configuration such that the single-server system can return to from any other configuration (which is equivalent to assuming that the configuration process of the single-server system under policy forms a unichain). The algorithm for the general case is given in Section B.3.
4.1. How the Single-Server Policy Requests Jobs
Before going into the definition of JRS, we first take a closer look at how the Markovian single-server policy requests jobs, to avoid potential ambiguity caused by the fact that a single-server policy can request jobs at any time. Let denote the number of type jobs requested, and let . We say is feasible if the total number of jobs on the server does not exceed after adding the jobs. The policy performs one of the following two types of requests based on the current configuration.
• Reactive requests. A reactive request is triggered by either an internal transition or a departure. The change in the configuration when a reactive request is made can be represented by the diagram
where is due to the internal transition or departure, and happens since the policy immediately requests jobs. The policy specifies a probability distribution over all feasible when it decides to perform reactive requests for the configuration .
• Proactive requests. A proactive request is not triggered by any transition of the jobs in service; it happens on its own at a finite rate that depends on the current configuration of the server. The change of the configuration when a proactive request happens can be represented by the diagram
More specifically, suppose the policy decides to perform proactive requests for a configuration . Then for each feasible , the policy specifies a rate and runs a timer with an exponentially distributed duration with the specified rate. When a timer expires, the corresponding jobs are requested. When the configuration changes, all the timers are canceled and restarted with new rates based on the new configuration. (A small sketch of this timer mechanism is given right after this list.)
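The sketch below illustrates the timer mechanism for proactive requests. It relies on the standard equivalence between running one exponential timer per feasible request vector and waiting an exponential time with the total rate and then picking a request vector with probability proportional to its rate; the data structures and the rate values in the example are our own illustrative choices.

```python
import random

def next_proactive_request(proactive_rates, rng):
    """proactive_rates: dict mapping a feasible request vector (a tuple) to its rate
    in the current configuration. Returns the request vector that fires first and the
    waiting time until it fires. If the configuration changes before that time has
    elapsed, the result is discarded and this function is called again with new rates."""
    total = sum(proactive_rates.values())
    if total == 0.0:
        return None, float("inf")          # no proactive requests from this configuration
    wait = rng.expovariate(total)          # time until the earliest timer expires
    u, acc = rng.random() * total, 0.0     # pick a request vector proportionally to its rate
    for request, rate in proactive_rates.items():
        acc += rate
        if u <= acc:
            return request, wait
    return request, wait                    # guard against floating-point round-off

rng = random.Random(1)
# e.g., request one type-1 job at rate 0.3 or one type-2 job at rate 0.1 (made-up numbers)
print(next_proactive_request({(1, 0): 0.3, (0, 1): 0.1}, rng))
```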
4.2. Description of Join-Requesting-Server (JRS)
The inputs of JRS include: (i) the single-server policy , (ii) the objective value of in the single-server problem (3), denoted as , and (iii) the transition rates in the job model.
We first divide the infinite server pool into two sets based on the server index . Let . We call servers with index normal servers; we call servers with index backup servers. The normal servers are responsible for serving most of the jobs, while the backup servers are activated only to handle overflow jobs (jobs that are not dispatched to normal servers).
The JRS policy is specified in the following two steps; a schematic sketch of the resulting dispatcher logic is given right after the description.
• Step 1 (Job Requesting on a Normal Server): We let each normal server request jobs using its subroutine, the single-server policy . The input to the policy is what we refer to as the observed configuration of the server, which will be further explained below. When requests jobs, type tokens are generated for each to store the job requests. The server pauses the job requesting process if it already has any type of tokens, and resumes when all the tokens that it generated are removed.
• Step 2 (Arrival Dispatching):
  – Real jobs. When a type job arrives, the dispatcher chooses a type token uniformly at random, removes the token, and assigns the job to the corresponding server. When there are no type tokens, the dispatcher sends the job to an idle backup server.
  – Virtual jobs. When the total number of type tokens throughout the system exceeds the limit (called the token limit), a type virtual arrival is triggered, which causes the dispatcher to choose a type token uniformly at random, remove the token, and assign a virtual job to the corresponding server. A virtual job has the same transition dynamics as a real job but does not consume physical resources.
  – The observed configuration of a normal server in Step 1 is the configuration resulting from real and virtual jobs combined. That is, it is a vector whose -th entry represents the total number of real and virtual jobs in phase on this server. The observed configuration changes when a new real or virtual job arrival is assigned to the server, or when a real or virtual job on the server has a phase transition or departs. We update the input to the policy whenever the observed configuration changes. Whenever the observed configuration changes, the policy cancels the exponential timers in progress; but a reactive request from the policy can only be triggered when a real or virtual job on the server has a phase transition or departs.
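The sketch below illustrates the dispatcher-side token bookkeeping described in Steps 1 and 2. It is schematic rather than the paper's implementation: the class and method names are ours, the token limit is passed in as a design parameter, and the server-side pieces (maintaining observed configurations, pausing and resuming job requesting, and simulating virtual jobs) are not shown.

```python
import random
from collections import defaultdict

class JRSDispatcher:
    """Schematic token bookkeeping for JRS (names and structure are ours)."""

    def __init__(self, token_limit, idle_backup_servers, rng=None):
        self.tokens = defaultdict(list)               # job type -> server ids holding a token
        self.token_limit = token_limit                # cap on outstanding tokens per type
        self.idle_backup_servers = list(idle_backup_servers)
        self.rng = rng or random.Random(0)

    def on_token_generated(self, server_id, job_type):
        """A normal server's subroutine has requested a job of the given type."""
        self.tokens[job_type].append(server_id)
        if len(self.tokens[job_type]) > self.token_limit:
            # Too many outstanding requests: trigger a virtual arrival, i.e.,
            # convert a uniformly random token of this type into a virtual job.
            return ("assign_virtual_job", self._pop_random_token(job_type))
        return None

    def on_job_arrival(self, job_type):
        """A real job of the given type has arrived and must be dispatched."""
        if self.tokens[job_type]:
            return ("dispatch_to_normal", self._pop_random_token(job_type))
        # No outstanding request of this type: send the job to an idle backup server.
        return ("dispatch_to_backup", self.idle_backup_servers.pop())

    def _pop_random_token(self, job_type):
        idx = self.rng.randrange(len(self.tokens[job_type]))
        return self.tokens[job_type].pop(idx)
```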
Intuition behind JRS
To provide a better understanding of the main design ideas of JRS, here we give an intuitive description of how it works. Broadly, servers generate job requests and store unfulfilled requests as tokens; the dispatcher assigns jobs to servers according to the tokens to fulfill job requests. This is the mechanism for matching job arrivals with requests, which is referenced at the end of Section 1.4. However, rather than matching all tokens with job arrivals, JRS opts to convert some of the tokens into virtual jobs to keep the total number of tokens within an upper limit . By capping the number of tokens, JRS ensures that the job requests generated by each server get fulfilled quickly (either by a real job or a virtual job), and thus the observed configurations of servers maintain proximity to i.i.d. copies of the single-server systems.
The choice of the token limit balances two key considerations. On the one hand, a smaller brings the observed configurations closer to i.i.d. copies of single-server systems. On the other hand, if is overly small, the rate of generating virtual jobs becomes high and the probability for a job arrival to see no tokens is also high. As a result, the observed configurations, which include both real and virtual jobs, deviate from the real-job configurations. A more in-depth discussion on the role of tokens and virtual jobs and whether they are fundamental is in Section 5.5.
4.3. Practical considerations in implementing Join-Requesting-Server (JRS)
Computational complexity of JRS
The computational complexity of JRS consists of two components: the offline component that computes a single-server policy and its objective value , and the online component that carries out the two steps of JRS.
The offline component reduces to solving the linear program given in (61) in Appendix C.1, whose number of optimization variables is linear in the number of feasible configurations on a single server times the number of phases, i.e., . Admittedly, can be large when a single server has a large quantity of resources and there are many job phases. However, we opt for the view that a single server is not excessively large and that the system’s scale is primarily captured by the scaling factor . Therefore, it is advantageous that the computational complexity of this offline component is independent of .
In the online component, the bulk of the computation is in job requesting and virtual job simulation, which can be executed in a distributed manner on the normal servers. Specifically, each normal server monitors its observed configuration and generates tokens according to the single-server policy ; additionally, when a virtual job is assigned to the server, the server simulates the dynamics of the virtual job, i.e., generates random variables corresponding to phase transitions and job departures. Backup servers do not need to perform any computation beyond serving jobs.
The scheduler, which stores all the tokens, has two responsibilities in the online component: (i) the scheduler matches each newly arrived job to a token of the same type, chosen uniformly at random, or sends the job to a backup server when there are no tokens of the same type; (ii) the scheduler monitors the number of tokens of each type and assigns virtual jobs when the number of tokens exceeds the limit .
It is informative to compare the computational complexity of JRS with existing algorithms designed for the traditional setting of stochastic bin-packing in service systems, where the resource requirements are non-time-varying (Stolyar, 2013; Stolyar and Zhong, 2013; Ghaderi et al., 2014; Stolyar and Zhong, 2015; Stolyar, 2017; Stolyar and Zhong, 2021). At a high level, these existing algorithms function as follows: upon the arrival of a job, the scheduler checks the current configurations of all servers and assigns the job to a server whose configuration optimizes certain predefined criteria. Among these, the GRAND algorithm (Stolyar and Zhong, 2015; Stolyar, 2017; Stolyar and Zhong, 2021) stands out for its simplicity and asymptotic optimality. Under GRAND, the scheduler only needs to identify configurations that can accommodate the incoming job and then randomly assigns the job to one of these feasible servers, along with some idle servers. Compared to JRS, GRAND does not have an offline planning component, and individual servers do not perform computation beyond serving jobs. The scheduler’s role in GRAND is slightly more complex than in JRS. Consequently, when considering using JRS in settings where job resource requirements are non-time-varying, practitioners should weigh whether the additional computational complexity is warranted.
Model parameter estimation.
A limitation of JRS is its dependency on known model parameters, including job arrival rates and phase transition rates. Such dependency is not present in existing algorithms designed for the setting with non-time-varying resource requirements. In real-world applications, the model parameters can be estimated from workload traces such as (Tirmazi et al., 2020; Wilkes, 2019). Estimation errors can impact the system’s performance, an issue that merits further in-depth investigation in future work. Here, we provide a preliminary result on the performance degradation due to parameter estimation errors. Roughly speaking, suppose that the estimation errors in the job arrival rate coefficients ’s and the phase transition rates ’s and ’s are bounded by (along with an insensitivity assumption on the single-server problem). Then if we use JRS where the single-server policy is obtained by solving for the optimal single-server policy under the estimated parameters, the resulting JRS is -optimal for any , where and are positive constants independent of . The exact statement is given in Proposition 1 in Appendix D, along with a proof.
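For concreteness, here is a minimal sketch of how phase transition rates could be estimated from a workload trace using the standard maximum-likelihood estimator for a continuous-time Markov chain (the number of observed transitions divided by the total time spent in a phase). The trace format below is an assumption made for illustration, not the format of the cited traces.

```python
from collections import defaultdict

def estimate_transition_rates(trace):
    """trace: a list of (duration_in_phase, phase, next_state) records, where
    next_state is another phase or "done". Returns MLE rate estimates."""
    time_in_phase = defaultdict(float)
    transition_counts = defaultdict(int)
    for duration, phase, next_state in trace:
        time_in_phase[phase] += duration
        transition_counts[(phase, next_state)] += 1
    return {
        (phase, nxt): count / time_in_phase[phase]
        for (phase, nxt), count in transition_counts.items()
    }

# Toy trace: 4.0 total time units spent in phase "L", with one L->H transition
# and one departure, so both estimated rates are 0.25.
print(estimate_transition_rates([(1.5, "L", "H"), (2.5, "L", "done")]))
```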
Connection to practical algorithms.
Recent progress has been made in addressing the issue of low utilization due to time-varying job resource requirements, notably within Google’s datacenters, as discussed in (Bashir et al., 2021). The approach in (Bashir et al., 2021) makes predictions on the future resource requirements of jobs, which lead to a further prediction on the future peak resource requirement on a server if a newly arrived job were to be sent to that server (assuming no future job arrivals). This prediction categorizes each server as either feasible or infeasible for the new job, and this binary outcome is subsequently used by a separate scheduler for job assignment.
Our proposed JRS policy can be viewed as giving more detailed predictions on whether it is suitable for a server to take on new jobs, represented by the tokens. The predictions are optimized by taking into account future job arrivals and the stochastic dynamics of jobs.
5. Proof of Theorem 3 (Conversion Theorem) Assuming Irreducibility
In this section, we prove Theorem 3 to establish the performance guarantee of JRS. For ease of presentation, we focus on the case where the subroutine policy is -irreducible. The proof for the general case is in Section B.3.
This section is organized as follows. We first provide some preliminaries in Section 5.1. Then we outline the steps and lemmas needed for the proof in Section 5.2. In Section 5.3, we prove Theorem 3 based on the lemmas. In Section 5.4, we prove one of the lemmas, Lemma 2, where we devise a novel approach to employ Stein’s method. Finally, in Section 5.5, we discuss the role of tokens and virtual jobs and their necessity from a proof perspective. The proofs of the rest of the lemmas presented in this section are given in Appendix B.
5.1. Preliminaries
Consider an infinite-server system under the JRS policy. For each normal server , we describe its status at time using the following variables: configuration of real jobs (referred to simply as configuration in previous sections), tokens , configuration of virtual jobs , and observed configuration . We use the superscript “” to refer to a certain descriptor of all normal servers, for example, . The system under JRS is a Markov chain with a unique Markovian representation . The following lemma shows that the system has a unique stationary distribution (the proof is provided in Section B.1).
Lemma 1 (Unique Stationary Distribution).
Consider an infinite-server system under the JRS policy with as its subroutine, where is a single-server policy that is Markovian and -irreducible. Then the state of the system has a unique stationary distribution.
Let be the configuration vector of i.i.d. copies of the single-server system under . As discussed in Section 4.2, we will show that can be approximated by . In the remainder of this section, we omit the steady-state symbol for simplicity.
To rigorously discuss the approximation of the steady-state random variables, we define some metrics. Recall that is the set of feasible single-server configurations. Let be the set of feasible configurations for all normal servers. We use to denote the norm in both space and space :
For any two random variables , their closeness will be measured in terms of Wasserstein distance as follows:
where the supremum is taken over all Lipschitz-1 functions from to .
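In generic symbols of our own, this is the Kantorovich–Rubinstein dual form of the Wasserstein-1 distance:
\[
W(X, Y) \;=\; \sup_{h\,:\,\|h\|_{\mathrm{Lip}} \le 1} \big|\, \mathbb{E}[h(X)] - \mathbb{E}[h(Y)] \,\big|,
\]
where the Lipschitz constant is measured with respect to the norm defined above.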
5.2. Steps and Lemmas Needed for the Proof of Theorem 3 Assuming Irreducibility
Our goal is to show that the steady-state distribution of the normal servers’ real-job configurations is close to the steady-state distribution of i.i.d. copies of the single-server systems in Wasserstein distance, and that the backup servers are almost empty as the arrival rate gets large. More formally, we aim to show that and as . These two bounds provide the performance guarantee claimed in Theorem 3.
Instead of directly looking into the distribution of real-job configuration , we show that the distribution of each of the three sums, , , and , can be approximated by the distribution of in Wasserstein distance. The approximation result for each sum helps us derive the approximation result for the sum with one fewer term. The result that the backup servers are almost empty also follows from these approximations. This sequence of approximations is illustrated in the figure below, where recall that is the observed configuration.

A crucial observation that leads to this stepwise proof is that the process forms a Markov chain on its own. This is because real jobs and virtual jobs have the same transition dynamics and are indistinguishable by the subroutine when requesting jobs. Moreover, by the construction of JRS, the Markov chain governs the dynamics of the virtual-job configurations and the configurations on backup servers.
Our proof consists of two steps. In Step 1, we focus on the process . We show that , which immediately implies because we have limited the total number of tokens to . In Step 2, we use the approximation result for in Step 1 to show that the total number of virtual jobs, , and the total number of jobs on backup servers are both . Recall that , so we get .
Next, we state the specific lemmas.
Step 1.
Lemma 2 below bounds the Wasserstein distance between and .
Lemma 2.
Under the conditions of Theorem 3 and being -irreducible, we have
The key challenge for proving Lemma 2 is that the job dispatching decisions are based on the configurations of all normal servers, which creates dependencies among the transitions of different servers. The key idea that helps us overcome this challenge is to consider the sum , which remains unchanged under job arrivals regardless of dispatching decisions. Observe that has decoupled dynamics across servers because it is only changed by internal phase transitions, departures, and requests of new jobs, which happen independently on each server. This helps us prove , which implies Lemma 2, as argued earlier in the section.
Formally, the proof of Lemma 2 makes use of Stein’s method (see, e.g., (Braverman and Dai, 2017; Braverman et al., 2017; Braverman, 2022)) to compare with . Stein’s method usually consists of three steps: generator comparison, Stein factor bound, and moment bound. In our case, due to the finiteness of the state space , we only need to do the generator comparison and the Stein factor bound. In the generator comparison step, we show that the instantaneous transition rates of match those of ; in the Stein factor bound step, we show that a small difference in the transition rates of and does not cause much increase in the overall distance between the distributions. The detailed proof is in Section 5.4.
Step 2.
We establish Lemma 3 and Lemma 4 below, which bound the steady-state expected number of virtual jobs and the jobs on backup servers.
Lemma 3.
Under the conditions of Theorem 3 and being -irreducible, for each , the steady-state expected number of virtual jobs of type is of the order , i.e.,
Lemma 4.
Under the conditions of Theorem 3 and being -irreducible, for each , the steady-state expected number of type jobs on backup servers is of the order , i.e.,
The key idea for proving Lemma 3 and Lemma 4 is that by the characterization of in Lemma 2 and the fact that the job requests are made based on , we can show that the rate of requesting jobs is approximately equal to the arrival rate for each job type. Therefore, the number of tokens rarely reaches or . This implies the rarity of virtual jobs and jobs on backup servers. The proofs are provided in Section B.2.
5.3. Proof of Theorem 3 Assuming Irreducibility Based on Lemmas 1–4.
Proof.
First we show that Lemmas 2 and 3 imply the closeness between and . By Lemma 2, for any , we have . By Lemma 3, . Recall that , so . Therefore,
(9)
Now we prove , the bound on the expected number of the active servers, by taking a suitable in (9). Observe that
(10)
where the last term on the RHS is by Lemma 4. To show that the difference between the first two terms on the RHS of (10) is also , consider . Because
for any , we have . By (9), . Therefore, . Recall that , so we get .
Similarly, to prove (6), we observe that
(11)
The last term of (11) can be bounded as , which is by Lemma 4 and the fact that is a finite set. To show that the difference between the first two terms on the RHS of (11) is also , consider , where is the Lipschitz constant of . Because
for any , we have . By (9), . Therefore, . Recall that , so we get .
To show (7) and (8), noting that , we have
where in the last inequality we have used the fact that . This completes the proof.
∎
5.4. More Details on the System and Proof of Lemma 2
To bound the distance between and , observe that because and , it suffices to bound the Wasserstein distance between and , i.e.,
(12)
where is a valid expression because as discussed in Remark 1 below.
5.4.1. More Details on System Dynamics and Generator
To prepare for the proof, we first look into the dynamics of the two systems under study. In particular, we write out the generators of and , which are used in the Stein’s method arguments.
We first examine the dynamics of the single-server system under Markovian policy . Four types of events change a single-server system’s configuration: internal transitions, departures, reactive requests, and proactive requests (see Section 4.1). The change of configuration due to any event can be represented by the diagram
where the arrow denotes an internal transition or a departure from configuration to if ; the arrow denotes a reactive request that adds jobs to the system if , and denotes a proactive request if . We call the above change of configuration a transition, and denote its rate as . Let denote the set of possible pairs in a transition.
We define the total transition rate at configuration as and define the maximal transition rate . Since is a finite set, we have . Also, observe that the request rate of type jobs is given by
(13)
where denotes the stationary distribution of the single-server configuration under policy .
Next, we focus on the dynamics of i.i.d. copies of single-server systems. Consider the generator of the corresponding Markov chain , which is a linear operator on functions defined as:
(14) |
and we call the resulting function the drift of . Based on the transition rates defined above, we have
(15) |
where is a shorthand for , i.e., we use to omit the entries that agree with .
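For reference, in generic notation of our own, the generator of a continuous-time Markov chain with transition rates $q(k,k')$ acts on a function $f$ as
\[
(G f)(k) \;=\; \sum_{k' \ne k} q(k, k')\,\big( f(k') - f(k) \big),
\]
and for $n$ independent copies $(x_1,\dots,x_n)$ it acts coordinate by coordinate,
\[
(G^{(n)} f)(x_1,\dots,x_n) \;=\; \sum_{\ell=1}^{n} \sum_{k' \ne x_\ell} q(x_\ell, k')\,\big( f(x_1,\dots,x_{\ell-1},k',x_{\ell+1},\dots,x_n) - f(x_1,\dots,x_n) \big),
\]
which is the kind of decoupled sum referred to in (15).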
Similarly, for the infinite-server system, consider the generator of defined as
(16) |
for any function . The drift of under turns out to have a similar decoupled form as : observe that for each , the transition of from to occurs at the rate for each , and any real or virtual job arrivals do not change the sum . Consider any function and the function .
(17) |
In this context is a shorthand for . In other words, we use to omit the entries of ’s input that agree with the corresponding entries of .
Remark 1.
In (17), although is only defined on the domain , it is valid to write as its input because we always have , i.e., the total number of real jobs, virtual jobs, and tokens on a normal server never exceeds . To see why this is true, the single-server policy requests jobs only when there are no tokens on the server, and it will not request more than jobs if there are already real and virtual jobs on the server.
5.4.2. Proof of Lemma 2.
Proof.
Generator Comparison. For any , consider the Poisson equation (see, e.g., (Braverman, 2022)) that solves for :
(18) |
We let in (18) and take the expectation. This results in
(19) |
On the other hand, because is a finite-state Markov chain, we have
(20) |
where is given by . Subtracting (20) from (19), we get
(21) |
We want to show that and are close so that we can bound the RHS of (21).
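In generic notation of our own, with two stationary processes $X$ and $\bar X$ whose generators are $G$ and $\bar G$, and with $f_h$ solving the Poisson equation $\bar G f_h = \mathbb{E}[h(\bar X)] - h$, this comparison step takes the standard form
\[
\mathbb{E}[h(\bar X)] - \mathbb{E}[h(X)]
\;=\; \mathbb{E}\big[ (\bar G f_h)(X) \big]
\;=\; \mathbb{E}\big[ (\bar G f_h)(X) - (G f_h)(X) \big],
\]
where the first equality evaluates the Poisson equation at $X$ and takes expectations, and the second uses $\mathbb{E}[(G f_h)(X)] = 0$, which holds because $X$ is stationary under $G$. Bounding the difference of the two generators applied to $f_h$ therefore bounds the difference of the expectations of $h$.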
Now we plug the formula of the generators in (15) and (17) into the RHS of (21) and get
(22) |
where in we have omitted the entries that agree with . The equality (i) is true because each of the -th terms in and is equal if . For the inequalities (ii) and (iii), recall that is the total transition rate given by , and . Observe that
Therefore (22) can be further bounded by
To prove (12), it remains to show that
(23) |
Stein Factor Bound. To prove (23), observe that the following is a solution to the Poisson equation (18):
(24) |
This allows us to bound the difference of using coupling. Specifically, we define a coupling of two systems, each consisting of i.i.d. copies of the single-server system under . The two systems are initialized with configurations and that only differ at the -th server, where we omit the entries that agree with . Let be the joint configuration of the two systems, which is actually i.i.d. copies of the single-server system. As a result, we can specify the couplings for different separately. For , the corresponding servers in the two systems have the same initial configurations, so we can always keep their configurations identical. For the -th servers, we let them evolve independently following their own dynamics until a stopping time when their configurations become the same. After that, we can use coupling to keep their configurations identical. Under this coupling, it is not hard to see that
(25) |
where in the second inequality we have used the fact that is -Lipschitz continuous under the norm of the space . For each pair of , observe that because is a -irreducible policy, is finite; and because is a finite set, is uniformly bounded. All these finite quantities depend on a single-server system under a policy that is independent of . As a result, the last expression in (25) is of constant order. Moreover, because there are finitely many pairs of , the supremum is also of constant order, independent of . This proves the Stein factor bound in (23). Together with the generator comparison, we have proved
(12) |
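To give a concrete feel for the coupling argument used in the Stein factor bound, the following Python sketch estimates the expected meeting time of two independently evolving copies of a small continuous-time Markov chain; the state space and rates are hypothetical stand-ins, not the actual single-server dynamics under the single-server policy. Once the two copies meet, they can be coupled to stay identical forever, which is exactly the structure exploited in (25).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite configuration space and generator Q -- stand-in rates,
# not the paper's single-server model.
n_states = 5
Q = rng.uniform(0.5, 1.5, size=(n_states, n_states))
np.fill_diagonal(Q, 0.0)
Q[np.diag_indices(n_states)] = -Q.sum(axis=1)

def next_jump(state):
    """Sample the holding time and the next state of the CTMC from `state`."""
    rate = -Q[state, state]
    holding = rng.exponential(1.0 / rate)
    probs = np.maximum(Q[state], 0.0)
    probs /= probs.sum()
    return holding, rng.choice(n_states, p=probs)

def meeting_time(x0, y0):
    """Run two copies independently until they occupy the same state;
    after that they can be coupled to stay identical forever."""
    t, x, y = 0.0, x0, y0
    hx, nx = next_jump(x)
    hy, ny = next_jump(y)
    tx, ty = hx, hy                    # absolute times of the next jumps
    while x != y:
        if tx <= ty:
            t, x = tx, nx
            hx, nx = next_jump(x)
            tx = t + hx
        else:
            t, y = ty, ny
            hy, ny = next_jump(y)
            ty = t + hy
    return t

# Monte Carlo estimate of the expected meeting time for two copies that
# start in different configurations, mirroring the bound in (25).
taus = [meeting_time(0, 1) for _ in range(2000)]
print("estimated expected meeting time:", np.mean(taus))
```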
5.5. Role of Tokens and Virtual Jobs
This section aims to shed light on the role of tokens and virtual jobs in the proposed policy, JRS. We first outline how we devise the token-and-virtual-job mechanism from the perspective of generators. To begin with, consider the scenario where dispatch decisions are solely based on real-job configurations . In this case, the transitions of servers’ configurations would be correlated in general due to job arrivals, which are dispatched based on the joint configuration of all servers. To break this correlation, let us introduce tokens but not set an upper limit yet on the number of tokens (thus no virtual jobs). Observe that remains unchanged by job arrivals, so tokens remove the correlation brought about by job arrivals. However, because tokens lack internal phase transitions or departures, the transition dynamics of will diverge from when is large. In other words, although tokens help decouple the transitions on servers, they cannot keep the transitions of close to . To solve this issue, we finally introduce the mechanism of converting tokens into virtual jobs when the number of tokens is high, where virtual jobs can make internal phase transitions or departures just like real jobs. Now, the sum is changed by neither job arrivals nor the creation of virtual jobs, and the internal phase transitions and job departures are similar to those of . More formally, the generators of and are close to each other – their additive difference can be upper bounded by a quantity proportional to the expected number of tokens, as shown in (22). Therefore, by regulating the number of tokens, we can control the difference between the generators of and .
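As a rough illustration of the bookkeeping described above, the following minimal Python sketch (hypothetical names, a single job type, and a made-up token limit) shows how an arrival checks out a token while adding a real job, leaving the sum of real jobs, virtual jobs, and tokens unchanged, and how excess tokens are converted into virtual jobs.

```python
class ServerState:
    """Schematic per-server bookkeeping; names and the token limit are
    illustrative, not the exact quantities used in the paper."""

    def __init__(self, token_limit=3):
        self.real = 0       # real jobs in service
        self.virtual = 0    # virtual (simulated) jobs
        self.tokens = 0     # outstanding job requests
        self.token_limit = token_limit

    def invariant(self):
        # the quantity whose transitions the analysis keeps close to
        # those of the single-server system
        return self.real + self.virtual + self.tokens

    def request_job(self):
        """Server requests a job: a token is created; if the token count
        exceeds the limit, the excess token becomes a virtual job so that
        it has phase transitions and departures like a real job."""
        self.tokens += 1
        if self.tokens > self.token_limit:
            self.tokens -= 1
            self.virtual += 1

    def arrival(self):
        """A real job is dispatched to this server: it checks out a token,
        so real + virtual + tokens is unchanged."""
        if self.tokens > 0:
            self.tokens -= 1
            self.real += 1


s = ServerState()
s.request_job(); s.request_job()
before = s.invariant()
s.arrival()                      # an arrival leaves the invariant unchanged
assert s.invariant() == before
```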
Another key design component of JRS is that the subroutine requests jobs based on the observed configurations. This has been used in the proof of Lemma 2 to show that the observed configurations are close to , which consist of i.i.d. single-server systems. Recall that each single-server system in is designed to have a throughput of for each job type . Therefore, the proximity between and ensures that the job request rate mirrors the arrival rate for each job type, regardless of the real-job configurations. The fact that these two rates are approximately equal is important for proving Lemma 3 and Lemma 4. It guarantees that both the rate of generating virtual jobs (when there are too many tokens) and the rate of dispatching jobs to backup servers (when there are no tokens) are appropriate.
A natural follow-up question is whether the usage of tokens and virtual jobs is fundamental or an artifact of our analysis technique. For example, it is unclear whether removing the upper limit on the number of tokens would still yield an asymptotically optimal policy. This is an interesting question that we do not have a complete answer to. The token-and-virtual-job mechanism emerges as a natural choice under our analysis framework. Nevertheless, it is worth noting that our analysis primarily treats each server’s configuration as a generic Markov chain, without utilizing many properties specific to the stochastic bin-packing setting. An exception to this is the proofs in Section B.2, where we use the model that each job leaves the system within a constant expected time. It would be interesting to explore more properties of the problem to better understand policy designs without auxiliary state variables like tokens and virtual jobs.
6. Conclusion
In this paper, we study a new setting of stochastic bin-packing in service systems that features time-varying item sizes. Since our formulation is motivated by the problem of virtual-machine scheduling in computing systems, we use the terminology of jobs and servers, where jobs are viewed as items, whose sizes are their resource requirements, and servers as bins. The time-varying item sizes capture the emerging trend in practice that jobs’ resource requirements vary over time. Our goal is to design a job dispatch policy to minimize the expected number of active servers in steady state, subject to a constraint on resource contentions. Our main result is the design of a policy that achieves an optimality gap of , where is the scaling factor of the arrival rate. When specialized to the setting where jobs’ resource requirements remain fixed over time, this result improves upon the state-of-the-art optimality gap. Our technical approach highlights a novel policy conversion framework, Join-Requesting-Server, that reduces the policy design problem to that in a single-server system.
There are several potential directions that may be worth further exploration. One direction is to strengthen the optimality result within the current setting. Specifically, it is interesting to investigate: (i) whether it is possible to achieve an optimality gap smaller than ; and (ii) whether there exist asymptotically optimal policies whose average cost rate of resource contention satisfies the budget strictly instead of asymptotically.
We are also interested in extending our technique to the optimal control of other systems with similar structures. Intuitively, this technique could be applied to systems with many components that evolve mostly independently but are weakly coupled by certain constraints. Viewing each component as a server, we can define a suitable single-server problem and then design a policy for the original system to track the dynamics of the optimal single-server solution. Below we list several variations of our model that can potentially be analyzed using the proposed technique.
-
•
A model where jobs running on each server will be put into a local queue when there are resource contentions. The goal thus becomes finding the optimal trade-offs between the number of active servers and the waiting time of the jobs.
-
•
A model that allows each server to have a Markovian state that affects the dynamics of the jobs running on the server.
-
•
A model that allows jobs to migrate to different servers at the cost of migration delays.
-
•
A closed-system model where jobs re-enter the system after completion.
A third possible direction is to tackle the problem when the arrival rates and the parameters in the job model are unknown, as mentioned in Section 4.3. A possible approach is to develop an approximate version of the JRS framework, where the optimal single-server policy and the simulator for the virtual jobs are both learned from data. It is desirable to design such an approximate framework whose performance degrades gracefully as the approximation error increases.
Acknowledgements.
Y. Hong and W. Wang are supported in part by NSF grants CNS-200773 and ECCS-2145713. Q. Xie is supported in part by NSF grant CNS-1955997.
References
- (1)
- Ayyadevara et al. (2022) Nikhil Ayyadevara, Rajni Dabas, Arindam Khan, and K. V. N. Sreenivas. 2022. Near-Optimal Algorithms for Stochastic Online Bin Packing. In Proc. Int. Conf. Automata, Languages and Programming (ICALP), Vol. 229. 12:1–12:20.
- Bashir et al. (2021) Noman Bashir, Nan Deng, Krzysztof Rzadca, David Irwin, Sree Kodak, and Rohit Jnagal. 2021. Take It to the Limit: Peak Prediction-Driven Resource Overcommitment in Datacenters. In Proc. European Conf. Computer Systems (EuroSys). Online Event, United Kingdom, 556–573.
- Braverman (2022) Anton Braverman. 2022. The Prelimit Generator Comparison Approach of Stein’s Method. Stoch. Syst. 12, 2 (2022), 181–204.
- Braverman and Dai (2017) Anton Braverman and J. G. Dai. 2017. Stein’s method for steady-state diffusion approximations of systems. Ann. Appl. Probab. 27 (Feb. 2017), 550–581.
- Braverman et al. (2017) Anton Braverman, J. G. Dai, and Jiekun Feng. 2017. Stein’s method for steady-state diffusion approximations: an introduction through the Erlang-A and Erlang-C models. Stoch. Syst. 6, 2 (2017), 301–366.
- Buchbinder et al. (2021) Niv Buchbinder, Yaron Fairstein, Konstantina Mellou, Ishai Menache, and Joseph (Seffi) Naor. 2021. Online Virtual Machine Allocation with Lifetime and Load Predictions. ACM SIGMETRICS Perform. Eval. Rev. 49, 1 (May 2021), 9–10.
- Cloud (2023a) Google Cloud. 2023a. Overcommitting CPUs on sole-tenant VMs. https://cloud.google.com/compute/docs/nodes/overcommitting-cpus-sole-tenant-vms.
- Cloud (2023b) Google Cloud. 2023b. Virtual machine instances. https://cloud.google.com/compute/docs/instances.
- Coffman et al. (1983) E. G. Coffman, Jr., M. R. Garey, and D. S. Johnson. 1983. Dynamic Bin Packing. SIAM J. Comput. 12, 2 (1983), 227–258.
- Courcobetis and Weber (1990) Coastas Courcobetis and Richard Weber. 1990. Stability of On-Line Bin Packing with Random Arrivals and Long-Run-Average Constraints. Probab. Eng. Inf. Sci. 4, 4 (1990), 447–460.
- Courcoubetis and Weber (1986) C. Courcoubetis and R. R. Weber. 1986. Necessary and Sufficient Conditions for Stability of a Bin-Packing System. J. Appl. Probab. 23, 4 (1986), 989–999.
- Csirik et al. (2006) Janos Csirik, David S. Johnson, Claire Kenyon, James B. Orlin, Peter W. Shor, and Richard R. Weber. 2006. On the Sum-of-Squares Algorithm for Bin Packing. J. ACM 53, 1 (Jan. 2006), 1–65.
- Delimitrou and Kozyrakis (2014) Christina Delimitrou and Christos Kozyrakis. 2014. Quasar: Resource-Efficient and QoS-Aware Cluster Management. In Proc. Int. Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS). Salt Lake City, UT, 127–144.
- Foundation (2023a) Apache Software Foundation. 2023a. Apache Mesos: Containerizers. https://mesos.apache.org/documentation/latest/containerizers/.
- Foundation (2023b) Apache Software Foundation. 2023b. Apache Mesos: Oversubscription. https://mesos.apache.org/documentation/latest/oversubscription/.
- Freund and Banerjee (2019) Daniel Freund and Siddhartha Banerjee. 2019. Good prophets know when the end is near. Available at SSRN: https://ssrn.com/abstract=3479189 (Nov. 2019).
- Ghaderi et al. (2014) Javad Ghaderi, Yuan Zhong, and R. Srikant. 2014. Asymptotic Optimality of BestFit for Stochastic Bin Packing. SIGMETRICS Perform. Eval. Rev. 42, 2 (Sept. 2014), 64–66.
- Gupta and Radovanović (2020) Varun Gupta and Ana Radovanović. 2020. Interior-Point-Based Online Stochastic Bin Packing. Oper. Res. 68, 5 (2020), 1474–1492.
- Kleinrock (1975) Leonard Kleinrock. 1975. Queueing Systems. John Wiley & Son.
- Li et al. (2014) Yusen Li, Xueyan Tang, and Wentong Cai. 2014. On Dynamic Bin Packing for Resource Allocation in the Cloud. In Proc. Ann. ACM Symp. Parallelism in Algorithms and Architectures (SPAA). Prague, Czech Republic, 2–11.
- Lo et al. (2015) David Lo, Liqun Cheng, Rama Govindaraju, Parthasarathy Ranganathan, and Christos Kozyrakis. 2015. Heracles: Improving resource efficiency at scale. In Proc. ACM/IEEE Ann. Int. Symp. Computer Architecture (ISCA). Portland, OR, 450–462.
- Maguluri and Srikant (2013) Siva Theja Maguluri and R. Srikant. 2013. Scheduling jobs with unknown duration in clouds. In Proc. IEEE Int. Conf. Computer Communications (INFOCOM). Turin, Italy, 1887–1895.
- Maguluri et al. (2012) Siva Theja Maguluri, R Srikant, and Lei Ying. 2012. Stochastic Models of Load Balancing and Scheduling in Cloud Computing Clusters. In Proc. IEEE Int. Conf. Computer Communications (INFOCOM). Orlando, FL, 702–710.
- Maguluri et al. (2014) Siva Theja Maguluri, R. Srikant, and Lei Ying. 2014. Heavy traffic optimal resource allocation algorithms for cloud computing clusters. Perform. Eval. 81 (2014), 20–39.
- Meyn (2007) Sean Meyn. 2007. Control Techniques for Complex Networks (1st ed.). Cambridge University Press, USA.
- Psychas and Ghaderi (2018) Konstantinos Psychas and Javad Ghaderi. 2018. On Non-Preemptive VM Scheduling in the Cloud. In Proc. ACM SIGMETRICS Int. Conf. Measurement and Modeling of Computer Systems. Irvine, CA, 67–69.
- Psychas and Ghaderi (2019) Konstantinos Psychas and Javad Ghaderi. 2019. Scheduling Jobs with Random Resource Requirements in Computing Clusters. In Proc. IEEE Int. Conf. Computer Communications (INFOCOM). 2269–2277.
- Psychas and Ghaderi (2021) Konstantinos Psychas and Javad Ghaderi. 2021. High-Throughput Bin Packing: Scheduling Jobs With Random Resource Demands in Clusters. IEEE/ACM Trans. Netw. 29, 1 (2021), 220–233.
- Psychas and Ghaderi (2022) Konstantinos Psychas and Javad Ghaderi. 2022. A Theory of Auto-Scaling for Resource Reservation in Cloud Services. Stoch. Syst. 12, 3 (2022), 227–252.
- Reiss et al. (2012) Charles Reiss, Alexey Tumanov, Gregory R. Ganger, Randy H. Katz, and Michael A. Kozuch. 2012. Heterogeneity and Dynamicity of Clouds at Scale: Google Trace Analysis. In Proc. ACM Symp. Cloud Computing (SoCC). San Jose, CA, Article 7, 13 pages.
- Rzadca et al. (2020) Krzysztof Rzadca, Pawel Findeisen, Jacek Swiderski, Przemyslaw Zych, Przemyslaw Broniek, Jarek Kusmierek, Pawel Nowak, Beata Strack, Piotr Witusowski, Steven Hand, and John Wilkes. 2020. Autopilot: Workload Autoscaling at Google. In Proc. European Conf. Computer Systems (EuroSys). Heraklion, Greece, Article 16, 16 pages.
- Stolyar (2013) Alexander L. Stolyar. 2013. An Infinite Server System with General Packing Constraints. Oper. Res. 61, 5 (2013), 1200–1217.
- Stolyar (2017) Alexander L. Stolyar. 2017. Large-scale heterogeneous service systems with general packing constraints. Adv. Appl. Probab. 49 (March 2017), 61–83. Issue 1.
- Stolyar and Zhong (2013) Alexander L. Stolyar and Yuan Zhong. 2013. A large-scale service system with packing constraints: Minimizing the number of occupied servers. ACM SIGMETRICS Perform. Eval. Rev. 41, 1 (June 2013), 41–52.
- Stolyar and Zhong (2015) Alexander L. Stolyar and Yuan Zhong. 2015. Asymptotic optimality of a greedy randomized algorithm in a large-scale service system with general packing constraints. Queueing Syst. 79 (June 2015), 117–143. Issue 2.
- Stolyar and Zhong (2021) Alexander L. Stolyar and Yuan Zhong. 2021. A Service System with Packing Constraints: Greedy Randomized Algorithm Achieving Sublinear in Scale Optimality Gap. Stoch. Syst. 11 (June 2021), 83–111. Issue 2.
- Tirmazi et al. (2020) Muhammad Tirmazi, Adam Barker, Nan Deng, Md E. Haque, Zhijing Gene Qin, Steven Hand, Mor Harchol-Balter, and John Wilkes. 2020. Borg: The next Generation. In Proc. European Conf. Computer Systems (EuroSys). Heraklion, Greece, Article 30, 14 pages.
- Wilkes (2019) John Wilkes. 2019. Google cluster-usage traces v3. http://github.com/google/cluster-data.
- Xie et al. (2015) Qiaomin Xie, Xiaobo Dong, Yi Lu, and R. Srikant. 2015. Power of d Choices for Large-Scale Bin Packing: A Loss Model. In Proc. ACM SIGMETRICS Int. Conf. Measurement and Modeling of Computer Systems. Portland, OR, 321–334.
Appendix A Proof of Theorem 2 (Lower Bound)
Proof.
It is sufficient to show that given an infinite-server policy for in (2), we have . To this end, we will construct a single-server policy such that the resulting system configuration in steady state satisfies:
(27)
(28)
Let be the distribution of ; then is a feasible solution to the problem in (3). As a result, we have . Note that although is actually non-Markovian, i.e., it makes decisions based on not only the current configuration but also the history, as we will show in Appendix C, is still a lower bound to the objective value that can achieve in .
The construction of the single-server policy involves simulating an infinite-server system under from the empty configuration. At time , the policy randomly chooses the -th server in the infinite-server system with probability , for . It then requests jobs for the single-server system according to a policy . The key to our policy is to make the single-server system emulate the job assignment at the -th server of the simulated infinite-server system, but without incurring idleness. We first construct the policy , and then specify the probabilities .
Let us start by introducing some useful notation. Let be the single-server system configuration under at time and be the configuration of the -th simulated server in the infinite-server system under . We define a stochastic process as follows: . The “” is well-defined because the integral is continuous in . Intuitively, gives the maximum time when the cumulative busy time of the -th server is . Note that is discontinuous only when reaches ; thus it is right-differentiable with derivative equal to at any point.
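To illustrate the time change defined above, the following hypothetical sketch computes it for a piecewise-constant busy/idle trajectory: given the busy periods of the simulated server, it returns the latest time at which a prescribed amount of busy time has been accumulated, thereby skipping the idle period that follows. The busy periods and the exact-boundary handling are illustrative only.

```python
# Hypothetical busy periods of the simulated k-th server, as (start, end) pairs.
busy_periods = [(0.0, 2.0), (3.5, 5.0), (7.0, 9.5)]

def time_change(s):
    """Largest time t at which the cumulative busy time equals s: if s falls
    exactly at the end of a busy period, we skip the idle period that follows
    (floating-point equality is tolerated here for illustration only)."""
    total = 0.0
    for i, (a, b) in enumerate(busy_periods):
        length = b - a
        if total + length > s:
            return a + (s - total)          # strictly inside a busy period
        if total + length == s:
            if i + 1 < len(busy_periods):   # skip the following idle period
                return busy_periods[i + 1][0]
            return b
        total += length
    return float("inf")                      # more busy time than ever accrued

print(time_change(2.0))   # 3.5: skips the idle period (2.0, 3.5)
print(time_change(2.5))   # 4.0: inside the second busy period
```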
We construct and the simulation of the infinite-server system under in a way such that:
(29) |
That is, we want the single-server system to have the same dynamics as the simulated -th server, except that it skips the idle periods. To this end, we couple the two systems as follows:
-
(1)
When the -th simulated server receives a type job, we let the single-server system request a type job at time . For each such job, its phase transition process in the -th simulated server is the same as that in the single-server system. That is, when we observe any internal transition or departure event in , we produce the same event on the -th simulated server .
-
(2)
The simulations of the rest of the infinite-server system under policy are driven by independently generated random seeds.
It is not hard to see that the simulated infinite-server system has the same stochastic behavior as an uncoupled system under . Moreover, since we couple all the events that happen in and , together with the facts that and are piecewise constant and , we get (29).
Next we claim that (29) implies the following relationship between the steady-state cost of the single-server system under and the steady-state cost of the -th simulated server in the infinite-server system under :
(30) |
This is because for all , we have
where and hold because long-run averages converge to steady-state expectations; is due to the fact that for any ; is due to the fact that and follows from (29).
Let be the long-run request rates of type jobs in the single-server system under , and be the throughput of type jobs of the -th simulated server under . By the construction of , under which the single-server system requests jobs based on the arrival events of the -th simulated server, we have
With the constructed policies , we are ready to define the policy . We let choose an index with probability at time , and then follow . We set the probability as
(31) |
where the second inequality uses the fact that . Then under , we have
where follows from (30) and (31). Observe that under , almost surely, we thus have
which proves (27). Moreover, for each the request rate is given by
This proves (28). By the argument presented at the beginning of the proof, we get . ∎
Appendix B The Rest of the Proofs Needed for Theorem 3 (Conversion Theorem)
B.1. Proof of Lemma 1
Proof.
We will show that under the JRS policy, the Markov chain for the system state (represented as ) has a unique stationary distribution, by first arguing that it is -irreducible (here, -irreducible means that the Markov chain has a state that can be reached from all other states through transitions; the “” in “-irreducible” does not refer to any specific state), and then using the Foster-Lyapunov theorem to show positive recurrence (see, e.g., Meyn, 2007). Combining -irreducibility with positive recurrence, we conclude that the Markov chain under study has a unique stationary distribution.
First, we show that the Markov chain is -irreducible. Specifically, observe that the Markov chain starting from any state can reach the state after experiencing a sequence of departures and arrivals that clears up all the tokens, virtual jobs, and jobs on backup servers. Further, letting be the configuration reachable from all other configurations in the single-server system under the policy , we argue that starting from any state of the form , the Markov chain can reach the state . Because for any there is a transition path from to , consider the sequence of events in which each transitions independently along the path, and the jobs arrive right after making a request, so that the tokens are checked out before has a further transition. In this way, each with can eventually reach from . This proves the -irreducibility of .
Next, we show that satisfies the Foster-Lyapunov criterion, i.e.,
(32) |
where is a non-negative function of the states, is a finite set, is a finite number, and is the infinitesimal generator of the continuous-time Markov chain. Let be the expected remaining time in the system when a job is in phase for each . According to the job model, we have the recurrence relation
(33) |
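For concreteness, the recurrence in (33) can be solved numerically as a linear system. The sketch below does this for a hypothetical three-phase job model with made-up internal transition rates mu[i][j] and departure rates nu[i]; the expected remaining time of a job in phase i equals the mean holding time in that phase plus the weighted remaining times of the phases it may jump to. The numbers are placeholders, not quantities from the paper.

```python
import numpy as np

# Hypothetical 3-phase job model: mu[i, j] is the internal transition rate
# from phase i to phase j, nu[i] is the departure rate from phase i.
mu = np.array([[0.0, 1.0, 0.5],
               [0.2, 0.0, 0.8],
               [0.0, 0.3, 0.0]])
nu = np.array([0.1, 0.4, 1.2])

total = mu.sum(axis=1) + nu          # total outflow rate of each phase
P = mu / total[:, None]              # jump probabilities to the other phases

# Recurrence in the spirit of (33), in matrix form:
#   r = 1/total + P r   =>   (I - P) r = 1/total
r = np.linalg.solve(np.eye(3) - P, 1.0 / total)
print("expected remaining times by phase:", r)
```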
We construct a Lyapunov function as follows:
(34) |
Using the relation (33), it can be verified that the drift of satisfies
where the first inequality uses the fact that virtual jobs are generated at a rate no faster than the total rate of job requests. Then the Foster-Lyapunov criterion in (32) is satisfied with and given by
By the Foster-Lyapunov theorem, is positive recurrent. ∎
B.2. Proofs of Lemma 3 and Lemma 4
In this subsection, we prove Lemma 3 and Lemma 4 together. We begin by introducing some notation. We use to represent the state of the -th server, and use to represent the joint state of the first servers. We also use the lowercase , to represent the realizations of the corresponding random variables. We denote the total number of type virtual jobs as for , and its realization as . We denote the total number of type jobs on backup servers as for , and its realization as . We also denote the total number of type tokens throughout the system as , and its realization as . Our goal can be rewritten as proving and for each .
We first give an overview of the proof. Observe that in our model, the expected time that a job stays in the system is fixed. As a result, bounding the number of virtual jobs or jobs on backup servers in the system is equivalent to bounding the rate at which they are generated, according to Little’s Law. By our construction of the policy, the rate of generating those jobs is closely related to the dynamics of the total number of type tokens .
To describe the dynamics of , we first introduce two functions and :
The function represents the increment in the number of type virtual jobs due to the event that the total number of type tokens on the first servers exceeds the token limit . The function corresponds to the increment in the total number of type jobs on backup servers due to the event that a type job arrives at the system without seeing a type token. For a function that only depends on the number of type tokens , its drift can be written as
(35) |
We abuse the notation of here. For ease of exposition, we will simply write and to represent and .
By construction, the total number of type tokens is a stochastic process constrained within . Note that increases when some servers request new tokens, and decreases when a real or virtual arrival checks out a token or when some servers have excess tokens removed. When is away from the boundaries, the average rate at which it increases is approximately equal to , and the average rate at which it decreases is given by
where we have used the approximations that , and .
As randomly moves up and down at approximately the same rates and reflects at the boundaries of and , it behaves like a reflected simple symmetric random walk. Intuitively speaking, the steady-state distribution of is approximately a uniform distribution over . Recall that and can only be non-zero when is near the boundaries. Since the length of the interval , we can expect that and diminish as .
In the proof, we first establish the relationship between , and , using Little’s Law. Then we derive bounds on and by analyzing the drift of several test functions of . This step is implicitly based on the intuition that is approximately uniformly distributed over , with the tokens being generated and eliminated at similar rates. Finally, we invoke Lemma 2 to show that the tokens are indeed generated and eliminated at similar speeds, which leads to bounds on and .
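The reflected-random-walk intuition can be checked numerically. The hypothetical simulation below runs a simple symmetric random walk reflected on {0, ..., K} and measures the fraction of time spent at the two boundaries; this fraction decays roughly like 2/(K+1), mirroring why the rates of generating virtual jobs and dispatching to backup servers diminish as the token interval grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def boundary_fraction(K, n_steps=100_000):
    """Fraction of time a +-1 symmetric walk reflected on {0, ..., K}
    spends at the two boundaries 0 and K."""
    steps = rng.choice((-1, 1), size=n_steps)
    x, hits = K // 2, 0
    for s in steps:
        x = min(K, max(0, x + s))
        hits += (x == 0) or (x == K)
    return hits / n_steps

# The boundary fraction decays roughly like 2/(K+1) as K grows.
for K in (10, 40, 160):
    print(K, round(boundary_fraction(K), 4))
```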
Finally, we make some additional remarks on the notations. First, and depend on the total number of type tokens and the number of newly requested jobs , although we omit the dependency expression for ease of exposition. Second, we abuse the notation and to denote the corresponding random variables. We also write as a shorthand for when the context is clear.
B.2.1. Proofs of Lemma 3 and Lemma 4.
Proof.
Step 1: Bounding Virtual Jobs and Jobs on Backup Servers using Little’s Law. We first apply Little’s Law to and . For each type , we let the expected time that a type job stays in the system be . Let . Because there are only finitely many types of jobs, and each job spends finite expected time in the system, is a finite constant. By Little’s law,
(36)
(37)
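As a quick numerical illustration of how Little’s Law enters here (with made-up numbers, not quantities from the model): if virtual jobs of some type are generated at a known average rate and every job stays in the system for a bounded expected time, then the steady-state expected number of such jobs is at most the product of the two.

```python
# Hypothetical numbers: virtual jobs of some type are generated at an average
# rate of 0.3 per unit time, and every job (virtual or real) stays in the
# system for at most t_max = 5 units of time in expectation.
generation_rate = 0.3
t_max = 5.0

# Little's Law: E[number in system] = (arrival rate) x (mean time in system),
# so the steady-state expected number of virtual jobs is at most:
expected_virtual_jobs = generation_rate * t_max
print(expected_virtual_jobs)   # 1.5
```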
Step 2: Drift Analysis. The above two equations (36) and (37) suggest that we can derive upper bounds on and by analyzing the following two terms:
-
•
, interpreted as the average rate that reflects on the boundary at ;
-
•
, interpreted as the average rate that reflects on the boundary at .
We establish the relationships of , and by analyzing the drift of two test functions .
Letting and taking steady-state expectation over its drift, by (B.2) and the fact that the drift is zero in steady state, we get
(38) |
Similarly, letting and taking steady-state expectation over its drift, one can verify that
(39)
(40)
(41)
Readers may refer to the complete calculation at the end of this subsection.
Step 3: Estimating the Terms Obtained from Drift Analysis. We will first focus on bounding , analyzing the two terms in (40) and (41) separately. Then we invoke (38) to bound .
The term in (41) is easy to deal with. Observe that the number of jobs requested each time should be no more than the maximal number of jobs that a server can hold, i.e., , so
(42) |
where in the second inequality we have used the fact that the total rate is uniformly bounded by , and the last step uses the facts that and .
To bound the term in (40), first observe that , which implies that
The term on the RHS of the above equation is the expected absolute difference between the rates of generating and eliminating type tokens, which can be shown to be small relative to . Specifically, we claim that
(43) |
To show (43), first notice that we can remove the indicator without introducing much error:
where the first inequality is due to the triangle inequality, the second inequality is due to the definition of , and the last inequality is because . It remains to bound the term , which can be seen as showing that the rate of generating type tokens concentrates around the type jobs’ arrival rate . It is natural to think of using some form of the Law of Large Numbers. Unfortunately, is not a sum of i.i.d. random variables due to dependencies among for different ’s. As a result, we instead invoke the Wasserstein distance bound in Lemma 2 to replace in the above expression with . We define the function as
(44) |
We claim that . For any two , ,
where the first inequality is due to the triangle inequality; the second inequality uses the fact that and ; the third inequality uses the triangle inequality, the fact that the total rate at a configuration is bounded by , and the property of the norm . Therefore, . The Lipschitz continuity of allows us to invoke Lemma 2 and get
Therefore,
(45) |
Observe that under a Markovian policy, the request rate of type jobs can be written as , so we have
(46) |
Moreover, because are i.i.d. for , we have
Therefore, by combining the arguments above, we get
(47) |
which proves (43). This implies that the term in (40) is also in .
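The step from (46) to (47) uses the fact that a sum of B i.i.d. bounded random variables deviates from its mean by O(sqrt(B)) in expectation. The small Monte Carlo check below (with uniform random variables as placeholders) illustrates this square-root scaling.

```python
import numpy as np

rng = np.random.default_rng(2)

def mean_abs_deviation(B, n_trials=5_000):
    """Estimate E|S_B - E[S_B]| for S_B a sum of B i.i.d. Uniform(0,1) variables."""
    sums = rng.uniform(0.0, 1.0, size=(n_trials, B)).sum(axis=1)
    return np.mean(np.abs(sums - B * 0.5))   # true mean of S_B is B/2

# The ratio below stays roughly constant, i.e., the deviation grows like sqrt(B).
for B in (100, 400, 1600):
    print(B, round(mean_abs_deviation(B) / np.sqrt(B), 3))
```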
B.2.2. Deriving the equality in (39)
We show the details of the calculation deriving the following equality.
(39)
(40)
(41)
The equality is obtained by considering the drift of the function , which is zero in steady state. Recall that the drift of is given by
(B.2) |
We will first calculate , then , and finally plug them into (B.2).
The calculation of utilizes the following property of :
(48) |
This property follows from the definition . Intuitively, this is because is the “force” that pushes back when it hits the boundary at . Using the property, we have
The second last equality is due to (48), and the rest are all algebraic manipulations.
We carry out a similar calculation for :
where the last equality is due to the property that
(49) |
and the rest are all algebraic manipulations.
Putting together,
After recombining the terms, we get
This finishes the calculation.
B.3. Proof of Conversion Theorem without Assuming Irreducibility
In this subsection, we prove Theorem 3 without assuming -irreducibility of the subroutine . Specifically, suppose that we have a Markovian single-server policy and an initial distribution over its recurrent classes for . We will construct an infinite-server policy such that (5) and (6) still hold. The basic idea is to decompose a general Markovian single-server policy into multiple -irreducible Markovian policies, each of which induces one recurrent class and preserves the stationary distribution and throughput of on that recurrent class, as stated in Lemma 1 below.
Lemma 0 (Decomposing The Reducible Policy).
Let be a general single-server Markovian policy with recurrent classes for . Then for each there exists a Markovian policy such that
-
•
The induced Markov chain is -irreducible with the unique recurrent class being ;
-
•
The stationary distribution is the same as the stationary distribution under starting from a configuration in .
Proof.
For each , we define the policy as follows: when the system has configuration , the policy makes the same decisions as ; when and , the policy starts a timer whose duration follows an exponential distribution with rate and, when the timer ticks, immediately adds many jobs of each type , for some arbitrary ; when and , the policy does not request any jobs.
We show that under the new policy , is also a recurrent class of the induced Markov chain. This is because if the system starts from a configuration , then it will stay in since it makes the same decisions and has the same transitions as under the policy . Because is a recurrent class under , it is still a recurrent class under .
To show that the Markov chain induced by is -irreducible, observe that starting from any , the system state will return to . Specifically,
-
•
If the system starts from a configuration such that and , then no new jobs will be requested until either or . In the latter case, by the construction of the policy, the system jumps to a configuration in after the next transition.
-
•
If the system starts from a configuration such that and , the system jumps to a configuration in after the next transition.
The claim that the stationary distribution under is the same as the stationary distribution under starting from a configuration in is straightforward to show: when the system is initialized from any configuration in , it stays in and the transitions are exactly the same under the two policies. ∎
The JRS policy with a general Markovian single-server policy as its subroutine is constructed using the -irreducible policies ’s obtained from the decomposition of and the probabilities ’s, as follows; a schematic sketch of this construction is given after the list.
-
(1)
We divide all the servers into server pools, each with infinitely many servers. Let the -th server pool run the JRS policy with subroutine (defined in Section 4.2) for each .
-
(2)
Whenever we see an arrival of type , we route the job to the -th infinite-server system with probability for each .
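A minimal Python sketch of the two-step construction above (pool names, routing probabilities, and the pool class are placeholders; the per-pool JRS dispatch logic is not reproduced): each arrival of a given type is routed to one of the server pools at random, and each pool would run JRS with its own irreducible subroutine.

```python
import random

random.seed(0)

class JRSPool:
    """Placeholder for a server pool running JRS with one irreducible
    subroutine; the dispatch logic itself is not reproduced here."""

    def __init__(self, name):
        self.name = name
        self.dispatched = 0

    def dispatch(self, job_type):
        self.dispatched += 1


# Hypothetical routing probabilities q[l][j]: probability that a type-j
# arrival is routed to pool l (summing to one over l for each type j).
q = {1: {"A": 0.7, "B": 0.4},
     2: {"A": 0.3, "B": 0.6}}
pools = {l: JRSPool(f"pool-{l}") for l in q}

def route(job_type):
    """Send one arrival of `job_type` to a pool drawn with probabilities q."""
    u, acc = random.random(), 0.0
    for l in sorted(q):
        acc += q[l][job_type]
        if u <= acc:
            pools[l].dispatch(job_type)
            return l
    return l  # numerical safety: fall back to the last pool

counts = {1: 0, 2: 0}
for _ in range(10_000):
    counts[route("A")] += 1
print(counts)  # roughly {1: 7000, 2: 3000}
```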
To analyze the policy , let and ’s be the stationary distribution and throughput of the policy for . By Lemma 1, we have the following relationships:
(50)
(51)
Based on the above relationships, we can prove the general version of the Conversion Theorem that does not require -irreducibility.
Proof of Theorem 3..
For each , Lemma 1 implies that the policy and stationary distribution form a feasible solution to the single-server problem , and the corresponding objective value is . Consider the infinite-server system with arrival rates and budget . As we have proved Theorem 3 for the JRS policy with -irreducible subroutine , it follows that
(52)
(53)
where is the random variable representing the steady-state number of servers in configuration in the infinite-server system under the JRS policy with subroutine .
By the construction of , the arrival rate to the -th server pool is equal to , where the first equality is due to (51), and the second equality is due to the condition that . Therefore, (52) and (53) hold, and we have
Here we use (52) and the relationship between and . Similarly,
After re-indexing the servers, we get (5) and (6). The bounds on and , (7) and (8), follow from (5) and (6). They can be verified in the same way as that in the proof for the irreducible case, so we omit the argument here. ∎
Appendix C Solving the Single-Server Problem
In this section, we show the equivalence of the single-server problem in (3) and a linear program (LP) as stated in Theorem 4. The equivalence needs to be proved in two directions. In Section C.1, we first derive the linear program (61) as a relaxation of the single-server problem (3) so that the optimal value of the LP is a lower bound to the optimal value of (3). Then in Section C.2, we will construct a Markovian single-server policy that achieves the optimal value of the LP, which implies the optimality of the policy.
C.1. Lower Bound via LP Relaxation
In this subsection we derive an LP relaxation of the optimization problem (3), restated below:
(3 revisited)
subject to
Here we allow to be non-Markovian, i.e., it can make decisions based on the history, but we still require its performance metrics used in the objective and the constraints of (3) to have well-defined steady-state distributions.
Observe that both and depend on the stationary distribution , but the constraints in terms of are implicit. To derive an LP relaxation, we give an explicit characterization of the constraints that must be satisfied by the stationary distribution induced by any feasible policy .
To do this, we derive a version of the stationary equation in terms of a quantity called transition frequency. The transition frequency of type jobs is a function describing the steady-state frequency of requesting a type job when the system has configuration . To rigorously define transition frequency, we first introduce a concept called nominal transition.
Definition 0 (Nominal Transition).
Consider a single-server system under any policy. When the configuration transitions from to for some with new jobs added into service, we decompose the transition by adding intermediate configurations as illustrated below, where first goes to if , and then jobs of each type are added one by one.
where is a fixed ordering of the set of phases . We call each short transition in the diagram a nominal transition.
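The decomposition in Definition 1 can be made concrete with a small helper. In the hypothetical sketch below, a configuration is represented as a dictionary counting the jobs in each phase, and a transition that also adds a batch of newly requested jobs is broken into a chain of nominal transitions, each adding one job, following a fixed phase order. The representation and example data are illustrative only.

```python
def nominal_transitions(u, u_mid, added, phase_order):
    """Decompose a transition from configuration `u` to `u_mid` plus the
    batch `added` (a dict phase -> count of newly requested jobs) into
    nominal transitions, adding the new jobs one at a time in `phase_order`.

    Configurations are dicts mapping phase -> number of jobs in that phase.
    """
    path = [dict(u)]
    if u_mid != u:                      # internal transition or departure first
        path.append(dict(u_mid))
    current = dict(path[-1])
    for phase in phase_order:           # then add the new jobs one by one
        for _ in range(added.get(phase, 0)):
            current = dict(current)
            current[phase] = current.get(phase, 0) + 1
            path.append(current)
    return path


# Example: a departure from phase "p2" followed by two newly added "p1" jobs.
u     = {"p1": 1, "p2": 1}
u_mid = {"p1": 1, "p2": 0}
steps = nominal_transitions(u, u_mid, {"p1": 2}, phase_order=["p1", "p2"])
for a, b in zip(steps, steps[1:]):
    print(a, "->", b)
```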
For , we denote as the cumulative number of nominal transitions from to during the time interval , which is a random variable with a distribution depending on the single-server policy and initial distribution of configurations.
Note that for any and s.t. , counts the number of times that a type job departs when being in configuration . As a result,
where denotes a unit rate Poisson process. If we take expectation, divide both sides by , and let , we have
(54) |
Similarly, counts the number of times a job in phase transitions to phase when being in configuration for any , s.t. and , so
(55) |
We define transition frequency as follows.
Definition 0 (Transition Frequency).
The transition frequency of type jobs at configuration is the long-run average number of nominal transitions from configuration to per unit time,
(56) |
The transition frequencies allow us to derive the following version of the stationary equation.
Lemma 0 (Stationary Equation).
Under any policy, the stationary distribution and the transition frequency satisfy the following equation:
(57)
for any state , where and are shorthand for .
Proof.
For each configuration , if we look at the difference between the number of nominal transitions into configuration and that out of configuration by time , we have the following equation,
(58)
By Definition 2 and (54)-(55), if we divide both sides of (58) by and let , we get the stationary equation in (57). ∎
Since the stationary equation in (57) is linear in and , we can write it in matrix form:
(59) |
where and are column vectors representing , , and are matrices that make (59) equivalent to (57). Therefore, the following three conditions are necessary for any tuple to be a possible pair of stationary distribution and transition frequencies for a Markovian policy.
(60)
Based on the characterization of stationary and ’s in (60), we can now formulate a linear program . The linear program has decision variables , , and for , where is a factor that scales the throughput of each type of jobs in the direction of .
(61)
subject to
where is the vector form of the cost rate function ; is a -dimensional vector with ones in all entries except those with . In addition to the last three constraints on stationarity, the first constraint of is the resource contention constraint; the second constraint holds because the transition frequency from to is equal to the throughput of type jobs.
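Abstractly, once the configuration space is enumerated, an LP of this shape can be assembled and solved with an off-the-shelf solver. The sketch below shows only the schematic structure, with zero matrices standing in for the stationarity constraints of (59)-(60) and placeholder cost and budget data; it is not the concrete LP of (61), and the constraint linking transition frequencies to throughput is omitted.

```python
import numpy as np
from scipy.optimize import linprog

# Schematic dimensions: n configurations, m transition-frequency variables,
# J job types.  All data below are placeholders, NOT the paper's actual LP.
n, m, J = 4, 6, 2
A = np.zeros((n, n))                  # placeholder stationarity matrix
B = np.zeros((n, m))                  # placeholder (from the stationary equation (59))
g = np.array([0.0, 0.2, 0.5, 1.0])    # contention cost rate per configuration
budget = 0.4

# Decision variables z = (x, y, w): stationary distribution x, transition
# frequencies y, and per-type throughput scaling w in [0, 1].
c = np.concatenate([np.zeros(n + m), -np.ones(J)])   # maximize the sum of w

A_eq = np.zeros((n + 1, n + m + J))
A_eq[:n, :n] = A                      # stationarity: A x + B y = 0
A_eq[:n, n:n + m] = B
A_eq[n, :n] = 1.0                     # x is a probability distribution
b_eq = np.zeros(n + 1)
b_eq[n] = 1.0

A_ub = np.zeros((1, n + m + J))
A_ub[0, :n] = g                       # resource contention budget: g^T x <= budget
b_ub = np.array([budget])

bounds = [(0, None)] * (n + m) + [(0, 1)] * J
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.status, res.fun)
```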
C.2. Policy Construction
In this subsection, we describe a procedure that allows us to construct a policy that achieves the lower bound given by the LP relaxation in (61). Specifically, given a feasible solution to (61), we define an LP-based policy that requests jobs as follows:
-
•
Case 1: When the system enters a configuration with , for each , the policy starts a timer whose duration follows an exponential distribution with rate . The policy requests a type job when the -th timer ticks. When the configuration changes, all timers are canceled.
-
•
Case 2: When the system enters a configuration with and , the policy immediately requests a type job with probability .
-
•
Case 3: When the system enters a configuration with and , the policy does not request any jobs.
We denote the LP-based policy based on the solution as .
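A rough Python rendering of the three cases above, stated in the impulsive view. The functions timer_rate and accept_prob are placeholders standing in for the quantities the policy derives from the LP solution; the paper's exact rate and probability formulas are not reproduced here.

```python
import random

random.seed(1)

# Placeholders standing in for quantities derived from the LP solution;
# the paper's exact formulas are NOT reproduced here.
def timer_rate(u, j):
    """Hypothetical Case-1 timer rate for requesting a type-j job in u."""
    return 1.0

def accept_prob(u, j):
    """Hypothetical Case-2 probability of immediately requesting type j."""
    return 0.5

def on_enter_configuration(u, x_u, job_types):
    """Schematic request behavior of the LP-based policy (impulsive view)
    when the system enters configuration u with stationary mass x_u."""
    if x_u > 0:
        # Case 1: start one exponential timer per job type; a type is
        # requested when its timer ticks.  Timers are canceled if the
        # configuration changes first (not modeled in this sketch).
        return {"timers": {j: timer_rate(u, j) for j in job_types}}
    # Cases 2 and 3: if x_u = 0, either request some types immediately
    # with certain probabilities, or request nothing at all.
    immediate = [j for j in job_types if random.random() < accept_prob(u, j)]
    return {"immediate": immediate}

print(on_enter_configuration(("p1",), 0.2, ["A", "B"]))
print(on_enter_configuration(("p1",), 0.0, ["A", "B"]))
```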
Remark 2.
Note that the definition of the LP-based policy here is stated from a view different from the view in Section 4.2: here each request only adds one job to the server, and one request can happen immediately after another; while in Section 4.2 each request can add multiple jobs to the server, and there is only one request happening at a time. We refer to the view here as the impulsive view, because multiple requests happening at the same time can be thought of as having an infinite request rate. In contrast, we call the view in Section 4.2 the non-impulsive view.
The LP-based policy can be alternatively described using the non-impulsive view if we see multiple requests happening at the same time as one request that adds multiple jobs to the server. More specifically, each reactive request of the LP-based policy is initiated by an internal transition or departure and consists of one or multiple requests of Case 2; each proactive request of the LP-based policy consists of one request in Case 1 and possibly several requests in Case 2.
The following lemma characterizes the steady-state behavior of a single-server system under the LP-based policy.
Lemma 0 (Properties of LP-based Policies).
Consider a single-server system under the LP-based policy , where is a feasible solution to (61). We have that is a stationary distribution under policy , and are the transition frequencies corresponding to .
The proof of Lemma 4 is based on (58), following the same argument as the proof of Lemma 3, as well as an induction argument.
Proof.
Let be a feasible solution to the LP in (61). To show that and are also the actual stationary distribution and transition frequencies, it suffices to show that if the initial distribution follows , i.e.
then under the policy , we have
(63)
(64)
where is the cumulative number of nominal transitions under the policy and the initial distribution .
Our proof is based on the following equation:
(65)
The equation is a straightforward consequence of (58), following the same argument as the proof of Lemma 3.
We prove the following two equations by induction on .
(66)
(67)
We first consider the base case when . In this case, and we have
(68) |
for all . This reduces (65) to
(69)
We now distinguish cases based on whether . If , then by the definition of our policy, for all ,
(70) |
which is (66). Combining the above equation and the stationary equation (57) satisfied by , we conclude that the RHS of (69) is zero, i.e.
which is (67). For the case when and , because the system immediately leaves the configuration after reaching it through a nominal transition,
i.e., the LHS of (69) is . Again we compare (69) against the stationary equation (57) and get
By the definition of our policy, we have
(71) |
which is (66). For the case when and , (57) implies that , , for any . Then (69) is further reduced to
Because and , the LHS of the above expression is non-negative. However, the RHS of the above expression is non-positive. Therefore, both sides are equal to zero, thus we have and .
Having proved the base case, we proceed to the induction step. Suppose we have proved (66) and (67) for all such that for some integer . We consider with . By the induction hypothesis,
(72) |
Then we repeat the arguments after (68) of the base case verbatim. By induction, we have proved (66) and (67).
Therefore, given the policy and initial distribution, the distribution of is stationary, i.e., we always have for all . As a result, an analogue of (66) holds for all : is differentiable with respect to and
for all and all . Therefore,
This completes the proof. ∎
C.3. Proof of Theorem 4
Theorem 4 (Optimality of Single-OPT).
Given an optimal solution to the linear program , we can solve the single-server problem in (3) to achieve an optimal value , using the optimal policy and the optimal stationary distribution . Moreover, the optimal policy is a Markovian policy.
Proof.
By Lemma 4, under the policy , is a stationary distribution, and are the corresponding transition frequencies. Recall the single-server problem
(3 revisited)
subject to
Observe that under the policy , the cost rate of resource contention is and the request rate of type jobs is , so
where we have used the fact that in the first equality. Therefore, is a feasible solution to the single-server problem , achieving the objective value of , which is the optimal value because . ∎
Appendix D Performance guarantee of Join-Requesting-Server with an estimated model
D.1. Assumptions and result
In this section, we consider the performance of JRS when it is based on an estimated model. We state the performance guarantee in terms of the estimation error and give a proof sketch by pointing out which parts of the proof of Theorem 3 need to be changed accordingly.
Consider the setting where the maximal number of jobs on a server , the set of job phases , the cost rate function , and the budget are all known. However, we only have estimates of the jobs’ arrival rates, internal transition rates, and departure rates.
Specifically, for any , let be the true rate of internal transition from phase to phase ; let be the true departure rate of phase ; let be the true arrival rate of type jobs. We let ’s, ’s, and ’s be the estimated internal transition rates, departure rates, and arrival rates, respectively. We assume that there exists a small positive constant that is independent of the scaling factor such that the following assumptions hold.
Assumption 1 (-accurate estimation).
For any ,
(73)
(74)
(75)
Assumption 2 (Scaling of the single-server objective value).
Consider the single-server problem in (3). Let be a feasible solution to the single-server problem with estimated parameters that are -accurate, where for some constant . We assume that there exist constants independent of and such that
Assumption 3 (-insensitivity of the optimal value).
Consider the single-server problem in (3). Let the optimal value of the single-server problem with the estimated parameters be , where the estimated parameters are -accurate; let the optimal value of the single-server problem with true parameters be . We assume that
We also assume that JRS can accurately simulate the virtual jobs.
Assumption 4 (Accurate simulation).
The virtual jobs simulated in JRS follow the true transition dynamics.
Given the above assumptions, we have the following proposition that states the optimality gap of JRS policy with estimated model parameters, which has a similar form as Theorem 3 for JRS under true model parameters.
Proposition 0 (Optimality gap with model estimation).
Consider a stochastic bin-packing problem in service systems with time-varying job resource requirements. Let the infinite-server policy be JRS with subroutine . Suppose is specified based on an estimated model satisfying Assumptions 1, 2, 3, and 4, for such that , where is some positive constant independent of . Let be the objective value achieved by in the single-server problem with estimated parameters. Under , for any initial state, we have
(76)
(77)
where denotes the steady-state configurations of the single-server system under with the estimated parameters. If we let be the optimal policy of the single-server problem with estimated parameters, for any initial state, we have
(78)
(79)
where is a positive constant independent of . In other words, is -optimal.
Remark 3.
Note that even if the single-server system with estimated parameters is -irreducible under , it is hard to guarantee that the original system is -irreducible due to the estimation errors. Consequently, the steady-state performance metrics and could depend on the system’s initial state. Fortunately, our proof of Theorem 3 does not rely on the uniqueness of the stationary distribution, so we can adapt the proof to show the inequalities in Proposition 1 for any initial state.
Remark 4.
Note that Assumption 4 ensures that the real jobs and virtual jobs on a server are indistinguishable, so that is still a Markov chain. Recall that and and denote the observed configuration and tokens for each normal server , respectively (see Section 5.1). We suspect that Assumption 4 can be removed since our proof does not rely heavily on being a Markov chain. However, removing the assumption requires a more careful and notationally heavy analysis. We argue that in practice, this assumption is not restrictive, as one can record the traces of the jobs that arrived in the past and resample virtual jobs from those traces.
D.2. Lemmas
In this section, we give a proof sketch for Proposition 1 when the single-server system under with estimated parameters is -irreducible. The argument for extending to the general case is essentially the same as in Section B.3, so we omit it here.
At a high level, the proof of Proposition 1 is similar to that of Theorem 1. In particular, the key steps of the proof are to show that and as , where , and are i.i.d. copies of steady-state configurations of the single-server system under with the estimated parameters, and is the steady-state configuration of the infinite-server system under .
Proposition 1 is based on the three lemmas stated below. The proof of Proposition 1 using the three lemmas is essentially the same as the argument in Section 5.3.
Lemma 0.
Under the conditions of Proposition 1 and the single-server system with estimated parameters under being -irreducible, for any initial state, we have
Lemma 0.
Under the conditions of Proposition 1 and the single-server system with estimated parameters under being -irreducible, for any initial state and , the steady-state expected number of virtual jobs of type satisfies
Lemma 0.
Under the conditions of Proposition 1 and the single-server system with estimated parameters under being -irreducible, for any initial state and , the steady-state expected number of type jobs on backup servers satisfies
D.3. Proof sketch for Lemma 2
Recall from Section 5.4 that the proof of Lemma 2 is based on Stein’s method, which compares the generator of the i.i.d. copies of the single-server system with the generator of the infinite-server system. To write down the generators with the estimated model, recall that in the single-server system, each transition can be represented by the diagram
where the arrow denotes an internal transition or a departure if ; the arrow denotes a job request that is made right after reaching . For any , let be the set of such that . We define two sets of transition rates as below: for any and ,
-
•
Under the estimated parameters and the policy , we let be the rate of the transition , and let be the total transition rate at configuration .
-
•
Under the true parameters and the policy , we let be the rate of the transition , and let be the total transition rate at configuration .
Let be the generator of i.i.d. copies of single-server systems under the estimated parameters. For any , we have
(80) |
where is a shorthand for . Let be the generator of for the infinite-server system. For any function and , we have
(81) |
To prove Lemma 2, we need to show that for any , , and ,
(82) |
where is the solution to
(83) |
and . As in the proof of Lemma 2, we prove (82) in two steps: the generator comparison step and the Stein factor bound step.
In the generator comparison step, we observe that the formulas of and in (80) and (81) look almost the same as (15) and (17), except that the rates in (81) are instead of . As a result, after carrying out calculations similar to those in the proof of Lemma 2, we get an extra error term involving , which can be bounded using the lemma below. This error term results in the in (82).
Lemma 0.
Under Assumption 1, for any and , we have
(84) |
Proof.
When , and are both the rate of adding jobs via a proactive request under the policy , so they are identical.
When and , is equal to the rate of going from to via an internal transition or departure under the estimated job model, while is under the true job model. Because there are at most jobs and, by our assumption, the estimation error of each job’s transition rates is bounded by , and differ by at most .
When and , is equal to the rate of going from to via an internal transition or departure, multiplied by the probability of adding jobs via a reactive request, under the estimated job model and the policy ; is the same quantity under the true job model instead of the estimated job model, but with the same policy. Because the rate of going from to differs by at most under the two job models, and the probability of adding jobs after going from to is the same under the same policy, and differ by at most . ∎
In the Stein factor bound step, we need to show that
(85) |
This involves analyzing the i.i.d. copies of the single-server system and uses the fact that the single-server system with estimated parameters is -irreducible under . This part of the proof is identical to the corresponding part of the proof of Lemma 2.
D.4. Proof sketch for Lemma 3 and Lemma 4
The proof of Lemma 3 and Lemma 4 has a structure similar to that of the proof of Lemma 3 and Lemma 4. In the first step, we use Little’s law to bound the expectations of the number of type virtual jobs, , and the number of type jobs on backup servers, . We have two equations almost the same as (36) and (37), except that the rates and are replaced by and :
(86)
(87)
where is the maximal expected service time of any type of virtual job or real job. Because , it suffices to bound and .
In the second step, we utilize the fact that the two Lyapunov functions and have zero drift in steady state, where is the total number of type tokens. We get two equalities similar to (38) and (41):
(88) |
(89)
(90)
(91)
In the third step, we use the above two equalities to bound and . We first use the equalities in (89) to (91) to bound . Following the same argument as in the proof of Lemma 3 and Lemma 4 up to (47), we can show that
(92) |
where to get (92), we apply Lemma 2 to replace with , which causes an error. Next, we show that
(93) |
(94)
(95)
These two bounds allow us to replace and on the LHS of (93) with and at the cost of introducing error. Moreover, because are i.i.d., concentrates around its mean with error, where the mean can be shown to be
Note that the first equality in (96) holds because for each , is equal to , i.e., the long-run average rate of requesting type jobs on a single-server system with estimated parameters under ; the second equality in (96) is because , and . As a result,
(96) |
Combining (94) to (96), we get (93). Therefore,
which completes the proof of Lemma 3.