
Task Admission Control and Boundary Analysis of Cognitive Cloud Data Centers

Wenlong Ni1, Yuhong Zhang2, and Wei Li2
1 JiangXi Normal University, 99 ZiYang Ave, NanChang, JiangXi, China 330022
2 Texas Southern University, 3100 Cleburne St, Houston, TX 77004, USA
This work was supported in part by the US National Science Foundation under Grant No. CNS1827940 and the JiangXi Education Department under Grant No. GJJ191688.
Abstract

A novel cloud data center (DC) model with cognitive capabilities is studied here, in which real-time (online) tasks are served alongside batch tasks. The DC determines the cost of using its resources, and an online user or a user with batch tasks decides whether or not to pay for the service. Online service tasks have a higher priority than batch tasks, and both types of tasks require a certain number of virtual machines (VMs). Targeting the maximization of the total discounted reward, the optimal policy for admitting tasks is verified to be a state-related control limit policy. A lower and an upper bound for this optimal policy are then derived, respectively, for estimation and practical use. Finally, a comprehensive set of experiments on various cases is conducted to validate the proposed model and solution. As a demonstration, a machine learning method using a feed-forward neural network model is adopted to show how the optimal values can be obtained. The results achieved in this paper are expected to be useful for operating various cloud data centers with cognitive characteristics in an economically optimal way.

Index Terms:
Cloud Data Center, Cognitive Network, Optimal Strategy, Cost Efficiency, Bandwidth Allocation.

I Introduction

With the rapid development of the Internet and computing technology, more and more data are continually being produced all over the world. Nowadays, cloud computing has become fundamental for IT operations worldwide, replacing traditional business models. Enterprises can now access the vast majority of software and services online through a virtualized environment, avoiding the need for expensive investments in IT infrastructure. On the other hand, challenges in cloud computing need to be addressed, such as security, energy efficiency and automated service provisioning. According to various studies, electricity use by DCs has increased significantly, due largely to explosive growth in both the number and density of data centers (DCs) [1]. To serve the vast and rapidly growing needs of online businesses, infrastructure providers often end up over-provisioning their resources (e.g., the number of physical servers) in the DC. Energy efficiency in DCs is therefore a goal of fundamental importance. Various energy-aware scheduling approaches have been proposed to reduce power and energy consumption in DCs by minimizing the number of active physical servers hosting the active virtual machines (VMs); VMs can also be migrated between different hosts if necessary [2, 3, 4].

There are basically two parties in the cloud computing paradigm: the cloud service providers (CSPs) and the clients, each of whom has its own role in providing and using the computing resources. While enjoying convenient on-demand access to computing resources or services, the clients need to pay for this access. CSPs can make a profit by charging their clients for the services provided. Clients can avoid the costs associated with 'in-house' provisioning of computing resources and at the same time have access to a larger pool of computing resources than they could possibly own by themselves. Many different approaches can be used to evaluate cloud computing service quality, and various optimization methods can be used to optimize it [5, 6, 7, 8, 9, 10, 11].

Note that, in general, there are two types of applications in the DC: service applications and batch applications [12]. Service applications tend to generate many requests with low processing needs, whereas batch applications tend to include a small number of requests with large processing needs. Unlike batch applications, which are throughput sensitive, service applications are typically response-time sensitive. In order to reduce power and energy costs, CSPs can mix online and batch tasks on the same cluster [13]. Although the co-allocation of such tasks improves machine utilization, it challenges the DC scheduler, especially for latency-critical online services.

The problem under investigation in our current research is for a novel cognitive DC model wherein

  1. 1.

    a DC can determine the cost of using resources and a cloud service user can decide whether or not it will pay the price for the resource for an incoming task;

  2. 2.

    the online service tasks have a higher priority than batch tasks;

  3. 3.

    both types of tasks need a certain number of virtual machines (VM) to process.

This paper’s major contributions include:

  1. 1.

In order to achieve the maximum total discounted expected reward for any initial state in a DC where online services have a higher priority than batch tasks, a Markov Decision Process model for a DC serving both online services and batch tasks is first established to obtain the optimal policy on when to admit or reject a task. As far as we know, this is the first time that a cognitive DC model as described in this paper has been modelled in this way, with several major theoretical results obtained. This research also significantly extends our previous work [14], moving from the non-priority treatment of tasks in a regular DC model to the more challenging priority treatment of tasks in a cognitive DC model.

  2. 2.

Through a systematic probability analysis, the optimal policy on when to admit or reject a task is verified to be a state-related control limit policy.

  3. 3.

Furthermore, so that the control limit policy can be used more efficiently in a cognitive DC model, a lower and an upper bound for the optimal policy values are derived theoretically, which are useful for estimation in practice.

  4. 4.

Finally, a comprehensive set of experiments on various cases is conducted to validate the proposed solution. As a demonstration, a machine learning method using a feed-forward neural network model is adopted to show how to obtain the optimal values. The results offered in this paper are expected to be useful for operating various cloud data centers with different cognitive characteristics in an economically optimal way.

II Model Description and Analysis

This section consists of two subsections: one describes the cognitive DC model with all the needed parameters, and the other establishes the corresponding Markov decision process model along with its important components. For brevity, online service tasks are referred to as type-1 tasks and batch tasks as type-2 tasks.

II-A The Description of the cognitive DC model

The cognitive DC under investigation serves two types of application tasks: service tasks (type-1: T1) and batch tasks (type-2: T2). A T1 task acts as a primary user (PU) and a T2 task acts as a secondary user (SU); each requires resources in the DC. A system work flow diagram is drawn in Fig. 1.

Refer to caption
Figure 1: System Work Flow.

The other detailed assumptions for the cognitive DC are given as follows:

  1. 1.

There is a total of C VMs defining the capacity of resources in the DC, and the two types of tasks (T1 and T2) in the system share those C VMs. A T1 task is a time-sensitive service task and needs b VMs for service (here b is a given positive integer); a T2 task is a specific application task and needs one VM for a sequence of operation steps. The T1 task has a higher priority than the T2 task, as explained in the following items. Since the system mainly provides service for T1 tasks, it is always assumed that C = bN1, where N1 is a positive integer.

  2. 2.

The arrival processes of T1 and T2 tasks are Poisson with rates λ1 and λ2, respectively. The processing time of a task on one VM follows a negative exponential distribution with rate μ1 or μ2, respectively. Since a T1 task is processed on b VMs in parallel, its service rate is bμ1.

  3. 3.

    When a T2T_{2} task comes to the system, the CSP will get it processed immediately when there is a VM available. However, if all VMs are busy, the CSP will decide whether to admit or reject the task based on the current number of type-1 and type-2 tasks in the system. The rejected tasks will leave the system. The admitted T2T_{2} task will be put in a buffer waiting for the service whenever a VM is released for use. The waiting discipline in the buffer is ignored because the type-2 tasks in the buffer are indistinguishable. When a T2T_{2} task completes the service, it will leave the system and the corresponding VM will be released for use by other tasks. When a T2T_{2} task under processing is interrupted because of the arrival of a higher priority type-1 task, the interrupted T2T_{2} task will be put in the buffer waiting for a new service whenever a VM is free to use.

  4. 4.

    When a T1T_{1} task comes to the system, let n1n_{1} be the number of T1T_{1} tasks currently in the system; the CSP will take different actions in the following two cases:

    1. (a)

      If n1<N1n_{1}<N_{1}, provide the best service through allocating bb VMs for service, which includes the possibility of interrupting the service of several T2T_{2} Tasks under processing;

    2. (b)

      If n1=N1n_{1}=N_{1}, reject the T1T_{1} task.

  5. 5.

In this paper, we focus on the admission control of T2 tasks. Admitting and then serving a T2 task contributes R units of reward to the CSP. However, if a VM serving a T2 task is preempted by a higher-priority T1 task, there is a cost, say r (r ≥ 0), for that interruption. To hold tasks in the system, the CSP pays a holding price at rate f(n1,n2) to manage the VMs (resources) in the DC when there are n1 type-1 tasks in process and n2 type-2 tasks in process or in the buffer.

It is worth pointing out that some of the notations above, including T1, T2, λ1, λ2, μ1 and μ2, are the same as those in our recent work [14], so that the reader can easily relate this paper to our recent works in this area.

To better present all major given parameters in this paper, we summarize them in Table I below:

TABLE I: A list of major given parameters
C Number of VMs in the DC
λi\lambda_{i} Arrival rate of TiT_{i} (i=1,2i=1,2)
μi\mu_{i} Service rate of TiT_{i} (i=1,2i=1,2)
α\alpha Continuous-time discount factor
bb Number of VMs needed for processing a T1T_{1}
rr Interruption cost of a T2T_{2} by a T1T_{1}
RR Reward of completing a T2T_{2}
f(n1,n2)f(n_{1},n_{2}) Holding cost rate at state (n1,n2)(n_{1},n_{2})

II-B Markov Decision Process Model and its Components Analysis

Our objective in this research is to find the optimal policy such that the total expected discounted reward is maximized. In detail, if we denote by st the state at time t, by at the action taken at state st, and by r(st,at) the reward obtained when action at is selected at state st, our objective is to find an optimal policy πα that attains the maximum total expected discounted reward vαπ(s), defined below, for every initial state s.

vαπ(s)=Esπ{0eαtr(st,at)𝑑t}.v_{\alpha}^{\pi}(s)=E_{s}^{\pi}\bigg{\{}\int_{0}^{\infty}e^{-\alpha t}r(s_{t},a_{t})dt\bigg{\}}. (1)

Here, α is the discount factor, and a policy π specifies the decision rule to be used at every decision epoch; it gives the decision maker a prescription for action selection at any possible future system state or history.

Based on the model description and above objective, we can now establish a Markov decision process as follows:

  1. 1.

Let us define the state space of the system operating process as S = {s : s = (n1,n2)}, where the integers n1 and n2 satisfy N1 ≥ n1 ≥ 0 and n2 ≥ 0. The event space is defined by E = {D1, D2, A1, A2}, where D1 and D2 denote the departure of a T1 and a T2 task from the system after service, while A1 denotes the arrival of a T1 task and A2 the arrival of a T2 task. Since the state transitions depend not only on the number of tasks in the system but also on the departure and arrival events that occur, we define a new state space Ŝ = S × E. A state can then be written generally as ŝ = ⟨s,e⟩ = ⟨(n1,n2),e⟩, where n1 and n2 are the numbers of T1 and T2 tasks, and e stands for the event that may happen at state (n1,n2), e ∈ {D1, D2, A1, A2}. Note that the specification of the event in this paper is one of the major technical differences from paper [15], in which the event is assumed to happen before a state changes.

  2. 2.

Denoting by aC the action to continue, the action space for states ⟨(n1,n2),D1⟩ and ⟨(n1,n2),D2⟩ is then given by

    A(n1,n2),D1={aC},n1>0;\displaystyle A_{\langle(n_{1},n_{2}),D_{1}\rangle}=\{a_{C}\},n_{1}>0;
    A(n1,n2),D2={aC},n2>0.\displaystyle A_{\langle(n_{1},n_{2}),D_{2}\rangle}=\{a_{C}\},n_{2}>0.

Similarly, in states ⟨(n1,n2),A1⟩ and ⟨(n1,n2),A2⟩, if we denote by aR the action to reject the request and by aA the action to admit it, the action space will be

    A(n1,n2),A1={aR,aA};\displaystyle A_{\langle(n_{1},n_{2}),A_{1}\rangle}=\{a_{R},a_{A}\};
    A(n1,n2),A2={aR,aA}.\displaystyle A_{\langle(n_{1},n_{2}),A_{2}\rangle}=\{a_{R},a_{A}\}.
  3. 3.

Let C1(n1) be the number of VMs occupied by T1 tasks, C2(n1,n2) the number of VMs occupied by T2 tasks, and Cv(n1,n2) the number of VMs serving T2 tasks that will be preempted if a T1 task is admitted; these are sometimes simplified as C1, C2 and Cv in this paper (a short code sketch after this list illustrates these quantities). From these definitions, it is easy to see that

    C1(n1)\displaystyle C_{1}(n_{1}) =\displaystyle= bn1,n1N1,\displaystyle bn_{1},n_{1}\leq N_{1},
    C2(n1,n2)\displaystyle C_{2}(n_{1},n_{2}) =\displaystyle= min(CC1(n1),n2),\displaystyle\min{(C-C_{1}(n_{1}),n_{2})},
    Cv(n1,n2)\displaystyle C_{v}(n_{1},n_{2}) =\displaystyle= max(C1+C2+bC,0),n1<N1,\displaystyle\max{(C_{1}+C_{2}+b-C,0)},n_{1}<N_{1},
    Cv(n1,n2)\displaystyle C_{v}(n_{1},n_{2}) =\displaystyle= 0,n1=N1.\displaystyle 0,n_{1}=N_{1}.

The decision epochs are the time points at which a task arrives at or leaves the system. Based on our assumptions, the distribution of the time between two epochs is

    F(t|s^,a)=1eβ(s^,a)t,t0,F(t|\hat{s},a)=1-e^{-\beta(\hat{s},a)t},t\geq 0,

where ŝ = ⟨(n1,n2),e⟩. Denote s = (n1,n2) and β0(s) = λ1 + λ2 + C1μ1 + C2μ2. Since a departure event only happens when there is a task in the system, β(ŝ,a) is given for an action a as

    {β0(s)bμ1,e=D1,a=aC,n1>0,C2=n2β0(s)bμ1+e=D1,a=aC,n1>0,n2>C2,min(n2C2,b)μ2,β0(s)μ2,e=D2,a=aC,C2=n2>0,β0(s),e=D2,a=aC,n2>C2,β0(s)+bμ1,e=A1,a=aA,C1+C2Cb,β0(s)+bμ1e=A1,a=aA,Cvμ2,C1+C2>Cb,n1<N11,λ1+λ2+Cμ1,e=A1,a=aA,n1=N11,β0(s)+μ2,e=A2,a=aA,C1+C2<C,β0(s),e=A2,a=aA,C1+C2=C,β0(s),e={A1,A2},a=aR.\displaystyle\left\{\begin{array}[]{ll}\beta_{0}(s)-b\mu_{1},&e=D_{1},a=a_{C},n_{1}>0,C_{2}=n_{2}\\ \beta_{0}(s)-b\mu_{1}+&e=D_{1},a=a_{C},n_{1}>0,n_{2}>C_{2},\\ \min{(n_{2}-C_{2},b)}\mu_{2},&\\ \beta_{0}(s)-\mu_{2},&e=D_{2},a=a_{C},C_{2}=n_{2}>0,\\ \beta_{0}(s),&e=D_{2},a=a_{C},n_{2}>C_{2},\\ \beta_{0}(s)+b\mu_{1},&e=A_{1},a=a_{A},C_{1}+C_{2}\leq C-b,\\ \beta_{0}(s)+b\mu_{1}-&e=A_{1},a=a_{A},\\ C_{v}\mu_{2},&C_{1}+C_{2}>C-b,n_{1}<N_{1}-1,\\ \lambda_{1}+\lambda_{2}+C\mu_{1},&e=A_{1},a=a_{A},n_{1}=N_{1}-1,\\ \beta_{0}(s)+\mu_{2},&e=A_{2},a=a_{A},C_{1}+C_{2}<C,\\ \beta_{0}(s),&e=A_{2},a=a_{A},C_{1}+C_{2}=C,\\ \beta_{0}(s),&e=\{A_{1},A_{2}\},a=a_{R}.\end{array}\right.
  4. 4.

Let q(j|ŝ,a) denote the probability that the system occupies state j at the next epoch if, at the current epoch, the system is at state ŝ and the decision maker takes action a ∈ Aŝ. For the case of a departure event, e.g. a D1 departure under the condition n1 > 0, (ŝ,a) = (⟨(n1,n2),D1⟩, aC), if we denote sn1 = (n1−1,n2), then we will have q(j|ŝ,a) as

    {λ1/β0(sn1),j=(n11,n2),A1,λ2/β0(sn1),j=(n11,n2),A2,C1(n11)μ1/β0(sn1),j=(n11,n2),D1,C2(n11,n2)μ2/β0(sn1),j=(n11,n2),D2.\displaystyle\left\{\begin{array}[]{ll}\lambda_{1}/\beta_{0}(s_{n_{1}}),&j=\langle(n_{1}-1,n_{2}),A_{1}\rangle,\\ \lambda_{2}/\beta_{0}(s_{n_{1}}),&j=\langle(n_{1}-1,n_{2}),A_{2}\rangle,\\ C_{1}(n_{1}-1)\mu_{1}/\beta_{0}(s_{n_{1}}),&j=\langle(n_{1}-1,n_{2}),D_{1}\rangle,\\ C_{2}(n_{1}-1,n_{2})\mu_{2}/\beta_{0}(s_{n_{1}}),&j=\langle(n_{1}-1,n_{2}),D_{2}\rangle.\\ \end{array}\right.

Similar equations can be derived for the case (ŝ,a) = (⟨(n1,n2),D2⟩, aC). Denoting sn2 = (n1,n2−1) under the condition n2 > 0, we have q(j|ŝ,a) as

    {λ1/β0(sn2),j=(n1,n21),A1,λ2/β0(sn2),j=(n1,n21),A2,C1(n1)μ1/β0(sn2),j=(n1,n21),D1,C2(n1,n21)μ2/β0(sn2),j=(n1,n21),D2.\displaystyle\left\{\begin{array}[]{ll}\lambda_{1}/\beta_{0}(s_{n_{2}}),&j=\langle(n_{1},n_{2}-1),A_{1}\rangle,\\ \lambda_{2}/\beta_{0}(s_{n_{2}}),&j=\langle(n_{1},n_{2}-1),A_{2}\rangle,\\ C_{1}(n_{1})\mu_{1}/\beta_{0}(s_{n_{2}}),&j=\langle(n_{1},n_{2}-1),D_{1}\rangle,\\ C_{2}(n_{1},n_{2}-1)\mu_{2}/\beta_{0}(s_{n_{2}}),&j=\langle(n_{1},n_{2}-1),D_{2}\rangle.\\ \end{array}\right.

For the cases of arrival events, such as (ŝ,a) = (⟨(n1,n2),A1⟩, aA) and (ŝ,a) = (⟨(n1,n2),A2⟩, aA), since the decision on an incoming task changes the system state immediately (adding one task or not), we get q(j|ŝ,a) as

    {q(j|(n1+2,n2),D1,aC),e=A1,a=aA,q(j|(n1+1,n2),D1,aC),e=A1,a=aR,q(j|(n1,n2+2),D2,aC),e=A2,a=aA,q(j|(n1,n2+1),D2,aC),e=A2,a=aR.\displaystyle\left\{\begin{array}[]{ll}q(j|\langle(n_{1}+2,n_{2}),D_{1}\rangle,a_{C}),&e=A_{1},a=a_{A},\\ q(j|\langle(n_{1}+1,n_{2}),D_{1}\rangle,a_{C}),&e=A_{1},a=a_{R},\\ q(j|\langle(n_{1},n_{2}+2),D_{2}\rangle,a_{C}),&e=A_{2},a=a_{A},\\ q(j|\langle(n_{1},n_{2}+1),D_{2}\rangle,a_{C}),&e=A_{2},a=a_{R}.\end{array}\right.
  5. 5.

Because the system state does not change between decision epochs, from Chapter 11.5.2 of [16] and our assumptions, the expected discounted reward between epochs satisfies

    r(s^,a)\displaystyle r(\hat{s},a) =\displaystyle= k(s^,a)+c(s^,a)Es^a{0τ1eαt𝑑t}\displaystyle k(\hat{s},a)+c(\hat{s},a)E_{\hat{s}}^{a}\left\{\int_{0}^{\tau_{1}}e^{-\alpha t}dt\right\}
    =\displaystyle= k(s^,a)+c(s^,a)Es^a{[1eατ1]/α}\displaystyle k(\hat{s},a)+c(\hat{s},a)E_{\hat{s}}^{a}\left\{[1-e^{-\alpha\tau_{1}}]/\alpha\right\}
    =\displaystyle= k(s^,a)+c(s^,a)α+β(s^,a),\displaystyle k(\hat{s},a)+\frac{c(\hat{s},a)}{\alpha+\beta(\hat{s},a)},

    where

    k(s^,a)={0,e={D1,D2},a=aC,0,e={A1,A2},a=aR,Cvr,e=A1,a=aA,R,e=A2,a=aA.\displaystyle k(\hat{s},a)=\left\{\begin{array}[]{cc}0,&e=\{D_{1},D_{2}\},a=a_{C},\\ 0,&e=\{A_{1},A_{2}\},a=a_{R},\\ -C_{v}r,&e=A_{1},a=a_{A},\\ R,&e=A_{2},a=a_{A}.\end{array}\right.

    Also, we have the cost function c(s^,a)c(\hat{s},a) as

    {f(n11,n2),e=D1,a=aC,n1>0,f(n1,n21),e=D2,a=aC,n2>0,f(n1+1,n2),e=A1,a=aA,n1<N1,f(n1,n2+1),e=A2,a=aA,f(n1,n2),e={A1,A2},a=aR.\displaystyle\left\{\begin{array}[]{cl}-f(n_{1}-1,n_{2}),&e=D_{1},a=a_{C},n_{1}>0,\\ -f(n_{1},n_{2}-1),&e=D_{2},a=a_{C},n_{2}>0,\\ -f(n_{1}+1,n_{2}),&e=A_{1},a=a_{A},n_{1}<N_{1},\\ -f(n_{1},n_{2}+1),&e=A_{2},a=a_{A},\\ -f(n_{1},n_{2}),&e=\{A_{1},A_{2}\},a=a_{R}.\end{array}\right.
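To make the above bookkeeping concrete, the following minimal Python sketch (ours, for illustration only; the names C1, C2, Cv, beta0, lam1, lam2, mu1 and mu2 are not from the paper) implements the quantities C1(n1), C2(n1,n2), Cv(n1,n2) and β0(s) exactly as defined in item 3 of this list.

def C1(n1, b):
    """VMs occupied by T1 tasks: each T1 task holds b VMs."""
    return b * n1

def C2(n1, n2, C, b):
    """VMs occupied by T2 tasks: the leftover capacity, capped by n2."""
    return min(C - C1(n1, b), n2)

def Cv(n1, n2, C, b, N1):
    """Number of T2 services preempted if one more T1 task is admitted."""
    if n1 == N1:
        return 0                      # a further T1 arrival is rejected
    return max(C1(n1, b) + C2(n1, n2, C, b) + b - C, 0)

def beta0(n1, n2, C, b, lam1, lam2, mu1, mu2):
    """Total event rate beta_0(s) = lambda_1 + lambda_2 + C1*mu1 + C2*mu2."""
    return lam1 + lam2 + C1(n1, b) * mu1 + C2(n1, n2, C, b) * mu2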

In the next section we will prove that there exists a state-related threshold for accepting the tasks if the cost function has some special properties.

III Optimal Stationary State-related Control Limit Policy

A policy is stationary if the decision rule dt = d is the same at every decision epoch t. Furthermore, a policy is called a control limit policy (or a threshold policy) if, for a given number n1 of T1 tasks in the system, there is a constant (threshold) D(n1) ≥ 0 such that an arriving T2 task is accepted whenever the number n2 of T2 tasks currently in the system does not exceed D(n1); that is, the decision rule for T2 is:

d(n1,n2)={Admit,n2D(n1),Reject,n2>D(n1).\displaystyle d(n_{1},n_{2})=\left\{\begin{array}[]{cc}Admit,&n_{2}\leq D(n_{1}),\\ Reject,&n_{2}>D(n_{1}).\end{array}\right. (10)

III-A Total Discounted Reward

Denote by c = λ1 + λ2 + C·max(μ1,μ2) a constant that is larger than [1−q(ŝ|ŝ,a)]β(ŝ,a). From Chapter 11.5.2 of [16], we know that there is a unique optimal solution to our model and that this solution is a stationary state-related control limit policy satisfying

vαd(s^)=rd(s^)+βd(s^)α+βd(s^)jS^qd(j|s^)vαd(j).v_{\alpha}^{d}(\hat{s})=r_{d}(\hat{s})+\frac{\beta_{d}(\hat{s})}{\alpha+\beta_{d}(\hat{s})}\sum_{j\in\hat{S}}q_{d}(j|\hat{s})v_{\alpha}^{d}(j). (11)

By using the above equation together with the uniformization technique described in [16], we have

v((n1+1,n2),D1)\displaystyle v(\langle(n_{1}+1,n_{2}),D_{1}\rangle) (12)
=\displaystyle= 1α+c[f(n1,n2)\displaystyle\frac{1}{\alpha+c}\Big{[}-f(n_{1},n_{2})
+λ1v((n1,n2),A1)+λ2v((n1,n2),A2)\displaystyle\hskip 14.22636pt+\lambda_{1}v(\langle(n_{1},n_{2}),A_{1}\rangle)+\lambda_{2}v(\langle(n_{1},n_{2}),A_{2}\rangle)
+C1(n1)μ1v((n1,n2),D1)\displaystyle\hskip 14.22636pt+C_{1}(n_{1})\mu_{1}v(\langle(n_{1},n_{2}),D_{1}\rangle)
+C2(n1,n2)μ2v((n1,n2),D2)\displaystyle\hskip 14.22636pt+C_{2}(n_{1},n_{2})\mu_{2}v(\langle(n_{1},n_{2}),D_{2}\rangle)
+(cβ0(n1,n2))v((n1+1,n2),D1)].\displaystyle\hskip 14.22636pt+(c-\beta_{0}(n_{1},n_{2}))v(\langle(n_{1}+1,n_{2}),D_{1}\rangle)\Big{]}.

This means that

v((n1+1,n2),D1)\displaystyle v(\langle(n_{1}+1,n_{2}),D_{1}\rangle) (13)
=\displaystyle= 1α+β0(n1,n2)[f(n1,n2)\displaystyle\frac{1}{\alpha+\beta_{0}(n_{1},n_{2})}\Big{[}-f(n_{1},n_{2})
+λ1v((n1,n2),A1)+λ2v((n1,n2),A2)\displaystyle\hskip 14.22636pt+\lambda_{1}v(\langle(n_{1},n_{2}),A_{1}\rangle)+\lambda_{2}v(\langle(n_{1},n_{2}),A_{2}\rangle)
+C1(n1)μ1v((n1,n2),D1)\displaystyle\hskip 14.22636pt+C_{1}(n_{1})\mu_{1}v(\langle(n_{1},n_{2}),D_{1}\rangle)
+C2(n1,n2)μ2v((n1,n2),D2)].\displaystyle\hskip 14.22636pt+C_{2}(n_{1},n_{2})\mu_{2}v(\langle(n_{1},n_{2}),D_{2}\rangle)\Big{]}.

Similarly, it is easily found that

v((n1+1,n2),D1)=v((n1,n2+1),D2),v(\langle(n_{1}+1,n_{2}),D_{1}\rangle)=v(\langle(n_{1},n_{2}+1),D_{2}\rangle),

which shows the equality between different departure events. This leads us to define a new function X(n1,n2)X(n_{1},n_{2}) as below:

X(n1,n2)\displaystyle X(n_{1},n_{2}) =\displaystyle= v((n1+1,n2),D1)\displaystyle v(\langle(n_{1}+1,n_{2}),D_{1}\rangle) (14)
=\displaystyle= v((n1,n2+1),D2),\displaystyle v(\langle(n_{1},n_{2}+1),D_{2}\rangle), (15)

for any n10n_{1}\geq 0 and n20n_{2}\geq 0.

Notice that X(n1,n2) is related only to the state, not to the event that occurs. This observation will greatly simplify the proofs in the next several sections.

Similar to the above results for a departure event, we can consider an arrival event and obtain the following results:

v((n1,n2),A2,aA)\displaystyle v(\langle(n_{1},n_{2}),A_{2}\rangle,a_{A}) (16)
=\displaystyle= Rα+β0(n1,n2+1)α+c\displaystyle R\frac{\alpha+\beta_{0}(n_{1},n_{2}+1)}{\alpha+c}
+1α+c[f(n1,n2+1)\displaystyle+\frac{1}{\alpha+c}\Big{[}-f(n_{1},n_{2}+1)
+λ1v((n1,n2+1),A1)+λ2v((n1,n2+1),A2)\displaystyle+\lambda_{1}v(\langle(n_{1},n_{2}+1),A_{1}\rangle)+\lambda_{2}v(\langle(n_{1},n_{2}+1),A_{2}\rangle)
+C1(n1)μ1v((n1,n2+1),D1)\displaystyle+C_{1}(n_{1})\mu_{1}v(\langle(n_{1},n_{2}+1),D_{1}\rangle)
+C2(n1,n2+1)μ2v((n1,n2+1),D2)\displaystyle+C_{2}(n_{1},n_{2}+1)\mu_{2}v(\langle(n_{1},n_{2}+1),D_{2}\rangle)
+(cβ0(n1,n2+1))v((n1,n2),A2)],\displaystyle+(c-\beta_{0}(n_{1},n_{2}+1))v(\langle(n_{1},n_{2}),A_{2}\rangle)\Big{]},

and

v((n1,n2),A2,aR)\displaystyle v(\langle(n_{1},n_{2}),A_{2}\rangle,a_{R}) (17)
=\displaystyle= 1α+c[f(n1,n2)+λ1v((n1,n2),A1)\displaystyle\frac{1}{\alpha+c}\Big{[}-f(n_{1},n_{2})+\lambda_{1}v(\langle(n_{1},n_{2}),A_{1}\rangle)
+λ2v((n1,n2),A2)\displaystyle\hskip 14.22636pt+\lambda_{2}v(\langle(n_{1},n_{2}),A_{2}\rangle)
+C1(n1)μ1v((n1,n2),D1)\displaystyle\hskip 14.22636pt+C_{1}(n_{1})\mu_{1}v(\langle(n_{1},n_{2}),D_{1}\rangle)
+C2(n1,n2)μ2v((n1,n2),D2)\displaystyle\hskip 14.22636pt+C_{2}(n_{1},n_{2})\mu_{2}v(\langle(n_{1},n_{2}),D_{2}\rangle)
+(cβ0(n1,n2))v((n1,n2),A2)].\displaystyle\hskip 14.22636pt+(c-\beta_{0}(n_{1},n_{2}))v(\langle(n_{1},n_{2}),A_{2}\rangle)\Big{]}.

From above equations, we can easily get

v((n1,n2),A2,aA)\displaystyle v(\langle(n_{1},n_{2}),A_{2}\rangle,a_{A}) \displaystyle\geq R+X((n1,n2+1)),\displaystyle R+X((n_{1},n_{2}+1)),
v((n1,n2),A2,aR)\displaystyle v(\langle(n_{1},n_{2}),A_{2}\rangle,a_{R}) \displaystyle\geq X((n1,n2)).\displaystyle X((n_{1},n_{2})).

In fact, these two inequalities become equalities when the corresponding action aA or aR, respectively, is the best action. From this analysis, it is not hard to verify that

v((n1,n2),A2)\displaystyle v(\langle(n_{1},n_{2}),A_{2}\rangle) (18)
=\displaystyle= max[X((n1,n2)),R+X((n1,n2+1))].\displaystyle\max\Big{[}X((n_{1},n_{2})),R+X((n_{1},n_{2}+1))\Big{]}.

For the T1T_{1} tasks, as the system always accepts them until all VMs are being used, we have

v((n1,n2),A1)\displaystyle v(\langle(n_{1},n_{2}),A_{1}\rangle) (21)
=\displaystyle= {Cvr+X((n1+1,n2)),n1<N1,X((n1,n2)),n1=N1.\displaystyle\left\{\begin{array}[]{cl}-C_{v}r+X((n_{1}+1,n_{2})),&n_{1}<N_{1},\\ X((n_{1},n_{2})),&n_{1}=N_{1}.\end{array}\right.

III-B Optimal Result

Before providing our major optimal result, we need to introduce a general result as below.

Lemma 1: Let h(i)h(i) (i0i\geq 0) be an integer concave function, and denote by

g(i)max{h(i),R+h(i+1)}i0,g(i)\equiv\max\{h(i),R+h(i+1)\}\,\,\,\,i\geq 0,

for a given constant RR. Then, g(i)g(i) is also an integer concave function for i0i\geq 0.

Proof: Let Δh(i) = h(i+1) − h(i) for any integer i ≥ 0. We prove this lemma by considering the following three cases:

Case 1: Δh(i) ≤ −R for all i ≥ 0. Then g(i) ≡ h(i), so g(i) is concave.

Case 2: Δh(i) ≥ −R for all i ≥ 0. Then g(i) ≡ R + h(i+1), so g(i) is concave.

Case 3: Neither of the above two cases holds. In this case, we first claim that Δh(1) ≥ −R. Indeed, if this were not true, i.e., if Δh(1) ≤ −R, we could inductively verify that Δh(i) ≤ −R for any i by noting that h(i) is an integer concave function, which would put us back in Case 1.

Since Δh(1) ≥ −R, Δh(i) is decreasing by the concavity of h(i), and Case 2 does not hold for all i in this case, there must exist an integer k ≥ 1 such that

Δh(j)\displaystyle\Delta h(j) \displaystyle\geq R,forj=1,2,,k,\displaystyle-R,\,\,\,\,\,{\rm for}\,\,\,j=1,2,...,k,
Δh(k+1)\displaystyle\Delta h(k+1) \displaystyle\leq R.\displaystyle-R.

From this analysis, we will know that for any i0i\geq 0,

g(i)={R+h(i+1),ik,h(i),i>k.\displaystyle g(i)=\left\{\begin{array}[]{cl}R+h(i+1),&i\leq k,\\ h(i),&i>k.\end{array}\right.

The concavity of g(i), i.e., that Δg(i) is nonincreasing in i, is then verified by the concavity of h(i), the definition of k, and the following expression of Δg(i) for any i ≥ 1:

Δg(i)={Δh(i+1),i<k,R,i=k,Δh(i),i>k.\displaystyle\Delta g(i)=\left\{\begin{array}[]{cl}\Delta h(i+1),&i<k,\\ -R,&i=k,\\ \Delta h(i),&i>k.\end{array}\right.

Remark 1: It is worth pointing out that Lemma 1 places no condition on the given constant R, which can be either negative or positive. In fact, we introduced this result in paper [17] for the case when the constant is non-positive, and in paper [14] for the case when the constant is non-negative. In this paper we state this general result without any condition on the constant R and also provide the mathematical verification for the first time.
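As a quick sanity check of Lemma 1 (ours, not part of the proof), the short Python sketch below builds random concave integer functions h, draws a constant R of either sign, forms g(i) = max{h(i), R + h(i+1)}, and verifies numerically that the first differences of g are nonincreasing, i.e., that g is again concave.

import random

def is_concave(seq):
    d = [seq[i + 1] - seq[i] for i in range(len(seq) - 1)]
    return all(d[i + 1] <= d[i] + 1e-12 for i in range(len(d) - 1))

random.seed(0)
for _ in range(1000):
    # build a random concave h by summing nonincreasing increments
    incs = sorted((random.uniform(-3, 3) for _ in range(40)), reverse=True)
    h = [0.0]
    for inc in incs:
        h.append(h[-1] + inc)
    R = random.uniform(-2, 2)      # Lemma 1 places no sign condition on R
    g = [max(h[i], R + h[i + 1]) for i in range(len(h) - 1)]
    assert is_concave(h) and is_concave(g)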

Based on Lemma 1 and the expressions developed before it, we can now state and verify the following major optimality result:

Theorem 1: If f(n1,n2) is a convex and increasing function of n2 for any given n1, then the optimal policy is a control limit policy. That is, for any state (n1,n2), there exists an integer, say N2, such that the decision is

a(n1,n2),A2={aA,ifn2N2,aR,ifn2>N2.\displaystyle a_{\langle(n_{1},n_{2}),A_{2}\rangle}=\left\{\begin{array}[]{ll}a_{A},&\mbox{if}\,\,n_{2}\leq N_{2},\\ a_{R},&\mbox{if}\,\,n_{2}>N_{2}.\\ \end{array}\right. (26)

Proof: If all VMs are busy when an SU arrives at state (n1,n2)(n_{1},n_{2}), we know that C1(n1)+n2CC_{1}(n_{1})+n_{2}\geq C and then

C2(n1,n2)=CC1(n1).C_{2}(n_{1},n_{2})=C-C_{1}(n_{1}).

Therefore

β0(n1,n2+2)=β0(n1,n2+1)=β0(n1,n2)\displaystyle\beta_{0}(n_{1},n_{2}+2)=\beta_{0}(n_{1},n_{2}+1)=\beta_{0}(n_{1},n_{2}) (27)
=\displaystyle= λ1+λ2+C1(n1)μ1+(CC1(n1))μ2,\displaystyle\lambda_{1}+\lambda_{2}+C_{1}(n_{1})\mu_{1}+(C-C_{1}(n_{1}))\mu_{2},

which is independent of n2n_{2}. Further, by using the notation of X((n1,n2))X((n_{1},n_{2})), we can rewrite the equation (13) as below:

X((n1,n2))\displaystyle X((n_{1},n_{2})) (28)
=\displaystyle= 1α+β0(n1,n2)[f(n1,n2)+λ1v((n1,n2),A1)\displaystyle\frac{1}{\alpha+\beta_{0}(n_{1},n_{2})}\Big{[}-f(n_{1},n_{2})+\lambda_{1}v(\langle(n_{1},n_{2}),A_{1}\rangle)
+λ2v((n1,n2),A2)+C1(n1)μ1X((n11,n2))\displaystyle+\lambda_{2}v(\langle(n_{1},n_{2}),A_{2}\rangle)+C_{1}(n_{1})\mu_{1}X((n_{1}-1,n_{2}))
+(CC1(n1))μ2X((n1,n21))].\displaystyle+(C-C_{1}(n_{1}))\mu_{2}X((n_{1},n_{2}-1))\Big{]}.

For any two-dimensional integer function g(n1,n2) (n1 ≥ 0, n2 ≥ 0), we introduce the following first- and second-difference notations with respect to n2:

Δn2g(n1,n2)\displaystyle\Delta_{n_{2}}g(n_{1},n_{2}) =\displaystyle= g(n1,n2+1)g(n1,n2).\displaystyle g(n_{1},n_{2}+1)-g(n_{1},n_{2}). (29)
Δn2(2)g(n1,n2)\displaystyle\Delta^{(2)}_{n_{2}}g(n_{1},n_{2}) =\displaystyle= Δn2g(n1,n2+1)Δn2g(n1,n2).\displaystyle\Delta_{n_{2}}g(n_{1},n_{2}+1)-\Delta_{n_{2}}g(n_{1},n_{2}). (30)

From the observation in equation (27) and the equation (28), we will have

(α+β0(n1,n2+1))Δn2X(n1,n2)\displaystyle\big{(}\alpha+\beta_{0}(n_{1},n_{2}+1)\big{)}\Delta_{n_{2}}X(n_{1},n_{2}) (31)
=\displaystyle= Δn2f(n1,n2)\displaystyle-\Delta_{n_{2}}f(n_{1},n_{2})
+λ1Δn2v((n1,n2),A1)+λ2Δn2v((n1,n2),A2)\displaystyle+\lambda_{1}\Delta_{n_{2}}v(\langle(n_{1},n_{2}),A_{1}\rangle)+\lambda_{2}\Delta_{n_{2}}v(\langle(n_{1},n_{2}),A_{2}\rangle)
+C1(n1)μ1Δn2X(n11,n2)\displaystyle+C_{1}(n_{1})\mu_{1}\Delta_{n_{2}}X(n_{1}-1,n_{2})
+(CC1)μ2Δn2X(n1,n21).\displaystyle+(C-C_{1})\mu_{2}\Delta_{n_{2}}X(n_{1},n_{2}-1).

Next, by noting

Cv(n1,n2+2)=Cv(n1,n2+1)=Cv(n1,n2)\displaystyle C_{v}(n_{1},n_{2}+2)=C_{v}(n_{1},n_{2}+1)=C_{v}(n_{1},n_{2}) (35)
=\displaystyle= {b,n1<N1,0,n1=N1,\displaystyle\begin{array}[]{l}\left\{\begin{array}[]{lc}b,&n_{1}<N_{1},\\ 0,&n_{1}=N_{1},\end{array}\right.\end{array}

and equation (21), we will have

Δn2v((n1,n2),A1)\displaystyle\Delta_{n_{2}}v(\langle(n_{1},n_{2}),A_{1}\rangle) (39)
=\displaystyle= {Δn2X(n1+1,n2),n1<N1,Δn2X(n1,n2),n1=N1.\displaystyle\begin{array}[]{l}\left\{\begin{array}[]{cc}\Delta_{n_{2}}X(n_{1}+1,n_{2}),&n_{1}<N_{1},\\ \Delta_{n_{2}}X(n_{1},n_{2}),&n_{1}=N_{1}.\end{array}\right.\end{array}

By a similar manipulation of the above two equations (31) and (39), using the results in equations (27) and (35), we have

(α+β0(n1,n2+2))Δn2(2)X(n1,n2)\displaystyle(\alpha+\beta_{0}(n_{1},n_{2}+2))\Delta^{(2)}_{n_{2}}X(n_{1},n_{2}) (40)
=\displaystyle= Δn2(2)f(n1,n2)\displaystyle-\Delta^{(2)}_{n_{2}}f(n_{1},n_{2})
+λ1Δn2(2)v((n1,n2),A1)+λ2Δn2(2)v((n1,n2),A2)\displaystyle+\lambda_{1}\Delta^{(2)}_{n_{2}}v(\langle(n_{1},n_{2}),A_{1}\rangle)+\lambda_{2}\Delta^{(2)}_{n_{2}}v(\langle(n_{1},n_{2}),A_{2}\rangle)
+C1(n1)μ1Δn2(2)X(n11,n2)\displaystyle+C_{1}(n_{1})\mu_{1}\Delta^{(2)}_{n_{2}}X(n_{1}-1,n_{2})
+(CC1)μ2Δn2(2)X(n1,n21).\displaystyle+(C-C_{1})\mu_{2}\Delta^{(2)}_{n_{2}}X(n_{1},n_{2}-1).

and

Δn2(2)v((n1,n2),A1)\displaystyle\Delta^{(2)}_{n_{2}}v(\langle(n_{1},n_{2}),A_{1}\rangle) (44)
=\displaystyle= {Δn2(2)X(n1+1,n2),n1<N1,Δn2(2)X(n1,n2),n1=N1.\displaystyle\begin{array}[]{l}\left\{\begin{array}[]{cc}\Delta^{(2)}_{n_{2}}X(n_{1}+1,n_{2}),&n_{1}<N_{1},\\ \Delta^{(2)}_{n_{2}}X(n_{1},n_{2}),&n_{1}=N_{1}.\end{array}\right.\end{array}

With the preparations in equations (28) through (44), we can now use the value iteration method, in three steps, to show that X(n1,n2) is concave and nonincreasing in n2 for any given n1, as below:

Step 1: Set X^(0)(n1,n2) = 0. By noting equations (21) and (18), we know that v^(0)(⟨(n1,n2),A2⟩) = R and

v(0)((n1,n2),A1)={br,n1<N1,0,n1=N1.\displaystyle v^{(0)}(\langle(n_{1},n_{2}),A_{1}\rangle)=\left\{\begin{array}[]{cl}-br,&n_{1}<N_{1},\\ 0,&n_{1}=N_{1}.\end{array}\right.

Substituting these three results into equation (12), we have

X(1)(n1,n2)={f(n1,n2)+λ1(br)+λ2Rα+c,n1<N1,f(n1,n2)+0+λ2Rα+c,n1=N1.\displaystyle X^{(1)}(n_{1},n_{2})=\left\{\begin{array}[]{cl}\frac{-f(n_{1},n_{2})+\lambda_{1}(-br)+\lambda_{2}R}{\alpha+c},&n_{1}<N_{1},\\ \frac{-f(n_{1},n_{2})+0+\lambda_{2}R}{\alpha+c},&n_{1}=N_{1}.\end{array}\right.

Therefore, for any n1n_{1}, X(1)(n1,n2)X^{(1)}(n_{1},n_{2}) is concave and nonincreasing on n2n_{2}.

Step 2: By using the above concavity and monotonicity of X^(1)(n1,n2), and equation (21) for the case when all VMs are busy at state (n1,n2), or equations (39) and (44), we know that v^(1)(⟨(n1,n2),A1⟩) is a concave and nonincreasing function of n2. By further applying the result in Lemma 1, we know that v^(1)(⟨(n1,n2),A2⟩) is also a concave and nonincreasing function of n2. With these results in mind, and using the results in equations (31) and (40), we know that

Δn2X(2)(n1,n2)0,andΔn2(2)X(2)(n1,n2)0.\Delta_{n_{2}}X^{(2)}(n_{1},n_{2})\leq 0,\hskip 14.22636pt{\rm and}\hskip 14.22636pt\Delta^{(2)}_{n_{2}}X^{(2)}(n_{1},n_{2})\leq 0.

These two inequalities justify that for any n1n_{1}, X(2)(n1,n2)X^{(2)}(n_{1},n_{2}) is nonincreasing and concave on n2n_{2}.

Step 3: Finally, by Theorem 11.3.2 of [16], which states that the optimality equation has a unique solution, the value iterates X^(n)(n1,n2) converge to a unique limit. Therefore, letting n go to ∞, we conclude that for any n1, X(n1,n2) is concave and nonincreasing in n2.

Finally, by using equation (18) and the concavity and monotonicity of X(n1,n2) in n2, it is straightforward to see that the optimal policy is a control limit policy as stated in the theorem.

The proof is now completed.

IV Bound Analysis for the Optimal Result

From the above analysis, we know that the discounted optimal policy is a control limit policy if the cost function f(n1,n2) is convex and increasing in n2 for any given n1. However, since identification of the optimal threshold is quite important in practice, determining the corresponding threshold value of the control limit policy, or at least its range, remains a challenging issue once the optimal strategy has been verified. In this section, we take a step toward identifying the range of the optimal value and derive a useful result in terms of a lower bound and an upper bound. We first introduce a general result as below.

Lemma 2: Let hk(i)h_{k}(i) be an integer concave function (k=1,2), and for a constant RR denote by

gk(i)max{hk(i),R+hk(i+1)},k=1,2.g_{k}(i)\equiv\max\{h_{k}(i),R+h_{k}(i+1)\},\hskip 14.22636ptk=1,2.

Then, Δg1(i)Δg2(i)\Delta g_{1}(i)\leq\Delta g_{2}(i) holds if Δh1(i)Δh2(i)\Delta h_{1}(i)\leq\Delta h_{2}(i) for any i0.i\geq 0.

Proof: For any given integer ii, since Δh1(i)Δh2(i)\Delta h_{1}(i)\leq\Delta h_{2}(i), we only need to verify the result is true for the following three cases.

Case 1: If Δh2(i)R\Delta h_{2}(i)\leq-R is true. In this case, since Δgk(i)=Δhk(i)\Delta g_{k}(i)=\Delta h_{k}(i), it is straightforward to know

Δg1(i)=Δh1(i)Δh2(i)=Δg2(i).\Delta g_{1}(i)=\Delta h_{1}(i)\leq\Delta h_{2}(i)=\Delta g_{2}(i).

Case 2: If RΔh1(i)-R\leq\Delta h_{1}(i) is true. In this case, we know gk(i)=R+hk(i+1)g_{k}(i)=R+h_{k}(i+1) (k=1, 2), and then,

Δgk(i)\displaystyle\Delta g_{k}(i) =\displaystyle= gk(i+1)[R+hk(i+1)]\displaystyle g_{k}(i+1)-[R+h_{k}(i+1)]
=\displaystyle= max{R,Δhk(i+1)}.\displaystyle\max\{-R,\Delta h_{k}(i+1)\}.

From the above result, it follows directly that

Δg1(i)Δg2(i),\Delta g_{1}(i)\leq\Delta g_{2}(i),

by noting Δh1(i+1)Δh2(i+1)\Delta h_{1}(i+1)\leq\Delta h_{2}(i+1).

Case 3: If Δh1(i)RΔh2(i)\Delta h_{1}(i)\leq-R\leq\Delta h_{2}(i) is true. In this case, from analysis in above Case 1 and Case 2, we know that Δg1(i)=Δh1(i)\Delta g_{1}(i)=\Delta h_{1}(i), and

Δg2(i)=max{R,Δh2(i+1)}.\Delta g_{2}(i)=\max\{-R,\Delta h_{2}(i+1)\}.

Therefore, Δg1(i)RΔg2(i)\Delta g_{1}(i)\leq-R\leq\Delta g_{2}(i).

The proof is now completed.

With the condition that the cost function f(n1,n2)f(n_{1},n_{2}) is convex and increasing on n2n_{2} for any given n1n_{1}, from Theorem 1, we already know that X(n1,n2)X(n_{1},n_{2}) is concave and decreasing on n2n_{2} for any n1n_{1}, and therefore we will have

Δn2X(n1,n21)\displaystyle\Delta_{n_{2}}X(n_{1},n_{2}-1) \displaystyle\geq Δn2X(n1,n2).\displaystyle\Delta_{n_{2}}X(n_{1},n_{2}). (47)

Also from Lemma 1 and equation (21) and (18), we can also have

Δn2v((n1,n2),A1)\displaystyle\Delta_{n_{2}}v(\langle(n_{1},n_{2}),A_{1}\rangle) \displaystyle\leq Δn2X(n1,n2),\displaystyle\Delta_{n_{2}}X(n_{1},n_{2}),
Δn2v((n1,n2),A2)\displaystyle\Delta_{n_{2}}v(\langle(n_{1},n_{2}),A_{2}\rangle) \displaystyle\leq Δn2X(n1,n2).\displaystyle\Delta_{n_{2}}X(n_{1},n_{2}). (48)

From these results, by noting equation (27), we may rewrite equation (31) as below:

αΔn2X(n1,n2)+Δn2f(n1,n2)\displaystyle\alpha\Delta_{n_{2}}X(n_{1},n_{2})+\Delta_{n_{2}}f(n_{1},n_{2}) (49)
=\displaystyle= λ1[Δn2v((n1,n2),A1)Δn2X(n1,n2)]\displaystyle\hskip 7.11317pt\lambda_{1}\big{[}\Delta_{n_{2}}v(\langle(n_{1},n_{2}),A_{1}\rangle)-\Delta_{n_{2}}X(n_{1},n_{2})\big{]}
+λ2[Δn2v((n1,n2),A2)Δn2X(n1,n2)]\displaystyle+\lambda_{2}\big{[}\Delta_{n_{2}}v(\langle(n_{1},n_{2}),A_{2}\rangle)-\Delta_{n_{2}}X(n_{1},n_{2})\big{]}
+C1(n1)μ1[Δn2X(n11,n2)Δn2X(n1,n2)]\displaystyle+C_{1}(n_{1})\mu_{1}\big{[}\Delta_{n_{2}}X(n_{1}-1,n_{2})-\Delta_{n_{2}}X(n_{1},n_{2})\big{]}
+C2μ2[Δn2X(n1,n21)Δn2X(n1,n2)].\displaystyle+C_{2}\mu_{2}\big{[}\Delta_{n_{2}}X(n_{1},n_{2}-1)-\Delta_{n_{2}}X(n_{1},n_{2})\big{]}.

We also need to introduce a result below before presenting our major boundary result:

Lemma 3: If the cost function f(n1,n2) is a convex and nondecreasing function of n2 for any given n1, and Δn2f(n1,n2) is a nondecreasing function of n1, then Δn2X(n1,n2) is a nonincreasing function of n1.

Proof: The detailed verification is included in Appendix.

With these preparations, we can now state and prove our bound result as below.

Theorem 2: If the cost function f(n1,n2)f(n_{1},n_{2}) is a convex and nondecreasing function on n2n_{2} for any given n1n_{1}, and Δn2f(n1,n2)\Delta_{n_{2}}f(n_{1},n_{2}) is a nondecreasing function on n1n_{1}, then we will have

  1. 1.

    A lower bound of the optimal threshold value N2N_{2} is given by

    N2=max{n2:Δn2f(N1,n2)<αR},N_{2*}=\max\left\{n_{2}:\Delta_{n_{2}}f(N_{1},n_{2})<\alpha R\right\},

    if there exists an integer n2n_{2} such that

    Δn2f(N1,n2)<αR.\Delta_{n_{2}}f(N_{1},n_{2})<\alpha R. (50)
  2. 2.

    An upper bound of the optimal threshold value N2N_{2} is given by

    N2=min{n2:Δn2f(0,n2)\displaystyle N^{*}_{2}=\min\Big{\{}n_{2}:\Delta_{n_{2}}f(0,n_{2})\hskip 102.43008pt
    >(α+min(n2+1,C)μ2)R},\displaystyle\hskip 56.9055pt>(\alpha+\min(n_{2}+1,C)\mu_{2})R\Big{\}},

if there exists an integer n2 such that

    Δn2f(0,n2)>(α+min(n2+1,C)μ2)R.\Delta_{n_{2}}f(0,n_{2})>(\alpha+\min(n_{2}+1,C)\mu_{2})R. (51)

Proof: We will have the following justifications:

  1. 1.

It is intuitive that an integer n2 is a lower bound for the optimal threshold value in Theorem 1 as long as the action at state (n1,n2) is acceptance for an arriving SU. Next, since N1 is the largest possible value of n1, an acceptance action for an arriving SU at state (N1,n2) implies an acceptance action for an arriving SU at state (n1,n2) with n1 ≤ N1. Therefore, the optimal threshold value in Theorem 1 for n1 = N1 is no larger than that for any other n1, and a lower bound established at n1 = N1 is also a lower bound of the optimal threshold value for any state (n1,n2).

To verify the lower bound result in Theorem 2, we only need to verify that the action at state (N1,n2) is acceptance for an arriving SU under the condition in equation (50); or, equivalently, that if the action at state (N1,n2) is rejection, then equation (50) cannot hold. In fact, at state (N1,n2) there is no free VM for either a PU or an SU arrival, which means the system will reject both. By noting equations (21) and (18), if the system rejects the arriving PU and SU at state (N1,n2), we have

    Δn2v((N1,n2),A1)\displaystyle\Delta_{n_{2}}v(\langle(N_{1},n_{2}),A_{1}\rangle) =\displaystyle= Δn2X(N1,n2),\displaystyle\Delta_{n_{2}}X(N_{1},n_{2}),
    Δn2v((N1,n2),A2)\displaystyle\Delta_{n_{2}}v(\langle(N_{1},n_{2}),A_{2}\rangle) =\displaystyle= Δn2X(N1,n2).\displaystyle\Delta_{n_{2}}X(N_{1},n_{2}).

Substituting these results into equation (49), and recalling the concavity of X(N1,n2) in n2 from the proof of Theorem 1 together with the result in Lemma 3, we know that the right-hand side of the equation is non-negative. Therefore, we have

    Δn2f(N1,n2)αΔn2X(N1,n2)αR.\Delta_{n_{2}}f(N_{1},n_{2})\geq-\alpha\Delta_{n_{2}}X(N_{1},n_{2})\geq\alpha R.

The last inequality comes from the rejection of the SU at state (N1,n2) by equation (18). Thus, equation (50) cannot hold, and the verification of the lower bound for any state (n1,n2) is now completed.

  2. 2.

Conversely, an integer n2 is an upper bound for the optimal threshold value in Theorem 1 as long as the action at state (n1,n2) is rejection for an arriving SU. We first consider the case n2 < C, which means there is at least one free VM when an SU arrives. Therefore, to verify the upper bound result in Theorem 2, we only need to verify that the action at state (0,n2) is rejection for an arriving SU; or, equivalently, that if the action at state (0,n2) is acceptance, then equation (51) cannot hold. In fact, at state (0,n2), by noting equations (21) and (18), if the system accepts the arriving SU (T2) at state (0,n2) with n2 < C, we have

    Δn2v((0,n2),A1)\displaystyle\Delta_{n_{2}}v(\langle(0,n_{2}),A_{1}\rangle) \displaystyle\leq Δn2X(0,n2),\displaystyle\Delta_{n_{2}}X(0,n_{2}),
    Δn2v((0,n2),A2)\displaystyle\Delta_{n_{2}}v(\langle(0,n_{2}),A_{2}\rangle) \displaystyle\leq Δn2X(0,n2).\displaystyle\Delta_{n_{2}}X(0,n_{2}).

    Since the action is to accept T2T_{2} arrival, from (18) we have

    Δn2X(0,n21)\displaystyle\Delta_{n_{2}}X(0,n_{2}-1)\geq Δn2X(0,n2)\displaystyle\Delta_{n_{2}}X(0,n_{2})\geq R.\displaystyle-R.

Substituting these results into equation (49), and again recalling the concavity of X(n1,n2) from the proof of Theorem 1 and the result in Lemma 3, we have

    (α+(n2+1)μ2)Δn2X(0,n2)+Δn2f(0,n2)0.(\alpha+(n_{2}+1)\mu_{2})\Delta_{n_{2}}X(0,n_{2})+\Delta_{n_{2}}f(0,n_{2})\leq 0.

    Therefore, we will have

    Δn2f(0,n2)\displaystyle\Delta_{n_{2}}f(0,n_{2}) \displaystyle\leq (α+(n2+1)μ2)Δn2X(0,n2)\displaystyle-(\alpha+(n_{2}+1)\mu_{2})\Delta_{n_{2}}X(0,n_{2})
    \displaystyle\leq (α+(n2+1)μ2)R.\displaystyle(\alpha+(n_{2}+1)\mu_{2})R.

The last inequality comes from the acceptance of the SU at state (0,n2) for n2 < C by equation (18). Thus, equation (51) cannot hold.

For states (0,n2) with n2 ≥ C, which means all VMs are busy when an SU arrives, a similar upper bound condition can be derived from equation (49) as

    Δn2f(0,n2)>(α+Cμ2)R.\Delta_{n_{2}}f(0,n_{2})>(\alpha+C\mu_{2})R.

Since f(0,n2) corresponds to the smallest value of n1, the upper bound for state (0,n2) is also an upper bound for any other state (n1,n2) with n1 > 0. The verification of the upper bound in the theorem is now completed by the justification at the beginning of this item.

Remark 2: More specifically, if equation (50) never holds, i.e., Δn2f(n1,n2)>αR\Delta_{n_{2}}f(n_{1},n_{2})>\alpha R holds for all states, it can be observed, from equation (51) on the case when n2=0n_{2}=0, that the upper bound becomes 0. This means the system will not accept any T2T_{2} arrivals into the system if equation (50) never holds.
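As a quick illustration of Theorem 2, the Python sketch below evaluates the two bounds for a given holding cost; the quadratic cost f(n1,n2) = n1^2 + n2^2 used in Section V and the parameter values in the example call are assumptions chosen only for this illustration, and a returned None means the corresponding condition (50) or (51) never holds.

def delta_f(f, n1, n2):
    # first difference of the holding cost in the n2 direction
    return f(n1, n2 + 1) - f(n1, n2)

def threshold_bounds(f, N1, C, mu2, alpha, R, n2_max=10_000):
    # lower bound: the largest n2 with Delta_{n2} f(N1, n2) < alpha * R
    lower = [n2 for n2 in range(n2_max) if delta_f(f, N1, n2) < alpha * R]
    # upper bound: the smallest n2 with
    # Delta_{n2} f(0, n2) > (alpha + min(n2 + 1, C) * mu2) * R
    upper = [n2 for n2 in range(n2_max)
             if delta_f(f, 0, n2) > (alpha + min(n2 + 1, C) * mu2) * R]
    return (max(lower) if lower else None, min(upper) if upper else None)

f = lambda n1, n2: n1 ** 2 + n2 ** 2
print(threshold_bounds(f, N1=2, C=10, mu2=8.0, alpha=0.1, R=5.0))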

V Numerical Analysis

We have theoretically verified that the optimal policy maximizing the total expected discounted reward in equation (1) is a control limit policy, or a threshold policy, for accepting type-2 (T2) arrivals. Our CTMDP model involves several parameters such as arrival rates, departure rates, rewards and the cost function. Here, for simulation and numerical analysis purposes, we set some parameters as listed in TABLE II. It can be seen from TABLE II that the loads for T1 and T2 tasks are ρ1 = λ1/μ1 = 1/6 and ρ2 = λ2/μ2 = 1/4, which means the system is lightly loaded. The other parameters of the system are set as R = 5, r = 0.5, C = 10, b = 5, the discount factor α = 0.1, and the holding cost function f(x,y) = x^2 + y^2.

TABLE II: Parameter value setting
λ\lambda μ\mu
T1T_{1} 1 6
T2T_{2} 2 8

However, exactly finding this threshold value or the optimal objective value for a given problem is always challenging, especially in a continuously changing environment where arrival rates, service rates and rewards vary in the real world. While we demonstrate in Subsection V-A below how to derive the thresholds by using the value iteration method, the calculation in this way is always time-consuming.

In Subsection V-C, we propose a machine learning method and then demonstrate how to obtain or estimate the threshold value and the optimal objective value by using a feed-forward neural network model. A feed-forward neural network is an artificial neural network in which the connections between the units do not form a cycle.

V-A Threshold Policy

With this parameter setting, using the value iteration method we obtain the following results (an illustrative code sketch that reproduces them is given at the end of this subsection).

TABLE III: X(n1,n2)X(n_{1},n_{2}) Values with Optimal Policy
0 n2n_{2}\rightarrow 5
0 96.53 96.38 96.10 95.69 95.16 94.51
n1n_{1}\downarrow 96.50 96.33 96.03 95.61 95.07 94.40
2 96.43 96.24 95.89 95.38 94.72 93.89
6 n2n_{2}\rightarrow 11
0 93.72 92.80 91.74 90.55 89.22 87.63
n1n_{1}\downarrow 93.51 92.41 91.11 89.61 87.90 85.93
2 92.81 91.49 89.94 88.14 86.11 83.78
12 n2n_{2}\rightarrow 17
0 85.73 83.51 80.95 78.00 74.66 70.89
n1n_{1}\downarrow 83.65 81.03 78.03 74.62 70.79 66.50
2 81.11 78.06 74.61 70.71 66.35 61.51
18 n2n_{2}\rightarrow 23
0 66.68 61.99 56.81 51.11 44.88 38.09
n1n_{1}\downarrow 61.73 56.46 50.68 44.34 37.44 29.94
2 56.17 50.29 43.87 36.87 29.26 21.02

As observed from TABLE III, the X(n1,n2) values are concave and decreasing in the n2 direction, which fits our theoretical result.

TABLE IV: Actions for T2T_{2} task of Optimal Policy
0 n2n_{2}\rightarrow 7
0 1 1 1 1 1 1 1 1
n1n_{1}\downarrow 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1
8 n2n_{2}\rightarrow 15
0 1 1 1 1 1 1 1 1
n1n_{1}\downarrow 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1
16 n2n_{2}\rightarrow 23
0 1 1 1 0 0 0 0 0
n1n_{1}\downarrow 1 1 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0

In TABLE IV, "1" means the system will accept the arrival and "0" means the system will reject it. Since the reward R is large, the table shows that the system will accept T2 tasks into the buffer even when some T2 tasks are already waiting.

Next we decrease the reward to R = 1 and obtain the X(n1,n2) values listed in TABLE V. As seen from TABLE VI, with this smaller reward the system will not accept many T2 tasks into the waiting buffer.

TABLE V: X(n1,n2)X(n_{1},n_{2}) Values with Optimal Policy
0 n2n_{2}\rightarrow 5
0 16.53 16.38 16.10 15.69 15.16 14.51
n1n_{1}\downarrow 16.50 16.33 16.03 15.61 15.07 14.40
2 16.43 16.24 15.89 15.38 14.72 13.89
6 n2n_{2}\rightarrow 11
0 13.72 12.80 11.75 10.57 9.26 7.68
n1n_{1}\downarrow 13.51 12.43 11.14 9.65 7.97 6.03
2 12.83 11.52 9.99 8.22 6.22 3.94
12 n2n_{2}\rightarrow 17
0 5.82 3.64 1.12 -1.76 -5.03 -8.72
n1n_{1}\downarrow 3.79 1.22 -1.72 -5.05 -8.80 -12.99
2 1.32 -1.66 -5.04 -8.85 -13.11 -17.85
TABLE VI: Actions for T2T_{2} task of Optimal Policy
0 n2n_{2}\rightarrow 7
0 1 1 1 1 1 1 1 1
n1n_{1}\downarrow 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1
8 n2n_{2}\rightarrow 15
1 1 1 1 0 0 0 0 0
n1n_{1}\downarrow 1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
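The following minimal Python sketch (our own illustrative reimplementation, not the authors' code) runs value iteration for the uniformized model of Section II with the parameters above; up to the buffer truncation N2MAX it should reproduce values and accept/reject patterns of the form shown in TABLES III-VI, with R = 5 corresponding to TABLES III-IV and R = 1 to TABLES V-VI.

import numpy as np

C, b, N1 = 10, 5, 2
lam1, mu1, alpha, r = 1.0, 6.0, 0.1, 0.5
f = lambda n1, n2: n1 ** 2 + n2 ** 2
N2MAX = 60                                     # truncation of the T2 buffer

def value_iteration(R=5.0, lam2=2.0, mu2=8.0, tol=1e-6, max_iter=200_000):
    """Return X(n1, n2) and the per-n1 admission thresholds D(n1)."""
    c = lam1 + lam2 + C * max(mu1, mu2)        # uniformization constant
    C1 = lambda n1: b * n1
    C2 = lambda n1, n2: min(C - C1(n1), n2)
    Cv = lambda n1: b if n1 < N1 else 0
    beta0 = lambda n1, n2: lam1 + lam2 + C1(n1) * mu1 + C2(n1, n2) * mu2
    X = np.zeros((N1 + 1, N2MAX + 1))
    for _ in range(max_iter):
        Xn = np.empty_like(X)
        for n1 in range(N1 + 1):
            for n2 in range(N2MAX + 1):
                # T1 arrival, eq. (21); T2 arrival, eq. (18), rejected at the edge
                vA1 = -Cv(n1) * r + X[n1 + 1, n2] if n1 < N1 else X[n1, n2]
                vA2 = X[n1, n2] if n2 == N2MAX else max(X[n1, n2], R + X[n1, n2 + 1])
                tot = (-f(n1, n2) + lam1 * vA1 + lam2 * vA2
                       + (c - beta0(n1, n2)) * X[n1, n2])
                if n1 > 0:
                    tot += C1(n1) * mu1 * X[n1 - 1, n2]
                if n2 > 0:
                    tot += C2(n1, n2) * mu2 * X[n1, n2 - 1]
                Xn[n1, n2] = tot / (alpha + c)   # eq. (12) after uniformization
        diff, X = np.max(np.abs(Xn - X)), Xn
        if diff < tol:
            break
    # 1 = admit an arriving T2 task, 0 = reject (cf. TABLE IV)
    admit = (R + X[:, 1:] >= X[:, :-1]).astype(int)
    return X, admit.sum(axis=1) - 1              # thresholds D(n1)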

V-B Threshold to Values

From equation (1), the optimal policy attains the maximum total expected discounted reward from any initial state. Therefore, by using the obtained thresholds as a policy, we can also calculate the corresponding total expected discounted reward from equation (1). Using the rate uniformization technique, the expected time between two epochs is 1/c. The calculation runs until the accumulated discount factor is less than 1E-6, and the expected X(n1,n2) values are shown in TABLES VII and VIII.

TABLE VII: Expected X(n1,n2)X(n_{1},n_{2}) Values of Optimal Policy with R=1R=1
0 n2n_{2}\rightarrow 5
0 16.53 16.38 16.10 15.69 15.16 14.51
n1n_{1}\downarrow 16.50 16.33 16.03 15.61 15.07 14.40
2 16.43 16.24 15.89 15.38 14.72 13.89
6 n2n_{2}\rightarrow 11
0 13.72 12.80 11.75 10.57 9.26 7.68
n1n_{1}\downarrow 13.51 12.43 11.14 9.65 7.97 6.03
2 12.83 11.52 9.99 8.22 6.22 3.94
12 n2n_{2}\rightarrow 17
0 5.82 3.64 1.12 -1.76 -5.03 -8.72
n1n_{1}\downarrow 3.79 1.22 -1.72 -5.05 -8.80 -12.99
2 1.32 -1.66 -5.04 -8.85 -13.11 -17.85
TABLE VIII: Expected X(n1,n2)X(n_{1},n_{2}) Values of Optimal Policy with R=5R=5
0 n2n_{2}\rightarrow 5
0 96.53 96.38 96.10 95.69 95.16 94.51
n1n_{1}\downarrow 96.50 96.33 96.03 95.61 95.07 94.40
2 96.43 96.24 95.89 95.38 94.72 93.89
6 n2n_{2}\rightarrow 11
0 93.72 92.79 91.74 90.55 89.22 87.63
n1n_{1}\downarrow 93.51 92.41 91.11 89.61 87.90 85.93
2 92.81 91.49 89.94 88.14 86.11 83.78
12 n2n_{2}\rightarrow 17
0 85.73 83.51 80.95 78.00 74.66 70.89
n1n_{1}\downarrow 83.65 81.03 78.03 74.62 70.79 66.49
2 81.11 78.06 74.61 70.71 66.35 61.51
18 n2n_{2}\rightarrow 23
0 66.67 61.98 56.81 51.11 44.88 38.09
n1n_{1}\downarrow 61.73 56.46 50.67 44.34 37.44 29.94
2 56.17 50.29 43.87 36.86 29.26 21.02

Comparing TABLES VII and VIII with TABLES III and V, we see that the differences between these tables are very small, which confirms the correctness of the calculation by the value iteration method in the last subsection.
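A sketch of the fixed-policy calculation behind TABLES VII and VIII is given below (ours, reusing the constants, helper definitions and numpy import from the value-iteration sketch at the end of Subsection V-A): the same uniformized recursion is iterated, but the maximization over actions for a T2 arrival is replaced by the action prescribed by a given threshold table D according to decision rule (10).

def evaluate_thresholds(D, R=5.0, lam2=2.0, mu2=8.0, tol=1e-6, max_iter=200_000):
    c = lam1 + lam2 + C * max(mu1, mu2)
    C1 = lambda n1: b * n1
    C2 = lambda n1, n2: min(C - C1(n1), n2)
    Cv = lambda n1: b if n1 < N1 else 0
    beta0 = lambda n1, n2: lam1 + lam2 + C1(n1) * mu1 + C2(n1, n2) * mu2
    X = np.zeros((N1 + 1, N2MAX + 1))
    for _ in range(max_iter):
        Xn = np.empty_like(X)
        for n1 in range(N1 + 1):
            for n2 in range(N2MAX + 1):
                vA1 = -Cv(n1) * r + X[n1 + 1, n2] if n1 < N1 else X[n1, n2]
                admit = n2 < N2MAX and n2 <= D[n1]      # decision rule (10)
                vA2 = R + X[n1, n2 + 1] if admit else X[n1, n2]
                tot = (-f(n1, n2) + lam1 * vA1 + lam2 * vA2
                       + (c - beta0(n1, n2)) * X[n1, n2])
                if n1 > 0:
                    tot += C1(n1) * mu1 * X[n1 - 1, n2]
                if n2 > 0:
                    tot += C2(n1, n2) * mu2 * X[n1, n2 - 1]
                Xn[n1, n2] = tot / (alpha + c)
        diff, X = np.max(np.abs(Xn - X)), Xn
        if diff < tol:
            break
    return X

For example, _, D = value_iteration(R=5.0) followed by evaluate_thresholds(D, R=5.0) evaluates the thresholds found by value iteration.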

V-C Threshold Estimation with Machine Learning

Taking the parameters of the CTMDP model as the input and the corresponding threshold values as the output, we can build the neural network model shown in Fig. 2. Based on this development, we build the training data set shown in TABLE IX, where the reward R is chosen from 1, 2, ..., 8, λ2 is chosen from 1, 2, ..., 5, and μ2 is chosen from 8, 10, 12, 14, 16, while all other parameters are kept unchanged.

The combination of these parameters gives us 8×5×5 = 200 training data inputs in total. More specifically, as shown in Fig. 2, this is a two-layer neural network with 30 hidden neurons. The inputs are the set of parameters in our CTMDP model, such as the reward, arrival rates and departure rates, which makes the total number of input parameters 5; the outputs are the threshold values for each n1, in this case 3 outputs, as reflected in Fig. 2.

TABLE IX: Training Dataset for Neural Network Model
RR 1, 2, 3, 4, 5, 6, 7, 8
λ2\lambda_{2} 1,  2,  3,  4,  5
μ2\mu_{2} 8,  10,  12,  14,  16
Refer to caption
Figure 2: Neural Network Model Created With Matlab
TABLE X: Thresholds comparison from Neural Network to Actual Value
R=1.3R=1.3 2.3 3.3 4.3
0 (11, 11) (13, 13) (16, 16) (18, 18)
n1n_{1}\downarrow (9, 9) (12, 12) (14, 15) (17, 17)
2 (8, 7) (11, 11) (13, 14) (16, 16)
R=5.3R=5.3 6.3 7.3 8.3
0 (20, 20) (22, 22) (23, 24) (25, 25)
n1n_{1}\downarrow (19, 18) (21, 20) (22, 22) (24, 24)
2 (18, 18) (19, 20) (21, 21) (23, 23)

After the model is trained, we can test it with parameter settings different from those in the initial training dataset. Denoting by nr the real threshold values from the value iteration method and by nm the threshold values from the machine learning method, several parameter sets of the form (R, λ2=1, μ2=8) are chosen to compare the threshold pairs (nr, nm) from the actual computation and the neural network model, as shown in TABLE X. From TABLE X, it is observed that the machine learning model provides a good estimate of the thresholds.
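As a rough Python analogue of this pipeline (ours; the paper's network was built in Matlab, and for brevity this sketch feeds the network only the three varied parameters R, λ2 and μ2 rather than all five inputs described above), the threshold labels can be generated with the value_iteration sketch from Subsection V-A and fitted with a single hidden layer of 30 neurons using scikit-learn.

import itertools
import numpy as np
from sklearn.neural_network import MLPRegressor

# training grid of TABLE IX; labelling all 8*5*5 = 200 points with value
# iteration is slow but only needs to be done once
grid = list(itertools.product(range(1, 9),          # R        in 1..8
                              range(1, 6),          # lambda_2 in 1..5
                              range(8, 17, 2)))     # mu_2     in {8,...,16}
X_train = np.array(grid, dtype=float)
y_train = np.array([value_iteration(R=R, lam2=l2, mu2=m2)[1]
                    for R, l2, m2 in grid])

net = MLPRegressor(hidden_layer_sizes=(30,), max_iter=5000, random_state=0)
net.fit(X_train, y_train)

# estimate the three per-n1 thresholds for an unseen setting, e.g. R = 4.3
print(np.rint(net.predict([[4.3, 1.0, 8.0]])))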

VI Conclusion and Discussion

In order to reduce energy costs, cloud providers have started to mix online and batch tasks in the same data center. However, it is always challenging to efficiently optimize the assignment of suitable resources to the users. To address this challenging problem, a novel model of a cloud data center with cognitive capabilities for real-time (online) tasks, alongside batch tasks, is studied in this paper. Here, a DC can determine the cost of using resources, and an online user or a user with batch tasks may decide whether or not to pay for the service. In particular, the online service tasks have a higher priority than the batch tasks, and both types of tasks need a certain number of virtual machines (VMs). The objective is to maximize the total discounted reward. The optimal policy for admitting tasks to reach this objective is verified to be a state-related control limit policy. Even after the optimal policy is characterized in general, it is hard to identify the range of the threshold values, which is particularly useful in practice. Therefore, we further consider the possible boundaries of the optimal value, and a lower and an upper bound for the optimal policy are derived, respectively, for estimation purposes. Finally, a comprehensive set of experiments on various cases is conducted to validate the proposed solution. As a demonstration, a machine learning method using a feed-forward neural network model is adopted to show how to obtain the optimal values. It is our observation that the major idea of this research can be extended to maximizing the expected average reward or other related objective functions. The results offered in this paper are expected to be useful for operating various cloud data centers with different cognitive characteristics in an economically optimal way.

[Verification of Lemma 3] We now provide the verification of Lemma 3. Similarly to equation (31), for $n_{1}<N_{1}$, we have

$$
\begin{aligned}
\big(\alpha+\beta_{0}(n_{1}+1,n_{2}+1)\big)\,\Delta_{n_{2}}X(n_{1}+1,n_{2})
={}& -\Delta_{n_{2}}f(n_{1}+1,n_{2}) \\
&+\lambda_{1}\,\Delta_{n_{2}}v(\langle(n_{1}+1,n_{2}),A_{1}\rangle)
 +\lambda_{2}\,\Delta_{n_{2}}v(\langle(n_{1}+1,n_{2}),A_{2}\rangle) \\
&+C_{1}(n_{1}+1)\,\mu_{1}\,\Delta_{n_{2}}X(n_{1},n_{2})
 +\big(C-C_{1}(n_{1}+1)\big)\,\mu_{2}\,\Delta_{n_{2}}X(n_{1}+1,n_{2}-1). \qquad (52)
\end{aligned}
$$

By a similar manipulation of equations (31) and (52), for $n_{1}<N_{1}$, we have

$$
\begin{aligned}
&\big(\alpha+\beta_{0}(n_{1}+1,n_{2}+1)\big)\,\Delta^{(2)}_{n_{1},n_{2}}X(n_{1},n_{2}) \\
={}& \big(\alpha+\beta_{0}(n_{1}+1,n_{2}+1)\big)\,\Delta_{n_{2}}X(n_{1}+1,n_{2})
 -\big(\alpha+\beta_{0}(n_{1},n_{2}+1)+b\mu_{1}-b\mu_{2}\big)\,\Delta_{n_{2}}X(n_{1},n_{2}) \\
={}& -\Delta^{(2)}_{n_{1},n_{2}}f(n_{1},n_{2})
 +\lambda_{1}\,\Delta^{(2)}_{n_{1},n_{2}}v(\langle(n_{1},n_{2}),A_{1}\rangle)
 +\lambda_{2}\,\Delta^{(2)}_{n_{1},n_{2}}v(\langle(n_{1},n_{2}),A_{2}\rangle) \\
&+C_{1}(n_{1}+1)\,\mu_{1}\,\Delta_{n_{2}}X(n_{1},n_{2})
 -C_{1}(n_{1})\,\mu_{1}\,\Delta_{n_{2}}X(n_{1}-1,n_{2}) \\
&+C_{2}(n_{1}+1,n_{2})\,\mu_{2}\,\Delta_{n_{2}}X(n_{1}+1,n_{2}-1)
 -C_{2}(n_{1},n_{2})\,\mu_{2}\,\Delta_{n_{2}}X(n_{1},n_{2}-1) \\
&+b\mu_{2}\,\Delta_{n_{2}}X(n_{1},n_{2})-b\mu_{1}\,\Delta_{n_{2}}X(n_{1},n_{2}) \\
={}& -\Delta^{(2)}_{n_{1},n_{2}}f(n_{1},n_{2})
 +\lambda_{1}\,\Delta^{(2)}_{n_{1},n_{2}}v(\langle(n_{1},n_{2}),A_{1}\rangle)
 +\lambda_{2}\,\Delta^{(2)}_{n_{1},n_{2}}v(\langle(n_{1},n_{2}),A_{2}\rangle) \\
&+C_{1}(n_{1})\,\mu_{1}\,\Delta^{(2)}_{n_{1},n_{2}}X(n_{1}-1,n_{2})
 +C_{2}(n_{1}+1,n_{2})\,\mu_{2}\,\Delta^{(2)}_{n_{1},n_{2}}X(n_{1},n_{2}-1) \\
&+b\mu_{2}\,\Delta^{(2)}_{n_{2}}X(n_{1},n_{2}-1). \qquad (53)
\end{aligned}
$$

We can now use the Value Iteration method, in three steps, to show that $\Delta_{n_{2}}X(n_{1},n_{2})$ is a nonincreasing function of $n_{1}$ for any given $n_{2}$, as follows:

Step A-1: Similarly to the analysis in Theorem 1, setting $X^{(0)}(n_{1},n_{2})=0$, we have

$$\Delta_{n_{2}}X^{(1)}(n_{1},n_{2})=-\frac{\Delta_{n_{2}}f(n_{1},n_{2})}{\alpha+c}.$$

Since $\Delta_{n_{2}}f(n_{1}+1,n_{2})\geq\Delta_{n_{2}}f(n_{1},n_{2})$, we also have

$$
\begin{aligned}
&\Delta_{n_{2}}X^{(1)}(n_{1}+1,n_{2})-\Delta_{n_{2}}X^{(1)}(n_{1},n_{2}) \\
={}& \frac{\Delta_{n_{2}}f(n_{1},n_{2})-\Delta_{n_{2}}f(n_{1}+1,n_{2})}{\alpha+c} \\
\leq{}& 0.
\end{aligned}
$$

Step A-2: Using the result of Step A-1 above and equation (39), we know that

$$\Delta_{n_{2}}v^{(1)}(\langle(n_{1}+1,n_{2}),A_{1}\rangle)\leq\Delta_{n_{2}}v^{(1)}(\langle(n_{1},n_{2}),A_{1}\rangle).$$

From the result and verification process of Theorem 1, with the cost function being convex and nondecreasing, i.e., $\Delta_{n_{2}}f(n_{1},n_{2}+1)\geq\Delta_{n_{2}}f(n_{1},n_{2})$, we know that $X^{(1)}(n_{1},n_{2})$ is concave and nonincreasing in $n_{2}$ for any $n_{1}$, which means

$$
\begin{aligned}
\Delta_{n_{2}}X^{(1)}(n_{1},n_{2}) &\leq \Delta_{n_{2}}X^{(1)}(n_{1},n_{2}-1), \\
\Delta_{n_{2}}X^{(1)}(n_{1}+1,n_{2}) &\leq \Delta_{n_{2}}X^{(1)}(n_{1}+1,n_{2}-1).
\end{aligned}
$$

To further apply the result of Lemma 2, let $h_{1}(i)=X^{(1)}(n_{1}+1,i)$ and $h_{2}(i)=X^{(1)}(n_{1},i)$. Since $\Delta_{n_{2}}X^{(1)}(n_{1}+1,n_{2})\leq\Delta_{n_{2}}X^{(1)}(n_{1},n_{2})$ for any $n_{2}$, using equation (18) we know that

$$\Delta_{n_{2}}v^{(1)}(\langle(n_{1}+1,n_{2}),A_{2}\rangle)\leq\Delta_{n_{2}}v^{(1)}(\langle(n_{1},n_{2}),A_{2}\rangle).$$

By using equation (53), we then have

$$\Delta_{n_{2}}X^{(2)}(n_{1}+1,n_{2})-\Delta_{n_{2}}X^{(2)}(n_{1},n_{2})\leq 0.$$

Step A-3: Finally, by Theorem 11.3.2 of [16], the optimality equation has a unique solution, so the value iteration $X^{(n)}(n_{1},n_{2})$ converges to it. Therefore, as the iteration continues and $n$ goes to $\infty$, we know that for any $n_{1}<N_{1}$,

$$\Delta_{n_{2}}X(n_{1}+1,n_{2})\leq\Delta_{n_{2}}X(n_{1},n_{2}),$$

always holds.

The verification of Lemma 3 is now complete.
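The convergence argument above can also be checked numerically. The sketch below is not the paper's code: it runs a uniformized value-iteration recursion on a small truncated state space with an assumed convex holding cost, assumed rates and rewards, and an assumed VM requirement $b$ per online task, and then tests whether $\Delta_{n_{2}}X(n_{1},n_{2})$ is nonincreasing in $n_{1}$.

```python
import numpy as np

# Illustrative parameters only; f, the rates, and the capacities are assumptions.
N1, N2 = 3, 30                # truncated state space
lam1, lam2 = 2.0, 1.0         # arrival rates of online / batch tasks
mu1, mu2 = 10.0, 8.0          # per-VM service rates
C, b = 40, 2                  # total VMs and VMs per online task
alpha = 0.1                   # discount rate
R1, R2 = 5.0, 1.3             # admission rewards

def f(n1, n2):                # convex, nondecreasing holding cost
    return 0.01 * (b * n1 + n2) ** 2

c = lam1 + lam2 + C * max(mu1, mu2)      # uniformization constant

X = np.zeros((N1 + 1, N2 + 1))
for _ in range(3000):                    # value iteration
    Xn = np.empty_like(X)
    for n1 in range(N1 + 1):
        for n2 in range(N2 + 1):
            C1 = b * n1                  # VMs busy with online tasks
            C2 = min(n2, C - C1)         # VMs usable by batch tasks
            # admission decisions: accept only if reward + new value >= current value
            vA1 = max(X[n1 + 1, n2] + R1, X[n1, n2]) if n1 < N1 else X[n1, n2]
            vA2 = max(X[n1, n2 + 1] + R2, X[n1, n2]) if n2 < N2 else X[n1, n2]
            out = lam1 + lam2 + C1 * mu1 + C2 * mu2
            Xn[n1, n2] = (-f(n1, n2)
                          + lam1 * vA1 + lam2 * vA2
                          + C1 * mu1 * X[max(n1 - 1, 0), n2]
                          + C2 * mu2 * X[n1, max(n2 - 1, 0)]
                          + (c - out) * X[n1, n2]) / (alpha + c)
    X = Xn

D = X[:, 1:] - X[:, :-1]                 # Delta_{n2} X(n1, n2)
print("nonincreasing in n1:", bool(np.all(np.diff(D, axis=0) <= 1e-8)))
```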

References

  • [1] A. Shehabi, S. Smith, D. Sartor, R. Brown, M. Herrlin, J. Koomey, E. Masanet, N. Horner, I. Azevedo, and W. Lintner, “United states data center energy usage report,” Jun 2016.
  • [2] C. Canali, R. Lancellotti, and M. Shojafar, “A computation- and network-aware energy optimization model for virtual machines allocation,” in Proceedings of the 7th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, INSTICC.   SciTePress, 2017, pp. 71–81.
  • [3] J. Fu, B. Moran, J. Guo, E. W. M. Wong, and M. Zukerman, “Asymptotically optimal job assignment for energy-efficient processor-sharing server farms,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 12, pp. 4008–4023, Dec 2016.
  • [4] C. Canali, L. Chiaraviglio, R. Lancellotti, and M. Shojafar, “Joint minimization of the energy costs from computing, data transmission, and migrations in cloud data centers,” IEEE Transactions on Green Communications and Networking, vol. 2, no. 2, pp. 580–595, Jun 2018.
  • [5] L. Wang, F. Zhang, J. A. Aroca, A. V. Vasilakos, K. Zheng, C. Hou, D. Li, and Z. Liu, “Greendcn: A general framework for achieving energy efficiency in data center networks,” IEEE Journal on Selected Areas in Communications, vol. 32, no. 1, pp. 4–15, Jan 2014.
  • [6] J. Liu, Y. Zhang, Y. Zhou, D. Zhang, and H. Liu, “Aggressive resource provisioning for ensuring qos in virtualized environments,” IEEE Transactions on Cloud Computing, vol. 3, no. 2, pp. 119–131, Apr 2015.
  • [7] M. Alicherry and T. Lakshman, “Optimizing data access latencies in cloud systems by intelligent virtual machine placement,” in Proceedings - IEEE INFOCOM, Apr 2013, pp. 647–655.
  • [8] R. Cohen, L. Lewin-Eytan, J. (Seffi) Naor, and D. Raz, “Almost optimal virtual machine placement for traffic intense data centers,” in Proceedings - IEEE INFOCOM, Apr 2013, pp. 355–359.
  • [9] J. Mei, K. Li, Z. Tong, Q. Li, and K. Li, “Profit maximization for cloud brokers in cloud computing,” IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 1, pp. 190–203, Jan 2019.
  • [10] V. D. Valerio, V. Cardellini, and F. L. Presti, “Optimal pricing and service provisioning strategies in cloud systems: A stackelberg game approach,” in 2013 IEEE Sixth International Conference on Cloud Computing, Jun 2013, pp. 115–122.
  • [11] J. He, D. Wu, Y. Zeng, X. Hei, and Y. Wen, “Toward optimal deployment of cloud-assisted video distribution services,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 23, pp. 1717–1728, Oct 2013.
  • [12] L. A. Barroso, J. Clidaras, and U. Hoelzle, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.   Morgan & Claypool, 2013.
  • [13] C. Jiang, G. Han, J. Lin, G. Jia, W. Shi, and J. Wan, “Characteristics of co-allocated online services and batch jobs in internet data centers: A case study from alibaba cloud,” IEEE Access, vol. 7, pp. 22 495–22 508, Feb 2019.
  • [14] W. Ni, Y. Zhang, and W. W. Li, “An optimal strategy for resource utilization in cloud data centers,” IEEE Access, vol. 7, pp. 158 095–158 112, Oct 2019.
  • [15] W. Ni, W. Li, and M. Alam, “Determination of optimal call admission control policy in wireless networks,” IEEE Transactions on Wireless Communications, vol. 8, pp. 1038 – 1044, Mar 2009.
  • [16] M. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Mar 2005.
  • [17] X. Chao, H. Chen, and W. Li, “Optimal control for a tandem network of queues with blocking,” Acta Mathematicae Applicatae Sinica, vol. 13, no. 4, pp. 425–437, Oct 1997. [Online]. Available: https://doi.org/10.1007/BF02009552