
Task Admission Control and Boundary Analysis of Cognitive Cloud Data Centers

Wenlong Ni1, Yuhong Zhang2, and Wei Li2
1 JiangXi Normal University, 99 ZiYang Ave, NanChang, JiangXi, China 330022
2 Texas Southern University, 3100 Cleburne St, Houston, TX 77004, USA
This work was supported in part by the US National Science Foundation under Grant No. CNS1827940 and the JiangXi Education Department under Grant No. GJJ191688.
Abstract

A novel cloud data center (DC) model with cognitive capabilities is studied here, in which real-time (online) tasks are served alongside batch tasks. The DC determines the cost of using its resources, and an online user or a user with batch tasks decides whether or not to pay for the service. Online service tasks have a higher priority than batch tasks, and both types of tasks require a certain number of virtual machines (VMs). Targeting the maximization of the total discounted reward, the optimal policy for admitting tasks is verified to be a state-related control limit policy. A lower and an upper bound for this optimal policy are then derived, respectively, for estimation and practical use. Finally, a comprehensive set of experiments on various cases is conducted to validate the proposed model and solution. As a demonstration, a machine learning method using a feed-forward neural network model is adopted to show how the optimal values can be obtained. The results achieved in this paper are expected to be useful for operating various cloud data centers with cognitive characteristics in an economically optimal way.

Index Terms:
Cloud Data Center, Cognitive Network, Optimal Strategy, Cost Efficiency, Bandwidth Allocation.

I Introduction

With the rapid development of the Internet and computing technology, more and more data are continually being produced all over the world. Nowadays, cloud computing has become fundamental for IT operations worldwide, replacing traditional business models. Enterprises can now access the vast majority of software and services online through a virtualized environment, avoiding the need for expensive investments in IT infrastructure. On the other hand, challenges in cloud computing need to be addressed, such as security, energy efficiency and automated service provisioning. According to various studies, electricity use by DCs has increased significantly, due largely to explosive growth in both the number and density of data centers (DCs) [1]. To serve the vast and rapidly growing needs of online businesses, infrastructure providers often end up over-provisioning their resources (e.g., the number of physical servers) in the DC. Energy efficiency in DCs is therefore a goal of fundamental importance. Various energy-aware scheduling approaches have been proposed to reduce power and energy consumption in DCs by minimizing the number of active physical servers hosting the active virtual machines (VMs); VMs can also be migrated between different hosts if necessary [2, 3, 4].

There are basically two parties in the cloud computing paradigm: the cloud service providers (CSPs) and the clients, each of whom has its own role in providing and using the computing resources. While enjoying convenient on-demand access to computing resources or services, the clients need to pay for this access. CSPs can make a profit by charging their clients for the services provided. Clients can avoid the costs associated with 'in-house' provisioning of computing resources and at the same time have access to a larger pool of computing resources than they could possibly own by themselves. Many different approaches can be used to evaluate cloud computing service quality, and various optimization methods can be used to optimize it [5, 6, 7, 8, 9, 10, 11].

Note that, in general, there are two types of applications in the DC: service applications and batch applications [12]. Service applications tend to generate many requests with low processing needs, whereas batch applications tend to include a small number of requests with large processing needs. Unlike batch applications, which are throughput sensitive, service applications are typically response-time sensitive. In order to reduce power and energy costs, CSPs can mix online and batch tasks on the same cluster [13]. Although the co-allocation of such tasks improves machine utilization, it challenges the DC scheduler, especially for latency-critical online services.

The problem under investigation in our current research is for a novel cognitive DC model wherein

  1. 1.

    a DC can determine the cost of using resources and a cloud service user can decide whether or not it will pay the price for the resource for an incoming task;

  2. 2.

    the online service tasks have a higher priority than batch tasks;

  3. 3.

    both types of tasks need a certain number of virtual machines (VM) to process.

This paper’s major contributions include:

  1. 1.

In order to achieve the maximum total discounted expected reward for any initial state in a DC where online services have a higher priority than batch tasks, a Markov Decision Process model for a DC serving both online services and batch tasks is first established to obtain the optimal policy on when to admit or reject a task. As far as we know, this is the first time that a cognitive DC model as described in this paper has been modelled in this way, with several major theoretical results obtained. This research also significantly extends our previous work [14], moving from the non-priority treatment of tasks in a regular DC model to the more challenging priority treatment of tasks in a cognitive DC model.

  2. 2.

Through a systematic probability analysis, the optimal policy on when to admit or reject a task is verified to be a state-related control limit policy.

  3. 3.

Furthermore, so that the control limit policy can be used more efficiently in a cognitive DC model, a lower and an upper bound for the optimal policy values are derived theoretically, which are useful for estimation in practice.

  4. 4.

Finally, a comprehensive set of experiments on various cases is conducted to validate the proposed solution. As a demonstration, a machine learning method using a feed-forward neural network model is adopted to show how to obtain the optimal values. The results offered in this paper are expected to be useful for operating various cloud data centers with different cognitive characteristics in an economically optimal way.

II Model Description and Analysis

This section consists of two subsections: one describes the cognitive DC model with all the needed parameters, and the other establishes the corresponding Markov decision process model along with its important components. For brevity, online service tasks are referred to as type-1 tasks and batch tasks as type-2 tasks.

II-A The Description of the cognitive DC model

The cognitive DC under investigation serves two types of application tasks: service tasks (type-1: T1) and batch tasks (type-2: T2). A T1 task acts as a primary user (PU) and a T2 task acts as a secondary user (SU); each requires resources in the DC. A system work flow diagram is drawn in Fig. 1.

Refer to caption
Figure 1: System Work Flow.

The other detailed assumptions for the cognitive DC are given as follows:

  1. 1.

There is a total of C VMs defining the capacity of resources in the DC, and the two types of tasks (T1 and T2) in the system share those C VMs. A T1 task is a time-sensitive service task and needs b VMs for service (here b is a given positive integer); a T2 task is a specific application task and needs one VM for a sequence of operation steps. The T1 task has a higher priority than the T2 task, as explained in the following items. Since the system mainly provides service for T1 tasks, it is always assumed that C = bN1, where N1 is a positive integer.

  2. 2.

The arrival processes of T1 and T2 tasks are Poisson with rates λ1 and λ2, respectively. The processing time of a task on one VM follows a negative exponential distribution with rate μ1 or μ2, respectively. Since a T1 task is processed on b VMs in parallel, its service rate is bμ1.

  3. 3.

    When a T2T_{2} task comes to the system, the CSP will get it processed immediately when there is a VM available. However, if all VMs are busy, the CSP will decide whether to admit or reject the task based on the current number of type-1 and type-2 tasks in the system. The rejected tasks will leave the system. The admitted T2T_{2} task will be put in a buffer waiting for the service whenever a VM is released for use. The waiting discipline in the buffer is ignored because the type-2 tasks in the buffer are indistinguishable. When a T2T_{2} task completes the service, it will leave the system and the corresponding VM will be released for use by other tasks. When a T2T_{2} task under processing is interrupted because of the arrival of a higher priority type-1 task, the interrupted T2T_{2} task will be put in the buffer waiting for a new service whenever a VM is free to use.

  4. 4.

    When a T1T_{1} task comes to the system, let n1n_{1} be the number of T1T_{1} tasks currently in the system; the CSP will take different actions in the following two cases:

    1. (a)

      If n1<N1n_{1}<N_{1}, provide the best service through allocating bb VMs for service, which includes the possibility of interrupting the service of several T2T_{2} Tasks under processing;

    2. (b)

      If n1=N1n_{1}=N_{1}, reject the T1T_{1} task.

  5. 5.

In this paper, we focus on the admission control of T2 tasks. Admitting and then serving a T2 task contributes R units of reward to the CSP. However, if a VM serving a T2 task is preempted by a higher-priority T1 task, there is a cost, say r (r ≥ 0), for that interruption. To hold tasks in the system, the CSP pays a holding price at rate f(n1,n2) to manage the VMs (resources) in the DC when there are n1 type-1 tasks in process and n2 type-2 tasks in process or in the buffer.

It is worth pointing out that some of the notations above, including T1, T2, λ1, λ2, μ1 and μ2, are the same as those in our recent work [14], so that the reader can easily relate this paper to our recent works in this area.

To better present all major given parameters in this paper, we summarize them in Table I below:

TABLE I: A list of major given parameters
C Number of VMs in the DC
λi\lambda_{i} Arrival rate of TiT_{i} (i=1,2i=1,2)
μi\mu_{i} Service rate of TiT_{i} (i=1,2i=1,2)
α\alpha Continuous-time discount factor
bb Number of VMs needed for processing a T1T_{1}
rr Interruption cost of a T2T_{2} by a T1T_{1}
RR Reward of completing a T2T_{2}
f(n1,n2)f(n_{1},n_{2}) Holding cost rate at state (n1,n2)(n_{1},n_{2})

II-B Markov Decision Process Model and its Components Analysis

Our objective in this research is to find the optimal policy such that the total expected discounted reward is maximized. In detail, if we denote by st the state at time t, by at the action taken at state st, and by r(st,at) the reward obtained when action at is selected at state st, our objective is to find an optimal policy πα that attains the maximum total expected discounted reward vαπ(s), defined below, for every initial state s.

vαπ(s)=Esπ{0eαtr(st,at)𝑑t}.v_{\alpha}^{\pi}(s)=E_{s}^{\pi}\bigg{\{}\int_{0}^{\infty}e^{-\alpha t}r(s_{t},a_{t})dt\bigg{\}}. (1)

Here, α is the discount factor, and a policy π specifies the decision rule to be used at every decision epoch; it gives the decision maker a prescription for action selection at any possible future system state or history.

Based on the model description and above objective, we can now establish a Markov decision process as follows:

  1. 1.

Let us define the state space of the system operating process as S = {s : s = (n1,n2)}, where the integers n1 and n2 satisfy N1 ≥ n1 ≥ 0 and n2 ≥ 0. The event space is defined by E = {D1, D2, A1, A2}, where D1 and D2 denote the departure of a T1 and a T2 task from the system after service, while A1 denotes the arrival of a T1 task and A2 the arrival of a T2 task. Since the state transitions depend not only on the number of tasks in the system but also on the departure and arrival events that occur, we define a new state space Ŝ = S × E. A state can then be written generally as ŝ = ⟨s,e⟩ = ⟨(n1,n2),e⟩, where n1 and n2 are the numbers of T1 and T2 tasks, and e stands for the event that may happen at state (n1,n2), e ∈ {D1, D2, A1, A2}. Note that the specification of the event in this paper is one of the major technical differences from paper [15], in which the event is assumed to happen before a state changes.

  2. 2.

Denoting by aC the action to continue, the action space for states ⟨(n1,n2),D1⟩ and ⟨(n1,n2),D2⟩ is then given by

    A(n1,n2),D1={aC},n1>0;\displaystyle A_{\langle(n_{1},n_{2}),D_{1}\rangle}=\{a_{C}\},n_{1}>0;
    A(n1,n2),D2={aC},n2>0.\displaystyle A_{\langle(n_{1},n_{2}),D_{2}\rangle}=\{a_{C}\},n_{2}>0.

Similarly, in states ⟨(n1,n2),A1⟩ and ⟨(n1,n2),A2⟩, if we denote by aR the action to reject the request and by aA the action to admit it, the action space will be

    A(n1,n2),A1={aR,aA};\displaystyle A_{\langle(n_{1},n_{2}),A_{1}\rangle}=\{a_{R},a_{A}\};
    A(n1,n2),A2={aR,aA}.\displaystyle A_{\langle(n_{1},n_{2}),A_{2}\rangle}=\{a_{R},a_{A}\}.
  3. 3.

Let C1(n1) be the number of VMs occupied by T1 tasks, C2(n1,n2) the number of VMs occupied by T2 tasks, and Cv(n1,n2) the number of VMs serving T2 tasks that will be preempted if a T1 task is admitted; these are sometimes simplified as C1, C2 and Cv in this paper (a short code sketch after this list illustrates these quantities). From these definitions, it is easy to see that

    C1(n1)\displaystyle C_{1}(n_{1}) =\displaystyle= bn1,n1N1,\displaystyle bn_{1},n_{1}\leq N_{1},
    C2(n1,n2)\displaystyle C_{2}(n_{1},n_{2}) =\displaystyle= min(CC1(n1),n2),\displaystyle\min{(C-C_{1}(n_{1}),n_{2})},
    Cv(n1,n2)\displaystyle C_{v}(n_{1},n_{2}) =\displaystyle= max(C1+C2+bC,0),n1<N1,\displaystyle\max{(C_{1}+C_{2}+b-C,0)},n_{1}<N_{1},
    Cv(n1,n2)\displaystyle C_{v}(n_{1},n_{2}) =\displaystyle= 0,n1=N1.\displaystyle 0,n_{1}=N_{1}.

The decision epochs are the time points at which a task arrives at or leaves the system. Based on our assumptions, the distribution of the time between two epochs is

    F(t|s^,a)=1eβ(s^,a)t,t0,F(t|\hat{s},a)=1-e^{-\beta(\hat{s},a)t},t\geq 0,

where ŝ = ⟨(n1,n2),e⟩. Denote s = (n1,n2) and β0(s) = λ1 + λ2 + C1μ1 + C2μ2. Since a departure event only happens when there is a task in the system, β(ŝ,a) is given for an action a as

    {β0(s)bμ1,e=D1,a=aC,n1>0,C2=n2β0(s)bμ1+e=D1,a=aC,n1>0,n2>C2,min(n2C2,b)μ2,β0(s)μ2,e=D2,a=aC,C2=n2>0,β0(s),e=D2,a=aC,n2>C2,β0(s)+bμ1,e=A1,a=aA,C1+C2Cb,β0(s)+bμ1e=A1,a=aA,Cvμ2,C1+C2>Cb,n1<N11,λ1+λ2+Cμ1,e=A1,a=aA,n1=N11,β0(s)+μ2,e=A2,a=aA,C1+C2<C,β0(s),e=A2,a=aA,C1+C2=C,β0(s),e={A1,A2},a=aR.\displaystyle\left\{\begin{array}[]{ll}\beta_{0}(s)-b\mu_{1},&e=D_{1},a=a_{C},n_{1}>0,C_{2}=n_{2}\\ \beta_{0}(s)-b\mu_{1}+&e=D_{1},a=a_{C},n_{1}>0,n_{2}>C_{2},\\ \min{(n_{2}-C_{2},b)}\mu_{2},&\\ \beta_{0}(s)-\mu_{2},&e=D_{2},a=a_{C},C_{2}=n_{2}>0,\\ \beta_{0}(s),&e=D_{2},a=a_{C},n_{2}>C_{2},\\ \beta_{0}(s)+b\mu_{1},&e=A_{1},a=a_{A},C_{1}+C_{2}\leq C-b,\\ \beta_{0}(s)+b\mu_{1}-&e=A_{1},a=a_{A},\\ C_{v}\mu_{2},&C_{1}+C_{2}>C-b,n_{1}<N_{1}-1,\\ \lambda_{1}+\lambda_{2}+C\mu_{1},&e=A_{1},a=a_{A},n_{1}=N_{1}-1,\\ \beta_{0}(s)+\mu_{2},&e=A_{2},a=a_{A},C_{1}+C_{2}<C,\\ \beta_{0}(s),&e=A_{2},a=a_{A},C_{1}+C_{2}=C,\\ \beta_{0}(s),&e=\{A_{1},A_{2}\},a=a_{R}.\end{array}\right.
  4. 4.

Let q(j|ŝ,a) denote the probability that the system occupies state j at the next epoch if, at the current epoch, the system is at state ŝ and the decision maker takes action a ∈ Aŝ. For the case of a departure event, e.g. a D1 departure under the condition n1 > 0, (ŝ,a) = (⟨(n1,n2),D1⟩, aC), if we denote sn1 = (n1−1,n2), then we will have q(j|ŝ,a) as

    {λ1/β0(sn1),j=(n11,n2),A1,λ2/β0(sn1),j=(n11,n2),A2,C1(n11)μ1/β0(sn1),j=(n11,n2),D1,C2(n11,n2)μ2/β0(sn1),j=(n11,n2),D2.\displaystyle\left\{\begin{array}[]{ll}\lambda_{1}/\beta_{0}(s_{n_{1}}),&j=\langle(n_{1}-1,n_{2}),A_{1}\rangle,\\ \lambda_{2}/\beta_{0}(s_{n_{1}}),&j=\langle(n_{1}-1,n_{2}),A_{2}\rangle,\\ C_{1}(n_{1}-1)\mu_{1}/\beta_{0}(s_{n_{1}}),&j=\langle(n_{1}-1,n_{2}),D_{1}\rangle,\\ C_{2}(n_{1}-1,n_{2})\mu_{2}/\beta_{0}(s_{n_{1}}),&j=\langle(n_{1}-1,n_{2}),D_{2}\rangle.\\ \end{array}\right.

Similar equations can be derived for the case (ŝ,a) = (⟨(n1,n2),D2⟩, aC). Denoting sn2 = (n1,n2−1) under the condition n2 > 0, we have q(j|ŝ,a) as

    {λ1/β0(sn2),j=(n1,n21),A1,λ2/β0(sn2),j=(n1,n21),A2,C1(n1)μ1/β0(sn2),j=(n1,n21),D1,C2(n1,n21)μ2/β0(sn2),j=(n1,n21),D2.\displaystyle\left\{\begin{array}[]{ll}\lambda_{1}/\beta_{0}(s_{n_{2}}),&j=\langle(n_{1},n_{2}-1),A_{1}\rangle,\\ \lambda_{2}/\beta_{0}(s_{n_{2}}),&j=\langle(n_{1},n_{2}-1),A_{2}\rangle,\\ C_{1}(n_{1})\mu_{1}/\beta_{0}(s_{n_{2}}),&j=\langle(n_{1},n_{2}-1),D_{1}\rangle,\\ C_{2}(n_{1},n_{2}-1)\mu_{2}/\beta_{0}(s_{n_{2}}),&j=\langle(n_{1},n_{2}-1),D_{2}\rangle.\\ \end{array}\right.

For the cases of arrival events, such as (ŝ,a) = (⟨(n1,n2),A1⟩, aA) and (ŝ,a) = (⟨(n1,n2),A2⟩, aA), since the decision on an incoming task changes the system state immediately (adding one task or not), we get q(j|ŝ,a) as

    {q(j|(n1+2,n2),D1,aC),e=A1,a=aA,q(j|(n1+1,n2),D1,aC),e=A1,a=aR,q(j|(n1,n2+2),D2,aC),e=A2,a=aA,q(j|(n1,n2+1),D2,aC),e=A2,a=aR.\displaystyle\left\{\begin{array}[]{ll}q(j|\langle(n_{1}+2,n_{2}),D_{1}\rangle,a_{C}),&e=A_{1},a=a_{A},\\ q(j|\langle(n_{1}+1,n_{2}),D_{1}\rangle,a_{C}),&e=A_{1},a=a_{R},\\ q(j|\langle(n_{1},n_{2}+2),D_{2}\rangle,a_{C}),&e=A_{2},a=a_{A},\\ q(j|\langle(n_{1},n_{2}+1),D_{2}\rangle,a_{C}),&e=A_{2},a=a_{R}.\end{array}\right.
  5. 5.

Because the system state does not change between decision epochs, from Chapter 11.5.2 of [16] and our assumptions, the expected discounted reward between epochs satisfies

    r(s^,a)\displaystyle r(\hat{s},a) =\displaystyle= k(s^,a)+c(s^,a)Es^a{0τ1eαt𝑑t}\displaystyle k(\hat{s},a)+c(\hat{s},a)E_{\hat{s}}^{a}\left\{\int_{0}^{\tau_{1}}e^{-\alpha t}dt\right\}
    =\displaystyle= k(s^,a)+c(s^,a)Es^a{[1eατ1]/α}\displaystyle k(\hat{s},a)+c(\hat{s},a)E_{\hat{s}}^{a}\left\{[1-e^{-\alpha\tau_{1}}]/\alpha\right\}
    =\displaystyle= k(s^,a)+c(s^,a)α+β(s^,a),\displaystyle k(\hat{s},a)+\frac{c(\hat{s},a)}{\alpha+\beta(\hat{s},a)},

    where

    k(s^,a)={0,e={D1,D2},a=aC,0,e={A1,A2},a=aR,Cvr,e=A1,a=aA,R,e=A2,a=aA.\displaystyle k(\hat{s},a)=\left\{\begin{array}[]{cc}0,&e=\{D_{1},D_{2}\},a=a_{C},\\ 0,&e=\{A_{1},A_{2}\},a=a_{R},\\ -C_{v}r,&e=A_{1},a=a_{A},\\ R,&e=A_{2},a=a_{A}.\end{array}\right.

    Also, we have the cost function c(s^,a)c(\hat{s},a) as

    {f(n11,n2),e=D1,a=aC,n1>0,f(n1,n21),e=D2,a=aC,n2>0,f(n1+1,n2),e=A1,a=aA,n1<N1,f(n1,n2+1),e=A2,a=aA,f(n1,n2),e={A1,A2},a=aR.\displaystyle\left\{\begin{array}[]{cl}-f(n_{1}-1,n_{2}),&e=D_{1},a=a_{C},n_{1}>0,\\ -f(n_{1},n_{2}-1),&e=D_{2},a=a_{C},n_{2}>0,\\ -f(n_{1}+1,n_{2}),&e=A_{1},a=a_{A},n_{1}<N_{1},\\ -f(n_{1},n_{2}+1),&e=A_{2},a=a_{A},\\ -f(n_{1},n_{2}),&e=\{A_{1},A_{2}\},a=a_{R}.\end{array}\right.
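To make the above bookkeeping concrete, the following minimal Python sketch (ours, for illustration only; the names C1, C2, Cv, beta0, lam1, lam2, mu1 and mu2 are not from the paper) implements the quantities C1(n1), C2(n1,n2), Cv(n1,n2) and β0(s) exactly as defined in item 3 of this list.

def C1(n1, b):
    """VMs occupied by T1 tasks: each T1 task holds b VMs."""
    return b * n1

def C2(n1, n2, C, b):
    """VMs occupied by T2 tasks: the leftover capacity, capped by n2."""
    return min(C - C1(n1, b), n2)

def Cv(n1, n2, C, b, N1):
    """Number of T2 services preempted if one more T1 task is admitted."""
    if n1 == N1:
        return 0                      # a further T1 arrival is rejected
    return max(C1(n1, b) + C2(n1, n2, C, b) + b - C, 0)

def beta0(n1, n2, C, b, lam1, lam2, mu1, mu2):
    """Total event rate beta_0(s) = lambda_1 + lambda_2 + C1*mu1 + C2*mu2."""
    return lam1 + lam2 + C1(n1, b) * mu1 + C2(n1, n2, C, b) * mu2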

In the next section we will prove that there exists a state-related threshold for accepting the tasks if the cost function has some special properties.

III Optimal Stationary State-related Control Limit Policy

A policy is stationary if the decision rule dt = d is the same at every decision epoch t. Furthermore, a policy is called a control limit policy (or a threshold policy) if, for a given number n1 of T1 tasks in the system, there is a constant (threshold) D(n1) ≥ 0 such that an arriving T2 task is accepted whenever the number n2 of T2 tasks currently in the system does not exceed D(n1); that is, the decision rule for T2 is:

d(n1,n2)={Admit,n2D(n1),Reject,n2>D(n1).\displaystyle d(n_{1},n_{2})=\left\{\begin{array}[]{cc}Admit,&n_{2}\leq D(n_{1}),\\ Reject,&n_{2}>D(n_{1}).\end{array}\right. (10)

III-A Total Discounted Reward

Denote by c = λ1 + λ2 + C·max(μ1,μ2) a constant that is larger than [1−q(ŝ|ŝ,a)]β(ŝ,a). From Chapter 11.5.2 of [16], we know that there is a unique optimal solution to our model and that this solution is a stationary state-related control limit policy satisfying

vαd(s^)=rd(s^)+βd(s^)α+βd(s^)jS^qd(j|s^)vαd(j).v_{\alpha}^{d}(\hat{s})=r_{d}(\hat{s})+\frac{\beta_{d}(\hat{s})}{\alpha+\beta_{d}(\hat{s})}\sum_{j\in\hat{S}}q_{d}(j|\hat{s})v_{\alpha}^{d}(j). (11)

By using the above equation together with the uniformization technique described in [16], we have

v((n1+1,n2),D1)\displaystyle v(\langle(n_{1}+1,n_{2}),D_{1}\rangle) (12)
=\displaystyle= 1α+c[f(n1,n2)\displaystyle\frac{1}{\alpha+c}\Big{[}-f(n_{1},n_{2})
+λ1v((n1,n2),A1)+λ2v((n1,n2),A2)\displaystyle\hskip 14.22636pt+\lambda_{1}v(\langle(n_{1},n_{2}),A_{1}\rangle)+\lambda_{2}v(\langle(n_{1},n_{2}),A_{2}\rangle)
+C1(n1)μ1v((n1,n2),D1)\displaystyle\hskip 14.22636pt+C_{1}(n_{1})\mu_{1}v(\langle(n_{1},n_{2}),D_{1}\rangle)
+C2(n1,n2)μ2v((n1,n2),D2)\displaystyle\hskip 14.22636pt+C_{2}(n_{1},n_{2})\mu_{2}v(\langle(n_{1},n_{2}),D_{2}\rangle)
+(cβ0(n1,n2))v((n1+1,n2),D1)].\displaystyle\hskip 14.22636pt+(c-\beta_{0}(n_{1},n_{2}))v(\langle(n_{1}+1,n_{2}),D_{1}\rangle)\Big{]}.

This means that

v((n1+1,n2),D1)\displaystyle v(\langle(n_{1}+1,n_{2}),D_{1}\rangle) (13)
=\displaystyle= 1α+β0(n1,n2)[f(n1,n2)\displaystyle\frac{1}{\alpha+\beta_{0}(n_{1},n_{2})}\Big{[}-f(n_{1},n_{2})
+λ1v((n1,n2),A1)+λ2v((n1,n2),A2)\displaystyle\hskip 14.22636pt+\lambda_{1}v(\langle(n_{1},n_{2}),A_{1}\rangle)+\lambda_{2}v(\langle(n_{1},n_{2}),A_{2}\rangle)
+C1(n1)μ1v((n1,n2),D1)\displaystyle\hskip 14.22636pt+C_{1}(n_{1})\mu_{1}v(\langle(n_{1},n_{2}),D_{1}\rangle)
+C2(n1,n2)μ2v((n1,n2),D2)].\displaystyle\hskip 14.22636pt+C_{2}(n_{1},n_{2})\mu_{2}v(\langle(n_{1},n_{2}),D_{2}\rangle)\Big{]}.

Similarly, it is easily found that

v((n1+1,n2),D1)=v((n1,n2+1),D2),v(\langle(n_{1}+1,n_{2}),D_{1}\rangle)=v(\langle(n_{1},n_{2}+1),D_{2}\rangle),

which shows the equality between different departure events. This leads us to define a new function X(n1,n2)X(n_{1},n_{2}) as below:

X(n1,n2)\displaystyle X(n_{1},n_{2}) =\displaystyle= v((n1+1,n2),D1)\displaystyle v(\langle(n_{1}+1,n_{2}),D_{1}\rangle) (14)
=\displaystyle= v((n1,n2+1),D2),\displaystyle v(\langle(n_{1},n_{2}+1),D_{2}\rangle), (15)

for any n10n_{1}\geq 0 and n20n_{2}\geq 0.

Notice that X(n1,n2) is related only to the state, not to the event that occurs. This observation will greatly simplify the proofs in the next several sections.

Similar to the above results for a departure event, we can consider an arrival event and obtain the following results:

v((n1,n2),A2,aA)\displaystyle v(\langle(n_{1},n_{2}),A_{2}\rangle,a_{A}) (16)
=\displaystyle= Rα+β0(n1,n2+1)α+c\displaystyle R\frac{\alpha+\beta_{0}(n_{1},n_{2}+1)}{\alpha+c}
+1α+c[f(n1,n2+1)\displaystyle+\frac{1}{\alpha+c}\Big{[}-f(n_{1},n_{2}+1)
+λ1v((n1,n2+1),A1)+λ2v((n1,n2+1),A2)\displaystyle+\lambda_{1}v(\langle(n_{1},n_{2}+1),A_{1}\rangle)+\lambda_{2}v(\langle(n_{1},n_{2}+1),A_{2}\rangle)
+C1(n1)μ1v((n1,n2+1),D1)\displaystyle+C_{1}(n_{1})\mu_{1}v(\langle(n_{1},n_{2}+1),D_{1}\rangle)
+C2(n1,n2+1)μ2v((n1,n2+1),D2)\displaystyle+C_{2}(n_{1},n_{2}+1)\mu_{2}v(\langle(n_{1},n_{2}+1),D_{2}\rangle)
+(cβ0(n1,n2+1))v((n1,n2),A2)],\displaystyle+(c-\beta_{0}(n_{1},n_{2}+1))v(\langle(n_{1},n_{2}),A_{2}\rangle)\Big{]},

and

v((n1,n2),A2,aR)\displaystyle v(\langle(n_{1},n_{2}),A_{2}\rangle,a_{R}) (17)
=\displaystyle= 1α+c[f(n1,n2)+λ1v((n1,n2),A1)\displaystyle\frac{1}{\alpha+c}\Big{[}-f(n_{1},n_{2})+\lambda_{1}v(\langle(n_{1},n_{2}),A_{1}\rangle)
+λ2v((n1,n2),A2)\displaystyle\hskip 14.22636pt+\lambda_{2}v(\langle(n_{1},n_{2}),A_{2}\rangle)
+C1(n1)μ1v((n1,n2),D1)\displaystyle\hskip 14.22636pt+C_{1}(n_{1})\mu_{1}v(\langle(n_{1},n_{2}),D_{1}\rangle)
+C2(n1,n2)μ2v((n1,n2),D2)\displaystyle\hskip 14.22636pt+C_{2}(n_{1},n_{2})\mu_{2}v(\langle(n_{1},n_{2}),D_{2}\rangle)
+(cβ0(n1,n2))v((n1,n2),A2)].\displaystyle\hskip 14.22636pt+(c-\beta_{0}(n_{1},n_{2}))v(\langle(n_{1},n_{2}),A_{2}\rangle)\Big{]}.

From above equations, we can easily get

v((n1,n2),A2,aA)\displaystyle v(\langle(n_{1},n_{2}),A_{2}\rangle,a_{A}) \displaystyle\geq R+X((n1,n2+1)),\displaystyle R+X((n_{1},n_{2}+1)),
v((n1,n2),A2,aR)\displaystyle v(\langle(n_{1},n_{2}),A_{2}\rangle,a_{R}) \displaystyle\geq X((n1,n2)).\displaystyle X((n_{1},n_{2})).

In fact, these two inequalities become equalities when the corresponding action aA or aR, respectively, is the best action. From this analysis, it is not hard to verify that

v((n1,n2),A2)\displaystyle v(\langle(n_{1},n_{2}),A_{2}\rangle) (18)
=\displaystyle= max[X((n1,n2)),R+X((n1,n2+1))].\displaystyle\max\Big{[}X((n_{1},n_{2})),R+X((n_{1},n_{2}+1))\Big{]}.

For the T1T_{1} tasks, as the system always accepts them until all VMs are being used, we have

v((n1,n2),A1)\displaystyle v(\langle(n_{1},n_{2}),A_{1}\rangle) (21)
=\displaystyle= {Cvr+X((n1+1,n2)),n1<N1,X((n1,n2)),n1=N1.\displaystyle\left\{\begin{array}[]{cl}-C_{v}r+X((n_{1}+1,n_{2})),&n_{1}<N_{1},\\ X((n_{1},n_{2})),&n_{1}=N_{1}.\end{array}\right.

III-B Optimal Result

Before providing our major optimal result, we need to introduce a general result as below.

Lemma 1: Let h(i)h(i) (i0i\geq 0) be an integer concave function, and denote by

g(i)max{h(i),R+h(i+1)}i0,g(i)\equiv\max\{h(i),R+h(i+1)\}\,\,\,\,i\geq 0,

for a given constant RR. Then, g(i)g(i) is also an integer concave function for i0i\geq 0.

Proof: Let Δh(i) = h(i+1) − h(i) for any integer i ≥ 0. We prove this lemma by considering the following three cases:

Case 1: Δh(i) ≤ −R for all i ≥ 0. Then g(i) ≡ h(i), so g(i) is concave.

Case 2: Δh(i) ≥ −R for all i ≥ 0. Then g(i) ≡ R + h(i+1), so g(i) is concave.

Case 3: Neither of the above two cases holds. In this case, we first claim that Δh(1) ≥ −R. Indeed, if this were not true, i.e., if Δh(1) ≤ −R, we could inductively verify that Δh(i) ≤ −R for any i by noting that h(i) is an integer concave function, which would put us back in Case 1.

Since Δh(1) ≥ −R, Δh(i) is decreasing by the concavity of h(i), and Case 2 does not hold for all i in this case, there must exist an integer k ≥ 1 such that

Δh(j)\displaystyle\Delta h(j) \displaystyle\geq R,forj=1,2,,k,\displaystyle-R,\,\,\,\,\,{\rm for}\,\,\,j=1,2,...,k,
Δh(k+1)\displaystyle\Delta h(k+1) \displaystyle\leq R.\displaystyle-R.

From this analysis, we will know that for any i0i\geq 0,

g(i)={R+h(i+1),ik,h(i),i>k.\displaystyle g(i)=\left\{\begin{array}[]{cl}R+h(i+1),&i\leq k,\\ h(i),&i>k.\end{array}\right.

The concavity of g(i), i.e., that Δg(i) is nonincreasing in i, is then verified by the concavity of h(i), the definition of k, and the following expression of Δg(i) for any i ≥ 1:

Δg(i)={Δh(i+1),i<k,R,i=k,Δh(i),i>k.\displaystyle\Delta g(i)=\left\{\begin{array}[]{cl}\Delta h(i+1),&i<k,\\ -R,&i=k,\\ \Delta h(i),&i>k.\end{array}\right.

Remark 1: It is worth pointing out that Lemma 1 places no condition on the given constant R, which can be either negative or positive. In fact, we introduced this result in paper [17] for the case when the constant is non-positive, and in paper [14] for the case when the constant is non-negative. In this paper we state this general result without any condition on the constant R and also provide the mathematical verification for the first time.
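As a quick sanity check of Lemma 1 (ours, not part of the proof), the short Python sketch below builds random concave integer functions h, draws a constant R of either sign, forms g(i) = max{h(i), R + h(i+1)}, and verifies numerically that the first differences of g are nonincreasing, i.e., that g is again concave.

import random

def is_concave(seq):
    d = [seq[i + 1] - seq[i] for i in range(len(seq) - 1)]
    return all(d[i + 1] <= d[i] + 1e-12 for i in range(len(d) - 1))

random.seed(0)
for _ in range(1000):
    # build a random concave h by summing nonincreasing increments
    incs = sorted((random.uniform(-3, 3) for _ in range(40)), reverse=True)
    h = [0.0]
    for inc in incs:
        h.append(h[-1] + inc)
    R = random.uniform(-2, 2)      # Lemma 1 places no sign condition on R
    g = [max(h[i], R + h[i + 1]) for i in range(len(h) - 1)]
    assert is_concave(h) and is_concave(g)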

Based on Lemma 1 and the expressions developed before it, we can now state and verify the following major optimality result:

Theorem 1: If f(n1,n2) is a convex and increasing function of n2 for any given n1, then the optimal policy is a control limit policy. That is, for any state (n1,n2), there exists an integer, say N2, such that the decision is

a(n1,n2),A2={aA,ifn2N2,aR,ifn2>N2.\displaystyle a_{\langle(n_{1},n_{2}),A_{2}\rangle}=\left\{\begin{array}[]{ll}a_{A},&\mbox{if}\,\,n_{2}\leq N_{2},\\ a_{R},&\mbox{if}\,\,n_{2}>N_{2}.\\ \end{array}\right. (26)

Proof: If all VMs are busy when an SU arrives at state (n1,n2)(n_{1},n_{2}), we know that C1(n1)+n2CC_{1}(n_{1})+n_{2}\geq C and then

C2(n1,n2)=CC1(n1).C_{2}(n_{1},n_{2})=C-C_{1}(n_{1}).

Therefore

β0(n1,n2+2)=β0(n1,n2+1)=β0(n1,n2)\displaystyle\beta_{0}(n_{1},n_{2}+2)=\beta_{0}(n_{1},n_{2}+1)=\beta_{0}(n_{1},n_{2}) (27)
=\displaystyle= λ1+λ2+C1(n1)μ1+(CC1(n1))μ2,\displaystyle\lambda_{1}+\lambda_{2}+C_{1}(n_{1})\mu_{1}+(C-C_{1}(n_{1}))\mu_{2},

which is independent of n2n_{2}. Further, by using the notation of X((n1,n2))X((n_{1},n_{2})), we can rewrite the equation (13) as below:

X((n1,n2))\displaystyle X((n_{1},n_{2})) (28)
=\displaystyle= 1α+β0(n1,n2)[f(n1,n2)+λ1v((n1,n2),A1)\displaystyle\frac{1}{\alpha+\beta_{0}(n_{1},n_{2})}\Big{[}-f(n_{1},n_{2})+\lambda_{1}v(\langle(n_{1},n_{2}),A_{1}\rangle)
+λ2v((n1,n2),A2)+C1(n1)μ1X((n11,n2))\displaystyle+\lambda_{2}v(\langle(n_{1},n_{2}),A_{2}\rangle)+C_{1}(n_{1})\mu_{1}X((n_{1}-1,n_{2}))
+(CC1(n1))μ2X((n1,n21))].\displaystyle+(C-C_{1}(n_{1}))\mu_{2}X((n_{1},n_{2}-1))\Big{]}.

For any two-dimensional integer function g(n1,n2) (n1 ≥ 0, n2 ≥ 0), we introduce the following first- and second-difference notations with respect to n2:

Δn2g(n1,n2)\displaystyle\Delta_{n_{2}}g(n_{1},n_{2}) =\displaystyle= g(n1,n2+1)g(n1,n2).\displaystyle g(n_{1},n_{2}+1)-g(n_{1},n_{2}). (29)
Δn2(2)g(n1,n2)\displaystyle\Delta^{(2)}_{n_{2}}g(n_{1},n_{2}) =\displaystyle= Δn2g(n1,n2+1)Δn2g(n1,n2).\displaystyle\Delta_{n_{2}}g(n_{1},n_{2}+1)-\Delta_{n_{2}}g(n_{1},n_{2}). (30)

From the observation in equation (27) and the equation (28), we will have

(α+β0(n1,n2+1))Δn2X(n1,n2)\displaystyle\big{(}\alpha+\beta_{0}(n_{1},n_{2}+1)\big{)}\Delta_{n_{2}}X(n_{1},n_{2}) (31)
=\displaystyle= Δn2f(n1,n2)\displaystyle-\Delta_{n_{2}}f(n_{1},n_{2})
+λ1Δn2v((n1,n2),A1)+λ2Δn2v((n1,n2),A2)\displaystyle+\lambda_{1}\Delta_{n_{2}}v(\langle(n_{1},n_{2}),A_{1}\rangle)+\lambda_{2}\Delta_{n_{2}}v(\langle(n_{1},n_{2}),A_{2}\rangle)
+C1(n1)μ1Δn2X(n11,n2)\displaystyle+C_{1}(n_{1})\mu_{1}\Delta_{n_{2}}X(n_{1}-1,n_{2})
+(CC1)μ2Δn2X(n1,n21).\displaystyle+(C-C_{1})\mu_{2}\Delta_{n_{2}}X(n_{1},n_{2}-1).

Next, by noting

Cv(n1,n2+2)=Cv(n1,n2+1)=Cv(n1,n2)\displaystyle C_{v}(n_{1},n_{2}+2)=C_{v}(n_{1},n_{2}+1)=C_{v}(n_{1},n_{2}) (35)
=\displaystyle= {b,n1<N1,0,n1=N1,\displaystyle\begin{array}[]{l}\left\{\begin{array}[]{lc}b,&n_{1}<N_{1},\\ 0,&n_{1}=N_{1},\end{array}\right.\end{array}

and equation (21), we will have

Δn2v((n1,n2),A1)\displaystyle\Delta_{n_{2}}v(\langle(n_{1},n_{2}),A_{1}\rangle) (39)
=\displaystyle= {Δn2X(n1+1,n2),n1<N1,Δn2X(n1,n2),n1=N1.\displaystyle\begin{array}[]{l}\left\{\begin{array}[]{cc}\Delta_{n_{2}}X(n_{1}+1,n_{2}),&n_{1}<N_{1},\\ \Delta_{n_{2}}X(n_{1},n_{2}),&n_{1}=N_{1}.\end{array}\right.\end{array}

By a similar manipulation of the above two equations (31) and (39), using the results in equations (27) and (35), we have

(α+β0(n1,n2+2))Δn2(2)X(n1,n2)\displaystyle(\alpha+\beta_{0}(n_{1},n_{2}+2))\Delta^{(2)}_{n_{2}}X(n_{1},n_{2}) (40)
=\displaystyle= Δn2(2)f(n1,n2)\displaystyle-\Delta^{(2)}_{n_{2}}f(n_{1},n_{2})
+λ1Δn2(2)v((n1,n2),A1)+λ2Δn2(2)v((n1,n2),A2)\displaystyle+\lambda_{1}\Delta^{(2)}_{n_{2}}v(\langle(n_{1},n_{2}),A_{1}\rangle)+\lambda_{2}\Delta^{(2)}_{n_{2}}v(\langle(n_{1},n_{2}),A_{2}\rangle)
+C1(n1)μ1Δn2(2)X(n11,n2)\displaystyle+C_{1}(n_{1})\mu_{1}\Delta^{(2)}_{n_{2}}X(n_{1}-1,n_{2})
+(CC1)μ2Δn2(2)X(n1,n21).\displaystyle+(C-C_{1})\mu_{2}\Delta^{(2)}_{n_{2}}X(n_{1},n_{2}-1).

and

Δn2(2)v((n1,n2),A1)\displaystyle\Delta^{(2)}_{n_{2}}v(\langle(n_{1},n_{2}),A_{1}\rangle) (44)
=\displaystyle= {Δn2(2)X(n1+1,n2),n1<N1,Δn2(2)X(n1,n2),n1=N1.\displaystyle\begin{array}[]{l}\left\{\begin{array}[]{cc}\Delta^{(2)}_{n_{2}}X(n_{1}+1,n_{2}),&n_{1}<N_{1},\\ \Delta^{(2)}_{n_{2}}X(n_{1},n_{2}),&n_{1}=N_{1}.\end{array}\right.\end{array}

With the preparations in equations (28) through (44), we can now use the value iteration method, in three steps, to show that X(n1,n2) is concave and nonincreasing in n2 for any given n1, as below:

Step 1: Set X^(0)(n1,n2) = 0. By noting equations (21) and (18), we know that v^(0)(⟨(n1,n2),A2⟩) = R and

v(0)((n1,n2),A1)={br,n1<N1,0,n1=N1.\displaystyle v^{(0)}(\langle(n_{1},n_{2}),A_{1}\rangle)=\left\{\begin{array}[]{cl}-br,&n_{1}<N_{1},\\ 0,&n_{1}=N_{1}.\end{array}\right.

Substituting these three results into equation (12), we have

X(1)(n1,n2)={f(n1,n2)+λ1(br)+λ2Rα+c,n1<N1,f(n1,n2)+0+λ2Rα+c,n1=N1.\displaystyle X^{(1)}(n_{1},n_{2})=\left\{\begin{array}[]{cl}\frac{-f(n_{1},n_{2})+\lambda_{1}(-br)+\lambda_{2}R}{\alpha+c},&n_{1}<N_{1},\\ \frac{-f(n_{1},n_{2})+0+\lambda_{2}R}{\alpha+c},&n_{1}=N_{1}.\end{array}\right.

Therefore, for any n1n_{1}, X(1)(n1,n2)X^{(1)}(n_{1},n_{2}) is concave and nonincreasing on n2n_{2}.

Step 2: By using the above concavity and monotonicity of X^(1)(n1,n2), and equation (21) for the case when all VMs are busy at state (n1,n2), or equations (39) and (44), we know that v^(1)(⟨(n1,n2),A1⟩) is a concave and nonincreasing function of n2. By further applying the result in Lemma 1, we know that v^(1)(⟨(n1,n2),A2⟩) is also a concave and nonincreasing function of n2. With these results in mind, and using the results in equations (31) and (40), we know that

Δn2X(2)(n1,n2)0,andΔn2(2)X(2)(n1,n2)0.\Delta_{n_{2}}X^{(2)}(n_{1},n_{2})\leq 0,\hskip 14.22636pt{\rm and}\hskip 14.22636pt\Delta^{(2)}_{n_{2}}X^{(2)}(n_{1},n_{2})\leq 0.

These two inequalities justify that for any n1n_{1}, X(2)(n1,n2)X^{(2)}(n_{1},n_{2}) is nonincreasing and concave on n2n_{2}.

Step 3: Finally, by Theorem 11.3.2 of [16], which states that the optimality equation has a unique solution, the value iterates X^(n)(n1,n2) converge to a unique limit. Therefore, letting n go to ∞, we conclude that for any n1, X(n1,n2) is concave and nonincreasing in n2.

Finally, by using equation (18) and the concavity and monotonicity of X(n1,n2) in n2, it is straightforward to see that the optimal policy is a control limit policy as stated in the theorem.

The proof is now completed.

IV Bound Analysis for the Optimal Result

From the above analysis, we know that the discounted optimal policy is a control limit policy if the cost function f(n1,n2) is convex and increasing in n2 for any given n1. However, since identification of the optimal threshold is quite important in practice, determining the corresponding threshold value of the control limit policy, or at least its range, remains a challenging issue once the optimal strategy has been verified. In this section, we take a step toward identifying the range of the optimal value and derive a useful result in terms of a lower bound and an upper bound. We first introduce a general result as below.

Lemma 2: Let hk(i)h_{k}(i) be an integer concave function (k=1,2), and for a constant RR denote by

gk(i)max{hk(i),R+hk(i+1)},k=1,2.g_{k}(i)\equiv\max\{h_{k}(i),R+h_{k}(i+1)\},\hskip 14.22636ptk=1,2.

Then, Δg1(i)Δg2(i)\Delta g_{1}(i)\leq\Delta g_{2}(i) holds if Δh1(i)Δh2(i)\Delta h_{1}(i)\leq\Delta h_{2}(i) for any i0.i\geq 0.

Proof: For any given integer ii, since Δh1(i)Δh2(i)\Delta h_{1}(i)\leq\Delta h_{2}(i), we only need to verify the result is true for the following three cases.

Case 1: If Δh2(i)R\Delta h_{2}(i)\leq-R is true. In this case, since Δgk(i)=Δhk(i)\Delta g_{k}(i)=\Delta h_{k}(i), it is straightforward to know

Δg1(i)=Δh1(i)Δh2(i)=Δg2(i).\Delta g_{1}(i)=\Delta h_{1}(i)\leq\Delta h_{2}(i)=\Delta g_{2}(i).

Case 2: If RΔh1(i)-R\leq\Delta h_{1}(i) is true. In this case, we know gk(i)=R+hk(i+1)g_{k}(i)=R+h_{k}(i+1) (k=1, 2), and then,

Δgk(i)\displaystyle\Delta g_{k}(i) =\displaystyle= gk(i+1)[R+hk(i+1)]\displaystyle g_{k}(i+1)-[R+h_{k}(i+1)]
=\displaystyle= max{R,Δhk(i+1)}.\displaystyle\max\{-R,\Delta h_{k}(i+1)\}.

From the above result, it follows directly that

Δg1(i)Δg2(i),\Delta g_{1}(i)\leq\Delta g_{2}(i),

by noting Δh1(i+1)Δh2(i+1)\Delta h_{1}(i+1)\leq\Delta h_{2}(i+1).

Case 3: If Δh1(i)RΔh2(i)\Delta h_{1}(i)\leq-R\leq\Delta h_{2}(i) is true. In this case, from analysis in above Case 1 and Case 2, we know that Δg1(i)=Δh1(i)\Delta g_{1}(i)=\Delta h_{1}(i), and

Δg2(i)=max{R,Δh2(i+1)}.\Delta g_{2}(i)=\max\{-R,\Delta h_{2}(i+1)\}.

Therefore, Δg1(i)RΔg2(i)\Delta g_{1}(i)\leq-R\leq\Delta g_{2}(i).

The proof is now completed.

With the condition that the cost function f(n1,n2)f(n_{1},n_{2}) is convex and increasing on n2n_{2} for any given n1n_{1}, from Theorem 1, we already know that X(n1,n2)X(n_{1},n_{2}) is concave and decreasing on n2n_{2} for any n1n_{1}, and therefore we will have

Δn2X(n1,n21)\displaystyle\Delta_{n_{2}}X(n_{1},n_{2}-1) \displaystyle\geq Δn2X(n1,n2).\displaystyle\Delta_{n_{2}}X(n_{1},n_{2}). (47)

Also from Lemma 1 and equation (21) and (18), we can also have

Δn2v((n1,n2),A1)\displaystyle\Delta_{n_{2}}v(\langle(n_{1},n_{2}),A_{1}\rangle) \displaystyle\leq Δn2X(n1,n2),\displaystyle\Delta_{n_{2}}X(n_{1},n_{2}),
Δn2v((n1,n2),A2)\displaystyle\Delta_{n_{2}}v(\langle(n_{1},n_{2}),A_{2}\rangle) \displaystyle\leq Δn2X(n1,n2).\displaystyle\Delta_{n_{2}}X(n_{1},n_{2}). (48)

From these results, by noting equation (27), we may rewrite equation (31) as below:

αΔn2X(n1,n2)+Δn2f(n1,n2)\displaystyle\alpha\Delta_{n_{2}}X(n_{1},n_{2})+\Delta_{n_{2}}f(n_{1},n_{2}) (49)
=\displaystyle= λ1[Δn2v((n1,n2),A1)Δn2X(n1,n2)]\displaystyle\hskip 7.11317pt\lambda_{1}\big{[}\Delta_{n_{2}}v(\langle(n_{1},n_{2}),A_{1}\rangle)-\Delta_{n_{2}}X(n_{1},n_{2})\big{]}
+λ2[Δn2v((n1,n2),A2)Δn2X(n1,n2)]\displaystyle+\lambda_{2}\big{[}\Delta_{n_{2}}v(\langle(n_{1},n_{2}),A_{2}\rangle)-\Delta_{n_{2}}X(n_{1},n_{2})\big{]}
+C1(n1)μ1[Δn2X(n11,n2)Δn2X(n1,n2)]\displaystyle+C_{1}(n_{1})\mu_{1}\big{[}\Delta_{n_{2}}X(n_{1}-1,n_{2})-\Delta_{n_{2}}X(n_{1},n_{2})\big{]}
+C2μ2[Δn2X(n1,n21)Δn2X(n1,n2)].\displaystyle+C_{2}\mu_{2}\big{[}\Delta_{n_{2}}X(n_{1},n_{2}-1)-\Delta_{n_{2}}X(n_{1},n_{2})\big{]}.

We also need to introduce a result below before presenting our major boundary result:

Lemma 3: If the cost function f(n1,n2) is a convex and nondecreasing function of n2 for any given n1, and Δn2f(n1,n2) is a nondecreasing function of n1, then Δn2X(n1,n2) is a nonincreasing function of n1.

Proof: The detailed verification is included in Appendix.

With these preparations, we can now state and prove our bound result as below.

Theorem 2: If the cost function f(n1,n2)f(n_{1},n_{2}) is a convex and nondecreasing function on n2n_{2} for any given n1n_{1}, and Δn2f(n1,n2)\Delta_{n_{2}}f(n_{1},n_{2}) is a nondecreasing function on n1n_{1}, then we will have

  1. 1.

    A lower bound of the optimal threshold value N2N_{2} is given by

    N2=max{n2:Δn2f(N1,n2)<αR},N_{2*}=\max\left\{n_{2}:\Delta_{n_{2}}f(N_{1},n_{2})<\alpha R\right\},

    if there exists an integer n2n_{2} such that

    Δn2f(N1,n2)<αR.\Delta_{n_{2}}f(N_{1},n_{2})<\alpha R. (50)
  2. 2.

    An upper bound of the optimal threshold value N2N_{2} is given by

    N2=min{n2:Δn2f(0,n2)\displaystyle N^{*}_{2}=\min\Big{\{}n_{2}:\Delta_{n_{2}}f(0,n_{2})\hskip 102.43008pt
    >(α+min(n2+1,C)μ2)R},\displaystyle\hskip 56.9055pt>(\alpha+\min(n_{2}+1,C)\mu_{2})R\Big{\}},

if there exists an integer n2 such that

    Δn2f(0,n2)>(α+min(n2+1,C)μ2)R.\Delta_{n_{2}}f(0,n_{2})>(\alpha+\min(n_{2}+1,C)\mu_{2})R. (51)

Proof: We will have the following justifications:

  1. 1.

It is intuitive that an integer n2 is a lower bound for the optimal threshold value in Theorem 1 as long as the action at state (n1,n2) is acceptance for an arriving SU. Next, since N1 is the largest possible value of n1, an acceptance action for an arriving SU at state (N1,n2) implies an acceptance action for an arriving SU at state (n1,n2) with n1 ≤ N1. Therefore, the optimal threshold value in Theorem 1 for n1 = N1 is no larger than that for any other n1, and a lower bound established at n1 = N1 is also a lower bound of the optimal threshold value for any state (n1,n2).

To verify the lower bound result in Theorem 2, we only need to verify that the action at state (N1,n2) is acceptance for an arriving SU under the condition in equation (50); or, equivalently, that if the action at state (N1,n2) is rejection, then equation (50) cannot hold. In fact, at state (N1,n2) there is no free VM for either a PU or an SU arrival, which means the system will reject both. By noting equations (21) and (18), if the system rejects the arriving PU and SU at state (N1,n2), we have

    Δn2v((N1,n2),A1)\displaystyle\Delta_{n_{2}}v(\langle(N_{1},n_{2}),A_{1}\rangle) =\displaystyle= Δn2X(N1,n2),\displaystyle\Delta_{n_{2}}X(N_{1},n_{2}),
    Δn2v((N1,n2),A2)\displaystyle\Delta_{n_{2}}v(\langle(N_{1},n_{2}),A_{2}\rangle) =\displaystyle= Δn2X(N1,n2).\displaystyle\Delta_{n_{2}}X(N_{1},n_{2}).

Substituting these results into equation (49), and recalling the concavity of X(N1,n2) in n2 from the proof of Theorem 1 together with the result in Lemma 3, we know that the right-hand side of the equation is non-negative. Therefore, we have

    Δn2f(N1,n2)αΔn2X(N1,n2)αR.\Delta_{n_{2}}f(N_{1},n_{2})\geq-\alpha\Delta_{n_{2}}X(N_{1},n_{2})\geq\alpha R.

The last inequality comes from the rejection of the SU at state (N1,n2) by equation (18). Thus, equation (50) cannot hold, and the verification of the lower bound for any state (n1,n2) is now completed.

  2. 2.

Conversely, an integer n2 is an upper bound for the optimal threshold value in Theorem 1 as long as the action at state (n1,n2) is rejection for an arriving SU. We first consider the case n2 < C, which means there is at least one free VM when an SU arrives. Therefore, to verify the upper bound result in Theorem 2, we only need to verify that the action at state (0,n2) is rejection for an arriving SU; or, equivalently, that if the action at state (0,n2) is acceptance, then equation (51) cannot hold. In fact, at state (0,n2), by noting equations (21) and (18), if the system accepts the arriving SU (T2) at state (0,n2) with n2 < C, we have

    Δn2v((0,n2),A1)\displaystyle\Delta_{n_{2}}v(\langle(0,n_{2}),A_{1}\rangle) \displaystyle\leq Δn2X(0,n2),\displaystyle\Delta_{n_{2}}X(0,n_{2}),
    Δn2v((0,n2),A2)\displaystyle\Delta_{n_{2}}v(\langle(0,n_{2}),A_{2}\rangle) \displaystyle\leq Δn2X(0,n2).\displaystyle\Delta_{n_{2}}X(0,n_{2}).

    Since the action is to accept T2T_{2} arrival, from (18) we have

    Δn2X(0,n21)\displaystyle\Delta_{n_{2}}X(0,n_{2}-1)\geq Δn2X(0,n2)\displaystyle\Delta_{n_{2}}X(0,n_{2})\geq R.\displaystyle-R.

Substituting these results into equation (49), and again recalling the concavity of X(n1,n2) from the proof of Theorem 1 and the result in Lemma 3, we have

    (α+(n2+1)μ2)Δn2X(0,n2)+Δn2f(0,n2)0.(\alpha+(n_{2}+1)\mu_{2})\Delta_{n_{2}}X(0,n_{2})+\Delta_{n_{2}}f(0,n_{2})\leq 0.

    Therefore, we will have

    Δn2f(0,n2)\displaystyle\Delta_{n_{2}}f(0,n_{2}) \displaystyle\leq (α+(n2+1)μ2)Δn2X(0,n2)\displaystyle-(\alpha+(n_{2}+1)\mu_{2})\Delta_{n_{2}}X(0,n_{2})
    \displaystyle\leq (α+(n2+1)μ2)R.\displaystyle(\alpha+(n_{2}+1)\mu_{2})R.

The last inequality comes from the acceptance of the SU at state (0,n2) for n2 < C by equation (18). Thus, equation (51) cannot hold.

For states (0,n2) with n2 ≥ C, which means all VMs are busy when an SU arrives, a similar upper bound condition can be derived from equation (49) as

    Δn2f(0,n2)>(α+Cμ2)R.\Delta_{n_{2}}f(0,n_{2})>(\alpha+C\mu_{2})R.

Since f(0,n2) corresponds to the smallest value of n1, the upper bound for state (0,n2) is also an upper bound for any other state (n1,n2) with n1 > 0. The verification of the upper bound in the theorem is now completed by the justification at the beginning of this item.

Remark 2: More specifically, if equation (50) never holds, i.e., Δn2f(n1,n2)>αR\Delta_{n_{2}}f(n_{1},n_{2})>\alpha R holds for all states, it can be observed, from equation (51) on the case when n2=0n_{2}=0, that the upper bound becomes 0. This means the system will not accept any T2T_{2} arrivals into the system if equation (50) never holds.
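As a quick illustration of Theorem 2, the Python sketch below evaluates the two bounds for a given holding cost; the quadratic cost f(n1,n2) = n1^2 + n2^2 used in Section V and the parameter values in the example call are assumptions chosen only for this illustration, and a returned None means the corresponding condition (50) or (51) never holds.

def delta_f(f, n1, n2):
    # first difference of the holding cost in the n2 direction
    return f(n1, n2 + 1) - f(n1, n2)

def threshold_bounds(f, N1, C, mu2, alpha, R, n2_max=10_000):
    # lower bound: the largest n2 with Delta_{n2} f(N1, n2) < alpha * R
    lower = [n2 for n2 in range(n2_max) if delta_f(f, N1, n2) < alpha * R]
    # upper bound: the smallest n2 with
    # Delta_{n2} f(0, n2) > (alpha + min(n2 + 1, C) * mu2) * R
    upper = [n2 for n2 in range(n2_max)
             if delta_f(f, 0, n2) > (alpha + min(n2 + 1, C) * mu2) * R]
    return (max(lower) if lower else None, min(upper) if upper else None)

f = lambda n1, n2: n1 ** 2 + n2 ** 2
print(threshold_bounds(f, N1=2, C=10, mu2=8.0, alpha=0.1, R=5.0))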

V Numerical Analysis

We have theoretically verified that the optimal policy maximizing the total expected discounted reward in equation (1) is a control limit policy, or a threshold policy, for accepting type-2 (T2) arrivals. Our CTMDP model involves several parameters such as arrival rates, departure rates, rewards and the cost function. Here, for simulation and numerical analysis purposes, we set some parameters as listed in TABLE II. It can be seen from TABLE II that the loads for T1 and T2 tasks are ρ1 = λ1/μ1 = 1/6 and ρ2 = λ2/μ2 = 1/4, which means the system is lightly loaded. The other parameters of the system are set as R = 5, r = 0.5, C = 10, b = 5, the discount factor α = 0.1, and the holding cost function f(x,y) = x^2 + y^2.

TABLE II: Parameter value setting
λ\lambda μ\mu
T1T_{1} 1 6
T2T_{2} 2 8

However, exactly finding this threshold value or the optimal objective value for a given problem is always challenging, especially in a continuously changing environment where arrival rates, service rates and rewards vary in the real world. While we demonstrate in Subsection V-A below how to derive the thresholds by using the value iteration method, the calculation in this way is always time-consuming.

In Subsection V-C, we propose a machine learning method and then demonstrate how to obtain or estimate the threshold value and the optimal objective value by using a feed-forward neural network model. A feed-forward neural network is an artificial neural network in which the connections between the units do not form a cycle.

V-A Threshold Policy

With this parameter setting, using the value iteration method we obtain the following results (an illustrative code sketch that reproduces them is given at the end of this subsection).

TABLE III: X(n1,n2)X(n_{1},n_{2}) Values with Optimal Policy
0 n2n_{2}\rightarrow 5
0 96.53 96.38 96.10 95.69 95.16 94.51
n1n_{1}\downarrow 96.50 96.33 96.03 95.61 95.07 94.40
2 96.43 96.24 95.89 95.38 94.72 93.89
6 n2n_{2}\rightarrow 11
0 93.72 92.80 91.74 90.55 89.22 87.63
n1n_{1}\downarrow 93.51 92.41 91.11 89.61 87.90 85.93
2 92.81 91.49 89.94 88.14 86.11 83.78
12 n2n_{2}\rightarrow 17
0 85.73 83.51 80.95 78.00 74.66 70.89
n1n_{1}\downarrow 83.65 81.03 78.03 74.62 70.79 66.50
2 81.11 78.06 74.61 70.71 66.35 61.51
18 n2n_{2}\rightarrow 23
0 66.68 61.99 56.81 51.11 44.88 38.09
n1n_{1}\downarrow 61.73 56.46 50.68 44.34 37.44 29.94
2 56.17 50.29 43.87 36.87 29.26 21.02

As observed from TABLE III, the X(n1,n2) values are concave and decreasing in the n2 direction, which fits our theoretical result.

TABLE IV: Actions for T2T_{2} task of Optimal Policy
0 n2n_{2}\rightarrow 7
0 1 1 1 1 1 1 1 1
n1n_{1}\downarrow 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1
8 n2n_{2}\rightarrow 15
0 1 1 1 1 1 1 1 1
n1n_{1}\downarrow 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1
16 n2n_{2}\rightarrow 23
0 1 1 1 0 0 0 0 0
n1n_{1}\downarrow 1 1 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0

In TABLE IV, "1" means the system will accept the arrival and "0" means the system will reject it. Since the reward R is large, the table shows that the system will accept T2 tasks into the buffer even when some T2 tasks are already waiting.

Next we decrease the reward to R = 1 and obtain the X(n1,n2) values listed in TABLE V. As seen from TABLE VI, with this smaller reward the system will not accept many T2 tasks into the waiting buffer.

TABLE V: X(n1,n2)X(n_{1},n_{2}) Values with Optimal Policy
0 n2n_{2}\rightarrow 5
0 16.53 16.38 16.10 15.69 15.16 14.51
n1n_{1}\downarrow 16.50 16.33 16.03 15.61 15.07 14.40
2 16.43 16.24 15.89 15.38 14.72 13.89
6 n2n_{2}\rightarrow 11
0 13.72 12.80 11.75 10.57 9.26 7.68
n1n_{1}\downarrow 13.51 12.43 11.14 9.65 7.97 6.03
2 12.83 11.52 9.99 8.22 6.22 3.94
12 n2n_{2}\rightarrow 17
0 5.82 3.64 1.12 -1.76 -5.03 -8.72
n1n_{1}\downarrow 3.79 1.22 -1.72 -5.05 -8.80 -12.99
2 1.32 -1.66 -5.04 -8.85 -13.11 -17.85
TABLE VI: Actions for T2T_{2} task of Optimal Policy
0 n2n_{2}\rightarrow 7
0 1 1 1 1 1 1 1 1
n1n_{1}\downarrow 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1
8 n2n_{2}\rightarrow 15
1 1 1 1 0 0 0 0 0
n1n_{1}\downarrow 1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
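The following minimal Python sketch (our own illustrative reimplementation, not the authors' code) runs value iteration for the uniformized model of Section II with the parameters above; up to the buffer truncation N2MAX it should reproduce values and accept/reject patterns of the form shown in TABLES III-VI, with R = 5 corresponding to TABLES III-IV and R = 1 to TABLES V-VI.

import numpy as np

C, b, N1 = 10, 5, 2
lam1, mu1, alpha, r = 1.0, 6.0, 0.1, 0.5
f = lambda n1, n2: n1 ** 2 + n2 ** 2
N2MAX = 60                                     # truncation of the T2 buffer

def value_iteration(R=5.0, lam2=2.0, mu2=8.0, tol=1e-6, max_iter=200_000):
    """Return X(n1, n2) and the per-n1 admission thresholds D(n1)."""
    c = lam1 + lam2 + C * max(mu1, mu2)        # uniformization constant
    C1 = lambda n1: b * n1
    C2 = lambda n1, n2: min(C - C1(n1), n2)
    Cv = lambda n1: b if n1 < N1 else 0
    beta0 = lambda n1, n2: lam1 + lam2 + C1(n1) * mu1 + C2(n1, n2) * mu2
    X = np.zeros((N1 + 1, N2MAX + 1))
    for _ in range(max_iter):
        Xn = np.empty_like(X)
        for n1 in range(N1 + 1):
            for n2 in range(N2MAX + 1):
                # T1 arrival, eq. (21); T2 arrival, eq. (18), rejected at the edge
                vA1 = -Cv(n1) * r + X[n1 + 1, n2] if n1 < N1 else X[n1, n2]
                vA2 = X[n1, n2] if n2 == N2MAX else max(X[n1, n2], R + X[n1, n2 + 1])
                tot = (-f(n1, n2) + lam1 * vA1 + lam2 * vA2
                       + (c - beta0(n1, n2)) * X[n1, n2])
                if n1 > 0:
                    tot += C1(n1) * mu1 * X[n1 - 1, n2]
                if n2 > 0:
                    tot += C2(n1, n2) * mu2 * X[n1, n2 - 1]
                Xn[n1, n2] = tot / (alpha + c)   # eq. (12) after uniformization
        diff, X = np.max(np.abs(Xn - X)), Xn
        if diff < tol:
            break
    # 1 = admit an arriving T2 task, 0 = reject (cf. TABLE IV)
    admit = (R + X[:, 1:] >= X[:, :-1]).astype(int)
    return X, admit.sum(axis=1) - 1              # thresholds D(n1)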

V-B Threshold to Values

From equation (1), the optimal policy attains the maximum total expected discounted reward from any initial state. Therefore, by using the obtained thresholds as a policy, we can also calculate the corresponding total expected discounted reward from equation (1). Using the rate uniformization technique, the expected time between two epochs is 1/c. The calculation runs until the accumulated discount factor is less than 1E-6, and the expected X(n1,n2) values are shown in TABLES VII and VIII.

TABLE VII: Expected X(n1,n2)X(n_{1},n_{2}) Values of Optimal Policy with R=1R=1
0 n2n_{2}\rightarrow 5
0 16.53 16.38 16.10 15.69 15.16 14.51
n1n_{1}\downarrow 16.50 16.33 16.03 15.61 15.07 14.40
2 16.43 16.24 15.89 15.38 14.72 13.89
6 n2n_{2}\rightarrow 11
0 13.72 12.80 11.75 10.57 9.26 7.68
n1n_{1}\downarrow 13.51 12.43 11.14 9.65 7.97 6.03
2 12.83 11.52 9.99 8.22 6.22 3.94
12 n2n_{2}\rightarrow 17
0 5.82 3.64 1.12 -1.76 -5.03 -8.72
n1n_{1}\downarrow 3.79 1.22 -1.72 -5.05 -8.80 -12.99
2 1.32 -1.66 -5.04 -8.85 -13.11 -17.85
TABLE VIII: Expected X(n1,n2)X(n_{1},n_{2}) Values of Optimal Policy with R=5R=5
0 n2n_{2}\rightarrow 5
0 96.53 96.38 96.10 95.69 95.16 94.51
n1n_{1}\downarrow 96.50 96.33 96.03 95.61 95.07 94.40
2 96.43 96.24 95.89 95.38 94.72 93.89
6 n2n_{2}\rightarrow 11
0 93.72 92.79 91.74 90.55 89.22 87.63
n1n_{1}\downarrow 93.51 92.41 91.11 89.61 87.90 85.93
2 92.81 91.49 89.94 88.14 86.11 83.78
12 n2n_{2}\rightarrow 17
0 85.73 83.51 80.95 78.00 74.66 70.89
n1n_{1}\downarrow 83.65 81.03 78.03 74.62 70.79 66.49
2 81.11 78.06 74.61 70.71 66.35 61.51
18 n2n_{2}\rightarrow 23
0 66.67 61.98 56.81 51.11 44.88 38.09
n1n_{1}\downarrow 61.73 56.46 50.67 44.34 37.44 29.94
2 56.17 50.29 43.87 36.86 29.26 21.02

Comparing TABLES VII and VIII with TABLES III and V, we see that the differences between these tables are very small, which confirms the correctness of the calculation by the value iteration method in the last subsection.
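A sketch of the fixed-policy calculation behind TABLES VII and VIII is given below (ours, reusing the constants, helper definitions and numpy import from the value-iteration sketch at the end of Subsection V-A): the same uniformized recursion is iterated, but the maximization over actions for a T2 arrival is replaced by the action prescribed by a given threshold table D according to decision rule (10).

def evaluate_thresholds(D, R=5.0, lam2=2.0, mu2=8.0, tol=1e-6, max_iter=200_000):
    c = lam1 + lam2 + C * max(mu1, mu2)
    C1 = lambda n1: b * n1
    C2 = lambda n1, n2: min(C - C1(n1), n2)
    Cv = lambda n1: b if n1 < N1 else 0
    beta0 = lambda n1, n2: lam1 + lam2 + C1(n1) * mu1 + C2(n1, n2) * mu2
    X = np.zeros((N1 + 1, N2MAX + 1))
    for _ in range(max_iter):
        Xn = np.empty_like(X)
        for n1 in range(N1 + 1):
            for n2 in range(N2MAX + 1):
                vA1 = -Cv(n1) * r + X[n1 + 1, n2] if n1 < N1 else X[n1, n2]
                admit = n2 < N2MAX and n2 <= D[n1]      # decision rule (10)
                vA2 = R + X[n1, n2 + 1] if admit else X[n1, n2]
                tot = (-f(n1, n2) + lam1 * vA1 + lam2 * vA2
                       + (c - beta0(n1, n2)) * X[n1, n2])
                if n1 > 0:
                    tot += C1(n1) * mu1 * X[n1 - 1, n2]
                if n2 > 0:
                    tot += C2(n1, n2) * mu2 * X[n1, n2 - 1]
                Xn[n1, n2] = tot / (alpha + c)
        diff, X = np.max(np.abs(Xn - X)), Xn
        if diff < tol:
            break
    return X

For example, _, D = value_iteration(R=5.0) followed by evaluate_thresholds(D, R=5.0) evaluates the thresholds found by value iteration.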

V-C Threshold Estimation with Machine Learning

Taking the parameters of the CTMDP model as the input and the corresponding threshold values as the output, we can build the neural network model shown in Fig. 2. Based on this development, we build the training data set shown in TABLE IX, where the reward R is chosen from 1, 2, ..., 8, λ2 is chosen from 1, 2, ..., 5, and μ2 is chosen from 8, 10, 12, 14, 16, while all other parameters are kept unchanged.

The combination of these parameters gives us 8×5×5 = 200 training data inputs in total. More specifically, as shown in Fig. 2, this is a two-layer neural network with 30 hidden neurons. The inputs are the set of parameters in our CTMDP model, such as the reward, arrival rates and departure rates, which makes the total number of input parameters 5; the outputs are the threshold values for each n1, in this case 3 outputs, as reflected in Fig. 2.

TABLE IX: Training Dataset for Neural Network Model
RR 1, 2, 3, 4, 5, 6, 7, 8
λ2\lambda_{2} 1,  2,  3,  4,  5
μ2\mu_{2} 8,  10,  12,  14,  16
Refer to caption
Figure 2: Neural Network Model Created With Matlab
TABLE X: Thresholds comparison from Neural Network to Actual Value
R=1.3R=1.3 2.3 3.3 4.3
0 (11, 11) (13, 13) (16, 16) (18, 18)
n1n_{1}\downarrow (9, 9) (12, 12) (14, 15) (17, 17)
2 (8, 7) (11, 11) (13, 14) (16, 16)
R=5.3R=5.3 6.3 7.3 8.3
0 (20, 20) (22, 22) (23, 24) (25, 25)
n1n_{1}\downarrow (19, 18) (21, 20) (22, 22) (24, 24)
2 (18, 18) (19, 20) (21, 21) (23, 23)

After the model is trained, we can test it with parameter settings different from those in the initial training dataset. Denoting by nr the real threshold values from the value iteration method and by nm the threshold values from the machine learning method, several parameter sets of the form (R, λ2=1, μ2=8) are chosen to compare the threshold pairs (nr, nm) from the actual computation and the neural network model, as shown in TABLE X. From TABLE X, it is observed that the machine learning model provides a good estimate of the thresholds.
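As a rough Python analogue of this pipeline (ours; the paper's network was built in Matlab, and for brevity this sketch feeds the network only the three varied parameters R, λ2 and μ2 rather than all five inputs described above), the threshold labels can be generated with the value_iteration sketch from Subsection V-A and fitted with a single hidden layer of 30 neurons using scikit-learn.

import itertools
import numpy as np
from sklearn.neural_network import MLPRegressor

# training grid of TABLE IX; labelling all 8*5*5 = 200 points with value
# iteration is slow but only needs to be done once
grid = list(itertools.product(range(1, 9),          # R        in 1..8
                              range(1, 6),          # lambda_2 in 1..5
                              range(8, 17, 2)))     # mu_2     in {8,...,16}
X_train = np.array(grid, dtype=float)
y_train = np.array([value_iteration(R=R, lam2=l2, mu2=m2)[1]
                    for R, l2, m2 in grid])

net = MLPRegressor(hidden_layer_sizes=(30,), max_iter=5000, random_state=0)
net.fit(X_train, y_train)

# estimate the three per-n1 thresholds for an unseen setting, e.g. R = 4.3
print(np.rint(net.predict([[4.3, 1.0, 8.0]])))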

VI Conclusion and Discussion

In order to reduce energy costs, cloud providers have started to mix online and batch tasks in the same data center. However, it is always challenging to efficiently optimize the assignment of suitable resources to the users. To address this challenging problem, a novel model of a cloud data center with cognitive capabilities for real-time (online) tasks, alongside batch tasks, is studied in this paper. Here, a DC can determine the cost of using resources, and an online user or a user with batch tasks may decide whether or not to pay for the service. In particular, the online service tasks have a higher priority than the batch tasks, and both types of tasks need a certain number of virtual machines (VMs). The objective is to maximize the total discounted reward. The optimal policy for admitting tasks to reach this objective is verified to be a state-related control limit policy. Even after the optimal policy is characterized in general, it is hard to identify the range of the threshold values, which is particularly useful in practice. Therefore, we further consider the possible boundaries of the optimal value, and a lower and an upper bound for the optimal policy are derived, respectively, for estimation purposes. Finally, a comprehensive set of experiments on various cases is conducted to validate the proposed solution. As a demonstration, a machine learning method using a feed-forward neural network model is adopted to show how to obtain the optimal values. It is our observation that the major idea of this research can be extended to maximizing the expected average reward or other related objective functions. The results offered in this paper are expected to be useful for operating various cloud data centers with different cognitive characteristics in an economically optimal way.

[Verification of Lemma 3] We now provide the verification of Lemma 3. Similarly to equation (31), for $n_{1}<N_{1}$, we have

$$
\begin{aligned}
\big(\alpha+\beta_{0}(n_{1}+1,n_{2}+1)\big)\,\Delta_{n_{2}}X(n_{1}+1,n_{2})
={}& -\Delta_{n_{2}}f(n_{1}+1,n_{2}) \\
&+\lambda_{1}\,\Delta_{n_{2}}v(\langle(n_{1}+1,n_{2}),A_{1}\rangle)
 +\lambda_{2}\,\Delta_{n_{2}}v(\langle(n_{1}+1,n_{2}),A_{2}\rangle) \\
&+C_{1}(n_{1}+1)\,\mu_{1}\,\Delta_{n_{2}}X(n_{1},n_{2})
 +\big(C-C_{1}(n_{1}+1)\big)\,\mu_{2}\,\Delta_{n_{2}}X(n_{1}+1,n_{2}-1). \qquad (52)
\end{aligned}
$$

By a similar manipulation of equations (31) and (52), for $n_{1}<N_{1}$, we have

$$
\begin{aligned}
&\big(\alpha+\beta_{0}(n_{1}+1,n_{2}+1)\big)\,\Delta^{(2)}_{n_{1},n_{2}}X(n_{1},n_{2}) \\
={}& \big(\alpha+\beta_{0}(n_{1}+1,n_{2}+1)\big)\,\Delta_{n_{2}}X(n_{1}+1,n_{2})
 -\big(\alpha+\beta_{0}(n_{1},n_{2}+1)+b\mu_{1}-b\mu_{2}\big)\,\Delta_{n_{2}}X(n_{1},n_{2}) \\
={}& -\Delta^{(2)}_{n_{1},n_{2}}f(n_{1},n_{2})
 +\lambda_{1}\,\Delta^{(2)}_{n_{1},n_{2}}v(\langle(n_{1},n_{2}),A_{1}\rangle)
 +\lambda_{2}\,\Delta^{(2)}_{n_{1},n_{2}}v(\langle(n_{1},n_{2}),A_{2}\rangle) \\
&+C_{1}(n_{1}+1)\,\mu_{1}\,\Delta_{n_{2}}X(n_{1},n_{2})
 -C_{1}(n_{1})\,\mu_{1}\,\Delta_{n_{2}}X(n_{1}-1,n_{2}) \\
&+C_{2}(n_{1}+1,n_{2})\,\mu_{2}\,\Delta_{n_{2}}X(n_{1}+1,n_{2}-1)
 -C_{2}(n_{1},n_{2})\,\mu_{2}\,\Delta_{n_{2}}X(n_{1},n_{2}-1) \\
&+b\mu_{2}\,\Delta_{n_{2}}X(n_{1},n_{2})-b\mu_{1}\,\Delta_{n_{2}}X(n_{1},n_{2}) \\
={}& -\Delta^{(2)}_{n_{1},n_{2}}f(n_{1},n_{2})
 +\lambda_{1}\,\Delta^{(2)}_{n_{1},n_{2}}v(\langle(n_{1},n_{2}),A_{1}\rangle)
 +\lambda_{2}\,\Delta^{(2)}_{n_{1},n_{2}}v(\langle(n_{1},n_{2}),A_{2}\rangle) \\
&+C_{1}(n_{1})\,\mu_{1}\,\Delta^{(2)}_{n_{1},n_{2}}X(n_{1}-1,n_{2})
 +C_{2}(n_{1}+1,n_{2})\,\mu_{2}\,\Delta^{(2)}_{n_{1},n_{2}}X(n_{1},n_{2}-1) \\
&+b\mu_{2}\,\Delta^{(2)}_{n_{2}}X(n_{1},n_{2}-1). \qquad (53)
\end{aligned}
$$

We can now use the Value Iteration method, in three steps, to show that $\Delta_{n_{2}}X(n_{1},n_{2})$ is a nonincreasing function of $n_{1}$ for any given $n_{2}$, as follows:

Step A-1: Similarly to the analysis in Theorem 1, setting $X^{(0)}(n_{1},n_{2})=0$, we have

$$\Delta_{n_{2}}X^{(1)}(n_{1},n_{2})=-\frac{\Delta_{n_{2}}f(n_{1},n_{2})}{\alpha+c}.$$

Since $\Delta_{n_{2}}f(n_{1}+1,n_{2})\geq\Delta_{n_{2}}f(n_{1},n_{2})$, we also have

$$
\begin{aligned}
&\Delta_{n_{2}}X^{(1)}(n_{1}+1,n_{2})-\Delta_{n_{2}}X^{(1)}(n_{1},n_{2}) \\
={}& \frac{\Delta_{n_{2}}f(n_{1},n_{2})-\Delta_{n_{2}}f(n_{1}+1,n_{2})}{\alpha+c} \\
\leq{}& 0.
\end{aligned}
$$

Step A-2: Using the result of Step A-1 above and equation (39), we know that

$$\Delta_{n_{2}}v^{(1)}(\langle(n_{1}+1,n_{2}),A_{1}\rangle)\leq\Delta_{n_{2}}v^{(1)}(\langle(n_{1},n_{2}),A_{1}\rangle).$$

From the result and verification process of Theorem 1, with the cost function being convex and nondecreasing, i.e., $\Delta_{n_{2}}f(n_{1},n_{2}+1)\geq\Delta_{n_{2}}f(n_{1},n_{2})$, we know that $X^{(1)}(n_{1},n_{2})$ is concave and nonincreasing in $n_{2}$ for any $n_{1}$, which means

$$
\begin{aligned}
\Delta_{n_{2}}X^{(1)}(n_{1},n_{2}) &\leq \Delta_{n_{2}}X^{(1)}(n_{1},n_{2}-1), \\
\Delta_{n_{2}}X^{(1)}(n_{1}+1,n_{2}) &\leq \Delta_{n_{2}}X^{(1)}(n_{1}+1,n_{2}-1).
\end{aligned}
$$

To further apply the result of Lemma 2, let $h_{1}(i)=X^{(1)}(n_{1}+1,i)$ and $h_{2}(i)=X^{(1)}(n_{1},i)$. Since $\Delta_{n_{2}}X^{(1)}(n_{1}+1,n_{2})\leq\Delta_{n_{2}}X^{(1)}(n_{1},n_{2})$ for any $n_{2}$, using equation (18) we know that

$$\Delta_{n_{2}}v^{(1)}(\langle(n_{1}+1,n_{2}),A_{2}\rangle)\leq\Delta_{n_{2}}v^{(1)}(\langle(n_{1},n_{2}),A_{2}\rangle).$$

By using equation (53), we then have

$$\Delta_{n_{2}}X^{(2)}(n_{1}+1,n_{2})-\Delta_{n_{2}}X^{(2)}(n_{1},n_{2})\leq 0.$$

Step A-3: Finally, by Theorem 11.3.2 of [16], the optimality equation has a unique solution, so the value iteration $X^{(n)}(n_{1},n_{2})$ converges to it. Therefore, as the iteration continues and $n$ goes to $\infty$, we know that for any $n_{1}<N_{1}$,

$$\Delta_{n_{2}}X(n_{1}+1,n_{2})\leq\Delta_{n_{2}}X(n_{1},n_{2}),$$

always holds.

The verification of Lemma 3 is now complete.
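The convergence argument above can also be checked numerically. The sketch below is not the paper's code: it runs a uniformized value-iteration recursion on a small truncated state space with an assumed convex holding cost, assumed rates and rewards, and an assumed VM requirement $b$ per online task, and then tests whether $\Delta_{n_{2}}X(n_{1},n_{2})$ is nonincreasing in $n_{1}$.

```python
import numpy as np

# Illustrative parameters only; f, the rates, and the capacities are assumptions.
N1, N2 = 3, 30                # truncated state space
lam1, lam2 = 2.0, 1.0         # arrival rates of online / batch tasks
mu1, mu2 = 10.0, 8.0          # per-VM service rates
C, b = 40, 2                  # total VMs and VMs per online task
alpha = 0.1                   # discount rate
R1, R2 = 5.0, 1.3             # admission rewards

def f(n1, n2):                # convex, nondecreasing holding cost
    return 0.01 * (b * n1 + n2) ** 2

c = lam1 + lam2 + C * max(mu1, mu2)      # uniformization constant

X = np.zeros((N1 + 1, N2 + 1))
for _ in range(3000):                    # value iteration
    Xn = np.empty_like(X)
    for n1 in range(N1 + 1):
        for n2 in range(N2 + 1):
            C1 = b * n1                  # VMs busy with online tasks
            C2 = min(n2, C - C1)         # VMs usable by batch tasks
            # admission decisions: accept only if reward + new value >= current value
            vA1 = max(X[n1 + 1, n2] + R1, X[n1, n2]) if n1 < N1 else X[n1, n2]
            vA2 = max(X[n1, n2 + 1] + R2, X[n1, n2]) if n2 < N2 else X[n1, n2]
            out = lam1 + lam2 + C1 * mu1 + C2 * mu2
            Xn[n1, n2] = (-f(n1, n2)
                          + lam1 * vA1 + lam2 * vA2
                          + C1 * mu1 * X[max(n1 - 1, 0), n2]
                          + C2 * mu2 * X[n1, max(n2 - 1, 0)]
                          + (c - out) * X[n1, n2]) / (alpha + c)
    X = Xn

D = X[:, 1:] - X[:, :-1]                 # Delta_{n2} X(n1, n2)
print("nonincreasing in n1:", bool(np.all(np.diff(D, axis=0) <= 1e-8)))
```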

References

  • [1] A. Shehabi, S. Smith, D. Sartor, R. Brown, M. Herrlin, J. Koomey, E. Masanet, N. Horner, I. Azevedo, and W. Lintner, “United states data center energy usage report,” Jun 2016.
  • [2] C. Canali, R. Lancellotti, and M. Shojafar, “A computation- and network-aware energy optimization model for virtual machines allocation,” in Proceedings of the 7th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, INSTICC.   SciTePress, 2017, pp. 71–81.
  • [3] J. Fu, B. Moran, J. Guo, E. W. M. Wong, and M. Zukerman, “Asymptotically optimal job assignment for energy-efficient processor-sharing server farms,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 12, pp. 4008–4023, Dec 2016.
  • [4] C. Canali, L. Chiaraviglio, R. Lancellotti, and M. Shojafar, “Joint minimization of the energy costs from computing, data transmission, and migrations in cloud data centers,” IEEE Transactions on Green Communications and Networking, vol. 2, no. 2, pp. 580–595, Jun 2018.
  • [5] L. Wang, F. Zhang, J. A. Aroca, A. V. Vasilakos, K. Zheng, C. Hou, D. Li, and Z. Liu, “Greendcn: A general framework for achieving energy efficiency in data center networks,” IEEE Journal on Selected Areas in Communications, vol. 32, no. 1, pp. 4–15, Jan 2014.
  • [6] J. Liu, Y. Zhang, Y. Zhou, D. Zhang, and H. Liu, “Aggressive resource provisioning for ensuring qos in virtualized environments,” IEEE Transactions on Cloud Computing, vol. 3, no. 2, pp. 119–131, Apr 2015.
  • [7] M. Alicherry and T. Lakshman, “Optimizing data access latencies in cloud systems by intelligent virtual machine placement,” in Proceedings - IEEE INFOCOM, Apr 2013, pp. 647–655.
  • [8] R. Cohen, L. Lewin-Eytan, J. (Seffi) Naor, and D. Raz, “Almost optimal virtual machine placement for traffic intense data centers,” in Proceedings - IEEE INFOCOM, Apr 2013, pp. 355–359.
  • [9] J. Mei, K. Li, Z. Tong, Q. Li, and K. Li, “Profit maximization for cloud brokers in cloud computing,” IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 1, pp. 190–203, Jan 2019.
  • [10] V. D. Valerio, V. Cardellini, and F. L. Presti, “Optimal pricing and service provisioning strategies in cloud systems: A stackelberg game approach,” in 2013 IEEE Sixth International Conference on Cloud Computing, Jun 2013, pp. 115–122.
  • [11] J. He, D. Wu, Y. Zeng, X. Hei, and Y. Wen, “Toward optimal deployment of cloud-assisted video distribution services,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 23, pp. 1717–1728, Oct 2013.
  • [12] L. A. Barroso, J. Clidaras, and U. Hoelzle, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.   Morgan & Claypool, 2013.
  • [13] C. Jiang, G. Han, J. Lin, G. Jia, W. Shi, and J. Wan, “Characteristics of co-allocated online services and batch jobs in internet data centers: A case study from alibaba cloud,” IEEE Access, vol. 7, pp. 22 495–22 508, Feb 2019.
  • [14] W. Ni, Y. Zhang, and W. W. Li, “An optimal strategy for resource utilization in cloud data centers,” IEEE Access, vol. 7, pp. 158 095–158 112, Oct 2019.
  • [15] W. Ni, W. Li, and M. Alam, “Determination of optimal call admission control policy in wireless networks,” IEEE Transactions on Wireless Communications, vol. 8, pp. 1038 – 1044, Mar 2009.
  • [16] M. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Mar 2005.
  • [17] X. Chao, H. Chen, and W. Li, “Optimal control for a tandem network of queues with blocking,” Acta Mathematicae Applicatae Sinica, vol. 13, no. 4, pp. 425–437, Oct 1997. [Online]. Available: https://doi.org/10.1007/BF02009552