Task Admission Control and Boundary Analysis of Cognitive Cloud Data Centers
Abstract
A novel cloud data center (DC) model with cognitive capabilities for real-time (online) tasks as well as batch tasks is studied here. In this model, a DC can determine the cost of using resources, and a user with online or batch tasks may decide whether or not to pay for the services. Online service tasks have a higher priority in getting service than batch tasks, and both types of tasks require a certain number of virtual machines (VMs). By targeting the maximization of the total discounted reward, the optimal policy for admitting batch tasks is verified to be a state-related control limit policy. Next, a lower bound and an upper bound for such an optimal policy are derived for estimation and practical use. Finally, a comprehensive set of experiments on various cases is conducted to validate the proposed model and solution. As a demonstration, a machine learning method is adopted to show how the optimal values can be obtained by using a feed-forward neural network model. The results achieved in this paper are expected to be useful for various cloud data centers with cognitive characteristics seeking an economically optimal strategy.
Index Terms:
Cloud Data Center, Cognitive Network, Optimal Strategy, Cost Efficiency, Bandwidth Allocation.

I Introduction
With the rapid development of the Internet and computing technology, more and more data are continually being produced all over the world. Nowadays, cloud computing has become fundamental for IT operations worldwide, replacing traditional business models. Enterprises can now access the vast majority of software and services online through a virtualized environment, avoiding expensive investments in IT infrastructure. On the other hand, challenges in cloud computing still need to be addressed, such as security, energy efficiency, and automated service provisioning. According to various studies, electricity use by data centers (DCs) has increased significantly, due largely to explosive growth in both the number and the density of DCs [1]. To serve the vast and rapidly growing needs of online businesses, infrastructure providers often end up over-provisioning their resources (e.g., the number of physical servers) in the DC. Energy efficiency in DCs is therefore a goal of fundamental importance. Various energy-aware scheduling approaches have been proposed to reduce power and energy consumption in DCs by minimizing the number of active physical servers hosting the active virtual machines (VMs); VMs can also be migrated between different hosts if necessary [2, 3, 4].
There are basically two parties in the cloud computing paradigm: the cloud service providers (CSPs) and the clients, each of which has its own role in providing and using the computing resources. While enjoying convenient on-demand access to computing resources or services, the clients need to pay for this access, and CSPs can make a profit by charging their clients for the services provided. Clients can avoid the costs associated with 'in-house' provisioning of computing resources and, at the same time, have access to a larger pool of computing resources than they could possibly own themselves. Many different approaches can be used to evaluate cloud computing service quality, and various optimization methods can be used to optimize it [5, 6, 7, 8, 9, 10, 11].
Note that, in general, there are two types of applications in a DC: service applications and batch applications [12]. Service applications tend to generate many requests with low processing needs, whereas batch applications tend to include a small number of requests with large processing needs. Unlike batch applications, which are throughput sensitive, service applications are typically response-time sensitive. In order to reduce power and energy costs, CSPs can mix online and batch tasks on the same cluster [13]. Although the co-allocation of such tasks improves machine utilization, it challenges the DC scheduler, especially for latency-critical online services.
The problem under investigation in our current research concerns a novel cognitive DC model wherein:

1. a DC can determine the cost of using resources, and a cloud service user can decide whether or not it will pay the price for the resource for an incoming task;

2. the online service tasks have a higher priority than batch tasks;

3. both types of tasks need a certain number of virtual machines (VMs) to be processed.
This paper's major contributions include:

1. In order to achieve the maximum total discounted expected reward for any initial state in a DC where online services have a higher priority than batch tasks, a Markov decision process model for DCs serving both online services and batch tasks is first established to obtain the optimal policy on when to admit or reject a task. As far as we know, this is the first time that a cognitive DC model as described in this paper has been modeled in this way, with several major theoretical results obtained. This research also significantly extends our previous work [14], addressing the additional challenges that arise when moving from the non-priority treatment of tasks in a regular DC model to the priority treatment of tasks in a cognitive DC model.

2. Through a systematic probability analysis, the optimal policy on when to admit or reject a task is verified to be a state-related control limit policy.

3. Furthermore, to make the control limit policy more efficiently usable in a cognitive DC model, a lower bound and an upper bound for the optimal policy values are derived theoretically for estimation in practice.

4. Finally, a comprehensive set of experiments on various cases is conducted to validate the proposed solution. As a demonstration, a machine learning method is adopted to show how to obtain the optimal values by using a feed-forward neural network model. The results offered in this paper are expected to be useful for various cloud data centers with different cognitive characteristics seeking an economically optimal strategy.
II Model Description and Analysis
This section consists of two subsections: one is the description of the cognitive DC model with all the needed parameters, and the other is the establishment of the corresponding Markov decision process model along with its important components. For brevity, online service tasks are referred to as type-1 tasks and batch tasks as type-2 tasks.
II-A The Description of the cognitive DC model
The cognitive DC under investigation serves two types of application tasks: service tasks (type-1) and batch tasks (type-2). A type-1 task acts as a primary user (PU) and a type-2 task as a secondary user (SU), each of which requires resources in the DC. A system work flow diagram is drawn in Fig. 1.

The other detailed assumptions for the cognitive DC are given as follows:

1. There is a total of C VMs defining the resource capacity of the DC, and the two types of tasks share these C VMs. A type-1 task is a time-sensitive service task and needs b VMs for service (here b is a given non-zero positive integer); a type-2 task is a specific application task and needs one VM for a sequence of operation steps. A type-1 task has a higher priority than a type-2 task, as explained in the following items. Since the system mainly provides service for type-1 tasks, it is always assumed that C is an integer multiple of b.

2. The arrival processes of type-1 and type-2 tasks are Poisson processes with their respective rates, and the processing time of a task on one VM follows a negative exponential distribution with the corresponding service rate. If a type-1 task is processed on b VMs, its overall service rate is b times its per-VM rate.

3. When a type-2 task comes to the system, the CSP will process it immediately if a VM is available. However, if all VMs are busy, the CSP decides whether to admit or reject the task based on the current numbers of type-1 and type-2 tasks in the system. A rejected task leaves the system, while an admitted task is put in a buffer to wait for service whenever a VM is released. The waiting discipline in the buffer is ignored because the type-2 tasks in the buffer are indistinguishable. When a task completes its service, it leaves the system and the corresponding VM is released for use by other tasks. When a type-2 task under processing is interrupted by the arrival of a higher-priority type-1 task, the interrupted task is put in the buffer to wait for a new service whenever a VM becomes free.

4. When a type-1 task comes to the system, the CSP takes one of two actions, depending on the number of type-1 tasks currently in the system:

   (a) if b VMs can still be made available for it, provide the best service by allocating b VMs, which may include interrupting the service of several type-2 tasks under processing;

   (b) otherwise, reject the task.

5. In this paper, we focus on the admission control of type-2 tasks. Admitting and then serving a type-2 task contributes a fixed reward to the CSP. However, if a VM serving a type-2 task is preempted by a higher-priority type-1 task, a cost is incurred for that interruption. To hold the tasks in the system, the CSP pays a holding price at a state-dependent rate to manage the VMs (resources) in the DC, where the state records the number of type-1 tasks in process and the number of type-2 tasks in process and in the buffer.

It is worth pointing out that some of the notation above is used exactly as in our recent work [14], so that the reader may easily relate this paper to our recent works in this area.
To better present all major given parameters of this paper, we summarize them in Table I below:
| Symbol | Description |
|---|---|
| C | Number of VMs in the DC |
| | Arrival rate of type-k tasks (k = 1, 2) |
| | Service rate of type-k tasks (k = 1, 2) |
| | Continuous-time discount factor |
| b | Number of VMs needed for processing a type-1 task |
| | Interruption cost of a type-2 task preempted by a type-1 task |
| | Reward of completing a type-2 task |
| | Holding cost rate at a given state |
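To keep these quantities together in the numerical sketches that appear later, the parameters of Table I can be collected in a small container. This is only an illustrative sketch: the field names (lam1, mu1, c1, etc.) and the example values are our own assumptions rather than the paper's notation or experimental settings.

```python
# A minimal container for the model parameters of Table I.
# Field names and example values are assumptions made for illustration only.
from dataclasses import dataclass
from typing import Callable

@dataclass
class DCModel:
    C: int                           # number of VMs in the DC
    b: int                           # VMs needed to process one type-1 task
    lam1: float                      # arrival rate of type-1 (online) tasks
    lam2: float                      # arrival rate of type-2 (batch) tasks
    mu1: float                       # per-VM service rate of type-1 tasks
    mu2: float                       # per-VM service rate of type-2 tasks
    alpha: float                     # continuous-time discount factor
    R: float                         # reward for completing an admitted type-2 task
    c1: float                        # interruption cost of a preempted type-2 task
    f: Callable[[int, int], float]   # holding-cost rate at state (i, j)

# Example instance with illustrative values (not the paper's experimental setting).
model = DCModel(C=8, b=2, lam1=1.0, lam2=2.0, mu1=6.0, mu2=8.0,
                alpha=0.1, R=5.0, c1=1.0, f=lambda i, j: 0.05 * (i + j) ** 2)
```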
II-B Markov Decision Process Model and its Components Analysis
Our objective in this research is to find the optimal policy such that the total expected discounted reward is maximized. In detail, denoting the state of the system at each time point, the action taken in that state, and the reward obtained when that action is selected, our objective is to find an optimal policy that attains the maximum total expected discounted reward, as defined below, for every initial state.

(1)

Here, a policy specifies the decision rule to be used at every decision epoch, and the discounting in (1) is governed by the continuous-time discount factor. A policy gives the decision maker a prescription for action selection for any possible future system state or history.
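For concreteness, the following is a hedged reconstruction of the objective in (1), written in the standard continuous-time discounted form of [16, Ch. 11.5]; the symbols used here (state process, action process, reward rate, and discount factor) are our own choices of notation, since the original symbols are described above only verbally.

```latex
% Assumed notation: X_t = state at time t, Y_t = action in force at time t,
% r(X_t, Y_t) = reward rate, \alpha > 0 = continuous-time discount factor.
v_\alpha^{\pi}(s) \;=\; \mathbb{E}_s^{\pi}\!\left[\int_0^{\infty} e^{-\alpha t}\, r\bigl(X_t, Y_t\bigr)\, \mathrm{d}t\right],
\qquad
v_\alpha^{*}(s) \;=\; \sup_{\pi}\, v_\alpha^{\pi}(s).
```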
Based on the model description and the above objective, we can now establish a Markov decision process as follows:
1. We first describe the state of the system operating process by the numbers of type-1 and type-2 tasks in the system, which satisfy the capacity constraints of the DC. The event space contains four events: a type-1 departure and a type-2 departure from the system after service, a type-1 arrival, and a type-2 arrival. Since the state transitions depend not only on the numbers of tasks in the system but also on which departure or arrival event occurs, we define a new state space in which a state is generally written as a triple consisting of the number of type-1 tasks, the number of type-2 tasks, and the event that will happen in that state. Please note that this specification of the event is one of the major technical differences from paper [15], in which the event is assumed to happen before the state changes.

2. For the states associated with departure events, the only available action is to continue. Similarly, for the states associated with arrival events, the available actions are to reject the request or to admit it, and the action space consists of these two actions.

3. Let us also introduce the number of VMs occupied by type-1 tasks, the number of VMs occupied by type-2 tasks, and the number of VMs serving type-2 tasks that would be preempted if a type-1 task were admitted; all of these quantities are determined by the current state. The decision epochs are the time points at which a task arrives at or leaves the system. Based on our assumptions, the time between two consecutive epochs is exponentially distributed, with a rate equal to the total rate of arrivals and potential service completions in the current state. Since a departure event can only happen when there is a corresponding task in the system, this total rate depends on the state and on the action taken.

4. Let the transition probability denote the probability that the system occupies a given state at the next epoch if, at the current epoch, the system is in a given state and the decision maker takes a given action. For the departure events, the transition probabilities follow directly from the exponential service-time assumption and the numbers of busy VMs in the current state; similar expressions can be derived for the different departure cases. For the arrival events, since admitting an incoming task changes the system state immediately (adding one task or not), the transition probabilities are obtained in the same way from the post-decision state.

5. Because the system state does not change between decision epochs, it follows from Chapter 11.5.2 of [16] and our assumptions that the expected discounted reward between epochs consists of the lump reward or cost incurred at the decision epoch and the expected discounted holding cost accumulated until the next epoch, where the holding cost is accrued at the state-dependent rate introduced above.

In the next section we will prove that there exists a state-related threshold for accepting type-2 tasks if the cost function has certain special properties.
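To make the construction above concrete, the following is a minimal value iteration sketch for a uniformized version of this admission-control MDP. It is an illustrative simplification under assumed notation and assumed numerical values: states are written as (i, j) with i type-1 and j type-2 tasks, a cap J_MAX is imposed on the number of type-2 tasks so that the state space is finite, and the holding cost is a quadratic placeholder; it is not the paper's exact formulation.

```python
import numpy as np

# Illustrative parameters (assumptions, not the paper's experimental setting).
C, b = 8, 2                  # total VMs; VMs needed per type-1 task
lam1, lam2 = 1.0, 2.0        # arrival rates of type-1 / type-2 tasks
mu1, mu2 = 6.0, 8.0          # per-VM service rates
alpha = 0.1                  # continuous-time discount factor
R, c1 = 5.0, 1.0             # admission reward for a type-2 task, cost per preempted VM
J_MAX = 24                   # assumed cap on type-2 tasks (in service plus buffer)
I_MAX = C // b               # at most C/b type-1 tasks can be in service

def f(i, j):                 # assumed convex, nondecreasing holding-cost rate
    return 0.05 * (i + j) ** 2

# Uniformization: LAM bounds the total event rate of every state, and the
# discrete-time discount factor becomes LAM / (LAM + alpha).
LAM = lam1 + lam2 + I_MAX * b * mu1 + C * mu2
gamma = LAM / (LAM + alpha)

def value_iteration(tol=1e-6, max_iter=100000):
    V = np.zeros((I_MAX + 1, J_MAX + 1))
    for _ in range(max_iter):
        Vn = np.empty_like(V)
        for i in range(I_MAX + 1):
            for j in range(J_MAX + 1):
                s2 = min(j, C - i * b)                    # type-2 tasks in service
                free = C - i * b - s2                     # idle VMs
                # Type-1 arrival: admitted whenever b VMs can be freed, else lost.
                if i < I_MAX:
                    preempted = max(0, b - free)          # type-2 services interrupted
                    t1 = -c1 * preempted + V[i + 1, j]
                else:
                    t1 = V[i, j]
                # Type-2 arrival: admit (collect R) or reject, whichever is better.
                t2 = max(V[i, j], R + V[i, j + 1]) if j < J_MAX else V[i, j]
                d1 = V[i - 1, j] if i > 0 else V[i, j]    # type-1 departure
                d2 = V[i, j - 1] if s2 > 0 else V[i, j]   # type-2 departure
                r1, r2 = i * b * mu1, s2 * mu2
                stay = LAM - lam1 - lam2 - r1 - r2        # fictitious self-transition rate
                Vn[i, j] = (-f(i, j) / (LAM + alpha)
                            + gamma * (lam1 * t1 + lam2 * t2 + r1 * d1
                                       + r2 * d2 + stay * V[i, j]) / LAM)
        if np.max(np.abs(Vn - V)) < tol:
            return Vn
        V = Vn
    return V
```

The max over admitting and rejecting inside the type-2 arrival branch is exactly the admission decision analyzed in the next section, and the fictitious self-transition implements the rate uniformization that is also used in Section III.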
III Optimal Stationary State-related Control Limit Policy
A policy is stationary if the decision rule used at each decision epoch is the same. Furthermore, a policy is called a control limit policy (or a threshold policy) if, for a given number of type-1 and type-2 tasks in the system, there is a constant (threshold) for each number of type-1 tasks such that the system will accept an arriving type-2 task whenever the number of type-2 tasks currently in the system is less than this threshold; that is, the decision rule for a type-2 arrival is:

(10)
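Written out under assumed symbols (i type-1 tasks and j type-2 tasks in the system, and T(i) the state-related threshold), a control limit rule of the kind in (10) takes the form:

```latex
% Assumed notation: the decision d applies to a type-2 arrival when the system
% holds i type-1 and j type-2 tasks; T(i) is the state-related threshold.
d(i, j) \;=\;
\begin{cases}
  \text{admit},  & j < T(i),\\[2pt]
  \text{reject}, & j \ge T(i).
\end{cases}
```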
III-A Total Discounted Reward
Denote by a constant the uniformization rate, which is larger than the total transition rate out of any state. From Chapter 11.5.2 of [16], we know that there is a unique optimal solution of our model and that this solution is attained by a stationary policy satisfying

(11)

By using the above equation together with the uniformization technique described in [16], we have

(12)

This means that

(13)

Similarly, it is easy to find that the corresponding expressions coincide, which shows the equality between the different departure events. This leads us to define a new function as below:

(14)
(15)

for any admissible state.

Notice that this new function depends only on the state and not on the event that happens. This observation will greatly simplify the proofs in the next several sections.
Similarly to the above results for a departure event, we can consider an arrival event and obtain the following results:

(16)

and

(17)

From the above equations, we can easily obtain the corresponding inequalities. In fact, these two inequalities become equalities when the corresponding action (admit or reject, respectively) is the best action. From this analysis, it is not hard to verify that

(18)

For type-1 tasks, since the system always accepts them until all VMs are in use, we have

(21)
III-B Optimal Result
Before presenting our main optimality result, we need to introduce a general result as below.

Lemma 1: Let an integer concave function be given and, for a given constant, define from it a new function through a maximization involving that constant. Then the new function is also an integer concave function.
Proof: We prove this lemma by considering the following three cases.

Case 1: If the first alternative of the maximization attains the maximum for every integer, then the new function differs from the given concave function only through fixed terms, and it is therefore concave.

Case 2: If the second alternative of the maximization attains the maximum for every integer, then, by the same argument, the new function is again concave.

Case 3: We now consider the case in which neither of the above two cases holds. In this case, we can first establish the boundary relation; indeed, if it did not hold, we could inductively verify the opposite relation for every integer by noting the assumption that the given function is integer concave. Since the increments of the given function are decreasing because of its concavity, and Case 2 does not hold for all integers in this case, there must exist a switching integer at which the maximizing alternative changes. From this analysis, we know the form of the new function on each side of this switching integer. The concavity of the new function is then verified from the concavity of the given function, the constant, and the following further observation for every integer:
Remark 1: It is worth pointing out that Lemma 1 imposes no condition on the given constant, which could be either negative or positive. In fact, we introduced this result in paper [17] for the case where the constant is non-positive, and in paper [14] for the case where it is non-negative. However, this is the first time that we point out this general result without any condition on the constant and also provide the mathematical verification.
Based on the above Lemma 1 and the expressions developed before it, we can now state and verify the following main optimality result:

Theorem 1: If the cost function is convex and increasing in the number of type-2 tasks for any given number of type-1 tasks, then the optimal policy is a control limit policy. That is, for any state, there must exist an integer threshold such that the decision is

(26)
Proof: If all VMs are busy when an SU arrives, we have the corresponding relations, and therefore

(27)

which is independent of the happening event. Further, by using the notation introduced above, we can rewrite equation (13) as below:

(28)

For any two-dimensional integer function, we introduce the following two definitions, respectively:

(29)
(30)

From the observation in equation (27) and equation (28), we then have

(31)
Next, by noting

(35)

and equation (21), we have

(39)

By a similar manipulation of the two equations (31) and (39), using the results in equations (27) and (35), we have

(40)

and

(44)

With the preparation of all the equations from (28) to (44), we can now use the value iteration method, in three steps, to show that the value function is concave and nonincreasing in the number of type-2 tasks for any given number of type-1 tasks, as below:
Step 1: Initialize the iteration; by noting equations (21) and (18), we obtain the corresponding initial expressions. Substituting these three results into equation (12), we obtain the first iterate. Therefore, for any given number of type-1 tasks, the first iterate is concave and nonincreasing in the number of type-2 tasks.

Step 2: By using the concavity and nonincreasing property established above, together with equation (21) for the case when all VMs are busy, or equations (39) and (44), we know that the corresponding functions are concave and nonincreasing for any given number of type-1 tasks. By further applying the result in Lemma 1, we know that the maximized function is also concave and nonincreasing for any given number of type-1 tasks. With these results in mind, and using the results in equations (31) and (40), we obtain two inequalities which justify that, for any given number of type-1 tasks, the next iterate is nonincreasing and concave in the number of type-2 tasks.

Step 3: Finally, by noting Theorem 11.3.2 of [16], which states that the optimality equation has a unique solution, we know that the value iteration converges. Therefore, as the iteration index goes to infinity, the limiting value function is concave and nonincreasing in the number of type-2 tasks for any given number of type-1 tasks.

Finally, by using equation (18) and the nonincreasing and concave property of the value function, it is straightforward to conclude that the optimal policy is a control limit policy as stated in the theorem.
The proof is now completed.
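Continuing the illustrative value iteration sketch from Section II-B (same assumed notation, parameters, and admission reward R), the state-related control limits asserted by Theorem 1 can be read off the converged value function by locating, for each number of type-1 tasks, the first point at which rejecting a type-2 arrival is at least as good as admitting it:

```python
def control_limits(V, R):
    """For each row i of the value function V[i][j], return the smallest j at
    which rejection is (weakly) preferred to admission, i.e. the threshold T(i).
    R is the admission reward that was used when V was computed."""
    n_i, n_j = len(V), len(V[0])
    thresholds = []
    for i in range(n_i):
        T = n_j - 1                          # accept everywhere unless rejection wins earlier
        for j in range(n_j - 1):
            if V[i][j] >= R + V[i][j + 1]:   # rejecting is at least as good as admitting
                T = j
                break
        thresholds.append(T)
    return thresholds

# Example usage with the earlier sketch (illustrative values):
#   V = value_iteration()
#   print(control_limits(V, R))   # one threshold per possible number of type-1 tasks
```

By the concavity of the value function in the number of type-2 tasks, the accept region below each threshold is an interval, which is exactly the control limit structure stated in the theorem.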
IV Bound Analysis for the Optimal Result
From the above analysis, we know that the discounted optimal policy is a control limit policy if the cost function is convex and increasing in the number of type-2 tasks for any given number of type-1 tasks. However, since identifying the optimal threshold value is quite important in reality, determining this threshold value of the control limit policy, or at least its range, becomes a challenging issue once the form of the optimal strategy has been verified. In this section, we work toward identifying the range of the optimal value and derive a useful result in terms of a lower bound and an upper bound for it. We first introduce a general result as below.
Lemma 2: Let two integer concave functions be given and, for a given constant, define from each of them a new function through the same maximization as in Lemma 1. Then the stated ordering between the two new functions holds whenever the analogous ordering holds between the two given functions for every integer.
Proof: For any given integer, we only need to verify that the result is true in the following three cases.

Case 1: Suppose the first case holds. In this case, it is straightforward to obtain the required relation.

Case 2: Suppose the second case holds. In this case, we know the corresponding expressions for each of the two functions (k = 1, 2), and then the required relation follows directly from the assumption of the lemma.

Case 3: Suppose the remaining case holds. In this case, from the analysis in Case 1 and Case 2 above, the required relation again follows.

Therefore, the required relation holds in all cases. The proof is now completed.
Under the condition that the cost function is convex and increasing in the number of type-2 tasks for any given number of type-1 tasks, we already know from Theorem 1 that the value function is concave and decreasing in the number of type-2 tasks, and therefore we have

(47)

Also, from Lemma 1 and equations (21) and (18), we have

(48)
(49)
We also need to introduce the result below before presenting our main bound result:

Lemma 3: If the cost function is convex and nondecreasing in the number of type-2 tasks for any given number of type-1 tasks, and is nondecreasing in the number of type-1 tasks, then the associated difference function is nonincreasing.

Proof: The detailed verification is included in the Appendix.

With these preparations, we can now provide and prove our bound result as below.
Theorem 2: If the cost function is convex and nondecreasing in the number of type-2 tasks for any given number of type-1 tasks, and is nondecreasing in the number of type-1 tasks, then we have the following:

1. A lower bound of the optimal threshold value is obtained whenever there exists an integer such that

(50)

2. An upper bound of the optimal threshold value is obtained whenever there exists an integer such that

(51)
Proof: We provide the following justifications:

1. It is intuitive that an integer is a lower bound for the optimal threshold value in Theorem 1 as long as the action at the corresponding state is acceptance for an arriving SU. Next, since the state considered has the largest value among the states concerned, an acceptance action for an arriving SU at this state must imply an acceptance action for an arriving SU at the other states concerned. Therefore, the optimal threshold value of Theorem 1 for those states is at least the optimal threshold value for this state, which also implies that the lower bound obtained from this state is closer to the optimal threshold value than the bounds obtained from the other states.

To verify the lower bound result in Theorem 2, we only need to verify that the action at this state is acceptance for an arriving SU under the condition in equation (50); equivalently, we only need to verify that if the action at this state is rejection, then equation (50) cannot hold. In fact, at this state there is no free VM for either a PU or an SU arrival, which means the system will take the action to reject both PU and SU arrivals. By noticing equations (21) and (18), if the system rejects the arriving PU and SU at this state, we have the corresponding relations. Substituting these results into equation (49), and recalling the concavity established in the verification of Theorem 1 together with the result of Lemma 3, we know that the right-hand side of the equation is non-negative. Therefore, we have the inequality below, where the last step comes from the rejection of the SU at this state by equation (18). Thus, equation (50) cannot hold, and the verification of the lower bound for any state is now completed.
2. Conversely, an integer is an upper bound for the optimal threshold value in Theorem 1 as long as the action at the corresponding state is rejection for an arriving SU. We first consider the case in which there is at least one free VM when an SU arrives. To verify the upper bound result in Theorem 2, we only need to verify that the action at this state is rejection for an arriving SU; equivalently, we only need to verify that if the action at this state is acceptance, then equation (51) cannot hold. In fact, at this state, by noticing equations (21) and (18), if the system accepts the arriving SU, we have the corresponding relations. Since the action is to accept the arrival, from (18) we also have the associated inequality. Substituting these results into equation (49), and again recalling the concavity established in the verification of Theorem 1 together with the result of Lemma 3, we obtain the required relation, where the last inequality follows from the acceptance of the SU at this state by equation (18). Thus, equation (51) cannot hold.

For the states in which all the VMs are busy when an SU arrives, a similar upper bound deduction can be derived from equation (49). Since the state considered has the smallest value among the states concerned, the upper bound for this state is also an upper bound for any other such state. The verification of the upper bound in the theorem is now completed by the justification at the beginning of this item.
Remark 2: More specifically, if equation (50) never holds for any state, it can be observed from equation (51), in the corresponding case, that the upper bound reduces to zero. This means the system will not accept any such arrivals into the system if equation (50) never holds.
V Numerical Analysis
We have theoretically verified that the optimal policy maximizing the total expected discounted reward in equation (1) is a control limit policy, or threshold policy, for accepting type-2 arrivals. Our CTMDP model contains several parameters, such as arrival rates, departure rates, rewards, and the cost function. For simulation and numerical analysis purposes, we set some of the parameters as listed in TABLE II. It can be seen from TABLE II that the loads of type-1 and type-2 tasks are small, which means the system is lightly loaded. The other parameters of the system, including the number of VMs, the number of VMs needed by a type-1 task, the reward, the interruption cost, the discount factor, and the holding cost function, are fixed for these experiments.
1 | 6 | |
2 | 8 |
However, exactly finding this threshold value or the optimal objective value for a given problem is always challenging, especially in a continuously changing environment where arrival rates, service rates, and rewards keep changing in the real world. While we demonstrate in the next subsection how to derive the thresholds by using the value iteration method, the calculation in this way is always time-consuming.

In the third subsection, we therefore propose a machine learning method and demonstrate how to obtain or estimate the threshold value and the optimal objective value by using a feed-forward neural network model. A feed-forward neural network is an artificial neural network in which the connections between the units do not form a cycle.
V-A Threshold Policy
With this parameter setting, by using the value iteration method we obtain the values listed in TABLE III.
TABLE III: Values obtained by the value iteration method

| Type-1 \ Type-2 | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 96.53 | 96.38 | 96.10 | 95.69 | 95.16 | 94.51 |
| 1 | 96.50 | 96.33 | 96.03 | 95.61 | 95.07 | 94.40 |
| 2 | 96.43 | 96.24 | 95.89 | 95.38 | 94.72 | 93.89 |

| Type-1 \ Type-2 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|
| 0 | 93.72 | 92.80 | 91.74 | 90.55 | 89.22 | 87.63 |
| 1 | 93.51 | 92.41 | 91.11 | 89.61 | 87.90 | 85.93 |
| 2 | 92.81 | 91.49 | 89.94 | 88.14 | 86.11 | 83.78 |

| Type-1 \ Type-2 | 12 | 13 | 14 | 15 | 16 | 17 |
|---|---|---|---|---|---|---|
| 0 | 85.73 | 83.51 | 80.95 | 78.00 | 74.66 | 70.89 |
| 1 | 83.65 | 81.03 | 78.03 | 74.62 | 70.79 | 66.50 |
| 2 | 81.11 | 78.06 | 74.61 | 70.71 | 66.35 | 61.51 |

| Type-1 \ Type-2 | 18 | 19 | 20 | 21 | 22 | 23 |
|---|---|---|---|---|---|---|
| 0 | 66.68 | 61.99 | 56.81 | 51.11 | 44.88 | 38.09 |
| 1 | 61.73 | 56.46 | 50.68 | 44.34 | 37.44 | 29.94 |
| 2 | 56.17 | 50.29 | 43.87 | 36.87 | 29.26 | 21.02 |
As observed from TABLE III, the values are concave and decreasing in the direction of an increasing number of type-2 tasks, which fits our theoretical result.
TABLE IV: Admission decisions for type-2 arrivals (1 = accept, 0 = reject)

| Type-1 \ Type-2 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |

| Type-1 \ Type-2 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |

| Type-1 \ Type-2 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
As shown in TABLE IV, "1" means the system will accept the arrival and "0" means the system will reject the arrival. Since the reward is large, we see from the table that the system will accept type-2 tasks into the buffer even when some tasks are already waiting.

Next we decrease the reward, and the resulting values are listed in TABLE V. As seen from TABLE VI, with a smaller reward the system will not accept as many type-2 tasks into the waiting buffer.
TABLE V: Values obtained by the value iteration method with a smaller reward

| Type-1 \ Type-2 | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 16.53 | 16.38 | 16.10 | 15.69 | 15.16 | 14.51 |
| 1 | 16.50 | 16.33 | 16.03 | 15.61 | 15.07 | 14.40 |
| 2 | 16.43 | 16.24 | 15.89 | 15.38 | 14.72 | 13.89 |

| Type-1 \ Type-2 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|
| 0 | 13.72 | 12.80 | 11.75 | 10.57 | 9.26 | 7.68 |
| 1 | 13.51 | 12.43 | 11.14 | 9.65 | 7.97 | 6.03 |
| 2 | 12.83 | 11.52 | 9.99 | 8.22 | 6.22 | 3.94 |

| Type-1 \ Type-2 | 12 | 13 | 14 | 15 | 16 | 17 |
|---|---|---|---|---|---|---|
| 0 | 5.82 | 3.64 | 1.12 | -1.76 | -5.03 | -8.72 |
| 1 | 3.79 | 1.22 | -1.72 | -5.05 | -8.80 | -12.99 |
| 2 | 1.32 | -1.66 | -5.04 | -8.85 | -13.11 | -17.85 |
TABLE VI: Admission decisions for type-2 arrivals with a smaller reward (1 = accept, 0 = reject)

| Type-1 \ Type-2 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |

| Type-1 \ Type-2 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
V-B From Thresholds to Values
From equation (1), the optimal policy yields the maximum total expected discounted reward when starting from an initial state. Therefore, by using the obtained thresholds as a policy, we can also calculate the corresponding total expected discounted reward using equation (1). Using the rate uniformization technique, the expected time between two epochs is the reciprocal of the uniformization rate. The calculation runs until the accumulated discount factor drops below a small tolerance, and the resulting expected values are shown in TABLEs VII and VIII.
TABLE VII: Expected values under the obtained threshold policy (smaller reward)

| Type-1 \ Type-2 | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 16.53 | 16.38 | 16.10 | 15.69 | 15.16 | 14.51 |
| 1 | 16.50 | 16.33 | 16.03 | 15.61 | 15.07 | 14.40 |
| 2 | 16.43 | 16.24 | 15.89 | 15.38 | 14.72 | 13.89 |

| Type-1 \ Type-2 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|
| 0 | 13.72 | 12.80 | 11.75 | 10.57 | 9.26 | 7.68 |
| 1 | 13.51 | 12.43 | 11.14 | 9.65 | 7.97 | 6.03 |
| 2 | 12.83 | 11.52 | 9.99 | 8.22 | 6.22 | 3.94 |

| Type-1 \ Type-2 | 12 | 13 | 14 | 15 | 16 | 17 |
|---|---|---|---|---|---|---|
| 0 | 5.82 | 3.64 | 1.12 | -1.76 | -5.03 | -8.72 |
| 1 | 3.79 | 1.22 | -1.72 | -5.05 | -8.80 | -12.99 |
| 2 | 1.32 | -1.66 | -5.04 | -8.85 | -13.11 | -17.85 |
TABLE VIII: Expected values under the obtained threshold policy (larger reward)

| Type-1 \ Type-2 | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 96.53 | 96.38 | 96.10 | 95.69 | 95.16 | 94.51 |
| 1 | 96.50 | 96.33 | 96.03 | 95.61 | 95.07 | 94.40 |
| 2 | 96.43 | 96.24 | 95.89 | 95.38 | 94.72 | 93.89 |

| Type-1 \ Type-2 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|
| 0 | 93.72 | 92.79 | 91.74 | 90.55 | 89.22 | 87.63 |
| 1 | 93.51 | 92.41 | 91.11 | 89.61 | 87.90 | 85.93 |
| 2 | 92.81 | 91.49 | 89.94 | 88.14 | 86.11 | 83.78 |

| Type-1 \ Type-2 | 12 | 13 | 14 | 15 | 16 | 17 |
|---|---|---|---|---|---|---|
| 0 | 85.73 | 83.51 | 80.95 | 78.00 | 74.66 | 70.89 |
| 1 | 83.65 | 81.03 | 78.03 | 74.62 | 70.79 | 66.49 |
| 2 | 81.11 | 78.06 | 74.61 | 70.71 | 66.35 | 61.51 |

| Type-1 \ Type-2 | 18 | 19 | 20 | 21 | 22 | 23 |
|---|---|---|---|---|---|---|
| 0 | 66.67 | 61.98 | 56.81 | 51.11 | 44.88 | 38.09 |
| 1 | 61.73 | 56.46 | 50.67 | 44.34 | 37.44 | 29.94 |
| 2 | 56.17 | 50.29 | 43.87 | 36.86 | 29.26 | 21.02 |
Comparing TABLE VII with TABLE V and TABLE VIII with TABLE III, we see that the differences between these tables are very small, which confirms the correctness of the calculation by the value iteration method in the last subsection.
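As an illustrative cross-check of this calculation, the total expected discounted reward of a given threshold policy can also be estimated by simulating the continuous-time chain directly. The sketch below repeats the assumed constants from the earlier value iteration sketch, runs each sample path until the discount weight exp(-alpha*t) drops below a small tolerance (mirroring the stopping rule described above), and averages over replications; all names and numerical values are again illustrative assumptions rather than the paper's settings.

```python
import math
import random

# Same illustrative constants as in the earlier value iteration sketch (assumptions).
C, b = 8, 2
lam1, lam2 = 1.0, 2.0
mu1, mu2 = 6.0, 8.0
alpha, R, c1, J_MAX = 0.1, 5.0, 1.0, 24

def f(i, j):                                   # assumed holding-cost rate
    return 0.05 * (i + j) ** 2

def simulate_threshold_policy(T, i0=0, j0=0, reps=500, eps=1e-6, seed=1):
    """Monte Carlo estimate of the total expected discounted reward from state
    (i0, j0) under a threshold policy T, where T[i] is the control limit used
    for type-2 arrivals when i type-1 tasks are present."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        i, j, t, value = i0, j0, 0.0, 0.0
        while math.exp(-alpha * t) > eps:
            s2 = min(j, C - i * b)             # type-2 tasks actually in service
            rates = [lam1, lam2, i * b * mu1, s2 * mu2]
            rate_sum = sum(rates)
            dt = rng.expovariate(rate_sum)
            # Discounted holding cost accrued over the interval (t, t + dt).
            value -= f(i, j) * (math.exp(-alpha * t) - math.exp(-alpha * (t + dt))) / alpha
            t += dt
            disc = math.exp(-alpha * t)
            u, event = rng.random() * rate_sum, len(rates) - 1
            for k, rk in enumerate(rates):     # pick which event fired
                if u < rk:
                    event = k
                    break
                u -= rk
            if event == 0 and i < C // b:      # type-1 arrival: admit, possibly preempting
                free = C - i * b - s2
                value -= disc * c1 * max(0, b - free)
                i += 1
            elif event == 1 and j < J_MAX and j < T[i]:  # type-2 arrival: admit below threshold
                value += disc * R
                j += 1
            elif event == 2:                   # type-1 departure
                i -= 1
            elif event == 3 and s2 > 0:        # type-2 departure
                j -= 1
        total += value
    return total / reps
```

Calling this function with the thresholds extracted in Section III would give a simulation-based counterpart of TABLEs VII and VIII, under the stated assumptions.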
V-C Threshold Estimation with Machine Learning
Taking the parameters of the CTMDP model as the input and the corresponding threshold values as the output, we can build the neural network model shown in Fig. 2. Based on this development, we can build the training data sets as shown in TABLE IX: the reward is chosen from the first row of TABLE IX and two further model parameters are chosen from the other two rows, while all the other parameters are kept unchanged.

The combinations of these parameters give us the training data inputs. More specifically, as shown in Fig. 2, this is a two-layer neural network with 30 hidden neurons. The inputs are the parameters of our CTMDP model, such as the reward, the arrival rates, and the departure rates, which makes the total number of input parameters 5; the outputs are the threshold values for each possible number of type-1 tasks, which is reflected in Fig. 2.
TABLE IX: Parameter values used to build the training data set

| Parameter | Training values |
|---|---|
| Reward | 1, 2, 3, 4, 5, 6, 7, 8 |
| | 1, 2, 3, 4, 5 |
| | 8, 10, 12, 14, 16 |

TABLE X: Threshold pairs (value iteration, neural network estimate)

| Type-1 | 2.3 | 3.3 | 4.3 | |
|---|---|---|---|---|
| 0 | (11, 11) | (13, 13) | (16, 16) | (18, 18) |
| 1 | (9, 9) | (12, 12) | (14, 15) | (17, 17) |
| 2 | (8, 7) | (11, 11) | (13, 14) | (16, 16) |

| Type-1 | 6.3 | 7.3 | 8.3 | |
|---|---|---|---|---|
| 0 | (20, 20) | (22, 22) | (23, 24) | (25, 25) |
| 1 | (19, 18) | (21, 20) | (22, 22) | (24, 24) |
| 2 | (18, 18) | (19, 20) | (21, 21) | (23, 23) |
After the model is trained, we can test it with parameter settings that differ from those in the initial training dataset. Each entry of TABLE X is a pair consisting of the real threshold value obtained from the value iteration method and the threshold value estimated by the machine learning method. Several parameter sets, different from those in TABLE IX, are chosen to compare the threshold pairs from the actual computation and the neural network model. From TABLE X, it is observed that the machine learning model gives a good estimation of the thresholds.
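A minimal sketch of this training setup is given below, assuming scikit-learn's MLPRegressor for the feed-forward network (one hidden layer of 30 neurons, five inputs, and one output per possible number of type-1 tasks). The assignment of the grids in TABLE IX to the type-2 arrival and service rates, the fixed values of the remaining rates, the helper name thresholds_from_value_iteration, and the test point in the usage comment are all illustrative assumptions.

```python
# Sketch of the threshold-estimation network, assuming scikit-learn's MLPRegressor.
# label_fn(R, lam1, lam2, mu1, mu2) is expected to return one threshold per possible
# number of type-1 tasks (e.g. computed with the value-iteration sketch given earlier).
import itertools
import numpy as np
from sklearn.neural_network import MLPRegressor

R_GRID    = [1, 2, 3, 4, 5, 6, 7, 8]     # rewards used for training (first row of TABLE IX)
LAM2_GRID = [1, 2, 3, 4, 5]              # assumed grid for the type-2 arrival rate
MU2_GRID  = [8, 10, 12, 14, 16]          # assumed grid for the type-2 service rate
LAM1_FIXED, MU1_FIXED = 1.0, 6.0         # kept unchanged during training (illustrative values)

def build_training_set(label_fn):
    """Enumerate the 8 x 5 x 5 parameter combinations and label each one with its
    vector of optimal thresholds computed by label_fn."""
    X, Y = [], []
    for R, lam2, mu2 in itertools.product(R_GRID, LAM2_GRID, MU2_GRID):
        X.append([R, LAM1_FIXED, lam2, MU1_FIXED, mu2])      # the five model inputs
        Y.append(label_fn(R, LAM1_FIXED, lam2, MU1_FIXED, mu2))
    return np.array(X, dtype=float), np.array(Y, dtype=float)

def fit_threshold_model(X, Y):
    """Feed-forward network with a single hidden layer of 30 neurons."""
    net = MLPRegressor(hidden_layer_sizes=(30,), activation="relu",
                       max_iter=5000, random_state=0)
    net.fit(X, Y)
    return net

# Example usage (thresholds_from_value_iteration is a hypothetical wrapper around
# the value_iteration and control_limits sketches given earlier):
#   X, Y = build_training_set(thresholds_from_value_iteration)
#   net = fit_threshold_model(X, Y)
#   print(np.rint(net.predict([[2.3, 1.0, 3.0, 6.0, 10.0]])))   # estimated thresholds
```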
VI Conclusion and Discussion
In order to reduce energy costs, cloud providers have started to mix online and batch tasks in the same data center. However, it is always challenging to efficiently optimize the assignment of suitable resources to users. To address this challenging problem, a novel cloud data center model with cognitive capabilities for real-time (online) tasks as well as batch tasks is studied in this paper. Here, a DC can determine the cost of using resources, and a user with online or batch tasks may decide whether or not to pay for the services. In particular, the online service tasks have a higher priority than batch tasks, and both types of tasks need a certain number of virtual machines (VMs). The objective here is to maximize the total discounted reward. The optimal policy to reach this objective when admitting batch tasks is verified to be a state-related control limit policy. After the optimal policy is characterized in general, it is hard to identify the range of the threshold values, which is particularly useful in reality. Therefore, we further consider the possible boundaries of the optimal value, and a lower bound and an upper bound for such an optimal policy are derived for estimation purposes. Finally, a comprehensive set of experiments on various cases is conducted to validate the proposed solution. As a demonstration, a machine learning method is adopted to show how to obtain the optimal values by using a feed-forward neural network model. It is our observation that the main idea of this research can be extended to maximizing the expected average reward or other related objective functions. The results offered in this paper are expected to be useful for various cloud data centers with different cognitive characteristics seeking an economically optimal strategy.
[Verification of Lemma 3] We now provide the verification of Lemma 3. Similarly to equation (31), we have

(52)

We can now use the value iteration method, in three steps, to show that the function concerned is nonincreasing for any given number of type-1 tasks, as below:
Step A-1: Similarly to the analysis in Theorem 1, initialize the iteration; we then have the corresponding initial expressions and, by the assumption on the cost function, we also have the required relation for the initial iterate.

Step A-2: By using the result in Step A-1 above and equation (39), we know that the required relation carries over to the next iterate. From the result and verification process of Theorem 1, with the cost function being convex and nondecreasing, we know that the value function is concave and nonincreasing in the number of type-2 tasks for any given number of type-1 tasks, which gives the corresponding difference relations. By further applying the result in Lemma 2, with the two functions chosen appropriately, and by using equation (18), we know that the same relation holds after the maximization. By using the results in equation (53), we then obtain the required relation for the next iterate.
Step A-3: Finally, by noting Theorem 11.3.2 of [16], which states that the optimality equation has a unique solution, we know that the value iteration converges. Therefore, as the iteration index goes to infinity, the required nonincreasing property always holds for any given number of type-1 tasks.
The verification of the Lemma 3 is now completed.
References
- [1] A. Shehabi, S. Smith, D. Sartor, R. Brown, M. Herrlin, J. Koomey, E. Masanet, N. Horner, I. Azevedo, and W. Lintner, “United states data center energy usage report,” Jun 2016.
- [2] C. Canali, R. Lancellotti, and M. Shojafar, “A computation- and network-aware energy optimization model for virtual machines allocation,” in Proceedings of the 7th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, INSTICC. SciTePress, 2017, pp. 71–81.
- [3] J. Fu, B. Moran, J. Guo, E. W. M. Wong, and M. Zukerman, “Asymptotically optimal job assignment for energy-efficient processor-sharing server farms,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 12, pp. 4008–4023, Dec 2016.
- [4] C. Canali, L. Chiaraviglio, R. Lancellotti, and M. Shojafar, “Joint minimization of the energy costs from computing, data transmission, and migrations in cloud data centers,” IEEE Transactions on Green Communications and Networking, vol. 2, no. 2, pp. 580–595, Jun 2018.
- [5] L. Wang, F. Zhang, J. A. Aroca, A. V. Vasilakos, K. Zheng, C. Hou, D. Li, and Z. Liu, “Greendcn: A general framework for achieving energy efficiency in data center networks,” IEEE Journal on Selected Areas in Communications, vol. 32, no. 1, pp. 4–15, Jan 2014.
- [6] J. Liu, Y. Zhang, Y. Zhou, D. Zhang, and H. Liu, “Aggressive resource provisioning for ensuring qos in virtualized environments,” IEEE Transactions on Cloud Computing, vol. 3, no. 2, pp. 119–131, Apr 2015.
- [7] M. Alicherry and T. Lakshman, “Optimizing data access latencies in cloud systems by intelligent virtual machine placement,” in Proceedings - IEEE INFOCOM, Apr 2013, pp. 647–655.
- [8] R. Cohen, L. Lewin-Eytan, J. (Seffi) Naor, and D. Raz, “Almost optimal virtual machine placement for traffic intense data centers,” in Proceedings - IEEE INFOCOM, Apr 2013, pp. 355–359.
- [9] J. Mei, K. Li, Z. Tong, Q. Li, and K. Li, “Profit maximization for cloud brokers in cloud computing,” IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 1, pp. 190–203, Jan 2019.
- [10] V. D. Valerio, V. Cardellini, and F. L. Presti, “Optimal pricing and service provisioning strategies in cloud systems: A stackelberg game approach,” in 2013 IEEE Sixth International Conference on Cloud Computing, Jun 2013, pp. 115–122.
- [11] J. He, D. Wu, Y. Zeng, X. Hei, and Y. Wen, “Toward optimal deployment of cloud-assisted video distribution services,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 23, pp. 1717–1728, Oct 2013.
- [12] L. A. Barroso, J. Clidaras, and U. Hoelzle, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. Morgan & Claypool, 2013.
- [13] C. Jiang, G. Han, J. Lin, G. Jia, W. Shi, and J. Wan, “Characteristics of co-allocated online services and batch jobs in internet data centers: A case study from alibaba cloud,” IEEE Access, vol. 7, pp. 22 495–22 508, Feb 2019.
- [14] W. Ni, Y. Zhang, and W. W. Li, “An optimal strategy for resource utilization in cloud data centers,” IEEE Access, vol. 7, pp. 158 095–158 112, Oct 2019.
- [15] W. Ni, W. Li, and M. Alam, “Determination of optimal call admission control policy in wireless networks,” IEEE Transactions on Wireless Communications, vol. 8, pp. 1038 – 1044, Mar 2009.
- [16] M. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, 2005.
- [17] X. Chao, H. Chen, and W. Li, “Optimal control for a tandem network of queues with blocking,” Acta Mathematicae Applicatae Sinica, vol. 13, no. 4, pp. 425–437, Oct 1997. [Online]. Available: https://doi.org/10.1007/BF02009552