Delay Optimal Cross-Layer Scheduling Over Markov Channels with Power Constraint
Abstract
We consider a scenario where a power constrained transmitter delivers randomly arriving packets to the destination over Markov time-varying channel and adapts different transmission power to each channel state in order to guarantee successful transmission. To minimize the expected average transmission delay of each packet, we formulate the problem into a constrained Markov decision process (CMDP). We reveal the queue-length threshold structure of the optimal policy, i.e., the transmitter sends packets if and only if the queue length surpasses a threshold and obtain the optimal cross-layer scheduling strategy through linear programming (LP). Numerical results validate the performance of the proposed strategy and illustrate a delay-power tradeoff in such scenario.
Index Terms:
Cross-layer Control, delay-power trade-off, Constrained Markov Decision Process (CMDP).I Introduction
††footnotetext: —————–This work was supported in part by the National Key R&D Program of China under Grant 2017YFE0112300, Beijing National Research Center for Information Science and Technology under Grant BNR2019RC01014 and BNR2019TD01001. Corresponding author: Jintao Wang.
The proliferation of the Internet of Things (IoT) network and real time services raises high reliable and low latency data communication requirements on future networks. Due to the large number of access nodes, each communication device in such network is equipped with limited power. To meet both low latency and high throughput requirements under power constraint in time-varying wireless networks, efficient transmission strategy is needed.
Cross-layer control strategy has been an effective approach to reduce transmission delay in time-varying channels with limited power [1, 2, 3, 4, 5, 6]. By classifying channel states into ”Good” and ”Bad”, previous work [4, 5] studied energy efficient transmission strategy to minimize queueing delay for a point to point communication system. When different levels of power can be used to guarantee successful transmission in different channel states, Wang et al. derived the optimal stationary transmission policy that minimizes the queueing delay under an average power constraint [1, 2]. Notice that the above work assumed the channel fading to be an i.i.d process. A more practical assumption to model the time-varying channel is to assume that channel states evolve as a Markov chain similar to [7, 8].
To characterize the delay-power trade-off under a more practical Markov time-varying channel, we study joint queue aware and channel aware cross-layer control strategy to obtain a minimum delay performance. We formulate the delay-optimal scheduling problem into a constrained Markov decision process and obtain the optimal stationary randomized policy through Linear Programming (LP). Finally, the performance of the proposed algorithm is evaluated through simulations.
The rest of the paper is organized as follows: we formulate the overall scheduling problem in Section II. In Section III, we formulate the scheduling problem into a constrained Markov decision process (CMDP) and analyze its optimal structure. The optimal solution to the CMDP is then obtained through Linear Programming (LP). Section IV provides simulation results and Section V draws the conclusion.
Notations: The probability of event is denoted as and the expectation is denoted by .
II System Model
II-A Network Model
We consider the scheduling policy for a discrete time point-to-point communication system and let denote the index of slots. The number of packets arriving at the transmitter in the -th slot, denoted by , follows an i.i.d Bernoulli distribution with expectation . Those packets wait in an FIFO queue at the transmitter before they are sent out to the receiver and the queue length in slot is captured by . We assume the undelivered packets wait in a finite buffer of size and when the current queue length is about to exceed the buffer capacity, i.e, , arriving packets in th next slot will be discarded.
We assume packets are sent from the transmitter to the receiver through a time-varying wireless link. Assume that the channel state remains invariant in each slot and we model the channel in each slot as an ergodic -state Markov chain. Let denotes the conditional probability that channel state evolves from state to state , i.e.,
(1) |
With no loss of generality, we assume large indicates better channel quality and thus less power is needed to transmit a packet. When the current channels state is , i.e., , the transmitter uses power to guarantee successful transmission in channel state . Thus . In this work, similar to [1], we assume that at most one packet can be transmitted in each time slot and the transmitted packet will be successfully received at the end of the slot. Let denote the decision in -th slot, where denotes a transmission decision is made and indicates that the channel remains idle. To minimize the queueing delay under an average power constraint, the transmission decision is made based on the current channel state and queue length . The average power consumed over consecutive slots can thus be characterized by:
(2) |
And the queue length evolution is as follows:
(3) |
Here we assume the buffer size is large enough so that packet-loss due to full buffer is unlikely to happen. Then the average transmission delay of each packet can be computed by:
II-B Problem Formulation
Our goal is to design a non-anticipated policy that utilizes the current queue length and channel state information in making schedule decisions to minimize the average queueing delay under power constraint. Denote to be the set of non-anticipated scheduling decisions, then the optimization problem is organized as follows:
Problem 1 (Delay Optimal Scheduling Problem)
(4a) |
(4b) |
III Problem Resolution
In this section we first formulate the problem as a constrained Markov decision process (CMDP) and verify the threshold structure of the optimal policy. Then we derive the solution through linear programming (LP).
III-A Constrained Markov Decision Process Formulation
The scheduling problem can be formulated into a CMDP with the following four parts:
-
(1)
State Space: The state in slot can be characterized by the queue length and channel state .
-
(2)
Action Space: The source node can choose two possible actions in each slot. Action denotes that the node schedules to transmit a packet, while means that the transmitter stays idle in the slot. Thus the action space .
-
(3)
Transfer Function: The queue length in the next slot relies on the number of arriving packets and scheduling decision. If , then . Otherwise . Since the channel state evolves independently, the transfer function can be computed as follows:
(5a) (5b) (5c) (5d) -
(4)
One-Step Cost: The one-step cost includes delay cost and power cost. Let denote the delay cost while the power cost, then
(6a) (6b)
With the introduction of the above four elements, the Delay Optimal Scheduling Problem can be expressed as the following finite state constrained Markov Decision Process (CMDP):
(7a) | |||
(7b) | |||
(7c) |
III-B Queue-Length Threshold Structure
In this part we study the structure of the delay optimal policy. First we introduce a corollary from [9] to illustrate a basic property of the policy. Here deterministic policies refer to those whose transmission probability is either 1 or 0.
Corollary 1
An optimal stationary policy is a mixture of two deterministic policies . Let denote the weight of , then in each slot , policy selects policy with probability and policy with probability , i.e.,
(8) |
To obtain through LP, first we need to know the structure of . This optimum structure is obtained by analyzing the structure of and . To obtain and , we place the power constraint into the objective function with the Lagrangian multiplier :
(9) |
Theorem 1
Given , the optimal deterministic stationary policy to the unconstrained scheduling problem (9) is controlled by a queue length threshold for each channel state . When , the optimum policy transmits a packet i.e., , otherwise the transmitter idles and .
Proof:
The detail is provided in Appendix A. ∎
Theorem 1 indicates following the optimum strategy we will have a finite queue length, which is shown in the corollary below:
Corollary 2
When the queue capacity is sufficiently large, following the optimum stationary policy , the queue length will never reach .
Proof:
Suppose that the threshold values of the optimal deterministic policy and are and , respectively. Denote According to Theorem 1, since the optimum stationary policy is a mixture of and , whenever the queue length reaches , policy will transmit a packet, and thus the queue length will not exceeds . As long as the is selected sufficiently large, i.e., , we can guarantee the queue length following policy is smaller than . ∎
III-C Linear Programming Formulation
According to the previous analysis, the optimum policy to the above CMDP has a threshold structure. In this section, we obtain the optimum policy through LP.
Recall that we assume the buffer size is large enough so that the queue length is always smaller than . For each stationary policy , denote as the steady state distribution that the current queue length is and the channel state is . Let be the probability that the current queue length is , the channel state is and the transmitter schedules to send one packet. To transform the CMDP into an equivalent LP, first we introduce a set of variables , which denotes the probability of having packets waiting in the queue after sending a packet in channel state . The following theorem helps us to transform the CMDP into an equivalent LP:
Theorem 2
Solving the Delay-Opt Stationary Problem is equivalent to solve the following LP problem:
(10a) | ||||
s.t. | (10b) | |||
(10c) | ||||
(10d) | ||||
(10e) | ||||
(10f) |
where and is the stationary distribution of . is a matrix that characterizes the transformation from to , i.e.,
(11) |
can be obtained from the following equations:
(12) |
Proof:
The detail is given in Appendix C. ∎
According to the above theorem, we present the power-delay tradeoff as the following corollary:
Corollary 3
The optimal delay monotonically decreases with the available power .
Proof:
Suppose that when the power constraint is . Now we consider . Denote , then (10a),(10c) are equivalent to:
(13) |
(14) |
Since , there exists so that we can construct the following new policy:
(15a) | |||
(15b) | |||
where . |
We can choose sufficient small such that . Then
Notice (10d) is only related to the sum of all queue length under one particular channel state. Hence, given , we can always adjust the values of so that (10d) is satisfied. From (14), it can be easily observed that . Thus,
Therefore . ∎
IV Simulations
In this part simulation results are first provided to validate our proposed optimal policy and the trade-off between power and delay is revealed. We assume a 3-state channel which satisfies the majorization relationship and the transmission matrix is as follows:
(16) |
The power consumed in each channel state is and packets arrive at a rate of . Queue capacity is constrained by . We simulate each setting over slots. We compare the proposed scheduling policy with the greedy policy that transmits a packet whenever the transmitter has enough power.
We first evaluate the delay performance of our proposed policy. Fig. 1 plots the average delay of the optimal scheduling policy and greedy policy with power constraint . Moreover, the performance of the optimal policy is much better than the greedy policy when available power is limited, which proves the effectiveness of our strategy. Notice that the difference between these two policies vanishes as the power supply increases. This indicates that the optimal policy behaves similarly to the greedy policy to achieve shorter delay when the power supply is sufficient.

To study how our algorithm achieves such superior performance, Fig. 2 displays the threshold structure of the optimal policy given different power constraint. When the available power increases, the threshold becomes smaller, which means that the source node transmits a packet more frequently. Besides, it is observed that the delay-power trade-off curve consists of three linear segments, each corresponding to a different threshold value of the policy. This verifies the linear property of the problem.

V Conclusion
In this paper we study the delay minimization scheduling problem under power constraint in wireless networks with ergodic Markov channels. We first reveal the threshold structure and then obtain the optimal policy through linear programming. It is shown that the optimal scheduling awaits and utilizes better channel quality while maintaining a small delay. In the future we will extend the work to more general scenarios with multi-user scheduling and bandwidth constraint.
References
- [1] M. Wang, J. Liu, W. Chen, and A. Ephremides, “On delay-power tradeoff of rate adaptive wireless communications with random arrivals,” in GLOBECOM 2017 - 2017 IEEE Global Communications Conference, Dec 2017, pp. 1–6.
- [2] ——, “Joint queue-aware and channel-aware delay optimal scheduling of arbitrarily bursty traffic over multi-state time-varying channels,” IEEE Transactions on Communications, vol. 67, no. 1, pp. 503–517, Jan 2019.
- [3] J. Yang and S. Ulukus, “Delay-minimal transmission for average power constrained multi-access communications,” IEEE Transactions on Wireless Communications, vol. 9, no. 9, pp. 2754–2767, Sep. 2010.
- [4] E. Uysal-Biyikoglu, B. Prabhakar, and A. El Gamal, “Energy-efficient packet transmission over a wireless link,” IEEE/ACM Transactions on Networking, vol. 10, no. 4, pp. 487–499, Aug 2002.
- [5] R. A. Berry and R. G. Gallager, “Communication over fading channels with delay constraints,” IEEE Transactions on Information Theory, vol. 48, no. 5, pp. 1135–1149, May 2002.
- [6] H. Tang, J. Wang, L. Song, and J. Song, “Scheduling to minimize age of information in multi-state time-varying networks with power constraints,” in 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
- [7] K. Chen and L. Huang, “Timely-throughput optimal scheduling with prediction,” IEEE/ACM Transactions on Networking, vol. 26, no. 6, pp. 2457–2470, Dec 2018.
- [8] H. Tang, J. Wang, L. Song, and J. Song, “Minimizing age of information with power constraints: Multi-user opportunistic scheduling in multi-state time-varying channels,” IEEE Journal on Selected Areas in Communications, vol. 38, no. 5, pp. 854–868, 2020.
- [9] E. Altman, Constrained Markov decision processes. CRC Press, 1999, vol. 7.
- [10] H. Tang, J. Wang, Z. Tang, and J. Song, “Scheduling to minimize age of synchronization in wireless broadcast networks with random updates,” IEEE Transactions on Wireless Communications, vol. 19, no. 6, pp. 4023–4037, 2020.
Appendix A Proof of Theorem 1
The structure that minimizes the time-average cost of the Lagrange function (9) is obtained by the optimum structure that minimizes the -discounted cost and letting . Following policy , the -discounted cost starting from the state , denoted by can be computed by:
(17) |
Then, suppose is the optimum policy that minimizes the -discounted cost. Then the value function satisfies the following Bellman equation:
(18) |
To validate the threshold structure, we first introduce the following lemma, proof details are similar to [10, Lemma 3]:
Lemma 1
Fix discounted factor and channel state , then the difference of value function monotonically increases with .
Denote to be the difference of the sum of reward and the value function in the next state of taking , i.e.,
(19) |
Plugging into the above equation we then have:
(20) |
Since we have shown that increases monotonically, it can be proved that decreases monotonically. As a result, when , it is indicated that the optimum policy assigns an update decision in state . Then for states , the optimum chooses to transmit because . Hence for each Lagrange multiplier , the optimal policy that minimizes the -discouted cost has two solutions: (1) There exists a set of thresholds . When the channel state is , policy sends a packet if the queue length surpasses the threshold . (2) Policy never sends a packet.
Next we show the latter scenario is impossible. Assume that the source node never transmits a packet in any channel state and any queue length, then , which means that sequence is bounded for any . According to Lemma 1, we obtain that the limit point of the sequence exists. Denote that . Let , then for channel state , the channel stays idle; for other states, the source node transmits a packet. For the first case we have:
(21) |
Similarly for the second case we have:
(22) |
Take , then in both cases the following equation holds:
(23) |
Let . Multiply each equation above by and sum them up, then we have:
which clearly contradicts itself. Thus the threshold for each channel state exits, which verifies the structure of the optimal policy in Theorem 1.
Appendix B Proof of Theorem 2
The power constraint (10b) can be easily obtained from the definition of . Now we derive the expression of average delay (10a). Consider the stationary probability of queue length being larger than and channel state being . On the one hand it can be characterized as follows:
(24) |
On the other hand, it can be transferred from the previous slot whose queue length is either larger than or equal to while the node receives a new packet and does not transmit it. Therefore we have:
(25) |
Thus:
(26) |
which is exactly (LABEL:relationship). From the above equation, we can obtain the transformation matrix . Take from 1 to and sum the equations up, then we have:
(27) |
By substituting the above equation into Little’s Theorem, we have (10a). Next we consider the steady-state constraint (10c). Notice that the sum of stationary distribution should equal 1, so we have:
(28) |
Since the communication channel is an ergodic Markov chain, we also needs to construct a constraint on the channel evolution. Consider the stationary distribution of the channel and we have:
(29) |
Substitute (11) into the above equation and we can obtain (10d). Finally considering that transmission probability and stationary distribution should not surpass 1, we have (10e) and (10f) respectively.