This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Delay Optimal Cross-Layer Scheduling Over Markov Channels with Power Constraint

Wenhao Zhan1, Haoyue Tang1, Jintao Wang1,2 1Beijing National Research Center for Information Science and Technology (BNRist),
Dept. of Electronic Engineering, Tsinghua University, Beijing 100084, China
2Research Institute of Tsinghua University in Shenzhen, Shenzhen, 518057
{zhanwh17@mails, thy17@mails, wangjintao@}tsinghua.edu.cn
Abstract

We consider a scenario where a power constrained transmitter delivers randomly arriving packets to the destination over Markov time-varying channel and adapts different transmission power to each channel state in order to guarantee successful transmission. To minimize the expected average transmission delay of each packet, we formulate the problem into a constrained Markov decision process (CMDP). We reveal the queue-length threshold structure of the optimal policy, i.e., the transmitter sends packets if and only if the queue length surpasses a threshold and obtain the optimal cross-layer scheduling strategy through linear programming (LP). Numerical results validate the performance of the proposed strategy and illustrate a delay-power tradeoff in such scenario.

Index Terms:
Cross-layer Control, delay-power trade-off, Constrained Markov Decision Process (CMDP).

I Introduction

footnotetext: —————–
This work was supported in part by the National Key R&D Program of China under Grant 2017YFE0112300, Beijing National Research Center for Information Science and Technology under Grant BNR2019RC01014 and BNR2019TD01001. Corresponding author: Jintao Wang.

The proliferation of the Internet of Things (IoT) network and real time services raises high reliable and low latency data communication requirements on future networks. Due to the large number of access nodes, each communication device in such network is equipped with limited power. To meet both low latency and high throughput requirements under power constraint in time-varying wireless networks, efficient transmission strategy is needed.

Cross-layer control strategy has been an effective approach to reduce transmission delay in time-varying channels with limited power [1, 2, 3, 4, 5, 6]. By classifying channel states into ”Good” and ”Bad”, previous work [4, 5] studied energy efficient transmission strategy to minimize queueing delay for a point to point communication system. When different levels of power can be used to guarantee successful transmission in different channel states, Wang et al. derived the optimal stationary transmission policy that minimizes the queueing delay under an average power constraint [1, 2]. Notice that the above work assumed the channel fading to be an i.i.d process. A more practical assumption to model the time-varying channel is to assume that channel states evolve as a Markov chain similar to [7, 8].

To characterize the delay-power trade-off under a more practical Markov time-varying channel, we study joint queue aware and channel aware cross-layer control strategy to obtain a minimum delay performance. We formulate the delay-optimal scheduling problem into a constrained Markov decision process and obtain the optimal stationary randomized policy through Linear Programming (LP). Finally, the performance of the proposed algorithm is evaluated through simulations.

The rest of the paper is organized as follows: we formulate the overall scheduling problem in Section II. In Section III, we formulate the scheduling problem into a constrained Markov decision process (CMDP) and analyze its optimal structure. The optimal solution to the CMDP is then obtained through Linear Programming (LP). Section IV provides simulation results and Section V draws the conclusion.

Notations: The probability of event AA is denoted as Pr{A}\text{Pr}\{A\} and the expectation is denoted by 𝔼[A]\mathbb{E}[A].

II System Model

II-A Network Model

We consider the scheduling policy for a discrete time point-to-point communication system and let n+n\in\mathbb{N}^{+} denote the index of slots. The number of packets arriving at the transmitter in the nn-th slot, denoted by b[n]{0,1}b[n]\in\{0,1\}, follows an i.i.d Bernoulli distribution with expectation 𝔼[b[n]]=θ\mathbb{E}[b[n]]=\theta. Those packets wait in an FIFO queue at the transmitter before they are sent out to the receiver and the queue length in slot nn is captured by l[n]l[n]. We assume the undelivered packets wait in a finite buffer of size KK and when the current queue length is about to exceed the buffer capacity, i.e, l[n]=Kl[n]=K, arriving packets in th next slot will be discarded.

We assume packets are sent from the transmitter to the receiver through a time-varying wireless link. Assume that the channel state remains invariant in each slot and we model the channel s[n]s[n] in each slot as an ergodic SS-state Markov chain. Let PijP_{ij} denotes the conditional probability that channel state evolves from state ii to state jj, i.e.,

Pr{s[n]=j|s[n1]=i}Pij,i,j=1,2,,S.\text{Pr}\{s[n]=j|s[n-1]=i\}\triangleq P_{ij},i,j=1,2,...,S. (1)

With no loss of generality, we assume large s[n]s[n] indicates better channel quality and thus less power is needed to transmit a packet. When the current channels state is ss, i.e., s[n]=ss[n]=s, the transmitter uses XsX_{s} power to guarantee successful transmission in channel state ss. Thus Xs>Xs,s<sX_{s}>X_{s^{\prime}},\forall s<s^{\prime}. In this work, similar to [1], we assume that at most one packet can be transmitted in each time slot and the transmitted packet will be successfully received at the end of the slot. Let a[n]{0,1}a[n]\in\{0,1\} denote the decision in nn-th slot, where a[n]=1a[n]=1 denotes a transmission decision is made and a[n]=0a[n]=0 indicates that the channel remains idle. To minimize the queueing delay under an average power constraint, the transmission decision a[n]a[n] is made based on the current channel state s[n]s[n] and queue length q[n]q[n]. The average power consumed over consecutive NN slots can thus be characterized by:

EN=1Nn=1Na[n]Xs[n]E_{N}=\frac{1}{N}\sum_{n=1}^{N}a[n]X_{s[n]} (2)

And the queue length evolution is as follows:

l[n]=max{min{l[n1]+b[n],K}a[n],0}.l[n]=\max\{\min\{l[n-1]+b[n],K\}-a[n],0\}. (3)

Here we assume the buffer size KK is large enough so that packet-loss due to full buffer is unlikely to happen. Then the average transmission delay of each packet can be computed by:

limN1Nn=1Nl[n].\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}l[n].

II-B Problem Formulation

Our goal is to design a non-anticipated policy π\pi that utilizes the current queue length and channel state information in making schedule decisions to minimize the average queueing delay under power constraint. Denote πNA\pi_{\text{NA}} to be the set of non-anticipated scheduling decisions, then the optimization problem is organized as follows:

Problem 1 (Delay Optimal Scheduling Problem)
π=argminπΠNAD(π),where D(π)=limN𝔼π[1Nn=1Nl[n]],\pi^{*}=\arg\min_{\pi\in\Pi_{\text{NA}}}D(\pi),\text{where }D(\pi)=\lim_{N\to\infty}\mathbb{E}_{\pi}\left[\frac{1}{N}\sum_{n=1}^{N}l[n]\right], (4a)
s.t. limN𝔼π[1Nn=1Na[n]Xs[n]].\text{s.t. }\lim_{N\to\infty}\mathbb{E}_{\pi}\left[\frac{1}{N}\sum_{n=1}^{N}a[n]X_{s[n]}\right]\leq\mathcal{E}. (4b)

III Problem Resolution

In this section we first formulate the problem as a constrained Markov decision process (CMDP) and verify the threshold structure of the optimal policy. Then we derive the solution through linear programming (LP).

III-A Constrained Markov Decision Process Formulation

The scheduling problem can be formulated into a CMDP with the following four parts:

  • (1)

    State Space: The state in slot nn can be characterized by the queue length and channel state (l[n],s[n])(l[n],s[n]).

  • (2)

    Action Space: The source node can choose two possible actions in each slot. Action a[n]=1a[n]=1 denotes that the node schedules to transmit a packet, while a[n]=0a[n]=0 means that the transmitter stays idle in the slot. Thus the action space 𝔸={0,1}\mathbb{A}=\{0,1\}.

  • (3)

    Transfer Function: The queue length in the next slot relies on the number of arriving packets and scheduling decision. If a[n]=1a[n]=1, then q[n+1]=q[n]+b[n]1q[n+1]=q[n]+b[n]-1. Otherwise q[n+1]=q[n]+b[n]q[n+1]=q[n]+b[n]. Since the channel state evolves independently, the transfer function can be computed as follows:

    Pr{(q,s)(q+1,s)}=Pssθ,a=0;\displaystyle\text{Pr}\{(q,s)\rightarrow(q+1,s^{\prime})\}=P_{ss^{\prime}}\theta,\quad a=0; (5a)
    Pr{(q,s)(q,s)}=Pss(1θ),a=0;\displaystyle\text{Pr}\{(q,s)\rightarrow(q,s^{\prime})\}=P_{ss^{\prime}}(1-\theta),\quad a=0; (5b)
    Pr{(q,s)(q,s)}=Pssθ,a=1;\displaystyle\text{Pr}\{(q,s)\rightarrow(q,s^{\prime})\}=P_{ss^{\prime}}\theta,\quad a=1; (5c)
    Pr{(q,s)(q1,s)}=Pss(1θ),a=1.\displaystyle\text{Pr}\{(q,s)\rightarrow(q-1,s^{\prime})\}=P_{ss^{\prime}}(1-\theta),\quad a=1. (5d)
  • (4)

    One-Step Cost: The one-step cost includes delay cost and power cost. Let CQ(q,s,a)C_{Q}(q,s,a) denote the delay cost while CX(q,s,a)C_{X}(q,s,a) the power cost, then

    CQ(q,s,a)=q,C_{Q}(q,s,a)=q, (6a)
    CX(q,s,a)=Xsa.C_{X}(q,s,a)=X_{s}a. (6b)

With the introduction of the above four elements, the Delay Optimal Scheduling Problem can be expressed as the following finite state constrained Markov Decision Process (CMDP):

π=argminπΠNAD(π),\pi^{*}=\arg\min_{\pi\in\Pi_{\text{NA}}}D(\pi), (7a)
where D(π)=limN1N𝔼π[n=1NCQ(l[n],s[n],a[n])],\text{where }D(\pi)=\lim_{N\to\infty}\frac{1}{N}\mathbb{E}_{\pi}\left[\sum_{n=1}^{N}C_{Q}(l[n],s[n],a[n])\right], (7b)
s.t. limN1N𝔼π[n=1NCX(l[n],s[n],a[n])].\text{s.t. }\lim_{N\rightarrow\infty}\frac{1}{N}\mathbb{E}_{\pi}\left[\sum_{n=1}^{N}C_{X}(l[n],s[n],a[n])\right]\leq\mathcal{E}. (7c)

III-B Queue-Length Threshold Structure

In this part we study the structure of the delay optimal policy. First we introduce a corollary from [9] to illustrate a basic property of the policy. Here deterministic policies refer to those whose transmission probability fq,sf_{q,s} is either 1 or 0.

Corollary 1

An optimal stationary policy π\pi^{*} is a mixture of two deterministic policies π1,π2\pi_{1},\pi_{2}. Let λ\lambda denote the weight of π1\pi_{1}, then in each slot nn, policy π\pi^{*} selects policy π1\pi_{1} with probability λ\lambda and policy π2\pi_{2} with probability 1λ1-\lambda, i.e.,

π=λπ1+(1λ)π2.\pi^{*}=\lambda\pi_{1}+(1-\lambda)\pi_{2}. (8)

To obtain π\pi^{*} through LP, first we need to know the structure of π\pi^{*}. This optimum structure is obtained by analyzing the structure of π1\pi_{1} and π2\pi_{2}. To obtain π1\pi_{1} and π2\pi_{2}, we place the power constraint into the objective function with the Lagrangian multiplier η0\eta\geq 0:

minπΠNAlimN1N𝔼π[n=1NCQ(l[n],s[n],a[n])+η(CX(l[n],s[n],a[n]))].\begin{split}&\min_{\pi\in\Pi_{\text{NA}}}\lim_{N\to\infty}\frac{1}{N}\mathbb{E}_{\pi}\Bigg{[}\sum_{n=1}^{N}C_{Q}(l[n],s[n],a[n])+\\ &\hskip 42.67912pt\eta(C_{X}(l[n],s[n],a[n])-\mathcal{E})\Bigg{]}.\end{split} (9)
Theorem 1

Given η\eta, the optimal deterministic stationary policy to the unconstrained scheduling problem (9) is controlled by a queue length threshold LsL_{s} for each channel state ss. When l[n1]+b[n]Lsl[n-1]+b[n]\geq L_{s}, the optimum policy transmits a packet i.e., a[n]=1a[n]=1, otherwise the transmitter idles and a[n]=0a[n]=0.

Proof:

The detail is provided in Appendix A. ∎

Theorem 1 indicates following the optimum strategy we will have a finite queue length, which is shown in the corollary below:

Corollary 2

When the queue capacity KK is sufficiently large, following the optimum stationary policy π\pi^{*}, the queue length l[n]l[n] will never reach KK.

Proof:

Suppose that the threshold values of the optimal deterministic policy π1\pi_{1} and π2\pi_{2} are L11,L21,,LS1L_{1}^{1},L_{2}^{1},...,L_{S}^{1} and L12,L22,,LS2L_{1}^{2},L_{2}^{2},...,L_{S}^{2}, respectively. Denote L=maxi,sLsi.L^{*}=\max_{i,s}L_{s}^{i}. According to Theorem 1, since the optimum stationary policy π\pi^{*} is a mixture of π1\pi_{1} and π2\pi_{2}, whenever the queue length reaches LL^{*}, policy π\pi^{*} will transmit a packet, and thus the queue length will not exceeds LL^{*}. As long as the KK is selected sufficiently large, i.e., KL+1K\geq L^{*}+1, we can guarantee the queue length l[n]l[n] following policy π\pi^{*} is smaller than KK. ∎

III-C Linear Programming Formulation

According to the previous analysis, the optimum policy to the above CMDP has a threshold structure. In this section, we obtain the optimum policy through LP.

Recall that we assume the buffer size KK is large enough so that the queue length is always smaller than KK. For each stationary policy π\pi, denote μq,s\mu_{q,s} as the steady state distribution that the current queue length is qq and the channel state is ss. Let yq,sy_{q,s} be the probability that the current queue length is qq, the channel state is ss and the transmitter schedules to send one packet. To transform the CMDP into an equivalent LP, first we introduce a set of variables yq,s=μq+1,s(1θ)fq+1,s+μq,sθfq+1,sy_{q,s}=\mu_{q+1,s}(1-\theta)f_{q+1,s}+\mu_{q,s}\theta f_{q+1,s}, which denotes the probability of having qq packets waiting in the queue after sending a packet in channel state ss. The following theorem helps us to transform the CMDP into an equivalent LP:

Theorem 2

Solving the Delay-Opt Stationary Problem is equivalent to solve the following LP problem:

Dopt=min{yq,s}1θ2q=0Ks=1Sqyq,s,\text{D}^{\text{opt}}=\min_{\{y_{q,s}\}}\frac{1}{\theta^{2}}\sum_{q=0}^{K}\sum_{s=1}^{S}qy_{q,s}, (10a)
s.t. q=0Ks=1SXsyq,s,\displaystyle\sum_{q=0}^{K}\sum_{s=1}^{S}X_{s}y_{q,s}\leq\mathcal{E}, (10b)
q=0Ks=1Syq,s=θ,\displaystyle\sum_{q=0}^{K}\sum_{s=1}^{S}y_{q,s}=\theta, (10c)
q=0K𝒈𝒒,𝒔𝑻𝒀=ρs,S,\displaystyle\sum_{q=0}^{K}\boldsymbol{g_{q,s}^{T}Y}=\rho_{s},\forall S, (10d)
0yq,s(1θ)𝒈𝒒,𝒔𝑻𝒀+θ𝒈𝒒,𝒔𝑻𝒀,q,s,\displaystyle 0\leq y_{q,s}\leq(1-\theta)\boldsymbol{g_{q,s}^{T}Y}+\theta\boldsymbol{g_{q,s}^{T}Y},\forall q,s, (10e)
0𝒈𝒒,𝒔𝑻𝒀1,q,s.\displaystyle 0\leq\boldsymbol{g_{q,s}^{T}Y}\leq 1,\forall q,s. (10f)

where 𝐘=[y0,1,y1,1,y2,1,,yK,1,y0,2,,yK,S]T\boldsymbol{Y}=[y_{0,1},y_{1,1},y_{2,1},...,y_{K,1},y_{0,2},...,y_{K,S}]^{T} and ρs\rho_{s} is the stationary distribution of 𝐏\mathbf{P}. 𝐆=[𝐠𝟎,𝟏,𝐠𝟏,𝟏,𝐠𝟐,𝟏,,𝐠𝐊,𝟏,𝐠𝟎,𝟐,,𝐠𝐊,𝐒]T\mathbf{G}=[\boldsymbol{g_{0,1},g_{1,1},g_{2,1},...,g_{K,1},g_{0,2},...,g_{K,S}}]^{T} is a matrix that characterizes the transformation from 𝐘\boldsymbol{Y} to 𝛍=[μ0,1,μ1,1,μ2,1,,μK,1,μ0,2,,μK,S]T\boldsymbol{\mu}=[\mu_{0,1},\mu_{1,1},\mu_{2,1},...,\mu_{K,1},\mu_{0,2},...,\mu_{K,S}]^{T}, i.e.,

𝝁=𝐆𝒀.\boldsymbol{\mu}=\mathbf{G}\boldsymbol{Y}. (11)

𝐆\mathbf{G} can be obtained from the following equations:

s=1SPss[(μq,sθ+i=q+1Kμi,s)yq,s]=i=q+1Kμi,s,q,s.\begin{split}&\sum_{s=1}^{S}P_{ss^{\prime}}[(\mu_{q,s}\theta+\sum_{i=q+1}^{K}\mu_{i,s})-y_{q,s}]\\ &=\sum_{i=q+1}^{K}\mu_{i,s^{\prime}},\forall q,s.\end{split} (12)
Proof:

The detail is given in Appendix C. ∎

According to the above theorem, we present the power-delay tradeoff as the following corollary:

Corollary 3

The optimal delay Dopt\text{D}^{\text{opt}} monotonically decreases with the available power \mathcal{E}.

Proof:

Suppose that Dopt()=D\text{D}^{\text{opt}}(\mathcal{E})=D when the power constraint is \mathcal{E}. Now we consider >\mathcal{E^{\prime}}>\mathcal{E}. Denote yk=s=1Syk,sy_{k}=\sum_{s=1}^{S}y_{k,s}, then (10a),(10c) are equivalent to:

D=1θ2q=0Kqyq,D=\frac{1}{\theta^{2}}\sum_{q=0}^{K}qy_{q}, (13)
q=0Kyq=θ\sum_{q=0}^{K}y_{q}=\theta (14)

Since >\mathcal{E^{\prime}}>\mathcal{E}, there exists q0q_{0} so that we can construct the following new policy:

yq=yq+δyq,qq0,y^{\prime}_{q}=y_{q}+\delta y_{q},q\leq q_{0}, (15a)
yq=yqδyq,q>q0.y^{\prime}_{q}=y_{q}-\delta y_{q},q>q_{0}. (15b)
where δyq0\delta y_{q}\geq 0.

We can choose sufficient small δyq\delta y_{q} such that q=0q0X1δyq\sum_{q=0}^{q_{0}}X_{1}\delta y_{q}\leq\mathcal{E^{\prime}}-\mathcal{E}. Then

q=0Ks=1SXsyq,sq=0Ks=1SXsyq,s+q=0q0s=1SXsδyq,sq=0Ks=1SXsyq,s+q=0q0X1δyq.\begin{split}&\quad\sum_{q=0}^{K}\sum_{s=1}^{S}X_{s}y^{\prime}_{q,s}\\ &\leq\sum_{q=0}^{K}\sum_{s=1}^{S}X_{s}y_{q,s}+\sum_{q=0}^{q_{0}}\sum_{s=1}^{S}X_{s}\delta y_{q,s}\\ &\leq\sum_{q=0}^{K}\sum_{s=1}^{S}X_{s}y_{q,s}+\sum_{q=0}^{q_{0}}X_{1}\delta y_{q}\\ &\leq\mathcal{E^{\prime}}.\end{split}

Notice (10d) is only related to the sum of all queue length under one particular channel state. Hence, given yqy^{\prime}_{q}, we can always adjust the values of yq,sy^{\prime}_{q,s} so that (10d) is satisfied. From (14), it can be easily observed that q=0q0δyq=q=q0+1Kδyq\sum_{q=0}^{q_{0}}\delta y_{q}=\sum_{q=q_{0}+1}^{K}\delta y_{q}. Thus,

DD=1θ2q=0K(yqyq)1θ2(q0q=0q0δyq(q0+1)q0+1Kδyq0.\begin{split}D^{\prime}-D&=\frac{1}{\theta^{2}}\sum_{q=0}^{K}(y^{\prime}_{q}-y_{q})\\ &\leq\frac{1}{\theta^{2}}(q_{0}\sum_{q=0}^{q_{0}}\delta y_{q}-(q_{0}+1)\sum_{q_{0}+1}^{K}\delta y_{q}\\ &\leq 0.\end{split}

Therefore Dopt()DD\text{D}^{\text{opt}}(\mathcal{E^{\prime}})\leq D^{\prime}\leq D. ∎

IV Simulations

In this part simulation results are first provided to validate our proposed optimal policy and the trade-off between power and delay is revealed. We assume a 3-state channel which satisfies the majorization relationship and the transmission matrix is as follows:

𝐏=[0.50.30.20.30.40.30.20.30.5]\mathbf{P}=\begin{bmatrix}0.5&0.3&0.2\\ 0.3&0.4&0.3\\ 0.2&0.3&0.5\end{bmatrix} (16)

The power consumed in each channel state is {X1,X2,X3}={4.5,1.5,0.5}\{X_{1},X_{2},X_{3}\}=\{4.5,1.5,0.5\} and packets arrive at a rate of θ=0.6\theta=0.6. Queue capacity is constrained by K=11K=11. We simulate each setting over 10610^{6} slots. We compare the proposed scheduling policy with the greedy policy that transmits a packet whenever the transmitter has enough power.

We first evaluate the delay performance of our proposed policy. Fig. 1 plots the average delay of the optimal scheduling policy and greedy policy with power constraint [0.8,1.3]\mathcal{E}\in[0.8,1.3]. Moreover, the performance of the optimal policy is much better than the greedy policy when available power is limited, which proves the effectiveness of our strategy. Notice that the difference between these two policies vanishes as the power supply increases. This indicates that the optimal policy behaves similarly to the greedy policy to achieve shorter delay when the power supply is sufficient.

Refer to caption
Figure 1: Average delay of optimal policies and greedy policies under different power constraint

To study how our algorithm achieves such superior performance, Fig. 2 displays the threshold structure of the optimal policy given different power constraint. When the available power increases, the threshold becomes smaller, which means that the source node transmits a packet more frequently. Besides, it is observed that the delay-power trade-off curve consists of three linear segments, each corresponding to a different threshold value of the policy. This verifies the linear property of the problem.

Refer to caption
Figure 2: Threshold structure of optimal policies under different power constraint.

V Conclusion

In this paper we study the delay minimization scheduling problem under power constraint in wireless networks with ergodic Markov channels. We first reveal the threshold structure and then obtain the optimal policy through linear programming. It is shown that the optimal scheduling awaits and utilizes better channel quality while maintaining a small delay. In the future we will extend the work to more general scenarios with multi-user scheduling and bandwidth constraint.

References

  • [1] M. Wang, J. Liu, W. Chen, and A. Ephremides, “On delay-power tradeoff of rate adaptive wireless communications with random arrivals,” in GLOBECOM 2017 - 2017 IEEE Global Communications Conference, Dec 2017, pp. 1–6.
  • [2] ——, “Joint queue-aware and channel-aware delay optimal scheduling of arbitrarily bursty traffic over multi-state time-varying channels,” IEEE Transactions on Communications, vol. 67, no. 1, pp. 503–517, Jan 2019.
  • [3] J. Yang and S. Ulukus, “Delay-minimal transmission for average power constrained multi-access communications,” IEEE Transactions on Wireless Communications, vol. 9, no. 9, pp. 2754–2767, Sep. 2010.
  • [4] E. Uysal-Biyikoglu, B. Prabhakar, and A. El Gamal, “Energy-efficient packet transmission over a wireless link,” IEEE/ACM Transactions on Networking, vol. 10, no. 4, pp. 487–499, Aug 2002.
  • [5] R. A. Berry and R. G. Gallager, “Communication over fading channels with delay constraints,” IEEE Transactions on Information Theory, vol. 48, no. 5, pp. 1135–1149, May 2002.
  • [6] H. Tang, J. Wang, L. Song, and J. Song, “Scheduling to minimize age of information in multi-state time-varying networks with power constraints,” in 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
  • [7] K. Chen and L. Huang, “Timely-throughput optimal scheduling with prediction,” IEEE/ACM Transactions on Networking, vol. 26, no. 6, pp. 2457–2470, Dec 2018.
  • [8] H. Tang, J. Wang, L. Song, and J. Song, “Minimizing age of information with power constraints: Multi-user opportunistic scheduling in multi-state time-varying channels,” IEEE Journal on Selected Areas in Communications, vol. 38, no. 5, pp. 854–868, 2020.
  • [9] E. Altman, Constrained Markov decision processes.   CRC Press, 1999, vol. 7.
  • [10] H. Tang, J. Wang, Z. Tang, and J. Song, “Scheduling to minimize age of synchronization in wireless broadcast networks with random updates,” IEEE Transactions on Wireless Communications, vol. 19, no. 6, pp. 4023–4037, 2020.

Appendix A Proof of Theorem 1

The structure that minimizes the time-average cost of the Lagrange function (9) is obtained by the optimum structure that minimizes the α\alpha-discounted cost and letting α1\alpha\rightarrow 1. Following policy π\pi, the α\alpha-discounted cost starting from the state (q,s)(q,s), denoted by Jα,π(q,s)J_{\alpha,\pi}(q,s) can be computed by:

Jα,π(q,s)=limN𝔼π[n=1Nαn[CQ(l[n],s[n],a[n])+η(CX(l[n],s[n],a[n]))]|(l[0]=q,s[0]=s)].\begin{split}&J_{\alpha,\pi}(q,s)=\lim_{N\to\infty}\mathbb{E}_{\pi}\Bigg{[}\sum_{n=1}^{N}\alpha^{n}[C_{Q}(l[n],s[n],a[n])+\\ &\hskip 14.22636pt\eta(C_{X}(l[n],s[n],a[n])-\mathcal{E})]|(l[0]=q,s[0]=s)\Bigg{]}.\end{split} (17)

Then, suppose πα\pi_{\alpha}^{*} is the optimum policy that minimizes the α\alpha-discounted cost. Then the value function Vα(q,s)=minπJα,π(q,s)V_{\alpha}(q,s)=\min_{\pi}J_{\alpha,\pi}(q,s) satisfies the following Bellman equation:

Vα(q,s)=min{CQ(q,s,0)+αs=1SPss(θVα(q+1,s)+(1θ)Vα(q,s)),CQ(q,s,1)+ηCX(q,s,1)+αs=1SPss(θVα(q,s)+(1θ)Vα(q1,s))}\begin{split}&V_{\alpha}(q,s)=\min\{C_{Q}(q,s,0)+\alpha\sum_{s^{\prime}=1}^{S}P_{ss^{\prime}}(\theta V_{\alpha}(q+1,s^{\prime})\\ &\hskip 28.45274pt+(1-\theta)V_{\alpha}(q,s^{\prime})),C_{Q}(q,s,1)+\eta C_{X}(q,s,1)\\ &\hskip 28.45274pt+\alpha\sum_{s^{\prime}=1}^{S}P_{ss^{\prime}}(\theta V_{\alpha}(q,s^{\prime})+(1-\theta)V_{\alpha}(q-1,s^{\prime}))\}\end{split} (18)

To validate the threshold structure, we first introduce the following lemma, proof details are similar to [10, Lemma 3]:

Lemma 1

Fix discounted factor α\alpha and channel state ss, then the difference of value function dα(q,)=Vα(q,)Vα(q1,),q1d_{\alpha}(q,\cdot)=V_{\alpha}(q,\cdot)-V_{\alpha}(q-1,\cdot),q\geq 1 monotonically increases with qq.

Denote ΔVα(q,s)\Delta V_{\alpha}(q,s) to be the difference of the sum of reward and the value function in the next state of taking aα(q,s)={1,0}a_{\alpha}(q,s)=\{1,0\}, i.e.,

ΔVα(q,s)=CQ(q,s,1)+ηCX(q,s,1)+αs=1SPss(θVα(q,s)+(1θ)Vα(q1,s))CQ(q,s,0)αs=1SPss(θVα(q+1,s)(1θ)Vα(q,s)).\begin{split}\Delta V_{\alpha}(q,s)=&C_{Q}(q,s,1)+\eta C_{X}(q,s,1)\\ &+\alpha\sum_{s^{\prime}=1}^{S}P_{ss^{\prime}}(\theta V_{\alpha}(q,s^{\prime})+(1-\theta)V_{\alpha}(q-1,s^{\prime}))\\ &-C_{Q}(q,s,0)-\alpha\sum_{s^{\prime}=1}^{S}P_{ss^{\prime}}(\theta V_{\alpha}(q+1,s^{\prime})\\ &-(1-\theta)V_{\alpha}(q,s^{\prime})).\end{split} (19)

Plugging dα(q,)=Vα(q,)Vα(q1,)d_{\alpha}(q,\cdot)=V_{\alpha}(q,\cdot)-V_{\alpha}(q-1,\cdot) into the above equation we then have:

ΔVα(q,s)=ηXsαs=1SPss(θdα(q+1,s)+(1θ)dα(q,s)).\begin{split}\Delta V_{\alpha}(q,s)=&\eta X_{s}-\alpha\sum_{s^{\prime}=1}^{S}P_{ss^{\prime}}(\theta d_{\alpha}(q+1,s^{\prime})\\ &+(1-\theta)d_{\alpha}(q,s^{\prime})).\end{split} (20)

Since we have shown that dα(q,)d_{\alpha}(q,\cdot) increases monotonically, it can be proved that ΔVα(q,)\Delta V_{\alpha}(q,\cdot) decreases monotonically. As a result, when ΔVα(q,s)<0\Delta V_{\alpha}(q,s)<0, it is indicated that the optimum policy πα\pi_{\alpha}^{*} assigns an update decision in state (q,s)(q,s). Then for states qqq^{\prime}\geq q, the optimum chooses to transmit because ΔVα(q,s)ΔVα(q,s)0\Delta V_{\alpha}(q^{\prime},s)\leq\Delta V_{\alpha}(q,s)\leq 0. Hence for each Lagrange multiplier η\eta, the optimal policy πα\pi_{\alpha}^{*} that minimizes the α\alpha-discouted cost has two solutions: (1) There exists a set of thresholds LsL_{s}. When the channel state is ss, policy πα\pi_{\alpha}^{*} sends a packet if the queue length surpasses the threshold LsL_{s}. (2) Policy πα\pi_{\alpha}^{*} never sends a packet.

Next we show the latter scenario is impossible. Assume that the source node never transmits a packet in any channel state and any queue length, then ΔVα(q,s)>0,s,q\Delta V_{\alpha}(q,s)>0,\forall s,q, which means that sequence dα(q,s)d_{\alpha}(q,s) is bounded for any ss. According to Lemma 1, we obtain that the limit point of the sequence exists. Denote that limqdα(q,s)=d¯s,1sS\lim_{q\to\infty}d_{\alpha}(q,s^{\prime})=\overline{d}_{s^{\prime}},\forall 1\leq s^{\prime}\leq S. Let qq\to\infty, then for channel state s1,s2,,srs_{1},s_{2},...,s_{r}, the channel stays idle; for other states, the source node transmits a packet. For the first case we have:

dα(q,s)=1+αs=1SPss(θdα(q+1,s)+(1θ)dα(q,s)).\begin{split}&d_{\alpha}(q,s)=1+\alpha\sum_{s^{\prime}=1}^{S}P_{ss^{\prime}}(\theta d_{\alpha}(q+1,s^{\prime})+\\ &(1-\theta)d_{\alpha}(q,s^{\prime})).\end{split} (21)

Similarly for the second case we have:

dα(q,s)=1+αs=1SPss(θdα(q,s)+(1θ)dα(q1,s)).\begin{split}&d_{\alpha}(q,s)=1+\alpha\sum_{s^{\prime}=1}^{S}P_{ss^{\prime}}(\theta d_{\alpha}(q,s^{\prime})+\\ &(1-\theta)d_{\alpha}(q-1,s^{\prime})).\end{split} (22)

Take qq\to\infty, then in both cases the following equation holds:

d¯s=1+αs=1SPssd¯s,s.\overline{d}_{s}=1+\alpha\sum_{s^{\prime}=1}^{S}P_{ss^{\prime}}\overline{d}_{s^{\prime}},\forall s. (23)

Let α1\alpha\to 1. Multiply each equation above by ρs\rho_{s} and sum them up, then we have:

s=1Sρsd¯s=1+s=1Sρsd¯s\sum_{s=1}^{S}\rho_{s}\overline{d}_{s}=1+\sum_{s=1}^{S}\rho_{s}\overline{d}_{s}

which clearly contradicts itself. Thus the threshold for each channel state exits, which verifies the structure of the optimal policy in Theorem 1.

Appendix B Proof of Theorem 2

The power constraint (10b) can be easily obtained from the definition of yq,sy_{q,s}. Now we derive the expression of average delay (10a). Consider the stationary probability of queue length being larger than qq and channel state being ss^{\prime}. On the one hand it can be characterized as follows:

Pr{l>q,s=s}=l=q+1Kμl,s.\text{Pr}\{l>q,s=s^{\prime}\}=\sum_{l=q+1}^{K}\mu_{l,s^{\prime}}. (24)

On the other hand, it can be transferred from the previous slot whose queue length is either larger than qq or equal to qq while the node receives a new packet and does not transmit it. Therefore we have:

Pr{l>q,s=s}=s=1SPss(μq,sθ+l=q+1Kμl,syq,s).\text{Pr}\{l>q,s=s^{\prime}\}=\sum_{s=1}^{S}P_{ss^{\prime}}(\mu_{q,s}\theta+\sum_{l=q+1}^{K}\mu_{l,s}-y_{q,s}). (25)

Thus:

s=1SPss(μq,sθ+l=q+1Kμl,syq,s)=l=q+1Kμl,s,q,s.\sum_{s=1}^{S}P_{ss^{\prime}}(\mu_{q,s}\theta+\sum_{l=q+1}^{K}\mu_{l,s}-y_{q,s})=\sum_{l=q+1}^{K}\mu_{l,s^{\prime}},\forall q,s. (26)

which is exactly (LABEL:relationship). From the above equation, we can obtain the transformation matrix 𝐆\mathbf{G}. Take ss^{\prime} from 1 to SS and sum the equations up, then we have:

x=1Syq,s=θs=1Sμq,s.\sum_{x=1}^{S}y_{q,s}=\theta\sum_{s=1}^{S}\mu_{q,s}. (27)

By substituting the above equation into Little’s Theorem, we have (10a). Next we consider the steady-state constraint (10c). Notice that the sum of stationary distribution should equal 1, so we have:

q=0Ks=1Syq,s=θq=0Ks=1Sμq,s=θ.\sum_{q=0}^{K}\sum_{s=1}^{S}y_{q,s}=\theta\sum_{q=0}^{K}\sum_{s=1}^{S}\mu_{q,s}=\theta. (28)

Since the communication channel is an ergodic Markov chain, we also needs to construct a constraint on the channel evolution. Consider the stationary distribution of the channel and we have:

ρs=q=0Kμq,s.\rho_{s}=\sum_{q=0}^{K}\mu_{q,s}. (29)

Substitute (11) into the above equation and we can obtain (10d). Finally considering that transmission probability and stationary distribution should not surpass 1, we have (10e) and (10f) respectively.