
On the Scheduling Policy for Multi-process WNCS under Edge Computing

Y. Qiu, S. Wu and Y. Wang are with the Harbin Institute of Technology, Shenzhen, Guangdong, 518055, China. E-mail: [email protected], [email protected], [email protected]

Yifei Qiu, Shaohua Wu, and Ying Wang
Abstract

This paper considers a multi-process, multi-controller wireless networked control system (WNCS). There are $N$ independent linear time-invariant processes in the system plant, representing different kinds of physical processes. Under edge computing, the controller roles are played by an edge server and a cloud server. Each process is measured by a sensor, and its status updates are sent to a controller to generate control commands. The link delay of the cloud server is longer than that of the edge server, and the processing time of a status update depends on the characteristics of both the server and the process. Taking these conditions into account, we mainly investigate how to choose the destination of each status update, the edge server or the cloud server, so as to minimize the system's average Mean Square Error (MSE). To address this issue, we formulate an infinite horizon average cost Markov Decision Process (MDP) problem and obtain the optimal scheduling policy. The monotonicity of the value function in the MDP is characterized and then used to show the threshold structure of the optimal scheduling policy. To overcome the curse of dimensionality, we propose a low-complexity suboptimal policy by using the additive separable structure of the value function. Furthermore, a processing preemption mechanism is considered to handle status updates more flexibly, and its consistency property is proved. On this basis, a numerical example is provided. The simulation results illustrate that the controller selection is related to the timeliness of the process and show the performance of the suboptimal policy. We also find that the optimal policy becomes a static policy, in which the destination of each status update is fixed, when the wireless channel is error-free.

Index Terms:
WNCS, multi-process system, edge computing, age of information, preemption, scheduling

I Introduction

In a wireless networked control system (WNCS), to ensure the accuracy of control and the stability of the design, it is necessary to quantitatively describe the timeliness and error of the system. Two parameters are commonly used: Age of Information (AoI) and Mean Square Error (MSE). The concept of AoI was proposed in [1] and can accurately quantify the timeliness of status information. Typically, the AoI is defined as the time elapsed since the most recently received status update was generated at the sensor. Unlike AoI, MSE is mainly used to measure the system's error and can describe the change rates of different sources. It is widely used, especially in control systems, such as the multi-sensor control system in [2].

Based on AoI and MSE, research on WNCS has yielded many beneficial results. Several works study scheduling policies that optimize the system AoI or MSE. For example, the work in [3] studies wireless networks with multiple users under the AoI minimization problem and develops a low-complexity transmission scheduling algorithm. The authors in [4] extend the multi-user system to incorporate stochastic packet arrivals and optimal packet management, and propose an index-prioritized random access policy that minimizes the AoI. The works in [1] and [5] obtain the optimal action in the current state by designing indicators based on latency and reliability, respectively. The authors in [5] and [6] study sources with different status lengths and give optimal policies for many cases by minimizing the system AoI. Based on [6], the work in [7] investigates the preemption mechanism in the channel and proves structural properties of the optimal policy. The non-uniform status model is further extended in [8], which is no longer limited to a single source and considers multiple processes with different status updates. The work in [9] introduces an optimal policy that considers the change rates of different sources by minimizing the system MSE.

In WNCS, a smaller delay means more accurate control, but existing scheduling policies cannot reduce the inherent delay in the uplink and downlink. Therefore, edge computing was proposed to decrease the long delays of time-critical tasks. In some WNCS, a small set of special servers is deployed near the sensors; these servers are called edge servers, and nearby sensors can reach them via wireless connections. In [10] and [11], edge computing is added to the sensor network. The authors in [10] consider sensors that can only fully offload a task, i.e., each status update can only be processed by one server and cannot be divided into multiple parts handled by multiple servers. The work in [12] introduces a neural network for training the system model and considers the processing time on the server. The work in [13] uses an existing cloud processing center to build an edge computing model, and the results show that the computing capacity differs between the edge server and the cloud server. The authors in [11] propose a universal model that uses delay and computational capacity to distinguish different servers; full task offloading without task migration in a sensor network is also considered in [11].

Most existing works, e.g., [2]-[11], assume that the system has only one controller and that different status updates take the same time to generate control commands. However, in a WNCS with edge computing, there are often multiple controllers, mainly divided into edge servers and cloud servers, and the times needed to generate control commands differ. Edge servers are less potent in both resources and computational ability than remote cloud servers, so they take a longer time to process state information and generate control commands [12]. Moreover, to respond to time-critical tasks in a timely manner and improve system performance, a preemption mechanism commonly exists on the server, where tasks with higher priority can preempt the ongoing task. Therefore, some questions naturally arise. How should the destination of a status update, the remote cloud server or the edge server, be selected to optimize the system performance? Since, due to the randomness of the actual process, the server side cannot perform the best action in every time slot, how should priorities be allocated between different status updates to obtain the preemption policy that minimizes the system MSE? These questions motivate the study in this paper.

In this paper, the scheduling problem in a WNCS with $N$ independent processes is studied, and we are interested in minimizing the average MSE. Referring to the models in [8] and [11], we adopt edge computing in the system model and consider that status updates need to be processed over multiple time slots to generate control commands. Moreover, to handle status updates more flexibly, we add a preemption mechanism on the basis of [8]. The main contributions of this paper are summarized as follows:

  • This paper considers a multi-process WNCS and applies edge computing. The processes stand for different physical processes, whose status updates need different processing times on the same server. Meanwhile, we use link delay and computing power to describe the edge server and the cloud server. By introducing the AoI and the remaining processing time of each process, the problem of how to schedule status updates can be described as an infinite horizon average cost Markov Decision Process (MDP). Thus the scheduling policy minimizing the system MSE is obtained.

  • A suboptimal policy is proposed to reduce the computational complexity. Following the proofs in existing studies, we obtain the additive separable structure of the value function, so the original MDP problem can be divided into multiple small subproblems. The simulation results show that the performance of the proposed suboptimal policy is close to that of the optimal policy.

  • A processing preemption mechanism is considered in this paper. To handle status updates more flexibly, we allow a task with higher priority to preempt the task being processed on the server. The optimal preemption policy is obtained by solving the formulated MDP problem, and its consistency property is proved.

The rest of this paper is organized as follows. In Section II, we introduce the system model, formulate the problem, and prove some structural properties. Section III presents a low-complexity suboptimal policy. Section IV further extends the model with a preemption mechanism on the controller and characterizes the consistency property of its optimal policy. Simulation results and analysis are provided in Section V. Finally, conclusions are drawn in Section VI.

Figure 1: System Model

II System Model And Problem Formulation

II-A System Model

We consider a multi-process WNCS. As shown in Fig. 1, there are $N$ processes and $N$ sensors in the system, and each sensor samples its corresponding process [14]-[15]. For the uplink and downlink wireless channels, we adopt orthogonal channels, which can carry multiple status updates at the same time. The controller in the system consists of an edge server and a cloud server. As assumed in [11] and [16], compared with the remote server, the edge server has smaller uplink and downlink delays but a longer task processing time. In Fig. 1, different colors of dots in the circles represent different physical processes, and the number of dots represents the dimension of a process; e.g., the processing time required to generate a control command from video signals differs from that from audio signals.

A time-slotted system is considered, where time is divided into slots of equal duration. We model each process as a discrete linear time-invariant system [17]:

x_{k}(t+1)=A_{k}x_{k}(t)+B_{k}u_{k}(t)+z_{k}(t), \qquad (1)

where $k\in\mathcal{N}\triangleq\{1,\dots,N\}$ and $t$ represent the process index and the time slot, respectively. $x_{k}(t)\in R^{n_{k}}$ is the state of the $k$-th process in the $t$-th time slot and $u_{k}(t)\in R^{n_{k}}$ is the executed control command. $z_{k}(t)$ represents the normally distributed noise of the $k$-th process with distribution $N(0,R_{k})$, where $R_{k}$ is assumed to be positive semi-definite. $A_{k}\in R^{n_{k}\times n_{k}}$ and $B_{k}\in R^{n_{k}\times n_{k}}$ represent the state transition matrix and the control coefficient matrix of process $k$, respectively. In this paper, we consider the case where all processes remain unchanged within a single time slot. The goal of the system is to keep the state $x_{k}$ of each process $k$ close to $\boldsymbol{0}\in R^{n_{k}}$.
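As a concrete illustration, the short sketch below simulates one slot of the dynamics in (1). The matrices are placeholders chosen for illustration (here $A$ happens to equal $A_{3}$ from Section V, and $B$, $R$ are identity matrices), not a prescribed configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[1.02, 1.0],
              [0.0,  1.0]])   # state transition matrix A_k (A_3 in Section V)
B = np.eye(2)                 # control coefficient B_k (illustrative choice)
R = np.eye(2)                 # noise covariance R_k, identity as in Section V

def step(x, u):
    """One slot of x_k(t+1) = A_k x_k(t) + B_k u_k(t) + z_k(t)."""
    z = rng.multivariate_normal(np.zeros(len(x)), R)
    return A @ x + B @ u + z

x = np.ones(2)                 # arbitrary initial state
x = step(x, u=np.zeros(2))     # without control the state drifts away from 0
```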

In the wireless channel, the transmission success probability of a status update is set as $p\in(0,1)$. As described in [14], [18], [19], it is assumed that there is a perfect feedback channel between each sensor and the destination, so that each sensor is immediately informed whether the transmission is successful and can determine the destination for the next time slot.

To generate effective control commands, the controller must maintain an accurate estimate of the plant state $x_{k}$. When the controller receives the status information from the sensor successfully, it can use the timestamp to estimate the plant state. Define $\tau_{k}$ as the AoI of the status information generated by process $k$. The estimate of $x_{k}(t)$, denoted by $\widetilde{x}_{k}(t)$, can be expressed as:

\widetilde{x}_{k}(t)=A_{k}^{\tau_{k}(t)}x_{k}(t-\tau_{k}(t)). \qquad (2)

We denote $\Delta^{\uparrow}$ and $\Delta^{\downarrow}$ as the uplink and downlink delay, respectively, and assume that the controller knows the delay of the control command transmitted to the actuator. When the state information has been processed completely, the control command at the actuator $u_{k}(t)$ and the control command at the controller $\widetilde{u}_{k}(t)$ have the following relationship:

u_{k}(t)=\widetilde{u}_{k}(t-\Delta^{\downarrow})=K_{k}\widetilde{x}_{k}(t), \qquad (3)

where $K_{k}=-A_{k}/B_{k}$ is the command generation coefficient.

Let $J$ denote the long-term average MSE of the system, which is given by

J=\lim_{T\to+\infty}\frac{1}{T}\sum_{k=1}^{N}\sum_{t=0}^{T}Q_{k}(t), \qquad (4)

where $Q_{k}(t)=\mathbb{E}[x_{k}(t)x_{k}(t)^{T}]$ represents the MSE of each process, which can be calculated by combining (1)-(3) as:

Q_{k}(t)=\mathbb{E}[x_{k}(t)x_{k}(t)^{T}]=\sum_{i=1}^{\tau_{k}(t)}(A_{k})^{i-1}R_{k}(A_{k}^{T})^{i-1}. \qquad (5)

This equation establishes the mapping between AoI and MSE and simplifies the calculation.
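As a quick sanity check of (5), the sketch below evaluates $Q_{k}(t)$ for a given AoI $\tau_{k}$; the matrices are illustrative, and the trace is printed only as a scalar summary of the error matrix.

```python
import numpy as np

def mse_from_aoi(A, R, tau):
    """Q_k(t) = sum_{i=1}^{tau} A^{i-1} R (A^T)^{i-1}, the mapping in (5)."""
    Q = np.zeros_like(R, dtype=float)
    P = np.eye(len(R))            # holds A^{i-1}, starting from i = 1
    for _ in range(tau):
        Q += P @ R @ P.T
        P = A @ P                 # advance to the next power of A
    return Q

A = np.array([[1.02, 1.0], [0.0, 1.0]])
for tau in (1, 2, 3):
    print(tau, np.trace(mse_from_aoi(A, np.eye(2), tau)))  # MSE grows with AoI
```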

Regarding the processing time, different tasks on the same server take unequal times, and the same task takes different times on different servers. For each physical process $k$, we denote the state information length by $L_{k}$, which equals the number of elements in $x_{k}$. The number of bits per element in the state matrix and the number of CPU cycles needed to process one bit are denoted by $l$ and $v$, respectively. The number of time slots required to process one status update is then given by [20]

T_{p}=\left\lceil\frac{L_{k}lv}{f\varepsilon}\right\rceil, \qquad (6)

where $f$ (in Hz) is the CPU frequency of the processor and $\varepsilon$ (in seconds) is the duration of each slot.
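For instance, under purely illustrative numbers (not taken from the paper), (6) evaluates as follows:

```python
import math

def processing_slots(L_k, l, v, f, eps):
    """T_p = ceil(L_k * l * v / (f * eps)), the slot count in (6)."""
    return math.ceil(L_k * l * v / (f * eps))

# Hypothetical values: 4 elements, 32 bits/element, 1000 cycles/bit,
# a 1 MHz CPU and 0.1 s slots -> 128000 cycles / 100000 = 1.28 -> 2 slots.
print(processing_slots(L_k=4, l=32, v=1000, f=1e6, eps=0.1))
```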

We assume that the edge computing system is stable, so that the link delay is time-invariant and a server can execute at most one task at a time. Because a remote cloud server can be modeled as an edge server with a longer link delay and more powerful processing capability, we do not explicitly differentiate between edge servers and remote cloud servers [11].

In order to describe the control performance of the system more accurately, we make a common assumption in the following:

Assumption: The transition matrix $A_{k}$ of each process satisfies $\rho(A_{k})>1$. This means that without control the system would be unstable, which is why the control system exists; the assumption makes control necessary.

Here $\rho(\cdot)$ stands for the spectral radius.

II-B Problem Formulation

For the edge server, the uplink and downlink delays are $\Delta_{e}^{\uparrow}$ and $\Delta_{e}^{\downarrow}$, respectively, and the processing time of state $x_{k}$ on the edge server is defined as $T_{e,k}$. For the cloud server (also called the remote server), the uplink and downlink delays are defined as $\Delta_{r}^{\uparrow}$ and $\Delta_{r}^{\downarrow}$, respectively, and the processing time of state $x_{k}$ on the remote server is defined as $T_{r,k}$.

To facilitate the subsequent representation, we make some assumptions similar to the existing works [11], [21]-[22]. For the edge server, the uplink and downlink delays are set to $1$ time slot, i.e., $\Delta_{e}^{\uparrow}=\Delta_{e}^{\downarrow}=1$, and the processing time $T_{e,k}$ is set to twice the dimension of $x_{k}$. For the cloud server, we set the uplink and downlink delays to $2$ time slots, i.e., $\Delta_{r}^{\uparrow}=\Delta_{r}^{\downarrow}=2$, and the processing time $T_{r,k}$ is numerically the same as the dimension of $x_{k}$. E.g., for a task with dimension 1, if dispatched to the edge server, the uplink and downlink delays are both 1 and the processing time is 2; if dispatched to the cloud server, the uplink and downlink delays are both 2 and the processing time is 1. This can be extended to a general case, as introduced in the following. Note that the uploading and downloading of a task do not consume any computing resource of the servers, so a server can process a task while other tasks are being transmitted.
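Under these assumed parameters, one can tabulate the end-to-end control-loop length for either destination; the sketch below reproduces the dimension-1 example above (edge: $1+2+1=4$ slots, cloud: $2+1+2=5$ slots) and sweeps other dimensions.

```python
def loop_slots(delay_up, proc, delay_down):
    """Slots from sampling to actuation: uplink + processing + downlink."""
    return delay_up + proc + delay_down

for dim in (1, 2, 3, 4):
    edge  = loop_slots(1, 2 * dim, 1)   # edge:  delays 1/1, processing 2*dim
    cloud = loop_slots(2, dim, 2)       # cloud: delays 2/2, processing dim
    print(dim, edge, cloud)             # edge is strictly shorter only at dim = 1
```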

Our goal is to jointly control every process to minimize the system MSE under non-uniform status updates and an edge computing architecture. This problem can be modeled as an MDP, and the system state can be expressed as $\mathbf{S}\triangleq(\tau,C_{r},C_{e})\in\mathcal{S}$, where $\mathcal{S}$ is the collection of all feasible states. Each component is described as follows:

To record the AoI of all processes, we set $\tau\triangleq(\tau_{1},\cdots,\tau_{N})$. For each process $k$, we define $\tau_{k}\in\{1,2,\cdots,\hat{\tau}_{k}\}$ as the AoI at the beginning of slot $t$, where $\hat{\tau}_{k}$ is the upper limit of the AoI for process $k$. For tractability [23], we assume that $\hat{\tau}_{k}$ is finite but can be arbitrarily large.

Let $C_{r}\triangleq(I_{r,1},I_{r,2},I_{r,3},d_{r})$ be the state of the remote server: $I_{r,1}\in\{1,\cdots,N\}$ is the index of the process whose status is being transmitted in the uplink, $I_{r,2}\in\{1,\cdots,N\}$ is the index of the process being computed on the remote server, $I_{r,3}\in\{1,\cdots,N\}$ is the index of the process whose control command is being transmitted from the controller, and $d_{r}$ records the number of slots left to compute.

Let $C_{e}\triangleq(I_{e,1},d_{e})$ be the state of the edge server: $I_{e,1}\in\{1,\cdots,N\}$ is the index of the process being computed on the edge server, and $d_{e}$ records the number of slots left to compute.

The status information in transmission behaves like a queue and arrives at the server in FCFS order [16]. Therefore, to extend to the general case, one only needs to add index elements representing the processes in transit according to the different uplink and downlink delays.

The action space is defined as $\mathbf{A}\triangleq(a_{1},a_{2})$, where $a_{1},a_{2}\in\{1,\cdots,N\}$ represent the indices of the processes whose status updates are transmitted to the remote server and the edge server, respectively.

For each variable, we can write the update rules as follows:

When both the uplink and downlink transmissions are successful:

\tau_{k}(t+1)=\begin{cases}\Delta_{r}^{\uparrow}+\Delta_{r}^{\downarrow}+T_{r,k}, & \text{if } I_{r,3}(t)=k,\\ \Delta_{e}^{\uparrow}+\Delta_{e}^{\downarrow}+T_{e,I_{e,1}(t)}, & \text{if } d_{e}(t)=0,\\ \tau_{k}(t)+1, & \text{otherwise.}\end{cases} \qquad (7)
C_{r}(t+1)=\begin{cases}(a_{1}(t),\,I_{r,1}(t),\,I_{r,2}(t),\,T_{r,I_{r,1}(t)}-1), & \text{if } d_{r}(t)=0,\\ (a_{1}(t),\,I_{r,2}(t),\,0,\,d_{r}(t)-1), & \text{otherwise.}\end{cases} \qquad (8)
C_{e}(t+1)=\begin{cases}(a_{2}(t),\,T_{e,a_{2}(t)}), & \text{if } d_{e}(t)=1,\\ (I_{e,1}(t),\,d_{e}(t)-1), & \text{otherwise.}\end{cases} \qquad (9)

In other conditions, (7) changes to $\tau_{k}(t+1)=\tau_{k}(t)+1$ when the uplink transmission fails. When the downlink transmission fails, simply replace $a_{1}(t)$ and $a_{2}(t)$ with $0$ in (8) and (9), respectively.
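The AoI rule in (7) can be summarized in code. The snippet below is a simplified sketch (the function and argument names are ours), keeping only the reset-on-delivery logic and the failure fallback described above, with the delay values of Section II-B hard-coded.

```python
def aoi_next(tau_k, k, I_r3, I_e1, d_e, T_r, T_e, up_ok, down_ok):
    """Sketch of (7): the AoI of process k resets when its loop closes."""
    if up_ok and down_ok:
        if I_r3 == k:                 # remote command for k reaches the actuator
            return 2 + 2 + T_r[k]     # Delta_r_up + Delta_r_down + T_{r,k}
        if d_e == 0 and I_e1 == k:    # edge server just finished process k
            return 1 + 1 + T_e[k]     # Delta_e_up + Delta_e_down + T_{e,k}
    return tau_k + 1                  # otherwise the information simply ages
```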

The one-stage cost depends only on the current state $s=(\tau,C_{r},C_{e})$ and is defined as

c(s,a)\triangleq\sum_{k=1}^{N}Q_{k}=\sum_{k=1}^{N}\sum_{i=1}^{\tau_{k}}(A_{k})^{i-1}R_{k}(A_{k}^{T})^{i-1}. \qquad (10)

The set of scheduling decisions for all possible states is called a scheduling policy $\pi\triangleq(a(1),a(2),\cdots)\in\Pi$, where $\Pi$ is the collection of all feasible scheduling policies. Under a feasible stationary policy $\pi$, the average MSE of the system starting from a given initial state $\boldsymbol{S}(1)=\boldsymbol{S}_{1}$ is given by:

\bar{Q}^{\pi}(\boldsymbol{S}_{1})\triangleq\limsup_{T\rightarrow\infty}\frac{1}{T}\sum_{t=1}^{T}\sum_{k=1}^{N}\mathbb{E}\left[Q_{k}(t)\mid\boldsymbol{S}_{1}\right]. \qquad (11)

We seek the policy that minimizes the average MSE of the system:

\bar{Q}^{*}(\boldsymbol{S}_{1})\triangleq\inf_{\pi}\bar{Q}^{\pi}(\boldsymbol{S}_{1}). \qquad (12)

According to [23, Proposition 5.2.1], the optimal policy $\pi^{*}$ can be obtained by solving the following Bellman equation.

Lemma 1: There exist a unique scalar $\theta$ and a value function $\{V(s)\}$ satisfying:

\theta+V(s)=c(s,a)+\min_{a\in A}\sum_{s^{\prime}\in\mathcal{S}}\Pr[s^{\prime}|s,a]V(s^{\prime}),\ \forall s\in\mathcal{S}, \qquad (13)

where $\theta$ is the optimal value of (12) for every initial state $\boldsymbol{S}_{1}\in\mathcal{S}$, and the optimal policy achieving $\theta$ satisfies:

\pi^{*}(s)=\arg\min_{a\in A}\sum_{s^{\prime}\in\mathcal{S}}\Pr[s^{\prime}|s,a]V(s^{\prime}),\ \forall s\in\mathcal{S}. \qquad (14)

It can be observed that the argument of the value function in (13) is only the state $s$. If the action $a$ is also regarded as an argument of the function, the state-action cost function is obtained as follows:

J(s,a)=c(s,a)+\sum_{s^{\prime}\in\mathcal{S}}\Pr[s^{\prime}|s,a]V(s^{\prime}),\ \forall s\in\mathcal{S}. \qquad (15)

It is evident from Lemma 1 and (14) that the optimal policy $\pi^{*}$ relies on the value function $V(\cdot)$. To obtain $V(\cdot)$, we have to solve the Bellman equation in (13). Unfortunately, (13) has no closed-form solution; a numerical solution can only be obtained by iterative methods such as value iteration or policy iteration. Furthermore, the numerical solution of the optimal policy offers little design insight and consumes substantial computing resources, especially in high-dimensional problems.
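For completeness, a generic relative value iteration sketch is given below; it solves a small average-cost MDP of the form (13) with an action-independent one-stage cost, using state 0 as the reference state $s^{\dagger}$. The representation (explicit transition matrices and the tiny example at the end) is ours and is only practical for small state spaces.

```python
import numpy as np

def rvia(P, c, n_iter=500):
    """Relative value iteration for theta + V(s) = c(s) + min_a sum P V."""
    V = np.zeros(len(c))
    for _ in range(n_iter):
        Q = np.stack([c + P[a] @ V for a in range(len(P))])  # J(s, a), cf. (15)
        theta = Q.min(axis=0)[0]       # average-cost estimate at the reference
        V = Q.min(axis=0) - theta      # subtract the reference to keep V bounded
    policy = Q.argmin(axis=0)          # greedy action, cf. (14)
    return theta, V, policy

# Tiny two-state, two-action example with made-up numbers:
P = np.array([[[0.9, 0.1], [0.2, 0.8]],     # transitions under action 0
              [[0.5, 0.5], [0.5, 0.5]]])    # transitions under action 1
theta, V, policy = rvia(P, c=np.array([0.0, 1.0]))
print(theta, V, policy)
```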

Therefore, if structural properties of the optimal policy can be obtained, they can be used to reduce the computational complexity of the policy and to verify the final numerical result.

By the dynamics in (7)-(9) and using the relative value iteration algorithm, we can show the following properties of the value function $V(s)$:

Theorem 1: For any states $s^{1}=(\tau^{1},C_{r}^{1},C_{e}^{1})$ and $s^{2}=(\tau^{2},C_{r}^{2},C_{e}^{2})$ satisfying $\tau_{k}^{1}\geq\tau_{k}^{2}$ for all $k\in\mathcal{N}$, $C_{e}^{2}=C_{e}^{1}$ and $C_{r}^{2}=C_{r}^{1}$, we have $V(s^{1})\geq V(s^{2})$.

Proof: See Appendix A.

Theorem 2: Any state $s=(\tau,C_{r},C_{e})$ satisfying $C_{r}=\boldsymbol{0}$ and $C_{e}=\boldsymbol{0}$ admits the following type of policy: if $\pi^{*}(s)=(k_{1},k_{2})$, then $\pi^{*}(s^{\prime})=(k_{1},k_{2})$, where $k_{1}\neq k_{2}$ and $s^{\prime}$ is the same as $s$ except that $\tau^{\prime}_{k_{1}}=\tau_{k_{1}}+z$ and $\tau^{\prime}_{k_{2}}=\tau_{k_{2}}+z$ for any positive integer $z$.

Proof: See Appendix B.

From Theorems 1 and 2, we obtain the threshold-type structure of the optimal policy of the MDP. This structural result simplifies off-line computation and facilitates the online application of the scheduling policy. Note that we mainly focus on the case in which both the remote server and the edge server are idle, i.e., $C_{r}=\boldsymbol{0}$ and $C_{e}=\boldsymbol{0}$, because the result in other conditions agrees with intuition; e.g., when the status information of process $k$ is being transmitted to the remote server, or the remote server is computing the status information of process $k$, the edge server will serve processes other than $k$. This result can also be extended to the single-server model.

III Low-Complexity Suboptimal policy

In the MDP above, the computational complexity becomes very high when there are too many elements in the state space, which is called the curse of dimensionality. We therefore propose a suboptimal policy that sacrifices a small amount of performance in exchange for a significant reduction in computational complexity.

Let $\hat{\theta}$ and $\hat{V}(s)$ be the system average MSE and the value function under a unichain base policy $\hat{\pi}$. From the proof in [8], there exists $(\hat{\theta},\hat{V}(s))$ satisfying the following Bellman equation.

\hat{\theta}+\hat{V}(s)=\min_{a\in A}\left\{c(s,a)+\mathbb{E}\left[\hat{V}(s^{\prime})\mid s,a\right]\right\},\ \forall s\in\mathcal{S}, \qquad (16)

where $s^{\prime}$ is the next state from state $s$ under the given action $a$. Obviously, the one-stage cost is independent of the action, and (16) can be further written as

\hat{\theta}+\hat{V}(s)=\sum_{k=1}^{N}Q_{k}+\min_{a\in A}\sum_{s^{\prime}\in\mathcal{S}}\mathbb{E}^{\hat{\pi}}\left[\Pr[s^{\prime}|s,a]\right]\hat{V}(s^{\prime}),\ \forall s\in\mathcal{S}. \qquad (17)

For each process $k$, we define $s_{k}\triangleq(\tau_{k},C_{r},C_{e})\in\mathcal{S}_{k}$ as the per-process state, where $\mathcal{S}_{k}$ is the set of all feasible states. The action space is the same as in Section II. The transition probability can also be derived from the previous description and is omitted here. After these definitions, we show that $\hat{V}(s)$ has the following additive separable structure.

Lemma 2: Given any unichain base policy, the value function $\hat{V}(s)$ in (16) can be expressed as $\hat{V}(s)=\sum_{k\in\mathcal{N}}\hat{V}_{k}(s_{k})$, where for each $k$, $\hat{V}_{k}(s_{k})$ has the following property:

\hat{\theta}_{k}+\hat{V}_{k}(s_{k})=Q_{k}+\min_{a\in A}\sum_{s^{\prime}_{k}\in\mathcal{S}_{k}}\mathbb{E}^{\hat{\pi}}\left[\Pr[s^{\prime}_{k}|s_{k},a]\right]\hat{V}_{k}(s^{\prime}_{k}),\ \forall s_{k}\in\mathcal{S}_{k}, \qquad (18)

where $\hat{\theta}_{k}$ and $\hat{V}_{k}$ are the average MSE and the value function of each process under policy $\hat{\pi}$, respectively.

Proof: Along the lines of the proof of [24, Lemma 3], the additive separable structure of the value function under a unichain base policy $\hat{\pi}$ follows by making use of the relationship between the joint and marginal distributions, which gives $\sum_{s^{\prime}\in\mathcal{S}}\Pr[s^{\prime}|s,a]=\sum_{s^{\prime}_{k}\in\mathcal{S}_{k}}\Pr[s^{\prime}_{k}|s,a]=\sum_{s^{\prime}_{k}\in\mathcal{S}_{k}}\Pr[s^{\prime}_{k}|s_{k},a]$. Then, by substituting $\hat{V}(s)=\sum_{k\in\mathcal{N}}\hat{V}_{k}(s_{k})$ into (16), it can easily be checked that the equality in (18) holds.

Now, we approximate the value function in the Bellman equation by $\hat{V}(s)$: $V(s)\approx\hat{V}(s)=\sum_{k\in\mathcal{N}}\hat{V}_{k}(s_{k})$, where $\hat{V}_{k}(s_{k})$ is given in (18). Then, according to (17), we develop a deterministic suboptimal scheduling policy as follows:

\hat{\pi}^{*}(s)=\arg\min_{a\in A}\sum_{s^{\prime}\in\mathcal{S}}\Pr[s^{\prime}|s,a]\sum_{k\in\mathcal{N}}\hat{V}_{k}(s^{\prime}_{k}),\ \forall s\in\mathcal{S}. \qquad (19)

The proposed deterministic policy in (19) resembles one iteration step of the standard policy iteration algorithm. It divides the original MDP into multiple small subproblems; although the number of states has not decreased, the dimensionality of each subproblem is reduced.
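A sketch of the resulting action selection is given below; the helper names (`transition`, `V_hat`) are illustrative stand-ins for the per-process value functions of (18) and the transition kernel of Section II, not identifiers from the paper.

```python
def suboptimal_action(s, actions, transition, V_hat):
    """One-step lookahead over the separable approximation, as in (19).

    transition(s, a) yields (prob, next_state) pairs, where next_state[k]
    is the per-process state s'_k; V_hat[k] is the value function from (18).
    """
    def expected_value(a):
        return sum(p * sum(V_hat[k](s_next[k]) for k in range(len(V_hat)))
                   for p, s_next in transition(s, a))
    return min(actions, key=expected_value)
```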

The convergence of the suboptimal algorithm is proved in [23, Proposition 5.4.2] and [25, Theorem 8.6.6]. In addition, similar suboptimal strategies are adopted in [24, Theorem 1] and [8, Lemma 3], which also prove that this kind of suboptimal algorithm is superior to other randomized algorithms.

When there are multiple processes in the plant, this suboptimal algorithm significantly reduces the computational complexity, making multi-process problems tractable. In general, to obtain the deterministic suboptimal policy, it is necessary to calculate the value function $\hat{V}_{k}(s_{k})$ of each process, so the computational complexity of the proposed suboptimal algorithm is $O(\sum_{k\in\mathcal{N}}\hat{\tau}_{k}|C_{r}||C_{e}|)$. In contrast, the computational complexity required to obtain the optimal policy $\pi^{*}$ by calculating $V(s)$ is $O(\prod_{k=1}^{N}\hat{\tau}_{k}|C_{r}||C_{e}|)$, where $|C_{r}|$ and $|C_{e}|$ denote the numbers of feasible states in the sets $C_{r}$ and $C_{e}$, respectively.

IV Scheduling Under Preemption Mechanism

Due to the noisy channel, status information cannot be transmitted successfully in every slot. The controller may be idle if only one sensor is allowed to send its state to the controller in a slot. Moreover, the worse the channel, the longer the controller will be idle.

Since the orthogonal channel we adopt allows multiple sensors to transmit status updates in the same time slot [26], some "standby status information" can be transmitted together with the "optimal status information" in the same slot. When the "optimal status information" fails to be transmitted, the server can process the "standby status information" instead. Combined with the preemption mechanism, this can significantly reduce the idle time of the server without reducing the system performance.

To facilitate the description of the computing process on the controller, we refer to a status update as an information task. Preemption means that when a higher-priority task arrives at the controller, the controller terminates the currently processing task and executes the newly arrived one. The preempted but uncompleted task continues to be executed after the current task is completed, provided no other preemption occurs. As shown in Fig. 2, task 1 arrives at the server first. While task 1 is being processed, task 2 with higher priority arrives; the server immediately stops executing task 1 and performs task 2 instead. When task 3 with lower priority arrives, the server does not respond. After task 2 is completed, the server resumes the unfinished task 1. The preemption mechanism lets the server switch to another task and resume the interrupted one later; therefore, the server can handle tasks more flexibly, and the average MSE of the system is reduced.
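The preempt-and-resume behavior of Fig. 2 can be captured with a simple stack discipline. The sketch below is illustrative only; the paper's model additionally tracks the elongation terms $E_{1}$ and $E_{2}$.

```python
from dataclasses import dataclass

@dataclass
class Task:
    proc: int         # process index of the information task
    remaining: int    # computing slots left

def serve_slot(stack, arrival=None, higher_priority=False):
    """One slot of the Fig. 2 discipline: preempt, serve, resume."""
    if arrival is not None and (not stack or higher_priority):
        stack.append(arrival)            # higher priority: push current task down
    if stack:
        stack[-1].remaining -= 1         # serve whatever is on top
        if stack[-1].remaining == 0:
            stack.pop()                  # done; the preempted task resumes next

stack = []
serve_slot(stack, Task(proc=1, remaining=3))                        # task 1 starts
serve_slot(stack, Task(proc=2, remaining=2), higher_priority=True)  # task 2 preempts
serve_slot(stack, Task(proc=3, remaining=1))                        # lower priority: ignored
print([t.proc for t in stack])           # -> [1]: task 2 finished, task 1 resumes
```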

The goal is the same as in Section II: to minimize the system MSE. To investigate the scheduling policy on each controller, we assume the system has $N=2$ sensors, and both sensors can transmit their status updates to the server in a time slot using the orthogonal channel; this problem can be extended to the general situation. In addition, the task processing time and the uplink and downlink delays of the server are the same as those of the remote server in the system model.

The state space is defined as $\mathbf{S}\triangleq(\tau,\mathbf{T_{1}},\mathbf{T_{2}},E_{t},I_{t})$, which consists of all possible states.

$\tau\triangleq(\tau_{1},\tau_{2})$ records the AoI of each process. For each process $k$, we define $\tau_{k}\in\{1,2,\cdots,\hat{\tau}_{k}\}$ as the AoI at the beginning of slot $t$, where $\hat{\tau}_{k}$ is the upper limit of the AoI for process $k$.

$\mathbf{T_{1}}\triangleq(I_{1},d_{1},E_{1})$ characterizes the task being processed. $I_{1}\in\{0,1,2\}$ is the index of the process whose status is being computed on the controller, with $0$ meaning no task is being processed. $d_{1}$ records the number of slots left to compute. When a task is preempted, its time in the controller increases, and $E_{1}$ records this added number of slots for the processing task.

$\mathbf{T_{2}}\triangleq(I_{2},d_{2},E_{2})$ characterizes the preempted task. $I_{2}\in\{0,1,2\}$ is the index of the preempted task, with $0$ meaning there is no preempted task. $d_{2}$ records the number of slots left to compute, and $E_{2}$ records the added number of slots for the preempted task.

Figure 2: Preemption mechanism on controller.

$I_{t}$ and $E_{t}$ characterize the task that has been completely processed. $I_{t}\in\{0,1,2\}$ is the index of the process whose control command is being transmitted to the actuator, with $0$ meaning no command is being transmitted. $E_{t}$ records the added number of slots of the process whose control command is being transmitted.

The action $a$ belongs to the action space $A=\{0,1,2\}$, where $a=0$ means no preemption and $a=k\ (k\neq 0)$ means the status of process $k$ preempts the task being processed. Note that the current task is canceled when $a=k$ and the task being processed is from process $k$, because in this situation the current task is obsolete and would not reduce the MSE after completion.

At the $t$-th time slot, we assume the system state is $s(t)=(\tau(t),\mathbf{T_{1}}(t),\mathbf{T_{2}}(t),E_{t}(t),I_{t}(t))$, where $\tau(t)=(\tau_{1}(t),\tau_{2}(t))$, $T_{1}(t)=(I_{1}(t),d_{1}(t),E_{1}(t))$ and $T_{2}(t)=(I_{2}(t),d_{2}(t),E_{2}(t))$. The state transition rules are given below, written separately for success and failure of the downlink transmission.

When the downlink transmission is successful, we have:

\tau_{k}(t+1)=\begin{cases}\Delta_{r}^{\uparrow}+\Delta_{r}^{\downarrow}+T_{r,k}+E_{t}(t), & \text{if } I_{t}(t)=k,\\ \tau_{k}(t)+1, & \text{otherwise.}\end{cases} \qquad (20)
T_{1}(t+1)=\begin{cases}(k,\,T_{r,k}-1,\,0), & \text{if } a=k\ (k\neq 0),\\ (I_{2}(t),\,d_{2}(t),\,E_{2}(t)), & \text{if } a=0,\ d_{1}=1,\\ (I_{2}(t),\,\max(d_{2}(t)-1,0),\,E_{2}(t)), & \text{if } a=0,\ d_{1}=0,\\ (I_{1}(t),\,d_{1}(t)-1,\,E_{1}(t)), & \text{otherwise.}\end{cases} \qquad (21)
T_{2}(t+1)=\begin{cases}(I_{1}(t),\,d_{1}(t),\,E_{1}(t)+T_{r,k}), & \text{if } a=k\ (k\neq 0),\\ (0,\,0,\,0), & \text{if } a=0\ \text{and}\ d_{1}\in\{0,1\},\\ (I_{2}(t),\,d_{2}(t),\,E_{2}(t)), & \text{otherwise.}\end{cases} \qquad (22)
(I_{t}(t+1),E_{t}(t+1))=\begin{cases}(I_{1}(t),E_{1}(t)), & \text{if } d_{1}(t)=1,\\ (0,0), & \text{otherwise.}\end{cases} \qquad (23)

When the downlink transmission fails, the only difference from the successful case is in (20); $T_{1}$, $T_{2}$, $I_{t}$ and $E_{t}$ evolve as in (21)-(23) and are not described again. The dynamics of the AoI in (20) change to:

\tau_{k}(t+1)=\tau_{k}(t)+1. \qquad (24)

The characteristics of the one-stage cost are the same as those in Section II; it depends only on the current state and is defined as:

c(s,a)\triangleq\sum_{k=1}^{N}Q_{k}=\sum_{k=1}^{N}\sum_{i=1}^{\tau_{k}}(A_{k})^{i-1}R_{k}(A_{k}^{T})^{i-1}. \qquad (25)

The preemption model has the following consistency property.

Theorem 3 (Consistency): If $a^{*}(t)=\pi^{*}(s)=0$, i.e., continuing the current task is optimal, then $a^{*}(t:t+T_{r,i}-1)=0$, where $a^{*}(t:t+T_{r,i}-1)=0$ denotes $a^{*}(t+1)=a^{*}(t+2)=\cdots=a^{*}(t+T_{r,i}-1)=0$ and $i$ is the index of the status being computed on the server.

Proof: See Appendix C.

In other words, Theorem 3 shows that once the server takes the optimal action $a^{*}(t)=\pi^{*}(s)=i$, $i=1,2$, successfully, the status of process $i$ will not be preempted until it is processed completely.

V Simulation Results and Analysis

In this section, we give the parameter settings of the simulation and present the numerical results. Besides, the structures of the optimal policies derived in Sections III and IV verify the numerical results. In addition, we consider a greedy baseline policy for comparison, in which the two processes with the highest MSE transmit their status to the controllers, and the one with the highest MSE chooses its controller first.
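For reference, a minimal sketch of this greedy baseline is shown below; the tie-breaking and return convention are our reading of the description above.

```python
import numpy as np

def greedy_baseline(mse):
    """Pick the two processes with the highest current MSE; the highest
    one gets first choice of controller."""
    order = np.argsort(mse)[::-1]          # process indices, descending MSE
    return int(order[0]), int(order[1])

print(greedy_baseline(np.array([3.0, 7.5, 1.2, 5.1])))  # -> (1, 3)
```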

V-A Setting of system parameters

In the simulation, we consider four processes whose dimensions decrease in turn:

A_{1}=\begin{bmatrix}1.02&1&0&1\\0&1&0.2&0\\0&0&1&0\\0&0&0&1.01\end{bmatrix},\quad A_{2}=\begin{bmatrix}1&1&0\\0&1.02&0\\0&0&1\end{bmatrix},\quad A_{3}=\begin{bmatrix}1.02&1\\0&1\end{bmatrix},\quad A_{4}=1.02,

where $A_{k}$, $k=1,2,3,4$, is the state transition matrix in (1). As for the noise in (1), like most previous studies, $R_{k}$ is set to the identity matrix for $k=1,2,3,4$.

The controller parameter settings of the simulation are shown in Table I, where the function $\text{size}(\cdot)$ gives the dimension of its argument, e.g., $\text{size}(A_{1})=4$.

TABLE I: The controller parameter settings

Parameter | Value | Parameter | Value
$\Delta^{\uparrow}_{r}$ | 1 | $\Delta^{\uparrow}_{e}$ | 0
$\Delta^{\downarrow}_{r}$ | 1 | $\Delta^{\downarrow}_{e}$ | 0
$T_{r,k}$ | $\text{size}(A_{k})$ | $T_{e,k}$ | $2\,\text{size}(A_{k})$

V-B Scheduling simulation result

To show more vividly the difference in processing tasks between the edge server and the remote server, we first simulate the case where there are only two processes in the system plant.

In Fig. 3 and Fig. 4, the transmission success probability is set to $p=0.9$. Fig. 3 shows the optimal scheduling policy when only process 1 and process 4 are in the system. Clearly, all the actions are $a=(1,4)$, which means the status information from process 1 is always sent to the remote controller and the status information from process 4 is always sent to the edge server. Fig. 4 shows the optimal scheduling policy when only process 1 and process 2 are in the system. Unlike Fig. 3, different actions appear in different states, and $a^{*}=(2,1)$ appears in the upper left corner of Fig. 4.

Figure 3: The optimal scheduling policy, where the blue cross stands for action $a=(1,4)$; $p=0.9$. Only process 1 and process 4 are in the system.

Figure 4: The optimal scheduling policy, where the cross stands for action $a=(1,2)$ and the red circle stands for action $a=(2,1)$; $p=0.9$. Only process 1 and process 2 are in the system.

Some subtle differences can be found between the two cases. The first case is more intuitive: each process corresponds to one controller, i.e., the destinations of status updates are deterministic. In the second case, however, the optimal policy does not entirely form two closed loops, which seems counter-intuitive. Under this parameter setting, for process 1, if the status is sent to the remote controller, the entire control loop needs $\Delta^{\uparrow}_{r}+\Delta^{\downarrow}_{r}+T_{r,1}=6$ time slots from sampling to completion, while if the status is sent to the edge server, the control loop needs $\Delta^{\uparrow}_{e}+\Delta^{\downarrow}_{e}+T_{e,1}=8$ time slots. For process 2, the control loop needs $\Delta^{\uparrow}_{r}+\Delta^{\downarrow}_{r}+T_{r,2}=5$ and $\Delta^{\uparrow}_{e}+\Delta^{\downarrow}_{e}+T_{e,2}=6$ time slots, respectively. Obviously, sending the status information to the remote server is the best choice for both process 1 and process 2. Moreover, the controller can compute one status while another status is being transmitted, i.e., the $\Delta^{\uparrow}_{r}+\Delta^{\downarrow}_{r}=2$ slots of one control loop can be used to compute another status. Therefore, when the MSE of process 1 is much smaller than that of process 2, process 2 chooses the better action for itself, and process 1 can only choose a non-optimal action. There is no such trade-off in case 1, where the processing times differ significantly: process 1 and process 4 each choose their own optimal destination.

Figure 5: The performance of each policy for $p\in[0.8,1]$. Only process 1 and process 2 are in the system.

To verify the performance of the various policies, we consider the greedy baseline policy and a static policy in addition to the optimal policy. Under the static policy, each process only transmits its status information to its corresponding controller and never to the other controller.

Since the optimal policy in case 1 is exactly the static policy, the performance simulation is only performed for case 2. We run Monte Carlo simulations of 40,000 time slots with different values of $p$. The results are shown in Fig. 5.

Compared with the greedy baseline policy, the performance of the optimal policy is greatly improved, while compared with the static policy it is markedly improved only when the transmission success probability is low. This is because the optimal policy in Fig. 4 differs from the static policy only in the upper left corner, and the lower the transmission success probability, the easier it is to reach those states. When the transmission success probability is 1, those states are never reached.

For the case where there are more processes than controllers, we consider process 1, process 2, and process 3 in the system. When the states of both controllers are fixed, the optimal policy is a three-dimensional diagram. To see the optimal policy more intuitively, we intercept a plane in the three-dimensional coordinate system, as in Fig. 6, satisfying $\tau_{1}+\tau_{2}+\tau_{3}=45$.

The results are shown in Fig. 7. There are four regions in the graph, forming a threshold policy similar to the optimal policy in the two-dimensional case. When the action is given, the set of states taking that action under the optimal policy is a simply connected region.

Figure 6: Schematic diagram of the intercepting plane in the three-dimensional optimal policy.

Figure 7: The optimal policy of the three-process system.

Figure 8: The optimal policy on the remote server when the edge server is occupied by process 3.

Figure 9: The optimal policy on the edge server when the remote server is occupied by process 1.

Fig. 8 shows the optimal policy on the remote server when the edge server is occupied by the status from process 3, e.g., $C_{r}=(I_{r,1},d_{r})=(0,0)$ and $C_{e}=(I_{e,1},d_{e})=(3,3)$. Fig. 9 shows the optimal policy on the edge server when the remote server is occupied by the status from process 1, e.g., $C_{r}=(I_{r,1},d_{r})=(1,4)$ and $C_{e}=(I_{e,1},d_{e})=(0,0)$.

Figure 10: The performance of each policy for $p\in[0.8,1]$.

In Fig. 10, the obtained optimal policy is compared with the suboptimal, greedy, and static policies. Compared with the greedy baseline policy, the proposed optimal policy still shows a significant improvement. As the transmission success probability approaches 1, the performances of the optimal, suboptimal, and static policies tend to coincide. It is also observed that the performance of the suboptimal policy is close to that of the optimal policy.

V-C Application of preemption mechanism

Fig. 11 and Fig. 12 show the optimal preemption policy when the remote server is processing the tasks of process 1 and process 2, respectively. It is a threshold policy, consistent with the theory in Section IV. Furthermore, by the property of Theorem 3, once a preemption action is executed, the newly running task will not itself be preempted.

Figure 11: The optimal policy when the remote server is computing the task from process 2. The red circle indicates that the process 1 task preempts the process 2 task, and the cross indicates continued execution of the current task.

Figure 12: The optimal policy when the remote server is computing the task from process 1. The red circle indicates that the process 2 task preempts the process 1 task, and the cross indicates continued execution of the current task.

Figure 13: Performance comparison between preemption and no preemption.

We compare the performance with and without the preemption mechanism. Adding the preemption policy to the optimal policy of Fig. 10 yields Fig. 13. It can be seen that the higher the transmission success probability, the smaller the performance improvement, and the overall improvement is not significant. Although many preemption actions appear in Fig. 11 and Fig. 12, preemption only takes effect after a non-optimal action has been executed. Therefore, for a WNCS with periodic status updates, the preemption mechanism does not reduce system performance, but neither does it improve it significantly.

From the simulations, we find that compared with the static policy, the performance of the proposed policies improves considerably when the transmission success probability is low. Meanwhile, the performance of the optimal policy coincides with that of the static policy when the wireless channel is perfect, i.e., when the transmission success probability $p$ is set to $1$. This is because the optimal policy reduces to the static policy under a perfect channel, and the AoI of all processes then changes periodically.

VI Conclusion

In this paper, we studied a multi-process WNCS with edge computing, aiming to minimize the system MSE. We formulated the problem as an MDP and proposed a low-complexity suboptimal policy based on a randomized base policy. For the obtained optimal policy, by characterizing the monotonicity of the value function, we proved its structural properties. For the imperfect channel, we also discussed the preemption mechanism on the server and proved that the preemption action, once optimal, remains so and will not be preempted before the task is completed, which we call the consistency property.

For the preemption mechanism on the server, the performance improvement is not apparent, because non-optimal actions are rarely performed when the channel is reliable. We will further study the event-triggered WNCS, where state generation is random; there, the preemption mechanism is expected to improve system performance more significantly.

Appendix A Proof of Theorem 1

We prove Theorem 1 using the relative value iteration algorithm (RVIA) [23, Chapter 5.3] and mathematical induction. To make the description clear, we briefly present the RVIA first. For each system state $s\in\mathcal{S}$, we denote the value function at iteration $n$ by $V_{n}(s)$, where $n=1,2,\cdots$. Define the state-action cost function at iteration $n$ as:

J_{n}(s,a)=\sum_{k=1}^{N}Q_{k}+\sum_{s^{\prime}\in\mathcal{S}}\Pr[s^{\prime}|s,a]V_{n}(s^{\prime}). \qquad (26)

Note that $J_{n}(s,a)$ corresponds to the right-hand side of the Bellman equation in (13). For each $s$, RVIA computes $V_{n}(s)$ according to:

V_{n+1}(s)=\min_{a\in A}J_{n+1}(s,a)-\min_{a\in A}J_{n+1}(s^{\dagger},a),\ \forall n, \qquad (27)

where $s^{\dagger}$ is some fixed state. According to [23, Proposition 5.3.2], the generated sequence $\{V_{n}(s)\}$ converges to $\{V(s)\}$ under any initialization $V_{0}(s)$, i.e.,

\lim_{n\rightarrow+\infty}V_{n}(s)=V(s),\ \forall s\in\mathcal{S}, \qquad (28)

where $V(s)$ satisfies the Bellman equation in (13). Let $\pi^{*}_{n}(s)$ be the scheduling action attaining the minimum of the first term in (27) at the $n$-th iteration for all $s$, i.e.,

\pi^{*}_{n}(s)=\arg\min_{a\in A}J_{n+1}(s,a),\ \forall s\in\mathcal{S}. \qquad (29)

Define $\pi^{*}_{n}(s)\triangleq(\pi_{n,k}^{*}(s))_{k\in\mathcal{N}}$, where $\pi_{n,k}^{*}(s)$ denotes the scheduling action for process $k$ under state $s$. We refer to $\pi^{*}_{n}$ as the optimal policy at iteration $n$.

With the above introduction, we prove Theorem 1 through the RVIA using mathematical induction. Consider two system states $s^{1}=(\tau^{1},C_{r}^{1},C_{e}^{1})$ and $s^{2}=(\tau^{2},C_{r}^{2},C_{e}^{2})$. According to (28), it is sufficient to prove Theorem 1 by showing that for any $s^{1}$ and $s^{2}$ such that $\tau^{2}\geq\tau^{1}$ (component-wise), $C_{r}^{2}=C_{r}^{1}$ and $C_{e}^{2}=C_{e}^{1}$,

V_{n}(s^{2})\geq V_{n}(s^{1}) \qquad (30)

holds for all $n=1,2,\cdots$.

First, we initialize $V_{0}(s)=0$ for all $s$, and it is easy to verify that (30) holds for $n=1$. Assume that (30) holds for some $n>1$; we prove that it also holds for $n+1$. By (27), we have

V_{n+1}(s^{1}) = J_{n+1}(s^{1},\pi_{n}^{*}(s^{1})) - J_{n+1}(s^{\dagger},\pi_{n}^{*}(s^{\dagger}))
\overset{(a)}{\leq} J_{n+1}(s^{1},\pi_{n}^{*}(s^{2})) - J_{n+1}(s^{\dagger},\pi_{n}^{*}(s^{\dagger}))
= \sum_{k}Q_{k}^{1} + \sum_{s^{1\prime}\in\mathcal{S}}\Pr\left[s^{1\prime}\mid s^{1},\pi_{n}^{*}(s^{2})\right]V_{n}(s^{1\prime}) - J_{n+1}(s^{\dagger},\pi_{n}^{*}(s^{\dagger})), \qquad (31)

where $(a)$ is due to the optimality of $\pi_{n}^{*}(s^{1})$ for $s^{1}$ at iteration $n$. By (26) and (27), we have

V_{n+1}(s^{2}) = J_{n+1}(s^{2},\pi_{n}^{*}(s^{2})) - J_{n+1}(s^{\dagger},\pi_{n}^{*}(s^{\dagger}))
= \sum_{k}Q_{k}^{2} + \sum_{s^{2\prime}\in\mathcal{S}}\Pr\left[s^{2\prime}\mid s^{2},\pi_{n}^{*}(s^{2})\right]V_{n}(s^{2\prime}) - J_{n+1}(s^{\dagger},\pi_{n}^{*}(s^{\dagger})). \qquad (32)

Then we compare $\sum_{s^{1\prime}\in\mathcal{S}}\Pr[s^{1\prime}|s^{1},\pi_{n}^{*}(s^{2})]V_{n}(s^{1\prime})$ with $\sum_{s^{2\prime}\in\mathcal{S}}\Pr[s^{2\prime}|s^{2},\pi_{n}^{*}(s^{2})]V_{n}(s^{2\prime})$ for all possible $\pi_{n}^{*}(s^{2})=(\pi_{n,k}^{*}(s^{2}))_{k\in\mathcal{N}}$. For each process $k$, we need to consider all cases under the feasible actions. According to (7)-(9), we can check that $Q_{k}^{2\prime}\geq Q_{k}^{1\prime}$, $C_{r}^{2\prime}=C_{r}^{1\prime}$ and $C_{e}^{2\prime}=C_{e}^{1\prime}$ hold for all corresponding actions. Thus, by the induction hypothesis, we have $\sum_{s^{2\prime}\in\mathcal{S}}\Pr[s^{2\prime}|s^{2},\pi_{n}^{*}(s^{2})]V_{n}(s^{2\prime})\geq\sum_{s^{1\prime}\in\mathcal{S}}\Pr[s^{1\prime}|s^{1},\pi_{n}^{*}(s^{2})]V_{n}(s^{1\prime})$, which implies $V_{n+1}(s^{2})\geq V_{n+1}(s^{1})$, i.e., (30) holds for $n+1$. Therefore, by induction, (30) holds for any $n$. Taking limits on both sides of (30) and using (28) completes the proof.

Appendix B Proof of Theorem 2

Following [9] and [27], we define the discounted cost under a deterministic stationary policy $\pi$ and an initial state $s_{0}$ as

J_{\alpha}(s_{0},\pi)=\limsup_{T\rightarrow\infty}\mathbb{E}_{s_{0}}^{\pi}\left[\sum_{k=0}^{T-1}\alpha^{k}c(s_{k},a_{k})\right], \qquad (33)

Note that in (33), unlike (15), the argument is a policy instead of an action. Define $J_{\alpha}^{*}(s_{0})=\inf_{\pi\in\Pi}J_{\alpha}(s_{0},\pi)$ and $u(s)=J_{\alpha}^{*}(s)-J_{\alpha}^{*}(s^{\dagger})$, where $s^{\dagger}$ is some fixed state. Because there exists an optimal policy for the average cost counterpart of the cost function (33), the limit of $u(s)$ as $\alpha$ goes to $1$ exists and is the relative value function in (13), i.e.,

V(s)=\lim_{\alpha\rightarrow 1}u(s). \qquad (34)

From the mapping in (34), we can analyze the properties of $V(s)$ by examining $u(s)$ through value iteration. We further define the dynamic programming operator $T_{\alpha}u(\cdot)$ for a given measurable function $u:\mathcal{S}\mapsto\mathcal{R}$ as

T_{\alpha}u(s)\triangleq\min_{a\in A}\left\{c(s,a)+\alpha\mathbb{E}\left[u(s^{\prime})\mid s,a\right]\right\},\ s\in\mathcal{S}. \qquad (35)

$T_{\alpha}u(s)$ is a contraction mapping, following Hernández-Lerma [24]. By the Banach fixed-point theorem,

\lim_{n\rightarrow\infty}T_{\alpha}^{n}u(s)=J_{\alpha}^{*}(s). \qquad (36)

If the dynamic programming operator preserves a property, (36) guarantees that the property holds for $u(s)$; e.g., we can obtain the structure of $V(s)=\lim_{\alpha\rightarrow 1}u(s)$ by verifying that the operator in (35) preserves the same structure [14].

For clarity of the later proof, we set the number of processes in the plant to 3 and express the state as $s=(\tau,C_{r},C_{e})=((\tau_{1},\tau_{2},\tau_{3}),C_{r},C_{e})$. In addition, the value of $c(s,a)$ is only related to $\tau$, so we write $c(s,a)=c(\tau)$ in the following proof.

Take $a^{*}=(1,2)$ as the optimal action of state $s$ and compare it with action $a=(2,3)$; the other non-optimal actions can be compared by the same steps. The theorem is then equivalent to the following:

(1) If $V(\tau,C_{r}^{1},C_{e}^{2})\leq V(\tau,C_{r}^{2},C_{e}^{3})$, then $V(\tau^{\prime},C_{r}^{1},C_{e}^{2})\leq V(\tau^{\prime},C_{r}^{2},C_{e}^{3})$.

(2) If $V(\tau,C_{r}^{1},C_{e}^{2})\leq V(\tau,C_{r}^{2},C_{e}^{3})$, then $V(\tau^{\prime},C_{r}^{1},C_{e}^{2})\leq V(\tau^{\prime},C_{r}^{2},C_{e}^{3})$.

where $C_{r}^{1}=(1,0,0,0)$, $C_{e}^{2}=(T_{e,2}-1,2)$, $C_{r}^{2}=(2,0,0,0)$, $C_{e}^{3}=(T_{e,3}-1,3)$, $\tau^{\prime}=(\tau_{1}+z,\tau_{2}+z,\tau_{3})$, and $z$ is any positive integer.

We then prove (1), the structural property, by showing that $u(\cdot)$ has the same structure; (2) can be proved in the same way.

Consider the case where $T_{e,2}>1$ and $T_{e,3}>1$.

If $u\left(\tau,C_{r}^{1},C_{e}^{2}\right)\leq u\left(\tau,C_{r}^{2},C_{e}^{3}\right)$, then by expanding $u$ through one step of the operator in (35), we obtain

c(\tau_{1},\tau_{2},\tau_{3})+u(\tau+\boldsymbol{1},C_{r}^{1},C_{e}^{2})\leq c(\tau_{1},\tau_{2},\tau_{3})+u(\tau+\boldsymbol{1},C_{r}^{2},C_{e}^{3}), (37)

where $\tau+\boldsymbol{1}=(\tau_{1}+1,\tau_{2}+1,\tau_{3}+1)$, and (37) implies that

u(\tau+\boldsymbol{1},C_{r}^{1},C_{e}^{2})\leq u(\tau+\boldsymbol{1},C_{r}^{2},C_{e}^{3}). (38)

Now assume, for contradiction, that $u\left(\tau^{\prime},C_{r}^{1},C_{e}^{2}\right)>u\left(\tau^{\prime},C_{r}^{2},C_{e}^{3}\right)$. In the same way as (37), we have

u(\tau^{\prime},C_{r}^{1},C_{e}^{2})>c(\tau_{1}+z,\tau_{2}+z,\tau_{3})+u(\tau^{\prime}+\boldsymbol{1},C_{r}^{2},C_{e}^{3}). (39)

Because $c(\tau_{1}+z,\tau_{2}+z,\tau_{3})>0$, (39) implies that

u(\tau^{\prime},C_{r}^{1},C_{e}^{2})>u(\tau^{\prime}+\boldsymbol{1},C_{r}^{2},C_{e}^{3}). (40)

Meanwhile, from Lemma 2 we have

u(\tau^{\prime}+\boldsymbol{1},C_{r}^{2},C_{e}^{3})>u(\tau+\boldsymbol{1},C_{r}^{2},C_{e}^{3}). (41)

From (40) and (41), we have

u(\tau^{\prime},C_{r}^{1},C_{e}^{2})>u(\tau+\boldsymbol{1},C_{r}^{2},C_{e}^{3}). (42)

Meanwhile, condition (38) gives

u(\tau+\boldsymbol{1},C_{r}^{2},C_{e}^{3})\geq u(\tau+\boldsymbol{1},C_{r}^{1},C_{e}^{2}), (43)

which, together with (42), yields

u(\tau^{\prime},C_{r}^{1},C_{e}^{2})>u(\tau+\boldsymbol{1},C_{r}^{1},C_{e}^{2}). (44)
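For readability, the steps (40), (41), and (43) that produce (44) can be collected into a single chain (a restatement of the above, not a new derivation):

u(\tau^{\prime},C_{r}^{1},C_{e}^{2})\overset{(40)}{>}u(\tau^{\prime}+\boldsymbol{1},C_{r}^{2},C_{e}^{3})\overset{(41)}{>}u(\tau+\boldsymbol{1},C_{r}^{2},C_{e}^{3})\overset{(43)}{\geq}u(\tau+\boldsymbol{1},C_{r}^{1},C_{e}^{2}).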

When $z=1$, $\tau^{\prime}=(\tau_{1}+1,\tau_{2}+1,\tau_{3})$ is componentwise no larger than $\tau+\boldsymbol{1}$, so (44) violates the monotonicity and consistency of the value function; hence the assumption is false. Therefore, $u\left(\tau^{\prime},C_{r}^{1},C_{e}^{2}\right)\leq u\left(\tau^{\prime},C_{r}^{2},C_{e}^{3}\right)$ when $z=1$. As in the proof of Lemma 2, mathematical induction shows that this inequality holds for any positive integer $z$. The proof is complete, and the other cases can be proved by the same steps.

Appendix C Proof of Theorem 3

Consider the case where $T_{r,1}>2$ and $T_{r,2}>2$; the other cases extend easily. To simplify the expressions, we omit the parameters that remain identical throughout the proof, so the state can be expressed as $s=(\tau_{1},\tau_{2},d_{1},d_{2})$, where $d_{1}$ and $d_{2}$ denote the remaining computing times of the two processes in the plant, respectively. Without loss of generality, assume that $a^{*}(t-1)=1$ and $a^{*}(t)=3$, i.e., the server starts computing the status update from sensor 1 in the $(t-1)$-th time slot and continues computing it in the $t$-th time slot.

By the assumption, we have

c(\tau_{1},\tau_{2})+u(\tau_{1}+1,\tau_{2}+1,T_{r,1}-2,T_{r,2})<c(\tau_{1},\tau_{2})+u(\tau_{1}+1,\tau_{2}+1,T_{r,1}-1,T_{r,2}), (45)

c(\tau_{1},\tau_{2})+u(\tau_{1}+1,\tau_{2}+1,T_{r,1}-2,T_{r,2})<c(\tau_{1},\tau_{2})+u(\tau_{1}+1,\tau_{2}+1,T_{r,1}-1,T_{r,2}-1). (46)

Inequality (45) indicates that the action $a(t)=3$ is better than $a(t)=1$, and (46) indicates that $a(t)=3$ is better than $a(t)=2$. Canceling the common term $c(\tau_{1},\tau_{2})$, (45) and (46) simplify to:

u(\tau_{1}+1,\tau_{2}+1,T_{r,1}-2,T_{r,2})<u(\tau_{1}+1,\tau_{2}+1,T_{r,1}-1,T_{r,2}), (47)
u(\tau_{1}+1,\tau_{2}+1,T_{r,1}-2,T_{r,2})<u(\tau_{1}+1,\tau_{2}+1,T_{r,1}-1,T_{r,2}-1). (48)

If $a^{*}(t+1)=1$, then the left-hand side of inequality (45) can be expanded as

c(\tau_{1},\tau_{2})+c(\tau_{1}+1,\tau_{2}+1)+u(\tau_{1}+2,\tau_{2}+2,T_{r,1}-1,T_{r,2}). (49)

Because $c(\tau_{1}+1,\tau_{2}+1)>0$, we have

u(\tau_{1}+1,\tau_{2}+1,T_{r,1}-2,T_{r,2})>u(\tau_{1}+2,\tau_{2}+2,T_{r,1}-1,T_{r,2}). (50)

On the other hand, by monotonicity, we have

u(\tau_{1}+2,\tau_{2}+2,T_{r,1}-1,T_{r,2})>u(\tau_{1}+1,\tau_{2}+1,T_{r,1}-1,T_{r,2}). (51)

Combining (50) and (51), it is clear that

u(\tau_{1}+1,\tau_{2}+1,T_{r,1}-2,T_{r,2})>u(\tau_{1}+1,\tau_{2}+1,T_{r,1}-1,T_{r,2}). (52)

We find that (52) contradicts (47). Note that, rather than proving that $a^{*}(t+1)=3$ is optimal, this contradiction only shows that $a(t+1)=3$ is better than $a(t+1)=1$; it is still necessary to prove that $a(t+1)=3$ is better than the remaining actions. That proof follows exactly the same steps as (45)-(52) and is omitted here.
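The consistency property can also be sanity-checked numerically. The toy script below (a hedged illustration with assumed parameters and simplified deterministic dynamics, not the paper's plant) compares, over a finite horizon, a schedule that lets the server finish the job it has already started against one that preempts it immediately; preemption discards the partial computation and incurs a larger cumulative age cost in this instance.

```python
# Toy sanity check (assumed parameters, not the paper's plant): one server,
# two processes with processing times T1 and T2. The per-slot cost is the sum
# of the two ages; a process' age resets when its job completes.
T1, T2 = 3, 4

def run(schedule, horizon=20):
    """Simulate schedule(t, job, d) -> 'continue' | 'preempt'; return the
    cumulative age cost. Job 0 starts on the server with d slots remaining."""
    tau = [5, 5]
    job, d = 0, T1
    cost = 0
    for t in range(horizon):
        cost += sum(tau)
        if schedule(t, job, d) == 'continue':
            d -= 1
            if d == 0:                      # job finishes: its age resets
                tau[job] = 0
                job, d = 1 - job, (T2 if job == 0 else T1)
        else:                               # preempt: switch jobs, lose progress
            job, d = 1 - job, (T2 if job == 0 else T1)
        tau = [x + 1 for x in tau]
    return cost

finish_first = lambda t, job, d: 'continue'
preempt_now = lambda t, job, d: 'preempt' if t == 0 else 'continue'
print("finish current job first:", run(finish_first))
print("preempt immediately:", run(preempt_now))
```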

References

  • [1] C.-H. Tsai and C.-C. Wang, “Unifying AoI minimization and remote estimation—optimal sensor/controller coordination with random two-way delay,” in IEEE INFOCOM 2020-IEEE Conference on Computer Communications, pp. 466–475, IEEE, 2020.
  • [2] X. Zang, W. Liu, Y. Li, and B. Vucetic, “Over-the-air computation systems: Optimal design with sum-power constraint,” IEEE Wireless Communications Letters, vol. 9, no. 9, pp. 1524–1528, 2020.
  • [3] Y.-P. Hsu, “Age of information: Whittle index for scheduling stochastic arrivals,” in 2018 IEEE International Symposium on Information Theory (ISIT), pp. 2634–2638, IEEE, 2018.
  • [4] Z. Jiang, B. Krishnamachari, S. Zhou, and Z. Niu, “Can decentralized status update achieve universally near-optimal age-of-information in wireless multiaccess channels?,” in 2018 30th International Teletraffic Congress (ITC 30), vol. 1, pp. 144–152, IEEE, 2018.
  • [5] K. Huang, W. Liu, M. Shirvanimoghaddam, Y. Li, and B. Vucetic, “Real-time remote estimation with hybrid ARQ in wireless networked control,” IEEE Transactions on Wireless Communications, vol. 19, no. 5, pp. 3490–3504, 2020.
  • [6] S. Wu, X. Ren, S. Dey, and L. Shi, “Optimal scheduling of multiple sensors with packet length constraint,” IFAC-PapersOnLine, vol. 50, no. 1, pp. 14430–14435, 2017.
  • [7] B. Wang, S. Feng, and J. Yang, “When to preempt? Age of information minimization under link capacity constraint,” Journal of Communications and Networks, vol. 21, no. 3, pp. 220–232, 2019.
  • [8] B. Zhou and W. Saad, “Minimum age of information in the Internet of Things with non-uniform status packet sizes,” IEEE Transactions on Wireless Communications, vol. 19, no. 3, pp. 1933–1947, 2019.
  • [9] S. Wu, X. Ren, S. Dey, and L. Shi, “Optimal scheduling of multiple sensors with packet length constraint,” IFAC-PapersOnLine, vol. 50, no. 1, pp. 14430–14435, 2017.
  • [10] B. Wang, C. Wang, W. Huang, Y. Song, and X. Qin, “A survey and taxonomy on task offloading for edge-cloud computing,” IEEE Access, vol. 8, pp. 186080–186101, 2020.
  • [11] Z. Han, H. Tan, X.-Y. Li, S. H.-C. Jiang, Y. Li, and F. C. Lau, “Ondisc: Online latency-sensitive job dispatching and scheduling in heterogeneous edge-clouds,” IEEE/ACM Transactions on Networking, vol. 27, no. 6, pp. 2472–2485, 2019.
  • [12] H. Shiri, J. Park, and M. Bennis, “Massive autonomous UAV path planning: A neural network based mean-field game theoretic approach,” in 2019 IEEE Global Communications Conference (GLOBECOM), pp. 1–6, IEEE, 2019.
  • [13] P. Skarin, W. Tärneberg, K.-E. Årzen, and M. Kihl, “Towards mission-critical control at the edge and over 5G,” in 2018 IEEE International Conference on Edge Computing (EDGE), pp. 50–57, IEEE, 2018.
  • [14] S. Feng and J. Yang, “Minimizing age of information for an energy harvesting source with updating failures,” in 2018 IEEE International Symposium on Information Theory (ISIT), pp. 2431–2435, IEEE, 2018.
  • [15] M. A. Abd-Elmagid and H. S. Dhillon, “Average peak age-of-information minimization in UAV-assisted IoT networks,” IEEE Transactions on Vehicular Technology, vol. 68, no. 2, pp. 2003–2008, 2018.
  • [16] M. Jia, J. Cao, and W. Liang, “Optimal cloudlet placement and user to cloudlet allocation in wireless metropolitan area networks,” IEEE Transactions on Cloud Computing, vol. 5, no. 4, pp. 725–737, 2015.
  • [17] K. Huang, W. Liu, Y. Li, B. Vucetic, and A. Savkin, “Optimal downlink–uplink scheduling of wireless networked control for Industrial IoT,” IEEE Internet of Things Journal, vol. 7, no. 3, pp. 1756–1772, 2019.
  • [18] E. T. Ceran, D. Gündüz, and A. György, “A reinforcement learning approach to age of information in multi-user networks,” in 2018 IEEE 29th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), pp. 1967–1971, IEEE, 2018.
  • [19] K. Chen and L. Huang, “Age-of-information in the presence of error,” in 2016 IEEE International Symposium on Information Theory (ISIT), pp. 2579–2583, IEEE, 2016.
  • [20] X. Wang, M. Fang, C. Xu, H. H. Yang, X. Sun, X. Chen, and T. Q. Quek, “When to preprocess? Keeping information fresh for computing enabled Internet of Things,” arXiv preprint arXiv:2108.01919, 2021.
  • [21] W. Liu, G. Nair, Y. Li, D. Nesic, B. Vucetic, and H. V. Poor, “On the latency, rate, and reliability tradeoff in wireless networked control systems for IIoT,” IEEE Internet of Things Journal, vol. 8, no. 2, pp. 723–733, 2020.
  • [22] M. Gorlatova, H. Inaltekin, and M. Chiang, “Characterizing task completion latencies in fog computing,” arXiv preprint arXiv:1811.02638, 2018.
  • [23] D. P. Bertsekas, Dynamic Programming and Optimal Control, Vol. II, 3rd ed. Belmont, MA: Athena Scientific, 2011.
  • [24] Y. Cui, V. K. Lau, and Y. Wu, “Delay-aware BS discontinuous transmission control and user scheduling for energy harvesting downlink coordinated MIMO systems,” IEEE Transactions on Signal Processing, vol. 60, no. 7, pp. 3786–3795, 2012.
  • [25] M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014.
  • [26] K. M. Hosny and M. M. Darwish, “New set of multi-channel orthogonal moments for color image representation and recognition,” Pattern Recognition, vol. 88, pp. 153–173, 2019.
  • [27] X. Ren, J. Wu, S. Dey, and L. Shi, “Attack allocation on remote state estimation in multi-systems: Structural results and asymptotic solution,” Automatica, vol. 87, pp. 184–194, 2018.