On the Scheduling Policy for Multi-process WNCS under Edge Computing ^†^†thanks: Y. Qiu, S. Wu and Y. Wang are with the Harbin Institute of Technology, Shenzhen, Guangdong, 518055, China. E-mail: [email protected], [email protected], [email protected]

Yifei Qiu, Shaohua Wu, and Ying Wang

Abstract

This paper considers a multi-process and multi-controller wireless networked control system (WNCS). There are $N$ independent linear time-invariant processes in the system plant which represent different kinds of physical processes. By considering the edge computing, the controllers are played by edge server and cloud server. Each process is measured by a sensor, and the status updates is sent to controller to generate the control command. The link delay of cloud server is longer than that of edge server. The processing time of status update depends on the characteristic of servers and processes. By taking into account such conditions, we mainly investigate how to choose the destination of status updates to minimize the system’s average Mean Square Error (MSE), edge server or cloud server? To address this issue, we formulate an infinite horizon average cost Markov Decision Process (MDP) problem and obtain the optimal scheduling policy. The monotonicity of the value function in MDP is characterized and then used to show the threshold structure properties of the optimal scheduling policy. To overcome the curse of dimensionality, we propose a low-complexity suboptimal policy by using additive separable structure of value function. Furthermore, the processing preemption mechanism is considered to handle the status updates more flexible, and the consistency property is proved. On this basis, a numerical example is provided. The simulation results illustrate that the controller selection is related to the timeliness of process and show the performance of the suboptimal policy. We also find that the optimal policy will become a static policy in which the destination of status update is fixed when the wireless channel is error free.

Index Terms:

WNCS, multi-process system, Edge computing, age of information, preemption, schelduling

I Introduction

In a wireless networked control system (WNCS), to ensure the accuracy of control and the stability of design, it is necessary to describe the timeliness and error of the system quantitatively. There are two commonly used parameters: Age of information (AoI) and Mean Square Error (MSE). The concept of AoI was proposed in [1], which can accurately quantify the timeliness of status information. Typically, the AoI is defined as the time elapsed since the most recently received status update was generated at the sensor. Unlike AoI, MSE is mainly used to measure the system’s error, which can describe the change rates of different sources. It is widely used, especially in the control system, such as the multi-sensor control system in [2].

Based on AoI and MSE, the research under the background of WNCS has yielded many beneficial results. Some literature study the scheduling polices to optimize the system AoI or MSE. For example, the work in [3] studies wireless networks with multiple users under the problem of AoI minimization and develops a low-complexity transmission scheduling algorithm. The authors in [4] extend the multiple users system to incorporate stochastic packet arrivals and optimal packet management and proposes an index-prioritized random access policy by minimizing the AoI. The works in [1] and [5] obtained the optimal actions in current state by designing the indicator based on latency and reliability, respectively. The authors in [5] and [6] study different lengths of source status and give the optimal policies in many cases by minimizing the system AoI. Based on [6], the work in [7] investigates the preemption mechanism in channel and prove the structural properties of the optimal policy. The non-uniform status model was further extended in [8], which is no longer limited to a single source and considers the situation of multiple processes with different status updates. The work in [9] introduces an optimal policy considering the change rate of different sources by minimizing the system MSE.

In WNCS, smaller delay means more accurate control, but the existing scheduling policies can not reduce the inherent delay in uplink and downlink. Therefore, edge computing was proposed to decrease prolonged delays in some time-critical tasks. In some WNCS, a small set of special servers was deployed near the sensors. These special servers are called edge servers, which nearby sensors can reach via wireless connections. In [10] and [11], edge computing is added to the sensor network. The authors in [10] introduce that the sensors can only full-offload the task, i.e., each status update can only be processed by one server, which can not be divided into multiple parts and handed over to multiple servers. The work in [12] introduces a neural network for training the system model and considers the processing time on server. The work in [13] uses the existing cloud processing center to build an edge computing model, and the results show that the computing capacity is different between edge server and cloud server. The authors in [11] propose a universal model which uses delay and computational capacity to distinguish different servers. The full task offloading without task migration in sensor network is also considered in [11].

Most existing works, e.g., [2]-[11], assume that the system has only one controller, and it takes the same time for different status updates to generate the control command. However, for the WNCS system containing edge computing, there are often multiple controllers in the system, and the controllers are mainly divided into edge servers and cloud servers. In addition, the time of generating control commands is also different. The edge servers are less potent in both resources and computational ability than remote cloud servers. It take longer time to process state information and generate control commands than cloud servers [12]. Moreover, to timely respond to time-critical tasks and improve system performance, the preemption mechanism commonly exists on the server, and tasks with higher priority can preempt the ongoing task. Therefore, some questions naturally arise: How to select the status update destination, remote cloud server or edge server, to optimize the system performance? Due to the randomness in the actual process, the server-side cannot perform the best action in every time slot; how to allocate the priority between different status updates to get the preemption policy that minimizes the MSE of the system? These questions motivate the study in this paper.

In this paper, the scheduling problem in WNCS with $N$ independent processes is studied, and we are interested in minimizing the average MSE. By referring to the model in [8] and [11], we adopt edge computing in the system model and consider that the state updates need to be processed in multiple time slots to generate control commands. Moreover, to handle the status updates more flexibly, we add the preemption mechanism on the basis of [8]. The main contributions of this paper are summarized as follows:

•

This paper considers a multi-process WNCS and applies the edge computing. The different processes stands for different physical processes, which status updates need different processing time on the same server. Meanwhile, we use link delay and computing power to describe edge server and cloud server. By introducing the AoI and remaining processing time of each process, the problem of how to scheduling status updates can be described as an infinite horizon average cost Markov Decision Process (MDP). Thus the scheduling policy of minimizing the MSE of the system is obtained.
•

A suboptimal policy is proposed to reduce the computational complexity. Along the proof of exists studies, we have the additive separable structure of value function. The original MDP problem can be divided into multiple small tasks. The simulation result shows that the performance of proposed suboptimal policy is near to optimal policy.
•

The processing preemption mechanism is considered in this paper. In order to handle the status updates more flexibly, we allow the task with higher priority to preempt the processing task on server. The preemption optimal policy is obtained by solving the formulated MDP problem. Then the consistency property of the preemption optimal policy is proved.

The rest of this paper is organized as follows. In Section II, we introduce the system model, formulate the problem and prove some structure properties. Section III presents a low-complexity suboptimal policy. Section IV further extends to the preemption mechanism on controller and characterizes the consistency property of its optimal policy. Simulation result and analysis are provided in Section V. Finally, conclusions are drawn in Section VI.

Refer to caption — Figure 1: System Model

II System Model And Problem Formulation

II-A System Model

We consider a multi-process WNCS. As shown in Fig.1, there are $N$ processes and $N$ sensors in the system, and each sensor samples its corresponding process[14]-[15]. For the uplink and downlink wireless channel, we adopt the orthogonal channel which can transmit multiple status updates at the same time. The controller in the system consists of an edge server and a cloud server. Same as the assumption in [11] and [16], compared with the remote server, the edge server has a smaller uplink and downlink delay but longer task processing time. In Fig.1, different color of dots in the circle represents different physical processes and the number of dots represents the dimension of process, e.g., the processing time required for generating control command from video signals is not the same as that from audio signals.

A time-slotted system is considered, where time is divided into slots with equal duration. And we model each process as a discrete linear time-invariant system [17]:

x_{k}(t+1)=A_{k}x_{k}(t)+B_{k}u_{k}(t)+z_{k}(t),

(1)

where $k\in\mathcal{N}\triangleq\{1,...,N\}$ and $t$ represent the process index and time slot, respectively. $x_{k}(t)\in R^{n_{k}}$ is the state of the $k$ -th process in $t$ -th time slot and $u_{k}(t)\in R^{n_{k}}$ is the executed control command. $z_{k}(t)$ represents the normally distributed noise with the distribution $N(0,R_{k})$ of the $k$ -th process and we assume that $R_{k}$ is a positive semi-definite. $A_{k}\in R^{n_{k}\times 1}$ and $B_{k}\in R^{n_{k}\times 1}$ represent the state transition matrix and command control coefficient of process $k$ , respectively. In this article, we consider the case that all processes remain unchanged within a single time slot. The goal of the system is to keep the state $x_{k}$ of each process $k$ close to $\boldsymbol{0}\in R^{n_{k}}$ .

In wireless channel, the transmission success probability for a status updates is set as $p\in(0,1)$ . As described in [14] [18][19], it is assumed that there is a perfect feedback channel between each sensor and the destination, so that each sensor will immediately inform whether the transmission is successful and determine the destination of the next time slot.

To generate effective control commands, the controller must maintain an accurate estimate $x_{k}$ of plant state. When the controller receives the status information from sensor successfully, it can use the timestamp to estimate the plant state. Define $\tau_{k}$ as the AoI of the status information generated by process $k$ . The estimation of $x_{k}(t)$ , denoted by $\widetilde{x}_{k}(t)$ , can be expressed as:

\widetilde{x}_{k}(t)=A_{k}^{\tau_{k}(t)}x_{k}(t-\tau_{k}(t)),

(2)

We denote $\Delta^{\uparrow}$ and $\Delta^{\downarrow}$ as the uplink and downlink delay, respectively. Assuming that the controller knows the delay of control command transmitted to the actuator. The goal of the system is to keep the state $x_{k}$ of each process $k$ close to $\boldsymbol{0}\in R^{n_{k}}$ . When the state information has been processed completely, the control command on actuator $u_{k}(t)$ and the control command on controller $\widetilde{u}_{k}(t)$ have the following relationship:

u_{k}(t)=\widetilde{u}_{k}(t-\Delta^{\downarrow})=K_{k}\widetilde{x}_{k}(t).

(3)

where $K_{k}=-A_{k}/B_{k}$ is the command generation coefficient.

Let $J$ denote the long-term average MSE of the system, which is given by

J={\lim_{T\to+\infty}}\frac{1}{T}\sum_{k=1}^{N}\sum_{t=0}^{T}Q_{k}(t).

(4)

where $Q_{k}(t)=\mathbb{E}[x_{k}(t)x_{k}(t)^{T}]$ represents the MSE of each process, which can be calculated by combining (1)-(3), given as:

Q_{k}(t)=\mathbb{E}[x_{k}(t)x_{k}(t)^{T}]=\sum_{i=1}^{\tau_{k}(t)}(A_{k})^{(i-1)}R_{k}(A_{k}^{T})^{(i-1)}.

(5)

This equation constructs the mapping between AoI and MSE and can simplify the calculation.

For the processing time, different tasks on the same server is not equal, and the same task on different server is also different. For each physical process $k$ , we denote the state information length $L_{k}$ , which is equal to the number of elements in $x_{k}$ . The number of bits per element in the state matrix and the number of CPU cycles needed to process one bit are denoted by $l$ and $v$ , respectively. The number of time slots required for process one status update is then given by [20]

T_{p}=\left\lceil\frac{L_{k}lv}{f\varepsilon}\right\rceil.

(6)

where $f$ (in Hz) is the CPU frequency of the processor and $\varepsilon$ (in seconds) is the time duration in each slot.

We assume that the edge computing system is stable such that the link delay is time invariable and a server can execute at most one task at a time. Because the remote cloud servers can be modeled as edge servers but with long link delay and more powerful processing capability, we do not explicitly differentiate between the edge servers and the remote cloud servers[11].

In order to describe the control performance of the system more accurately, we make a common assumption in the following:

Assumption:The transition matrix $A_{k}$ of each process satisfies that $\rho$ ( $A_{k}$ )>1. It means that without control, the system will be unstable, which is why the control system exists. This assumption makes the control necessary.

Here $\rho$ (.) stands for Spectral radius.

II-B Problem Formulation

For edge servers, the uplink and downlink delays are $\Delta_{e}^{\uparrow}$ and $\Delta_{e}^{\downarrow}$ , respectively. Moreover, the processing time on edge server of state $x_{k}$ is defined as $T_{e,k}$ . For a cloud server (or called the remote server), the uplink and downlink delays are defined as $\Delta_{r}^{\uparrow}$ and $\Delta_{r}^{\downarrow}$ , respectively. And the processing time on remote server of state $x_{k}$ is defined as $T_{r,k}$ .

In order to facilitate the subsequent representation, we make some assumptions which are similar to the existing works in [11] [21]-[22]. For the edge server, the uplink and downlink delay are set to be $1$ time slot, i.e., $\Delta_{e}^{\uparrow}=\Delta_{e}^{\downarrow}=1$ , and the processing time $T_{e,k}$ is set to be twice the dimension of $x_{k}$ . For the cloud server, we set the uplink and downlink delay to $2$ time slots, i.e., $\Delta_{r}^{\uparrow}=\Delta_{r}^{\downarrow}=2$ , and the processing time $T_{r,k}$ is numerically the same as the dimension of $x_{k}$ . e.g., for a task with a dimension of 1, if dispatched to the edge server, the uplink and downlink delays are both 1, and the processing time is $2$ ; if dispatched to the cloud server, the uplink and downlink delays are both 2 and the processing time is 1. This can be extended to a general case and will be introduced in the following. But note that, the uploading and downloading of a task do not consume any computing resource of the servers, so a server can process a task while transmitting other tasks at the same time.

Our goal is to jointly control every process to minimize the system MSE under non-uniform status update and edge computing system. This problem can be modeled as a MDP problem, and the system space can be expressed as $\mathbf{S}\triangleq(\tau,C_{r},C_{e})\in\mathcal{S}$ where $\mathcal{S}$ is the collection of all feasible states. Each parameter is described as follows:

In order to record the AoI information of all processes, we set $\mathbf{\tau}\triangleq(\tau_{1},\cdots,\tau_{N})$ . For different process $k$ , we define $\tau_{k}\triangleq\{1,2,\cdots,\hat{\tau}_{k}\}$ as the AoI at the beginning of slot $t$ . We set $\hat{\tau}_{k}$ be the upper limits of the AoI for process $k$ . For tractability [23], we assume that $\hat{\tau}_{k}$ is finite, but can be arbitrarily large.

Let $C_{r}\triangleq(I_{r,1},I_{r,2},I_{r,3},d_{r})$ be the state space for the remote server, $I_{r,1}\in\{1,\cdots,N\}$ stands for the index of process which is transmitting in the uplink, $I_{r,2}\in\{1,\cdots,N\}$ stands for the index of process which is being computed on the remote server, $I_{r,3}\in\{1,\cdots,N\}$ stands for the index of process which control command is transmitting from controller. $d_{r}$ records the number of slot that are left to be computed.

Let $C_{e}\triangleq(I_{e,1},d_{e})$ be the state space for the edge server, $I_{e,1}\in\{1,\cdots,N\}$ stands for the index of process which is being computed on the edge server, $d_{e}$ records the number of slot that are left to be computing.

The status information in transmission is like a queue and arrives at the server in the order of FCFS [16]. Therefore, to expand to the general case, just add index element for representing the process according to different uplink and downlink delays.

The action space is defined as $\mathbf{A}\triangleq(a_{1},a_{2})$ , where $a_{1},a_{2}\in\{1,\cdots,N\}$ represent the index of process where status update is transmitted to the remote server and edge server, respectively.

For each variable, we can write the update rules as follows:

When both the uplink and downlink transmission are successful:

\tau_{k}(t+1)=\left\{\begin{array}[]{ll}\Delta_{r}^{\uparrow}+\Delta_{r}^{\downarrow}+T_{r,k},&\text{ if }\ I_{r,3}(t)=k,\\ \Delta_{e}^{\uparrow}+\Delta_{e}^{\downarrow}+T_{e,I_{e,1}(t)},&\text{ if }\ d_{e}(t)=0,\\ \tau_{k}(t)+1,&\text{ otherwise. }\end{array}\right.

(7)

\begin{array}[]{l}C_{r}(t+1)=\\ \left\{\begin{array}[]{ll}(a_{1}(t),I_{1}(t),I_{2}(t),T_{r,I_{1}(t)}-1),&\text{ if }\ d_{r}(t)=0,\\ (a_{1}(t),I_{2}(t),0,d_{r}(t)-1),&\ \text{otherwise.}\end{array}\right.\end{array}

(8)

C_{e}(t+1)=\left\{\begin{array}[]{ll}(a_{2}(t),T_{r,I_{1}(t)}),&\text{ if }\ d_{e}(t)=1,\\ (I_{e,1}(t),d_{e}(t)-1),&\text{ otherwise. }\end{array}\right.

(9)

In other conditions, (7) will change to $\tau_{k}(t+1)=\tau_{k}(t)+1$ when the uplink transmission failed. When the downlink transmission failed, just replace the $a_{1}(t)$ and $a_{2}(t)$ to $0$ in (8) and (9), respectively.

The one-stage cost only depends on the current state $s=(\tau,C_{r},C_{e})$ and is defined as

c(s,a)\triangleq\sum_{k=1}^{N}Q_{k}=\sum_{k=1}^{N}\sum_{i=1}^{\tau_{k}}(A_{k})^{(i-1)}R_{k}(A_{k}^{T})^{(i-1)}.

(10)

The set of scheduling decisions for all possible states is called a scheduling policy $\pi\triangleq(a(1),a(2),\cdots)\in\Pi$ , where $\Pi$ is the collection of all feasible scheduling policies. As a result, under a feasible stationary policy $\pi$ , the average MSE of the system starting from a given initial state $\boldsymbol{S}(1)=\boldsymbol{S}_{1}$ is given by:

\bar{Q}^{\pi}\left(\boldsymbol{S}_{1}\right)\triangleq\limsup_{T\rightarrow\infty}\frac{1}{T}\sum_{t=1}^{T}\sum_{k=1}^{N}\mathbb{E}\left[Q_{k}(t)\mid\boldsymbol{S}_{1}\right].

(11)

We are seeking for the policy which can minimize average MSE of the system in the following:

\bar{Q}^{*}\left(\boldsymbol{S}_{1}\right)\triangleq\inf_{\pi}\bar{Q}^{\pi}\left(\boldsymbol{S}_{1}\right).

(12)

According to [23, Propositions 5.2.1], the optimal policy $\pi^{*}$ can be obtained by solving the following Bellman equation.

Lemma 1: There exists a unique scalar $\theta$ and a value function $\{V(s)\}$ satisfying:

{\theta}+{V}(\boldsymbol{s})=c(s,a)+\min_{a\in A}\sum_{s^{\prime}\in\mathcal{S}}\Pr[s^{\prime}|s,a]V(s^{\prime}),\ \forall s\in\mathcal{S},

(13)

where $\theta$ is the optimal value to (12) for all initial state $\boldsymbol{S}_{1}\in\mathcal{S}$ and the optimal policy which can achieve the optimal value $\theta$ satisfies:

\pi^{*}(s)=\arg\min_{a\in A}\sum_{s^{\prime}\in\mathcal{S}}\Pr[s^{\prime}|s,a]V(s^{\prime}),\ \forall s\in\mathcal{S}.

(14)

It can be observed that the argument of the value function in (13) is only the state $s$ . If it is extended to the general situation and the action $a$ is also regarded as the argument of the function, the state-action cost function is obtained as follows:

J(\boldsymbol{s},a)=c(s,a)+\sum_{s^{\prime}\in\mathcal{S}}\Pr[s^{\prime}|s,a]V(s^{\prime}),\ \forall s\in\mathcal{S}.

(15)

It is evident from Lemma 1 and (14) that the optimal policy $\pi^{*}$ relies upon the value function $V(.)$ . In order to obtain $V(.)$ , we have to solve the Bellman equation in (13). But disappointingly, there is no closed-form solution in (13). The numerical solution can only be obtained by iterative methods such as value iteration or policy iteration. Furthermore, the design of the numerical solution corresponding to the optimal policy cannot provide more suggestions and will consume a lot of computing resources, especially in the high dimension problem.

Therefore, if the structural properties of the optimal policy can be obtained, it can be used to reduce the computational complexity of the policy and verify the final numerical result.

By the dynamics in (7)-(9) and using the relative value iteration algorithm, we can show the following properties of the value function $V(s)$ :

Theorem 1: For any state $s^{1}=(\tau^{1},C_{r}^{1},C_{e}^{1})$ , $s^{2}=(\tau^{2},C_{r}^{2},C_{e}^{2})$ , satisfy that $\tau_{1}^{1}\geq\tau_{1}^{2}$ , $\tau_{2}^{1}\geq\tau_{2}^{2}$ , $\tau_{3}^{1}\geq\tau_{3}^{2}$ , $C_{e}^{2}=C_{e}^{1}$ and $C_{r}^{2}=C_{r}^{1}$ , we have $V(s^{1})\geq V(s^{2})$ .

Proof: The proof can be found in the online version.

Theorem 2: For the state $s=(\tau,C_{r},C_{e})$ which satisfies $C_{r}=\boldsymbol{0}$ and $C_{e}=\boldsymbol{0}$ have the following type policy:

If $\pi^{*}(s)=(k_{1},k_{2})$ , then $\pi^{*}(s^{\prime})=(k_{1},k_{2})$ . Where $k_{1}\neq k_{2}$ and $s^{\prime}$ is the same with $s$ except $\tau^{\prime}_{k_{1}}=\tau_{k_{1}}+z$ , $\tau^{\prime}_{k_{2}}=\tau_{k_{2}}+z$ . $z$ is any positive integer number.

Proof: The proof can be found in the online version.

From Theorem 1 and Theorem 2, we present the threshold type policy of the MDP. The structural result will simplify off-line computation and facilitate the online application of scheduling policy. Note that we mainly focus on the case in which the remote server and edge server are idle, i.e., $C_{r}=\boldsymbol{0}$ and $C_{e}=\boldsymbol{0}$ . Because the result in other conditions agrees with intuition in other conditions, e.g., when the status information of process $k$ is transmitting to the remote server, or the remote server is computing the status information of process $k$ , the edge server will serve other processes except process $k$ . Also, this result can be extended to the single server model.

III Low-Complexity Suboptimal policy

In the MDP mentioned above, the computational complexity will become very high when too many elements are in the state space, which is called the curse of dimensionality. So we proposed a suboptimal policy, which sacrifices a small amount of performance in exchange for a significant reduction in computational complexity.

Let $\hat{\theta}$ and $\hat{V}(S)$ be the system average MSE and the value function under an unchained base policy $\hat{\pi}$ . From the proof of [8], there exists $(\hat{\theta},\hat{V}(\boldsymbol{S}))$ satisfying the following Bellman equation.

\hat{\theta}+\hat{V}(\boldsymbol{s})=\min_{a\in A}\left\{c(s,a)+\mathbb{E}\left[V\left(s^{\prime}\right)\mid s,a\right]\right\},\ \forall s\in\mathcal{S}

(16)

where $s^{\prime}$ is the next state from state $s$ under the given action $a$ . Obviously, the state cost is independent of the action and (16) can be further written as

	$\displaystyle\hat{\theta}+$	$\displaystyle\hat{V}(\boldsymbol{s})=\sum_{k=1}^{N}Q_{k}$		(17)
		$\displaystyle+\min_{v}\sum_{s^{\prime}\in\mathcal{S}}\mathbb{E}^{\hat{\pi}}[\text{Pr}[s^{\prime}\|s,a]]V(s^{\prime}),\ \forall s\in\mathcal{S}$		(17)

For each process $k$ , we define $s_{k}\triangleq(\tau_{k},C_{r}.C_{e})\in\mathcal{S}_{k}$ as the state space and $\mathcal{S}_{k}$ is the set of all feasible states. The action space is the same as SectionII. The transmit probability can also be derived from the previous description, which is omitted here. After the definition, we show that $\hat{V}(X)$ has the following additive separable structure.

Lemma 2: Given any unchain policy, the value function $\hat{V}(s)$ in (16) can be express as $\hat{V}(s)=\sum_{s\in\mathcal{S}}\hat{V}_{k}(s_{k})$ , where for each $k$ , $\hat{V}_{k}(s_{k})$ have the following property:

	$\displaystyle\hat{\theta}+$	$\displaystyle\hat{V}_{k}(\boldsymbol{s_{k}})=Q_{k}$		(18)
		$\displaystyle+\min_{v_{k}}\sum_{s^{\prime}_{k}\in\mathcal{S}_{k}}\mathbb{E}^{\hat{\pi}}[\text{Pr}[s^{\prime}_{k}\|s_{k},a]]V(s^{\prime}_{k}),\ \forall s_{k}\in\mathcal{S}_{k}$		(18)

where $\theta_{k}$ and $\hat{V}_{k}$ are the average MSE and value function of each process under policy $\hat{\pi}$ , respectively.

Proof: Along the line of proof of [[24], Lemma3], we have the additive separable structure of the value function under a unchain base policy $\hat{\pi}$ and by making use of the relationship between the joint distribution and marginal distribution. So, we have the equation $\sum_{s^{\prime}\in\mathcal{S}}\text{Pr}[s^{\prime}|s,a]=\sum_{s^{\prime}_{k}\in\mathcal{S}_{k}}\text{Pr}[s^{\prime}_{k}|s,a]=\sum_{s^{\prime}_{k}\in\mathcal{S}_{k}}\text{Pr}[s^{\prime}_{k}|s_{k},a]$ holds. Then, by substituting $\hat{V}(s)=\sum_{k\in\mathcal{N}}\hat{V}_{k}(s_{k})$ into (16), it can be easily checked that the equality in (18) holds.

Now, we approximate the value function in Bellman equation with $\hat{V}(s):V(s)\approx\hat{V}(s)=\sum_{k\in\mathcal{N}}\hat{V}_{k}(s_{k})$ , and $\hat{V}_{k}(s_{k})$ is given in (18). Then, according to (17) we develop a deterministic scheduling suboptimal policy in the following:

\hat{\pi}^{*}(s)=\text{arg}\min_{a\in A}\sum_{s^{\prime}\in\mathcal{S}}\text{Pr}[s^{\prime}|s,a]\sum_{k\in\mathcal{N}}\hat{V}_{k}(s^{\prime}_{k}),\ \forall s\in\mathcal{S}

(19)

The proposed deterministic policy in (19) likes the one iteration step in the standard policy iteration algorithm. It divides the original MDP into multiple small tasks. Though the number of problem states has not decreased, the dimensionality of each small task has been reduced.

In [23, proposition 5.4.2] and [25, Theorem 8.6.6], the convergence of the sub-optimal algorithm is proved. In addition, in [24, Theorem 1] and [8, Lemma 3], similar sub-optimal strategies are also adopted, which also proves that the proposed suboptimal algorithm is superior to other random algorithms.

When there are multiple processes in the plant, this suboptimal algorithm will significantly reduce the complexity of the calculation, making it possible to deal with multi-process problems. In general, in order to obtain the deterministic suboptimal policy, it is necessary to calculate the value function of each process $\hat{V}_{k}(s_{k})$ , the computational complexity of the proposed sub-optimal algorithm is: $O(\sum_{k\in\mathcal{N}}\hat{A}_{k}*|C_{r}|*|C_{e}|)$ . In contrast, the computational complexity required to obtain the optimal policy $\pi$ by calculating $V(s)$ is $O(\prod_{k=1}^{N}\hat{A}_{k}*|C_{r}|*|C_{e}|)$ , where $|C_{r}|$ and $|C_{e}|$ represent the number of feasible states in the set $C_{r}$ and $C_{e}$ , respectively.

IV Scheduling Under Preemption Mechanism

Due to the noisy channel, status information cannot transmit successfully in every slot. The controller may be idle if only one status is allowed to send the state to controller at a slot. Moreover, the worse the channel, the longer the controller will be idle.

The orthogonal channel we adopted allows that multiple sensors can transmit status updates at the same time slot [26], some "standby status information" can be transmitted with the "optimal status information" in the same time slot. When the “optimal status information” transmit failed, the server can process the “standby status information” instead. Work with the preemption mechanism, it can significantly reduce the idle time of the server without reducing the system performance.

In order to facilitate the description of the calculating process on the controller, we call the status update as information task. Preemption means when a higher priority task arrives at the controller, the controller will terminate the currently processing task and execute the newly arrived task. The preempted but not completed task will continue to be executed after the current task is completed when no other preemption occurs later. As shown in Fig.2, task 1 arrives at the server first. When task 1 is being processed, task 2 with higher priority arrives. The server immediately stops executing task 1 and instead performs task 2. When task 3 with a lower priority arrives, the server does not respond. After task 2 is completed, the server continues to execute task 1 that has not been completed before. The preemption mechanism makes that the server can switch to another task and resume the current task later. Therefore, the server can handle tasks more flexibly, and the average MSE of the system will also be reduced.

The goal is the same as Section II, to minimize the system MSE. To investigate the scheduling policy on each controller, we assume the system has $N=2$ sensors, and both sensors can transmit their status updates to server in a time slot using the orthogonal channel. Furthermore, this problem can be extended to general situation. In addition, the task processing time and uplink and downlink delay of the server are the same as the remote server in the system model.

The state space is defined as $\mathbf{S}\triangleq(\mathbf{\tau},\mathbf{T_{1}},\mathbf{T_{2}},E_{t},I_{t})$ which consists all possible states.

$\tau\triangleq(\tau_{1},\tau_{2})$ which records the AoI of each process. For different process $k$ , we defined $\tau_{k}\triangleq\{1,2,\cdots,\hat{\tau}_{k}\}$ as the AoI at the beginning of slot $t$ . We set $\hat{\tau}_{k}$ be the upper limits of the AoI for process $k$ .

$\mathbf{T_{1}}\triangleq(I_{1},d_{1},E_{1})$ , these parameters characterize the processing task. $I_{1}\in\{0,1,2\}$ represents the index of the process in which status is computing on controller and $0$ represents no task is processing. $d_{1}$ records the number of slots that are left to computing. When the task is preempted, the number of time slots in the controller will increase, and $E_{1}$ records the increasing time slot length of processing tasks.

$\mathbf{T_{2}}\triangleq(I_{2},d_{2},E_{2})$ , these parameters characterize the preempted task. $I_{2}\in\{0,1,2\}$ represents the index of the preempted task and $0$ represents no task is processing. $d_{2}$ records the number of slots that are left to computing. $E_{2}$ records the increasing time slot length of preempted tasks.

$I_{t}$ and $E_{t}$ characterize the task which has been processed completed. $I_{t}\in\{0,1,2\}$ is the index of the process in which the control command is being transmitted to actuator, and $0$ represents no task is being transmitted. Moreover, $E_{t}$ records the increasing time slot length of the process in which the control command is being transmitted.

The action $a$ is in the action space $A=\{0,1,2\}$ , where $a=0$ means do not preempt and $a=k\ (k\neq 0)$ means the status of process $k$ preempt the processing task. Note that, the current task will be canceled when $a=k$ and the processing task is from process $k$ .Because the current task is obsolete and will not reduced the MSE after completion under this situation.

At the $t$ -th time slot, we assume the system state is $s(t)=(\mathbf{\tau}(t),\mathbf{T_{1}}(t),\mathbf{T_{2}}(t),E_{t}(t),I_{t}(t))$ where $\mathbf{\tau}(t)=(\tau_{1}(t),\tau_{2}(t))$ , $T_{1}(t)=(I_{1}(t),d_{1}(t),E_{1}(t))$ and $T_{2}(t)=(I_{2}(t),d_{2}(t),E_{2}(t))$ . The state transition formula is given below, and the transition will be written in the order of the success and failure of the downlink channel transmission.

When the downlink transmission is successful, we have:

\tau_{k}(t+1)=\left\{\begin{array}[]{ll}\Delta_{r}^{\uparrow}+\Delta_{r}^{\downarrow}+T_{r,k}+E_{t}(t),&\text{ if }\ I_{t}(t)=k,\\ \tau_{k}(t)+1,&\text{ otherwise. }\end{array}\right.

(20)

\begin{array}[]{l}T_{1}(t+1)=\\ \left\{\begin{array}[]{ll}\{T_{r,k}-1,k,0\},&\text{ if }a=k(k\neq 0),\\ \{d_{2}(t),I_{2}(t),E_{2}(t)\},&\text{ if }a=0,\ d_{1}=1,\\ \{\max(d_{2}(t)-1,0),I_{2}(t),E_{2}(t)\},&\text{ if }a=0,\ d_{1}=0,\\ \{d_{1}(t)-1,1(t),E_{1}(t)\},&\text{ otherwise. }\end{array}\right.\end{array}

(21)

\begin{array}[]{l}T_{2}(t+1)=\\ \left\{\begin{array}[]{ll}\{d_{1}(t),I_{1},E_{1}(t)+T_{r,k}\},&\text{ if }a=k\ (k\neq 0),\\ \{0,0,0\},&\text{ if }a=0\ \text{and}\ d_{1}=1,\\ \{0,0,0\},&\text{ if }a=0\ \text{and}\ d_{1}=0,\\ \{d_{2}(t),I_{2}(t),E_{2}(t)\},&\text{ otherwise. }\end{array}\right.\end{array}

(22)

(I_{t}(t+1),E_{t}(t+1))=\left\{\begin{array}[]{ll}(I_{1}(t),E_{1}(t)),&\text{ if }\ I_{t}(t)=k\\ (0,0),&\text{ Otherwise }\end{array}\right.

(23)

When the downlink transmission is failed, the only difference with the success situation is in (20). The rest part $T_{1}$ , $T_{2}$ , $I_{t}$ and $E_{t}$ are the same as (21) (22) (23). Therefore, it will not be described again. Furthermore, the dynamics of the system AoI in (20) is changed to:

\tau_{k}(t+1)=\tau_{k}(t)+1,

(24)

The characteristics of state cost are the same as those in Section II. The one-stage cost only depends on the current state and is defined as:

c(s,a)\triangleq\sum_{k=1}^{N}Q_{k}=\sum_{k=1}^{N}\sum_{i=1}^{\tau_{k}}(A_{k})^{(i-1)}R_{k}(A_{k}^{T})^{(i-1)}

(25)

For the preemption model, it has consistency properties. Theorem 3 Consistency. If $a^{*}(t)=\pi^{*}(s)=3$ , then $a^{*}(t:t+T_{r,i}-1)=3$ where $a^{*}(t:t+T_{r,i}-1)=3$ denotes $a^{*}(t+1)=a^{*}(t+2)=\cdots=a^{*}(t+T_{r,i}-1)=3$ and $i$ is the index of status which is calculating on the server.

Proof: See in Appendix C

In other words, Theorem 3 shows that the process status will not be preempted until it is processed completely when the server takes the optimal action $a^{*}(t)=\pi^{*}(s)=i,i=1,2$ successfully.

V SIMULATION RESULT AND ANALYSIS

In this section, we give some parameter settings in simulation and present the numerical result. Besides, the structure of the optimal policies in Section III and IV can verify the numerical result. In addition, we consider a greedy baseline policy for comparison, in which the top two processes with the highest MSE can transmit their status to the controller and the highest one has the priority to choose the controller first.

V-A Setting of system parameters

In the simulation, we consider four processes and their dimensions decrease in turn:

\centering{\begin{matrix}A_{1}=\begin{bmatrix}1.02&1&0&1\\ 0&1&0.2&0\\ 0&0&1&0\\ 0&0&0&1.01\end{bmatrix}&A_{2}=\begin{bmatrix}1&1&0\\ 0&1.02&0\\ 0&0&1\end{bmatrix}\end{matrix}}\@add@centering

\centering{\begin{matrix}A_{3}=\begin{bmatrix}1.02&1\\ 0&1\end{bmatrix}&A_{4}=1.02\end{matrix}}\@add@centering

where $A_{k}$ , $k=1,2,3,4$ is the state transmition matrix in (1). As for the noise in (1), like most previous studies, $R_{k}$ is set to an identity matrix where $k=1,2,3,4$ .

The controller parameter settings in the simulation are shown in Table 1. Where the function $size(.)$ can obtain the dimension of independent variable, e.g., $size(A_{1})=4$ .

o 0.4X[2,c]\|X[3,c]\|X[2,c]\|X[3,c] Parameter	Value	Parameter	Value
$\Delta^{\uparrow}_{r}$	1	$\Delta^{\uparrow}_{e}$	0
$\Delta^{\downarrow}_{r}$	1	$\Delta^{\downarrow}_{r}$	0
$T_{r,k}$	size( $A_{k}$ )	$T_{e,k}$	$2*\text{size}(A_{k})$

$\displaystyle V_{n+1}\left(\boldsymbol{s}^{1}\right)=$	$\displaystyle J_{n+1}\left(\boldsymbol{s}^{1},\pi_{n}^{}\left(\boldsymbol{s}^{1}\right)\right)-J_{n+1}\left(\boldsymbol{s}^{\dagger},\pi_{n}^{}\left(\boldsymbol{s}^{\dagger}\right)\right)$	(31)
$\displaystyle\stackrel{{\scriptstyle(a)}}{{\leq}}$	$\displaystyle J_{n+1}\left(\boldsymbol{s}^{1},\pi_{n}^{}\left(\boldsymbol{s}^{2}\right)\right)-J_{n+1}\left(\boldsymbol{s}^{\dagger},\pi_{n}^{}\left(\boldsymbol{s}^{\dagger}\right)\right)$
$\displaystyle=$	$\displaystyle\sum_{k}Q_{k}^{1}+\sum_{\boldsymbol{s}^{\prime}\in s}\operatorname{Pr}\left[\boldsymbol{s}^{1^{\prime}}\mid\boldsymbol{s}^{1},\pi_{n}^{*}\left(\boldsymbol{s}^{2}\right)\right]V\left(\boldsymbol{s}^{1^{\prime}}\right)$
	$\displaystyle-J_{n+1}\left(\boldsymbol{s}^{\dagger},\pi_{n}^{*}\left(\boldsymbol{s}^{\dagger}\right)\right),$

$\displaystyle V_{n+1}\left(\boldsymbol{s}^{2}\right)=$	$\displaystyle J_{n+1}\left(\boldsymbol{s}^{2},\pi_{n}^{}\left(\boldsymbol{s}^{2}\right)\right)-J_{n+1}\left(\boldsymbol{s}^{\dagger},\pi_{n}^{}\left(\boldsymbol{s}^{\dagger}\right)\right)$	(32)
$\displaystyle=$	$\displaystyle\sum_{k}Q_{k}^{2}+\sum_{\boldsymbol{s}^{\prime}\in\mathcal{s}}\operatorname{Pr}\left[\boldsymbol{s}^{2^{\prime}}\mid\boldsymbol{s}^{2},\pi_{n}^{*}\left(\boldsymbol{s}^{2}\right)\right]V\left(\boldsymbol{s}^{2^{\prime}}\right)$
	$\displaystyle-J_{n+1}\left(\boldsymbol{s}^{\dagger},\pi_{n}^{*}\left(\boldsymbol{s}^{\dagger}\right)\right).$

	$\displaystyle c(\tau_{1},\tau_{2},\tau_{3})+$	$\displaystyle u(\tau+\boldsymbol{1},C_{r}^{1},C_{e}^{2})$		(37)
		$\displaystyle\leq c(\tau_{1},\tau_{2},\tau_{3})+u(\tau+\boldsymbol{1},C_{r}^{2},C_{e}^{3}),$		(37)

	$\displaystyle u(\tau^{\prime}$	$\displaystyle,C_{r}^{1},C_{e}^{2})$		(39)
		$\displaystyle>c(\tau_{1}+z,\tau_{2}+z,\tau_{3})+u(\tau^{\prime}+\boldsymbol{1},C_{r}^{2},C_{e}^{3}),$		(39)

On the Scheduling Policy for Multi-process WNCS under Edge Computing ^†^†thanks: Y. Qiu, S. Wu and Y. Wang are with the Harbin Institute of Technology, Shenzhen, Guangdong, 518055, China. E-mail: [email protected], [email protected], [email protected]

Abstract

Index Terms:

I Introduction