Wireless Human-Machine Collaboration in Industry 5.0
Abstract
Wireless Human-Machine Collaboration (WHMC) represents a critical advancement for Industry 5.0, enabling seamless interaction between humans and machines across geographically distributed systems. As the WHMC systems become increasingly important for achieving complex collaborative control tasks, ensuring their stability is essential for practical deployment and long-term operation. Stability analysis certifies how the closed-loop system will behave under model randomness, which is essential for systems operating with wireless communications. However, the fundamental stability analysis of the WHMC systems remains an unexplored challenge due to the intricate interplay between the stochastic nature of wireless communications, dynamic human operations, and the inherent complexities of control system dynamics. This paper establishes a fundamental WHMC model incorporating dual wireless loops for machine and human control. Our framework accounts for practical factors such as short-packet transmissions, fading channels, and advanced HARQ schemes. We model human control lag as a Markov process, which is crucial for capturing the stochastic nature of human interactions. Building on this model, we propose a stochastic cycle-cost-based approach to derive a stability condition for the WHMC system, expressed in terms of wireless channel statistics, human dynamics, and control parameters. Our findings are validated through extensive numerical simulations and a proof-of-concept experiment, where we developed and tested a novel wireless collaborative cart-pole control system. The results confirm the effectiveness of our approach and provide a robust framework for future research on WHMC systems in more complex environments.
Index Terms:
Wireless control, Industry 5.0, Human-machine collaboration, Stability analysis.
I Introduction
The Fourth Industrial Revolution, known as Industry 4.0, envisions significantly increased automation and mechanization in manufacturing, driven by rapidly advancing cyber-physical systems (CPS) with minimal human intervention on the factory floor [1]. However, many dynamically changing and unforeseen control tasks in manufacturing, such as reconfiguring the production line, are challenging for autonomous machines to handle alone [2]. Therefore, humans are reintroduced to the manufacturing process to collaborate with machines in the fifth industrial revolution, Industry 5.0 [3]. In the Industry 5.0 era, human-machine collaboration (HMC) emerges as a key enabling technology to boost productivity, efficiency, and sustainability by combining human’s creativity, cognitive ability, and dexterity with machine’s strength, precision, and speed [3]. Future wireless communications, e.g., 6G, will be essential to provide high-performance connectivity for humans, machines (including robots), autonomous controllers, and ubiquitous sensors, enabling the flexible, scalable, and low-cost deployment of geographically distributed HMC systems [4]. Integrating wireless capabilities within an HMC system will unlock the full potential of human-machine collaboration in Industry 5.0, offering unprecedented flexibility and scalability. This wireless HMC (WHMC) framework will serve as the backbone for seamlessly connecting humans, machines, and sensors across geographically distributed environments, enabling real-time collaboration and decision-making.
The main application of WHMC is in collaborative control, where humans and autonomous controllers work together to achieve shared objectives [5]. WHMC systems enable seamless coordination between human operators and machines, enhancing the efficiency of control tasks. Existing research on WHMC has focused on applied areas such as teleoperation [6], driver assistance systems [7], and human-machine interaction [8], including scenarios where robots anticipate human intentions and assist in tasks like tool-passing during assembly [5]. While these efforts have led to successful implementations in specific domains, they often lack the foundational modeling and theoretical analysis needed for broader application [9, 10].
In a WHMC system, stability is a fundamental property that determines whether the controlled states will converge to a steady state and remain bounded under given collaborations. Stability analysis is essential for certifying that the closed-loop system will perform safely and effectively, even in the face of human-and-network-induced challenges like random delays and packet loss [11]. However, the fundamental theories and analytical tools for designing a WHMC system with guaranteed stability are scarce, as this research area is relatively new. Analyzing the stability of a WHMC system presents unique challenges, as it is determined by three tightly coupled domains: wireless communication, human behavior, and control dynamics. Whilst the dynamical properties of individual components are well understood, the stability condition of WHMC systems has yet to be thoroughly investigated.
I-A Related Work
Establishing fundamental theories is important for guiding the systematic design of a desired WHMC system. Researchers have extensively explored theoretical aspects, such as human control modeling, human characteristics modeling, system stability analysis, and wireless networked control.
I-A1 Human control modeling
The primary goal of human control modeling is to mathematically represent how humans perform tasks, enabling machines to understand and adapt to human control policies. This modeling is essential for the design of high-performance machine control systems that can effectively collaborate with human operators. Researchers have proposed various methods to model human control policies. For example, the human operator is often modeled as a classical machine controller, such as linear feedback controller[12, 13], proportional-integral-feedback controller[14], and impedance controller[15]. The human control behavior can also be modeled using the crossover-reference model with time-invariant dynamics [16], where human operators are characterized as an open-loop transfer function. In addition to the above deterministic models, researchers have also proposed several probabilistic models, such as hidden Markov models (HMMs) [17], partially observable Markov decision processes (POMDPs) [18] and Markov decision processes (MDPs) [19]. Despite significant progress in human control modeling, accurately formulating human control policies mathematically remains a long-lasting unsolved challenge.
I-A2 Human characteristics modeling
Human characteristics modeling aims to represent the stochastic human traits that implicitly influence the delivery and accuracy of control commands generated by the human decision-making process. These time-varying characteristics include operator workload, proficiency, fatigue, and control lag. For example, the operator’s workload can be modeled as a uniform distribution over a binary state set of high and low workload [19]. The operator’s fatigue can be modeled as a binary state set (awake or sleepy) with a certain distribution [18]. However, these works model human characteristics as independent and identically distributed random states, whereas human characteristics are commonly time-correlated. To capture this time correlation, many works model human characteristics as a Markov process [20] and adopt HMMs to infer human characteristics from recorded temporal data [17, 21]. Although using a Markov process to model time-varying human characteristics has garnered significant attention, its application to modeling human control lag has received less attention. (Footnote 1: Modeling the human characteristics that impact the accuracy of human control commands relies on a precise formulation of human control policies, which is a long-standing unsolved challenge and beyond the scope of our current work [22]. For a specific collaboration task, the control policy of a human operator commonly remains unchanged in the short term. Thus, we focus on the human control lag, which influences the delivery of human control commands and impacts the collaborative control performance.) The impact of such stochastic human control lag on system performance remains underexplored.
I-A3 System stability analysis
Stability analysis is crucial for designing a WHMC system that operates efficiently and safely. Effective stability analysis requires tractable modeling of the WHMC system. However, most works focus on the fundamental stability analysis of simplified WHMC systems [12, 14, 15, 23, 16]. Under these simplifications, they apply classical analytical frameworks to enable optimal control with a stability guarantee in specific applications, such as irrigation canals [14], robotic exoskeletons [15], collaborative driving [23], and collaborative piloting [16]. As a result, the methodology of most existing works on stability analysis is tied to specific control applications, which limits the generality of their analytical frameworks. In addition, these systems do not integrate wireless communication links. Tractable mathematical modeling of advanced WHMC systems that integrate wireless communication links, with the aim of establishing a stability condition, remains an unsolved problem.
I-A4 Wireless networked control
Wireless networked control involves integrating autonomous control systems with wireless communication networks. It primarily focuses on establishing systematic theories related to the stability and optimization of state estimation and automatic control over wireless networks [24, 25, 26]. Existing research has largely concentrated on developing optimal control algorithms that address the challenges posed by imperfect wireless communication channels, such as errors and delays [27, 28]. Some studies investigate the impact of communication protocols and parameters on the stability of automatic control systems [29, 30]. WHMC extends wireless networked control by incorporating human intelligence into the control loop, enhancing system adaptability and performance. While traditional wireless networked control focuses on how communication systems affect control stability, it does not account for the complexities introduced by human operators. Consequently, existing methods in wireless networked control are insufficient for WHMC systems, which require new approaches to address the challenges posed by integrating human factors into the control process.
I-B Motivation
A WHMC system is significantly more complex than a conventional control system. This complexity arises from the integration of wireless human control loops, the need for collaborative control, and the challenge of addressing time-varying and unforeseen tasks. In a WHMC system, the wireless communication links, the human operator, and the automatic machine controller collectively work to achieve dynamic control objectives under stringent stability constraints. This creates a novel networked topology with tightly coupled wireless human and machine control loops. The system’s stability and performance are critically influenced by three factors: wireless communication errors and delays, the stochastic nature of human behavior, and the dynamics of the physical system under control. We name these three factors as the “three-level dynamics”. Addressing these factors presents a unique challenge in modeling and stability analysis for WHMC systems. To date, the impact of these dynamics on WHMC system stability has not been investigated at all.
Fundamental modeling and analysis of a WHMC system, which features a substantially different control model, requires addressing the following fundamental questions:
1. How can we achieve tractable mathematical modeling of a WHMC system that effectively captures the three-level dynamics?
2. How can we establish an analytical framework for stability analysis when an accurate mathematical model of the human control policy is unavailable?
3. What are the primary conditions within the three-level dynamics that enable the stable operation of a WHMC system?
I-C Contributions
In this work, we address the fundamental questions outlined above, and our novel contributions are summarized below.
1. Novel tractable modeling of the WHMC system. For the first time, we propose a WHMC model that consists of dual wireless loops, i.e., the machine control loop and the human control loop. In particular, we take into account practical wireless communication factors such as short-packet communications, fading channel models, and advanced hybrid automatic repeat request (HARQ) schemes for wireless sensors-controller-actuator transmission (referred to as the machine control loop) and sensors-human-actuator transmission (referred to as the human control loop). Unlike most existing HMC studies, which typically overlook the temporal variability and stochastic nature of human interactions, we model the dynamics of human control lag as a Markov process.
2. Stability analysis of the WHMC system. Leveraging the proposed system model, we introduce a novel cycle-cost-based approach to derive, for the first time, a sufficient condition for the stochastic stability of the WHMC system. This stability condition is expressed in terms of wireless channel statistics, human state dynamics, and control system parameters. We thoroughly investigate the structural properties and special cases of the derived stability condition, providing comprehensive analysis and numerical illustrations.
3. Proof-of-concept experiment for the proposed WHMC system. To demonstrate the advantages of WHMC and validate the developed fundamental theories and analytical tools, we conduct a proof-of-concept experiment. Specifically, we develop and evaluate a wireless collaborative cart-pole control system in terms of control performance and system stability. The experiment confirms the practicality of our approach and validates the theoretical framework established in contributions 1) and 2).
Outline. The proposed model of the WHMC system is described in Section II. The stability analysis is presented in Section III. A proof-of-concept experiment is demonstrated in Section IV, followed by conclusions in Section V.
Notations. Matrices and vectors are denoted by capital and lowercase upright bold letters, e.g., $\mathbf{A}$ and $\mathbf{a}$, respectively. $\|\mathbf{a}\|$ is the Euclidean norm of vector $\mathbf{a}$. $\mathbb{E}[\cdot]$ is the expectation operator. $[\mathbf{A}]_{i,j}$ denotes the element at the $i$-th row and $j$-th column of a matrix $\mathbf{A}$. $(\cdot)^{\top}$ is the vector or matrix transpose operator. $\mathbb{R}$ and $\mathbb{N}$ denote the sets of real numbers and positive integers, respectively. $\mathbb{N}_0$ denotes the non-negative integers.
II WHMC System
II-A Control System Dynamics
We consider a WHMC system consisting of a dynamic plant, two actuators, an autonomous controller (i.e., a machine), and a human operator, as shown in Fig. 1. The sensors attached to the plant send state measurements to the remote controller and the human operator. These two agents then send their individual control signals to the corresponding actuators in order to complete a collaborative control task on the plant. All sensing and control information is exchanged via four wireless links: sensor-human (SH) uplink, sensor-controller (SC) uplink, human-actuator (HA) downlink, and controller-actuator (CA) downlink. Such a system model has two types of control loops, i.e., the machine control loop and the human control loop. It is abstracted from existing visions of HMC systems, e.g., homecare robotic systems for Healthcare 4.0 [31], factory edge robotic systems for Industry 5.0 [4], collaborative surgery in healthcare [32], collaborative piloting in aviation [16], and collaborative driving in a vehicle [23]. These systems require a human operator to control an actuator as well as collaborate with other machine-controlled actuators.
Having two loops in parallel allows one to clearly distinguish between human and machine contributions and enables individual analysis of each loop’s dynamics and their interactions. Our model can also adjust the degree of influence each loop has, allowing for a spectrum of control schemes, such as human-in-the-loop, supervisory, and shared control. (Footnote 2: For example, if the time period of a human control loop is far longer than that of a machine control loop, our model becomes supervisory control, where the machine is predominantly responsive. If the time period of a machine control loop is longer than that of a human control loop, it can be seen as human-in-the-loop control, where the human operator is predominantly responsive. If the time period of a machine control loop is close to that of a human control loop, our model encompasses shared control, where both the human operator and the machine contribute significantly.)
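The plant dynamics are modeled as a nonlinear discrete-time, time-invariant system
$$\mathbf{x}(t+1) = f\big(\mathbf{x}(t), \mathbf{u}_{\mathrm{h}}(t), \mathbf{u}_{\mathrm{m}}(t), \mathbf{w}(t)\big), \tag{1}$$
where $t \in \mathbb{N}_0$ is the time index given the sampling period $T_s$; $\mathbf{x}(t)$ is the plant state vector at time $t$; $\mathbf{u}_{\mathrm{h}}(t)$ and $\mathbf{u}_{\mathrm{m}}(t)$ are the corresponding human control input and machine control input, respectively; $\mathbf{w}(t)$ is the plant disturbance. The control algorithms for generating control inputs will be presented later in this section.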
II-B Wireless Control Loops
The temporal operation of the two control loops is shown in Fig. 2. We assume block Rayleigh fading channels, where the channel characteristics remain constant during each time slot but change independently from one time slot to the next.
II-B1 Machine control loops
Each machine control loop takes a single time step, i.e., the period of a machine control loop is , and consists of a pair of SC uplink and CA downlink transmissions. If the SC packet is not detected successfully, there is no CA transmission scheduled as the controller has no instantaneous plant state information. We consider short-packet transmissions for low-latency communications [33]. The computation time for generating a control signal is usually much shorter than the transmission delay and thus is omitted [34, 35]. A machine control loop is closed only when both the SC and CA transmissions within it are successful.
II-B2 Human control loops
Each human control loop period is delineated by an HA downlink transmission, as illustrated in Fig. 2. A human control loop contains multiple SH uplink transmissions, a human control procedure, and an HA downlink transmission. A downlink transmission attempt marks the end of one human control loop period and the beginning of the next. Each period starts with a new packet transmission from the sensors, which contains the current plant state measurement. If the transmission fails, a retransmission takes place using a HARQ protocol. If the given maximum number of retransmissions is reached, a new transmission is triggered. (Footnote 3: Unlike machine control loops, the lag of human control, which captures the delay in human decision-making, is significantly longer than the transmission delay [36]. This results in frequent machine control actions and infrequent human interventions. In contexts where the lag of human control is substantial, the transmission delay becomes relatively insignificant. Consequently, retransmissions are used to improve transmission reliability, as a longer transmission delay caused by retransmissions does not notably affect the overall human control process.) The human operator generates a control command after successfully receiving a packet, and then sends the command to the actuator. There is no retransmission for the HA and CA downlinks, since retransmissions lead to unpredictable delays, making the generated time-sensitive control command useless. Let denote the human control loop index. Then, the transmission delays for the SH and HA transmissions are and , respectively, and the lag of human control is . In particular, is modeled as a finite-state Markov chain with a transition probability matrix , where . A shorter lag of human control leads to better control performance. The stationary distribution of is given as
(2) |
We assume each transmission in a human control loop takes one time step because human-type communication generally requires a larger packet length than machine-type communication [37]. Considering the random period of each human control loop, we define as the starting time slot of the -th human control loop. The human control loop is closed once the HA transmission is successful.
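To illustrate the Markov lag model, the following minimal sketch computes the stationary distribution of a finite-state chain by power iteration. The three lag states, their durations in time steps (`LAG_STEPS`), and the transition matrix `P_LAG` are hypothetical values chosen for the example; they are not parameters taken from this paper.

```python
def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Power iteration for pi = pi P on a row-stochastic matrix P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical lag chain: short, medium, and long human control lag.
P_LAG = [[0.8, 0.2, 0.0],
         [0.1, 0.8, 0.1],
         [0.0, 0.3, 0.7]]
LAG_STEPS = [2, 4, 6]  # assumed lag durations, in time steps

pi = stationary_distribution(P_LAG)
# Long-run average lag under the stationary distribution.
mean_lag = sum(p * d for p, d in zip(pi, LAG_STEPS))
```

A chain that spends more stationary mass in the long-lag state yields a larger `mean_lag`, which is one simple way to compare operator profiles.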
II-C Control Algorithms
Due to packet detection errors, the sensor’s packet for the remote controller may not be received by the remote controller, and the machine control input may not be received by the actuator at every time step. Let the binary variables denote the transmission success and failure of the corresponding channel at , respectively. The machine control input at is given as
(3) |
where is the machine control policy. Hence, only a pair of successful uplink and downlink transmissions can generate an effective control input, closing the machine control loop.
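The gating of the machine control input by the two link outcomes can be sketched as follows. This is an illustration under two assumptions that are not stated in the text: the actuator applies zero input whenever the loop is open, and the SC and CA outcomes are independent. The proportional `policy` is a hypothetical placeholder.

```python
def machine_control_input(x, sc_ok, ca_ok, policy):
    """Input applied by the actuator at one time step.

    Assumption: if either the SC uplink or the CA downlink fails,
    the loop is open and the actuator applies zero input.
    """
    if sc_ok and ca_ok:
        return policy(x)
    return 0.0

def loop_closed_probability(p_sc_ok, p_ca_ok):
    """Probability that the machine loop closes, assuming independent links."""
    return p_sc_ok * p_ca_ok

# Hypothetical proportional policy for a scalar plant state.
policy = lambda x: -0.5 * x

u_closed = machine_control_input(2.0, True, True, policy)
u_open = machine_control_input(2.0, True, False, policy)
```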
From the definition of human control loops, a human control input can only be available at the beginning of each control loop. Considering the random delay of SH transmissions and human decision-making, the human control input at is
(4) |
where is the human control policy and
(5) |
As an accurate model of the human control policy is unavailable, we propose an analytical framework for stability analysis without using specific control policies of human and machine, but using their control significance in the next section.
III Stability Analysis
From (3)–(5), we see that the two control loops can be either open or closed due to the packet loss and delays, which may cause instability of the WHMC system. In this section, we derive the stability condition of the proposed WHMC system by taking into account the randomness in wireless communications and human decision-making. Since only closed control loops generate effective control inputs that regulate plant state and affect stability, we analyze the statistics of the stochastic closed (and open) control loop first.
III-A Stochastic Control Loop Analysis
III-A1 Open loop probabilities of human and machine control
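Let $\gamma_{\mathrm{HA}}$, $\gamma_{\mathrm{SH}}$, $\gamma_{\mathrm{CA}}$, and $\gamma_{\mathrm{SC}}$ denote the signal-to-noise ratio (SNR) of received packets in the HA, SH, CA, and SC channels, respectively. Given the packet length $l$ (i.e., the number of symbols per packet), the number of data bits $b$ in the packet, and the SNR $\gamma$ of the packet, the decoding error probability of a packet is approximated as [38]
$$\varepsilon(l, b, \gamma) \approx Q\left(\frac{C(\gamma) - b/l}{\sqrt{V(\gamma)/l}}\right), \tag{6}$$
where $C(\gamma) = \log_2(1+\gamma)$ and $V(\gamma) = \left(1 - (1+\gamma)^{-2}\right)(\log_2 e)^2$ are the Shannon capacity and the channel dispersion, respectively, and $Q(\cdot)$ is the Gaussian Q-function.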
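The short-packet error probability can be evaluated numerically with the standard normal approximation, in which the rate gap to capacity is scaled by the channel dispersion. The sketch below assumes the usual real-valued forms, capacity `log2(1 + snr)` and dispersion `(1 - (1 + snr)^-2) * (log2 e)^2`; the function and argument names are illustrative.

```python
import math

def q_function(x):
    """Gaussian Q-function, Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def packet_error_prob(snr, n_symbols, n_bits):
    """Normal approximation of the decoding error probability of a short packet.

    snr is linear (not dB); n_symbols is the packet length; n_bits is the payload.
    """
    if snr <= 0.0:
        return 1.0  # no signal: treat decoding as certain to fail
    capacity = math.log2(1.0 + snr)
    dispersion = (1.0 - (1.0 + snr) ** -2) * (math.log2(math.e) ** 2)
    rate = n_bits / n_symbols
    return q_function((capacity - rate) / math.sqrt(dispersion / n_symbols))
```

When the coding rate equals capacity the approximation gives exactly one half, and the error probability falls rapidly as the SNR rises above that point.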
The probability of the machine control operating in an open loop at time can be obtained as
(7) | ||||
The expectation of (7) with respect to and is denoted as , and can be obtained by
(8) |
Since each human control loop contains a successful SH packet, the probability of an open human control loop only depends on the HA transmission and is given by
(9) |
The expectation of (9) with respect to is denoted as , and can be obtained by
(10) |
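The averaging over fading can be sketched numerically. Under Rayleigh fading the received SNR is exponentially distributed, so the expected error probability is an integral of the per-SNR error against an exponential density. The example below uses a midpoint rule and, purely as a stand-in for the short-packet error expression, a hypothetical step ("outage") error model whose average has a closed form. It also assumes the SC and CA channels fade independently, so the expected open-loop probability factorizes. All mean SNRs and thresholds are made-up values.

```python
import math

def average_over_rayleigh(error_fn, mean_snr, grid=20000, span=40.0):
    """Midpoint-rule estimate of E[error_fn(snr)] for exponentially distributed snr."""
    upper = span * mean_snr  # truncation point; the neglected tail mass is exp(-span)
    dx = upper / grid
    total = 0.0
    for i in range(grid):
        x = (i + 0.5) * dx
        total += error_fn(x) * math.exp(-x / mean_snr) / mean_snr
    return total * dx

def open_loop_machine(err_sc, err_ca):
    """Open machine loop: at least one of the SC and CA transmissions fails."""
    return 1.0 - (1.0 - err_sc) * (1.0 - err_ca)

def outage_error(threshold):
    """Stand-in error model: decoding fails iff the SNR is below the threshold."""
    return lambda snr: 1.0 if snr < threshold else 0.0

err_sc = average_over_rayleigh(outage_error(3.0), mean_snr=10.0)
err_ca = average_over_rayleigh(outage_error(3.0), mean_snr=20.0)
err_ha = average_over_rayleigh(outage_error(3.0), mean_snr=15.0)

p_open_machine = open_loop_machine(err_sc, err_ca)
p_open_human = err_ha  # the human loop is open iff the HA transmission fails
```

Any per-SNR error function can be passed as `error_fn` in place of the step model.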
III-A2 Distribution of the SH delay
The duration of a human control loop in (5) includes the SH channel delay , human control lag , and the HA channel delay . The HA channel delay is constant across human control loops, while the human control lag is time-correlated across human control loops due to the Markovian property. The SH channel delay is attributed to HARQ and is i.i.d. across all human control loops. We analyze the distribution of the SH channel delay before proceeding to the distribution of the number of consecutive time steps in which the human control loop is open. We consider three types of HARQ schemes for the SH channel: Type I HARQ (TI-HARQ), Chase Combining HARQ (CC-HARQ), and Incremental Redundancy HARQ (IR-HARQ). (Footnote 4: In TI-HARQ, the same packet is sent in every re/transmission, and all erroneously decoded packets are discarded at the receiver side. All decoding attempts during re/transmissions of the packet are independent. In CC-HARQ, all erroneously decoded packets from previous re/transmissions are saved and their signals are combined into a single strengthened signal for decoding. In IR-HARQ, the packet in each re/transmission is a punctured version of a low-rate mother packet. If errors occur, only additional redundancy for the previously undecodable packets is retransmitted. The newly received redundancy is combined with the previously received packets to construct a longer packet for decoding.)
The number of re/transmission attempts is . Let denote the set of experienced SNRs during re/transmission attempts, that is
(11) |
The decoding error probability of the packet after re/transmission attempts is an expectation over (11), and can be approximated as [39, 40, 41]
(12) | ||||
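To facilitate our subsequent analysis, we assume that all packets have the same length $l$. For CC-HARQ, since the channel gain is exponentially distributed, the accumulated SNR over $k$ re/transmission attempts is gamma distributed with the probability density function [42]
$$f_k(x) = \frac{x^{k-1} e^{-x/\bar{\gamma}}}{\bar{\gamma}^{k} (k-1)!}, \quad x \geq 0, \tag{13}$$
where $\bar{\gamma}$ is the mean of the exponential distribution. Thus, for TI- and CC-HARQ, the decoding error probability after $k$ attempts is obtained by leveraging (6), (12), and (13). For IR-HARQ, it can be determined by Monte Carlo simulations.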
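For Chase combining, the accumulated SNR after k attempts over i.i.d. Rayleigh-faded slots is a sum of k exponential variables, hence gamma distributed with shape k. The sketch below evaluates that density and averages an error model against it. As a simplification that is not taken from the paper, decoding failure is modeled by a hypothetical SNR threshold, which makes the result checkable against the gamma distribution function in closed form.

```python
import math

def gamma_pdf(x, k, mean_snr):
    """Density of the sum of k i.i.d. exponential SNRs, each with mean mean_snr."""
    return x ** (k - 1) * math.exp(-x / mean_snr) / (mean_snr ** k * math.factorial(k - 1))

def average_over_pdf(fn, pdf, upper, grid=20000):
    """Midpoint-rule estimate of the integral of fn(x) * pdf(x) over [0, upper]."""
    dx = upper / grid
    return sum(fn((i + 0.5) * dx) * pdf((i + 0.5) * dx) for i in range(grid)) * dx

def cc_failure_prob(k, mean_snr, threshold):
    """Failure probability after k combined attempts under a threshold error model."""
    outage = lambda snr: 1.0 if snr < threshold else 0.0
    return average_over_pdf(outage, lambda x: gamma_pdf(x, k, mean_snr),
                            upper=60.0 * mean_snr)

# Hypothetical numbers: mean SNR 5 per attempt, decoding threshold 3.
fail_1 = cc_failure_prob(1, 5.0, 3.0)
fail_2 = cc_failure_prob(2, 5.0, 3.0)
```

Combining a second attempt shifts probability mass toward higher accumulated SNR, so `fail_2` is well below `fail_1`.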
(14) |
The delay induced by the SH transmission period is . Note that the SH transmission period may contain multiple -trials of the retransmission process as described in Section II-B2, and the number of experienced -trials is . The probability distribution of is then given in (14). When , the SH transmission is successful in the first -trials. In this case, if , indicates that the first trials have failed and the -th transmission attempt is successful. When , the SH transmission is successful in the th -trials, while the former th -trials are decoded erroneously.
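One reading of the description above can be sketched as follows, under stated assumptions: each attempt occupies one time step, a round allows at most K attempts, rounds are independent, and `fail_after[k-1]` is the probability that decoding still fails after k attempts within a round. A delay of d steps then means that some number of complete rounds failed and the packet was decoded at attempt k of the next round. The function name and the numbers are hypothetical, and this is an interpretation of the text rather than a transcription of (14).

```python
def sh_delay_pmf(fail_after, max_delay):
    """PMF of the SH delay (in time steps) under round-based HARQ.

    fail_after[k-1]: probability that decoding still fails after k attempts
    in one round, for k = 1..K. It should be non-increasing in k.
    """
    K = len(fail_after)
    p = [1.0] + list(fail_after)  # p[0] = 1: nothing decoded before any attempt
    pmf = {}
    for d in range(1, max_delay + 1):
        rounds_failed, k = divmod(d - 1, K)
        k += 1  # position of the successful attempt within its round
        # All earlier rounds fail (p[K] each); then success exactly at attempt k.
        pmf[d] = (p[K] ** rounds_failed) * (p[k - 1] - p[k])
    return pmf

# Hypothetical failure probabilities after 1, 2, and 3 attempts in a round.
pmf = sh_delay_pmf([0.5, 0.2, 0.05], max_delay=200)
mean_delay = sum(d * q for d, q in pmf.items())
```

For TI-HARQ with independent attempts, `fail_after` is a geometric sequence and the delay reduces to a plain geometric distribution.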
III-A3 Time interval distribution between consecutive closed human control loops
We denote the starting time of the th closed human control loop as , as shown in Fig. 3. Let and denote time steps and the numbers of (open or closed) human control loops between and , respectively, i.e.,
(15) |
where is the index number of the th closed human control loop among all the loops. The probability distribution of in (15) can be expressed as
(16) |
The probability distribution of the number of consecutive open human control loops in (16) can be expressed as
(17) |
where is defined in (10). The time interval distribution of under the condition with open human control loops in (16) consists of two independent and stochastic parts, i.e., the total delay induced by SH channel and human control lag. In the following, we analyze the conditional probabilities of the two parts. The conditional probabilities of the delay induced by the SH channel can be expressed as
(18) | ||||
where is defined in (14). The conditional probabilities of the delay induced by the human control lag can be expressed as (19),
(19) | ||||
where , is defined in (2), and . Then, the time interval distribution under the condition with open human control loops is
(20) | ||||
where and are conditional probabilities defined in (18) and (19), respectively. In summary, by using (18) and (19), we can obtain (20). By substituting (17) and (20) into (16), we can obtain the time interval distribution between consecutive closed human control loops.
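The interval between consecutive closed human control loops can also be estimated by simulation, which offers a simple cross-check on the analytical distribution. The sketch below is a Monte Carlo illustration under assumptions: each human loop lasts for the SH delay plus the current lag plus one step for the HA transmission, the lag state evolves once per loop according to a Markov chain, and the HA transmission fails independently with probability `eps_ha`. All names and values are hypothetical.

```python
import random

def sample_cycle_lengths(n_cycles, eps_ha, lag_steps, P_lag, sh_delay_sampler, rng):
    """Sample time intervals between consecutive closed human control loops."""
    lag_state = 0
    lengths = []
    elapsed = 0
    while len(lengths) < n_cycles:
        # One human loop: SH delay + human lag + one HA downlink slot.
        elapsed += sh_delay_sampler(rng) + lag_steps[lag_state] + 1
        closed = rng.random() >= eps_ha  # HA success closes the loop
        # Markov transition of the lag state for the next loop.
        lag_state = rng.choices(range(len(lag_steps)), weights=P_lag[lag_state])[0]
        if closed:
            lengths.append(elapsed)
            elapsed = 0
    return lengths

# Degenerate check: fixed SH delay 1, fixed lag 3, HA fails half the time.
rng = random.Random(1)
lengths = sample_cycle_lengths(2000, 0.5, [3], [[1.0]], lambda r: 1, rng)
mean_length = sum(lengths) / len(lengths)
```

In the degenerate setting every loop lasts five steps, so each interval is five times a geometric count of loops.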
III-B Stability Condition of WHMC
Lyapunov functions are powerful tools used for stability analysis in dynamic systems without needing explicit control policies. A function is said to be a Lyapunov-like function, if , for , and . It is a scalar function that can be treated as a cost function associated with the system state . For example, the function can be the magnitude of the input vector . The dynamic system is stable if the expected cumulative cost over an infinite time horizon remains bounded. Thus, we have the following definition.
Definition 1 (Stochastic Stability [43, 44, 45]).
The wireless networked human-machine collaborative system is stochastically stable, if for some Lyapunov-like functions : , the expected value .
From (7), (9) and (18), we note that the WHMC system randomly switches between the following four cases: 1) Case one: both the machine control loop and the human control loop are closed; 2) Case two: only the machine control loop is closed; 3) Case three: only the human control loop is closed; and 4) Case four: both the machine control loop and the human control loop are open. We next examine the stability condition taking into account each individual case.
For tractable analysis, we make the following assumption.
Assumption 1 (Lyapunov-Like Function Gains).
There exists a Lyapunov-like function : , non-negative control system parameters , , , and , such that for all following (1) and the initial plant state satisfying , we have
(21) |
Assumption 1 bounds the one-step cost function ratio between and in the four cases based on the Lyapunov gains, , , , and . Note that Lyapunov gains are often assumed in non-linear system stability analysis [43, 44, 45]. If a ratio is less than 1, then the cost decreases; otherwise, it increases. Considering extreme cases, if all ratios in the four cases are less than 1, the WHMC system is directly stabilized, as the cost in all cases decreases over time. Conversely, if all ratios are significantly larger than 1, the system may not stabilize according to Definition 1. The control system parameters , , , and are determined by the plant dynamics (1) and the control algorithms (3) and (4).
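The role of the Lyapunov gains can be illustrated with a deliberately simplified calculation that is not the condition of Theorem 1. Suppose the human loop never closes, so only the two machine-loop cases occur, and suppose machine-loop outcomes are independent across time steps with open-loop probability `eps_m`. The expected cost bound then contracts by a fixed average gain per step, and the cumulative bound is a geometric series. The gains `rho_closed` and `rho_open` are hypothetical.

```python
def expected_cost_bound(v0, eps_m, rho_closed, rho_open, horizon):
    """Cumulative bound on the expected cost with machine control only.

    Each step multiplies the cost bound by rho_closed (loop closed, probability
    1 - eps_m) or rho_open (loop open, probability eps_m), independently.
    """
    avg_gain = (1.0 - eps_m) * rho_closed + eps_m * rho_open
    total, v = 0.0, v0
    for _ in range(horizon):
        total += v
        v *= avg_gain
    return total, avg_gain

total, avg_gain = expected_cost_bound(1.0, 0.2, 0.5, 1.5, horizon=200)
```

With these numbers the average gain is 0.7, so the bound stays finite even though the open-loop gain exceeds 1; an average gain of 1 or more would make the bound grow without limit.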
III-B1 Stability condition
In the following, we propose a stochastic cycle-cost-based approach to obtain sufficient stability conditions for the WHMC system.
Theorem 1.
The plant of the WHMC system defined in Section II is stochastically stable if
(22) |
where is the expected probability of an open machine control loop defined in (8); the control system parameters , , , and are defined in Assumption 1; is the random time interval between consecutive closed human control loops with the probability distribution defined in (16).
Proof.
(Main ideas) We investigate the stability condition of the WHMC system defined in (1) by following the stability analysis framework based on Lyapunov-like functions [43, 44, 45]. (Footnote 5: The methods in [43, 44, 45] are not directly applicable, as the control process involves human control operations with a Markovian lag model.) Since human control is less frequent than machine control, it is convenient to focus on the plant events in which the actuator receives human control commands. Therefore, the control process is divided by the closed-human-control-loop events. We refer to the time interval between consecutive closed human control loops as a cycle within the control process, and to the sum of stochastic costs in a cycle as a cycle cost. Thus, the total cost of the control process is the sum of all cycle costs. According to Definition 1, stability is equivalent to the sum of all cycle costs being bounded. To prove the stability condition, we first analyze a stochastic cycle cost, within which only case two and case four exist. It depends on the number of occurrences of these two cases, conditioned on the open loop probabilities and the time interval distribution presented in Section III-A. Then, we analyze the sum of stochastic cycle costs over infinitely many cycles by further considering case one and case three. Finally, we derive the stability condition by requiring the sum of the stochastic cycle costs to be bounded, as in Definition 1. See Appendix A for the detailed proof. ∎
Sufficient conditions in stability analysis are critical because they provide guarantees that the system will be stable under the specific assumption. They are thus preferred since they give engineers and researchers a clear set of criteria to design and analyze their systems safely. The stability condition of the WHMC systems depends on the wireless communication parameters, i.e., the open loop probabilities of human and machine control and , the control system parameters, i.e., , , and , and the Markov human state transition rule . In particular, and impact the distribution of , which further affect the stability condition. The condition indicates that if the WHMC systems exhibit high dynamics (i.e., the plant state changes significantly even with very small control input), the human operator experiences fatigue with a high control lag, and the open-loop probability is high, then the WHMC system becomes difficult to stabilize through collaboration.
III-B2 Stability region
The stability region in WHMC systems defines the range of system parameters that ensure stable operation, as per Theorem 1. The boundary of this stability region represents the critical limits beyond which the system may become unstable. The properties of this boundary are elucidated next.
Corollary 1.
Given the WHMC stability condition in Theorem 1,
(i) the stability region boundary in terms of and is linear:
(23) |
where ;
(ii) the stability region boundary in terms of and is linear:
(24) |
where and ;
(iii) the stability region boundaries, in terms of the other four possible pairs of control system parameters, i.e., , , , and , are concave.
Proof.
See Appendix B. ∎
As illustrated in Fig. 4, a linear stability region (e.g., Corollary 1 (i) and (ii)) means the boundary is governed by a linear function. It implies that any combination of the control system parameters within the region will maintain system stability, offering engineers substantial flexibility in parameter selection and system tuning without compromising stability. This implication is applicable to the convex stability region, where the boundary is governed by a convex function. In contrast, a concave stability region (e.g., Corollary 1 (iii)) has a boundary governed by a concave function. This indicates that while individual parameter sets within the region ensure stability, linear combinations of these parameters may not. For any stable parameter set, all parameter sets within the rectangular area defined by this set and the origin are also stable. In addition to control system parameters, communication system parameters also impact the stability region, which is presented in Section III-C.

III-B3 Special cases
Given the stability condition of the general WHMC system in Theorem 1, we examine the stability conditions for three specific cases.
For an error-free channel, assuming the communication channels are perfect, we have . The stability condition in Theorem 1 reduces to
(25) |
where and defined in (2) is determined by the human state transition matrix . In this case, the stability depends on , , and . Since the communication channels are perfect, only human control loops may be open due to the human control lag. Thus, only the Lyapunov gains in cases one and two of Assumption 1, i.e., and , play a role in this scenario.
Human control only, assuming that the plant is only controlled by a human operator, i.e., the machine control loop is always open (). The stability condition is
(26) |
where and is defined in (16). In this case, the stability depends on , , and . Since the machine control loop is always open, only the Lyapunov gains in cases three and four of Assumption 1 are relevant. We note that if the human control lag is a constant, is still a random time interval due to the random SH delay.
Machine control only, assuming that the plant is only controlled by a machine controller, i.e., the human control loop is always open (). The stability condition of this case cannot be directly obtained from Theorem 1, because the stochastic cycle-based approach in Theorem 1 is on the basis of closed human control loops. Modifications to the definition of stochastic cycles are required to analyze the stability condition. Our results are presented next:
Proposition 1.
Proof.
See Appendix C. ∎
III-C Numerical Examples of the Stability Region

We present numerical results to illustrate the stability region in terms of the communication, control system, and human model parameters, showing how these parameters affect the stability condition (22) in Theorem 1. The average channel gain is denoted as and follows the free-space path loss model , where denotes the antenna gain; denotes the carrier frequency; denotes the distance from the human operator or the machine to the plant; and denotes the path loss exponent [46]. The time-varying wireless channel power gains are generated from Rayleigh fading channel models, i.e., . Given the transmission power and the receiving noise power , the SNRs of received packets in all channels are obtained from , respectively. The communication parameters are summarized in Table I.
Items | Value |
Communication parameters | |
Code rate [bps], | 2 |
Packet length [symbols], | 1500 |
Transmit power [dBm], | 23 |
Background noise power [dBm], | -70 |
Maximum number of re/transmissions, | |
Free-space path loss model | |
Antenna gain, | 4 |
Carrier frequency [MHz], | 915 |
Distance from machine to plant [m], | 40 |
Distance from human to plant [m], | 45 |
Path loss exponent, | 2.9 |
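As a rough illustration of how the channel realizations could be generated from the Table I values, the sketch below makes several assumptions that are not confirmed by the text: the path loss expression is taken to be the form commonly used with [46], i.e., antenna gain times (c / (4π f d)) raised to the path loss exponent; Rayleigh fading is modeled by an exponentially distributed power gain whose mean is the average channel gain; and the SNR is transmit power times power gain divided by noise power. All function names are hypothetical.

```python
import math
import random

SPEED_OF_LIGHT = 3.0e8  # m/s


def dbm_to_watt(p_dbm):
    """Convert a power level from dBm to watts."""
    return 10.0 ** ((p_dbm - 30.0) / 10.0)


def average_channel_gain(antenna_gain, carrier_hz, distance_m, exponent):
    """Assumed path loss form: A * (c / (4*pi*f*d)) ** exponent."""
    ratio = SPEED_OF_LIGHT / (4.0 * math.pi * carrier_hz * distance_m)
    return antenna_gain * ratio ** exponent


def sample_snr(avg_gain, tx_dbm, noise_dbm, rng):
    """Draw one SNR sample; Rayleigh fading gives an exponential power gain."""
    power_gain = rng.expovariate(1.0 / avg_gain)  # mean equals avg_gain
    return dbm_to_watt(tx_dbm) * power_gain / dbm_to_watt(noise_dbm)


rng = random.Random(0)
machine_gain = average_channel_gain(4.0, 915e6, 40.0, 2.9)  # machine to plant
human_gain = average_channel_gain(4.0, 915e6, 45.0, 2.9)    # human to plant
machine_snrs = [sample_snr(machine_gain, 23.0, -70.0, rng) for _ in range(5)]
print(machine_gain, human_gain, machine_snrs)
```

Because the human operator is farther from the plant than the machine controller, the human link has the smaller average gain under this model.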
The human control lag has two states (i.e., fast and slow) with the stationary probability distribution and the state transition matrix can be one of the three cases below:
(28) |
is a Prolonged Response Model, where the human operator tends to remain in a single state—either fast (low lag) or slow (high lag)—for extended periods. This reflects a tendency for the operator’s reaction time to be consistently fast or slow, with infrequent transitions between these two states. is a Random Response Model, where the human operator has an equal probability of staying in their current state or switching to the other, leading to unpredictable shifts between fast and slow reactions. is a Variable Response Model, where the human operator frequently switches between fast and slow reactions, indicating high variability in response times.
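The three response models can be compared through their stationary distributions and expected dwell times. The transition matrices below are hypothetical placeholders chosen only to match the qualitative descriptions (sticky, equiprobable, frequently switching); they are not the values in (28). State 0 stands for the fast state and state 1 for the slow state.

```python
def stationary_two_state(matrix):
    """Stationary distribution of a two-state Markov chain.

    matrix[i][j] is the probability of moving from state i to state j.
    """
    a = matrix[0][1]  # fast -> slow
    b = matrix[1][0]  # slow -> fast
    if a + b == 0.0:
        raise ValueError("chain never switches; stationary distribution is not unique")
    return (b / (a + b), a / (a + b))


def expected_dwell(matrix, state):
    """Expected number of consecutive steps spent in a state (geometric dwell)."""
    stay = matrix[state][state]
    if stay >= 1.0:
        raise ValueError("absorbing state has infinite dwell time")
    return 1.0 / (1.0 - stay)


# Hypothetical matrices, for illustration only.
models = {
    "prolonged": [[0.9, 0.1], [0.1, 0.9]],
    "random": [[0.5, 0.5], [0.5, 0.5]],
    "variable": [[0.1, 0.9], [0.9, 0.1]],
}
for name, matrix in models.items():
    print(name, stationary_two_state(matrix), expected_dwell(matrix, 1))
```

With these symmetric placeholders all three models share the same stationary distribution and differ only in how long the operator stays in the slow state. That dwell time is the feature the text associates with the smaller stability region of the prolonged model.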
Numerical results are illustrated in Fig. 5. We select the pair of and to show the impacts because this pair has the simplest linear relationship for demonstration (see Corollary 1). Fig. 5(a) illustrates the impacts of human model parameters on the stability region. In particular, a human operator with a variable response model shows the largest stability region, while a human operator with a prolonged response model has the smallest stability region. A human operator with a prolonged response model has a higher chance of persistently staying in a large lag state. Thus, to guarantee closed-loop stability, more reliable communications are required. As shown in Corollary 1(i), the slope of the linear stability region in Fig. 5(a) depends on the expected probability of an open machine control loop defined in (8).
Fig. 5(b) illustrates the impacts of three HARQ schemes on the stability region. Compared with TI-HARQ, WHMC systems with IR-HARQ and CC-HARQ schemes show a larger stability region because packet combining can significantly reduce the number of retransmissions by taking advantage of the accumulated SNRs. A WHMC system with the IR-HARQ scheme has the largest stability region because only incremental redundancies are retransmitted after each erroneous packet. Fig. 5(c) illustrates the impacts of maximum re/transmission attempts on the stability region. We see that the system with HARQ schemes (i.e., ) has a larger stability region than the system without retransmission (i.e., ). As increases, the additional expansion of the stability region becomes small; thus, is commonly used in the numerical illustrations.
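The benefit of SNR accumulation can be illustrated with a small sketch. It is not the paper's error model and rests on several assumptions. The packet error probability uses the finite-blocklength normal approximation associated with [33]. The code rate in Table I is read as bits per symbol. TI-HARQ attempts are treated as independent. The CC-HARQ failure probability is approximated by the error of the last decoding attempt on the summed SNRs. IR-HARQ is not modeled here.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def packet_error(snr, rate, blocklength):
    """Normal approximation of the short-packet error probability."""
    if snr <= 0.0:
        return 1.0
    capacity = math.log2(1.0 + snr)
    dispersion = (1.0 - (1.0 + snr) ** -2) * math.log2(math.e) ** 2
    return q_function((capacity - rate) * math.sqrt(blocklength / dispersion))


def type1_failure(snrs, rate, blocklength):
    """Probability that every independent attempt fails (no combining)."""
    prob = 1.0
    for snr in snrs:
        prob *= packet_error(snr, rate, blocklength)
    return prob


def chase_failure(snrs, rate, blocklength):
    """Approximate failure after combining: decode once on the summed SNR."""
    return packet_error(sum(snrs), rate, blocklength)


attempt_snrs = [3.0, 3.0]  # hypothetical per-attempt SNRs (linear scale)
print(type1_failure(attempt_snrs, 2.0, 1500))
print(chase_failure(attempt_snrs, 2.0, 1500))
```

With an SNR of 3 the capacity equals the rate of 2 bits per symbol, so a single attempt fails about half of the time, whereas the combined SNR of 6 sits well above the rate and the approximate failure probability becomes negligible.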
In addition to the linear boundary, Fig. 5(d) illustrates the concave boundaries in terms of vs. and vs. , where a small variation of leads to a significant change in both and . This is because machine control attempts are more frequent than human control attempts, and the accumulated significance of is significantly higher than the Lyapunov gains and involving human control attempts. Fig. 5(e) illustrates stability regions in terms of the pair of and with different . As decreases, the stability region expands dramatically, highlighting the significant reduction of human control efforts to stabilize the system. Fig. 5(f) illustrates stability regions in terms of the pair of and with different . The stability region expands with decreasing . This is because a larger open loop Lyapunov gain indicates greater effort required for both automatic machine and human control inputs.
IV Proof of Concept Experiment
In this section, we present a case study of WHMC to illustrate its advantage in control performance. The experiment data of the case study are recorded to estimate the control system and the human model parameters, followed by the stability analysis of the case study to show the effectiveness of Theorem 1.

IV-A Experiment Setups
We build a WHMC system in which a simulated cart-pole system is controlled by a machine controller and a real human operator, as shown in Fig. 6. The machine controller controls the force applied to the cart, which carries an unknown weight, to balance the pole. The dynamic weight on the cart can be observed by the human operator, who monitors the state of the simulated cart-pole system and uses the key ‘S’ (between ‘A’ and ‘D’ on a keyboard) to intervene in the control of the cart-pole system and remove the dynamic weight from the cart. The dynamic weight can be seen as a catastrophic disturbance to the system, which cannot be handled by the machine controller designed without such knowledge. Therefore, such a control system has nonlinear dynamics and a disturbance unknown to the machine controller, which makes it challenging to control without collaboration with a human operator. The IR-HARQ scheme is adopted with a maximum re/transmissions number of . Other communication parameters and the free-space path loss model are the same as in Table I.
IV-A1 Cart-pole dynamics
In the simulated cart-pole system, the mass of the pole is assumed to be concentrated at its end mass. The states of the cart-pole system consist of the position and velocity of the cart, the angle and angular velocity of the pole, and the unknown weight on the cart, which is denoted as . The dynamics of the cart-pole system are governed by the non-linear dynamic equations in (29),
(29) |
where and are the mass of the pole and cart, respectively; is the gravitational acceleration; is the length of the pole; is the moment of inertia for a point mass in terms of the center of the pole; is the applied force to the cart by the machine controller in (3); are the damping coefficients for the pole and cart, respectively. For the dynamics of the unknown weight on the cart, , we assume that once the weight is successfully removed by the actuator remotely controlled by the human operator, it will reappear on the cart after a random time interval; otherwise, the unknown weight will remain on the cart continuously. Thus, has the following updating rule
(30) |
where is randomly generated and is the human control input. The sampling period is . and can be derived from (29) given . Then, by leveraging , , , and (30), we can get , indicating the proposed cart-pole system follows (1). The initial state is .
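The verbal description of the weight dynamics (removed on a successful human command, reappearing after a random interval, otherwise staying on the cart) can be sketched as a small state machine. This is an assumption-laden illustration rather than a transcription of (30): the class name, the uniform integer reappearance delay, and the fixed weight magnitude are hypothetical.

```python
import random


class WeightProcess:
    """Toy model of the unknown weight on the cart (hypothetical parameters)."""

    def __init__(self, weight, max_delay, rng):
        self.weight = weight        # magnitude while the weight is on the cart
        self.max_delay = max_delay  # assumed upper bound of the random delay
        self.rng = rng
        self.present = True
        self.countdown = 0

    def step(self, human_removes):
        """Advance one sampling period.

        human_removes: True if a human command was successfully received
        by the actuator in this period.
        """
        if self.present:
            if human_removes:
                self.present = False
                self.countdown = self.rng.randint(1, self.max_delay)
        else:
            self.countdown -= 1
            if self.countdown <= 0:
                self.present = True
        return self.weight if self.present else 0.0


process = WeightProcess(weight=2.0, max_delay=3, rng=random.Random(1))
trace = [process.step(False), process.step(True)]
trace += [process.step(False) for _ in range(3)]
print(trace)
```

Without a successful human command the weight never leaves the cart, matching the statement that it otherwise remains on the cart continuously.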
IV-A2 Control policies
In this experiment, is unknown to the machine controller, but all other states, parameters, and the system dynamics in (29) (excluding ) are known. The machine control policy seeks to achieve a decaying angle as per by applying force , where . Using the Euler approximation to update the angle, we obtain
(31) |
The updated angular velocity is also obtained by using the Euler approximation, i.e., . A smaller will lead to a control policy enabling a faster speed of , which is 0.7 in the experiment. By leveraging (31) and (29) with , the machine control policy can be obtained as (32) to determine ,
(32) |
where . Recall that the human control policy is to remove the unknown weight on the cart if the human operator observes it through visual feedback. Thus, the human control policy is .
IV-B WHMC Control Performance
Definition 2 (Collaborative Control Performance).
The control performance of a WHMC system at each time step is evaluated by a cost function : , which is defined as
(33) |
where is a positive diagonal matrix to individually penalize the states of interest. A smaller control cost indicates a better control performance.
In the experiment described in Section IV-A, the objective of the WHMC system is to balance the pole (i.e., is closely around the zero point). Thus, we are only interested in the angle of the pole, resulting in a cost function . The control cost of the machine control only case, the human control only case, and the WHMC case are shown in Fig. 7. The human operator’s objective is to remove the weight, not to balance the pole. Only the machine controller handles pole balancing. Without machine control inputs, the control cost increases. Both the WHMC and machine-only cases can reduce the cost over time, with their stability guaranteed, as will be further discussed in Section IV-C3. Compared to the machine-only case, the WHMC case shows a faster decrease in cost, demonstrating the importance of WHMC.
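A minimal sketch of such a per-step cost follows, assuming the quadratic form implied by a diagonal penalty matrix (each state squared and weighted individually). The state ordering and the numerical values are hypothetical. Penalizing only the pole angle corresponds to a single nonzero diagonal entry.

```python
def stage_cost(state, penalties):
    """Quadratic cost with a diagonal penalty: sum of q_i * x_i ** 2."""
    if len(state) != len(penalties):
        raise ValueError("state and penalties must have the same length")
    return sum(q * x * x for q, x in zip(penalties, state))


# Assumed ordering: cart position, cart velocity, pole angle,
# pole angular velocity, unknown weight.
state = (1.0, 0.5, 0.2, 0.1, 3.0)
angle_only = (0.0, 0.0, 1.0, 0.0, 0.0)
print(stage_cost(state, angle_only))
```

With this choice the cart position and the unknown weight do not enter the cost directly; they matter only through their effect on the pole angle.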

IV-C WHMC System Stability
IV-C1 Estimation of control system parameters
Since the control objective is to balance the pole, the Lyapunov-like function is defined as
(34) |
where the threshold of 0.05 is set to eliminate the impacts of uncontrolled on the control objective when reaches the desired zero point. To estimate the four control system parameters, i.e., , , , and , we collect data by conducting the experiment in four cases, i.e., no control inputs, machine control only, human control only, and human-machine collaborative control, respectively. The parameter in each case is estimated by based on the corresponding data set, where is defined in (34). The estimated four control system parameters are , , , and .
IV-C2 Estimation of human model
To reduce the estimation complexity, we quantize the human control lag to a two-state set of , which corresponds to 0.15 and 0.35 . The state transition matrix is estimated based on the maximum likelihood estimation approach, which is
(35) |
The corresponding stationary probability distribution is , i.e., and .
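For a Markov chain, the maximum likelihood estimate of the transition matrix is the row-normalized matrix of observed transition counts. The sketch below applies this to a quantized lag sequence. The lag samples are made up for illustration, the function names are hypothetical, and nearest-level quantization is an assumption about how the two-state set is formed.

```python
def quantize_lags(lags, levels=(0.15, 0.35)):
    """Map each measured lag to the index of the nearest quantization level."""
    return [min(range(len(levels)), key=lambda i: abs(lag - levels[i]))
            for lag in lags]


def estimate_transition_matrix(states, n_states=2):
    """Maximum likelihood estimate: row-normalized transition counts."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a state has no observed outgoing transition")
        matrix.append([count / total for count in row])
    return matrix


# Hypothetical lag measurements, same unit as the quantization levels.
lags = [0.12, 0.18, 0.33, 0.40, 0.30, 0.14, 0.16, 0.36]
lag_states = quantize_lags(lags)
estimated = estimate_transition_matrix(lag_states)
print(lag_states, estimated)
```

The stationary distribution then follows from the estimated matrix, and a longer recording reduces the variance of each row estimate.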
IV-C3 Stability of the cart-pole system
Based on the above estimation and parameters in Table I, the left term of the stability condition in Theorem 1 is , demonstrating a stabilized WHMC system. In the machine-only control scenario, the left term of the stability condition (27) is , indicating a stochastically stable system. Conversely, in the human-control-only case, the left term of the stability condition (26) is , signifying an unstable system. This instability also explains the increasing control cost observed in Fig. 7.
V Conclusions
We have developed a foundational WHMC model that integrates dual wireless loops for both machine and human control, addressing the intricate challenges associated with WHMC systems. By introducing a novel stochastic cycle-cost-based approach, we have derived a stability condition that accounts for the complexities of wireless communication, human behavior, and control system dynamics. Our approach has been validated through extensive numerical analysis and the creation of a new case study, demonstrating its practical effectiveness. These contributions offer a strong basis for advancing WHMC systems in increasingly complex and dynamic environments.
Acknowledgments
The authors would like to express their sincere gratitude to Dr. Anuradha Annaswamy, Director of the Active-Adaptive Control Laboratory at MIT, for her valuable comments on this paper. Her insightful feedback and suggestions have been instrumental in improving the clarity and rigor of this work.
Appendix A Proof of Theorem 1
The time step of the th closed human control loop is defined as , as shown in Fig. 3. Let and denote the numbers of case-two and case-four events defined in Assumption 1 between and , respectively. Then we have
(36) |
and
(37) |
Since
(38) |
the sum of between the two adjacent closed human control loops has the following inequality
(39) |
where
By further processing the above inequality, we have
(40) |
Then,
(41) |
Let and denote the numbers of case one and case three defined in Assumption 1 between and , respectively. In this time interval, and denote the numbers of case two and case four, respectively. Then,
(42) |
It can be further processed as
(43) |
Since
(44) |
we have
(45) |
where
By leveraging (41) and (45), we have
(46) |
Since , to make , we need
(47) |
Let
(48) |
Then, we have
(49) |
and
(50) |
To satisfy (47), (22) is derived from (50) as the stability condition of the WHMC system.
Appendix B Proof of Corollary 1
According to (22), the bound of the stability region is
(51) |
(i) When control system parameters and are fixed, by further processing (51), we have
(52) |
where the linearity between and is showcased.
(ii) When control system parameters and are fixed, by further processing (51), we have
(53) |
which is a sum of linear equations and can be represented as
(54) |
Thus, the linearity between and is demonstrated.
(iii) For any other possible pair of control system parameters, we can take one of the equations from (53) and prove that it is convex in terms of the pairs other than those in (i) and (ii). In particular, we have
(55) |
Then, it can be represented as
(56) |
We take the pair of and for example, given the fixed and . Then, (56) can be represented as
(57) |
The first-order derivative is
(58) |
The second-order derivative is
(59) |
We note that , and . Thus, . Hence, (57) is convex, and (53), being a sum of convex functions, is also convex and yields a concave stability boundary. The remaining pairs, other than those in (i) and (ii), can be proved in the same way.
Appendix C Proof of Proposition 1
We also leverage the stochastic cycle-based approach in Section III-B1. Assume the time steps of the two adjacent closed machine control loops are and . Then, we have
(60) |
and
(61) |
The sum of between the two adjacent closed machine control loops has the following inequality
(62) |
By further processing the above inequality, we have
(63) |
Then,
(64) |
Since
(65) |
we have
(66) |
Since , to make , we need
(67) |
Let
(68) |
Then we have
(69) |
Then we have the following equation
(70) |
The stability condition in (27) is derived from (70) to satisfy (67).
References
- [1] H. Lasi, P. Fettke, H.-G. Kemper, T. Feld, and M. Hoffmann, “Industry 4.0,” Bus. Inf. Syst. Eng., vol. 6, pp. 239–242, 2014.
- [2] S. Kumar, C. Savur, and F. Sahin, “Survey of human-robot collaboration in industrial settings: Awareness, intelligence, and compliance,” IEEE Trans. Syst. Man Cybern. Syst., vol. 51, no. 1, pp. 280–297, 2020.
- [3] P. K. R. Maddikunta, Q.-V. Pham, P. B, N. Deepa, K. Dev, T. R. Gadekallu, R. Ruby, and M. Liyanage, “Industry 5.0: A survey on enabling technologies and potential applications,” J. Ind. Infor. Integr., vol. 26, 2021, Art. no. 100257.
- [4] I. Kardush, S. Kim, and E. Wong, “A techno-economic study of Industry 5.0 enterprise deployments for human-to-machine communications,” IEEE Commun. Mag., vol. 60, no. 12, pp. 74–80, 2022.
- [5] A. P. Dani, I. Salehi, G. Rotithor, D. Trombetta, and H. Ravichandar, “Human-in-the-loop robot control for human-robot collaboration: Human intention estimation and safe trajectory tracking control for collaborative tasks,” IEEE Control Syst. Mag., vol. 40, no. 6, pp. 29–56, 2020.
- [6] Z. Lu, Y. Guan, and N. Wang, “An adaptive fuzzy control for human-in-the-loop operations with varying communication time delays,” IEEE Robot. Autom. Lett., vol. 7, no. 2, pp. 5599–5606, 2022.
- [7] F. Mars and P. Chevrel, “Modelling human control of steering for the design of advanced driver assistance systems,” Annu. Rev. Control, vol. 44, pp. 292–302, 2017.
- [8] H. Kress-Gazit, K. Eder, G. Hoffman, H. Admoni, B. Argall, R. Ehlers, C. Heckman, N. Jansen, R. Knepper, J. Křetínský, S. Levy-Tzedek, J. Li, T. Murphey, L. Riek, and D. Sadigh, “Formalizing and guaranteeing human-robot interaction,” Commun. ACM, vol. 64, no. 9, pp. 78–84, 2021.
- [9] J. v. Oosterhout, J. G. W. Wildenbeest, H. Boessenkool, C. J. M. Heemskerk, M. R. d. Baar, F. C. T. v. d. Helm, and D. A. Abbink, “Haptic shared control in tele-manipulation: Effects of inaccuracies in guidance on task execution,” IEEE Trans. Haptics, vol. 8, no. 2, pp. 164–175, 2015.
- [10] A. Lopes, J. Rodrigues, J. Perdigao, G. Pires, and U. Nunes, “A new hybrid motion planner: Applied in a brain-actuated robotic wheelchair,” IEEE Rob. Autom. Mag., vol. 23, no. 4, pp. 82–93, 2016.
- [11] W. Liu, D. E. Quevedo, K. H. Johansson, B. Vucetic, and Y. Li, “Stability conditions for remote state estimation of multiple systems over multiple Markov fading channels,” IEEE Trans. Autom. Control, vol. 68, no. 7, pp. 4273–4280, 2022.
- [12] T. Yucelen, Y. Yildiz, R. Sipahi, E. Yousefi, and N. Nguyen, “Stability limit of human-in-the-loop model reference adaptive control architectures,” Int. J. Control, vol. 91, no. 10, pp. 2314–2331, 2018.
- [13] H.-N. Wu and M. Wang, “Human-in-the-loop behavior modeling via an integral concurrent adaptive inverse reinforcement learning,” IEEE Trans. Neural Networks Learn. Sys., pp. 1–12, 2023.
- [14] P. van Overloop, J. Maestre, A. D. Sadowska, E. F. Camacho, and B. De Schutter, “Human-in-the-loop model predictive control of an irrigation canal [applications of control],” IEEE Control Syst. Mag., vol. 35, no. 4, pp. 19–29, 2015.
- [15] Z. Li, J. Liu, Z. Huang, Y. Peng, H. Pu, and L. Ding, “Adaptive impedance control of human–robot cooperation using reinforcement learning,” IEEE Trans. Ind. Electron., vol. 64, no. 10, pp. 8013–8022, 2017.
- [16] E. Eraslan, Y. Yildiz, and A. M. Annaswamy, “Shared control between pilots and autopilots: An illustration of a cyberphysical human system,” IEEE Control Syst. Mag., vol. 40, no. 6, pp. 77–97, 2020.
- [17] Q. Deng and D. Söffker, “A review of HMM-based approaches of driving behaviors recognition and prediction,” IEEE Trans. Intell. Veh., vol. 7, no. 1, pp. 21–31, 2022.
- [18] C.-P. Lam and S. S. Sastry, “A POMDP framework for human-in-the-loop system,” in Proc. IEEE CDC, 2014, pp. 6031–6036.
- [19] L. Feng, C. Wiltsche, L. Humphrey, and U. Topcu, “Synthesis of human-in-the-loop control protocols for autonomous systems,” IEEE Trans. Autom. Sci. Eng., vol. 13, no. 2, pp. 450–462, 2016.
- [20] B. Hu and J. Chen, “Optimal task allocation for human–machine collaborative manufacturing systems,” IEEE Robot. Autom. Lett., vol. 2, no. 4, pp. 1933–1940, 2017.
- [21] C. Craye, A. Rashwan, M. S. Kamel, and F. Karray, “A multi-modal driver fatigue and distraction assessment system,” Int. J. Intelligent Transp. Syst. Res., vol. 14, no. 3, pp. 173–194, 2016.
- [22] M. Protte, R. Fahr, and D. E. Quevedo, “Behavioral economics for human-in-the-loop control systems design: Overconfidence and the hot hand fallacy,” IEEE Control Syst. Mag., vol. 40, no. 6, pp. 57–76, 2020.
- [23] H.-N. Wu and X.-M. Zhang, “Stochastic stability analysis and synthesis of a class of human-in-the-loop control systems,” IEEE Trans. Syst. Man Cybern. Syst., vol. 52, no. 2, pp. 822–832, 2022.
- [24] P. Park, S. Coleri Ergen, C. Fischione, C. Lu, and K. H. Johansson, “Wireless network design for control systems: A survey,” IEEE Commun. Surv. Tutor., vol. 20, no. 2, pp. 978–1013, 2018.
- [25] W. Liu, X. Zang, Y. Li, and B. Vucetic, “Over-the-air computation systems: Optimization, analysis and scaling laws,” IEEE Trans. Wirel. Commun., vol. 19, no. 8, pp. 5488–5502, 2020.
- [26] J. Chen, W. Liu, D. E. Quevedo, S. R. Khosravirad, Y. Li, and B. Vucetic, “Structure-enhanced DRL for optimal transmission scheduling,” IEEE Trans. Wirel. Commun., vol. 23, no. 1, pp. 379–393, 2023.
- [27] L. Schenato, B. Sinopoli, M. Franceschetti, K. Poolla, and S. S. Sastry, “Foundations of control and estimation over lossy networks,” Proc. IEEE, vol. 95, no. 1, pp. 163–187, 2007.
- [28] P. Minero, M. Franceschetti, S. Dey, and G. N. Nair, “Data rate theorem for stabilization over time-varying feedback channels,” IEEE Trans. Autom. Control, vol. 54, no. 2, pp. 243–255, 2009.
- [29] K. Huang, W. Liu, Y. Li, A. Savkin, and B. Vucetic, “Wireless feedback control with variable packet length for industrial IoT,” IEEE Wirel. Commun. Lett., vol. 9, no. 9, pp. 1586–1590, 2020.
- [30] W. Liu, P. Popovski, Y. Li, and B. Vucetic, “Wireless networked control systems with coding-free data transmission for industrial IoT,” IEEE Internet Things J., vol. 7, no. 3, pp. 1788–1801, 2020.
- [31] G. Yang, Z. Pang, M. Jamal Deen, M. Dong, Y.-T. Zhang, N. Lovell, and A. M. Rahmani, “Homecare robotic systems for healthcare 4.0: Visions and enabling technologies,” IEEE J. Biomedical Health Informat., vol. 24, no. 9, pp. 2535–2549, 2020.
- [32] G. Zhao, M. A. Imran, Z. Pang, Z. Chen, and L. Li, “Toward real-time control in future wireless networks: Communication-control co-design,” IEEE Commun. Mag., vol. 57, no. 2, pp. 138–144, 2019.
- [33] Y. Polyanskiy, H. V. Poor, and S. Verdu, “Channel coding rate in the finite blocklength regime,” IEEE Trans. Inform. Theory, vol. 56, no. 5, pp. 2307–2359, 2010.
- [34] C. Xu, H. Yu, P. Zeng, and Y. Li, “Towards critical industrial wireless control: Prototype implementation and experimental evaluation on URLLC,” IEEE Commun. Mag., vol. 61, no. 9, pp. 193–199, 2023.
- [35] Z. Xiang, F. Gabriel, E. Urbano, G. T. Nguyen, M. Reisslein, and F. H. P. Fitzek, “Reducing latency in virtual machines: Enabling tactile internet for human-machine co-working,” IEEE J. Sel. Areas Commun., vol. 37, no. 5, pp. 1098–1116, 2019.
- [36] S. Mondal, L. Ruan, M. Maier, D. Larrabeiti, G. Das, and E. Wong, “Enabling remote human-to-machine applications with AI-enhanced servers over access networks,” IEEE Open J. Commun. Soc., vol. 1, pp. 889–899, 2020.
- [37] X. Kuai, X. Yuan, W. Yan, and Y.-C. Liang, “Coexistence of human-type and machine-type communications in uplink massive MIMO,” IEEE J. Sel. Areas Commun., vol. 39, no. 3, pp. 804–819, 2021.
- [38] G. Pang, W. Liu, Y. Li, and B. Vucetic, “DRL-based resource allocation in remote state estimation,” IEEE Trans. Wirel. Commun., vol. 22, no. 7, pp. 4434–4448, 2022.
- [39] K. Huang, W. Liu, M. Shirvanimoghaddam, Y. Li, and B. Vucetic, “Real-time remote estimation with hybrid ARQ in wireless networked control,” IEEE Trans. Wirel. Commun., vol. 19, no. 5, pp. 3490–3504, 2020.
- [40] C. Sahin, L. Liu, E. Perrins, and L. Ma, “Delay-sensitive communications over IR-HARQ: Modulation, coding latency, and reliability,” IEEE J. Sel. Areas Commun., vol. 37, no. 4, pp. 749–764, 2019.
- [41] F. Ghanami, G. A. Hodtani, B. Vucetic, and M. Shirvanimoghaddam, “Performance analysis and optimization of NOMA with HARQ for short packet communications in massive IoT,” IEEE Internet Things J., vol. 8, no. 6, pp. 4736–4748, 2021.
- [42] P. Larsson, B. Smida, T. Koike-Akino, and V. Tarokh, “Analysis of network coded HARQ for multiple unicast flows,” IEEE Trans. Commun., vol. 61, no. 2, pp. 722–732, 2013.
- [43] W. Liu, D. E. Quevedo, Y. Li, and B. Vucetic, “Anytime control under practical communication models,” IEEE Trans. Autom. Control, vol. 67, no. 10, pp. 5400–5407, 2021.
- [44] T. V. Dang, K.-V. Ling, and D. E. Quevedo, “Stability analysis of event-triggered anytime control with multiple control laws,” IEEE Trans. Autom. Control, vol. 64, no. 1, pp. 420–426, 2019.
- [45] D. E. Quevedo, W.-J. Ma, and V. Gupta, “Anytime control using input sequences with Markovian processor availability,” IEEE Trans. Autom. Control, vol. 60, no. 2, pp. 515–521, 2015.
- [46] L. Huang, S. Bi, and Y.-J. A. Zhang, “Deep reinforcement learning for online computation offloading in wireless powered mobile-edge computing networks,” IEEE Trans. Mob. Comput., vol. 19, no. 11, pp. 2581–2593, 2020.