Energy Efficient Offloading Policies in Multi-Access Edge Computing Systems with Task Handover

Ling Hou, Shi Li, , Zhishu Shen, , Jing Fu, , Jingjin Wu, and Jiong Jin Ling Hou and Jing Fu are with the School of Engineering, STEM College, RMIT University, Melbourne, Australia. E-mail: [email protected], [email protected] Li and Jiong Jin are with the School of Science, Computing and Engineering Technologies, Swinburne University of Technology, Melbourne, Australia. E-mail: [email protected], [email protected] Shen is with the School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China. E-mail: [email protected] Wu is with the Guangdong Key Laboratory of IRADS, Department of Statistics and Data Science, BNU-HKBU United International College, China. E-mail: [email protected]

Abstract

The rapid growth of mobile devices and the increasing complexity of tasks have made energy efficiency a critical challenge in Multi-Access Edge Computing (MEC) systems. This paper explores energy-efficient offloading strategies in large-scale MEC systems with heterogeneous mobile users, diverse network components, and frequent task handovers to capture user mobility. The problem is inherently complex due to the system’s scale, task and resource diversity, and the need to maintain real-time performance. Traditional optimization approaches are often computationally infeasible for such scenarios. To tackle these challenges, we model the offloading problem using the restless multi-armed bandit (RMAB) framework and develop two scalable online policies that prioritize resources based on their marginal costs. The proposed policies dynamically adapt to the system’s heterogeneity and mobility while ensuring near-optimal energy efficiency. Through extensive numerical simulations, we demonstrate that the policies significantly outperform baseline methods in power conservation and show robust performance under non-exponentially distributed task lifespans. These results highlight the practical applicability and scalability of our approach in dynamic MEC environments.

Index Terms:

Multi-access edge computing (MEC), stochastic optimization, offloading policy, restless multi-armed bandit (RMAB)

I Introduction

The Multi-access Edge Computing (MEC) has emerged as an evolutionary paradigm with the rapid expansion of Information and Communication Technology (ICT) infrastructure [1]. By decentralizing computation and storage closer to the network edge, MEC enhances data processing efficiency while alleviating the load on centralized cloud systems. This edge-cloud synergy not only reduces latency but also offers potential gains in energy efficiency by optimizing task execution at the network’s edge rather than relying solely on large-scale data centers. As task complexity grows, optimizing energy consumption across both MEC and cloud networks becomes crucial. Energy efficiency must be achieved while maintaining a balance between minimizing energy consumption and meeting task response time requirements.

One notable challenge in designing efficient task assignment policies in MEC systems is to handle handovers of mobile terminals (MTs). The dynamic nature of task handovers between edge servers and cloud resources, driven by MT mobility, adds complexity to energy-efficient task offloading [2]. While handovers are an inherent feature of MEC systems, optimizing resource allocation during these events is particularly difficult due to heterogeneous communication channels, user mobility patterns, and the computational demands of diverse applications.

Existing studies often simplify the heterogeneity of network resources, implicitly assuming either than an MT will experience the same channel condition after a handover [3, 4], or that the service provider can dynamically allocate a new channel at the moment of handover [5, 6]. However, the former assumption overlooks the dynamic nature of wireless transmissions, while the latter risks unexpected service interruptions due to unavailable channels during handovers. These limitations are particularly problematic for real-time applications, such as live video analytics or autonomous vehicle navigation, where identifying a new channel during handover can lead to additional latency and energy consumption.

Moreover, as the problem size grows, with a large number of MTs, diverse tasks, and resource types, the increased complexity of control variables due to channel and resource heterogeneity presents a significant challenge. This complexity often renders traditional optimization methods computationally infeasible, particularly in real-time environments where stringent latency constraints must be satisfied. Such challenges hinder the scalability and practical applicability of existing approaches, particularly in large-scale, dynamic MEC systems.

Different from most existing studies that explicitly considered MT handovers [2, 4, 7], this work focuses on scenarios where network service providers can accurately predict the future movements and hence handovers of MTs. Such scenarios are applicable in MEC applications like urban traffic flow management [8], emergency response [9], and healthcare management [10], where MTs in the same class typically perform similar routine activities, generate homogeneous computation tasks, and follow regular trajectories. By utilizing the prediction information of MT handovers, this study seeks to optimize the energy efficiency of such a system by appropriately allocating computation and communication resources to incoming tasks.

To address the inherent complexities of this problem, we adopt a stochastic optimization approach that captures the dynamic states of network resources and communication channels to minimize the long-run average power consumption (energy consumption rate) of the MEC system. Unlike conventional methods mentioned earlier, which are often limited by static or simplified mobility assumptions, our approach models the heterogeneous and dynamic nature of MEC environments. Specifically, we formulate the offloading problem using the restless multi-armed bandit (RMAB) framework [11], which computes marginal costs associated with resource allocation decisions. Building on this, we adopt the restless-bandit-based (RBB) resource allocation technique [12], which dynamically assigns and reserves communication and computation resources (if available) upon arrival of a task in the MEC system, according to the type of task. By leveraging the RMAB framework, we dynamically allocate resources while ensuring scalability and robustness, particularly in MEC systems with dense mobile users and frequent task handovers. Furthermore, our proposed methodology addresses practical challenges arising from heterogeneous MTs, diverse task types, varying channel conditions, as well as computing/storage components distributed across MTs, network edge, and central cloud. By explicitly considering frequent task handovers and resource heterogeneity, our work provides a scalable and optimal solution to address critical research gaps in energy-efficient resource management for large-scale MEC systems.

The contributions of this paper are summarized as follows:

•

A novel RMAB-based framework for energy-efficient task offloading is proposed in large-scale MEC systems, incorporating user mobility, resource heterogeneity and task handovers into the problem formulation.
•

A class of scalable scheduling policies, termed highest energy efficiency-adjusted capacity coefficient (HEE-ACC) is developed, which prioritize resources with the least marginal costs while ensuring asymptotic optimality for large-scale MEC systems.
•

Among HEE-ACC, two specific policies are introduced: one with proven asymptotic optimality under dominant server or channel capacity, and another that dynamically learns marginal costs from historical data for robust performance in dynamic environments.
•

Extensive simulations demonstrate significant energy conservation compared to baseline methods, and highlight the robustness of HEE-ACC policies to varying shapes of non-exponentially distributed task lifespans.

The remainder of the paper is structured as follows: Section II reviews the relevant literature. Section III introduces the MEC system model for task handover. Section IV outlines the stochastic process and formulates the optimization problem. Section V presents the energy efficient scheduling policies. Section VI showcases the results and analysis. The paper concludes in Section VII. The main notations used in this paper are listed in Table I (Appendix LABEL:app:notation).

II Related Work

II-A MEC Offloading

Recent studies on offloading policies in MEC systems have explored various aspects of this problem, including time-varying nature of transmission channels [13], partially offloading policies [14], and the combination of offloading and dispatching processes [15]. Additionally, joint planning of MEC and other techniques, such as edge server deployment [16], network slicing [17], and 3D rendering [18], have been investigated to enhance system performance. While many of these works aim to improve performance metrics like latency or throughput, some also consider energy efficiency as an additional objective.

Task offloading in MEC systems is often a non-convex optimization problem due to conflicting objectives and constraints. To tackle this, many studies have adopted deep reinforcement learning (DRL) approaches to obtain optimal task offloading policies [7, 19]. Common DRL methods include deep Q-network (DQN) [20], deep deterministic policy gradient (DDPG) [21], TD3 [22] etc. Despite the potential of these methods, their high computational complexity and convergence challenges often make them less suitable for real-time and energy-efficient decision-making in dynamic MEC environments, where reliable system performance is critical.

In contrast, several low-complexity heuristic algorithms have been proposed to overcome these computational challenges. For example, a dependency-aware edge-cloud collaboration strategy was introduced in [23] to minimize task completion time by mitigating tasks in an energy-efficient order. The Lyapunov optimization technique was applied in [24] to balance energy efficiency and queue length under unknown channel conditions. Other approaches, including game theory [25] and auction theory [26], were also proposed to maximize the net profit of MTs by allowing them to bid for communication or computation resources.

II-B Task Handover in MEC

The impact of task handovers in MEC systems, particularly when MTs move between different network zones, has been a growing focus in recent research. While several works (e.g. [2, 4, 6, 3, 5, 27, 25]) considered the effect of MT handovers on offloading decisions, many of these studies concentrate on re-allocation of computational resources rather than the full offloading process. For example, constraints on MT computational capacities and CPU frequencies were examined in [4, 2]. Some studies focused on the availability of transmission channels, considering bandwidth limitations [27, 25]. However, the unpredictable nature of wireless transmissions, such as random fading, has been less studied in the context of task offloading during handovers.

More recent research aimed to improve the overall performance of resource allocation by incorporating handover prediction [28, 29]. For example, a machine learning based prediction mechanism was applied in [27] to forecast future specifications of offloaded tasks and reassign offloading destinations after a set number of time slots. While such methods improve resource allocation performance, they often rely on fixed periodic reassignment intervals, limiting their flexibility in real-time scenarios.

To address the computational and scalability challenges of MEC offloading in dynamic environments, recent studies have proposed the use of multi-armed bandit (MAB) algorithms [30, 31]. These algorithms are particularly well-suited to balancing exploration and exploitation in resource-constrained environments. For example, a contextual online vehicular task offloading policy was explored in [32] using an online bandits approach to improve energy efficiency and reduce delays in vehicular networks. Our approach builds on these concepts by using a RMAB framework, which is more appropriate for dynamic environments where resource availability and task requirements change over time. This RMAB-based method incorporates real-time predictions of task handovers and prioritizes resources based on marginal costs, achieving scalable and energy-efficient resource allocation in large-scale MEC systems.

II-C Discussion

To summarize, compared with existing studies aiming at improving energy efficiency in MEC, our study offers a more comprehensive, holistic and nuanced approach. We address key limitations in current research by explicitly modeling the dynamic and heterogeneous nature of MEC systems, particularly in scenarios with frequent task handovers and real-time adaptability. Our proposed RMAB-based framework optimizes energy efficiency by leveraging predictive information about MT handovers and dynamically allocating resources. This newly proposed low-complexity algorithm effectively bridges the gap in previous studies and enhances the scalability and practicality of energy-efficient resource management in large-scale MEC environments.

III System Model

Let $\mathbb{R}_{0}$ , $\mathbb{R}_{+}$ and $\mathbb{N}_{+}$ represent the sets of non-negative reals, positive reals and positive integers, respectively. For any $N\in\mathbb{N}_{+}$ , we use $[N]$ to represent a set $\{1,2,\ldots,N\}$ .

III-A Network Model

Consider a wireless network with orthogonal sub-channels, which establishes connections between MTs and communication nodes (CNs), such as base stations and access points. Computing tasks generated by MTs can be appropriately offloaded to CNs through the sub-channels. As CNs are equipped and/or connected (through wired connections) with more storage, computing and networking resources (CPU, GPU, RAM, disk I/O, etc.), they are generally considered preferable for executing computationally intensive tasks than personal MTs. This offloading operation can liberate MTs from being overloaded and achieve higher energy efficiency in the network. We refer to all these storage, computing and networking equipped and/or connected with the CNs as the service components (SCs), which are classified in groups (referred to as SC groups thereafter) based on their functionalities and geographical locations. We denote $K$ as the number of SC groups at the network edge, and let $C_{k}\in\mathbb{N}_{+}$ represent the capacity of SC groups $k\in[K]$ , which is the total number of SC units in the SC group. CNs are connected to the central cloud via wired connections, enabling additional task offloading when required. Tasks offloaded by MTs can eigher be processed at the network edge by SC groups or forwarded to the cloud for completion. We assume that the cloud has infinite capacity for all SC groups.

Consider a case where MTs, with similar hardware/software profiles, in the same geographical area have homogeneous channel conditions for transmitting to and from CNs in a certain destination area, labeled by $\ell$ , which locates a set of SC groups $\mathscr{K}_{\ell}\subset[K]$ . There are in total $L\in\mathbb{N}_{+}$ such destination areas (that is, $\ell\in[L]$ ) at the network edge, and we assume without loss of generality that all $\mathscr{K}_{\ell}$ are non-empty and mutually exclusive for different $\ell\in[L]$ . We label all such channels used to connect the $L$ areas, each of which is equipped with SC groups $\mathscr{K}_{\ell}$ ( $\ell\in[L]$ ), as $i=1,2,\ldots,I$ . Each channel $i\in[I]$ can support at most $N_{i}\in\mathbb{N}_{+}$ orthogonal sub-channels due to technical constraints such as total bandwidth limits. When an MT wishes to utilize computational/storage resources in area $\ell$ , a subset of the $I$ channels are eligible to support the connection between the MT and the CNs in area $\ell$ . If this MT is too far away and cannot be connected to any CN in area $\ell$ , then none of the $I$ channels can be used between the MT and any SC group in $\mathscr{K}_{\ell}$ . We assume that $I\geq L$ - for each area $\ell$ , the SC groups may be connected via more than one wireless channels.

Refer to caption — Figure 1: A simple handover example for the network system. CN1 and CN2, SC1, and a sub-channel in each of the channels are highlighted in red, as they are occupied/reserved by a task generated by the moving MTs.

Consider a simple example of the network model in Figure 1, where there are $K=2$ SC groups (SC1 and SC2) locate in the same area $\ell=1$ (with $L=1$ ) with $I=2$ channels, used by two base stations (CN1 and CN2). For MTs with similar hardware/software profiles, if they are in the coverage of CN 1, or equivalently the coverage of Channel 1, they can connect to both SC groups in the area $\ell=1$ through Channel 1. Similarly, the connections can be established between the SC groups and the MTs through Channel 2. In this case, both channels are eligible to establish connections to the SC groups in the area $\ell=1$ .

Consider $J\in\mathbb{N}_{+}$ classes of moving MTs that can be classified by the differences in their locations, moving speeds and directions, application styles and relevant requirements on computing resources. In particular, we consider a scenario where MTs in the same class can upload/download data through the same set of eligible channels and potentially connect to the same set of SC groups. In this context, a moving MT may change its class from time to time. For instance, in Figure 1, all MTs, belonging to the same class when they start moving from the left circle, may be assigned to different classes after they have arrived in the right circle. The MTs can be classified manually by the service provider and/or automatically through conventional techniques, such as signature matching [33], based on historical features of the network traffic and mobility prediction techniques (for handover), such as [34].

We refer to the offloaded tasks generated by MTs in class $j\in[J]$ as $j$ -tasks. We consider tasks of the same type to be those generated by MTs in the same class, with similar moving directions and speeds, with the same application styles, and with the same requirements on SCs. For a certain type of task, not all SC groups at the network edge are necessarily able to serve it due to geographical or functional dis-match. If a $j$ -task is accommodated by an SC in group $k\in[K]$ that is able to serve it at the network edge, $w_{j,k}\in[C_{k}]$ units of the SC are occupied by this task until it is completed. Occupied SC units will be simultaneously released upon the completion of the task, and can be reused for future tasks [35]. If a $j$ -task cannot be served by units in SC group $k$ , because of un-matched geographical locations or required functions, define $w_{j,k}\rightarrow+\infty$ , or any other integer greater than $C_{k}$ , to prohibit these tasks from being assigned there.

III-B Edge Offloading with Task Handover

With $I$ channels in total, an MT offloading a $j$ -task transmits through a sub-channel of channel $i\in[I]$ at an achievable rate

\mu_{i,j}\coloneqq\left\{\begin{array}[]{l l}B_{i}\log_{2}\left(1+\frac{p_{i,j}h_{i,j}}{N_{0}}\right),&\text{if }\frac{p_{i,j}h_{i,j}}{N_{0}}\geq 20\mathrm{dB},\\ 0&\text{otherwise.}\end{array}\right.

(1)

where $p_{i,j}\in\mathbb{R}_{0}$ , $h_{i,j}\in\mathbb{R}_{+}$ , $B_{i}\in\mathbb{R}_{+}$ and $N_{0}\in\mathbb{R}_{+}$ are the transmission power, channel gain, bandwidth of each channel and the noise power, respectively [36, 37]. If the signal-to-noise-ratio (SNR) $(p_{i,j}h_{i,j})/N_{0}$ is smaller than $20$ dB, then reliable connections cannot be established, and the channel is regarded as not available.

We consider the situation where offloaded tasks with relatively small sizes are dominant in the network, so that the computing power of SCs located at the edge of the network or in the central cloud are sufficient to complete the services of the requests in a relatively short period of time. In this sense, the duration of computational operations of offloaded tasks is generally much shorter than the transmission time between the MTs and the CNs or the cloud; that is, the processing time of an offloaded task is dominated by the transmission time between the MT and the CN for tasks completed at the network edge, and the transmission time between the MT and the central cloud via the network edge for tasks offloaded to the central cloud. In this context, a wireless sub-channel between the MT and the CN needs to be reserved for data transmission for the entire period of the offloaded task, no matter whether the task is addressed at the network edge or in the central cloud [38]. If no such an available sub-channel (all sub-channels are fully occupied), the MT cannot offload this task to any CN, and the task has to be computed locally. In this case, the task never reaches the network edge, and any scheduling policies employed at the network edge will not affect the power consumption.

In a MEC network, MTs move across different areas, requiring connections to different CNs through different sub-channels. When an MT is sending data to or receiving results from an SC, it may change the connected CNs when moving into different places covered by different CNs, leading to task handover between CNs, as illustrated in Figure 1. For the tasks generated by such MTs, other CNs with available sub-channels in their moving directions should be prepared and reserved to ensure the quality of the connections. For instance, in Figure 1, CNs 1 and 2, some SC units, and a sub-channel in each of the channels are highlighted in red, meaning that they are occupied/reserved by a task generated by the moving MTs.

III-C Cloud Offloading

For the tasks that are offloaded to the cloud, two transmission segments, namely from the MT to the network edge and from the network edge to the cloud, are requested. For notational consistency, define a special SC group $K+1$ , which has infinite capacity $C_{K+1}\to+\infty$ , for the SC units in the cloud. As the CNs and the cloud are usually connected by a cable backbone network, we can assume that the transmission time between the network edge and the cloud, denoted by $D_{0}$ , is the same for all tasks of the same class. Therefore, we denote the expected duration of $j$ -tasks computed in the cloud and transmitted via an CN in destination area $\ell$ through a sub-channel in channel $i$ as $1/\mu_{i,j}+D_{0}$ [39]. Note that $K+1$ does not belong to any destination area $\ell\in[L]$ located at the network edge.

III-D Energy Efficiency

Power consumption of the network edge consists of static and operational power. Static power is the essential consumption incurred when an SC group is activated (which means the hardware components associated with the SC group must be powered on and connected to the network); while the operational power is consumed when SC units are engaged in processing offloading tasks. In a dynamic system such as a wireless network, the amount of operational power consumption is affected by the real-time utilization rate of the SCs. Let $\varepsilon_{k}^{0}\in\mathbb{R}_{+}$ and $\varepsilon_{k}\in\mathbb{R}_{+}$ represent the amounts of the static power and the operational power consumption per SC unit in group $k\in[K]$ , and define the expected power consumption of an SC unit for computing a $j$ -task in the cloud as $\bar{\varepsilon}_{j}\in\mathbb{R}_{+}$ . Similar power consumption models have been empirically justified and widely applied in existing research [40]. Because of its dependencies on the dynamic states of the network, a detailed discussion about the power consumption of the network edge will be provided in Section IV.

IV Stochastic Process and Optimization

Define $\mathscr{K}\coloneqq[K]\cup\{K+1\}$ as the set of SC groups in the edge and the cloud. Consider random variables $X_{i,i^{\prime},j,k}(t)\in\mathbb{N}_{0}$ , $i,i^{\prime}\in[I]$ , $j\in[J]$ , $k\in\mathscr{K}$ , that represent the number of $j$ -tasks being served by SC units in group $k$ and occupying two sub-channels in channels $i$ and $i^{\prime}$ at time $t\geq 0$ . When the $j$ -task starts to be transmitted to the network, it uses channel $i$ ; while the corresponding MT will move during the data transmission and computing process and end up using channel $i^{\prime}$ to complete the data transmission and to receive computing results from the network. For instance, in Figure 1, Channels 1 and 2 correspond to such $i$ and $i^{\prime}$ , respectively. We refer to such channels $i$ and $i^{\prime}$ as the starting and ending channel, respectively. If MTs of a specific class $j$ move relatively slowly and do not need to change the communication channel when processing their tasks, then the tasks may be transmitted via the same starting and ending channels (that is, $i=i^{\prime}$ ). Because of the limited capacities of channels and SCs, these variables should satisfy

\sum\limits_{j\in[J]}w_{j,k}\sum\limits_{\begin{subarray}{~{}}i,i^{\prime}\in[I]:\\ \mu_{i,j}\mu_{i^{\prime},j}>0\end{subarray}}X_{i,i^{\prime},j,k}(t)\leq C_{k},~{}\forall k\in[K],~{}t\geq 0,

(2)

and

\sum\limits_{k\in\mathscr{K}}\sum\limits_{i^{\prime}\in[I]}\sum\limits_{j\in[J]}\Bigl{(}X_{i,i^{\prime},j,k}(t)+X_{i^{\prime},i,j,k}(t)\Bigr{)}\leq N_{i},\\ ~{}\forall i\in[I],~{}t\geq 0.

(3)

Constraints (2) and (3) are led by the limited capacities of SC groups and wireless channels, respectively.

Let $\bm{X}(t)=(X_{i,i^{\prime},j,k}(t):i,i^{\prime}\in[I],j\in[J],k\in\mathscr{K})$ , and the state space of the process $\{\bm{X}(t),\ t\geq 0\}$ (the set involves all possible values of $\bm{X}(t)$ for $t\geq 0$ ) be

\mathscr{X}\coloneqq\prod\limits_{\begin{subarray}{~{}}i,i^{\prime}\in[I],\\ j\in[J],\\ k\in\mathscr{K}\end{subarray}}\left\{0,1,\ldots,\min\Bigl{\{}\lfloor\frac{C_{k}}{w_{j,k}}\rfloor,N_{i},N_{i^{\prime}}\Bigr{\}}\right\},

(4)

where $\prod$ represents the Cartesian product. Note that although $\mathscr{X}$ is larger than the set of possible values of $\bm{X}(t)$ , the process $\bm{X}(t)$ will be constrained by Constraints (2) and (3) in our optimization problem defined later in this section.

Upon each arrival of a user task, a scheduling policy selects a channel-SC tuple $(i,i^{\prime},k)$ to serve it. Define action variables $a_{i,i^{\prime},j,k}(\bm{x})\in\{0,1\}$ , $\bm{x}\in\mathscr{X}$ , as a function of the state for each $i,i^{\prime}\in[I]$ , $j\in[J]$ and $k\in\mathscr{K}$ . If $a_{i,i^{\prime},j,k}(\bm{x})=1$ , then two sub-channels of channels $i$ and $i^{\prime}$ and $w_{j,k}$ units of SC in group $k$ are selected to serve an incoming $j$ -task when $\bm{X}(t)=\bm{x}$ ; otherwise, the tuple of channels $i$ and $i^{\prime}$ and SC group $k$ is not selected. An incoming $j$ -task, when $\bm{X}(t)=\bm{x}$ , means the first $j$ -task coming after and excluding time $t$ , and the process $\bm{X}(t)$ is defined as left continuous in $t\geq 0$ . Let $\bm{a}(\bm{x})=(a_{i,i^{\prime},j,k}(\bm{x}):i,i^{\prime}\in[I],j\in[J],k\in\mathscr{K})$ , $\bm{x}\in\mathscr{X}$ . For all the action variables, it should further satisfy

\sum\limits_{\begin{subarray}{~{}}i,i^{\prime}\in[I]:\\ \mu_{i,j}\mu_{i^{\prime},j}>0\end{subarray}}\sum\limits_{k\in\mathscr{K}}a_{i,i^{\prime},j,k}(\bm{X}(t))\leq 1,~{}\forall j\in[J],t\geq 0.

(5)

In Constraint (5), when a $j$ -task coming at time $t$ , at most one tuple $(i,i^{\prime},k)$ will be selected, representing sub-channels for data transmission and SC units for computation to serve it. If no tuple $(i,i^{\prime},k)$ with $\mu_{i,j}>0$ and $\mu_{i^{\prime},j}>0$ is selected, then this task is blocked by the network and has to be processed or dropped by the associated MT. The blocking event happens due to the limited capacities of the channels in $[I]$ , although we assume an infinite computing capacity for the SC units in the cloud. We provide a diagram of the decision making process in Appendix A.

If, at time $t$ , a newly arrived $j$ -task is decided to be served by the channel-SC tuple $(i,i^{\prime},k)$ , then $X_{i,i^{\prime},j,k}(t)$ increments by one; if a $j$ -task is completed and leaves the system at time $t$ , then $X_{i,i^{\prime},j,k}(t)$ decrements by one.

For a newly arrived $j$ -task served by the channel-SC tuple $(i,i^{\prime},k)$ , its lifespan is considered as an independently and identically distributed random variable with mean $1/u_{j}(i,i^{\prime},k)$ , where $u_{j}(i,i^{\prime},k)$ is determined by the profiles of the associated MT and the intrinsic parameters of the communications scenarios such as the moving speed of the MT, the transmission rates of the connected sub-channels $\mu_{i,j}$ and $\mu_{i^{\prime},j}$ and the actual processing time of the task. Assume without loss of generality, for $i,i^{\prime}\in[I]$ and $k\in\mathscr{K}$ , if $\mu_{i,j}=0$ or $\mu_{i^{\prime},j}=0$ , then $u_{j}(i,i^{\prime},k)\equiv 0$ ; otherwise, $u_{j}(i,i^{\prime},k)>0$ .

We consider a realistic case with a large number of MTs that generate tasks independently and are classified into different classes. The arrivals of tasks in class $j\in[J]$ are considered to follow a Poisson process with the mean rate $\lambda_{j}\in\mathbb{R}_{+}$ , which is appropriate for a large number of independent MTs sharing similar stochastic properties [37]. We consider Poisson arrivals for the clarity of analytical descriptions, while our scheduling policy proposed in Section V is not limited to the Poisson case and applies to a wide range of practical scenarios. In Section VI, the effectiveness of the proposed policy is demonstrated through extensive simulation results with time-varying arrival rates that are able to capture scenarios with busy and idle periods of the network system.

In conjunction with the mean arrival rates and the expected lifespans of different tasks being served by different SCs through different channels, the action variables determine the transition rates of the system state, $\bm{X}(t)$ , at each time $t$ . They further affect the long-run probabilities of different states that incur different energy consumption rates.

A scheduling policy, denoted by $\phi$ , is determined by the action variables for all states $\bm{x}\in\mathscr{X}$ and selects a channel-SC tuple $(i,i^{\prime},k)$ upon each arrival of a user task. We add a superscript and rewrite the action variables as $a^{\phi}_{i,i^{\prime},j,k}(\bm{x})$ , which represents the action variables under policy $\phi$ . To ensure the fairness for all arrived tasks, in this paper, we consider those policies $\phi$ that rejects/blocks a new task if and only if there is no vacant channel-SC tuple to locate it. Let $\Phi$ represent the set of all such policies $\phi$ determined by action variables $a^{\phi}_{i,i^{\prime},j,k}(\bm{x})$ , $i,i^{\prime}\in[I]$ , $j\in[J]$ and $k\in\mathscr{K}$ . Similarly, since the stochastic process $\{\bm{X}(t),t\geq 0\}$ is conditioned on the underlying policy, we rewrite it as $\bm{X}^{\phi}(t)$ .

We aim to minimize the long-run average power consumption (energy consumption rate) of the network. That is,

\min\limits_{\phi\in\Phi}\lim\limits_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\int_{0}^{T}\sum\limits_{i,i^{\prime}\in[I]}\sum\limits_{j\in[J]}\sum\limits_{k\in[K]}\varepsilon_{k}w_{j,k}X^{\phi}_{i,i^{\prime},j,k}(t)\ dt\\ +\sum\nolimits_{k\in[K]}\varepsilon_{k}^{0}+\mathcal{E}^{\phi}_{c},

(6)

where the long-run average power consumption for computing tasks offloaded to the cloud is given by

\mathcal{E}^{\phi}_{c}\coloneqq\\ \sum\limits_{j\in[J]}\bar{\varepsilon}_{j}\lim\limits_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\int_{0}^{T}\sum\limits_{i,i^{\prime}\in[I]}u_{j}(i,i^{\prime},K+1)X^{\phi}_{i,i^{\prime},j,K+1}(t)\ dt,

(7)

subject to Constraints (5), (2) and (3).

The first and second items in Objective (6) are the long-run average operational and static power consumption of the SC groups at the edge network, respectively, and the last term, described in Eq. (7), stands for the long-run average power consumption for computing tasks offloaded to the cloud. Recall that $\varepsilon_{k}$ ( $k\in[K])$ is the energy consumption rate per SC unit of group $k$ , while $\bar{\varepsilon}_{j}$ is the energy consumption per $j$ -task that is processed and completed by computing components in the cloud network.

We refer to the minimization problem described in Eq. (6), Constraints (5), (2) and (3) as the task offloading scheduling problem (TOSP). The TOSP consists of $I^{2}J(K+1)$ parallel bandit processes, $\{X^{\phi}_{i,i^{\prime},j,k}(t),t\geq 0\}$ , each of which is a Markov decision process (MDP) with binary actions [41]. The parallel bandit processes are coupled by Constraints (5), (2) and (3). The TOSP is complicated by its large state space $\mathscr{X}$ , which increases exponentially in $I,J$ and $K$ and prevents conventional optimizers for MDP, such as value iteration and reinforcement learning, from being applied directly. The TOSP also extends the unrealistic assumption in [42] that considers only communication channels without any MEC servers or components. It follows with a substantially complicated problem with respect to both analytical and numerical analysis.

V Scheduling Policy

This section first presents the restless bandit technique, then describes our proposed scalable scheduling policy in a greedy manner and demonstrates its near-optimality.

The restless bandit technique in [12] provides a sensible way to decompose all the bandit processes coupled by the constraints involving both state and action variables into $I^{2}J(K+1)$ independent processes. The marginal cost of selecting a certain tuple of the SC and the communication channels $(i,i^{\prime},k)$ ( $i,i^{\prime}\in[I],k\in\mathscr{K}$ ) is then quantified by an offline-computed real number with significantly reduced computational complexity. Based on the marginal costs of all the possible channel-SC tuples, a heuristic scheduling policy can be proposed by prioritizing those with the least marginal costs. Following the tradition of the restless multi-armed bandit (RMAB) problem initially proposed in [11], we refer to the marginal cost as the index of the associated tuple of SC and communication channels. For any task in class $j$ , each of the index is offline-computed by optimizing the bandit process associated with the channel-SC tuple $(i,i^{\prime},k)$ , for which the computational complexity is only linear to $\min\{\lfloor C_{k}/w_{j,k}\rfloor,N_{i},N_{i^{\prime}}\}$ . The resulting scheduling policy is hence applicable to realistically large network systems without requesting excessively large computational or storage power. More importantly, under provided conditions, we prove that such a scalable scheduling policy approaches optimality as the size of the system tends to infinity. We refer to a detailed exposition in Proposition 1. Unlike the canonical resource allocation problem studied in [12], the TOSP focuses on the edge computing systems with mobile MTs and task handover between different CNs. In particular, for the TOSP, the marginal costs for the tuples involve unknown parameters and not directly computable.

V-A Randomization and Relaxation

Along with the Whittle relaxation technique [11], we randomize the action variables and relax Constraints (5), (2) and (3) to

\sum\limits_{\begin{subarray}{~{}}(i,i^{\prime},k)\in[I]^{2}\times\mathscr{K}:\\ \mu_{i,j}\mu_{i^{\prime},j}>0\end{subarray}}\lim\limits_{t\rightarrow\infty}\mathbb{E}\Bigl{[}a_{i,i^{\prime},j,k}^{\phi}\bigl{(}\bm{X}(t)\bigr{)}\Bigr{]}\leq 1,\forall j\in[J],

(8)

\sum\limits_{\begin{subarray}{~{}}(i,i^{\prime},j)\in[I]^{2}\times[J]:\\ \mu_{i,j}\mu_{i^{\prime},j}>0\end{subarray}}w_{j,k}\lim\limits_{t\rightarrow\infty}\mathbb{E}\Bigl{[}X^{\phi}_{i,i^{\prime},j,k}(t)\Bigr{]}\leq C_{k},\forall k\in[K],

(9)

and

\sum\limits_{i^{\prime}\in[I]}\sum\limits_{j\in[J]}\sum\limits_{k\in\mathscr{K}}\Bigl{(}\lim\limits_{t\rightarrow\infty}\mathbb{E}\bigl{[}X^{\phi}_{i,i^{\prime},j,k}(t)\bigr{]}+\lim\limits_{t\rightarrow\infty}\mathbb{E}\bigl{[}X_{i^{\prime},i,j,k}(t)\bigr{]}\Bigr{)}\\ \leq N_{i},\forall i\in[I],

(10)

respectively. For $i,i^{\prime}\in[I]$ , $j\in[J]$ , $k\in\mathscr{K}$ , define

\mathscr{X}_{i,i^{\prime},j,k}\coloneqq\Bigl{\{}0,1,\ldots,\min\bigl{\{}\lfloor C_{k}/w_{j,k}\rfloor,N_{i},N_{i^{\prime}}\bigr{\}}\Bigr{\}},

(11)

and for $x\in\mathscr{X}_{i,i^{\prime},j,k}$ , define

\alpha^{\phi}_{i,i^{\prime},j,k}(x)\coloneqq\lim_{t\rightarrow\infty}\mathbb{E}[a^{\phi}_{i,i^{\prime},j,k}(\bm{X}^{\phi}(t))|X^{\phi}_{i,i^{\prime},j,k}(t)=x],

(12)

which takes values in $[0,1]$ . Let $\bm{\alpha}^{\phi}_{i,i^{\prime},j,k}\coloneqq(\alpha^{\phi}_{i,i^{\prime},j,k}(x):x\in\mathscr{X}_{i,i^{\prime},j,k})$ , and let $\tilde{\Phi}$ represent the set of all the policies determined by the randomized action variables $\bm{\alpha}^{\phi}_{i,i^{\prime},j,k}$ for all $i,i^{\prime}\in[I]$ , $j\in[J]$ , and $k\in\mathscr{K}$ . We refer to the problem

\min\limits_{\phi\in\tilde{\Phi}}\lim\limits_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\int_{0}^{T}\sum\limits_{i,i^{\prime}\in[I]}\sum\limits_{j\in[J]}\sum\limits_{k\in[K]}\varepsilon_{k}w_{j,k}X^{\phi}_{i,i^{\prime},j,k}(t)~{}dt\\ +\sum\nolimits_{k\in[K]}\varepsilon_{k}^{0}+\mathcal{E}^{\phi}_{c},

(13)

subject to Constraints (8), (9) and (10) as the relaxed problem. Objective (13) is derived from Objective (6) by replacing $\Phi$ with $\tilde{\Phi}$ . Since $\Phi\subset\tilde{\Phi}$ and Constraints (5), (2) and (3) are more stringent than Constraints (8), (9) and (10), any policy applicable to the TOSP will also be applicable to the relaxed problem; while, a policy for the relaxed problem is not necessarily applicable to the TOSP. It follows that the relaxed problem achieves a lower bound for the minimum of TOSP.

Consider a stable system with existing stationary distribution $\boldsymbol{\pi}^{\phi}\in[0,1]^{|\mathscr{X}|}$ in the long-run case under a given policy $\phi\in\tilde{\Phi}$ , we write the dual function of the relaxed problem as

L(\boldsymbol{\nu},\boldsymbol{\gamma},\bm{\eta})\\ =\min\limits_{\phi\in\tilde{\Phi}}\biggl{[}\sum\limits_{i,i^{\prime}\in[I]}\sum\limits_{j\in[J]}\sum\limits_{k\in[K]}\varepsilon_{k}w_{j,k}\sum_{x\in\mathscr{X}_{i,i^{\prime},j,k}}\pi^{\phi}_{i,i^{\prime},j,k}(x)x\\ +\sum\limits_{k\in[K]}\varepsilon_{k}^{0}\\ +\sum\limits_{i,i^{\prime}\in[I]}\sum\limits_{j\in[J]}\bar{\varepsilon}_{j}u_{j}(i,i^{\prime},K+1)\sum\limits_{x\in\mathscr{X}_{i,i^{\prime},j,K+1}}\pi^{\phi}_{i,i^{\prime},j,K+1}(x)x\\ +\sum\limits_{j\in[J]}\nu_{j}\Bigl{(}\sum\limits_{\begin{subarray}{~{}}(i,i^{\prime},k)\in[I]^{2}\times\mathscr{K}:\\ u_{j}(i,i^{\prime},k)>0\end{subarray}}\sum\limits_{x\in\mathscr{X}_{i,i^{\prime},j,k}}\pi^{\phi}_{i,i^{\prime},j,k}(x)\alpha^{\phi}_{i,i^{\prime},j,k}(x)-1\Bigr{)}\\ +\sum\limits_{k\in[K]}\gamma_{k}\Bigl{(}\sum\limits_{\begin{subarray}{~{}}(i,i^{\prime},j)\in[I]^{2}\times[J]:\\ u_{j}(i,i^{\prime},k)>0\end{subarray}}w_{j,k}\sum\limits_{x\in\mathscr{X}_{i,i^{\prime},j,k}}\pi^{\phi}_{i,i^{\prime},j,k}(x)x-C_{k}\Bigr{)}\\ +\sum\limits_{i\in[I]}\eta_{i}\Bigl{(}\sum\limits_{j\in[J]}\sum\limits_{i^{\prime}\in[I]}\sum\limits_{k\in\mathscr{K}}\bigl{(}\sum\limits_{x\in\mathscr{X}_{i,i^{\prime},j,k}}\pi^{\phi}_{i,i^{\prime},j,k}(x)x\\ +\sum\nolimits_{x\in\mathscr{X}_{i^{\prime},i,j,k}}\pi^{\phi}_{i^{\prime},i,j,k}(x)x\bigr{)}-N_{i}\Bigr{)}\biggr{]},

(14)

where $\boldsymbol{\nu}\in\mathbb{R}_{0}^{J}$ , $\boldsymbol{\gamma}\in\mathbb{R}_{0}^{K}$ , and $\bm{\eta}\in\mathbb{R}_{0}^{I}$ are the Lagrange multipliers for Constraints (8), (9), and (10), respectively. Following the Whittle relaxation [11] and the restless bandit technique for resource allocation [12], the minimization in Eq. (14) can be decomposed into $I^{2}J(K+1)$ independent sub-problems that are, for $(i,i^{\prime},j,k)\in[I]^{2}\times[J]\times[K]$ ,

L_{i,i^{\prime},j,k}(\nu_{j},\gamma_{k},\eta_{i},\eta_{i^{\prime}})\coloneqq\min\nolimits_{\phi\in\tilde{\Phi}}L^{\phi}_{i,i^{\prime},j,k}(\nu_{j},\gamma_{k},\eta_{i},\eta_{i^{\prime}})\\ =\min\nolimits_{\phi\in\tilde{\Phi}}\varepsilon_{k}w_{j,k}\sum\nolimits_{x\in\mathscr{X}_{i,i^{\prime},j,k}}\pi^{\phi}_{i,i^{\prime},j,k}x\\ +\nu_{j}\sum\nolimits_{x\in\mathscr{X}_{i,i^{\prime},j,K+1}}\pi^{\phi}_{i,i^{\prime},j,k}(x)\alpha^{\phi}_{i,i^{\prime},j,k}(x)\\ +\gamma_{k}w_{j,k}\sum\nolimits_{x\in\mathscr{X}_{i,i^{\prime},j,k}}\pi^{\phi}_{i,i^{\prime},j,k}(x)x\\ +(\eta_{i}+\eta_{i^{\prime}})\sum\nolimits_{x\in\mathscr{X}_{i,i^{\prime},j,k}}\pi^{\phi}_{i,i^{\prime},j,k}(x)x,

(15)

and, for $(i^{\prime},i,j)\in I^{2}\times[J]$ ,

L_{i,i^{\prime},j,K+1}(\nu_{j},\gamma_{K+1},\eta_{i},\eta_{i^{\prime}})\\ \coloneqq\min_{\phi\in\tilde{\Phi}}L^{\phi}_{i,i^{\prime},j,K+1}(\nu_{j},\gamma_{K+1},\eta_{i},\eta_{i^{\prime}})\\ =\min\nolimits_{\phi\in\tilde{\Phi}}\bar{\varepsilon}_{j}u_{j}(i,i^{\prime},j,K+1)\sum\limits_{x\in\mathscr{X}_{i,i^{\prime},j,K+1}}\pi^{\phi}_{i,i^{\prime},j,K+1}(x)x\\ +\nu_{j}\sum\nolimits_{x\in\mathscr{X}_{i,i^{\prime},j,K+1}}\pi^{\phi}_{i,i^{\prime},j,K+1}(x)\alpha^{\phi}_{i,i^{\prime},j,K+1}(x)\\ +(\eta_{i}+\eta_{i^{\prime}})\sum\nolimits_{x\in\mathscr{X}_{i,i^{\prime},j,K+1}}\pi^{\phi}_{i,i^{\prime},j,K+1}(x)x,

(16)

where recall that each policy $\phi\in\tilde{\Phi}$ is determined by the action variables $\alpha^{\phi}_{i,i^{\prime},j,k}(x)$ for $x\in\mathscr{X}_{i,i^{\prime},j,k}$ . In particular,

L(\boldsymbol{\nu},\boldsymbol{\gamma},\bm{\eta})=\sum\limits_{i,i^{\prime}\in[I]}\sum\limits_{j\in[K]}\sum\limits_{k\in\mathscr{K}}L_{i,i^{\prime},j,k}(\nu_{j},\gamma_{k},\eta_{i},\eta_{i^{\prime}})\\ +\sum\limits_{k\in[K]}\varepsilon_{k}^{0}-\sum\limits_{j\in[J]}\nu_{j}-\sum\limits_{k\in[K]}C_{k}\gamma_{k}-\sum\limits_{i\in[I]}\eta_{i}.

(17)

In this context, the computational complexity of each sub-problem is linear to the size of $\mathscr{X}_{i,i^{\prime},j,k}$ - alternatively, $\min\bigl{\{}\lfloor C_{k}/w_{j,k}\rfloor,N_{i},N_{i^{\prime}}\bigr{\}}$ . Together with the independence between those sub-problems, the computational complexity to achieve the minimum in Eq. (14) is linear in $I^{2}$ , $J$ and $K$ . We have the following corollaries of [12, Proposition 1].

Corollary 1.

For $i,i^{\prime}\in[I]$ , $j\in[J]$ , and $k\in\mathscr{K}$ with $\mu_{i,j}\mu_{i^{\prime},j}>0$ and given $\nu_{j},\gamma_{k},\eta_{i},\eta_{i^{\prime}}\in\mathbb{R}_{0}$ , a policy $\phi$ determined by the action variables $\bm{\alpha}^{\phi}_{i,i^{\prime},j,k}\in[0,1]^{|\mathscr{X}_{i,i^{\prime},j,k}|}$ is optimal to the sub-problem associated with $(i,i^{\prime},j,k)$ , if, for any $x\in\mathscr{X}_{i,i^{\prime},j,k}$ and $k\in[K]$ ,

\alpha^{\phi}_{i,i^{\prime},j,k}(x)\begin{cases}1,&\text{if }\nu_{j}>\psi_{j}(i,i^{\prime},k),\\ \in[0,1],&\text{if }\nu_{j}=\psi_{j}(i,i^{\prime},k),\\ 0,&\text{otherwise,}\end{cases}

(18)

where

\psi_{j}(i,i^{\prime},k)\coloneqq\frac{\lambda_{j}}{u_{j}(i,i^{\prime},k)}\varepsilon_{k}w_{j,k}\mathds{1}\{k<K+1\}\\ +\lambda_{j}\bar{\varepsilon}_{j}\mathds{1}\{k=K+1\}\\ +\bigl{(}1+\frac{\lambda_{j}}{u_{j}(i,i^{\prime},k)}\bigr{)}\bigl{(}w_{j,k}\gamma_{k}\mathds{1}\{k<K+1\}+\eta_{i}+\eta_{i^{\prime}}\bigr{)}

(19)

Corollary 1 indicates the existence of a threshold-style policy, satisfying Eq. (18), that is optimal to the sub-problem associated with $(i,i^{\prime},j,k)$ . Although there also exists a threshold-style policy that achieves the minimum of the sub-problems, the minimum of the sub-problems is only a lower bound of the minimum of the original TOSP described in Objective (6), Constraints (5), (2) and (3), and the threshold-style policy described in Eq. (18) is in general not applicable to the TOSP. The key is to establish a connection between the sub-problems and the TOSP, and then we can translate the threshold-style policy to those applicable, scalable and near-optimal to the original TOSP.

Following the tradition of the restless bandit studies in the past decades, we refer to the real number $\psi_{j}(i,i^{\prime},k)$ as the index associated with the bandit process $\{X^{\phi}_{i,i^{\prime},j,k}(t),t\geq 0\}$ , which intuitively represents the marginal cost of selecting the tuple $(i,i^{\prime},k)$ to serve a $j$ -task. For the threshold-style, optimal policy, tuples $(i,i^{\prime},k)$ with smaller indices are prioritized to have $a^{\phi}_{i,i^{\prime},j,k}(X^{\phi}_{i,i^{\prime},j,k}(t))=1$ than those with larger indices.

In [11], Whittle proposed the well-known Whittle index policy that always prioritizes bandit processes with the highest/lowest indices and conjectured asymptotic optimality of the Whittle index policy to the original problem. In subsequent studies such as [12, 43], the Whittle index policy has been proved to be asymptotically optimal in special cases and/or numerically demonstrated to be near-optimal.

However, unlike the past work, exempli gratia, [42, 43], the index $\psi_{j}(i,i^{\prime},k)$ described in Eq. (19) is not directly computable. It is dependent on the unknown multipliers $\gamma_{k}$ , $\eta_{i}$ and $\eta_{i^{\prime}}$ due to the inevitable capacity constraints described in Constraints (2) and (3). These capacity constraints substantially complicate the analysis of the TOSP, and, more importantly, prevent existing theorems related to bounded performance degradation from being directly applied to the TOSP.

In Section V-B, we will discuss the methods to approximate the indices. Based on the approximated indices (representing the marginal costs), we propose heuristic policies applicable and scalable to the original TOSP, and demonstrate their effectiveness with respect to energy efficiency in Section VI.

V-B Highest Energy Efficiency with Adjusted Capacity Coefficients (HEE-ACC)

Input : Indices

\psi_{j}(i,i^{\prime},k)

for all tuples

[I]^{2}\times[J]\times\mathscr{K}

, adjusted capacity coefficients

\boldsymbol{\gamma}

and

\bm{\eta}

, and system state

\bm{X}^{\text{HEE-ACC}}(t)

Output : Scheduling actions

\boldsymbol{a}^{\text{HEE-ACC}}\bigl{(}\bm{X}^{\text{HEE-ACC}}(t)\bigr{)}\coloneqq\Bigl{(}a^{\text{HEE-ACC}}_{i,i^{\prime},j,k}\bigl{(}\bm{X}^{\text{HEE-ACC}}(t)\bigr{)}:(i,i^{\prime},j,k)\in[I]^{2}\times[J]\times\mathscr{K}\Bigr{)}

2 Function HEE-ACC

3 Initialize

\boldsymbol{a}^{\text{HEE-ACC}}\bigl{(}\bm{X}^{\text{HEE-ACC}}(t)\bigr{)}\leftarrow\bm{0}

;

4 For

j\in[J]

, build the minimum heap

\mathcal{H}_{j}

of all the tuples

(i,i^{\prime},k)\in[I]^{2}\times\mathscr{K}

with

u_{j}(i,i^{\prime},k)>0

according to their indices

\psi_{j}(i,i^{\prime},k)

;

/* Tie cases are broken by selecting the tuples with the smallest expected lifespans. */

5 ;

6 for $\forall j\in[J]$ do

(\bar{i},\bar{i}^{\prime},\bar{k})\leftarrow

the root node of the minimum heap

\mathscr{H}_{j}

;

8 while $(\bar{i},\bar{i}^{\prime},\bar{k})\notin\mathscr{T}_{j}\bigl{(}\bm{X}^{\text{HEE-ACC}}(t)\bigr{)}$ do

\mathscr{H}_{j}

pop heap;

(\bar{i},\bar{i}^{\prime},\bar{k})\leftarrow

the root node of the updated

\mathscr{H}_{j}

;

a^{\text{HEE-ACC}}_{\bar{i},\bar{i}^{\prime},j,\bar{k}}\bigl{(}\bm{X}^{\text{HEE-ACC}}(t)\bigr{)}\leftarrow 1

;

14 return

\boldsymbol{a}^{\text{HEE-ACC}}\bigl{(}\bm{X}^{\text{HEE-ACC}}(t)\bigr{)}

;

Algorithm 1 Implementing HEE-ACC

Let $\bm{\nu}=\bm{\nu}^{\textup{*}}$ , $\boldsymbol{\gamma}=\boldsymbol{\gamma}^{\textup{*}}$ and $\bm{\eta}=\bm{\eta}^{\textup{*}}$ represent the optimal dual variables of the relaxed problem described in Objective (13), Constraints (8), (9) and (10). Based on [12, Theorem EC.1] and Corollary 1, we have the following proposition.

Proposition 1.

If, for $\bm{\nu}=\bm{\nu}^{\textup{*}}$ , $\boldsymbol{\gamma}=\boldsymbol{\gamma}^{\textup{*}}$ and $\bm{\eta}=\bm{\eta}^{\textup{*}}$ , a policy $\phi\in\tilde{\Phi}$ satisfying Eq. (18) is optimal to the relaxed problem described in Objective (13), Constraints (8), (9) and (10), then there exists a policy $\varphi$ proposed based on the indices $\psi_{j}(i,i^{\prime},k)$ with plugged in $\boldsymbol{\gamma}=\boldsymbol{\gamma}^{\textup{*}}$ and $\bm{\eta}=\bm{\eta}^{\textup{*}}$ such that $\varphi$ approaches optimality of TOSP (described in Objective (6), Constraints (5), (2) and (3)) as $\lambda_{j}$ ( $j\in[J]$ ), $C_{k}$ ( $k\in[K]$ ), and $N_{i}$ ( $i\in[I]$ ) tend to infinity proportionately.

The proof of Proposition 1 is provided in Appendix B. We say such a policy $\varphi$ is asymptotically optimal to TOSP. Asymptotic optimality is appropriate for large-scale systems with highly dense and heterogeneous mobile users, MEC servers/service components, base stations, access points, et cetera.

Recall that, from Corollary 1, the threshold-style policy $\phi$ satisfying Eq. (18) minimizes the right-hand side of the dual function in Eq. (14) (that is, minimize all the sub-problems), but such a policy is in general not applicable to the TOSP, and this minimum is only a lower bound of the minimum of the TOSP. The threshold-style policy $\phi$ reveals intrinsic features of the bandit processes and quantifies them through the indices, $\psi_{j}(i,i^{\prime},k)$ , representing the marginal costs of selecting certain bandit processes. These indices can then be utilized to construct effective policices, such as the $\varphi$ in Proposition 1, applicable to the original TOSP. However, the exact values of the indices that are able to lead to asymptotic optimality remain an open question because of the unknown $\boldsymbol{\gamma}$ and $\bm{\eta}$ .

Intuitively, the Lagrange multipliers $\boldsymbol{\gamma}$ and $\bm{\eta}$ represent the marginal budgets for running out the network capacities described in Constraints (9) and (10), respectively. When an SC or a communication channel exhibits heavy traffic, it should become less popular to incoming tasks, which can be adjusted by the attached multipliers $\boldsymbol{\gamma}$ or $\bm{\eta}$ to the indices. In other words, the multiplier associated with a capacity constraint of an SC or a channel with heavy traffic is expected to be large; while for those SCs or channels with light traffic or sufficiently large capacities, the corresponding multipliers should remain zero. We refer to $\boldsymbol{\gamma}\in\mathbb{R}_{0}^{K}$ and $\bm{\eta}\in\mathbb{R}_{0}^{I}$ as the capacity coefficients attached to the indices.

On the other hand, observing $\psi_{j}(i,i^{\prime},k)$ in Eq. (19), apart from the items related to $\boldsymbol{\gamma}$ and $\bm{\eta}$ , the remaining item is, for $i,i^{\prime}\in[I]$ , $j\in[J]$ , and $k\in\mathscr{K}$

\mathcal{R}_{j}(i,i^{\prime},k)=\begin{cases}\frac{\lambda_{j}}{u_{j}(i,i^{\prime},k)}\varepsilon_{k}w_{j,k},&\text{if }k\in[K],\\ \lambda_{j}\bar{\varepsilon}_{j},&\text{otherwise},\end{cases}

(20)

which is in fact the expected power consumption per unit service rate on SC $k$ contributed by the $j$ -tasks, or equivalently the reciprocal of the achieved service rate per unit power consumption. We refer to the achieved service rate per unit power consumption of a certain SC as its energy efficiency. Less marginal costs of selecting certain tuples $(i,i^{\prime},k)$ imply SCs with higher energy efficiencies. The phenomenon coincides with a special case of the RMAB process studied in [43] but further considers the heterogeneous task requirements and complex capacity constraints over heterogeneous network resources.

In this context, for the TOSP described in Objective (6), Constraints (5), (2) and (3), we propose a policy that always prioritizes tuples $(i,i^{\prime},k)$ with higher $\psi_{j}(i,i^{\prime},k)$ for the $j$ -tasks, for which the capacity coefficients $\boldsymbol{\gamma}$ and $\bm{\eta}$ are given a prior through sensible algorithms. We refer to this policy as the Highest Energy Efficiency with Adjusted Capacity Coefficients (HEE-ACC). More precisely, for $j\in[J]$ and $\bm{x}\in\mathscr{X}$ , define a subset of tuples $\mathscr{T}_{j}(\bm{x})\subset[I]^{2}\times\mathscr{K}$ such that, for any $(i,i^{\prime},k)\in\mathscr{T}_{j}(\bm{x})$ , $u_{j}(i,i^{\prime},k)>0$ ,

\sum\limits_{i_{1},i^{\prime}_{1}\in[I]}\sum\limits_{j_{1}\in[J]}x_{i_{1},i^{\prime}_{1},j_{1},k}+w_{j,k}\leq C_{k},

(21)

\sum\limits_{i_{1}\in[I]}\sum\limits_{j_{1}\in[J]}\sum\limits_{k_{1}\in\mathscr{K}}\bigl{(}x_{i,i_{1},j_{1},k_{1}}+x_{i_{1},i,j_{1},k_{1}}\bigr{)}+1\leq N_{i},

(22)

and

\sum\limits_{i_{1}\in[I]}\sum\limits_{j_{1}\in[J]}\sum\limits_{k_{1}\in\mathscr{K}}\bigl{(}x_{i^{\prime},i_{1},j_{1},k_{1}}+x_{i_{1},i^{\prime},j_{1},k_{1}}\bigr{)}+1\leq N_{i^{\prime}}.

(23)

The set $\mathscr{T}_{j}(\bm{x})$ is the set of available tuples $(i,i^{\prime},j)$ that can serve a new $j$ -task with complied capacity constraints (2) and (3) when $\bm{X}^{\phi}(t)=\bm{x}$ . The action variables for HEE-ACC are given by

a^{\text{HEE-ACC}}_{i,i^{\prime},j,k}\bigl{(}\bm{X}^{\text{HEE-ACC}}(t)\bigr{)}\\ =\begin{cases}1,&\text{if }(i,i^{\prime},k)=\arg\min\nolimits_{(i,i^{\prime},k)\in\mathscr{T}_{j}(\bm{X}^{\text{HEE-ACC}}(t))}\psi_{j}(i,i^{\prime},k),\\ 0,&\text{otherwise},\end{cases}

where if $\arg\min$ returns more than one tuples $(i,i^{\prime},k)$ , then we select the one with the highest $u_{j}(i,i^{\prime},k)$ . In Algorithm 1, as an example, we provide the pseudo-code of implementing HEE-ACC. Note that Algorithm 1 is a possible but not unique way of implementing HEE-ACC. The computational complexity of implementing HEE-ACC is at most linear logarithmic to the number of possible tuples $|\{(i,i^{\prime},k)\in I^{2}\times\mathscr{K}|u_{j}(i,i^{\prime},k)>0\}|$ , mainly dependent on the complexity of finding the tuple with smallest index and checking tuples’ availability with respect to the capacity constraints.

It remains to adjust the values of the capacity coefficients $\boldsymbol{\gamma}$ and $\bm{\eta}$ , which should be adversely affected by the SCs and channels’ remaining capacities.

V-B1 HEE-ACC-zero

In a simple case with sufficiently large capacities, we can directly set $\boldsymbol{\gamma}=\bm{0}$ and $\bm{\eta}=\bm{0}$ , for which HEE-ACC is equivalent to a policy that always selects the most energy-efficient channel-SC tuples. We refer to such a policy as the HEE-ACC-zero policy, which implies the zero capacity coefficients. Given the simplicity of the HEE-ACC-zero policy, from [12, Corollary EC.1], when the capacity constraint over an SC (in Constraint (2)) or a channel (in Constraint (3)) is dominant, HEE-ACC-zero is near-optimal - it approaches optimality as the problem size becomes sufficiently large.

Mathematically, we say SC $k\in[K]$ is dominant for the resource tuple $(i,i^{\prime},k)$ if $N_{i},N_{i^{\prime}}\rightarrow+\infty$ . Similarly, a channel $i$ (or $i^{\prime}\in[I]$ ) is dominant when $C_{k}\rightarrow+\infty$ and $N_{i^{\prime}}\rightarrow+\infty$ (or $N_{i}\rightarrow+\infty)$ . We have the following proposition.

Proposition 2.

If, for each $j\in[J]$ and resource tuple $(i,i^{\prime},k)\in[I]^{2}\times[K]$ with $\mu_{j}(i,i^{\prime},k)>0$ , the SC $k$ , channel $i$ , or channel $i^{\prime}$ is dominant, the blocking probabilities of all task classes $j\in[J]$ are positive in the asymptotic regime, and $w_{j,k}(1+\frac{\lambda_{j}}{\mu_{j}(i,i^{\prime},k)})$ is a constant for all $j\in[J]$ and tuples $(i,i^{\prime},k)\in[I]^{2}\times[K]$ with $\mu_{j}(i,i^{\prime},k)>0$ , then HEE-ACC-zero approaches optimality of TOSP (described in Objective (6), Constraints (5), (2) and (3)) as $\lambda_{j}$ ( $j\in[J]$ ) and the capacities of the dominant SCs or channels of all the resource tuples tend to infinity proportionately.

The proof of Proposition 2 is provided in Appendix C. Proposition 2 imposes a hypothesis, requesting the existence of a dominant SC $k\in[K]$ or channel $i\in[I]$ for each resource tuple such that $C_{k}$ or $N_{i}$ is significantly less than the capacities of the non-dominant SC and/or channel(s) of the same tuple. It indicates a situation where the capacity constraint associated with the dominant SC group $k$ (in Constraint (2)) or channel $i$ (in Constraint (3)) frequently achieves equality while there is negligible chance to invoke the capacity constraints for the non-dominant SC groups and channels. Note that, for a resource tuple $(i,i^{\prime},k)$ , the definition about dominant SC $k$ (or channel $i$ ) in Proposition 2, i.e., $N_{i},N_{i^{\prime}}\rightarrow+\infty$ (or $C_{k},N_{i^{\prime}}\rightarrow+\infty$ ), are stated for rigorous descriptions in mathematics, which, in practice, can be interpreted as a case with $C_{k}\ll N_{i},N_{i^{\prime}}$ (or $N_{i}\ll C_{k},N_{i^{\prime}}$ ).

Although, for the more general case with no dominant capacity constraint, HEE-ACC-zero neglects the effects of traffic densities imposed on different SC groups and channels, it is neat with proven near-optimality in special cases and is easy to implement.

Input : The learned capacity coefficients

\boldsymbol{\gamma}

and

\bm{\eta}

at time

t

, variables

\bm{M}(t)

and system state

\bm{X}^{\text{HEE-ALRN}}(t)

Output : Scheduling actions

\boldsymbol{a}^{\text{HEE-ALRN}}\bigl{(}\bm{X}^{\text{HEE-ALRN}}(t)\bigr{)}\coloneqq\Bigl{(}a^{\text{HEE-ALRN}}_{i,i^{\prime},j,k}\bigl{(}\bm{X}^{\text{HEE-ALRN}}(t)\bigr{)}:(i,i^{\prime},j,k)\in[I]^{2}\times[J]\times\mathscr{K}\Bigr{)}

3Function HEE-ALRN

4 Initialize

\boldsymbol{a}^{\text{HEE-ALRN}}\bigl{(}\bm{X}^{\text{HEE-ALRN}}(t)\bigr{)}\leftarrow\bm{0}

;

5 For

j\in[J]

, build the minimum heap

\mathcal{H}_{j}

of all the tuples

(i,i^{\prime},k)\in[I]^{2}\times\mathscr{K}

with

u_{j}(i,i^{\prime},k)>0

according to their indices

\psi_{j}(i,i^{\prime},k)

with the input

\boldsymbol{\gamma}

and

\bm{\eta}

6 for $\forall j\in[J]$ do

(\bar{i},\bar{i}^{\prime},\bar{k})\leftarrow

the root node of the minimum heap

\mathscr{H}_{j}

;

8 if $(\bar{i},\bar{i}^{\prime},\bar{k})\in\mathscr{T}_{j}\bigl{(}\bm{X}^{\text{HEE-ALRN}}(t)\bigr{)}$ then

9 if $k\in[K]$ then

\gamma_{\bar{k}}\leftarrow\max\{0,\gamma_{\bar{k}}-\Delta_{\gamma}\}

;

\eta_{\bar{i}}\leftarrow\max\{0,\eta_{\bar{i}}-\Delta_{\eta}\}

;

\eta_{\bar{i}^{\prime}}\leftarrow\max\{0,\eta_{\bar{i}^{\prime}}-\Delta_{\eta}\}

;

15 while $(\bar{i},\bar{i}^{\prime},\bar{k})\notin\mathscr{T}_{j}\bigl{(}\bm{X}^{\text{HEE-ALRN}}(t)\bigr{)}$ do

\mathscr{H}_{j}

pop heap;

(\bar{i},\bar{i}^{\prime},\bar{k})\leftarrow

the root node of the updated

\mathscr{H}_{j}

;

18 if $(\bar{i},\bar{i}^{\prime},\bar{k})\in\mathscr{T}_{j}\bigl{(}\bm{X}^{\text{HEE-ALRN}}(t)\bigr{)}$ then

19 if $k\in[K]$ then

\gamma_{\bar{k}}\leftarrow\max\{0,\gamma_{\bar{k}}-\Delta_{\gamma}\}

;

\eta_{\bar{i}}\leftarrow\max\{0,\eta_{\bar{i}}-\Delta_{\eta}\}

;

\eta_{\bar{i}^{\prime}}\leftarrow\max\{0,\eta_{\bar{i}^{\prime}}-\Delta_{\eta}\}

;

25 else

26 If (21), (22) and/or (23) is violated by setting

k=\bar{k}

i=\bar{i}

and

i^{\prime}=\bar{i}^{\prime}

, then increment

M_{\bar{k}}(t)

M_{K+\bar{i}}(t)

and/or

M_{K+\bar{i}^{\prime}}(t)

by one.;

29 If the increment adjustment is triggered by the updated

M_{\bar{k}}(t)

M_{K+\bar{i}}(t)

M_{K+\bar{i}^{\prime}}(t)

, then, based on the sub-gradients described in (25) and (26), update the values of

\bm{M}(t)

\boldsymbol{\gamma}

and

\bm{\eta}

accordingly.;

a^{\text{HEE-ALRN}}_{\bar{i},\bar{i}^{\prime},j,\bar{k}}\bigl{(}\bm{X}^{\text{HEE-ALRN}}(t)\bigr{)}\leftarrow 1

;

32 return

\boldsymbol{a}^{\text{HEE-ALRN}}\bigl{(}\bm{X}^{\text{HEE-ALRN}}(t)\bigr{)}

;

Algorithm 2 Implementing HEE-ALRN

V-B2 HEE-ALRN

Recall that the minimum of the sub-problems is a lower bound of the minimum of the relaxed TOSP problem. As stated in Proposition 1, when the relaxed problem achieves strong duality in the asymptotic regime, there exists a policy prioritizes tuples $(i,i^{\prime},k)$ with the lowest $\psi_{j}(i,i^{\prime},k)$ that is asymptotically optimal to TOSP. In particular, in this case with achieved asymptotic optimality, the capacity coefficients $\boldsymbol{\gamma}$ and $\bm{\eta}$ should be equal to the optimal dual variables $\boldsymbol{\gamma}^{\textup{*}}$ and $\bm{\eta}^{\textup{*}}$ of the relaxed problem in the asymptotic regime. Nonetheless, due to the complexity of the relaxed problem in both the asymptotic and non-asymptotic regime, the strong duality and the exact values of the dual variables remain open questions. Here, we aim at a diminishing gap between the performance of the sub-problems and the relaxed problem and approximate the values of $\boldsymbol{\gamma}^{\textup{*}}$ and $\bm{\eta}^{\textup{*}}$ through a learning method with closed-loop feedback.

To emphasize the effects of different $\boldsymbol{\gamma}$ and $\bm{\eta}$ , we rewrite HEE-ACC as HEE-ACC $(\boldsymbol{\gamma},\bm{\eta})$ . We explore the values of $\boldsymbol{\gamma}\in\mathbb{R}_{0}^{K}$ and $\bm{\eta}\in\mathbb{R}_{0}^{I}$ while exploiting HEE-ACC $(\boldsymbol{\gamma},\bm{\eta})$ with the most updated capacity coefficients in TOSP. We iteratively learn the values of $\boldsymbol{\gamma}\in\mathbb{R}_{0}^{K}$ and $\bm{\eta}\in\mathbb{R}_{0}^{I}$ in the vein of the well-known gradient descent algorithm [44] based on the observed traffic tensity upon the SCs and channels.

In particular, let $\Phi(\boldsymbol{\nu},\boldsymbol{\gamma},\bm{\eta})$ represent the set of all policies satisfying (18) with given $\boldsymbol{\nu}\in\mathbb{R}_{0}^{J}$ , $\boldsymbol{\gamma}\in\mathbb{R}_{0}^{K}$ and $\bm{\eta}\in\mathbb{R}_{0}^{I}$ . Define

\boldsymbol{\nu}^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta})\coloneqq\sup\Bigl{\{}\boldsymbol{\nu}\in\mathbb{R}_{0}^{J}\Bigl{|}\\ \exists\phi\in\Phi(\boldsymbol{\nu},\boldsymbol{\gamma},\bm{\eta}),~{}\text{Constraint~{}\eqref{eqn:action_constraint:relax} is satisfied. }\Bigr{\}},

(24)

and let $\varphi^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta})$ represent the only policy in $\Phi\bigl{(}\boldsymbol{\nu}^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta}),\boldsymbol{\gamma},\bm{\eta}\bigr{)}$ . We provide the pseudo-code of computing $\boldsymbol{\nu}^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta})$ and the action variables for $\varphi^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta})$ with given $\boldsymbol{\gamma}\in\mathbb{R}_{0}^{K}$ and $\bm{\eta}\in\mathbb{R}_{0}^{I}$ in Appendix D.

Since dual function $L(\boldsymbol{\nu},\boldsymbol{\gamma},\bm{\eta})$ is continuous, piece-wise linear and concave in $\boldsymbol{\nu}$ , $\boldsymbol{\gamma}$ and $\boldsymbol{\eta}$ , $-L(\boldsymbol{\nu},\boldsymbol{\gamma},\bm{\eta})$ is sub-differentiable with existing sub-gradients at $\gamma_{k}=\gamma$ for $k\in[K]$ ,

\Lambda_{k}^{\gamma}(\boldsymbol{\nu},\boldsymbol{\gamma},\bm{\eta})\coloneqq\lim\limits_{\gamma_{k}\downarrow\gamma}\nabla_{\gamma_{k}}L(\boldsymbol{\nu},\boldsymbol{\gamma},\bm{\eta})\\ =\sum\nolimits_{\begin{subarray}{~{}}(i,i^{\prime},j)\in[I]^{2}\times[J]:\\ u_{j}(i,i^{\prime},k)\end{subarray}}w_{j,k}\sum\limits_{x\in\mathscr{X}_{i,i^{\prime},j,k}}\pi^{\varphi(\boldsymbol{\nu},\boldsymbol{\gamma},\bm{\eta})}_{i,i^{\prime},j,k}(x)x-C_{k},

(25)

where $\varphi(\boldsymbol{\nu},\boldsymbol{\gamma},\bm{\eta})$ is a policy in $\Phi(\boldsymbol{\nu},\boldsymbol{\gamma},\bm{\eta})$ , and at $\eta_{i}=\eta$ for $i\in[I]$ ,

\Lambda_{i}^{\eta}(\boldsymbol{\nu},\boldsymbol{\gamma},\bm{\eta})\coloneqq\lim\limits_{\eta_{i,j,\ell}\downarrow\eta}\nabla_{\eta_{i}}L(\boldsymbol{\nu},\boldsymbol{\gamma},\bm{\eta})\\ =\sum\limits_{(i^{\prime},j,k)\in[I]\times[J]\times\mathscr{K}}\Bigl{(}\sum\limits_{x\in\mathscr{X}_{i,i^{\prime},j,k}}\pi^{\varphi(\boldsymbol{\nu},\boldsymbol{\gamma},\bm{\eta})}_{i,i^{\prime},j,k}(x)x\\ +\sum\limits_{x\in\mathscr{X}_{i^{\prime},i,j,k}}\pi^{\varphi(\boldsymbol{\nu},\boldsymbol{\gamma},\bm{\eta})}_{i^{\prime},i,j,k}(x)x\Bigr{)}-N_{i}.

(26)

We implement the HEE-ACC policy with some prior values of the capacity coefficients $\boldsymbol{\gamma}\in\mathbb{R}_{0}^{K}$ and $\bm{\eta}\in\mathbb{R}_{0}^{I}$ and initialize $\boldsymbol{\gamma}=\boldsymbol{\gamma}_{0}$ and $\bm{\eta}=\bm{\eta}_{0}$ . HEE-ACC is implemented in the way as described in Algorithm 1. Recall that we aim to approximate the optimal dual variables $\boldsymbol{\gamma}^{\textup{*}}$ and $\bm{\eta}^{\textup{*}}$ in the asymptotic regime, which should satisfy the complementary slackness conditions of the relaxed problem described in Objective (13), Constraints (8), (9) and (10). That is, for the SCs and channels exhibiting light traffic, the corresponding capacity coefficients should remain zero, while for SCs and channels with heavy traffic, the capacity coefficients are likely to be incremented.

Upon the arrival of a $j$ -task, if tuple $(i,i^{\prime},k)$ is selected by the currently employed HEE-ACC $(\boldsymbol{\gamma},\bm{\eta})$ and successfully accepts the newly arrived $j$ -task, then update the value of $\gamma_{k}$ , $\eta_{i}$ and $\eta_{i^{\prime}}$ to $\max\{0,\gamma_{k}-\Delta_{\gamma}\}$ , $\max\{0,\eta_{i}-\Delta_{\eta}\}$ and $\max\{0,\eta_{i^{\prime}}-\Delta_{\eta}\}$ , respectively, where $\Delta_{\gamma}$ and $\Delta_{\eta}$ are hyper-parameters used to adjust the step size of the learning process. We refer to the decrement process of the capacity coefficients as the decrement adjustment. On the other hand, we keep a vector $\bm{M}(t)\coloneqq\bigl{(}M_{n}(t):n\in[K+I]\bigr{)}\in\mathbb{N}_{0}^{K+I}$ to count the occurrences of rejecting a task due to violated capacity constraints. The vector $\bm{M}(t)$ is initialized to be $\bm{0}$ . In Line 2 of Algorithm 2, a tuple is selected based on HEE-ACC $(\boldsymbol{\gamma},\bm{\eta})$ . However, in Line 2 of Algorithm 2, if the selected tuple is unable to serve the j-task due to violated capacity constraint(s), the values $M_{\bar{k}}(t)$ , $M_{K+\bar{i}}(t)$ and/or $M_{K+\bar{i}^{\prime}}(t)$ are incremented by one to indicate which constraint has been violated (Constraints (21), (22) and/or (23), respectively). When $M_{n}(t)$ for some $n\in[K+I]$ reaches a pre-determined threshold, $\bar{M}\in\mathbb{N}_{+}$ , we re-set $M_{n}(t)$ to zero and trigger an increment adjustment of the capacity coefficients. In particular, with hyper-parameter $\Delta^{+}_{\gamma}\in\mathbb{R}_{+}$ and $\Delta^{+}_{\eta}\in\mathbb{R}_{+}$ , if $n\in[K]$ and $\Lambda_{k}^{\gamma}\bigl{(}\boldsymbol{\nu}^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta}),\boldsymbol{\gamma},\bm{\eta}\bigr{)}>0$ , then increment $\gamma_{n}$ by $\Delta^{+}_{\gamma}$ ; if $n\in[K+I]\backslash[K]$ and $\Lambda_{i}^{\eta}\bigl{(}\boldsymbol{\nu}^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta}),\boldsymbol{\gamma},\bm{\eta}\bigr{)}>0$ with $i=n-K$ , then increment $\eta_{i}$ by $\Delta^{+}_{\eta}$ ; otherwise, do not change the capacity coefficients. With the adjusted capacity coefficients $\boldsymbol{\gamma}$ and $\bm{\eta}$ , we continue implementing the HEE-ACC $(\boldsymbol{\gamma},\bm{\eta})$ policy and keep refining the coefficients subsequently. We refer such a HEE-ACC $(\boldsymbol{\gamma},\bm{\eta})$ policy, for which the coefficients $\boldsymbol{\gamma}$ and $\bm{\eta}$ are learnt and updated through historical observations, as the HEE-ACC-learning (HEE-ALRN) policy. We provide the pseudo-code of implementing HEE-ALRN in Algorithm 2.

VI Results and Analysis

VI-A Simulation Setup

We consider an MEC network with 20 different destination areas ( $L=20$ ), three different groups of SCs ( $K=3$ ), installed on the edge of the Internet, and infinitely many SC units located in the cloud. Each destination area includes a CN that serves $h\in\mathbb{N}_{+}$ wireless channels for various MTs, and is connected with the SCs in the edge network through wired connections. In this context, there are $I=20h$ wireless channels, each of which is potentially shared by multiple offloading requests with capacity $N_{i}$ ( $i\in[I]$ ) listed in Appendix E. We set the computing capacity of each SC group $C_{k}=C^{0}_{k}h$ ( $k\in[K]$ ) where $C^{0}_{k}$ are positive integers randomly generated from $\{5,6,\ldots,10\}$ .

There are four different classes of MTs ( $J=4$ ) that keep sending tasks to the MEC system. We set the arrival rates of the requests $\lambda_{j}$ ( $j\in[J]$ ) to be $\lambda^{0}_{j}h$ , where $\lambda^{0}_{j}$ is randomly generated from $[1,1.5]$ . Consider the expected lifespan of a $j$ -task by setting $\mu_{j}(i,i^{\prime},k)=\lambda_{j}/\rho$ with given parameter $\rho$ . We will specify different $\rho$ in several different simulation scenarios. The specified $\lambda_{j}$ ( $j\in[J]$ ) for the tested simulation runs are listed in Appendix E. We refer to the parameter $h\in\mathbb{N}_{+}$ as the scaling parameter, which implies the size of the MEC system and the scale of the offloading problem. When $h$ becomes large, the studied MEC system is appropriate for an urban area with a highly dense population and compatibly many MEC servers or other computing components located near mobile users.

The wireless channels between different CNs and MTs may be interfered and become unstable due to distances, obstacles, geographical conditions, signal power, et cetera. The starting and ending channels for each MT in class $j$ are selected from eligible candidates with $\mu_{i,j}>0$ , where the movement of the MT determines the eligibility. We provide in Appendix E the sets of eligible candidates for the starting and ending channels of the MTs in each of the four classes, as well as the remaining parameter $w_{j,k}$ for all $j\in[J]$ and $k\in[K]$ . In particular, if requests in class $j$ cannot be served by SCs in group $k$ , we set $w_{j,k}\rightarrow+\infty$ .

VI-B Baseline Policies

We numerically demonstrate the effectiveness of HEE-ACC-zero and HEE-ALRN by comparing them with two baseline policies: Maximum expected Revenue Rate (MRR) [45] and Node Ranking Measurement virtual network embedding (NRM-VNE) [46]. MRR and NRM-VNE are both priority-style policies that always select the channel-SC tuple with the highest instantaneous revenue rate per unit requirement and the highest product of the SC and the channels’ remaining capacities, respectively. We adapt MRR and NRM-VNE to our MEC system by considering the wireless channels and SC units as substrate network resources/physcial nodes used to support the incoming requests. In particular, from [45], MRR was demonstrated to be a promising policy that aims to maximize the long-run average revenue, which is equivalent to the minimization of the long-run average power consumption discussed in this paper. More precisely, for the MEC system in this paper, the instantaneous revenue per unit requirement of MRR for the tuple $(i,i^{\prime},k)$ and $j$ -tasks reduces to

-\frac{\lambda_{j}w_{j,k}\varepsilon_{k}}{(\lambda_{j}+\mu_{j}(i,i^{\prime},k))(w_{j,k}+2)}\mathds{1}\{k<K+1\}\\ -\frac{\lambda_{j}\mu_{j}(i,i^{\prime},k)\bar{\varepsilon}_{j}}{(\lambda_{j}+\mu_{j}(i,i^{\prime},k)(w_{j,k}+2)}\mathds{1}\{k=K+1\}.

(27)

The NRM-VNE was proposed in [46] to avoid traffic congestion and aimed to achieve a maximized throughput.

In Sections VI-C and VI-D, we explore scenarios with exponentially and non-exponentially, respectively, distributed task lifespans. In Section VI-C, we further consider time-varying real-world workloads by incorporating Google cluster traces [47, 48] in our simulations.

The $95\%$ confidence intervals of all the simulated results in this section, based on the Student t-distribution, are within $\pm 3\%$ of the observed mean.

VI-C Exponentially Distributed Lifespans

In Figs. LABEL:fig:seed50-rho7.5:timeline and LABEL:fig:seed50-rho10:timeline, we evaluate the performance of HEE-ACC-zero and HEE-ALRN with respect to power conservation, throughput per unit power, and the average delay of the offloaded tasks. We examine $\rho=7.5,10$ in the two figures, representing cases with relatively light and heavy traffic, respectively.

Define $\mathcal{E}^{\phi}$ and $\mathcal{T}^{\phi}$ as the long-run average operational power consumption and the long-run average throughput of the edge system, respectively, under policy $\phi$ . In this section, we consider the percentage of power conservation of a policy $\phi$ as the relative difference between $\mathcal{E}^{\text{NRM-VNE}}$ and $\mathcal{E}^{\phi}$ ; that is,

\frac{\mathcal{E}^{\text{NRM-VNE}}-\mathcal{E}^{\phi}}{\mathcal{E}^{\text{NRM-VNE}}}.

(28)

Similarly, consider the percentage of throughput improvement per unit power of policy $\phi$ as the relative difference between $\mathcal{T}^{\phi}/\mathcal{E}^{\phi}$ and $\mathcal{T}^{\text{NRM-VNE}}/\mathcal{E}^{\text{NRM-VNE}}$ , given by

\frac{\mathcal{T}^{\phi}/\mathcal{E}^{\phi}-\mathcal{T}^{\text{NRM-VNE}}/\mathcal{E}^{\text{NRM-VNE}}}{\mathcal{T}^{\text{NRM-VNE}}/\mathcal{E}^{\text{NRM-VNE}}}.

(29)

In Figure LABEL:fig:seed50-rho7.5:timeline, HEE-ALRN, HEE-ACC-zero, and MRR achieve over $15\%$ conservation against NRM-VNE with respect to operational power, while the average delay of the four policies are compatible. In this case, with relatively light traffic, HEE-ALRN clearly outperforms HEE-ACC-zero and MRR with respect to power consumption. When the traffic becomes heavier, in Figure LABEL:fig:seed50-rho10:timeline, HEE-ALRN achieves even higher energy conservation compared to HEE-ACC-zero, MRR and NRM-VNE, while keeps compatible average delay with the other three policies.

In particular, in Figs. LABEL:fig:rho7.5:throughput-power-seed50:timeline and LABEL:fig:rho10:throughput-power-seed50:timeline, we plot the throughput improvement per unit power of HEE-ALRN, MRR and HEE-ACC-zero against NRM-VNE. Observing these figures, HEE-ALRN, HEE-ACC-zero and MRR all achieve substantially higher throughput per unit power than that of NRM-VNE. It is caused by the significant power conservation but negligible throughput degradation for the three policies. Similarly, in Figure LABEL:fig:rho10:throughput-power-seed50:timeline with heavy offered traffic, HEE-ALRN clearly outperforms all the other policies. It is consistent with our expectation since HEE-ALRN dynamically adjusts the capacity coefficients in a closed-loop manner with feedback from the system, while the other policies consider only offline-determined action variable for each state.

In Figure LABEL:fig:varying-time:seed50-rho1.5:timeline, we illustrate the performance of the above-mentioned four policies, where different task groups $j\in[J]$ have different offered traffic intensities, $\rho_{j}\in\mathbb{R}_{+}$ and we set $\mu_{j}(i,i^{\prime},k)=\lambda_{j}/\rho_{j}$ . The values of $\rho_{j}$ for $j\in[J]$ are provided in Appendix E. The other system parameters for the simulations presented in Figure LABEL:fig:varying-time:seed50-rho1.5:timeline are the same as those for the simulations in Figure LABEL:fig:seed50-rho7.5:timeline. In Figure LABEL:fig:varying-time:seed50-rho1.5:timeline, HEE-ALRN remains its clear advantages against all the other policies while achieves the least average delay and compatible throughput per unit power. The observation is consistent with that in previous figures and our arguments in Section V.

In Figs. LABEL:fig:seed50-rho7.5:scaler and LABEL:fig:seed50-rho10:scaler, we examined the performance of the four policies against the scaling parameter $h$ . Recall that the scaling parameter $h$ measures the size of the optimization problem. A large $h$ is appropriate for an urban area with highly dense mobile users and many compatible communication and computing capacities, which is the primary concern of this paper. Similar to Figs. LABEL:fig:seed50-rho7.5:timeline and LABEL:fig:seed50-rho10:timeline, HEE-ALRN, HEE-ACC-zero and MRR in general outperforms NRM-VNE in all the tested cases with respect to power consumption and throughput per unit power. The substantially higher throughput per unit power for HEE-ALRN, HEE-ACC-zero and MRR is guaranteed by their substantial power conservation with negligible degradation in average throughput. In particular, in Figs. LABEL:fig:delay-seed50-rho7.5:scaler and LABEL:fig:delay-seed50-rho10:scaler, we tested the average delay of the offloaded tasks, for which HEE-ALRN achieves the least delay, and the others are slightly higher and similar to each other. Among these three policies, in Figs. LABEL:fig:seed50-rho7.5:scaler and LABEL:fig:seed50-rho10:scaler, HEE-ALRN always outperforms HEE-ACC-zero and MRR with respect to power consumption and average delay, which is consistent with our observations and arguments in the previous paragraphs.

In Figure LABEL:fig:seed50-varying-rho1.5:scaler, we consider the performance of the above-mentioned four policies against the scaling parameter $h$ with different $\rho_{j}$ for each $j\in[J]$ . The values of $\rho_{j}$ for $j\in[J]$ are provided in Appendix E, and the other system parameters are the same as those for the simulations in Figure LABEL:fig:seed50-rho7.5:scaler. In Figure LABEL:fig:seed50-varying-rho1.5:scaler, HEE-ALRN significantly outperforms all the others with respect to power consumption, job throughput and the average delay, which is consistent with our observations and arguments in previous figures and paragraphs. HEE-ACC-zero achieves some merits in power consumption and has similar job throughput and average delay, compared to NRM-VNE, and it has close performance as MRR.

Apart from the time-invariant arrival rates discussed earlier in this section, in Figure LABEL:fig:google, we further consider time-varying arrival rates of tasks. We incorporate Google cluster traces [47, 48] in the same MEC system with $\rho_{j}=\rho=10$ and $h=1$ . In Figs. LABEL:fig:google-pwer and LABEL:fig:google-pwer-percentage, we present the absolute and relative, respectively, power consumption averaged per hour under the four policies. Recall that in Figure LABEL:fig:google-pwer-percentage, the percentages of power conservation is the relative difference between the dedicated policy and NRM-VNE with respect to power consumption averaged per hour. As demonstrated in Figs. LABEL:fig:google-pwer and LABEL:fig:google-pwer-percentage, HEE-ALRN, HEE-ACC-zero, and MRR still achieves significantly lower power consumption, over $15\%$ lower in all the tested time slots (hours), than that of NRM-VNE (the pink dashes in Figure LABEL:fig:google-pwer). In Figure LABEL:fig:google-pwer-percentage, HEE-ALRN saves slightly more power consumption than that of MRR and HEE-ACC-zero, for which the curves almost coincide with each other. In Figure LABEL:fig:google-delay, for the same case, we further test the average delay of the tasks offloading to the edge system, where the four policies have almost the same average delay for each time slot (hour).

VI-D Non-exponential Lifespans

For a large-scale MEC system, the system performance is expected to be robust against different distributions of task lifespans, because, even if some sub-channels and SC units are busy to serve excessively long tasks, the subsequently arrived tasks are still likely to be supported by compatibly many idle sub-channels and SC units. The long-run average performance of the entire system is unlikely to be significantly affected by different shapes of the lifespan distributions, such as the heavy-tailed distributions [49, 50].

In Figs. LABEL:fig:sensitivity-zero and LABEL:fig:sensitivity-alrn, we evaluate the effects of non-exponentially distributed task lifespans for HEE-ACC-zero and HEE-ALRN, respectively. Similar to the settings in [43], we simulated four different lifespan distributions for the requests: deterministic, exponential, Pareto distribution with finite variance (Pareto-F), and Pareto distribution with infinite variance (Pareto-INF). Pareto-F and Pareto-INF are constructed by setting the shape parameter of the Pareto distribution to 2.001 and 1.98, respectively. Other simulation settings are the same as those for Figs. LABEL:fig:seed50-rho7.5:timeline and LABEL:fig:seed50-rho7.5:scaler.

In Figs. LABEL:fig:sensitivity-zero and LABEL:fig:sensitivity-alrn, the y-axis stands for the relative deviation between the power consumption for a specified lifespan distribution and that for the exponentially distributed lifespan. That is, let $\mathcal{E}^{\phi,D}$ represent the average power consumption under the policy $\phi$ and the lifespan distribution $D$ , the y-axis of Figs. LABEL:fig:sensitivity-zero and LABEL:fig:sensitivity-alrn represents

\frac{\mathcal{E}^{\phi,D}-\mathcal{E}^{\phi,\text{Exponential}}}{\mathcal{E}^{\phi,\text{Exponential}}},

(30)

for the policies $\phi=$ HEE-ACC-zero and HEE-ALRN, respectively, with specified distributions $D=$ Deterministic, Pareto-F, and Pareto-INF. In the figures, the deviations are all the time within $\pm 2\%$ , which is negligible given that the confidence intervals of the simulations are considered to be $\pm 3\%$ of the means. It is consistent with the argument stated at the beginning of this subsection.

VII Conclusions

We study the energy efficiency of a large-scale MEC offloading problem with task handover. As mentioned earlier in the paper, the complexity and the large size of the problem prevent conventional optimization techniques from being directly applied. We adapt the restless bandit technique to the MEC offloading problem and propose a class of online strategies, the HEE-ACC policies, that are applicable to realistically scaled MEC systems. If appropriate capacity coefficients are provided, the HEE-ACC strategies achieve proven asymptotic optimality – an asymptotically optimal strategy approaches optimality as the scale of the MEC system tends to infinity. This implies that the proposed strategies are likely to be near-optimal in practical scenarios with highly dense mobile users and compatibly many communication channels and computing components. Within the HEE-ACC class, we propose two specific strategies: HEE-ACC-zero and HEE-ALRN. The former is neat with proved asymptotic optimality in special cases and the latter is expected to achieve a higher performance by dynamically learning the most appropriate capacity coefficients. In Section VI, we validate the effectiveness of HEE-ACC-zero and HEE-ALRN by comparing them with two baseline policies. It is demonstrated that HEE-ALRN outperforms HEE-ACC-zero with respect to power conservation, which is consistent with our discussion in Section V. We further demonstrate the robustness of the two policies, through numerical simulations, against different lifespan distributions of the proposed policies.

References

[1] T. Taleb, K. Samdanis, B. Mada, H. Flinck, S. Dutta, and D. Sabella, “On Multi-access edge computing: A survey of the emerging 5G network edge cloud architecture and orchestration,” IEEE Communications Surveys and Tutorials, vol. 19, no. 3, pp. 1657–1681, 2017.
[2] T. M. Ho and K.-K. Nguyen, “Joint server selection, cooperative offloading and handover in multi-access edge computing wireless network: A deep reinforcement learning approach,” IEEE Transactions on Mobile Computing, vol. 21, no. 7, pp. 2421–2435, 2022.
[3] T. Deng, Y. Chen, G. Chen, M. Yang, and L. Du, “Task offloading based on edge collaboration in MEC-enabled IoV networks,” Journal of Communications and Networks, vol. 25, no. 2, pp. 197–207, 2023.
[4] H. Maleki, M. Başaran, and L. Durak-Ata, “Handover-enabled dynamic computation offloading for vehicular edge computing networks,” IEEE Transactions on Vehicular Technology, vol. 72, no. 7, pp. 9394–9405, 2023.
[5] N. Monir, M. M. Toraya, A. Vladyko, A. Muthanna, M. A. Torad, F. E. A. El-Samie, and A. A. Ateya, “Seamless handover scheme for MEC/SDN-based vehicular networks,” Journal of Sensor and Actuator Networks, vol. 11, no. 1, 2022.
[6] W. Shu and Y. Li, “Joint offloading strategy based on quantum particle swarm optimization for MEC-enabled vehicular networks,” Digital Communications and Networks, vol. 9, no. 1, pp. 56–66, 2023.
[7] M. Li, J. Gao, L. Zhao, and X. Shen, “Deep reinforcement learning for collaborative edge computing in vehicular networks,” IEEE Transactions on Cognitive Communications and Networking, vol. 6, no. 4, pp. 1122–1135, 2020.
[8] F. Davoli, M. Marchese, and F. Patrone, “Flow assignment and processing on a distributed edge computing platform,” IEEE Transactions on Vehicular Technology, vol. 71, no. 8, pp. 8783–8795, 2022.
[9] L. Wang, J. Zhang, J. Chuan, R. Ma, and A. Fei, “Edge intelligence for mission cognitive wireless emergency networks,” IEEE Wireless Communications, vol. 27, no. 4, pp. 103–109, 2020.
[10] T. Hewa, A. Braeken, M. Ylianttila, and M. Liyanage, “Multi-access edge computing and blockchain-based secure telehealth system connected with 5G and IoT,” in Proceedings of the IEEE Global Communications Conference (GLOBECOM), 2020, pp. 1–6.
[11] P. Whittle, “Restless bandits: Activity allocation in a changing world,” Journal of Applied Probability, vol. 25, pp. 287–298, 1988.
[12] J. Fu, B. Moran, and P. G. Taylor, “A restless bandit model for resource allocation, competition, and reservation,” Operations Research, vol. 70, no. 1, pp. 416–431, 2021.
[13] Z. Sun and M. R. Nakhai, “An online learning algorithm for distributed task offloading in multi-access edge computing,” IEEE Transactions on Signal Processing, vol. 68, pp. 3090–3102, 2020.
[14] M. Zhao, J.-J. Yu, W.-T. Li, D. Liu, S. Yao, W. Feng, C. She, and T. Q. S. Quek, “Energy-aware task offloading and resource allocation for time-sensitive services in mobile edge computing systems,” IEEE Transactions on Vehicular Technology, vol. 70, no. 10, pp. 10 925–10 940, 2021.
[15] T. Liu, D. Guo, Q. Xu, H. Gao, Y. Zhu, and Y. Yang, “Joint task offloading and dispatching for MEC with rational mobile devices and edge nodes,” IEEE Transactions on Cloud Computing, vol. 11, no. 3, pp. 3262–3273, 2023.
[16] H. Song, B. Gu, K. Son, and W. Choi, “Joint optimization of edge computing server deployment and user offloading associations in wireless edge network via a genetic algorithm,” IEEE Transactions on Network Science and Engineering, vol. 9, no. 4, pp. 2535–2548, 2022.
[17] B. Xiang, J. Elias, F. Martignon, and E. D. Nitto, “Joint planning of network slicing and mobile edge computing: Models and algorithms,” IEEE Transactions on Cloud Computing, vol. 11, no. 1, pp. 620–638, 2023.
[18] R. Xie, J. Fang, J. Yao, X. Jia, and K. Wu, “Sharing-aware task offloading of remote rendering for interactive applications in mobile edge computing,” IEEE Transactions on Cloud Computing, vol. 11, no. 1, pp. 997–1010, 2023.
[19] P. Zhao, J. Tao, K. Lui, G. Zhang, and F. Gao, “Deep reinforcement learning-based joint optimization of delay and privacy in multiple-user MEC systems,” IEEE Transactions on Cloud Computing, vol. 11, no. 2, pp. 1487–1499, 2023.
[20] J. Wang, L. Zhao, J. Liu, and N. Kato, “Smart resource allocation for mobile edge computing: A deep reinforcement learning approach,” IEEE Transactions on Emerging Topics in Computing, vol. 9, no. 3, pp. 1529–1541, 2021.
[21] H. Hu, D. Wu, F. Zhou, S. Jin, and R. Q. Hu, “Dynamic task offloading in MEC-enabled IoT networks: A hybrid DDPG-D3QN approach,” in Proceedings of the IEEE Global Communications Conference (GLOBECOM), 2021, pp. 1–6.
[22] F. G. Wakgra, B. Kar, S. B. Tadele, S.-H. Shen, and A. U. Khan, “Multi-objective offloading optimization in MEC and vehicular-fog systems: A distributed-TD3 approach,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–13, 2024.
[23] L. Chen, J. Wu, J. Zhang, H.-N. Dai, X. Long, and M. Yao, “Dependency-aware computation offloading for mobile edge computing with edge-cloud cooperation,” IEEE Transactions on Cloud Computing, vol. 10, no. 4, pp. 2451–2468, 2022.
[24] Y. Chen, N. Zhang, Y. Zhang, X. Chen, W. Wu, and X. Shen, “Energy efficient dynamic offloading in mobile edge computing for Internet of things,” IEEE Transactions on Cloud Computing, vol. 9, no. 3, pp. 1050–1060, 2021.
[25] G. Wu, H. Wang, H. Zhang, Y. Zhao, S. Yu, and S. Shen, “Computation offloading method using stochastic games for software-defined-network-based multiagent mobile edge computing,” IEEE Internet of Things Journal, vol. 10, no. 20, pp. 17 620–17 634, 2023.
[26] Y. Su, W. Fan, Y. Liu, and F. Wu, “A truthful combinatorial auction mechanism towards mobile edge computing in industrial Internet of Things,” IEEE Transactions on Cloud Computing, vol. 11, no. 2, pp. 1678–1691, 2023.
[27] E. F. Maleki, L. Mashayekhy, and S. M. Nabavinejad, “Mobility-aware computation offloading in edge computing using machine learning,” IEEE Transactions on Mobile Computing, vol. 22, no. 1, pp. 328–340, 2023.
[28] N. Uniyal, A. Bravalheri, X. Vasilakos, R. Nejabati, D. Simeonidou, W. Featherstone, S. Wu, and D. Warren, “Intelligent mobile handover prediction for zero downtime edge application mobility,” in Proceedings of the IEEE Global Communications Conference (GLOBECOM), 2021, pp. 1–6.
[29] X. Yuan, M. Sun, and W. Lou, “A dynamic deep-learning-based virtual edge node placement scheme for edge cloud systems in mobile environment,” IEEE Transactions on Cloud Computing, vol. 10, no. 2, pp. 1317–1328, 2022.
[30] L. Wang and J. Zhang, “Adaptive multi-armed bandit learning for task offloading in mobile edge computing,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp. 5285–5289.
[31] H. Li, L. Li, S. Tan, X. Zhong, and S. Zhang, “A multi-user effective computation offloading mechanism for MEC system: Batched multi-armed bandits approach,” IEEE Transactions on Network and Service Management, 2024.
[32] Y. Lin, Y. Zhang, J. Li, F. Shu, and C. Li, “Popularity-aware online task offloading for heterogeneous vehicular edge computing using contextual clustering of bandits,” IEEE Internet of Things Journal, vol. 9, no. 7, pp. 5422–5433, 2022.
[33] N. Jing, M. Yang, S. Cheng, Q. Dong, and H. Xiong, “An efficient SVM-based method for multi-class network traffic classification,” in Proceedings of the IEEE International performance computing and communications conference (IPCCC), 2011, pp. 1–8.
[34] A. Magnano, X. Fei, A. Boukerche, and A. A. Loureiro, “A novel predictive handover protocol for mobile IP in vehicular networks,” IEEE Transactions on Vehicular Technology, vol. 65, no. 10, pp. 8476–8495, 2015.
[35] L. Gillam, K. Katsaros, M. Dianati, and A. Mouzakitis, “Exploring edges for connected and autonomous driving,” in Proceedings of the IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), April 2018, pp. 148–153.
[36] M. Mukherjee, L. Shu, and D. Wang, “Survey of fog computing: Fundamental, network applications, and research challenges,” IEEE Communications Surveys Tutorials, vol. 20, no. 3, pp. 1826–1857, thirdquarter 2018.
[37] G. Lee, W. Saad, and M. Bennis, “An online secretary framework for fog network formation with minimal latency,” in Proceedings of the IEEE International Conference on Communications (ICC), May 2017, pp. 1–6.
[38] C. You, K. Huang, H. Chae, and B. Kim, “Energy-efficient resource allocation for mobile-edge computation offloading,” IEEE Transactions on Wireless Communications, vol. 16, no. 3, pp. 1397–1411, March 2017.
[39] R. Deng, R. Lu, C. Lai, and T. H. Luan, “Towards power consumption-delay tradeoff by workload allocation in cloud-fog computing,” in Proceedings of the IEEE International Conference on Communications (ICC), June 2015, pp. 3909–3914.
[40] J. Wu, E. W. M. Wong, Y.-C. Chan, and M. Zukerman, “Power consumption and GoS tradeoff in cellular mobile networks with base station sleeping and related performance studies,” IEEE Transactions on Green Communications and Networking, vol. 4, no. 4, pp. 1024–1036, 2020.
[41] J. C. Gittins, K. Glazebrook, and R. R. Weber, Multi-armed bandit allocation indices: 2nd edition. Wiley, Mar. 2011.
[42] Q. Wang, J. Fu, J. Wu, B. Moran, and M. Zukerman, “Energy-efficient priority-based scheduling for wireless network slicing,” in Proceedings of the IEEE Global Communications Conference (GLOBECOM), Abu Dhabi, UAE, Dec. 2018.
[43] J. Fu and B. Moran, “Energy-efficient job-assignment policy with asymptotically guaranteed performance deviation,” IEEE/ACM Transactions on Networking, vol. 28, no. 3, pp. 1325–1338, 2020.
[44] S. P. Boyd and L. Vandenberghe, Convex optimization. Cambridge University Press, 2004.
[45] J. Fu, B. Moran, P. G. Taylor, and C. Xing, “Resource competition in virtual network embedding,” Stochastic Models, vol. 37, no. 1, pp. 231–263, 2020.
[46] P. Zhang, H. Yao, and Y. Liu, “Virtual network embedding based on computing, network, and storage resource constraints,” IEEE Internet of Things Journal, vol. 5, no. 5, pp. 3298–3304, 2017.
[47] J. Wilkes, “More Google cluster data,” Google research blog, Nov. 2011, posted at http://googleresearch.blogspot.com/2011/11/more-google-cluster-data.html, accessed at Jul. 8, 2019.
[48] C. Reiss, J. Wilkes, and J. L. Hellerstein, “Google cluster-usage traces: format + schema,” Google Inc., Mountain View, CA, USA, Technical Report, Nov. 2011, revised 2014-11-17 for version 2.1. Posted at https://github.com/google/cluster-data, accessed at Jul. 8, 2019.
[49] M. E. Crovella and A. Bestavros, “Self-similarity in World Wide Web traffic: evidence and possible causes,” IEEE/ACM Transactions on Networking, vol. 5, no. 6, pp. 835–846, Dec. 1997.
[50] M. Harchol-Balter, Performance Modeling and Design of Computer Systems: Queueing Theory in Action. Cambridge University Press, 2013.

Jing Fu (S’15-M’16) received the B.Eng. degree in computer science from Shanghai Jiao Tong University, Shanghai, China, in 2011, and the Ph.D. degree in electronic engineering from the City University of Hong Kong in 2016. She has been with the School of Mathematics and Statistics, the University of Melbourne as a Post-Doctoral Research Associate from 2016 to 2019. She has been a lecturer in the discipline of Electronic & Telecommunications Engineering, RMIT University, since 2020. Her research interests now include energy-efficient networking/scheduling, resource allocation in large-scale networks, semi-Markov/Markov decision processes, restless multi-armed bandit problems, stochastic optimization.

Input :

\boldsymbol{\gamma}\in\mathbb{R}_{0}^{K}

and

\bm{\eta}\in\mathbb{R}_{0}^{IJL}

Output :

\boldsymbol{\nu}^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta})

defined in Eq. (24), and action variables

\boldsymbol{\alpha}^{\varphi^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta})}\coloneqq\Bigl{(}\alpha^{\varphi^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta})}_{i,i^{\prime},j,k}(x):~{}i,i^{\prime}\in[I],j\in[J],k\in[K],x\in\mathscr{X}_{i,i^{\prime},j,k}\Bigr{)}

4Function Multipliers

5 Initialize

\boldsymbol{\nu}^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta})\leftarrow\bm{0}

\alpha^{\varphi^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta})}_{i,i^{\prime},j,k}(x)\leftarrow 0

for all

i,i^{\prime}\in[I]

j\in[J]

k\in\mathscr{K}

, and

x\in\mathscr{X}_{i,i^{\prime},j,k}

7 Rank all the tuples

(i,i^{\prime},j,k)

according to the ascending order of their

\psi_{j}(i,i^{\prime},j)

, where tie cases are broken by selecting the shortest lifespans.

N\leftarrow I^{2}J(K+1)

9 Let

(i_{\iota},i^{\prime}_{\iota},j_{\iota},k_{\iota})

represent the

\iota

th tuple in the above mentioned ranking.

f_{j}\leftarrow-1

for all

j\in[J]

11 for $\iota\in[N]$ do

12 if $f_{j_{\iota}}=0$ then

13 Break

15 end if

16 for $x\in\mathscr{X}_{i_{\iota},i^{\prime}_{\iota},j_{\iota},k_{\iota}}$ do

a_{1}\leftarrow\min\Bigl{\{}a\in[0,1]~{}\Bigl{|}\text{ \eqref{eqn:action_constraint:relax} achieves equality for $j=j_{\iota}$ by setting }\alpha_{i_{\iota},i^{\prime}_{\iota},j_{\iota},k_{\iota}}^{\varphi^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta})}(x)=a\Bigr{\}}\cup\Bigl{\{}1\Bigr{\}}

a_{2}\leftarrow\min\Bigl{\{}a\in[0,1]~{}\Bigl{|}\text{ \eqref{eqn:capacity_constraint1:relax} achieves equality for $k=k_{\iota}$ by setting }\alpha_{i_{\iota},i^{\prime}_{\iota},j_{\iota},k_{\iota}}^{\varphi^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta})}(x)=a\Bigr{\}}\cup\Bigl{\{}1\Bigr{\}}

a_{3}\leftarrow\min\Bigl{\{}a\in[0,1]~{}\Bigl{|}\text{ \eqref{eqn:capacity_constraint2:relax} achieves equality for $(i,j,\ell)=\bigl{(}i_{\iota},i^{\prime}_{\iota},\ell(k_{\iota})\bigr{)}$ by setting }\alpha_{i_{\iota},i^{\prime}_{\iota},j_{\iota},k_{\iota}}^{\varphi^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta})}(x)=a\Bigr{\}}\cup\Bigl{\{}1\Bigr{\}}

\alpha_{i_{\iota},i^{\prime}_{\iota},j_{\iota},k_{\iota}}^{\varphi^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta})}(x)\leftarrow\min\{a_{1},a_{2},a_{3}\}

22 if $\min\{a_{1},a_{2},a_{3}\}=0$ then

f_{j_{\iota}}\leftarrow 0

\nu^{\textup{*}}_{j_{\iota}}(\boldsymbol{\gamma},\bm{\eta})\leftarrow\psi_{j_{\iota}}(i_{\iota},i^{\prime}_{\iota},k_{\iota})

25 Break

27 end if

29 end for

31 end for

32 return

\boldsymbol{\nu}^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta})

and

\boldsymbol{\alpha}^{\varphi^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta})}

34 End

Algorithm 3 Computing

\boldsymbol{\nu}^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta})

We provide summary of important variables in Tables I.

TABLE I: Important Symbols

Real Numbers and Vectors
$K$	The number of SC groups in the edge network	$L$	The number of destination areas
$I$	The number of channels	$J$	The number of task classes
$C_{k}$	The capacity of SC group $k$	$N_{i}$	The number of sub-channels of channel $i$
$w_{j,k}$	The number of SC units occupied by a $j$ -task if it is served by SC group $k$	$\mu_{i,j}$	The transmission rate for transmitting a $j$ -task through channel $i$
$\lambda_{j}$	The arrival rate of $j$ -tasks	$\mu_{j}(i,i^{\prime},k)$	The reciprocal of the expected lifespan of a $j$ -task served by channels $i,i^{\prime}$ and SC group $k$
$\varepsilon_{k}$	The operational power of an SC unit of group $k$	$\varepsilon_{k}^{0}$	The static power of an SC unit of group $k$
$\bar{\varepsilon}_{j}$	The expected energy consumption per $j$ -task processed by the cloud	$\psi_{j}(i,i^{\prime},k)$	The index for assigning a $j$ -task to resource tuple $i,i^{\prime},k)$ , defined in (19), used for constructing the HEE-ACC policy
$D_{0}$	The transmission time between the edge network and the cloud	$\mathcal{E}^{\phi}_{c}$	The long-run average power consumption for computing tasks offloaded to the cloud
$\boldsymbol{\nu}\in\mathbb{R}^{J}_{0}$	Lagrange multipliers for the relaxed constraints in (8)	$\boldsymbol{\gamma}\in\mathbb{R}_{0}^{K}$	Lagrange multipliers for the relaxed constraints in (9)
$\bm{\eta}\in\mathbb{R}_{0}^{I}$	Lagrange multipliers for the relaxed constraints in (10)	$a^{\phi}_{i,i^{\prime},j,k}(\bm{x})$	The action variable of the task offloading problem, taking values in $\{0,1\}$ . Given the state vector $\bm{X}^{\phi}(t)=\bm{x}\in\mathscr{X}$ , it indicates whether or not a newly arrived $j$ -task is assigned to the resource tuple $(i,i^{\prime},k)$ at time $t$ under policy $\phi$
$\bm{a}^{\phi}(\bm{x})$	$\bm{a}^{\phi}(\bm{x})\coloneqq(a^{\phi}_{i,i^{\prime},j,k}(\bm{x}):i,i^{\prime}\in[I],j\in[J],k\in\mathscr{K})$ , the action vector of the task offloading problem	$\alpha^{\phi}_{i,i^{\prime},j,k}(x)$	The action variable of the sub-problem associated with $(i,i^{\prime},j,k)$ , taking values in $[0,1]$ . Given the state of the sub-problem $X^{\phi}_{i,i^{\prime},j,k}(t)=x\in\mathscr{X}_{i,i^{\prime},j,k}$ , it indicates the probability of assigning a newly arrived $j$ -task to the resource tuple $(i,i^{\prime},k)$ at time $t$ under policy $\phi$
Important Labels
$i\in[I]$	Label of a channel	$j\in[J]$	Label of a task class
$k\in\mathscr{K}$	Label of an SC group	$\ell\in[L]$	Label of a destination area which locates a set of SC groups
Sets
$\mathscr{K}$	The set of all SC groups in the edge and cloud	$\mathscr{K}_{\ell}$	The set of SC groups located in the area $\ell\in[L]$
$\mathscr{X}$	The state space of the entire task offloading problem	$\mathscr{X}_{i,i^{\prime},j,k}$	The state space of the sub-problem associated with $(i,i^{\prime},j,k)$
$\Phi$	The set of all policies determined by action variables $a^{\phi}_{i,i^{\prime},j,k}(\bm{x})$ for all $i,i^{\prime}\in[I]$ , $j\in[J]$ , $k\in\mathscr{K}$ , and $\bm{x}\in\mathscr{X}$	$\tilde{\Phi}$	The set of all policies determined by action variables $\alpha^{\phi}_{i,i^{\prime},j,k}(x)$ for all $x\in\mathscr{X}_{i,i^{\prime},j,k}$ , $i,i^{\prime}\in[I]$ , $j\in[J]$ , and $k\in\mathscr{K}$ .
Random Variables
$X^{\phi}_{i,i^{\prime},j,k}(t)$	The number of $j$ -tasks that are being served by resource tuple $(i,i^{\prime},k)$ at time $t$ under policy $\phi$	$\bm{X}^{\phi}(t)$	$\bm{X}^{\phi}(t)\coloneqq(X^{\phi}_{i,i^{\prime},j,k}(t):i,i^{\prime}\in[I],j\in[J],k\in\mathscr{K})$ , the state vector of the entire task offloading problem at time $t$ under policy $\phi$

Appendix A Diagram for Decision Making

We provide a diagram in Figure 16 to demonstrate the process of deciding channel-SC tuples to serve newly arrived tasks.

Appendix B Proof of Proposition 1

Proof of Proposition 1. Let $\Gamma^{\phi}$ represent the long-run average power consumption (energy consumption rate), as described in (6), under policy $\phi\in\tilde{\Phi}$ , where recall $\tilde{\Phi}$ is the set of the policies with randomized action variables and includes those policies applicable to the original and/or the relaxed problem. Let $\Gamma^{\textup{*}}$ and $\Gamma^{\text{R},\textup{*}}$ represent the minimized objective function of the original and the relaxed problem, respectively.

Without loss of generality, for $k\in[K]$ and $i\in[I]$ , we rewrite $C_{k}=hC_{k}^{0}$ , $N_{i}=hN_{i}^{0}$ , and $\lambda_{j}=h\lambda_{j}^{0}$ for $h\in\mathbb{N}_{+}$ , $C_{k}^{0},N_{i}^{0}\in\mathbb{N}_{+}$ , and $\lambda_{j}^{0}\in\mathbb{R}_{+}$ . From [12, Theorem EC.1], for a policy $\phi$ satisfying (18), there exists a policy $\varphi$ derived based on $\psi_{j}(i,i^{\prime},k)$ such that

\lim\nolimits_{h\rightarrow+\infty}\lvert\Gamma^{\varphi}-\Gamma^{\phi}\rvert=0.

(31)

Note that $\Gamma^{\varphi}$ and $\Gamma^{\phi}$ in (31) are dependent on $h$ through $C_{k}$ , $N_{i}$ , and $\lambda_{j}$ . It follows that

\lim\nolimits_{h\rightarrow+\infty}\lvert\Gamma^{\varphi}-\Gamma^{\text{R},\textup{*}}\rvert=0,

(32)

which proves the proposition.∎

Appendix C Proof of Proposition 2

Proof of Proposition 2. If, for each $j\in[J]$ and resource tuple $(i,i^{\prime},k)\in[I]^{2}\times[K]$ with $\mu_{j}(i,i^{\prime},k)>0$ , the SC $k$ , channel $i$ or channel $i^{\prime}$ is dominant, then the system is weakly coupled [12, Section 3.3.1]. If the blocking probabilities of all task classes $j\in[J]$ are positive in the asymptotic regime, then the system is in heavy traffic [12, Section 3.3.2]. By invoking [12, Corollary EC.1], a policy $\phi\in\Phi$ given by

a^{\phi}_{i,i^{\prime},j,k}\bigl{(}\bm{X}^{\phi}(t)\bigr{)}\\ =\begin{cases}1,&\begin{array}[]{l}\text{if }(i,i^{\prime},k)=\\ ~{}~{}~{}\arg\min\nolimits_{(i,i^{\prime},k)\in\mathscr{T}_{j}(\bm{X}^{\text{HEE-ACC}}(t))}\frac{\psi_{j}(i,i^{\prime},k)}{w_{j,k}(1+\frac{\lambda_{j}}{\mu_{j}(i,i^{\prime},k)})},\end{array}\\ 0,&\text{otherwise},\end{cases}

is asymptotically optimal - it approaches optimality as $h\rightarrow+\infty$ . When $w_{j,k}(1+\frac{\lambda_{j}}{\mu_{j}(i,i^{\prime},k)})$ is a constant for all $j\in[J]$ and $(i,i^{\prime},k)\in[I]^{2}\times[K]$ with $\mu_{j}(i,i^{\prime},k)$ , the above described policy $\phi$ coincides with HEE-ACC-zero. It proves the proposition. ∎

Appendix D Pseudo-code for Computing $\bm{\nu}^{\textup{}}$ and the associated action variables for $\varphi^{\textup{}}$

The pseudo-code for computing $\bm{\nu}^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta})$ and the associated action variables for $\varphi^{\textup{*}}(\boldsymbol{\gamma},\bm{\eta})$ with given $\boldsymbol{\gamma}\in\mathbb{R}^{K}_{0}$ and $\bm{\eta}\in\mathbb{R}^{I}_{0}$ is given in Algorithm 3.

Appendix E Simulation Settings

In the system discussed in Section VI,

•

for $i\in[I]$ , the channel capacities are $N_{i}=\bar{N}_{\lfloor(i-1)/h\rfloor+1}$ where $(\bar{N}_{1},\bar{N}_{2},\ldots,\bar{N}_{10})=(8,5,5,7,6,5,5,6,5,7)$ and $(\bar{N}_{11},\bar{N}_{12},\ldots,\bar{N}_{20})=(9,6,6,5,5,9,9,9,5,7)$ ;
•

the SC capacities are $(C_{1}^{0},C_{2}^{0},C_{3}^{0})=(5,5,8)$ ;
•

for $k\in[K]$ , the requested AC units $w_{1,k}=3$ , $w_{2,k}=4$ , $w_{3,k}=2$ and $w_{4,k}=1$ except that $w_{1,2}=w_{3,1}=w_{3,2}=w_{4,1}=w_{4,2}=+\infty$ ;
•

the operational power $(\varepsilon_{1},\varepsilon_{2},\varepsilon_{3})=(3.362,3.996,8.979)$ , and the power consumption per task processed in the cloud $\bar{\varepsilon}_{j}$ ( $j\in[J])$ are set to be $20.1\rho/\lambda_{j}$ (or $20.1\rho_{j}/\lambda_{j}$ for the case with heterogeneous traffic intensities presented in Figure LABEL:fig:varying-time:seed50-rho1.5:timeline and LABEL:fig:seed50-varying-rho1.5:scaler), of which the unit is Watt;
•

and the arrival rates of requests are $(\lambda_{1},\lambda_{2},\ldots,\lambda_{4})=(1.097,1.026,1.456,1.383)$ , which represent the numbers of arrived requests per second;

All the above-listed numbers are instances generated by a pseudo-random generator in C++. Let $\mathscr{I}_{j}^{\text{start}}$ and $\mathscr{I}_{j}^{\text{end}}$ represent the sets of eligible candidates for the starting and ending channels of requests in class $j\in[J]$ , respectively. We set $(\mathscr{I}_{1}^{\text{start}},\mathscr{I}_{1}^{\text{end}})=(\{3,5,6,7,10,12,13,14,20\},\{3,5,7,8,9,\\ 10,11,12,13,14,15,16\})$ , $(\mathscr{I}_{2}^{\text{start}},$ $\mathscr{I}_{2}^{\text{end}})=(\{2,9,17,19\},\\ \{2,7,8,9,10,11,12,16,19\})$ , $(\mathscr{I}_{3}^{\text{start}},\mathscr{I}_{3}^{\text{end}})=(\{2,9,17,\\ 19\},\{2,8,9,11,19\})$ , and $(\mathscr{I}_{4}^{\text{start}},\mathscr{I}_{4}^{\text{end}})=(\{1,4,6,17,18\},\\ \{1,4,17,18,19\})$ . In the simulated system, each CN in the system is considered to connect with all the ACs through wired links with sufficiently large capacities.

For the HEE-ALRN policy, we set the hyper-parameters $\Delta_{\lambda}=\Delta_{\eta}=\Delta^{+}_{\lambda}=\Delta^{+}_{\eta}=2$ and $\bar{M}=100$ .

For the simulations in Figure LABEL:fig:varying-time:seed50-rho1.5:timeline and LABEL:fig:seed50-varying-rho1.5:scaler, the offered traffic intensities are $\rho_{1}=3.876$ , $\rho_{2}=9.115$ , $\rho_{3}=7.042$ , and $\rho_{4}=8.150$ .