Dynamic DAG-Application Scheduling for Multi-Tier Edge Computing in Heterogeneous Networks
Abstract
Edge computing is deemed a promising technique for executing latency-sensitive applications by offloading computation-intensive tasks to edge servers. Extensive research has been conducted on task offloading from end devices to edge servers for several goals, including latency minimization, energy optimization, and resource optimization. However, few of these works consider personally owned mobile computing devices (smartphones, tablets, and laptops) as edge devices. In this paper, we propose a novel multi-tier edge computing framework, which we refer to as M-TEC, that jointly optimizes latency, failure probability, and cost while accounting for the sporadic failure of personally owned devices and changing network conditions. We conduct experiments with a real testbed and a real commercial CBRS 4G network, and the results indicate that M-TEC reduces the end-to-end latency of applications by at least 8% compared to the best baseline under a variety of network conditions, while providing reliable performance at an affordable cost.
Index Terms:
Multi-tier edge computing, Directed acyclic graphs, Task co-location, Latency, Reliability, Cost
I Introduction
The growing computation intensity and low-latency requirements of emerging applications that run on user-generated data fuel the need to move computation closer to users [1, 2]. Such requirements have led to the widespread adoption of edge computing (EC), which offers reduced latency by executing computations close to the data source and scalability by splitting the workload across several edge devices.

Figure 1 depicts an overview of three computing paradigms: cloud computing, which is widely used in the real world; edge computing or fog computing, which is extensively discussed in the existing literature [3, 4, 5, 6]; and the paradigm proposed by this work, multi-tier edge computing, which brings the computing power even closer to the data source by allowing peer-to-peer offloading in close proximity.
Framework | Heterogeneous Edge Devices | Low Latency | P2P Offloading | Network Fluctuation | Supporting DAG | Failure Reduction |
---|---|---|---|---|---|---|
M-TEC (Ours) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
IBOT [7] | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ |
DCC [8] | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
LaTS [9] | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ |
Petrel [10] | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
LAVEA [11] | ✓ | ✓ | ✗ | ✓ | ✓ | ✗ |
Existing literature on EC has focused on scheduling and offloading schemes for computation-intensive services from end devices to commercially managed edge server infrastructures, under the assumption that such infrastructures will be available for extended periods and achieve reasonably low latency [12, 13, 14, 11, 10, 9, 15, 16, 17]. The majority of previously proposed frameworks, however, rely solely on commercially maintained edge servers. Commercially maintained edge servers have advantages in terms of dependability and performance, but they also have drawbacks: they are not yet widely available, and edge computing services are typically not free. Consequently, a framework that can leverage the ubiquitous devices around us to assist with demanding tasks appears to be a viable option.
Additionally, the EC literature on device-to-device (D2D) offloading of computing resources and data optimization is scant [18, 8]. The offloading scheme of [18] only takes into account individual tasks and not the dependencies between tasks, making it less practical for real-world applications. [19] proposed multihop offloading through edge device collaboration, focusing on task dependency and formulating a fixed scheduling policy based on network flow at the beginning of the first task, which is less robust. [8] proposed a three-layer framework called DCC, which integrates end devices into dynamic cloudlets by K-means clustering and classifies tasks into several categories to pick the task allocation that minimizes latency and energy. However, DCC neglects the dynamic network conditions of these end devices and does not account for the possibility of end device failure.
To accommodate the heterogeneous network environment and utilize ubiquitous personal devices, we present the design of a multi-tier edge computing scheduling framework, which we refer to as M-TEC, that combines personal edge devices (PEDs) and commercial edge devices (CEDs) from different networks into a single system to leverage the benefits of both types of devices. To the best of our knowledge, M-TEC is the first multi-tier edge computing framework that enables within-layer communication, combining end devices, edge, and cloud from different networks into a single edge computing system to schedule complex directed acyclic graph (DAG)-based applications with low end-to-end latency, failure probability, and cost. One sample application of M-TEC is vehicular edge computing, where tasks such as pedestrian detection and obstacle detection can be executed on board as well as on other vehicles in close proximity.

Figure 2 provides an overview of the proposed orchestration framework M-TEC. The application instance to be offloaded needs to be represented as a directed acyclic graph (DAG) workflow, a problem that has been addressed in previous literature [20, 21]. The initiator preprocesses the DAG and maps task dependencies (as described in detail in Section III-A1); it then employs a greedy method to lower the end-to-end latency of the application instance while taking into account the probability of failure and the dollar cost incurred. The primary contribution of this paper is a novel three-tier dependable edge computing system that operates in a heterogeneous network environment and enables within-layer P2P offloading to unleash the ubiquitous and powerful computing resources all around us.
In our evaluation, we compare M-TEC to two intuitive baselines, Random Allocation and Round Robin, and four state-of-the-art solutions, LAVEA [11], Petrel [10], LaTS [9], and DCC (three-layer offloading) [8], in a commercial CBRS 4G network with real edge devices and servers, as outlined in Table I. Four applications with diverse DAG structures and domains are utilized to test our framework. The results indicate that M-TEC lowers application end-to-end latency by at least 8% relative to the best baseline across all four test applications, over both wired and wireless connections with dynamic network traffic, while reducing failure probability by more than 40%. To the best of our knowledge, M-TEC is the first scheduling framework for scheduling DAG workflows between peer client devices, edge servers, and cloud servers. Further, M-TEC is deployed and tested on real devices with a real network, in contrast to simulations.
We summarize our contributions as follows:
1. We propose a multi-tier edge computing framework, M-TEC, that executes complex DAG-based user applications in a heterogeneous network that comprises both commercial edge devices and personal edge devices.
2. We propose a greedy algorithm that jointly optimizes end-to-end latency, failure probability, and cost while accounting for the unpredictability of network and device failure. The fact that our algorithm is distributed is key to its deployment in a real-world environment, where it is operationally difficult to rely on an always-on central scheduler.
3. We validate our framework by conducting extensive experiments in a real-world network environment consisting of Ethernet, CBRS 4G, and Wi-Fi. We demonstrate the advantages of M-TEC in reducing average application end-to-end latency, failure probability, and cost across different networks.
Our paper is organized as follows. In Section II, we precisely lay out the problem. In Section III, we present our design, first the high-level elements in our solution and then the details of each element. Section IV presents our extensive evaluation on the real-world testbed with real edge devices connected by a controllable CBRS 4G cellular network. We survey the relevant prior work in Section V and then conclude the paper.
II Problem Statement
The combination of commercially managed edge devices and personally owned edge devices poses unique challenges due to the sporadic availability of personal devices. In addition, these devices are connected to separate networks, which adds unpredictability in network speed, stability, and availability. These issues arise when attempting to combine devices from several networks to construct a dependable multi-tier edge computing platform and, to our knowledge, have not been addressed in the existing literature on edge computing. In this section, we cover the key obstacles whose solutions highlight the novelty of M-TEC.
II-A Preliminary and notations
We now present the notation and terminology utilized by our framework M-TEC in Table II.
Symbols | Definition |
---|---|
$T$ | Types of tasks in a given application |
$S$ | Number of stages in DAG |
$G$ | DAG representation of application |
$P$ | Available edge devices (participators) in the offloading network |
$L^{exec}_{t,p}$ | Execution latency of task $t$ on participator $p$ |
$L^{model}_{t,p}$ | Model download latency of task $t$ on participator $p$ |
$L^{data}_{t,p}$ | Data transfer latency of input for task $t$ from other devices |
$L_t$, $L_s$, $L_G$ | End-to-end latency of task $t$, stage $s$, application $G$ |
$m_t$ | Model required for task $t$ |
$mem_t$ | Memory required for task $t$ |
$d_t$ | Input data for task $t$ |
$x_t$ | Placement of task $t$ |
$dep_t$ | Dependency of task $t$ in terms of other tasks |
$f_t$ | Meta file generated by task $t$ |
$PF_p$ | Probability that participator $p$ fails |
$PF_G$ | Probability of failure of application, given by graph $G$ |
$R_t$ | Devices that execute replications of task $t$ |
$c_p$ | Unit time cost of edge device $p$ |
$C_t$ | Cost of complete task $t$ |
$C_G$ | Cost of complete application instance |
$r_t$ | Tracker for the number of replications |
$\theta_{PF}$ | Probability of failure threshold |
$\theta_R$ | Threshold of the replication degree |
$\alpha$, $\beta$, $\gamma$ | Hyper-parameters for tuning the weight on latency, probability of failure, and cost |
$\eta$ | Hyper-parameter for transmission speed error over-provision |
$X_G$ | Placement of each task in graph $G$ |
$M_p$, $M^{free}_p$ | Total and free space on each ED |
$A_p$ | Available models on each edge device |
$E_p$ | Data structure to track executing tasks and types on each ED |
$W$ | Weight score of joint optimization |
$W'$ | Weighted score after PF reduction |
Peer-to-peer offloading: The majority of existing literature on task offloading/scheduling focuses on client-edge or client-cloud offloading, assuming the dependability and performance of commercially managed edge servers. However, such edge servers have not been widely deployed, and they are relatively expensive. Therefore, a multi-tier edge computing framework that permits peer-to-peer offloading in the client tier to unlock the power of idle computing resources, such as desktops, laptops, and tablets, appears promising.
Heterogeneity in the devices and the communication network among devices: Different devices have different processing power, memory, and so on, and they are connected to various networks. Mobile phones and tablets interact with one another over Wi-Fi or a cellular network, whereas desktop computers and servers communicate with one another via LAN. As the network connection of privately owned devices can vary, heterogeneity in network connection adds an additional layer of complexity to addressing dynamic network conditions.
Dynamic network conditions: Dynamic network conditions, which relate to the ever-changing data transmission speed and latency among devices and between devices and servers, make it difficult to establish a stable allocation scheme that ensures consistent performance. To orchestrate the execution strategy optimally, a dynamic orchestration framework that monitors network traffic between every two edge devices within the orchestration network is required. Such an approach requires the initiator to be aware of the global network condition and, if not managed effectively, can incur excessive latency overhead.
Application’s DAG: An application can be made up of several different tasks, and there may be data dependencies or order-of-execution requirements between those tasks. This dependency adds another layer of complexity, as some tasks can be executed in parallel while others must be executed in a specific order. The remainder of this paper uses a DAG to represent an application, where a node of the DAG represents a task in the application and an edge connecting two nodes reflects their dependency. The smallest unit used for scheduling in this paper is a task (node). A static DAG partitioning method is used throughout the paper.
III Design of M-TEC
In order to address the problems outlined in the preceding section, we propose a multi-tier edge computing framework that we refer to as M-TEC. M-TEC is a dynamic decentralized scheduling framework for complex DAG-based applications that aims to jointly optimize the end-to-end latency, probability of failure, and cost based on the application’s requirements while taking into account the uncertain nature of different networks and the intermittent availability of edge devices. The remainder of this section discusses the framework’s design components and our proposed algorithm for scheduling.
III-A Design components
Each device in the network has two sets of functional programs: the initiator program and the participator program. Depending on the user's intention in joining the network, the device selects the appropriate program to execute. If the user joins the network as an initiator, it indicates that the user intends to offload some tasks. Therefore, the initiator is responsible for DAG preprocessing and for collecting the profiling information from each device in the network for optimization purposes. As each initiator only processes the offloading requests originating from itself, it will not become a bottleneck of the system.
If the user joins the network as a participator, it indicates that the user is willing to contribute its idle resources. The participator informs the initiator whether the task to be offloaded has been profiled on it. If it has not, the task must be profiled on the device before the device can join the offloading network.

III-A1 DAG-preprocessing
When the initiator decides to offload an application instance, it needs to transform the application's DAG and divide the execution into stages. The benefit of breaking the DAG into stages is that the task dependencies are captured by the stage boundaries, and all tasks within the same stage can be executed concurrently. This DAG transformation is carried out using a modified version of Breadth-First Search, in which the stage of a node is the length of the longest path from the start node.
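To make this concrete, the following is a minimal Python sketch of the staging step; the function and variable names are our own illustration, not the paper's implementation:

```python
from collections import defaultdict, deque

def assign_stages(tasks, edges):
    """Assign each task its stage: the length of the longest path from a
    source (in-degree-0) task. Tasks sharing a stage have no mutual
    dependencies and can therefore execute concurrently."""
    succ = defaultdict(list)
    indeg = {t: 0 for t in tasks}
    for u, v in edges:                     # edge (u, v): v depends on u
        succ[u].append(v)
        indeg[v] += 1
    stage = {t: 0 for t in tasks}
    queue = deque(t for t in tasks if indeg[t] == 0)   # sources start at stage 0
    while queue:                           # Kahn-style topological traversal
        u = queue.popleft()
        for v in succ[u]:
            stage[v] = max(stage[v], stage[u] + 1)     # longest-path depth
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return stage

# A map-reduce style DAG: three parallel mappers feeding one reducer.
print(assign_stages(["m1", "m2", "m3", "r"],
                    [("m1", "r"), ("m2", "r"), ("m3", "r")]))
# -> {'m1': 0, 'm2': 0, 'm3': 0, 'r': 1}
```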
III-A2 Initiator
The initiator is the origin of the offloading process. When a device joins the network as an initiator, it quickly discovers the available participators already in the network around it and gathers the profiling information on those devices for the task that is going to be offloaded. If such information on a participator is not available, the initiator can request profiling information from a similar device in the network. If such a similar device does not exist, the participator is removed from the current offloading process and placed into profiling. It rejoins the offloading network once the profiling is done. We use the linear, independent, and additive property [22] of co-located tasks for latency estimation, which is described in Section III-C.
End-to-end latency: With the profiling information of each participator available, the initiator estimates the end-to-end latency of each task in the DAG-based application on each participator in the offloading network, which is defined as $L_{t,p}$. The estimated end-to-end latencies of executing task $t$ on the participators are ranked from low to high together with the corresponding edge device, and this procedure is described as follows:
$$L_{t,p} = L^{exec}_{t,p} + L^{model}_{t,p} + L^{data}_{t,p}, \qquad \text{s.t. } mem_t \le M^{free}_p \qquad (1)$$
Here, the end-to-end latency $L_{t,p}$ of task $t$ on participator $p$ consists of three components: the task execution latency $L^{exec}_{t,p}$, inferred from profiling data and the number of distinct types of tasks co-located on the assigned device, as tracked by the structure $E_p$; the model downloading latency $L^{model}_{t,p}$, determined by the model size and the network download speed, if the task requires the execution of a model; and the input data transmission latency $L^{data}_{t,p}$, determined by the input file size and the transmission speed between the source and destination devices. $mem_t$ is the memory required for task $t$'s execution, including memory to store data and model, whereas $M^{free}_p$ is the memory available on participator $p$. The aforementioned procedure is illustrated in Algorithm 1. A priority queue is used to store the candidate allocations of task $t$, with devices in ascending order from lowest latency to highest latency.
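The following sketch illustrates this ranking step in the spirit of Algorithm 1, under our notation; the plain-dict inputs and the `transfer_speed` callable are hypothetical stand-ins for the initiator's profiling and network-monitoring state:

```python
import heapq

def rank_allocations(task, participators, transfer_speed):
    """Rank candidate devices for one task by the estimated end-to-end latency
    of Eq. (1): execution + model download + input transfer, skipping any
    device without enough free memory."""
    queue = []
    for p in participators:
        if task["mem_required"] > p["mem_free"]:   # memory constraint of Eq. (1)
            continue
        # Execution latency grows with the number of co-located task types.
        l_exec = p["exec_latency"][task["kind"]]
        l_model = 0.0
        if task.get("model_size") and task["kind"] not in p["cached_models"]:
            l_model = task["model_size"] / p["download_speed"]
        l_data = task["input_size"] / transfer_speed(task["source"], p["name"])
        heapq.heappush(queue, (l_exec + l_model + l_data, p["name"]))
    return queue    # min-heap: the head is the lowest-latency placement
```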
Since the network conditions change dynamically, the transmission speed between devices also varies accordingly. This fluctuation in transmission speed is accounted for in our framework, as described in Section III-A3.
Let us now define $L_s = \max_{t \in s} L_t$ as the latency of stage $s$, where $L_t$ is given by (1). As a result, the total end-to-end latency of the application is equal to the sum over stages of the latency of the longest task in each stage:

$$L_G = \sum_{s=1}^{S} L_s = \sum_{s=1}^{S} \max_{t \in s} L_t$$
Probability of failure: Now, with the end-to-end latency of the task executing on each available participator, the initiator attempts to reduce failure probability and cost. Due to the unmanaged nature of personal devices, their availability may fluctuate. Therefore, it is vital to incorporate redundancy to replicate tasks given to participators with a high failure probability. To estimate the failure probability of each participator, we construct a failure prediction model for each participator, which is expressed as the exponential function [23, 22] given by

$$PF_p(\tau) = 1 - e^{-\lambda_p \tau}$$

where the failure rate $\lambda_p$ is estimated from the participator's history data and $\tau$ is the execution period. Since a task fails as soon as the device it is assigned to fails, the probability of failure for a task equals the probability that the participator executing it fails during its execution period, which is denoted by $PF_t$.
Now, the initiator estimates the failure probability of the most latency-optimal allocation for task $t$ determined in the previous phase. If $PF_t$ is greater than a threshold $\theta_{PF}$ and the number of replications $r_t$ for task $t$ is less than the maximum number of replications allowed, $\theta_R$, a weighted score $W$ is computed for the current allocation of $t$, with user-defined weights assigned to end-to-end latency, probability of failure, and cost. The framework then attempts to replicate the task on the second-most optimal option from the end-to-end latency queue in an effort to lower the failure probability. A new weighted score $W'$ is derived from the new latency, failure probability, and cost. If $W'$ is less than the original weighted score $W$, task $t$ is replicated on the second-most optimal allocation. The process continues until $PF_t$ is less than the failure threshold $\theta_{PF}$, the total number of replications exceeds the maximum allowed, or $W'$ is no smaller than $W$. All participators that execute replications of task $t$ are denoted by the set $R_t$. The above-described procedure is depicted in Algorithm 2.
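A self-contained sketch of this replication loop follows; the candidate representation and the aggregation of latency across replicas (fastest replica wins) are our own simplifying assumptions, not the paper's Algorithm 2 verbatim:

```python
import math

def task_pf(replicas, exec_time):
    # A replicated task fails only if every replica's device fails during the
    # execution period; each device follows the exponential failure model.
    return math.prod(1.0 - math.exp(-r["rate"] * exec_time) for r in replicas)

def weighted_score(replicas, exec_time, alpha, beta, gamma):
    # Assumption: the fastest replica determines the task's latency, and every
    # replica adds its own usage cost (unit cost x occupied time).
    latency = min(r["latency"] for r in replicas)
    cost = sum(r["unit_cost"] * r["latency"] for r in replicas)
    return alpha * latency + beta * task_pf(replicas, exec_time) + gamma * cost

def replicate(candidates, exec_time, alpha, beta, gamma, pf_max, r_max):
    """Greedily add the next latency-optimal device as a replica while PF_t
    exceeds theta_PF, the replica budget theta_R allows, and the weighted
    score W keeps improving. `candidates` is the Algorithm 1 ranking."""
    replicas = [candidates[0]]
    score = weighted_score(replicas, exec_time, alpha, beta, gamma)
    for cand in candidates[1:]:
        if task_pf(replicas, exec_time) <= pf_max or len(replicas) >= r_max:
            break
        trial = replicas + [cand]
        new_score = weighted_score(trial, exec_time, alpha, beta, gamma)
        if new_score >= score:        # W' no smaller than W: stop replicating
            break
        replicas, score = trial, new_score
    return replicas                   # the replication set R_t
```

Stopping as soon as the weighted score stops improving keeps replication inexpensive when the failure-probability weight is small.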
The entire application instance is deemed successfully completed when every task in the instance is successfully executed. The failure probability for the application instance is given by:

$$PF_G = 1 - \prod_{t \in G} (1 - PF_t), \qquad \text{where } PF_t = \prod_{p \in R_t} PF_p$$

Given that the tasks in the application instance are structured as a DAG, the events of successful task completions are conditional on the completions of prior tasks. Expanding the success probability by the chain rule over a topological order of $G$, each task contributes a factor $\Pr(t \text{ succeeds} \mid dep_t \text{ succeed})$; because device failures are independent of the DAG structure, each factor reduces to $1 - PF_t$, which yields the product above.
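As a quick numeric sketch under the same assumptions (the failure rates and execution times below are hypothetical):

```python
import math

def application_pf(replica_rates, exec_times):
    """PF_G for one instance: the instance succeeds only if every task
    succeeds, and a task fails only if all devices in its replication set
    R_t fail during its execution period (independent exponential failures)."""
    p_success = 1.0
    for task, rates in replica_rates.items():
        pf_t = math.prod(1.0 - math.exp(-r * exec_times[task]) for r in rates)
        p_success *= 1.0 - pf_t
    return 1.0 - p_success

# t1 replicated on two devices (failure rates per second); t2 unreplicated.
print(application_pf({"t1": [0.01, 0.02], "t2": [0.005]},
                     {"t1": 10.0, "t2": 4.0}))   # ~0.037
```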
Cost: With end-to-end latency and failure probability taken into account, the framework's final optimization objective is cost. If replication is not required, the cost of task $t$ executing on participator $p$ is determined as follows:

$$C_t = c_p \cdot L_{t,p}$$

where $c_p$ is the unit time cost for using participator $p$. If replications exist, the overall cost for task $t$ is determined as the sum of all costs incurred by each participator that executes task $t$, which can be represented as follows:

$$C_t = \sum_{p \in R_t} c_p \cdot L_{t,p}$$

The total cost of the entire application is given by

$$C_G = \sum_{t \in G} C_t$$
The final optimization problem can be expressed as:

$$\min_{X_G} \; \alpha \cdot L_G + \beta \cdot PF_G + \gamma \cdot C_G \qquad (2)$$
In this setup, the parameters $\alpha$, $\beta$, and $\gamma$ are user-defined weights that control the trade-offs among end-to-end latency, failure probability, and cost. This approach allows for fine-tuning based on the specific application requirements. Such joint optimization problems are commonly addressed in the literature by assigning different weights to linearly combined metrics, as seen in [24, 25, 26, 27].
III-A3 Participators
In the following, we detail the role of a participator within the offloading network. When a user joins as a participator, they consent to allocate their idle computational resources to the initiators within the network. A participator is restricted to involvement in a single offloading network at a time. The key function of a participator is to execute tasks that are dispatched by the initiator. Moreover, each participator must also monitor the communication speed between itself and other participators in the network (as depicted by the blue dashed arrows in Figure 3). The transmission rate between edge participators $p$ and $q$ is measured as follows:

$$v_{p,q} = \frac{2 \cdot B}{\eta \cdot RTT_{p,q}}$$

where $B$ represents the testing packet's size, $RTT_{p,q}$ represents the round-trip time required for the testing packet to be transferred from device $p$ to device $q$ and back to device $p$, and $\eta$ is the transmission speed error over-provision hyper-parameter. The reason for including such a parameter is that the currently observed network speed is used to orchestrate future tasks, and by the time offloading occurs, there may already be tasks running on the device, which can cause packet queuing and packet processing delays. This parameter provides a lower bound for the transmission speed and inhibits task offloading that could result in lengthy transmission latency. One may believe that such transmission error over-provisioning could lead to the aggregation of tasks on particular devices; we discuss this issue in Section IV-I. The function is shown in Algorithm 3.
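A minimal probing sketch in this spirit follows; the echo peer, wire format, and default $\eta$ are our assumptions, not the paper's Algorithm 3:

```python
import socket
import time

def measure_speed(peer_ip, port, payload_size=64 * 1024, eta=1.5):
    """Estimate the p->q transmission rate: send a test packet, wait for the
    peer to echo it back, and divide the bytes moved by eta * RTT so the
    estimate under-reports the speed, leaving headroom for queuing delays."""
    payload = b"\x00" * payload_size
    with socket.create_connection((peer_ip, port), timeout=5) as sock:
        start = time.monotonic()
        sock.sendall(payload)                  # device p -> device q
        received = 0
        while received < payload_size:         # q echoes the payload back
            chunk = sock.recv(65536)
            if not chunk:
                break
            received += len(chunk)
        rtt = time.monotonic() - start
    # The payload crosses the link twice per round trip; eta > 1 provides
    # the lower-bound over-provisioning described above.
    return (2 * payload_size) / (eta * rtt)    # bytes per second
```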
Additionally, participators collect the input and output data sizes of their assigned tasks, which are sent to the initiator to build and continuously update a regression model for predicting intermediate file sizes in future orchestration [28].
III-B Scheduling algorithm
We have now reviewed the framework's components; we next give a deployment overview. Each device in the offloading network can potentially run two algorithms, which we refer to as the initiator and the participator. The initiator originates offloading requests, preprocesses the DAG, and optimizes latency, probability of failure, and cost before dispatching tasks to participators in accordance with the optimized allocation scheme. Algorithm 4 illustrates the initiator algorithm.
The participator receives tasks and an allocation scheme from the initiator, as well as task inputs from other participators or the initiator. As soon as all necessary input files are ready, participators begin executing their assigned tasks. Algorithm 5 depicts the general function of the participator.
Everyone can be an initiator: Note that our framework does not rely on a centralized server to coordinate the scheduling process. Every initiator that intends to offload tasks has its own smaller-scale offloading network. In the event that multiple initiators are present, however, it must be ensured that those smaller-scale offloading networks do not overlap. For instance, if we have two initiators with respective offloading networks whose participator sets are $P_1$ and $P_2$, then the following condition must hold:

$$P_1 \cap P_2 = \emptyset$$

The participators can be regrouped into different offloading networks when one initiator completes its tasks and a new initiator joins the network.
III-C Interference based service time prediction
Previous literature demonstrated that co-located tasks on the same device interfere with one another and that this interference increases the total execution time of the tasks [7, 22]. Specifically, [22] demonstrated that there is a correlation between the execution latency of a task and the number of different types of other tasks already running on the device, and validated this correlation on multiple platforms with various operating systems and architectures. We take a similar approach in our work, profiling our test applications against all candidate devices in the testbed and characterizing the relationship between execution latency and the number of different types of tasks executing on the devices. We use a linear regression model to capture this relationship and utilize it to predict the execution time of incoming tasks. Note that a device will not be considered as a candidate for the orchestration network until its profiling is finished. Once the device is profiled, its profiling data is shared with other devices in the network to avoid redundant profiling.
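A sketch of such a per-device, per-task-type interference model follows; the profiling-log format and sample values are hypothetical:

```python
import numpy as np

def fit_interference_model(samples):
    """Fit the linear relationship between execution latency and the number
    of distinct co-located task types, for one task type on one device.
    `samples` is a profiling log of (n_colocated_types, latency) pairs."""
    n_types = np.array([s[0] for s in samples], dtype=float)
    latency = np.array([s[1] for s in samples], dtype=float)
    slope, intercept = np.polyfit(n_types, latency, deg=1)   # least squares
    return slope, intercept

def predict_latency(model, n_colocated_types):
    slope, intercept = model
    return intercept + slope * n_colocated_types

# Profiled latencies grow roughly linearly with the co-located type count.
model = fit_interference_model([(0, 1.9), (1, 2.6), (2, 3.4), (3, 4.1)])
print(predict_latency(model, 2))   # estimated service time with 2 other types
```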

III-D Scalability, privacy, and fairness
A main disadvantage of existing centralized scheduling approaches lies in their scalability. Due to the large number of devices in the network, it is impossible to check all of them and schedule a task in an efficient manner. M-TEC works around this disadvantage by letting the initiators divide the large network into smaller-scale, self-governed offloading networks based on proximity, thus significantly reducing the scheduling overhead; a sketch of this step appears below. Proximity is measured as the round-trip time from the initiator to the ED. The idea of dividing the entire orchestration network into subnets is used by [29, 30] as well. However, instead of using round-trip time to measure proximity, they use devices with localization capability as gateways to manage other devices on the same local area network. Such methods do not take network traffic into consideration, which may result in extra orchestration overhead.
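As a hedged illustration, a proximity-based grouping step could look as follows; the RTT probe, threshold, and size cap are placeholders rather than the paper's values:

```python
def build_offloading_network(initiator, devices, ping_rtt,
                             rtt_limit=0.05, max_size=14):
    """Form an initiator's self-governed offloading network: keep only the
    devices whose round-trip time to the initiator is under a threshold,
    nearest first, capped at a manageable network size."""
    reachable = [(ping_rtt(initiator, d), d) for d in devices if d != initiator]
    reachable.sort(key=lambda pair: pair[0])      # nearest devices first
    return [d for rtt, d in reachable if rtt <= rtt_limit][:max_size]
```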
Privacy is another concern when dealing with personally owned devices. There exists a large body of literature focusing on data privacy, user location privacy, and storage privacy in distributed systems [31, 32, 33, 34], which is beyond the scope of this paper. Such privacy-preserving task offloading techniques have been proven feasible in practice, for example in the field of federated learning.
IV Experiment
To assess our framework in diverse real-world scenarios, we construct a testbed comprising a CBRS 4G network and enterprise wireless routers (Cradlepoint E3000 and R1900) connected to heterogeneous edge devices. Our testbed, unlike simulation-based testbeds, does not contain any assumptions or fixed network settings (e.g., end-to-end latency, data rate, queueing delay, etc.). Instead, our testbed can connect to a CBRS 4G network to experience and assess the actual impact of wireless channel dynamics.
In contrast to the private CBRS 4G network, we were unable to conduct experiments on commercial 5G networks because, without operator support, we could not configure static and public IP addresses for the Cradlepoint gateways (which provide cellular connections to edge devices) or inbound connections for specific external IP addresses of edge devices/servers located outside the premises. Nevertheless, our results over CBRS 4G networks should be comparable to those over 5G networks with sub-6GHz frequency bands, as almost all operators currently adhere to the NSA (non-standalone) architecture in which the 5G RAN (radio access network) is connected to a 4G LTE core network [39], and the channel characteristics of the CBRS frequency band at 3.5GHz and sub-6GHz frequency bands are comparable.
Location | Device | Name | Amount | CPU | RAM |
---|---|---|---|---|---|
Cloud | Cloud Server | Desktop | 1 | 16 core (x86) | 32 GB |
Edge Cloud | Edge Server | Desktop | 1 | | |
Edge Cloud | Edge Servers | Jetson AGX Xavier | 2 | 8 core (ARM) | 16 GB |
Edge Cloud | Edge Servers | Jetson Xavier NX | 7 | 6 core (ARM) | 8 GB |
Edge Cloud | Edge Servers | Jetson TX2 | 4 | 4 core (ARM) | 8 GB |
Cell Site | Initiator (ED), ED 1 | Up Squared AI Edge | 2 | 4 core (x86) | 8 GB |
Cell Site | ED 2, ED 3 | Jetson AGX Xavier | 2 | 8 core (ARM) | 16 GB |
IV-A Experiment scenarios
To evaluate our system, we design and conduct three distinct types of real-world experiments.
• Offloading experiments for a two-tier edge cloud and cloud scenario connected through wired Internet, which aim to quantify the relationship between end-to-end latency and the number of edge cloud devices involved in the offloading.
• Offloading experiments for a three-tier cloud, edge cloud, and cell site scenario connected over wired and wireless links, which aim to show the feasibility of using cell site devices for computation offloading.
• Offloading experiments with various network dynamics, which examine the performance of task offloading with realistic background TCP traffic.
Figure 4 depicts an overview of the testbed setup and experiment system architecture. The testbed is divided into three tiers. The cloud tier consists of one powerful PC, the edge cloud tier is a combination of edge devices and edge servers, and the cell site is a combination of user devices.
IV-A1 Two-tier edge computing at edge and cloud
First, we consider an Ethernet-based wired interconnection in an edge cloud that adheres to the traditional cloud model in order to offload tasks both at the edge and in the cloud. In addition, to capture fluctuating computational resources, we run four experiment iterations, randomly selecting a different number of edge servers in each iteration (i.e., 5, 8, 11, and 14). The cloud and edge servers given in Table III span four distinct hardware specifications: desktop, Jetson AGX Xavier, Jetson Xavier NX, and Jetson TX2. All communications between endpoints within the edge cloud (e.g., TCP connection setup, data transmission, etc.) are conducted via wired lines.
IV-A2 Three-tier edge computing in CBRS 4G network
Instead of depending solely on the power of the edge cloud and cloud server, we construct a three-tier edge computing system by incorporating a cell site operated on a CBRS 4G network. The heterogeneity of the cell site is increased by the presence of four devices of two distinct types (i.e., Up Squared AI Edge and Jetson AGX Xavier). From a network perspective, end-device communication now utilizes both wireless and wired connections. This is an example of how mobile edge computing typically facilitates compute offloading for end users. In addition, we decrease the number of selected edge servers in the edge cloud to a maximum of five in order to observe the performance of each offloading scheme over the wireless connections. This increases the likelihood that tasks will be dispersed among edge devices at the cell site, hence enhancing the potential for additional data transmission over wireless networks.
IV-A3 Wireless channel dynamics and TCP traffic flow
Extending the second experiment, we evaluate the performance of the schemes and underlying theories in a real-world network environment with variable network conditions. By having edge devices continuously transmit and receive TCP packets from one another, we introduce TCP background traffic into the testbed. Under this configuration, the eNodeB (eNB) uplink and downlink channels are stressed most directly. In addition, we have the initiator continuously send TCP packets to edge servers to simulate different types of background traffic. The TCP traffic flows are depicted in Figure 4.
In a nutshell, the cloud and mobile edge computing paradigms, as well as a realistic edge computing scenario with customizable network traffic control, are respectively represented by the three experiments. In the following section, we provide the essential setup for conducting these real-world experiments on our testbed.
IV-B Testbed setup
At the edge cloud, we deploy edge servers with Ethernet connections in accordance with the standard data center architecture. The cloud server is not co-located with the edge servers, even though they are installed in the same region. With the support of network administrators, both cloud servers and edge servers can readily acquire static and public IP addresses. In contrast, at the cell site, we individually connect four end devices to Cradlepoint enterprise wireless routers. The wireless routers have SIMs for CBRS 4G network access. The edge cloud, wireless routers, and CBRS 4G network are then configured to provide connected devices with static and public IP addresses. To accomplish computation offloading, we enable peer-to-peer (P2P) connections between edge devices. In this way, we construct a three-tier mobile edge computing testbed that operates on a CBRS 4G network. Note that the initiator in our testbed can be any edge device, including edge servers. This is a plausible scenario in which any of us might require computation offloading.
Next, with the configuration described above, we populate the testbed with TCP traffic traveling from end to end. As explained in Section IV-A3, in order to observe the impact of wireless channel dynamics on our framework, edge devices at the cell site continually send TCP packets to each other, and the initiator continuously transmits TCP packets to edge servers in the edge cloud. To bring network traffic into our testbed, we install the Linux tool iperf3 on each end device, which subsequently hosts an iperf3 session to receive TCP packets from corresponding clients. The TCP session’s congestion control algorithm is cubic, and the TCP traffic flows are depicted in Figure 4.
We have now discussed the construction of a testbed on a CBRS 4G network that accommodates three types of realistic experiments, and we noted the paradigm or scenario that each experiment illustrates. In the following sections, we evaluate our framework and other compute offloading schemes on this testbed using the aforementioned three experiments. We evaluate performance from multiple perspectives, including end-to-end latency, failure probability, and dollar cost.
IV-C Performance metrics
End-to-end latency: We define the end-to-end latency of an application instance as the time between the dispatch of the initial task and the receipt of the final result. In our evaluation, application instances may arrive in a clustered fashion, which may result in tasks accumulating on edge devices and longer end-to-end latency for some instances. As a result, in all of our evaluations, we use the average end-to-end latency of application instances.
Probability of failure (PF): The probability of failure for an application instance is the probability that it does not complete all of its tasks successfully and fails to return the result to the initiator. Tasks may fail due to the sporadic availability of edge devices or excessively long execution times (e.g., a person leaves the room with their laptop in the middle of task execution).
Dollar cost ($): Using commercially managed edge devices comes at a cost. Throughout our evaluation, we assign dollar costs to each device using a similar pricing structure as Amazon EC2 instances [40].
IV-D Testing applications
We evaluate the performance of M-TEC using four applications from diverse domains, including data science (MapReduce sort), machine learning (LightGBM, video analytics), and mathematics (matrix computation). The DAG structure of these four applications is depicted in Figure 5. MapReduce sort: parallel mappers fetch the input job and generate intermediate files, which are consumed by a reducer to produce the final result. LightGBM: trains several decision trees in parallel and combines them to form a random forest predictor. Video analytics: splits videos into batches and processes them in parallel to generate analytical results. Matrix operation: heavy matrix computation. These four applications span a variety of dependency levels among tasks and are used to test the generality of M-TEC.

IV-E Latency
IV-E1 End-to-end latency comparison

To demonstrate that M-TEC can lower the average end-to-end latency of an application instance, five edge devices are randomly selected from our testbed and added to our orchestration network. We programmed the initiator to send 100 application instance requests at random times within 250 s and performed the experiment ten times. By repeating the experiments, we are able to retrieve the average performance of the orchestration scheme as well as examine the overall system performance under randomly generated peak load, which was observed as more than 10 application instances within 5 seconds. We then compare the average end-to-end latency of application instances under each orchestration scheme. Because none of the baselines account for the probability of device failure or network variation, we did not induce any device failures in this experiment, and all five edge devices in the orchestration network are connected via LAN. As seen in Figure 6, M-TEC outperforms all baselines. In addition, the performance of M-TEC is stable, as the average end-to-end latency does not vary significantly across the 10 test cycles.
IV-E2 Latency vs. number of devices
As more devices are added to the orchestration network, the average end-to-end latency should ideally keep decreasing. However, such an assumption is implausible and would necessitate a vast number of devices in the orchestration network; additionally, distributing tasks across several devices incurs communication latency. To demonstrate the benefits of M-TEC, we conducted the latency test with varying numbers of edge devices in the network. Figure 7 shows the trend of the average end-to-end latency of the MapReduce sort application for M-TEC and four baselines. We omitted the two intuitive baselines RR and RD from Figure 6 since they are out of scale. The average end-to-end latency for M-TEC and the baselines decreases as the number of devices in the orchestration network increases. M-TEC continues to outperform the second-best baseline by more than 50 percent when fourteen devices are in the orchestration network. Moreover, we observe that DCC's performance fluctuates over the course of the test. This is because its device clustering method can generate clusters with uneven computation capabilities, making some of them performance bottlenecks. We would like to point out that as more devices are added to the network, the performance improvement diminishes while orchestration overhead grows. Therefore, the number of devices in an initiator's network should not grow unbounded.

IV-F Probability of application failure
To examine the efficacy of M-TEC in reducing the failure probability, we allowed each of the 14 testbed devices to fail according to an exponential distribution. Each device's failure rate is carefully chosen and can be linked to real-world device failure data collected by us [23]. Figure 8 depicts the average probability of failure when executing 100 application instances in 250 seconds. We observe that M-TEC reduces the probability of failure to approximately 20%. The performance of LaTS [9] is equivalent to that of M-TEC; however, to achieve such a low probability of failure, LaTS distributes the majority of the work to devices with powerful computational capabilities. This allocation technique is not ideal, since the failure of a single such device can result in the failure of a significant number of applications.

IV-G Effect of dynamic network conditions
Transmission latency varies greatly in a heterogeneous computing platform where devices are connected to different networks in different locations. Figure 9 depicts the processing delay and transmission latency of the video analytics application distributed across our testbed in a round-robin fashion. For certain tasks, such as splitting the video and transmitting the split segments, transmission latency can account for up to 60% of the overall latency.

To evaluate the performance of M-TEC in a heterogeneous and dynamic network environment, we conduct a real test on the CBRS 4G network and inject network traffic into the wireless connections using iperf. Figure 10 depicts the average end-to-end latency of the video analytics application under various network traffic scenarios. We compare M-TEC to LAVEA [11] and Petrel [10], the two top performers in the no-traffic scenario. When there is no traffic in the network, all three systems perform similarly, with M-TEC winning by a small margin. Once we begin to introduce traffic, however, M-TEC adapts to the network condition and maintains a rather consistent performance with less than 10% degradation, whereas neither Petrel nor LAVEA is able to do so, and their latency increases by more than 80%.

IV-H Hyper-parameters
We have now evaluated the performance of M-TEC in terms of lowering latency and failure probability. Next, we demonstrate the performance of M-TEC's joint optimization in (2). As depicted in Figures 11, 12, and 13, we swept the hyper-parameters $\alpha$, $\beta$, and $\gamma$ to demonstrate the joint optimization performance of M-TEC.

In Figure 11, we hold the hyper-parameter $\alpha$, which represents the weight associated with end-to-end latency, constant at 0.1 and sweep $\beta$ and $\gamma$. The result demonstrates that as the weight shifts from failure probability ($\beta$) to cost ($\gamma$), M-TEC gradually reduces the number of task replications, resulting in an increase in failure probability and a decrease in the average cost per application instance. At one point in the sweep, instead of falling, the average cost per application momentarily increases. This is because, at this stage, there is a large decrease in end-to-end latency (associated with weight $\alpha$), which can yield a better overall weighted score than simply reducing the average cost. The general pattern remains unchanged: as the probability of failure increases, the average cost drops.
Figure 12 depicts the sweep of hyper-parameters $\alpha$ and $\beta$, while $\gamma$ is held constant at 0.1. The result indicates that when $\alpha$ is given greater weight, the average end-to-end latency of application instances decreases and the probability of failure rises. Similarly, we also see a temporary rise in average end-to-end latency (less obvious than in Figure 11); such behavior is caused by the cost weight $\gamma$: at one point in the sweep, the framework chooses to sacrifice latency in return for reduced cost, which gives an overall better weighted score.

Figure 13 illustrates the sweep of hyper-parameters $\alpha$ and $\gamma$, while holding $\beta$ constant at 0.1. Increasing the weight $\alpha$ improves the average end-to-end latency of application instances while increasing the average cost per application. As observed, the latency plot varies during the experiment. This is because when the cost is given less importance, devices with more computational capability and dependability are more likely to be chosen, resulting in a large reduction in failure probability. Consequently, the plot is marked by such variation.

IV-I Transmission speed error over-provision

To determine how the transmission speed over-provision parameter $\eta$ influences task allocation, we gather the load on each device for $\eta$ values ranging from 1 to 3 in 0.2-step increments for the matrix computation application. The outcome is depicted in Figure 14. At various $\eta$ values, there is a wide range of load on each device, but in the majority of cases, the load on a device falls between 30 and 60, which accounts for less than 15 percent of all tasks dispatched (we send 100 application instances and each instance contains 5 tasks, so 500 tasks in total). As a result, changing the value of $\eta$ does not result in the aggregation of tasks on specific devices, which could otherwise lead to catastrophic failure.
V Related work
Task scheduling in heterogeneous edge computing has been the subject of various studies in the literature. This section compares our work to those of our predecessors in this field.
Multi-tier edge computing: One of the primary goals of edge computing is to achieve minimal end-to-end latency in order to facilitate latency-sensitive applications. Several prior works [9, 11, 10, 12, 24] offered scheduling algorithms for client-edge offloading that target various optimization objectives, such as latency, energy, and resource allocation. We have demonstrated that M-TEC outperforms LaTS [9], LAVEA [11], and Petrel [10], which are all latency-optimized schemes. There is a growing body of work [15, 41] on multi-layer edge computing. The majority of such multi-layer architectures adopt a bottom-up approach in which task offloading occurs only across layers or inside edge layers, but not within client layers. For instance, an end device can only offload tasks to edge servers or clouds, not to other end devices. Client-edge offloading is a subset of multi-layer offloading architectures with only two layers. Few works have evaluated within-layer task offloading [8]. However, none of the literature provides a framework that uses heterogeneous devices from diverse networks to create a dependable edge computing platform that jointly minimizes end-to-end latency, failure probability, and cost.
Dynamic and heterogeneous network: One of the bottlenecks for edge computing is network speed, which determines the rate at which tasks can be dispatched to their assigned edge nodes. Such transmission latency cannot be overlooked in a network environment that is dynamic. The vast majority of the available literature focuses solely on the network delay caused by dynamic network conditions [42, 43], sometimes caused by adversarial actions [44]. [11] briefly discussed the various performance of heterogeneous networks. In the course of our experiments, we have demonstrated that M-TEC is able to adapt task allocation in diverse networks (Ethernet/CBRS 4G).
Machine learning tasks on edge: Recent research attempts to offload machine learning and deep learning tasks to edge platforms [45, 46]. For instance, [46] proposed RT-mDL, a framework that aims to reduce the rate of missed deadlines by employing a novel model scaling method and by utilizing joint model selection and task priority assignment to improve the performance and GPU/CPU resource utilization for traffic light detection and sign recognition. This offloading approach is promising for some applications but cannot easily be adapted to others, hence it lacks generality. In contrast, M-TEC is capable of handling a variety of application types given the availability of their profiling information.
VI Discussion and Future Work
In this section, we discuss the constraints of M-TEC as well as various performance-enhancing options.
First, the current algorithm begins by comparing each incoming task to all available edge devices. When many edge devices are accessible, this procedure may result in a large orchestration overhead for simple tasks. Even though our experiments demonstrated that adding redundant edge devices does not improve performance (Section IV-E2), we can still cluster edge devices based on their performance. The orchestration burden is thereby reduced from the number of devices to the number of clusters. For edge device clustering, any of various existing approaches, such as [47, 48, 49] or the K-means clustering used in DCC [8], can be employed.
Second, the current offloading strategy delegates each task in the application as a whole to an edge device. However, some tasks within an application may be particularly resource-intensive, so they could be subdivided into partial tasks and offloaded to multiple edge devices. There is an abundance of literature on partial offloading [50, 51, 52].
Third, the current algorithm uses a hyper-parameter to account for the error between the network speed measured at the time of task orchestration and the actual network speed at the time of task execution. This transmission error over-provision establishes a lower bound for the transmission rate and prevents task offloading that could result in long transmission latency under potentially adverse network conditions. This hyper-parameter could instead be learnable, meaning the algorithm could learn to set it throughout the orchestration process based on the past accuracy of its network estimates.
VII Conclusion
In this paper, we presented M-TEC, a novel multi-tier edge computing framework capable of running complex DAG-based applications. Importantly, M-TEC incorporates client-to-client offloading in addition to the conventional client-edge and client-cloud offloading. We proposed an algorithm for the joint optimization of end-to-end latency, failure probability, and cost that takes into account the unpredictability of network and device failure. We evaluated M-TEC on a real-device testbed with a commercial CBRS 4G network using four applications spanning diverse DAG structures, and compared M-TEC to four state-of-the-art edge scheduling schemes: DCC, LaTS, LAVEA, and Petrel. We find that M-TEC reduces the end-to-end latency of applications by at least 8%, the average probability of failure by more than 40%, and the average cost per application by around 12% compared to the best baseline.
References
- [1] B. Varghese, N. Wang, S. Barbhuiya, P. Kilpatrick, and D. S. Nikolopoulos, “Challenges and opportunities in edge computing,” in 2016 IEEE International Conference on Smart Cloud (SmartCloud). IEEE, 2016, pp. 20–26.
- [2] M. Chiang and T. Zhang, “Fog and IoT: An overview of research opportunities,” IEEE Internet of Things Journal, vol. 3, no. 6, pp. 854–864, 2016.
- [3] X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan, and X. Chen, “Convergence of edge computing and deep learning: A comprehensive survey,” IEEE Communications Surveys & Tutorials, vol. 22, no. 2, pp. 869–904, 2020.
- [4] K. Cao, Y. Liu, G. Meng, and Q. Sun, “An overview on edge computing research,” IEEE Access, vol. 8, pp. 85 714–85 728, 2020.
- [5] Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, and J. Zhang, “Edge intelligence: Paving the last mile of artificial intelligence with edge computing,” Proceedings of the IEEE, vol. 107, no. 8, pp. 1738–1762, 2019.
- [6] A. Yousefpour, C. Fung, T. Nguyen, K. Kadiyala, F. Jalali, A. Niakanlahiji, J. Kong, and J. P. Jue, “All one needs to know about fog computing and related edge computing paradigms: A complete survey,” Journal of Systems Architecture, vol. 98, pp. 289–330, 2019.
- [7] S. Suryavansh, C. Bothra, K. T. Kim, M. Chiang, C. Peng, and S. Bagchi, “I-bot: Interference-based orchestration of tasks for dynamic unmanaged edge computing,” arXiv preprint arXiv:2011.05925, 2020.
- [8] A. Naouri, H. Wu, N. A. Nouri, S. Dhelim, and H. Ning, “A novel framework for mobile-edge computing by optimizing task offloading,” IEEE Internet of Things Journal, vol. 8, no. 16, pp. 13 065–13 076, 2021.
- [9] W. Zhang, S. Li, L. Liu, Z. Jia, Y. Zhang, and D. Raychaudhuri, “Hetero-edge: Orchestration of real-time vision applications on heterogeneous edge clouds,” in IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, 2019, pp. 1270–1278.
- [10] L. Lin, P. Li, J. Xiong, and M. Lin, “Distributed and application-aware task scheduling in edge-clouds,” in 2018 14th International Conference on Mobile Ad-Hoc and Sensor Networks (MSN), 2018, pp. 165–170.
- [11] S. Yi, Z. Hao, Q. Zhang, Q. Zhang, W. Shi, and Q. Li, “Lavea: Latency-aware video analytics on edge computing platform,” in 2nd ACM/IEEE Symposium on Edge Computing, 2017, pp. 1–13.
- [12] K. Cheng, Y. Teng, W. Sun, A. Liu, and X. Wang, “Energy-efficient joint offloading and wireless resource allocation strategy in multi-MEC server systems,” in 2018 IEEE International Conference on Communications (ICC), 2018, pp. 1–6.
- [13] Z. Zhao, R. Zhao, J. Xia, X. Lei, D. Li, C. Yuen, and L. Fan, “A novel framework of three-hierarchical offloading optimization for MEC in industrial IoT networks,” IEEE Transactions on Industrial Informatics, vol. 16, no. 8, pp. 5424–5434, 2020.
- [14] M. Huang, W. Liu, T. Wang, A. Liu, and S. Zhang, “A cloud–MEC collaborative task offloading scheme with service orchestration,” IEEE Internet of Things Journal, vol. 7, no. 7, pp. 5792–5805, 2020.
- [15] P. Wang, Z. Zheng, B. Di, and L. Song, “Hetmec: Latency-optimal task assignment and resource allocation for heterogeneous multi-layer mobile edge computing,” IEEE Transactions on Wireless Communications, vol. 18, no. 10, pp. 4942–4956, 2019.
- [16] R. Yu, G. Xue, and X. Zhang, “Application provisioning in fog computing-enabled Internet-of-things: A network perspective,” in IEEE INFOCOM 2018 - IEEE Conference on Computer Communications, 2018, pp. 783–791.
- [17] Q. Fan and N. Ansari, “Workload allocation in hierarchical cloudlet networks,” IEEE Communications Letters, vol. 22, no. 4, pp. 820–823, 2018.
- [18] G. Hu, Y. Jia, and Z. Chen, “Multi-user computation offloading with D2D for mobile edge computing,” in 2018 IEEE Global Communications Conference (GLOBECOM), 2018, pp. 1–6.
- [19] Y. Sahni, J. Cao, L. Yang, and Y. Ji, “Multihop offloading of multiple dag tasks in collaborative edge computing,” IEEE Internet of Things Journal, vol. 8, no. 6, pp. 4893–4905, 2021.
- [20] T. Elgamal, A. Sandur, K. Nahrstedt, and G. Agha, “Costless: Optimizing cost of serverless computing through function fusion and placement,” in 2018 IEEE/ACM Symposium on Edge Computing (SEC), 2018, pp. 300–312.
- [21] Q. Pu, S. Venkataraman, and I. Stoica, “Shuffling, fast and slow: Scalable analytics on serverless infrastructure,” ser. NSDI’19. USA: USENIX Association, 2019, p. 193–206.
- [22] X. Li, M. Abdallah, S. Suryavansh, M. Chiang, K. T. Kim, and S. Bagchi, “Dag-based task orchestration for edge computing,” in 2022 41st International Symposium on Reliable Distributed Systems (SRDS), 2022, pp. 23–34.
- [23] H. Zhang, M. A. Roth, R. K. Panta, H. Wang, and S. Bagchi, “Crowdbind: Fairness enhanced late binding task scheduling in mobile crowdsensing,” 2020 International Conference on Embedded Wireless Systems and Networks, p. 61–72, 2020.
- [24] X. Ran, H. Chen, X. Zhu, Z. Liu, and J. Chen, “Deepdecision: A mobile deep learning framework for edge video analytics,” IEEE Conference on Computer Communications (INFOCOM), pp. 1421–1429, 2018.
- [25] E. El Haber, T. M. Nguyen, and C. Assi, “Joint optimization of computational cost and devices energy for task offloading in multi-tier edge-clouds,” IEEE Transactions on Communications, vol. 67, no. 5, pp. 3407–3421, 2019.
- [26] C. Zhu, G. Pastor, Y. Xiao, Y. Li, and A. Ylae-Jaeaeski, “Fog following me: Latency and quality balanced task allocation in vehicular fog computing,” in 2018 15th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), 2018, pp. 1–9.
- [27] M. Qin, N. Cheng, Z. Jing, T. Yang, W. Xu, Q. Yang, and R. R. Rao, “Service-oriented energy-latency tradeoff for iot task partial offloading in mec-enhanced multi-rat networks,” IEEE Internet of Things Journal, vol. 8, no. 3, pp. 1896–1907, 2021.
- [28] A. Mahgoub, K. Shankar, S. Mitra, A. Klimovic, S. Chaterji, and S. Bagchi, “Sonic: Application-aware data passing for chained serverless applications,” in USENIX Symposium on Operating Systems Design and Implementation, 2021, pp. 973 – 988.
- [29] T. Han, L. Zhang, S. Pirbhulal, W. Wu, and V. H. C, “Computer networks,” vol. 158, 2019, pp. 114–122.
- [30] X. Sun and N. Ansari, “Edgeiot: Mobile edge computing for the Internet of things,” IEEE Communications Magazine, vol. 54, no. 12, pp. 22–29, 2016.
- [31] A. Papadimitriou, A. Narayan, and A. Haeberlen, “Dstress: Efficient differentially private computations on distributed data,” ser. EuroSys ’17. Association for Computing Machinery, 2017, p. 560–574.
- [32] G. D. H. Hunt, R. Pai, M. V. Le, H. Jamjoom, S. Bhattiprolu, R. Boivie, L. Dufour, B. Frey, M. Kapur, K. A. Goldman, R. Grimm, J. Janakirman, J. M. Ludden, P. Mackerras, C. May, E. R. Palmer, B. B. Rao, L. Roy, W. A. Starke, J. Stuecheli, E. Valdez, and W. Voigt, “Confidential computing for openpower,” ser. EuroSys ’21. New York, NY, USA: Association for Computing Machinery, 2021, p. 294–310.
- [33] J. Zhang, B. Chen, Y. Zhao, X. Cheng, and F. Hu, “Data security and privacy-preserving in edge computing paradigm: Survey and open issues,” IEEE Access, vol. 6, pp. 18 209–18 237, 2018.
- [34] W. Tong, B. Jiang, F. Xu, Q. Li, and S. Zhong, “Privacy-preserving data integrity verification in mobile edge computing,” in 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), 2019, pp. 1007–1018.
- [35] S. Bian, X. Huang, and Z. Shao, “Online task scheduling for fog computing with multi-resource fairness,” in 2019 IEEE 90th Vehicular Technology Conference (VTC2019-Fall), 2019, pp. 1–5.
- [36] S. Tang, C. Yu, and Y. Li, “Fairness-efficiency scheduling for cloud computing with soft fairness guarantees,” IEEE Transactions on Cloud Computing, vol. 10, no. 3, pp. 1806–1818, 2022.
- [37] M. Zhao, W. Li, L. Bao, J. Luo, Z. He, and D. Liu, “Fairness-aware task scheduling and resource allocation in uav-enabled mobile edge computing networks,” IEEE Transactions on Green Communications and Networking, vol. 5, no. 4, pp. 2174–2187, 2021.
- [38] H. Arabnejad and J. Barbosa, “Fairness resource sharing for dynamic workflow scheduling on heterogeneous systems,” in 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications, 2012, pp. 633–639.
- [39] 3GPP. (2019, April) Release 15.
- [40] “Amazon ec2 on-demand pricing,” 2020, accessed: 2021-12-21. [Online]. Available: https://aws.amazon.com/ec2/pricing/on-demand/
- [41] P. Wang, B. Di, L. Song, and N. R. Jennings, “Multi-layer computation offloading in distributed heterogeneous mobile edge computing networks,” IEEE Transactions on Cognitive Communications and Networking, vol. 8, no. 2, pp. 1301–1315, 2022.
- [42] S. Misra and N. Saha, “Detour: Dynamic task offloading in software-defined fog for iot applications,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 5, pp. 1159–1166, 2019.
- [43] H. A. Alameddine, S. Sharafeddine, S. Sebbah, S. Ayoubi, and C. Assi, “Dynamic task offloading and scheduling for low-latency iot services in multi-access edge computing,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 3, pp. 668–682, 2019.
- [44] A. Mitra, J. A. Richards, S. Bagchi, and S. Sundaram, “Resilient distributed state estimation with mobile agents: overcoming byzantine adversaries, communication losses, and intermittent measurements,” Autonomous Robots, vol. 43, pp. 743–768, 2019.
- [45] L. L. Zhang, S. Han, J. Wei, N. Zheng, T. Cao, Y. Yang, and Y. Liu, “nn-meter: towards accurate latency prediction of deep-learning model inference on diverse edge devices,” in Proceedings of the 19th Annual International Conference on Mobile Systems, Applications, and Services, 2021, pp. 81–93.
- [46] N. Ling, K. Wang, Y. He, G. Xing, and D. Xie, “Rt-mdl: Supporting real-time mixed deep learning tasks on edge platforms,” in Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems, 2021, p. 1–14.
- [47] A. Asensio, X. Masip-Bruin, R. Durán, I. de Miguel, G. Ren, S. Daijavad, and A. Jukan, “Designing an efficient clustering strategy for combined fog-to-cloud scenarios,” Future Generation Computer Systems, p. 392–406, 2020.
- [48] L. Shooshtarian, D. Lan, and A. Taherkordi, “A clustering-based approach to efficient resource allocation in fog computing,” Pervasive Systems, Algorithms and Networks, p. 207–224, 2019.
- [49] M. Bouet and V. Conan, “Mobile edge computing resources optimization: A geo-clustering approach,” IEEE Transactions on Network and Service Management, vol. 15, no. 2, pp. 787–796, 2018.
- [50] Z. Kuang, L. Li, J. Gao, L. Zhao, and A. Liu, “Partial offloading scheduling and power allocation for mobile edge computing systems,” IEEE Internet of Things Journal, vol. 6, no. 4, pp. 6774–6785, 2019.
- [51] J. Ren, G. Yu, Y. Cai, Y. He, and F. Qu, “Partial offloading for latency minimization in mobile-edge computing,” in GLOBECOM 2017 - 2017 IEEE Global Communications Conference, 2017, pp. 1–6.
- [52] Z. Ning, P. Dong, X. Kong, and F. Xia, “A cooperative partial computation offloading scheme for mobile edge computing enabled Internet of things,” IEEE Internet of Things Journal, vol. 6, no. 3, pp. 4804–4814, 2019.