A Multi-Layered Distributed Computing Framework for Enhanced Edge Computing
Abstract
The rise of the Internet of Things and edge computing has shifted computing resources closer to end-users, benefiting numerous delay-sensitive, computation-intensive applications. To speed up computation, distributed computing is a promising technique that allows parallel execution of tasks across multiple compute nodes. However, current research predominantly revolves around the master-worker paradigm, limiting resource sharing within one-hop neighborhoods. This limitation can render distributed computing ineffective in scenarios with limited nearby resources or constrained/dynamic connectivity. In this paper, we address this limitation by introducing a new distributed computing framework that extends resource sharing beyond one-hop neighborhoods by exploiting layered network structures. Our framework transforms the network graph into a sink tree and formulates a joint optimization problem based on the layered tree structure for task allocation and scheduling. To solve this problem, we propose two exact methods that find optimal solutions and three heuristic strategies that improve efficiency and scalability. The performance of these methods is analyzed theoretically and evaluated through comprehensive simulation studies. The results demonstrate clear advantages over traditional distributed computing and computation offloading strategies.
Index Terms:
Edge computing, Distributed computing, Multi-hop offloading.

I Introduction
The proliferation of Internet of Things (IoT) devices has enabled a multitude of delay-sensitive yet computation-intensive applications, such as face recognition, gaming, environment monitoring, and augmented/virtual reality [9]. The surge of these applications drives the migration of computing resources from the remote cloud to the network edge, closer to end-users [10]. While computing at the edge offers compelling benefits, such as low latency, cost effectiveness, and improved data control and security, it also presents notable challenges. The distributed nature of edge servers, along with their inherent constraints in computing power, memory capacity, and available bandwidth when compared to the cloud, poses significant challenges to achieving high-performance edge computing [36].
To speed up computation, distributed computing can be employed. Existing distributed computing strategies typically adopt a master-worker paradigm [26], where a single master node partitions and distributes the task to multiple worker nodes that are directly connected to it. Although the master-worker paradigm is simple to implement, it restricts resource sharing to one-hop neighborhoods. In the edge computing paradigm with constrained computing nodes, scenarios may arise in which the residual resources at nearby edge servers are very limited or even smaller than those at the master node, rendering such master-worker based distributed computing ineffective. This challenge becomes particularly pronounced in edge networks with restricted or dynamic connectivity, such as networked airborne computing systems composed of drones serving as edge servers [30, 46].
In this paper, we overcome these challenges by exploring resources at distant servers located multiple hops away. While a similar idea has been explored in the mobile edge computing (MEC) [23, 38, 29, 7, 44, 4, 25] and Internet of Vehicles [6, 28, 49, 12] domains, where computation offloading is proposed to address users’ and vehicles’ computing demands, most existing studies focus on offloading tasks to a single MEC server located one or multiple hops away, with intermediate servers serving solely as relays. A few recent works [48, 43, 47, 15] have considered offloading tasks to multiple servers, but they overlook the task scheduling issue addressed in this work and offer only heuristic solutions. To the best of our knowledge, this is the first systematic investigation into computation offloading to multiple servers across multiple hops and into the benefits of layered structures for improving distributed computing performance. Additionally, we present efficient exact solutions for optimal task allocation and scheduling, which generalize existing solutions and are applicable to a wide range of networked computing systems.
The main contributions are summarized as follows:
• A new multi-layered distributed computing framework: The proposed framework explores layered network structures to fully utilize the capacity of the entire edge computing network for enhanced system performance. It transforms the network graph into a sink tree, based on which a mixed integer programming (MIP) problem is formulated to jointly optimize task allocation and scheduling. Notably, this framework can be applied to any networked computing system, such as cloud computing, MEC, and networked airborne computing systems.
• Two exact methods that find optimal solutions: To solve the optimization problem, we propose two exact methods, including a centralized method and a parallel enhancement of it. We also present an offline-online computation scheme that allows both methods to execute in real-time and handle dynamic and mobile networks. How these methods generalize existing solutions is also discussed.
• Three heuristic methods for improving efficiency and scalability: While the two exact methods offer optimality, their execution time grows rapidly as the network expands. To mitigate this challenge, we introduce two worker selection methods, as well as a genetic algorithm for efficiently finding near-optimal solutions.
• Comprehensive simulation studies: To evaluate the performance of the proposed approaches, we conduct extensive comparison studies. Our results demonstrate that enabling resource sharing within the entire network leads to better solutions compared to those found by the traditional distributed computing and computation offloading strategies. Additionally, simulations are conducted to assess the time efficiency of the proposed approaches and the impact of their key parameters.
It should be noted that the multi-layered distributed computing framework was initially introduced in a short conference version [32], which presented a different heuristic method for solving the MIP problem. This paper provides a more comprehensive and systematic investigation, featuring a set of new and rigorously analyzed methods.
The rest of the paper is organized as follows. Sec. II discusses related works. Sec. III describes the system model and the problem to be solved. Sec. IV introduces the proposed multi-layered distributed computing framework. The two exact methods and the three heuristic methods are introduced in Sec. V and Sec. VI, respectively. Sec. VII presents the results of simulation studies. Finally, Sec. VIII concludes the paper and discusses future work.
II Related Works
This section reviews existing studies related to this work.
II-A Distributed Computing
In the field of distributed computing, the master-worker paradigm has been widely used to implement parallel applications [26, 39, 41, 40, 45]. In this paradigm, multiple workers share the workload assigned by the master and communicate directly with the master. To determine the optimal task allocation, various distributed computing strategies have been proposed. For instance, traditional server-based distributed computing systems often divide the workload among workers equally or proportionally according to workers’ computing power [41]. In heterogeneous systems or those with mobile compute nodes, stragglers, which are nodes with long response times, are common and can significantly degrade system performance. To mitigate the impact of stragglers, coded distributed computing techniques [39, 41, 40] have recently become increasingly popular. These techniques leverage coding theory to introduce redundancies into computations.
Another popular paradigm is the hierarchical master-worker paradigm [1], which involves a supervisor process managing multiple sets of processes, each consisting of a master process and multiple worker processes. It offers several advantages over the traditional master-worker paradigm, including improved scalability and fault tolerance [1]. Differing from these paradigms, we investigate a multi-layer master-worker paradigm that is composed of a single master and multiple workers operating at different layers.
II-B Computation Offloading
In the computation offloading domain, most existing studies consider a single-hop single-server offloading paradigm, where tasks are offloaded from users to a single edge server within their communication range [23, 38, 29, 25, 7, 4, 44]. The tasks can be offloaded as a whole or partially, known as binary offloading and partial offloading, respectively. Under this paradigm, many algorithms have been designed to make optimal offloading decisions. For instance, studies in [23, 38] examine the scenario where tasks are offloaded from a single user to a single nearby server. [29] extends this analysis by taking server mobility and task dependency into account. There have also been studies that consider tasks from multiple users, optimizing not only the offloading decisions (whether to offload a task or determining the offloading ratio) but also the allocation of resources to each user. For instance, [25] considers the allocation of computing resources, while [7, 4] address the allocation of both computing and transmission power resources. The allocation of communication resources, including time slots under the Time Division Multiple Access protocol and sub-channels under the Orthogonal Frequency-Division Multiple Access (OFDMA) protocol, is considered in [44]. These problems are typically solved using numerical approaches [44, 4]. Reinforcement learning has also emerged as a promising tool for computation offloading [7, 25].
In practical scenarios, it is possible that there are no (powerful) edge servers near the end users. To address this limitation, researchers have started to explore multi-hop offloading, enabling the offloading of tasks from users to remote servers multiple hops away. Along this direction, existing studies mostly consider offloading tasks to a single server, as seen in [19, 42, 6, 28, 49, 12]. Additionally, the three-tier network topology comprising end users, edge servers, and cloud servers [35, 27, 37, 2] has garnered considerable interest. In these approaches, tasks are offloaded either to a nearby edge server one hop away or to a cloud server two hops away, with edge servers acting as relays. In contrast, we consider all servers reachable by users and aim to facilitate collaborative computing among them.
There are also several works that investigate partitioning tasks into multiple parts and offloading these parts to multiple servers, which are most relevant to our study [48, 47, 15, 43]. For instance, [48] investigates the joint routing and multi-part offloading for both data and result. It employs a flow model to capture data/result traffic and introduces a distributed algorithm that finds optimal solutions in polynomial time. [43] formulates the multi-hop offloading problem as a potential game. By dividing tasks into subtasks of equal size, each device independently decides the number of subtasks to forward or compute based on its economic utility. The study in [15] addresses the distribution of a set of tasks, partitioned from a complex application, to multiple cooperative servers that may be multiple hops away. This problem is formulated as a task assignment problem and solved by an iterative algorithm. Another relevant work is presented in [47], which considers a joint user association, channel allocation, and task offloading problem. It solves this problem by combining the genetic algorithm and deep deterministic policy gradient algorithm.
Distinct from previous research, we delve into the essential benefits of layered network structures while investigating how network properties like topology and server resources affect system performance. We also address the task scheduling problem that arises when transmissions of subtasks share channels or relays, which has been overlooked by existing works. Moreover, we propose both exact and heuristic methods to solve the problem, and introduce an offline-online computation scheme to enable real-time implementation and make them adaptable to dynamic and mobile networks.
III System Model and Problem Description
In this section, we first present the system model and then describe the problem to be solved.
III-A System Model
Consider a network formed by edge servers, each with its own unique set of computing and communication capabilities. The servers can share resources with their neighbors either through cables in wired networks or wirelessly when they are within communication range. Additionally, a server can communicate with its one-hop neighbors simultaneously using techniques like Orthogonal Frequency Division Multiplexing [21]. For simplicity, interference among the servers is not considered in this study. The entire system is supervised and managed by a control center (e.g., a software defined networking controller [22]) to ensure that all tasks are completed efficiently and effectively.
Suppose one of the servers, referred to as the master, needs to execute a computation-intensive task that is arbitrarily decomposable, which could be generated by the server itself or requested by a user nearby. To complete the task in a timely and energy-efficient manner, the master decomposes the task into subtasks and distributes them to the other servers, referred to as workers (see Fig. 1 for an illustration). The master (highlighted in red) can transmit subtasks simultaneously to its neighboring servers. However, for workers farther away, multi-hop routing is required, which means that each server in the network can act as a worker, a relay, or both. When a subtask arrives at a relay, it is added to a queue and processed in a first-in-first-out order. A worker will not start executing the assigned subtask until it receives the complete subtask package. When a server acts as both a worker and a relay, it can perform the relay process and execute the assigned task simultaneously.

In this preliminary study, we adopt several common assumptions made in existing studies [50, 24] to simplify our analysis. In particular, we assume that the network is stable with no packet losses or retransmissions. Additionally, we assume that the computation result is relatively small, and hence the delay incurred in transmitting the result from workers back to the master is negligible. Under these assumptions, we model the network as follows.
III-A1 Network Model
The network is modeled as a directed graph G = (V, E), where V is the set of edge servers and E is the set of server-to-server communication links connecting servers that can communicate directly.
III-A2 Computing Model
Let f_i denote the computing capacity of server i, i.e., its CPU-cycle frequency (GHz). Given a task of size s, let c denote the total number of CPU cycles required to process one task-size unit. The time required for server i to process this task can then be expressed by [8]:

t_i^cmp = c·s / f_i. (1)
III-A3 Communication Model
Let r_ij denote the data transmission rate from server i to server j, which we assume to be known for the sake of simplifying analysis. This rate can be approximated using Shannon's theory [46] as r_ij = B·log2(1 + P_s/P_n), where B is the channel bandwidth, and P_s and P_n represent the signal power and noise power, respectively.
III-A4 Energy Consumption Model
The energy consumed for executing a task mainly comprises two components: energy consumed for computing and energy consumed for communication. The energy consumed for server i to compute a task of size s is given by [20]:

e_i^cmp = κ_i·c·s·f_i², (2)

where κ_i is the effective switched capacitance that depends on the chip architecture of server i. The energy consumed for server i to transmit a task of size s to server j is given by [20]:

e_ij^com = p_i·s / r_ij, (3)

where p_i represents the transmission power of server i.
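As an illustration, the three models above can be written as small helper functions. This is a sketch with our own (hypothetical) function names and unit conventions, not the paper's implementation:

```python
def compute_time(size_gbit, cycles_per_gbit, freq_ghz):
    """Eq. (1): seconds for a server to process a subtask of the given size."""
    return size_gbit * cycles_per_gbit / (freq_ghz * 1e9)

def compute_energy(size_gbit, cycles_per_gbit, freq_ghz, kappa):
    """Eq. (2): CPU energy (J); kappa is the effective switched capacitance."""
    return kappa * size_gbit * cycles_per_gbit * (freq_ghz * 1e9) ** 2

def transmit_energy(size_gbit, rate_gbps, tx_power_w):
    """Eq. (3): energy (J) to push a subtask over one link."""
    return tx_power_w * size_gbit / rate_gbps
```

For example, a 1-Gbit subtask on a 1-GHz server needing 1e9 cycles per Gbit takes one second to compute.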
III-B Problem Description and Analysis
Without loss of generality, suppose the master receives a task of size s to complete. Given that the computing and communication characteristics of the whole network, i.e., the capacities f_i and rates r_ij, are known, the control center aims to jointly minimize the task completion time and energy consumption by partitioning the task into small subtasks and distributing them to other servers in the network.
Finding the optimal solution to this problem is nontrivial and challenging since it requires making decisions on several aspects, including identifying which servers the master should assign subtasks to, determining the amount of workload to be assigned to each worker, and selecting the transmission route for sending the subtask. Moreover, the order in which the subtasks should be sent by the master is also a crucial decision to make.
IV Multi-Layered Distributed Computing Framework
In this section, we present a multi-layered distributed computing framework to solve the problem described in the previous section. Motivated by the fact that a layered tree structure emerges when the master distributes tasks to other servers in the network, this framework first transforms the network graph into a sink tree and exploits this layered tree structure to find optimal task allocation and scheduling solutions.
IV-A Transforming Graph into a Sink Tree
Given the network graph and the characteristics of the servers and communication links forming it, we can find, from the master to each of the other servers, the shortest route that takes the minimum time to transmit one bit of data. This can be achieved by defining the weight of each edge as the inverse of the associated data transmission rate, i.e., w_ij = 1/r_ij, and applying Dijkstra's algorithm [13] to find the most communication-efficient path. The resulting paths can then be used to construct a K-ary sink tree, where the master is the root node and every other server is a leaf or internal node reachable from the root via a unique path. This layered tree structure enables the distribution of tasks from the master to other servers in an efficient manner.
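A minimal sketch of this transformation, assuming the link rates are known and using a hand-rolled Dijkstra over inverse-rate weights (names are our own):

```python
import heapq

def sink_tree(links, master=0):
    """Build the most communication-efficient sink tree.

    links: dict {(i, j): rate} of directed link rates; the edge weight is
    1/rate, i.e., the time to move one bit. Returns (parent, dist), where
    parent[i] is i's predecessor on the shortest path from the master.
    Illustrative sketch, not the paper's code.
    """
    adj = {}
    for (i, j), rate in links.items():
        adj.setdefault(i, []).append((j, 1.0 / rate))
    dist = {master: 0.0}
    parent = {master: None}
    heap = [(0.0, master)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in adj.get(u, []):
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                parent[v] = u             # predecessor on the shortest path
                heapq.heappush(heap, (d + w, v))
    return parent, dist
```

For instance, with a slow direct link from the master to node 2 but a fast two-hop route through node 1, the tree places node 2 under node 1.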
To facilitate subsequent analysis, we re-label the nodes in the tree level-by-level from the root downward, and from left to right within each level (see Fig. 2); consequently, nodes in lower levels have larger indices. Let L_h denote the set of indices of nodes in level h, where h ∈ {0, 1, …, H} and H is the height of the tree. The node set then satisfies V = L_0 ∪ L_1 ∪ … ∪ L_H. Notably, the master (root) can transmit subtasks to its one-hop neighbors, i.e., the nodes in Level 1, simultaneously. However, if a one-hop neighbor has children, the subtasks assigned to them, including the one assigned to the one-hop neighbor itself, have to be transmitted one by one. This is because they share the same channel between the master and the one-hop neighbor, and the data arriving at the one-hop neighbor is processed in a first-in-first-out manner. Therefore, the order in which these subtasks are sent matters. Based on these analyses, we next formulate a joint task allocation and scheduling problem as a mixed integer programming (MIP) model.

IV-B Mixed Integer Programming Model
IV-B1 Decision Variables
To specify the computation workload allocated to each node i ∈ V, we introduce decision variables α_i ≥ 0, where α_i represents the size of the subtask assigned to node i. If α_i = 0, node i is not assigned any workload. Note that the master (node 0) may choose to execute (part of) the task locally, in which case α_0 would be nonzero, i.e., α_0 > 0.
To describe the offloading order for subtasks transmitted from the master to the other nodes, we introduce binary decision variables x_ij ∈ {0, 1}, where i, j ∈ V and i ≠ j. When x_ij = 1, node i has a higher priority than node j to receive its subtask.
IV-B2 Objective Function
We aim to achieve two objectives simultaneously: minimizing the time spent and minimizing the energy consumed by each node for executing the task. By employing a weighted sum method, we define the cost associated with node i as

c_i = w_1·t_i + w_2·e_i,

where w_1 and w_2 are weights representing the relative importance of the two objectives, t_i is the total time required for node i to receive its subtask from the master and complete the assigned subtask, and e_i is the total energy consumed by node i during task execution. Note that the time required for completing the whole task is max_{i∈V} t_i. The objective is then to minimize the maximum cost over all nodes, i.e., min max_{i∈V} c_i. In the following sections, parentheses or subscripts may be omitted for simplicity when there is no confusion.
Next, we derive the formulas for t_i and e_i.
IV-B3 Time Consumption
The task completion time for node i, t_i, comprises three components: 1) the time taken to transmit the subtask of size α_i from the master to node i, denoted t_i^tx; 2) the time spent waiting in the queues of relays along the path to node i, if any, denoted t_i^wait; and 3) the time to execute the subtask, i.e., t_i^cmp = c·α_i/f_i. It is noted that the waiting time is impacted by the task sizes assigned to other nodes and by the offloading order, which complicates the optimization problem considered in this study.
To obtain the transmission time t_i^tx, we introduce the notation P_i to denote the sequence of nodes that lie on the path from the master to node i, and P_i(m) to denote the m-th node in the sequence, where m ∈ {1, …, |P_i|}, P_i(1) = 0 (the master), and P_i(|P_i|) = i. |·| denotes the cardinality of a set. t_i^tx can then be expressed by:

t_i^tx = Σ_{m=1}^{|P_i|−1} α_i / r_{P_i(m) P_i(m+1)}. (5)
Let us now consider the waiting time t_i^wait. Let T_k denote the full set of nodes in the k-th subtree of the master, where k ∈ {1, …, K} and K is the number of subtrees. Additionally, define B_i = {j : x_ji = 1, with j in the same subtree as i} as the set of nodes whose subtasks will be transmitted before node i. Note that if nodes i and j belong to different subtrees, i.e., i ∈ T_k while j ∈ T_l with l ≠ k, the subtask for node j is transmitted using a different channel that is orthogonal to the one used for node i, and hence node i does not need to wait for node j's subtask to be transmitted even if x_ji = 1. Based on these definitions, we can then express the waiting time as follows:

t_i^wait = Σ_{j∈B_i} α_j / r_{0 b_k}, (6)

where b_k = P_i(2) denotes the root of the subtree T_k containing node i.
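Putting the pieces together, per-node completion times for a given allocation and offloading order can be computed as follows. This is a simplified sketch under the model above: each node is charged the first-hop queueing delay of earlier subtasks in its subtree, plus its own multi-hop transmission time and computing time; all names are hypothetical:

```python
def completion_times(alloc, order_by_subtree, paths, rates, cycles, freq):
    """alloc[i]: subtask size; order_by_subtree: one node list per subtree,
    in transmission order; paths[i]: node sequence master -> i;
    rates[(u, v)]: link rate; freq[i]: CPU frequency of node i."""
    t = {}
    for order in order_by_subtree:        # subtrees transmit in parallel
        elapsed = 0.0                     # busy time of the first-hop channel
        for i in order:
            hop0 = (paths[i][0], paths[i][1])
            wait = elapsed                # queue behind earlier subtasks
            tx = sum(alloc[i] / rates[(paths[i][m], paths[i][m + 1])]
                     for m in range(len(paths[i]) - 1))
            t[i] = wait + tx + cycles * alloc[i] / freq[i]
            elapsed += alloc[i] / rates[hop0]
    return t
```

The makespan of the whole task is then the maximum of the returned values, matching the minmax objective.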
IV-B4 Energy Consumption
IV-C Problem Formulation
Mathematically, the multi-objective optimization problem can be formulated as minimizing the maximum per-node cost, min max_{i∈V} c_i, subject to Σ_{i∈V} α_i = s, α_i ≥ 0 for all i ∈ V, and x_ij ∈ {0, 1} for all i ≠ j. The equality constraint ensures that the total assigned workload sums up to the total task size s, while the remaining constraints guarantee that each decision variable takes on valid values.
V Joint Optimization of Task Allocation and Scheduling
In this section, we introduce two exact methods to find the optimal solution to the joint task allocation and scheduling problem formulated in the previous section.
V-A Centralized MILP-based Optimization (CMO)
V-A1 Algorithm Description
It is noted that this problem, which aims to minimize the maximum cost of individual nodes, is a minmax optimization problem. Hence, we can convert it into an equivalent mixed integer linear programming (MILP) problem by introducing an auxiliary variable η as follows:

min η s.t. c_i ≤ η, ∀i ∈ V, (9)

together with the original constraints on α and x. Then, the minimum cost equals η*, where η* is the minimum value of η found by solving this MILP.
This MILP can be further decomposed into two subproblems. The first subproblem aims to optimize the task allocation α, given a particular offloading order x̄. Denote the optimal solution to the first subproblem at x̄ as α*(x̄). The second subproblem then aims to optimize the offloading order x over all feasible orders.
Now let us consider the first subproblem, which can be solved using Lagrange multipliers [5]. In particular, the Lagrangian function augments the objective with the equality constraint Σ_{i∈V} α_i = s and the inequality constraints α_i ≥ 0, weighted by Lagrangian multipliers μ and λ_i, respectively. The Lagrange dual function is then defined as the infimum of the Lagrangian over the primal variables.
The dual optimization problem is then constructed as follows:
As the objective function and the inequality constraints in our problem are convex, and the equality constraints are affine and strictly feasible, Slater's condition [3] is satisfied and strong duality holds. That means the optimal value of the primal problem is equal to the optimal value of its dual problem. The optimal solution to the first subproblem can then be found by solving the following equation set, known as the Karush-Kuhn-Tucker (KKT) conditions [31]:
∂L/∂α_i = 0, Σ_{i∈V} α_i = s, λ_i·α_i = 0, λ_i ≥ 0, α_i ≥ 0, ∀i ∈ V. (11)
To solve the scheduling subproblem, we can use exhaustive search. This involves computing the cost for each possible offloading order and selecting the one that yields the smallest cost. However, as the offloading order can take up to (|V|−1)! possible values (one per permutation of the workers), evaluating each possible value is time-consuming. A significant reduction in the number of values to evaluate can be achieved by exploiting the parallelism in sending subtasks belonging to different subtrees of the master. Specifically, the offloading orders for nodes in any subtree T_k are independent of those in any other subtree T_l, where l ≠ k. Therefore, the number of possible values that need to be evaluated can be reduced to the product of the per-subtree factorials, Π_{k=1}^{K} |T_k|!. Algorithm 1 summarizes the procedure of the proposed approach, named centralized MILP-based optimization (CMO).
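The size of this reduction is easy to check numerically. A small sketch (the subtree sizes are hypothetical):

```python
from math import factorial, prod

def naive_orders(n_workers):
    """Every permutation of all workers' subtasks."""
    return factorial(n_workers)

def subtree_orders(subtree_sizes):
    """Orders across subtrees are independent, so the search space is the
    product of per-subtree permutation counts."""
    return prod(factorial(size) for size in subtree_sizes)
```

For example, 9 workers split into three subtrees of 3 nodes each give 9! = 362880 naive orders but only (3!)³ = 216 orders once subtree parallelism is exploited.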
V-A2 Computational Complexity Analysis
V-B Parallel MILP-based Optimization (PMO)
V-B1 Algorithm Description
The parallelism involved in sending subtasks to different subtrees of the master can be further harnessed to greatly enhance efficiency. Specifically, the key idea is to decompose the problem into two different subproblems. The first subproblem optimizes the task allocation and scheduling for nodes within each subtree, which can be solved in parallel across subtrees. The second subproblem optimizes the total workload assigned to each subtree.
Mathematically, let s_k be the total workload assigned to nodes within the k-th subtree of the master, i.e., s_k = Σ_{i∈T_k} α_i, k ∈ {1, …, K}. Additionally, let α_k and x_k represent the decision variables associated with the nodes within subtree T_k. Then, the first subproblem can be formulated as follows:
Since tasks assigned to different subtrees can be transmitted simultaneously, this problem can be solved independently and in parallel for different subtrees.
Suppose that, given s_k, the optimal solution to the first subproblem for subtree T_k is (α_k*(s_k), x_k*(s_k)). The second subproblem aims to optimize the workload assigned to each subtree as well as to the master, denoted s_0, and can be mathematically formulated over (s_0, s_1, …, s_K) subject to s_0 + Σ_k s_k = s. Of note, this subproblem can be conceptualized by abstracting each subtree as a single node, so that the network is abstracted as a one-layer tree (see Fig. 3). The optimization of the workload assigned to each abstracted node (subtree) thus does not require consideration of the offloading order. The optimal solution, denoted (s_0*, s_1*, …, s_K*), can then be used to derive the optimal solution to the original problem: the per-node allocations are recovered from α_k*(s_k), and the optimal offloading order for nodes within each subtree is given by x_k*(s_k).

Solving the first subproblem for a given s_k is relatively straightforward. However, directly addressing the second subproblem is challenging because s_k is continuous, and obtaining the costs in its constraints requires solving the first subproblem for every candidate s_k. Before we proceed with our approach to solving these subproblems, we present the following lemma and theorem, which allow for their simplification.
Lemma 1.
Given a tree and a task of size s, for an arbitrary offloading order x̄, suppose α* is an optimal solution to the task allocation subproblem. Then, (s′/s)·α* is an optimal solution when the task size is changed to s′.
Proof.
As detailed earlier in this section, the optimal task allocation can be found by solving equation set (11). Suppose α* is the obtained optimal solution for task size s. Note that the cost of each node can be expressed as a linear combination of the task assignments, i.e., c_i = Σ_j a_ij·α_j, where the coefficients a_ij are constants that depend on the network characteristics. Hence, equation set (11) simplifies to a system that is linear in α. When the task size is changed to s′, the scaled solution (s′/s)·α* satisfies this system together with the new equality constraint Σ_i α_i = s′. This indicates that it is an optimal solution for task size s′. With this, the proof is complete. ∎
Lemma 1 leads directly to the following theorem.
Theorem 2.
Given a tree and a task of size s, suppose (α*, x*) is the optimal solution to the joint task allocation and scheduling problem. Then, ((s′/s)·α*, x*) is the optimal solution when the task size is changed to s′.
The proportionality property described in Theorem 2 implies that, given the optimal solution to the first subproblem for any baseline subtree workload s_k^0, the second subproblem can be simplified so that only the subtree workloads s_0, s_1, …, s_K need to be optimized:
Since α_k*(s_k^0) and x_k*(s_k^0) are known (by solving the first subproblem), this simplified problem is now straightforward to solve.
Next, we describe our approach to solving the two subproblems. For the first subproblem, we leverage the CMO algorithm (Algorithm 1). For each subtree T_k, k ∈ {1, …, K}, we run the CMO algorithm on the tree formed by T_k together with the master (as highlighted by the green dashed circle in Fig. 3), where the input task size can take any baseline value s_k^0. The output generated by CMO for this tree is (α_k*(s_k^0), x_k*(s_k^0)), where α_k*(s_k^0) specifies the tasks allocated to the nodes within T_k. Given these baseline solutions for each subtree, we can then solve the second subproblem using a commercial solver, such as Gurobi [34] or CVX [18].
Denote the optimal solution to the second subproblem as (s_0*, s_1*, …, s_K*). The optimal solution to the original problem can then be derived as:

α_i* = (s_k*/s_k^0)·α_i*(s_k^0), ∀i ∈ T_k, k ∈ {1, …, K}, (12)

α_0* = s_0*. (13)

The order x_k*(s_k^0), k ∈ {1, …, K}, specifies the optimal offloading order for the subtasks assigned to the nodes within each subtree T_k, where subtasks for different subtrees can be transmitted simultaneously.
Algorithm 2 summarizes the procedure of the parallel MILP-based optimization (PMO) method.
V-B2 Computational Complexity Analysis
Since subtree T_k, k ∈ {1, …, K}, contains |T_k| nodes, running CMO on it requires time that grows with |T_k|!. As the subtrees are processed in parallel, the complexity of PMO is governed by the largest subtree, with the worst case arising when a single subtree contains all of the workers.
V-C Offline-Online Computation
Despite the fact that the computational complexity of CMO and PMO grows rapidly as the network expands, both can be executed in real time by transferring the majority of the computations offline. This can be achieved by leveraging the proportionality property presented in Theorem 2. Particularly, for any baseline task size s^0, we can execute Algorithm 1 offline to derive a baseline optimal solution (α*(s^0), x*(s^0)). Then, during online computations, upon receiving a new task of size s, we can readily compute the associated optimal solution in real time by scaling the baseline allocation with a factor s/s^0, i.e., α* = (s/s^0)·α*(s^0), while keeping the offloading order unchanged.
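The online step reduces to a single scaling operation per node; a minimal sketch under Theorem 2's proportionality property (names are our own):

```python
def online_allocation(baseline_alloc, baseline_size, new_size):
    """Scale an offline baseline allocation to a new task size.

    baseline_alloc: dict node -> subtask size computed offline for a task of
    baseline_size; the offloading order is reused unchanged. The scaled
    allocation still sums to the new task size.
    """
    ratio = new_size / baseline_size
    return {node: ratio * a for node, a in baseline_alloc.items()}
```

For example, doubling the task size simply doubles every node's subtask, so the online cost is linear in the number of nodes.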
This offline-online computation scheme also equips CMO and PMO with the ability to handle dynamic networks with time-varying network characteristics. One approach to deploying them in dynamic networks is to periodically execute Algorithm 1 to update the baseline solution with the latest network information. Alternatively, the baseline solution can be updated when significant network changes occur, such as alterations in the network topology.
VI Heuristic Methods
In this section, we introduce three heuristic methods to further speed up the computation.
VI-A Worker Selection
Through simulation studies, as presented in Sec. VII, we find that the solutions produced by CMO and PMO typically improve as more workers participate in computations. However, the rate of performance improvement diminishes as the network size reaches a certain threshold. This observation inspires us to consider selecting a subset of workers that contribute the most to performance improvement. Next, we introduce two worker selection methods: 1) a node pruning (NP) strategy, and 2) a level pruning (LP) strategy. These methods can be applied either individually or in combination. When PMO is utilized, they can be employed to prune each subtree before executing Line 2 in Algorithm 2.
VI-A1 Node Pruning (NP)
The key idea of NP is to “prune” nodes that are too costly to use. Specifically, this strategy evaluates each node one by one. For a given node i, it estimates the cost of using this node by performing partial offloading [20], which finds the optimal task partition between the master and node i exclusively. The obtained cost, denoted c_i^po, is then compared with the cost of local computing, i.e., the cost of processing the entire task at the root node, denoted c^loc, which can be obtained directly from the cost model. If the cost reduction, measured by (c^loc − c_i^po)/c^loc, exceeds a predefined threshold θ, node i is selected; otherwise, it is “pruned”. Here, “pruned” means that no workload is assigned to the node. If the node is a leaf, it is removed from the tree. However, if it is an intermediate node with unpruned children, it remains and acts only as a relay.
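A compact sketch of the NP selection rule, assuming the partial-offloading costs have already been computed; relay handling for intermediate nodes is omitted for brevity, and all names are hypothetical:

```python
def node_pruning(nodes, local_cost, pairwise_cost, threshold):
    """Select nodes whose relative cost reduction over local computing
    exceeds `threshold`; the others are pruned (kept only as relays if
    they have unpruned children, which is not modeled here).

    pairwise_cost[i]: cost of the optimal two-way partition between the
    master and node i (partial offloading)."""
    selected = []
    for i in nodes:
        reduction = (local_cost - pairwise_cost[i]) / local_cost
        if reduction > threshold:
            selected.append(i)
    return selected
```

With a local-computing cost of 10, a 10% threshold keeps only nodes whose two-way partition cost drops below 9.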
VI-A2 Level Pruning (LP)
The key idea of LP is to trim nodes that are excessively distant from the master node, whose computing resources are too costly to use given the significant communication costs. Specifically, this strategy retains only the top levels of the original network tree, removing the levels below a chosen cutoff. The height of the resulting tree is therefore no larger than that of the original tree.
VI-B Genetic Algorithm
The worker selection methods reduce the number of workers, but they may prune nodes that could significantly improve system performance. Here, we introduce a genetic algorithm (GA) [33] that considers all nodes and allows us to evaluate large networks. It involves two phases: initialization and training. In the initialization phase, a population of offloading orders (chromosomes) is randomly generated. The corresponding optimal task partition and cost are then computed for each chromosome by solving (11), with the cost serving as the chromosome’s fitness. Following initialization, the training phase starts with elitism, which picks a top fraction of the fittest members from the current population and propagates them to the next generation. After that, an iterative procedure is performed to create offspring. In each iteration, two offloading orders are randomly picked from the current population according to a fitness-based probability distribution. Ordered crossover [11] is then applied to create offspring. Subsequently, with a low probability, mutation is performed to introduce diversity into the new population by shuffling individual offloading orders. The algorithm terminates upon meeting the stopping condition, at which point it outputs the best solution found. In our simulations, the stopping condition for GA is reaching a maximum number of generations.
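The GA loop can be sketched as follows. This is an illustrative implementation, not the authors’ code: `cost(order)` stands in for solving (11) for a given offloading order, selection weights are taken inversely proportional to cost (an assumption, since lower cost means higher fitness), and mutation is simplified to a single swap.

```python
import random

def ordered_crossover(p1, p2):
    # OX [11]: copy a random slice of p1, fill the remaining
    # positions with the missing genes in p2's order.
    n = len(p1)
    i, j = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[i:j] = p1[i:j]
    fill = [g for g in p2 if g not in child[i:j]]
    for k, idx in enumerate(list(range(0, i)) + list(range(j, n))):
        child[idx] = fill[k]
    return child

def genetic_algorithm(nodes, cost, pop_size=30, elite_frac=0.2,
                      p_mut=0.05, max_gen=100):
    # cost(order): fitness of an offloading order (lower is better),
    # a stand-in for the optimal-partition cost from solving (11).
    pop = [random.sample(nodes, len(nodes)) for _ in range(pop_size)]
    best = min(pop, key=cost)
    for _ in range(max_gen):
        pop.sort(key=cost)
        elite = pop[: max(1, int(elite_frac * pop_size))]   # elitism
        weights = [1.0 / (1e-9 + cost(o)) for o in pop]     # assumes cost > 0
        offspring = []
        while len(elite) + len(offspring) < pop_size:
            p1, p2 = random.choices(pop, weights=weights, k=2)
            c = ordered_crossover(p1, p2)
            if random.random() < p_mut:                     # swap mutation
                a, b = random.sample(range(len(c)), 2)
                c[a], c[b] = c[b], c[a]
            offspring.append(c)
        pop = elite + offspring
        best = min([best] + pop, key=cost)
    return best, cost(best)
```

In practice `cost` would cache its results, since each chromosome’s fitness is an optimization problem in its own right.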
VII Simulation Studies
In this section, we conduct simulation studies to evaluate the performance of the proposed approaches. We start by describing the experiment setup in Sec. VII-A. Next, we conduct two sets of studies to evaluate the optimality and efficiency of the proposed approaches in Sec. VII-B and Sec. VII-C, respectively. We then investigate the impact of key parameters in Sec. VII-D, followed by an analysis of the effects of network characteristics.
VII-A Experiment Setup
We evaluate the proposed approaches on different network graphs generated using the method introduced in [14]. The network graphs are then transformed into tree topologies using Dijkstra’s algorithm. In each network topology, we configure the computing capacities of the servers with values randomly generated from a fixed range (in GHz), and the data transmission rates with values randomly generated from a fixed range (in Gbps). Moreover, the transmit power (in dBm) is fixed for all servers, and the task size (in Gbits) and computation intensity (in cycles/Gbit) are fixed as well. All approaches are implemented in Python and evaluated on an Alienware Aurora R15 Gaming Desktop with a 12th Gen Intel i9 CPU and 64 GB of memory.
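The graph generation and tree transformation described above can be sketched as follows, assuming an Erdős–Rényi model for the method of [14] and a shortest-path (sink) tree rooted at the master; the function names and the link-weight range are illustrative.

```python
import heapq
import random

def random_connected_graph(n, p, seed=None):
    # Erdős–Rényi G(n, p) with random link weights; regenerate
    # until connected (fine for the small n used here).
    rng = random.Random(seed)
    while True:
        edges = {(i, j): rng.uniform(0.1, 1.0)
                 for i in range(n) for j in range(i + 1, n)
                 if rng.random() < p}
        adj = {v: [] for v in range(n)}
        for (i, j), w in edges.items():
            adj[i].append((j, w))
            adj[j].append((i, w))
        seen, stack = {0}, [0]            # connectivity check (DFS)
        while stack:
            v = stack.pop()
            for u, _ in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        if len(seen) == n:
            return adj

def shortest_path_tree(adj, root=0):
    # Dijkstra from the master; parent pointers define the sink tree.
    dist, parent = {root: 0.0}, {root: None}
    pq = [(0.0, root)]
    while pq:
        d, v = heapq.heappop(pq)
        if d > dist.get(v, float("inf")):
            continue                      # stale queue entry
        for u, w in adj[v]:
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u], parent[u] = nd, v
                heapq.heappush(pq, (nd, u))
    return parent
```

The returned parent map is the tree that CMO/PMO and the heuristics operate on.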
VII-B Optimality Analysis
We first evaluate the optimality of the two optimal approaches, CMO and PMO. For comparison, we implement the following four state-of-the-art distributed computing and computation offloading schemes as benchmarks:
• Local computing (Local): In this approach, the master executes the entire task locally.
• Partial offloading (Partial): In this approach, the master offloads part of the task to one of its one-hop neighbors. The offloading ratio and offloadee selection are optimized to minimize the task completion time.
• Master-worker distributed computing (Master-worker): In this approach, the master distributes the task to its one-hop neighbors using the master-worker paradigm. The task allocation is optimized to minimize the task completion time.
• Multi-hop offloading (Multi-hop) [16]: In this approach, the master offloads the whole task to the most powerful and reliable server in the network, which may be multiple hops away.
Their performances are evaluated on four network graphs, which are transformed into trees with varying depths and breadths as illustrated in Fig. 4.

In the first experiment, we set the weights in the objective function such that our approach minimizes the task completion time only, just like the benchmarks. As shown in Fig. 5(a), our approaches outperform all benchmarks across all scenarios. Among the benchmarks, Local and Multi-hop perform the worst since they use the computing resources of only a single server. Partial outperforms Local and Multi-hop by utilizing the resources of two servers. Master-worker achieves even better performance by utilizing the computing resources of all servers within one hop. This experiment provides evidence that higher resource utilization leads to better computing performance.
In the second experiment, we randomly set the weights so that both computation efficiency and energy consumption are considered in our approaches. Note that these weight values are also used in the following experiments. Fig. 5(b) shows the comparison results, demonstrating the promising performance of our approaches in balancing task completion time and energy consumption. Fig. 6(a) and Fig. 6(b) show the task completion time and the maximum energy consumption of any server, respectively. The results indicate that our approach outperforms all benchmarks in both task completion time and maximum energy consumption.




VII-C Efficiency Analysis
In this subsection, we evaluate the efficiency of the proposed optimal methods, CMO and PMO, as well as the proposed heuristic methods, NP, LP, and GA. For the implementation of the two worker selection methods, NP and LP, we first use them to prune the network tree, and then apply PMO to allocate tasks. GA is also implemented within the framework of PMO, and employed to determine the task allocation for each subtree, replacing CMO in Line 2 of Algorithm 2.
VII-C1 Small-Scale Networks
We first consider the four small-scale network topologies depicted in Fig. 4. In this experiment, the threshold parameter in NP is set, for each topology, such that one node is pruned. The parameter in LP is set, for each of the four topologies, such that the last level is pruned. The parameters of GA are configured accordingly. To measure the efficiency of the proposed approaches, we run each method 20 times and record the mean execution time.
Fig. 7(a) shows the costs of the solutions found by the five methods. Comparing GA with the optimal methods, CMO and PMO, reveals that GA matches their solutions, demonstrating its ability to find optimal solutions for small networks. The worker selection methods, NP and LP, underperform the other three methods, which is attributed to the reduced number of nodes involved in sharing the computational workload. Moreover, comparing the performance of LP across different topologies indicates that the extent of performance degradation is closely related to the proportion of nodes pruned from the network tree. Specifically, LP prunes 14.28%, 57.14%, 11.11%, and 37.5% of the nodes in the four topologies, respectively. The largest pruning proportion occurs in Topology 2, resulting in the greatest performance degradation. For NP, as only one node is “pruned” in each topology, it performs better than LP in these scenarios.
The base-10 logarithm of the execution time of each method is shown in Fig. 7(b). As expected, the optimal methods, CMO and PMO, are more time-consuming than the three heuristic methods. Moreover, PMO, a parallelized version of CMO, significantly reduces the execution time in Topologies 2-4 due to its parallelism. For Topology 1, since the root has a single subtree, PMO is equivalent to CMO. Among the heuristic methods, LP achieves the shortest execution time by pruning the most nodes and thereby significantly reducing the search space. GA, on the other hand, is the least efficient and even underperforms PMO in Topologies 2 and 4. This suggests that PMO can be directly applied to small networks. Furthermore, comparing NP and PMO, we observe that NP does not always improve efficiency, despite reducing the number of workers. This is because only one node is “pruned” in each topology, and the time saved by pruning is offset by the overhead of the pruning procedure itself.


As the performance of the proposed approaches largely depends on the network size, we further vary the network size by increasing the number of subtrees, where each subtree consists of 2 levels with 1 node in each level. The scenario where the network size expands due to the growth of individual subtrees is explored in the subsequent subsection. In this experiment, we configure the threshold parameter in NP such that one additional node is “pruned” for each additional subtree. The parameter in LP is set, in all cases, such that the node(s) in the last level are pruned. The parameters of GA are configured accordingly. Fig. 8(a) shows the performance of different methods as the number of subtrees increases. As we can see, increasing the number of subtrees reduces the total cost, as more nodes are involved in sharing the computational workload. When comparing NP and LP, NP consistently outperforms LP. It is noteworthy that both methods prune the same number of nodes, each trimming one node from every subtree. This underscores the effectiveness of NP’s worker selection process, which is more rigorous than that of LP, which simply selects nodes at the top levels. However, the simplicity of LP makes it more efficient than NP, as shown in Fig. 8(b). Additionally, from Fig. 8(b), we observe a significant increase in the execution time of CMO, relative to the other four methods, as more subtrees are considered. This is due to the parallelism inherent in the other four methods. Moreover, comparing PMO with the other methods further demonstrates its good balance of optimality and efficiency for small networks.


VII-C2 Large-Scale Networks
In this experiment, we consider larger networks and evaluate the performance of the three heuristic methods, NP, LP, and GA. For the implementation of NP and LP, GA is applied after pruning to determine the task allocation. Given that all these methods evaluate subtrees in parallel and their efficiency is bounded by the largest subtree, we evaluate their performance on networks consisting of a single subtree, which allows us to exclude the impact of the number of subtrees. These networks are randomly generated with node counts of 10, 20, 30, and 50. The thresholds in NP and LP are configured for each network such that the two methods prune a similar number of nodes. The parameters in GA, applied across all methods, are kept identical.
As shown in Fig. 9(a), the total cost generally decreases as the network size increases for each method, since more nodes share the workload. Notably, the performance of GA degrades when the network size reaches 50. This is due to the large search space, which makes it difficult for GA to converge within 100 generations. Comparing the three methods, we observe that GA outperforms the other two by considering all nodes in the network. NP generates better solutions than LP, although the two prune roughly the same number of nodes. Additionally, NP and LP achieve performance comparable to GA in large networks (more than 30 nodes), but with significantly lower execution times, as shown in Fig. 9(b). This suggests that for large networks, NP and/or LP can be applied first to select a subset of workers, followed by GA for task allocation.


VII-D Parameter Impact Analysis
In this subsection, we investigate the impact of the key parameters in the proposed heuristic methods: (1) the threshold in NP and (2) the threshold in LP. All experiments are conducted on the 20-node network described in Sec. VII-C2. GA is employed for task allocation, using the same configuration detailed in Sec. VII-C2.
VII-D1 Threshold in NP
In NP, a worker is selected if the cost reduction from including it exceeds the threshold. Therefore, a higher threshold results in fewer nodes being selected and more nodes being “pruned”. This is demonstrated by the results shown in Fig. 10(a). As we can see, the best performance is achieved at the smallest threshold, in which case no nodes are “pruned”. The worst performance occurs at the largest threshold, where all workers are “pruned” and all computations are done locally at the master. Moreover, as the threshold decreases, more workers are selected, resulting in a decrease in cost (see Fig. 10(a)) but an increase in execution time (see Fig. 10(b)). Notably, when the threshold is reduced to 0.4, the cost tends to converge. This suggests that an appropriate threshold value balancing optimality and efficiency can be identified by selecting the value at which a sharp change in cost occurs.
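The rule of picking the threshold at the sharp change in cost can be sketched as a simple knee search over a sweep of candidate thresholds; this helper is a hypothetical illustration, not from the paper.

```python
def pick_threshold(deltas, costs):
    """Given candidate thresholds (ascending) and the total cost each
    one yields, return the threshold just below the largest cost jump,
    i.e. the last value before pruning starts to hurt noticeably."""
    jumps = [costs[k + 1] - costs[k] for k in range(len(costs) - 1)]
    k = max(range(len(jumps)), key=lambda i: jumps[i])
    return deltas[k]
```

Each candidate cost would be obtained by running NP (followed by the task-allocation step) once per threshold, so the sweep trades a handful of extra runs for a principled parameter choice.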


VII-D2 Threshold in LP
LP selects all nodes in the top levels as workers. In the special case where the number of retained levels is zero, no workers are selected, and all computations are conducted locally at the master. This leads to the highest cost, as shown in Fig. 11(a). As more levels are retained and more nodes are selected, performance improves, as indicated by the decreasing cost. However, the performance improvement slows down once the number of retained levels exceeds 3. The best performance is achieved at the maximum value, i.e., the height of the network tree, which is 12 in this experiment. Given the rapid increase in execution time with more retained levels, as shown in Fig. 11(b), a proper value can be chosen at the point where the rate of cost decrease slows down.


VII-E Impact of Network Characteristics
The optimal task distribution decisions rely heavily on the network characteristics. In this subsection, we explore how the communication and computing parameters, specifically the link rate between the master and Node 1 and the computing power of Node 1, affect these decisions. For this analysis, we focus on Topology 4, as shown in Fig. 4.
In the first experiment, we vary the communication capacity between the master node (Node 0) and its left child (Node 1) from 0.3 Gbps to 10 Gbps. All other settings remain consistent with the previous studies. The optimal task allocation derived by PMO is shown in Fig. 12(a). As the figure demonstrates, when this link rate is small, communication becomes a bottleneck, preventing the master from offloading tasks to Node 1 or its descendants. As the rate increases, more workload is offloaded to Node 1. Once it exceeds certain thresholds, tasks are also offloaded to Node 1’s children and even grandchildren. With more nodes contributing to workload distribution, nodes in the right subtree of the master begin to receive fewer tasks. This study suggests that if a communication link is too slow, both the connected downstream node and its descendants may be pruned from the topology before executing CMO/PMO. To understand the impact of the computing characteristics, we instead vary the computing power of Node 1 from 0.022 GHz to 21 GHz. As shown in Fig. 12(b), the master starts to offload tasks to Node 1 when its computing power exceeds a certain threshold. Moreover, as Node 1 takes on more workload, the workloads assigned to all other nodes decrease accordingly.
Notably, the network characteristics determine whether tasks are offloaded to a node, regardless of the total task size, as inferred from Theorem 2. To demonstrate this, we vary the task size while keeping the network characteristics constant. Table 1 summarizes the optimal task allocations and the corresponding total costs computed by PMO. As shown, as the task size increases, both the workload assigned to each node and the total cost rise proportionally.


[Table 1: Optimal task allocations and total costs computed by PMO under varying task sizes.]
VIII Conclusion and Future Work
This paper introduces a novel multi-layered distributed computing framework that expands the computing capabilities of networked computing systems. Unlike conventional distributed computing paradigms that limit resource sharing to one-hop neighborhoods, our framework explores layered network structures to enable resource sharing beyond one-hop neighborhoods, effectively utilizing the resources of the entire network. To optimize system performance, we formulated an MIP problem that jointly minimizes task completion time and energy consumption through optimal task allocation and scheduling. Two exact methods, CMO and PMO, were proposed to solve this problem optimally, with PMO enhancing CMO’s efficiency by exploiting the parallelism in task distribution across the master’s subtrees. We also introduced an offline-online computation scheme to enable the real-time execution of CMO and PMO and allow them to handle dynamic networks with time-varying characteristics. To further enhance efficiency and scalability, three heuristic methods were introduced, including NP and LP for reducing the number of workers and GA for efficiently finding (sub-)optimal solutions. Simulation results demonstrate the superiority of our approaches over existing distributed computing and computation offloading schemes. Moreover, PMO shows promising performance in both optimality and efficiency for small networks. For larger networks, the results suggest applying NP or LP to reduce workers before using GA or PMO for task allocation. The results also show that NP outperforms LP in terms of optimality but is less efficient due to its more rigorous worker selection process. Additionally, studies on the impact of NP’s and LP’s parameters offer insights into their configurations. Lastly, the analysis of network characteristics highlights how the communication and computing capacities of individual servers influence task distribution decisions. 
In the future, we will extend this work to consider multi-task scenarios as well as dynamic and mobile networks. We will also investigate the hierarchical master-worker paradigm.
Acknowledgment
We would like to thank the National Science Foundation for supporting this work under Grants CAREER-2048266 and CCF-2402689.
References
- [1] Kento Aida, Wataru Natsume and Yoshiaki Futakata “Distributed computing with hierarchical master-worker paradigm for parallel branch and bound algorithm” In The 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003, pp. 156–163 IEEE
- [2] Waleed Almuseelem “Energy-efficient and security-aware task offloading for multi-tier edge-cloud computing systems” In IEEE Access IEEE, 2023
- [3] Alfred Auslender and Marc Teboulle “Lagrangian duality and related multiplier methods for variational inequality problems” In SIAM Journal on Optimization 10.4 SIAM, 2000, pp. 1097–1115
- [4] Sergio Barbarossa, Stefania Sardellitti and Paolo Di Lorenzo “Joint allocation of computation and communication resources in multiuser mobile cloud computing” In Proceedings of IEEE 14th Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 2013, pp. 26–30 IEEE
- [5] Stephen Boyd and Lieven Vandenberghe “Convex Optimization” Cambridge University Press, 2004
- [6] Chen Chen et al. “A multihop task offloading decision model in MEC-enabled internet of vehicles” In IEEE Internet of Things Journal 10.4 IEEE, 2022, pp. 3215–3230
- [7] Juan Chen et al. “A DRL Agent for Jointly Optimizing Computation Offloading and Resource Allocation in MEC” In IEEE Internet of Things Journal 8.24, 2021, pp. 17508–17524 DOI: 10.1109/JIOT.2021.3081694
- [8] Jun Chen and Junfei Xie “Joint task scheduling, routing, and charging for multi-UAV based mobile edge computing” In Proceedings of IEEE International Conference on Communications, 2022, pp. 1–6 IEEE
- [9] Shanzhi Chen et al. “A vision of IoT: Applications, challenges, and opportunities with china perspective” In IEEE Internet of Things journal 1.4 IEEE, 2014, pp. 349–359
- [10] Ying Chen, Ning Zhang, Yongchao Zhang and Xin Chen “Dynamic computation offloading in edge computing for internet of things” In IEEE Internet of Things Journal 6.3 IEEE, 2018, pp. 4242–4251
- [11] Kusum Deep and Hadush Mebrahtu “New variations of order crossover for travelling salesman problem” In International Journal of Combinatorial Optimization Problems and Informatics 2.1 International Journal of Combinatorial Optimization ProblemsInformatics, 2011, pp. 2–13
- [12] Zizhen Deng, Zhen Cai and Mangui Liang “A multi-hop VANETs-assisted offloading strategy in vehicular mobile edge computing” In IEEE Access 8 IEEE, 2020, pp. 53062–53071
- [13] Edsger W Dijkstra “A note on two problems in connexion with graphs” In Edsger Wybe Dijkstra: His Life, Work, and Legacy, 2022, pp. 287–290
- [14] Paul Erdős and Alfréd Rényi “On random graphs I” In Publicationes Mathematicae Debrecen 6, 1959, pp. 290–297
- [15] Colin Funai, Cristiano Tapparello and Wendi Heinzelman “Computational offloading for energy constrained devices in multi-hop cooperative networks” In IEEE Transactions on Mobile Computing 19.1 IEEE, 2019, pp. 60–73
- [16] Wei Gao “Opportunistic peer-to-peer mobile cloud computing at the tactical edge” In Proceedings of IEEE Military Communications Conference, 2014, pp. 1614–1620 IEEE
- [17] Werner H Greub “Linear algebra” Springer Science & Business Media, 2012
- [18] Dayan Adionel Guimaraes, Giovanni Henrique Faria Floriano and Lucas Silvestre Chaves “A tutorial on the CVX system for modeling and solving convex optimization problems” In IEEE Latin America Transactions 13.5 IEEE, 2015, pp. 1228–1257
- [19] Zicong Hong et al. “Multi-hop cooperative computation offloading for industrial IoT–edge–cloud computing environments” In IEEE Transactions on Parallel and Distributed Systems 30.12 IEEE, 2019, pp. 2759–2774
- [20] Qiyu Hu et al. “Joint Offloading and Trajectory Design for UAV-Enabled Mobile Edge Computing Systems” In IEEE Internet of Things Journal 6.2, 2019, pp. 1879–1892 DOI: 10.1109/JIOT.2018.2878876
- [21] Taewon Hwang et al. “OFDM and its wireless applications: A survey” In IEEE transactions on Vehicular Technology 58.4 IEEE, 2008, pp. 1673–1694
- [22] Diego Kreutz et al. “Software-defined networking: A comprehensive survey” In Proceedings of the IEEE 103.1 IEEE, 2014, pp. 14–76
- [23] Zhufang Kuang et al. “Partial offloading scheduling and power allocation for mobile edge computing systems” In IEEE Internet of Things Journal 6.4 IEEE, 2019, pp. 6774–6785
- [24] He Li, Kaoru Ota and Mianxiong Dong “Learning IoT in edge: Deep learning for the Internet of Things with edge computing” In IEEE Network 32.1 IEEE, 2018, pp. 96–101
- [25] Ji Li, Hui Gao, Tiejun Lv and Yueming Lu “Deep reinforcement learning based computation offloading and resource allocation for MEC” In Proceedings of IEEE Wireless Communications and Networking Conference (WCNC), 2018, pp. 1–6 IEEE
- [26] J Linderoth, JP Goux and M Yoder “Metacomputing and the master-worker paradigm” In Mathematics and Computer Science Division, Argonne National Laboratory, Tech. Rep. ANL/MCS-P792–0200, 2000
- [27] Fangzheng Liu, Jiwei Huang and Xianbin Wang “Joint task offloading and resource allocation for device-edge-cloud collaboration with subtask dependencies” In IEEE Transactions on Cloud Computing 11.3 IEEE, 2023, pp. 3027–3039
- [28] Lei Liu et al. “Mobility-aware multi-hop task offloading for autonomous driving in vehicular edge computing and networks” In IEEE Transactions on Intelligent Transportation Systems 24.2 IEEE, 2022, pp. 2169–2182
- [29] Zhang Liu et al. “RFID: Towards low latency and reliable DAG task scheduling over dynamic vehicular clouds” In IEEE Transactions on Vehicular Technology 72.9 IEEE, 2023, pp. 12139–12153
- [30] Kejie Lu, Junfei Xie, Yan Wan and Shengli Fu “Toward uav-based airborne computing” In IEEE Wireless Communications 26.6 IEEE, 2019, pp. 172–179
- [31] Zhi-Quan Luo and Wei Yu “An introduction to convex optimization for communications and signal processing” In IEEE Journal on Selected Areas in Communications 24.8 IEEE, 2006, pp. 1426–1438
- [32] Ke Ma and Junfei Xie “Joint Task Allocation and Scheduling for Multi-Hop Distributed Computing” In ICC 2024 - IEEE International Conference on Communications, 2024, pp. 2664–2669 DOI: 10.1109/ICC51166.2024.10622383
- [33] Seyedali Mirjalili “Genetic algorithm” In Evolutionary Algorithms and Neural Networks: Theory and Applications Springer, 2019, pp. 43–55
- [34] João Pedro Pedroso “Optimization with Gurobi and Python” In INESC Porto and Universidade do Porto, Porto, Portugal, 2011
- [35] Houyi Qi et al. “Bridge the Present and Future: A Cross-Layer Matching Game in Dynamic Cloud-Aided Mobile Edge Networks” In IEEE Transactions on Mobile Computing IEEE, 2024
- [36] Weisong Shi et al. “Edge computing: Vision and challenges” In IEEE Internet of Things Journal 3.5 IEEE, 2016, pp. 637–646
- [37] Yilong Sun, Zhiyong Wu, Ke Meng and Yunhui Zheng “Vehicular task offloading and job scheduling method based on cloud-edge computing” In IEEE Transactions on Intelligent Transportation Systems IEEE, 2023
- [38] Qiang Tang et al. “Partial offloading strategy for mobile edge computing considering mixed overhead of time and energy” In Neural Computing and Applications 32 Springer, 2020, pp. 15383–15397
- [39] Baoqian Wang et al. “Learning and Batch-Processing Based Coded Computation With Mobility Awareness for Networked Airborne Computing” In IEEE Transactions on Vehicular Technology 72.5, 2023, pp. 6503–6517 DOI: 10.1109/TVT.2022.3231179
- [40] Baoqian Wang et al. “Multi-agent reinforcement learning based coded computation for mobile ad hoc computing” In Proceedings of IEEE International Conference on Communications, 2021, pp. 1–6 IEEE
- [41] Baoqian Wang et al. “On batch-processing based coded computing for heterogeneous distributed computing systems” In IEEE Transactions on Network Science and Engineering 8.3 IEEE, 2021, pp. 2438–2454
- [42] Yangang Wang, Hai Wang and Xianglin Wei “Energy-efficient UAV deployment and task scheduling in multi-UAV edge computing” In Proceedings of International Conference on Wireless Communications and Signal Processing (WCSP), 2020, pp. 1147–1152 IEEE
- [43] Jindou Xie et al. “Dynamic D2D Multihop Offloading in Multi-Access Edge Computing From the Perspective of Learning Theory in Games” In IEEE Transactions on Network and Service Management 20.1 IEEE, 2022, pp. 305–318
- [44] Changsheng You, Kaibin Huang, Hyukjin Chae and Byoung-Hoon Kim “Energy-efficient resource allocation for mobile-edge computation offloading” In IEEE Transactions on Wireless Communications 16.3 IEEE, 2016, pp. 1397–1411
- [45] Haomeng Zhang, Junfei Xie and Xinyu Zhang “Communication-Efficient -Stepping for Distributed Computing Systems” In Proceedings of International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), 2023, pp. 369–374 IEEE
- [46] Haomeng Zhang et al. “Exploring Networked Airborne Computing: A Comprehensive Approach with Advanced Simulator and Hardware Testbed” In Unmanned Systems World Scientific, 2023
- [47] Hongxia Zhang, Yongjin Yang, Bodong Shang and Peiying Zhang “Joint resource allocation and multi-part collaborative task offloading in MEC systems” In IEEE Transactions on Vehicular Technology 71.8 IEEE, 2022, pp. 8877–8890
- [48] Jinkun Zhang, Yuezhou Liu and Edmund Yeh “Result and Congestion Aware Optimal Routing and Partial Offloading in Collaborative Edge Computing” In arXiv preprint arXiv:2205.00714, 2022
- [49] Wei Zhao et al. “Asynchronous DRL Based Multi-Hop Task Offloading in RSU-Assisted IoV Networks” In IEEE Transactions on Cognitive Communications and Networking IEEE, 2024
- [50] Fuhui Zhou, Yongpeng Wu, Rose Qingyang Hu and Yi Qian “Computation Rate Maximization in UAV-Enabled Wireless-Powered Mobile-Edge Computing Systems” In IEEE Journal on Selected Areas in Communications 36.9, 2018, pp. 1927–1941 DOI: 10.1109/JSAC.2018.2864426
Ke Ma is currently a PhD candidate in the joint doctoral program of the University of California, San Diego and San Diego State University. He received the B.Sc. degree in Communication Engineering from Donghua University, Shanghai, China, in 2019 and the M.S. degree in Computer Engineering from the University of California, Riverside, California, in 2022.
Junfei Xie (S’13-M’16-SM’21) is currently an Associate Professor in the Department of Electrical and Computer Engineering at San Diego State University. She received the B.S. degree in Electrical Engineering from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2012. She received the M.S. degree in Electrical Engineering in 2013 and the Ph.D. degree in Computer Science and Engineering from the University of North Texas, Denton, TX, in 2016. From 2016 to 2019, she was an Assistant Professor in the Department of Computing Sciences at Texas A&M University-Corpus Christi. She is the recipient of the NSF CAREER Award. Her current research interests include large-scale dynamic system design and control, airborne networks, airborne computing, and air traffic flow management.