
Decentralized Network Topology Design for Task Offloading in Mobile Edge Computing

Ke Ma Department of Electrical and Computer Engineering
University of California, San Diego and San Diego State University
La Jolla, USA
[email protected]
   Junfei Xie Department of Electrical and Computer Engineering
San Diego State University
San Diego, USA
[email protected]
Abstract

The rise of delay-sensitive yet computing-intensive Internet of Things (IoT) applications poses challenges due to the limited processing power of IoT devices. Mobile Edge Computing (MEC) offers a promising solution to address these challenges by placing computing servers close to end users. Despite extensive research on MEC, optimizing network topology to improve computational efficiency remains underexplored. Recognizing the critical role of network topology, we introduce a novel decentralized network topology design strategy for task offloading (DNTD-TO) that jointly considers topology design and task allocation. Inspired by communication and sensor networks, DNTD-TO efficiently constructs three-layered network structures for task offloading and generates optimal task allocations for these structures. Comparisons with existing topology design methods demonstrate the promising performance of our approach.

Index Terms:
MEC, Task offloading, Network topology design

I Introduction

With the advancement of Internet of Things (IoT) technology, many delay-sensitive yet computing-intensive applications have emerged, such as autonomous driving, face recognition, and virtual reality [1]. However, IoT devices typically have limited computing power, making it difficult to meet the demands of these tasks. Centralized cloud computing is traditionally used to process such tasks, but offloading them to the remote cloud can cause significant transmission delays that degrade the user Quality of Service (QoS). To address this issue, Mobile Edge Computing (MEC) was introduced [2], which places servers closer to end users at the network edge, such as at base stations, to reduce transmission delays.

To better serve end users, extensive research has been conducted in the field of MEC, addressing various design aspects such as system deployment, task offloading, resource allocation, mobility management, and privacy and security [3]. Nevertheless, little attention has been given to optimizing network topology to enhance computational efficiency. Most existing studies primarily focus on offloading tasks from users to one or more nearby MEC servers within communication range [4, 5]. A few studies [6, 7] have explored offloading tasks to servers multiple hops away, but these designs did not consider the impact of network topology. In our prior work [8], we proposed a multi-layered task offloading framework and demonstrated that computational efficiency can be improved by leveraging layered network structures. We also showed that computational efficiency is influenced not only by the computing and communication characteristics of the servers but also by their network topology. In this paper, we aim to further investigate the joint design of layered network structure and task allocation.

Layered structures have been widely used in communication and practical sensor networks due to energy efficiency and network management simplicity [9]. In these networks, a base station is typically present alongside several clusters of sensors or communication devices, with each cluster comprising a Cluster Head (CH) and multiple Cluster Members (CMs). The CH collects data from its CMs and transmits it to the base station. Methods for selecting CHs and CMs can be broadly categorized into two types: cluster-based [10, 11, 12, 13] and grid-based [14, 15, 16, 17]. In cluster-based methods, CHs are selected directly based on certain criteria. In contrast, grid-based methods first divide the network into grids, and then select CHs within each grid. Despite the widespread use of layered structures in communication and sensor networks, their design for task offloading remains underdeveloped.

In this paper, we introduce a decentralized network topology design strategy for task offloading (DNTD-TO). We explore three-layered network structures, similar to those commonly used in communication and sensor networks, to facilitate computing. In this setup, tasks are offloaded from the root node (referred to as master) to servers in the second layer (CHs), which then distribute the tasks to their child nodes in the third layer (CMs). To select CHs and CMs, our strategy iterates through two nested phases: a local cluster formation phase, where each server within master’s communication range selects CMs in a decentralized manner, and a cluster selection phase where the master selects CHs and their associated CMs. The selection of CHs and CMs is based on servers’ task processing capacities and their potential to enhance computational efficiency. Optimal task allocation is integrated into every step of the selection process.

The rest of the paper is organized as follows. Sec. II covers system modeling and problem formulation. Our approach and simulation studies are presented in Sec. III and Sec. IV, respectively. Sec. V concludes the paper.

II System Modeling and Problem Formulation

Consider an MEC system with $N$ servers scattered in an open area, each communicating wirelessly only with nearby servers within a uniform communication range $\xi$. Suppose one of the servers, referred to as the master (e.g., a server located at or near a base station), receives computation task processing requests from end users with a total task size of $Y$. To process tasks efficiently, the master allocates tasks, which are assumed to be arbitrarily decomposable, to its neighbors, referred to as CHs. The CHs then decide whether to further offload these tasks to their own neighbors, referred to as CMs. Since simply offloading tasks to all neighbors may not yield optimal performance, this study investigates the joint optimization of network topology and task allocation.

II-A System Modeling

The MEC system is modeled as a graph $G=\{\mathcal{V},\mathcal{E}\}$, where $\mathcal{V}=\{0,1,\ldots,N-1\}$ denotes the set of servers, with the master server labeled as 0, and $\mathcal{E}$ represents the set of edges. An edge between two servers $i$ and $j$ is established when they are within each other's communication range, meaning their Euclidean distance, denoted as $d_{ij}$, satisfies $d_{ij}\leq\xi$. The connectivity is described by the adjacency matrix $A$, where each element $a_{ij}$ equals 1 if there is an edge between server $i$ and server $j$, and 0 otherwise.

Additionally, define $\mathcal{N}_{i}=\{j\,|\,a_{ij}=1,\forall j\in\mathcal{V}\}$, where $i\in\mathcal{V}$, as the set of neighbors of server $i$ that are within its communication range. By strategically selecting CHs from the set $\mathcal{N}_{0}$ to receive tasks offloaded from the master, and assigning CMs from the set $\mathcal{N}_{i}$ to each CH $i$ to handle tasks offloaded from CH $i$, the master, CHs, and CMs form a three-layer tree topology, denoted as $\mathcal{T}$. We assume that each CM can belong to only one CH.
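For illustration, the graph model above maps directly to code. The following is a minimal sketch (the function name and array layout are our own assumptions, not from the paper) that builds $A$, the distances $d_{ij}$, and the neighbor sets $\mathcal{N}_{i}$ from server coordinates:

```python
import numpy as np

def build_topology(positions: np.ndarray, xi: float):
    """Return adjacency matrix A, distance matrix, and neighbor sets N_i.

    positions: (N, 2) array of server coordinates; row 0 is the master.
    xi: uniform communication range.
    """
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    A = ((dist <= xi) & (dist > 0)).astype(int)   # a_ij = 1 iff 0 < d_ij <= xi
    neighbors = {i: set(np.flatnonzero(A[i])) for i in range(len(positions))}
    return A, dist, neighbors
```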

To describe data transmission between two servers, we adopt the model in [18]. Specifically, suppose the servers communicate using the Orthogonal Frequency Division Multiplexing protocol [19], where the bandwidth is equally divided when a server transmits data simultaneously to its connected servers via orthogonal channels. For simplicity of analysis, we assume the channels are free from interference. If a server $i$ (e.g., a CH) transmits data to its $m$ neighboring servers (e.g., associated CMs) simultaneously, the data transmission rate for the link between server $i$ and its $j$-th neighbor is given by $R_{ij}=\frac{B}{m}\log_{2}(1+\frac{\beta_{ij}}{d_{ij}^{2}})$, where $B$ (MHz) denotes the total bandwidth, and $\beta_{ij}$ represents the signal-to-noise ratio (SNR).

The computation model from [19] is used in this study to describe the task processing time. Let $b$ represent the number of CPU cycles required to compute 1 bit of data, and $f_{i}$ (MHz) denote the computation capacity of server $i$. Given a task of size $y$ (Gbits), the time taken by server $i$ to process the task is $T_{i}^{comp}=y\gamma_{i}$, where $\gamma_{i}=\frac{b}{f_{i}}$ (s/Gbit) indicates the time taken by server $i$ to process one Gbit of data.
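Both delay models are direct one-liners in code; the sketch below (parameter names are ours) is reused by the later examples:

```python
import numpy as np

def link_rate(B, m, beta, d):
    """R_ij = (B/m) log2(1 + beta_ij / d_ij^2): the sender splits its
    bandwidth B equally over m simultaneous orthogonal links."""
    return (B / m) * np.log2(1.0 + beta / d**2)

def compute_time(y, b, f):
    """T_i^comp = y * gamma_i, with gamma_i = b / f_i the per-unit time."""
    return y * (b / f)
```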

II-B Problem Formulation

In this subsection, we present the mathematical formulation of the problem.

The master selects a set of CHs from the set $\mathcal{N}_{0}$ for task offloading. To describe the selection of CHs, we introduce binary decision variables $o_{i}$, where $i\in\mathcal{N}_{0}$. The variable $o_{i}$ equals 1 if server $i$ is selected as a CH, and 0 otherwise.

After receiving tasks from the master, the CHs select their CMs for further task offloading. To describe the selection of CMs, we introduce decision variables $\boldsymbol{x}_{i}=[x_{i1},x_{i2},\ldots,x_{i|\mathcal{N}_{i}|}]$, where $i\in\mathcal{N}_{0}$ and $|\mathcal{N}_{i}|$ denotes the cardinality of the set $\mathcal{N}_{i}$. $\boldsymbol{x}_{i}\in\mathbb{R}^{|\mathcal{N}_{i}|}$ is a binary vector whose $j$-th element, $x_{ij}$, equals 1 if server $j$ is selected to join cluster $i$, and 0 otherwise. The resulting cluster with CH $i\in\mathcal{N}_{0}$, denoted as $C_{i}$, is given by $C_{i}=\{j\,|\,x_{ij}=1,\forall j\in\mathcal{N}_{i}\}$.

Additionally, we introduce continuous decision variables $y_{i}\in\mathbb{R}_{\geq 0}$ to represent the size of the tasks offloaded to server $i$, where $i\in\mathcal{V}$. In the case when $i=0$, $y_{0}$ indicates the size of the tasks processed at the master.

The time required for each server $i$ to receive its assigned tasks can then be expressed as:

T_{i}^{tran}=\begin{cases}o_{i}y_{i}\frac{1}{R_{0i}},&\text{if }i\in\mathcal{N}_{0}\\ o_{j}x_{ji}y_{i}\left(\frac{1}{R_{ji}}+\frac{1}{R_{0j}}\right),&\text{if }i\in\mathcal{N}_{j},\ j\in\mathcal{N}_{0}\\ 0,&\text{otherwise}\end{cases} \quad (1)

Notably, if server $i$ is not selected for task offloading, the associated transmission delay is 0.

The total time required for each server $i$ to receive and process the assigned computation tasks is given by $\mathcal{J}_{i}=T_{i}^{tran}+T_{i}^{comp}$, $i\in\mathcal{V}$, where $T_{i}^{comp}=y_{i}\gamma_{i}$. Here, we assume the generated results are small in size, making the transmission delay for sending them back to the master negligible, as is often assumed in existing studies (e.g., [20]). The total task completion time can then be written as $\mathcal{J}=\max_{i\in\mathcal{V}}\mathcal{J}_{i}$.
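To make the objective concrete, here is a small evaluator (a sketch under our own naming conventions; it reads $y_{i}$ as the share server $i$ processes itself, matching the per-server delays in (1)):

```python
def completion_time(y, clusters, R0, R, gamma):
    """J = max_i (T_i^tran + T_i^comp) over a three-layer tree.

    y:        {server: task share it processes itself}; y[0] is the master's
    clusters: {ch: [cm, ...]} for the selected CHs
    R0[i]:    rate of the master->i link; R[ch][cm]: rate of the ch->cm link
              (both assumed to already account for bandwidth sharing)
    """
    J = y[0] * gamma[0]                                   # master: local only
    for ch, cms in clusters.items():
        J = max(J, y[ch] / R0[ch] + y[ch] * gamma[ch])    # one-hop CH
        for cm in cms:                                    # two-hop CM: both links
            J = max(J, y[cm] * (1.0 / R[ch][cm] + 1.0 / R0[ch]) + y[cm] * gamma[cm])
    return J
```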

The objective of this study is to minimize the total task completion time by jointly optimizing the selection of CHs and CMs, and the task allocation, which is formulated as follows:

\mathcal{P}_{\text{main}}:\quad \min_{\{y_{i}\}_{i\in\mathcal{V}},\,\{\boldsymbol{x}_{i},o_{i}\}_{i\in\mathcal{N}_{0}}}\ \mathcal{J}
s.t.\quad o_{i}\in\{0,1\},\ \forall i\in\mathcal{N}_{0} \quad (C1)
\qquad\ x_{ij}\in\{0,1\},\ \forall i\in\mathcal{N}_{0},\ j\in\mathcal{N}_{i} \quad (C2)
\qquad\ 0\leq y_{i}\leq Y,\ \forall i\in\mathcal{V} \quad (C3)
\qquad\ C_{i}\cap C_{j}=\emptyset,\ \forall i,j\in\mathcal{N}_{0},\ i\neq j \quad (C4)
\qquad\ y_{i}=\sum_{j\in\mathcal{N}_{i}}x_{ij}y_{j},\ i\in\mathcal{N}_{0} \quad (C5)
\qquad\ y_{0}+\sum_{i\in\mathcal{N}_{0}}o_{i}y_{i}=Y \quad (C6)

Constraints (C1)-(C3) ensure that the decision variables take valid values. Constraint (C4) ensures that clusters do not overlap. Constraints (C5)-(C6) maintain the integrity of task sizes. It should be noted that solving this problem directly is challenging due to the nonlinear constraint (C4).

III Decentralized Network Topology Design for Task Offloading (DNTD-TO)

In this section, we introduce our approach, DNTD-TO, for solving problem $\mathcal{P}_{\text{main}}$. This method consists of two nested phases: (1) the Local Cluster Formation (LCF) phase, and (2) the Cluster Selection phase.

III-A Local Cluster Formation (LCF)

In this phase, each server $i\in\mathcal{N}_{0}$ within the master's communication range selects its CMs in a decentralized manner, forming a candidate cluster with itself as the CH. These candidate clusters will then be examined by the master in the cluster selection phase, as detailed in the next subsection. Notably, this phase allows a server to join multiple clusters.

To select CMs, each server $i\in\mathcal{N}_{0}$ employs a forward selection mechanism, picking CMs from its neighbors $j\in\mathcal{N}_{i}$ one by one until either no further performance gain can be achieved or all neighbors have been evaluated. In each iteration, the CH adds the neighboring server that maximally reduces the task processing time, assuming tasks of any size $y$ are assigned to server $i$. This is done by identifying the neighboring server with the highest processing capacity, where capacity is measured by the time to receive and process a unit-sized task (the smaller this time, the higher the capacity), and determining whether its addition would reduce the overall task processing time. This is challenging, as a neighboring server's processing capacity depends on the number of servers in the cluster due to shared bandwidth. Additionally, determining the time savings from adding a server requires solving an optimization problem for task allocation. In the following, we first derive the processing capacity, denoted as $\alpha_{i}$.

Initially, the cluster $C_{i}$, with server $i\in\mathcal{N}_{0}$ as the CH, is empty. Therefore, when a neighboring server is added to the cluster, this server can utilize the entire bandwidth resource. Then, for each server $j\in\mathcal{F}=\mathcal{N}_{i}$ within $i$'s communication range, its processing capacity $\alpha_{j}$ can be derived as $\alpha_{j}=\gamma_{j}+\frac{1}{B\log_{2}(1+\beta_{ij}d_{ij}^{-2})}$, $j\in\mathcal{F}$, where $\mathcal{F}$ denotes the set of neighboring servers that have not been added to the cluster. Nevertheless, in subsequent iterations, as the cluster $C_{i}$ becomes nonempty, a neighboring server $j\in\mathcal{F}$ can no longer utilize the entire bandwidth resource, and its processing capacity $\alpha_{j}$ is given by:

\alpha_{j}=\gamma_{j}+\frac{|C_{i}|+1}{B\log_{2}(1+\beta_{ij}d_{ij}^{-2})},\ j\in\mathcal{F} \quad (2)

Notably, the processing capacity of the CH $i$ is always $\alpha_{i}=\gamma_{i}$ due to local computing.

To determine whether adding a neighboring server $j$ would reduce the task processing time, we define a performance indicator $\mathtt{I}$ that compares the minimum task processing time before and after the server is added. Specifically, consider cluster $C_{i}$ at the $k$-th iteration. For a task $y$ assigned to server $i\in\mathcal{N}_{0}$, the minimum task processing time and the optimal task allocation can be derived by solving the following optimization problem:

\mathcal{P}_{1}:\quad \min_{y_{l},\forall l\in C_{i}\cup\{i\}} J
\qquad s.t.\quad \sum_{l\in C_{i}\cup\{i\}}y_{l}=y

where $J=\max_{l\in C_{i}\cup\{i\}}J_{l}$ is the task processing time and $J_{l}=y_{l}\alpha_{l}$ is the time required for server $l$ to receive and process its assigned task $y_{l}$. The performance indicator is then defined as $\mathtt{I}=\frac{J^{*}_{C_{i}}}{J^{*}_{C_{i}\cup\{j\}}}$, where $J^{*}_{C_{i}}$ and $J^{*}_{C_{i}\cup\{j\}}$ represent the minimum task processing times obtained before and after adding server $j\in\mathcal{F}$ to cluster $C_{i}$, respectively. Therefore, if $\mathtt{I}>1$, adding the server to the cluster will improve performance; otherwise, it will not.

To derive the formula for $\mathtt{I}$, we solve the above optimization problem, which leads to the following lemma, with the proof provided in the Appendix.

Lemma 1.

Consider problem $\mathcal{P}_{1}$. The optimal task allocation, denoted as $y^{*}_{l}$, $\forall l\in C_{i}\cup\{i\}$, satisfies $\alpha_{i}y^{*}_{i}=\alpha_{l}y^{*}_{l}$, $\forall l\neq i$, $l\in C_{i}$, when $C_{i}\neq\emptyset$. Consequently, the minimum task processing time is given by $J^{*}=\alpha_{i}y^{*}_{i}=\alpha_{l}y^{*}_{l}$, $l\in C_{i}$. In the special case where $C_{i}=\emptyset$, we have $J^{*}=\alpha_{i}y$.
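Although not stated explicitly in the paper, Lemma 1 together with the constraint of $\mathcal{P}_{1}$ yields the allocation in closed form:

$y_{l}^{*}=\frac{y/\alpha_{l}}{\sum_{m\in C_{i}\cup\{i\}}1/\alpha_{m}},\qquad J^{*}=\frac{y}{\sum_{m\in C_{i}\cup\{i\}}1/\alpha_{m}}$

As a quick numeric check with illustrative capacities $\alpha_{i}=1$, $\alpha_{1}=2$, $\alpha_{2}=4$ (s/Gbit) and $y=7$ Gbits: the harmonic sum is $7/4$, giving $y^{*}_{i}=4$, $y^{*}_{1}=2$, $y^{*}_{2}=1$ Gbits and $J^{*}=4$ s, so all three servers finish at exactly the same time.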

From the above lemma, we can derive that $J^{*}_{C_{i}}=y_{l}^{*}\alpha_{l}$ and $J^{*}_{C_{i}\cup\{j\}}=\bar{y}_{l}^{*}\bar{\alpha}_{l}=\bar{y}_{j}^{*}\bar{\alpha}_{j}$ for $l\in C_{i}\cup\{i\}$. Here, $\alpha_{l}$ and $\bar{\alpha}_{l}$ represent the processing capacities of server $l$ before and after adding server $j$ to cluster $C_{i}$, respectively, while $y^{*}_{l}$ and $\bar{y}^{*}_{l}$ denote the corresponding optimal task allocations. Therefore, we have $\mathtt{I}=\frac{y_{l}^{*}\alpha_{l}}{\bar{y}_{l}^{*}\bar{\alpha}_{l}}$. In this equation, the processing capacities $\alpha_{l}$ and $\bar{\alpha}_{l}$ can be readily computed using (2). Specifically, if $l\in C_{i}$, $\alpha_{l}=\gamma_{l}+\frac{|C_{i}|}{B\log_{2}(1+\beta_{il}d_{il}^{-2})}$ and $\bar{\alpha}_{l}=\gamma_{l}+\frac{|C_{i}|+1}{B\log_{2}(1+\beta_{il}d_{il}^{-2})}$; otherwise, if $l=i$, $\alpha_{i}=\bar{\alpha}_{i}=\gamma_{i}$. However, determining the optimal task allocations $y_{l}^{*}$ and $\bar{y}_{l}^{*}$ by solving $\mathcal{P}_{1}$ at each iteration is time-consuming. To address this issue, we introduce an iterative method to efficiently compute the values of $\mathtt{I}$, $y_{l}^{*}$ and $\bar{y}_{l}^{*}$.

Initially, $C_{i}$ is empty. Hence, we can easily obtain that $y_{i}^{*}=y$ before adding server $j$, and $\bar{y}^{*}_{i}=\frac{\bar{\alpha}_{j}y}{\bar{\alpha}_{i}+\bar{\alpha}_{j}}$, $\bar{y}^{*}_{j}=\frac{\bar{\alpha}_{i}y}{\bar{\alpha}_{i}+\bar{\alpha}_{j}}$ after adding server $j$, based on Lemma 1. Thus, $\mathtt{I}=\frac{\bar{\alpha}_{i}+\bar{\alpha}_{j}}{\bar{\alpha}_{i}}$. In subsequent iterations, to derive $\mathtt{I}$, we note the existence of the following relationships:

\mathtt{I}=\frac{y_{l}^{*}\alpha_{l}}{\bar{y}_{l}^{*}\bar{\alpha}_{l}},\ l\in C_{i}\cup\{i\} \quad (3a)
\bar{y}_{j}^{*}+\sum_{l\in C_{i}\cup\{i\}}\bar{y}_{l}^{*}=y \quad (3b)
\sum_{l\in C_{i}\cup\{i\}}y_{l}^{*}=y \quad (3c)

which yields

\bar{y}_{j}^{*}=\sum_{l\in C_{i}\cup\{i\}}\left(1-\frac{\alpha_{l}}{\mathtt{I}\bar{\alpha}_{l}}\right)y_{l}^{*} \quad (4)

Since $\bar{y}_{j}^{*}\bar{\alpha}_{j}=\bar{y}_{l}^{*}\bar{\alpha}_{l}$, $\mathtt{I}=\frac{y_{l}^{*}\alpha_{l}}{\bar{y}_{l}^{*}\bar{\alpha}_{l}}$, and $y_{l}^{*}\alpha_{l}=y_{i}^{*}\alpha_{i}$, we have

\bar{y}_{j}^{*}=\frac{y_{l}^{*}\alpha_{l}}{\bar{\alpha}_{j}\mathtt{I}}=\frac{y_{i}^{*}\alpha_{i}}{\bar{\alpha}_{j}\mathtt{I}} \quad (5)

Combining (3c), (4), and (5), we can derive that

\mathtt{I}=\begin{cases}\frac{\bar{\alpha}_{i}+\bar{\alpha}_{j}}{\bar{\alpha}_{i}},&\text{if }k=0\\ \frac{\bar{\alpha}_{j}\sum_{l\in C_{i}\cup\{i\}}\frac{y_{l}^{*}\alpha_{l}}{\bar{\alpha}_{l}}+\alpha_{i}y_{i}^{*}}{\bar{\alpha}_{j}y},&\text{if }k\geq 1\end{cases} \quad (6)

Since $y^{*}_{l}$ was obtained in the previous iteration, i.e., $y^{*(k)}_{l}=\bar{y}^{*(k-1)}_{l}$, where the superscript $(k)$ indicates the iteration index, (6) can be readily computed. Moreover, once $\mathtt{I}$ is obtained, we can derive $\bar{y}_{l}^{*}$ by $\bar{y}_{l}^{*}=\frac{y_{l}^{*}\alpha_{l}}{\bar{\alpha}_{l}\mathtt{I}}$, $\forall l\in C_{i}\cup\{i\}$, and $\bar{y}_{j}^{*}$ can be obtained by (4). Algorithm 1 summarizes the procedure.

Algorithm 1 LCF($i,\mathcal{N}_{i},y$)
1:  $C_{i}\leftarrow\emptyset$, $\mathcal{F}\leftarrow\mathcal{N}_{i}$, $\alpha_{i}\leftarrow\gamma_{i}$, $\bar{\alpha}_{i}\leftarrow\gamma_{i}$, $y^{*}_{i}\leftarrow y$;
2:  for $k=0$ to $|\mathcal{N}_{i}|-1$ do
3:     Compute $\bar{\alpha}_{l}$ using (2), $\forall l\in\mathcal{F}$;
4:     $j\leftarrow\operatorname{arg\,min}_{l\in\mathcal{F}}\bar{\alpha}_{l}$;
5:     Compute $\mathtt{I}$ by (6);
6:     if $\mathtt{I}>1$ then
7:        Compute $\bar{y}_{j}^{*}$ by (4);
8:        $y_{l}^{*}\leftarrow\frac{y_{l}^{*}\alpha_{l}}{\bar{\alpha}_{l}\mathtt{I}}$, $\forall l\in C_{i}\cup\{i\}$; $y_{j}^{*}\leftarrow\bar{y}_{j}^{*}$;
9:        $C_{i}\leftarrow C_{i}\cup\{j\}$; $\mathcal{F}\leftarrow\mathcal{F}\setminus\{j\}$;
10:        $\alpha_{l}\leftarrow\bar{\alpha}_{l}$, $\forall l\in C_{i}$;
11:     else
12:        Break;
13:     end if
14:  end for
15:  return $C_{i}$, $\{y^{*}_{l}\}_{l\in C_{i}\cup\{i\}}$
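For concreteness, the following is a minimal Python sketch of Algorithm 1. It is not the paper's implementation: for clarity it scores each candidate cluster with the closed-form optimum of $\mathcal{P}_{1}$ from Lemma 1 ($J^{*}=y/\sum_{l}1/\alpha_{l}$) rather than the incremental indicator update (6), which is equivalent but more efficient; the data structures (dense gamma, beta, dist arrays) are our assumptions.

```python
import numpy as np

def capacities(i, members, gamma, B, beta, dist):
    """alpha_l per Eq. (2): the CH i computes locally (alpha_i = gamma_i);
    each CM shares the CH's bandwidth B over |C_i| simultaneous links."""
    a = {i: gamma[i]}
    for l in members:
        a[l] = gamma[l] + len(members) / (B * np.log2(1 + beta[i][l] / dist[i][l] ** 2))
    return a

def lcf(i, candidates, y, gamma, B, beta, dist):
    """Greedy forward selection of CMs for CH i; returns (C_i, optimal shares)."""
    C = []
    J_best = y * gamma[i]                       # CH processes everything alone
    for _ in range(len(candidates)):
        rest = [l for l in candidates if l not in C]
        if not rest:
            break
        # highest-capacity candidate = smallest alpha after it would join
        j = min(rest, key=lambda l: gamma[l] + (len(C) + 1)
                / (B * np.log2(1 + beta[i][l] / dist[i][l] ** 2)))
        a_new = capacities(i, C + [j], gamma, B, beta, dist)
        J_new = y / sum(1.0 / v for v in a_new.values())   # Lemma 1 closed form
        if J_new < J_best:                      # equivalent to indicator I > 1
            C.append(j)
            J_best = J_new
        else:
            break
    a = capacities(i, C, gamma, B, beta, dist)
    return C, {l: J_best / a[l] for l in a}     # alpha_l * y_l^* = J^*
```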
Remark 1.

Inspired by Algorithm 1, we can solve the optimization problem $\mathcal{P}_{1}$ efficiently using an iterative procedure, as outlined in Algorithm 2, which generates the optimal solution.

Algorithm 2 OptiSolver-$\mathcal{P}_{1}$($i,C_{i},y$)
1:  $\boldsymbol{y}^{*}\leftarrow\emptyset$, $k\leftarrow 0$;
2:  for $\forall j\in C_{i}$ do
3:     Compute $\bar{\alpha}_{j}$ and $\mathtt{I}$ using (2) and (6), respectively;
4:     Calculate $\bar{y}_{j}^{*}$ by (4);
5:     $y_{l}^{*}\leftarrow\frac{y_{l}^{*}\alpha_{l}}{\bar{\alpha}_{l}\mathtt{I}}$, $\forall l\in C_{i}\cup\{i\}$; $y_{j}^{*}\leftarrow\bar{y}_{j}^{*}$; $\boldsymbol{y}^{*}\leftarrow\boldsymbol{y}^{*}\cup\{y_{j}^{*}\}$;
6:     $C_{i}\leftarrow C_{i}\cup\{j\}$; $\alpha_{l}\leftarrow\bar{\alpha}_{l}$, $\forall l\in C_{i}$; $k\leftarrow k+1$;
7:  end for
8:  return $\boldsymbol{y}^{*}$

III-B Cluster Selection

In this phase, the master examines the candidate clusters formed in the LCF phase, selects the CHs and their associated CMs, and resolves any overlaps between clusters.

To select CHs, the master evaluates each server $i\in\mathcal{N}_{0}$ within its communication range, following a procedure similar to that of LCF. In particular, in each iteration, the master identifies the neighboring server $i\in\mathcal{N}_{0}$ with the highest processing capacity and adds it to the set of CHs, denoted as $C_{0}$, if doing so would reduce the task processing time. The iteration stops when no further reduction in processing time is achievable or when all neighboring servers of the master have been examined.

Unlike the processing capacity defined for individual servers in the LCF phase, the processing capacity of each candidate CH $i\in\mathcal{N}_{0}$ here is defined as the time required for it and its CMs, as a team, to receive and process a unit-sized task. Specifically, for server $i$, the time to receive a unit-sized task from the master is $\frac{1}{R_{0i}}$. Moreover, once server $i$ receives this task, according to Lemma 1, the minimum time required for it and its CMs to process it is $\frac{y_{i}^{*}\gamma_{i}}{y}$, where $y_{i}^{*}$ is the output of Algorithm 2. Therefore, the processing capacity of server $i\in\mathcal{N}_{0}$ in the cluster selection phase, denoted as $\eta_{i}$, is given by $\eta_{i}=\frac{y_{i}^{*}\gamma_{i}}{y}+\frac{1}{R_{0i}}$. Notably, the processing capacity of the master is $\eta_{0}=\gamma_{0}$.

Moreover, to assess whether adding server $i$ to the set of CHs, $C_{0}$, would improve performance, we use the same performance indicator $\mathtt{I}$, which can be computed similarly, as detailed in the previous subsection. In particular, at the $k$-th iteration, we have:

\mathtt{I}=\begin{cases}\frac{\bar{\eta}_{0}+\bar{\eta}_{i}}{\bar{\eta}_{0}},&\text{if }k=0\\ \frac{\bar{\eta}_{i}\sum_{l\in C_{0}\cup\{0\}}\frac{y_{l}^{*}\eta_{l}}{\bar{\eta}_{l}}+\eta_{0}y_{0}^{*}}{\bar{\eta}_{i}Y},&\text{if }k\geq 1\end{cases} \quad (7)

where $\eta_{l}$ and $\bar{\eta}_{l}$ denote the processing capacities of server $l\in C_{0}\cup\{0\}$ before and after adding server $i$ to the cluster $C_{0}$, and we have

\bar{\eta}_{l}=\frac{\bar{y}_{l}^{*}\gamma_{l}}{y}+\frac{|C_{0}|+1}{B\log_{2}(1+\beta_{0l}d_{0l}^{-2})} \quad (8)

when $l\in C_{0}\cup\{0\}$. $y^{*}_{l}$ and $\bar{y}^{*}_{l}$ denote the optimal task allocations before and after adding server $i$. Similarly, the updating equation for $\bar{y}_{l}^{*}$ is $\bar{y}_{l}^{*}=\frac{y_{l}^{*}\eta_{l}}{\bar{\eta}_{l}\mathtt{I}}$, $\forall l\in C_{0}\cup\{0\}$, and we have:

\bar{y}_{i}^{*}=\sum_{l\in C_{0}\cup\{0\}}\left(1-\frac{\eta_{l}}{\mathtt{I}\bar{\eta}_{l}}\right)y_{l}^{*} \quad (9)

To resolve any overlaps between clusters, the selected CH $i$ and its CMs are "removed" from the network at the end of each iteration and do not participate in subsequent iterations. At the start of each new iteration, the unselected neighbors of the master undergo the LCF phase to update their CMs, ensuring that clusters remain distinct and non-overlapping. Algorithm 3 outlines the core procedure of our approach, which constructs the three-layer tree topology $\mathcal{T}$ and determines the associated optimal task allocation $\{y^{*}_{l}\}_{l\in\mathcal{T}}$.

Remark 2.

The task allocation $\{y^{*}_{l}\}_{l\in\mathcal{T}}$ generated by our approach in Algorithm 3 is optimal for the topology $\mathcal{T}$.

Algorithm 3 DNTD-TO($Y$)
1:  $\mathcal{T}\leftarrow\{0\}$, $y\leftarrow Y$, $\mathcal{F}\leftarrow\mathcal{N}_{0}$, $\eta_{0}\leftarrow\gamma_{0}$, $\bar{\eta}_{0}\leftarrow\gamma_{0}$, $y_{0}^{*}\leftarrow Y$;
2:  for $k=0$ to $|\mathcal{N}_{0}|-1$ do
3:     $\{C_{l},\{y^{*}_{h}\}_{h\in C_{l}\cup\{l\}}\}\leftarrow$ LCF($l,\mathcal{N}_{l},y$), $\forall l\in\mathcal{F}$;
4:     Compute $\bar{\eta}_{l}$ using (8), $\forall l\in\mathcal{F}$;
5:     $i\leftarrow\operatorname{arg\,min}_{l\in\mathcal{F}}\bar{\eta}_{l}$;
6:     Compute $\mathtt{I}$ using (7);
7:     if $\mathtt{I}>1$ then
8:        Compute $\bar{y}_{i}^{*}$ by (9);
9:        $y_{l}^{*}\leftarrow\frac{y_{l}^{*}\eta_{l}}{\bar{\eta}_{l}\mathtt{I}}$, $\forall l\in C_{0}\cup\{0\}$; $y_{i}^{*}\leftarrow\bar{y}_{i}^{*}$;
10:        $\mathcal{T}\leftarrow\mathcal{T}\cup\{i\}\cup C_{i}$; $C_{0}\leftarrow C_{0}\cup\{i\}$; $\mathcal{F}\leftarrow\mathcal{F}\setminus\{i\}$;
11:        $\eta_{l}\leftarrow\bar{\eta}_{l}$, $\forall l\in C_{0}$;
12:        $\mathcal{N}_{l}\leftarrow\{h\,|\,a_{lh}=1,h\in\mathcal{V}\setminus\mathcal{T}\}$, $\forall l\in\mathcal{F}$;
13:     else
14:        Break;
15:     end if
16:  end for
17:  return $\mathcal{T}$, $\{y^{*}_{l}\}_{l\in\mathcal{T}}$
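Continuing the sketch above, the following is a compact (and again assumption-laden) rendering of the cluster-selection loop: it recomputes the team capacities $\eta_{l}$ of (8) from scratch rather than maintaining the incremental updates (7)-(9), and it re-runs lcf() on the pruned network each round so clusters stay disjoint.

```python
def cluster_eta(l, C_l, k, gamma, B, beta, dist):
    """eta_l per Eq. (8): per-unit time for CH l and its cluster C_l to
    receive a task (master's bandwidth shared over k links) and process it."""
    a = capacities(l, C_l, gamma, B, beta, dist)
    share_l = (1.0 / a[l]) / sum(1.0 / v for v in a.values())  # y_l^*/y, Lemma 1
    return share_l * gamma[l] + k / (B * np.log2(1 + beta[0][l] / dist[0][l] ** 2))

def dntd_to(Y, neighbors, gamma, B, beta, dist):
    """Greedy CH selection (Algorithm 3); returns ({ch: [cm, ...]}, J)."""
    clusters, removed = {}, {0}
    J_best = Y * gamma[0]                        # master computes everything
    while True:
        rest = [l for l in neighbors[0] if l not in removed]
        if not rest:
            break
        k = len(clusters) + 1                    # master splits B over k CH links
        # candidate clusters are re-formed on the pruned network each round
        cand = {l: lcf(l, [n for n in neighbors[l] if n not in removed],
                       1.0, gamma, B, beta, dist)[0] for l in rest}
        i = min(rest, key=lambda l: cluster_eta(l, cand[l], k, gamma, B, beta, dist))
        etas = {0: gamma[0], i: cluster_eta(i, cand[i], k, gamma, B, beta, dist)}
        etas.update({l: cluster_eta(l, C, k, gamma, B, beta, dist)
                     for l, C in clusters.items()})
        J_new = Y / sum(1.0 / v for v in etas.values())      # Lemma 1 at layer two
        if J_new < J_best:                       # indicator I > 1
            clusters[i] = cand[i]
            removed |= {i, *cand[i]}
            J_best = J_new
        else:
            break
    return clusters, J_best
```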

IV Simulation studies

In this section, we conduct simulation studies to evaluate the performance of our approach.

IV-A Experiment Setup

We randomly generate and distribute servers within a $100\times 100$ m area, with the exception of the master, which is fixed at the location $(20,20)$ m. The computing power $f_{i}$ of each server is sampled from a uniform distribution over $[0.1,10]$ MHz, with $b$ set to 1. The SNR $\beta_{ij}$ is sampled from a uniform distribution over $[30,40]$ dBm. The task size $Y$ is set to 100 Gbits, and the bandwidth $B$ is set to 50 MHz. To evaluate the performance of our method, we compare it with the following four benchmarks (a script instantiating this setup is sketched at the end of this subsection):

  • Unequal Cluster (Unequal) [14]: This method selects CHs by comparing servers' computing power; if two CHs are within each other's communication range, the server with higher computing power is selected as the CH. CMs join the nearest CH.

  • Leach-C [21]: A centralized approach that calculates each server’s probability of being a CH based on computing power (originally, energy level was used), with the top servers selected as CHs. CMs join the CH with the highest received signal strength (originally, minimal communication energy was used).

  • LBAS [22]: CHs are selected using an iterative approach that considers distance, number of neighbors, and computing power, with higher-scoring servers selected as CHs. A similar procedure is applied to select CMs.

  • Dijkstra's [8]: A tree topology is constructed by identifying the shortest routes with the least transmission delay from the master to each server using Dijkstra's algorithm. Servers more than two hops away are pruned.

Notably, these benchmarks only generate the tree topology $\mathcal{T}$. The associated optimal task allocation is derived using the same approach as ours.
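To make the setup concrete, here is a minimal script that generates one random instance and runs the DNTD-TO sketch from Sec. III. The seed, the symmetrization of $\beta_{ij}$, and the direct use of the sampled SNR values in the rate formula are our assumptions, not details stated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, xi, B, Y, b = 20, 50.0, 50.0, 100.0, 1.0

pos = rng.uniform(0.0, 100.0, size=(N, 2))
pos[0] = (20.0, 20.0)                          # master fixed at (20, 20) m
f = rng.uniform(0.1, 10.0, size=N)             # computing power f_i (MHz)
gamma = b / f                                  # per-unit processing times
beta = rng.uniform(30.0, 40.0, size=(N, N))    # SNR draws
beta = (beta + beta.T) / 2.0                   # make links symmetric

A, dist, neighbors = build_topology(pos, xi)
clusters, J = dntd_to(Y, neighbors, gamma, B, beta, dist)
print(f"CHs: {sorted(clusters)}; task completion time J = {J:.2f} s")
```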

IV-B Experiment Results

In the first experiment, we compare the performance of different approaches across networks of varying sizes and topologies. Specifically, we examine a small-scale network with $N=20$ servers and a large-scale network with $N=100$ servers. For each network size, we randomly generate 10 different topologies by varying server locations, with each server's communication range set to 50 m. As shown in Fig. 1, our approach outperforms all benchmarks in both small-scale (Fig. 1(a)) and large-scale networks (Fig. 1(b)). The performance advantage is more pronounced in large-scale networks, where more servers can potentially slow down processing if included without careful selection. Comparing the two figures, we can see that the task completion time generally decreases as the number of servers increases, since more servers participate in the computation.

In the second experiment, we compare the performance of different approaches under varying server communication ranges. Specifically, we consider $N=20$ servers distributed over the simulation area, as illustrated in Fig. 2(a). The communication range $\xi$ is configured to be the same for each server, and we vary $\xi$ from 10 m to 130 m. From Fig. 2(b), we can see that our approach outperforms all benchmarks and continues to improve as the communication range $\xi$ increases. This improvement occurs because, as $\xi$ grows, network connectivity strengthens as more servers fall within each other's communication range. This expanded connectivity enables more servers to contribute to task processing, which, when properly selected, further reduces the task completion time.

Figure 1: Performance comparison across different topologies when (a) $N=20$ and (b) $N=100$.
Figure 2: (a) Illustration of the network with $\xi=10$ m. "$\times$" marks the master and the dashed circles indicate the communication range; (b) Performance comparison for different values of $\xi$.

V Conclusion and Future Works

In this paper, we investigated the design of network topology to improve computational efficiency for task offloading in MEC. The proposed approach, DNTD-TO, draws inspiration from communication and sensor networks and builds three-layered network structures for task offloading in an iterative, decentralized manner. Additionally, it generates the optimal task allocation for the designed topology. To evaluate its performance, we conducted various comparison studies. The simulation results show that DNTD-TO significantly outperforms existing topology design methods. In the future, we will explore network structures with more than three layers and consider networks with mobile servers.

Acknowledgment

We would like to thank the National Science Foundation for supporting this work under Grants CAREER-2048266 and CCRI-1730675.

Appendix: Proof of Lemma 1

It is straightforward that when $C_{i}=\emptyset$, we have $J^{*}=\alpha_{i}y$. When $C_{i}\neq\emptyset$, we can solve problem $\mathcal{P}_{1}$ by relaxing it to a linear programming problem as follows:

\mathcal{P}_{2}:\quad \min z
\qquad s.t.\quad z\geq y_{l}\alpha_{l},\ l\in C_{i}\cup\{i\}
\qquad\qquad\ \ y_{i}+\sum_{l\in C_{i}}y_{l}=y

To solve problem $\mathcal{P}_{2}$, the Lagrangian multiplier method [8] can be used, with the Lagrangian function given by $\mathcal{L}=z+\sum_{l\in C_{i}\cup\{i\}}\lambda_{l}(y_{l}\alpha_{l}-z)+\mu(\sum_{l\in C_{i}\cup\{i\}}y_{l}-y)$, where $\boldsymbol{\lambda}=\{\lambda_{l}\geq 0\,|\,l\in C_{i}\cup\{i\}\}$ and $\mu$ are Lagrangian multipliers. As the problem fulfills Slater's condition [23], we can resort to the Karush-Kuhn-Tucker (KKT) conditions [24] to solve it:

\frac{\partial\mathcal{L}}{\partial y_{l}}=0,\ \forall l\in C_{i}\cup\{i\} \quad (10a)
\frac{\partial\mathcal{L}}{\partial z}=0 \quad (10b)
\lambda_{l}(y_{l}\alpha_{l}-z)=0,\ \forall l\in C_{i}\cup\{i\} \quad (10c)
\sum_{l\in C_{i}\cup\{i\}}y_{l}=y \quad (10d)
y_{l}\alpha_{l}-z\leq 0,\ \forall l\in C_{i}\cup\{i\} \quad (10e)
\lambda_{l}\geq 0,\ \forall l\in C_{i}\cup\{i\} \quad (10f)

Suppose, for the sake of contradiction, that $\alpha_{l}y^{*}_{l}\neq\alpha_{i}y^{*}_{i}$ for some $l$, and let $j=\operatorname{arg\,max}_{l\in C_{i}\cup\{i\}}\alpha_{l}y^{*}_{l}$. Then (10a)-(10c) can be rewritten as:

\frac{\partial\mathcal{L}}{\partial y_{j}}=\lambda_{j}\alpha_{j}+\mu=0 \quad (11a)
\frac{\partial\mathcal{L}}{\partial z}=1-\sum_{l\in C_{i}\cup\{i\}}\lambda_{l}=0 \quad (11b)
\lambda_{l}(\alpha_{l}y_{l}-z)=0,\ \forall l\neq j,\ l\in C_{i}\cup\{i\} \quad (11c)

Since $z=\alpha_{j}y^{*}_{j}$, we can derive from (11c) that $\lambda_{l}=0$, $\forall l\neq j$, $l\in C_{i}\cup\{i\}$, from (11b) that $\lambda_{j}=1$, and then from (11a) that $\mu=-\alpha_{j}<0$. However, for $\forall l\neq j$, (10a) can also be written as $\frac{\partial\mathcal{L}}{\partial y_{l}}=\lambda_{l}\alpha_{l}+\mu=0=\mu$. This produces a conflicting value for $\mu$. Therefore, the optimal task allocation has to satisfy $\alpha_{l}y^{*}_{l}=\alpha_{i}y^{*}_{i}$, $\forall l\in C_{i}$, and $J^{*}=\alpha_{l}y^{*}_{l}=\alpha_{i}y^{*}_{i}$.

References

  • [1] X. Yang, Z. Chen, K. Li, Y. Sun, N. Liu, W. Xie, and Y. Zhao, “Communication-constrained mobile edge computing systems for wireless virtual reality: Scheduling and tradeoff,” IEEE Access, vol. 6, pp. 16665–16677, 2018.
  • [2] Y. C. Hu, M. Patel, D. Sabella, N. Sprecher, and V. Young, “Mobile edge computing—a key technology towards 5g,” ETSI white paper, vol. 11, no. 11, pp. 1–16, 2015.
  • [3] J. Yang, A. A. Shah, and D. Pezaros, “A survey of energy optimization approaches for computational task offloading and resource allocation in mec networks,” Electronics, vol. 12, no. 17, p. 3548, 2023.
  • [4] J. Linderoth, M. Yoder et al., “Metacomputing and the master-worker paradigm,” Mathematics and Computer Science Division, Argonne National Laboratory, Tech. Rep. ANL/MCS-P792–0200, 2000.
  • [5] B. Wang, J. Xie, K. Lu, Y. Wan, and S. Fu, “Learning and batch-processing based coded computation with mobility awareness for networked airborne computing,” IEEE Transactions on Vehicular Technology, vol. 72, no. 5, pp. 6503–6517, 2023.
  • [6] H. Qi, M. Liwang, X. Wang, L. Li, W. Gong, J. Jin, and Z. Jiao, “Bridge the present and future: A cross-layer matching game in dynamic cloud-aided mobile edge networks,” IEEE Transactions on Mobile Computing, 2024.
  • [7] F. Liu, J. Huang, and X. Wang, “Joint task offloading and resource allocation for device-edge-cloud collaboration with subtask dependencies,” IEEE Transactions on Cloud Computing, vol. 11, no. 3, pp. 3027–3039, 2023.
  • [8] K. Ma and J. Xie, “Joint task allocation and scheduling for multi-hop distributed computing,” in ICC 2024 - IEEE International Conference on Communications, 2024, pp. 2664–2669.
  • [9] R. Priyadarshi, “Energy-efficient routing in wireless sensor networks: a meta-heuristic and artificial intelligence-based approach: a comprehensive review,” Archives of Computational Methods in Engineering, vol. 31, no. 4, pp. 2109–2137, 2024.
  • [10] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd annual Hawaii international conference on system sciences.   IEEE, 2000, pp. 10–pp.
  • [11] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “An application-specific protocol architecture for wireless microsensor networks,” IEEE Transactions on Wireless Communications, vol. 1, no. 4, pp. 660–670, 2002.
  • [12] G. Koltsidas and F.-N. Pavlidou, “A game theoretical approach to clustering of ad-hoc and sensor networks,” Telecommunication Systems, vol. 47, pp. 81–93, 2011.
  • [13] D. Xie, Q. Sun, Q. Zhou, Y. Qiu, and X. Yuan, “An efficient clustering protocol for wireless sensor networks based on localized game theoretical approach,” International Journal of Distributed Sensor Networks, vol. 9, no. 8, p. 476313, 2013.
  • [14] G. Chen, C. Li, M. Ye, and J. Wu, “An unequal cluster-based routing protocol in wireless sensor networks,” Wireless Networks, vol. 15, pp. 193–207, 2009.
  • [15] R. Logambigai, S. Ganapathy, and A. Kannan, “Energy–efficient grid–based routing algorithm using intelligent fuzzy rules for wireless sensor networks,” Computers & Electrical Engineering, vol. 68, pp. 62–75, 2018.
  • [16] Y.-K. Chiang, N.-C. Wang, and C.-H. Hsieh, “A cycle-based data aggregation scheme for grid-based wireless sensor networks,” Sensors, vol. 14, no. 5, pp. 8447–8464, 2014.
  • [17] H. Farman, H. Javed, J. Ahmad, B. Jan, and M. Zeeshan, “Grid-based hybrid network deployment approach for energy efficient wireless sensor networks,” Journal of Sensors, vol. 2016, no. 1, p. 2326917, 2016.
  • [18] Y. Wang, Z.-Y. Ru, K. Wang, and P.-Q. Huang, “Joint deployment and task scheduling optimization for large-scale mobile users in multi-uav-enabled mobile edge computing,” IEEE transactions on cybernetics, vol. 50, no. 9, pp. 3984–3997, 2019.
  • [19] C. You, K. Huang, H. Chae, and B.-H. Kim, “Energy-efficient resource allocation for mobile-edge computation offloading,” IEEE Transactions on Wireless Communications, vol. 16, no. 3, pp. 1397–1411, 2016.
  • [20] F. Zhou, Y. Wu, R. Q. Hu, and Y. Qian, “Computation rate maximization in uav-enabled wireless-powered mobile-edge computing systems,” IEEE Journal on Selected Areas in Communications, vol. 36, no. 9, pp. 1927–1941, 2018.
  • [21] W. B. Heinzelman, A. P. Chandrakasan, and H. Balakrishnan, “An application-specific protocol architecture for wireless microsensor networks,” IEEE Transactions on wireless communications, vol. 1, no. 4, pp. 660–670, 2002.
  • [22] W. Osamy, B. Alwasel, A. Salim, A. M. Khedr, and A. Aziz, “Lbas: Load balancing aware clustering scheme for iot-based heterogeneous wireless sensor networks,” IEEE Sensors Journal, 2024.
  • [23] A. Auslender and M. Teboulle, “Lagrangian duality and related multiplier methods for variational inequality problems,” SIAM Journal on Optimization, vol. 10, no. 4, pp. 1097–1115, 2000.
  • [24] Z.-Q. Luo and W. Yu, “An introduction to convex optimization for communications and signal processing,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 8, pp. 1426–1438, 2006.