
Topology-aware Microservice Architecture in Edge Networks: Deployment Optimization and Implementation

Yuang Chen, Chang Wu, Fangyu Zhang, Chengdi Lu, Yongsheng Huang, and Hancheng Lu are with the University of Science and Technology of China, Hefei, 230027, China (email: {yuangchen21, changwu, fv215b, lcd1999, ysh6}@mail.ustc.edu.cn; [email protected]). Hancheng Lu is also with the Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, 230088, China.
Abstract

As a ubiquitous deployment paradigm, integrating microservice architecture (MSA) into edge networks promises to enhance the flexibility and scalability of services. However, it also presents significant challenges stemming from dispersed node locations and intricate network topologies. In this paper, we propose a topology-aware MSA characterized by a three-tier network traffic model encompassing the service, microservice, and edge-node layers. This model meticulously characterizes the complex dependencies between edge network topologies and microservices, mapping microservice deployment onto link traffic to accurately estimate communication delay. Building upon this model, we formulate a weighted-sum communication delay optimization problem that accounts for different types of services. We then propose a novel topology-aware and individual-adaptive microservices deployment (TAIA-MD) scheme to solve the problem efficiently, which accurately senses the network topology and incorporates an individual-adaptive mechanism into a genetic algorithm to accelerate convergence and avoid local optima. Extensive simulations show that, compared to existing deployment schemes, TAIA-MD improves communication delay performance by approximately 30% to 60% and effectively enhances overall network performance. Furthermore, we implement the TAIA-MD scheme on a practical microservice physical platform. The experimental results demonstrate that TAIA-MD achieves superior robustness in withstanding link failures and network fluctuations.

Index Terms:
microservice, network topology, load balancing, complex dependency, communication delay, link failure, network fluctuation.

I Introduction

With the proliferation of smartphones and internet-of-things (IoT) devices [1], along with the widespread adoption of cloud and edge computing and the explosive growth of digital content and services, the scale of users and the resource demands for network services have surged dramatically [2]. For instance, ChatGPT reached over one billion monthly active users within less than two months of its launch [3]. Currently, mobile applications are increasingly focusing on being location-aware, compute-intensive, and latency-sensitive [1, 4, 5]. Combined with continuously growing user bases and increasing quality-of-service (QoS) requirements, the limitations of cloud computing, which centralizes service operations in the cloud, are gradually becoming apparent [6, 7]. Edge computing has emerged as an advanced computing paradigm that enables the deployment of computing and storage resources closer to users, significantly improving real-time performance and reducing data traffic costs [8, 9, 10, 11]. To fully utilize computing resources, edge computing must reallocate hardware and software resources via virtualization technologies to serve multiple users on the same hardware [12]. Nevertheless, compared to centralized cloud networks, edge networks face tougher challenges in terms of user mobility, device heterogeneity, limited resources, and geographically dispersed edge nodes [13, 14, 15]. Moreover, commonly used virtualization technologies based on virtual machines (VMs) are impractical for edge environments due to their high resource overhead, slow startup times, and complex deployment and migration processes [16, 17].

To alleviate the above issues, a more distributed and service-oriented application architecture [18, 19], microservice architecture (MSA), has recently emerged as a potential enabler for flexible service deployment in distributed scenarios [20, 21, 22, 14, 23, 24]. In MSA, complex monolithic applications are decomposed into multiple logically distinct, mutually complementary, and single-function microservices, which cooperate with each other to provide combined functionality [14, 25, 19]. This approach offers significant flexibility, robustness, and scalability for the deployment of distributed services. Each microservice in MSA can be independently developed, upgraded, and deployed using different programming languages, thus effectively addressing the scalability, maintenance, and team collaboration issues inherent in traditional monolithic applications [26]. Currently, many internet giants such as Amazon, Netflix, and eBay have extensively exploited MSA, and their research and development progress reports claim the effectiveness of MSA [27]. For instance, Amazon emphasized in [28] that MSA can achieve more controllable and faster production, easily expand computing resources, reduce unit complexity, and create more scalable development teams. In addition, microservices are typically deployed using lightweight virtualization technologies like containers [29], which leads to faster deployment and startup, lower resource consumption, and greater compatibility, and can be flexibly deployed on resource-constrained, platform-heterogeneous edge or fog devices [30]. Consequently, developing MSA-based service deployment approaches for edge networks is expected to enhance the flexibility and scalability of services, becoming one of the mainstream technologies for service deployment.

I-A Motivations and Challenges

Figure 1: Complex dependencies among microservices. (a) Microservice invocations for services. (b) Invocations among microservice instances.
Figure 2: The impacts of microservice deployment schemes on the QoS of services. (a) Microservice Invocations. (b) Deployment Scheme 1. (c) Deployment Scheme 2.

Although MSA is anticipated to provide finer-grained deployment schemes for edge networks, enhancing flexibility and scalability, and speeding up application development cycles, the development of microservices in edge networks still encounters numerous stringent challenges.

I-A1 Complex microservice dependencies in edge networks with geographically dispersed nodes

The geographical dispersion of edge nodes necessitates complex topology forwarding for inter-device communications, which is exacerbated by limited and unstable link bandwidth, resulting in significant differences in communication delay and bandwidth occupancy across nodes [13, 14]. In addition, microservices in MSA typically have complex dependencies [31], as illustrated in Fig. 1, where $M_i$ represents a type of microservice and $M_{i,j}$ denotes the $j$-th instance of $M_i$. Fig. 1 (a) shows that fulfilling a service requirement usually necessitates the collaboration of multiple microservices, with various services potentially sharing the same type of microservice. As shown in Fig. 1 (b), each type of microservice can have multiple instances with complex invocation relationships among them. Consequently, compared to traditional monolithic applications, MSA demands greater communication requirements and exhibits more intricate inter-instance relationships. This complicates traffic analysis between edge nodes, and existing work lacks effective analysis models [32, 22, 20]. In this regard, an effective topology-aware microservice deployment model is needed to map microservice deployments onto edge nodes' link traffic, thereby accurately characterizing the complex edge network topology as well as inter-microservice dependencies.

I-A2 Difficulties of implementing optimal microservice deployment in edge networks

The essence of microservice deployment involves shared resource allocation and communication overhead, which directly impact the QoS of microservice-based applications [20]. Fig. 2 illustrates the impacts of microservice deployment schemes on the QoS of services. Fig. 2 (a) depicts the microservice invocations required for a specific service, where node labels represent microservice types and arrow labels indicate the execution order of requests or responses. Fig. 2 (b) and Fig. 2 (c) show two different deployment schemes for the invocations described in Fig. 2 (a), where node labels represent the deployed microservice types and link labels denote the inter-node communication delay (in ms). Compared to deployment scheme 1, deployment scheme 2 significantly reduces communication delay by collocating frequently interacting microservice instances on the same or nearby computing nodes. In particular, based on the microservice invocations depicted in Fig. 2 (a), from the moment a user request is received at the user-access node $\mathbf{A}$ to the moment the response returns to node $\mathbf{A}$, the total communication delays generated by deployment schemes 1 and 2 are 46 ms and 22 ms, respectively. Moreover, deployment scheme 2 deploys instances of microservice $\mathbf{2}$ on both computing nodes $\mathbf{D}$ and $\mathbf{F}$, which allows microservice $\mathbf{1}$ on computing node $\mathbf{D}$ to invoke microservice $\mathbf{2}$ locally, thereby avoiding cross-node communication and reducing communication delay and overhead [33].

The aforementioned analysis highlights that a well-designed microservice deployment scheme in distributed edge networks can remarkably conserve network resources and reduce communication delay. Due to the limited computing capacity of edge nodes, multiple microservice instances are usually deployed across distributed nodes to meet the QoS requirements of applications and avoid single points of failure [34, 33], which also reduces cross-node communication and thus communication delay and overhead [33]. Moreover, deploying multiple instances on the same edge node may lead to resource contention, and allocating more resources to one microservice may slow down the operation of other microservices [20]. As a result, microservice deployment optimization must be dynamically adjusted according to service load, since microservices compete with each other for resources and the QoS of a service is affected by almost all of its microservices.

Figure 3: An example of bandwidth collision in edge networks.

I-A3 The prerequisite of optimal microservice deployment necessitates establishing an effective traffic analysis model

In edge networks with massive numbers of services and users, some links carry multiple service traffic flows concurrently, leading to competition for network resources [20]. Fig. 3 illustrates bandwidth collisions of service traffic in edge networks, where two types of services are realized through mutual invocation among various microservices. Both service 1 and service 2 need to invoke microservice 4 and microservice 5, resulting in bandwidth collisions as their traffic inevitably crosses the link marked with the red star in Fig. 3. Therefore, optimizing microservice deployment necessitates an effective traffic analysis model to accurately capture complex microservice dependencies and network topologies. However, this is extremely difficult. Existing studies usually simplify edge networks by treating the bandwidth and delay between nodes as fixed values, which fails to fully incorporate network topology and ignores the impact of deployment schemes on link bandwidth, resulting in lower-than-expected communication capacity between instances and even congestion [31, 20, 35]. Furthermore, most studies focus only on simulation validation and lack verification on actual physical platforms, overlooking the feasibility of deployment schemes in real systems and leading to inaccurate estimation of link bandwidth. Therefore, implementing and validating deployment schemes on actual physical platforms is essential for optimal microservice deployment in edge networks with intricate topologies [36, 13, 23, 24].

I-B Main Contributions

To effectively overcome the aforementioned challenges, we have proposed an innovative topology-aware MSA, which incorporates a three-tier network traffic analysis model. Then, we have formulated a weighted sum communication delay optimization problem that aims to improve delay performance by optimizing microservices deployment. To effectively tackle this problem, we have developed a novel topology-aware and individual-adaptive microservices deployment (TAIA-MD) scheme. Extensive simulations and physical platform validations demonstrate the superior performance of our proposed TAIA-MD scheme. The primary contributions of this paper are summarized as follows:

  • For the first time, we have proposed an innovative topology-aware MSA, which features a three-tier network traffic analysis model comprising the service, microservice, and edge node layers. This model meticulously captures the complex inter-microservice dependencies and edge network topologies by mapping microservice deployments onto edge nodes’ link traffic to accurately estimate the communication delay.

  • Building upon this model, we have formulated a weighted sum communication delay optimization problem that considers the load conditions of services, aiming to improve delay performance through optimizing microservices deployment. To address this intractable problem, we have developed a novel microservice deployment scheme named TAIA-MD, which customizes the deployment scheme by sensing network topologies and incorporates an individual-adaptive mechanism in the genetic algorithm (GA) to accelerate the convergence and avoid local optima.

  • Extensive simulations demonstrate that TAIA-MD can reduce communication delay by approximately 30% to 60% compared to existing deployment schemes in bandwidth-constrained and topology-complex edge networks. Moreover, physical platform experiments further show the superior robustness of the TAIA-MD in effectively combating link failures and network fluctuations.

The remainder of this paper is organized as follows. In Section II, we provide an in-depth literature review of relevant studies. The topology-aware MSA is introduced in Section III. Section IV formulates and solves the weighted sum communication delay optimization problem. The results and analysis of simulations and physical verifications are provided in Section V. Finally, Section VI concludes this paper and discusses future directions.

II RELATED WORKS

In this section, we first review earlier literature primarily focused on microservice deployments centered around cloud environments. Then, we comprehensively summarize and examine the latest microservice deployment schemes in edge networks. Finally, we identify limitations related to the aforementioned challenges from these literature reviews.

II-A Deployment Schemes Centered around Cloud Environments

Due to the highly decoupled nature of MSA, implementing a certain service usually requires close collaboration among numerous microservices [37, 38]. However, deploying hundreds or thousands of microservice instances on resource-constrained servers presents a formidable challenge [39, 38, 13, 23, 24]. Research on microservice deployment schemes is always of paramount importance. Early studies on microservice deployment primarily centered around cloud environments [40, 41, 42]. In [40], the authors have considered resource contention and deployment time among microservices, and proposed a parallel deployment scheme to aggregate microservices that compete for as diverse resources as possible, thereby minimizing interference in resource-constrained cloud systems. To alleviate the enormous pressure brought by the scale of microservices on cluster management in cloud computing, the authors in [41] have developed a topology-aware deployment framework that leverages a heuristic graph mapping algorithm to capture the topological structure of microservices and clusters, thereby minimizing network overhead for applications. In order to mitigate communication delays caused by intricately interdependent microservices, the authors in [42] have explored a novel deployment scheme that leverages the fine-grained and comprehensive modeling of microservice dependencies.

II-B Latest Deployment Schemes for Edge Networks

Microservice deployment in edge networks is receiving increasing attention due to the flourishing development of latency-sensitive services, as well as advancements in edge or fog computing techniques. In [20] and [43], the authors have proposed a novel MSA called Nautilus to deploy microservice instances in the cloud-edge continuum effectively. This MSA adjusts instance placement and achieves load balancing by migrating containers from busy nodes, thereby ensuring the required QoS under external shared resource contention. To mitigate the impact of network load and routing on service reliability, in [44], we have proposed a network-aware service reliability model that effectively captures the correlation between network state changes and reliability. In addition, we have designed a service reliability-aware deployment algorithm, which leverages shared path reliability calculation to improve service failure tolerance and bandwidth consumption. In [45], the authors have studied an optimal microservice deployment scheme to balance layer sharing and chain sharing in resource-constrained edge servers. This scheme effectively addresses the challenges of microservice image pull delay and communication overhead minimization.

II-C Limitations of Literature Reviews

Although current research reveals valuable insights into microservice deployment in edge networks, it seldom accounts for the tangible impact of edge environments and resource contention [37, 38, 39, 13, 23, 24, 40, 41, 42, 20, 43, 44, 45]. The wide distribution of edge devices and nodes, the multi-hop forwarding characteristics of inter-node communications, and the complex intrinsic dependencies among microservices make accurate traffic analysis modeling elusive [31, 20, 35]. Concurrently, the coexistence of multiple flows on numerous links sparks resource competition, leading to significant variations in link performance across deployment strategies. Moreover, improper deployment might overwhelm links, drastically degrading data transfer rates and increasing service delays. Nevertheless, prevailing studies typically simplify traffic analysis by disregarding the dynamic nature of network topology, node bandwidth, and delay [35].

III Topology-aware Microservice Architecture for Edge Networks

Figure 4: Topology-aware microservice deployment architecture.

As illustrated in Fig. 4, we propose an innovative topology-aware microservice architecture characterized by a three-tier network traffic analysis model, comprising the service-layer invocation model, the microservice-layer data model, and the edge-node-layer traffic model. This model maps microservice deployments onto edge nodes' link traffic to meticulously capture the complex inter-microservice dependencies and edge network topology. Let $W_k$ and $N_p$ indicate the $k$-th type of service and the $p$-th node, respectively, where $k \in \mathcal{K} \triangleq \{1,\cdots,K\}$ and $p \in \mathcal{P} \triangleq \{1,\cdots,P\}$. The $i$-th type of microservice is denoted as $M_i$, where $i \in \mathcal{I} \triangleq \{1,\cdots,I\}$. Let $M_{i,a}$ denote the $a$-th instance of $M_i$, where $a \in \mathcal{A}_i \triangleq \{1,\cdots,A_i\}$ and $\mathcal{A}_i$ is the set of instances of $M_i$.

For example, consider microservice instances deployed across five computing nodes $N_1 \sim N_5$. Notably, our proposed microservice architecture is not limited to this example; it is designed for broad applicability. As shown in Fig. 4, when $W_1$ is requested, microservices are invoked in the order $M_1 \rightarrow M_2 \rightarrow M_4 \rightarrow M_5$. One possible sequence of instance invocations at the microservice layer is $M_{1,1} \rightarrow M_{2,1} \rightarrow M_{4,1} \rightarrow M_{5,2}$, with these instances deployed on nodes $N_1$, $N_3$, and $N_4$. In this case, minimizing the communication delay of service execution at the edge-node layer rests on optimizing the deployment locations of microservices in the microservice layer.

III-A Service Layer Invocation Model

First, we introduce the service-layer invocation model, illustrated with a specific example for clarity. As shown in Fig. 5, we examine a particular invocation executing the service $W_k, k \in \mathcal{K}$. The labels inside each node indicate the type of microservice, while the labels on each arrow represent the execution order of requests or responses. In edge networks, users access the network through their nearby user-access nodes. In particular, user requests first arrive at the user-access nodes and are then forwarded to the appropriate computing nodes based on the deployment scheme. To describe the transition of user requests from the user-access node to the first microservice, we introduce a special microservice type $M_0$, called the virtual head microservice, which occupies no computational resources.

Let $F_{i,j}$ and $R_{i,j}$, $i,j \in \mathcal{I}, i \neq j$, indicate a request and a response from $M_i$ to $M_j$ during the execution of service $W_k$, respectively. In this case, the microservice invocation of service $W_k$ presented in Fig. 5 can be formulated as the following multiset:

$W_k \Longrightarrow \big\{F_{0,1}, F_{1,2}, R_{2,1}, F_{1,3}, F_{3,4}, F_{4,2}, R_{2,4}, F_{4,5}, F_{5,7}, R_{7,5}, F_{5,6}, R_{6,5}, R_{5,4}, F_{4,2}, R_{2,4}, R_{4,3}, R_{3,1}, R_{1,0}\big\} \Longrightarrow \big\{F_{0,1}, F_{1,2}, F_{1,3}, F_{3,4}, 2F_{4,2}, F_{4,5}, F_{5,6}, F_{5,7}, R_{2,1}, 2R_{2,4}, R_{3,1}, R_{4,3}, R_{5,4}, R_{6,5}, R_{7,5}, R_{1,0}\big\}. \quad (1)$

We define $\mathcal{F}_k$ as the request matrix for $W_k$, where $F_{i,j}^k \in \mathcal{F}_k$ represents the number of requests from $M_i$ to $M_j$ during the execution of service $W_k$. Thus, $\mathcal{F}_k$ can be given by

$\mathcal{F}_k = \begin{bmatrix} 0&1&0&0&0&0&0&0\\ 0&0&1&1&0&0&0&0\\ 0&0&0&0&0&0&0&0\\ 0&0&0&0&1&0&0&0\\ 0&0&2&0&0&1&0&0\\ 0&0&0&0&0&0&1&1\\ 0&0&0&0&0&0&0&0\\ 0&0&0&0&0&0&0&0 \end{bmatrix}. \quad (2)$

Similarly, the response matrix of service $W_k$ can be defined by $\mathcal{R}^k = (\mathcal{F}_k)^{\mathrm{T}}$.
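To make the mapping from invocation multiset to request matrix concrete, the following minimal Python sketch assembles the $\mathcal{F}_k$ of Eq. (2) from the request pairs listed in Eq. (1); the indices are those of this example, and the code is illustrative rather than part of the paper's implementation.

import numpy as np

# Build the request matrix F_k of Eq. (2) from the request pairs in Eq. (1).
# Types M_0..M_7; M_0 is the virtual head microservice; F_{4,2} occurs twice.
requests = [(0, 1), (1, 2), (1, 3), (3, 4), (4, 2), (4, 2),
            (4, 5), (5, 6), (5, 7)]
F_k = np.zeros((8, 8), dtype=int)
for i, j in requests:
    F_k[i, j] += 1
R_k = F_k.T  # responses mirror requests, so R^k = (F_k)^T
print(F_k)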

Figure 5: A particular service invocation based on virtual head microservice.

III-B Microservice Layer Data Model

Each type of microservice is usually equipped with multiple instances, and this paper schedules them using the Round Robin method. Thus, each instance $M_{i,a}$ of microservice $M_i$ is equally likely to be invoked:

$P_{M_{i,a}} = \frac{1}{|\mathcal{A}_i|}, \ \forall i \in \mathcal{I} \backslash \{0\}, a \in \mathcal{A}_i. \quad (3)$

For the case of $i=0$, the probability depends on the service arrival rate at each user-access node, which will be explained in detail later.

Let $P_{M_{i,a}}^k$ denote the probability of selecting instance $M_{i,a}$ when invoking microservice $M_i$ during the execution of service $W_k$, where $i \in \mathcal{I}, k \in \mathcal{K}, a \in \mathcal{A}_i$. Let $F_{M_{i,a},M_{j,b}}^k$ represent the average frequency with which instance $M_{i,a}$ requests instance $M_{j,b}$ during the execution of service $W_k$; then we have

$F_{M_{i,a},M_{j,b}}^k = F_{i,j}^k P_{M_{i,a}}^k P_{M_{j,b}}^k, \ a \in \mathcal{A}_i, b \in \mathcal{A}_j, i \neq j. \quad (4)$

Let $D_{i,j}^{req}$ indicate the average requested data size when $M_i$ invokes $M_j$, and let $\widetilde{F}_{M_{i,a},M_{j,b}}^k$ denote the average data size requested by instance $M_{i,a}$ from instance $M_{j,b}$ during the execution of service $W_k$. Then we obtain

$\widetilde{F}_{M_{i,a},M_{j,b}}^k = D_{i,j}^{req} F_{M_{i,a},M_{j,b}}^k. \quad (5)$

Similarly, let $R_{M_{i,a},M_{j,b}}^k$ indicate the average frequency with which instance $M_{i,a}$ responds to instance $M_{j,b}$ when $W_k$ is invoked, where $i,j \in \mathcal{I}, a \in \mathcal{A}_i, b \in \mathcal{A}_j$. The average response data size per invocation from $M_i$ to $M_j$ and from $M_{i,a}$ to $M_{j,b}$ can be expressed as $D_{i,j}^{res}$ and $\widetilde{R}_{M_{i,a},M_{j,b}}^k$, respectively. Since the request and response processes correspond to each other, we obtain

$R_{M_{i,a},M_{j,b}}^k = F_{M_{j,b},M_{i,a}}^k, \quad (6a)$
$\widetilde{R}_{M_{i,a},M_{j,b}}^k = D_{i,j}^{res} R_{M_{i,a},M_{j,b}}^k. \quad (6b)$
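As a concrete illustration of Eqs. (3)-(6) under Round Robin scheduling, the short sketch below computes a per-instance request frequency and the corresponding average data volume; the instance counts and the 50 kB request size are illustrative assumptions, not values from the paper.

def instance_request_freq(F_ij, n_inst_i, n_inst_j):
    # Eq. (4) with Eq. (3): each instance is picked with probability 1/|A|.
    return F_ij * (1.0 / n_inst_i) * (1.0 / n_inst_j)

# Example: F_{4,2} = 2 requests per execution, two instances on each side.
f = instance_request_freq(F_ij=2, n_inst_i=2, n_inst_j=2)  # 0.5 per pair
data_req_kb = 50 * f  # Eq. (5): average requested data, assuming 50 kB each
# Eq. (6a): the response frequency of the reverse instance pair equals f.
print(f, data_req_kb)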

III-C Edge-Node Layer Traffic Model

The edge-node layer comprises three types of nodes: user-access nodes for processing user requests, routing nodes for data forwarding, and computing nodes for deploying and executing microservices. All nodes are represented by the set $\mathcal{N}$, with user-access nodes as $\mathcal{N}_{acc}$, routing nodes as $\mathcal{N}_{rou}$, and computing nodes as $\mathcal{N}_{cmp}$; hence $\mathcal{N} \triangleq \mathcal{N}_{acc} \cup \mathcal{N}_{rou} \cup \mathcal{N}_{cmp}$. We consider service requests arriving at each user-access node to follow a Poisson distribution. If the Poisson parameter for $W_k$ at user-access node $N_p \in \mathcal{N}_{acc}$ is $\lambda_p^k$, then the Poisson parameter for requests of service $W_k$ in the edge-node layer can be expressed as

$\lambda^k = \sum_{N_p \in \mathcal{N}_{acc}} \lambda_p^k. \quad (7)$

The computing node hosting instance $M_{i,a}$ is represented as $L_{i,a} \in \mathcal{N}_{cmp}$. For instances of the virtual head microservice $M_0$, the probability of invoking instance $M_{0,c}$ during the execution of service $W_k$ is given by

$P_{M_{0,c}}^k = \frac{\lambda_{L_{0,c}}^k}{\lambda^k}, \ c \in \mathcal{A}_0. \quad (8)$

In this case, once microservice $M_i$ is invoked during the execution of $W_k$, the invocation probability of instance $M_{i,a}$ can be denoted as follows:

$P_{M_{i,a}}^k = \begin{cases} \frac{\lambda_{L_{0,a}}^k}{\lambda^k}, & \text{if } i = 0, a \in \mathcal{A}_0, \\ \frac{1}{|\mathcal{A}_i|}, & \text{if } i \neq 0, a \in \mathcal{A}_i. \end{cases} \quad (9)$

Given any two nodes $N_p, N_q \in \mathcal{N}$, we define the traffic that flows directly from $N_p$ to $N_q$ without any forwarding as the direct traffic $S_{p,q}$. If nodes $N_x$ and $N_y$ are adjacent, then the traffic of all types on this link, including forwarded traffic, is defined as the total traffic, represented by $\widetilde{S}_{x,y}$. Let $\mathcal{Q}_p$ denote the set of instances deployed on node $N_p, p \in \mathcal{N}$; then the direct traffic between any two nodes $N_p$ and $N_q$ during a single execution of $W_k$ can be represented as

$S_{p,q}^k = \sum_{M_{i,a} \in \mathcal{Q}_p} \sum_{M_{j,b} \in \mathcal{Q}_q} \left( \widetilde{F}_{M_{i,a},M_{j,b}}^k + \widetilde{R}_{M_{i,a},M_{j,b}}^k \right), \ \forall k \in \mathcal{K}. \quad (10)$

Then, the direct traffic from $N_p$ to $N_q$ per unit time can be represented by

$S_{p,q} = \sum_{k \in \mathcal{K}} \lambda^k S_{p,q}^k. \quad (11)$

We use $U_{p,q}$ to represent the routing path from $N_p$ to $N_q$, where $p,q \in \mathcal{N}$. Given any two adjacent nodes $N_x$ and $N_y$, if the directly connected link between them is one hop of the routing path $U_{p,q}$, we denote this as $\langle N_x, N_y \rangle \in U_{p,q}$. The total traffic from $N_x$ to its adjacent node $N_y$ can be expressed as

$\widetilde{S}_{x,y} = \sum_{p=1}^{|\mathcal{N}|} \sum_{q=1}^{|\mathcal{N}|} \Big\{ S_{p,q}, \ \text{if } \langle N_x, N_y \rangle \in U_{p,q} \ \text{else } 0 \Big\}, \quad (12)$

where (12) reveals that, when traversing the routing path $U_{p,q}$ whose direct traffic from $N_p$ to $N_q$ is $S_{p,q}$, the total traffic between $N_x$ and $N_y$ is incremented by $S_{p,q}$ whenever $\langle N_x, N_y \rangle$ is one hop of $U_{p,q}$.
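The following minimal sketch illustrates Eqs. (10)-(12): per-execution direct traffic is scaled by service arrival rates (Eq. (11)) and then accumulated onto every hop of the corresponding routing path (Eq. (12)). Node names, routing paths, and traffic figures are illustrative assumptions.

from collections import defaultdict

# Direct per-execution traffic S^k_{p,q} (kB) and arrival rates lambda^k.
S_k = {('N1', 'N3'): {'W1': 120.0}, ('N3', 'N4'): {'W1': 80.0}}
lam = {'W1': 40.0}

# Eq. (11): direct traffic per unit time.
S = {pq: sum(lam[k] * v for k, v in per_k.items()) for pq, per_k in S_k.items()}

# Hypothetical routing paths U_{p,q}, each a list of hops <N_x, N_y>.
U = {('N1', 'N3'): [('N1', 'R1'), ('R1', 'N3')],
     ('N3', 'N4'): [('N3', 'R1'), ('R1', 'N4')]}

# Eq. (12): each link accumulates the direct traffic of every path using it.
S_total = defaultdict(float)
for pq, hops in U.items():
    for hop in hops:
        S_total[hop] += S.get(pq, 0.0)
print(dict(S_total))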

III-D Communications Delay Analysis Model

To address the issue of multiple service traffic flows sharing link bandwidth, this paper leverages queuing theory to analyze the available link bandwidth. We exploit the M/M/1 queuing model to characterize the packet transmission process between any two adjacent nodes $N_x$ and $N_y$. Let the packet size be $s$ kB, the bandwidth from $N_x$ to $N_y$ be $Z_{x,y}$ Mbps, the packet arrival rate from $N_x$ to $N_y$ be $\lambda_{x,y}$ packets per unit time, and the service rate of the link be $\mu_{x,y}$ packets per unit time. The average sending time for each packet between $N_x$ and $N_y$ can be expressed as

$T_{x,y}^{pkg} = \frac{1}{\mu_{x,y} - \lambda_{x,y}}. \quad (13)$

The average available bandwidth of the directly connected link from $N_x$ to its adjacent node $N_y$ can be represented by

$Z'_{x,y} = \frac{s}{T_{x,y}^{pkg}} = s \times (\mu_{x,y} - \lambda_{x,y}) = s \times \left( \frac{Z_{x,y}}{s} - \frac{\widetilde{S}_{x,y}}{s} \right) = Z_{x,y} - \widetilde{S}_{x,y}. \quad (14)$
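A minimal sketch of Eqs. (13)-(14): with $\mu_{x,y} = Z_{x,y}/s$ and $\lambda_{x,y} = \widetilde{S}_{x,y}/s$, the packet size cancels and the average available bandwidth reduces to $Z_{x,y} - \widetilde{S}_{x,y}$. The bandwidth and traffic values below are illustrative.

def available_bandwidth(Z_xy_mbps, S_tilde_xy_mbps):
    # Eq. (14): Z' = Z - S~; valid only while the M/M/1 queue is stable.
    assert S_tilde_xy_mbps < Z_xy_mbps, "link saturated: arrivals >= service"
    return Z_xy_mbps - S_tilde_xy_mbps

print(available_bandwidth(100.0, 35.0))  # 65.0 Mbps remains on this link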

Given any two adjacent nodes $N_x$ and $N_y$, the propagation delay between them is represented as $V_{x,y}$. Assuming full-duplex communication among nodes, the total propagation delay between any two nodes $N_p$ and $N_q$ can be expressed as follows:

$\widetilde{V}_{p,q} = \sum_{\langle N_x, N_y \rangle \in U_{p,q}} V_{x,y}. \quad (15)$

Let $Z_{p,q}^{min}$ denote the minimum available bandwidth among all links on the path from $N_p$ to $N_q$. Ignoring the forwarding time at intermediate nodes, the average communication delay $T_{M_{i,a},M_{j,b}}$ for instance $M_{i,a}$ to send requests to instance $M_{j,b}$ can be calculated as

$T_{M_{i,a},M_{j,b}} = \begin{cases} \widetilde{V}_{L_{i,a},L_{j,b}} + \frac{D_{i,j}^{req}}{Z_{L_{i,a},L_{j,b}}^{min}}, & \text{if } L_{i,a} \neq L_{j,b}, \\ 0, & \text{if } L_{i,a} = L_{j,b}, \end{cases} \quad (16)$

where $L_{i,a}, a \in \mathcal{A}_i$ and $L_{j,b}, b \in \mathcal{A}_j$ represent the computing nodes where $M_{i,a}$ and $M_{j,b}$ are deployed, respectively. Similarly, the average communication delay for $M_{i,a}$ to send responses to $M_{j,b}$ is given as follows:

$T'_{M_{i,a},M_{j,b}} = \begin{cases} \widetilde{V}_{L_{i,a},L_{j,b}} + \frac{D_{i,j}^{res}}{Z_{L_{i,a},L_{j,b}}^{min}}, & \text{if } L_{i,a} \neq L_{j,b}, \\ 0, & \text{if } L_{i,a} = L_{j,b}. \end{cases} \quad (17)$

Therefore, the average communication delay for each execution of service $W_k$ can be expressed as

$T_k = \sum_{i=0}^{|\mathcal{I}|} \sum_{a=1}^{|\mathcal{A}_i|} \sum_{j=0}^{|\mathcal{I}|} \sum_{b=1}^{|\mathcal{A}_j|} \left( F_{M_{i,a},M_{j,b}}^k T_{M_{i,a},M_{j,b}} + R_{M_{i,a},M_{j,b}}^k T'_{M_{i,a},M_{j,b}} \right), \ \forall i \neq j, k \in \mathcal{K}. \quad (18)$

Due to the various delay requirements of different service types, we employ the weighted sum of the average communication delays of each service type to indicate the system's overall average communication delay, which can be expressed as follows:

$T = \sum_{k \in \mathcal{K}} \theta_k T_k, \quad (19)$

where $\theta_k \in \Theta \triangleq \{\theta_1, \cdots, \theta_K\}$ denotes the weight of service $W_k, k \in \mathcal{K}$.
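To tie Eqs. (16)-(19) together, the sketch below computes frequency-weighted pairwise delays and the weighted system delay; the propagation delays, data sizes, bottleneck bandwidths, and weights are illustrative placeholders.

def pair_delay_ms(v_pq_ms, data_kb, z_min_mbps, co_located):
    # Eq. (16)/(17): propagation delay plus transmission over the bottleneck
    # bandwidth; zero when both instances share the same computing node.
    if co_located:
        return 0.0
    return v_pq_ms + (data_kb * 8.0 / 1000.0) / z_min_mbps * 1000.0

# Eq. (18): frequency-weighted pair delays; Eq. (19): weighted service sum.
T_k = {'W1': 2 * pair_delay_ms(0.5, 50, 65.0, False),
       'W2': 1 * pair_delay_ms(0.3, 20, 80.0, False)}
theta = {'W1': 0.6, 'W2': 0.4}
T = sum(theta[k] * T_k[k] for k in T_k)
print(T)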

IV Problem Formulation and Solutions

In this section, we aim to minimize the weighted sum of average communication delays by optimizing the microservice deployment scheme $\mathcal{L} \triangleq \{L_{i,a}\}_{i \in \mathcal{I}, a \in \mathcal{A}_i}$, where $L_{i,a}$ indicates the computing node on which microservice instance $M_{i,a}$ is deployed. Based on the models in Sections III-A to III-D, we can formulate the minimization problem as follows:

$\mathcal{P}1: \ \min_{\{L_{i,a}\}_{i \in \mathcal{I}, a \in \mathcal{A}_i}} \ T = \sum_{k \in \mathcal{K}} \theta_k T_k, \quad (20a)$
$\text{s.t.} \ \sum_{M_{i,a} \in \mathcal{M}_p} C_i \leqslant C_p, \ \forall N_p \in \mathcal{N}, \quad (20b)$
$\sum_{M_{i,a} \in \mathcal{M}_p} B_i \leqslant B_p, \ \forall N_p \in \mathcal{N}, \quad (20c)$
$\widetilde{S}_{p,q} \leqslant Z_{p,q}, \ \forall q \in \mathcal{N}_p, \ \forall p \in \mathcal{N}, \quad (20d)$

where $\mathcal{M}_p$ and $\mathcal{N}_p$ represent the set of microservice instances deployed on node $N_p$ and the set of nodes adjacent to $N_p$, respectively, $p \in \mathcal{N}$. $C_i$ and $B_i$ indicate the computational and memory resources required to deploy an instance of microservice $M_i$ on node $N_p$, respectively. Constraints (20b) and (20c) guarantee that the total computational and memory resources required by the microservice instances on each node do not exceed the node's CPU count and memory capacity, respectively. Constraint (20d) ensures that the traffic on each link does not exceed its bandwidth limit.

IV-A Proposed Solution

Due to the numerous microservices and nodes, it is extremely challenging to obtain the optimal solution of the original problem $\mathcal{P}1$ efficiently using traditional convex optimization methods. To this end, we first introduce 0-1 variables to transform $\mathcal{P}1$ into a nonlinear 0-1 programming problem.

IV-A1 Problem Transformation

Generally, to guarantee the availability of services in edge networks, instances of each microservice should be deployed across multiple nodes. In this case, we introduce binary variables $G_{p,i} \in \mathcal{G}$ to characterize whether an instance of microservice $M_i, i \in \mathcal{I}$, is deployed on node $N_p$. Moreover, we use $M_{p,i}$ to denote the instance of microservice $M_i$ located on node $N_p$, thereby uniquely identifying whether an instance of $M_i$ resides on $N_p \in \mathcal{N}$. Then, (10) can be reformulated as follows:

$S_{p,q}^k = \sum_{M_i \in \mathcal{M}} \sum_{M_j \in \mathcal{M}} G_{p,i} G_{q,j} \left( \widetilde{F}_{M_{p,i},M_{q,j}}^k + \widetilde{R}_{M_{p,i},M_{q,j}}^k \right). \quad (21)$

For the topology relationships, we introduce a binary variable $I_{x,y}^{p,q} \in I_{p,q}$ to indicate whether the path from node $N_p$ to node $N_q$ includes the hop between adjacent nodes $N_x$ and $N_y$. Then, (12) can be reformulated as follows:

$\widetilde{S}_{x,y} = \sum_{p=1}^{|\mathcal{N}|} \sum_{q=1}^{|\mathcal{N}|} I_{x,y}^{p,q} S_{p,q}. \quad (22)$

Similarly, (15) can also be rewritten as follows:

$\widetilde{V}_{p,q} = \sum_{I_{x,y}^{p,q} \in I_{p,q}} I_{x,y}^{p,q} V_{x,y}. \quad (23)$

Furthermore, (16) and (17) can be reformulated as follows:

$T_{M_{p,i},M_{q,j}} = \begin{cases} \widetilde{V}_{p,q} + \frac{D_{i,j}^{req}}{Z_{p,q}^{min}}, & \text{if } p \neq q \wedge i \neq j, \\ 0, & \text{else}, \end{cases} \quad (24)$
$T'_{M_{p,i},M_{q,j}} = \begin{cases} \widetilde{V}_{p,q} + \frac{D_{i,j}^{res}}{Z_{p,q}^{min}}, & \text{if } p \neq q \wedge i \neq j, \\ 0, & \text{else}. \end{cases} \quad (25)$

Finally, (18) can be transformed as follows:

$T_k = \sum_{i=0}^{|\mathcal{I}|} \sum_{p=1}^{|\mathcal{N}|} \sum_{j=0}^{|\mathcal{I}|} \sum_{q=1}^{|\mathcal{N}|} G_{p,i} G_{q,j} \left( F_{M_{p,i},M_{q,j}}^k T_{M_{p,i},M_{q,j}} + R_{M_{p,i},M_{q,j}}^k T'_{M_{p,i},M_{q,j}} \right), \ i \neq j. \quad (26)$

IV-A2 Problem Reformulation

As a result, the original problem $\mathcal{P}1$ can be equivalently transformed into the following form:

$\mathcal{P}2: \ \min_{\mathcal{G}: \{G_{p,i}\}_{i \in \mathcal{I}, p \in \mathcal{N}}} \ T = \sum_{W_k \in \mathcal{W}} \theta_k T_k, \quad (27a)$
$\text{s.t.} \ \sum_{p=1}^{|\mathcal{N}|} G_{p,i} \leq |\mathcal{A}_i|, \ \forall i \in \mathcal{I}, \quad (27b)$
$\sum_{i=1}^{|\mathcal{I}|} G_{p,i} \leq 1, \ \forall N_p \in \mathcal{N}, \quad (27c)$
$\sum_{i=1}^{|\mathcal{I}|} G_{p,i} C_i \leqslant C_p, \ \forall p \in \mathcal{N}, \quad (27d)$
$\sum_{i=1}^{|\mathcal{I}|} G_{p,i} B_i \leqslant B_p, \ \forall p \in \mathcal{N}, \quad (27e)$
$\widetilde{S}_{p,q} \leqslant Z_{p,q}, \ \forall q \in \mathcal{N}_p, \ \forall p \in \mathcal{N}, \quad (27f)$

where (27b) and (27c) specify the constraints on the number of microservice instances. In particular, (27b) ensures that the total number of instances of microservice $M_i$ across all nodes does not exceed $|\mathcal{A}_i|$, while (27c) prevents multiple instances of $M_i$ from being deployed on the same node. Constraints (27d) and (27e) ensure that the computational and memory resources required by the microservices deployed on a node cannot exceed the node's total available computational and memory resources. Constraint (27f) enforces the bandwidth constraint between node $N_p$ and its adjacent nodes $N_q$.
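The sketch below gives a minimal feasibility check of a candidate binary deployment matrix $\mathcal{G}$ against constraints (27b)-(27e); the dimensions and resource figures are illustrative and unrelated to the simulation settings in Section V.

import numpy as np

P, I = 5, 4                      # nodes x microservice types (illustrative)
G = np.zeros((P, I), dtype=int)  # G[p, i] = 1 iff an instance of M_i is on N_p
G[0, 0] = G[2, 1] = G[3, 2] = G[4, 3] = 1

A = np.array([2, 2, 1, 1])                  # |A_i|: instance budget per type
C_req = np.array([0.2, 0.1, 0.3, 0.2]); C_cap = np.full(P, 8.0)    # CPU units
B_req = np.array([200, 100, 300, 150]);  B_cap = np.full(P, 8000)  # memory, MB

feasible = ((G.sum(axis=0) <= A).all()       # (27b): instance count budget
            and (G.sum(axis=1) <= 1).all()   # (27c): per-node instance limit
            and (G @ C_req <= C_cap).all()   # (27d): CPU capacity per node
            and (G @ B_req <= B_cap).all())  # (27e): memory capacity per node
print(feasible)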

IV-B Topology-aware and Individual-adaptive Microservice Deployment Scheme

Given the large number of microservices and nodes, optimally tackling the nonlinear 0-1 programming problem $\mathcal{P}2$ within a constrained timeframe is challenging. While the genetic algorithm (GA) offers a robust heuristic approach for combinatorial problems, its inherent limitations of slow convergence and suboptimal performance in specific scenarios necessitate refinement. To address these shortcomings, we propose a novel topology-aware and individual-adaptive microservice deployment (TAIA-MD) scheme. This scheme leverages the perceived network topology to design the deployment scheme. Moreover, it incorporates an individual-adaptive mechanism that enhances individual adaptability by strategically initializing a select group of "super individuals" within the GA population, thereby accelerating convergence and avoiding local optima. As described in Algorithm 1, the first part of our proposed TAIA-MD scheme encompasses four aspects, as follows:

  • Chromosome Encoding: We employ binary encoding, where each chromosome consists of $|\mathcal{I}|$ genes, representing the deployment of the $|\mathcal{I}|$ microservice types. The set of computing nodes available for deployment is $\mathcal{N}_{cmp}$. Each gene is a binary string of length $|\mathcal{N}_{cmp}|$, with a value of '1' or '0' indicating whether or not the microservice is deployed on the corresponding computing node. In addition, the number of '1's in each gene equals the number of instances of the corresponding microservice type.

  • Fitness Function: The fitness function measures the quality of an individual in solving $\mathcal{P}2$. To minimize the average communication delay $T$, the fitness function is defined as a sufficiently large number minus $T$. The calculation of $T$ is detailed in Algorithm 2.

  • Genetic Operators: Genetic operators generate new individuals through selection, crossover, and mutation (see the sketch after this list). For the selection operation, we implement the tournament method: two groups of individuals are randomly sampled from the population, each containing $Y_m$ individuals, and the fittest individual from each group is selected as a parent. Following selection, crossover operations are performed on the selected parents: each gene pair has a probability $Y_p$ of being exchanged, provided the chromosomes of both individuals remain feasible after the exchange. Subsequently, the offspring undergo mutation with probability $Y_q$, where bits are randomly flipped while maintaining chromosome feasibility. Selection, crossover, and mutation repeat until the offspring population reaches the desired population size $Y_n$.

  • Termination Conditions: The algorithm has a maximum iteration count of $Y_k$ to ensure timely problem-solving. Termination occurs upon reaching this threshold or when the best individual fitness shows no significant change for $Y_i$ consecutive iterations, conserving computational resources.
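The following minimal Python sketch (illustrative parameters; the fitness function is a dummy stand-in for Algorithm 2) shows the binary gene encoding, tournament selection, and feasibility-preserving uniform crossover described above.

import random

N_CMP, TYPES, INSTANCES = 10, 4, 2  # computing nodes, microservice types, copies

def random_gene():
    # one '1' per deployed instance of this microservice type
    gene = [0] * N_CMP
    for p in random.sample(range(N_CMP), INSTANCES):
        gene[p] = 1
    return gene

def fitness(chromosome):
    # dummy stand-in for "large constant minus T" (T comes from Algorithm 2)
    return 1e6 - sum(p for g in chromosome for p, bit in enumerate(g) if bit)

def tournament(population, group_size):
    # the fittest individual of a randomly sampled group becomes a parent
    return max(random.sample(population, group_size), key=fitness)

def uniform_crossover(pa, pb, p_swap):
    # swapping whole genes preserves each type's instance count (feasibility)
    ca, cb = [g[:] for g in pa], [g[:] for g in pb]
    for t in range(TYPES):
        if random.random() < p_swap:
            ca[t], cb[t] = cb[t], ca[t]
    return ca, cb

population = [[random_gene() for _ in range(TYPES)] for _ in range(20)]
parent_a, parent_b = tournament(population, 4), tournament(population, 4)
child_a, child_b = uniform_crossover(parent_a, parent_b, p_swap=0.5)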

Input: Service set $\mathcal{W}$; microservice set $\mathcal{M}$; node set $\mathcal{N}$; population size $Y_n$; number of individuals per group $Y_m$; gene crossover probability $Y_p$; gene mutation probability $Y_q$; maximum number of iterations $Y_k$; threshold $Y_i$ on consecutive non-improving iterations
Output: Optimal deployment scheme 𝒢best\mathcal{G}^{best}
1 Randomly initialize each individual $I$ in the population $\mathbb{P}$;
2 for $i \leftarrow 1$ to $Y_k$ do
3   foreach individual $I$ in $\mathbb{P}$ do
4     For the deployment scheme $\mathcal{G}$ of individual $I$, compute its delay $T$;
5     Update the fitness of individual $I$;
6   end foreach
7   Update the optimal scheme $\mathcal{G}^{best}$;
8   if the number of rounds without updating $\mathcal{G}^{best}$ exceeds $Y_i$ then
9     End the loop;
10  end if
11  Create a collection of child individuals $\mathbb{I}'$;
12  while $|\mathbb{I}'| < Y_n$ do
13    Randomly select two groups of individuals, each containing $Y_m$ individuals;
14    Select the fittest individual from each group, $I_A$ and $I_B$, as parents;
15    $I_A$ and $I_B$ undergo uniform crossover with probability $Y_p$, producing offspring $I'_A$ and $I'_B$;
16    Each gene of $I'_A$ and $I'_B$ undergoes mutation with probability $Y_q$;
17    Add $I'_A$ and $I'_B$ to $\mathbb{I}'$;
18  end while
19  Replace the population $\mathbb{P}$ with $\mathbb{I}'$;
20 end for
Algorithm 1 Topology-aware genetic algorithm
Input: Microservice deployment scheme $\mathcal{G}$; service set $\mathcal{W}$; microservice set $\mathcal{M}$; node set $\mathcal{N}$; service weight set $\Theta$
Output: System average communication delay TT
1 Compute the propagation delay $\widetilde{V}_{p,q}$ between any two nodes;
2 foreach service $W_k$ in $\mathcal{W}$ do
3   Compute $F_{M_{i,a},M_{j,b}}^{k}$ and $R_{M_{i,a},M_{j,b}}^{k}$ between any two instances during one execution of $W_k$;
4 end foreach
5 Compute the direct traffic $S_{p,q}$ between any two nodes;
6 Compute the total traffic $\widetilde{S}_{x,y}$ between any adjacent nodes;
7 Compute the actual available bandwidth $Z'_{x,y}$ between any adjacent nodes;
8 Compute the minimum available bandwidth $Z_{p,q}^{min}$ between any two nodes;
9 Initialize the system average communication delay $T$;
10 foreach service $W_k$ in $\mathcal{W}$ do
11   Initialize the average communication delay $T_k$ of service $W_k$;
12   foreach invocation pair $M_i, M_j$ during the execution of service $W_k$ do
13     Denote the sets of nodes hosting $M_i$ and $M_j$ as $\mathcal{N}^i$ and $\mathcal{N}^j$, respectively;
14     foreach $N_p \in \mathcal{N}^i$ do
15       foreach $N_q \in \mathcal{N}^j$ do
16         Compute $T_{M_{i,a},M_{j,b}}$ and $T'_{M_{i,a},M_{j,b}}$ between the corresponding instances on the two nodes;
17         $T_k \mathrel{+}= F_{M_{i,a},M_{j,b}}^{k} T_{M_{i,a},M_{j,b}} + R_{M_{i,a},M_{j,b}}^{k} T'_{M_{i,a},M_{j,b}}$;
18       end foreach
19     end foreach
20   end foreach
21   $T \mathrel{+}= \theta_k T_k$;
22 end foreach
Algorithm 2 Microservice Deployment Scheme Evaluation Algorithm

Moreover, we implement an individual-adaptive mechanism. This mechanism generates a select few "super individuals" during population initialization and endows them with adaptive capabilities to accelerate algorithm convergence. Algorithm 3 presents a low-complexity greedy-based deployment algorithm that fine-tunes the deployment strategies of these super individuals. By altering the traversal order of services in line 3 of Algorithm 3, we can generate diverse deployment schemes, ensuring the genetic diversity of super individuals. To prevent the algorithm from getting trapped in local optima, we strictly limit the number of super individuals. Coupled with the tournament selection scheme, non-super individuals still have a significant chance of being selected as parents.

IV-C Computational Complexity Analysis

We first denote the maximum number of instances per microservice type as $X$. Then the complexity of randomly generating a deployment scheme in the initialization phase is $\mathcal{O}(X|\mathcal{M}|)$. In Algorithm 3, the complexity of deploying each microservice instance based on the invocation relationships is $\mathcal{O}(X|\mathcal{M}||\mathcal{N}|)$. As illustrated in Algorithm 1, each iteration involves generating a new population and evaluating its fitness. The complexities of the selection, crossover, and mutation operations are $\mathcal{O}(Y_m)$, $\mathcal{O}(|\mathcal{M}||\mathcal{N}|)$, and $\mathcal{O}(1)$, respectively, resulting in a complexity of $\mathcal{O}(Y_m + |\mathcal{M}||\mathcal{N}|)$ for producing a new individual. The fitness evaluation in Algorithm 2 considers node-to-node forwarding paths and microservice instance invocation relationships, leading to a complexity of $\mathcal{O}(X^2|\mathcal{M}|^2)$. Algorithm 1 may require up to $Y_k$ iterations, each producing $Y_n$ individuals and evaluating their fitness. Compared to the iterative process, the initialization phase's complexity is negligible. As a result, the proposed TAIA-MD scheme has an overall computational complexity of $\mathcal{O}\big(Y_k Y_n (Y_m + X^2|\mathcal{M}|^2)\big)$, significantly improving solving efficiency compared to the exponential complexity $\mathcal{O}\big((X|\mathcal{M}|)^{|\mathcal{N}|}\big)$ of the enumeration algorithm.

Input: Service set $\mathcal{W}$; microservice set $\mathcal{M}$; node set $\mathcal{N}$.
Output: Microservice deployment scheme $\mathcal{G}$.
1 Initialize the microservice deployment scheme $\mathcal{G}$;
2 Compute the communication delay between any two nodes $N_p$ and $N_q$ in $\mathcal{N}$;
3 foreach $W_k$ in $\mathcal{W}$ do
4   if the first microservice $M_i$ of $W_k$ is not deployed then
5     Compute the average delay from each computing node to all user-access nodes;
6     Deploy instances of $M_i$ to the $|\mathcal{A}_i|$ nodes with the lowest average delay that meet the resource requirements of $M_i$;
7     Update the remaining resources of the corresponding computing nodes;
8   end if
9   foreach $M_i$ required by $W_k$ do
10    if $M_i$ is not deployed then
11      Denote the set of nodes hosting the frontend $M_j$ of $M_i$ as $\mathcal{N}'$;
12      Compute the average delay from each computing node to the nodes in $\mathcal{N}'$;
13      Deploy instances of $M_i$ to the $|\mathcal{A}_i|$ nodes with the lowest average delay that meet the resource requirements of $M_i$;
14      Update the remaining resources of the corresponding computing nodes;
15    end if
16  end foreach
17 end foreach
Algorithm 3 Greedy-based Microservice Deployment Algorithm
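For concreteness, a minimal sketch of the greedy placement idea in Algorithm 3 follows: each undeployed microservice is placed on the $|\mathcal{A}_i|$ resource-feasible nodes with the lowest average delay to the nodes hosting its frontend (the user-access nodes for the first microservice). All names, delays, and resource figures are illustrative assumptions.

def greedy_place(ms_order, n_inst, delay, nodes, cpu_free, cpu_need, frontends):
    # ms_order: microservices of one service in invocation order;
    # frontends[m]: reference nodes already hosting the caller of m.
    placement = {}
    for m in ms_order:
        if m in placement:
            continue
        refs = frontends[m]
        candidates = [p for p in nodes if cpu_free[p] >= cpu_need[m]]
        # rank feasible nodes by average delay to the reference nodes
        candidates.sort(key=lambda p: sum(delay[p][q] for q in refs) / len(refs))
        chosen = candidates[:n_inst[m]]
        for p in chosen:
            cpu_free[p] -= cpu_need[m]  # update remaining resources
        placement[m] = chosen
    return placement

# Illustrative usage: place two instances of 'M1' near access node 'A1'.
delay = {'N1': {'A1': 1.0}, 'N2': {'A1': 2.0}, 'N3': {'A1': 0.5}}
print(greedy_place(['M1'], {'M1': 2}, delay, ['N1', 'N2', 'N3'],
                   {'N1': 1.0, 'N2': 1.0, 'N3': 1.0}, {'M1': 0.5},
                   {'M1': ['A1']}))  # -> {'M1': ['N3', 'N1']}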

V Performance Evaluation

To comprehensively validate the superior performance of the proposed TAIA-MD scheme, we have conducted extensive simulations focusing on bandwidth resources, service arrival rates, computing resources, and network topology. In addition, to further demonstrate the effectiveness and practicality of the TAIA-MD scheme, we have carried out an actual deployment on our previously developed microservice physical verification platform, DMSA [46]. These evaluations collectively confirm the robustness and adaptability of the TAIA-MD scheme in both simulated and real-world environments. Notably, due to access restrictions on deployment permissions on commercial platforms such as Amazon and Netflix, our current performance evaluation, like most studies, is independent of actual commercial parameters.

V-A Simulation Parameter Settings

In our simulations, we consider 50 types of microservices, each with input and output data sizes ranging from 10 KB to 100 KB, requiring 0.1 to 0.3 CPU units and 100 MB to 300 MB of memory per instance, and typically deployed in 3 to 5 instances. There are 10 service types with 5~8 microservice invocations each and an arrival rate $\lambda^k$ of 30~50. The edge network topology comprises 50 nodes, including 35 computing nodes, 10 routing nodes, and 5 user-access nodes. Each link has a propagation delay of 10 μs and a bandwidth of 100 Mbps. Microservice instances are deployed on computing nodes, user requests arrive via user-access nodes, and routing nodes handle data forwarding. Each computing node has 8 CPU cores and 8000 MB of memory. Notably, the simulation parameters are randomly generated within reasonable ranges with reference to the works [47, 31, 22] as well as the characteristics of edge environments.

Furthermore, different network topologies can significantly impact the performance of edge networks; for example, links shared by multiple routing paths may suffer bandwidth shortages. To quantify this, we introduce the concept of link forwarding load (LFL), which indicates how often a link serves as part of a forwarding path in the network topology.

Definition 1.

Link forwarding load (LFL): For any given node $N_x$ and its adjacent node $N_y$, the link forwarding load between them is $O_{x,y} = \sum_{p=1}^{|\mathcal{N}|} \sum_{q=1}^{|\mathcal{N}|} I_{x,y}^{p,q}$. Let $\mathbb{E}$ denote the set of edges in the network topology; then the average link forwarding load of the network topology can be represented as

$O_{avg} = \frac{\sum_{x=1}^{|\mathcal{N}|} \sum_{y=1}^{|\mathcal{N}|} O_{x,y}}{|\mathbb{E}|}. \quad (28)$
Figure 6: Example of LFLs for two different topologies.

Fig. 6 depicts the LFLs across two different topologies, where edge weights signify LFLs calculated per Definition 1. In Topology 1, all adjacent node pairs have LFLs of 12, resulting in an average LFL of 12. In Topology 2, links such as $E_{2,4}$ and $E_{4,7}$ bear heavier routing duties, leading to higher LFLs. Although both topologies have identical node and edge counts, Topology 2 exhibits a higher average LFL, making it more prone to bandwidth bottlenecks. In our simulations' default topology, the connections between nodes in the edge network are configured based on the defined LFL concept; by adjusting the LFL, we can simulate various network topologies. The default topology used in our simulations is shown in Fig. 7.
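A minimal sketch of Definition 1 and Eq. (28): count how often each link appears as a hop across the all-pairs routing paths, then average over the edge set. The paths and edges below are illustrative.

from collections import Counter

# Hypothetical all-pairs routing paths, each a list of hops <N_x, N_y>.
paths = {('N1', 'N3'): [('N1', 'N2'), ('N2', 'N3')],
         ('N1', 'N4'): [('N1', 'N2'), ('N2', 'N4')],
         ('N3', 'N4'): [('N3', 'N2'), ('N2', 'N4')]}

O = Counter(hop for hops in paths.values() for hop in hops)  # O_{x,y}
edges = {('N1', 'N2'), ('N2', 'N3'), ('N2', 'N4'), ('N3', 'N2')}
O_avg = sum(O.values()) / len(edges)  # Eq. (28)
print(dict(O), O_avg)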

Figure 7: Default topology used in the simulations.

V-B The Setup of Physical Verification Platform DMSA

DMSA is a decentralized MSA platform; its biggest difference from traditional centralized MSAs is that it sinks the scheduling function from the control plane to the edge nodes [46]. In particular, DMSA redesigns three core modules for microservice discovery, monitoring, and scheduling, which respectively achieve precise awareness of instance deployments, low monitoring overhead and measurement error, and accurate dynamic scheduling. However, despite its comprehensive scheduling mechanism, DMSA has not yet integrated microservice deployment optimization [46]. A photograph of the real-world physical verification platform DMSA we constructed is shown in Fig. 8 (a).

In this paper, we practically deploy the TAIA-MD scheme on the DMSA platform and verify its effectiveness and practicality. In particular, we exploit 6 Raspberry Pi 4Bs, 5 Orange Pi 5Bs, and 6 Gigabit Ethernet switches to construct the edge network topology. This topology includes 17 nodes, namely 6 Gigabit Ethernet switches as communication nodes, 3 Raspberry Pi 4Bs as user-access nodes, and 9 computing nodes (4 Raspberry Pi 4Bs and 5 Orange Pi 5Bs). We design three typical services: video services, download services, and graphic-text services, implemented by 10 microservice instances. Notably, the three user-access nodes generate requests for the three types of services at specific arrival rates with loads evenly distributed. Video segments range from 1 to 3 MB with a maximum wait time of 10 seconds. Graphic-text pages are 0.5 to 1 MB with the same wait time, while download files are 10 to 20 MB with a 100-second wait time. If a request times out, the service execution is marked as failed. Each test lasts 40 minutes. To comprehensively evaluate the network performance of the TAIA-MD scheme deployed on the DMSA platform, we design two network emergencies, as shown in Fig. 8 (b). Link 1 between Switch 1 and Node 3 is disconnected at the 10th minute and restored at the 15th minute to simulate a computing node suddenly going offline. Link 3 between Switch 3 and Switch 5 is restricted to 100 Mbps at the 30th minute and restored to 1 Gbps at the 35th minute to emulate network fluctuations. As shown in Table I, we also test the performance of TAIA-MD under high, medium, and low load conditions on the DMSA platform.

(a) Practical platform of DMSA
(b) Logical topology of DMSA
Figure 8: The real-world physical verification platform of DMSA, (a) practical platform of DMSA, and (b) logical topology of DMSA.
TABLE I: Arrival Rates of Different Services under Various Loads.
Load Condition | Video Service | Download Service | Graphic-text Service
High Load      | 5             | 1                | 10
Medium Load    | 3             | 0.6              | 6
Low Load       | 1.5           | 0.3              | 3
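To illustrate how the test traffic in Table I could be generated, the following sketch draws Poisson arrivals at the tabulated rates over a 40-minute run and marks a request as failed when its response exceeds the service's maximum wait time. The time unit (assumed here to be requests per second) is not stated in the text, and the response-time model is a placeholder, not the platform's measured behavior.

```python
# Sketch of the test-traffic model: Poisson arrivals at the Table I rates
# (high-load column; units assumed to be requests/s), with timeout marking.
import random

RATES_HIGH = {"video": 5.0, "download": 1.0, "graphic_text": 10.0}   # Table I
MAX_WAIT = {"video": 10.0, "download": 100.0, "graphic_text": 10.0}  # seconds
RUN_SECONDS = 40 * 60                                                # 40-minute test

def poisson_arrivals(rate, horizon, rng):
    """Exponential inter-arrival times yield a Poisson arrival process."""
    t = 0.0
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return
        yield t

rng = random.Random(42)
for service, rate in RATES_HIGH.items():
    arrivals = list(poisson_arrivals(rate, RUN_SECONDS, rng))
    # Placeholder response time (mean 1 s); the real value comes from the platform.
    failed = sum(1 for _ in arrivals if rng.expovariate(1.0) > MAX_WAIT[service])
    print(f"{service}: {len(arrivals)} requests, {failed} timed out (toy model)")
```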

V-C Comparison Schemes in Extensive Simulations

To comprehensively validate the effectiveness of the proposed TAIA-MD scheme, we compare it with four different baseline schemes in simulations, as follows:

  • Random Deployment Scheme (Random) [48]: Microservice instances are deployed randomly on nodes that meet the resource requirements, yielding the deployment scheme (a minimal sketch of this selection logic appears after this list).

  • Greedy-based Deployment Scheme (Greedy) [31]: This scheme considers the complex dependencies among microservices, modeled as a chain structure, but ignores the network topology. When deploying each instance, it considers only the previous and next microservices in the invocation chain, selecting the best-performing node after evaluating all candidate nodes.

  • GA-based Deployment Scheme (GA) [49]: This scheme takes into account the dependencies between microservices and the bandwidth capacity of edge nodes, but overlooks the network topology. Since it also uses a GA, comparing it with our proposed TAIA-MD scheme highlights the performance gains from incorporating topology awareness.

  • Non-Individual-Adaptive TAIA-MD Scheme ((w/o IA) TAIA-MD): This variant of the proposed TAIA-MD scheme omits the individual-adaptive mechanism, so as to validate the effectiveness of that mechanism in accelerating convergence and avoiding local optima.
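As referenced in the first item above, the following minimal sketch illustrates the selection logic of the Random baseline: each instance is placed uniformly at random on a node whose remaining resources can host it. The two-dimensional CPU/memory resource model and all identifiers are illustrative, not the cited scheme's actual implementation.

```python
# Sketch of the Random baseline's node selection (simplified resource model).
import random

def random_deployment(instances, nodes, rng=None):
    """instances: {name: (cpu, mem)}; nodes: {name: [cpu_cap, mem_cap]} (mutated)."""
    rng = rng or random.Random(0)
    placement = {}
    for inst, (cpu, mem) in instances.items():
        feasible = [n for n, cap in nodes.items() if cap[0] >= cpu and cap[1] >= mem]
        if not feasible:
            raise RuntimeError(f"no node can host {inst}")
        choice = rng.choice(feasible)   # uniform choice among feasible nodes
        nodes[choice][0] -= cpu         # consume the node's remaining capacity
        nodes[choice][1] -= mem
        placement[inst] = choice
    return placement

print(random_deployment({"ms1": (1, 256), "ms2": (2, 512)},
                        {"n1": [2, 1024], "n2": [4, 2048]}))
```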

V-D Extensive Simulation Results

As illustrated in Fig. 9, we analyze the convergence performance of the traditional GA, (w/o IA) TAIA-MD, and TAIA-MD schemes. The traditional GA converges quickly with few iterations, primarily because it overlooks the edge network topology and individual adaptability; with fewer influencing factors to consider, it requires fewer iterations to converge. Conversely, (w/o IA) TAIA-MD, which accounts for network topology and bandwidth contention, is more sensitive to variations in the deployment scheme and thus requires a longer optimization period. Addressing these issues, our proposed TAIA-MD scheme generates relatively superior individuals and gene fragments during genetic initialization, facilitating rapid convergence. Numerical results show that, compared to the (w/o IA) TAIA-MD scheme, the proposed TAIA-MD reduces the number of iterations by approximately 50%, typically converging within 300 iterations. The TAIA-MD scheme thus tackles the tardy convergence of (w/o IA) TAIA-MD and significantly enhances algorithm efficiency. Notably, although the traditional GA converges slightly faster, its deployment schemes are far inferior to those of (w/o IA) TAIA-MD and TAIA-MD, as detailed later.
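For intuition, the following skeleton shows where the individual-adaptive mechanism plugs into a standard GA loop: part of the initial population is seeded with superior, topology-aware individuals rather than fully random chromosomes. This is only a sketch of the idea; the construction in Algorithm 3 is not reproduced here, and all parameters (population size, generation count, mutation rate) are illustrative.

```python
# GA skeleton with individual-adaptive initialization (sketch only).
# Individuals are equal-length lists of genes (e.g., node indices per instance),
# assumed to have at least two genes; `fitness` maps an individual to its
# weighted communication delay (lower is better).
import random

def evolve(fitness, seed_individuals, random_individual,
           pop_size=50, generations=300, p_mut=0.1, rng=random.Random(1)):
    # Individual-adaptive initialization: seed with topology-aware individuals,
    # then top up the population with random ones.
    pop = list(seed_individuals)[:pop_size]
    pop += [random_individual(rng) for _ in range(pop_size - len(pop))]
    for _ in range(generations):
        pop.sort(key=fitness)                 # elitist selection: keep best half
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(a))    # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < p_mut:          # point mutation at a random gene
                i = rng.randrange(len(child))
                child = child[:i] + [random_individual(rng)[i]] + child[i + 1:]
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)
```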

Figure 9: Comparison of convergence performance among the proposed TAIA-MD scheme and baseline schemes.

As depicted in Fig. 10, we compare the performance of TAIA-MD against the baseline schemes across different link bandwidths. In low-bandwidth scenarios, where the traffic on certain links exceeds the available bandwidth, the Random, Greedy, and GA schemes experience severe congestion and significant delays. In contrast, (w/o IA) TAIA-MD and TAIA-MD reduce the delay by over 90% and effectively prevent congestion through optimized microservice deployment. As the bandwidth scales up to 100 Mbps, the congestion and delay of the Random, Greedy, and GA schemes are significantly alleviated. Nonetheless, the (w/o IA) TAIA-MD and TAIA-MD schemes still demonstrate remarkable performance advantages, primarily because they account for the impact of microservice deployment on available bandwidth by incorporating network topology and individual adaptation. Compared to Greedy and GA, (w/o IA) TAIA-MD reduces delay by 73.1% and 39.3%, respectively. As the link bandwidth increases above 150 Mbps, the available bandwidth far exceeds the system requirement: the actual available bandwidth between nodes approaches the link bandwidth, and the impact of the deployment scheme on available bandwidth gradually weakens. Except for the Random scheme, the performance gap between the other schemes and (w/o IA) TAIA-MD and TAIA-MD narrows to within 20%. Moreover, across diverse link bandwidths, TAIA-MD improves delay performance by about 5% and convergence performance by approximately 50% compared to the (w/o IA) TAIA-MD scheme, demonstrating the effectiveness of the individual-adaptive mechanism, i.e., Algorithm 3.

Figure 10: Performance comparison between TAIA-MD and baseline schemes under different bandwidths.

Fig. 11 illustrates the performance of the five schemes across diverse service arrival rates. At lower service arrival rates, i.e., under low-load conditions, the performance differences among the schemes (except random deployment) are marginal; even so, (w/o IA) TAIA-MD and TAIA-MD consistently perform best. As the service arrival rates increase, the bandwidth requirements of the system grow accordingly, and the advantages of (w/o IA) TAIA-MD and TAIA-MD over the Random, Greedy, and GA schemes become increasingly pronounced, primarily because they account for the impact of microservice deployments on available bandwidth. As the service arrival rates increase, the burden borne by the links of the edge network intensifies, heightening network complexity. The proposed TAIA-MD scheme, which integrates topology awareness and individual adaptability, can therefore fully exploit its advantages in such dynamic networks.

Figure 11: Performance comparison between TAIA-MD and baseline schemes under different service arrival rates.

Given sufficient memory resources, we adjust the computational resources on each node to evaluate the performance of the algorithms. As shown in Fig. 12, limited CPU resources constrain the flexibility of microservice deployment options, which hampers the crossover and mutation process for GA and (w/o IA) TAIA-MD. In this case, Greedy and TAIA-MD, the latter with individual adaptability, demonstrate superior performance. As computational resources increase, the flexibility of microservice deployment expands, enhancing the performance of GA and (w/o IA) TAIA-MD. When each node has more than four CPU units, the resources exceed the deployment requirements, and the performance of all schemes no longer changes significantly. This suggests that dynamically adjusting computational resources based on the deployment results can reduce resource overhead.

To thoroughly investigate the impact of network topology on microservice deployment performance, we analyze the delay performance of TAIA-MD and the baseline schemes under different average link forwarding loads, as shown in Fig. 13. As the average link forwarding load increases, competition for bandwidth resources between microservices intensifies, leading to higher system delay. Under high average link forwarding loads, (w/o IA) TAIA-MD and TAIA-MD exhibit significantly better delay performance than Random, GA, and Greedy, demonstrating that they effectively mitigate bandwidth collisions in edge networks. Furthermore, although (w/o IA) TAIA-MD and TAIA-MD achieve similar delay performance, Fig. 9 has already demonstrated TAIA-MD's superior convergence performance.

Figure 12: Performance comparison between TAIA-MD and baseline schemes under different computational resources.
Figure 13: Performance comparison between TAIA-MD and baseline schemes under different network topologies.
Figure 14: Performance improvement percentage between TAIA-MD and baseline schemes under different (a) bandwidths, (b) service arrival rates, (c) computational resources, and (d) network topologies.

Assume the delays of Scheme A and Scheme B are $T_A$ and $T_B$, respectively. Then, the performance improvement percentage of Scheme A over Scheme B is $\frac{T_B-T_A}{T_B}\times 100\%$. As shown in Fig. 14, we present the performance improvement percentage of TAIA-MD over the baseline schemes under varied bandwidths, service arrival rates, computational resources, and average link forwarding loads. In most cases, considering the effects of topology and traffic on available bandwidth when computing the deployment scheme effectively reduces the system delay. Assuming fixed link bandwidth, TAIA-MD decreases system delay by approximately 30% to 60% compared to the other baseline schemes. Remarkably, in scenarios with low bandwidth, high service arrival rates, and complex topologies, TAIA-MD effectively prevents link congestion, providing substantial performance improvements. Furthermore, as illustrated in Fig. 14(c), TAIA-MD generally delivers a performance gain of about 5% over (w/o IA) TAIA-MD. Importantly, with limited CPU resources, TAIA-MD performs markedly better than (w/o IA) TAIA-MD, underscoring the strength of its topology-aware and individual-adaptive mechanisms for edge networks.
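For concreteness, the metric can be computed with a one-line helper; the 200 ms and 120 ms figures in the worked example below are illustrative.

```python
# Improvement metric from the text: if a baseline takes 200 ms and TAIA-MD
# takes 120 ms, the improvement is (200 - 120) / 200 = 40%.
def improvement(t_a: float, t_b: float) -> float:
    """Percentage improvement of scheme A (delay t_a) over scheme B (delay t_b)."""
    return (t_b - t_a) / t_b * 100.0

assert abs(improvement(120.0, 200.0) - 40.0) < 1e-9
```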

V-E DMSA Platform Validation Results

For latency-sensitive services such as video and graphic-text services, we focus on the average delay performance of the TAIA-MD scheme deployed on the DMSA platform. As shown in Fig. 15, the average delay under various load conditions reveals that the DMSA platform with the TAIA-MD scheme (i.e., the (DMSA, TAIA-MD) scheme) significantly outperforms the standalone DMSA platform. Specifically, for graphic-text services, the average delay improvement of the (DMSA, TAIA-MD) scheme over the standalone DMSA platform is substantial, ranging from 31.13% under low load to 38.51% under high load. For video services, the average delay improvement of the (DMSA, TAIA-MD) scheme under low, medium, and high loads is 25.20%, 26.48%, and 37.92%, respectively. Evidently, as the load increases, the delay advantage of the (DMSA, TAIA-MD) scheme becomes increasingly significant. In addition, the average delay of the standalone DMSA platform deteriorates with increasing load for both graphic-text and video services, while the average delay of the (DMSA, TAIA-MD) scheme is less affected. This is primarily attributed to the microservice deployment optimization of the (DMSA, TAIA-MD) scheme, which significantly reduces the overhead of instance scheduling on the DMSA platform and thus achieves lower average delay.

Figure 15: The average response delay of the TAIA-MD scheme for (a) graphic-text and (b) video services across different load conditions on the DMSA platform.

As illustrated in Fig. 16, we present the dynamic variations in delay performance over run time under various load conditions, taking video services as an example. Fig. 16(a), (b), and (c) demonstrate that the (DMSA, TAIA-MD) scheme consistently outperforms the standalone DMSA platform under different loads. In particular, during the 10-15 min interval, a link disruption occurs between Switch 1 and Node 3, causing a sharp increase in delay for both the (DMSA, TAIA-MD) scheme and DMSA. However, the delay performance quickly recovers from the link disconnection, and the (DMSA, TAIA-MD) scheme restores network performance more rapidly. This is primarily because the TAIA-MD scheme optimizes instance deployment on top of the DMSA platform's effective sensing of network load and edge node status, and dynamically adjusts the scheduling strategies. During the 30-35 min interval, the link bandwidth between Switch 3 and Switch 5 fluctuates and drops sharply to 100 Mbps. During this period, user requests are still routed through the constrained link, resulting in a significant delay increase for both the (DMSA, TAIA-MD) scheme and the DMSA platform. However, the delay performance recovers shortly after; the (DMSA, TAIA-MD) scheme is less affected by the fluctuation and recovers faster. This demonstrates the enhanced robustness and adaptability of the DMSA platform with the TAIA-MD scheme in handling network emergencies.

Figure 16: Dynamic response delay of video services under (a) low, (b) medium, and (c) high load conditions on the DMSA platform with the TAIA-MD scheme.

Next, we analyze the robustness of the (DMSA, TAIA-MD) scheme and the DMSA platform. As shown in Fig. 17(a), (b), and (c), we examine the average service execution success rates of graphic-text, video, and file download services under various load conditions. Compared to the standalone DMSA platform, the (DMSA, TAIA-MD) scheme improves the overall service execution success rates in edge networks and significantly enhances robustness. Fig. 17(a)-(c) demonstrate that the performance advantages of the (DMSA, TAIA-MD) scheme become more pronounced as service data volume and load increase. For graphic-text and video services, the (DMSA, TAIA-MD) scheme stably maintains service execution success rates above 98% and 97%, respectively. For large file downloads, the improvement in service execution success rate with the (DMSA, TAIA-MD) scheme over the standalone DMSA platform grows notably more significant as the load increases.

Figure 17: The impact of the TAIA-MD scheme on the service execution success rates of (a) graphic-text, (b) video, and (c) file download services on the DMSA platform.

VI CONCLUSION AND FUTURE OUTLOOK

In this paper, we have introduced an innovative microservice deployment architecture that integrates a three-tier network traffic model encompassing the service, microservice, and edge node layers. This traffic model meticulously characterizes the complex dependencies between edge network topology and microservices, and maps microservice deployments onto link traffic to accurately estimate communication delay. On this basis, we have formulated a weighted sum communication delay optimization problem to minimize communication delay by optimizing microservice deployment. To solve this problem effectively, we have proposed a novel deployment scheme called TAIA-MD, which accurately senses the network topology and incorporates an individual-adaptive mechanism into the GA to accelerate convergence and avoid local optima. Extensive simulations show that, in bandwidth-constrained edge networks with complex topologies, TAIA-MD improves delay performance by approximately 30% to 60% compared to existing deployment schemes. Moreover, through real-world deployment on the DMSA platform, we have demonstrated the robustness of the TAIA-MD scheme in withstanding link failures and network fluctuations and validated its practicality for MSA in edge networks.

This study still has limitations. In particular, our analysis of the microservice deployment problem focuses on static routing in edge networks. In practice, however, multiple reachable paths may exist between nodes, and changes in the routing scheme can alter link loads and consequently affect the communication performance of services. In future work, we therefore intend to explore dynamic routing to develop more comprehensive link traffic analysis models, and to treat routing strategies as decision variables for the joint optimization of routing and microservice deployment.

References

  • [1] Y. Chen et al., “Performance optimization in RSMA-assisted uplink xURLLC IIoT networks with statistical QoS provisioning,” arXiv preprint arXiv:2405.16471, 2024.
  • [2] A. Hannousse and S. Yahiouche, “Securing microservices and microservice architectures: A systematic mapping study,” Computer Science Review, vol. 41, p. 100415, 2021.
  • [3] T. Wu et al., “A brief overview of ChatGPT: The history, status quo and potential future development,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 5, pp. 1122–1136, 2023.
  • [4] Y. Chen et al., “Statistical QoS provisioning analysis and performance optimization in xURLLC-enabled massive MU-MIMO networks: A stochastic network calculus perspective,” IEEE Trans. Wireless Commun., pp. 1–1, 2024.
  • [5] Y. Chen et al., “When xURLLC meets NOMA: A stochastic network calculus perspective,” IEEE Commun. Mag., vol. 62, no. 6, pp. 90–96, 2024.
  • [6] A. K. Sandhu, “Big data with cloud computing: Discussions and challenges,” Big Data Mining and Analytics, vol. 5, no. 1, pp. 32–40, 2021.
  • [7] F. Jauro et al., “Deep learning architectures in emerging cloud computing architectures: Recent development, challenges and next research trend,” Applied Soft Computing, vol. 96, p. 106582, 2020.
  • [8] L. Qin et al., “Joint transmission and resource optimization in NOMA-assisted IoVT with mobile edge computing,” IEEE Trans. Veh. Technol., pp. 1–16, 2024.
  • [9] L. Qin et al., “Energy-efficient blockchain-enabled user-centric mobile edge computing,” IEEE Trans. Cogn. Commun. Netw., pp. 1–1, 2024.
  • [10] L. Qin et al., “Towards decentralized task offloading and resource allocation in user-centric MEC,” IEEE Trans. Mobile Comput., pp. 1–17, 2024.
  • [11] M. S. Elbamby et al., “Wireless edge computing with latency and reliability guarantees,” Proc. IEEE, vol. 107, no. 8, pp. 1717–1737, 2019.
  • [12] W. Zhang et al., “Energy-efficient workload allocation and computation resource configuration in distributed cloud/edge computing systems with stochastic workloads,” IEEE J. Sel. Areas Commun., vol. 38, no. 6, pp. 1118–1132, 2020.
  • [13] S. Pallewatta, V. Kostakos, and R. Buyya, “Placement of microservices-based IoT applications in Fog computing: A taxonomy and future directions,” ACM Computing Surveys, vol. 55, no. 14s, pp. 1–43, 2023.
  • [14] L. Wang et al., “Microservice-oriented service placement for mobile edge computing in sustainable internet of vehicles,” IEEE Trans. Intell. Transp. Syst., vol. 24, no. 9, pp. 10012–10026, 2023.
  • [15] Y. Mansouri and M. A. Babar, “A review of edge computing: Features and resource virtualization,” J. Parallel Distrib. Comput., vol. 150, pp. 155–183, 2021.
  • [16] F. Xu et al., “Managing performance overhead of virtual machines in cloud computing: A survey, state of the art, and future directions,” Proc. IEEE, vol. 102, no. 1, pp. 11–31, 2013.
  • [17] B. Wan et al., “Modeling analysis and cost-performance ratio optimization of virtual machine scheduling in cloud computing,” IEEE Trans. Parallel Distrib. Syst., vol. 31, no. 7, pp. 1518–1532, 2020.
  • [18] C. Pahl and P. Jamshidi, “Microservices: A systematic mapping study.” CLOSER (1), pp. 137–146, 2016.
  • [19] N. Cruz Coulson et al., “Adaptive microservice scaling for elastic applications,” IEEE Internet Things J., vol. 7, no. 5, pp. 4195–4202, 2020.
  • [20] K. Fu et al., “Adaptive resource efficient microservice deployment in cloud-edge continuum,” IEEE Trans. Parallel Distrib. Syst., vol. 33, no. 8, pp. 1825–1840, 2022.
  • [21] S. Luo et al., “An in-depth study of microservice call graph and runtime performance,” IEEE Trans. Parallel Distrib. Syst., vol. 33, no. 12, pp. 3901–3914, 2022.
  • [22] H. Zhao et al., “Distributed redundant placement for microservice-based applications at the edge,” IEEE Trans. Serv. Comput., vol. 15, no. 3, pp. 1732–1745, 2022.
  • [23] L. Gu et al., “Layer-aware collaborative microservice deployment toward maximal edge throughput,” in IEEE INFOCOM 2022-IEEE Conference on Computer Communications.   IEEE, 2022, pp. 71–79.
  • [24] D. Zeng, et al., “Layered structure aware dependent microservice placement toward cost efficient edge clouds,” in IEEE INFOCOM 2023-IEEE Conference on Computer Communications.   IEEE, 2023, pp. 1–9.
  • [25] G. Yu et al., “Microscaler: Cost-effective scaling for microservice applications in the cloud with an online learning approach,” IEEE Trans. Cloud Comput., vol. 10, no. 2, pp. 1100–1116, 2022.
  • [26] Y. Wang, H. Kadiyala, and J. Rubin, “Promises and challenges of microservices: an exploratory study,” Empirical Software Engineering, vol. 26, no. 4, p. 63, 2021.
  • [27] I. K. Aksakalli, T. Çelik, A. B. Can, and B. Tekinerdoğan, “Deployment and communication patterns in microservice architectures: A systematic literature review,” Journal of Systems and Software, vol. 180, p. 111014, 2021.
  • [28] “AWS Microservice Extractor for .NET,” https://aws.amazon.com/tw/microservice-extractor/, accessed on November 30, 2024.
  • [29] Z. Li et al., “Noah: Reinforcement-learning-based rate limiter for microservices in large-scale e-commerce services,” IEEE Trans. Neural Netw. Learn. Syst., vol. 34, no. 9, pp. 5403–5417, 2023.
  • [30] S. Pallewatta et al., “QoS-aware placement of microservices-based iot applications in Fog computing environments,” Future Gener. Comput. Syst., vol. 131, pp. 121–136, 2022.
  • [31] X. He et al., “Online deployment algorithms for microservice systems with complex dependencies,” IEEE Trans. Cloud Comput., vol. 11, no. 2, pp. 1746–1763, 2023.
  • [32] W. Lv et al., “Graph-reinforcement-learning-based dependency-aware microservice deployment in edge computing,” IEEE Internet Things J., vol. 11, no. 1, pp. 1604–1615, 2024.
  • [33] A. Samanta and J. Tang, “Dyme: Dynamic microservice scheduling in edge computing enabled IoT,” IEEE Internet of Things J., vol. 7, no. 7, pp. 6164–6174, 2020.
  • [34] X. Chen et al., “Dynamic service migration and request routing for microservice in multicell mobile-edge computing,” IEEE Internet of Things J., vol. 9, no. 15, pp. 13126–13143, 2022.
  • [35] J. Zhu et al., “QoS-aware co-scheduling for distributed long-running applications on shared clusters,” IEEE Trans. Parallel Distrib. Syst., vol. 33, no. 12, pp. 4818–4834, 2022.
  • [36] N. Bugshan et al., “Intrusion detection-based ensemble learning and microservices for zero touch networks,” IEEE Commun. Mag., vol. 61, no. 6, pp. 86–92, 2023.
  • [37] R. Heinrich et al., “Performance engineering for microservices: research challenges and directions,” in Proceedings of the 8th ACM/SPEC on international conference on performance engineering companion, 2017, pp. 223–226.
  • [38] L. Carvalho et al., “On the usefulness of automatically generated microservice architectures,” IEEE Trans. Software Eng., vol. 50, no. 3, pp. 651–667, 2024.
  • [39] J. Shi et al., “Adaptive QoS-aware microservice deployment with excessive loads via intra- and inter-datacenter scheduling,” IEEE Trans. Parallel Distrib. Syst., vol. 35, no. 9, pp. 1565–1582, 2024.
  • [40] M. Adeppady et al., “Reducing microservices interference and deployment time in resource-constrained cloud systems,” IEEE Trans. Netw. Serv. Manag., vol. 20, no. 3, pp. 3135–3147, 2023.
  • [41] X. Li et al., “Topology-aware scheduling framework for microservice applications in cloud,” IEEE Trans. Parallel Distrib. Syst., vol. 34, no. 5, pp. 1635–1649, 2023.
  • [42] R. Xie et al., “Towards minimum latency in cloud-native applications via service-characteristic-aware microservice deployment,” in 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2024, pp. 35–46.
  • [43] K. Fu et al., “QoS-aware and resource efficient microservice deployment in cloud-edge continuum,” in 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2021, pp. 932–941.
  • [44] F. Zhang et al., “Network-aware reliability modeling and optimization for microservice placement,” 2024. [Online]. Available: https://arxiv.org/abs/2405.18001
  • [45] Y. Liu et al., “How to share: Balancing layer and chain sharing in industrial microservice deployment,” IEEE Trans. Services Comput., vol. 16, no. 4, pp. 2685–2698, 2023.
  • [46] Y. Chen, C. Lu, Y. Huang, C. Wu, F. Guo, H. Lu, and C. W. Chen, “DMSA: A decentralized microservice architecture for edge networks,” 2025. [Online]. Available: https://arxiv.org/abs/2501.00883
  • [47] D. Alencar et al., “Dynamic microservice allocation for virtual reality distribution with qoe support,” IEEE Trans. Network Serv. Manag., vol. 19, no. 1, pp. 729–740, 2022.
  • [48] K. Peng et al., “Joint optimization of service deployment and request routing for microservices in mobile edge computing,” IEEE Trans. Services Comput., vol. 17, no. 3, pp. 1016–1028, 2024.
  • [49] L. Chen et al., “IoT microservice deployment in edge-cloud hybrid environment using reinforcement learning,” IEEE Internet of Things J., vol. 8, no. 16, pp. 12610–12622, 2021.
Yuang Chen (Graduate Student Member, IEEE) received the B.S. degree from the Hefei University of Technology (HFUT), Hefei, China, in 2021. He is currently pursuing the master’s degree with the Department of Electronic Engineering and Information Science, University of Science and Technology of China (USTC), Hefei, China. His research interests include 5G/6G wireless network technologies, particularly analyzing key performance indicators such as reliability, latency, jitter, and AoI of URLLC services using stochastic network calculus theory, extreme value theory, and large deviation theory.
Chang Wu received the B.S. degree from Dalian Maritime University (DLMU), Dalian, China, in 2021. He is currently pursuing the Ph.D. degree in communication and information systems with the Department of Electronic Engineering and Information Science, University of Science and Technology of China (USTC), Hefei, China. His research interests include 5G/6G wireless network technologies such as traffic engineering, radio resource management and multiple access networks.
Fangyu Zhang received the B.S. degree from the University of Science and Technology of China (USTC), Hefei, China, in 2020. He is currently pursuing the Ph.D. degree with the Department of Electronic Engineering and Information Science, USTC, Hefei, China. His research interests include machine learning, network function virtualization, and network resource allocation in heterogeneous networks.
Chengdi Lu received the B.S. degree from the Department of Electronic Engineering and Information Science, USTC, in July 2020. He is currently working toward the Ph.D. degree with the same department. His research interests include data center networks and programmable switches.
Yongsheng Huang received the B.S. degree in information engineering from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2021. He received the master’s degree from the Department of Electronic Engineering and Information Science, University of Science and Technology of China (USTC), Hefei, China. His research interests include microservices.
Hancheng Lu (Senior Member, IEEE) received the Ph.D. degree in communication and information systems from the University of Science and Technology of China (USTC), Hefei, China, in 2005. He is currently a tenured professor with the Department of Electronic Engineering and Information Science, USTC. He is also with the Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China. He has extensive research experience in multimedia communication, wireless edge networks, future network architectures and protocols, and machine learning algorithms for network communication, covering scheduling, resource management, routing, and transmission. In the past five years, he has published more than 80 papers in top journals such as IEEE Transactions and flagship conferences such as IEEE INFOCOM, and has won the Best Paper Award of IEEE GLOBECOM 2021 as well as the Best Paper Awards of WCSP 2019 and WCSP 2016. He currently serves on the editorial boards of several journals, including IEEE Internet of Things Journal, China Communications, and IET Communications.