Early Work on Efficient Patching for Coordinating
Edge Applications
Abstract
Multiple applications running on Edge computers can be orchestrated to achieve a desired goal. Orchestration of applications is prominent when working with Internet of Things applications, autonomous driving and autonomous aerial vehicles. As applications receive modified classifiers/code, multiple applications may need to be updated. If all the classifiers are updated synchronously, throughput and bandwidth degrade. On the other hand, delaying updates for applications that need them immediately hinders performance and delays progress towards the end goal. Updates should therefore be prioritized, and applied according to this priority. This paper explores the setup and benchmarks needed to understand the impact of updates when multiple applications working towards the same objective are orchestrated with prioritized updates. We discuss methods to build a distributed, reliable and scalable system called "DSOC" (Docker Swarm Orchestration Component).
1 INTRODUCTION
Autonomous systems like self-driving cars, autonomous aerial systems, smart restaurants and smart traffic lights have attracted a lot of interest from both academia and industry [15] [4]. Many top companies, including Google, Uber, Intel, Apple, Tesla and Amazon, have invested significantly in researching and building autonomous systems [10]. An autonomous system is a critical decision-making system which makes decisions without human intervention. It is comprised of complex technologies which learn the environment, make decisions and accomplish the goal [15]. In this paper, we focus on "Microservices Orchestration" for coordinating multiple autonomous applications that are working towards a common objective.
Microservices is an architectural style which structures an application as a collection of services that are loosely coupled, independently deployable and highly maintainable. Large, complex applications can be deployed using microservices, where each service has a distinct logical function and contributes to the larger application [18]. When working towards a particular goal, we might need to deploy multiple applications, each of which takes up a sub-task and coordinates with the others in order to complete the task at hand efficiently. Efficient coordination of multiple different applications is crucial for building fully autonomous systems. Configuring, controlling and keeping track of each microservice by hand would be very hard [6]. An efficient way to track and manage multiple applications is to use an orchestrator. Orchestration is a process which involves automated configuration, management and coordination of multiple computer systems and application software [19]. There are various orchestration tools for microservices, such as Ansible [9], Kubernetes [8] and Docker Swarm [7].

When working with Artificial Intelligence based applications, the performance of each microservice may degrade over time, and updated code or an updated Machine Learning model is needed in order to restore that performance [2]. When multiple applications seek an update, allowing all updates at once would degrade bandwidth and throughput and may not yield much performance gain [2]. If any microservice application needs an update, it would be a tedious task to identify the individual application and perform the update by hand. While performing such updates we need to consider individual application performance, progress towards the end goal and system performance [14]. It is practically impossible to consider every application's performance parameters manually and pick the model to be updated at run-time [1].
A patch could be a code update that fixes a bug or improves performance, or a Machine Learning model update; both are referred to as a "Classifier" in the rest of this paper. Figure 1 approximates the usage of classifiers in AI-based applications and patching in real-world autonomous systems such as self-driving cars, smart traffic systems, aerial vehicles and smart surveillance. These autonomous applications make use of roughly 40-140 classifiers in total, of which at least 40 percent receive frequent classifier updates to improve performance [2]. The update frequency of an individual application is estimated through a literature survey of incremental software releases [13] [11] [12]. Of the total updates, at least 50 percent are correlated updates; for example, an update to one application's model would impact the performance of another interdependent model or code fragment. If multiple applications coordinate with one another towards a common objective, the choice of update significantly impacts the performance of the system and the rate of progress towards the end goal.
This paper proposes a Docker Swarm Orchestration Component called "DSOC" which is responsible for orchestrating multiple applications and efficiently prioritizing classifier updates. To the best of our knowledge, this is the first work to propose an efficient method of using Docker Swarm for coordinating multiple AI-based applications with classifier updates.
2 Background
2.1 Docker and Docker Swarm
Docker containers are well suited for microservices. Docker provides lightweight encapsulation of each application, enabling independent deployment and scaling of each microservice. Docker is an engine for Linux containers, an OS-level virtualization technique which uses namespaces and cgroups to isolate applications on the same Linux kernel. A control group (abbreviated cgroup) is a collection of processes associated with a set of resource parameters. cgroups ensure that the specified resources are actually available to a container. Namespace isolation separates groups of processes such that each group cannot see the resources used by other groups: the kernel resources are partitioned so that one set of processes sees one set of resources while another set of processes sees a different set [3].
Docker uses copy-on-write (COW) and layered storage within images and containers. A Docker image is a read-only template, referencing a list of read-only storage layers, used to build a Linux container. A Docker container is a standard unit of software that packages up code and dependencies so that the application runs quickly and can be shipped reliably from one computing environment to another. Docker images become Docker containers at run time, when they run on the Docker Engine. The layered storage allows fast packaging and shipping of an application as a lightweight container by sharing common layers between images and containers. By using Docker, there is potential for faster deployment and faster model updates [3].
When many containerized applications are running, there should be a mechanism to make them all work towards a common goal. One way to achieve this is Docker Swarm. A Docker swarm is a group of machines joined together as a cluster, on which commands are executed by the swarm manager to control the group. Each machine in the swarm is called a node, which can be a physical or virtual machine. Applications can be defined using a manifest file and easily deployed using Docker commands [7].
2.2 Worker and Manager Nodes
Manager nodes control the cluster with tasks such as maintaining cluster state, dispatching tasks to workers and providing fault tolerance to the system. Docker supports multiple manager nodes, of which only one is elected as leader and performs all the responsibilities of a manager. The other manager nodes are standby managers which receive updated information about the state of the system and may be chosen as leader when the leader node goes down. Using multiple managers is a fairly new Docker feature which is explored in this research. Worker nodes are instances which accept tasks from manager nodes and execute them as containers. Worker nodes do not share their state information with other worker nodes and do not make scheduling decisions [7].
2.3 Scale-in and Scale-out of applications
When a microservice application is deployed, we might need to increase the number of microservice components (scale-out) or decrease it (scale-in) based on user demand and progress towards the end goal. This calculation should happen automatically, and applications should be re-scaled based on workload and progress towards the end goal [21].
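The scale-in/scale-out decision above can be sketched as a simple control rule. The thresholds, metric names and clamping bounds below are illustrative assumptions, not part of DSOC:

```python
def desired_replicas(current, cpu_util, progress_rate,
                     target_util=0.6, min_replicas=1, max_replicas=10):
    """Toy autoscaling rule: scale out when utilization is well above target,
    scale in when it is well below target and progress is on schedule.
    All thresholds are illustrative assumptions."""
    if cpu_util > target_util * 1.25:
        proposed = current + 1                      # overloaded: scale out
    elif cpu_util < target_util * 0.5 and progress_rate >= 1.0:
        proposed = current - 1                      # underloaded, on schedule: scale in
    else:
        proposed = current                          # leave the replica count alone
    # clamp so the service never disappears or grows without bound
    return max(min_replicas, min(max_replicas, proposed))
```

In a real deployment the orchestrator would feed measured CPU utilization and progress into such a rule on every control cycle; here the inputs are plain numbers so the rule can be tested in isolation.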

3 System Architecture
As depicted in Figure 2, the system architecture consists of three main components: Application, Coalescer and Stratagem. Applications are lightweight, containerized units which are deployed to achieve a particular sub-task. The main focus of this research is choosing updates efficiently when a group of different applications coordinates to achieve a common objective. Such a group of applications working towards a common objective is called a "Swarm" [7].
The single point of contact for multiple applications is the "Coalescer". The Coalescer is the orchestrator unit in our design which helps coordinate multiple applications to achieve a particular goal. The Coalescer has multiple functions: it tracks application changes, processes migration requests, and tracks progress and performance per application. If there is performance degradation in any of the applications, the Coalescer makes sure the expected performance of the application is restored. The Coalescer handles the coordination of multiple applications and updates to applications, and makes sure the overall performance of the system is preserved. Stratagem is a component which records application changes and updates the application with suitable code or a suitable model in order to satisfy the performance criteria. Stratagem prioritizes updates considering different performance metrics and migrates the required difference, between the source code/model and the updated code/model, to the appropriate application.

Figure 3 shows the logical components used to build the Coalescer and Stratagem. "Manager" is a logical component of the Coalescer. A manager node tracks one or more applications and makes sure the performance of those applications is optimal. "Worker" is a logical component of Stratagem. A manager creates one worker node per deployed application to track changes and make sure that application's performance and progress expectations are met over time.
Although autonomous systems such as self-driving cars, aerial vehicles, smart traffic systems and smart restaurants are becoming increasingly popular [7], prior work has not focused on building a DSOC-type component which progresses efficiently towards the goal using a strategy that is easy to deploy and maintain. An autonomous application deployed in production will comprise several smaller applications coordinating to achieve an end goal [17]. This is a microservices-based architecture where each independent component has a logical function and contributes towards the end goal [18]. Building such a system, which tracks and makes timely progress towards the end goal, is crucial. During deployment, there may be updates to individual applications which improve their performance. If all update-hungry applications are allowed to update their models at once, throughput and bandwidth would degrade. There should be an effective method of prioritizing updates, taking into account factors such as latency, progress, CPU utilization, memory and accuracy, when multiple applications are seeking an update. The implementation section discusses the details of prioritizing updates, using the framework from [5]. The DSOC approach gives greater control over applications and ensures the performance of the system is maintained. With DSOC, critical concerns like code updates, Machine Learning model updates and performance-based progress towards the end goal are carefully considered.
4 Implementation


We leverage the existing Docker Swarm functionality for our implementation. Swarm managers control all the nodes and can use several strategies to run containers efficiently: (1) the "emptiest node" strategy, which fills the least-utilized machine with containers, and (2) the "global" strategy, which ensures each machine gets exactly one instance of the specified container. These strategies help with load balancing, scaling and fault management [7]. There are two methods to implement coordination among groups of applications working towards a specific goal. The greedy approach is one method, in which every application is eager to increase its accuracy and performance: whenever a newer model or updated code that improves accuracy and performance is available, the application tries to perform an update. Figure 4 explains how the greedy approach to patching works. We maintain an update queue which stores all the model and code updates of applications, where Mij is a code/model update for application j running on node i. We calculate K, the number of updates that can be performed such that overall system performance does not degrade. While the update queue is non-empty, we choose an update Mij and check whether node i is unconstrained. If it is unconstrained, we assign a worker to update application j on node i with Mij. If node i is constrained, we delay the Mij update and proceed by choosing the next model in the update queue.
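The greedy loop of Figure 4 can be sketched as follows. The `constrained` and `apply_update` callbacks stand in for the deployment's node monitor and worker dispatch, and are assumptions for illustration:

```python
from collections import deque

def greedy_patch(update_queue, constrained, apply_update, k):
    """Greedy patching sketch (Figure 4): drain the update queue, applying
    each update M_ij immediately when node i is unconstrained and deferring
    it otherwise. At most k updates are applied so that overall system
    performance does not degrade."""
    queue = deque(update_queue)       # entries are (node_i, app_j, update)
    deferred, applied = [], 0
    while queue and applied < k:
        node, app, update = queue.popleft()
        if constrained(node):
            deferred.append((node, app, update))   # delay: node is constrained
        else:
            apply_update(node, app, update)        # worker patches app j on node i
            applied += 1
    return deferred + list(queue)     # updates still pending after this pass

# Usage with toy callbacks: node 2 is constrained, budget k = 2.
applied = []
pending = greedy_patch(
    [(1, "a", "m1"), (2, "b", "m2"), (1, "c", "m3")],
    constrained=lambda n: n == 2,
    apply_update=lambda n, a, m: applied.append((n, a, m)),
    k=2)
```

Here the updates on node 1 are applied and the update for the constrained node 2 is returned as pending, matching the delay-and-continue behaviour described above.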
The second approach is the DSOC approach (Figure 5), where the "Coalescer" handles all the updates, evaluating the priority of update requests. System-specific parameters such as throughput, memory, CPU utilization and bandwidth, and application-specific parameters such as accuracy improvement, execution time and latency, are carefully considered before updating an existing model or code fragment.
Figure 6 shows how priority is assigned to a model/code update. Updates are prioritized after considering all system-specific and application-specific parameters. c1 and c2 are the weights given to SP (system-specific parameters) and AP (application-specific parameters) such that c1 + c2 = 1. System-specific weights for CPU utilization, memory, storage and throughput are stored in sWeight; application-specific weights for accuracy, progress, latency and execution time are stored in aWeight. Using these, the system performance and application performance of running application j on node i are calculated and combined into a single metric, pVal. Applications which need their updates immediately are classified as green (priority one), the next-prioritized updates as yellow (priority two), updates with the least priority as blue (priority three), and classifiers which need not be updated as red. Green, yellow, blue and red form the coloring scheme maintained by the Coalescer to assign priority to an application's model update (Figure 6). In the DSOC approach, if individual applications are consistent and make efficient progress towards the end goal by carefully considering model updates, strong overall system performance and progress follow.
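A minimal sketch of the pVal computation and color classification follows. The linear weighting, the metric names and the color thresholds are illustrative assumptions; the paper only fixes the constraint c1 + c2 = 1 and the four-color scheme:

```python
def p_val(sys_metrics, app_metrics, s_weight, a_weight, c1=0.5, c2=0.5):
    """Combine system-specific (SP) and application-specific (AP) metrics,
    each normalized to [0, 1], into a single priority value with c1 + c2 = 1."""
    assert abs(c1 + c2 - 1.0) < 1e-9, "weights must sum to 1"
    sp = sum(s_weight[m] * sys_metrics[m] for m in s_weight)  # CPU, memory, ...
    ap = sum(a_weight[m] * app_metrics[m] for m in a_weight)  # accuracy, ...
    return c1 * sp + c2 * ap

def color(p, green=0.75, yellow=0.5, blue=0.25):
    """Map a pVal to the Coalescer's coloring scheme; thresholds assumed."""
    if p >= green:
        return "green"    # priority one: update immediately
    if p >= yellow:
        return "yellow"   # priority two
    if p >= blue:
        return "blue"     # priority three
    return "red"          # classifier need not be updated

# Usage with toy weights: two system metrics, one application metric.
sw = {"cpu": 0.5, "mem": 0.5}
aw = {"acc": 1.0}
p = p_val({"cpu": 0.8, "mem": 0.6}, {"acc": 0.9}, sw, aw)
```

With these toy inputs the combined pVal is 0.8, which the assumed thresholds classify as green, i.e. an update that should be applied immediately.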


In Figure 7, we predict the trade-off between accuracy improvement and closeness to the end goal. Closeness to the end goal is the percentage of the task completed: 20%, 40%, 60% and so on. In the greedy approach, accuracy increases constantly and we reach the end goal faster; we reach it with slightly better accuracy, but at the cost of many resources and many updates. If we instead choose the DSOC approach, overall accuracy improves slightly as we progress towards the end goal and progress is slower, but fewer resources are used and fewer updates are performed. In the DSOC approach, we reach the end goal with fewer updates and slightly lower accuracy compared to the greedy approach.
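The shape of this trade-off can be illustrated with a toy model. Greedy applies every available update; a DSOC-like policy applies only updates whose gain clears a priority threshold. The gain values and threshold are invented for illustration and are not measurements from the paper:

```python
def simulate(threshold, gains, start_accuracy=0.5):
    """Toy model of the Figure 7 trade-off: each candidate update carries an
    accuracy gain; a policy applies the update only if the gain meets its
    threshold. Greedy uses threshold 0 and so applies everything.
    Returns (final accuracy, number of updates applied)."""
    accuracy, applied = start_accuracy, 0
    for g in gains:
        if g >= threshold:
            accuracy = min(1.0, accuracy + g)  # accuracy is capped at 1.0
            applied += 1
    return accuracy, applied

gains = [0.05, 0.01, 0.08, 0.02, 0.04]   # invented per-update gains
greedy = simulate(0.0, gains)            # applies all five updates
dsoc = simulate(0.04, gains)             # applies only the high-gain updates
```

As in Figure 7, the greedy policy ends with slightly higher accuracy but performs more updates, while the selective policy trades a small amount of accuracy for far fewer updates and less resource use.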
5 Related Work
To the best of our knowledge, no prior approaches with the focus of this paper (efficient patching for coordinating edge applications) have been published. In this section, we discuss work closely related to the research problem of this paper. Lele Ma et al. [16] proposed efficient service hand-off across edge servers using Docker container migration; the paper gives an in-depth explanation of leveraging Docker features to the full extent, and its migration algorithm for service hand-off gives insight into the process of patching an application. Taherizadeh et al. [20] proposed an auto-scaling method for time-critical cloud applications considering system performance and application-level monitoring; the researchers built a dynamic multi-level autoscaling system using Kubernetes as the orchestrator. Kaewkasi et al. [14] built an ant colony optimization based scheduling algorithm which outperforms the built-in scheduling provided by Docker Swarm; this research gave hints on carefully considering resource utilization and available resources when coordinating applications.
6 Conclusion and future work
Autonomous systems are evolving at a very fast pace and moving towards full autonomy [15]. The industry and research community need to focus on coordinating multiple applications that work closely together towards an end goal. Containers are increasing in popularity for building and shipping applications efficiently [3]. This paper is early work focused on building an orchestrator component using Docker Swarm mode to coordinate multiple applications working together towards an end goal. The orchestrator component is responsible for tracking and choosing the model updates that lead to performance improvement. Updates are prioritized considering system performance and individual application performance. Currently, we have a framework and infrastructure set up to deploy, track and update an application. The future plan is to build several different applications using different Machine Learning techniques, such as intruder detection, simple face recognition, obstacle detection and a mission planner, which work collectively towards safely reaching a destination from a source point. During the mission, we will run different workloads by constraining the mission, to measure system performance and record how efficiently DSOC reaches the end goal.
Acknowledgments: This work was funded in part by NSF Grants 1749501 and 1350941 with support from NSF CENTRA collaborations (grant 1550126). This was an IDEA and Early Work paper submitted to ICAC 2019 (now known as ACSOS).
References
- [1] Naveen T. R. Babu and Christopher Stewart. Revisiting online scheduling for ai-driven internet of things. In Proceedings of the 4th ACM/IEEE Symposium on Edge Computing, SEC ’19, page 310–312, New York, NY, USA, 2019. Association for Computing Machinery.
- [2] Naveen T.R. Babu and Christopher Stewart. Energy, latency and staleness tradeoffs in ai-driven iot. In ACM Symposium on Edge Computing, 2019.
- [3] Babak Bashari Rad, Harrison Bhatti, and Mohammad Ahmadi. An introduction to docker and analysis of its performance. IJCSNS International Journal of Computer Science and Network Security, 17(3), March 2017.
- [4] Jayson Boubin, Naveen T.R. Babu, John Chumley, Christopher Stewart, and Shiqi Zhang. Managing edge resources for fully autonomous aerial systems. In ACM Symposium on Edge Computing, 2019.
- [5] Jayson Boubin, Christopher Stewart, Shiqi Zhang, Naveen T.R. Babu, and Zichen Zhang. Softwarepilot. http://github.com/boubinjg/softwarepilot, 2019.
- [6] https://developer.ibm.com/articles/why-should-we-use-microservices-and-containers/. Why should you use microservices and containers?, 2018.
- [7] https://docs.docker.com/get-started/part4/. Get started: Swarms, 2019.
- [8] https://kubernetes.io/. Production grade container orchestration, 2019.
- [9] https://www.ansible.com/. Automation for everyone, 2019.
- [10] https://www.cnbc.com/2018/01/12/intel-cisco-and-amazon-introduce-self-driving-car-technology-at-ces.html. Self-driving cars take over ces: Here’s how big tech is playing the market, 2018.
- [11] https://www.dji.com/phantom-3-standard/info. Dji phantom-3 standard, 2019.
- [12] https://www.rapidflowtech.com/surtrac. Surtrac: Intelligent traffic signal control, 2018.
- [13] https://www.tesla.com/autopilot. Full self driving hardware on all cars, 2019.
- [14] C. Kaewkasi and K. Chuenmuneewong. Improvement of container scheduling for docker using ant colony optimization. In 2017 9th International Conference on Knowledge and Smart Technology (KST), pages 254–259, Feb 2017.
- [15] Christos Katrakazas, Mohammed Quddus, Wen-Hua Chen, and Lipika Deka. Real-time motion planning methods for autonomous on-road driving: State-of-the-art and future research directions. Transportation Research Part C: Emerging Technologies, 60:416–442, 2015.
- [16] Lele Ma, Shanhe Yi, and Qun Li. Efficient service handoff across edge servers via docker container migration. pages 1–13, 10 2017.
- [17] Paolo Medagliani, Jeremie Leguay, A Duda, Franck Rousseau, Simon Duquennoy, Shahid Raza, Gianluigi Ferrari, Pietro Gonizzi, Simone Cirani, L Veltri, Màrius Montón, Marc Domingo Prieto, M Dohler, I Villajosana, and O Dupont. Internet of things applications - from research and innovation to market deployment. 01 2014.
- [18] Dmitry Namiot and Manfred Sneps-Sneppe. On micro-services architecture. International Journal of Open Information Technologies, 2:24–27, 09 2014.
- [19] Magno Queiroz, Paul Tallon, Rajeev Sharma, and Tima Coltman. The role of it application orchestration capability in improving agility and performance. The Journal of Strategic Information Systems, pages 1–18, 2017.
- [20] Salman Taherizadeh and Vlado Stankovski. Dynamic multi-level auto-scaling rules for containerized applications. The Computer Journal, 62(2):174–197, 2018.
- [21] M.V.L.N. Venugopal. Containerized microservices architecture. International Journal of Engineering and Computer Science, 6(11):23199–23208, Nov. 2017.