This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

CoMesh: Fully-Decentralized Control for Sense-Trigger-Actuate Routines in Edge Meshes

Anna Karanika, Rui Yang, Xiaojuan Ma, Jiangran Wang, Shalni Sundram, Indranil Gupta annak8,ry2,xm20,jw22,shalnis2,[email protected] Department of Computer ScienceUniversity of Illinois at Urbana-ChampaignUrbanaILUSA
Abstract.

While mesh networking for edge settings (e.g., smart buildings, farms, battlefields, etc.) has received much attention, the layer of control over such meshes remains largely centralized and cloud-based. This paper focuses on applications with sense-trigger-actuate (STA) workloads—these are similar to the abstraction of routines popular in smart homes, but applied to larger-scale edge IoT deployments. We present CoMesh, which tackles the challenge of building local, non-cloud, and decentralized solutions for control of sense-trigger-actuate applications. At its core CoMesh uses an abstraction called k-groups to spread in a fine-grained way, the load of STA actions. Coordination within the k-group uses selective fast and cheap mechanisms rather than expensive off-the-shelf solutions. k-group selection is proactively dynamic, and occurs by using a combination of zero-message-exchange mechanisms (to reduce load) and locality sensitive hashing (to be aware of physical layout of devices). We analyze and theoretically prove the safety of CoMesh’s mechanisms. Our evaluations using both simulation and Raspberry Pi lab deployments show that CoMesh is load-balanced, fast, and fault-tolerant.

internet of things, routine management, fault-tolerance, smart buildings
Work supported in part by NSF CNS 1908888, and a Capital One gift.
copyright: none

1. Introduction

Commercial edge deployments with Internet of Things (IoT) devices are rapidly growing in size and density. There are over 10 B IoT devices today, expected to exceed 25 B by 2030 (J., 2021). For instance, smart buildings account for 1.26 B devices, and the smart building IoT sector is forecast to grow to 2.5 B devices by 2027, and comprise a $90+ B market (Memoori, 2022). Other emerging areas for edge IoT deployment include smart farms with robots and sensors (Vasisht et al., 2017; Ryu et al., 2015; Gan and Lee, 2018; Chandra and Collis, 2021; Sivakumar et al., 2021), battlefield deployments (Liu et al., 2022; Feng et al., 2020; Shahid et al., 2021), performance arenas (Fox et al., 2019; Turchet et al., 2018), and even indoor spaces such as warehouses (Fatima et al., 2022; Wang et al., 2022) and large marketplaces (Lampropoulos et al., 2019; Lin et al., 2016) with growing markets (Research, 2022; Aarti and Vineet, 2022; Future, 2022).

In such physically distributed settings, it is commonplace today to build mesh networks among the IoT and edge devices. Mesh networks can be robust, resilient, and allow easy “scale out” by adding and removing devices, without requiring a central operating unit (or hub) which may be overloaded and prolong latencies. Some devices may be “smart” devices capable of compute and storage (both small-scale), while other devices may be “simple” devices. Mesh networks are today being built using Wifi, Zigbee, Bluetooth, LoRa, etc. (K. et al., 2021; Guo et al., 2022; Li et al., 2022). Industrial mesh networks using IEEE 802.15.4, WirelessHart (Song et al., 2008), or IEC (ISO Central Secretary, 2016) are becoming common. There is rich literature at the networking layer, e.g., on how to build, configure, and route inside mesh networks (Akyildiz and Wang, 2005; Yang et al., 2005).

Sitting above the mesh network is the control plane—this needs to execute three kinds of actions: Sense-Trigger-Actuate (or STA). The control plane needs to sense: continuously collect measurements from multiple devices. Applications specify predicates, which are arbitrary Boolean clauses involving multiple device readings. When a predicate is satisfied, the control plane needs to trigger programmed series of actions that touch many devices. Third, because some of these actions may be long-running (e.g., mechanical movements, repetitive actions, or time-based actions such as water sprinklers), the control plane needs to initiate and monitor actuation of these commands (at multiple devices), while satisfying the challenging goal of maintaining consistency (Ahsan et al., 2021).

At a smaller scale, this STA abstraction exists in smart homes. The most popular programmatic abstraction for STA in smart homes, which is also applicable in edge meshes, is a routine. A routine is a sequence of commands that touches multiple devices, wherein each command executes an action on only one device. Multiple routines may be active at any given time, while many more remain dormant, waiting to be triggered. A routine may be triggered either based on sensor values, or at particular times, (or a combination thereof), or manually. Routines are offered today by Amazon Alexa (Amazon Alexa, 2014), Google Home (Google Home, 2016), Samsung SmartThings (Samsung Smart Things, 2012), and others. For instance, one routine at an office building may be triggered every week day at 9 pm, switching on security cameras, switching on specific external lights, and locking the outside doors. Another routine might be triggered by a combination of sensors and time, e.g., if the building temperature drops below 45 F between 11 pm and 6 am, take emergency actions to prevent bursting of pipes, and send notifications to facilities staff. During the day multiple routines may run continuously, e.g., some raising and lowering shades on the exterior of the building in order to regulate sunlight flowing into the building as a function of temperature, while other routines change interior lighting in different rooms and corridors as a function of occupancy, etc. Routines may be triggered either by humans (on demand) or automatically.

In a smart home the small number of devices, and the physically proximate nature of the home, means that a single central hub (e.g., an Alexa unit or a Google Home unit) can be used to execute all the three S-T-A categories of actions in the control plane. However, in commercial IoT deployments, the central hub approach becomes untenable as these deployments span large areas, involve many tens or hundreds of devices or more, and routines (active+dormant) numbering in several tens to a few hundred. In fact, a centralized hub may even be infeasible to even set up in really large areas like smart farms and battlefields. Hence mesh networks are growing in popularity for such settings.

Where it is possible to set up, the central hub may be overwhelmed very quickly by S-T-A traffic. For the setting with routines that we described, with a (say) 1 second fidelity for monitoring devices that trigger routines, monitoring just 500 devices in a building involves the home hub exchanging messages every 2 ms; even using AWS IoT’s liberal (suggested) 30 s ping frequency (AWS, 2022) means the central hub exchanges messages every 40 ms.

Distributed support for STA needs to be local, without needing a connection to the cloud. Central hubs in use today are themselves unreliable (Moore et al., 2020), and almost always connect to the cloud which itself may suffer outages (Grothaus, 2019). No mechanisms exist today to coordinate multiple local hubs for the specific STA workload of routines running over large edge meshes.

This paper presents CoMesh (Control Mesh), the first system for monitoring and management of routines (like those available via Amazon Alexa, Google Home, Samsung SmartThings, etc.) over edge meshes. CoMesh is naturally fully-decentralized and contains local distributed protocols running among available smart devices (other simple devices unable to run compute or memory, may also co-exist)—only a small fraction of devices need to be smart for CoMesh to work. CoMesh relies on neither a cloud nor on a local central hub.

CoMesh has to tackle four major challenges: (1) fully-decentralized monitoring, triggering, and actuation of routines, (2) load-balance work across both devices and time, (3) tolerate simultaneous failures of multiple devices, and (4) have load that is aware of the physical layout of devices. We propose a building block called a k-group, which is a dynamically selected subgroup of smart devices. This allows us to break the STA workload into fine granularities, by assigning a separate k-group for sensing each device (or a group thereof), and one k-group for triggering and actuating each routine. CoMesh’s k-groups use a combination of spatial load balancing and temporal load balancing: a) each k-group is selected via consistent hashing, and b) migrated periodically. The migration spreads the load of “heavy” subgroups (e.g., k-groups assigned to devices whose readings vary significantly, or k-groups assigned to frequently triggered routines) over the system. Within each k-group, we eschew using internet-based coordination systems like Zookeeper (Hunt et al., 2010) which may be expensive, and instead we carefully design fast and cheap versions of coordination techniques including agreement via quorums, leader election, and responding to failures of subgroup leader or members. Specifically, to tolerate ff failures, it suffices for each subgroup to contain (2f+1)(2f+1) members.

To make proactive k-group migrations cheap, CoMesh uses zero-message exchange protocols—“zero-message” means that any node can (in the fast path), without exchanging any messages, calculate the current subgroup for any other given device or routine. In large spaces it is important keep communication local for latency and energy reasons—CoMesh adapts ideas from the space of locality-sensitive hashing (LSH) (Datar et al., 2004) to IoT and smart space settings, so that sub-group selections are in the locality of the monitored device(s), and yet retain the benefits of zero-message exchange mechanisms.

Underneath CoMesh we require: i) any mesh routing algorithm, and ii) a weakly consistent membership protocol for mesh networks (e.g., Medley (Yang et al., 2022)).

The contributions of this paper are:

  • We present CoMesh, the first local and fully-decentralized system for routine monitoring and management over edge meshes intended for sense-trigger-actuate scenarios.

  • CoMesh proposes a building block called k-groups, which help load-balance sensing, triggering and actuation work over devices. k-groups are selected using zero-message mechanisms and locality sensitive hashing, and within k-groups are running fast and cheap protocols for coordination and election.

  • We analyze and theoretically prove safety, liveness, and progress of CoMesh’s mechanisms.

  • We perform large-scale simulations as well as smaller-scale Raspberry Pi-4 deployment experiments.

2. System Model

We assume a crash-recovery model: IoT devices may fail (crash) at any time, and then recover at a later time. We do not target Byzantine (malicious) and leave them to future work. As is traditional in fault-tolerant literature, we assume that only up to ff simultaneous crash failures occur in the system. The value of ff can be calculated from the deployment settings. For instance, buildings contain power domains, each connected to a power breaker. A common failure mode is an entire power domain going out, thus making all the devices inside that domain fail. Thus one rule of thumb is to set ff to the number of smart devices in the largest power domain inside the building. Because a building contains many power domains, ff remains much smaller than NN, the total number of devices in the building.

We assume a subset of devices (>3>3) are smart devices, equipped with (small) memory and computation capability—other simple (non-smart) IoT devices may also co-exist in the system. Smart devices may include any combination of IoT devices capable of arbitrary compute, Raspberry Pis, home hubs, handheld devices, etc. We assume non-smart devices can only receive and process commands, and respond to queries for status, but they cannot run any computation.

Since IoT devices are typically fixed early in the operation of a building, we assume that the locations of all devices are known by the smart devices 111Device motion can be treated as a failure event, followed by recovery event at its new location.. We assume smart devices’ clocks are synchronized (although CoMesh works with loose synchronization).

Smart devices communicate via a wireless network that is lossy. For generality, we assume the existence of a wireless ad-hoc routing protocol among smart devices (Dijkstra, 1959; Perkins et al., 2003; Johnson and Maltz, 1996). We assume any smart device can send a command to any IoT device. Finally, CoMesh is built atop a weakly-consistent membership service like Medley (Yang et al., 2022), which helps maintain a full membership list (of all IoT device IDs that are alive) at each smart device. Medley provides O(1)O(1) detection time of failure and recovery events. Table 1 summarizes key terms, and Fig. 1 shows an example setting.

Term Description
Command A message sent to, and associated action executed by, one IoT device.
Routine A sequence of commands. Triggered either by time, or a set of triggering clauses or manually.
IoT Device All IoT devices in system.
Smart Device An IoT device that has (small) memory and computation capability.
Simple Device An IoT Device that is not a Smart Device. Typically contain sensor(s) or actuator(s).
Table 1. Key Terms used in this paper.
Refer to caption
Figure 1. Three Sense-Trigger-Actuate Deployment Settings: (a) Smart Building, (b) Smart Farm, (c) Smart Factory.

3. CoMesh’s subgroups: The kk-groups Mechanism

CoMesh’s protocols rely on its subgroup formation and maintenance. CoMesh uses these subgroups to sense-trigger-actuate routines and devices (described in following sections). We describe subgroup formation in this section.

CoMesh’s subgroups are called k-groups, signifying our goal of keeping a kk-group’s size to be around (a small value) kk. kk is a globally configured variable. To tolerate ff system-wide failures, we set k=(2f+1)k=(2f+1). We present kk-group selection (Sec. 3.1), election (Sec. 3.2), quorums (Sec. 3.3), temporal migration (Sec. 3.4), failure response (Sec. 3.5), and locality (Sec. 3.6).

3.1. kk-group Member Selection

CoMesh’s kk-groups need to be selected in way that is: (i) load-balanced, and (ii) uses a minimum number of messages.

Load-balanced Selection via Zero-message Exchange: A smart device with ID SS discovers if it is present in a kk-group for a target device with ID DD by calculating a consistent hash: Hash(S,D)Hash(S,D). The lowest kk hashed IDs across the entire system are then the approved kk-group members. This design is inspired by the idea of cryptographic sortition (Micali et al., 1999; Apecechea, 2019). Notice that, given a fixed hash function (e.g., a cryptographic hash like SHA-2 or MD-5), any arbitrary smart device can calculate the entire membership of any kk-group by calculating this hash function on each of the IDs from its (full) membership list. With correct membership lists, this takes zero messages.

3.2. Leader Election

The kk-group’s goal is to fault-tolerantly replicate the state of its monitored entity. To make its operations efficient, each kk-group elects a leader. The alive smart device with the lowest hashed value (Hash(S,D)Hash(S,D)) is considered to be the leader. While election could be done via a zero-exchange mechanism, in order to ensure that the leader is actually alive, CoMesh runs the well-known Bully election algorithm to quickly elect the leader (Garcia-Molina, 1982). The Bully algorithm is attractive as it is fast in the common case: 1 RTT for the lowest-hashed node (leader) to inform others in the kk-group. If failures occur, the Bully algorithm takes a worst case of 5 RTTs.

3.3. Quorum Agreement

Fast agreement in a kk-group is done via quorums. In a kk-group of size k=2f+1k=2f+1, anyone in a group (typically the leader) sends a message to all MM members, and waits for acknowledgments from at least (f+1)(f+1) members. Thus with ff failures, at least one surviving kk-group member knows about the latest decisions in the kk-group, and can communicate it to future leaders.

3.4. Epoch-based kk-group Migration

While Section 3.1 is spatially load-balanced, for temporal load balancing, CoMesh migrates each kk-group membership periodically. This is done once every epoch, and epoch lengths are fixed system-wide. (Different kk-groupscan migrate at different times.)

First we extend the hash function for the selection to include the epoch number, making it Hash(epoch,S,D)Hash(epoch,S,D). The epochepoch is incremented by 1 on each epoch change. Next, when an epoch expires, the following sequence executes: 1) the new kk-group is formed by using Hash (++epoch, S, D); 2) an election is run in the new kk-group (using Sec. 3.2); 3) any state maintained by the leader in the old kk-group is migrated to the new kk-group’s leader; 4) the new kk-group’s leader replicates the state information at a quorum of its new kk-group members (Sec. 3.3); 5) atomic changeover to new kk-group: new kk-group leader tells old kk-group leader that the old kk-group is decommissioned, old leader acknowledges; and 6) old leader tells old kk-group members that they are decommissioned.

When the new kk-group leader acknowledges to the old kk-group leader in step 5 that it has received all the state, the old kk-group leader can delete its old kk-group state. After step 6, an old kk-group node can delete its old state for that kk-group. During steps 1 though 7, no monitoring operations occur in the kk-group. This means that old kk-group members and old leader cease all kk-group operations at exactly the epoch changeover boundary. In practice the epoch change is fast enough that this gap in monitoring (between steps 1 to 6) is small, as shown by our experimental results. After step 6, all normal kk-group operations resume.

3.5. Failure Response: kk-group Updates

If a smart node NiN_{i} fails, the failure detector layer informs all other nodes of its failure. Because CoMesh’s operations remain correct for group size [f+1,2f+1]\in[f+1,2f+1], we trigger a kk-group selection only when ff nodes have failed (rather than on each failures)—this keeps the overhead of failure response low.

The only exception is if the failed node NiN_{i} was a leader of a kk-group. This causes Section 3.2’s election algorithm to be re-run. This new leader needs to reconstruct the state from all the nodes in the kk-group. Each piece of state needs to be confirmed by a quorum (f+1f+1 nodes). This state is correct because all past updates were present at at least f+1f+1 nodes, so the new leader receives each past update from at least one non-faulty node. If the new leader fails during reconstruction, election is re-run and the process repeats.

3.6. Grouping via Locality-Sensitive Hashing

Because the underlying routing infrastructure is an ad-hoc network, this raises two issues. First, the sheer number of devices and routines may lead to a large number of kk-groups. Second, a randomly selected kk-group will consist of nodes that are spread across the ad-hoc network, thus incurring high message overhead on links.

Limiting the Number of kk-groups: We use a clustering algorithm (Krishna and Narasimha Murty, 1999) to partition the devices into clusters. Each cluster is assigned to one kk-group, with a random device from the cluster chosen as the “center” (i.e., representative) in order to calculate the locality sensitive hashing function described above. For routines, we pick a random device it touches as its representative, and that device’s cluster is now the routine’s cluster (and thus its kk-group).

Locality-Sensitive Hashing: Selecting kk-groups randomly (Section 3.1) spreads the kk-group members all over the network, slowing down intra-group operations (election, quorum, inter-leader). The challenge is to make kk-group selection local while still retaining many of the spatial load-balancing benefits of hashing. To do so, we need to adapt ideas from locality-sensitive hashing (LSH) (Datar et al., 2004), an idea from high-dimensional data science. To the best of our knowledge, our paper is the first to apply LSH to edge/IoT networks.

Concretely, we modify the kk-group member selection to be a two stage process: select f+1f+1 nodes in the vicinity of the monitored device/cluster, and an additional ff nodes randomly from across the group. This provides for speed in intra-group operations (via nearby f+1f+1 members), while still ensuring that a power domain failure (see Section 2) does not compromise the operation of a group (since there will be fewer than f+1f+1 devices in a power domain).

The main idea of LSH (Datar et al., 2004) is to hash similar vectors (in space) into the same “buckets” with high probability. A randomly selected “center” (from the monitored device group) and the smart devices are hashed into buckets and the majority is selected from the smart devices that are in the same buckets with the center “candidates”). If there are more than f+1f+1 candidates, we choose the nearest f+1f+1 ones to the center. Notice that selecting the majority randomly from the candidates usually does not work well since buckets may also have nodes that are not close to the center. If there are fewer than f+1f+1 candidates, CoMesh first checks if the center’s neighbors (devices that are close to the center) have any candidates that we can “borrow”, before moving to random selection.

Overlapping of kk-group’s members in consecutive epochs helps with data transfer operations during epoch change. To achieve this, we append (prei,,pred)(pre_{i},\dots,pre_{d}^{\prime}) to its own location vector (x1,,xd)(x_{1},\dots,x_{d}^{\prime}), where preipre_{i} is the average location of all devices this smart device monitored during the previous epoch on iith dimension. Then we use this vector as an input for LSH. Adding some randomness to the vector entries for load balance purposes is also an option (our default setting). We append (x1,,xd)(x_{1},\dots,x_{d}^{\prime}) to a simple device’s location vector to make all devices have the same dimensions.

The resultant locality-sensitive hash function h𝒂,b=𝒂𝒗+brh_{\bm{a},b}=\lfloor{\frac{\bm{a\cdot v}+b}{r}}\rfloor adapted from (Datar et al., 2004), maps a dd dimensional vector 𝒗\bm{v} onto a set of integers, where aa is a random dd dimensional vector whose entries are independently chosen from a pp-stable distribution (Gaussian distribution in our case) (Zolotarev, 1986; Datar et al., 2004); bb is a real number uniformly chosen from [0,r][0,r] and rr is a parameter. To further reduce the probability that vectors with very different entries fall into the same buckets, several individual hash functions can be concatenated, such that H=(h1,,hm)H=(h_{1},...,h_{m})). Additionally, we can use ll hashes (HHs) to increase the number of similar vectors in the same buckets. That is, we have ={H1,,Hl}={(h11,,h1m),,(hl1,,hlm)}\mathcal{H}=\{H_{1},...,H_{l}\}=\{(h_{11},...,h_{1m}),...,(h_{l1},...,h_{lm})\}.

Refer to caption
(a) Average quorum distance. NN = 500, #smart devices = 200.
Refer to caption
(b) Average kk-group overlap. NN = 500, #smart devices = 200.
Refer to caption
(c) Average quorum distance for different system sizes (N).
Refer to caption
(d) Average kk-group overlap for different system sizes (N).
Figure 2. Average quorum distance and kk-group overlap for different selection policies, parameters and system sizes.

We give some intuition for choosing the values of these LSH parameters. Large mm values exclude nodes that are not similar and reduce the number of candidates, while large ll and rr may increase this number (Fig. 2(a) and Fig. 2(b)). The default setting in our paper is m=2,l=2,r=4m=2,l=2,r=4. In Fig. 2(c) and Fig. 2(d) we tested our default setting for grid topologies with different sizes. We can see that while the average hop count to achieve quorum (quorum distance) increases linearly with the system size for the random selection policy, it stays small for the LSH-based selection policy. More experiments on random and clustered networks can be found in Sec. 6.

NN 50 250 500 750 1000
#candidates 13.0223 34.1258 50.7501 52.8035 66.9135
Table 2. Average number of kk-group member candidates for different NNs. 40% of the devices are smart ones.

Although the kk-group overlap based on LSH drops for large NN (Fig. 2(d)), it is much larger than the random policy. We hypothesize that with more nodes, and thus more candidates, the probability of the same device being selected in consecutive epochs grows smaller. This hypothesis is validated by Table 2. One may also adjust LSH parameters to reduce the candidate count without affecting quorum distance. E.g., at N=1000N=1000, m=3,l=2,r=4m=3,l=2,r=4, the average number of candidates is 21 and the average quorum distance is 2.4 (vs. 5.3 for the random policy).

4. Routine & Device Management

We now describe how sense-trigger-actuation management of routines and devices builds atop the kk-groupsof Section 3.

4.1. Device Management

In CoMesh each device is assigned to a unique kk-group. This kk-group is responsible for sensing actions on the device, i.e., tracking the up/down status of the device, and the latest state of the device (e.g., temperature readings for a thermostat). The leader of the device’s kk-group (henceforth called the device leader) periodically pings the device. If the device does not respond within 2 RTTs, the device is presumed failed, and the device leader notifies all smart devices in the topology. If the detected state of the device DD (received in the ack) differs from its latest recorded state at the device leader, the device leader sends this new state to all the routines which contain DD in their triggering clauses (specifically, sending to the leaders of those routine’s respective kk-groups).

4.2. Routine Management

Recall that a routine consists of a trigger clause (arbitrary boolean clause involving multiple devices and their states), and a set of commands to execute when the routine is triggered. The set of devices in the former trigger set (read from) and the latter command set (actuated on) may be different or may overlap.

In CoMesh each routine is assigned a unique kk-group, which is responsible for sensing (monitoring), triggering, and actuation (execution) of the routine. The sensing is done in collaboration with device kk-groups, as described earlier. A routine may be in one of the following states (maintained at its kk-group): (i) NOT_TRIGGERED, (ii) ACQUIRING_LOCKS, (iii) EXECUTING, and (iv) RELEASING_LOCKS. We call the leader node of the routine’s kk-group as the routine leader. The routine leader connects with the device leaders of each device in the routine’s trigger set. Whenever these device leaders send updated states to the routine leader (Section 4.1), the latter checks if this state change satisfies the trigger clause for the routine.

When a routine is triggered, the routine leader changes its state from NOT_TRIGGERED, to ACQUIRING_LOCKS, and replicates this state to all its kk-group members. It waits for (f+1f+1) responses, counting itself. When a routine starts it first acquires (virtual) locks to all devices in the routine’s command set. Then the routine executes its commands. Finally when the routine is finished it releases all its locks. This pessimistic concurrency control approach has been shown to be fast, avoid deadlocks and livelocks, and scales well (Ahsan et al., 2021).

The lock for a given device is only maintained in that device kk-group (rather than the routine kk-group)—this allows multiple routines to compete for locks. For a routine RR to acquire a device DD’s lock, RR’s routine leader communicates with DD’s device leader. (If the routine leader is unaware of a current device leader, it contacts all kk-group members of that device, to know the leader via Section 3.1’s zero-message exchange protocol, and then contacts the device leader.)

The device leader maintains a wait list of routines requesting a lock, and grants only one lock at a time (in FIFO order). When a lock is released by a routine leader (at RELEASING_LOCKS stage), the device leader dequeues the next entry from the wait list, sends a request to all its device kk-group members, and waits for (f+1)(f+1) acks, before it sends back the lock to this requesting routine leader. When all locks have been acquired, the routine leader changes the routine’s state to EXECUTING, communicates this to all routine kk-group members, waits for (f+1)(f+1) acks, and then starts issuing the commands for the routine. A routine’s command is always routed from the routine leader to the appropriate device leader which then forwards it to the device. This way the device leader (kk-group) always knows the device’s latest status.

The order in which locks are acquired can affect the speed of triggering. There are two design options—sequential and optimistic.

4.2.1. Sequential Lock-Acquiring (SLA) Strategy

The routine leader acquires locks sequentially. If the locks are acquired in increasing order of device ID, this prevents deadlocks (cycles cannot occur) among multiple routines with conflicting device sets. The SLA strategy can be slow when there are no or few conflicts.

4.2.2. Optimistic Lock-Acquiring (OLA) Strategy

CoMesh’s Optimistic Lock-Acquiring (OLA) attempts to acquire all locks for its (desired) devices simultaneously. If any lock request fails, the routine leader releases all locks it acquired, and retries all again. To prevent message implosion, retries occur after a backoff timeout. OLA is costly when routines’ command sets overlap.

We experimentally compare these two strategies later.

5. Formal Analysis

We formally analyze CoMesh’s properties. For brevity, we intuitively summarize our findings first. Then we state the formal theorems. The proofs can be found in Appendix A.

  • \bullet

    Inheritance: When a k-group changes, the state held by the old leader and new leader are identical.

  • \bullet

    Safety: No two routines that touch an overlapping set of devices, are allowed to execute simultaneously.

  • \bullet

    Liveness: No routines deadlock.

  • \bullet

    Progress: Every routine makes progress.

  • \bullet

    We also calculate the (probabilistic) availability of the system when more than ff devices fail.

Definition 0.

Inheritance means that given a kk-group, the old and new versions of the kk-group—before and after epoch change, or after leader failure—maintain identical state.

We now state the key results.

Lemma 0.

[Inheritance] Inheritance is guaranteed after the state transfer stage when at most one node fails. That is, the state held by an old kk-group leader is the same as the state held by the new kk-group leader after the state transfer stage, under failure of at most one node.

Theorem 3.

[Safety] No two routines that conflict in devices (i.e., their touched device sets are not disjoint) can execute simultaneously, when there is at most one node failure. This is true for both SLA and OLA locking strategies (Section 4.2.1).

Theorem 4.

[No Deadlocks] No two routines are stuck in a deadlock.

Theorem 5.

[Progress/Liveness] Consider a set of executing routines \mathcal{R} each still waiting to acquire 1 or more locks, and the (union) set of devices DD they are waiting for. If no further routines arrive into \mathcal{R}, and SLA locking is used, then: at least one routine from the set \mathcal{R} will make progress (i.e., get access to one more device that it desires).

Refer to caption
(a) Availability under different number of simultaneous failures.
Refer to caption
(b) Impact of different number of groups in the system
Figure 3. Probability of CoMesh running correctly under different parameters. (Default S=100, G=30, f=5)
Theorem 6.

[Availability] With F(>f)F(>f) simultaneous failures, the system remains available, i.e., all kk-groups contain at least (f+1)(f+1) non-faulty nodes, with probability P(F)P(F{}) that varies as:

  1. (1)

    When FfF{}\leq f{}, P(F)=100%P(F)=100\%

  2. (2)

    When FfF{}\geq f{}, the availability is:

    P(F)\displaystyle P(F) =P(at most f failures across k-groups)\displaystyle=P(\text{at most }f\text{ failures across }k\text{-groups})
    =P(at most f failures in one k-group)G\displaystyle=P(\text{at most }f\text{ failures in one }k\text{-group})^{G}
    =(i=max(0,k+FS)f(SFki)×(Fi)(Sk))G\displaystyle=\Bigg{(}\sum_{i=\max(0,k+F-S)}^{f}\frac{\binom{S-F}{k-i}\times\binom{F}{i}}{\binom{S}{k}}\Bigg{)}^{G}

Practically, this scales. Fig. 3(a) shows that as FF rises, CoMesh availability drops as expected but still stays above 50% when FF{} grows up to 9×f\times f. Fig. 3(b) shows that in order to reach high availability (p=50%,90%,99%p=50\%,90\%,99\%), the required FF value drops very slow—this indicates scalability with the number of groups.

6. Simulation Evaluation

To perform large-scale experiments with 100s of nodes on CoMesh, as well as to measure its internal behavior, we wrote a custom simulator. Higher-fidelity simulators like NS3 have known scalability issues beyond a couple of hundred nodes. We measure both: i) user-facing metrics, e.g., latency to start routines, load balancing, etc., and ii) microbenchmarks of kk-group operations.

Our simulator models network contention by scaling down network bandwidth based on density: specifically sending bandwidth from node AA to node BB is set to the default node bandwidth divided by number of 1-hop neighbors of AA. We set default node bandwidth (Table 3) conservatively based on bandwidth numbers reported via user complaints on online forums about IoT devices (McNally, 2022; rpi, 2019). Latency on each hop is fixed to 1 time unit, as we assume that transmission radii are small. Ad-hoc routing is based on Dijkstra’s shortest paths.

Parameter Notation Default value
# all devices NN 250
# failures for completeness ff 2
k-group size KK 5
# smart devices SS 100 (40% of all devices)
#seeds 10
epoch length 200
max routine length 5
avg devices per routine 5
# device managed by kk-group 25
bandwidth cap 625 KBps/5Mbps
kk-group selection policy LSH-random mix
device cluster policy locality
leader election policy LSH smallest hash
default LSH parameters m = 2, l = 2, r = 4
Table 3. Default parameter values

We use three different network topologies: 3D grid, random, and clustered. We ensure the graph is connected. CoMesh was configured with topology-aware LSHMix (Sec. 3.6). Table 3 shows default parameters.

6.1. Client Delay

Refer to caption
(a) Grid topology
Refer to caption
(b) Random topology
Refer to caption
(c) Clustered topology
Figure 4. Client Delay (sys) Under Serial vs. Parallel Locking Strategy in three different topology setups

When a routine’s trigger conditions become true, how long does it take for the routine to execute its first command, given there are no other routines active? This is the client delay for a routine: the time it takes for its kk-group to detect (and replicate) the trigger, lock all of its touched devices, and start execution. In Fig. 4, we observe that CoMesh’s client delay scales with device count, regardless of topology or locking strategy (SLA = serial or OLA = parallel). The trend is gradually increasing in all plots.

As expected the parallel locking (OLA) strategy is faster than serial locking (SLA) because client delay excludes conflicting routines. Among the 3 topologies, grid has higher client delay than random and clustered (which are similar) because grid topology’s devices are more spread out. That is, while all three topologies have the same overall density, the variance of density in different parts of the network is higher for random and clustered topologies, causing “clusters” of nodes to form and speeding up communication.

6.2. Churn Effect on Client/Sync Delays

Refer to caption
Figure 5. Client Delay and Synchronization Delay when 40% of the smart devices in a grid topology experience churn (either a join or a failure)

Fig. 5 shows that when 40% of the smart devices’ nodes are churned in a grid topology (SLA locking), the client delay increases only 25%. This extra delay accounts for the leader election and state transfer to the new leader after a kk-group’s old leader has failed. We define synchronization delay between a routine releasing its last conflicted lock, and the next waiting (blocked) routine executing its first command. This experiment was performed with two instances of the same routine triggered back to back (i.e., all devices are conflicting devices). We observe that sync delay also rises slowly with system size—increasing number of nodes by 15×\times from 50 to 750 causes <4×<4\times increase in sync delay (and client delay).

One may be led to believe that client delay is higher than sync delay. Yet, each has operations excluded in the other. Sync delay excludes triggering of routine and associated kk-group quorum. Client delay excludes lock release latency and associated quorums.

6.3. Load Distribution

Refer to caption
(a) Temporal load distribution with locality kk-groups
Refer to caption
(b) Spatial load distribution with locality kk-groups
Refer to caption
(c) Temporal load distribution with random kk-groups
Refer to caption
(d) Spatial load distribution with random kk-groups
Figure 6. CoMesh’s load distribution across time and space for the two member and leader selection algorithms (locality vs. random)

Fig. 6 shows the CDF of load distributions, in a grid topology, for the leader, for member, and for all (leader or member) nodes. Temporal load is the amount of time that a smart device serves in a particular role over the entire experiment duration. Spatial load is the number of concurrent k-groups that a smart device is present in.

The top two plots show CoMesh with LSH and the bottom two plots show CoMesh with random member selection. As expected, random member selection is the most load balanced across nodes and across time. Although LSH clusters responsibilities closer to target devices leading to load imbalance (top two plots), the median temporal load (left two plots) for member and leader/member are surprisingly comparable between random and LSH, showing that LSH balances member load almost as well as random. LSH member load has a higher tail due to clustering effects. The same clustering effects lead to repeated leader candidates and increase the median leader load in LSH (top left plot).

6.4. k-Group Operations

Refer to caption
Figure 7. Delays of operations within a kk-group for different values of kk
Refer to caption
Figure 8. Delays of operations within a kk-group for different policies for entity clustering, member selection, and leader election
Refer to caption
Figure 9. Delays of operations within a kk-group for different percentages of node churn

We now measure the internal performance within kk-groups. We use the default parameters of Table 3. Fig. 7 shows that all kk-group operations—election, quorum, and state transfer—scale well as the size of the kk-group is increased from 3 (f=1f=1) to 11 (f=5f=5). Quorums are scalable because requests and replies are sent in parallel. Leader election delays are scalable in the common case because of the best case of the Bully algorithm. State transfer delay is a function of the amount of state, but does not depend on kk.

Fig. 8 confirms our hypothesis that locality-aware mechanisms for kk-group entity selection, member selection, and leader election decrease the operational delays of a kk-group. The state transfer operation’s delay is not decreased as much as the quorum or leader election’s delay since, apart from a quorum, the unicast to the old leader is not affected by LSH.

Fig. 9 shows the k-group’s operations are not affected by the smart devices’ churn in the topology. Only when the churn percentage affects about 60% of smart devices do kk-group operations’ get slower. This is because a kk-group’s operation only requires a majority of its members. Upon the failure of ff members in a kk-group, the leader ensures that failed members are replaced so that future failures of up to ff smart devices do not affect it either.

7. Deployment Evaluation

We implemented CoMesh in the Raspberry Pi (RPi) 4 environment. Our implementation contains 12.5K lines of Java code. Our deployments are on RPi 4 model B, possessing 2GB of LPDDR4 RAM and a Broadcom BCM2711 quad-core CPU of 1.5GHz Cortex-A72 cores. While Section 6 showed larger-scale simulation results, our goal with RPi’s is to benchmark the behavior of CoMesh’s core mechanism, i.e., k-groups, on a small number of real devices.

We deployed 11 RPi 4 devices in our lab, forming different mesh topologies. To attenuate RPi’s strong signal and create the mesh topology, we consistently wrap each device in aluminum foil to reduce transmission power to the lowest setting, 15dBm (Ahsan et al., 2021). While CoMesh works with any ad-hoc routing protocol, for concreteness we use OLSR routing (OLSR.org, 2021) due to its standardization in Pi 4s.

We configure CoMesh to use underneath it the open-source implementation of the Medley (Yang et al., 2022) membership protocol. CoMesh is configured to use kk-groups with k=3k=3, and monitors devices every second. To avoid cold start biases, we start measurements 10 s after CoMesh is initialized. Each data point involves at least 5 trial runs with different seeds. Epochs change every 60 s. Different routines may be triggered simultaneously. The same routine may be triggered multiple times, but never within less than 10 s of its own previous invocation. We use SLA locking (Sec.4.2.1).

Refer to caption
(a) Grid Topology
Refer to caption
(b) Line Topology
Figure 10. Raspberry Pi Topologies: Lab Deployment.

We implement two Pi topologies shown in Fig. 10: (i) a 4 ×\times 4 2D Grid-like topology and (ii) an L-shaped Line (modeling the outline of one floor of our office building). Each covers an area of 4 meters ×\times 4 meters. The Line topology’s diameter is higher than Grid’s.

Bandwidth Usage—Decentralized vs. Centralized: Fig. 11 shows decentralized CoMesh’s bandwidth per node, both end to end (E2E) messages, as well as ad-hoc routed (hop to hop/ H2H) messages. For comparison we also instantiated CoMesh with k=1k=1 and one kk-group, i.e., a single centralized device responsible for all routines’ monitoring and management—this is akin to using a central home hub (Sec. 1). We observe that: a) Decentralized CoMesh reduces bandwidth by an order of magnitude (over 10 ×\times) compared to the centralized approach. b) Both end to end bandwidth and hop bandwidth are scalable, i.e., remain flat, as the number of routines grows. c) Bandwidth of foreground operations (e.g., election, k-group migration, etc.) remain small compared to unavoidable background bandwidth (e.g., device monitoring).

Refer to caption
Figure 11. Bandwidth: Decentralized CoMesh vs. Centralized.
Refer to caption
Figure 12. Lock Acquisition Time and Sync Delay.

Wait Times to Start a Routine: Fig. 12 shows two metrics for any given routine. First, the Lock Acquisition time—the average time to acquire a device lock after it becomes available—varies between 3.41 s - 13.02 s. As routine count increases, median latencies remain stable for Grid, and rise slightly for Line. Second, the Sync delay—the time between the routine’s last device becoming available and the routine starting—stays stable with increasing routine count, with median between 10.23 s - 20.833 s.

Inter-kk-group Operation Latencies: Table 4 shows that CoMesh’s common operations are fast and take a median of 1.08 s for quorum, 1.17 s for election, and 2.07 s for state transfer during epoch change.

kk-group delay median mean P90
Quorum reply 1.0805 1.4021 1.4962
Leader election 1.1675 1.4913 1.9249
State transfer 2.0750 2.8468 3.1390
Table 4. CoMesh’s kk-group Latencies (seconds).

8. Related Work

Consistent Hashing & Committees: Peer-to-peer (p2p) networked systems such as Chord (Stoica et al., 2001), Dynamo (DeCandia et al., 2007), Cassandra (Lakshman and Malik, 2010) employ consistent hashing (Karger et al., 1997) to find the node storing a key-value pair. CoMesh’s kk-group mechanism is inspired by consistent subgroups used by the election protocol in (Gupta et al., 2000), but that mechanism is neither dynamic, nor intended for IoT settings, nor has locality.

Consensus: A variety of agreement protocols for wireless environment have been proposed, such as Wireless Paxos (Poirot et al., 2019), and WSN Byzantine consensus (Xu et al., 2021). Unlike these consensus algorithms which require many message exchange rounds, CoMesh’s kk-groups are efficient and fast, and work quickly in the common case, with just one communication round for a quorum.

IoT Application Management: IoT routine and application management system have been explored widely, but with a centralized control plane, e.g., HomeOS (Dixon et al., 2012), Beam (Shen et al., 2016), DepSys (Munir and Stankovic, 2014), eHome (Retkowitz and Kulle, 2009), and SafeHome (Ahsan et al., 2021). Rivulet (Ardekani et al., 2017) is distributed and uses smart devices to exchange messages and backup state to shadow nodes. However, unlike CoMesh, Rivulet focuses on fault-tolerance and omits the routine management. Other works on IoT fault-tolerance (Su et al., 2014; Javed et al., 2018; Norris et al., 2020), also do not manage routine execution. To the best of our knowledge, CoMesh is the first fully-decentralized IoT routine management system.

9. Summary

We have presented the CoMesh system that manages routines in a smart space (building, home, or campus) in a fully-decentralized way, without relying on any central components. Analysis of CoMesh’s techniques show that it achieves continuity (inheritance), safety, liveness, and progress. Our implementation and simulation of CoMesh under various topologies show that it scales well with increasing number of devices into the hundreds, and with larger values of ff (number of failures). CoMesh also balances load effectively, both over smart devices and over time. CoMesh opens the door for realizing self-managing versions of today’s smart space systems. Future directions include building central-distributed hybrid management systems, accounting for heterogeneity of smart devices, and energy considerations.

References

  • (1)
  • rpi (2019) 2019. Slow upload on new Raspberry Pi 4 2GB. https://forums.raspberrypi.com/viewtopic.php?t=244832
  • Aarti and Vineet (2022) G. Aarti and K. Vineet. 2022. IoT in Aerospace & Defense Market by Component (Hardware, Software, Services), by Deployment Mode (On-Premise, Cloud), by Connectivity Technology (Cellular, Wi-Fi, Satellite Communication, Radio Frequency), by Application (Fleet Management, Inventory Management, Equipment Maintenance, Security, Others): Global Opportunity Analysis and Industry Forecast, 2020-2030. https://www.alliedmarketresearch.com/internet-of-things-in-aerospace-and-defense-market.
  • Ahsan et al. (2021) S. B. Ahsan, R. Yang, S. A. Noghabi, and I. Gupta. 2021. Home, SafeHome: smart home reliability with visibility and atomicity. In EuroSys.
  • Akyildiz and Wang (2005) I. F. Akyildiz and X. Wang. 2005. A survey on wireless mesh networks. IEEE Commun. Mag. 43, 9 (2005).
  • Amazon Alexa (2014) Amazon Alexa 2014. Amazon Alexa. https://developer.amazon.com/alexa.
  • Apecechea (2019) G. I. Apecechea. 2019. Cryptographic Sortition in Blockchains: the importance of VRFs. https://medium.com/witnet/cryptographic-sortition-in-blockchains-the-importance-of-vrfs-ad5c20a4e018
  • Ardekani et al. (2017) M. Ardekani, R. Singh, N. Agrawal, D. Terry, and R. Suminto. 2017. Rivulet: A Fault-Tolerant Platform for Smart-Home Applications. In Middleware.
  • AWS (2022) AWS. 2022. AWS IoT Core additional metering details. https://aws.amazon.com/iot-core/pricing/additional-details/
  • Chandra and Collis (2021) R. Chandra and S. Collis. 2021. Digital Agriculture for Small-Scale Producers: Challenges and Opportunities. CACM 64, 12 (2021).
  • Datar et al. (2004) M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. 2004. Locality-sensitive hashing scheme based on p-stable distributions. In SoCG.
  • DeCandia et al. (2007) G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. 2007. Dynamo: Amazon’s Highly Available Key-Value Store. In SOSP.
  • Dijkstra (1959) E. W. et al. Dijkstra. 1959. A note on two problems in connexion with graphs. Numerische mathematik 1, 1 (1959).
  • Dixon et al. (2012) C. Dixon, R. Mahajan, S. Agarwal, A. J. Brush, B. Lee, S. Saroiu, and P. Bahl. 2012. An operating system for the home. In NSDI.
  • Fatima et al. (2022) Z. Fatima, M. H. Tanveer, S. Zardari, L. F. Naz, H. Khadim, N. Ahmed, and M. Tahir. 2022. Production plant and warehouse automation with IoT and industry 5.0. Applied Sciences 12, 4 (2022).
  • Feng et al. (2020) Y. Feng, M. Li, C. Zeng, and H. Liu. 2020. Robustness of internet of battlefield things (iobt): A directed network perspective. Entropy 22, 10 (2020).
  • Fox et al. (2019) M. A. Fox, J. L. Breese, and G. Vaidyanathan. 2019. Live Music Performances and the Internet of Things.
  • Future (2022) Market Research Future. 2022. Smart Stadium Market Estimated to Hit 24.3 Billion with a CAGR of 23% during 2021 - 2030. https://www.globenewswire.com/news-release/2022/06/01/2453900/0/en/Smart-Stadium-Market-Estimated-to-Hit-24-3-Billion-with-a-CAGR-of-23-during-2021-2030-Report-by-Market-Research-Future-MRFR.html.
  • Gan and Lee (2018) H. Gan and W.S. Lee. 2018. Development of a Navigation System for a Smart Farm. IFAC-PapersOnLine 51, 17 (2018).
  • Garcia-Molina (1982) H. Garcia-Molina. 1982. Elections in a Distributed Computing System. IEEE TOC C-31, 1 (1982).
  • Google Home (2016) Google Home 2016. Google Home. https://store.google.com/us/product/google_home.
  • Grothaus (2019) M. Grothaus. 2019. That major Google outage meant some Nest users couldn’t unlock doors or use the AC. https://www.fastcompany.com/90358396/that-major-google-outage-meant-some-nest-users-couldnt-unlock-doors-or-use-the-ac
  • Guo et al. (2022) X. Guo, Lo. Shangguan, Y. He, N. Jing, J. Zhang, H. Jiang, and Y. Liu. 2022. Saiyan: Design and Implementation of a Low-power Demodulator for LoRa Backscatter Systems. In NSDI.
  • Gupta et al. (2000) I. Gupta, R. Van Renesse, and K. P Birman. 2000. A probabilistically correct leader election protocol for large groups. In DISC.
  • Hunt et al. (2010) P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. 2010. ZooKeeper: Wait-free Coordination for Internet-scale Systems.. In ATC.
  • ISO Central Secretary (2016) ISO Central Secretary. 2016. Systems and software engineering – Lifecycle profiles for Very Small Entities (VSEs) – Part 1: Overview. Standard ISO/IEC TR 29110-1:2016. International Organization for Standardization, Geneva, CH. https://www.iso.org/standard/62711.html
  • J. (2021) Bojan J. 2021. Internet of Things statistics for 2021 – Taking Things Apart. https://dataprot.net/statistics/iot-statistics/.
  • Javed et al. (2018) A. Javed, K. Heljanko, A. Buda, and K. Främling. 2018. CEFIoT: A fault-tolerant IoT architecture for edge and cloud. In WF-IoT.
  • Johnson and Maltz (1996) D. B. Johnson and D. A. Maltz. 1996. Dynamic source routing in ad hoc wireless networks. In Mobile computing.
  • K. et al. (2021) Mohamad K., Anthony W., and Vamsi T. 2021. Simplifying Backscatter Deployment: Full-Duplex LoRa Backscatter. In NSDI.
  • Karger et al. (1997) David Karger, Eric Lehman, Tom Leighton, Rina Panigrahy, Matthew Levine, and Daniel Lewin. 1997. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In STOC. 654–663.
  • Krishna and Narasimha Murty (1999) K. Krishna and M. Narasimha Murty. 1999. Genetic K-means algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 29, 3 (1999), 433–439. https://doi.org/10.1109/3477.764879
  • Lakshman and Malik (2010) A. Lakshman and P. Malik. 2010. Cassandra: a decentralized structured storage system. IGOPS 44, 2 (2010).
  • Lampropoulos et al. (2019) G. Lampropoulos, Siakas K., and T. Anastasiadis. 2019. Internet of things in the context of industry 4.0: An overview. IJEK (2019).
  • Li et al. (2022) Chenning Li, Xiuzhen Guo, Longfei Shangguan, Zhichao Cao, and Kyle Jamieson. 2022. CurvingLoRa to Boost LoRa Network Throughput via Concurrent Transmission. In NSDI.
  • Lin et al. (2016) K. Lin, W. Wang, M. Bi, Y.and Qiu, and M. M. Hassan. 2016. Human localization based on inertial sensors and fingerprints in the Industrial Internet of Things. Elsevier COMNET 101 (2016).
  • Liu et al. (2022) D. Liu, T. F. Abdelzaher, T. Wang, Y. Hu, J. Li, S. Liu, M. Caesar, D. Kalasapura, J. Bhattacharyya, N. Srour, J. Kim, G. Wang, G. Kimberly, and S. Yao. 2022. IoBT-OS: Optimizing the Sensing-to-Decision Loop for the Internet of Battlefield Things. In ICCCN.
  • McNally (2022) C. McNally. 2022. Google Nest Wi-Fi Review 2022. https://www.reviews.org/internet-service/google-nest-wifi-review/
  • Memoori (2022) Memoori. 2022. The Global Market for the Internet of Things in Smart Commercial Buildings. https://memoori.com/portfolio/the-internet-of-things-in-smart-commercial-buildings-2022-to-2027/.
  • Micali et al. (1999) S. Micali, M. Rabin, and S. Vadhan. 1999. Verifiable random functions. In FOCS.
  • Moore et al. (2020) S. J. Moore, C. D. Nugent, S. Zhang, and I. Cleland. 2020. IoT reliability: a review leading to 5 key research directions. CCF TPCI 2 (2020).
  • Munir and Stankovic (2014) S. Munir and J. A. Stankovic. 2014. Depsys: Dependency aware integration of cyber-physical systems for smart homes. In ICCPS.
  • Norris et al. (2020) M. Norris, B. Celik, P. Venkatesh, S. Zhao, P. McDaniel, A. Sivasubramaniam, and G. Tan. 2020. IoTRepair: Systematically Addressing Device Faults in Commodity IoT. In IoTDI.
  • OLSR.org (2021) OLSR.org. 2021. Optimized Link State Routing Protocol. https://tinyurl.com/olsrd-wiki.
  • Perkins et al. (2003) C. Perkins, E. Belding-Royer, and S. Das. 2003. RFC3561: Ad hoc on-demand distance vector (AODV) routing.
  • Poirot et al. (2019) V. Poirot, B. Al Nahas, and O. Landsiedel. 2019. Paxos Made Wireless: Consensus in the Air.. In EWSN.
  • Research (2022) Emergen Research. 2022. Internet of Things In Agriculture Market, By System (Automation and Control Systems, Sensing and Monitoring Devices, Livestock Monitoring Hardware, Fish Farming Hardware), By Application, and By Region Forecast to 2030.
  • Retkowitz and Kulle (2009) D. Retkowitz and S. Kulle. 2009. Dependency management in smart homes. In DAIS.
  • Ryu et al. (2015) M. Ryu, J. Yun, T. Miao, I.-Y. Ahn, S.-C. Choi, and J. Kim. 2015. Design and implementation of a connected farm for smart farming system. In Sensors. IEEE.
  • Samsung Smart Things (2012) Samsung Smart Things 2012. Samsung SmartThings. https://www.smartthings.com/.
  • Shahid et al. (2021) H. Shahid, M. A. Shah, A. Almogren, H. A. Khattak, I. U. Din, and C. Kumar, N.and Maple. 2021. Machine Learning-Based Mist Computing Enabled Internet of Battlefield Things. ACM TOIT 21, 4 (2021).
  • Shen et al. (2016) C. Shen, R. P. Singh, A. Phanishayee, A. Kansal, and R. Mahajan. 2016. Beam: Ending monolithic applications for connected devices. In ATC. 143–157.
  • Sivakumar et al. (2021) Arun Narenthiran Sivakumar, Sahil Modi, Mateus Valverde Gasparino, Che Ellis, Andres Eduardo Baquero Velasquez, Girish Chowdhary, and Saurabh Gupta. 2021. Learned Visual Navigation for Under-Canopy Agricultural Robots. In Robotics: Science and Systems.
  • Song et al. (2008) J. Song, S. Han, A. Mok, D. Chen, M. Lucas, M. Nixon, and W. Pratt. 2008. WirelessHART: Applying wireless technology in real-time industrial process control. In RTAS.
  • Stoica et al. (2001) I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. 2001. Chord: A scalable peer-to-peer lookup service for internet applications. SIGCOMM 31, 4 (2001).
  • Su et al. (2014) P. H Su, C.-S. Shih, J. Y.-J. Hsu, K.-J. Lin, and Y.-C. Wang. 2014. Decentralized fault tolerance mechanism for intelligent IoT/M2M middleware. In WF-IoT.
  • Turchet et al. (2018) L. Turchet, C. Fischione, G. Essl, D. Keller, and M. Barthet. 2018. Internet of musical things: Vision and challenges. IEEE access 6 (2018).
  • Vasisht et al. (2017) D. Vasisht, Z. Kapetanovic, J. Won, X. Jin, R. Chandra, S. Sinha, A. Kapoor, M. Sudarshan, and S. Stratman. 2017. FarmBeats: An IoT Platform for Data-Driven Agriculture. In NSDI.
  • Wang et al. (2022) L. Wang, A. Hamad, and V. Sakthivel. 2022. IoT assisted machine learning model for warehouse management. Journal of Interconnection Networks 22, Supp02 (2022).
  • Xu et al. (2021) Mi. Xu, C. Liu, Y. Zou, F. Zhao, J. Yu, and X. Cheng. 2021. wChain: A Fast Fault-Tolerant Blockchain Protocol for Multihop Wireless Networks. TWC (2021).
  • Yang et al. (2022) R. Yang, J. Wang, J. Hu, S. Zhu, Y. Li, and I. Gupta. 2022. Medley: A Membership Service for IoT Networks. IEEE TNSM 19, 3 (2022).
  • Yang et al. (2005) Y. Yang, J. Wang, and R. H. Kravets. 2005. Designing routing metrics for mesh networks.
  • Zolotarev (1986) V. M. Zolotarev. 1986. One-dimensional stable distributions. Translations of Mathematical Monographs, Vol. 65. 277 pages.

Appendix A Appendix: Formal Analysis

We formally analyze CoMesh’s properties. For the reader who wishes to skip this section, we summarize our findings:

  • \bullet

    Inheritance: When a k-group changes, the state held by the old leader and new leader are identical.

  • \bullet

    Safety: No two routines that touch an overlapping set of devices, are allowed to execute simultaneously.

  • \bullet

    Liveness: No routines deadlock.

  • \bullet

    Progress: Every routine makes progress.

  • \bullet

    We also calculate the (probabilistic) availability of the system when more than ff devices fail.

A.1. Inheritance

Definition 1.

Inheritance means that given a kk-group, the old and new versions of the kk-group—before and after epoch change, or after leader failure—maintain identical state.

For a device kk-group, the state that must be transferred to the new leader includes: the device’s current availability and the lock queue (if applicable). For a routine kk-group, the new leader should receive its state which includes: the routine stage, the IDs of devices that have already been locked (if applicable at the time), and the IDs of devices that have already been released (if applicable at the time).

Lemma 1.

Inheritance is guaranteed after the state transfer stage when at most one node fails. That is, the state held by an old kk-group leader is the same as the state held by the new kk-group leader after the state transfer stage, under failure of at most one node.

Proof.

If there are new requests coming during the state transfer stage, request processing will be delayed until the state transfer stage finishes. The state transfer stage happens in two cases: leader failure and/or epoch change.

Case 1: When there is no epoch change, upon kk-group leader LoL_{o}’s failure, the new leader LnL_{n} sends a message to all kk-group members requesting their local kk-group state. All the members reply with their local state back to LnL_{n}. LnL_{n} includes a state entry ee in the recreated state if and only if there is at least 1 received state that contains ee. More specifically for the device kk-groups, which may include a lock queue, every routine request in the lock queue comes with a sequence number denoting when it reached DkD_{k}’s kk-group leader. LnL_{n} includes a routine entry ee in the recreated lock queue if and only if there is at least one received queue that contain ee. Since each routine entry ee in the lock queue was replicated by at least a quorum of kk-group members before the old leader LoL_{o}’s failure, LnL_{n} is guaranteed to receive the exact same set of routine requests. LnL_{n} will sort the queue based on the gathered requests’ sequence numbers. Thus, the recreated lock queue is exactly the same as LoL_{o}’s latest lock queue; which also guarantees that LnL_{n}’s recreated state is identical to LoL_{o}’s latest state.

Case 2: Upon epoch change, the system enters state transfer stage: the new kk-group’s leader LnL_{n} requests the old kk-group’s leader LoL_{o}’s state, to which LoL_{o} replies accordingly. There are 4 possibilities during the state transfer stage: a) the old leader fails, b) the new leader fails, c) a non-leader member of the old kk-group fails, d) a non-leader member of the new kk-group fails.

Case 2a: If old leader LoL_{o} fails after state transfer starts but before state transfer completes, the new leader LnL_{n} sends a request for local states to all of the old kk-group’s members and recreates LoL_{o}’s lock queue as in Case 1.

Case 2b: If new leader LnL_{n} fails before it receives the state from old leader LoL_{o}, another new leader LnL_{n}^{\prime} is elected and sends a new state request to LoL_{o}. LoL_{o} will reply again with the kk-group state.

Case 2c: If a non-leader member of the old kk-group fails, the state transfer will not be affected and the new kk-group leader LnL_{n} will receive the correct state.

Case 2d: If a non-leader member of the new kk-group fails, another member will be recruited to take its place. The new kk-group leader will distribute the state to the new member. ∎

Lemma 2.

Inheritance can be guaranteed after the state transfer stage when at most ff{} nodes fail.

Proof.

If there are no epoch changes (Lemma 1 Case 1) or failures happen only among non-leader members during the epoch change (Lemma 1 Cases 2c2d), each routine entry ee is maintained at at least f+1f=1f+1-f=1 alive member. Thus, the state can be recreated correctly as in Lemma 1 Case 1.

Upon epoch change, if the old leader LoL_{o} fails before the new leader LnL_{n} gets the state, LnL_{n} (Lemma 1 Case 2a) can reconstruct the same state as LoL_{o} since each state entry ee is maintained at at least f+1f=1f+1-f=1 alive old kk-group member. If LnL_{n} keeps failing before it gets or re-constructs the state, the latest new leader can re-construct it similarly.

If LoL_{o} is alive and LnL_{n} fails during epoch change, the working process is the same as Lemma 1 Case 2b. ∎

A.2. Safety

Theorem 1.

[Safety] No two routines that conflict in devices (i.e., their touched device sets are not disjoint) can execute simultaneously, under at most one node failure.

Proof.

Assume routines R1R_{1} and R2R_{2} share devices. We analyze SLA and OLA separately.

Sequential Lock-Acquiring Strategy (SLA): First, we will analyze Sequential Lock-Acquiring Strategy (Section 4.2.1). Let DkD_{k} be that shared device that has the minimum IDID, with its kk-group leader denoted as LkL_{k}. If LkL_{k} does not fail, by definition at most one routine can reach quorum and therefore acquire the lock on the device.

No new requests are handled during leader transition. Such requests are put into new leader’s (LnL_{n}) wait queue. The request/quorum to add such requests to the lock queue will be initiated only after the leader transition completes. Therefore the rest of the proof handles only outstanding requests that have been approved.

If the leader LkL_{k} fails, several cases arise:

Case 1: If LkL_{k} fails before it has replicated the request from R1R_{1} to a quorum of kk-group nodes, there is no further operation. R1R_{1}’s kk-group leader resends R1R_{1}’s request after a timeout T of not receiving an acknowledgement.

Case 2: If LkL_{k} fails after it has replicated R1R_{1}’s request to a quorum of kk-group members, the new leader LnL_{n} adds R1R_{1} to the lock queue during the reconstruction phase when it queries (a quorum of) k-group members for their pieces of the state.

Case 3: If LkL_{k} fails after R1R_{1}’s request reaches the top of the lock queue, two sub-cases occur: a) If the request has been approved by a quorum of nodes (i.e., been forwarded to nodes, and a quorum has approved R1R_{1} to acquire the lock), the new leader LnL_{n} marks R1R_{1} as the approved routine and notifies R1R_{1}’s kk-group leader. b) If the request has not yet reached quorum (to approve the routine for lock acquisition): then the new leader LnL_{n} adds R1R_{1} to the lock queue, and continues its normal processing.

Case 4: If LkL_{k} fails after it has sent a Locked message to R1R_{1}’s k-group, the new leader LnL_{n}: 1) re-constitutes the lock queue from the surviving nodes, and 2) (re)notifies R1R_{1}’s kk-group leader of its approval.

Optimistic Lock-Acquiring Strategy (OLA): For each device DkD_{k} that is touched by both R1R_{1} and R2R_{2}, only one of the two routines’ lock requests will be processed first by DkD_{k}’s kk-group leader LkL_{k} and reach quorum for the device pre-lock, as long as LkL_{k} does not fail. By definition only this routine will go on to lock the device if all other desired devices are also available and each desired device’s DkD_{k}’s LkL_{k} does not fail.

If a leader LkL_{k} does fail, two cases arise:

Case 1: If LkL_{k} fails before it gets a quorum of acknowledgement for pre-lock replication, the new leader LnL_{n} will have no record of it. R1R_{1}’s kk-group leader will resend the request after timeout.

Case 2: If LkL_{k} fails after it has replicated R1R_{1}’s pre-lock request, but before it hears back from R1R_{1}’s leader for further action (of either locking for execution or releasing due to failed pre-lock), the new leader LnL_{n} will know R1R_{1} has pre-locked DkD_{k} from its group member and resend pre-lock information back to R1R_{1}.

Case 3: If LkL_{k} fails after it hears from R1R_{1} for lock acquisition or releasing, but before it gets quorum acknowledgement for replication, R1R_{1}’s kk-group leader will resend the request after timeout and the locking process keeps up.

Case 4: If LkL_{k} fails after it has locked DkD_{k} and notifies R1R_{1}’s kk-group the new leader LnL_{n} 1) re-constitutes the state from the surviving nodes, and 2) (re)notifies R1R_{1}’s kk-group leader of its approval. Therefore, the OLA strategy is safe as well. Thus CoMesh is safe under at most one node failure. ∎

A.3. Liveness

Theorem 2.

[No Deadlocks] No two routines are stuck in a deadlock.

Proof.

Each SLA and OLA violate separate necessary conditions for deadlocks. For SLA, routines only wait for higher-IDID devices than they have already locked. This means no routine waits for lower-IDID devices than it has locked. Thus, there is no “wait for” cycle among routines. As for OLA, routines do not wait for any already-locked devices. If a device is already (pre-)locked, the new routine will abort and try to acquire its touched devices’ locks at a later time. Without “hold and wait” there is no deadlock.

Leader change and failure do not affect device locks held by CoMesh. Meanwhile Lemmas 12 guarantee inheritance. Thus lock assignment is not influenced by leader changes and failures, i.e., there is no deadlock. ∎

Finally, we prove that CoMesh always makes progress. We prove the induction step via the following theorem (this theorem can be applied iteratively).

Theorem 3.

[Progress/Liveness] Consider a set of executing routines \mathcal{R} that are each still waiting to acquire 1 or more locks, and a set of devices DD they are waiting for. If no further routines arrive into \mathcal{R}, and SLA is used for locking, then: at least one routine from the set \mathcal{R} will make progress (i.e., get access to one more devices that it desires).

Proof.

Consider the highest-IDID device DD in set DD, and the routine RR that currently holds the lock on device DD. Then RR cannot be waiting for any further locks (otherwise DD would not be the highest-IDID in DD), and thus RR will eventually complete. At that point, the routine at the head of DD’s queue will be granted access to DD and thus will make progress. ∎

For OLA, an adversary may always be able to prevent two competing routines from making progress. However, in practice, we observed that routines make fast progress in OLA. Nevertheless, because of this, we use SLA as the default in the CoMesh implementation.

A.4. System Availability

While CoMesh can tolerate ff failures, what happens if there are more than ff failures? With F(>f)F(>f) simultaneous failures, we calculate the availability P(F)P(F{}) = the probability that CoMesh still works correctly, i.e., all kk-groups contains at least (f+1)(f+1) non-faulty nodes. Assume we have SS smart devices, GG kk-groups, and each group has kk members.

  1. (1)

    When FfF{}\leq f{}, P(F)=100%P(F)=100\%

  2. (2)

    When FfF{}\geq f{}, the availability is:

    P(F)\displaystyle P(F) =P(at most f failures across all k-groups)\displaystyle=P(\text{at most }f\text{ failures across all }k\text{-groups})
    =P(at most f failures in one k-group)G\displaystyle=P(\text{at most }f\text{ failures in one }k\text{-group})^{G}
    =(i=max(0,k+FS)f(SFki)×(Fi)(Sk))G\displaystyle=\Bigg{(}\sum_{i=\max(0,k+F-S)}^{f}\frac{\binom{S-F}{k-i}\times\binom{F}{i}}{\binom{S}{k}}\Bigg{)}^{G}

The equation for one kk-group is explained as follows. Given FF failures, the numerator is the number of ways to select a kk-group containing (ki)(k-i) non-faulty nodes (left term), and ii faulty nodes (right term), while the denominator is the number of all possible selections for a kk-group. ii varies from ff down to 0 (or a higher number (k+FS)(k+F-S) if there are insufficient alive nodes).

From Fig. 3(a) we observe the availability of CoMesh drops when the number of simultaneous failures FF{} becomes >f>f{}. However, CoMesh still provides 50% of availability when simultaneous failures FF{} grow up to 9×f\times f. Fig. 3(b) shows that to reach a specific percentile of availability pp, CoMesh’s tolerated FF drops very slowly as the number of groups increases—this indicates scalability with the number of groups.