
Distributed Data Storage and Fusion for Collective Perception
in Resource-Limited Mobile Robot Swarms

Nathalie Majcherczyk, Daniel Jeswin Nallathambi, Tim Antonelli, Carlo Pinciroli
Robotics Engineering, Worcester Polytechnic Institute, MA, USA
Email: {nmajcherczyk, cpinciroli}@wpi.edu
Abstract

In this paper, we propose an approach to the distributed storage and fusion of data for collective perception in resource-limited robot swarms. We demonstrate our approach in a distributed semantic classification scenario. We consider a team of mobile robots, in which each robot runs a pre-trained classifier of known accuracy to annotate objects in the environment. We provide two main contributions: (i) a decentralized, shared data structure for efficient storage and retrieval of the semantic annotations, specifically designed for low-resource mobile robots; and (ii) a voting-based, decentralized algorithm to reduce the variance of the calculated annotations in presence of imperfect classification. We discuss theory and implementation of both contributions, and perform an extensive set of realistic simulated experiments to evaluate the performance of our approach.

I INTRODUCTION

Decentralized collective perception is one of the main activities that a swarm of robots must perform. A major challenge in collective perception is dealing with partial and uncertain knowledge. To deploy robots in safety-critical applications, such as disaster recovery or autonomous driving, researchers strive to increase the reliability of the information used for decision-making. In the context of resource-constrained robot swarms, significant effort has been devoted to solving this problem, with particular focus on the best-of-$n$ formulation [1, 2].

In this paper, we deal with a collective perception scenario related to, but more complex than, the best-of-$n$ formulation. We assume that individual robots perform observations, from which they individually derive annotations on the environment through a machine learning algorithm. The algorithm, however, might produce incorrect annotations. The aim of our work is to take advantage of the fact that multiple robots can combine individual annotations to derive a set of more accurate consolidated annotations.

To showcase our approach, as shown in Fig. 1, we consider the case in which the robots must collectively construct and store a coherent set of semantic annotations on features found in the environment, such as pieces of furniture, trees, or doorways. To respect low-memory constraints, we assume that the objects are internally stored as bounding boxes, rather than through more detailed representations such as voxels or graphs. These annotated bounding boxes constitute the essence of a distributed semantic map [3] of the environment. This problem is different from the best-of-$n$ problem. In the latter, the robots must agree on a single choice for the entire swarm; in our problem, the robots must collectively construct a coherent set of annotations.

Semantic mapping poses interesting challenges in the context of resource-constrained collective robotics. Constructing a semantic map entails the collection and storage of large amounts of data. This data is typically processed by the robots individually at first. The result is then shared and fused with the results of other robots to produce a coherent representation of the environment. An effective approach to solve these problems is capable of coping with low memory availability, mobility, noise, uncertainty, and bandwidth limitations [4, 5, 6].

Our approach to semantic mapping draws inspiration from research in ensemble learning [7]. Ensemble learning reduces the variability of individual models through information fusion [8]. In ensemble learning, fusion is achieved through bagging and stacking techniques that aggregate models, typically classifiers, by combining their outputs through an averaging process or through a meta-model trained for that purpose [7, 9].

Refer to caption
Figure 1: Collective semantic annotation application with annotation consolidation.

This paper offers two main contributions. First, we present an algorithm to store localized semantic annotations across a swarm of mobile robots, under the form of a shared global memory. Second, we propose an approach to online fusion of uncertain semantic annotations, formalizing the problem and deriving a solution from first principles. We consider a scenario in which the user has already trained a multi-class classifier on an extensive dataset with multiple annotations of various objects [10]. This classifier is deployed on each robot and used to identify objects for the map, which are shared with the rest of the swarm. The semantic map is constructed by consolidating multiple observations of the same objects into single ones. Through a voting mechanism, our approach copes with the fact that classifiers are imperfect and might mislabel objects. The final result of our approach is a semantic map whose accuracy is superior to that achievable by any individual robot alone. We also provide a closed-form equation to assess the accuracy of a consolidated annotation as a function of the number of votes and the individual classifier accuracy.

Our paper is organized as follows. In Section II we discuss related work. In Section III we state our assumptions and formulate the problem. The components of our framework are described in Section IV. We report the results of our performance evaluation in Section V, and conclude the paper in Section VI.

II RELATED WORK

Best-of-$n$ problem. Previous work in swarm robotics has tackled collective classification problems for one global environmental property. These problems have commonly been referred to as best-of-$n$ [11]. Valentini et al. proposed a honeybee-inspired approach for robots to collectively decide whether a black-and-white colored environment has a majority of white or black tiles [1]. Ebert et al. applied Bayesian estimation to the same problem with the added explicit group-wide agreement on the output [2]. Robust formulations have also been studied in which robots might be affected by noise [12] or act in an adversarial manner [13]. As discussed in the introduction, semantic mapping is a more complex problem, in that it can be seen as a repeated best-of-$n$ problem in which the annotations must also be coherent with each other.

Estimation of continuous variables. Other work has considered the collective estimation of a continuous value. Early work focused on average consensus [14, 15, 16], in which the robots must agree on the average of individual initial estimates of a quantity of interest. More recently, Albani et al. proposed a collaborative weed mapping application that considers inter-robot networking and motion planning [17]. Robots communicate to improve the confidence interval of weed density predictions for each cell in a discrete environment. They assume that robots know the confidence value for each independent prediction and keep the prediction with the maximum confidence. In our work, in contrast, we do not assume the confidence of the prediction known, and resort to a voting process to identify the most likely annotation.

Distributed mapping and semantic classification. Notable recent work in collective perception includes distributed mapping, such as DOOR-SLAM [18]. In DOOR-SLAM, the robots construct a graph-based map (but not a semantic one) in a decentralized fashion, coping with spurious observations by identifying and rejecting outliers. In distributed semantic classification [19], efforts have concentrated on cloud-based methods rather than decentralized ones, such as RoboEarth [3]. In the work of Ruiz et al. [20], a centralized knowledge base is used to build a semantic map. Classification uncertainty is handled through Conditional Random Fields, which estimate beliefs on the annotations by combining spatial and contextual knowledge.

III PROBLEM STATEMENT

In this section, we explain our main modelling assumptions and formulate the distributed storage and collective annotation problems.

III-A Assumptions

III-A1 Multi-Robot System

We consider a system of $N$ robots, each with a unique identifier $i \in \{1, 2, \ldots, N\}$. Each robot can exchange messages within a communication range $C$ with its neighbors $\mathcal{N}_i(t)$. Outgoing messages are limited in size at each time step $t$ to model a finite broadcasting bandwidth. Furthermore, each robot has a limited memory capacity $M_i$ devoted to the housekeeping of a shared data structure. This capacity is subdivided into storage capacity $S_i$ and routing capacity $R_i$. The available memory of a robot at a given time is denoted by $m_i(t)$. Robots move according to a diffusion policy with obstacle avoidance [21]. We assume that each robot can localize itself in a global frame. We further assume robots to be equipped with a perception module. To focus on the problems of storage and fusion, in this paper we abstract away the inner workings of the module and simulate its behavior through the following assumptions: (i) the perception module uses a sensor with physical specifications constraining its viewing frustum (see Figure 2); (ii) the perception module is able to determine the center position of objects perfectly; (iii) the perception module annotates objects imperfectly (see Section III-A3).

III-A2 Object Representation and Detection

We consider robots to be in an environment with non-uniformly distributed objects of different types. We use 3D bounding boxes to represent these objects in space. A bounding box is fully described by its center, dimensions, and orientation [22]. We assume that the sensor of the perception module can detect and localize objects when their front face corners fall inside the sensor viewing frustum (see Figure 2).

Refer to caption
Figure 2: The front corners of an object are inside the sensor viewing frustum.

III-A3 Simulated Classifier

Within the perception module, we simulate the statistical behavior of a trained multi-class object classifier. We leverage the work of Carrillo et al. [23], who derive the posterior distribution of the balanced accuracy, conditioned on the predictions of a trained classifier on a sample test dataset. We use this posterior distribution to generate either correct or incorrect object annotations upon each object detection. To distinguish between cases of erroneous labels, we further model the leftover probability distributions using the confusion matrices of the classifier. The statistical data used for these probabilistic models comes from the SceneNN benchmarking data [24]. SceneNN provides raw outcomes and accuracies for several multi-class classifiers tested extensively on a catalog of point clouds. We use the PB-T50-RS dataset, with translation, rotation and scaling perturbations of the bounding boxes, to compute the posterior distribution modelling the classifier's statistical behavior.
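The sampling step can be sketched as follows. The class list, accuracy values, and the uniform choice among wrong labels are illustrative simplifications of this sketch; the paper draws from a posterior over the balanced accuracy and distributes errors according to the classifier's confusion matrices.

```python
import random

# Illustrative per-class accuracies; NOT the SceneNN/PB-T50-RS statistics
# used in the paper (see Table I for the actual BGA-DGCNN values).
ACCURACY = {"chair": 0.926, "desk": 0.773, "table": 0.741}

def simulate_annotation(true_class, accuracy=ACCURACY, rng=random):
    """Return a (possibly wrong) label for a detected object of class
    `true_class`, correct with the classifier's per-class accuracy."""
    if rng.random() < accuracy[true_class]:
        return true_class  # correct annotation with probability p
    # Erroneous label: here uniform over the remaining classes; the paper
    # instead uses the classifier's confusion matrix for this step.
    wrong = [c for c in accuracy if c != true_class]
    return rng.choice(wrong)
```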

III-B Distributed Storage Problem Formulation

The first goal of our work is to distribute object annotation data across robots in a way that makes efficient use of the collective memory. Each robot can hold a defined number of data items specified by its memory capacity $M_i$. Besides memory constraints, we want to account for communication-related costs.

Since we want to pack data items into bins (robots) with different properties, we formulate the distributed storage problem as a heterogeneous bin packing optimization problem. In particular, we consider the variant known as the Variable Cost and Size Bin Packing Problem (VCSBPP) [25].

Given a set of items $\mathcal{T}$ with different volumes $v_\tau$ to be loaded into a set of available bins $\mathcal{I}$ with different capacities $M_i$ and costs $c_i$, we aim to minimize the total cost of the bins selected to store items:

\min_{a,s} \quad f(a,s) = \sum_{i\in\mathcal{I}} c_i s_i \qquad \text{(1a)}
\text{s.t.} \quad \sum_{i\in\mathcal{I}} a_{\tau i} = 1 \quad \forall \tau \in \mathcal{T} \qquad \text{(1b)}
\sum_{\tau\in\mathcal{T}} v_\tau a_{\tau i} \leq M_i s_i \quad \forall i \in \mathcal{I} \qquad \text{(1c)}
a_{\tau i} \in \{0,1\} \quad \forall \tau \in \mathcal{T},\ \forall i \in \mathcal{I} \qquad \text{(1d)}
s_i \in \{0,1\} \quad \forall i \in \mathcal{I} \qquad \text{(1e)}

with $\{s_i\}_{i\in\mathcal{I}}$ denoting bin-selection variables, and $s_i$ equalling 1 when a bin $i$ is selected for storing items. The set $\{a_{\tau i}\}_{\tau\in\mathcal{T},\, i\in\mathcal{I}}$ contains item-to-bin assignment variables; $a_{\tau i}$ equals 1 if item $\tau$ is assigned to bin $i$. In our variant of the problem, we define bin costs $c_i$ as follows:

c_i = \cfrac{1}{|\mathcal{N}_i| \cdot m_i} = \cfrac{1}{|\mathcal{N}_i| \cdot \left(M_i - \sum_{\tau\in\mathcal{T}} v_\tau a_{\tau i}\right)} \qquad \text{(2)}

We select bin cost to be inversely proportional to the product of the number of neighbors and the memory left after the assignment. This is to model practical considerations such as potential for disconnections and subsequent memory overflows, which are not otherwise modelled in the VCSBPP.
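To make Eq. (2) concrete, the cost of selecting robot $i$ as a bin can be computed as below. Returning infinity for disconnected or full robots is a convention of this sketch only; the evaluation in Section V-C instead caps such terms at 1.

```python
def bin_cost(num_neighbors, capacity, assigned_volume):
    """Cost (Eq. 2) of selecting a robot as a bin: inversely proportional
    to its connectivity and to the memory left after the assignment.
    Disconnected or full robots are treated as infinitely costly here."""
    free = capacity - assigned_volume  # m_i = M_i - sum of assigned volumes
    if num_neighbors == 0 or free <= 0:
        return float("inf")
    return 1.0 / (num_neighbors * free)
```

A well-connected robot with plenty of free memory is thus the cheapest bin, which is exactly the placement the heuristic favors.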

III-C Annotation Consolidation Problem Statement

Our second goal is to improve the map annotation accuracy by using at least $V$ votes for each object identified in the environment. We seek to derive the ensemble probability of success in predicting the correct class, $p_{ens}(V, p)$, as a non-decreasing function of $V$ that is always at least as large as the accuracy $p$ of the correct class.

IV METHODOLOGY

IV-A Overview

Figure 1 shows the interconnection between the components of the collective semantic annotation application with label consolidation. Robots move in an environment containing objects to annotate. Upon detecting an object and deciding to record an annotation, a robot assigns an uncertain class annotation to the corresponding location. The robot writes the annotation into the shared memory as described in Section IV-B. When a robot finds that it stores multiple annotations for the same location, it queries the shared memory to retrieve all annotations related to that location. If the query returns enough annotations, the robot writes the most frequent label for that location into the shared memory as a consolidated annotation. The exact consolidation mechanism and the resulting ensemble accuracy are formalized in Section IV-C. Over time, as the robots explore the environment, the shared memory stores an increasingly complete map of the annotated objects in the form of consolidated annotations. Section IV-D explains the local routine run by the robots to perform the various actions across components.

IV-B Distributed Storage Through SwarmMesh

We implement a shared memory to enable distributed storage across the robots. For this purpose we use SwarmMesh [26], a decentralized data structure for mobile swarms. As part of the work in this paper, we also provide a generic, open-source C++ implementation of this data structure (https://github.com/NESTLab/SwarmMeshLibrary). SwarmMesh is based on a distributed online heuristic solution to the bin packing problem formulated in Section III-B, Eqs. (1)-(2).

SwarmMesh stores data in the form of tuples (key-value pairs). The tuple key k is used for both storing and addressing tuples. It consists of two parts: (i) a unique tuple identifier $\tau$ and (ii) a tuple hash $\rho_\tau$ that depends on the tuple value v to be stored. When a robot $i$ creates a new tuple, the unique tuple identifier $\tau$ is calculated as the concatenation of the binary representations of $i$ and of the count of tuples created so far by the robot. The tuple value v contains the annotation $\lambda_\nu$, the box center, the box orientation, and the vector from the center of the box to its front right corner. These variables fully describe an annotated 3D bounding box.
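One possible encoding of this identifier packs the two fields into a single integer. The 8-bit robot-id and 24-bit counter widths below are assumptions for illustration, not the widths used by the SwarmMesh library.

```python
ID_BITS = 8      # assumed width of the robot identifier field
COUNT_BITS = 24  # assumed width of the per-robot tuple counter field

def make_tuple_id(robot_id, tuple_count):
    """Unique tuple identifier tau: concatenation of the binary
    representations of the robot id and its running tuple count."""
    assert 0 <= robot_id < 2**ID_BITS and 0 <= tuple_count < 2**COUNT_BITS
    return (robot_id << COUNT_BITS) | tuple_count

def split_tuple_id(tau):
    """Recover (robot_id, tuple_count) from the identifier."""
    return tau >> COUNT_BITS, tau & (2**COUNT_BITS - 1)
```

Because each robot only increments its own counter, identifiers are unique swarm-wide without any coordination.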

Each robot calculates a so-called NodeID, $\delta_i$, which determines which tuples can be stored on a robot and which cannot. The robots compute their NodeIDs at each time step as follows:

\delta_i(t) = \begin{cases} m_i(t) \cdot |\mathcal{N}_i(t)| & \text{if } |\mathcal{N}_i(t)| > 0 \\ 1 & \text{otherwise} \end{cases} \qquad \text{(3)}

Intuitively, the NodeID encodes how willing a robot is to store a new tuple, but also how desirable it is to store a tuple on that robot. In Eq. (3), the NodeID increases linearly with the amount of available memory (more room for tuples) and the number of neighbors of a robot (better connectivity, lower likelihood of disconnections, better routing). As the robots move and store or delete tuples, the NodeIDs change over time.
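Eq. (3) transcribes directly into code; in this sketch, free memory is assumed to be measured in tuples.

```python
def node_id(free_memory, num_neighbors):
    """NodeID delta_i(t) from Eq. (3): grows with free memory (room for
    tuples) and with connectivity (lower likelihood of disconnection).
    Disconnected robots fall back to the minimum value 1."""
    if num_neighbors > 0:
        return free_memory * num_neighbors
    return 1
```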

The NodeIDs form a distribution that defines a hierarchy among robots. To decide which robot eventually stores a tuple, SwarmMesh has an analogous mechanism that imposes a hierarchy among tuples. A tuple is passed from robot to robot until $\delta_i(t) > \rho_\tau$. Intuitively, we want tuples with a high hash $\rho_\tau$ to occupy the best robots in the network. Our insight is that the most uncertain annotations are those that should be mapped to higher values of $\rho_\tau$, because this makes them easier to query and route. This makes it more efficient to consolidate uncertain annotations into annotations with a sufficient level of confidence. In contrast, annotations with high levels of certainty tolerate a less critical placement. To formalize this notion, we calculate $\rho_\tau = R(\lambda_\nu)$, where $R(\cdot)$ is a staircase function increasing with the uncertainty of the annotation $\lambda_\nu$.
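The placement rule can be sketched as follows. The linear staircase and the integer uncertainty ranking are assumptions of this sketch; the paper only requires $R(\cdot)$ to be a staircase function increasing with annotation uncertainty.

```python
STEP = 5  # staircase step size (the evaluation in Sec. V also uses 5)

def tuple_hash(uncertainty_level, step=STEP):
    """Sketch of R(.): a staircase function of an assumed integer
    uncertainty ranking (0 = most certain). Uncertain annotations get
    high hashes and are pushed toward the best-connected robots."""
    return (uncertainty_level + 1) * step

def accepts(delta_i, rho_tau):
    """A robot stores a tuple only if its NodeID exceeds the tuple hash;
    otherwise the tuple keeps being forwarded to other robots."""
    return delta_i > rho_tau
```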

SwarmMesh offers a wide API of operations to handle tuples. In this work, we use the following operations:

  • store(k,v), which assigns a tuple to a suitable robot in the data structure;

  • get(x,y,r), which returns all the tuples located within a radius $r$ of the point $(x,y)$ expressed in a global reference frame;

  • erase_except_tuple(x,y,r,k), which removes from the data structure all tuples located within a radius $r$ of the point $(x,y)$ in the global reference frame, except for the tuples whose unique identifiers $\tau$ appear in the list k.
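The semantics of these three operations can be illustrated with a single-process mock. This is not the distributed implementation; the dictionary-based tuple values and integer keys are simplifications.

```python
import math

class LocalTupleStore:
    """Single-process mock of the SwarmMesh API semantics: tuples are
    (key, value) pairs whose value carries an (x, y) location."""

    def __init__(self):
        self.tuples = {}  # key -> value dict with "x", "y", "label", ...

    def store(self, key, value):
        self.tuples[key] = value

    def get(self, x, y, r):
        """All tuples within radius r of (x, y)."""
        return {k: v for k, v in self.tuples.items()
                if math.hypot(v["x"] - x, v["y"] - y) <= r}

    def erase_except_tuple(self, x, y, r, keep_keys):
        """Remove tuples within radius r of (x, y), keeping keep_keys."""
        doomed = [k for k in self.get(x, y, r) if k not in keep_keys]
        for k in doomed:
            del self.tuples[k]
```

In the consolidation routine of Section IV-D, erase_except_tuple() is what replaces a pile of raw votes with the single consolidated annotation.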

IV-C Annotation Consolidation Through Plurality Voting

Plurality vote. Upon querying the shared memory for a particular location in space, the querying robot receives class annotations predicted by various robots. If the number of votes $n$ is greater than or equal to the minimum number of votes $V$, the robot aggregates these votes into a consolidated annotation $\bar{\lambda}$ for the object through plurality voting:

\bar{\lambda} = \operatorname*{argmax}_{\text{class}} \sum_{\nu=1}^{n} I(\lambda_\nu = \text{class})

where $I$ is the indicator function. In case of ties, the consolidated annotation is selected at random from the tied options.
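The consolidation rule, including the random tie-break, can be transcribed as:

```python
import random
from collections import Counter

def plurality_vote(labels, rng=random):
    """Consolidated annotation: the most frequent label among the
    retrieved votes, with ties broken uniformly at random."""
    counts = Counter(labels)
    top = max(counts.values())
    tied = [label for label, count in counts.items() if count == top]
    return rng.choice(tied)
```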

Ensemble Probability of Success. We seek to derive the probability of success for $n$ independent votes, each for one of $c$ classes, where $c \geq n$. We assume that class 1 is correct and that each vote is for class 1 with probability $p$. The probability of an incorrect vote is then $1-p$, and we make the simplifying assumption that each incorrect class is equally likely. We denote the total vote count for each class as a vector $z = (z_1, z_2, \ldots, z_c)$, where $z_i \in \{0, 1, \ldots, n\}$ and $z_1 + z_2 + \cdots + z_c = n$. This vector follows a multinomial distribution with probability mass function

\phi(z \,|\, n, p) = \binom{n}{z_1, \ldots, z_c} p^{z_1} \left(\frac{1-p}{c-1}\right)^{n-z_1}

where

\binom{n}{z_1, \ldots, z_c} = \frac{n!}{z_1! \cdots z_c!}

denotes the multinomial coefficient.
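The probability mass function transcribes directly, with $z_1$ stored first in the vote vector:

```python
from math import factorial, prod

def multinomial_coeff(counts):
    """n! / (z_1! ... z_c!) for a vote-count vector."""
    return factorial(sum(counts)) // prod(factorial(z) for z in counts)

def phi(z, p):
    """Multinomial pmf of the vote vector z = (z_1, ..., z_c): z_1 counts
    votes for the correct class (each cast with probability p); the
    remaining mass 1 - p is split evenly over the c - 1 wrong classes."""
    n, c = sum(z), len(z)
    return multinomial_coeff(z) * p ** z[0] * ((1 - p) / (c - 1)) ** (n - z[0])
```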

Without knowing the correct class, we can estimate it from an observed vector as the class with the most votes, whether a majority or a plurality. In the case where several classes are tied with the highest number of votes, we select one of those classes at random. Again considering that $z_1$ represents the true correct class, we use $z^*$ to denote a vector of vote counts for which $z_1 \geq z_i$ for all $i = 2, 3, \ldots, c$. The probability of identifying the correct class with $n$ total votes is then

p_{ens}(n, p) = \sum_{z^*} \phi(z^* \,|\, n, p) \, P(\mathrm{success} \,|\, z^*) \qquad \text{(4)}

since all other vectors $z$ result in a success probability of zero. Because the largest term in $z^*$ appears first and all incorrect classes are chosen with equal probability, we can express (4) using integer partitions.

Integer partitions. An integer partition $\xi$ is a nonincreasing sequence of positive integers $\xi_1 \geq \xi_2 \geq \cdots \geq \xi_\omega$. We say that $\xi$ is a partition of $n$, denoted $\xi \vdash n$, if $\sum_{i=1}^{\omega} \xi_i = n$, and we denote the set of all such $\xi$ by $\mathcal{P}$. The $\xi_i$ are called the parts of the partition, and we define the length of the partition $\ell(\xi) := \omega$. An alternative formulation is to consider the infinite sequence of multiplicities of each part, $\xi = (k_1, k_2, \ldots)$, where $k_j \in \{0, 1, \ldots\}$. Here, $\xi \vdash n$ if $\sum_{j=1}^{\infty} j k_j = n$, and the length $\ell(\xi) = \sum_{j=1}^{\infty} k_j = \omega$, since all but a finite number of the $k_j$ are zero [27]. For example, $\xi = (5, 3, 2, 1, 1)$ in part notation and $\xi = (2, 1, 1, 0, 1, 0, \ldots)$ in multiplicity notation represent the same partition of $n = 12$. We will refer to both the parts $\xi_i$ and the multiplicities $k_j$ of the same partition throughout.
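A minimal generator illustrates the two notations; here `multiplicities` returns the finite prefix $(k_1, \ldots, k_{\xi_1})$ of the multiplicity sequence, since all later entries are zero.

```python
def partitions(n, max_part=None):
    """Yield all integer partitions of n as nonincreasing part tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def multiplicities(partition):
    """Multiplicity k_j of each part size j = 1, ..., xi_1 (largest part)."""
    largest = partition[0]
    return [sum(1 for part in partition if part == j)
            for j in range(1, largest + 1)]
```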

Note that we can map each vector $z^*$ to an integer partition $g(z^*) = \xi$, where $\xi_1$ represents the number of votes for the correct class, and the other $\xi_i$ represent the positive numbers of votes for $\omega - 1$ incorrect classes. Since all incorrect classes are equally likely, each $z^*$ that maps to a particular $\xi$ yields the same probability of success. Thus,

p_{ens}(n, p) = \sum_{\xi \vdash n,\ \xi \in \mathcal{P}} \ \sum_{z^* : g(z^*) = \xi} \phi(z^* \,|\, n, p) \, P(\mathrm{success} \,|\, z^*)
= \sum_{\xi \vdash n,\ \xi \in \mathcal{P}} \left| g^{-1}(\{\xi\}) \right| \phi(\xi \,|\, n, p) \, P(\mathrm{success} \,|\, \xi) \qquad \text{(5)}

where $\left|g^{-1}(\{\xi\})\right|$ is the cardinality of the preimage of $\{\xi\}$, i.e., the number of $z^*$ that map to $\xi$.

To determine $\left|g^{-1}(\{\xi\})\right|$, we start with the $\binom{c-1}{\omega-1}$ combinations of incorrect classes and count the ways to assign vote counts with multiplicities $k_j$ to each. There are

\binom{\omega-1}{k_1, k_2, \ldots, k_{\xi_1}-1}

such assignments, where the $k_{\xi_1}-1$ derives from the fact that one of the spots for the largest possible vote count is taken by the correct class. Together, we have

\left|g^{-1}(\{\xi\})\right| = \binom{c-1}{\omega-1} \binom{\omega-1}{k_1, k_2, \ldots, k_{\xi_1}-1}

For the conditional probability $P(\mathrm{success} \,|\, \xi)$: when $\xi_1$ is the strictly largest part, the correct class is chosen with probability 1; otherwise, there is a $k_{\xi_1}$-way tie, from which a class is chosen at random. In both cases, the correct class is chosen with probability $P(\mathrm{success} \,|\, \xi) = 1/k_{\xi_1}$. Substituting into (5) and simplifying gives

p_{ens}(n, p) = \frac{1}{c} \sum_{\xi \vdash n,\ \xi \in \mathcal{P}} \binom{n}{\xi_1, \ldots, \xi_\omega} \binom{c}{\omega} \binom{\omega}{k_1, \ldots, k_{\xi_1}} p^{\xi_1} \left(\frac{1-p}{c-1}\right)^{n-\xi_1} \qquad \text{(6)}
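Eq. (6) can be evaluated numerically as sketched below; the partition generator is inlined to keep the block self-contained. As sanity checks, the formula reduces to $p$ for a single vote and, for $n = 3$ and $c = 2$, to the classical majority-of-three probability $p^3 + 3p^2(1-p)$.

```python
from math import comb, factorial, prod

def partitions(n, max_part=None):
    """Yield all integer partitions of n as nonincreasing part tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def p_ens(n, p, c):
    """Ensemble probability of success (Eq. 6): n votes, per-vote
    accuracy p for the correct class, c classes in total."""
    total = 0.0
    for xi in partitions(n):
        omega = len(xi)  # number of classes that received votes
        xi1 = xi[0]      # largest part: votes for the winning class
        # multiplicities k_j of each part size j = 1, ..., xi_1
        k = [sum(1 for part in xi if part == j) for j in range(1, xi1 + 1)]
        multinom_n = factorial(n) // prod(factorial(part) for part in xi)
        multinom_omega = factorial(omega) // prod(factorial(kj) for kj in k)
        # comb(c, omega) is 0 when omega > c, so such partitions vanish
        total += (multinom_n * comb(c, omega) * multinom_omega
                  * p ** xi1 * ((1 - p) / (c - 1)) ** (n - xi1))
    return total / c
```

Curves such as those in Figure 4 can be produced by evaluating `p_ens` over a range of vote counts for each per-class accuracy in Table I.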

IV-D Robot Local Routine

Robots each run a local procedure in a loop. We refer to this procedure as the control step and outline it from the perspective of robot $i$ in Algorithm 1. Every time step, the robot performs one iteration of this loop.

At the beginning of the procedure, the robot receives and deserializes messages from its neighbors. Certain messages are meant for specific recipients only (point-to-point) and get discarded by any other receiving robot during deserialization. The robot then computes and performs a motion increment according to a diffusion policy [21].

Next, the robot checks the output of its perception module if it is not in recording timeout. The recording timeout is a delay imposed to avoid making successive observations of the same object. If the perception module has detected an object, the robot creates a tuple and starts a store() operation on the shared memory. The payload of the tuple contains the object annotation and the bounding box location and dimensions.

If the robot is not in a querying timeout, it reviews the tuples stored in its memory by bounding box location. If the robot finds multiple tuples with the same location in its own local memory, it starts a get() query on SwarmMesh to retrieve all the tuples currently stored with that bounding box location. It then saves the metadata for the started query.

The robot goes through the queries it has started and checks whether it has stopped receiving replies for a period longer than a time threshold. If the number of retrieved tuples is larger than the minimum number of votes, the robot creates and writes a tuple into SwarmMesh with the consolidated annotation. It also starts an erase_except_tuple() request for all tuples with the given location except for the newly created tuple.

SwarmMesh routing at line 30 of Alg. 1 decides which requests and replies need to be routed and to which robot. A robot may have to send a reply to a request and continue propagating the request or route a neighbor’s reply or discard the request or reply. The details of the routing process are reported in [26].

Algorithm 1 Control step - message $\mu$, neighbors $\mathcal{N}_i$, requests $\mathcal{Q}$, replies $\mathcal{R}$
1: procedure Control_step
2:    receive $\{\mu_j\}_{j\in\mathcal{N}_i}$ ▷ RX from neighbors
3:    for each $j \in \mathcal{N}_i$ do
4:       $(j, \delta_j, \mathcal{Q}_j, \mathcal{R}_j) \leftarrow \mu_j$ ▷ Deserialization
5:    end for
6:    move() ▷ Diffusion motion
7:    if not in recording timeout then
8:       v $\leftarrow$ senseObject()
9:       if object detected then
10:          start recording timeout
11:          k $\leftarrow$ hash(v)
12:          $\mathcal{Q}_i \leftarrow \mathcal{Q}_i \cup$ store(k, v) ▷ Write label
13:       end if
14:    end if
15:    if not in querying timeout then
16:       if holds multiple tuples from same (x, y) then
17:          start querying timeout
18:          $\mathcal{Q}_i \leftarrow \mathcal{Q}_i \cup$ get(x, y, 0) ▷ Ask for labels
19:          save query information
20:       end if
21:    end if
22:    for each started query do
23:       if results ready and enough votes then
24:          v $\leftarrow$ plurality vote ▷ Consolidated label
25:          k $\leftarrow$ hash(v)
26:          $\mathcal{Q}_i \leftarrow \mathcal{Q}_i \cup$ store(k, v)
27:          $\mathcal{Q}_i \leftarrow \mathcal{Q}_i \cup$ erase_except_tuple(x, y, 0, k)
28:       end if
29:    end for
30:    $(\bar{\mathcal{Q}}_i, \bar{\mathcal{R}}_i) \leftarrow$ route($\mathcal{Q}_\bullet, \mathcal{R}_\bullet$) ▷ SwarmMesh routing
31:    $\mu_i \leftarrow (i, \delta_i, \bar{\mathcal{Q}}_i, \bar{\mathcal{R}}_i)$ ▷ Serialization
32:    send($\mu_i$) ▷ TX to neighbors
33: end procedure

V EVALUATION

We performed the evaluation of our approach using the physics-based ARGoS multi-robot simulator [28] with data from the SceneNN benchmark [24]. The main parameters of interest are the minimum number of votes $V$ and the number of robots $N$. The former relates to the ensemble accuracy of the map through Eq. (6) and Figure 4. The latter changes the scale of the distributed system and the degree of parallelization in collecting data.

V-A Simulation Parameters

Robots and environment. We study the influence of the number of robots $N$ by running experiments with 30, 60 and 90 robots. The robots have a communication range $C$ of 2 m and move at a forward speed of 5 cm/s. The robots diffuse and avoid obstacles in an $8 \times 8\,\mathrm{m}^2$ environment. Given these settings, the robot density is such that robots are within communication range most of the time, but a significant number of intermittent disconnections occur. We do not consider line-of-sight obstructions.

In order to run experiments representative of collective annotation in an indoor environment, we import 40 objects of 13 types from the SceneNN dataset; as a result, the objects are non-uniformly distributed in space. We made the following adjustments when importing scene 005 of the dataset: (i) we limited physical dimensions to fit in the viewing frustum; (ii) we lowered objects to the ground level; (iii) we rotated them to face towards the center of each "room"; (iv) we removed overlapping objects; (v) we re-labelled "unknown" objects by drawing from the list of object classes at random. Figure 3 shows the imported environment with these modifications.

Refer to caption
Figure 3: Storing map annotations collectively and reducing uncertainty through data fusion.

Classifier. We simulate the statistical behavior of the BGA-DGCNN classifier using data from the SceneNN dataset [24, 10]. The procedure to generate a posterior distribution is described in Section III-A3. Each observed annotation $\lambda_\nu$ in the simulation is obtained by drawing from this distribution.

Ensemble accuracy. In order to set $V$ according to a desired range of probabilistic accuracy, we compute the ensemble accuracy $p_{ens}(n, p)$ for each class of objects for a given number of votes $n$ using Equation (6). We use the per-class accuracy $p$ for BGA-DGCNN shown in Table I, as reported in [10]. Figure 4 shows the resulting curves. In our experiments, we vary the threshold $V$ in the range between the two vertical lines in Figure 4.

TABLE I: BGA-DGCNN classifier per-class accuracy (%) on the SceneNN PB-T50-RS dataset [10].
bin    cabinet  chair   desk   display  door    shelf
81.9   84.4     92.6    77.3   80.4     92.4    80.5
table  bed      pillow  sink   sofa     toilet  overall
74.1   72.7     78.1    79.2   91.0     79.7    75.7
Refer to caption
Figure 4: Ensemble probability of success per class of object. This graph is based on the per-class accuracies in Table I.

SwarmMesh parameters. We select the data structure parameters so as to maintain a reasonable load factor, i.e., the ratio of the data to store over the available storage capacity. We set the memory capacity $M_i$ of the robots to 20 tuples, with a target of 10 tuples for storage and an allowance of 10 for the routing queue. We set the step size of the hashing function $R(\lambda_\nu)$ to 5 (see Section IV-B).

V-B Mapping Performance

Effect of the number of robots. Using more robots distributes sensing spatially and leads to a more densely connected robotic network. As a result, we expect that increasing the number of robots $N$ speeds up the collection and aggregation of the data needed to complete the semantic map. In our setting, map coverage refers to the ratio of covered objects to the total number of objects. We distinguish between two types of coverage: (i) the observation coverage, which considers an object covered if at least one robot annotated it; (ii) the consolidation coverage, which considers an object covered if there exists a consolidated annotation $\bar{\lambda}$ for it in the shared memory. Figure 5 shows the temporal evolution of both types of coverage across different values of $N$ with a vote threshold $V = 3$. The map coverage curves rise faster with a higher number of robots. The consolidation coverage rises more slowly than the observation coverage, since it depends on the consolidation process that queries the shared memory as described in Section IV-D. With $N = 90$, all annotations are consolidated within 14 s of the time all objects have been observed.

Figure 5: Map coverage over time for V = 3 and N ∈ {30, 60, 90}. Cursors show the time when the maximum coverage was first reached.

Effect of the number of votes. The observation coverage is independent of V. Figure 6 shows the map accuracy and the consolidation coverage over time across values of V. The experimental map accuracy is the ratio of correct consolidated annotations to the total number of consolidated annotations. We observe that the map accuracy increases with V, at the cost of slower coverage.

Figure 6: Map accuracy and coverage over time for N = 60 and V ∈ {3, 4, 5, 6}.

V-C Memory-Related Performance

Distributed storage cost. Figure 7 shows the total storage cost (1a) of the tuples-to-robots assignment realized in practice over simulation time, together with the optimal and worst-case curves. At every time step, the tuple set 𝒯 and the number of neighbors |𝒩_i| of each robot are updated. In our evaluation, we consider tuples of unit volume (v_τ = 1 for all τ) and a homogeneous memory capacity across robots (M_i = M for all i). We calculate the cost obtained in simulation using the heuristic tuple-to-robot assignment rule of SwarmMesh. In order to identify cases in which tuples are assigned to robots with zero |𝒩_i| and/or zero m_i, we cap each term in the sum to 1. These cases result in jumps of 1 in the cost in Figure 7.
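The capped cost terms can be sketched as below. This is an illustration, not the paper's Eq. (1a), which is not reproduced here; we assume each storing robot i contributes a term 1/(|𝒩_i| · m_i), with m_i taken to be its free memory slots, and the term capped at 1:

```python
def storage_cost(neighbor_counts, free_slots):
    """Total storage cost over the robots currently holding tuples.

    Each term 1 / (|N_i| * m_i) is capped at 1, so a robot with no
    neighbors or no free memory contributes a jump of exactly 1.
    """
    cost = 0.0
    for n_i, m_i in zip(neighbor_counts, free_slots):
        if n_i == 0 or m_i == 0:
            cost += 1.0  # capped term: isolated or memory-full robot
        else:
            cost += min(1.0, 1.0 / (n_i * m_i))
    return cost

# Three robots: well connected, isolated, and memory-full
c = storage_cost([4, 0, 2], [5, 3, 0])  # 1/20 + 1 + 1 = 2.05
```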

The VSCBPP problem is NP-hard, but our experimental settings (v_τ = 1 for all τ, and M_i = M for all i) allow for optimally solving instances of the simulation with N = 30 and V = 3. We reduce the space of the exhaustive search by noting that permutations of the rows of (a_τi) lead to solutions of equivalent cost. We enumerate the integer partitions of |𝒯| with fewer than |ℐ| parts and in which every part is smaller than M. For each such partition, we determine the minimal cost using the rearrangement inequality for the product of 1/|𝒩_i| and 1/m_i:

\sum_{i\in\mathcal{I}}\cfrac{1}{|\mathcal{N}_{i}|}\cdot\cfrac{1}{m_{i}}\;\geq\;\cfrac{1}{|\mathcal{N}_{\sigma(1)}|}\cdot\cfrac{1}{m_{\pi(1)}}\,+\dots+\cfrac{1}{|\mathcal{N}_{\sigma(|\mathcal{I}|)}|}\cdot\cfrac{1}{m_{\pi(|\mathcal{I}|)}}

where σ(·) is an ascending ordering of |𝒩_i|, and π(·) is a descending ordering of m_i.
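The two ingredients of this search can be sketched as follows: `partitions` enumerates the integer partitions of |𝒯| and `min_pairing_cost` applies the rearrangement inequality to a fixed choice of storing robots and free-memory values. The function names and the use of free-memory slots for m_i are illustrative assumptions:

```python
def partitions(n, max_parts, max_part):
    """Yield the integer partitions of n with at most max_parts parts,
    each part no larger than max_part (parts in non-increasing order)."""
    def rec(remaining, largest, parts):
        if remaining == 0:
            yield list(parts)
            return
        if len(parts) == max_parts:
            return
        for p in range(min(remaining, largest), 0, -1):
            parts.append(p)
            yield from rec(remaining - p, p, parts)
            parts.pop()
    yield from rec(n, max_part, [])

def min_pairing_cost(neighbor_counts, free_slots):
    """Lower bound on the cost for one partition, via the rearrangement
    inequality: pair robots in ascending order of |N_i| (descending 1/|N_i|)
    with free-memory values in descending order (ascending 1/m_i)."""
    inv_n = sorted((1.0 / n for n in neighbor_counts), reverse=True)  # sigma
    inv_m = sorted(1.0 / m for m in free_slots)                       # pi
    return sum(a * b for a, b in zip(inv_n, inv_m))
```

For example, with neighbor counts [1, 2] and free memories [10, 5], the minimal pairing matches the poorly connected robot with the emptier memory, giving a lower sum than the opposite pairing.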

Given |𝒩_i| and 𝒯, we compute the worst-case cost by assigning one tuple to each robot with |𝒩_i| = 0, filling the memory (m_i = 0) of as many robots as possible given the number of tuples |𝒯|, and placing the remaining tuples in the robot with the lowest non-zero |𝒩_i|.
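This construction can be sketched directly; the tie-breaking (which well-connected robots get filled first) is not specified in the text, so the choice below of filling the best-connected robots first, leaving the lowest-|𝒩_i| robot for the remainder, is an illustrative assumption:

```python
def worst_cost(neighbor_counts, num_tuples, M):
    """Worst-case cost following the construction in the text, assuming
    unit-volume tuples, homogeneous capacity M, and capped terms equal to 1."""
    cost, remaining = 0.0, num_tuples
    isolated = sum(1 for n in neighbor_counts if n == 0)
    # Best-connected first, so the lowest-|N_i| robot is left for the remainder
    connected = sorted((n for n in neighbor_counts if n > 0), reverse=True)
    # One tuple per robot with |N_i| = 0: each capped term contributes 1
    used = min(isolated, remaining)
    cost += used
    remaining -= used
    # Fill the memory (m_i = 0) of as many robots as possible: capped terms
    while remaining >= M and connected:
        connected.pop(0)
        cost += 1.0
        remaining -= M
    # Remaining tuples go to the robot of lowest non-zero |N_i|
    if remaining > 0 and connected:
        cost += 1.0 / (connected[-1] * (M - remaining))
    return cost
```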

Figure 7: VSCBPP cost over simulation time for V = 3 and N = 30.

SwarmMesh dimensioning. Figure 8 validates our selection of the SwarmMesh parameters. It shows that the bias of the NodeID distribution is greater than that of the distribution of tuple hashes. This confirms our insight that uncertain annotations should correspond to higher hash values: it is likely that a robot with a NodeID greater than a given tuple hash exists. We keep the parameters constant across numbers of robots. This means that for higher swarm sizes N, the collective memory capacity N·M is over-sized and the key partitioning matches tuples to robots randomly. In practice, the robot writing a tuple then typically keeps it locally, which reduces the communication overhead at greater swarm sizes.

Figure 8: Histograms of data hashes and NodeIDs for V = 6 and N ∈ {30, 60, 90}.

V-D Communication Load

Effect of the number of votes. We find that the number of bytes sent increases with the minimum number of votes V (Figure 9). This indicates that an increase in map accuracy comes at the cost of a higher communication overhead.

Figure 9: Bytes sent per second per robot over time for N = 60 and V ∈ {3, 5}.

Effect of the number of robots. Figure 10 shows the bandwidth usage per robot across different values of N in the configuration with the largest amount of data to manage (V = 6). The communication overhead decreases as N increases because we keep the same shared-memory parameters across values of N. This is consistent with the effect described in the discussion of SwarmMesh dimensioning.

Figure 10: Bytes sent per second per robot over time for V = 6 and N ∈ {30, 60, 90}.

VI CONCLUSIONS

We proposed a method for the distributed storage and fusion of semantic annotations across a swarm of robots. We considered robots with limited memory, each running a pre-trained classifier to annotate objects in the environment. We quantified the costs of storing a semantic map and of reducing its uncertainty in terms of mapping performance, memory usage, and communication overhead, enabling users of this framework to make informed decisions about the trade-offs between these key performance metrics. In future work, we will study the aggregation of the outputs of different types of classifiers to produce richer semantic maps. We will also assess the performance of our approach on real robots.

ACKNOWLEDGMENTS

This work was funded by a grant from Amazon Robotics. This research was performed using computational resources supported by the Academic & Research Computing group at Worcester Polytechnic Institute.

References

  • [1] G. Valentini, D. Brambilla, H. Hamann, and M. Dorigo, “Collective Perception of Environmental Features in a Robot Swarm,” in Swarm Intelligence.   Cham: Springer International Publishing, 2016, pp. 65–76.
  • [2] J. T. Ebert, M. Gauci, F. Mallmann-Trenn, and R. Nagpal, “Bayes bots: Collective bayesian decision-making in decentralized robot swarms,” in 2020 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2020, pp. 7186–7192.
  • [3] L. Riazuelo, M. Tenorth, D. Di Marco, M. Salas, D. Gálvez-López, L. Mösenlechner, L. Kunze, M. Beetz, J. D. Tardós, L. Montano et al., “Roboearth semantic mapping: A cloud enabled knowledge-based approach,” IEEE Transactions on Automation Science and Engineering, vol. 12, no. 2, pp. 432–443, 2015.
  • [4] R. Khodayi-mehr, Y. Kantaros, and M. M. Zavlanos, “Distributed state estimation using intermittently connected robot networks,” IEEE Transactions on Robotics, vol. 35, no. 3, pp. 709–724, 2019.
  • [5] N. Atanasov, J. Le Ny, K. Daniilidis, and G. J. Pappas, “Decentralized active information acquisition: Theory and application to multi-robot slam,” in 2015 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2015, pp. 4775–4782.
  • [6] K. Y. Leung, T. D. Barfoot, and H. H. Liu, “Decentralized cooperative slam for sparsely-communicating robot networks: A centralized-equivalent approach,” Journal of Intelligent & Robotic Systems, vol. 66, no. 3, pp. 321–342, 2012.
  • [7] R. Polikar, Ensemble Learning, C. Zhang and Y. Ma, Eds.   Boston, MA: Springer US, 2012.
  • [8] N. Sünderhauf, O. Brock, W. Scheirer, R. Hadsell, D. Fox, J. Leitner, B. Upcroft, P. Abbeel, W. Burgard, M. Milford et al., “The limits and potentials of deep learning for robotics,” The International Journal of Robotics Research, vol. 37, no. 4-5, pp. 405–420, 2018.
  • [9] L. I. Kuncheva, Combining pattern classifiers: methods and algorithms.   John Wiley & Sons, 2014.
  • [10] M. A. Uy, Q.-H. Pham, B.-S. Hua, T. Nguyen, and S.-K. Yeung, “Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 1588–1597.
  • [11] G. Valentini, E. Ferrante, and M. Dorigo, “The best-of-n problem in robot swarms: Formalization, state of the art, and novel perspectives,” Frontiers in Robotics and AI, vol. 4, p. 9, 2017. [Online]. Available: http://journal.frontiersin.org/article/10.3389/frobt.2017.00009
  • [12] M. Crosscombe, J. Lawry, S. Hauert, and M. Homer, “Robust distributed decision-making in robot swarms: Exploiting a third truth state,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2017, pp. 4326–4332.
  • [13] A. Mitra, J. A. Richards, S. Bagchi, and S. Sundaram, “Resilient distributed state estimation with mobile agents: overcoming byzantine adversaries, communication losses, and intermittent measurements,” Autonomous Robots, vol. 43, no. 3, pp. 743–768, 2019.
  • [14] A. Tahbaz-Salehi and A. Jadbabaie, “On consensus over random networks,” in 44th Annual Allerton Conference.   Citeseer, 2006.
  • [15] L. Xiao, S. Boyd, and S.-J. Kim, “Distributed average consensus with least-mean-square deviation,” Journal of Parallel and Distributed Computing, vol. 67, no. 1, pp. 33–46, Jan. 2007.
  • [16] H. J. Leblanc, H. Zhang, X. Koutsoukos, and S. Sundaram, “Resilient asymptotic consensus in robust networks,” IEEE Journal on Selected Areas in Communications, vol. 31, no. 4, 2013.
  • [17] D. Albani, D. Nardi, and V. Trianni, “Field coverage and weed mapping by uav swarms,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2017, pp. 4319–4325.
  • [18] P.-Y. Lajoie, B. Ramtoula, Y. Chang, L. Carlone, and G. Beltrame, “Door-slam: Distributed, online, and outlier resilient slam for robotic teams,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 1656–1663, 2020.
  • [19] J. P. Queralta, J. Taipalmaa, B. C. Pullinen, V. K. Sarker, T. N. Gia, H. Tenhunen, M. Gabbouj, J. Raitoharju, and T. Westerlund, “Collaborative multi-robot systems for search and rescue: Coordination and perception,” arXiv preprint arXiv:2008.12610, 2020.
  • [20] J.-R. Ruiz-Sarmiento, C. Galindo, and J. Gonzalez-Jimenez, “Building multiversal semantic maps for mobile robot operation,” Knowledge-Based Systems, vol. 119, pp. 257–272, 2017.
  • [21] A. Howard, M. Mataric, and G. Sukhatme, “Mobile sensor network deployment using potential fields: A distributed, scalable solution to the area coverage problem,” in DARS, 2002.
  • [22] A. Mousavian, D. Anguelov, J. Flynn, and J. Kosecka, “3d bounding box estimation using deep learning and geometry,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7074–7082.
  • [23] H. Carrillo, K. H. Brodersen, and J. A. Castellanos, “Probabilistic performance evaluation for multiclass classification using the posterior balanced accuracy,” in ROBOT2013: First Iberian Robotics Conference.   Springer, 2014, pp. 347–361.
  • [24] B.-S. Hua, Q.-H. Pham, D. T. Nguyen, M.-K. Tran, L.-F. Yu, and S.-K. Yeung, “Scenenn: A scene meshes dataset with annotations,” in International Conference on 3D Vision (3DV), 2016.
  • [25] T. G. Crainic, G. Perboli, W. Rei, and R. Tadei, “Efficient lower bounds and heuristics for the variable cost and size bin packing problem,” Computers & Operations Research, vol. 38, no. 11, pp. 1474–1482, 2011.
  • [26] N. Majcherczyk and C. Pinciroli, “SwarmMesh: A distributed data structure for cooperative multi-robot applications,” IEEE International Conference on Robotics and Automation, 2020.
  • [27] G. E. Andrews, The theory of partitions.   Cambridge university press, 1998, no. 2.
  • [28] C. Pinciroli, V. Trianni, R. O’Grady, G. Pini, A. Brutschy, M. Brambilla, N. Mathews, E. Ferrante, G. Di Caro, F. Ducatelle, M. Birattari, L. M. Gambardella, and M. Dorigo, “ARGoS: a modular, parallel, multi-engine simulator for multi-robot systems,” Swarm Intelligence, vol. 6, no. 4, pp. 271–295, 2012.