
Distributed Data Storage and Fusion for Collective Perception
in Resource-Limited Mobile Robot Swarms

Nathalie Majcherczyk, Daniel Jeswin Nallathambi, Tim Antonelli, Carlo Pinciroli
Robotics Engineering, Worcester Polytechnic Institute, MA, USA
Email: {nmajcherczyk, cpinciroli}@wpi.edu
Abstract

In this paper, we propose an approach to the distributed storage and fusion of data for collective perception in resource-limited robot swarms. We demonstrate our approach in a distributed semantic classification scenario. We consider a team of mobile robots, in which each robot runs a pre-trained classifier of known accuracy to annotate objects in the environment. We provide two main contributions: (i) a decentralized, shared data structure for efficient storage and retrieval of the semantic annotations, specifically designed for low-resource mobile robots; and (ii) a voting-based, decentralized algorithm to reduce the variance of the calculated annotations in presence of imperfect classification. We discuss theory and implementation of both contributions, and perform an extensive set of realistic simulated experiments to evaluate the performance of our approach.

I INTRODUCTION

Decentralized collective perception is one of the main activities that a swarm of robots must perform. A major challenge in collective perception is dealing with partial and uncertain knowledge. To deploy robots in safety-critical applications, such as disaster recovery or autonomous driving, researchers strive to increase the reliability of the information used for decision-making. In the context of resource-constrained robot swarms, significant effort has been devoted to solving this problem, with particular focus on the best-of-$n$ formulation [1, 2].

In this paper, we deal with a collective perception scenario related to, but more complex than, the best-of-$n$ formulation. We assume that individual robots perform observations, from which they individually derive annotations on the environment through a machine learning algorithm. The algorithm, however, might produce incorrect annotations. The aim of our work is to take advantage of the fact that multiple robots can combine individual annotations to derive a set of more accurate consolidated annotations.

To showcase our approach, as shown in Fig. 1, we consider the case in which the robots must collectively construct and store a coherent set of semantic annotations on features found in the environment, such as pieces of furniture, trees, or doorways. To respect low-memory constraints, we assume that the objects are internally stored as bounding boxes, rather than through more detailed representations such as voxels or graphs. These annotated bounding boxes constitute the essence of a distributed semantic map [3] of the environment. This problem is different from the best-of-$n$ problem. In the latter, the robots must agree on a single choice for the entire swarm; in our problem, the robots must collectively construct a coherent set of annotations.

Semantic mapping poses interesting challenges in the context of resource-constrained collective robotics. Constructing a semantic map entails the collection and storage of large amounts of data. This data is typically processed by the robots individually at first. The result is then shared and fused with the results of other robots to produce a coherent representation of the environment. An effective approach to solve these problems is capable of coping with low memory availability, mobility, noise, uncertainty, and bandwidth limitations [4, 5, 6].

Our approach to semantic mapping draws inspiration from research in ensemble learning [7]. Ensemble learning reduces the variability of individual models through information fusion [8]. In ensemble learning, fusion is achieved through bagging and stacking techniques that aggregate models, typically classifiers, by combining their outputs through an averaging process or through a meta-model trained for that purpose [7, 9].

Refer to caption
Figure 1: Collective semantic annotation application with annotation consolidation.

This paper offers two main contributions. First, we present an algorithm to store localized semantic annotations across a swarm of mobile robots, under the form of a shared global memory. Second, we propose an approach to online fusion of uncertain semantic annotations, formalizing the problem and deriving a solution from first principles. We consider a scenario in which the user has already trained a multi-class classifier on an extensive dataset with multiple annotations of various objects [10]. This classifier is deployed on each robot and used to identify objects for the map, which are shared with the rest of the swarm. The semantic map is constructed by consolidating multiple observations of the same objects into single ones. Through a voting mechanism, our approach copes with the fact that classifiers are imperfect and might mislabel objects. The final result of our approach is a semantic map whose accuracy is superior to that achievable by any individual robot alone. We also provide a closed-form equation to assess the accuracy of a consolidated annotation as a function of the number of votes and the individual classifier accuracy.

Our paper is organized as follows. In Section II we discuss related work. In Section III we state our assumptions and formulate the problem. The components of our framework are described in Section IV. We report the results of our performance evaluation in Section V, and conclude the paper in Section VI.

II RELATED WORK

Best-of-$n$ problem. Previous work in swarm robotics has tackled collective classification problems for one global environmental property. These problems have commonly been referred to as best-of-$n$ [11]. Valentini et al. proposed a honeybee-inspired approach for robots to collectively decide whether a black-and-white colored environment has a majority of white or black tiles [1]. Ebert et al. applied Bayesian estimation to the same problem with the added explicit group-wide agreement on the output [2]. Robust formulations have also been studied in which robots might be affected by noise [12] or act in an adversarial manner [13]. As discussed in the introduction, semantic mapping is a more complex problem, in that it can be seen as a repeated best-of-$n$ problem in which the annotations must also be coherent with each other.

Estimation of continuous variables. Other work has considered the collective estimation of a continuous value. Early work focused on average consensus [14, 15, 16], in which the robots must agree on the average of individual initial estimates of a quantity of interest. More recently, Albani et al. proposed a collaborative weed mapping application that considers inter-robot networking and motion planning [17]. Robots communicate to improve the confidence interval of weed density predictions for each cell in a discrete environment. They assume that robots know the confidence value for each independent prediction and keep the prediction with the maximum confidence. In our work, in contrast, we do not assume the confidence of the prediction known, and resort to a voting process to identify the most likely annotation.

Distributed mapping and semantic classification. Notable recent work in collective perception includes distributed mapping, such as DOOR-SLAM [18]. In DOOR-SLAM, the robots construct a graph-based map (but not a semantic one) in a decentralized fashion, coping with spurious observations by identifying and rejecting outliers. In distributed semantic classification [19], efforts have concentrated on cloud-based methods rather than decentralized ones, such as RoboEarth [3]. In the work of Ruiz et al. [20], a centralized knowledge base is used to build a semantic map. Classification uncertainty is handled through Conditional Random Fields, which estimate beliefs on the annotations by combining spatial and contextual knowledge.

III PROBLEM STATEMENT

In this section, we explain our main modelling assumptions and formulate the distributed storage and collective annotation problems.

III-A Assumptions

III-A1 Multi-Robot System

We consider a system of $N$ robots, each with a unique identifier $i \in \{1, 2, \ldots, N\}$. Each robot can exchange messages within a communication range $C$ with its neighbors $\mathcal{N}_i(t)$. Outgoing messages are limited in size at each time step $t$ to model a finite broadcasting bandwidth. Furthermore, each robot has a limited memory capacity $M_i$ devoted to the housekeeping of a shared data structure. This capacity is subdivided into storage capacity $S_i$ and routing capacity $R_i$. The available memory of a robot at a given time is denoted by $m_i(t)$. Robots move according to a diffusion policy with obstacle avoidance [21]. We assume that each robot can localize itself in a global frame. We further assume robots to be equipped with a perception module. To focus on the problems of storage and fusion, in this paper we abstract away the inner workings of the module and simulate its behavior through the following assumptions: (i) the perception module uses a sensor with physical specifications constraining its viewing frustum (see Figure 2); (ii) the perception module is able to determine the center position of objects perfectly; (iii) the perception module annotates objects imperfectly (see Section III-A3).

III-A2 Object Representation and Detection

We consider robots to be in an environment with non-uniformly distributed objects of different types. We use 3D bounding boxes to represent these objects in space. A bounding box is fully described by its center, dimensions, and orientation [22]. We assume that the sensor of the perception module can detect and localize objects when their front face corners fall inside the sensor viewing frustum (see Figure 2).

Refer to caption
Figure 2: The front corners of an object are inside the sensor viewing frustum.

III-A3 Simulated Classifier

Within the perception module, we simulate the statistical behavior of a trained multi-class object classifier. We leverage the work of Carrillo et al. [23], who derive the posterior distribution of the balanced accuracy, conditioned on the predictions of a trained classifier on a sample test dataset. We use this posterior distribution to generate either correct or incorrect object annotations upon each object detection. To distinguish between cases of erroneous labels, we further model the leftover probability distributions using the confusion matrices of the classifier. The statistical data used for these probabilistic models comes from the SceneNN benchmarking data [24]. SceneNN provides raw outcomes and accuracies for several multi-class classifiers tested extensively on a catalog of point clouds. We use the PB-T50-RS dataset, with translation, rotation and scaling perturbations of the bounding boxes, to compute the posterior distribution modelling the classifier's statistical behavior.
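The sampling step can be sketched as follows. The class list, accuracy values, and the uniform choice among wrong labels are illustrative simplifications of this sketch; the paper draws from a posterior over the balanced accuracy and distributes errors according to the classifier's confusion matrices.

```python
import random

# Illustrative per-class accuracies; NOT the SceneNN/PB-T50-RS statistics
# used in the paper (see Table I for the actual BGA-DGCNN values).
ACCURACY = {"chair": 0.926, "desk": 0.773, "table": 0.741}

def simulate_annotation(true_class, accuracy=ACCURACY, rng=random):
    """Return a (possibly wrong) label for a detected object of class
    `true_class`, correct with the classifier's per-class accuracy."""
    if rng.random() < accuracy[true_class]:
        return true_class  # correct annotation with probability p
    # Erroneous label: here uniform over the remaining classes; the paper
    # instead uses the classifier's confusion matrix for this step.
    wrong = [c for c in accuracy if c != true_class]
    return rng.choice(wrong)
```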

III-B Distributed Storage Problem Formulation

The first goal of our work is to distribute object annotation data across robots in a way that makes efficient use of the collective memory. Each robot can hold a defined number of data items specified by its memory capacity $M_i$. Besides memory constraints, we want to account for communication-related costs.

Since we want to pack data items into bins (robots) with different properties, we formulate the distributed storage problem as a heterogeneous bin packing optimization problem. In particular, we consider the variant known as the Variable Cost and Size Bin Packing Problem (VCSBPP) [25].

Given a set of items $\mathcal{T}$ with different volumes $v_\tau$ to be loaded into a set of available bins $\mathcal{I}$ with different capacities $M_i$ and costs $c_i$, we aim to minimize the total cost of the bins selected to store items:

\min_{a,s} \quad f(a,s) = \sum_{i\in\mathcal{I}} c_i s_i \qquad \text{(1a)}
\text{s.t.} \quad \sum_{i\in\mathcal{I}} a_{\tau i} = 1 \quad \forall \tau \in \mathcal{T} \qquad \text{(1b)}
\sum_{\tau\in\mathcal{T}} v_\tau a_{\tau i} \leq M_i s_i \quad \forall i \in \mathcal{I} \qquad \text{(1c)}
a_{\tau i} \in \{0,1\} \quad \forall \tau \in \mathcal{T},\ \forall i \in \mathcal{I} \qquad \text{(1d)}
s_i \in \{0,1\} \quad \forall i \in \mathcal{I} \qquad \text{(1e)}

with $\{s_i\}_{i\in\mathcal{I}}$ denoting bin-selection variables, and $s_i$ equalling 1 when a bin $i$ is selected for storing items. The set $\{a_{\tau i}\}_{\tau\in\mathcal{T},\, i\in\mathcal{I}}$ contains item-to-bin assignment variables; $a_{\tau i}$ equals 1 if item $\tau$ is assigned to bin $i$. In our variant of the problem, we define bin costs $c_i$ as follows:

c_i = \cfrac{1}{|\mathcal{N}_i| \cdot m_i} = \cfrac{1}{|\mathcal{N}_i| \cdot \left(M_i - \sum_{\tau\in\mathcal{T}} v_\tau a_{\tau i}\right)} \qquad \text{(2)}

We select bin cost to be inversely proportional to the product of the number of neighbors and the memory left after the assignment. This is to model practical considerations such as potential for disconnections and subsequent memory overflows, which are not otherwise modelled in the VCSBPP.
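To make Eq. (2) concrete, the cost of selecting robot $i$ as a bin can be computed as below. Returning infinity for disconnected or full robots is a convention of this sketch only; the evaluation in Section V-C instead caps such terms at 1.

```python
def bin_cost(num_neighbors, capacity, assigned_volume):
    """Cost (Eq. 2) of selecting a robot as a bin: inversely proportional
    to its connectivity and to the memory left after the assignment.
    Disconnected or full robots are treated as infinitely costly here."""
    free = capacity - assigned_volume  # m_i = M_i - sum of assigned volumes
    if num_neighbors == 0 or free <= 0:
        return float("inf")
    return 1.0 / (num_neighbors * free)
```

A well-connected robot with plenty of free memory is thus the cheapest bin, which is exactly the placement the heuristic favors.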

III-C Annotation Consolidation Problem Statement

Our second goal is to improve the map annotation accuracy by using at least $V$ votes for each object identified in the environment. We seek to derive the ensemble probability of success in predicting the correct class, $p_{ens}(V, p)$, as a non-decreasing function of $V$ that is always at least as large as the accuracy $p$ of the correct class.

IV METHODOLOGY

IV-A Overview

Figure 1 shows the interconnection between the components of the collective semantic annotation application with label consolidation. Robots move in an environment containing objects to annotate. Upon detecting an object and deciding to record an annotation, a robot assigns an uncertain class annotation to the corresponding location. The robot writes the annotation into the shared memory as described in Section IV-B. When a robot finds that it stores multiple annotations for the same location, it queries the shared memory to retrieve all annotations related to that location. If the query returns enough annotations, the robot writes the most frequent label for that location into the shared memory as a consolidated annotation. The exact consolidation mechanism and the resulting ensemble accuracy are formalized in Section IV-C. Over time, as the robots explore the environment, the shared memory stores an increasingly complete map of the annotated objects in the form of consolidated annotations. Section IV-D explains the local routine run by the robots to perform the various actions across components.

IV-B Distributed Storage Through SwarmMesh

We implement a shared memory to enable distributed storage across the robots. For this purpose we use SwarmMesh [26], a decentralized data structure for mobile swarms. As part of the work in this paper, we also provide a generic, open-source C++ implementation of this data structure (https://github.com/NESTLab/SwarmMeshLibrary). SwarmMesh is based on a distributed online heuristic solution to the bin packing problem formulated in Section III-B, Eqs. (1)-(2).

SwarmMesh stores data in the form of tuples (key-value pairs). The tuple key k is used for both storing and addressing tuples. It consists of two parts: (i) a unique tuple identifier $\tau$ and (ii) a tuple hash $\rho_\tau$ that depends on the tuple value v to be stored. When a robot $i$ creates a new tuple, the unique tuple identifier $\tau$ is calculated as the concatenation of the binary representations of $i$ and of the count of tuples created so far by the robot. The tuple value v contains the annotation $\lambda_\nu$, the box center, the box orientation, and the vector from the center of the box to its front right corner. These variables fully describe an annotated 3D bounding box.
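One possible encoding of this identifier packs the two fields into a single integer. The 8-bit robot-id and 24-bit counter widths below are assumptions for illustration, not the widths used by the SwarmMesh library.

```python
ID_BITS = 8      # assumed width of the robot identifier field
COUNT_BITS = 24  # assumed width of the per-robot tuple counter field

def make_tuple_id(robot_id, tuple_count):
    """Unique tuple identifier tau: concatenation of the binary
    representations of the robot id and its running tuple count."""
    assert 0 <= robot_id < 2**ID_BITS and 0 <= tuple_count < 2**COUNT_BITS
    return (robot_id << COUNT_BITS) | tuple_count

def split_tuple_id(tau):
    """Recover (robot_id, tuple_count) from the identifier."""
    return tau >> COUNT_BITS, tau & (2**COUNT_BITS - 1)
```

Because each robot only increments its own counter, identifiers are unique swarm-wide without any coordination.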

Each robot calculates a so-called NodeID, $\delta_i$, which determines which tuples can be stored on a robot and which cannot. The robots compute their NodeIDs at each time step as follows:

\delta_i(t) = \begin{cases} m_i(t) \cdot |\mathcal{N}_i(t)| & \text{if } |\mathcal{N}_i(t)| > 0 \\ 1 & \text{otherwise} \end{cases} \qquad \text{(3)}

Intuitively, the NodeID encodes how willing a robot is to store a new tuple, but also how desirable it is to store a tuple on that robot. In Eq. (3), the NodeID increases linearly with the amount of available memory (more room for tuples) and the number of neighbors of a robot (better connectivity, lower likelihood of disconnections, better routing). As the robots move and store or delete tuples, the NodeIDs change over time.
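Eq. (3) transcribes directly into code; in this sketch, free memory is assumed to be measured in tuples.

```python
def node_id(free_memory, num_neighbors):
    """NodeID delta_i(t) from Eq. (3): grows with free memory (room for
    tuples) and with connectivity (lower likelihood of disconnection).
    Disconnected robots fall back to the minimum value 1."""
    if num_neighbors > 0:
        return free_memory * num_neighbors
    return 1
```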

The NodeIDs form a distribution that defines a hierarchy among robots. To decide which robot eventually stores a tuple, SwarmMesh has an analogous mechanism that imposes a hierarchy among tuples. A tuple is passed from robot to robot until $\delta_i(t) > \rho_\tau$. Intuitively, we want tuples with a high hash $\rho_\tau$ to occupy the best robots in the network. Our insight is that the most uncertain annotations are those that should be mapped to higher values of $\rho_\tau$, because this makes them easier to query and route. This makes it more efficient to consolidate uncertain annotations into annotations with a sufficient level of confidence. In contrast, annotations with high levels of certainty tolerate a less critical placement. To formalize this notion, we calculate $\rho_\tau = R(\lambda_\nu)$, where $R(\cdot)$ is a staircase function increasing with the uncertainty of the annotation $\lambda_\nu$.
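The placement rule can be sketched as follows. The linear staircase and the integer uncertainty ranking are assumptions of this sketch; the paper only requires $R(\cdot)$ to be a staircase function increasing with annotation uncertainty.

```python
STEP = 5  # staircase step size (the evaluation in Sec. V also uses 5)

def tuple_hash(uncertainty_level, step=STEP):
    """Sketch of R(.): a staircase function of an assumed integer
    uncertainty ranking (0 = most certain). Uncertain annotations get
    high hashes and are pushed toward the best-connected robots."""
    return (uncertainty_level + 1) * step

def accepts(delta_i, rho_tau):
    """A robot stores a tuple only if its NodeID exceeds the tuple hash;
    otherwise the tuple keeps being forwarded to other robots."""
    return delta_i > rho_tau
```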

SwarmMesh offers a wide API of operations to handle tuples. In this work, we use the following operations:

  • store(k,v), which assigns a tuple to a suitable robot in the data structure;

  • get(x,y,r), which returns all the tuples located within a radius $r$ of the point $(x,y)$ expressed in a global reference frame;

  • erase_except_tuple(x,y,r,k), which removes from the data structure all tuples located within a radius $r$ of the point $(x,y)$ in the global reference frame, except for the tuples whose unique identifiers $\tau$ appear in the list k.
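The semantics of these three operations can be illustrated with a single-process mock. This is not the distributed implementation; the dictionary-based tuple values and integer keys are simplifications.

```python
import math

class LocalTupleStore:
    """Single-process mock of the SwarmMesh API semantics: tuples are
    (key, value) pairs whose value carries an (x, y) location."""

    def __init__(self):
        self.tuples = {}  # key -> value dict with "x", "y", "label", ...

    def store(self, key, value):
        self.tuples[key] = value

    def get(self, x, y, r):
        """All tuples within radius r of (x, y)."""
        return {k: v for k, v in self.tuples.items()
                if math.hypot(v["x"] - x, v["y"] - y) <= r}

    def erase_except_tuple(self, x, y, r, keep_keys):
        """Remove tuples within radius r of (x, y), keeping keep_keys."""
        doomed = [k for k in self.get(x, y, r) if k not in keep_keys]
        for k in doomed:
            del self.tuples[k]
```

In the consolidation routine of Section IV-D, erase_except_tuple() is what replaces a pile of raw votes with the single consolidated annotation.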

IV-C Annotation Consolidation Through Plurality Voting

Plurality vote. Upon querying the shared memory for a particular location in space, the querying robot receives class annotations predicted by various robots. If the number of votes $n$ is greater than or equal to the minimum number of votes $V$, the robot aggregates these votes into a consolidated annotation $\bar{\lambda}$ for the object through plurality voting:

\bar{\lambda} = \operatorname*{argmax}_{\text{class}} \sum_{\nu=1}^{n} I(\lambda_\nu = \text{class})

where $I$ is the indicator function. In case of ties, the consolidated annotation is selected at random from the tied options.
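The consolidation rule, including the random tie-break, can be transcribed as:

```python
import random
from collections import Counter

def plurality_vote(labels, rng=random):
    """Consolidated annotation: the most frequent label among the
    retrieved votes, with ties broken uniformly at random."""
    counts = Counter(labels)
    top = max(counts.values())
    tied = [label for label, count in counts.items() if count == top]
    return rng.choice(tied)
```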

Ensemble Probability of Success. We seek to derive the probability of success for $n$ independent votes, each for one of $c$ classes, where $c \geq n$. We assume that class 1 is correct and that each vote is for class 1 with probability $p$. The probability of an incorrect vote is then $1-p$, and we make the simplifying assumption that each incorrect class is equally likely. We denote the total vote count for each class as a vector $z = (z_1, z_2, \ldots, z_c)$, where $z_i \in \{0, 1, \ldots, n\}$ and $z_1 + z_2 + \cdots + z_c = n$. This vector follows a multinomial distribution with probability mass function

\phi(z \,|\, n, p) = \binom{n}{z_1, \ldots, z_c} p^{z_1} \left(\frac{1-p}{c-1}\right)^{n-z_1}

where

\binom{n}{z_1, \ldots, z_c} = \frac{n!}{z_1! \cdots z_c!}

denotes the multinomial coefficient.
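The probability mass function transcribes directly, with $z_1$ stored first in the vote vector:

```python
from math import factorial, prod

def multinomial_coeff(counts):
    """n! / (z_1! ... z_c!) for a vote-count vector."""
    return factorial(sum(counts)) // prod(factorial(z) for z in counts)

def phi(z, p):
    """Multinomial pmf of the vote vector z = (z_1, ..., z_c): z_1 counts
    votes for the correct class (each cast with probability p); the
    remaining mass 1 - p is split evenly over the c - 1 wrong classes."""
    n, c = sum(z), len(z)
    return multinomial_coeff(z) * p ** z[0] * ((1 - p) / (c - 1)) ** (n - z[0])
```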

Without knowing the correct class, we can estimate it from an observed vector as the class with the most votes, whether a majority or a plurality. In the case where several classes are tied with the highest number of votes, we select one of those classes at random. Again considering that $z_1$ represents the true correct class, we use $z^*$ to denote a vector of vote counts for which $z_1 \geq z_i$ for all $i = 2, 3, \ldots, c$. The probability of identifying the correct class with $n$ total votes is then

p_{ens}(n, p) = \sum_{z^*} \phi(z^* \,|\, n, p) \, P(\mathrm{success} \,|\, z^*) \qquad \text{(4)}

since all other vectors $z$ result in a success probability of zero. Because the largest term in $z^*$ appears first and all incorrect classes are chosen with equal probability, we can express (4) using integer partitions.

Integer partitions. An integer partition $\xi$ is a nonincreasing sequence of positive integers $\xi_1 \geq \xi_2 \geq \cdots \geq \xi_\omega$. We say that $\xi$ is a partition of $n$, denoted $\xi \vdash n$, if $\sum_{i=1}^{\omega} \xi_i = n$, and we denote the set of all such $\xi$ by $\mathcal{P}$. The $\xi_i$ are called the parts of the partition, and we define the length of the partition $\ell(\xi) := \omega$. An alternative formulation is to consider the infinite sequence of multiplicities of each part, $\xi = (k_1, k_2, \ldots)$, where $k_j \in \{0, 1, \ldots\}$. Here, $\xi \vdash n$ if $\sum_{j=1}^{\infty} j k_j = n$, and the length $\ell(\xi) = \sum_{j=1}^{\infty} k_j = \omega$, since all but a finite number of the $k_j$ are zero [27]. For example, $\xi = (5, 3, 2, 1, 1)$ in part notation and $\xi = (2, 1, 1, 0, 1, 0, \ldots)$ in multiplicity notation represent the same partition of $n = 12$. We will refer to both the parts $\xi_i$ and the multiplicities $k_j$ of the same partition throughout.
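A minimal generator illustrates the two notations; here `multiplicities` returns the finite prefix $(k_1, \ldots, k_{\xi_1})$ of the multiplicity sequence, since all later entries are zero.

```python
def partitions(n, max_part=None):
    """Yield all integer partitions of n as nonincreasing part tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def multiplicities(partition):
    """Multiplicity k_j of each part size j = 1, ..., xi_1 (largest part)."""
    largest = partition[0]
    return [sum(1 for part in partition if part == j)
            for j in range(1, largest + 1)]
```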

Note that we can map each vector $z^*$ to an integer partition $g(z^*) = \xi$, where $\xi_1$ represents the number of votes for the correct class, and the other $\xi_i$ represent the positive numbers of votes for $\omega - 1$ incorrect classes. Since all incorrect classes are equally likely, each $z^*$ that maps to a particular $\xi$ yields the same probability of success. Thus,

p_{ens}(n, p) = \sum_{\xi \vdash n,\ \xi \in \mathcal{P}} \ \sum_{z^* : g(z^*) = \xi} \phi(z^* \,|\, n, p) \, P(\mathrm{success} \,|\, z^*)
= \sum_{\xi \vdash n,\ \xi \in \mathcal{P}} \left| g^{-1}(\{\xi\}) \right| \phi(\xi \,|\, n, p) \, P(\mathrm{success} \,|\, \xi) \qquad \text{(5)}

where $\left|g^{-1}(\{\xi\})\right|$ is the cardinality of the preimage of $\{\xi\}$, i.e., the number of $z^*$ that map to $\xi$.

To determine $\left|g^{-1}(\{\xi\})\right|$, we start with the $\binom{c-1}{\omega-1}$ combinations of incorrect classes and count the ways to assign vote counts with multiplicities $k_j$ to each. There are

\binom{\omega-1}{k_1, k_2, \ldots, k_{\xi_1}-1}

such assignments, where the $k_{\xi_1}-1$ derives from the fact that one of the spots for the largest possible vote count is taken by the correct class. Together, we have

\left|g^{-1}(\{\xi\})\right| = \binom{c-1}{\omega-1} \binom{\omega-1}{k_1, k_2, \ldots, k_{\xi_1}-1}

For the conditional probability $P(\mathrm{success} \,|\, \xi)$: when $\xi_1$ is the strictly largest part, the correct class is chosen with probability 1; otherwise, there is a $k_{\xi_1}$-way tie, from which a class is chosen at random. In both cases, the correct class is chosen with probability $P(\mathrm{success} \,|\, \xi) = 1/k_{\xi_1}$. Substituting into (5) and simplifying gives

p_{ens}(n, p) = \frac{1}{c} \sum_{\xi \vdash n,\ \xi \in \mathcal{P}} \binom{n}{\xi_1, \ldots, \xi_\omega} \binom{c}{\omega} \binom{\omega}{k_1, \ldots, k_{\xi_1}} p^{\xi_1} \left(\frac{1-p}{c-1}\right)^{n-\xi_1} \qquad \text{(6)}
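Eq. (6) can be evaluated numerically as sketched below; the partition generator is inlined to keep the block self-contained. As sanity checks, the formula reduces to $p$ for a single vote and, for $n = 3$ and $c = 2$, to the classical majority-of-three probability $p^3 + 3p^2(1-p)$.

```python
from math import comb, factorial, prod

def partitions(n, max_part=None):
    """Yield all integer partitions of n as nonincreasing part tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def p_ens(n, p, c):
    """Ensemble probability of success (Eq. 6): n votes, per-vote
    accuracy p for the correct class, c classes in total."""
    total = 0.0
    for xi in partitions(n):
        omega = len(xi)  # number of classes that received votes
        xi1 = xi[0]      # largest part: votes for the winning class
        # multiplicities k_j of each part size j = 1, ..., xi_1
        k = [sum(1 for part in xi if part == j) for j in range(1, xi1 + 1)]
        multinom_n = factorial(n) // prod(factorial(part) for part in xi)
        multinom_omega = factorial(omega) // prod(factorial(kj) for kj in k)
        # comb(c, omega) is 0 when omega > c, so such partitions vanish
        total += (multinom_n * comb(c, omega) * multinom_omega
                  * p ** xi1 * ((1 - p) / (c - 1)) ** (n - xi1))
    return total / c
```

Curves such as those in Figure 4 can be produced by evaluating `p_ens` over a range of vote counts for each per-class accuracy in Table I.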

IV-D Robot Local Routine

Robots each run a local procedure in a loop. We refer to this procedure as the control step and outline it from the perspective of robot $i$ in Algorithm 1. Every time step, the robot performs one iteration of this loop.

At the beginning of the procedure, the robot receives and deserializes messages from its neighbors. Certain messages are meant for specific recipients only (point-to-point) and get discarded by any other receiving robot during deserialization. The robot then computes and performs a motion increment according to a diffusion policy [21].

Next, the robot checks the output of its perception module if it is not in recording timeout. The recording timeout is a delay imposed to avoid making successive observations of the same object. If the perception module has detected an object, the robot creates a tuple and starts a store() operation on the shared memory. The payload of the tuple contains the object annotation and the bounding box location and dimensions.

If the robot is not in a querying timeout, it reviews the tuples stored in its memory by bounding box location. If the robot finds multiple tuples with the same location in its own local memory, it starts a get() query on SwarmMesh to retrieve all the tuples currently stored with that bounding box location. It then saves the metadata for the started query.

The robot goes through the queries it has started and checks whether it has stopped receiving replies for a period longer than a time threshold. If the number of retrieved tuples is larger than the minimum number of votes, the robot creates and writes a tuple into SwarmMesh with the consolidated annotation. It also starts an erase_except_tuple() request for all tuples with the given location except for the newly created tuple.

SwarmMesh routing at line 30 of Alg. 1 decides which requests and replies need to be routed and to which robot. A robot may have to send a reply to a request and continue propagating the request or route a neighbor’s reply or discard the request or reply. The details of the routing process are reported in [26].

Algorithm 1 Control step - message $\mu$, neighbors $\mathcal{N}_i$, requests $\mathcal{Q}$, replies $\mathcal{R}$
1: procedure Control_step
2:    receive $\{\mu_j\}_{j\in\mathcal{N}_i}$ ▷ RX from neighbors
3:    for each $j \in \mathcal{N}_i$ do
4:       $(j, \delta_j, \mathcal{Q}_j, \mathcal{R}_j) \leftarrow \mu_j$ ▷ Deserialization
5:    end for
6:    move() ▷ Diffusion motion
7:    if not in recording timeout then
8:       v $\leftarrow$ senseObject()
9:       if object detected then
10:          start recording timeout
11:          k $\leftarrow$ hash(v)
12:          $\mathcal{Q}_i \leftarrow \mathcal{Q}_i \cup$ store(k, v) ▷ Write label
13:       end if
14:    end if
15:    if not in querying timeout then
16:       if holds multiple tuples from same (x, y) then
17:          start querying timeout
18:          $\mathcal{Q}_i \leftarrow \mathcal{Q}_i \cup$ get(x, y, 0) ▷ Ask for labels
19:          save query information
20:       end if
21:    end if
22:    for each started query do
23:       if results ready and enough votes then
24:          v $\leftarrow$ plurality vote ▷ Consolidated label
25:          k $\leftarrow$ hash(v)
26:          $\mathcal{Q}_i \leftarrow \mathcal{Q}_i \cup$ store(k, v)
27:          $\mathcal{Q}_i \leftarrow \mathcal{Q}_i \cup$ erase_except_tuple(x, y, 0, k)
28:       end if
29:    end for
30:    $(\bar{\mathcal{Q}}_i, \bar{\mathcal{R}}_i) \leftarrow$ route($\mathcal{Q}_\bullet, \mathcal{R}_\bullet$) ▷ SwarmMesh routing
31:    $\mu_i \leftarrow (i, \delta_i, \bar{\mathcal{Q}}_i, \bar{\mathcal{R}}_i)$ ▷ Serialization
32:    send($\mu_i$) ▷ TX to neighbors
33: end procedure

V EVALUATION

We performed the evaluation of our approach using the physics-based ARGoS multi-robot simulator [28] with data from the SceneNN benchmark [24]. The main parameters of interest are the minimum number of votes $V$ and the number of robots $N$. The former relates to the ensemble accuracy of the map through Eq. (6) and Figure 4. The latter changes the scale of the distributed system and the degree of parallelization in collecting data.

V-A Simulation Parameters

Robots and environment. We study the influence of the number of robots $N$ by running experiments with 30, 60 and 90 robots. The robots have a communication range $C$ of 2 m and move at a forward speed of 5 cm/s. The robots diffuse and avoid obstacles in an $8 \times 8\,\mathrm{m}^2$ environment. Given these settings, the robot density is such that robots are within communication range most of the time, but a significant number of intermittent disconnections occur. We do not consider line-of-sight obstructions.

In order to run experiments representative of collective annotation in an indoor environment, we import 40 objects of 13 types from the SceneNN dataset; as a result, the objects are non-uniformly distributed in space. We made the following adjustments when importing scene 005 of the dataset: (i) we limited physical dimensions to fit in the viewing frustum; (ii) we lowered objects to the ground level; (iii) we rotated them to face towards the center of each "room"; (iv) we removed overlapping objects; (v) we re-labelled "unknown" objects by drawing from the list of object classes at random. Figure 3 shows the imported environment with these modifications.

Refer to caption
Figure 3: Storing map annotations collectively and reducing uncertainty through data fusion.

Classifier. We simulate the statistical behavior of the BGA-DGCNN classifier using data from the SceneNN dataset [24, 10]. The procedure to generate a posterior distribution is described in Section III-A3. Each observed annotation $\lambda_\nu$ in the simulation is obtained by drawing from this distribution.

Ensemble accuracy. In order to set $V$ according to a desired range of probabilistic accuracy, we compute the ensemble accuracy $p_{ens}(n, p)$ for each class of objects for a given number of votes $n$ using Equation (6). We use the per-class accuracy $p$ for BGA-DGCNN shown in Table I, as reported in [10]. Figure 4 shows the resulting curves. In our experiments, we vary the threshold $V$ in the range between the two vertical lines in Figure 4.

TABLE I: BGA-DGCNN classifier per-class accuracy (%) on the SceneNN PB-T50-RS dataset [10].
bin    cabinet  chair   desk   display  door    shelf
81.9   84.4     92.6    77.3   80.4     92.4    80.5
table  bed      pillow  sink   sofa     toilet  overall
74.1   72.7     78.1    79.2   91.0     79.7    75.7
Refer to caption
Figure 4: Ensemble probability of success per class of object. This graph is based on the per-class accuracies in Table I.

SwarmMesh parameters. We select the data structure parameters so as to maintain a reasonable load factor, i.e., the ratio of the data to store over the available storage capacity. We set the memory capacity $M_i$ of the robots to 20 tuples, with a target of 10 tuples for storage and an allowance of 10 for the routing queue. We set the step size of the hashing function $R(\lambda_\nu)$ to 5 (see Section IV-B).

V-B Mapping Performance

Effect of the number of robots. Using more robots distributes sensing spatially and leads to a more densely connected robotic network. As a result, we expect that increasing the number of robots $N$ speeds up the collection and aggregation of the data needed to complete the semantic map. In our setting, map coverage refers to the ratio of covered objects to the total number of objects. We distinguish between two types of coverage: (i) the observation coverage, which considers an object covered if at least one robot annotated it; (ii) the consolidation coverage, which considers an object covered if there exists a consolidated annotation $\bar{\lambda}$ for it in the shared memory. Figure 5 shows the temporal evolution of both types of coverage across different values of $N$ with a vote threshold $V = 3$. The map coverage curves rise faster with a higher number of robots. The consolidation coverage rises more slowly than the observation coverage, since it depends on the consolidation process that queries the shared memory as described in Section IV-D. With $N = 90$, all annotations are consolidated within 14 s of the time all objects have been observed.

Figure 5: Map coverage over time for V = 3 and N ∈ {30, 60, 90}. Cursors show the time when the maximum coverage was first reached.

Effect of the number of votes. The observation coverage is independent of V. Figure 6 shows the map accuracy and the consolidation coverage over time across values of V. The experimental map accuracy is the ratio of correct consolidated annotations to the total number of consolidated annotations. We observe that the map accuracy increases with V, at the cost of slower coverage.

Figure 6: Map accuracy and coverage over time for N = 60 and V ∈ {3, 4, 5, 6}.

V-C Memory-Related Performance

Distributed storage cost. Figure 7 shows the total storage cost (1a) of the tuples-to-robots assignment realized in practice over simulation time, together with the optimal and worst-case curves. At every time step, the tuple set 𝒯 and the number of neighbors |𝒩_i| of each robot are updated. In our evaluation, we consider tuples of unit volume (v_τ = 1 for all τ) and a homogeneous memory capacity across robots (M_i = M for all i). We calculate the cost obtained in simulation using the heuristic tuple-to-robot assignment rule of SwarmMesh. In order to identify cases in which tuples are assigned to robots with zero |𝒩_i| and/or zero m_i, we cap each term in the sum to 1. These cases result in jumps of 1 in the cost in Figure 7.
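The capped cost terms can be sketched as below. This is an illustration, not the paper's Eq. (1a), which is not reproduced here; we assume each storing robot i contributes a term 1/(|𝒩_i| · m_i), with m_i taken to be its free memory slots, and the term capped at 1:

```python
def storage_cost(neighbor_counts, free_slots):
    """Total storage cost over the robots currently holding tuples.

    Each term 1 / (|N_i| * m_i) is capped at 1, so a robot with no
    neighbors or no free memory contributes a jump of exactly 1.
    """
    cost = 0.0
    for n_i, m_i in zip(neighbor_counts, free_slots):
        if n_i == 0 or m_i == 0:
            cost += 1.0  # capped term: isolated or memory-full robot
        else:
            cost += min(1.0, 1.0 / (n_i * m_i))
    return cost

# Three robots: well connected, isolated, and memory-full
c = storage_cost([4, 0, 2], [5, 3, 0])  # 1/20 + 1 + 1 = 2.05
```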

The VSCBPP problem is NP-hard, but our experimental settings (v_τ = 1 for all τ, and M_i = M for all i) allow for optimally solving instances of the simulation with N = 30 and V = 3. We reduce the space of the exhaustive search by noting that permutations of the rows of (a_τi) lead to solutions of equivalent cost. We enumerate the integer partitions of |𝒯| with fewer than |ℐ| parts and in which every part is smaller than M. For each such partition, we determine the minimal cost using the rearrangement inequality for the product of 1/|𝒩_i| and 1/m_i:

\sum_{i\in\mathcal{I}}\cfrac{1}{|\mathcal{N}_{i}|}\cdot\cfrac{1}{m_{i}}\;\geq\;\cfrac{1}{|\mathcal{N}_{\sigma(1)}|}\cdot\cfrac{1}{m_{\pi(1)}}\,+\dots+\cfrac{1}{|\mathcal{N}_{\sigma(|\mathcal{I}|)}|}\cdot\cfrac{1}{m_{\pi(|\mathcal{I}|)}}

where σ(·) is an ascending ordering of |𝒩_i|, and π(·) is a descending ordering of m_i.
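The two ingredients of this search can be sketched as follows: `partitions` enumerates the integer partitions of |𝒯| and `min_pairing_cost` applies the rearrangement inequality to a fixed choice of storing robots and free-memory values. The function names and the use of free-memory slots for m_i are illustrative assumptions:

```python
def partitions(n, max_parts, max_part):
    """Yield the integer partitions of n with at most max_parts parts,
    each part no larger than max_part (parts in non-increasing order)."""
    def rec(remaining, largest, parts):
        if remaining == 0:
            yield list(parts)
            return
        if len(parts) == max_parts:
            return
        for p in range(min(remaining, largest), 0, -1):
            parts.append(p)
            yield from rec(remaining - p, p, parts)
            parts.pop()
    yield from rec(n, max_part, [])

def min_pairing_cost(neighbor_counts, free_slots):
    """Lower bound on the cost for one partition, via the rearrangement
    inequality: pair robots in ascending order of |N_i| (descending 1/|N_i|)
    with free-memory values in descending order (ascending 1/m_i)."""
    inv_n = sorted((1.0 / n for n in neighbor_counts), reverse=True)  # sigma
    inv_m = sorted(1.0 / m for m in free_slots)                       # pi
    return sum(a * b for a, b in zip(inv_n, inv_m))
```

For example, with neighbor counts [1, 2] and free memories [10, 5], the minimal pairing matches the poorly connected robot with the emptier memory, giving a lower sum than the opposite pairing.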

Given |𝒩_i| and 𝒯, we compute the worst-case cost by assigning one tuple to each robot with |𝒩_i| = 0, filling the memory (m_i = 0) of as many robots as possible given the number of tuples |𝒯|, and placing the remaining tuples in the robot with the lowest non-zero |𝒩_i|.
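This construction can be sketched directly; the tie-breaking (which well-connected robots get filled first) is not specified in the text, so the choice below of filling the best-connected robots first, leaving the lowest-|𝒩_i| robot for the remainder, is an illustrative assumption:

```python
def worst_cost(neighbor_counts, num_tuples, M):
    """Worst-case cost following the construction in the text, assuming
    unit-volume tuples, homogeneous capacity M, and capped terms equal to 1."""
    cost, remaining = 0.0, num_tuples
    isolated = sum(1 for n in neighbor_counts if n == 0)
    # Best-connected first, so the lowest-|N_i| robot is left for the remainder
    connected = sorted((n for n in neighbor_counts if n > 0), reverse=True)
    # One tuple per robot with |N_i| = 0: each capped term contributes 1
    used = min(isolated, remaining)
    cost += used
    remaining -= used
    # Fill the memory (m_i = 0) of as many robots as possible: capped terms
    while remaining >= M and connected:
        connected.pop(0)
        cost += 1.0
        remaining -= M
    # Remaining tuples go to the robot of lowest non-zero |N_i|
    if remaining > 0 and connected:
        cost += 1.0 / (connected[-1] * (M - remaining))
    return cost
```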

Figure 7: VSCBPP cost over simulation time for V = 3 and N = 30.

SwarmMesh dimensioning. Figure 8 validates our selection of the SwarmMesh parameters. It shows that the bias of the NodeID distribution is greater than that of the distribution of tuple hashes. This confirms our insight that uncertain annotations should correspond to higher hash values: it is likely that a robot with a NodeID greater than a given tuple hash exists. We keep the parameters constant across numbers of robots. This means that for higher swarm sizes N, the collective memory capacity N·M is over-sized and the key partitioning matches tuples to robots randomly. In practice, the robot writing a tuple then typically keeps it locally, which reduces the communication overhead at greater swarm sizes.

Figure 8: Histograms of data hashes and NodeIDs for V = 6 and N ∈ {30, 60, 90}.

V-D Communication Load

Effect of the number of votes. We find that the number of bytes sent increases with the minimum number of votes V (Figure 9). This indicates that an increase in map accuracy comes at the cost of a higher communication overhead.

Figure 9: Bytes sent per second per robot over time for N = 60 and V ∈ {3, 5}.

Effect of the number of robots. Figure 10 shows the bandwidth usage per robot across different values of N in the configuration with the largest amount of data to manage (V = 6). The communication overhead decreases as N increases because we keep the same shared-memory parameters across values of N. This is consistent with the effect described in the discussion of SwarmMesh dimensioning.

Figure 10: Bytes sent per second per robot over time for V = 6 and N ∈ {30, 60, 90}.

VI CONCLUSIONS

We proposed a method for the distributed storage and fusion of semantic annotations across a swarm of robots. We considered robots with limited memory, each running a pre-trained classifier to annotate objects in the environment. We quantified the costs of storing a semantic map and of reducing its uncertainty in terms of mapping performance, memory usage, and communication overhead, enabling users of this framework to make informed decisions about the trade-offs between these key performance metrics. In future work, we will study the aggregation of the outputs of different types of classifiers to produce richer semantic maps. We will also assess the performance of our approach on real robots.

ACKNOWLEDGMENTS

This work was funded by a grant from Amazon Robotics. This research was performed using computational resources supported by the Academic & Research Computing group at Worcester Polytechnic Institute.

References

  • [1] G. Valentini, D. Brambilla, H. Hamann, and M. Dorigo, “Collective Perception of Environmental Features in a Robot Swarm,” in Swarm Intelligence.   Cham: Springer International Publishing, 2016, pp. 65–76.
  • [2] J. T. Ebert, M. Gauci, F. Mallmann-Trenn, and R. Nagpal, “Bayes bots: Collective bayesian decision-making in decentralized robot swarms,” in 2020 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2020, pp. 7186–7192.
  • [3] L. Riazuelo, M. Tenorth, D. Di Marco, M. Salas, D. Gálvez-López, L. Mösenlechner, L. Kunze, M. Beetz, J. D. Tardós, L. Montano et al., “Roboearth semantic mapping: A cloud enabled knowledge-based approach,” IEEE Transactions on Automation Science and Engineering, vol. 12, no. 2, pp. 432–443, 2015.
  • [4] R. Khodayi-mehr, Y. Kantaros, and M. M. Zavlanos, “Distributed state estimation using intermittently connected robot networks,” IEEE Transactions on Robotics, vol. 35, no. 3, pp. 709–724, 2019.
  • [5] N. Atanasov, J. Le Ny, K. Daniilidis, and G. J. Pappas, “Decentralized active information acquisition: Theory and application to multi-robot slam,” in 2015 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2015, pp. 4775–4782.
  • [6] K. Y. Leung, T. D. Barfoot, and H. H. Liu, “Decentralized cooperative slam for sparsely-communicating robot networks: A centralized-equivalent approach,” Journal of Intelligent & Robotic Systems, vol. 66, no. 3, pp. 321–342, 2012.
  • [7] R. Polikar, Ensemble Learning, C. Zhang and Y. Ma, Eds.   Boston, MA: Springer US, 2012.
  • [8] N. Sünderhauf, O. Brock, W. Scheirer, R. Hadsell, D. Fox, J. Leitner, B. Upcroft, P. Abbeel, W. Burgard, M. Milford et al., “The limits and potentials of deep learning for robotics,” The International Journal of Robotics Research, vol. 37, no. 4-5, pp. 405–420, 2018.
  • [9] L. I. Kuncheva, Combining pattern classifiers: methods and algorithms.   John Wiley & Sons, 2014.
  • [10] M. A. Uy, Q.-H. Pham, B.-S. Hua, T. Nguyen, and S.-K. Yeung, “Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 1588–1597.
  • [11] G. Valentini, E. Ferrante, and M. Dorigo, “The best-of-n problem in robot swarms: Formalization, state of the art, and novel perspectives,” Frontiers in Robotics and AI, vol. 4, p. 9, 2017. [Online]. Available: http://journal.frontiersin.org/article/10.3389/frobt.2017.00009
  • [12] M. Crosscombe, J. Lawry, S. Hauert, and M. Homer, “Robust distributed decision-making in robot swarms: Exploiting a third truth state,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2017, pp. 4326–4332.
  • [13] A. Mitra, J. A. Richards, S. Bagchi, and S. Sundaram, “Resilient distributed state estimation with mobile agents: overcoming byzantine adversaries, communication losses, and intermittent measurements,” Autonomous Robots, vol. 43, no. 3, pp. 743–768, 2019.
  • [14] A. Tahbaz-Salehi and A. Jadbabaie, “On consensus over random networks,” in 44th Annual Allerton Conference.   Citeseer, 2006.
  • [15] L. Xiao, S. Boyd, and S.-J. Kim, “Distributed average consensus with least-mean-square deviation,” Journal of Parallel and Distributed Computing, vol. 67, no. 1, pp. 33–46, Jan. 2007.
  • [16] H. J. Leblanc, H. Zhang, X. Koutsoukos, and S. Sundaram, “Resilient asymptotic consensus in robust networks,” IEEE Journal on Selected Areas in Communications, vol. 31, no. 4, 2013.
  • [17] D. Albani, D. Nardi, and V. Trianni, “Field coverage and weed mapping by uav swarms,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2017, pp. 4319–4325.
  • [18] P.-Y. Lajoie, B. Ramtoula, Y. Chang, L. Carlone, and G. Beltrame, “Door-slam: Distributed, online, and outlier resilient slam for robotic teams,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 1656–1663, 2020.
  • [19] J. P. Queralta, J. Taipalmaa, B. C. Pullinen, V. K. Sarker, T. N. Gia, H. Tenhunen, M. Gabbouj, J. Raitoharju, and T. Westerlund, “Collaborative multi-robot systems for search and rescue: Coordination and perception,” arXiv preprint arXiv:2008.12610, 2020.
  • [20] J.-R. Ruiz-Sarmiento, C. Galindo, and J. Gonzalez-Jimenez, “Building multiversal semantic maps for mobile robot operation,” Knowledge-Based Systems, vol. 119, pp. 257–272, 2017.
  • [21] A. Howard, M. Mataric, and G. Sukhatme, “Mobile sensor network deployment using potential fields: A distributed, scalable solution to the area coverage problem,” in DARS, 2002.
  • [22] A. Mousavian, D. Anguelov, J. Flynn, and J. Kosecka, “3d bounding box estimation using deep learning and geometry,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7074–7082.
  • [23] H. Carrillo, K. H. Brodersen, and J. A. Castellanos, “Probabilistic performance evaluation for multiclass classification using the posterior balanced accuracy,” in ROBOT2013: First Iberian Robotics Conference.   Springer, 2014, pp. 347–361.
  • [24] B.-S. Hua, Q.-H. Pham, D. T. Nguyen, M.-K. Tran, L.-F. Yu, and S.-K. Yeung, “Scenenn: A scene meshes dataset with annotations,” in International Conference on 3D Vision (3DV), 2016.
  • [25] T. G. Crainic, G. Perboli, W. Rei, and R. Tadei, “Efficient lower bounds and heuristics for the variable cost and size bin packing problem,” Computers & Operations Research, vol. 38, no. 11, pp. 1474–1482, 2011.
  • [26] N. Majcherczyk and C. Pinciroli, “SwarmMesh: A distributed data structure for cooperative multi-robot applications,” IEEE International Conference on Robotics and Automation, 2020.
  • [27] G. E. Andrews, The theory of partitions.   Cambridge university press, 1998, no. 2.
  • [28] C. Pinciroli, V. Trianni, R. O’Grady, G. Pini, A. Brutschy, M. Brambilla, N. Mathews, E. Ferrante, G. Di Caro, F. Ducatelle, M. Birattari, L. M. Gambardella, and M. Dorigo, “ARGoS: a modular, parallel, multi-engine simulator for multi-robot systems,” Swarm Intelligence, vol. 6, no. 4, pp. 271–295, 2012.