
Brain Inspired Probabilistic Occupancy Grid Mapping with Hyperdimensional Computing

Shay Snyder George Mason University
Fairfax, USA
[email protected]
   Andrew Capodieci
Neya Systems, LLC
Warrendale, USA
[email protected]
   David Gorsich
US Army Futures Command
Warren, USA
[email protected]
   Maryam Parsa George Mason University
Fairfax, USA
[email protected]
Abstract

Real-time robotic systems require advanced perception, computation, and action capabilities. However, the main bottleneck in current autonomous systems is the trade-off between computational capability, energy efficiency, and model determinism. World modeling, a key objective of many robotic systems, commonly uses occupancy grid mapping (OGM) as the first step towards building an end-to-end robotic system with perception, planning, autonomous maneuvering, and decision-making capabilities. OGM divides the environment into discrete cells and assigns probability values to attributes such as occupancy and traversability. Existing methods fall into two categories: traditional methods and neural methods. Traditional methods rely on dense statistical calculations, while neural methods employ deep learning for probabilistic information processing. Recent works formulate a deterministic theory of neural computation at the intersection of cognitive science and vector symbolic architectures. In this study, we propose a Fourier-based hyperdimensional OGM system, VSA-OGM, combined with a novel application of Shannon entropy that retains the interpretability and stability of traditional methods along with the improved computational efficiency of neural methods. Our approach, validated across multiple datasets, achieves accuracy similar to covariant traditional methods while reducing latency by approximately 200x and memory by approximately 1000x. Compared to invariant traditional methods, we achieve similar accuracy while reducing latency by 3.7x. Moreover, we achieve a 1.5x latency reduction compared to neural methods while eliminating the need for domain-specific model training.

Index Terms:
hyperdimensional computing, vector symbolic architectures, probabilistic robotics, occupancy grid mapping

I Introduction

Since the introduction of Unimate [1] in 1954, the field of robotics has experienced exponential growth, leading to the development of numerous specialized fields [2, 3, 4]. A major area, known as probabilistic robotics, focuses on understanding how autonomous systems perform despite uncertainty in tasks such as occupancy grid mapping (OGM), perception, and control, each of which has its own unique challenges [5]. An essential capability for real-time robotic systems is the accurate and computationally efficient probabilistic mapping of dynamic environments. Efficient probabilistic mapping is the central idea of OGM, where environments are discretized into individual cells, also known as voxels [6]. Each cell is then assigned labels or properties such as occupancy, traversability, or other relevant attributes based on specific application requirements [7].

DISTRIBUTION A. Approved for public release; distribution unlimited. OPSEC #8540.

OGM methods are broadly categorized into two main areas: traditional and neural methods. Traditional methods [5, 8, 9, 10, 11] adopt a dense approach where each voxel is assigned a fully defined probability distribution. In contrast, neural methods, such as [12, 13, 14, 15, 16], utilize deep learning to create a black-box solution, which mitigates many of the computational challenges associated with traditional methods but introduces fragility to domain adaptation and model stochasticity [17]. Although neural methods offer significant computational advantages over traditional approaches, their intrinsic stochasticity and lack of interpretability within deep latent spaces pose significant validation challenges, particularly in safety-critical applications [18].

Recent advances in cognitive science [19] and computational neuroscience [20] highlight the unique computational properties of hyperdimensional spaces. Also known as vector symbolic architectures (VSAs), these methods are actively being explored as mathematical analogs to the higher-order computational capabilities of spiking neurons [21]. Furthermore, VSA approaches possess unique computational properties allowing them to statistically, and deterministically, encode semantic and spatial information [22].

In this work, we introduce a Fourier-based VSA framework, known as VSA-OGM, leveraging the unique probabilistic properties of bio-inspired hyperdimensional vectors to efficiently and accurately perform OGM. Our approach draws inspiration from both theoretical [5] and neural [12] methods, aiming to strike a balance between the accuracy, interpretability, stability, and domain flexibility of traditional methods and the computational efficiency of neural methods. Moreover, VSA-OGM employs a novel application of Shannon entropy [23] to extract feature-rich information from the internal holographically reduced representations. As such, this work stands as a foundational contribution at the intersection of traditional and neural methods, an area known as quasi-neural methods.

Existing works on VSAs for real-time robotic systems focus on robotic path integration [24] and simplified mapping scenarios [22], rather than large-scale, real-world mapping applications. To the best of our knowledge, this work is the first large-scale application and development of a VSA framework for real-time occupancy grid mapping.

Throughout the remainder of this work, we provide further background information on the fields of OGM and VSAs. We discuss the specifics of VSA-OGM and highlight the performance of our framework on three synthetic datasets [10, 25] and one real dataset [25] compared to traditional [10, 8] and neural [12] methods. We also conduct an in-depth ablation study providing further insight into the capabilities and unique aspects of VSA-OGM. We conclude with a detailed discussion of our results and potential avenues for future research. In summary, the major contributions of our paper are:

  • We introduce a quasi-neural VSA framework for OGM, VSA-OGM, with a novel application of Shannon entropy.

  • Compared to covariant traditional methods [10] across two simulated [10] and one real-world [25] dataset, our VSA framework provides approximately 1000x memory and 200x latency reductions while maintaining comparable accuracy.

  • We demonstrate the single-agent and multi-agent mapping capabilities of VSA-OGM compared to an invariant traditional method [8] where we reduce latency by approximately 4x.

  • We highlight the capabilities of our framework compared to neural methods [12], reducing latency by approximately 1.5x without model pretraining while maintaining comparable accuracy.

  • We conduct a detailed ablation study highlighting the performance implications of VSA-OGM across a range of hyperparameter combinations.

II Background & Motivation

We begin this section with a background on vector symbolic architectures (VSAs), their application in cognitive neuroscience, and ongoing research initiatives. We continue with a discussion of the details of occupancy grid mapping (OGM) and its breakdown into traditional, neural, and quasi-neural methods. Moreover, we discuss how VSAs could serve as a brain-inspired, quasi-neural bridge providing the high accuracy of traditional methods and the computational efficiency of neural methods.

II-A Vector Symbolic Architectures (VSAs)

Figure 1: (A) The increasing orthogonality between pairs of randomly generated vectors as their dimensionality increases. (B) The dot product between two Spatial Semantic Pointers (SSPs) serving as a quasi kernel density estimator [26].

VSAs, also known as Hyperdimensional Computing, were introduced as an algorithmic abstraction of the higher-order computational properties of connectionist models [19]. VSA systems leverage the quasi-orthogonality property that arises as the dimensionality of independent and identically distributed vectors approaches infinity [19]. Figure 1A provides a visualization of this process. The individual elements within hypervectors depend upon the specific type of VSA but generally consist of binary values, binary spatter codes, or Fourier coefficients [27]. This flexibility highlights the diversity of the VSA literature, where each variant has unique properties for specific applications.
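As a quick numerical illustration (our own NumPy sketch, not code from the paper), the cosine similarity between pairs of random i.i.d. vectors concentrates around zero as dimensionality grows, which is the quasi-orthogonality that Figure 1A depicts:

```python
import numpy as np

# Illustrative sketch (not from VSA-OGM): the mean absolute cosine similarity
# between random i.i.d. vectors shrinks as dimensionality increases.
rng = np.random.default_rng(0)

def mean_abs_cosine(d, pairs=200):
    a = rng.standard_normal((pairs, d))
    b = rng.standard_normal((pairs, d))
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return float(np.mean(np.abs(cos)))

low_d, high_d = mean_abs_cosine(16), mean_abs_cosine(4096)
# Higher dimensionality -> random pairs are closer to orthogonal on average.
assert high_d < low_d
```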

VSA systems utilize two primary operations: binding (\otimes) and bundling (+). Some implementations also incorporate a permutation operator (\sqcap). Binding, analogous to multiplication, combines two or more input vectors into an invertible representation that is distinct from each input. Binding also has other notable mathematical properties:

  • Self-Identity: x \otimes x = 1

  • Associativity: (x \otimes y) \otimes z = x \otimes (y \otimes z)

  • Invertibility: x \otimes y \otimes x^{-1} = y + \epsilon
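These properties follow directly when binding is implemented as circular convolution over FHRR-style unitary vectors. The sketch below (our construction; `unitary`, `bind`, and `inverse` are hypothetical helper names) numerically verifies associativity and invertibility, the two properties that hold exactly for unit-magnitude Fourier coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 256

def unitary(d):
    # FHRR-style vector: unit-magnitude Fourier coefficients with random phases.
    return np.fft.ifft(np.exp(1j * rng.uniform(-np.pi, np.pi, d)))

def bind(a, b):
    # Binding as circular convolution: element-wise product in Fourier domain.
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))

def inverse(a):
    # Pseudo-inverse of a unitary vector: conjugate its Fourier coefficients.
    return np.fft.ifft(np.conj(np.fft.fft(a)))

x, y, z = unitary(d), unitary(d), unitary(d)
assert np.allclose(bind(bind(x, y), z), bind(x, bind(y, z)))  # associativity
assert np.allclose(bind(bind(x, y), inverse(y)), x)           # invertibility
```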

Bundling, analogous to addition, combines multiple input vectors into the superposition of each where the commutative and associative properties apply:

v = a \otimes x + b \otimes y. (1)

Lastly, permutation preserves the order of role and filler pairs (associativity rules) by shifting vector dimensions:

v = a \otimes x + \sqcap(b \otimes y). (2)

In this work, we focus on a specific subset of VSAs based on Holographic Reduced Representations (HRRs) [19]. HRRs were pivotal in advancing the understanding of biological cognition within the mammalian brain. Further research led to the emergence of Fourier Holographic Reduced Representations (FHRRs) [19]. FHRRs were instrumental in constructing the world’s largest full-scale brain model, Spaun 2.0 [20], and have been applied in a variety of small-scale experiments in robotics and engineering applications [24, 28]. This paper goes beyond these previous works by developing a novel VSA framework that performs OGM with large-scale, real-world data. Also known as the semantic pointer architecture with Spatial Semantic Pointers (SSPs) [20], FHRR-based implementations have unique computational properties that allow them to encode probabilistic and continuous spatial information [29, 22]. Figure 1B provides a visualization of this process. More information on SSPs is provided in Section III.

II-B Occupancy Grid Mapping (OGM)

OGM discretizes continuous, n-dimensional environments into discrete hypercubes, commonly known as voxels [7]. Each voxel is then assigned an array of attributes. These attributes range from a single attribute indicating occupancy, to multiple attributes such as temperature, traversability, and semantic labels. The field of OGM has evolved into various sub-fields, each characterized by unique operational attributes. Traditional methods, such as those detailed in [5, 9, 8], utilize dense arrays of statistical distributions to model the observed environment. In contrast, neural methods [12] leverage deep learning to create latent spaces that abstract probabilistic information into hidden representations. More recently, quasi-neural methods [7] have emerged, combining the most effective attributes of both traditional and neural methods. This approach aims to optimize performance while minimizing stochasticity in mission-critical and safety-sensitive applications. As such, our VSA framework, VSA-OGM, stands as a novel contribution to the field of quasi-neural OGM methods. We explore each area in detail in the following sections.

TABLE I: Comparing the determinism, accuracy, runtime complexity, and space complexity of traditional, neural, and quasi-neural occupancy grid mapping approaches
                     Traditional   Neural   Quasi-Neural
Determinism          Very High     Low      High
Accuracy             Very High     High     High
Runtime Complexity   O(n^3)        Varies   Constant
Space Complexity     O(n^2)        Varies   Constant

II-B1 Traditional Methods

Initially introduced in [9], traditional methods represent the most extensively documented and comprehensively understood approaches for OGM. These methods are characterized by their use of dense arrays of individually parameterized statistical distributions. Moreover, these methods are known for high levels of accuracy with a corresponding high level of determinism. The most well-known traditional method, described in [9], estimates voxel occupancy through a geometric sensor model with a conditional Bayesian update. Building on this approach, [5] extends beyond the traditional approach by introducing a forward sensor model that generates statistical maps with the highest probability of generating the prior observations.

The major limitation of traditional methods is the significant computational and space cost of learning dense statistical distributions, with time and space complexities of O(n^3) and O(n^2), respectively. More recent advancements, such as [10, 11], utilize Bayesian Hilbert Maps (BHMs). BHMs reduce the runtime complexity of traditional methods by eliminating the need to retain previously observed points and by eliminating regularization parameter optimization. Although Fast-BHM [8] managed to lower the runtime complexity of the methods developed by Elfes and Thrun [9, 5] to sub-quadratic levels, runtime complexity continues to pose a significant challenge.

II-B2 Neural Methods

Neural methods, also known as neural implicit representations, leverage deep learning and back-propagation to map perceptions into implicit representations [12]. Various approaches are being explored in this domain, including Neural Radiance Fields (NeRF) [13], occupancy networks [14], as well as traditional convolutional and recurrent neural networks [15, 16]. While neural methods have shown remarkable capabilities, their implicit operational characteristics pose challenges in terms of validation for safety-critical applications and knowledge transferability across diverse domains [7].

II-B3 Quasi-Neural Methods

Quasi-neural methods emerge at the intersection of traditional methods and neural methods, aiming to harness the increased computational efficiency of neural methods while retaining the accuracy and interpretability of traditional methods. One notable quasi-neural method is [7], where a shallow convolutional neural network (CNN) and a sparse kernel are combined to update a three-dimensional semantic occupancy map by post-processing previously labelled semantic point clouds. A major limitation of this work lies in the shallow CNN architecture, which necessitates retraining under domain shifts. Taking inspiration from recent works in cognitive science [19] and connectionist models [20], VSA-OGM is a foundational approach to quasi-neural OGM based on biologically inspired hyperdimensional mathematics.

III Hyperdimensional Occupancy Grid Mapping

We begin with an overview of encoding statistical and spatial relationships within hyperdimensional space in Section III-A. In Section III-B, we define our hyperdimensional occupancy grid mapping framework and discuss how point cloud data is encoded, processed, and subsequently decoded.

TABLE II: The mathematical symbols used to describe our framework.
Symbol            Meaning
\otimes           circular convolution
+                 element-wise addition
d                 vector dimensionality
n                 environmental dimensionality
\phi              Spatial Semantic Pointer (SSP)
\phi(x)           x \in \mathbb{R} encoded as an SSP
\phi^{-1}         pseudo-inverse of \phi
\mathcal{F}       Discrete Fourier Transform (DFT)
\mathcal{F}^{-1}  inverse DFT
e^{ia}            complex phasor
|\bullet|         vector magnitude
l                 length scale
\lambda           point cloud
t                 time step
\gamma            n-dimensional coordinate vector
\psi              attribute vector
\delta            number of quadrants per dimension
\omega            quadrant memory vector
\beta             singular quadrant
\mu               center coordinate of \beta
r                 disk filter radius
\mathcal{S}       Shannon entropy
\rho              decision threshold
\Omega            set of global memory vectors

III-A Hyperdimensional Computation

Our hyperdimensional occupancy grid mapping framework, VSA-OGM, is based on Spatial Semantic Pointers (SSPs) [20] for their unique probabilistic properties and ability to encode real-valued data. More information on the background of SSPs can be found in Section II-A. Each dimension of a d-dimensional SSP, \phi, is defined as a complex phasor:

\phi = \mathcal{F}^{-1}[e^{ia}_{0}, e^{ia}_{1}, e^{ia}_{2}, ..., e^{ia}_{d}], (3)

where \mathcal{F}^{-1} is the inverse Fast Fourier Transform [30]. A major requirement of this framework is that each phase must be uniformly distributed, with the entire vector having unit length in the Fourier domain [19]: |\mathcal{F}\{\phi\}_{d}| = 1.

Many VSA systems [28] encode classes of objects into random vectors that are dissimilar and cannot capture continuous spatial information. Using SSPs as the basis of VSA-OGM, we are able to encode real-valued spatial information through complex element-wise exponentiation, also commonly known as fractional binding [29] [22]:

\phi(x) = \mathcal{F}^{-1}\{\mathcal{F}\{\phi\}^{x}\}; \quad x \in \mathbb{R}. (4)
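A minimal sketch of fractional binding (our code, following Equation 4, with `encode` and `bind` as hypothetical helper names): raising each Fourier phasor to a real power x encodes the scalar x, and because phases add under binding, \phi(a) \otimes \phi(b) = \phi(a+b):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 512
theta = rng.uniform(-np.pi, np.pi, d)  # phases of the base vector phi

def encode(x):
    # Fractional binding (Eq. 4): raise each Fourier phasor to the power x.
    return np.fft.ifft(np.exp(1j * theta * x))

def bind(a, b):
    # Binding via circular convolution.
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))

# Phases add under binding, so phi(0.7) bound with phi(1.6) equals phi(2.3).
assert np.allclose(bind(encode(0.7), encode(1.6)), encode(0.7 + 1.6))
```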

All fundamental VSA operations are compatible with this framework where grouping multiple vectors together, bundling, is achieved through element-wise addition in the time domain:

\phi(z) = \phi(x) + \phi(y). (5)

The binding operation uses circular convolution to assign vectors with classes or attributes as shown in Equation 6. This process is commonly referred to as slot and filler representations with SSPs [20].

\phi(x) \otimes \phi(y) = \mathcal{F}^{-1}\{\mathcal{F}\{\phi(x)\} \odot \mathcal{F}\{\phi(y)\}\} (6)

Circular convolution is invertible: \phi(x) can be approximately decoded from \phi(x) \otimes \phi(y) by binding the resultant vector with the pseudo-inverse of \phi(y):

(\phi(x) \otimes \phi(y)) \otimes \phi(y)^{-1} = \phi(x) + \epsilon, (7)

where ϵ\epsilon represents noise.

Moreover, two vectors can be compared through a dot product. SSPs have a unique computational property where this dot product approaches a true sinc function as the vector dimensionality d approaches infinity [26]. For example, given x = 0 encoded as \phi(x) and the estimate of x encoded as \phi(x)^{\prime}, the pseudo kernel density estimator k(x, x^{\prime}) is calculated as: k(x, x^{\prime}) = \phi(x) \cdot \phi(x)^{\prime}. A visualization of the decoded kernel is shown in Figure 1. Kernel width can be adjusted to specific values by modifying Equation 4 to

\phi(x) = \mathcal{F}^{-1}\{\mathcal{F}\{\phi\}^{(x/l)}\}; \quad x \in \mathbb{R}, (8)

where l is the length scale parameter; there is a positive correlation between l and the kernel width. Despite a lack of widespread use, sinc kernels have proven to be more accurate, in terms of mean integrated squared error, than other kernels such as the Epanechnikov kernel [31].
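The kernel behavior can be checked numerically. In this sketch (our code, reusing the phase-based construction from Equations 3 and 8; `encode` and `sim` are hypothetical helper names), the dot product is high near the encoded value, decays toward zero far from it, and widens as the length scale l grows:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4096
theta = rng.uniform(-np.pi, np.pi, d)

def encode(x, l=1.0):
    # Length-scaled fractional binding (Eq. 8).
    return np.fft.ifft(np.exp(1j * theta * (x / l)))

def sim(x, x_query, l=1.0):
    # Dot-product similarity; approximates sinc(x - x_query) for large d.
    return float(np.real(np.vdot(encode(x_query, l), encode(x, l))))

assert sim(0.05, 0.0) > 0.9          # near the encoded value: high similarity
assert abs(sim(3.0, 0.0)) < 0.1      # far away: similarity falls toward zero
assert sim(3.0, 0.0, l=30.0) > 0.9   # a larger length scale widens the kernel
```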

III-B Building Occupancy Grids

III-B1 Data Preprocessing

Regardless of the perception capabilities of any given robotic system, its perspective of the world should be projected to a stable reference frame for accurate map building across individual or multiple agents. This projection is a non-trivial task with a rich collection of literature on the topic [32]. This process is commonly achieved through a linear transformation calculated from inertial measurement units and localization methods to ensure differentiability between successive data samples [33]. To remove any additional noise from these calculations in our experiments, all data generated within simulated environments are natively located in the global reference frame. Moreover, all real-world datasets are preprocessed into the global reference frame as in [8].

III-B2 Data Format

Robotic systems sample multiple point clouds \{\lambda_0, \lambda_1, ..., \lambda_t\}, with the coordinates projected to the global reference frame at time step 0. For each t, \lambda_t consists of a point vector \gamma and an attribute vector \psi of labels. All experiments in this work limit the environmental dimensionality of \gamma to 2, but this can be generalized to support higher-dimensional problem spaces. We limit the dimensionality of \psi to 1, with binary values \psi \in \{0, 1\} specifying whether a voxel is occupied (\psi = 1) or empty (\psi = 0).

III-B3 Initialization

Multiple properties of the VSA system must be specified before any point clouds can be processed. The most critical parameter is the individual vector dimensionality d, which has a direct impact on the noise resilience and information capacity of the hypersphere [19]. We explore the impact of this parameter in Section IV-B1. The length scale l should be adjusted according to the desired map resolution. We explore the impact of this parameter in Section IV-B2. Encoding n-dimensional spatial information requires the specification of an axis vector \phi_{axis} per dimension. In the case of our 2D problem, we define axis vectors \phi_{axis}^{x} and \phi_{axis}^{y}.

III-B4 Space Discretization

Inspired by the recent understanding of chunking, where mammalian brains separate large-scale spatial phenomena into different portions of memory [34], we create an abstraction of this theory to separate the global space into a matrix of hypercubes. Specified by \delta, we split each dimension into \delta evenly spaced chunks. To further discretize the environment, each chunk is also separated on a class-by-class basis. As such, for a given experiment with \delta quadrants per dimension and 2 classes, there are \delta^{n} \cdot 2 memory vectors, \omega_i, each with dimensionality d:

\omega_1, \omega_2, ..., \omega_{\delta^{n} \cdot 2}. (9)

We explore the impact of this approach in Section IV-B3 where we evaluate multiple parameter values and determine the required VSA dimensionality.

III-B5 Encoding Point Clouds

For each time step t, a point cloud \lambda_t contains j points and labels in \gamma and \psi, respectively. Across all j, \gamma_i contains an n-dimensional vector of coordinates and \psi_i contains a vector of attributes. All results in this work assume \gamma_i is a 2D vector in the x-y plane and \psi_i contains one binary attribute specifying occupancy.

Each point \gamma_i in \lambda_t is encoded into a set of location vectors \{\phi_0, \phi_1, ..., \phi_j\} by fractionally binding along each individual dimension of \gamma_i and using circular convolution to bind the individual dimensions together into a single vector \phi_i with a variable length scale l, expanding Equation 6:

\phi_i = \mathcal{F}^{-1}\{\mathcal{F}\{\phi_{axis}^{x}\}^{\gamma_i^{x}/l} \odot \mathcal{F}\{\phi_{axis}^{y}\}^{\gamma_i^{y}/l}\}. (10)

This process is repeated for all j points in \gamma, resulting in a set of j location vectors representing each point in \lambda_t encoded into hyperdimensional space.
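A sketch of Equation 10 (our code; the phase-based construction and the helper names `frac` and `bind` are assumptions consistent with Equations 3 and 4): binding the fractionally encoded axis vectors is equivalent to summing the scaled phases in a single Fourier-domain step:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 512
theta_x = rng.uniform(-np.pi, np.pi, d)  # phases of phi_axis^x
theta_y = rng.uniform(-np.pi, np.pi, d)  # phases of phi_axis^y

def frac(theta, v, l):
    # Fractional binding of one coordinate to one axis vector.
    return np.fft.ifft(np.exp(1j * theta * v / l))

def bind(a, b):
    # Binding via circular convolution.
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))

x, y, l = 1.5, -2.0, 0.5
phi_i = bind(frac(theta_x, x, l), frac(theta_y, y, l))      # Eq. 10
joint = np.fft.ifft(np.exp(1j * (theta_x * x + theta_y * y) / l))
assert np.allclose(phi_i, joint)  # binding the axes == summing scaled phases
```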

Given the set of location vectors \{\phi_1, \phi_2, ..., \phi_j\} and their corresponding labels \{\psi_1, \psi_2, ..., \psi_j\} from \lambda_t, we follow Algorithm 1 to add the j points to the memory \beta corresponding to the correct quadrant and class. The process begins by calculating the center \mu of each quadrant. Looping over each point, we determine the quadrant center with the minimum Euclidean distance to the given point and add the point vector to that quadrant’s memory representing the point’s label \psi_i.

Algorithm 1 Adding point vectors to quadrant memories
Require: \{\omega_1, \omega_2, ..., \omega_{\delta^{n} \cdot 2}\} \leftarrow quadrant memory vectors
Require: \{\phi_1, \phi_2, ..., \phi_j\} \leftarrow points in hyperspace
Require: \{\psi_1, \psi_2, ..., \psi_j\} \leftarrow occupancy labels
Require: n \leftarrow environment dimensionality
Require: \delta \leftarrow quadrants per dimension
Require: \psi_i \in \{0, 1\}
Require: n \geq 1
Require: \delta \geq 1
Require: j \geq 1
  i \leftarrow 1
  \{\mu_1, ..., \mu_{\delta^{n}}\} \leftarrow getQuadrantCenters()
  while i \leq j do
     \beta \leftarrow \arg\min_{k} d_{L2}(\phi_i, \mu_k)
     \omega_{2\beta+\psi_i} \leftarrow \omega_{2\beta+\psi_i} + \phi_i
     i \leftarrow i + 1
  end while
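Algorithm 1 can be sketched in a few lines (our data layout and helper names, not the paper's code; here the nearest quadrant center is measured in the original coordinate space, and memories are indexed as 2\beta + \psi_i):

```python
import numpy as np

def add_points_to_quadrants(memories, encoded, coords, labels, centers):
    """Sketch of Algorithm 1: bundle each encoded point into the memory of
    its nearest quadrant center, with one memory per (quadrant, label) pair."""
    for phi_i, gamma_i, psi_i in zip(encoded, coords, labels):
        # Nearest quadrant center by Euclidean distance.
        beta = int(np.argmin(np.linalg.norm(centers - gamma_i, axis=1)))
        # Bundling = element-wise addition into memory 2*beta + psi_i.
        memories[2 * beta + psi_i] += phi_i
    return memories

# Toy usage: two quadrant centers (four memories), one occupied, one empty point.
centers = np.array([[0.0, 0.0], [10.0, 0.0]])
memories = np.zeros((4, 3))
encoded = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
coords = np.array([[1.0, 0.0], [9.0, 1.0]])
out = add_points_to_quadrants(memories, encoded, coords, [1, 0], centers)
```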

III-B6 Decoding

With all labeled points fully realized in each quadrant memory, we can query individual quadrant memories for specific information about the environment. For occupancy grid mapping, we are primarily concerned with two main questions: (1) what portions of the map are occupied, and (2) what portions of the map are unoccupied or empty. Given our fully specified quadrant memory vectors \{\omega_1, \omega_2, ..., \omega_{\delta^{n} \cdot 2}\}, we can make inferences about occupancy and emptiness through another series of deterministic vector operations and a novel application of Shannon entropy [23].

Throughout the continuous operation of any SSP-based VSA system, successive bundling will slowly increase the magnitude of the memory vectors. Without correction, this will drastically reduce information capacity as the unit-modulus property is violated [19]. Therefore, a normalization operation transforming the memory vector back to unit length is required before querying any information:

\omega_i = \frac{\omega_i}{\|\omega_i\|}. (11)

With the quasi-probabilistic property of our VSA framework, we calculate the probability of a coordinate \gamma_i, located within a quadrant \beta, having a class label \psi_i \in \{0, 1\} as:

Pr(\mathrm{class}|\gamma_i, \psi_i, \beta) = \omega_{2\beta+\psi_i} \cdot \phi_{loc}, (12)

given a location vector \phi_{loc} representing \gamma_i in hyperdimensional space, as described in Equation 10. Depending on the specific application and the amount of available memory, the computational cost of calculating \phi_{loc} in Equation 12 can be minimized by precalculating a matrix of location vectors and storing them in memory. The class of a region is calculated by expanding Equation 12 across a matrix of \phi_{loc}.
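Normalization (Equation 11) and the readout of Equation 12 can be sketched as follows (our code; the coordinates, dimensionality, and helper names `loc` and `quasi_prob` are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
d = 4096
theta_x = rng.uniform(-np.pi, np.pi, d)
theta_y = rng.uniform(-np.pi, np.pi, d)

def loc(x, y, l=1.0):
    # 2D location vector, equivalent to Eq. 10 in a single Fourier step.
    return np.fft.ifft(np.exp(1j * (theta_x * x + theta_y * y) / l))

# Bundle two occupied points into one quadrant memory, then normalise (Eq. 11).
omega = loc(2.0, 3.0) + loc(-4.0, 1.0)
omega = omega / np.linalg.norm(omega)

def quasi_prob(x, y):
    # Eq. 12: dot product between the memory and a query location vector.
    return float(np.real(np.vdot(omega, loc(x, y))))

assert quasi_prob(2.0, 3.0) > quasi_prob(0.0, 0.0)  # stored point scores higher
```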

Even with the implementation of vector normalization as described in Equation 11, the inherent noise within VSAs makes quantitatively decoding the information non-trivial. This leads many hyperdimensional computing systems to use some form of clean-up memory or associative memory to transform the noisy representations into more clearly defined and linearly separable representations [28].

With our focus on maximizing the interpretability and minimizing the stochasticity of our system, we take a radically different approach. Shannon’s theory of information revolutionized modern communication systems and allows us to quantify information within noisy signals [23]. Recently, many works in materials science [35] and radar imaging [36] have used this well-studied approach to extract feature-rich information from noisy or otherwise incomprehensible data.

Figure 2: The heat map of probabilities of \psi, shown in the plasma color map, when decoding directly from the memory vectors. The local entropy of individual heat map values is calculated by iteratively passing a disk filter across the heat map image and applying Shannon entropy [23]. The localized entropy values \mathcal{S}, shown in the viridis color map, are computed across the heat maps for all \psi. The global occupancy entropy map is calculated by subtracting \mathcal{S}(\psi=0) from \mathcal{S}(\psi=1).

Taking inspiration from these works, we employ an entropy-based mechanism to extract labeled spatial information from the probability heat maps calculated by applying Equation 12 across the entire map. To avoid violating the axioms of probability assumed in information theory [37], we convert the quasi-probability values from our VSA system to true probability values with a loose interpretation of the Born rule [38], where each true probability is equal to the squared quasi-probability. Using disk filters with radii r(\psi=1) and r(\psi=0) for the two classes, we compute the local Shannon entropy \mathcal{S} within a radius of any heat map voxel at (x, y), representing \mathcal{S}(\psi|x, y). This process is repeated across all voxels, resulting in global class-wise entropies that increase class separability and reduce background noise. Globally, we subtract the entropy values of the empty probability from those of the occupancy probability, resulting in a final entropy matrix \mathcal{S}(\psi). This process is highlighted visually in Figure 2 and programmatically in Algorithm 2. The specification of r(\psi=1) and r(\psi=0), and the implications thereof, are explored in our ablation study in Section IV-B4.

With \mathcal{S}(\psi), we classify each point (x, y) as:

\mathrm{class}(x,y) = \begin{cases} \mathrm{occupied}, & \text{for } \mathcal{S}(\psi|x,y) \geq \rho \\ \mathrm{empty}, & \text{for } \mathcal{S}(\psi|x,y) < \rho \end{cases}, (13)

where \rho is the decision threshold. The sensitivity of our method to the proper specification of \rho is highlighted with the area under the receiver operating characteristic curve [39] in Section IV-A.

Algorithm 2 Calculating class-wise entropy and the global occupancy entropy
Ensure: \mathcal{S}(\psi) \leftarrow 2D matrix of global entropy values
Require: Pr(\psi) \leftarrow probability map of class \psi
Require: r(\psi) \leftarrow disk filter radius for class \psi
Require: \psi \in \{0, 1\}
Require: x \in \mathbb{N}
Require: y \in \mathbb{N}
  x \leftarrow 1
  d_1 = disk(r(\psi=1))
  d_2 = disk(r(\psi=0))
  while x \leq numVoxels(x) do
     y \leftarrow 1
     while y \leq numVoxels(y) do
        \mathcal{S}(\psi=1|x,y) = entropy(Pr(\psi=1), x, y, d_1)
        \mathcal{S}(\psi=0|x,y) = entropy(Pr(\psi=0), x, y, d_2)
        \mathcal{S}(\psi)_{x,y} = \mathcal{S}(\psi=1|x,y) - \mathcal{S}(\psi=0|x,y)
        y = y + 1
     end while
     x = x + 1
  end while
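One plausible reading of the per-voxel `entropy` routine in Algorithm 2 is sketched below (our simplification, not the paper's exact implementation: we score each voxel by the binary Shannon entropy of the mean probability inside the disk):

```python
import numpy as np

def binary_entropy(p):
    # Shannon entropy of a Bernoulli(p) variable, in bits.
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

def local_entropy(prob_map, radius):
    """Slide a disk filter across the probability map and score each voxel
    by the entropy of the mean probability inside the disk."""
    h, w = prob_map.shape
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    disk = (xx ** 2 + yy ** 2) <= radius ** 2
    dy, dx = yy[disk], xx[disk]
    out = np.zeros_like(prob_map, dtype=float)
    for y in range(h):
        for x in range(w):
            ys, xs = dy + y, dx + x
            ok = (ys >= 0) & (ys < h) & (xs >= 0) & (xs < w)
            out[y, x] = binary_entropy(prob_map[ys[ok], xs[ok]].mean())
    return out

# Maximum-uncertainty regions (p = 0.5) score 1 bit; confident regions near 0.
assert np.allclose(local_entropy(np.full((5, 5), 0.5), 1), 1.0)
assert local_entropy(np.zeros((5, 5)), 1).max() < 1e-6
```

Under this reading, the global map of Algorithm 2 would be the element-wise difference of the two class-wise calls, `local_entropy(Pr(psi=1), r1) - local_entropy(Pr(psi=0), r0)`.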

III-B7 Multi-Map Fusion

Our hyperdimensional occupancy grid mapping system supports single-agent mapping and multi-agent mapping through memory fusion. Given a set of n independently operating agents \{\Gamma_1, ..., \Gamma_n\}, their quadrant memory vectors can be merged if and only if three conditions are satisfied. First, each agent \Gamma_i must be operating on the same axis basis vectors for all dimensions of their operating environment. In this work, focusing on two-dimensional occupancy grid mapping, all agents must have the same axis basis vectors \phi_{axis}^{x} and \phi_{axis}^{y}. Second, each agent must be learning the same number of classes, with each class having the same \psi value. Lastly, each agent must use the same value of \delta to separate each dimension of their environment into \delta distinct chunks along the global reference frame.

Algorithm 3 Fusing multiple agent’s memory vectors
Require: \{\Gamma_1, ..., \Gamma_n\} \leftarrow the set of all agents
Require: n \geq 2
  \Omega \leftarrow \emptyset
  i \leftarrow 1
  while i \leq n do
     j \leftarrow 1
     while j \leq getNumMemories(\Gamma_i) do
        \omega_j = getMemory(\Gamma_i, j)
        if \omega_j \notin \Omega then
           \Omega = \Omega \cup \omega_j   {add \omega_j to the set \Omega}
        else
           \Omega\{\omega_j\} = \Omega\{\omega_j\} + \omega_j
        end if
        j = j + 1
     end while
     i = i + 1
  end while

Assuming all three conditions are met, the agents' memories can be merged into a cohesive set of quadrant memories representing the collective knowledge of the group. The process begins by initializing an empty set Ω that will contain the unique memory vectors across all agents. Then, for each agent, the following steps are performed. First, check whether each quadrant memory within the agent is already represented within Ω. Second, for any memory not yet present, initialize an empty memory vector within Ω corresponding to that quadrant. Third, use Equation 5 to bundle each memory vector within the current agent into the corresponding global memory vector within Ω. This process repeats until all agent memories have been accounted for in Ω, as shown programmatically in Algorithm 3.
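The fusion loop of Algorithm 3 can be sketched concretely as follows. The dictionary representation of quadrant memories (a key identifying the quadrant and class, mapped to its memory vector) is an assumption of this sketch rather than our internal data structure; bundling is modeled as element-wise addition per Equation 5.

```python
import numpy as np

def fuse_agents(agents):
    """Merge the quadrant memory vectors of multiple agents by bundling
    (element-wise addition, per Equation 5) vectors that share the same
    quadrant/class key. `agents` is a list of dicts mapping a
    (quadrant id, class) key to a memory vector -- this layout is an
    assumption of the sketch."""
    fused = {}
    for agent in agents:
        for key, vec in agent.items():
            if key not in fused:
                fused[key] = vec.copy()        # first sighting: add to the set
            else:
                fused[key] = fused[key] + vec  # bundle with the global memory
    return fused
```

Because bundling is commutative and associative, the fused map is independent of the order in which agents are processed.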

IV Results

To evaluate the performance of VSA-OGM, we use four datasets and three baseline methods. The first dataset was synthetically created with the simulator from Bayesian Hilbert Maps [10] (Simulated Lidar). The second is the Intel Campus dataset, captured with a physical robot at Intel's Seattle campus [25] (Intel Campus Map). The third is the simulated autonomous driving dataset from EviLOG [12] (EviLOG Dataset). The fourth is a simplified dataset we created for our ablation study. More information on each dataset can be found in the following sections.

In all experiments, mapping accuracy is measured as the area under the receiver operating characteristic curve (AUC) [39]. Unless otherwise noted, all experiments are conducted with an Intel(R) Xeon(R) W-2295 CPU, 128GB RAM, and an RTX A5000 GPU.
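For reference, AUC can be computed without any plotting machinery via the rank-sum identity: it equals the probability that a randomly chosen occupied voxel receives a higher score than a randomly chosen empty one. The sketch below ignores tied scores, a reasonable simplification for continuous entropy values:

```python
import numpy as np

def auc_score(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic: the
    fraction of (positive, negative) pairs in which the positive example
    outscores the negative one. Ties are not rank-averaged in this sketch."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

A perfect separation of classes yields 1.0, a perfectly inverted one 0.0, and chance-level scoring 0.5.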

We split the overall results into two major sections. In Section A, we compare our approach with two traditional methods and one neural method: Bayesian Hilbert Maps (BHM) [10], Fast-BHM [8], and EviLOG [12], respectively. In Section B, we perform an ablation study of VSA-OGM across each hyperparameter: vector dimensionality d, length scale l, the number of quadrants along each dimension δ, and the disk filter radii r(ψ=0) and r(ψ=1).

IV-A VSA versus Baselines

We compare VSA-OGM against three baseline methods: two traditional methods (Bayesian Hilbert Maps (BHM) [10] and Fast-BHM [8]) and a neural method known as EviLOG [12]. Using the aforementioned datasets, we split our experiments into single-agent and multi-agent tasks, where the former uses a single agent to map the entire environment and the latter uses multiple agents whose individual perspectives are combined into a single "fused" map.

IV-A1 Simulated Lidar - Single Agent

We created a synthetic LiDAR dataset with a re-implementation of the simulator introduced in BHM [11]. This dataset uses a single agent equipped with a single LiDAR sensor having a 360° field of view, a view distance of 20 meters, and 50 individual beams. The total area of the environment is 100 m².

The dataset contains 4500 individual points with a class distribution of 60% occupied and 40% empty. We train VSA-OGM and BHM [10] on 90% of the dataset, with 10% reserved for validation where the final AUC score is determined.

This dataset is particularly difficult due to its sparse features. Many applications of probabilistic occupancy grid mapping avoid this issue by augmenting the training data, interpolating random points along a straight line between the agent's location and each LiDAR return [10]. We chose to avoid this preprocessing step to highlight how theoretical methods and VSA-OGM respond to data sparsity.

TABLE III: A comparison of VSA-OGM versus Bayesian Hilbert Maps (BHM) [11] on the simulated dataset created from our re-implementation of the simulator introduced in [10]. We were unable to find a working implementation of BHM [10] with support for CUDA acceleration.

VSA (CPU) VSA (GPU) BHM (CPU)
AUC 0.959 0.959 0.978
Latency 81 ms 7 ms 19388 ms
Model Size 1 MB 1 MB 800 MB

Figure 3: Simulated single agent mapping experiment with the dataset generated from the simulator introduced in Bayesian Hilbert Maps (BHM) [10]. (A) The simulated testing environment with the agent's trajectory (B) The sparse point cloud data with ground truth labels for training (C) Trained occupancy grid map (OGM) with VSA-OGM (D) Trained OGM from BHM [10].

With a vector dimensionality of 32000, a length scale of 2, occupied and empty disk filter radii of 4 and 10 voxels, and 2 quadrants per dimension, VSA-OGM achieves an AUC score of 0.959 versus BHM with a score of 0.978. We were unable to find a working GPU-compatible implementation of BHM.

Despite adjusting the parameters for BHM, we were unable to build an occupancy grid map that successfully interpolates between sparse points to broadly classify empty regions with no training or testing points. This is shown in Figure 3, where the majority of the map is labelled as unknown, highlighted by the teal color. On the other hand, VSA-OGM successfully interpolates between the empty points and labels the majority of the area as empty. Our improved interpolation performance also comes with significantly reduced inference latency: approximately 240x (CPU) and 2800x (GPU) reductions versus BHM (CPU), while maintaining comparable accuracy on the training and validation sets. Furthermore, our approach reduces the total memory size of the learned representation by 800x. These results are summarized in Table III, and the ground truth representation and trained occupancy grid maps are visualized in Figure 3.

IV-A2 Simulated Lidar - Multi-Agent

TABLE IV: The Precision, Recall, F1-Score, AUC score [39] and training time of agents 1 and 2 on our simulated dataset before being merged into a single, fused representation.
Agent 1 Agent 2 Map Fusion
Precision 0.922 0.93 0.932
Recall 0.959 0.957 0.936
F1-Score 0.941 0.943 0.934
AUC 0.975 0.976 0.974
Latency 7 ms 7 ms 78 ms

We created a second dataset from our re-implemented simulation environment from [11]. Rather than being focused on a single agent, this experiment simulates the accumulation of data from two independent agents operating within the same environment. As such, each will have a unique perspective of the environment that should help improve the collective world model.

Both agents are equipped with a single LiDAR sensor having a 360° field of view, a view distance of 20 meters, and 50 beams. The total area of the environment is 100 m². The data sequence from each agent contains 18350 points across 367 individual point clouds, leading to a total of 36700 points across 734 point clouds. The class distribution for agent 1 is 48.9% occupied and 51.1% empty, with agent 2 having 53.1% and 46.9%, respectively.

Figure 4: Simulated multi-agent mapping experiment displayed individually. (A) The simulated testing environments with each agent's unique trajectory (B) The sparse point cloud data with ground truth labels for training (C) The localized occupancy entropy 𝒮(ψ=1) (D) The localized empty entropy 𝒮(ψ=0) (E) The global entropy 𝒮(ψ) calculated as 𝒮(ψ=1) − 𝒮(ψ=0).
Figure 5: Simulated multi-agent mapping experiment with our simulated dataset fused together. (A) The localized occupancy entropy 𝒮(ψ=1) (B) The localized empty entropy 𝒮(ψ=0) and (C) The global entropy 𝒮(ψ) of the fused maps calculated as 𝒮(ψ=1) − 𝒮(ψ=0).

The same configuration parameters are used from Section IV-A1, except that agent 1 has r(ψ=1) = 5 voxels and r(ψ=0) = 6 voxels while agent 2 has r(ψ=1) = 4 voxels and r(ψ=0) = 10 voxels. Both agents are therefore equipped with identically configured hyperdimensional mapping systems, with the only difference being the disk radii, which were selected through cross validation on the validation set. Both agents are trained simultaneously on 90% of their training data, with fusion occurring at every time step. The AUC score is calculated for each agent individually, with agent 1 achieving 0.975 and agent 2 achieving 0.976. The final fused map achieves a score of 0.974, suggesting that our fusion methodology incurs negligible information loss while combining the unique perspectives of each agent into an accurate representation of the entire environment. The individual training results for both agents are shown in Figure 4, with the fused results highlighted in Figure 5 and quantitative results in Table IV.

IV-A3 Intel Campus Map - Single Agent

TABLE V: A comparison of BHM [11] and Fast-BHM [8] versus our vector symbolic architecture based occupancy grid mapping method on the Intel Campus Dataset [25]. The hardware specifications for both baseline methods were not provided in the original papers.
VSA (Ours) BHM [10] Fast-BHM [8]
AUC 0.95 0.96 0.95
Latency 125 ms 22425 ms 463 ms
Covariance True True False
Memory 6.4 MB 6400 MB 0.0448 MB
Figure 6: Single agent mapping experiment with the Intel campus dataset [25]: (A) The ground-truth labeled training data (B) The entropy of the probability of occupancy (C) The entropy of the probability of emptiness (D) The global combined entropy calculated as 𝒮(ψ=1) − 𝒮(ψ=0) and (E) The final occupancy grid map from Fast-BHM [8].

The Intel Campus dataset [25] is a commonly used, real-world dataset for benchmarking semantic mapping algorithms. It contains 383118 data points across 910 point clouds collected over an area of 40 meters by 36 meters, with a class distribution of 42% occupied and 58% empty. Using a vector dimensionality of 200000, a length scale of 0.2 meters, occupied and empty disk filter radii of 1 voxel, and 2 quadrants per dimension, our hyperdimensional mapping system achieves an AUC score of 0.95 with an average per point cloud processing time of 125 ms, versus Bayesian Hilbert Maps (BHM) [10] with a score of 0.96 and a per frame processing time of 22425 ms. Fast Bayesian Hilbert Maps (Fast-BHM) [8], which assumes independence between model weights, achieves an AUC score of 0.95 with a per frame processing time of 463 ms. These results are summarized in Table V, with the ground truth representation and trained occupancy grid maps visualized in Figure 6. With an inference latency of 125 ms, our method achieves comparable accuracy to the baselines while reducing inference latency by approximately 180x versus BHM and 4x versus Fast-BHM, all while maintaining the covariance between voxels. We also see a 1000x memory reduction compared to BHM [10]. Compared to Fast-BHM [8], we do not see a memory improvement because its independence assumption itself drastically reduces the model's memory requirement.

IV-A4 Intel Campus Map - Multi Agent

Based on the multi-map fusion technique described in [8], we split the Intel dataset [25] into 4 quadrants across the x-y plane. The final splits of the global map are highlighted in Figure 7.

TABLE VI: The data distribution among the individual quadrants of the Intel dataset [25] along with the performance characteristics of each agent.
Quad. Points Occ. Empty Precision Recall F1 AUC
1 95301 38.83% 61.16% 0.816 0.929 0.869 0.887
2 99142 43.35% 56.64% 0.804 0.948 0.870 0.916
3 120135 42.29% 57.70% 0.755 0.958 0.845 0.930
4 68540 42.05% 57.94% 0.806 0.948 0.872 0.927

As summarized in Table VI, quadrant 1 (upper right) contains 95301 points with a class distribution of 38.83% occupied and 61.16% empty. Quadrant 2 (upper left) contains 99142 points with a class distribution of 43.35% occupied and 56.64% empty. Quadrant 3 (lower left) contains 120135 points with a class distribution of 42.29% occupied and 57.70% empty. Quadrant 4 (lower right) contains 68540 points with a class distribution of 42.05% occupied and 57.94% empty.

TABLE VII: The multi-map fusion capabilities of VSA-OGM versus Fast Bayesian Hilbert Maps [8] based on AUC score [39] and the learned model sizes. The reduction of model size within [8] is due to their approach assuming independence between voxels.
VSA (Ours) Fast-BHM [8]
AUC 0.915 0.936
Model Size 6400 KB 44.8 KB
Voxel Covariance True False
Figure 7: Multi-agent fusion experiment: (A) The four quadrant occupancy grid maps with our method on the Intel dataset [25] (B) The entropy of the probability of occupancy (C) The entropy of the probability of emptiness (D) The global combined entropy calculated as 𝒮(ψ=1) − 𝒮(ψ=0) (E) The decoded class labels from (D) after applying Equation 13 with ρ selected by maximizing True Positive Rate versus False Positive Rate with respect to the ROC [39] and (F) The final occupancy grid map from Fast-BHM [8].


As in the simulated multi-agent experiment (Section IV-A2), each of the four agents is equipped with an identically configured hyperdimensional mapping system. Each system is initialized with a vector dimensionality of 200000, a length scale of 0.2 meters, disk radii of 1 voxel, and 2 quadrants per dimension. Individually, the final AUC values on the 10% validation set are 0.887, 0.927, 0.930, and 0.916. Furthermore, the global fused AUC is 0.915 compared to Fast-BHM [8] with a final AUC of 0.936. Our approach therefore achieves comparable performance to Fast-BHM in multi-agent fusion. These values are summarized in Table VII, with the training data and occupancy grid maps visualized in Figure 7.

IV-A5 EviLOG Dataset

Introduced in [12], Evidential Lidar Occupancy Grid Mapping (EviLOG) uses a multi-layer convolutional neural network architecture with 4.7 million trainable parameters. The authors demonstrate the ability of their system to perform localized occupancy grid mapping, and they introduce a fully labelled synthetic dataset using a VLP32C lidar sensor mounted on the roof of a moving vehicle.

Figure 8: A comparison of our hyperdimensional occupancy grid mapping (OGM) approach with the neural method presented in [12]. The input point clouds and decoded OGMs from [12] are highlighted before showing the class-wise and global entropy maps.


Figure 9: The effects of increasing vector dimensionality d on our simulated ablation dataset from 2² to 2¹⁵. We select a subset of the data points (shown in red) and visualize the probability and entropy values for each class and globally.

The EviLOG dataset is separated into three sets for training, testing, and validation, containing 10000, 100, and 1000 point clouds (PCs), respectively. Each PC is bounded within a 56.32 by 81.92 meter rectangular region. Furthermore, each PC has a corresponding ground truth occupancy grid map with a voxel resolution of 0.16 meters.

Training EviLOG for 100 epochs with a batch size of 2000 took approximately 30 hours. This stands in stark contrast to our approach, which requires no pre-training, and it represents a major limitation of neural methods: the training process induces sensitivity to domain shift. Furthermore, our method has a per frame inference time of 98 ms versus EviLOG at 153 ms. Therefore, without any pretraining, VSA-OGM is approximately 1.5x faster than EviLOG.

With a vector dimensionality of 32000, a length scale of 0.2, an axis resolution of 0.1, 1 quadrant per dimension, and disk radii of r(ψ=1) = 2 and r(ψ=0) = 4, our method achieves a mean AUC score of 0.977, recall of 0.94, precision of 0.92, and F1-score of 0.93 across the testing set. This results in a final model size of 0.25 MB versus 18.9 MB for EviLOG, representing a 75x memory reduction with VSA-OGM.

EviLOG [12] calculates performance based on the KL-divergence across the entire domain, including areas with no labelled points, so we are unable to perform a quantitative comparison as in the previous sections. However, qualitative (visual) analysis provides a clear picture of the differences between our quasi-neural approach and their neural approach.

As shown in Figure 8, the input point clouds are sparser than in any of the prior experiments. EviLOG's neural network begins with a PointPillars architecture [40] before passing the latent representation through a series of convolution layers. This architecture allows EviLOG to better interpolate between points that are outside of the observable domain, which is clearly apparent when comparing the output of their network against the global entropy map 𝒮(ψ) from our VSA method in Figure 8. While our approach accurately interpolates between the points with a higher density in the vicinity of the moving vehicle, it fails to fully model the distant environment where points are extremely sparse. Their neural network was trained on a dataset from the same domain to extrapolate the unseen structure from sparse, unlabelled data points. In contrast, our method makes no prior assumptions about the observable space but requires a point labelling heuristic. This is a limitation of our quasi-neural method; however, the prior experiments show that our method would be able to fully learn this environment given adequate time to traverse the domain and sample more points.

IV-B Ablation Study

Our VSA mapping framework has multiple parameters that can be tuned for particular domains. In this section, we explore the selection of these parameters applied to a simulated dataset created from a simulator introduced in [11] and a real-world dataset from [25]. These datasets are discussed in further detail in Sections IV-A1 and IV-A3.

Figure 10: The qualitative impact of increasing and decreasing the length scale l between 0.01 and 2.0 in regards to spatial precision and noise resilience. We select a subset of the evaluated points (shown in red) and visualize the probability and entropy values for each class and globally.

IV-B1 Dimensionality (d)

Vector dimensionality d plays a critical role in the design of VSA based systems. Intuitively, increasing this value should correlate positively with model accuracy and noise resilience. Despite the ever increasing power of the cloud, this parameter should be carefully considered as it directly impacts operational characteristics at the edge.

The ablation dataset consists of 5000 randomly generated occupied points located along a circle with a radius of 4 meters, plus an additional 5000 empty points uniformly distributed within the interior of the circle. This 10000-point dataset allows us to evaluate the qualitative and quantitative impacts of vector dimensionality.

As shown in Figure 9, we have a unique column for every discrete dimensionality value along with a row for each output of our VSA pipeline. This begins with Pr(ψ=0) and Pr(ψ=1) as well as the global combined quasi-probability Pr(ψ) calculated as Pr(ψ=1) − Pr(ψ=0). The length scale for each test is fixed at 0.2 meters. We also include the class-wise and global entropy values calculated with r(ψ=1) = r(ψ=0) = 2.

We conduct a search over a set of vector dimensionalities ranging from 2² to 2¹⁵. Qualitatively, it is clear that increasing the dimensionality improves the visual quality of the decoded heat maps. Quantitatively, performance improves as dimensionality increases. At the lower end with d = 2², our classifier achieves an AUC score of 0.511, only marginally better than random guessing and far from high quality. Increasing through the powers of 2 up to 2¹⁵, we see final AUC values between 0.511 and 0.975. Beyond d = 2¹², the AUC score remains constant within the margin of error.

This suggests that d = 2¹² is adequate for high classification performance on this dataset. However, if a given application does not require this level of classification performance or spatial fidelity, users can reduce the vector dimensionality to achieve their desired performance characteristics. The major takeaways from this experiment are that (1) our VSA method can accurately perform occupancy grid mapping, with (2) predictable variations as dimensionality is modified. This confirms the stability of our method despite changes in the operating configuration.
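The dimensionality-noise relationship observed above can be probed directly: when random phasor vectors are bundled, querying the bundle with an unrelated vector yields crosstalk whose magnitude shrinks roughly as 1/√d. The sketch below is an illustrative experiment with hypothetical parameters, not our evaluation pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_phasor(shape):
    """Unit-magnitude complex vector(s) with uniform random phases."""
    return np.exp(1j * rng.uniform(-np.pi, np.pi, shape))

def crosstalk_std(d, num_points=200, trials=20):
    """Empirical std of the normalized similarity between a fresh random
    phasor vector and a bundle of `num_points` others -- the query noise
    floor, expected to scale roughly as sqrt(num_points / (2 * d))."""
    vals = []
    for _ in range(trials):
        bundle = random_phasor((num_points, d)).sum(axis=0)
        probe = random_phasor(d)
        vals.append(np.real(np.vdot(probe, bundle)) / d)
    return float(np.std(vals))
```

Under the stated scaling, quadrupling d halves the noise floor, which is consistent with the diminishing returns we observe beyond d = 2¹².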

IV-B2 Length Scale (l)

The length scale parameter l is used during the fractional binding process, as shown in Equation 8, to adjust the width of the kernel around a given point. This process extends to more than one dimension as in Equation 10. Depending upon the specific application, the proper specification of l is vital as it directly impacts the resolution and noise resilience of the decoded probability and entropy maps. As shown in Figure 10, increasing the length scale correlates positively with noise resilience but decreases resolution; conversely, decreasing the length scale increases resolution but also increases noise sensitivity.
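The role of l can be seen directly in a minimal fractional-binding sketch. Following the SSP construction of Equation 8, an axis vector is a set of random unitary phasors, and encoding a coordinate x raises every phasor to the power x / l; shrinking l therefore sharpens the similarity kernel around the encoded point. Function names and parameter values here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_axis_vector(d):
    """Axis basis vector of d random unitary phasors (frequency domain)."""
    return np.exp(1j * rng.uniform(-np.pi, np.pi, d))

def fractional_bind(axis, x, length_scale):
    """Fractional binding: exponentiate every phasor by x / l."""
    return axis ** (x / length_scale)

def similarity(a, b):
    """Normalized real inner product between two phasor vectors."""
    return float(np.real(np.vdot(a, b)) / len(a))

d = 4096
axis = make_axis_vector(d)
origin = fractional_bind(axis, 0.0, 1.0)
# A larger length scale widens the kernel: a point 0.5 m from the origin
# still looks similar, while a small length scale makes it near-orthogonal.
wide = similarity(origin, fractional_bind(axis, 0.5, 2.0))
narrow = similarity(origin, fractional_bind(axis, 0.5, 0.1))
```

With these settings, `wide` sits near 0.9 while `narrow` falls to roughly zero, mirroring the resolution-versus-smoothing trade-off in Figure 10.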

IV-B3 Space Discretization (δ)

While our hyperdimensional occupancy grid mapping approach could be fully realized in a single global memory vector ω_global, this would be impractical for real-time distributed systems with power, computational, and memory constraints. These issues arise from the ever increasing vector dimensionality required to accurately model larger and more intricate environments.

Recent research efforts hypothesize the existence of a chunking mechanism within the mammalian brain that breaks larger spatial phenomena into smaller, more easily comprehensible chunks or quadrants [34]. Our implementation of chunking comes with the specification of δ, which quantifies the number of discrete regions along each axis. Increasing δ therefore results in an exponential increase in the number of hypercubes in the n-dimensional space. Operating in 2D (n = 2) with two distinct class labels ψ ∈ {0, 1} results in a total number of quadrant memories calculated as δⁿ · 2.
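The bookkeeping implied by this formula can be sketched as follows; the chunk-indexing helper and its argument layout are illustrative assumptions rather than our internal API:

```python
def num_quadrant_memories(delta, n=2, num_classes=2):
    """Total quadrant memories: delta^n hypercubes times one memory per
    class, i.e. delta^n * 2 for binary occupancy in 2-D."""
    return delta**n * num_classes

def quadrant_index(point, bounds, delta):
    """Map a 2-D point to its chunk id given axis bounds
    (xmin, xmax, ymin, ymax) and delta chunks per axis."""
    xmin, xmax, ymin, ymax = bounds
    ix = min(int((point[0] - xmin) / (xmax - xmin) * delta), delta - 1)
    iy = min(int((point[1] - ymin) / (ymax - ymin) * delta), delta - 1)
    return ix * delta + iy
```

For example, δ = 2 in 2-D with two classes yields 2² · 2 = 8 quadrant memories, while δ = 6 yields 72.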

The proper specification of δ is critical: too small a value results in fewer quadrant memories, each requiring increased dimensionality, while too high a value fails to properly account for the probabilistic relationships of nearby points in adjacent quadrants.

As shown in Figure 11, we evaluate the specification of δ in two scenarios with vector dimensionalities of d = 32000 and d = 256. Each experiment uses the same dataset as the prior ablation sections, covering an area of 8 meters by 8 meters with an axis resolution of 0.2 meters. We specify the length scale and disk radii as 0.2 meters and 2 voxels, respectively.

Beginning with d = 32000 across a discrete range of δ values {1, 2, 4, 6}, all experiments achieve a final AUC score greater than 0.98. This suggests that, for this specific problem, a vector dimensionality of 32000 is adequate to accurately model and classify the entire observed space. On the other hand, a qualitative assessment of the decoded probability and entropy maps shows a remarkable improvement in visual quality and overall clarity as δ increases. This discrepancy should be weighed by each application, depending on whether it simply requires high classification performance or also requires high visual fidelity.

Figure 11: The qualitative impact of separating the environment into multiple chunks specified by δ in regards to spatial precision and noise resilience. We select a subset of the data points, δ = 1 and δ = 6 for both dimensionalities (shown in orange and blue), and visualize the probability and entropy values for each class and globally.

To show how the proper specification of δ can be used to model spaces with orders of magnitude less dimensionality, we conduct the same experiment across the same range of δ values with the vector dimensionality reduced to 256. As shown in Figure 11, the qualitative performance of this configuration transforms from completely inaccurate and imperceptible to an accurate and perceptible model of the environment. One important detail is the slight addition of artifacts along the quadrant boundaries; this is due to the discretization of the environment, where information from adjacent quadrants is not reflected across the boundary.

The visual interpretation of increasing performance is reflected in the final AUC scores, which range from 0.655 to 0.986. The major takeaway from this study is that the specification of δ should be considered in tandem with the vector dimensionality, based upon the complexity and goals of the mapping problem along with the available compute hardware.

IV-B4 Disk Filter Radius

To study the specification of the local entropy disk filter radii, we transition to a more complex, real-world dataset from [25]. Depicting the interior of the Intel Campus in Seattle, this dataset contains multiple corridors, rooms, and main pathways. Without any preprocessing, the main pathways are sampled multiple times while the rooms and corridors provide much less information. This is shown in Figure 12, where the probabilities of the individual classes along the main corridor are clearly visible with respect to the baseline noise, in stark contrast to the reduced clarity of the decoded information within the corridors and rooms.

Figure 12: Qualitative implications of the disk radii r(ψ) specification on the Intel data [25] across both classes: occupied (ψ = 1) and empty (ψ = 0). All other experimental parameters remain constant as defined in Section IV-A3. The corresponding quantitative values for each parameter combination are shown in Table VIII.

We employ localized Shannon entropy [23] to illuminate information encoded within sparser areas and to increase qualitative performance and class separability. This operation requires the specification of disk radii for both classes. Specified as r(ψ=0) and r(ψ=1), these values have a drastic impact on the decoded map quality.

Figure 12 visually displays the localized entropy maps for both classes. Specifically, in the case of ψ = 1, the entropy map clearly illuminates information in the less sampled portions of the environment. A similar effect occurs when ψ = 0, although it is less pronounced. We evaluate all combinations of disk radii in {1, 2, 3, 4}.

TABLE VIII: The resulting AUC scores on the Intel campus dataset [25] when adjusting r(ψ=0) and r(ψ=1) between 1 and 4 voxels.
r(ψ=0) \ r(ψ=1) 1 2 3 4
1 0.954 0.922 0.865 0.829
2 0.928 0.911 0.859 0.829
3 0.902 0.884 0.810 0.774
4 0.888 0.867 0.780 0.732

At lower radii values, the decoded entropy maps are quite sharp along the edges; at higher values, the maps take on a blurred effect. At the center of the figure are the combined global entropy maps for the different combinations of disk radii. Compared to the original probability maps, we see that radii in the range of 1 to 2 provide the most clarity.

Quantitatively, the final AUC scores reflect our qualitative analysis, with r(ψ=1) = r(ψ=0) = 1 achieving a final score of 0.955 versus 0.732 for r(ψ=1) = r(ψ=0) = 4. These results are summarized in Table VIII. This suggests that our addition of localized Shannon entropy is not overly sensitive to parameter specification and provides predictable performance variations with parameter shifts.

V Discussion

In this paper, we present an innovative approach to occupancy grid mapping (OGM), drawing inspiration from recent advances in cognitive science [19] and computational neuroscience [20]. Our methodology is grounded in the evolving field of vector symbolic architectures (VSA) [27]. VSAs have been conceptualized as a computationally viable abstraction for modeling biological phenomena, including memory, cognition, and motor control [20].

This research builds upon the foundational work of [21] which introduced Spatial Semantic Pointers (SSP). SSPs define an algebraic framework founded on hyperdimensional vectors of unitary phasors and circular convolution. Previous studies demonstrate the efficacy of SSP-based VSA systems serving as kernel density estimators in hyperdimensional space [26]. Expanding upon these prior works, we describe an end-to-end OGM framework founded on SSPs, VSA-OGM.

Existing approaches to occupancy grid mapping encounter two prominent challenges. First, theoretical methods offer high accuracy but struggle with significant time complexity due to the learning and parameterization of fully specified statistical distributions. Second, neural methods demonstrate high accuracy in extrapolating beyond observed domains with reduced latency, but they have limited application in mission and safety critical settings due to operational stochasticity in unseen domains.

In response to these challenges, our approach pioneers a novel direction in OGM systems drawing inspiration from theoretical and neural methodologies. Known as quasi-neural methods, this innovative framework seeks to leverage the determinism and high accuracy of theoretical approaches while harnessing the enhanced computational efficiency of neural methods. By amalgamating the strengths of both domains, quasi-neural methods offer a promising solution to address the computational complexity and stochasticity issues prevalent in current mapping techniques.

We showcase the efficacy of VSA-OGM as a quasi-neural method through comprehensive evaluation, including an ablation study and quantitative and qualitative comparisons with state-of-the-art theoretical [10, 8] and neural methods [12]. In our ablation study, we assess the influence of key parameters, including vector dimensionality, distribution length-scale, spatial chunking, and local entropy extraction. Our findings illuminate the impact of these parameters on the accuracy, stability, and graceful degradation of system performance. Our results and analysis underscore the robustness and adaptability of our hyperdimensional occupancy grid mapping system, positioning it as a promising alternative that bridges the strengths of both theoretical and neural methodologies in addressing complex spatial mapping tasks.

Compared to theoretical methods, our approach achieves similar levels of mapping accuracy while significantly reducing inference latency by approximately 200.0x and 4.0x when compared to Bayesian Hilbert Maps (BHM) [10] and Fast Bayesian Hilbert Maps (Fast-BHM) [8], respectively. Furthermore, our method facilitates multi-map fusion among multiple agents. Our results demonstrate minimal information loss during the fusion process and performance comparable to that of Fast-BHM. This capability highlights the versatility and scalability of our approach in collaborative, multi-agent mapping scenarios.

In comparison to the neural method presented in EviLOG [12], VSA-OGM demonstrates the capability to accurately model and interpolate LiDAR points sampled along major road sections and around moving vehicles without the need for a 29.63 hour pretraining session. This highlights the efficiency and effectiveness of our approach in capturing complex environmental dynamics. However, it is important to note a key distinction between VSA-OGM and EviLOG: while our method excels in accurately representing data within the observed domain, it encounters challenges in extrapolating outside this domain, particularly in areas with extremely limited information. This limitation arises from our method’s fundamental principle of not assuming any specific properties about the operating environment. It is worth noting that EviLOG may be more susceptible to domain adaptation issues, as it is finely tuned to extrapolate domain-specific information.

VI Conclusion

Hyperdimensional computing presents significant opportunities in environments prioritizing low latency and minimal computational complexity. In this study, we introduce a novel system for occupancy grid mapping based on vector symbolic architectures (VSA), VSA-OGM. Our findings demonstrate that VSA methods achieve comparable performance to established theoretical approaches [10, 8] while offering a substantial reduction in runtime and space complexity. Furthermore, we conduct a comparative analysis with neural methods [12], highlighting a key distinction: our approach’s lack of assumptions about the environment presents certain limitations when extrapolating around sparse data. However, this characteristic also provides resilience against domain shifts, which is a noteworthy advantage.

Moving forward, our aim is to expand VSA-OGM to accommodate semantic 3D applications [41]. Additionally, we plan to conduct real-world demonstrations in driving and off-road scenarios, utilizing customized FPGA and neuromorphic hardware [42] to enhance both latency and power efficiency while also minimizing space complexity. Furthermore, we intend to explore strategies for mitigating the performance limitations associated with VSA-OGM’s lack of assumptions about the environment, particularly in comparison to neural methods. This includes investigating techniques to optimize VSA-OGM’s adaptability while preserving its efficiency, accuracy, and deterministic operating characteristics.

Acknowledgment

We acknowledge the technical and financial support of the Automotive Research Center (ARC) in accordance with Cooperative Agreement W56HZV-19-2-0001 with the U.S. Army DEVCOM Ground Vehicle Systems Center (GVSC), Warren, MI.

References

  • [1] G. C. D. Jr., “Programmed article transfer,” Dec. 1954. [Online]. Available: https://patents.google.com/patent/US2988237A/en
  • [2] F. Vicentini, “Collaborative robotics: a survey,” Journal of Mechanical Design, vol. 143, no. 4, p. 040802, 2021.
  • [3] A. I. Károly, P. Galambos, J. Kuti, and I. J. Rudas, “Deep learning in robotics: Survey on model structures and training strategies,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 51, no. 1, pp. 266–279, 2020.
  • [4] L. F. Oliveira, A. P. Moreira, and M. F. Silva, “Advances in agriculture robotics: A state-of-the-art review and challenges ahead,” Robotics, vol. 10, no. 2, p. 52, 2021.
  • [5] S. Thrun, “Probabilistic robotics,” Communications of the ACM, vol. 45, no. 3, pp. 52–57, 2002.
  • [6] L. Shi and J. Luo, “A framework of point cloud simplification based on voxel grid and its applications,” IEEE Sensors Journal, pp. 1–1, 2023.
  • [7] J. Wilson, Y. Fu, A. Zhang, J. Song, A. Capodieci, P. Jayakumar, K. Barton, and M. Ghaffari, “Convolutional bayesian kernel inference for 3d semantic mapping,” arXiv preprint arXiv:2209.10663, 2022.
  • [8] W. Zhi, L. Ott, R. Senanayake, and F. Ramos, “Continuous occupancy map fusion with fast bayesian hilbert maps,” in 2019 International Conference on Robotics and Automation (ICRA).   IEEE, 2019, pp. 4111–4117.
  • [9] A. Elfes, “Using occupancy grids for mobile robot perception and navigation,” Computer, vol. 22, no. 6, pp. 46–57, 1989.
  • [10] R. Senanayake and F. Ramos, “Bayesian hilbert maps for dynamic continuous occupancy mapping,” in Conference on Robot Learning, 2017, pp. 458–471.
  • [11] ——, “Building continuous occupancy maps with moving robots,” vol. 32, no. 1, 2018.
  • [12] R. Van Kempen, B. Lampe, T. Woopen, and L. Eckstein, “A simulation-based end-to-end learning framework for evidential occupancy grid mapping,” in 2021 IEEE Intelligent Vehicles Symposium (IV), 2021, pp. 934–939.
  • [13] M. Tancik, V. Casser, X. Yan, S. Pradhan, B. P. Mildenhall, P. Srinivasan, J. T. Barron, and H. Kretzschmar, “Block-nerf: Scalable large scene neural view synthesis,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).   Los Alamitos, CA, USA: IEEE Computer Society, jun 2022, pp. 8238–8248. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/CVPR52688.2022.00807
  • [14] L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, and A. Geiger, “Occupancy networks: Learning 3d reconstruction in function space,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).   Los Alamitos, CA, USA: IEEE Computer Society, jun 2019, pp. 4455–4465. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/CVPR.2019.00459
  • [15] Y. Xiang and D. Fox, “Da-rnn: Semantic mapping with data associated recurrent neural networks,” 2017.
  • [16] P. Wu, S. Chen, and D. Metaxas, “Motionnet: Joint perception and motion prediction for autonomous driving based on bird’s eye view maps,” 2020.
  • [17] P. Singhal, R. Walambe, S. Ramanna, and K. Kotecha, “Domain adaptation: Challenges, methods, datasets, and applications,” IEEE Access, vol. 11, pp. 6973–7020, 2023.
  • [18] C.-H. Cheng, F. Diehl, G. Hinz, Y. Hamza, G. Nuehrenberg, M. Rickert, H. Ruess, and M. Truong-Le, “Neural networks for safety-critical applications — challenges, experiments and perspectives,” in 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018, pp. 1005–1006.
  • [19] T. A. Plate, Holographic Reduced Representation: Distributed representation for cognitive structures.   CSLI Publications Stanford, 2003, vol. 150.
  • [20] C. Eliasmith, How to build a brain: A neural architecture for biological cognition.   OUP USA, 2013.
  • [21] C. Eliasmith and C. H. Anderson, Neural engineering: Computation, representation, and dynamics in neurobiological systems.   MIT press, 2003.
  • [22] B. Komer, “Biologically inspired spatial representation,” Ph.D. dissertation, 2020. [Online]. Available: http://hdl.handle.net/10012/16430
  • [23] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.
  • [24] N. S.-Y. Dumont, J. Orchard, and C. Eliasmith, “A model of path integration that connects neural and symbolic representation,” in Proceedings of the Annual Meeting of the Cognitive Science Society, vol. 44, no. 44, 2022.
  • [25] A. Howard and N. Roy, “The robotics data set repository (radish),” 2003. [Online]. Available: http://radish.sourceforge.net/
  • [26] A. R. Voelker, “A short letter on the dot product between rotated fourier transforms,” 2020.
  • [27] D. Kleyko, D. A. Rachkovskij, E. Osipov, and A. Rahimi, “A survey on hyperdimensional computing aka vector symbolic architectures, part i: Models and data transformations,” ACM Computing Surveys, vol. 55, no. 6, 2022. [Online]. Available: http://dx.doi.org/10.1145/3538531
  • [28] D. Kleyko, D. Rachkovskij, E. Osipov, and A. Rahimi, “A survey on hyperdimensional computing aka vector symbolic architectures, part ii: Applications, cognitive models, and challenges,” ACM Computing Surveys, vol. 55, no. 9, pp. 1–52, 2023.
  • [29] P. M. Furlong, T. C. Stewart, and C. Eliasmith, “Fractional binding in vector symbolic representations for efficient mutual information exploration,” in Proceedings of the ICRA Workshop: Towards Curious Robots: Modern Approaches for Intrinsically-Motivated Intelligent Behavior, 2022, pp. 1–5.
  • [30] H. J. Nussbaumer, The Fast Fourier Transform.   Berlin, Heidelberg: Springer Berlin Heidelberg, 1981, pp. 80–111. [Online]. Available: https://doi.org/10.1007/978-3-662-00551-4_4
  • [31] A. B. Tsybakov, “Nonparametric estimators,” in Springer Series in Statistics, ser. Springer series in statistics.   New York, NY: Springer New York, 2009, pp. 1–76.
  • [32] X. Huang, G. Mei, J. Zhang, and R. Abbas, “A comprehensive survey on point cloud registration,” 2021. [Online]. Available: https://arxiv.org/abs/2103.02690
  • [33] A. Macario Barros, M. Michel, Y. Moline, G. Corre, and F. Carrel, “A comprehensive survey of visual slam algorithms,” Robotics, vol. 11, no. 1, p. 24, 2022.
  • [34] F. Gobet, P. Lane, S. Croker, P. Cheng, G. Jones, I. Oliver, and J. Pine, “Chunking mechanisms in human learning,” Trends in cognitive sciences, vol. 5, pp. 236–243, 07 2001.
  • [35] G. Liu and X. Zheng, “Fabric defect detection based on information entropy and frequency domain saliency,” The Visual Computer, vol. 37, no. 3, pp. 515–528, 2021. [Online]. Available: https://doi.org/10.1007/s00371-020-01820-w
  • [36] D. Chan, J. Gambini, and A. Frery, “Speckle noise reduction in sar images using information theory,” in 2020 IEEE Latin American GRSS & ISPRS Remote Sensing Conference (LAGIRS), 2020, pp. 456–461.
  • [37] A. N. Kolmogorov, “Foundations of the theory of probability,” 1960. [Online]. Available: https://api.semanticscholar.org/CorpusID:117215830
  • [38] M. Born, “Quantenmechanik der stoßvorgänge,” Zeitschrift für Physik, vol. 38, no. 11, pp. 803–827, Nov 1926. [Online]. Available: https://doi.org/10.1007/BF01397184
  • [39] A. P. Bradley, “The use of the area under the roc curve in the evaluation of machine learning algorithms,” Pattern Recognition, vol. 30, no. 7, pp. 1145–1159, 1997. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0031320396001422
  • [40] A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, “Pointpillars: Fast encoders for object detection from point clouds,” CoRR, vol. abs/1812.05784, 2018. [Online]. Available: http://arxiv.org/abs/1812.05784
  • [41] P. Jiang, P. Osteen, M. Wigness, and S. Saripalli, “Rellis-3d dataset: Data, benchmarks and analysis,” 2020.
  • [42] M. Davies, N. Srinivasa, T.-H. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou, P. Joshi, N. Imam, S. Jain, Y. Liao, C.-K. Lin, A. Lines, R. Liu, D. Mathaikutty, S. McCoy, A. Paul, J. Tse, G. Venkataramanan, Y.-H. Weng, A. Wild, Y. Yang, and H. Wang, “Loihi: A neuromorphic manycore processor with on-chip learning,” IEEE Micro, vol. 38, no. 1, pp. 82–99, 2018.