This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Agile Robot Navigation through Hallucinated Learning and Sober Deployment

Xuesu Xiao1, Bo Liu1, and Peter Stone1,2 1Xuesu Xiao, Bo Liu, and Peter Stone are with Department of Computer Science, University of Texas at Austin, Austin, TX 78712 {xiao, bliu, pstone}@cs.utexas.edu. 2Peter Stone is with Sony AI. This work has taken place in the Learning Agents Research Group (LARG) at UT Austin. LARG research is supported in part by NSF (CPS-1739964, IIS-1724157, NRI-1925082), ONR (N00014-18-2243), FLI (RFP2-000), ARO (W911NF-19-2-0333), DARPA, Lockheed Martin, GM, and Bosch. Peter Stone serves as the Executive Director of Sony AI America and receives financial compensation for this work. The terms of this arrangement have been reviewed and approved by the University of Texas at Austin in accordance with its policy on objectivity in research.
Abstract

Learning from Hallucination (LfH) is a recent machine learning paradigm for autonomous navigation, which uses training data collected in completely safe environments and adds numerous imaginary obstacles to make the environment densely constrained, to learn navigation planners that produce feasible navigation even in highly constrained (more dangerous) spaces. However, LfH requires hallucinating the robot perception during deployment to match with the hallucinated training data, which creates a need for sometimes-infeasible prior knowledge and tends to generate very conservative planning. In this work, we propose a new LfH paradigm that does not require runtime hallucination—a feature we call “sober deployment”—and can therefore adapt to more realistic navigation scenarios. This novel Hallucinated Learning and Sober Deployment (HLSD) paradigm is tested in a benchmark testbed of 300 simulated navigation environments with a wide range of difficulty levels, and in the real-world. In most cases, HLSD outperforms both the original LfH method and a classical navigation planner.

I INTRODUCTION

Machine learning techniques have been recently applied to mobile robot navigation to develop robots that are capable of moving from one point to another within obstacle-occupied environments in a collision-free manner [1, 2, 3, 4, 5, 6, 7]. Besides classical planning methods [8, 9], machine learning approaches can produce effective planners from data instead of hand-crafted rules and heuristics.

Among the thrust of learning to navigate, Imitation Learning (IL) [1] and Reinforcement Learning (RL) [3] are the two main streams. While the former requires expert demonstration, the latter learns from trial-and-error. Their initial successes indicate a promising potential future for these data-driven approaches, which do not require sophisticated engineering and in-situ adjustment [10, 5]. However, most learning approaches require a large amount of training data in order to produce good navigation behaviors, especially in challenging unseen environments.

Learning from Hallucination (LfH) [6] is a recently proposed paradigm to address the difficulty of obtaining high-quality training data. Using LfH, the robot collects training data in an obstacle-free, and thus completely safe, environment with a random exploration policy. During training, the most constrained surrounding obstacle configuration is synthetically projected onto the robot’s perception, which allows the effective action taken by the robot in the open space to be the only feasible, and therefore optimal, action. A control policy is learned with training data as if the robot had been moving in those constrained spaces. Thanks to the inherent safety of navigating in a completely open training environment, the robot can autonomously generate a large amount of training data with no human supervision or any costly failure during trial-and-error learning.

However, one major drawback of LfH [6] is that the perception also needs to be hallucinated during deployment, with the help of a fine-resolution reference path (prior knowledge that is sometimes infeasible to obtain), and it requires other modules to address out-of-distribution scenarios. This runtime hallucination adds extra computation to the perception and becomes ineffective when only sparse future waypoints are available. Furthermore, hallucinating to be always in the most constrained environment during deployment causes the planner to become unnecessarily conservative and fail to adapt to some realistic situations, e.g., an actual open space.

The Hallucinated Learning and Sober Deployment (HLSD) approach proposed in this work eliminates the necessity of hallucination during deployment and allows the robot to perceive its actual surroundings. This “sober deployment” relaxes the requirement for a high-resolution reference path and enables the robot to adapt to the real deployment environment. Through a novel hallucination strategy during training, the robot is able to learn from many obstacle configurations as augmentations sampled in addition to the minimal unreachable set, which is the smallest set of obstacles required to cause the actions performed in the obstacle-free training environment to be optimal, given a specific goal. Our simulated experiment on 300 benchmark testbeds [11] and our real-world experiment using a physical robot show that after seeing an extensive amount of carefully designed hallucinated training data, the robot is able to efficiently produce agile maneuvers without runtime hallucination. Superior navigation performance is achieved compared to the original LfH approach with extra access to a high-resolution global path and runtime hallucination, and also to a classical navigation planner.

II RELATED WORK

This section presents related work in mobile robot navigation, using classical motion planning and recent machine learning techniques.

II-A Classical Motion Planning

In terms of classical motion planning, mobile robot navigation is the problem of moving a robot from one point to another within an obstacle-occupied space in a collision-free manner. Such planning usually happens in the robot Configuration Space (C-Space) [12] and produces (asymptotically) optimal motion plans based on a predefined metric, such as maximum clearance, shortest path, or a combination thereof [13]. These metrics are optimized in terms of C-space representation (e.g. cellular decomposition [13]), global planning (e.g. Dijkstra’s), and local planning. For example, the optimization-based Elastic Bands (E-Band) planner [8] uses repulsive forces generated from obstacles to deform and optimize an initial trajectory, primarily to achieve maximum clearance. The Dynamic Window Approach (DWA) [9] samples feasible actions and scores them based on a weighted score of distance to obstacles, closeness to a global path, and progress toward the goal. The new hallucination strategy proposed in this paper assumes the planner learned from hallucinated training data primarily seeks the shortest path. We leave hallucination strategies that optimize for maximum clearance for future work.

II-B Machine Learning for Navigation

Despite decades of effort to develop classical autonomous navigation systems [8, 9], machine learning has recently shown promise for creating competitive end-to-end planners with obstacle avoidance [1], enabling terrain-based navigation [4, 14], allowing robots to move around humans [2], and tuning parameters for classical navigation systems [5, 15, 16]. All these learning methods require either extensive or high-quality training data, such as that derived from trial-and-error exploration or from human demonstrations [17].

To address the difficulty in obtaining extensive or high-quality training data, self-supervised LfH [6] collects training data in an obstacle-free environment with complete safety, and then learns a local navigation policy through synthetically projecting the most constrained C-space (CobstC_{obst}^{*}) onto the robot perception. The most constrained C-space corresponds to the obstacle configuration with as many obstacles as possible such that the executed motion is the only feasible and therefore optimal motion plan: if any more obstacles were to be added, then the motion plan would no longer be feasible. However, LfH also requires hallucination of CobstC_{obst}^{*} during deployment. This hallucination “on-the-fly” requirement assumes prior knowledge such as a relatively high-resolution global path is available. Furthermore, this previous work [6] only works well when the linear velocity in the training set is mostly constant, due to the fact that the robot always hallucinates navigating in the most constrained scenarios and the trained local planner is therefore relatively conservative. Varying speeds in the same most constrained space will lead to ambiguity, as shown in our experiments (Section IV). In this work, we eliminate the necessity of hallucination “on-the-fly”, and therefore the requirement of an available high-resolution global path. During sober deployment, the local planner only needs a single local goal point, along with the real, rather than hallucinated, perception. In addition, the novel hallucination technique also teaches the robot to vary its speed in response to the real environment, instead of the hallucinated most constrained spaces.

III APPROACH

In this section, we present our Hallucinated Learning and Sober Deployment (HLSD) technique for mobile robot navigation. In Sec. III-A, we start with a simplified example, which assumes the robot to be a point mass that follows a relatively simple path, to introduce the idea of a minimal unreachable set, CobstminC_{obst}^{min}, for a given optimal plan. We provide necessary conditions for being a minimal unreachable set, as the basis for many unreachable sets for the optimal plan. In Sec. III-B, we show how we use one special case, Cobstmin¯C_{obst}^{\overline{min}}, to represent all other minimal unreachable sets. We also present how we adapt the point mass example to deal with real-world robots. In Sec. III-C, we introduce a sampling method to generate augmentations to the representative minimal unreachable set (Cobstmin¯C_{obst}^{\overline{min}}) to generalize for sober deployment.

III-A Point Robot Example

We first present a simplified example, which assumes a point mass robot going through three non-colinear configurations. In this case, the robot’s workspace is exactly the C-space. We use the same notation used by Xiao et al. to formalize LfH [6]: given a robot’s C-space partitioned by unreachable (obstacle) and reachable (free) configurations, C=CobstCfreeC=C_{obst}\cup C_{free}, the classical motion planning problem is to find a function f()f(\cdot) that can be used to produce optimal plans p=f(Cobst|cc,cg)p=f(C_{obst}\leavevmode\nobreak\ |\leavevmode\nobreak\ c_{c},c_{g}) that result in the robot moving from the robot’s current configuration ccc_{c} to a specified goal configuration cgc_{g} without intersecting (the interior of) CobstC_{obst}. Here, a plan p𝒫p\in\mathcal{P} is a sequence of low-level actions {ui}i=1t\{u_{i}\}_{i=1}^{t} (ui𝒰u_{i}\in\mathcal{U}, 𝒫\mathcal{P} and 𝒰\mathcal{U} are the robot’s plan and action space, respectively). Considering the “dual” problem of finding f()f(\cdot), LfH [6] includes a method to find the (unique) most constrained unreachable set corresponding to pp, CobstC_{obst}^{*}. In this work, however, we introduce the definition of a (not unique) minimal unreachable set, CobstminC_{obst}^{min}, corresponding to pp:

Definition III.1

𝒞obstmin{CobstmincCobstmin,f(Cobstmin{c}|cc,cg)f(Cobstmin|cc,cg)}\mathcal{C}_{obst}^{min}\doteq\{C_{obst}^{min}\mid\forall c\in C_{obst}^{min},f(C_{obst}^{min}\setminus\{c\}\leavevmode\nobreak\ |\leavevmode\nobreak\ c_{c},c_{g})\neq f(C_{obst}^{min}\leavevmode\nobreak\ |\leavevmode\nobreak\ c_{c},c_{g})\}

In other words, every member, Cobstmin𝒞obstminC_{obst}^{min}\in\mathcal{C}_{obst}^{min}, is a minimal set of obstacles that lead to pp being an optimal plan.111CobstminC_{obst}^{min} is minimal in the sense that no subset of it also leads to the same optimal plan, i.e. nothing can be removed such that pp remains optimal. Any path that arises from an optimal plan can be approximated by connected line segments. The simplest case of a robot path following an optimal plan pp to move from ccc_{c} to cgc_{g} (except a straight line) is composed of configurations on two straight line segments, cccmc_{c}-c_{m} and cmcgc_{m}-c_{g}. The intermediate turning point is defined as cmc_{m}. Since cmc_{m} is one configuration on the robot’s path, we say pp goes through cmc_{m} (Fig. 1). In the following, given a point-mass robot moving from ccc_{c} to cgc_{g} according to some optimal plan pp computed by f()f(\cdot), we show two necessary conditions (\Longrightarrow) for an unreachable set (CobstC_{obst}) to be a minimal unreachable set (CobstminC_{obst}^{min}), to aid in identifying the representative unreachable set (Cobstmin¯C_{obst}^{\overline{min}}) used in Sec. III-B.

Proposition III.1

If cCobstminc\in C_{obst}^{min}, then the optimal plan for the unreachable set Cobstmin{c}C_{obst}^{min}\setminus\{c\}, p^=f(Cobstmin{c}|cc,cg)\hat{p}=f(C_{obst}^{min}\setminus\{c\}\leavevmode\nobreak\ |\leavevmode\nobreak\ c_{c},c_{g}), must go through cc.

Proof:

Assume otherwise. Since p^=f(Cobstmin{c}|cc,cg)f(Cobstmin|cc,cg)=p\hat{p}=f(C_{obst}^{min}\setminus\{c\}\leavevmode\nobreak\ |\leavevmode\nobreak\ c_{c},c_{g})\neq f(C_{obst}^{min}\leavevmode\nobreak\ |\leavevmode\nobreak\ c_{c},c_{g})=p (Definition III.1), for Cobstmin{c}C_{obst}^{min}\setminus\{c\}, the path arising from p^\hat{p} is shorter than the one arising from pp. Since we assume p^\hat{p} does not go through cc, adding cc back to Cobstmin{c}C_{obst}^{min}\setminus\{c\} does not affect the feasibility of p^\hat{p} for CobstminC_{obst}^{min} and does not change the path length. The path arising from p^\hat{p} is still shorter than the one arises from pp in CobstminC_{obst}^{min}, thus contradicting the optimality of pp for CobstminC_{obst}^{min}. ∎

Proposition III.2

(\Longrightarrow 1) Cobst𝒞obstmin,cmCobst\forall C_{obst}\in\mathcal{C}_{obst}^{min},c_{m}\in C_{obst}.222CC is a topological space, CobstC_{obst} is a closed set, and Cfree=CCobstC_{free}=C\setminus C_{obst} is an open set. cmc_{m} is a boundary point of both CobstC_{obst} and CfreeC_{free}. The robot can come arbitrarily close to the obstacles while remaining in CfreeC_{free} [13].

Proof:

Consider any circle (cm,ϵ)\mathcal{B}(c_{m},\epsilon) that centers at cmc_{m} with radius ϵ\epsilon. \mathcal{B} intersects cccmc_{c}-c_{m} at c1c_{1} and cmcgc_{m}-c_{g} at c2c_{2} (Fig. 1 a). Assume there exists no configuration cc\in\mathcal{B} such that Cobst𝒞obstmin\exists C_{obst}\in\mathcal{C}_{obst}^{min} and cCobstc\in C_{obst}. Consider the path ccc1c2cgc_{c}-c_{1}-c_{2}-c_{g}: since cccmcgc_{c}-c_{m}-c_{g} is feasible, then ccc1c_{c}-c_{1} and c2cgc_{2}-c_{g} are feasible. Moreover, by assumption c1c2c_{1}-c_{2} is also feasible. But ccc1c2cgc_{c}-c_{1}-c_{2}-c_{g} is shorter than cccmcgc_{c}-c_{m}-c_{g} since c1c2c_{1}-c_{2} is shorter than c1cmc2c_{1}-c_{m}-c_{2} due to triangle inequality. This contradicts the optimality of cccmcgc_{c}-c_{m}-c_{g}. Therefore, c\exists c\in\mathcal{B} that belongs to some Cobst𝒞obstminC_{obst}\in\mathcal{C}_{obst}^{min}. Since this is true for limϵ0(cm,ϵ)\lim_{\epsilon\rightarrow 0}\mathcal{B}(c_{m},\epsilon), and CobstC_{obst} is a closed set [13], the limit point (cm,0)=cmCobst\mathcal{B}(c_{m},0)=c_{m}\in C_{obst}. ∎

We name the union of all robot configurations in the triangle defined by ccc_{c}, cmc_{m}, and cgc_{g} as G1G_{1}. On the other side of line segment cccgc_{c}-c_{g}, we name the union of all configurations in the half ellipse, whose focal points locate at ccc_{c} and cgc_{g}, and whose major axis length is |cccm|+|cmcg||c_{c}c_{m}|+|c_{m}c_{g}|, as G2G_{2}. We define G=G1G2G=G_{1}\cup G_{2} (the grey area in Fig. 1).

Proposition III.3

(\Longrightarrow 2) Cobst𝒞obstmin\forall C_{obst}\in\mathcal{C}_{obst}^{min}, cG\forall c\in G, {cp|cp on line segments ccccg}Cobst\{c_{p}\leavevmode\nobreak\ |\leavevmode\nobreak\ c_{p}\text{ on line segments }c_{c}-c-c_{g}\}\cap C_{obst}\neq\emptyset

Proof:

Assume cG\exists c\in G, {cp|cp on line segments ccccg}Cobst=\{c_{p}\leavevmode\nobreak\ |\leavevmode\nobreak\ c_{p}\text{ on line segments }c_{c}-c-c_{g}\}\cap C_{obst}=\emptyset. Then ccccgc_{c}-c-c_{g} is a feasible path. The length of ccccgc_{c}-c-c_{g} is the sum of distances from cc (inside ellipse) to the two ellipse focal points ccc_{c} and cgc_{g}, which is, per definition of an ellipse, shorter than its major axis length |cccm|+|cmcg||c_{c}c_{m}|+|c_{m}c_{g}|. This contradicts the optimality of pp. ∎

Based on the two necessary conditions of Cobst𝒞obstminC_{obst}\in\mathcal{C}_{obst}^{min} (Proposition 2 and III.3), one class of minimal unreachable sets could be simply constructed by connecting cmc_{m} with some point cec_{e} on the left boundary of the ellipse with a straight line cmcec_{m}-c_{e} (Cobstmin1C_{obst}^{min_{1}} in Fig. 1 a), or two line segments cmcg(cc)c_{m}-c_{g}(c_{c}) and cg(cc)cec_{g}(c_{c})-c_{e} (Cobstmin2C_{obst}^{min_{2}} in Fig. 1 b), if not all configurations on cmcec_{m}-c_{e} are in GG. In particular, for efficiency, we simply represent all minimal unreachable sets with a special case, Cobstmin¯C_{obst}^{\overline{min}} (Fig. 1 d), which is all configurations along the straight line cmc_{m} and cmc_{m}^{\prime} (the reflective symmetry point of cmc_{m} with respect to cccgc_{c}-c_{g}). Empirical evidence of this approximation’s sufficiency for the purpose of learning will be provided in Sec. IV. Here, we further provide one more observation to help develop intuition regarding how to identify a CobstminC_{obst}^{min}.

Proposition III.4

Cobstmin\forall C_{obst}^{min}, cCobstmin\forall c\in C_{obst}^{min}, cGc\in G.

Proof:

Assume cCobstminc\in C_{obst}^{min} and cGc\notin G, only two possibilities exist:

(1) cc is outside the entire (left and right half) ellipse whose focal points locate at ccc_{c} and cgc_{g}, and whose major axis length is |cccm|+|cmcg||c_{c}c_{m}|+|c_{m}c_{g}|: based on Proposition III.1, the optimal plan p^=f(Cobstmin{c}|cc,cg)\hat{p}=f(C_{obst}^{min}\setminus\{c\}\leavevmode\nobreak\ |\leavevmode\nobreak\ c_{c},c_{g}) must go through cc. The shortest possible path, which goes through cc, is ccccgc_{c}-c-c_{g}, if it exists. However, based on the definition of ellipse, for any cc outside the ellipse, |ccc|+|ccg|>|cccm|+|cmcg||c_{c}c|+|cc_{g}|>|c_{c}c_{m}|+|c_{m}c_{g}|. This contradicts the optimality of p^\hat{p}.

(2) cc is inside the entire ellipse, but outside GG: cc must be in the right half of the ellipse but outside G1G_{1} (Fig. 1 e). Again, p^\hat{p} must go through cc. The shortest possible path which goes through cc is ccccgc_{c}-c-c_{g}, if it exists, which must intersect either cccmc_{c}-c_{m} or cmcgc_{m}-c_{g} at some point cic_{i}. Due to substructure optimality (i.e. a sub-path of a shortest path is still a shortest path), the shortest path between any points on cccmcgc_{c}-c_{m}-c_{g} must be its sub-path. If cic_{i} is on cccmc_{c}-c_{m}, then the shortest path from cic_{i} to cgc_{g} must be cicmcgc_{i}-c_{m}-c_{g}. ccccicmcgc_{c}-c-c_{i}-c_{m}-c_{g} is longer than cccmcgc_{c}-c_{m}-c_{g}, and therefore not optimal. If cic_{i} is on cmcgc_{m}-c_{g}, the shortest path between ccc_{c} and cic_{i} must be cccmcic_{c}-c_{m}-c_{i}. cccmciccgc_{c}-c_{m}-c_{i}-c-c_{g} is longer than cccmcgc_{c}-c_{m}-c_{g}, and therefore not optimal. In both cases, the shortest possible path going through cc is longer than cccmcgc_{c}-c_{m}-c_{g}. This contradicts the optimality of p^\hat{p}.

Therefore, Cobstmin\forall C_{obst}^{min}, cCobstmin\forall c\in C_{obst}^{min}, cGc\in G. ∎

Refer to caption
Figure 1: Cobstmin1,Cobstmin2,,Cobstmini,Cobstmin¯𝒞obstminC_{obst}^{min_{1}},\leavevmode\nobreak\ C_{obst}^{min_{2}},\leavevmode\nobreak\ ...\leavevmode\nobreak\ ,\leavevmode\nobreak\ C_{obst}^{min_{i}},\leavevmode\nobreak\ C_{obst}^{\overline{min}}\in\mathcal{C}_{obst}^{min}.

III-B Realistic Nonholonomic Robot

Based on the propositions discussed in Sec. III-A, we present the hallucination technique for a realistic robot. In realistic scenarios, we approximate all CobstminC_{obst}^{min} in 𝒞obstmin\mathcal{C}_{obst}^{min} with Cobstmin¯C_{obst}^{\overline{min}}. We hypothesize that all CobstminC_{obst}^{min} are sufficiently similar that using just one leads to learning that is just as good as if we used them all, especially when (1) ccc_{c}, cmc_{m}, and cgc_{g} extracted from realistic trajectories are close to each other, and (2) CobstC_{obst} is instantiated in terms of discrete LiDAR beams. Empirical evidence of this sufficiency will be provided in Sec. IV. However, a realistic robot cannot be modeled as a simple point mass because: (1) the size is not negligible and we need a path for the center of the robot that causes no part of the robot to collide with an obstacle; (2) nonholonomic robots cannot change motion direction instantly, so we need to generalize to a continuously turning path from the piece-wise point mass example. In Sec. III-B, we aim to adapt the point mass example (Sec. III-A) to real-world robots, and therefore need to address these differences.

To address (1), we define one point to the left and another to the right of the centroid of the robot, offset by the robot width, as footprint points, as shown in Fig. 2. The instantaneous linear velocity along the line between these two footprint points is zero for nonholonomic vehicles. The polygon defined by a sequence of footprint points must belong to free space. The actual path executed by the optimal plan pp follows the middle of this area. To address (2), we define cωc_{\omega} as configurations between the current and goal configurations with a non-zero angular velocity (ω0\omega\neq 0). Given the current configuration, each cωc_{\omega}, and each cωc_{\omega}’s next configuration, their left or right footprint points are treated as ccc_{c}, cmc_{m}, and cgc_{g} in the point robot case: based on the sign of ω\omega, the robot turns left or right, and the left or right footprint points are chosen. For each triple of point robot ccc_{c}, cmc_{m}, and cgc_{g}, for efficiency, we approximate all different CobstminC_{obst}^{min} with Cobstmin¯C_{obst}^{\overline{min}} (Fig. 1 d). For a realistic optimal plan pp with actual ccc_{c} (current) and cgc_{g} (goal) and multiple point robot cccmcgc_{c}-c_{m}-c_{g} triples in between, we define the union of all Cobstmin¯C_{obst}^{\overline{min}} as 𝒞obstmin¯\mathcal{C}_{obst}^{\overline{min}}.

In particular, we define an “opposite” function of f()f(\cdot): 𝒞obstmin¯=o(p|cc,cg)\mathcal{C}_{obst}^{\overline{min}}=o(p\leavevmode\nobreak\ |\leavevmode\nobreak\ c_{c},c_{g}) as the minimal hallucination function.333Note the inverse function f1()f^{-1}(\cdot) does not exist. Technically, cgc_{g} can be uniquely determined by pp and ccc_{c}, but we include it as an input to o()o(\cdot) for notational symmetry with f()f(\cdot). This function finds 𝒞obstmin¯\mathcal{C}_{obst}^{\overline{min}} based on the fact that pp is an optimal plan. As visualized in Fig. 2, CobstC_{obst} is instantiated in terms of discrete LiDAR range readings. For each LiDAR beam, we define a minimum and a maximum range, which limit the possible range readings of this particular beam based on the optimal plan pp. The minimum range of all beams is determined by the left and right boundary, as configurations within the boundary must be in CfreeC_{free}. We project all Cobstmin¯C_{obst}^{\overline{min}} onto the corresponding LiDAR beams and for the beams directly intersect any Cobstmin¯C_{obst}^{\overline{min}}, the maximum range is set as the distance from the robot to the intersection point. For other beams, the maximum range is simply the LiDAR’s physical limit, or manually pre-processed to a certain threshold.

Refer to caption
Figure 2: Applying the point robot example to realistic robot case: CobstminC_{obst}^{min} is instantiated as LiDAR beams whose maximum range is determined by ray casting.

III-C Sampling between Min and Max Range

Given the representative minimal unreachable set, we want to find all possible unreachable sets, or their sensor readings, that could lead to pp being the optimal plan. Based on the LiDAR scan with a minimum and maximum range for each beam (end of Sec. III-B), a sampling strategy is devised to create many obstacle sets, CobstC_{obst}, in which pp is optimal.

Our sampling strategy aims at creating different range readings that (1) resemble real-world obstacles, and (2) respect uncertainty/safety. For (1), most real-world obstacles have a certain footprint, and their surface contribute to continuity among neighboring beams. Starting from the first beam, a random range is sampled between min and max with a uniform distribution. Moving on to the neighboring beam, with a probability α\alpha, we increase, or decrease, the previous range by a small random amount, and assign the value to the current beam. This practice is to simulate the continuity in neighboring beams. We make sure this value is within the min and max range of that beam. With probability 12α1-2\alpha, we start from scratch and randomly sample between this beam’s min and max values. This practice is to simulate the scenarios where the next beam misses the current obstacle, and reaches another one or does not reach any obstacle at all. For (2), we add an offset value as a function of the optimal plan pp to the ranges. For example, given a faster speed of pp, a larger positive offset is added to the range reading (obstacles are farther away), because faster motion is correlated with more uncertainty or less safety (details can be found in Sec. IV-A). One example scan sample is shown in Fig. 2 as blue ×\times’s.

The entire HLSD pipeline is described in Alg. 1. The inputs to the algorithm are a random exploration policy πrand\pi_{rand} in open space; the minimal hallucination function o()o(\cdot); a sampling count of hallucinated CobstC_{obst} to be generated per data point; the offset()(\cdot) function; the probability α\alpha; and a parameterized planner fθ()f_{\theta}(\cdot). For every data point (p,cc,cg)(p,c_{c},c_{g}) in 𝒟raw\mathcal{D}_{raw} collected using πrand\pi_{rand} in open space (line 2), we hallucinate 𝒞obstmin¯\mathcal{C}_{obst}^{\overline{min}} (line 5) and generate the min and max values for every LiDAR beam (line 6). Lines 8–15 correspond to the sampling technique to generate random laser scans. We instantiate CobstC_{obst} as LiDAR readings LL and add it to 𝒟train\mathcal{D}_{train} (line 16). This process is repeated sampling count times for every data in 𝒟raw\mathcal{D}_{raw}. Finally, we train fθ()f_{\theta}(\cdot) with supervised learning (line 19). This hallucinated learning enables sober deployment with perception of the real configuration CobstrealC_{obst}^{real} without runtime hallucination (Lines 21–22).

Algorithm 1 Hallucinated Learning and Sober Deployment
0:  πrand\pi_{rand}, o()o(\cdot), sampling count, offset(\cdot), α\alpha, fθ()f_{\theta}(\cdot)
1:  // Hallucinated Learning
2:  collect motion plans (p,cc,cg)(p,c_{c},c_{g}) from πrand\pi_{rand} in free space and form raw data set 𝒟raw\mathcal{D}_{raw}
3:  𝒟train\mathcal{D}_{train}\leftarrow\emptyset
4:  for every (p,cc,cg)(p,c_{c},c_{g}) in 𝒟raw\mathcal{D}_{raw} do
5:     hallucinate 𝒞obstmin¯=o(p|cc,cg)\mathcal{C}_{obst}^{\overline{min}}=o(p\leavevmode\nobreak\ |\leavevmode\nobreak\ c_{c},c_{g})
6:     generate LiDAR range LminL_{min} and LmaxL_{max} with 𝒞obstmin¯\mathcal{C}_{obst}^{\overline{min}}
7:     for iter = 1 : sampling count do
8:        LL\leftarrow\emptyset
9:        l1[lmin1,lmax1]l^{1}\sim[l_{min}^{1},l_{max}^{1}], l1l1+offset(p)l^{1}\leftarrow l^{1}+\text{\emph{offset}}(p)
10:        add l1l^{1} to LL, llastl1l^{last}\leftarrow l^{1}
11:        for i = 2 : |L||L| do
12:           increase, decrease llastl^{last} by a randomly selected small amount, or li[lmini,lmaxi]l^{i}\sim[l_{min}^{i},l_{max}^{i}], lili+offset(p)l^{i}\leftarrow l^{i}+\text{\emph{offset}}(p), with probability α\alpha, α\alpha, and 12α1-2\alpha, respectively, and assign to lil^{i}
13:           make sure li[lmini,lmaxi]l^{i}\in[l_{min}^{i},l_{max}^{i}]
14:           add lil_{i} to LL, llastlil^{last}\leftarrow l^{i}
15:        end for
16:        CobstLC_{obst}\leftarrow L, 𝒟train=𝒟train(Cobst,p,cc,cg)\mathcal{D}_{train}=\mathcal{D}_{train}\cup(C_{obst},p,c_{c},c_{g})
17:     end for
18:  end for
19:  train fθ()f_{\theta}(\cdot) with 𝒟train\mathcal{D}_{train} by minimizing the error 𝔼(Cobst,p,cc,cg)𝒟train[(p,fθ(Cobst|cc,cg))]\mathbb{E}_{(C_{obst},p,c_{c},c_{g})\sim\mathcal{D}_{train}}\big{[}\ell(p,f_{\theta}(C_{obst}\leavevmode\nobreak\ |\leavevmode\nobreak\ c_{c},c_{g}))\big{]}
20:  // Sober Deployment (each time step)
21:  receive Cobstreal,cc,cgC_{obst}^{real},c_{c},c_{g}
22:  plan p={ui}i=1t=fθ(Cobstreal|cc,cg)p=\{u_{i}\}_{i=1}^{t}=f_{\theta}(C_{obst}^{real}\leavevmode\nobreak\ |\leavevmode\nobreak\ c_{c},c_{g})
23:  return  pp

IV EXPERIMENTS

Simulated and physical experiments are conducted to validate our hypothesis that HLSD can achieve better performance (faster, smoother, safer) than a classical method and LfH with runtime hallucination. In our experiments, we use a Clearpath Jackal robot, a four-wheeled, differential-drive, nonholonomic, Unmanned Ground Vehicle (UGV), running the Robot Operating System (ROS) move_base navigation stack. Its DWA local planner is replaced with HLSD. The robot navigates without a map. The global planner (Dijkstra’s algorithm) assumes unknown regions are free and replans when obstacles are perceived. The local environment is known to the local planner.

IV-A Implementation

In order to instantiate Alg. 1, with o()o(\cdot) described in detail in Sec. III-B, one still needs to define fθ()f_{\theta}(\cdot), πrand\pi_{rand}, sampling count, α\alpha, and offset()(\cdot):

fθ()f_{\theta}(\cdot): For p={ui}i=1t=fθ(Cobstreal|cc,cg)p=\{u_{i}\}_{i=1}^{t}=f_{\theta}(C_{obst}^{real}\leavevmode\nobreak\ |\leavevmode\nobreak\ c_{c},c_{g}), instead of requiring a high-resolution global path from the global planner (Dijkstra’s) to construct CobstC_{obst}^{*} in LfH [6], we only query a single local goal cgc_{g} on the global path 1m away from the robot at each time step, and ccc_{c} is the origin in the robot frame (orientation is ignored for simplicity). The UGV is equipped with a 720-dimensional 2D LiDAR with a 270 field of view, and we clip the maximum range to 1m to reduce the input space (CobstrealC_{obst}^{real}). The planning horizon tt of pp is set to 1, i.e. only a single action p={u1}={(v1,ω1)}p=\{u_{1}\}=\{\left(v_{1},\omega_{1}\right)\} (linear and angular velocity) is produced. We use the same three-layer neural network, with 256 hidden neurons and ReLU activation for each layer as in LfH [6].

πrand\pi_{rand}: πrand\pi_{rand} randomly picks a target (v^,ω^)\left(\hat{v},\hat{\omega}\right) pair and commands the robot to reach that speed with constant increments/decrements considering acceleration limit. After reaching (v^,ω^)\left(\hat{v},\hat{\omega}\right), πrand\pi_{rand} keeps that command with some probability (0.9) or otherwise generates a new target command. We limit the output v[0,1.0]v\in[0,1.0]m/s and ω[1.57,1.57]\omega\in[-1.57,1.57]rad/s. During training in a simulated open space, control inputs (vv and ω\omega) and robot configurations (xx, yy, and ψ\psi) are recorded. Unlike the dataset collected by LfH [6], which mostly contains v=0.4v=0.4m/s, our dataset contains a variety of vv values. 12585 data points are collected in a 505s real-time simulation, including a variety of motions in an open space.

sampling count, α\alpha, and offset()(\cdot): We set sampling count to 10 and α\alpha to 0.48. The offset(\cdot) function linearly maps current vv in the range [0.3, 1.0]m/s to an offset value between [0, 1]m. Considering the fact that a real robot also needs to turn even in open space due to nonholonomic constraints, as opposed to an ideal point mass robot which does not, we also hallucinate Cobst=C_{obst}=\emptyset for configurations where v>0.8v>0.8m/s. Considering highly constrained spaces, we additionally hallucinate the most constrained CobstC_{obst}^{*} for configurations where v<0.3v<0.3m/s. So |𝒟train|=12|𝒟raw||\mathcal{D}_{train}|=12*|\mathcal{D}_{raw}|. Training takes less than three minutes on a NVIDIA GeForce GTX 1650 laptop GPU.

After computing pp with Alg. 1, we use the same Model Predictive Control model as in LfH [6] to check for collisions. When a collision is detected, the robot enters a two-phase recovery behavior: it first queries the global path immediately in front of the robot, and rotates to align the robot heading with the tangential direction of the global path. If this recovery behavior is still not safe as determined by the collision checking, the robot backs up at v=0.2v=-0.2m/s. Since LfH learns only from the most constrained C-space, it requires runtime hallucination to match with training data, and a Turn in Place module to drive the robot out of out-of-distribution scenarios. Neither of those components are required by HLSD. Because our dataset contains varying v[0,1.0]v\in[0,1.0]m/s, the LfH speed modulation module to adapt mostly constant vv to real environments is not necessary either. The robot is able to react to the real obstacle configuration by using fθ()f_{\theta}(\cdot) alone because 𝒟train\mathcal{D}_{train} covers a wide range of distributions.

IV-B Simulated Experiments

Refer to caption
Figure 3: Simulated Environments of Different Difficulties in the BARN Dataset [11]
Refer to caption
Figure 4: Simulation Results

We first use the Benchmark Autonomous Robot Navigation (BARN) dataset with 300 navigation environments and quantified difficulty levels [11] (example environments shown in Fig. 3) to compare (1) DWA, (2) original LfH trained on the mostly constant 0.4m/s dataset with hallucinated CobstC_{obst}^{*} (LfH 0.4), (3) LfH trained on our varying speed 1.0m/s dataset with CobstC_{obst}^{*} (LfH 1.0), (4) HLSD trained on the 0.4m/s dataset with augmentations to Cobstmin¯C_{obst}^{\overline{min}} (HLSD 0.4), and (5) HLSD trained on the 1.0m/s dataset with Cobstmin¯C_{obst}^{\overline{min}} (HLSD 1.0). The baselines are chosen for the following reasons: DWA is a widely used local planner and its implementation is available through the ROS move_base navigation stack. We only consider similar self-supervised approaches and exclude methods that require expert supervision [1, 18], because HLSD does not require any expert motion trajectories. Although DWA’s default max linear velocity is 0.5m/s, for a fair comparison, we set DWA’s max linear velocity to the same as HLSD’s (1.0m/s). We find that by also doubling DWA’s default sampling rate for linear and angular velocity (12 and 40), the robot’s performance is roughly the same as when using the default parameters (but at double the speed). Simulated results are shown in Fig. 4.

In each of the 300 navigation environments generated by Cellular Automaton, the robot navigates between a specified start and goal location without a map. We record the traversal time for each trial, with a maximum of 50s. Three trials are conducted for each planner in each environment, resulting in a total of 4500 trials. The difficulty of the 300 environments are ordered from left to right based on the DWA performance (blue line). The performances of other planners are plotted as dots, for which a line is fit using linear regression. LfH 1.0 (red) fails many trials. The reason is that learning from a dataset with varying speed and always hallucinating the most constrained spaces causes ambiguity: the same most constrained C-space can map to different plans, which confuses the learner. LfH 0.4 (magenta) and HLSD 0.4 (cyan), both trained on the mostly constant 0.4m/s dataset without ambiguity, achieve similar performance and are less sensitive to navigational difficulty, especially LfH 0.4, but LfH 0.4 requires runtime hallucination and other components. Note that DWA has max speed of 1.0m/s, while LfH 0.4 and HLSD 0.4 have up to 0.6m/s max speed, modulated from mostly 0.4m/s learned from the dataset. DWA has an advantage in easy environments (left) due to its fast speed, but in difficult ones (right), LfH 0.4 and HLSD 0.4 are more stable. Also having 1.0m/s max speed, HLSD 1.0 considers the varying linear velocity in the 1.0m/s dataset with the offset(\cdot) function (line 9 in Alg. 1), and achieves the best result, outperforming all alternatives across the entire range of difficulties. The means and standard deviations of all five planners calculated from all trials are shown in Tab. I. LfH 1.0 has the largest average time and variance, because the learner is confused by the varying training labels in the same most constrained spaces. DWA has large time and also high variance due to its sampling nature. Again, HLSD 0.4 performs similarly as LfH 0.4 does. HLSD 1.0 still outperforms all other planners.

Table I: Simulated and Physical Traversal Time in Seconds
DWA LfH 0.4 LfH 1.0 HLSD 0.4 HLSD 1.0
Sim. 17.0±\pm12.6 13.5±\pm6.4 26.7±\pm15.0 13.4±\pm9.8 8.5±\pm6.3
Phy. 78.8±\pm10.0 67.0±\pm4.4 \infty 66.3±\pm0.7 45.4±\pm0.4

IV-C Physical Experiments

We also deploy the same set of local planners in a physical test course, five trials each (Fig. 5 top). The results are shown in Tab. I. The complicated obstacle course causes DWA to execute many recovery behaviors, and the robot takes a long average time with large variance to finish the traversal. LfH 0.4 and HLSD 0.4 are both able to successfully navigate the robot through, with similar traversal times. But note that LfH 0.4 requires a fine-resolution global path for hallucination during deployment and the Turn in Place module, while HLSD does not and only reacts to the real obstacles without any other extra components. In this physical obstacle course, LfH 1.0 fails every trial due to multiple collisions. Our HLSD 1.0 achieves the best performance both in terms of average time and standard deviation. HLSD deployment in a natural cluttered environment is shown in Fig. 5 bottom.

Refer to caption
Refer to caption
Figure 5: Physical Experiments (https://www.youtube.com/watch?v=LZcBN9zgtXg&t=11s)

V CONCLUSIONS

We introduce HLSD, a self-supervised machine learning technique for mobile robot navigation with safety guarantee during training. Similar to LfH [6], the robot safely learns in a completely obstacle-free environment and its perception is hallucinated with obstacle configurations where the actions taken in the free space remain optimal. Instead of overfitting to the most constrained spaces during training and requiring runtime hallucination and other modules to adapt to actual environments, the new HLSD pipeline allows the robot to navigate with the learned planner alone in response to realistic perception. By leveraging a large body of carefully designed hallucinations used for training, the learned planner does not need to deal with many out-of-distribution scenarios and has its own capability to address uncertainty/safety in the real-world during deployment.

References

  • [1] M. Pfeiffer, M. Schaeuble, J. Nieto, R. Siegwart, and C. Cadena, “From perception to decision: A data-driven approach to end-to-end motion planning for autonomous ground robots,” in 2017 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2017, pp. 1527–1533.
  • [2] Y. F. Chen, M. Everett, M. Liu, and J. P. How, “Socially aware motion planning with deep reinforcement learning,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2017, pp. 1343–1350.
  • [3] A. Faust, K. Oslund, O. Ramirez, A. Francis, L. Tapia, M. Fiser, and J. Davidson, “Prm-rl: Long-range robotic navigation tasks by combining reinforcement learning and sampling-based planning,” in 2018 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2018, pp. 5113–5120.
  • [4] S. Siva, M. Wigness, J. Rogers, and H. Zhang, “Robot adaptation to unstructured terrains by joint representation and apprenticeship learning,” in Robotics: Science and Systems (RSS), 2019.
  • [5] X. Xiao, B. Liu, G. Warnell, J. Fink, and P. Stone, “Appld: Adaptive planner parameter learning from demonstration,” IEEE Robotics and Automation Letters, vol. 5, no. 3, pp. 4541–4547, 2020.
  • [6] X. Xiao, B. Liu, G. Warnell, and P. Stone, “Toward agile maneuvers in highly constrained spaces: Learning from hallucination,” IEEE Robotics and Automation Letters, pp. 1503–1510, 2021.
  • [7] B. Liu, X. Xiao, and P. Stone, “A lifelong learning approach to mobile robot navigation,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 1090–1096, 2021.
  • [8] S. Quinlan and O. Khatib, “Elastic bands: Connecting path planning and control,” in [1993] Proceedings IEEE International Conference on Robotics and Automation.   IEEE, 1993, pp. 802–807.
  • [9] D. Fox, W. Burgard, and S. Thrun, “The dynamic window approach to collision avoidance,” IEEE Robotics & Automation Magazine, vol. 4, no. 1, pp. 23–33, 1997.
  • [10] X. Xiao, J. Dufek, T. Woodbury, and R. Murphy, “Uav assisted usv visual navigation for marine mass casualty incident response,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2017, pp. 6105–6110.
  • [11] D. Perille, A. Truong, X. Xiao, and P. Stone, “Benchmarking metric ground navigation,” in 2020 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR).   IEEE, 2020, pp. 116–121.
  • [12] J.-C. Latombe, Robot motion planning.   Springer Science & Business Media, 2012, vol. 124.
  • [13] S. LaValle, Planning algorithms.   Cambridge university press, 2006.
  • [14] X. Xiao, J. Biswas, and P. Stone, “Learning inverse kinodynamics for accurate high-speed off-road navigation on unstructured terrain,” arXiv preprint arXiv:2102.12667, 2021.
  • [15] Z. Wang, X. Xiao, B. Liu, G. Warnell, and P. Stone, “Appli: Adaptive planner parameter learning from interventions,” in 2021 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2021.
  • [16] Z. Xu, G. Dhamankar, A. Nair, X. Xiao, G. Warnell, B. Liu, Z. Wang, and P. Stone, “Applr: Adaptive planner parameter learning from reinforcement,” in 2021 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2021.
  • [17] X. Xiao, B. Liu, G. Warnell, and P. Stone, “Motion control for mobile robot navigation using machine learning: a survey,” arXiv preprint arXiv:2011.13112, 2020.
  • [18] A. Pokle, R. Martín-Martín, P. Goebel, V. Chow, H. M. Ewald, J. Yang, Z. Wang, A. Sadeghian, D. Sadigh, S. Savarese et al., “Deep local trajectory replanning and control for robot navigation,” in 2019 International Conference on Robotics and Automation (ICRA).   IEEE, 2019, pp. 5815–5822.