
Path Planning using Neural A* Search

Ryo Yonetani    Tatsunori Taniai    Mohammadamin Barekatain    Mai Nishimura    Asako Kanezaki
Abstract

We present Neural A*, a novel data-driven search method for path planning problems. Despite recent growing attention to data-driven path planning, machine learning approaches to search-based planning remain challenging due to the discrete nature of search algorithms. In this work, we reformulate a canonical A* search algorithm to be differentiable and couple it with a convolutional encoder to form an end-to-end trainable neural network planner. Neural A* solves a path planning problem by encoding a problem instance to a guidance map and then performing the differentiable A* search with the guidance map. By learning to match the search results with ground-truth paths provided by experts, Neural A* can produce a path consistent with the ground truth accurately and efficiently. Our extensive experiments confirmed that Neural A* outperformed state-of-the-art data-driven planners in terms of the search optimality and efficiency trade-off. Furthermore, Neural A* successfully predicted realistic human trajectories by directly performing search-based planning on natural image inputs. (Project page: https://omron-sinicx.github.io/neural-astar/)


1 Introduction

Path planning refers to the problem of finding a valid low-cost path from start to goal points in an environmental map. Search-based planning, including the popular A* search (Hart et al., 1968), is a common approach to path planning problems and has been used in a wide range of applications such as autonomous vehicle navigation (Paden et al., 2016), robot arm manipulation (Smith et al., 2012), and game AI (Abd Algfoor et al., 2015). Compared to other planning approaches such as sampling-based planning (González et al., 2015) and reactive planning (Tamar et al., 2016; Lee et al., 2018), search-based planning is guaranteed to find a solution path, if one exists, by incrementally and extensively exploring the map.

Learning how to plan from expert demonstrations is gathering attention as a promising extension to classical path planners. Recent work has demonstrated major advantages of such data-driven path planning in two scenarios: (1) finding near-optimal paths more efficiently than classical heuristic planners in point-to-point shortest-path search problems (Choudhury et al., 2018; Qureshi et al., 2019; Takahashi et al., 2019; Chen et al., 2020; Ichter et al., 2020) and (2) enabling path planning on raw image inputs (Tamar et al., 2016; Lee et al., 2018; Ichter & Pavone, 2019; Vlastelica et al., 2020), which is hard for classical planners unless semantic pixel-wise labeling of the environment is given.

Figure 1: Two Scenarios of Path Planning with Neural A*. (a) Point-to-point shortest path search: finding a near-optimal path (red) with fewer node explorations (green) for an input map. (b) Path planning on raw image inputs: accurately predicting a human trajectory (red) on a natural image.

In this study, we address both of these separately studied scenarios in a principled fashion, as highlighted in Fig. 1. In contrast to most of the existing data-driven methods that extend either sampling-based or reactive planning, we pursue a search-based approach to data-driven planning with the intrinsic advantage of guaranteed planning success.

Figure 2: Schematic Diagram of Neural A*. (1) A path-planning problem instance is fed to the encoder to produce a guidance map. (2) The differentiable A* module performs a point-to-point shortest path search with the guidance map and outputs a search history and a resulting path. (3) A loss between the search history and the ground-truth path is back-propagated to train the encoder.

Studies in this direction have so far been largely limited due to the difficulty arising from the discrete nature of incremental search steps in search-based planning, which makes learning via back-propagation non-trivial. Some existing methods thus train heuristic cost functions at each grid point independently, which requires fine-grained expert annotations such as oracle planners running online (Choudhury et al., 2018) or exhaustive pre-computation of a planner's optimal heuristic function (Takahashi et al., 2019). However, such rich annotations are neither always available nor scalable, especially when they involve labor-intensive manual annotation processes as in (Kim & Pineau, 2016; Kretzschmar et al., 2016; Pérez-Higueras et al., 2018). More recently, Vlastelica et al. (2020) have applied black-box optimization to combinatorial solvers integrated into neural networks, thus enabling end-to-end learning through combinatorial algorithms including search-based planning. This general-purpose optimization can be adapted to our problem. However, treating the entire search procedure as a black-box function discards the detailed trace of internal search steps, making training hard.

To address the non-differentiability of search-based planning, we propose a novel data-driven search-based planner named Neural A*. At its core, we reformulate a canonical A* search algorithm to be differentiable as a module referred to as the differentiable A*, by combining a discretized activation technique inspired by Hubara et al. (2016) with basic matrix operations. This module enables us to perform an A* search in the forward pass of a neural network and back-propagate losses through every search step to other trainable backbone modules. As illustrated in Fig. 2, Neural A* consists of the combination of a fully-convolutional encoder and the differentiable A* module, and is trained as follows: (1) Given a problem instance (i.e., an environmental map annotated with start and goal points), the encoder transforms it into a scalar-valued map representation referred to as a guidance map; (2) The differentiable A* module then performs a search with the guidance map to output a search history and a resulting path; (3) The search history is compared against the ground-truth path of the input instance to derive a loss, which is back-propagated to train the encoder.

The role of the differentiable A* in training is simple: teaching the encoder to produce guidance maps that minimize the discrepancy between the resulting search histories and the ground-truth paths. The encoder then learns to capture visual cues in the inputs that are effective for reproducing the ground-truth paths. This learning principle provides unified solutions to the two problem scenarios above. Specifically, in the shortest-path search scenario (Fig. 1a), where the ground-truth paths are given by optimal planners, the encoder is trained to find near-optimal paths efficiently by exploiting visual cues such as the shapes of dead ends. Here, guidance maps serve to augment the input maps, prioritizing which nodes to explore or avoid so as to improve the search optimality-efficiency trade-off. On the other hand, when the ground-truth paths are given by human annotators for raw image inputs (Fig. 1b), the encoder enables planning directly on the images by learning passable and impassable regions from colors and textures as low- and high-cost nodes in guidance maps.

We extensively evaluate our approach on both synthetic (Bhardwaj et al., 2017) and real-world datasets (Sturtevant, 2012; Robicquet et al., 2016). Our results show that Neural A* outperformed state-of-the-art data-driven search-based planners (Choudhury et al., 2018; Vlastelica et al., 2020) in terms of the trade-off between search optimality and efficiency for point-to-point shortest path problems. Further, we demonstrate that Neural A* can learn to predict pedestrian trajectories from raw real-world surveillance images more accurately than Vlastelica et al. (2020) and other imitation learners (Ratliff et al., 2006; Tamar et al., 2016; Lee et al., 2018).

2 Preliminaries

Path planning problem.

Let us consider a path planning problem, in particular a point-to-point shortest (i.e., lowest-cost) path problem, on a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, where $\mathcal{V}$ is the set of nodes representing the locations of the environment and $\mathcal{E}$ is the set of potentially valid movements between nodes. For convenience, we define an alternative graph form $\mathcal{G}=(\mathcal{V},\mathcal{N})$ with $\mathcal{N}(v)=\{v^{\prime}\mid(v,v^{\prime})\in\mathcal{E},\,v\neq v^{\prime}\}$ referring to the set of neighbor nodes of $v$. Each edge $(v,v^{\prime})$ is given a non-negative movement cost $c(v^{\prime})\in\mathbb{R}_{+}$ that depends only on the node $v^{\prime}$. Given start and goal nodes $v_{\rm s},v_{\rm g}\in\mathcal{V}$, the objective of path planning is to find a sequence of connected nodes $P=(v_{1},v_{2},\dots,v_{T})\in\mathcal{V}^{T}$ with $v_{1}=v_{\rm s}$ and $v_{T}=v_{\rm g}$ whose total cost $\sum_{t=1}^{T-1}c(v_{t+1})$ is the lowest among all possible paths from $v_{\rm s}$ to $v_{\rm g}$. Following Choudhury et al. (2018), this work focuses on the popular setting where $\mathcal{G}$ is the eight-neighbor grid world and $c(v^{\prime})$ is a unit cost taking $c(v^{\prime})=1$ when $v^{\prime}$ is passable and $c(v^{\prime})=\infty$ otherwise, e.g., when $v^{\prime}$ is occupied by an obstacle.

A* search.

Algorithm 1 A* Search
Input: Graph $\mathcal{G}$, movement cost $c$, start $v_{\rm s}$, and goal $v_{\rm g}$
Output: Shortest path $P$
1: Initialize $\mathcal{O}\leftarrow\{v_{\rm s}\}$, $\mathcal{C}\leftarrow\emptyset$, $\texttt{Parent}(v_{\rm s})\leftarrow\emptyset$.
2: while $v_{\rm g}\notin\mathcal{C}$ do
3:     Select $v^{*}\in\mathcal{O}$ based on Eq. (1).
4:     Update $\mathcal{O}\leftarrow\mathcal{O}\setminus\{v^{*}\}$, $\mathcal{C}\leftarrow\mathcal{C}\cup\{v^{*}\}$.
5:     Extract $\mathcal{V}_{\rm nbr}\subset\mathcal{V}$ based on Eq. (2).
6:     for each $v^{\prime}\in\mathcal{V}_{\rm nbr}$ do
7:         Update $\mathcal{O}\leftarrow\mathcal{O}\cup\{v^{\prime}\}$, $\texttt{Parent}(v^{\prime})\leftarrow v^{*}$.
8:     end for
9: end while
10: $P\leftarrow\texttt{Backtrack}(\texttt{Parent},v_{\rm g})$.

Algorithm 1 overviews the A* search implementation employed in this work. It explores nodes to find a shortest path $P$ by iterating between (1) selecting the most promising node from a list of candidates for constructing a shortest path and (2) expanding the neighbors of the selected node to update the candidate list, until the goal $v_{\rm g}$ is selected. More specifically, the node selection (Line 3 of Algorithm 1) is done based on the following criterion:

$v^{*}=\mathop{\arg\min}_{v\in\mathcal{O}}\left(g(v)+h(v)\right),$  (1)

where $\mathcal{O}\subset\mathcal{V}$ is an open list that manages the candidate nodes for selection. $g(v)$ is the actual total cost accumulating $c(v^{\prime})$ for the nodes $v^{\prime}$ along the current best path from $v_{\rm s}$ to $v$, which is updated incrementally during the search. On the other hand, $h(v)$ is a heuristic function estimating the total cost from $v$ to $v_{\rm g}$, for which the straight-line distance between $v$ and $v_{\rm g}$ is often used in the grid world. All the selected nodes are stored in another list recording the search history, called the closed list, $\mathcal{C}\subseteq\mathcal{V}$, as done in Line 4.

In Line 5 of Algorithm 1, we expand the neighboring nodes of $v^{*}$ as $\mathcal{V}_{\rm nbr}\subset\mathcal{V}$ based on the following criterion:

$\mathcal{V}_{\rm nbr}=\{v^{\prime}\mid v^{\prime}\in\mathcal{N}(v^{*})\land v^{\prime}\notin\mathcal{O}\land v^{\prime}\notin\mathcal{C}\}.$  (2)

The neighbor nodes $\mathcal{V}_{\rm nbr}$ are then added to $\mathcal{O}$ in Line 7 to propose new selection candidates in the next iteration. The search terminates once $v_{\rm g}$ is selected in Line 3 and stored in $\mathcal{C}$, followed by $\texttt{Backtrack}$, which traces the parent nodes $\texttt{Parent}(v)$ from $v_{\rm g}$ back to $v_{\rm s}$ to obtain a solution path $P$.
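To make these steps concrete, below is a minimal Python sketch of Algorithm 1 on a unit-cost, eight-neighbor grid. It uses a heap-based priority queue with lazy deletion in place of the explicit open-list update in Line 4; all names are ours, not from the released code.

import heapq
import math

def astar(passable, start, goal):
    """Minimal A* on an eight-neighbor grid with unit movement costs.

    passable: 2D list of bools; start/goal: (row, col) tuples.
    Returns the solution path as a list of nodes, or None.
    """
    H, W = len(passable), len(passable[0])

    def h(v):  # straight-line (Euclidean) heuristic
        return math.hypot(v[0] - goal[0], v[1] - goal[1])

    g = {start: 0.0}
    parent = {start: None}
    open_list = [(h(start), start)]   # priority queue keyed by g + h
    closed = set()
    while open_list:
        _, v = heapq.heappop(open_list)   # Line 3: select v* = argmin g + h
        if v in closed:
            continue
        closed.add(v)                     # Line 4: move v* to the closed list
        if v == goal:                     # goal selected: backtrack (Line 10)
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for dr in (-1, 0, 1):             # Lines 5-8: expand the neighbors
            for dc in (-1, 0, 1):
                if dr == dc == 0:
                    continue
                n = (v[0] + dr, v[1] + dc)
                if not (0 <= n[0] < H and 0 <= n[1] < W):
                    continue
                if not passable[n[0]][n[1]] or n in closed:
                    continue
                g_new = g[v] + 1.0        # unit cost c(v') = 1
                if g_new < g.get(n, float("inf")):
                    g[n] = g_new
                    parent[n] = v
                    heapq.heappush(open_list, (g_new + h(n), n))
    return None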

Data-driven planning setup.

As explained in Sec. 1, we seek a principled search-based approach to two distinct scenarios of data-driven path planning in existing work: (1) finding near-optimal paths efficiently for point-to-point shortest path problems (Choudhury et al., 2018), and (2) enabling the path planning on raw images in the absence of movement costs (Vlastelica et al., 2020).

To this end, we abstract these two settings by introducing a 2D environmental-map variable $X$ representing the input graph $\mathcal{G}$. Specifically, corresponding to the two settings with known and unknown movement costs $c(v^{\prime})$, the map $X$ represents $c(v^{\prime})$ either (1) explicitly as a binary map $X\in\{0,1\}^{\mathcal{V}}$ taking $1$ for passable locations (i.e., $c(v^{\prime})=1$) and $0$ otherwise, or (2) implicitly as a raw color image $X\in[0,1]^{3\times\mathcal{V}}$. As a result, each problem instance is represented consistently as an indexed tuple $Q^{(i)}=(X^{(i)},v_{\rm s}^{(i)},v_{\rm g}^{(i)})$.

For each problem instance $Q^{(i)}$, we further assume that a ground-truth path is given as a 2D binary map $\bar{P}^{(i)}\in\{0,1\}^{\mathcal{V}}$ whose elements take $1$ along the desired path. When $X^{(i)}$ is a binary map indicating the movement cost explicitly, $\bar{P}^{(i)}$ is obtained by solving the shortest path problem with an optimal planner. When $X^{(i)}$ is a raw image, we assume that $\bar{P}^{(i)}$ is provided by a human annotator.

3 Neural A* Search

Now we present the proposed Neural A* search. At a high level, Neural A* uses an encoder to transform a problem instance $Q^{(i)}$ into a guidance map, as illustrated in Fig. 2. This guidance map imposes a guidance cost $\phi^{(i)}(v)\in\mathbb{R}_{+}$ on each node $v$. The differentiable A* module, detailed in Sec. 3.1, then performs a search under the policy of minimizing the total guidance cost, i.e., $\sum_{t=1}^{T-1}\phi^{(i)}(v_{t+1})$. By repeating this forward pass and the back-propagation through the training procedure described in Sec. 3.2, the encoder learns to capture visual cues in the training instances that improve path accuracy and search efficiency.

3.1 Differentiable A* Module

Variable representations.

To reformulate the A* search in a differentiable fashion, we represent the variables in Algorithm 1 as matrices of the map size so that each line can be executed via matrix operations. Specifically, let $O,C,V_{\rm nbr}\in\{0,1\}^{\mathcal{V}}$ be binary matrices indicating the nodes contained in $\mathcal{O}$, $\mathcal{C}$, and $\mathcal{V}_{\rm nbr}$, respectively (we omit the index $i$ here for simplicity). We represent the start node $v_{\rm s}$, goal node $v_{\rm g}$, and selected node $v^{*}$ as one-hot indicator matrices $V_{\rm s},V_{\rm g},V^{*}\in\{0,1\}^{\mathcal{V}}$, respectively, where $\langle V_{\rm s},\mathds{1}\rangle=\langle V_{\rm g},\mathds{1}\rangle=\langle V^{*},\mathds{1}\rangle=1$ (i.e., one-hot), $\langle A,B\rangle$ is the inner product of two matrices $A$ and $B$, and $\mathds{1}$ is the all-ones matrix of the map size. In addition, let $G,H,\Phi\in\mathbb{R}_{+}^{\mathcal{V}}$ be matrix versions of $g(v)$, $h(v)$, and $\phi(v)$, respectively.

Node selection.

Performing Eq. (1) in a differentiable manner is non-trivial since it involves a discrete operation. Here, we leverage a discretized activation inspired by Hubara et al. (2016) and reformulate the equation as follows:

$V^{*}=\mathcal{I}_{\rm max}\left(\dfrac{\exp(-(G+H)/\tau)\odot O}{\langle\exp(-(G+H)/\tau),O\rangle}\right),$  (3)

where $A\odot B$ denotes the element-wise product and $\tau$ is a temperature parameter defined empirically. $\mathcal{I}_{\rm max}(A)$ is a function that gives the argmax index of $A$ as a binary one-hot matrix during the forward pass, while acting as the identity function during back-propagation. The open-list matrix $O$ masks the exponential term so that the selection is restricted to the nodes in the current open list.
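A minimal PyTorch sketch of this node selection follows, assuming $G$, $H$, and $O$ are given as 2D tensors; the straight-through construction in the last line realizes $\mathcal{I}_{\rm max}$ (hard one-hot in the forward pass, soft gradients in the backward pass). The function name is ours.

import torch

def select_node(G, H, O, tau):
    """Differentiable node selection (a sketch of Eq. (3)).

    G, H, O: tensors of shape (H, W); O is the binary open-list mask.
    Returns a one-hot matrix V* in the forward pass, while gradients
    flow through the underlying softmax (straight-through estimator).
    """
    score = torch.exp(-(G + H) / tau) * O
    y_soft = score / score.sum()                # normalized scores
    # Hard one-hot argmax for the forward pass.
    idx = torch.argmax(y_soft)
    y_hard = torch.zeros_like(y_soft).view(-1)
    y_hard[idx] = 1.0
    y_hard = y_hard.view_as(y_soft)
    # Straight-through: forward = y_hard, backward = gradient of y_soft.
    return y_hard + y_soft - y_soft.detach()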

Node expansion.

Expanding the neighboring nodes of $v^{*}$ in Eq. (2) involves multiple conditions, i.e., $v^{\prime}\in\mathcal{N}(v^{*})$, $v^{\prime}\notin\mathcal{O}$, and $v^{\prime}\notin\mathcal{C}$. Inspired by Tamar et al. (2016), we implement $\mathcal{N}$ as a convolution between $V^{*}$ and the fixed kernel $K=[[1,1,1]^{\top},[1,0,1]^{\top},[1,1,1]^{\top}]$. When $X$ is given as a binary cost map indicating passable/impassable nodes, $V_{\rm nbr}$ is obtained by the following matrix operations:

$V_{\rm nbr}=(V^{*}*K)\odot X\odot(\mathds{1}-O)\odot(\mathds{1}-C),$  (4)

where $A*B$ is the 2D convolution of $A$ and $B$. The multiplication by $X\odot(\mathds{1}-O)\odot(\mathds{1}-C)$ acts as a mask extracting the nodes that are passable and contained in neither the open nor the closed list. Masking with $X$ preserves the original obstacle structures and thus the graph topology, keeping the differentiable A* complete (i.e., guaranteed to find a solution if one exists in the graph $\mathcal{G}$, as in standard A*). We also introduce $\bar{V}_{\rm nbr}=(V^{*}*K)\odot X\odot O\odot(\mathds{1}-C)$, indicating the neighboring nodes already in the open list, which we will use to update $G$ below. When $X$ is instead a raw image that does not explicitly indicate the passable nodes, we use $V_{\rm nbr}=(V^{*}*K)\odot(\mathds{1}-O)\odot(\mathds{1}-C)$ and $\bar{V}_{\rm nbr}=(V^{*}*K)\odot O\odot(\mathds{1}-C)$.
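The following sketch illustrates this masked node expansion with torch.nn.functional.conv2d for the binary-map setting; input shapes and function names are our assumptions.

import torch
import torch.nn.functional as F

# Fixed eight-neighbor kernel K from the paper.
K = torch.tensor([[1., 1., 1.],
                  [1., 0., 1.],
                  [1., 1., 1.]]).view(1, 1, 3, 3)

def expand_neighbors(V_star, X, O, C):
    """Node expansion (a sketch of Eq. (4)) for the binary-map setting.

    All inputs are (1, 1, H, W) tensors; X is the binary passability map.
    Returns V_nbr (newly opened neighbors) and V_nbr_bar (neighbors
    already in the open list), both as {0,1}-valued masks.
    """
    nbr = F.conv2d(V_star, K, padding=1)       # N(v*): all eight neighbors
    V_nbr = nbr * X * (1 - O) * (1 - C)        # passable, in neither O nor C
    V_nbr_bar = nbr * X * O * (1 - C)          # passable, already in O
    return V_nbr, V_nbr_bar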

Updating $G$.

As explained earlier, $g(v)$ here represents the total guidance cost paid along the actual path from $v_{\rm s}$ to $v$, instead of accumulating the original movement cost $c(v^{\prime})$, which is not always available explicitly. To update this total cost at each iteration, we partially update $G$ with new costs $G^{\prime}$ for the neighbor nodes as follows:

$G \leftarrow G^{\prime}\odot V_{\rm nbr}+\min(G^{\prime},G)\odot\bar{V}_{\rm nbr}+G\odot(\mathds{1}-V_{\rm nbr}-\bar{V}_{\rm nbr}),$  (5)
$G^{\prime} = \langle G,V^{*}\rangle\cdot\mathds{1}+\Phi.$  (6)

In Eq. (5), the neighbors $V_{\rm nbr}$ that are opened for the first time are assigned $G^{\prime}$, while the neighbors $\bar{V}_{\rm nbr}$ that have already been opened are assigned the lower of the new cost $G^{\prime}$ and the previously computed cost $G$. The new costs $G^{\prime}$ for the neighbors are computed in Eq. (6) as the sum of the current cost of the selected node, $g(v^{*})=\langle G,V^{*}\rangle$, and the one-step guidance cost to the neighbors represented by $\Phi$.
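A sketch of Eqs. (5) and (6) under the same tensor conventions; torch.minimum realizes the element-wise $\min(G^{\prime},G)$, and the function name is ours.

import torch

def update_G(G, Phi, V_star, V_nbr, V_nbr_bar):
    """Accumulated-cost update (a sketch of Eqs. (5)-(6)).

    G, Phi: (H, W) tensors; V_star, V_nbr, V_nbr_bar: {0,1}-valued masks.
    """
    # Eq. (6): cost of the selected node plus one-step guidance costs.
    g_star = (G * V_star).sum()
    G_new = g_star + Phi
    # Eq. (5): assign G_new to newly opened neighbors, the element-wise
    # minimum to already-opened ones, and keep G everywhere else.
    return (G_new * V_nbr
            + torch.minimum(G_new, G) * V_nbr_bar
            + G * (1 - V_nbr - V_nbr_bar))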

3.2 Training Neural A*

Loss design.

The differentiable A* module connects its guidance-map input $\Phi$ and its search output so that a loss on the output is back-propagated to $\Phi$ and, consequently, to the encoder. Here, the output is the closed list $C$, a binary matrix accumulating all the nodes $V^{*}$ selected by Eq. (3) during a search (see Eq. (8) for details). We evaluate the mean $L_{1}$ loss between $C$ and the ground-truth path map $\bar{P}$:

$\mathcal{L}=\|C-\bar{P}\|_{1}/|\mathcal{V}|.$  (7)

This loss supervises the node selections by penalizing both (1) false-negative selections of nodes that should have been included in $C$ to find $\bar{P}$ and (2) false-positive selections of nodes that were excessive in $C$ compared to $\bar{P}$. In other words, the loss encourages Neural A* to (1) search for a path that is close to the ground-truth path (2) with fewer node explorations.

In practice, we disable the gradients of $O$ in Eq. (3), $V_{\rm nbr},\bar{V}_{\rm nbr}$ in Eq. (5), and $G$ in Eq. (6) by detaching them from the back-propagation chains. Doing so effectively informs the encoder of the importance of the guidance cost $\Phi$ in Eq. (6) for the node selection in Eq. (3), while simplifying recurrent back-propagation chains to stabilize training and reduce memory consumption. (We found in our experiments that fully enabling the gradients caused training failures due to out-of-memory errors with 16GB of GPU RAM.)
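For illustration, a minimal sketch of the loss in Eq. (7) follows; the trailing comment indicates where the detaching described above would enter a search loop (hypothetical usage, not the released implementation).

import torch

def neural_astar_loss(C, P_bar):
    """Mean L1 loss of Eq. (7) between the closed list C and the
    ground-truth path map P_bar, both (H, W) binary tensors."""
    return (C - P_bar).abs().mean()

# In the search loop, the masks entering Eqs. (3), (5), and (6) would be
# detached so that gradients reach Phi mainly through the node selection,
# e.g. (assumed usage): V_star = select_node(G.detach(), H, O.detach(), tau)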

Encoder design.

The loss shown above is back-propagated through every search step in the differentiable A* module to the encoder. Here, we expect the encoder to learn visual cues in the given problem instances that enable accurate and efficient search-based planning. These cues include, for instance, the shapes of dead ends and bypasses in binary cost maps, or the colors and textural patterns of passable roads in raw natural images. For this purpose, we utilize a fully-convolutional network architecture such as U-Net (Ronneberger et al., 2015), used for semantic segmentation, which can learn local visual representations at the original resolution useful for the downstream task. (Due to the nature of convolutional neural networks, guidance cost outputs are sensitive only to the input map within their limited receptive fields. Thus, when the map size is intractably large, we could partially sweep the input map with the encoder incrementally during a search without changing search results.) The input to the encoder is given as the concatenation of $X$ and $V_{\rm s}+V_{\rm g}$, so that the extraction of those visual cues is properly conditioned on the start and goal positions.
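A minimal sketch of the encoder interface described here; the backbone is left abstract, the class name is ours, and the two-channel input assumes a binary map $X$ (a raw RGB image would make it four channels).

import torch
import torch.nn as nn

class GuidanceEncoder(nn.Module):
    """Wraps a fully-convolutional backbone (the paper uses a U-Net with
    a VGG-16 backbone) that maps (B, 2, H, W) -> (B, 1, H, W)."""
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone

    def forward(self, X, V_s, V_g):
        # Concatenate the map with the start+goal indicator so that the
        # guidance costs are conditioned on the problem instance.
        inp = torch.cat([X, V_s + V_g], dim=1)     # (B, 2, H, W)
        return torch.sigmoid(self.backbone(inp))   # guidance map Phi in [0,1]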

Enabling mini-batch training.

The complete algorithm of Neural A* is summarized in Algorithm 2. To accelerate training, it is critical to process multiple problem instances at once in a mini-batch. This is not straightforward, however, because intra-batch samples may be solved within different numbers of search steps. We address this issue by introducing a binary goal-verification flag $\eta^{(i)}=1-\langle V_{\rm g}^{(i)},V^{*(i)}\rangle$ and updating $O^{(i)},C^{(i)}$ as follows:

$O^{(i)}\leftarrow O^{(i)}-\eta^{(i)}V^{*(i)},\quad C^{(i)}\leftarrow C^{(i)}+\eta^{(i)}V^{*(i)}.$  (8)

Intuitively, these operations keep $O^{(i)}$ and $C^{(i)}$ unchanged once the goal is found. We repeat Lines 9–15 until $\eta^{(i)}=0$ holds for all the samples in the batch.
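A batched sketch of Eq. (8), assuming the matrices are stacked into (B, H, W) tensors; the function name is ours.

import torch

def masked_batch_step(O, C, V_star, V_g):
    """Batched goal-verification update (a sketch of Eq. (8)).

    O, C, V_star, V_g: (B, H, W) tensors. Samples whose goal has been
    selected get eta = 0, freezing their open/closed lists while the
    rest of the batch keeps searching.
    """
    eta = 1 - (V_g * V_star).flatten(1).sum(-1)   # (B,), 0 once goal found
    eta = eta.view(-1, 1, 1)
    O = O - eta * V_star
    C = C + eta * V_star
    return O, C, eta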

Algorithm 2 Neural A* Search
Input: Problem instances $\{Q^{(i)}=(X^{(i)},v_{\rm s}^{(i)},v_{\rm g}^{(i)})\mid i=1,\dots,b\}$ in a mini-batch of size $b$
Output: Closed-list matrices $\{C^{(i)}\mid i=1,\dots,b\}$ and solution paths $\{P^{(i)}\mid i=1,\dots,b\}$
1: for all $i=1,\dots,b$ do in parallel
2:     Compute $V_{\rm s}^{(i)},V_{\rm g}^{(i)}$ from $v_{\rm s}^{(i)},v_{\rm g}^{(i)}$.
3:     Compute $\Phi^{(i)}$ from $X^{(i)},V_{\rm s}^{(i)},V_{\rm g}^{(i)}$ by the encoder.
4:     Initialize $O^{(i)}\leftarrow V_{\rm s}^{(i)}$, $C^{(i)}\leftarrow\mathbf{0}$, $G^{(i)}\leftarrow\mathbf{0}$.
5:     Initialize $\texttt{Parent}^{(i)}(v_{\rm s}^{(i)})\leftarrow\emptyset$.
6: end for
7: repeat
8:     for all $i=1,\dots,b$ do in parallel
9:         Select $V^{*(i)}$ based on Eq. (3).
10:        Compute $\eta^{(i)}=1-\langle V_{\rm g}^{(i)},V^{*(i)}\rangle$.
11:        Update $O^{(i)}$ and $C^{(i)}$ based on Eq. (8).
12:        Compute $V_{\rm nbr}^{(i)}$ based on Eq. (4).
13:        Update $O^{(i)}\leftarrow O^{(i)}+V_{\rm nbr}^{(i)}$.
14:        Update $G^{(i)}$ based on Eqs. (5) and (6).
15:        Update $\texttt{Parent}^{(i)}$ based on Lines 6–7 of Algorithm 1.
16:    end for
17: until $\eta^{(i)}=0$ for all $i=1,\dots,b$
18: for all $i=1,\dots,b$ do in parallel
19:     $P^{(i)}\leftarrow\texttt{Backtrack}(\texttt{Parent}^{(i)},v_{\rm g}^{(i)})$.
20: end for

4 Experiments

In this section, we first conduct an extensive experiment to evaluate the effect of Neural A* on the search optimality and efficiency trade-off, i.e., how efficiently Neural A* search can find near-optimal paths for point-to-point shortest path problems. Due to space limitations, we provide the detailed experimental setups in Appendix A.

4.1 Datasets

We adopted the following public path-planning datasets with obstacle annotations to collect planning demonstrations.

  • Motion Planning (MP) Dataset: A collection of eight types of grid-world environments with distinctive obstacle shapes, created by Bhardwaj et al. (2017). Each environment group consists of 800 training, 100 validation, and 100 test maps, with the same type of obstacles placed in different layouts. We resized each environmental map to $32\times 32$ to complete the whole experiment in a reasonable time. Following the original setting, training and evaluation were conducted for each environment group independently.

  • Tiled MP Dataset: Our extension of the MP dataset that makes obstacle structures more complex and diverse. Specifically, we tiled four maps drawn randomly from the MP dataset to construct environmental maps of size $64\times 64$. We repeated this process to create 3,200 training, 400 validation, and 400 test maps.

  • City/Street Map (CSM) Dataset: A collection of 30 city maps with explicit obstacle annotations as binary images, organized by Sturtevant (2012). For data augmentation, we cropped multiple random patches of size $128\times 128$ from each map and resized them to $64\times 64$. We used 20 of the 30 maps to generate 3,200 random training and 400 validation maps, and the remaining 10 maps for 400 test samples. In this way, we ensured that no maps were shared between the training/validation and test splits.

For each map, we created problem instances by randomly picking a single goal from one of the four corner regions of the map, as well as one, six, and fifteen start locations for the training, validation, and test splits, respectively, from areas sufficiently distant from the goal. We obtained the ground-truth shortest paths using the Dijkstra algorithm.

4.2 Methods and Implementations

Neural A*.

As the encoder, we adopted U-Net (Ronneberger et al., 2015) with the VGG-16 backbone (Simonyan & Zisserman, 2015) implemented by Yakubovskiy (2020), where the final layer was activated by the sigmoid function to constrain guidance maps to $\Phi\in[0,1]^{\mathcal{V}}$. For the differentiable A* module, we used the Chebyshev distance as the heuristic function $H$, which is suitable for the eight-neighbor grid world (Patel, 1997). The Euclidean distance multiplied by a small constant ($0.001$) was further added to $H$ for tie-breaking. The temperature $\tau$ in Eq. (3) was empirically set to the square root of the map width.
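For illustration, a sketch of this heuristic as a precomputed map over a height-by-width grid; the tie-breaking constant matches the value quoted above, and the function name is ours.

import torch

def heuristic(goal, height, width, eps=0.001):
    """Chebyshev-distance heuristic H with a small Euclidean
    tie-breaking term, as described above. goal: (row, col)."""
    ys = torch.arange(height).view(-1, 1).float()
    xs = torch.arange(width).view(1, -1).float()
    dy, dx = (ys - goal[0]).abs(), (xs - goal[1]).abs()
    cheb = torch.maximum(dy, dx)          # admissible on 8-neighbor grids
    euc = torch.sqrt(dy ** 2 + dx ** 2)   # tie-breaker
    return cheb + eps * euc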

Baselines.

To assess the significance of Neural A* in search-based planning, we adopted the following data-driven search-based planners as baseline competitors, which, like ours, guarantee planning success by design. We extended the authors' implementations available online.

  • SAIL (Choudhury et al., 2018): A data-driven best-first search method that achieved high search efficiency by learning a heuristic function from demonstrations. Unlike Neural A*, SAIL employs hand-engineered features such as the straight-line distances from each node to the goal and the closest obstacles. We evaluated two variants of SAIL where training samples were rolled out using the learned heuristic function (SAIL) or the oracle planner (SAIL-SL).

  • Blackbox Differentiation (Vlastelica et al., 2020): A general-purpose approach to data-driven combinatorial solvers including search-based path planning. It consists of an encoder module that transforms environmental maps into node-cost maps and a solver module that performs a combinatorial algorithm as a piece-wise constant black-box function taking the cost maps as input. We implemented a Blackbox-Differentiation version of A* search (BB-A*) by treating our differentiable A* module as a black-box function and training the encoder via black-box optimization, while keeping other setups, such as the encoder architecture and loss function, the same as those of Neural A*.

We also evaluated best-first search (BF) and weighted A* search (WA*) (Pohl, 1970) as classical baseline planners, which used the same heuristic function as Neural A*. Finally, we implemented a degraded version of Neural A* named Neural BF, which always used $G=\Phi$ in Eq. (3) instead of $G$ incrementally updated by Eq. (5). By doing so, Neural BF ignores the accumulation of guidance costs from the start node, much like best-first search.

4.3 Experimental Setups

All the models were trained using the RMSProp optimizer, with the mini-batch size, the number of epochs, and the learning rate set to (100, 100, 0.001) for the MP dataset and (100, 400, 0.001) for the Tiled MP and CSM datasets.

For each trained model, we calculated the following metrics to evaluate how much the trade-off between search optimality and efficiency was improved from a standard A* search performed using the identical heuristic function.

  • Path optimality ratio (Opt) measuring the percentage of shortest path predictions for each environmental map, as used by Vlastelica et al. (2020).

  • Reduction ratio of node explorations (Exp), measuring the number of search steps reduced by a model compared to the standard A* search, in the range $0$–$100$ (%). Specifically, letting $E^{*}$ and $E$ be the numbers of node explorations by A* and by a model, respectively, it is defined as $\max(100\times\frac{E^{*}-E}{E^{*}},0)$ and averaged over all the problem instances for each environmental map. (A sketch of this metric is given after this list.)

  • The harmonic mean (Hmean) of Opt and Exp, showing how much their trade-off was improved by a model.
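In code, these per-instance metrics reduce to a few lines (a sketch; the averaging over instances and maps is omitted, and the function names are ours):

def exp_ratio(E_star, E):
    """Reduction ratio of node explorations (Exp) for one instance."""
    return max(100.0 * (E_star - E) / E_star, 0.0)

def hmean(opt, exp):
    """Harmonic mean of Opt and Exp (both in percent)."""
    return 2 * opt * exp / (opt + exp) if opt + exp > 0 else 0.0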

During the training, we saved model weights that marked the best Hmean score on the validation split and used them for measuring final performances on the test split. To investigate statistical differences among models, we computed the bootstrap mean and 95% confidence bounds per metric.

4.4 Results

Table 1: Quantitative Results. Bootstrap means and 95% confidence bounds of path optimality ratio (Opt), reduction ratio of node explorations (Exp), and their harmonic mean (Hmean).
MP Dataset
Opt Exp Hmean
BF 65.8 (63.8, 68.0) 44.1 (42.8, 45.5) 44.8 (43.4, 46.3)
WA* 68.4 (66.5, 70.4) 35.8 (34.5, 37.1) 40.4 (39.0, 41.8)
SAIL 34.6 (32.1, 37.0) 48.6 (47.2, 50.2) 26.3 (24.6, 28.0)
SAIL-SL 37.2 (34.8, 39.5) 46.3 (44.8, 47.8) 28.3 (26.6, 29.9)
BB-A* 62.7 (60.6, 64.9) 42.0 (40.6, 43.4) 42.1 (40.5, 43.6)
Neural BF 75.5 (73.8, 77.1) 45.9 (44.6, 47.2) 52.0 (50.7, 53.4)
Neural A* 87.7 (86.6, 88.9) 40.1 (38.9, 41.3) 52.0 (50.7, 53.3)
Tiled MP Dataset
Opt Exp Hmean
BF 32.3 (30.0, 34.6) 58.9 (57.1, 60.8) 34.0 (32.1, 36.0)
WA* 35.3 (32.9, 37.7) 52.6 (50.8, 54.5) 34.3 (32.5, 36.1)
SAIL 5.3 (4.3, 6.1) 58.4 (56.6, 60.3) 7.5 (6.3, 8.6)
SAIL-SL 6.6 (5.6, 7.6) 54.6 (52.7, 56.5) 9.1 (7.9, 10.3)
BB-A* 31.2 (28.8, 33.5) 52.0 (50.2, 53.9) 31.1 (29.2, 33.0)
Neural BF 43.7 (41.4, 46.1) 61.5 (59.7, 63.3) 44.4 (42.5, 46.2)
Neural A* 63.0 (60.7, 65.2) 55.8 (54.1, 57.5) 54.2 (52.6, 55.8)
CSM Dataset
Opt Exp Hmean
BF 54.4 (51.8, 57.0) 39.9 (37.6, 42.2) 35.7 (33.9, 37.6)
WA* 55.7 (53.1, 58.3) 37.1 (34.8, 39.3) 34.4 (32.6, 36.3)
SAIL 20.6 (18.6, 22.6) 41.0 (38.8, 43.3) 18.3 (16.7, 19.9)
SAIL-SL 21.4 (19.4, 23.3) 39.3 (37.1, 41.6) 17.6 (16.1, 19.1)
BB-A* 54.4 (51.8, 57.1) 40.0 (37.7, 42.3) 35.6 (33.8, 37.4)
Neural BF 60.9 (58.5, 63.3) 42.1 (39.8, 44.3) 40.6 (38.7, 42.6)
Neural A* 73.5 (71.5, 75.5) 37.6 (35.5, 39.7) 43.6 (41.7, 45.5)
Figure 3: Selected Path Planning Results. Black pixels indicate obstacles. Start nodes (indicated by “S”), goal nodes (indicated by “G”), and found paths are annotated in red. Other explored nodes are colored in green. In the rightmost column, guidance maps are overlaid on the input maps where regions with lower costs are visualized in white. More results are presented in Appendix B.

Comparisons with baselines.

Table 1 summarizes the quantitative results. Overall, Neural A* outperformed the baseline methods in terms of both Opt and Hmean and improved the trade-off between path optimality and search efficiency. SAIL and SAIL-SL sometimes searched more efficiently, but this came with low optimality ratios, especially for the larger and more complex maps of the Tiled MP and CSM datasets. BB-A* showed higher Hmean scores than SAIL/SAIL-SL but was consistently outperformed by Neural A*. Since Neural A* and BB-A* differ only in whether the A* search is made differentiable or treated as a black box when updating an identically configured encoder, the comparison between them highlights the effectiveness of our approach. With the differentiable A* module, Neural A* has access to richer information about internal search steps, which is necessary to effectively analyze causal relationships between individual node selections by Eq. (3) and search results; this information is black-boxed in BB-A*. We also found that the classical heuristic planners (BF and WA*) performed comparably to, or sometimes better than, the other data-driven baselines. This result can be explained by our challenging experimental setup, which adopts randomized start and goal locations instead of the pre-defined ones used in the original work (Choudhury et al., 2018; Vlastelica et al., 2020). Finally, Neural BF achieved the highest Exp scores on the Tiled MP and CSM datasets but often ended up with sub-optimal paths since it ignores the actual cost accumulation. For more evaluations, including analysis of the complexity and runtime of Neural A*, see Appendix B.

Qualitative results.

Figure 3 visualizes example search results with guidance maps produced by the encoder of Neural A*. We confirmed that the encoder successfully captured visual cues in problem instances. Specifically, higher guidance costs (shown in green) were assigned to whole obstacle regions that create dead ends, while lower costs (shown in white) were given to bypasses and shortcuts that guided the search to the goal. Figure 4 depicts how Neural A* performed a search adaptively for different start and goal locations in the same map. Notice that the entrance to the U-shaped dead end is (a) blocked with high guidance costs when the dead end lies between the start and goal, and (b) open when the goal is inside the dead end.

Figure 4: Adaptive Encoding Results. (a) The U-shaped dead-end obstacle is placed between the start and goal nodes; (b) The goal node is located inside the U-shaped dead end.

Ablation study.

We further assessed the effect of the encoder by comparing several configurations of its architecture. In the configuration w/o start and goal, we fed only the environmental map $X^{(i)}$ to the encoder as input. In w/ ResNet-18 backbone, a residual network architecture (He et al., 2016) was used instead of VGG-16. Table 2 shows degraded performance with these configurations, especially in terms of the optimality ratio. Additionally, we tried the following variants of the proposed approach, which we found not effective. Adopting the straight-through Gumbel-softmax (Jang et al., 2017) for Eq. (3) caused complete planning failures (with Opt, Exp, and Hmean all 0), as it biases the planner to avoid the most promising nodes. Using the mean squared loss for Eq. (7) produces the same gradients as the mean $L_{1}$ loss up to a constant factor for binary variables (i.e., $C$ and $\bar{P}$), and indeed led to comparable performance.

Table 2: Ablation Study. Performance comparisons with different architectural design choices for the encoder on the MP dataset.
Opt Exp Hmean
Neural A* 87.7 (86.6, 88.9) 40.1 (38.9, 41.3) 52.0 (50.7, 53.3)
w/o start and goal 67.0 (65.1, 68.8) 36.8 (35.6, 38.1) 41.5 (40.2, 42.7)
w/ ResNet-18 backbone 79.8 (78.1, 81.5) 41.4 (40.2, 42.7) 49.2 (47.9, 50.5)

Limitations.

Although Neural A* worked well in various environments, its current implementation assumes grid-world environments with unit node costs. One interesting direction is to extend Neural A* to work on high-dimensional or continuous state spaces. This will require the encoder to be tailored to such spaces, as done in Qureshi et al. (2019); Chen et al. (2020); Ichter & Pavone (2019). We leave this extension for future work.

5 Path Planning on Raw-Image Inputs

As another scenario for Neural A*, we address the task of planning paths directly on raw image inputs. Specifically, consider a video of an outdoor scene taken by a stationary surveillance camera. Planning demonstrations then consist of color images of the scene and actual trajectories of pedestrians (i.e., ground-truth paths provided by human annotators). Given these data, we aim to use Neural A* to predict realistic trajectories consistent with those of pedestrians when start and goal locations are provided. We compared Neural A* with BB-A* (Vlastelica et al., 2020) as a competitor that can also perform planning on raw image inputs. For more comparisons with other imitation learners (Ratliff et al., 2006; Tamar et al., 2016; Lee et al., 2018), see Appendix B.1.

Dataset.

We used Stanford Drone Dataset (SDD) (Robicquet et al., 2016), which comprises surveillance videos captured by static drone cameras placed at eight distinct locations. We split the dataset in two ways: (1) intra-scenes: for each location, choosing one video to build a single test split while using the rest to train a model, to see if the model can predict trajectories of unseen individuals observed at different times; (2) inter-scenes: performing leave-one-location-out cross-validation to see how well a learned model can generalize to unseen locations. As planning demonstrations, we extracted pedestrian trajectories and the local patch of a video frame that encompassed the trajectories.

Implementations and experimental setups.

Unlike the previous experiment, where obstacle locations were given explicitly, each model now has to learn visual representations of obstacles to avoid them during node selection. Therefore, we modified the U-Net encoder by multiplying the final sigmoid activation by a trainable positive scalar (initialized to $10.0$) so that obstacle regions can be assigned sufficiently high costs. We trained both Neural A* and BB-A* using RMSProp with the batch size, the number of epochs, and the learning rate set to $(64, 20, 0.001)$. Because the main objective of this task is to predict paths close to the ground-truth ones, rather than to improve the search optimality and efficiency trade-off, we used the chamfer distance as a metric for evaluating the dissimilarity between predicted and ground-truth paths.
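For reference, a minimal sketch of the chamfer distance between two point sequences; the exact normalization used in our evaluation may differ, and the function name is ours.

import torch

def chamfer_distance(P, Q):
    """Symmetric chamfer distance between two paths, each given as an
    (N, 2) / (M, 2) tensor of pixel coordinates."""
    d = torch.cdist(P.float(), Q.float())      # (N, M) pairwise distances
    return d.min(dim=1).values.sum() + d.min(dim=0).values.sum()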

Results.

Table 3 shows that Neural A* significantly outperformed BB-A*. As visualized in the first two examples in Fig. 5, Neural A* often predicted paths along roads, resulting in predictions closer to ground-truth paths compared to those by BB-A*. However, both methods sometimes failed to predict actual pedestrian trajectories when there were multiple possible routes to the destinations, as shown in the example at the bottom. A possible extension to address this issue is to adopt a generative framework (Gupta et al., 2018; Salzmann et al., 2020) that allows multiple paths to be stochastically sampled.

Table 3: Quantitative Results on SDD. Bootstrap means and 95% confidence bounds of the chamfer distance between predicted and ground-truth pedestrian trajectories.
Intra-scenes Inter-scenes
BB-A* 152.2 (144.9, 159.1) 134.3 (132.1, 136.4)
Neural A* 16.1 (13.2, 18.8) 37.4 (35.8, 39.0)
Figure 5: Path Planning Examples on SDD. Neural A* predicted paths similar to actual pedestrian trajectories.

6 Related Work

Approaches to path planning problems can be broadly classified into several types, each with its advantages and limitations. For example, sampling-based planning, such as RRT (LaValle, 1998) and PRM (Kavraki et al., 1996), can explore high-dimensional state spaces and has widely been used for robot motion planning (Kingston et al., 2018). A practical challenge is identifying important regions in the state spaces to effectively sample sparse state-points for efficient planning. To this end, a variety of data-driven methods have been developed to learn such regions (Ichter et al., 2018; Ichter & Pavone, 2019) or to learn exploration strategies (Qureshi et al., 2019; Pérez-Higueras et al., 2018; Chen et al., 2020) from expert demonstrations or prior successful planning experiences.

Meanwhile, reactive planners learn a policy of moving agents to determine the next best actions, such as moving left or right, from current states via supervised learning (Tamar et al., 2016; Kanezaki et al., 2017; Gupta et al., 2017; Karkus et al., 2017; Lee et al., 2018; Bency et al., 2019) or inverse reinforcement learning (IRL) (Ratliff et al., 2006; Ziebart et al., 2008; Wulfmeier et al., 2015; Kim & Pineau, 2016; Kretzschmar et al., 2016; Fu et al., 2018; Yu et al., 2019). These approaches can be useful for dynamic environments where the agents have to act adaptively. Also, IRL-based methods are relevant to our work in that they learn to recover cost functions from demonstrations. However, they often suffer from planning failures especially for a large map and thus require the help of other classical planners to enable long-range planning (Faust et al., 2018). In Appendix B.1, we provide quantitative evaluations of some relevant methods (Ratliff et al., 2006; Tamar et al., 2016; Lee et al., 2018) with our experimental setting.

Compared to these two approaches, search-based planning is advantageous in that it is guaranteed to find valid paths in a fine grid map. Classical heuristic planners have sought better heuristic functions, search algorithms, and efficient implementations to improve search performance (e.g., Burns et al. (2012); Zhou & Zeng (2015); Abd Algfoor et al. (2015)). Recent data-driven approaches further extend heuristic planners in two ways: learning from expert demonstrations to (a) find near-optimal paths efficiently (Choudhury et al., 2018; Takahashi et al., 2019) or (b) perform planning directly on raw image inputs (Vlastelica et al., 2020). Bridging both lines of work, we propose the first differentiable search-based planner that solves these two problems in a principled fashion. From another perspective, data-driven heuristic planners can learn their cost functions, as in ours and Vlastelica et al. (2020), as well as their heuristic functions, as in Choudhury et al. (2018); Takahashi et al. (2019). These variations are analogous to learning a (negative) reward function in IRL and a Q-function in RL, respectively. Very recently, Archetti et al. (2021) have extended Neural A* to learn both of these functions.

7 Conclusion

We have presented a novel data-driven search-based planner named Neural A*, which involves a differentiable A* algorithm. Neural A* learns from demonstrations to improve the trade-off between search optimality and efficiency in path planning and also to enable the planning directly on raw image inputs. Our extensive experimental evaluations on multiple public datasets have demonstrated the effectiveness of our approach over state-of-the-art planners.

References

  • Abd Algfoor et al. (2015) Abd Algfoor, Z., Sunar, M. S., and Kolivand, H. A comprehensive study on pathfinding techniques for robotics and video games. International Journal of Computer Games Technology, 2015.
  • Anderson et al. (2018) Anderson, P., Chang, A., Chaplot, D. S., Dosovitskiy, A., Gupta, S., Koltun, V., Kosecka, J., Malik, J., Mottaghi, R., Savva, M., et al. On evaluation of embodied navigation agents. arXiv preprint arXiv:1807.06757, 2018.
  • Archetti et al. (2021) Archetti, A., Cannici, M., and Matteucci, M. Neural weighted A*: Learning graph costs and heuristics with differentiable anytime A*. arXiv preprint arXiv:2105.01480, 2021.
  • Bency et al. (2019) Bency, M. J., Qureshi, A. H., and Yip, M. C. Neural path planning: Fixed time, near-optimal path generation via oracle imitation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019.
  • Bhardwaj et al. (2017) Bhardwaj, M., Choudhury, S., and Scherer, S. Learning heuristic search via imitation. In Proceedings of the Conference on Robot Learning (CoRL), 2017.
  • Burns et al. (2012) Burns, E. A., Hatem, M., Leighton, M. J., and Ruml, W. Implementing fast heuristic search code. In Proceedings of the Annual Symposium on Combinatorial Search (SOCS), 2012.
  • Chen et al. (2020) Chen, B., Dai, B., Lin, Q., Ye, G., Liu, H., and Song, L. Learning to plan in high dimensions via neural exploration-exploitation trees. In Proceedings of the International Conference on Learning Representations (ICLR), 2020.
  • Choudhury et al. (2018) Choudhury, S., Bhardwaj, M., Arora, S., Kapoor, A., Ranade, G., Scherer, S., and Dey, D. Data-driven planning via imitation learning. International Journal of Robotics Research (IJRR), 37(13-14):1632–1672, 2018.
  • Faust et al. (2018) Faust, A., Oslund, K., Ramirez, O., Francis, A., Tapia, L., Fiser, M., and Davidson, J. PRM-RL: Long-range robotic navigation tasks by combining reinforcement learning and sampling-based planning. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp.  5113–5120, 2018.
  • Fu et al. (2018) Fu, J., Luo, K., and Levine, S. Learning robust rewards with adversarial inverse reinforcement learning. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.
  • González et al. (2015) González, D., Pérez, J., Milanés, V., and Nashashibi, F. A review of motion planning techniques for automated vehicles. IEEE Transactions on Intelligent Transportation Systems, 17(4):1135–1145, 2015.
  • Gupta et al. (2018) Gupta, A., Johnson, J., Fei-Fei, L., Savarese, S., and Alahi, A. Social GAN: Socially acceptable trajectories with generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2255–2264, 2018.
  • Gupta et al. (2017) Gupta, S., Davidson, J., Levine, S., Sukthankar, R., and Malik, J. Cognitive mapping and planning for visual navigation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.  2616–2625, 2017.
  • Hart et al. (1968) Hart, P. E., Nilsson, N. J., and Raphael, B. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2):100–107, 1968.
  • He et al. (2016) He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.  770–778, 2016.
  • Hubara et al. (2016) Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R., and Bengio, Y. Binarized neural networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), pp.  4107–4115, 2016.
  • Ichter & Pavone (2019) Ichter, B. and Pavone, M. Robot motion planning in learned latent spaces. IEEE Robotics and Automation Letters (RA-L), 4(3):2407–2414, 2019.
  • Ichter et al. (2018) Ichter, B., Harrison, J., and Pavone, M. Learning sampling distributions for robot motion planning. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp.  7087–7094, 2018.
  • Ichter et al. (2020) Ichter, B., Schmerling, E., Lee, T. W. E., and Faust, A. Learned critical probabilistic roadmaps for robotic motion planning. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp.  9535–9541, 2020.
  • Jang et al. (2017) Jang, E., Gu, S., and Poole, B. Categorical reparameterization with gumbel-softmax. In Proceedings of the International Conference on Learning Representations (ICLR), 2017.
  • Kanezaki et al. (2017) Kanezaki, A., Nitta, J., and Sasaki, Y. Goselo: Goal-directed obstacle and self-location map for robot navigation using reactive neural networks. IEEE Robotics and Automation Letters (RA-L), 3(2):696–703, 2017.
  • Karkus et al. (2017) Karkus, P., Hsu, D., and Lee, W. S. QMDP-Net: Deep learning for planning under partial observability. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), pp.  4694–4704, 2017.
  • Kavraki et al. (1996) Kavraki, L. E., Svestka, P., Latombe, J.-C., and Overmars, M. H. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 12(4):566–580, 1996.
  • Kim & Pineau (2016) Kim, B. and Pineau, J. Socially adaptive path planning in human environments using inverse reinforcement learning. International Journal of Social Robotics, 8(1):51–66, 2016.
  • Kingston et al. (2018) Kingston, Z., Moll, M., and Kavraki, L. E. Sampling-based methods for motion planning with constraints. Annual Review of Control, Robotics, and Autonomous Systems, 1:159–185, 2018.
  • Kretzschmar et al. (2016) Kretzschmar, H., Spies, M., Sprunk, C., and Burgard, W. Socially compliant mobile robot navigation via inverse reinforcement learning. International Journal of Robotics Research (IJRR), 35(11):1289–1307, 2016.
  • LaValle (1998) LaValle, S. M. Rapidly-exploring random trees: A new tool for path planning. Technical report, Computer Science Dept., Iowa State University, 1998.
  • Lee et al. (2018) Lee, L., Parisotto, E., Chaplot, D. S., Xing, E., and Salakhutdinov, R. Gated path planning networks. In Proceedings of the International Conference on Machine Learning (ICML), 2018.
  • Paden et al. (2016) Paden, B., Čáp, M., Yong, S. Z., Yershov, D., and Frazzoli, E. A survey of motion planning and control techniques for self-driving urban vehicles. IEEE Transactions on Intelligent Vehicles, 1(1):33–55, 2016.
  • Paszke et al. (2019) Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. Pytorch: An imperative style, high-performance deep learning library. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), pp.  8024–8035, 2019.
  • Patel (1997) Patel, A. Heuristics – Stanford CS Theory, 1997. URL http://theory.stanford.edu/~amitp/GameProgramming/Heuristics.html.
  • Pérez-Higueras et al. (2018) Pérez-Higueras, N., Caballero, F., and Merino, L. Learning human-aware path planning with fully convolutional networks. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp.  1–5, 2018.
  • Pohl (1970) Pohl, I. Heuristic search viewed as path finding in a graph. Artificial Intelligence, 1(3-4):193–204, 1970.
  • Qureshi et al. (2019) Qureshi, A. H., Simeonov, A., Bency, M. J., and Yip, M. C. Motion planning networks. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp.  2118–2124, 2019.
  • Ratliff et al. (2006) Ratliff, N. D., Bagnell, J. A., and Zinkevich, M. A. Maximum margin planning. In Proceedings of the International Conference on Machine Learning (ICML), pp. 729–736, 2006.
  • Robicquet et al. (2016) Robicquet, A., Sadeghian, A., Alahi, A., and Savarese, S. Learning social etiquette: Human trajectory understanding in crowded scenes. In Proceedings of the European Conference on Computer Vision (ECCV), pp.  549–565, 2016.
  • Ronneberger et al. (2015) Ronneberger, O., Fischer, P., and Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp.  234–241, 2015.
  • Salzmann et al. (2020) Salzmann, T., Ivanovic, B., Chakravarty, P., and Pavone, M. Trajectron++: Multi-agent generative trajectory forecasting with heterogeneous data for control. In Proceedings of the European Conference on Computer Vision (ECCV), 2020.
  • Simonyan & Zisserman (2015) Simonyan, K. and Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations (ICLR), 2015.
  • Smith et al. (2012) Smith, C., Karayiannidis, Y., Nalpantidis, L., Gratal, X., Qi, P., Dimarogonas, D. V., and Kragic, D. Dual arm manipulation - a survey. Robotics and Autonomous Systems, 60(10):1340–1353, 2012.
  • Sturtevant (2012) Sturtevant, N. Benchmarks for grid-based pathfinding. IEEE Transactions on Computational Intelligence and AI in Games, 4(2):144–148, 2012.
  • Takahashi et al. (2019) Takahashi, T., Sun, H., Tian, D., and Wang, Y. Learning heuristic functions for mobile robot path planning using deep neural networks. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), pp.  764–772, 2019.
  • Tamar et al. (2016) Tamar, A., Wu, Y., Thomas, G., Levine, S., and Abbeel, P. Value iteration networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), pp.  2154–2162, 2016.
  • Tange (2011) Tange, O. Gnu parallel - the command-line power tool. The USENIX Magazine, 36(1):42–47, Feb 2011. URL http://www.gnu.org/s/parallel.
  • Vlastelica et al. (2020) Vlastelica, M., Paulus, A., Musil, V., Martius, G., and Rolínek, M. Differentiation of blackbox combinatorial solvers. In Proceedings of the International Conference on Learning Representations (ICLR), 2020.
  • Wulfmeier et al. (2015) Wulfmeier, M., Ondruska, P., and Posner, I. Maximum entropy deep inverse reinforcement learning. arXiv preprint arXiv:1507.04888, 2015.
  • Yakubovskiy (2020) Yakubovskiy, P. Segmentation models pytorch. https://github.com/qubvel/segmentation_models.pytorch, 2020.
  • Yu et al. (2019) Yu, L., Yu, T., Finn, C., and Ermon, S. Meta-inverse reinforcement learning with probabilistic context variables. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), pp.  11772–11783, 2019.
  • Zhou & Zeng (2015) Zhou, Y. and Zeng, J. Massively parallel A* search on a GPU. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2015.
  • Ziebart et al. (2008) Ziebart, B. D., Maas, A. L., Bagnell, J. A., and Dey, A. K. Maximum entropy inverse reinforcement learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), pp.  1433–1438, 2008.

Appendix A Details of Experimental Setups

A.1 Dataset Creation

Here we present supplementary information on how each dataset was created in our experiments. Since our dataset generation process involves randomness, we fixed a random seed in each generation program to ensure reproducibility. Please refer to the code in our project page: https://omron-sinicx.github.io/neural-astar/.

MP/Tiled-MP/CSM datasets.

In the experiments with the MP/Tiled-MP/CSM datasets, we employed more challenging settings involving randomized start and goal locations instead of the pre-defined, consistent locations used in prior work (Choudhury et al., 2018; Vlastelica et al., 2020). We determined these start and goal locations strategically based on their actual path distances to avoid generating easy problem instances. Specifically, for each environmental map, a single goal location was randomly determined once and fixed throughout the experiments. For a map with width and height $(W,H)$, i.e., $(32,32)$ for the MP and $(64,64)$ for the Tiled MP and CSM datasets, the goal location was sampled from one of the four corner regions of size $(W/4,H/4)$, as illustrated in Fig. 6 (middle). Then, we ran the Dijkstra algorithm to compute actual movement costs from every passable node to the goal, and calculated the 55th, 70th, and 85th percentile points of the costs. For every iteration in the training phase, we sampled a new random start location from regions whose actual costs were higher than the 55th percentile point. As for the validation and test data, we sampled two and five random but fixed start locations, respectively, from each of three kinds of regions whose costs fell within the percentile ranges $[55,70]$, $[70,85]$, and $[85,100]$ (see Fig. 6 (right) for an illustration). Consequently, we created $2\times 3=6$ and $5\times 3=15$ diverse start locations for each validation and test map, respectively. The ground-truth paths for all the generated problem instances were obtained by the Dijkstra algorithm. When computing losses for the Tiled MP and CSM datasets, these paths were dilated with a $3\times 3$ kernel, which stabilized training.

SDD.

In SDD, we extracted relatively simple trajectories of pedestrians who moved directly towards their destinations. Specifically, for each trajectory provided by Robicquet et al. (2016), we first trimmed it randomly to a sequence length in the range of $[300, 600]$ timesteps (at 2.5 fps). We then calculated the ratio of the straight-line distance between its start and goal points to the trajectory length, as a measure of path simplicity that gives a lower value for a more complex trajectory. We discarded trajectories whose simplicity was less than 0.5. Finally, we cropped image patches that encompass each trajectory with a margin of 50 pixels and resized them to $64\times 64$ pixels. As a result, 8,325 samples were extracted from the dataset.
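As an illustration, the simplicity measure can be computed as follows (a sketch, assuming traj is a (T, 2) array of pixel coordinates; not the exact extraction script):

    import numpy as np

    def path_simplicity(traj):
        # Straight-line distance between the start and goal points.
        straight = np.linalg.norm(traj[-1] - traj[0])
        # Trajectory length: sum of per-timestep displacements.
        length = np.linalg.norm(np.diff(traj, axis=0), axis=1).sum()
        # Lower values indicate more complex (less direct) trajectories.
        return straight / length

Trajectories with a simplicity below 0.5 were discarded.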

Figure 6: Sampling of Start and Goal Locations.
Table 4: Hyper-parameter Selection. The list of hyper-parameters and the ranges of values tried during the development of this paper.

Common
  Optimizer: RMSProp
  Learning rate: 0.001 (tried: 0.0001, 0.0005, 0.001, 0.005)
  Batch size: 100
  Number of epochs: 100 (MP), 400 (Tiled MP, CSM)
  Tie breaking (for $h$ func.): $0.001\times$ Euclidean distance
Neural A*/Neural BF
  Encoder architecture: U-Net with VGG-16 backbone (tried: VGG-16, ResNet-18)
  Temperature $\tau$: $\sqrt{32}$ for MP, $\sqrt{64}$ for Tiled MP and CSM
BB-A*
  Encoder architecture: U-Net with VGG-16 backbone
  Trade-off parameter $\lambda$: 20.0 (tried: 0.1, 1.0, 5.0, 10.0, 20.0)
SAIL/SAIL-SL
  Max data samples: 300 (tried: 60, 300)
  Sampling rate: 10 (tried: 10, 100)
WA*
  Weight factor for $h(v)$: 0.8 (tried: 0.6, 0.7, 0.8)

A.2 Hyper-Parameter Selection

Table 4 lists the hyper-parameters we used to produce the final results, along with the ranges of values tried. We selected the final parameters based on the validation performance on the MP dataset in terms of Hmean scores. Because completing all the experiments took considerable time (see the next section), we performed each experiment only once with a fixed set of random seeds.

Below we provide several remarks regarding the hyper-parameter list. We observed that tie-breaking in A* search, i.e., adjusting $h(v)$ by adding the Euclidean distance from $v$ to the goal scaled by a small positive constant (0.001), was critical to improving the base efficiency of A* search. Thus, we used this setting for all the A*-based methods throughout the experiments. The choice of the learning rate had little effect on final performance given a sufficient number of epochs. BB-A* has an additional hyper-parameter $\lambda$ that controls the trade-off between the "informativeness of the gradient" and the "faithfulness to the original function" (Vlastelica et al., 2020). We tried several values and found that any choice worked reasonably well, except for extremely small values (e.g., 0.1). SAIL and SAIL-SL (Choudhury et al., 2018) have hyper-parameters governing how samples are collected from each training environment instance, which had little effect on final performance. Finally, the weighted A* baseline used a modified node selection criterion with a weight factor $w$ on the heuristic function, i.e., $(1-w)\cdot g(v) + w\cdot h(v)$, for which we set $w=0.8$ throughout the experiments. Note that using $w=1.0$ in this criterion corresponds to the best-first search baseline. Both criteria are sketched below.
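For clarity, the two criteria above can be written as follows, where g and h are the cost-so-far and heuristic values of a node and euclid is its Euclidean distance to the goal (illustrative helper names, not our exact code):

    def tie_broken_h(h, euclid, eps=0.001):
        # Tie breaking: inflate h(v) by a small fraction of the Euclidean
        # distance to the goal, so nodes closer to the goal win among ties.
        return h + eps * euclid

    def weighted_astar_priority(g, h, w=0.8):
        # Weighted A* node selection. w=0.5 recovers (a rescaled) standard
        # A*, and w=1.0 recovers the best-first search baseline.
        return (1.0 - w) * g + w * h

Since rescaling all priorities by a constant does not change the node ordering, $w=0.5$ behaves identically to the standard $g(v)+h(v)$ criterion.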

A.3 Computing Infrastructure and Training Time

We performed all the experiments on a server running Ubuntu 18.04.3 LTS with NVIDIA V100 GPUs, an Intel Xeon Gold 6252 CPU @ 2.10GHz (48 cores), and 768GB of main memory. Our implementation was based on PyTorch 1.5 (Paszke et al., 2019) and Segmentation Models PyTorch (Yakubovskiy, 2020). To carry out the experiments efficiently, we used GNU Parallel (Tange, 2011) to run multiple experiment sessions in parallel. See our setup.py and Dockerfile for the list of all dependencies.

In the training sessions, each model was trained using a single V100 GPU with 16 GB of graphics memory. Training each of our models (Neural A* and Neural BF) took approximately 50 minutes on the MP dataset (100 epochs $\times$ 800 maps of size $32\times 32$) and 35 hours on the Tiled MP and CSM datasets (400 epochs $\times$ 3,200 maps of size $64\times 64$). As for SAIL/SAIL-SL and BB-A*, training on the MP dataset took approximately the same time (i.e., 50 minutes). On the Tiled MP and CSM datasets, SAIL/SAIL-SL took up to about 22 hours and BB-A* took 65 hours.

Appendix B Additional Results

Figures 7 and 8 show additional qualitative results (including those of MMP, introduced below). In what follows, we also provide more detailed performance analyses of Neural A* from different perspectives.

Figure 7: Additional Qualitative Results (MP/Tiled-MP/CSM Datasets).
Figure 8: Additional Qualitative Results (SDD).

B.1 Comparisons with Imitation Learning Methods

As introduced in Sec. 6, some imitation learning (IL) methods are relevant to our work in that they learn to recover unknown reward (i.e., negative cost) functions from demonstrations. Here, we compared our approach with several IL methods tailored to path planning tasks, namely Maximum Margin Planning (MMP) (Ratliff et al., 2006), Value Iteration Networks (VIN) (Tamar et al., 2016), and Gated Path Planning Networks (GPPN) (Lee et al., 2018).

For MMP, we followed the original work and modeled the cost function as a per-node linear mapping from features to costs. For feature extraction, we used the VGG-16 network pretrained on ImageNet. As in Neural A*, we activated the costs with the sigmoid function to constrain them to $[0, 1]$; a minimal sketch of this cost model is given below. Unlike the other IL-based planners, MMP uses A* search to plan a path with the estimated costs. We used the same implementation of A* search as that of Neural A*.
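A minimal PyTorch sketch of such a cost model, assuming per-node features of shape (N, C, H, W) already extracted by the backbone (the class and variable names are ours for illustration):

    import torch
    import torch.nn as nn

    class MMPCostModel(nn.Module):
        """Per-node linear mapping from features to costs in [0, 1]."""

        def __init__(self, feat_dim):
            super().__init__()
            # A 1x1 convolution applies the same linear map at every node.
            self.linear = nn.Conv2d(feat_dim, 1, kernel_size=1)

        def forward(self, feat):
            # Sigmoid constrains the costs to [0, 1], as in Neural A*.
            return torch.sigmoid(self.linear(feat))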

For VIN and GPPN, we used the official codebase of Lee et al. (2018). Because these reactive planners are not search-based, we could not employ the Exp and Hmean metrics, which are defined relative to the performance of the baseline A* search. Instead, we calculated the success ratio (Suc), i.e., the percentage of problem instances for which a planner found a valid path.

As shown in Tables 5 and 6, the performances of these IL methods are limited compared to the proposed Neural A*. Although MMP ensures 100% planning success by using A* search, it was consistently outperformed by Neural A* in terms of the Opt, Exp, and Hmean metrics. One possible reason for this limited performance is that MMP cannot learn how the internal search steps of A* affect search histories and resulting paths, echoing our comparison of BB-A* against Neural A* in Sec. 4.4. While GPPN obtained a higher Opt score than Neural A* on the Tiled MP dataset, it did not always find a valid path, as shown by its success ratio. Moreover, GPPN and VIN completely failed to learn on SDD. These failures can be attributed to the large input maps and the limited demonstrations (a single trajectory per map), which constitute a more challenging setting than that of the original work.

Table 5: Comparisons with Imitation Learning Methods. Bootstrap means and 95% confidence bounds of the path optimality ratio (Opt), the reduction ratio of node explorations (Exp), the harmonic mean (Hmean) of Opt and Exp, and the success ratio (Suc).
MP Dataset
Opt Exp Hmean Suc
VIN 24.7 (23.9, 25.5) N/A N/A 31.4 (30.6, 32.3)
GPPN 71.0 (70.2, 71.8) N/A N/A 86.2 (85.6, 86.9)
MMP 81.6 (80.6, 82.8) 22.4 (21.7, 23.2) 31.5 (30.6, 32.4) 100.0 (100.0, 100.0)
Neural A* 87.7 (86.6, 88.9) 40.1 (38.9, 41.3) 52.0 (50.7, 53.3) 100.0 (100.0, 100.0)
Tiled MP Dataset
Opt Exp Hmean Suc
VIN 52.7 (51.5, 54.0) N/A N/A 58.3 (57.1, 59.5)
GPPN 68.2 (67.0, 69.4) N/A N/A 81.5 (80.5, 82.5)
MMP 44.8 (42.4, 47.1) 40.5 (38.7, 42.4) 35.5 (33.8, 37.2) 100.0 (100.0, 100.0)
Neural A* 63.0 (60.7, 65.2) 55.8 (54.1, 57.5) 54.2 (52.6, 55.8) 100.0 (100.0, 100.0)
CSM Dataset
Opt Exp Hmean Suc
VIN 70.4 (69.3, 71.6) N/A N/A 73.2 (72.1, 74.4)
GPPN 68.9 (67.7, 70.1) N/A N/A 85.3 (84.4, 86.2)
MMP 66.4 (64.0, 68.9) 28.4 (26.4, 30.4) 31.9 (30.0, 33.8) 100.0 (100.0, 100.0)
Neural A* 73.5 (71.5, 75.5) 37.6 (35.5, 39.7) 43.6 (41.7, 45.5) 100.0 (100.0, 100.0)
Table 6: Comparisons with Imitation Learning Methods on SDD. Bootstrap means and 95% confidence bounds of the Chamfer distance between predicted and actual pedestrian trajectories.
Intra-scenes Inter-scenes
VIN 920.7 (890.8, 950.4) 900.6 (888.1, 913.8)
GPPN 920.7 (890.8, 952.3) 900.6 (888.0, 913.4)
MMP 126.7 (119.3, 133.9) 130.3 (128.1, 132.6)
Neural A* 16.1 (13.2, 18.8) 37.4 (35.8, 39.0)

B.2 Path Length Optimality Evaluation

Following Tamar et al. (2016) and Anderson et al. (2018), we introduce another metric for evaluating path optimality, which calculates the ratio of the length of the optimal path $\bar{P}$ to that of the predicted one $P$. For consistency with the other metrics, this path-length ratio is measured on a 0–100 (%) scale and is maximized when the predicted path is optimal; i.e., we calculate $|\bar{P}|/|P|\times 100$. As shown in Table 7, Neural A* produced the paths closest to optimal across all the datasets.
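In code, the metric reduces to a one-liner (a sketch; both arguments are path lengths, e.g., node counts):

    def path_length_optimality(opt_len, pred_len):
        # 100 when the predicted path is optimal; lower otherwise.
        return opt_len / pred_len * 100.0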

Table 7: Path Length Optimality Evaluation. Bootstrap means and 95% confidence bounds of the ratio of optimal to produced path lengths (the higher the better).
MP Tiled-MP CSM
BF 96.4 (96.1, 96.6) 92.1 (91.6, 92.6) 96.4 (96.1, 96.7)
WA* 96.9 (96.6, 97.1) 93.4 (93.0, 93.8) 96.8 (96.6, 97.1)
SAIL 87.5 (86.8, 88.3) 78.0 (77.0, 79.0) 87.1 (86.1, 88.1)
SAIL-SL 88.2 (87.5, 89.0) 73.4 (72.1, 74.7) 84.7 (83.5, 85.9)
BB-A* 96.3 (96.0, 96.6) 93.0 (92.5, 93.4) 96.5 (96.2, 96.8)
Neural BF 97.5 (97.3, 97.8) 95.0 (94.7, 95.4) 97.4 (97.1, 97.6)
Neural A* 99.1 (99.0, 99.2) 98.4 (98.3, 98.6) 98.9 (98.8, 99.0)

B.3 Computational Complexity and Runtime Analysis

The main computational bottleneck of Neural A* lies in the differentiable A* module. Because this module uses matrix operations involving all the nodes to enable back-propagation, its training-phase complexities in space and time are $\mathcal{O}(k|\mathcal{V}|)$, where $k$ is the number of search steps. In theory, the required number of search steps depends on the true path length $d$. Thus, the complexities in terms of $d$ amount to $\mathcal{O}(d|\mathcal{V}|)$ and $\mathcal{O}(b^{d}|\mathcal{V}|)$ in the best and worst cases, respectively, where $b$ is the average number of neighboring nodes expanded per search step. Practically, once training is finished, we can replace the differentiable A* module with a standard (i.e., non-differentiable) A* search algorithm without changing the testing behavior of Neural A*. With this replacement, Neural A* can perform planning in $\mathcal{O}(d)$ and $\mathcal{O}(b^{d})$ in the best and worst cases.

To analyze search runtimes empirically, we created three sets of 50 maps with sizes $64\times 64$, $128\times 128$, and $256\times 256$ by randomly drawing maps from the MP dataset and tiling them. We then measured the average runtime per problem using a standard A* search implementation coupled with and without our guidance maps. Specifically, we implemented a Python-based A* search algorithm with the pqdict package (https://github.com/nvictus/priority-queue-dictionary) and used its priority-queue feature to perform node selections efficiently; a condensed sketch of this search loop is given below. To accurately measure the search runtime per problem, we ran the program exclusively on a single CPU core (for performing A* search) and a single GPU (for additionally running the encoder to compute guidance maps for Neural A*), and solved each problem five times after a single warm-up trial. More sophisticated A* search implementations could yield further performance improvements, which is however beyond the scope of this work. Regardless of the test map sizes, the guidance maps were trained using the Tiled MP dataset with maps of size $64\times 64$, to see if our model generalizes well to larger maps. As shown in Table 8, we confirm that Neural A* greatly reduced the runtime of A* search with the help of guidance maps and also generalized well to larger maps.
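The following is an illustrative sketch of such a priority-queue A* loop on a 4-connected grid, not our exact benchmarked code; here, cost_map holds per-node movement costs, uniform for vanilla A* or replaced by the encoder's guidance map for Neural A*, and the goal is assumed reachable:

    import numpy as np
    from pqdict import pqdict

    def astar(cost_map, passable, start, goal):
        # Euclidean heuristic toward the goal.
        h = lambda v: float(np.hypot(v[0] - goal[0], v[1] - goal[1]))
        g = {start: 0.0}
        parent = {start: None}
        open_pq = pqdict({start: h(start)})  # node -> f = g + h (min-heap)
        while open_pq:
            v, _ = open_pq.popitem()  # select the node with the smallest f
            if v == goal:
                break
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                u = (v[0] + dy, v[1] + dx)
                if not (0 <= u[0] < passable.shape[0]
                        and 0 <= u[1] < passable.shape[1]) or not passable[u]:
                    continue
                g_new = g[v] + float(cost_map[u])
                if g_new < g.get(u, np.inf):
                    g[u], parent[u] = g_new, v
                    open_pq[u] = g_new + h(u)  # insert or update priority
        # Backtrack from the goal to recover the path.
        path, v = [], goal
        while v is not None:
            path.append(v)
            v = parent[v]
        return path[::-1]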

Table 8: Search Runtime Evaluation. Bootstrap means and 95% confidence bounds of the runtime (sec) required to solve a single problem with different map sizes.
$64\times 64$ $128\times 128$ $256\times 256$
A* 0.09 (0.08, 0.10) 0.21 (0.17, 0.25) 0.78 (0.72, 0.82)
Neural A* 0.04 (0.04, 0.04) 0.07 (0.06, 0.08) 0.15 (0.14, 0.16)